AsiaTokenFund

Perplexity AI Leverages NVIDIA Inference Stack to Handle 435 Million Monthly Queries

By Aggregated - see source on December 6, 2024 Blockchain


Terrill Dicki
Dec 06, 2024 04:17

Perplexity AI utilizes NVIDIA’s inference stack, including H100 Tensor Core GPUs and Triton Inference Server, to manage over 435 million search queries monthly, optimizing performance and reducing costs.





Perplexity AI, an AI-powered search engine, handles more than 435 million search queries each month on NVIDIA’s inference stack. The platform uses NVIDIA H100 Tensor Core GPUs, Triton Inference Server, and TensorRT-LLM to deploy large language models (LLMs) efficiently, according to NVIDIA’s official blog.

Serving Multiple AI Models

To meet diverse user demands, Perplexity AI operates over 20 AI models simultaneously, including variations of the open-source Llama 3.1 models. Each user request is matched with the most suitable model using smaller classifier models that determine user intent. These models are deployed across GPU pods, each managed by an NVIDIA Triton Inference Server, ensuring efficiency under strict service-level agreements (SLAs).
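The routing step described above can be pictured with a minimal Python sketch. It is not Perplexity's implementation: the intent labels, the keyword heuristic standing in for the classifier model, and the mapping from intent to model size are all assumptions made for illustration.

```python
# Hypothetical intent labels mapped to model names (illustrative only).
ROUTES = {
    "quick_fact": "llama-3.1-8b",
    "research": "llama-3.1-70b",
    "complex_reasoning": "llama-3.1-405b",
}
DEFAULT_MODEL = "llama-3.1-70b"


def classify_intent(query: str) -> str:
    """Stand-in for a small classifier model: a simple keyword heuristic."""
    words = query.lower().split()
    if any(w in ("prove", "derive", "compare") for w in words):
        return "complex_reasoning"
    if len(words) <= 6:
        return "quick_fact"
    return "research"


def route(query: str) -> str:
    """Return the name of the model that should serve this query."""
    return ROUTES.get(classify_intent(query), DEFAULT_MODEL)
```

In a real deployment the heuristic would be replaced by a trained classifier, and the returned name would select the GPU pod that hosts that model.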

The pods are hosted within a Kubernetes cluster, featuring an in-house front-end scheduler that directs traffic based on load and usage. This ensures consistent SLA adherence, optimizing performance and resource utilization.
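The article does not say which policy the in-house scheduler applies. One common load-based policy is "least outstanding requests", sketched below in Python as an assumption. The `Pod` and `LeastLoadedScheduler` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    model: str
    in_flight: int = 0  # requests currently being served by this pod


class LeastLoadedScheduler:
    """Send each request to the pod with the fewest in-flight requests."""

    def __init__(self, pods):
        self.pods = list(pods)

    def pick(self, model: str) -> Pod:
        candidates = [p for p in self.pods if p.model == model]
        if not candidates:
            raise LookupError(f"no pod serves {model!r}")
        # min() returns the first pod on ties, so ordering is deterministic.
        pod = min(candidates, key=lambda p: p.in_flight)
        pod.in_flight += 1
        return pod

    def release(self, pod: Pod) -> None:
        """Call when a request finishes so the pod's load count drops."""
        pod.in_flight = max(0, pod.in_flight - 1)
```

A production scheduler would also weigh factors such as queue depth and sequence length, which this sketch leaves out.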

Optimizing Performance and Costs

Perplexity AI employs a comprehensive A/B testing strategy to define SLAs for varied use cases. This process aims to maximize GPU utilization while maintaining target SLAs, optimizing inference serving costs. Smaller models focus on minimizing latency, while larger, user-facing models like Llama 8B, 70B, and 405B undergo detailed performance analysis to balance costs and user experience.
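The trade-off between GPU utilization and a latency SLA can be sketched as a simple selection over benchmark results. The Python below is a hedged illustration: the measurement fields (`batch_size`, `p99_ms`, `tokens_per_s`) and all numbers are invented for the example and do not come from the article.

```python
def best_config(measurements, sla_ms):
    """Return the highest-throughput measurement whose p99 latency meets the SLA.

    Returns None when no measured configuration satisfies the SLA.
    """
    within_sla = [m for m in measurements if m["p99_ms"] <= sla_ms]
    if not within_sla:
        return None
    return max(within_sla, key=lambda m: m["tokens_per_s"])


# Hypothetical benchmark results: larger batches raise throughput
# (better GPU utilization) but also raise tail latency.
MEASUREMENTS = [
    {"batch_size": 1, "p99_ms": 120, "tokens_per_s": 400},
    {"batch_size": 8, "p99_ms": 210, "tokens_per_s": 2200},
    {"batch_size": 32, "p99_ms": 480, "tokens_per_s": 5100},
    {"batch_size": 64, "p99_ms": 900, "tokens_per_s": 6300},
]
```

With these invented numbers, a 500 ms SLA selects the batch size of 32, while a 100 ms SLA has no feasible configuration and returns `None`.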

Performance is further improved by sharding each model across multiple GPUs: increasing tensor parallelism lowers serving costs for latency-sensitive requests. This approach has enabled Perplexity to save approximately $1 million annually by hosting models on cloud-based NVIDIA GPUs rather than paying for third-party LLM API services.
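Why more GPUs per model can lower cost is easiest to see as arithmetic on cost per million tokens. The Python sketch below uses hypothetical prices and throughputs, not figures from the article. More GPUs only pay off when throughput at the target latency grows faster than the GPU count.

```python
def cost_per_million_tokens(num_gpus: int, gpu_hour_usd: float, tokens_per_s: float) -> float:
    """Serving cost in USD per one million generated tokens."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    tokens_per_hour = tokens_per_s * 3600
    return num_gpus * gpu_hour_usd / tokens_per_hour * 1_000_000


# Hypothetical example: the same model sharded over 4 or 8 GPUs.
# Throughput values are assumed to be measured under the same latency SLA.
cost_tp4 = cost_per_million_tokens(num_gpus=4, gpu_hour_usd=3.0, tokens_per_s=2000)
cost_tp8 = cost_per_million_tokens(num_gpus=8, gpu_hour_usd=3.0, tokens_per_s=5000)
```

In this invented case the 8-GPU deployment costs twice as much per hour but serves 2.5 times the tokens, so its cost per million tokens is lower.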

Innovative Techniques for Enhanced Throughput

Perplexity AI is collaborating with NVIDIA to implement ‘disaggregated serving,’ a method that separates the phases of inference (prefill, which processes the prompt, and decode, which generates output tokens) onto different GPUs, significantly boosting throughput while adhering to SLAs. Because each phase can run on its own hardware, Perplexity can use various NVIDIA GPU products to optimize performance and cost-efficiency.
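The idea of splitting inference into separately served phases can be pictured with a toy Python simulation. It is a conceptual sketch only, not NVIDIA's or Perplexity's design: the two queues stand in for separate GPU pools, the "KV cache" is a plain list, and tokens are placeholders. It assumes `max_new_tokens` is at least 1.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    prompt: str
    max_new_tokens: int  # assumed to be >= 1
    kv_cache: list = field(default_factory=list)
    output: list = field(default_factory=list)


def prefill(req: Request) -> Request:
    """Prefill pool: process the whole prompt once and build the cache."""
    req.kv_cache = req.prompt.split()
    return req


def decode_step(req: Request) -> bool:
    """Decode pool: emit one placeholder token; return True when finished."""
    token = f"tok{len(req.output)}"
    req.output.append(token)
    req.kv_cache.append(token)
    return len(req.output) >= req.max_new_tokens


def serve(requests):
    prefill_queue = deque(requests)  # waiting for the prefill pool
    decode_queue = deque()           # handed off to the decode pool
    finished = []
    while prefill_queue or decode_queue:
        if prefill_queue:
            # Hand-off: the cache built during prefill moves to the decode pool.
            decode_queue.append(prefill(prefill_queue.popleft()))
        # Advance every in-flight request by one token.
        for _ in range(len(decode_queue)):
            req = decode_queue.popleft()
            if decode_step(req):
                finished.append(req)
            else:
                decode_queue.append(req)
    return finished
```

In a real system the two pools run concurrently on different GPUs and the cache is transferred between them. The sequential loop here only shows the order of the hand-off.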

Further improvements are anticipated with the upcoming NVIDIA Blackwell platform, promising substantial performance gains through technological innovations, including a second-generation Transformer Engine and advanced NVLink capabilities.

Perplexity’s strategic use of NVIDIA’s inference stack underscores the potential for AI-powered platforms to manage vast query volumes efficiently, delivering high-quality user experiences while maintaining cost-effectiveness.
