
NVIDIA’s NCCL 2.24 Enhances Networking Reliability and Observability

By Aggregated - see source on March 14, 2025 Blockchain


Joerg Hiller
Mar 14, 2025 02:22

NVIDIA’s NCCL 2.24 release adds features for multi-GPU and multinode communication, including a RAS subsystem, NIC Fusion, and FP8 support, aimed at deep learning training workloads.

NVIDIA has released version 2.24 of the NVIDIA Collective Communications Library (NCCL), with a focus on networking reliability and observability for multi-GPU and multinode (MGMN) communication. As reported by the NVIDIA Developer Blog, NCCL is optimized specifically for NVIDIA GPUs and networking, which makes it an essential component of multi-GPU deep learning training.

NCCL 2.24 New Features

The update includes several new features aimed at enhancing performance and reliability:

  • Reliability, Availability, and Serviceability (RAS) subsystem
  • User Buffer (UB) registration for multinode collectives
  • NIC Fusion
  • Optional receive completions
  • FP8 support
  • Strict enforcement of NCCL_ALGO and NCCL_PROTO

The RAS Subsystem

The RAS subsystem is one of the standout additions in NCCL 2.24. It is designed to assist users in diagnosing application issues like crashes and hangs, particularly in large-scale deployments. This low-overhead infrastructure offers a global view of running applications, enabling the detection of anomalies such as unresponsive nodes or lagging processes. It operates by creating a network of threads across NCCL processes that monitor each other’s health through regular keep-alive messages.
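The keep-alive idea described above can be illustrated with a small toy model. This is only a sketch of the general technique, not NCCL's implementation: the `KeepAliveMonitor` class, the rank names, and the 5-second timeout are all hypothetical, and timestamps are passed in by hand so the example stays deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class KeepAliveMonitor:
    """Tracks the last keep-alive time seen from each peer (toy model)."""
    timeout_s: float
    last_seen: dict = field(default_factory=dict)

    def record(self, peer: str, now: float) -> None:
        # Called whenever a keep-alive message arrives from `peer`.
        self.last_seen[peer] = now

    def unresponsive(self, now: float) -> list:
        # A peer is flagged once its last keep-alive is older than the timeout.
        return sorted(
            peer for peer, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


# Hypothetical scenario: rank0 keeps reporting, rank1 goes silent.
monitor = KeepAliveMonitor(timeout_s=5.0)
monitor.record("rank0", now=0.0)
monitor.record("rank1", now=0.0)
monitor.record("rank0", now=4.0)
stale = monitor.unresponsive(now=6.0)
print(stale)
```

In a real deployment each process would run this kind of check against its peers in a background thread, which is what gives the combined network a global view of which processes have stopped responding.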

Enhancements in User Buffer Registration

NCCL 2.24 introduces user buffer (UB) registration for multinode collectives, allowing more efficient data transfer and reduced GPU resource consumption. The library now supports UB registration for multiple ranks-per-node collective networking and standard peer-to-peer networks, offering significant performance gains, particularly for operations like AllGather and Broadcast.

NIC Fusion

As systems with many NICs become more common, NCCL has adapted how it handles network devices. The new NIC Fusion feature logically merges multiple NICs into a single entity so that network resources are used efficiently. This is particularly beneficial for systems with more than one NIC per GPU, where it addresses issues such as crashes and inefficient resource allocation.
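As a rough illustration of what "logically merging" NICs means, the sketch below groups physical NICs that share a key into one logical device and sums their bandwidth. It assumes a simple grouping by the GPU each NIC sits next to; the `fuse_nics` function, the tuple layout, and the bandwidth figures are hypothetical, and NCCL's actual merge policy is not described in this article.

```python
from collections import defaultdict


def fuse_nics(nics):
    """Group physical NICs that share a key into one logical NIC.

    `nics` is a list of (name, group_key, gbps) tuples. All fields are
    hypothetical stand-ins for whatever topology data is available.
    """
    groups = defaultdict(list)
    for name, key, gbps in nics:
        groups[key].append((name, gbps))
    # Each logical NIC lists its members and their combined bandwidth.
    return {
        key: {
            "members": [name for name, _ in members],
            "gbps": sum(gbps for _, gbps in members),
        }
        for key, members in groups.items()
    }


# Hypothetical node with two NICs per GPU.
physical = [
    ("nic0", "gpu0", 200),
    ("nic1", "gpu0", 200),
    ("nic2", "gpu1", 200),
    ("nic3", "gpu1", 200),
]
logical = fuse_nics(physical)
print(logical)
```

The point of the grouping is that the communication layer then sees one device per GPU instead of several, so it does not have to allocate separate resources for each physical NIC.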

Additional Features and Fixes

The update also introduces optional receive completions for the LL and LL128 protocols, which reduces overhead and congestion. NCCL 2.24 supports native FP8 reductions on NVIDIA Hopper and newer architectures. In addition, NCCL_ALGO and NCCL_PROTO are now enforced more strictly, so user-specified tuning is applied more precisely and errors are handled more predictably.
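Because NCCL_ALGO and NCCL_PROTO are environment variables, one way to work with stricter enforcement is to validate the values before launching a job. The following is a minimal sketch of that idea: the `nccl_env` helper is hypothetical, and the sets of accepted values are deliberately partial assumptions, so the NCCL documentation remains the reference for the full list.

```python
import os

# Assumed, partial sets of accepted values; check the NCCL docs for the full list.
KNOWN_ALGOS = {"Ring", "Tree"}
KNOWN_PROTOS = {"LL", "LL128", "Simple"}


def nccl_env(algo, proto, base=None):
    """Return a copy of `base` (default: os.environ) with NCCL_ALGO/NCCL_PROTO set."""
    if algo not in KNOWN_ALGOS:
        raise ValueError(f"unknown NCCL_ALGO value: {algo!r}")
    if proto not in KNOWN_PROTOS:
        raise ValueError(f"unknown NCCL_PROTO value: {proto!r}")
    env = dict(os.environ if base is None else base)
    env["NCCL_ALGO"] = algo
    env["NCCL_PROTO"] = proto
    return env


# Start from an empty base so the result is easy to inspect.
env = nccl_env("Ring", "Simple", base={})
print(env)
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when starting a training process, so a mistyped value fails in the launcher rather than inside the job.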

The release also includes various bug fixes and minor improvements, such as adjustments to PAT tuning and improvements to memory allocation functions, which make the library more robust and efficient overall.

