AsiaTokenFund

NVIDIA’s CUTLASS 3.x Enhances GEMM Kernel Design with Modular Abstractions

By Aggregated - see source on July 17, 2025 Blockchain

Caroline Bishop
Jul 17, 2025 14:52

NVIDIA’s CUTLASS 3.x introduces a modular, hierarchical system for GEMM kernel design, improving code readability and extending support to newer architectures like Hopper and Blackwell.





NVIDIA’s latest iteration of its CUDA Templates for Linear Algebra Subroutines and Solvers, known as CUTLASS 3.x, introduces a modular and hierarchical approach to General Matrix Multiply (GEMM) kernel design. The update aims to maximize the flexibility and performance of GEMM implementations across NVIDIA architectures, according to NVIDIA’s announcement on its developer blog.

Innovative Hierarchical System

The redesign in CUTLASS 3.x centers on a hierarchy of composable, orthogonal building blocks. Each block is customized through template parameters, so developers can either rely on the high-level abstractions for performance or work directly with the lower layers for more advanced modifications. That flexibility lets the same design adapt to different hardware generations and user requirements.

Architectural Support and Code Readability

With CUTLASS 3.x, NVIDIA extends support to its latest architectures, including Hopper and Blackwell. The redesign also improves code readability, making it easier for developers to implement and optimize GEMM kernels.

Conceptual GEMM Hierarchy

The conceptual GEMM hierarchy in CUTLASS 3.x is independent of specific hardware features and is structured into five layers: Atom, Tiled MMA/Copy, Collective, Kernel, and Device. Each layer is a point of composition for abstractions from the layer below it, which allows customization and performance tuning at whichever level a developer needs.
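To make the layering concrete, here is a minimal conceptual sketch in plain Python. It is not CUTLASS code: every function name (`atom_fma`, `tiled_mma`, `collective_mainloop`, `kernel`, `device_gemm`) and the tile shape are hypothetical, chosen only to show how each layer might wrap the one below it.

```python
# Conceptual sketch only: plain Python standing in for the five layers.
# All names are hypothetical and are not CUTLASS APIs.

def atom_fma(a, b, acc):
    """Atom: the smallest math unit, here one scalar multiply-accumulate."""
    return acc + a * b

def tiled_mma(a_tile, b_tile, acc_tile):
    """Tiled MMA: repeat the atom over one small block of A and B."""
    for i in range(len(a_tile)):
        for j in range(len(b_tile[0])):
            for k in range(len(b_tile)):
                acc_tile[i][j] = atom_fma(a_tile[i][k], b_tile[k][j], acc_tile[i][j])
    return acc_tile

def tiled_copy(mat, r0, r1, c0, c1):
    """Tiled copy stand-in: extract a sub-block (slicing clamps at the edges)."""
    return [row[c0:c1] for row in mat[r0:r1]]

def collective_mainloop(A, B, m0, n0, tile):
    """Collective: walk the K dimension and accumulate one output tile."""
    tm, tn, tk = tile
    rows = min(tm, len(A) - m0)
    cols = min(tn, len(B[0]) - n0)
    acc = [[0.0] * cols for _ in range(rows)]
    for k0 in range(0, len(B), tk):
        a_tile = tiled_copy(A, m0, m0 + tm, k0, k0 + tk)
        b_tile = tiled_copy(B, k0, k0 + tk, n0, n0 + tn)
        tiled_mma(a_tile, b_tile, acc)
    return acc

def kernel(A, B, C, block_m, block_n, tile):
    """Kernel: the work of one thread block, a mainloop plus a trivial write-back."""
    m0, n0 = block_m * tile[0], block_n * tile[1]
    acc = collective_mainloop(A, B, m0, n0, tile)
    for i, row in enumerate(acc):
        for j, value in enumerate(row):
            C[m0 + i][n0 + j] = value

def device_gemm(A, B, tile=(2, 2, 2)):
    """Device: host-side entry point that 'launches' one kernel per output tile."""
    M, N = len(A), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    grid_m = -(-M // tile[0])  # ceiling division
    grid_n = -(-N // tile[1])
    for block_m in range(grid_m):
        for block_n in range(grid_n):
            kernel(A, B, C, block_m, block_n, tile)
    return C
```

Calling `device_gemm(A, B)` on two nested lists returns their product. The point of the sketch is the call chain: the device function knows nothing about how a tile is multiplied, and the atom knows nothing about the grid, so either end could be swapped without touching the other.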

Collective Layer Enhancements

The collective layer, which covers both the mainloop and the epilogue, orchestrates the execution of spatial micro-kernels and the post-processing of their results. It uses hardware-accelerated synchronization primitives to manage pipelines and asynchronous operations, which is essential for keeping data movement and math overlapped on modern GPUs.
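The pipelining idea can be imitated in a few lines of sequential Python. This is a sketch under stated assumptions, not the library's implementation: the names `pipelined_mainloop`, `load_k_tile`, `mma`, and `epilogue` are hypothetical, the stage count is arbitrary, and the "loads" here are ordinary list slices that only mimic the order in which real asynchronous copies and math would be issued.

```python
# Hypothetical names throughout; sequential Python only imitates the ordering
# of loads and math that real hardware would overlap asynchronously.

def load_k_tile(A, B, k0, tk):
    """Producer step: fetch one K-slice of A (columns) and of B (rows)."""
    return [row[k0:k0 + tk] for row in A], B[k0:k0 + tk]

def mma(a_tile, b_tile, acc):
    """Consumer step: accumulate a_tile @ b_tile into acc in place."""
    for i, a_row in enumerate(a_tile):
        for k, a in enumerate(a_row):
            for j, b in enumerate(b_tile[k]):
                acc[i][j] += a * b

def pipelined_mainloop(A, B, tk=2, stages=2):
    """Mainloop over K using a circular buffer with `stages` slots."""
    k_tiles = -(-len(B) // tk)  # ceiling division
    acc = [[0.0] * len(B[0]) for _ in A]
    buffers = [None] * stages
    log = []

    def issue_load(t):
        buffers[t % stages] = load_k_tile(A, B, t * tk, tk)
        log.append(("load", t))

    # Prologue: fill all but one slot before any math starts.
    for t in range(min(stages - 1, k_tiles)):
        issue_load(t)

    for t in range(k_tiles):
        ahead = t + stages - 1
        if ahead < k_tiles:
            issue_load(ahead)  # reuses the slot that tile t-1 has finished with
        a_tile, b_tile = buffers[t % stages]
        mma(a_tile, b_tile, acc)
        log.append(("mma", t))
    return acc, log

def epilogue(acc, C, alpha=1.0, beta=0.0):
    """Post-processing: D = alpha * acc + beta * C, element by element."""
    return [[alpha * x + beta * c for x, c in zip(acc_row, c_row)]
            for acc_row, c_row in zip(acc, C)]
```

The returned `log` records the issue order, so with two stages the load for tile `t + 1` always appears before the math for tile `t`. On a GPU that ordering is what allows the next tile's data to arrive while the current tile is being multiplied; the sketch shows only the bookkeeping, not the overlap itself.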

Kernel and Device Layer Innovations

The kernel layer in CUTLASS 3.x assembles collective components into a device kernel that executes over a grid of threadblocks or clusters. The device layer provides the host-side logic for launching that kernel, including support for threadblock clusters and CUDA stream management.
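As a rough illustration of the host-side arithmetic involved, the sketch below computes how many threadblocks a launch would need for a given problem size. The function name `launch_grid`, the 128x128 tile, and the 2x1 cluster shape are assumptions made for the example, not values taken from the announcement. The padding step reflects the CUDA requirement that grid dimensions be a multiple of the cluster dimensions.

```python
# Hypothetical helper names and default shapes; illustration only.

def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def round_up(x, multiple):
    """Smallest multiple of `multiple` that is >= x."""
    return ceil_div(x, multiple) * multiple

def launch_grid(M, N, tile_mn=(128, 128), cluster_mn=(2, 1)):
    """Threadblocks per dimension, padded so the grid divides evenly into clusters."""
    blocks_m = ceil_div(M, tile_mn[0])
    blocks_n = ceil_div(N, tile_mn[1])
    return (round_up(blocks_m, cluster_mn[0]), round_up(blocks_n, cluster_mn[1]))
```

For a 600 x 512 output with these defaults, five tiles are needed along M but the grid is padded to six so that it splits into 2x1 clusters. Any padded threadblock would have no output tile of its own and would need to exit early or be skipped by the scheduler.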

Conclusion

Through CUTLASS 3.x, NVIDIA offers a comprehensive and adaptable framework for GEMM kernel design, catering to the needs of developers working with advanced GPU architectures. This release underscores NVIDIA’s commitment to providing robust tools for optimizing computational workloads, enhancing both performance and developer experience.

For more details, refer to the official announcement on the NVIDIA Developer Blog.
