Ted Hisokawa
Apr 12, 2026 01:37
MiniMax releases M2.7, a 230B-parameter mixture-of-experts model optimized for NVIDIA GPUs with up to 2.7x throughput gains on Blackwell hardware.
MiniMax has released M2.7, a 230-billion-parameter open-weights AI model designed specifically for autonomous agent workflows, now available across NVIDIA’s inference ecosystem, including the company’s latest Blackwell Ultra GPUs.
The model represents a significant efficiency play in enterprise AI. Despite its 230B total parameters, M2.7 activates only 10B parameters per token (a 4.3% activation rate), achieved through a mixture-of-experts (MoE) architecture with 256 local experts. This keeps inference costs manageable while preserving the reasoning capacity of a much larger dense model.
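The arithmetic behind that efficiency claim is straightforward. A minimal sketch using the figures from the article (the FLOP proxy is a standard rough estimate, not MiniMax's published methodology):

```python
# Sparse-activation arithmetic for M2.7, using figures from the article.
TOTAL_PARAMS_B = 230.0   # total parameters, in billions
ACTIVE_PARAMS_B = 10.0   # parameters activated per token, in billions

activation_rate = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"Activation rate: {activation_rate:.1%}")  # 4.3%

# Rough per-token compute proxy: a dense model does ~2 * params FLOPs per
# token at inference; an MoE only pays for the active slice.
dense_flops_b = 2 * TOTAL_PARAMS_B
moe_flops_b = 2 * ACTIVE_PARAMS_B
print(f"Per-token compute vs. dense: {moe_flops_b / dense_flops_b:.1%}")
```

In other words, each token costs roughly what a 10B dense model would, while the full 230B parameter pool is available for the router to draw on.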
Performance Numbers on Blackwell
NVIDIA collaborated with open-source communities to optimize M2.7 for production workloads. Two key optimizations—a fused QK RMS Norm kernel and FP8 MoE integration from TensorRT-LLM—delivered substantial throughput improvements on Blackwell Ultra GPUs.
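The article does not detail the fused kernel itself, but the math it computes is standard: RMS-normalizing the query and key projections before attention. A minimal numpy sketch of the unfused reference (shapes and variable names are illustrative):

```python
import numpy as np

def rms_norm(x, gamma, eps=1e-6):
    # Normalize by the root-mean-square over the last dimension, then scale.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * gamma

head_dim = 128
q = np.random.randn(4, head_dim)  # (tokens, head_dim)
k = np.random.randn(4, head_dim)
gamma_q = np.ones(head_dim)       # learned scale vectors in the real model
gamma_k = np.ones(head_dim)

# Unfused reference: two separate normalization passes over Q and K.
# A fused kernel performs both in one launch, avoiding extra round-trips
# of Q and K through GPU memory.
q_normed = rms_norm(q, gamma_q)
k_normed = rms_norm(k, gamma_k)
```

The win from fusion is bandwidth, not math: Q and K are read and written once instead of twice, which matters at Blackwell-scale throughput.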
Testing with a 1K/1K input/output sequence length dataset showed vLLM achieved up to 2.5x throughput improvement, while SGLang hit 2.7x gains. Both optimizations were implemented within a single month, suggesting further performance headroom exists.
Technical Architecture
M2.7 supports 200K input context length across 62 layers, using multi-head causal self-attention with Rotary Position Embeddings (RoPE). A top-k expert routing mechanism activates only 8 of the 256 experts for any given input, which is how the model maintains low inference costs despite its scale.
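The top-k routing step described above can be sketched in a few lines. MiniMax has not published its exact router details in this article (e.g. whether the softmax is taken before or after the top-k cut), so this follows the common top-k-then-softmax pattern:

```python
import numpy as np

def route_top_k(logits, k=8):
    """Select the top-k experts for one token and renormalize their gate
    weights with a softmax over just those k scores."""
    top = np.argpartition(logits, -k)[-k:]          # indices of the k highest scores
    gate = np.exp(logits[top] - logits[top].max())  # numerically stable softmax
    gate /= gate.sum()
    return top, gate

rng = np.random.default_rng(0)
router_logits = rng.standard_normal(256)  # one score per expert (256 in M2.7)
experts, weights = route_top_k(router_logits, k=8)

print(len(experts))             # 8 experts activated
print(round(weights.sum(), 6))  # gate weights sum to 1.0
```

Only the 8 selected experts' feed-forward weights are touched for this token; the other 248 experts cost nothing, which is where the 10B-active / 230B-total split comes from.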
The architecture targets coding challenges and complex agentic tasks—workflows where AI systems need to plan, execute, and iterate autonomously rather than respond to single prompts.
Deployment Options
Developers can access M2.7 through multiple channels. NVIDIA’s NemoClaw reference stack provides one-click deployment for running autonomous agents with the OpenShell runtime. The model is also available through NVIDIA NIM containerized microservices for on-premise, cloud, or hybrid deployments.
For teams wanting to customize the model, NVIDIA’s NeMo AutoModel library supports fine-tuning with published recipes. Reinforcement learning workflows are available through NeMo RL with sample configurations for 8K and 16K sequence lengths.
Free GPU-accelerated endpoints on build.nvidia.com allow testing before committing to infrastructure. The open weights are also available on Hugging Face for self-hosted deployments.
The release positions MiniMax as a credible alternative to closed models from OpenAI and Anthropic for enterprises building autonomous AI systems, particularly those already invested in NVIDIA infrastructure.
Image source: Shutterstock