
AMD at Hot Chips 2025: Deep Dive into CDNA 4 Architecture and MI350 Accelerators

1 September 2025

AMD CDNA 4 MI350


AMD MI350 Accelerator

At Hot Chips 2025, AMD architects presented a deep dive into the CDNA 4 architecture powering the new MI350 accelerator family. Building on the MI300 foundation, MI350 introduces major architectural refinements and performance enhancements.

The AI Boom and Hardware Demands

Large Language Models: Explosive Growth

Large Language Models (LLMs) continue to scale rapidly, requiring longer context lengths and greater memory capacity.

GenAI Needs

To sustain performance, hardware must deliver:

Higher memory bandwidth and capacity
Better energy efficiency
Scalable multi-GPU clustering for massive AI models

MI350 Series Launch

Instinct MI350 Series

The MI350 family is now shipping, with two platform options:

MI350X → air-cooled
MI355X → liquid-cooled

Architectural Highlights

MI350 Architecture Enhancements

185 billion transistors
Chiplet + 3D stacking design
8 compute dies stacked across 2 I/O dies
Compute dies built on TSMC N3P 3nm
I/O dies remain on 6nm
Peak frequency: 2.4 GHz
Liquid-cooled TDP: 1.4 kW

MI350 GPU Chiplets

Infinity Fabric upgraded to IF 4:

About 2 TB/s more bandwidth than IF 3
Fewer cross-die links, replaced by wider, lower-frequency die-to-die (D2D) connections → higher power efficiency
7 IF links per socket

MI350 GPU Cache & Hierarchy

Cache improvements:

Local Data Share (LDS) capacity doubled compared to MI300
Each XCD has 4 MB L2 cache with coherence across dies
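
As a quick back-of-the-envelope check, the per-die figure above implies an aggregate L2 capacity for the whole package. The sketch below only multiplies numbers quoted in this article (8 compute dies, 4 MB of L2 each); the variable names are illustrative and not AMD terminology.

```python
# Figures quoted in this article; names are illustrative only.
XCD_COUNT = 8        # compute dies (XCDs) per package
L2_PER_XCD_MB = 4    # L2 cache per XCD, in MB

# Aggregate L2 is the per-die capacity summed over all compute dies.
total_l2_mb = XCD_COUNT * L2_PER_XCD_MB
print(f"Aggregate L2 across {XCD_COUNT} XCDs: {total_l2_mb} MB")
```

Note that this is a sum of separate per-die caches, not one unified 32 MB cache.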

Data Formats and Compute Performance

Supported Data Formats

CDNA 4 introduces:

New FP6 and FP4 formats
Nearly 2× throughput for key data types
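
To give a sense of how coarse a 4-bit float is, here is a minimal sketch that decodes 4-bit patterns under an E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1), the element layout used by the OCP Microscaling MXFP4 format. The article does not state which FP4 encoding was presented, so treat the layout as an assumption. `decode_e2m1` is a hypothetical helper, and the shared per-block scale that microscaling formats apply is left out.

```python
def decode_e2m1(bits: int) -> float:
    """Decode a 4-bit value under an assumed E2M1 layout: s | ee | m, bias 1."""
    if not 0 <= bits <= 0xF:
        raise ValueError("expected a 4-bit value")
    sign = -1.0 if bits & 0b1000 else 1.0
    exponent = (bits >> 1) & 0b11
    mantissa = bits & 0b1
    if exponent == 0:
        # Subnormal: no implicit leading 1, so the value is 0 or 0.5.
        magnitude = mantissa * 0.5
    else:
        # Normal: implicit leading 1, exponent bias of 1.
        magnitude = (1 + mantissa * 0.5) * 2.0 ** (exponent - 1)
    return sign * magnitude

# The eight non-negative encodings (sign bit clear).
positive_values = [decode_e2m1(b) for b in range(8)]
print(positive_values)
```

Under this layout there are only eight distinct magnitudes, from 0 to 6, which is why such formats are normally paired with a scale factor.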

Supported Data Formats Performance Comparison

→ AMD claims AI math throughput more than 2× that of competing accelerators.

System and Platform Design

Flexible GPU Partitioning

Configurable as single NUMA domain or dual NUMA domains
XCDs can be partitioned into multiple logical GPUs
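
The partitioning idea can be illustrated with a small sketch that splits the eight XCDs evenly into logical GPUs. The partition counts tried here (1, 2, 4, 8) and the function name `partition_xcds` are assumptions for illustration; the article does not list the modes AMD actually exposes.

```python
def partition_xcds(xcd_count: int, partitions: int) -> list[list[int]]:
    """Split XCD indices 0..xcd_count-1 into equally sized contiguous groups."""
    if partitions <= 0 or xcd_count % partitions != 0:
        raise ValueError("partitions must evenly divide the XCD count")
    size = xcd_count // partitions
    return [list(range(i * size, (i + 1) * size)) for i in range(partitions)]

# Hypothetical partition counts for an 8-XCD package.
for n in (1, 2, 4, 8):
    print(n, partition_xcds(8, n))
```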

Infinity Platform

Connectivity:

Up to 8 GPUs in a fully connected topology via Infinity Fabric
PCIe connects GPUs to CPUs and NICs
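
A fully connected (all-to-all) topology means every GPU has a direct link to every other GPU. The sketch below counts those links for the 8-GPU case. It is plain graph arithmetic, not a description of AMD's actual link wiring. Each GPU ends up with 7 peers, which lines up with the 7 IF links per socket noted earlier.

```python
from itertools import combinations

GPU_COUNT = 8  # GPUs per node, per the article

# In an all-to-all mesh, every unordered pair of GPUs shares one direct link.
links = list(combinations(range(GPU_COUNT), 2))
peers_per_gpu = GPU_COUNT - 1

print(f"direct links: {len(links)}, peers per GPU: {peers_per_gpu}")
```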

Air Cooled UBB

OAM modules + universal baseboard (UBB):

Supports 8 GPUs per board
Air-cooled rack: up to 64 GPUs
Liquid-cooled rack: 96 to 128 GPUs
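
The rack figures translate directly into board counts and a rough GPU power budget. The sketch below uses only numbers quoted in this article (8 GPUs per UBB, 1.4 kW liquid-cooled TDP). It ignores CPUs, NICs, and cooling overhead, and it makes no power estimate for the air-cooled rack because the article gives no air-cooled TDP. `boards_needed` is a hypothetical helper.

```python
GPUS_PER_UBB = 8
LIQUID_TDP_KW = 1.4  # per-GPU TDP of the liquid-cooled part, per the article

def boards_needed(gpus_per_rack: int) -> int:
    """Number of 8-GPU baseboards needed for a rack of the given size."""
    if gpus_per_rack % GPUS_PER_UBB != 0:
        raise ValueError("rack size should be a multiple of GPUs per board")
    return gpus_per_rack // GPUS_PER_UBB

racks = (("air-cooled", 64), ("liquid-cooled", 96), ("liquid-cooled", 128))
for label, gpus in racks:
    line = f"{label} rack, {gpus} GPUs: {boards_needed(gpus)} boards"
    if label == "liquid-cooled":
        # GPU-only estimate at TDP; real rack power will be higher.
        line += f", ~{gpus * LIQUID_TDP_KW:.1f} kW GPU power at TDP"
    print(line)
```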

Software and Performance

ROCm 7

The ROCm 7 software stack is maturing alongside the hardware and contributes to the overall performance gains.

Inference Performance

GPU Training Performance

The inference and training benchmarks AMD presented show strong gains across workloads.

Roadmap Outlook

Annual Roadmap

AMD reaffirmed its roadmap:

MI350 shipping now
MI400 arriving in 2026, with AMD claiming up to a 10× AI performance uplift

Instinct MI400

Conclusion

MI350/CDNA 4 continues the chiplet + 3D stacking strategy
Bandwidth, cache, and efficiency are significantly improved
AI data formats expanded (FP6, FP4), nearly doubling math throughput
Flexible system design: NUMA partitioning and large-scale GPU topologies
ROCm software keeps pace with hardware gains
Roadmap remains solid with MI400 on the horizon
