### **Differentiated Access Memories**

Philip Levis and Caroline Trippel

First MemoryDAX/DAM Winter Workshop, 1/10/2025



## In One Slide

- Memory is increasingly the bottleneck in computing systems
  - ML accelerators: SRAM, HBM bandwidth and capacity
  - Cloud and datacenter servers: DDR bandwidth and cost
- Further gains in performance require transformative changes to memory
- Understanding *which* changes requires understanding many layers
  - Devices and circuits: what memories are possible and what are their tradeoffs?
  - Architecture: how do we organize memories and maintain coherence?
  - Software: what do we need memory to do?
- We've just started a 5-year project to explore and help define the future of memory: 6 months in

### Our Thesis

- Over the past 20 years, processors have increasingly relied on specialization to improve performance and efficiency
- Over the next 20 years, memory will too
- Computing system memory will be heterogeneous
- Differentiated access memories
  - Differ in read/write properties (write once, write many)
  - Differ in optimized access pattern
  - Differ in retention/lifetime of data
  - Differ in write endurance
  - Differ in density/capacity

## Many Kinds of Memory...

DRAM, SRAM, MRAM, RRAM, FRAM, PCM, Flash, GC, HGC, FeFET, OS-OS

|            | Energy/power (active): read | Energy/power (active): write | ••• | Access time, latency: read | Access time, latency: write | Endurance | Retention | Density (capacity) | On-logic chip integration: layer | On-logic chip integration: multiple layers for density |
|------------|-----------------------------|------------------------------|-----|----------------------------|-----------------------------|-----------|-----------|--------------------|----------------------------------|---------------------------------------------------------|
| High       | MRAM, PCM | RRAM, MRAM, PCM, Flash | DRAM | Flash | Flash | DRAM, SRAM, OS-OS GC, HGC | Flash, RRAM, MRAM, PCM, FeFET, FeRAM | Flash, FeFET | MRAM, PCM, RRAM, FeRAM | FeFET, OS-OS GC |
| Medium     |  | DRAM, FeRAM |  | PCM, FeFET | RRAM, PCM, FeFET, FeRAM | FeRAM, MRAM | OS-OS GC, HGC | DRAM, FeRAM, OS-OS GC | DRAM |  |
| Medium low | FeFET, OS-OS GC | FeFET | GC | MRAM, OS-OS | DRAM, OS-OS GC, HGC | PCM, RRAM | DRAM | HGC, MRAM, RRAM, PCM |  |  |
| Low        | HGC |  | · · · · · | SRAM, HGC | SRAM | Flash, FeFET |  | SRAM | Flash | Flash, DRAM |
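One way to read the table above is as a lookup: pick the properties that matter and a minimum level for each, then see which memories remain. The sketch below transcribes only the Endurance, Retention, and Density columns; the helper names `at_least` and `candidates` are hypothetical, and the rankings are only as reliable as the table they are copied from.

```python
# Ordinal levels used in the table, ordered lowest to highest.
LEVELS = ["low", "medium low", "medium", "high"]

# Rankings transcribed from three columns of the table above.
# A memory missing from a column is simply not ranked there.
COLUMNS = {
    "endurance": {
        "high": ["DRAM", "SRAM", "OS-OS GC", "HGC"],
        "medium": ["FeRAM", "MRAM"],
        "medium low": ["PCM", "RRAM"],
        "low": ["Flash", "FeFET"],
    },
    "retention": {
        "high": ["Flash", "RRAM", "MRAM", "PCM", "FeFET", "FeRAM"],
        "medium": ["OS-OS GC", "HGC"],
        "medium low": ["DRAM"],
    },
    "density": {
        "high": ["Flash", "FeFET"],
        "medium": ["DRAM", "FeRAM", "OS-OS GC"],
        "medium low": ["HGC", "MRAM", "RRAM", "PCM"],
        "low": ["SRAM"],
    },
}


def at_least(column, floor):
    """Return the set of memories ranked at or above `floor` in `column`."""
    cutoff = LEVELS.index(floor)
    return {
        name
        for level, names in COLUMNS[column].items()
        if LEVELS.index(level) >= cutoff
        for name in names
    }


def candidates(**floors):
    """Return memories meeting every requested floor, sorted by name."""
    sets = [at_least(column, floor) for column, floor in floors.items()]
    return sorted(set.intersection(*sets)) if sets else []


print(candidates(endurance="medium", retention="medium"))
# → ['FeRAM', 'HGC', 'MRAM', 'OS-OS GC']
```

No memory in these three columns is ranked high on both density and endurance, which is one way to see why a single memory type cannot serve every use.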



### Results in the Past 6 Months

- Classifying memory: broad groups, defined by software use cases
  - Long term memory (LtRAM), short term memory (StRAM)
- Tools and hardware to guide which memories to use, when
  - Dynamic software in datacenter servers
  - Designing and provisioning ML accelerators
  - Gain cells and their design
- Integrating heterogeneous memory models and memories
  - Packaging, integration, and thermal concerns
  - Memory consistency and correctness
- Architectural memory primitives and verification

|           | SRAM                                          | DRAM                         | Block Flash                                                                            | LtRAM (long-term RAM)                                 | StRAM (short-term RAM)           |
|-----------|-----------------------------------------------|------------------------------|----------------------------------------------------------------------------------------|-------------------------------------------------------|----------------------------------|
| Structure | 6T                                            | 1T1C                         | 1G                                                                                     | FeRAM, MRAM, RRAM, FRAM                               | Gain Cells (2T, 3T)              |
| Benefits  | Fast<br>Easy to integrate<br>Low static power | Dense                        | HUGE Capacity                                                                          | Dense<br>Low Read Energy                              | Dense<br>Low Energy              |
| Drawbacks | Sparse                                        | No logic<br>High power       | No logic<br>Low endurance<br>Expensive, slow erases<br>Block access<br>Low bandwidth   | Writes are slow and high energy<br>Limited endurance  | Active research<br>Refresh power |
| Uses      | Fast read/write caches                        | Large, random-access RW data | Large, read-mostly data                                                                | Write rarely (static caches)                          | Write-and-read                   |
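The "Uses" row above can be read as a rough decision rule from how software uses data to a memory class. The sketch below is an illustration only: `DataProfile`, `suggest_memory_class`, and every threshold are hypothetical and do not come from the project, and mapping short-lived data to StRAM is an assumption drawn from the "short-term" name and the refresh-power drawback.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    """Hypothetical summary of how software uses one piece of data."""
    size_bytes: int
    reads_per_write: float         # average number of reads per write
    lifetime_s: float              # how long the data must remain valid
    block_access_ok: bool = False  # tolerates block-granularity access


def suggest_memory_class(
    p: DataProfile,
    small_bytes: int = 1 << 20,    # assumed cutoff for "fits in a cache"
    read_mostly: float = 1000.0,   # assumed cutoff for "written rarely"
    short_lived_s: float = 1e-3,   # assumed cutoff for "short-term" data
) -> str:
    """Map a data profile to one column of the table (illustrative thresholds)."""
    if p.reads_per_write >= read_mostly:
        # Written rarely: block flash if block access is acceptable, else LtRAM.
        return "Block Flash" if p.block_access_ok else "LtRAM"
    if p.lifetime_s <= short_lived_s:
        # Written, read soon after, then dead: write-and-read data.
        return "StRAM"
    if p.size_bytes <= small_bytes:
        return "SRAM"  # small, fast read/write caches
    return "DRAM"      # large, random-access read/write data


weights = DataProfile(size_bytes=8 << 30, reads_per_write=1e6, lifetime_s=3600.0)
packet = DataProfile(size_bytes=2048, reads_per_write=1.0, lifetime_s=1e-4)
print(suggest_memory_class(weights), suggest_memory_class(packet))
```

The order of the checks is a design choice: write rarity is tested first because limited endurance and slow writes are the hardest constraints in the table, and size is tested last because it only separates SRAM from DRAM.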

### Two Example Uses of StRAM and LtRAM: Datacenter Servers, ML Accelerators

## In Servers (x86)

- I-Caches (StRAM)
- DDIO
- I/O Caches (StRAM)







## In ML Accelerators

- LtRAM: stores weights (read-optimized, dense)
- Compute, layered on top: dense connections with LtRAM below





### Faculty Team



- Philip Levis
- Caroline Trippel
- Chris Gregg
- Mark Horowitz
- Subhasish Mitra
- Thierry Tambe
- Keith Winstein
- Mary Wootters



| Time  | Session                                                                     |
|-------|-----------------------------------------------------------------------------|
| 9:00  | Welcome and Project Overview                                                |
|       | Massive, Diverse, Tightly Integrated with Compute – from Device to Software |
|       | Memory Access Pattern Classification                                        |
|       | Data Lifetime and Its Refresh Implications                                  |
| 10:45 | Break                                                                       |
| 11:00 | MemGlue for Heterogeneous Architectures                                     |
|       | Conserving Memory Bandwidth with Virtual Gather                             |
| 12:00 | Walk and Lunch                                                              |
| 1:30  | Integration: Performance, Power, and Thermal Constraints                    |
|       | Gain Cell Compiler                                                          |
|       | Synthesizing High Level Models from RTL                                     |
| 3:00  | Break                                                                       |
| 3:15  | Panel: Memory Has to Change                                                 |
| 4:15  | Differentiated Access Memories: What's Next                                 |
| 4:45  | Closing                                                                     |

