



## Future of Memory: Massive, Diverse, Tightly Integrated with Compute – from Device to Software

Shuhan Liu<sup>1\*</sup>, Robert M. Radway<sup>1</sup>, Xinxin Wang<sup>1</sup>, Jimin Kwon<sup>1</sup>, Caroline Trippel<sup>2</sup>, Philip Levis<sup>2</sup>, Subhasish Mitra<sup>1,2</sup>, <u>H.-S. Philip Wong<sup>1\*</sup></u>

<sup>1</sup>Department of EE, <sup>2</sup>Department of CS, Stanford University, CA, USA.

(\*E-mail: shliu98@stanford.edu, hspwong@stanford.edu)

#### **Memory Needs Outpace Memory Advances**



https://nano.stanford.edu/downloads/technology-integration-trend

### **Software Assumes Uniform Memory**



#### A word-addressable random-access uniform memory address space

### **Software Use of Memory: Very Diverse**



**Data Analytics** 

#### Streams of data

- Write-once, read-once
- Filters (scans)
- Joins (random access)



**Append-Mostly Databases**

#### Read >> Write

- Write once
- Mostly append
- Read many times
- Scans
- Random access

**Machine Learning Accelerator**

#### **Blocked operations**

- Blocked operations
- Sparse accesses
- Read multiple times
- Write many times

**High-Speed Networking**

#### **Packet-oriented**

- Ultra-low latency
- Header processing
- Packet-oriented
- Read once
- Write once

Philip Levis, Differentiated Memory (DAM) Project white paper, https://dam.stanford.edu/

#### **Diverse Memories**

*Various* parts of this memory have to perform functions which differ somewhat in their nature and considerably in their purpose ... — J. von Neumann 1946



#### STT-MRAM

Spin transfer torque magnetic random access memory

#### PCM

Phase change memory

#### RRAM

Resistive switching random access memory

#### Gain Cell

**G**ain **c**ell memory (quasi-non-volatile)

#### FeRAM

Ferro-electric 1T1C memory (destructive read)

#### FeFET

Ferro-electric field effect transistor

Updated from: H.-S. P. Wong, S. Salahuddin, Nature Nanotech., 2015.

#### **Diverse Memories**

Focus on: integration of memory with new capabilities as a tool in our toolbox




### **Massive** Memory On-Chip




#### Future of Memory: Massive, Diverse, Tightly Integrated with Compute



### **Diverse Memory:**

- How to choose?
- How to use?
- What attributes are important?



#### **Exposing Hardware to Software**



### **Abstraction Layer Needed**

Search

#### Map data type to memory classes

Figure (labels recovered from the slide):

- Workloads: AI/ML training, AI/ML inference, data analytics, transactional databases, knowledge extraction
- Workload profiling: data lifetime, R/W statistics, granularity, activity profile
- Data types: Data Type A, Data Type B, Data Type C
- Memory classes: **MEM Class A**, **MEM Class B**, **MEM Class C**
- Memory attributes: R/W energy, R/W speed, capacity, latency, bandwidth, endurance, retention, power, speed, integration, process, area/cost, reliability, density
- Memory technologies: SRAM, DRAM, Flash, Gain Cell, RRAM, MRAM, PCM, FeFET, FeRAM

### **Software Data Types**

- Type A "mostly read" – e.g. AI/ML inference weight memory and processor instruction caches
- Type B "streaming data" – e.g. streaming I/O, AI/ML activations, and data analytics
- Type C "frequent write" – e.g. buffers for a file system, AI/ML training memory
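The three data types can be thought of as the output of a workload-profiling step. The sketch below is one possible way to bin a measured access profile into Type A, B, or C. The `AccessProfile` fields, the `classify` function, and both thresholds (`read_heavy_ratio`, `short_lifetime_s`) are assumptions made for illustration; the slides do not define a classification rule.

```python
from dataclasses import dataclass


@dataclass
class AccessProfile:
    """Hypothetical profiling summary for one software data structure."""
    reads: int          # total reads observed
    writes: int         # total writes observed
    lifetime_s: float   # how long a written value must stay valid (seconds)


def classify(profile: AccessProfile,
             read_heavy_ratio: float = 100.0,
             short_lifetime_s: float = 10.0) -> str:
    """Return 'A', 'B', or 'C' using illustrative (assumed) thresholds."""
    reads_per_write = profile.reads / max(profile.writes, 1)
    if reads_per_write >= read_heavy_ratio:
        return "A"  # mostly read: frequent reads, infrequent writes
    if profile.lifetime_s <= short_lifetime_s:
        return "B"  # streaming: few reads per write, short data lifetime
    return "C"      # frequent write with longer-lived data


# Made-up profiles, loosely modeled on the examples named above.
inference_weights = AccessProfile(reads=10**6, writes=10, lifetime_s=3600.0)
activations = AccessProfile(reads=2000, writes=1000, lifetime_s=0.5)
fs_buffer = AccessProfile(reads=2000, writes=1000, lifetime_s=3600.0)

for name, p in [("inference weights", inference_weights),
                ("activations", activations),
                ("file system buffer", fs_buffer)]:
    print(name, "->", classify(p))
```

A real profiler would likely also consider access granularity and predictability, both of which appear as attributes in the abstraction-layer figure.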



### Type A "mostly read" – Frequent Reads, Infrequent Writes, Predictable Accesses

#### Trade-off write costs for better read

| Data<br>type | Example              | Read<br>Energy<br>(pJ/bit) | Read<br>Latency<br>(ns) | Write<br>Energy<br>(pJ/bit) | Write<br>Latency<br>(ns) | Endurance<br>(cycles) | Retention<br>(s) | Capacity | Access<br>granularity | Memory<br>Today | Future<br>Memory |
|--------------|----------------------|----------------------------|-------------------------|-----------------------------|--------------------------|-----------------------|------------------|----------|-----------------------|-----------------|------------------|
| A            | Instruction<br>cache | < 0.5                      | < 1                     | < 500                       | < 1,000                  | > 1×10<sup>8</sup>    | > 1              | 8KB-1MB  | Word<br>(8-16B)       | SRAM            | MRAM,<br>RRAM    |
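The Type A row can be read as a set of simultaneous requirements. As a minimal sketch, the snippet below encodes the numeric bounds from the table and reports which ones a candidate memory fails. The bounds come from the table; the `hypothetical_macro` values are invented for illustration and are not measurements of any real device.

```python
# Numeric bounds copied from the Type A (instruction cache) row.
TYPE_A_SPEC = {
    "read_energy_pj_per_bit": ("<", 0.5),
    "read_latency_ns": ("<", 1),
    "write_energy_pj_per_bit": ("<", 500),
    "write_latency_ns": ("<", 1000),
    "endurance_cycles": (">", 1e8),
    "retention_s": (">", 1),
}


def unmet_requirements(candidate, spec=TYPE_A_SPEC):
    """Return the names of spec entries that the candidate does not satisfy."""
    unmet = []
    for name, (op, bound) in spec.items():
        value = candidate[name]
        ok = value < bound if op == "<" else value > bound
        if not ok:
            unmet.append(name)
    return unmet


# Invented numbers for a non-volatile macro (assumption, not real data).
hypothetical_macro = {
    "read_energy_pj_per_bit": 0.3,
    "read_latency_ns": 0.8,
    "write_energy_pj_per_bit": 50,
    "write_latency_ns": 100,
    "endurance_cycles": 1e6,
    "retention_s": 1e8,
}

print(unmet_requirements(hypothetical_macro))
```

Checking all attributes at once, rather than ranking memories on one attribute, mirrors the slide's point that write costs can be relaxed for Type A as long as the read-side bounds hold.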



#### STT-MRAM

Spin transfer torque magnetic random access memory

#### PCM

Phase change memory

#### RRAM

Resistive switching random access memory

### **COMBINATION of Attributes Matters**

#### Trade-off write costs for better read, but write also matters

| Data<br>type | Example              | Read<br>Energy<br>(pJ/bit) | Read<br>Latency<br>(ns) | Write<br>Energy<br>(pJ/bit) | Write<br>Latency<br>(ns) | Endurance<br>(cycles) | Retention<br>(s) | Capacity | Access<br>granularity | Memory<br>Today | Future<br>Memory |
|--------------|----------------------|----------------------------|-------------------------|-----------------------------|--------------------------|-----------------------|------------------|----------|-----------------------|-----------------|------------------|
| A            | Instruction<br>cache | < 0.5                      | < 1                     | < 500                       | < 1,000                  | > 1×10<sup>8</sup>    | > 1              | 8KB-1MB  | Word<br>(8-16B)       | SRAM            | MRAM,<br>RRAM    |



#### Example RRAM/MRAM: Write energy & endurance should be optimized together
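One way to see why the two attributes belong together is that the same write rate drives both wear-out time and write power. The sketch below computes both from the Type A bounds (endurance of 1×10<sup>8</sup> cycles, write energy of 500 pJ/bit). The write rates and the 64-bit write width are assumptions chosen only to make the arithmetic concrete.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_wearout(endurance_cycles, writes_per_cell_per_s):
    """Time until a cell written at a steady rate reaches its endurance limit."""
    return endurance_cycles / writes_per_cell_per_s / SECONDS_PER_YEAR


def write_power_w(writes_per_s, bits_per_write, write_energy_pj_per_bit):
    """Average power spent on writes, in watts (1 pJ = 1e-12 J)."""
    return writes_per_s * bits_per_write * write_energy_pj_per_bit * 1e-12


# Assumed workload: each cell rewritten once per second on average.
print(round(years_to_wearout(1e8, 1.0), 2), "years to wear-out")

# Assumed workload: one million 64-bit writes per second across the array.
print(write_power_w(1e6, 64, 500), "W spent on writes")
```

Under these assumptions, raising the write rate shortens lifetime and raises write power at the same time, so a specification on one attribute alone says little about whether the memory suits the workload.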

### Type B "streaming data" – Frequent Writes, Few Reads per Write, Short Data Lifetime

#### Trade-off retention for speed/density/energy

| Data<br>type | Example            | Read<br>Energy<br>(pJ/bit) | Read<br>Latency<br>(ns) | Write<br>Energy<br>(pJ/bit) | Write<br>Latency<br>(ns) | Endurance<br>(cycles) | Retention<br>(s) | Capacity | Access<br>granularity | Memory<br>Today | Future<br>Memory    |
|--------------|--------------------|----------------------------|-------------------------|-----------------------------|--------------------------|-----------------------|------------------|----------|-----------------------|-----------------|---------------------|
| B            | Video<br>streaming | < 200                      | < 1,000                 | < 200                       | < 1,000                  | > 1×10<sup>9</sup>    | 0.1 - 10         | 1KB-10MB | Page<br>(KB)          | DRAM            | FeRAM,<br>Gain Cell |
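For Type B, the retention column (0.1 to 10 s) tracks how long streamed data actually lives. A rough sketch of the consequence: if a memory's retention covers the data lifetime, the data never needs a refresh. The `refreshes_needed` helper and the 0.064 s retention value are illustrative assumptions, not figures from the slides.

```python
import math


def refreshes_needed(data_lifetime_s, retention_s):
    """Refresh operations required to keep data valid for its whole lifetime."""
    if retention_s <= 0:
        raise ValueError("retention must be positive")
    return max(0, math.ceil(data_lifetime_s / retention_s) - 1)


# Data lifetimes at both ends of the Type B range.
for lifetime_s in (0.1, 10):
    for retention_s in (0.064, 1, 10):  # assumed candidate retention times
        n = refreshes_needed(lifetime_s, retention_s)
        print(f"lifetime {lifetime_s} s, retention {retention_s} s -> {n} refreshes")
```

In this simplified model, retention beyond the data lifetime buys nothing, which is the headroom the slide proposes trading for speed, density, or energy.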



#### Gain Cell

Gain cell memory (quasi-non-volatile)

#### FeRAM

Ferro-electric 1T1C memory (destructive read)

### **Trade-off Design Knob Matters**

#### **Trade-off retention for speed/density/energy**

| Data<br>type | Example            | Read<br>Energy<br>(pJ/bit) | Read<br>Latency<br>(ns) | Write<br>Energy<br>(pJ/bit) | Write<br>Latency<br>(ns) | Endurance<br>(cycles) | Retention<br>(s) | Capacity | Access<br>granularity | Memory<br>Today | Future<br>Memory    |
|--------------|--------------------|----------------------------|-------------------------|-----------------------------|--------------------------|-----------------------|------------------|----------|-----------------------|-----------------|---------------------|
| B            | Video<br>streaming | < 200                      | < 1,000                 | < 200                       | < 1,000                  | > 1×10<sup>9</sup>    | 0.1 - 10         | 1KB-10MB | Page<br>(KB)          | DRAM            | FeRAM,<br>Gain Cell |



#### **Oxide Semiconductor Gain Cell**



Shuhan Liu, ..., H.-S. Philip Wong, IEDM 2023, T-ED 2024, VLSI 2024

### Hybrid Gain Cell – High-density Scalable to N5



### **Optimize Tradeoff Guided by Software Use**



### Diverse Hardware Specs for Software Data Types A, B, C



### **Typical Memory Comparison**



We may be working too hard for no good reason!

- Attributes in isolation
- Not application-correlated

|           | SRAM | DRAM   | RRAM | MRAM   |
|-----------|------|--------|------|--------|
| Energy    | Low  | Medium | High | High   |
| Speed     | High | Medium | Low  | Low    |
| Density   | Low  | Medium | High | High   |
| Endurance | High | High   | Low  | Medium |

### Memory Comparison w/ Improvement Target

**Improvements** needed for each **memory** technology to be used in the **software** use cases, based on state-of-the-art macro demonstrations.

| Data<br>Type | SRAM    | 3D V-<br>Cache   | DRAM      | OS-OS Gain<br>Cell | Hybrid<br>Gain Cell | RRAM                           | MRAM            | PCM                      | FeRAM          |
|--------------|---------|------------------|-----------|--------------------|---------------------|--------------------------------|-----------------|--------------------------|----------------|
| B            | Density | Standby<br>power | Retention | Capacity           | Capacity            | Endurance<br>& write<br>energy | Write<br>energy | Endurance & write energy | Read<br>energy |

Type B "streaming data" – e.g. streaming I/O, AI/ML activations, and data analytics

### Physical Layers with Interface Protocol (Today)



# The KEY is INTEGRATION





### Devices, Materials, Process Technologies, and Microelectronic Ecosystem Beyond the Exit of the Device Miniaturization Tunnel


H.-S. Philip Wong, *Life Fellow, IEEE*, and Subhasish Mitra, *Fellow, IEEE*

H.-S. P. Wong and S. Mitra, *IEEE Trans. Materials for Electron Devices* (T-MAT), 2024.

### **RRAM & Gain Cell Integration on Si CMOS: On-Chip Physical Integration**



Shuhan Liu, ..., H.-S. Philip Wong, IEDM 2024, paper 15-3

### **RRAM & Gain Cell Integration on Si CMOS: On-Chip Architectural Integration**



Shuhan Liu, ..., H.-S. Philip Wong, IEDM 2024, paper 15-3

### RRAM Non-Volatility Provides 9× System Energy Benefits



### High-Capacity RRAM: 1T8R, 3D RRAM



### **Continuum of Interconnection Density**



Inter-chip integration *continuum* 

#### Interconnect Density – Inter-Chip Physical Integration



H.-S. P. Wong and S. Mitra, IEEE Trans. Materials for Electron Devices (T-MAT), 2024.

### Illusion System – Inter-Chip Architectural Integration



Three Key Ideas: <u>Enough</u> on-chip memory + <u>Quick</u> chip ON/OFF + <u>Special</u> mapping

R.M. Radway, ... Subhasish Mitra, IEDM 2021, paper 25.4 and Nature Electronics 2021.

### Illusion within 1.15 × Dream EDP



Illusion $\approx$ Dream: Illusion EDP within 1.15× Dream EDP

- **Illusion Energy** ≤ 1.1× **Dream Energy**
- Illusion Exec. Time ≤ 1.05× Dream Exec. Time

(measured for AI inference)
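Energy-delay product (EDP) is energy multiplied by execution time, so the energy and execution-time bounds on this slide combine multiplicatively. The short check below normalizes the Dream system to 1.0 for both quantities (a normalization assumed here for convenience) and multiplies the two upper bounds.

```python
def edp(energy, exec_time):
    """Energy-delay product: energy multiplied by execution time."""
    return energy * exec_time


dream_edp = edp(1.0, 1.0)              # Dream system, normalized
illusion_edp_bound = edp(1.1, 1.05)    # upper bounds quoted on the slide

ratio = illusion_edp_bound / dream_edp
print(f"Illusion EDP is at most about {ratio:.3f}x Dream EDP")
```

The product of the two upper bounds is 1.155, in line with the roughly 1.15× Dream EDP figure quoted on the slide.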

R.M. Radway, ... Subhasish Mitra, IEDM 2021, paper 25.4 and Nature Electronics 2021; K. Prabhu\*, R.M. Radway\*, ... Priyanka Raina, JSSC 2022.

# Hardware-proven backed by theory



#### 6 to 8 Chip Illusions, 32 KB to 96 MB Systems

#### Future of Memory: Massive, Diverse, Tightly Integrated with Compute – from Device to Software

- <u>Massive</u> High-Density On-Chip Memory
- **Diverse** Memories Exposed to Software
- **<u>Tight Integration</u>** with Compute Physically and Architecturally



### Acknowledgments



- Semiconductor Research Corporation
- CHIMES: Center for Heterogeneous Integration of Micro Electronic Systems
- Stanford Differentiated Access Memories Project
- Stanford NMTRI: Non-Volatile Memory Technology Research Initiative
