# Side Channel Risks in Hardware Trusted Execution Environments (TEEs)



Wenhao Wang (王文浩) May 2019



# **About Me**

- Research Associate Professor at the Institute of Information Engineering, CAS
- Previously a visiting researcher at Indiana University (with Prof. XiaoFeng Wang)
- Research Interests
  - System Security
  - Computer Architectural Security
  - Isolation with Hardware Features
  - Privacy Preserving Computing Technologies
  - Cryptography (esp. Symmetric Cryptanalysis)

- Email: wangwenhao@iie.ac.cn

## Scenarios when Hardware TEEs are needed

- Users' private data are delegated to untrusted (public) cloud servers
- Multi-source (federated) deep learning training
- Machine Learning as a Service (MLaaS)
- Sharing of genomic data or other big data


#### Crypto Techniques

- FHE, MPC, searchable encryption, ZK, etc.
- Extremely high communication and computation overhead

#### Hardware Techniques

- Intel TXT, ARM TrustZone, Intel SGX, AMD SEV, etc.

## Hardware TEEs – A Review

| Year | Hardware TEE        |
|------|---------------------|
| 1994 | Secure coprocessors |
| 2000 | XOM                 |
| 2003 | Aegis, TPM          |
| 2004 | ARM TrustZone       |
| 2009 | Intel TXT           |
| 2010 | Bastion             |
| 2013 | Intel SGX           |
| 2016 | AMD SME/SEV         |
| 2017 | Intel TME/MKTME     |

Excluded: Intel CAT/CET/SMEP/SMAP/VT-x/PT

# Intel SGX



- Memory Encryption
- Access Control
- Remote Attestation


- Enclave memory is stored within the Enclave Page Cache (EPC).




Access control (security checks are performed when an address translation is loaded into the TLB):

1. Linear address → traditional IA page table checks → physical address.
2. Enclave access?
   - No (non-enclave access): if the address is in the EPC, replace it with the abort page; otherwise, allow the memory access.
   - Yes (enclave access): if the address is in the EPC, perform the EPCM checks and allow the memory access if they pass, otherwise signal a fault; if the address is not in the EPC, allow the memory access.
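The flow above can be sketched as a small decision function. This is a simplified model under our own assumptions: the names `Outcome` and `sgx_access_check` are hypothetical, the page table checks are assumed to have already produced a physical address, and ELRANGE checks and other details of the real hardware are omitted.

```python
from enum import Enum


class Outcome(Enum):
    ALLOW = "allow memory access"
    ABORT_PAGE = "replace address with abort page"
    FAULT = "signal fault"


def sgx_access_check(enclave_access: bool, addr_in_epc: bool,
                     epcm_checks_pass: bool) -> Outcome:
    """Simplified SGX access-control decision, taken after the traditional
    IA page table checks have produced a physical address (hypothetical model)."""
    if not enclave_access:
        # Non-enclave code must never see EPC contents.
        return Outcome.ABORT_PAGE if addr_in_epc else Outcome.ALLOW
    if addr_in_epc:
        # Enclave access to the EPC: the EPCM entry must match
        # (e.g., owning enclave, page type, linear address, permissions).
        return Outcome.ALLOW if epcm_checks_pass else Outcome.FAULT
    # Enclave access outside the EPC (ELRANGE checks omitted in this sketch).
    return Outcome.ALLOW
```

For example, `sgx_access_check(enclave_access=False, addr_in_epc=True, epcm_checks_pass=True)` yields `Outcome.ABORT_PAGE`: the OS or another process can never read enclave memory directly, which is why the attacks below target side effects instead of the data itself.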

## What is a side channel?

- Side channels arise from resources shared across multiple security domains.



# Side channels – An example (Cache Timing Attacks)

- The cache holds copies of aligned B-byte blocks of main memory.
- When a memory access instruction is processed, the cache is searched for the accessed memory location first.
- If a cache miss occurs, the full memory block is copied into one of the W cache lines (ways) of the set it maps to (one of S possible sets).
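To make the parameters B, S and W concrete, here is a minimal sketch of how a byte address selects its cache set. It assumes power-of-two sizes and ignores virtual/physical indexing details; the function name and the defaults (64-byte blocks and 64 sets, as in many x86 L1 data caches) are illustrative assumptions.

```python
def cache_location(addr: int, block_size: int = 64, num_sets: int = 64) -> tuple[int, int]:
    """Map a byte address to (set index, tag) in a set-associative cache.

    block_size is B and num_sets is S; the block may be placed in any of
    the W lines (ways) of the returned set.
    """
    block = addr // block_size              # index of the aligned B-byte block
    return block % num_sets, block // num_sets
```

With these defaults, `cache_location(0x1234)` is `(8, 1)` and `cache_location(0x1234 + 64 * 64)` is `(8, 2)`: addresses `B * S` bytes apart land in the same set with different tags, so they compete for that set's W lines. This contention is what cache timing attacks exploit.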








1. Completely evict the victim's data from the cache (by filling it with attacker data)
2. Trigger a victim data access
3. Access the attacker's memory again and see which cache sets are slow
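The three steps can be played out on a toy cache model. This is a minimal simulation, not an attack on real hardware: the cache geometry, the LRU replacement policy and the helper names (`Cache`, `eviction_set`, `prime_probe`) are assumptions, and a real attacker measures access latency rather than reading a hit/miss flag.

```python
from collections import OrderedDict
from typing import Callable

B, S, W = 64, 16, 4  # hypothetical toy cache: 64-byte blocks, 16 sets, 4 ways


class Cache:
    """Toy set-associative cache with LRU replacement."""

    def __init__(self) -> None:
        self.sets = [OrderedDict() for _ in range(S)]

    def access(self, addr: int) -> bool:
        """Access one byte address; return True on a hit, False on a miss."""
        block = addr // B
        lines = self.sets[block % S]
        if block in lines:
            lines.move_to_end(block)        # refresh LRU position
            return True
        if len(lines) == W:
            lines.popitem(last=False)       # evict the least recently used line
        lines[block] = None
        return False


def eviction_set(set_index: int, base: int = 1 << 20) -> list[int]:
    """W attacker addresses that all map to set_index (base is set-aligned)."""
    return [base + (set_index + i * S) * B for i in range(W)]


def prime_probe(cache: Cache, victim: Callable[[Cache], object]) -> list[int]:
    """Return the indices of the cache sets the victim touched."""
    # 1. Prime: fill every set with attacker lines, evicting victim data.
    for s in range(S):
        for addr in eviction_set(s):
            cache.access(addr)
    # 2. Trigger the victim's (secret-dependent) access.
    victim(cache)
    # 3. Probe: re-access attacker lines; any miss ("slow") marks a victim set.
    slow = []
    for s in range(S):
        hits = [cache.access(addr) for addr in eviction_set(s)]
        if not all(hits):
            slow.append(s)
    return slow
```

For a victim that reads a table entry indexed by a secret, e.g. `prime_probe(Cache(), lambda c: c.access(11 * B))`, the probe reports `[11]`, revealing which set (and hence which part of the table) the secret selected.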

### Side channels – An example (Controlled-Channel Attacks)



# Side channels – Others?

#### **Memory Hierarchy**

Data caching creates fast and slow execution paths, leading to timing differences depending on whether data is in the cache or not.

#### **Functional Unit Contention**

Sharing of hardware leads to contention: whether a program can use some hardware unit leaks information about the other programs using it.

#### Stateful Functional Units

A program's behavior can affect the state of functional units (e.g., branch target predictors), and other programs can observe outputs that depend on this state.

#### Variable Instruction Execution Timing

Different instructions, or the same instruction with different operands, can take different amounts of time to execute.

#### Physical Emanations

Program execution affects physical characteristics of the chip, such as temperature (e.g., under heavy AVX-512 use), which can be observed.

# Can we reduce the interrupts needed for page-based attacks?

1. Passive observation of the Accessed bit of a PTE
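A minimal sketch of one observation round, assuming a toy page table in which every enclave access reaches the page walker (the TLB is ignored here; item 3 deals with it). The names `PageTable`, `touch` and `observe_interval` are hypothetical. Because no page is made non-present, the enclave takes no page fault, and the attacker simply reads back which Accessed bits were set.

```python
from typing import Callable, Iterable


class PageTable:
    """Toy page table that tracks only the Accessed (A) bit of each PTE."""

    def __init__(self, pages: Iterable[int]) -> None:
        self.accessed = {page: False for page in pages}

    def touch(self, page: int) -> None:
        # Models the page walker setting A on an access (no TLB in this sketch).
        self.accessed[page] = True


def observe_interval(pt: PageTable, run_enclave: Callable[[PageTable], object]) -> set[int]:
    """One round: clear all A bits, let the enclave run for a while
    (no page fault, hence no interrupt), then read back the A bits."""
    for page in pt.accessed:
        pt.accessed[page] = False
    run_enclave(pt)
    return {page for page, a in pt.accessed.items() if a}
```

Each round yields an unordered set of pages, e.g. `{2, 5}` for an enclave that touched pages 2 and 5; the attacker learns which pages were used within its polling interval, but not their order inside that interval.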




2. Measuring the time between accesses to pages
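A minimal sketch of turning timestamped Accessed-bit snapshots into timing information. It assumes the attacker polls the A bits (as in item 1) and timestamps each snapshot from an attacker-controlled core; the function names and the snapshot format are hypothetical, and the resolution is bounded by the polling period.

```python
def first_seen(samples: list[tuple[int, set[int]]]) -> dict[int, int]:
    """Given polling snapshots (timestamp, pages whose A bit is set), return
    the first timestamp at which each page was observed as accessed."""
    seen: dict[int, int] = {}
    for t, pages in samples:
        for page in pages:
            seen.setdefault(page, t)
    return seen


def elapsed(samples: list[tuple[int, set[int]]], start_page: int, end_page: int) -> int:
    """Time between the first observed access to start_page and to end_page."""
    seen = first_seen(samples)
    return seen[end_page] - seen[start_page]
```

With snapshots `[(0, set()), (100, {3}), (200, {3}), (900, {3, 7})]`, the elapsed time from page 3 to page 7 is 800 time units; a longer or shorter gap can hint at how much work a secret-dependent path executed between the two pages.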




3. Clearing TLB entries from the sibling hyper-thread to force a page table walk
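Why the TLB matters: the Accessed bit is set by the page walker, so after the attacker clears it, accesses that hit a cached translation do not set it again and stay hidden. The toy model below (class and method names are hypothetical; `flush_tlb` stands in for the sibling hyper-thread evicting the shared TLB entries) shows how forcing a page table walk makes those accesses visible again without interrupting the enclave.

```python
class TLBModel:
    """Toy model: the A bit in a PTE is set only on a page table walk (TLB miss)."""

    def __init__(self) -> None:
        self.tlb: set[int] = set()
        self.a_bit: dict[int, bool] = {}

    def access(self, page: int) -> None:
        if page not in self.tlb:          # TLB miss -> page table walk
            self.a_bit[page] = True       # walker sets the Accessed bit
            self.tlb.add(page)            # translation cached; later hits skip the walk

    def clear_a_bits(self) -> None:
        # The attacker clears A bits in memory; cached TLB entries are untouched.
        self.a_bit = {page: False for page in self.a_bit}

    def flush_tlb(self) -> None:
        # Stand-in for the sibling hyper-thread evicting the shared TLB entries.
        self.tlb.clear()
```

After `access(4)`, `clear_a_bits()` and another `access(4)`, the bit for page 4 stays clear because the second access hits the TLB; only after `flush_tlb()` does the next access set it again.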



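# Hyper-Threading (SMT)

Hyper-Threading enables new side-channel attack surfaces: the two hyper-threads of a physical core share its execution resources, such as the L1 caches, TLBs and execution units.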



AS: architectural state (eax, ebx, control registers, etc.)

# **Problems with Hyper-Threading**



## Naïve Solutions do not work

- Simply disabling Hyper-Threading
  - No effective way for the enclave to verify that it is disabled: `cpuid`, `rdtscp` and `rdpid` are not supported in enclave mode
  - Remote attestation does not contain information about Hyper-Threading (before our work)
- Creating a shadow thread from the enclave program to occupy the other hyper-thread
  - How can the enclave reliably verify that both threads are co-located on the same physical core?

Co-location test with Contrived Data Races



- If the two threads are co-located, both observe data races with high probability; otherwise, at least one of them observes data races with low probability.



- When co-located, communication time < execution time: a value written by one thread becomes visible to the other before that thread's next read.
- Each thread reads the value written by the other thread with very **high** probability.

(Figure: the protected thread and the shadow thread each alternate read and write on the shared variable; the communication time between the two threads is marked.)

- When **not** co-located, communication time > execution time
- Each thread reads the value written by the other thread with very **low** probability.
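The timing argument can be illustrated with a deterministic toy simulation. It is an idealization under our own assumptions (a fixed write schedule with T1 offset by half a period, a single fixed visibility delay, no noise), and the function name is hypothetical; the real test runs the padded load/store loops of threads T0 and T1 and counts races statistically.

```python
def simulate(comm_time: int, exec_time: int, rounds: int) -> tuple[int, int]:
    """Count the data races observed by (T0, T1) in an idealized schedule.

    Assumptions (hypothetical, for illustration only): T0 writes its own value
    at times k * exec_time and T1 at k * exec_time + exec_time // 2; each thread
    reads the shared variable just before its own write; a write is visible to
    its own thread at once and to the other thread comm_time later.
    exec_time must be >= 2 so that the two threads never write at the same time.
    """
    offsets = (0, exec_time // 2)
    writes = [(k * exec_time + offsets[tid], tid) for tid in (0, 1) for k in range(rounds)]
    races = [0, 0]
    for tid in (0, 1):
        for k in range(1, rounds):
            t = k * exec_time + offsets[tid]          # time of this read
            visible = [(wt, w) for wt, w in writes
                       if wt < t and (w == tid or wt + comm_time <= t)]
            _, writer = max(visible)                   # most recent visible write wins
            races[tid] += writer != tid                # read the other thread's value?
    return races[0], races[1]
```

With `exec_time=10` and `rounds=100`, a short communication time (`comm_time=2`) gives `(99, 99)`: both threads see each other's values in every round. A communication time longer than the execution time (`comm_time=15`) gives `(0, 0)`: each thread only ever reads its own value.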

Thread T<sub>0</sub>:

```asm
<initialization>:
    mov $colocation_count, %rdx
    xor %rcx, %rcx              ; co-location test counter
<synchronization>:
    ···                         ; acquire lock 0
.sync0:
    mov %rdx, (sync_addr1)
    cmp %rdx, (sync_addr0)
    je .sync1
    jmp .sync0
.sync1:
    mfence
    mov $0, (sync_addr0)
<initialize a round>:
    mov $begin0, %rsi
    mov $1, %rbx
    mfence
    mov $addr_v, %r8
<co-location test>:
.L0:
<load>:
    mov (%r8), %rax
<store>:
    mov %rsi, (%r8)
<update counter>:
    mov $0, %r10
    mov $0, %r11
    cmp $end0, %rax             ; a data race happens?
    cmovl %rbx, %r10
    sub %rax, %r9
    cmp $1, %r9                 ; continuous number?
    cmova %r11, %r10
    add %r10, %rcx
    shl $b_count, %rbx          ; bit length of $count
    mov %rax, %r9               ; record the last number
<padding instructions 0>:
    nop
    nop
    ···
    nop
    mov (%r8), %rax
    mov (%r8), %rax
    ···
    mov (%r8), %rax
    dec %rsi
    cmp $end0, %rsi
    jne .L0
    ; finish 1 co-location test
<all rounds finished?>:
    ···                         ; release lock 1
    dec %rdx
    cmp $0, %rdx
    jne .sync0
```

Thread T<sub>1</sub>:

```asm
<initialization>:
    mov $colocation_count, %rdx
    xor %rcx, %rcx              ; co-location test counter
<synchronization>:
    ···                         ; release lock 0
.sync2:
    mov %rdx, (sync_addr0)
    cmp %rdx, (sync_addr1)
    je .sync3
    jmp .sync2
.sync3:
    mfence
    mov $0, (sync_addr1)
<initialize a round>:
    mov $begin1, %rsi
    mov $1, %rbx
    mfence
    mov $addr_v, %r8
<co-location test>:
.L2:
<load>:
    mov (%r8), %rax
<update counter>:
    mov $0, %r10
    mov $0, %r11
    cmp $end0, %rax             ; a data race happens?
    cmovg %rbx, %r10
    sub %rax, %r9
    cmp $1, %r9                 ; continuous number?
    cmova %r11, %r10
    add %r10, %rcx
    shl $b_count, %rbx          ; bit length of $count
    mov %rax, %r9               ; record the last number
<store>:
    mov %rsi, (%r8)
<padding instructions 1>:
    mov (%r8), %rax
    lfence
    mov (%r8), %rax
    lfence
    mov (%r8), %rax
    lfence
    mov (%r8), %rax
    lfence
    mov (%r8), %rax
    lfence
    dec %rsi
    cmp $end1, %rsi
    jne .L2
    ; finish 1 co-location test
<all rounds finished?>:
    ···                         ; acquire lock 1
    dec %rdx
    cmp $0, %rdx
    jne .sync2
```

Hypothesis-test-based security model
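One way to formalize such a test is a one-sided binomial test; this formulation and its parameters are our assumption, since the exact hypotheses and thresholds used by HyperRace are not given here. Under the null hypothesis "not co-located", each co-location test observes a data race with some small probability `p_not_colocated`; seeing many races out of `tests` trials is then very unlikely, so co-location is accepted when that tail probability falls below a chosen `alpha`. The function names are hypothetical.

```python
from math import comb


def binom_tail(n: int, k: int, p: float) -> float:
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def colocated(races: int, tests: int, p_not_colocated: float, alpha: float) -> bool:
    """Reject H0 ("not co-located") when this many races is unlikely under H0."""
    return binom_tail(tests, races, p_not_colocated) < alpha
```

For instance, with 100 tests, an assumed `p_not_colocated` of 0.1 and `alpha = 1e-6`, observing 90 races passes the test, while observing 12 races does not. The choice of `alpha` trades the chance of wrongly accepting a non-co-located shadow thread against false alarms.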

#### **Different padding instruction patterns**

Use of CMOV instructions
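The listing updates its counter with `cmovl`/`cmovg` and `cmova` rather than conditional branches. Our reading is that this keeps the counter update free of data-dependent branches. The sketch below is a hypothetical Python transcription of T0's `<update counter>` block that mirrors the data flow with mask-based selection. Python itself gives no constant-time guarantee, the `shl` of `%rbx` is left out, and values are assumed to be below 2**63 so signed and unsigned comparisons agree.

```python
MASK64 = (1 << 64) - 1


def select(cond: bool, if_true: int, if_false: int) -> int:
    """Branch-free selection in the style of CMOVcc: pick a value with a mask."""
    mask = -int(cond) & MASK64                 # all ones if cond, else zero
    return (if_true & mask) | (if_false & ~mask & MASK64)


def update_counter(rcx: int, rax: int, r9: int, rbx: int, end0: int) -> tuple[int, int]:
    """Hypothetical transcription of T0's <update counter> block.

    rax: value just loaded; r9: previously loaded value; rbx: increment.
    Returns the new counter (rcx) and the new "last number" (r9).
    """
    r10 = select(rax < end0, rbx, 0)           # cmp $end0,%rax; cmovl %rbx,%r10
    diff = (r9 - rax) & MASK64                 # sub %rax,%r9 (64-bit wraparound)
    r10 = select(diff > 1, 0, r10)             # cmp $1,%r9; cmova %r11,%r10 (r11 = 0)
    return (rcx + r10) & MASK64, rax           # add %r10,%rcx; mov %rax,%r9
```

A race is counted only when the loaded value comes from the other thread's range and continues the previously seen sequence; for example, loading 5 right after 6 (with `end0=10`) increments the counter, while loading 5 after 8 does not.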

HyperRace: an LLVM-based tool to eradicate all side-channel threats due to Hyper-Threading.

(Figure: protected thread code.)



# Conclusion

- The SGX design opens up many side channels.
- These side channels can be combined
  - to make the attack stealthy and hard to detect
  - to achieve fine-grained observation
- The attacker can even reduce noise by controlling the SW/HW environment.
- Side-channel threats against SGX cannot be ignored.

- How should future TEEs be designed?
  - HW/SW co-design?
  - Real-world implications

### **Questions?**



