

# Composable Edge Cloud Systems With NVMe over 5G URLLC

Frederik Pfautsch, Endric Schubert

#### 6G Industry Projects



Focus use case scenarios and application areas are

- Campus networks (automation, campus logistics, ...),
- Medical scenarios (hospitals, emergency, operation theatre, ...)
- Mobility (automotive, commercial vehicles, drones, ...)
- Global coverage (rural areas, in-X networking, ...)

Almost all **components** and many new **system engineering concepts for 6G systems** will be addressed with a focus on

- Joint Communications and Sensing,
- Realtime and sync'ed networking,
- D2D, infrastructure-less, nomadic and organic networking,
- Device management, authentication, security concepts,
- Disaggregation, Open RAN evolution, OpenXG, open interfaces for third parties, ...,
- Massive usage of AI everywhere,
- RF components (antennas, modulators, microelectronics),
- mmW and THz technology integration,
- Energy-efficient Terabit and other specialized transceiver technologies.

20.12.2022

Hans D. Schotten



#### Background

#### Frederik Pfautsch

frederik.pfautsch@missinglinkelectronics.com

- MSc Computer Engineering
   @TU Berlin
- Master's thesis in cooperation with Fraunhofer HHI, MLE and Uni Ulm
- "5G Berlin" campus network (Release 15)
- MLE
   R&D Lead Engineer for 5G/6G Radio
   Sidelink Comm & Precision Time Synch

#### **Endric Schubert**

endric.schubert@missinglinkelectronics.com

- Dipl.-Ing. ET Univ. Karlsruhe
- PhD CS Univ. Tuebingen
- Honorary Professor Univ. Ulm
- Ambassador Startup Sued
- Background in Semiconductors, EDA,
   Domain Specific Architectures w/ FPGA
- 60+ technical publications
- 20+ patents



#### MLE - Experts for Domain-Specific Compute Architectures

#### Our Mission:

- Deliver HW and SW for
   High-Performance (Embedded) Compute Systems
   & Solutions
- Offering pre-validated subsystems with FPGA IP blocks and open-source software
- Support customer projects with deep expertise and hands-on design services







Headquartered in Silicon Valley with Design Offices in Germany

- Founded 2010, employee owned
- 18+ Certified FPGA Designers
- Customers include technology leaders, US and European government agencies, Fortune 500 companies
- Partners to:







Premier Partner



















#### **6G TakeOff** (lead: Deutsche Telekom)

- 3D networking: satellites, HAPs, LAPs, drones
- Deep integration of 6G non-terrestrial networks (NTN)

#### 6G ICAS4Mobility (lead: Bosch)

- Integrated Communication & Sensing for Mobility using sidelinks,
- Mobility scenarios with cars, AGVs, drones, ..., security and privacy.



⇒ 5G/6G edge cloud devices with limited storage capacity

NVMe over 5G? It should be possible: 5G promises both low latency and high bandwidth!

- Side effect: Measure capabilities of 5G Release 15 thoroughly

#### Interlude - PCIe

- De-facto standard for general purpose peripheral connectivity within x86 PCs and servers
- Easy extension of CotS-computers with almost any type of peripheral
- Every new PCIe gen approx. doubles the available bandwidth
  - PCIe Gen 4: 31.5 GByte/s (x16)
  - PCIe Gen 5: 63.015 GByte/s (x16)
- Packet-based, layered protocol (TLPs)
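
The x16 figures above can be reproduced from the per-lane transfer rate and the line encoding. The following is a minimal sketch of that arithmetic, assuming 16 GT/s per lane for Gen 4, 32 GT/s for Gen 5, and 128b/130b encoding (values taken from the PCIe specification, not from the slide). The function name is illustrative, and the result is the raw per-direction link rate before any TLP or flow-control overhead.

```python
def pcie_bandwidth_gbyte_s(gt_per_s: float, lanes: int = 16) -> float:
    """Raw one-direction link bandwidth in GByte/s for 128b/130b-encoded links."""
    encoding_efficiency = 128 / 130            # 128 payload bits per 130 line bits
    bits_per_s = gt_per_s * 1e9 * lanes * encoding_efficiency
    return bits_per_s / 8 / 1e9                # bits -> bytes, then scale to GByte


# Assumed per-lane transfer rates in GT/s (not stated on the slide).
for gen, rate in {4: 16.0, 5: 32.0}.items():
    print(f"PCIe Gen {gen} x16: {pcie_bandwidth_gbyte_s(rate):.3f} GByte/s")
```

Doubling the per-lane rate while keeping the encoding unchanged doubles the result, which is the "approx. doubles" statement in the list above.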





| Range   | Encoding | Time           |
|---------|----------|----------------|
| Default | 0b0000   | 50 - 50 000 µs |
| A       | 0b0001   | 50 - 100 µs    |
|         | 0b0010   | 1 - 10 ms      |
| B       | 0b0101   | 16 - 55 ms     |
|         | 0b0110   | 65 - 210 ms    |
| C       | 0b1001   | 260 - 900 ms   |
|         | 0b1010   | 1 - 3.5 s      |
| D       | 0b1101   | 4 - 13 s       |
|         | 0b1110   | 17 - 64 s      |
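
To make the table easier to apply, the sketch below encodes it as a lookup and classifies each setting against a given round-trip latency. It assumes that a device may time out anywhere between the lower and upper bound of its configured range, so a setting is only called "safe" when its lower bound exceeds the latency. The names `COMPLETION_TIMEOUT_US` and `classify_timeouts` are illustrative, and the 16.2 ms input is the mean PCIe latency reported in this deck.

```python
# Completion timeout encodings from the table above, as (min, max) in microseconds.
COMPLETION_TIMEOUT_US = {
    0b0000: (50, 50_000),              # default
    0b0001: (50, 100),                 # range A
    0b0010: (1_000, 10_000),
    0b0101: (16_000, 55_000),          # range B
    0b0110: (65_000, 210_000),
    0b1001: (260_000, 900_000),        # range C
    0b1010: (1_000_000, 3_500_000),
    0b1101: (4_000_000, 13_000_000),   # range D
    0b1110: (17_000_000, 64_000_000),
}


def classify_timeouts(latency_us: float) -> dict:
    """Map each encoding to 'too short', 'marginal', or 'safe' for a given latency."""
    verdicts = {}
    for encoding, (lower, upper) in COMPLETION_TIMEOUT_US.items():
        if latency_us >= upper:
            verdicts[encoding] = "too short"   # times out even at the upper bound
        elif latency_us >= lower:
            verdicts[encoding] = "marginal"    # depends on where the device lands in the range
        else:
            verdicts[encoding] = "safe"        # even the shortest timeout exceeds the latency
    return verdicts


for encoding, verdict in classify_timeouts(16_200).items():
    print(f"0b{encoding:04b}: {verdict}")
```

Under this reading, a mean latency alone is not sufficient: samples in the tail of the distribution also have to stay below the lower bound of the chosen range.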



#### Interlude - NVMe

NVMe is an example of a modern, fast, PCIe based communication protocol.

- Avoid software reads to device registers
- Hardware device implementation can issue multiple reads in parallel, masking the round trip time
- Software can also transfer only 64 bits per access
- Pipeline processing for example by allowing for lazy pointer updates of queues
- Scale with the number of CPU cores by having independent queues/ringbuffers and MSI-X interrupts
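
The queue-related points above can be illustrated with a toy ring buffer. This is a simplified model under stated assumptions, not the NVMe data structures themselves: the class `SubmissionQueue` and its fields are hypothetical, and the "doorbell" is just a counter standing in for a device register write. It shows how a host can place several commands in memory and publish them with a single register write.

```python
class SubmissionQueue:
    """Toy model of a ring-buffer submission queue with a lazily updated tail doorbell."""

    def __init__(self, depth: int):
        self.slots = [None] * depth
        self.depth = depth
        self.head = 0             # advanced by the device as it consumes entries
        self.tail = 0             # host-side tail, kept in host memory
        self.doorbell = 0         # tail value last published to the device
        self.doorbell_writes = 0  # number of (expensive) register writes

    def is_full(self) -> bool:
        # One slot stays empty so that head == tail always means "empty".
        return (self.tail + 1) % self.depth == self.head

    def submit(self, command) -> None:
        """Host side: place a command in memory without touching the device."""
        if self.is_full():
            raise BufferError("submission queue full")
        self.slots[self.tail] = command
        self.tail = (self.tail + 1) % self.depth

    def ring_doorbell(self) -> None:
        """Host side: publish all pending entries with one register write."""
        self.doorbell = self.tail
        self.doorbell_writes += 1

    def device_fetch(self) -> list:
        """Device side: consume every entry published via the doorbell."""
        fetched = []
        while self.head != self.doorbell:
            fetched.append(self.slots[self.head])
            self.head = (self.head + 1) % self.depth
        return fetched


sq = SubmissionQueue(depth=8)
for lba in range(4):
    sq.submit(("read", lba))
sq.ring_doorbell()                      # four commands, one register write
print(sq.device_fetch(), sq.doorbell_writes)
```

One such queue per CPU core avoids locking between cores, which is the scaling argument in the last bullet.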















See: Schubert, Braun and Langenbach: "PCI Express over IP - Accelerated" Embedded World Conference, 2016



# PCIe over TCP/IP Tunneling





# Proposal: PCIe over TCP/IP



- Fully transparent to network equipment
  - Just a bunch of TCP sessions
  - No special traffic handling required
- Fully transparent to PCIe
  - Reliable transport via TCP
  - Congestion control via TCP
- Based on separated and distributed upstream and downstream switch ports
  - Easily scalable via TCP session count
  - Support for multiple ethernet ports
  - Decouples cable routing from transaction layer routing
- Independent of lower network layers, e.g. physical layer
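
Because TCP delivers a byte stream rather than packets, a tunnel of this kind has to restore TLP boundaries on the receiving side. The sketch below shows one common way to do that, a length prefix in front of each TLP. The 2-byte header, the function `encapsulate`, and the class `Decapsulator` are assumptions made for illustration; the slides do not describe the actual framing used, and the payloads here are placeholder bytes rather than valid TLPs.

```python
import struct

HEADER = struct.Struct("!H")  # hypothetical 2-byte big-endian length prefix


def encapsulate(tlp: bytes) -> bytes:
    """Prefix a TLP with its length so the receiver can find its end in the stream."""
    return HEADER.pack(len(tlp)) + tlp


class Decapsulator:
    """Reassembles TLPs from arbitrarily chunked TCP payload bytes."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, data: bytes) -> list:
        """Add received bytes and return every TLP that is now complete."""
        self._buffer += data
        tlps = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break                      # wait for the rest of this TLP
            tlps.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]
        return tlps


# Two placeholder "TLPs" sent back to back, received in 5-byte chunks.
stream = encapsulate(b"\x00" * 16) + encapsulate(b"\x01" * 12)
receiver = Decapsulator()
received = []
for offset in range(0, len(stream), 5):
    received += receiver.feed(stream[offset:offset + 5])
print([len(tlp) for tlp in received])
```

Since reliability and congestion control are left to TCP, this framing layer does not need sequence numbers or retransmission logic of its own.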

#### 5G





#### 5G - URLLC





## Setup







## Latency Chain

#### ø 16.2 ms!



#### **PCIe over 5G is Feasible!**

Default PCIe Completion Timeout: 50 µs to 50 ms



## Latency Map





# Latency – PCIe





ø 16.2 ms

- "PCIelat"
  - Kernel module
  - Ruby script
- Default PCIe Completion Timeout: 50 µs to 50 ms



# Latency – PL2PL





ø 20.2 ms

- VHDL counter
- Replaces PCIe traffic
- Customizable packet size

### Summary

- It works :)
- High latency, high variance (tail latency)
   ⇒ Attached PC does not boot, re-enumeration necessary
- Latency measured by reference setup is comparable to other published setups
- 5G Release 15 introduces the majority (>99%) of latency in our setup!
- Latency is mostly independent of packet size (difference vanishes due to the high latency in general)

#### Outlook URLLC

- 5G Release 15 only implements the basic requirements for URLLC, such as "mini-slots"
- 5G Release 16 and 17 will begin to support URLLC
  - Does URLLC offer enough bandwidth?
- Hardware improvements during 2022 supporting Release 16
  - MediaTek M80 chip platform released in Q1 2022
  - Qualcomm X65 or X62



# Backup Slides



## Latency – PS2PS





ø 17.7 ms

- User space C-program
- Server/Client with gettimeofday()
- Customizable packet size
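
A measurement of this kind can be sketched as follows. This is not the C program used for the slides: it is a Python illustration that assumes a UDP echo server and a round-trip measurement, uses `time.perf_counter_ns()` in place of `gettimeofday()`, and runs both ends on the loopback interface. The function names and the choice of UDP are assumptions.

```python
import socket
import threading
import time


def echo(sock: socket.socket, count: int) -> None:
    """Server side: send each received datagram straight back to its sender."""
    for _ in range(count):
        data, addr = sock.recvfrom(65535)
        sock.sendto(data, addr)


def measure_rtt_us(packet_size: int, count: int = 100) -> list:
    """Client side: return `count` round-trip times in microseconds."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))              # let the OS pick a free port
    thread = threading.Thread(target=echo, args=(server, count), daemon=True)
    thread.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(1.0)                     # fail instead of hanging on a lost packet
    payload = bytes(packet_size)               # customizable packet size
    samples = []
    for _ in range(count):
        start = time.perf_counter_ns()
        client.sendto(payload, server.getsockname())
        client.recvfrom(65535)
        samples.append((time.perf_counter_ns() - start) / 1000)

    thread.join()
    client.close()
    server.close()
    return samples


samples = measure_rtt_us(packet_size=64, count=50)
print(f"mean {sum(samples) / len(samples):.1f} us, max {max(samples):.1f} us")
```

Reporting the maximum alongside the mean matters here, because the tail of the distribution, not the average, decides whether a completion timeout is hit.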



| Component                | Upstream (µs) | Downstream (µs) |
|--------------------------|---------------|-----------------|
| NPAP                     | 0.256 - 3.616 | 0.256 - 3.616   |
| "fake ethernet"          | 0.816 - 2.320 | 0.816 - 2.320   |
| PL → PS → PL*            | 22.5 - 28.1   | 22.5 - 28.1     |
| PS → PL → PS*            | 27.1 - 33.0   | 27.1 - 33.0     |
| Linux iptables           | 10.2          | 11.0            |
| Linux USB Network Stack* | 36.2          | 23.9            |
| ICMP ping to 5G core*    | 117           | 700             |
| ICMP ping to VM*         | 11 600        |                 |
| hping to VM*             | 14 100        |                 |

#### (a) Individual component latencies

|           | 5G (µs)         | GbE (µs)      |
|-----------|-----------------|---------------|
| PCIe*     | 16 161          | 186.3         |
| PL to PL* | 20 254 - 32 987 | 165.7 - 196.4 |
| PS to PS* | 14 998 - 22 597 | 156.0 - 212.6 |

(b) E2E latencies



- Xilinx Integrated Logic Analyzer
- Count cycles



## Latency – PS/PL







US: ø 25.6 µs

DS: ø 32.2 µs

PL -> PS -> PL: VHDL counter

PS -> PL -> PS: User space C-program

## Latency – iptables





US: ø 10.2 µs

DS: ø 11.0 µs

- libpcap timestamps
- tcpdump of both interfaces
- SNAT/DNAT latency



## Latency – Linux USB stack

| Component                | Upstream (µs) | Downstream (µs) |
|--------------------------|---------------|-----------------|
| NPAP                     | 0.256 - 3.616 | 0.256 - 3.616   |
| "fake ethernet"          | 0.816 - 2.320 | 0.816 - 2.320   |
| PL → PS → PL*            | 22.5 - 28.1   | 22.5 - 28.1     |
| PS → PL → PS*            | 27.1 - 33.0   | 27.1 - 33.0     |
| Linux iptables           | 10.2          | 11.0            |
| Linux USB Network Stack* | 36.2          | 23.9            |
| ICMP ping to 5G core*    | 117           | 700             |
| ICMP ping to VM*         | 11 600        |                 |
| hping to VM*             | 14 100        |                 |

#### (a) Individual component latencies

|           | 5G (µs)         | GbE (µs)      |
|-----------|-----------------|---------------|
| PCIe*     | 16 161          | 186.3         |
| PL to PL* | 20 254 - 32 987 | 165.7 - 196.4 |
| PS to PS* | 14 998 - 22 597 | 156.0 - 212.6 |

(b) E2E latencies



- tcpdump with usbmon
- Match USB packets to network packets

