

# Low-Latency Solutions for Storage-Hungry Embedded Applications

### "Flash-on-Ethernet?"

Dr. Endric Schubert, MLE

Dr. David Boggs, MLE

Dr. Ulrich Langenbach, Fraunhofer HHI

Dr. Peter Gregorius, Fraunhofer HHI



MLEcorp.com/FoE

# Systems-of-Systems





## SSD Architecture Overview

Figure: SSD architecture block diagram showing Host, Interface Controller, Flash Controller, Flash Interface Controllers with Flash Arrays, Encryption, DRAM Cache, DRAM Controller, RAID Controller, Power Mgmt (Voltage Regulator, Big Capacitor), and PCB/Chassis. A legend distinguishes the blocks marked "MLE" from those marked "Others".

(Courtesy: SNIA.org)



## Flash-on-Ethernet





# Benefits of NVMexpress

- Built for PCIe and Flash
- Multi-Queue Facilitates Acceleration
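
The multi-queue idea can be illustrated with a toy model. This is a minimal sketch, not the NVMe register interface: the class and method names (`QueuePair`, `ToyController`, `submit`, `process`, `reap`) are hypothetical. It only shows that each core owns its own submission/completion queue pair, so cores never contend for a shared queue.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueuePair:
    """One submission queue (sq) and its paired completion queue (cq)."""
    sq: deque = field(default_factory=deque)
    cq: deque = field(default_factory=deque)


class ToyController:
    """Hypothetical stand-in for a controller with one queue pair per core."""

    def __init__(self, num_cores: int) -> None:
        self.pairs = [QueuePair() for _ in range(num_cores)]

    def submit(self, core: int, command: str) -> None:
        # A core only ever touches its own submission queue.
        self.pairs[core].sq.append(command)

    def process(self) -> None:
        # Drain each submission queue and post results to the paired
        # completion queue; queue pairs are independent of each other.
        for pair in self.pairs:
            while pair.sq:
                pair.cq.append(("done", pair.sq.popleft()))

    def reap(self, core: int) -> list:
        # Collect and clear the completions that belong to this core.
        done = list(self.pairs[core].cq)
        self.pairs[core].cq.clear()
        return done
```

Because the queue pairs are independent, a hardware accelerator can be handed its own pair in the same way a CPU core is.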





### Benefits of Ethernet








#### Server Class Adapter & LOM Ethernet Ports



Source data: Crehan Research, 2012

IEEE 802.3 Higher Speed Ethernet Consensus Ad Hoc

September 2012




# 400 Gigabit/s Ethernet









# Challenges with iSCSI

#### Computational complexity, Unpredictable latency







# Network Protocol Acceleration Technology from FhG HHI










# Network Protocol Acceleration: More Than TOE



#### Fraunhofer HHI

- Entire TCP / UDP protocol processing inside FPGA
- Option to run Application Layer processing in HW, too!

#### State-of-the-Art

- Software-only
- TCP Offload Engine (TOE)
- requires CPU
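
To make "protocol processing" concrete, here is one example of per-packet work that a software stack runs on the CPU and a hardware stack moves into logic: the Internet checksum (RFC 1071) used by IP, TCP, and UDP. This is a plain software sketch for illustration only; it says nothing about how the Fraunhofer HHI stack implements it, and the function name is our own.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement Internet checksum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # sum big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF
```

A receiver verifies a header by running the same function over it with the checksum field included; a valid header yields 0.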





# Network Protocol Acceleration: Best in Class Performance

- Stand-alone TCP/IP & UDP/IP stack
- Point-to-point 1GbE or 10GbE
- Full line rate of TPR<sub>max</sub> = 9.5896 Gbps
- TCP R/W latency of T<sub>TR(W)</sub> ≥ 1.4 µs
- UDP R/W latency of T<sub>UR(W)</sub> ≥ 0.75 µs
- Round trip time of RTT<sub>min</sub> ≥ 2.25 µs

(2013 benchmarking data from Fraunhofer HHI)
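
To put microsecond-scale latencies into perspective, the short calculation below estimates how long a single Ethernet frame occupies the wire. It is a back-of-the-envelope sketch based on the standard IEEE 802.3 framing overheads (preamble, MAC header, FCS, inter-frame gap); it is not part of the benchmark methodology, and the function name is our own.

```python
# Fixed per-frame Ethernet overheads in bytes (IEEE 802.3).
PREAMBLE_AND_SFD = 8
HEADER_AND_FCS = 18   # 14-byte MAC header + 4-byte frame check sequence
INTERFRAME_GAP = 12
MIN_FRAME = 64        # shorter frames are padded to this size


def wire_time_ns(payload_bytes: int, link_gbps: float) -> float:
    """Time in nanoseconds that one frame slot occupies on the link."""
    frame = max(payload_bytes + HEADER_AND_FCS, MIN_FRAME)
    bits = (frame + PREAMBLE_AND_SFD + INTERFRAME_GAP) * 8
    return bits / link_gbps  # 1 Gbit/s equals 1 bit per nanosecond


if __name__ == "__main__":
    for payload in (46, 1500):
        print(f"{payload:5d} B payload at 10GbE: "
              f"{wire_time_ns(payload, 10.0):.1f} ns")
```

At 10GbE a minimum-size frame takes 67.2 ns and a frame with a 1500-byte payload takes about 1.23 µs, which is the same order of magnitude as the stack latencies listed above.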





# Hardware Acceleration Enabled by Modern All-Programmable SoC

Programmable I/O

Programmable Software

Programmable Logic



"Put the processing burden where it belongs!"



# FPGA Implementation via High-Level Synthesis





- Xilinx UG902 Vivado Design Suite User Guide, High-Level Synthesis
- Xilinx XAPP1209 Designing Protocol Processing Systems with Vivado HLS



## Flash-on-Ethernet Architecture

Configurable, elastic system

Balance data rates for Latency and Bandwidth

- SSD
  - 200k IOPS
  - 800 MB/s
  - PCIe Gen2 x2
  - 10GbE
- SSD
  - 800k IOPS
  - 3 GB/s
  - PCIe Gen3 x4
  - 40GbE
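
The balancing of data rates in the two configurations above can be checked with a short calculation. This is a rough sketch under stated assumptions: a 4 KiB block size per I/O (not given on the slide), raw link rates after line encoding only (8b/10b for PCIe Gen2, 128b/130b for Gen3), and no allowance for PCIe packet or TCP/IP header overhead. All function names are our own.

```python
# Per-lane transfer rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE = {2: (5.0, 8 / 10), 3: (8.0, 128 / 130)}


def pcie_gbytes_per_s(gen: int, lanes: int) -> float:
    """Raw PCIe bandwidth in GB/s after line encoding, one direction."""
    gt_per_s, efficiency = PCIE[gen]
    return gt_per_s * efficiency * lanes / 8


def ethernet_gbytes_per_s(gbps: float) -> float:
    """Raw Ethernet bandwidth in GB/s."""
    return gbps / 8


def ssd_gbytes_per_s(iops: int, block_bytes: int = 4096) -> float:
    """SSD throughput in GB/s, assuming every I/O moves one 4 KiB block."""
    return iops * block_bytes / 1e9


def bottleneck(iops: int, gen: int, lanes: int, eth_gbps: float):
    """Return (name, GB/s) of the slowest stage in the SSD-PCIe-Ethernet chain."""
    stages = {
        "SSD": ssd_gbytes_per_s(iops),
        "PCIe": pcie_gbytes_per_s(gen, lanes),
        "Ethernet": ethernet_gbytes_per_s(eth_gbps),
    }
    name = min(stages, key=stages.get)
    return name, stages[name]


if __name__ == "__main__":
    print(bottleneck(200_000, 2, 2, 10.0))
    print(bottleneck(800_000, 3, 4, 40.0))
```

Under these assumptions 200k IOPS corresponds to about 0.82 GB/s and 800k IOPS to about 3.3 GB/s, in line with the 800 MB/s and 3 GB/s figures. In both configurations the PCIe link (1.0 GB/s and about 3.9 GB/s) and the Ethernet link (1.25 GB/s and 5 GB/s) leave headroom, so the SSD remains the limiting stage.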





# Flash-on-Ethernet Lab Setup at MLE

- Avnet Mini-ITX
- Xilinx Zynq 7045
- PetaLinux
- AHCI SSD via PCIe
- NPAP
- 10GbE





# Preliminary Results

- Good determinism
- Reasonable Latency





# Embrace Faster Ethernet: 25GbE, 40GbE



(Courtesy: Brad Booth, Microsoft, 25G Ethernet CFI)



Missing Link Electronics www.missinglinkelectronics.com

Endric Schubert endric@MLEcorp.com

Phone US: +1 (408) 320-6139

Phone DE: +49 (731) 141149-66





Systems-of-systems are loosely coupled Embedded Systems that benefit greatly from the high performance of modern SSD technology. Machine vision, medical imaging, and advanced driver assistance systems are among these storage-hungry applications. However, communication latency and bandwidth between the systems have a significant impact on overall robustness, cost, and performance.

Current techniques based on fieldbuses such as CAN and FlexRay have begun to hit the bandwidth wall and are increasingly being replaced by multi-Gigabit Ethernet combined with hardware acceleration of networking protocol stacks.

We present a proof-of-concept implementation specifically targeted at storage-hungry systems-of-systems. Integrated into a modern FPGA with multicore ARM CPUs running Open Source Linux, it makes single-chip solutions possible that provide full compatibility with all relevant network and storage interface protocols and can reach userspace latencies within a few microseconds.



# Missing Link Electronics is ...

We are a Silicon Valley based technology company with offices in Germany. We are a partner of leading electronic device and solution providers and have been enabling key innovators in the automotive, industrial, and test & measurement markets to build better Embedded Systems, faster.

Our mission is to develop and market technology solutions for Embedded Systems Realization via pre-validated IP and expert application support, and to combine off-the-shelf devices with Open-Source Software for dependable, configurable Embedded System platforms.

Our expertise is I/O connectivity and acceleration of data communication protocols, additionally opening up FPGA technology for analog applications, and the integration and optimization of Open Source Linux and Android software stacks on modern extensible processing architectures.

MLE is a technology partner of Fraunhofer Heinrich-Hertz-Institute, a Certified Xilinx Alliance Partner, a member of the Altera Design Service Network, and an active contributor to the Open Source software ecosystem.











# Fraunhofer HHI is ...

Founded in 1949, the German Fraunhofer-Gesellschaft undertakes applied research of direct utility to private and public enterprise and of wide benefit to society. With a workforce of over 23,000, the Fraunhofer-Gesellschaft is Europe's biggest organization for applied research, and currently operates a total of 67 institutes and research units. The organization's core task is to carry out research of practical utility in close cooperation with its customers from industry and the public sector.



Today, Fraunhofer HHI is the leading research institute for networking and telecommunications technology, "Driving the Gigabit Society".






