TCP/UDP/IP Network Protocol Accelerator Platform (NPAP)
IP Cores for Accelerating Network Protocols
The German Fraunhofer Heinrich-Hertz-Institute (HHI) has partnered with MLE to market the proven network accelrators “TCP/IP & UDP Network Protocol Accelerator Platform (NPAP)”. This customizable solution enables high-bandwidth, low-latency communication solutions for FPGA- and ASIC-based systems for 1G / 2.5G / 5G / 10G / 25G / 40G / 50G / 100G Ethernet links.
MLE is a licensee of Fraunhofer HHI, and offers a range of technology services, sublicenses and business models compatible with customer’s ASIC or FPGA project settings, world-wide.
If you are interested in optimizing your Linux based system for best performance with NPAP, we do suggest to read this technical publication from Bruno Leitao, IBM: “Tuning 10Gb network cards on Linux“.
- Accelerate CPUs by offloading TCP/UDP/IP processing into programmable logic (“Offloading”)
- Increase network throughput and reduce transport latency
- Bring full TCP/UDP/IP connectivity to FPGAs even if no CPU available (“Full Acceleration”)
- Complete and customizable turn-key solutions and IP cores based on the TCP/UDP/IP stack from the Fraunhofer HHI
- All MAC / Ethernet / IPv4 / UDP / TCP processing is implemented in HDL code, synthesizable to modern FPGAs and ASIC
- User applications can either be implemented in FPGA logic or in software via application-specific interfaces to CPUs
- Highly modular TCP/UDP/IP stack implementation in synthesizable HDL
- Full line rate of 70 Gbps or more in FPGA, 100 Gbps or more in ASIC
- 128-bit wide bi-directional data paths with streaming interfaces
- Multiple, parallel TCP engines for scalable processing
- Network Interface Card functionality with Bypass (optional)
- DPDK Stream interface (optional)
- Corundum NIC integration with performance DMA and PCIe (optional)
- FPGA-based SmartNICs
- In-Network Compute Acceleration (INCA)
- Hardware-only implementation of TCP/IP in FPGA
- PCIe Long Range Extension
- Networked storage, such as iSCSI
- Test & Measurement connectivity
- Automotive backbone connectivity based on open standards
- Video-over-IP for 3G / 6G / 12G transports
- Increase throughput for 10G/25G/50G/100G Ethernet
- Reduce latency in System-of-Systems communication
Remote Evaluation System for Network Protocol Accelerator Platform (NPAP)
Try out the Network Protocol Accelerator Platform (NPAP) using MLE’s Remote Evaluation System which lets you connect to MLE’s IP core evaluation lab via a remote connection so you can evaluate and try out this IP core from MLE and partners.
- Evaluate and try the IP core when it is running live on an FPGA system under your control – which saves you from engineering time to integrate and compile the IP core on target hardware
- Have your own copy of a virtual environment – which allows you to run your tests, keep your logs, for example, if your calendar forces you to interrupt your current evaluation
This remote evaluation is based on the NPAP-10G Evaluation Reference Design (ERD) for Xilinx Zynq UltraScale+ MPSoC running on the ZCU102 DevKit. The ZCU102 Devkit is physically connect via a 10GigE Twinax cable to a Mellanox ConnectX-4 LX 10G/25G NIC which sits inside the host running Your VM.
Below are the FPGA-based solutions currently supported by MLE’s Remote Evaluation System for MLE’s IP core evaluation:
MLE’s Network accelerators – Network Protocol Accelerator Platform (NPAP) – is available as a combination of Intellectual Property (IP) Cores, reference designs, and design integration services:
|Product Name||Deliverables||Example Pricing|
|Network Processing Device||Integrated processing device solution, built on top of leading FPGA technology, encapsulating one or more Network Protocol Accelerators for 1GbE and/or 10GbE.||
Based on NRE and unit volume
|Intellectual Property (IP) Cores||Single-Project or Multi-Project Use; ASIC or FPGA; Modular and application-specific IP cores, and example design projects; delivered as encrypted netlists or RTL.||starting at $78,000.-
(depends on FPGA device and line rate, please inquire)
|Evaluation Reference Design (ERD)||Available upon request as FPGA design project, with optional customizations (different target device, different transceivers, etc)||free-of-charge|
|Application-specific R&D Services||Advanced network protocol acceleration R&D services with access to acceleration experts from Fraunhofer HHI and/or MLE.||$1,680.- per engineering day|
- MLE Network Protocol Accelerator Platform (NPAP) Datasheet (updated June 2023) for
- IP Core and features description
- Architecture choices
- Latency analysis results
- FPGA resource numbers for AMD/Xilinx, Intel, Microsemi
- IETF RFC1122 protocols supported
- IP Core code changelog
Frequently Asked Questions
NPAP is integrated with the FPGA vendors PCS/PMA layer and thus is compatible with other IEEE compliant Ethernet Network Interface Cards (NIC) for 1 GigE, 10 GigE, 25 GigE, 40 GigE, 50 GigE, 100 GigE. Please refer to the FPGA device vendors documentation of the subsystem for further information.
NPAP implements all networking functions required by IETF RFC 1122 and thus is interoprable with software stacks from Microsoft Windows, Open-Source Linux (3.x or newer) as well as Mellanox/libvma or SolarFlare OpenOnload. Please refer to the NPAP Datasheet for more information.
Yes, typically, we configure and instantiate NPAP with BRAMs for the Rx/Tx buffers. For applications where NPAP transmits data to a server we suggest 128K Bytes per TCP session (i.e. TCP port instance) to accomodate the (slower) processing of the software TCP stack running on the Recipient. Please refer to the NPAP Datasheet for more information.
Here is the metric to determine TCP buffer sizes for NPAP (keep in mind, that TCP buffers are placed on both ends: Tx side and Rx side):
Buffer size (in bits) = Bandwidth (in bits-per-second) * RTT (in seconds)
RTT is the Round-Trip Time which is the time for the Sender to transmit the data plus the time-of-flight for the data, plus the time it takes the Recipient to check for packet correctness (CRC), plus the time for the Recipient to send out the ACK, plus the time-of-flight for the ACK, plus the time it takes the Sender to process the ACK and release the buffer.
- If the recipient is NPAP in a direct connection then we can assume ACK times less than 20 microseconds, i.e. buffer sizes shall be 200k bits. Means in this case a 32 kBytes on-chip BlockRAM per TCP session will be sufficient.
- If the recipient is software then RTT can be much longer, mostly due to the longer processing times in the OS on the recipient side. For a modern Linux we can assume RTT of 100 microseconds, or longer (see here  or run a ‘ping localhost’ on your machine). Means buffer sizes shall be around 1M bits, or the 128K Bytes of BRAM we typically instantiate.
Founded in 1949, the German Fraunhofer-Gesellschaft undertakes applied research of direct utility to private and public enterprise and of wide benefit to society. With a workforce of over 23,000, the Fraunhofer-Gesellschaft is Europe’s biggest organization for applied research, and currently operates a total of 67 institutes and research units. The organization’s core task is to carry out research of practical utility in close cooperation with its customers from industry and the public sector.
Fraunhofer HHI was founded in 1928 as “Heinrich-Hertz-Institut für Schwingungsforschung“ and joined in 2003 the Fraunhofer-Gesellschaft as the “Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut„. Today it is the leading research institute for networking and telecommunications technology, “Driving the Gigabit Society” .