NVMe Fast FPGA RAID Recorder System

July 8, 2025 | AMD Alveo, Fast FPGA RAID, FFRAID, High Speed Data Recording, High Speed Data Streaming

Unlike third-party Network-Attached-Storage (NAS) systems, which have limited read/write bandwidth, the MLE NVMe Fast FPGA RAID (FFRAID) Recorder System, based on MLE’s NVMe FFRAID Accelerator, can scale to 400 Gbps or more, delivering loss-less and gapless data recording from multiple data sources onto a RAID of NVMe™ SSDs.

MLE NVMe FFRAID Recorder is a turnkey system – delivered as a ready-to-run appliance – that supports data-in-motion pre- and post-processing and is highly scalable with regard to bandwidth and recording capacity.

Key Features

Scalable from 100 to 400 Gbps, or more
Cascading of multiple systems with time-synchronization
Start-Pause-Stop Data Recording
Pre-trigger Data Recording in circular buffers
Adaptable signal front-ends
Read/write compatible with Linux Software-RAID
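
The pre-trigger feature listed above can be pictured as a ring buffer that always holds the most recent samples, so that data from just before a trigger event is still available when recording starts. The following is a minimal Python sketch of that idea only; the class name, the method names and the buffer depth are illustrative assumptions and are not part of the MLE product API.

```python
from collections import deque


class PreTriggerRecorder:
    """Toy model of pre-trigger recording (hypothetical, not the MLE API)."""

    def __init__(self, pre_trigger_depth):
        # A deque with maxlen drops its oldest item when full,
        # which gives circular-buffer behaviour.
        self.ring = deque(maxlen=pre_trigger_depth)
        self.recorded = []
        self.triggered = False

    def push(self, sample):
        if self.triggered:
            self.recorded.append(sample)  # after the trigger: keep everything
        else:
            self.ring.append(sample)      # before the trigger: keep only the newest samples

    def trigger(self):
        # Move the pre-trigger history into the recording, oldest first.
        self.recorded.extend(self.ring)
        self.ring.clear()
        self.triggered = True


rec = PreTriggerRecorder(pre_trigger_depth=3)
for s in range(10):   # samples 0..9 arrive before the trigger
    rec.push(s)
rec.trigger()
rec.push(10)          # first sample after the trigger
print(rec.recorded)   # the last 3 pre-trigger samples plus the new one
```

In a real recorder the ring would live in SSD or DRAM address space rather than in a Python list, but the keep-the-newest-N behaviour is the same.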

Applications

Autonomous Vehicle Path Record & Replay
Automotive / Medical / Industrial Test Equipment
Broadcast Recording
High-speed Radar / Lidar / Camera Data Acquisition & Storage
Network Telemetry and Analytics
Very Deep Network Packet Capture of Ethernet or IPv4 or TCP/UDP Data

NVMe FFRAID Recorder Turnkey System Availability

MLE NVMe FFRAID Recorder is a ready-to-run turnkey system which integrates multiple MLE NVMe FFRAID Accelerator subsystems on off-the-shelf FPGA cards, along with a standard Linux server which has been optimized for PCIe/NVMe cost/performance, and with a choice of pre-validated NVMe U.2/U.3 SSDs.

Form-factor choices for a turnkey NVMe FFRAID recording system include bench-top appliances, 19” rack-mount systems, and embedded recording systems which have been optimized for Size, Weight, Cost and Power.

Bench-Top NVMe FFRAID Recorder System

MLE NVMe FFRAID Bench-Top Recorder System can be highly customized based on the “Mayflower” system from Inonet GmbH. The Inonet Quicktray makes it easy to swap a RAID-0 unit of 4x NVMe SSDs – a nice feature for data recording in the field, or the recording of many different data sets.

19” Rackmount NVMe FFRAID Recorder System

MLE NVMe FFRAID 19” Server is intended for rack-level integration.

Embedded Recording Systems

Embedded recording systems can come in various form factors and use different SoC-FPGAs, all optimized for Size, Weight and Power, and/or Cost (SWaPC).

One example is an NVMe FFRAID Embedded Recorder system based on the AMD Versal AI Edge System-on-Chip.

Exemplary Remote User Interface

From a user’s perspective, MLE NVMe Fast FPGA RAID Recorder is operated via a Remote Procedure Call (RPC) API. This enables users to quickly implement their own look-and-feel GUI, for example as a web page running on a separate machine that is connected to the MLE NVMe Fast FPGA RAID Recorder via LAN.

To facilitate integration and testing, MLE provides a complete example set of Python and curl commands for running the Recorder via this RPC API.

MLE NVMe FFRAID vs Network-Attached Storage (NAS)

Unlike 3rd-party Network-Attached-Storage (NAS) systems which have limited read/write bandwidth, NVMe FFRAID can scale to 400 Gbps, or more.

Feature	MLE NVMe FFRAID	Typical NAS system
Data Rate	100 to 400 Gbps	limited to < 80 Gbps
High-accuracy IEEE time synchronization	yes	no
Recording Capability	highly optimized to store high-speed data to NVMe storage	file system based
Add-on Capability	on-the-fly data decimation; indexing and adding metadata; transparent data proxy	none
Storage Location	RAID of NVMe SSDs	RAID of NVMe SSDs
Remote Management	yes, via host Linux	yes
Front-End	adaptable, channel-based, high-speed I/O	network only
SSD Security	Self-encryption TCG OPAL	Self-encryption TCG OPAL
RAID compatibility	open-source Linux	proprietary or open-source Linux

NVMe FFRAID Accelerator Subsystem

MLE NVMe FFRAID Recorder is a customizable turnkey solution which is based on the MLE NVMe FFRAID Accelerator Subsystem, featuring:

Scalable channel-based architecture
Compatibility and interoperability with Linux MDRAID
Support for many data acquisition use cases including decimation or metadata indexing
Simplex record, simplex replay, half-duplex and full-duplex modes

Channel-Based Architecture

MLE’s NVMe FFRAID Recorder implements a channel-based architecture where each data source/sink can be associated with a dedicated RAID engine and a dedicated storage space. Each channel can run at 10/25/50/100 Gbps, and channels of different speeds can be combined.

Adaptable signal front-ends support many different I/O standards in a “mix & match” fashion.

This channel-based architecture, together with the FPGA NVMe Recording Stack and a well-tuned PCIe setup, delivers a best-in-class price/performance ratio for high-speed data acquisition, recording and replay. MLE’s multi-core NVMe Host Controller Subsystem supports dedicated NVMe queues per SSD using PCIe Peer-to-Peer communication.

The NVMe FFRAID Recorder also supports high-performance and high-endurance NVMe U.2/U.3 SSDs with self-encryption TCG OPAL security function!

Recording Capacity and Scalability

MLE’s NVMe FFRAID Recorder supports a wide range of NVMe SSDs and can be scaled from M.2 SSDs for small and light-weight embedded systems up to large 19” racks using high-performance U.2 or U.3 SSDs. Scalability also includes selecting from different SSD capacities and Drive-Writes-per-Day (DWPD) models. The following table lists possible recording times in minutes:

Storage (TiB) \ Recording Speed (Gbps)	100	150	200	250	300	350	400
5	7.2	4.8	3.6	2.9	2.4	2.0	1.8
10	14.3	9.5	7.2	5.7	4.8	4.1	3.6
15	21.5	14.3	10.7	8.6	7.2	6.1	5.4
20	28.6	19.1	14.3	11.5	9.5	8.2	7.2
25	35.8	23.9	17.9	14.3	11.9	10.2	8.9
30	42.9	28.6	21.5	17.2	14.3	12.3	10.7
35	50.1	33.4	25.1	20.0	16.7	14.3	12.5
40	57.3	38.2	28.6	22.9	19.1	16.4	14.3
45	64.4	42.9	32.2	25.8	21.5	18.4	16.1
50	71.6	47.7	35.8	28.6	23.9	20.5	17.9
Recording time in minutes
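
To estimate recording times for capacities or speeds that are not in the table, the relationship between storage, data rate and time can be sketched in a few lines of Python. This is an idealized back-of-the-envelope calculation that assumes 1 TiB = 2^40 bytes, 1 Gbps = 10^9 bit/s and no overhead of any kind. Its results come out a few percent above the published values, which presumably reflect a slightly different unit convention or rounding.

```python
def recording_minutes(capacity_tib, rate_gbps):
    """Idealized recording time in minutes (assumes no RAID or protocol overhead)."""
    capacity_bits = capacity_tib * 2**40 * 8   # 1 TiB = 2**40 bytes
    rate_bits_per_s = rate_gbps * 1e9          # 1 Gbps = 10**9 bit/s
    return capacity_bits / rate_bits_per_s / 60


for tib in (5, 20, 50):
    row = [round(recording_minutes(tib, gbps), 1) for gbps in (100, 200, 400)]
    print(f"{tib:>2} TiB:", row)
```

Recording time grows linearly with capacity and shrinks in proportion to the data rate, which matches the pattern of the rows and columns above.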

Data Acquisition Pre- and Post-Processing

Besides record/replay of raw data, NVMe FFRAID supports data-in-motion pre- and post-processing, which enables you to add your custom algorithms for indexing and metadata generation, on-the-fly data decimation, or running in “spy-mode” as a transparent data proxy.

Plain Recording, Loss-Less and Gapless

Ingress data from high-speed sensors is transferred and recorded at-speed and as-is onto the NVMe FFRAID.

Data Proxy & Record

Communication from a high-speed data source can be transported to a data sink while this data is also recorded at-speed.

Data Decimation & Record

Unwanted pieces of the ingress data are removed on-the-fly prior to storage, for example so that only certain regions-of-interest (ROI) are recorded.
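
One possible form of decimation is cropping each incoming frame to a region-of-interest before it is stored. The Python sketch below illustrates the idea on a small 2D frame; the function name, the frame layout and the ROI coordinates are assumptions made for the example, and in the actual system such logic would run in the FPGA data path.

```python
def crop_roi(frame, top, left, height, width):
    """Keep only a rectangular region-of-interest of a 2D frame (list of rows)."""
    return [row[left:left + width] for row in frame[top:top + height]]


# A 6 x 8 test frame whose pixel values encode their own position (row * 10 + column).
frame = [[r * 10 + c for c in range(8)] for r in range(6)]

roi = crop_roi(frame, top=1, left=2, height=2, width=3)
kept_fraction = sum(len(r) for r in roi) / sum(len(r) for r in frame)

print(roi)
print(f"stored fraction of the ingress data: {kept_fraction:.3f}")
```

Storing only the ROI reduces the data volume written to the SSDs by the same fraction, which in turn extends recording time.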

Adding Meta-Data & Record

Ingress data can be analyzed on-the-fly to generate indexing information, for example for later search. This metadata is then recorded along with the ingress data. Metadata can be, for example, hardware timestamps, regions-of-interest, or search indexes.
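
To illustrate why such metadata is useful, the sketch below builds a simple index that maps hardware timestamps to storage offsets, so that a later replay can jump directly to a point in time. The class, its method names and the example values are hypothetical; the on-disk metadata format used by NVMe FFRAID is not described here.

```python
import bisect


class RecordingIndex:
    """Timestamp-to-offset index (illustrative only, not the FFRAID metadata format)."""

    def __init__(self):
        self.timestamps_ns = []  # must be appended in increasing order
        self.offsets = []        # byte offset of the record that starts at that time

    def add(self, timestamp_ns, byte_offset):
        self.timestamps_ns.append(timestamp_ns)
        self.offsets.append(byte_offset)

    def find(self, timestamp_ns):
        """Return the offset of the last record at or before timestamp_ns, or None."""
        i = bisect.bisect_right(self.timestamps_ns, timestamp_ns) - 1
        return self.offsets[i] if i >= 0 else None


index = RecordingIndex()
index.add(1_000, 0)
index.add(2_000, 4096)
index.add(3_000, 8192)

print(index.find(2_500))  # offset of the record that covers t = 2500 ns
print(index.find(500))    # before the first record
```

Because the timestamps arrive in order, the index stays sorted for free and lookups are a binary search.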

NVMe FFRAID is Linux Compatible

NVMe FFRAID is fully compatible with Linux Software-RAID (via the Linux MD driver). This allows recording at high data rates and replaying at slower speeds, or vice versa. For performance reasons, NVMe FFRAID stores your data as so-called Linux block storage, i.e. no filesystem is used, because a filesystem would slow down data acquisition and/or retrieval. Hence, you can record via NVMe FFRAID and replay that same data from a Linux MDRAID, and vice versa:

“Simplex Record”

Ingress data (1) is recorded at high-speed using NVMe FFRAID (2). Once recording is done, the NVMe FFRAID releases the SSD RAID and Linux opens it as an MDRAID. The data can then be replayed via Linux (3), typically at lower speeds, and, for example, sent out via a Linux network connection (4).

“Simplex Replay”

Ingress data (1) comes in via a Linux NIC, or any other Linux userspace software, for example, and is recorded onto a Linux MDRAID (2). Once recording is done, Linux releases the SSD RAID and NVMe FFRAID then opens it. Then data can be replayed via NVMe FFRAID (3) and be streamed-out at high data rates (4).

“Half-Duplex Record & Replay”

Ingress data (1) is recorded at high-speed using NVMe FFRAID (2). Once recording is done, the data can be replayed via NVMe FFRAID (3) and streamed out at high data rates (4). Because the NVMe SSDs are operated purely in sequential read or sequential write, never both at once, this mode delivers the best performance.

“Full-Duplex Record & Replay”

Ingress data is recorded at high-speed using NVMe FFRAID. At the same time, while recording, data is replayed from the NVMe FFRAID and streamed out at high data rates. Because typical NVMe SSDs deliver less performance when writes happen in parallel to reads, you will experience less performance in this mode.
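
Interoperability with Linux MDRAID implies that both sides agree on where each logical block lives on the member SSDs. As an illustration only, the sketch below shows the classic RAID-0 striping map for equal-sized members. The RAID level, the number of drives and the chunk size are assumptions chosen for the example (the bench-top system above uses a RAID-0 unit of 4 SSDs, and 512 KiB is the mdadm default chunk size); the sketch also ignores the per-member data offset reserved for the MD superblock. The authoritative layout is the one defined by the Linux MD driver.

```python
def raid0_locate(logical_byte, n_disks=4, chunk_bytes=512 * 1024):
    """Map a logical byte address to (disk index, byte offset on that disk).

    Simplified RAID-0 model: chunks are distributed round-robin across
    equal-sized member disks; superblock/data offsets are ignored.
    """
    chunk_index, within_chunk = divmod(logical_byte, chunk_bytes)
    stripe, disk = divmod(chunk_index, n_disks)
    return disk, stripe * chunk_bytes + within_chunk


CHUNK = 512 * 1024
for addr in (0, CHUNK, 3 * CHUNK, 4 * CHUNK + 10):
    print(addr, "->", raid0_locate(addr))
```

A long sequential write therefore touches all member SSDs in turn, which is what lets the aggregate write bandwidth scale with the number of drives.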

Availability & Pricing

NVMe FFRAID is available as IP cores (for select FPGA devices), as NVMe FFRAID Cards (for select off-the-shelf FPGA cards), and as NVMe FFRAID Recorder (a variety of turnkey system appliances from MLE or MLE partners).

Product Name	Deliverables	Example Pricing
Evaluation Reference Design (ERD)	For evaluation purposes, and upon request, we can provide ready-to-run loaner cards or systems based on our PCIe 4.0 or PCIe 5.0 implementation.	Upon request
NVMe FFRAID Recorder System	A ready-to-use yet customizable turnkey system built by MLE or MLE partners where hardware, software and “gateware” are fully integrated and tested. Different form-factor choices from medium-sized embedded PC to 19″ rack mount, lab-use or ruggedized. You can bring your own SSDs, or choose SSDs from our many options depending on the storage capacity required.	Starting at $24,800.- (depends on SSD performance and capacity)
NVMe FFRAID Card	Our NVMe FFRAID Cards are based on off-the-shelf 3rd party FPGA cards (such as AMD Alveo or Altera). Includes a Single-Project-Use netlist FPGA design license so you can alter the FPGA-based signal frontend.	Starting at $9,800.-
Intellectual Property (IP) Cores	Single-Project or Multi-Project Use for select FPGA devices; modular and application-specific IP cores, and example design projects; delivered as encrypted netlist or RTL.	Please inquire

Please contact MLE for additional details on NVMe FFRAID, including customization services and other product and licensing options.

Documentation

Datasheet

Brochure

Application Note

Success Story

News

Frequently Asked Questions

Does the NVMe FFRAID Recorder support file systems?

No, the NVMe FFRAID Recorder uses so-called Block Storage, so file systems are not supported. For each data transfer the user application logic selects a start address and a maximum end address, and then data is written to flash in a linear fashion. This achieves best performance and avoids write amplification.
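
The start/maximum-end addressing described above can be modelled as a write window that is filled strictly front to back. The Python sketch below is a toy model of that behaviour; the class and method names are hypothetical, block addresses are counted in logical blocks, and the end address is treated as exclusive, all of which are assumptions made for the example.

```python
class LinearTransfer:
    """Toy model of a linear block-storage transfer (hypothetical names, not the MLE API)."""

    def __init__(self, start_lba, max_end_lba):
        if max_end_lba <= start_lba:
            raise ValueError("max_end_lba must be greater than start_lba")
        self.next_lba = start_lba
        self.max_end_lba = max_end_lba  # exclusive upper bound (assumption)

    def remaining(self):
        return self.max_end_lba - self.next_lba

    def write(self, n_blocks):
        """Reserve the next n_blocks consecutive blocks and return the first LBA."""
        if n_blocks > self.remaining():
            raise RuntimeError("transfer window exhausted")
        first = self.next_lba
        self.next_lba += n_blocks
        return first


t = LinearTransfer(start_lba=1000, max_end_lba=1100)
print(t.write(64))    # first write starts at the start address
print(t.write(32))    # next write continues right behind it
print(t.remaining())  # blocks left before the maximum end address
```

Because every write lands directly behind the previous one, the SSDs only ever see sequential writes within the window.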

Does the NVMe FFRAID Recorder support drive partitions?

Partitions are not explicitly supported. However, the user application logic can use the NVMe FFRAID Recorder to read the SSD’s partition table and then set up transfers with start and maximum end address to be aligned to partitions.

Does the NVMe FFRAID Recorder support NVMe namespaces?

Only one single namespace is supported per SSD.

How many SSDs can be connected to the NVMe FFRAID Recorder?

The standard NVMe FFRAID Recorder configurations use 4, 8 or 16 SSDs. The number of SSDs can be adjusted to your application within certain limits. For example, the accumulated sustained write speed of all SSDs should exceed the data rate of the incoming stream, while too many SSDs can cause latency issues. We can also customize the NVMe FFRAID Recorder for your application to support more complex PCIe topologies. Please ask us for more details.
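
The rule that the accumulated sustained write speed must exceed the incoming data rate can be turned into a quick sizing check. The sketch below is a rough estimate under stated assumptions: the per-SSD sustained write speeds and the 25 % headroom factor are illustrative values, not MLE recommendations, and real sizing should use measured sustained (not burst) write figures for the chosen SSD model.

```python
import math


def min_ssds(stream_gbps, ssd_write_gbytes_per_s, headroom=1.25):
    """Smallest SSD count whose combined sustained write speed covers the stream."""
    required_gbytes_per_s = stream_gbps / 8 * headroom  # Gbit/s -> GByte/s, plus margin
    return math.ceil(required_gbytes_per_s / ssd_write_gbytes_per_s)


def pick_config(stream_gbps, ssd_write_gbytes_per_s, options=(4, 8, 16)):
    """Pick the smallest standard configuration that is large enough, else None."""
    need = min_ssds(stream_gbps, ssd_write_gbytes_per_s)
    for n in options:
        if n >= need:
            return n
    return None


print(min_ssds(100, 3.0), pick_config(100, 3.0))
print(min_ssds(400, 5.0), pick_config(400, 5.0))
print(min_ssds(400, 3.0), pick_config(400, 3.0))
```

A result of `None` means that none of the standard 4/8/16 configurations is sufficient under these assumptions, so a faster SSD model or a customized topology would be needed.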

How many NVMe IO Queues does the NVMe FFRAID Recorder support and what is the depth of the NVMe IO Queue?

NVMe FFRAID Recorder currently supports one single IO Queue per SSD. This IO Queue can have up to 128 entries, each with up to 128 KiB of data. That is, you can have up to 16 MiB of “data in flight” per SSD. If needed, we can change the depth and size of this IO Queue. However, given the needs of streaming applications, increasing the number of IO Queues may not be advantageous.
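
The 16 MiB figure follows directly from the queue depth and the entry size. The short Python calculation below reproduces it and extends it to a multi-SSD setup; the 8-SSD example is an assumption picked from the standard configurations.

```python
KIB = 1024
MIB = 1024 * KIB


def data_in_flight_bytes(queue_entries=128, bytes_per_entry=128 * KIB, queues_per_ssd=1):
    """Maximum amount of outstanding data per SSD for the given queue setup."""
    return queue_entries * bytes_per_entry * queues_per_ssd


per_ssd = data_in_flight_bytes()
print(per_ssd // MIB, "MiB in flight per SSD")
print(8 * per_ssd // MIB, "MiB in flight across 8 SSDs")
```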

Does the NVMe FFRAID Recorder support PCIe Peer-to-Peer?

Yes, this is supported. Peer-to-Peer transfers can be very attractive as they free up the host CPU. MLE can customize the NVMe FFRAID Recorder for your application to support more complex PCIe topologies, including multiple direct-attached SSDs and multiple SSDs connected via a 3rd party PCIe switch chip, also with PCIe Peer-to-Peer. Please ask us for more details.

How many parallel streams can be processed?

Currently, the NVMe FFRAID Recorder handles 16 independent data streams. To save resources, the number of streams can be reduced without losing the overall performance by widening the data paths.

Does the NVMe FFRAID Recorder support M.2 PCIe connectivity?

Yes. Because the NVMe FFRAID Recorder is agnostic to the form factor of your SSD, M.2, U.2, U.3, EDSFF and so on are all supported, as long as your SSD “speaks the NVMe protocol” and not SATA or SAS.

What are the best SSDs to use and from which vendor?

While, again, the NVMe FFRAID Recorder works with any NVMe SSD, there are a couple of other aspects to keep in mind when selecting an NVMe SSD: noise, vibration, harshness, temperature throttling, local RAM buffers, and the flash technology (SLC, MLC, TLC, QLC, 3D-XPoint, etc.). To enable our customers to deliver solutions with dependable performance, we have worked with a set of 3rd party SSD vendors and would be happy to give you technical guidance in your project. Please inquire.

Auto/RPS SDV Prototyping

September 5, 2024 | ECU, in-vehicle Networking, Network Acceleration, Zonal Gateway

Automotive Rapid Prototyping System (Auto/RPS)

MLE provides an FPGA-based Rapid Prototyping System (RPS) catering to the specific needs of automotive engineers designing next-generation Zone Based Architectures.

MLE Auto/RPS enables automotive system engineers to design and to validate software-defined vehicle (SDV) functions along with MLE Auto/TSN in-vehicle networking.

MLE Auto/RPS was designed as a shortcut into A-sample hardware development of Zonal Gateways / ECUs and implements an FPGA Full System Stack based on the Trenz Electronic TE0950-02 SoC-FPGA Development Kit featuring the AMD Versal AI Edge FPGA, and an automotive FPGA subsystem from MLE.

Features and Benefits

Based on open standards and open-source software
Support for multiple, different sensor inputs
Backbone connectivity up to 100 Gbps
Open-source real-time operating systems
Multi-core ARM processing system
Flexible, adaptable FPGA design for implementing data acquisition and data preprocessing (DADP)
Supports Secure OTA via ARM OP-TEE Trusted Execution Environment (optional)

IO Interfaces

12 V DC power supply
1 Gig Ethernet
USB 2.0 for JTAG and console
2x ports for 25G Ethernet (up to 4 ports optional)
MIPI CSI-2 x2 Camera Input (optional)
GMSL (via optional adapter AD-GMSL2ETH-SL)
Up to 2x CAN-FD (via CRUVI HS) (optional)
Up to 2x CAN 2.0B (via CRUVI HS) (optional)
PCIe 4.0 x4 NVMe M.2 SSD via Opsero FMC with 4x GTYP (optional)

Processing Functionality

Dual-core ARM Cortex A72
Dual-core ARM Cortex R5F
8 GB DDR4 DRAM
128 MByte SPI Flash (primary boot option)
32 GB eMMC (secondary boot option)
MicroSD Card (for Linux root file system, for example)
150k LUTs
464 DSP Engines
34 AI Engines-ML with up to 45 TOPS INT8

Dual-Core ARM Cortex-A72 Software Environment

Ubuntu 24.04 LTS (pre-installed)
Linux OS Debian 12 PREEMPT_RT (optional)
Yocto project design flow (optional)
AMD/Xilinx Petalinux (optional)

Dual-Core ARM Cortex-R5F Software Environment (optional)

FreeRTOS (optional)

FPGA System Block Diagram

FPGA Development Kit

Auto/RPS-TE-0950-25G

Hardware based on TE0950 AMD Versal™ AI Edge Evalboard from Trenz Electronic
Features the AMD Versal™ AI Edge VE2302-1LSE
FPGA Full System Stack for MLE Auto/TSN 2x25G
Ubuntu 24.04 LTS for ARM
Customized MLE Auto/TSN Linux kernel 6.6.10
12 VDC for lab and table top operation

Pricing and Availability

Product Name	Deliverables	Example Pricing
Rapid Prototyping System (Base) AUTORPS-TE-0950-25G	FPGA Full System Stack for MLE Auto/TSN for 2x 25 GigE, comprising hardware (FPGA board, power supply, active cooling, enclosure) and system FPGA config (bitfile and rootfs).	$3,880.- per unit (MOQ 2 units); purchase at the Trenz Electronic shop
System FPGA Development Kit	AMD/Xilinx Vivado Design Project plus Commercial Single-Project-Use License delivered as encrypted netlists or RTL.	Please Inquire
Application-specific R&D Services	Advanced FPGA design services with access to acceleration experts from MLE.	$1,880.- per engineering day (or fixed price project fee)

Please contact MLE for additional details on Auto/RPS products and services or other product and licensing options.

Documentation

Brochure

Application Note

Technical Document

News

Trenz FPGA Boards

December 1, 2023

MLE’s partner, Trenz Electronic, offers FPGA Starterkits, FPGA modules (System-on-Modules), Carrier Boards and Chip-Down Turnkey Solutions.

FPGA Starter Kits

FPGA Starterkits get you faster and closer towards your FPGA target hardware. Trenz Electronic has been designing Starterkits for engineering purposes: Components have been carefully selected for availability, cost / performance and manufacturability (not just for marketing reasons). And, more important, Trenz Electronic Starterkits have a clear path towards product ramp-up, either via a risk-optimized, customized baseboard with a SoM, or via a cost-optimized so-called chipdown implementation.

FPGA Modules (System-on-Modules)

FPGA Modules (System-on-Modules, SoM) are commercial off-the-shelf (COTS) components which significantly de-risk your system design because many tedious and risky engineering steps have been taken care of already: clock, power, reset, configuration, high-speed memory interfaces, etc. have been implemented in a reliable and fully tested module. You can pick from a large variety of SoMs with different device families, device temperatures and device speed-grades. This is combined with very competitive pricing because you benefit from volume-based step-pricing even if you need very few units.

Trenz TE0820 MPSoC FPGA Module with AMD Zynq™ UltraScale+™

Trenz TE0712 FPGA Module with AMD Artix™ 7

Trenz TE0720 SoC FPGA Module with AMD Zynq™ 7000

Trenz TE0813 MPSoC FPGA Module with AMD Zynq™ UltraScale+™

Carrier Board Design for FPGA Modules

MLE and Trenz can help you with the design and manufacturing of your custom carrier board. Please contact us to discuss technical and commercial details. Until your custom carrier board arrives, we suggest you start development using one of our ready-to-run carrier boards:

TEF1002 Carrier board for Trenz FPGA modules

Carrierboard for a TE0728 Automotive Zynq-7020 FPGA Module

UltraITX+ Baseboard for Trenz TE080X UltraSOM+ FPGA Modules

UltraITX+ Baseboard for Trenz TE081X UltraSOM+ FPGA Modules

Chip-Down Turnkey Solutions

Depending on your SWAP-C (Size, Weight and Power and Cost-Down) requirements, your FPGA target hardware can be implemented via a so-called “Chip-Down” Turnkey solution. For a low NRE fee we “stretch” the PCB of the SoM to make additional space for extra components and connectors. Hence, the SoM becomes your FPGA single-board computer. This optimizes the Bill-of-Materials as it removes unwanted components including the headers between the former SoM and the former carrier board.

Integrated Solutions for Success

Success Story

4k Video Processing in FPGA with HDMI 2.0

Support a semi-custom / chip-down PCB design based on FPGA Modules (System-on-Module) from Trenz Electronic GmbH with HDMI Input and Output, hardware bring-up and testing, integrate and test HDMI 1.4/2.0 Rx and Tx subsystem (Xilinx PG236 v3.1 and PG235 v3.1) for 4k video processing, integrate Xilinx PetaLinux-based processing system for management and control.

Application Note

MLE NVMe FPGA Full System Stack for AMD Versal AI Edge

The NVMe Full System Stack enables seamless data streaming between the ARM cores on AMD Versal AI Edge FPGAs and NVMe SSDs via the PS PCIe Root Port, with data rates from 1 GiB/s to 5.4 GiB/s, and supports dynamic, complex file systems so that multiple applications can access the data.

Shift-Left Your FPGA Design Projects

FPGA Full System Stacks comprising off-the-shelf FPGA Modules (System-on-Modules, SoM) plus pre-validated FPGA IP Cores and subsystems can greatly accelerate the time-to-market of your FPGA design project.

News

From Software to Silicon: Accelerating Automotive In-Vehicle Network Protocols for Zonal Architectures

Oct 13, 2025

MLE presents “From Software to Silicon: Accelerating Automotive In-Vehicle Network Protocols for Zonal Architectures” at the “Driving the Future Symposium”

Oct 8, 2025

Zonal/SDV Architecture Exploration: From Whiteboard to Vehicle Demo in 9 Months

Jun 17, 2025

Request for the IP-integrated FPGA modules now!

Data Diodes

July 25, 2023

Smart Data Diodes

Data Diodes - Unidirectional Security Gateway

A data diode, also called a unidirectional network bridge or unidirectional security gateway, is a piece of hardware that connects two separated networks so that data can travel in one direction only, from one network into the other. Applications are found in high-security environments, where data diodes connect two or more networks of differing security classifications while making it physically impossible to transfer data in the direction from the lower to the higher security classification.

MLE offers customizable FPGA-based Data Diodes for multi-Gigabit Ethernet!

For this, MLE has partnered with Fraunhofer HHI to provide the industry-proven TCP/UDP/IP Network Protocol Acceleration Platform (NPAP) in the form of NPAC, a PCIe Network Protocol Accelerator Card with quad-port 10G Ethernet. NPAC-40G implements reliable, high-bandwidth, low-latency TCP/UDP/IP transport plus Linux PCIe stream device drivers, and can run customizable In-Network Processing such as red/black network separation functionality on the integrated FPGA subsystem.

Features and Benefits of Data Diodes

FHHL PCIe Card, PCIe 3.1 x8
4x SFP+ for 10 Gig Ethernet
Intel Stratix 10 GX 400 FPGA, hardened
Tx-only and Rx-only (data-diode) network paths disconnect at PCB level or at circuit level
Optional TCP/IP Tx-only or Rx-only (FPGA-integrated TCP endpoint)
Optional In-Network Processing for Deep Packet Inspection and/or Firewall
Optional access logging
Customizable, Ready-to-Run

Applications

Sending status information from sensitive industrial plants
Sending video streams from sensitive video equipment / cameras
Protect classified data in high security networks and prevent it from leaking to low security networks, e.g. in defense
Critical Infrastructure and Industrial Internet of Things (IIoT)
- Power plants and nuclear power plants
- Power and water utilities and providers
- Oil and gas deployments
- Transportation, rail and air
Intelligence & Defense
- Data Center
- Tactical and removable media solutions
Commercial
- Financial services
- Manufacturing
- Cloud services
- Telecommunications providers
- Security Information and Event Management logs
- Intrusion Detection logs

Availability

MLE Data Diodes are available as a licensable full system stack or delivered as an integrated hardware/firmware/software solution in the form of customizable FPGA-based Network Interface Cards (NIC) or FPGA-based appliances.

Deliverables include:

Pre-configured PCIe Card, ready-to-run
Linux device drivers (GPL sources)
Application-specific expert design service (optional)
Appliance implementation (optional)

Documentation

Brochure

Application Note

Hardware

June 6, 2023 | Chip-Down Turnkey, Development Kit, FPGA boards, System-on-Modules

FPGA Hardware & Turnkey Systems

De-Risk System Designs With FPGA Full System Stacks (FFSS)

The difficulty in programming FPGAs, in particular System-on-Chip (SoC) FPGAs with embedded CPUs, has long been considered a disadvantage that prevents FPGAs from becoming a general computation solution. However, integrated and pre-validated building blocks of FPGA hardware, such as FPGA Modules, combined with Compute, Video, Storage and Networking FPGA software subsystems, significantly increase your productivity while shortening your time-to-market for new product initiatives.

MLE provides FPGA Full System Stacks integrating FPGA hardware like Starterkits, System-on-Modules (SoMs) and customized boards from Trenz Electronic. This gives you a low-risk design trajectory from a working proof-of-concept (POC) over a low to mid volume implementation with SoMs eventually into a custom board for mid to high volume cost-down. During every single phase you will receive expert FPGA support from MLE.

FPGA Hardware Solutions

FPGA System-on-Modules

From FPGA Starterkits over FPGA Boards and System-on-Modules (SoM) to off-the-shelf or customized carrier boards, the close partnership between MLE and Trenz Electronic gets you faster and closer towards your FPGA target hardware.

FPGA HPC Cards

High-Performance Compute benefits from Domain Specific Architectures to accelerate applications using FPGAs to offload CPUs. Large FPGAs with many resources, wide PCIe interfaces and High-Bandwidth Memory (HBM2), and multi-hundred Gigabit network interfaces have become a must have.

partnered with

Trenz Electronic GmbH

Trenz Electronic GmbH and MLE have established close engineering collaboration and a track record of shipping integrated FPGA boards and solutions based on Trenz System-on-Modules (SoM) running MLE’s System Software Stacks with Compute, Video, Storage and/or Network Acceleration.

partnered with

Pro Design Electronic GmbH

PRO DESIGN Electronic GmbH and MLE have partnered to provide custom turnkey solutions based on MLE FPGA Full Accelerators for Networking, Storage and Video Image Processing integrated with Pro Design Accelerator Cards.

Function Accelerators

March 8, 2023

Network Function Accelerators, FACs, NICs and SmartNICs

NICs, SmartNICs, and Function Accelerator Cards with Network Accelerator

A Network Interface Card (NIC) is a component that connects computers via networks, these days mostly via IEEE Ethernet. But what makes a NIC a SmartNIC? And how can an FPGA Network Accelerator make it operate more efficiently and enhance its performance to deliver deterministic networking?

With the push for Software-Defined Networking, (mostly open-source) software running on standard server CPUs became a more flexible and cost-effective alternative to custom networking silicon and appliances. However, in the post-Dennard-scaling era, server CPU performance improvements cannot keep up with the increasing computational demand of faster network port speeds.

This widening performance gap creates the need for so-called SmartNICs. SmartNICs not only implement a Domain-Specific Architecture for network processing but also offload portions of the network processing stack from the host CPUs and thereby free up CPU cores to run the “real” application.

According to Gartner, Function Accelerator Cards (FACs) incorporate functions on the NIC that would have been done on dedicated network appliances. Hence, all FACs are essentially NICs, but not all NICs/SmartNICs are FACs. When deployed properly, FACs can increase bandwidth performance, can reduce transport latencies and can improve compute efficiency, which translates to less energy consumption.

MLE has partnered with FPGA vendors, Fraunhofer Institutes and EMS partners to implement the FPGA Network Accelerator on FPGA-based FACs which deliver cost-efficient solutions for ultra-reliable, low-latency, deterministic networking.

Features of FPGA Network Accelerator

Ultra-Reliable, Low-Latency, Deterministic Networking

With ultra-reliable, low-latency, deterministic networking we have borrowed a concept from 5G wireless communication (5G URLLC) and have applied this to LAN (Local Area Network) and WAN (Wide Area Network) wired communication:

Ultra-Reliable means no packets get lost in transport
Low-Latency means that packets get processed by a FAC at a fraction of CPU processing times
Deterministic means that there is an upper bound for transport and for processing latency

We do this by combining the TCP protocol, fully accelerated (in FPGA or ASIC using NPAP), with TSN (Time Sensitive Networking) optimized for stream processing at data rates of 10/25/50/100 Gbps. These so-called TCP-TSN-Cores, the FPGA network accelerator, not only give us precise time synchronization but also traffic shaping, traffic scheduling and stream reservation with priorities.

We believe that FPGAs are very well positioned as programmable compute engines for network processing because FPGAs can implement “stream processing” more efficiently than CPUs or GPUs can. In particular, when the networking data stays local to the FPGA fabric, Data-in-Motion processing can be done within 100s of clock cycles (which is 100s of nanoseconds) and results can be sent back a few 100 clock cycles later, an aspect which is referred to as Full-Accelerated In-Network Compute.
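
How many nanoseconds a given number of clock cycles corresponds to depends on the fabric clock frequency, which is not stated here. The small Python sketch below converts cycles to nanoseconds for two clock frequencies that are assumed purely for illustration.

```python
def cycles_to_ns(cycles, clock_mhz):
    """Convert a cycle count to nanoseconds for a given clock frequency in MHz."""
    return cycles * 1000.0 / clock_mhz


# Assumed example clocks; the actual fabric clock depends on the device and design.
for clock_mhz in (250, 400):
    print(f"{clock_mhz} MHz: 100 cycles = {cycles_to_ns(100, clock_mhz):.0f} ns, "
          f"300 cycles = {cycles_to_ns(300, clock_mhz):.0f} ns")
```

At clock rates of a few hundred MHz, each cycle takes a few nanoseconds, so 100s of cycles stay well below a microsecond.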

While FPGA technology has been on the forefront of Moore’s Law and modern devices such as AMD/Xilinx Versal Prime or Intel Agilex or Achronix Speedster7t can hold millions of gates, FPGA processing resources must be used wisely, when Bill-of-Materials costs are important. Therefore, at MLE we have put together a unique combination of FPGA and open-source software to achieve best-in-class performance while addressing cost metrics more in-line with CPU-based SmartNICs.

Unique and Cost-Efficient Combination of Open Source

The Open Source Technologies We Borrow From

Linux kernel: by now highly optimized for networking
Open vSwitch (OvS): an open-source multi-layer network switch
Corundum: a vendor-neutral, open-source, high-performance FPGA-based NIC
SONiC: Software for Open Networking in the Cloud
DPDK: the Data Plane Development Kit
Intel Compiler for SystemC: an open-source High-Level Synthesis engine
OpenNIC: the GitHub project focusing on AMD/Xilinx Alveo cards
Xilinx Vitis HLS LLVM: the High-Level Synthesis frontend for Xilinx FPGAs

High-Level Synthesis plays a vital role in our implementation as it allows MLE and MLE customers to turn algorithms implemented in C/C++/SystemC into efficient FPGA logic which is portable between different FPGA vendors.

To build a high-performance FAC platform, portions of the above have been integrated together with proven 3rd party networking technologies:

NPAP, the Network Protocol Accelerator Platform which is a TCP/UDP/IP Full Accelerator that comes from Fraunhofer HHI
TSN, which is Time Sensitive Networking, a collection of IEEE Standards implemented by Fraunhofer IPMS

Corundum In-Network Compute + TCP Full Accelerator

Corundum is an open-source FPGA-based NIC which features a high-performance datapath between multiple 10/25/50/100 Gigabit Ethernet ports and the PCIe link to the host CPU. Corundum has several unique architectural features: For example, transmit, receive, completion, and event queue states are stored efficiently in block RAM or ultra RAM, enabling support for thousands of individually-controllable queues.

MLE is a contributor to the Corundum project. Please visit our Developer Zone for services and downloads for Corundum full system stacks pre-built for various in-house and off-the-shelf FPGA boards.

MLE combines the Corundum NIC with NPAP, the TCP/UDP/IP Full Accelerator from Fraunhofer HHI, via a so-called TCP Bypass which minimizes the processing latency of network packets: Each packet is processed in parallel by the Corundum NIC and by NPAP. As soon as it can be determined, based on IP address and port number, that the packet shall be handled by NPAP, the packet is invalidated inside the Corundum NIC. If a packet shall not be processed by NPAP, it is dropped in NPAP and is processed solely by the Corundum NIC.

Fundamentally, this implements network protocol processing in multiple stages: Latency-sensitive network data is processed using full acceleration, while all other network traffic is handled by a companion CPU and/or by the host CPU.
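
The TCP Bypass decision described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not MLE's implementation: the `Packet` type, the `NPAP_FLOWS` table, the example addresses, and the function name `tcp_bypass` are all hypothetical, and in the real system the decision is made in FPGA logic while both paths are already processing the packet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Only the header fields the bypass decision looks at (hypothetical model)."""
    dst_ip: str
    dst_port: int


# Hypothetical table of (IP address, port) pairs claimed by the accelerator.
NPAP_FLOWS = {("192.0.2.10", 5001), ("192.0.2.10", 5002)}


def tcp_bypass(pkt, npap_flows=NPAP_FLOWS):
    """Return which path keeps the packet: "npap" or "nic"."""
    # Both paths see every packet; the lookup decides which copy survives.
    claimed = (pkt.dst_ip, pkt.dst_port) in npap_flows
    nic_valid = not claimed   # invalidated inside the NIC when NPAP claims it
    npap_valid = claimed      # dropped in NPAP when it does not
    assert nic_valid != npap_valid  # exactly one path handles each packet
    return "npap" if npap_valid else "nic"


if __name__ == "__main__":
    print(tcp_bypass(Packet("192.0.2.10", 5001)))  # flow claimed by the accelerator
    print(tcp_bypass(Packet("192.0.2.10", 22)))    # everything else stays on the NIC path
```

The model makes one property explicit: every packet is kept by exactly one of the two paths, so latency-sensitive flows never wait for a CPU, and all other traffic still reaches the regular NIC datapath.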

Applications of FPGA Network Accelerator

MLE’s Network Accelerators are of particular value where network bandwidth and latency constraints are key:

Wired and Wireless Networking
Acceleration of Software-Defined Wide Area Networks (SD-WAN)
- Video Conferencing
- Online Gaming
- Industrial Internet-of-Things (IIoT)
Handling of Application Oriented Network Services
Mobile 5G User-Plane Function Acceleration
Mobile 5G URLLC Core Network Processing with TSN
Offloading OpenvSwitch (OvS), vRouter, etc

Key Benefits

The following shows the key benefits of MLE’s technology by comparing open-source SD-WAN switching in native CPU software mode against MLE’s FPGA Network Accelerator:

Compared with plain CPU software processing, MLE’s Ultra-Reliable Low-Latency Deterministic Networking raises network throughput close to Ethernet line rate, in particular for smaller packets, which reduces the need for over-provisioning within the backbone. Processing latencies can also be shortened significantly, which is important, for example, when delivering a lively audio/video conferencing experience over WAN.
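
Why smaller packets are the hard case can be seen from simple Ethernet arithmetic. The sketch below is generic and not taken from MLE's measurements: it computes the maximum frame rate a link can carry, assuming the standard per-frame wire overhead of 20 bytes (preamble, start-of-frame delimiter, and minimum inter-frame gap). The function name and the chosen frame sizes are illustrative.

```python
# Per-frame overhead on the wire: 7-byte preamble + 1-byte start-of-frame
# delimiter + 12-byte minimum inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def max_frames_per_second(link_gbps, frame_bytes):
    """Upper bound on frames per second for a given link speed and frame size."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_per_frame


if __name__ == "__main__":
    for size in (64, 512, 1518):
        rate = max_frames_per_second(10, size)
        print(f"{size:5d}-byte frames at 10 GigE: {rate / 1e6:.2f} Mpps")
```

At 10 GigE with minimum-size 64-byte frames this gives roughly 14.88 million frames per second, so the per-packet processing budget shrinks sharply as packets get smaller, which is where per-packet software overhead on a CPU is most visible.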

Availability

MLE’s FPGA Network Accelerator is available as a licensable full system stack and delivered as an integrated hardware/firmware/software solution. In close collaboration with partners in the FPGA ecosystem, MLE has ported and tested variations of the stack on a growing list of FPGA cards. Currently, this list comprises high-performance 3rd party hardware as well as MLE-designed cost-optimized hardware:

FPGA Card	Hardware Description & Features	Status
NPAC-Ketch	MLE-designed single-slot FHHL PCIe card; cost-optimized Intel Stratix 10 GX 400 FPGA; optional 4 GB DDR4 SO-DIMM attached to Programmable-Logic; 4x SFP+ (4x 10 GigE); PCIe 3.1, 8 GT/sec, x8 lanes; 50 Watts TDP; passive cooling front-to-back	Available
Alveo U280	AMD/Xilinx-designed dual-slot FHFL PCIe card; AMD/Xilinx UltraScale+ FPGA; 32 GB DDR4 DRAM plus 8 GB HBM2 DRAM; 2x QSFP28 (2x 100 GigE or 4x 25 GigE or 8x 10 GigE); PCIe 4.0, 16 GT/sec, x8 lanes; 225 Watts TDP; active cooling	Early Access
N6000-PL	Intel-designed single-slot FHHL PCIe card; Intel Agilex AGF014 F Series FPGA; 4x 4 GB DDR4 SO-DIMM attached to Programmable-Logic; 2x QSFP28 (2x 100 GigE or 4x 25 GigE or 8x 10 GigE); PCIe 4.0, 16 GT/sec, x16 lanes; ARM A53 Quad-Core CPU with 1 GB DDR4 DRAM running Linux; 125 Watts TDP; passive cooling	Early Access

Documentation

Brochure

Technical Brief


EMS & Turnkey Solutions

June 7, 2017


In close collaboration with our electronic engineering and manufacturing services partners MLE provides complete, integrated turnkey solutions including hardware / PCB design and manufacturing, systems / software / FPGA development and test.

Elemaster SpA

Elemaster (formerly CAD-UL Electronic Services GmbH) in Ulm, Germany, and MLE have a successful history of delivering FPGA-based Turnkey Systems for Automotive, Aerospace, Defense and Test & Measurement applications. Close proximity enables both teams to respond quickly to customers’ special needs.

