# Long-Term Availability with All-Programmable Systems-on-Chip

Dr. Endric Schubert, Missing Link Electronics



# **Backgrounder Endric Schubert**

- 1991 Dipl.-Ing. ET, Univ. Karlsruhe
- 1996 Dr. rer. nat. Wilhelm-Schickard-Institut f. Informatik, Univ. Tübingen
- 1996-1998 Advanced Technology Group, Exemplar Logic / Mentor Graphics, EDA for FPGA & ASIC (RTL Synthesis)
- 1999-2003 CTO, Bridges2Silicon, Inc. (EDA, System-on-Chip Debug)
- 2003-2007 CTO, ESIC Solutions (Technology Advisory for EDA & Embedded Systems Design)
- Since 2003 Guest Lecturer at Institute for Microelectronics, Univ. of Ulm
- Since 2008 CTO, Missing Link Electronics (Design Services, Configurable Systems)
- 50+ Technical presentations, inventor on 20+ patents





# **Missing Link Electronics**

- Founded 2008
- Silicon Valley HQ
- 12 Engineers in Neu-Ulm, GER
- Expertise
  - FPGA & Linux
  - I/O Connectivity, High-Speed SerDes
  - Acceleration of Algorithms & Protocols
  - Heterogeneous Multi-Core Processing



- Bob Gardner: EDA
- Endric Schubert: FPGA Technology
- Sebastian Stiemke: Automotive / Industrial
- Bob Barker: Semiconductors



# **Missing Link Electronics – Design Services**





## What We Really Do





# **Missing Link Electronics – Technology Partners**





## **IEEE Spectrum**

#### SPECIAL REPORT: 50 YEARS OF MOORE'S LAW

The glorious history and inevitable decline of one of technology's greatest winning streaks




Fifty years ago this month, Gordon Moore forecast a bright future for electronics. His ideas were later distilled into a single organizing principle—Moore's Law—that has driven technology forward at a staggering clip. We have all benefited from this miraculous development, which has forcefully shaped our modern world.

In this special report, we find that the end won't be sudden and apocalyptic but rather gradual and complicated. Moore's Law truly is the gift that keeps on giving—and surprising, as well.



- **The Multiple Lives of Moore's Law**: Why Gordon Moore's grand prediction has endured for 50 years
- **The Death of Moore's Law Will Spur Innovation**: As transistors stop shrinking, open-source hardware will have its day
- **Moore's Law Might Be Slowing Down, But Not Energy Efficiency**: Miniaturization may be tough, but there's still room to drive down power consumption in modern computers
- **Gordon Moore: The Man Whose Name Means Progress**: The visionary engineer reflects on 50 years of Moore's Law

## Moore's Law in Two Pictures

1958 IC - 1 Transistor (Jack Kilby's first IC)







## Moore's Law – 50 years and counting





## Moore's Law – 50 years and counting ... but





# Hardware vs. Software Data Processing

- Sequential processing with CPU
  - C, C++ program
- Parallel processing with logic gates
  - VHDL, Verilog "program"
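The contrast between the two styles can be illustrated with a toy model. The sketch below is not taken from the slides: it assumes one addition per step on a CPU and one layer of parallel adders per step in logic, and the function names are made up for illustration.

```python
def sequential_sum(values):
    """CPU style: one accumulator, one addition per step."""
    total, steps = values[0], 0
    for v in values[1:]:
        total += v
        steps += 1
    return total, steps


def tree_sum(values):
    """Logic style: each level of an adder tree adds all pairs at once."""
    levels = 0
    while len(values) > 1:
        # All pairwise additions in one level happen in parallel in hardware.
        values = [sum(values[i:i + 2]) for i in range(0, len(values), 2)]
        levels += 1
    return values[0], levels


data = list(range(8))
print(sequential_sum(data))  # same sum, 7 sequential steps
print(tree_sum(data))        # same sum, 3 adder-tree levels
```

Both functions return the same sum; only the number of steps differs, at the cost of more adders in the parallel version.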





Courtesy: Dr. Andre DeHon, UPenn



# **Choices for Implementing Embedded Systems**





# What is an FPGA (Programmable Logic)?

- A Field-Programmable Gate Array (FPGA) is an integrated circuit designed to be configured by the customer or designer <u>after</u> manufacturing, hence "field-programmable" (Wikipedia)
- In their simplest form FPGAs contain:
  - Configurable Logic Blocks
    - AND, OR, Invert & many other logic functions
  - Configurable interconnect
    - Enabling Logic Blocks to be connected together
  - I/O Interfaces
- With these elements an arbitrary logic design may be created
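As a rough illustration of these elements, a configurable logic block can be modeled as a small lookup table whose contents are the "configuration". This is a simplified sketch, assuming lookup-table-based logic blocks; `make_lut`, `lut_eval` and `half_adder` are hypothetical names, not vendor APIs.

```python
from itertools import product


def make_lut(func, n_inputs):
    """Return the truth table (the configuration bits) of an n-input function."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]


def lut_eval(config, *inputs):
    """Evaluate a configured block: the input bits index into the table."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return config[index]


# Two logic blocks, configured after "manufacturing".
and_lut = make_lut(lambda a, b: a & b, 2)
xor_lut = make_lut(lambda a, b: a ^ b, 2)


def half_adder(a, b):
    # The "interconnect" routes the same two inputs to both blocks.
    return lut_eval(xor_lut, a, b), lut_eval(and_lut, a, b)  # (sum, carry)


print(half_adder(1, 1))
```

Reconfiguring the design means replacing the table contents and the wiring, not the silicon.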





# FPGAs - Circa 1990

## Common Use Cases:

- Glue Logic
- Simple State Machines
- Prototyping

## Pro: Easy to use

- Logic that could be connected like LEGO blocks

## Con: Resource limited

- Many FPGAs needed to implement one single CPU





# **Convergence of Processing Systems**










2015-09

## **Multiple FPGA Vendors With Integrated CPUs**







# FPGA as All-Programmable Embedded System-on-Chip

- Programmable I/Os (LVTTL, LVDS, High-Speed SerDes)
- Programmable logic functions (State machines and dataflow)
- Programmable block interconnect (Buses and Network-on-Chip)
- Programmable Fixed-Function Processing (Ethernet MAC, Video Codecs)
- Programmable CPUs (for software processing with or w/o Operating Systems)





# **FPGA Vendors Fully Support Embedded and Safety**





## **FPGA: Software-Defined Printed Circuit Boards**



Integrating individual microcontroller, DSP, ADC/DAC and I/O controller devices into one single FPGA-based System-on-a-Chip gives more flexibility for hardware changes without re-spinning a new PCB.



# **Programmable Connectivity**

Different I/O Standards & Protocols

- Single-ended LVTTL
- Differential LVDS
- High-Speed SerDes Transceivers
- Multi-Protocol
  - SPI, IIC, MMC, ...
  - CAN, LIN, FlexRay, ...
  - PCIe Gen 1, Gen 2, Gen 3, Gen 4
  - SATA-6G, SAS-12G, UFS-12G, ...
  - USB, Ethernet, ...
- Analog I/O via hard blocks or Soft Analog Sigma-Delta Modulators





# **FPGA Performance Chart**

#### Data from <http://www.xilinx.com/products/silicon-devices/fpga.html>

#### FPGA Comparison Table

|                                                 | Kintex-7 | Virtex-7 | Kintex UltraScale | Kintex UltraScale+ | Virtex UltraScale | Virtex UltraScale+ |
|-------------------------------------------------|----------|----------|-------------------|--------------------|-------------------|--------------------|
| Logic Cells (K)                                 | 478      | 1,955    | 1,161             | 915                | 4,433             | 2,863              |
| UltraRAM (Mb)                                   | -        | -        | -                 | 36.0               | -                 | 432.0              |
| Block RAM (Mb)                                  | 34       | 68       | 76                | 34.5               | 132.9             | 94.5               |
| DSP Slices                                      | 1,920    | 3,600    | 5,520             | 3,528              | 2,880             | 11,904             |
| DSP Performance (symmetric FIR, GMACs)          | 2,845    | 5,335    | 8,180             | 6,287              | 4,268             | 21,213             |
| Transceiver Count                               | 32       | 96       | 64                | 76                 | 120               | 128                |
| Maximum Transceiver Speed (Gb/s)                | 12.5     | 28.05    | 16.3              | 32.75              | 30.5              | 32.75              |
| Total Transceiver Bandwidth (full duplex, Gb/s) | 800      | 2,784    | 2,086             | 2,478              | 5,886             | 8,384              |
| Memory Interface (DDR3, Mb/s)                   | 1,866    | 1,866    | 2,133             | 2,133              | 2,133             | 2,133              |
| Memory Interface (DDR4, Mb/s)                   | -        | -        | 2,400             | 2,667              | 2,400             | 2,667              |
| PCI Express®                                    | x8 Gen2  | x8 Gen3  | x8 Gen3           | x8 Gen4, x16 Gen3  | x8 Gen3           | x8 Gen4, x16 Gen3  |
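For three of the columns, the total transceiver bandwidth row can be reproduced from the two rows above it as count × maximum speed × 2 (both directions). The sketch below only checks that arithmetic; the function name is made up, and the explanation for the other columns is an assumption, not something the table states.

```python
def full_duplex_bandwidth_gbps(transceivers, max_speed_gbps):
    """Aggregate bandwidth if every transceiver runs at the maximum speed."""
    return transceivers * max_speed_gbps * 2  # x2 for transmit plus receive


kintex7 = full_duplex_bandwidth_gbps(32, 12.5)
kintex_ultrascale = full_duplex_bandwidth_gbps(64, 16.3)
virtex_ultrascale_plus = full_duplex_bandwidth_gbps(128, 32.75)

print(kintex7, round(kintex_ultrascale), virtex_ultrascale_plus)

# For Virtex-7, the simple product exceeds the table value of 2,784 Gb/s.
# Assumption: not every transceiver on that device reaches the maximum speed.
virtex7_upper_bound = full_duplex_bandwidth_gbps(96, 28.05)
```

The same caveat applies to the Kintex UltraScale+ and Virtex UltraScale columns, where the table value is also below the simple product.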



# Power vs. Performance of All-Programmable SoCs

### SoC FPGA as (yet) another computer

|         | Intel i7-4770 | Xilinx Zynq 7045                |
|---------|---------------|---------------------------------|
| Compute | ~100 GFLOPS   | 5 GFLOPS (PS), 778 GFLOPS (PL)  |
| TDP     | 84 W          | <20 W (typ)                     |

SoC FPGA has 4x more compute with ¼ the power dissipation!
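The claim can be checked against the table's own figures. The sketch below uses the peak values as listed and takes 20 W as the upper bound for the Zynq; the variable names are illustrative.

```python
# Figures from the table above (peak compute, thermal design power).
i7_gflops, i7_watts = 100.0, 84.0
zynq_gflops = 5.0 + 778.0   # processing system (PS) plus programmable logic (PL)
zynq_watts = 20.0           # table says "<20 W (typ)", so this is an upper bound

compute_ratio = zynq_gflops / i7_gflops   # how much more compute
power_ratio = zynq_watts / i7_watts       # fraction of the power

i7_efficiency = i7_gflops / i7_watts          # GFLOPS per watt
zynq_efficiency = zynq_gflops / zynq_watts    # GFLOPS per watt

print(f"compute: {compute_ratio:.1f}x, power: {power_ratio:.2f}x")
print(f"efficiency: {i7_efficiency:.1f} vs {zynq_efficiency:.1f} GFLOPS/W")
```

With these numbers the "4x compute at ¼ the power" statement holds with margin, since the compute ratio comes out well above 4 and the power ratio just under 0.25.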



Source: http://www.xilinx.com/products/technology/dsp.html



# **Predictable Architectures For Higher Performance**

The current architecture limits maximum performance to the total DMA bandwidth.

Separating control flow from dataflow allows higher bandwidth: FPGA-based inline processing integrates the NIC into the FPGA fabric.





# **Coprocessors Enable Acceleration of Your Software**

- Direct connection between processor and a soft coprocessor
- Provides offloading of processing tasks
- Enables dramatic performance improvements





# **Accelerator Option: Attached as a Slave**

- Pro: Simple System Architecture, Simple Register Interface
- Con: Limited communication bandwidth






# Accelerator Option: Attached as a Master (High Performance Port to Memory)

- Pro: High Data Bandwidth
- Con: Increased Design Complexity, Increased Latency





# Accelerator Option: Attached as a Master (Coherent DMA to Level-2 Cache)

- Pro: Low latency, high data bandwidth for short bursts
- Con: Increased design complexity, adverse caching effects on SW





# Accelerator Option: ARM Built-in NEON Engine

- 4096-point complex FFT, 32-bit floating point
  - ARM processor alone: 830 µs
  - NEON SIMD engine: 571 µs
  - Hardware in PL fabric: 129 µs

## 45% FFT Acceleration Using NEON Instructions and ARM NE10 DSP Library

## **6.4x FFT Acceleration Using ACP Attached Accelerator**
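Both headline figures follow directly from the three run times listed above. A short sketch of the arithmetic (the function name is illustrative):

```python
def speedup(baseline_us, accelerated_us):
    """Ratio of baseline run time to accelerated run time."""
    return baseline_us / accelerated_us


arm_us, neon_us, pl_us = 830.0, 571.0, 129.0

neon_speedup = speedup(arm_us, neon_us)   # NEON versus plain ARM
pl_speedup = speedup(arm_us, pl_us)       # PL fabric versus plain ARM

print(f"NEON: {(neon_speedup - 1) * 100:.0f}% faster")
print(f"PL fabric: {pl_speedup:.1f}x faster")
```

The 45% figure is the NEON speedup expressed as a percentage gain over the plain ARM run time, while the 6.4x figure is the direct run-time ratio for the hardware implementation.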



# FPGAs Today (2015)

Common Use Cases:

- Complete Embedded Processing in Integrated Systems-on-Chip
- High-Performance Computing, DSP, Terabit Packet processing

Pro: Lots of resources

- Many CPUs fit into one single FPGA

Con: Expert programming skills needed

- I/O standards & protocols in High-Speed SerDes, HW-SW-Interfaces for parallel processing





# **Design and Verification for FPGAs - I/O Programming**

| Setting               | GTH_X1Y12       | GTH_X1Y13       | GTH_X1Y14       | GTH_X1Y15       |
|-----------------------|-----------------|-----------------|-----------------|-----------------|
| MGT Link Status       | 2.996 Gbps      | 3.0 Gbps        | 2.996 Gbps      | 3.0 Gbps        |
| PLL Status            |                 | CPLL LOCKED     | CPLL LOCKED     | CPLL LOCKED     |
| Loopback Mode         | Near-End PCS    | Near-End PCS    | Near-End PMA    |                 |
| TX Diff Output Swing  | 250 mV (0000)   | 250 mV (0000)   | 250 mV (0000)   | 250 mV (0000)   |
| TX Pre-Cursor         | 0.00 dB (00000) | 0.00 dB (00000) | 0.00 dB (00000) | 0.00 dB (00000) |
| TX Post-Cursor        | 0.00 dB (00000) | 0.00 dB (00000) | 0.00 dB (00000) | 0.00 dB (00000) |
| Termination Voltage   | GND             | GND             | GND             | GND             |
| RX Common Mode        |                 | 800 mV          | 100 mV          | 800 mV          |
| TX Data Pattern       | PRBS 7-bit      | PRBS 7-bit      | PRBS 7-bit      | PRBS 7-bit      |
| RX Data Pattern       | PRBS 7-bit      | PRBS 7-bit      | PRBS 7-bit      | PRBS 7-bit      |
| RX Bit Error Ratio    | 2.379E-002      | 2.354E-012      | 2.379E-002      | 2.889E-011      |
| RX Received Bit Count | 4.799E011       | 4.247E011       | 3.869E010       | 3.462E010       |
| RX Bit Error Count    | 1.142E010       | 0.000E000       | 9.203E008       | 0.000E000       |
| TXUSRCLK Freq (MHz)   | 93.77           | 93.77           | 93.77           | 93.77           |
| TXUSRCLK2 Freq (MHz)  | 93.77           | 93.77           | 93.77           | 93.77           |
| RXUSRCLK Freq (MHz)   | 93.65           | 93.77           | 93.65           | 93.77           |
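The bit error ratio row is the error count divided by the received bit count. For the two channels with zero errors, the displayed ratio matches 1 / received bits, which reads like an upper bound reported when no error has been seen yet; that interpretation is inferred from the numbers, not stated on the slide. A sketch with a hypothetical helper:

```python
def bit_error_ratio(error_count, bit_count):
    """Errors per received bit; with zero errors, report the 1/N bound."""
    return max(error_count, 1) / bit_count


# (bit errors, received bits) per channel, taken from the table above.
channels = {
    "GTH_X1Y12": (1.142e10, 4.799e11),
    "GTH_X1Y13": (0.0, 4.247e11),
    "GTH_X1Y14": (9.203e8, 3.869e10),
    "GTH_X1Y15": (0.0, 3.462e10),
}

ber = {name: bit_error_ratio(e, n) for name, (e, n) in channels.items()}
for name, value in ber.items():
    print(f"{name}: {value:.3E}")
```

The two channels with errors sit at roughly 2.4% bit errors and are the same two whose line rate and RXUSRCLK frequency deviate from the other pair.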



# **Design and Verification for FPGAs - I/O Verification**

### **Bit Error Ratio**





# **Design and Verification for FPGAs – Digital Logic Design**

- Typically, hardware description languages (HDLs) such as Verilog and VHDL are used.
- The designer must describe all four dimensions: functionality, structure, parallelism, and timing



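```vhdl
-- Free-running counter: increments on every rising clock edge.
ENTITY counter IS
  PORT(count_val: OUT integer;
       clk: IN bit);  -- the clock is an input; type bit needs no extra library
END ENTITY counter;

ARCHITECTURE proc OF counter IS
  SIGNAL cnt: integer := 0;  -- explicit start value for simulation
BEGIN
  p: PROCESS
  BEGIN
    -- Timing: suspend until the next rising edge of clk.
    WAIT UNTIL clk'EVENT AND clk = '1';
    cnt <= cnt + 1;
  END PROCESS p;
  -- Parallelism: this concurrent assignment runs alongside process p.
  count_val <= cnt;
END ARCHITECTURE proc;
```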



# **High-Level Synthesis Design Flow for SoC FPGA**

• Input C/C++/SystemC into High-Level Synthesis to generate VHDL/Verilog code





# Working Principles of High-Level Synthesis

• Design automation runs scheduling and resource allocation to generate RTL code comprising data path plus state machines for control.
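As a toy illustration of scheduling and resource allocation, the sketch below schedules each operation of `y = (a * b) + (c * d)` as soon as its inputs are ready (ASAP scheduling, one simple strategy among many) and then counts how many functional units of each kind are needed. It is not how any particular HLS tool works; all names are hypothetical.

```python
from collections import Counter


def asap_schedule(deps):
    """Assign each operation the earliest control step after its predecessors."""
    steps = {}

    def visit(op):
        if op not in steps:
            steps[op] = 1 + max((visit(p) for p in deps[op]), default=0)
        return steps[op]

    for op in deps:
        visit(op)
    return steps


def allocate(steps, kinds):
    """Units needed per kind = the most operations of that kind in any one step."""
    per_step = Counter((steps[op], kinds[op]) for op in steps)
    units = {}
    for (_, kind), count in per_step.items():
        units[kind] = max(units.get(kind, 0), count)
    return units


# Dataflow graph: operation -> operations it depends on.
deps = {"mul1": [], "mul2": [], "add": ["mul1", "mul2"]}
kinds = {"mul1": "mul", "mul2": "mul", "add": "add"}

schedule = asap_schedule(deps)
units = allocate(schedule, kinds)
print(schedule, units)
```

The control steps correspond to states of the generated state machine, and the unit counts to the data path resources; a tool could trade one for the other, for example by sharing a single multiplier across two steps.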





# **Benefits of High-Level Synthesis**

- Automatic performance optimization via parallelization at dataflow level
- Automatic interface synthesis and code generation for a variety of real-life HW/SW connectivity

Interface synthesis table: interface modes versus argument types. Argument types are variable (pass by value), pointer variable, array, and reference variable (each pass by reference), each as input (I), inout (IO), or output (O). Bus interfaces: Stream, Lite, Master. D marks the default mode.

| Interface mode | Default (D) for                                      |
|----------------|------------------------------------------------------|
| ap_none        | Input of variables, pointer and reference variables  |
| ap_stable      | -                                                    |
| ap_ack         | -                                                    |
| ap_vld         | Output of pointer and reference variables            |
| ap_ovld        | Inout of pointer and reference variables             |
| ap_hs          | -                                                    |
| ap_memory      | Arrays (input, inout, output)                        |
| ap_fifo        | -                                                    |
| ap_bus         | -                                                    |
| ap_ctrl_none   | -                                                    |
| ap_ctrl_hs     | Output of variables (pass by value)                  |
| ap_ctrl_chain  | -                                                    |



# Modern FPGAs Enable On-Chip-Debug and Verification

- FPGA is not the DUT!
- FPGA can be the DUT plus the TestBench plus extra on-chip debug
- With on-chip logic analyzers, or on-chip custom debug circuitry, you can analyze and fix your DUT without messy extra hardware setups!





# Agile Design and Verification for Modern FPGAs

| Abstraction Layer             | Example                                | Design                                             | Verification                                 |
|-------------------------------|----------------------------------------|----------------------------------------------------|----------------------------------------------|
| Board Level                   | PCB, chipsets, interfaces, media, etc. | PCB, System Design                                 | Rapid Prototyping, In-System Debugging       |
| Electronic System Level (ESL) |                                        |                                                    | SystemC models, Bus Functional Models        |
| Functional Blocks             | H.264, FEC, AES                        | In-house or 3rd party IP-Core, High-Level Synthesis | Debug, High-Level Simulation, Co-Simulation  |
| Digital Logic                 | FSM, control- and dataflow             | VHDL, Verilog, SystemVerilog                       | RTL Simulation, Logic Analyzer               |
| I/O                           | LVTTL, LVDS, MGT                       | VHDL, Verilog, Dynamic Reconfiguration Ports       | Eye diagrams, Network Analyzer, Oscilloscope |



# **Contact Information**

Missing Link Electronics www.MLEcorp.com

Endric Schubert, endric@MLEcorp.com

- Phone US: +1 (408) 320-6139
- Phone DE: +49 (731) 141149-66



