Please fill in the form below, so we can support you in your request.
Please fill in the form below, so we can support you in your request.

    ASICAMD (Xilinx)AchronixIntel (Altera)LatticeMicrochip (MicroSemi)Other

    We are glad that you preferred to contact us. Please fill our short form and one of our friendly team members will contact you back.

      ASICAMD (Xilinx)AchronixIntel (Altera)LatticeMicrochip (MicroSemi)Other


      Molex Mini-Fit Jr – Know your Power, and Don’t Get Confused – “PCIe” and “Xilinx Not PCIe” Power Connector

      Our Mission: If It Is Packets, We Make It Go Faster – today more on PCIe, this time, Powering PCIe Cards with FPGAs…

      And with packets we mean: Networking using TCP/UDP/IP over 10G/25G/50G/100G Ethernet; PCI Express (PCIe), CXL, OpenCAPI; data storage using SATA, SAS, USB, NVMe; video image processing using HDMI, DisplayPort, SDI, FPD-III.

      To move packets you need power, but how much can you draw from an interface?

      PCI Express Cards Consumption Limit

      PCIe has been around for a long time, since 2003, and many people know the maximum power draw of 75W per PCIe slot, but is it that simple? The short answer is no, but let’s have a look into the spec. It states that all cards can consume up to 3A on the 3.3 V power rail with the following restrictions applying:

      • x1 cards: 0,5A on 12V, but overall consumption limit is 10W
      • x4 – x16 cards: 2,1A on 12V, but overall consumption limit is 25W 

      But where does the 75W come from? Well, exceptions apply:

      High power devices can draw more power after the initialization and configuration:

      • x1 cards can consume 25W 
      • x16 cards can consume 75W

      Using Additional Connectors for Higher PCIe Card Consumption

      With additional connectors (6pin and 8 pin) cards can consume up to 300W. A 6 pin adds 75W while a 8 pin adds 150W.

      But be careful, in the FPGA world there are two versions of the 6/8 pin Molex Mini-Fit Jr connectors, the Xilinx and the PCIe option. 

      The 6 Pin option has 2 pins (3 and 4) swapped. If you mix up the connectors 12V will directly connect to Ground, this can destroy your device.

      6 Pin Power Connector
      Pin PCIe Spec AMD/Xilinx Dev Boards
      1 +12 V +12 V
      2 +12 V +12 V
      3 +12 V Ground
      4 Ground +12 V
      5 Sense Sense
      6 Ground Ground

      The 8 Pin option the changes are less subtle, all pins have a different function and cause an electrical short.

      8 Pin Power Connector
      Pin PCIe Spec AMD/Xilinx Alveo Cards
      1 +12 V Ground
      2 +12 V Ground
      3 +12 V Ground
      4 Sense1 Ground
      5 Ground +12 V
      6 Sense0 +12 V
      7 Ground +12 V
      8 Ground +12 V

      A Deep Dive into AMD/Xilinx AXI Bridge for PCI Express (AMD/Xilinx PG194) and Why We Tweaked C_M_AXI_NUM_READQ

      Executive Summary

      AMD/Xilinx’ AXI Bridge for PCI Express (PG194) implements a bi-directional communication channel from and to FPGA internal memory mapped AXI4 masters and slaves to and from external PCIe connected memory mapped devices, with the FPGA operating as PCIe endpoint or root port.

      In many scenarios the performance of forwarding communication between the two protocols, AXI4 and PCIe, is sufficient and the AMD/Xilinx IP core can be used as well. However, in certain cases tweaking is necessary to achieve the expected throughput.

      Depending on the amount of extra performance required the modification ranges from simply tuning a hidden parameter to patching the IP’s HDL sources. In the example project used for this description the PCIe peer to peer (P2P) write performance from an FPGA to a 12 NVMe SSD RAID0 increased from 2.700 MiB/s to 4.900 MiB/s to 8.600 MiB/s.

      AMD/Xilinx AXI Bridge for PCI Express Overview

      The AMD/Xilinx AXI Bridge for PCI Express is implemented differently for different AMD/Xilinx FPGA families. This description focuses on the “AMD/Xilinx DMA//Bridge Subsystem for PCI Express in AXI Bridge mode” implementation as found in AMD/Xilinx UltraScale+ devices.

      The bridge IP core comprises the actual bridging logic, converting PCIe TLPs to AXI4 transactions and vice versa, and a PCIe IP core implementing the physical PCIe interface and forwarding PCIe TLPs via multiple AXI4-Stream interfaces. The PCIe IP core for Ultrascale+ is described in PG213. The bridging logic has two main and distinct blocks, the so-called Slave Bridge and the so-called Master Bridge. The Slave Bridge receives AXI4 memory mapped transactions from FPGA internal AXI4 master IP cores and converts them to PCIe request TLPs, operating as a PCIe bus master or “Requester”. The Master Bridge on the other hand receives PCIe request TLPs as a “Completer” and converts them to AXI4 memory mapped transactions targeting FPGA internal AXI4 slave IP cores.

      For this description we will only look at the Master Bridge and specifically only at PCIe read request TLPs resulting in AXI4 reads initiated by the AMD/Xilinx AXI Bridge for PCI Express. The AMD/Xilinx PG213 PCI Express IP interfaces involved are the two AXI4-Stream interfaces CQ (Completer reQuest) and CC (Completer Completion). Remember, for these kinds of transfers the FPGA is the Completer of PCI traffic, and it first receives a PCIe Read Request TLP on CQ, and then responds with a PCIe Completion TLP on CC.

      The CQ interface of the AMD/Xilinx PG213 PCIe IP core has some sideband signals independent of the AXI4-Stream interface to handle the flow of TLPs. This is necessary to cope with some of the PCIe ordering rules. Specifically PG213 allows connected logic, the AMD/Xilinx PG194 AXI bridge logic in this case, to hold off any PCIe Read Request TLPs (called non-posted requests or np) and instead receive PCIe Write Request TLPs that arrived at the PCIe core later but can now skip the line and move forward in the receive queue. The AMD/Xilinx PG213 PCIe IP core implements a flow control mechanism at its AXI4-Stream CQ interface to achieve this. This flow control is not to be confused with PCIe flow control, and the credits mentioned in this context are not PCIe flow control credits, although the concept is very similar.

      The AXI4-Stream CQ slave, the Master Bridge logic in this case, can provide up to 32 credits for non-posted requests to the AMD/XIlinx PG213 PCIe IP Core. Each non-posted request consumes one credit when sent over the CQ interface. And each assertion of the pcie_cq_np_req two-bit signal grants one or two credits depending on the signal’s value. The AMD/Xilinx PG213 PCIe IP Core maintains an internal counter named pcie_cq_np_req_count to track the number of available credits. As mentioned, each non-posted request decrements the counter and pcie_cq_np_req assertions increment it. The counter can assume values of 0 to 32. If the counter hits 0, no non-posted requests are sent over the CQ interface any more and the AMD/Xilinx PG213 PCIe IP Core instead provides posted requests if available.

      Performance Limitations in certain Scenarios

      The AMD/Xilinx PG194 Master Bridge by default only provides a maximum of 2 credits for non-posted requests to the AMD/Xilinx PG213 PCIe IP Core. These two credits are initially granted as a baseline and then it seems that there is one credit grant following each non-posted request reception. Since there is latency involved in the loop from request reception to granting the credit to the AMD/Xilinx PG213 incrementing the credit counter and then to finally providing another non-posted request based on the counter update, this basically implements a non-posted request rate limit. In combination with small non-posted requests this can significantly limit the achievable bandwidth. In one example of 12 NVMe SSDs RAID0 reading from the FPGA, bandwidth was limited to 2.700 MiB/s for a PCIe Gen3 x16 link, which is far less than expected in this setup.

       In the below screenshot of the Vivado ILA debugger it can be seen that two read transactions follow each other closely, are then followed by a pause, and then another two close by read transactions. This pattern continues and obviously the AXI4 possible bandwidth (matching the PCIe bandwidth) is not fully utilized.

      Looking at the simulation of this same setup, the root cause can be seen. The AMD/Xilinx PG194 Master Bridge tries to maintain a pcie_cq_np_req credit level of 2, as mentioned above. These two credits are rapidly consumed by two consecutive PCIe non-posted read requests. After receiving each of them the AMD/Xilinx PG194 Master Bridge grants a new credit soon thereafter by asserting pcie_cq_np_req, but the latency of the path described above until the next request is provided is just too high to prevent a long pause. The pcie_cq_np_req_counter drops to 0 and transfers are throttled.

      The setting defining the pcie_cq_np_req credit level initially granted and maintained by the AMD/Xilinx PG194 Master Bridge is configured by the hidden AMD/Xilinx PG194 configuration parameter C_M_AXI_NUM_READQ (not to be confused with C_M_AXI_NUM_READ). As mentioned, it defaults to 2. This is also the only value officially supported by AMD/Xilinx (which is why the parameter is hidden). However, one other value, 4, can be set via TCL. Doing so bumps the NVMe SSD RAID0 bandwidth to 4.900 MiB/s.

      set_property CONFIG.C_M_AXI_NUM_READQ 4 [get_ips xdma_0]

      A further increment is not possible via TCL. Instead, the HDL has to be patched to use a value outside the TCL allowed range. An increment to 8 increased the bandwidth of our example setup to 8.600 MiB/s, which is very close to the expected, as the pattern generator providing the SSDs DMA data is limited to 9.000 MiB/s.

      set hdl_synth_top \
      [get_files -all -used_in synthesis \
      -compile_order sources \
      -of [get_ips xdma_0]]
      package require fileutil
      ::fileutil::updateInPlace $hdl_synth_top \
      [list string map \
      [list {.C_M_AXI_NUM_READQ(2),} \

      Learn more about our capabilities in PCI Express (PCIe) Connectivity to achieve extra performance.

      Stop Wobbling Around – A Salute From The MLE Embedded Lab

      Our Mission: If It Is Packets, We Make It Go Faster – today, how to properly mount those large FPGA-based PCIe Cards … theory and practice!

      Everyone working with evaluation hardware knows the struggles: On its own it works, but when you assemble an entire setup you have many loosely or weakly connected parts. This easily can damage a part or much worse, you spend an entire day debugging. 

      To overcome this issue, MLE sometimes gets creative and designs additional carriers to securely mount extension cards or additional hardware. 

      As a small example, the following images show a 3D Model of how we securely mount additional Hardware to the Xilinx ZCU102:  

      • SD-Mux, which we use to swap SD-Card images
      • PCIe Card

      Here’s the theory:

      The stabilizations for PCIe and SD-Mux are 3D printed by our engineers (yes they have some neat toys at home). This helps MLE to have a clean stable lab setup.

      In this case we tested a SAS 12G storage controller on the Xilinx ZCU102.  

      And the first prototype looks like this:

      If you want to enhance your Lab as well, we give you a kickstart. You can find our printable 3D Models at thingiverse:

       What do you do to make your work easier/better in the lab?

        Learn more about our PCIe IP core offerings.