CXL™ Use-cases Driving the Need For Low Latency Performance Retimers
CXL™ is the new open interface standard for latency-sensitive processing in the data center. This article highlights the applications and use cases for moving data at 32 GT/s.
As the high-performance computing demands of data center workloads increase, a new class of interconnect standard and a new ultra-low-latency signal transmission technology are required to advance performance in Artificial Intelligence (AI), Machine Learning (ML), Advanced Driver Assistance Systems (ADAS) and other computational workloads. Data center operators are deploying an array of heterogeneous elements, including accelerators, for these applications. Heterogeneous computing uses multiple types of processors that share memory and other resources efficiently. Compute Express Link™ (CXL™) is a new open standard that delivers memory coherency and resource sharing capabilities as an overlay on top of the PCIe® Gen 5.0 physical layer, and it will find initial deployment between high-speed CPUs and other devices. CXL delivers a high-speed, low-latency link between components such as CPU, GPU, SoC (System on Chip), FPGA (Field Programmable Gate Array), memory and Network Interface Card (NIC) components. CXL is poised for rapid industry adoption because it leverages the PCIe physical and electrical interface that is pervasive in servers and components today.
The CXL 1.1/2.0 standard defines three protocols that enable ground-breaking new use models: CXL.mem, CXL.cache, and CXL.io (see Figure 1). CXL.cache enables an accelerator, such as a GPU or FPGA, to coherently cache host memory locally, perform value-added functions, and then return ownership of that memory to the CPU for further processing. CXL.mem enables CXL-attached memory to appear to software in the same way as DDR-attached memory by providing low-latency, coherent memory semantics. CXL.io is mandatory for all attached CXL components and replicates much of the standard PCIe stack, including dynamic framing and transaction layer packets/data link layer packets, but encapsulated in CXL format. CXL breaks new ground in providing access to the CPU memory subsystem with load/store semantics in a coherent and high-speed manner. Prior to CXL, accelerators had to interrupt the CPU and access the CPU’s DDR memory through the CPU’s IO MMU, with much higher latency and overhead. Typical CXL access times to CPU memory resources are therefore reduced to DRAM-type access times in the tens of nanoseconds (ns) range, as opposed to the tens of milliseconds (ms) range. This technology breakthrough supports the type of real-time data processing required by the data center workloads described previously.
While PCIe Gen 5 delivers the data rates necessary for these workloads, running at 32 Gbps is also a significant challenge at the physical layer. At this data rate, trace length and connectors can degrade signal quality to the point where bit error rates become excessively high. The designer must then either use very expensive board materials and additional layers or add a PCIe Gen 5 retimer at a manageable cost. Any additional component in a high-speed CXL data path, however, can introduce latency and lead to a significant degradation in workload performance.
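To put the 32 Gbps figure in perspective, the following minimal sketch works out the duration of a single bit on the wire and the resulting per-direction bandwidth for common link widths. It assumes the 128b/130b line encoding used by PCIe 5.0 and ignores all protocol overhead (packet headers, flow control, CXL flit framing), so the results are upper bounds. The function and constant names are illustrative only.

```python
# Rough link arithmetic for a 32 GT/s PCIe 5.0 / CXL lane (illustrative only).
TRANSFER_RATE_GT_S = 32          # per-lane transfer rate quoted in the article
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding used by PCIe 5.0


def unit_interval_ps(rate_gt_s: float) -> float:
    """Duration of one bit on the wire, in picoseconds."""
    return 1_000 / rate_gt_s


def raw_bandwidth_gb_s(rate_gt_s: float, lanes: int) -> float:
    """Per-direction bandwidth in GB/s after line encoding, before protocol overhead."""
    return rate_gt_s * lanes * ENCODING_EFFICIENCY / 8


print(f"unit interval: {unit_interval_ps(TRANSFER_RATE_GT_S):.2f} ps")
for lanes in (4, 8, 16):
    bandwidth = raw_bandwidth_gb_s(TRANSFER_RATE_GT_S, lanes)
    print(f"x{lanes}: {bandwidth:.1f} GB/s per direction")
```

A unit interval of roughly 31 ps leaves very little margin for loss and jitter, which is why trace length and connectors matter so much at this rate.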
Low Latency Retimer to the Rescue
Microchip’s XpressConnect™ retimer is the ideal solution to meet the requirements of latency-sensitive PCIe Gen 5.0 / CXL 1.1/2.0 applications. PCIe 5.0/CXL retimers restore and extend the reach of 32 Gbps signals. Retimers use clock-data-recovery (CDR) to sample the equalized PCIe signal and recover clock and data. Retimers therefore reset any jitter in the received signal and compensate for frequency-dependent channel loss by opening the eye of the waveform. A retimer is an additional component in the data path, so low-latency operation is a vital characteristic of any retimer. Microchip’s low-latency XpressConnect retimer delivers on this requirement. XpressConnect retimers can triple the reach of PCIe 5.0/CXL 1.1 signals and have a latency that is 80% lower than the PCIe specification requirement, with pin-to-pin latency of less than 10 ns.
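As a rough illustration of why retimer latency matters, the sketch below estimates the extra round-trip delay added by retimers in a link. It uses the article's 10 ns pin-to-pin figure as an upper bound and assumes that a request and its response each cross every retimer once. The 100 ns baseline access time is a hypothetical value chosen only to make the percentage concrete; it is not a measured or specified number.

```python
# Illustrative latency budget; only the 10 ns figure comes from the article.
RETIMER_LATENCY_NS = 10.0  # article's upper bound: "less than 10ns" pin to pin


def added_round_trip_ns(retimers_in_path: int,
                        per_retimer_ns: float = RETIMER_LATENCY_NS) -> float:
    """Extra delay when a request and its response each cross every retimer once."""
    return 2 * retimers_in_path * per_retimer_ns


def relative_overhead(base_access_ns: float, added_ns: float) -> float:
    """Added delay as a fraction of the baseline access time."""
    return added_ns / base_access_ns


BASE_ACCESS_NS = 100.0  # hypothetical baseline memory access time (assumption)

for count in (1, 2):
    added = added_round_trip_ns(count)
    share = relative_overhead(BASE_ACCESS_NS, added)
    print(f"{count} retimer(s): +{added:.0f} ns round trip, "
          f"{share:.0%} of a {BASE_ACCESS_NS:.0f} ns access")
```

Because the added delay scales linearly with per-retimer latency, any reduction in that latency carries straight through to the memory access time seen by the workload.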
Figure 2 illustrates the CXL standard’s three use cases, Type 1, Type 2, and Type 3, where a low-latency PCIe 5.0/CXL retimer can help to maintain signal integrity. The Type 1 use case is a caching device or accelerator, such as a Partitioned Global Address Space (PGAS) network interface card. The Type 2 use case is an accelerator with memory, as used in a GPU accelerator application. The Type 3 use case illustrates how a CXL memory buffer supports memory bandwidth and capacity expansion.
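The relationship between the three device types and the three protocols can be summarized in a small lookup table. The sketch below follows the CXL device type definitions (CXL.io on every device, CXL.cache on Types 1 and 2, CXL.mem on Types 2 and 3). The names `Protocol`, `DEVICE_TYPES` and `protocols_for` are illustrative, not part of any CXL software interface.

```python
from enum import Enum


class Protocol(Enum):
    IO = "CXL.io"
    CACHE = "CXL.cache"
    MEM = "CXL.mem"


# Device type -> (example from the article, protocols the type uses).
DEVICE_TYPES = {
    "Type 1": ("caching device or accelerator, e.g. a PGAS NIC",
               {Protocol.IO, Protocol.CACHE}),
    "Type 2": ("accelerator with memory, e.g. a GPU",
               {Protocol.IO, Protocol.CACHE, Protocol.MEM}),
    "Type 3": ("memory buffer for bandwidth and capacity expansion",
               {Protocol.IO, Protocol.MEM}),
}


def protocols_for(device_type: str) -> list[str]:
    """Return the protocol names used by a device type, sorted alphabetically."""
    _, protocols = DEVICE_TYPES[device_type]
    return sorted(p.value for p in protocols)


for name, (example, _) in DEVICE_TYPES.items():
    print(f"{name} ({example}): {', '.join(protocols_for(name))}")
```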
CPU devices supporting PCIe Gen 5 and CXL 1.1 auto-negotiate on a per-port basis to either the PCIe 5.0 protocol or the CXL protocol during link training. This enables end users to plug standard PCIe 5.0-only add-in cards or CXL add-in cards into the same CEM connector, as shown in Figure 3. A PCIe 5.0/CXL low-latency retimer helps preserve signal quality across a wide range of large and complex topologies.
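The outcome of this per-port negotiation can be modeled as a simple decision: the link runs CXL only when both ends support it, and otherwise falls back to PCIe 5.0. The sketch below is a deliberately simplified model of that decision. The real negotiation is carried out in hardware during link training, and the `PortCapabilities` type and `negotiate_protocol` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortCapabilities:
    """Hypothetical capability summary for one end of a link."""
    supports_pcie5: bool = True
    supports_cxl: bool = False


def negotiate_protocol(host: PortCapabilities, card: PortCapabilities) -> str:
    """Pick the protocol for one port: CXL if both ends support it, else PCIe 5.0."""
    if host.supports_cxl and card.supports_cxl:
        return "CXL"
    if host.supports_pcie5 and card.supports_pcie5:
        return "PCIe 5.0"
    raise ValueError("no common protocol between host port and add-in card")


host_port = PortCapabilities(supports_pcie5=True, supports_cxl=True)
pcie_only_card = PortCapabilities(supports_pcie5=True, supports_cxl=False)
cxl_card = PortCapabilities(supports_pcie5=True, supports_cxl=True)

print(negotiate_protocol(host_port, pcie_only_card))  # PCIe-only card in the slot
print(negotiate_protocol(host_port, cxl_card))        # CXL card in the same slot
```

In this model the same host port serves either kind of card, which mirrors the shared CEM connector described above.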
In summary, PCIe Gen 5/CXL system applications can use Microchip’s XpressConnect PCIe 5.0/CXL low-latency retimers to extend reach and maintain signal integrity for heterogeneous processing in data center server and storage disaggregation applications. These memory and accelerator applications require low-latency retimers for maximum performance.