Report Files
----------------------

LegUp Report
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After compiling software to hardware, LegUp generates a summary report file (``reports/summary.legup.rpt``) that gives insight into the generated RTL circuit, such as the top-level module interface, scheduling information, and memory usage.
The following sections explain each part of the report file.

RTL Interface Section
++++++++++++++++++++++

The RTL Interface section shows the interfaces used by the top-level module. Below is an example of the RTL interface report table:

.. code-block:: text

    +---------------------------------------------------------------------------------------------+
    |                              RTL Interface Generated by LegUp                               |
    +--------------+-------------------+--------------------+------------------+------------------+
    | C++ Name     | Interface Type    | Signal Name        | Signal Bit-width | Signal Direction |
    +--------------+-------------------+--------------------+------------------+------------------+
    |              | Control           | clk                | 1                | input            |
    |              |                   | finish             | 1                | output           |
    |              |                   | ready              | 1                | output           |
    |              |                   | reset              | 1                | input            |
    |              |                   | start              | 1                | input            |
    +--------------+-------------------+--------------------+------------------+------------------+
    | input_a_fifo | Input AXI Stream  | input_a_fifo_ready | 1                | output           |
    |              |                   | input_a_fifo_valid | 1                | input            |
    |              |                   | input_a_fifo       | 16               | input            |
    +--------------+-------------------+--------------------+------------------+------------------+

The table shows the interface for each top-level function argument (or global variable accessed by both the SW testbench and the top-level function).

* The first column of the table shows the name of the argument or global variable.
* The second column shows the interface type used for this argument or global variable.
* The last three columns list the names, bit-widths, and directions of all the signals included in the interface.
  For example, the AXI Stream interface has three signals in this case: `input_a_fifo` for the 16-bit data, plus the associated valid and ready signals.

Note that the `Control` interface is the standard module control interface used by every LegUp-generated module, and hence there is no C++ name attached to it.

For more details about the RTL interface, please refer to :ref:`rtl_interface`.

Scheduling Result
++++++++++++++++++++++

The scheduling result section primarily shows the cycle latency of each basic block inside each function.
A basic block is a collection of instructions that always run together. Below is an example table:

.. code-block:: text

    Basic Block Latencies:

    +-------------------------------------------------+
    | Function: sobel_filter (non-pipelined function) |
    +--------------------------------+----------------+
    | Basic Block                    | Cycle Latency  |
    +--------------------------------+----------------+
    | %entry                         | 1              |
    | %for.cond1.preheader           | 1              |
    | %for.body3                     | 2              |
    | %for.body3.for.inc54_crit_edge | 1              |
    | %for.cond14.preheader          | 9              |
    | %for.inc54                     | 2              |
    | %for.inc57                     | 1              |
    | %for.end59                     | 1              |
    +--------------------------------+----------------+

LegUp's `Schedule Viewer` is more helpful for visualizing the schedule and the control flow between basic blocks.

Pipeline Result
++++++++++++++++++++++

The pipeline result section reports the initiation interval, pipeline length, iteration count, and latency for each pipelined loop or function.

.. code-block:: text

    +-------------+------------------------+---------------+-----------------------------------+---------------------+-----------------+-----------------+---------+
    | Label       | Function               | Basic Block   | Location in Source Code           | Initiation Interval | Pipeline Length | Iteration Count | Latency |
    +-------------+------------------------+---------------+-----------------------------------+---------------------+-----------------+-----------------+---------+
    | LoopConv    | Pthread_Conv1(void*)   | %for.body26.i | line 95 of ./digit_recognition.h  | 1                   | 5               | 9               | 13      |
    | LoopMaxpool | Pthread_Maxpool(void*) | %for.body15.i | line 156 of ./digit_recognition.h | 1                   | 4               | 4               | 7       |
    | LoopConv_1  | Pthread_Conv2(void*)   | %for.body26.i | line 95 of ./digit_recognition.h  | 1                   | 5               | 9               | 13      |
    | LoopConv_2  | Pthread_Conv3(void*)   | %for.body26.i | line 95 of ./digit_recognition.h  | 1                   | 5               | 9               | 13      |
    | LoopConv_3  | Pthread_Conv4(void*)   | %for.body26.i | line 95 of ./digit_recognition.h  | 2                   | 6               | 9               | 22      |
    | LoopFC      | Pthread_FC(void*)      | %for.body9.i  | line 204 of ./digit_recognition.h | 1                   | 5               | 49              | 53      |
    +-------------+------------------------+---------------+-----------------------------------+---------------------+-----------------+-----------------+---------+

The iteration count and latency may not be available for a pipelined function or for a pipelined loop with a non-deterministic loop bound.
Please refer to :ref:`loop_pipelining` and :ref:`function_pipelining` for more details.

LegUp's `Schedule Viewer` also gives more details about how individual instructions are scheduled inside each pipeline.

Memory Usage
++++++++++++++++++++++

The memory usage section lists the memories used by the generated circuit, grouped by the type of memory architecture.
Please refer to the :ref:`mem_arch` section for more details about the memory architecture used by LegUp.

.. code-block:: text

    +------------------------------------------------------------------------------------------+
    |                                      Local Memories                                      |
    +------------------------+-----------------------+------+-------------+------------+-------+
    | Name                   | Accessing Function(s) | Type | Size [Bits] | Data Width | Depth |
    +------------------------+-----------------------+------+-------------+------------+-------+
    | conv1_weights_a0_a0_a0 | Pthread_Conv1         | ROM  | 144         | 16         | 9     |
    | conv1_weights_a1_a0_a0 | Pthread_Conv1         | ROM  | 144         | 16         | 9     |
    +------------------------+-----------------------+------+-------------+------------+-------+

    +--------------------------------------------------------------------------------------------------------+
    |                                         Shared Local Memories                                          |
    +-------------------------+--------------------------------+----------+-------------+------------+-------+
    | Name                    | Accessing Function(s)          | Type     | Size [Bits] | Data Width | Depth |
    +-------------------------+--------------------------------+----------+-------------+------------+-------+
    | conv1_output_a0_a0_a0   | Pthread_Conv1, Pthread_Maxpool | RAM      | 10816       | 16         | 676   |
    | maxpool_output_a0_a0_a0 | Pthread_Conv2, Pthread_Maxpool | RAM      | 2704        | 16         | 169   |
    | conv1_output_valid      | Pthread_Conv1, Pthread_Maxpool | Register | 8           | 8          | 1     |
    | maxpool_output_valid    | Pthread_Conv2, Pthread_Maxpool | Register | 8           | 8          | 1     |
    +-------------------------+--------------------------------+----------+-------------+------------+-------+

    +----------------------------------------------------------------------------------------------------------------+
    |                                                Aliased Memories                                                |
    +------------------------+---------------------+-----------------------+------+-------------+------------+-------+
    | Name                   | Memory Controller   | Accessing Function(s) | Type | Size [Bits] | Data Width | Depth |
    +------------------------+---------------------+-----------------------+------+-------------+------------+-------+
    | foo_entry_local_array1 | memory_controller_0 | foo, foo_sub          | RAM  | 160         | 32         | 5     |
    | foo_entry_local_array2 | memory_controller_0 | foo, foo_sub          | RAM  | 160         | 32         | 5     |
    +------------------------+---------------------+-----------------------+------+-------------+------------+-------+

    +-------------------------------------------------------------------------------------------------+
    |                                          I/O Memories                                           |
    +---------------------------+-----------------------+----------+-------------+------------+-------+
    | Name                      | Accessing Function(s) | Type     | Size [Bits] | Data Width | Depth |
    +---------------------------+-----------------------+----------+-------------+------------+-------+
    | classifier_input_valid    | Pthread_Conv1         | Register | 0           | 8          | 0     |
    | classifier_input_a0_a0_a0 | Pthread_Conv1         | RAM      | 0           | 16         | 0     |
    | classifier_output         | Pthread_MaxComp       | FIFO     | 0           | 4          | 0     |
    | fc_weights_a0_a0_a0       | Pthread_FC            | RAM      | 0           | 16         | 0     |
    | fc_weights_a1_a0_a0       | Pthread_FC            | RAM      | 0           | 16         | 0     |
    +---------------------------+-----------------------+----------+-------------+------------+-------+

The example tables above show the accessing functions and the hardware implementation of each "memory" in the software.

* The `Type` column shows how a memory is implemented in hardware: as a RAM, a ROM (read-only), a register, or a FIFO (only applicable to ``legup::FIFO`` type variables in C++).
* The `Size` column reports the total size of the memory in bits, which equals `Data Width` * `Depth`.
* The `Data Width` column gives the bit-width of the data ports of a RAM/FIFO, or the bit-width of a register.
* The `Depth` column gives the depth of a RAM or FIFO; it is always 1 for a register.
* `Aliased Memories` have an additional column showing the name of the `Memory Controller` behind which the memory is placed.
  This lets you see which memories alias each other and are placed behind the same memory controller to support aliased memory accesses.
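
As a quick sanity check of the `Size` = `Data Width` * `Depth` relation described above, the minimal Python sketch below recomputes the size of a few rows copied from the example tables.
The helper name ``memory_size_bits`` is hypothetical and is not part of LegUp; the sketch only illustrates the arithmetic.

.. code-block:: python

    def memory_size_bits(data_width: int, depth: int) -> int:
        """Return the total memory size in bits (Data Width * Depth)."""
        return data_width * depth

    # (name, data width, depth, reported size), copied from the example tables.
    rows = [
        ("conv1_weights_a0_a0_a0", 16, 9, 144),
        ("conv1_output_a0_a0_a0", 16, 676, 10816),
        ("maxpool_output_a0_a0_a0", 16, 169, 2704),
        ("conv1_output_valid", 8, 1, 8),
        ("foo_entry_local_array1", 32, 5, 160),
    ]

    for name, width, depth, reported in rows:
        # Each recomputed size should match the value shown in the report.
        assert memory_size_bits(width, depth) == reported, name
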
Simulation and RTL Synthesis, Place & Route Report
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

LegUp also generates a summary report (``reports/summary.results.rpt``) that shows the results of simulation and of RTL synthesis and Place & Route. Below is an example report:

.. code-block:: text

    ====== 1. Simulation Cycle Latency ======

    Number of calls: 2500
    Cycle latency: 2532
    SW/HW co-simulation: PASS

    ====== 2. Timing Result ======

    +--------------+---------------+-------------+-------------+----------+-------------+
    | Clock Domain | Target Period | Target Fmax | Worst Slack | Period   | Fmax        |
    +--------------+---------------+-------------+-------------+----------+-------------+
    | clk          | 10.000 ns     | 100.000 MHz | 6.834 ns    | 3.166 ns | 315.856 MHz |
    +--------------+---------------+-------------+-------------+----------+-------------+

    ====== 3. Resource Usage ======

    +---------------+------+--------+------------+
    | Resource Type | Used | Total  | Percentage |
    +---------------+------+--------+------------+
    | 4LUT          | 2913 | 299544 | 0.97       |
    | DFF           | 4177 | 299544 | 1.39       |
    | I/O Register  | 0    | 1536   | 0.00       |
    | User I/O      | 0    | 512    | 0.00       |
    | uSRAM         | 0    | 2772   | 0.00       |
    | LSRAM         | 0    | 952    | 0.00       |
    | Math          | 24   | 924    | 2.60       |
    +---------------+------+--------+------------+

The first section shows the result of `SW/HW co-simulation`, including the number of calls to the top-level function, the total cycle latency of the whole simulation, and whether the SW/HW co-simulation passed.

Note that when :ref:`function_pipelining` is used, the auto-generated RTL testbench for SW/HW co-simulation can inject a new set of inputs into the top-level module without waiting for the previous "function calls" to finish.
Because of this overlapped execution, the average number of cycles per call can be close to the reported initiation interval of the pipelined function.
For instance, the example report above is from a pipelined function with an initiation interval of 1, and the average number of cycles per call is very close to 1 (2532 / 2500 ≈ 1.01).

The next two sections show the timing and resource usage results, parsed from Libero's report files after running RTL Synthesis and Place & Route.
The target period is the value you set in :ref:`legup_constraints`. If the `Clock Period` is not set, LegUp uses a default clock period for the target FPGA family.
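
The figures in the example report above can be cross-checked with a few lines of arithmetic.
The Python sketch below is only an illustration: the helper names are hypothetical and not part of LegUp, and it assumes that Fmax is the reciprocal of the period and that the percentage is simply used / total.

.. code-block:: python

    def avg_cycles_per_call(cycle_latency: int, num_calls: int) -> float:
        """Average simulation cycles spent per top-level function call."""
        return cycle_latency / num_calls

    def fmax_mhz(period_ns: float) -> float:
        """Convert a clock period in nanoseconds to a frequency in MHz."""
        return 1000.0 / period_ns

    def utilization_percent(used: int, total: int) -> float:
        """Resource utilization as a percentage of the available total."""
        return 100.0 * used / total

    # Values copied from the example report above.
    print(avg_cycles_per_call(2532, 2500))               # 1.0128
    print(round(fmax_mhz(3.166), 3))                     # 315.856
    print(round(utilization_percent(2913, 299544), 2))   # 0.97

An average of about 1.01 cycles per call is consistent with the initiation interval of 1 mentioned above.
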