
DMA speed

Author
filipdimitrov.com
Starting Member
  • Total Posts : 79
  • Reward points : 0
  • Joined: 2010/12/27 06:34:43
  • Location: 0
  • Status: offline
2012/07/19 07:21:00 (permalink)
0

DMA speed

Hi all,
 
Has anyone tested DMA transfers from RAM to RAM?
I need to transfer about 1K of data, and I'm wondering whether I should use DMA (and read all the related material and lose a day :D) or just use memcpy.
I need to wait for the transfer to finish, so an actual async operation is not needed or desired.
I know that DMA is theoretically faster, but there is overhead in setting up the transfer, etc.
 
Thanks
 
#1


    filipdimitrov.com
    Starting Member
    • Total Posts : 79
    • Reward points : 0
    • Joined: 2010/12/27 06:34:43
    • Location: 0
    • Status: offline
    Re:DMA speed 2012/07/19 07:21:31 (permalink)
    0
    Oops, sorry: I'm using a PIC32 at 80 MHz.
    #2
    DarioG
    Allmächtig.
    • Total Posts : 54081
    • Reward points : 0
    • Joined: 2006/02/25 08:58:22
    • Location: Oesterreich
    • Status: offline
    Re:DMA speed 2012/07/19 07:24:38 (permalink)
    0
    The DMA doc shows some functions for memcpy and strcpy using DMA :)

    GENOVA :D :D ! GODO
    #3
    threedog
    Super Member
    • Total Posts : 998
    • Reward points : 0
    • Joined: 2009/12/04 12:28:11
    • Location: Boise
    • Status: offline
    Re:DMA speed 2012/07/19 08:31:25 (permalink)
    0
    filipdimitrov.com
     Has anyone tested DMA transfers from RAM to RAM?
    I need to transfer about 1K of data, and I'm wondering whether I should use DMA (and read all the related material and lose a day :D) or just use memcpy.
    I need to wait for the transfer to finish, so an actual async operation is not needed or desired.
    I know that DMA is theoretically faster, but there is overhead in setting up the transfer, etc.
     

     
    I would be interested to see the performance comparison.
    I doubt that a DMA RAM-to-RAM transfer on a PIC32 would be much faster than the few lines of memcpy code (hopefully all in cache).
     
    Things to consider:
    The example code is not of any value.
    Align both 1K data buffers on a 32-bit boundary.
    Set the transaction size to 4 bytes.
    This allows a 32-bit transfer per transaction rather than an 8-bit one (it should be 4 times faster, since only 256 DMA transactions are needed for 1K).
    Do not poll for DMA completion; if you need confirmation, use the DMA ISR. If you are going to wait for the DMA to finish anyway, you might as well use a specially written 32-bit memcpy()-style function.
    Remember, DMA is useful for doing tasks in the background so the CPU can continue to do work in the foreground.
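The arithmetic behind the "4 times faster" estimate can be sketched in plain C. This is only an illustration: the helper names `is_word_aligned` and `transaction_count` are hypothetical, and whether the DMA controller really performs word-wide accesses for aligned buffers is threedog's expectation, not something measured here.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero when ptr sits on a 4-byte (32-bit) boundary. */
static int is_word_aligned(const void *ptr)
{
    return ((uintptr_t)ptr & 3u) == 0;
}

/* Number of transactions needed to move `bytes` bytes when each
 * transaction carries `width` bytes (1 = byte-wide, 4 = word-wide).
 * Rounds up so a trailing partial word still counts as one transaction. */
static size_t transaction_count(size_t bytes, size_t width)
{
    return (bytes + width - 1) / width;
}
```

For a 1K buffer this gives 1024 byte-wide transactions versus 256 word-wide ones, which is where the factor of 4 comes from.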
     
     
     
    #4
    filipdimitrov.com
    Starting Member
    • Total Posts : 79
    • Reward points : 0
    • Joined: 2010/12/27 06:34:43
    • Location: 0
    • Status: offline
    Re:DMA speed 2012/07/19 08:42:22 (permalink)
    0
    Thanks for the hints :)
    I suppose that if the data is too small, the CPU will do better than DMA.
    The question is where that limit is :)
     
    #5
    filipdimitrov.com
    Starting Member
    • Total Posts : 79
    • Reward points : 0
    • Joined: 2010/12/27 06:34:43
    • Location: 0
    • Status: offline
    Re:DMA speed 2012/07/23 09:04:34 (permalink)
    0
    Surprisingly, I get faster times from memcpy:
     
    1K transfer
    =========================
    DMA transfer : 948
    memcpy transfer : 388
    Ratio : 2.44

    5K transfer
    =========================
    DMA transfer : 4533 (only transfer :  4488)
    memcpy transfer : 1796
    Ratio : 2.524
    ==========================================
    Numbers are ticks from the core timer.
    I'm using a PIC32MX695F512H; the flash cache is on, PClock is 80 MHz, the CPU clock is 80 MHz, interrupts are disabled, and only one DMA channel is active.
     
    This is my test code:
     

    #define EVENTS_RAW_BUFF_SIZE        (10 * 1024)
    static BYTE __attribute__((aligned(4))) g_nEventsRAWBuff[EVENTS_RAW_BUFF_SIZE];

    #define DMA_TEST_SZ (1024 * 5)
    #define DMA_TEST_ITERATIONS 1

    #define ReadCoreTimer(val)    asm volatile("mfc0   %0, $9" : "=r"(val))

    void DMATransfer()
    {
        IEC1CLR = 0x00020000;                // disable bit 17: presumably DMA channel 1, since bit 16 (below) is channel 0
        IFS1CLR = 0x00020000;                // clear the matching bit 17 interrupt flag
        IEC1CLR = 0x00010000;                // disable DMA channel 0 interrupts
        IFS1CLR = 0x00010000;                // clear existing DMA channel 0 interrupt flag
        DMACONSET = 0x00008000;                // enable the DMA controller
        DCH0CON = 0x3;                        // channel off, pri 3, no chaining
        DCH0ECON = 0;                        // no start or stop irq’s, no pattern match

        // program the transfer
        DCH0DSA = KVA_TO_PA(g_nEventsRAWBuff);    // Transfer destination physical address
        DCH0SSA = DCH0DSA + DMA_TEST_SZ;        // Transfer source physical address

        DCH0SSIZ = DMA_TEST_SZ;                // Block transfer source size bytes
        DCH0DSIZ = DMA_TEST_SZ;                // Block transfer destination size bytes

        DCH0CSIZ = DMA_TEST_SZ;                // Cell transfer bytes

        DCH0INTCLR = 0X00FF00FF;            // clear existing events, disable all interrupts
        DCH0INTbits.CHBCIE = 1;                // Channel Block Transfer Complete Interrupt Enable bit

        DCH0CONSET = 0x80;                    // turn channel on

        // initiate a transfer
        DCH0ECONSET = 0x00000080;            // set CFORCE to 1

        while(IFS1bits.DMA0IF == 0);        // Wait DMA to complete

        DMACONCLR = 0x00008000;                // Disable the DMA controller
    }

    void DMATransferTest()
    {
        int i = 0;
        UINT32 nStartTick;
        UINT32 nEndTick;

        ReadCoreTimer(nStartTick);

        for(i = 0; i < DMA_TEST_ITERATIONS; i++)
            DMATransfer();

        ReadCoreTimer(nEndTick);
        TRACE("DMA transfer : %d \r\n", nEndTick - nStartTick);

        // Check memcpy transfer speed
        ReadCoreTimer(nStartTick);

        for(i = 0; i < DMA_TEST_ITERATIONS; i++)
            memcpy(g_nEventsRAWBuff, g_nEventsRAWBuff + DMA_TEST_SZ, DMA_TEST_SZ);

        ReadCoreTimer(nEndTick);
        TRACE("memcpy transfer : %d \r\n", nEndTick - nStartTick);
    }
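
To put the tick counts above into time units, here is a minimal sketch. It assumes the core timer increments once every two CPU clock cycles (the usual PIC32MX arrangement, labeled as an assumption here) and uses the 80 MHz clock stated in the post. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define SYSCLK_HZ            80000000u  /* CPU clock stated in the post */
#define CYCLES_PER_CORE_TICK 2u         /* assumption: core timer runs at SYSCLK/2 */

/* Convert a core-timer tick delta into CPU clock cycles. */
static uint64_t ticks_to_cycles(uint32_t ticks)
{
    return (uint64_t)ticks * CYCLES_PER_CORE_TICK;
}

/* Convert a core-timer tick delta into nanoseconds.
 * The 64-bit intermediate keeps the multiplication from overflowing. */
static uint64_t ticks_to_ns(uint32_t ticks)
{
    return (ticks_to_cycles(ticks) * 1000000000u) / SYSCLK_HZ;
}
```

Under that assumption, the 1K results work out to about 9.7 us for memcpy (388 ticks) and 23.7 us for DMA (948 ticks), or roughly 0.76 versus 1.85 CPU cycles per byte.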

     
    #6
    filipdimitrov.com
    Starting Member
    • Total Posts : 79
    • Reward points : 0
    • Joined: 2010/12/27 06:34:43
    • Location: 0
    • Status: offline
    Re:DMA speed 2012/07/23 09:13:03 (permalink)
    0
    All code optimizations are off.
    #7
    DarioG
    Allmächtig.
    • Total Posts : 54081
    • Reward points : 0
    • Joined: 2006/02/25 08:58:22
    • Location: Oesterreich
    • Status: offline
    Re:DMA speed 2012/07/23 15:12:37 (permalink)
    0
    strange then...

    GENOVA :D :D ! GODO
    #8
    vl
    Super Member
    • Total Posts : 222
    • Reward points : 0
    • Joined: 2012/05/15 22:29:27
    • Location: 0
    • Status: offline
    Re:DMA speed 2012/07/23 18:52:02 (permalink)
    0
    filipdimitrov.com All optimizations on code are off.

    If I see it correctly, you're using the library memcpy, so I guess your optimization settings do not affect the precompiled library.
    You can implement a simple memcpy with 32-bit transfers and see how it goes. Try it with __attribute__ ((mips16)) as well.
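
A hand-written word copy along the lines vl suggests might look like the sketch below. The name `copy_words` is hypothetical, and the function assumes that both pointers are 4-byte aligned and that the regions do not overlap. If the buffers are declared as byte arrays, the compiler's aliasing rules may also need attention. Nothing here has been timed on a PIC32.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy `bytes` bytes from src to dst using 32-bit loads and stores.
 * Assumes both pointers are 4-byte aligned and the regions do not overlap;
 * the 1 to 3 trailing bytes, if any, are copied individually. */
static void copy_words(void *dst, const void *src, size_t bytes)
{
    uint32_t *d = (uint32_t *)dst;
    const uint32_t *s = (const uint32_t *)src;
    size_t words = bytes / 4;
    size_t tail = bytes % 4;
    unsigned char *db;
    const unsigned char *sb;

    /* Main loop: one 32-bit load and one 32-bit store per word. */
    while (words--)
        *d++ = *s++;

    /* Tail: finish the remaining bytes one at a time. */
    db = (unsigned char *)d;
    sb = (const unsigned char *)s;
    while (tail--)
        *db++ = *sb++;
}
```

In the test code above, this would replace the memcpy call inside the timing loop, with the same three arguments.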

    #9
    filipdimitrov.com
    Starting Member
    • Total Posts : 79
    • Reward points : 0
    • Joined: 2010/12/27 06:34:43
    • Location: 0
    • Status: offline
    Re:DMA speed 2012/07/24 00:37:27 (permalink)
    0
    Yes, this is a library function, but I also configured the project not to use optimized libs.
    I just wanted to know whether DMA can do better than memcpy.
    In theory, no matter what you do in C/ASM, the DMA controller should be faster.
     
     
    #10
    threedog
    Super Member
    • Total Posts : 998
    • Reward points : 0
    • Joined: 2009/12/04 12:28:11
    • Location: Boise
    • Status: offline
    Re:DMA speed 2012/07/24 10:29:53 (permalink)
    0
    filipdimitrov.com
     Yes, this is a library function, but I also configured the project not to use optimized libs.
    I just wanted to know whether DMA can do better than memcpy.
    In theory, no matter what you do in C/ASM, the DMA controller should be faster.
     

    DMA cannot completely dominate the IData bus. Doing so would effectively prevent the CPU from executing any code while the DMA is in progress.
    #11
    DarioG
    Allmächtig.
    • Total Posts : 54081
    • Reward points : 0
    • Joined: 2006/02/25 08:58:22
    • Location: Oesterreich
    • Status: offline
    Re:DMA speed 2012/07/24 15:01:59 (permalink)
    0
    I have been thinking about DMA quite often recently, and I tend to believe that it uses, say, the "high side" of a clock cycle while the CPU uses the "low" one... or something like that :)

    GENOVA :D :D ! GODO
    #12
    cbeif
    Starting Member
    • Total Posts : 56
    • Reward points : 0
    • Joined: 2012/05/09 14:28:30
    • Location: 0
    • Status: offline
    Re:DMA speed 2019/08/05 14:42:51 (permalink)
    0
    I ran this on a PIC32MZ1025DAG169.


    If I set 
    #define DMA_TEST_ITERATIONS 1
    then I get an output of 
    DMA transfer : 2730
    memcpy transfer : 1979
     
    if I set
    #define DMA_TEST_ITERATIONS 2
    then I get
    DMA transfer : 2818   (huh, huge speed up on second run!!!)
    memcpy transfer : 3754
     
    If I set
    #define DMA_TEST_ITERATIONS 100
    then the test results I get are:
    DMA transfer : 11393
    memcpy transfer : 178265
     
    and for 
    #define DMA_TEST_ITERATIONS 10000
    DMA transfer : 877643
    memcpy transfer : 17795195
     
    So it seems the very first DMA transfer is extremely slow,
    but subsequent transfers are up to 20 times faster than memcpy.
    I have no explanation for this.
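
The per-iteration cost behind those totals can be worked out with a small integer-only sketch. The function names are hypothetical, and the code only restates the arithmetic on the numbers posted above; it does not explain them.

```c
#include <stdint.h>

/* Average ticks per iteration, scaled by 100 (two decimal places),
 * using integer math only; the result is truncated, not rounded. */
static uint32_t avg_ticks_x100(uint32_t total_ticks, uint32_t iterations)
{
    return (uint32_t)(((uint64_t)total_ticks * 100u) / iterations);
}

/* memcpy-to-DMA time ratio, scaled by 10 (one decimal place), truncated. */
static uint32_t ratio_x10(uint32_t memcpy_ticks, uint32_t dma_ticks)
{
    return (uint32_t)(((uint64_t)memcpy_ticks * 10u) / dma_ticks);
}
```

With the 10000-iteration totals, this gives about 87.76 ticks per DMA transfer against about 1779.51 per memcpy, a ratio of roughly 20.2. The 100-iteration run gives about 15.6. The difference between the 1-iteration and 2-iteration DMA totals (2818 - 2730 = 88 ticks) matches the steady-state figure, so nearly all of the first measurement is a one-time cost.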
     
     
     
    #13
    nigelwright7557
    Super Member
    • Total Posts : 284
    • Reward points : 0
    • Joined: 2006/11/06 08:15:51
    • Location: 0
    • Status: offline
    Re:DMA speed 2019/08/05 16:24:40 (permalink)
    0
    I designed a USB oscilloscope that read data from PORTB into a buffer as fast as possible.
    A plain copy loop turned out to be faster than using the DMA.
    DMA was about 20% slower.
     
    #14