Author
Guest
Super Member
  • Total Posts : 80503
  • Reward points : 0
  • Joined: 2003/01/01 00:00:00
  • Location: 0
  • Status: online
2006/07/06 09:41:08 (permalink)

Getting USB faster...

I have a dilemma here that I'm trying to solve. It's similar to this thread: http://forum.microchip.com/tm.aspx?m=165751, but it is different. I'm trying to replace an old parallel port interface with a USB interface so that the device can be used on newer computers (which don't come with parallel ports anymore, or serial ports for that matter, only USB). I got the device working through the modified CDC example code. My issue is that much of the communication between the control program on the computer and the device happens at no more than 10 bytes at a time, so I send a lot of packets back and forth rather than sending fewer, larger packets. I could try to rewrite the application code to use a more buffered approach, but that would require an essential overhaul of the program due to the way it's written (trust me, I don't want to do that, especially since it's written in LabVIEW).
 The main communication for the device is done over I2C, talking to DACs, ADCs, EEPROMs, etc. I'm currently using the 18F4550 to take data and addresses off the USB and put them onto an I2C bus. Like I said before, I got that working, and here is the dilemma: using the Windows drivers that came with the CDC code, I can't get the host to send more than 1 packet per USB frame (1 ms, as I'm running at full speed, 12 Mbit/s). Even if I reduce the packet size to 8 or 16 bytes it will still only send a maximum of 1 packet per frame. So, like I said, I can try to use 64-byte packets and rewrite the application code to be efficient with 64-byte packets (a huge pain), or try to figure out a way to get more than one IRP to go through each frame. Oh, I should say that only 1 packet per frame is way too slow: over the parallel port I was getting about 1 byte every 20 us. That is slower than 64 bytes/ms, but the application code never tries to send more than 10 bytes per packet (again, reducing the packet size does not allow for more than 1 packet per frame), and getting the application to do more than that requires a major overhaul.

   Now, a guy on the forum here named hwti came up with a way to do more than 1 IRP per frame and get good packet rates while still implementing it as a CDC device, which would be perfect for me. However, I can't seem to get ahold of him. He did say:

    I use a firmware based on bminch assembler one, but interrupt-driven. I use software double-buffering (easier than ping-pong buffering) : when the SIE owns buffer1, I can fill buffer2, and after the packet is transmitted, I can change the USB buffer address to buffer2.

I wrote a custom driver based on microsoft sample drivers : isousb (how to use multiple IRPs) and bulkusb.
It is a basic (not all features programmed) virtual COM port driver. With many IRPs, I can get a great speed even with packets smaller than the maximum packet size.
And it tries to read data as soon as its buffer isn't full, so speed depends less on the size of the requests done by the application.

I can get about 600KB/s (or even more, it depends what time the PIC spends to get data to send: buffer to copy, or code memory to read, ...)


I have NO experience in writing custom drivers, and I am facing a point where I am going to have to try unless I can get ahold of some code or another driver that can talk to the 18F4550 and allow more than 1 IRP per frame. Can any of you help me out?

- thanks
post edited by shushikiary - 2006/07/06 09:43:59
#1


    Guest
    Super Member
    • Total Posts : 80503
    • Reward points : 0
    • Joined: 2003/01/01 00:00:00
    • Location: 0
    • Status: online
    RE: Getting USB faster... 2006/07/07 14:31:26 (permalink)
    0
    OK, after doing some more research I have found that there is a URB (USB request block), which contains a buffer of USB IRPs (I/O request packets). This buffer is then sent in the URB to the USB driver (/host), which then schedules the appropriate packets based on available bandwidth, pipe type, etc. Now, if you request a number of bytes that will fit into 1 packet (the packet size is whatever your USB buffer size is set to on the 18F4550), then only 1 IRP will be put into the URB and so only 1 packet will be sent that frame. Only 1 URB is allowed per frame. Say your packet size is 64 bytes: if you request only 64 bytes, only 1 IRP will be sent in the frame, and your max data speed is 64 kbytes/s (as you can only send 1 packet per 1 ms frame). However, if you request 128 bytes at a time then you will send 2 IRPs per URB and get 128 bytes/ms, or 128 kbytes/s, and so on, up to the maximum of 19 64-byte packets per frame as specified in the USB 2.0 spec (this is of course with bulk and interrupt transfers, and assumes that nothing else on the host is using up bandwidth in the 12 Mbit/s pipe).
      Also, there are not that many differences between bulk and interrupt transfers, other than the fact that with interrupt transfers the host polls the device every so many ms, based on a number given in the EP descriptor, to see if the device needs anything. This means you will most definitely get at least 1 packet per interval given in the EP descriptor, whereas with bulk this does not occur and packets are only transmitted when bandwidth is available on the bus. In the full-speed model (12 Mbit/s) the same number of packets per frame are available for interrupt or bulk transfers, and the buffer sizes are the same (64, 32, 16, or 8 bytes). There are 19 packets per frame allowed with 64-byte buffers, 33 packets with 32 bytes, 51 with 16, and 71 with 8.
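To put numbers on the packets-per-frame figures quoted above, here is a minimal sketch in C. The table simply restates the four figures from the post; the names `fs_limit`, `FS_LIMITS` and `max_bytes_per_frame` are made up for illustration. Since a full-speed frame is 1 ms, bytes per frame is numerically the same as kbytes/s.

```c
#include <stddef.h>

/* Packets per 1 ms full-speed frame for each packet size, as quoted in the
   post above (attributed there to the USB 2.0 spec). */
struct fs_limit {
    unsigned packet_size;
    unsigned packets_per_frame;
};

static const struct fs_limit FS_LIMITS[] = {
    { 64, 19 }, { 32, 33 }, { 16, 51 }, { 8, 71 },
};

/* Best-case payload in bytes per frame for a given packet size.
   Returns 0 if the size is not one of the four listed above. */
unsigned max_bytes_per_frame(unsigned packet_size)
{
    for (size_t i = 0; i < sizeof FS_LIMITS / sizeof FS_LIMITS[0]; i++) {
        if (FS_LIMITS[i].packet_size == packet_size)
            return FS_LIMITS[i].packet_size * FS_LIMITS[i].packets_per_frame;
    }
    return 0;
}
```

With these figures, 64-byte packets give 1216 bytes per frame while 8-byte packets give only 568, which is why small packets cost so much bandwidth even when the host fills every frame.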
      In isochronous transfers the URB is organized differently, such that more than one IRP can be sent per frame. So, using the ISOUSB example in the Windows DDK, hwti modified the BULKUSB example to do this, even though with BULKUSB multiple IRPs are sent anyway if more data is requested than can fit into 1 packet. However, I still have no clue what the heck the drivers are doing, because I do not know the Windows kernel very well, and my hopes of writing a driver that can do this are slim.
      BUT! For all of you out there who are having bandwidth problems, this might clear some things up. If you need to do lots of small transfers, it would be best to dump all those transfers into a buffer and send them all out at the same time (say 51 16-byte transfers grouped into one 816-byte buffer) rather than requesting 16 bytes each time around the driver loop. This way the driver will send 51 IRPs and divide the buffer into 51 16-byte packets, just like you want, and they will all be sent in 1 frame (if the bandwidth is available; if not, they will be sent as fast as the host can get them there based on the available bandwidth).
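The batching idea above can be sketched on the host side in a few lines of C. This is only an illustration under assumptions: `struct batch`, `batch_init` and `batch_append` are hypothetical names, and the actual write to the COM port (whatever call your application uses) is left out. You would pass `data` and `used` to that call once, instead of writing each 16-byte chunk separately.

```c
#include <stddef.h>
#include <string.h>

/* 51 transfers x 16 bytes, the example size from the post above. */
#define BATCH_CAPACITY 816

struct batch {
    unsigned char data[BATCH_CAPACITY];
    size_t used;                /* bytes queued so far */
};

void batch_init(struct batch *b)
{
    b->used = 0;
}

/* Queue one small transfer. Returns 1 on success, or 0 if it does not
   fit, in which case the caller should send the batch and start over. */
int batch_append(struct batch *b, const void *chunk, size_t len)
{
    if (len > BATCH_CAPACITY - b->used)
        return 0;
    memcpy(b->data + b->used, chunk, len);
    b->used += len;
    return 1;
}
```

Sending the whole batch in one request lets the driver split it into packets itself, rather than paying for one frame per small request.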
    post edited by shushikiary - 2006/07/07 14:34:31
    #2
    Guest
    Super Member
    • Total Posts : 80503
    • Reward points : 0
    • Joined: 2003/01/01 00:00:00
    • Location: 0
    • Status: online
    RE: Getting USB faster... 2006/07/13 14:14:30 (permalink)
    0
    OK, so I have the LabVIEW code reorganized in order to request a whole bunch of data to be sent at once, and then the USB drivers slice that up into packets and send them out, so that is optimized. However, it is still running too slow, and I found out another reason why.
      The 18F4550 will only send 1 packet per frame back to the host if you have a buffer the same size as your packet size. HOWEVER, if you make the EP IN buffer larger than the IN packet size, then when you hand the buffer over it will write more than 1 packet per frame. For example: if your buffer size is 256 (the value you write in the BD cnt) and you put in the EP descriptor that the packet size is 64 bytes, then when you hand the buffer over to the USB state machine on chip it will send out four 64-byte packets in that frame (assuming there isn't a NAK from the host, and the request didn't occur too close to the EOF (end of frame)).
      So in order to deal with this I am rewriting the CDC example code from Microchip to have two 256-byte circular buffers. The 1st buffer takes in commands until it is full, and keeps count of how many bytes have come in. Every time bytes are read out of this buffer, the count of bytes read in is reduced by the number of bytes read out. The 2nd buffer is a write-out buffer that writes on the IN pipe back to the host. Now, the whole goal of this thing is that it takes in commands from the USB which tell it what to do on an I2C bus: essentially a USB to I2C converter. The parallel port was previously used to talk to a Philips chip that controlled the I2C bus; I'm replacing that chip with the 18F4550, and the parallel port with USB. So every time we read something off the I2C bus, or want to report that a write over the I2C was successful, we write to the 2nd circular buffer. When all commands out of the 1st circular buffer are completed, or when the 2nd buffer is full, it's handed over to the USB state machine, which then will send as many packets as needed to send all our data back. This allows the I2C bus to stay as busy as possible while data is still going in and out of the circular buffers.
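A 256-byte circular buffer with a byte count, as described above, might look roughly like the following. This is a sketch and not the code from the project: `struct ring`, `ring_put` and `ring_get` are invented names, and the all-or-nothing behaviour of `ring_put` is a design choice of this sketch. With exactly 256 bytes, an 8-bit index wraps around on its own, so no modulo is needed.

```c
#include <stdint.h>

#define RING_SIZE 256u

struct ring {
    uint8_t  data[RING_SIZE];
    uint8_t  head;   /* next write position; wraps naturally at 256 */
    uint8_t  tail;   /* next read position; wraps naturally at 256 */
    uint16_t count;  /* bytes currently stored, 0..256 */
};

uint16_t ring_free(const struct ring *r)
{
    return (uint16_t)(RING_SIZE - r->count);
}

/* Store n bytes only if all of them fit. Returns 1 on success, 0 if the
   buffer does not have room for the whole block. */
int ring_put(struct ring *r, const uint8_t *src, uint16_t n)
{
    if (n > ring_free(r))
        return 0;
    for (uint16_t i = 0; i < n; i++)
        r->data[r->head++] = src[i];
    r->count = (uint16_t)(r->count + n);
    return 1;
}

/* Read up to n bytes and reduce the stored count by the number read.
   Returns how many bytes were actually copied out. */
uint16_t ring_get(struct ring *r, uint8_t *dst, uint16_t n)
{
    if (n > r->count)
        n = r->count;
    for (uint16_t i = 0; i < n; i++)
        dst[i] = r->data[r->tail++];
    r->count = (uint16_t)(r->count - n);
    return n;
}
```

A zero-initialized `struct ring` is an empty buffer. One instance would hold incoming commands and a second one the bytes waiting to go back to the host.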
       If not talking over the I2C bus and simply returning 256 bytes of data at a time, I can read at 256 kbytes/s. If I reserved all of the USB memory for the IN buffer (remember, that's transfer to the host) then we could achieve data rates of 1024 kbytes/s. You can do this; however, if you reserve anything larger than 256 bytes in RAM you must reference it with a pointer, as it will span multiple RAM banks. You can also reserve space outside of the USB RAM if you want, which means it is possible to achieve the max bandwidth the USB 2.0 spec states for 12 Mbit/s, which is 1216 kbytes/s. You would just use up a lot of space, and it would be hard to fill the IN buffer with data fast enough. Say you could write a useful byte to the buffer every clock cycle (running at 48 MHz): then you could write roughly 48,000 bytes a millisecond ((1,000,000 ns/ms)/(20.8 ns/clk)); however, you're more likely to be slower than that. So with very efficient code I think it's possible to reach those data rates. How you organize your buffers is just very important. The same is true when writing to the device: the USB state machine will turn over the UOWN bit when 1 packet is received, so the number of packets you receive per frame depends on how fast you can turn the UOWN bit back over (of course, you don't want to turn it over until you have used all the data you received in some fashion). Because the USB state machine turns over the UOWN bit on each packet, it's useless to have an OUT EP buffer larger than your packet size, so here ping-pong buffering would be useful. You'd still be limited in how many packets per frame you could receive by how fast you can turn over the UOWN bit, but you would get at least 1 extra packet per frame with ping-pong buffering.
     
      I'm glad this thing is starting to get figured out... I'll see how fast I can get this thing going.....
    post edited by shushikiary - 2006/07/13 14:23:53
    #3
    Guest
    Super Member
    • Total Posts : 80503
    • Reward points : 0
    • Joined: 2003/01/01 00:00:00
    • Location: 0
    • Status: online
    RE: Getting USB faster... 2006/07/20 15:34:52 (permalink)
    0
    OK, the code is finished and tested and works like it should. I'll post it here for you guys and say what changes I made to the example CDC code. Right now it only gets 64 kbyte/s back to the host from the 18F4550, but it gets upwards of 600 kbyte/s down to the PIC, assuming it's not running the I2C bus. If you send it I2C commands then it runs only as fast as the I2C bus can handle it (once the 256-byte in buffer is full). This code would work well as a USB to I2C converter, and the USB interface is emulated as a COM port. The packets in and out are each 64 bytes, though I have not tried implementing the idea of making the IN buffer larger than the IN packet to see if I could get more than 64k/s into the host. The circular buffer scheme works well, however, and yes, it is set up to work with the USB boot loader, as this code was designed for the USB demo board.

    However, I did have to make some changes to the USB board. The temperature sensor had to come off as it was SPI based, so I took it off the board. You also have to make sure to have pull-up resistors on the I2C bus, as the SSP module uses open drains on the outputs when in I2C mode (as per the I2C spec). 4k resistors worked just fine for me at 400 kHz (at 5 V this means about 1.25 mA of sink current when driving the bus low, which is well within the PIC spec).


    Just remember to try to jam as many commands as possible into the serial port each time you write to the PIC; this will give you the best data rate to the PIC. For example, say you want to do a write, then a read, and then a write read. You want to put all three of those into an array and pass that array to the COM port, rather than doing each command with a new write to the COM port, as then you'll only get 1 command/ms.

    also, the commands are set, so here they are:

    Read:
    1st byte = 0x01
    2nd byte = slave address
    3rd byte = numbers of bytes to read (limited to 255 bytes, same reason as below)

    Write:
    1st byte = 0x0A
    2nd byte = slave address
    3rd byte = bytes to write
    4th byte = data
    5th byte = ....  and so on up to 255 bytes (limited by the fact that the number of bytes is a byte)

    Write Read:
    1st byte = 0x07
    2nd byte = slave address
    3rd byte = value you want written to the slave
    4th byte = number of bytes to read (limited to 255 bytes, same reason as above)

    There is also an I2C reset function that you must call before you start communication on the I2C bus. It sets the baud rate, etc., and its command is:
    1st byte = 0x08
    2nd byte = 0,1,2, or 3   0 means 400khz, 1 = 300khz, 2 = 200khz, and 3 = 100khz

    If you don't call the above function first, you will never see any activity on the I2C bus.
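For the host side, the command layouts listed above could be packed with a few small helpers like these. This is a sketch only: the function names and the enum are invented here, and the slave addresses in the usage note are placeholders. Each helper writes one command into `out` and returns its length, so several commands can be chained into one array and written to the COM port in a single call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Command bytes as listed in the post above. */
enum {
    CMD_READ       = 0x01,
    CMD_WRITE_READ = 0x07,
    CMD_RESET      = 0x08,
    CMD_WRITE      = 0x0A
};

/* Read: command, slave address, number of bytes to read. */
size_t encode_read(uint8_t *out, uint8_t slave, uint8_t nread)
{
    out[0] = CMD_READ;
    out[1] = slave;
    out[2] = nread;
    return 3;
}

/* Write: command, slave address, byte count, then the data bytes. */
size_t encode_write(uint8_t *out, uint8_t slave, const uint8_t *data, uint8_t n)
{
    out[0] = CMD_WRITE;
    out[1] = slave;
    out[2] = n;
    memcpy(out + 3, data, n);
    return (size_t)3 + n;
}

/* Write read: command, slave address, value to write, bytes to read. */
size_t encode_write_read(uint8_t *out, uint8_t slave, uint8_t value, uint8_t nread)
{
    out[0] = CMD_WRITE_READ;
    out[1] = slave;
    out[2] = value;
    out[3] = nread;
    return 4;
}

/* Reset: speed_code 0 = 400 kHz, 1 = 300 kHz, 2 = 200 kHz, 3 = 100 kHz.
   Returns 0 (nothing written) for any other code. */
size_t encode_reset(uint8_t *out, uint8_t speed_code)
{
    if (speed_code > 3)
        return 0;
    out[0] = CMD_RESET;
    out[1] = speed_code;
    return 2;
}
```

For example, `n = encode_reset(buf, 0); n += encode_write(buf + n, addr, payload, 2); n += encode_read(buf + n, addr, 4);` builds one 10-byte array holding three commands.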



    After each function is completed it will write a status byte back to output_buffer, which says whether there were any errors, etc. For writes, only 1 byte is returned; for reads as well as write reads, all the bytes read are returned along with the status byte at the end. So if you read 144 bytes you'll get 145 back, the last being the status byte.

    the codes for the status bytes are:
    0x00  good/no errors
    0x05  no ack on writting of address
    0x09  no ack on data write

    If an error occurs during a write, the error will simply be put in the 1-byte return value. However, if an error occurs on a write read or a read, then however many bytes you requested to read still get returned, with the error reported in the status byte, but the data returned will be garbage. In other words, if there was an error, don't use the data returned.
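On the host, the reply rules above boil down to "how many bytes should I expect, and is the last one zero?". A minimal sketch, with invented helper names; the reset command is left out because the post does not say explicitly what it returns.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of reply bytes for one command, per the description above:
   a write returns just the status byte; a read or write read returns the
   requested data followed by the status byte. Returns 0 for any other
   command byte. nread is ignored for writes. */
size_t expected_reply_len(uint8_t cmd, uint8_t nread)
{
    switch (cmd) {
    case 0x0A: return 1;                     /* write */
    case 0x01:                               /* read */
    case 0x07: return (size_t)nread + 1;     /* write read */
    default:   return 0;
    }
}

/* Only 0x00 means the data in front of the status byte can be trusted. */
int status_ok(uint8_t status)
{
    return status == 0x00;
}

const char *status_text(uint8_t status)
{
    switch (status) {
    case 0x00: return "good/no errors";
    case 0x05: return "no ACK on writing of address";
    case 0x09: return "no ACK on data write";
    default:   return "unknown status";
    }
}
```

The status byte is always the last byte of a reply, so for a 144-byte read it sits at index 144 of the 145 bytes returned.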

    The code will run through any commands in the input buffer (called commands[]). If we have caught up to all the commands in the commands buffer, or there are 64 or more bytes ready to send out in the output_buffer, then data will be sent back to the host. So if you really want to, you can request a single command and it will be done and send its return byte/bytes when it's done. This is also set up so that if you wanted to send 50,000 commands, and dumped them to the buffer on the COM port, the PIC would cycle through all 50,000 commands, returning status and data as fast as the I2C bus can handle it.
    When the I2C bus can't keep up with the commands being written into the commands buffer, and the buffer becomes full, the SIE will send a NAK back to the USB host, and the host will try to send the data again next frame. The code will only write to the commands buffer if there is enough space in the buffer for the entire packet it has received. This way, as soon as there is enough space in the buffer and the host tries again, the next command will be written, so 50,000 commands will all eventually get there, and all the write-backs will go back through the output_buffer as fast as the I2C bus can run. There is a possibility of a 1 ms delay due to the framing issue on the USB, but it's reduced as much as possible, as the I2C will empty the commands buffer as much as it can before the next frame, allowing the USB host to send many packets in 1 frame if the commands buffer has the space.

    With the circular buffer scheme it's possible to run the IN, OUT, and I2C all at the same time, for maximum throughput. Also note that the code is non-blocking in every part, so that the USB/CDC code works fine. There would be some performance gain if each EP buffer was actually 256 bytes and I wrote straight to them instead of copying in a for loop. However, getting the SIE to read only one variable 64-byte section of its EP buffer at a time for a send would be difficult, as it always wants to read from the start of the buffer, and I don't know how you'd get it to write to a variable part either; again, it always wants to write starting at the start of the buffer, so circular buffers wouldn't really work. So I just took the time hit and worked with for loops. Even if you wanted to use a buffer that moved the pointers to the buffers up and down in memory rather than always adding to them, you'd have to empty the OUT buffer before you could hand it back over to the SIE, and when you handed the IN buffer back over to the SIE you wouldn't be able to write to it. Also, if the I2C bus does not have pull-ups, or the bus is never released by a slave, etc., then the code will not enter the state machine that controls the I2C and will not return any data, causing a stall. I may add code that causes a timeout and reports an error. This stall will also occur if the slave dies or stops while data is being read or written, as the code will simply poll DataRead() forever.
      Also note that the I2C.h function ReadI2C() was not used, as it has blocking code in it (though you wouldn't know that by looking at the .h file, and you can get to the C code for it; I figured it out when trying to work out why things weren't working quite like they should on the I2C bus). It turns out that this function waits until data is ready and then returns with the data, i.e. it blocks. So instead I control the master control bit and read straight out of SSPBUF.

    A lot of the CDC code was moved into the user.c code so that it could be modified to run with the circular buffers, though the code is pretty much the same and works with entirely the same ideas. And though it's a trivial time saving, I no longer have to call CDC functions; the only thing used is a macro from the CDC.h file.

    changes made to cdc.h:

    extern POINTER pCDCDst;
    //byte getsUSBUSART(char *buffer, byte len);
    //void putrsUSBUSART(const rom char *data);
    //void putsUSBUSART(char *data);


    changes made to main.c:
    //CDCTxService();


    and the way the code is set up the EP out size doesnt matter, however the EP in size needs to be 64, which can be set by setting

    #define CDC_BULK_IN_EP_SIZE      64    

    in usbcfg.h

    If you do not want to use this size of packet, and change this value to something smaller (it can't be larger and still meet the USB 2.0 spec), then you must also change the 64 here to whatever you set it to:

    if((outbytes >= 64) && ((cdc_trf_state == CDC_TX_READY) || (cdc_trf_state == CDC_TX_BUSY_ZLP)))
    {
        mUSBUSARTTxRam(output_buffer,64);
    }

    The nice thing is also that at 400 kHz the maximum bandwidth of the I2C bus is 400,000/9 = 44,444 bytes/s, i.e. only about 44k/s. (Say you're doing a write read and you're past the write and resend-address part; then you just read as much data as you want, and you have to send an ACK for each byte, which is why it's 9 clocks per byte.) That is slower than the max bandwidth we have to send back to the host with, so we are OK!
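The 9-clocks-per-byte estimate above is easy to reproduce for the other bus speeds the reset command offers. A tiny sketch in C; the function names are invented, and start/stop/address overhead is ignored, just as in the estimate itself.

```c
/* Best-case I2C payload rate: 8 data bits plus 1 ACK bit per byte. */
unsigned long i2c_max_bytes_per_sec(unsigned long scl_hz)
{
    return scl_hz / 9;
}

/* Nonzero if a USB IN pipe delivering usb_bytes_per_sec can carry
   everything the I2C bus is able to produce at the given clock. */
int usb_in_keeps_up(unsigned long scl_hz, unsigned long usb_bytes_per_sec)
{
    return i2c_max_bytes_per_sec(scl_hz) <= usb_bytes_per_sec;
}
```

Even with only one 64-byte packet per 1 ms frame (64,000 bytes/s) back to the host, the IN pipe stays ahead of a 400 kHz I2C bus.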




    Also note that there is a change needed to the linker file for this code to work...

    OK, I realized I can upload files (dumb me), so here is an attachment.

    It has the entire project with all the code. If you load the project and compile it, it will use the modified linker file and make a hex file that will work fine with the boot loader.
     
    post edited by shushikiary - 2006/07/21 13:29:59
    #4
    Guest
    Super Member
    • Total Posts : 80503
    • Reward points : 0
    • Joined: 2003/01/01 00:00:00
    • Location: 0
    • Status: online
    RE: Getting USB faster... 2006/07/27 11:52:20 (permalink)
    0
    I just realized that it would be possible to increase the speed of this whole thing by changing the pointer in the USB buffer descriptor. Since the 2 circular buffers are already in USB memory, as set in the linker file, I could move the USB buffers into the same memory space simply by changing the BD pointer value, and then update the pointer to point to the next valid set of data in the buffer. This would work well for the IN EP. The OUT EP might be a little trickier, as it would waste some buffer space: I wouldn't be able to read from the part of the circular buffer owned by the SIE, and I'd have to wait until it handed it back over, which means there would always be a 64-byte section that had to be open in the OUT buffer. I don't like that idea, unless we made the packet size 16, and then only 16 bytes would always have to be open. Most likely I'll leave the OUT and commands buffers the way they are, but I'll modify the IN buffer to share the circular buffer space.

     
    I also figured out why it was only sending one 64-byte packet back per frame: the CDC code is set up in such a way that if you only request 64 bytes to be sent, it will send a zero-length packet after it, causing the transfer to end for that frame. However, if you request more than that to be sent out, then the code will not send the ZLP right away, and will instead end the transfer either with a non-full packet or with a zero-length packet when it reaches the end of the request. I actually accidentally took this into account in the above code listing, as it will set the CDC flags to send more data if it comes into the sendout() function with the ZLP flag set and more data ready. In this way it will send as many 64-byte packets back to the host per frame as the host will allow (on my particular machine, only 11, which is still 704kb/s). I think this is because I have a USB mouse and keyboard connected to the same host.
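The general rule behind this behaviour (a transfer ends on a short packet, so a request that is an exact multiple of the packet size needs a zero-length packet to terminate it) can be written down as a small planning function. This is a sketch of that rule only, not the Microchip CDC code; `struct tx_plan` and `plan_transfer` are invented names.

```c
#include <assert.h>

struct tx_plan {
    unsigned full_packets;  /* packets carrying exactly max_pkt bytes */
    unsigned short_len;     /* length of the final short packet, 0 if none */
    int      needs_zlp;     /* 1 if a zero-length packet must end the transfer */
};

/* Split a transfer of len bytes into packets of at most max_pkt bytes.
   If there is no short packet at the end (len is an exact multiple of
   max_pkt, including len == 0), a ZLP is what marks the end. */
struct tx_plan plan_transfer(unsigned len, unsigned max_pkt)
{
    struct tx_plan p;

    assert(max_pkt != 0);
    p.full_packets = len / max_pkt;
    p.short_len    = len % max_pkt;
    p.needs_zlp    = (p.short_len == 0);
    return p;
}
```

A 64-byte request with 64-byte packets is one full packet plus a ZLP, whereas a 256-byte request is four full packets back to back before the single terminating ZLP.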
    post edited by shushikiary - 2006/07/27 12:16:45
    #5
    EmbeddedMan
    Junior Member
    • Total Posts : 112
    • Reward points : 0
    • Joined: 2005/10/15 15:08:38
    • Status: offline
    RE: Getting USB faster... 2006/07/27 21:50:58 (permalink)
    0
    shushikiary,
        Wow - this is so very, very cool. So, what you're saying is that, if the PIC weren't doing anything else, you could have 704KBytes/s (you wrote it as 704K bits/s, but I know what you mean) back to the host with your new CDC code? And 600KBytes/s from the host to the PIC? Are both of these 'measured' speeds (did you do an experiment and actually see those speeds) or theoretical speeds that 'should' be true?

         I'd love to try out your project and see how fast it can go in both directions simultaneously. Any ideas on what it _should_ be able to do?

    *Brian
    #6
    Guest
    Super Member
    • Total Posts : 80503
    • Reward points : 0
    • Joined: 2003/01/01 00:00:00
    • Location: 0
    • Status: online
    RE: Getting USB faster... 2006/07/28 11:45:10 (permalink)
    0
    Yes, though that's all my host would let me use for bandwidth, but like I said, I have several other USB devices on that host. And yes, I meant Kbytes.
    The way I measured it was by putting a toggle-LED command in the CDC code right where it hands the EP buffer over to the SIE (i.e. tells the SIE to send an IN packet), and another one where the CDC code has finished copying out of the OUT EP into the chosen buffer. I then tied an oscilloscope to both LED lines, and a differential probe to the USB lines. If you know the USB spec well enough, you can find the sync section of each packet transfer pretty easily, and then count the packets on the scope not only by looking at the USB lines, but also by looking at the LEDs. Of course, I don't have a very, very expensive USB analyzer, sadly.
    Then I wrote PC code that wrote as fast as it could to the serial port, and wrote code for the PIC that would write as fast as it could to the USB when switch 3 was pressed. This let me see exactly how fast I could transfer data. I also made sure to empty the OUT buffer as fast as possible. So when doing the bandwidth testing, I was just dumping data to see how fast it could possibly go. Since I know the packet size, and always made all requests divisible by the packet size, I always had full packets. So since I had a scheme to count packets, I would simply multiply the packet size by the number of packets transferred. Doing any data crunching would most likely slow the code down, but that's as fast as I could get it going.

    The code posted above is the full code, which is not set up for bandwidth testing; it is set up to use the I2C bus, so if you send it I2C commands the speed of the USB will be limited by the I2C bus. If you add a line that says countbytes = 0; just before the if((countbytes > 0) || (curntcom != 0)) line of code, then you can always write to the commands buffer, so the PIC will never NAK a packet, and you won't have to worry about what data you write, as you'll never enter the I2C code.

    Then you can write as much data to the circular buffer as you want and it will never count as full; it will just overwrite the data in the buffer. Then, if you put lines of code like this:
    if(outbytes == 0)
        outbytes = 64;
    as the very first 2 lines of code (after the variable definitions) in the sendout() function, it will always send back to the host as fast as the host/PIC can, and when you write to it, you will write to it as fast as you possibly can. That will let you do maximum bandwidth testing.

    hope that explains things.

    Also, the USB 2.0 spec is not very clear about what the bandwidth is for up and down at the same time, as it says "so many ___ sized packets per frame per EP". Now, that doesn't say IN EP or OUT EP, it just says EP. So the theoretical max stated in the USB 2.0 spec is either 1216 kbyte/s in one direction and the same in the other, OR it's 1216 kbytes/s max for both directions combined... I'm not sure. But if you made the code additions I listed above, it would let you test both at the same time, if you have PC code that reads and writes to the serial port as fast as it can.


    I also just recently added more comments to the code and made the I2C code more efficient, as there were small gaps in the timings, e.g. there was a gap of a couple of microseconds between the I2C START condition and the writing of the address. I did my best to get rid of gaps and the like. Here is the updated code; I hope the comments help a little more:
    post edited by shushikiary - 2006/07/28 11:56:32
    #7
    Guest
    Super Member
    • Total Posts : 80503
    • Reward points : 0
    • Joined: 2003/01/01 00:00:00
    • Location: 0
    • Status: online
    RE: Getting USB faster... 2006/07/28 13:56:27 (permalink)
    0
    I just realized something after thinking about it for a little bit. Notice how I only get 600 kbyte/s into the PIC, and a similar rate, though a little more, out of the PIC? I bet what's going on is that the host splits the total bandwidth between the OUT and IN pipes on the single EP, meaning that they will both only get about 600 kbyte/s. Though perhaps this host (like some I've seen talked about on the forum) doesn't completely match the USB 2.0 spec and will allow more than 19 64-byte packets per frame, which is why I get 704 kbyte/s back from the PIC. I don't have anything to prove this right now, but I think it might be a possibility.
    #8
    tty2006
    Starting Member
    • Total Posts : 88
    • Reward points : 0
    • Joined: 2006/09/08 13:43:26
    • Location: 0
    • Status: offline
    RE: Getting USB faster... 2006/09/11 09:38:53 (permalink)
    0
    I downloaded shushikiary's attachment 2 to try a faster datarate and better performance of emulated RS232 communication. To my surprise, the getsUSBUSART() function has been commented out. Any explanation?
     
    Thanks and regards, Young
    #9
    EmbeddedMan
    Junior Member
    • Total Posts : 112
    • Reward points : 0
    • Joined: 2005/10/15 15:08:38
    • Status: offline
    RE: Getting USB faster... 2006/09/13 21:30:14 (permalink)
    0
    shushikiary,
         I really hope you don't mind (please let me know if you do) but I've modified your original project so that I can test the maximum speed. All I have now is a measurement of the speed from the PC to the PIC, and all I can get is 250KB/s. I'm posting my code in hopes that you can look at how I've modified your original and say "well, since you changed this piece right here, it's never going to run as fast as I had it!"

         Included in the project is a basic application written in Liberty Basic (www.libertybasic.com - you can download the free eval version to run my code) that shows I can send about 2MB in 4 seconds. The way that I know everything is getting to the PIC is that I count up the number of bytes received by the PIC, and whenever possible, I send the current count back up to the PC.

         As a bigger picture item here, if it really is possible to get to 600 or 700 KB/s in either direction (and I have no reason to doubt you) _everyone_ would love to see actual proof that we could all run ourselves to see exactly how it works that fast. So I'm trying to put together a set of projects that anyone can build that would allow demonstrable, debuggable proof of the max throughput of Microchip CDC under Windows.

         BTW, for some reason I can't seem to send more than 32KB at a time to the PIC - if I make my strings longer than 32K, Liberty Basic hangs up. Maybe it's LB, maybe Windows, I'm not sure.

         So, any chance you can look at the user.c file in this project and comment on if I'm doing anything that would limit me to a much lower speed than you saw?

    Thanks _so_much_,

    *Brian
    #10
    tty2006
    Starting Member
    • Total Posts : 88
    • Reward points : 0
    • Joined: 2006/09/08 13:43:26
    • Location: 0
    • Status: offline
    RE: Getting USB faster... 2006/09/14 09:34:48 (permalink)
    0
    Hi Brian,
     
    I am also trying shushikiary's code, but no luck. I downloaded your version and found it was configured for the PIC18F2455. My board uses a PIC18F4550, so I had to modify your firmware. However, after I compiled it and wrote it to the PIC, the device is no longer recognized by the PC. I checked and modified the configuration bits to fit my hardware: 96MHz PLL, PLL Src/2, 16MHz osc, EC+PLL+USB.
     
    BTW The Microchip version of RS232 emulation firmware runs fine.
     
    On PC side, I use VC++6, the translation of your code is somewhat like this:
     
    m_Comm.Open(6, 921600, 'n');
    int len = len;
    char tx[3200], rx[3200];
    for(i=0; i<3200; i++) {
        tx='%';
    }
    for(i=0; i<2000000/len; i++) {
        m_Comm.SendData(tx, len);
        m_Comm..ReadData(rx, len);
    }
     
    m_Comm.Close();
     
    However, I couldn't make it work. Any hint?
     
    Thanks and regards, Young
    #11
    tty2006
    Starting Member
    • Total Posts : 88
    • Reward points : 0
    • Joined: 2006/09/08 13:43:26
    • Location: 0
    • Status: offline
    RE: Getting USB faster... 2006/09/14 09:40:08 (permalink)
    0
    Interesting. The code posted is a bit different from what I wrote. Let me try it again:
     
    m_Comm.Open(6, 921600, 'n');
    int len = len;
    char tx[3200], rx[3200];
    for(i=0; i<3200; i++) {
        tx[i]='%';
    }
    for(i=0; i<2000000/len; i++) {
        m_Comm.SendData(tx, len);
        m_Comm.ReadData(rx, len);
    }
     
    m_Comm.Close();

    #12
    EmbeddedMan
    Junior Member
    • Total Posts : 112
    • Reward points : 0
    • Joined: 2005/10/15 15:08:38
    • Status: offline
    RE: Getting USB faster... 2006/09/14 10:42:54 (permalink)
    0
    tty2006,
         Yah, my boards are all 2455s, not the 'big brother' you guys all have. :) I'm working on this as an update to my little UBW (USB Bit Whacker) project. (http://www.greta.dhs.org/UBW) so I'm just using those boards.

         I don't know what your code issue is - I just converted the project from the original 4550 by changing which header files and linker scripts were used. It was pretty simple. I'm sure to convert back you just undo those changes.

        When I build the project, I don't have any config bits as my bootloader has all of that in it.

    *Brian
    #13
    tty2006
    Starting Member
    • Total Posts : 88
    • Reward points : 0
    • Joined: 2006/09/08 13:43:26
    • Location: 0
    • Status: offline
    RE: Getting USB faster... 2006/09/15 12:08:14 (permalink)
    0
    Thanks, Brian.
     
    I've modified your version and made it work with my hardware. The communication between PC and PIC works most of the time. However, once it goes wrong, the communication stays blocked. No more PIC->PC transfers can be made, while PC->PIC seems to be fine.
     
    I also used "Advanced USB Port Monitor v2.1.0.12" to check the data traffic. There are tens of successful transactions until one reports "buffer overrun". The following is a copy of the events around the failure:
     
    [332] URB_FUNCTION_BULK_OR_INTERRUPT_TRANSFER 20060915145929.415 (+0)
    Pipe handle: 0x81E4A784
    Transfer flags: 0x00000003 (USBD_TRANSFER_DIRECTION_IN, USBD_SHORT_TRANSFER_OK)
    Transfer buffer: 0x823CC004

    [333] URB_FUNCTION_BULK_OR_INTERRUPT_TRANSFER 20060915145929.415 (+0)
    Pipe handle: 0x81E4A764
    Transfer flags: 0x00000002 (USBD_SHORT_TRANSFER_OUT, USBD_SHORT_TRANSFER_OK)
    Transfer buffer: 0x824EE620


    [333] URB_FUNCTION_BULK_OR_INTERRUPT_TRANSFER (SUCCESS/0x00000000) 20060915145929.415 (+0)
    IRP status: 0x00000000 (STATUS_SUCCESS)


    [332] URB_FUNCTION_BULK_OR_INTERRUPT_TRANSFER (SUCCESS/0x00000000) 20060915145929.415 (+0)
    IRP status: 0x00000000 (STATUS_SUCCESS)


    [334] URB_FUNCTION_BULK_OR_INTERRUPT_TRANSFER 20060915145929.415 (+0)
    Pipe handle: 0x81E4A784
    Transfer flags: 0x00000003 (USBD_TRANSFER_DIRECTION_IN, USBD_SHORT_TRANSFER_OK)
    Transfer buffer: 0x823CC004


    [335] URB_FUNCTION_BULK_OR_INTERRUPT_TRANSFER 20060915145929.415 (+0)
    Pipe handle: 0x81E4A764
    Transfer flags: 0x00000002 (USBD_SHORT_TRANSFER_OUT, USBD_SHORT_TRANSFER_OK)
    Transfer buffer: 0x824EE620


    [335] URB_FUNCTION_BULK_OR_INTERRUPT_TRANSFER (SUCCESS/0x00000000) 20060915145929.415 (+0)
    IRP status: 0x00000000 (STATUS_SUCCESS)


    [334] URB_FUNCTION_BULK_OR_INTERRUPT_TRANSFER (ERROR: BUFFER_OVERRUN/0xC000000C) 20060915145929.415 (+0)
    IRP status: 0xC0000001 (ERROR)


    [336] URB_FUNCTION_BULK_OR_INTERRUPT_TRANSFER 20060915145929.415 (+0)
    Pipe handle: 0x81E4A764
    Transfer flags: 0x00000002 (USBD_SHORT_TRANSFER_OUT, USBD_SHORT_TRANSFER_OK)
    Transfer buffer: 0x824EE620


    [336] URB_FUNCTION_BULK_OR_INTERRUPT_TRANSFER (ERROR: DEV_NOT_RESPONDING/0xC0000005) 20060915145929.425 (+10)
    IRP status: 0xC0000001 (ERROR)

     
    I realized that you mostly send data from PC to PIC and receive only two bytes from PIC to PC. I assume both directions should be the same, but I will test it later.
     
    Thanks and Regards, Young
    #14
    tty2006
    Starting Member
    • Total Posts : 88
    • Reward points : 0
    • Joined: 2006/09/08 13:43:26
    • Location: 0
    • Status: offline
    RE: Getting USB faster... 2006/09/15 13:21:48 (permalink)
    0
    When I change to 1 byte per packet in both directions, the communication blocking problem happens less frequently, but it still happens randomly, about once every 10 - 20 kbytes sent.
     
    Any hint?
     
    Thanks and regards, Young
    #15
    EmbeddedMan
    Junior Member
    • Total Posts : 112
    • Reward points : 0
    • Joined: 2005/10/15 15:08:38
    • Status: offline
    RE: Getting USB faster... 2006/09/15 19:26:49 (permalink)
    0
    tty2006,
        What happens when the blocking problem pops up? Does the USB connection crash? Do you need to reset the PIC? I've run many, many megabytes through from my PC to my PIC, and I've yet to have a problem (that I can see). I'm not running a USB analyzer like you are, however. Could I be having this problem and just not seeing it?
         You're not using the Liberty Basic application I wrote to send data from the PC, are you?

    *Brian
    #16
    EmbeddedMan
    Junior Member
    • Total Posts : 112
    • Reward points : 0
    • Joined: 2005/10/15 15:08:38
    • Status: offline
    RE: Getting USB faster... 2006/09/15 19:29:19 (permalink)
    0
    OK, so I think I'm starting to understand things a bit more here.

    I've attached a project that is only slightly modified from the one above that proves I can get 672KB/s down into the PIC. The problem is that in order to get this speed, I had to take out the code that actually does something with that data. I just throw it all away. If you run the project and the Liberty Basic application, it will tell you exactly what speed you can get from your PC down to your PIC.

    *Brian
    #17
    EmbeddedMan
    Junior Member
    • Total Posts : 112
    • Reward points : 0
    • Joined: 2005/10/15 15:08:38
    • Status: offline
    RE: Getting USB faster... 2006/09/15 19:41:44 (permalink)
    0
    So now here is the _exact_ same code, except that I've enabled the loop that copies the 64 bytes/packet from the USB buffer to the 'application' buffer. Note, we still don't _do_ anything with the data we're receiving yet other than copy it over, but we're taking our first step by copying it out. This is an 'operation' on the data coming from the PC.

    Now, we get only 343KB/s.
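
For readers without the attachment, the copy step described above might look roughly like the following. This is only a sketch: `usb_rx`, `app_buf`, and `copy_packet` are hypothetical names, not the ones used in the attached user.c, and the real project may organize its buffers differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PACKET_SIZE 64u  /* bulk packet size discussed in this thread */

/* Hypothetical names: usb_rx stands in for the endpoint buffer that the
   SIE fills, app_buf for the application's own buffer. */
size_t copy_packet(const uint8_t *usb_rx, size_t len,
                   uint8_t *app_buf, size_t app_used, size_t app_size)
{
    size_t i;

    assert(len <= PACKET_SIZE);
    assert(app_used <= app_size);

    if (len > app_size - app_used)
        return app_used;              /* no room: copy nothing, level unchanged */

    for (i = 0; i < len; i++)         /* one load, one store, plus loop overhead per byte */
        app_buf[app_used + i] = usb_rx[i];

    return app_used + len;            /* new fill level of the application buffer */
}
```

Even a loop this small costs several instruction cycles for every byte, which fits the observation that simply enabling the copy has a large effect on the measured rate.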

    What this means to me is that the speed of data transfer is limited by how much PIC CPU time there is to process the data, not by the USB system or Windows. In other words, yes, you can get 700KB of data from the PC to the PIC per second, but it doesn't do you _any_ good since you don't have any spare CPU cycles to actually use the data on the PIC!

    Let's do a little thought experiment. Let's say we could get 1MB/s from the PC to the PIC. Let's say our PIC ran at 40MHz. How many instruction cycles would we have between each byte that came down from the PC? (Yes, I know it comes as packets, but bear with me.) 10. 10 lousy cycles (a PIC18 takes four clocks per instruction cycle, so 40MHz is 10 MIPS). In those 10 cycles, you need to 'do what you do' with that byte, plus some small percentage of the overhead of servicing the SIE, as well as any other code (ISRs, etc.) running on the PIC. My point is that the PICs just aren't fast enough (clock speed wise) to actually use that much data in any sophisticated manner. Even with 500KB/s and 48MHz, you still only get 24 cycles per byte.
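
The arithmetic in this thought experiment can be written as a small helper. This is a sketch under two assumptions: the PIC18 runs one instruction cycle per four oscillator clocks, and "1MB/s" is read as 1,000,000 bytes per second. The function name `cycles_per_byte` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a PIC18 core needs four oscillator clocks per instruction
   cycle, so 40 MHz gives 10 MIPS and 48 MHz gives 12 MIPS. */
#define CLOCKS_PER_INSTRUCTION_CYCLE 4u

/* Instruction cycles available for each received byte, rounded down. */
uint32_t cycles_per_byte(uint32_t clock_hz, uint32_t bytes_per_second)
{
    assert(bytes_per_second > 0);
    return (clock_hz / CLOCKS_PER_INSTRUCTION_CYCLE) / bytes_per_second;
}
```

Calling `cycles_per_byte(40000000, 1000000)` gives 10, and `cycles_per_byte(48000000, 500000)` gives 24, the two figures quoted in the post.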

    Now, things would be different if you could have the SIE dump 4K at a time into a buffer, and then have some code that rips through that buffer all at once. Even at 12MIPS, we would have a much more useful system.

    So, I've satisfied myself that the theoretical speed of 700KB/s is probably reachable, that the real world speed of 650KB/s is easily reachable, but is useless to me. Instead, we have to optimize the heck out of the code that actually uses this data in order to see any semi-fast speeds out of the CDC software.

    Should I be thrown in the loony bin for thinking this way?

    *Brian
    post edited by EmbeddedMan - 2006/09/15 19:46:34
    #18
    tty2006
    Starting Member
    • Total Posts : 88
    • Reward points : 0
    • Joined: 2006/09/08 13:43:26
    • Location: 0
    • Status: offline
    RE: Getting USB faster... 2006/09/16 14:42:45 (permalink)
    0
    Thanks, Brian.
     
    1. Nothing like a crash happens in my USB-RS232 emulation test program. The main sign is that the PIC->PC data stream stops arriving when the PIC->PC direction is blocked. However, if I run HyperTerminal (Windows utility), blocking of the PC->PIC direction will freeze the application.
     
    2. The PC->PIC direction is blocked much less frequently. This may be because I send 1 byte of data out (PC->PIC) and 32 bytes in (PIC->PC).
     
    3. It seems to me that bidirectional data communication is the cause of the blocking, because once one direction is blocked, the other direction never gets blocked. (Or did I just not have the luck to see it?) I will do more experiments to explore this.
     
    4. My interest is in receiving a large amount of data from a PIC device, like a video camera. On the other hand, I think you are trying to send a large amount of data from the PC to a PIC device. Can you explain why you need to send lots of data to the PIC?
     
    I need more understanding of your comments. Please forgive me and correct me if the following are wrong:
     
    5. According to the full-speed USB specification, the data transfer rate is: 64 bytes x 19 packets/msec x 1000 msec/sec = 1.216Mbytes/sec. Therefore, the PIC needs to handle each byte in 0.822usec in order to achieve that speed.
     
    6. As my PIC runs at 48MHz, each byte of data needs to be handled within about 10 instruction cycles, or about 4 - 6 instructions, in order to reach 1.2Mbytes/sec. That is a very tough job, especially if the program is written in C. However, it is possible to achieve such a speed if the program is written in ASM and carefully optimized.
     
    7. There is almost nothing else the PIC can do besides data transfer at 1.2Mbytes/sec, if that speed is possible at all. It may be possible to achieve such a speed in store/load operations, like mass storage devices. It is not suitable for audio/video processing, where a codec/DSP is involved and is very instruction-consuming.
     
    8. For a 600Kbytes/sec transfer rate, the job becomes more feasible, as the PIC can spend 8 - 12 instructions per byte of data. However, it is still not very easy if the program is written in C.
     
    9. Your achievement of 343Kbytes/sec is great, considering your code is in C. You should be able to increase it significantly if you rewrite it in ASM.
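
The bandwidth figure in the list above can be reproduced with a few lines of integer arithmetic. This is a sketch only: the 19 packets per 1 ms frame is the number quoted in this post, taken as given, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second for a given packet size and packets per 1 ms frame.
   Full-speed USB has 1000 frames per second. */
uint32_t bulk_bytes_per_second(uint32_t packet_bytes, uint32_t packets_per_frame)
{
    assert(packet_bytes <= 64u);   /* full-speed bulk packets are at most 64 bytes */
    return packet_bytes * packets_per_frame * 1000u;
}

/* Time budget per byte in nanoseconds at the given rate, rounded down. */
uint32_t ns_per_byte(uint32_t bytes_per_second)
{
    assert(bytes_per_second > 0);
    return 1000000000u / bytes_per_second;
}
```

With 64-byte packets, `bulk_bytes_per_second(64, 19)` gives 1216000 bytes per second, and `ns_per_byte(1216000)` gives 822 ns per byte. The one-packet-per-frame case from the start of the thread is `bulk_bytes_per_second(64, 1)`, or 64000 bytes per second.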
     
    Regards, Young
     
    #19
    tty2006
    Starting Member
    • Total Posts : 88
    • Reward points : 0
    • Joined: 2006/09/08 13:43:26
    • Location: 0
    • Status: offline
    RE: Getting USB faster... 2006/09/16 16:50:32 (permalink)
    0
    It's almost certain that the blocking is caused by some kind of collision during data transfer. Therefore I suspect that the CDC sample code from Microchip has some defect / bug that does not prevent data transfers from colliding. More specifically, I think that the data for either direction must wait for an empty "train" carrying no in/out data before it can be "loaded" and transferred.
     
    I will test more and check out which flag I should watch for if there is any data coming in. Any suggestion?
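
The "empty train" idea can be pictured as a guard around the send path: load the IN buffer only when the previous packet has been collected by the host. The sketch below is a host-side model with entirely hypothetical names (`in_endpoint_t`, `try_send`, `on_in_complete`). It is not Microchip's CDC API, and the actual flag to watch in the firmware still needs to be identified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IN_PACKET_SIZE 64u

/* Hypothetical model of one IN (PIC->PC) endpoint buffer. */
typedef struct {
    uint8_t data[IN_PACKET_SIZE];
    size_t  len;
    bool    busy;   /* true while the USB hardware owns the buffer */
} in_endpoint_t;

/* Try to queue one packet; refuse if the previous one is still pending. */
bool try_send(in_endpoint_t *ep, const uint8_t *src, size_t len)
{
    assert(len <= IN_PACKET_SIZE);
    if (ep->busy)
        return false;          /* the "train" has not left yet: caller retries later */
    memcpy(ep->data, src, len);
    ep->len  = len;
    ep->busy = true;           /* hand the buffer over to the hardware */
    return true;
}

/* Called when the host has collected the packet and the buffer is free again. */
void on_in_complete(in_endpoint_t *ep)
{
    ep->busy = false;
}
```

The design point is that the application never overwrites a buffer that is still waiting to be sent. A refused `try_send` leaves the pending packet untouched, and the caller simply tries again on a later pass through its main loop.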
     
     
    #20