32 mb SDRAM via EBI -- WORKS!
The title might be misleading because there's a lot of work involved, but I wanted others to know that it is indeed possible to get 20 ns performance for linear access from a $3 external memory chip.
When weighing my memory options recently I was reminded how expensive SRAM is. A fast SRAM costs nearly $20, which almost doubles my silicon cost. Unfortunately I need a write speed of 20 ns, so I had no choice.
After thinking about it, I decided to drop a $5 FPGA and a $3 SDRAM chip on my board and learn Verilog in the process. After all, $8 covers the memory, and the FPGA comes along for free (to fix all the other issues I'm having with this blasted chip). Seemed like a good deal.
While I can't release my design, I think what's more useful to you out there is knowing that it's not hard to do this and that the EBI works "good enough" to do this.
To get it to work properly, I took advantage of a few EBI "features" that weren't really intended for general-purpose use, but they were key to making this happen.
The first thing to know is that SDRAM takes time to fetch a burst of 8 words. That time is averaged over the 8 words, though, so it works out to 20 ns per BYTE on a 50 MHz bus, or 50 MB/sec. The unfortunate thing is that EBI has tons of bugs that slow it down to this level. It is very possible to reach 200 MB/sec with this bus design (EBI), but these bugs kill performance.
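To make the averaging concrete, here is a small Python model. It is a sketch under stated assumptions, not a measurement: it assumes a 16-bit EBI data bus (so a 16-byte cache line is 8 words), one word per clock once a burst is flowing, and a fixed per-burst overhead whose value is a hypothetical parameter. The function names are made up for illustration.

```python
# Illustrative model of burst averaging on the EBI. All parameters are
# assumptions for the sake of the arithmetic, not PIC32 or SDRAM datasheet values.


def effective_ns_per_word(overhead_clocks: int, burst_words: int, bus_mhz: float) -> float:
    """Average time per word when a fixed setup overhead is paid once per burst.

    overhead_clocks: clocks spent before the first word arrives (assumed)
    burst_words:     words moved per burst (8 in the post)
    bus_mhz:         EBI clock frequency
    """
    clock_ns = 1000.0 / bus_mhz
    # Assumes one word per clock once data is flowing.
    total_clocks = overhead_clocks + burst_words
    return total_clocks * clock_ns / burst_words


def throughput_mb_per_s(ns_per_word: float, bytes_per_word: int) -> float:
    """Convert an average time per word into MB/s (1 MB = 10**6 bytes)."""
    return bytes_per_word * 1000.0 / ns_per_word


if __name__ == "__main__":
    ns = effective_ns_per_word(overhead_clocks=8, burst_words=8, bus_mhz=50.0)
    print(ns, "ns/word", ns / 2, "ns/byte", throughput_mb_per_s(ns, 2), "MB/s")
```

With a hypothetical 8-clock overhead the model gives 40 ns per 16-bit word, i.e. 20 ns per byte or 50 MB/s, matching the figure above. The same function also shows why single-word transactions are so costly: with a 6-clock overhead, `effective_ns_per_word(6, 1, 50.0)` is 140 ns per word, while an 8-word burst brings it down to 35 ns.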
Anyway, EBI is configured as follows:
1) 50 MHz, SRAM mode. (It makes me cringe that they did this: the memory type is 1 for SRAM or 2 for NOR, and this is supposed to be general purpose!)
2) Page read. This is KEY because it lets you use the tRC delay to hide the SDRAM read: the address is presented on the bus, tRC elapses, and then you hand out the result.
3) CACHE configured in the TLB. You have to map the address space of the EBI window using the TLB. Turning on the cache makes the transactions (always) appear as bursts of one cache line. This is key: if it were doing 1 byte/word at a time, performance would be terrible, because you'd pay the tRC read latency (6 or so clocks) per word instead of once per set of 8!
4) EBI clock generated inside the FPGA. This is a project in itself, since you can't output PBCLK8 (the EBI clock). EBI data has to be clocked in on the rising edge of this clock, and there's no way to get it out of the chip! And if you output another clock, it will have an arbitrary phase shift. The trick is to run a 200 MHz clock sourced from a reference clock output from the PIC (use your FPGA's PLL to do this), then divide it down to get the 50 MHz EBI clock. To get it in phase, reset the divider when the EBI's chip select falls. This should line up all transitions from the PIC on the falling clock edge; you read on the rising edge.
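The divider trick can be modelled in software before committing it to Verilog. The Python sketch below steps a divide-by-4 counter once per 200 MHz tick and restarts it whenever the active-low chip select (`cs_n`, an assumed name) goes from 1 to 0. The reset point is chosen so the divided clock is low starting on the tick where chip select falls and rises half a period later, where data would be sampled. It is a behavioural sketch with a hypothetical function name, not synthesizable HDL, and it ignores synchronizer latency inside the FPGA.

```python
def divided_clock(cs_n_samples: list[int], divide_by: int = 4) -> list[int]:
    """Simulate a 200 MHz divide-by-N counter that restarts on a CS# falling edge.

    cs_n_samples: chip-select level sampled once per fast clock tick (1 = idle).
    Returns the divided clock level for each tick: low for the first half of
    each period and high for the second half, so after a reset the rising
    edge lands in the middle of the PIC's output cycle.
    """
    count = 0
    prev_cs_n = 1
    levels = []
    for cs_n in cs_n_samples:
        if prev_cs_n == 1 and cs_n == 0:
            count = 0  # chip select just fell: realign the divider phase
        levels.append(0 if count < divide_by // 2 else 1)
        count = (count + 1) % divide_by
        prev_cs_n = cs_n
    return levels


if __name__ == "__main__":
    # Chip select falls on tick 3, partway through a divider period.
    print(divided_clock([1, 1, 1, 0, 0, 0, 0, 0, 0, 0]))
```

In the example, chip select falls partway through a divider period, so the high phase just before the reset is cut to a single tick. In this model only that one period is affected; from then on each rising edge sits in the middle of a 20 ns cycle.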
The last thing is that, for both read and write EBI transactions, I keep a "window" of time free ahead of the corresponding SDRAM transaction. This slot is used to do refresh when needed, and could also be used as a 2nd port. The sad thing is that running without the /READY line on EBI forces you to "fix" the timing so it never changes, so this slot has to be there no matter what. This slows things down a bit.
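The fixed refresh window can be sketched as a tiny scheduler. In the Python model below, every EBI burst is preceded by a leading slot that is always spent, either on a refresh or on nothing, so the timing the PIC sees never varies. Refresh demand is driven by elapsed time; the 7800 ns interval in the example is a typical SDRAM figure (64 ms spread over 8192 rows is about 7.8 µs) and is an assumption here, so check your part's datasheet. `SlotScheduler` and its methods are hypothetical names.

```python
class SlotScheduler:
    """Fixed-timing slot scheduler: each EBI burst gets a leading slot.

    The leading slot is always consumed, so the EBI-facing timing stays
    constant. It performs a refresh when one is owed, otherwise it idles
    (a second port could use it instead).
    """

    def __init__(self, refresh_interval_ns: float) -> None:
        self.refresh_interval_ns = refresh_interval_ns
        self.elapsed_ns = 0.0
        self.refreshes_owed = 0

    def advance(self, ns: float) -> None:
        """Account for elapsed time and queue any refreshes that fell due."""
        self.elapsed_ns += ns
        while self.elapsed_ns >= self.refresh_interval_ns:
            self.elapsed_ns -= self.refresh_interval_ns
            self.refreshes_owed += 1

    def schedule(self, request: str) -> tuple[str, str]:
        """Return (leading slot, SDRAM transaction) for one EBI burst."""
        if self.refreshes_owed > 0:
            self.refreshes_owed -= 1
            return ("REFRESH", request)
        return ("IDLE", request)


if __name__ == "__main__":
    sched = SlotScheduler(refresh_interval_ns=7800.0)
    print(sched.schedule("READ"))
    sched.advance(8000.0)
    print(sched.schedule("WRITE"))
    print(sched.schedule("READ"))
```

Because this model only refreshes inside a leading slot, a real design would also need to issue refreshes while the bus is idle; that part is left out here.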
Using /READY is unfortunately a non-starter: it costs about 3 clocks per word, which is terrible performance. But with the 8-word burst for reads and writes I am getting about 40 MB/sec for both, UNLESS I hit the tRC bug below. When that happens, reads drop to about 15 MB/sec. I am trying to see if Microchip will fix this, but if the past is any guide, probably not. BUT if your application (like mine) keeps the fast reads and writes linear, you won't hit this and will keep the 40-50 MB/sec rate.
This is pretty much all you need to know. Armed with this and the timing diagrams of SDRAM and some good FPGA tools you can do it too.
Now for the EBI bugs. Just about everything on this chip is a bit brain damaged, and EBI is no exception:
1) If a read is performed on a cache line that isn't loaded, and the address isn't on a 16-byte boundary, you pay for tRC twice. The read actually starts at the requested address (not rounded down!) and wraps around from the high end of the 16-byte block back to its low end, and when it does this you pay for tRC again! This is super annoying and reduces read performance.
2) There are spurious read transactions during a cache write. To filter these, just assume that once a write comes in you will get 8 words, and ignore any reads until they have arrived.
3) The EBI documentation claims that you can clock it at 100 MHz as long as you observe the minimum timing in the family datasheets. I've found that running at 100 MHz gives flaky performance; the address lines, for example, glitch. I haven't determined for sure that it isn't a signal integrity (SI) issue on my board, but I don't think it is.
4) There are strange delays between every 2 words written. This may be a bandwidth limitation inside the PIC32, but it isn't documented.
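The first two bugs can be modelled in a few lines of Python, which may help when designing the FPGA side. The sketch below assumes 2-byte (16-bit) words, so a 16-byte line is 8 words, and assumes each contiguous run of addresses in a line fill costs one tRC, matching bug 1 as described. For bug 2 it assumes every cache write delivers exactly 8 words; if a shorter write ever occurred, this filter would keep discarding reads until 8 writes had been seen. All names (`wrapped_burst`, `trc_count`, `WriteBurstFilter`) are hypothetical.

```python
LINE_BYTES = 16  # cache line size from the post
WORD_BYTES = 2   # assumption: 16-bit EBI data bus, so 8 words per line


def wrapped_burst(addr: int) -> list[int]:
    """Addresses of a line fill that starts at addr and wraps within its 16-byte line."""
    if addr % WORD_BYTES:
        raise ValueError("address must be word aligned")
    base = addr & ~(LINE_BYTES - 1)
    start = addr - base
    return [base + (start + i * WORD_BYTES) % LINE_BYTES
            for i in range(LINE_BYTES // WORD_BYTES)]


def trc_count(addresses: list[int]) -> int:
    """Count contiguous runs; in this model each run pays tRC once."""
    runs = 1
    for prev, cur in zip(addresses, addresses[1:]):
        if cur != prev + WORD_BYTES:
            runs += 1  # address jumped back to the start of the line
    return runs


class WriteBurstFilter:
    """Ignore spurious reads that show up in the middle of a cache write."""

    def __init__(self, burst_words: int = 8) -> None:
        self.burst_words = burst_words
        self.writes_left = 0

    def accept(self, op: str) -> bool:
        """Return True if the FPGA should act on this EBI cycle ('R' or 'W')."""
        if op == "W":
            if self.writes_left == 0:
                self.writes_left = self.burst_words  # a new write burst starts
            self.writes_left -= 1
            return True
        if op == "R":
            # Reads during an unfinished write burst are treated as spurious.
            return self.writes_left == 0
        raise ValueError(f"unknown op {op!r}")


if __name__ == "__main__":
    for a in (0x1000, 0x1006):
        seq = wrapped_burst(a)
        print(hex(a), [hex(x) for x in seq], "tRC paid:", trc_count(seq))
    f = WriteBurstFilter()
    ops = ["W", "R", "W", "W", "R", "W", "W", "W", "W", "W", "R"]
    print([f.accept(op) for op in ops])
```

In this model a line fill that starts on a 16-byte boundary is a single run, which is consistent with the observation that linear access patterns avoid the slowdown.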
Anyway... it IS possible, but Microchip threw some challenges in there for ya :)
post edited by tj256 - 2015/09/08 07:26:23