Spartan 6 SP605 VHDL external ram usage? - storage

I'm new to VHDL and have run into an issue with my project. I'm trying to build an FPGA design that converts from one communication protocol to another, and for this purpose it would be useful to be able to store (hopefully multiple) packets before converting.
Previously I tried to store this data in arrays, but it quickly became apparent that this takes up far too much space on the FPGA. Therefore, I have been searching for a way to store the data in the DDR3 RAM on the SP605 board (http://www.xilinx.com/support/documentation/boards_and_kits/xtp067_sp605_schematics.pdf, page 9). However, I cannot find instructions on how to write data to it or read data from it. I'm trying to store one 8-bit std_logic_vector per clock cycle, to be accessed later.
Can anyone advise me on how to proceed?

Xilinx offers an IP Core generator. This IP catalog contains a Memory Interface Generator (MIG) which generates an IP Core to access different memory types. Configure this core for DDR3.
Writing a DDR3 controller in VHDL is not a project for a beginner, nor even for an experienced designer.
The state machine is simple and well known, but the calibration logic takes a great deal of effort to implement.
You should consider a caching or burst read/write technique, because DDR memory cannot be accessed on every clock cycle.
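The burst idea can be modelled in software before committing to hardware. The following is a minimal Python sketch, not VHDL and not the MIG interface: it collects one byte per "cycle" in a FIFO and emits a full burst once enough bytes are buffered. The burst size of 8 bytes and the name `pack_bursts` are assumptions for illustration; the real figure depends on how the MIG core is configured.

```python
from typing import Iterable, Iterator, List

# Assumption for illustration only; the real value depends on the
# MIG user-port width and burst length chosen in the core generator.
BURST_BYTES = 8


def pack_bursts(stream: Iterable[int], burst_bytes: int = BURST_BYTES) -> Iterator[List[int]]:
    """Collect one byte per 'cycle' and emit a burst when the FIFO is full."""
    fifo: List[int] = []
    for byte in stream:
        if not 0 <= byte <= 0xFF:
            raise ValueError("each word must fit in 8 bits")
        fifo.append(byte)
        if len(fifo) == burst_bytes:
            yield fifo
            fifo = []
    if fifo:
        # Pad the final partial burst with zeros so the write stays burst-sized.
        yield fifo + [0] * (burst_bytes - len(fifo))


# Twenty incoming bytes become three bursts; the last one is zero-padded.
bursts = list(pack_bursts(range(20)))
```

In hardware the same role would typically be played by a small block-RAM FIFO between the packet receiver and the memory controller's user port.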

Related

What does the Streaming stand for in Streaming SIMD Extensions (SSE)?

I've looked everywhere and I still can't figure it out. I know of two associations you can make with streams:
Wrappers for backing data stores meant as an abstraction layer between consumers and suppliers
Data becoming available with time, not all at once
SIMD stands for Single Instruction, Multiple Data; in the literature the instructions are often said to come from a stream of instructions. This corresponds to the second association.
However, I don't exactly understand why there is a "Streaming" in Streaming SIMD Extensions (or in Streaming Multiprocessor, for that matter). The instructions are coming from a stream, but can they come from anywhere else? Do we or could we have just SIMD extensions or just multiprocessors?
Tl;dr: can CPU instructions be non-streaming, i.e. not come from a stream?
SSE was introduced as an instruction set to improve performance in multimedia applications. The aim of the instruction set was to quickly stream in some data (some part of a DVD to decode, for example), process it quickly (using SIMD), and then stream the result to an output (e.g. the graphics RAM). Almost all SSE instructions have a variant that reads 16 bytes from memory. The instruction set also contains instructions to control the CPU cache and hardware prefetcher. Beyond that, "Streaming" is pretty much just a marketing term.

Altera DE10 standard writing to DDR using FPGA

I need to make an FPGA module that can read from and write to the DDR memory of the DE10-Standard FPGA board, but I have no idea where to start. Can someone please point me in the right direction?
Thank you.
Ideally you will take a demo project built for your exact board and adapt it to your needs.
The generic way: download the "SystemBuilder" software from the Terasic site (Terasic is the manufacturer of the DE10 boards). Find your board, open the "Resources" section and download what you need.
Run SystemBuilder and select the interfaces you want instantiated, including the hard processor system (HPS). SystemBuilder will create a template project with the pin locations assigned.
Within your project, run Qsys (Platform Designer), instantiate the HPS core, configure the required interfaces (FPGA-to-HPS SDRAM Interface), and fill in the settings for the SDRAM chips used. I can't remember whether I used a "Golden hardware reference design" project, but I had the numbers needed to configure the DDR controller for the exact chips on another type of board (DE10-Nano).
Run the Tcl script "hps_sdram_p0_pin_assignments.tcl" to complete the DDR3 SDRAM pin assignment (I/O standards applied, etc.). On the FPGA side you will then have a memory-mapped interface, and you access the DDR3 as if it were a static RAM, while respecting the wait request whenever the controller asserts it.
It's very likely you'll find something pre-built in one of the big archives in the "Resources" section of the Terasic page for your board.
In any case, you should go through some tutorials on HPS instantiation. It is a huge topic that involves, among other things, preparing the preloader that runs before Linux.
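The wait-request behaviour mentioned above can be illustrated with a toy model. This is a Python sketch under assumptions, not generated HDL: it assumes an Avalon-MM-style handshake in which the master holds its address and data steady for as long as the slave asserts `waitrequest`. The names `ToyMemorySlave` and `write_word` are hypothetical.

```python
class ToyMemorySlave:
    """Toy model of a memory-mapped slave that sometimes asserts waitrequest."""

    def __init__(self, busy_pattern):
        self.mem = {}
        # True means waitrequest is asserted on that cycle (pattern repeats).
        self._busy = list(busy_pattern)
        self._cycle = 0

    def waitrequest(self) -> bool:
        """Sample waitrequest for the current cycle and advance one cycle."""
        busy = self._busy[self._cycle % len(self._busy)]
        self._cycle += 1
        return busy


def write_word(slave: ToyMemorySlave, address: int, data: int) -> int:
    """Hold the write until waitrequest is low; return the cycles it took."""
    cycles = 1
    while slave.waitrequest():
        cycles += 1  # master keeps address and data stable while stalled
    slave.mem[address] = data
    return cycles


# The slave stalls for two cycles, then accepts the write on the third.
slave = ToyMemorySlave([True, True, False])
cycles = write_word(slave, 0x10, 0xAB)
```

The point of the model is that the transfer only completes on a cycle where `waitrequest` is low, so user logic cannot assume one access per clock.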

Data transfer from one file to other in Xilinx

I haven't worked with block memories in Xilinx devices before. I want to put some simple numbers in a text file and save it, then take those numbers, multiply them by 2 and save the results in another file. I have written VHDL code, but since this involves I/O I think I have to use block RAM, and I have no clue how. I have read tutorials and the datasheet but still can't figure out how to do my task using BRAM. I am pasting my code with this question. Please let me know whether some sort of programming is needed for the BRAM. When I try to compile the code, I get an error saying that inFIle does not exist.
VHDL is not a programming language.
There are some programming-language-like features in VHDL (for example file IO), but these are only there to help write testbench code for simulation. When writing VHDL, don't think about coding software. Think about the hardware structure that you want to describe.
In hardware, there is no such thing as a "file". There is a hardware interface consisting of fixed signals (address, data, enables) to, e.g., a block RAM. You can read a word of data from the memory by specifying an address, but this will always be raw data.
To get the raw data into the block RAM, there will pretty much always be some software process running on an embedded or external CPU. The software running on the CPU can interpret the file system, and pass the relevant information for hardware-assisted processing to the hardware core (e.g., starting address in memory of data to be processed, length of data, parameterization of algorithm, etc.). Alternatively, there may be streaming data sources and sinks that pass through the hardware for processing.
This is what hardware is best at: processing a continuous stream of data and performing the same set of calculations on each data word.
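As a software reference for the "same calculation on every word" idea, the multiply-by-2 task from the question can be modelled as a stream transform. This is a Python sketch, not synthesizable VHDL; the 8-bit word width and the name `double_stream` are assumptions chosen for illustration.

```python
from typing import Iterable, Iterator


def double_stream(words: Iterable[int], width: int = 8) -> Iterator[int]:
    """Multiply each incoming word by 2, truncated to a fixed word width."""
    mask = (1 << width) - 1
    for word in words:
        # In hardware, multiplying by 2 is a one-bit left shift; the mask
        # models the overflow bit being dropped by a fixed-width register.
        yield (word << 1) & mask


doubled = list(double_stream([1, 2, 200]))
```

A model like this is also useful for generating expected values to compare against in a simulation testbench.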

OpenCL sub-buffers: why are they important?

I am trying to implement multi-GPU OpenCL code. In my model, the GPUs have to communicate and exchange data.
I found (I don't remember where; it has been some time) that one solution is to use sub-buffers. Can anybody explain, as simply as possible, why sub-buffers are important in OpenCL? As far as I can understand, one can do exactly the same using only buffers.
Thanks a lot,
Giorgos
Supplementary Question:
What is the best way to exchange data between GPUs?
I am not sure how sub-buffers would solve your problem when dealing with multiple GPUs. AFAIK, sub-buffers provide a view into a buffer, i.e. a single buffer can be divided into smaller chunks (sub-buffers), which gives you a layer of software abstraction. Sub-buffers are advantageous in some cases, for example when you need the first element of a region to be at offset zero.
To address the multi-GPU or multi-device problem, OpenCL 1.2 provides the clEnqueueMigrateMemObjects API call, with which you can move memory objects directly from one GPU to another: http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clEnqueueMigrateMemObjects.html
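The "view into a buffer" idea can be shown by analogy with Python's `memoryview`. This is only an analogy and not OpenCL code: in OpenCL the equivalent object is created with `clCreateSubBuffer` and a region given by an origin and a size. The buffer contents and sizes below are made up for illustration.

```python
# Stands in for a parent device buffer of 16 bytes.
parent = bytearray(range(16))

# The "sub-buffer": origin 8, size 4. No data is copied; it is a window
# onto the parent's storage.
view = memoryview(parent)[8:12]

# Index 0 of the view is index 8 of the parent, so code working on the
# view can start counting from zero.
view[0] = 0xFF
```

Because the view shares storage with the parent, the write through `view[0]` is visible in `parent[8]`.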

Relevant microcontroller specs for (very) simple image processing

My fellow students and I are choosing a simple microcontroller to do very basic image processing. We are basically trying to implement template matching to find a set of objects in specific portions of the image. We'd like to connect a webcam to the microcontroller to take the pictures and look for the objects. We also require basic wireless communication (e.g. Bluetooth or Wi-Fi).
I don't think we will have the luxury of using a state-of-the-art microcontroller, but rather something that's been around for a while (due to budget and so on). Could anyone please advise which specs of the microcontroller would be the most relevant for the above task (e.g. CPU, MIPS, etc.)?
Thanks a lot!
For this kind of a task, I would say the amount of RAM is the most relevant spec.
A microcontroller with an external memory interface allows you to extend the data space with additional SRAM to hold your image data.
Also note that memory is needed for any protocol stacks you have to implement (Bluetooth, and TCP/IP even more so).
You probably want to have total RAM in tens of kilobytes, preferably 100+ kB.
It is also nice to have plenty of program memory available when learning and experimenting. Later on you can try to optimize and squeeze your code into a more confined device.
As for the architecture, choose something you can easily find development tools and examples for. ARM, AVR and PIC are all good candidates, among others.
Also find out what interfaces you need to use to:
control the camera (e.g. I2C or SPI)
read pixel data (e.g. parallel or analog)
Connecting directly to a webcam's USB interface would not be a straightforward task, as the microcontroller would need to act as a USB host.
Good luck with your project!
You may need a microcontroller with the following features:
USB 2.0 Host controller
1.2 MB of memory for buffers: 640*480*2 (bytes per pixel)*2 (double buffering)
(you may use a lower resolution if there is not enough memory)
Wifi controller
CPU power strong enough for your task
Ready open source code
It seems that Broadcom controllers may be useful here.
Alternatively, you can buy an off-the-shelf Wi-Fi router with a USB port and use it for your project
(e.g. Linksys E3000)
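The buffer estimate in the list above is simple arithmetic, and it is worth redoing for whatever resolution you end up with. A small Python sketch follows; the function name `frame_buffer_bytes` is hypothetical, and the 320x240 case is only an example of a lower resolution, not a recommendation from the answers.

```python
def frame_buffer_bytes(width: int, height: int, bytes_per_pixel: int,
                       num_buffers: int = 2) -> int:
    """Total RAM needed for raw frame buffers (double-buffered by default)."""
    return width * height * bytes_per_pixel * num_buffers


# 640x480 at 2 bytes per pixel, double-buffered: about 1.2 MB.
vga = frame_buffer_bytes(640, 480, 2)

# Example lower resolution (an assumption): a quarter of the pixels.
qvga = frame_buffer_bytes(320, 240, 2)
```

Halving both dimensions cuts the requirement to a quarter, which is often the easiest way to fit the image into a small device.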

Resources