Read and Write Image Block into DDR using Vivado IP Block

We are working on a project where we need to do some image processing on an FPGA. For that purpose we are using a ZedBoard running Linaro (an Ubuntu-based distribution).
So far we have stored the image in binary form, pixel by pixel, in DDR using a Python script running on the Processing System (PS) of the ZedBoard.
Now our task is to read that content from DDR memory, process it, and write the processed output back to DDR. We are using the Xilinx Vivado tool for the FPGA part, and we have tried AXI DMA together with AXI Interconnect to read and write data from DDR.
My question is: do we need to use the SDK and some C code to read and write DDR memory from the Programmable Logic (PL) side? We want our module to start reading data from DDR on a control signal and then begin the actual processing of the image data: read a specific block of data, process it, and store the result back to DDR on the fly. We are not sure which IP blocks we need in our Vivado block design. Also, do we need block RAM at the end, before sending the data back to DDR?
Has anyone already done this sort of project, or does anyone have relevant experience? Any help will be appreciated!
Thanks

The Zynq provides an AMBA AXI interconnect for exactly that purpose; on the Zynq block diagram it is the interconnect shown on the right-hand side.
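To the SDK question: a common approach is to let the PL datapath (your processing core, typically with AXI-Stream interfaces, optionally buffered by a small FIFO/BRAM) sit between the AXI DMA's MM2S and S2MM channels, and to control the DMA from the PS in software. Below is a minimal bare-metal sketch using Xilinx's standalone XAxiDma driver; the device ID, buffer addresses and transfer length are placeholders, and under Linaro/Linux you would go through a kernel DMA/UIO driver rather than poking the hardware directly.

```c
/* Minimal sketch: drive an AXI DMA from the Zynq PS (bare-metal / SDK).
 * Assumes the DMA's MM2S/S2MM streams are connected to the processing core
 * in the PL. Addresses, length and device ID are placeholders. */
#include "xaxidma.h"
#include "xparameters.h"
#include "xil_cache.h"

#define IMG_SRC_ADDR  0x10000000u   /* where the image pixels were written   */
#define IMG_DST_ADDR  0x12000000u   /* where the processed image should land */
#define IMG_BYTES     (640 * 480)   /* example: 8-bit 640x480 image          */

int run_image_pass(void)
{
    XAxiDma dma;
    XAxiDma_Config *cfg = XAxiDma_LookupConfig(XPAR_AXIDMA_0_DEVICE_ID);
    if (!cfg || XAxiDma_CfgInitialize(&dma, cfg) != XST_SUCCESS)
        return -1;

    /* Make the source data visible to the DMA and force the destination
     * region to be re-read from DDR afterwards. */
    Xil_DCacheFlushRange(IMG_SRC_ADDR, IMG_BYTES);
    Xil_DCacheInvalidateRange(IMG_DST_ADDR, IMG_BYTES);

    /* Arm the receive (S2MM) channel first, then start sending (MM2S). */
    XAxiDma_SimpleTransfer(&dma, IMG_DST_ADDR, IMG_BYTES, XAXIDMA_DEVICE_TO_DMA);
    XAxiDma_SimpleTransfer(&dma, IMG_SRC_ADDR, IMG_BYTES, XAXIDMA_DMA_TO_DEVICE);

    /* Busy-wait until both channels are done (polling mode). */
    while (XAxiDma_Busy(&dma, XAXIDMA_DMA_TO_DEVICE) ||
           XAxiDma_Busy(&dma, XAXIDMA_DEVICE_TO_DMA))
        ;

    return 0;
}
```

As for the block RAM question: a small stream FIFO or BRAM in front of the S2MM channel is common when the processing core's output rate is bursty, but the DMA itself does not strictly require one.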

Related

Triggering SMI interrupt from OS with a valid communication buffer

I'm playing around with UEFI and SMM and am currently trying to trigger an SMI interrupt from ring 0 on an Intel NUC machine. I've been using Chipsec to do so, but I couldn't properly specify a valid communication buffer, which the SW SMI handler receives as one of its parameters.
The only clue I found in the UEFI specs is in "Appendix O - UEFI ACPI Data Table" under Table 310. SMM Communication ACPI Table but the specified method doesn't seem to work. I'm taking a black box approach as I don't have access to the NUC's SMRAM.
What is the working way of successfully specifying the communication buffer for an SMI? Some code samples will be greatly appreciated.
A secure SMI call must be invoked before this buffer can be retrieved. Modern UEFI BIOSes provide a special mechanism that lets an external user access such private data, but this mechanism is not revealed in any public documents.
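For reference, the conventional way to raise a software SMI on Intel platforms is to write a command byte to the APM control port 0xB2; the command value is platform-specific, and this does not by itself set up the communication buffer the question is about (that still has to be placed wherever the firmware expects it, e.g. per the SMM Communication ACPI Table). A rough Linux sketch, assuming a root process with I/O port access:

```c
/* Rough sketch: raise a software SMI by writing to the APM command port 0xB2.
 * The command value (and any data on port 0xB3) is platform-specific; this
 * does NOT populate the SMM communication buffer discussed above. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/io.h>   /* ioperm(), outb() - Linux, x86 */

int main(int argc, char **argv)
{
    unsigned char cmd = (argc > 1) ? (unsigned char)strtoul(argv[1], NULL, 0) : 0x00;

    /* Request access to ports 0xB2 and 0xB3 (needs root / CAP_SYS_RAWIO). */
    if (ioperm(0xB2, 2, 1) != 0) {
        perror("ioperm");
        return 1;
    }

    outb(cmd, 0xB2);   /* the firmware's SW SMI handler dispatches on this value */
    printf("wrote 0x%02x to port 0xB2\n", cmd);
    return 0;
}
```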

FATFS integration on SPI NAND FLASH

I'm trying to integrate the FatFs file system on a Micron SPI NAND flash. I'm using the SPI peripheral of the STM32L486RG as the interface.
I have developed a low-level driver through which I'm able to read, write and erase data at different locations in the NAND memory.
I have then hooked my low-level driver APIs into diskio.c so that they can be used by the FatFs APIs.
I have successfully formatted the memory through f_mkfs (I'm getting FR_OK from both the f_mkfs and f_open APIs, and when debugging, the fs object contains the FAT signature).
However, when I try to write a buffer into the file that I created with f_open, I get FR_INT_ERR.
I have debugged my code step by step and found that my get_fat function returns 1, which means that an internal error has occurred.
Any idea what could be the issue?
I guess you need to erase the sector you are about to write to, even though you write per page and not per entire sector; that is why using FatFs on NAND flash becomes tricky.
Since your purpose is to bind the logical drive to the entire physical drive, you need to pass the option (FM_SFD | FM_ANY) as the opt parameter of f_mkfs when formatting the memory.
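A minimal sketch of that format-and-write sequence, using the f_mkfs(path, opt, au, work, len) signature from FatFs R0.12/R0.13 (newer releases take an MKFS_PARM structure instead); the drive path, work-area size and file name are placeholders:

```c
/* Sketch: format the whole physical drive in SFD layout, then write a file.
 * Uses the multi-argument f_mkfs() form; R0.14+ replaces opt/au with MKFS_PARM. */
#include "ff.h"

static FATFS fs;
static BYTE  work[4096];          /* work area for f_mkfs, >= max sector size */

FRESULT format_and_write(void)
{
    FRESULT res;
    FIL     fil;
    UINT    written;
    const char msg[] = "hello NAND";

    /* SFD: no partition table, logical drive = entire physical drive. */
    res = f_mkfs("0:", FM_ANY | FM_SFD, 0, work, sizeof work);
    if (res != FR_OK) return res;

    res = f_mount(&fs, "0:", 1);              /* mount immediately */
    if (res != FR_OK) return res;

    res = f_open(&fil, "0:/test.txt", FA_CREATE_ALWAYS | FA_WRITE);
    if (res != FR_OK) return res;

    res = f_write(&fil, msg, sizeof msg - 1, &written);
    f_close(&fil);
    return res;
}
```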

Save the outputs of motes in mote memory

I am using two motes: one runs a unicast sender program and the other runs a unicast receiver program. Instead of connecting the receiving mote to a PC, I want to power the motes from batteries and save the output of both motes in the motes' own memory. How can I save the output (printf output) of each mote in mote memory and retrieve it later, after the experiments are finished? Is there any method (built-in functions, commands or code snippets) available for this?
P.S. I am using Zolertia Z1 motes.
The straightforward way is to use the xmem interface. The function prototypes are declared in file xmem.h:
https://github.com/contiki-os/contiki/blob/master/core/dev/xmem.h
For Z1 there's a platform-specific implementation of xmem in platform's directory.
If you have never worked with flash memory before, note that a "rewrite" operation is typically not supported by the hardware. You need to erase a whole sector of the flash before you can write anything to that sector. Therefore, the typical usage pattern for dumping sensor data or logs is "write only at the end, never modify": when the current sector is full, erase the next one and write there, and so on until the whole flash is full.
Contiki also has the Coffee filesystem, which provides a higher-level interface if you need one.
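A minimal append-only logging sketch over the xmem interface declared in xmem.h; the start offset and record layout are illustrative assumptions, and XMEM_ERASE_UNIT_SIZE comes from the platform configuration (defined for the Z1 port):

```c
/* Sketch: append text log records to external flash via xmem, following the
 * "erase a sector, then only append" pattern described above.
 * LOG_START and the record format are assumptions for illustration. */
#include "dev/xmem.h"
#include <string.h>

#define LOG_START  0UL                       /* byte offset of the log area  */

static unsigned long write_ptr = LOG_START;  /* next free byte in the flash  */

void log_init(void)
{
  xmem_init();
  /* Erase the first erase unit so we can start appending into it. */
  xmem_erase(XMEM_ERASE_UNIT_SIZE, LOG_START);
}

void log_append(const char *line)
{
  int len = strlen(line);

  /* Erase the next unit when the current one is about to overflow. */
  if((write_ptr % XMEM_ERASE_UNIT_SIZE) + len > XMEM_ERASE_UNIT_SIZE) {
    write_ptr = (write_ptr / XMEM_ERASE_UNIT_SIZE + 1) * XMEM_ERASE_UNIT_SIZE;
    xmem_erase(XMEM_ERASE_UNIT_SIZE, write_ptr);
  }

  xmem_pwrite(line, len, write_ptr);
  write_ptr += len;
}

int log_read(char *buf, int size, unsigned long offset)
{
  /* After the experiment, read the records back (e.g. over the serial line). */
  return xmem_pread(buf, size, LOG_START + offset);
}
```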

Beagleboard: How do I send/receive data to/from the DSP?

I have a BeagleBoard with a TMS320C64x+ DSP. I'm working on an image-processing application for the BeagleBoard. Here's how it's going to work:
1. The ARM reads an image from a file and puts the image in a 2D array.
2. The ARM sends the matrix to the DSP. The DSP receives the matrix.
3. The DSP performs the image-processing algorithm on the received matrix (the algorithm code uses about 5 MB of dynamically allocated memory).
4. The DSP sends the processed image (matrix) back to the ARM. The ARM receives the matrix.
5. The ARM saves the processed image to a file.
I've already written the code for steps 1, 3 and 5. What is the easiest way to do steps 2 and 4 (sending the data)? Code examples are welcome.
The easiest way is to use shared memory:
Use the CMEM kernel module to allocate a chunk of memory on the ARM that can be accessed from ARM and DSP. Then pass the pointer down to the DSP using the DspBios NOTIFY component.
Once the DSP is done with processing you can notify the ARM via NOTIFY.
This way there is no need to copy the data from the ARM to the DSP or vice versa. All you have to make sure is that the buffer comes from CMEM, which guarantees the memory is physically contiguous (the DSP does not know about the ARM's memory manager).
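A rough sketch of the ARM-side allocation using the CMEM user-space API (header and default-params names follow TI's linuxutils package; the notification step is only indicated in a comment, because the exact NOTIFY/SysLink call depends on which IPC stack you use):

```c
/* Sketch: allocate a physically contiguous image buffer with CMEM on the ARM.
 * The physical address is what you would hand to the DSP (e.g. via NOTIFY);
 * the exact notification call depends on your DSP Link / SysLink version. */
#include <stdio.h>
#include <cmem.h>                 /* from TI linuxutils (ti/sdo/linuxutils/cmem) */

#define IMG_BYTES (1024 * 768)    /* example image size */

int share_image_with_dsp(void)
{
    CMEM_AllocParams params = CMEM_DEFAULTPARAMS;
    void *buf;
    unsigned long phys;

    if (CMEM_init() < 0)
        return -1;

    buf = CMEM_alloc(IMG_BYTES, &params);     /* contiguous, DSP-visible */
    if (buf == NULL)
        return -1;

    phys = CMEM_getPhys(buf);                 /* address the DSP understands */
    printf("virtual %p -> physical 0x%lx\n", buf, phys);

    /* ...fill 'buf' with the image, send 'phys' to the DSP with NOTIFY,
     * wait for the DSP's completion notification, then read the result... */

    CMEM_free(buf, &params);
    CMEM_exit();
    return 0;
}
```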
Shared memory is the right approach, but learning how to do it can be a pain. The C6Run tool can abstract the ARM/DSP communication for you, making it easier. Although NOTIFY is really the right API to use, C6Run uses CMEM through an older API.
If you want to try C6Run out on the BeagleBoard, the easiest way is to follow the instructions on the eLinux wiki for setting up C6Run for the ECE597 course given by Mark Yoder at Rose-Hulman. These instructions depend on running the Angstrom demo image (2). A stable version that was used to demonstrate the functionality of the hardware is documented as well (3).
(2): www.angstrom-distribution.org/demo/beagleboard
(3): code.google.com/p/beagleboard/wiki/BeagleBoardDiagnosticsNext

How to mitigate host/device memory transfer bottlenecks in OpenCL/CUDA

If my algorithm is bottlenecked by host to device and device to host memory transfers, is the only solution a different or revised algorithm?
There are a couple things you can try to mitigate the PCIe bottleneck:
Asynchronous transfers - permits overlapping computation and bulk transfer
Mapped memory - allows a kernel to stream data to/from the GPU during execution
Note that neither of these techniques makes the transfer itself go faster; they just reduce the time the GPU spends waiting for the data to arrive.
With the cudaMemcpyAsync API function you can initiate a transfer, launch one or more kernels that do not depend on the result of the transfer, synchronize the host and device, and then launch kernels that were waiting on the transfer to complete. If you can structure your algorithm such that you're doing productive work while the transfer is taking place, then asynchronous copies are a good solution.
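A condensed host-side sketch of that pattern using the CUDA runtime API (the stream, buffer size and kernel launches are placeholders; the host buffer must be pinned, e.g. with cudaMallocHost, for the copy to be truly asynchronous):

```c
/* Sketch: overlap an async host-to-device copy with independent work.
 * Kernel launches are only indicated in comments; sizes are examples. */
#include <cuda_runtime.h>

#define N (1 << 20)

int async_copy_example(void)
{
    float *h_buf, *d_buf;
    cudaStream_t stream;

    cudaStreamCreate(&stream);
    cudaMallocHost((void **)&h_buf, N * sizeof(float));  /* pinned host memory */
    cudaMalloc((void **)&d_buf, N * sizeof(float));

    /* ...fill h_buf... */

    /* Start the copy; it proceeds in the background on 'stream'. */
    cudaMemcpyAsync(d_buf, h_buf, N * sizeof(float),
                    cudaMemcpyHostToDevice, stream);

    /* Launch kernels here that do NOT depend on d_buf. */

    /* Wait for the copy, then launch the kernels that consume d_buf. */
    cudaStreamSynchronize(stream);

    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    cudaStreamDestroy(stream);
    return 0;
}
```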
With the cudaHostAlloc API function you can allocate host memory that can be read and written directly from the GPU. The reason this is faster is that a block that needs host data only has to wait for a small portion of the data to be transferred. In contrast, the usual approach makes all blocks wait until the entire transfer is complete. Mapped memory essentially breaks a big monolithic transfer into a bunch of smaller copy operations, so the latency is reduced.
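And a sketch of the mapped (zero-copy) variant with cudaHostAlloc; the device-visible pointer obtained from cudaHostGetDevicePointer is what you would pass to your kernel (the kernel itself is omitted here):

```c
/* Sketch: allocate mapped (zero-copy) host memory that a kernel can read
 * and write directly over PCIe. The kernel launch itself is omitted. */
#include <cuda_runtime.h>

#define N (1 << 20)

int mapped_memory_example(void)
{
    float *h_buf, *d_alias;

    /* Required before mapped allocations on older CUDA versions. */
    cudaSetDeviceFlags(cudaDeviceMapHost);

    cudaHostAlloc((void **)&h_buf, N * sizeof(float), cudaHostAllocMapped);
    cudaHostGetDevicePointer((void **)&d_alias, h_buf, 0);

    /* Pass 'd_alias' to a kernel: each block streams in only the data it
     * touches, instead of waiting for one monolithic cudaMemcpy to finish. */

    cudaDeviceSynchronize();
    cudaFreeHost(h_buf);
    return 0;
}
```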
You can read more about these topics in Section 3.2.6-3.2.7 of the CUDA Programming Guide and Section 3.1 of the CUDA Best Practices Guide. Chapter 3 of the OpenCL Best Practices Guide explains how to use these features in OpenCL.
You really need to do the math to be certain that you're going to be doing enough processing on the GPU to make it worthwhile transferring data between host and GPU. Ideally you do this at the design stage, before doing any coding, since it can be a deal-breaker.
