What does the Streaming stand for in Streaming SIMD Extensions (SSE)?

I've looked everywhere and I still can't figure it out. I know of two associations you can make with streams:
Wrappers for backing data stores meant as an abstraction layer between consumers and suppliers
Data becoming available with time, not all at once
SIMD stands for Single Instruction, Multiple Data; in the literature the instructions are often said to come from a stream of instructions. This corresponds to the second association.
I still don't understand the Streaming in Streaming SIMD Extensions (or in Streaming Multiprocessor, for that matter), though. The instructions come from a stream, but can they come from anywhere else? Do we, or could we, have just SIMD extensions or just multiprocessors?
Tl;dr: can CPU instructions be non-streaming, i.e. not come from a stream?

SSE was introduced as an instruction set to improve performance in multimedia applications. The aim was to quickly stream in some data (a chunk of a DVD to decode, for example), process it quickly (using SIMD), and then stream the result out (e.g. to graphics RAM). (Almost) all SSE instructions have a variant that lets them read 16 bytes from memory. The instruction set also contains instructions to control the CPU cache and the hardware prefetcher. Beyond that, "Streaming" is pretty much just a marketing term.
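To make the "stream in, process, stream out" pattern concrete, here is a minimal C++ sketch using SSE intrinsics (the function name and the scale-by-a-constant workload are just illustrative). Each iteration loads 16 bytes in one instruction, applies the same operation to four floats at once, and writes the result with a non-temporal ("streaming") store that bypasses the cache:

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cstddef>

// Scale n floats by factor; assumes src and dst are 16-byte aligned
// and n is a multiple of 4.
void scale_stream(const float* src, float* dst, std::size_t n, float factor)
{
    const __m128 f = _mm_set1_ps(factor);
    for (std::size_t i = 0; i < n; i += 4) {
        // Hint that upcoming data is used only once (non-temporal).
        _mm_prefetch(reinterpret_cast<const char*>(src + i + 64), _MM_HINT_NTA);
        __m128 v = _mm_load_ps(src + i);   // read 16 bytes (4 floats) at once
        v = _mm_mul_ps(v, f);              // same operation on all 4 lanes (SIMD)
        _mm_stream_ps(dst + i, v);         // streaming store: bypass the cache
    }
    _mm_sfence();                          // make the streaming stores visible
}
```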

Related

Spartan 6 SP605 VHDL external ram usage?

I'm new to using VHDL and have run into an issue with my project. I'm trying to make an FPGA design that converts from one communication protocol to another, and for this purpose it would be useful to be able to store (hopefully multiple) packets before converting.
Previously I tried to store this data in arrays, but it quickly became apparent that this takes up far too much space on the FPGA. Therefore, I have been searching for a way to store the data in the DDR3 RAM on the SP605 board (http://www.xilinx.com/support/documentation/boards_and_kits/xtp067_sp605_schematics.pdf, page 9). However, I cannot find instructions on how to write data to or read data from it. I'm trying to store one 8-bit std_logic_vector per clock cycle for later access.
Can anyone advise me on how to proceed?
Xilinx offers an IP Core generator. This IP catalog contains a Memory Interface Generator (MIG) which generates an IP Core to access different memory types. Configure this core for DDR3.
Writing a DDR3 controller in VHDL is not a project for a beginner, nor even for an experienced designer.
The state machine is simple and well known, but the calibration logic is very costly.
You should consider a caching or burst read/write technique, because DDR memory cannot be accessed in every cycle.
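The idea behind that burst/caching suggestion can be sketched in software terms (a purely hypothetical C++ behavioral model, not synthesizable code; the class name and the 64-byte burst size are assumptions): incoming bytes are collected until a full burst is ready, and only then is one transaction issued toward the memory controller.

```cpp
#include <array>
#include <cstdint>
#include <cstddef>

// Hypothetical model of a write-combining buffer: one byte arrives per
// "clock", and a memory transaction is issued only once a full burst
// worth of data has accumulated.
class BurstWriter {
public:
    static constexpr std::size_t kBurstBytes = 64;  // assumed burst length

    void push(std::uint8_t byte) {
        buf_[fill_++] = byte;
        if (fill_ == kBurstBytes) {
            issue_burst();   // one burst transaction for kBurstBytes of data
            fill_ = 0;
        }
    }

private:
    void issue_burst() {
        // Placeholder: in the FPGA this corresponds to a single write
        // command on the memory controller's user interface.
    }

    std::array<std::uint8_t, kBurstBytes> buf_{};
    std::size_t fill_ = 0;
};
```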

Data transfer from one file to other in Xilinx

I haven't worked with the block memory concept in Xilinx before. I want to put some simple numbers in a text file and save it, then take those numbers, multiply them by 2, and save them in another file. I have written VHDL code, but this involves I/O, so I have to use block RAM. I have no clue about it, though. I have read tutorials and the datasheet but still can't figure out how to do my task using BRAM. I am pasting my code with this question. Please let me know if some sort of programming is needed for the BRAM. When I try to compile the code, it shows an error that inFIle does not exist.
VHDL is not a programming language.
There are some programming-language-like features in VHDL (for example file IO), but these are only there to help write testbench code for simulation. When writing VHDL, don't think about coding software. Think about the hardware structure that you want to describe.
In hardware, there is no such thing as a "file". There is a hardware interface consisting of fixed signals (address, data, enables) to, e.g., a block RAM. You can read a word of data from the memory by specifying an address, but this will always be raw data.
To get the raw data into the block RAM, there will pretty much always be some software process running on an embedded or external CPU. The software running on the CPU can interpret the file system, and pass the relevant information for hardware-assisted processing to the hardware core (e.g., starting address in memory of data to be processed, length of data, parameterization of algorithm, etc.). Alternatively, there may be streaming data sources and sinks that pass through the hardware for processing.
This is what hardware is best at: processing a continuous stream of data and performing the same set of calculations on each data word.
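To illustrate the hand-off the answer describes, here is a small, purely illustrative C++ sketch of the kind of control block the CPU-side software might fill in for a hardware core (the struct name and fields are assumptions, not from any real IP):

```cpp
#include <cstdint>

// Hypothetical job descriptor written by software into the memory-mapped
// registers of a hardware core. The core itself only ever sees raw data.
struct ProcessingJob {
    std::uint32_t src_addr;   // physical start address of the raw input data
    std::uint32_t dst_addr;   // where the core should write its results
    std::uint32_t length;     // number of words to process
    std::uint32_t scale;      // algorithm parameter, e.g. "multiply by 2"
};

// The software would write these fields to the core's registers and then
// set a "start" bit; the core streams the data through on its own.
```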

Which API to play audio from a buffer in iOS and OS X?

I would like to do this very simple thing: playing PCM audio data from memory.
The audio samples will come from sound-synthesis algorithms, pre-loaded sample files or whatever. My question is really about how to play the buffers, not how to fill them with data.
So I'm looking for the best way to re-implement my old, deprecated AudioWrapper (which was based on AudioUnits V1), but I could not find in the Apple Documentation an API that would fulfill the following:
Compatible with 10.5 through 10.7.
Available on iOS.
Does not rely on a third-party library.
Be future-proof (for example: not based on Carbon, 64-bit...).
I'm considering OpenAL, but is it really the best option? I've seen negative opinions about it: it might be too complex and overkill, and it might add performance overhead.
At worst, I could have two different implementations of that AudioWrapper, but if possible I'd really like to avoid having one version for each system (iOS, 10.5, 10.6, 10.7...). Also, it will be in C++.
EDIT: I need low latency; the system must respond to user interactions in under 20 ms (the buffers must be between 128 and 512 samples at 44 kHz).
AudioQueues are quite common. However, their I/O buffer sizes are large enough that they are not ideal for interactive I/O (e.g. a synth).
For lower latency, try AudioUnits -- the MixerHost sample may be a good starting point.
Not sure about OS X 10.5, but I'm directly using the Audio Units API for low-latency audio analysis and synthesis on OS X 10.6, 10.7, and iOS 3.x thru 5.x. My wrapper file to generalize the API came to only a few hundred lines of plain C, with a few ifdefs.
The latency of Audio Queues was too high for my low latency stuff on iOS, whereas the iOS RemoteIO Audio Unit seems to allow buffers as short as 256 samples (but sometimes only down to 1024 when the display goes off) at a 44100 sample rate.
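For reference, here is a heavily condensed sketch of the render-callback approach described above (function names are mine, error checking is omitted, and the 440 Hz sine is just a placeholder for your synthesis code). It uses kAudioUnitSubType_RemoteIO on iOS and kAudioUnitSubType_DefaultOutput on OS X; the AudioComponent calls shown need OS X 10.6+, while 10.5 would use the older Component Manager equivalents:

```cpp
#include <AudioUnit/AudioUnit.h>
#include <TargetConditionals.h>
#include <cmath>

// The output unit pulls PCM from this callback whenever it needs a buffer.
static OSStatus RenderTone(void*, AudioUnitRenderActionFlags*,
                           const AudioTimeStamp*, UInt32 /*bus*/,
                           UInt32 inFrames, AudioBufferList* ioData)
{
    static double phase = 0.0;
    const double step = 2.0 * 3.14159265358979 * 440.0 / 44100.0;
    float* out = static_cast<float*>(ioData->mBuffers[0].mData);
    for (UInt32 i = 0; i < inFrames; ++i) {
        out[i] = 0.2f * static_cast<float>(std::sin(phase));
        phase += step;
    }
    return noErr;
}

void StartOutput()
{
    AudioComponentDescription desc = {};
    desc.componentType = kAudioUnitType_Output;
#if TARGET_OS_IPHONE
    desc.componentSubType = kAudioUnitSubType_RemoteIO;
#else
    desc.componentSubType = kAudioUnitSubType_DefaultOutput;
#endif
    desc.componentManufacturer = kAudioUnitManufacturer_Apple;

    AudioComponent comp = AudioComponentFindNext(nullptr, &desc);
    AudioUnit unit;
    AudioComponentInstanceNew(comp, &unit);

    // Mono 32-bit float PCM at 44.1 kHz; adjust to your data.
    AudioStreamBasicDescription fmt = {};
    fmt.mSampleRate       = 44100.0;
    fmt.mFormatID         = kAudioFormatLinearPCM;
    fmt.mFormatFlags      = kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked;
    fmt.mChannelsPerFrame = 1;
    fmt.mBitsPerChannel   = 32;
    fmt.mBytesPerFrame    = 4;
    fmt.mFramesPerPacket  = 1;
    fmt.mBytesPerPacket   = 4;
    AudioUnitSetProperty(unit, kAudioUnitProperty_StreamFormat,
                         kAudioUnitScope_Input, 0, &fmt, sizeof fmt);

    AURenderCallbackStruct cb = { RenderTone, nullptr };
    AudioUnitSetProperty(unit, kAudioUnitProperty_SetRenderCallback,
                         kAudioUnitScope_Input, 0, &cb, sizeof cb);

    AudioUnitInitialize(unit);
    AudioOutputUnitStart(unit);
}
```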

Relevant microcontroller specs for (very) simple image processing

My fellow students and I are choosing a simple microcontroller to do very basic image processing. We are basically trying to implement template matching to find a set of objects in specific portions of the image. We'd like to connect a webcam to the microcontroller to take the pictures and look for the objects. We also require basic wireless communication (e.g. Bluetooth or Wi-Fi).
I don't think we will have the luxury of using a state-of-the-art microcontroller, but rather something that's been around for a while (due to budget and such). Could anyone please advise on which specs of the microcontroller would be the most relevant for the above task (e.g. CPU, MIPS, etc.)?
Thanks a lot!
For this kind of a task, I would say the amount of RAM is the most relevant spec.
A microcontroller with an external memory interface allows you to extend the data space with additional SRAM to hold your image data.
Also note that memory is needed for any protocol stacks you have to implement (Bluetooth, and TCP/IP even more so).
You probably want to have total RAM in tens of kilobytes, preferably 100+ kB.
It is also nice to have plenty of program memory available when learning and experimenting. Later on you can try to optimize and squeeze your code into a more confined device.
As for the architecture, choose something you can easily find development tools and examples for. ARM, AVR, and PIC are all good candidates, among others.
Also find out what interfaces you need to use to
control the camera (e.g. I2C or SPI)
read pixel data (e.g. parallel or analog)
Connecting directly to a webcam's USB interface would not be a straightforward task, as the microcontroller would need to act as a USB host.
Good luck with your project!
You may need a microcontroller with the following features:
USB 2.0 host controller
About 1.2 MB of memory for a frame buffer: 640 × 480 × 2 (bytes per pixel) × 2 (double buffering)
(you may use a lower resolution if there is not enough memory)
Wi-Fi controller
Enough CPU power for your task
Ready-made open-source code
It seems that Broadcom controllers may be useful here.
Also, you can buy an off-the-shelf Wi-Fi router with a USB port (e.g. a Linksys E3000) and use it for your project.
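To get a feel for why RAM and CPU throughput dominate here, below is an illustrative C++ sketch of brute-force template matching using the sum of absolute differences (SAD) on 8-bit grayscale data; the function name, dimensions, and single-best-match policy are assumptions. Even at modest resolutions this is millions of byte operations per frame, on top of holding the whole image in memory.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstddef>
#include <limits>

struct Match { std::size_t x, y; unsigned long score; };

// Slide the template over the image and return the position with the
// lowest sum of absolute differences (lower = better match).
Match match_template(const std::uint8_t* img, std::size_t iw, std::size_t ih,
                     const std::uint8_t* tpl, std::size_t tw, std::size_t th)
{
    Match best{0, 0, std::numeric_limits<unsigned long>::max()};
    for (std::size_t y = 0; y + th <= ih; ++y) {
        for (std::size_t x = 0; x + tw <= iw; ++x) {
            unsigned long sad = 0;
            for (std::size_t ty = 0; ty < th; ++ty)
                for (std::size_t tx = 0; tx < tw; ++tx)
                    sad += std::abs(int(img[(y + ty) * iw + (x + tx)]) -
                                    int(tpl[ty * tw + tx]));
            if (sad < best.score) best = {x, y, sad};
        }
    }
    return best;
}
```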

What is a 'Stream', relating to cin and cout?

A tutorial is talking about cin and cout:
"Syntactically these streams are not used as functions: instead, data are written to streams or read from them using the operators <<, called the insertion operator and >>, called the extraction operator."
What is a 'stream'?
Consider a "Stream" as a physical hose, or pipe. At one end, someone may pour some water in. At the other end, it will come out. This is 'reading' and 'writing' to the stream.
A stream is just a place where data goes. It can be a 'socket stream' (over the internet) or a 'file stream' (to a file), or perhaps a 'memory stream', just data written to a place in-memory (ram).
A "stream" is an object that represents a source of data, or a place where data can be written.
Examples include file handles and pipes - things that you can read data from or write data to.
An important property of streams is that they share a common interface, so the same code can write to either a file or a pipe (for instance) without needing to be rewritten.
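For example (a small sketch; the file name and values are arbitrary), the same function can write to the console, a file, or an in-memory buffer, because all of them are std::ostreams:

```cpp
#include <fstream>
#include <iostream>
#include <sstream>

// Works with any output stream: console, file, or string buffer.
void write_report(std::ostream& out, int value)
{
    out << "value = " << value << '\n';
}

int main()
{
    write_report(std::cout, 42);       // console

    std::ofstream file("report.txt");
    write_report(file, 42);            // file on disk

    std::ostringstream memory;
    write_report(memory, 42);          // string in memory
    std::cout << memory.str();
}
```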
You should look at streams as abstractions on underlying 'sources' or 'sinks' of data. A source is something you read data from, and a sink is something you write data to.
The concept of streams allows you to perform I/O on various forms of media, network connections, pipes between applications, files, etc.
The stream abstraction is very valuable to us as developers as it allows us to simplify input and output, and it gives us the flexibility to arrange and reconnect the sources and destinations of these streams.
A good analogy is that of a hose. You can send and receive data through hoses, and you can connect these hoses to various things.
By allowing programs to talk through hoses, we allow all sorts of programs to talk to each other, and we increase interoperability and utility vastly.
This is at the heart of the UNIX philosophy, and supports some very powerful programming idioms.
