How to use a DSP Slice in FPGAs (Artix7) - signal-processing

I recently started programming FPGAs and I have to work with the onboard DSP slices.
My instantiation is copied from the user guide, but I don't know exactly how to write the behavioral part of it. I also have a little working code, but I don't know how to combine the two :(
Please help me, guys, I am quite desperate...
By the way, I am developing in VHDL.
Library UNISIM;
use UNISIM.vcomponents.all;
library UNIMACRO;
use unimacro.Vcomponents.all;
ADDMACC_MACRO_inst : ADDMACC_MACRO
generic map (
DEVICE => "7SERIES", -- Target Device: "7SERIES", "VIRTEX6", "SPARTAN6"
LATENCY => 4, -- Desired clock cycle latency, 1-4
WIDTH_PREADD => 25, -- Pre-Adder input bus width, 1-25
WIDTH_MULTIPLIER => 18, -- Multiplier input bus width, 1-18
WIDTH_PRODUCT => 48) -- MACC output width, 1-48
port map (
PRODUCT => PRODUCT, -- MACC result output, width defined by WIDTH_PRODUCT generic
MULTIPLIER => MULTIPLIER, -- Multiplier data input, width determined by WIDTH_MULTIPLIER generic
PREADDER1 => PREADDER1, -- Preadder data input, width determined by WIDTH_PREADDER generic
PREADDER2 => PREADDER2, -- Preadder data input, width determined by WIDTH_PREADDER generic
CARRYIN => CARRYIN, -- 1-bit carry-in input
CE => CE, -- 1-bit input clock enable
CLK => CLK, -- 1-bit clock input
LOAD => LOAD, -- 1-bit accumulator load input
LOAD_DATA => LOAD_DATA, -- Accumulator load data input, width defined by WIDTH_PRODUCT generic
RST => RST -- 1-bit input active high synchronous reset
);
-- End of ADDMACC_MACRO_inst instantiation
And here is some working code I have written so far:
library IEEE;
use IEEE.std_logic_1164.all;
entity test is
Port (clk : inout std_logic;
oclk : out std_logic);
end test;
architecture Behavioral of test is
begin
oclk <= not clk;
process
begin
clk <= '1';
wait for 5 ns;
clk <= '0';
wait for 5 ns;
end process;
end Behavioral;

Related

VHDL runtime: invalid memory access (dangling accesses or stack size too small)

I get a weird error when running the test bench that I have never seen before. I am attempting to simulate an 8-bit calculator with 4 registers. The calculator has 8-bit instructions for add, subtract, branch on equal, load immediate, and print to the monitor. I've checked to make sure I am not in any infinite loops. I've researched online, and there does not seem to be any specific reason why I would get this error.
I've triple-checked for loops and memory leaks, and also tried increasing the stack frame. None of this has worked so far. I am using the commands ghdl -a, -e, and -r to analyze, elaborate, and run.
'''
architecture structural of calculator_tb is
component calculator is
port(
I : in std_logic_vector(7 downto 0); --instruction input
clk : in std_logic
);
end component calculator;
signal I : std_logic_vector(7 downto 0);
signal clk : std_logic;
begin
calculator_0 : calculator port map(I, clk);
process
file instruction_file : text is in "instructions.txt"; --Instructions in text(ASCII) file.
variable instruction_line : line;
variable intruction_vector : bit_vector(7 downto 0);
begin
while (not(endfile(instruction_file))) loop --Loop to the end of the text file.
wait for 1 ns;
clk <= '0';
readline(instruction_file, instruction_line); --Read in instruction line
read(instruction_line, intruction_vector); --merge instruction to bit vector
I <= to_stdlogicvector(intruction_vector); --Convert bit vector to std_logic_vector and pass instruction to the calculator input.
--Create a rising edge for the clock.
wait for 1 ns;
clk <= '1';
end loop;
assert false report "end of test" severity note;
end process;
end architecture structural;
'''
The runtime error I receive is:
invalid memory access (dangling accesses or stack too small)

How is /= translated to actual hardware in vhdl

I am a beginner in VHDL/FPGA programming. I want to compare two 32-bit std_logic_vectors. I am currently using:
if ( RX_FRAME(to_integer(s_data_counter)).Data /= REF_FRAME(to_integer(s_data_counter)).Data ) then
s_bad_frame <= '1';
state <= DONE;
end if;
Here RX_FRAME and REF_FRAME are 2 arrays of std_logic_vector(31 downto 0)
I want to know how the synthesis tool translates /= into hardware.
Is it advisable to use this? Or should I do an XOR of the concerned vectors and check the resulting vector against zeros? In case I do an XOR and check against zeroes, doesn't it increase the amount of hardware needed?
I am using Vivado Design Suite 2015.3.
You should compare with /= to really benefit from a language like VHDL and advanced synthesis tools like Xilinx Vivado.
The synthesis tool will then implement this using internal LUTs in the FPGA, maybe with a function similar to XOR gates for variable arguments, or AND/NOT gates if one of the arguments evaluates to a constant. The best way to see the actual implementation is to bring up a GUI view in the tool that shows the implemented design.
But second-guessing the tool by building the XOR gates yourself is usually a bad idea, since the tool is usually much better at determining the best implementation. However, if you find that the tool can't identify a specific construction and select an effective implementation, it may be a good idea to guide the tool with a coding style closer to the implementation, but for a compare like /= this is rarely the case.
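To make the two styles from the question concrete, here is a minimal sketch that describes the same "vectors differ" function both ways. The entity name neq_demo and its port names are hypothetical, chosen only for illustration. For inputs that are '0' or '1', both outputs compute the same logic function, so a synthesis tool is free to map them to the same LUTs; the schematic view in your tool shows what it actually did.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity, for illustration only.
entity neq_demo is
  port (
    a, b       : in  std_logic_vector(31 downto 0);
    differ_op  : out std_logic;  -- result using the /= operator
    differ_xor : out std_logic   -- result using a hand-written XOR and zero check
  );
end entity neq_demo;

architecture rtl of neq_demo is
  constant ZERO : std_logic_vector(31 downto 0) := (others => '0');
begin
  -- Style 1: let the tool choose the implementation.
  differ_op  <= '1' when a /= b else '0';

  -- Style 2: bitwise XOR, then compare the result against all zeros.
  differ_xor <= '1' when (a xor b) /= ZERO else '0';
end architecture rtl;
```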
As Morten already presented, compare operations are implemented in LUTs doing some kind of X(N)OR and AND/(N)OR aggregation.
But it could be faster ...
FPGAs have fast carry chains, which can be used to speed up compare operations with wide inputs, but synthesis tools mostly don't utilize these special resources.
How do you do an equality comparison using a carry chain?
Carry chains can be implemented as kill-propagate chains. This naming comes from ripple carry adders, wherein a carry out can be generated, propagated from carry in or be killed.
A comparator starts with an active carry in (all is equal). Each step computes A(n) = B(n) in the LUT. If the bits are equal, the carry bit is propagated, otherwise it is killed.
If the carry out is high (the initial value has survived the chain) all bits were equal.
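The kill-propagate idea can be written down behaviorally as a rough sketch. The entity name eq_chain, the generic N and the port names are assumptions made for this illustration. Whether a synthesis tool really maps this description onto the dedicated carry chain is tool-dependent; it only models the logic of the chain.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity, for illustration only.
entity eq_chain is
  generic (N : positive := 32);
  port (
    a, b : in  std_logic_vector(N-1 downto 0);
    eq   : out std_logic  -- '1' when all bits of a and b are equal
  );
end entity eq_chain;

architecture rtl of eq_chain is
  -- c(i) is the carry entering stage i; c(N) is the carry out.
  signal c : std_logic_vector(N downto 0);
begin
  c(0) <= '1';  -- active carry in: "all is equal" so far

  gen_chain : for i in 0 to N-1 generate
    -- Propagate the carry if the bits match, otherwise kill it.
    c(i+1) <= c(i) when a(i) = b(i) else '0';
  end generate gen_chain;

  eq <= c(N);  -- the initial '1' survived the chain: all bits were equal
end architecture rtl;
```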
Appendix for Morten Zilmer
I don't have example code for the equal or unequal operation, but I have a similar example for the prefix_and and prefix_or operators, which use carry chains to speed up computation for wide inputs.
prefix_and calculates: y(i) <= '1' when x(i downto 0) = (i downto 0 => '1') else '0';
Explanation:
The resulting vector is 1 until the first 0 is found, after that it is 0.
Or in other words: The first zero found at position i while going from 0 to n kills all remaining bits regardless of the input bits.
prefix_or calculates: y(i) <= '0' when x(i downto 0) = (i downto 0 => '0') else '1';
Explanation:
The resulting vector is 0 until the first 1 is found, after that it is 1.
Or in other words: The first one found at position i while going from 0 to n generates a one and propagates it to all remaining bits regardless of the input bits.
The following code is a generic VHDL description of prefix_and. It is vendor independent, but uses special primitives (MUXCY) on Xilinx FPGAs.
architecture rtl of arith_prefix_and is
begin
y(0) <= x(0);
gen1: if N > 1 generate
signal p : unsigned(N-1 downto 1);
begin
p(1) <= x(0) and x(1);
gen2: if N > 2 generate
p(N-1 downto 2) <= unsigned(x(N-1 downto 2));
-- Generic Carry Chain through Addition
genGeneric: if VENDOR /= VENDOR_XILINX generate
signal s : std_logic_vector(N downto 1);
begin
s <= std_logic_vector(('0' & p) + 1);
y(N-1 downto 2) <= s(N downto 3) xor ('0' & x(N-1 downto 3));
end generate genGeneric;
-- Direct Carry Chain by MUXCY Instantiation
genXilinx: if VENDOR = VENDOR_XILINX generate
component MUXCY
port (
S : in std_logic;
DI : in std_logic;
CI : in std_logic;
O : out std_logic
);
end component;
signal c : std_logic_vector(N-1 downto 0);
begin
c(0) <= '1';
genChain: for i in 1 to N-1 generate
mux : MUXCY
port map (
S => p(i),
DI => '0',
CI => c(i-1),
O => c(i)
);
end generate genChain;
y(N-1 downto 2) <= c(N-1 downto 2);
end generate genXilinx;
end generate gen2;
y(1) <= p(1);
end generate gen1;
end architecture;
Source: PoC.arith.prefix_and

Unable to synthesize a signal because of bad synchronous description in VHDL

I have a problem with synthesis in VHDL. This is the part of the code that gives me the error:
CASE stare_curenta IS
WHEN verde =>
stare_urm <= albastru;
rosuS1368stg <= '1';
galbenS1368stg <= '0';
verdeS1368stg <= '0';
rosuS1368 <= '0';
galbenS1368 <= '0';
if ( clock'event and clock = '0') then
galbenS1368 <= '1';
end if;
verdeS1368 <= '1';
rosup1v1i4v2i3v1i2v2i6v1i5v2i8v1i7v2 <= '0';
verdep1v1i4v2i3v1i2v2i6v1i5v2i8v1i7v2 <= '1';
rosuS2457stg <= '1';
galbenS2457stg <= '0';
if (clock'event and clock = '0') then
galbenS2457stg <= '1';
end if;
verdeS2457stg <= '0';
rosuS2457 <= '1';
galbenS2457 <= '0';
verdeS2457 <= '0';
rosup2v1i1v2i4v1i3v2i5v1i8v2i7v1i6v2 <= '1';
verdep2v1i1v2i4v1i3v2i5v1i8v2i7v1i6v2 <= '0';
I have another process that uses clock'event below, like this one:
PROCESS(clock,stare_urm)
BEGIN
if (clock'event and clock = '1')then
stare_curenta <= stare_urm;
end if;
END PROCESS;
'Check Syntax' and 'Simulation' both run fine; only synthesis gives me the error: Signal galbenS1368 cannot be synthesized, bad synchronous description. The description style you are using to describe a synchronous element (register, memory, etc.) is not supported in the current software release.
Thank you!
The problem is:
if (clock'event and clock = '0') then
inside your state decoding. It's not a syntax problem, and simulators will dutifully carry out exactly what you wrote (though you may not get the results you intended), but as the error message says, it's not a supported synthesis style (embedding a clocked segment of code, i.e. a portion of a process intended to create a register, inside a larger combinational process).
Either way, I'm not sure it's what you really want to do. A process is evaluated when a signal in its sensitivity list changes. The way you've coded it, you're effectively saying "when this process is evaluated, if at that exact instant the clock is falling, register the signal", even though what you probably intended was to assign a value which is then registered on the next falling clock edge.
Assuming that last statement is so, it basically tells you how to code it. It has two parts. (1) assign a value:
when verde =>
...
galbenS2457stg <= '1';
(2) which is registered on the next falling clock edge:
process (clock)
begin
if clock'event and clock = '0' then
galbenS2457stg_reg <= galbenS2457stg;
end if;
end process;
Synchronous design in synthesis is essentially register -> a bunch of combinational logic -> register -> combinational logic -> etc. Coding it like that, with your registers separated from your combinational logic in code, is a good way to start thinking more in terms of hardware.
edit for clarification
Based on your responses in the comments, it seems I was not clear enough on what I was recommending.
You have some sort of process to handle your state machine. I'm assuming most of it is unclocked. You tried to insert small, clocked portions into it, and this is what the tool is complaining about. I'm suggesting you try coding in the following manner (not complete code, just to illustrate the point):
comb1 : process (...) -- fill in sensitivity list as needed
begin
-- state machine decode
...
when verde =>
...
galbenS2457stg <= '1';
...
end process comb1;
clocked1 : process (clock)
begin
if clock'event and clock = '0' then
galbenS2457stg_reg <= galbenS2457stg;
end if;
end process clocked1;
Change the names if you need to - since you only posted part of your code, it wasn't evident that galbenS2457stg was an output at all. However you name them, the signal assigned in your state decode process should be an internal signal, and would be declared in the architecture declarative region, not as an output port, while the clocked signal would be the output port, and would be declared as such.

Storing array in FPGA

I am trying to implement a simple multiplier. I have a text file with two columns, and I am multiplying column 1 by column 2. Here is the code in Verilog:
module File_read(
input clk
);
reg [21:0] captured_data[0:10];
reg [21:0] a[0:8];
reg [21:0] b[0:8];
reg [43:0] product[0:5];
`define NULL 0
integer n=0;
integer i=0;
initial
$readmemh("abc.txt",captured_data);
always @(posedge clk) begin
product[i]<=captured_data[n]*captured_data[n+1];
n<=n+2;
i<=i+1;
end
endmodule
I have a Xilinx Spartan®-6 LX45 FPGA board. It offers 128 Mbit of DDR2 RAM and a 16 Mbyte x4 SPI Flash for configuration and data storage.
Now I want to store my file in memory on the FPGA board. How can I do this? Do I have to use an IP core to access the memory, or is there another way?
P.S.: This is the first time I am storing anything on an FPGA.
Regards!
Awais
First of all don't use DDR or Flash memory, unless you really need them. Your FPGA has plenty of BlockRAMs to store several thousand arguments for your multiplier.
One easy way is to instantiate 2 BlockRAMs and load them at compile time with data from a file. Xilinx offers tools like data2mem to achieve this.
Alternatively, you can use Ethernet or a UART connection to send the test data to your design.
Edit 1 - How to instantiate BlockRAM
Solution 1: A generic VHDL description.
type T_RAM is array(LINES - 1 downto 0) of std_logic_vector(BITS-1 downto 0);
signal ram : T_RAM;
begin
process (Clock)
begin
if rising_edge(Clock) then
if (WriteEnable = '1') then
ram(to_integer(WriteAddress)) <= d;
end if;
q <= ram(to_integer(ReadAddress));
end if;
end process;
Solution 2: The IPCore generator has a wizard to create BlockRAMs and assign external files.
Solution 3: Manually instantiate a BlockRAM macro. Each FPGA family comes with a HDL library guide of supported macros. For example the Virtex-5 has a RAMB36 macro on page 311.
The usage of BlockRAMs with data2MEM and *.bmm (BlockRAM memory map) files is described here.
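As a further variant of Solution 1, the memory can be given its initial contents from a text file while the design is elaborated. The following is only a sketch under several assumptions: the entity name rom_from_file, the generic FILENAME and the file name abc_bin.txt are hypothetical, the file is assumed to hold one binary word of BITS characters per line (not the hex format used with $readmemh), and your synthesis tool must support file reads in initialization functions, so check its user guide.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use std.textio.all;

-- Hypothetical entity, for illustration only.
entity rom_from_file is
  generic (
    LINES    : positive := 16;             -- must fit the 4-bit address below
    BITS     : positive := 22;
    FILENAME : string   := "abc_bin.txt"   -- assumed: one binary word per line
  );
  port (
    Clock       : in  std_logic;
    ReadAddress : in  unsigned(3 downto 0);
    q           : out std_logic_vector(BITS-1 downto 0)
  );
end entity rom_from_file;

architecture rtl of rom_from_file is
  type T_RAM is array(0 to LINES-1) of bit_vector(BITS-1 downto 0);

  -- Reads up to LINES words from the file; missing lines stay zero.
  impure function init_ram(name : string) return T_RAM is
    file f     : text open read_mode is name;
    variable l : line;
    variable r : T_RAM := (others => (others => '0'));
  begin
    for i in T_RAM'range loop
      exit when endfile(f);
      readline(f, l);
      read(l, r(i));
    end loop;
    return r;
  end function init_ram;

  -- The initial value is computed once, at elaboration.
  signal ram : T_RAM := init_ram(FILENAME);
begin
  process (Clock)
  begin
    if rising_edge(Clock) then
      -- Synchronous read, as in Solution 1.
      q <= to_stdlogicvector(ram(to_integer(ReadAddress)));
    end if;
  end process;
end architecture rtl;
```

With the default generics this describes a 16 x 22-bit read-only memory; for a larger memory, widen ReadAddress along with LINES.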

How to ignore output ports with port maps

In VHDL I often notice that a certain component has multiple output ports. For example, in one of our examples we were given the following component:
COMPONENT eight_bitadder
PORT ( a, b: in std_logic_vector(7 downto 0);
f: in std_logic;
C: out std_logic_vector(7 downto 0);
o, z: out std_logic);
END COMPONENT;
Where z determines if the result is 0, and o triggers on overflow.
Now in my case I wish to use this adder, however the actual result is not of importance, rather I only wish to check if the result is "0". I could of course add a dummy signal and store the port to this signal, however that seems needlessly complicated, and might add extra components during synthesis?
When you instantiate the component you can leave the output ports that you don't care about open. The only signal you care about below is "overflow".
EDIT: Note that the synthesis tools will optimize away any outputs that are not being used.
EIGHT_BITADDER_INST : eight_bitadder
port map (
a => a,
b => b,
f => f,
c => open,
o => overflow,
z => open
);
You also could choose to not tie an output to anything like so:
EIGHT_BITADDER_INST : eight_bitadder
port map (
a => a,
b => b,
f => f,
o => overflow
);
Notice that I simply did not include outputs c and z in the port map. Some may debate the clarity of this (since it may not be clear that outputs c and z exist), but it also reduces the code to only what is necessary.

Resources