I am a beginner in VHDL/FPGA programming. I want to compare two 32-bit std_logic_vectors. I am currently using:
if ( RX_FRAME(to_integer(s_data_counter)).Data /= REF_FRAME(to_integer(s_data_counter)).Data ) then
    s_bad_frame <= '1';
    state <= DONE;
end if;
Here RX_FRAME and REF_FRAME are two arrays whose elements have a Data field of type std_logic_vector(31 downto 0).
I want to know how the synthesis tool translates /= into hardware.
Is it advisable to use this? Or should I do an XOR of the concerned vectors and check the resulting vector against zeros? In case I do an XOR and check against zeroes, doesn't it increase the amount of hardware needed?
I am using Vivado Design Suite 2015.3.
You should do the compare with /= to really benefit from a language like VHDL and from advanced synthesis tools like Xilinx Vivado.
The synthesis tool will then implement this using internal LUTs in the FPGA, maybe with a function similar to XOR gates for variable arguments, or AND/NOT gates if one of the arguments evaluates to a constant. The best way to see the actual implementation is to open a schematic view of the implemented design in the tool.
But starting to second-guess the tool by building the XOR gates yourself is usually a bad idea, since the tool is usually much better at determining the best implementation. However, if you find that the tool can't identify a specific construction and select an efficient implementation, it may be a good idea to guide it with a coding style closer to the implementation, but for a compare like /= this is rarely the case.
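A minimal sketch of the point above (the entity and port names are hypothetical): both outputs describe the same Boolean function, so the tool is free to map them onto the same LUT tree, and the hand-written XOR form buys nothing. The OR-reduction is written as a loop so it also compiles as VHDL-93; VHDL-2008 tools additionally accept the unary reduction operator `or (a xor b)`.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: the same 32-bit inequality written two ways.
entity neq_compare is
  port (
    a, b    : in  std_logic_vector(31 downto 0);
    neq_ops : out std_logic;  -- written with the /= operator
    neq_xor : out std_logic   -- written as XOR followed by an OR-reduction
  );
end entity;

architecture rtl of neq_compare is
  -- OR-reduce written as a loop so it also works in VHDL-93
  function or_reduce(v : std_logic_vector) return std_logic is
    variable r : std_logic := '0';
  begin
    for i in v'range loop
      r := r or v(i);
    end loop;
    return r;
  end function;
begin
  neq_ops <= '1' when a /= b else '0';
  neq_xor <= or_reduce(a xor b);  -- any differing bit gives a '1'
end architecture;
```

For plain '0'/'1' inputs both outputs are identical; the /= version is simply shorter and leaves the mapping decision to the tool.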
As Morten already explained, compare operations are implemented in LUTs doing some kind of X(N)OR and AND/(N)OR aggregation.
But it could be faster ...
FPGAs have fast carry chains, which can be used to speed up compare operations with wide inputs, but synthesis tools mostly don't utilize these special resources.
How do you do an equality comparison using a carry chain?
Carry chains can be implemented as kill-propagate chains. This naming comes from ripple-carry adders, wherein a carry out can be generated, propagated from the carry in, or killed.
A comparator starts with an active carry in ("all bits are equal so far"). Each stage computes An = Bn in its LUT: if the bits are equal, the carry is propagated, otherwise it is killed.
If the carry out is high (the initial value has survived the whole chain), all bits were equal.
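A vendor-independent sketch of this kill-propagate comparator, using the same "carry chain through addition" trick as the genGeneric branch of the prefix_and code below (the entity name is hypothetical). p(i) = '1' means "propagate" (bits equal) and p(i) = '0' means "kill". Adding 1 to p lets the carry reach bit N only if every p(i) is '1'. Whether the adder really lands on the dedicated carry chain is up to the synthesis tool; on Xilinx parts the MUXCY instantiation shown in the appendix is the direct route.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: equality compare mapped onto an adder's carry chain.
entity eq_carry_chain is
  generic (N : positive := 32);
  port (
    a, b : in  std_logic_vector(N-1 downto 0);
    eq   : out std_logic   -- '1' when a = b
  );
end entity;

architecture rtl of eq_carry_chain is
  -- p(i) = '1' (propagate) where a(i) = b(i), '0' (kill) otherwise
  signal p : unsigned(N-1 downto 0);
  signal s : unsigned(N downto 0);
begin
  p <= unsigned(not (a xor b));
  -- The "+ 1" is the active carry in; it only reaches bit N
  -- if it is propagated (never killed) through all N stages.
  s  <= ('0' & p) + 1;
  eq <= s(N);
end architecture;
```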
Appendix for Morten Zilmer
I don't have example code for the equal or unequal operation, but I have a similar example for prefix-AND and prefix-OR operators that use carry chains to speed up computation for wide inputs.
prefix_and calculates: y(i) <= '1' when x(i downto 0) = (i downto 0 => '1') else '0';
Explanation:
The resulting vector is 1 until the first 0 is found, after that it is 0.
Or in other words: The first zero found at position i while going from 0 to n kills all remaining bits regardless of the input bits.
prefix_or calculates: y(i) <= '0' when x(i downto 0) = (i downto 0 => '0') else '1';
Explanation:
The resulting vector is 0 until the first 1 is found, after that it is 1.
Or in other words: The first one found at position i while going from 0 to n generates a one and propagates it to all remaining bits regardless of the input bits.
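Before looking at the carry-chain version, it can help to have a plain behavioral reference model of the two definitions above, for example to check an optimized implementation against in simulation. This is a minimal sketch (the entity name and ports are hypothetical): a running AND that any '0' kills, and a running OR that any '1' generates.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical reference model: plain loops, no carry-chain tricks.
entity prefix_ref is
  generic (N : positive := 8);
  port (
    x     : in  std_logic_vector(N-1 downto 0);
    y_and : out std_logic_vector(N-1 downto 0);  -- y_and(i) = x(0) and ... and x(i)
    y_or  : out std_logic_vector(N-1 downto 0)   -- y_or(i)  = x(0) or  ... or  x(i)
  );
end entity;

architecture behav of prefix_ref is
begin
  process (x)
    variable a, o : std_logic;
  begin
    a := '1';  -- AND chain starts active: nothing killed yet
    o := '0';  -- OR chain starts inactive: nothing generated yet
    for i in 0 to N-1 loop
      a := a and x(i);  -- a '0' kills all higher bits
      o := o or x(i);   -- a '1' is generated and propagates upward
      y_and(i) <= a;
      y_or(i)  <= o;
    end loop;
  end process;
end architecture;
```

For x = "00110111" this yields y_and = "00000111" and y_or = "11111111".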
The following code is a generic VHDL description of prefix_and. It is vendor independent, but uses special primitives (MUXCY) on Xilinx FPGAs.
-- Entity (not shown in this excerpt): generic N : positive; ports
--   x : in  std_logic_vector(N-1 downto 0);
--   y : out std_logic_vector(N-1 downto 0).
-- unsigned comes from ieee.numeric_std; VENDOR and VENDOR_XILINX are
-- assumed to be constants from the PoC library's configuration package.
architecture rtl of arith_prefix_and is
begin
  y(0) <= x(0);
  gen1: if N > 1 generate
    signal p : unsigned(N-1 downto 1);
  begin
    p(1) <= x(0) and x(1);
    gen2: if N > 2 generate
      p(N-1 downto 2) <= unsigned(x(N-1 downto 2));
      -- Generic Carry Chain through Addition
      genGeneric: if VENDOR /= VENDOR_XILINX generate
        signal s : std_logic_vector(N downto 1);
      begin
        s <= std_logic_vector(('0' & p) + 1);
        y(N-1 downto 2) <= s(N downto 3) xor ('0' & x(N-1 downto 3));
      end generate genGeneric;
      -- Direct Carry Chain by MUXCY Instantiation
      genXilinx: if VENDOR = VENDOR_XILINX generate
        component MUXCY
          port (
            S  : in  std_logic;
            DI : in  std_logic;
            CI : in  std_logic;
            O  : out std_logic
          );
        end component;
        signal c : std_logic_vector(N-1 downto 0);
      begin
        c(0) <= '1';
        genChain: for i in 1 to N-1 generate
          mux : MUXCY
            port map (
              S  => p(i),    -- propagate CI while p(i) = '1'
              DI => '0',     -- otherwise kill the carry
              CI => c(i-1),
              O  => c(i)
            );
        end generate genChain;
        y(N-1 downto 2) <= c(N-1 downto 2);
      end generate genXilinx;
    end generate gen2;
    y(1) <= p(1);
  end generate gen1;
end architecture;
Source: PoC.arith.prefix_and
I have a signal:
signal sig: std_logic_vector(N - 1 downto 0);
where N is defined by a generic and can range from 16 to 1024.
In my code I need to compare it to zero:
if unsigned(sig) = 0 then
    -- do something
end if;
But how can I know what the delay of such a comparator will be?
My design runs at 100 MHz, so I need some kind of divider that skips some clock cycles ("tackts") to obtain the result, something like this:
constant CHECK_TACKTS : natural := 100;
signal check : boolean;
signal wait_check_cntr : natural range 0 to CHECK_TACKTS;
-- states
when SOME_STATE =>
    check <= unsigned(sig) = 0;
    wait_check_cntr <= 0;
    state <= CHECK_ZERO;
when CHECK_ZERO =>
    if wait_check_cntr = CHECK_TACKTS then
        if check then
            -- do something
        end if;
    else
        wait_check_cntr <= wait_check_cntr + 1;
    end if;
But how can I calculate CHECK_TACKTS if I know that the clock period is 10 ns? If the Xilinx synthesis tool builds a full compare tree from LUTs, it seems the compare time should be proportional to log2(N), but what about the LUT delay? Of course I could measure the timings from the report for several values of N and then perform a regression, but maybe there is a simpler way?
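The log2(N) intuition can be refined a little. As a rough first-order estimate only (an assumption, not something either post states): an N-input OR tree built from k-input LUTs needs about ceil(log_k N) LUT levels, and Virtex-5 and later Xilinx families have 6-input LUTs. This ignores routing delay, which often dominates, so the real number still has to come from the timing report.

```latex
L(N) = \left\lceil \log_{6} N \right\rceil, \qquad
L(16) = 2 \quad (6^1 < 16 \le 6^2), \qquad
L(1024) = 4 \quad (6^3 = 216 < 1024 \le 6^4 = 1296)
```

So even at N = 1024 the tree is only a handful of LUT levels deep, which is why a couple of pipeline registers (rather than a 100-cycle wait) is usually enough.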
The easiest way to pipeline your comparison operation on Xilinx is to let the tool do it for you. You need to activate the "register balancing" option and use syntax such as:
if rising_edge(clk) then
    check_0 <= unsigned(sig) = 0;
    check_1 <= check_0;
    check <= check_1;
end if;
XST (or Vivado) will then distribute the compare operation over the three cycles in this case.
If you prefer not to rely on the synthesis tool, you can manually divide the operation yourself:
if rising_edge(clk) then
    -- each half is compared in the first cycle, the results are combined in the second
    check_msb <= unsigned(sig(sig'left downto sig'length/2)) = 0;
    check_lsb <= unsigned(sig(sig'length/2-1 downto 0)) = 0;
    check <= check_msb and check_lsb;
end if;
This may not be the optimal way to balance the comparison, but the VHDL code is simple, easy to modify and easy to understand.
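The manual split generalizes to any N. A minimal sketch, assuming the first stage OR-reduces fixed-size chunks (the entity name, the CHUNK generic and its default of 64 bits are illustrative choices, not taken from the answer): stage 1 registers one "non-zero" flag per chunk, stage 2 registers the combined result, so check is valid two clock cycles after sig is sampled.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-stage zero detector.
entity zero_detect_pipe is
  generic (
    N     : positive := 1024;  -- width of sig
    CHUNK : positive := 64     -- bits reduced per chunk in stage 1 (assumption)
  );
  port (
    clk   : in  std_logic;
    sig   : in  std_logic_vector(N-1 downto 0);
    check : out boolean        -- true two cycles after sig was all zeros
  );
end entity;

architecture rtl of zero_detect_pipe is
  constant NCHUNKS : positive := (N + CHUNK - 1) / CHUNK;
  signal nonzero   : std_logic_vector(NCHUNKS-1 downto 0) := (others => '0');
begin
  process (clk)
    variable r : std_logic;
  begin
    if rising_edge(clk) then
      -- Stage 1: one OR-reduction per chunk (the last chunk may be shorter)
      for c in 0 to NCHUNKS-1 loop
        r := '0';
        for i in 0 to CHUNK-1 loop
          if c*CHUNK + i < N then
            r := r or sig(c*CHUNK + i);
          end if;
        end loop;
        nonzero(c) <= r;
      end loop;
      -- Stage 2: sig was zero if no chunk flagged a '1'
      check <= nonzero = (nonzero'range => '0');
    end if;
  end process;
end architecture;
```

Smaller chunks mean shallower first-stage logic but more flags to combine in stage 2; the split point is a tuning knob, just like the MSB/LSB halves above.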
I have a problem with synthesis in VHDL. This is the part of the code where it gives me an error:
CASE stare_curenta IS
    WHEN verde =>
        stare_urm <= albastru;
        rosuS1368stg <= '1';
        galbenS1368stg <= '0';
        verdeS1368stg <= '0';
        rosuS1368 <= '0';
        galbenS1368 <= '0';
        if ( clock'event and clock = '0') then
            galbenS1368 <= '1';
        end if;
        verdeS1368 <= '1';
        rosup1v1i4v2i3v1i2v2i6v1i5v2i8v1i7v2 <= '0';
        verdep1v1i4v2i3v1i2v2i6v1i5v2i8v1i7v2 <= '1';
        rosuS2457stg <= '1';
        galbenS2457stg <= '0';
        if (clock'event and clock = '0') then
            galbenS2457stg <= '1';
        end if;
        verdeS2457stg <= '0';
        rosuS2457 <= '1';
        galbenS2457 <= '0';
        verdeS2457 <= '0';
        rosup2v1i1v2i4v1i3v2i5v1i8v2i7v1i6v2 <= '1';
        verdep2v1i1v2i4v1i3v2i5v1i8v2i7v1i6v2 <= '0';
Below it I have another process using clock and clock'event, like this one:
PROCESS(clock,stare_urm)
BEGIN
    if (clock'event and clock = '1')then
        stare_curenta <= stare_urm;
    end if;
END PROCESS;
'Check Syntax' and 'Simulation' both work; only synthesis gives me the error: Signal galbenS1368 cannot be synthesized, bad synchronous description. The description style you are using to describe a synchronous element (register, memory, etc.) is not supported in the current software release.
Thank you!
The problem is:
if (clock'event and clock = '0') then
inside your state decoding. It's not a syntax problem, and simulators will dutifully carry out exactly what you wrote (though you may not get the results you intended), but as the error message says, it's not a supported synthesis style (embedding a clocked segment of code, i.e. a portion of a process intended to create a register, inside a larger combinational process).
Either way, I'm not sure it's what you really want to do. A process is evaluated when a signal in its sensitivity list changes. The way you've coded it, you're effectively saying "when this process is evaluated, if at that exact instant the clock is falling, register the signal", even though what you probably intended was to assign a value which is then registered on the next falling clock edge.
Assuming that last statement is so, it basically tells you how to code it. It has two parts. (1) assign a value:
when verde =>
    ...
    galbenS2457stg <= '1';
(2) which is registered on the next falling clock edge:
process (clock)
begin
    if clock'event and clock = '0' then
        galbenS2457stg_reg <= galbenS2457stg;
    end if;
end process;
Synchronous design in synthesis is essentially register -> a bunch of combinational logic -> register -> combinational logic -> etc. Coding it like that, with your registers separated from your combinational logic in code, is a good way to start thinking more in terms of hardware.
edit for clarification
Based on your responses in the comments, it seems I was not clear enough on what I was recommending.
You have some sort of process to handle your state machine. I'm assuming most of it is unclocked. You tried to insert small, clocked portions into it, and this is what the tool is complaining about. I'm suggesting you try coding in the following manner (not complete code, just to illustrate the point):
comb1 : process (...) -- fill in sensitivity list as needed
begin
    -- state machine decode
    ...
    when verde =>
        ...
        galbenS2457stg <= '1';
        ...
end process comb1;
clocked1 : process (clock)
begin
    if clock'event and clock = '0' then
        galbenS2457stg_reg <= galbenS2457stg;
    end if;
end process clocked1;
Change the names if you need to - since you only posted part of your code, it wasn't evident that galbenS2457stg was an output at all. However you name them, the signal assigned in your state decode process should be an internal signal, and would be declared in the architecture declarative region, not as an output port, while the clocked signal would be the output port, and would be declared as such.
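A minimal, complete sketch of that structure, assuming a state type with just two of your states (verde and albastru) and a single output; the entity name is hypothetical, and your other outputs would follow the same pattern. The decode process contains no clock edges at all, the internal signal galbenS2457stg is registered on the falling edge, and only the registered galbenS2457stg_reg leaves the entity.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: combinational decode -> internal signal ->
-- falling-edge register -> output port.
entity semafor_example is
  port (
    clock              : in  std_logic;
    galbenS2457stg_reg : out std_logic   -- registered output
  );
end entity;

architecture rtl of semafor_example is
  type t_stare is (verde, albastru);       -- only two states, for illustration
  signal stare_curenta  : t_stare := verde;
  signal stare_urm      : t_stare;
  signal galbenS2457stg : std_logic;       -- internal, unregistered
begin
  -- State register on the rising edge, as in your existing process
  state_reg : process (clock)
  begin
    if clock'event and clock = '1' then
      stare_curenta <= stare_urm;
    end if;
  end process state_reg;

  -- Purely combinational decode: every output assigned in every state
  comb1 : process (stare_curenta)
  begin
    case stare_curenta is
      when verde =>
        stare_urm      <= albastru;
        galbenS2457stg <= '1';
      when albastru =>
        stare_urm      <= verde;
        galbenS2457stg <= '0';
    end case;
  end process comb1;

  -- Output register on the falling edge
  clocked1 : process (clock)
  begin
    if clock'event and clock = '0' then
      galbenS2457stg_reg <= galbenS2457stg;
    end if;
  end process clocked1;
end architecture;
```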
I am trying to implement a simple multiplier. I have a text file with two columns, and I am multiplying column 1 by column 2. Here is my Verilog code:
module File_read(
    input clk
);
    reg [21:0] captured_data[0:10];
    reg [21:0] a[0:8];
    reg [21:0] b[0:8];
    reg [43:0] product[0:5];
    `define NULL 0
    integer n=0;
    integer i=0;
    initial
        $readmemh("abc.txt",captured_data);
    always @(posedge clk) begin
        product[i]<=captured_data[n]*captured_data[n+1];
        n<=n+2;
        i<=i+1;
    end
endmodule
I have a Xilinx Spartan®-6 LX45 FPGA board. It offers 128 Mbit of DDR2 RAM and a 16 MByte x4 SPI flash for configuration and data storage.
Now I want to store my file in memory on the FPGA board. How can I do this? Do I have to use an IP core to access the memory, or is there another way?
P.S.: This is the first time I am storing anything on an FPGA.
Regards!
Awais
First of all, don't use the DDR or flash memory unless you really need it. Your FPGA has plenty of BlockRAMs to store several thousand arguments for your multiplier.
One easy way is to instantiate 2 BlockRAMs and load them at compile time with data from a file. Xilinx offers tools like data2mem to achieve this.
Alternatively, you can use Ethernet or a UART connection to send the test data to your design.
Edit 1 - How to instantiate BlockRAM
Solution 1: A generic VHDL description.
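-- Architecture fragment: LINES and BITS are assumed to be generics, and
-- WriteAddress/ReadAddress are assumed to be of type unsigned (ieee.numeric_std).
type T_RAM is array(LINES - 1 downto 0) of std_logic_vector(BITS-1 downto 0);
signal ram : T_RAM;
begin
process (Clock)
begin
    if rising_edge(Clock) then
        if (WriteEnable = '1') then
            ram(to_integer(WriteAddress)) <= d;
        end if;
        q <= ram(to_integer(ReadAddress));  -- synchronous read
    end if;
end process;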
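For reference, Solution 1 wrapped into a complete, compilable entity. The entity name, the ADDR_BITS generic (with LINES derived from it) and the port types are assumptions for illustration. The registered read of q is the important part: BlockRAM inference needs the RAM contents to be read synchronously.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-clock RAM in the style of Solution 1.
entity simple_bram is
  generic (
    ADDR_BITS : positive := 10;  -- LINES = 2**ADDR_BITS words
    BITS      : positive := 22   -- word width
  );
  port (
    Clock        : in  std_logic;
    WriteEnable  : in  std_logic;
    WriteAddress : in  unsigned(ADDR_BITS-1 downto 0);
    ReadAddress  : in  unsigned(ADDR_BITS-1 downto 0);
    d            : in  std_logic_vector(BITS-1 downto 0);
    q            : out std_logic_vector(BITS-1 downto 0)
  );
end entity;

architecture rtl of simple_bram is
  constant LINES : positive := 2**ADDR_BITS;
  type T_RAM is array(LINES - 1 downto 0) of std_logic_vector(BITS-1 downto 0);
  signal ram : T_RAM;
begin
  process (Clock)
  begin
    if rising_edge(Clock) then
      if WriteEnable = '1' then
        ram(to_integer(WriteAddress)) <= d;
      end if;
      -- Registered (synchronous) read, required for BlockRAM mapping
      q <= ram(to_integer(ReadAddress));
    end if;
  end process;
end architecture;
```

Because q is assigned from the signal ram, a read and a write to the same address in the same cycle returns the old contents (read-first behavior).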
Solution 2: The IPCore generator has a wizard to create BlockRAMs and assign external files.
Solution 3: Manually instantiate a BlockRAM macro. Each FPGA family comes with an HDL library guide of supported macros. For example, the Virtex-5 guide describes the RAMB36 macro on page 311.
The usage of BlockRAMs with data2MEM and *.bmm (BlockRAM memory map) files is described here.
How can I make a memory module in which the data bus widths are passed as parameters to each instance and the design reconfigures itself according to those parameters? For example, assume I have a byte-addressable memory where DATA-IN is 32 bits wide (4 bytes written each cycle) and DATA-OUT is 16 bits wide (2 bytes read each cycle). For another instance DATA-IN is 64 bits and DATA-OUT is 16 bits. My design should work for all such instances.
What I have tried is to generate the write pointer values according to the design parameters: e.g. for a 32-bit DATA-IN, the write pointer increments by 4 every cycle while writing; for 64 bits it increments by 8, and so on.
The problem is: how do I write 4, 8 or 16 bytes in a single cycle, depending on the parameters passed to the instance?
//Something as following I want to implement. This memory instance can be considered as internal memory of FIFO having different datawidth for reading and writing in case you think of an application of such memory
module mem#(parameter DIN=16, parameter DOUT=8, parameter ADDR=4,parameter BYTE=8)
(
    input [DIN-1:0] din,
    output [DOUT-1:0] dout,
    input wen,ren,clk
);
    localparam DEPTH = (1<<ADDR);
    reg [BYTE-1:0] mem [0:DEPTH-1];
    reg wpointer=5'b00000;
    reg rpointer=5'b00000;
    reg [BYTE-1:0] tmp [0:DIN/BYTE-1];
    function [ADDR:0] ptr;
        input [4:0] index;
        integer i;
        begin
            for(i=0;i<DIN/BYTE;i=i+1) begin
                mem[index] = din[(BYTE*(i+1)-1):BYTE*(i)]; // something like this I want to implement, I know this line is not allowed in verilog, but is there any alternative to this?
                index=index+1;
            end
            ptr=index;
        end
    endfunction
    always @(posedge clk) begin
        if(wen==1)
            wpointer <= wptr(wpointer);
    end
    always @(posedge clk) begin
        if(ren==1)
            rpointer <= ptr(rpointer);
    end
endmodule
din[(BYTE*(i+1)-1):BYTE*(i)] will not compile in Verilog because the MSB and LSB select bits are both variables; Verilog requires the width of a part-select to be constant. The +: part-select (also known as a slice) allows a variable start index with a constant width. It was introduced in IEEE Std 1364-2001 § 4.2.1. You can also read more about it in IEEE Std 1800-2012 § 11.5.1, or refer to previously asked questions: What is `+:` and `-:`? and Indexing vectors and arrays with +:.
din[BYTE*i +: BYTE] should work for you, alternatively you can use din[BYTE*(i+1)-1 -: BYTE].
Also, you should use non-blocking assignments (<=) for mem. In your code a read and a write can happen at the same time, and with blocking assignments there is a race condition when both access the same byte. It may synthesize, but your RTL and gate-level simulations may produce different results. I also strongly advise against using functions to assign memory. To be synthesizable without nasty surprises, a function needs to be self-contained, with no references to anything outside the function, and any internal variables must be reset to a static constant at the start of the function.
With the guidelines mentioned above, I'd recommend recoding to something like the below. This is a template to start with, not a free lunch. I left out the out-of-range index compensation for you to figure out on your own.
...
// dout must be declared as "output reg [DOUT-1:0] dout" in the port list,
// because it is assigned inside an always block.
localparam DEPTH = (1<<ADDR);
reg [BYTE-1:0] mem [0:DEPTH-1];
reg [ADDR-1:0] wpointer, rpointer;
integer i;
initial begin // init values for pointers (FPGA, not ASIC)
    wpointer = {ADDR{1'b0}};
    rpointer = {ADDR{1'b0}};
end
always @(posedge clk) begin
    if (ren==1) begin
        for(i=0; i < DOUT/BYTE; i=i+1) begin
            dout[BYTE*i +: BYTE] <= mem[rpointer+i];
        end
        rpointer <= rpointer + (DOUT/BYTE);
    end
    if (wen==1) begin
        for(i=0; i < DIN/BYTE; i=i+1) begin
            mem[wpointer+i] <= din[BYTE*i +: BYTE];
        end
        wpointer <= wpointer + (DIN/BYTE);
    end
end
I have an array of vectors that I want to be stored in Block RAM on a Virtex-5 using ISE 13.4. It is 32Kb which should fit in 1 BRAM but it is all being stored in logic. My system uses an AMBA APB bus so I check for a select line and an enable line. Please help me understand why this code isn't inferring a BRAM. Note: this is a dummy example which is simpler to understand and should help me with my other code.
architecture Behavioral of top is
    type memory_array is array (63 downto 0) of std_logic_vector(31 downto 0);
    signal memory : memory_array;
    attribute ram_style: string;
    attribute ram_style of memory : signal is "block";
begin
    process(Clk)
    begin
        if(rising_edge(Clk)) then
            if(Sel and Wr_en and Enable) = '1' then
                memory(to_integer(Paddr(5 downto 0))) <= Data_in;
            elsif(Sel and not Wr_en and Enable) = '1' then
                Data_out <= memory(to_integer(Paddr(5 downto 0)));
            end if;
        end if;
    end process;
end Behavioral;
I declare the ram_style of the array as block but the XST report says: WARNING:Xst:3211 - Cannot use block RAM resources for signal <Mram_memory>. Please check that the RAM contents is read synchronously.
It appears that the problem lies in a read_enable condition, but the Virtex 5 User Guide makes it sound like there is an enable and a write_enable on the BRAM hard blocks. I could drive the output all the time, but I don't want to and that would waste power. Any other ideas?
Your logic may not match how your device's BRAM works (there are various limitations depending on the device). Usually, the data_out is updated on every clock cycle the RAM is enabled for, not just "when not writing" - try this:
process(Clk)
begin
    if(rising_edge(Clk)) then
        if(Sel and Enable) = '1' then
            Data_out <= memory(to_integer(Paddr(5 downto 0)));
            if wr_en = '1' then
                memory(to_integer(Paddr(5 downto 0))) <= Data_in;
            end if;
        end if;
    end if;
end process;
I moved the Data_out assignment "upwards" to make it clear that it gets the "old" value - that's the default behaviour of the BRAM, although other styles can also be set up.
Alternatively, the tools may be confused by Sel, Enable and Wr_en all being in a single if statement, because they mainly do "template matching" rather than "function matching" when inferring BRAM. You may find that simply splitting out an "enable if" and a "write if" (as I did above), while keeping the rest of the functionality the same, is sufficient to make the synthesizer do what is required.
If you are using Xilinx's XST then you can read all about inferring RAMs in the docs (page 204 onwards of my XST user guide - the chapter is called "RAM HDL Coding techniques")
Use the appropriate macro for the BRAM block on your device? I found that to work much better than relying on the synthesis tool not being stupid.
I tried many different combinations and here is the only one I got to work:
en_BRAM <= Sel and Enable;
process(Clk)
begin
    if(rising_edge(Clk)) then
        if(en_BRAM = '1')then
            if(Wr_en = '1') then
                icap_memory(to_integer(Paddr(5 downto 0))) <= Data_in;
            else
                Data_out <= icap_memory(to_integer(Paddr(5 downto 0)));
            end if;
        end if;
    end if;
end process;
So I think the enable needs to be on the whole RAM and it can only be 1 signal. Then the write enable can also only be 1 signal and the read has to be only an else statement (not if/elsif). This instantiates a BRAM according to XST in ISE 13.3 on Windows 7 64-bit.