Resources overused - image-processing

Resources overused - image-processing

Can anybody help me to optimize this code.I guess the nested looping is taking a lot of resources. The census_transform is not taking much resources but this correlation block is taking 158% of resources.How can I avoid the use of loops.
module correlation(clk,disp_map
);
input clk;
output reg [5:0] disp_map;
reg [7:0] d = 0;
wire [119:0] census_left;
wire [119:0] census_right;
reg [119:0] temp [0:63];
reg [119:0] hamming;
reg en_l = 1;
reg en_r = 0;
reg [6:0]prevSAD = 7'h7f;
reg [6:0] SAD = 0;
reg BestPosition = 0;
// integer count =0;
integer j =0;
integer i =0;
integer k =0;
integer p =0;
//----------------------------------------------------------------------//
/*always # (posedge clk)
count = count+1;*/
//----------------------------------------------------------------------//
always # (posedge clk)
begin
if(d==63)
en_r = 1;
else
d = d+1;
//-----------------------------------------------------------------------//
if(j<=16380)
begin
temp[0] <= census_left;
for(i=0;i<63;i=i+1)
begin
temp[i+1] <= temp[i];
end
j=j+1;
end
//------------------------------------------------------------------------//
if(en_r)
begin
for(k=0;k<63;k=k+1)
begin
hamming = census_right^temp[k];
for(p=0;p<=119;p=p+1)
SAD = SAD + hamming[p];
if(prevSAD > SAD)
begin
prevSAD = SAD;
BestPosition = k;
end
end
disp_map = BestPosition;
end
end
census_transform transform(clk,en_l,en_r,census_left,census_right);
endmodule

While your resource usage is not clear, I've got a couple of things here.
Never mix blocking and nonblocking assignments.
Why are you declaring different loop variables? A single variable is enough.
Rather than assigning default values, use reset logic in the design.
Please clarify about which resources are you talking about, your simulator version and which nested looping are to be discussed.
There can many more points, I have listed just a few.

Related

Task does not update testbench sclk

I'm trying to understand why my signal is not updating when it is processed by the task.
As you could see below, the problem is related to the signal that internally on the task are changing correctly but even in a hierarchical call do not change the signal outside the task.
//-------------------------------
timeunit 1ps;
timeprecision 1ps;
`define CLK_HALF_PERIOD 10
`define SCK_HALF_PERIOD 30
module tbench ();
logic clk;
logic sclk;
logic RST;
hwpe_stream_intf_stream MOSI();
hwpe_stream_intf_stream MISO();
logic try;
initial begin
spi_send (.addr({1'b1,3'b111,12'd1,16'd0 }),
.data(1),
.MISO(try),
.MOSI(MOSI.data),
.SCK(sclk));
end
always
begin
# `CLK_HALF_PERIOD clk = 1;
# `CLK_HALF_PERIOD clk = 0;
end
task automatic spi_send (
input logic [31:0] addr,
input logic [31:0] data,
input logic MISO, // not used
ref logic MOSI,
ref logic SCK
);
integer i = 0;
$display ("add=%-32d",addr );
for (i=0; i<32; i=i+1) begin
//$display("add", 31-i , " MOSI ",MOSI);
// MOSI = ;
MOSI = addr[31-i];
tbench.try = MOSI;
#`SCK_HALF_PERIOD
tbench.sclk = 1'b1;
#`SCK_HALF_PERIOD;
tbench.sclk = 1'b0;
$display("add", addr[30-i] , " MOSI ",MOSI);
end
endtask
endmodule
tbench.sclk and MOSI are not changing globally, but only locally.
Here is the interface:
interface hwpe_stream_intf_stream() ;
logic valid;
logic ready;
logic data;
logic [8/8-1:0] strb;
modport source (
output valid, data, strb,
input ready
);
modport sink (
input valid, data, strb,
output ready
);
endinterface

You need to zoom in to the beginning of your waveforms to see sclk toggling. It toggles between 0 and 2000ps, then stops toggling.
You can add this to your testbench to stop the simulation much sooner to make it more obvious:
initial #3ns $finish;

Verifying single port RAM using Verilog

I've written a Verilog testbench for a single PORT SRAM with write operation at address i, and reading it successively at i-1 address. something like this below
task write_read;
integer i;
begin
for (i=1,i<=20,i=i+1) begin
write_mode(i,$urandom); // i= address, $urandom=data
read_mode(i-1); //i-1 address
end
end
endtask
PS: write_mode, read_mode are tasks that set wen, cs, and some scan mode pins along with delays.
I'm seeing correct read and write operations in the Verdi waveform visually. But, I want to verify data at the address i being written is the same as the data at the address i being read from the log files. If they don't match, it should display an ERROR.
I'm not sure how to implement this in the code. When I've hundreds of compilers, I can't go inside all the compiler paths, open their waveform files and check read, write operations manually.
I tried to store $urandom data for a particular address location, but it gets overwritten with each iterative cycle. I can use a function to return the $urandom value, but my environment contains delays, so I can't use functions.
In a nutshell, I'm looking for Verilog code help on memory verification without dumping waveforms.
can someone please help? please, Let me know if more details are needed
Thanks

Using the Kristen framework you get some test bench code generated which is available for reuse. Below is one file from that test bench called test_support.vh. The file contains functions for displaying errors and counting errors. I would recommend that you use === or !== when comparing memory locations as undefined signals can match inadvertently. At the end of the test, you call the display_test_final_status and that will create an overal test report for you in the log file. After your full regression completes you may now run a grep on the log files looking for ERROR and it will show anything that failed in a consistent mannor.
I have a ovearching Perl regression script that runs all of my tests and does the grep automagically afterwords and sends a success or fail email.
On a per test basis, you will need to set some defines to indicate the test name, and whether you want errors to halt the test immediately.
Good luck.
// -----------------------------------------------------------
// test_support.vh
// Generated file specifies which numerical test cases to run.
// Kristen Software License - Version 1.0 - January 1st, 2019
//
// Permission is hereby granted, free of charge, to any person or organization
// obtaining a copy of the software and accompanying documentation covered by
// this license (the "Software") to use, reproduce, display, distribute,
// execute, and transmit the Software, and to prepare derivative works of the
// Software, and to permit third-parties to whom the Software is furnished to
// do so, all subject to the following:
//
// The copyright notices in the Software and this entire statement, including
// the above license grant, this restriction and the following disclaimer,
// must be included in all copies of the Software, in whole or in part, and
// all derivative works of the Software, unless such copies or derivative
// works are solely in the form of machine-executable object code generated by
// a source language processor.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
// SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
// FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS IN THE SOFTWARE.
// GENERATED FILE - DO NOT MODIFY THIS FILE MANUALLY.
// -----------------------------------------------------------
logic[63:0] error_count = 0;
logic[63:0] lcl_error_count = 0;
logic bool_quick_mode = 0;
logic[511:0] support_test_passes;
logic[15:0] support_test_fails [0:511];
logic[511:0] support_test_was_run;
task test_init;
error_count = 0;
lcl_error_count = 0;
endtask
task display_test_begin_status;
begin
`ifdef QUICK_MODE
bool_quick_mode = 1;
`endif
$display("=========================================================================");
$display("| Test: %s QUICK_MODE = %s", `TEST_NAME_STR, bool_quick_mode ? "ON" : "OFF");
$display("| VERBOSE = %0d", `VERBOSE);
$display("=========================================================================");
end
endtask
task display_test_start;
input[31:0] test_id;
input string test_description;
begin
lcl_error_count = 0;
$display("=========================================================================");
$display("%0t: Test %0d : %s.", $time, test_id, test_description);
$display("=========================================================================");
end
endtask
task display_test_end;
input[31:0] test_id;
begin
$display("=========================================================================");
$display("%0t: Test %0d Complete with %0d ERRORS.", $time, test_id, lcl_error_count);
$display("=========================================================================");
support_test_was_run[test_id] = 1'b1;
if (lcl_error_count == 0)
support_test_passes[test_id] = 1'b1;
else
support_test_fails[test_id] = lcl_error_count;
end
endtask
task display_error_inc;
input string error_description;
begin
error_count++;
lcl_error_count++;
//$display("=========================================================================");
$display("%0t: ERROR: %s : error_count: %0d",$time, error_description, error_count );
//$display("=========================================================================");
`ifdef TEST_STOP_ON_ERROR
if (error_count >= `TEST_STOP_ON_ERROR_LIMIT) begin
$display("%0t, Stopping on error count = %d, %m", $time, error_count);
$finish();
end
`endif
end
endtask
task display_test_final_status;
//input string testname;
begin
$display("=========================================================================");
$display("%0t: Test %s %s with %0d ERRORS",$time, `TEST_NAME_STR, error_count > 0 ? "FAILS" : "PASSES",error_count);
$display("=========================================================================");
if (error_count !== 'h0)
begin
$display("Test failures:");
for (int err_fail_cnt = 0;err_fail_cnt < 512; err_fail_cnt = err_fail_cnt + 1)
begin
if (support_test_was_run[err_fail_cnt] == 1'b1 && support_test_passes[err_fail_cnt] != 1'b1)
begin
$display("Test %d, fails with %d errors", err_fail_cnt, support_test_fails[err_fail_cnt]);
end
end
end
end
endtask
task display_no_test_found;
input[31:0] test_id;
input string test_description;
begin
lcl_error_count = 0;
$display("=========================================================================");
$display("%0t: Test %0d : %s NOT FOUND SKIPPING.", $time, test_id, test_description);
$display("=========================================================================");
end
endtask
From the Kristen generated test bench, when a RAM unit is detected, this is the test that is run. I would treat this as an algorithmic solution, as you would have to generate several vector strings and additional functions that convert those strings to addresses.
xreg_v1_0_0_write_mpi_test is a task that writes to the RAM and xreg_v1_0_0_read_mpi_test is a task that reads from the RAM. There is a compare task xreg_v1_0_0_register_compare_with_error which obviously generates an error. I'll show those below as well.
// This is a casex snippet from the RAM test
7'd2: begin
// xreg_v1_0_0_mem_ack test
$display("%0t, xreg_v1_0_0_mem_ack test", $time);
oo_limit = get_entries_by_index(tc_index) > 256 ? 256 : get_entries_by_index(tc_index); //Max number of entries for this test
mm_step = get_entry_offset(tc_index);
nn_limit = perfect_by_datawidth(get_size_by_index(get_address_by_index(tc_index)),DATAPATH_WIDTH)/8; // Number of bytes for this entry
nn_step = DATAPATH_WIDTH == 8 ? 1 : DATAPATH_WIDTH == 16 ? 2 : DATAPATH_WIDTH == 32 ? 4 : 8;
for(oo = 0; oo < oo_limit; oo = oo + 1) // Move through each entry we will be testing
begin
for (nn = 0; nn < nn_limit; nn = nn + nn_step) // Walk through each access of a given entry
begin
test_address = get_address_by_index(tc_index) + (oo*mm_step) + nn;
expected = oo[7:0] ^ nn[7:0] ^ mm_step[7:0] ^ nn_step[7:0];
expected = {expected[6:0],expected[7],expected[2:0],expected[7:3],expected[5],expected[3],expected[1],expected[7],expected[0],expected[2],expected[6],expected[4],expected[7:0]};
xreg_v1_0_0_write_mpi_test(aclk, test_address, expected);
end
end
for(oo = 0; oo < oo_limit; oo = oo + 1) // Move through each entry we will be testing
begin
for (nn = 0; nn < nn_limit; nn = nn + nn_step) // Walk through each access of a given entry
begin
test_address = get_address_by_index(tc_index) + (oo*mm_step) + nn;
mask = get_entry_mask_by_index(tc_index);
expected = oo[7:0] ^ nn[7:0] ^ mm_step[7:0] ^ nn_step[7:0];
expected = {expected[6:0],expected[7],expected[2:0],expected[7:3],expected[5],expected[3],expected[1],expected[7],expected[0],expected[2],expected[6],expected[4],expected[7:0]} & (get_entry_mask_by_index(tc_index) >> (nn*8));
xreg_v1_0_0_read_mpi_test(aclk, test_address, result);
xreg_v1_0_0_register_compare_with_error(test_address,expected,result);
end
end
end
And here are some of the referenced tasks that show writes, reads and compares as implemented. Some of this is System Verilog I think, so you will have to treat it as algorithmic.
task automatic xreg_v1_0_0_read_mpi_test;
ref logic clock;
input [`TEST_ADDR_WIDTH-1:0] address;
output [`TEST_DATA_WIDTH-1:0] result;
begin
$display("ENTER >>> xreg_v1_0_0_read_mpi_test");
repeat(1) #(posedge clock);
`TEST_MPI_ADDR = address;
`TEST_MPI_RD_REQ = 1'b1;
`TEST_MPI_ENABLE = 1'b1;
while (!`TEST_MPI_ACK) repeat(1) #(posedge clock);
`TEST_MPI_ADDR = 'h0;
`TEST_MPI_RD_REQ = 1'b0;
`TEST_MPI_ENABLE = 1'b0;
result = `TEST_MPI_RD_DATA;
$display("WAITING ACK <<< xreg_v1_0_0_read_mpi_test");
while (`TEST_MPI_ACK) repeat(1) #(posedge clock);
$display("EXIT <<< xreg_v1_0_0_read_mpi_test");
end
endtask
task automatic xreg_v1_0_0_write_mpi_test;
ref logic clock;
input [`TEST_ADDR_WIDTH-1:0] address;
input [`TEST_DATA_WIDTH-1:0] data;
begin
$display("ENTER >>> xreg_v1_0_0_write_mpi_test");
repeat(1) #(posedge clock);
`TEST_MPI_ADDR = address;
`TEST_MPI_WR_DATA = data;
`TEST_MPI_WR_REQ = 1'b1;
`TEST_MPI_ENABLE = 1'b1;
while (!`TEST_MPI_ACK) repeat(1) #(posedge clock);
`TEST_MPI_ADDR = 'h0;
`TEST_MPI_WR_DATA = 'h0;
`TEST_MPI_WR_REQ = 1'b0;
`TEST_MPI_ENABLE = 1'b0;
$display("WAITING ACK <<< xreg_v1_0_0_write_mpi_test");
while (`TEST_MPI_ACK) repeat(1) #(posedge clock);
$display("EXIT <<< xreg_v1_0_0_write_mpi_test");
end
endtask
task automatic xreg_v1_0_0_register_compare_with_error;
input [`TEST_ADDR_WIDTH-1:0] address;
input [`TEST_DATA_WIDTH-1:0] expected;
input [`TEST_DATA_WIDTH-1:0] result;
begin
$display("ENTER >>> xreg_v1_0_0_register_compare_with_error");
if (expected !== result)
begin
$display("%0t, Address = 0x%X", $time, address);
$display("%0t, Expected 0x%X", $time, expected);
$display("%0t, Read 0x%X", $time, result);
display_error_inc("xreg_v1_0_0_rdconst_test: Mismatch between read and expected data.");
end
$display("EXIT <<< xreg_v1_0_0_register_compare_with_error");
end
endtask

You would want to first store the write data in a queue. Then when reading the SRAM pass the address to the queue and compare the queue_data_out and the sram_data_out. print ERROR if the comparison fails.
See pseudo code below.
task write_read;
integer i;
bit write_data_queue [$];
begin
for (i=1,i<=20,i=i+1) begin
write_mode(i,$urandom); // i= address, $urandom=data
write_data_queue.push_front($urandom);
read_mode(i-1); //i-1 address
queue_data = write_data_queue.pop_front();
if(queue_data != read_mode(i-1))
$display("ERROR'\n")
end
end
endtask

dart: changing list[i] changes list[i-1] too

I have a List of the type Model. when I loop all its elements and loop the next one except for the last one, then change the last one manually, the one before changes.
here is a little code to reproduce the problem (also you can run it directly in dartpad from here)
void main() {
List<Model> s = [];
for (int i = 0; i < 5; i++) {
s.add(Model(i));
}
for (int i = 0; i < s.length - 1; i++) {
s[i] = s[i + 1];
}
print(s);
s[s.length-1].x = 100;
print(s);
}
class Model {
int x;
Model(this.x);
#override
String toString() => 'x: ' + this.x.toString();
}
notice that this problem does not happen when you comment out the for loop or the manual change, or instead of changing the last one's property, you reassign a new value to it, like s[s.length - 1] = Model(100);. seems like dart for some reason is re-running the loop.

When you run the second for loop, you assign the i + 1th Model to the ith position in the list:
// initialise list
for (int i = 0; i < s.length; i++) {
s[i] = s[i + 1]
}
If you unwrap the loop, it looks roughly like this:
s[0] = s[1];
s[1] = s[2];
s[2] = s[3];
s[3] = s[4];
Notice that this leaves s[4] unchanged, but also assigns it to s[3].
In Dart, variables contain references to objects. This means that when your list runs s[3] = s[4];, both s[3] and s[4] point to the same object.
Now, if you modify s[4] you, are actually modifying the objects that s[4] refers to, which happens to also be the object that s[3] refers to, so they both appear to change in your list.

Concatenate block of memory into a wire array?

In my project I have something like this:
reg [15:0] mem [3:0];
wire [63:0] data;
I know I can concatenate the mem into data like this:
assign data = {mem[3], mem[2], mem[1], mem[0]};
but it becomes some bad work when the memory grows big:
reg [3:0] mem [255:0];
wire [1023:0] data;
I'm afraid writing something like this isn't going to be a good idea, even I can write some other Python or Ruby script to generate such a line.
assign data = {mem[255], ..........., mem[0]};
summon_cthulhu();
Is there any better approach to do this?
Note: This is not an XY problem - it's the exact problem that I want to solve.

Use a generate-for loop
genvar ii;
for (ii=0;ii<256;ii=ii+1)
assign data[ii*16+:16] = mem[ii];

Here is one way to do it.
parameter MEM_WIDTH = 4;
parameter MEM_DEPTH = 256;
localparam DATA_SIZE = (MEM_WIDTH * MEM_DEPTH);
reg[MEM_WIDTH-1:0]mem[MEM_DEPTH-1:0];
reg[DATA_SIZE-1:0]data;
always#(*)
begin
for(i=0; i<MEM_DEPTH; i=i+1)
begin
data[i*MEM_WIDTH +: MEM_WIDTH] = mem[i];
end
end

Fast implementation of BWT in Lua

local function fShallowCopy(tData)
local tOutput = {}
for k,v in ipairs(tData) do
tOutput[k] = v
end
return tOutput
end
local function fLexTblSort(tA,tB) --sorter for tables
for i=1,#tA do
if tA[i]~=tB[i] then
return tA[i]<tB[i]
end
end
return false
end
function fBWT(tData)
--setup--
local iSize = #tData
local tSolution = {}
local tSolved = {}
--key table--
for n=1,iSize do
tData[iSize] = fRemove(tData,1)
tSolution[n] = fShallowCopy(tData)
end
table.sort(tSolution,fLexTblSort)
--encode output--
for i=1,iSize do
tSolved[i] = tSolution[i][iSize]
end
--finalize--
for i=1,iSize do
if fIsEqual(tSolution[i],tData) then
return i,tSolved
end
end
return false
end
Above is my current code for achieving BWT encoding in Lua. The issue is because of the size of the tables and lengths of loops it takes a long time to run. For a 1000 character input the average encoding time is about 1.15 seconds. Does anyone have suggestions for making a faster BWT encoding function?
the biggest slowdowns appear to be in fLexTblSort and fShallowCopy. I have included both above the BWT function as well.

If I see right, your algorithm has complexity O(n^2 log n), if the sort is quicksort. The comparator function fLexTblSort takes O(n) itself for each pair of values you compare.
As I checked with my implementation from few years back, I see possible space to improve. You create all the possible rotations of the tData, which takes also a lot of time. I used only single data block and I stored only starting positions of particular rotations. You also use a lot of loops which can shrink into less.
Mine implementation was in C, but the concept can be used also in Lua. The idea in some hybrid pseudocode between your Lua and C.
function fBWT(tData)
local n = #tData
local tSolution = {}
for(i = 0; i < n; i++)
tSolution[i] = i;
--table.sort(tSolution, fLexTblSort)
quicksort(tData, n, tSolution, 0, n)
for(i = 0; i < n; i++){
tSolved[i] = tData[(tSolution[i]+n-1)%n];
if( tSolution[i] == 0 )
I = i;
}
return I, tSolved
end
You will also need your own sort function, because the standard does not offer enough flexibility for this magic. Quicksort is a good idea (you might avoid some of the arguments, but I pasted just the C version I was using):
void swap(int array[], int left, int right){
int tmp = array[right];
array[right] = array[left];
array[left] = tmp;
}
void quicksort(uint8_t data[], int length, int array[], int left, int right){
if(left < right){
int boundary = left;
for(int i = left + 1; i < right; i++){
if( offset_compare(data, length, array, i, left) < 0 ){
swap(array, i, ++boundary);
}
}
swap(array, left, boundary);
quicksort(data, length, array, left, boundary);
quicksort(data, length, array, boundary + 1, right);
}
}
The last step is your own comparator function (similar to your original, but working on the rotations, again in C):
/**
* compare one string (fixed length) with different rotations.
*/
int offset_compare(uint8_t *data, int length, int *array, int first, int second){
int res;
for(int i = 0; i < length; i++){
res = data[(array[first]+i)%length] - data[(array[second]+i)%length];
if( res != 0 ){
return res;
}
}
return 0;
}
This is the basic idea I came up with few years ago and which worked for me. Let me know if there is something not clear or some mistake.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Resources overused - image-processing

Related

Task does not update testbench sclk

Verifying single port RAM using Verilog

dart: changing list[i] changes list[i-1] too

Concatenate block of memory into a wire array?

Fast implementation of BWT in Lua

Categories

Resources