How to loop over array in Z3Py - z3

As part of a reverse engineering exercise, I'm trying to write a Z3 solver to find a username and password that satisfy the program below. This is especially tough because the z3py tutorial that everyone refers to (rise4fun) is down.
#include <iostream>
#include <string>
using namespace std;
int main() {
string name, pass;
cout << "Name: ";
cin >> name;
cout << "Pass: ";
cin >> pass;
int sum = 0;
for (size_t i = 0; i < name.size(); i++) {
char c = name[i];
if (c < 'A') {
cout << "Lose: char is less than A" << endl;
return 1;
}
if (c > 'Z') {
sum += c - 32;
} else {
sum += c;
}
}
int r1 = 0x5678 ^ sum;
int r2 = 0;
for (size_t i = 0; i < pass.size(); i++) {
char c = pass[i];
c -= 48;
r2 *= 10;
r2 += c;
}
r2 ^= 0x1234;
cout << "r1: " << r1 << endl;
cout << "r2: " << r2 << endl;
if (r1 == r2) {
cout << "Win" << endl;
} else {
cout << "Lose: r1 and r2 don't match" << endl;
}
}
I got that code from the assembly of a binary, and while it may be wrong I want to focus on writing the solver. I'm starting with the first part, just calculating r1, and this is what I have:
from z3 import *
s = Solver()
sum = Int('sum')
name = Array('name', IntSort(), IntSort())
for c in name:
s.add(c < 65)
if c > 90:
sum += c - 32
else:
sum += c
r1 = Xor(sum, 0x5678)
print s.check()
print s.model()
All I'm asserting is that there are no letters less than 'A' in the array, so I expect to get back an array of any size that has numbers greater than 65.
Obviously this is completely wrong, mainly because it infinite loops. Also, I'm not sure I'm calculating sum correctly, because I don't know if it's initialized to 0. Could someone help figure out how to get this first loop working?
EDIT: I was able to get a z3 script that is close to the C++ code shown above:
from z3 import *
s = Solver()
sum = 0
name = Array('name', BitVecSort(32), BitVecSort(32))
i = Int('i')
for i in xrange(0, 1):
s.add(name[i] >= 65)
s.add(name[i] < 127)
if name[i] > 90:
sum += name[i] - 32
else:
sum += name[i]
r1 = sum ^ 0x5678
passwd = Array('passwd', BitVecSort(32), BitVecSort(32))
r2 = 0
for i in xrange(0, 5):
s.add(passwd[i] < 127)
s.add(passwd[i] >= 48)
c = passwd[i] - 48
r2 *= 10
r2 += c
r2 ^= 0x1234
s.add(r1 == r2)
print s.check()
print s.model()
This code was able to give me a correct username and password. However, I hardcoded the lengths of one for the username and five for the password. How would I change the script so I wouldn't have to hard code the lengths? And how would I generate a different solution each time I run the program?

Arrays in Z3 do not necessarily have any bounds. In this case the index-sort is Int, which means unbounded integers (not machine integers). Consequently, for c in name will run forever because it enumerates name[0], name[1], name[2], ...
It seems that you actually have a bound in the original program (name.size()), so it would suffice to enumerate up to that limit. Otherwise you might need a quantifier, e.g., \forall x of Int sort . name[x] < 65. This comes with all the warnings about quantifiers, of course (see e.g., the Z3 Guide)

Suppose the length is to be determined. Here is what I think you could do:
length = Int('length')
x = Int('x')
s.add(ForAll(x,Implies(And(x>=0,x<length),And(passwd[x] < 127,passwd[x] >=48))))

Related

undefined reference to `dgemm_' in gfortran in windows subsystem ubuntu

I have the following Fortran code from https://software.intel.com/content/www/us/en/develop/documentation/mkl-tutorial-fortran/top/multiplying-matrices-using-dgemm.html
I am trying to use gfortran complile it (named as dgemm.f90)
! Fortran source code is found in dgemm_example.f
PROGRAM MAIN
IMPLICIT NONE
DOUBLE PRECISION ALPHA, BETA
INTEGER M, K, N, I, J
PARAMETER (M=2000, K=200, N=1000)
DOUBLE PRECISION A(M,K), B(K,N), C(M,N)
PRINT *, "This example computes real matrix C=alpha*A*B+beta*C"
PRINT *, "using Intel(R) MKL function dgemm, where A, B, and C"
PRINT *, "are matrices and alpha and beta are double precision "
PRINT *, "scalars"
PRINT *, ""
PRINT *, "Initializing data for matrix multiplication C=A*B for "
PRINT 10, " matrix A(",M," x",K, ") and matrix B(", K," x", N, ")"
10 FORMAT(a,I5,a,I5,a,I5,a,I5,a)
PRINT *, ""
ALPHA = 1.0
BETA = 0.0
PRINT *, "Intializing matrix data"
PRINT *, ""
DO I = 1, M
DO J = 1, K
A(I,J) = (I-1) * K + J
END DO
END DO
DO I = 1, K
DO J = 1, N
B(I,J) = -((I-1) * N + J)
END DO
END DO
DO I = 1, M
DO J = 1, N
C(I,J) = 0.0
END DO
END DO
PRINT *, "Computing matrix product using Intel(R) MKL DGEMM "
PRINT *, "subroutine"
CALL DGEMM('N','N',M,N,K,ALPHA,A,M,B,K,BETA,C,M)
PRINT *, "Computations completed."
PRINT *, ""
PRINT *, "Top left corner of matrix A:"
PRINT 20, ((A(I,J), J = 1,MIN(K,6)), I = 1,MIN(M,6))
PRINT *, ""
PRINT *, "Top left corner of matrix B:"
PRINT 20, ((B(I,J),J = 1,MIN(N,6)), I = 1,MIN(K,6))
PRINT *, ""
20 FORMAT(6(F12.0,1x))
PRINT *, "Top left corner of matrix C:"
PRINT 30, ((C(I,J), J = 1,MIN(N,6)), I = 1,MIN(M,6))
PRINT *, ""
30 FORMAT(6(ES12.4,1x))
PRINT *, "Example completed."
STOP
END
By gfortran -lblas -llapack dgemm.f90, I got
/tmp/ccUtHQz1.o: In function `MAIN__':
dgemm.f90:(.text+0x794): undefined reference to `dgemm_'
collect2: error: ld returned 1 exit status
I searched that this type of question has been asked time to time, but I haven't found a solution for my case :(
I tried to use python load blas, based on https://software.intel.com/content/www/us/en/develop/articles/using-intel-mkl-in-your-python-programs.html
from ctypes import *
mkl = cdll.LoadLibrary("./anaconda3/lib/libmkl_rt.so")
dgemm = mkl.cblas_dgemm
def print_mat(mat, m, n):
for i in xrange(0,m):
print " ",
for j in xrange(0,n):
print mat[i*n+j],
print
Order = 101 # 101 for row-major, 102 for column major data structures
TransA = 111 # 111 for no transpose, 112 for transpose, and 113 for conjugate transpose
TransB = 111
m = 2
n = 4
k = 3
lda = k
ldb = n
ldc = n
alpha = 1.0
beta = -1.0
amat = c_double * 6
bmat = c_double * 12
cmat = c_double * 8
a = amat(1,2,3, 4,5,6)
b = bmat(0,1,0,1, 1,0,0,1, 1,0,1,0)
c = cmat(5,1,3,3, 11,4,6,9)
print "\nMatrix A ="
print_mat(a,2,3)
print "\nMatrix B ="
print_mat(b,3,4)
print "\nMatrix C ="
print_mat(c,2,4)
print "\nCompute", alpha, "* A * B + ", beta, "* C"
dgemm( c_int(Order), c_int(TransA), c_int(TransB), c_int(m), c_int(n), c_int(k), c_double(alpha), byref(a), c_int(lda), byref(b), c_int(ldb), c_double(beta), byref(c), c_int(ldc))
print_mat(c,2,4)
print
The above code works. Hence, the question may be related to use mkl with gfortran?
You should follow Intel's website to set the compiler flags for gfortran + MKL. Otherwise your will be linking with something else.
https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl/link-line-advisor.html

implications of using _mm_shuffle_ps on integer vector

SSE intrinsics includes _mm_shuffle_ps xmm1 xmm2 immx which allows one to pick 2 elements from xmm1 concatenated with 2 elements from xmm2. However this is for floats, (implied by the _ps , packed single). However if you cast your packed integers __m128i, then you can use _mm_shuffle_ps as well:
#include <iostream>
#include <immintrin.h>
#include <sstream>
using namespace std;
template <typename T>
std::string __m128i_toString(const __m128i var) {
std::stringstream sstr;
const T* values = (const T*) &var;
if (sizeof(T) == 1) {
for (unsigned int i = 0; i < sizeof(__m128i); i++) {
sstr << (int) values[i] << " ";
}
} else {
for (unsigned int i = 0; i < sizeof(__m128i) / sizeof(T); i++) {
sstr << values[i] << " ";
}
}
return sstr.str();
}
int main(){
cout << "Starting SSE test" << endl;
cout << "integer shuffle" << endl;
int A[] = {1, -2147483648, 3, 5};
int B[] = {4, 6, 7, 8};
__m128i pC;
__m128i* pA = (__m128i*) A;
__m128i* pB = (__m128i*) B;
*pA = (__m128i)_mm_shuffle_ps((__m128)*pA, (__m128)*pB, _MM_SHUFFLE(3, 2, 1 ,0));
pC = _mm_add_epi32(*pA,*pB);
cout << "A[0] = " << A[0] << endl;
cout << "A[1] = " << A[1] << endl;
cout << "A[2] = " << A[2] << endl;
cout << "A[3] = " << A[3] << endl;
cout << "B[0] = " << B[0] << endl;
cout << "B[1] = " << B[1] << endl;
cout << "B[2] = " << B[2] << endl;
cout << "B[3] = " << B[3] << endl;
cout << "pA = " << __m128i_toString<int>(*pA) << endl;
cout << "pC = " << __m128i_toString<int>(pC) << endl;
}
Snippet of relevant corresponding assembly (mac osx, macports gcc 4.8, -march=native on an ivybridge CPU):
vshufps $228, 16(%rsp), %xmm1, %xmm0
vpaddd 16(%rsp), %xmm0, %xmm2
vmovdqa %xmm0, 32(%rsp)
vmovaps %xmm0, (%rsp)
vmovdqa %xmm2, 16(%rsp)
call __ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
....
Thus it seemingly works fine on integers, which I expected as the registers are agnostic to types, however there must be a reason why the docs say that this instruction is only for floats. Does someone know any downsides, or implications I have missed?
There is no equivalent to _mm_shuffle_ps for integers. To achieve the same effect in this case you can do
SSE2
*pA = _mm_shuffle_epi32(_mm_unpacklo_epi32(*pA, _mm_shuffle_epi32(*pB, 0xe)),0xd8);
SSE4.1
*pA = _mm_blend_epi16(*pA, *pB, 0xf0);
or change to the floating point domain like this
*pA = _mm_castps_si128(
_mm_shuffle_ps(_mm_castsi128_ps(*pA),
_mm_castsi128_ps(*pB), _MM_SHUFFLE(3, 2, 1 ,0)));
But changing domains may incur bypass latency delays on some CPUs. Keep in mind that according to Agner
The bypass delay is important in long dependency chains where latency is a bottleneck, but
not where it is throughput rather than latency that matters.
You have to test your code and see which method above is more efficient.
Fortunately, on most Intel/AMD CPUs, there is usually no penalty for using shufps between most integer-vector instructions. Agner says:
For example, I found no delay when mixing PADDD and SHUFPS [on Sandybridge].
Nehalem does have 2 bypass-delay latency to/from SHUFPS, but even then a single SHUFPS is often still faster than multiple other instructions. Extra instructions have latency, too, as well as costing throughput.
The reverse (integer shuffles between FP math instructions) is not as safe:
In Agner Fog's microarchitecture on page 112 in Example 8.3a, he shows that using PSHUFD (_mm_shuffle_epi32) instead of SHUFPS (_mm_shuffle_ps) when in the floating point domain causes a bypass delay of four clock cycles. In Example 8.3b he uses SHUFPS to remove the delay (which works in his example).
On Nehalem there are actually five domains. Nahalem seems to be the most effected (the bypass delays did not exist before Nahalem). On Sandy Bridge the delays are less significant. This is even more true on Haswell. In fact on Haswell Agner said he found no delays between SHUFPS or PSHUFD (see page 140).

associative arrays in awk challenging memory limits

This is related to my recent post in Awk code with associative arrays -- array doesn't seem populated, but no error and also to optimizing loop, passing parameters from external file, naming array arguments within awk
My basic problem here is simply to compute from detailed ancient archival financial market data, daily aggregates of #transactions, #shares, value, BY DATE, FIRM-ID, EXCHANGE, etc. Learnt to use associative arrays in awk for this, and was thrilled to be able to process 129+ million lines in clock time of under 11 minutes. Literally before I finished my coffee.
Became a little more ambitious, and moved from 2 array subscripts to 4, and now I am unable to process more than 6500 lines at a time.
Get error messages of the form:
K:\User Folders\KRISHNANM\PAPERS\FII_Transaction_Data>zcat
RAW_DATA\2003_1.zip | gawk -f CODE\FII_daily_aggregates_v2.awk >
OUTPUT\2003_1.txt&
gawk: CODE\FII_daily_aggregates_v2.awk:33: (FILENAME=- FNR=49300)
fatal: more_no des: nextfree: can't allocate memory (Not enough space)
On some runs the machine has told me it lacks as little as 52 KB of memory. I have what I think of a std configuration with Win-7 and 8MB RAM.
(Economist by training, not computer scientist.) I realize that going from 2 to 4 arrays makes the problem computationally much more complex for the computer, but is there something one can do to improve memory management at least a little bit. I have tried closing everything else I am doing. The error always has to do only with memory, never with disk space or anything else.
Sample INPUT:
49290,C198962542782200306,6/30/2003,433581,F5811773991200306,S5405611832200306,B5086397478200306,NESTLE INDIA LTD.,INE239A01016,6/27/2003,1,E9035083824200306,REG_DL_STLD_02,591.13,5655,3342840.15,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49291,C198962542782200306,6/30/2003,433563,F6292896459200306,S6344227311200306,B6110521493200306,GRASIM INDUSTRIES LTD.,INE047A01013,6/27/2003,1,E9035083824200306,REG_DL_STLD_02,495.33,3700,1832721,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49292,C198962542782200306,6/30/2003,433681,F6513202607200306,S1724027402200306,B6372023178200306,HDFC BANK LTD,INE040A01018,6/26/2003,1,E745964372424200306,REG_DL_STLD_02,242,2600,629200,REG_DL_INSTR_EQ,REG_DL_DLAY_D,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49293,C7885768925200306,6/30/2003,48128,F4406661052200306,S7376401565200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,3,E912851176274200306,REG_DL_STLD_04,125,44600,5575000,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49294,C7885768925200306,6/30/2003,48129,F4500260787200306,S1312094035200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,4,E912851176274200306,REG_DL_STLD_04,125,445600,55700000,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49295,C7885768925200306,6/30/2003,48130,F6425024637200306,S2872499118200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,3,E912851176274200306,REG_DL_STLD_04,125,48000,6000000,REG_DL_INSTR_EU,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
Code
BEGIN { FS = "," }
# For each array subscript variable -- DATE ($10), firm_ISIN ($9), EXCHANGE ($12), and FII_ID ($5), after checking for type = EQ, set up counts for each value, and number of unique values.
( $17~/_EQ\>/ ) { if (date[$10]++ == 0) date_list[d++] = $10;
if (isin[$9]++ == 0) isin_list[i++] = $9;
if (exch[$12]++ == 0) exch_list[e++] = $12;
if (fii[$5]++ == 0) fii_list[f++] = $5;
}
# For cash-in, buy (B), or cash-out, sell (S) count NR = no of records, SH = no of shares, RV = rupee-value.
(( $17~/_EQ\>/ ) && ( $11~/1|2|3|5|9|1[24]/ )) {{ ++BNR[$10,$9,$12,$5]} {BSH[$10,$9,$12,$5] += $15} {BRV[$10,$9,$12,$5] += $16} }
(( $17~/_EQ\>/ ) && ( $11~/4|1[13]/ )) {{ ++SNR[$10,$9,$12,$5]} {SSH[$10,$9,$12,$5] += $15} {SRV[$10,$9,$12,$5] += $16} }
END {
{ print NR, "records processed."}
{ print " " }
{ printf("%-11s\t%-13s\t%-20s\t%-19s\t%-7s\t%-7s\t%-14s\t%-14s\t%-18s\t%-18s\n", \
"DATE", "ISIN", "EXCH", "FII", "BNR", "SNR", "BSH", "SSH", "BRV", "SRV") }
{ for (u = 0; u < d; u++)
{
for (v = 0; v < i; v++)
{
for (w = 0; w < e; w++)
{
for (x = 0; x < f; x++)
#check first below for records with zeroes, don't print them
{ if (BNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]] + SNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]] > 0)
{ BR = BNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SR = SNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
BS = BSH[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
BV = BRV[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SS = SSH[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SV = SRV[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
{ printf("%-11s\t%13s\t%20s\t%19s\t%7d\t%7d\t%14d\t%14d\t%18.2f\t%18.2f\n", \
date_list[u], isin_list[v], exch_list[w], fii_list[x], BR, SR, BS, SS, BV, SV) } }
}
}
}
}
}
}
Expected output
6 records processed.
DATE ISIN EXCH FII BNR SNR BSH SSH BRV SRV
6/27/2003 INE239A01016 E9035083824200306 F5811773991200306 1 0 5655 0 3342840.15 0.00
6/27/2003 INE047A01013 E9035083824200306 F6292896459200306 1 0 3700 0 1832721.00 0.00
6/26/2003 INE040A01018 E745964372424200306 F6513202607200306 1 0 2600 0 629200.00 0.00
6/28/2003 INE585B01010 E912851176274200306 F4406661052200306 1 0 44600 0 5575000.00 0.00
6/28/2003 INE585B01010 E912851176274200306 F4500260787200306 0 1 0 445600 0.00 55700000.00
It is in this case that as the number of input records exceeds 6500, I end up having memory problems. Have about 7 million records in all.
For a 2 array subscript problem, albeit on a different data set, where 129+ million lines were processed in clock time of 11 minutes using the same GNU-AWK on the same machine, see optimizing loop, passing parameters from external file, naming array arguments within awk
Question: is it the case that awk is not very smart with memory management, but that some other more modern tools (say, SQL) would accomplish this task with the same memory resources? Or is this simply a characteristic of associative arrays, which I found magical in enabling me to avoid many passes over the data, many loops and SORT procedures, but which maybe work well up to 2 array subscripts, and then face exponential memory resource costs after that?
Afterword: the super-detailed almost-idiot-proof tutorial along with the code provided by Ed Morton in comments below makes a dramatic difference, especially his GAWK script tst.awk. He taught me about (a) using SUBSEP intelligently (b) tackling needless looping, which is crucial in this problem which tends to have very sparse arrays, with various AWK constructs. Compared to performance with my old code (only up to 6500 lines of input accepted on one machine, another couldn't even get that far), the performance of Ed Morton's tst.awk can be seen from the table below:
**filename start end min in ln out lines
2008_1 12:08:40 AM 12:27:18 AM 0:18 391438 301160
2008_2 12:27:18 AM 12:52:04 AM 0:24 402016 314177
2009_1 12:52:05 AM 1:05:15 AM 0:13 302081 238204
2009_2 1:05:15 AM 1:22:15 AM 0:17 360072 276768
2010_1 "slept" 507496 397533
2010_2 3:10:26 AM 3:10:50 AM 0:00 76200 58228
2010_3 3:10:50 AM 3:11:18 AM 0:00 80988 61725
2010_4 3:11:18 AM 3:11:47 AM 0:00 86923 65885
2010_5 3:11:47 AM 3:12:15 AM 0:00 80670 63059**
Times were obtained simply from using %time% on lines before and after tst.awk was executed, all put in a simple batch script, "min" is the clock time taken (per whatever rounding EXCEL does by default), "in ln" and "out lines" are lines of input and output, respectively. From processing the entire data that we have, from Jan 2003 to Jan 2014, we find the theoretical max number of output records = #dates*#ISINs*#Exchanges*#FIIs = 2992*2955*567*82268, while the actual number of total output lines is only 5,261,942, which is only 1.275*10^(-8) of the theoretical max -- very sparse indeed. That there was sparseness, we did guess earlier, but that the arrays could be SO sparse -- which matters a lot for memory management -- we had no way of telling till something actually completed, for a real data set. Time taken seems to increase exponentially in input size, but within limits that pose no practical difficulty. Thanks a ton, Ed.
There is no problem with associative arrays in general. In awk (except gawk for true 2D arrays) an associative array with 4 subscripts is identical to one with 2 subscripts since in reality it only has one subscript which is the concatenation of each of the pseudo-subscripts separated by SUBSEP.
Given you say I am unable to process more than 6500 lines at a time. the problem is far more likely to be in the way you wrote your code than any fundamental awk issue so if you'd like more help, post a small script with sample input and expected output that demonstrates your problem and attempted solution to see if we have suggestions on way to improve it's memory usage.
Given your posted script, I expect the problem is with those nested loops in your END section When you do:
for (i=1; i<=maxI; i++) {
for (j=1; j<=maxJ; j++) {
if ( arr[i,j] != 0 ) {
print arr[i,j]
}
}
}
you are CREATING arr[i,j] for every possible combination of i and j that didn't exist prior to the loop just by testing for arr[i,j] != 0. If you instead wrote:
for (i=1; i<=maxI; i++) {
for (j=1; j<=maxJ; j++) {
if ( (i,j) in arr ) {
print arr[i,j]
}
}
}
then the loop itself would not create new entries in arr[].
So change this block:
if (BNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]] + SNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]] > 0)
{
BR = BNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SR = SNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
BS = BSH[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
BV = BRV[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SS = SSH[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SV = SRV[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
which is probably unnecessarily turning each of BNR, SNR, BSH, BRV, SSH, and SRV into huge but highly sparse arrays, to something like this:
idx = date_list[u] SUBSEP isin_list[v] SUBSEP exch_list[w] SUBSEP fii_list[x]
BR = (idx in BNR ? BNR[idx] : 0)
SR = (idx in SNR ? SNR[idx] : 0)
if ( (BR + SR) > 0 )
{
BS = (idx in BSH ? BSH[idx] : 0)
BV = (idx in BRV ? BRV[idx] : 0)
SS = (idx in SSH ? SSH[idx] : 0)
SV = (idx in SRV ? SRV[idx] : 0)
and let us know if that helps. Also check your code for other places where you might be doing the same.
The reason you have this problem with 4 subscripts when you didn't with 2 is simply that you have 4 levels of nesting in the loops now creating much larger and more sparse arrays when when you just had 2.
Finally - you have some weird syntax in your script, some of which #MarkSetchell pointed out in a comment, and your script isn't as efficient as it could be since you're not using else statements and so testing for multiple conditions that can't possibly all be true and you're testing the same condition repeatedly, and it's not robust as you aren't anchoring your REs (e.g you test /4|1[13]/ instead of /^(4|1[13])$/ so for example your 4 would match on 14 or 41 etc. instead of just 4 on its own) so change your whole script to this:
$ cat tst.awk
BEGIN { FS = "," }
# For each array subscript variable -- DATE ($10), firm_ISIN ($9), EXCHANGE ($12), and FII_ID ($5), after checking for type = EQ, set up counts for each value, and number of unique values.
$17 ~ /_EQ\>/ {
if (!seenDate[$10]++) date_list[++d] = $10
if (!seenIsin[$9]++) isin_list[++i] = $9
if (!seenExch[$12]++) exch_list[++e] = $12
if (!seenFii[$5]++) fii_list[++f] = $5
# For cash-in, buy (B), or cash-out, sell (S) count NR = no of records, SH = no of shares, RV = rupee-value.
idx = $10 SUBSEP $9 SUBSEP $12 SUBSEP $5
if ( $11 ~ /^([12359]|1[24])$/ ) {
++BNR[idx]; BSH[idx] += $15; BRV[idx] += $16
}
else if ( $11 ~ /^(4|1[13])$/ ) {
++SNR[idx]; SSH[idx] += $15; SRV[idx] += $16
}
}
END {
print NR, "records processed."
print " "
printf "%-11s\t%-13s\t%-20s\t%-19s\t%-7s\t%-7s\t%-14s\t%-14s\t%-18s\t%-18s\n",
"DATE", "ISIN", "EXCH", "FII", "BNR", "SNR", "BSH", "SSH", "BRV", "SRV"
for (u = 1; u <= d; u++)
{
for (v = 1; v <= i; v++)
{
for (w = 1; w <= e; w++)
{
for (x = 1; x <= f; x++)
{
#check first below for records with zeroes, don't print them
idx = date_list[u] SUBSEP isin_list[v] SUBSEP exch_list[w] SUBSEP fii_list[x]
BR = (idx in BNR ? BNR[idx] : 0)
SR = (idx in SNR ? SNR[idx] : 0)
if ( (BR + SR) > 0 )
{
BS = (idx in BSH ? BSH[idx] : 0)
BV = (idx in BRV ? BRV[idx] : 0)
SS = (idx in SSH ? SSH[idx] : 0)
SV = (idx in SRV ? SRV[idx] : 0)
printf "%-11s\t%13s\t%20s\t%19s\t%7d\t%7d\t%14d\t%14d\t%18.2f\t%18.2f\n",
date_list[u], isin_list[v], exch_list[w], fii_list[x], BR, SR, BS, SS, BV, SV
}
}
}
}
}
}
I added seen in front of 4 array names just because by convention arrays testing for the pre-existence of a value are typically named seen. Also, when populating the SNR[] etc arrays I created an idx variable first instead of repeatedly using the field numbers every time for both ease of changing it in future and mostly because string concatenation is relatively slow in awk and that's whats happening when you use multiple indices in an array so best to just do the string concatenation once explicitly. And I changed your date_list[] etc arrays to start at 1 instead of zero because all awk-generated arrays, strings and field numbers start at 1. You CAN create an array manually that starts at 0 or -357 or whatever number you want but it'll save shooting yourself in the foot some day if you always start them at 1.
I expect it could be made more efficient still by restricting the nested loops to only values that could exist for the enclosing loop index combinations (e.g. not every value of u+v+w is possible so there will be times when you shouldn't bother looping on x). For example:
$ cat tst.awk
BEGIN { FS = "," }
# For each array subscript variable -- DATE ($10), firm_ISIN ($9), EXCHANGE ($12), and FII_ID ($5), after checking for type = EQ, set up counts for each value, and number of unique values.
$17 ~ /_EQ\>/ {
if (!seenDate[$10]++) date_list[++d] = $10
if (!seenIsin[$9]++) isin_list[++i] = $9
if (!seenExch[$12]++) exch_list[++e] = $12
if (!seenFii[$5]++) fii_list[++f] = $5
# For cash-in, buy (B), or cash-out, sell (S) count NR = no of records, SH = no of shares, RV = rupee-value.
idx = $10 SUBSEP $9 SUBSEP $12 SUBSEP $5
if ( $11 ~ /^([12359]|1[24])$/ ) {
seen[$10,$9]
seen[$10,$9,$12]
++BNR[idx]; BSH[idx] += $15; BRV[idx] += $16
}
else if ( $11 ~ /^(4|1[13])$/ ) {
seen[$10,$9]
seen[$10,$9,$12]
++SNR[idx]; SSH[idx] += $15; SRV[idx] += $16
}
}
END {
printf "d = %d\n", d | "cat>&2"
printf "i = %d\n", i | "cat>&2"
printf "e = %d\n", e | "cat>&2"
printf "f = %d\n", f | "cat>&2"
print NR, "records processed."
print " "
printf "%-11s\t%-13s\t%-20s\t%-19s\t%-7s\t%-7s\t%-14s\t%-14s\t%-18s\t%-18s\n",
"DATE", "ISIN", "EXCH", "FII", "BNR", "SNR", "BSH", "SSH", "BRV", "SRV"
for (u = 1; u <= d; u++)
{
date = date_list[u]
for (v = 1; v <= i; v++)
{
isin = isin_list[v]
if ( (date,isin) in seen )
{
for (w = 1; w <= e; w++)
{
exch = exch_list[w]
if ( (date,isin,exch) in seen )
{
for (x = 1; x <= f; x++)
{
fii = fii_list[x]
#check first below for records with zeroes, don't print them
idx = date SUBSEP isin SUBSEP exch SUBSEP fii
if ( (idx in BNR) || (idx in SNR) )
{
if (idx in BNR)
{
bnr = BNR[idx]
bsh = BSH[idx]
brv = BRV[idx]
}
else
{
bnr = bsh = brv = 0
}
if (idx in SNR)
{
snr = SNR[idx]
ssh = SSH[idx]
srv = SRV[idx]
}
else
{
snr = ssh = srv = 0
}
printf "%-11s\t%13s\t%20s\t%19s\t%7d\t%7d\t%14d\t%14d\t%18.2f\t%18.2f\n",
date, isin, exch, fii, bnr, snr, bsh, ssh, brv, srv
}
}
}
}
}
}
}
}

How do you convert 8-bit bytes to 6-bit characters?

I have a specific requirement to convert a stream of bytes into a character encoding that happens to be 6-bits per character.
Here's an example:
Input: 0x50 0x11 0xa0
Character Table:
010100 T
000001 A
000110 F
100000 SPACE
Output: "TAF "
Logically I can understand how this works:
Taking 0x50 0x11 0xa0 and showing as binary:
01010000 00010001 10100000
Which is "TAF ".
What's the best way to do this programmatically (pseudo code or c++). Thank you!
Well, every 3 bytes, you end up with four characters. So for one thing, you need to work out what to do if the input isn't a multiple of three bytes. (Does it have padding of some kind, like base64?)
Then I'd probably take each 3 bytes in turn. In C#, which is close enough to pseudo-code for C :)
for (int i = 0; i < array.Length; i += 3)
{
// Top 6 bits of byte i
int value1 = array[i] >> 2;
// Bottom 2 bits of byte i, top 4 bits of byte i+1
int value2 = ((array[i] & 0x3) << 4) | (array[i + 1] >> 4);
// Bottom 4 bits of byte i+1, top 2 bits of byte i+2
int value3 = ((array[i + 1] & 0xf) << 2) | (array[i + 2] >> 6);
// Bottom 6 bits of byte i+2
int value4 = array[i + 2] & 0x3f;
// Now use value1...value4, e.g. putting them into a char array.
// You'll need to decode from the 6-bit number (0-63) to the character.
}
Just in case if someone is interested - another variant that extracts 6-bit numbers from the stream as soon as they appear there. That is, results can be obtained even if less then 3 bytes are currently read. Would be useful for unpadded streams.
The code saves the state of the accumulator a in variable n which stores the number of bits left in accumulator from the previous read.
int n = 0;
unsigned char a = 0;
unsigned char b = 0;
while (read_byte(&byte)) {
// save (6-n) most significant bits of input byte to proper position
// in accumulator
a |= (b >> (n + 2)) & (077 >> n);
store_6bit(a);
a = 0;
// save remaining least significant bits of input byte to proper
// position in accumulator
a |= (b << (4 - n)) & ((077 << (4 - n)) & 077);
if (n == 4) {
store_6bit(a);
a = 0;
}
n = (n + 2) % 6;
}

How can I do mod without a mod operator?

This scripting language doesn't have a % or Mod(). I do have a Fix() that chops off the decimal part of a number. I only need positive results, so don't get too robust.
Will
// mod = a % b
c = Fix(a / b)
mod = a - b * c
do? I'm assuming you can at least divide here. All bets are off on negative numbers.
a mod n = a - (n * Fix(a/n))
For posterity, BrightScript now has a modulo operator, it looks like this:
c = a mod b
If someone arrives later, here are some more actual algorithms (with errors...read carefully)
https://eprint.iacr.org/2014/755.pdf
There are actually two main kind of reduction formulae: Barett and Montgomery. The paper from eprint repeat both in different versions (algorithms 1-3) and give an "improved" version in algorithm 4.
Overview
I give now an overview of the 4. algorithm:
1.) Compute "A*B" and Store the whole product in "C" that C and the modulus $p$ is the input for that algorithm.
2.) Compute the bit-length of $p$, say: the function "Width(p)" returns exactly that value.
3.) Split the input $C$ into N "blocks" of size "Width(p)" and store each in G. Start in G[0] = lsb(p) and end in G[N-1] = msb(p). (The description is really faulty of the paper)
4.) Start the while loop:
Set N=N-1 (to reach the last element)
precompute $b:=2^{Width(p)} \bmod p$
while N>0 do:
T = G[N]
for(i=0; i<Width(p); i++) do: //Note: that counter doesn't matter, it limits the loop)
T = T << 1 //leftshift by 1 bit
while is_set( bit( T, Width(p) ) ) do // (N+1)-th bit of T is 1
unset( bit( T, Width(p) ) ) // unset the (N+1)-th bit of T (==0)
T += b
endwhile
endfor
G[N-1] += T
while is_set( bit( G[N-1], Width(p) ) ) do
unset( bit( G[N-1], Width(p) ) )
G[N-1] += b
endwhile
N -= 1
endwhile
That does alot. Not we only need to recursivly reduce G[0]:
while G[0] > p do
G[0] -= p
endwhile
return G[0]// = C mod p
The other three algorithms are well defined, but this lacks some information or present it really wrong. But it works for any size ;)
What language is it?
A basic algorithm might be:
hold the modulo in a variable (modulo);
hold the target number in a variable (target);
initialize modulus variable;
while (target > 0) {
if (target > modulo) {
target -= modulo;
}
else if(target < modulo) {
modulus = target;
break;
}
}
This may not work for you performance-wise, but:
while (num >= mod_limit)
num = num - mod_limit
In javascript:
function modulo(num1, num2) {
if (num2 === 0 || isNaN(num1) || isNaN(num2)) {
return NaN;
}
if (num1 === 0) {
return 0;
}
var remainderIsPositive = num1 >= 0;
num1 = Math.abs(num1);
num2 = Math.abs(num2);
while (num1 >= num2) {
num1 -= num2
}
return remainderIsPositive ? num1 : 0 - num1;
}

Resources