Mix two variables with different lengths in a third variable in Powershell - powershell-2.0

How can I merge two variables of different lengths into a third variable?
$var1 has 48 entries and $var2 has 16 entries. $var3 should contain three lines from $var1 followed by one line from $var2, repeated, so that every fourth line comes from $var2.
The lengths of the two variables can vary, but the length of $var1 is always divisible by 3.
$i = 0 ; $var3 = $var1 | % { "$_ $($var2[$i])"; $i++ }
This doesn't work, because it only handles variables of the same length.
Example:
$Var1 (48 entries)
Name1
Location1
Country1
Name2
Location2
Country2
.
.
Name16
Location16
Country16
$Var2 (16 entries)
Date1
Date2
.
.
Date16
$Var3 (should have 64 entries)
Name1
Location1
Country1
Date1
.
.
Name16
Location16
Country16
Date16

I'm assuming $var1 and $var2 are arrays, and each "entry" is an element.
If I had to use the method you're using to store variables, I'd do it like this:
$var1 = @('Name1','Location1','Country1','Name2','Location2','Country2','Name3','Location3',
'Country3','Name4','Location4','Country4','Name5','Location5','Country5','Name6','Location6',
'Country6','Name7','Location7','Country7','Name8','Location8','Country8','Name9','Location9',
'Country9','Name10','Location10','Country10','Name11','Location11','Country11','Name12',
'Location12','Country12','Name13','Location13','Country13','Name14','Location14','Country14',
'Name15','Location15','Country15','Name16','Location16','Country16');
$var2 = @('Date1','Date2','Date3','Date4','Date5','Date6','Date7','Date8','Date9','Date10',
'Date11','Date12','Date13','Date14','Date15','Date16');
$var3 = @();
# Each pass copies three entries from $var1, then one entry from $var2
for ($i = 0; $i -lt $var2.Count; $i++) {
    $var3 += $var1[$i * 3];
    $var3 += $var1[($i * 3) + 1];
    $var3 += $var1[($i * 3) + 2];
    $var3 += $var2[$i];
}
In reality, I'd probably store this as an array of hashtables, or in a PSObject/PSCustomObject as tuples. Hell, I might even prefer building a DataTable to a flat array.
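For comparison, the same interleaving can be sketched in Python. This is only an illustration of the index arithmetic used in the loop above, not PowerShell; the function name `interleave` and the `group_size` parameter are made up for this sketch, and it assumes `var1` holds exactly three entries for each entry in `var2`.

```python
def interleave(var1, var2, group_size=3):
    """Return var1 with one var2 entry inserted after every group_size entries."""
    if len(var1) != group_size * len(var2):
        raise ValueError("var1 must hold exactly group_size entries per var2 entry")
    merged = []
    for i, extra in enumerate(var2):
        start = i * group_size
        merged.extend(var1[start:start + group_size])  # three entries from var1
        merged.append(extra)                           # then one entry from var2
    return merged


# Build the sample data from the question: Name1, Location1, Country1, Name2, ...
var1 = [f"{field}{n}" for n in range(1, 17) for field in ("Name", "Location", "Country")]
var2 = [f"Date{n}" for n in range(1, 17)]

var3 = interleave(var1, var2)
print(len(var3))   # 64
print(var3[:4])    # ['Name1', 'Location1', 'Country1', 'Date1']
```

The length check makes the "same number of groups" assumption explicit instead of silently producing a short or padded result.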

Related

Need DXL code to arrange attribute lines into table (converting DOORS data to LaTeX source)

I have a DXL script which parses all data in DOORS columns into a LaTeX-compatible text source file. What I can't figure out is how to re-order some data into a tabular-compatible format. The attributes in question are DXL links to a reference DOORS module, so there is one line (separated by a line feed) per link in each cell. Currently I loop through all columns for each object (row), with this code snippet (part of the full script):
for col in doorsModule do {
var_name = title( col )
if( ! main( col ) && search( regexp "Absolute Number", var_name, 0 ) == false )
{
// oss is my output stream variable
if ( length(text(col, obj) ) > 0 )
{
oss << "\\textbf{";
oss << var_name; // still the column title here
oss << "}\t"
var_name = text( col, obj );
oss << var_name;
oss << "\n\n";
c++;
}
}
}
Examples of the contents of a cell, where I have separately parsed the Column Name to bold and collected it prior to collecting the cell contents. All four lines are the contents of a single cell.
\textbf{LinkedItemName}
DISTANCE
MinSpeed
MaxSpeed
Time
\textbf{Unit}
m
km/h
km/h
minutes
\textbf{Driver1}
100
30
80
20
\textbf{Driver2}
50
20
60
10
\textbf{Driver3}
60
30
60
30
What I want to do is re-arrange the data so that I can write the source code for a table, to wit:
\textbf{LinkedItemName} & \textbf{Unit} & \textbf{Driver1} & \textbf{Driver2} & \textbf{Driver3} \\
DISTANCE & m & 100 & 50 & 60 \\
MinSpeed & km/h & 30 & 20 & 30 \\
MaxSpeed & km/h & 80 & 60 & 60 \\
Time & minutes & 20 & 10 & 30 \\
I know in advance the exact Attribute names I'm "collecting." I can't figure out how to manipulate the data returned from each cell (regex or otherwise) to create my desired final output. I'm guessing some regex code (in DXL) might be able to assign the contents of each line within a cell to a series of variables, but don't quite see how.
A combination of regex and string assembly seems to work. Here's a sample bit of code (some of which is straight from the DOORS DXL Reference Manual):
int idx = 0
Array thewords = create(1,1)
Array thelen = create(1,1)
Regexp getaline = regexp2 ".*"
// matches any character except newline
string txt1 = "line 1\nline two\nline three\n"
// 3 line string
while (!null txt1 && getaline txt1) {
int ilen = length(txt1[match 0])
print "ilen is " ilen "\n"
put(thelen, ilen, idx, 0)
putString(thewords,txt1[match 0],0,idx)
idx ++
// match 0 is whole of match
txt1 = txt1[end 0 + 2:] // move past newline
}
int jj
// initialize to simplify adding the "&"
int lenone = (int get(thelen,0,0) )
string foo = (string get(thewords, 0, 0,lenone ) )
int lenout
for (jj = 1; jj < idx; jj++) {
lenout = (int get(thelen,jj,0) )
foo = foo "&" (string get(thewords, 0, jj,lenout ) )
}
foo = foo "\\\\"
// foo is now "line 1&line two&line three\\ " (without quotes) as LaTeX wants
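The DXL above joins the lines of one cell with "&". The full rearrangement the question asks for also needs a transposition: the n-th line of every cell becomes the n-th table row. A rough Python sketch of that step follows, purely to illustrate the logic (it is not DXL). The function name `cells_to_latex_rows` is hypothetical, and the sketch assumes every cell has the same number of lines.

```python
def cells_to_latex_rows(columns):
    """columns: list of (title, cell_text) pairs, cell_text holding one value per line."""
    header = " & ".join(f"\\textbf{{{title}}}" for title, _ in columns) + " \\\\"
    value_lists = [text.splitlines() for _, text in columns]
    # zip(*...) transposes: the n-th line of every cell forms the n-th table row.
    # Note that zip stops at the shortest cell, so uneven cells would be truncated.
    rows = [" & ".join(values) + " \\\\" for values in zip(*value_lists)]
    return [header] + rows


columns = [
    ("LinkedItemName", "DISTANCE\nMinSpeed\nMaxSpeed\nTime"),
    ("Unit", "m\nkm/h\nkm/h\nminutes"),
    ("Driver1", "100\n30\n80\n20"),
    ("Driver2", "50\n20\n60\n10"),
    ("Driver3", "60\n30\n60\n30"),
]

table = cells_to_latex_rows(columns)
for line in table:
    print(line)
```

In DXL the same idea would mean collecting the split lines of each column into an array first and only then writing one output line per row index.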

How to get each individual digit of a given number in Basic?

I have a program downloaded from the internet and need to print each digit of a three-digit number. For example:
Input: 123
Expected Output:
1
2
3
I have 598
Need to Get:
5
9
8
I tried using this formula, but it fails because the division produces a decimal result:
FIRST_DIGIT = (number mod 1000) / 100
SECOND_DIGIT = (number mod 100) / 10
THIRD_DIGIT = (number mod 10)
Where number is the example above, so here is the calculation:
FIRST_DIGIT = (598 mod 1000) / 100 = 5,98 <== FAILED... I need to get 5, but my program shows 0 because of the decimal point
SECOND_DIGIT = (598 mod 100) / 10 = 9,8 <== FAILED... I need to get 9, but my program shows 0 because of the decimal point
THIRD_DIGIT = (598 mod 10) = 8 <== CORRECT... the program outputs 8, and this digit is correct.
So my question is: is there sample or more efficient code that gets each digit from a number without the decimal part? I don't want to round to the nearest number, because that will sometimes fail if the fractional part is larger than .5.
Thanks
The simplest solution is to use integer division (\) instead of floating point division (/).
If you replace each one of your examples with the backslash (\) instead of forward slash (/) they will return integer values.
FIRST_DIGIT = (598 mod 1000) \ 100 = 5
SECOND_DIGIT = (598 mod 100) \ 10 = 9
THIRD_DIGIT = (598 mod 10) = 8
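The same arithmetic can be checked quickly in Python, where `//` plays the role of BASIC's integer-division backslash and `%` plays the role of `mod`. This is only a sketch to illustrate the idea; the function name `digits_by_division` is invented here, and it assumes a non-negative whole number of at most three digits.

```python
def digits_by_division(number):
    """Split a non-negative three-digit integer into (hundreds, tens, ones)."""
    hundreds = (number % 1000) // 100   # integer division drops the fractional part
    tens = (number % 100) // 10
    ones = number % 10
    return hundreds, tens, ones


print(digits_by_division(598))   # (5, 9, 8)
print(digits_by_division(123))   # (1, 2, 3)
```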
You don't have to do any fancy integer calculations as long as you pull it apart from a string:
INPUT X
X$ = STR$(X)
FOR Z = 1 TO LEN(X$)
PRINT MID$(X$, Z, 1)
NEXT
Then, for example, you could act upon each string element:
INPUT X
X$ = STR$(X)
FOR Z = 1 TO LEN(X$)
Q = VAL(MID$(X$, Z, 1))
N = N + 1
PRINT "Digit"; N; " equals"; Q
NEXT
Additionally, you could tear apart the string character by character:
INPUT X
X$ = STR$(X)
FOR Z = 1 TO LEN(X$)
SELECT CASE MID$(X$, Z, 1)
CASE " ", ".", "+", "-", "E", "D"
' special char
CASE ELSE
Q = VAL(MID$(X$, Z, 1))
N = N + 1
PRINT "Digit"; N; " equals"; Q
END SELECT
NEXT
I'm no expert in Basic, but it looks like you have to convert the floating-point number to an integer. A quick Google search told me that you can use Int(floating_point_number) to convert a float to an integer.
So
Int((number mod 100)/ 10)
is probably what you are looking for.
And, finally, all string elements could be parsed:
INPUT X
X$ = STR$(X)
PRINT X$
FOR Z = 1 TO LEN(X$)
SELECT CASE MID$(X$, Z, 1)
CASE " "
' nul
CASE "E", "D"
Exponent = -1
CASE "."
Decimal = -1
CASE "+"
UnaryPlus = -1
CASE "-"
UnaryNegative = -1
CASE ELSE
Q = VAL(MID$(X$, Z, 1))
N = N + 1
PRINT "Digit"; N; " equals"; Q
END SELECT
NEXT
IF Exponent THEN PRINT "There was an exponent."
IF Decimal THEN PRINT "There was a decimal."
IF UnaryPlus THEN PRINT "There was a plus sign."
IF UnaryNegative THEN PRINT "There was a negative sign."
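The string-based approach above translates almost directly to other languages. Below is a small Python sketch of the same idea: convert the number to text, keep the digit characters, and note which special characters were seen. The function name `digits_from_text` and the returned dictionary layout are assumptions made for this illustration, not part of the BASIC answer.

```python
def digits_from_text(value):
    """Return the decimal digits of value plus flags for special characters seen."""
    text = str(value)
    digits = []
    flags = {"decimal": False, "negative": False, "exponent": False}
    for ch in text:
        if ch in "0123456789":
            digits.append(int(ch))
        elif ch == ".":
            flags["decimal"] = True
        elif ch == "-":
            # Caveat: a minus sign inside an exponent (e.g. 1e-07) also sets this flag.
            flags["negative"] = True
        elif ch in "eE":
            flags["exponent"] = True
    return digits, flags


print(digits_from_text(598))     # ([5, 9, 8], {'decimal': False, 'negative': False, 'exponent': False})
print(digits_from_text(-12.5))   # ([1, 2, 5], {'decimal': True, 'negative': True, 'exponent': False})
```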

associative arrays in awk challenging memory limits

This is related to my recent post "Awk code with associative arrays -- array doesn't seem populated, but no error" and also to "optimizing loop, passing parameters from external file, naming array arguments within awk".
My basic problem here is simply to compute from detailed ancient archival financial market data, daily aggregates of #transactions, #shares, value, BY DATE, FIRM-ID, EXCHANGE, etc. Learnt to use associative arrays in awk for this, and was thrilled to be able to process 129+ million lines in clock time of under 11 minutes. Literally before I finished my coffee.
Became a little more ambitious, and moved from 2 array subscripts to 4, and now I am unable to process more than 6500 lines at a time.
Get error messages of the form:
K:\User Folders\KRISHNANM\PAPERS\FII_Transaction_Data>zcat
RAW_DATA\2003_1.zip | gawk -f CODE\FII_daily_aggregates_v2.awk >
OUTPUT\2003_1.txt&
gawk: CODE\FII_daily_aggregates_v2.awk:33: (FILENAME=- FNR=49300)
fatal: more_nodes: nextfree: can't allocate memory (Not enough space)
On some runs the machine has told me it lacks as little as 52 KB of memory. I have what I think of as a standard configuration with Win-7 and 8MB RAM.
(Economist by training, not computer scientist.) I realize that going from 2 to 4 array subscripts makes the problem computationally much more complex for the computer, but is there something one can do to improve memory management at least a little bit? I have tried closing everything else I am doing. The error always has to do only with memory, never with disk space or anything else.
Sample INPUT:
49290,C198962542782200306,6/30/2003,433581,F5811773991200306,S5405611832200306,B5086397478200306,NESTLE INDIA LTD.,INE239A01016,6/27/2003,1,E9035083824200306,REG_DL_STLD_02,591.13,5655,3342840.15,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49291,C198962542782200306,6/30/2003,433563,F6292896459200306,S6344227311200306,B6110521493200306,GRASIM INDUSTRIES LTD.,INE047A01013,6/27/2003,1,E9035083824200306,REG_DL_STLD_02,495.33,3700,1832721,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49292,C198962542782200306,6/30/2003,433681,F6513202607200306,S1724027402200306,B6372023178200306,HDFC BANK LTD,INE040A01018,6/26/2003,1,E745964372424200306,REG_DL_STLD_02,242,2600,629200,REG_DL_INSTR_EQ,REG_DL_DLAY_D,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49293,C7885768925200306,6/30/2003,48128,F4406661052200306,S7376401565200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,3,E912851176274200306,REG_DL_STLD_04,125,44600,5575000,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49294,C7885768925200306,6/30/2003,48129,F4500260787200306,S1312094035200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,4,E912851176274200306,REG_DL_STLD_04,125,445600,55700000,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49295,C7885768925200306,6/30/2003,48130,F6425024637200306,S2872499118200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,3,E912851176274200306,REG_DL_STLD_04,125,48000,6000000,REG_DL_INSTR_EU,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
Code
BEGIN { FS = "," }
# For each array subscript variable -- DATE ($10), firm_ISIN ($9), EXCHANGE ($12), and FII_ID ($5), after checking for type = EQ, set up counts for each value, and number of unique values.
( $17~/_EQ\>/ ) { if (date[$10]++ == 0) date_list[d++] = $10;
if (isin[$9]++ == 0) isin_list[i++] = $9;
if (exch[$12]++ == 0) exch_list[e++] = $12;
if (fii[$5]++ == 0) fii_list[f++] = $5;
}
# For cash-in, buy (B), or cash-out, sell (S) count NR = no of records, SH = no of shares, RV = rupee-value.
(( $17~/_EQ\>/ ) && ( $11~/1|2|3|5|9|1[24]/ )) {{ ++BNR[$10,$9,$12,$5]} {BSH[$10,$9,$12,$5] += $15} {BRV[$10,$9,$12,$5] += $16} }
(( $17~/_EQ\>/ ) && ( $11~/4|1[13]/ )) {{ ++SNR[$10,$9,$12,$5]} {SSH[$10,$9,$12,$5] += $15} {SRV[$10,$9,$12,$5] += $16} }
END {
{ print NR, "records processed."}
{ print " " }
{ printf("%-11s\t%-13s\t%-20s\t%-19s\t%-7s\t%-7s\t%-14s\t%-14s\t%-18s\t%-18s\n", \
"DATE", "ISIN", "EXCH", "FII", "BNR", "SNR", "BSH", "SSH", "BRV", "SRV") }
{ for (u = 0; u < d; u++)
{
for (v = 0; v < i; v++)
{
for (w = 0; w < e; w++)
{
for (x = 0; x < f; x++)
#check first below for records with zeroes, don't print them
{ if (BNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]] + SNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]] > 0)
{ BR = BNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SR = SNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
BS = BSH[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
BV = BRV[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SS = SSH[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SV = SRV[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
{ printf("%-11s\t%13s\t%20s\t%19s\t%7d\t%7d\t%14d\t%14d\t%18.2f\t%18.2f\n", \
date_list[u], isin_list[v], exch_list[w], fii_list[x], BR, SR, BS, SS, BV, SV) } }
}
}
}
}
}
}
Expected output
6 records processed.
DATE ISIN EXCH FII BNR SNR BSH SSH BRV SRV
6/27/2003 INE239A01016 E9035083824200306 F5811773991200306 1 0 5655 0 3342840.15 0.00
6/27/2003 INE047A01013 E9035083824200306 F6292896459200306 1 0 3700 0 1832721.00 0.00
6/26/2003 INE040A01018 E745964372424200306 F6513202607200306 1 0 2600 0 629200.00 0.00
6/28/2003 INE585B01010 E912851176274200306 F4406661052200306 1 0 44600 0 5575000.00 0.00
6/28/2003 INE585B01010 E912851176274200306 F4500260787200306 0 1 0 445600 0.00 55700000.00
It is in this case that as the number of input records exceeds 6500, I end up having memory problems. Have about 7 million records in all.
For a 2 array subscript problem, albeit on a different data set, where 129+ million lines were processed in clock time of 11 minutes using the same GNU-AWK on the same machine, see "optimizing loop, passing parameters from external file, naming array arguments within awk".
Question: is it the case that awk is not very smart with memory management, but that some other more modern tools (say, SQL) would accomplish this task with the same memory resources? Or is this simply a characteristic of associative arrays, which I found magical in enabling me to avoid many passes over the data, many loops and SORT procedures, but which maybe work well up to 2 array subscripts, and then face exponential memory resource costs after that?
Afterword: the super-detailed almost-idiot-proof tutorial along with the code provided by Ed Morton in comments below makes a dramatic difference, especially his GAWK script tst.awk. He taught me about (a) using SUBSEP intelligently (b) tackling needless looping, which is crucial in this problem which tends to have very sparse arrays, with various AWK constructs. Compared to performance with my old code (only up to 6500 lines of input accepted on one machine, another couldn't even get that far), the performance of Ed Morton's tst.awk can be seen from the table below:
filename  start        end          min   in ln   out lines
2008_1    12:08:40 AM  12:27:18 AM  0:18  391438  301160
2008_2    12:27:18 AM  12:52:04 AM  0:24  402016  314177
2009_1    12:52:05 AM  1:05:15 AM   0:13  302081  238204
2009_2    1:05:15 AM   1:22:15 AM   0:17  360072  276768
2010_1    "slept"                         507496  397533
2010_2    3:10:26 AM   3:10:50 AM   0:00  76200   58228
2010_3    3:10:50 AM   3:11:18 AM   0:00  80988   61725
2010_4    3:11:18 AM   3:11:47 AM   0:00  86923   65885
2010_5    3:11:47 AM   3:12:15 AM   0:00  80670   63059
Times were obtained simply from using %time% on lines before and after tst.awk was executed, all put in a simple batch script, "min" is the clock time taken (per whatever rounding EXCEL does by default), "in ln" and "out lines" are lines of input and output, respectively. From processing the entire data that we have, from Jan 2003 to Jan 2014, we find the theoretical max number of output records = #dates*#ISINs*#Exchanges*#FIIs = 2992*2955*567*82268, while the actual number of total output lines is only 5,261,942, which is only 1.275*10^(-8) of the theoretical max -- very sparse indeed. That there was sparseness, we did guess earlier, but that the arrays could be SO sparse -- which matters a lot for memory management -- we had no way of telling till something actually completed, for a real data set. Time taken seems to increase exponentially in input size, but within limits that pose no practical difficulty. Thanks a ton, Ed.
There is no problem with associative arrays in general. In awk (except gawk for true 2D arrays) an associative array with 4 subscripts is identical to one with 2 subscripts since in reality it only has one subscript which is the concatenation of each of the pseudo-subscripts separated by SUBSEP.
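To make the point about SUBSEP concrete, here is a small Python sketch that mimics how awk stores a multi-subscript entry: the subscripts are joined into one string, so an entry with four subscripts costs one key just like an entry with two. The helper name `key` is invented for this sketch; `"\x1c"` is awk's default SUBSEP value ("\034").

```python
SUBSEP = "\x1c"  # awk's default subscript separator, "\034"


def key(*subscripts):
    """Join subscripts the way awk does for arr[a, b, ...]."""
    return SUBSEP.join(str(s) for s in subscripts)


counts = {}
counts[key("6/27/2003", "INE239A01016")] = 1                                            # 2 subscripts
counts[key("6/27/2003", "INE239A01016", "E9035083824200306", "F5811773991200306")] = 1  # 4 subscripts

# Both entries are single string keys; only the key length differs.
print(len(counts))                                   # 2
print(all(isinstance(k, str) for k in counts))       # True
print(key("a", "b") == "a" + SUBSEP + "b")           # True
```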
Given you say "I am unable to process more than 6500 lines at a time", the problem is far more likely to be in the way you wrote your code than in any fundamental awk issue. If you'd like more help, post a small script with sample input and expected output that demonstrates your problem and attempted solution, to see if we have suggestions on ways to improve its memory usage.
Given your posted script, I expect the problem is with those nested loops in your END section. When you do:
for (i=1; i<=maxI; i++) {
for (j=1; j<=maxJ; j++) {
if ( arr[i,j] != 0 ) {
print arr[i,j]
}
}
}
you are CREATING arr[i,j] for every possible combination of i and j that didn't exist prior to the loop just by testing for arr[i,j] != 0. If you instead wrote:
for (i=1; i<=maxI; i++) {
for (j=1; j<=maxJ; j++) {
if ( (i,j) in arr ) {
print arr[i,j]
}
}
}
then the loop itself would not create new entries in arr[].
So change this block:
if (BNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]] + SNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]] > 0)
{
BR = BNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SR = SNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
BS = BSH[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
BV = BRV[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SS = SSH[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SV = SRV[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
which is probably unnecessarily turning each of BNR, SNR, BSH, BRV, SSH, and SRV into huge but highly sparse arrays, to something like this:
idx = date_list[u] SUBSEP isin_list[v] SUBSEP exch_list[w] SUBSEP fii_list[x]
BR = (idx in BNR ? BNR[idx] : 0)
SR = (idx in SNR ? SNR[idx] : 0)
if ( (BR + SR) > 0 )
{
BS = (idx in BSH ? BSH[idx] : 0)
BV = (idx in BRV ? BRV[idx] : 0)
SS = (idx in SSH ? SSH[idx] : 0)
SV = (idx in SRV ? SRV[idx] : 0)
and let us know if that helps. Also check your code for other places where you might be doing the same.
The reason you have this problem with 4 subscripts when you didn't with 2 is simply that you now have 4 levels of nesting in the loops, creating much larger and more sparse arrays than when you had just 2.
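The "lookup creates the entry" effect is not unique to awk. Python's `collections.defaultdict` behaves the same way, which makes it a convenient stand-in for a quick experiment. The sketch below is an analogy, not awk; the function name `count_after_scan` and the 100 x 100 grid are arbitrary choices for illustration.

```python
from collections import defaultdict
from itertools import product


def count_after_scan(use_membership_test):
    """Scan a 100 x 100 index grid and report how many entries the array ends up with."""
    arr = defaultdict(int)   # like an awk array: reading a missing key creates it
    arr[(1, 1)] = 5
    arr[(3, 2)] = 7
    for i, j in product(range(1, 101), range(1, 101)):
        if use_membership_test:
            if (i, j) in arr:        # membership test: no entry is created
                _ = arr[(i, j)]
        else:
            if arr[(i, j)] != 0:     # value test: creates every missing entry
                _ = arr[(i, j)]
    return len(arr)


print(count_after_scan(False))   # 10000
print(count_after_scan(True))    # 2
```

With four nested loops instead of two, the number of entries created by the value test grows with the product of all four list sizes, which is where the memory goes.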
Finally - you have some weird syntax in your script, some of which @MarkSetchell pointed out in a comment. Your script isn't as efficient as it could be, since you're not using else statements and so are testing for multiple conditions that can't possibly all be true, and you're testing the same condition repeatedly. It's also not robust, as you aren't anchoring your REs (e.g. you test /4|1[13]/ instead of /^(4|1[13])$/, so for example your 4 would match on 14 or 41 etc. instead of just 4 on its own). So change your whole script to this:
$ cat tst.awk
BEGIN { FS = "," }
# For each array subscript variable -- DATE ($10), firm_ISIN ($9), EXCHANGE ($12), and FII_ID ($5), after checking for type = EQ, set up counts for each value, and number of unique values.
$17 ~ /_EQ\>/ {
if (!seenDate[$10]++) date_list[++d] = $10
if (!seenIsin[$9]++) isin_list[++i] = $9
if (!seenExch[$12]++) exch_list[++e] = $12
if (!seenFii[$5]++) fii_list[++f] = $5
# For cash-in, buy (B), or cash-out, sell (S) count NR = no of records, SH = no of shares, RV = rupee-value.
idx = $10 SUBSEP $9 SUBSEP $12 SUBSEP $5
if ( $11 ~ /^([12359]|1[24])$/ ) {
++BNR[idx]; BSH[idx] += $15; BRV[idx] += $16
}
else if ( $11 ~ /^(4|1[13])$/ ) {
++SNR[idx]; SSH[idx] += $15; SRV[idx] += $16
}
}
END {
print NR, "records processed."
print " "
printf "%-11s\t%-13s\t%-20s\t%-19s\t%-7s\t%-7s\t%-14s\t%-14s\t%-18s\t%-18s\n",
"DATE", "ISIN", "EXCH", "FII", "BNR", "SNR", "BSH", "SSH", "BRV", "SRV"
for (u = 1; u <= d; u++)
{
for (v = 1; v <= i; v++)
{
for (w = 1; w <= e; w++)
{
for (x = 1; x <= f; x++)
{
#check first below for records with zeroes, don't print them
idx = date_list[u] SUBSEP isin_list[v] SUBSEP exch_list[w] SUBSEP fii_list[x]
BR = (idx in BNR ? BNR[idx] : 0)
SR = (idx in SNR ? SNR[idx] : 0)
if ( (BR + SR) > 0 )
{
BS = (idx in BSH ? BSH[idx] : 0)
BV = (idx in BRV ? BRV[idx] : 0)
SS = (idx in SSH ? SSH[idx] : 0)
SV = (idx in SRV ? SRV[idx] : 0)
printf "%-11s\t%13s\t%20s\t%19s\t%7d\t%7d\t%14d\t%14d\t%18.2f\t%18.2f\n",
date_list[u], isin_list[v], exch_list[w], fii_list[x], BR, SR, BS, SS, BV, SV
}
}
}
}
}
}
I added seen in front of 4 array names just because, by convention, arrays that test for the pre-existence of a value are typically named seen. Also, when populating the SNR[] etc. arrays I created an idx variable first instead of repeatedly using the field numbers every time, both for ease of changing it in future and mostly because string concatenation is relatively slow in awk, and that's what's happening when you use multiple indices in an array, so it's best to just do the string concatenation once explicitly. And I changed your date_list[] etc. arrays to start at 1 instead of zero because all awk-generated arrays, strings and field numbers start at 1. You CAN create an array manually that starts at 0 or -357 or whatever number you want, but it'll save shooting yourself in the foot some day if you always start them at 1.
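The effect of anchoring the regular expressions can be checked in isolation. The Python sketch below uses the `re` module rather than awk, but the two patterns are the ones discussed above; the list of sample codes is made up for the demonstration.

```python
import re

unanchored = re.compile(r"4|1[13]")       # matches anywhere inside the field
anchored = re.compile(r"^(4|1[13])$")     # must match the whole field

codes = ["4", "11", "13", "14", "41", "113"]

loose = [c for c in codes if unanchored.search(c)]
strict = [c for c in codes if anchored.search(c)]

print(loose)    # ['4', '11', '13', '14', '41', '113']
print(strict)   # ['4', '11', '13']
```

Code 14 is a buy code in the original script, so the unanchored sell pattern would count such a record as both a buy and a sell.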
I expect it could be made more efficient still by restricting the nested loops to only values that could exist for the enclosing loop index combinations (e.g. not every value of u+v+w is possible so there will be times when you shouldn't bother looping on x). For example:
$ cat tst.awk
BEGIN { FS = "," }
# For each array subscript variable -- DATE ($10), firm_ISIN ($9), EXCHANGE ($12), and FII_ID ($5), after checking for type = EQ, set up counts for each value, and number of unique values.
$17 ~ /_EQ\>/ {
if (!seenDate[$10]++) date_list[++d] = $10
if (!seenIsin[$9]++) isin_list[++i] = $9
if (!seenExch[$12]++) exch_list[++e] = $12
if (!seenFii[$5]++) fii_list[++f] = $5
# For cash-in, buy (B), or cash-out, sell (S) count NR = no of records, SH = no of shares, RV = rupee-value.
idx = $10 SUBSEP $9 SUBSEP $12 SUBSEP $5
if ( $11 ~ /^([12359]|1[24])$/ ) {
seen[$10,$9]
seen[$10,$9,$12]
++BNR[idx]; BSH[idx] += $15; BRV[idx] += $16
}
else if ( $11 ~ /^(4|1[13])$/ ) {
seen[$10,$9]
seen[$10,$9,$12]
++SNR[idx]; SSH[idx] += $15; SRV[idx] += $16
}
}
END {
printf "d = %d\n", d | "cat>&2"
printf "i = %d\n", i | "cat>&2"
printf "e = %d\n", e | "cat>&2"
printf "f = %d\n", f | "cat>&2"
print NR, "records processed."
print " "
printf "%-11s\t%-13s\t%-20s\t%-19s\t%-7s\t%-7s\t%-14s\t%-14s\t%-18s\t%-18s\n",
"DATE", "ISIN", "EXCH", "FII", "BNR", "SNR", "BSH", "SSH", "BRV", "SRV"
for (u = 1; u <= d; u++)
{
date = date_list[u]
for (v = 1; v <= i; v++)
{
isin = isin_list[v]
if ( (date,isin) in seen )
{
for (w = 1; w <= e; w++)
{
exch = exch_list[w]
if ( (date,isin,exch) in seen )
{
for (x = 1; x <= f; x++)
{
fii = fii_list[x]
#check first below for records with zeroes, don't print them
idx = date SUBSEP isin SUBSEP exch SUBSEP fii
if ( (idx in BNR) || (idx in SNR) )
{
if (idx in BNR)
{
bnr = BNR[idx]
bsh = BSH[idx]
brv = BRV[idx]
}
else
{
bnr = bsh = brv = 0
}
if (idx in SNR)
{
snr = SNR[idx]
ssh = SSH[idx]
srv = SRV[idx]
}
else
{
snr = ssh = srv = 0
}
printf "%-11s\t%13s\t%20s\t%19s\t%7d\t%7d\t%14d\t%14d\t%18.2f\t%18.2f\n",
date, isin, exch, fii, bnr, snr, bsh, ssh, brv, srv
}
}
}
}
}
}
}
}
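Because the data is so sparse, another option is to skip the nested loops entirely and walk only the keys that were actually stored. This is a different technique from the script above (in awk it would mean looping with `for (idx in BNR)` and splitting idx on SUBSEP), shown here as a Python sketch. The names `aggregate`, `BUY_CODES` and `SELL_CODES` are invented for the illustration; the codes mirror the anchored regular expressions, and the three records are taken from the sample input.

```python
BUY_CODES = {"1", "2", "3", "5", "9", "12", "14"}   # same codes as /^([12359]|1[24])$/
SELL_CODES = {"4", "11", "13"}                      # same codes as /^(4|1[13])$/


def aggregate(records):
    """records: iterable of (date, isin, exch, fii, code, shares, value) tuples.

    Returns {(date, isin, exch, fii): [BNR, BSH, BRV, SNR, SSH, SRV]} with an
    entry only for combinations that actually occur in the input.
    """
    totals = {}
    for date, isin, exch, fii, code, shares, value in records:
        if code in BUY_CODES:
            offset = 0
        elif code in SELL_CODES:
            offset = 3
        else:
            continue
        row = totals.setdefault((date, isin, exch, fii), [0, 0, 0.0, 0, 0, 0.0])
        row[offset] += 1            # number of records
        row[offset + 1] += shares   # number of shares
        row[offset + 2] += value    # rupee value
    return totals


records = [
    ("6/27/2003", "INE239A01016", "E9035083824200306", "F5811773991200306", "1", 5655, 3342840.15),
    ("6/28/2003", "INE585B01010", "E912851176274200306", "F4406661052200306", "3", 44600, 5575000.0),
    ("6/28/2003", "INE585B01010", "E912851176274200306", "F4500260787200306", "4", 445600, 55700000.0),
]

totals = aggregate(records)
for key, row in totals.items():     # one pass over existing keys, no cross product
    print(*key, *row, sep="\t")
```

The output loop then runs once per stored combination rather than once per possible combination, at the cost of losing the date/ISIN/exchange ordering that the nested loops give for free (sorting the keys would restore an ordering).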

How to convert a Lua string to float

I am writing a simple Lua script to calculate a median from a Sorted Set (http://redis.io/commands/#sorted_set) within Redis 2.8. The script is below
local cnt = redis.call("ZCARD", KEYS[1])
if cnt > 0 then
if cnt%2 > 0 then
local mid = math.floor(cnt/2)
return redis.call("ZRANGE", KEYS[1], mid, mid)
else
local mid = math.floor(cnt/2)
local vals = redis.call("ZRANGE", KEYS[1], mid-1, mid)
return (tonumber(vals[1]) + tonumber(vals[2]))/2.0
end
else
return nil
end
The problem is the script always returns an integer, when the result should be a float. The result is wrong.
$ redis-cli zrange floats 0 100
1) "1.1"
2) "2.1"
3) "3.1"
4) "3.4975"
5) "42.63"
6) "4.1"
$ redis-cli EVAL "$(cat median.lua)" 1 floats
(integer) 3
The correct result should be (3.1 + 3.4975)/2.0 == 3.29875
From the documentation for EVAL:
Lua has a single numerical type, Lua numbers. There is no distinction between integers and floats. So we always convert Lua numbers into integer replies, removing the decimal part of the number if any. If you want to return a float from Lua you should return it as a string, exactly like Redis itself does (see for instance the ZSCORE command).
Therefore, you should update your script so you are returning the float as a string:
return tostring((tonumber(vals[1]) + tonumber(vals[2]))/2.0)
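To see why the string return matters, the median logic and the truncation can be simulated outside Redis. The Python sketch below is only a model of the behaviour described in the EVAL documentation quoted above: `int()` stands in for the conversion of a Lua number to an integer reply, and the function name `median_reply` is hypothetical. It does not talk to a Redis server.

```python
def median_reply(sorted_members):
    """Mimic the Lua script: return the median of the members as a string, or None if empty."""
    count = len(sorted_members)
    if count == 0:
        return None
    mid = count // 2
    if count % 2 == 1:
        return sorted_members[mid]            # members are already strings
    low = float(sorted_members[mid - 1])
    high = float(sorted_members[mid])
    return str((low + high) / 2.0)            # string keeps the fractional part


members = ["1.1", "2.1", "3.1", "3.4975", "42.63", "4.1"]   # order as returned by ZRANGE
result = median_reply(members)

print(result)               # the median as text
print(int(float(result)))   # 3, what an integer reply would keep
```

The caller then converts the string reply back to a float on the client side.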
Sorry, I have no explanation for this; I came across it by accident while writing a script:
function int2float(integer)
return integer + 0.0
end

How to output SAS IML values for printing

I would like to take values that are calculated in IML to be used in SAS' printing functionality %PRNTINIT. This functionality is used to provide weekly reports from a database that can get updated.
I have some legacy code that uses proc sql to declare macro-type values that are called upon later, e.g.:
Declaring variables: tot1-tot4
*Get total number of subjects in each group in macro variable;
proc sort data = avg3; by description; run;
proc sql noprint;
select _freq_
into: tot1-:tot4
from avg3;
quit;
Calling variables tot1-tot4 for printing
%print(column = 1, style = bold, just = center, lines = bottom:none);
%print("(N= &tot1)", column = 2, just = center, lines = bottom:none);
%print("(N= &tot2)", column = 3, just = center, lines = bottom:none);
%print("(N= &tot3)", column = 4, just = center, lines = bottom:none);
%print("(N= &tot4 )", column = 5, just = center, lines = bottom:none);
%print(column = 6, just = center, lines = bottom:none);
I would like to be able to call values from IML similarly, if possible.
Example data:
data test ;
input age type gender $;
cards;
1 1 m
1 1 m
1 1 m
1 1 f
1 1 f
1 2 f
2 1 m
2 1 f
2 2 m
2 2 m
2 2 m
2 2 m
2 2 m
2 2 f
2 2 f
2 2 f
;
proc freq data = test;
tables type*age / chisq norow nocol nopercent outexpect out=out1 ;
tables type*gender / chisq norow nocol nopercent outexpect out=out2 ;
run;
options missing=" ";
proc iml;
reset print;
use out2;
read all var {count} into count;
type1 = count[1:2] ;
type2 = count[3:4] ;
tab = type1 || type2 ;
cols = tab[+,] ;
rows = tab[,+] ;
tot = sum(tab) ;
perc = round(cols / tot, .01) ;
cell_perc = round(tab / (cols//cols) , .01) ;
expect = (rows * cols) / tot ;
chi_1 = sum((tab - expect)##2/expect) ;
p_chi_1 = 1-CDF('CHISQUARE',chi_1, ((ncol(tab)-1)*(nrow(tab)-1)));
print tab p_chi_1 perc cell_perc;
out_sex = tab || (. // p_chi_1);
print out_sex;
print out_sex[colname={"1","2"}
rowname={"f" "m" "p-value"}
label="Table of Type by Gender"];
call symput(t1_sum, cols[1,1]) ;
%let t2_sum = put(cols[1,2]) ;
%let t1_per = perc[1,1] ;
%let t2_per = perc[1,2] ;
%let t1_f = tab[1,1] ;
%let t1_m = tab[2,1] ;
%let t2_f = tab[1,2] ;
%let t2_m = tab[2,2] ;
%let t1_f_p = cell_perc[1,1] ;
%let t1_m_p = cell_perc[2,1] ;
%let t2_f_p = cell_perc[1,2] ;
%let t2_m_p = cell_perc[2,2] ;
%let p_val = p_chi_1 ;
***** is it possible to list output values here for use in table building ??? ;
* like: %let t1_f = tab[1,1]
%let t2_f = tab[2,1] etc... ;
quit;
So I would like to declare a print statement like the following:
%print("(N=&tab[1,1])", column = 1, just=center, lines = bottom:none);
%print("(N=&tab[1,2])", column = 2, just=center, lines = bottom:none);
etc...
Any help on this is greatly appreciated...
Update: Unable to extract declared macro values out of IML
I have been able to calculate the correct values and format the table successfully.
However, I am unable to extract the values for use in the print macro.
I have created some matrices and calculated values in IML, but when I try to declare macro variables for use later, the only thing that is returned is the literal text that I assigned to the variable.
You can see in the table what I want the numbers to be, but have thus far been unsuccessful. I have tried using %let, put, symput , and symputx without success.
call symput(t1_sum, cols[1,1]) ;
%let t2_sum = put(cols[1,2]) ;
%let t1_per = perc[1,1] ;
%let t2_per = perc[1,2] ;
%let t1_f = tab[1,1] ;
%let t1_m = tab[2,1] ;
%let t2_f = tab[1,2] ;
%let t2_m = tab[2,2] ;
%let t1_f_p = cell_perc[1,1] ;
%let t1_m_p = cell_perc[2,1] ;
%let t2_f_p = cell_perc[1,2] ;
%let t2_m_p = cell_perc[2,2] ;
Blerg...
A SAS/IML matrix must be all numeric or all character, which is why your attempt did not work. However, the SAS/IML PRINT statement has several options that enable you to label columns and rows and to apply formats. See http://blogs.sas.com/content/iml/2011/08/01/options-for-printing-a-matrix/
Together with the global SAS OPTIONS statement, I think you can get the output that you want.
1) Use the global statement
options missing=" ";
to tell SAS to print missing values as blanks.
2) Your goal is to append a column to the 2x2 TAB matrix. You can use (numeric) missing values for the rows that have no data:
out_age = tab || (. // p_chi_1);
3) You can now print this 2x3 table, and use the COLNAME= and ROWNAME= options to display row headings:
print out_age[rowname={"1","2"}
colname={"f" "m" "p-value"}
label="Table of Type by Gender"];
If you really want to use your old-style macro, you can use the SYMPUTX statement to copy values from SAS/IML into macro variables, as shown in this article: http://blogs.sas.com/content/iml/2011/10/17/does-symput-work-in-iml/
After much searching, and finding some help here on SO, I was able to piece the answer together.
The output value has to be a string
The name of the macro variable has to be in quotes
The call has to be to symputx in order to trim extra whitespace (when compared to symput)
Code to extract IML values to macro variables
call symputx('t1_sum', char(cols[1,1])) ;
call symputx('t2_sum', char(cols[1,2])) ;
call symputx('t1_per', char(perc[1,1])) ;
call symputx('t2_per', char(perc[1,2])) ;
call symputx('t1_f' , char(tab[1,1])) ;
call symputx('t1_m' , char(tab[2,1])) ;
call symputx('t2_f' , char(tab[1,2])) ;
call symputx('t2_m' , char(tab[2,2])) ;
call symputx('t1_f_p' , char(cell_perc[1,1])) ;
call symputx('t1_m_p' , char(cell_perc[2,1])) ;
call symputx('t2_f_p' , char(cell_perc[1,2])) ;
call symputx('t2_m_p' , char(cell_perc[2,2])) ;
call symputx('p_val' , char(round(p_chi_1, .001))) ;
Partial code used to build table using old SAS %PRNTINIT macro
...
%print("Female", column = 1, just = center );
%print("&t1_f (&t1_f_p)", column = 2, just = center );
%print("&t2_f (&t2_f_p)", column = 3, just = center );
%print(, column = 4, just = center );
%print(proc = newrow);
%print("Male", column = 1, just = center );
%print("&t1_m (&t1_m_p)", column = 2, just = center );
%print("&t2_m (&t2_m_p)", column = 3, just = center );
%print("&p_val", column = 4, just = center );
...
