Need DXL code to arrange attribute lines into table (converting DOORS data to LaTeX source)

I have a DXL script which parses all data in DOORS columns into a LaTeX-compatible text source file. What I can't figure out is how to re-order some data into a tabular-compatible format. The attributes in question are DXL links to a reference DOORS module, so each cell contains one line per link, separated by line feeds. Currently I loop through all columns for each object (row), with this code snippet (part of the full script):
for col in doorsModule do {
var_name = title( col )
if( ! main( col ) && search( regexp "Absolute Number", var_name, 0 ) == false )
{
// oss is my output stream variable
if ( length(text(col, obj) ) > 0 )
{
oss << "\\textbf{";
oss << var_name; // still the column title here
oss << "}\t"
var_name = text( col, obj );
oss << var_name;
oss << "\n\n";
c++;
}
}
}
Examples of the contents of a cell, where I have separately parsed the Column Name to bold and collected it prior to collecting the cell contents. All four lines are the contents of a single cell.
\textbf{LinkedItemName}
DISTANCE
MinSpeed
MaxSpeed
Time
\textbf{Unit}
m
km/h
km/h
minutes
\textbf{Driver1}
100
30
80
20
\textbf{Driver2}
50
20
60
10
\textbf{Driver3}
60
30
60
30
What I want to do is re-arrange the data so that I can write the source code for a table, to wit:
\textbf{LinkedItemName} & \textbf{Unit} & \textbf{Driver1} & \textbf{Driver2} & \textbf{Driver3} \\
DISTANCE & m & 100 & 50 & 60 \\
MinSpeed & km/h & 30 & 20 & 30 \\
MaxSpeed & km/h & 80 & 60 & 60 \\
Time & minutes & 20 & 10 & 30 \\
I know in advance the exact Attribute names I'm "collecting." I can't figure out how to manipulate the data returned from each cell (regex or otherwise) to create my desired final output. I'm guessing some regex code (in DXL) might be able to assign the contents of each line within a cell to a series of variables, but don't quite see how.

A combination of regex and string assembly seems to work. Here's a sample bit of code (some of which is straight from the DOORS DXL Reference Manual):
int idx = 0
Array thewords = create(1,1)
Array thelen = create(1,1)
Regexp getaline = regexp2 ".*"
// matches any character except newline
string txt1 = "line 1\nline two\nline three\n"
// 3 line string
while (!null txt1 && getaline txt1) {
int ilen = length(txt1[match 0])
print "ilen is " ilen "\n"
put(thelen, ilen, idx, 0)
putString(thewords,txt1[match 0],0,idx)
idx ++
// match 0 is whole of match
txt1 = txt1[end 0 + 2:] // move past newline
}
int jj
// initialize to simplify adding the "&"
int lenone = (int get(thelen,0,0) )
string foo = (string get(thewords, 0, 0,lenone ) )
int lenout
for (jj = 1; jj < idx; jj++) {
lenout = (int get(thelen,jj,0) )
foo = foo "&" (string get(thewords, 0, jj,lenout ) )
}
foo = foo "\\\\"
// foo is now "line 1&line two&line three\\ " (without quotes) as LaTeX wants
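To make the target transformation concrete, here is a minimal sketch in Python (not DXL) of the row/column transposition the question asks for. It assumes each cell has already been read as a string with one value per line; the function name `cells_to_latex_rows` and the `(title, cell_text)` input shape are hypothetical and only serve the illustration.

```python
def cells_to_latex_rows(columns):
    """Turn (title, cell_text) pairs into LaTeX tabular rows.

    Assumption: cell_text holds one value per line, as in the question.
    """
    # Header row: one bold title per column
    header = " & ".join("\\textbf{%s}" % title for title, _ in columns) + " \\\\"
    split_cells = [text.splitlines() for _, text in columns]
    row_count = max(len(lines) for lines in split_cells)
    rows = [header]
    for r in range(row_count):
        # Take line r of every cell; pad with an empty string if a cell is short
        values = [lines[r] if r < len(lines) else "" for lines in split_cells]
        rows.append(" & ".join(values) + " \\\\")
    return rows


columns = [
    ("LinkedItemName", "DISTANCE\nMinSpeed\nMaxSpeed\nTime"),
    ("Unit", "m\nkm/h\nkm/h\nminutes"),
    ("Driver1", "100\n30\n80\n20"),
    ("Driver2", "50\n20\n60\n10"),
    ("Driver3", "60\n30\n60\n30"),
]
for row in cells_to_latex_rows(columns):
    print(row)
```

The same idea carries over to DXL: split each cell into lines first (as the loop above does), then emit line r of every cell joined by "&".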


How can I generate an array of numbers between multiple ranges defined in separate cells in Sheet?

A1: 1 | B1: 4
A2: 3 | B2: 6
How can I get {1, 2, 3, 3, 4, 4, 5, 6} out of this?
I know this way:
=ArrayFormula({ROW(INDIRECT(A1&":"&B1)); ROW(INDIRECT(A2&":"&B2))})
That does the job perfectly, but what if I don't know how many ranges there will be? I want to generate an array of all the numbers between the values specified in each row, from A1:B1 all the way down columns A:B.
Thank you in advance!
Here is a relatively simple formula that generates the array you're talking about from any number of ranges in columns A and B (as written it reads rows 1 to 10, so widen A1:A10 and B1:B10 if you have more rows).
=ARRAYFORMULA(QUERY(SPLIT(FLATTEN(SEQUENCE(1,MAX(B1:B10-A1:A10)+1,0)+A1:A10&"|"&B1:B10),"|",0,0),"Select Col1 where Col1<=Col2 order by Col1",0))
You can see it demonstrated in the tab called Demo 2 on this sheet.
In Excel 365 with your data in columns A and B, pick a cell and enter:
="{" & TEXTJOIN(",",TRUE,SEQUENCE(,MAX(A:B),MIN(A:B))) & "}"
EDIT#1:
Try this VBA macro:
Sub MakeArray()
Dim I As Long, N As Long, J, k
Dim strng As String
Dim arr As Variant
N = Cells(Rows.Count, "A").End(xlUp).Row
For I = 1 To N
For J = Cells(I, 1) To Cells(I, 2)
strng = strng & "," & J
Next J
Next I
strng = Mid(strng, 2)
strng = "{" & Join(fSort(Split(strng, ",")), ",") & "}"
MsgBox strng
End Sub
Public Function fSort(ByVal arry)
Dim I As Long, J As Long, Low As Long
Dim Hi As Long, Temp As Variant
Low = LBound(arry)
Hi = UBound(arry)
J = (Hi - Low + 1) \ 2
Do While J > 0
For I = Low To Hi - J
' Compare numerically: Split returns strings, and "10" sorts before "9" as text
If Val(arry(I)) > Val(arry(I + J)) Then
Temp = arry(I)
arry(I) = arry(I + J)
arry(I + J) = Temp
End If
Next I
For I = Hi - J To Low Step -1
If Val(arry(I)) > Val(arry(I + J)) Then
Temp = arry(I)
arry(I) = arry(I + J)
arry(I + J) = Temp
End If
Next I
J = J \ 2
Loop
fSort = arry
End Function
The macro:
builds a comma-separated string of every number in each A/B range
sorts those values in ascending order
displays the result in a message box
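For readers who want to check the expected result outside a spreadsheet, the three steps can be sketched in Python. This is only an illustration of the logic, not a spreadsheet solution; the function name `expand_ranges` is hypothetical, and each row is assumed to be a (start, end) pair of integers with start <= end.

```python
def expand_ranges(pairs):
    """Expand inclusive (start, end) pairs and return all numbers sorted."""
    numbers = []
    for start, end in pairs:
        # range() excludes its upper bound, hence end + 1
        numbers.extend(range(start, end + 1))
    return sorted(numbers)


# Rows A1:B1 = 1..4 and A2:B2 = 3..6 from the question
print(expand_ranges([(1, 4), (3, 6)]))  # [1, 2, 3, 3, 4, 4, 5, 6]
```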

identify yAxis tick interval

I am using highcharts to display total income per quarter (in thousands of pounds) for a variety of departments.
Sometimes the income for a department is quite small; in this case, the y axis values contain 2 decimal places.
Sometimes the income is larger and the y axis values contain 1 decimal place.
And occasionally, the values are very large and the y axis values do not contain any decimal places.
Fiddle to demonstrate different formatting
The problem I have is that the current formatting of the y axis looks wrong.
I need to set the number of decimal places on the y axis values based on the tick interval:
small values (e.g. 0.25, 0.5, 0.75, 1 etc.) need to be formatted to 2 decimal places.
larger values (e.g. 0.5, 1, 1.5, 2 etc.) need to be formatted to 1 decimal place.
very large values (e.g. 80, 90, 100, 110 etc.) need no decimal places.
The actual values can be up to 3 decimal places (e.g. 0.306, 0.518 (small) 1.429, 1.806 (larger) 102.429, 160.806(very large))
My code builds up a script string and then uses ScriptManager.RegisterStartupScript to run the script.
I have tried to set the number of decimal places based on the values
Dim yAxisValue as Double= 0
Dim numberOfDP as Integer = 0
...
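While reader.Read
' Track the largest value as a number rather than comparing a Double to a String
Dim current As Double = CDbl(reader.Item("YAxisValues"))
If yAxisValue < current Then
yAxisValue = current
End If
End While
' Choose the precision once, from the largest value seen
If yAxisValue < 1 Then
numberOfDP = 2
ElseIf yAxisValue < 10 Then
numberOfDP = 1
Else
numberOfDP = 0
End If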
MyScript = MyScript & "yAxis: {" & vbCrLf
MyScript = MyScript & "labels: {" & vbCrLf
MyScript = MyScript & "style: {color: 'black'," & vbCrLf
MyScript = MyScript & "'fontSize': '11pt'}," & vbCrLf
MyScript = MyScript & "format: '{value:." & numberOfDP & "f}'" & vbCrLf
MyScript = MyScript & "}" & vbCrLf
MyScript = MyScript & "}" & vbCrLf
But I would rather base the formatting on the actual tick interval.
Is there any way I can do this?
I am not sure about ScriptManager.RegisterStartupScript, but in Highcharts you can set yAxis.labels.formatter and determine there how many decimals should be displayed, for example:
function formatter () {
var dec = this.axis.tickInterval > 1 ? 0 : (this.axis.tickInterval > 0.1 ? 1 : 2);
return this.value.toFixed(dec);
}
Then use it in the yAxis options:
yAxis: {
labels: {
formatter: formatter
}
},
And live demo for you: https://jsfiddle.net/4y8n33ob/
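If fixed thresholds don't fit every chart, another option is to derive the precision from the tick interval itself. The sketch below shows that logic in Python purely as an illustration (in Highcharts it would live inside the formatter function); the name `decimals_for_tick` and the `max_decimals` cap are assumptions, not part of any Highcharts API.

```python
from decimal import Decimal


def decimals_for_tick(tick_interval, max_decimals=3):
    """Return how many decimal places are needed to print tick_interval exactly.

    Assumption: the interval is a "nice" number such as 0.25, 0.5 or 10.
    max_decimals caps the result in case of floating-point noise.
    """
    # normalize() strips trailing zeros, so 10.0 becomes 1E+1 (exponent 1)
    exponent = Decimal(str(tick_interval)).normalize().as_tuple().exponent
    return min(max(0, -exponent), max_decimals)


for interval in (0.25, 0.5, 10):
    print(interval, decimals_for_tick(interval))
```

With the intervals from the question this gives 2 places for 0.25, 1 place for 0.5 and none for 10.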

associative arrays in awk challenging memory limits

This is related to my recent post in Awk code with associative arrays -- array doesn't seem populated, but no error and also to optimizing loop, passing parameters from external file, naming array arguments within awk
My basic problem here is simply to compute from detailed ancient archival financial market data, daily aggregates of #transactions, #shares, value, BY DATE, FIRM-ID, EXCHANGE, etc. Learnt to use associative arrays in awk for this, and was thrilled to be able to process 129+ million lines in clock time of under 11 minutes. Literally before I finished my coffee.
Became a little more ambitious, and moved from 2 array subscripts to 4, and now I am unable to process more than 6500 lines at a time.
Get error messages of the form:
K:\User Folders\KRISHNANM\PAPERS\FII_Transaction_Data>zcat RAW_DATA\2003_1.zip | gawk -f CODE\FII_daily_aggregates_v2.awk > OUTPUT\2003_1.txt&
gawk: CODE\FII_daily_aggregates_v2.awk:33: (FILENAME=- FNR=49300) fatal: more_nodes: nextfree: can't allocate memory (Not enough space)
On some runs the machine has told me it lacks as little as 52 KB of memory. I have what I think of as a standard configuration, with Win-7 and 8MB RAM.
(Economist by training, not computer scientist.) I realize that going from 2 to 4 arrays makes the problem computationally much more complex for the computer, but is there something one can do to improve memory management at least a little bit. I have tried closing everything else I am doing. The error always has to do only with memory, never with disk space or anything else.
Sample INPUT:
49290,C198962542782200306,6/30/2003,433581,F5811773991200306,S5405611832200306,B5086397478200306,NESTLE INDIA LTD.,INE239A01016,6/27/2003,1,E9035083824200306,REG_DL_STLD_02,591.13,5655,3342840.15,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49291,C198962542782200306,6/30/2003,433563,F6292896459200306,S6344227311200306,B6110521493200306,GRASIM INDUSTRIES LTD.,INE047A01013,6/27/2003,1,E9035083824200306,REG_DL_STLD_02,495.33,3700,1832721,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49292,C198962542782200306,6/30/2003,433681,F6513202607200306,S1724027402200306,B6372023178200306,HDFC BANK LTD,INE040A01018,6/26/2003,1,E745964372424200306,REG_DL_STLD_02,242,2600,629200,REG_DL_INSTR_EQ,REG_DL_DLAY_D,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49293,C7885768925200306,6/30/2003,48128,F4406661052200306,S7376401565200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,3,E912851176274200306,REG_DL_STLD_04,125,44600,5575000,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49294,C7885768925200306,6/30/2003,48129,F4500260787200306,S1312094035200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,4,E912851176274200306,REG_DL_STLD_04,125,445600,55700000,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49295,C7885768925200306,6/30/2003,48130,F6425024637200306,S2872499118200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,3,E912851176274200306,REG_DL_STLD_04,125,48000,6000000,REG_DL_INSTR_EU,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
Code
BEGIN { FS = "," }
# For each array subscript variable -- DATE ($10), firm_ISIN ($9), EXCHANGE ($12), and FII_ID ($5), after checking for type = EQ, set up counts for each value, and number of unique values.
( $17~/_EQ\>/ ) { if (date[$10]++ == 0) date_list[d++] = $10;
if (isin[$9]++ == 0) isin_list[i++] = $9;
if (exch[$12]++ == 0) exch_list[e++] = $12;
if (fii[$5]++ == 0) fii_list[f++] = $5;
}
# For cash-in, buy (B), or cash-out, sell (S) count NR = no of records, SH = no of shares, RV = rupee-value.
(( $17~/_EQ\>/ ) && ( $11~/1|2|3|5|9|1[24]/ )) {{ ++BNR[$10,$9,$12,$5]} {BSH[$10,$9,$12,$5] += $15} {BRV[$10,$9,$12,$5] += $16} }
(( $17~/_EQ\>/ ) && ( $11~/4|1[13]/ )) {{ ++SNR[$10,$9,$12,$5]} {SSH[$10,$9,$12,$5] += $15} {SRV[$10,$9,$12,$5] += $16} }
END {
{ print NR, "records processed."}
{ print " " }
{ printf("%-11s\t%-13s\t%-20s\t%-19s\t%-7s\t%-7s\t%-14s\t%-14s\t%-18s\t%-18s\n", \
"DATE", "ISIN", "EXCH", "FII", "BNR", "SNR", "BSH", "SSH", "BRV", "SRV") }
{ for (u = 0; u < d; u++)
{
for (v = 0; v < i; v++)
{
for (w = 0; w < e; w++)
{
for (x = 0; x < f; x++)
#check first below for records with zeroes, don't print them
{ if (BNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]] + SNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]] > 0)
{ BR = BNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SR = SNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
BS = BSH[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
BV = BRV[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SS = SSH[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SV = SRV[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
{ printf("%-11s\t%13s\t%20s\t%19s\t%7d\t%7d\t%14d\t%14d\t%18.2f\t%18.2f\n", \
date_list[u], isin_list[v], exch_list[w], fii_list[x], BR, SR, BS, SS, BV, SV) } }
}
}
}
}
}
}
Expected output
6 records processed.
DATE ISIN EXCH FII BNR SNR BSH SSH BRV SRV
6/27/2003 INE239A01016 E9035083824200306 F5811773991200306 1 0 5655 0 3342840.15 0.00
6/27/2003 INE047A01013 E9035083824200306 F6292896459200306 1 0 3700 0 1832721.00 0.00
6/26/2003 INE040A01018 E745964372424200306 F6513202607200306 1 0 2600 0 629200.00 0.00
6/28/2003 INE585B01010 E912851176274200306 F4406661052200306 1 0 44600 0 5575000.00 0.00
6/28/2003 INE585B01010 E912851176274200306 F4500260787200306 0 1 0 445600 0.00 55700000.00
It is in this case that as the number of input records exceeds 6500, I end up having memory problems. Have about 7 million records in all.
For a 2 array subscript problem, albeit on a different data set, where 129+ million lines were processed in clock time of 11 minutes using the same GNU-AWK on the same machine, see optimizing loop, passing parameters from external file, naming array arguments within awk
Question: is it the case that awk is not very smart with memory management, but that some other more modern tools (say, SQL) would accomplish this task with the same memory resources? Or is this simply a characteristic of associative arrays, which I found magical in enabling me to avoid many passes over the data, many loops and SORT procedures, but which maybe work well up to 2 array subscripts, and then face exponential memory resource costs after that?
Afterword: the super-detailed almost-idiot-proof tutorial along with the code provided by Ed Morton in comments below makes a dramatic difference, especially his GAWK script tst.awk. He taught me about (a) using SUBSEP intelligently (b) tackling needless looping, which is crucial in this problem which tends to have very sparse arrays, with various AWK constructs. Compared to performance with my old code (only up to 6500 lines of input accepted on one machine, another couldn't even get that far), the performance of Ed Morton's tst.awk can be seen from the table below:
filename  start        end          min   in ln   out lines
2008_1    12:08:40 AM  12:27:18 AM  0:18  391438  301160
2008_2    12:27:18 AM  12:52:04 AM  0:24  402016  314177
2009_1    12:52:05 AM  1:05:15 AM   0:13  302081  238204
2009_2    1:05:15 AM   1:22:15 AM   0:17  360072  276768
2010_1    "slept"                         507496  397533
2010_2    3:10:26 AM   3:10:50 AM   0:00  76200   58228
2010_3    3:10:50 AM   3:11:18 AM   0:00  80988   61725
2010_4    3:11:18 AM   3:11:47 AM   0:00  86923   65885
2010_5    3:11:47 AM   3:12:15 AM   0:00  80670   63059
Times were obtained simply from using %time% on lines before and after tst.awk was executed, all put in a simple batch script, "min" is the clock time taken (per whatever rounding EXCEL does by default), "in ln" and "out lines" are lines of input and output, respectively. From processing the entire data that we have, from Jan 2003 to Jan 2014, we find the theoretical max number of output records = #dates*#ISINs*#Exchanges*#FIIs = 2992*2955*567*82268, while the actual number of total output lines is only 5,261,942, which is only 1.275*10^(-8) of the theoretical max -- very sparse indeed. That there was sparseness, we did guess earlier, but that the arrays could be SO sparse -- which matters a lot for memory management -- we had no way of telling till something actually completed, for a real data set. Time taken seems to increase exponentially in input size, but within limits that pose no practical difficulty. Thanks a ton, Ed.
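The sparseness figure quoted above can be re-derived from the four counts. A quick check in Python, using only the numbers stated in the paragraph (the variable names are just labels for this illustration):

```python
# Counts of distinct values reported for Jan 2003 to Jan 2014
dates, isins, exchanges, fiis = 2992, 2955, 567, 82268
actual_output_lines = 5_261_942

# Every possible (date, ISIN, exchange, FII) combination
theoretical_max = dates * isins * exchanges * fiis
fill_ratio = actual_output_lines / theoretical_max

print(theoretical_max)
print("%.3e" % fill_ratio)
```

The ratio comes out at roughly 1.28 * 10^(-8), in line with the figure in the text.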
There is no problem with associative arrays in general. In awk (except gawk for true 2D arrays) an associative array with 4 subscripts is identical to one with 2 subscripts since in reality it only has one subscript which is the concatenation of each of the pseudo-subscripts separated by SUBSEP.
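To picture what that means, here is a rough Python model of it. This is not awk, just an illustration: a plain dict stands in for the awk array, and the helper name `key` is hypothetical. The separator value is awk's default SUBSEP, the character "\034".

```python
SUBSEP = "\x1c"  # awk's default SUBSEP, i.e. octal \034


def key(*subscripts):
    """Build the single string key awk uses for arr[a,b,...]."""
    return SUBSEP.join(str(s) for s in subscripts)


counts = {}
# Two "subscripts" and four "subscripts" are both just one string key each
counts[key("6/27/2003", "INE239A01016")] = 1
counts[key("6/27/2003", "INE239A01016", "E9035083824200306", "F5811773991200306")] = 1

print(len(counts))  # 2
```

So the per-entry cost is about the same either way; what matters for memory is how many entries get created.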
Given that you say "I am unable to process more than 6500 lines at a time", the problem is far more likely to be in the way you wrote your code than in any fundamental awk issue. If you'd like more help, post a small script with sample input and expected output that demonstrates your problem and attempted solution, so we can see if we have suggestions on ways to improve its memory usage.
Given your posted script, I expect the problem is with those nested loops in your END section. When you do:
for (i=1; i<=maxI; i++) {
for (j=1; j<=maxJ; j++) {
if ( arr[i,j] != 0 ) {
print arr[i,j]
}
}
}
you are CREATING arr[i,j] for every possible combination of i and j that didn't exist prior to the loop just by testing for arr[i,j] != 0. If you instead wrote:
for (i=1; i<=maxI; i++) {
for (j=1; j<=maxJ; j++) {
if ( (i,j) in arr ) {
print arr[i,j]
}
}
}
then the loop itself would not create new entries in arr[].
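The same pitfall exists outside awk. As a comparison only, Python's `collections.defaultdict` also inserts a default value whenever a missing key is read, while a membership test does not. The function name `entries_after_scan` below is hypothetical.

```python
from collections import defaultdict


def entries_after_scan(use_membership_test, size=3):
    """Scan a size x size grid of keys and return how many entries exist afterwards."""
    arr = defaultdict(int)
    arr[(1, 1)] = 5  # the only real entry
    for i in range(1, size + 1):
        for j in range(1, size + 1):
            if use_membership_test:
                found = (i, j) in arr      # does not insert anything
            else:
                found = arr[(i, j)] != 0   # inserts a 0 for every missing key
    return len(arr)


print(entries_after_scan(False))  # 9
print(entries_after_scan(True))   # 1
```

With four nested loops instead of two, the number of silently created entries is the product of all four loop sizes, which is where the memory goes.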
So change this block:
if (BNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]] + SNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]] > 0)
{
BR = BNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SR = SNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
BS = BSH[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
BV = BRV[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SS = SSH[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SV = SRV[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
which is probably unnecessarily turning each of BNR, SNR, BSH, BRV, SSH, and SRV into huge but highly sparse arrays, to something like this:
idx = date_list[u] SUBSEP isin_list[v] SUBSEP exch_list[w] SUBSEP fii_list[x]
BR = (idx in BNR ? BNR[idx] : 0)
SR = (idx in SNR ? SNR[idx] : 0)
if ( (BR + SR) > 0 )
{
BS = (idx in BSH ? BSH[idx] : 0)
BV = (idx in BRV ? BRV[idx] : 0)
SS = (idx in SSH ? SSH[idx] : 0)
SV = (idx in SRV ? SRV[idx] : 0)
and let us know if that helps. Also check your code for other places where you might be doing the same.
The reason you have this problem with 4 subscripts when you didn't with 2 is simply that you now have 4 levels of nesting in the loops, creating much larger and more sparse arrays than when you just had 2.
Finally, you have some weird syntax in your script, some of which @MarkSetchell pointed out in a comment. Your script also isn't as efficient as it could be, since you're not using else statements and so you're testing multiple conditions that can't possibly all be true, and you're testing the same condition repeatedly. And it's not robust, as you aren't anchoring your REs (e.g. you test /4|1[13]/ instead of /^(4|1[13])$/, so your 4 would match on 14 or 41 etc. instead of just 4 on its own). So change your whole script to this:
$ cat tst.awk
BEGIN { FS = "," }
# For each array subscript variable -- DATE ($10), firm_ISIN ($9), EXCHANGE ($12), and FII_ID ($5), after checking for type = EQ, set up counts for each value, and number of unique values.
$17 ~ /_EQ\>/ {
if (!seenDate[$10]++) date_list[++d] = $10
if (!seenIsin[$9]++) isin_list[++i] = $9
if (!seenExch[$12]++) exch_list[++e] = $12
if (!seenFii[$5]++) fii_list[++f] = $5
# For cash-in, buy (B), or cash-out, sell (S) count NR = no of records, SH = no of shares, RV = rupee-value.
idx = $10 SUBSEP $9 SUBSEP $12 SUBSEP $5
if ( $11 ~ /^([12359]|1[24])$/ ) {
++BNR[idx]; BSH[idx] += $15; BRV[idx] += $16
}
else if ( $11 ~ /^(4|1[13])$/ ) {
++SNR[idx]; SSH[idx] += $15; SRV[idx] += $16
}
}
END {
print NR, "records processed."
print " "
printf "%-11s\t%-13s\t%-20s\t%-19s\t%-7s\t%-7s\t%-14s\t%-14s\t%-18s\t%-18s\n",
"DATE", "ISIN", "EXCH", "FII", "BNR", "SNR", "BSH", "SSH", "BRV", "SRV"
for (u = 1; u <= d; u++)
{
for (v = 1; v <= i; v++)
{
for (w = 1; w <= e; w++)
{
for (x = 1; x <= f; x++)
{
#check first below for records with zeroes, don't print them
idx = date_list[u] SUBSEP isin_list[v] SUBSEP exch_list[w] SUBSEP fii_list[x]
BR = (idx in BNR ? BNR[idx] : 0)
SR = (idx in SNR ? SNR[idx] : 0)
if ( (BR + SR) > 0 )
{
BS = (idx in BSH ? BSH[idx] : 0)
BV = (idx in BRV ? BRV[idx] : 0)
SS = (idx in SSH ? SSH[idx] : 0)
SV = (idx in SRV ? SRV[idx] : 0)
printf "%-11s\t%13s\t%20s\t%19s\t%7d\t%7d\t%14d\t%14d\t%18.2f\t%18.2f\n",
date_list[u], isin_list[v], exch_list[w], fii_list[x], BR, SR, BS, SS, BV, SV
}
}
}
}
}
}
I added seen in front of 4 array names just because, by convention, arrays testing for the pre-existence of a value are typically named seen. Also, when populating the SNR[] etc. arrays I created an idx variable first instead of repeatedly using the field numbers every time, both for ease of changing it in future and mostly because string concatenation is relatively slow in awk, and that's what's happening when you use multiple indices in an array, so it's best to just do the string concatenation once explicitly. And I changed your date_list[] etc. arrays to start at 1 instead of zero because all awk-generated arrays, strings and field numbers start at 1. You CAN create an array manually that starts at 0 or -357 or whatever number you want, but it'll save shooting yourself in the foot some day if you always start them at 1.
I expect it could be made more efficient still by restricting the nested loops to only values that could exist for the enclosing loop index combinations (e.g. not every value of u+v+w is possible so there will be times when you shouldn't bother looping on x). For example:
$ cat tst.awk
BEGIN { FS = "," }
# For each array subscript variable -- DATE ($10), firm_ISIN ($9), EXCHANGE ($12), and FII_ID ($5), after checking for type = EQ, set up counts for each value, and number of unique values.
$17 ~ /_EQ\>/ {
if (!seenDate[$10]++) date_list[++d] = $10
if (!seenIsin[$9]++) isin_list[++i] = $9
if (!seenExch[$12]++) exch_list[++e] = $12
if (!seenFii[$5]++) fii_list[++f] = $5
# For cash-in, buy (B), or cash-out, sell (S) count NR = no of records, SH = no of shares, RV = rupee-value.
idx = $10 SUBSEP $9 SUBSEP $12 SUBSEP $5
if ( $11 ~ /^([12359]|1[24])$/ ) {
seen[$10,$9]
seen[$10,$9,$12]
++BNR[idx]; BSH[idx] += $15; BRV[idx] += $16
}
else if ( $11 ~ /^(4|1[13])$/ ) {
seen[$10,$9]
seen[$10,$9,$12]
++SNR[idx]; SSH[idx] += $15; SRV[idx] += $16
}
}
END {
printf "d = %d\n", d | "cat>&2"
printf "i = %d\n", i | "cat>&2"
printf "e = %d\n", e | "cat>&2"
printf "f = %d\n", f | "cat>&2"
print NR, "records processed."
print " "
printf "%-11s\t%-13s\t%-20s\t%-19s\t%-7s\t%-7s\t%-14s\t%-14s\t%-18s\t%-18s\n",
"DATE", "ISIN", "EXCH", "FII", "BNR", "SNR", "BSH", "SSH", "BRV", "SRV"
for (u = 1; u <= d; u++)
{
date = date_list[u]
for (v = 1; v <= i; v++)
{
isin = isin_list[v]
if ( (date,isin) in seen )
{
for (w = 1; w <= e; w++)
{
exch = exch_list[w]
if ( (date,isin,exch) in seen )
{
for (x = 1; x <= f; x++)
{
fii = fii_list[x]
#check first below for records with zeroes, don't print them
idx = date SUBSEP isin SUBSEP exch SUBSEP fii
if ( (idx in BNR) || (idx in SNR) )
{
if (idx in BNR)
{
bnr = BNR[idx]
bsh = BSH[idx]
brv = BRV[idx]
}
else
{
bnr = bsh = brv = 0
}
if (idx in SNR)
{
snr = SNR[idx]
ssh = SSH[idx]
srv = SRV[idx]
}
else
{
snr = ssh = srv = 0
}
printf "%-11s\t%13s\t%20s\t%19s\t%7d\t%7d\t%14d\t%14d\t%18.2f\t%18.2f\n",
date, isin, exch, fii, bnr, snr, bsh, ssh, brv, srv
}
}
}
}
}
}
}
}
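Since the arrays are so sparse, a different technique from the scripts above is to skip the nested loops entirely and visit only the keys that were actually stored (in awk that would be a `for (idx in BNR)` style loop, at the cost of unordered output). The sketch below shows that idea in Python rather than awk. The names `aggregate` and `report` are hypothetical, and the `endswith("_EQ")` test is an assumption standing in for the awk match on field 17.

```python
import csv
from collections import defaultdict

BUY_CODES = {"1", "2", "3", "5", "9", "12", "14"}
SELL_CODES = {"4", "11", "13"}


def aggregate(lines):
    """Return {(date, isin, exch, fii): [BNR, SNR, BSH, SSH, BRV, SRV]}."""
    totals = defaultdict(lambda: [0, 0, 0, 0, 0.0, 0.0])
    for row in csv.reader(lines):
        # Assumption: "ends with _EQ" approximates the awk test on field 17
        if len(row) < 17 or not row[16].endswith("_EQ"):
            continue
        code = row[10]
        if code not in BUY_CODES and code not in SELL_CODES:
            continue
        key = (row[9], row[8], row[11], row[4])  # awk fields $10, $9, $12, $5
        shares, value = int(row[14]), float(row[15])
        t = totals[key]
        if code in BUY_CODES:
            t[0] += 1; t[2] += shares; t[4] += value
        else:
            t[1] += 1; t[3] += shares; t[5] += value
    return totals


def report(totals):
    """Yield one tab-separated line per key that actually exists."""
    for (date, isin, exch, fii), (bnr, snr, bsh, ssh, brv, srv) in totals.items():
        yield "%s\t%s\t%s\t%s\t%d\t%d\t%d\t%d\t%.2f\t%.2f" % (
            date, isin, exch, fii, bnr, snr, bsh, ssh, brv, srv)
```

The work in the reporting step is then proportional to the number of output lines, not to #dates * #ISINs * #Exchanges * #FIIs.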

How do I total the values of a range of cells when they contain both numbers and letters?

For example, my cell contains T 6.5. I want to look for all cells in a row that contain T and add up the numbers also contained in those cells.
If you don't mind a row beneath your data to help total things, I would suggest using the following as a quick solution.
Assuming that your data is in Row 1 (starting in cell A1), insert the below formula in Row 2 (cell A2) and copy it to the right as far as you have data in Row 1.
=IF(IFERROR(SEARCH("T ",A1),0)<>0,VALUE(SUBSTITUTE(A1,"T ","")),0)
From there you can total the values across Row 2 using this:
=SUM(2:2)
Note that I assumed that there was a space after "T" in your example above and that this is explicitly included in the first formula above. It simply strips that text from the cell and adds up the remaining numerical value IF the cells in Row 1 have a "T " in them.
Hope this helps or points you in the right direction.
Cheers!
Here is an example for row #7
Sub SumARow()
Dim roww As Long, r As Range, _
Zum As Double, v As Variant
roww = 7
For Each r In Cells(roww, 1).EntireRow.Cells
v = CStr(r.Value)
If InStr(1, v, "T") > 0 Then
Zum = Zum + GetNumber(v)
End If
Next r
MsgBox Zum
End Sub
Public Function GetNumber(s As Variant) As Double
Dim msg As String, i As Long, ch As String
GetNumber = 0
msg = ""
For i = 1 To Len(s)
ch = Mid(s, i, 1)
If ch Like "[0-9]" Or ch = "." Then
msg = msg & ch
End If
Next i
If msg = "" Then Exit Function
GetNumber = CDbl(msg)
End Function
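To sanity-check either approach against known data, the same rule can be written in a few lines of Python. This is only an illustration of the logic, under the assumption that a matching cell looks like the letter, some whitespace, then one number; the function name `sum_tagged` is hypothetical.

```python
import re


def sum_tagged(cells, tag="T"):
    """Sum the numbers in cells of the form '<tag> <number>', e.g. 'T 6.5'."""
    pattern = re.compile(r"^\s*%s\s+(-?\d+(?:\.\d+)?)\s*$" % re.escape(tag))
    total = 0.0
    for cell in cells:
        match = pattern.match(str(cell))
        if match:
            total += float(match.group(1))
    return total


print(sum_tagged(["T 6.5", "X 2", "T 1.5", "", 7]))  # 8.0
```

Unlike the VBA routine above, this version ignores cells where T merely appears somewhere in other text, so decide which behaviour you want before comparing totals.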

How to generate random lines of text of a given length from a dictionary of words (bin-packing problem)?

I need to generate three lines of text (essentially gibberish) that are each 60 characters long, including a hard return at the end of each line. The lines are generated from a dictionary of words of various lengths (typically 1-8 characters). No word may be used more than once, and words must be separated by spaces. I think this is essentially a bin-packing problem.
The approach I've taken so far is to create a hashMap of the words, grouped by their lengths. I then choose a random length, pull a word of that length from the map, and append it to the end of the line I'm currently generating, accounting for spaces or a hard return. It works about half the time, but the other half of the time I'm getting stuck in an infinite loop and my program crashes.
One problem I'm running into is this: as I add random words to the lines, groups of words of a given length may become depleted. This is because there are not necessarily the same number of words of each length in the dictionary, e.g., there may only be one word with a length of 1. So, I might need a word of a given length, but there are no longer any words of that length available.
Below is a summary of what I have so far. I'm working in ActionScript, but would appreciate insight into this problem in any language. Many thanks in advance.
dictionary // map of words with word lengths as keys and arrays of corresponding words as values
lengths // array of word lengths, sorted numerically
min = lengths[0] // minimum word length
max = lengths[lengths.length - 1] // maximum word length
line = ""
while ( line.length < 60 ) {
len = lengths[round( rand() * ( lengths.length - 1 ) )]
if ( dictionary[len] != null && dictionary[len].length > 0 ) {
diff = 60 - line.length // number of characters needed to complete the line
if ( line.length + len + 1 == 60 ) {
// this word will complete the line exactly
line += dictionary[len].splice(0, 1) + "\n"
}
else if ( min + max + 2 >= diff ) {
// find the two word lengths that will complete the line
// ==> this is where I'm having trouble
}
else if ( line.length + len + 1 < 60 - max ) {
// this word will fit safely, so just add it
line += dictionary[len].splice(0, 1) + " "
}
if ( dictionary[len].length == 0 ) {
// delete any empty arrays and update min and max lengths accordingly
dictionary[len] = null
delete dictionary[len]
i = lengths.indexOf( len )
if ( i >= 0 ) {
// words of this length have been depleted, so
// update lengths array to ensure that next random
// length is valid
lengths.splice( i, 1 )
}
if ( lengths.indexOf( min ) == -1 ) {
// update the min
min = lengths[0]
}
if ( lengths.indexOf( max ) == -1 ) {
// update the max
max = lengths[lengths.length - 1]
}
}
}
}
You should think of an n-letter word as being n+1 letters, because each word has either a space or return after it.
Since all your words are at least 2 characters long, you don't ever want to get to a point where you have 59 characters filled in. If you get to 57, you need to pick something that is 2 letters plus the return. If you get to 58, you need a 1-letter word plus the return.
Are you trying to optimize for time? Can you have the same word multiple times? Multiple times in one line? Does it matter if your words are not uniformly distributed, e.g. a lot of lines contain "a" or "I" because those are the only one-letter words in English?
Here's the basic idea. For each line, start choosing word lengths, and keep track of the word lengths and total character count so far. As you get toward the end of the line, choose word lengths less than the number of characters you have left. (e.g. if you have 5 characters left, choose words in the range of 2-5 characters, counting the space.) If you get to 57 characters, pick a 3-letter word (counting return). If you get to 58 characters, pick a 2-letter word (counting return.)
If you want, you can shuffle the word lengths at this point, so all your lines don't end with short words. Then for each word length, pick a word of that length and plug it in.
dictionary = group your words by length (like you already do)
total_length = 0
phrase = ""
while (total_length < 60) {
random_length = generate_random_number(1, 8)
if (total_length + random_length > 60)
{
random_length = 60 - total_length // possibly - 1 if you count \n and - 2 if you
// append a blank anyway at the end
}
phrase += dictionary.get_random_word_of_length(random_length) + " "
total_length += random_length + 1
}
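The pseudocode above can still get stuck when the remaining space has no usable word length, or when a length group is depleted. One way around that, sketched here in Python under stated assumptions, is randomized backtracking: treat every word as a slot of length + 1 (the trailing space or return), pick slot sizes that sum to exactly 60, and undo a pick whenever it leads to a dead end. The function names `pick_lengths` and `make_lines` are hypothetical, and this is a sketch rather than an optimized packer.

```python
import random


def pick_lengths(remaining, counts, rng):
    """Choose word lengths whose slots (length + 1) sum exactly to `remaining`.

    counts maps word length -> how many unused words of that length are left.
    Returns a list of lengths, or None if no combination fits.
    """
    if remaining == 0:
        return []
    options = [n for n, c in counts.items() if c > 0 and n + 1 <= remaining]
    rng.shuffle(options)
    for n in options:
        counts[n] -= 1
        rest = pick_lengths(remaining - (n + 1), counts, rng)
        if rest is not None:
            return [n] + rest
        counts[n] += 1  # dead end: undo and try another length
    return None


def make_lines(words, line_count=3, width=60, seed=None):
    """Build lines of exactly `width` characters (newline included), no word reused."""
    rng = random.Random(seed)
    by_len = {}
    for word in sorted(set(words)):
        by_len.setdefault(len(word), []).append(word)
    for bucket in by_len.values():
        rng.shuffle(bucket)
    lines = []
    for _ in range(line_count):
        counts = {n: len(bucket) for n, bucket in by_len.items()}
        lengths = pick_lengths(width, counts, rng)
        if lengths is None:
            raise ValueError("not enough words left to fill another line")
        # pop() removes each chosen word so it cannot be used again
        chosen = [by_len[n].pop() for n in lengths]
        lines.append(" ".join(chosen) + "\n")
    return lines
```

Because the search backs out of bad choices instead of looping until something fits, it either returns a valid line or reports that none exists. Its worst case is exponential, which is usually acceptable for 60-character lines but worth knowing about.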
