Real length of a Thai UTF-8 encoded string in Delphi

Real length of a Thai UTF-8 encoded string in Delphi - delphi

Thai is a very special language. You can write vowels (32 in total) as in any other languages right after the consonant, or IN FRONT of it, or ON TOP of it, or ON THE BOTTOM of it (ok, just the short and long "u" sound can go on the bottom, but anyway...).
Furthermore, there are other modifiers (the 4 tone markers, the ga-ran, the mai-tai-ku and other ones) that can go ON TOP of an already existing vowel!
For example:
ที่ดีที่สุด (the best)
As you can see, if I try to print it with a monospaced font, the "real length" would be of 5 characters, but all the UTF-8 strlen routines give me back 11 characters - which is TOTALLY CORRECT, but I need to know the "actual space" that the string will use on screen/on printer, when printed monospaced.
Sure, an easy solution would be to list all the special characters that can go on the top or on the bottom of the word, and remove them from the total count.
Since I am not sure I can find all the special characters, is there already a routine made in any language so that I can translate it in Delphi?
Thank you

In C++:
/*---------------------------------------------------------------------------*/
/* thai_tcslen */
/*---------------------------------------------------------------------------*/
long thai_tcslen(_TCHAR *buff)
{
long bufpos = 0;
long normal_length = _tcslen(buff);
long thai_length = 0;
for (bufpos = 0; bufpos < normal_length; ++bufpos) {
if ( *(buff+bufpos) != _T('Ñ')/*mai han na kaad*//*-047*/
&& *(buff+bufpos) != _T('Ô')/*sara ee *//*-044*/
&& *(buff+bufpos) != _T('Õ')/*sara eeeee *//*-043*/
&& *(buff+bufpos) != _T('Ö')/*sara uu *//*-042*/
&& *(buff+bufpos) != _T('×')/*sara uuuuu *//*-041*/
&& *(buff+bufpos) != _T('Ø')/*sara oo *//*-040*/
&& *(buff+bufpos) != _T('Ù')/*sara ooooo *//*-039*/
&& *(buff+bufpos) != _T('ç')/*mai tai khoo *//*-025*/
&& *(buff+bufpos) != _T('è')/*mai aek *//*-024*/
&& *(buff+bufpos) != _T('é')/*mai toe *//*-023*/
&& *(buff+bufpos) != _T('ê')/*mai cha ta wah *//*-022*/
&& *(buff+bufpos) != _T('ë')/*mai tree *//*-021*/
&& *(buff+bufpos) != _T('ì')/*ka ran *//*-020*/
) {
++thai_length;
}
}
return thai_length;
} /* thai_tcslen */
in VB6:
Public Function ThaiStringLength(ByRef ThaiString As String) As Long
Dim b As String, noLengthChars(13) As Byte
b = ThaiString
noLengthChars(0) = 209
noLengthChars(1) = 212
noLengthChars(2) = 213
noLengthChars(3) = 214
noLengthChars(4) = 215
noLengthChars(5) = 216
noLengthChars(6) = 217
noLengthChars(7) = 231
noLengthChars(8) = 232
noLengthChars(9) = 233
noLengthChars(10) = 234
noLengthChars(11) = 235
noLengthChars(12) = 236
Dim o As Long
For o = 0 To 12
If InStr(b, Chr(noLengthChars(o))) > 0 Then
b = Replace(b, Chr(noLengthChars(o)), "")
End If
Next
ThaiStringLength = Len(b)
End Function

Related

Need DXL code to arrange attribute lines into table (converting DOORS data to LaTeX source)

I have a DXL script which parses all data in DOORS columns into a LaTeX -compatible text source file. What I can't figure out is how to re-order some data into a tabular - compatible format. The attributes in question are DXL links to a reference DOORS module, so there is one line (separated by a line-feed) per link in each cell. Currently I loop thru all columns for each object (row), with the code snippet (part of the full script)
for col in doorsModule do {
var_name = title( col )
if( ! main( col ) && search( regexp "Absolute Number", var_name, 0 ) == false )
{
// oss is my output stream variable
if ( length(text(col, obj) ) > 0 )
{
oss << "\\textbf{";
oss << var_name; // still the column title here
oss << "}\t"
var_name = text( col, obj );
oss << var_name;
oss << "\n\n";
c++;
}
}
}
Examples of the contents of a cell, where I have separately parsed the Column Name to bold and collected it prior to collecting the cell contents. All four lines are the contents of a single cell.
\textbf{LinkedItemName}
DISTANCE
MinSpeed
MaxSpeed
Time
\textbf{Unit}
m
km/h
km/h
minutes
\textbf{Driver1}
100
30
80
20
\textbf{Driver2}
50
20
60
10
\textbf{Driver3}
60
30
60
30
What I want to do is re-arrange the data so that I can write the source code for a table, to wit:
\textbf{LinkedItemName} & \textbf{Unit} & \textbf{Driver1} & \textbf{Driver2} & \textbf{Driver3} \\
DISTANCE & m & 100 & 50 & 60 \\
MinSpeed & km/h & 30 & 20 & 30 \\
MaxSpeed & km/h & 80 & 60 & 60 \\
Time & minutes & 20 & 10 & 30 \\
I know in advance the exact Attribute names I'm "collecting." I can't figure out how to manipulate the data returned from each cell (regex or otherwise) to create my desired final output. I'm guessing some regex code (in DXL) might be able to assign the contents of each line within a cell to a series of variables, but don't quite see how.

Combination of regex and string assembly seems to work. Here's a sample bit of code (some of which is straight from the DOORS DXL Reference Manual)
int idx = 0
Array thewords = create(1,1)
Array thelen = create(1,1)
Regexp getaline = regexp2 ".*"
// matches any character except newline
string txt1 = "line 1\nline two\nline three\n"
// 3 line string
while (!null txt1 && getaline txt1) {
int ilen = length(txt1[match 0])
print "ilen is " ilen "\n"
put(thelen, ilen, idx, 0)
putString(thewords,txt1[match 0],0,idx)
idx ++
// match 0 is whole of match
txt1 = txt1[end 0 + 2:] // move past newline
}
int jj
// initialize to simplify adding the "&"
int lenone = (int get(thelen,0,0) )
string foo = (string get(thewords, 0, 0,lenone ) )
int lenout
for (jj = 1; jj < idx; jj++) {
lenout = (int get(thelen,jj,0) )
foo = foo "&" (string get(thewords, 0, jj,lenout ) )
}
foo = foo "\\\\"
// foo is now "line 1&line two&line three\\ " (without quotes) as LaTeX wants

How To Write - If Item Does Not Match Multiple Numbers - In Swift?

I have a grid of buttons 11 buttons x 7 buttons. I need to check the tags of the buttons that are not on the outside edge of the grid. My current solution is to exclude the tags that are on the outside edge. The buttons are in an outlet collection. So the tags I need to exclude are 0-10, 21, 32, 43, 54, 65, 76, 0, 11, 22, 33, 44, 55, 66-76.
How do you check if an item in an if-statement, that's inside of a for-loop, matches multiple numbers? You can see the code below will become a mess if I play it out with 28 different || conditions. The numbers I'm trying to match are not in sequence.
for item in buttonOutlets {
if item.tag != 0 || item.tag != 1 || item.tag != 2 {
var tag = item.tag
var tagMinusOne: Int? = Int(item.tag) - 1
var tagMinusTen: Int? = Int(item.tag) - 10
var tagMinusEleven: Int? = Int(item.tag) - 11
var tagMinusTwelve: Int? = Int(item.tag) - 12
var titleLabel = item.titleLabel?.text
var minusTwelve: String? = buttonOutlets[tagMinusTwelve!].titleLabel?.text!
var minusOne: String? = buttonOutlets[tagMinusOne!].titleLabel?.text!
}
}
Here is what works but is a mess. Note that the tags are not in a perfect range. Updated code:
func checkForMatchingCells() {
for item in buttonOutlets {
var tag = item.tag as Int
if tag != 0
&& tag != 1
&& tag != 2
&& tag != 3
&& tag != 4
&& tag != 5
&& tag != 6
&& tag != 7
&& tag != 8
&& tag != 9
&& tag != 10
&& tag != 11
&& tag != 12
&& tag != 33
&& tag != 44
&& tag != 55
&& tag != 21
&& tag != 32
&& tag != 43
&& tag != 54
&& tag != 65
&& tag != 66
&& tag != 67
&& tag != 68
&& tag != 69
&& tag != 70
&& tag != 71
&& tag != 72
&& tag != 73
&& tag != 74
&& tag != 75
&& tag != 76 {
println(tag)
var tagMinusOne: Int? = Int(item.tag) - 1
var tagMinusTen: Int? = Int(item.tag) - 10
var tagMinusEleven: Int? = Int(item.tag) - 11
var tagMinusTwelve: Int? = Int(item.tag) - 12
var titleLabel = item.titleLabel?.text
var minusTwelve: String? = buttonOutlets[tagMinusTwelve!].titleLabel?.text
var minusOne: String? = buttonOutlets[tagMinusOne!].titleLabel?.text
println(minusOne)
}
}

I can think of two ways. First is with ClosedIntervals:
(0...10).contains(3) // true
Or if you have more complex numbers, you can use contains methods on other CollectionTypes:
Set([2, 5, 8]).contains(5) // true
Although I'm not sure what you're trying to do. (The statement x != 0 || x != 1 || x != 2 will return true for every number, for example. Maybe you meant x != 0 && x != 1 && x != 2?)
With your edit, the most effecient solution is to create a Set of things you want to exclude:
var toExclude: Set = [21, 32, 43, 54, 65, 76, 0, 11, 22, 33, 44, 55]
toExclude.unionInPlace(0...10)
toExclude.unionInPlace(66...76)
And then the condition in your if statement would be:
if !toExclude.contains(item.tag) {...

associative arrays in awk challenging memory limits

This is related to my recent post in Awk code with associative arrays -- array doesn't seem populated, but no error and also to optimizing loop, passing parameters from external file, naming array arguments within awk
My basic problem here is simply to compute from detailed ancient archival financial market data, daily aggregates of #transactions, #shares, value, BY DATE, FIRM-ID, EXCHANGE, etc. Learnt to use associative arrays in awk for this, and was thrilled to be able to process 129+ million lines in clock time of under 11 minutes. Literally before I finished my coffee.
Became a little more ambitious, and moved from 2 array subscripts to 4, and now I am unable to process more than 6500 lines at a time.
Get error messages of the form:
K:\User Folders\KRISHNANM\PAPERS\FII_Transaction_Data>zcat
RAW_DATA\2003_1.zip | gawk -f CODE\FII_daily_aggregates_v2.awk >
OUTPUT\2003_1.txt&
gawk: CODE\FII_daily_aggregates_v2.awk:33: (FILENAME=- FNR=49300)
fatal: more_no des: nextfree: can't allocate memory (Not enough space)
On some runs the machine has told me it lacks as little as 52 KB of memory. I have what I think of a std configuration with Win-7 and 8MB RAM.
(Economist by training, not computer scientist.) I realize that going from 2 to 4 arrays makes the problem computationally much more complex for the computer, but is there something one can do to improve memory management at least a little bit. I have tried closing everything else I am doing. The error always has to do only with memory, never with disk space or anything else.
Sample INPUT:
49290,C198962542782200306,6/30/2003,433581,F5811773991200306,S5405611832200306,B5086397478200306,NESTLE INDIA LTD.,INE239A01016,6/27/2003,1,E9035083824200306,REG_DL_STLD_02,591.13,5655,3342840.15,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49291,C198962542782200306,6/30/2003,433563,F6292896459200306,S6344227311200306,B6110521493200306,GRASIM INDUSTRIES LTD.,INE047A01013,6/27/2003,1,E9035083824200306,REG_DL_STLD_02,495.33,3700,1832721,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49292,C198962542782200306,6/30/2003,433681,F6513202607200306,S1724027402200306,B6372023178200306,HDFC BANK LTD,INE040A01018,6/26/2003,1,E745964372424200306,REG_DL_STLD_02,242,2600,629200,REG_DL_INSTR_EQ,REG_DL_DLAY_D,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49293,C7885768925200306,6/30/2003,48128,F4406661052200306,S7376401565200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,3,E912851176274200306,REG_DL_STLD_04,125,44600,5575000,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49294,C7885768925200306,6/30/2003,48129,F4500260787200306,S1312094035200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,4,E912851176274200306,REG_DL_STLD_04,125,445600,55700000,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49295,C7885768925200306,6/30/2003,48130,F6425024637200306,S2872499118200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,3,E912851176274200306,REG_DL_STLD_04,125,48000,6000000,REG_DL_INSTR_EU,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
Code
BEGIN { FS = "," }
# For each array subscript variable -- DATE ($10), firm_ISIN ($9), EXCHANGE ($12), and FII_ID ($5), after checking for type = EQ, set up counts for each value, and number of unique values.
( $17~/_EQ\>/ ) { if (date[$10]++ == 0) date_list[d++] = $10;
if (isin[$9]++ == 0) isin_list[i++] = $9;
if (exch[$12]++ == 0) exch_list[e++] = $12;
if (fii[$5]++ == 0) fii_list[f++] = $5;
}
# For cash-in, buy (B), or cash-out, sell (S) count NR = no of records, SH = no of shares, RV = rupee-value.
(( $17~/_EQ\>/ ) && ( $11~/1|2|3|5|9|1[24]/ )) {{ ++BNR[$10,$9,$12,$5]} {BSH[$10,$9,$12,$5] += $15} {BRV[$10,$9,$12,$5] += $16} }
(( $17~/_EQ\>/ ) && ( $11~/4|1[13]/ )) {{ ++SNR[$10,$9,$12,$5]} {SSH[$10,$9,$12,$5] += $15} {SRV[$10,$9,$12,$5] += $16} }
END {
{ print NR, "records processed."}
{ print " " }
{ printf("%-11s\t%-13s\t%-20s\t%-19s\t%-7s\t%-7s\t%-14s\t%-14s\t%-18s\t%-18s\n", \
"DATE", "ISIN", "EXCH", "FII", "BNR", "SNR", "BSH", "SSH", "BRV", "SRV") }
{ for (u = 0; u < d; u++)
{
for (v = 0; v < i; v++)
{
for (w = 0; w < e; w++)
{
for (x = 0; x < f; x++)
#check first below for records with zeroes, don't print them
{ if (BNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]] + SNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]] > 0)
{ BR = BNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SR = SNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
BS = BSH[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
BV = BRV[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SS = SSH[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SV = SRV[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
{ printf("%-11s\t%13s\t%20s\t%19s\t%7d\t%7d\t%14d\t%14d\t%18.2f\t%18.2f\n", \
date_list[u], isin_list[v], exch_list[w], fii_list[x], BR, SR, BS, SS, BV, SV) } }
}
}
}
}
}
}
Expected output
6 records processed.
DATE ISIN EXCH FII BNR SNR BSH SSH BRV SRV
6/27/2003 INE239A01016 E9035083824200306 F5811773991200306 1 0 5655 0 3342840.15 0.00
6/27/2003 INE047A01013 E9035083824200306 F6292896459200306 1 0 3700 0 1832721.00 0.00
6/26/2003 INE040A01018 E745964372424200306 F6513202607200306 1 0 2600 0 629200.00 0.00
6/28/2003 INE585B01010 E912851176274200306 F4406661052200306 1 0 44600 0 5575000.00 0.00
6/28/2003 INE585B01010 E912851176274200306 F4500260787200306 0 1 0 445600 0.00 55700000.00
It is in this case that as the number of input records exceeds 6500, I end up having memory problems. Have about 7 million records in all.
For a 2 array subscript problem, albeit on a different data set, where 129+ million lines were processed in clock time of 11 minutes using the same GNU-AWK on the same machine, see optimizing loop, passing parameters from external file, naming array arguments within awk
Question: is it the case that awk is not very smart with memory management, but that some other more modern tools (say, SQL) would accomplish this task with the same memory resources? Or is this simply a characteristic of associative arrays, which I found magical in enabling me to avoid many passes over the data, many loops and SORT procedures, but which maybe work well up to 2 array subscripts, and then face exponential memory resource costs after that?
Afterword: the super-detailed almost-idiot-proof tutorial along with the code provided by Ed Morton in comments below makes a dramatic difference, especially his GAWK script tst.awk. He taught me about (a) using SUBSEP intelligently (b) tackling needless looping, which is crucial in this problem which tends to have very sparse arrays, with various AWK constructs. Compared to performance with my old code (only up to 6500 lines of input accepted on one machine, another couldn't even get that far), the performance of Ed Morton's tst.awk can be seen from the table below:
**filename start end min in ln out lines
2008_1 12:08:40 AM 12:27:18 AM 0:18 391438 301160
2008_2 12:27:18 AM 12:52:04 AM 0:24 402016 314177
2009_1 12:52:05 AM 1:05:15 AM 0:13 302081 238204
2009_2 1:05:15 AM 1:22:15 AM 0:17 360072 276768
2010_1 "slept" 507496 397533
2010_2 3:10:26 AM 3:10:50 AM 0:00 76200 58228
2010_3 3:10:50 AM 3:11:18 AM 0:00 80988 61725
2010_4 3:11:18 AM 3:11:47 AM 0:00 86923 65885
2010_5 3:11:47 AM 3:12:15 AM 0:00 80670 63059**
Times were obtained simply from using %time% on lines before and after tst.awk was executed, all put in a simple batch script, "min" is the clock time taken (per whatever rounding EXCEL does by default), "in ln" and "out lines" are lines of input and output, respectively. From processing the entire data that we have, from Jan 2003 to Jan 2014, we find the theoretical max number of output records = #dates*#ISINs*#Exchanges*#FIIs = 2992*2955*567*82268, while the actual number of total output lines is only 5,261,942, which is only 1.275*10^(-8) of the theoretical max -- very sparse indeed. That there was sparseness, we did guess earlier, but that the arrays could be SO sparse -- which matters a lot for memory management -- we had no way of telling till something actually completed, for a real data set. Time taken seems to increase exponentially in input size, but within limits that pose no practical difficulty. Thanks a ton, Ed.

There is no problem with associative arrays in general. In awk (except gawk for true 2D arrays) an associative array with 4 subscripts is identical to one with 2 subscripts since in reality it only has one subscript which is the concatenation of each of the pseudo-subscripts separated by SUBSEP.
Given you say I am unable to process more than 6500 lines at a time. the problem is far more likely to be in the way you wrote your code than any fundamental awk issue so if you'd like more help, post a small script with sample input and expected output that demonstrates your problem and attempted solution to see if we have suggestions on way to improve it's memory usage.
Given your posted script, I expect the problem is with those nested loops in your END section When you do:
for (i=1; i<=maxI; i++) {
for (j=1; j<=maxJ; j++) {
if ( arr[i,j] != 0 ) {
print arr[i,j]
}
}
}
you are CREATING arr[i,j] for every possible combination of i and j that didn't exist prior to the loop just by testing for arr[i,j] != 0. If you instead wrote:
for (i=1; i<=maxI; i++) {
for (j=1; j<=maxJ; j++) {
if ( (i,j) in arr ) {
print arr[i,j]
}
}
}
then the loop itself would not create new entries in arr[].
So change this block:
if (BNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]] + SNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]] > 0)
{
BR = BNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SR = SNR[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
BS = BSH[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
BV = BRV[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SS = SSH[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
SV = SRV[date_list[u],isin_list[v],exch_list[w],fii_list[x]]
which is probably unnecessarily turning each of BNR, SNR, BSH, BRV, SSH, and SRV into huge but highly sparse arrays, to something like this:
idx = date_list[u] SUBSEP isin_list[v] SUBSEP exch_list[w] SUBSEP fii_list[x]
BR = (idx in BNR ? BNR[idx] : 0)
SR = (idx in SNR ? SNR[idx] : 0)
if ( (BR + SR) > 0 )
{
BS = (idx in BSH ? BSH[idx] : 0)
BV = (idx in BRV ? BRV[idx] : 0)
SS = (idx in SSH ? SSH[idx] : 0)
SV = (idx in SRV ? SRV[idx] : 0)
and let us know if that helps. Also check your code for other places where you might be doing the same.
The reason you have this problem with 4 subscripts when you didn't with 2 is simply that you have 4 levels of nesting in the loops now creating much larger and more sparse arrays when when you just had 2.
Finally - you have some weird syntax in your script, some of which #MarkSetchell pointed out in a comment, and your script isn't as efficient as it could be since you're not using else statements and so testing for multiple conditions that can't possibly all be true and you're testing the same condition repeatedly, and it's not robust as you aren't anchoring your REs (e.g you test /4|1[13]/ instead of /^(4|1[13])$/ so for example your 4 would match on 14 or 41 etc. instead of just 4 on its own) so change your whole script to this:
$ cat tst.awk
BEGIN { FS = "," }
# For each array subscript variable -- DATE ($10), firm_ISIN ($9), EXCHANGE ($12), and FII_ID ($5), after checking for type = EQ, set up counts for each value, and number of unique values.
$17 ~ /_EQ\>/ {
if (!seenDate[$10]++) date_list[++d] = $10
if (!seenIsin[$9]++) isin_list[++i] = $9
if (!seenExch[$12]++) exch_list[++e] = $12
if (!seenFii[$5]++) fii_list[++f] = $5
# For cash-in, buy (B), or cash-out, sell (S) count NR = no of records, SH = no of shares, RV = rupee-value.
idx = $10 SUBSEP $9 SUBSEP $12 SUBSEP $5
if ( $11 ~ /^([12359]|1[24])$/ ) {
++BNR[idx]; BSH[idx] += $15; BRV[idx] += $16
}
else if ( $11 ~ /^(4|1[13])$/ ) {
++SNR[idx]; SSH[idx] += $15; SRV[idx] += $16
}
}
END {
print NR, "records processed."
print " "
printf "%-11s\t%-13s\t%-20s\t%-19s\t%-7s\t%-7s\t%-14s\t%-14s\t%-18s\t%-18s\n",
"DATE", "ISIN", "EXCH", "FII", "BNR", "SNR", "BSH", "SSH", "BRV", "SRV"
for (u = 1; u <= d; u++)
{
for (v = 1; v <= i; v++)
{
for (w = 1; w <= e; w++)
{
for (x = 1; x <= f; x++)
{
#check first below for records with zeroes, don't print them
idx = date_list[u] SUBSEP isin_list[v] SUBSEP exch_list[w] SUBSEP fii_list[x]
BR = (idx in BNR ? BNR[idx] : 0)
SR = (idx in SNR ? SNR[idx] : 0)
if ( (BR + SR) > 0 )
{
BS = (idx in BSH ? BSH[idx] : 0)
BV = (idx in BRV ? BRV[idx] : 0)
SS = (idx in SSH ? SSH[idx] : 0)
SV = (idx in SRV ? SRV[idx] : 0)
printf "%-11s\t%13s\t%20s\t%19s\t%7d\t%7d\t%14d\t%14d\t%18.2f\t%18.2f\n",
date_list[u], isin_list[v], exch_list[w], fii_list[x], BR, SR, BS, SS, BV, SV
}
}
}
}
}
}
I added seen in front of 4 array names just because by convention arrays testing for the pre-existence of a value are typically named seen. Also, when populating the SNR[] etc arrays I created an idx variable first instead of repeatedly using the field numbers every time for both ease of changing it in future and mostly because string concatenation is relatively slow in awk and that's whats happening when you use multiple indices in an array so best to just do the string concatenation once explicitly. And I changed your date_list[] etc arrays to start at 1 instead of zero because all awk-generated arrays, strings and field numbers start at 1. You CAN create an array manually that starts at 0 or -357 or whatever number you want but it'll save shooting yourself in the foot some day if you always start them at 1.
I expect it could be made more efficient still by restricting the nested loops to only values that could exist for the enclosing loop index combinations (e.g. not every value of u+v+w is possible so there will be times when you shouldn't bother looping on x). For example:
$ cat tst.awk
BEGIN { FS = "," }
# For each array subscript variable -- DATE ($10), firm_ISIN ($9), EXCHANGE ($12), and FII_ID ($5), after checking for type = EQ, set up counts for each value, and number of unique values.
$17 ~ /_EQ\>/ {
if (!seenDate[$10]++) date_list[++d] = $10
if (!seenIsin[$9]++) isin_list[++i] = $9
if (!seenExch[$12]++) exch_list[++e] = $12
if (!seenFii[$5]++) fii_list[++f] = $5
# For cash-in, buy (B), or cash-out, sell (S) count NR = no of records, SH = no of shares, RV = rupee-value.
idx = $10 SUBSEP $9 SUBSEP $12 SUBSEP $5
if ( $11 ~ /^([12359]|1[24])$/ ) {
seen[$10,$9]
seen[$10,$9,$12]
++BNR[idx]; BSH[idx] += $15; BRV[idx] += $16
}
else if ( $11 ~ /^(4|1[13])$/ ) {
seen[$10,$9]
seen[$10,$9,$12]
++SNR[idx]; SSH[idx] += $15; SRV[idx] += $16
}
}
END {
printf "d = %d\n", d | "cat>&2"
printf "i = %d\n", i | "cat>&2"
printf "e = %d\n", e | "cat>&2"
printf "f = %d\n", f | "cat>&2"
print NR, "records processed."
print " "
printf "%-11s\t%-13s\t%-20s\t%-19s\t%-7s\t%-7s\t%-14s\t%-14s\t%-18s\t%-18s\n",
"DATE", "ISIN", "EXCH", "FII", "BNR", "SNR", "BSH", "SSH", "BRV", "SRV"
for (u = 1; u <= d; u++)
{
date = date_list[u]
for (v = 1; v <= i; v++)
{
isin = isin_list[v]
if ( (date,isin) in seen )
{
for (w = 1; w <= e; w++)
{
exch = exch_list[w]
if ( (date,isin,exch) in seen )
{
for (x = 1; x <= f; x++)
{
fii = fii_list[x]
#check first below for records with zeroes, don't print them
idx = date SUBSEP isin SUBSEP exch SUBSEP fii
if ( (idx in BNR) || (idx in SNR) )
{
if (idx in BNR)
{
bnr = BNR[idx]
bsh = BSH[idx]
brv = BRV[idx]
}
else
{
bnr = bsh = brv = 0
}
if (idx in SNR)
{
snr = SNR[idx]
ssh = SSH[idx]
srv = SRV[idx]
}
else
{
snr = ssh = srv = 0
}
printf "%-11s\t%13s\t%20s\t%19s\t%7d\t%7d\t%14d\t%14d\t%18.2f\t%18.2f\n",
date, isin, exch, fii, bnr, snr, bsh, ssh, brv, srv
}
}
}
}
}
}
}
}

PNG validation on iOS

Writing a mapping application on iOS, making use of OpenStreetMap tiles.
Map tile images are downloaded asynchronously and stored in a dictionary, or persisted in a SQLite DB.
Occasionally, for whatever reason, while attempting to render a map tile image, I get the following error:
ImageIO: <ERROR> PNGinvalid distance too far back
This causes nasty black squares to appear over my map.
This is the piece of code in which this occurs:
NSData *imageData = [TileDownloader RetrieveDataAtTileX:(int)tilex Y:(int)tiley Zoom:(int)zoomLevel];
if (imageData != nil) {
NSLog(#"Obtained image data\n");
UIImage *img = [[UIImage imageWithData:imageData] retain];
// Perform the image render on the current UI context.
// ERROR OCCURS BETWEEN PUSH AND POP
UIGraphicsPushContext(context);
[img drawInRect:[self rectForMapRect:mapRect] blendMode:kCGBlendModeNormal alpha:1.0f];
UIGraphicsPopContext();
[img release];
}
Now, what I'm looking for is a way to ensure a png is valid before attempting to render it to my map.
Edit: The system also occasionally throws this error:
ImageIO: <ERROR> PNGIDAT: CRC error

I found this in other question and put together what solved the issue for me. Hope you find this helpful.
The PNG format has several built in checks. Each "chunk" has a CRC32 check, but to check that you'd need to read the full file.
A more basic check (not foolproof, of course) would be to read the start and ending of the file.
The first 8 bytes should always be the following (decimal) values { 137, 80, 78, 71, 13, 10, 26, 10 } (ref). In particular, the bytes second-to-fourth correspond to the ASCII string "PNG".
In hexadecimal:
89 50 4e 47 0d 0a 1a 0a
.. P N G ...........
You can also check the last 12 bytes of the file (IEND chunk). The middle 4 bytes should correspond to the ASCII string "IEND". More specifically the last 12 bytes should be (in hexa):
00 00 00 00 49 45 4e 44 ae 42 60 82
........... I E N D ...........
(Strictly speaking, it's not really obligatory for a PNG file to end with those 12 bytes, the IEND chunk itself signals the end of the PNG stream and so a file could in principle have extra trailing bytes which would be ignored by the PNG reader. In practice, this is extremely improbable).
Here is an implementation:
- (BOOL)dataIsValidPNG:(NSData *)data
{
if (!data || data.length < 12)
{
return NO;
}
NSInteger totalBytes = data.length;
const char *bytes = (const char *)[data bytes];
return (bytes[0] == (char)0x89 && // PNG
bytes[1] == (char)0x50 &&
bytes[2] == (char)0x4e &&
bytes[3] == (char)0x47 &&
bytes[4] == (char)0x0d &&
bytes[5] == (char)0x0a &&
bytes[6] == (char)0x1a &&
bytes[7] == (char)0x0a &&
bytes[totalBytes - 12] == (char)0x00 && // IEND
bytes[totalBytes - 11] == (char)0x00 &&
bytes[totalBytes - 10] == (char)0x00 &&
bytes[totalBytes - 9] == (char)0x00 &&
bytes[totalBytes - 8] == (char)0x49 &&
bytes[totalBytes - 7] == (char)0x45 &&
bytes[totalBytes - 6] == (char)0x4e &&
bytes[totalBytes - 5] == (char)0x44 &&
bytes[totalBytes - 4] == (char)0xae &&
bytes[totalBytes - 3] == (char)0x42 &&
bytes[totalBytes - 2] == (char)0x60 &&
bytes[totalBytes - 1] == (char)0x82);
}

I know this is a super old thread, but I was looking around for an NSData extension that actually validated the crc32's in the PNG data chunks. Having not found one, I adapted one from some other source.
This will actually flag bad PNG CRC's, which isn't done (shockingly) by most image libraries
static const unsigned int datacrc32_table[256] =
{
0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419,
0x706af48f, 0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4,
0xe0d5e91e, 0x97d2d988, 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07,
0x90bf1d91, 0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de,
0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7, 0x136c9856,
0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, 0x14015c4f, 0x63066cd9,
0xfa0f3d63, 0x8d080df5, 0x3b6e20c8, 0x4c69105e, 0xd56041e4,
0xa2677172, 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b,
0x35b5a8fa, 0x42b2986c, 0xdbbbc9d6, 0xacbcf940, 0x32d86ce3,
0x45df5c75, 0xdcd60dcf, 0xabd13d59, 0x26d930ac, 0x51de003a,
0xc8d75180, 0xbfd06116, 0x21b4f4b5, 0x56b3c423, 0xcfba9599,
0xb8bda50f, 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924,
0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d, 0x76dc4190,
0x01db7106, 0x98d220bc, 0xefd5102a, 0x71b18589, 0x06b6b51f,
0x9fbfe4a5, 0xe8b8d433, 0x7807c9a2, 0x0f00f934, 0x9609a88e,
0xe10e9818, 0x7f6a0dbb, 0x086d3d2d, 0x91646c97, 0xe6635c01,
0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e, 0x6c0695ed,
0x1b01a57b, 0x8208f4c1, 0xf50fc457, 0x65b0d9c6, 0x12b7e950,
0x8bbeb8ea, 0xfcb9887c, 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3,
0xfbd44c65, 0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2,
0x4adfa541, 0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb, 0x4369e96a,
0x346ed9fc, 0xad678846, 0xda60b8d0, 0x44042d73, 0x33031de5,
0xaa0a4c5f, 0xdd0d7cc9, 0x5005713c, 0x270241aa, 0xbe0b1010,
0xc90c2086, 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,
0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, 0x59b33d17,
0x2eb40d81, 0xb7bd5c3b, 0xc0ba6cad, 0xedb88320, 0x9abfb3b6,
0x03b6e20c, 0x74b1d29a, 0xead54739, 0x9dd277af, 0x04db2615,
0x73dc1683, 0xe3630b12, 0x94643b84, 0x0d6d6a3e, 0x7a6a5aa8,
0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1, 0xf00f9344,
0x8708a3d2, 0x1e01f268, 0x6906c2fe, 0xf762575d, 0x806567cb,
0x196c3671, 0x6e6b06e7, 0xfed41b76, 0x89d32be0, 0x10da7a5a,
0x67dd4acc, 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5,
0xd6d6a3e8, 0xa1d1937e, 0x38d8c2c4, 0x4fdff252, 0xd1bb67f1,
0xa6bc5767, 0x3fb506dd, 0x48b2364b, 0xd80d2bda, 0xaf0a1b4c,
0x36034af6, 0x41047a60, 0xdf60efc3, 0xa867df55, 0x316e8eef,
0x4669be79, 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236,
0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f, 0xc5ba3bbe,
0xb2bd0b28, 0x2bb45a92, 0x5cb36a04, 0xc2d7ffa7, 0xb5d0cf31,
0x2cd99e8b, 0x5bdeae1d, 0x9b64c2b0, 0xec63f226, 0x756aa39c,
0x026d930a, 0x9c0906a9, 0xeb0e363f, 0x72076785, 0x05005713,
0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38, 0x92d28e9b,
0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21, 0x86d3d2d4, 0xf1d4e242,
0x68ddb3f8, 0x1fda836e, 0x81be16cd, 0xf6b9265b, 0x6fb077e1,
0x18b74777, 0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c,
0x8f659eff, 0xf862ae69, 0x616bffd3, 0x166ccf45, 0xa00ae278,
0xd70dd2ee, 0x4e048354, 0x3903b3c2, 0xa7672661, 0xd06016f7,
0x4969474d, 0x3e6e77db, 0xaed16a4a, 0xd9d65adc, 0x40df0b66,
0x37d83bf0, 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9,
0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, 0xbad03605,
0xcdd70693, 0x54de5729, 0x23d967bf, 0xb3667a2e, 0xc4614ab8,
0x5d681b02, 0x2a6f2b94, 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b,
0x2d02ef8d
};
unsigned int
datacrc32 (unsigned int crc, unsigned char *buf, int len)
{
unsigned char *end;
crc = ~crc;
for (end = buf + len; buf < end; ++buf)
crc = datacrc32_table[(crc ^ *buf) & 0xff] ^ (crc >> 8);
return ~crc;
}
-(BOOL)isCRCValidPNG {
char chnk [5];
int l = 0;
int size = (int)[self length];
unsigned int crc = 0;
unsigned char c;
unsigned int csum = 0;
unsigned char b;
unsigned char *tileBytes = (unsigned char *)[self bytes];
if (self.length > 8){
const unsigned char pngHeaderBytes[] = { 0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a };
for (int i = 0 ; i < 8 ; ++i){
if (tileBytes[i] != pngHeaderBytes[i])
return NO;
}
}
// process chunks
int bytePtr = 8;
strcpy (chnk, "");
do
{
// get chunk size
if (bytePtr+4 > size)
return NO;
l = 0;
for (int i = 0; i < 4; i++)
{
l = (l << 8) + tileBytes[bytePtr++];
}
printf("l is %08x",l);
// get chunk name
crc = 0;
strcpy (chnk, "");
if (bytePtr+4 > size)
return NO;
for (int i = 0; i < 4; i++)
{
c = tileBytes[bytePtr++];
crc = datacrc32 (crc, &c, 1);
chnk[i] = (char) c;
}
chnk[4] = '\0';
printf ("%s (%3d )", chnk, l);
// chunk data
if (bytePtr+l > size)
return NO;
for (int i = 0; i < l; i++)
{
c = tileBytes[bytePtr++];
crc = datacrc32 (crc, &c, 1);
}
// checksum
csum = 0;
if (bytePtr+4 > size)
return NO;
for (int i = 0; i < 4; i++)
{
c = tileBytes[bytePtr++];
csum = (csum << 8) + (int) c;
b = (unsigned char) ((crc >> 8 * (3 - i)) & 0xFF);
// printf ("b = %02x\n", b);
}
if (crc == csum)
NSLog(#"Chunk %s validated",chnk);
else
NSLog(#"chunk %s invalid ",chnk);
if (crc != csum)
return NO;
}
while (strcmp (chnk, "IEND") != 0);
return YES;
}

Switched from my own Asynchronous Download Queue Manager to the All Seeing I implementation. Problem became a moot point.

The Swift Version
func checkPNGImageDataFormat(_ imageData:Data) -> Bool
{
//More expensive since it has to go through entire data
//Check entire header magic number and IEND trailer in PNG data
var status:Bool = true
if(imageData.count < 12)
{
return false
}
let totalBytes = imageData.count
let bytes = imageData.withUnsafeBytes {
[UInt8](UnsafeBufferPointer(start: $0, count: totalBytes))
}
let header:Bool = bytes[0] == 0x89 && bytes[1] == 0x50 && bytes[2] == 0x4e && bytes[3] == 0x47 && bytes[4] == 0x0d && bytes[5] == 0x0a && bytes[6] == 0x1a && bytes[7] == 0x0a
let iend:Bool = bytes[totalBytes - 12] == 0x00 && bytes[totalBytes - 11] == 0x00 && bytes[totalBytes - 10] == 0x00 && bytes[totalBytes - 9] == 0x00 && bytes[totalBytes - 8] == 0x49 && bytes[totalBytes - 7] == 0x45 && bytes[totalBytes - 6] == 0x4e && bytes[totalBytes - 5] == 0x44 && bytes[totalBytes - 4] == 0xae && bytes[totalBytes - 3] == 0x42 && bytes[totalBytes - 2] == 0x60 && bytes[totalBytes - 1] == 0x82
status = header && iend
return status
}

scanf and gets buffer

im having a problem with scanf and gets. and I kno that its bound to errors but I couldn't find any other way. This way, the name is printing out but It doesn't print out the first letter of it.
Here's my code:
#include <stdio.h>
float calculations(int age, float highBP, float lowBP);
char option;
int counter, age;
char temp_name[50];
float highBP, lowBP, riskF, optimalH = 120.0, optimalL = 80.0;
typedef struct {
char name[50]; /*which represents the patient’s name*/
int age; /*which represents the patient’s age*/
float highBP; /*highBP, which represents the patient’s high (systolic) blood pressure*/
float lowBP; /*lowBP, which represents the patient’s low (diastolic) blood pressure*/
float riskF; /*riskFactor, which represents the patient’s risk factor for stroke due to hypertension.*/
}patient;/*end structure patient*/
patient *pRecords[30];
void printMenu()
{
printf("\n---------------------------------------------------------\n");
printf("|\t(N)ew record\t(D)isplay db\t(U)pdate record\t|\n");
printf("|\t(L)oad disk\t(W)rite disk\t(E)mpty disk\t|\n");
printf("|\t(S)ort db\t(C)lear db\t(Q)uit \t\t|\n");
printf("---------------------------------------------------------\n");
printf("choose one:");
}/*end print menu*/
void enter()
{
if(counter == 30)
printf("database full.");
else{
printf("name: ");
while(getchar()=='\n');
gets(temp_name);
strcpy(pRecords[counter]->name , temp_name);
printf("age: "); scanf("%d", &age);
pRecords[counter]->age = age;
printf("highBP: "); scanf("%f", &highBP);
pRecords[counter]->highBP = highBP;
printf("lowBP: "); scanf("%f", &lowBP);
pRecords[counter]->lowBP = lowBP;
float temp = calculations(age, highBP,lowBP);
pRecords[counter]->riskF = temp;
/*printf("name: %s, age: %d, highbp:%.1f, lowBP:%.1f\n", pRecords[counter]->name,pRecords[counter]->age,pRecords[counter]->highBP,pRecords[counter]->lowBP);
printf("risk factor: %.1f\n", pRecords[counter]->riskF);*/
counter ++;
}
}/*end of void enter function*/
memallocate(int counter){
pRecords[counter] = (patient *)malloc (sizeof(patient));
}/*end memallocate function*/
void display()
{
printf("===============================\n");
int i;
for(i=0; i<counter; i++)
{
printf("name: %s\n", pRecords[i]->name);
printf("age: %d\n", pRecords[i]->age);
printf("bp: %.2f %.2f\n", pRecords[i]->highBP, pRecords[i]->lowBP);
printf("risk: %.2f\n\n", pRecords[i]->riskF);
}/*end of for loop*/
printf("========== %d records ==========", counter);
}/*end of display method*/
float calculations(int age, float highBP, float lowBP)
{ float risk;
if((highBP <= optimalH) && (lowBP <= optimalL))
{ risk = 0.0;
if(age >=50)
risk = 0.5;
}
else if(highBP <= optimalH && (lowBP>optimalL && lowBP <=(optimalL+10)))
{ risk= 1.0;
if(age >=50)
risk = 1.5;
}
else if ((highBP >optimalH && highBP <= (optimalH+10))&& lowBP <=optimalL)
{ risk= 1.0;
if(age >=50)
risk= 1.5;
}
else if((highBP > optimalH && highBP <=(optimalH+10)) && (lowBP >optimalL && lowBP <= (optimalL+10)))
{ risk= 2.0;
if(age >=50)
risk = 2.5;
}
else if(highBP < optimalH && (lowBP >(optimalL+11) && lowBP<(optimalL+20)))
{ risk = 3.0;
if(age >=50)
risk = 3.5;
}
else if((lowBP < optimalL) && (highBP >(optimalH+11) && highBP<(optimalH+20)))
{ risk = 3.0;
if(age >=50)
risk = 3.5;
}
else if((highBP>=(optimalH+11) && highBP <= (optimalH+20))&& (lowBP>=(optimalL+11) && lowBP<=(optimalL+20)))
{ risk = 4.0;
if(age >=50)
risk = 4.5;
}
else
{ risk = 5.0;
if(age >=50)
risk = 5.5;
}
return risk;
}/*end of calculation function*/
main()
{
printMenu();
char option=getchar();
while(option != 'q' || option != 'Q'){
if(option == 'N' || option == 'n')
{
memallocate(counter);
enter();
printMenu();
}
if (option == 'L' || option == 'l')
{
printMenu();
}
if(option == 'S' || option == 's')
{
printMenu();
}
if(option == 'D' || option == 'd')
{
display();
printMenu();
}
if(option == 'W' || option == 'w')
{
printMenu();
}
if(option == 'C' || option == 'c')
{
printMenu();
}
if(option == 'U' || option == 'u')
{
printMenu();
}
if(option == 'E' || option == 'e')
{
printMenu();
}
if(option == 'Q' || option == 'q')
{
exit(0);
}
option = getchar();
}/*end while*/
system("pause");
}/*end main*/
sample output:
---------------------------------------------------------
| (N)ew record (D)isplay db (U)pdate record |
| (L)oad disk (W)rite disk (E)mpty disk |
| (S)ort db (C)lear db (Q)uit |
---------------------------------------------------------
choose one: n
name: judy
age: 30
high bp: 110
low bp: 88
3
---------------------------------------------------------
| (N)ew record (D)isplay db (U)pdate record |
| (L)oad disk (W)rite disk (E)mpty disk |
| (S)ort db (C)lear db (Q)uit |
---------------------------------------------------------
choose one: n
name: cindy white
age: 52
high bp: 100.7
low bp: 89.4
---------------------------------------------------------
| (N)ew record (D)isplay db (U)pdate record |
| (L)oad disk (W)rite disk (E)mpty disk |
| (S)ort db (C)lear db (Q)uit |
---------------------------------------------------------
choose one: d
===============================
name: udy
age: 30
bp: 110.00 88.00
risk: 1.0
name: indy white
age: 52
bp: 100.70 89.40
risk: 1.5
========== 2 records ==========

Your while loop and use of gets() is generally not good practice.
Try something like:
fflush(stdin);
fgets(pRecords[counter]->name, sizeof(pRecords[counter]->name), stdin);
Try
if (strlen(pRecords[counter]->name) > 0)
{
pRecords[counter]->name[strlen(pRecords[counter]->name) - 1] = '\0';
}

You lose the first character to while(getchar()=='\n');. I don't know why that statement is necessary, but it loops until it gets a character that is not '\n' (which is 'j' and 'c' in your case).

while (getchar() == '\n');
This line eats the newlines plus one character. When getchar() does not return a newline, it has already consumed the first character.
Look at ungetc() to write that character back onto the stream.

This:
while(getchar()=='\n');
loops until it gets a non-newline, which will be the first character of the name.
Try this instead:
do
c = getchar();
while(c == '\n');
ungetc(c, stdin);

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Real length of a Thai UTF-8 encoded string in Delphi - delphi

Related

Need DXL code to arrange attribute lines into table (converting DOORS data to LaTeX source)

How To Write - If Item Does Not Match Multiple Numbers - In Swift?

associative arrays in awk challenging memory limits

PNG validation on iOS

scanf and gets buffer

Categories

Resources