How to fetch string using lua pattern matching - lua

below is my string
local Amount =[[
Customer Details Net Amount
# Seq Name
Amount NTR
1 CDABCDEFGHIJ00564
0,1234
2 CDABCDEFGHIJ00565
0,0361
3 CDABCDEFGHIJ00566
0,0361
4 CDABCDEFGHIJ00567
0,0722
5 CDABCDEFGHIJ00568
0,0000
6 CDABCDEFGHIJ00569
0,0000
7 CDABCDEFGHIJ00570
0,0000
8 CDABCDEFGHIJ00571
0,7091
9 CDABCDEFGHIJ00572
1,4240
10 CDABCDEFGHIJ00573
0,0361
11 CDABCDEFGHIJ00574
0,5790
12 CDABCDEFGHIJ00575
0,4060
13 CDABCDEFGHIJ00576
0,3610
14 CDABCDEFGHIJ00577
0,6859
15 CDABCDEFGHIJ00578
0,2888
16 CDABCDEFGHIJ00579
0,0000
17 CDABCDEFGHIJ00580
0,0000
18 CDABCDEFGHIJ00581
0,0000
19 CDABCDEFGHIJ00582
0,0000
20 CDABCDEFGHIJ00583
0,0000
21 CDABCDEFGHIJ00584
0,0000
22 CDABCDEFGHIJ00585
0,8978
23 CDABCDEFGHIJ00586
0,0000
24 CDABCDEFGHIJ00587
2,3882
25 CDABCDEFGHIJ00588
0,0000
26 CDABCDEFGHIJ00589
2,0216
27 CDABCDEFGHIJ00590
1,7540
28 CDABCDEFGHIJ00591
0,0000
29 CDABCDEFGHIJ00592
0,0722
30 CDABCDEFGHIJ00593
0,0361
31 CDABCDEFGHIJ00594
0,0000
32 CDABCDEFGHIJ00595
0,0000
Total NAT files
11,9269
Direct inquiries to:
]]
by executing the code below
local ptrn = '\n([%d%p]+)\n'
for val1, val2 in string.gmatch(Amount, ptrn) do
print ("val1:=\t" .. (val1 or '').."\tval2:=\t"..(val2 or ''))
end
basically from the above string I want to fetch the last 5 digits of the string which is 00564 in val1 and the amount which is 0,1234 in val2 variable, but all this should in one pattern. This is a record, every record is starting with a number like this is 1 record or row
1 CDABCDEFGHIJ00564
0,1234
and this is 2nd record or row and so on
2 CDABCDEFGHIJ00565
0,0361
plese help....

It seems to me that %d+%s+%a+(%d+)\n%s*([%d,]+) should do the trick: the first %d+ will catch the row number, %s+ to match the white space after. %a+(%d+) will match CDABCDEFGHIJ00592 and capture the digits in the end (no way to specify that you want exactly five digits though). \n%s* will match the newline and any white space on the next line and ([%d,]+) will capture the last number with the comma.

Related

How T Transpose Multiple Columns Values by Groups between groups delimiters in adjacent Column Google Sheets?

I have the following minimal example data (in reality 100's of groups) in range A1:P9 (same data in range A14:A22):
With Sample A1:AR9:
2
61
219
2
4
2
:
61
219
26
26
26
94
21
33
4
26
26
26
94
2
2
:
154
26
40
19
3
2
21
33
14
1
2
3
:
87
39
54
38
26
32
38
26
32
87
39
54
38
26
23
23
4
6
28
2
154
26
2
2
40
19
14
87
39
54
38
26
32
38
26
32
87
39
54
38
26
1
23
2
23
4
4
3
6
20
28
Or Sample A14:AQ22:
2
61
219
2
:
61
219
4
:
26
26
26
94
2
:
21
33
4
26
26
26
94
2
:
154
26
2
:
40
19
3
2
21
33
14
:
87
39
54
38
26
32
38
26
32
87
39
54
38
26
1
:
23
2
:
23
4
:
3
6
20
2
154
26
2
2
40
19
14
87
39
54
38
26
32
38
26
32
87
39
54
38
26
1
23
2
23
4
4
3
6
20
28
I need the output as shown in range Q1:AR3 or as in range Q14:AQ16.
Basically, at each group delimited/inbetween values in Column A, I would need:
The intemediary adjacent values in Column B to be transposed horizontally
And the adjacent content of Columns C to P (14 Columns, at least) to be "joined" together horizontaly an sequencialy "per group", including the content of the delimiter's row (in Column A).
As a bonus it would be really nice to have the Transposed data followed by a :, and each sub Content of Columns C to P to be also separated by a | (as shown in screenshot Q1:AR3 or Q14:AR16).
(Or if it's more feasible, alternatively, the simpler to read 2nd model as in A14:AQ22).
I have a really hard time putting together a formula to come to the expected result.
All I could think of was:
Transposing Column B's content by getting the rows of the adjacent Cells with values in column A,
Concatenating with the Column letter,
Duplicating it in a new column, and Filtering out the blank intermediary cells,
Then shifting the duplicated column 1 cell up,
Then concatenating within a TRANSPOSE formula to get the range of the groups,
Then finally transposing all the groups from Columns B in a new Colum
(very convoluted but I couldn't find better way).
To get to that input:
=TRANSPOSE(B1:B3)
=TRANSPOSE(B4:B5)
=TRANSPOSE(B7:B9)
That was already a very manual and error prone process, and still I could not successfully think of how to do the remaining content joining of Column C to P in a formula.
I tested the following approach but it's not working and would be very tedious process to fix to go and to implement on large datasets:
=TRANSPOSE(B1:B3)&": "&JOIN( " | " , FILTER(C1:P1, NOT(C2:P2 = "") ))&JOIN( " | " , FILTER(C2:P2, NOT(C2:P2 = "") ))&JOIN( " | " , FILTER(C43:P3, NOT(C3:P3 = "") ))
=TRANSPOSE(B4:B5)&": "&JOIN( " | " , FILTER(C4:P4, NOT(C4:P4 = "") ))&JOIN( " | " , FILTER(C5:P5, NOT(C5:P5 = "") ))
=TRANSPOSE(B6:B9)&": "&JOIN( " | " , FILTER(C6:P6, NOT(C6:P6 = "") ))&JOIN( " | " , FILTER(C7:P7, NOT(C7:P7 = "") ))&JOIN( " | " , FILTER(C8:P8, NOT(C8:P8 = "") ))&JOIN( " | " , FILTER(C8:P8, NOT(C9:P9 = "") ))
What better approach to favor toward the expected result? Preferably with a Formula, or if not possible with a script.
Any help is greatly appreciated.
For Sample 1 try this out:
=LAMBDA(norm,MAP(UNIQUE(norm),LAMBDA(ζ,{TRANSPOSE(FILTER(B1:B9,norm=ζ)),":",SPLIT(BYROW(TRANSPOSE(FILTER(BYROW(C1:P9,LAMBDA(r,TEXTJOIN("ζ",1,r))),norm=ζ)),LAMBDA(rr,TEXTJOIN("γ|γ",1,rr))),"ζγ")})))(SORT(SCAN(,SORT(A1:A9,ROW(A1:A9),),LAMBDA(a,c,IF(c="",a,c))),ROW(A1:A9),))

YouTube Data API returning inconsistent results with duplicates

There have been numerous questions about inconsistent results from the YouTube Data API: 1, 2, 3, 4, 5, 6. Most of them have accepted answers that seem to indicate there was a problem with the API request that was fixed by the instructions in the answers. But none of those situations apply to the API request discussed here.
There have also been two questions about duplicates in the API results: 1, 2. Both of them have the same answer, which says to use the next-page token. But both questions say the token was used, so that answer is not helpful.
Yesterday, I submitted a series of API requests to get the list of most-viewed videos about 3D printing. The first request in the series was:
https://www.googleapis.com/youtube/v3/search?q=3D print&type=video&maxResults=50&part=id,snippet&order=viewCount&key=<my key>
I ran that in a VBA sub, which took the next-page token from each result and resubmitted the URL with &pageToken=<nextPageToken> inserted.
The result was a list of 649 unique video IDs. So far so good.
After making some changes in the VBA code and seeing some duplicates in the result set, I went back today and ran the original VBA sub again. The result was again a list of 649 video IDs, but this time the list included duplicates and it also included IDs that were not in yesterday's list and was missing IDs that were there yesterday. Here is a comparison from the first two pages and the last two pages of the two result sets:
Page
# on page
# overall
Run 1
Run 2
Same as
Seq
Dup
1
1
1
f2mdMcf-fJs
f2mdMcf-fJs
1
1
2
2
WSauz5KVKTU
WSauz5KVKTU
2
Seq
1
3
3
zsSCUWs7k9Q
XYIUM5TkhMo
None
1
4
4
B5Q1J5c8oNc
zsSCUWs7k9Q
3
Seq
1
5
5
cUxIb3Pt-hQ
B5Q1J5c8oNc
4
Seq
1
6
6
4yyOOn7pWnA
LDjE28szwr8
None
1
7
7
3N46jQ0Xi3c
cUxIb3Pt-hQ
5
Seq
1
8
8
08dBVz8_VzU
4yyOOn7pWnA
6
Seq
...
1
13
13
oeKIe1ik2O8
e1rQ8YwNSDs
11
Seq
1
14
14
FrG_eSECfps
RVB2JreIcoc
12
Seq
1
15
15
pPQCwz2q96o
oeKIe1ik2O8
13
Seq
1
16
16
uo3KuoEiu3I
pPQCwz2q96o
15
NOT
1
17
17
0U6aIwd5h9s
uo3KuoEiu3I
16
Seq
...
1
47
47
ShGsW68zbIo
iu9rhqsvrPs
46
Seq
1
48
48
0q0xS7W78KQ
ShGsW68zbIo
47
Seq
1
49
49
UheJQsXOAnY
0q0xS7W78KQ
48
Seq
Dup
1
50
50
H8AcqOh0wis
H8AcqOh0wis
50
NOT
Dup
2
1
51
EWq3-2VuqbQ
0q0xS7W78KQ
48
NOT
Dup
2
2
52
scuTZza4f_o
H8AcqOh0wis
50
NOT
Dup
2
3
53
bJWJW-mz4_U
UheJQsXOAnY
49
NOT
2
4
54
Ii4VYsh9OlM
EWq3-2VuqbQ
51
NOT
2
5
55
r2-OGUu57pU
scuTZza4f_o
52
Seq
2
6
56
8KTnu18Mi9Q
bJWJW-mz4_U
53
Seq
2
7
57
DconsfGsXyA
Ii4VYsh9OlM
54
Seq
2
8
58
UttEvLJP3l8
8KTnu18Mi9Q
56
NOT
2
9
59
GJOOLH9ZP2I
DconsfGsXyA
57
Seq
2
10
60
ewgmg9Q5Ab8
UttEvLJP3l8
58
Seq
...
13
35
635
qHpR_p8lA4I
FFVOzo7tSV8
639
Seq
13
36
636
DplwDDZNTRc
76IBjdM9s6g
640
Seq
13
37
637
3AObqGsimr8
qEh0uZuu7_U
None
13
38
638
88keQ4PWH18
RhfGJduOlrw
641
Seq
13
39
639
FFVOzo7tSV8
QxzH9QkirCU
643
NOT
13
40
640
76IBjdM9s6g
Qsgz4GbL8O4
None
13
41
641
RhfGJduOlrw
BSgg7mEzfqY
644
Seq
13
42
642
lVEqwV0Nlzg
VcmjbJ2q8-w
645
Seq
13
43
643
QxzH9QkirCU
gOU0BCL-TXs
None
13
44
644
BSgg7mEzfqY
IoOXQUcW24s
646
Seq
13
45
645
VcmjbJ2q8-w
o4_2_a6LzFU
647
Seq
Dup
14
1
646
IoOXQUcW24s
o4_2_a6LzFU
647
NOT
Dup
14
2
647
o4_2_a6LzFU
ijVPcGaqVjc
648
Seq
14
3
648
ijVPcGaqVjc
nk3FlgEuG-s
649
Seq
14
4
649
nk3FlgEuG-s
27ZLFn8Dejg
None
The last three columns have the following meanings:
Same as: If an ID from Run 2 is the same as an ID from Run 1, then this column has the # overall for Run 1.
Seq: Indicates whether the number in column "Same as" is one more than the previous number in that column.
Dup: Indicates whether an ID from Run 2 occurred more than once in that run.
Problems:
The videos XYIUM5TkhMo, LDjE28szwr8, qEh0uZuu7_U, Qsgz4GbL8O4, gOU0BCL-TXs, and 27ZLFn8Dejg were returned as #3, 6, 637, 640, 643, and 649 in Run 2, but were not returned at all in Run 1.
The videos FrG_eSECfps, r2-OGUu57pU, lVEqwV0Nlzg were returned as #14, 55, 642, in Run 1, but were not in Run 2.
The videos 0q0xS7W78KQ, H8AcqOh0wis, and o4_2_a6LzFU were returned as #49, 50, and 645 in Run 2, but then each appears a second time in that run (as well as appearing in Run 1 as #48, 50, and 647).
These results are troubling. They mean that no single search will return a reliable list of videos for a given value of q.
I mentioned at the beginning that previous questions about inconsistent results from the YouTube Data API had answers that seemed to resolve those inconsistencies. Is there a way to do that for this search? Is there something wrong with the way I'm composing the search that is causing the problem?
If there isn't a way to fix the search, then I suppose the only way to get a list of videos on the topic with high confidence of it being complete is to run the search multiple times and merge the results until no new IDs appear that were not in a previous result set. But even then, one would not know if there are other videos lurking undetected.

Get a list of function results until result > x

I basically want the same thing as this OP:
Is there a J idiom for adding to a list until a certain condition is met?
But I cant get the answers to work with OP's function or my own.
I will rephrase the question and write about the answers at the bottom.
I am trying to create a function that will return a list of fibonacci numbers less than 2.000.000. (without writing "while" inside the function).
Here is what i have tried:
First, i picked a way to culculate fibonacci numbers from this site:
https://code.jsoftware.com/wiki/Essays/Fibonacci_Sequence
fib =: (i. +/ .! i.#-)"0
echo fib i.10
0 1 1 2 3 5 8 13 21 34
Then I made an arbitrary list I knew was larger than what I needed. :
fiblist =: (fib i.40) NB. THIS IS A BAD SOLUTION!
Finally, I removed the numbers that were greater than what I needed:
result =: (fiblist < 2e6) # fiblist
echo result
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1.34627e6
This gets the right result, but is there a way to avoid using some arbitrary number like
40 in "fib i.40" ?
I would like to write a function, such that "func 2e6" returns the list of fibonacci numbers below 2.000.000. (without writing "while" inside the function).
echo func 2e6
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1.34627e6
here are the answers from the other question:
first answer:
2 *^:(100&>#:])^:_"0 (1 3 5 7 9 11)
128 192 160 112 144 176
second answer:
+:^:(100&>)^:(<_) ] 3
3 6 12 24 48 96 192
As I understand it, I just need to replace the functions used in the answers, but i dont see how
that can work. For example, if I try:
echo (, [: +/ _2&{.)^:(100&>#:])^:_ i.2
I get an error.
I approached it this way. First I want to have a way of generating the nth Fibonacci number, and I used f0b from your link to the Jsoftware Essays.
f0b=: (-&2 +&$: -&1) ^: (1&<) M.
Once I had that I just want to put it into a verb that will check to see if the result of f0b is less than a certain amount (I used 1000) and if it was then I incremented the input and went through the process again. This is the ($:#:>:) part. $: is Self-Reference. The right 0 argument is the starting point for generating the sequence.
($:#:>: ^: (1000 > f0b)) 0
17
This tells me that the 17th Fibonacci number is the largest one less than my limit. I use that information to generate the Fibonacci numbers by applying f0b to each item in i. ($:#:>: ^: (1000 > f0b)) 0 by using rank 0 (fob"0)
f0b"0 i. ($:#:>: ^: (1000 > f0b)) 0
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
In your case you wanted the ones under 2000000
f0b"0 i. ($:#:>: ^: (2000000 > f0b)) 0
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1346269
... and then I realized that you wanted a verb to be able to answer your original question. I went with dyadic where the left argument is the limit and the right argument generates the sequence. Same idea but I was able to make use of some hooks when I went to the tacit form. (> f0b) checks if the result of f0b is under the limit and ($: >:) increments the right argument while allowing the left argument to remain for $:
2000000 (($: >:) ^: (> f0b)) 0
32
fnum=: (($: >:) ^: (> f0b))
2000000 fnum 0
32
f0b"0 i. 2000000 fnum 0
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1346269
I have little doubt that others will come up with better solutions, but this is what I cobbled together tonight.

Proper way to set corARMA() correlation structure for lme {nlme}

I have a time series with temperature measures every 5-minutes over ca. 5-7 days. I'm looking to set the correlation structure for my model as I have considerable temporal autocorrelation. I've decided that moving averages would be the best form, but I am unsure what to specify within the correlation = corARMA(q=?) part of the model. Here is the following output for ACF(m1):
lag ACF
1 0 1.000000000
2 1 0.906757430
3 2 0.782992821
4 3 0.648405513
5 4 0.506600300
6 5 0.369248402
7 6 0.247234208
8 7 0.139716028
9 8 0.059351579
10 9 -0.009968973
11 10 -0.055269347
12 11 -0.086383590
13 12 -0.108512009
14 13 -0.114441343
15 14 -0.104985321
16 15 -0.089398656
17 16 -0.070320370
18 17 -0.051427604
19 18 -0.028491302
20 19 0.005331508
21 20 0.044325557
22 21 0.083718759
23 22 0.121348020
24 23 0.143549745
25 24 0.151540265
26 25 0.146369313
It appears that there is highly significant autocorrelation in the first ca. 7 lags. See also the attached images: 1[Residuals] & 2[Model]
Would this mean I set the correlation = corARMA(q=7)?

torch / lua: retrieving n-best subset from Tensor

I have following code now, which stores the indices with the maximum score for each question in pred, and convert it to string.
I want to do the same for n-best indices for each question, not just single index with the maximum score, and convert them to string. I also want to display the score for each index (or each converted string).
So scores will have to be sorted, and pred will have to be multiple rows/columns instead of 1 x nqs. And corresponding score value for each entry in pred must be retrievable.
I am clueless as to lua/torch syntax, and any help would be greatly appreciated.
nqs=dataset['question']:size(1);
scores=torch.Tensor(nqs,noutput);
qids=torch.LongTensor(nqs);
for i=1,nqs,batch_size do
xlua.progress(i, nqs)
r=math.min(i+batch_size-1,nqs);
scores[{{i,r},{}}],qids[{{i,r}}]=forward(i,r);
end
tmp,pred=torch.max(scores,2);
answer=json_file['ix_to_ans'][tostring(pred[{i,1}])]
print(answer)
Here is my attempt, I demonstrate its behavior using a simple random scores tensor:
> scores=torch.floor(torch.rand(4,10)*100)
> =scores
9 1 90 12 62 1 62 86 46 27
7 4 7 4 71 99 33 48 98 63
82 5 73 84 61 92 81 99 65 9
33 93 64 77 36 68 89 44 19 25
[torch.DoubleTensor of size 4x10]
Now, since you want the N best indexes for each question (row), let's sort each row of the tensor:
> values,indexes=scores:sort(2)
Now, let's look at what the return tensors contain:
> =values
1 1 9 12 27 46 62 62 86 90
4 4 7 7 33 48 63 71 98 99
5 9 61 65 73 81 82 84 92 99
19 25 33 36 44 64 68 77 89 93
[torch.DoubleTensor of size 4x10]
> =indexes
2 6 1 4 10 9 5 7 8 3
2 4 1 3 7 8 10 5 9 6
2 10 5 9 3 7 1 4 6 8
9 10 1 5 8 3 6 4 7 2
[torch.LongTensor of size 4x10]
As you see, the i-th row of values is the sorted version (in increasing order) of the i-th row of scores, and each row in indexes gives you the corresponding indexes.
You can get the N best values/indexes for each question (i.e. row) with
> N_best_indexes=indexes[{{},{indexes:size(2)-N+1,indexes:size(2)}}]
> N_best_values=values[{{},{values:size(2)-N+1,values:size(2)}}]
Let's see their values for the given example, with N=3:
> return N_best_indexes
7 8 3
5 9 6
4 6 8
4 7 2
[torch.LongTensor of size 4x3]
> return N_best_values
62 86 90
71 98 99
84 92 99
77 89 93
[torch.DoubleTensor of size 4x3]
So, the k-th best value for question j is N_best_values[{{j},{values:size(2)-k+1}]], and its corresponding index in the scores matrix is given by this row, column values:
row=j
column=N_best_indexes[{{j},indexes:size(2)-k+1}}].
For example, the first best value (k=1) for the second question is 99, which lies at the 2nd row and 6th column in scores. And you can see that values[{{2},values:size(2)}}] is 99, and that indexes[{{2},{indexes:size(2)}}] gives you 6, which is the column index in the scores matrix.
Hope that I explained my solution well.

Resources