Proper way to set corARMA() correlation structure for lme {nlme}

I have a time series of temperature measurements taken every 5 minutes over ca. 5-7 days. I want to set the correlation structure for my model because there is considerable temporal autocorrelation. I've decided that a moving-average structure would be the best form, but I am unsure what to specify in the correlation = corARMA(q=?) part of the model. Here is the output of ACF(m1):
lag ACF
1 0 1.000000000
2 1 0.906757430
3 2 0.782992821
4 3 0.648405513
5 4 0.506600300
6 5 0.369248402
7 6 0.247234208
8 7 0.139716028
9 8 0.059351579
10 9 -0.009968973
11 10 -0.055269347
12 11 -0.086383590
13 12 -0.108512009
14 13 -0.114441343
15 14 -0.104985321
16 15 -0.089398656
17 16 -0.070320370
18 17 -0.051427604
19 18 -0.028491302
20 19 0.005331508
21 20 0.044325557
22 21 0.083718759
23 22 0.121348020
24 23 0.143549745
25 24 0.151540265
26 25 0.146369313
It appears that there is highly significant autocorrelation in the first ca. 7 lags. See also the attached images: 1[Residuals] & 2[Model]
Would this mean I set the correlation = corARMA(q=7)?


I want to create a crosstab to count the total number of projects in each year and the active projects

I am new to Cognos. I have data for all projects, and I need to create a table or crosstab that counts the total projects in each year and how many of them are active, cancelled, and inactive.
I have tried using a crosstab, but without success.
ProjectId Status Date
1589 Active 8/29/2018
1566 Inactive 4/17/2018
1042 Cancelled 1/6/2014
1374 Completed 1/20/2015
1543 Completed 8/4/2014
1065 Cancelled 7/15/2014
1397 Completed 10/1/2012
1520 Inactive 4/13/2017
1420 Completed 1/1/2015
1443 Completed 1/1/2015
1048 Cancelled 10/16/2014
1002 Active 2/6/2017
1357 Completed 1/19/2017
1606 Active 11/6/2018
The output should look like this:
New Projects Active Cancelled/Terminated/Inactive Carried Forward
2013 32 45 4 11 30
2014 45 75 17 14 44
2015 46 90 25 21 44
2016 30 74 27 10 37
2017 82 119 11 26 82
2018 86 168 29 24 115
2019 23 138 9 4 125
Working with three data items: ProjectId, Status, Date.
Ideally you already have a data item for the year. If not, derive the year with
extract(year, Date)
Create one calculated data item per count. For example, this is the one for active projects:
if (status = 'Active') then (1) else (0)
In the properties of each calculated item, make sure the aggregation is set to Total.
Adding these items as columns should then give you the counts.
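For readers who want to check the counting logic outside Cognos, here is a minimal sketch in Python with pandas. It is not Cognos syntax. The names projects, Year, counts, and Total are made up for the illustration, and only a few rows from the question's sample data are used. It covers only the per-status counts per year, not the "New" or "Carried Forward" columns, whose definitions the question does not give.

```python
import pandas as pd

# A few rows copied from the question's sample data.
projects = pd.DataFrame(
    {
        "ProjectId": [1589, 1566, 1042, 1065, 1520, 1002, 1606],
        "Status": ["Active", "Inactive", "Cancelled", "Cancelled",
                   "Inactive", "Active", "Active"],
        "Date": ["8/29/2018", "4/17/2018", "1/6/2014", "7/15/2014",
                 "4/13/2017", "2/6/2017", "11/6/2018"],
    }
)

# Counterpart of extract(year, Date).
projects["Year"] = pd.to_datetime(projects["Date"], format="%m/%d/%Y").dt.year

# Counterpart of the per-status 1/0 flags aggregated with Total:
# one row per year, one column per status.
counts = pd.crosstab(projects["Year"], projects["Status"])
counts["Total"] = counts.sum(axis=1)
print(counts)
```

Each status column here plays the role of one calculated 1/0 data item summed per year.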

torch / lua: retrieving n-best subset from Tensor

I have the following code, which stores the index with the maximum score for each question in pred and converts it to a string.
I want to do the same for the n-best indices for each question, not just the single index with the maximum score, and convert them to strings. I also want to display the score for each index (or each converted string).
So the scores will have to be sorted, and pred will have to hold several entries per question instead of a single one. The corresponding score for each entry in pred must also be retrievable.
I am new to Lua/Torch syntax, and any help would be greatly appreciated.
nqs=dataset['question']:size(1);
scores=torch.Tensor(nqs,noutput);
qids=torch.LongTensor(nqs);
for i=1,nqs,batch_size do
xlua.progress(i, nqs)
r=math.min(i+batch_size-1,nqs);
scores[{{i,r},{}}],qids[{{i,r}}]=forward(i,r);
end
tmp,pred=torch.max(scores,2);
answer=json_file['ix_to_ans'][tostring(pred[{i,1}])]
print(answer)
Here is my attempt. I demonstrate its behavior using a small random scores tensor:
> scores=torch.floor(torch.rand(4,10)*100)
> =scores
9 1 90 12 62 1 62 86 46 27
7 4 7 4 71 99 33 48 98 63
82 5 73 84 61 92 81 99 65 9
33 93 64 77 36 68 89 44 19 25
[torch.DoubleTensor of size 4x10]
Now, since you want the N best indexes for each question (row), let's sort each row of the tensor:
> values,indexes=scores:sort(2)
Now, let's look at what the return tensors contain:
> =values
1 1 9 12 27 46 62 62 86 90
4 4 7 7 33 48 63 71 98 99
5 9 61 65 73 81 82 84 92 99
19 25 33 36 44 64 68 77 89 93
[torch.DoubleTensor of size 4x10]
> =indexes
2 6 1 4 10 9 5 7 8 3
2 4 1 3 7 8 10 5 9 6
2 10 5 9 3 7 1 4 6 8
9 10 1 5 8 3 6 4 7 2
[torch.LongTensor of size 4x10]
As you see, the i-th row of values is the sorted version (in increasing order) of the i-th row of scores, and each row in indexes gives you the corresponding indexes.
You can get the N best values/indexes for each question (i.e. row) with
> N_best_indexes=indexes[{{},{indexes:size(2)-N+1,indexes:size(2)}}]
> N_best_values=values[{{},{values:size(2)-N+1,values:size(2)}}]
Let's see their values for the given example, with N=3:
> return N_best_indexes
7 8 3
5 9 6
4 6 8
4 7 2
[torch.LongTensor of size 4x3]
> return N_best_values
62 86 90
71 98 99
84 92 99
77 89 93
[torch.DoubleTensor of size 4x3]
So, the k-th best value for question j is values[{{j},{values:size(2)-k+1}}] (equivalently, N_best_values[{{j},{N-k+1}}] for k <= N), and its position in the scores matrix is given by this row and column:
row=j
column=indexes[{{j},{indexes:size(2)-k+1}}]
For example, the best value (k=1) for the second question is 99, which lies at the 2nd row and 6th column of scores. You can see that values[{{2},{values:size(2)}}] is 99, and that indexes[{{2},{indexes:size(2)}}] gives 6, which is the column index in the scores matrix.
I hope this explains my solution well.
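The same sort-then-slice idea can be sketched in Python with NumPy, which may help readers check the logic outside Torch. This is an illustration under assumptions, not Torch code: the names order, n_best_indexes, and n_best_values are made up here, indexes are 0-based rather than 1-based, and the results are ordered best-first rather than ascending.

```python
import numpy as np

# The example scores tensor from the answer above.
scores = np.array([
    [ 9,  1, 90, 12, 62,  1, 62, 86, 46, 27],
    [ 7,  4,  7,  4, 71, 99, 33, 48, 98, 63],
    [82,  5, 73, 84, 61, 92, 81, 99, 65,  9],
    [33, 93, 64, 77, 36, 68, 89, 44, 19, 25],
], dtype=float)

N = 3

# Sort each row in descending order by sorting the negated scores.
# kind="stable" keeps the earlier column first when scores tie.
order = np.argsort(-scores, axis=1, kind="stable")

# Column indexes (0-based) of the N best scores per question, best first.
n_best_indexes = order[:, :N]

# The scores that belong to those indexes, in the same layout.
n_best_values = np.take_along_axis(scores, n_best_indexes, axis=1)

print(n_best_indexes)
print(n_best_values)
```

With this layout, the k-th best score for question j (both 0-based) is n_best_values[j, k], and n_best_indexes[j, k] is its column in scores.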

How to QUERY a Spreadsheet by Row value

I have a table in my spreadsheet like this:
FEB MAR APR MAY
10 14 7 13
12 9 8 19
15 11 14 16
I want to use this data in another table. Specifically, I want the other table to compare two months by pulling in their columns by month name:
FEB APR
10 7
12 8
15 14
What I tried was
=QUERY(AnotherTable!1:1001; "SELECT * WHERE Row2 = 'FEB'")
but it doesn't seem to work.
Any thoughts?
You might be able to use a FILTER formula instead:
=FILTER(AnotherTable!1:1001;AnotherTable!2:2="FEB")
or, to return both months (adding the two conditions acts as a logical OR):
=FILTER(AnotherTable!1:1001;((AnotherTable!2:2="FEB")+(AnotherTable!2:2="APR")))
Alternatively, use the built-in TRANSPOSE function twice: once to flip the source data so that the months become rows, and once to flip the result back. Because QUERY filters rows by column value, the formula then references a column (Col1) instead of a row.
The resulting formula is
=TRANSPOSE(QUERY(TRANSPOSE(A:D),"Select * where Col1='FEB' OR Col1='APR'"))
Applying the above formula to the following source data
FEB MAR APR MAY
10 14 7 13
12 9 8 19
15 11 14 16
will return the following result
FEB APR
10 7
12 8
15 14
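To sanity-check the expected result outside the spreadsheet, the same column selection by header name can be sketched in Python with pandas. The names table, wanted, and comparison are made up for this illustration; it is not a Sheets formula.

```python
import pandas as pd

# The source table from the question, with month names as column headers.
table = pd.DataFrame(
    {
        "FEB": [10, 12, 15],
        "MAR": [14, 9, 11],
        "APR": [7, 8, 14],
        "MAY": [13, 19, 16],
    }
)

# Months to compare.
wanted = ["FEB", "APR"]

# Keep only the columns whose header is one of the wanted months,
# mirroring the header test used by the FILTER and QUERY formulas.
comparison = table.loc[:, table.columns.isin(wanted)]
print(comparison)
```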

how to join two pandas dataframe on specific column

I have a first pandas dataframe, which looks like this:
order_id buyer_id caterer_id item_id qty_purchased
387 139 1 7 3
388 140 1 6 3
389 140 1 7 3
390 36 1 9 3
391 79 1 8 3
391 79 1 12 3
391 79 1 7 3
392 72 1 9 3
392 72 1 9 3
393 65 1 9 3
394 65 1 10 3
395 141 1 11 3
396 132 1 12 3
396 132 1 15 3
397 31 1 13 3
404 64 1 14 3
405 146 1 15 3
And the second dataframe looks like this:
item_id meal_type
6 Veg
7 Veg
8 Veg
9 NonVeg
10 Veg
11 Veg
12 Veg
13 NonVeg
14 Veg
15 NonVeg
16 NonVeg
17 Veg
18 Veg
19 NonVeg
20 Veg
21 Veg
I want to join these two data frames on the item_id column, so that the final data frame contains meal_type wherever item_id has a match.
I am doing the following in Python:
pd.merge(segments_data,meal_type,how='left',on='item_id')
But it gives me all NaN values.
Check the dtypes of the item_id column in both data frames.
If they differ, cast one of them, because the join keys need the same dtype. A column that looks numeric is sometimes actually stored as strings.
If both are strings, casting both to int may help, because stray whitespace in the values can also prevent matches:
segments_data['item_id'] = segments_data['item_id'].astype(int)
meal_type['item_id'] = meal_type['item_id'].astype(int)
pd.merge(segments_data,meal_type,how='left',on='item_id')
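A small self-contained sketch of that check and fix follows. The data is a cut-down, hypothetical version of the question's frames, with item_id deliberately stored as padded strings in meal_type to reproduce the assumed cause; the real frames may differ.

```python
import pandas as pd

# Cut-down stand-ins for the question's data frames.
segments_data = pd.DataFrame(
    {"order_id": [387, 388, 390], "item_id": [7, 6, 9]}
)
# Assumption for this sketch: item_id arrived as strings with stray spaces.
meal_type = pd.DataFrame(
    {"item_id": [" 6", "7 ", "9"], "meal_type": ["Veg", "Veg", "NonVeg"]}
)

# Step 1: compare the dtypes of the join keys.
print(segments_data["item_id"].dtype, meal_type["item_id"].dtype)

# Step 2: strip whitespace and cast so both keys are integers.
meal_type["item_id"] = meal_type["item_id"].str.strip().astype(int)
segments_data["item_id"] = segments_data["item_id"].astype(int)

# Step 3: the left join now finds a match for every order line.
merged = pd.merge(segments_data, meal_type, how="left", on="item_id")
print(merged)
```

If the keys are left with mismatched dtypes, older pandas versions return NaN for the joined columns, while recent versions refuse the merge with an error, so checking dtypes first is worthwhile either way.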

How to fetch string using lua pattern matching

Below is my string:
local Amount =[[
Customer Details Net Amount
# Seq Name
Amount NTR
1 CDABCDEFGHIJ00564
0,1234
2 CDABCDEFGHIJ00565
0,0361
3 CDABCDEFGHIJ00566
0,0361
4 CDABCDEFGHIJ00567
0,0722
5 CDABCDEFGHIJ00568
0,0000
6 CDABCDEFGHIJ00569
0,0000
7 CDABCDEFGHIJ00570
0,0000
8 CDABCDEFGHIJ00571
0,7091
9 CDABCDEFGHIJ00572
1,4240
10 CDABCDEFGHIJ00573
0,0361
11 CDABCDEFGHIJ00574
0,5790
12 CDABCDEFGHIJ00575
0,4060
13 CDABCDEFGHIJ00576
0,3610
14 CDABCDEFGHIJ00577
0,6859
15 CDABCDEFGHIJ00578
0,2888
16 CDABCDEFGHIJ00579
0,0000
17 CDABCDEFGHIJ00580
0,0000
18 CDABCDEFGHIJ00581
0,0000
19 CDABCDEFGHIJ00582
0,0000
20 CDABCDEFGHIJ00583
0,0000
21 CDABCDEFGHIJ00584
0,0000
22 CDABCDEFGHIJ00585
0,8978
23 CDABCDEFGHIJ00586
0,0000
24 CDABCDEFGHIJ00587
2,3882
25 CDABCDEFGHIJ00588
0,0000
26 CDABCDEFGHIJ00589
2,0216
27 CDABCDEFGHIJ00590
1,7540
28 CDABCDEFGHIJ00591
0,0000
29 CDABCDEFGHIJ00592
0,0722
30 CDABCDEFGHIJ00593
0,0361
31 CDABCDEFGHIJ00594
0,0000
32 CDABCDEFGHIJ00595
0,0000
Total NAT files
11,9269
Direct inquiries to:
]]
I am running the code below:
local ptrn = '\n([%d%p]+)\n'
for val1, val2 in string.gmatch(Amount, ptrn) do
print ("val1:=\t" .. (val1 or '').."\tval2:=\t"..(val2 or ''))
end
Basically, from the above string I want to fetch the last 5 digits of the name (00564 in the first record) into val1 and the amount (0,1234) into val2, all with a single pattern. Each record starts with a sequence number. For example, this is the first record (row):
1 CDABCDEFGHIJ00564
0,1234
and this is the second record (row), and so on:
2 CDABCDEFGHIJ00565
0,0361
Please help.
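It seems to me that %d+%s+%a+(%d+)\n%s*([%d,]+) should do the trick. The first %d+ matches the row number, and %s+ matches the white space after it. %a+(%d+) matches CDABCDEFGHIJ00592 and captures the digits at the end. Lua patterns have no repetition count such as {5}, so to require exactly five digits you would write %d five times in place of %d+. \n%s* matches the newline and any white space at the start of the next line, and ([%d,]+) captures the amount, including the comma. Since the pattern has two captures, string.gmatch returns both values on each iteration, which is what the val1, val2 loop expects.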
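The same two-capture idea can be sketched in Python with the re module, mainly to show the structure of the match. This is not Lua: the regex syntax differs (for example, \d{5} is available here), and the sample text is a shortened copy of the question's string.

```python
import re

# Shortened copy of the string from the question.
amount = """Customer Details Net Amount
# Seq Name
Amount NTR
1 CDABCDEFGHIJ00564
0,1234
2 CDABCDEFGHIJ00565
0,0361
Total NAT files
11,9269
"""

# Row number, white space, letters, then five captured digits;
# a newline, optional white space, then the captured amount.
pattern = re.compile(r"\d+\s+[A-Za-z]+(\d{5})\n\s*([\d,]+)")

# findall returns one (digits, amount) tuple per record.
records = pattern.findall(amount)
for val1, val2 in records:
    print("val1:=\t" + val1 + "\tval2:=\t" + val2)
```

The total line is skipped because "Total NAT files" does not start with a row number followed by letters and digits.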
