Combination table of elements from another table


I have a table of arguments which looks like this:
argument 1 | argument 2 | min | max
elem1 elem2 | elemA | 8 | 15
elem1 elem2 elem3 | elemB elemC elemD elemE | 16 | 32
elem1 elem2 elem3 | elemF elemG elemH elemI | 39 | 42
elem4 | elemF elemG elemH elemI | 42 | 83
Elements in the first and second columns are strings, for example:
elem1 = 'Rio de Janeiro', elem2 = 'Tokio', elemA = 'New York' and so on.
The min and max columns define a range and can both be treated as argument 3.
I am trying to build a table of every possible combination of the elements in the argument 1 column with arguments 2 and 3, like this:
argument 1 | argument 2 | min | max
elem1 | elemA | 8 | 15
elem2 | elemA | 8 | 15
elem1 | elemB | 16 | 32
elem1 | elemC | 16 | 32
elem1 | elemD | 16 | 32
elem1 | elemE | 16 | 32
elem2 | elemB | 16 | 32
elem2 | elemC | 16 | 32
elem2 | elemD | 16 | 32
elem2 | elemE | 16 | 32
elem3 | elemB | 16 | 32
elem3 | elemC | 16 | 32
elem3 | elemD | 16 | 32
elem3 | elemE | 16 | 32
elem1 | elemF | 39 | 42
elem1 | elemG | 39 | 42
elem1 | elemH | 39 | 42
elem1 | elemI | 39 | 42
elem2 | elemF | 39 | 42
elem2 | elemG | 39 | 42
elem2 | elemH | 39 | 42
elem2 | elemI | 39 | 42
elem3 | elemF | 39 | 42
elem3 | elemG | 39 | 42
elem3 | elemH | 39 | 42
elem3 | elemI | 39 | 42
elem4 | elemF | 42 | 83
elem4 | elemG | 42 | 83
elem4 | elemH | 42 | 83
elem4 | elemI | 42 | 83
I have no idea which formulas I should use.

In the example data, elem1 elem2 uses a space as a separator, but Rio de Janeiro also includes spaces. It is unclear whether the data is in a format that can be parsed programmatically.
If the real data does not include strings like Rio de Janeiro, or you are using separators other than spaces, try something like this to get started:
=arrayformula( split( query( flatten( split(A1:A10, " ") & "|" & B1:B10 & "|" & C1:C10 ), "where not Col1 starts with '|' ", 0 ), "|" ) )
Replace the " " with the separator character you are using.
The formula will expand column A but not column B. To expand column B as well, apply a similar formula to the results from the first one.
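Because the real values (for example Rio de Janeiro) contain spaces, a different separator inside each cell is what makes the data parseable. As a rough sketch of the complete expansion, done in one pass instead of the two passes the formula needs, here is the same logic in plain Python. It assumes each cell lists its elements separated by commas; `expand_combinations` and the comma separator are illustrative choices, not part of any spreadsheet.

```python
from itertools import product
from typing import Iterable, List, Tuple

Row = Tuple[str, str, int, int]

def expand_combinations(rows: Iterable[Row], sep: str = ",") -> List[Row]:
    """Pair every element of column 1 with every element of column 2,
    copying min and max unchanged. `sep` is an assumed cell separator."""
    out: List[Row] = []
    for arg1, arg2, lo, hi in rows:
        firsts = [s.strip() for s in arg1.split(sep) if s.strip()]
        seconds = [s.strip() for s in arg2.split(sep) if s.strip()]
        # product() varies the second element fastest, matching the desired
        # order: elem1/elemB, elem1/elemC, ..., elem2/elemB, ...
        for a, b in product(firsts, seconds):
            out.append((a, b, lo, hi))
    return out

table = [
    ("elem1, elem2", "elemA", 8, 15),
    ("elem1, elem2, elem3", "elemB, elemC, elemD, elemE", 16, 32),
    ("elem1, elem2, elem3", "elemF, elemG, elemH, elemI", 39, 42),
    ("elem4", "elemF, elemG, elemH, elemI", 42, 83),
]

for row in expand_combinations(table):
    print(*row, sep=" | ")
```

Each input row contributes (number of column 1 elements) × (number of column 2 elements) output rows, so the sample data yields 2 + 12 + 12 + 4 = 30 rows, matching the desired table.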

I am not sure if what I did will work for you, but it seems to work with the test data you uploaded. It is not one formula, but rather a few formulas that automatically update the "Final Result" when you change the test data.
Here is the link to the spreadsheet.
The formula in the first column under final result would be enough to get all of the combinations, but there is a hard cap of 50,000 characters on the CONCAT formula, which I do not know how to bypass.
EDIT:
I have changed the calculator to match what you were trying to achieve.
You will find the new version on the "EDIT" tab.
If you are using data that is different from the "Test Data" referenced in the formulas, just make sure to adjust the references in the formulas to fit your new data. I have also added these instructions to the spreadsheet.

Thanks to reddit user u/MattyPKing I found a solution.
For anyone interested, here is the spreadsheet he sent me.

Related

torch / lua: retrieving n-best subset from Tensor

I have the following code, which stores the index with the maximum score for each question in pred and converts it to a string.
I want to do the same for the n-best indices of each question, not just the single index with the maximum score, and convert them to strings. I also want to display the score for each index (or each converted string).
So scores will have to be sorted, and pred will have to contain multiple columns instead of one per question. The corresponding score value for each entry in pred must also be retrievable.
I am clueless about Lua/Torch syntax, and any help would be greatly appreciated.
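nqs=dataset['question']:size(1);
scores=torch.Tensor(nqs,noutput);
qids=torch.LongTensor(nqs);
-- fill scores batch by batch: row i holds the scores of question i
for i=1,nqs,batch_size do
  xlua.progress(i, nqs)
  r=math.min(i+batch_size-1,nqs);
  scores[{{i,r},{}}],qids[{{i,r}}]=forward(i,r);
end
-- tmp holds each row's maximum score, pred its column index (both nqs x 1)
tmp,pred=torch.max(scores,2);
-- the loop variable i only exists inside a for loop, so loop over the questions again
for i=1,nqs do
  answer=json_file['ix_to_ans'][tostring(pred[{i,1}])]
  print(answer)
end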

Here is my attempt; I demonstrate its behavior using a simple random scores tensor:
> scores=torch.floor(torch.rand(4,10)*100)
> =scores
9 1 90 12 62 1 62 86 46 27
7 4 7 4 71 99 33 48 98 63
82 5 73 84 61 92 81 99 65 9
33 93 64 77 36 68 89 44 19 25
[torch.DoubleTensor of size 4x10]
Now, since you want the N best indexes for each question (row), let's sort each row of the tensor:
> values,indexes=scores:sort(2)
Now, let's look at what the return tensors contain:
> =values
1 1 9 12 27 46 62 62 86 90
4 4 7 7 33 48 63 71 98 99
5 9 61 65 73 81 82 84 92 99
19 25 33 36 44 64 68 77 89 93
[torch.DoubleTensor of size 4x10]
> =indexes
2 6 1 4 10 9 5 7 8 3
2 4 1 3 7 8 10 5 9 6
2 10 5 9 3 7 1 4 6 8
9 10 1 5 8 3 6 4 7 2
[torch.LongTensor of size 4x10]
As you can see, the i-th row of values is the sorted version (in increasing order) of the i-th row of scores, and each row in indexes gives you the corresponding indexes.
You can get the N best values/indexes for each question (i.e. row) with
> N_best_indexes=indexes[{{},{indexes:size(2)-N+1,indexes:size(2)}}]
> N_best_values=values[{{},{values:size(2)-N+1,values:size(2)}}]
Let's see their values for the given example, with N=3:
> return N_best_indexes
7 8 3
5 9 6
4 6 8
4 7 2
[torch.LongTensor of size 4x3]
> return N_best_values
62 86 90
71 98 99
84 92 99
77 89 93
[torch.DoubleTensor of size 4x3]
So, the k-th best value for question j is values[{j,values:size(2)-k+1}] (equivalently N_best_values[{j,N-k+1}]), and its position in the scores matrix is given by these row and column values:
row=j
column=indexes[{j,indexes:size(2)-k+1}] (equivalently N_best_indexes[{j,N-k+1}]).
For example, the best value (k=1) for the second question is 99, which lies in the 2nd row and 6th column of scores. You can check that values[{2,values:size(2)}] is 99 and that indexes[{2,indexes:size(2)}] gives 6, which is the column index in the scores matrix.
Hope that I explained my solution well.
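For readers who want to check the indexing outside Torch, the same sort-then-take-the-last-N idea can be sketched in plain Python. This is an illustration, not Torch code: `n_best` is a hypothetical helper, indexes are returned 1-based to match Lua, and ties keep their original column order (Python's sort is stable), which happens to agree with the Torch output above for this example.

```python
from typing import List, Tuple

def n_best(scores: List[List[float]], n: int) -> Tuple[List[List[int]], List[List[float]]]:
    """For every row, return the 1-based column indexes and the values of its
    n largest scores, in increasing order (like the Torch example)."""
    best_indexes, best_values = [], []
    for row in scores:
        # Column positions sorted by score, ascending, like scores:sort(2);
        # Python's sort is stable, so ties keep their column order.
        order = sorted(range(len(row)), key=lambda c: row[c])
        top = order[max(len(order) - n, 0):]           # last n = n best
        best_indexes.append([c + 1 for c in top])     # 1-based, as in Lua
        best_values.append([row[c] for c in top])
    return best_indexes, best_values

scores = [
    [9, 1, 90, 12, 62, 1, 62, 86, 46, 27],
    [7, 4, 7, 4, 71, 99, 33, 48, 98, 63],
    [82, 5, 73, 84, 61, 92, 81, 99, 65, 9],
    [33, 93, 64, 77, 36, 68, 89, 44, 19, 25],
]
idx, vals = n_best(scores, 3)
for i, v in zip(idx, vals):
    print(i, v)
```

As in the Torch version, the k-th best entry of row j is the (N-k+1)-th element of that row's list.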

Problems Implementing AR, ARMA, and possibly more complex timeseries models in pymc3 using theano.scan

I am trying to implement a simple ARMA model but have serious difficulties getting it to run. When I add a parameter to the error term, everything works fine (see the return x_m1 + a*e statement, commented out below). However, if I add a parameter to the autoregressive part, I get a FloatingPointError, LinAlgError or PositiveDefiniteError, depending on the initialization method I use.
The code is also put into a gist you can find here. The model definition is replicated here:
with pm.Model() as model:
    a = pm.Normal("a", 0, 1)
    sigma = pm.Exponential('sigma', 0.1, testval=F(.1))
    e = pm.Normal("e", 0, sigma, shape=(N-1,))
    def x(e, x_m1, a):
        # return x_m1 + a*e
        return a*x_m1 + e
    x, updates = theano.scan(
        fn=x,
        sequences=[e],
        outputs_info=[tt.as_tensor_variable(data.iloc[0])],
        non_sequences=[a]
    )
    x = pm.Deterministic('x', x)
    lam = pm.Exponential('lambda', 5.0, testval=F(.1))
    y = pm.StudentT("y", mu=x, lam=lam, nu=1, observed=data.values[1:])
with model:
    trace = pm.sample(2000, init="NUTS", n_init=1000)
Here are the errors for each initialization method:
"ADVI" / "ADVI_MAP": FloatingPointError: NaN occurred in ADVI optimization.
"MAP": LinAlgError: 35-th leading minor not positive definite
"NUTS": PositiveDefiniteError: Scaling is not positive definite. Simple check failed. Diagonal contains negatives. Check indexes [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71]
For details on the error messages, please look at this github issue posted at pymc3.
To be explicit, I would really like to have a scan-like solution that is easily extendable to, for instance, a full ARMA model. I know that the presented AR(1) model can be represented without scan by defining logP, as already done in pymc3/distributions/timeseries.py#L18-L46, but I was not able to extend this vectorized style to a full ARMA model. Using theano.scan seems preferable to me.
Any help is highly appreciated!
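To see what state a scan step has to carry, the recursion can be written as a plain Python loop. This is only an illustration of the arithmetic, not pymc3 or theano code: `ar1_scan` mirrors the step a*x_m1 + e from the question (starting from an initial value such as data.iloc[0] and producing one value per element of e), while `arma11_scan` is a hypothetical ARMA(1,1) extension, x_t = a*x_{t-1} + e_t + b*e_{t-1}, with an assumed initial error e0.

```python
from typing import List, Sequence

def ar1_scan(e: Sequence[float], x0: float, a: float) -> List[float]:
    """Plain-Python version of the scan step a*x_m1 + e:
    x_t = a * x_{t-1} + e_t, starting from x0."""
    xs: List[float] = []
    x_prev = x0
    for e_t in e:                 # one step per element of e (N-1 in the question)
        x_prev = a * x_prev + e_t
        xs.append(x_prev)
    return xs

def arma11_scan(e: Sequence[float], x0: float, e0: float, a: float, b: float) -> List[float]:
    """Hypothetical ARMA(1,1) extension:
    x_t = a * x_{t-1} + e_t + b * e_{t-1}.
    Each step needs two pieces of state: the previous x and the previous e."""
    xs: List[float] = []
    x_prev, e_prev = x0, e0
    for e_t in e:
        x_t = a * x_prev + e_t + b * e_prev
        xs.append(x_t)
        x_prev, e_prev = x_t, e_t
    return xs

print(ar1_scan([1.0, 1.0, 1.0], x0=0.0, a=0.5))
```

The ARMA step differs from the AR step only in needing the previous error as well as the previous x, so whatever scan-based formulation is used, its step function would need access to both values.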

Autofill adjacent column based on header value

I have some monthly data running across a sheet that looks a bit like this:
Item Sep-15 Item Oct-15 Item Nov-15
SKU1 23 SKU1 43 SKU1 22
SKU2 43 SKU2 32 SKU2 34
SKU3 34 SKU3 44 SKU3 36
SKU4 32 SKU4 24 SKU4 45
As I want to run a query over the data I need to transpose the data from the three 'groups' of columns to one single column. I can do that fine with item and quantity data using query({A:A;C:C;E:E},"select * etc.
What I am also trying to do is bring in the heading of each value column (the date) as a third column, so that the data looks like this:
SKU1 23 Sep-15
SKU2 43 Sep-15
SKU3 34 Sep-15
SKU4 32 Sep-15
SKU1 43 Oct-15
SKU2 32 Oct-15
SKU3 44 Oct-15
SKU4 24 Oct-15
SKU1 22 Nov-15
SKU2 34 Nov-15
SKU3 36 Nov-15
SKU4 45 Nov-15
Any ideas on what combination of functions I can use to populate those date values?

To repeat the dates without using REPT (because of its inherent limitations: the maximum number of repetitions is 100), you could try:
=ArrayFormula({regexreplace(to_text(G3:G11), "\d+", G2&""); regexreplace(to_text(K3:K11), "\d+", K2&""); regexreplace(to_text(O3:O11), "\d+", O2&""); regexreplace(to_text(S3:S11), "\d+", S2&"")}+0)
Note: the formula above assumes that
the dates are in G2, K2, O2 and S2, and
the data runs from row 3 to row 11 (change to suit).
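The layout the question asks for is a wide-to-long reshape: every (Item, value) column group is stacked under the previous one, and the group's header is repeated on each of its rows. A minimal Python sketch of that reshaping, assuming groups of exactly two columns as in the sample (`unpivot` is a hypothetical helper name), reproduces the expected output row for row:

```python
from typing import List, Sequence, Tuple

def unpivot(header: Sequence[str], rows: Sequence[Sequence], group_width: int = 2) -> List[Tuple]:
    """Stack repeated (Item, value) column groups into one long list and
    attach each group's header (the month) as a third column."""
    out = []
    for start in range(0, len(header), group_width):
        month = header[start + 1]            # e.g. "Sep-15"
        for row in rows:
            out.append((row[start], row[start + 1], month))
    return out

header = ["Item", "Sep-15", "Item", "Oct-15", "Item", "Nov-15"]
rows = [
    ["SKU1", 23, "SKU1", 43, "SKU1", 22],
    ["SKU2", 43, "SKU2", 32, "SKU2", 34],
    ["SKU3", 34, "SKU3", 44, "SKU3", 36],
    ["SKU4", 32, "SKU4", 24, "SKU4", 45],
]
for item, qty, month in unpivot(header, rows):
    print(item, qty, month)
```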

Indy 10 IdTCPServer ReadBytes scrambling data

I am trying to use Indy 10 ReadBytes() in Delphi 2007 to read a large download consisting of a series of data segments, each preceded by a header formatted as [#bytes]\r\n, where #bytes indicates the number of bytes. My algorithm is:
1. Use ReadBytes() to get the [#]\r\n text, which is normally 10 bytes.
2. Use ReadBytes() to get the specified # data bytes.
3. Go to step 1 if more data segments need to be processed, i.e., if # is negative.
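The three steps above can be sketched in Python (not Delphi/Indy) to show the intended framing; the sketch reads from any blocking byte stream. It assumes that the header is a signed decimal count in square brackets followed by \r\n, that the absolute value of the count is the segment length, and that a negative count means another segment follows, as step 3 describes. All function names are hypothetical.

```python
import io
import re
from typing import BinaryIO, Iterator

# Assumed header format, e.g. b"[-08019]\r\n": a signed count in brackets.
HEADER_RE = re.compile(rb"\[(-?\d+)\]\r\n")

def read_exact(stream: BinaryIO, n: int) -> bytes:
    """Block until exactly n bytes have been read (or fail at end of stream)."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError(f"stream ended after {len(buf)} of {n} bytes")
        buf += chunk
    return bytes(buf)

def read_header(stream: BinaryIO) -> int:
    """Step 1: read up to and including \\r\\n and parse the signed count."""
    line = bytearray()
    while not line.endswith(b"\r\n"):
        line += read_exact(stream, 1)
    match = HEADER_RE.fullmatch(bytes(line))
    if match is None:
        raise ValueError(f"unexpected header {bytes(line)!r}")
    return int(match.group(1))

def read_segments(stream: BinaryIO) -> Iterator[bytes]:
    while True:
        count = read_header(stream)              # step 1
        yield read_exact(stream, abs(count))     # step 2
        if count >= 0:                           # step 3: negative means more follow
            break

data = b"[-00005]\r\nhello[00003]\r\nabc"
print(list(read_segments(io.BytesIO(data))))
```

With this structure, a header that does not parse raises ValueError immediately, so a desynchronized stream shows up at the first bad header instead of corrupting later segments.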
This works well, but frequently I don't get the expected text at step 1. Here's a short example after 330 successful data segments:
Data received from the last step 2 ReadBytes(). Note the embedded step 1 [-08019]\r\n text:
Line|A033164|B033164|C033164|D033164|E033164|F033164|G033164|H033164|EndL\r|Begin
Line|A033165|B033165|C033165|D033165|E033165|F033165|G033165|H033165|EndL\r|Begin
Line|A033166|B033166|C033166|D033166|E033166|F033166|G033166|H033166|EndL\r[-08019]
\r\n|Begin
Line|A033167|B033167|C033167|D033167|E033167|F033167|G033167|H033167|EndL\r|Begin
Line|A033168|B033168|C033168|D033168|E033168|F033168|G033168|H033168|EndL\r|Begin
Socket data captured by Wireshark:
0090 30 33 33 31 36 36 7c 42 30 33 33 31 36 36 7c 43 033166|B033166|C
00a0 30 33 33 31 36 36 7c 44 30 33 33 31 36 36 7c 45 033166|D033166|E
00b0 30 33 33 31 36 36 7c 46 30 33 33 31 36 36 7c 47 033166|F033166|G
00c0 30 33 33 31 36 36 7c 48 30 33 33 31 36 36 7c 45 033166|H033166|E
00d0 6e 64 4c 0d ndL.
No. Time Source Destination Protocol Length Info
2837 4.386336000 000.00.247.121 000.00.172.17 TCP 1514 40887 > 57006 [ACK] Seq=2689776 Ack=93 Win=1460 Len=1460
Frame 2837: 1514 bytes on wire (12112 bits), 1514 bytes captured (12112 bits) on interface 0
Ethernet II, Src: Cisco_60:4d:bf (e4:d3:f1:60:4d:bf), Dst: Dell_2a:78:29 (f0:4d:a2:2a:78:29)
Internet Protocol Version 4, Src: 000.00.247.121 (000.00.247.121), Dst: 000.00.172.17 (000.00.172.17)
Transmission Control Protocol, Src Port: 40887 (40887), Dst Port: 57006 (57006), Seq: 2689776, Ack: 93, Len: 1460
Data (1460 bytes)
0000 5b 2d 30 38 30 31 39 5d 0d 0a 7c 42 65 67 69 6e [-08019]..|Begin
0010 20 4c 69 6e 65 7c 41 30 33 33 31 36 37 7c 42 30 Line|A033167|B0
0020 33 33 31 36 37 7c 43 30 33 33 31 36 37 7c 44 30 33167|C033167|D0
0030 33 33 31 36 37 7c 45 30 33 33 31 36 37 7c 46 30 33167|E033167|F0
Does anyone know why this happens? Thanks
More information: we read from the socket in a single thread and don't call Connected() while reading. Here's the relevant code snippet:
AClientDebugSocketContext.Connection.Socket.ReadBytes(inBuffer,byteCount,True);
numBytes := Length(inBuffer);
Logger.WriteToLogFile('BytesToString: '+BytesToString(inBuffer,0,numBytes),0);
Move(inBuffer[0], Pointer(Integer(Buffer))^, numBytes);

Embedded data like you describe, especially at random times, usually happens when you read from the same socket in multiple threads at the same time without adequate synchronization between them. One thread may receive a portion of the incoming data, another thread may receive another portion, and they end up storing their data in the InputBuffer in the wrong order. It is hard to say for sure whether that is your problem, since you did not provide any code. The best option is to make sure you never read from the same socket in multiple threads at all. That includes any calls to Connected(), since it performs a read operation internally. You should do all of your reading within a single thread. If that is not an option, then at least wrap your socket I/O with some kind of inter-thread lock, such as a critical section or mutex.
Update: You are accessing a TIdContext object via your own AClientDebugSocketContext variable. Where is that code being used exactly? If it is not in the context of the server's OnConnect, OnDisconnect, OnExecute, or OnException events, then you are reading from the same socket across multiple threads, because TIdTCPServer internally calls Connected() (which does a read) in between calls to the OnExecute event for that TIdContext object.
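The locking advice above can be illustrated outside Delphi with a small Python sketch: several threads read from one shared stream, and each thread holds a lock for the whole header-plus-body read, so a segment can never be split between threads. The framing (a "[n]\r\n" header with a non-negative n followed by n bytes) is a simplified, hypothetical version of the format in the question, and `threading.Lock` stands in for a critical section or mutex.

```python
import io
import threading
from typing import BinaryIO, List, Optional

def read_one(stream: BinaryIO) -> Optional[bytes]:
    """Read one b"[n]\\r\\n" header and its n-byte body; None at end of stream."""
    header = b""
    while not header.endswith(b"\r\n"):
        ch = stream.read(1)
        if not ch:
            return None
        header += ch
    n = int(header[1:-3])        # strip b"[" and b"]\r\n"
    return stream.read(n)

def worker(stream: BinaryIO, lock: threading.Lock, results: List[bytes]) -> None:
    while True:
        # Header and body are read inside ONE critical section, so no other
        # thread can consume bytes in between; appending under the same lock
        # also keeps results in stream order.
        with lock:
            segment = read_one(stream)
            if segment is None:
                return
            results.append(segment)

def read_all(data: bytes, n_threads: int = 4) -> List[bytes]:
    stream = io.BytesIO(data)
    lock = threading.Lock()
    results: List[bytes] = []
    threads = [threading.Thread(target=worker, args=(stream, lock, results))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

payloads = [b"|Begin Line|A033166", b"|Begin Line|A033167", b"x" * 100]
data = b"".join(b"[%d]\r\n" % len(p) + p for p in payloads)
print(read_all(data) == payloads)
```

The important design choice is the scope of the lock: locking each individual read call separately would still let another thread slip in between a header and its body.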

gnuplot skips data file "with no valid points"

I've got a datafile with some values:
-55 471 485 500
-50 495 510 524
-40 547 562 576
-30 603 617 632
-20 662 677 691
-10 726 740 754
0 794 807 820
10 865 877 889
20 941 951 962
25 980 990 1000
30 1018 1029 1041
40 1097 1111 1125
50 1180 1196 1213
60 1266 1286 1305
70 1355 1378 1402
80 1447 1475 1502
90 1543 1575 1607
100 1642 1679 1716
110 1745 1786 1828
120 1849 1896 1943
125 1900 1950 2000
130 1950 2003 2056
140 2044 2103 2162
150 2124 2189 2254
When I call the following gnuplot script:
set terminal latex
set output 'foo.tex'
unset key
set format "%g"
set autoscale
set xlabel "Temperatur an $R_1$ [$^{{\degree}C}$]"
set ylabel 'Ladezeit [$ms$]'
f(r) =(log(1/3)*r*(47*(10e-6)))*-1
plot [-55:150] [0:3] '/some/path/res/kty_81-121.dat' using 1:(f($3)) with lines
gnuplot spits out a rather general warning: Skipping data file with no valid points. After hours of research into this problem I still have no answer.
Does someone know how to fix this?

When you divide an integer by an integer, gnuplot performs integer division, so 1/3 evaluates to 0. The argument of the log function is therefore zero, log(0) is undefined, and no point in the data file yields a valid value. Change the function as below:
f(r) =(log(1.0/3.0)*r*(47*(10e-6)))*-1
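The arithmetic can be checked quickly in Python, whose // operator performs the same integer division that gnuplot applies to 1/3. This is only a numerical check of the formula, not gnuplot code: `f` mirrors the corrected function (including the constant 47*(10e-6) exactly as written in the script) and `f_integer_division` mirrors the original one.

```python
import math

def f(r: float) -> float:
    """Corrected function: f(r) = (log(1.0/3.0)*r*(47*(10e-6)))*-1."""
    return (math.log(1.0 / 3.0) * r * (47 * (10e-6))) * -1

def f_integer_division(r: float) -> float:
    """Original function; gnuplot evaluates 1/3 as integer division, like 1 // 3 here."""
    return (math.log(1 // 3) * r * (47 * (10e-6))) * -1

print(round(f(990), 4))           # third-column value at 25 degrees
try:
    f_integer_division(990)
except ValueError as err:         # math.log(0) is outside the domain of log
    print("integer division:", err)
```

With the corrected function, the third column (485 to 2189) maps to roughly 0.25 to 1.13, which lies inside the [0:3] y-range of the plot command.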
