I have two columns, A and B, where A contains the number of times the corresponding value in B should be repeated. I want to create a column C that repeats each value from B as many times as A specifies. For example:
| A | | B |
| 2 | | 40 |
| 3 | | 60 |
Should produce:
| C |
| 40 |
| 40 |
| 60 |
| 60 |
| 60 |
So 2 of 40 and 3 of 60. This could be in memory (I only want to use C in a formula, don't really need it as an actual column) or as its own column.
Try the formula below:
=ArrayFormula(TRANSPOSE(SPLIT(TEXTJOIN("#",TRUE,REPT(B1:B2&"#",A1:A2)),"#")))
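If it helps to see what the formula does step by step, here is a rough Python sketch of the same idea. The function name `expand` is made up for illustration, and the sample data is the 2 x 40 / 3 x 60 example from the question. Note that the split step returns strings here, whereas Sheets' SPLIT turns numeric-looking text back into numbers.

```python
def expand(counts, values, sep="#"):
    """Mimic the REPT -> TEXTJOIN -> SPLIT chain of the sheet formula."""
    # REPT(B & "#", A): repeat "value#" as many times as the count says
    repeated = [(str(v) + sep) * n for n, v in zip(counts, values)]
    # TEXTJOIN("#", TRUE, ...): join the pieces, ignoring empty ones
    joined = sep.join(piece for piece in repeated if piece)
    # SPLIT(..., "#"): split on the separator and drop empty strings
    return [part for part in joined.split(sep) if part]


counts = [2, 3]
values = [40, 60]

print(expand(counts, values))

# The same result without the string round trip, keeping the original types:
direct = [v for n, v in zip(counts, values) for _ in range(n)]
print(direct)
```

The separator only has to be a character that never occurs in column B; "#" is what the formula above uses.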
My sheet:
+---------+-----------+---------+---------+-----------+
| product | value 1 | value 2 | value 3 | value 4 |
+---------+-----------+---------+---------+-----------+
| name 1 | 700,000 | 500 | 10,000 | 2,000,000 |
+---------+-----------+---------+---------+-----------+
| name 2 | 200,000 | 800 | 20,000 | ? |
+---------+-----------+---------+---------+-----------+
| name 3 | 100,000 | 150 | 6,000 | ? |
+---------+-----------+---------+---------+-----------+
| name 4 | 1,000,000 | 1,000 | 25,000 | ? |
+---------+-----------+---------+---------+-----------+
| name 5 | 2,000,000 | 1,500 | 30,000 | ? |
+---------+-----------+---------+---------+-----------+
| name 6 | 2,500,000 | 3,000 | 65,000 | ? |
+---------+-----------+---------+---------+-----------+
| name 7 | 300,000 | 300 | 12,000 | ? |
+---------+-----------+---------+---------+-----------+
| name 8 | 350,000 | 200 | 9,000 | ? |
+---------+-----------+---------+---------+-----------+
| name 9 | 900,000 | 1,200 | 28,000 | ? |
+---------+-----------+---------+---------+-----------+
| name 10 | 150,000 | 100 | 5,000 | ? |
+---------+-----------+---------+---------+-----------+
What I am attempting is to predict the missing values in column E (value 4) based on the data that I do have. Should I use all of the columns that contain data in every row, or focus on just one of them?
I have used FORECAST previously, but then I had more data in the column I was predicting values for; I think the lack of data is my root problem. I am not sure FORECAST is the best fit for this, so recommendations for other functions are most welcome.
The last thing I can add is that the known value in column E (value 4) is a number I am confident in, and ideally it is used in whatever formula I end up with (although I am open to other recommendations).
The formula I was using:
=FORECAST(D3,E2,$D$2:$D$11)
I don't think this is possible without more information. If you think about it, value 4 could be a constant (always 2,000,000), depend on only one other value (say, 200 times value 3), or follow a more complex formula (say, the sum of values 1, 2, and 3 plus a constant). Each of these three models agrees with the values for name 1, but they generate vastly different predictions for value 4.
In the case of name 2, the models would output the following for value 4:
Constant: 2,000,000
Value 3: 4,000,000
Sum: 1,510,300
Each of those values could be valid unless further constraints are provided (more data points, a specified kind of model, or probably both).
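To make the ambiguity concrete, here is a small Python sketch that fits each of the three candidate models to the single known row (name 1) and applies it to name 2. The dictionary keys and variable names are just for illustration; the numbers come from the sheet in the question.

```python
# Known row (name 1) and the row to predict (name 2), taken from the sheet.
name1 = {"v1": 700_000, "v2": 500, "v3": 10_000, "v4": 2_000_000}
name2 = {"v1": 200_000, "v2": 800, "v3": 20_000}

# Model 1: value 4 is a constant.
constant = name1["v4"]

# Model 2: value 4 is a fixed multiple of value 3.
ratio = name1["v4"] // name1["v3"]

# Model 3: value 4 = value 1 + value 2 + value 3 + offset.
offset = name1["v4"] - (name1["v1"] + name1["v2"] + name1["v3"])

predictions = {
    "Constant": constant,
    "Value 3": ratio * name2["v3"],
    "Sum": name2["v1"] + name2["v2"] + name2["v3"] + offset,
}

for model, value in predictions.items():
    print(f"{model}: {value:,}")
```

Every model reproduces name 1 exactly, because each has one free parameter and there is one data point to fit it to. That is why a single known value cannot tell the models apart.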
So let's say I have multiple probs, where one prob has two input DataFrames:
Input:
One constant stream of data (e.g. from a sensor). In a second step: multiple streams from multiple sensors.
> df_prob1_stream1
timestamp | ident | measure1 | measure2 | total_amount |
----------------------------+--------+--------------+----------+--------------+
2019-09-16 20:00:10.053174 | A | 0.380 | 0.08 | 2952618 |
2019-09-16 20:00:00.080592 | A | 0.300 | 0.11 | 2982228 |
... (1 million more rows - until a pre-defined ts) ...
One static DataFrame of information, keyed by a unique identifier called ident, which needs to be matched to the ident column in each df_probX_streamX so that the system recognizes that the data is related.
> df_global
ident | some1 | some2 | some3 |
--------+--------------+----------+--------------+
A | LARGE | 8137 | 1 |
B | SMALL | 1234 | 2 |
Output:
A binary classifier [0,1]
So how can I suitably train XGBoost to make the best use of one time-series DataFrame in combination with one static DataFrame (containing additional context information) in one prob? Any help would be appreciated.
Could you please advise whether it is possible to calculate predictions in SPSS Modeler when the model has two conditions, i.e. we need to calculate the future values for each ID and at the same time see the split per Var1?
So far we have used the Time Series node, but there we set just one target, Value (currency1). Would it be possible to get output with one predicted figure per ID that also reflects the split by Var1? We need this split because one ID can have several values in Var1, unlike Var3, where just one value is assigned to each ID.
ID | Value (currency1) | Value (currency2) | Period | Var1 | Var2 | Var3
---------------------------------------------------------------------------
U1 | 1000 | 1200 | 1/1/2000 | 100 | abc | 1p1
U1 | 500 | 600 | 2/1/2000 | 100 | abc | 1p1
U1 | 700 | 840 | 3/1/2000 | 200 | def | 1p1
U2 | 500 | 600 | 1/1/2000 | 100 | ghj | 1p2
U2 | 800 | 960 | 4/1/2000 | 300 | abc | 1p2
Thank you very much in advance for any help / advice.
I need to SUM all values in the column till the current cell, where the values in a different column are the same.
Example:
-------------------------------------------------------
| FIRST   | SECOND   | SUM                            |
-------------------------------------------------------
| VALUE A | NUMBER 1 | NUMBER 1                       |
-------------------------------------------------------
| VALUE A | NUMBER 2 | NUMBER 1 + NUMBER 2            |
-------------------------------------------------------
| VALUE B | NUMBER 3 | NUMBER 3                       |
-------------------------------------------------------
| VALUE B | NUMBER 4 | NUMBER 3 + NUMBER 4            |
-------------------------------------------------------
| VALUE B | NUMBER 5 | NUMBER 3 + NUMBER 4 + NUMBER 5 |
-------------------------------------------------------
The first column has strings, the second has numbers, and the third holds the results.
Write an IF statement to do this. Compare the value of the first column between the current and previous rows: if they are the same, add the previous sum and the current value together; otherwise the sum becomes the current value.
In code, this would be the formula in C3:
=if(exact(A3,A2),C2+B3,B3)
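For anyone who wants to check the logic outside the sheet, the same rule can be sketched in Python. The function name `running_group_sums` and the sample numbers are made up for illustration. Like the formula, it only compares each row with the one directly above it, so it assumes rows with the same first-column value sit next to each other.

```python
def running_group_sums(keys, numbers):
    """Running total that restarts whenever the key differs from the previous row."""
    sums = []
    for i, (key, number) in enumerate(zip(keys, numbers)):
        if i > 0 and key == keys[i - 1]:
            # Same group as the row above: previous sum + current value
            sums.append(sums[-1] + number)
        else:
            # First row or a new group: the sum restarts at the current value
            sums.append(number)
    return sums


first = ["VALUE A", "VALUE A", "VALUE B", "VALUE B", "VALUE B"]
second = [1, 2, 3, 4, 5]

print(running_group_sums(first, second))
```

The `==` comparison is case-sensitive, which matches what EXACT does in the formula.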