SPSS: How would I create a column summing up means / medians / range from Compare Means function?

I'm trying to sum across each row for several numerical variables that have been processed through the Compare Means function.
The table below (without the last 'Total' column) is what I have generated from Compare Means; I'm looking to generate the last Total column.
+--------+-------+-------+-------+-------+
| | Var 1 | Var 2 | Var 3 | Total |
+--------+-------+-------+-------+-------+
| Mean | 10 | 1 | 2 | |
| Median | 4 | 20 | 4 | |
| Range | 6 | 40 | 1 | |
| Std.dev| 3 | 3 | 3 | |
+--------+-------+-------+-------+-------+
Here's the syntax of my command:
MEANS TABLES=VAR_1 VAR_2 VAR_3
/CELLS=MEAN STDDEV MEDIAN RANGE.

Can't really imagine what the use is for summing these values, but forget about why - this is how:
The OMS command takes results from the output and puts them in a new dataset which you can then further analyse, as you requested.
DATASET DECLARE MyResults.
OMS /SELECT TABLES /IF COMMANDS=['Means'] SUBTYPES=['Report'] /DESTINATION FORMAT=SAV OUTFILE='MyResults' .
* now your original code.
MEANS TABLES=VAR_1 VAR_2 VAR_3 /CELLS=MEAN STDDEV MEDIAN RANGE.
* now your results are captured - we'll go see them.
omsend.
dataset activate MyResults.
* the results are now in a new dataset, which you can analyse.
compute total=sum(VAR_1, VAR_2, VAR_3).
exe.
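To sanity-check the totals outside SPSS, the same row-wise sum can be sketched in Python with only the standard library. This is a minimal illustration, not part of the SPSS workflow: the sample values and the `summarize` helper are hypothetical stand-ins for VAR_1, VAR_2, and VAR_3.

```python
import statistics

# Hypothetical sample data standing in for VAR_1, VAR_2, VAR_3.
data = {
    "VAR_1": [1, 2, 3, 4],
    "VAR_2": [10, 20, 30, 40],
    "VAR_3": [5, 5, 6, 8],
}

def summarize(values):
    """Return the same four statistics requested in /CELLS."""
    return {
        "Mean": statistics.mean(values),
        "Std. Deviation": statistics.stdev(values),  # sample standard deviation
        "Median": statistics.median(values),
        "Range": max(values) - min(values),
    }

# One column of statistics per variable, like the Means report table.
report = {name: summarize(values) for name, values in data.items()}

# The "Total" column: sum each statistic across the variables.
totals = {
    stat: sum(report[name][stat] for name in data)
    for stat in ("Mean", "Std. Deviation", "Median", "Range")
}

for stat, total in totals.items():
    print(f"{stat}: {total}")
```

Each entry in `totals` corresponds to one cell of the Total column that the `compute total=sum(...)` line produces in the captured dataset.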

Related

Create column taking number of values from one column and actual value from another column

I have two columns, A and B, where A contains the number of times the corresponding value in B should appear. I want to create column C that contains each value from B repeated as many times as A specifies. So for example:
| A | | B |
| 2 | | 40 |
| 3 | | 60 |
Should produce:
| C |
| 40 |
| 40 |
| 60 |
| 60 |
| 60 |
So 2 of 40 and 3 of 60. This could be in memory (I only want to use C in a formula, don't really need it as an actual column) or as its own column.
Try the formula below:
=ArrayFormula(TRANSPOSE(SPLIT(TEXTJOIN("#",TRUE,REPT(B1:B2&"#",A1:A2)),"#")))
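The same expansion can be sketched outside Sheets to make the logic explicit: pair each count from A with its value from B, repeat the value that many times, and flatten the result. The `expand` helper is a hypothetical name used only for illustration.

```python
from itertools import chain, repeat

def expand(counts, values):
    """Repeat each value as many times as its matching count says."""
    return list(
        chain.from_iterable(repeat(value, count) for count, value in zip(counts, values))
    )

column_a = [2, 3]    # how many times each value should appear
column_b = [40, 60]  # the values themselves

column_c = expand(column_a, column_b)
print(column_c)  # [40, 40, 60, 60, 60]
```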

How to forecast (or any other function) in Google Sheets with only one cell of data?

My sheet:
+---------+-----------+---------+---------+-----------+
| product | value 1 | value 2 | value 3 | value 4 |
+---------+-----------+---------+---------+-----------+
| name 1 | 700,000 | 500 | 10,000 | 2,000,000 |
+---------+-----------+---------+---------+-----------+
| name 2 | 200,000 | 800 | 20,000 | ? |
+---------+-----------+---------+---------+-----------+
| name 3 | 100,000 | 150 | 6,000 | ? |
+---------+-----------+---------+---------+-----------+
| name 4 | 1,000,000 | 1,000 | 25,000 | ? |
+---------+-----------+---------+---------+-----------+
| name 5 | 2,000,000 | 1,500 | 30,000 | ? |
+---------+-----------+---------+---------+-----------+
| name 6 | 2,500,000 | 3,000 | 65,000 | ? |
+---------+-----------+---------+---------+-----------+
| name 7 | 300,000 | 300 | 12,000 | ? |
+---------+-----------+---------+---------+-----------+
| name 8 | 350,000 | 200 | 9,000 | ? |
+---------+-----------+---------+---------+-----------+
| name 9 | 900,000 | 1,200 | 28,000 | ? |
+---------+-----------+---------+---------+-----------+
| name 10 | 150,000 | 100 | 5,000 | ? |
+---------+-----------+---------+---------+-----------+
What I am attempting is to predict the empty cells based on the data that I do have. Should I use all of the columns that contain data in every row, or focus on just one of them?
I have used FORECAST previously, but then I had more data in the column I was predicting values for; I think the lack of data is my root problem here. I'm not sure FORECAST is the best fit for this, so any recommendations for other functions are most welcome.
The last thing I can add is that the known value in column E (value 4) is a confident number, and ideally it's used in any formula that I end up with (although I am open to any other recommendations).
The formula I was using:
=FORECAST(D3,E2,$D$2:$D$11)
I don't think this is possible without more information. If you think about it, Value 4 can be a constant (always 2,000,000), be dependent on only one other value (say 200 times value 3), or be a complex formula (say add values 1, 2, and 3 with a constant). Each of these 3 models agrees with the values for name 1; however, they generate vastly different value 4 predictions.
In the case of name 2, the models would output the following for value 4:
Constant: 2,000,000
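Value 3: 4,000,000 (200 × 20,000)
Sum: 1,510,300 (200,000 + 800 + 20,000 + 1,289,500, where 1,289,500 is the constant that fits name 1)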
Without further constraints (additional data points, a specified kind of model, or probably both), each of those values could be valid.
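The three candidate models can be worked through in a short Python sketch, using the name 1 and name 2 rows exactly as they appear in the table in the question. The dictionary keys and variable names are hypothetical; each model is fitted to the single known row and then applied to name 2.

```python
# Rows copied from the table in the question.
name1 = {"value1": 700_000, "value2": 500, "value3": 10_000, "value4": 2_000_000}
name2 = {"value1": 200_000, "value2": 800, "value3": 20_000}

# Model 1: value 4 is a constant.
constant = name1["value4"]

# Model 2: value 4 is a fixed multiple of value 3.
factor = name1["value4"] / name1["value3"]  # 200.0

# Model 3: value 4 is values 1 + 2 + 3 plus a constant offset.
offset = name1["value4"] - (name1["value1"] + name1["value2"] + name1["value3"])

predictions = {
    "Constant": constant,
    "Value 3": factor * name2["value3"],
    "Sum": name2["value1"] + name2["value2"] + name2["value3"] + offset,
}

for model, value in predictions.items():
    print(f"{model}: {value:,.0f}")
```

All three models reproduce name 1 exactly, yet they disagree on name 2, which is why one known data point cannot identify the relationship.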

Mapping timeseries+static information into an ML model (XGBoost)

So let's say I have multiple probs, where one prob has two input DataFrames:
Input:
One constant stream of data (e.g. from a sensor). Second step: multiple streams from multiple sensors.
> df_prob1_stream1
timestamp | ident | measure1 | measure2 | total_amount |
----------------------------+--------+--------------+----------+--------------+
2019-09-16 20:00:10.053174 | A | 0.380 | 0.08 | 2952618 |
2019-09-16 20:00:00.080592 | A | 0.300 | 0.11 | 2982228 |
... (1 million more rows - until a pre-defined ts) ...
One static DataFrame of information, mapped to a unique identifier called ident, which needs to be matched to the ident column in each df_probX_streamX so that the system recognizes that the data is related.
> df_global
ident | some1 | some2 | some3 |
--------+--------------+----------+--------------+
A | LARGE | 8137 | 1 |
B | SMALL | 1234 | 2 |
Output:
A binary classifier [0,1]
So how can I suitably train XGBoost to make the best use of one timeseries DataFrame in combination with one static DataFrame (containing additional context information) in one prob? Any help would be appreciated.

How to set more conditions (targets) in the Time Series Node in SPSS Modeler?

Could you please advise whether it is possible to calculate predictions in SPSS Modeler when the model has two conditions?
That is, we need to calculate the future values for the respective ID, and at the same time we need to see the split per Var1.
So far we have used the Time Series node, but there we set just one target, Value (currency1). Would it be possible to have the output in a format where we get one figure as the prediction for the respective ID, with Var1 also reflected in the split? We need this split per Var1 because one ID can have several values in Var1, unlike Var3, where just one value is assigned to each ID.
ID | Value (currency1) | Value (currency2) | Period | Var1 | Var2 | Var3
---------------------------------------------------------------------------
U1 | 1000 | 1200 | 1/1/2000 | 100 | abc | 1p1
U1 | 500 | 600 | 2/1/2000 | 100 | abc | 1p1
U1 | 700 | 840 | 3/1/2000 | 200 | def | 1p1
U2 | 500 | 600 | 1/1/2000 | 100 | ghj | 1p2
U2 | 800 | 960 | 4/1/2000 | 300 | abc | 1p2
Thank you very much in advance for any help / advice.

SUM cells until current cell where value of another column is the same

I need to SUM all values in the column till the current cell, where the values in a different column are the same.
Example:
------------------------------------------------------
| FIRST | SECOND | SUM |
------------------------------------------------------
| VALUE A | NUMBER 1 | NUMBER 1 |
------------------------------------------------------
| VALUE A | NUMBER 2 | NUMBER 1 + NUMBER 2 |
------------------------------------------------------
| VALUE B | NUMBER 3 | NUMBER 3 |
-------------------------------------------------------
| VALUE B | NUMBER 4 | NUMBER 3 + NUMBER 4 |
-------------------------------------------------------
| VALUE B | NUMBER 5 | NUMBER 3 + NUMBER 4 + NUMBER 5 |
-------------------------------------------------------
The first column has strings, the second has numbers, and the third holds the results.
Write an IF statement to do this. Compare the value of the first column between the current and previous rows: if they have the same value, add the previous sum and the current value together; otherwise the sum becomes the current value.
In code, this would be the formula in C3:
=if(exact(A3,A2),C2+B3,B3)
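The same row-by-row rule can be sketched in Python to show what the formula does once it is filled down the column. The `running_group_sum` helper and the sample values are hypothetical. Like the formula, it only compares each row with the one directly above it, so it assumes rows with the same first-column value are adjacent.

```python
def running_group_sum(keys, numbers):
    """Running total that restarts whenever the key differs from the previous row."""
    sums = []
    for i, (key, number) in enumerate(zip(keys, numbers)):
        if i > 0 and key == keys[i - 1]:
            # Same group as the row above: previous sum + current value.
            sums.append(sums[-1] + number)
        else:
            # First row, or a new group: the sum restarts at the current value.
            sums.append(number)
    return sums

first = ["VALUE A", "VALUE A", "VALUE B", "VALUE B", "VALUE B"]
second = [1, 2, 3, 4, 5]

print(running_group_sum(first, second))  # [1, 3, 3, 7, 12]
```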
