Copy variable between SPSS Datasets using Syntax - spss

EDITED based on feedback...
Is there a way to copy a variable from one open dataset to another in SPSS? What I have tried is to create a scratch variable that captures the value of the variable, and then use that scratch variable in a COMPUTE command in the next dataset:
DATASET ACTIVATE DataSet1.
COMPUTE #IDScratch = ID.
Dataset Activate DataSet2.
Compute ID = #IDScratch.
This fails because activating Dataset2 causes the scratch variable to be dropped from memory.
MATCH FILES and/or STAR JOIN syntax will work for most scenarios, but in my case Dataset1 has many more records than Dataset2 and the two datasets share no matching keys, so merging yields extra records.
My original question was "Is there a simple, direct way of copying a variable between datasets?" and the answer still appears to be that merging the files via syntax is the best/only method if using syntax.

Since SPSS version 21.0, the STAR JOIN command (see documentation here) allows you to use SQL syntax to join datasets. So basically, you could get only the variables you want from each dataset.
Assume your first dataset is called data_1 and has id and var_1a. Your second dataset is called data_2 and has the same id plus var_2a, and you just want to pull var_2a into the first dataset. If both datasets are open, you can run:
dataset activate data_1.
STAR JOIN
/SELECT t0.var_1a, t1.var_2a
/FROM * AS t0
/JOIN 'data_2' AS t1
ON t0.id=t1.id
/OUTFILE FILE=*.
The link I provided above has plenty of examples on how to join variables of files that are saved in your computer.
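For readers more used to general-purpose code, the join above behaves, as far as I understand it, roughly like a left outer join keyed on id: every case in data_1 is kept, and var_2a is filled in wherever data_2 has a matching id. A minimal Python sketch of that behavior, using made-up records and a hypothetical helper name (an illustration of the semantics, not SPSS code):

```python
# Illustration only: made-up rows standing in for data_1 and data_2.
data_1 = [
    {"id": 1, "var_1a": 10},
    {"id": 2, "var_1a": 20},
    {"id": 3, "var_1a": 30},
]
data_2 = [
    {"id": 1, "var_2a": "a"},
    {"id": 3, "var_2a": "c"},
]


def pull_variable(left, right, key, var):
    """Left outer join: keep every row of `left`, add `var` from `right` by `key`."""
    lookup = {row[key]: row[var] for row in right}
    # Rows without a match get None, much like a missing value.
    return [{**row, var: lookup.get(row[key])} for row in left]


merged = pull_variable(data_1, data_2, "id", "var_2a")
for row in merged:
    print(row)
```

The row count of the result always equals the row count of the left-hand data, which is why this kind of join only helps when a usable key exists in both datasets.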

Related

Structure of Data with Mixed ANOVA in SPSS

I was told that I need a 3x2x2 mixed ANOVA. I am relatively new to SPSS. I was wondering if someone could explain how the data needs to be structured in SPSS. Meaning, how would I structure the rows and columns?
I have 3 trials of data (each trial containing hundreds of measurements) with 2 treatment conditions (0 Volts and 15 Volts) and 2 different substrates on which I grew cells (TCPS and PCSA); this is the between-groups factor.
Please see this Infographic
I originally ran multiple T-tests but was told to do ANOVA, here is the original plot with t-tests (it gives an idea of how I originally intended to look at the data).
As a side note: if anyone knows how to do this with Python, I am also open to doing it that way. Just based on what I have gathered, SPSS seems to be the only route.
For anyone in the future: I was not able to do this in SPSS. However doing this in R is relatively simple. Put all data in one column of a CSV file (all 3 trials). For my case I had 3 additional columns of identifiers (Trial, Voltage, and Substrate). Open the file in R, store the data as a model, then run the ANOVA on the model with:
model1 <- lm(NeuriteLength ~ Trial + Substrate*Voltage, data = D)
anova(model1)
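The one-measurement-per-row layout described above can be produced with a short script. The following is a minimal Python sketch using only the standard library; the measurement values are invented, and the column names simply mirror the ones in the R formula:

```python
import csv
import io

# Made-up measurements: (trial, voltage, substrate) -> neurite lengths.
# Real data would have hundreds of values per combination.
raw = {
    (1, 0, "TCPS"): [12.1, 15.3],
    (1, 15, "TCPS"): [18.4, 17.9],
    (1, 0, "PCSA"): [11.0, 13.2],
    (1, 15, "PCSA"): [21.5, 19.8],
}

# Long format: one row per measurement, identifiers in their own columns.
rows = [
    {"Trial": trial, "Voltage": voltage, "Substrate": substrate, "NeuriteLength": value}
    for (trial, voltage, substrate), values in raw.items()
    for value in values
]

# StringIO stands in for a real file opened with open(..., "w", newline="").
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["Trial", "Voltage", "Substrate", "NeuriteLength"]
)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

The same long layout (one case per measurement, one column per factor) is also what SPSS would expect for between-groups factors.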

Add tag to existing data point in InfluxDB

Is there a way to add a tag to an existing entry in InfluxDB measurement? If not in the existing db measurement, is there a way to insert the records with a new tag into a new influx measurement?
Currently I have a set of measurements that should probably be entries in a single measurement where their current measurement names should be tag-keys in the superset of the merged measurements.
e.g.
show measurements
measurement1
measurement2
measurement3
measurement4
These should instead be tags on the data in each measurement, unioned to form a single measurement, joinedmeasurement, with indexed tags measurement1, measurement2,...
It would have to be done manually via queries.
For example, in python using the official client:
from influxdb import InfluxDBClient

client = InfluxDBClient('localhost', database='my_db')
measurement = 'measurement1'
db_data = client.query('select value from %s' % measurement)
# Tags must be a dict of key/value pairs; 'source' is a placeholder tag key.
data_to_write = [{'measurement': 'joinedmeasurement',
                  'tags': {'source': measurement},
                  'time': d['time'],
                  'fields': {'value': d['value']},
                  }
                 for d in db_data.get_points()]
client.write_points(data_to_write)
Repeat this for the rest of the measurements; the above can be run in a loop to do all of them in one go.
Consider using named fields in addition to tags, though. The above example only uses one field, but a point can have as many as you want.
This improves performance further, though fields are not indexed, so do not use them for data that queries need to filter on.
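The loop mentioned above could be sketched as follows. To keep the example runnable without a database, the query results are replaced by made-up rows; the helper name build_points and the tag key "source" are placeholders chosen for illustration, not part of the client library:

```python
def build_points(measurement, rows, target="joinedmeasurement", tag_key="source"):
    """Convert rows from one measurement into points for the merged measurement."""
    return [
        {
            "measurement": target,
            "tags": {tag_key: measurement},  # tags are key/value pairs
            "time": row["time"],
            "fields": {"value": row["value"]},
        }
        for row in rows
    ]


# Made-up query results keyed by measurement name; in practice each list
# would come from client.query(...).get_points().
results = {
    "measurement1": [{"time": "2020-01-01T00:00:00Z", "value": 1.0}],
    "measurement2": [
        {"time": "2020-01-01T00:00:00Z", "value": 2.0},
        {"time": "2020-01-01T00:01:00Z", "value": 3.0},
    ],
}

points = []
for name, rows in results.items():
    points.extend(build_points(name, rows))

print(len(points))
# client.write_points(points) would then write everything in one call.
```

Storing the old measurement name as a tag value under a single tag key keeps it indexed, so queries can still filter on it after the merge.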

How do I programmatically merge cases from datasets with conflicting variable names?

I want to add cases from many SPSS datasets to one SPSS dataset.
Here's my code:
DATASET ACTIVATE DataSet1.
ADD FILES /FILE=*
/FILE='Path\to\dataset.sav'.
EXECUTE.
But I get this error: Mismatched variable types on the input files.
I want SPSS to ignore the conflicting columns and add cases only from the columns where there is no conflict.
How do I do this?
This occurs because variables of the same name in the two data sources either have different format types (STRING, NUMERIC, DATE, etc.) or are both strings but of different lengths.
The latter case, string variables of different lengths, can be solved like this:
DATA LIST FREE / V(A1).
BEGIN DATA.
a b c
END DATA.
DATASET NAME DS1.
DATA LIST FREE / V(A2).
BEGIN DATA.
1 2 3
END DATA.
DATASET NAME DS2.
STATS ADJUST WIDTHS VARIABLES=ALL WIDTH=MAX /FILES DS1 DS2.
DATASET ACTIVATE DS1.
ADD FILES FILE=* /FILE=DS2.
However, a mismatch of format types is more complicated to solve because of the many possible permutations, so you would probably want to assess which variables are problematic and harmonize or delete them before merging files. This exercise is probably worth carrying out regardless, since variables with the same name but different format types could be a sign of erroneous data.
If you know which variables conflict, you can use the KEEP subcommand to select the others, or you can use the RENAME command to assign new names and adjust the results afterwards.
If you need to harmonize the names and the issue is something like differing string lengths for variables that should be the same, the STATS ADJUST WIDTHS extension command can harmonize the widths before you merge.
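The "assess which variables are problematic" step can be reasoned about as a comparison of the two variable dictionaries. Here is a minimal Python sketch of that logic, assuming the variable names, types, and widths have already been exported somehow; the dictionaries and the helper name below are invented for illustration:

```python
# Hypothetical variable dictionaries: name -> (type, width).
ds1_vars = {"id": ("numeric", 8), "name": ("string", 10), "date": ("date", 11)}
ds2_vars = {"id": ("numeric", 8), "name": ("string", 20), "date": ("string", 10)}


def classify_conflicts(a, b):
    """Split shared variables into width-only mismatches and type mismatches."""
    width_only, type_mismatch = [], []
    for name in sorted(a.keys() & b.keys()):
        (type_a, width_a), (type_b, width_b) = a[name], b[name]
        if type_a != type_b:
            type_mismatch.append(name)
        elif type_a == "string" and width_a != width_b:
            width_only.append(name)
    return width_only, type_mismatch


width_only, type_mismatch = classify_conflicts(ds1_vars, ds2_vars)
print("fixable by adjusting widths:", width_only)
print("need manual harmonizing:", type_mismatch)
```

Variables in the first list are candidates for the width adjustment shown earlier; those in the second list are the ones to rename, drop, or convert before merging.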

Can SPSS treat a collection of Nominal Variables as one variable?

I have a lot of movie data from IMDB and I'm in the middle of cleaning up the data and making it so that 1 row = 1 movie as the database often has multiple records for a single film.
I've restructured the data so that what was a single 'Country' variable with multiple cases for a single film, is now a set of 29 country columns. A single film may have up to 29 countries affiliated with it (most have just 1 or 2).
I plan to do some simple descriptive statistics and expected frequencies, perhaps look for correlations with other variables like genre etc.
Is it possible to have SPSS treat all 29 variables as a single variable? It doesn't matter which of the country variables a country is present in, just that it is present in one of them. For example I might want to find all Indian films, and ask SPSS to check for each row, whether 'India' is in any one of the country variables and return the row if it is present in any of them.
Is this possible, or do I just need to manually instruct SPSS with a list of OR commands whenever I run a query.
There are two types of multiple response sets: multiple dichotomy, which would be 29 yes/no variables as you describe, and multiple category, in which you have a list of arbitrary categories. See the MRSETS command for details.
Once defined, CTABLES can do all your statistical calculations on these, and these sets can also be used in graphics constructed in the Chart Builder or GGRAPH commands.
Don't confuse the sets created by MRSETS with the older MULTIPLE RESPONSE procedure, which is still available. MRSETS definitions persist with the data and are used with CTABLES and GGRAPH only.
With the ANY function, as Andy said above, you would use the individual variables, but you can use TO. So, for example, you could write
COMPUTE FILM7 = ANY(7, f1 TO f29).
if you have MC variables. If using the MD structure, you would have to check, say, variable f7 in this example.
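To make the logic of the ANY check concrete, here is a small Python sketch of the same idea for the multiple-category layout: flag a row if the target code appears in any of the country columns. The data, the shortened column list (f1 to f3 instead of f1 to f29), and the helper name are all made up for illustration:

```python
# Made-up multiple-category data: each film lists country codes in f1..f3;
# None marks an unused slot.
films = [
    {"title": "A", "f1": 7, "f2": 2, "f3": None},
    {"title": "B", "f1": 1, "f2": None, "f3": None},
    {"title": "C", "f1": 3, "f2": 5, "f3": 7},
]
country_vars = ["f1", "f2", "f3"]


def any_match(row, value, variables):
    """Return 1 if `value` appears in any of the listed variables, else 0."""
    return int(any(row[v] == value for v in variables))


for film in films:
    film["FILM7"] = any_match(film, 7, country_vars)

print([film["title"] for film in films if film["FILM7"] == 1])
```

It does not matter which column holds the code, only that at least one of them does, which is the behavior the question asks for.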

What is "rows in all the tables" in a for each loop?

When setting up a for each loop to read products from an "objProduct" object variable, I got three options in the "Enumerator Mode" pane, as the snapshot shows:
I know "Rows in the first table" is the right option for the current case. However, I'm curious about the scenarios in which the second and third options would be used.
It seems that the "ADO Object Source Variable" will contain multiple tables if the 2nd or 3rd option is applied. That's confusing... shouldn't one variable be regarded as one table, so that only the first option is needed?
P.S.
I did some research and only MSDN sheds some light, as quoted below, but it is not quite clear when these options apply and for what purpose.
**Rows in all tables (ADO.NET dataset only)**
Select to enumerate rows in all tables. This option is available only if the objects to enumerate are all members of the same ADO.NET dataset.
**All tables (ADO.NET dataset only)**
Select to enumerate tables only.
Let's say that you execute the following SQL in an Execute SQL Task (using an ADO.NET connection) and you store the full result set in an SSIS Object variable.
select * from
(select 1 as id, 'test' as description) resultSet1
;
select * from
(select 2 as anotherId, 'test2' as description union
select 3 as anotherId, 'test3' as description) resultSet2
That object is actually a System.Data.DataSet, which can contain multiple result sets (accessible via the Tables property). Each of those result sets is a System.Data.DataTable object. Within each result set (or System.Data.DataTable) you have rows.
The Rows in all tables (ADO.NET dataset only) and All tables (ADO.NET dataset only) options can be used when you need to iterate through all the result sets (instead of just the first one). The difference between the two is what objects are being enumerated over.
Rows in all tables (ADO.NET dataset only) - take all the rows of data returned from the SQL above and go through them one by one, mapping the column values to variables specified in your Variable Mappings. For the example above, you would have 3 total iterations (3 total rows). This behavior in a Script Task would look something like this:
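As a rough, language-neutral model of that enumeration (plain Python, with a list of lists standing in for the DataSet; this is not Script Task code, and the variable names are invented for illustration):

```python
# A list of tables stands in for the DataSet; each table is a list of row dicts.
dataset = [
    [{"id": 1, "description": "test"}],            # resultSet1
    [{"anotherId": 2, "description": "test2"},
     {"anotherId": 3, "description": "test3"}],    # resultSet2
]

iterations = 0
for table in dataset:
    for row in table:
        # The loop body runs once per row, across every table.
        iterations += 1
        print(row)

print(iterations)
```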
All tables (ADO.NET dataset only) - take all the result sets from the SQL above and go through them one by one, mapping the result set to the variable specified in Variable Mappings. For the example above, you would have 2 total iterations (2 total result sets). This behavior in a Script Task would look something like this:
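Again as a rough model only (plain Python with a list of lists standing in for the DataSet, not Script Task code; names are invented for illustration), enumerating tables rather than rows looks like this:

```python
# A list of tables stands in for the DataSet; each table is a list of row dicts.
dataset = [
    [{"id": 1, "description": "test"}],            # resultSet1
    [{"anotherId": 2, "description": "test2"},
     {"anotherId": 3, "description": "test3"}],    # resultSet2
]

iterations = 0
for table in dataset:
    # The loop body runs once per table; the whole table is handed over.
    iterations += 1
    print(len(table), "row(s) in this table")

print(iterations)
```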
I've never had the need to use either one of these options, so I can't provide any specific scenarios where I've used them.
