OMS for UNIANOVA isn't working for Estimated Marginal Means? - spss

Using SPSS v.26, I am trying to use OMS to generate a new dataset that contains the grand mean and the estimated marginal means for the independent variable from the UNIANOVA command. The dependent variable is 'WSoPSS', the independent variable is 'PathwayID', and the covariate is 'P1Cov'. Syntax as follows:
DATASET DECLARE Run01.
OMS
/SELECT TABLES
/IF COMMANDS=['UNIANOVA'] SUBTYPES=[' Estimated Marginal Means']
/DESTINATION FORMAT=SAV NUMBERED=TableNumber_
OUTFILE='Run01' VIEWER=YES
/TAG='Run01'.
UNIANOVA WSoPSS BY PathwayID WITH P1Cov
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/EMMEANS=TABLES(OVERALL) WITH(P1Cov=MEAN)
/EMMEANS=TABLES(PathwayID) WITH(P1Cov=MEAN) COMPARE ADJ(LSD)
/PRINT ETASQ DESCRIPTIVE
/CRITERIA=ALPHA(.05)
/DESIGN=P1Cov PathwayID.
OMSEND tag = ['Run01'].
The analysis and output are all fine, but the OMS request triggers an error: "OMS cannot produce the requested dataset or file. In SAV format, all tables selected must have the same number of columns. The number of columns in table Estimates does not match the number of columns in the previous tables (2:1)."
This is the entirety of the syntax I am running. Hours of searching the IBM manuals haven't revealed an explanation, so any assistance would be greatly appreciated, cheers.

The problem is you have two EMMEANS subcommands in your test:
/EMMEANS=TABLES(OVERALL) WITH(P1Cov=MEAN)
/EMMEANS=TABLES(PathwayID) WITH(P1Cov=MEAN) COMPARE ADJ(LSD)
The first one produces the table for the grand mean, and the second produces a table of estimates by PathwayID. The two tables have a different number of columns (the second has an extra column for PathwayID), and that is what prevents OMS from stacking them into one SAV dataset.
Assuming you only need the second table, delete the first of the two EMMEANS subcommands and the OMS request will work fine.

Related

Several lines for 1 case in a dataset that has to be analysed in SPSS

I have a large dataset in Excel. Many cases are spread over several lines, and I need to move the cells that differ between those lines into separate columns on a single line per case. I have very basic knowledge of SQL and PySpark (Python). I was told this should be possible with syntax in SPSS. The cells contain text. At this point I have 26 columns, 5 of which generate the extra lines. For example, diagnosis: some patients have more than one diagnosis, which gives several lines for that patient.
Can someone help me out?
I searched, but didn't find the right solution. I found a lot on pivot tables, but that is not what I need, since I need to move information to a new column.
e.g.: medical diagnosis could become med_diagn1, med_diagn2,... Same for nursing problem. This is now 1 column and can become vpd1, vpd2,...
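The reshaping described above (several lines per case turned into numbered columns) is what the SPSS CASESTOVARS command is meant for. Since the question mentions some Python, here is a minimal sketch of the same idea in plain Python. The function name, the column names and the sample rows are all made up for illustration and are not from the original dataset.

```python
from collections import defaultdict

def long_to_wide(rows, id_col, value_col, prefix):
    """Collapse repeated rows per case into numbered columns (prefix1, prefix2, ...)."""
    values = defaultdict(list)
    for row in rows:
        value = row[value_col]
        # Skip blanks and exact repeats so each value appears once per case.
        if value and value not in values[row[id_col]]:
            values[row[id_col]].append(value)
    # Every case gets the same number of columns; short cases are padded with "".
    width = max((len(v) for v in values.values()), default=0)
    wide = []
    for case_id, vals in values.items():
        record = {id_col: case_id}
        for i in range(width):
            record[f"{prefix}{i + 1}"] = vals[i] if i < len(vals) else ""
        wide.append(record)
    return wide

# Hypothetical sample data: patient P1 has two diagnoses, P2 has one.
rows = [
    {"patient_id": "P1", "diagnosis": "asthma"},
    {"patient_id": "P1", "diagnosis": "diabetes"},
    {"patient_id": "P2", "diagnosis": "fracture"},
]
wide = long_to_wide(rows, "patient_id", "diagnosis", "med_diagn")
```

The same function could be called once per multi-valued column (for example with prefix "vpd" for the nursing problems) and the results merged on the case identifier.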

Tableau has two data tables, how can I make a filter apply to only a single data table

I currently have two data tables, each linked to a date table. The data tables are from Salesforce. I can calculate the number of a certain case type per quarter without issue. I can also calculate the running sum over quarters to show the instrument install base increasing. I want to divide the number of cases per quarter by the install base. This calculation works, but when I apply a filter to see different types of cases per instrument, the filter impacts the install base as well. I would like to keep the install base consistent. I tried different LOD expressions, but no luck. Any suggestions on filters and LOD, and where to place them in Tableau, would be beneficial.
One option is to use a parameter for filtering and then have a calculated field that changes based on the parameter value in one table but not in the other table. However, this type of filter would affect all worksheets that use the same data.

Google Sheets Cross Join Function Tables with More than Two Columns

The crossJoin function posted by @Max Makhrov in the thread below works almost completely for what I was hoping to achieve. It was written in response to cross joining two columns, and I tried joining two tables, one with two columns and one with five columns. It works, but only partially:
1) The delimiter of the column data is stuck as a comma ",". This could be problematic for values that contain commas. The delimiter variable in the function only applies between the two ranges being joined.
2) If the column being joined is a date, for example, it seems to expand to the full date text including the time zone and is fixed as text. Is there a way to keep it as non-text so it can be formatted? Even when it's parsed using the split() function it's definitely still text.
3) It runs into the error "Result of JOIN is longer than the limit of 50,000 characters".
Below is a link to the example input and output. The first output example is a standard cross join. The other is the actual desired output which filters for any data rows where the date in column 5 is greater than or equal to the date in column 2.
https://docs.google.com/spreadsheets/d/1FGS8lYyy60AH49Qyug8Uxaey5jxDksihOks7ll8Hq10/edit?usp=drivesdk
Your spreadsheet is View Only, so I can't demo it there, but try this. On the demo sheet, start a new tab, then put this formula in cell A2.
Happy to walk you through it a bit if it works. Otherwise, maybe make the sample editable so I can troubleshoot with you in the same place?
=ARRAYFORMULA(QUERY({HLOOKUP({"A","B"},{"A","B";Sheet1!A5:B},SEQUENCE(COUNTA(Sheet1!D5:D)*COUNTA(Sheet1!A5:A),1,0)/COUNTA(Sheet1!D5:D)+2),HLOOKUP({"D","E","F","G"},{"D","E","F","G";Sheet1!D5:G},MOD(SEQUENCE(COUNTA(Sheet1!D5:D)*COUNTA(Sheet1!A5:A),1,0),COUNTA(Sheet1!D5:D))+2)},"where Col2>=Col5"))
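For anyone who wants to see the logic outside of Sheets: the desired output described in the question is a cross join followed by a filter that keeps rows where the date in column 5 is on or after the date in column 2. Below is a rough Python sketch of that idea. The function name and the sample rows are invented for illustration, and the right-hand table has three columns here only to keep the example short. Because nothing is joined into text, dates stay as dates and commas inside values are harmless.

```python
from datetime import date
from itertools import product

def cross_join(left, right, keep=lambda row: True):
    """Pair every left row with every right row, keeping rows that pass `keep`."""
    combined = (l + r for l, r in product(left, right))
    return [row for row in combined if keep(row)]

# Hypothetical sample tables: rows are tuples, dates are real date objects.
left = [("A", date(2021, 1, 1)), ("B", date(2021, 3, 1))]
right = [("x", 1, date(2021, 2, 1)), ("y", 2, date(2020, 12, 1))]

# Standard cross join: every combination of a left row and a right row.
all_rows = cross_join(left, right)

# Filtered join: column 5 (index 4) must be on or after column 2 (index 1).
filtered = cross_join(left, right, keep=lambda row: row[4] >= row[1])
```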

SPSS Frequency Plot Complication

I am having a hard time generating precisely the frequency table I am looking for using SPSS.
The data in question: cases (n = ~800) with categorical variables DX_n (n = 1-15), each containing ICD9 codes, many of which are the same code. I would like to create a frequency table that groups the DX_n variables such that I can view frequency of every diagnosis in this sample of cases.
The next step is to test the hypothesis that the clustering of diagnoses in this sample is different than that of another. If you have any advice as to how to test this, that would be really appreciated as well!
Thanks!
Edit: My attempts:
1) Analyze -> Descriptive Statistics -> Frequencies; then add variables DX_n (1-15) and display frequency charts. The output is frequencies of each ICD9 code per DX_n variable (so 15 tables are generated - I'm hoping to just have one grouped table).
2) I tried adjusting the output format to organize by variable and also to compare variables but neither option gives the output I'm looking for.
I think what you are looking for is CTABLES. It can do parallel columns of frequencies, and it includes a column proportions test that can show whether the distributions differ.
Thank you, JKP! You set me on exactly the right track. I'm not sure how I overlooked that menu. Just to clarify in case anyone else comes along needing to figure this out:
Group diagnosis variables into a multiple response set using Analyze > Custom Tables > Multiple Response Sets. Code the variables as categories.
http:// i.imgur.com/ipE9suf.png
Create a custom table with your new multiple response set as a row and the subsets to compare as columns. I set summary statistics to compute from rows and added the column n% column (sorted descending).
http:// i.imgur.com/hptIkfh.png
Under test statistics, include a column proportions z-test as JKP suggested.
http:// i.imgur.com/LYI6ZRl.png
Behold, your results:
http:// i.imgur.com/LgkBA8X.png
Thanks again, and best of luck to anyone else who runs across this.
-GCH
p.s. Sorry everyone, I was going to post images but don't have enough reputation points yet. Images detailing the steps in the GUI can be found at the obfuscated links above.
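As a cross-check on the grouped table, the pooling step itself (treating DX_1 to DX_15 as one set of codes) can be sketched in a few lines of Python. This is only an illustration: the function name and sample cases are made up, and it assumes each code should be counted once per case even if it appears in several DX variables, which may or may not match what you want.

```python
from collections import Counter

def pooled_frequencies(cases, dx_vars):
    """Count how many cases carry each code across all diagnosis variables."""
    counts = Counter()
    for case in cases:
        # A set counts each code once per case; blank or missing cells are skipped.
        codes = {case[v] for v in dx_vars if case.get(v)}
        counts.update(codes)
    # Sort by descending count, then by code, so the order is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical sample with two diagnosis variables instead of fifteen.
cases = [
    {"DX_1": "250.00", "DX_2": "401.9"},
    {"DX_1": "401.9", "DX_2": ""},
    {"DX_1": "401.9", "DX_2": "401.9"},
]
table = pooled_frequencies(cases, ["DX_1", "DX_2"])
```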

Summarized data between row labels in PowerPivot (V2) table?

What I have is:
A flat PowerPivot (V2) table to show the data as an ordinary Excel table (very much simplified, it's much wider):
|Starting date|Container|Color|Price|Price inc Tax|
|01.01.2009|container 240|blue|2,50 €|3,05 €|
|01.01.2009|container 240|red |3,60 €|4,39 €|
|01.01.2009|container 360|blue|4,20 €|5,12 €|
Might it be possible to format a PowerPivot table so that the summarized columns are not at the end of a row? I'm trying to make a price list/catalog tool. There are a lot of fields in the table, some are less important, and I'd like them to be shown after the prices. Starting date, Container and Color are column labels, and Price and Price inc Tax are summarized data.
Naturally I can't move the summarized data from the Values area to the Row or Column area in the field list, but is there any other way to reorder the columns so that I get the summarized data e.g. in between Starting date and Container?
Thanks!
It's possible, but not totally straightforward.
Writing a measure that returns text is easy, for instance:
=VALUES(Table1[Container])
...will return the text from a column called Container in 'Table1', but ONLY if the current context leaves exactly one value for Container (VALUES returns a single-column table of all values that haven't been filtered out by the current context).
To make this robust you would need to trap errors so your whole formula would look like:
=IF(COUNTROWS(VALUES(Table1[Container]))>1,BLANK(),VALUES(Table1[Container]))
Once perfected, this measure can be placed after the more important data.
HTH
Jacob
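For readers less used to DAX, the guard in the second formula amounts to "return the value only when the current rows agree on a single one, otherwise return blank". A small Python sketch of that logic follows. The function name and rows are hypothetical, and None stands in for BLANK().

```python
def single_value_or_blank(rows, column):
    """Return the one distinct value in `column`, or None if there is not exactly one."""
    distinct = {row[column] for row in rows}
    return next(iter(distinct)) if len(distinct) == 1 else None

# Hypothetical rows standing in for what the current filter context leaves visible.
one_container = [{"Container": "container 240"}, {"Container": "container 240"}]
two_containers = [{"Container": "container 240"}, {"Container": "container 360"}]

label = single_value_or_blank(one_container, "Container")
blank = single_value_or_blank(two_containers, "Container")
```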
