Several lines for 1 case in a dataset that has to be analysed in SPSS - spss

I have a large dataset in Excel. Many cases are spread over several lines. I need to move the values from several columns into separate new columns. I have very basic knowledge of SQL and PySpark (Python). I was told this should be possible with syntax in SPSS. The cells contain text. At this point I have 26 columns, 5 of which generate additional lines. For example, diagnosis: some patients have more than one diagnosis, which gives several lines for that patient.
Can someone help me out?
I searched, but didn't find the right solution. I found a lot on pivot tables, but that is not what I need, since I need to move information to a new column.
E.g. medical diagnosis could become med_diagn1, med_diagn2, ... The same goes for nursing problem: it is now 1 column and could become vpd1, vpd2, ...
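The restructuring described above (several lines per case collapsed into one line with numbered columns) can be sketched in plain Python, since the question mentions some Python knowledge. This is only a minimal illustration under assumptions: the column names `patient_id`, `diagnosis` and `nursing_problem` and the helper `cases_to_wide` are hypothetical, and the rows are assumed to have been read from the Excel export into a list of dicts. SPSS has its own restructuring command for this kind of task (CASESTOVARS), which is not shown here.

```python
def cases_to_wide(rows, id_col, spread):
    """Collapse several rows per case into one row per case.

    rows   : list of dicts, one per spreadsheet line
    id_col : column identifying the case (hypothetical name)
    spread : {repeating source column: prefix of the numbered target columns}
    """
    seen = {}   # case id -> {source column: distinct values, first-seen order}
    fixed = {}  # case id -> non-repeating columns, taken from the first line
    for row in rows:
        key = row[id_col]
        if key not in seen:
            seen[key] = {col: [] for col in spread}
            fixed[key] = {c: v for c, v in row.items() if c not in spread}
        for col in spread:
            value = row[col]
            # Skip empty cells and values already recorded for this case.
            if value and value not in seen[key][col]:
                seen[key][col].append(value)

    result = []
    for key, base in fixed.items():
        out = dict(base)
        for col, prefix in spread.items():
            for i, value in enumerate(seen[key][col], start=1):
                out[f"{prefix}{i}"] = value
        result.append(out)
    return result


# Made-up example data: patient 1 has two diagnoses, so two lines.
rows = [
    {"patient_id": 1, "diagnosis": "asthma", "nursing_problem": "dyspnoea"},
    {"patient_id": 1, "diagnosis": "diabetes", "nursing_problem": "dyspnoea"},
    {"patient_id": 2, "diagnosis": "fracture", "nursing_problem": "pain"},
]
wide = cases_to_wide(
    rows, "patient_id", {"diagnosis": "med_diagn", "nursing_problem": "vpd"}
)
```

Each repeating column is numbered independently, so a patient with two diagnoses and one nursing problem gets med_diagn1, med_diagn2 and vpd1. Cases end up with different numbers of columns, so the union of all keys would be needed as the header when writing the result back to a file.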

Related

Tableau Numbers Chart

I'm having issues figuring out how to combine numbers into one chart in Tableau. Attached is an example of what I'm looking to recreate.
I can make something similar to this using multiple worksheets, but I'm hoping to do it in one worksheet. The far-left column contains grouped items from the data source, e.g. Apple, grape, banana = Group 1. Maybe the groups are screwing it up, or maybe I'm not creating the LODs for "Things", "Stuff", and "Other" correctly. I would appreciate anyone's help.

Crystal Reports Crosstab with multiple row fields(attributes)

I'm designing a business analysis report using Crystal Reports XI with an Oracle stored procedure as the data source. The report contains a crosstab with one row field (on the left) and summarized values under selling station names.
The requirement is to have multiple attribute columns on the left, such as Product ID, Product Name, Product Color, Product Size, Product Sold Date etc., and the summarized values at the end. What I've done so far is a crosstab with only one column on the left followed by the summarized values.
Here is the sample of crosstab as required.
I've done plenty of R&D but didn't find any appropriate solution.
The output of report is required to match the format provided by business user.
So the solution I devised is here:
A crosstab aggregates and jointly displays the distribution of two or more variables by tabulating their results against one dimension. The problem was how to increase the number of dimensions. Since this goes against the logic of a crosstab, I modified my stored procedure to build one single string by concatenating the dimensions, and created the crosstab against that string. The dimensions are separated by a delimiter such as '~'; you can use another one for better readability.
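The composite-key idea can be illustrated outside Crystal Reports. The sketch below is in Python, not the Oracle stored procedure the poster actually modified; the field names (`product_id`, `product_name`, `station`, `amount`) and the helper `crosstab_by_composite_key` are hypothetical stand-ins.

```python
from collections import defaultdict

DELIM = "~"  # delimiter mentioned in the post; any unused character works


def crosstab_by_composite_key(records, dims, column_field, value_field):
    """Sum value_field per (concatenated dims) row and per column_field column."""
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        # Several attribute columns become one row label, e.g. "P1~Chair".
        row_key = DELIM.join(str(rec[d]) for d in dims)
        table[row_key][rec[column_field]] += rec[value_field]
    return {row: dict(cols) for row, cols in table.items()}


# Made-up example data.
records = [
    {"product_id": "P1", "product_name": "Chair", "station": "North", "amount": 10},
    {"product_id": "P1", "product_name": "Chair", "station": "South", "amount": 5},
    {"product_id": "P1", "product_name": "Chair", "station": "North", "amount": 2},
    {"product_id": "P2", "product_name": "Table", "station": "South", "amount": 7},
]
table = crosstab_by_composite_key(
    records, ["product_id", "product_name"], "station", "amount"
)
```

The delimiter has to be a character that never occurs in the attribute values, otherwise two different attribute combinations could collapse into the same row label.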

How to combine dynamic ranges with OFFSET and INDIRECT functions?

I'm trying to create a formula that uses a dynamic range to link to the different tabs. It would then return the last 12 values before a blank cell, based on the value in the Header Column.
The tricky part is the data in each tab is of different sizes, so I thought it might have to incorporate either Offset(Blank()),0,-12) or something of the sort. I've tried a lot of different things, and this is the latest effort:
=Index(Indirect(A2&"!A9:AK9"),match("Conversions",Indirect(A2&"!A9:AK9"),)0,-12)
Edit: First post, sorry for the confusion. My goal is to make a large dashboard that has a dynamic chart using our monthly metrics. (I'm leaving the chart setup for another day)
The data varies in size: some tabs have columns A:K, while others have a larger range such as A:AK (with A:A being text). Since I posted this, I have had some success using Filter(), but the problem I'm not sure how to solve is finding the last 12 values before a blank cell.
Example of Data 1
Example of Data 2
Hopefully this helps explain the situation, and I appreciate everyone's help.
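The core logic of "the last 12 values before a blank cell" can be written down independently of the spreadsheet formula. The Python sketch below is not a replacement for the OFFSET/INDIRECT formula. It only states the intended behaviour, assuming that "blank" means the first empty cell to the right of the header and that the row has already been read into a list. The helper name `last_n_before_blank` is hypothetical.

```python
def last_n_before_blank(cells, n=12):
    """Return up to n values that sit immediately before the first blank cell.

    cells: the values of one row, excluding the header cell in column A.
    n is assumed to be a positive integer.
    """
    values = []
    for cell in cells:
        if cell is None or cell == "":
            break  # stop at the first blank; anything after it is ignored
        values.append(cell)
    return values[-n:]


# Made-up row: header, 20 monthly values, a blank, then unrelated content.
row = ["Conversions"] + list(range(1, 21)) + ["", 99]
latest = last_n_before_blank(row[1:])
```

If a row holds fewer than 12 values before the blank, the function simply returns all of them, which is usually what a dashboard chart would want.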

BigQuery taking too much time on simple LEFT JOIN

So I'm doing a really basic left join, basically joining different identifiers of my database, described below :
SELECT
main_id,
DT.table_1.mid_id AS mid_id,
final_id
FROM DT.table_1
LEFT JOIN DT.table_2 ON DT.table_1.mid_id = DT.table_2.mid_id
Table 1 is composed of four columns, main_id, mid_id, firstSeen and lastSeen.
There are 17,014,676 rows, for 519 MB of data. Each row is a unique main_id - mid_id pair, but a given main_id or mid_id can appear multiple times in the table.
Table 2 is composed of four columns, mid_id, final_id, firstSeen and lastSeen.
There are 66,779,079 rows, for 3.86 GB of data. In the same way, each row is a unique mid_id - final_id pair, but a given mid_id or final_id can appear multiple times in the table.
BigQuery is using only 3.11 GB for the query itself.
main_id and mid_id are integers, final_id is a string.
The query result was too big for BigQuery to return directly, so I had to create a "result" table containing the main, mid and final ids with the exact types listed above. The "Allow Large Results" option had to be selected, or an error was thrown.
My problem is that this simple query has already taken an hour and has not even finished yet! I read that good practice would have been to do a RIGHT JOIN so that the first table in the join is the biggest, but still, an hour is awfully long, even in that case!
Do you, kind people of Stack Overflow, have an explanation?
Thank you in advance!
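One thing worth checking, which the post does not confirm as the cause, is how many rows the join produces: since a mid_id can appear several times in both tables, every left row is repeated once per matching right row. The Python sketch below estimates the output size of a LEFT JOIN from the key columns alone. The helper `left_join_row_count` and the toy key lists are hypothetical.

```python
from collections import Counter


def left_join_row_count(left_keys, right_keys):
    """Number of rows a LEFT JOIN on these keys would produce."""
    right_counts = Counter(right_keys)
    # A left row yields one output row per matching right row,
    # or exactly one row (with NULLs on the right) when nothing matches.
    return sum(max(right_counts[key], 1) for key in left_keys)


# Toy data: key 1 occurs twice on the left and three times on the right.
left = [1, 1, 2, 3]
right = [1, 1, 1, 2]
rows_out = left_join_row_count(left, right)  # 3 + 3 + 1 + 1
```

On the real tables, the same number could be obtained by grouping each table by mid_id and combining the per-key counts, which is far cheaper than materialising the join itself.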

SPSS Frequency Plot Complication

I am having a hard time generating precisely the frequency table I am looking for using SPSS.
The data in question: cases (n = ~800) with categorical variables DX_n (n = 1-15), each containing ICD9 codes, many of which are the same code. I would like to create a frequency table that groups the DX_n variables such that I can view frequency of every diagnosis in this sample of cases.
The next step is to test the hypothesis that the clustering of diagnoses in this sample is different than that of another. If you have any advice as to how to test this, that would be really appreciated as well!
Thanks!
Edit: My attempts:
1) Analyze -> Descriptive Statistics -> Frequencies; then add variables DX_n (1-15) and display frequency charts. The output is frequencies of each ICD9 code per DX_n variable (so 15 tables are generated - I'm hoping to just have one grouped table).
2) I tried adjusting the output format to organize by variable and also to compare variables but neither option gives the output I'm looking for.
I think what you are looking for is CTABLES. It can do parallel columns of frequencies, and it includes a column proportions test that can show whether the distributions differ.
Thank you, JKP! You set me on exactly the right track. I'm not sure how I overlooked that menu. Just to clarify in case anyone else comes along needing to figure this out:
Group diagnosis variables into a multiple response set using Analyze > Custom Tables > Multiple Response Sets. Code the variables as categories.
http:// i.imgur.com/ipE9suf.png
Create a custom table with your new multiple response set as a row and the subsets to compare as columns. I set summary statistics to compute from rows and added the column n% column (sorted descending).
http:// i.imgur.com/hptIkfh.png
Under test statistics, include a column proportions z-test as JKP suggested.
http:// i.imgur.com/LYI6ZRl.png
Behold, your results:
http:// i.imgur.com/LgkBA8X.png
Thanks again, and best of luck to anyone else who runs across this.
-GCH
p.s. Sorry everyone, I was going to post images but don't have enough reputation points yet. Images detailing the steps in the GUI can be found at the obfuscated links above.
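For readers who want to see what pooling the DX_n variables into one table amounts to, here is a minimal Python sketch of the idea behind the multiple response set. It is not SPSS syntax and does not reproduce CTABLES output. The helper `pooled_frequencies` and the sample codes are hypothetical, and it assumes each code is counted once per case even when it is repeated across DX_n variables.

```python
from collections import Counter


def pooled_frequencies(cases, dx_vars):
    """One frequency table over all diagnosis variables.

    Returns (code, number of cases, percent of cases), sorted descending.
    """
    counts = Counter()
    for case in cases:
        # A set counts each code once per case (assumption, see lead-in);
        # empty or missing DX_n cells are skipped.
        codes = {case[var] for var in dx_vars if case.get(var)}
        counts.update(codes)
    n_cases = len(cases)
    return [
        (code, k, 100.0 * k / n_cases) for code, k in counts.most_common()
    ]


# Made-up cases with two diagnosis variables instead of fifteen.
cases = [
    {"DX_1": "250.00", "DX_2": "401.9"},
    {"DX_1": "401.9", "DX_2": ""},
    {"DX_1": "401.9", "DX_2": "401.9"},
]
freq_table = pooled_frequencies(cases, ["DX_1", "DX_2"])
```

Because a case can carry several diagnoses, the percentages are percentages of cases and can add up to more than 100.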
