I am analyzing an electronic survey I made using Google Forms and I have the following problem.
One of the questions can take multiple answers in the form of Checkboxes as shown in the picture below. The question is in Greek so I have added some Choice1, Choice2, Choice3 etc next to each answer in order to facilitate my question.
In my data when someone chose lets say Choice1 and Choice2,
I will have an answer which is the concatenation of the strings he checked seperated with commas.
In this case it would be:
Choice1, Choice2
If someone else checked Choice1, Choice2 and Choice4
his answer in my data would be:
Choice1, Choice2, Choice4
The problem is SPSS has no way of seperating the substrings (seperated by commas) and understanding which Choices each case has in common. Or maybe there is a way but I don't know it :)
When I, for example, do a simple frequency analysis for this question it produces a table that perceives
Choice1, Choice2
as a completely different case from
Choice1, Choice2, Choice4
Ideally I would like to somehow tell SPSS to count the frequency of each unique Choice (Choice1, Choice2, Choice3 etc etc) rather than each unique combination of those Choices.
Is that possible? And if it is can you point me to the documentation I need to study to make it happen?
Thx a lot!
Imagine you are working with the following data, which is a CSV file you have downloaded from your online form. Copy and paste the text below and save it to a text file named "CourseInterestSurvey.CSV".
Timestamp,Which courses are you interested in?,What software do you use?
12/28/2012 11:57:56,"Research Methods, Data Visualization","Gnumeric, SPSS, R"
12/28/2012 11:58:09,Data Visualization,"SPSS, Stata, R"
12/28/2012 11:59:09,"Research Dissemination, Graphic Design",Adobe InDesign
12/28/2012 11:59:27,"Data Analysis, Data Visualization, Graphic Design","Excel, OpenOffice.org/Libre Office, Stata"
12/28/2012 11:59:44,Data Visualization,"R, Adobe Illustrator"
Read it into SPSS using the following syntax:
GET DATA
/TYPE=TXT
/FILE="path\to\CourseInterestSurvey.CSV"
/DELCASE=LINE
/DELIMITERS=","
/QUALIFIER='"'
/ARRANGEMENT=DELIMITED
/FIRSTCASE=2
/IMPORTCASE=ALL
/VARIABLES=
Timestamp A19
CourseInterest A49
Software A41.
CACHE.
EXECUTE.
DATASET NAME DataSet2 WINDOW=FRONT.
LIST.
It currently looks like the image below--three columns (one timestamp, and two with the data we want):
Working with some syntax from here, we can split the cells up as follows:
* We know the string does not excede 50 characters.
* We got that information while we were reading our data in.
STRING #temp(a50).
* We're going to work on the "CourseInterest" variable.
COMPUTE #temp=CourseInterest.
* We're going to create 3 new variables with the prefix "CourseInterest".
* You should modify this according to the actual number of options your data has
* and the maximum length of one of the strings in your data.
VECTOR CourseInterest(3, a25).
* Here's where the actual variable creation takes place.
LOOP #i = 1 TO 3.
. COMPUTE #index=index(#temp,",").
. DO IF #index GT 0.
. COMPUTE CourseInterest(#i)=LTRIM(substr(#temp,1, #index-1)).
. COMPUTE #temp=substr(#temp, #index+1).
. ELSE.
. COMPUTE CourseInterest(#i)=LTRIM(#temp).
. COMPUTE #temp=''.
. END IF.
END LOOP IF #index EQ 0.
LIST.
The result:
This only addresses one column at a time, and I'm not familiar enough to modify it to work over multiple columns. However, if you were to switch over to R, I already have some readymade functions to help deal with exactly these kinds of situations.
Unfortunately there is no easy "built-in" way to achieve this, but it is certainly achievable with spreadsheet formulae, or Google Apps Script.
Using formulae, assuming your check box question lands in column D, this will produce a "normalised" list:
=ArrayFormula(TRANSPOSE(SPLIT(CONCAENATE(D2:D&",");",")))
and you can turn that into a two-column list and QUERY it to return a table of frequencies:
=ArrayFormula(QUERY(TRANSPOSE(SPLIT(CONCATENATE(D2:D&",");","))&{"",""};"select Col1, count(Col2) group by Col1 label Col1 'Item', count(Col2) 'Frequency'";0))
If your locale uses a comma as a decimal separator, replace {"",""} with {""\""}.
It is easy to split the fields into separate variables as described above. Now define these variables as a multiple response set (Analyze > Tables > Multiple Response Sets), and you can analyze these with the CTABLES or MULT REPONSE procedures and graph them using the Chart Builder
Related
As you can see I transpose codes into unique column headings so that debits and credits are analysed and summated. Summations are transposed in another sheet to create summary profit/loss account. I need help how to replicate the sum formula in column I to serve any expanded transposed unique codes and whether/how I should use arrayformula for the individual cell output.
EDIT
Actual output looks like this:
My problem is to how to automatically accommodate new entries/codes in the totals row and main body of cells. The data belongs to a residents' committee so I can only show anonymous data as image.
EDIT 2
Actual input is imported from bank records, then coded:
Query is pretty good for the SUM part.
Starting in column I, you can do:
=ArrayFormula(INDEX(QUERY(
0+OFFSET(I4,0,0,ROWS(F6:F),COUNTA(UNIQUE(F4:F))),
"select "&
JOIN(
",",
"sum(Col"&SEQUENCE(COUNTA(UNIQUE(F4:F)))&")"
)
),2))
The 0+ or the VALUE in the second one (they both do the same thing here) transforms the data cells to default to 0 if blank, otherwise the query fails. This also lets us refer to the columns by sequence number, which is what we do in the second argument. We build the query into something that looks like select sum(Col1),sum(Col2),...,sum(ColN). Since this gives us a header by default, we could relabel everything in the query statement, but that gives too much extra code, so the easier thing to do is use INDEX to select the sums.
The EQ part is fairly straightforward to Arrayify. Starting in I4:
=ArrayFormula(
(FILTER(F4:F,F4:F<>"")=FILTER(I2:2,I2:2<>""))*
IF(
Array_constrain(G4:G,COUNTA(FILTER(F4:F,F4:F<>"")),1),
G4:G,
-H4:H
)
)
The FILTERs just filter out the blank cells, and the Array_Constrain sizes the G column to the same size as the filtered F column.
I have already asked a similar question in another thread which was wonderfully answered by someone. However, it seems like that even if it did help, it was not 100% ok for all my GSheets and different projects. So I have been able to get more useful data from my source and I think that I would be able to have a bulletproof solution if I am able to find the good formula...
So my data looks like that:
Column A has a list separated by semicolons. Copy paste looks like that "Potato ; Banana ; Apple".
Column B has a list of IDs, which are linked to the data in column A. So Potato is ID 1871, Banana is ID 1890 and Apple is ID 1840. Copy paste of date is: "1871 ; 1890 ; 1840"
Column C should output formula with the value "Banana", because his ID is the highest of all (1890 > 1871 > 1840).
I have tried a lot of different things. Overall I tried to SPLIT the values with " ; " and create one array where I would try to sort them from the column or row where the IDs would be. I wasn't sure how to merge both results...
I tried to study the ARRAYFORMULA() function to see if it would help. gSheets is very nice to work with, but I come from a world of PHP where I would write 7 lines of code to achieve my goals, where here I need to do only 1 line of code and it is a challenge for me.
Any help is appreciated to find the formula in cell C1 that would get the value from A1 that has the highest ID in B1.
=ARRAYFORMULA(HLOOKUP(MAX(TRIM(SPLIT(B1, ";"))*1),
{TRIM(SPLIT(B1, ";"))*1; TRIM(SPLIT(A1, ";"))}, 2, 0))
I have raw data in my spreadsheet that comes from a Google Form that looks like the following:
(Cost) (Source) (Frivolous) (Medium) (Comments)
A B C D E
1 15.94 McDonalds Yes Credit was hungry
2 98.32 School No Check Paid for textbooks
3 843.00 Hospital No Check Surgery
4 0 asdff Yes N/A Ignore this one woops
5
6 23.99 Dentist No Credit Check up
I want this data to always be copied to a different sheet, but ONLY the data that matches a condition. That condition in this case is if Frivolous is No, meaning I only want on this separate page to track valid important spending.
My second page I want them to look like the following:
(Cost) (Source) (Frivolous) (Medium) (Comments)
A B C D E
1 98.32 School No Check Paid for textbooks
2 843.00 Hospital No Check Surgery
3 23.99 Dentist No Credit Check up
Notice how empty entries are ignored and also entries with Yes under Frivolous are ignored as well.
How would I achieve this? I have absolutely no idea how that would work since I've only been able to achieve this through filter which will not work for this.
I would like to say a few words in defense of Google Spreadsheets and show some great functions that will work, but they are not supported by [excel].
Query
First you may use simple query:
=QUERY(sheet1!A:E,"select * where C = 'No'")
This single short formula will give the desired result, there's no need to fill right and down.
Filter
Actually you may use filter too. This function seems to work too:
=FILTER(sheet1!A:E,sheet1!C:C="No")
Please, read more info about this functions:
Filter
Query and full Query Language Reference
You'll find many exciting things that could be done in Google spreadsheets.
Actually, I was having some trouble with [google-sheets] ArrayFormula function so I used an old-school formula with SMALL and INDEX function in its array form. In A2,
=iferror(index(Sheet13!A$1:A$99, small(index(row($1:$99)+(Sheet13!$C$1:$C$99<>"no")*1E+99, 0, 0), row(1:1))), "")
Fill both right and down.
So you were in fact correct that this could be solved in [excel] with an identical solution as [google-spreadsheet]. However, there are superior methods in newer [exce] (2010+) using the AGGREGATE function that [google-spreadsheet] does not support and I'm sure that [google-sheets] has more elegant functions that I am not recalling right this moment.
Look to Sheet13 and Sheet14 here for the working sample.
Edit for clarity:
The end result that I'm looking for will pass the name picked from the dropdown list in B1 into formulas below to check if each student has completed the training modules listed. The two sheets I'm looking at (for now) are 'Data' and 'Quick Reference' the rest are just placeholders.
I've looked and not found an answer so apologies if this has been asked before!
I'm trying to write a formula that will check form results and if it finds a match, return the contents from another column in that row.
The logical IF function works fine with hard coded entries, but not with cell contents.
So if I'm looking for "Deckhand USS Minnow I" I want to show the contents of Column H. Look on the Data and Quick Reference (Training Progress) tabs to see what I'm talking about.
This works:
=IF(((VLOOKUP($B1,Data!C2:J20,4)="Build Coconut Phone I")),"Yes", "No")
This doesn't:
=IF(((VLOOKUP($B1,Data!C2:J20,4)="Deckhand USS Minnow IV")),Data!H2:H20)
Google spreadsheet
Short answer
Use QUERY() instead of IF(Vlookup(),)
=QUERY(Data!C1:J20,"Select H where F = 'Deckhand USS Minnow IV'",1)
Explanation
While could be possible to use an array formula using IF and VLOOKUP, QUERY() does the work in a direct and simple way.
The formula in the short answer will include the column heading. If you don't want it use
=QUERY(Data!C2:J20,"Select H where F = 'Deckhand USS Minnow IV'",0)
References
QUERY - Google Docs editors Help
I would like to perform a multi criteria search of data in a column- contains data of check boxes(more than one option chosen).
For a clearer picture of what I am trying to do, screenshot below is a question in a form
Data from the form are saved in sheets like below,
So my concern here is if I would like to search/filter for the rows that contain "Commercial", the rows with Commercial,Engineering doesn't show up. That's definitely not an effective search.
Any advise on how can I go about this issue is kindly appreciated. If
Let's say you have your form in the response sheet in columns A to P, with the multiple choice in col D. If you want to filter your data on the word 'Commercial' you can either do:
=filter(A2:P, regexmatch(A2:P, "Commercial"))
or use query():
=query(A2:P, "select * where B contains 'Commercial' ")
Note: depending on your locale you may have to change the commas to semi-colons in order for the formulas to work.
I hope that helps ?
Following JPV's answer, I developed a line to make the query useful if you want to cross two categories. Let's suppose that someone in your checkbox example had picked all the options (IT, HR, Commercial, Engineering); and that you have created the cell with the dropdown option box in cell B1 with all your options, as JPV said.
Then, you want to filter and see all the people who had chosen IT and Commercial. You would, for that, create a second cell with the dropdown option box in, lets say C1; and then your query would be:
=query(A2:P, "select * where B contains '"&B1&"' and B contains '"&C1&"' ")
=FILTER(MOBILE!A2:E2000, ISNUMBER(SEARCH(A1,MOBILE!A2:A2000)))
SEARCH function will return a number of the position of the searched word (A1) in the searched strings in range (MOBILE!A2:A2000).
If the result of search is a number (ISNUMBER), then filter will return the TRUE rows result from the range MOBILE!A2:E2000.