Scenario and question:
Basically I have results for a matched-pair survey of couples in SPSS. It's set up so that person A's answers to questions 1-10 are the first 10 variables and person B's answers to questions 1-10 are the next 10 variables. But I need to run tests and produce crosstabs for individuals, so if I have 20 couples the crosstab outputs should be out of 40. I was able to simply select all the data for the "person B"s and copy and paste it below, but that way I lost the couple-specific pairing, and I still need to be able to create new variables based on the matched-pair information. My way around this was to create a new variable called CoupleNum while the data was still in matched-pair form, so that even in individual form I could say "if their couple numbers are equal, calculate this or that". But I don't actually know how to do this. Within the same dataset, how do I compare rows on the same variable?
Example for what I'm talking about:
Here's fake data
A_CoupleNum   A1_HappyScale   B_CoupleNum   B1_HappyScale
1             6               1             4
2             2               2             3
3             9               3             7
I'd move it to individual form like
CoupleNum   HappyScale
1           6
2           2
3           9
1           4
2           3
3           7
And then I'd want to be able to make a new variable called CoupleHappiness that was the HappyScale for each person in the couple added together.
CoupleNum   HappyScale   CoupleHappiness
1           6            10
2           2            5
3           9            16
1           4            10
2           3            5
3           7            16
So essentially I'd want to code something like
if CoupleNum = CoupleNum CoupleHappiness = HappyScale + HappyScale
I know this is definitely not correct, but hopefully it gets across what I'd like to do.
Potential solutions I've found that don't work/I don't know how to make them work for my needs:
Since I'm new to SPSS, I've found several things that might work, but I don't know SPSS syntax well enough to adapt them to my needs. I've noticed people mention things like the LAG function, or CREATE + LEAD, if the matched rows were adjacent, but mine could be all over the place. Someone also mentioned using case numbers, but I don't exactly understand that.
Sorry this was a really long question but I would appreciate any help!!
What you are looking for is the aggregate function. In this case you can use it this way:
NOTE - this code was edited and corrected:
AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK=CoupleNum /CoupleHappiness=SUM(HappyScale).
The function groups all the rows by values of CoupleNum, then for each group the function will calculate the sum of HappyScale and put it in a new variable called CoupleHappiness.
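For reference, a rough sketch of one way to do the whole sequence, using the variable names from the example data above (restructure first, then aggregate); this is just an illustration, not the only way:
* Restructure from one row per couple to one row per person.
VARSTOCASES
  /MAKE CoupleNum FROM A_CoupleNum B_CoupleNum
  /MAKE HappyScale FROM A1_HappyScale B1_HappyScale.
* Sum HappyScale within each couple and attach the total to every row.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /BREAK=CoupleNum
  /CoupleHappiness=SUM(HappyScale).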
So in my current project, I am analyzing different ML models based on their quality. Right now, I'd like to put the quality in the context of the time a model needs to train. I track quality using an F1 score and I also log the time needed. I've been researching the best way to define some sort of time-quality ratio, but I'm unsure how to get there.
I've been thinking of creating a chart that has the F1 scores on the y-axis and the time needed on the x-axis (or the other way around, I don't mind either, but this seemed to make the most sense), but I struggle to set that up in Google Sheets. My table currently looks something like this (all values are imagined and could vary):
First Dataset
              Time (in Min)   Quality (F1 Score)
Iteration 1        5               0
Iteration 2        8               0.1
Iteration 3       11               0.2
Iteration 4       21               0.5
Iteration 5       20               0.8
Iteration 6       21               1
And I'd like a chart (this one was created manually in GeoGebra) similar to this:
I'm aware I can manually pick my x-axis but was wondering what the best way would be to achieve this - if at all.
You can try a line chart like this:
Could someone please help me with this problem? I've been trying different approaches and Googling, but so far I haven't found a solution.
I have a list of tasks and they need to be assigned to staff members.
There are several task types and my goal is to distribute each task type evenly to all staff members.
As an example, I prepared this data:
My goal is for each staff to get (roughly) the same number of tasks of each type.
Current manual workflow:
Count the available tasks per category.
Start with Type A: it is the first category, and there are 10 of these tasks in the example.
Divide by the number of staff members: 10 / 3 = 3.33.
For uneven divisions, the remainder goes to a staff member, so 1 person gets 4 tasks while the other 2 people get 3 tasks each.
Assign names to the "Task Assignment" column based on the calculation above.
Repeat the steps above for each task type.
Final result:
In the actual dataset, I could be dealing with 1000 - 1500 tasks, around 10 types, and up to a dozen staff members available for the day.
And this has to be done 1x a day, every day.
Using the manual method mentioned above is quite tedious and error-prone.
I'm hoping there is a way to use formulas to automate the assignment. I tried randomizing the assignment, but as the name suggests, it didn't provide a consistently even distribution.
If you have any ideas on how to solve this, I'd really appreciate it. Thank you so much!
My idea is to repeat the list of available staff down the list of tasks, sorted by task type.
Like this:
All the work is done in column C:
=transpose(split(rept(join("|";D3:D)&"|";ceiling(count(A3:A)/COUNTA(D3:D)));"|";1;1))
The formula takes the number of tasks (COUNT(A3:A)), divides it by the number of staff available (COUNTA(D3:D)), and rounds the result up with CEILING.
This determines how many times the list of available staff should be repeated.
The repetition is done by first joining the list together (JOIN with | as the separator), then repeating it (REPT) and splitting it back apart (SPLIT by |), and finally transposing (TRANSPOSE) so all values end up in one column.
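Note: the formula above uses semicolons as argument separators (a European locale). In a locale that uses commas, the equivalent formula (same assumed ranges, tasks in A3:A and staff names in D3:D) would presumably be:
=TRANSPOSE(SPLIT(REPT(JOIN("|",D3:D)&"|",CEILING(COUNT(A3:A)/COUNTA(D3:D))),"|",1,1))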
Columns F to I are just for testing to see if lists of task assignments are similar.
You can play with my solution here:
https://docs.google.com/spreadsheets/d/1OaNiACC8hTShyCLdXj7TmZywQAKMuu7dknVgEnT-ZqE/copy
I am currently working on a project where I need to predict the next 4 quarters of customer count for a retail client, based on the customer counts from the last three years of quarterly data, i.e. 12 data points in total. Please suggest the best approach to predict the customer count for the next 4 quarters.
Note: I can't share the data, but the customer count has a declining trend year over year.
Please let me know if more information is required or if the question is not clear.
With only 12 data points you would be hard-pushed to justify anything more than a simple regression analysis.
If the declining trend were so strong that you were at risk of passing below 0 sales, you could look at taking a log to linearise the data.
If there is a strong seasonal cycle you will need to factor that in, but doing so also reduces the effective sample size from 12 to 9 quarters of data (three degrees of freedom being used up by the seasonalisation).
That's about it, really.
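For illustration only (made-up numbers, not your data), here is a minimal sketch of that approach in Python: a linear trend plus quarterly dummies fitted on the log scale to 12 quarterly points, then extrapolated 4 quarters ahead.
import numpy as np

# Made-up quarterly customer counts with a declining year-over-year trend.
counts = np.array([950, 900, 980, 870,
                   880, 840, 910, 800,
                   820, 780, 850, 740], dtype=float)

n = len(counts)
t = np.arange(n)            # linear trend term
quarter = t % 4             # quarter index 0..3

# Design matrix: intercept, trend, and dummies for quarters 1-3 (quarter 0 is the baseline).
X = np.column_stack([np.ones(n), t] + [(quarter == q).astype(float) for q in (1, 2, 3)])

# Fit on the log scale so the declining trend cannot push predictions below zero.
beta, *_ = np.linalg.lstsq(X, np.log(counts), rcond=None)

# Build the design matrix for the next 4 quarters and forecast.
t_new = np.arange(n, n + 4)
q_new = t_new % 4
X_new = np.column_stack([np.ones(4), t_new] + [(q_new == q).astype(float) for q in (1, 2, 3)])
forecast = np.exp(X_new @ beta)

print(np.round(forecast, 1))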
You don't specify explicitly how far into the future you want to make your predictions; rather, you do that implicitly when you make sure your model is robust and does not over-fit.
What does that mean?
Make sure that the distribution of the labels, given your available independent variables, is similar to the distribution you expect in the future. You can't expect your model to learn patterns that were not there in the first place. So variables that carry information that distinguishes customer count values 4 quarters into the future are what you want to include.
I guess it is really easy, but I just cannot find the answer myself. The variable that I would like to calculate is "Number_of_brands_bought" (see below). I've tried to use the aggregate function in SPSS with Respondent as the break variable and Brand as the summary variable (using the count function), but it just does not give me the right answer.
Respondent   Brand   Number_of_brands_bought
1            1       3
1            2       3
1            3       3
1            3       3
2            1       2
2            2       2
3            1       3
3            4       3
3            5       3
Does anybody know what to do? Thanks in advance!
It's not clear from the description you have provided how the data is stored. It could be stored in one of two ways (possibly others):
1) Wide format
2) Long format
Hopefully this link to my Google Drive works; I have mocked up an example of both file structure formats there:
Example Data
If the data is in wide format, where you have the brands (bought) as individual dichotomous variables and one row per respondent, then you can simply sum the values, with 1 indicating that the brand was bought (assuming 0=no/1=yes coding, as opposed to the 1=yes/2=no coding which is sometimes the case):
compute Num_Brands=sum(Bought_Brand01 to Bought_Brand05).
Alternatively, given that you mention needing the aggregate function, perhaps you have the data in long format, i.e. respondents x brands. If that is the case, you can derive the number of brands using AGGREGATE.
The code in SPSS would be:
AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK=ID /Num_Brands=sum(Bought).
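For the long data exactly as posted in the question (one row per purchase, where the same brand can appear more than once for a respondent), a rough sketch that counts distinct brands per respondent could look like this; NewBrand is a helper variable introduced here purely for illustration:
SORT CASES BY Respondent Brand.
* Flag the first occurrence of each brand within a respondent, then sum the flags.
COMPUTE NewBrand=1.
IF (Respondent=LAG(Respondent) AND Brand=LAG(Brand)) NewBrand=0.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /BREAK=Respondent
  /Num_Brands=SUM(NewBrand).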
What I have is a website where I add the collected data for every single shift on a factory's production lines, for example the quantity in tonnes. What I want is to take the quantity in tonnes for the morning, late and night shifts (these are in the Shift table and are visible in the shifts index view), add them together, and show the combined total on another page, the days index (a Day contains the shifts; one day has 3 shifts), so I can see the 3 shifts' data summed up as the total output of the day.
For example, for "Quantity in tonnes" I would like 7 + 10 + 12 (inputs I have already added through a form to the shifts index) to be summed up and appear automatically, without me interfering, as 29 in the Quantity in tonnes column of the days index page.
How is that possible to do? I can't seem to figure out how to write the code so that it loops over all the inputs and keeps giving me the summed totals.
Let me know if you need to see any parts of my code or if there is any more info I could add to help you understand.
Have a look at the groupdate gem; it allows you to group by day, week, hour of the day, etc.
Some code from your end would help, but here's an example use, if I wanted to get revenue for the past 90 days:
time_range = 90.days.ago..Time.zone.now
total = Sales.where('status > 2').group_by_week(:date_scheduled, Time.zone, time_range).sum(:price)
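Roughly applied to the shift/day setup from your question (assuming a Day model with has_many :shifts and a numeric quantity_in_tonnes column on Shift; the column name is a guess), something along these lines could produce the per-day totals:
# Per-day total computed from the association, e.g. in the days index controller:
Day.all.each do |day|
  total = day.shifts.sum(:quantity_in_tonnes)  # 7 + 10 + 12 => 29 for the example day
  puts "Day #{day.id}: #{total} tonnes"
end

# Or, with the groupdate gem, grouping shifts by the day they were created:
Shift.group_by_day(:created_at).sum(:quantity_in_tonnes)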