I have a dataset that looks like this:
outlet name (string variable): name of media outlet (maximum 12), the last three outlets in the file are The Guardian, The Telegraph and The Independent.
score 1: scale
score 2: scale
...
score 7: scale.
What I want to do is compute a set of 21 new variables that show for each of the cases (media outlets), for each of the seven variables (scores), the difference between the score of that specific outlet, and the scores of the three outlets of interest: The Guardian, The Telegraph and The Independent (7 variables X 3 benchmark outlets=21). Essentially I want to compare each outlet's scores to my three benchmark outlets.
So for example I should have a new variable, named score1_Guardian, that for outlet 1 will be computed as: the score outlet 1 got for that variable - the score The Guardian got for that variable. Variable score2_Guardian will show, for each outlet, the difference between the score each specific outlet got on that variable and the score The Guardian got for that variable, and so on. So in this example, the outlet The Guardian will score 0 on all score1_Guardian to score7_Guardian variables.
There are simpler ways to do this than what I suggest below, but I like it better this way - less code and fewer temporary variables.
First I create a fake dataset according to your parameters:
data list list/outlet (a12) score1 to score7 (7f6).
begin data
'outlet1' 1 2 3 4 5 6 7
'outlet2' 2 3 4 5 6 7 8
'outlet3' 5 6 7 8 9 1 2
'Guardian' 7 8 9 1 2 5 6
'Telegraph' 5 12 12 3 4 4 2
'Independent' 2 2 2 2 2 2 2
end data.
Now we can get to work:
*going from wide to long form - just to avoid creating too many variables on the way.
varstocases /make score from score1 to score7/index scorenum(score).
if outlet='Guardian' Guardian=score.
if outlet='Telegraph' Telegraph=score.
if outlet='Independent' Independent=score.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES OVERWRITEVARS=YES
/BREAK=scorenum /Guardian=MAX(Guardian) /Telegraph=MAX(Telegraph) /Independent=MAX(Independent).
*now we have three new variables ready to compare.
compute Guardian=score - Guardian.
compute Telegraph=score - Telegraph.
compute Independent=score - Independent.
* last step - going back to wide format.
compute scorenum=substr(scorenum,6,1).
CASESTOVARS /id=outlet /index=scorenum/sep="_".
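For readers who want to check the arithmetic outside SPSS, here is a minimal Python sketch of the same computation on the same fake data. It is not the SPSS method above, just the subtraction spelled out; the function name benchmark_differences and the dictionary layout are my own assumptions, and the output names follow the scoreK_Benchmark pattern from the question rather than the names CASESTOVARS would generate.

```python
# Same fake data as in the SPSS example: outlet -> seven scores.
scores = {
    "outlet1":     [1, 2, 3, 4, 5, 6, 7],
    "outlet2":     [2, 3, 4, 5, 6, 7, 8],
    "outlet3":     [5, 6, 7, 8, 9, 1, 2],
    "Guardian":    [7, 8, 9, 1, 2, 5, 6],
    "Telegraph":   [5, 12, 12, 3, 4, 4, 2],
    "Independent": [2, 2, 2, 2, 2, 2, 2],
}
benchmarks = ["Guardian", "Telegraph", "Independent"]


def benchmark_differences(scores, benchmarks):
    """Return {outlet: {"scoreK_Benchmark": own score K - benchmark score K}}.

    Hypothetical helper, named here for illustration only.
    """
    result = {}
    for outlet, own in scores.items():
        row = {}
        for bench in benchmarks:
            pairs = zip(own, scores[bench])
            for k, (mine, theirs) in enumerate(pairs, start=1):
                row[f"score{k}_{bench}"] = mine - theirs
        result[outlet] = row
    return result


diffs = benchmark_differences(scores, benchmarks)
print(diffs["outlet1"]["score1_Guardian"])  # 1 - 7
```

Each outlet ends up with 21 differences, and a benchmark outlet compared with itself gets 0 throughout, which is a quick sanity check for the SPSS result as well.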
For example:
a variable has the values 1 2 3 5 10 11 12 13 14 20 21 ...
I want to replace them with 1 2 3 4 5 6 7 8 9 10 11 ...
The old variable is district, and I want to replace its values with the correct sequential values.
I was using this command, but it is not giving the desired results:
levelsof district, local(district_new)
foreach i in `district_new'{
replace district= mod(_n-1,707)+1
}
I'm not fully sure what you are trying to do, but is this a solution?
sort district
replace district = _n
This will replace the values in district with 1 for the lowest current value, 2 for the second lowest value etc. This might not be a good solution if your variable may have duplicates.
I agree with @TheIceBear but more can be said that won't fit easily into comments.
The particular code posted boils down to a single statement repeated
replace district = mod(_n-1,707) + 1
as that action is repeated regardless of the values of district. In a dataset with 707 or fewer observations, that in turn would be equivalent to
replace district = _n
as @TheIceBear points out. If there were duplicate observations on any district, this would definitely be a bad idea, and something like
egen newid = group(district), label
would be a better idea. For more, see https://www.stata.com/support/faqs/data-management/creating-group-identifiers/
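To make the difference concrete, here is a small Python sketch of the idea behind a group identifier: every distinct value gets its rank among the sorted distinct values, so duplicates share an id. This only illustrates the logic, it is not how egen is implemented; group_ids is a hypothetical name, and the duplicated entries at the end of the sample list are my own addition to the values from the question.

```python
def group_ids(values):
    """Map each value to its 1-based rank among the sorted distinct values."""
    rank = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    return [rank[v] for v in values]


# Values from the question, plus a few assumed duplicates (5, 5, 20).
district = [1, 2, 3, 5, 10, 11, 12, 13, 14, 20, 21, 5, 5, 20]
new_id = group_ids(district)
print(new_id)
```

With replace district = _n the three duplicated observations would each get a different number, whereas here both extra 5s map to 4 and the extra 20 maps to 10.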
This may be beyond my skill level in Google Sheets, and it's certainly straining my brain to think through, but I have two columns out of a large spreadsheet (30000 lines or so), and I need to find matches between unique values on one list and non-unique but specific values ONLY on another list. That is, I would need the following list to return only the values on the left that had a 3 in the right column every time that value appears on the left, not just for a specific instance.
"Unique" Identifier (can repeat)    Value
1                                   2
2                                   3
3                                   2
4                                   2
5                                   3
6                                   2
1                                   2
2                                   2
3                                   2
4                                   2
5                                   2
6                                   2
I have the following formula mocked up from a couple of other answers, but it doesn't get me all the way there:
=UNIQUE(FILTER(A2:A,B2:B>0))
How can I get it to exclude the ones that have, for instance, both a 2 and a 3 in the right column for the same value in the left column?
Edit: To put it in more real terms (I was trying to keep it abstract so I could understand the basics), I have a Catalog ID and a Condition for items, and need to find all Catalog IDs that only have Good copies, not any Very Good copies. This link should show what I want to achieve:
https://docs.google.com/spreadsheets/d/e/2PACX-1vSjenkDS2Mk3t4kTcDoJqSc8AV6ONu4Q17K1HPaIUdJkb7dhdnbAt-CzUxGO3ZoJISNpGajUtFTGz8c/pubhtml?gid=0&single=true
"to return only the values on the left that had a 3 in the right column every time"
try:
=UNIQUE(FILTER(A:A; B:B=3))
update 1:
=UNIQUE(FILTER(Sheet1!A:A; Sheet1!B:B="Good"))
update 2:
=UNIQUE(FILTER(Sheet1!A:A, Sheet1!B:B="Good",
NOT(COUNTIF(FILTER(Sheet1!A:A, Sheet1!B:B<>"Good"), Sheet1!A:A))))
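The logic of update 2 (keep an ID only if it has the wanted value and never any other value) can be sketched in Python as below. This is an illustration of the filtering rule, not a translation of the formula; ids_with_only is a hypothetical name, and the rows are the sample pairs from the question's two-column list.

```python
def ids_with_only(rows, wanted):
    """Return ids whose value equals `wanted` on every row where the id appears.

    Ids are returned in first-seen order, without repeats.
    """
    has_wanted = {}      # dict used as an ordered set of candidate ids
    disqualified = set() # ids seen with any other value
    for item_id, value in rows:
        if value == wanted:
            has_wanted.setdefault(item_id, True)
        else:
            disqualified.add(item_id)
    return [i for i in has_wanted if i not in disqualified]


# (identifier, value) pairs from the question's sample list.
rows = [(1, 2), (2, 3), (3, 2), (4, 2), (5, 3), (6, 2),
        (1, 2), (2, 2), (3, 2), (4, 2), (5, 2), (6, 2)]

only_twos = ids_with_only(rows, 2)
only_threes = ids_with_only(rows, 3)
print(only_twos, only_threes)
```

On that sample, identifiers 2 and 5 have both a 2 and a 3, so they are excluded either way; no identifier has a 3 every time.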
I have a data file that looks like the first picture. I am reading it into SPSS using FILE TYPE MIXED so that it looks like the second picture. How can I merge the cases based on the ID variable, so that cases with the same ID are combined into one? The variable Age is repeated, so it does not matter which one is selected, but it would be good if it were possible to select the first value.
Here is an example of the code I am using to read the data:
FILE TYPE MIXED RECORD=RecordID 1
/ WILD =WARN.
RECORD TYPE 1.
DATA LIST
/ ID 8-9 JobType 3-4 Age 5-7.
RECORD TYPE 2.
DATA LIST
/ ID 3-4 Sex 11 Salary 5-8.
RECORD TYPE 3.
DATA LIST
/ ID 6-7 Age 8-10 Hiring 3-5.
END FILE TYPE.
BEGIN DATA
1 1 39 1
1 3 27 2
1 2 27 3
1 3 25 4
2 1 9000 0
2 2 7500 0
2 3 4750 1
2 4 7250 1
3 76 1 39
3 98 2 27
3 8 3 27
3 44 4 25
END DATA.
LIST.
This should work:
sort cases by ID RecordID.
casestovars id=ID/index=RecordID.
If the ages are identical they collapse into one column. If they aren't, you'll get three age columns, and you'll be able to choose the one you prefer.
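If you only ever want the first value of a repeated variable, the merge rule can be sketched in Python as follows. This is not what CASESTOVARS does internally, just the "first value wins" behaviour the question asks for; merge_by_id is a hypothetical helper, and the record values are my assumed reading of the first two IDs in the sample data, since the column alignment of the listing is not reliable.

```python
def merge_by_id(records, key="ID", skip=("RecordID",)):
    """Merge records sharing the same key; for repeated fields the first value wins."""
    merged = {}
    for rec in records:
        target = merged.setdefault(rec[key], {key: rec[key]})
        for field, value in rec.items():
            if field in skip:
                continue
            # setdefault keeps an existing value, so later duplicates are ignored.
            target.setdefault(field, value)
    return [merged[k] for k in sorted(merged)]


# Assumed parsed records (one per record type and ID), sorted by ID and RecordID.
records = [
    {"RecordID": 1, "ID": 1, "JobType": 1, "Age": 39},
    {"RecordID": 2, "ID": 1, "Salary": 9000, "Sex": 0},
    {"RecordID": 3, "ID": 1, "Hiring": 76, "Age": 39},
    {"RecordID": 1, "ID": 2, "JobType": 3, "Age": 27},
    {"RecordID": 2, "ID": 2, "Salary": 7500, "Sex": 0},
    {"RecordID": 3, "ID": 2, "Hiring": 98, "Age": 27},
]

cases = merge_by_id(records)
print(cases[0])
```

Because the records are processed in RecordID order, the Age kept for each ID is the one from record type 1.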
I have a google spreadsheet that has 6 cells with specific numbers in them. Every week, a series of numbers is entered in and I would like to flag the numbers in a separate column if they appear for that week. I was using the formula below where my numbers are in D2->I2 and the weekly ones would be in D18->I18 for example.
=arrayformula(sumproduct((D2:I2=D18:I18)))
Now, while this works, it's not quite what I'm trying to do. Unless the numbers match each other exactly, 1 2 3 4 5 6 to 1 2 3 4 5 6 then the addition doesn't happen. What I would like to have happen is that if, for example, the master column has 1 2 3 4 5 6 and the weekly column has 3 7 9 1 8 5 then the cell with the formula would display the value of 3 for matching three of the numbers that week.
Does anyone have a suggestion on how best to accomplish this?
See if this works:
=ArrayFormula(sum(--regexmatch(D2:I2&"", join("|", D18:I18&""))))
with exclusion of empty cells in both ranges:
=iferror(ArrayFormula(sum(--regexmatch(to_text(filter(D2:I2, len(D2:I2))), "\b("&join("|", to_text(filter(D18:I18, len(D18:I18))))&")\b"))))
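What both formulas compute is a position-independent count: how many of the master numbers appear anywhere in the weekly row. A minimal Python sketch of that counting rule, using the numbers from the question, is below; count_matches is a hypothetical name, and None stands in for an empty cell, which is my assumption.

```python
def count_matches(master, weekly):
    """Count master numbers that occur anywhere in the weekly numbers.

    Empty cells are represented by None and are ignored on both sides.
    """
    weekly_set = {v for v in weekly if v is not None}
    return sum(1 for v in master if v is not None and v in weekly_set)


master = [1, 2, 3, 4, 5, 6]
weekly = [3, 7, 9, 1, 8, 5]
matches = count_matches(master, weekly)
print(matches)  # 1, 3 and 5 are shared
```

Unlike the original SUMPRODUCT comparison, the position of a number within the row does not matter here.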
I have a Core Data model where Session has a relationship to Activity, which has a timestamp property. Sessions are part of a Group, like this:
A-> 1 2 3 4 5 6 7 8 9
B1 X 1 1 1 X 1 1 1 1
B2 X X Y 2 X 2 2 2 2
B3 3 X 3 3 3 3 3 3 3
So here the Groups are A1, A2, etc., and the Sessions are A1B1, A1B2, etc. The number after A is group.order and the number after B is session.order (in the data model).
I marked Sessions with a completed Activity using X, and the last such session with Y (A3B2).
Sessions can be done in whatever order the customer wants; no order is mandatory. What I need is a predicate that will give me the next session after the last completed one. In the example above, that would be A3B3.
I tried this, but it does not work with Core Data (unsupported aggregate function...)
[NSPredicate predicateWithFormat:@"activity = nil AND
group.order >= activity.timestamp.@max.session.group.order AND
order > activity.timestamp.@max.session.order"]
I'm actually not sure it's the correct predicate, but never mind, it crashes because of @max so I did not dwell on it much. I'm trying to figure out how to do this with SUBQUERY, but no luck.