How to assign the same number encoding to several categories in SPSS?

I have a dataset in SPSS with categorical variables (Never, Rarely, Sometimes, Regularly, Always) and I want to encode them for analysis. I need to assign the value 0 to the first 3 categories, but SPSS doesn't seem to let me.
I have tried clicking on the Values column and assigning the label 0 to Never, then to Rarely, etc., but I can't. Once 0 is assigned to Never, you can't assign 0 to Rarely, for example.
Does anyone know how to do this kind of encoding?

Each value of a variable can have only one value label, though you can apply the same value label string to different values. It sounds like your data values are actually the strings Never, Rarely, etc. If that's the case, then you could assign the same label to multiple values, but no analysis would treat them as the same group as long as the underlying values are different strings. As previously suggested, if you want them all to be treated as the same, you need to recode them (or create a new variable based on the recoding).
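The recoding idea can be illustrated outside SPSS. The following is a minimal Python sketch of the logic only, not SPSS syntax: the name RECODE_MAP and the codes chosen for Regularly and Always are assumptions, since the question only specifies that the first three categories should become 0.

```python
# Hypothetical recoding scheme: the first three categories share code 0.
# The codes for "Regularly" and "Always" are assumptions for illustration.
RECODE_MAP = {
    "Never": 0,
    "Rarely": 0,
    "Sometimes": 0,
    "Regularly": 1,
    "Always": 2,
}


def recode(responses):
    """Return a new list of numeric codes, leaving the original data untouched."""
    return [RECODE_MAP[response] for response in responses]


responses = ["Never", "Always", "Rarely", "Sometimes", "Regularly"]
codes = recode(responses)
print(codes)  # [0, 2, 0, 0, 1]
```

Writing the codes into a new list mirrors the "create a new variable" option: the original strings stay available if the grouping needs to change later.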

HL7v2: Should the empty/blank repeating fields be removed?

Some fields in HL7v2 can repeat, but what if one of those repetitions is blank? Should the blank repetition be removed, or should it remain?
For example, in the below extract PID.13 is a repeating field, but the first repetition is blank. It does not even contain "" (empty string).
PID|||A123456789^^^555^PI||Data^Test^^^Mr||19500101|M|||123 Test Road^Testington^^^AA1 2AA||~07778895566|||M|||||||||||||""|||
The PID-13 field has been deprecated as of v2.7 and should no longer be used. Use PID-40 instead.
PID-13 is a special case because the first occurrence has a special meaning, so if there are multiple field repetitions you shouldn't remove the first one even if it isn't populated. For other fields that don't have documented special cases, you can safely delete empty field occurrences without changing the meaning of the message.
Please refer to this answer.
There are two things that need to be understood.
First: An empty/blank/null value is also a value. Blank repetitions should not be removed.
The following is what the specification (2.3.2.4 Repetition Separator) says:
2.3.2.4 Repetition Separator.
The repetition separator is used in some data fields to separate multiple occurrences of a field. It is used only where specifically authorized in the descriptions of the relevant data fields. The character that represents the repetition separator is specified for each message as the second character in the Encoding Characters data field of the MSH segment. Absent other considerations it is recommended that all sending applications use '~' as the repetition separator. However, all applications are required to accept whatever character is included in the Message Header and use it to parse the message.
Admittedly, it does not clearly say anything about removing or keeping empty repetitions. Nor does it specifically say that an empty value is also a value, or the opposite. I could not find this in other parts of the specification either.
To reach a conclusion, we need to move on to the second point.
Second: The sequence of repetition values may also be important. This sequence will change if empty values are removed, which may also change the meaning of a value.
Let us take the example of PID.13 that you mentioned in the question.
This field contains the patient's personal phone numbers. All personal phone numbers for the patient are sent in the following sequence. The first sequence is considered the primary number (for backward compatibility). If the primary number is not sent, then a repeat delimiter is sent in the first sequence.
As you can see above, an empty value in the first repetition tells you that "there is no primary number available for the patient". By removing the empty value, you are actually putting the "secondary number" in place of the primary number, which may be wrong depending on your use case or implementation.
Another example is PID.3:
This field contains the list of identifiers (one or more) used by the facility to uniquely identify a patient (e.g., medical record number, billing number, birth registry, national unique individual identifier, etc.).
As you can see, removing empty values in between changes the meaning of the identifiers.
I would still prefer a clear reference from the specification, but based on the above, I would avoid removing empty values.
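The effect on PID.13 can be shown with a few lines of Python. This is only a sketch: the helper field_repetitions is hypothetical, it assumes the default separators '|' and '~' (a real parser should read them from the MSH segment, as the quoted specification requires), and its simple indexing applies to non-MSH segments only.

```python
# The PID segment from the question, unchanged.
SEGMENT = (
    'PID|||A123456789^^^555^PI||Data^Test^^^Mr||19500101|M|||'
    '123 Test Road^Testington^^^AA1 2AA||~07778895566|||M|||||||||||||""|||'
)


def field_repetitions(segment, field_number, field_sep="|", repetition_sep="~"):
    """Split one field of a non-MSH segment into repetitions, keeping empty ones."""
    fields = segment.split(field_sep)  # fields[0] is the segment name, e.g. "PID"
    return fields[field_number].split(repetition_sep)


kept = field_repetitions(SEGMENT, 13)
stripped = [rep for rep in kept if rep]  # what happens if blanks are removed

print(kept)      # ['', '07778895566']
print(stripped)  # ['07778895566']
```

With the blank kept, the first repetition is empty, so no primary number was sent. With the blank removed, 07778895566 moves into the first position and would be read as the primary number.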

What's the best way of finding the highest value of a table, that is inside a table?

The title is somewhat self-explanatory. I have a table containing a lot of smaller tables with two integers. I need to find the highest/lowest value of the first and second integer within those sub-tables and then output the entire sub-table with the highest/lowest number.
The only way I can come up with seems fairly unintuitive and expensive.
If you need Lua to do the work for you, you can use table.sort and pass it your own sorting function that will sort based on the values of first/second integer and then pick first/last element from the sorted table. If you can maintain the sorted order when inserting (using binary search for a position), then you can always pick the first/last element as needed.
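The same approach can be sketched in Python rather than Lua, purely for illustration (in Lua, table.sort takes a comparison function instead of a key function). The sample data and variable names are assumptions.

```python
import bisect

# Hypothetical data: each sub-table is modelled as a (first, second) pair.
pairs = [(3, 9), (7, 2), (1, 5), (4, 8)]

# Sort by the first integer, then pick the ends of the sorted list.
by_first = sorted(pairs, key=lambda pair: pair[0])
lowest_first, highest_first = by_first[0], by_first[-1]

# Sort by the second integer for the other pair of answers.
by_second = sorted(pairs, key=lambda pair: pair[1])
lowest_second, highest_second = by_second[0], by_second[-1]

print(lowest_first, highest_first)    # (1, 5) (7, 2)
print(lowest_second, highest_second)  # (7, 2) (3, 9)

# Maintaining sorted order on insert: tuples compare by their first element
# first, so a binary-search insert keeps by_first ordered by the first integer.
bisect.insort(by_first, (9, 1))
print(by_first[-1])  # (9, 1)
```

Sorting costs O(n log n) per full sort, while the binary-search insert keeps later lookups of the first/last element constant-time, which is the trade-off the answer describes.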

Sum returned values from arrayformula()?

I have a long list of values in a row that I need to perform several functions/vlookups on, multiply them all together and then sum all the final values. There are over 50 different values in a row so I'm trying to come up with a way to do this without manually typing out vlookups for each column.
Here is the formula I put together, however it seems to not be returning the correct value:
=sum(arrayformula(vlookup(offset($A$1,0,{9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54}),Input!$E:$F,2,FALSE)
*offset(indirect("$A"&ROW()),0,{9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54})))
+F6+BD6
It's as if it's doing what I want for some of the values, but not summing them all. Debugging this is an absolute pain, so I'm wondering if I'm even going about this in the right way.
Is there a better way to go about this? I'm wondering if the problem is that I've got two sets of arrays embedded in this function.

In Google Sheets, why does rank() treat two identical values differently?

I have a sheet with four values that I believe are all equal (i.e. =A1=B1 returns TRUE for each pair). However, when I use rank() on a list with those values, they receive different ranks.
To my knowledge, I'm not doing anything strange, such as a workaround to avoid duplicates. (In this scenario, I want duplicate ranks.) The values I'm trying to rank() were the result of a trunc(sum(...),1), so there aren't any hidden decimal places that I'm not noticing.
I'm just using rank(A1,A1:B1) and arrayformula(rank(A1:B1,A1:B1)). These two formulas even return different results.
Why is rank() treating these numbers as different? Is there some kind of flag or extra property on the cells that's not normally visible that is making them different?
This situation is a little hard to explain without seeing the data, so I've recreated the situation in this sheet:
https://docs.google.com/spreadsheets/d/1cL_15WnKgrxhJfT5lYYIg4sRAzaAzbpP7nH9hju1Rv4/edit?usp=sharing
It has more to do with floating-point error than with RANK.
In any case, since you are trying to check whether "...there aren't any hidden decimal places that I'm not noticing", you could follow a different approach.
Using the ROUND function, round the values of your trunc(sum(...),1) results to however many decimal places you may need. OR
From the top menu, choose Format > Number > More formats > Custom number formats and create your own.
Following that, you will be able to visually spot the differences.
Additionally you can use the RANK function
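The floating-point effect itself is easy to reproduce outside Sheets. The Python sketch below is an illustration under assumptions: the rank helper is hypothetical (it ranks in descending order by counting strictly greater values) and is not a model of how Sheets implements RANK.

```python
def rank(value, values):
    """Descending rank: 1 + the number of entries strictly greater than value."""
    return 1 + sum(1 for other in values if other > value)


# Two numbers that display identically at one decimal place but differ in binary.
values = [0.1 + 0.2, 0.3]
print(values[0] == values[1])             # False
print([rank(v, values) for v in values])  # [1, 2]

# Rounding to the needed precision makes the stored values identical,
# so they share a rank.
rounded = [round(v, 1) for v in values]
print(rounded[0] == rounded[1])             # True
print([rank(v, rounded) for v in rounded])  # [1, 1]
```

This mirrors the ROUND suggestion: once both cells hold exactly the same stored number, there is nothing left for a ranking to distinguish.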

How to Find out if a column contains any duplicates

I have a column of numbers. I want to know if there are any duplicates. I don't need to know how many or what their value is. I just want to know if there are any.
The best way I could figure out was to have another column of equal height to the column of numbers, with the formula:
=countif(A:A,A1)>1
So this will put a TRUE next to every number that has one or more duplicates in the list.
From here I need to see if this second column contains a TRUE.
So I have a final cell with this formula in it:
=lookup(true, B:B)
This always displays FALSE, even when there are duplicates in the list, with corresponding "TRUE" values next to them in column B.
Also, is there a simpler way of solving this problem?
Note: I can get it to work if the single cell result simply does an =OR(B:B) but I still want to know why my first way won't work and if there is an all around simpler way of doing this.
You can use both =unique(A:A) and =counta(unique(A:A)).
Note: A:A is just a dummy range I used as an example; replace it with whatever column you want to refer to.
To get a final yes or no, you could nest them together: =if(eq(counta(A:A),counta(unique(A:A))),"No Duplicates", "Contains Duplicates")
I'm not sure whether this is simpler (I am confident the formula could be simplified!), but copy/pasting the following might be deemed so:
=sum(if(ARRAYFORMULA(countif(A:A,A1:A)>1),1,0))
This should return 0 only if there are no duplicates. If a single entry is repeated twice (three instances) and all other values are unique, the result should be 3.
The handling of true is curious: the behaviour is not what I expected, and it differs from Excel, where true would be converted to TRUE, which normally indicates an automatic change from text to function. I don't have an explanation, but it may be connected with lookup, because the boolean behaves as I would expect in, say, an if formula.
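The two formula ideas above (comparing the total count with the unique count, and summing the entries whose count exceeds 1) can be sketched in Python to check the logic. The function names are hypothetical, and the list stands in for a column with no blank cells.

```python
from collections import Counter


def has_duplicates(values):
    """Analogue of comparing COUNTA(A:A) with COUNTA(UNIQUE(A:A))."""
    return len(set(values)) != len(values)


def count_duplicated_entries(values):
    """Analogue of =sum(if(ARRAYFORMULA(countif(A:A,A1:A)>1),1,0))."""
    counts = Counter(values)
    return sum(1 for value in values if counts[value] > 1)


column = [1, 2, 2, 2, 3]  # one entry with three instances, the rest unique
print(has_duplicates(column))            # True
print(count_duplicated_entries(column))  # 3
print(has_duplicates([1, 2, 3]))         # False
```

The second function returns 3 for this column, matching the "three instances" case described above, and 0 when every value is unique.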
