In SPSS, I have a dataset with over 6000 people in it. Each person has taken a test at least twice, so has at least two results for the same test; some people have taken it more than twice. Can SPSS check whether two of a person's tests fall within 6 months of each other and, if they do, keep the results of those tests next to the person and delete all other results?
Structure of the data
PersonNo, Test Date, Test Result, Test Date 2, Test Result 2, Test Date 3, Test Result 3
PersonNo 512, 23-Aug-18, 65, 22-May-18, 72
Problem
PersonNo 98432, 09-Feb-18, 74, 06-Nov-18, 76, 10-Aug-18, 67
PersonNo 91203, 10-Dec-18, 75, 10-Sep-18, 65
PersonNo 75432, 01-Jan-18, 65, 01-Dec-18, 65
How I want it
PersonNo 98432, 09-Feb-18, 74, 10-Aug-18, 67
PersonNo 91203, 10-Dec-18, 75, 10-Sep-18, 65
PersonNo 75432 is removed, as they don't have two test results within 6 months
To see which tests were taken within 6 months of the TestDate variable you can use DATEDIFF(date1, date2, units). Since it's unclear how many TestDate fields you have, you may want to reorder your variables to use the VECTOR command to loop through them.
* assumes there are up to 11 tests each respondent may have taken.
VECTOR nextTestDate = TestDate2 TO TestDate11 .
VECTOR nextTestResult = TestResult2 TO TestResult11 .
LOOP #i = 1 TO 10 .
* if not within 6 months then set date & result to sysmis .
DO IF (ABS(DATEDIFF(TestDate, nextTestDate(#i), 'days')) > 182) .
COMPUTE nextTestDate(#i) = $SYSMIS .
COMPUTE nextTestResult(#i) = $SYSMIS .
END IF .
END LOOP .
EXE .
You don't need to do this within VECTOR if you only have a couple of TestDate fields to check. From here you can delete any variables that no longer have any data in them (easily checked using DESC TestResult2 TO TestResult11).
Instead of a double loop and multiple comparisons, I suggest a restructure which enables sorting and comparison of only consecutive tests.
First I'm creating a small fake dataset to demonstrate on:
data list free/PersonNo (f6) date1(Date11) score1(f3) Date2(date11) score2 (f3) Date3 (date11) score3(f3) Date4 (date11) score4(f3).
begin data
98432, 09-Feb-18, 74, 06-Nov-18, 76, 10-Aug-18, 67, ,
91203, 10-Dec-18, 75, 10-Sep-18, 65, , , ,
75432, 01-Jan-18, 65, 01-Dec-18, 65, , , ,
12345, 19-Mar-18, 74, 26-Dec-19, 55, 10-Aug-18, 81, 19-Feb-19, 77
end data.
Now for the actual task:
* first step - restructuring to long format.
varstocases /make date from date1 date2 date3 date4/make score from score1 score2 score3 score4.
* now it is possible to sort by test date, compare the dates and keep only the relevant ones.
sort cases by PersonNo date.
if $casenum>1 and PersonNo=lag(PersonNo) cond=DATEDIFF(date, lag(date), 'days') < 182.
create cond2=lead(cond,1).
select if cond or cond2.
exe.
* At this point you have only the relevant persons and tests left.
You might continue your analysis in this structure,
but if you want, the following code gets you back to the original structure.
compute ind=1.
if $casenum>1 and PersonNo=lag(PersonNo) ind=lag(ind)+1.
format ind(f1).
casestovars /id=PersonNo /index=ind /drop cond cond2 /groupby=index /separator="".
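For readers who want to check the logic outside SPSS, here is a minimal Python sketch of the same idea: sort each person's tests by date and keep a test only if the previous or next test by the same person is fewer than 182 days away. The function name `keep_close_tests` and the tuple layout are my own assumptions for illustration, not anything from SPSS.

```python
from datetime import date

def keep_close_tests(tests, max_days=182):
    """tests: list of (person, test_date, score) tuples (hypothetical layout).
    Keep a test only if a consecutive test by the same person is
    fewer than max_days away from it."""
    rows = sorted(tests, key=lambda r: (r[0], r[1]))
    keep = [False] * len(rows)
    for i in range(1, len(rows)):
        same_person = rows[i][0] == rows[i - 1][0]
        if same_person and (rows[i][1] - rows[i - 1][1]).days < max_days:
            # both members of a close pair are kept
            keep[i - 1] = keep[i] = True
    return [row for row, flag in zip(rows, keep) if flag]

sample = [
    (98432, date(2018, 2, 9), 74),
    (98432, date(2018, 11, 6), 76),
    (98432, date(2018, 8, 10), 67),
    (91203, date(2018, 12, 10), 75),
    (91203, date(2018, 9, 10), 65),
    (75432, date(2018, 1, 1), 65),
    (75432, date(2018, 12, 1), 65),
]
kept = keep_close_tests(sample)
for row in kept:
    print(row)
```

One thing this makes visible: 09-Feb-18 and 10-Aug-18 are exactly 182 days apart, so a strict "fewer than 182 days" test does not count them as a pair. If that pair should count as within 6 months, compare with `<=` or use a slightly larger limit.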
I'm trying to get Google Sheets functions to calculate the difference in minutes between two bedtimes and have been spinning my wheels for at least five hours on this. Here are four examples of what I'm trying to accomplish:
BEDTIME 1 BEDTIME 2 DIFF IN MINS
9:00 PM 9:15 PM 15
9:00 PM 10:00 PM 60
11:30 PM 1:00 AM 90
1:00 AM 11:00 PM 120
As you can see, the date doesn't figure at all. I apologize for not offering up code, but I've tried at least half a dozen approaches from other answers and they aren't working -- mainly, I suspect, because most people are looking to find the time elapsed between the two times whereas I'm looking to determine "how much earlier" or "how much later" one bedtime is relative to another (always expressed as a positive value).
Any help would be appreciated. Thanks.
Times are stored as numbers between 0 and 1. If you subtract two times and multiply the result by 24 x 60 = 1440 and format as a number you’ll get number of minutes. I think you’ll need something like:
=MIN(ABS(1440*(B1-A1)), ABS(1440*(B1-A1-1)), ABS(1440*(B1-A1+1)))
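To see why comparing against the shifted differences works, here is a small Python sketch of the same wrap-around logic, using minutes since midnight instead of day fractions. The function name `bedtime_gap_minutes` and the `(hour, minute)` input are assumptions made for the illustration.

```python
def bedtime_gap_minutes(t1, t2):
    """t1, t2: (hour, minute) on a 24-hour clock (hypothetical input format).
    Returns the smaller of the two ways round the clock, in minutes."""
    m1 = t1[0] * 60 + t1[1]
    m2 = t2[0] * 60 + t2[1]
    diff = abs(m2 - m1)
    # going the other way round the clock takes 1440 - diff minutes
    return min(diff, 1440 - diff)

examples = [
    ((21, 0), (21, 15)),   # 9:00 PM vs 9:15 PM
    ((21, 0), (22, 0)),    # 9:00 PM vs 10:00 PM
    ((23, 30), (1, 0)),    # 11:30 PM vs 1:00 AM
    ((1, 0), (23, 0)),     # 1:00 AM vs 11:00 PM
]
gaps = [bedtime_gap_minutes(a, b) for a, b in examples]
print(gaps)
```

Taking `min(diff, 1440 - diff)` gives the same result as the minimum of the three ABS terms in the formula, because shifting by a whole day in either direction covers both ways of crossing midnight.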
The difference between two times is a duration. The question requests that durations be converted to "digital minutes", but that is often not as readable as one would think. 175 minutes is more difficult to understand than 2:55 hours.
There is therefore usually no point in multiplying by 24 * 60 — instead, just use the duration value as is:
=min( abs(B2 - A2), abs(B2 - A2 - 1), abs(B2 - A2 + 1) )
Format the result cell as Format > Number > Duration.
See this answer for an explanation of how date and time values work in spreadsheets.
Use an array formula:
=INDEX(IFERROR(1/(1/TRANSPOSE(QUERY(TRANSPOSE(
IF(A2:A&B2:B="", 0, ABS(1440*(B2:B-A2:A+{-1, 0, 1})))),
"select "&TEXTJOIN(",", 1,
"min(Col"&ROW(A2:A)-ROW(A2)+1&")"))))),, 2)
or:
=INDEX(IFERROR(1/(1/QUERY(SPLIT(FLATTEN(
ROW(A2:A)&"×"&ABS(1440*(B2:B-A2:A+{-1, 0, 1}))), "×"),
"select min(Col2) group by Col1 label min(Col2)''"))))
Try using an absolute-value (modulus) function, such as ABS, in your formula. It works like this:
If x = -5, then y = f(x) = -(-5) = 5, since x is less than zero
If x = 10, then y = f(x) = 10, since x is greater than zero
If x = 0, then y = f(x) = 0, since x is equal to zero
That way the time difference is never reported as a negative value.
Let's say I have 12 non-empty cells:
A
B
C
D
E
F
G
H
I
J
K
L
I usually use the following formula:
=IF(COUNTIF(A:A,"<>")=12,"OK","ERROR")
But if there are 8 non-empty cells I also want it to be OK, so I change it to:
=IF(COUNTIF(A:A,"<>")=12,"OK",IF(COUNTIF(A:A,"<>")=8,"OK","ERROR"))
I need to add more IF functions for all numbers multiple of 4, as they are all OK.
Is there any way to already warn a formula that whenever it is a multiple of 4, such as 4, 8, 12, 16, 20, 24 and so on, return the value OK?
Following the tip given by the user @Calculuswhiz in a comment on the question (OK if the number of non-empty cells is a multiple of 4), a simple way to solve the problem is the MOD function, which returns the result of the modulo operator, that is, the remainder of a division.
When the remainder is 0, the number is a multiple of the divisor, so the formula that solves the problem is:
=IF(MOD(COUNTIF(A:A,"<>"),4)=0,"OK","ERROR")
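The same check can be sketched in Python to make the remainder logic explicit. The function name `check_count` and the use of empty strings for empty cells are assumptions for the example.

```python
def check_count(cells, group_size=4):
    """Return "OK" when the number of non-empty cells is a multiple of group_size.
    Empty cells are represented here as empty strings (an assumption)."""
    non_empty = sum(1 for c in cells if c != "")
    return "OK" if non_empty % group_size == 0 else "ERROR"

print(check_count(list("ABCDEFGHIJKL")))        # 12 non-empty cells
print(check_count(list("ABCDEFGH") + ["", ""])) # 8 non-empty cells, 2 empty
print(check_count(list("ABCDEFG")))             # 7 non-empty cells
```

Note that a count of 0 also passes the check, since 0 is a multiple of 4; add a separate condition if an empty column should be an error.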
I have a problem. We have coded item names that contain certain values I need for calculations.
For example, from ASG-120U9624M I need to extract only 120, 96 and 24, as they are parameters required for the calculations. The 96 could also be a value such as 220 (2-3 digits), and the 24 can only be 12 or 24. I know you can get the values that follow certain symbols (e.g. -, U), but can you detect that the middle value ends before the 12/24? If the 96 could only be 2 digits it would be easy, but as it stands this is beyond my knowledge. I need some help.
B1:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A1:A, "-(\d+)U")))
C1:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A1:A, "U(\d+)..M")))
D1:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A1:A, ".+(\d{2})M")))
Try this:
=ARRAYFORMULA(IFNA(IF(IFERROR(LEN(REGEXEXTRACT(A1:A, ".*U(\d{4})M")), 5) = 4, REGEXEXTRACT(A1:A, "^ASG-(\d{3})U(\d{2})(\d{2})M$"), REGEXEXTRACT(A1:A, "^ASG-(\d{3})U(\d{3})(\d{2})M$"))))
IFERROR(LEN(REGEXEXTRACT(A1:A, ".*U(\d{4})M")), 5) = 4 - checks whether there are exactly 4 digits between U and M (IFERROR returns 5 when there is no 4-digit match)
REGEXEXTRACT(A1:A, "^ASG-(\d{3})U(\d{2})(\d{2})M$") - use this regex if number of digits is 4.
REGEXEXTRACT(A1:A, "^ASG-(\d{3})U(\d{3})(\d{2})M$") - use this regex if number of digits is 5.
Let's say your raw data runs A2:A. Place the following in B2:
=ArrayFormula(IF(A2:A="",,REGEXEXTRACT(A2:A,"(\d+)\D(\d+)(12|24)")))
This one formula will extract all three columns of numbers.
The regex captures three groups, each contained in parentheses. It reads: "One or more digits, followed by one non-digit, followed by one or more digits, up to a final 12 or 24."
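To see how the backtracking splits the middle value from the trailing 12/24, here is a quick Python sketch with the same pattern. The second item code, ASG-120U22012M, is a made-up example of the 3-digit case, and `extract_params` is a hypothetical helper name.

```python
import re

PATTERN = re.compile(r"(\d+)\D(\d+)(12|24)")

def extract_params(code):
    """Return the three numeric parts of an item code as ints, or None if no match."""
    match = PATTERN.search(code)
    return tuple(int(g) for g in match.groups()) if match else None

print(extract_params("ASG-120U9624M"))   # code from the question
print(extract_params("ASG-120U22012M"))  # made-up 3-digit middle value
```

The second group is greedy, so it first grabs every digit after the U and then gives digits back until what follows is a 12 or a 24.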
I have a "raw" data set that I'm trying to clean. The data set consists of individuals with the variable age recorded for the years 2000 to 2010. There are around 20000 individuals in the data set with the same problem.
The variable age is not increasing in the years 2004-2006. For example, for one individual it looks like this:
2000: 16,
2001: 17,
2002: 18,
2003: 19,
2004: 19,
2005: 19,
2006: 19,
2007: 23,
2008: 24,
2009: 25,
2010: 26,
So far I have tried to generate variables for the max age and max year:
bysort id: egen last_year=max(year)
bysort id: egen last_age=max(age)
and then use foreach combined with lags to replace the age variable in decreasing order, so that the new variable last_age (which is now 26 in all years) instead looks like this:
2010: 26
2009: 25 (26-1)
2008: 24 (26-2) , and so on.
However, I have some problem with finding the correct code for this problem.
Assuming that for each individual the first value of age is not missing and is correct, something like this might work:
bysort id (year): replace age = age[1]+(year-year[1])
Alternatively, if the last value of age is assumed to always be accurate,
bysort id (year): replace age = age[_N]-(year[_N]-year)
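Or, just fix the ages where age fails to increase from one observation to the next (replace works through the observations in order, so using <= rather than == lets the correction carry on through a run of repeated values)
bysort id (year): replace age = age[_n-1]+(year-year[_n-1]) if _n>1 & age<=age[_n-1]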
In the absence of sample data none of these have been tested.
William's code is very much to the point, but a few extra remarks won't fit easily into a comment.
Suppose we have age already and generate two other estimates going forward and backward as he suggests:
bysort id (year): gen age2 = age[1] + (year - year[1])
bysort id (year): gen age3 = age[_N] - (year[_N] - year)
Now if all three agree, we are good, and if two out of three agree, we will probably use the majority vote. Either way, that is the median; the median will be, for 3 values, the sum MINUS the minimum MINUS the maximum.
gen median = (age + age2 + age3) - max(age, age2, age3) - min(age, age2, age3)
If we get three different estimates, we should look more carefully.
edit age* if max(age, age2, age3) > median & median > min(age, age2, age3)
A final test is whether medians increase in the same way as years:
bysort id (year) : assert (median - median[_n-1]) == (year - year[_n-1]) if _n > 1
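As a sanity check of the median idea outside Stata, here is a minimal Python sketch using the single individual from the question. The variable names `forward`, `backward` and `voted` are mine, chosen for the illustration.

```python
from statistics import median

years = list(range(2000, 2011))
ages = [16, 17, 18, 19, 19, 19, 19, 23, 24, 25, 26]  # example from the question

# estimate counting forward from the first age, and backward from the last
forward = [ages[0] + (y - years[0]) for y in years]
backward = [ages[-1] - (years[-1] - y) for y in years]

# the median of three values is the majority vote when two of them agree
voted = [median(triple) for triple in zip(ages, forward, backward)]

# final check: the corrected ages rise by one per year
steps_ok = all(b - a == 1 for a, b in zip(voted, voted[1:]))
print(voted, steps_ok)
```

For this individual the forward and backward estimates agree in every year, so the stuck values for 2004-2006 are outvoted.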
I wrote a small script that creates a Fibonacci sequence and returns the sum of all its even integers.
function even_fibo()
  -- create Fibonacci sequence
  local fib = {1, 2} -- starting with 1, 2
  for i=3, 10 do
    fib[i] = fib[i-2] + fib[i-1]
  end
  -- calculate sum of even numbers
  local fib_sum = 0
  for _, v in ipairs(fib) do
    if v%2 == 0 then
      fib_sum = fib_sum + v
    end
  end
  return fib_sum
end
fib = even_fibo()
print(fib)
The function creates the following sequence:
1, 2, 3, 5, 8, 13, 21, 34, 55, 89
And returns the sum of its even numbers: 44
However, when I change the stop index from 10 to 100, in for i=3, 100 do, the returned sum is negative (-8573983172444283806) because the values become too big.
Why is my code working for 10 and not for 100?
Prior to version 5.3, Lua always stored numbers internally as floats. In 5.3, Lua numbers can be stored internally as integers or floats, and integer arithmetic wraps around on overflow, which is where the negative sum comes from. One option is to run Lua 5.2; I think you'll find your code works as expected there. The other option is to initialize your array with floats, which will promote all future operations on them to floats:
local fib = {1.0, 2.0}
Here is a hack written in hindsight.
The code exploits the mathematical fact that the even Fibonacci numbers are exactly those at indices that are multiples of 3.
This allows us to avoid testing the parity of very large numbers and provides high-order digits that are correct when you do the computation in floating-point. Then we redo it looking only at the low-order digits and combine the results. The output is 286573922006908542050, which agrees with Wolfram Alpha. Values of d between 5 and 15 work fine.
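-- pass 1: floating-point sum, correct in its high-order digits
a,b=0.0,1.0
s=0
d=10
for n=1,100/3 do
  a,b=b,a+b
  a,b=b,a+b
  s=s+b
  a,b=b,a+b
end
-- keep everything except the last d digits
h=string.format("%.0f",s):sub(1,-d-1)
-- pass 2: same sum modulo 10^d, exact in its low-order d digits
m=10^d
a,b=0,1
s=0
for n=1,100/3 do
  a,b=b,(a+b)%m
  a,b=b,(a+b)%m
  s=(s+b)%m
  a,b=b,(a+b)%m
end
s=string.format("%0"..d..".0f",s)
print(h..s)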
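As a cross-check of the figures in this thread, here is a short Python sketch. Python integers have arbitrary precision, so the exact sum can be computed directly and then wrapped to a signed 64-bit value to imitate what Lua 5.3 integer arithmetic does on overflow (assuming a standard build with 64-bit integers). The helper name `wrap_int64` is my own.

```python
def wrap_int64(n):
    """Reduce n to the signed 64-bit range, as two's-complement overflow would."""
    n &= (1 << 64) - 1
    return n - (1 << 64) if n >= (1 << 63) else n

# same sequence as the question: 100 terms starting 1, 2
fib = [1, 2]
while len(fib) < 100:
    fib.append(fib[-2] + fib[-1])

exact_sum = sum(v for v in fib if v % 2 == 0)
wrapped_sum = wrap_int64(exact_sum)

print(exact_sum)    # exact value
print(wrapped_sum)  # what a wrapping 64-bit sum ends up as
```

Wrapping only the final sum is enough here because reducing modulo 2^64 does not change whether a value is even or odd, so the overflowed terms keep their parity.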