Converting some values in a variable from minutes to hours while keeping the rest the same - spss

In a questionnaire I asked people how long they usually sleep per night. Replies were supposed to be in h.mm format, but the field was accidentally set to h only. This is why some participants gave their sleep duration in minutes or simply wrote "830" for 8 hours 30 minutes. I wanted to correct for both variations and tried this first:
RECODE Sleep (1=1) (2=2) (3=3) (4=4) (5=5) (6=6) (7=7) (8=8) (9=9) (10=10)(11=11)(12=12)(13=13)(14=14)(801 thru 899 = 8.5) (701 thru 799 = 7.5)(200 thru 600 = (Sleep/60))
INTO Sleep_rek.
RECODE Sleep_rek(LOWEST thru 4.5 = 0) (5, 6= 1) (6.5 thru 7 = 2) (7.5 thru HIGHEST=3)
INTO PSQI_K3_SleepDuration.
Execute.
Since it didn't work and I thought maybe Recode can't do that, I tried this instead:
RECODE Sleep (1=1) (2=2) (3=3) (4=4) (5=5) (6=6) (7=7) (8=8) (9=9) (10=10)(11=11)(12=12)(13=13)(14=14)(801 thru 899 = 8.5) (701 thru 799 = 7.5)
INTO Sleep_rek.
IF (Sleep = (300 thru 600) & Sleep_rek = sysmis) Sleep_rek = (Sleep/60).
RECODE Sleep_rek(LOWEST thru 4.5 = 0) (5, 6= 1) (6.5 thru 7 = 2) (7.5 thru HIGHEST=3)
INTO PSQI_K3_SleepDuration.
Execute.
However this threw an error as well:
The code is incomplete, check for missing operands, invalid operands,
non-matching parentheses or too long strings.
The PSQI_K3_SleepDuration variable in the end was computed but anyone with a 300 to 600 value is still a missing value.
Can anyone tell me how to put it so it will work?

I can see a few errors in your syntax - see if it works once you've corrected them.
Indeed, recode ... (200 thru 600 = (Sleep/60)) cannot work: you can't use a calculation as the target value.
recode ... (5, 6= 1) should be (5 6= 1) instead (no comma needed).
IF (Sleep = (300 thru 600) & Sleep_rek = sysmis) Sleep_rek = (Sleep/60) has a couple of errors: thru is a keyword of recode and can't be used like this, and "sysmis" can't be used this way either. Corrected version:
IF (Sleep>= 300 and Sleep<=600) and missing(Sleep_rek) Sleep_rek = (Sleep/60).
Your last recode is fine syntax-wise, but the ranges you are using seem to be wrong: since you'll be dividing numbers from 300 to 600 by 60, you will only get fractional values between 5 and 10. Yet the recode covers fractions below 4.5 and doesn't cover them between 5 and 7.5 (except the specific values 5, 6, 6.5 and 7).
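The cleaning rules discussed in this thread can also be written down outside SPSS to check the intended logic. The following Python function is only an illustrative sketch: the name clean_sleep is hypothetical, and the ranges simply mirror the ones used in the question, so they are assumptions about how the raw entries should be read.

```python
def clean_sleep(raw):
    """Return sleep duration in hours, or None if the entry cannot be interpreted."""
    if raw is None:
        return None
    if 1 <= raw <= 14:       # already given in hours
        return float(raw)
    if 300 <= raw <= 600:    # given in minutes, so convert to hours
        return raw / 60
    if 701 <= raw <= 799:    # "7xx" entries, mapped to 7.5 h as in the question
        return 7.5
    if 801 <= raw <= 899:    # "8xx" entries, mapped to 8.5 h as in the question
        return 8.5
    return None              # anything else stays missing


cleaned = [clean_sleep(v) for v in (8, 450, 830, 650)]
```

Entries that fall outside every range stay missing, which matches what a RECODE ... INTO does when no rule applies.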


group_by_day and sum work incorrectly

I have a code:
total_rate = company.crews
.joins(:assignment)
.where(assignments: { date: week[:from]..week[:to] })
.group_by_day('assignments.date', format: '%Y-%m-%d')
.joins(:workers)
.sum('workers.rate')
In the output, for a crew with 2 workers whose rates are 4 and 5, the sum should be 9, and that is what I get.
With a plain .sum('workers.rate') in the last line, everything is grouped and summed correctly,
and the output is
{13:00 => 9}, where 9 is worker.first.rate = 4 plus worker.last.rate = 5.
But when I try to also multiply by the time worked, every worker's rate is multiplied by the time difference separately.
What I need is the summed worker rate times the time, 9 * 13:00,
.sum('workers.rate * (time_arriving - time_leaving)')
not 4 * 13.00 + 5 * 13.00
That’s the trouble ...
Assuming Postgres:
'time' - 'time' will return an interval, multiplying it with an integer will not change the type.
You will need to extract the hours first, then you can multiply it by the rate.
This query shows you how to get the hours:
select extract(epoch from time_leaving - time_starting) / 3600 from assignments
Note that select extract(hour from time_leaving - time_starting) from assignments will only extract full hours, so 12:00 - 10:15 will return '1'
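The difference between the two extractions can be illustrated with Python's datetime.timedelta, which behaves similarly to an interval here. This is only an analogy and not Postgres itself; the rate_sum value is a hypothetical stand-in for the summed worker rate from the question.

```python
from datetime import timedelta

# 12:00 - 10:15, i.e. 1 hour 45 minutes
worked = timedelta(hours=12) - timedelta(hours=10, minutes=15)

# Analogous to extract(epoch from ...) / 3600: fractional hours
fractional_hours = worked.total_seconds() / 3600

# Analogous to extract(hour from ...): only the full hours
whole_hours = worked.seconds // 3600

rate_sum = 9  # hypothetical summed worker rate
cost = rate_sum * fractional_hours
```

Multiplying by the fractional hours keeps the 45 minutes that the whole-hour extraction would drop.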

Modular arithmetic using fractions

I'm stuck on this cryptography problem using multiplication of a whole number and a fraction mod 10.
Here is the equation:
7 * (4/11) mod 10 =?
I know I am supposed to convert this to an integer since the mod operator does not work with fractions, but I cannot figure this one out. Obviously,
7 * (4/11) = 28/11,
but I cannot get the mod 10 of a fraction. The instructor wants the exact answer, not a decimal. Any help would be greatly appreciated!
Have a look here: "Is it possible to do modulo of a fraction" on math.stackexchange.com.
One natural way to define the modular function is
a (mod b) = a − b ⌊a / b⌋
where ⌊⋅⌋ denotes the floor function. This is the approach used in the influential book Concrete Mathematics by Graham, Knuth, Patashnik.
This will give you 1/2(mod3)=1/2.
To work through your problem, you have a = 7 * (4/11) = 28/11, and b = 10.
a / b = (28/11)/10 = 0.25454545...
⌊a/b⌋ = 0
b ⌊a/b⌋ = 10 * 0 = 0
a - b ⌊a/b⌋ = 28/11 - 0 = 28/11
This means your answer is 28/11.
Wolfram Alpha agrees with me and gives 28/11 as the exact result. Google also agrees, but gives it as a decimal, 2.54545454.....
A fraction is an exact answer and not a decimal.
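The floor-based definition can be checked with exact arithmetic. This is a small sketch using Python's fractions module; the helper name floor_mod is hypothetical.

```python
from fractions import Fraction
from math import floor


def floor_mod(a, b):
    """a (mod b) defined as a - b * floor(a / b), evaluated exactly."""
    return a - b * floor(a / b)


a = 7 * Fraction(4, 11)    # 28/11
result = floor_mod(a, 10)  # stays an exact fraction, no decimals involved
half_mod_three = floor_mod(Fraction(1, 2), 3)
```

Python's built-in % operator on Fraction follows the same floor convention, so a % 10 gives the same value.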
8
8 is the correct answer indeed.
7*4/11 mod 10 means we're looking at 7*4*x mod 10 where x is the modular inverse of 11 modulo 10, which means that 11*x mod 10 = 1.
This is true for x=1 (11*1 mod 10 = 1)
So 7*4*x mod 10 becomes 7*4*1 mod 10 which is 28 mod 10 = 8
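The modular-inverse reading can be verified in Python, assuming version 3.8 or later, where the built-in pow accepts a negative exponent together with a modulus:

```python
# Modular inverse of 11 modulo 10: the x with (11 * x) % 10 == 1
inverse_of_11 = pow(11, -1, 10)

# 7 * 4 * 11^(-1) evaluated modulo 10
answer = (7 * 4 * inverse_of_11) % 10
```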
I can speculate that the notation is wrong, and that the whole expression is supposed to be evaluated in mod 10 at each intermediate stage. Since (11 mod 10) is 1, the answer is (7 * 4) mod 10 = 8.
Imagine a calculator with support only for the ones digit.
I'm not saying this is the right answer, I agree 28/11 is the right answer as given, but I am trying to get into the head of the professor. This is common in cryptography, where every calculation is performed mod 2 ^ 256 or so.
This is how the original question probably should have been written, as this has a different meaning. When the (mod 10) is written at the end, it means that each term is evaluated with an implied mod 10 operation.
The problem is a bit weird, as the modulo value of 10 is not general purpose, because it is not prime. For example, the following can not be evaluated because 1/2 mod 10 is not defined, because 2 and 10 are not coprime.
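Whether a modular inverse exists comes down to a gcd check. A minimal sketch, with the hypothetical helper has_inverse and assuming Python 3.8+ for the three-argument pow with a negative exponent:

```python
from math import gcd


def has_inverse(a, m):
    """a is invertible modulo m exactly when a and m are coprime."""
    return gcd(a, m) == 1


eleven_ok = has_inverse(11, 10)  # 11 and 10 are coprime
two_ok = has_inverse(2, 10)      # 2 and 10 share the factor 2

# pow refuses to compute an inverse that does not exist
try:
    inverse_of_2 = pow(2, -1, 10)
except ValueError:
    inverse_of_2 = None
```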
So, here is the correct answer from the instructor. I have no idea how he came up with this:
7 * 4/11 mod 10 = ((7 * 4) mod 10)(11^(-1) mod 10) mod 10
= (28 mod 10)(1 mod 10) mod 10
= (8)(1) mod 10
= 8 mod 10
Using Python:
from fractions import Fraction
from math import fmod
print (fmod(Fraction(28, 11), 10))
The result will be 2.545454545454. So I guess 8 is wrong.

Moving Average across Variables in Stata

I have a panel data set for which I would like to calculate moving averages across years.
Each year is a variable for which there is an observation for each state, and I would like to create a new variable for the average of every three year period.
For example:
P1947=rmean(v1943 v1944 v1945), P1947=rmean(v1944 v1945 v1946)
I figured I should use a foreach loop with the egen command, but I'm not sure about how I should refer to the different variables within the loop.
I'd appreciate any guidance!
This data structure is quite unfit for purpose. Assuming an identifier id you need to reshape, e.g.
reshape long v, i(id) j(year)
tsset id year
Then a moving average is easy. Use tssmooth or just generate, e.g.
gen mave = (L.v + v + F.v)/3
or (better)
gen mave = 0.25 * L.v + 0.5 * v + 0.25 * F.v
More on why your data structure is quite unfit: Not only would calculation of a moving average need a loop (not necessarily involving egen), but you would be creating several new extra variables. Using those in any subsequent analysis would be somewhere between awkward and impossible.
EDIT I'll give a sample loop, while not moving from my stance that it is poor technique. I don't see a reason behind your naming convention whereby P1947 is a mean for 1943-1945; I assume that's just a typo. Let's suppose that we have data for 1913-2012. For means of 3 years, we lose one year at each end.
forval j = 1914/2011 {
local i = `j' - 1
local k = `j' + 1
gen P`j' = (v`i' + v`j' + v`k') / 3
}
That could be written more concisely, at the expense of a flurry of macros within macros. Using unequal weights is easy, as above. The only reason to use egen is that it doesn't give up if there are missings, which the above will do.
FURTHER EDIT
As a matter of completeness, note that it is easy to handle missings without resorting to egen.
The numerator
(v`i' + v`j' + v`k')
generalises to
(cond(missing(v`i'), 0, v`i') + cond(missing(v`j'), 0, v`j') + cond(missing(v`k'), 0, v`k'))
and the denominator
3
generalises to
!missing(v`i') + !missing(v`j') + !missing(v`k')
If all values are missing, this reduces to 0/0, or missing. Otherwise, if any value is missing, we add 0 to the numerator and 0 to the denominator, which is the same as ignoring it. Naturally the code is tolerable as above for averages of 3 years, but either for that case or for averaging over more years, we would replace the lines above by a loop, which is what egen does.
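The same numerator and denominator logic can be sketched outside Stata. In this hypothetical Python illustration, missing values are represented by None and the yearly values of one state are held in a dict keyed by year; both choices are assumptions made for the example.

```python
def centered_mean(values, j):
    """Mean of years j-1, j, j+1, ignoring missing (None) entries.

    Returns None when all three values are missing (the 0/0 case).
    """
    window = [values.get(year) for year in (j - 1, j, j + 1)]
    present = [v for v in window if v is not None]
    if not present:
        return None
    return sum(present) / len(present)


v = {1913: 2.0, 1914: None, 1915: 4.0, 1916: 6.0}
p1914 = centered_mean(v, 1914)  # averages 1913 and 1915 only
p1915 = centered_mean(v, 1915)  # averages 1915 and 1916 only
```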
There is a user written program that can do that very easily for you. It is called mvsumm and can be found through findit mvsumm
xtset id time
mvsumm observations, stat(mean) win(t) gen(new_variable) end

Do a predefined loop consisting of 4 variables 100 times

I am pretty new to SPSS macros, but I think I need one.
I have 400 variables, I want to do this loop 400 times. My variables are ordered consecutively. So first I want to do this loop for variables 1 to 4, then for variables 5 to 8, then for variables 9 to 12 and so on.
vector TEQ5DBv=T0EQ5DNL to T4EQ5DNL.
loop #index = 1 to 4.
+ IF( MISSING(TEQ5DBv(#index+1))) TEQ5DBv(#index+1) = TEQ5DBv(#index) .
end loop.
EXECUTE.
Below is an example of what it appears to me you are trying to do. Note I replaced your use of the looping and index with a do repeat command. To me it is just more clear what you are doing by making two lists in the do repeat command as opposed to calling lead indexes in your loop.
*making data.
DATA LIST FIXED /X1 to X4 1-4.
BEGIN DATA
1111
0101
1 0
END DATA.
*I make new variables, so you dont overwrite your original variables.
vector X_rec (4,F1.0).
do repeat X_rec = X_rec1 to X_rec4 / X = X1 to X4.
compute X_rec = X.
end repeat.
execute.
do repeat X_later = X_rec2 to X_rec4 / X_early = X1 to X3.
if missing(X_later) = 1 X_later = X_early.
end repeat.
execute.
A few notes on this. Previously your code was overwriting your initial variables; in this code I create a set of new variables named "X_rec1 ... X_rec4" and then set their values to those of the original variables (X1 to X4). The second do repeat command fills in a recoded variable with the value of the previous variable if it is missing. One big difference between this and your prior code: if you ran your prior code repeatedly, it would continue to fill in the missing data, whereas my code would not. If you want to continue to fill in the missing data, you would just have to replace X_early = X1 to X3 in the code above with X_early = X_rec1 to X_rec3 and then run the code at least 3 times (of course, if a case has missing data for all four variables, it will all still be missing). Below is a macro to simplify calling this repeated code.
SET MPRINT ON.
DEFINE !missing_update (list = !TOKENS(1)).
!LET !list_rec = !CONCAT(!list,"_rec")
!LET !list_rec1 = !CONCAT(!list_rec,"1")
!LET !list_rec2 = !CONCAT(!list_rec,"2")
!LET !list_rec4 = !CONCAT(!list_rec,"4")
!LET !list_1 = !CONCAT(!list,"1")
!LET !list_3 = !CONCAT(!list,"3")
!LET !list_4 = !CONCAT(!list,"4")
vector !list_rec (4,F1.0).
do repeat UpdatedVar = !list_rec1 to !list_rec4 / OldVar = !list_1 to !list_4.
compute UpdatedVar = OldVar.
end repeat.
execute.
do repeat UpdatedVar = !list_rec2 to !list_rec4 / OldVar = !list_1 to !list_3.
if missing(UpdatedVar) = 1 UpdatedVar = OldVar.
end repeat.
execute.
!ENDDEFINE.
*dropping recoded variables I made before.
match files file = *
/drop X_rec1 to X_rec4.
execute.
!missing_update list = X.
I suspect there is a way to loop through all of the variables in the dataset without having to call the macro repeatedly for each set, but I'm not sure how to do it (it may not be possible within DEFINE, and you may have to resort to writing up a python program). Worst case you just have to write the above macro defined function 400 times!
Your loop syntax is incorrect because when #index reaches 4, your code tries to operate on TEQ5DBv(5). So you will definitely get an error.
I don't know what exactly you want to do, but a nested loop might help you to achieve your goal.
Here is an example:
* Creating some Data.
DATA LIST FIXED /v1 to v12 1-12.
BEGIN DATA
1234 9012
2 4 6 8 1 2
1 3 5 7 9 1
12 56 90
456 012
END DATA.
* Vectorset of variables
VECTOR vv = v1 TO v12.
LOOP #i = 1 TO 12 BY 4.
LOOP #j = 0 TO 2. /* inner Loop runs only up to "2" so you wont exceed your inner block.
IF(MISSING(vv(#i+#j+1))) vv(#i+#j+1) = vv(#i+#j).
END LOOP.
END LOOP.
EXECUTE.
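The nested-loop idea (an outer loop over blocks of four, an inner loop that stops before the end of each block) can be sketched in Python for a single case. The function name fill_forward_in_blocks is hypothetical and missing values are represented by None.

```python
def fill_forward_in_blocks(row, block_size=4):
    """Within each block, replace a missing value with the value before it.

    The first variable of a block is never filled from the previous block.
    """
    filled = list(row)
    for start in range(0, len(filled), block_size):       # outer loop: one block at a time
        end = min(start + block_size, len(filled))
        for k in range(start + 1, end):                   # inner loop: stays inside the block
            if filled[k] is None:
                filled[k] = filled[k - 1]
    return filled


case = [1, None, None, 4, None, 6, None, 8]
result = fill_forward_in_blocks(case)
```

Because the inner loop works on the already updated values, a run of consecutive missing values inside a block is filled in a single pass.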

Size of the array that Fortran can handle

I have 30000 files to process; each file has 80000 x 5 lines. I need to read all files and process them, finding the average of each line. I have written code in Fortran to read and extract all data from the files. It uses an array of (30000 x 80000), but my program could not go beyond (3300 x 80000). I need to add the 4th column of each file in steps of 300 files, i.e. the 4th column of the 1st file with the 4th column of the 301st file, the 4th column of the 2nd file with the 4th column of the 302nd file, and so on. Do you think this is because of a limit on the size of array that Fortran can handle? If so, is there any way to increase the size of array that Fortran can handle? What about the number of files? My code looks like this:
This program runs well.
      implicit double precision (a-h,o-z),integer(i-n)
      dimension x(78805,5),y(78805,5),den(78805,5)
      dimension b(3300,78805),bb(78805)
      character*70,fn
      nf = 3300 ! NUMBER OF FILES
      nj = 78804 ! Number of rows in file.
      ns = 300 ! No. of steps for files.
      ncores = 11 ! No of Cores
c--------------------------------------------------------------------
c--------------------------------------------------------------------
!Initialization
      do i = 0,nf
        do j = 1, nj
          x(j,1) = 0.0
          y(j,2) = 0.0
          den(j,4) = 0.0
c a(i,j) = 0.0
          b(i,j) = 0.0
c aa(j) = 0.0
          bb(j) = 0.0
        end do
      end do
c-------!Body program-----------------------------------------------
      iout = 6 ! Output Files upto "ns" no.
      DO i= 1,nf ! LOOP FOR THE NUMBER OF FILES
        write(fn,10)i
        open(1,file=fn)
        do j=1,nj ! Loop for the no of rows in the domain
          read(1,*)x(j,1),y(j,2),den(j,4)
          if(i.le.ns) then
c a(i,j) = prob(j,3)
            b(i,j) = den(j,4)
          else
c a(i,j) = prob(j,3) + a(i-ns,j)
            b(i,j) = den(j,4) + b(i-ns,j)
          end if
        end do
        close(1)
c ----------------------------------------------------------
c -----Write Out put [Probability and density matrix]-------
c ----------------------------------------------------------
        if(i.ge.(nf-ns)) then
          do j = 1, nj
c aa(j) = a(i,j)/(ncores*1.0)
            bb(j) = b(i,j)/(ncores*1.0)
            write(iout,*) int(x(j,1)),int(y(j,2)),bb(j)
          end do
          close(iout)
          iout = iout + 1
        end if
      END DO
   10 format(i0,'.txt')
      END
It's hard to say for sure because you haven't given all the details yet, but your problem is quite possibly that you are using a 32 bit compiler producing 32 bit executables and you are simply running out of address space.
Although your operating system supports 64 bit address space, your 32 bit process is still limited to 32 bit addresses.
You have found a limit at 3300*78805*8 which is just under 2GB and this supports my theory.
No matter what is the cause of your immediate problem, your fundamental problem is that you appear to be loading everything into memory at once. I've not closely studied your algorithm but on first inspection it seems likely that you could re-arrange it to avoid having everything in memory at once.
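One possible rearrangement, sketched here in Python rather than Fortran and only under the assumption that the running sums are all that is needed: since file i is only ever added to file i - ns, it is enough to keep ns accumulators instead of one row per file. The function name accumulate_by_step is hypothetical, and the columns argument stands in for reading the 4th column of each file in order.

```python
def accumulate_by_step(columns, ns):
    """Keep only ns running sums; file i is added to the slot of file i - ns."""
    sums = [None] * ns
    for index, column in enumerate(columns):
        slot = index % ns
        if sums[slot] is None:
            sums[slot] = list(column)          # first file for this slot
        else:
            sums[slot] = [s + c for s, c in zip(sums[slot], column)]
    return sums


# Tiny stand-in data: 4 "files" with 2 rows each, step size 2
totals = accumulate_by_step([[1, 2], [10, 20], [3, 4], [30, 40]], ns=2)

# Size of the array that still worked, in bytes (8 bytes per double precision value)
bytes_needed = 3300 * 78805 * 8
```

With this layout, memory use depends on ns and the number of rows, not on the number of files.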
