Fill blank variable with subsequent values - spss

I have a dataset structured like below;
id contracthours13 contracthours14 contracthours13u contracthours14u
12 . 13 . 13
13 30 30 . .
14 . . 15 16
15 . 5 6 7
If contracthours13 is missing I want the value in contracthours14 to move across. If this is missing then I want contacthours13u to move across and the same then for contracthours14u if the previous 3 are all missing. I know this is fairly simple syntax but I just can't get my head around how to do it without having the run simpler syntax 3 times. If anyone could help it would be greatly appreciated.
Edit: below is what I would like my dataset to look like afterwards.
id contracthours13
12 13
13 30
14 15
15 5

Look up VECTOR / LOOP examples.
DATA LIST FREE / ID CH13 CH14 CH13U CH14U.
BEGIN DATA.
1 -1 13 -1 -1
2 30 30 -1 -1
3 -1 -1 15 16
4 -1 5 6 7
END DATA.
DATASET NAME DSRaw.
RECODE ALL (-1=SYSMIS).
VECTOR V= CH14 TO CH14U.
LOOP #i = 1 TO 3 IF (NVALID(CH13)=0).
COMPUTE CH13=V(#i).
END LOOP IF NVALID(V(#i))=1.
LIST.
EXE.
**List**
ID CH13 CH14 CH13U CH14U
1.00 13.00 13.00 . .
2.00 30.00 30.00 . .
3.00 15.00 . 15.00 16.00
4.00 5.00 5.00 6.00 7.00
Number of cases read: 4 Number of cases listed: 4

Related

YouTube Data API returning inconsistent results with duplicates

There have been numerous questions about inconsistent results from the YouTube Data API: 1, 2, 3, 4, 5, 6. Most of them have accepted answers that seem to indicate there was a problem with the API request that was fixed by the instructions in the answers. But none of those situations apply to the API request discussed here.
There have also been two questions about duplicates in the API results: 1, 2. Both of them have the same answer, which says to use the next-page token. But both questions say the token was used, so that answer is not helpful.
Yesterday, I submitted a series of API requests to get the list of most-viewed videos about 3D printing. The first request in the series was:
https://www.googleapis.com/youtube/v3/search?q=3D print&type=video&maxResults=50&part=id,snippet&order=viewCount&key=<my key>
I ran that in a VBA sub, which took the next-page token from each result and resubmitted the URL with &pageToken=<nextPageToken> inserted.
The result was a list of 649 unique video IDs. So far so good.
After making some changes in the VBA code and seeing some duplicates in the result set, I went back today and ran the original VBA sub again. The result was again a list of 649 video IDs, but this time the list included duplicates and it also included IDs that were not in yesterday's list and was missing IDs that were there yesterday. Here is a comparison from the first two pages and the last two pages of the two result sets:
Page
# on page
# overall
Run 1
Run 2
Same as
Seq
Dup
1
1
1
f2mdMcf-fJs
f2mdMcf-fJs
1
1
2
2
WSauz5KVKTU
WSauz5KVKTU
2
Seq
1
3
3
zsSCUWs7k9Q
XYIUM5TkhMo
None
1
4
4
B5Q1J5c8oNc
zsSCUWs7k9Q
3
Seq
1
5
5
cUxIb3Pt-hQ
B5Q1J5c8oNc
4
Seq
1
6
6
4yyOOn7pWnA
LDjE28szwr8
None
1
7
7
3N46jQ0Xi3c
cUxIb3Pt-hQ
5
Seq
1
8
8
08dBVz8_VzU
4yyOOn7pWnA
6
Seq
...
1
13
13
oeKIe1ik2O8
e1rQ8YwNSDs
11
Seq
1
14
14
FrG_eSECfps
RVB2JreIcoc
12
Seq
1
15
15
pPQCwz2q96o
oeKIe1ik2O8
13
Seq
1
16
16
uo3KuoEiu3I
pPQCwz2q96o
15
NOT
1
17
17
0U6aIwd5h9s
uo3KuoEiu3I
16
Seq
...
1
47
47
ShGsW68zbIo
iu9rhqsvrPs
46
Seq
1
48
48
0q0xS7W78KQ
ShGsW68zbIo
47
Seq
1
49
49
UheJQsXOAnY
0q0xS7W78KQ
48
Seq
Dup
1
50
50
H8AcqOh0wis
H8AcqOh0wis
50
NOT
Dup
2
1
51
EWq3-2VuqbQ
0q0xS7W78KQ
48
NOT
Dup
2
2
52
scuTZza4f_o
H8AcqOh0wis
50
NOT
Dup
2
3
53
bJWJW-mz4_U
UheJQsXOAnY
49
NOT
2
4
54
Ii4VYsh9OlM
EWq3-2VuqbQ
51
NOT
2
5
55
r2-OGUu57pU
scuTZza4f_o
52
Seq
2
6
56
8KTnu18Mi9Q
bJWJW-mz4_U
53
Seq
2
7
57
DconsfGsXyA
Ii4VYsh9OlM
54
Seq
2
8
58
UttEvLJP3l8
8KTnu18Mi9Q
56
NOT
2
9
59
GJOOLH9ZP2I
DconsfGsXyA
57
Seq
2
10
60
ewgmg9Q5Ab8
UttEvLJP3l8
58
Seq
...
13
35
635
qHpR_p8lA4I
FFVOzo7tSV8
639
Seq
13
36
636
DplwDDZNTRc
76IBjdM9s6g
640
Seq
13
37
637
3AObqGsimr8
qEh0uZuu7_U
None
13
38
638
88keQ4PWH18
RhfGJduOlrw
641
Seq
13
39
639
FFVOzo7tSV8
QxzH9QkirCU
643
NOT
13
40
640
76IBjdM9s6g
Qsgz4GbL8O4
None
13
41
641
RhfGJduOlrw
BSgg7mEzfqY
644
Seq
13
42
642
lVEqwV0Nlzg
VcmjbJ2q8-w
645
Seq
13
43
643
QxzH9QkirCU
gOU0BCL-TXs
None
13
44
644
BSgg7mEzfqY
IoOXQUcW24s
646
Seq
13
45
645
VcmjbJ2q8-w
o4_2_a6LzFU
647
Seq
Dup
14
1
646
IoOXQUcW24s
o4_2_a6LzFU
647
NOT
Dup
14
2
647
o4_2_a6LzFU
ijVPcGaqVjc
648
Seq
14
3
648
ijVPcGaqVjc
nk3FlgEuG-s
649
Seq
14
4
649
nk3FlgEuG-s
27ZLFn8Dejg
None
The last three columns have the following meanings:
Same as: If an ID from Run 2 is the same as an ID from Run 1, then this column has the # overall for Run 1.
Seq: Indicates whether the number in column "Same as" is one more than the previous number in that column.
Dup: Indicates whether an ID from Run 2 occurred more than once in that run.
Problems:
The videos XYIUM5TkhMo, LDjE28szwr8, qEh0uZuu7_U, Qsgz4GbL8O4, gOU0BCL-TXs, and 27ZLFn8Dejg were returned as #3, 6, 637, 640, 643, and 649 in Run 2, but were not returned at all in Run 1.
The videos FrG_eSECfps, r2-OGUu57pU, lVEqwV0Nlzg were returned as #14, 55, 642, in Run 1, but were not in Run 2.
The videos 0q0xS7W78KQ, H8AcqOh0wis, and o4_2_a6LzFU were returned as #49, 50, and 645 in Run 2, but then each appears a second time in that run (as well as appearing in Run 1 as #48, 50, and 647).
These results are troubling. They mean that no single search will return a reliable list of videos for a given value of q.
I mentioned at the beginning that previous questions about inconsistent results from the YouTube Data API had answers that seemed to resolve those inconsistencies. Is there a way to do that for this search? Is there something wrong with the way I'm composing the search that is causing the problem?
If there isn't a way to fix the search, then I suppose the only way to get a list of videos on the topic with high confidence of it being complete is to run the search multiple times and merge the results until no new IDs appear that were not in a previous result set. But even then, one would not know if there are other videos lurking undetected.

Debugging APL code: how to use `#`(index) and `⊢` (right tack) together?

I am attempting to read Aaron Hsu's thesis on A data parallel compiler hosted on the GPU, where I have landed at some APL code I am unable to fix. I've attached both a screenshot of the offending page (page number 74 as per the thesis numbering on the bottom):
The transcribed code is as follows:
d ← 0 1 2 3 1 2 3 3 4 1 2 3 4 5 6 5 5 6 3 4 5 6 5 5 6 3 4
This makes sense: create an array named d.
⍳≢d
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
This too makes sense. Count the number of elements in d and create a sequence of
that length.
⍉↑d,¨⍳≢d
0 1 2 3 1 2 3 3 4 1 2 3 4 5 6 5 5 6 3 4 5 6 5 5 6 3 4
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
This is slightly challenging, but let me break it down:
zip the sequence ⍳≢d = 1..27 with the d array using the ,¨ idiom, which zips the two arrays using a catenation.
Then, split into two rows using ↑ and transpose to get columns using ⍉
Now the biggie:
(⍳≢d)#(d,¨⍳≢d)⊢7 27⍴' '
INDEX ERROR
(⍳≢d)#(d,¨⍳≢d)⊢7 27⍴' '
Attempting to break it down:
⍳≢d counts number of elements in d
(d,¨⍳≢d) creates an array of pairs (d, index of d)
7 27⍴' ' creates a 7 x 27 grid: presumably 7 because that's the max value of d + 1, for indexing reasons.
Now I'm flummoxed about how the use of ⊢ works: as far as I know, it just ignores everything to the left! So I'm missing something about the parsing of this expression.
I presume it is parsed as:
(⍳≢d)#((d,¨⍳≢d)⊢(7 27⍴' '))
which according to me should be evaluated as:
(⍳≢d)#((d,¨⍳≢d)⊢(7 27⍴' '))
= (⍳≢d)#((7 27⍴' ')) [using a⊢b = b]
= not the right thing
As I was writing this down, I managed to fix the bug by sheer luck: if we increment d to be d + 1 so we are 1-indexed, the bug no longer manifests:
d ← d + 1
d
1 2 3 4 2 3 4 4 5 2 3 4 5 6 7 6 6 7 4 5 6 7 6 6 7 4 5
then:
(⍳≢d)#(d,¨⍳≢d)⊢7 27⍴' '
1
2 5 10
3 6 11
4 7 8 12 19 26
9 13 20 27
14 16 17 21 23 24
15 18 22 25
However, I still don't understand how this works! I presume the context will be useful
for others attempting to leave the thesis, so I'm going to leave the rest of it up.
Please explain what (⍳≢d)#(d,¨⍳≢d)⊢7 27⍴' ' does!
I've attached the raw screenshot to make sure I didn't miss something:
I'm happy to see that you found the the off-by-one error. It stems from Aaron Hsu working with index origin 0. If you set ⎕IO←0 then his code will work.
Some dyadic operators can take an array operand, giving the sequence OPERATOR operand argument, e.g. in -#(1 2 3)(4 5 6 7). This poses a problem because both the operand and the argument are arrays, and juxtaposition of arrays forms a new array with those arrays as elements by a process known as stranding. Compare:
(1 2 3)(4 5 6 7)
┌─────┬───┐
│1 2 3│4 5│
└─────┴───┘
However, in the case of the operator with its array operand, we want to "break" this strand so the left part can act as operand while the right part acts as argument. One way to break the stranding up is by applying a function to the argument, giving the sequence OPERATOR operand Function argument. Now, we don't actually need any transformation of the argument, so an identity function will do: -#(1 2 3)⊢(4 5 6 7).
As for what (⍳≢d)#(d,¨⍳≢d)⊢7 27⍴' ' actually does:
7 27⍴' ' creates a blank matrix.
(⍳≢d) are indices to insert into specified slots in the matrix.
#(d,¨⍳≢d) indicates at which locations in the matrix the above should replace the existing values
⊢ serves solely to separate (d,¨⍳≢d) from 7 27⍴' '. The code could also have been written as ((⍳≢d)#(d,¨⍳≢d))7 27⍴' ' with parentheses serving to "bind" the operand to the operator.

Mapping standard number n bits to new system of n bits based on the amount of enabled bits

I'm trying to figure out a way to map numbers from our standard way of counting bits (binary) to a system where primarily the the cardinality of the bits in a number and then secondary the position of enabled bits in a number are used to map this said number to a new number ordered by the rules described. I've struggled to come up with a general fast method for any amount of bits. I want to know what methods there are for doing this and what is the best time complexity that can be achieved.
I provided an example mapping with 4 bits to make my question more clear.
0 0000 0
1 0001 1
2 0010 2
4 0100 3
8 1000 4
3 0011 5
5 0101 6
9 1001 7
6 0110 8
10 1010 9
12 1100 10
7 0111 11
11 1011 12
13 1101 13
14 1110 14
15 1111 15
I really don't know how to classify this problem in order to do more research on it. If anybody knows how to label this specifically and would like to share that with me I would be most grateful.

Get a list of function results until result > x

I basically want the same thing as this OP:
Is there a J idiom for adding to a list until a certain condition is met?
But I cant get the answers to work with OP's function or my own.
I will rephrase the question and write about the answers at the bottom.
I am trying to create a function that will return a list of fibonacci numbers less than 2.000.000. (without writing "while" inside the function).
Here is what i have tried:
First, i picked a way to culculate fibonacci numbers from this site:
https://code.jsoftware.com/wiki/Essays/Fibonacci_Sequence
fib =: (i. +/ .! i.#-)"0
echo fib i.10
0 1 1 2 3 5 8 13 21 34
Then I made an arbitrary list I knew was larger than what I needed. :
fiblist =: (fib i.40) NB. THIS IS A BAD SOLUTION!
Finally, I removed the numbers that were greater than what I needed:
result =: (fiblist < 2e6) # fiblist
echo result
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1.34627e6
This gets the right result, but is there a way to avoid using some arbitrary number like
40 in "fib i.40" ?
I would like to write a function, such that "func 2e6" returns the list of fibonacci numbers below 2.000.000. (without writing "while" inside the function).
echo func 2e6
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1.34627e6
here are the answers from the other question:
first answer:
2 *^:(100&>#:])^:_"0 (1 3 5 7 9 11)
128 192 160 112 144 176
second answer:
+:^:(100&>)^:(<_) ] 3
3 6 12 24 48 96 192
As I understand it, I just need to replace the functions used in the answers, but i dont see how
that can work. For example, if I try:
echo (, [: +/ _2&{.)^:(100&>#:])^:_ i.2
I get an error.
I approached it this way. First I want to have a way of generating the nth Fibonacci number, and I used f0b from your link to the Jsoftware Essays.
f0b=: (-&2 +&$: -&1) ^: (1&<) M.
Once I had that I just want to put it into a verb that will check to see if the result of f0b is less than a certain amount (I used 1000) and if it was then I incremented the input and went through the process again. This is the ($:#:>:) part. $: is Self-Reference. The right 0 argument is the starting point for generating the sequence.
($:#:>: ^: (1000 > f0b)) 0
17
This tells me that the 17th Fibonacci number is the largest one less than my limit. I use that information to generate the Fibonacci numbers by applying f0b to each item in i. ($:#:>: ^: (1000 > f0b)) 0 by using rank 0 (fob"0)
f0b"0 i. ($:#:>: ^: (1000 > f0b)) 0
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
In your case you wanted the ones under 2000000
f0b"0 i. ($:#:>: ^: (2000000 > f0b)) 0
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1346269
... and then I realized that you wanted a verb to be able to answer your original question. I went with dyadic where the left argument is the limit and the right argument generates the sequence. Same idea but I was able to make use of some hooks when I went to the tacit form. (> f0b) checks if the result of f0b is under the limit and ($: >:) increments the right argument while allowing the left argument to remain for $:
2000000 (($: >:) ^: (> f0b)) 0
32
fnum=: (($: >:) ^: (> f0b))
2000000 fnum 0
32
f0b"0 i. 2000000 fnum 0
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1346269
I have little doubt that others will come up with better solutions, but this is what I cobbled together tonight.

Why InfMnist (MNIST) size of 8M examples is calculated as 8 109 999 examples?

On the website :
http://leon.bottou.org/projects/infimnist
It says :
Generating files containing the MNIST8M training set:
$ infimnist lab 10000 8109999 > mnist8m-labels-idx1-ubyte
$ infimnist pat 10000 8109999 > mnist8m-patterns-idx3-ubyte
However, i fail to see why its from 10 000 to 8 109 999
Even if i do : 8 109 999 - 10 000 , it still doesnt make sense to me.
To me 8M would be 8 000 000 + 9 999 because i would end at 9 999 and start from 10 000 to 8 009 999 , which would be 8 million images.
Does anyone understand why its calculated as 8 109 999 ?
According to a fellow kaggle user, this is why :
The 8M dataset is the original images + 134 distortions/original. So there are
135*60,000 = 8,100,000
training images.
Adding the 10,000 test images you get 8,110,000 images.
The test images are from index 0 to 10,000-1=9,999 and the training images are from index 10,000 to 8,110,000-1 = 8,109,999.
I hope this helps.
The original dataset is also here:
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html
You can see that "# of data: 8,100,000"

Resources