Change PTILE algorithm in SPSS

I am using SPSS 22, and the documentation lists several types of percentile calculations:
HPTILE
WPTILE
RPTILE
EPTILE
APTILE
From what I gather, the default is APTILE. I would like to change it to HPTILE, but the documentation doesn't really say where to change it in SPSS syntax.
So in the syntax, I have:
CTABLES
/TABLES
...
[VALIDN F40.0, PTILE 5, PTILE 10, PTILE 15, PTILE 20, PTILE 25, PTILE 30, PTILE 35,
PTILE 40, PTILE 45, MEDIAN, MEAN, PTILE 55, PTILE 60, PTILE 65, PTILE 70, PTILE 75, PTILE 80, PTILE 85, PTILE 90, PTILE 95]
I was hoping it would be as simple as changing PTILE to HPTILE, but that results in an error: "TABLE: Text: hptile. An invalid subcommand, keyword, or option was specified."
How can I change the percentile algorithm used?

Well, I managed to get help in the IBM forums. It turns out that none of SPSS's percentile algorithms match the algorithm used by Excel (PERCENTILE.INC) or by most databases that have native percentile functions (percentile_cont).
At any rate, this was what worked:
EXAMINE VARIABLES = Item Sales BY Item
  /PERCENTILES(10, 25, 50, 65, 75, 90) = HAVERAGE.
The method keyword after the equals sign selects the algorithm; it can be any one of WAVERAGE, HAVERAGE, ROUND, EMPIRICAL, or AEMPIRICAL.
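For anyone comparing numbers across tools, here is a sketch in plain Python (not SPSS syntax, and the formulas are my reading of the documented methods) of two of the interpolation rules. HAVERAGE interpolates at position h = (n + 1)p, while Excel's PERCENTILE.INC interpolates at h = 1 + (n - 1)p, which is why the outputs differ:

```python
def percentile_haverage(data, p):
    """SPSS HAVERAGE: linear interpolation at position h = (n + 1) * p
    (R's quantile type 6, also Excel's PERCENTILE.EXC)."""
    xs = sorted(data)
    n = len(xs)
    h = (n + 1) * p
    k = int(h)            # integer part of the position
    g = h - k             # fractional part used for interpolation
    if k < 1:
        return xs[0]
    if k >= n:
        return xs[-1]
    return xs[k - 1] + g * (xs[k] - xs[k - 1])

def percentile_inc(data, p):
    """Excel's PERCENTILE.INC: interpolation at h = 1 + (n - 1) * p
    (R's quantile type 7)."""
    xs = sorted(data)
    n = len(xs)
    h = 1 + (n - 1) * p
    k = int(h)
    g = h - k
    if k >= n:
        return xs[-1]
    return xs[k - 1] + g * (xs[k] - xs[k - 1])

# Same data, same percentile, different answers:
# percentile_haverage([1, 2, 3, 4], 0.25) -> 1.25
# percentile_inc([1, 2, 3, 4], 0.25)      -> 1.75
```

The two rules agree at the median for this data but diverge in the tails, which is where cross-tool comparisons usually break down.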


Dataflow - Fixed Window - AfterProcessingTrigger

I am using a fixed window of 60 seconds with a trigger time of 10 seconds, and I am seeing a few unexpected results. Could you please help me understand how exactly this works? All the details are provided below.
My input to the Pub/Sub topic is (I publish one element every 5 seconds):
name  score  publish timestamp
Laia 30 2021-04-10 09:38:29.708000+0000
Victor 20 2021-04-10 09:38:34.695000+0000
Victor 50 2021-04-10 09:38:39.703000+0000
Laia 40 2021-04-10 09:38:44.701000+0000
Victor 10 2021-04-10 09:38:49.711000+0000
Victor 40 2021-04-10 09:38:54.721000+0000
Laia 40 2021-04-10 09:38:59.715000+0000
Laia 50 2021-04-10 09:39:04.741000+0000
Laia 20 2021-04-10 09:39:09.867000+0000
Laia 20 2021-04-10 09:39:14.749000+0000
My code:
import apache_beam as beam
from apache_beam.transforms.trigger import AccumulationMode, AfterProcessingTime

window_withTrigger = (words
    | "window" >> beam.WindowInto(beam.window.FixedWindows(60),
                                  trigger=AfterProcessingTime(1 * 10),
                                  accumulation_mode=AccumulationMode.ACCUMULATING)
    | "Group" >> beam.GroupByKey())

window_withoutTrigger = (words
    | "window2" >> beam.WindowInto(beam.window.FixedWindows(60))  # labels must be unique per pipeline
    | "Group2" >> beam.GroupByKey())
Output for window_withTrigger:
Laia [30]
Victor [20, 50, 10, 40]
Laia [50, 20, 20]
Output for window_withoutTrigger:
Laia [30, 40, 40]
Victor [20, 50, 10, 40]
Laia [50, 20, 20]
Without the trigger, I get all 10 elements that I published to the topic; with the trigger, I get 8. I notice that with the trigger it does not emit results every 10 seconds if the incoming key does not change, i.e. it only emits when the input name changes from Laia to Victor, and once it has emitted for a key in a window it does not emit again even if I publish the same key again.
You are probably dropping elements because you are not using Repeatedly.
There is another answer where this is explained. Basically, if you don't add Repeatedly, the trigger will only fire once.
Official doc.
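As a side note on why the groups come out the way they do: FixedWindows(60) assigns each element to the window starting at floor(timestamp / 60) * 60. A plain-Python sketch (not Beam) reproduces the without-trigger grouping from the timestamps above, written as seconds past 09:38:00 (which is itself a window boundary):

```python
from collections import defaultdict

def window_start(t, size=60):
    # FixedWindows(size) puts an element at time t into the window
    # [floor(t / size) * size, floor(t / size) * size + size)
    return int(t // size) * size

# (name, score, seconds past 09:38:00) for the ten published elements
elements = [("Laia", 30, 29), ("Victor", 20, 34), ("Victor", 50, 39),
            ("Laia", 40, 44), ("Victor", 10, 49), ("Victor", 40, 54),
            ("Laia", 40, 59), ("Laia", 50, 64), ("Laia", 20, 69),
            ("Laia", 20, 74)]

groups = defaultdict(list)
for name, score, t in elements:
    groups[(window_start(t), name)].append(score)

# groups[(0, "Laia")]   == [30, 40, 40]
# groups[(0, "Victor")] == [20, 50, 10, 40]
# groups[(60, "Laia")]  == [50, 20, 20]
```

This matches the window_withoutTrigger output exactly; the missing Laia elements in the triggered version are the ones dropped because the trigger fired only once per window.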

Multiply bytes to produce 16-bits, without shifting

Still learning the art of SIMD, I have a question: I have two packed 8-bit registers that I'd like to multiply-add with _mm_maddubs_epi16 (pmaddubsw) to obtain a packed 16-bit register.
I know that these bytes will always produce a number less than 256, so I'd like to avoid wasting the remaining 8 bits. For instance, the result of _mm_maddubs_epi16(v1, v2) should write the result in r where XX is, not where it would normally end up (denoted with __).
v1 (04, 00, 0e, 00, 04, 00, 04, 00, 0a, 00, 0f, 00, 05, 00, 01, 00)
v2 (04, 00, 0e, 00, 04, 00, 04, 00, 0a, 00, 0f, 00, 05, 00, 01, 00)
r (__, XX, __, XX, __, XX, __, XX, __, XX, __, XX, __, XX, __, XX)
Can I do this without shifting the result?
PS. I don't have a nice processor, I am limited to AVX instructions.
In your vector diagram, is the highest element at the left or the right? Are the XX locations in the most or least significant byte of the pmaddubsw result?
To get results in the low byte of a word, from inputs in the high byte of each word:
Use _mm_mulhi_epu16 so you're effectively doing (v1 << 8) * (v2 << 8) >> 16, producing the result in the opposite byte from the input words. Since you say the product is strictly less than 256, you'll get an 8-bit result in the low byte of each 16-bit word.
(If your inputs are signed, use _mm_mulhi_epi16, but then a negative result would be sign-extended to the full 16 bits.)
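A quick per-lane model (plain Python, not intrinsics, as a simplified illustration) shows why this works: pmulhuw returns the high 16 bits of the 32-bit product, so with both inputs pre-shifted into the high byte, the product's high half is exactly v1 * v2:

```python
def mulhi_epu16(a, b):
    # Per-lane model of _mm_mulhi_epu16: the high 16 bits of the
    # 32-bit unsigned product of two 16-bit lanes.
    return ((a & 0xFFFF) * (b & 0xFFFF)) >> 16

# With the byte value sitting in the HIGH byte of each word
# (e.g. 0x0400 == 4 << 8), the high half of the 32-bit product is the
# plain byte product, landing in the LOW byte of the result word:
# mulhi_epu16(4 << 8, 14 << 8) -> 56  (== 4 * 14)
```

In effect ((v1 << 8) * (v2 << 8)) >> 16 == v1 * v2, which fits in the low byte because the product is known to be under 256.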
To get results in the high byte of a word, from inputs in the low byte
You'll need to change how you load / create one of the inputs so instead of
MSB LSB | MSB LSB
v1_lo (00, 04, 00, 0e, 00, 04, 00, 04, 00, 0a, 00, 0f, 00, 05, 00, 01)
element# 15 14 13 12 ... 0
you have this (both using Intel's notation, where the leftmost element is the highest-numbered, so byte shifts like _mm_slli_si128 shift bytes to the left in the diagram):
MSB LSB | MSB LSB
v1_hi (04, 00, 0e, 00, 04, 00, 04, 00, 0a, 00, 0f, 00, 05, 00, 01, 00)
element# 15 14 13 12 ... 0
With v2 still having its non-zero bytes in the low half of each word element, simply _mm_mullo_epi16(v1_hi, v2), and you'll get (v1 * v2) << 8 for free.
If you're already unpacking bytes with zeros to obtain v1 and v2, then unpack the other way. If you were using pmovzx (_mm_cvtepu8_epi16), then switch to using _mm_unpacklo_epi8(_mm_setzero_si128(), packed_v1 ).
If you were loading these vectors from memory in this already-zero-padded form, use an unaligned load offset by 1 byte so the zeros end up in the opposite location.
If what you really want is to start from input bytes that aren't already unpacked with zeros, I don't think you can avoid a shift. Or if you're masking instead of unpacking (to save shuffle-port throughput by using _mm_and_si128), you're probably going to need a shift somewhere. You can shift instead of masking in one direction, though, using v1_hi = _mm_slli_epi16(v, 8): a left shift by 8 with word granularity moves each low byte into the high byte and leaves the low byte zeroed.
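The low-byte-input path can be modeled the same way (plain Python again, as a per-lane simplification): after a word-granularity left shift by 8, the low 16 bits of the product are (v1 * v2) << 8:

```python
def slli_epi16_8(v):
    # Per-lane model of _mm_slli_epi16(v, 8): the low byte moves into
    # the high byte and the low byte becomes zero.
    return (v << 8) & 0xFFFF

def mullo_epi16(a, b):
    # Per-lane model of _mm_mullo_epi16: the low 16 bits of the product.
    return (a * b) & 0xFFFF

# v1's byte pre-shifted into the high byte, v2's byte still in the low
# byte: the product lands in the high byte of the result word.
r = mullo_epi16(slli_epi16_8(0x0E), 0x04)
# r == 0x3800 == (14 * 4) << 8
```

Since the byte product is under 256, (v1 * v2) << 8 always fits in the 16-bit lane and nothing is truncated.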
Shift v1 or v2 and then use _mm_mullo_epi16().
Possible XY Problem? My guess is that _mm_unpacklo_epi8() and _mm_packus_epi16() may be useful for you.

How can I convert a GUID into a byte array in Ruby?

To save data traffic, we want to send our GUIDs as arrays of bytes instead of as strings (we use Google Protocol Buffers).
How can I convert a string representation of a GUID in Ruby to an array of bytes?
Example:
Guid: 35918bc9-196d-40ea-9779-889d79b753f0
=> Result: C9 8B 91 35 6D 19 EA 40 97 79 88 9D 79 B7 53 F0
In .NET this seems to be natively implemented:
http://msdn.microsoft.com/en-us/library/system.guid.tobytearray%28v=vs.110%29.aspx
Your example GUID is in a Microsoft specific format. From Wikipedia:
Other systems, notably Microsoft's marshalling of UUIDs in their COM/OLE libraries, use a mixed-endian format, whereby the first three components of the UUID are little-endian, and the last two are big-endian.
So in order to get that result, we have to move the bits around a little. Specifically, we have to change the endianess of the first three components. Let's start by breaking the GUID string apart:
guid = '35918bc9-196d-40ea-9779-889d79b753f0'
parts = guid.split('-')
#=> ["35918bc9", "196d", "40ea", "9779", "889d79b753f0"]
We can convert these hex-strings to binary via:
mixed_endian = parts.pack('H* H* H* H* H*')
#=> "5\x91\x8B\xC9\x19m@\xEA\x97y\x88\x9Dy\xB7S\xF0"
Next let's swap the first three parts:
big_endian = mixed_endian.unpack('L< S< S< A*').pack('L> S> S> A*')
#=> "\xC9\x8B\x915m\x19\xEA@\x97y\x88\x9Dy\xB7S\xF0"
L denotes a 32-bit unsigned integer (1st component)
S denotes a 16-bit unsigned integer (2nd and 3rd component)
< and > denote little-endian and big-endian, respectively
A* treats the remaining bytes as an arbitrary binary string (we don't have to convert these)
If you prefer an array of bytes instead of a binary string, you'd just use:
big_endian.bytes
#=> [201, 139, 145, 53, 109, 25, 234, 64, 151, 121, 136, 157, 121, 183, 83, 240]
PS: if your actual GUID isn't Microsoft specific, you can skip the swapping part.
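As a cross-check from another language: Python's standard uuid module implements exactly this mixed-endian layout as UUID.bytes_le, and it produces the same 16 bytes as the Ruby pack/unpack round trip above:

```python
import uuid

guid = uuid.UUID('35918bc9-196d-40ea-9779-889d79b753f0')
# bytes_le renders the first three components little-endian (the
# Microsoft mixed-endian layout), matching .NET's Guid.ToByteArray:
ms_bytes = guid.bytes_le
# ms_bytes.hex() == 'c98b91356d19ea409779889d79b753f0'
```

The plain .bytes attribute gives the fully big-endian form, i.e. the bytes in the order they appear in the string.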

How to create an "on/off" graph with Highcharts?

I've read the documentation quite a few times, but I just can't seem to find a way to make a graph like this. Perhaps it's because I don't know what it's called, so I'm not even sure what to look for. Let me try to explain what I'm trying to do.
Normally if you have a series of points like this:
3 May, 5:00 PM ---> 0
3 May, 5:20 PM ---> 3
4 May, 5:00 PM ---> 0
4 May, 5:20 PM ---> 3
If you make a standard line graph, Highcharts will plot a gradual increase between the two points. So I end up with this:
But the problem is, the values being shown actually change at a single point in time. In other words, what I want is this:
And even more importantly, the spacing between times isn't correct. You'll notice that it creates a perfect zigzag, even though the time between the first and second points is 20 minutes (5:00 PM to 5:20 PM), while the time between the second and third points is 23 hours and 40 minutes (3 May 5:20 PM to 4 May 5:00 PM). So what I really want is this:
Any idea what a graph like this is called?
Any idea how to make it using HighCharts?
UPDATE
The only solution I can think of right now is to fake points between the real points. So, for example, if the value is 0 at 5:00 PM and turns to 3 at 5:20 PM, then I will add 19 points between these two: at 5:01 I will make it 0, at 5:02 I will also make it 0, and so on until 5:19. But even this method will result in a slightly skewed line going up from 5:19 to 5:20, which is what I'm actually trying to avoid.
Any ideas?
UPDATE 2
The "step: left" solution has definitely solved half of my problem, but for some reason I still have this:
You should now see that even though I have steps, they are not quite producing the expected spacing. For 17:13 on 5 May, I expect the graph to be closer to the 6 May mark than to the 5 May mark.
Any ideas as to why this is happening?
UPDATE 3
I created a jsFiddle for my problem: https://jsfiddle.net/coderama/ubz7m0Lh/4/
UPDATE 4
Based on wergeld's input, it seems using "ordinal" on the x axis is the way to go --> http://api.highcharts.com/highstock#xAxis.ordinal
But it produces a pretty weird graph: https://jsfiddle.net/coderama/6tz8h53x/1/
I'll keep looking, but at least it feels like there's progress being made!
What you are looking for is the step option. You can set up something like:
$(function() {
  $('#container').highcharts({
    title: {
      text: 'Step line types, with null values in the series'
    },
    xAxis: {
      type: 'datetime',
      tickInterval: 86400000 // one day in milliseconds
    },
    series: [{
      data: [
        [Date.UTC(2016, 4, 3, 17, 0), 0],
        [Date.UTC(2016, 4, 3, 20, 0), 3],
        [Date.UTC(2016, 4, 4, 17, 0), 0],
        [Date.UTC(2016, 4, 5, 18, 0), 3],
        [Date.UTC(2016, 4, 5, 19, 0), 0],
        [Date.UTC(2016, 4, 6, 20, 0), 3],
        [Date.UTC(2016, 4, 7, 17, 0), 0]
      ],
      step: 'left'
    }]
  });
});
The step parameter tells Highcharts how to draw the line from each given point to the next.

How to think about weights in Myrrix

I have the following input for Myrrix:
11, 101, 1
11, 102, 1
11, 103, 1
11, 104, 1000
11, 105, 1000
11, 106, 1000
12, 101, 1
12, 102, 1
12, 103, 1
12, 222, 1
13, 104, 1000
13, 105, 1000
13, 106, 1000
13, 333, 1000
I am looking for items to recommend to user 11. The expectation is that item 333 will be recommended first (because of the higher weights for user 13 and items 104, 105, 106).
Here are the recommendation results from Myrrix:
11, 222, 0.04709
11, 333, 0.0334058
Notice that item 222 is recommended with strength 0.047, while item 333 is only given a strength of 0.033: the opposite of the expected result.
I also would have expected the difference in strength to be larger (since 1000 and 1 are so different), but obviously that's moot when the order isn't even what I expected.
How can I interpret these results and how should I think about the weight parameter? We are working with a large client under a tight deadline and would appreciate any pointers.
It's hard to judge from a small, synthetic data set. I think the biggest factor here will be the parameters: what is the number of features, and what is lambda? I would expect features = 2 here. If it's higher, I think you quickly over-fit this data, and the results are mostly the noise left over after the model perfectly explains that user 11 doesn't interact with 222 and 333.
The values are quite low, suggesting that neither is a likely result, so their order may be more noise than anything. Do you see different results if the model is rebuilt from another random starting point?
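One way to think about the weights, assuming Myrrix follows the implicit-feedback ALS formulation of Hu, Koren, and Volinsky (worth verifying for your version): the third column is a confidence attached to the squared-error term, not a target rating. A hypothetical sketch, where alpha = 40 is an illustrative choice and not a Myrrix default:

```python
def loss_term(r, prediction, alpha=40.0):
    # Sketch of one term of the implicit-feedback ALS objective.
    # The input value r acts as a confidence: the model tries to
    # predict p = 1 ("interaction happened") for every observed pair,
    # and the squared error is weighted by c = 1 + alpha * r.
    c = 1.0 + alpha * r
    p = 1.0
    return c * (p - prediction) ** 2

# A value of 1000 makes an observation far more costly to mispredict
# than a value of 1; it does not make the predicted strength 1000x larger.
```

Under this reading, the 1000s say "be very sure these interactions happened", not "recommend their neighbors 1000x more strongly", which is consistent with the small gap you observed between 222 and 333.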
