Stack data and restructure without using VARSTOCASES or CASESTOVARS in SPSS

I have the following situation: stacked data (a loop) with only one index variable and with multiple items corresponding to the statements, as in the picture below (sorry, it is Excel, but it is the same in SPSS):
[Image: stacked data, cases on multiple lines, but no single line fills all the columns for a respondent]
I want to reach the following result, but without restructuring with CASESTOVARS, because that creates a lot of empty variables. I remember that older versions had a command like UPDATE, which moved the cases up, to produce this result:
[Image: the cases reduced to one line per respondent]
For example, starting from this:
ID Index Q1_1 Q1_2 Q1_3 Q1_4 Q1_5 Q1_6
1 1 1 1
1 2 1 1
1 3 1 1
I want to reach this:
ID Q1_1 Q1_2 Q1_3 Q1_4 Q1_5 Q1_6
1 1 1 1 1 1 1
But without using CASESTOVARS. Is there any command in SPSS syntax for this?
Thank you very much, have a nice day!

I'm not entirely sure how variable your real data structure is likely to be, but if, as in your example, each respondent ID has only a single response for each of q1_1 to q1_6, then the following would be sufficient:
dataset declare dsAgg.
aggregate outfile=dsAgg /break=respid /q1_1 to q1_6=max(q1_1 to q1_6).
I'm also not sure whether the duplicate index values within the same respondent ID are significant, or whether they were intended.
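For readers less familiar with AGGREGATE, here is a rough Python sketch (not SPSS syntax) of what the command above computes: rows are grouped by respid and each Q column keeps its largest non-missing value. None stands in for system-missing, and the function name collapse_by_max is made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def collapse_by_max(rows, q_cols):
    """Collapse all rows sharing a respid into one row.

    Each Q column keeps its maximum non-missing value (None stands in for
    SPSS system-missing), which is what MAX() does inside AGGREGATE.
    """
    result = []
    ordered = sorted(rows, key=itemgetter("respid"))  # groupby needs sorted input
    for respid, group in groupby(ordered, key=itemgetter("respid")):
        group = list(group)
        merged = {"respid": respid}
        for col in q_cols:
            present = [row[col] for row in group if row[col] is not None]
            merged[col] = max(present) if present else None
        result.append(merged)
    return result


# Two respondents from the example data, reduced to q1_1..q1_3 for brevity.
data = [
    {"respid": 1, "q1_1": 1, "q1_2": None, "q1_3": None},
    {"respid": 1, "q1_1": None, "q1_2": 2, "q1_3": None},
    {"respid": 1, "q1_1": None, "q1_2": None, "q1_3": 1},
    {"respid": 2, "q1_1": 3, "q1_2": None, "q1_3": None},
    {"respid": 2, "q1_1": None, "q1_2": 4, "q1_3": None},
    {"respid": 2, "q1_1": None, "q1_2": None, "q1_3": 5},
]
for row in collapse_by_max(data, ["q1_1", "q1_2", "q1_3"]):
    print(row)
```

Because MAX ignores missing values, this only reproduces the desired one-line-per-respondent layout when each item is answered at most once per respondent, which is the assumption stated in the answer.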

The following syntax should do the job:
* first we'll recreate your example data.
data list list/respid index q1_1 to q1_6.
begin data
1,1,1,,,,,
1,2,,2,,,,
1,3,,,1,,,
1,4,,,,2,,
1,5,,,,,1,
1,6,,,,,,2
2,1,3,,,,,
2,1,,4,,,,
2,2,,,5,,,
2,2,,,,4,,
2,3,,,,,3,
2,3,,,,,,2
end data.
* now to work: first thing is to make sure the data from each ID are together.
sort cases by respid index.
* the loop will fill down the data to the last line of each ID.
do repeat qq=q1_1 to q1_6.
if respid=lag(respid) and missing(qq) qq=lag(qq).
end repeat.
* the following lines will help recognize the last line for each ID and select it.
compute lineNR=$casenum.
aggregate /outfile=* mode=ADDVARIABLES/break=respid/MXlineNR=max(lineNR).
select if lineNR=MXlineNR.
exe.
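The same fill-down-then-keep-last logic, sketched in Python for illustration only (not SPSS), assuming the rows are already sorted by respid and index as SORT CASES guarantees. fill_down_keep_last is a hypothetical name, and None represents a missing value.

```python
def fill_down_keep_last(rows, q_cols):
    """Fill missing Q values down within each respid, then keep the last row.

    Mirrors the SPSS steps: DO REPEAT ... IF respid=LAG(respid) AND
    MISSING(qq) qq=LAG(qq), then SELECT IF on the last line of each ID.
    Assumes rows are already sorted by respid and index (SORT CASES).
    """
    last_by_id = {}
    prev = None
    for original in rows:
        row = dict(original)  # copy so the caller's rows are not modified
        if prev is not None and prev["respid"] == row["respid"]:
            for col in q_cols:
                if row[col] is None:
                    row[col] = prev[col]  # like qq = LAG(qq)
        last_by_id[row["respid"]] = row  # a later row replaces an earlier one
        prev = row
    return list(last_by_id.values())


rows = [
    {"respid": 1, "index": 1, "q1_1": 1, "q1_2": None},
    {"respid": 1, "index": 2, "q1_1": None, "q1_2": 2},
    {"respid": 2, "index": 1, "q1_1": 3, "q1_2": None},
    {"respid": 2, "index": 2, "q1_1": None, "q1_2": 4},
]
for row in fill_down_keep_last(rows, ["q1_1", "q1_2"]):
    print(row)
```

Unlike the MAX() approach in the previous answer, this keeps the most recent non-missing answer when a respondent has more than one value for the same item, and it also keeps the index of the last line.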


Are there only 16 digits in a SUM of DECIMAL values?

create table t (mnt decimal(20,2));
insert into t values (111340534626262);
insert into t values (0.56);
select sum(mnt) from t;
select sum(mnt::decimal(20,2))::decimal(20,2) from t;
I can't get more than 16 digits. Any idea?
Using IDS 12.10FC10.
When I run the code shown in my sqlcmd program, I get the output:
111340534626262.56
111340534626262.56
When I run the code shown in Informix's DB-Access program, I get this output (slightly altered):
(sum)
111340534626263
1 row(s) retrieved.
(expression)
111340534626263
1 row(s) retrieved.
The problem, therefore, is probably in the display mechanism in DB-Access rather than in the server itself.
If you're writing your own code, it is relatively straightforward to ensure that the display is accurate and complete. DB-Access is not necessarily the best tool for this.
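A small Python illustration of how the stored result can be exact while a display shows 111340534626263: the sum needs 17 significant digits, and formatting it with only 15 rounds the cents away. This does not reproduce DB-Access's actual formatting code; it only shows the arithmetic.

```python
from decimal import Decimal

# The two rows inserted into t, as exact decimals (like DECIMAL(20,2)).
amounts = [Decimal("111340534626262"), Decimal("0.56")]
total = sum(amounts, Decimal("0"))

print(total)                   # exact: 111340534626262.56
print(f"{float(total):.15g}")  # 15 significant digits: 111340534626263
print(f"{float(total):.17g}")  # 17 significant digits: 111340534626262.56
```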

Loading Parameter Table

I am trying to load a parameter table.
I get error messages when opening the Parameter Table and trying to load a txt file (created with Excel and saved as tab-delimited txt) via Treatment -> Import Variable Table -> Group.
I tried using the advice given here: How to use table loader in ztree?
But I cannot import the parameter table generated.
The error messages say, e.g.:
Syntax error: line 1 (or above)
Error in period 0; subject 1
The parameter table in z-Tree is a special table and (if I am not mistaken) it is not meant to be exported or imported.
I assume you want a specific matching structure. (If you are planning to do something else, my answer might not be relevant.)
If you want to manage the Group variable from a file, you can create a table, say MATCHING, and load an external file the same way as described in the post you linked. For instance, something like this:
Period Subject Group
1 1 3
1 2 3
1 3 2
...
2 1 2
2 2 1
2 3 3
and you can add a program (subjects.do) as follows under the background stage:
Group = MATCHING.find(Subject == :Subject & Period == :Period, Group);
Just make sure you define the group for every subject in every period: if the program cannot find a valid entry for a subject and period, it will cause trouble.
Note: If you are using z-Tree 4, it seems that the variables need to be initialized first. This can be done by adding a program under the table. In z-Tree 3, this is not necessary.
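A Python sketch of what the lookup does, assuming a hypothetical in-memory copy of the MATCHING rows shown above (the "..." rows are left out). find_group and missing_entries are illustrative names, not z-Tree functions; missing_entries is one way to check the file before a session so that every subject has a group in every period.

```python
# Hypothetical in-memory copy of the MATCHING table above,
# keyed by (Period, Subject) -> Group.
MATCHING = {
    (1, 1): 3, (1, 2): 3, (1, 3): 2,
    (2, 1): 2, (2, 2): 1, (2, 3): 3,
}


def find_group(period, subject):
    """Rough analogue of MATCHING.find(Subject == :Subject & Period == :Period, Group)."""
    if (period, subject) not in MATCHING:
        raise LookupError(f"no MATCHING row for period {period}, subject {subject}")
    return MATCHING[(period, subject)]


def missing_entries(periods, subjects):
    """List every (period, subject) pair that has no row in MATCHING."""
    return [(p, s) for p in periods for s in subjects if (p, s) not in MATCHING]


print(find_group(1, 3))                       # 2
print(missing_entries(range(1, 3), range(1, 4)))  # [] means every pair is covered
```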

Is there a way with Rails/Postgres to get records in a random order BUT place certain values first?

I have an app that returns a bunch of records in a list in a random order with some pagination. For this, I save a seed value (so that refreshing the page will return the list in the same order again) and then use .order('random()').
However, say that out of 100 records, I have 10 records that have a preferred_level = 1 while all the other 90 records have preferred_level = 0.
Is there some way that I can place the preferred_level = 1 records first but still keep everything randomized?
For example, I have [A,1],[X,0],[Y,0],[Z,0],[B,1],[C,1],[W,0] and I hope I would get back something like [B,1],[A,1],[C,1],[Z,0],[X,0],[Y,0],[W,0].
Note that even the ones with preferred_level = 1 are randomized within themselves, just that they come before all the 0 records. In theory, I would hope whatever solution would place preferred_level = 2 before the 1 records if I were ever to add them.
------------
I had hoped it would be as intuitively simple as Model.all.order('random()').order('preferred_level DESC') but that doesn't seem to be the case. The second order doesn't seem to affect anything.
Any help would be appreciated!
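This got the job done for me on a very similar problem:
select * from table order by preferred_level = 1 desc, random()
or, I guess, the Rails way:
Model.order('preferred_level = 1 desc, random()')
To also put future preferred_level = 2 rows ahead of the 1 rows, sort on the column itself: order by preferred_level desc, random(). Your chained attempt had random() as the first sort key, and because random values practically never tie, the preferred_level key was never consulted.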
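A Python sketch of the same ordering rule, ORDER BY preferred_level DESC, random(), with a saved seed. Here the seed goes into random.Random; in PostgreSQL the counterpart would be calling setseed() before the query. The function name and the (name, preferred_level) tuples are assumptions for illustration.

```python
import random


def preferred_then_random(records, seed):
    """Order (name, preferred_level) pairs like
    ORDER BY preferred_level DESC, random() with a fixed seed.

    Each record gets one random sort key from a seeded generator, so the
    same seed and the same input order always give the same result.
    """
    rng = random.Random(seed)
    keyed = [(rng.random(), record) for record in records]
    # Higher preferred_level first; the random key only breaks ties within a level.
    keyed.sort(key=lambda pair: (-pair[1][1], pair[0]))
    return [record for _, record in keyed]


records = [("A", 1), ("X", 0), ("Y", 0), ("Z", 0), ("B", 1), ("C", 1), ("W", 0)]
print(preferred_then_random(records, seed=42))
```

Because the level is the primary key, a record with preferred_level = 2 would automatically sort ahead of all level-1 records without any change to the function.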

SUM(LAST()) on GROUP BY

I have a series, disk, that contains a path (/mnt/disk1, /mnt/disk2, etc.) and the total space of a disk. It also includes free and used values. These values are updated at a specified interval. What I would like to do is query the sum, over all paths, of the last() total of each path. I would also like to do the same for free and for used, to get an aggregate of the total size, free space, and used space of all the disks on my server.
I have a query here that will get me the last(total) of all the disks, grouped by its path (for distinction):
select last(total) as total from disk where path =~ /(mnt\/disk).*/ group by path
Currently, this returns 5 series, each containing 1 row (the latest) and the value of its total. I then want to take the sum of those series, but I cannot just wrap the last(total) into a sum() function call. Is there a way to do this that I am missing?
Carrying on from my comment above about nested functions.
Building a toy example:
CREATE DATABASE FOO
USE FOO
Assuming your data is updated less often than once a minute[1]:
CREATE CONTINUOUS QUERY disk_sum_total ON FOO
BEGIN
SELECT sum("total") AS "total_1m" INTO disk_1m_total FROM "disk"
GROUP BY time(1m)
END
Then push some values in:
INSERT disk,path=/mnt/disk1 total=30
INSERT disk,path=/mnt/disk2 total=32
INSERT disk,path=/mnt/disk3 total=33
And wait more than a minute. Then:
INSERT disk,path=/mnt/disk1 total=41
INSERT disk,path=/mnt/disk2 total=42
INSERT disk,path=/mnt/disk3 total=43
Wait more than a minute again. Then:
SELECT * FROM disk_1m_total
name: disk_1m_total
-------------------
time total_1m
1476015300000000000 95
1476015420000000000 126
The two values are 30+32+33=95 and 41+42+43=126.
From there, it's trivial to query:
SELECT last(total_1m) FROM disk_1m_total
name: disk_1m_total
-------------------
time last
1476015420000000000 126
Hope that helps.
[1] Picking CQ intervals shorter than the update period prevents minor timing jitter from causing data to be accidentally summed twice within a given group. There might be some "zero update" intervals, but no "double counting" intervals. I typically run the query twice as often as the updates arrive. If the CQ sees no data for a window, it writes no point for that window, so last() still gives the correct answer. For example, I left the CQ running overnight and pushed no new data: last(total_1m) gives the same answer, not zero for "no new data".
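The CQ's windowed sum can be sketched in Python for illustration (not InfluxQL), next to a direct "sum of each path's last total" cross-check. The timestamps are hypothetical seconds chosen to fall into two 1-minute windows, and both function names are made up.

```python
from collections import defaultdict


def cq_window_sums(points, window_s=60):
    """Mimic the CQ's SELECT sum("total") ... GROUP BY time(1m).

    points: (timestamp_seconds, path, total) tuples. Returns sorted
    (window_start, sum) pairs; a window with no points produces no row,
    just as the CQ writes nothing for an empty window.
    """
    sums = defaultdict(int)
    for ts, _path, total in points:
        sums[ts - ts % window_s] += total  # floor the timestamp to its window
    return sorted(sums.items())


def sum_of_last(points):
    """Cross-check: sum over paths of each path's most recent total."""
    latest = {}  # path -> (timestamp, total) of the newest point seen
    for ts, path, total in points:
        if path not in latest or ts > latest[path][0]:
            latest[path] = (ts, total)
    return sum(total for _, total in latest.values())


# Hypothetical timestamps (seconds) for the six INSERTs in the answer.
points = [
    (5, "/mnt/disk1", 30), (6, "/mnt/disk2", 32), (7, "/mnt/disk3", 33),
    (125, "/mnt/disk1", 41), (126, "/mnt/disk2", 42), (127, "/mnt/disk3", 43),
]
windows = cq_window_sums(points)
print(windows)              # [(0, 95), (120, 126)]
print(windows[-1][1])       # like SELECT last(total_1m): 126
print(sum_of_last(points))  # 126
```

If disk1 reported a second time inside the same window, say (130, "/mnt/disk1", 41), that window's sum would become 167 while sum_of_last stays at 126: this is the double counting that footnote [1] guards against.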

Return every nth row from database using ActiveRecord in rails

Ruby 1.9.2 / Rails 3.1 / deployed to Heroku (PostgreSQL)
Once the number of rows relating to an object goes over a certain amount, I want to pull back every nth row instead. This is simply because the rows are used (in part) to display data for graphing, so once the number of rows returned goes above, say, 20, it's good to return every second one, and so forth.
This question seemed to point in the right direction:
ActiveRecord Find - Skipping Records or Getting Every Nth Record
Doing a mod on the row number makes sense, but using basically this:
@widgetstats = self.widgetstats.find(:all,:conditions => 'MOD(ROW_NUMBER(),3) = 0 ')
doesn't work, it returns an error:
PGError: ERROR: window function call requires an OVER clause
And every attempt to fix that, for example by basing my OVER clause on the syntax in the answers to this question:
Row numbering in PostgreSQL
ends in syntax errors, and I can't get a result.
Am I missing a more obvious way of efficiently returning every nth row, or, if I'm on the right track, any pointers on the way to go? Obviously returning all the data and filtering it in Rails afterwards is possible, but terribly inefficient.
Thank you!
I think you are looking for a query like this one:
SELECT * FROM (SELECT widgetstats.*, row_number() OVER (ORDER BY id) AS rownum FROM widgetstats) stats WHERE mod(rownum,3) = 0
The ORDER BY belongs inside OVER (...): row_number() is computed before any ORDER BY on the subquery is applied, so with OVER () the numbering is not guaranteed to follow id.
This is difficult to build using ActiveRecord, so you might be forced to do something like:
@widgetstats = self.widgetstats.find_by_sql(
%{
SELECT * FROM
(
SELECT widgetstats.*, row_number() OVER (ORDER BY id) AS rownum FROM widgetstats
) AS stats
WHERE mod(rownum,3) = 0
}
)
You'll obviously want to change the ordering used and add any WHERE clauses or other modifications to suit your needs.
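The row-number filter can be sketched in Python for illustration: sort by id, number the rows from 1, and keep every position divisible by n. every_nth and the sample rows are hypothetical. Because it numbers rows rather than ids, gaps in the ids do not matter.

```python
def every_nth(rows, n, key):
    """Keep rows whose 1-based position, after sorting by key, is a multiple of n.

    Python analogue of:
      SELECT * FROM (SELECT t.*, row_number() OVER (ORDER BY id) AS rownum FROM t) s
      WHERE mod(rownum, n) = 0
    """
    ordered = sorted(rows, key=key)
    return [row for rownum, row in enumerate(ordered, start=1) if rownum % n == 0]


stats = [{"id": i, "value": i * 10} for i in (1, 2, 4, 5, 7, 8, 9)]  # note the id gaps
print([row["id"] for row in every_nth(stats, 3, key=lambda row: row["id"])])  # [4, 8]
```

In Python itself, ordered[n - 1::n] is an equivalent slice; the explicit rownum is kept here to mirror the SQL.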
Were I to solve this, I would either just write the SQL myself, like the SQL that you linked to. You can do this with
MyModel.connection.execute('...')
or just get the id numbers and find by id:
ids = (1..30).step(2).to_a
MyModel.where(:id => ids)
(The second approach assumes the ids are contiguous; gaps left by deleted rows will make the sampling uneven.)
