In an SPSS Statistics syntax file, I am looking to create a variable that calculates a rank within a partitioning column (e.g. equivalent to SQL "RANK() OVER (PARTITION BY column_a ORDER BY column_b)" in Oracle SQL Developer).
Please see the example: the initial data without any filters, and the final output after applying get_rank.
To create a rank variable as described, first sort your data and then use the LAG function.
SORT CASES BY column_a column_b .
* Start every case at rank 1; cases that continue a column_a group are bumped up below .
COMPUTE rank=1 .
* For each case after the first, if column_a is unchanged from the prior case,
  add 1 to the prior case's rank; otherwise the rank stays at 1 .
IF ($CASENUM > 1 AND column_a = LAG(column_a)) rank = LAG(rank) + 1 .
EXECUTE .
LAG looks at the value of column_a for the prior case. The syntax above checks whether column_a has changed from the prior case.
If it has changed, the rank stays at 1 (set by the COMPUTE); if it has not, the rank becomes the prior case's rank plus 1. Just make sure your data is properly sorted first.
From there, if you want to look only at records with rank=1, you can use either FILTER BY or SELECT IF.
If you really only need a key to filter on (key1=1 marking the first case of each group), then you can use this:
SORT CASES BY column_a column_b .
MATCH FILES /FILE=* /BY column_a /FIRST=key1 .
Now the variable key1 will have the value 1 for the first occurrence of each column_a category, and you can use it to filter or select.
For a full ranking variable you can use this (you don't even need to sort first):
RANK VARIABLES=column_b (A) BY column_a /RANK /TIES=MEAN.
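For reference, the SPSS commands above mirror the Oracle-style window function named in the question; here is a minimal SQL sketch of it, assuming a hypothetical table name my_table:
SELECT column_a,
       column_b,
       RANK() OVER (PARTITION BY column_a ORDER BY column_b) AS rnk
FROM my_table;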
I'm using a QUERY function in Google Sheets. I have a named data range ("Contributions", a table on another sheet) that consists of many columns, but I'm only concerned with two of them. For simplicity's sake, it looks something like this:
I have another table that contains the unique set of names (e.g. "Fred", "Ginger", etc., each only once), and I want to extract the level number (column B) from the above table and insert the most recent (largest) value into this second table.
Right now, my query looks like this:
=QUERY(Contributions, "select B,C where C='"&A5&"' order by B desc limit 1",1)
The problem is that it outputs both the B and C data, e.g.:
11 Fred
But since I already have the name (in column A of this other table) I only want it to output the value from B - e.g.:
11
Is there a way to output only a subset (in this case 1 of 2) of the columns of output based on a directive within the query itself (as opposed to doing post-processing of the results)?
Outputting a Subset of Columns Used in Query
To output only certain columns of a query result, the query needs to select only the columns to be displayed; the constraints/conditions may still use other columns of the data.
For example (as an answer to my own question), I have a table like this:
I needed to get the data from the row with a name matching another cell (on another sheet) and with the latest (largest) number, but I only wanted to output the number part.
My initial attempt was:
=QUERY(Contributions, "select B,C where C='"&A5&"' order by B desc limit 1",1)
But that output both B and C, where I only wanted B. The answer (thanks to @Calculuswhiz) was to continue using C for the condition but select only B:
=QUERY(Contributions, "select B where C='"&A5&"' order by B desc limit 1",1)
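The same idea carries over to plain SQL: filter on one column but project only the other. A rough sketch, using hypothetical table and column names (contributions, level_num, name) that mirror the sheet:
SELECT level_num
FROM contributions
WHERE name = 'Fred'
ORDER BY level_num DESC
LIMIT 1;  -- LIMIT is Postgres/MySQL syntax; Oracle 12c+ would use FETCH FIRST 1 ROW ONLY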
I wish to change a flag from Y to N. I have been given a select which gives me two records for which I would like to change the value.
As a newbie to anything more advanced than the basics, I am totally at a loss.
As this is a live table, I am too cautious to attempt this without being fully certain that I will update the correct records.
SELECT b.*, '||', t.* FROM basecode b, type t
WHERE b.b_id IN ('Val1', 'Val2', 'Val3')
AND b.btype_id = t.ttype_id
The resultant query gives me records with multiple fields, but I just wish to change a couple of records' flag fields from 'Y' to 'N'.
Stripping away the other fields, I have
iflag='Y'
oflag='Y'
and I just want those set to 'N' for the records returned by the previous select.
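A hedged sketch of what such an update might look like, assuming iflag and oflag live on the basecode table and the same predicates as the SELECT identify the rows; run it inside a transaction and verify the affected row count before committing:
UPDATE basecode b
SET iflag = 'N',
    oflag = 'N'
WHERE b.b_id IN ('Val1', 'Val2', 'Val3')
  AND EXISTS (SELECT 1 FROM type t WHERE t.ttype_id = b.btype_id);
-- re-run the original SELECT to confirm the right rows changed, then COMMIT (or ROLLBACK)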
I want to limit the ActiveRecord objects returned to 20 and then perform a where() on that limited set; I already know that only 10 of those 20 fulfil the second column's criteria.
e.g. of ideal behaviour:
o = Object.limit(20)
o.where(column: criteria).count
=> 10
But instead, ActiveRecord looks for 20 objects that fulfil the where() criteria, searching outside the original 20 objects that limit() would have returned on its own.
How can I get the desired response?
One way to decrease the search space is to use a nested query: apply the condition to the first N records rather than to all records. In SQL this would be done like this:
select * from (select * from table order by ORDERING_FIELD limit 20) AS first_20 where column = value;
The query above will only search for the condition within the 20 rows returned from the table. Notice that I have added an ORDERING_FIELD; this is required because, without an explicit order, each run of the query could return the rows in a different order.
To do something similar in Rails, you could try the following:
Object.where(id: Object.order(:id).limit(20).select(:id)).where(column: criteria)
This will execute a query similar to the following:
SELECT [objects].* FROM [objects] WHERE [objects].[id] IN (SELECT TOP (20) [objects].[id] FROM [objects] ORDER BY [objects].id ASC) AND [objects].[column] = criteria
I have an Esper query that returns multiple rows, but I'd like instead to get one row containing a list (or concatenated string) of the values from the corresponding column of all the matching rows my current query returns.
For example:
SELECT Name, avg(latency) as avgLatency
FROM MyStream.win:time(5 min)
GROUP BY Name
HAVING avgLatency / 1000 > 60
OUTPUT last every 5 min
Returns:
Name avgLatency
---- ----------
A 65
B 70
C 75
What I'd really like:
Name
----
{A, B, C}
Is this possible to do via the query itself? I tried to make this work using subqueries, but I'm not working with multiple streams. I can't find any aggregation functions or enumeration functions in the Esper documentation that fit what I'm trying to do either.
Thanks to anybody that has any insight or direction for me here.
EDIT:
If this can't be done via the query, I'm open to changing the subscriber, or anything else, if necessary.
You can have a subscriber or listener do the concatenation (there is "Multi-Row Delivery" for subscribers), or use a table as shown below.
// create table to hold aggregation result
create table LatencyTable(name string primary key, avgLatency avg(double));
// update aggregations in table from events coming in
into table LatencyTable select name, avg(latency) as avgLatency from MyStream#time(5 min) group by name;
// do a select with the "aggregate" enumeration method
select (select * from LatencyTable where avgLatency > x).aggregate(....) from pattern[every timer:interval(5 min)]
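For comparison, collapsing the matches into a single row is what string aggregation does in plain SQL. A rough sketch, assuming a hypothetical table my_stream_events standing in for the windowed stream and using Postgres's string_agg:
SELECT string_agg(name, ', ') AS names
FROM (
    SELECT name, avg(latency) AS avg_latency
    FROM my_stream_events
    GROUP BY name
    HAVING avg(latency) / 1000 > 60
) slow;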
My code depends on the order of records in the table. My assumption was that a table can be considered a list, so that the records maintain order. I have a small piece of update code, shown below, that updates the record at a particular index in the table.
p = pieces[index]
p.position = 0
p.save
I check the order of records before and after this update, and I see that after the update the updated record has moved to the end of the list (I print Piece.all to print the list). The order is maintained in MySQL, but when I deploy to Heroku, which uses PostgreSQL, the order is not maintained, so this was a surprising find for me.
Is there no guarantee of order in tables, and should one not depend on the order? Please correct my misunderstanding; thanks for the clarification.
You should NEVER depend on the order in my honest opinion.
Rows are returned in an unspecified order, per the SQL spec, unless you add an ORDER BY clause. In Postgres, that means you'll get rows in, basically, the order in which live rows are read from disk.
MySQL tends to return rows in the order they were inserted, and this is why you see the difference in behavior.
If you want them always returned in the order they were created, you can use Piece.order("created_at")
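Under the hood that is just an explicit ORDER BY; the generated SQL is roughly:
SELECT "pieces".* FROM "pieces" ORDER BY created_at;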
You state:
"My assumption was that a table can be considered a list so that the records maintain order."
This is incorrect. A table represents an unordered set. There is no inherent ordering in the table. A result set similarly lacks ordering. The only way to guarantee the ordering of a result set is to use ORDER BY in the query.
So, an update changes values in one or more columns in one or more rows. It does not change the "ordering" of rows, because they are not ordered.
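For instance, the save in the question issues roughly this SQL (the id value of 123 is hypothetical); nothing in it says anything about where the row sits in any ordering:
UPDATE pieces SET position = 0 WHERE id = 123;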
Note: Under some circumstances, a query may appear to return results in a particular order. You really should not depend on this behavior, unless the query has an explicit ORDER BY.
Tables are normally unordered and should be presumed to be unordered unless they have a CLUSTERed index, and even then clustering only controls the physical layout on disk. That's an important piece of information, because understanding clustered indexes is somewhat useful. That said, what you receive back from a query, the result set, should always be presumed to be unordered, because the join order is undefined.
So if order matters, always be explicit and use ORDER BY. Now, for illustration, let's have some fun.
CREATE TABLE bar ( qux serial PRIMARY KEY, asdf text );
INSERT INTO bar (asdf) VALUES ('z'), ('x'), ('g'), ('a');
Now we've got this,
SELECT * FROM BAR;
qux | asdf
-----+------
1 | z
2 | x
3 | g
4 | a
Now we create an index and CLUSTER the table on it,
CREATE INDEX asdfidx ON bar (asdf);
CLUSTER bar USING asdfidx;
Now the rows happen to come back in index order, because the table has been physically rewritten in that order (though without an ORDER BY this still is not guaranteed),
SELECT * FROM bar;
qux | asdf
-----+------
4 | a
3 | g
2 | x
1 | z
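Even after clustering, the dependable way to get that order is to ask for it explicitly:
SELECT * FROM bar ORDER BY asdf;
qux | asdf
-----+------
4 | a
3 | g
2 | x
1 | z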