Limit query on highest value of attribute - ruby-on-rails

I have a model Teststep with these columns and values:
+----------+----------+---------------+----------+
| name     | sequence | inner_sequnce | revision |
+----------+----------+---------------+----------+
| Step A   |        1 |             1 |        1 |
| Step B   |        1 |             2 |        1 |
| Step B-2 |        1 |             2 |        2 |
| Step C   |        1 |             3 |        1 |
| Step D   |        2 |             1 |        1 |
+----------+----------+---------------+----------+
Now I want all teststeps with sequence 1 but only with the highest revision. So in this case it would include Step A, Step B-2 and Step C.
The query to get the teststeps with the right sequence is easy:
Teststep.where(sequence: 1)
But how can I make sure only Step B-2 is returned and not Step B or both Step B-2 and Step B?

You want to create a subquery and use an aggregate:
SELECT * FROM teststeps
WHERE teststeps.sequence = 1
AND teststeps.revision = (SELECT MAX(revision) FROM teststeps WHERE sequence = 1)
The bad news is that the exact details of how to do this vary depending on which RDBMS is used.
In Rails you could do something like:
Teststep.where(sequence: 1)
.where(revision: Teststep.where(sequence: 1).maximum(:revision))
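One caveat: a single global maximum keeps only Step B-2 and drops Step A and Step C, because each inner_sequnce group has its own highest revision. A sketch of a per-group variant (untested; column names taken from the question's table) that correlates the subquery on inner_sequnce:
# Keep, per inner_sequnce, only the row with the highest revision.
Teststep.where(sequence: 1).where(
  "revision = (SELECT MAX(t2.revision)
               FROM teststeps t2
               WHERE t2.sequence = teststeps.sequence
                 AND t2.inner_sequnce = teststeps.inner_sequnce)"
)
Rows whose revision equals the highest revision within their own inner_sequnce survive; everything else is filtered out.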


Rails: create unique auto-incremental id based on sibling records

I have three models in my Rails project: User, Game and Match.
A user can create many matches on each game, so the table structure for matches looks like this:
table name: game_matches
+----+---------+---------+------------+------------+
| id | user_id | game_id | match_type | match_name |
+----+---------+---------+------------+------------+
|  1 |       1 |       1 | practice   |            |
|  2 |       3 |       2 | challenge  |            |
|  3 |       1 |       1 | practice   |            |
|  4 |       3 |       2 | challenge  |            |
|  5 |       1 |       1 | challenge  |            |
|  6 |       3 |       2 | practice   |            |
+----+---------+---------+------------+------------+
I want to generate the match_name based on the user_id, game_id and match_type values.
For example, match_name should be created like below:
+----+---------+---------+------------+-------------+
| id | user_id | game_id | match_type | match_name  |
+----+---------+---------+------------+-------------+
|  1 |       1 |       1 | practice   | Practice 1  |
|  2 |       3 |       2 | challenge  | Challenge 1 |
|  3 |       1 |       1 | practice   | Practice 2  |
|  4 |       3 |       2 | challenge  | Challenge 2 |
|  5 |       1 |       1 | challenge  | Challenge 1 |
|  6 |       3 |       2 | practice   | Practice 1  |
+----+---------+---------+------------+-------------+
How can I achieve this auto-incrementing value in my Rails model during new record creation?
Any help or suggestions appreciated. Thanks in advance.
I see two ways you can solve this:
DB: trigger
Rails: callback
Trigger (assuming Postgres):
DROP TRIGGER IF EXISTS trigger_add_match_name ON game_matches;
DROP FUNCTION IF EXISTS function_add_match_name();
CREATE FUNCTION function_add_match_name()
RETURNS trigger AS $$
BEGIN
  -- Count the rows that already exist for this user, game and match type,
  -- and number the new row after them (first match gets 1).
  NEW.match_name := (
    SELECT CONCAT(INITCAP(NEW.match_type), ' ', COUNT(*) + 1)
    FROM game_matches
    WHERE game_matches.user_id = NEW.user_id
      AND game_matches.game_id = NEW.game_id
      AND game_matches.match_type = NEW.match_type
  );
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trigger_add_match_name
BEFORE INSERT ON game_matches
FOR EACH ROW
EXECUTE PROCEDURE function_add_match_name();
Please note that this is not tested.
Rails:
class GameMatch < ActiveRecord::Base
  before_create :assign_match_name

  private

  def assign_match_name
    number = GameMatch.where(user_id: user_id, game_id: game_id, match_type: match_type).count
    self.match_name = "#{match_type.capitalize} #{number + 1}"
  end
end
Again, untested.
I'd prefer the trigger solution, since callbacks can be skipped or omitted altogether when inserting via raw SQL.
Also, I'd add a match_number column instead of storing the full name, and then construct the name within the model, a decorator, or a view helper (more flexible, and friendlier to I18n), but the underlying logic stays the same.
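A sketch of that match_number variant (untested; assumes an integer match_number column has been added, and display_name is a hypothetical helper):
class GameMatch < ActiveRecord::Base
  before_create :assign_match_number

  # Build the display name on the fly instead of storing it;
  # easy to localize or reformat later.
  def display_name
    "#{match_type.capitalize} #{match_number}"
  end

  private

  def assign_match_number
    self.match_number =
      GameMatch.where(user_id: user_id, game_id: game_id, match_type: match_type).count + 1
  end
end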
You could retrieve the last match_name for this user and game, split it, increment the counter, and join it back with a space. Unfortunately, standard SQL does not provide a SPLIT function, so something like the query below would be a good start:
SELECT match_name
FROM game_matches
WHERE user_id = 3
AND game_id = 2
ORDER BY id DESC
LIMIT 1
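A rough Ruby sketch of that split-and-increment idea (untested; assumes match_name always ends in a space followed by the counter):
# Fetch the newest match_name for this user and game, e.g. "Challenge 2".
last_name = GameMatch.where(user_id: 3, game_id: 2).order(id: :desc).limit(1).pluck(:match_name).first
# Split off the trailing counter and rebuild with it incremented: "Challenge 3".
type, number = last_name.split(' ')
next_name = "#{type} #{number.to_i + 1}"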
It would actually be better to create a match_number column of type INT to keep the number per type, and to produce the name by concatenating the type with that number.

Automated way to create a confusion matrix in Google Sheets?

I have a table of this form in Google Sheets:
+---------+------------+--------+
| item_id | prediction | actual |
+---------+------------+--------+
|       1 |          1 |      1 |
|       2 |          1 |      1 |
|       3 |          1 |      0 |
|       4 |          0 |      1 |
|       5 |          0 |      0 |
|       6 |          1 |      1 |
+---------+------------+--------+
And I'd like to know if there's an automated way to get this kind of summary, with the counts of items that fit the criteria specified in that row/column combination:
+----------+--------------+--------------+-------+
|          | prediction=0 | prediction=1 | total |
+----------+--------------+--------------+-------+
| actual=0 |            1 |            1 |     2 |
| actual=1 |            1 |            3 |     4 |
+----------+--------------+--------------+-------+
| total    |            2 |            4 |       |
+----------+--------------+--------------+-------+
I've been doing this somewhat manually in Google Sheets by using COUNTIFS, but I'm wondering if there's a built-in way? I tried using pivot tables, but couldn't figure out how to get the calculated fields to show the data I want.
A coworker figured it out - you can get this by creating a pivot table with the correct columns and rows, and setting the value to item_id summarized by COUNTUNIQUE.
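For comparison, a single cell of that matrix done manually with COUNTIFS (assuming prediction is in column B and actual in column C, with a header in row 1) would be something like:
=COUNTIFS(B2:B, 1, C2:C, 0)
which counts the items with prediction=1 and actual=0; the pivot table simply builds the whole grid at once.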

Why does my cypher query take 10 times longer when I run it with count()?

I start with the following query:
PROFILE
MATCH Base = (SBase:Snapshot {timestamp:1454983481.304583})-[:contains]->()
MATCH Prime = (:Snapshot {timestamp:1454983521.642284})-[PContains:contains]->(SPrimePackage)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
RETURN PContains
LIMIT 10
I get "5834 total db hits in 119 ms". The graph correctly shows 9 nodes, and 8 edges connecting them. Then I run an almost-identical query, except that I instead return count(distinct()):
PROFILE
MATCH Base = (SBase:Snapshot {timestamp:1454983481.304583})-[:contains]->()
MATCH Prime = (:Snapshot {timestamp:1454983521.642284})-[PContains:contains]->(SPrimePackage)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
RETURN count(distinct(SPrimePackage))
LIMIT 10
This gives "1382270 total db hits in 1771 ms". The result is correct: 8. However, why is count(distinct()) so much slower and more expensive? Should I be doing this some other way?
I'm running Neo4j 2.3.1.
EDIT 1
To ensure I'm comparing apples to apples, and to highlight the question, here is a similar pair of queries and results:
MATCH Base = (SBase:Snapshot {timestamp:1454983481.304583})-[:contains]->()
MATCH Prime = (:Snapshot {timestamp:1454983521.642284})-[PContains:contains]->(SPrimePackage)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
RETURN SPrimePackage
LIMIT 10
Note it's returning "SPrimePackage" instead of "PContains" in the original. The result is "5834 total db hits in 740 ms".
Here is that exact same query with "count()":
MATCH Base = (SBase:Snapshot {timestamp:1454983481.304583})-[:contains]->()
MATCH Prime = (:Snapshot {timestamp:1454983521.642284})-[PContains:contains]->(SPrimePackage)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
RETURN count(SPrimePackage)
LIMIT 10
The result: "1382270 total db hits in 2731 ms". Note the only difference is the "count()". Intuitively, I would expect "count()" to add a single tallying step, but clearly it's doing much more than that. Why is "count()" triggering all of this extra work?
[UPDATED]
If you compared the PROFILE output of your 2 (edited) queries, you'd probably see that the only significant difference was the existence of an EagerAggregation operation in the COUNT() version of the query. Aggregation functions use EagerAggregation to collect in memory all the data being aggregated before actually performing the aggregation function (in this case, COUNT()). That requires additional work that is not needed when you do not use the aggregation function.
The following query still uses COUNT() in order to get the count, but greatly reduces the data that has to be aggregated, thus reducing the amount of work that needs to be done in the EagerAggregation step:
PROFILE
MATCH (SBase:Snapshot { timestamp:1454983481.304583 })
USING INDEX SBase:Snapshot(timestamp)
WHERE (SBase)-[:contains]->()
MATCH (s:Snapshot { timestamp:1454983521.642284 })-[:contains]->(SPrimePackage)
USING INDEX s:Snapshot(timestamp)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
RETURN COUNT(DISTINCT SPrimePackage)
LIMIT 10;
The above query assumes you have already created an index on :Snapshot(timestamp), to greatly speed up the search for the 2 :Snapshot nodes:
CREATE INDEX ON :Snapshot(timestamp);
Using some simple data, the profile I get is:
+-------------------+----------------+------+---------+--------------------------------------+--------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-------------------+----------------+------+---------+--------------------------------------+--------------------------------------+
| +ProduceResults | 1 | 1 | 0 | COUNT(DISTINCT SPrimePackage) | COUNT(DISTINCT SPrimePackage) |
| | +----------------+------+---------+--------------------------------------+--------------------------------------+
| +Limit | 1 | 1 | 0 | COUNT(DISTINCT SPrimePackage) | Literal(10) |
| | +----------------+------+---------+--------------------------------------+--------------------------------------+
| +EagerAggregation | 1 | 1 | 0 | COUNT(DISTINCT SPrimePackage) | |
| | +----------------+------+---------+--------------------------------------+--------------------------------------+
| +AntiSemiApply | 1 | 7 | 0 | anon[180], s -- SBase, SPrimePackage | |
| |\ +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Expand(Into) | 1 | 0 | 34 | anon[266] -- SBase, SPrimePackage | (SBase)-[:contains]->(SPrimePackage) |
| | | +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Argument | 4 | 8 | 0 | SBase, SPrimePackage | |
| | +----------------+------+---------+--------------------------------------+--------------------------------------+
| +CartesianProduct | 4 | 8 | 0 | SBase -- anon[180], SPrimePackage, s | |
| |\ +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Expand(All) | 4 | 8 | 10 | anon[180], SPrimePackage -- s | (s)-[:contains]->(SPrimePackage) |
| | | +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +NodeIndexSeek | 2 | 2 | 4 | s | :Snapshot(timestamp) |
| | +----------------+------+---------+--------------------------------------+--------------------------------------+
| +SemiApply | 1 | 2 | 0 | SBase | |
| |\ +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Expand(All) | 4 | 0 | 2 | anon[112], anon[126] -- SBase | (SBase)-[:contains]->() |
| | | +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Argument | 2 | 2 | 0 | SBase | |
| | +----------------+------+---------+--------------------------------------+--------------------------------------+
| +NodeIndexSeek | 2 | 2 | 3 | SBase | :Snapshot(timestamp) |
+-------------------+----------------+------+---------+--------------------------------------+--------------------------------------+
In addition to using indexing, the above query:
Does not bother to find all nodes contained by SBase, since we need to find just one contained node in order to identify a matching SBase node. The SemiApply operation will complete as soon as a single (SBase)-[:contains]->() match is found, and so the first MATCH clause will result in a single row per SBase instead of N rows. Based on the info in your question, I suspect N would have been about 8.
Has a Cartesian Product that should be pretty fast, since both "legs" of the product should have low cardinality.

How to perform batch update of a column within a range in psql

Here is my table structure:
Table name: items
+----+-------------+-------+
| id | category_id | code  |
+----+-------------+-------+
|  1 |           1 | 15156 |
|  2 |           1 | 15157 |
|  3 |           1 | 15158 |
|  4 |           1 | 15159 |
|  5 |           1 | 15160 |
|  6 |           1 | 15161 |
+----+-------------+-------+
Here the code field is unique, and its type is string. I need to increment the code values by 1 (even though the field is a string).
You can try:
Item.update_all(code: "#{code.to_i + 1}")
(See the Rails documentation for update_all if you want to read more.)
The update_all approach won't work, because the record attributes are not available in that context.
Better might be...
minimum = "15157"
maximum = "15160"
Item.where("code >= ? AND code <= ?", minimum, maximum).each{|i| i.update_attribute(:code, "#{i.code.to_i + 1}") }
(edited to reflect two arguments passed to update_attribute)
Edited to reflect #rustamagasanov suggestion to limit to a given range of code values...
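Since the question mentions psql, the same range update can be done in one statement on the Postgres side. A sketch (untested; assumes the table is named items, every code in the range is numeric text, and the unique index on code either tolerates the transient duplicates or is declared DEFERRABLE):
UPDATE items
SET code = (code::integer + 1)::text
WHERE code::integer BETWEEN 15157 AND 15160;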

Count occurrences of words from multiple columns

I have a spreadsheet like this, where the values A-E are the same options coming from a form:
+------+------+------+
| Opt1 | Opt2 | Opt3 |
+------+------+------+
| A    | A    | B    |
| B    | C    | A    |
| C    | C    | B    |
| A    | E    | C    |
| D    | B    | E    |
| B    | E    | D    |
+------+------+------+
I want to make a ranking showing the most-chosen options. I already have this, where Rank is the option's rank and Numb is its count:
+------+------+------+
| Rank | Opt1 | Numb |
+------+------+------+
|    1 | A    |    2 |
|    1 | B    |    2 |
|    3 | C    |    1 |
|    3 | D    |    1 |
+------+------+------+
(I have 3 of these, one for each option column)
Now I want to build a summary of the 3 option columns: the same ranking, but with the options combined. It would be something like:
+------+--------+------+
| Rank | Opt123 | Numb |
+------+--------+------+
|    1 | B      |    5 |
|    2 | A      |    4 |
|    2 | C      |    4 |
|    4 | E      |    3 |
|    5 | D      |    2 |
+------+--------+------+
Would it be easier to do this from the three ranking tables or from the original three data columns?
And how would I do it?
I already have the formulas to get the option names, counts, and ranks, but I don't know how to make them work across multiple columns.
What I have (the F column is one of the data columns):
Column B on another sheet:
=SORT(UNIQUE(FILTER('Form Responses'!F2:F;NOT(ISBLANK('Form Responses'!F2:F)))); RANK(COUNTIF('Form Responses'!F2:F; UNIQUE(FILTER('Form Responses'!F2:F;NOT(ISBLANK('Form Responses'!F2:F))))); COUNTIF('Form Responses'!F2:F; UNIQUE(FILTER('Form Responses'!F2:F;NOT(ISBLANK('Form Responses'!F2:F))))); TRUE); FALSE)
Column C:
=ArrayFormula(COUNTIF('Form Responses'!F2:F; FILTER(B2:B;NOT(ISBLANK(B2:B)))))
Column A:
=ARRAYFORMULA(SORT(RANK(FILTER(C2:C;NOT(ISBLANK(C2:C))); FILTER(C2:C;NOT(ISBLANK(C2:C))))))
Edit:
Merging columns:
=TRANSPOSE(split(join(",",D2:D,E2:E),","))
This merges 2 columns; not very clean, but it works. (Same approach as in "Stacking multiple columns on to one?")
Full formula:
=SORT(UNIQUE(FILTER(TRANSPOSE(split(join(",",D2:D,E2:E),","));NOT(ISBLANK(TRANSPOSE(split(join(",",D2:D,E2:E),",")))))); RANK(COUNTIF(TRANSPOSE(split(join(",",D2:D,E2:E),",")); UNIQUE(FILTER(TRANSPOSE(split(join(",",D2:D,E2:E),","));NOT(ISBLANK(TRANSPOSE(split(join(",",D2:D,E2:E),","))))))); COUNTIF(TRANSPOSE(split(join(",",D2:D,E2:E),",")); UNIQUE(FILTER(TRANSPOSE(split(join(",",D2:D,E2:E),","));NOT(ISBLANK(TRANSPOSE(split(join(",",D2:D,E2:E),","))))))); TRUE); FALSE)
The transpose could be done after the sort.
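As an aside, newer versions of Google Sheets have a built-in FLATTEN function that can replace the join/split merge (and avoids JOIN's 50,000-character limit). Assuming the option columns are D, E and F, something like:
=FILTER(FLATTEN(D2:F), FLATTEN(D2:F)<>"")
The rest of the ranking formula can then operate on that single merged column.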
