Row position in 100k+ records - ruby-on-rails

I have this code to get the position of each record:
message_order = {}
Message.all.each_with_index do |msg, i|
  message_order[msg.id] = i
end
But now I have 100k+ messages, and it takes too long to iterate over all the records.
Can anyone tell me how to do this more efficiently? (I'm using Oracle.)
I thought about ROWNUM but didn't come to a solution.
A solution that returns the position of just one message would be great.

I'm not sure I have understood your problem.
If you need the message at a given position in the ordering (for example, position 5), you can run something like:
SELECT message
FROM (SELECT message, ROWNUM AS ID
      FROM (SELECT message
            FROM tab1
            ORDER BY some_date))
WHERE ID = 5;
Or, using analytical functions:
SELECT message
FROM (SELECT message,
             ROW_NUMBER() OVER (ORDER BY some_date) AS ID
      FROM tab1)
WHERE ID = 5;
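To make the numbering concrete, here is a plain-Ruby sketch of what ROW_NUMBER() OVER (ORDER BY some_date) computes. It uses no database, and the rows and dates are made up. It also shows the reverse lookup the question asks for, from a message to its position. Note that this numbering starts at 1, while each_with_index starts at 0.

```ruby
require "date"

# Hypothetical rows standing in for tab1; only message and some_date matter here.
Row = Struct.new(:message, :some_date)

rows = [
  Row.new("e", Date.new(2015, 5, 5)),
  Row.new("a", Date.new(2015, 1, 1)),
  Row.new("c", Date.new(2015, 3, 3)),
  Row.new("b", Date.new(2015, 2, 2)),
  Row.new("d", Date.new(2015, 4, 4)),
]

# ROW_NUMBER() OVER (ORDER BY some_date): sort first, then number the rows from 1.
numbered = rows.sort_by(&:some_date).map.with_index(1) { |row, n| [n, row.message] }

# WHERE ID = 5: the message at position 5.
message_at_5 = numbered.find { |n, _| n == 5 }&.last

# Reverse lookup: the position of one particular message.
position_of_c = numbered.find { |_, msg| msg == "c" }&.first
```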

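This should work, assuming the position is defined by ascending id. It returns the same 0-based index that each_with_index gives over Message.order(:id), using a single COUNT query instead of loading every record (note that Message.all alone does not guarantee any order):
Message.where("id < :id", id: msg.id).count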
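The two approaches agree as long as ids are unique and the position is defined by ascending id. Below is a plain-Ruby sketch comparing them. It uses an array of made-up ids in place of the messages table. In the database, the primary-key index lets the COUNT be answered without pulling records into Ruby.

```ruby
# Hypothetical ids standing in for the messages table.
ids = [3, 10, 7, 42, 15]

# Original approach: walk every record (sorted by id) and remember its index.
message_order = {}
ids.sort.each_with_index { |id, i| message_order[id] = i }

# Count-based approach: the 0-based position of one id is the number of smaller ids,
# which is what Message.where("id < :id", id: msg.id).count asks the database for.
def position_of(ids, target_id)
  ids.count { |id| id < target_id }
end
```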

Related

Double join on same table produces wrong result

Basically, I have a Driver model that has many rides. Each ride has a price field, and I want to calculate the driver's total_paid (the payment they have earned over all time) and this_week_paid (the payment earned from the beginning of this week to the end of it) in one Active Record query.
I got the correct number for the total_paid part easily with one join like this:
Driver.joins(:rides).
select("#{Driver.table_name}.*, sum(substring(rides.price from '[0-9]+.[0-9]*')::numeric) as total_paid").
group("#{Driver.table_name}.id").
order("total_paid DESC, id")
Now when I try to add this_week_paid to that query:
Driver.joins("INNER JOIN rides this_week_rides ON #{Driver.table_name}.id = this_week_rides.driver_id").
joins("INNER JOIN rides all_rides ON #{Driver.table_name}.id = all_rides.driver_id").
select("#{Driver.table_name}.*, " +
"sum(substring(this_week_rides.price from '[0-9]+.[0-9]*')::numeric) as this_week_paid, " +
"sum(substring(all_rides.price from '[0-9]+.[0-9]*')::numeric) as total_paid").
where(this_week_rides: { created_at: Time.current.beginning_of_week..Time.current.end_of_week }).
group("#{Driver.table_name}.id").
order("this_week_paid DESC, id")
It runs without throwing any exceptions; however, the total_paid field is twice the correct number and the this_week_paid field is three times the correct one (query answer: { this_week_paid: 188.46, total_paid: 159.9 }, correct answer: { this_week_paid: 62.82, total_paid: 79.95 }).
I tried adding where("this_week_rides.id != all_rides.id"), and it gives me a different wrong result ("this_week_paid" => 125.64, "total_paid" => 97.08).
What am I missing?
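You join the same table twice, which multiplies the number of rows: every this-week ride is paired with every ride of the same driver, so each price is counted once per matching row from the other join. That is why you get multiples of the expected result. Join rides only once and filter inside the aggregate instead (the FILTER clause requires PostgreSQL 9.4 or later; time1 and time2 stand for the week boundaries):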
sum(substring(rides.price from '[0-9]+.[0-9]*')::numeric) filter (
where rides.created_at between time1 and time2
) as this_week_paid,
sum(substring(rides.price from '[0-9]+.[0-9]*')::numeric) as total_paid
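Here is a plain-Ruby sketch of the row multiplication, with no database involved. Ride, the dates, and the prices in cents are made up, chosen so that the correct sums are the question's 62.82 and 79.95. With three rides, two of them this week, the double join produces 2 × 3 = 6 row pairs. Each this-week price is therefore counted three times, and each price in the all-rides column twice, which are exactly the factors reported in the question.

```ruby
require "date"

Ride = Struct.new(:id, :price_cents, :created_on)

week_start = Date.new(2024, 1, 1) # hypothetical Monday
week_end = week_start + 6
in_week = ->(ride) { (week_start..week_end).cover?(ride.created_on) }

rides = [
  Ride.new(1, 3000, week_start + 1),
  Ride.new(2, 3282, week_start + 3),
  Ride.new(3, 1713, week_start - 10), # an older ride
]

# Double join: every this-week ride is paired with every ride of the driver.
pairs = rides.select(&in_week).product(rides)
double_join_week = pairs.sum { |week_ride, _| week_ride.price_cents }
double_join_total = pairs.sum { |_, any_ride| any_ride.price_cents }

# Single pass with a conditional sum, which is what FILTER (WHERE ...) does.
week_paid = rides.select(&in_week).sum(&:price_cents)
total_paid = rides.sum(&:price_cents)
```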

Get the average of the most recent records within groups with ActiveRecord

I have the following query, which calculates the average number of impressions across all teams for a given name and league:
@all_team_avg = NielsenData
.where('name = ? and league = ?', name, league)
.average('impressions')
.to_i
However, there can be multiple entries for each name/league/team combination. I need to modify the query to only average the most recent records by created_at.
With the help of this answer I came up with a query which gets the result that I need (I would replace the hard-coded WHERE clause with name and league in the application), but it seems excessively complicated and I have no idea how to translate it nicely into ActiveRecord:
SELECT avg(sub.impressions)
FROM (
WITH summary AS (
SELECT n.team,
n.name,
n.league,
n.impressions,
n.created_at,
ROW_NUMBER() OVER(PARTITION BY n.team
ORDER BY n.created_at DESC) AS rowcount
FROM nielsen_data n
WHERE n.name = 'Social Media - Twitter Followers'
AND n.league = 'National Football League'
)
SELECT s.*
FROM summary s
WHERE s.rowcount = 1) sub;
How can I rewrite this query using ActiveRecord or achieve the same result in a simpler way?
When all you have is a hammer, everything looks like a nail.
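Sometimes, raw SQL is the best choice. You can do something like:
@all_team_avg = NielsenData.find_by_sql("...your_sql_statement_here...")
Keep in mind that find_by_sql returns an array of NielsenData objects; for a single aggregate like this one, NielsenData.connection.select_value("...your_sql_statement_here...") returns the value directly.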
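The SQL above keeps the newest row per team (ROW_NUMBER() ... PARTITION BY team ORDER BY created_at DESC, then rowcount = 1) and averages those rows. Here is a plain-Ruby sketch of the same logic on made-up in-memory records, useful for checking expected results. It is not an Active Record query.

```ruby
Record = Struct.new(:team, :impressions, :created_at)

# Hypothetical rows, already filtered to one name and league.
records = [
  Record.new("Bears", 100, Time.utc(2020, 1, 1)),
  Record.new("Bears", 140, Time.utc(2020, 2, 1)),
  Record.new("Jets", 80, Time.utc(2020, 1, 15)),
  Record.new("Jets", 60, Time.utc(2019, 12, 1)),
]

# PARTITION BY team + keep rowcount = 1 (the newest created_at in each team).
latest = records.group_by(&:team).map { |_team, rows| rows.max_by(&:created_at) }

# avg(sub.impressions) over the newest rows only.
average = latest.sum(&:impressions) / latest.size.to_f
```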

Getting Conditional Count in Join with Laravel Query Builder

I am trying to achieve the following with Laravel Query builder.
I have a table called deals. Below is the basic schema:
id
deal_id
merchant_id
status
deal_text
timestamps
I also have another table called merchants whose schema is
id
merchant_id
merchant_name
about
timestamps
Currently I am getting deals using the following query
$deals = DB::table('deals')
    ->join('merchants', 'deals.merchant_id', '=', 'merchants.merchant_id')
    ->where('merchant_url_text', $merchant_url_text)
    ->get();
Since only one merchant is associated with each deal, this query returns the deals together with the related merchant info.
Now I have a 3rd table called tbl_deal_votes. Its schema looks like
id
deal_id
vote (1 if voted up, 0 if voted down)
timestamps
What I want to do is join this 3rd table (on deal_id) to my existing query and be able to also get the upvotes and down votes each deal has received.
To do this in a single query, you'll probably need SQL subqueries, which don't seem to have good fluent query support in Laravel 4/5. Since you're not using Eloquent objects, raw SQL is probably the easiest to read. (Note that the example below ignores your deals.deal_id and merchants.merchant_id columns, which can likely be dropped; instead it uses your deals.id and merchants.id fields by convention.)
$deals = DB::select(
DB::raw('
SELECT
deals.id AS deal_id,
deals.status,
deals.deal_text,
merchants.id AS merchant_id,
merchants.merchant_name,
merchants.about,
COALESCE(tbl_upvotes.upvotes_count, 0) AS upvotes_count,
COALESCE(tbl_downvotes.downvotes_count, 0) AS downvotes_count
FROM
deals
JOIN merchants ON (merchants.id = deals.merchant_id)
LEFT JOIN (
SELECT deal_id, count(*) AS upvotes_count
FROM tbl_deal_votes
WHERE vote = 1
GROUP BY deal_id
) tbl_upvotes ON (tbl_upvotes.deal_id = deals.id)
LEFT JOIN (
SELECT deal_id, count(*) AS downvotes_count
FROM tbl_deal_votes
WHERE vote = 0
GROUP BY deal_id
) tbl_downvotes ON (tbl_downvotes.deal_id = deals.id)
')
);
If you'd prefer to use fluent, this should work:
$upvotes_subquery = '
SELECT deal_id, count(*) AS upvotes_count
FROM tbl_deal_votes
WHERE vote = 1
GROUP BY deal_id';
$downvotes_subquery = '
SELECT deal_id, count(*) AS downvotes_count
FROM tbl_deal_votes
WHERE vote = 0
GROUP BY deal_id';
$deals = DB::table('deals')
->select([
DB::raw('deals.id AS deal_id'),
'deals.status',
'deals.deal_text',
DB::raw('merchants.id AS merchant_id'),
'merchants.merchant_name',
'merchants.about',
DB::raw('COALESCE(tbl_upvotes.upvotes_count, 0) AS upvotes_count'),
DB::raw('COALESCE(tbl_downvotes.downvotes_count, 0) AS downvotes_count')
])
->join('merchants', 'merchants.id', '=', 'deals.merchant_id')
->leftJoin(DB::raw('(' . $upvotes_subquery . ') tbl_upvotes'), function($join) {
$join->on('tbl_upvotes.deal_id', '=', 'deals.id');
})
->leftJoin(DB::raw('(' . $downvotes_subquery . ') tbl_downvotes'), function($join) {
$join->on('tbl_downvotes.deal_id', '=', 'deals.id');
})
->get();
A few notes about the fluent query:
Used the DB::raw() method to rename a few selected columns.
Otherwise, there would have been a conflict between deals.id
and merchants.id in the results.
Used COALESCE to default null votes to 0.
Split the subqueries into separate PHP strings to improve readability.
Used left joins for the subqueries so deals with no upvotes/downvotes still show up.
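Here is a plain-Ruby sketch of what the two grouped subqueries plus LEFT JOIN and COALESCE produce. The deals and votes are made-up hashes, not Laravel or database objects. Each vote value is counted per deal, every deal is kept, and missing counts default to 0.

```ruby
# Hypothetical rows standing in for deals and tbl_deal_votes.
deals = [
  { id: 1, deal_text: "10% off" },
  { id: 2, deal_text: "Free shipping" },
  { id: 3, deal_text: "2 for 1" },
]
votes = [
  { deal_id: 1, vote: 1 },
  { deal_id: 1, vote: 1 },
  { deal_id: 1, vote: 0 },
  { deal_id: 2, vote: 0 },
]

# Equivalent of one GROUP BY subquery: deal_id => count for a single vote value.
def count_votes(votes, value)
  votes.select { |v| v[:vote] == value }
       .group_by { |v| v[:deal_id] }
       .transform_values(&:size)
end

upvotes = count_votes(votes, 1)
downvotes = count_votes(votes, 0)

# LEFT JOIN + COALESCE(..., 0): every deal is kept, and missing counts become 0.
result = deals.map do |deal|
  deal.merge(
    upvotes_count: upvotes.fetch(deal[:id], 0),
    downvotes_count: downvotes.fetch(deal[:id], 0)
  )
end
```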

Sqlite Group By reverses the order

This is my query:
SELECT * FROM Message WHERE ParentMessage = ? GROUP BY MessageId
This reverses the order of the results.
I'm not sure why.
Records on screen before GROUP BY:
A
B
C
D
Records on screen after GROUP BY:
D
C
B
A
In SQL, the results of a query do not have any guaranteed order unless you are using ORDER BY.
(In this case, it's likely that the query optimizer has estimated that using an index in a certain way would make the execution faster.)
Add an explicit ORDER BY on whichever column defines the order you want. For example, if your table is Message, you filter on ParentMessage, and you want the rows ordered by MessageId:
SELECT * FROM Message WHERE ParentMessage = ? GROUP BY MessageId ORDER BY MessageId ASC
Use ORDER BY MessageId DESC if you want the reverse order.

Rails to always save a changed state what method would you override

OK, so I have a boss who is a bit of a nut when it comes to using the date as an indicator of change. He doesn't trust it.
What I want is something that works the same way as the timestamp update built into Active Record, but based on an ever-increasing number instead.
I know... the number of seconds since 1970 is constantly getting bigger. Well, unless you count daylight saving time and things.
I'm wondering if anyone has thoughts on how to do this gracefully.
Note that I have 20 tables that need this, and I am a big fan of DRY.
Have a look at http://api.rubyonrails.org/classes/ActiveRecord/Locking/Optimistic.html; I think it is exactly what you want.
Optimistic locking in Active Record means that if a lock_version column is present on a table, it is incremented by 1 every time you update that record (through Active Record, of course), and saving a stale copy raises ActiveRecord::StaleObjectError.
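To illustrate the mechanism, here is a plain-Ruby sketch. It uses an in-memory store and is not Active Record's actual implementation; VersionedStore and StaleRecordError are hypothetical names. Every successful update bumps lock_version, and an update based on an outdated version is rejected, much like an UPDATE ... WHERE id = ? AND lock_version = ? that matches no row.

```ruby
# Hypothetical error class, playing the role of a stale-object error.
class StaleRecordError < StandardError; end

# Hypothetical in-memory "table": id => { data:, lock_version: }
class VersionedStore
  def initialize
    @rows = {}
  end

  def insert(id, data)
    @rows[id] = { data: data, lock_version: 0 }
    @rows[id].dup
  end

  def find(id)
    @rows.fetch(id).dup
  end

  # Succeeds only if the caller's version matches the stored one,
  # then increments the version (the "ever-increasing number").
  def update(id, data, expected_version)
    row = @rows.fetch(id)
    if row[:lock_version] != expected_version
      raise StaleRecordError, "row #{id} was changed by someone else"
    end
    row[:data] = data
    row[:lock_version] += 1
    row.dup
  end
end
```

Usage: two readers load the same row, the first update succeeds, and the second is rejected because its copy is stale.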
I ended up using a mass trigger inside the database.
The trigger function creates (or updates) a record in a new table called data_changes, one per market.
def create_trigger_function(schema)
  puts "Creating trigger function for tenant schema #{schema}"
  # PL/pgSQL trigger function in the tenant's schema; dollar quoting replaces
  # the original single-quoted body, with the same behavior.
  sql = <<~SQL
    CREATE OR REPLACE FUNCTION "#{schema}".insert_into_delta_table() RETURNS TRIGGER AS $$
    BEGIN
      -- Flag existing change rows for this record as changed again.
      UPDATE "#{schema}".data_changes SET status = 1, created_at = now()
       WHERE table_name = TG_TABLE_NAME AND record_id = NEW.id;
      -- Insert a change row for every market that does not have one yet.
      INSERT INTO "#{schema}".data_changes (status, table_name, market_id, record_id, created_at)
        (SELECT m.* FROM "#{schema}".data_changes AS ds
           RIGHT OUTER JOIN
             (SELECT 1, CAST(TG_TABLE_NAME AS text) AS name, markets.id, NEW.id AS record_id, now()
                FROM "#{schema}".markets) AS m
             ON ds.record_id = m.record_id
            AND ds.market_id = m.id
            AND ds.table_name = m.name
          WHERE ds.id IS NULL);
      RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;
  SQL
  connection.execute(sql)
end
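The function above still has to be attached to each table with CREATE TRIGGER. Here is a minimal sketch that generates those statements for a list of tables. The trigger name track_data_changes and the table names are hypothetical. It assumes AFTER INSERT OR UPDATE row triggers, since the function reads NEW.id, which is not set on DELETE. Each string could then be passed to connection.execute.

```ruby
# Builds one idempotent DROP/CREATE TRIGGER pair per table (hypothetical trigger name).
def trigger_statements(schema, tables)
  tables.map do |table|
    <<~SQL
      DROP TRIGGER IF EXISTS track_data_changes ON "#{schema}"."#{table}";
      CREATE TRIGGER track_data_changes
        AFTER INSERT OR UPDATE ON "#{schema}"."#{table}"
        FOR EACH ROW EXECUTE PROCEDURE "#{schema}".insert_into_delta_table();
    SQL
  end
end
```

Generating the statements from one list keeps all 20 tables DRY.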
Now all I have to do to find all the changed "products" is
update data_changes set status = 2 where status = 1 and table_name = 'products'
select * from products where id in (select record_id from data_changes where status = 2 and table_name = 'products')
update data_changes set status = 3 where status = 2 and table_name = 'products'
If a product gets updated after my first update but before the select, it won't show up in my select, because the trigger resets its data_changes status to 1; it will be picked up on the next run.
If a product gets updated after my select but before the last update, it again won't be affected by the last update, since its status is back to 1.
The contents of my select will be out of date, but there's no real way of avoiding that.
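Here is a plain-Ruby sketch of the three-step status workflow (1 = changed, 2 = being processed, 3 = processed). It uses a made-up in-memory array in place of data_changes, with a single market for brevity. It shows why a product updated between the claim and the select is skipped in this run but stays flagged for the next one.

```ruby
# Hypothetical in-memory stand-in for data_changes.
changes = [
  { table_name: "products", record_id: 1, status: 1 },
  { table_name: "products", record_id: 2, status: 3 },
  { table_name: "products", record_id: 3, status: 1 },
]

# UPDATE data_changes SET status = to WHERE status = from AND table_name = table
def set_status(changes, table, from, to)
  changes.each { |c| c[:status] = to if c[:table_name] == table && c[:status] == from }
end

# What the trigger does when a record changes: flag it as changed again.
def touch(changes, table, record_id)
  changes.each { |c| c[:status] = 1 if c[:table_name] == table && c[:record_id] == record_id }
end

# SELECT record_id FROM data_changes WHERE status = 2 AND table_name = table
def claimed_ids(changes, table)
  changes.select { |c| c[:table_name] == table && c[:status] == 2 }.map { |c| c[:record_id] }
end

set_status(changes, "products", 1, 2)       # step 1: claim changed rows
touch(changes, "products", 3)               # product 3 is updated concurrently
selected = claimed_ids(changes, "products") # step 2: select the claimed rows
set_status(changes, "products", 2, 3)       # step 3: mark them processed
```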
