Rails loop through ActiveRecord::Associations::CollectionProxy - ruby-on-rails

I've got an ActiveRecord::Associations::CollectionProxy of the format
#<DailyViewMetric article_id: xxxxxx, date: xxxxxxx, views: xxxxxx, visitors: xxxxx, ...> in a variable called metrics.
The article_id is a foreign key and is repeated in the table, as one article can have metrics on consecutive days (e.g. 15 views today and 20 the day after).
I need a way to loop through this and apply different operations to some of these metrics, like getting the smallest date for each article and the total number of views (= sum(views)). I tried metrics.map do |a| and metrics.collect, but I always just get Ruby errors like undefined method 'views'. For example, let's go with the following simplified dataset:
article_id  date        views
1           2014-01-01  10
2           2014-01-01  15
1           2014-01-02  20
2           2014-01-02  12
3           2014-01-02  6
This should result in the following new array afterwards:
article_id  date        views
1           2014-01-01  30
2           2014-01-01  27
3           2014-01-02  6
As you can see, the views variable holds the sum of the views for that respective article, and the date variable is the minimum of the dates. How do I do this properly? I also tried metrics.to_a, but I still get this error.
EDIT
I tried
DailyViewMetric.find_by_sql("SELECT article_id, sum(views) FROM daily_view_metrics WHERE article_id IN (SELECT id FROM articles WHERE user_id=xxx) GROUP BY article_id")
which, if I execute the query in the MySQL console, works perfectly fine and returns the second table from up above. But when I run it in the Rails console, it gives me
[#<DailyViewMetric id: nil, article_id: 1089536>, #<DailyViewMetric id: nil, article_id: 1128849>, #<DailyViewMetric id: nil, article_id: 1141623>,

You can do this completely in SQL/ActiveRecord. The query you want to run ultimately is
SELECT article_id, min(date), sum(views)
FROM daily_view_metrics -- Or whatever your table is called
GROUP BY article_id
You can run this with ActiveRecord with the following:
table = DailyViewMetric.arel_table
results = DailyViewMetric.select(table[:article_id],
                                 table[:date].minimum.as('date'),
                                 table[:views].sum.as('views'))
                         .group(:article_id).to_a
# Calling to_a so I can call first for the example
results.first.date       #=> the minimum date
results.first.views      #=> the summed views
results.first.article_id #=> the article_id
The records will look like
[#<DailyViewMetric id: nil, article_id: 1089536>, ...]
because the SQL query does not return an id column in the result set. ActiveRecord::Base#inspect only shows the columns defined on the table, not extra values returned by the query, so views and date will not necessarily be shown unless there is a table column with the same name. But if you call those attributes on the model instance, you will get the values.
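If you only need the aggregated values rather than full model instances, ActiveRecord's grouped calculation methods return plain hashes keyed by article_id, which sidesteps the confusing inspect output entirely. A minimal sketch, assuming the model and columns from the question:
# Grouped calculations return a hash of article_id => aggregate.
views_by_article = DailyViewMetric.group(:article_id).sum(:views)
#=> {1=>30, 2=>27, 3=>6}
first_dates = DailyViewMetric.group(:article_id).minimum(:date)
#=> {1=>Wed, 01 Jan 2014, 2=>Wed, 01 Jan 2014, 3=>Thu, 02 Jan 2014}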

Related

Rails Query Table for all records that were created between a certain time of all days

I have a Rails app with a PostgreSQL database that allows users to add Posts.
For analytics purposes, I need to query records that were created between 2 and 4pm of every day and display that count.
I could get all results and iterate through them, but I don't think that is an efficient way of going about it, because I am expecting thousands of records each day.
Use PostgreSQL's EXTRACT for that. You can find more datetime functions in the PostgreSQL date/time functions documentation.
Your query should look like:
Post.where("EXTRACT(hour FROM created_at) BETWEEN 14 AND 16")
add a .count if you're interested in the count only.
Please keep timezones in mind: if you're saving datetimes in UTC in the database but you're looking for posts between 14 and 16 in the local timezone, you'll have to shift the hours a bit (or use AT TIME ZONE from the documentation mentioned above to account for DST).
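For example, a sketch of a timezone-aware version; the zone name 'America/New_York' is just a placeholder, substitute your own:
# Interpret the stored UTC timestamps in a local zone before extracting the hour.
Post.where("EXTRACT(hour FROM created_at AT TIME ZONE 'UTC' AT TIME ZONE 'America/New_York') BETWEEN 14 AND 16").count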
class Post < ApplicationRecord
  scope :todays_latest_posts, -> { where("(HOUR(created_at) BETWEEN ? AND ?) AND DATE(created_at) = ?", 14, 16, Date.today) }
end
To query records that were created between 2 and 4pm today and display them:
Post.todays_latest_posts
The generated SQL will be
SELECT `posts`.* FROM `posts` WHERE ((HOUR(created_at) BETWEEN 14 AND 16) AND DATE(created_at) = '2018-11-24') LIMIT 11
Output =>
=> #<ActiveRecord::Relation [#<Post id: 112, title: nil, ..., created_at: "2018-11-24 11:31:35", updated_at: "2018-11-24 11:31:36">, ...]>
Count =>
Post.todays_latest_posts.count
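Note that HOUR() and DATE() (and the backticks in the generated SQL) are MySQL syntax; since the question mentions PostgreSQL, here is a sketch of the same scope using EXTRACT and a date cast:
class Post < ApplicationRecord
  # PostgreSQL-compatible version of the scope above.
  scope :todays_latest_posts, -> {
    where("EXTRACT(hour FROM created_at) BETWEEN ? AND ? AND created_at::date = ?", 14, 16, Date.today)
  }
end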

Rails ActiveRecord: Missing column in grouping query

Key.select('products.name AS product, product_groups.name AS product_group, AVG(keys.cost) AS cost')
   .group('products.id, product_groups.id')
   .left_joins(:product, :product_group)
the result:
=> #<ActiveRecord::Relation [#<Key id: nil, cost: 0.6e1>, #<Key id: nil, cost: 0.4e1>]>
I expected 3 fields to be returned, but only 2 come back.
I found the solution: the extra select fields are present on each record, they just aren't shown as part of the console's inspect output.
In my understanding, a grouping statement only returns aggregated columns and the columns used to group the data set. In your case you have not used the grouping columns in the select list but some other fields; as a result, you don't see the other two columns.
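As in the CollectionProxy question above, the aliased fields are still readable on each record even though inspect hides them. A sketch, assuming the associations from the question:
keys = Key.select('products.name AS product, product_groups.name AS product_group, AVG(keys.cost) AS cost')
          .group('products.id, product_groups.id')
          .left_joins(:product, :product_group)
# inspect only lists Key's own table columns, but the aliases work:
keys.first.product        #=> the product name
keys.first.product_group  #=> the product group name
keys.first.cost           #=> the average cost (a BigDecimal)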

.distinct not returning unique entries

I have a table of Projects. In that table, the same project may appear multiple times with the same name, but with a different created_at month. I'm trying to select the most recent record of each project in my table. Using select works, however I need the entire record so that I can then loop through the records and print out different attributes, e.g. price or whatnot.
I've tried:
Project.distinct(:project_name) – prints all records (to check this, I copied the project name, did a find, and all projects with the identical name still printed out)
Project.order(project_name: :asc, created_at: :desc).uniq(:project_name) – same result as above
Project.select(:project_name).distinct – pulls only one of each, however it only selects the project name and no other data from the record.
This is a case where PostgreSQL's DISTINCT ON comes to the rescue.
This should work:
Project.select("DISTINCT ON (project_name) *").order("project_name, created_at DESC")
For selecting only particular columns, specify them instead of *.
Project.select("DISTINCT ON (project_name) project_name, created_at, price").order("project_name, created_at DESC")

What's the best way to get the nth-to-last record?

I saw here: How to get last N records with activerecord? that the best way to get the last 5 records is SomeModel.last(5).
Is the best way to get the 5th-to-last record SomeModel.last(5).first, or is it something else?
Thanks in advance!
What you're looking for is a combination of LIMIT, OFFSET, and ensuring that the query is using ORDER BY as a constraint. This is described on the PostgreSQL - Queries: Limit and Offset documentation page.
An example is:
irb> SomeModel.count
=> 35
irb> SomeModel.order(:id).reverse_order.limit(1).offset(4)
# SomeModel Load (0.7ms) SELECT "some_models".* FROM "some_models" ORDER BY "some_models".id DESC LIMIT 1 OFFSET 4
=> #<ActiveRecord::Relation [#<SomeModel id: 31, name: "hello world", created_at: "2014-03-24 21:52:46", updated_at: "2014-03-24 21:52:46">]>
Which yields the same result (albeit as an AR::Relation) as:
irb> SomeModel.last(5).first
# SELECT "some_models".* FROM "some_models" ORDER BY "some_models"."id" DESC LIMIT 5
=> #<SomeModel id: 31, name: "hello world", created_at: "2014-03-24 21:52:46", updated_at: "2014-03-24 21:52:46">
The difference is that in the former query, PostgreSQL applies the ORDER BY, LIMIT, and OFFSET itself and returns only the one row you asked for. In the latter, PostgreSQL returns 5 rows and you restrict them in Ruby by calling #first.
You can use offset with a query in the reverse order. For example, the fifth last created record can be retrieved with:
SomeModel.order("created_at DESC").offset(4).first
The simplest way to get the 5th-to-last record is a combination of a descending sort and fifth:
SomeModel.order("created_at DESC").fifth

Keep historical database relations integrity when data changes

I hesitate between various alternatives when it comes to relations that have "historical" value.
For example, let's say a User has bought an item at a certain date... if I just store this the classic way, like:
transaction_id: 1
user_id: 2
item_id: 3
created_at: 01/02/2010
then obviously the user might change their name, the item might change its price, and 3 years later when I try to create a report of what happened I have false data.
I have two alternatives:
keep it simple as shown earlier, but use something like https://github.com/airblade/paper_trail and do something like:
t = Transaction.find(1)
u = t.user.version_at(t.created_at)
create tables like transaction_users and transaction_items and copy the users/items into them when a transaction is made. The structure would then become:
transaction_id: 1
transaction_user_id: 2
transaction_item_id: 3
created_at: 01/02/2010
Both approaches have their merits, though solution 1 looks much simpler... Do you see a problem with solution 1? How is this "historical data" problem usually solved? I have to solve this for 2-3 models like this in my project, so what do you reckon would be the best solution?
Taking the example of the item price, you could also:
store a copy of the price at the time in the transaction table, or
create a temporal table for item prices.
Storing a copy of the price in the transaction table:
TABLE Transaction(
user_id -- User buying the item
,trans_date -- Date of transaction
,item_no -- The item
,item_price -- A copy of Price from the Item table as-of trans_date
)
Getting the price as of the time of transaction is then simply:
select item_price
from transaction;
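In Rails, copying the price at purchase time could be sketched with a callback (the model and column names are hypothetical, following the table above):
class Transaction < ApplicationRecord
  belongs_to :user
  belongs_to :item
  # Snapshot the item's current price when the transaction is created,
  # so later price changes don't rewrite history.
  before_create { self.item_price ||= item.price }
end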
Creating a temporal table for item prices:
TABLE item (
item_no
,etcetera -- All other information about the item, such as name, color
,PRIMARY KEY(item_no)
)
TABLE item_price(
item_no
,from_date
,price
,PRIMARY KEY(item_no, from_date)
,FOREIGN KEY(item_no)
REFERENCES item(item_no)
)
The data in the second table would look something like:
ITEM_NO FROM_DATE PRICE
======= ========== =====
A 2010-01-01 100
A 2011-01-01 90
A 2012-01-01 50
B 2013-03-01 60
This says that from the first of January 2010 the price of item A was 100. It changed on the first of January 2011 to 90, and then again to 50 from the first of January 2012.
You will most likely add a TO_DATE to the table, even though it's a denormalization (each row's TO_DATE is the next row's FROM_DATE).
Finding the price as of the transaction would be something along the lines of:
select t.item_no
,t.trans_date
,p.price
from transaction t
join item_price p on(
t.item_no = p.item_no
and t.trans_date between p.from_date and p.to_date
);
ITEM_NO TRANS_DATE PRICE
======= ========== =====
A 2010-12-31 100
A 2011-01-01 90
A 2011-05-01 90
A 2012-01-01 50
A 2012-05-01 50
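In ActiveRecord terms, the temporal lookup could be sketched like this (model and column names are hypothetical, following the tables above, and a to_date column is assumed):
class Transaction < ApplicationRecord
  # Price of the item as of this transaction's date, read from the
  # item_prices temporal table.
  def item_price_at_purchase
    ItemPrice.where(item_no: item_no)
             .where("? BETWEEN from_date AND to_date", trans_date)
             .pluck(:price)
             .first
  end
end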
I went with PaperTrail; it keeps the history of all my models, even their destruction. I can always switch to option 2 later on if it doesn't scale.
