Rails: Find biggest number out of val.size

user = SkillUser.find_all_by_skill_id(skill_id)
user.size
gives me: 1 2 2 1 3 1 3 1 3 2 1 1 3
How can I get the biggest value (in this case 3) out of this row of numbers?
Thanks for the help

You can use the maximum calculation on your ActiveRecord::Relation:
SkillUser.maximum(:rating)
if you want the maximum of an attribute called rating.
If you want to count the number of users per skill id, try:
SkillUser.count(:group => :skill_id).max_by { |skill_id,count| count }
This gives you both the skill_id and the number of users for the skill with the most users.
For a more efficient way (by doing the whole calculation in SQL), try:
SkillUser.limit(1).reverse_order.count(:group => :skill_id, :order => :count)
# Giving the SQL:
# => SELECT COUNT(*) AS count_all, "skill_users"."skill_id" AS skill_id
# FROM "skill_users" GROUP BY "skill_users"."skill_id"
# ORDER BY "skill_users"."count" DESC LIMIT 1
Be aware that count must be called last because it does not return an ActiveRecord::Relation for you to scope further.
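Note that the examples above use Rails 3 syntax; on newer Rails versions the :group and :order options to count are gone. A hedged modern equivalent of the per-skill count, assuming the same SkillUser model:
# Count rows per skill_id in SQL, then pick the pair with the largest count in Ruby.
# Returns a [skill_id, count] pair, e.g. [3, 5].
SkillUser.group(:skill_id).count.max_by { |_skill_id, count| count }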

You should use ActiveRecord::Calculations for performance reasons:
http://ar.rubyonrails.org/classes/ActiveRecord/Calculations/ClassMethods.html
1.9.3-194 (main):0 > User.maximum(:id)
(1.6ms) SELECT MAX("users"."id") AS max_id FROM "users"
=> 3
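The performance point is that the maximum is computed inside the database, so only one number crosses the wire. A minimal sketch of the contrast, reusing the User model from the console session above:
User.maximum(:id)    # SELECT MAX("users"."id") in SQL; only the result comes back
User.pluck(:id).max  # pulls every id into Ruby, then scans the whole array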

The fastest way to find a single maximum value in an unsorted list of integers is to scan the list from left to right and remember the largest value seen so far.
If you sort the list first, you get the additional benefit of easily finding the 2nd, 3rd, etc. largest values as well.
If you take one of the "maximum" methods hidden in Ruby ... you should check what the implementors are doing to pick the max and compare it to 1. and 2. above :-)
Explanations:
Regarding 1: Done this way, you visit each value in the list exactly once and compare it once to the maximum so far.
Regarding 2: Sorting costs O(n*log n) operations on average for a list with n entries. That is obviously more than the O(n) of solution 1, but you get a bit more for it: the runners-up come for free.
Regarding 3: Well, I prefer knowing what happens, but your preferences may vary.
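A minimal sketch of both approaches in Ruby, using the numbers from the question (Enumerable#max does essentially the same single pass as scan_max internally):
# Approach 1: one pass, remember the largest value seen so far -- O(n)
def scan_max(numbers)
  max_so_far = numbers.first                         # assumes a non-empty list
  numbers.each { |n| max_so_far = n if n > max_so_far }
  max_so_far
end

numbers = [1, 2, 2, 1, 3, 1, 3, 1, 3, 2, 1, 1, 3]
scan_max(numbers)  # => 3

# Approach 2: sort first -- O(n log n), but the runners-up come for free
sorted = numbers.sort
sorted[-1]         # => 3 (largest)
sorted[-2]         # => 3 (second largest; duplicates count here)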

Related

Duplicates in the result of a subquery

I am trying to count distinct sessionIds from a measurement. Since sessionId is a tag and distinct() doesn't work on tags, I count the distinct entries in a "parent" query.
In the subquery, I use group by sessionId limit 1 to still benefit from the index (if there is a more efficient technique, I'm all ears, but I'd still like to understand what's going on here).
I have these two variants:
> select count(distinct(sessionId)) from (select * from UserSession group by sessionId limit 1)
name: UserSession
time count
---- -----
0 3757
> select count(sessionId) from (select * from UserSession group by sessionId limit 1)
name: UserSession
time count
---- -----
0 4206
To my understanding, those should return the same number, since group by sessionId limit 1 already returns distinct sessionIds (in the form of groups).
And indeed, if I execute:
select * from UserSession group by sessionId limit 1
I have 3757 results (groups), not 4206.
In fact, as soon as I put this in a subquery and re-select fields in a parent query, some sessionIds appear multiple times in the final result. Not all of them (there are 17,549 rows in total), but some are.
This is a sign that the limit 1 is somewhat working, but some sessionIds still get multiple entries when re-selected. Maybe some kind of undefined behaviour?
I can confirm that I get the same result.
In my experience, nested queries do not always deliver what you expect/want.
Depending on how you use this, you could retrieve a list of all values for a tag with:
SHOW TAG VALUES FROM UserSession WITH KEY = sessionId
Or, to get the cardinality (the number of distinct values for a tag):
SHOW TAG VALUES EXACT CARDINALITY FROM UserSession WITH KEY = sessionId
This returns a single row with a single column, count, containing the number. You can drop the EXACT modifier if the result does not need to be exact; see SHOW TAG VALUES CARDINALITY in the InfluxDB documentation.

Better method to find the second largest element in ruby on rails active record query

I am using this query to find the 2nd largest element, querying on the value column:
Booking.where("value < ?", Booking.maximum(:value)).last
Is there a better query than this, or an alternative?
PS: value is not unique; there could be two elements with the same value.
This should work.
Booking.select("DISTINCT value").order('value DESC').offset(1).limit(1)
Which will generate this query:
SELECT DISTINCT value FROM "bookings" ORDER BY value DESC LIMIT 1 OFFSET 1
You can use offset and last:
Booking.order(:value).offset(1).last
which will produce the following SQL statement:
SELECT `bookings`.* FROM `bookings`
ORDER BY `bookings`.`value` DESC
LIMIT 1 OFFSET 1
(last reverses the relation's order, which is why the SQL is DESC). Note that since value is not unique, this returns the second row rather than the second distinct value; prefer the DISTINCT variant above if ties matter.
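Generalizing the DISTINCT variant, a hedged sketch for the nth largest distinct value (n is a local variable introduced here for illustration):
n = 2 # rank we want: 2 = second largest
# Sort the distinct values descending and skip the first n - 1 of them
Booking.select(:value).distinct.order(value: :desc).offset(n - 1).first&.value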

ActiveRecord query searching for duplicates on a column, but returning associated records

So here's the lay of the land:
I have an Applicant model which has_many Lead records.
I need to group leads by applicant email, i.e. for each specific applicant email (there may be 2+ applicant records with the same email) I need to get a combined list of leads.
I already have this working as an in-memory / N+1 solution.
I want to do this in a single query, if possible. Right now I'm running one query for each lead, which is maxing out the CPU.
Here's my attempt right now:
Lead.
  all.
  select("leads.*, applicants.*").
  joins(:applicant).
  group("applicants.email").
  having("count(*) > 1").
  limit(1).
  to_a
And the error:
Lead Load (1.2ms)  SELECT leads.*, applicants.* FROM "leads" INNER JOIN "applicants" ON "applicants"."id" = "leads"."applicant_id" GROUP BY applicants.email HAVING count(*) > 1 LIMIT 1
ActiveRecord::StatementInvalid: PG::GroupingError: ERROR: column "leads.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT leads.*, applicants.* FROM "leads" INNER JOIN "appli...
This is a Postgres-specific issue: the selected fields must appear in the GROUP BY clause or be used in an aggregate function.
You can try this:
Lead.joins(:applicant)
    .select('leads.*, applicants.email')
    .group('applicants.email, leads.id, ...')
(the method is group, not group_by, which is Enumerable's in-memory method). You will need to list all the fields of the leads table in the GROUP BY clause (or all the fields that you are selecting).
I would just get all the records and do the grouping in memory. If you have a lot of records, I would paginate or batch them.
# email => combined list of leads across all applicants sharing that email
group_by_email = Hash.new { |h, k| h[k] = [] }
Applicant.eager_load(:leads).find_in_batches(batch_size: 10_000) do |batch|
  batch.each do |applicant|
    group_by_email[applicant.email].concat(applicant.leads)
  end
end
You need to use a .where rather than Lead.all. The reason it is maxing out the CPU is that you are trying to load every lead into memory at once. That said, I am still missing what you actually want back from the query, so it would be tough for me to help you write it. Can you give more info about your associations and the expected result of the query?
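For what it's worth, a hedged sketch of a single-query direction: find the duplicated emails with a grouped subquery, then fetch every lead whose applicant has one of those emails (duplicate_emails is a name introduced here for illustration):
# Emails that belong to more than one applicant record
duplicate_emails = Applicant.group(:email).having("COUNT(*) > 1").select(:email)

# One query: ActiveRecord turns the relation into an IN (SELECT email ...) subselect
leads = Lead.joins(:applicant).where(applicants: { email: duplicate_emails })

# Group in Ruby afterwards if a per-email hash is needed
leads.includes(:applicant).group_by { |lead| lead.applicant.email }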

Order By String Property Value and Pagination in Neo4j

I am using Neo4j to create a social network application. The data model has a FRIEND relationship between two USER nodes. I need to get all of my friends ordered by displayName (unique, indexed).
I need pagination for this query: I will send the last name from the previous query's results, and I want to limit each page to 20 names.
MATCH (u:USER{displayName:{id}})-[:FRIEND]-(f:USER)
RETURN f
ORDER BY f.displayName
LIMIT 20;
What is the best way to do this? Will SKIP work here, sending SKIP 0, SKIP 1*20, SKIP 2*20, and so on?
You can use the query in this way, I think:
ORDER BY f.displayName LIMIT START_POSITION , LAST_POSITION;
For example:
ORDER BY f.displayName LIMIT 0 , 20;
ORDER BY f.displayName LIMIT 21 , 40;
Yes, you can use the SKIP clause to do what you want. In the following, I assume that you provide the page value (starting at 0) as a parameter.
MATCH (u:USER{displayName:{id}})-[:FRIEND]-(f:USER)
RETURN f
ORDER BY f.displayName
SKIP {page} * 20
LIMIT 20;
Note that this technique is not foolproof if the list of friends can change during paging.
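Since the question already mentions sending the last displayName from the previous page, a hedged sketch of keyset pagination is also possible (the parameter name lastName is assumed here; this relies on displayName being unique, as stated above, and avoids both large SKIP costs and the change-during-paging issue):
MATCH (u:USER {displayName: {id}})-[:FRIEND]-(f:USER)
WHERE f.displayName > {lastName}
RETURN f
ORDER BY f.displayName
LIMIT 20;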

How to select data for defined page and total count of records?

I have a table with paginated data and this is the way I select data for each page:
@visitors = EventsVisitor
  .select('visitors.*, events_visitors.checked_in, events_visitors.checkin_date, events_visitors.source, events_visitors.id AS ticket_id')
  .joins(:visitor)
  .order(order)
  .where(:event_id => params[:event_id])
  .where(filter_search)
  .where(mode)
  .limit(limit)
  .offset(offset)
Also to build table pagination I need to know total count of records. Currently my solution for this is very rough:
total = EventsVisitor
  .select('count(*) as count, events_visitors.*')
  .joins(:visitor)
  .order(order)
  .where(:event_id => params[:event_id])
  .where(filter_search)
  .where(mode)
  .first()
  .count
So my question is as follows: what is the optimal Ruby way to select the limited data for the current page and the total count of records?
I noticed that if I do @visitors.count, an additional SQL query is generated:
SELECT COUNT(count_column) FROM (SELECT 1 AS count_column FROM `events_visitors` INNER JOIN `visitors` ON `visitors`.`id` = `events_visitors`.`visitor_id` WHERE `events_visitors`.`event_id` = 1 LIMIT 15 OFFSET 0) subquery_for_count
First of all, I do not understand the reason for sending an additional query to get a count of data we already have; after we have fetched the data into @visitors, we can count it in Ruby without another round trip to the DB.
Second, I thought there might be something like a .total_count that would generate a similar count(*) query but without the useless limit/offset.
You should use except to strip the limit and offset:
http://guides.rubyonrails.org/active_record_querying.html#except
See how Kaminari does it:
https://github.com/kaminari/kaminari/blob/92052eedf047d65df71cc0021a9df9df1e2fc36e/lib/kaminari/models/active_record_relation_methods.rb#L11
So it might be something like:
total = @visitors.except(:offset, :limit, :order).count
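Putting it together, a minimal sketch under the assumption that the filtered relation is built once and reused (names taken from the question; the except(:select) is there because count can choke on a multi-column custom select):
# Build the filtered relation once, without ordering or pagination
base = EventsVisitor
  .select('visitors.*, events_visitors.checked_in, events_visitors.checkin_date, events_visitors.source, events_visitors.id AS ticket_id')
  .joins(:visitor)
  .where(:event_id => params[:event_id])
  .where(filter_search)
  .where(mode)

total     = base.except(:select).count                      # one COUNT(*) query
@visitors = base.order(order).limit(limit).offset(offset)   # one page of rows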
