How to use join with sort in Solr? - join

I'm trying to sort documents of type 'Case' by the 'Name' of the 'Contact' they belong to in Solr. But cases have no 'ContactName' field or similar, only 'ContactId'.
Only examples I could find are iterations of the example on this link: https://wiki.apache.org/solr/Join
But I couldn't apply it to my situation because of the sorting afterwards. The following gives me the cases I want but I can't sort it by the contact name afterwards because it only returns the fields of the cases.
{!join from=Id to=ContactId}*:*
SQL equivalent of what I want would be something like:
SELECT Case.Id, Contact.Name
FROM Case
LEFT JOIN Contact
ON Case.ContactId = Contact.Id
ORDER BY Contact.Name ASC;

So to answer my own question after some digging and a Solr training:
It is not best practice to use joins in a NoSql database like Solr. If you need joins, then your database is structured wrong. You should index everything you need, in the document itself, even if it is redundant. So in my case, I should index 'Contact.Name' field in my 'Case' documents.
Still, it is apparently possible to use SQL queries in Solr in case it is absolutely needed, if you're using SolrCloud but it doesn't support joins. However it is possible to work around that as follows:
SELECT s1.Id
FROM salesforce s1, salesforce s2
WHERE s1._type_ = 'Case' and s2._type_ = 'Contact' AND s1.ContactId = s2.Id
ORDER BY s2.Name ASC
It should be noted that the fields after '.' like the 'Id' in 's1.Id' must have 'docValues' activated in the schema. More info on docValues is here.

Related

Need help writing sql query in rails so I get ActiveRelation

I need an active record relation that gives me the latest record of a region, city, bed combination. I have the sql query written as below, but I need to figure out if there is away to use a different approach to have it return an active record relation and not an array. Any suggestions?
Current query:
#current_ltm_market_stats = LtmStatsByBedCount.find_by_sql(" SELECT *
FROM ltm_stats_by_bed_counts lstats
WHERE lstats.city_id = '#{#city_id}'
#{#region_id_condition}
AND (lstats.beds,lstats.city_id,lstats.region_id,lstats.reporting_date)
IN (SELECT lstats.beds,
lstats.city_id,
lstats.region_id,
max(lstats.reporting_date)
FROM ltm_stats_by_bed_counts lstats
WHERE lstats.city_id = '#{#city_id}'
#{#region_id_condition}
GROUP BY city_id, region_id, beds)
ORDER BY lstats.year DESC,lstats.month DESC")
I had tried this before which did result in a relation but it runs really slowly and the result is not exactly the same. Are there any better rails ways to do this?
#all_ltm_market_stats = LtmStatsByBedCount.where(city_id: #market.city_id, region_id: #market.region_id)
#current_ltm_market_stats = #latest_year_ltm_market_stats.where(month: #latest_year_ltm_market_stats.all_ltm_market_stats.select('Max(year)'))
Information in the question is incomplete, so i might have to update my answer when additional details are added, But here is the initial draft with available information:
#current_ltm_market_stats = LtmStatsByBedCount.
where(city_id: #city_id).
where(#region_id_condition).
where("(beds, city_id, region_id, reporting_date) IN (
SELECT lstats.beds,
lstats.city_id,
lstats.region_id,
max(lstats.reporting_date)
FROM ltm_stats_by_bed_counts lstats
WHERE lstats.city_id = '#{#city_id}'
#{#region_id_condition}
GROUP BY city_id, region_id, beds)").
order(year: :desc, month: :desc)
Note that you might have to adjust your #region_id_condition a bit for this to work.
Theoretically it is equivalent of your SQL version(which means it will generate same sql excluding table alias) and returns the AR relation object. Which is the only requirement in the question. Obviously SQL might be improved with additional information as well.
Additionally, you will want to have carefully crafted indexes on this table if you are going to use this query on larger datasets frequently.

Is there anyway to make a lesser impact on my database with this request?

For the analytics of my site, I'm required to extract the 4 states of my users.
#members = list.members.where(enterprise_registration_id: registration.id)
# This pulls roughly 10,0000 records.. Which is evidently a huge data pull for Rails
# Member Load (155.5ms)
#invited = #members.where("user_id is null")
# Member Load (21.6ms)
#not_started = #members.where("enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", #sections.map(&:id) )
# Member Load (82.9ms)
#in_progress = #members.joins(:quizzes).where('quizzes.section_id IN (?) and (quizzes.completed is null or quizzes.completed = ?)', #sections.map(&:id), false).group("enterprise_members.id HAVING count(quizzes.id) > 0")
# Member Load (28.5ms)
#completes = Quiz.where(enterprise_member_id: registration.members, section_id: #sections.map(&:id)).completed
# Quiz Load (138.9ms)
The operation returns a 503 meaning my app gives up on the request. Any ideas how I can refactor this code to run faster? Maybe by better joins syntax? I'm curious how sites with larger datasets accomplish what seems like such trivial DB calls.
The answer is your indexes. Check your rails logs (or check the console in development mode) and copy the queries to your db tool. Slap an "Explain" in front of the query and it will give you a breakdown. From here you can see what indexes you need to optimize the query.
For a quick pass, you should at least have these in your schema,
enterprise_members: needs an index on enterprise_member_id
members: user_id
quizes: section_id
As someone else posted definitely look into adding indexes if needed. Some of how to refactor depends on what exactly you are trying to do with all these records. For the #members query, what are you using the #members records for? Do you really need to retrieve all attributes for every member record? If you are not using every attribute, I suggest only getting the attributes that you actually use for something, .pluck usage could be warranted. 3rd and 4th queries, look fishy. I assume you've run the queries in a console? Again not sure what the queries are being used for but I'll toss in that it is often useful to write raw sql first and query on the db first. Then, you can apply your findings to rewriting activerecord queries.
What is the .completed tagged on the end? Is it supposed to be there? only thing I found close in the rails api is .completed? If it is a custom method definitely look into it. You potentially also have an use case for scopes.
THIRD QUERY:
I unfortunately don't know ruby on rails, but from a postgresql perspective, changing your "not in" to a left outer join should make it a little faster:
Your code:
enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", #sections.map(&:id) )
Better version (in SQL):
select blah
from enterprise_members em
left outer join quizzes q on q.enterprise_member_id = em.id
join users u on u.id = q.enterprise_member_id
where quizzes.section_id in (?)
and q.enterprise_member_id is null
Based on my understanding this will allow postgres to sort both the enterprise_members table and the quizzes and do a hash join. This is better than when it will do now. Right now it finds everything in the quizzes subquery, brings it into memory, and then tries to match it to enterprise_members.
FIRST QUERY:
You could also create a partial index on user_id for your first query. This will be especially good if there are a relatively small number of user_ids that are null in a large table. Partial index creation:
CREATE INDEX user_id_null_ix ON enterprise_members (user_id)
WHERE (user_id is null);
Anytime you query enterprise_members with something that matches the index's where clause, the partial index can be used and quickly limit the rows returned. See http://www.postgresql.org/docs/9.4/static/indexes-partial.html for more info.
Thanks everyone for your ideas. I basically did what everyone said. I added indexes, resorted how I called everything, but the major difference was using the pluck method.. Here's my new stats :
#alt_members = list.members.pluck :id # 23ms
if list.course.sections.tests.present? && #sections = list.course.sections.tests
#quiz_member_ids = Quiz.where(section_id: #sections.map(&:id)).pluck(:enterprise_member_id) # 8.5ms
#invited = list.members.count('user_id is null') # 12.5ms
#not_started = ( #alt_members - ( #alt_members & #quiz_member_ids ).count #0ms
#in_progress = ( #alt_members & #quiz_member_ids ).count # 0ms
#completes = ( #alt_members & Quiz.where(section_id: #sections.map(&:id), completed: true).pluck(:enterprise_member_id) ).count # 9.7ms
#question_count = Quiz.where(section_id: #sections.map(&:id), completed: true).limit(5).map{|quiz|quiz.answers.count}.max # 3.5ms

Return duplicate records (activerecord, postgres)

I have the following query returning duplicate titles, but :id is nil:
Movie.select(:title).group(:title).having("count(*) > 1")
[#<Movie:0x007f81f7111c20 id: nil, title: "Fargo">,
#<Movie:0x007f81f7111ab8 id: nil, title: "Children of Men">,
#<Movie:0x007f81f7111950 id: nil, title: "The Martian">,
#<Movie:0x007f81f71117e8 id: nil, title: "Gravity">]
I tried adding :id to the select and group but it returns an empty array. How can I return the whole movie record, not just the titles?
A SQL-y Way
First, let's just solve the problem in SQL, so that the Rails-specific syntax doesn't trick us.
This SO question is a pretty clear parallel: Finding duplicate values in a SQL Table
The answer from KM (second from the top, non-checkmarked, at the moment) meets your criteria of returning all duplicated records along with their IDs. I've modified KM's SQL to match your table...
SELECT
m.id, m.title
FROM
movies m
INNER JOIN (
SELECT
title, COUNT(*) AS CountOf
FROM
movies
GROUP BY
title
HAVING COUNT(*)>1
) dupes
ON
m.title=dupes.title
The portion inside the INNER JOIN ( ) is essentially what you've generated already. A grouped table of duplicated titles and counts. The trick is JOINing it to the unmodified movies table, which will exclude any movies that don't have matches in the query of dupes.
Why is this so hard to generate in Rails? The trickiest part is that, because we're JOINing movies to movies, we have to create table aliases (m and dupes in my query above).
Sadly, it Rails doesn't provide any clean ways of declaring these aliases. Some references:
Rails GitHub issues mentioning "join" and "alias". Misery.
SO Question: ActiveRecord query with alias'd table names
Fortunately, since we've got the SQL in-hand, we can use the .find_by_sql method...
Movie.find_by_sql("SELECT m.id, m.title FROM movies m INNER JOIN (SELECT title, COUNT(*) FROM movies GROUP BY title HAVING COUNT(*)>1) dupes ON m.first=.first")
Because we're calling Movie.find_by_sql, ActiveRecord assumes our hand-written SQL can be bundled into Movie objects. It doesn't massage or generate anything, which lets us do our aliases.
This approach has its shortcomings. It returns an array and not an ActiveRecord Relation, which means it can't be chained with other scopes. And, in the documentation for the find_by_sql method, we get extra discouragement...
This should be a last resort because using, for example, MySQL specific terms will lock you to using that particular database engine or require you to change your call if you switch engines.
A Rails-y Way
Really, what is the SQL doing above? It's getting a list of names that appear more than once. Then, it's matching that list against the original table. So, let's just do that using Rails.
titles_with_multiple = Movie.group(:title).having("count(title) > 1").count.keys
Movie.where(title: titles_with_multiple)
We call .keys because the first query returns an hash. The keys are our titles. The where() method can take an array, and we've handed it an array of titles. Winner.
You could argue one line of Ruby is more elegant than two. And if that one line of Ruby has an ungodly string of SQL embedded within it, how elegant is it really?
Hope this helps!
You can try to add id in your select:
Movie.select([:id, :title]).group(:title).having("count(title) > 1")

executing query from ruby on rails the right way

I'm just beginning with ruby on rails and have a question regarding a bit more complex query. So far I've done simple queries while looking at rails guide and it worked really well.
Right now I'm trying to get some Ids from database and I would use those Ids to get the real objects and do something with them. Getting those is a bit more complex than simple Object.find method.
Here is how my query looks like :
select * from quotas q, requests r
where q.id=r.quota_id
and q.status=3
and r.text is not null
and q.id in
(
select A.id from (
select max(id) as id, name
from quotas
group by name) A
)
order by q.created_at desc
limit 1000;
This would give me 1000 ids when executing this query from sql manager. And I was thinking to obtain the list of ids first and then find objects by id.
Is there a way to get these objects directly by using this query? Avoiding ids lookup? I googled that you can execute query like this :
ActiveRecord::Base.connection.execute(query);
Assuming Quota has_many :requests,
Quota.includes(:requests).
where(status:3).
where('requests.text is not null').
where("quotas.id in (#{subquery_string_here})").
order('quotas.created_at desc').limit(1000)
I'm by no means an expert but most basic SQL functionality is baked into ActiveRecord. You might also want to look at the #group and #pluck methods for ways to eliminate the ugly string subquery.
Calling #to_sql on a relationship object will show you the SQL command it is equivalent to, and may help with your debugging.
I would use find_by_sql for this. I wouldn't swear that this is exactly right, but as I recall you can pretty much plonk an SQL statement into a find_by_sql and the resulting columns will be returned as attributes of an array of objects of the class you call it on:
status = 3
Quota.find_by_sql('
select *
from quotas q, requests r
where q.id=r.quota_id
and q.status= ?
and r.text is not null
and q.id in
(
select A.id from (
select max(id) as id, name
from quotas
group by name) A
)
order by q.created_at desc
limit 1000;', status)
If you come to Rails as someone used to writing raw SQL, you're probably better off using this syntax than stringing together a bunch of ActiveRecord methods - the result is the same, so it's just a matter of what you find more readable.
Btw, you shouldn't use string interpolation (i.e. #{variable} syntax) inside an SQL query. Use the '?' syntax instead (see my example) to avoid SQL injection potential.

rails select and include

Can anyone explain this?
Project.includes([:user, :company])
This executes 3 queries, one to fetch projects, one to fetch users for those projects and one to fetch companies.
Project.select("name").includes([:user, :company])
This executes 3 queries, and completely ignores the select bit.
Project.select("user.name").includes([:user, :company])
This executes 1 query with proper left joins. And still completely ignores the select.
It would seem to me that rails ignores select with includes. Ok fine, but why when I put a related model in select does it switch from issuing 3 queries to issuing 1 query?
Note that the 1 query is what I want, I just can't imagine this is the right way to get it nor why it works, but I'm not sure how else to get the results in one query (.joins seems to only use INNER JOIN which I do not in fact want, and when I manually specifcy the join conditions to .joins the search gem we're using freaks out as it tries to re-add joins with the same name).
I had the same problem with select and includes.
For eager loading of associated models I used native Rails scope 'preload' http://apidock.com/rails/ActiveRecord/QueryMethods/preload
It provides eager load without skipping of 'select' at scopes chain.
I found it here https://github.com/rails/rails/pull/2303#issuecomment-3889821
Hope this tip will be helpful for someone as it was helpful for me.
Allright so here's what I came up with...
.joins("LEFT JOIN companies companies2 ON companies2.id = projects.company_id LEFT JOIN project_types project_types2 ON project_types2.id = projects.project_type_id LEFT JOIN users users2 ON users2.id = projects.user_id") \
.select("six, fields, I, want")
Works, pain in the butt but it gets me just the data I need in one query. The only lousy part is I have to give everything a model2 alias since we're using meta_search, which seems to not be able to figure out that a table is already joined when you specify your own join conditions.
Rails has always ignored the select argument(s) when using include or includes. If you want to use your select argument then use joins instead.
You might be having a problem with the query gem you're talking about but you can also include sql fragments using the joins method.
Project.select("name").joins(['some sql fragement for users', 'left join companies c on c.id = projects.company_id'])
I don't know your schema so i'd have to guess at the exact relationships but this should get you started.
I might be totally missing something here but select and include are not a part of ActiveRecord. The usual way to do what you're trying to do is like this:
Project.find(:all, :select => "users.name", :include => [:user, :company], :joins => "LEFT JOIN users on projects.user_id = users.id")
Take a look at the api documentation for more examples. Occasionally I've had to go manual and use find_by_sql:
Project.find_by_sql("select users.name from projects left join users on projects.user_id = users.id")
Hopefully this will point you in the right direction.
I wanted that functionality myself,so please use it.
Include this method in your class
#ACCEPTS args in string format "ASSOCIATION_NAME:COLUMN_NAME-COLUMN_NAME"
def self.includes_with_select(*m)
association_arr = []
m.each do |part|
parts = part.split(':')
association = parts[0].to_sym
select_columns = parts[1].split('-')
association_macro = (self.reflect_on_association(association).macro)
association_arr << association.to_sym
class_name = self.reflect_on_association(association).class_name
self.send(association_macro, association, -> {select *select_columns}, class_name: "#{class_name.to_sym}")
end
self.includes(*association_arr)
end
And you will be able to call like: Contract.includes_with_select('user:id-name-status', 'confirmation:confirmed-id'), and it will select those specified columns.
The preload solution doesn't seem to do the same JOINs as eager_load and includes, so to get the best of all worlds I also wrote my own, and released it as a part of a data-related gem I maintain, The Brick.
By overriding ActiveRecord::Associations::JoinDependency.apply_column_aliases() like this then when you add a .select(...) then it can act as a filter to choose which column aliases get built out.
With gem 'brick' loaded, in order to enable this selective behaviour, add the special column name :_brick_eager_load as the first entry in your .select(...), which turns on the filtering of columns while the aliases are being built out. Here's an example:
Employee.includes(orders: :order_details)
.references(orders: :order_details)
.select(:_brick_eager_load,
'employees.first_name', 'orders.order_date', 'order_details.product_id')
Because foreign keys are essential to have everything be properly associated, they are automatically added, so you do not need to include them in your select list.
Hope it can save you both query time and some RAM!

Resources