There is some code in the project I'm working on where a dynamic finder behaves differently in one code branch than it does in another.
This line of code returns all my advertisers (there are 8 of them), regardless of which branch I'm in.
Advertiser.findAllByOwner(ownerInstance)
But when I start adding conditions, things get weird. In branch A, the following code returns all of my advertisers:
Advertiser.findAllByOwner(ownerInstance, [max: 25])
But in branch B, that code only returns 1 advertiser.
It doesn't seem possible that changes in application code could affect how a dynamic finder works. Am I wrong? Is there anything else that might cause this not to work?
Edit
I've been asked to post my class definitions. Instead of posting all of it, I'm going to post what I think is the important part:
static mapping = {
owner fetch: 'join'
category fetch: 'join'
subcategory fetch: 'join'
}
static fetchMode = [
grades: 'eager',
advertiserKeywords: 'eager',
advertiserConnections: 'eager'
]
This code was present in branch B but absent from branch A. When I pulled it out, things worked as expected.
I decided to do some more digging with this code present to see what I could observe. I found something interesting when I used withCriteria instead of the dynamic finder:
Advertiser.withCriteria{owner{idEq(ownerInstance.id)}}
What I found was that this returned thousands of duplicates! So I tried using listDistinct:
Advertiser.createCriteria().listDistinct{owner{idEq(ownerInstance.id)}}
Now this returns all 8 of my advertisers with no duplicates. But what if I try to limit the results?
Advertiser.createCriteria().listDistinct{
owner{idEq(ownerInstance.id)}
maxResults 25
}
Now this returns a single result, just like my dynamic finder does. When I cranked maxResults up to 100K, I got all 8 of my results.
So what's happening? It seems that the joins or the eager fetching (or both) generated SQL that returned thousands of duplicates. Grails dynamic finders must return distinct results by default, so when I wasn't limiting them, I didn't notice anything strange. But once I set a limit, since the records were ordered by ID, the first 25 records would all be duplicates, meaning that only one distinct record would be returned.
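The limit-before-distinct effect can be sketched in plain Ruby (illustrative numbers only, not Grails code; the 50-row multiplier is a made-up stand-in for however many child rows the eager joins produce per advertiser):

```ruby
# Each advertiser appears once per joined child row in the SQL result set.
advertisers = (1..8).map { |id| { id: id } }
children_per_advertiser = 50  # hypothetical row multiplier from the eager joins

# The raw result set: every advertiser duplicated, ordered by advertiser id.
rows = advertisers.flat_map { |a| [a] * children_per_advertiser }

# The database applies the limit BEFORE any de-duplication:
limited = rows.first(25)
puts limited.map { |a| a[:id] }.uniq.size  # => 1 (all 25 rows belong to advertiser 1)

# Without a limit, de-duplication still recovers all 8:
puts rows.map { |a| a[:id] }.uniq.size     # => 8
```

This mirrors the observed behavior: `maxResults 25` consumes 25 duplicate rows of the same advertiser, while a huge limit lets all 8 distinct advertisers through.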
As for the joins and eager fetching, I don't know what problem that code was trying to solve, so I can't say whether or not it's necessary; the question is, why does having this code in my class generate so many duplicates?
I found out that the eager fetching was added (many levels deep) in order to speed up the rendering of certain reports, because hundreds of queries were being made. Attempts were made to eager fetch on demand, but other developers had difficulty going more than one level deep using finders or Grails criteria.
So the general answer to the question above is: instead of eager fetching by default, which can cause huge nightmares in other places, we need to find a way to do eager fetching in a single query that can go more than one level down the tree.
The next question is, how? It's not very well supported in Grails, but it can be achieved by simply using Hibernate's Criteria class. Here's the gist of it:
def advertiser = Advertiser.createCriteria()
.add(Restrictions.eq('id', advertiserId))
.createCriteria('advertiserConnections', CriteriaSpecification.INNER_JOIN)
.setFetchMode('serpEntries', FetchMode.JOIN)
.uniqueResult()
Now the advertiser's advertiserConnections will be eager fetched, and the advertiserConnections' serpEntries will also be eager fetched. You can go as far down the tree as you need to. Then you can leave your classes lazy by default, which they definitely should be for hasMany scenarios.
Since your query is retrieving duplicates, there's a chance that the limit of 25 records returns the same data over and over; consequently your distinct will reduce them to one record.
Try defining equals() and hashCode() in your classes, especially any that have a composite primary key or are used in a hasMany.
I also suggest eliminating the possibilities: remove the fetch and eager settings one by one to see how each affects your result data (without a limit).
Related
I have an extremely slow query that looks like this:
people = includes({ project: [{ best_analysis: :main_language }, :logo] }, :name, name_fact: :primary_language)
.where(name_id: limit(limit).unclaimed_people(opts))
Look at the includes method call and notice that it is loading a huge number of associations. In the RailsSpeed book, there is the following quote:
“For example, consider this:
Car.all.includes(:drivers, { parts: :vendors }, :log_messages)
How many ActiveRecord objects might get instantiated here?
The answer is:
# Cars * ( avg # drivers/car + avg log messages/car + average parts/car * ( average parts/vendor) )
Each eager load increases the number of instantiated objects, and in turn slows down the query. If these objects aren't used, you're potentially slowing down the query unnecessarily. Note how nested eager loads (parts and vendors in the example above) can really increase the number of objects instantiated.
Be careful with nesting in your eager loads, and always test with production-like data to see if includes is really speeding up your overall performance.”
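Plugging hypothetical averages into the formula from the quote makes the cost concrete (all the numbers below are made up for illustration):

```ruby
# Hypothetical averages for Car.all.includes(:drivers, { parts: :vendors }, :log_messages):
cars             = 1_000
drivers_per_car  = 2
messages_per_car = 10
parts_per_car    = 30
vendors_per_part = 3

# Associated objects instantiated, per the book's formula:
objects = cars * (drivers_per_car + messages_per_car + parts_per_car * vendors_per_part)
puts objects  # => 102000 associated objects, plus the 1,000 cars themselves
```

The nested `parts: :vendors` term dominates: it contributes 90 of the 102 objects per car, which is why the book singles out nested eager loads.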
The book fails to mention what could be a good substitute for this though. So my question is what sort of technique could I substitute for includes?
Before I jump to an answer: I don't see you using any pagination or limit on the query; that may help quite a lot.
Unfortunately, there isn't one, really. And if you use all of the objects in a view, that's okay. There is one possible substitute for includes, though. It's quite complex, but it's still helpful sometimes: you join all the needed tables, select only the fields from them that you use, alias them, and access them as a flat structure.
Something like
(NOTE: it uses arel helpers. You need to include ArelHelpers::ArelTable in models where you use syntax like NameFact[:id])
relation.joins(name_fact: :primary_language).select(
  NameFact[:id].as('name_fact_id'),
  PrimaryLanguage[:language].as('primary_language')
)
I'm not sure it will work for your case, but that's the only alternative I know.
I have an extremely slow query that looks like this
There are couple of potential causes:
Too many unnecessary objects fetched and created. From your comment, it looks like that is not the case and you need all the data that is being fetched.
DB indexes not optimised. Check the time taken by the query. EXPLAIN the generated query (check the logs or use .to_sql to get it) and make sure it is not doing a table scan or other costly operations.
I want to query some objects from the database using a WHERE clause similar to the following:
@monuments = Monument.where("... lots of SQL ...").limit(6)
Later on, in my view I use methods like @monuments.first, then I loop through @monuments, then I display @monuments.count.
When I look at the Rails console, I see that Rails queries the database multiple times: first with a limit of 1 (for @monuments.first), then with a limit of 6 (for looping through all of them), and finally it issues a count() query.
How can I tell ActiveRecord to only execute the query once? Just executing the query once with a limit of 6 should be enough to get all the data I need. Since the query is slow (80ms), repeating it costs a lot of time.
In your situation you'll want to trigger the query before your call to first, because while first is a method on Array, it's also a "finder method" on ActiveRecord relations that will fetch the first record.
You can prompt this with any method that requires data to work with. I prefer using to_a since it's clear that we'll be dealing with an array after:
@moments = Moment.where(foo: true).to_a
# SQL Query Executed
@moments.first #=> (Array#first) <Moment #foo=true>
@moments.count #=> (Array#count) 42
In this case, you can also use first(6) in place of limit(6), which will also trigger the query. It may be less obvious to another developer on your team that this is intentional, however.
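As a plain-Ruby sketch of the Array side of this (stand-in data, no ActiveRecord involved): once the records have been loaded into an array, first, first(n), and count all operate in memory with no further queries.

```ruby
# Stand-in for records already loaded via .to_a; plain hashes, not models.
moments = [{ foo: true }, { foo: true }, { foo: false }]

puts moments.first.inspect    # Array#first: no query issued
puts moments.first(2).length  # Array#first(n) returns the first n elements, => 2
puts moments.count            # Array#count: counted in memory, => 3
```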
AFAIK, @monuments.first should not hit the DB; I confirmed it in my console. Maybe you have multiple instances with the same variable, or you are doing something else (which you haven't shared here). Share the exact code and query and we can debug.
Since ActiveRecord collections act as arrays, you can use array analogies to avoid querying the DB.
Regarding first, you can do:
@monuments[0]
Regarding the count: yes, it is a different query which hits the DB. To avoid it, you can use length, as in:
@monuments.length
UPDATED: Wes hit a home run here! Thanks. I've added a Rails version I was developing using the Neography gem. It accomplishes the same thing, but his version is much faster; see the comparison below.
I am using a linked list in Neo4j (1.9, REST, Cypher) to help keep the comments in proper order (Yes I know I can sort on the time etc).
(object node)---[:comment]--->(comment)--->(comment)--->(comment).... etc
Currently I have 900 comments and it's taking 7 seconds to get through the whole list - completely unacceptable. I'm just returning the ID of the node (I know, don't do this, but it's not the point of my post).
What I'm trying to do is find the IDs of users who commented so I can return a count (like "Joe and 405 others commented on your post"). Now, I'm not even counting the unique nodes at this point - I'm just returning the author_id for each record. (I'll worry about counting later - first take care of the basic performance issue.)
start object=node(15837) match object-[:COMMENTS*]->comments return comments.author_id
7 seconds is waaaay too long..
Instead of using a linked list, I could just simply have an object and link all the comments directly to the node - but that could lead to a supernode that is just bogged down, and then finding the most recent comments, even with skip and limit, will be dog slow..
Will relationship indexes help here? I've never used them other than to ensure a unique relationship, or to see if a relationship exists, but can I use them in a cypher query to help speed things up?
If not, what else can I do to decrease the time it takes to return the IDs?
COMPARISON: Here is the Rails version using "phase II" methods of the Neography gem:
next_node_id = 18233
@neo = Neography::Rest.new
start_node = Neography::Node.load(next_node_id, @neo)
all_nodes = start_node.outgoing(:COMMENTS).depth(10000)
raise all_nodes.size.to_i
Result: 526 nodes found in 290ms..
Wes' solution took 5 ms.. :-)
Relationship indexes will not help. I'd suggest using an unmanaged extension and the traversal API--it will be a lot faster than Cypher for this particular query on long lists. This example should get you close:
https://github.com/wfreeman/linkedlistlength
I based it on Mark Needham's example here:
http://www.markhneedham.com/blog/2014/07/20/neo4j-2-1-2-finding-where-i-am-in-a-linked-list/
If you're only doing this to return a count, the best solution is not to figure it out on every query, since it isn't changing that often. Cache the result in a total_comments property on the node. Every time a relationship is added or removed, update that count. If you want to know whether any of the current user's friends commented on it so you can say, "Joe and 700 others commented on this," you could do a second query:
start joe=node(15830) object=node(15838) match joe-[:FRIENDS]->friend-[:POSTED_COMMENT]->comment<-[:COMMENTS]-object RETURN friend LIMIT 1
You limit it to 1 since you only need the name of one friend who commented. If it returns someone, adjust the displayed comment count by 1 and include the user's name. You could do that with JS so it doesn't delay your page load. Sorry if my Cypher is a little off; I'm not used to <2.0 syntax.
I am trying to link two types of documents in my Solr index. The parent is named "house" and the child is named "available". So, I want to return a list of houses that have available documents with some filtering. However, the following query gives me around 18 documents, which is wrong. It should return 0 documents.
q=*:*
&fq={!join from=house_id_fk to=house_id}doctype:available AND discount:[1 TO *] AND start_date:[NOW/DAY TO NOW/DAY%2B21DAYS]
&fq={!join from=house_id_fk to=house_id}doctype:available AND sd_year:2014 AND sd_month:11
To debug it, I tried first to check whether there is any available documents with the given filter queries. So, I tried the following query:
q=*:*
&fq=doctype:available AND discount:[1 TO *] AND start_date:[NOW/DAY TO NOW/DAY%2B21DAYS]
&fq=doctype:available AND sd_year:2014 AND sd_month:11
The query gives 0 results, which is correct. As you can see, both queries are the same; the difference is using the join query parser. I am a bit confused why the first query gives results. My understanding is that this should not happen, because the second query shows that there are no available documents that satisfy the given filter queries.
I have figured it out.
The reason is simply the type of join in Solr: it is an outer join. Since both filter queries are executed separately, a house that has available documents with discount > 1 or (sd_year:2014 AND sd_month:11) will be returned, even though my intention was to apply both conditions at the same time.
However, in the second case, both conditions are applied at the same time to find available documents, and then houses are returned based on the matching available documents. Since there is no available document that satisfies both conditions, there is no matching house, which gives zero results.
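The difference can be sketched in plain Ruby (hypothetical documents, not Solr syntax): two separate join filters each map matching children to houses independently and the house-level sets are intersected, whereas a single combined filter requires one child to satisfy both conditions.

```ruby
require 'set'

# Hypothetical child ("available") documents, each pointing at a house:
available = [
  { house_id: 1, discount: 5, sd_year: 2013 },  # matches the discount filter only
  { house_id: 1, discount: 0, sd_year: 2014 },  # matches the date filter only
]

# Two separate {!join} fq clauses: each condition selects children on its
# own, maps them to houses, and the resulting HOUSE sets are intersected.
by_discount = available.select { |d| d[:discount] >= 1 }.map { |d| d[:house_id] }.to_set
by_date     = available.select { |d| d[:sd_year] == 2014 }.map { |d| d[:house_id] }.to_set
puts (by_discount & by_date).to_a.inspect  # => [1]: house 1 matches via two different children

# A single combined condition on the child documents, as intended:
both = available.select { |d| d[:discount] >= 1 && d[:sd_year] == 2014 }
                .map { |d| d[:house_id] }.to_set
puts both.to_a.inspect  # => []: no single child satisfies both conditions
```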
It really took some time to figure this out; I hope this will help someone else.
I have 7000 objects in my Db4o database.
When I retrieve all of the objects, it's almost instant.
When I add a where constraint, i.e. Name = "Chris", it takes 6-8 seconds.
What's going on?
Also, I've seen a couple of comments about using Lucene for search-type queries; does anyone have any good links for this?
There are two things to check.
Have you added the 'Db4objects.Db4o.NativeQueries' assembly? Without this assembly, a native query cannot be optimized.
Have you set an index on the field which represents the Name? An index should make the query a lot faster.
Index:
cfg.Common.ObjectClass(typeof(YourObject)).ObjectField("fieldName").Indexed(true);
This question is kinda old, but perhaps this is of some use:
When using native queries, try to set a breakpoint on the lambda expression. If the breakpoint is actually invoked, you're in trouble because the optimization failed. To invoke the lambda, each of the objects will have to be instantiated which is very costly.
If optimization worked, the lambda expression tree will be analyzed and the actual code won't be needed, thus breakpoints won't be triggered.
Also note that settings indexes on fields must be performed before opening the connection.
Last, I have a test case of simple objects. When I started without query optimization and indexing (and worse, using a server that was forced to use the GenericReflector because I failed to provide the model .dlls), it took 600s for a three-criteria query on about 100,000 objects. Now it takes 6s for the same query on 2.5M objects, so there is really a HUGE gain.