Thinking Sphinx Application Wide Search and Dealing with Results - ruby-on-rails

The use case is this:
I'd like to let my user search from a single text box, then on the search results page organize the results by class, essentially.
So for example, say I have the following models configured for Thinking Sphinx: Post, Comment and User. (In my situation i have about 10 models but for clarity on StackOverflow I'm pretending there are only 3)
When i do a search similar to: ThinkingSphinx.search 'search term', :classes => [Post, Comment, User] I'm not sure the best way to iterate through the results and build out the sections of my page.
My first inclination is to do something like:
Execute the search
Iterate over the returned result set and do a result.is_a?(ClassType)
Based on the ClassType, add the item to 1 of 3 arrays -- #match_posts, #matching_comments, or #matching_users
Pass those 3 instance variables down to my view
Is there a better or more efficient way to do this?
Thank you!

I think it comes down to what's useful for people using your website. Does it make sense to have the same query run across all models? Then ThinkingSphinx.search is probably best, especially from a performance perspective.
That said, do you want to group search results by their respective classes? Then some sorting is necessary. Or are you separating each class's results, like a GitHub search? Then having separate collections may be worthwhile, like what you've already thought of.
At a most basic level, you could just return everything sorted by relevance instead of class, and then just render slightly different output depending on each result. A case statement may help with this - best to keep as much of the logic in helpers, and/or possibly partials?

If you have only 3 models to search from then why don't you use only model.search instead of ThinkingSphinx.search . This would resolve your problem of performing result.is_a?. That means easier treatment to the way you want to display results for each model.

Related

Query templating in cypher? How to avoid repeating myself

My group has many queries that tend to refer to a class of relationship types. So we tend to write a lot of repetitive queries that look like this:
match (n:Provenance)-[r:`input to`|triggered|contributed|generated]->(m:Provenance)
where (...etc...)
return n, r, m
The question has do to with the repetition of the set of different relationship types. Really we're looking for any relationship in a set of relationship types. Is there a way to enumerate a bunch of relationship types into a set ("foo relationships") and then use that as a variable to avoid repeating myself over and over in many queries? This repetitive querying of relationship types tends to create problems when we might add a new relationship type; now many queries distributed through the code base need to all be updated.
Enumerating all possible relationships isn't such a big deal in an individual query, but it starts to get difficult to manage and update when distributed across dozens (or hundreds) of queries. What's the recommended solution pattern here? Query templating?
This is not currently possible as a built-in feature, but it seems like an interesting feature. I would encourage you to post this to the ideas trello board here:
https://trello.com/b/2zFtvDnV/public-idea-board
Perhaps suggesting something like allowing parameters for relationship types:
MATCH (n)-[r:{types}]->(p)
Of course, that makes it much harder for the query engine to optimize queries ahead of time.. A relationship type hierarchy could work, but we are incredibly hesitant to introduce new abstractions to the model unless absolutely necessary. Still, suggestions for improvements are very welcome!
For now, yes, something like you suggest with templates would solve it. Ideally, you'd send the query to neo containing all the relationship types you are interested in, and with other items parameterized, to allow optimal planning. So to do that, you'd do some string replacement on your side to inject the long list of reltypes into the query before sending it off.

Performing multiple queries on the same model efficiently

I've been going round in circles for a few days trying to solve a problem which I've also struggled with in the past. Essentially its an issue of understanding the best (or an efficient) way to perform multiple queries on a model as I'm regularly finding my pages are very slow to load.
Consider the situation where you have a model called Everything. Initially you perform a query which finds those records in Everything which match certain criteria
#chosenrecords = Everything.where('name LIKE ?', 'What I want').order('price ASC')
I want to remember the contents of #chosenrecords as I will present them to the user as a list, however, I would also like to understand more of the attributes of #chosenrecords,for instance
#minimumprice = #chosenrecords.first
#numberofrecords = #chosenrecords.count
When I use the above code in my controller and inspect the command history on the local server, I am surprised to find that each of the three queries involves an SQL query on the original Everything model, rather than remembering the records returned in #chosenrecords and performing the query on that. This seems very inefficient to me and indeed each of the three queries takes the same amount of time to process, making the page perform slowly.
I am more experienced in writing codes in software like MATLAB where once you've calculated the value of a variable it is stored locally and can be quickly interrogated, rather than recalculating that variable on each occasion you want to know more information about it. Please could you guide me as to whether I am just on the wrong track completely and the issues I've identified are just "how it is in Rails" or whether there is something I can do to improve it. I've looked into concepts like using a scope, defining a different variable type, and caching, but I'm not quite sure what I'm doing in each case and keep ending up in a similar hole.
Thanks for your time
You are partially on the wrong track. Rails 3 comes with Arel, which defer the query until data is required. In your case, you have generated Arel query but executing it with .first & then with .count. What I have done here is run the first query, get all the results in an array and working on that array in next two lines.
Perform the queries like this:-
#chosenrecords = Everything.where('name LIKE ?', 'What I want').order('price ASC').all
#minimumprice = #chosenrecords.first
#numberofrecords = #chosenrecords.size
It will solve your issue.

Searching by several columns in diffrent tables using one input field

I know that this question or its variations were discussed many times.
But I didn't find exactly my solution.
I have many tables assume 10. I would like to provide a solution for searching in table1.field1, table2.field2, ... table10.field10 using only one form input field. When User is typing a searching phrase Autocomplete field have to suggest a list of available results with note a table its goes from.
First idea that comes in my mind is denormalization i.e. creation a separate table Table_for_search(field1, ..., field10) that contains data from tables was mentioned above.
My problem is I wouldn't like to use any heavy solutions like Sphinx, ElasticSearch etc. Also light solutions like meta_search.gem, ransack.gem look like not suitable for may situation.
Thanks in advance.

Pre-sorting associations in controller: How and is there a performance gain?

I have a view which loops through #regions. For each region its countries are displayed.
<% region.countries.each do |country| %>
A new requirement is to sort the countries by some column, which I have a scope for.
<% region.countries.order_alphabetically.each do |country| %>
However I heard that writing logic in views will severely impact the performance. Is it true for this case? Is is possible to pre-sort this in the controller?
P.S. I don't want to use default_scope because I need to sort it differently in other views.
EDIT: changed title to better reflect my question
Whether it's faster likely depends on how many records you have in your table and whether you're indexing on that column. You can experiment with passing this load of onto the database by doing:
region.countries.order(:column_name)
This should be faster in most cases than loading all the records into Ruby and sorting.
What would be undesirable would be if, in 3 different places in your view you put
region.countries.order(:column_name)
This would hit the database 3 times. Some would also argue that you're making the view do too much. You could address both concerns by doing
#sorted_countries = region.countries.order(:column_name)
You're keeping the specifics of how to order out of the view and by reusing the same relation active record will cache the sorted array between reuses.
If you're only using the sorted countries in one place then there shouldn't be any difference, although splitting it out like this makes it a maybe little easier to write a spec that tests that the countries are being sorted and makes it less likely you'll accidentally end up in the performance pitfall detailed above
Sorry for the late answer. I wanted to see some evidence, so I finally make time to sit down and write a benchmarking comparison of the two: sorting in view and controller demo.
In the page there are many regions, each regions have many countries. The page display all of them, sorting countries by name for each of the region. Run rake test:benchmark and the results will be saved in tmp/performance folder. The results of the two are the same, at around 0.0035 per page render.
So in summary, calling a sorting scope in view VS in controller makes no difference in performance.

New(?) attept to structure RESTful base URLs

We all love REST, especially when it comes to the development of APIs. Doing so for the last years I always stumble upon the same problem: nested resources. It seems we're living at the two edges of a scale. Let me introduce an example.
/galaxies/8/solarsystems/5/planets/1/continents/4/countries.json
Neato. Cases like that seem to happen everywhere, no matter in what shape they materialize. Now I'd like to being able to fetch all the countries in a solar system while being able to fetch countries deeply scoped as shown above.
It seems I have two choices here. The first one, I flatten my nested structure and introduce a lot of GET parameters (that need to be well documented and understood by my API user) like so:
/countries.json?galaxy=8&solarsystem=5&planet=1&continent=4
I could flatten all my resources like so and won a unique endpoint base URL for each one. Good point … unique endpoints per resource!
But what's the price? Something that does not feel natural, is not discoverable and does not behave like the tree structure below my resources. Conclusion: Bad idea, but well practiced.
On the other hand I could try to get rid of as many additional GET parameters as possible, creating endpoints like that:
/galaxies/8/solarsystems/5/countries.json
But I also needed:
/galaxies/8/solarsystems/5/planets/1/continents/4/countries.json
This seems to be the other side of the scale. Least number of additional GET parameters, more natural behave but still not what I expected as an API user.
The most APIs I worked with in the last year follow the one or the other paradigm. It seems there is at least one bullet to bite. So why not doing the following:
If there are resources that nest naturally, lets nest them exactly in the way we'd expect them to be nested. What we achive is at first a unique endpoint for every resource when we stay like that:
/galaxies.json
/galaxies/8/solarsystems.json
/galaxies/8/solarsystems/5/planets.json
/galaxies/8/solarsystems/5/planets/1/continents.json
/galaxies/8/solarsystems/5/planets/1/continents/4/countries.json
Ok, but how to solve the initial problem, I wanted to fetch all the countries in a solar system while still being able to fetch countries fully scoped under galaxies, solar systems, planets and continents? Here's what feels natural for me:
/galaxies/8/solarsystems/5/planets/0/continents/0/countries.json # give me all countries in the solarsystem 5
/galaxies/8/solarsystems/0/planets/0/continents/0/countries.json # give me all countries in the galaxy 8
… and so on, and so on. Now you may argue "ok, but the zero there ….." and you are right. Does not look really nice. So why not change the two upper calls to something like that:
/galaxies/8/solarsystems/5/planets/all/continents/all/countries.json # give me all countries in the solarsystem 5
/galaxies/8/solarsystems/all/planets/all/continents/all/countries.json # give me all countries in the galaxy 8
Neat eh? So what do we achive? No additional GET parameters and still stable base URLs for each resources endpoint. What's the price? Yep, at least longer URLs especially during testing by hand using tools like curl.
I wonder wether this could be a way to improve not only the maintainability but also the ease of use of APIs. If so, why does not anyone take an approach like that. I can not imagine to be the first one having that idea. So there must be valid counter arguments against an approach like that. I don't see any. Do you?
I would really like to hear your opinion and arguments for or against an approach like that. Maybe there are ideas for improvement … would be great to hear from you. In my opinion this could lead to much better structured APIs, so hopefully someone will read that and reply.
Regards.
Jan
It would all depend on upon how the data is presented. Would the user really need to the know the galaxy # to find a specific country? If so them what you propose makes sense. However, it seems to me that what you are proposing, while structured and presented well, doesn't allow for clients to search for child element unless the parent is a known quantity.
In your example, if I had a specific id for a continent I would need to know the planet, solar system and galaxy as well. In order to find the specific continent I would need to get all for each possible parent until I found the continent.
Presenting structured data in this manner if fine. Using this structure when you only have a piece of the data may be a bit cumbersome. It all depends upon what you are trying to accomplish.
Nested resource URLs are usually bad. The approach I generally take is to use unique IDs.
Design your DB so that it is only going to have one continent with ID 4. Then, instead of the horrible /galaxies/8/solarsystems/5/planets/1/continents/4/countries.json, all you need is the simple /continents/4/countries.json. Clear, sufficient, and memorable.
The :shallow routing option in Rails does this automatically.
For "all countries in a solar system", I'd use /solar_systems/5/countries.json -- that is, don't try to shoehorn it into the generic URL scheme. (And note the underscore.)

Resources