I am having a problem implementing a special kind of search for my Rails application. I am working on an achievement system where you can search for a set of users in a search form (e.g., the query being "Ross, Adam, Jake") and it returns all of the common achievements that the users have unlocked (e.g., if users Ross, Adam, and Jake all had an achievement named "You are winner!"). I have three tables, one for achievements, one for users, and a join table. We have tested the associations and such, so we know that works.
My first idea was to put the search terms in an array and get the search results for each item in the array and place them into respective "search result arrays". Then, I was thinking to go through each item in search result array 1 to see if it appears in both of the other result arrays. The objects that appear in all three of the search result arrays would be returned and displayed on a page.
Is there an easy way to implement this without writing a bunch of my own code? Are there some functions I should know about? Any help will be appreciated!
Well, both Ransack and its predecessor (MetaSearch) are useful gems for creating complex search forms.
In general I think you want to do something like select distinct achievement ids for user ids in an array. Off the top of my head I'm not quite sure how you should write it... others may know.
Look at the documentation on MetaSearch (more established) and see if you see a pattern that fits, if not check Ransack (more advanced).
You can use an autocomplete plugin for user names and convert the names to ids on the fly. That way you won't have to deal with converting user names to ids in the backend later.
For common achievements, if a user can have an achievement only once, the way to go would be to group the join table rows by achievement id, count them, and keep the achievements whose count equals the number of users searched for.
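Here is a minimal plain-Ruby sketch of that counting idea, just to show the logic. The `UNLOCKS` rows and the `common_achievement_ids` helper are made up for illustration and stand in for the join table. In SQL the same thing would be a `GROUP BY achievement_id` with `HAVING COUNT(*)` equal to the number of users.

```ruby
# Hypothetical join-table rows: one row per (user, achievement) pair.
UNLOCKS = [
  { user_id: 1, achievement_id: 10 },
  { user_id: 1, achievement_id: 11 },
  { user_id: 2, achievement_id: 10 },
  { user_id: 2, achievement_id: 12 },
  { user_id: 3, achievement_id: 10 },
  { user_id: 3, achievement_id: 11 }
].freeze

# Returns the achievement ids unlocked by every user in user_ids.
# Assumes each (user, achievement) pair appears at most once in rows.
def common_achievement_ids(rows, user_ids)
  wanted = user_ids.uniq
  return [] if wanted.empty?

  # Count, per achievement, how many of the wanted users unlocked it.
  counts = Hash.new(0)
  rows.each do |row|
    counts[row[:achievement_id]] += 1 if wanted.include?(row[:user_id])
  end

  # An achievement is common when every wanted user contributed one row.
  counts.select { |_id, count| count == wanted.size }.keys.sort
end

p common_achievement_ids(UNLOCKS, [1, 2, 3]) # => [10]
p common_achievement_ids(UNLOCKS, [1, 3])    # => [10, 11]
```

If a user could unlock the same achievement more than once, the count would have to be taken over distinct user ids instead.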
You can provide more details for a more detailed answer. :)
You can use Sunspot, which allows easy Solr integration with Ruby and Rails.
Related
I have 2 solr collections:
Ads (id, title, body, description, etc.)
AdPlacement (ad_id, placement_id, price)
Each Ad can have 500-1000 placements, with different prices.
The search use case is this: I have a placement and a search keyword, and I want to find the Ads that match the keyword in the title/body/description fields, sorted by the price in the AdPlacement collection for the given placement. We would like to get the Ad details and the price in the output returned.
Is there any way to achieve this in solr using join across multiple collections? What I have read so far says you can only get data from one collection and use the other one just for filtering.
Solr is a document database and supports nested documents, so ideally you would model your data so that the ad placement records are part of the Ad document. This would be the better way to handle your scenario. Please go through the blog post "Solr Nested Objects" and the relevant Solr documentation.
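To make the nested model concrete, here is a rough plain-Ruby sketch (not Solr API calls) of what it buys you: once each ad carries its own placements, one pass can filter by keyword, pick the price for the given placement, and sort by it. The sample data, the ascending sort, and the `ads_for_placement` helper are assumptions for illustration only.

```ruby
# Hypothetical ads with their placements embedded, mirroring a nested-document model.
ADS = [
  { id: 1, title: "Red bicycle", body: "Lightly used", placements: [
    { placement_id: 7, price: 2.50 }, { placement_id: 8, price: 1.00 }
  ] },
  { id: 2, title: "Mountain bicycle", body: "New", placements: [
    { placement_id: 7, price: 4.00 }
  ] },
  { id: 3, title: "Sofa", body: "Comfortable", placements: [
    { placement_id: 7, price: 9.00 }
  ] }
].freeze

# Returns ad details plus the price for placement_id, cheapest first,
# for ads whose title or body contains the keyword (case-insensitive).
def ads_for_placement(ads, keyword, placement_id)
  needle = keyword.downcase
  matches = ads.map do |ad|
    text = [ad[:title], ad[:body]].join(" ").downcase
    placement = ad[:placements].find { |pl| pl[:placement_id] == placement_id }
    next unless placement && text.include?(needle)

    { id: ad[:id], title: ad[:title], price: placement[:price] }
  end
  matches.compact.sort_by { |match| match[:price] }
end

p ads_for_placement(ADS, "bicycle", 7).map { |m| [m[:id], m[:price]] }
# => [[1, 2.5], [2, 4.0]]
```

With two separate collections, the price lives in a different document than the text being searched, which is why returning and sorting by it gets awkward.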
If modifying the document structure is not an option, consider this documentation, which describes a limited form of join between collections.
I am working with the Twitter streaming API and am a little confused about how to decide the criteria for indexing the data. Right now I have a single index that contains all the tweets in one doc_type and users in another doc_type.
Is this the best way to go about storing them, or should I create a new doc_type for every category (a category can be decided on the basis of hashtag and tweet content)?
What should be the best approach to storing such data?
Thanks in advance.
First, the answer to your question very much depends on your use case. What is your application doing? What do you do with the tweets? How many categories do you plan to have?
In general, however, I'd go for a solution where you use the same index and the same doc_type for all tweets. This allows you to build queries and aggregations over all your tweets without thinking about the different types of categories. It also allows you to add new categories easily without having to change your queries.
If you want to do some classification of the tweets you could add a category field to the tweet document stored in elasticsearch. You can then use this category field to implement your specific application logic.
If your category names have spaces or punctuation marks don't forget to define the category field as not_analyzed. Otherwise it will be broken up in parts.
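As a rough sketch of the two points above: the mapping fragment below assumes an older Elasticsearch version where `not_analyzed` string fields exist, and the doc type name `tweet`, the sample tweets, and both helpers are made up for illustration. `analyzed_terms` only approximates what an analyzer does to a value, to show why a category such as "machine learning" would otherwise be split.

```ruby
require "json"

# Hypothetical mapping fragment: category is kept as one exact term.
CATEGORY_MAPPING = {
  tweet: {
    properties: {
      text:     { type: "string" },
      category: { type: "string", index: "not_analyzed" }
    }
  }
}.freeze

# Hypothetical tweet documents, all in the same doc type.
TWEETS = [
  { text: "New GPU benchmarks", category: "machine learning" },
  { text: "Match tonight",      category: "sports" },
  { text: "Training a model",   category: "machine learning" }
].freeze

# Rough stand-in for an analyzer: lowercases and splits on non-word characters.
def analyzed_terms(value)
  value.downcase.split(/\W+/).reject(&:empty?)
end

# Counts tweets per exact category value, like a terms aggregation would.
def count_by_category(tweets)
  tweets.each_with_object(Hash.new(0)) { |tweet, counts| counts[tweet[:category]] += 1 }
end

puts JSON.pretty_generate(CATEGORY_MAPPING)
p analyzed_terms("machine learning") # => ["machine", "learning"]
p count_by_category(TWEETS)          # => {"machine learning"=>2, "sports"=>1}
```

Adding a new category then only means writing a new value into the field; the queries and the mapping stay the same.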
I have a Rails app where a user can enter keywords into a search box. From there, I find the ids of the matching keywords, then I do more processing until I arrive at an array of items that fulfill the criteria.
EXAMPLE:
A user searches for people who have a JD degree. I look up the id in the Degrees database, then I look up all companies/firms from my Companies db that employ people who have JDs. Finally, I collect the employees with JDs of those companies. Assume that there is no way to start by searching people.
Once I have an array of individuals that meet the requirement, how can I paginate through this array? It seems paginating in the Employee model isn't giving me what I want.
When the user who performs the search hits the 'Next' button, the array of results is gone. Ideally I would like to preserve that array appropriately, or get rid of it if the user performs a new search. Thoughts?
In a file in your config/initializers, add this:
require 'will_paginate/array'
Then you can use it on arrays:
my_array.paginate(:page => x, :per_page => y)
The will_paginate gem is definitely a simple solution. The README has a few examples of how to implement it: https://github.com/mislav/will_paginate. Note that :per_page only sets how many items appear on each page; it does not keep the array around between requests, so the array has to be rebuilt (or cached) each time a page is requested.
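For what it's worth, here is a gem-free sketch of what paginating an array boils down to: slicing by an offset computed from the page number. `page_of` is a hypothetical helper, not part of will_paginate, and the sketch assumes the array is rebuilt or fetched from a cache on each request before it is sliced.

```ruby
# Returns the items on a 1-based page, or [] when the page is past the end.
def page_of(items, page:, per_page:)
  raise ArgumentError, "page and per_page must be positive" if page < 1 || per_page < 1

  # Array#[] with (start, length) returns nil when start is beyond the array.
  items[(page - 1) * per_page, per_page] || []
end

results = (1..7).to_a # stand-in for the array of matching employees

p page_of(results, page: 1, per_page: 3) # => [1, 2, 3]
p page_of(results, page: 3, per_page: 3) # => [7]
p page_of(results, page: 4, per_page: 3) # => []
```

In a controller, the page number would typically come from the request parameters, which is also what the 'Next' link changes.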
I have Sunspot/Solr set up to search products on my site. We need the ability to search users and another model (too much to explain what this is) in our app. Basically, there is a form for searching products via Solr, and this works well. There would be another form for searching users and a third form for searching the other model.
I assume it is recommended to have a separate index for products, users, and the other model? It seems best to keep the index from getting too bloated. Am I on the right track here?
All the models are indexed in the same index. Sunspot also stores the class name of each record in the index, so searches can be restricted to one model.
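A toy sketch of why one shared index is workable: if every document records the class it came from, each search form can restrict itself to its own model. This is plain Ruby, not Sunspot code, and the `class_name` key, the sample documents, and the `search` helper are purely illustrative.

```ruby
# Hypothetical shared index: every document records the class it came from.
INDEX = [
  { class_name: "Product", id: 1, text: "blue widget" },
  { class_name: "User",    id: 1, text: "blue_fan_42" },
  { class_name: "Product", id: 2, text: "red widget" }
].freeze

# Returns the documents of one class whose text contains the keyword.
def search(index, class_name, keyword)
  index.select do |doc|
    doc[:class_name] == class_name && doc[:text].include?(keyword)
  end
end

p search(INDEX, "Product", "blue").map { |doc| doc[:id] } # => [1]
p search(INDEX, "User", "blue").map { |doc| doc[:id] }    # => [1]
```

Both results have id 1 but belong to different models, which is why the class name has to be stored alongside the id.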
I'm programming a website that allows users to post classified ads with detailed fields for different types of items they are selling. However, I have a question about the best database schema.
The site features many categories (e.g., Cars, Computers, Cameras) and each category of ads has its own distinct fields. For example, Cars have attributes such as number of doors, make, model, and horsepower while Computers have attributes such as CPU, RAM, Motherboard Model, etc.
Now since they are all listings, I was thinking of a polymorphic approach, creating a parent LISTINGS table and a different child table for each of the different categories (COMPUTERS, CARS, CAMERAS). Each child table will have a listing_id that will link back to the LISTINGS TABLE. So when a listing is fetched, it would fetch a row from LISTINGS joined by the linked row in the associated child table.
LISTINGS
-listing_id
-user_id
-email_address
-date_created
-description
CARS
-car_id
-listing_id
-make
-model
-num_doors
-horsepower
COMPUTERS
-computer_id
-listing_id
-cpu
-ram
-motherboard_model
Now, is this schema a good design pattern or are there better ways to do this?
I considered single inheritance but quickly brushed off the thought because the table will get too large too quickly, but then another dilemma came to mind - if the user does a global search on all the listings, then that means I will have to query each child table separately. What happens if I have over 100 different categories, wouldn't it be inefficient?
I also thought of another approach where there is a master table (meta table) that defines the fields in each category and a field table that stores the field values of each listing, but would that go against database normalization?
How would sites like Kijiji do it?
Your database design is fine. No reason to change what you've got. I've seen the search done a few ways. One is to have your search stored procedure join all the tables you need to search across and index the columns to be searched. The second way I've seen it done which worked pretty well was to have a table that is only used for search which gets a copy of whatever fields that need to be searched. Then you would put triggers on those fields and update the search table.
They both have drawbacks but I preferred the first to the second.
EDIT
You need the following tables.
Categories
- Id
- Description
CategoriesListingsXref
- CategoryId
- ListingId
With this cross reference model you can join all your listings for a given category during search. Then add a little dynamic sql (because it's easier to understand) and build up your query to include the field(s) you want to search against and call execute on your query.
That's it.
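A hedged sketch of the dynamic SQL part, written as a Ruby query builder rather than a stored procedure. The `SEARCHABLE` whitelist, the table aliases, and the `build_search` helper are assumptions for illustration. Column names are only taken from the whitelist and values are passed as bind parameters, so user input never ends up inside the SQL text.

```ruby
# Hypothetical whitelist of searchable columns per category table.
SEARCHABLE = {
  "cars"      => %w[make model],
  "computers" => %w[cpu motherboard_model]
}.freeze

# Builds [sql, binds] for a keyword search within one category.
# Requested fields that are not whitelisted are silently dropped.
def build_search(category_table, category_id, fields, keyword)
  allowed = SEARCHABLE.fetch(category_table) do
    raise ArgumentError, "unknown table: #{category_table}"
  end
  chosen = fields & allowed
  raise ArgumentError, "no searchable fields" if chosen.empty?

  conditions = chosen.map { |field| "c.#{field} LIKE ?" }.join(" OR ")
  sql = "SELECT l.* FROM LISTINGS l " \
        "JOIN CategoriesListingsXref x ON x.ListingId = l.listing_id " \
        "JOIN #{category_table} c ON c.listing_id = l.listing_id " \
        "WHERE x.CategoryId = ? AND (#{conditions})"
  [sql, [category_id] + Array.new(chosen.size, "%#{keyword}%")]
end

sql, binds = build_search("cars", 3, %w[make horsepower], "ford")
puts sql
p binds # => [3, "%ford%"]
```

The resulting pair can be handed to whatever database layer you use that supports `?` placeholders.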
EDIT 2
This seems to be a bigger discussion than we can fit in these comment boxes. But anything we would discuss can be understood by reading the following post.
http://www.sommarskog.se/dyn-search-2008.html
It is really complete and shows you more than one way of doing it, with pros and cons.
Good luck.
I think the design you have chosen will be good for the scenario you just described. Though I'm not sure if the sub class tables should have their own ID. Since a CAR is a Listing, it makes sense that the values are from the same "domain".
In the typical classified ads site, the data for an ad is written once and then is basically read-only. You can exploit this and store the data in a second set of tables that are more optimized for searching in just the way you want the users to search. Also, the search problem only really exists for a "general" search. Once the user picks a certain type of ad, you can switch to the sub class tables in order to do more advanced search (RAM > 4 GB, cpu = overpowered).
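Roughly, the idea could look like the sketch below, in plain Ruby with in-memory rows standing in for the tables. The sample data, the `ram_gb` column, and both helpers are made up for illustration: each listing gets one flat search row built when the ad is written, the general search only touches those rows, and category-specific filters go to the sub class rows.

```ruby
# Hypothetical rows standing in for the LISTINGS, CARS, and COMPUTERS tables.
LISTINGS = [
  { listing_id: 1, description: "Family car, low mileage" },
  { listing_id: 2, description: "Gaming computer" }
].freeze
CARS = [{ listing_id: 1, make: "Toyota", model: "Corolla", num_doors: 4 }].freeze
COMPUTERS = [{ listing_id: 2, cpu: "i7", ram_gb: 16 }].freeze

# Builds one flat, lowercased search row per listing from the listing
# description plus the values of its sub class row (if it has one).
def build_search_rows(listings, children_by_category)
  listings.map do |listing|
    id = listing[:listing_id]
    category, child = children_by_category
      .map { |name, rows| [name, rows.find { |row| row[:listing_id] == id }] }
      .find { |_name, row| row }
    values = child ? child.reject { |key, _value| key == :listing_id }.values : []
    { listing_id: id, category: category,
      text: ([listing[:description]] + values).join(" ").downcase }
  end
end

# General search: only the flat search rows are consulted.
def general_search(rows, keyword)
  rows.select { |row| row[:text].include?(keyword.downcase) }
      .map { |row| row[:listing_id] }
end

rows = build_search_rows(LISTINGS, "cars" => CARS, "computers" => COMPUTERS)
p general_search(rows, "Corolla") # => [1]
p general_search(rows, "gaming")  # => [2]

# Category-specific search goes straight to the sub class rows.
p COMPUTERS.select { |computer| computer[:ram_gb] > 4 }.map { |c| c[:listing_id] } # => [2]
```

Since ads are rarely edited after posting, the search rows only need to be rebuilt when an ad is created or changed.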