Paginated, randomized search result, without clumping - ruby-on-rails

For our RoR-based e-commerce site, we are showing a paginated search result of products. Even when randomized, this list frequently has several products of the same brand clumped together. We want a search result that is "de-clumped" such that products of the same brand don't appear near one another.
For example, if I have thousands of products that belong to 50 brands and I'm showing fewer than 50 products on the page, it shouldn't show more than one product per brand on that page (or preferably some configurable maximum). I would have to maintain a "seed" value of some sort to pass in as the user advances from page to page so that I can recreate the search order.
What algorithm or strategy can I use to accomplish this de-clumped/randomized result?

I can describe the strategy that virtocommerce proposes; it works at the catalog level. There it is possible to group similar "variations" into "products". Variations can then be marked as not visible in search and/or in the e-store's main catalog, while products remain visible and can inherit some of the variations' keywords so that they are searchable. The variations are still purchasable, but only from the product page.
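For the de-clumping itself, one possible approach (a minimal sketch, not taken from the answer above) is to shuffle with a seeded random number generator and then deal products out brand by brand, round-robin. The Product struct, the method names, and the seed, page, and per_page parameters are all assumptions made for illustration. The sketch also builds the whole ordering in memory, so in practice it would probably operate on ids and brand ids only.

```ruby
# Hypothetical stand-in for a product record; only id and brand matter here.
Product = Struct.new(:id, :brand)

# Builds one deterministic ordering for a given seed: brands are shuffled once,
# each brand's products are shuffled, then products are dealt out round-robin.
def declumped_order(products, seed)
  rng = Random.new(seed)
  queues = products.sort_by(&:id)        # make the result independent of input order
                   .group_by(&:brand)
                   .values
                   .map { |items| items.shuffle(random: rng) }
                   .shuffle(random: rng)
  ordered = []
  until queues.empty?
    queues.each { |queue| ordered << queue.shift } # one product per brand per round
    queues.reject!(&:empty?)                       # drop brands that have run out
  end
  ordered
end

# Recreates the same ordering from the seed and returns one page (1-based).
def declumped_page(products, seed:, page:, per_page:)
  declumped_order(products, seed).each_slice(per_page).to_a.fetch(page - 1, [])
end
```

Because the brand order is fixed for a given seed, any page no larger than the number of brands that still have products left contains at most one product per brand. Once small brands run out, later pages can repeat the remaining brands, so a configurable maximum per page would need extra handling.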

Related

Solr join across multiple collections and fetch data from both collections

I have 2 solr collections:
Ads (id, title, body, description, etc.)
AdPlacement (ad_id, placement_id, price)
Each Ad can have 500-1000 placements, with different prices.
The search use case is this: I have a placement and a search keyword, and I want to find the Ads that match the keyword in the title/body/description fields, sorted by the price in the AdPlacement collection for the given placement. We would like to get the Ad details and the price in the output returned.
Is there any way to achieve this in solr using join across multiple collections? What I have read so far says you can only get data from one collection and use the other one just for filtering.
Solr is a document database and supports nested documents, so ideally you would model your data so that the ad placement records are part of the Ad document. This would be the better way to handle your scenario. Please go through this blog, Solr Nested Objects, and the relevant Solr documentation.
If modifying the document structure is not an option, then consider this documentation, which describes allowing some level of join between collections.

Rails & Postgres: Best practice for many boolean relationship entries

I'm looking for input on what's best practice for my problem. In abstract terms, I need to store a lot of relationships and need to prioritise between database size, query speed and “ease of maintenance”. The setup is Ruby on Rails with PostgreSQL.
More specific description: Imagine a website that's essentially a searchable database of Products sold by Vendors. Vendors may not ship worldwide, and I want to filter out Products by Vendors who won't ship to a user based on their GeoIP country. To make matters slightly more complex, a Product page can have two different features (let’s call them F1 and F2) that are separately geo-IP-dependent.
Example: A Vendor may want their Product pages to have feature F1 for all countries worldwide except a few e.g. because of embargos; but feature F2 may only be available for countries within e.g. Europe.
Country filters will always be set at the Vendor level.
The “search” function of the website is a basic SQL search in which I want Products to show up if at least one of the features is available for the current user’s country.
The website will allow in the range of 1,000 to 3,000 Vendors (this is a hard limit), and a total of around 10,000 to 50,000 Products. Let's assume that in the beginning, filtering is only relevant to around 100 Vendors.
I had the following ideas and hope that others have feedback on these, or additional approaches:
One relation model CountryVendor with two boolean columns (in which case optionally, a Product could still be shown if the respective country_vendor does not yet exist; i.e. show if !country_vendor&.allow).
Assuming ca. 200 countries, this would imply ca. 2,000 rows in the beginning, and around 600k rows if filters were in place for every one of the 3,000 potential Vendors.
(Theoretically, if non-existence is treated as true, I could also set up a rake task that removes rows that are false for both features, thus reining in the table size.)
Two relation models CountryVendorF1 and CountryVendorF2, each with just one boolean column. Not sure if this will effectively be much different, but I imagine it closer to how I think of the UI for setting up the country filters (without going into detail here).
Two JSON columns in the Vendors table that would store true/false for each Country. (Maybe with an ISO code string as the index for simplicity.) There wouldn’t be thousands of new rows, although the DB would still grow in size, but querying might become slow.
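To make the first idea concrete, here is a minimal in-memory sketch of the "a missing row means allowed" rule. The CountryVendor struct, its field names, and visible_vendor_ids are assumptions for illustration only; in the real app this would be a database query over the relation table rather than Ruby arrays.

```ruby
# Hypothetical in-memory stand-in for a country_vendors row.
CountryVendor = Struct.new(:vendor_id, :country_code, :allow_f1, :allow_f2)

# Returns the vendor ids whose products should show up in search for a country.
# A vendor with no row for that country is treated as unrestricted.
def visible_vendor_ids(vendor_ids, rules, country_code)
  rules_for_country = rules.select { |rule| rule.country_code == country_code }
                           .map { |rule| [rule.vendor_id, rule] }
                           .to_h
  vendor_ids.select do |vendor_id|
    rule = rules_for_country[vendor_id]
    # Show the vendor if no filter exists, or if at least one feature is allowed.
    rule.nil? || rule.allow_f1 || rule.allow_f2
  end
end

rules = [
  CountryVendor.new(1, "DE", true, false),  # F1 only in DE
  CountryVendor.new(2, "DE", false, false), # nothing available in DE
  CountryVendor.new(2, "US", true, true)
]
visible_vendor_ids([1, 2, 3], rules, "DE") # => [1, 3]
```

Vendor 3 has no rows at all and is therefore always visible, which is what keeps the table small while only around 100 Vendors use filters.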

Best way to recommend Products to Users based on Interest?

Let's say that each Product has a category. I want to ask the Users to select several categories that the user is interested in, and find the Products that have the same category. This is similar to what Quora, Stumbleupon, and Pinterest all do.
What would be the best way to set this database structure in Rails? Should I create 3 tables: User, Product, and Category, and make the relations
User has many Categories & Product has many Categories?
The problem I see with this: doesn't it create, rather than reference, a new instance of Categories for each row of Users and Products?
*extra: What if I wanted subcategories? For example, if the user chose Technology, it could further ask to choose between web dev, mobile dev, hardware, etc.
You could do that kind of 'recommendation' pretty easily.
Something like this should work (N.B.: I did not test this code, but it is right in spirit):
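def recommended_products
  # Products in this user's categories, excluding products the user already has
  Product.joins(:categories)
         .where(categories: { id: category_ids })
         .where.not(id: product_ids).distinct
end
Explanation of each bit:
Product.joins(:categories): this does a SQL join of products and categories. This gives you a 'table' where each product-category combination is in its own row.
.where(categories: { id: category_ids }): keeps only the rows whose category is one of the current user's categories.
.where.not(id: product_ids): adds a SQL where clause to filter out all the rows that have products in the current user's list of products.
.distinct: returns each product once, even if it matches several of the user's categories.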
The associations are not a problem. They don't create any new instances by themselves, only if you write code that creates new instances yourself.
As for subcategories, I think you'll do better to make that its own question, as it's easily a whole post in itself.

Handling lots of COUNT queries for a report

I am putting together a report that shows statistical information about products for a company that owns those products. This report, in the form I need, contains as many as 150 'counts', because we are filling the table with the counts for 12 product types against 15 different statistical categories.
Here's the set up of the models. I'm afraid it's a little complicated!
Company is the entity accessing the report.
Company has many Products through Matchings; and
Product has many Companies through Matchings.
Matching belongs_to Order.
Example report:
___________|_Available/Active/Light Available/Active/Heavy (+12 columns)__
Perishable |
Intangible |
(+10 rows) |
The product types are in the Product table (they run down the left side of the report).
The categories across the top of the report are combinations of three criteria: two from Product and one from Order.
Example - for one cell in the Perishable row, show me how many matchings exist for which the order type is 'active', the product's weight is 'light' and the product status is 'available'.
On its own the above query is not too bad, but if I keep going like this I'm going to have ~170 queries for this report - both an inelegant and highly impractical solution. Is there a magic ActiveRecord way to deal with this scenario?
You could always create a background job to run regularly and pre-cache the results, or pre-generate the entire report. This would free your users from having to sit and wait for 170 queries to run, and I assume it would be acceptable to have slightly stale results.
As for the elegance and practicality of it, the only magic you could use is SQL. Your object model wasn't built for reporting, don't feel bad about using a tool that was.
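As a rough illustration of letting one grouped pass replace the per-cell queries, the sketch below counts every cell of the report in a single loop over in-memory rows. MatchingRow and its field names are hypothetical. In SQL the equivalent would be a single GROUP BY over the joined tables, and in ActiveRecord something along the lines of group(...).count on a joined relation returns a similar hash, though the exact column names depend on a schema not shown here.

```ruby
# Hypothetical flattened row: one per matching, with the joined product and order fields.
MatchingRow = Struct.new(:product_type, :status, :order_type, :weight)

# Counts all report cells in one pass. Keys are
# [product_type, status, order_type, weight]; cells with no matchings read as 0.
def report_counts(rows)
  rows.each_with_object(Hash.new(0)) do |row, counts|
    counts[[row.product_type, row.status, row.order_type, row.weight]] += 1
  end
end

rows = [
  MatchingRow.new("Perishable", "available", "active", "light"),
  MatchingRow.new("Perishable", "available", "active", "light"),
  MatchingRow.new("Intangible", "available", "active", "heavy")
]
counts = report_counts(rows)
counts[["Perishable", "available", "active", "light"]] # => 2
```

The view can then look up each cell by its key instead of issuing a query per cell.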
There is a statistics gem that does this sort of thing. It does allow you to cache the statistics.
I've used it for lightweight statistics like counts and averages but have never taken benchmarks, which is definitely something you'll want to do if performance is a concern.

Single Inheritance or Polymorphic?

I'm programming a website that allows users to post classified ads with detailed fields for different types of items they are selling. However, I have a question about the best database schema.
The site features many categories (e.g. Cars, Computers, Cameras) and each category of ads has its own distinct fields. For example, Cars have attributes such as number of doors, make, model, and horsepower, while Computers have attributes such as CPU, RAM, Motherboard Model, etc.
Now since they are all listings, I was thinking of a polymorphic approach, creating a parent LISTINGS table and a different child table for each of the different categories (COMPUTERS, CARS, CAMERAS). Each child table will have a listing_id that will link back to the LISTINGS TABLE. So when a listing is fetched, it would fetch a row from LISTINGS joined by the linked row in the associated child table.
LISTINGS
-listing_id
-user_id
-email_address
-date_created
-description
CARS
-car_id
-listing_id
-make
-model
-num_doors
-horsepower
COMPUTERS
-computer_id
-listing_id
-cpu
-ram
-motherboard_model
Now, is this schema a good design pattern or are there better ways to do this?
I considered single inheritance but quickly brushed off the thought because the table will get too large too quickly, but then another dilemma came to mind - if the user does a global search on all the listings, then that means I will have to query each child table separately. What happens if I have over 100 different categories, wouldn't it be inefficient?
I also thought of another approach where there is a master table (meta table) that defines the fields in each category and a field table that stores the field values of each listing, but would that go against database normalization?
How would sites like Kijiji do it?
Your database design is fine. No reason to change what you've got. I've seen the search done a few ways. One is to have your search stored procedure join all the tables you need to search across and index the columns to be searched. The second way I've seen it done, which worked pretty well, was to have a table that is only used for search, which gets a copy of whatever fields need to be searched. Then you would put triggers on those fields and update the search table.
They both have drawbacks but I preferred the first to the second.
EDIT
You need the following tables.
Categories
- Id
- Description
CategoriesListingsXref
- CategoryId
- ListingId
With this cross reference model you can join all your listings for a given category during search. Then add a little dynamic sql (because it's easier to understand) and build up your query to include the field(s) you want to search against and call execute on your query.
That's it.
EDIT 2
This seems to be a bigger discussion than we can fit in these comment boxes. But anything we would discuss can be understood by reading the following post.
http://www.sommarskog.se/dyn-search-2008.html
It is really complete and shows you more than one way of doing it, with pros and cons.
Good luck.
I think the design you have chosen will be good for the scenario you just described. Though I'm not sure if the sub class tables should have their own ID. Since a CAR is a Listing, it makes sense that the values are from the same "domain".
In the typical classified ads site, the data for an ad is written once and then is basically read-only. You can exploit this and store the data in a second set of tables that are more optimized for searching in just the way you want the users to search. Also, the search problem only really exists for a "general" search. Once the user picks a certain type of ad, you can switch to the sub class tables in order to do more advanced search (RAM > 4gb, cpu = overpowered).
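The idea of a separate, search-optimized copy of the data can be sketched roughly as follows. This is only an in-memory illustration: build_search_row, general_search, and the hash keys are assumptions, and a real site would keep these rows in a dedicated table (or a search engine) and refresh them whenever an ad is written.

```ruby
# Flattens a listing and its category-specific details into one searchable row.
def build_search_row(listing, category, details)
  searchable = [listing[:description], *details.values].join(" ").downcase
  { listing_id: listing[:listing_id], category: category, text: searchable }
end

# "General" search across all categories: a case-insensitive substring match
# against the flattened text, returning the matching listing ids.
def general_search(search_rows, term)
  needle = term.downcase
  search_rows.select { |row| row[:text].include?(needle) }
             .map { |row| row[:listing_id] }
end

search_rows = [
  build_search_row({ listing_id: 1, description: "Reliable family car" }, "CARS",
                   { make: "Toyota", model: "Corolla", num_doors: 4 }),
  build_search_row({ listing_id: 2, description: "Gaming rig" }, "COMPUTERS",
                   { cpu: "Ryzen 7", ram: "16GB" })
]
general_search(search_rows, "toyota") # => [1]
```

Category-specific searches (for example on RAM or number of doors) would still go to the child tables, as described above.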
