I have a Rails 4 application using PostgreSQL, hosted on Heroku.
The application revolves around the following models: User and Article.
A user can create articles. An article contains a title, description, location (latitude, longitude) and an image.
I would like to add a notification system that works as follows:
A user can set up a list of keywords that they wish to subscribe to.
The user gets a notification if an article containing one of their keywords is added (in the title for now, and perhaps later in the description).
What is the best approach to implement this in a scalable way?
In its simplest form, I could create a model called Keyword that stores what keywords a user wants to be notified for.
Then in the create action for article, check to see if the title (or description) contains any of the saved keywords.
This sounds good but will probably fall over once any reasonable number of users are added.
Obviously, a background task would do the trick, but it still sounds wrong to do a basic substring match directly on the database.
Perhaps I could tokenize the title and description into an index and use a background process to handle the heavy lifting? I heard Postgres has some built-in full-text search. Could this work?
Could I use a Heroku add-on like Solr or Redis to handle all this or is it overkill? (Not having to pay for an add-on is an advantage).
Perhaps someone has a better implementation for the same functionality.
I know I can implement it quickly; I just want to be sure the implementation is up to scratch.
Thanks,
Brian
I have faced a similar problem. The slowest part is doing a case-insensitive search. I would suggest the following approach: let TID be the id of the row in which you store the title; then create a table with one row for every word of the title, in lowercase, together with the corresponding TID. Then all you need is a join between those words and the keywords of the given user. You can speed up this query with hash indexes.
In my case, none of the Postgres text functions were usable because they all performed poorly.
PS: We implemented a full-text search over about 60,000 documents, so your case might be a bit different.
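The word-table idea in this answer can be sketched in plain Ruby, with in-memory hashes standing in for the PostgreSQL tables. This is an illustration under assumptions, not the answerer's code: `tokenize`, `KEYWORD_SUBSCRIBERS` and `subscribers_for` are hypothetical names, and in the database the lookup would be a join on an indexed, lowercased word column rather than a Hash.

```ruby
require "set"

# Split text into lowercase words; anything that is not a letter or digit
# acts as a separator (mirrors the "one row per lowercase word" table).
def tokenize(text)
  text.downcase.scan(/[[:alnum:]]+/).to_set
end

# Hypothetical keyword subscriptions: lowercase keyword => user ids.
KEYWORD_SUBSCRIBERS = {
  "bike" => [1, 2],
  "sale" => [3],
  "dublin" => [2]
}.freeze

# Equivalent of joining the article's words against the keywords table:
# each word is a single hash lookup instead of a LIKE scan per keyword.
def subscribers_for(title)
  tokenize(title).each_with_object(Set.new) do |word, users|
    users.merge(KEYWORD_SUBSCRIBERS.fetch(word, []))
  end
end

p subscribers_for("Bike for SALE in Dublin").to_a.sort # prints [1, 2, 3]
```

Because matching is on whole lowercase words, "Bikes" does not match the keyword "bike"; handling plurals would need stemming, which PostgreSQL's full-text search (for example `to_tsvector('english', ...)`) provides.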
Related
I am currently working on an application in Rails (though language/framework shouldn't matter for this question since it is more of a theoretical one). I'm working on wrapping my head around this problem:
Say I am tracking millions of blogs online and am plugged into their RSS feeds. My app pings these feeds every few minutes to see if there has been any new activity across any of these millions of blogs. If there is any new activity, I want to alert the users of my application who have signed up to receive alerts for those specific blogs.
Does it make sense to have a user_blog_alerts table (where a user can specify custom keywords to be alerted about) and continuously check this table against every new entry that comes in from my feed? And when there is a match, to add them to a queue (using Redis)?
What is the best, most efficient way to build and model this alerting system? Am I even thinking about this in the right way? Are there any good examples or tutorials on this when working with such large amounts of data?
I'm not sure what the right way to do this is, but the thought of continuously scanning a table over and over sounds exhausting (i.e. unscalable).
Off the top of my head, what if you created a LIST for every blog in Redis? The values would be the user IDs of those who wanted an alert. The key name would contain the blog id (e.g. "user_blog_alerts:12345").
Then when you got a new post for blog 12345 it's a simple lookup to see if that key exists. If it does, then fire off alerts for each user in the list.
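A minimal sketch of the per-blog list lookup, assuming a plain Ruby Hash in place of Redis so that it runs without a server. `FakeRedisLists`, `subscribe` and `on_new_post` are hypothetical; with a real Redis client the two operations correspond to the RPUSH and LRANGE commands.

```ruby
# In-memory stand-in for Redis lists (hypothetical helper, not a Redis client).
# With a real client, rpush maps to RPUSH and members to LRANGE key 0 -1.
class FakeRedisLists
  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
  end

  def rpush(key, value)
    @lists[key] << value
  end

  # Returns a copy of the list, or [] if the key does not exist.
  def members(key)
    @lists.key?(key) ? @lists[key].dup : []
  end
end

# Key layout suggested in the answer: one list per blog.
def alert_key(blog_id)
  "user_blog_alerts:#{blog_id}"
end

def subscribe(store, user_id, blog_id)
  store.rpush(alert_key(blog_id), user_id)
end

# Called by the feed poller for each new post; returns the user ids alerted.
def on_new_post(store, blog_id, notifier)
  user_ids = store.members(alert_key(blog_id))
  user_ids.each { |user_id| notifier.call(user_id, blog_id) }
  user_ids
end
```

One design choice worth considering: a Redis LIST allows duplicates, so a user who subscribes twice would be alerted twice; a SET (SADD/SMEMBERS) avoids that.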
I am looking for a solution for logging data changes for a public API.
I need to tell the client app which tables in the DB have changed and need to be synchronised since the app last synchronised, scoped to a specific brand and country.
Current Solution:
A Version table holding the class_names of models, which every model touches on create, delete, touch and save.
When we are touching version for specific model we also look at the reflected associations and touch them too.
The Version model is scoped to brand and country.
The REST API responds to a request that includes last_sync_at (a timestamp), brand and country.
Rails looks up Version records with the given attributes and returns the class_names of models that changed since the last_sync_at timestamp.
This solution works, but performance is a problem and it is also hard to maintain.
UPDATE 1:
Maybe the simpler question is:
What is the best practice for finding out, and telling frontend apps, when and what needs to be synchronized, in terms of the overall concept?
Conditions:
Frontend apps need to download only their own content changes, not the whole dataset.
Synchronization must not be triggered for an app when only data for a different country or brand has changed.
Thank you.
I think the best solution would be to use Redis (or some other key-value store) and save your information there. Writing to Redis is generally much faster than writing to an SQL database. You can write a service class that saves the data, for example:
RegisterTableUpdate.set(table_name, country_id, brand_id, timestamp)
Such a call would save the given timestamp under a key such as table-update-1-1-users, where the first number is the country id, the second is the brand id, and the last part is the table name (you could use country and brand names instead if needed). To find out which tables have changed, you would just need to find the Redis keys matching "table-update-1-1-*", iterate through them and check which timestamps are newer than the one sent through the API.
It is worth remembering that Redis is not as reliable as SQL databases. Its reliability depends on configuration, so you might want to read the Redis guidelines and decide whether you would like to go for it.
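A sketch of the RegisterTableUpdate service described above, assuming an in-memory Hash in place of Redis so it runs standalone. The `set` signature and key format follow the answer; `key` and `changed_since` are hypothetical helpers for building keys and for the lookup step.

```ruby
# Hypothetical in-memory version of the RegisterTableUpdate service.
# A Hash replaces Redis; with Redis, set would be a SET command and the
# lookup would scan keys matching "table-update-<country>-<brand>-*".
class RegisterTableUpdate
  STORE = {}

  # Key layout from the answer: country id, brand id, then table name.
  def self.key(table_name, country_id, brand_id)
    "table-update-#{country_id}-#{brand_id}-#{table_name}"
  end

  def self.set(table_name, country_id, brand_id, timestamp)
    STORE[key(table_name, country_id, brand_id)] = timestamp
  end

  # Table names for this country/brand updated after the client's last sync.
  # The trailing "-" in the prefix keeps country 1 from matching country 11.
  def self.changed_since(country_id, brand_id, last_sync_at)
    prefix = "table-update-#{country_id}-#{brand_id}-"
    STORE.select { |k, ts| k.start_with?(prefix) && ts > last_sync_at }
         .keys
         .map { |k| k.delete_prefix(prefix) }
         .sort
  end
end
```

Against a real Redis, prefer SCAN with a MATCH pattern over KEYS, which blocks the server while it walks the whole keyspace. An alternative layout is one Redis hash per country and brand (HSET table-update-1-1 users <timestamp>), which HGETALL reads in one call without any pattern matching.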
You can take advantage of the fact that ActiveRecord automatically sets the updated_at column every time it updates a table row.
When checking what needs to be updated, select the objects you are interested in and compare their updated_at with the timestamp from the client app.
The advantage of this approach is that you don't need to keep an additional table that lists all the updates on models, which should speed things up for the API users and be easier to maintain.
The disadvantage is that you cannot see the changes in data over time; you only know that a change occurred, and you can access the latest version. If you need to track changes in data over time efficiently, then I'm afraid you'll have to rework things from the top.
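A plain-Ruby sketch of the updated_at comparison, assuming records with brand_id and country_id columns (hypothetical names). In ActiveRecord the same filter would be roughly `Model.where(brand_id: b, country_id: c).where("updated_at > ?", last_sync_at)`; here a Struct and Array#select stand in for the table and the query.

```ruby
# Hypothetical row shape; in Rails these would be ActiveRecord models.
Record = Struct.new(:id, :brand_id, :country_id, :updated_at)

# Rows for this brand/country changed after the client's last sync.
def changed_records(records, brand_id, country_id, last_sync_at)
  records.select do |r|
    r.brand_id == brand_id && r.country_id == country_id &&
      r.updated_at > last_sync_at
  end
end

# Which models (name => rows) have anything new for this client?
def changed_models(models, brand_id, country_id, last_sync_at)
  models.select do |_name, records|
    changed_records(records, brand_id, country_id, last_sync_at).any?
  end.keys
end
```

One caveat: a row that is hard-deleted leaves no updated_at behind, so deletions are invisible to this check unless you use soft deletes or record them separately.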
(Read the last part; this is what you are interested in.)
I would recommend that you use the decorator design pattern for changing the client queries. So the client sends a query of what he wants and the server decides what to give him based on the client's last update.
So:
the client sends a query that includes the time it last synced
the server sees the query and takes into account the client's nature (device-country)
the server decorates (changes accordingly) the query to request from the DB only the relevant data, and if that is not possible:
after the data are returned from the database manager, they are trimmed to be relevant to where they are going
the server returns to the client all the new stuff that the client cares about.
I assume that you have a "time entered" (timestamp) field on your DB entries.
In that case the "decoration" of the query (abstractly) would be just to add something like a "WHERE" clause in your query and state you want data entered after the last update.
Finally, if you want this done for many devices/locales/whatever, implement a decorator for the query and for the result of the query, and serve them to your clients as they should be served. (Keep in mind that, in contrast with a subclassing approach, you will only have to implement one decorator for each device/locale/whatever, not one for every combination!)
Hope this helped!
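A minimal sketch of decorating a query, assuming an `updated_at` column and `country`/`brand` columns (hypothetical names). Each decorator wraps another query object and adds one WHERE condition, so a client's combination of filters is built by stacking decorators rather than by writing a subclass per combination. It renders SQL with `?` placeholders purely for illustration.

```ruby
# Renders any query object that responds to #table and #conditions,
# where conditions are [sql_fragment, bind_value] pairs.
module SqlRendering
  def to_sql
    fragments = conditions.map(&:first)
    sql = "SELECT * FROM #{table}"
    fragments.empty? ? sql : "#{sql} WHERE #{fragments.join(' AND ')}"
  end

  def binds
    conditions.map(&:last)
  end
end

# The undecorated query: everything in one table.
class BaseQuery
  include SqlRendering
  attr_reader :table

  def initialize(table)
    @table = table
  end

  def conditions
    []
  end
end

# Generic decorator: wraps a query and appends one WHERE condition.
class ConditionDecorator
  include SqlRendering

  def initialize(inner, fragment, value)
    @inner = inner
    @fragment = fragment
    @value = value
  end

  def table
    @inner.table
  end

  def conditions
    @inner.conditions + [[@fragment, @value]]
  end
end

# One decorator per concern; combinations come from stacking them.
class SinceLastSync < ConditionDecorator
  def initialize(inner, last_sync_at)
    super(inner, "updated_at > ?", last_sync_at)
  end
end

class ForCountry < ConditionDecorator
  def initialize(inner, country)
    super(inner, "country = ?", country)
  end
end

class ForBrand < ConditionDecorator
  def initialize(inner, brand)
    super(inner, "brand = ?", brand)
  end
end
```

For example, `ForBrand.new(ForCountry.new(SinceLastSync.new(BaseQuery.new("articles"), t), "IE"), "acme").to_sql` yields `SELECT * FROM articles WHERE updated_at > ? AND country = ? AND brand = ?`. In a Rails app, chained ActiveRecord `where` calls compose the same way, so the decorators could wrap a relation instead of building SQL by hand.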
I was browsing Reddit for the answer to this and came across this conversation, which lists a bunch of search gems for Rails, which is cool. But what I wanted was something where I could:
Enter: OMG Happy Cats
It searches the whole database looking for anything that contains OMG Happy Cats and returns an array of model objects containing that value, on which I can then use Active Model Serializers (very important to be able to use this) to return a JSON object of search results, so you can display whatever you want to the user.
So that JSON object, if this was a blog, would have a post object, maybe a category object and even a comment object.
Everything I have seen is very specific to one controller and one model, which is nice and all, but I am after more of a "search for what you want, and we will return what you want" approach, which might grow smarter like the searchkick gem, which can also offer spelling suggestions.
I am building this with an API, so it would be limited to everything that belongs to a blog object (so as not to make it too huge a search): it would search things like posts, tags, categories, comments and pages for your term, return a JSON object (as described) and boom, done.
Any ideas?
You'd be best off considering the underlying technology for this.
--
Third Party
As far as I know (I'm not super experienced in this area), the best way to search an entire Rails database is to use a third party system to "index" the various elements of data you require, allowing you to search them as required.
Some examples of this include:
Sunspot / Solr
ElasticSearch
Essentially, having one of these "third party" search systems gives you the ability to index the various records you want in a separate database, which you can then search with your application.
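To illustrate what "indexing" means here, a toy in-memory inverted index over several record types. This is only a sketch of the concept, not how Solr or Elasticsearch are implemented or called; `MiniIndex` is hypothetical, and the real engines add ranking, stemming, fuzzy matching and persistence. Results come back grouped by record type, which maps naturally onto the grouped JSON the question asks for.

```ruby
require "json"
require "set"

# Toy inverted index: lowercase word => set of [type, id] pairs.
class MiniIndex
  def initialize
    @postings = Hash.new { |hash, word| hash[word] = Set.new }
  end

  # Index one record's searchable text under its type and id.
  def add(type, id, text)
    words(text).each { |word| @postings[word] << [type, id] }
  end

  # Records containing every word of the query, grouped by type,
  # e.g. { "post" => [1], "comment" => [3] }.
  def search(query)
    terms = words(query)
    return {} if terms.empty?

    hits = terms.map { |word| @postings.fetch(word, Set.new) }.reduce(:&)
    hits.group_by(&:first).transform_values { |pairs| pairs.map(&:last).sort }
  end

  private

  def words(text)
    text.downcase.scan(/[[:alnum:]]+/).uniq
  end
end
```

For example, after indexing a post "OMG Happy Cats everywhere" and a comment "omg, happy cats!", `search("OMG Happy Cats").to_json` returns both records, grouped under "post" and "comment".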
--
Notes
There are several advantages to handling "search" with a third party stack.
Firstly, it takes the load off your main web server - which means it'll be more reliable & able to handle more traffic.
Secondly, it will ensure you're able to search all the data of your application, instead of tying into a particular model / data set.
Thirdly, because many of these third party solutions index the content you're looking for, it will free up your database connectivity for your actual application, making it more efficient & scalable.
For PostgreSQL you should be able to use pg_search.
I've never used it myself but going by the documentation on GitHub, it should allow you to do:
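# Requires pg_search, with multisearchable declared on each searchable model.
documents = PgSearch.multisearch('OMG Happy Cats').to_a
# Each PgSearch::Document points back at the original record.
objects = documents.map(&:searchable)
# Group records by type, e.g. "posts", "comments".
groups = objects.group_by { |o| o.class.name.pluralize.downcase }
# Serialize each group (ArraySerializer is the active_model_serializers 0.8/0.9 API).
json = Hash[groups.map { |k, v| [k, ActiveModel::ArraySerializer.new(v).as_json] }].as_json
puts json.to_json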
I have an application that allows users to store food diary entries of approximately 140 characters in length. I am looking for a solution that will allow me to tie content modules (think tips for healthy eating) to the user's diary entries based on keywords in the entry, similar to what Google does with AdWords. Are there any out-of-the-box solutions that can do that in Rails?
Here are the specific requirements:
User logs food diary entry
In the user's food diary, if there's a specific tip that matches a keyword for the entry, then the tip is displayed next to the entry
Tips would be defined through an admin tool where the admin specifies the tip content and keywords that would make it appear in the diary
I'm trying to figure out a) whether there's a pre-built solution I could use for something like this, or b) what the best approach would be for performance, since the user's food diary might have 20 entries per page, and each entry would have to be evaluated to see if there are any corresponding tips that match its keywords.
For designing a home-grown solution, one idea I had was to make the tip associations when a new food entry is stored like this:
user adds a food entry
after_save a callback method breaks apart the entry into keywords and searches the tips model for matches
if there's a match, it's stored in an association table, so the matching happens when new entries are created rather than when the user's food diary is rendered in the web page.
There's a performance hit on storing new entries, but it might allow the user's diary to load faster than doing all those look-ups when the diary is rendered.
Does that make sense, or is there a better way? Better yet, are there tools that can accomplish what I'm trying to do?
Thanks!
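The write-time design in the question can be sketched as follows, assuming each tip carries a list of single-word keywords. `Tip`, `build_tip_index`, `matching_tip_ids` and `link_tips` are hypothetical names; in Rails, `link_tips` is roughly what the after_save callback would do, with the array standing in for the association table.

```ruby
# Hypothetical tip record: admin-defined content plus trigger keywords.
Tip = Struct.new(:id, :content, :keywords)

# Build keyword => tip ids once (e.g. whenever an admin edits tips),
# so each saved entry costs one hash lookup per word.
def build_tip_index(tips)
  tips.each_with_object(Hash.new { |hash, kw| hash[kw] = [] }) do |tip, index|
    tip.keywords.each { |kw| index[kw.downcase] << tip.id }
  end
end

# Tip ids whose keywords appear as whole words in the entry text.
def matching_tip_ids(entry_text, tip_index)
  words = entry_text.downcase.scan(/[[:alnum:]]+/).uniq
  words.flat_map { |word| tip_index.fetch(word, []) }.uniq.sort
end

# What the after_save callback would do: append [entry_id, tip_id] rows
# to the association table (here a plain array).
def link_tips(entry_id, entry_text, tip_index, links)
  matching_tip_ids(entry_text, tip_index).each { |tip_id| links << [entry_id, tip_id] }
  links
end
```

Two limitations follow from this design: multi-word keywords such as "brown rice" never match because entries are split into single words, and tips added by an admin later are not linked to existing entries until those entries are reprocessed, for example by an offline task.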
This is not an AdWords API question, but I'll take a shot:
I would move the association table building into an offline task / cronjob. That would take care of the performance overhead when creating new entries, and users would be generally okay with a message like "Tips are being generated, please be patient" if they happen to view the topic too soon.
I'm not aware of any existing solutions, but this sounds like a hashtag system to me. Basically you have two lists (food diary entries, tips); you want to assign hashtags to both lists and then pair entries with the same hashtags. Googling for a hashtag system / library might be a good starting point.
Cheers,
Anash
I am using embedded documents in MongoDB for a Rails 3 app. I like that with embedded documents all the values are returned with one query and there is less load on the database server. But what happens if I want my users to be able to update properties that really should be shared across documents? Is this sort of operation feasible with MongoDB, or would I be better off using normal ID-based relations? If ID-based relations are the way to go, would that affect performance to a great degree?
If you need to know anything else about the application or data I would be happy to let you know what I am working with.
Document whose properties are shared across many other documents:
Person
name: string
description: string
Document that wants to use these properties:
Post
(references many people)
body: string
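To make the trade-off concrete, here is a plain-Ruby sketch with hashes standing in for MongoDB documents, using the Person and Post shapes above. The helper names are hypothetical. Embedding means a change to a shared property rewrites every post that embeds that person; referencing means one write, at the cost of a second lookup on read.

```ruby
# Embedded model: each post holds its own copy of every person it mentions,
# e.g. { body: "...", people: [{ name: "Ann", description: "..." }] }.
# Changing a shared property means rewriting every post that embeds it.
# Returns the number of post documents that had to be written.
def update_embedded_description(posts, name, new_description)
  posts.count do |post|
    matches = post[:people].select { |person| person[:name] == name }
    matches.each { |person| person[:description] = new_description }
    matches.any?
  end
end

# Referenced model: posts store ids, e.g. { body: "...", person_ids: [1] },
# and people live in one collection keyed by id. One write updates everyone.
def update_referenced_description(people, id, new_description)
  people.fetch(id)[:description] = new_description
  1 # documents written
end

# The cost of referencing: reading a post needs a second lookup.
def resolve(post, people)
  post.merge(people: post[:person_ids].map { |id| people.fetch(id) })
end
```

If you are using Mongoid, these two shapes correspond roughly to `embeds_many` versus a referenced relation such as `has_and_belongs_to_many`.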
This all depends on what you are going to do with your Person model later. I know of at least one working example (a blog using MongoDB) where the developer keeps user data inside the comments users make and uses one collection for the entire blog. Well, OK, he uses a second one for his "tag cloud" :) He just doesn't need to keep a centralized list of all commenters; he doesn't care. His blog contains consolidated data from all his previous sites/blogs, almost 6000 posts in total. Posts contain comments, comments contain users, and users have emails. He offers a "subscribe to comments" option to every user who comments on a post, authorization is handled by an external OpenID service aggregator (Loginza), and he stores the user's email from the Loginza response and keeps a "login token" in the user's cookies. So the functionality is pretty good.
So, the real question is: what are you going to do with your users later? If you really feel you need a separate collection (you're going to give users centralized control panels, have site-based registration, build user-centric features and so on), make it separate. If not, keep it simple and have fun :)
It depends on what user info you want to share across documents. Let's say you have users, and users have emails. It does not make sense to move emails into a separate collection, since there will be no more than 10, 20 or maybe 100 emails per user. But if a user has some large related data that keeps growing, like blog posts, then it makes sense to move that into a separate collection.
So the answer depends on the user document structure. If you show your user document structure and what you are planning to move into a separate collection, I will help you make a decision.