Keeping Rails data updated from Websocket without saving to database - ruby-on-rails

I have websocket price data streaming into my Rails API app, and I want to keep it updated so that any API requests get an up-to-date response. It would be too expensive to save each update to the database. How can I do this? In Ember I can modify the model and it persists; that doesn't seem to happen in Rails.
Channel controller:
def receive(message)
  # ActionCable.server.broadcast('channel', message)
  platform = Platform.find(params[:id])
  market = platform.markets.find_by(market_name: message["market_name"])
  market.attributes = {
    price: message.values["price"],
    # etc......
  }
  # market.save [this is too expensive every time]
end
Am I going about this in the right way? It also seems inefficient to use find every time I want to update, which could be multiple times per second. In Ember I created a record-id lookup array so I could quickly match the market_name; I don't see how to do this in Rails.

Persistence to some store is the only way to have other threads respond with the latest value.
Instead of 3 queries (2 selects and 1 update), you can do it with just 1 update:
Market.where(platform_id: params[:id], market_name: message["market_name"]).
  update_all(price: message.values["price"])
With a proper index, you might see sub-millisecond performance for each update.
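Such an index could be added with a migration like this (table and column names are assumed from the question):
# Hypothetical migration supporting the where clause above
class AddMarketLookupIndex < ActiveRecord::Migration
  def change
    add_index :markets, [:platform_id, :market_name]
  end
end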
Depending on your business need:
If you are getting tons of updates for a market every second (making all prior updates stale and useless), you can choose to ignore a few and not fire an update at all.
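A minimal sketch of that throttling idea, reusing the update_all query above; it assumes a shared cache store, and the half-second interval is illustrative:
THROTTLE_INTERVAL = 0.5 # seconds between persisted updates per market; tune as needed

def receive(message)
  key = "last-write/#{params[:id]}/#{message['market_name']}"
  last_write = Rails.cache.read(key)
  # Skip this update entirely if we persisted one very recently
  return if last_write && Time.current - last_write < THROTTLE_INTERVAL

  Rails.cache.write(key, Time.current)
  Market.where(platform_id: params[:id], market_name: message["market_name"]).
    update_all(price: message.values["price"])
end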

Related

Memcache data update issue

I need help optimizing Rails code that updates Memcache data.
The problem is as follows:
I have JSON data for a particular match in the following form, which I write into the cache:
match_data = { teams: ['Team one', 'Team two'], score: { team1: 0, team2: 1 }, time: 60 }
Rails.cache.write('match_123', match_data) # 123 is the match id
The above is a very small example; a match can have much more data.
The application receives match data from the service frequently, updates it in the cache, and the same cached data is used for display in the front-end.
The actual problem is that I receive the match updates, such as score and time, individually and frequently.
So I use the following code to update the cached data:
data = Rails.cache.read('match_123')
data[:time] = 120
# OR
data[:score][:team1] = 1
Rails.cache.write('match_123', data)
I am using background jobs to perform the above process.
This process runs very quickly, and we have multiple matches running together. The underlying problem is that we rewrite the complete match hash in the cache for each individual key-value change, so concurrent jobs can race. For example, with two queued jobs:
UpdateMatchCacheDataJob.perform_later(score_update)
UpdateMatchCacheDataJob.perform_later(time_update)
If these run at the same time, the cache ends up with either the time update or the score update, but not both.
This is the complexity I am facing. I have also tried performing the jobs immediately, yet still hit the issue sometimes:
UpdateMatchCacheDataJob.perform_now(score_update)
UpdateMatchCacheDataJob.perform_now(time_update)
My problem might be solved if I could update only the particular piece of data that changed, instead of rewriting the complete hash each time. Is there a way to update a particular value inside a cached hash?
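What I have in mind is something like per-field keys, so each job only rewrites the value it owns (key names here are just illustrative):
# Each job writes only its own key, so concurrent jobs cannot
# overwrite each other's changes.
Rails.cache.write('match_123:time', 120)
Rails.cache.write('match_123:score:team1', 1)

# Reassemble the hash when reading:
match_data = {
  time: Rails.cache.read('match_123:time'),
  score: {
    team1: Rails.cache.read('match_123:score:team1'),
    team2: Rails.cache.read('match_123:score:team2')
  }
}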
Note: I am using an AWS Memcache instance to store the cache data.
I hope I have explained my issue clearly; please let me know if you need anything more from my side.

Save/update multiple rows in rails

I'm currently working on saving a user's social media posts in my app. The basic idea is to check if a post exists: if it does, update the data; if not, create a new row. Right now I'm looping through all of the posts that I receive from the social platform, so potentially I'm looping through 3,000 and adding them to the database.
Is there a way that I could rewrite this to save all the items at once, which hopefully would speed up the save method?
post_data.each do |post_data_details|
  post_instance = Post::Tumblr.
    where(platform_id: platform_id).
    where("data ->> 'id' = ?", post_data_details["id"].to_s).
    first_or_initialize
  existing_data = post_instance.data
  post_instance.data = existing_data.merge(post_data_details.to_hash)
  post_instance.refreshed_at = date
  post_instance.save!
end
It is good practice to run such long-running jobs via Sidekiq or another background-job solution.
You can also use a single ActiveRecord transaction:
http://api.rubyonrails.org/classes/ActiveRecord/Transactions/ClassMethods.html
But keep in mind that if one of the records is invalid, the whole transaction will be rolled back.
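A minimal sketch of wrapping the question's loop in a single transaction (same logic as above, one commit for the whole batch):
ActiveRecord::Base.transaction do
  post_data.each do |post_data_details|
    post_instance = Post::Tumblr.
      where(platform_id: platform_id).
      where("data ->> 'id' = ?", post_data_details["id"].to_s).
      first_or_initialize
    post_instance.data = post_instance.data.merge(post_data_details.to_hash)
    post_instance.refreshed_at = date
    post_instance.save! # any failure here rolls back every row in the batch
  end
end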

Inform ember-data about server changes

I am currently planning a complex application using Ruby on Rails and Ember.js. What I have seen of ember-data so far is that it caches records automatically; post.comments will first result in an Ajax call to fetch all comments for the given post, but if the user visits the same route the next time, it will just fetch the records from the store cache.
The problem is: what if another user added a comment to this post? How do I tell Ember it has to reload its cache because something changed?
I already thought about a solution using websockets to tell clients what to reload, but I don't think this is best practice. In addition, I can't imagine this isn't a common problem, so I am wondering what other developers are doing to solve this issue.
I tried to implement model updating in an (experimental) chat application. I used SSE: ActionController::Live on the server side (Ruby on Rails) and EventSource on the client side.
Simplified code:
App.MessagesRoute = Ember.Route.extend({
  activate: function() {
    if (!this.eventSource) {
      this.eventSource = new EventSource('/messages/events');
      var self = this;
      this.eventSource.addEventListener('message', function(e) {
        var data = $.parseJSON(e.data);
        if (data.id != self.controllerFor('messages').get('savedId')) {
          self.store.createRecord('message', data);
        }
      });
    }
  }
});
App.MessagesController = Ember.ArrayController.extend({
  actions: {
    create: function() {
      var data = this.getProperties('body');
      var message = this.store.createRecord('message', data);
      var self = this;
      message.save().then(function (response) {
        self.set('savedId', response.id);
      });
    }
  }
});
The logic is simple: I get each new record from the EventSource. Then, if the record was created by another client, the application detects it and the new record is added to the store using ember-data's createRecord. I suppose this logic may have some caveats, but at least it serves well as a proof of concept. The chat is working.
Full sources available here: https://github.com/denispeplin/ember-chat/
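For completeness, the server side of that experiment could look roughly like this (my sketch, assuming the ActionController::Live setup described above; new_messages stands in for whatever yields freshly created records):
class MessagesController < ApplicationController
  include ActionController::Live

  # GET /messages/events
  def events
    response.headers['Content-Type'] = 'text/event-stream'
    sse = ActionController::Live::SSE.new(response.stream, event: 'message')
    # A real app would block on a pub/sub subscription (e.g. Redis) here;
    # new_messages is a hypothetical source of records to stream.
    new_messages.each { |message| sse.write(message.as_json) }
  ensure
    sse.close
  end
end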
I have something to say about reloading: you probably don't want to perform a full reload, as it's a resource-consuming operation. Still, your client side needs some way to know about new records, so getting new records one-by-one via SSE is probably the best option.
If you just want to get rid of caching, you can force a reload every time the user navigates to the comments route. But this largely depends on what you are trying to achieve; I hope comments is just an example.
If you want your UI to update automagically with changes on the server, you need some communication with the server, such as websockets or polling from a web worker. Then you can reload the list of changed records sent from the server. You are probably on the right track with this.
You can also take a look at orbitjs, a standalone library that integrates well with Ember. It is more useful if you require local storage as well and have to manage multiple data sources.
This is really a common problem with any web application, no matter what framework you are using. From my point of view, there are two main options. One: have a service that polls the server to check whether there are any changes that would require you to reload some of your models, have that service return those model IDs, and refresh them. The other option is, as you suggested, using a websocket and pushing notifications of model changes/new models themselves.
I would opt to just send the comment model itself and push it into the Ember store and the associated post object. This reduces the need to hit the server with a hard refresh of your models. I am currently using this method with my Ember app, where there is an object that contains overview data based on all the models in my app, and when a change is made in the backend, my websocket server pushes the new overview data to my application.
UPDATE: I meant for this to be a comment, not an answer, oh well.
I've had this same issue with mobile app development. While websockets seemed like the first answer, I was worried about scalability issues with limited server resources, so I decided to stick with the Ajax call to fetch newly modified records. This way server resources are only used if the user is active. However, as others pointed out, returning all comments every single time you need data makes your caching useless and is slow. I suggest updating your Rails server to accept an optional timestamp. If the timestamp is not supplied, every comment is retrieved. If a timestamp is supplied, only return comments where the updated_at column is >= the supplied timestamp. This way, if no comments were added or modified since your last call, you quickly get back an empty list and can move on. If results are returned, you can merge them with your existing list and show the updated comments.
Example of fetching newly created or modified comments
if params.has_key?(:updated_since)
  comments = Post.find(params[:id]).comments.where("updated_at >= ?", params[:updated_since])
else
  comments = Post.find(params[:id]).comments
end

How does Rails 4 Russian doll caching prevent stampedes?

I am looking for information on how the caching mechanism in Rails 4 protects against multiple users trying to regenerate cache keys at once, aka a cache stampede: http://en.wikipedia.org/wiki/Cache_stampede
I've not been able to find much information via Googling. If I look at other systems (such as Drupal), cache stampede prevention is implemented via a semaphores table in the database.
Rails does not have a built-in mechanism to prevent cache stampedes.
According to the README for atomic_mem_cache_store (a replacement for ActiveSupport::Cache::MemCacheStore that mitigates cache stampedes):
Rails (and any framework relying on active support cache store) does
not offer any built-in solution to this problem
Unfortunately, I'm guessing that this gem won't solve your problem either. It supports fragment caching, but it only works with time-based expiration.
Read more about it here:
https://github.com/nel/atomic_mem_cache_store
Update and possible solution:
I thought about this a bit more and came up with what seems to me to be a plausible solution. I haven't verified that this works, and there are probably better ways to do it, but I was trying to think of the smallest change that would mitigate the majority of the problem.
I assume you're doing something like cache model do in your templates as described by DHH (http://37signals.com/svn/posts/3113-how-key-based-cache-expiration-works). The problem is that when the model's updated_at column changes, the cache_key likewise changes, and all your servers try to re-create the template at the same time. In order to prevent the servers from stampeding, you would need to retain the old cache_key for a brief time.
You might be able to do this by (dum da dum) caching the cache_key of the object with a short expiration (say, 1 second) and a race_condition_ttl.
You could create a module like this and include it in your models:
module StampedeAvoider
  def cache_key
    orig_cache_key = super
    Rails.cache.fetch("/cache-keys/#{self.class.table_name}/#{self.id}",
                      expires_in: 1, race_condition_ttl: 2) { orig_cache_key }
  end
end
Let's review what would happen. There are a bunch of servers calling cache model. If your model includes StampedeAvoider, then its cache_key will now be fetching /cache-keys/models/1, and returning something like /models/1-111 (where 111 is the timestamp), which cache will use to fetch the compiled template fragment.
When you update the model, model.cache_key will begin returning /models/1-222 (assuming 222 is the new timestamp), but for the first second after that, cache will keep seeing /models/1-111, since that is what is returned by cache_key. Once 1 second passes, all of the servers will get a cache-miss on /cache-keys/models/1 and will try to regenerate it. If they all recreated it immediately, it would defeat the point of overriding cache_key. But because we set race_condition_ttl to 2, all of the servers except for the first will be delayed for 2 seconds, during which time they will continue to fetch the old cached template based on the old cache key. Once the 2 seconds have passed, fetch will begin returning the new cache key (which will have been updated by the first thread which tried to read/update /cache-keys/models/1) and they will get a cache hit, returning the template compiled by that first thread.
Ta-da! Stampede averted.
Note that if you did this, you would be doing twice as many cache reads, but depending on how common stampedes are, it could be worth it.
I haven't tested this. If you try it, please let me know how it goes :)
The :race_condition_ttl setting in ActiveSupport::Cache::Store#fetch should help avoid this problem. As the documentation says:
Setting :race_condition_ttl is very useful in situations where a cache entry is used very frequently and is under heavy load. If a cache expires and due to heavy load seven different processes will try to read data natively and then they all will try to write to cache. To avoid that case the first process to find an expired cache entry will bump the cache expiration time by the value set in :race_condition_ttl. Yes, this process is extending the time for a stale value by another few seconds. Because of extended life of the previous cache, other processes will continue to use slightly stale data for a just a bit longer. In the meantime that first process will go ahead and will write into cache the new value. After that all the processes will start getting new value. The key is to keep :race_condition_ttl small.
Great question. A partial answer that applies to single multi-threaded Rails servers, but not to multiprocess(or) environments (thanks to Nick Urban for drawing this distinction), is that the ActionView template compilation code blocks on a per-template mutex. See line 230 in template.rb here. Notice there is a check for completed compilation both before grabbing the lock and after.
The effect is to serialize attempts to compile the same template, where only the first will actually do the compilation and the rest will get the already completed result.
Very interesting question. I searched on Google (you get more results if you search for "dog pile" instead of "stampede") but, like you, I did not get any answers, except this one blog post: protecting from dogpile using memcache.
Basically it stores your fragment in two keys: key:timestamp (where timestamp would be updated_at for ActiveRecord objects) and key:last.
def custom_write_dogpile(key, timestamp, fragment, options)
  Rails.cache.write(key + ':' + timestamp.to_s, fragment)
  Rails.cache.write(key + ':last', fragment)
  Rails.cache.delete(key + ':refresh-thread')
  fragment
end
Now, when reading from the cache and the requested fragment does not exist, it will try to fetch the key:last fragment instead:
def custom_read_dogpile(key, timestamp, options)
  result = Rails.cache.read(key + ':' + timestamp.to_s)
  if result.blank?
    # Use a short-lived counter so only the first thread regenerates the cache
    Rails.cache.write(key + ':refresh-thread', 0, raw: true, unless_exist: true, expires_in: 5.seconds)
    if Rails.cache.increment(key + ':refresh-thread') == 1
      # The cache didn't exist; this thread will regenerate it
      result = nil
    else
      # Fetch the last cache, as the new one has not been created yet
      result = Rails.cache.read(key + ':last')
    end
  end
  result
end
This is a simplified summary of the blog post by Moshe Bergman that I linked to before.
There is no protection against memcache stampedes. This is a real problem when multiple machines are involved, with multiple processes on those multiple machines. Ouch.
The problem is compounded when one of the key processes has "died", leaving any "locking"... locked.
In order to prevent stampedes you have to re-compute the data before it expires. So, if your data is valid for 10 minutes, you need to regenerate it at the 5th minute and re-set it with a new expiration for 10 more minutes. Thus you don't wait until the data expires to set it again.
You should also not allow your data to expire at the 10-minute mark, but re-compute it every 5 minutes so that it never expires. :)
You can use wget & cron to periodically call the code.
I recommend using Redis, which will allow you to save the data and reload it in the event of a crash.
-daniel
A reasonable strategy would be to:
use a :race_condition_ttl of at least the expected time it takes to refresh the resource. Setting it to less time than the refresh is expected to take is not advisable, as the angry mob will end up trying to refresh it, resulting in a stampede.
use an :expires_in time calculated as the maximum acceptable expiry time minus the :race_condition_ttl, to allow the resource to be refreshed by a single worker while avoiding a stampede.
Using the above strategy ensures that you don't exceed your expiry/staleness deadline and also avoid a stampede. It works because only one worker gets through to refresh, while the angry mob is held off using the cached value (with the race_condition_ttl extension) right up to the originally intended expiry time.
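A minimal sketch of that strategy, assuming a 10-minute staleness deadline and a refresh that takes up to 30 seconds (the key and method names are illustrative):
# expires_in = staleness deadline minus race_condition_ttl
Rails.cache.fetch('expensive/report',
                  expires_in: 10.minutes - 30.seconds,
                  race_condition_ttl: 30.seconds) do
  generate_expensive_report # hypothetical slow refresh
end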

How to efficiently update many ShopifyAPI::Product instances using ShopifyAPI (Ruby on Rails)?

I am writing an app which will sit between a vendor's proprietary inventory management system and their Shopify shop. The app will periodically update Shopify from new data generated by the inventory management system. It will also provide endpoints for Shopify webhooks.
I am currently doing something like this (pseudo-ruby with much stuff omitted):
def update_product_with_proxy(product_proxy)
  product_proxy.variant_proxies.dirty.each do |variant_proxy|
    update_variant_with_proxy(variant_proxy)
  end
  if product_proxy.dirty_proxy
    shopify_product = ShopifyAPI::Product.find(product_proxy.shopify_id)
    shopify_product.update_attributes({some attributes here})
  end
end
Elsewhere:
def update_variant_with_proxy(variant_proxy)
  shopify_variant = ShopifyAPI::Variant.find(variant_proxy.shopify_id)
  shopify_variant.update_attributes({some attributes here})
end
This seems terribly inefficient, as I have to fetch each updated ShopifyAPI::Product and ShopifyAPI::Variant before I can update them (I have their ids cached locally). It takes about 25 minutes for an update cycle covering 24 products, each with 16 variants. Rails spends less than 2 seconds updating my product/variant proxies; the other 99% of the time is spent talking to Shopify. I must be doing something wrong.
Given that I know the id of the remote object, is there a way to update it directly without having to fetch it first?
cheers,
-tomek
First things first: you can update variants through their parent product. Once you've grabbed the product, it'll have the variant info with it, so you can edit the variants, save, and the changes will be persisted in a single API call. That'll save you some time.
Second: you can create an object locally using the gem, give it an id, and then call save to initiate the PUT request without first fetching the object from Shopify. Something like this should do the trick:
product = ShopifyAPI::Product.new(:id => 1, :title => "My new title")
product.save
Putting those two things together should give you what you want: the ability to update a product's variants in a single API call.
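A sketch of the combined approach applied to the question's proxy objects (the variant attributes shown are my assumption):
def update_product_with_proxy(product_proxy)
  # Build the product locally with its known id; no GET needed before the PUT.
  product = ShopifyAPI::Product.new(
    id: product_proxy.shopify_id,
    variants: product_proxy.variant_proxies.dirty.map do |variant_proxy|
      { id: variant_proxy.shopify_id, price: variant_proxy.price } # illustrative attributes
    end
  )
  product.save # one PUT persists the product and all changed variants
end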
Note: For future reference, the shopify_api gem is built on Active Resource, so anything you can do with that library you can do with the gem.
