Object variables in multithreaded Sneaker works like a global mutable data - ruby-on-rails

I have a sneaker worker(given below) as a backend of a chatbot.
class RabbitMQWorker
include Sneakers::Worker
from_queue "message"
def work(options)
parsed_options = JSON.parse(options)
# Initializing some object variables
#question_id = parsed_options[:question_id]
#answer = parsed_option[:answer]
#session_id = parsed_option[:session_id]
ActiveRecord::Base.connection_pool.with_connection do
# send next question to the session_id based on answer
end
ack!
end
end
What's happening
The problem I am facing here is that when I run sneaker with more than 1 thread and multiple users are chatting at the same time, then the ampq event which comes slightly later cause to override the #session_id and as a result, the second user gets two questions and the first one gets none. This happens because by the time 1st event is getting process the second event came and override #session_id. Now when it's time to send the next question to the first user by using #session_id, the question get's send to the second user.
My Questions
Do the work method and any instance variables I create in it works like global mutable data for sneaker's threads?
If yes then I am guessing I need to make them as thread-local variables. If I do that, then do I need to make these changes deep down in my Rails logic as well? As this worker works with Rails.
Curiosity question
How does Puma manage these things? It is a multi-threaded app server and we use instance variables in controllers and it manages to serve multiple requests simultaneously. Does it mean that Puma handles this multi-contexting implicitly and Sneakers don't?
What I have done till now
I read the documentation of Sneaker and couldn't found anything regarding this.
I perform a load tests to verify the problem and it is the problem as I stated above.
I tried getting my logic clear on how actually multi-threading works but everywhere there is only general stuff. The curiosity question I asked above will help a lot in terms of clearing the concepts, I am searching for an explanation of it for days but couldn't found any.

After 2 days of searching for an issue where messages seemed to get mixed up I was finally able to solve this by removing all instance variables from my workers.
This thread gave me the clue to do so: https://github.com/jondot/sneakers/issues/244
maybe we should simply disallow instance variables in workers since
changing the behavior to instantiate multiple worker instances might
break existing code somehow
and:
I think that an instance per thread is the way to go.
So when you remove your instance variables you should be fine!

Related

Is there a way to have a persistent object in memory I can read/write to anywhere in a rails app?

I'm working on a rails based web backend, and I've ran into a bit of an issue. I'm building a crypto trading based application, which relies on knowing the exact current price of many cryptos/stocks. To do this I seem to need a websocket to update certain data, however I can't figure out how to store this data. I need to be able to write to it on every websocket update, as well as read from it when sending out data to the front end. Both of these actions seem too fast to rely on my database, so I'm wondering if there is a better option. My idea was to use a class with a class method that is set on server startup. Then read from/write to that method when needed. The class looks something like this
class CryptoSocket
def self.start
##cryptos = {
BTC: 0,
ETH: 0,
DOGE: 0
}
end
def self.value(symbol)
##cryptos[symbol]
end
end
Inside the start method is a websocket which gets opened, and on message writes to ##cryptos with the updated value of the coin. I call CryptoSocket.start when the server boots up
To get the value for a symbol I can just call CryptoSocket.value(symbol) anywhere in my app. This seemed like it was working, however I've noticed sometimes it fails telling me NameError: uninitialized class variable ##cryptos in CryptoSocket
It seems like the issue is running reload! in the rails console, or entering a binding.pry then exiting. My guess is some garbage collection is happening, but overall it's something I'd like to avoid.
Does anyone have a suggestion for a better way to go about setting this class up? Does rails have a better way to persist an object in memory? It's fine to lose it when the server shuts down, but I would like to keep access to it when the server stays up

Put class instance to class constant in initializers

In one of my old apps, I'm using several API connectors - like AWS or Mandill as example.
For some reason (may be I saw it somewhere, don't remember), I using class constant to initialize this objects on init stage of application.
As example:
/initializers/mandrill.rb:
require 'mandrill'
MANDRILL = Mandrill::API.new ENV['MANDRILL_APIKEY']
Now I can access MANDRILL class constant of my application in any method and use it. (full path MyApplication::Application::MANDRILL, or just MANDRILL). All working fine, example:
def update_mandrill
result = MANDRILL.inbound.update_route id, pattern, url
end
The question is: it is good practice to use such class constants? Or better create new class instance in every method that using this instance, like in example:
def update_mandrill
require 'mandrill'
mandrill = Mandrill::API.new ENV['MANDRILL_APIKEY']
result = mandrill.inbound.update_route id, pattern, url
end
Interesting question.
It's very handy approach but it may have cons in some scenarios.
Imagine you have a constant that either takes a lot of time to initialize or it loads a lot of data into memory. When its initialization takes long you essentially degrade app boot time (which may or may not be a problem, usually it will in development).
If it loads a lot of data into memory it may turn out it's gonna be a problem when running rake tasks for example which load entire environment. You may hit memory boundaries in use cases in which you essentially do not need this data at all.
I know one application which load a lot of data during boot - and it's done very deliberately. Sure, use case is a bit uncommon, but still.
Another thing to consider is - imagine, you're trying to establish connection to external service like Mongo or anything else. If this service is unavailable (what happens) your application won't be able to boot. Maybe this service is essential for app to work, and without it it would be "useless" anyway, but it's also possible that you essentially stop everything because storage in which you keeps log does not work.
I'm not saying you shouldn't use it as you suggested - I do it also in my apps, but you should be aware of potential drawbacks.
Yes, pre-creating a pseudo-constant object (like that api client) is usually a good idea. However, there is, approximately, a thousand ways go about it and the constant is not on top of my personal list.
These days I usually go with setting it in the env files.
# config/environments/production.rb
config.email_client = Mandrill::API.new ENV['MANDRILL_APIKEY'] # the real thing
# config/environments/test.rb
config.email_client = a_null_object # something that conforms to the same api, but does absolutely nothing
# config/environments/development.rb
config.email_client = a_dev_object # post to local smtp, or something
Then you refer to the client like this:
Rails.application.configuration.email_client
And the correct behaviour will be picked up in each env.
If I don't need this per-env variation, then I either use some kind of singleton object (EmailClient.get) or a global variable in the initializer ($email_client). It can be argued that a constant is better than global variable, semantically and because it raises a warning when you try to re-assign it. But I like that global variable stands out more. You see right away that it's something special. (And then again, it's only #3 in the list, so I don't do it very often.).

Instance variable in Rails for several request

I have a Ruby on Rails server for a web application.
I am wondering if instance variables in Ruby are shared on different requests. Here is my problem:
Bob makes a Get request to my ruby server; it first initiates a instance variable #x and then its request is processed for a while.
In the meantime, Alice makes another Get request that initiates a instance variable #x.
At the end of Bob's process, the variable #x is looked at. Will it have the value given by Bob or Alice?
What if Bob and Alice are the same person that quickly makes the same request twice in a row? Will the first request see the value initiated by the second request at the end of its process?
Edit: this happens when config.cache_classes is set to true and from my experiments, it doesn't concern controllers' instance variables but it do concern librairies. Here is what I have done. When an http request arrives, I set an instance variable #x to a random value and save it in a variable x. I sleep for a while and then I compare #x and x. When I launch several requests at the same time, I sometime find different values when doing the comparison. However, this mismatch doesn't happen in controllers but it does in librairies. Does it means that the controllers are not cashed but librairies are when config.cache_classes = true ?
#x is not a global variable, it's an instance variable.
No, controller instance variables are not shared between requests.
Rails 4 is the first version to be multi-threaded by default - you shouldn't run into variable concurrency issues with earlier versions unless you explicitly turned threading on in the config (config.threadsafe!).
You can still have database-related concurrency issues in single-threaded Rails: If, for example, you read from a field at the start of a request, perform some calculation on that value and then write it back.
In multi-threaded Rails you can have concurrency problems with global or class variables too, as these are stored in shared memory space. Global variables are defined with the dollar symbol $x and class variables are defined with two at symbols ##x - both have their uses but there are usually better ways to pass data around.

Catching errors with Ruby Twitter gem, caching methods using delayed_job: What am I doing wrong?

What I'm doing
I'm using the twitter gem (a Ruby wrapper for the Twitter API) in my app, which is run on Heroku. I use Heroku's Scheduler to periodically run caching tasks that use the twitter gem to, for example, update the list of retweets for a particular user. I'm also using delayed_job so scheduler calls a rake task, which calls a method that is 'delayed' (see scheduler.rake below). The method loops through "authentications" (for users who have authenticated twitter through my app) to update each authorized user's retweet cache in the app.
My question
What am I doing wrong? For example, since I'm using Heroku's Scheduler, is delayed_job redundant? Also, you can see I'm not catching (rescuing) any errors. So, if Twitter is unreachable, or if a user's auth token has expired, everything chokes. This is obviously dumb and terrible because if there's an error, the entire thing chokes and ends up creating a failed delayed_job, which causes ripple effects for my app. I can see this is bad, but I'm not sure what the best solution is. How/where should I be catching errors?
I'll put all my code (from the scheduler down to the method being called) for one of my cache methods. I'm really just hoping for a bulleted list (and maybe some code or pseudo-code) berating me for poor coding practice and telling me where I can improve things.
I have seen this SO question, which helps me a little with the begin/rescue block, but I could use more guidance on catching errors, and one the higher-level "is this a good way to do this?" plane.
Code
Heroku Scheduler job:
rake update_retweet_cache
scheduler.rake (in my app)
task :update_retweet_cache => :environment do
Tweet.delay.cache_retweets_for_all_auths
end
Tweet.rb, update_retweet_cache method:
def self.cache_retweets_for_all_auths
#authentications = Authentication.find_all_by_provider("twitter")
#authentications.each do |authentication|
authentication.user.twitter.retweeted_to_me(include_entities: true, count: 200).each do |tweet|
# Actually build the cache - this is good - removing to keep this short
end
end
end
User.rb, twitter method:
def twitter
authentication = Authentication.find_by_user_id_and_provider(self.id, "twitter")
if authentication
#twitter ||= Twitter::Client.new(:oauth_token => authentication.oauth_token, :oauth_token_secret => authentication.oauth_secret)
end
end
Note: As I was posting this, I noticed that I'm finding all "twitter" authentications in the "cache_retweets_for_all_auths" method, then calling the "User.twitter" method, which specifically limits to "twitter" authentications. This is obviously redundant, and I'll fix it.
First what is the exact error you are getting, and what do you want to happen when there is an error?
Edit:
If you just want to catch the errors and log them then the following should work.
def self.cache_retweets_for_all_auths
#authentications = Authentication.find_all_by_provider("twitter")
#authentications.each do |authentication|
being
authentication.user.twitter.retweeted_to_me(include_entities: true, count: 200).each do |tweet|
# Actually build the cache - this is good - removing to keep this short
end
rescue => e
#Either create an object where the error is log, or output it to what ever log you wish.
end
end
end
This way when it fails it will keep moving on to the next user but will still making a note of the error. Most of the time with twitter its just better to do something like this then try to do with each error on its own. I have seen so many weird things out of the twitter API, and random errors, that trying to track down every error almost always turns into a wild goose chase, though it is still good to keep track just in case.
Next for when you should use what.
You should use a scheduler when you need something to happen based on time only, delayed jobs for when its based on an user action, but the 'action' you are going to delay would take to long for a normal response. Sometimes you can just put the thing plainly in the controller also.
So in other words
The scheduler will be fine as long as the time between updates X is less then the time it will take for the update to happen, time Y.
If X < Y then you might want to look at calling the logic from the controller when each indvidual entry is accessed, isntead of trying to do them all at once. The idea being you would only update it after a certain time as passed so. You could store the last time update either on the model itself in a field like twitter_udpate_time or in a redis or memecache instance at a unquie key for the user/auth.
But if the individual update itself is still too long, then thats when you should do the above, but instead of doing the actually update, call a delayed job.
You could even set it up that it only updates or calls the delayed job after a certain number of views, to further limit stuff.
Possible Fancy Pants
Or if you want to get really fancy you could still do it as a cron job, but have a point system based on views that weights which entries should be updated. The idea being certain actions would add points to certain users, and if their points are over a certain amount you update them, and then remove their points. That way you could target the ones you think are the most important, or have the most traffic or show up in the most search results etc etc.
Next off a nick picky thing.
http://api.rubyonrails.org/classes/ActiveRecord/Batches.html
You should be using
#authentications.find_each do |authentication|
instead of
#authentications.each do |authentication|
find_each pulls in only 1000 entries at a time so if you end up with a lof of Authentications you don't end up pulling a crazy amount of entries into memory.

why class variable of Application Controller in Rails is re-initialized in different request

I have my Application Controller called McController which extends ApplicationController, and i set a class variable in McController called ##scheduler_map like below:
class McController < ApplicationController
##scheduler_map = {}
def action
...
end
private
def get_scheduler(host, port)
scheduler = ##scheduler_map[host+"_"+port]
unless scheduler
scheduler = Scheduler.create(host, port)
##scheduler_map[host+"_"+port] = scheduler
end
scheduler
end
end
but i found that from second request start on ##scheduler_map is always an empty hash, i run it in development env, could someone know the reason? is that related to the running env?
Thank you in advance.
You answered your own question :-)
Yes this is caused by the development environment (i tested it) and to be more precise the config option "config.cache_classes = false" in config/environments/development.rb
This flag will cause all classes to be reloaded at every request.
This is done because then you dont have to restart the whole server when you make a small change to your controllers.
You might want to take in consideration that what you are trying can cause HUGE memory leaks when later run in production with a lot of visits.
Every time a user visits your site it will create a new entree in that hash and never gets cleaned.
Imagine what will happen if 10.000 users have visited your site? or what about 1.000.000?
All this data is kept in the systems memory so this can take a lot of space the longer the server is online.
Also, i'm not really sure this solution will work on a production server.
The server will create multiple threats to handle a lot of visitors on the same time.
I think (not sure) that every threat will have his own instances of the classes.
This means that in treat 1 the schedule map for ip xx might exist but in treat 2 it doesn't.
If you give me some more information about what this scheduler is i might be able to give a suggestion for a different solution.

Resources