When a user tries to sign into our Rails app, I contact a 3rd-party ICAM server that returns some information about the user if they exist in the ICAM server. I get a hash back with the user name, email, etc. (Our environment is configured in a way that the ICAM server can detect the identity of the person who is attempting to sign in based on their workstation credentials.)
We do all of this work in a custom gem. During the login process, I try to cache the info the ICAM server returns so I don't have to talk to the ICAM server again. Naively, I had some code that basically did:
module Foo
  def self.store_icam_data(data)
    @icam_data = data
  end

  def self.icam_data
    @icam_data || {}
  end
end
I just discovered a problem when two users log into the system. When User A logs in, @icam_data is set with his info. When User B logs in, @icam_data is set with his info. The next time User A makes a request, @icam_data has User B's info inside it instead of User A's!
I wasn't expecting the variable inside this module to be shared between threads/sessions like it is. It effectively makes all current users of the system become the last user who signs in... a pretty gnarly bug.
Can someone explain why this @icam_data variable is getting shared across sessions? I was expecting the data/code to be more isolated than it apparently is.
There are only two ways you can share data between requests: your database (RDBMS, Redis, etc.) and the session object (inside controllers). Any other data that changes and survives the end of a request is a side effect and should be avoided.
Your class-level variables are stored in a memory (RAM) region that belongs to a particular app server process (e.g. a Unicorn worker process). A single process naturally serves many requests, because it would be inefficient to kill and restart Rails on each request.
So it's not "Rails sharing code"; it's the web application server sharing its memory region among all the requests it serves.
If you want to bind a small amount of data to the current user, use the session:
# save
session[:icam_data] = MyICAMModule.get_icam_data
# restore
MyICAMModule.set_icam_data(session[:icam_data])
More info on sessions is available in the Action Controller Overview guide.
If you have a large amount of data, use the database.
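As a rough sketch of how that could look in practice (MyICAMModule and its set/get methods are the hypothetical names from the snippet above, not a real API; on older Rails the callback would be before_filter instead of before_action):
# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  before_action :restore_icam_data

  private

  # On the first request, fetch from the ICAM server and remember the
  # result in this user's session; on later requests, reuse the session copy.
  def restore_icam_data
    session[:icam_data] ||= MyICAMModule.get_icam_data
    MyICAMModule.set_icam_data(session[:icam_data])
  end
end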
Ruby on Rails 4.1.4
I made an interface to a Twitch gem to fetch information about the current stream, mainly whether it is online or not, but also things like the current title and the game being played.
Since the website has a lot of traffic, I can't make a request every time a user walks in, so instead I need to cache this information.
Cached information is stored as a class variable @@stream_data inside the class Twitcher.
I've made a rake task to update this using cron jobs, calling Twitcher.refresh_stream, but naturally that does not run within my active process (to which every visitor is connecting) but in a separate process instead. So the @@stream_data in the actual app is always empty.
Is there a way to run code, within my currently running Rails app, every X minutes? Or a better approach, for that matter.
Thank you for your time!
This sounds like a good use case for caching:
Rails.cache.fetch("stream_data", expires_in: 5.minutes) do
  fetch_new_data
end
If the data is in the cache and is not stale, it will be returned without executing the block; if not, the block is used to populate the cache.
The default cache store just keeps things in memory, so it doesn't fix your problem: you'll need to pick a cache store that is shared across your processes. Both Redis and Memcached (via the dalli gem) are popular choices.
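For example, a minimal sketch of switching to a shared store (the host name is a placeholder; the Memcached store shown is built into Rails and uses dalli under the hood, while a Redis-backed store would need an extra gem):
# config/environments/production.rb
# Memcached via the dalli gem, shared by all app server processes:
config.cache_store = :mem_cache_store, "localhost:11211"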
Check out Whenever (basically a ruby interface to cron) to invoke something on a regular schedule.
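If you go the Whenever route, the schedule might look roughly like this (the twitch:refresh_stream task name is just an assumed name for your existing rake task):
# config/schedule.rb (used by the whenever gem to generate crontab entries)
every 5.minutes do
  rake "twitch:refresh_stream"
end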
I actually had a similar problem when using Google Analytics. Google Analytics requires that you have an API key for each request, but the API key would expire every hour. If you requested a new API key for every Google Analytics request, it'd be very slow per request.
So what I did was make another class variable @@expires_at. Now in every method that made a request to Google Analytics, I would check @@expires_at.past?. If it was true, then I would refresh the API key and set @@expires_at = 45.minutes.from_now.
You can do something like this.
def method_that_needs_stream_data
  renew_data if @@expires_at.past?
  # use @@stream_data
end

def renew_data
  # renew @@stream_data here
  @@expires_at = 5.minutes.from_now
end
Tell me how it goes.
I have a Rails app where every user can connect his Facebook account and give permission to send messages from the app he is using. So every logged-in user with a connected Facebook account must have one Jabber client authorized with his Facebook id, token, etc. I'm doing it with the xmpp4r gem.
The connected Facebook account, with its token and Facebook data, is stored in the database as a Mailman object. The Mailman class also has methods to control the Jabber client, like run_client, connect_client, authorize_client, stop_client, get_client, etc. The most important methods for me are connect_client and get_client.
class Mailman < ActiveRecord::Base
  @@clients = {} unless defined? @@clients

  def connect_client
    #some code
    @@clients[self.id] = Jabber::Client.new Jabber::JID.new(facebook_chat_id)
    #some code
  end

  def get_client
    @@clients[self.id]
  end

  #other stuff
end
As you can see in the code, every Mailman object has a get_client method which should return a Jabber::Client object, and it's true, it does work, but only within the scope of the running application, because the @@clients variable is stored only in that specific running app.
This is a problem for me because I would like to use a cron task to close idle clients, and the cron task uses a different initialization of the app, so Mailman.find(x).get_client will always return nil, even though it returns a Jabber::Client object in the production app.
How do you deal with such issues? For example, is it possible to get a pointer to the memory for a Jabber::Client object and save it to the database, so any other initialization of the app could use it? I have no idea how to achieve that. Thank you for any advice!
Even if you manage to store a "pointer to memory" in your database, it will be of no use to a cron job. The cron job is started as a new process, and the OS ensures that it won't have access to the memory space of any other process.
The best way is to create a controller to manage your running XMPP clients. This will provide a RESTful API to your cron job, allowing you to terminate idle clients using HTTP requests.
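A rough sketch of what such an endpoint could look like (the route, controller name, and lack of authentication are assumptions you would adapt; stop_client is the method your Mailman class already has):
# config/routes.rb (hypothetical route)
#   delete "mailmen/:id/client", to: "jabber_clients#destroy"

class JabberClientsController < ApplicationController
  # Called by the cron job over HTTP, e.g.:
  #   curl -X DELETE http://localhost:3000/mailmen/42/client
  # so the code runs inside the process that actually holds @@clients.
  def destroy
    mailman = Mailman.find(params[:id])
    mailman.stop_client if mailman.get_client
    head :ok
  end
end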
I'm considering using Amazon RDS with read replicas to scale our database.
Some of our controllers in our web application are read/write and some are read-only. We already have an automated way of identifying which controllers are read-only, so my first approach would have been to open a connection to the master when requesting a read/write controller, and a connection to a read replica when requesting a read-only controller.
In theory, that sounds good. But then I stumbled upon the concept of replication lag, which basically says that a replica can be several seconds behind the master.
Let's imagine the following use case then:
The browser posts to /create-account, which is read/write, thus connecting to the master
The account is created, transaction committed, and the browser gets redirected to /member-area
The browser opens /member-area, which is read-only, thus connecting to a replica. If the replica is even slightly behind the master, the user account might not exist yet on the replica, thus resulting in an error.
How do you realistically use read replicas in your application, to avoid these potential issues?
I worked with an application which used pseudo-vertical partitioning. Since only a handful of data was time-sensitive, the application usually fetched from the slaves, and from the master only in selected cases.
As an example: when a user updated their password, the application would always ask the master during the authentication prompt. When changing non-time-sensitive data (like user preferences), it would display a success dialog along with a note that it might take a while until everything is updated.
Some other ideas, which might or might not work depending on the environment:
After an update, compute an entity checksum, store it in the application cache, and when fetching the data always check it against that checksum
Use a browser store/cookie for storing the delta, ensuring the user always sees the latest version
Add an "up-to-date" flag and invalidate it synchronously on every slave node before/after an update
Whatever solution you choose, keep in mind it's subject to the CAP theorem.
This is a hard problem, and there are lots of potential solutions. One potential solution is to look at what Facebook did:
TL;DR: read requests get routed to the read-only copy, but if you do a write, then for the next 20 seconds all your reads go to the writeable master.
The other main problem we had to address was that only our master databases in California could accept write operations. This fact meant we needed to avoid serving pages that did database writes from Virginia because each one would have to cross the country to our master databases in California. Fortunately, our most frequently accessed pages (home page, profiles, photo pages) don't do any writes under normal operation. The problem thus boiled down to, when a user makes a request for a page, how do we decide if it is "safe" to send to Virginia or if it must be routed to California?
This question turned out to have a relatively straightforward answer. One of the first servers a user request to Facebook hits is called a load balancer; this machine's primary responsibility is picking a web server to handle the request but it also serves a number of other purposes: protecting against denial of service attacks and multiplexing user connections to name a few. This load balancer has the capability to run in Layer 7 mode where it can examine the URI a user is requesting and make routing decisions based on that information. This feature meant it was easy to tell the load balancer about our "safe" pages and it could decide whether to send the request to Virginia or California based on the page name and the user's location.
There is another wrinkle to this problem, however. Let's say you go to editprofile.php to change your hometown. This page isn't marked as safe so it gets routed to California and you make the change. Then you go to view your profile and, since it is a safe page, we send you to Virginia. Because of the replication lag we mentioned earlier, however, you might not see the change you just made! This experience is very confusing for a user and also leads to double posting. We got around this concern by setting a cookie in your browser with the current time whenever you write something to our databases. The load balancer also looks for that cookie and, if it notices that you wrote something within 20 seconds, will unconditionally send you to California. Then when 20 seconds have passed and we're certain the data has replicated to Virginia, we'll allow you to go back for safe pages.
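In a Rails app you could approximate that cookie trick at the application level rather than at the load balancer. Here is a minimal sketch under some assumptions: the cookie name and the 20-second window are arbitrary, and the connected_to role switching requires the Rails 6+ multi-database support (on older Rails you would lean on a replica-routing gem such as Octopus or Makara instead):
# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  STICKY_MASTER_WINDOW = 20 # seconds, mirroring Facebook's 20-second rule

  after_action :remember_write_time
  around_action :route_reads

  private

  # Record the time of the last write so subsequent reads stay on the master.
  def remember_write_time
    cookies[:last_write_at] = Time.current.to_i.to_s unless request.get? || request.head?
  end

  def recently_wrote?
    cookies[:last_write_at].present? &&
      Time.current.to_i - cookies[:last_write_at].to_i < STICKY_MASTER_WINDOW
  end

  # Send read-only (GET) requests to the replica unless this user just wrote.
  def route_reads
    if request.get? && !recently_wrote?
      ActiveRecord::Base.connected_to(role: :reading) { yield }
    else
      yield
    end
  end
end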
I need to know whether a user is online or not; not the currently authenticated user, but any other user. Does restful_authentication have any methods to determine this?
The only way to know if somebody is online (and viewing your site) is to have an open connection to her. But, after a page is loaded the HTTP connection to your server is closed, and the client does not talk to your server again until she wants to see another page.
Given this, you can do the following:
1. Take a guess whether the client is still there by assuming that she's "online" for x minutes after she has last requested a page.
2. Have the client send a "heartbeat" to your server at regular intervals using JavaScript, then apply 1.
3. Actually have the client keep a persistent connection to your server using something like Comet, which will give you the most accurate result.
What's most appropriate depends on how accurate you need the status to be.
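A minimal Ruby-side sketch of options 1 and 2 (the last_seen_at column, the 5-minute window, and the idea of a lightweight endpoint for the JavaScript heartbeat to ping are all assumptions):
# Assumed migration: add_column :users, :last_seen_at, :datetime

class ApplicationController < ActionController::Base
  before_filter :touch_last_seen # before_action on newer Rails

  private

  # Every request from a logged-in user counts as a heartbeat; client-side
  # JavaScript could simply hit any cheap action every couple of minutes.
  def touch_last_seen
    current_user.update_attribute(:last_seen_at, Time.now) if current_user
  end
end

class User < ActiveRecord::Base
  # Consider the user online if we heard from them in the last 5 minutes.
  def online?
    last_seen_at.present? && last_seen_at > 5.minutes.ago
  end
end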
If your users log in, then you could add a column to your users table like
rails g migration add_online_to_users online:boolean
and every time a user starts a new session you could set that column to true:
user.update_attribute(:online, true)
and when the user logs out you could set
user.update_attribute(:online, false)
in your sessions controller's destroy method.
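Putting that together, a sketch of the sessions controller (the redirects and the way current_user is set are assumptions based on a typical restful_authentication-style setup):
class SessionsController < ApplicationController
  def create
    # ... authenticate and set up the session as usual ...
    current_user.update_attribute(:online, true)
    redirect_to root_path
  end

  def destroy
    current_user.update_attribute(:online, false)
    # ... clear the session ...
    redirect_to login_path
  end
end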
I was wondering if anyone knew of a way to manage bandwidth within a Rails application, in a way that isn't dependent on the web server. For example, each account has a bandwidth limit, and inbound and outbound traffic subtracts from the monthly allowance.
One option would be to add an after_filter to your ApplicationController (so that it applies to all actions) and do the following:
def store_bandwidth_usage
  response_size = response.body.size
  # Assuming the User model has a bandwidth_usage attribute
  @current_user.increment!(:bandwidth_usage, response_size)
end
Of course, you would then need a before_filter which checks that the user has not gone over their allocated bandwidth; otherwise they should be denied access.
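For instance, a minimal sketch of such a filter (bandwidth_limit is an assumed column on User, alongside bandwidth_usage):
before_filter :check_bandwidth_allowance

def check_bandwidth_allowance
  if @current_user && @current_user.bandwidth_usage >= @current_user.bandwidth_limit
    # Deny the request without serving (and counting) a full page
    render text: "Monthly bandwidth allowance exceeded", status: :forbidden
  end
end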
Keep in mind that this will only be counted for requests that hit the Rails server; any requests that are served by a front-end server (e.g. images) will not be included.