I'm accessing Solr in a Ruby on Rails application by using rsolr (not Sunspot). I create the local solr object that I use to send requests like this:
solr = RSolr.connect(:url => "http://localhost:8983/solr")
As far as I understand, this is not really a connection but just an object that will issue requests on demand, so it shouldn't be expensive to keep initialized, and it should never disconnect. Given that, it should be fine to have one global solr object, created at start time and then forgotten about. Right? But maybe it's not thread safe?
When should I create the solr connection?
All that the RSolr.connect method really does is sanitize and save the options that you're using. You can see that method here. It's passed a new connection object (which, notably, doesn't have an initialize method, so it's not doing anything when created) and the options that you pass to RSolr.connect.
So yes, you're right -- no harm at all in creating it once and leaving it hanging around in a variable somewhere. (For example, I memoize the result of RSolr.connect in my Solr/Rails app.)
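That memoization might look something like this (a minimal sketch; the helper name and where you put it are illustrative, not anything RSolr prescribes):

def solr
  # Lazily build the client once and reuse it; RSolr.connect only stores
  # options, so there is no connection that can go stale.
  @solr ||= RSolr.connect(:url => "http://localhost:8983/solr")
end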
Related
I'm working on a Rails 7 app with some pretty tight response time SLAs. I am well within SLA during normal runtime. Where I fall painfully short is the first request. I've added an initializer that loads up ActiveRecord and makes sure all of my DB models are loaded. It hydrates various memory caches, etc. This took me pretty far: my first response time was reduced by about 60%. However, I've been trying to figure out a couple of things that are still slowing down first response time.
The first API request does a check to see if I need to do a Rails migration. I've not figured out how to move this check to init.
The first API request appears to be using a fresh DB pool, not the one that was used in the init phase. I've tried fully hydrating the pool to spare the API from creating connections when Rails kicks on, but I've not figured it out.
In an initializer I can do something like this:
connections = []
ActiveRecord::Base.connection.pool.size.times do
  connections << ActiveRecord::Base.connection.pool.checkout
end
connections.each { ActiveRecord::Base.connection.pool.checkin(_1) }
According to my PG logs, this opens the connections, and Rails runs all of its type queries, sets session properties, etc. However, when I go to fire off my first API call, my pool is empty.
In the end, the general issue was that I needed to hydrate the pool with the correct connections. The on_worker_boot hook is used because this is running behind Puma.
on_worker_boot do
  ActiveRecord::Base.connected_to(role: :reading) do
    # Spin up DB connections. ActiveRecord::Base.connection already holds one
    # connection for this thread, so only size - 1 are left to check out.
    connections = []
    (ActiveRecord::Base.connection.pool.size - 1).times do
      connections << ActiveRecord::Base.connection.pool.checkout
    end
    connections.each { |x| ActiveRecord::Base.connection.pool.checkin(x) }
  end
end
I'm working on a Rails-based web backend, and I've run into a bit of an issue. I'm building a crypto trading application, which relies on knowing the exact current price of many cryptos/stocks. To do this I seem to need a websocket to update certain data; however, I can't figure out how to store this data. I need to be able to write to it on every websocket update, as well as read from it when sending out data to the front end. Both of these actions seem too fast to rely on my database, so I'm wondering if there is a better option. My idea was to use a class with a class method that is set on server startup, then read from/write to that method when needed. The class looks something like this:
class CryptoSocket
  def self.start
    @@cryptos = {
      BTC: 0,
      ETH: 0,
      DOGE: 0
    }
  end

  def self.value(symbol)
    @@cryptos[symbol]
  end
end
Inside the start method a websocket gets opened, and on each message it writes the updated value of the coin to @@cryptos. I call CryptoSocket.start when the server boots up.
To get the value for a symbol I can just call CryptoSocket.value(symbol) anywhere in my app. This seemed to be working; however, I've noticed it sometimes fails with NameError: uninitialized class variable @@cryptos in CryptoSocket.
It seems like the issue is running reload! in the Rails console, or entering a binding.pry and then exiting. My guess is some garbage collection is happening, but overall it's something I'd like to avoid.
Does anyone have a suggestion for a better way to set this class up? Does Rails have a better way to persist an object in memory? It's fine to lose it when the server shuts down, but I would like to keep access to it as long as the server stays up.
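One common pattern here (a minimal sketch under my own assumptions -- the module and method names are illustrative, nothing below comes from the question's codebase) is a store that is initialized lazily and guarded by a Mutex, so a reloaded class never trips over an uninitialized class variable and concurrent reads/writes stay safe:

module CryptoStore
  MUTEX = Mutex.new

  # Lazily (re)build the hash, so there is never an uninitialized variable.
  def self.store
    @cryptos ||= { BTC: 0, ETH: 0, DOGE: 0 }
  end

  def self.write(symbol, price)
    MUTEX.synchronize { store[symbol] = price }
  end

  def self.value(symbol)
    MUTEX.synchronize { store[symbol] }
  end
end

In development, a class like this should live outside the autoload paths and be required explicitly, so reload! doesn't replace it and drop its state.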
I have a Sneakers worker (given below) as the backend of a chatbot.
class RabbitMQWorker
  include Sneakers::Worker
  from_queue "message"

  def work(options)
    # symbolize_names so the symbol lookups below actually match
    parsed_options = JSON.parse(options, symbolize_names: true)

    # Initializing some instance variables
    @question_id = parsed_options[:question_id]
    @answer = parsed_options[:answer]
    @session_id = parsed_options[:session_id]

    ActiveRecord::Base.connection_pool.with_connection do
      # send next question to the session_id based on answer
    end

    ack!
  end
end
What's happening
The problem I am facing is that when I run Sneakers with more than one thread and multiple users are chatting at the same time, the AMQP event that arrives slightly later overrides @session_id, and as a result the second user gets two questions and the first one gets none. This happens because, by the time the first event is being processed, the second event has arrived and overridden @session_id. Now when it's time to send the next question to the first user via @session_id, the question gets sent to the second user.
My Questions
Do the work method and any instance variables I create in it act like global mutable data for Sneakers' threads?
If yes, then I'm guessing I need to make them thread-local variables. If I do that, do I need to make these changes deep down in my Rails logic as well? This worker works with Rails.
Curiosity question
How does Puma manage these things? It is a multi-threaded app server, we use instance variables in controllers, and it manages to serve multiple requests simultaneously. Does that mean Puma handles this multi-contexting implicitly and Sneakers doesn't?
What I have done till now
I read the Sneakers documentation and couldn't find anything regarding this.
I performed a load test to verify the problem, and it is exactly as I stated above.
I tried to get clear on how multi-threading actually works here, but everywhere there is only general material. The curiosity question I asked above would help a lot in clearing up the concepts; I've been searching for an explanation of it for days but couldn't find any.
After 2 days of searching for an issue where messages seemed to get mixed up, I was finally able to solve this by removing all instance variables from my workers.
This thread gave me the clue to do so: https://github.com/jondot/sneakers/issues/244
maybe we should simply disallow instance variables in workers since
changing the behavior to instantiate multiple worker instances might
break existing code somehow
and:
I think that an instance per thread is the way to go.
So when you remove your instance variables you should be fine!
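A minimal sketch of what that looks like for the worker above (the JSON handling is carried over from the question):

class RabbitMQWorker
  include Sneakers::Worker
  from_queue "message"

  def work(options)
    parsed_options = JSON.parse(options, symbolize_names: true)

    # Locals live on this thread's stack, so each message keeps its own copy
    # even when several threads run work at the same time.
    question_id = parsed_options[:question_id]
    answer = parsed_options[:answer]
    session_id = parsed_options[:session_id]

    ActiveRecord::Base.connection_pool.with_connection do
      # send next question to session_id based on answer
    end

    ack!
  end
end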
In one of my old apps, I'm using several API connectors - AWS or Mandrill, for example.
For some reason (maybe I saw it somewhere, I don't remember), I'm using class constants to initialize these objects during the application's init stage.
As example:
/initializers/mandrill.rb:
require 'mandrill'
MANDRILL = Mandrill::API.new ENV['MANDRILL_APIKEY']
Now I can access the MANDRILL constant anywhere in my application and use it (full path MyApplication::Application::MANDRILL, or just MANDRILL). It all works fine; for example:
def update_mandrill
result = MANDRILL.inbound.update_route id, pattern, url
end
The question is: is it good practice to use such class constants? Or is it better to create a new class instance in every method that uses it, as in this example:
def update_mandrill
require 'mandrill'
mandrill = Mandrill::API.new ENV['MANDRILL_APIKEY']
result = mandrill.inbound.update_route id, pattern, url
end
Interesting question.
It's a very handy approach, but it may have cons in some scenarios.
Imagine you have a constant that either takes a lot of time to initialize or loads a lot of data into memory. When its initialization takes long, you essentially degrade app boot time (which may or may not be a problem; it usually will be in development).
If it loads a lot of data into memory, that may turn out to be a problem when running rake tasks, for example, which load the entire environment. You may hit memory limits in use cases where you essentially do not need this data at all.
I know one application that loads a lot of data during boot - and it's done very deliberately. Sure, the use case is a bit uncommon, but still.
Another thing to consider: imagine you're trying to establish a connection to an external service like Mongo or anything else. If this service is unavailable (which happens), your application won't be able to boot. Maybe this service is essential for the app to work and it would be "useless" without it anyway, but it's also possible that you bring everything down just because the storage you keep logs in doesn't work.
I'm not saying you shouldn't use it as you suggested - I do it in my apps too - but you should be aware of the potential drawbacks.
Yes, pre-creating a pseudo-constant object (like that API client) is usually a good idea. However, there are, approximately, a thousand ways to go about it, and the constant is not at the top of my personal list.
These days I usually go with setting it in the env files.
# config/environments/production.rb
config.email_client = Mandrill::API.new ENV['MANDRILL_APIKEY'] # the real thing
# config/environments/test.rb
config.email_client = a_null_object # something that conforms to the same api, but does absolutely nothing
# config/environments/development.rb
config.email_client = a_dev_object # post to local smtp, or something
Then you refer to the client like this:
Rails.application.configuration.email_client
And the correct behaviour will be picked up in each env.
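For instance, the null object might be as simple as this (a sketch under my own assumptions; it just has to respond to whatever your code calls, like the inbound.update_route chain used earlier):

class NullEmailClient
  # Chain back to self so any intermediate call (like .inbound) keeps working.
  def inbound
    self
  end

  # Accept any arguments, do nothing.
  def update_route(*)
    nil
  end
end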
If I don't need this per-env variation, then I either use some kind of singleton object (EmailClient.get) or a global variable in the initializer ($email_client). It can be argued that a constant is better than a global variable, semantically and because it raises a warning when you try to re-assign it. But I like that a global variable stands out more; you see right away that it's something special. (And then again, it's only #3 on the list, so I don't do it very often.)
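That singleton could be as small as this (a sketch; EmailClient.get is the name used above, and the memoization is one assumption about how it might work):

class EmailClient
  # Build the client on first use and reuse it afterwards. Note that a plain
  # ||= can race under threads; wrap it in a Mutex if that matters to you.
  def self.get
    @client ||= Mandrill::API.new(ENV['MANDRILL_APIKEY'])
  end
end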
When you use ActiveRecord::Base.connection.execute(sql_string), should you call clear on the result in order to free memory?
At 19:09 in this podcast, the speaker (a Rails committer who has done a lot of work on Active Record) says that if we use ActiveRecord::Base.connection.execute, we should call clear on the result, or we should use the method ActiveRecord::Base.connection.execute_and_clear, which takes a block.
(He’s a bit unclear on the method names. The method for the MySQL adapter is free and the method for the Postgres adapter is clear. He also mentions release, but that method doesn't exist.)
My understanding is that he's saying we should change
result = ActiveRecord::Base.connection.execute(sql_string).to_a
process_result(result)
to
ActiveRecord::Base.connection.execute_and_clear(sql_string, "SCHEMA", []) do |result|
  process_result(result)
end
or
result = ActiveRecord::Base.connection.execute(sql_string)
process_result(result)
result.clear
That podcast was the only place I've heard this claim, and I couldn't find any other information about it. The Rails app I'm working on uses execute without clear in a number of instances, and we don't know of any problems caused by it. Are there certain circumstances under which failing to call clear is more likely to cause memory problems?
It depends on the adapter. Keep in mind that Rails doesn't control the object that is returned by execute. If you're using PostgreSQL, you'll get back a PG::Result, and using the mysql2 adapter, you'll get back a Mysql2::Result.
For PG (documented here), you need to call clear unless autoclear? returns true, or you'll get a memory leak. You may also want to call clear manually on a large result set, to make sure it doesn't cause memory pressure before it gets cleaned up.
Mysql2 doesn't appear to expose its free through the Ruby API, and appears to always clean itself up during GC.
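If you stick with plain execute on the PostgreSQL adapter, one safe pattern (a sketch; sql_string and process_result are carried over from the question) is to clear in an ensure block, so the result is freed even when processing raises:

result = ActiveRecord::Base.connection.execute(sql_string)
begin
  process_result(result)
ensure
  # Free the PG::Result's memory without waiting for GC.
  result.clear
end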