I'm having a problem with my Rails application where random queries take around 5 seconds or longer to finish. Most of the time the queries are very simple (SELECT * FROM x WHERE id = ?) and the columns involved are indexed.
Here's some more information about the setup:
Puma 3.5.0 behind an nginx reverse proxy
4 workers with minimum 4, max 8 threads each.
Ruby v2.2.3, Rails v4.2.4
PostgreSQL 9.4 database
Database connection pool set to a max of 60 connections
Appsignal for monitoring
8 GB RAM, 4 CPUs, SSD.
I found this out when looking at query performance in Appsignal. I noticed most queries finishing in a few ms, and then every now and then, still in the same request, multiple queries take 5+ seconds to finish. The odd part is that it ALWAYS takes 5.x seconds.
Here's a picture of that in action:
Things I've tried:
Increased the database connection pool to make sure the Puma worker threads have enough connection objects.
Set reaping_frequency to 10s to make sure no dead connections are being used.
Increased the number of Puma workers/threads.
I also notice this in the application itself: some pages take a long time to load (one action calls a function that takes about 1 minute to finish), and somehow this blocks new requests. That is strange to me, because with 4 workers of 8 threads each there should be 32 threads available to handle the other requests.
I ran an explain on the query in the picture above, this is the output:
Limit  (cost=0.28..8.30 rows=1 width=150)
  ->  Index Scan using index_addresses_on_addressable_id_and_addressable_type on addresses  (cost=0.28..8.30 rows=1 width=150)
        Index Cond: ((addressable_id = 1) AND ((addressable_type)::text = 'University'::text))
        Filter: (deleted_at IS NULL)
Total query runtime: 13 ms
And this is the schema of the addresses table:
# Table name: addresses
#
#  id               :integer          not null, primary key
#  street           :string
#  zip_code         :string
#  city             :string
#  country          :string
#  addressable_id   :integer
#  addressable_type :string
#  created_at       :datetime         not null
#  updated_at       :datetime         not null
#  street_number    :string
#  latitude         :float
#  longitude        :float
#  mobile           :string
#  phone            :string
#  email            :string
#  deleted_at       :datetime
#  name             :string
Here's my Puma config file:
#!/usr/bin/env puma
directory '/home/deployer/apps/qeystate/current'
rackup "/home/deployer/apps/qeystate/current/config.ru"
environment 'staging'
pidfile "/home/deployer/apps/qeystate/shared/tmp/pids/puma.pid"
state_path "/home/deployer/apps/qeystate/shared/tmp/pids/puma.state"
stdout_redirect '/home/deployer/apps/qeystate/shared/log/puma_access.log', '/home/deployer/apps/qeystate/shared/log/puma_error.log', true
threads 4,8
bind 'unix:///home/deployer/apps/qeystate/shared/tmp/sockets/puma.sock'
workers 4
preload_app!
prune_bundler
on_worker_boot do
  ActiveSupport.on_load(:active_record) do
    ActiveRecord::Base.establish_connection
  end
end

before_fork do
  ActiveRecord::Base.connection_pool.disconnect!
end
I would suggest a few things: two possible solutions, one testing/reproduction approach, and one suggestion for deeper metrics.
1) Possible quick solution: Spin off that 1-minute job so it is non-blocking, and see if the problem resolves from that. Redis+Sidekiq (or something similar) is pretty simple to get up and running; see the sketch after this list.
2) Second possible solution: Look for any full-table locks or exclusive row locks being taken in Postgres. Check whether you EVER take a full-table lock, and if so, find the offending statement and eliminate it (a quick query for spotting blocked lock requests also follows this list).
3) Testing/replication: See if you can replicate this problem outside of production. I'd recommend JMeter as a very useful tool to simulate a lot of requests, and requests of different types, and see if you can reproduce this in a controlled/staging context. Consistent replication is the key to resolving issues like this. Refer to your production server logs around the time the issue occurs to build the JMeter test requests that will hopefully reproduce it.
If you can figure out a way to replicate it, then you can start tuning the simulation to see whether removing or increasing/decreasing various requests eliminates the problem or changes it in some way.
4) Analytics: Install NewRelic or a similar APM gem to get deeper insight into what's going on when that request comes in. You really want a solid picture of whether the request is truly blocked in Postgres (by an exclusive row/table lock blocking your query), backed up behind a slow-running request in Puma's queue, or stuck in some unfortunate wait state inside Ruby.
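For item 1, a minimal Sidekiq sketch of what that might look like. The worker class name, the SlowReport.generate! call, and the argument are placeholders for your actual slow code, not anything from your app:

class SlowReportWorker
  include Sidekiq::Worker

  def perform(record_id)
    # the ~1 minute piece of work, now running outside the request thread
    SlowReport.generate!(record_id)
  end
end

# in the controller action, enqueue instead of doing the work inline:
SlowReportWorker.perform_async(record.id)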
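For item 2, a quick way to spot blocked lock requests from a Rails console while a slow request is in flight. This is just a plain pg_locks query run through ActiveRecord, nothing app-specific assumed:

waiting = ActiveRecord::Base.connection.select_all(<<-SQL)
  SELECT pid, locktype, mode, relation::regclass AS relation
  FROM pg_locks
  WHERE NOT granted
SQL
waiting.each { |row| puts row.inspect }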
You don't yet have enough information to solve this issue, so you really want to start exploring solutions by collecting data, alongside hypotheses of what's happening.
My general strategy for this kind of problem is (in this order):
Try some quick/easy/safe fixes (in the blind) and see if anything is resolved/changed.
Try to replicate in a non-production environment (really, really try to make this work).
Instrument the system to collect data to see what you can learn about the problem and anything related.
I have a database with a few million entities needing a friendly_id slug. Is there a way to speed up the process of saving the entities? find_each(&:save) is very slow. At 6-10 per second I'm looking at over a week of running this 24/7.
I'm just wondering if there is a method within friendly_id or a parallel-processing trick that can speed this process up drastically.
Currently I'm running about 10 consoles, each starting find_each from a value offset by a further 100k:
Model.where(slug: nil).find_each(start: value) do |e|
  puts e.id
  e.save
end
EDIT
Well, one of the biggest things causing the updates to go so insanely slowly was the initial find query for the entity, not the actual saving of the record. I put the site live the other day and saw database requests continually hitting 2000ms; the culprit was @entity = Entity.find(params[:id]) with 5+ million records. I didn't realize there was no index on the slug column, and Active Record was running its SELECT statements against the slug column. After indexing properly I get 20ms response times, and the loop above went from 1-2 entities per second to about 1k per second. Running several of them in parallel got the job done quickly enough for a one-time operation.
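For reference, the fix boiled down to a one-line migration along these lines. The entities table and slug column names are assumptions based on the code above; add unique: true only if your slugs are guaranteed unique:

class AddIndexToEntitiesSlug < ActiveRecord::Migration
  def change
    # index the column friendly_id looks records up by
    add_index :entities, :slug
  end
end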
I think the fastest way to do this would be to go straight to the database, rather than using Active Record. If you have a GUI like Sequel Pro, connect to your database (the details are in your database.yml) and execute a query there. If you're comfortable on the command line you can run it straight in the database console window. Ruby and Active Record will just slow you down for something like this.
To update all the slugs of a hypothetical table called "users" where the slug will be a concatenation of their first name and last name you could do something like this in MySQL:
UPDATE users SET slug = CONCAT(first_name, "-", last_name) WHERE slug IS NULL
I am trying to move my application from Postgres to Oracle, and I am facing some surprises with Oracle sequence management during seeding of initial data.
=> The objective is to run the same application on various databases (Postgres, Oracle, MSSQL), and this initial data (admin user, parameters, ...) is supposed to have specific ids, starting from 1 and assigned in order of creation. Of course, for this specific purpose I could hardcode the ids.
=> Migration and seeding are done by
rake db:migrate RAILS_ENV=ORACLE
rake db:seed RAILS_ENV=ORACLE
The environments have nothing specific apart from the relevant ActiveRecord adapter.
With Oracle, the seeded data's ids do not start from 1 as expected (the behaviour in Postgres or MS SQL); they start at 10000.
Looking at the sequences created during the db migration, they all start at 10000 (LAST_NUMBER).
Is it an Oracle way, or is it an activerecord-oracle_enhanced-adapter way of doing things?
Why is it set like this?
Is there a way to start numbering from 1?
Thanks for your help,
Best regards,
Fred
Is it an Oracle way, or is it an activerecord-oracle_enhanced-adapter way of doing things?
This is the adapter's way of doing things. Oracle will by default use the min value of the sequence that is created (so typically 1).
The adapter, as of version 1.6.7, sets this in oracle_enhanced_adapter.rb:
self.default_sequence_start_value = 10000
Is there a way to start numbering from 1?
You can override this. Passing the :sequence_start_value option to the create_table method allows you to specify your own.
In a migration this might look like:
create_table :table_name, primary_key_trigger: true, sequence_start_value: 1 do |t|
  ...
end
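If you want this for every table rather than per migration, the assignment above suggests default_sequence_start_value is a writable class-level setting, so something like this in an initializer may work (an untested sketch; double-check it against the adapter version you're running):

# config/initializers/oracle_enhanced.rb
# assumes default_sequence_start_value is a public, writable class attribute
ActiveRecord::ConnectionAdapters::OracleEnhancedAdapter.default_sequence_start_value = 1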
IDs should have no business value. I would change your approach so that you don't care what values the DBMS uses.
I would consider adding an additional key, populated by a trigger and/or stored procedure, that starts at 1 and increments by 1.
If I had to update 50,000 users, how would I go about it in a way that works best with a background processing library and avoids an N+1 issue?
I have users, membership, and points.
Memberships are tied to total point values. If a membership's point values are modified, I have to run through all of the users to update them to the proper membership. This is what I need to queue so the server isn't hanging for 30+ minutes.
Right now I have this in a controller action:
def update_memberships
  User.find_each do |user|
    user.update_membership_level! # looks up the Membership matching the user's points, assigns it, then saves the user
  end
end
This is a very expensive operation. How would I optimize it and run it in the background so the POST from the form returns near-instantaneously?
You seem to be after how to get this done with Resque or delayed_job. I'll give an example with delayed_job.
To create the job, add a method to app/models/user.rb:
def self.process_x_update
  User.where("z = 1").find_each(:batch_size => 5000) do |user|
    user.x = user.y + 3
    user.save
  end
end

# handle_asynchronously wraps instance methods, so for a class method it
# needs to be declared on the singleton class:
class << self
  handle_asynchronously :process_x_update
end
This will update all User records where z = 1, setting user.x = user.y + 3. It works through the records in batches of 5,000, so memory usage stays flat and performance is a bit more linear.
Calling User.process_x_update now returns very quickly, since it only enqueues a job. To actually process the job, you should be running rake jobs:work in the background, or start a cluster of daemons with ./script/delayed_job start.
One other thing: can you move this logic to one SQL statement? That way you could have one statement that's fast and atomic. You'd still want to do this in the background as it could take some time to process. You could do something like:
def self.process_x_update
  User.where("z = 1").update_all("x = y + 3")
end

class << self
  handle_asynchronously :process_x_update
end
You're looking for update_all. From the docs:
Updates all records with details given if they match a set of conditions supplied, limits and order can also be supplied.
It'll probably still take a while on the SQL side, but you can at least do it with one statement. Check out the documentation to see usage examples.
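Applied to the membership question above, the single-statement idea could look something like this. The points, minimum_points and membership_id columns are assumptions about your schema, and iterating tiers from lowest to highest lets the highest qualifying tier win:

Membership.order(:minimum_points).each do |membership|
  # one UPDATE per membership tier instead of one per user
  User.where("points >= ?", membership.minimum_points)
      .update_all(membership_id: membership.id)
end

That is one UPDATE per membership tier rather than one query per user, and it can still be queued through delayed_job exactly like the examples above.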
I have a database that has a list of rows that need to be operated on. It looks something like this:
id   remaining   delivered   locked
===================================
1    10          24          f
2    6           0           f
3    0           14          f
I am using DataMapper with Ruby, but really I think this is a general programming question that isn't specific to the exact implementation I'm using...
I am creating a bunch of worker threads that do something like this (pseudo-ruby-code):
while true do
  t = any_row_in_database_where_remaining_greater_than_zero_and_unlocked
  t.lock  # update database to set locked = true
  t.do_some_stuff
  t.delivered += 1
  t.remaining -= 1
  t.unlock
end
Of course, the problem is that these threads compete with each other and the whole thing isn't really thread-safe. The first line in the while loop can easily pull out the same row in multiple threads before any of them gets a chance to lock it.
I need to make sure only one thread is working on a given row at a time.
What is the best way to do this?
The key step is when you select an unlocked row from the database and mark it as locked. If you can do that safely then everything else will be fine.
Two ways I know of to make this safe are pessimistic and optimistic locking. They both rely on your database as the ultimate guarantor when it comes to concurrency.
Pessimistic Locking
Pessimistic locking means acquiring a lock upfront when you select the rows you want to work with, so that no one else can lock or modify them while you work.
Something like
SELECT * from some_table WHERE ... FOR UPDATE
works with MySQL and Postgres (and possibly others) and will prevent any other connection from locking or updating the rows returned to you until your transaction ends (how granular that lock is depends on the engine used, indexes etc - check your database's documentation). It's called pessimistic because you assume a concurrency problem will occur and acquire the lock preventatively. It does mean you bear the cost of locking even when it isn't necessary, and it may reduce your concurrency depending on the granularity of the lock.
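For illustration, the same idea in ActiveRecord-flavoured Ruby (the question uses DataMapper, so treat the Task model and do_some_stuff as placeholders): lock adds FOR UPDATE, and a competing worker's SELECT blocks until this transaction commits.

Task.transaction do
  task = Task.lock.where("remaining > 0 AND locked = false").first  # SELECT ... FOR UPDATE
  if task
    task.do_some_stuff
    task.delivered += 1
    task.remaining -= 1
    task.save!
  end
end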
Optimistic Locking
Optimistic locking refers to a technique where you don't take on the burden of a pessimistic lock, because most of the time there won't be concurrent updates (if you update the row, setting the locked flag to true as soon as you have read it, the window is relatively small). AFAIK this only works when updating one row at a time.
First add an integer column lock_version to the table. Whenever you update the table, increment lock_version by 1 alongside the other updates you are making. Assume the current lock_version is 3. When you update, change the update query to
update some_table set ... where id=12345 and lock_version = 3
and check the number of rows updated (the db driver returns this). If this updates 1 row then you know everything was OK. If it updates 0 rows then either the row you wanted was deleted or its lock version has changed, so you go back to step 1 in your process and search for a new row to work on.
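In ActiveRecord-flavoured Ruby (again just a sketch with made-up model and column names), that check could look like:

claimed = Task.where(id: task.id, lock_version: task.lock_version)
              .update_all("locked = true, lock_version = lock_version + 1")

if claimed == 1
  # we own the row: do the work, then clear locked and bump lock_version again
else
  # 0 rows updated: someone else claimed or deleted it, so pick another row
end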
I'm not a DataMapper user, so I don't know whether it (or plugins for it) supports these approaches. Active Record supports both, so you can look there for inspiration if DataMapper doesn't.
I would use a Mutex:
# outside your threads
worker_updater = Mutex.new

# inside each thread's updater
while true
  worker_updater.synchronize do
    # your code here
  end
  sleep 0.1 # Slow down there, mister!
end
This guarantees that only one thread at a time can enter the code in the synchronize. For optimal performance, consider what portion of your code needs to be thread-safe (first two lines?) and only wrap that portion in the Mutex.
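A sketch of that narrowing, reusing the pseudo-methods from the question: only the "find and lock a row" step is serialized, while the slow do_some_stuff runs concurrently in each thread. The claim_lock Mutex must be created once, outside the threads, and shared by all of them.

claim_lock = Mutex.new

while true do
  t = nil
  claim_lock.synchronize do
    # only the claim step is under the Mutex
    t = any_row_in_database_where_remaining_greater_than_zero_and_unlocked
    t.lock if t   # set locked = true in the database
  end
  break unless t  # no work left for this thread

  t.do_some_stuff
  t.delivered += 1
  t.remaining -= 1
  t.unlock
end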
Hey. I use delayed_job for background processing. I have an 8-CPU server with MySQL, and I start 7 delayed_job processes:
RAILS_ENV=production script/delayed_job -n 7 start
Q1:
I'm wondering whether it is possible that 2 or more delayed_job processes start processing the same job (the same row in the delayed_jobs table). I checked the code of the delayed_job plugin but cannot find locking done the way I would expect (no LOCK TABLE or SELECT ... FOR UPDATE).
I think each process should lock the table before executing an UPDATE on the locked_by column. Instead, they lock the record simply by updating the locked_by field (UPDATE delayed_jobs SET locked_by...). Is that really enough? Is no explicit locking needed? Why? I know that UPDATE has a higher priority than SELECT, but I don't think that matters in this case.
My understanding of the multi-threaded situation is:
Process1: Get waiting job X. [OK]
Process2: Get waiting jobs X. [OK]
Process1: Update locked_by field. [OK]
Process2: Update locked_by field. [OK]
Process1: Get waiting job X. [Already processed]
Process2: Get waiting jobs X. [Already processed]
I think in some cases two workers could fetch the same job and both start processing it.
Q2:
Is 7 delayed_job processes a good number for an 8-CPU server? Why or why not?
Thx 10x!
I think the answer to your question is in line 168 of 'lib/delayed_job/job.rb':
self.class.update_all(["locked_at = ?, locked_by = ?", now, worker], ["id = ? and (locked_at is null or locked_at < ?)", id, (now - max_run_time.to_i)])
Here the row is only updated if no other worker has already locked the job, and that check happens as part of the UPDATE itself. A table lock or similar (which, by the way, would massively reduce the performance of your app) is not needed, since your DBMS ensures that the execution of a single statement is isolated from the effects of other queries. In your example, Process2 can't get the lock for job X, since the UPDATE only touches the row if it was not locked before (Process2 simply sees zero rows updated).
To your second question: it depends. On an 8-CPU server that is dedicated to this work, 8 workers are a good starting point; since workers are single-threaded, you can run one per core. Depending on your setup, more or fewer workers may be better. It heavily depends on your jobs: do they take advantage of multiple cores? Do they spend most of their time waiting for external resources? You have to experiment with different settings and keep an eye on all the resources involved.