RSpec: How to test methods that use Parallel (PG::ConnectionBad error)

RSpec: How to test methods that use Parallel (PG::ConnectionBad error) - ruby-on-rails

In my app I have several Builder classes that are responsible for taking data received from an external API request and building/saving resources to the database. I'm dealing with a large amount of data and have implemented the Parallel gem to speed this up by using multiple processes.
However, I'm finding that any test for a method that uses Parallel fails with the same error:
ActiveRecord::StatementInvalid:
PG::ConnectionBad: PQconsumeInput() server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Here is an example of the code being tested:
class AirportBuilder < Resource
def build_from_collection
Parallel.each(object_producer, in_processes: 24) do |params|
instance = Airport.find_or_initialize_by(fsid: params[:fs])
build!(instance, params)
end
end
end
I've done some searching on this but all the results in Google have to do with using multiple threads/processes to make the test suite run faster, which is a different problem.
Any ideas on how I can test this effectively without causing the PG error? I realize I may need to stub something out but am not quite sure what to stub and still have a meaningful test.
Thanks in advance to anyone who might be able to help!

Are you using too many database connections than are configured for your test database? Maybe try setting it to a pool size equal to the needs of your script (which looks like 24)?
test:
adapter: whatever
host: whatever
username: whatever
password: whatever
database: whatever
pool: 24
Heads up that you may also want to do some math on the default ActiveRecord connection pool. Some good info in this Heroku dev center article.

Related

How to get Rspec to use the same DB connection for reader and writer DB in rails 6?

For context, this question arose because we are migration from Rails 5 to Rails 6, and introducing reader / writer database connections via the new replication features.
Our specific problem is with request specs, with an eye towards using transactional fixtures. When we run our request specs files in isolation, they pass. When run as part of a multiple-file pass (such as a full bundle exec parallel_rspec pass used on circle CI) they fail. If we turn off transactional fixtures, the tests take far too long to run, but pass.
Using byebug, we've poked in and determined that the problem is that our test data has been written to / is accessible by the writer DB connection, but the route is attempting to use the reader DB connection to read it. I. E. ActiveRecord::Base.connected_to(role: :reading) { puts Foo.count } is 0, while the same code connecting to writing role is non-zero.
The problem from there seems fairly obvious: because we're using transactional tests / fixtures, the code is never committed to the DB. It's only available on the connection it was made on. The request spec is reading from the 'right' db for the call (a GET request should use the reader db), but in the use-case of tests that's producing errors.
It seems like this is a fairly obvious use case that either Rails or rspec should have a tool for handling, we just don't seem to be able to find the relevant documentation.

You need to tell the test environment that it should be using a single connection for both. There are multiple ways of doing this:
You can configure your test environment not to use replicas at all. See Setting up your application for examples of using a replica and not using a replica then reproduce the non-replica version in your database.yml for the test environment only.
You can use connected_to within your specs themselves so that those tests are forced to use the specific connection you want them to use. One way to do this is with around hooks:
describe "around filter" do
around(:each) do |example|
puts "around each before"
ActiveRecord::Base.connected_to(role: :writing) { example.run }
puts "around each after"
end
it "gets run in order" do
puts "in the example"
end
end
You can monkey patch your ActiveRecord configuration in rails_helper so that it doesn't use replicas (but I'd really recommend #1 over this option)

Table name corruption errors in ActiveRecord

Sporadically we get PG::UndefinedTable errors while using ActiveRecord. The association table name is some how corrupted and I quite often see
Cancelled appended to the end of the table name.
E.g:
ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR: relation "fooCancell" does not exist
ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR: relation "Cancelled" does not exist
ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR: relation "barC" does not exist
In the example above, I have obfuscated the table name by using foo and bar.
We see this errors when the rails project is running inside Puma. Queue workers seems to be doing okay.
The tables in the error message doesn't correspond to real tables or models. It looks like the case of memory corruption. Has anyone seen such issues? If so how did you get around it?
puma.rb
on_worker_boot do
ActiveRecord::Base.establish_connection
end
database.yml
production:
url: <%= ENV["DATABASE_URL"] %>
pool: <%= ENV['DB_CONNECTION_POOL_SIZE'] || 5%>
reaping_frequency: <%= ENV['DB_CONNECTION_REAPING_FREQUENCY'] || 10 %>
prepared_statements: false

I'm hazarding a guess here, based on this possibly related error...
But you might be either:
calling fork within your application; OR
calling ActiveRecord routines (using database calls) before the server (puma) is forking it's worker processes (during the app initialization).
Either of these will break ActiveRecord's synchronization and cause multiple processes to share the database connection pool without synchronizing it's use (resulting in interlaced and corrupt database commands).
If you are using fork, make sure to close all the ActiveRecord database connections and reinitialize the connection pool (there's a function call that does it, but I don't remember it of the top of my head, maybe ActiveRecord.disconnect! or ActiveRecord.connection_pool.disconnect!).
Otherwise, before running Puma (either during the initialization process or using Puma's after_fork), close all the ActiveRecord database connections and reinitialize the connection pool.

It looks like reaping_frequency may be the issue. I found a couple claims that they may have a threading bug. I would try removing that option or setting it to nil and see if that works. The only other thing I can think of is if you are manually calling Thread.new and using active record within it.
Here are the few claims against reaping:
http://omegadelta.net/2014/03/15/the-rails-grim-reaper/
https://github.com/mperham/sidekiq/issues/1936
Search for "DO fear the Reaper" here:
https://www.google.com/amp/s/bibwild.wordpress.com/2014/07/17/activerecord-concurrency-in-rails4-avoid-leaked-connections/amp/

ActiveRecord data loss when unit testing Rails + Node.js

Background: I am unit testing a game server which is built upon rails 4.1.1 and separate socket.io/node.js for socket messaging. Messages from node.js to rails are going through RESTful http requests.
Single test case runs as follows:
(1) rake unit test --> (2) rails controller --> (3) node.js/socket.io --> (4) rails controller
Problem description: Some DB entries are created with ActiveRecord at step (2), then upon receiving a socket message at step (3) node.js sends HTTP request back to rails controller and finally(!!) at step (4) rails controller tries to access DB entries from step (2), but TEST DB contents are empty at this point.
Question: It seems like desired behavior of rake to cleanup TEST DB, but how can I persist TEST db across test cases and prevent such problem?
Thanks in advance

You should prepare and send request to node app inside a test and assert response there.
But it's not a good practice. The better solution would be HTTP mocks (like webmock gem). This approach will save lots of time in the future.

Luckily, I figured out the solution.
By default, rake is wrapping all tests in separate DB transactions and rolls back on cleanup. Moreover, whatever requests/queries are coming outside of TestCase are not included in transaction and not visible inside the test case.
To avoid such behavior, we have to disable transactional fixtures in test/test_helper.rb
class ActiveSupport::TestCase
self.use_transactional_fixtures = false
end
As downside, we have to cleanup test db manually. So #Alexander Shlenchack points out to avoid such practice in the first place and use http/socket mocks in future.
Here is brief summary http://devblog.avdi.org/2012/08/31/configuring-database_cleaner-with-rails-rspec-capybara-and-selenium/
And related question Rails minitest, database cleaner how to turn use_transactional_fixtures = false

Getting information from one Rails server to Another

I'll try to keep this as brief as possible and open the question to discussion.
I have a Rails app which is a static codebase and runs on 9 different servers all the same db schema but of course with different values.
I wrote some SQL to query some dollar totals and will be putting this into either a rake task or a sidekiq worker and have it fire once a week to generate the data. Initially I was thinking of just throwing the resulting data from each server into a mailer and mailing it to whoever needs the data. This is pretty straight forward.
But there's a kink in this, we need to see metrics over time in highcharts or some other charting engine.
So here's my thought.
Create the sidekiq worker and fire it on a schedule
Take the resulting data from each server and populate it on a target server via Postgres (not sure how to do this)
The target server will have a very simple Rails app built that will have a model with metrics and an association for each server (ie server 1 server 2 etc), after populating the data via postgres (somehow) from the source servers, read the data in HighCharts and present the view
So that's my thought process so far. I'm not sure on how to get the data from the source servers via a live postgres call when the sidekiq worker fires. So that's problem #1. Problem #2 or more like question #2 is, would this be a better use case for creating some sort of consumable API on the target Rails server? If so, what's the best place to start.
If my question and thought process is unclear, please let me know so I can clarify and explain in better detail.
Cheers!

There are plenty of tutorials on how to use multiple database connections in Rails as well as building an API in Rails. A few minutes of Googling will give you plenty of examples. But here are a couple barebones approaches:
For multiple database connections, you are right, you'll need to have the connection info for both databases defined in your database.yml file. Example:
# Local Database
development:
adapter: mysql2
database: local_db
username: my_user
password: my_password
host: localhost
port: 3306
# Reporting Database
development_reporting_db:
adapter: postgresql
encoding: unicode
database: reporting
username: some_user
password: some_password
host: 1.2.3.4
port: 5432
Rails won't do anything with this extra block though unless you explicitly tell it to. The common practice is to define an abstract ActiveRecord model that will establish the second connection:
class ReportingRecord < ActiveRecord::Base
establish_connection( "#{Rails.env}_reporting_db".to_sym )
self.abstract_class = true
end
Then, create new models for tables that reside in your reporting database and inherit from ReportingRecord instead of ActiveRecord::Base:
class SomeModel < ReportingRecord
# this model sits on top of a table defined in database.yml --> development_reporting_db instead of database.yml --> development
end
For building an API, there are tons of different ways to do it. Regardless of your approach, I'd highly suggest you make sure it's only accessible via HTTPS. Here's a basic controller with one action that responds to json requests:
class ApiController < ApplicationController
before_filter :restrict_access # ensures the correct api token was passed (defined in config/secrets.yml)
skip_before_action :verify_authenticity_token # not needed since we're using token restriction
respond_to :json
def my_endpoint_action
render :json => {some_info: 'Hello World'}, :status => 200 # 200 = success
end
private
rescue_from StandardError do |e|
render :json => {:error => e.message}.to_json, :status => 400 # 400 = bad request
end
# ensures the correct api token was passed (defined in config/secrets.yml)
def restrict_access
authenticate_or_request_with_http_token do |token, options|
token == Rails.application.secrets[:my_access_token]
end
end
end
This example would require you to define an access token in your config/secrets.yml file:
development:
secret_key_base: # normal Rails secret key base
my_api_access_token: # put a token here (you can generate one on the command like using rake secret)
Choosing between an API and a multiple DB solution depends mostly on how your application might expand in the future. The multiple DB approach is typically easier to implement and has higher performance. An API tends to scale horizontally better and databases that have a connection from only one application instead of 2 or more tend to be easier to maintain over time.
Hope this helps!

SQLite3::BusyException

Running a rails site right now using SQLite3.
About once every 500 requests or so, I get a
ActiveRecord::StatementInvalid (SQLite3::BusyException: database is locked:...
What's the way to fix this that would be minimally invasive to my code?
I'm using SQLLite at the moment because you can store the DB in source control which makes backing up natural and you can push changes out very quickly. However, it's obviously not really set up for concurrent access. I'll migrate over to MySQL tomorrow morning.

You mentioned that this is a Rails site. Rails allows you to set the SQLite retry timeout in your database.yml config file:
production:
adapter: sqlite3
database: db/mysite_prod.sqlite3
timeout: 10000
The timeout value is specified in miliseconds. Increasing it to 10 or 15 seconds should decrease the number of BusyExceptions you see in your log.
This is just a temporary solution, though. If your site needs true concurrency then you will have to migrate to another db engine.

By default, sqlite returns immediatly with a blocked, busy error if the database is busy and locked. You can ask for it to wait and keep trying for a while before giving up. This usually fixes the problem, unless you do have 1000s of threads accessing your db, when I agree sqlite would be inappropriate.
// set SQLite to wait and retry for up to 100ms if database locked
sqlite3_busy_timeout( db, 100 );

All of these things are true, but it doesn't answer the question, which is likely: why does my Rails app occasionally raise a SQLite3::BusyException in production?
#Shalmanese: what is the production hosting environment like? Is it on a shared host? Is the directory that contains the sqlite database on an NFS share? (Likely, on a shared host).
This problem likely has to do with the phenomena of file locking w/ NFS shares and SQLite's lack of concurrency.

If you have this issue but increasing the timeout does not change anything, you might have another concurrency issue with transactions, here is it in summary:
Begin a transaction (aquires a SHARED lock)
Read some data from DB (we are still using the SHARED lock)
Meanwhile, another process starts a transaction and write data (acquiring the RESERVED lock).
Then you try to write, you are now trying to request the RESERVED lock
SQLite raises the SQLITE_BUSY exception immediately (indenpendently of your timeout) because your previous reads may no longer be accurate by the time it can get the RESERVED lock.
One way to fix this is to patch the active_record sqlite adapter to aquire a RESERVED lock directly at the begining of the transaction by padding the :immediate option to the driver. This will decrease performance a bit, but at least all your transactions will honor your timeout and occurs one after the other. Here is how to do this using prepend (Ruby 2.0+) put this in a initializer:
module SqliteTransactionFix
def begin_db_transaction
log('begin immediate transaction', nil) { #connection.transaction(:immediate) }
end
end
module ActiveRecord
module ConnectionAdapters
class SQLiteAdapter < AbstractAdapter
prepend SqliteTransactionFix
end
end
end
Read more here: https://rails.lighthouseapp.com/projects/8994/tickets/5941-sqlite3busyexceptions-are-raised-immediately-in-some-cases-despite-setting-sqlite3_busy_timeout

Just for the record. In one application with Rails 2.3.8 we found out that Rails was ignoring the "timeout" option Rifkin Habsburg suggested.
After some more investigation we found a possibly related bug in Rails dev: http://dev.rubyonrails.org/ticket/8811. And after some more investigation we found the solution (tested with Rails 2.3.8):
Edit this ActiveRecord file: activerecord-2.3.8/lib/active_record/connection_adapters/sqlite_adapter.rb
Replace this:
def begin_db_transaction #:nodoc:
catch_schema_changes { #connection.transaction }
end
with
def begin_db_transaction #:nodoc:
catch_schema_changes { #connection.transaction(:immediate) }
end
And that's all! We haven't noticed a performance drop and now the app supports many more petitions without breaking (it waits for the timeout). Sqlite is nice!

bundle exec rake db:reset
It worked for me it will reset and show the pending migration.

Sqlite can allow other processes to wait until the current one finished.
I use this line to connect when I know I may have multiple processes trying to access the Sqlite DB:
conn = sqlite3.connect('filename', isolation_level = 'exclusive')
According to the Python Sqlite Documentation:
You can control which kind of BEGIN
statements pysqlite implicitly
executes (or none at all) via the
isolation_level parameter to the
connect() call, or via the
isolation_level property of
connections.

I had a similar problem with rake db:migrate. Issue was that the working directory was on a SMB share.
I fixed it by copying the folder over to my local machine.

Most answers are for Rails rather than raw ruby, and OPs question IS for rails, which is fine. :)
So I just want to leave this solution over here should any raw ruby user have this problem, and is not using a yml configuration.
After instancing the connection, you can set it like this:
db = SQLite3::Database.new "#{path_to_your_db}/your_file.db"
db.busy_timeout=(15000) # in ms, meaning it will retry for 15 seconds before it raises an exception.
#This can be any number you want. Default value is 0.

Source: this link
- Open the database
db = sqlite3.open("filename")
-- Ten attempts are made to proceed, if the database is locked
function my_busy_handler(attempts_made)
if attempts_made < 10 then
return true
else
return false
end
end
-- Set the new busy handler
db:set_busy_handler(my_busy_handler)
-- Use the database
db:exec(...)

What table is being accessed when the lock is encountered?
Do you have long-running transactions?
Can you figure out which requests were still being processed when the lock was encountered?

Argh - the bane of my existence over the last week. Sqlite3 locks the db file when any process writes to the database. IE any UPDATE/INSERT type query (also select count(*) for some reason). However, it handles multiple reads just fine.
So, I finally got frustrated enough to write my own thread locking code around the database calls. By ensuring that the application can only have one thread writing to the database at any point, I was able to scale to 1000's of threads.
And yea, its slow as hell. But its also fast enough and correct, which is a nice property to have.

I found a deadlock on sqlite3 ruby extension and fix it here: have a go with it and see if this fixes ur problem.
https://github.com/dxj19831029/sqlite3-ruby
I opened a pull request, no response from them anymore.
Anyway, some busy exception is expected as described in sqlite3 itself.
Be aware with this condition: sqlite busy
The presence of a busy handler does not guarantee that it will be invoked when there is
lock contention. If SQLite determines that invoking the busy handler could result in a
deadlock, it will go ahead and return SQLITE_BUSY or SQLITE_IOERR_BLOCKED instead of
invoking the busy handler. Consider a scenario where one process is holding a read lock
that it is trying to promote to a reserved lock and a second process is holding a reserved
lock that it is trying to promote to an exclusive lock. The first process cannot proceed
because it is blocked by the second and the second process cannot proceed because it is
blocked by the first. If both processes invoke the busy handlers, neither will make any
progress. Therefore, SQLite returns SQLITE_BUSY for the first process, hoping that this
will induce the first process to release its read lock and allow the second process to
proceed.
If you meet this condition, timeout isn't valid anymore. To avoid it, don't put select inside begin/commit. or use exclusive lock for begin/commit.
Hope this helps. :)

this is often a consecutive fault of multiple processes accessing the same database, i.e. if the "allow only one instance" flag was not set in RubyMine

Try running the following, it may help:
ActiveRecord::Base.connection.execute("BEGIN TRANSACTION; END;")
From: Ruby: SQLite3::BusyException: database is locked:
This may clear up the any transaction holding up the system

I believe this happens when a transaction times out. You really should be using a "real" database. Something like Drizzle, or MySQL. Any reason why you prefer SQLite over the two prior options?

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

RSpec: How to test methods that use Parallel (PG::ConnectionBad error) - ruby-on-rails

Related

How to get Rspec to use the same DB connection for reader and writer DB in rails 6?

Table name corruption errors in ActiveRecord

ActiveRecord data loss when unit testing Rails + Node.js

Getting information from one Rails server to Another

SQLite3::BusyException

Categories

Resources