I have an intermittent error come up after some deploys of my Rails app.
This code is running in Sidekiq (5 processes each with 10 threads), which is running in a Docker container.
I can have tens of thousands of these jobs queued up at any point.
path = Path.find(path_id)
nearby_nodes = Node.where("ST_DWITHIN(geog, ST_GeographyFromText(?), 25)", path.geog.to_s)
The error is:
ActiveRecord::StatementInvalid: PG::InternalError: ERROR: parse error - invalid geometry
HINT: "01" <-- parse error at position 2 within geometry
I can get these jobs to run successfully if I quiet all the Sidekiq processes, stop the workers, wait a moment, then start the workers back up.
I added a number of delays to my deploy process (guessing that slowing things down might help, if restarting workers solves the problem), but that did not help.
I can usually get one successful deploy per day. Deploys after that first one are more likely to fall into this failure state, and once it does, every subsequent deploy causes the same issue.
Path.first.geog returns:
#<RGeo::Geographic::SphericalPointImpl:0x3ffd8b2a6688 "POINT (-72.633932 42.206081)">
Path.first.geog.class returns:
RGeo::Geographic::SphericalPointImpl
I've tried a number of different formats of this query, which might shed some light on how/why this is failing (though I'm still stumped as to why it's only intermittent):
Node.where("ST_DWITHIN(geog, ST_GeographyFromText(?), 25)", path.geog) fails, generating this query:
Node Load (1.0ms) SELECT "nodes".* FROM "nodes" WHERE (ST_DWITHIN(geog, ST_GeographyFromText('0020000001000010e6c05228925785f8d340451a60dcb9a9da'), 25)) LIMIT $1 [["LIMIT", 11]]
and this error:
ActiveRecord::StatementInvalid (PG::InternalError: ERROR: parse error - invalid geometry)
HINT: "00" <-- parse error at position 2 within geometry
Node.where("ST_DWITHIN(geog, ST_GeographyFromText('#{path.geog}'), 25)") succeeds, generating this query:
Node Load (5.1ms) SELECT "nodes".* FROM "nodes" WHERE (ST_DWITHIN(geog, ST_GeographyFromText('POINT (-72.633932 42.206081)'), 25)) LIMIT $1 [["LIMIT", 11]]
Node.where("ST_DWITHIN(geog, ST_GeographyFromText(?), 25)", path.geog.to_s) also succeeds, generating the same query:
Node Load (2.3ms) SELECT "nodes".* FROM "nodes" WHERE (ST_DWITHIN(geog, ST_GeographyFromText('POINT (-72.633932 42.206081)'), 25)) LIMIT $1 [["LIMIT", 11]]
Doing the to_s conversion in a preceding line as some kind of superstitious test also works:
geog_string = path.geog.to_s
nearby_nodes = Node.where("ST_DWITHIN(geog, ST_GeographyFromText(?), 25)", geog_string)
Queries 2-4 generally work, but behave like query number 1 some of the time and only after a deploy.
I could not make 2-4 behave like the first query in a Rails console.
The only time queries 2-4 behave like the first query is in a Sidekiq job after a deploy.
It's as if the string conversion isn't working sometimes.
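For what it's worth, both representations seen in the logs can be reproduced with RGeo directly. This is just an illustrative sketch (the factory and generator calls below are standard RGeo APIs, not from the original post):

factory = RGeo::Geographic.spherical_factory(srid: 4326)
point = factory.point(-72.633932, 42.206081)
point.to_s
# => "POINT (-72.633932 42.206081)"  (WKT, which ST_GeographyFromText can parse)
RGeo::WKRep::WKBGenerator.new(hex_format: true, type_format: :ewkb, emit_ewkb_srid: true).generate(point)
# => "0020000001000010e6..."  (EWKB hex, like the string in the failing query above)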
Here's a list of potentially relevant gems/versions:
activerecord-postgis-adapter (6.0.0)
pg (1.2.3)
rails (6.0.2.2)
rgeo (2.1.1)
rgeo-activerecord (6.2.1)
sidekiq (6.0.6)
Ruby 2.6.6
PostgreSQL 11.6
PostGIS 2.5.2
Docker 19.03.8, build afacb8b7f0
There is no need to convert the geography to a string and then read it back as a geography.
You can try passing the geography object directly:
Node.where("ST_DWITHIN(geog, ?, 25)", path.geog)
That being said, you may indeed have some invalid geometries in your data.
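If you want to rule out bad data, a quick validity check along these lines might help. This is a sketch, not from the original answer: it assumes the nodes table and geog column from the question, and casts to geometry because ST_IsValid is defined for geometry:

# Hypothetical one-off check for rows whose stored geography is invalid.
Node.where("NOT ST_IsValid(geog::geometry)").pluck(:id)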
Related
I am using Rails 4.2.11.1 on Ruby 2.6.3.
I have had extremely slow requests using Rails, so I benchmarked my code and found the main culprit: the majority of the slowdown happens at the database call, where I select a single row from a table in the database. I have tried a few different versions of the same idea.
Using this version
Rails.logger.info Benchmark.measure {
  result = Record.find_by_sql(['SELECT column FROM table WHERE condition']).first.column
}
the Rails output says that the SQL takes 54.5ms, but the benchmark prints out 0.043427 0.006294 0.049721 ( 1.795859), and the total request takes 1.81 seconds. When I run the same SQL directly in my Postgres terminal, it takes 42ms.
Obviously the problem is not that my SQL is slow. 42 milliseconds is not noticeable, but 1.79 seconds is way too slow and creates a horrible user experience.
I did some reading and came to the conclusion that the slowdown was caused by Rails' object creation (which seems weird, but apparently that can be super slow), so I tried using pluck to minimize the number of objects created:
Rails.logger.info Benchmark.measure {
  result = Record.where(condition).pluck(column).first
}
Now Rails says the SQL took 29.3ms, and the benchmark gives 0.017989 0.006119 0.024108 ( 0.713973).
The whole request takes 0.731 seconds. This is a huge improvement, but 0.7 seconds is still a bad slowdown and still undermines the usability of my application.
What am I doing wrong? It seems insane to me that something so simple should have such a huge slowdown. If this is just how Rails works, I can't imagine that anyone uses it for serious applications!
find_by_sql executes a custom SQL query against your database and returns all the results.
That means all the rows matching your query are returned and instantiated. Only then do you pick the first one from that array by calling first on the results.
When you call first on an ActiveRecord::Relation, it adds a LIMIT to the query and fetches only that one record, which is the behavior you want.
That means you should be limiting the query yourself:
result = Record.find_by_sql(['SELECT column FROM table WHERE condition LIMIT 1']).first.column
I'm pretty sure your request will be fast then, as Ruby doesn't need to instantiate all the result rows.
As I mentioned above, I'm not sure why you ask for all the matches if you just want the first one.
If I do:
Rails.logger.info Benchmark.measure {
  result = User.where(email: 'foo@bar.com').pluck(:email).first
}
(9.6ms) SELECT "users"."email" FROM "users" WHERE "users"."email" = $1 [["email", "foo@bar.com"]]
#<Benchmark::Tms:0x00007fc2ce4b7998 @label="", @real=0.6364280000561848, @cstime=0.00364, @cutime=0.000661, @stime=0.1469640000000001, @utime=0.1646029999999996, @total=0.3158679999999997>
Rails.logger.info Benchmark.measure {
  result = User.where(email: 'foo@bar.com').limit(1).pluck(:email)
}
(1.8ms) SELECT "users"."email" FROM "users" WHERE "users"."email" = $1 LIMIT $2 [["email", "foo@bar.com"], ["LIMIT", 1]]
#<Benchmark::Tms:0x00007fc2ce4cd838 @label="", @real=0.004004000045824796, @cstime=0.0, @cutime=0.0, @stime=0.0005539999999997214, @utime=0.0013550000000002171, @total=0.0019089999999999385>
Rails also does caching: if you run your query again, it should be faster the second time. How complex is your where condition? That might be part of it.
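For what it's worth, you can watch the per-request query cache in action yourself. A sketch, assuming the same User lookup as above; ActiveRecord::Base.cache wraps a block in the query cache that Rails enables automatically during requests:

ActiveRecord::Base.cache do
  User.where(email: 'foo@bar.com').limit(1).pluck(:email)  # hits the database
  User.where(email: 'foo@bar.com').limit(1).pluck(:email)  # logged as CACHE, served from memory
end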
I find my query is taking too long to load, so I'm wondering whether the position of includes matters.
Example A:
people = Person.where(name: 'guillaume').includes(:jobs)
Example B:
people = Person.includes(:jobs).where(name: 'guillaume')
Is example A faster because I should have fewer people's jobs to load?
Short answer: no.
ActiveRecord builds your query and, as long as you don't need the records, it won't send the final SQL query to the database to fetch them. The two queries you pasted are identical.
Whenever in doubt, you can always open up a Rails console, write your queries there, and observe the queries printed out. In your example it would be something like:
SELECT "people".* FROM "people" WHERE "people"."name" = $1 LIMIT $2 [["name", "guillaume"], ["LIMIT", 11]]
SELECT "jobs".* FROM "jobs" WHERE "jobs"."person_id" = 1
in both of the cases.
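If you'd rather not trigger the queries at all, to_sql shows the generated SQL without executing anything. A quick sketch using the models from the question:

a = Person.where(name: 'guillaume').includes(:jobs)
b = Person.includes(:jobs).where(name: 'guillaume')
a.to_sql == b.to_sql  # => true, the chaining order doesn't change the query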
I am querying a set of data using a custom model Stat based on ActiveRecord, using sqlite3. The table clearly contains more than 100 rows, but when I query all of the data it always returns exactly 11 records, even though I did not set any limit on the statement; the console just adds a LIMIT of 11. Below is my code:
2.5.1 001 > Stat.all
Stat Load (2.1ms) SELECT "stat".* FROM "stat" LIMIT ? [["LIMIT", 11]]
2.5.1 002 > Stat.count
(0.4ms) SELECT COUNT(*) FROM "stat"
>> 105
Is there any way to remove the automatically added limit when I am doing this?
Is your Rails version 5.1 or newer? Since 5.1, inspecting a relation in the console loads only the records needed for display, which is where the LIMIT 11 comes from.
To load all records from the database, use Stat.all.to_a instead.
Note: this returns an Array instead of an ActiveRecord::Relation.
See this PR: https://github.com/rails/rails/pull/28592
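Put differently, using the numbers from the question, the LIMIT 11 comes only from the console's preview (a sketch):

Stat.all            # Stat Load ... SELECT "stat".* FROM "stat" LIMIT ?  [["LIMIT", 11]]
Stat.all.to_a       # SELECT "stat".* FROM "stat"  (no LIMIT; returns all rows as an Array)
Stat.all.to_a.size  # => 105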
After running two similar queries like
@articles = @magazine.articles.limit(2).offset(0)
@articles = @articles.limit(2).offset(2)
I was expecting to see two SQL statements in my console being executed by the server. However, the first query is missing and only the second one is being run. Similarly, after executing the following two queries:
@articles = @magazine.articles.limit(2).offset(0)
@articles = @articles.limit(2).offset(@articles.size - 2)
the first query is completely ignored as well. These two queries generate the SQL:
SELECT COUNT(count_column) FROM (SELECT 1 AS count_column FROM "articles"
WHERE "articles"."magazine_id" = $1 LIMIT 2 OFFSET 0)
subquery_for_count [["magazine_id", 1]]
SELECT "articles".* FROM "articles"
WHERE "articles"."magazine_id" = $1
LIMIT 2 OFFSET 2 [["magazine_id", 1]]
Interestingly enough, if I change @articles.size to @articles.length, both queries are run as expected. I would think that since length requires the collection in memory, the first statement is forced to run. Can anyone describe what's happening here and, if it's too broad a topic, point me to a good resource?
It's not so much optimising as deferring execution of the query until it really needs to execute it.
In both cases you're storing the result of building up a query in @articles. Active Record, or more accurately Arel, defers execution of the query until you call a method that needs the results. I suspect that you're actually seeing the query being executed against the database when you call something like @articles.each or @articles.count or some such.
You could build the query up in a series of steps and it won't actually get executed:
a = @magazine.articles
a = a.limit(2)
a = a.offset(0)
It also means you can leave some query clause that drastically reduces the result size to the end of the process:
a = a.where('created_at > ?', Time.now.at_beginning_of_day)
Still no query has been sent to the database.
The thing to watch out for is testing this logic in the Rails console. If you run these steps in the console, it tries to display the last return value (by calling .inspect, I think), and inspecting the return value causes the query to be executed. So if you put a = Magazine.find(1).articles into the console you'll see a query immediately executed, which wouldn't have happened if the code was run in the context of a controller action, for example. If you then call a.limit(2) you'll see another query, and so on.
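That also explains the size versus length difference in the question. A sketch of the distinction (standard ActiveRecord behavior):

articles = @magazine.articles.limit(2).offset(0)
articles.size    # issues SELECT COUNT(...); the relation itself stays unloaded
articles.length  # loads the records (SELECT "articles".* ... LIMIT 2 OFFSET 0), then counts them in memory

Because size only ran a COUNT, the first relation was never loaded, which is why only its COUNT appeared in the log; length forces the load, so both SELECTs show up.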
I am trying to put my app live on Heroku, but I am running into a problem: it doesn't like me using backticks (`) in my SQL queries. Here is the error from the log:
2011-10-29T18:28:26+00:00 app[web.1]: UTER JOIN "events_users" ON "events_users"."event_id" = "events"."id" LEFT OUTER JOIN "users" ON "users"."id" = "events_users"."user_id" WHERE (`users`.id IN (2,4,17,1)) ORDER BY events.event_date DESC):
It works on my local machine because I am using SQLite, but it is not working on Heroku. So I have two questions:
1) Is there something else I can use instead of the backtick?
2) Should I be using Postgres instead of SQLite locally, so that my development machine matches my Heroku deployment?
You should be able to reference the column both with double quotes and without them: in SQL, double quotes are for identifiers (columns, tables) and single quotes are for values. Backticks are MySQL identifier syntax that SQLite tolerates but Postgres rejects, which is why this only breaks on Heroku. The result is something like:
OUTER JOIN "events_users" ON "events_users"."event_id" = "events"."id" LEFT OUTER JOIN "users" ON "users"."id" = "events_users"."user_id" WHERE ("users"."id" IN (2,4,17,1)) ORDER BY events.event_date DESC)
SQLite is acceptable for local development, though if you do want exact parity you could set up Postgres locally to ensure that the code you write runs identically in both environments.
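As a side note, letting ActiveRecord build the condition avoids hand-written quoting entirely, since identifiers are quoted per adapter. A sketch in current Rails syntax, with model and association names guessed from the query in the log:

# "users"."id" is quoted correctly for both SQLite and Postgres.
Event.includes(:users)
     .where(users: { id: [2, 4, 17, 1] })
     .references(:users)
     .order(event_date: :desc)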