Using Rails 4 and Mongoid 5.0.1
Our log output shows that almost all queries are duplicated. For a while I assumed it was just doubled output, but looking closer the execution times differ, which indicates that two separate requests are actually being sent:
d_562b2d81a54d7550ce000031.find | STARTED | {"find"=>"deals", "filter"=>{"contact_id"=>BSON::ObjectId('563bcb9da54d75116500010b')}}
d_562b2d81a54d7550ce000031.find | SUCCEEDED | 0.001186s
d_562b2d81a54d7550ce000031.find | STARTED | {"find"=>"deals", "filter"=>{"contact_id"=>BSON::ObjectId('563bcb9da54d75116500010b')}}
d_562b2d81a54d7550ce000031.find | SUCCEEDED | 0.0013s
This behaviour applies to most queries but not all; occasionally a specific query is only issued once. Those cases seem to happen when the query is against the database specified in mongoid.yml and is the first query in the web request.
The behaviour is not limited to web requests: any query run in the Rails console also outputs two log lines. It happens on 'where' queries and on 'find' too.
As this is a multi-tenant app, we have the following in most models:
store_in database: -> { Machine.current.database_name }
The collection for Machine (along with Users) is stored in the master_#{Rails.env} database
The duplicate requests (in the logs) are all against the correct databases though, so this might be a red herring.
When we were on Mongoid 3 this problem was never apparent, but Mongoid 5 has significantly better logging, so the problem may have existed then too but not been noticed.
Actually, I suspected it was a gem called bullet causing the duplicate logs, and turning it off solved my problems.
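For anyone else hitting this: Bullet is configured per environment, so switching it off in the environment file is enough to stop the doubled log lines. A minimal sketch, assuming the usual Bullet setup in development:

# config/environments/development.rb
config.after_initialize do
  Bullet.enable = false
end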
My main idea is to count the number of ActiveRecord queries for every API hit in Rails. I was looking into the ActiveSupport instrumentation API, and it already provides a couple of useful metrics:
view_runtime
db_runtime
I also found a couple of gems that count the number of queries, but they only add the query count to the log output:
https://github.com/rubysamurai/query_count
https://github.com/comboy/sql_queries_count
https://github.com/makandra/query_diet
For example, below is a sample log line from when I used the query_count gem:
Completed 200 OK in 140ms (Views: 12.7ms | ActiveRecord: 54.4ms | SQL Queries: 36 (0 cached) | Allocations: 20449)
But instead of a log line, is it possible to expose query_count via ActiveSupport events, maybe as part of the payload of process_action.action_controller?
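I don't think Rails adds a query count to that payload out of the box, but you can derive one yourself by subscribing to sql.active_record. A rough sketch (QueryCounter, the thread-local key and the SCHEMA/CACHE filtering are my own naming, not part of Rails, and cached-query detection differs between Rails versions):

# config/initializers/query_counter.rb
module QueryCounter
  def self.reset
    Thread.current[:query_count] = 0
  end

  def self.increment
    Thread.current[:query_count] = count + 1
  end

  def self.count
    Thread.current[:query_count] || 0
  end
end

# Count every SQL statement issued during a request (skip schema/cache lookups).
ActiveSupport::Notifications.subscribe("sql.active_record") do |_name, _start, _finish, _id, payload|
  QueryCounter.increment unless payload[:name] == "SCHEMA" || payload[:name] == "CACHE"
end

# Reset the counter when a controller action starts processing...
ActiveSupport::Notifications.subscribe("start_processing.action_controller") do |*_args|
  QueryCounter.reset
end

# ...and read it when the action finishes, alongside db_runtime/view_runtime.
ActiveSupport::Notifications.subscribe("process_action.action_controller") do |_name, _start, _finish, _id, payload|
  Rails.logger.info "query_count=#{QueryCounter.count} for #{payload[:controller]}##{payload[:action]}"
end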
In my experience, the easiest way to do Rails endpoint instrumentation is to use rails-panel.
It shows how long rendering took, how long the queries took, and how many queries were executed.
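Note that rails-panel is a Chrome extension; on the application side it needs the meta_request gem, development group only:

# Gemfile
group :development do
  gem 'meta_request'
end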
I'm trying to use the closure tree gem for modelling some (ordered) nested data.
The issue I'm having is that when I go to insert records into the database (MySQL), it takes about 7 seconds to insert the 200 children (well, 400 inserts).
I'm about to go down the route of a straight bulk insert / raw SQL in order to speed things up, though this means making sure I get the hierarchy calls etc. correct.
If anyone has a strategy out there for doing bulk inserts of children with closure_tree I'd love to see it.
My call to closure_tree is: has_closure_tree order: 'position'
I have also tried setting ActiveRecord::Base.connection.execute "set autocommit = 0;" (makes no difference) and turning off advisory_lock (also makes no difference)
[edit] also tried wrapping in a transaction where I was adding the children, no joy either.
[edit] have opened an issue (which I hate doing, but I'm hoping there's a strategy I can follow for this)
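For reference, the bulk-insert route I'm considering looks roughly like this (untested sketch; Category, name and position stand in for the real model and columns, children is an array of attribute hashes and parent is the existing root record): insert the rows directly with parent_id set, skipping the per-record callbacks, then let closure_tree regenerate the hierarchy table in one pass:

rows = children.map do |c|
  "(#{parent.id}, #{ActiveRecord::Base.connection.quote(c[:name])}, #{c[:position].to_i})"
end
ActiveRecord::Base.connection.execute(
  "INSERT INTO categories (parent_id, name, position) VALUES #{rows.join(', ')}"
)
Category.rebuild!   # regenerates the closure_tree hierarchy table from parent_id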
My program deals with a deeply nested object.
Here is an illustration of this nested model:
ParentObject HasMany ChildObject1 ~ 30 records
ChildObject1 HasMany ChildObject2 ~ 40 records
ChildObject2 HasMany ChildObject3 ~ 15 records
ChildObject2 HasMany ChildObject4 ~ 10 records
To keep the app efficient, I have decided to split the forms used to record this data (one form per ChildObject1). I also use caching, so I need to update ChildObject1's 'updated_at' field every time a ChildObject2, 3 or 4 is updated. For this reason, every childObject's 'belongs_to' relation has the 'touch' option set to true.
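For clarity, the associations look roughly like this (class names follow the illustration above; the exact columns are simplified):

class ChildObject2 < ActiveRecord::Base
  belongs_to :child_object1, touch: true   # bumps child_object1.updated_at on every save
  has_many :child_object3s
  has_many :child_object4s
end

class ChildObject3 < ActiveRecord::Base
  belongs_to :child_object2, touch: true
end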
With a small server, performance is not too bad (at most 1s to save the data).
But once everything is recorded, I also need to duplicate the parentObject with all its childObjects.
Duplicating it and building an identical parentObject in memory is no problem, but when I save the object, the transaction takes a very long time.
I looked at the server log and saw that the objects are inserted one by one. I also saw that after each insert, the parent's 'updated_at' field is updated (due to the 'touch: true' option).
That results in 30,000 inserts plus 60,000 updates, i.e. 90,000 write queries against the database (and each object can have 3 to 6 fields...)!
Normally, the 'save' method natively uses ActiveRecord::Base.transaction.
Here that doesn't seem to happen.
I tried removing the 'touch: true' option; it's exactly the same, the inserts are still done one by one.
So my questions are:
I thought transactions could be applied to nested objects as explained here. Am I misunderstanding something?
Is this an example of what shouldn't be done through ActiveRecord?
Is it possible to do only one final update of the parent objects with the 'touch: true' option? (SOLVED: SEE ANSWER BELOW)
Is writing 90,000 rows to the database at once normally a big job? Maybe the Puma server or the PostgreSQL database is simply misconfigured?
Thanks in advance for your help. If there's no solution, I will schedule this work to run overnight...
I solved the first part of the problem with https://github.com/godaddy/activerecord-delay_touching
This gem delays the 'touch' updates until the end of the batch. It's much cleaner this way!
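If I read the gem's README right, you wrap the batch in a block and the repeated touches are consolidated into a single update per parent at the end:

ActiveRecord::Base.delay_touching do
  duplicated_parent.save!   # nested saves happen here; the touch updates are deferred until the block ends
end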
But I still have problems with the transactions. I still don't know if I can insert all the data in one single query for each table.
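For the remaining part, one option would be the activerecord-import gem, which turns an array of records into a single multi-row INSERT per table. A rough sketch (model and attribute names are placeholders; note that import skips ActiveRecord callbacks such as touch):

new_children = original.child_object1s.map do |child|
  ChildObject1.new(child.attributes.except('id').merge('parent_object_id' => copy.id))
end
ActiveRecord::Base.transaction do
  ChildObject1.import new_children   # one INSERT statement for all rows
end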
I have a database with a few million entities needing a friendly_id slug. Is there a way to speed up the process of saving the entities? find_each(&:save) is very slow. At 6-10 per second I'm looking at over a week of running this 24/7.
I'm just wondering if there is a method within friendly_id or parallel processing trick that can speed this process up drastically.
Currently I'm running about 10 consoles, each one starting value 100k further along:
Model.where(slug: nil).find_each(start: value) do |e|
  puts e.id
  e.save
end
EDIT
Well, one of the biggest things causing the updates to be so insanely slow was the initial find query for each entity, not the actual saving of the record. I put the site live the other day and saw server database requests continually hitting 2000ms; the culprit was #entity = Entity.find(params[:id]), which was causing the most problems with 5+ million records. I hadn't realized there was no index on the slug column, and ActiveRecord runs its SELECT statements against the slug column. After indexing it properly, I get 20ms response times, and the loop above went from 1-2 entities per second to 1k per second. Running several of them in parallel got the job done quickly enough for this one-time operation.
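For reference, the fix was just a migration adding an index on the slug column (the table name here is a placeholder; friendly_id's docs recommend a unique index, and on Rails 5+ you would add the [x.y] version tag to the migration class):

class AddSlugIndexToEntities < ActiveRecord::Migration
  def change
    # Unique index so the slug lookups hit the index instead of scanning the table.
    add_index :entities, :slug, unique: true
  end
end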
I think the fastest way to do this would be to go straight to the database, rather than using Active Record. If you have a GUI like Sequel Pro, connect to your database (the details are in your database.yml) and execute a query there. If you're comfortable on the command line you can run it straight in the database console window. Ruby and Active Record will just slow you down for something like this.
To update all the slugs of a hypothetical table called "users" where the slug will be a concatenation of their first name and last name you could do something like this in MySQL:
UPDATE users SET slug = CONCAT(first_name, "-", last_name) WHERE slug IS NULL
I have been working in Rails (I mean serious work) for the last 1.5 years. Coming from a .NET background and database/OLAP development, there are many things I like about Rails, but a few things about it just don't make sense to me. I need some clarification on one such issue.
I have been working on an educational institute's admission process, which is just a small part of a much bigger application. For the administrator, we needed to display a list of all applied/enrolled students (which may range from 1,000 to 10,000), and also provide a way to export them as an Excel file. For now, we are just focusing on exporting in CSV format.
My questions are:
Is Rails meant to display so many records at the same time?
Is will_paginate the only way to paginate records in Rails? From what I understand, it still fetches all the records from the DB and then selectively displays the relevant ones. Back in .NET/PHP/JSP, we used to create a stored procedure and selectively return only the relevant records from there. Since using stored procedures is a known pain point in Rails, what other options do we have?
The same issue applies to exporting this data. I benchmarked the process, i.e. receiving the request at the server, executing the query, and returning the response. The ActiveRecord object creation was taking a helluva time. Why was that? There were only about 1,000 records, and the page showed a connection timeout to the user. I mean, if the connection times out while working on 1,000 records, then why use Rails at all, or does it mean Rails is not meant for such applications? I have previously worked with TBs of data and never had this issue.
I never understood ORM techniques at their core. Say we have a users table that is associated with multiple other tables, but for displaying records we only need data from users and its associated admissions table; does it actually create objects for all the associated tables? I know the data will only be fetched if we use the association, but does it create all the objects beforehand?
I hope these questions are related enough to each other and qualify as per the site's guidelines.
Thank you.
EDIT: Any help? I re-checked and benchmarked again: for 1,000 records, where we are joining 4-5 different tables (1,000 users, 2-3 one-to-one associations, and 2-3 one-to-many associations), more than 15,000 objects are created. This is with eager loading; with lazy loading it would be the 1,000-user query plus some 20+ queries. What other possible options are there for such problems and applications? I know I'm kind of bumping the question to the top again!
Rails can handle databases with TBs of data.
Is will_paginate the only way to paginate records in Rails?
There are many other gems like "kaminari".
it still fetches all the records from the DB...
No, it doesn't work that way. For example, take the following query: User.all.page(1).per(10)
User.all won't fire a DB query; it returns a proxy object, and page(1) and per(10) are called on that proxy (an ActiveRecord::Relation). Only when you try to access the data from the proxy object does it execute a DB query. ActiveRecord accumulates all the conditions and parameters you pass and executes a SQL query when required.
Go to the Rails console and type u = User.all; "f"; (the second statement, "f", is there to prevent the Rails console from calling to_s on the proxy to display the result).
It won't fire any query. Now try u[0]; that will fire a query.
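Pagination builds on the same lazy behaviour: page and per just add LIMIT/OFFSET to the relation, so only one page of rows ever comes back from the database. Roughly (exact quoting depends on your adapter):

User.page(2).per(10).to_sql
# => SELECT "users".* FROM "users" LIMIT 10 OFFSET 10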
ActiveRecord creation was taking a helluva time
1000 records shouldn't take much time.
Check the number of SQL queries fired against the DB. Look for signs of the N+1 problem and fix them by eager loading.
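A sketch using the users/admissions example from the question (assuming the association is declared as has_many :admissions):

# 2 queries total (users + admissions) instead of 1 + N:
users = User.includes(:admissions).limit(1000)
users.each do |user|
  user.admissions.size   # reads the preloaded records, fires no extra query
end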
Check the serialization of the records to CSV format for any CPU- or memory-intensive operations.
Use a profiler and track down the function that is consuming most of the time.