Strange Performance issues with where() vs find() - ruby-on-rails

Mongoid 3.1.6
Rails 3.2.21
MongoDB 2.4.9
We're seeing strange performance issues with find() vs where().first:
$ rails c
2.1.5 :001 > Benchmark.ms { User.find('5091e4beccbce30200000006') }
=> 7.95
2.1.5 :002 > Benchmark.ms { User.find('5091e4beccbce30200000006') }
=> 0.27599999999999997
2.1.5 :003 > Benchmark.ms { User.find('5091e4beccbce30200000006') }
=> 0.215
2.1.5 :004 > exit
$ rails c
2.1.5 :001 > Benchmark.ms { User.where(id: '5091e4beccbce30200000006').first }
=> 7.779999999999999
2.1.5 :002 > Benchmark.ms { User.where(id: '5091e4beccbce30200000006').first }
=> 4.84
2.1.5 :003 > Benchmark.ms { User.where(id: '5091e4beccbce30200000006').first }
=> 5.297
2.1.5 :004 > exit
These both appear to be firing off the same queries. Can someone explain why we're seeing such a huge difference in performance?
Configuration:
production:
  sessions:
    default:
      uri: <%= REDACTED %>
      options:
        consistency: :strong
        safe: true
        max_retries: 1
        retry_interval: 0
  options:
    identity_map_enabled: true

Here is my assumption about why the first call was so much slower (I am writing from the MongoDB point of view and have zero knowledge of Ruby).
The first time you ran the query, the document was not in the working set, which made it slower. On the following calls it was already there, so performance was better. If you have a small number of documents, I would find this behavior strange (because I would expect all of them to be in the working set).
The second part, with $where, surprises me. I would expect all of its numbers to be bigger than with find() (which is not the case for the first call) because:
The $where provides greater flexibility, but requires that the
database processes the JavaScript expression or function for each
document in the collection.

It appears that find uses the identity map, while where does not. If I set identity_map_enabled to false, then the performance of find vs where is identical.
Moral of the story: use find instead of where when possible.
I've heard that the identity map is removed in Mongoid 4.x. So maybe this issue only affects folks on older versions.
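To illustrate the idea, here is a minimal pure-Ruby sketch of an identity map. It is not Mongoid's implementation; `TinyStore`, `db_hits` and the sample rows are hypothetical names made up for this example. It only shows why a lookup by id can be served from memory while an arbitrary criteria query has to go to the database each time.

```ruby
# Hypothetical stand-in for a model layer; not Mongoid's actual code.
class TinyStore
  attr_reader :db_hits

  def initialize(rows, identity_map_enabled: true)
    @rows = rows                  # pretend database: id => attributes
    @identity_map = {}            # id => object that was already loaded
    @identity_map_enabled = identity_map_enabled
    @db_hits = 0
  end

  # Lookup by primary key: can be answered from the map without a query.
  def find(id)
    return @identity_map[id] if @identity_map_enabled && @identity_map.key?(id)

    @identity_map[id] = query { |row_id, _attrs| row_id == id }.first
  end

  # Arbitrary criteria: always sent to the "database" in this sketch.
  def where(&criteria)
    query(&criteria)
  end

  private

  # Counts every round trip and returns the matching attribute hashes.
  def query
    @db_hits += 1
    @rows.select { |id, attrs| yield(id, attrs) }.values
  end
end

store = TinyStore.new({ 'a1' => { name: 'Ann' }, 'b2' => { name: 'Bob' } })

3.times { store.find('a1') }
hits_after_find = store.db_hits

3.times { store.where { |id, _attrs| id == 'a1' }.first }
hits_after_where = store.db_hits - hits_after_find

puts "find:  #{hits_after_find} query"
puts "where: #{hits_after_where} queries"
```

With the map switched off (`identity_map_enabled: false`), `find` also queries on every call, which matches the observation that both methods then perform the same.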

Related

What's the fastest way to delete all errbit errors from mongodb?

I'd like to start over with errbit - there are millions of records in our mongodb database and we hadn't been cleaning them up. I'd like to start over, but I don't want to lose my user accounts.
I've tried to run these routines (https://mensfeld.pl/2015/01/making-errbit-work-faster-by-keeping-it-clean-and-tidy/):
bundle exec rake errbit:clear_resolved
desc 'Resolves problems that didnt occur for 2 weeks'
task :cleanup => :environment do
  offset = 2.weeks.ago
  Problem.where(:updated_at.lt => offset).map(&:resolve!)
  Notice.where(:updated_at.lt => offset).destroy_all
end
but the second one (deleting problems and notices over 2 weeks old), just seems to run forever.
Querying problems and notices collections via mongo shell doesn't seem to show any being deleted... we're using errbit V 0.7.0-dev and mongodb version 3.2.22.
Fastest way would be to get a mongo console and drop most of the collections. I'd say stop your errbit server, get a mongo console, connect to the db you use and run:
> db.errs.drop()
true
> db.problems.drop()
true
> db.backtraces.drop()
true
> db.notices.drop()
true
> db.comments.drop()
Problem.where(:updated_at.lt => 2.months.ago).destroy_all
runs too long because of the N+1 problem with the recursive deletion of Err, Notice and Comment records. Mongoid also does not support nested eager loading, so the only way to delete faster is to collect the ids manually and delete the records directly, without callbacks:
problem_ids = Problem.where(:updated_at.lt => 2.months.ago).pluck(:id)
err_ids = Err.where(problem_id: {:$in => problem_ids}).pluck(:id)
Notice.where(err_id:{:$in => err_ids}).delete_all
Err.where(id:{:$in => err_ids}).delete_all
Comment.where(problem_id: {:$in => problem_ids}).delete_all
Problem.where(id: {:$in => problem_ids}).delete_all
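With millions of records the id lists themselves can get very large, so it may help to issue the deletes in bounded slices. The sketch below is only an illustration under that assumption: `delete_in_batches` is a hypothetical helper, and a plain array of hashes stands in for the notices collection so the example runs without a database.

```ruby
# Hypothetical helper: run a delete step once per bounded slice of ids
# and add up how many records each step reports as removed.
def delete_in_batches(ids, batch_size: 1_000)
  deleted = 0
  ids.each_slice(batch_size) do |slice|
    deleted += yield(slice)
  end
  deleted
end

# In-memory stand-in for the notices collection (assumption for the demo).
notices = (1..2_500).map { |n| { id: n, err_id: n % 50 } }
err_ids = (0...25).to_a

removed = delete_in_batches(err_ids, batch_size: 10) do |slice|
  before = notices.size
  # Against a real database this block would be the delete_all call
  # from the answer, restricted to the ids in this slice.
  notices.reject! { |notice| slice.include?(notice[:err_id]) }
  before - notices.size
end

puts "removed #{removed}, remaining #{notices.size}"
```

The batch size is arbitrary here; the point is only that each delete handles a limited number of ids instead of one enormous list.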

Ruby date today? - unexpected behavior

I'm new to Ruby and am trying to figure out why the following doesn't work as expected:
2.2.1 :010 > user_date = Date.today
=> Sun, 31 May 2015
2.2.1 :011 > user_date.today?
=> false
I'm using the Rails console and the commands are executed one after the other (with maybe a second between executions). I'm sure there is nuance that I'm not understanding, but shouldn't the second command return true instead of false? If not, why?
Thanks in advance!
Edit #1 - Additional information requested by Arup
2.2.1 :013 > puts user_date.method(:today?).owner
DateAndTime::Calculations
=> nil
Edit #2 - So I had a hunch. I'm on US Eastern time and it was coming up to midnight when I ran into the original issue. I waited for the turn of midnight, and now the following works.
2.2.1 :004 > user_date = Date.today
=> Mon, 01 Jun 2015
2.2.1 :005 > user_date.today?
=> true
Date.today belongs to core Ruby while today? belongs to Rails.
Under the hood, today? calls Date.current (also from Rails) instead of Date.today.
Going a bit further, we find that Date.current takes the current Rails time zone into account if one is configured. That should be the source of your mismatch.
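The effect can be reproduced in plain Ruby without Rails. The sketch below takes one fixed instant shortly before midnight US Eastern and shows the calendar date in two zones. The concrete instant and the assumption that the Rails time zone is UTC are mine, chosen to match the dates in the question.

```ruby
require 'date'

# One instant: 23:30 on 31 May 2015 in US Eastern daylight time (UTC-4),
# which is already 03:30 on 1 June 2015 in UTC.
instant = Time.utc(2015, 6, 1, 3, 30)

# What a machine running on Eastern time sees (the Date.today side).
eastern_date = instant.getlocal('-04:00').to_date

# What an app whose configured zone is UTC sees (the Date.current side,
# assuming the Rails time zone is UTC).
utc_date = instant.getutc.to_date

puts eastern_date              # 2015-05-31
puts utc_date                  # 2015-06-01
puts eastern_date == utc_date  # false
```

Once the local clock passes midnight both zones agree on the date again, which fits what Edit #2 in the question observed.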

Best way to read very huge Excel [XLSX] using node or ruby

I need to parse an XLSX file which is around 25 MB in size [about 1 million records]. I read through a lot of Node modules, including the ones below:
https://github.com/trevordixon/excel.js
https://github.com/dkiyatkin/node-office
I also tried using the Ruby with Roo
https://github.com/Empact/roo
But they all hang. Is there any suggestion for doing this, or do I need to end up splitting the file into multiple smaller pieces?
This is what happens when using "oxcelix", as per the suggestion from "carlosramireziii":
https://github.com/gbiczo/oxcelix
2.0.0-p247 :001 > require 'oxcelix'
=> true
2.0.0-p247 :002 > s = Oxcelix::Workbook.new("/var/www/fullcontact/current/public/uploads/fileupload/filename/Book1.xlsx")
Killed
root#createresume:/var/www/fullcontact/current/public/uploads# irb
2.0.0-p247 :001 > require 'oxcelix'
=> true
2.0.0-p247 :002 > s = Oxcelix::Workbook.new("/var/www/fullcontact/current/public/uploads/fileupload/filename/Book1.xlsx")
Errno::EEXIST: File exists - /var/www/fullcontact/shared/uploads/tmp
from /usr/local/rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/fileutils.rb:245:in `mkdir'
from /usr/local/rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/fileutils.rb:245:in `fu_mkdir'
from /usr/local/rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/fileutils.rb:174:in `block in mkdir'
from /usr/local/rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/fileutils.rb:173:in `each'
from /usr/local/rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/fileutils.rb:173:in `mkdir'
from /usr/local/rvm/gems/ruby-2.0.0-p247/gems/oxcelix-0.3.2/lib/oxcelix/workbook.rb:52:in `initialize'
from (irb):2:in `new'
from (irb):2
from /usr/local/rvm/rubies/ruby-2.0.0-p247/bin/irb:13:in `<main>'
2.0.0-p247 :003 > exit
root#createresume:/var/www/fullcontact/current/public/uploads# rm -rf tmp/
root#createresume:/var/www/fullcontact/current/public/uploads# irb
2.0.0-p247 :001 > require 'oxcelix'
=> true
2.0.0-p247 :002 > s = Oxcelix::Workbook.new("/var/www/fullcontact/current/public/uploads/fileupload/filename/Book1.xlsx")
Killed
root#createresume:/var/www/fullcontact/current/public/uploads#
Depending on the parsing library you use, your parsing routine might be attempting to turn the entire XLSX file into objects which then get stored in memory. For very large files, this could result in the hanging behavior that you are seeing.
One option which is frequently used to avoid this issue is a SAX parser. Rather than parsing the entire file at once, a SAX parser reads the document sequentially, one piece at a time, and hands each piece to your code as an event, so it avoids the memory explosion of the former method.
For parsing XLSX documents, you should try the Oxcelix gem for Ruby which uses a SAX parser under the covers.
https://github.com/gbiczo/oxcelix
UPDATE:
Unfortunately, the Oxcelix gem does use SAX parsing under the covers but it then returns the result of the parsing as an array, which, in the case of very large files, will blow up in memory.
If you were able to convert your Excel sheet into XML, then you could make use of any SAX-style parser. In this case, I would recommend this fork of SAXMachine, which allows you to create declarative models and returns them sequentially using the lazy option.
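As a rough sketch of what SAX-style parsing looks like, the example below uses REXML's stream parser, which ships with Ruby (I am assuming the `rexml` gem is available in your installation). The `RowCounter` class and the tiny inline XML are made up for illustration; a real worksheet would first have to be extracted from the XLSX archive and opened as a file.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'
require 'stringio'

# Hypothetical listener: reacts to each opening tag as it streams past
# and keeps only two counters, never the whole document.
class RowCounter
  include REXML::StreamListener

  attr_reader :rows, :cells

  def initialize
    @rows = 0
    @cells = 0
  end

  def tag_start(name, _attributes)
    @rows += 1 if name == 'row'
    @cells += 1 if name == 'c'
  end
end

# Tiny stand-in for worksheet XML; for a large file pass File.open(path).
xml = StringIO.new(
  '<worksheet><sheetData>' \
  '<row r="1"><c r="A1"><v>1</v></c><c r="B1"><v>2</v></c></row>' \
  '<row r="2"><c r="A2"><v>3</v></c></row>' \
  '</sheetData></worksheet>'
)

listener = RowCounter.new
REXML::Document.parse_stream(xml, listener)

puts "rows: #{listener.rows}, cells: #{listener.cells}"
```

Memory use stays tied to what the listener chooses to keep, which is why this style copes with files that tree-building parsers choke on.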
I had a similar problem with a very large XML file. Performance-wise it is best to "cut" it down into smaller chunks and process each of them separately.

Rails - tinytds crashing ruby

When querying an MSSQL 2008 database using FreeTDS and the tiny_tds gem with the syntax below:
db = TinyTds::Client.new(:username => ...)
select = db.execute("EXEC dbo.__stored_procedure__")
db.close
Then this line is causing ruby to crash on windows:
select.each {|x| p x}
Strange thing, when querying simple select:
select = db.execute("SELECT field FROM table")
select.each doesn't crash, but it doesn't loop over anything either.
It doesn't crash WEBrick or the Rails console either.
But when I change code to:
db = TinyTds::Client.new(:username => ...)
select = []
db.execute("EXEC dbo.__stored_procedure__").each { |x|
  select << x
}
db.close
Then it works like a charm (even with select).
I don't know how it behaves on operating systems better than Windows...
Your expectations are incorrect. I suggest you read over the TinyTDS usage here.
https://github.com/rails-sqlserver/tiny_tds#tinytdsclient-usage
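As far as I understand the README, rows are loaded lazily when you call each on the result, so the connection has to stay open until you have iterated. The following is only an analogy in plain Ruby, not TinyTDS code: a `StringIO` stands in for the connection, and its lazy line enumerator stands in for the result set.

```ruby
require 'stringio'

# Stand-in for a connection whose rows are only read on iteration.
conn = StringIO.new("row1\nrow2\n")
rows = conn.each_line   # lazy enumerator: nothing has been read yet
conn.close              # closing first, as in the crashing version

begin
  rows.to_a
  outcome = :ok
rescue IOError
  # The source is already closed, so the rows can no longer be read.
  outcome = :failed
end

# Working order: consume the rows first, then close.
conn = StringIO.new("row1\nrow2\n")
collected = conn.each_line.map(&:chomp)
conn.close

puts outcome
puts collected.inspect
```

Here the late read fails with a clean Ruby exception; with a native extension such as TinyTDS the same mistake can evidently end in a crash instead, which is why the second version in the question (iterate, then close) works.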

Strange Rails console behaviour

When I run a multi-line statement in the Rails 3.0.1 console, pressing enter doesn't actually run the statement. Instead, it goes to a new console line, and the cursor has been tabbed to the right. Then I have to run a basic line (like p "hey"), and then the multi-line statement will run.
ruby-1.9.2-p0 > images = Image.all;images.each do |im|; if im.imagestore_width.blank?;im.save;end;
ruby-1.9.2-p0 > p "hey"
I've been doing it like this for a while and it's been working okay. But now I've got a problem in the console that might be related. When I ran the above code, instead of working like it normally does, it just went to a new console line with a ? added
ruby-1.9.2-p0 > images = Image.all;images.each do |im|; if im.imagestore_width.blank?;im.save;end;
ruby-1.9.2-p0 > p "hey"
ruby-1.9.2-p0 ?>
When it does this, I can't exit the console
ruby-1.9.2-p0 ?> exit
ruby-1.9.2-p0 ?> ^C
Are these problems related? How can I fix them?
In the line:
images = Image.all;images.each do |im|; if im.imagestore_width.blank?;im.save;end;
You have an end to close the if but not an end to close the do block of the each.
This is why the console is redisplaying the prompt asking for more input before executing your statements.
Try:
images = Image.all;images.each do |im|; if im.imagestore_width.blank?;im.save;end;end
Notice, you will see the same behaviour with brackets. irb or console won't execute until the brackets balance e.g.
irb(main):010:0> (3 *
irb(main):011:1* (2 + 1)
irb(main):012:1> )
=> 9
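You can check whether a snippet is complete without running it. This is just a sketch using Ripper from the standard library; irb's own logic is more involved, and `complete_ruby?` is a helper name I made up. `Ripper.sexp` returns nil when the source does not parse, which includes a missing end or an unclosed bracket.

```ruby
require 'ripper'

# Hypothetical helper: true when the source parses as complete Ruby.
# The code is only parsed, never executed.
def complete_ruby?(source)
  !Ripper.sexp(source).nil?
end

missing_end = 'images.each do |im|; if im.blank?; im.save; end;'
balanced    = 'images.each do |im|; if im.blank?; im.save; end; end'

puts complete_ruby?(missing_end)      # false: the do block is still open
puts complete_ruby?(balanced)         # true
puts complete_ruby?('(3 *')           # false: bracket not closed
puts complete_ruby?('(3 * (2 + 1))')  # true
```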
Dunno what's wrong with irb/console but your ruby code could look a lot nicer:
images = Image.all.each { |im| im.save if im.imagestore_width.blank? }
The general consensus is to use {} rather than do/end for single line blocks in ruby.
