Rails reuse object to save memory in bulk import - ruby-on-rails

I'm currently using SmarterCSV to do bulk CSV import via MongoDB's upsert commands. I have the following code excerpt:
SmarterCSV.process(csv, csv_options) do |chunk|
  chunk.each do |row|
    # creates a temporary user to store the object
    user = User.new
    # converts row info to populate user object
    # creates an array of commands that can be executed by MongoDB via user.as_document
    updates << {:q => {:email => user.email},
                :u => {:$set => user.as_document},
                :multi => false,
                :upsert => true}
    user = nil
  end
end
However, I'm noticing that the memory usage keeps growing as the Garbage Collection (using Rails 3.2.14 & Ruby 2.0.0p353) doesn't seem to clear the temporary user objects fast enough.
So I tried creating user = User.new outside of the SmarterCSV process (see below) and reusing the user object within it. This saves memory. However, user.as_document would then overwrite the previous elements in the updates array on each iteration. I was able to work around that with user.as_document.to_json, but that doesn't set any of User's relations correctly. For example, instead of saving a BSON reference for a relation's id, it only saves the id as a string.
Any ideas? Is there a way that I can optimize the bulk import process?
user = User.new
SmarterCSV.process(csv, csv_options) do |chunk|
  chunk.each do |row|
    # converts row info to populate & reuse the single user object
    # creates an array of commands that can be executed by MongoDB via user.as_document.to_json
    updates << {:q => {:email => user.email},
                :u => {:$set => user.as_document.to_json},
                :multi => false,
                :upsert => true}
  end
end

I ended up fixing this by using user.as_document.deep_dup.
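To illustrate why a deep copy helps here, below is a minimal plain-Ruby sketch. It assumes the overwriting happened because every entry in updates pointed at the same underlying hash. The deep_copy helper and the doc hash are hypothetical stand-ins for deep_dup and user.as_document, not the Mongoid API itself.

```ruby
# Hypothetical helper standing in for deep_dup: recursively copies hashes,
# arrays and strings so the result shares no mutable state with the original.
def deep_copy(value)
  case value
  when Hash   then value.each_with_object({}) { |(k, v), h| h[k] = deep_copy(v) }
  when Array  then value.map { |v| deep_copy(v) }
  when String then value.dup
  else value
  end
end

doc = { "email" => nil, "tags" => [] } # one reused "document" (assumption)
shared = []
copied = []

%w[a@example.com b@example.com].each do |email|
  doc["email"] = email
  doc["tags"].replace([email.split("@").first])
  shared << doc             # every element is the very same object
  copied << deep_copy(doc)  # every element is an independent snapshot
end

shared_emails = shared.map { |d| d["email"] }
copied_emails = copied.map { |d| d["email"] }
```

In this sketch shared_emails ends up holding the last email twice, while copied_emails keeps one email per row, which is the behaviour wanted for the updates array.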

Related

How to save google.maps.Data.MultiPolygon to geometry datatype column in postgres database in rails?

I am a beginner with the Rails framework, so please pardon my naive question. I have a google.maps.Data.MultiPolygon object on my frontend which I want to save in my database. The searches table gets a new entry every time a user searches. It contains several columns, and I have added another column with datatype :geometry, which will be updated when the user draws a polygon for a specific search. So we need to update a search entry in the database, for which I am using a PUT call. I cannot send the whole google.maps.Data.MultiPolygon object in the PUT call, since $.param() is unable to serialise the object (this is the same problem as the one faced here).
var polygons = new google.maps.Data.MultiPolygon([polygon]);
var search_params = $.param({search: $.extend(this.state.search, {user_id: Request.user_id, search_type: search_type})});
Request.put('searches/'+this.state.search['id'], search_params, function(){});
Uncaught TypeError: Cannot read property 'lat' of undefined(…)
So, I would need to send an array of location objects. Is there a specific format in which this array of location object is directly converted to geometry object on the update function? Here is the update function which is called on search update in ruby:
def update
search = Search.find(params[:id])
if search.update_attributes(search_params)
render :json => {:recent => search}, :status => 200
else
render :json => {:error => "Could not update search."}, :status => 422
end
end
And the search_params is:
def search_params
params.require(:search).permit(
:name,
:user_id,
:q,
:drawn_polygons #This is the column I want to be updated with geometry object
)
end
Would something like :drawn_polygons => RGeo::Geos::CAPIMultiPolygonImpl work? If it does, what format of :drawn_polygons object do I need to provide?
OR
I need to take the :drawn_polygons as a list of coordinates, and change it to a RGeo::Geos::CAPIMultiPolygonImpl inside the update function? If so, how to do it?
def update
  search = Search.find(params[:id])
  search_parameters = search_params
  drawn_polygons = JSON.parse(URI.decode(search_parameters[:drawn_polygons]))
  # Some code to change the search_params[:drawn_polygons] to geometry object, specifically RGeo::Geos::CAPIMultiPolygonImpl
  search_parameters[:drawn_polygons] = drawn_polygons
  if search.update_attributes(search_parameters)
    render :json => {:recent => search}, :status => 200
  else
    render :json => {:error => "Could not update search."}, :status => 422
  end
end
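One possible direction for the second option, sketched in plain Ruby: turn the decoded coordinate list into a WKT MULTIPOLYGON string, which geometry libraries can generally parse. The payload shape (an array of polygons, each an array of rings, each an array of lat/lng hashes) and the multipolygon_wkt helper are assumptions made for illustration. Handing the string to an RGeo factory (for example through a WKT parsing method) is not shown or verified here.

```ruby
require 'json'

# Hypothetical helper: builds a WKT MULTIPOLYGON string from an assumed
# payload of polygons -> rings -> {"lat" => ..., "lng" => ...} points.
def multipolygon_wkt(polygons)
  polygon_texts = polygons.map do |rings|
    ring_texts = rings.map do |ring|
      # WKT expects "x y", i.e. longitude first, then latitude
      points = ring.map { |pt| [Float(pt["lng"]), Float(pt["lat"])] }
      # WKT rings must be closed: repeat the first point at the end if needed
      points << points.first unless points.first == points.last
      "(" + points.map { |x, y| "#{x} #{y}" }.join(", ") + ")"
    end
    "(" + ring_texts.join(", ") + ")"
  end
  "MULTIPOLYGON (" + polygon_texts.join(", ") + ")"
end

payload = '[[[{"lat":0,"lng":0},{"lat":0,"lng":1},{"lat":1,"lng":1}]]]'
wkt = multipolygon_wkt(JSON.parse(payload))
```

Sending plain lat/lng pairs from the frontend also sidesteps the $.param() serialisation error, since only simple numbers travel over the wire.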

Rails CSV upload to update records - attribute not saved to db

I have a system for updating InventoryItem records using a CSV upload.
I have this controller method:
def import
InventoryItem.import(params[:file], params[:store_id])
redirect_to vendors_dashboard_path, notice: "Inventory Imported."
end
Which of course calls this model method:
def self.import(file, store_id)
CSV.foreach(file.path, headers: true) do |row|
inventory_item = InventoryItem.find_or_initialize_by_code_and_store_id(row[0], store_id)
inventory_item.update_attributes(:price => row.to_hash.slice(:price))
end
end
I want to update only the :price attribute, because :code and :store_id won't change. Currently the imported records all have a price of 0.0 (big decimal). Not nil, or the correct value, but 0.0, so clearly I'm doing something wrong. I know that when I do this in the console it looks something like this:
inventory_item = InventoryItem.find_by_id(1)
inventory_item.update_attributes(:price => 29.99)
Any ideas on why I'm not updating the price attribute correctly?
Trying this, it doesn't look like CSV returns symbolized hash keys, so slice(:price) finds nothing there; slice would also hand back a hash rather than the value itself. How about this inside your CSV.foreach loop:
inventory_item.update_attributes(:price => row.to_hash["price"])
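A small sketch of the key mismatch, using an in-memory CSV string instead of an uploaded file. The sample data is made up for illustration, and Hash#slice needs Ruby 2.5+ or ActiveSupport.

```ruby
require 'csv'

data = "code,price\nA1,29.99\nB2,5.00\n" # made-up sample rows

rows  = CSV.parse(data, headers: true)
first = rows.first

string_keyed = first.to_hash               # keys are the header strings
missed       = string_keyed.slice(:price)  # symbol key does not match anything
price        = first["price"]              # look the value up by string header

# Alternatively, ask CSV to symbolize the headers while parsing
sym_rows  = CSV.parse(data, headers: true, header_converters: :symbol)
sym_price = sym_rows.first[:price]
```

Either way the value comes back as a string, so the model's decimal column does the final conversion.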

Trouble Importing Data into Rails, Strange Looping

I am trying to import data into rails (3.1) and I have created this rake task to parse a CSV file (generated by Excel on Mac)
desc "Import users."
task :import_users => :environment do
File.open("users.csv", "r").each do |line|
id, name, age, email = line.strip.split(',')
u = User.new(:id => id, :name => name, :age => age, :email => email)
u.save
end
end
However when I run the rake task, only the first line of the CSV file gets imported. It does not iterate over every line in the file besides the first one. Can anyone tell me why?
Not sure, but File#each splits on "\n" by default, so if the file doesn't use Unix line endings the whole file may be read as a single line. I'd try a CSV parser instead:
CSV.foreach("users.csv") do |line|
id, name, age, email = line
u = User.new(:id => id, :name => name, :age => age, :email => email)
u.save
end
When parsing any kind of text file using ruby, be sure to check encoding and/or line endings to make sure it's a format that Ruby likes.
In this case, Ruby disliked the Mac OS X line ending format, but liked the Unix one.
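The line-ending effect can be reproduced without a file. This sketch assumes the export used bare carriage returns ("\r") as line endings; the sample rows are made up.

```ruby
# Two records separated by "\r" only (assumed export format)
mac_data = "1,Ann,30,ann@example.com\r2,Bob,41,bob@example.com\r"

# each_line splits on "\n" by default, so everything is one "line"
lines_default = mac_data.each_line.to_a

# Normalizing "\r\n" and bare "\r" to "\n" restores one record per line
normalized  = mac_data.gsub(/\r\n?/, "\n")
lines_fixed = normalized.each_line.map(&:chomp)

records = lines_fixed.map { |line| line.split(",") }
```

The same normalization can be applied to a file's contents before handing them to a parser.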

Trouble importing csv file with ruby CSV Module

I'm trying to use Ruby's csv module to import the records contained in a csv file to my local table in a Ruby on Rails 3 application.
The table was created through the creation of model Movie.
Here is what I've been executing in console:
require 'csv'
CSV.foreach('public/uploads/VideoTitles2.csv') do |row|
record = Movie.new(
:media_format => row[0],
:title => row[1],
:copies_at_home => row[2],
:order => row[3]
)
record.save
end
The rows of the csv file match (in data type) the columns they're being passed into. Here is a shortened version of the csv file (VideoTitles2.csv) I'm attempting to import:
"DVD","LEAP OF FAITH",1,1
"DVD","COCOON",1,2
"DVD","TITANIC",1,3
where each record is separated by \n I believe. This csv file was exported from Access and its original file extension was .txt. I've manually changed it to .csv for sake of the import.
The problem is that, after executing the above lines in rails console, I get the following output:
=> nil
The import doesn't seem to happen. If anyone has an idea as to how I could remedy this I'd really appreciate it.
I don't see the problem. This code snippet returns nil because CSV.foreach returns nil, but that says nothing about whether the loop ran or not. Did you check if any Movie was created? Did you include any debug lines to follow the process?
You may want to check the output of record.save (or call record.save!); maybe validation errors are preventing the record from being created. Also, if you want the loop to return the created records, you can write this (Ruby >= 1.8.7):
require 'csv'
records = CSV.foreach('public/uploads/VideoTitles2.csv').map do |media_format, title, copies_at_home, order|
Movie.create!({
media_format: media_format,
title: title,
copies_at_home: copies_at_home,
order: order,
})
end
Okay there were two things I had wrong:
The exported csv file should not have quotations around the strings - I just removed them.
Thanks to tokland, the record.save! was necessary (as opposed to the record.save I was doing) - validation errors were preventing the records from being created.
So to conclude, one could just create the following function after creating the model/table Movie:
class Movie < ActiveRecord::Base
attr_accessible :media_format, :title, :copies_at_home, :order
require 'csv'
def self.import_movies()
CSV.foreach('public/uploads/movies.csv') do |row|
record = Movie.new(
:media_format => row[0],
:title => row[1],
:copies_at_home => row[2],
:order => row[3]
)
record.save!
end
end
end
Where movies.csv looks like the following:
Blu-ray, Movie 1, 1, 1
DVD, Movie 2, 1, 2
Blu-ray, Movie 3, 1, 3
then call this function in console as such:
Movie.import_movies()
and, as expected, all that would be returned in the console would be:
=> nil
Check your index view (if you've created one) and you should see that the records were successfully imported into the movies table.
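As a side note, Ruby's CSV parser normally copes with double-quoted fields, so the quotes may not have been the real culprit. The sketch below parses one line from each sample file shown above. It also shows that the comma-plus-space layout leaves leading spaces in the fields, which may matter for validations.

```ruby
require 'csv'

# One line from the Access export and one from the hand-edited file
quoted   = CSV.parse_line('"DVD","LEAP OF FAITH",1,1')
unquoted = CSV.parse_line('Blu-ray, Movie 1, 1, 1')

# Fields after ", " keep their leading space unless stripped
stripped = unquoted.map(&:strip)
```

All values arrive as strings in both cases; the model's column types handle the conversion on save.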

Overriding id on create in ActiveRecord

Is there any way of overriding a model's id value on create? Something like:
Post.create(:id => 10, :title => 'Test')
would be ideal, but obviously won't work.
id is just attr_protected, which is why you can't use mass-assignment to set it. However, when setting it manually, it just works:
o = SomeObject.new
o.id = 8888
o.save!
o.reload.id # => 8888
I'm not sure what the original motivation was, but I do this when converting ActiveHash models to ActiveRecord. ActiveHash allows you to use the same belongs_to semantics in ActiveRecord, but instead of having a migration and creating a table, and incurring the overhead of the database on every call, you just store your data in yml files. The foreign keys in the database reference the in-memory ids in the yml.
ActiveHash is great for picklists and small tables that change infrequently and only change by developers. So when going from ActiveHash to ActiveRecord, it's easiest to just keep all of the foreign key references the same.
You could also use something like this:
Post.create({:id => 10, :title => 'Test'}, :without_protection => true)
Although as stated in the docs, this will bypass mass-assignment security.
Try
a_post = Post.new do |p|
p.id = 10
p.title = 'Test'
p.save
end
that should give you what you're looking for.
For Rails 4:
Post.create(:title => 'Test').update_column(:id, 10)
Other Rails 4 answers did not work for me. Many of them appeared to change when checking using the Rails Console, but when I checked the values in MySQL database, they remained unchanged. Other answers only worked sometimes.
For MySQL at least, assigning an id below the auto increment id number does not work unless you use update_column. For example,
p = Post.create(:title => 'Test')
p.id
=> 20 # 20 was the id the auto increment gave it
p2 = Post.create(:id => 40, :title => 'Test')
p2.id
=> 40 # 40 > the next auto increment id (21) so allow it
p3 = Post.create(:id => 10, :title => 'Test')
p3.id
=> 10 # Go check your database, it may say 41.
# Assigning an id to a number below the next auto generated id will not update the db
If you change create to use new + save you will still have this problem. Manually changing the id like p.id = 10 also produces this problem.
In general, I would use update_column to change the id even though it costs an extra database query because it will work all the time. This is an error that might not show up in your development environment, but can quietly corrupt your production database all the while saying it is working.
we can override attributes_protected_by_default
class Example < ActiveRecord::Base
def self.attributes_protected_by_default
# default is ["id", "type"]
["type"]
end
end
e = Example.new(:id => 10000)
Actually, it turns out that doing the following works:
p = Post.new(:id => 10, :title => 'Test')
p.save(false)
As Jeff points out, id behaves as if it is attr_protected. To prevent that, you need to override the list of default protected attributes. Be careful doing this anywhere that attribute information can come from the outside. The id field is protected by default for a reason.
class Post < ActiveRecord::Base
private
def attributes_protected_by_default
[]
end
end
(Tested with ActiveRecord 2.3.5)
Post.create!(:title => "Test") { |t| t.id = 10 }
This doesn't strike me as the sort of thing that you would normally want to do, but it works quite well if you need to populate a table with a fixed set of ids (for example when creating defaults using a rake task) and you want to override auto-incrementing (so that each time you run the task the table is populated with the same ids):
post_types.each_with_index do |post_type, i|
  PostType.create!(:name => post_type) { |t| t.id = i + 1 }
end
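A plain-Ruby sketch of the index-to-id pairing on its own, without ActiveRecord. The post_types list is a made-up example. each.with_index(1) is used here as an alternative that starts counting at 1, so no + 1 is needed.

```ruby
post_types = %w[article video link] # hypothetical fixed list of names

# Pair every name with a stable, 1-based id derived from its position
id_name_pairs = post_types.each.with_index(1).map { |name, id| [id, name] }
```

Because the ids depend only on position in the list, rerunning the task yields the same ids as long as the list order is unchanged.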
Put this create_with_id function at the top of your seeds.rb and then use it to do your object creation where explicit ids are desired.
def create_with_id(clazz, params)
obj = clazz.send(:new, params)
obj.id = params[:id]
obj.save!
obj
end
and use it like this
create_with_id( Foo, {id:1,name:"My Foo",prop:"My other property"})
instead of using
Foo.create({id:1,name:"My Foo",prop:"My other property"})
This is a similar case, where it was necessary to overwrite the id with a value derived from a date:
# in app/models/calendar_block_group.rb
class CalendarBlockGroup < ActiveRecord::Base
...
before_validation :parse_id
def parse_id
self.id = self.date.strftime('%d%m%Y')
end
...
end
And then :
CalendarBlockGroup.create!(:date => Date.today)
# => #<CalendarBlockGroup id: 27072014, date: "2014-07-27", created_at: "2014-07-27 20:41:49", updated_at: "2014-07-27 20:41:49">
Callbacks work fine.
Good luck!
For Rails 3, the simplest way to do this is to use new with the without_protection option, and then save:
Post.new({:id => 10, :title => 'Test'}, :without_protection => true).save
For seed data, it may make sense to bypass validation which you can do like this:
Post.new({:id => 10, :title => 'Test'}, :without_protection => true).save(validate: false)
We've actually added a helper method to ActiveRecord::Base that is declared immediately prior to executing seed files:
class ActiveRecord::Base
def self.seed_create(attributes)
new(attributes, without_protection: true).save(validate: false)
end
end
And now:
Post.seed_create(:id => 10, :title => 'Test')
For Rails 4, you should be using StrongParams instead of protected attributes. If this is the case, you'll simply be able to assign and save without passing any flags to new:
Post.new(id: 10, title: 'Test').save # optionally pass `{validate: false}`
In Rails 4.2.1 with Postgresql 9.5.3, Post.create(:id => 10, :title => 'Test') works as long as there isn't a row with id = 10 already.
You can insert the id with raw SQL:
conn = ActiveRecord::Base.connection
arr = record_line.strip.split(",")
# quote every value so strings and timestamps become valid, escaped SQL literals
values = arr.first(7).map { |v| conn.quote(v) }.join(", ")
sql = "insert into records(id, created_at, updated_at, count, type_id, cycle, date) values(#{values})"
conn.execute sql
