Rails duplicate key error for mongodb in batch insert - ruby-on-rails

On my Rails 4.2 app using MongoDB v5, I have some data in this array format which I have to insert into the database:
array_to_be_inserted = [
{'unique_key' => '12234'},
{'unique_key' => '3214'},
{'unique_key' => '32142'}
]
SomeModel.create(array_to_be_inserted) # for inserting
In the database I already have, let's say, the '12234' unique key, so this throws an exception and the code stops, and the remaining data does not get inserted (i.e. the 3214 and 32142 keys will not get inserted even though they are not present in the database). Even if I do rescue Exception, the code continues but the insert still fails.
Is there any way to get around this which can be instant?
I already tried making an array of those unique keys, did SomeModel.in(array_of_unique_keys), and then filtered the array so that the new array becomes this:
array_to_be_inserted = [
{'unique_key' => '3214'},
{'unique_key' => '32142'}
]
I use this to filter the array:
array_to_be_inserted = array_to_be_inserted.select { |x| existing_data.none? { |y| x['unique_key'] == y['unique_key'] } }
But the problem is that the array filter takes time and memory, and meanwhile some other person may store that unique key, so the code fails once again.
I need something instant, ideally a single query. For example, in MySQL we can simply do INSERT IGNORE INTO; isn't there something like that for MongoDB which is fast?

I have been facing a similar issue with a Rails application and a PostgreSQL db. I need to insert millions of rows, and if I try to check the existing primary keys first my app crashes.
I am using the bulk_insert gem, which allows the option ignore: true. However, PostgreSQL syntax does not support the INSERT IGNORE statement, so it was quite a useless option in my case.
destination_columns = [:title, :author]
# Ignore bad inserts in the batch
Book.bulk_insert(*destination_columns, ignore: true) do |worker|
worker.add(...)
worker.add(...)
# ...
end
If you are using activerecord-import you can refer to this blog post:
Author.import(
[:name, :key],
rows_to_import_second,
on_duplicate_key_update: [:name],
validate: false
)
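If duplicates should simply be skipped rather than updated, activerecord-import also exposes an on_duplicate_key_ignore option (a sketch; whether it applies depends on your adapter and gem version, and on PostgreSQL it needs 9.5+):
Author.import(
  [:name, :key],
  rows_to_import_second,
  on_duplicate_key_ignore: true,
  validate: false
)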
However, it seems that this gem also doesn't have an adapter for mongo (check the source code).
I guess that a combination of these two SO answers might fix your problem:
Insert many mongo - continue on error ruby / Ruby Mongo equivalent for mysql insert ignore
Mongoid ignores collection.insert if at least one duplicate exists
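Those answers boil down to doing an unordered bulk insert through the driver, which continues past duplicate-key errors instead of stopping at the first one. A minimal sketch, assuming Mongoid on top of the official mongo Ruby driver (note this bypasses Mongoid validations and callbacks):
begin
  SomeModel.collection.insert_many(array_to_be_inserted, ordered: false)
rescue Mongo::Error::BulkWriteError => e
  # Duplicate-key failures are reported here; the non-duplicate documents were still inserted
end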

Related

Active record querying with store_accessors

I have a database where I need to search for specific records that have a certain value. What is making this tricky for me is that these values are found in a store_accessor and therefore they aren't always there.
For instance, if I run Team.last.team_configuration, I get the value below, and what I need are only the teams that have a specific setting.
<TeamConfiguration:0x00007123456987> {
:id => 8,
:owner_id => 6,
:team_type => "football",
:settings => {
"disable_coach_add" => false,
"delink_players_at_18" => true
},
:type => "TeamConfiguration"
}
My thoughts have been something along these lines, but I keep getting undefined method 'settings' for team_configuration:Symbol:
Team.where(:team_configuration.settings['delink_players_at_18'])
Would anyone know what I am doing wrong in this instance? I think the setting being nested two levels away from the main model has been causing me some issues. Thanks in advance!
The problem is the way store_accessor works; look at what the documentation says:
Store gives you a thin wrapper around serialize for the purpose of
storing hashes in a single column. It's like a simple key/value store
baked into your record when you don't care about being able to query
that store outside the context of a single record.
https://api.rubyonrails.org/classes/ActiveRecord/Store.html
So a possible solution could be to search by that column, converting a hash of what you want into a string beforehand.
Team.where(team_configuration: data.to_s)
If you're using a Postgres database and the TeamConfiguration#settings column is stored in a jsonb column, you can get at this with Postgres' JSON operators (using the containment operator @>):
Team.joins(:team_configurations)
.where("team_configurations.settings @> '{\"delink_players_at_18\": true}'")

Simple Rails array not saving to array column

I'm working with Rails 4.1.0, Ruby 2.1.0, and Postgres 9.3.0.0.
I'm trying to save changes to a column which is an array of hstores (an array of hashes in ruby parlance).
Here's the (simplified) code in the product model, used for saving the changes:
def add_to_cart_with_credits(cart)
cart.added_product_hashes << {"id" => self.id.to_s, "method" => "credits"}
# For some reason, this isn't saving (without the exclamation, as well)
cart.save!
end
A few things of note:
The cart is initialised with an empty array in the added_product_hashes column
I'm storing added products as hashes because there are a couple of ways to add products to the cart, and they each require their own logic to remove and customise.
Each user has their own cart, and needs to reference it later, which is why carts are saved to the DB like this (instead of using a session variable, I guess).
Any ideas what I'm doing wrong? I'm not seeing an error: the server logs note that the cart.added_product_hashes column updates correctly, but the changes don't persist.
SOLUTION
As James pointed out, << doesn't flag the record as being dirty, as it edits the array in-place. While I wasn't changing the hstores themselves within the array column, it appears that changes to the enclosing array aren't picked up unless the attribute is explicitly reconstructed. The below code fixes the problem:
def add_to_cart_with_credits(cart)
cart.added_product_hashes = cart.added_product_hashes + [{"id" => self.id.to_s, "method" => "credits"}]
# Now we're all good to go!
cart.save!
end
James also suggests a particular method that would be more terse.
See "New data not persisting to Rails array column on Postgres"
ActiveRecord isn't recognizing the change to the array as attributes are being updated in place.
You can also do something like this:
cart.added_product_hashes_will_change!
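Called before the in-place push, that marks the attribute as dirty so the original << version also persists. A sketch of that variant:
def add_to_cart_with_credits(cart)
  # Tell ActiveRecord the array is about to be mutated in place
  cart.added_product_hashes_will_change!
  cart.added_product_hashes << {"id" => self.id.to_s, "method" => "credits"}
  cart.save!
end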

New data not persisting to Rails array column on Postgres

I have a user model with a friends column of type text. This migration was run to use the array feature with Postgres:
add_column :users, :friends, :text, array: true
The user model has this method:
def add_friend(target)
#target would be a value like "1234"
self.friends = [] if self.friends == nil
update_attributes friends: self.friends.push(target)
end
The following spec passes until I add user.reload after calling #add_friend:
it "adds a friend to the list of friends" do
user = create(:user, friends: ["123","456"])
stranger = create(:user, uid: "789")
user.add_friend(stranger.uid)
user.reload #turns the spec red
user.friends.should include("789")
user.friends.should include("123")
end
This happens in development as well. The model instance is updated and has the new uid in the array, but once reloaded, or when the user is loaded again in a different action, it reverts to what it was before the add_friend method was called.
Using Rails 4.0.0.rc2 and pg 0.15.1
What could this be?
I suspect that ActiveRecord isn't noticing that your friends array has changed because, well, the underlying array reference doesn't change when you:
self.friends.push(target)
That will alter the contents of the array but the array itself will still be the same array. I know that this problem crops up with the postgres_ext gem in Rails 3, and given this issue:
String attribute isn't marked as dirty, when it changes with <<
I'd expect Rails 4 to behave the same way.
The solution would be to create a new array rather than trying to modify the array in-place:
update_attributes friends: self.friends + [ target ]
There are lots of ways to create a new array while adding an element to an existing array, use whichever one you like.
It looks like the issue might be your use of push, which modifies the array in place.
I can't find a more primary source atm but this post says:
One important thing to note when interacting with array (or other mutable values) on a model. ActiveRecord does not currently track "destructive", or in place changes. These include array pushing and popping, and advancing DateTime objects. If you want to use a "destructive" update, you must call <attribute>_will_change! to let ActiveRecord know you changed that value.
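Applied to the add_friend method from the question, that might look like the following (a sketch; friends_will_change! is the dirty-tracking hook ActiveRecord generates for the friends attribute):
def add_friend(target)
  self.friends ||= []
  # Let ActiveRecord know the array is about to be mutated in place
  friends_will_change!
  friends.push(target)
  save
end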
If you want to use the PostgreSQL array type, you'll have to comply with its format. From the PostgreSQL docs, the input format is
'{10000, 10000, 10000, 10000}'
which is not what friends.to_s will return. In Ruby:
[1,2,3].to_s => "[1,2,3]"
That is, brackets instead of braces. You'll have to do the conversion yourself.
However I'd much rather rely on ActiveRecord serialize (see serialize). The database does not need to know that the value is actually an array, that's your domain model leaking into your database. Let Rails do its thing and encapsulate that information; it already knows how to serialize/deserialize the value.
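For reference, opting into serialize might look like this (a sketch; the friends column would then be a plain text column without array: true):
class User < ActiveRecord::Base
  # Rails stores the array as YAML in a plain text column
  serialize :friends, Array
end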
Note: This response is applicable to Rails 3, not 4. I'll leave here in case it helps someone in the future.

Batch insertion in rails 3

I want to do a batch insert of a few thousand records into the database (Postgres in my case) from within my Rails app.
What would be the "Rails way" of doing it?
Something which is fast and also correct way of doing it.
I know I can create the SQL query by string concatenation of the attributes but I want a better approach.
ActiveRecord's .create method supports bulk creation. The method emulates the feature if the DB doesn't support it, and uses the underlying DB engine if the feature is supported.
Just pass an array of options.
# Create an Array of new objects
User.create([{ :first_name => 'Jamie' }, { :first_name => 'Jeremy' }])
A block is supported, and it's the common way to set shared attributes.
# Creating an Array of new objects using a block, where the block is executed for each object:
User.create([{ :first_name => 'Jamie' }, { :first_name => 'Jeremy' }]) do |u|
u.is_admin = false
end
I finally reached a solution based on the two answers from @Simone Carletti and @Sumit Munot.
Until the Postgres driver supports the ActiveRecord .create method's bulk insertion, I would like to go with the activerecord-import gem. It does a bulk insert, and in a single insert statement.
books = []
10.times do |i|
books << Book.new(:name => "book #{i}")
end
Book.import books
In Postgres it leads to a single insert statement.
Once the Postgres driver supports the ActiveRecord .create method's bulk insertion in a single insert statement, @Simone Carletti's solution makes more sense :)
You can create a method in your Rails model and write your insert queries in that method.
In Rails you can run it using:
rails runner MyModelName.my_method_name
This is the approach I used in my project.
Update:
I use the following in my project, but it is not safe against SQL injection.
If you are not using user input in this query, it may work for you:
user_string = " ('a@ao.in','a'), ('b@ao.in','b')"
User.connection.insert("INSERT INTO users (email, name) VALUES" + user_string)
For Multiple records:
new_records = [
{:column => 'value', :column2 => 'value'},
{:column => 'value', :column2 => 'value'}
]
MyModel.create(new_records)
You can do it the fast way or the Rails way ;) The best way in my experience to import bulk data to Postgres is via CSV. What will take several minutes the Rails way will take several seconds using Postgres' native CSV import capability.
http://www.postgresql.org/docs/9.2/static/sql-copy.html
It even triggers database triggers and respects database constraints.
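From a Rails app, driving COPY through the pg gem's raw connection might look something like this (a sketch; the books.csv file, its column order, and the books table are assumptions):
conn = ActiveRecord::Base.connection.raw_connection
conn.copy_data("COPY books (title, author) FROM STDIN WITH CSV") do
  File.foreach("books.csv") { |line| conn.put_copy_data(line) }
end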
Edit (after your comment):
Gotcha. In that case you have correctly described your two options. I have been in the same situation before: I implemented it using the Rails "1000 save! calls" strategy because it was the simplest thing that worked, and then optimized it to the 'append a huge query string' strategy because it performed an order of magnitude better.
Of course, premature optimization is the root of all evil, so perhaps do it the simple, slow Rails way, and know that building a big query string is a perfectly legit optimization technique at the expense of maintainability. I feel your real question is 'is there a Railsy way that doesn't involve thousands of queries?' - unfortunately the answer to that is no.

Using and Editing Class Variables in Ruby?

So I've done a couple of days' worth of research on the matter, and the general consensus is that there isn't one. So I was hoping for an answer more specific to my situation...
I'm using Rails to import a file into a database. Everything is working regarding the import, but I'm wanting to give the database itself an attribute, not just every entry. I'm creating a hash of the file, and I figured it'd be easiest to just assign it to the database (or the class).
I've created a class called Issue (and thus an 'issues' database) with each entry having a couple of attributes. I wanted to figure out a way to add a class variable (at least, that's what I think is the best option) to Issue to simply store the hash. I've written a rake task to import the file, iff the new file is different from the previously imported file (read: if the hashes are different).
desc "Parses a CSV file into the 'issues' database"
task :issues, [:file] => :environment do |t, args|
md5 = Digest::MD5.hexdigest(args[:file])
puts "1: Issue.md5 = #{Issue.md5}"
if md5 != Issue.md5
Issue.destroy_all()
#import new csv file
CSV.foreach(args[:file]) do |row|
issue = {
#various attributes to be columns...
}
Issue.create(issue)
end #end foreach loop
Issue.md5 = md5
puts "2: Issue.md5 = #{Issue.md5}"
end #end if statement
end #end task
And my model is as follows:
class Issue < ActiveRecord::Base
  attr_accessible :md5
  @@md5 = 5
  def self.md5
    @@md5
  end
  def self.md5=(newmd5)
    @@md5 = newmd5
  end
  attr_accessible #various database-entry attributes
end
I've tried various different ways to write my model, but it all comes down to this: whatever I set @@md5 to in my model becomes a permanent change, almost like a constant. If I change the value here and refresh my database, the change is noted immediately. If I go into the rails console and do:
Issue.md5 # => 5
Issue.md5 = 123 # => 123
Issue.md5 # => 123
But this change isn't committed to anything. As soon as I exit the console, things return to "5" again. It's almost like I need a .save method for my class.
Also, in the rake file, you see I have two print statements, printing out Issue.md5 before and after the parse. The first prints out "5" and the second prints out the new, correct hash. So Ruby is recognizing the fact that I'm changing this variable, it's just never saved anywhere.
Ruby 1.9.3, Rails 3.2.6, SQLite3 3.6.20.
tl;dr I need a way to create a class variable, and be able to access it, modify it, and re-store it.
Fixes please? Thanks!
There are a couple of solutions here. Essentially, you need to persist that one variable: Postgres provides a key/value store in the database, which would be ideal, but you're using SQLite so that isn't an option for you. Instead, you'll probably need to use either Redis or memcached to persist this information alongside your database.
Either one allows you to persist values into a schema-less datastore and query them again later. Redis has the advantage of being saved to disk, so if the server craps out on you you can get the value of md5 again when it restarts. Data saved into memcached is never persisted, so if the memcached instance goes away, when it comes back md5 will be 5 once again.
Both redis and memcached enjoy a lot of support in the Ruby community. It will complicate your stack slightly installing one, but I think it's the best solution available to you. That said, if you just can't use either one, you could also write the value of md5 to a temporary file on your server and access it again later. The issue there is that the value then won't be shared among all your server processes.
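If you go the temporary-file route, a minimal sketch might look like this (the tmp/issues_md5 path is just an illustrative choice):
class Issue < ActiveRecord::Base
  MD5_PATH = Rails.root.join('tmp', 'issues_md5')

  # Read the last stored digest, or nil if none has been saved yet
  def self.md5
    File.exist?(MD5_PATH) ? File.read(MD5_PATH).strip : nil
  end

  # Persist the digest so it survives between processes and restarts
  def self.md5=(new_md5)
    File.write(MD5_PATH, new_md5.to_s)
  end
end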
