Mongoid: Creating many objects with a single call - ruby-on-rails

I have 1000 users that I will be retrieving from Twitter, and I would like to save them in one shot, as opposed to doing 1000 individual insertions.
How can I do this with Mongoid? Something like this would rock:
TwitterUser.createMany([{:name=>u1}, {:name=>u2},{:name=>u3}] )

You should use the MongoDB Ruby driver to do this. You can pass an array of hashes to the insert method to create multiple documents at once (more info in this Google Groups discussion). Mongoid makes it easy to access the Ruby driver.
The code would look something like this:
user_list = twitter_accounts.map do |account|
  # create a hash of all the fields to be stored in each document
  { 'name' => account.name,
    'username' => account.username
    # some other fields...
  }
end
Mongoid.master['twitter_users'].insert(user_list)

You almost had it: the method is create, not createMany. You can use it like this:
TwitterUser.create([
  { username: "u1", display_name: "Display Name 1" },
  { username: "u2", display_name: "Display Name 2" },
  { username: "u3", display_name: "Display Name 3" }
])
Also, as @bowsersenior points out, it's a good idea to use it with Array#map:
TwitterUser.create(
  @users_array.map do |u|
    { username: u.username, display_name: u.name }
  end
)
From the Mongoid#Persistence Docs:
Model.create
Insert a document or multiple documents into the database
Model.create!
Insert a document or multiple documents into the database, raising an error if a validation error occurs.

Just use MongoidModel.create directly.

Related

What is a good way to `update_or_initialize_with` in Mongoid?

Each user has one address.
class User
  include Mongoid::Document
  has_one :address
end
class Address
  include Mongoid::Document
  belongs_to :user
  field :street_name, type: String
end
u = User.find(...)
u.address.update(street_name: 'Main St')
If we have a User without an Address, this will fail.
So, is there a good (built-in) way to do u.address.update_or_initialize_with?
I'm using Mongoid 5.
I am not familiar with Ruby, but I think I understand the problem. Your schema might look like this:
user = {
  _id: user1234,
  address: address789
}
address = {
  _id: address789,
  street_name: "",
  user: user1234
}
// In MongoDB (JavaScript), you can get and update a user's address this way:
u = User.find({_id: user1234})
u.address // address789
db.address.update({_id: u.address}, {$set: {street_name: "new_street name"}})
// But since the address has not been created yet, the variable u does not even have an address property:
u.address // undefined
Perhaps you can try to create the address and attach it manually, like this:
// create an address document, to get the _id of this address
address = address.insert({street_name: "something"});
// link (attach) it to u.address
u.update({address: address._id})
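In Ruby, the create-then-attach idea might be sketched as follows. The classes below are plain-Ruby stand-ins (not Mongoid models) so the example runs without a database, and set_street is a hypothetical helper name. With Mongoid, a has_one relation provides build_address and create_address, so the same `user.address || user.build_address` pattern should carry over, though that is untested here.

```ruby
# Plain-Ruby stand-ins so the sketch runs without Mongoid (hypothetical classes).
Address = Struct.new(:street_name)

class User
  attr_accessor :address

  # Mimics the build_address helper that a has_one relation provides.
  def build_address(attrs = {})
    self.address = Address.new(attrs[:street_name])
  end
end

# Update the address if it exists, otherwise build one first.
def set_street(user, street)
  address = user.address || user.build_address
  address.street_name = street
  address
end

u = User.new
set_street(u, 'Main St')   # no address yet, so one is built
set_street(u, 'High St')   # address exists, so it is updated in place
puts u.address.street_name
```

With a real model the final step would also need a save (or update) call to persist the change.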
I had this problem recently. There is a built-in way, but it differs from ActiveRecord's #find_or_initialize_by or #find_or_create_by methods.
In my case, I needed to bulk insert records and update or create if not found, but I believe the same technique can be used even if you are not bulk inserting.
# Returns an update command hash containing one upsert statement per user:
def update_command(users)
  updates = users.map do |user|
    { 'q' => { 'user_id' => user._id },
      'u' => { 'address' => 'address' },
      'multi' => false,
      'upsert' => true }
  end
  { update: Address.collection_name.to_s, updates: updates, ordered: false }
end
def bulk_update(users)
  client = Mongoid.default_client
  command = update_command(users)
  client.command command
  client.close
end
Since you're not bulk updating, and assuming you have a foreign key field called user_id in your Address collection, you might be able to do:
Address.collection.update({ 'q' => { 'user_id' => user._id },
  'u' => { 'address' => 'address' },
  'multi' => false,
  'upsert' => true })
This will match against the user_id and update the given fields when a document is found (address in this case), or create a new one when not found.
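As a rough sketch of the command shape used above, the following plain-Ruby helper builds the same kind of update command without touching a database. The method name build_upsert_command and the street_name values are illustrative assumptions, not part of the original answer. It uses $set so that only the named field changes.

```ruby
# Builds a MongoDB `update` command document with one upsert statement per row.
# `rows` is assumed to be an array of hashes with :user_id and :street_name keys.
def build_upsert_command(collection_name, rows)
  updates = rows.map do |row|
    {
      'q' => { 'user_id' => row[:user_id] },                       # match on the foreign key
      'u' => { '$set' => { 'street_name' => row[:street_name] } }, # change only this field
      'multi' => false,                                            # at most one document per statement
      'upsert' => true                                             # insert when nothing matches
    }
  end
  { update: collection_name, updates: updates, ordered: false }
end

command = build_upsert_command('addresses', [
  { user_id: 1, street_name: 'Main St' },
  { user_id: 2, street_name: 'High St' }
])
puts command.inspect
```

The resulting hash could then be handed to client.command, as bulk_update does.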
For this to work, there is one last crucial step, though. You must add an index to your Address collection with a special flag. The field you are querying on (user_id in this case) must be indexed with a flag of either { unique: true } or { sparse: true }. The unique flag will raise an error if you have two or more nil user_id fields; the sparse option won't. Use sparse if you think you may have nil values.
Access your MongoDB instance through the terminal:
show dbs
use your_db_name
Check whether the addresses collection already has the index you are looking for:
db.addresses.getIndexes()
If it already has an index on user_id, you may want to remove it:
db.addresses.dropIndex( { user_id: 1 } )
Then create it again with the following flag:
db.addresses.createIndex( { user_id: 1 }, { sparse: true } )
https://docs.mongodb.com/manual/reference/method/db.collection.update/
EDIT #1
There seem to have been changes in Mongoid 5: instead of User.collection.update you can use User.collection.update_one.
https://docs.mongodb.com/manual/reference/method/db.collection.updateOne/
The docs show a filter rather than a query as the first argument, but they seem to be the same thing.
Address.collection.update_one({ user_id: user_id },
  { '$set' => { address: 'the_address' } },
  upsert: true) # upsert is an option, not part of the $set document
PS:
If you only write { "address": 'the_address' } as your update clause without including an update operator such as $set, the whole document will get overwritten rather than updating just the address field.
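The difference described in the PS can be illustrated without a database. The helper apply_update below is a hypothetical in-memory imitation of the two update styles, not a driver or Mongoid API, and the sample document is made up.

```ruby
# In-memory illustration only: imitates an operator update versus a replacement.
def apply_update(doc, update)
  if update.key?('$set')
    doc.merge(update['$set'])              # operator update: other fields survive
  else
    { '_id' => doc['_id'] }.merge(update)  # replacement: only _id is carried over
  end
end

existing = { '_id' => 1, 'user_id' => 42, 'address' => 'old', 'city' => 'Springfield' }

# With $set, user_id and city are kept and only address changes.
p apply_update(existing, { '$set' => { 'address' => 'the_address' } })

# Without an operator, user_id and city are lost.
p apply_update(existing, { 'address' => 'the_address' })
```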
EDIT #2
Why you may want to index with unique or sparse:
If you look at the upsert section in the link below, you will see:
"To avoid multiple upserts, ensure that the filter fields are uniquely indexed."
https://docs.mongodb.com/manual/reference/method/db.collection.updateOne/

Storing a collection of mongoid documents with one operation

When using Moped gem, I can store an array of hashes with:
users = [{username: "ben", password: "123456", type: "admin" }, {username: "joe", password: "abcd1234" }]
Mongoid::Sessions.default["collection"].insert(users)
With mongoid documents it would look like:
class User
  include Mongoid::Document
  field :username, type: String
  field :password, type: String
end
users.each { |user_hash| User.create(user_hash) }
Which means an insertion operation for each.
Do you know a way to keep the single operation method? Maybe something like a transaction in ActiveRecord?
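You can convert the documents back to hashes and insert them with a single call to #create:
User.create(users.map(&:attributes))
Note that, as far as I can tell, Mongoid's create still saves each element of the array separately, so this is a single method call rather than a single database operation.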

Create multiple documents with array

I have the following array:
@unregistered_users = ['my@email.com', 'your@email.com', ...]
Now, I want to create a document for each array element:
@unregistered_users.each do |email_address|
  Model.create(email: email_address, user: self.user, detail: self)
end
But it only creates a single document (the first element of the array). The other array elements are simply not created. Why?
We're using Ruby 1.9.3-p385, Rails 3.2.12, Mongoid 3.0.0 and MongoDB 2.2.3.
Update #1
So, we had a custom _id field with a custom random token using SecureRandom.hex(64).to_i(16).to_s(36)[0..127].
After I removed it, creation worked normally, but with regular Mongo IDs (which is not what we want).
Update #2
This is how the tokens are being generated:
class Model
  include Mongoid::Document
  include Mongoid::Timestamps
  ...
  field :_id, default: SecureRandom.hex(64).to_i(16).to_s(36)[0..127]
  ...
  index( { _id: 1 }, { unique: true } )
end
Try something like this to see the validation errors on the Mongoid model:
@unregistered_users.each do |email_address|
  model = Model.create(email: email_address, user: self.user, detail: self)
  puts model.errors.inspect unless model.persisted?
end
Or use create! to raise an exception and see what's happening.
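One possible cause, which the thread does not confirm, is that the default: expression in Update #2 is evaluated only once, when the class body is loaded. Every new document would then get the same _id, and the unique index would reject all but the first. The plain-Ruby sketch below shows the difference between a value computed once and a lambda called per record. FakeModel and its field helper are hypothetical stand-ins, not Mongoid's implementation.

```ruby
require 'securerandom'

# Hypothetical stand-in for a model class with field defaults (not Mongoid itself).
class FakeModel
  def self.field(name, default:)
    # Call the default if it is a lambda, otherwise reuse the same value.
    define_method(name) do
      @values ||= {}
      @values[name] ||= default.respond_to?(:call) ? default.call : default
    end
  end

  # Evaluated once, at class-load time: every instance shares this token.
  field :eager_token, default: SecureRandom.hex(8)

  # Evaluated per instance: each record gets a fresh token.
  field :lazy_token, default: -> { SecureRandom.hex(8) }
end

a = FakeModel.new
b = FakeModel.new
puts a.eager_token == b.eager_token # both instances share the eager value
puts a.lazy_token == b.lazy_token   # lazy values are generated separately
```

Mongoid accepts a lambda for default:, so wrapping the token expression in -> { ... } may be enough to get a distinct _id per document.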

Getting couchrest and couch_potato to recognize existing couchdb documents

I'm trying to create a basic Rails CRUD app against a CouchDB database hosted on Cloudant.
I'm using couch_potato as my persistence layer and have it connecting properly to my Cloudant database.
The issue I'm having is that my first model won't see the existing documents in my CouchDB database unless I add a ruby_class field that equals the name of my model.
My simple User model:
class User
  include CouchPotato::Persistence
  property :id, :type => Fixnum
  property :FullName, :type => String
  view :all, :key => :FullName
end
Sample CouchDB document:
{
  "_id": 123456,
  "_rev": "4-b96f36763934ce7c469abbc6fa05aaf3",
  "ORGID": 400638,
  "MyOrgToken": "19fc342d50f9d8df1ecd5e5404f5e5f7",
  "FullName": "Jane Doe",
  "Phone": "555-555-5555",
  "MemberNumber": 123456,
  "Email": "jane@example.com",
  "LoginPWHash": "14a3ccc0e6a50135ef391608e786f4e8"
}
Now, when I use my all view from the rails console, I don't get any results back:
1.9.2-p290 :002 > CouchPotato.database.view User.all
=> []
If I add the field and value "ruby_class: User" to the above CouchDB document, then I get results back in the console:
1.9.2-p290 :003 > CouchPotato.database.view User.all
=> [#<User _id: "123456", _rev: "4-b96f36763934ce7c469abbc6fa05aaf3", created_at: nil,
updated_at: nil, id: "123456", FullName: "Jane Doe">]
I'm working with a large set of customer data, and I don't want to write any scripts to add the ruby_class field to every document (and I may not be permitted to).
How can I get my app to recognize these existing CouchDB documents without adding the ruby_class field?
I couldn't find much documentation for couch_potato and couchrest that shows how to work with existing CouchDB databases. Most of the examples assume you're starting your project and database(s) from scratch.
Thanks,
/floatnspace
When you look at the all view of your User, you will see a condition like ruby_class == 'User', so unless you add this property to your documents you will need to work around what couch_potato provides. You could, for example, use couch_rest directly to retrieve your documents, but I don't think that is what you want.
If you start persisting or updating your own documents, couch_potato will add the ruby_class field anyway, so I think the simplest solution would be to just add it to the existing documents.
Another thing you can do is create a view that also emits documents that DON'T have the property set. This approach will only work if you have just one kind of document in your CouchDB database:
if (!doc.ruby_class || doc.ruby_class == 'User') {
  emit(doc);
}

appending to rake db:seed in rails and running it without duplicating data

rake db:seed populates your db with default values for an app, right? So what if you already have a seed file and you need to add to it (say, you add a new feature that requires new seed data)? In my experience, when I ran rake db:seed again, it inserted the existing content a second time, so everything was duplicated.
What I need is to add some seeds and, when the task is run, have it add only the newest ones and ignore the existing seeds. How do I go about this? (The dirty, noob way I usually do it is to truncate my whole db and then run the seed again, but that's not very smart to do in production, right?)
A cleaner way to do this is by using find_or_create_by, as follows:
User.find_or_create_by_username_and_role(
  :username => "admin",
  :role => "admin",
  :email => "me@gmail.com")
Here are the possible outcomes:
A record exists with username "admin" and role "admin". This record will NOT be updated with the new e-mail if it already exists, but it will also NOT be doubled.
A record does not exist with username "admin" and role "admin". The above record will be created.
Note that if only one of the username/role criteria is satisfied, it will still create the above record. Use the right criteria to ensure you aren't duplicating something you want to remain unique.
I do something like this.... When I need to add a user
in seeds.rb:
if User.count == 0
  puts "Creating admin user"
  User.create(:role=>:admin, :username=>'blagh', :etc=>:etc)
end
You can get more interesting than that, but in this case, you could run it over again as needed.
Another option that might have a slight performance benefit:
# This example assumes that a role consists of just an id and a title.
roles = ['Admin', 'User', 'Other']
existing_roles = Role.all.map { |r| r.title }
roles.each do |role|
  unless existing_roles.include?(role)
    Role.create!(title: role)
  end
end
I think that doing it this way, you only have to do one db call to get an array of what exists, then you only need to call again if something isn't there and needs to be created.
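The same idea can be expressed as an array difference. This is a minimal sketch that runs without Rails: missing_titles is a hypothetical helper name, and the existing array stands in for the result of Role.all.map { |r| r.title }.

```ruby
# Returns the titles that are wanted but not yet present.
def missing_titles(wanted, existing)
  wanted - existing
end

wanted   = ['Admin', 'User', 'Other']
existing = ['Admin', 'User'] # stand-in for the titles already in the database

missing_titles(wanted, existing).each do |title|
  # In seeds.rb this line would be Role.create!(title: title)
  puts "would create role: #{title}"
end
```

Running the seed a second time would then create nothing, because the difference is empty once all roles exist.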
Adding
from
departments = ["this", "that"]
departments.each{|d| Department.where(:name => d).first_or_create}
to
departments = ["this", "that", "there", "then"]
departments.each{|d| Department.where(:name => d).first_or_create}
This is a simple example.
Updating/rename
from
departments = ["this", "that", "there", "then"]
departments.each{|d| Department.where(:name => d).first_or_create}
to
departments = ["these", "those", "there", "then"]
new_names = [['these', 'this'],['those','that']]
new_names.each do |new|
  Department.where(:name => new).group_by(&:name).each do |name, depts|
    depts.first.update_column :name, new[0] if new[1] == name # skips validation
    # depts[1..-1].each(&:destroy) if depts.size > 1 # paranoid mode
  end
end
departments.each{|d| Department.where(:name => d).first_or_create}
IMPORTANT: You need to update the elements of departments array else duplication will surely happen.
Workaround: add a validates_uniqueness_of validation, or a uniqueness validation comparing all necessary attributes, BUT don't use methods that skip validations.
My preference for this sort of thing is to create a custom rake task rather than use the seeds.rb file.
If you're trying to bulk create users, I'd create a .csv file with the data, then create a rake task called import_users and pass it the filename. Then loop through the file to create the user records.
In lib/tasks/import_users.rake:
namespace :my_app do
  desc "Import Users from a .csv"
  task :import_users => :environment do
    # loop through records and create users
  end
end
Then run it like so: bundle exec rake my_app:import_users path/to/.csv
If you need to run it in production: RAILS_ENV=production bundle exec rake my_app:import_users /path/to/.csv
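The body of such a task might look roughly like this. It is a sketch using Ruby's standard csv library: the column names username and role, and the helper name user_rows, are assumptions for illustration, and the sample text stands in for the contents of the .csv file.

```ruby
require 'csv'

# Parses CSV text with a header row into an array of attribute hashes.
def user_rows(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    { username: row['username'], role: row['role'] }
  end
end

sample = <<~ROWS
  username,role
  admin,admin
  jane,user
ROWS

user_rows(sample).each do |attrs|
  # Inside the rake task this could be a create or find-or-create call on User.
  puts attrs.inspect
end
```

Combining this with one of the find-or-create approaches from the other answers would keep the import safe to run more than once.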
Another alternative is to use #first_or_create.
categories = [
[ "Category 1", "#e51c23" ],
[ "Category 2", "#673ab7" ]
]
categories.each do |name, color|
  Category.where(name: name, color: color).first_or_create
end
A really hacky way would be to comment out the existing data. That's how I did it, and it worked fine for me:
=begin
# Commented out these lines since they were already seeded
PayType.create!(:name => "Net Banking")
PayType.create!(:name => "Coupouns Pay")
=end
#New data to be used by seeds
PayType.create!(:name => "Check")
PayType.create!(:name => "Credit card")
PayType.create!(:name => "Purchase order")
PayType.create!(:name => "Cash on delivery")
Once done just remove the comments
Another trivial alternative:
#categories => name, color
categories = [
[ "Category 1", "#e51c23" ],
[ "Category 2", "#673ab7" ]
]
categories.each do |name, color|
  unless Category.where(:name => name).present?
    Category.create( name: name, color: color )
  end
end
Just add User.delete_all, and the same call for every other model in your application, at the beginning of your seeds.rb file. There will not be any duplicate values for sure.
