I have the following code in my rails app.
module UserItem
  class Rating
    include MongoMapper::Document

    key :user_id, Integer, :required => true
    key :item_id, Integer, :required => true
    key :rating,  Float,   :required => true
  end
end
And I have about 10K users and 10K items, and I need to store each user's rating for each item, which is about 10^8 records. I have computed these values into an array as follows:
ratings = [
  {user_id: 1, item_id: 1, rating: 1.5},
  {user_id: 1, item_id: 2, rating: 3.5},
  ... and so on 10^8 records
]
Now, I need to insert all these 10^8 records computed into mongo. I tried with
UserItem::Rating.collection.insert(ratings)
and
UserItem::Rating.create(ratings)
But either way it takes hours to insert the 10^8 records into Mongo. Is there a better or more efficient way to insert records into Mongo?
Context: I am using it more like a cache store which stores all rating values. When I display list of items, I will just read from this cache and display the rating provided by the user alongside each item.
Any help is much appreciated!
One approach is to store one document per user, with a ratings field that is a hash of item ids to ratings, for example:
class UserRating
  include MongoMapper::Document

  key :ratings
  key :user_id
end

UserRating.create(:user_id => 1, :ratings => {"1" => 4, "2" => 3})
You have to use string keys for the hash. This approach doesn't make it easy to retrieve all the ratings for a given item - if you need to do that a lot, it might be easier to store a document per item instead. It's also probably not very efficient if you only ever need a small proportion of a user's ratings at a time.
Obviously you can combine this with other approaches to increasing write throughput, such as batching your inserts or sharding your database.
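As a sketch of the batching idea (the batch size is a tunable assumption, not a recommended value, and `collection` stands for `UserItem::Rating.collection` from the question):

```ruby
# Batch the precomputed ratings array into chunks so each driver
# round-trip carries many documents instead of one at a time.
BATCH_SIZE = 10_000 # assumption: tune against your driver and server

def insert_in_batches(collection, ratings, batch_size = BATCH_SIZE)
  ratings.each_slice(batch_size) do |batch|
    collection.insert(batch) # one bulk insert per slice
  end
end

# insert_in_batches(UserItem::Rating.collection, ratings)
```

Even with batching, 10^8 documents will take a while; the point is to amortize the per-call overhead across many documents.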
I'm trying to sort my ES results by 2 fields: searchable and year.
The mapping in my Rails app:
# mapping
def as_indexed_json(options={})
  as_json(only: [:id, :searchable, :year])
end

settings index: { number_of_shards: 5, number_of_replicas: 1 } do
  mapping do
    indexes :id, index: :not_analyzed
    indexes :searchable
    indexes :year
  end
end
The query:
@records = Wine.search(
  query: { match: { searchable: { query: params[:search], fuzziness: 2, prefix_length: 1 } } },
  sort:  { _score: { order: :desc }, year: { order: :desc } },
  size:  100
)
The interesting thing in the query:
sort: {_score: {order: :desc}, year: {order: :desc}}
I think the query is working well with the 2 sort params.
My problem is the score is not the same for 2 documents with the same name (searchable field).
For example, I'm searching for "winery":
You can see very different scores, even though the searchable field is the same. I think the issue is due to the ID field (it's actually a UUID). It looks like this ID field influences the score.
But in my schema mapping, I wrote that ID should not be analyzed and in my ES query, I ask to search ONLY in "searchable" field, not in ID too.
What did I miss to get the same score for the same fields? (Sorting by year after score is not useful at the moment, because the scores differ for identical fields.)
Scores are different because they are calculated independently on each shard, using that shard's local term statistics. You can force globally consistent statistics with search_type=dfs_query_then_fetch, or use a single-shard index; see the Elasticsearch documentation on relevance and shards for more detail.
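A sketch of the workaround, assuming elasticsearch-model, where extra options passed to `search` are forwarded to the client (the helper name `wine_search_definition` is invented for illustration):

```ruby
# Build the same query as in the question; with search_type
# 'dfs_query_then_fetch', Elasticsearch first gathers global term
# statistics, so identical documents score identically on every shard.
def wine_search_definition(term)
  {
    query: { match: { searchable: { query: term, fuzziness: 2, prefix_length: 1 } } },
    sort:  [{ _score: { order: :desc } }, { year: { order: :desc } }]
  }
end

# @records = Wine.search(wine_search_definition(params[:search]),
#                        search_type: 'dfs_query_then_fetch')
```

dfs_query_then_fetch adds an extra round-trip per search, so measure before enabling it everywhere.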
How do I fetch multiple attributes from a model instance, e.g.
Resource.first.attributes(:foo, :bar, :baz)
# or
Resource.where(foo: 1).fetch(:foo, :bar, :baz)
rather than returning all the attributes and selecting them manually.
You can use the method slice:
Slice a hash to include only the given keys. Returns a hash containing the given keys.
Your code will be:
Resource.first.attributes.slice("foo", "bar", "baz")
# with .where
Resource.where(foo: 1).select("foo, bar, baz").map(&:attributes)
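To see what the slice step does on its own, here is the same call on a plain hash (sample values are invented; Hash#slice is built into Ruby 2.5+ and provided by ActiveSupport before that):

```ruby
# attributes as Resource.first.attributes might return them (values invented)
attributes = { "id" => 1, "foo" => "a", "bar" => "b", "baz" => "c", "qux" => "d" }

attributes.slice("foo", "bar", "baz")
# => {"foo"=>"a", "bar"=>"b", "baz"=>"c"}
```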
How about pluck:
Resource.where(something: 1).pluck(:foo, :bar, :baz)
Which translates to the following SQL:
SELECT "resources"."foo", "resources"."bar", "resources"."baz" FROM "resources"
And returns an array of the specified column values for each of the records in the relation:
[["anc", 1, "M2JjZGY"], ["Idk", 2, "ZTc1NjY"]]
http://guides.rubyonrails.org/active_record_querying.html#pluck
Couple of notes:
Multiple value pluck is supported starting from Rails 4, so if you're using Rails 3 it won't work.
pluck is defined on ActiveRecord::Relation, not on a single instance.
If you want the result to be a hash of attribute name => value for each record you can zip the results by doing something like the following:
attrs = [:foo, :bar, :baz]
Resource.where(something: 1).pluck(*attrs).map{ |vals| attrs.zip(vals).to_h }
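To see what the zip step produces, here is the same transformation applied to one invented pluck row:

```ruby
attrs = [:foo, :bar, :baz]
row   = ["anc", 1, "M2JjZGY"] # one row as pluck would return it

# Pair each attribute name with its value, then build a hash.
attrs.zip(row).to_h
# => {:foo=>"anc", :bar=>1, :baz=>"M2JjZGY"}
```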
To fetch multiple has_one or belongs_to relationships (not just static attributes), you can use slice directly on the instance, call values to obtain just the values, and then manipulate them with map or collect.
For example, to get the category and author of a book, you could do something like this:
book.slice( :category, :author ).values
#=> [#<Category id: 1, name: "Science Fiction", ...>, #<Author id: 1, name: "Aldous Huxley", ...>]
If you want to show the String values of these, you could use to_s, like:
book.slice( :category, :author ).values.map( &:to_s )
#=> [ "Science Fiction", "Aldous Huxley" ]
And you can further manipulate them using a join, like:
book.slice( :category, :author ).values.map( &:to_s ).join( "➝" )
#=> "Science Fiction ➝ Aldous Huxley"
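The same slice -> values -> join chain can be seen with a plain Struct standing in for the model (a sketch; on real ActiveRecord objects, slice comes from the framework):

```ruby
# Plain-Ruby stand-in for the Book model, just to show the chain.
Book = Struct.new(:category, :author) do
  # Minimal slice: a hash of the requested attribute names to values.
  def slice(*keys)
    keys.map { |k| [k, public_send(k)] }.to_h
  end
end

book = Book.new("Science Fiction", "Aldous Huxley")
book.slice(:category, :author).values.map(&:to_s).join(" ➝ ")
# => "Science Fiction ➝ Aldous Huxley"
```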
I have the following model structure in my Rails 4.1 application:
delivery_service.rb
class DeliveryService < ActiveRecord::Base
  attr_accessible :name, :description, :courier_name, :active, :country_ids

  has_many :prices, class_name: 'DeliveryServicePrice', dependent: :delete_all
end
delivery_service_price.rb
class DeliveryServicePrice < ActiveRecord::Base
  attr_accessible :code, :price, :description, :min_weight, :max_weight, :min_length, :max_length,
                  :min_thickness, :max_thickness, :active, :delivery_service_id

  belongs_to :delivery_service
end
As you can see, a delivery service has many delivery service prices. I'm trying to retrieve records from the delivery service price table; selecting the record with the lowest price attribute within the unique scope of the foreign key, delivery_service_id (so essentially the cheapest delivery service price per delivery service).
How can I select unique records from a table, with the foreign key attribute as the scope?
I hope I've explained this enough; let me know if you need any more information.
Thanks
UPDATE #1:
Example of what I'm trying to achieve:
delivery_service_prices table:
id: 1, price: 2.20, delivery_service_id: 1
id: 2, price: 10.58, delivery_service_id: 1
id: 3, price: 4.88, delivery_service_id: 2
id: 4, price: 1.20, delivery_service_id: 2
id: 5, price: 14.99, delivery_service_id: 3
expected results:
id: 1, price: 2.20, delivery_service_id: 1
id: 4, price: 1.20, delivery_service_id: 2
id: 5, price: 14.99, delivery_service_id: 3
Because PostgreSQL is stricter about following the SQL standard (rightly so), it takes a bit of tweaking to get the correct results.
The following query returns the correct results for the lowest delivery service price, per delivery service:
DeliveryServicePrice.select('DISTINCT ON (delivery_service_id) *').order('delivery_service_id, price ASC')
I need to add the delivery_service_id attribute to the order condition, or PostgreSQL throws the following column error:
PG::InvalidColumnReference: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
Hope this helps anyone who stumbles upon it!
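For intuition, this is what the DISTINCT ON ... ORDER BY combination does, simulated in plain Ruby on the sample rows from the question:

```ruby
rows = [
  { id: 1, price: 2.20,  delivery_service_id: 1 },
  { id: 2, price: 10.58, delivery_service_id: 1 },
  { id: 3, price: 4.88,  delivery_service_id: 2 },
  { id: 4, price: 1.20,  delivery_service_id: 2 },
  { id: 5, price: 14.99, delivery_service_id: 3 },
]

# Sort by (delivery_service_id, price), then keep the first row per
# delivery_service_id -- exactly the row DISTINCT ON selects.
cheapest = rows.sort_by { |r| [r[:delivery_service_id], r[:price]] }
               .uniq { |r| r[:delivery_service_id] }

cheapest.map { |r| r[:id] }
# => [1, 4, 5]
```

This is why the ORDER BY must start with delivery_service_id: the "first row per group" is only well defined after grouping rows together.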
To get the minimum for a single record you can use
DeliveryServicePrice.where(delivery_service_id: x).order(:price).limit(1).first
or if you have a delivery_service object available
delivery_service.prices.order(:price).limit(1).first
UPDATE
If you want all minimums for all service_delivery_ids you can use a group query
DeliveryServicePrice.group(:delivery_service_id).minimum(:price)
which will get you almost where you want to go
{
1: 2.20,
2: 1.20,
3: 14.99
}
a hash mapping each delivery_service_id to its minimum price. (Note that the id of the matching price record is not available this way.)
I'm trying to understand ORM and looking for examples of the following explanation:
tables map to classes,
rows map to objects,
columns map to object attributes
I understand that tables map to classes, but terminology of rows mapping to objects and columns mapping to object attributes have me confused.
I find actual examples help best. Try this:
Ruby and Rails:
Create the class (Create the table)
class House < ActiveRecord::Base
  # Columns length, width, finish, price and available come from the
  # houses table; ActiveRecord defines these accessors automatically.
end
Create an instance (Insert a row)
my_house = House.new(:length => 23, :width => 12, :finish => 'siding',
                     :price => '$100,000.00', :available => false)
Get an instance (Select a row)
my_house = House.find(1)
puts my_house.length, my_house.width, my_house.price,
     my_house.finish, my_house.available?
SQL:
Create the table (Create the class)
create table house(
  length    Integer,
  width     Integer,
  finish    Varchar(255),
  price     Text,
  available Boolean
)
-- Note this is generic SQL; adapt as needed to your
-- implementation (SQL Server, Oracle, MySQL, etc.)
Insert a row (Create an instance)
insert into house (length,width,finish,price,available)
values (23, 12, 'siding', '$100,000.00', false)
Select a row (Get an instance)
select * from house where id = 1
Some good answers appeared while I was writing this, but I still hope my simple one helps.
(Since you've tagged your question with 'rails', I'll use Rails code.)
class User < ActiveRecord::Base
  # first_name and email are columns on the users table
end

puts User.inspect # => class (maps to the users table)
u = User.create(:first_name => 'name',
                :email => 'em#il.com') # => object (class instance / row)
puts u.first_name # => object's attribute (column value)
So, using ActiveRecord as the example, let's say you have a table posts, with columns 'id', 'title', 'date'. There are 2 rows in the posts table.
posts = Post.all
posts will be an Array of length 2. Each object in the array is one of the rows, and is an instance of class Post.
posts[0]
This is the first object in the array, which is the representation of the first row in the posts table.
posts[0].title
This will return the title attribute of the first object, which is the title column in the first row of the posts table.
Does that help?
I'd like to update a massive set of document on an hourly basis.
Here's the fairly simple model:
class Article
  include Mongoid::Document

  field :article_nr,  :type => Integer
  field :vendor_nr,   :type => Integer
  field :description, :type => String
  field :ean
  field :stock
  field :ordered
  field :eta
end
So every hour I get a fresh stock list, where :stock, :ordered and :eta might have changed, and I need to update them all.
Edit:
the stock list contains just :article_nr, :stock, :ordered and :eta, which I parse into a hash
In SQL I would have taken the route of foreign-keying article_nr to a "stock" table, dropping the whole stock table each hour, and running a "collection.insert" or something similar.
But that approach seems not to work with mongoid.
Any hints? I can't get my head around collection.update,
and changing the foreign key on belongs_to and has_one doesn't seem to work
(I tried it, but then Article.first.stock was nil).
But there has to be a faster way than iterating over the stocklist array of hashes and doing something like

Article.where(:article_nr => stocklist['article_nr']).update(stock: stocklist['stock'], eta: stocklist['eta'], ordered: stocklist['ordered'])
UPDATING
You can atomically update multiple documents in the database via a criteria using Criteria#update_all. This will perform an atomic $set on all the attributes passed to the method.
# Update all people with last name Oldman with new first name.
Person.where(last_name: "Oldman").update_all(
  first_name: "Pappa Gary"
)
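When every document gets different values, one update per article_nr still means one round-trip each. A sketch of building a single bulk operation instead, assuming a modern mongo driver with Collection#bulk_write (the helper name bulk_stock_ops is invented):

```ruby
# Turn each parsed stocklist row into one update_one operation with an
# atomic $set, so the whole hourly refresh is a single bulk_write call.
def bulk_stock_ops(stocklist)
  stocklist.map do |row|
    {
      update_one: {
        filter: { article_nr: row['article_nr'] },
        update: { '$set' => { stock:   row['stock'],
                              ordered: row['ordered'],
                              eta:     row['eta'] } }
      }
    }
  end
end

# Article.collection.bulk_write(bulk_stock_ops(stocklist)) # needs a live connection
```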
Now I understand a bit more. You can try something like the following, assuming that your article_nr is unique.
class Article
  include Mongoid::Document

  field :article_nr
  field :name
  key :article_nr

  has_many :stocks
end

class Stock
  include Mongoid::Document

  field :article_id
  field :eta
  field :ordered

  belongs_to :article
end
Then when you create a stock:

Stock.create(:article_id => "123", :eta => "200")

it will automatically get assigned to the article with article_nr => "123".
So you can always fetch the latest stock:
my_article.stocks.last
If you want to be more precise, you can add field :article_nr to Stock, and then in an after_save callback set new_stock.article_id = new_stock.article_nr.
This way you don't have to do any updates at all: just create new stocks, they will be attached to the correct Article on insert, and you can always read the latest one.
If you can extract just the stock information into a separate collection (perhaps with a has_one relationship in your Article), then you can use mongoimport with the --upsertFields option, using article_nr as your upsertField. See http://www.mongodb.org/display/DOCS/Import+Export+Tools.