mongoid batch update - ruby-on-rails

I'd like to update a massive set of document on an hourly basis.
Here's the
fairly simple Model:
class Article
include Mongoid::Document
field :article_nr, :type => Integer
field :vendor_nr, :type => Integer
field :description, :type => String
field :ean
field :stock
field :ordered
field :eta
so every hour i get a fresh stock list, where :stock,:ordered and :eta "might" have changed
and i need to update them all.
Edit:
the stocklist contains just
:article_nr, :stock, :ordered, :eta
wich i parse to a hash
In SQL i would have taken the route to foreign keying the article_nr to a "stock" table, dropping the whole stock table, and running a "collection.insert" or something alike
But that approach seems not to work with mongoid.
Any hints? i can't get my head around collection.update
and changing the foreign key on belongs_to and has_one seems not to work
(tried it, but then Article.first.stock was nil)
But there has to be a faster way than iterating over the stocklist array of hashes and doing
something like
Article.where( :article_nr => stocklist['article_nr']).update( stock: stocklist['stock'], eta: stocklist['eta'],orderd: stocklist['ordered'])

UPDATING
You can atomically update multiple documents in the database via a criteria using Criteria#update_all. This will perform an atomic $set on all the attributes passed to the method.
# Update all people with last name Oldman with new first name.
Person.where(last_name: "Oldman").update_all(
first_name: "Pappa Gary"
)
Now I can understood a bit more. You can try to do something like that, assuming that your article nr is uniq.
class Article
include Mongoid::Document
field :article_nr
field :name
key :article_nr
has_many :stocks
end
class Stock
include Mongoid::Document
field :article_id
field :eta
field :ordered
belongs_to :article
end
Then you when you create stock:
Stock.create(:article_id => "123", :eta => "200")
Then it will automaticly get assign to article with article_nr => "123"
So you can always call last stock.
my_article.stocks.last
If you want to more precise you add field :article_nr in Stock, and then :after_save make new_stock.article_id = new_stock.article_nr
This way you don't have to do any updates, just create new stocks and they always will be put to correct Article on insert and you be able to get latest one.

If you can extract just the stock information into a separate collection (perhaps with a has_one relationship in your Article), then you can use mongoimport with the --upsertFields option, using article_nr as your upsertField. See http://www.mongodb.org/display/DOCS/Import+Export+Tools.

Related

MongoDB conditional aggregate query on a HABTM relationship (Mongoid, RoR)?

Rails 4.2.5, Mongoid 5.1.0
I have three models - Mailbox, Communication, and Message.
mailbox.rb
class Mailbox
include Mongoid::Document
belongs_to :user
has_many :communications
end
communication.rb
class Communication
include Mongoid::Document
include Mongoid::Timestamps
include AASM
belongs_to :mailbox
has_and_belongs_to_many :messages, autosave: true
field :read_at, type: DateTime
field :box, type: String
field :touched_at, type: DateTime
field :import_thread_id, type: Integer
scope :inbox, -> { where(:box => 'inbox') }
end
message.rb
class Message
include Mongoid::Document
include Mongoid::Timestamps
attr_accessor :communication_id
has_and_belongs_to_many :communications, autosave: true
belongs_to :from_user, class_name: 'User'
belongs_to :to_user, class_name: 'User'
field :subject, type: String
field :body, type: String
field :sent_at, type: DateTime
end
I'm using the authentication gem devise, which gives access to the current_user helper, which points at the current user logged in.
I have built a query for a controller that satisfied the following conditions:
Get the current_user's mailbox, whose communication's are filtered by the box field, where box == 'inbox'.
It was constructed like this (and is working):
current_user.mailbox.communications.where(:box => 'inbox')
My issue arrises when I try to build upon this query. I wish to chain queries so that I only obtain messages whose last message is not from the current_user. I am aware of the .last method, which returns the most recent record. I have come up with the following query but cannot understand what would need to be adjusted in order to make it work:
current_user.mailbox.communications.where(:box => 'inbox').where(:messages.last.from_user => {'$ne' => current_user})
This query produces the following result:
undefined method 'from_user' for #<Origin::Key:0x007fd2295ff6d8>
I am currently able to accomplish this by doing the following, which I know is very inefficient and want to change immediately:
mb = current_user.mailbox.communications.inbox
comms = mb.reject {|c| c.messages.last.from_user == current_user}
I wish to move this logic from ruby to the actual database query. Thank you in advance to anyone who assists me with this, and please let me know if anymore information is helpful here.
Ok, so what's happening here is kind of messy, and has to do with how smart Mongoid is actually able to be when doing associations.
Specifically how queries are constructed when 'crossing' between two associations.
In the case of your first query:
current_user.mailbox.communications.where(:box => 'inbox')
That's cool with mongoid, because that actually just desugars into really 2 db calls:
Get the current mailbox for the user
Mongoid builds a criteria directly against the communication collection, with a where statement saying: use the mailbox id from item 1, and filter to box = inbox.
Now when we get to your next query,
current_user.mailbox.communications.where(:box => 'inbox').where(:messages.last.from_user => {'$ne' => current_user})
Is when Mongoid starts to be confused.
Here's the main issue: When you use 'where' you are querying the collection you are on. You won't cross associations.
What the where(:messages.last.from_user => {'$ne' => current_user}) is actually doing is not checking the messages association. What Mongoid is actually doing is searching the communication document for a property that would have a JSON path similar to: communication['messages']['last']['from_user'].
Now that you know why, you can get at what you want, but it's going to require a little more sweat than the equivalent ActiveRecord work.
Here's more of the way you can get at what you want:
user_id = current_user.id
communication_ids = current_user.mailbox.communications.where(:box => 'inbox').pluck(:_id)
# We're going to need to work around the fact there is no 'group by' in
# Mongoid, so there's really no way to get the 'last' entry in a set
messages_for_communications = Messages.where(:communications_ids => {"$in" => communications_ids}).pluck(
[:_id, :communications_ids, :from_user_id, :sent_at]
)
# Now that we've got a hash, we need to expand it per-communication,
# And we will throw out communications that don't involve the user
messages_with_communication_ids = messages_for_communications.flat_map do |mesg|
message_set = []
mesg["communications_ids"].each do |c_id|
if communication_ids.include?(c_id)
message_set << ({:id => mesg["_id"],
:communication_id => c_id,
:from_user => mesg["from_user_id"],
:sent_at => mesg["sent_at"]})
end
message_set
end
# Group by communication_id
grouped_messages = messages_with_communication_ids.group_by { |msg| mesg[:communication_id] }
communications_and_message_ids = {}
grouped_messages.each_pair do |k,v|
sorted_messages = v.sort_by { |msg| msg[:sent_at] }
if sorted_messages.last[:from_user] != user_id
communications_and_message_ids[k] = sorted_messages.last[:id]
end
end
# This is now a hash of {:communication_id => :last_message_id}
communications_and_message_ids
I'm not sure my code is 100% (you probably need to check the field names in the documents to make sure I'm searching through the right ones), but I think you get the general pattern.

Indexing fields + custom text in with Thinking Sphinx

I've got indexes on a few different models, and sometimes the user might search for a value which exists in multiple models. Now, if the user is really only interested in data from one of the models I'd like the user to be able to pre/postfix the query with something to limit the scope.
For instance, if I only want to find a match in my Municipality model, I've set up an index in that model so that the user now can query "xyz municipality" (in quotes):
define_index do
indexes :name, :sortable => true
indexes "name || ' municipality' name", :as => :extended_name, :type => :string
end
This works just fine. Now I also have a Person model, with a relation to Municipality. I'd like, when searching only on the Person model, to have the same functionality available, so that I can say Person.search("xyz municipality") and get all people connected to that municipality. This is my current definition in the Person model:
has_many :municipalities, :through => :people_municipalities
define_index do
indexes [lastname, firstname], :as => :name, :sortable => true
indexes municipalities.name, :as => :municipality_name, :sortable => true
end
But is there any way I can create an index on this model, referencing municipalities, like the one I have on the Municipality model itself?
If you look at the generated SQL in the sql_query setting of config/development.sphinx.conf for source person_core_0, you'll see how municipalities.name is being concatenated together (I'd post an example, but it depends on your database - MySQL and PostgreSQL handle this completely differently).
I would recommend duplicating the field, and insert something like this (SQL is pseudo-code):
indexes "GROUP_CONCAT(' municipality ' + municipalities.name)",
:as => :extended_municipality_names
Also: there's not much point adding :sortable true to either this nor the original field from the association - are you going to sort by all of the municipality names concat'd together? I'm guessing not :)

Mongoid and querying for embedded locations?

I have a model along the lines of:
class City
include Mongoid::Document
field :name
embeds_many :stores
index [["stores.location", Mongoid::GEO2D]]
end
class Store
include Mongoid::Document
field :name
field :location, :type => Array
embedded_in :cities, :inverse_of => :stores
end
Then I tried calling something like City.stores.near(#location).
I want to query the City collection to return all cities that have at least 1 Store in a nearby location. How should I set up the index? What would be the fastest call?
I read the Mongoid documentation with using index [[:location, Mongo::GEO2D]] but I am not sure how this applies to an embedded document, or how to only fetch the City and not all the Stop documents.
Mike,
The feature you are requesting is called multi-location documents. It is not supported in the current stable release 1.8.2. This is available only from version 1.9.1.
And Querying is straightforward when use mongoid, its like this
City.near("stores.location" => #location)
And be careful when using near queries in multi-location documents, because the same document may be returned multiple times, since $near queries return ordered results by distance. You can read more about this here.
Use $within query instead to get the correct results
Same query written using $within and $centerSphere
EARTH_RADIUS = 6371
distance = 5
City.where("stores.location" => {"$within" => {"$centerSphere" => [#location, (distance.fdiv EARTH_RADIUS)]}})

Referencing foreign keys with DataMapper

I've been looking through this:
http://datamapper.org/docs/find
But haven't been able to gleam what I'm looking for, though I know it's quite simple.
I have two tables, scans and stations, with the relevant fields:
STATIONS - id (primary key), name
SCANS - id (primary key), item_id, in_station, out_station
Where in_station and out_station are foreign keys to the id field in the stations table.
I have a Scan object
class Scan
include DataMapper::Resource
property :id, Integer, :key => true
property :item_id, Integer
property :in_station, Integer
property :out_station, Integer
end
So right now, I can do Scan.all(:item_id => #barcode) to get all the scans on a particular item, and I've got the in_station id and out_station id. What's the best way of getting the names, though, instead of ids. I assume it's gotta be easier than for every scan calling Station.get(:id=> scan.in_station).
This is easy enough using SQL, but how can I alter Scan/Station to either get the name or have a property that's a Station object, so I can do something like scan.station.name?
EDIT:
I've almost got this working. I have a Station class:
class Station
include DataMapper::Resource
property :id, Integer, :key => true
property :name, String
end
and I got rid of property :in_station and property :out_station in Scan and replaced with:
belongs_to :in_station, :model => 'Station', :child_key => 'id'
belongs_to :out_station, :model => 'Station', :child_key => 'id'
Which I think/hope is saying "there's a field called in_station which is a foreign key into the Station table and one called out_station which is the same". Indeed, in_station and out_station are now instances of Station, BUT, they're the object. Even though in_station and out_station are different values, I'm getting the same object for each on every Scan. What am I doing wrong, how can I indicate that in_station and out_station are both references to Station but, when their ids are different, I expect different objects.
How about doing this:
class Station
include DataMapper::Resource
property :id, Serial
# rest of the properties
has n, :scans
end
class Scan
include DataMapper::Resource
property :id, Serial
# rest of the properties
belongs_to :station
end
Then you just do this to access the associated station:
station = Station.create
scan = station.scans.create
scan.station # returns the associated station
That should work for you at match your schema.
The assumption is that we don't want to change the underlying SQL schema. So we have to tell DataMapper to use the existing foreign key names (in_station and out_station). The twist is that DataMapper will choke if the association name is the same as the child key. That's why I have the 'my_' prefix on the association names.
class Scan
include DataMapper::Resource
#rest of the properties
belongs_to :my_in_station, :model => 'Station', :child_key => 'in_station'
belongs_to :my_out_station, :model => 'Station', :child_key => 'out_station'
end
Usage
s = Scan.get(id)
s.my_in_station.name
s.my_out_station.name

How can I get a unique :group of a virtual attribute in rails?

I have several similar models ContactEmail, ContactLetter, etcetera.
Each one belongs_to a Contact
Each contact belongs_to a Company
So, what I did was create a virtual attribute for ContactEmail:
def company_name
contact = Contact.find_by_id(self.contact_id)
return contact.company_name
end
Question: How can I get an easy list of all company_name (without duplicates) if I have a set of ContactEmails objects (from a find(:all) method, for example)?
When I try to do a search on ContactEmail.company_name using the statistics gem, for example, I get an error saying that company_name is not a column for ContactEmail.
Assuming your ContactEmail set is in #contact_emails (untested):
#contact_emails.collect { |contact_email| contact_email.company_name }.uniq
You don't need the virtual attribute for this purpose though. ActiveRecord sets up the relationship automatically based on the foreign key, so you could take the company_name method out of the ContactEmail model and do:
#contact_emails.collect { |contact_email| contact_email.contact.company_name }.uniq
Performance could be a consideration for large sets, so you might need to use a more sophisticated SQL query if that's an issue.
EDIT to answer your 2nd question
If company_name is a column, you can do:
ContactEmail.count(:all, :joins => :contact, :group => 'contact.company_name')
On a virtual attribute I think you'd have to retrieve the whole set and use Ruby (untested):
ContactEmail.find(:all, :joins => :contact, :select => 'contacts.company_name').group_by(&:company_name).inject({}) {|hash,result_set| hash.merge(result_set.first=>result_set.last.count)}
but that's not very kind to the next person assigned to maintain your system -- so you're better off working out the query syntax for the .count version and referring to the column itself.

Resources