Elasticsearch with Tire on Rails bulk import & indexing issue - ruby-on-rails

I have a Rails app with full-text search based on Elasticsearch and Tire. It already works on a MongoDB model called Category, but now I want to add a more complex search based on an embedded 1-n Mongoid model: User, which embeds_many :watchlists.
Now I have to bulk import and index all the fields in Watchlist, and I'd like to know:
How can I do that?
Can I index just the watchlists child fields, without the parent user fields?
The embedded 1-n MongoDB/Mongoid model looks like the following:
app/models/user.rb (the parent):
class User
  include Mongoid::Document
  include Tire::Model::Search
  include Tire::Model::Callbacks
  index_name 'users'
  field :nickname
  field ... many others
  embeds_many :watchlists
end
app/models/watchlist.rb (the embedded "many" children):
class Watchlist
  include Mongoid::Document
  include Tire::Model::Search
  include Tire::Model::Callbacks
  index_name 'watchlists'
  field :html_url
  embedded_in :user
end
Any suggestions on how to accomplish this?
UPDATE:
Here is a chunk of the model as seen in the mongo shell:
> user = db.users.findOne({'nickname': 'lgs'})
{
  "_id" : ObjectId("4f76a16cf2a6a12f88cbca43"),
  "encrypted_password" : "",
  "sign_in_count" : 0,
  "provider" : "github",
  "uid" : "1573",
  "name" : "Luca G. Soave",
  "email" : "luca.soave@gmail.com",
  "nickname" : "lgs",
  "watchlists" : [
    {
      "_id" : ObjectId("4f76997f1d41c81173000002"),
      "tags_array" : [ git, peristence ],
      "html_url" : "https://github.com/mojombo/grit",
      "description" : "Grit gives you object oriented read/write access to Git repositories via Ruby.",
      "fork_" : false,
      "forks" : 207,
      "watchers" : 1258,
      "created_at" : ISODate("2007-10-29T14:37:16Z"),
      "pushed_at" : ISODate("2012-01-27T01:05:45Z"),
      "avatar_url" : "https://secure.gravatar.com/avatar/25c7c18223fb42a4c6ae1c8db6f50f9b?d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-140.png"
    },
    ...
  ...
}
I'd like to index and query any field owned by the embedded child watchlists doc:
... "tags_array", "html_url", "description", "forks"
but I don't want Elasticsearch to include the parent user fields:
... "uid", "name", "email", "nickname"
so that when I query for "git persistence", it looks into the indexed 'watchlists' fields of each 'user' in the original MongoDB.
(Sorry for mixing singular and plural here; I was just indicating the doc object names.)

It really depends on how you want to serialize your data for the search engine, based on how you want to query them. Please update the question and I'll update the answer. (Also, it's better to just remove the ES logs, they are not relevant here.)
I'm not sure how the Rake task works with embedded documents in Mongo, and also why it seems to "hang" at the end. Is your data in the "users" index when you run the task?
Note that it's quite easy to provide your own indexing code when the Rake task is not flexible enough. See the Tire::Index#import integration tests.
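As a rough illustration of what "your own indexing code" could start from, here is a minimal plain-Ruby sketch that flattens user documents into one search document per embedded watchlist, copying only the child fields. The helper name watchlist_documents, the INDEXED_FIELDS list, and the "id"/"type" keys are assumptions made for this example; they are not taken from Tire or from the original app.

```ruby
# Hypothetical helper (not part of Tire): build one flat search document
# per embedded watchlist, keeping only the child fields we want indexed.
INDEXED_FIELDS = %w[tags_array html_url description forks].freeze

def watchlist_documents(users)
  users.flat_map do |user|
    Array(user["watchlists"]).map do |watchlist|
      # Parent fields (uid, name, email, nickname) are never copied.
      doc = watchlist.select { |key, _value| INDEXED_FIELDS.include?(key) }
      # "id" and "type" are assumed key names for the search document.
      doc.merge("id" => watchlist["_id"].to_s, "type" => "watchlist")
    end
  end
end

# Sample data shaped like the document in the question (values shortened).
users = [
  {
    "nickname" => "lgs",
    "uid" => "1573",
    "watchlists" => [
      {
        "_id" => "4f76997f1d41c81173000002",
        "tags_array" => ["git", "persistence"],
        "html_url" => "https://github.com/mojombo/grit",
        "forks" => 207,
        "watchers" => 1258
      }
    ]
  }
]

docs = watchlist_documents(users)
puts docs.inspect
```

An array like this could then presumably be handed to a custom import step for the 'watchlists' index; check the Tire::Index#import integration tests for what the documents must look like.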

Related

Delete a field in a document in mongodb - Rails + Mongoid

I want to delete a field from a document using Ruby on Rails.
I have already tried
book.remove_attribute(:name)
book.unset(:name)
But they both set the attribute to nil and it is still present in the object.
I want it to vanish from my document. Any help is welcome.
When you access a document via Mongoid, it returns a Ruby object. You can see the data actually stored in the document only via the mongo shell (just type 'mongo' in your terminal).
The object is created by Mongoid (a MongoDB ODM/wrapper for Rails). This object may occasionally look different from the document.
For example:
When you unset a field, that field is entirely removed from the document. But since your model still declares that field, Mongoid returns a nil attribute for it instead of giving objects of the same model different numbers of fields.
Model book.rb
class Book
  include Mongoid::Document
  field :name
  field :author
end
In rails console, type
Book.create(name: "b1", author: "a1")
=> #<Book _id: 555231746c617a1c99030000, name: "b1", author: "a1">
In Mongo shell
db.books.find()
{ "_id" : ObjectId("555231746c617a1c99030000"), "name" : "b1", "author" : "a1" }
Now, we unset.
In rails console
Book.first.unset(:name)
=> #<Book _id: 555231746c617a1c99030000, name: nil, author: "a1">
In Mongo shell
db.books.find()
{ "_id" : ObjectId("555231746c617a1c99030000"), "author" : "a1" }
If, however, you still don't want to see the field in your Rails console (mind you, it is not taking up any extra space in the db), you can always remove the field from the model. If you do that, you will no longer be able to access this field through Rails/Mongoid on any object. It will only be present on the document and accessible through the mongo shell.
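The difference between "the key is gone from storage" and "the reader still exists on the model" can be illustrated with a toy class. This is only a sketch in plain Ruby: ToyBook and its methods are made up for the illustration and are not how Mongoid is implemented.

```ruby
# Illustration only: a toy stand-in for a model with declared fields.
class ToyBook
  FIELDS = %i[name author].freeze

  def initialize(stored)
    @stored = stored # the hash that plays the role of the stored document
  end

  # One reader per declared field; a key missing from storage reads as nil.
  FIELDS.each do |field|
    define_method(field) { @stored[field] }
  end

  # Remove the key from storage entirely.
  def unset(field)
    @stored.delete(field)
    self
  end

  # Keys that are really present in the stored hash.
  def stored_keys
    @stored.keys
  end
end

book = ToyBook.new({ name: "b1", author: "a1" })
book.unset(:name)

puts book.name.inspect        # the reader still exists and returns nil
puts book.stored_keys.inspect # only the remaining key is stored
```

The reader survives because it belongs to the class definition, not to the stored data, which mirrors why the Rails console keeps showing name: nil.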

rails mongoid find parent with child

Quick example:
class Band
  include Mongoid::Document
  embeds_many :albums
end
class Album
  include Mongoid::Document
  field :name, type: String
  embedded_in :band
end
and the document will look like this:
{
  "_id" : ObjectId("4d3ed089fb60ab534684b7e9"),
  "albums" : [
    {
      "_id" : ObjectId("4d3ed089fb60ab534684b7e0"),
      "name" : "Violator",
    }
  ]
}
Let's say I want to write a method that finds the Band by album name.
If this were ActiveRecord, it would be simple:
Album.find_by(name: "Violator").band
But what about this situation?
Do I have to iterate over the whole collection and find it like this?
Band.select {|band| band.albums.any? {|album| album.name == "Violator"}}
Sounds crazy...
Or do I have to model the data with referenced relations instead of embedded relations?
Embedded documents are best for items that don't need to be queried independently. If you need to query something independently, consider using references. In your case, you can first find the bands that contain an album with the given name and then process those bands:
@bands = Band.where("albums.name" => "Violator")
@albums = @bands.collect{|band| band.albums.where(name: 'Violator') }.flatten
Here are more details on Mongoid relations: http://mongoid.org/en/mongoid/docs/relations.html
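To show what the two steps select, here is a small plain-Ruby sketch that applies the same logic to in-memory hashes instead of a real collection. The helper names bands_with_album and albums_named and the sample data (apart from "Violator") are invented for the illustration; with Mongoid, the first step is done by the database rather than by iterating in Ruby.

```ruby
# Sample data standing in for the bands collection.
bands = [
  { "_id" => 1, "albums" => [{ "name" => "Violator" }, { "name" => "Other Album" }] },
  { "_id" => 2, "albums" => [{ "name" => "Another Album" }] },
  { "_id" => 3, "albums" => [] }
]

# Step 1: parents that have at least one embedded child with the given name
# (the in-memory counterpart of matching on "albums.name").
def bands_with_album(bands, album_name)
  bands.select do |band|
    band["albums"].any? { |album| album["name"] == album_name }
  end
end

# Step 2: from those parents, collect only the matching children.
def albums_named(bands, album_name)
  bands.flat_map do |band|
    band["albums"].select { |album| album["name"] == album_name }
  end
end

matching_bands = bands_with_album(bands, "Violator")
matching_albums = albums_named(matching_bands, "Violator")

puts matching_bands.map { |band| band["_id"] }.inspect
puts matching_albums.inspect
```

Note the use of any? in step 1: a nested select would return an array for every band, and an array is always truthy in Ruby.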

How to create model with an array field which contains another documents as an embedded documents in Mongodb (Mongoid)

I am using Rails 4 with Mongoid for an event-based application.
I am trying to create a model with an array field that holds embedded documents. Each embedded document will contain a user's geo coordinates and a timestamp. Every 5 minutes I will push the user's latest coordinates to the user's (locations) array. Can someone please help me with how to create that?
My sample model and desired documents are below.
class User
  include Mongoid::Document
  field :name, type: String
  field :locations, type: Array
end
Here I want to push
Here is a sample document of the kind I am looking for as a result:
{ _id : ObjectId(...),
  name : "User_name",
  locations : [
    {
      _id : ObjectID(...),
      time : "...." ,
      loc : [ 55.5, 42.3 ]
    },
    {
      _id : ObjectID(...),
      time : "...",
      loc : [ -74 , 44.74 ]
    }
  ]
}
I was able to add values to the locations array without embedded documents through IRB, but since I will be using MongoDB's geospatial queries later on, I want to use 2D indexes and the rest of what the MongoDB documentation mentions.
Hence I believe it needs to be an array of documents that contain the latitude and longitude, which will also save me coding time.
Also, can I use the time of the location as the document's '_id'? (It could help me reduce query overhead.)
I would really appreciate it if someone could help me with the structure of the model I should write, or point me to references.
P.S.: Let me know if you can suggest extra references or help about storing geospatial data in MongoDB.
Hope this will help somebody.
If you want to embed documents, you can use Mongoid's embeds_many feature, which handles such relations. It also lets you define indexes on embedded documents:
http://mongoid.org/en/mongoid/docs/relations.html#embeds_many
The Mongoid documentation points out that 2D indexes should be applied to arrays:
http://mongoid.org/en/mongoid/docs/indexing.html
In your case, the models may look like this:
class User
  include Mongoid::Document
  field :name, type: String
  embeds_many :locations
  index({ "locations.loc" => "2d" })
  accepts_nested_attributes_for :locations # see http://mongoid.org/en/mongoid/docs/nested_attributes.html#common
end
class Location
  include Mongoid::Document
  field :time, type: DateTime # see http://mongoid.org/en/mongoid/docs/documents.html#fields
  field :loc, type: Array
  embedded_in :user
end
But beware of using update with nested attributes: it only lets you update attributes, not delete or reject them. It's preferable to use the (association)_attributes= methods instead:
@user = User.new({ name: 'John Doe' })
@user.locations_attributes = {
  "0" => {
    _id: ObjectID(...),
    time: "....",
    loc: [ 55.5, 42.3 ]
  },
  "1" => {
    _id: ObjectID(...),
    time: "...",
    loc: [ -74, 44.74 ]
  }
}
@user.save!
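To make the target document shape concrete, here is a minimal plain-Ruby sketch that appends location entries to an in-memory user hash every 5 minutes. build_location is a hypothetical helper, the start time is an arbitrary sample value, and the sketch assumes the pair is stored as [longitude, latitude], which is the order MongoDB expects for legacy coordinate pairs; the question does not say which order its sample values use.

```ruby
require "time"

# Hypothetical helper: one embedded location entry.
# Assumes loc is stored as [longitude, latitude].
def build_location(lng, lat, time)
  raise ArgumentError, "longitude out of range" unless (-180..180).cover?(lng)
  raise ArgumentError, "latitude out of range" unless (-90..90).cover?(lat)

  { "time" => time.utc.iso8601, "loc" => [lng, lat] }
end

user = { "name" => "User_name", "locations" => [] }

# Arbitrary sample timestamps, 5 minutes apart.
start = Time.utc(2014, 1, 1, 12, 0, 0)
user["locations"] << build_location(55.5, 42.3, start)
user["locations"] << build_location(-74, 44.74, start + 5 * 60)

puts user.inspect
```

With the User and Location models above, the corresponding call would presumably go through the embedded relation (for example creating a Location on user.locations) rather than pushing raw hashes.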

Headless Rails, easy way to store a hash?

I'm new to Rails, and most of the documentation is geared towards a user entering something into a view that eventually gets passed into the database.
Is there a Rails way of storing the hash below in a SQL database? Do I put it in the model or the controller?
Is there a clean way to store this data, or do I have to explicitly store every attribute in this hash individually?
I've already written migrations by hand that match most if not all of the hashed data below, but is there a tool that can convert these hashes into a relational data model?
{
  "_id" : "36483f88e04d6dcb60684a33000791a6bc522a41",
  "address_components" : [
    {
      "long_name" : "ON",
      "short_name" : "ON",
      "types" : [
        "administrative_area_level_1",
        "political"
      ]
    },
    {
      "long_name" : "CA",
      "short_name" : "CA",
      "types" : [
        "country",
        "political"
      ]
    },
    {
      "long_name" : "M5J 1L4",
      "short_name" : "M5J 1L4",
      "types" : [
        "postal_code"
      ]
    }
  ],
  "formatted_address" : "ON, Canada",
  "formatted_phone_number" : "(416) 362-5221",
  "geometry" : {
    "location" : {
      "lat" : 43.640816,
      "lng" : -79.381752
    }
  },
  "icon" : "http://maps.gstatic.com/mapfiles/place_api/icons/restaurant-71.png",
  "id" : "36483f88e04d6dcb60684a33000791a6bc522a41",
  "international_phone_number" : "+1 416-362-5221",
  "name" : "Scandinavian Airlines",
  "reference" : "CoQBcgAAAMobbidhAbzwIMkxq3GTHzzlEW4hAAwxg5EmGDP7ZOcJRwUK29poFMTDvED5KW9UEQrqtgTwESj_DuCAchy6Qe5pPZH9tB47MmmuQHvyHFlApunmU3MN05_KLekN5hEbrW7Gv2ys2oXmn7FpvD7-0N0QILlFXCiwL5UlYWo2sEg3EhBMBsrkHBu4WCFsMCHRqgadGhTM3BVWR15l9L87zL1uN1ssoW4WCw",
  "types" : [
    "restaurant",
    "food",
    "establishment"
  ],
  "url" : "https://plus.google.com/100786723768255083253/about?hl=en-US",
  "utc_offset" : -300,
  "vicinity" : ""
}
This data structure can be stored by a matching hierarchy of models/associations.
There is a very clean way, which is...
Use accepts_nested_attributes_for. This will work for your entire structure except for the 'types' arrays, which contain simple lists of strings. However, you can use a workaround for this specific case.
The only thing that cannot be stored (easily) is the id. ActiveRecord won't permit you to set the id directly, as it is supposed to be an implementation detail of the backing database. In your case, you can simply borrow the _id field which seems to contain the same data and insert that into an alias of some sort.
Here is an example of the code you might use:
class Address < ActiveRecord::Base
  has_many :address_components
  has_many :address_types
  has_one :geometry
  attr_accessor :address_components_attributes, :geometry_attributes
  accepts_nested_attributes_for :address_components, :geometry

  # Workaround for the plain array of strings: one AddressType per entry.
  def types=(types)
    types.each do |t|
      # build is called on the association; models have no class-level build
      address_types.build(name: t)
    end
  end

  # Keep the incoming _id in a separate column instead of the primary key.
  def _id=(orig_id)
    self.original_id = orig_id
  end
end

class AddressType < ActiveRecord::Base
  belongs_to :address
end

class Geometry < ActiveRecord::Base
  belongs_to :address
  has_one :location
  attr_accessor :location_attributes
  accepts_nested_attributes_for :location
end

class Location < ActiveRecord::Base
  belongs_to :geometry
end

class AddressComponent < ActiveRecord::Base
  belongs_to :address
  has_many :component_types

  def types=(types)
    types.each do |t|
      component_types.build(name: t)
    end
  end
end

class ComponentType < ActiveRecord::Base
  belongs_to :address_component
end
Now you can store the entire structure using:
Address.create(data_hash)
If you have setter methods on your model, they can handle the data and import from this hash as you would want.
For example, given the hash above, if you had a method:
def address_components=(ac)
  # Handle address components
end
This will get called when you do the following (assuming the name of your model is MyModel and the hash is stored in @hash).
MyModel.new(@hash)
All the keys will trigger setter methods of the structure 'key='. This is very powerful - if you have a very well structured model, but have an arbitrary hash, you can create methods that handle the keys in the hash. Based on these, you can build new objects, build associations and save it all at the same time.
Note - you may need to strip out some keys, or handle some keys that use reserved ruby terms in a custom way.
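The 'key=' dispatch described above can be tried without a database. The following plain-Ruby sketch imitates it with a hand-written initializer; Place, its setters, and the ignored_keys list are invented for this illustration and only approximate what a Rails model does (a real model would typically raise on a key it has no setter for, which is why stripping keys may be needed).

```ruby
# Toy model: every hash key is routed to a "key=" setter if one exists.
class Place
  attr_reader :name, :component_names, :ignored_keys

  def initialize(attributes = {})
    @ignored_keys = []
    attributes.each do |key, value|
      setter = "#{key}="
      if respond_to?(setter)
        public_send(setter, value)
      else
        # This sketch collects unknown keys instead of raising.
        @ignored_keys << key
      end
    end
  end

  def name=(value)
    @name = value
  end

  # Custom setter: receives the nested array and decides what to keep.
  def address_components=(components)
    @component_names = components.map { |component| component["long_name"] }
  end
end

place = Place.new({
  "name" => "Scandinavian Airlines",
  "address_components" => [{ "long_name" => "ON" }, { "long_name" => "CA" }],
  "utc_offset" => -300
})

puts place.name
puts place.component_names.inspect
puts place.ignored_keys.inspect
```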

Modeling many-to-many :through with Mongoid/MongoDB

I'm relatively new to Mongoid/MongoDB and I have a question about how to model a specific many-to-many relationship.
I have a User model and a Project model. Users can belong to many projects, and each project membership includes one role (eg. "administrator", "editor", or "viewer"). If I were using ActiveRecord then I'd set up a many-to-many association between User and Project using has_many :through and then I'd put a field for role in the join table.
What is a good way to model this scenario in MongoDB and how would I declare that model with Mongoid? The example below seems like a good way to model this, but I don't know how to elegantly declare the relational association between User and the embedded ProjectMembership with Mongoid.
Thanks in advance!
db.projects.insert(
  {
    "name" : "Test project",
    "memberships" : [
      {
        "user_id" : ObjectId("4d730fcfcedc351d67000002"),
        "role" : "administrator"
      },
      {
        "role" : "editor",
        "user_id" : ObjectId("4d731fe3cedc351fa7000002")
      }
    ]
  }
)
db.projects.ensureIndex({"memberships.user_id": 1})
Modeling a good MongoDB schema really depends on how you access your data. In the case you describe, you will index your memberships.user_id key, which seems OK. But your document size will grow as you add viewers, editors and administrators. Also, your schema will make it difficult to run queries like:
Query projects where user_id xxx is editor:
Then again, you may not need to query projects like this, in which case your schema looks fine. But if you need to query your projects by user_id AND role, I would recommend creating a 'project_memberships' collection:
db.project_memberships.insert(
  {
    "project_id" : ObjectId("4d730fcfcedc351d67000032"),
    "editors" : [
      ObjectId("4d730fcfcedc351d67000002"),
      ObjectId("4d730fcfcedc351d67000004")
    ],
    "viewers" : [
      ObjectId("4d730fcfcedc351d67000002"),
      ObjectId("4d730fcfcedc351d67000004"),
      ObjectId("4d730fcfcedc351d67000001"),
      ObjectId("4d730fcfcedc351d67000005")
    ],
    "administrator" : [
      ObjectId("4d730fcfcedc351d67000011"),
      ObjectId("4d730fcfcedc351d67000012")
    ]
  }
)
db.project_memberships.ensureIndex({"editors": 1})
db.project_memberships.ensureIndex({"viewers": 1})
db.project_memberships.ensureIndex({"administrator": 1})
Or even easier... add an index on your project schema:
db.projects.ensureIndex({"memberships.role": 1})
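One reason querying by user_id AND role needs care with the embedded memberships array: two separate conditions on array fields can each be satisfied by a different element, whereas an $elemMatch-style condition requires a single element to satisfy both. The plain-Ruby sketch below imitates both behaviours on in-memory hashes; the helper names, the short ids "u1"/"u2", and "Other project" are invented for the example.

```ruby
# Sample data: u1 is administrator of one project and editor of another.
projects = [
  { "name" => "Test project",
    "memberships" => [
      { "user_id" => "u1", "role" => "administrator" },
      { "user_id" => "u2", "role" => "editor" }
    ] },
  { "name" => "Other project",
    "memberships" => [
      { "user_id" => "u1", "role" => "editor" }
    ] }
]

# Same-element match: one membership must have both the user and the role
# (the behaviour of an $elemMatch condition).
def projects_with_member_role(projects, user_id, role)
  projects.select do |project|
    project["memberships"].any? do |membership|
      membership["user_id"] == user_id && membership["role"] == role
    end
  end
end

# Independent match: the user and the role may come from different
# memberships (the behaviour of two separate conditions on the array).
def projects_with_member_and_role(projects, user_id, role)
  projects.select do |project|
    memberships = project["memberships"]
    memberships.any? { |m| m["user_id"] == user_id } &&
      memberships.any? { |m| m["role"] == role }
  end
end

strict = projects_with_member_role(projects, "u1", "editor")
loose = projects_with_member_and_role(projects, "u1", "editor")

puts strict.map { |project| project["name"] }.inspect
puts loose.map { |project| project["name"] }.inspect
```

The loose variant also returns "Test project", where u1 is only an administrator, which is usually not what "projects where this user is an editor" means.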
