MongoDB / Mongoid embedded document loading with Rails

I have a document (DataSet) with many embedded documents (DataPoint) (a 1:N relation). Since this appears to me as an array in Rails, if I want to read, say, every 20th element, will it load every element into memory, or only every 20th element?
I am trying to figure out if this will be inefficient. I would like ideally only to load what I need from the DB.
Here is an example:
a = DataSet.first
points = a.data_points.values_at(*(0...a.data_points.count).step(20))
Is this bad? Is there a Mongoid-specific way to do this?

Embedded documents aren't relations (in the typical RDBMS fashion) but are actually embedded (hence the name) within the parent record, just like any other attribute. So when you call DataSet.first, you're loading the entire document, as well as its embedded records, into memory.
Depending on how your application is structured, you may see a benefit from denormalizing every 20th DataPoint into a separate embedded relation (during a callback, or in a background task, or something like that), and then when you load the document, load only those points with DataSet.only(:datapoints_sample).first - which will load only that relation into memory (and no other attributes).
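A rough sketch of that idea, reusing the DataPoint class for the sample relation (the resample! helper is a made-up name, not Mongoid API; wire it to whatever callback or job fits your app):

class DataSet
  include Mongoid::Document
  embeds_many :data_points
  # Denormalized copy of every 20th point, rebuilt by resample!
  embeds_many :datapoints_sample, class_name: 'DataPoint'

  # Call from an after_save callback or a background job.
  def resample!
    self.datapoints_sample = data_points.each_slice(20).map do |slice|
      DataPoint.new(slice.first.attributes.except('_id'))
    end
    save!
  end
end

# Loads only the sample relation into memory, not the full data_points array:
sample = DataSet.only(:datapoints_sample).first.datapoints_sample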

Related

How to avoid storing the original content in Solr, only the indexed version?

I have a lot of documents, about 30 TB in total, and these docs have other attributes associated with them.
I don't want to store the actual documents after indexing them with Solr, since the content is stored somewhere else and I can access it later if needed.
The other data attributes will also be indexed with Solr and won't be deleted.
I'm currently developing with Ruby on Rails and MySQL, but would like to move to MongoDB. Is the scenario above possible?
Thanks,
Maged
You don't have to store original content in Solr. That's the difference between stored and indexed. If you set stored to false, you will only keep the processed, tokenized version of content as needed for search. Just make sure you keep your ID stored. This is set in your field definition in schema.xml.
This does mean Solr cannot return any of the non-stored fields back to the user, so you need to match them to the original records based on IDs (just as you seem to suggest).
This also breaks partial document updates, so you will need to make sure you reindex the whole document when anything changes.
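For illustration, a hypothetical pair of field definitions in schema.xml (the field names are placeholders): the ID stays stored so results can be matched back to the original records, while the content field is indexed but not stored:

<field name="id"      type="string"       indexed="true" stored="true" required="true"/>
<field name="content" type="text_general" indexed="true" stored="false"/>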
As I understand it, you don't want to modify the document content: you'll index it once and keep it, while the other data properties will be indexed frequently. If you aren't concerned about space, it's better to make your "content" field both stored and indexed. Choose the tokenizer and filters for the content field carefully so that they create fewer tokens.
For partial updates, see http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/

Any tips for displaying user-dependent values in a list of items?

It is pretty common for a web application to display a list of items and for each item in the list to indicate to the current user whether they have already viewed the associated item.
An approach that I have taken in the past is to store HasViewed objects that contain the Id of a viewed item/object and the Id of the User who has viewed that item/object.
When it comes time to display a list of items this requires querying the database for the items, and separately querying the database for the HasViewed objects, and then combining the results of these queries into a set of objects constructed solely for the purpose of displaying them in the view.
Each list item (each li, say) then uses the has_viewed property of the objects constructed above.
I think it is time to find a better approach and would like to know what approaches you would recommend.
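For reference, a minimal sketch of the two-query-and-combine approach described above, assuming Item and HasViewed(user_id, item_id) ActiveRecord models (the names are placeholders):

items      = Item.order(created_at: :desc).limit(50).to_a
viewed_ids = HasViewed.where(user_id: current_user.id,
                             item_id: items.map(&:id)).pluck(:item_id)

# One lightweight wrapper per row, purely for the view's benefit.
rows = items.map do |item|
  { item: item, has_viewed: viewed_ids.include?(item.id) }
end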
I'd also like to know whether there is a better approach. Right now my solution is to put the state in Redis and cache the view with fragment caching.
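Roughly, with the redis gem (the key scheme here is invented):

# Record a view in a per-user Redis set...
$redis.sadd("user:#{current_user.id}:viewed", item.id)

# ...and read it back when rendering the list.
has_viewed = $redis.sismember("user:#{current_user.id}:viewed", item.id)

# Putting the flag in the fragment cache key caches both states separately:
# <% cache [item, has_viewed] %> ... <% end %>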

Neo4j - individual properties, or embedded in JSON? (ROR)

I want to know which is more efficient in terms of speed and Neo4j's property limitations. (I'm using Ruby on Rails 3.2 and REST.)
I'm wondering whether I should store node properties as individual properties, much like database table columns, or store most/all of a node's data in a single node property in JSON format.
Right now, in a test system, I have 1,000 nodes with a total of 10,000 properties. Obviously the number of properties is going to skyrocket as more features and new node types are added to my system.
So I was considering storing all the non-searchable properties for a node in an embedded JSON structure, except that this seems like it would put more burden on the web servers, which would have to parse the JSON after retrieving it. (I'm going to use a single JSON property for activity feed nodes, but here I'm addressing things like photo nodes, profile nodes, etc.)
Any advice here? Keep things in separate properties? A hybrid of JSON and individual properties?
What is your goal in storing things as JSON? Do you think you'll hit the 67B limit (which will be going up in 2.1 in a few months to something much larger)?
From a low level store standpoint, there isn't much difference between storing a long string and storing many shorter properties. The main thing you're doing is preventing yourself from using those fields in a query.
Also, if you're using REST, you're going to have to do JSON parsing anyway, so it's not like you're going to completely avoid that.
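To make the parsing cost concrete, a small Ruby sketch (the property names are invented):

require 'json'

# Individual properties: each key stays addressable from Cypher.
props = { "name" => "Alice", "photo_count" => 42 }

# Single JSON property: opaque to Cypher, and every read pays an extra
# app-side parse on top of the JSON parsing REST already does.
blob   = { "data" => JSON.generate("name" => "Alice", "photo_count" => 42) }
parsed = JSON.parse(blob["data"])
parsed["photo_count"] # => 42, but only after parsing the whole blob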

Using multiple key value stores

I am using Ruby on Rails and have a situation that I am wondering is appropriate for some sort of key-value store instead of MySQL. I have users that have_many lists, and each list has_many words. Some lists have hundreds of words, and I want users to be able to copy a list. This is a heavy MySQL task because it has to create hundreds of word objects at one time.
As an alternative, I am considering some sort of key-value store where the key would just be the word. A list of words could be stored in a text field in MySQL, or each list could be a new key-value db? It seems like it would be faster to copy a key-value db this way rather than go through the database, and like this might be faster in general. Thoughts?
The general way to solve this with a relational database would be to have a list table, a word table, and a join table relating the two. You are correct that there would be some overhead, but don't overestimate it: because the table structure is defined, there is very little actual storage overhead per record, and records can be inserted very quickly.
If you want very fast copies, you could make lists copy-on-write, meaning a single list could be referred to by multiple users, or multiple times by the same user; you only actually duplicate the list when the user tries to add, remove, or change an entry. Of course, this is premature optimization: start simple and only add complications like this if you find they are necessary.
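A rough copy-on-write sketch, assuming List/Word ActiveRecord models and a hypothetical copied_from_id column:

class List < ActiveRecord::Base
  has_many :words

  # Cheap copy: just a pointer row, no word rows are written.
  def cheap_copy_for(user)
    List.create!(user: user, copied_from_id: id)
  end

  # Call before the first add/remove/change on a copied list;
  # only now are the word rows actually duplicated.
  def materialize!
    return unless copied_from_id
    List.find(copied_from_id).words.find_each do |word|
      words.create!(text: word.text)
    end
    update!(copied_from_id: nil)
  end
end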
You could use a key-value store as you suggest. I would avoid trying to build one on top of a MySQL text field unless you have a very good reason; it will make any sort of searching by key very slow, as it would require string searching. A key-value data store like CouchDB or Tokyo Cabinet could do this very well, but it would most likely take up more space (as each record has to have its own structure defined, and each word has to be recorded separately in each list). The only dimension of performance I would expect to be better is massively scalable reads and writes, but that's only relevant for the largest of systems.
I would use MySQL naively, and only make changes such as this if you need the performance and can prove that this method will actually be faster.

Does or can acts_as_tree be made to support eager loading?

Using acts_as_tree, I would like to be able to preload an entire tree, with its complete child hierarchy intact, in one SQL call. To that end, I added a tree_id to the table, shared by every descendant in the tree.
I had explored acts_as_nested_set (actually awesome_nested_set) as a possibility but since I graft trees into other trees, I found that using the nested set for my purposes performed far too many updates. Along with acts_as_versioned this is an unacceptable complication to the design I'm after. For my purposes, I believe acts_as_tree is better suited.
My only issue is to grab a whole tree with the hierarchy intact. The :include option of ActiveRecord works with :children but not :descendants. I am content with writing my own method for retrieving and mapping the associations manually. Any guidance or examples for how to accomplish this?
From my point of view, the only benefit of nested sets that I'm giving up by using tree (a tree that supports grabbing the entire structure, that is) is selectively grabbing any subsection of a tree. I'm okay with that.
The solution I'm hoping to avoid is eliminating the :children association that comes with tree, and instead defining and manually loading a children array on each tree node.
I've looked into this in the past and IIRC I found that you can load a tree of a known depth with a single SQL query by joining the table with itself n times; however, it's not possible to load a complete tree of arbitrary size. Hence the need for the nested set design.
If your data set is relatively small, you could fetch all the records and re-assemble the tree(s) in memory. Perhaps that would suffice?
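For example, a sketch of fetching by tree_id and wiring the children associations up in memory (association internals vary across Rails versions, so treat this as a starting point rather than a drop-in):

nodes     = Node.where(tree_id: some_tree_id).to_a
by_parent = nodes.group_by(&:parent_id)

# Populate each node's children association without further queries.
nodes.each do |node|
  assoc = node.association(:children)
  assoc.target = by_parent[node.id] || []
  assoc.loaded!
end

root = by_parent[nil].first # the whole tree is now walkable in memory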
You can use the index_tree gem, which provides eager loading methods for trees:
RootModel.find(1).preload_tree
A detailed description is in this blog.
