Rails Reporting Object - Map to New Object - ruby-on-rails

I have a complex multi-object query that is being used for reporting to customers and isn't dependent on new transactional data. With the volume of data I'm dealing with, it seems more efficient to use a custom query instead of loading all of my models up (which I was originally doing and even with effective filtering, it was getting slow).
I've gone as far as creating a class to represent my report (combining attributes from several models), but I'm stuck on something that seems fairly easy now.
I have a new class named AttendeeReport.
It has the following attributes:
attendee_id, first_name, last_name, email_address, rsvp_id, response_id, etc.
This is used for reporting (in jquery datatables, and possibly elsewhere) and I'm using this query to pull from multiple models correctly into an array of my correct attributes:
list = Attendee.joins(:event)
               .joins(:focus_group_prospect)
               .joins(:post_focus_group_survey)
               .joins(:lead)
               .joins(:vendor)
               .where("events.focus_group_client_id = ?", client_id)
               .select("attendees.id as attendee_id, leads.first_name, etc")
               .all
How would I map that result to a new array of AttendeeReport objects without assigning each field by hand? Is there a merge or map feature that I'm not familiar with that does this dynamically?
Note: I've dramatically decreased the number of attributes shown since they don't matter.
Update: Reflection?
Thinking on this: if there isn't an easy way to do this via map or merge, I'm wondering whether a reflection approach would work instead (iterate over all attributes and match them up).
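For what it's worth, a minimal sketch of that reflection idea, assuming AttendeeReport can stay a plain Ruby class (the attribute list and initializer below are illustrative):

# Plain Ruby value object; the attribute list is assumed to match the SQL aliases.
class AttendeeReport
  ATTRIBUTES = [:attendee_id, :first_name, :last_name, :email_address,
                :rsvp_id, :response_id].freeze

  attr_accessor(*ATTRIBUTES)

  # Reflection-style constructor: copy any matching key from the row's
  # attributes hash into the corresponding writer, ignore everything else.
  def initialize(attrs = {})
    attrs.each do |name, value|
      writer = "#{name}="
      public_send(writer, value) if respond_to?(writer)
    end
  end
end

# Each element of `list` is an Attendee whose #attributes hash contains the
# aliased columns from the custom select, so the mapping is one line:
reports = list.map { |row| AttendeeReport.new(row.attributes) }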

Related

Dynamic Queries using Couch_Potato

The documentation for creating a fairly straightforward view is easy enough to find:
view :completed, :key => :name, :conditions => 'doc.completed === true'
How, though, does one construct a view with a condition created on the fly? For example, if I want to use a query along the lines of
doc.owner_id == my_var
Where my_var is set programmatically.
Is this even possible? I'm very new to NoSQL so apologies if I'm making no sense.
Views in CouchDB are incrementally built / indexed as data is inserted / updated into that particular database. So in order to take full advantage of the power behind views you won't want to dynamically query them. You'll want to construct your views in such a way that you can efficiently access the data based on the expected usage patterns of the application. In my experience it's not uncommon to have multiple views each giving you a different way to access / query the same data. I find it helpful to think of CouchDB views as a way to systematically denormalize your documents.
On the other hand there are also ways to generalize your indexes in your views so you can use a single view for endless combinations of queries.
For example, say you have an "articles" database where each article document contains a list of tags. If you want to set up a query to dynamically retrieve all articles tagged with a handful of tags, you could emit multiple entries to the view from the same document:
// this article is tagged with "tag1","tag2","tag3"
emit("tag1",doc._id);
emit("tag2",doc._id);
emit("tag3",doc._id);
....
Now you have a way to query: Give me all articles tagged with these words: ["tag1","tag2",etc]
For more info on how to query multiple keys see "Parameter -> keys" in the table of Querying Options here:
http://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options
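Since the question is coming from Ruby (couch_potato), here is a rough sketch of such a multi-key query made directly over HTTP; the database, design document and view names are invented for illustration:

require "net/http"
require "json"
require "uri"

# Hypothetical database, design document and view names.
uri = URI("http://localhost:5984/articles/_design/articles/_view/by_tag")

req = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
req.body = JSON.generate("keys" => ["tag1", "tag2"])

res  = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
rows = JSON.parse(res.body)["rows"]

# Each row looks like {"key" => "tag1", "id" => "...", "value" => "<doc id>"},
# so duplicates are possible when one article carries several of the tags.
article_ids = rows.map { |row| row["value"] }.uniq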
One problem with the above example is it would produce duplicates if a single document was tagged with both or all of the tags you were querying for. You can easily de-dupe the results of the view by using a CouchDB "List Function". More info about list functions can be found here:
http://guide.couchdb.org/draft/transforming.html
Another way to construct views for even more robust "dynamic" access to the data would be to compose your indexes out of complex data types such as JavaScript arrays. Also incorporating "range queries" can help. So for example if you have a 3-item array in your index, but only have the first 2 values, you can set up a range query to pull all documents that match the first 2 items of the array. Some useful info about that can be found here:
http://guide.couchdb.org/draft/views.html
Refer to the "startkey", and "endkey" options under "Querying Options" table here:
http://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options
It's good to know how CouchDB indexes its data internally. It uses a "B+ tree" data structure:
http://guide.couchdb.org/draft/btree.html
Keep this in mind when thinking about how to compose your indexes, because it has specific implications for how you need to construct them. For example, you can't expect to get good performance on a view if you query with a range on the first item in the array:
startkey = [a,1,2]
endkey = [z,1,2]
You'll get the performance you'd expect if your query is:
startkey = [1,2,a]
endkey = [1,2,z]
This, in more general terms, means that index order does matter when querying views. Not just on basis of performance, but on basis of what documents will be returned. If you index a document in a view with [1,2,3], you can't expect it to show up in query for index [3,2,1], [2,1,3], or any other combination.
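As an illustration of the ordering point, a hedged Ruby sketch of a range query against a hypothetical view whose keys are arrays of the form [category, year, title]; the first two components are fixed and the range runs over the third:

require "net/http"
require "json"
require "uri"
require "cgi"

# {} collates after any string in CouchDB, so this range catches every title
# under ["recipes", 2012].
startkey = CGI.escape(JSON.generate(["recipes", 2012]))
endkey   = CGI.escape(JSON.generate(["recipes", 2012, {}]))

uri = URI("http://localhost:5984/articles/_design/articles/_view/by_cat_year_title" \
          "?startkey=#{startkey}&endkey=#{endkey}")

rows = JSON.parse(Net::HTTP.get(uri))["rows"]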
In my experience, most data-access problems can be solved elegantly and efficiently with CouchDB and the basic tools it provides. If / when your project needs true dynamic access to the data, I generally still use CouchDB for common data access needs, but I'll also integrate ElasticSearch using an ElasticSearch plugin which streams your data from CouchDB into ElasticSearch as it becomes available:
http://www.elasticsearch.org/
https://github.com/elasticsearch/elasticsearch-river-couchdb

Activerecord-import records with relationships to each other

My app needs to import hundreds to thousands of records at a time. The records are each nodes in a tree structure. I'm using activerecord-import to significantly speed up the import, and haven't yet settled on which of ancestry, closure_tree, acts_as_list or a custom solution to use for setting out the hierarchy.
The problem I'm grappling with is how to import all the data and relationships in one or just a few passes. My draft naive solution is:
creating each object in memory, and manually giving each object an id;
using those ids to manually give each object the foreign_key that it needs (eg parent_id); and then
mass-importing the resulting array of objects using activerecord-import
This feels like a hack with obvious problems. For example, if the ids that I've chosen for my objects get used by the database while I'm still instantiating my objects, then the relationships I've manually set become useless/wrong.
Another major problem is that as I look into more advanced solutions for the tree data structure (eg closure_tree and ancestry), manually setting the fields required by those gems feels more and more like a hack.
So I guess my question is, is there a clean way to set up a tree structure of N nodes in a rails activerecord database while touching the database less than N times?
The activerecord-import master branch has a commit with this functionality. It will only work with Rails 4 and Postgres.
If you happen to have another configuration, you will need to:
1. Create a new array of Models/hashes
2. Model.import it
3. Retrieve the IDs of the rows you just inserted
4. Go back to step 1, setting the parent_id values in the next array using the IDs you just retrieved (see the sketch below)
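For illustration, a rough sketch of that loop for a simple parent/child tree. The Node model, its uid/parent_uid columns, and the levels array are all invented for the example; only Node.import comes from activerecord-import:

# levels[0] holds the root rows, levels[1] their children, and so on.
# Each row is a hash carrying its own external uid and its parent's uid.
parent_ids_by_uid = {}

levels.each do |rows|
  nodes = rows.map do |row|
    Node.new(
      uid:       row[:uid],
      name:      row[:name],
      parent_id: parent_ids_by_uid[row[:parent_uid]]  # nil for root nodes
    )
  end

  Node.import(nodes)  # one multi-row INSERT per level instead of one per node

  # Step 3: read back the ids of the rows just inserted, keyed by uid,
  # so the next level can set its parent_id values.
  parent_ids_by_uid.merge!(
    Node.where(uid: rows.map { |r| r[:uid] }).pluck(:uid, :id).to_h
  )
end

This touches the database twice per level (one import, one read-back) rather than once per node, which avoids guessing ids ahead of time.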

Neo4j - individual properties, or embedded in JSON? (ROR)

I want to know which is more efficient in terms of speed and Neo4j's property limitations. (I'm using Ruby on Rails 3.2 and REST.)
I'm wondering whether I should store node properties as individual properties, much like columns in a database table, or store most/all of a node's data in a single node property in JSON format.
Right now, in a test system, I have 1000 nodes with a total of 10000 properties. Obviously the number of properties is going to skyrocket as more features and new node types are added to my system.
So I was considering storing all the non-searchable properties for a node in an embedded JSON structure, except this seems like it will put more burden on the web servers, which have to parse the JSON after retrieving it, etc. (I'm going to use a single JSON property field for activity feed nodes regardless, but here I'm addressing things like photo nodes, profile nodes, etc.)
Any advice here? Keep things in separate properties? A hybrid of JSON and individual properties?
What is your goal by storing things in JSON? Do you think you'll hit the 67B limit (which will be going up in 2.1 in a few months to something much larger)?
From a low level store standpoint, there isn't much difference between storing a long string and storing many shorter properties. The main thing you're doing is preventing yourself from using those fields in a query.
Also, if you're using REST, you're going to have to do JSON parsing anyway, so it's not like you're going to completely avoid that.
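For what it's worth, a rough Ruby sketch of the hybrid idea: keep the fields you need to query as individual node properties and pack the rest into one JSON string. The property names and the split are invented, and this is independent of any particular Neo4j gem:

require "json"

photo = { title: "Sunset", owner_id: 42,
          exif: { iso: 200, f_stop: 1.8 }, comments_enabled: true }

# Fields you want to index/query stay as individual properties; the rest is
# serialized into a single "blob" property.
searchable = { title: photo[:title], owner_id: photo[:owner_id] }
blob       = JSON.generate(photo.reject { |k, _| searchable.key?(k) })

node_properties = searchable.merge(blob: blob)

# Reading it back means a parse step on every load, which is the extra cost
# mentioned above.
restored = searchable.merge(JSON.parse(node_properties[:blob], symbolize_names: true))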

How to change model lazyloadness at runtime in Symfony?

I use sfPropelORMPlugin.
Lazyload is ok if I operate on one object per web page. But if there are hundreds I get hundreds of separate DB queries. I'd like to completely disable lazyload or disable it for needed columns on those particularly heavy pages but couldn't find a way so far.
You should join all your relations when you build your query, that way you'll get all data in a single query. Note, you have to use joinWithRelation() where Relation is a related table name.
Elaborating on William Durand's answer, perhaps you should also look at the Propel function doSelectJoinAll(), which should pre-load all of the objects related to your relations. Just keep in mind this can be expensive in terms of memory.
Another technique is to create a custom criteria with your needed joins, then use a manual hydrate technique to add on to your base object. I do this often when the data I need is using aggregates or other columns that are not exactly mapped to objects. There are plenty of hydrate() examples around.
I added a utility method to the peer class to set which columns I want to load, using "pseudo columns" for this type of DB query. I also overrode hydrate() to understand this "markup". All was good until I found out that even though the data is hydrated, Symfony won't recognize it and won't let you use it as intended.
PS: a join was never considered an option because the site is under fairly heavy load.

Linq-to-SQL query - Need to filter by IDs returned by Full-Text Search sql functions - Hitting limit for Contains

My objective:
I have built a working controller action in MVC which takes user input for various filter criteria and, using PredicateBuilder (part of LinqKit - sorry, I'm not allowed enough links yet) builds the appropriate LINQ query to return rows from a "master" table in SQL with a couple hundred thousand records. My implementation of the predicates is totally inelegant, as I'm new to a lot of this, and under a very tight deadline, but it did make life easier. The page operates perfectly as-is.
To this, I need to add a Full-Text search filter. Understanding the way LINQ translates Contains to LIKE(%%), using the advice in Simon Blog: LINQ-to-SQL - Enabling Full-Text Searching, I've already prepared Table Functions in SQL to run Freetext queries on the relevant columns. I have 4 functions, to match the query against 4 separate tables.
My approach:
At the moment, I'm building the predicates (I'll spare you) for the initial IQueryable data object, running a LINQ command to return them, like so:
var MyData = DB.Master_Items.Where(outer);
Then, I'm attempting to further filter MyData on the Keys returned by my full-text search functions:
var FTS_Matches_Subtable_1 = (from tbl in DB.Subtable_1
join fts in DB.udf_Subtable_1_FTSearch(KeywordTerms)
on tbl.ID equals fts.ID
select tbl.ForeignKey);
... I have 4 of those sets of matches which I've tried to use to filter my original dataset in several ways with no success. For instance:
MyNewData = MyData.Where(d => FTS_Matches_Subtable_1.Contains(d.Key) ||
FTS_Matches_Subtable_2.Contains(d.Key) ||
FTS_Matches_Subtable_3.Contains(d.Key) ||
FTS_Matches_Subtable_4.Contains(d.Key));
I just get the error: The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Too many parameters were provided in this RPC request. The maximum is 2100.
I get that it's because I'm trying to pass a relatively large set of data into the Contains function and LINQ is converting each record into a separate parameter, exceeding the limit.
I just don't know how to get around it.
I found another post linq expression to return property value which seemed SO promising. I tried ifwdev's solution (2nd highest ranked answer): using LinqKit to build an extension that will break up the queries into manageable chunks. But I can't figure out how to implement it. Out of my depth right now maybe?
Is there another approach that I'm missing? Some simpler way to accomplish this that I've overlooked?
Sorry for the long post. But thank you for any help you can provide!
This is a perfect time to go back to raw ADO.NET.
Twisting things around just to use LINQ to SQL is probably just as time-consuming as writing the query and hydration by hand.
