Rails object structure for reporting metrics - ruby-on-rails

I've asked a couple of questions around this subject recently, and I think I'm managing to narrow down what I need to do.
I am attempting to create some "metrics" (quotes because these should not be confused with metrics relating to the performance of the application; these are metrics that are generated based on application data) in a Rails app; essentially I would like to be able to use something similar to the following in my view:
#metric(#customer,'total_profit','01-01-2011','31-12-2011').result
This would give the total profit for the given customer for 2011.
I can, of course, create a metric model with a custom result method, but I am confused about the best way to go about creating the custom metrics (e.g. total_profit, total_revenue, etc.) in such a way that they are easily extensible so that custom metrics can be added on a per-user basis.
My initial thoughts were to attempt to store the formula for each custom metric in a structure with operand, operation and operation_type models, but this quickly got very messy and verbose, and was proving very hard to do in terms of adding each metric.
My thoughts now are that perhaps I could create a custom metrics helper method that would hold each of my metrics (thus I could just hard code each one, and pass variables to each method), but how extensible would this be? This option doesn't seem very rails-esque.
Can anyone suggest a better alternative for approaching this problem?
EDIT: The answer below is a good one in that it keeps things very simple - though i'm concerned it may be fraught with danger, as it uses eval (thus there is no prospect of ever using user code). Is there another option for doing this (my previous option where operands etc. were broken down into chunks used a combination of constantize and get_instance_variable - is there a way these could be used to make the execution of a string safer)?

This question was largely answered with some discussion here: Rails - Scalable calculation model.
For anyone who comes across this, the solution is essentially to ensure an operation always has two operands, but an operand can either be an attribute, or the result of a previous calculation (i.e. it can be a metric itself), and it is thus highly scalable. This avoids the need to eval anything, and thus avoids the potential security holes that this entails.

Related

Strategy to reduce duplicate code in many modules

The Situation
So I have created some code in the form of modules that each represent a medical questionnaire (I'm calling them Catalogs). Each different questionnaire has it's own module as they may differ slightly in their content and associated calculations, but are essentially made up of simple questions that have boolean/numeric possible responses. Here is an example:
http://www.janssenmedicalinformation.ca/assets/pdf/HarveyBradshaw_English.pdf
These Catalog modules are included in an Entry class that collects responses matching the question names. Each questionnaire is transformed into a DEFINITION which is used in the Entry to do things like:
Validate inputs
Check completeness
Calculate scoring
Here are 2 examples for reference that illustrate the problem of duplication... much of the code is similar but not exactly the same.
https://gist.github.com/theworkerant/3a074d5d2a642ded1b96
The Problem
There is a lot of duplication here, but I'm not sure about the best strategy to remove it. There are a few things that make it difficult for this particular problem and make me lean towards accepting some duplication as opposed to a system that is too strict to work. The system needs to remain flexible enough to accommodate currently unknown medical questionnaires of a similar nature so I need to be careful (the reason I've gone with a Module system so far)
Here are some examples:
Each Catalog can have slightly different scoring requirements and custom grouping of questions that represent one "score"
Potentially many Catalogs are included in an Entry class and can't step on each other
Some Catalogs incorporate things like "Current Weight" for calculations, breaking the 1-5 or 1-10 paradigm and not fitting very nicely into simple sum reductions.
One Catalog requires a week of previous entries in order to be valid, a sort of weird custom validation.
The Question:
What strategies might be employed here to reduce duplication overall? I'm not looking for tweaks cut out a few lines from these specific examples. Implementation cost is a consideration.
Possibilities:
Put some of this into a database (sounds pretty good, but I think the cost of implementation could be high)
I fear there could be room for improvement in my metaprogramming here, perhaps there are better ways to accomplish this through some dynamic method creation or other voodoo.
Thanks!

Identifying where to put controller versus model logic

I'm trying to figure out the best (or rather the most practical/efficient) way of doing something in my rails application. Basically what I have is an area where a user must enter some information in several form fields which I currently have DB columns for (income for example).
With that information obtained from the user, some calculations need to be performed (say, for example: income and rrsp contribution need to be run through a simple formula to determine the approximate taxable income of the user).
My question is, would it be best practice to perform said calculation in a method at the model level, and save that processed information in a DB column of its own, or would I perform said calculations of the raw input data at a controller level, requiring processing each time?
I'm guessing it's probably generally best to store the calculated data in the database so it doesn't need to be processed each time, but I'm basically looking for best practices to follow in this case and in general. It probably also depends widely based on the applications specific requirements I'd guess.
My preference is to store raw (or lightly sanitized) data only. Then turn the formulae you need into methods on your model, or perhaps library/helper functions, depending on the structure of your project as a whole.
When you start storing processed data, you need to start worrying about the task of syncing when the source data changes. This can be messy and hard.
Since computers are fast at arithmetic, and for a web application relying on a database the arithmetic is not likely to be your performance bottleneck, i wouldn't worry about the performance overhead. If it became a bottleneck, i might start to think about a cache layer.

Questions about application service design

I come from a mostly n-tier background, and I'm trying to move more towards a DDD architecture. I'm trying to find best practices for designing the application service, and after a few searches, am still left with a few questions. Granted, I know I can't be the first person to ask these questions, so if you know where these are answered, just point me the way and I'll happily close this.
Here are my main questions:
How "open" should your signatures be? For example, is it better to be more rigid with your signatures and use simple types as parameters when possible, or is it better to use objects (messages?) that can later be modified without breaking the signature?
If you want to expose variations of a signature, for example, a UserSearch method that returns a list of users based on various (and sometimes optional) search criteria, is it better to:
A. Use overloads
B. Use optional (or nullable) parameters
C. Break each scenario into its own unique method
D. Use messages
I know that some of these answers are subjective, and also depend on what all will be calling your application service. But I'm just trying to get a general direction of things to consider and other best practices at this point.
Thanks in advance.
Good questions. Thinking about the API is obviously important.
1) How open would depend for me would depend on who the consumers are. If this application service is only being used within the context of your own solution and/or team then I think it's fine to have specific messages (or rather their interfaces) or Dtos (data transfer objects). Though if as easy then keeping to simple types is best in my book and definitely better if being consumed by others. If they don't suffice then interfaced messages that provide only just enough. Again if you are going to be distributed to different platforms then simple messages of simple types is not a bad way to go.
2) Why not have a SearchCriteria object as a paramater? It could be a SearchCriteria message of simple types if you are looking at this as a start of a messaging bus.
As you say, your question is a little open but I'd be interested to hear more as it sounds like you are asking the right questions at least.
Jerad, those are tough questions to answer generally, as you noted.
My personal preference is to use primitives in method signatures where possible. If I need to pass 3+ primitives to a method, I define custom data transfer objects.
The thinking being: if multiple values are being passed together, it's likely they represent a concept in your problem space, and thus should become an object. For example, if you are passing X and Y coordinates to a method, I'd recommend creating a Point class or struct that represents that concept.
The only time I'd end up with variations on a signature, it would be to provide convenience methods that provide default values for method parameters. To continue the above example, a Draw method might not require a Point, in which case I'd use (0,0).
So, I'd answer #1 with "not very open" and #2 with A.
I hope that helps.

Things you look for when trying to understand new software

I wonder what sort of things you look for when you start working on an existing, but new to you, system? Let's say that the system is quite big (whatever it means to you).
Some of the things that were identified are:
Where is a particular subroutine or procedure invoked?
What are the arguments, results and predicates of a particular function?
How does the flow of control reach a particular location?
Where is a particular variable set, used or queried?
Where is a particular variable declared?
Where is a particular data object accessed, i.e. created, read, updated or deleted?
What are the inputs and outputs of a particular module?
But if you look for something more specific or any of the above questions is particularly important to you please share it with us :)
I'm particularly interested in something that could be extracted in dynamic analysis/execution.
I like to use a "use case" approach:
First, I ask myself "what's this software's purpose?": I try to identify how users are going to interact with the application;
Once I have some "use case", I try to understand what are the objects that are more involved and how they interact with other objects.
Once I did this, I draw a UML-type diagram that describe what I've just learned for further reference. What happens after depends on the task I've been assigned, i.e. modify the code, document the code etc.
There is the question of what motivation do I have for learning the new system:
Bug fix/minor enhancement - In this case, I may focus solely on that portion of the system that performs a specific function that needs to be altered. This is a way to break down a huge system but also is a way to identify if the issue is something I can fix or if it is something that I have to hand to the off-the-shelf company whose software we are using,e.g. a CRM, CMS, or ERP system can be a customized off-the-shelf system so there are many pieces to it.
Project work - This would be the other case and is where I'd probably try to build myself a view from 30,000 feet or so to know what are the high-level components and which areas of the system does the project impact. An example of this is where I'd join a company and work off of an existing code base but I don't have the luxury of having the small focus like in the previous case. Part of that view is to look for any patterns in the code in terms of naming conventions, project structure, etc. as this may be useful once I start changing some code in the system. I'd probably do some tracing through the system and try to see where are the uglier parts of the code. By uglier I mean those parts that are kludge-like and may have some spaghetti code as this was rushed when first written and is now being reworked heavily.
To my mind another way to view this is the question of whether I'm going to be spending days or weeks wrapping my head around a system like in the second case or should this be a case where it hopefully takes only a few hours, optimistically that is, to get my footing to make the necessary changes.

Is it ever a good idea to use association lists instead of records?

Would any experienced Erlang programmers out there ever recommend association lists over records?
One case might be where two (or more) nodes on different machines are exchanging messages. We want to be able to upgrade the software on each machine independently. Some upgrades may involve adding a field to one (or more) of the messages being sent. It seems like using a record as the message would mean you'd always have to do the upgrade on both machines in lock step so that the extra field didn't cause the receiver to ignore the record. Whereas if you used something like an association list (which still has a "record-like" API), the not-yet-upgraded receiver would still receive the message successfully and just ignore the new field. I realize this isn't always the desired behavior, but often it is. Also, assume the messages are fairly small so the lookup time doesn't matter.
Assuming the above makes some sense, I have the following additional questions:
Is there a standard (or widely used) library for alists? Some trivial googling didn't turn up anything.
Are there other cases where you would use an association list (or something like it)?
You have basically three choices:
Use Records
Use Association Lists (proplists)
Use Combination
I use records where the likelihood of changing it is very low. That way I get the pattern matching and speed up that I want.
I use proplists where I need hashtable like functionality. I get flexibility at the expense of pattern matching and speed.
And sometimes I use both. A record with one field that is a proplist. That way I can pattern match on a portion of it and yet have flexibility where I need it.
All three choices have different trade-offs so you basically just have to evaluate your particular needs and make a choice. It may take some prototyping and playing around to figure out which trade-offs make sense and which features you absolutely must have.
For small amount of keys you can use lists aka proplists for bigger you should use dict. In both cases biggest disadvantage is that you can't use pattern match in way as used for records. There is also speed penalty but it is in most cases irrelevant.
Note that lists:keysearch/3 is pretty much "assq".

Resources