Rails - Simplifying calculation models & objects - ruby-on-rails

I have asked a few questions about this recently and I am getting where I need to go, but have perhaps not been specific enough in my last questions to get all the way there. So, I am trying to put together a structure for calculating some metrics based on app data, which should be flexible to allow additional metrics to be added easily (and securely), and also relatively simple to use in my views.
The overall goal is that I will be able to have a custom helper that allows something like the following in my view:
calculate_metric(#metrics.where(:name => 'profit'),#customer,#start_date,#end_date)
This should be fairly self explanatory - the name can be substituted to any of the available metric names, and the calculation can be performed for any customer or group of customers, for any given time period.
Where the complexity arises is in how to store the formula for calculating the metric - I have shown below the current structure that I have put together for doing this:
You will note that the key models are metric, operation, operation_type and operand. This kind of structure works ok when the formula is very simple, like profit - one would only have two operands, #customer.sales.selling_price.sum and #customer.sales.cost_price.sum, with one operation of type subtraction. Since we don't need to store any intermediate values, register_target will be 1, as will return_register.
I don't think I need to write out a full example to show where it becomes more complicated, but suffice to say if I wanted to calculate the percentage of customers with email addresses for customers who opened accounts between two dates (but did not necessarily buy), this would become much more complex since the helper function would need to know how to handle the date variations.
As such, it seems like this structure is overly complicated, and would be hard to use for anything other than a simple formula - can anyone suggest a better way of approaching this problem?
EDIT: On the basis of the answer from Railsdog, I have made some slight changes to my model, and re-uploaded the diagram for clarity. Essentially, I have ensured that the reporting_category model can be used to hide intermediate operands from users, and that operands that may be used in user calculations can be presented in a categorised format. All I need now is for someone to assist me in modifying my structure to allow an operation to use either an actual operand or the result of a previous operation in a rails-esqe way.
Thanks for all of your help so far!

Oy vey. It's been years (like 15) since I did something similar to what it seems like you are attempting. My app was used to model particulate deposition rates for industrial incinerators.
In the end, all the computations boiled down to two operands and an operator (order of operations, parentheticals, etc). Operands were either constants, db values, or the result of another computation (a pointer to another computation). Any Operand (through model methods) could evaluate itself, whether that value was intrinsic, or required a child computation to evaluate itself first.
The interface wasn't particularly elegant (that's the real challenge I think), but the users were scientists, and they understood the computation decomposition.
Thinking about your issue, I'd have any individual Metric able to return it's value, and create the necessary methods to arrive at that answer. After all, a single metric just needs to know how to combine it's two operands using the indicated operator. If an operand is itself a metric, you just ask it what it's value is.

Related

NEAT: how does crossover occur for species with only one member

So, I'm trying to implement the NEAT(Neuroevolution of augmenting topologies) algorithm and have stumbled into a problem. How are networks in species with only one member crossed over?
One solution I came up with is to perform inter-species crossover. But I don't know if it would be effective.
In NEAT, there are four ways in which you can create candidate individuals for the next generation:
Pass an exact copy of an individual
Pass a mutated copy of an individual
Do crossover using two individuals from a given species
Do crossover with two individuals of different species (iter-species)
Of course, you can always do (1). This is often applied to "elites", which may be the best of all, or the best of each species.
You can also always do (2), again to a subset of all individuals or to a subset (random or sorted) within each species.
As you correctly anticipate, (4) is also always a possibility, as long as you do have at least two species (it seems things would be a bit broken otherwise).
Regarding (3) in case you have a species with only one individual? You can't really do it, right?
There are two things that can help in this situation. First, use a mix of 1 to 4 options. The frequency for each option is normally determined using hyperparameters (as well as the frequency for each type of mutation and so on).
But here I would actually reconsider your speciation algorithm. Speciation means separating your population into groups, where hopefully more similar individuals are grouped together. There are different ways in which you can do this, and you can re-examine your species with different frequencies as well (you can reset your species every generation!). It does not seem very efficient if your clustering algorithm (because speciation is a type of clustering) is returning species with one or even zero individuals. So this is where I would actually work!
As a final note, I remember a full NEAT implementation is no basic project. I would recommend not trying to implement this on your own. I think it is a better use of your time to work with a well-established implementation, so you can focus on understanding how things work and how to adapt them for your needs, and not so much on bugs and other implementation details.

How to handle homophones in speech recognition?

For those who are not familiar with what a homophone is, I provide the following examples:
our & are
hi & high
to & too & two
While using the Speech API included with iOS, I am encountering situations where a user may say one of these words, but it will not always return the word I want.
I looked into the [alternativeSubstrings] (link) property wondering if this would help, but in my testing of the above words, it always comes back empty.
I also looked into the Natural Language API, but could not find anything in there that looked useful.
I understand that as a user adds more words, the Speech API can begin to infer context and correct for these, but my use case will not work well with this since it will often only want one or two words at most, limiting the effectiveness of context.
An example of contextual processing:
Using the words above on their own, I get these results:
are
hi
to
However, if I put together the following sentence, you can see they are all wrong:
I am too high for our ladder
Ideally, I would either get a list back containing [are, our], [to, too, two], [hi, high] for each transcription segment, or would have a way to compare a string against a function that supports homophones.
An example of this would be:
if myDetectedWord == "to" then { ... }
Where myDetectedWord can be [to, too, two], and this function would return true for each of these.
This is a common NLP dilemma, and I'm not so sure what might be your desired output in this application. However, you may want to bypass this problem in your design/architecture process, if possible and if you could. Otherwise, this problem is to turn into a challenge.
Being said that, if you wish to really get into it, I like this idea of yours:
string against a function
This might be more efficient and performance friendly.
One way, I'd be liking to solve this problem would be though RegEx processing, instead of using endless loops and arrays. You could maybe prototype loops and arrays to begin with and see how it works, then you might want to use regular expression for gaining performance.
You could for instance define fixed arrays in regular expressions and quickly check against your string (word by word, maybe using back-referencing) and you can add many boundaries in your expressions for string processing, as you wish.
Your fixed arrays also can be designed based on probabilities of occurring certain words in certain part of a string. For instance,
^I
vs
^eye
The probability of I being the first word is much higher than that of eye.
The probability of I in any part of a string is higher than that of eye, also.
You might want to weight words based on that.
I'd say the key would be that you'd narrow down your desired outputs as focused as possible and increase accuracy, [maybe even with 100 words if possible], if you wish to have a good/working application.
Good project though, I hope you like/enjoy the challenge.

What is pb.conflict in Z3?

I am trying to find an optimal solution using the Z3 API for python. I have used set_option("verbose", 1) to print statements that Z3 generates while checking for sat. One of the statements it prints is pb.conflict statements. The statements look something like this -
pb.conflict statements.
I want to know what exactly is pb.conflict. What do these statements signify? Also, what are the two numbers that get printed along with it?
pb stands for Pseudo-boolean. A pseudo-boolean function is a function from booleans to some other domain, usually Real. A conflict happens when the choice of a variable leads to an unsatisfiable clause set, at which point the solver has to backtrack. Keeping the backtracking to a minimum is essential for efficiency, and many of the SAT engines carefully track that number. While the details are entirely solver specific (i.e., those two numbers you're asking about), in general the higher the numbers, the more conflict cases the solver met, and hence might decide to reset the state completely or take some other action. Often, there are parameters that users can set to specify when such actions are taken and exactly what those are. But again, this is entirely solver and implementation specific.
A google search on pseudo-boolean optimization will result in a bunch of scholarly articles that you might want to peruse.
If you really want to find Z3's treatment of pseudo-booleans, then your best bet is probably to look at the implementation itself: https://github.com/Z3Prover/z3/blob/master/src/smt/theory_pb.cpp

There seem to be a lot of ruby methods that are very similar, how do I pick which one to use?

I'm relatively new to Ruby, so this is a pretty general question. I have found through the Ruby Docs page a lot of methods that seem to do the exact same thing or very similar. For example chars vs split(' ') and each vs map vs collect. Sometimes there are small differences and other times I see no difference at all.
My question here is how do I know which is best practice, or is it just personal preference? I'm sure this varies from instance to instance, so if I can learn some of the more important ones to be cognizant of I would really appreciate that because I would like to develop good habits early.
I am a bit confused by your specific examples:
map and collect are aliases. They don't "do the exact same thing", they are the exact same thing. They are just two names for the same method. You can use whatever name you wish, or what reads best in context, or what your team has decided as a Coding Standard. The Community seems to have settled on map.
each and map/collect are completely different, there is no similarity there, apart from the general fact that they both operate on collections. map transform a collection by mapping every element to a new element using a transformation operation. It returns a new collection (an Array, actually) with the transformed elements. each performs a side-effect for every element of the collection. Since it is only used for its side-effect, the return value is irrelevant (it might just as well return nil like Kernel#puts does, in languages like C, C++, Java, C♯, it would return void), but it is specified to always return its receiver.
split splits a String into an Array of Strings based on a delimiter that can be either a Regexp (in which case you can also influence whether or not the delimiter itself gets captured in the output or ignored) or a String, or nil (in which case the global default separator gets used). chars returns an Array with the individual characters (represented as Strings of length 1, since Ruby doesn't have an specific Character type). chars belongs together in a family with bytes and codepoints which do the same thing for bytes and codepoints, respectively. split can only be used as a replacement for one of the methods in this family (chars) and split is much more general than that.
So, in the examples you gave, there really isn't much similarity at all, and I cannot imagine any situation where it would be unclear which one to choose.
In general, you have a problem and you look for the method (or combination of methods) that solve it. You don't look at a bunch of methods and look for the problem they solve.
There'll typically be only one method that fits a specific problem. Larger problems can be broken down into different subproblems in different ways, so it is indeed possible that you may end up with different combinations of methods to solve the same larger problem, but for each individual subproblem, there will generally be only one applicable method.
When documentation states that 2 methods do the same, it's just matter of preference. To learn the details, you should always start with Ruby API documentation

unifying model for 'static' / 'historical' /streaming data in F#

An API I use exposes data with different characteristics :
'Static' reference data, that is you ask for them, get one value which supposedly does not change
'historical' values, where you can query for a date range
'subscription' values, where you register yourself for receiving updates
From a semantic point of view, though, those fields are one and the same, and only the consumption mode differs. Reference data can be viewed as a constant function yielding the same result through time. Historical data is just streaming data that happened in the past.
I am trying to find a unifying model against which to program all the semantic of my queries, and distinguish it from its consumption mode.
That mean, the same quotation could evaluated in a "real-time" way which would turn fields into their appropriate IObservable form (when available), or in 'historical' way, which takes a 'clock' as an argument and yield values when ticked, or a 'reference' way, which just yield 1 value (still decorated by the historical date at which the query is ran..)
I am not sure which programming tools in F# would be the most natural fit for that purpose, but I am thinking of quotation, which I never really used.
Would it be well suited for such a task ?
you said it yourself: just go with IObservable
your static case is just OnNext with 1 value
in your historical case you OnNext for each value in your query (at once when a observer is registered)
and the subscription case is just the ordinary IObservable pattern - you don't need something like quotations for this.
I have done something very similar (not the static case, but the streaming and historical case), and IObservable is definitely the right tool for the job. In reality, IEnumerable and IObservable are dual and most things you can write for one you can also write for the other. However, a push based model (IObservable) is more flexible, and the operators you get as part of Rx are more complete and appropriate than those as part of regular IEnumerable LINQ.
Using quotations just means you need to build the above from scratch.
You'll find the following useful:
Observable.Return(value) for a single static value
list.ToObservable() for turning a historical enumerable to an observable
Direct IObservable for streaming values into IObservable
Also note that you can use virtual schedulers for ticking the observable if this helps (most of the above accept a scheduler). I imagine this is what you want for the historical case.
http://channel9.msdn.com/Series/Rx-Workshop/Rx-Workshop-Schedulers

Resources