unifying model for 'static' / 'historical' /streaming data in F# - f#

An API I use exposes data with different characteristics :
'Static' reference data, that is you ask for them, get one value which supposedly does not change
'historical' values, where you can query for a date range
'subscription' values, where you register yourself for receiving updates
From a semantic point of view, though, those fields are one and the same, and only the consumption mode differs. Reference data can be viewed as a constant function yielding the same result through time. Historical data is just streaming data that happened in the past.
I am trying to find a unifying model against which to program all the semantic of my queries, and distinguish it from its consumption mode.
That mean, the same quotation could evaluated in a "real-time" way which would turn fields into their appropriate IObservable form (when available), or in 'historical' way, which takes a 'clock' as an argument and yield values when ticked, or a 'reference' way, which just yield 1 value (still decorated by the historical date at which the query is ran..)
I am not sure which programming tools in F# would be the most natural fit for that purpose, but I am thinking of quotation, which I never really used.
Would it be well suited for such a task ?

you said it yourself: just go with IObservable
your static case is just OnNext with 1 value
in your historical case you OnNext for each value in your query (at once when a observer is registered)
and the subscription case is just the ordinary IObservable pattern - you don't need something like quotations for this.

I have done something very similar (not the static case, but the streaming and historical case), and IObservable is definitely the right tool for the job. In reality, IEnumerable and IObservable are dual and most things you can write for one you can also write for the other. However, a push based model (IObservable) is more flexible, and the operators you get as part of Rx are more complete and appropriate than those as part of regular IEnumerable LINQ.
Using quotations just means you need to build the above from scratch.
You'll find the following useful:
Observable.Return(value) for a single static value
list.ToObservable() for turning a historical enumerable to an observable
Direct IObservable for streaming values into IObservable
Also note that you can use virtual schedulers for ticking the observable if this helps (most of the above accept a scheduler). I imagine this is what you want for the historical case.
http://channel9.msdn.com/Series/Rx-Workshop/Rx-Workshop-Schedulers

Related

Listing available records available to a process in Erlang

Records are compile time structures. The record_info and is_record recognise the compiled records and their structures. Is there a way to ask the VM what records have been defined that are available to the process? I am interested in getting the internal tuple representation for every record definition.
What I want to do is something like:
-record(car,{make=honda}).
get_record(Car) ->
%% Some magic here to end up having sth like
{car,{make,honda}} or even better #car{} indeed. %% when Car = 'car'
As you said records are only a compile time construct, so once compiled records are only tuples, this would suggest no available information is left during runtime, but since you mentioned those two functions I was curious and I checked how they worked.
According to this record_info/2 is a pseudo function made available only during compilation, so it doesn't need any run time information on records.
On the other hand the description of is_record(Term, RecordTag) states that this BIF (built-in function) only "returns true if Term is a tuple and its first element is RecordTag, false otherwise", so it is actually only checking the structure and first element of the tuple.
Based on this, I would guess that there is no record information made available during runtime. This thread confirms the unavailability of record_info/2 during runtime.
I have used Dynarec (https://github.com/dieswaytoofast/dynarec.git) successfully in a data mapping module for one of the apps I am currently working on. It is a parse transformer, though, not a run-time VM tool. It compiles information on each defined record, as well as information about the fields for each record. In my case, I use it to dynamically map incoming data to record data. This module may get you what you need. YMMV. Good luck.
As others have said records are purely compile time and there is no runtime information about records. Erlang just sees tuples. For example the record_info/2 pseudo functions are expanded to data at compile time, a list of atoms for fields argument and an integer for size.

Why should we use classes rather than records, or vice versa?

I've been using Delphi for quite some time now, but rather than coming from a CS background I have learnt "on the job" - mostly from my Boss, and augmented by bits and pieces picked up from the web, users guides, examples, etc.
Now my boss is old school, started programming using Pascal, and hasn't necessarily kept up-to-date with the latest changes to Delphi.
Just recently I've been wondering whether one of our core techniques is "wrong".
Most of our applications interface with MySQL. In general we will create a record with a structure to store data read from the DB, and these records will be stored in a TList. Generally we will have a unit that defines the various records that we have in an application, and the functions and procedures that seed and read the records. We don't use record procedures such as outlined here
After reviewing some examples I've started wondering whether we'd be better off using classes rather than records, but I'm having difficulty finding strong guidance either way.
The sort of thing that we are dealing with would be User information: Names, DOB, Events, Event Types. Or Timesheet information: Hours, Jobs, etc...
The big difference is that records are value types and classes are reference types. In a nutshell what this means is that:
For a value type, when you use assignment, a := b, a copy is made. There are two distinct instances, a and b.
For a reference type, when you use assignment, a := b, both variables refer to the same instance. There is only one instance.
The main consequence of this is what happens when you write a.Field := 42. For a record, the value type, the assignment a.Field changes the value of the member in a, but not in b. That's because a and b are different instances. But for a class, since a and b both refer to the same instance, then after executing a.Field := 42 you are safe to assert that b.Field = 42.
There's no hard and fast rule that says that you should always use value types, or always use reference types. Both have their place. In some situations, it will be preferable to use one, and in other situations it will be preferable to use the other. Essentially the decision always comes down to a decision on what you want the assignment operator to mean.
You have an existing code base, and presumably programmers familiar with it, that has made particular choices. Unless you have a compelling reason to switch to using reference types, making the change will almost certainly lead to defects. And defects both in the existing code (switch to reference type changes meaning of assignment operator), and in code you write in the future (you and your colleagues have developed intuition as to meaning of assignment operator in specific contexts, and that intuition will break if you switch).
What's more, you state that your types do not use methods. A type that consists only of data, and has no methods associated with it is very likely best represented by a value type. I cannot say that for sure, but my instincts tell me that the original developers made the right choice.

Rails - Simplifying calculation models & objects

I have asked a few questions about this recently and I am getting where I need to go, but have perhaps not been specific enough in my last questions to get all the way there. So, I am trying to put together a structure for calculating some metrics based on app data, which should be flexible to allow additional metrics to be added easily (and securely), and also relatively simple to use in my views.
The overall goal is that I will be able to have a custom helper that allows something like the following in my view:
calculate_metric(#metrics.where(:name => 'profit'),#customer,#start_date,#end_date)
This should be fairly self explanatory - the name can be substituted to any of the available metric names, and the calculation can be performed for any customer or group of customers, for any given time period.
Where the complexity arises is in how to store the formula for calculating the metric - I have shown below the current structure that I have put together for doing this:
You will note that the key models are metric, operation, operation_type and operand. This kind of structure works ok when the formula is very simple, like profit - one would only have two operands, #customer.sales.selling_price.sum and #customer.sales.cost_price.sum, with one operation of type subtraction. Since we don't need to store any intermediate values, register_target will be 1, as will return_register.
I don't think I need to write out a full example to show where it becomes more complicated, but suffice to say if I wanted to calculate the percentage of customers with email addresses for customers who opened accounts between two dates (but did not necessarily buy), this would become much more complex since the helper function would need to know how to handle the date variations.
As such, it seems like this structure is overly complicated, and would be hard to use for anything other than a simple formula - can anyone suggest a better way of approaching this problem?
EDIT: On the basis of the answer from Railsdog, I have made some slight changes to my model, and re-uploaded the diagram for clarity. Essentially, I have ensured that the reporting_category model can be used to hide intermediate operands from users, and that operands that may be used in user calculations can be presented in a categorised format. All I need now is for someone to assist me in modifying my structure to allow an operation to use either an actual operand or the result of a previous operation in a rails-esqe way.
Thanks for all of your help so far!
Oy vey. It's been years (like 15) since I did something similar to what it seems like you are attempting. My app was used to model particulate deposition rates for industrial incinerators.
In the end, all the computations boiled down to two operands and an operator (order of operations, parentheticals, etc). Operands were either constants, db values, or the result of another computation (a pointer to another computation). Any Operand (through model methods) could evaluate itself, whether that value was intrinsic, or required a child computation to evaluate itself first.
The interface wasn't particularly elegant (that's the real challenge I think), but the users were scientists, and they understood the computation decomposition.
Thinking about your issue, I'd have any individual Metric able to return it's value, and create the necessary methods to arrive at that answer. After all, a single metric just needs to know how to combine it's two operands using the indicated operator. If an operand is itself a metric, you just ask it what it's value is.

Time series modeling in f#-- seq vs array vs vector vs list vs generic list

If I want to make a time series type in F# to hold stock prices, which basic type should I use? We need
Select a subset based on time index,
Calculate basic statistics for a subset like mean, STD or for several subsets like correlations,
Append item for new data and fast update statistics or technical indicators,
Do linear regression between time series, etc
I have read that array has a better performance, seq has a smaller memory footnote, list is better for adding items and F# vector is easier for certain math calculation. To balance all the trade offs, how would you model a stock price time series in f#? Thanks.
As a concrete representation you can choose either array or list or some other .NET colllection type. A sequence seq<'T> is an abstract type and both array and list are automatically also sequences - this means that when you write some code that works with sequences, it will work with any concrete data type (array, list or any other .NET collection).
So, when writing data processing, you can use Seq by default (as it gives you great flexibility - it doesn't matter what concrete representation you use) and then optimize some operations to use the concrete representation (whatever that will be) if you need something to run faster.
Regarding the concrete representation - I think the crucial question is whether you want to add elements without changing original data structure (immutable list or array used in an immutable way) or whether you want to mutate the data structure (e.g. use some mutable .NET collection).
If you need to add new items freuqently then you can either use immutable list (which supports appending elements to front) or a mutable collection (array won't do as it cannot be resized).
If you're working on a more sophisticated system, I would recommend taking a look at ObservableCollection<T> (see MSDN). This is a collection that automatically notifies you when it is changed. In response to the notification, you could update your statistics (it also tells you which elements were added, so you don't need to recalculate everything). However, F# doesn't have any libraries for working with this type, so you'll need to write a lot of things yourself.
If you're adding data only rarely or adding them in larger groups, you could use array (and allocate new array each time you add items). If you have only relatively small number of items in the collection, you could use lists (where adding item is easy).
For numerical calculations, the F# PowerPack (and types like vector) offer only quite limitied set of features, so you may need to look at some thrid party libraries. Extreme optimizations is a commercial library with some F# examples and Math.NET is an open source alternative.
Otherwise, it is difficult to give any concrete advice - can you add some more details about your system? (e.g. how large the data set is, how many items need to be added how often etc...)

Are there any downsides to passing in an Erlang record as a function argument?

Are there any downsides to passing in an Erlang record as a function argument?
There is no downside, unless the caller function and the called function were compiled with different 'versions' of the record.
Some functions from erlangs standard library do indeed use records in their interfaces (I can't recall which ones, right now--but there are a few), but in my humble opinion, the major turnoff is, that the user will have to include your header file, just to use your function.
That seems un-erlangy to me (you don't ever do that normally, unless you're using said functions from the stdlib), creates weird inter-dependencies, and is harder to use from the shell (I wouldn't know from the top of my head how to load & use records from the shell -- I usually just "cheat" by constructing the tuple manually...)
Also, handling records is a bit different from the stuff you usually do, since their keys per default take the atom 'undefined' as value, au contraire to how you usually do it with proplists, for instance (a value that wasn't set just isn't there) -- this might cause some confusion for people who do not normally work a lot with records.
So, all-in-all, I'd usually prefer a proplist or something similar, unless I have a very good reason to use a record. I do usually use records, though, for internal state of for example a gen_server or a gen_fsm; It's somewhat easier to update that way.
I think the biggest downside is that it's not idiomatic. Have you ever seen an API that required you to construct a record and pass it in?
Why would you want to do something that's going to feel foreign to any erlang programmer? There's a convention already in use for optional named arguments to functions. Inventing yet another way without good cause is pointless.

Resources