Markov library/samples in F#

I am working on a personal project with F# and would like to experiment with F# and Markov models. Can anyone recommend a library/sample with source that supports Markov modeling? Since this is a personal project I would prefer something that is free...

I'm not exactly sure about Markov models, but Infer.NET is a great library for doing statistical inference.

Regarding math and F# in general - there was a native F# mathematics library, FSharp.MathTools (written in F#), which has been merged with other projects and eventually became Math.NET (which is in C#, but claims to provide a facade for F# developers).
However, I'm not sure if the library has any direct support for Markov modeling (or how difficult it would be to implement that on top of what the library provides).
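For a simple discrete-state Markov chain, you may not need a library at all: the model is just a transition matrix plus sampling. A minimal sketch of the idea in Python (the states and probabilities here are made up for illustration; porting it to F# is straightforward):

    import random

    # Hypothetical two-state weather model: each row gives the
    # probabilities of moving to the next state.
    TRANSITIONS = {
        "sunny": {"sunny": 0.8, "rainy": 0.2},
        "rainy": {"sunny": 0.4, "rainy": 0.6},
    }

    def step(state):
        """Sample the next state from the current state's transition row."""
        row = TRANSITIONS[state]
        return random.choices(list(row), weights=list(row.values()))[0]

    def walk(state, n):
        """Generate a trajectory of n states starting from `state`."""
        path = [state]
        for _ in range(n - 1):
            state = step(state)
            path.append(state)
        return path

    print(walk("sunny", 10))

Training such a model is equally small: count the observed transitions in your data and normalize each row.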

Related

Programming language and training environment for machine learning

I need advice on which libraries and game engines I should use for an ML project.
My goal is to create a machine learning model for pruning trees. I believe I have to create a game with a generic tree model and some randomness, then create a reinforcement learning model and train it inside the game. The model must be able to first find the branch that must be cut and then find a path to move a robotic arm near that branch to cut it. I have experience in C++ and Java, but I prefer C++. Could you give me advice on which library I should use for ML, and which language and game engine I should use for creating the game? I have a little experience in OpenGL. If it doesn't make any difference, my preferred language is C++, but I know that I should use the right tool for the right job, and Python is the leader in ML, so if it will save time and energy I have nothing against learning Python.
My recommendation is to learn and use Python for your ML project. Though there is some work in R, for your future in ML, your best bet is to learn and use Python. The community is great, and there are many frameworks that can work out-of-the-box.
After a quick search, I did find a framework called robotframework, which is highly starred on GitHub: https://github.com/robotframework/robotframework. I will say, however, that I am not personally familiar with using this framework, but it may be helpful to you.
In terms of tree-based algorithms, you might want to start exploring with XGBoost. It can be found here: https://github.com/dmlc/xgboost.
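If you do try XGBoost, its Python package follows a small train/predict pattern; here is a minimal sketch based on its documented API (the synthetic data is made up purely for illustration):

    import numpy as np
    import xgboost as xgb

    # Synthetic binary-classification data, just to exercise the API.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    # DMatrix is XGBoost's internal data container.
    dtrain = xgb.DMatrix(X, label=y)
    params = {"objective": "binary:logistic", "max_depth": 3, "eta": 0.3}
    bst = xgb.train(params, dtrain, num_boost_round=20)

    preds = bst.predict(dtrain)  # predicted probabilities in [0, 1]
    print((preds > 0.5).astype(int)[:10])

Note, though, that XGBoost is a supervised gradient-boosting library; for the reinforcement-learning part of your project you would still need a separate RL toolkit.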

Suggestions for machine learning toolset without Matlab

I am new to the field of machine learning. I am planning to use Python as the programming language for implementing algorithms, and Java for the system architecture.
As far as I understand, machine learning is more about modeling data specific to the domain, visualizing the data, and choosing appropriate models and parameters. Implementing the models/algorithms is the last and relatively easy step.
Matlab seems to have everything for machine learning, but it is too expensive and requires learning a new language.
What tools, other than a programming language, do I need in general for machine learning in enterprise projects? Things like data modeling, visualization, etc.
After a couple of years of trial and error, I would suggest you go directly with Python, possibly with scikit-learn or TensorFlow (if you want to go hardcore :).
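To give a sense of how little code a first model takes, here is a minimal scikit-learn sketch using its standard fit/predict pattern (the dataset and classifier are arbitrary choices, purely for illustration):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Load a toy dataset and hold out a test split for an honest score.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Most scikit-learn estimators follow this same fit/predict pattern.
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    print(accuracy_score(y_test, model.predict(X_test)))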
I also tried R in the past, and while it is a very valid language it has some limitations: it is single-threaded by default, and although there are solutions for that, none of them are as clean as Python's.
Also, Python seems to be THE language for machine learning. It is easy to learn and fast (depending on the interpreter implementation, of course). There is huge support for it, with lots of tutorials and documentation and, more importantly, libraries that are actively developed and supported.
Finally, I recommend you consider Spyder as a good IDE for data science. I also tried Rodeo, but it does not seem as mature and stable as Spyder.
Hope this helps.

Use of third-party library

I'm interested in using Alea GPU with a third-party library and am trying to get a sense of my options. Specifically, I'm interested in using this L-BFGS library. I'm fairly new to the F# ecosystem but do have experience with both CUDA and functional programming.
I've been using that L-BFGS library as part of a program which implements logistic regression. It would be neat if I could assume the library is correct and write the rest of my code (including the parts that run on the GPU) in type-safe F#.
It seems possible to link C++ with F#. Assuming I figure out how to integrate the L-BFGS library into an F# program, would the introduction of Alea GPU cause any issues?
What I am trying to avoid is re-writing L-BFGS in F# using Alea. However, maybe that's actually the easiest path to using F#. If Alea has any facilities for nonlinear optimization, I could probably use those instead.
Alea GPU does not have a nonlinear optimizer yet. The CUDA version has a slightly different implementation than the standard CPU L-BFGS, which sometimes causes accuracy issues. Apart from this, I did not face any issues with the code, except that the performance win also depends significantly on the objective function. The objective function for logistic regression is numerically relatively cheap.
We have an internal C# version of this code ported to Alea GPU, which could also be used from F#, and we plan to release it in a future version.

What is the best programming language to implement neural networks?

I'm not looking for a Neural Networks library, since I'm creating new kinds of networks. For that I need a good "dataflow" language.
Of course you can do this in C, C++, Java and co., but dealing with the multithreading from scratch would be a nightmare.
At the other extreme, languages like Oz or Erlang seem better adapted, but they don't have many libraries, and they are harder to master (it's easy to play with them, but is it OK to create complete software?).
What would you suggest?
I watched an interesting conference presentation about using Erlang for Neural Networks. You might want to check it out:
From Telecom Networks to Neural Networks; Erlang, as the unintentional Neural Network Programming Language
I also know that the presented system is going to be open-sourced any day now, according to the author's tweet.
Erlang is very well suited for NN.
Neurons can be modeled by processes (no problem with having millions of them)
Connections/synapses can be represented by the PIDs of target neurons. It is very easy to initialize such a network as part of the standard init procedure in OTP. Communication would be realized by message passing.
Maybe it would be good to have a global address space in ETS/Mnesia (built-in datastores) in order to do dynamic reconfiguration of the network structure.
Pattern matching in the receive block can determine what kind of signal a neuron receives and modify it on the fly.
It would be very easy to monitor such a network.
Also consider that an Erlang NN would be 'live' all the time. You would be able to query neurons, layers, routers, etc. at any time.
In C/C++ you just read the current state of your arrays/data structures.
Regarding performance, we all know that C/C++ is orders of magnitude faster than Erlang; however, the NN topic is tricky. If the network held very few neurons in a very wide address space, in a regular array, iterating over it again and again could be costly (in C). The equivalent situation in Erlang would be solved by a single query to the root (input-layer) neurons, which would propagate the query directly to well-addressed neighbors.
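To make the actor-style idea concrete without writing Erlang, here is a rough Python sketch of message-passing neurons, using threads and queues in place of Erlang processes and mailboxes (everything here is an illustrative assumption, not a recommended implementation):

    import queue
    import threading

    class Neuron(threading.Thread):
        """A toy 'actor' neuron: it receives (weight, value) messages on
        its mailbox and fires to its targets once all inputs have arrived."""

        def __init__(self, n_inputs, targets=()):
            super().__init__(daemon=True)
            self.mailbox = queue.Queue()  # plays the role of an Erlang mailbox
            self.n_inputs = n_inputs
            self.targets = targets        # object references stand in for PIDs

        def run(self):
            total = 0.0
            for _ in range(self.n_inputs):
                weight, value = self.mailbox.get()  # cf. Erlang's receive block
                total += weight * value
            activation = max(0.0, total)            # ReLU, for illustration
            if self.targets:
                for target in self.targets:
                    target.mailbox.put((1.0, activation))
            else:
                print("output activation:", activation)

    # Two input neurons feeding one output neuron.
    out = Neuron(n_inputs=2)
    a = Neuron(n_inputs=1, targets=(out,))
    b = Neuron(n_inputs=1, targets=(out,))
    for n in (out, a, b):
        n.start()
    a.mailbox.put((0.5, 1.0))
    b.mailbox.put((-0.25, 2.0))
    out.join()

In Erlang each neuron would be a lightweight process and the scheduler would happily run millions of them; Python threads are far heavier, so this only illustrates the message-passing structure, not the scalability argument.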
DXNN1 and DXNN2, which were built and introduced in the textbook Handbook of Neuroevolution Through Erlang (http://www.amazon.com/Handbook-Neuroevolution-Through-Erlang-Gene/dp/1461444624/ref=zg_bs_760204_22), are open source and available at: https://github.com/CorticalComputer
If you are interested in dataflow programming and multi-threading, then I would suggest National Instruments LabVIEW. In this case you don't need to bother about multi-threading, since it's already there, and you can also use OOP, since OOP is now native to LabVIEW. LabVIEW OOP is also purely based on the dataflow programming paradigm.
If you have any Java experience, then use Scala which is a JVM language that is based on the same concept of "actors" as Erlang. But it is less strict than Erlang and can easily use any existing Java libraries.
Then, when you find a computationally expensive task that would work better in Erlang, you can use Erlang's jinterface library to communicate between your Scala code and your distributed Erlang nodes.
Using Java does not mean dealing with multithreading from scratch - just use one of the numerous Java actor libraries.
It's not a language in and of itself, but Emergent is very powerful and can be highly customized (it has a full scripting language).
It's open source, too, which could be helpful as a guide if you need to make your own version for your novel architectures.
Why reinvent the wheel? Try PyBrain. It's free and very comprehensive:
Quickstart
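Based on PyBrain's documented quickstart, a small supervised example looks roughly like this (XOR is used purely as a toy dataset):

    from pybrain.tools.shortcuts import buildNetwork
    from pybrain.datasets import SupervisedDataSet
    from pybrain.supervised.trainers import BackpropTrainer

    # A 2-3-1 feed-forward network, as in PyBrain's quickstart.
    net = buildNetwork(2, 3, 1)

    # XOR as a toy dataset: 2 input fields, 1 target field.
    ds = SupervisedDataSet(2, 1)
    for inp, target in [((0, 0), (0,)), ((0, 1), (1,)),
                        ((1, 0), (1,)), ((1, 1), (0,))]:
        ds.addSample(inp, target)

    trainer = BackpropTrainer(net, ds)
    for _ in range(100):
        trainer.train()          # one epoch per call; returns the epoch error

    print(net.activate((0, 1)))  # should move toward 1 after training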
Another big plus for Erlang is full integration with Drakon
http://drakon-editor.sourceforge.net/drakon-erlang/intro.html
It all depends on your application. C++ and Python are both good programming languages for machine learning.

Intelligent code-completion? Is there AI to write code by learning?

I am asking this question because I know there are a lot of well-read CS types on here who can give a clear answer.
I am wondering if an AI exists (or is being researched/developed) that writes programs by generating and compiling code all on its own and then progresses by learning from former iterations. I am talking about working to make us, programmers, obsolete. I'm imagining something that learns what works and what doesn't in a programming language by trial and error.
I know this sounds pie-in-the-sky so I'm asking to find out what's been done, if anything.
Of course even a human programmer needs inputs and specifications, so such an experiment would have to have carefully defined parameters. For example, if the AI were going to explore different timing functions, that aspect would have to be clearly defined.
But with a sophisticated learning AI I'd be curious to see what it might generate.
I know there are a lot of human qualities computers can't replicate, like our judgement, tastes and prejudices. But my imagination likes the idea of a program that spits out a website after a day of thinking and lets me see what it came up with. Even then I would often expect it to be garbage, but maybe once a day I could give it feedback and help it learn.
Another avenue of this thought: it would be nice to give a high-level description like "menued website" or "image tools" and have it generate code with enough depth to be useful as a code-completion module for me to then code in the details. But I suppose that could be envisioned as a non-intelligent, static, hierarchical code-completion scheme.
How about it?
Such tools exist. They are the subject of a discipline called Genetic Programming. How you evaluate their success depends on the scope of their application.
They have been extremely successful (orders of magnitude more efficient than humans) at designing optimal programs for the management of industrial processes, automated medical diagnosis, and integrated circuit design. Those processes are well constrained, with an explicit and immutable success measure and a great amount of "universe knowledge", that is, a large set of rules on what is a valid, working program and what is not.
They have been totally useless in trying to build mainstream programs that require user interaction, because the main thing a system that learns needs is an explicit "fitness function", i.e. an evaluation of the quality of the current solution it has come up with.
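To make the role of the fitness function concrete, here is a toy genetic-programming sketch (my own illustration, not any library's API): it evolves arithmetic expression trees toward a hidden target function, with fitness defined as the error on sample points; crossover is omitted for brevity:

    import random

    OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

    def random_tree(depth=3):
        # A tree is "x", an integer constant, or (op, left, right).
        if depth == 0 or random.random() < 0.3:
            return "x" if random.random() < 0.5 else random.randint(-2, 2)
        return (random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1))

    def evaluate(tree, x):
        if tree == "x":
            return x
        if isinstance(tree, int):
            return tree
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))

    def fitness(tree):
        # The fitness function: total error against the target x**2 + x + 1.
        return sum(abs(evaluate(tree, x) - (x * x + x + 1)) for x in range(-5, 6))

    def mutate(tree):
        # Random subtree replacement, one of GP's stochastic heuristics.
        if random.random() < 0.1:
            return random_tree(depth=2)
        if isinstance(tree, tuple):
            return (tree[0], mutate(tree[1]), mutate(tree[2]))
        return tree

    population = [random_tree() for _ in range(200)]
    for _ in range(50):
        population.sort(key=fitness)               # lower error is fitter
        survivors = population[:50]                # the fittest survive...
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(150)]
    print(population[0], fitness(population[0]))

Everything interesting lives in fitness(): with a well-constrained, automatically checkable measure like this, the loop works; replace it with "the user likes the website", and there is nothing to optimize against.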
Another domain that deals with "program learning" is Inductive Logic Programming, although it is used more for automated theorem proving or language/taxonomy learning.
Disclaimer: I am not a native English speaker nor an expert in the field; I am an amateur - expect imprecisions and/or errors in what follows. So, in the spirit of Stack Overflow, don't be afraid to correct and improve my prose and/or my content. Note also that this is not a complete survey of automatic programming techniques (code generation (CG) from Model-Driven Architectures (MDAs) merits at least a passing mention).
I want to add more to what Varkhan answered (which is essentially correct).
The Genetic Programming (GP) approach to Automatic Programming conflates, with its fitness functions, two different problems ("self-compilation" is conceptually a no-brainer):
self-improvement/adaptation - of the synthesized program and, if so desired, of the synthesizer itself; and
program synthesis.
W.r.t. self-improvement/adaptation, refer to Jürgen Schmidhuber's Goedel machines: self-referential universal problem solvers making provably optimal self-improvements. (As a side note, his work on artificial curiosity is also interesting.) Also relevant for this discussion are Autonomic Systems.
W.r.t. program synthesis, I think it is possible to identify three main branches: stochastic (probabilistic, like the above-mentioned GP), inductive, and deductive.
GP is essentially stochastic because it explores the space of likely programs with heuristics such as crossover, random mutation, gene duplication, gene deletion, etc. (it then tests programs with the fitness function and lets the fittest survive and reproduce).
Inductive program synthesis is usually known as Inductive Programming (IP), of which Inductive Logic Programming (ILP) is a sub-field. That is, in general the technique is not limited to logic program synthesis or to synthesizers written in a logic programming language (nor is either limited to "automated theorem proving or language/taxonomy learning").
IP is often deterministic (but there are exceptions): it starts from an incomplete specification (such as example input/output pairs) and uses it either to constrain the search space of likely programs satisfying that specification and then test them (the generate-and-test approach), or to directly synthesize a program by detecting recurrences in the given examples, which are then generalized (the data-driven or analytical approach). The process as a whole is essentially statistical induction/inference - i.e. deciding what to include in the incomplete specification is akin to random sampling.
Generate-and-test and data-driven/analytical§ approaches can be quite fast, so both are promising (even if only small synthesized programs have been demonstrated in public so far), but generate-and-test (like GP) is embarrassingly parallel, so notable improvements (scaling to realistic program sizes) can be expected. But note that Incremental Inductive Programming (IIP)§, which is inherently sequential, has been demonstrated to be orders of magnitude more effective than non-incremental approaches.
§ These links are directly to PDF files: sorry, I am unable to find an abstract.
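To make the generate-and-test flavor concrete, here is a toy synthesizer (entirely illustrative): it enumerates short pipelines of primitive functions and returns the first one consistent with the example input/output pairs, which play the role of the incomplete specification:

    from itertools import product

    # A tiny library of primitives the synthesizer may compose.
    PRIMITIVES = {
        "inc": lambda x: x + 1,
        "dec": lambda x: x - 1,
        "double": lambda x: x * 2,
        "square": lambda x: x * x,
    }

    def synthesize(examples, max_length=3):
        for length in range(1, max_length + 1):      # shorter programs first
            for names in product(PRIMITIVES, repeat=length):
                def program(x, names=names):
                    for name in names:
                        x = PRIMITIVES[name](x)
                    return x
                if all(program(i) == o for i, o in examples):
                    return names                     # first program that fits
        return None

    # Specification by example: the hidden target is f(x) = (x + 1) * 2.
    print(synthesize([(0, 2), (1, 4), (3, 8)]))      # e.g. ('inc', 'double')

The inner loop is trivially parallelizable across candidate programs, which is the "embarrassingly parallel" property mentioned above; the hard part in real systems is pruning the combinatorial search space.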
Programming by Demonstration (PbD) and Programming by Example (PbE) are end-user development techniques known to leverage inductive program synthesis practically.
Deductive program synthesis starts instead with a (presumed) complete (formal) specification (logic conditions). One of the techniques leverages automated theorem provers: to synthesize a program, it constructs a proof of the existence of an object meeting the specification; then, via the Curry-Howard-de Bruijn isomorphism (the proofs-as-programs and formulae-as-types correspondences), it extracts a program from that proof. Other variants include the use of constraint solving and the deductive composition of subroutine libraries.
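To make the proofs-as-programs correspondence concrete, here is a tiny Lean example (just an illustration of the isomorphism, not one of the synthesis systems referred to above): the proof term for A ∧ B → B ∧ A is literally the same program that swaps the components of a pair:

    -- A proof of A ∧ B → B ∧ A ...
    theorem and_swap (A B : Prop) : A ∧ B → B ∧ A :=
      fun h => ⟨h.right, h.left⟩

    -- ... is the same term as a program swapping a pair.
    def swap {α β : Type} : α × β → β × α :=
      fun p => (p.2, p.1)

A deductive synthesizer runs this correspondence in the productive direction: prove that an output meeting the specification exists, then read the program off the proof.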
In my opinion, inductive and deductive synthesis are in practice attacking the same problem from two somewhat different angles, because what constitutes a complete specification is debatable (besides, a complete specification today can become incomplete tomorrow - the world is not static).
When (if) these techniques (self-improvement/adaptation and program synthesis) mature, they promise to raise the amount of automation provided by declarative programming (whether such a setting should be considered "programming" is sometimes debated): we would concentrate more on Domain Engineering and Requirements Analysis and Engineering than on manual software design and development, manual debugging, manual system performance tuning and so on (possibly with less accidental complexity compared to that introduced by current manual, non-self-improving/adapting techniques). This would also promote a level of agility yet to be demonstrated by current techniques.

Resources