Objective-C Quadratic/Polynomial regression (Linest function in excel) - ios

The Objective-C math library seems pretty basic.
I'm looking for some statistical analysis functions, like the Excel function "linest", to retrieve the quadratic or polynomial regression of a data set with a given order.
Is there any function similar to "linest" for Objective-C? Or a known statistics library/framework?
I find it hard to believe I'm the first person to stumble upon this problem in iOS.

I spent several days working through the math and getting it into code because I couldn't find a math library for iOS with the function I needed. I wouldn't recommend anyone do that again; it wasn't a walk in the park, so I published my solution on my GitHub. You can find it here:
https://github.com/KingIsulgard/iOS-Polynomial-Regression
It's easy to use: just give it the x values and y values of the data and the order of the polynomial you want, and voilà, you've got it.
Hope this might help some people. Feel free to improve if you can. I'm just happy it finally worked.

The standard math library in general only gives you an interface to the elementary mathematical operations that are implemented in the FPU part of a CPU.
For linear regression you need either your own algorithm (it is not that complicated to implement in a handful of loops; see the sketch below) or a dedicated library, most likely a statistics library.
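To illustrate the "handful of loops" remark, here is a minimal sketch of a straight-line fit y = a + b*x using the closed-form least-squares formulas. It is written in Python only for brevity (the question is about Objective-C, but the arithmetic transliterates directly), and the function name is made up:

# Simple linear regression y = a + b*x via the closed-form least-squares formulas.
def simple_linear_regression(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                # slope
    a = mean_y - b * mean_x      # intercept
    return a, b

print(simple_linear_regression([1, 2, 3, 4], [2.1, 3.9, 6.2, 8.1]))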
Writing your own algorithm for higher order or general regression is simple if a QR decomposition algorithm is available, for instance via bindings for LAPACK or similar. Then to solve
minimize sum (b[0]*f[0](x[k])+...+b[n]*f[n](x[k])-y[k])^2
one just has to construct the matrix [X|Y], where X[k,j]=f[j](x[k]) holds the values of the ansatz functions and Y[k]=y[k] is the column vector of the values to approximate. Apply the QR algorithm to [X|Y], identify or extract the R factor from its result, and solve
R*[b|-1]' = 0
(all rows except the last) for b via back-substitution.
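For reference, here is a minimal NumPy sketch of the construction above (on iOS, the same steps could be driven through the LAPACK routines in the Accelerate framework). It uses the equivalent thin-QR form, solving R*b = Q'*y for the monomial ansatz functions 1, x, x^2, ...; the function name is mine and the noisy sample data is invented:

import numpy as np

def polyfit_qr(x, y, order):
    """Least-squares polynomial fit of the given order via QR decomposition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix X[k, j] = x[k]**j, i.e. the ansatz functions 1, x, x^2, ...
    X = np.vander(x, order + 1, increasing=True)
    Q, R = np.linalg.qr(X)              # X = Q R with R upper triangular
    # Solve the triangular system R b = Q' y (back-substitution suffices since R is triangular)
    b = np.linalg.solve(R, Q.T @ y)
    return b                            # coefficients of b[0] + b[1]*x + ... + b[order]*x**order

# Recover a quadratic from noisy samples
xs = np.linspace(0.0, 10.0, 50)
ys = 2.0 + 0.5 * xs - 0.3 * xs**2 + np.random.normal(scale=0.1, size=xs.size)
print(polyfit_qr(xs, ys, order=2))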

Related

Optimize deep Q network with long episode

I am working on a problem which we aim to solve with deep Q-learning. However, training just takes too long for each episode, roughly 83 hours. We are envisioning solving the problem within, say, 100 episodes.
So we are gradually learning a matrix (100 * 10), and within each episode we need to perform 100*10 iterations of certain operations. Basically, we select a candidate from a pool of 1000 candidates, put this candidate in the matrix, and compute a reward by feeding the whole matrix into the reward function.
The central hurdle is that the reward function computation at each step is costly, roughly 2 minutes, and each time we update only one entry in the matrix.
All the elements in the matrix depend on each other in the long term, so the whole procedure does not seem suitable for a "distributed" system, if I understood correctly.
Could anyone shed some light on the potential optimization opportunities here? Perhaps some extra engineering effort? Any suggestions and comments would be appreciated very much. Thanks.
======================= update of some definitions =================
0. initial stage:
a 100 * 10 matrix, with every element empty
1. action space:
at each step I select one element from a candidate pool of 1000 elements and insert it into the matrix, one element per step.
2. environment:
at each step I have an updated matrix to learn from.
An oracle function F returns a quantitative value ranging from 5000 to 30000, the higher the better (one computation of F takes roughly 120 seconds).
This function F takes the matrix as its input, performs a very costly computation, and returns a quantitative value indicating the quality of the synthesized matrix so far.
This function essentially measures some performance of the system, so it does take a while to compute the reward value at each step.
3. episode:
By saying "we are envisioning to solve it within 100 episodes", that's just an empirical estimation. But it shouldn't be less than 100 episode, at least.
4. constraints
Ideally, like I mentioned, "All the elements in the matrix depend on each other in the long term", and that's why the reward function F computes the reward by taking the whole matrix as the input rather than the latest selected element.
Indeed, by appending more and more elements to the matrix, the reward could increase, but it could also decrease.
5. goal
The synthesized matrix should make the oracle function F return a value greater than 25000. Whenever it reaches this goal, I will terminate the learning.
Honestly, there is no effective way to know how to optimize this system without knowing specifics such as which computations are in the reward function or which programming design decisions you have made that we can help with.
You are probably right that the episodes are not suitable for distributed calculation, meaning we cannot parallelize this, as they depend on previous search steps. However, it might be possible to throw more computing power at the reward function evaluation, reducing the total time required to run.
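Without knowing what F does internally, one generic idea (an assumption on my part, not something taken from your description) is to parallelize across candidates rather than inside a single call to F: if the agent needs scores for several candidate insertions at a given step, those evaluations are independent given the current matrix and can run in separate processes. A rough sketch, assuming the matrix is a NumPy array and F is a pure, picklable function; note it does not speed up an individual call to F:

from concurrent.futures import ProcessPoolExecutor

import numpy as np

def oracle_F(matrix):
    """Placeholder for the costly (~2 minute) reward computation on the whole matrix."""
    return float(np.nansum(matrix))          # dummy value; replace with the real oracle

def score_candidate(args):
    matrix, row, col, candidate = args
    trial = matrix.copy()
    trial[row, col] = candidate              # tentatively place the candidate
    return candidate, oracle_F(trial)

def best_candidate(matrix, row, col, candidates, workers=8):
    """Score several candidate insertions in parallel and return the best one."""
    jobs = [(matrix, row, col, c) for c in candidates]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        scored = list(pool.map(score_candidate, jobs))
    return max(scored, key=lambda pair: pair[1])

if __name__ == "__main__":
    m = np.full((100, 10), np.nan)           # the initially empty 100 x 10 matrix
    print(best_candidate(m, row=0, col=0, candidates=[1.0, 2.0, 3.0]))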
I would encourage you to share more details on the problem, for example by profiling the code to see which component takes up most of the time, by sharing a code excerpt or, as the standard for doing science gets higher, by sharing a reproducible code base.
Not a solution to your question, just some general thoughts that maybe are relevant:
One of the biggest obstacles to applying Reinforcement Learning to "real world" problems is the astoundingly large amount of data/experience required to achieve acceptable results. For example, OpenAI's Dota 2 agent collected the equivalent of 900 years of experience per day. In the original Deep Q-network paper, achieving performance close to a typical human required hundreds of millions of game frames, depending on the specific game. In other benchmarks where the inputs are not raw pixels, such as MuJoCo, the situation isn't a lot better. So, if you don't have a simulator that can generate samples (state, action, next state, reward) cheaply, maybe RL is not a good choice. On the other hand, if you have a ground-truth model, maybe other approaches can easily outperform RL, such as Monte Carlo Tree Search (e.g., Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning or Simple random search provides a competitive approach to reinforcement learning). All these ideas and much more are discussed in this great blog post.
The previous point is especially true for deep RL. Approximating value functions or policies with a deep neural network that has millions of parameters usually implies that you'll need a huge quantity of data, or experience.
And regarding to your specific question:
In the comments, I've asked a few questions about the specific features of your problem. I was trying to figure out whether you really need RL to solve the problem, since it's not the easiest technique to apply. On the other hand, if you really need RL, it's not clear whether you should use a deep neural network as the approximator or whether a shallow model (e.g., random trees) would do. However, these questions and other potential optimizations require more domain knowledge. Here, it seems you are not able to share the domain of the problem, which could be due to numerous reasons, and I perfectly understand that.
You have estimated the number of required episodes to solve the problem based on some empirical studies using a smaller version of the problem, a 20*10 matrix. Just a note of caution: due to the curse of dimensionality, the complexity of the problem (or the experience needed) could grow exponentially as the state space dimensionality grows, although maybe that is not your case.
That said, I'm looking forward to seeing an answer that really helps you solve your problem.

Supervised learning linear regression

I am confused about how linear regression works in supervised learning. I want to generate an evaluation function for a board game using linear regression, so I need both input data and output data. The input data is my board condition, and I need the corresponding value for this condition, right? But how can I get this expected value? Do I need to write an evaluation function myself first? But I thought I needed to generate the evaluation function by using linear regression, so I'm a little confused about this.
It's supervised-learning after all, meaning: you will need input and output.
Now the question is: how to obtain these? And this is not trivial!
Candidates are:
historical-data (e.g. online-play history)
some form of self-play / reinforcement-learning (more complex)
But then a new problem arises: which outputs are available and what kind of input will you use?
If there were some a priori implemented AI, you could just take its scores. But with historical data, for example, you only get -1, 0, 1 (A wins, draw, B wins), which makes learning harder (and this touches the Credit Assignment problem: there might be one play which made someone lose; it's hard to understand which of 30 moves led to the result of 1).
This is also related to the input. Take chess for example and take a random position from some online game: there is the possibility that this position is unique over 10 million games (or at least does not happen often), which conflicts with the expected performance of your approach. I assumed here that the input is the full board position. This changes for other inputs, e.g. chess material, where the input is just a histogram of pieces (3 of these, 2 of those). Now there are far fewer unique inputs and learning will be easier.
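As a tiny sketch of that chess-material idea: piece counts go in, observed game outcomes come out, and ordinary least squares produces a linear evaluation function whose weights act like learned piece values. The feature layout and the sample numbers below are invented purely for illustration:

import numpy as np

# Features per position: [my_pawns, my_knights, my_bishops, my_rooks, my_queens,
#                         opp_pawns, opp_knights, opp_bishops, opp_rooks, opp_queens]
X = np.array([
    [8, 2, 2, 2, 1,  8, 2, 2, 2, 1],   # balanced material
    [8, 2, 2, 2, 1,  8, 2, 2, 2, 0],   # opponent has lost the queen
    [6, 1, 1, 2, 0,  8, 2, 2, 2, 1],   # we are down a lot of material
], dtype=float)
y = np.array([0.0, 1.0, -1.0])          # observed outcomes: draw, win, loss

# Ordinary least squares: the weights play the role of learned "piece values".
weights, *_ = np.linalg.lstsq(X, y, rcond=None)

def evaluate(position_features):
    """Linear evaluation function: a weighted sum of the material histogram."""
    return float(position_features @ weights)

print(evaluate(np.array([8, 2, 2, 2, 1,  8, 2, 2, 2, 0], dtype=float)))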
Long story short: it's a complex task with a lot of different approaches and most of this is somewhat bound by your exact task! A linear evaluation-function is not super-uncommon in reinforcement-learning approaches. You might want to read some literature on these (this function is a core-component: e.g. table-lookup vs. linear-regression vs. neural-network to approximate the value- or policy-function).
I might add that your task indicates the self-learning approach to AI, which is very hard and is a topic that has gained additional popularity in recent years (there was success before: see Backgammon AI). But all of these approaches are highly complex, and a good understanding of RL and of mathematical basics like Markov Decision Processes is important there.
For more classic hand-made evaluation-function based AIs, a lot of people used an additional regressor for tuning / weighting already implemented components. There is some overview at the chessprogramming wiki. (The chess-material example from above might be a good one: the assumption is that more pieces are better than fewer, but it's hard to give them values.)

What actually is the sigmoid derivative's purpose?

Honestly, I'm learning about neural networks, but I have a question about the activation part.
I know the question is general and there is a lot of explanation around the internet, but I still don't understand it clearly.
Why do we need to differentiate the sigmoid function? Why don't we just use it?
It would be good if you could give a clear explanation. Thank you.
I've seen many videos on YouTube and read many articles about it, but I still don't get it.
Thanks for your help.
Your question is not entirely clear, but I assume you are asking: "Why don't we just use the Sigmoid function without having to calculate its derivative?".
Your question is also very broad, so my answer is very broad and wordy, you will need to read more to understand all the details, for which I'll try to provide links.
Activation function: as the name suggests, we want to know whether a given node is "on" or "off", and the sigmoid function provides an easy way to squash a continuous variable (X) into the range (0, 1).
Use cases vary and this function has certain properties, which is why there are many alternative "activation" functions, like tanh, ReLU, etc. Read more here: https://en.wikipedia.org/wiki/Sigmoid_function
Differentiate (take the derivative): for most models we want to find the best-fit parameters for all our activation functions. To do this, we typically want to minimise a "cost" function that describes how good our model is at predicting observed data. One way to solve this optimisation problem is gradient descent. Each step of gradient descent updates the parameters by following the slope of the multi-dimensional cost-function surface, and for that it needs the gradient of the activation function. This is important for backpropagation, which uses gradient descent to optimise the network; it requires that the activation functions you use be (in most cases) differentiable.
Read more here: https://en.wikipedia.org/wiki/Gradient_descent
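To make the role of the derivative concrete, here is a minimal sketch of a single sigmoid unit with squared-error loss trained by gradient descent (all numbers are arbitrary): the forward pass uses the sigmoid itself, while the update step needs its derivative through the chain rule.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)                     # sigma'(z) = sigma(z) * (1 - sigma(z))

# One sigmoid "neuron" fit to a single (x, target) pair with squared-error loss.
x, target = 2.0, 1.0
w, b, lr = 0.1, 0.0, 0.5

for step in range(100):
    z = w * x + b
    y = sigmoid(z)                           # forward pass: only the sigmoid is needed
    dL_dy = 2.0 * (y - target)               # derivative of the loss (y - target)^2
    dL_dz = dL_dy * sigmoid_derivative(z)    # chain rule: here the derivative is required
    w -= lr * dL_dz * x                      # gradient descent update
    b -= lr * dL_dz

print(w, b, sigmoid(w * x + b))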
I suggest if you have a deeper question that you take it to one of the machine learning stackexchange sites.

What is a proper Learning Technique for the given data sample

I am working in matlab.
I have data samples of two unrelated variables, say Pos and Vel, at 256 time-steps. (Plots of each variable, with its value on the Y-axis and the time-step on the X-axis, were attached to the original question.)
Now I need to predict the values of these variables at the next 10 time-steps. To compare various machine learning techniques, I took the values of the variables at the first 246 time-steps, predicted the next 10 time-steps, and then compared the predictions with the actual values by calculating the mean square error, say ms_error.
I have done this using time series (NAR), linear regression, fuzzy inference systems, and neural networks, but none of these is able to give an ms_error lower than 2.
Can someone suggest a learning algorithm to use to predict future values for data samples like these two?
You could try with symbolic regression via Genetic Programming.
Genetic Programming makes no assumptions about the structure of the function that fits your data points, so it's well suited to this sort of discovery task.
Symbolic regression was one of the earliest applications of GP and continues to be widely studied.
There are many ready-to-use environments for every major programming language and many tutorials on the subject e.g.
C++: Beagle
Java: ECJ
Matlab: GPTIPS - GPLAB
Python: DEAP
(I don't mean these are the best, just well known. Of course Google search could bring up other software that is more suitable for your needs).
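For example, with DEAP (listed above), a symbolic-regression run could be set up roughly as follows. This is only a sketch based on DEAP's standard symbolic-regression recipe, and the toy sine series stands in for the first 246 observed values of Pos or Vel:

import math
import operator
import random

from deap import algorithms, base, creator, gp, tools

# Primitive set: the building blocks the evolved expressions may use.
pset = gp.PrimitiveSet("MAIN", 1)            # one input: the time-step t
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.sub, 2)
pset.addPrimitive(operator.mul, 2)
pset.addPrimitive(math.sin, 1)
pset.renameArguments(ARG0="t")

creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMin)

toolbox = base.Toolbox()
toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=3)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)

# Toy target series: replace with the observed values of Pos or Vel.
ts = list(range(246))
ys = [math.sin(0.05 * t) for t in ts]

def mse(individual):
    func = toolbox.compile(expr=individual)
    errors = [(func(t) - y) ** 2 for t, y in zip(ts, ys)]
    return (sum(errors) / len(errors),)      # DEAP fitnesses are tuples

toolbox.register("evaluate", mse)
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)

random.seed(0)
pop = toolbox.population(n=300)
pop, _ = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=40, verbose=False)
print(tools.selBest(pop, 1)[0])              # the best evolved symbolic expression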

Where can I get a Delphi/Pascal implementation of Excel-style polynomial regression curve fitting?

I have a set of X-Y values (i.e. a scatter plot) and I want a Pascal routine that generates the coefficients of an Nth order polynomial fitting those points, in the same way that Excel does.
I used David J Taylor's Polyfit example (curvefit.zip), which implements a least-squares curve fitting algorithm (also known as linear regression). David's site is here, but keep reading, because my version is better. (See below.)
The algorithms David is using originate from a book on scientific math for Pascal programmers: Allen Miller's curve fitting routine from "Pascal Programs For Scientists And Engineers", typed and submitted to MTPUG in Oct. 1982 by Juergen Loewner, and corrected and adapted for Turbo Pascal by Jeff Weiss.
You can grab curvefit.zip directly from bitbucket here. (You can clone the sourcecode with Mercurial/TortoiseHG, or download a ZIP from bitbucket)
hg clone https://bitbucket.org/wpostma/curvefit curvefit
It runs in any Delphi version 5 and up, Unicode or not, even Delphi 10 Berlin. It has a little chart in the demo, added by me. I also added a way to force the result through the origin, a common technique when you want a best fit on all terms other than the constant term, which should be forced either to zero or to some experimentally derived average. A forced "blank subtraction", set equal to the average of a series of analytical "zero samples", is common in certain types of analytical chemistry with certain types of instrumentation, and in other scientific cases, where it can be more useful than an unconstrained best fit, because you may wish to minimize error around the origin more than error across the part of the curve that is farthest from the origin.
I should also clarify that, for purposes of linear regression, a "curve" may also be a line, which is the case I needed for analytical chemistry purposes; the equation of a straight line (y = mx + b) is also called the "calibration curve". A first-order curve fit is a line (y = mx + b), a second-order curve fit (the one shown in the original answer's screenshot) is a parabola (y = nx^2 + mx + b). As you might guess, this algorithm scales from first order up to any level you might wish, although I haven't tested it above 8 terms.
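For reference, forcing the fit through the origin simply means leaving the constant column out of the design matrix and solving the ordinary least-squares problem on the remaining terms. Not Pascal, but the same construction in a few lines of NumPy (the sample numbers are invented):

import numpy as np

# Second-order fit forced through the origin: model y = m*x + n*x^2, no constant term.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 3.9, 9.2, 15.8, 24.9])    # roughly y = x^2, made-up sample values

X = np.column_stack([x, x**2])               # columns: x and x^2; the column of ones is omitted
(m, n), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"y ~= {m:.3f}*x + {n:.3f}*x^2")       # the intercept is exactly 0 by construction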
Bitbucket project link:
https://bitbucket.org/wpostma/curvefit/overview
Try TPMath http://tpmath.sourceforge.net/ - I've been using this for years for fitting a hill regression and can recommend it.
Check the functions in TurboPower's SysTools library, which is now open source; it includes math functions in the unit StStat.
Even though you've already awarded an answer, for completeness, I thought I'd add this:
We use SDL Components' Math pack and have been very happy with it.
http://www.lohninger.com/delfcomp.html
It's well thought out, and does exactly what we need.
He's got a variety of other interesting tools on his site.
XlXtrFun is the best curve-fitting tool I know and use, but it is for Excel:
http://www.xlxtrfun.com/XlXtrFun/XlXtrFun.htm
