L-BFGS from RISO not working - machine-learning

I am testing the implementation of RISO's L-BFGS library for function minimization for logistic regression in Java. Here is the link to the class that I am using.
To test the library, I am trying to minimize the function:
f(x) = 2*(x1^2) + 4*x2 + 5
The library needs the objective and the gradient functions which I implemented as below:
/**
The value of the objective function, given variable assignments
x. This is specific to your problem, so you must override it.
Remember that LBFGS only minimizes, so lower is better.
**/
public double objectiveFunction(double[] x) throws Exception {
return (2*x[0]*x[0] + 3*x[1] + 1);
}
/**
The gradient of the objective function, given variable assignments
x. This is specific to your problem, so you must override it.
**/
public double[] evaluateGradient(double[] x) throws Exception {
double[] result = new double[x.length];
result[0] = 4 * x[0];
result[1] = 3;
return result;
}
Running the code with this implementation of the objective function and gradient gives me the following exception:
Exception in thread "main" Line search failed. See documentation of routine mcsrch.
Error return of line search: info = 3 Possible causes:
function or gradient are incorrect, or incorrect tolerances. (iflag == -1)
I haven't changed the tolerances from the default values. What am I doing wrong?

I don't think your cost function has a minimum since x2 can reach -Inf, and gradient algorithm won't find it.
It is a quadratic function for x1, but not for x2. I suspect the exception is thrown out because the gradient algorithm cannot find out the optimal solution, and it 'thinks' the problem is the tolerance coefficient is not correctly set, or the gradient function is wrong
Do you mean f(x) = 2*(x^2) + 3*x + 1 in your object function?

Related

Dot Product vs Element-Wise multiplication for Backpropogation

I am trying to backpropogate a very primitive / simple ANN.
I've almost got it working. I'm trying to implement the formulas and the article I'm reading does not specify whether to use dot product or element wise multiplication or some other multiplication.
Article: https://ml-cheatsheet.readthedocs.io/en/latest/backpropagation.html
Here's the formula for calculating the error (or delta) of a single Hidden layer:
Or, as I read it in the context of my algorithm,
Delta = prev_delta * prev_weight * zprime
Where delta is the error of this layer, prev_delta is the delta of the previous layer, prev_weight is the weight of the previous layer, and zprime is the derivative of the activation function of the current layer.
Also, for a single Output Layer:
Or, as I read it in the context of my algorithm,
Delta = (output - target) % zprime;
Where output is the final output of the feed-forward and target is the target values.
I've written this code to run this calculation:
void Layer::backward(Matrix & prev_delta, Matrix & prev_weight) {
// all variables are matrices
// except for prev_layer, that's a pointer to a layer object.
// I'm using Armadillo for linear algebra / matrices
// delta, weight, output, and zprime refer to the current layer.
// prev_delta, prev_weight belong the the previous layer.
if (next_layer == nullptr) {
// if next layer is null, this is the output layer.
// in that case, prev_delta is target.
// yHat - y * R'(Zo)
delta = (output - prev_delta) * zprime;
}
else {
// Eo * Wo * R'(Zh)
delta = prev_delta * prev_weight * zprime;
}
// tell the next layer to backpropogate
if (prev_layer != nullptr)
prev_layer -> backward(delta, weight);
}
matrix * matrix indicates a matrix multiplication (dot product)
matrix % matrix indicates element-wise multiplication
The issue I'm having is that these matrices don't seem to multiply properly. I've made sure everything lines up the same way the article has it, but these pieces just don't seem to fit. How should these matrices be multiplied to get the result?
Edit: to clarify, I get errors when I try to take the dot product of these matrices. "invalid size". I've tried using element wise multiplication but then things get weird there too.

16 bit logic/computer simulation in Swift

I’m trying to make a basic simulation of a 16 bit computer with Swift. The computer will feature
An ALU
2 registers
That’s all. I have enough knowledge to create these parts visually and understand how they work, but it has become increasingly difficult to make larger components with more inputs while using my current approach.
My current approach has been to wrap each component in a struct. This worked early on, but is becoming increasingly difficult to manage multiple inputs while staying true to the principles of computer science.
The primary issue is that the components aren’t updating with the clock signal. I have the output of the component updating when get is called on the output variable, c. This, however, neglects the idea of a clock signal and will likely cause further problems later on.
It’s also difficult to make getters and setters for each variable without getting errors about mutability. Although I have worked through these errors, they are annoying and slow down the development process.
The last big issue is updating the output. The output doesn’t update when the inputs change; it updates when told to do so. This isn’t accurate to the qualities of real computers and is a fundamental error.
This is an example. It is the ALU I mentioned earlier. It takes two 16 bit inputs and outputs 16 bits. It has two unary ALUs, which can make a 16 bit number zero, negate it, or both. Lastly, it either adds or does a bit wise and comparison based on the f flag and inverts the output if the no flag is selected.
struct ALU {
//Operations are done in the order listed. For example, if zx and nx are 1, it first makes input 1 zero and then inverts it.
var x : [Int] //Input 1
var y : [Int] //Input 2
var zx : Int //Make input 1 zero
var zy : Int //Make input 2 zero
var nx : Int //Invert input 1
var ny : Int //Invert input 2
var f : Int //If 0, do a bitwise AND operation. If 1, add the inputs
var no : Int //Invert the output
public var c : [Int] { //Output
get {
//Numbers first go through unary ALUs. These can negate the input (and output the value), return 0, or return the inverse of 0. They then undergo the operation specified by f, either addition or a bitwise and operation, and are negated if n is 1.
var ux = UnaryALU(z: zx, n: nx, x: x).c //Unary ALU. See comments for more
var uy = UnaryALU(z: zy, n: ny, x: y).c
var fd = select16(s: f, d1: Add16(a: ux, b: uy).c, d0: and16(a: ux, b: uy).c).c //Adds a 16 bit number or does a bitwise and operation. For more on select16, see the line below.
var out = select16(s: no, d1: not16(a: fd).c, d0: fd).c //Selects a number. If s is 1, it returns d1. If s is 0, it returns d0. d0 is the value returned by fd, while d1 is the inverse.
return out
}
}
public init(x:[Int],y:[Int],zx:Int,zy:Int,nx:Int,ny:Int,f:Int,no:Int) {
self.x = x
self.y = y
self.zx = zx
self.zy = zy
self.nx = nx
self.ny = ny
self.f = f
self.no = no
}
}
I use c for the output variable, store values with multiple bits in Int arrays, and store single bits in Int values.
I’m doing this on Swift Playgrounds 3.0 with Swift 5.0 on a 6th generation iPad. I’m storing each component or set of components in a separate file in a module, which is why some variables and all structs are marked public. I would greatly appreciate any help. Thanks in advance.
So, I’ve completely redone my approach and have found a way to bypass the issues I was facing. What I’ve done is make what I call “tracker variables” for each input. When get is called for each variable, it returns that value of the tracker assigned to it. When set is called it calls an update() function that updates the output of the circuit. It also updates the value of the tracker. This essentially creates a ‘copy’ of each variable. I did this to prevent any infinite loops.
Trackers are unfortunately necessary here. I’ll demonstrate why
var variable : Type {
get {
return variable //Calls the getter again, resulting in an infinite loop
}
set {
//Do something
}
}
In order to make a setter, Swift requires a getter to be made as well. In this example, calling variable simply calls get again, resulting in a never-ending cascade of calls to get. Tracker variables are a workaround that use minimal extra code.
Using an update method makes sure the output responds to a change in any input. This also works with a clock signal, due to the architecture of the components themselves. Although it appears to act as the clock, it does not.
For example, in data flip-flops, the clock signal is passed into gates. All a clock signal does is deactivate a component when the signal is off. So, I can implement that within update() while remaining faithful to reality.
Here’s an example of a half adder. Note that the tracker variables I mentioned are marked by an underscore in front of their name. It has two inputs, x and y, which are 1 bit each. It also has two outputs, high and low, also known as carry and sum. The outputs are also one bit.
struct halfAdder {
private var _x : Bool //Tracker for x
public var x: Bool { //Input 1
get {
return _x //Return the tracker’s value
}
set {
_x = x //Set the tracker to x
update() //Update the output
}
}
private var _y : Bool //Tracker for y
public var y: Bool { //Input 2
get {
return _y
}
set {
_y = y
update()
}
}
public var high : Bool //High output, or ‘carry’
public var low : Bool //Low output, or ‘sum’
internal mutating func update(){ //Updates the output
high = x && y //AND gate, sets the high output
low = (x || y) && !(x && y) //XOR gate, sets the low output
}
public init(x:Bool, y:Bool){ //Initializer
self.high = false //This will change when the variables are set, ensuring a correct output.
self.low = false //See above
self._x = x //Setting trackers and variables
self._y = y
self.x = x
self.y = y
}
}
This is a very clean way, save for the trackers, do accomplish this task. It can trivially be expanded to fit any number of bits by using arrays of Bool instead of a single value. It respects the clock signal, updates the output when the inputs change, and is very similar to real computers.

How to estimate? "simple" Nonlinear Regression + Parameter Constraints + AR residuals

I am new to this site so please bear with me. I want to
the nonlinear model as shown in the link: https://i.stack.imgur.com/cNpWt.png by imposing constraints on the parameters a>0 and b>0 and gamma1 in [0,1].
In the nonlinear model [1] independent variable is x(t) and dependent are R(t), F(t) and ξ(t) is the error term.
An example of the dataset can be shown here: https://i.stack.imgur.com/2Vf0j.png 68 rows of time series
To estimate the nonlinear regression I use the nls() function with no problem as shown below:
NLM1 = nls(**Xt ~ (aRt-bFt)/(1-gamma1*Rt), start = list(a = 10, b = 10, lamda = 0.5)**,algorithm = "port", lower=c(0,0,0),upper=c(Inf,Inf,1),data = temp2)
I want to estimate NLM1 with allowing for also an AR(1) on the residuals.
Basically I want the same procedure as we go from lm() to gls(). My problem is that in the gnls() function I dont know how to put contraints for the model parameters a, b, gamma1 and the model estimates wrong values for them.
nls() has the option for lower and upper bounds. I cant do the same on gnls()
In the gnls(): I need to add the contraints something like as in nls() lower=c(0,0,0),upper=c(Inf,Inf,1)
NLM1_AR1 = gnls( model = Xt ~ (aRt-bFt)/(1-gamma1*Rt), data = temp2, start = list(a =13, b = 10, lamda = 0.5),correlation = corARMA(p = 1))
Does any1 know the solution on how to do it?
Thank you

Obtaining weights in CvSVM, the SVM implementation of OpenCV

I am using the SVM implementation of OpenCV (based on LibSVM) on iOS. Is it possible to obtain the weight vector after training?
Thank you!
After dealing with it I have been able to obtain the weights. For obtaining the weights one has to obtain first the support vectors and then add them multiplied by the alpha values.
// get the svm weights by multiplying the support vectors by the alpha values
int numSupportVectors = SVM.get_support_vector_count();
const float *supportVector;
const CvSVMDecisionFunc *dec = SVM.decision_func;
svmWeights = (float *) calloc((numOfFeatures+1),sizeof(float));
for (int i = 0; i < numSupportVectors; ++i)
{
float alpha = *(dec[0].alpha + i);
supportVector = SVM.get_support_vector(i);
for(int j=0;j<numOfFeatures;j++)
*(svmWeights + j) += alpha * *(supportVector+j);
}
*(svmWeights + numOfFeatures) = - dec[0].rho; //Be careful with the sign of the bias!
The only trick here is that the instance variable float *decision_function is protected on the opencv framework, so I had to change it in order to access it.
A cursory glance of the doc and the source code (https://github.com/Itseez/opencv/blob/master/modules/ml/src/svm.cpp) tells me that on the surface the answer is "No". The hyperplane parameters seem to be tucked away into the CvSVMSolver class. CvSVM contains an object of this class called "solver". See if you can get to its members.

Finding the Oriented Bounding Box of a Convex Hull in XNA Using Rotating Calipers

Perhaps this is more of a math question than a programming question, but I've been trying to implement the rotating calipers algorithm in XNA.
I've deduced a convex hull from my point set using a monotone chain as detailed on wikipedia.
Now I'm trying to model my algorithm to find the OBB after the one found here:
http://www.cs.purdue.edu/research/technical_reports/1983/TR%2083-463.pdf
However, I don't understand what the DOTPR and CROSSPR methods it mentions on the final page are supposed to return.
I understand how to get the Dot Product of two points and the Cross Product of two points, but it seems these functions are supposed to return the Dot and Cross Products of two edges / line segments. My knowledge of mathematics is admittedly limited but this is my best guess as to what the algorithm is looking for
public static float PolygonCross(List<Vector2> polygon, int indexA, int indexB)
{
var segmentA1 = NextVertice(indexA, polygon) - polygon[indexA];
var segmentB1 = NextVertice(indexB, polygon) - polygon[indexB];
float crossProduct1 = CrossProduct(segmentA1, segmentB1);
return crossProduct1;
}
public static float CrossProduct(Vector2 v1, Vector2 v2)
{
return (v1.X * v2.Y - v1.Y * v2.X);
}
public static float PolygonDot(List<Vector2> polygon, int indexA, int indexB)
{
var segmentA1 = NextVertice(indexA, polygon) - polygon[indexA];
var segmentB1 = NextVertice(indexB, polygon) - polygon[indexB];
float dotProduct = Vector2.Dot(segmentA1, segmentB1);
return dotProduct;
}
However, when I use those methods as directed in this portion of my code...
while (PolygonDot(polygon, i, j) > 0)
{
j = NextIndex(j, polygon);
}
if (i == 0)
{
k = j;
}
while (PolygonCross(polygon, i, k) > 0)
{
k = NextIndex(k, polygon);
}
if (i == 0)
{
m = k;
}
while (PolygonDot(polygon, i, m) < 0)
{
m = NextIndex(m, polygon);
}
..it returns the same index for j, k when I give it a test set of points:
List<Vector2> polygon = new List<Vector2>()
{
new Vector2(0, 138),
new Vector2(1, 138),
new Vector2(150, 110),
new Vector2(199, 68),
new Vector2(204, 63),
new Vector2(131, 0),
new Vector2(129, 0),
new Vector2(115, 14),
new Vector2(0, 138),
};
Note, that I call polygon.Reverse to place these points in Counter-clockwise order as indicated in the technical document from perdue.edu. My algorithm for finding a convex-hull of a point set generates a list of points in counter-clockwise order, but does so assuming y < 0 is higher than y > 0 because when drawing to the screen 0,0 is the top left corner. Reversing the list seems sufficient. I also remove the duplicate point at the end.
After this process, the data becomes:
Vector2(115, 14)
Vector2(129, 0)
Vector2(131, 0)
Vector2(204, 63)
Vector2(199, 68)
Vector2(150, 110)
Vector2(1, 138)
Vector2(0, 138)
This test fails on the first loop when i equals 0 and j equals 3. It finds that the cross-product of the line (115,14) to (204,63) and the line (204,63) to (199,68) is 0. It then find that the dot product of the same lines is also 0, so j and k share the same index.
In contrast, when given this test set:
http://www.wolframalpha.com/input/?i=polygon+%282%2C1%29%2C%281%2C2%29%2C%281%2C3%29%2C%282%2C4%29%2C%284%2C4%29%2C%285%2C3%29%2C%283%2C1%29
My code successfully returns this OBB:
http://www.wolframalpha.com/input/?i=polygon+%282.5%2C0.5%29%2C%280.5%2C2.5%29%2C%283%2C5%29%2C%285%2C3%29
I've read over the C++ algorithm found on http://www.geometrictools.com/LibMathematics/Containment/Wm5ContMinBox2.cpp but I'm too dense to follow it completely. It also appears to be very different than the other one detailed in the paper above.
Does anyone know what step I'm skipping or see some error in my code for finding the dot product and cross product of two line segments? Has anyone successfully implemented this code before in C# and have an example?
Points and vectors as data structures are essentially the same thing; both consist of two floats (or three if you're working in three dimensions). So, when asked to take the dot product of the edges, I suppose it means taking the dot product of the vectors that the edges define. The code you provided does exactly this.
Your implementation of CrossProduct seems correct (see Wolfram MathWorld). However, in PolygonCross and PolygonDot I think you shouldn't normalize the segments. It will affect the magnitude of the return values of PolygonDot and PolygonCross. By removing the superfluous calls to Vector2.Normalize you can speed up your code and reduce the amount of noise in your floating point values. However, normalization is not relevant to the correctness of the code that you have pasted as it only compares the results with zero.
Note that the paper you refer to assumes that the polygon vertices are listed in counterclockwise order (page 5, first paragraph after "Beginning of comments") but your example polygon is defined in clockwise order. That's why PolygonCross(polygon, 0, 1) is negative and you get the same value for j and k.
I assume DOTPR is a normal vector dot product, crosspr is a crossproduct. dotproduct will return a normal number , crossproduct will return a vector which is perpendicular to the two vectors given. (basic vector math,check wikipedia)
they are actually defined in the paper as DOTPR(i,j) returns dotproduct of vectors from vertex i to i+1 and j to j+1. same for CROSSPR but with cross product.

Resources