Related
Background and my thought process:
I wanted to see if I could utilize logistic regression to create a hypothesis function that could predict recessions in the US economy by looking at a date and its corresponding leading economic indicators. Leading economic indicators are known to be good predictors of the economy.
To do this, I got data from the OECD on composite leading (economic) indicators from January 1970 to July 2021, along with the dates of US recessions from 1970 to 2021. The formatted data that I use for training can be found further below.
Knowing that the relationship between a recession and the date/LEI wouldn't be a simple linear one, I decided to add more parameters to each datapoint so I could fit a polynomial equation to the data. Thus, each datapoint has the following parameters: Date, LEI, LEI^2, LEI^3, LEI^4, and LEI^5.
The Problem:
When I attempt to train my hypothesis function, I get a very strange cost history that seems to indicate that I either did not implement my cost function correctly or that my gradient descent was implemented incorrectly. Below is the image of my cost history:
I have tried implementing the suggestions from this post to fix my cost history, as originally I had the same NaN and Inf issues described in the post. While the suggestions helped me fix the NaN and Inf issues, I couldn't find anything to help me fix my cost function once it started oscillating. Some of the other fixes I've tried are adjusting the learning rate, double-checking my cost function and gradient descent, and introducing more parameters per datapoint (to see if a higher-degree polynomial equation would help).
My Code
The main file is predictor.m.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Program: Predictor.m
% Author: Hasec Rainn
% Desc: Predictor.m uses logistic regression
% to predict when economic recessions will occur
% in the United States. The data it uses is from the past 50 years.
%
% In particular, it uses dates and their corresponding economic leading
% indicators to learn a non-linear hypothesis function to fit to the data.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
LI_Data = dlmread("leading_indicators_formatted.csv"); %Get LI data
RD_Data = dlmread("recession_dates_formatted.csv"); %Get RD data
%our datapoints of interest: Dates and their corresponding
%leading Indicator values.
%We are going to increase the number of parameters per datapoint to allow
%for a non-linear hypothesis function. Specifically, let the 3rd, 4th,
%5th, and 6th columns represent LI^2, LI^3, LI^4, and LI^5 respectively
X = LI_Data; %datapoints of interest (row = 1 datapoint)
X = [X, X(:,2).^2]; %Adding LI^2
X = [X, X(:,2).^3]; %Adding LI^3
X = [X, X(:,2).^4]; %Adding LI^4
X = [X, X(:,2).^5]; %Adding LI^5
%normalize data
X(:,1) = normalize( X(:,1) );
X(:,2) = normalize( X(:,2) );
X(:,3) = normalize( X(:,3) );
X(:,4) = normalize( X(:,4) );
X(:,5) = normalize( X(:,5) );
X(:,6) = normalize( X(:,6) );
%What we want to predict: if a recession happens or doesn't happen
%for a corresponding year
Y = RD_Data(:,2); %row = 1 datapoint
%defining a few useful variables:
nIter = 4000; %how many iterations we want to run gradient descent for
ndp = size(X, 1); %number of data points we have to work with
nPara = size(X,2); %number of parameters per data point
alpha = 1; %set the learning rate to 1
%Defining Theta
Theta = ones(1, nPara); %initialize the weights of Theta to 1
%Make a cost history so we can see if gradient descent is implemented
%correctly
costHist = zeros(nIter, 1);
for i = 1:nIter
    costHist(i, 1) = cost(Theta, Y, X);
    Theta = Theta - (sum((sigmoid(X * Theta') - Y) .* X));
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Function: Cost
% Author: Hasec Rainn
% Parameters: Theta (vector), Y (vector), X (matrix)
% Desc: Uses Theta, Y, and X to determine the cost of our current
% hypothesis function H_theta(X). Uses manual loop approach to
% avoid errors that arise from log(0).
% Additionally, limits the range of H_Theta to prevent Inf
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function expense = cost(Theta, Y, X)
    m = size(X, 1);              %number of data points
    hTheta = sigmoid(X*Theta'); %hypothesis function
    %limit the range of hTheta to [10^-50, 0.9999999999999]
    for i = 1:size(hTheta, 1)
        if (hTheta(i) <= 10^(-50))
            hTheta(i) = 10^(-50);
        endif
        if (hTheta(i) >= 0.9999999999999)
            hTheta(i) = 0.9999999999999;
        endif
    endfor
    expense = 0;
    for i = 1:m
        if Y(i) == 1
            expense = expense + -log(hTheta(i));
        endif
        if Y(i) == 0
            expense = expense + -log(1-hTheta(i));
        endif
    endfor
endfunction
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Function: normalization
% Author: Hasec Rainn
% Parameters: vector
% Desc: Takes in an input and normalizes its value(s)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function n = normalize(data)
    dMean = mean(data);
    dStd = std(data);
    n = (data - dMean) ./ dStd;
endfunction
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Function: Sigmoid
% Author: Hasec Rainn
% Parameters: scalar, vector, or matrix
% Desc: Takes an input and forces its value(s) to be between
% 0 and 1. If a matrix or vector, sigmoid is applied to
% each element.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function result = sigmoid(z)
    result = 1 ./ ( 1 + e .^(-z) );
endfunction
The data I used for my learning process can be found here: formatted LI data and recession dates data.
The problem you're running into here is your gradient descent function.
In particular, while you correctly calculate the error term (that is, hTheta - Y, or equivalently sigmoid(X * Theta') - Y), you do not calculate the derivative of the cost correctly: in Theta = Theta - (sum((sigmoid(X * Theta') - Y) .* X)), the .* X is not correct.
The derivative with respect to parameter j is the error of each datapoint (found in the vector hTheta - Y) multiplied by that datapoint's value for parameter j, summed over all datapoints; this has to be done for every parameter. For more information, check out this article.
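As a minimal sketch of the update described above, written in NumPy notation for clarity (the 1/m scaling and the explicit learning rate alpha follow the usual convention and are not taken verbatim from the question's Octave code):
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(Theta, X, Y, alpha=0.1):
    # X is (m, n), Y is (m,), Theta is (n,)
    m = X.shape[0]
    error = sigmoid(X @ Theta) - Y      # per-datapoint error, shape (m,)
    grad = (X.T @ error) / m            # sum_i error_i * x_ij, for every parameter j
    return Theta - alpha * grad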
My intention was to solve the L1 error fitting problem using scipy.optimize, as suggested in L1 norm instead of L2 norm for cost function in regression model, but I keep getting the wrong solution, so I debugged using least squares, for which we know how to get a closed-form solution:
import numpy as np
from scipy.optimize import minimize

n = 10
d = 2
A = np.random.rand(n, d)
x = np.random.rand(d, 1)
b = np.dot(A, x)

x_hat = np.linalg.lstsq(A, b)[0]
print(np.linalg.norm(np.dot(A, x_hat) - b))

def fit(X, params):
    return X.dot(params)

def cost_function(params, X, y):
    return np.linalg.norm(y - fit(X, params))

x0 = np.zeros((d, 1))
output = minimize(cost_function, x0, args=(A, b))
y_hat = fit(A, output.x)
print(np.linalg.norm(y_hat - b))
print(output)
The output is:
4.726604209672303e-16
2.2714597315189407
fun: 2.2714597315189407
hess_inv: array([[ 0.19434496, -0.1424377 ],
[-0.1424377 , 0.16718703]])
jac: array([ 3.57627869e-07, -2.98023224e-08])
message: 'Optimization terminated successfully.'
nfev: 32
nit: 4
njev: 8
status: 0
success: True
x: array([0.372247 , 0.32633966])
which looks super weird, because it can't even solve the L2 regression. Am I making a stupid mistake here?
The error is due to mismatching shapes. minimize flattens x0, so the iterates have shape (d,). When params has shape (d,), fit(X, params) has shape (n,), while your y has shape (n, 1). Thus, the expression y - fit(X, params) has shape (n, n) due to NumPy's implicit broadcasting, and as a consequence np.linalg.norm returns the Frobenius matrix norm instead of the Euclidean vector norm.
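A quick way to see the broadcasting issue (an illustrative check, not part of the original code):
import numpy as np

y = np.zeros((10, 1))        # column vector, shape (10, 1)
pred = np.zeros(10)          # 1-D array, shape (10,)
print((y - pred).shape)      # prints (10, 10): broadcasting produced a matrix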
Solution: change your cost function to:
def cost_function(params, X, y):
    return np.linalg.norm(y - fit(X, params.reshape(d, 1)))
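An equivalent fix, sketched here as an alternative (not from the original answer), is to work with 1-D arrays throughout, so the cost function does not depend on the global d:
def cost_function(params, X, y):
    # y is passed in as a 1-D vector, so y - X.dot(params) is also 1-D
    return np.linalg.norm(y - X.dot(params))

output = minimize(cost_function, np.zeros(d), args=(A, b.ravel()))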
I am trying to solve a mixed-integer non-linear program with Drake using the SNOPT solver, but I encounter the following error:
ValueError: The capabilities of drake::solvers::SnoptSolver do not meet the requirements of the MathematicalProgram ({ProgramAttributes: GenericConstraint, QuadraticCost, LinearConstraint, LinearEqualityConstraint, BinaryVariable})
What is the recommended solver/alternative approach in my case?
Mixed-integer nonlinear optimization problems are very hard, and only a handful of optimization solvers exist for them, e.g. Couenne and BARON. Drake does not support any of these solvers.
Drake does support solvers for the most common mixed-integer convex programs (such as MILPs, MIQPs, MISOCPs). But to use these, your optimization problem must have only convex constraints (e.g. linear equalities and inequalities).
One way to proceed using a nonlinear solver such as SNOPT is to enforce the binary constraints as follows. Say you want x to be binary; then you can add the smooth constraint x * (x - 1) = 0. (Note that the latter is satisfied iff x = 0 or x = 1.) This, however, is a very "tough" constraint and can lead to numeric issues. Note also that, doing this, you do not have any guarantee of finding a feasible solution if one exists (a guarantee you do have in mixed-integer convex programming), and the solver might converge to a local minimum. A minimal sketch of this trick is shown below.
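As a hedged sketch only (the import path varies across Drake versions, and the rest of your program is omitted):
from pydrake.solvers import MathematicalProgram, SnoptSolver  # path may differ in older Drake versions

prog = MathematicalProgram()
x = prog.NewContinuousVariables(1, "x")
prog.AddConstraint(x[0] * (x[0] - 1) == 0)    # smooth surrogate for "x is binary"
prog.AddBoundingBoxConstraint(0, 1, x)        # keep x in [0, 1] to help the solver
# ... add the remaining (nonlinear) costs and constraints here ...
result = SnoptSolver().Solve(prog)
print(result.is_success(), result.GetSolution(x))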
You mentioned that your constraint is
x[k+1] = A * ((1-b[k]) * x1[k] + b[k] * x2[k])
where b[k] is a binary variable and x1[k], x2[k], x[k+1] are continuous variables.
This constraint could be reformulated as a mixed-integer linear constraint. What you want is
b[k] = 1, then x[k+1] = A * x2[k]
b[k] = 0, then x[k+1] = A * x1[k]
In general, if fixing the binary variable to either 0 or 1 leaves a linear constraint in the remaining variables, then the constraint can be reformulated as a mixed-integer linear constraint. There are two approaches to convert your constraint to a mixed-integer linear constraint, namely the big-M approach and the convex hull approach, as explained in this tutorial.
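As a quick sketch of the big-M approach (assuming you can pick a constant M that bounds the magnitude of every component of x[k+1] - A * x1[k] and x[k+1] - A * x2[k]; this bound is an assumption, not something given in the question):
-M * b[k] <= x[k+1] - A * x1[k] <= M * b[k]
-M * (1 - b[k]) <= x[k+1] - A * x2[k] <= M * (1 - b[k])
When b[k] = 0, the first pair of inequalities forces x[k+1] = A * x1[k] and the second pair is inactive; when b[k] = 1, the roles are reversed.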
As a quick demo of the convex hull approach, suppose that your variables are bounded
x1[k] ∈ ConvexHull(v₁, v₂, ..., vₘ)
x2[k] ∈ ConvexHull(w₁, w₂, ..., wₙ)
where v₁, v₂, ..., vₘ, w₁, w₂, ..., wₙ are all given points.
We introduce two slack variables s1, s2. We intend to impose the following constraints
b[k] = 1, then s1 = 0, s2 = A * x2[k]
b[k] = 0, then s2 = 0, s1 = A * x1[k]
x[k+1] = s1 + s2
The constraints above mean that the point (b[k], s2, x2[k]) lies in the polytope whose vertices are
(1, A * w₁, w₁)
(1, A * w₂, w₂)
...
(1, A * wₙ, wₙ)
(0, 0, w₁)
(0, 0, w₂)
...
(0, 0, wₙ)
With these vertices, the polytope is written in the V-representation. You could convert this V-representation to an H-representation. This H-representation is denoted as
H * [b[k], s2, x2[k]] <= h
where each row of H is a face normal of the polytope. This H-representation is a (mixed-integer) linear constraint on the variables b[k], s2, x2[k]. An alternative way to constrain (b[k], s2, x2[k]) to lie within the polytope is to write it as a convex combination of the polytope vertices (you will need to introduce the convex combination weights as additional decision variables, with all the weights non-negative and summing to 1). Using this convex combination approach, you won't need to convert the V-representation to an H-representation. Here are the mixed-integer linear constraints on b[k], s2, x2[k] using the convex combination approach:
b[k] = λ₁ + λ₂ + ... + λₙ
s2 = A*(λ₁w₁ + λ₂w₂ + ... + λₙwₙ)
x2[k] = λ₁w₁ + λ₂w₂ + ... + λₙwₙ + λₙ₊₁w₁ + λₙ₊₂w₂ + ... + λ₂ₙwₙ
1 = λ₁ + λ₂ + ... + λₙ + λₙ₊₁ + λₙ₊₂ + ... + λ₂ₙ
λᵢ ≥ 0 ∀ i = 1, ..., 2n
b[k] is binary
where the weights λᵢ, i = 1, ..., 2n are new slack variables representing the convex combination weights. You could verify that these constraints enforce that
if b[k] = 0, then s2 = 0.
if b[k] = 1, then s2 = A*x2[k]
Similarly, the point (b[k], s1, x1[k]) lies in the polytope with the following vertices
(1, 0, v₁)
(1, 0, v₂)
...
(1, 0, vₘ)
(0, A*v₁, v₁)
(0, A*v₂, v₂)
...
(0, A*vₘ, vₘ)
and we can write the linear constraints from the H-representation of this polytope. Combining the two sets of linear constraints derived from the polytopes with the linear constraint x[k+1] = s1 + s2, we get the mixed-integer linear constraints of your problem. For a detailed explanation of this convex hull approach, you can refer to the tutorial linked above. A rough sketch of how the convex-combination constraints could be assembled in code is given below.
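As a hedged sketch only: the data below (A, W) is made up for illustration, the import path may differ across Drake versions, and this covers just the (b[k], s2, x2[k]) polytope from the convex combination constraints above:
import numpy as np
from pydrake.solvers import MathematicalProgram  # path may differ in older Drake versions

# Hypothetical data: columns of W are the given vertices w_1 ... w_n of x2[k].
A = np.array([[1.0, 0.5],
              [0.0, 1.0]])
W = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
d, nv = W.shape

prog = MathematicalProgram()
b = prog.NewBinaryVariables(1, "b")[0]
x2 = prog.NewContinuousVariables(d, "x2")
s2 = prog.NewContinuousVariables(d, "s2")
lam = prog.NewContinuousVariables(2 * nv, "lam")   # convex combination weights

prog.AddBoundingBoxConstraint(0, 1, lam)           # lam_i >= 0 (the upper bound is implied)
prog.AddLinearConstraint(sum(lam) == 1)            # weights sum to 1
prog.AddLinearConstraint(b == sum(lam[:nv]))       # b[k] = lam_1 + ... + lam_n
AW = A.dot(W)                                      # precompute A * w_i for each vertex
for i in range(d):
    # x2[k] = sum_j lam_j * w_j + sum_j lam_{n+j} * w_j,   s2 = A * sum_j lam_j * w_j
    prog.AddLinearConstraint(x2[i] == W.dot(lam[:nv])[i] + W.dot(lam[nv:])[i])
    prog.AddLinearConstraint(s2[i] == AW.dot(lam[:nv])[i])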
I have the logistic map function in Maxima like so:
F(x,r,n):= x[n]=r*x[n-1]*(1-x[n-1]);
And when I input the correct variables it gives me the answer to, for example, x[0]:
(%i15) n:0$
x[n-1]:[0.1]$
F(x, r:3, n);
(%o15) x[0]=[0.27]
However, this answer is not remembered, and when I enter x[0] I get
x[0];
(%o5) x[0]
How do I write a function that will calculate x[n] for me and store it in memory, so I can use it later? I am trying to make a bifurcation diagram for the logistic map without using any black boxes, i.e., the orbits functions.
Thank you!
There are different ways to go about it. One straightforward way is to create a list and then iterate, computing its elements one by one. E.g.:
(%i4) x: makelist (0, 10);
(%o4) [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
(%i5) x[1]: 0.1;
(%o5) 0.1
(%i6) r: 3;
(%o6) 3
(%i7) for i:2 thru 10 do x[i]: r * x[i - 1] * (1 - x[i - 1]);
(%o7) done
(%i8) x;
(%o8) [0.1, 0.2700000000000001, 0.5913000000000002,
0.7249929299999999, 0.5981345443500454, 0.7211088336156269,
0.603332651091411, 0.7179670896552621, 0.6074710434816448,
0.7153499244388992]
Note that : is the assignment operator, not =.
I have to create a path between two given points in a grid in Prolog. The code I have so far is:
createPath(GridSize, BeginPosition, EndPosition, VisitedPoints, Path):-
    nextStep(BeginPosition, NextStep, GridSize),
    (
        NextStep \== EndPosition
    ->
        nonmember(NextStep, VisitedPoints),
        add(NextStep, VisitedPoints, NewVisitedPoints),
        add(NextStep, Path, NewPath),
        createPath(GridSize, NextStep, EndPosition, NewVisitedPoints, NewPath)
    ;
        ???
    ).
A little bit of explanation of my code:
GridSize is just an integer. If it is 2, the grid is a 2x2 grid. So all the grids are square.
The BeginPosition and EndPosition are shown like this: pos(X,Y).
The nextStep predicate looks for a valid neighbor of a given position. The values of X and Y have to be between 1 and the grid size. I've declared four different clauses of nextStep: X + 1, X - 1, Y + 1 and Y - 1.
This is the code:
nextStep(pos(X,Y), pos(X1,Y), GridSize):-
    X1 is X + 1,
    X1 =< GridSize.

nextStep(pos(X,Y), pos(X1,Y), _):-
    X1 is X - 1,
    X1 >= 1.

nextStep(pos(X,Y), pos(X,Y1), GridSize):-
    Y1 is Y + 1,
    Y1 =< GridSize.

nextStep(pos(X,Y), pos(X,Y1), _):-
    Y1 is Y - 1,
    Y1 >= 1.
nonmember returns true if a given element doesn't occur in a given list.
add adds an element to a given list, and returns the list with that element in it.
Another thing to know about VisitedPoints: Initially the BeginPosition and EndPosition are stored in that list. For example, if I want to find a path in a 2x2 grid, and I have to avoid point pos(2,1), then I will call the function like this:
createPath(2, pos(1,1), pos(2,2), [pos(1,1),pos(2,2),pos(2,1)], X).
The result I should get of it, should be:
X = [pos(1,2)]
Because that is the point needed to connect pos(1,1) and pos(2,2).
My question is, how can I stop the code from running when NextStep == EndPosition. In other words, what do I have to type at the location of the '???' ? Or am I handling this problem the wrong way?
I'm pretty new to Prolog, and making the step from object oriented languages to this is pretty hard.
I hope somebody can answer my question.
Kind regards,
Walle
I think you just placed the 'assignment' to Path in the wrong place:
createPath(GridSize, BeginPosition, EndPosition, VisitedPoints, Path):-
    nextStep(BeginPosition, NextStep, GridSize),
    (
        NextStep \== EndPosition
    ->
        nonmember(NextStep, VisitedPoints),
        add(NextStep, VisitedPoints, NewVisitedPoints),
        % add(NextStep, Path, NewPath),
        % createPath(GridSize, NextStep, EndPosition, NewVisitedPoints, NewPath)
        createPath(GridSize, NextStep, EndPosition, NewVisitedPoints, Path)
    ;
        % ???
        % bind the output variable on success; maybe add EndPosition
        Path = VisitedPoints
    ).
Maybe this is not entirely worth an answer, but a comment would be a bit 'blurry'