how does xgboost enforce monotonicity constraints

I would like to know how XGBoost enforces monotonic constraints while building the tree model. From reading the code, I understand that it has something to do with the weights of each node, but I cannot see why this approach works.
Thanks in advance for your answers

Here is simple pseudocode for the same feature in LightGBM:
min_value = node.min_value
max_value = node.max_value
check(min_value <= split.left_output)
check(min_value <= split.right_output)
check(max_value >= split.left_output)
check(max_value >= split.right_output)
mid = (split.left_output + split.right_output) / 2;
if (split.feature is monotonic increasing) {
    check(split.left_output <= split.right_output)
    node.left_child.set_max_value(mid)
    node.right_child.set_min_value(mid)
}
if (split.feature is monotonic decreasing) {
    check(split.left_output >= split.right_output)
    node.left_child.set_min_value(mid)
    node.right_child.set_max_value(mid)
}
Reference: https://github.com/Microsoft/LightGBM/issues/14#issuecomment-359752223.
I believe it's basically the same algorithm as the one implemented in XGBoost.
For each split candidate:
Check the values of both leaves against the monotonicity constraints propagated from predecessors.
Check the monotonicity between two leaves.
Reject the split if the monotonicity is broken.
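The steps above can be sketched in Python. This is only an illustration that mirrors the LightGBM pseudocode; it is not taken from the XGBoost source, and the names `Bounds` and `check_split` are hypothetical. A constraint of +1 means monotonic increasing, -1 decreasing, 0 unconstrained.

```python
import math
from dataclasses import dataclass


@dataclass
class Bounds:
    """Allowed output range for a node, inherited from its ancestors."""
    lo: float = -math.inf
    hi: float = math.inf


def check_split(bounds, left_out, right_out, constraint):
    """Return (left_bounds, right_bounds) if the split is allowed, else None."""
    # 1. Both leaf outputs must respect the bounds propagated from ancestors.
    for out in (left_out, right_out):
        if not (bounds.lo <= out <= bounds.hi):
            return None
    # 2. The two leaves must be ordered according to the constraint.
    if constraint > 0 and left_out > right_out:
        return None
    if constraint < 0 and left_out < right_out:
        return None
    # 3. Tighten the children's bounds around the midpoint so that deeper
    #    splits cannot cross it later.
    mid = (left_out + right_out) / 2
    if constraint > 0:
        return Bounds(bounds.lo, mid), Bounds(mid, bounds.hi)
    if constraint < 0:
        return Bounds(mid, bounds.hi), Bounds(bounds.lo, mid)
    return Bounds(bounds.lo, bounds.hi), Bounds(bounds.lo, bounds.hi)


root = Bounds()
accepted = check_split(root, 1.0, 3.0, +1)   # ordered correctly, accepted
rejected = check_split(root, 3.0, 1.0, +1)   # wrong order, rejected
```

The midpoint bound is what makes the constraint hold for the whole tree and not just for one split: every leaf in the left subtree stays at or below `mid`, and every leaf in the right subtree stays at or above it.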

Related

Recommend a good heuristic for longest Hamiltonian path in polynomial time

I have a complete weighted graph with 1000 nodes and need to find the longest possible Hamiltonian path in the graph (the sequence of nodes, to be more precise). I am supposed to fit in 5 sec (for Java), the memory limit is big enough.
Finding the longest Hamiltonian path doesn't look much different from finding solution for TSP (travelling salesman). Of course, an optimal solution is out of question, so I'm looking for a good heuristic.
My best solution so far is using the Nearest Neighbour algorithm, which is easy to implement and runs in polynomial time (takes ~0.7 seconds for 1000 nodes graph). It's a bit far from the optimal solution though.
So I'm looking for a better heuristic that still runs relatively fast.
I see mentioned Tabu Search, Simulated Annealing, Ant Colony, Genetics, Branch and Bound, MST based algorithms and others.
The problem is that, since their implementations are not exactly trivial, it's hard to work out their time complexity and decide which of them can fit in the 5-second time limit, i.e. run in polynomial time.
For some algorithms like Christofides' I see that the complexity is O(V^4), where V is the number of vertices, which apparently makes it impossible to fit.
I came across the Bitonic Tour solution, usually used for finding the shortest Hamiltonian path in Euclidean graphs, but seems kind of OK for finding the longest path in non-Euclidean graphs too:
public static void minCostTour(int[][] graph) {
    int n = graph.length;
    int[][] dp = new int[n][n];
    dp[0][1] = graph[0][1];
    for (int i = 0; i < n - 1; i++) {
        for (int j = i + 1; j < n; j++)
            if (i == (j - 1) && i != 0) {
                dp[i][j] = dp[0][j-1] + graph[0][j];
                for (int k = 1; k <= j - 2; k++)
                    if ((dp[k][j-1] + graph[k][j] < dp[i][j])) {
                        dp[i][j] = dp[k][j-1] + graph[k][j];
                    }
            } else if (i != 0 || j != 1) {
                dp[i][j] = dp[i][j-1] + graph[j-1][j];
            }
    }
    System.out.println("Optimal Tour Cost: " + (dp[n-2][n-1] + graph[n-2][n-1]));
}
The standard algorithm includes an initial sorting of coordinates, which I skipped, as apparently there are no coordinates to sort (the graph is non-Euclidean).
This dynamic programming solution runs in O(V^2) so it might be good.
The problem is that it outputs the Hamiltonian path length and I need the sequence of nodes. I can't really understand how to restore the path from the above algorithm (if possible at all).
TL;DR version:
Can the Bitonic Tour algorithm above be used for finding the sequence of nodes on the longest Hamiltonian path in a complete weighted graph?
If not, can you recommend an algorithm with similar (polynomial) time complexity for that task?
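For reference, the Nearest Neighbour baseline mentioned in the question can be sketched so that it returns the node sequence and not only the length. This is a minimal Python sketch, assuming the graph is a symmetric weight matrix and that "nearest" is flipped to "heaviest edge" because the path should be as long as possible; the function name `greedy_longest_path` is hypothetical. It runs in O(V^2) and is only a baseline, not a claim about how close it gets to the optimum.

```python
def greedy_longest_path(w, start=0):
    """Greedy Hamiltonian path: always move along the heaviest edge to an
    unvisited node. Returns (path, total_weight)."""
    n = len(w)
    visited = [False] * n
    visited[start] = True
    path = [start]
    total = 0
    for _ in range(n - 1):
        cur = path[-1]
        # Pick the unvisited node joined to `cur` by the heaviest edge.
        nxt = max((j for j in range(n) if not visited[j]),
                  key=lambda j: w[cur][j])
        total += w[cur][nxt]
        visited[nxt] = True
        path.append(nxt)
    return path, total


# Small symmetric example graph (weights are made up for illustration).
w = [
    [0, 1, 5, 2],
    [1, 0, 3, 4],
    [5, 3, 0, 6],
    [2, 4, 6, 0],
]
path, total = greedy_longest_path(w)
print(path, total)  # [0, 2, 3, 1] 15
```

Because the path is built node by node, the sequence is available directly, which is the part that is awkward to recover from the dp table above.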

z3 Optimize does not produce result where Solver produces one

I'm writing a function that identifies when a dropped object will hit the ground. The object starts at some height y0 with some initial velocity vy0. The model takes gravitational acceleration (9.81 m/s^2) into account and tries to determine a dt at which y == 0.
The code (below) works fine for determining when the mass hits the ground. However, I found out the hard way that I have to use Solver and not Optimize, which returns unknown. Can somebody explain the reason for this?
Shouldn't Optimize also be able to find a solution? (Obviously there is only one in this scenario.)
Here's my code:
from z3 import *
vy0, y0 = Reals("vy0 y0") # initial velocity, initial position
dt, vy, y = Reals("dt vy y") # time (s), velocity, position
solver = Solver() # Optimize doesn't work, but why?
solver.add(vy0 == 0)
solver.add(y0 == 3)
solver.add(dt >= 0) # time must not be negative
solver.add(vy == vy0 - 9.81 * dt) # the velocity is initial - acceleration * time
solver.add(y == y0 + vy/2 * dt) # position changes with the average velocity, which is vy/2 because vy0 == 0
solver.add(y == 0)
# solver.minimize(dt) # Optimize doesn't work, but why?
print(solver.check())
print(solver.model())

Z3's optimizer only works with linear constraints. You have a non-linear term (due to the multiplication involving vy and dt), and thus the optimizing solver gives up with Unknown.
The satisfiability solver, however, can deal with non-linear real constraints; and hence has no problem giving you a model.
For more on non-linear optimization in Z3, simply search for that term. You'll see many folks asking similar questions! Here's one example: z3Opt optimize non-linear function using qfnra-nlsat
(Note that nonlinear optimization is a significantly harder problem for reals as opposed to pure satisfiability. So, it's not just a matter of "not having implemented it yet.")
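To see where the non-linearity comes from, it may help to substitute the velocity constraint into the position constraint, using the values from the question (vy0 = 0, y0 = 3). This is a worked sketch of the arithmetic only, not a statement about how Z3 processes the constraints internally:

```latex
\begin{align*}
y &= y_0 + \frac{v_y}{2}\,dt
   = y_0 + \frac{v_{y0} - 9.81\,dt}{2}\,dt
   = 3 - 4.905\,dt^2 \\
y = 0 \;&\Rightarrow\; dt = \sqrt{\frac{3}{4.905}} \approx 0.782 \text{ s} \quad (dt \ge 0)
\end{align*}
```

The product of vy and dt becomes a dt^2 term, so the constraint set is quadratic in dt and not linear.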

optimal separating hyperplane objective function confusion

Chapter 4.5.2 of Elements of Statistical Learning
I don't understand what this means:
"Since for any β and β0 satisfying these inequalities, any positively scaled
multiple satisfies them too, we can arbitrarily set ||β|| = 1/M."
Also, how does maximizing M become minimizing 1/2 ||β||^2?

"Since for any β and β0 satisfying these inequalities, any positively scaled multiple satisfies them too, we can arbitrarily set ||β|| = 1/M."
The constraints are
y_i(x_i' b + b0) >= M ||b||
so for any c > 0 (multiplying both sides by c)
y_i(x_i' [cb] + [c b0]) >= M ||cb||
Thus you can always find a c such that ||cb|| = 1/M, so we can restrict attention to those b that have this norm. We are simply limiting the space of possible solutions, which is safe because scaling b and b0 by a positive constant does not change the hyperplane.
"Also, how does maximizing M become minimizing 1/2 ||β||^2?"
We set ||b|| = 1/M, thus M = 1/||b|| and
max_b M = max_b 1 / ||b||
Maximizing a positive f(b) is equivalent to minimizing 1/f(b), so this is the same as
min_b ||b||
and since ||b|| is positive, minimizing it is equivalent to minimizing its square, and also to minimizing the square multiplied by 1/2 (neither change moves the optimal b):
min_b 1/2 ||b||^2
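The whole chain can be written out in one place. This is a sketch in the book's notation (β, β0, M), following the steps of the answer above:

```latex
\begin{align*}
&\max_{\beta,\beta_0} M
  \quad \text{s.t. } y_i(x_i^T\beta + \beta_0) \ge M\|\beta\| \;\; \forall i \\
&\text{scale invariance: } y_i\bigl(x_i^T(c\beta) + c\beta_0\bigr) \ge M\|c\beta\|
  \quad \text{for any } c > 0 \\
&\text{fix the scale: } \|\beta\| = \tfrac{1}{M}
  \;\Rightarrow\; y_i(x_i^T\beta + \beta_0) \ge 1, \quad M = \tfrac{1}{\|\beta\|} \\
&\max_{\beta,\beta_0} \frac{1}{\|\beta\|}
  \;\Longleftrightarrow\; \min_{\beta,\beta_0} \|\beta\|
  \;\Longleftrightarrow\; \min_{\beta,\beta_0} \tfrac{1}{2}\|\beta\|^2
  \quad \text{s.t. } y_i(x_i^T\beta + \beta_0) \ge 1 \;\; \forall i
\end{align*}
```

With the scale fixed, the right-hand side M||β|| of the constraint becomes exactly 1, which is why the final problem no longer mentions M.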

Normalize a feature in this table

This has become quite a frustrating question, but I've asked in the Coursera discussions and they won't help. Below is the question:
I've gotten it wrong 6 times now. How do I normalize the feature? Hints are all I'm asking for.
I'm assuming x_2^(2) is the value 5184, unless I am adding the x_0 column of 1's, which they don't mention but he certainly mentions in the lectures when talking about creating the design matrix X. In which case x_2^(2) would be the value 72. Assuming one or the other is right (I'm playing a guessing game), what should I use to normalize it? He talks about 3 different ways to normalize in the lectures: one using the maximum value, another with the range/difference between max and mins, and another the standard deviation -- they want an answer correct to the hundredths. Which one am I to use? This is so confusing.

...use both feature scaling (dividing by the
"max-min", or range, of a feature) and mean normalization.
So for any individual feature f:
f_norm = (f - f_mean) / (f_max - f_min)
e.g. for x2, (midterm exam)^2 = {7921, 5184, 8836, 4761}
> x2 <- c(7921, 5184, 8836, 4761)
> mean(x2)
6675.5
> max(x2) - min(x2)
4075
> (x2 - mean(x2)) / (max(x2) - min(x2))
0.306 -0.366 0.530 -0.470
Hence norm(5184) = -0.366
(using R language, which is great at vectorizing expressions like this)
I agree it's confusing they used the notation x2 (2) to mean x2 (norm) or x2'
EDIT: in practice people often call R's built-in scale(...) function. Note that by default it centers by the mean and divides by the standard deviation, not by the range, so it will not reproduce the numbers above.
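The same calculation as a short Python sketch, for readers who don't use R. The function name `mean_normalize` is made up for this example, and it assumes the feature is not constant (otherwise the range is zero).

```python
def mean_normalize(values):
    """Mean normalization with range scaling: (v - mean) / (max - min)."""
    mean = sum(values) / len(values)
    spread = max(values) - min(values)  # assumed non-zero
    return [(v - mean) / spread for v in values]


x2 = [7921, 5184, 8836, 4761]  # (midterm exam)^2 column from the table
normalized = mean_normalize(x2)
print([round(v, 3) for v in normalized])  # [0.306, -0.366, 0.53, -0.47]
```

The second entry, -0.366 (rounded to -0.37 for an answer to the hundredths), corresponds to the raw value 5184.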

It's asking to normalize the second feature under second column using both feature scaling and mean normalization. Therefore,
(5184 - 6675.5) / 4075 = -0.366

Usually we normalize all of them to have zero mean and go between [-1, 1].
You can do that easily by dividing by the maximum of the absolute value and then remove the mean of the samples.

"I'm assuming x_2^(2) is the value 5184" is this because it's the second item in the list and using the subscript _2? x_2 is just a variable identity in maths, it applies to all rows in the list. Note that the highest raw mid-term exam result (i.e. that which is not squared) goes down on the final test and the lowest raw mid-term result increases the most for the final exam result. Theta is a fixed value, a coefficient, so somewhere your normalisation of x_1 and x_2 values must become (EDIT: not negative, less than 1) in order to allow for this behaviour. That should hopefully give you a starting basis, by identifying where the pivot point is.

I had the same problem, in my case the thing was that I was using as average the maximum x2 value (8836) minus minimum x2 value (4761) divided by two, instead of the sum of each x2 value divided by the number of examples.

For the same training set, I got the question as
Q. What is the normalized feature x^(3)_1?
Thus, 3rd training ex and 1st feature makes out to 94 in above table.
Now, normalized form is
x = (x - mean(x's)) / range(x)
Values are :
x = 94
mean = (89 + 72 + 94 + 69) / 4 = 81
range = 94 - 69 = 25
Normalized x = (94 - 81) / 25 = 0.52

I'm taking this course at the moment, and a really trivial mistake I made the first time I answered this question was using a comma instead of a dot in the answer, since I did it by hand and in my country we use a comma as the decimal separator (e.g. 0,52 instead of 0.52).
The second time I tried, I used a dot and it worked fine.

Minimum and maximum values of integer variable

Let's assume a very simple constraint: solve(x > 0 && x < 5).
Can Z3 (or any other SMT solver, or any other automatic technique)
compute the minimum and maximum values of (integer) variable x that satisfies the given constraints?
In our case, the minimum is 1 and the maximum is 4.

Z3 has no support for optimizing (maximizing/minimizing) objective functions or variables.
We plan to add this kind of capability, but it will not happen this year.
In the current version, we can "optimize" an objective function by solving several problems where in each iteration we add additional constraints. We know we found the optimal when the problem becomes unsatisfiable. Here is a small Python script that illustrates the idea. The script maximizes the value of a variable X. For minimization, we just have to replace s.add(X > last_model[X]) with s.add(X < last_model[X]). This script is very naive, it performs a "linear search". It can be improved in many ways, but it demonstrates the basic idea.
You can also try the script online at: http://rise4fun.com/Z3Py/KI1
See the following related question: Determine upper/lower bound for variables in an arbitrary propositional formula
from z3 import *
# Given formula F, find the model that maximizes the value of X
# using at most M iterations.
def max(F, X, M):
    s = Solver()
    s.add(F)
    last_model = None
    i = 0
    while True:
        r = s.check()
        if r == unsat:
            if last_model is not None:
                return last_model
            else:
                return unsat
        if r == unknown:
            raise Z3Exception("failed")
        last_model = s.model()
        # Ask for a strictly larger value of X in the next iteration.
        s.add(X > last_model[X])
        i = i + 1
        if (i > M):
            raise Z3Exception("maximum not found, maximum number of iterations was reached")
x, y = Ints('x y')
F = [x > 0, x < 10, x == 2*y]
print(max(F, x, 10000))
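One of the possible improvements over the linear search is a binary search, provided lower and upper bounds on X are known in advance. The sketch below is solver-agnostic: `is_sat_at_least` is a hypothetical callback standing in for a solver call that checks F together with X >= v, and the toy `oracle` simply enumerates the example formula by hand so that the snippet runs without Z3.

```python
from typing import Callable, Optional


def max_by_bisection(is_sat_at_least: Callable[[int], bool],
                     lo: int, hi: int) -> Optional[int]:
    """Largest v in [lo, hi] for which is_sat_at_least(v) holds, or None.

    The callback must be monotone: true up to the maximum, false above it.
    """
    if not is_sat_at_least(lo):
        return None
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the interval always shrinks
        if is_sat_at_least(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def oracle(v: int) -> bool:
    """Toy stand-in for: F = [x > 0, x < 10, x == 2*y] plus x >= v."""
    return any(x >= v for x in range(1, 10) if x % 2 == 0)


best = max_by_bisection(oracle, 0, 10000)
print(best)  # 8
```

This needs about log2(hi - lo) checks where the linear search may need one check per candidate value, at the cost of requiring the bounds up front.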

As Leonardo pointed out, this was discussed in detail before: Determine upper/lower bound for variables in an arbitrary propositional formula. Also see: How to optimize a piece of code in Z3? (PI_NON_NESTED_ARITH_WEIGHT related).
To summarize, one can either use a quantified formula, or go iteratively. Unfortunately, these techniques are not equivalent:
Quantified approach needs no iteration, and can find global min/max in a single call to the solver; at least in theory. However, it does give rise to harder formulas. So, the backend solver can time-out, or simply return "unknown".
Iterative approach creates simple formulas for the backend solver to deal with, but it can loop forever if there's no optimal value; simplest example being trying to find the largest Int value. Quantified version can solve this problem nicely by quickly telling you that there is no such value, while the iterative version would go on indefinitely. This can be a problem if you don't know ahead of time that your constraints do have an optimal solution. (Needless to say, the "sufficient" iteration count is typically hard to guess, and might depend on random factors, like the seed used by the solver.)
Also keep in mind that if there is a custom optimization algorithm for the problem domain at hand, it's unlikely that a general purpose SMT solver can outperform it.

z3 now supports optimization.
from z3 import *
o = Optimize()
x = Int( 'x' )
o.add(And(x > 0, x < 5))
o.maximize(x)
print(o.check()) # prints sat
print(o.model()) # prints [x = 4]
This particular problem is an integer program.
