Are heuristic functions that produce negative values inadmissible? - a-star

As far as I understand, admissibility for a heuristic means never exceeding the actual cost from the evaluated node to the goal. I've had to design some heuristics for an A* search over state spaces, and I've seen large efficiency gains using a heuristic that may sometimes return negative values, so that certain nodes that are more 'closely formed' to the goal state are placed higher in the frontier.
However, I worry that this is inadmissible, but I can't find enough information online to verify it. I did find one paper from the University of Texas that seems to mention, in one of the later proofs, that "...since heuristic functions are nonnegative". Can anyone confirm this? I assume the reason is that a negative heuristic value would make the f-cost drop below the g-cost (and therefore interfere with the 'default' Dijkstra-esque behavior of A*).

Conclusion: Heuristic functions that produce negative values are not inadmissible, per se, but have the potential to break the guarantees of A*.
Interesting question. Fundamentally, the only requirement for admissibility is that a heuristic never over-estimates the distance to the goal. This is important, because an overestimate in the wrong place could artificially make the best path look worse than another path, and prevent it from ever being explored. Thus a heuristic that can provide overestimates loses any guarantee of optimality. Underestimating does not carry the same costs. If you underestimate the cost of going in a certain direction, eventually the edge weights will add up to be greater than the cost of going in a different direction, so you'll explore that direction too. The only problem is loss of efficiency.
If all of your edges have positive costs, a negative heuristic value can only ever be an underestimate. In theory, an underestimate should only ever be worse than a more precise estimate, because it provides strictly less information about the potential cost of a path and is likely to result in more nodes being expanded. Nevertheless, it will not be inadmissible.
However, here is an example that demonstrates that it is theoretically possible for negative heuristic values to break the guaranteed optimality of A*:
In this graph, it is obviously better to go through nodes A and B: that path has a cost of three, as opposed to six, the cost of going through nodes C and D. However, the negative heuristic values for C and D will cause A* to reach the end through them before exploring nodes A and B. In essence, the heuristic function keeps thinking that this path is going to get drastically better, until it is too late. In most implementations of A*, this will return the wrong answer, although you can correct for the problem by continuing to explore other nodes until the smallest value of f(n) in the frontier is greater than the cost of the path you found. Note that there is nothing inadmissible about this heuristic, since it never overestimates; it is necessarily inconsistent, though, because a consistent heuristic, even a negative one, still preserves A*'s optimality guarantee. I'm actually really surprised that non-negativity is not more frequently mentioned as a rule for A* heuristics.
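For concreteness, here is a small Python sketch of a graph of the kind described: the path S→A→B→G costs 3, the path S→C→D→G costs 6, and negative (but never overestimating) heuristic values on C, D, and G lure a standard A* implementation down the expensive path. The exact weights and heuristic values are illustrative assumptions, not taken from the original figure.

```python
import heapq

# Cheap path S-A-B-G (cost 3), expensive path S-C-D-G (cost 6).
graph = {
    'S': [('A', 1), ('C', 2)],
    'A': [('B', 1)],
    'B': [('G', 1)],
    'C': [('D', 2)],
    'D': [('G', 2)],
    'G': [],
}
# Every value is an underestimate (admissible), but the negative values
# on C, D, and G make f decrease along the expensive path.
h = {'S': 0, 'A': 0, 'B': 0, 'C': -10, 'D': -20, 'G': -10}

def a_star(start, goal):
    # Standard A*: stop as soon as the goal is popped from the frontier.
    frontier = [(h[start], 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for succ, cost in graph[node]:
            g2 = g + cost
            if g2 < best_g.get(succ, float('inf')):
                best_g[succ] = g2
                heapq.heappush(frontier, (g2 + h[succ], g2, succ, path + [succ]))
    return None

print(a_star('S', 'G'))  # (6, ['S', 'C', 'D', 'G']) -- not the optimal cost 3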
Of course, all that this demonstrates is that you can't freely use heuristics that return negative values without fear of consequences. It is entirely possible that a given heuristic for a given problem would happen to work out really well despite being negative. For your particular problem, it's unlikely that something like this is happening (and I find it really interesting that it works so well for your problem, and still want to think more about why that might be).

Related

How to leverage Z3 SMT solver for ILP problems

Problem
I'm trying to use z3 to disprove reachability assertions on a Petri net.
So I declare N state variables v_0, ..., v_{N-1}, which are non-negative integers, one for each place of the Petri net.
My main strategy, given an atomic proposition P on states, is the following:
1. Compute (with an exterior engine) any "easy" positive invariants as linear constraints on the variables, of the form alpha_0 * v_0 + ... = constant, with only positive or zero alpha_i. Then check_sat whether any state reachable under these constraints satisfies P; if unsat, conclude, else
2. Compute (externally to z3) generalized invariants, where the alpha_i can be negative as well, and check_sat; conclude if unsat, else
3. Add one non-negative variable t_i per transition of the system, and assert the Petri net state equation: any reachable state has a Parikh firing-count vector (a value for the t_i's) such that M0, the initial state, plus the product of the incidence matrix by this Parikh vector gives the reached state. This step introduces many new variables and involves multiplying variables by incidence-matrix constants, but it stays a linear integer programming problem (a sketch of this encoding follows below).
I separate the steps because, since I want UNSAT, any check_sat that returns UNSAT stops the procedure, and the last step in particular is very costly.
I have issues with larger models, where I get prohibitively long answer times or even the dreaded "unknown" answer, particularly when adding the state equation (step 3).
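To make step 3 concrete, here is a minimal z3py sketch of the state-equation encoding. The 2-place, 2-transition net (incidence matrix, initial marking, and property) is a made-up assumption, just to show the shape of the constraints:

```python
from z3 import Int, Solver, Sum

# Made-up net: incidence matrix C (places x transitions), initial marking m0.
C = [[1, -1],
     [-1, 1]]
m0 = [1, 0]

places = [Int(f'v{i}') for i in range(len(m0))]
parikh = [Int(f't{j}') for j in range(len(C[0]))]

s = Solver()
s.add([v >= 0 for v in places])   # markings are non-negative
s.add([t >= 0 for t in parikh])   # firing counts are non-negative

# State equation: v = m0 + C * t (constants times variables, so linear).
for i, row in enumerate(C):
    s.add(places[i] == m0[i] + Sum([row[j] * parikh[j] for j in range(len(row))]))

# The atomic proposition P to disprove, e.g. "place 0 holds >= 2 tokens".
s.add(places[0] >= 2)
print(s.check())  # unsat: no Parikh vector can reach such a state
```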
Background
So besides splitting the problem into incrementally harder segments, I've tried setting the logic to QF_LRA rather than QF_LIA, and declaring the variables as Reals rather than integers.
This overapproximation is computationally friendly (z3 is fast on these!), but unfortunately for many models the solutions found are fractional, and there is no corresponding integer solution.
So I've tried keeping Reals, but specifying that each variable is either = 0 or >= 1, to remove solutions with fractional firings below 1. This does eliminate spurious solutions, but it "kills" z3 (timeout or unknown) in many cases; the problem is apparently much harder this way (even harder than with plain integers).
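For reference, the encoding in question is just a per-variable disjunction, something like:

```python
from z3 import Real, Or

t = Real('t3')
semi_integral = Or(t == 0, t >= 1)  # rules out fractional firings below 1
```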
Examples
I don't have a small example to show, though I can produce some easily. The problem is that if I go for QF_LIA, it gets prohibitively slow beyond a certain number of variables. As a metric: there are many more transitions than places, so adding the state equation really drives up the variable count.
This code is generating the examples I'm asking about.
Slides 5 and 6 of this general presentation express precisely the problem I'm encoding, and slides 7 and 8 develop what "unsat" gives us, if you want more mathematical background.
I'm generating problems from the Model Checking Contest, with up to thousands of places (primary variables) and in some cases over a hundred thousand transitions. Those are the extremes; the middle range that I would really like to deal with is a few thousand places and maybe 20 thousand transitions.
Reals plus the greater-than-1 constraint are not a good solution, even for some smaller problems. Integers are slow from the get-go.
I could try Reals, then iterate into Integers if I get a non-integral solution. I have not tried that; though it involves pretty much killing and restarting the solver, it might be a decent approach on my benchmark set.
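Something like this is what I have in mind (a rough sketch; real_cs, int_cs, and real_vars stand in for my actual encoding over Real vs Int variables):

```python
from z3 import SolverFor, sat

def two_phase_check(real_cs, real_vars, int_cs):
    # Try the cheap QF_LRA relaxation first.
    relaxed = SolverFor('QF_LRA')
    relaxed.add(real_cs)
    res = relaxed.check()
    if res != sat:
        return res  # unsat already proves the property
    model = relaxed.model()
    if all(model[v].as_fraction().denominator == 1 for v in real_vars):
        return sat  # the relaxation found an integral witness
    # Otherwise fall back to QF_LIA in a fresh solver; nothing carries over.
    exact = SolverFor('QF_LIA')
    exact.add(int_cs)
    return exact.check()
```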
What I'm looking for
I'm looking for some settings for Z3 that can better help it deal with the problems I'm feeding it, give it some insight.
I have some a priori idea of what could solve these problems: traditionally they've been fed to ILP solvers. So I'm hoping to trigger a simplex of some sort, but maybe there are conditions preventing z3 from using the "good" solution strategy in some cases.
I've become a decent-level SMT/Z3 user, but I've never played with the fine-grained :options settings to guide the solver.
Have any of you tried feeding what are basically ILP problems to SMT, and found option settings or particular encodings that help it deploy the right solution strategy? Thanks.

Why is my p value too low with > 1,000,000 obs (p<<0.01)

I know that generally a low p value is good, since I want to reject the null hypothesis (H0). But my problem is an odd one, and I would appreciate any help or insight you may give me.
I work with huge data sets (n > 1,000,000), each representing data from one year. I am required to analyse the data and find out whether the mean of each year is significantly different from the mean of the previous year. Yet everyone would prefer the result to be non-significant rather than significant.
By "significant" I mean that I want to be able to tell my boss, "look, these non-significant changes are noise, while these significant changes represent something real to consider."
The problem is that simply comparing the two averages with a t-test always results in a significant difference, even when the difference is very, very small (probably due to the huge sample size) and well within the range that is acceptable in practice. So, the way I see it, a p value does not serve my needs well.
What do you think I should do?
There is nothing wrong with the p value. With this number of observations, even slight effects will be flagged as significant. You have rightly observed that the effect size for such a sample is very weak, and this undercuts any argument for using the p value alone to judge "significance": while the effect can be determined not to be due to chance, its actual usefulness in the real world is likely low, given that it doesn't produce anything predictable.
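To make that concrete, here is a quick sketch of computing an effect size (Cohen's d) alongside the sample means. The data below are simulated, just to show how a tiny effect stays tiny even when a t-test would flag it as significant:

```python
import numpy as np

def cohens_d(x, y):
    # Effect size for the difference between two sample means,
    # standardized by the pooled standard deviation.
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) +
                  (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

rng = np.random.default_rng(0)
year1 = rng.normal(100.0, 15.0, 1_000_000)
year2 = rng.normal(100.1, 15.0, 1_000_000)  # tiny real difference

print(cohens_d(year2, year1))  # ~0.007: "significant" at this n, but negligible
```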
For a comprehensive book on this subject, see Jacob Cohen's often-cited book on power analysis (Statistical Power Analysis for the Behavioral Sciences). You can also check out my recent post on Cross Validated regarding two regression models with significant p values for their predictors but radically different predictive power.

Genetic algorithm - shortest path in weighted graph

I want to make a genetic algorithm that solves the shortest-path problem in a weighted, connected graph. It is similar to the travelling salesman problem, except the graph is merely connected rather than fully connected.
My idea is to randomly generate, for each chromosome, a path of n-1 nodes in binary form, where the numbers indicate the nodes along the path. Then I would choose the best chromosomes by the sum of edge weights (if you can't go from A to B, I would apply a penalty), and crossover/mutate the bits. Will it work? It feels a little like a smaller version of brute force. Is there a better way?
Thanks!
A genetic algorithm is pretty much a "smaller version of brute force". It is just a metaheuristic, not an optimization method with decent convergence guarantees; it relies on randomness to propose new solutions, so it amounts to a "slightly better random search".
So "will it work"? Yes, it will do something, as long as you have enough randomness in mutation it will even (eventually) converge to optimum. Will it work better than a random search? Hard to say, this depends on dozens of factors, not only your encoding, but also all the hyperparameters used etc. in general genetic algorithms are about trials and errors. In particular representation of chromosomes which does not loose any information (yours does not) does not matter, meaning that everything depends on clever implementation of crossover and mutation (as long as chromosomes do not loose any information they are all equivalent).
You can use a permutation-coding GA. In permutation coding, you specify the start and end points, and the GA searches for the best chromosome under your fitness function. Candidate solutions (chromosomes) will look like 2-5-4-3-1, 2-3-1-4-5, 1-2-5-4-3, etc. So your solution depends on your fitness function. (Look at the GA package for R to apply a permutation GA easily.)
The connections are constraints for your problem. My best advice is to create a constraint matrix like this:
FirstPoint   SecondPoint   Connected
A            B             true
A            C             true
A            E             false
...          ...           ...
In the standard TSP, only distances are considered. In your fitness function, you also have to consider this matrix and add a penalty to the return value for each false.
Example chromosome: A-B-E-D-C
A-B: 1
B-E: 1
E-D: 4
D-C: 3
Fitness value: 9
Example chromosome: A-E-B-C-D
A-E: penalty
E-B: 1
B-C: 6
C-D: 3
Fitness value: 10 + penalty value.
Because your constraint is a hard constraint, you can use a very large penalty (the max integer value, if your fitness sums won't overflow). The GA will find the best solution. :)
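A minimal Python sketch of this fitness function, with the example weights above hard-coded as assumptions (the real weights come from your graph):

```python
# Large constant standing in for "max integer value" as the penalty.
PENALTY = 1_000_000

# Edge weights for connected pairs; any missing pair is disconnected.
weights = {('A', 'B'): 1, ('B', 'E'): 1, ('E', 'D'): 4, ('D', 'C'): 3,
           ('E', 'B'): 1, ('B', 'C'): 6, ('C', 'D'): 3}

def fitness(chromosome):
    # Sum edge weights along the path, penalizing each missing connection.
    total = 0
    for a, b in zip(chromosome, chromosome[1:]):
        total += weights.get((a, b), weights.get((b, a), PENALTY))
    return total

print(fitness(['A', 'B', 'E', 'D', 'C']))  # 9
print(fitness(['A', 'E', 'B', 'C', 'D']))  # 1000010 = 10 + penalty
```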

Is Q-learning without a final state even possible?

I have to solve this problem with Q-learning.
Well, actually I have to evaluate a Q-learning-based policy on it.
I am a tourist manager.
I have n hotels; each can contain a different number of persons.
For each person I put in a hotel, I get a reward, based on which room I have chosen.
If I want, I can also murder the person, so they go in no hotel but give me a different reward.
(OK, that's a joke... but it's to say that I can have a self-transition, so the number of people in my rooms doesn't change after that action.)
My state is a vector containing the number of persons in each hotel.
My action is a vector of zeroes and ones that tells me where I put the new person.
My reward matrix is formed by the rewards I get for each transition between states (including the self-transition).
Now, since I can take in an unlimited number of people (i.e. I can fill the hotels but keep killing new arrivals), how can I build the Q matrix? Without the Q matrix I can't get a policy, and so I can't evaluate it...
What am I seeing wrongly? Should I choose a random state as final? Have I missed the point entirely?
This question is old, but I think it merits an answer.
One of the issues is that there is not necessarily a notion of an episode, with a corresponding terminal state. Rather, this is a continuing problem: your goal is to maximize your reward forever into the future. In this case, there is a discount factor gamma, less than one, that essentially specifies how far you look into the future on each step. The return is specified as the cumulative discounted sum of future rewards. For episodic problems, it is common to use a discount of 1, with the return being the cumulative sum of future rewards until the end of the episode is reached.
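In symbols, the continuing-case return from step t is G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ... = sum over k >= 0 of gamma^k * R_{t+k+1}, with 0 <= gamma < 1 so the sum stays finite; in the episodic case the sum stops at the terminal step T, and gamma = 1 is then allowed.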
To learn the optimal Q, which is the expected return when following the optimal policy, you need a way to perform the off-policy Q-learning updates. If you are using sample transitions for the Q-learning updates, then you will have to specify a behavior policy that takes actions in the environment to generate those samples. To understand more about Q-learning, you should read the standard introductory RL textbook: "Reinforcement Learning: An Introduction" by Sutton and Barto.
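As a concrete (and hypothetical) illustration, here is what a tabular Q-learning loop for a continuing task can look like in Python. The env.step interface is an assumption: it returns (next_state, reward) and never terminates; epsilon-greedy action selection serves as the behavior policy:

```python
import random
from collections import defaultdict

def q_learning(env, actions, start_state, alpha=0.1, gamma=0.95,
               epsilon=0.1, steps=100_000):
    # Tabular action values; gamma < 1 keeps returns finite in a
    # continuing task with no terminal state.
    Q = defaultdict(float)
    state = start_state
    for _ in range(steps):
        # Epsilon-greedy behavior policy supplies the off-policy samples.
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state, reward = env.step(state, action)
        # Off-policy update toward the greedy target.
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state
    return Q
```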
RL problems don't need a final state per se. What they need is rewards. So, as long as you have some rewards, you are good to go, I think.
I don't have a lot of experience with RL problems like this one. As a commenter suggests, this sounds like a really huge state space. If you are comfortable with a discrete approach, you could get a good start, and learn something about your problem, by limiting its scope (a finite number of people and hotels/rooms) and turning Q-learning loose on the smaller state matrix.
Or you could jump right into a method that can handle an infinite state space, like a neural network.
In my experience, if you have the patience to try the smaller problem first, you will be better prepared to solve the bigger one next.
Maybe it isn't an answer to "is it possible?", but... read about R-learning. To solve this particular problem you may want to learn not only the Q- or V-function, but also rho, the expected reward per time step. Jointly learning Q and rho results in a better strategy.
To build on the responses above: with an infinite state space, you should definitely consider some form of generalization for your Q function, as you will get more value out of it that way. You could experiment with several different function approximations, anything from simple linear regression to a neural network.
Like Martha said, you will need a gamma less than one to account for the infinite horizon. Otherwise, you would be trying to compare policies whose returns all equal infinity, which means you would not be able to identify the optimal one.
The main thing I wanted to add here, for anyone reading this later, is the significance of reward shaping. In an infinite problem, where there isn't a final large reward, sub-optimal reward loops can occur, in which the agent gets "stuck" because a certain state has a reward higher than any of its neighbors within the effective horizon (which is defined by gamma). To account for that, you want to penalize the agent for landing in the same state multiple times, to avoid these suboptimal loops; one crude way is sketched below. Obviously, exploration is extremely important as well, and when the problem is infinite, some amount of exploration will always be necessary.
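A crude Python sketch of such a revisit penalty (base_reward is a hypothetical stand-in for the environment's own reward signal; note that shaping like this changes the problem the agent is solving, so use it with care):

```python
from collections import Counter

def base_reward(state, action):
    return 1.0  # hypothetical stand-in for the environment's own reward

visit_counts = Counter()

def shaped_reward(state, action, next_state):
    # Subtract a growing penalty each time the agent lands in the same
    # state again, discouraging the suboptimal loops described above.
    visit_counts[next_state] += 1
    return base_reward(state, action) - 0.1 * (visit_counts[next_state] - 1)
```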

When to use geometric vs arithmetic mean?

So I guess this isn't technically a code question, but it's something that I'm sure will come up for other folks as well as myself while writing code, so hopefully it's still a good one to post on SO.
The Google has directed me to plenty of nice lengthy explanations of when to use one or the other as regards financial numbers, and things like that.
But my particular context doesn't fit in, and I'm wondering if anyone here has some insight. I need to take a whole bunch of individual users' votes on how "good" a particular item is. I.e., some number of users each give a particular item a score between 0 and 10, and I want to report on what the 'typical' score is. What would be the intuitive reasons to report the geometric and/or arithmetic mean as the typical response?
Or, for that matter, would I be better off reporting the median instead?
I imagine there's some psychology involved in what the "best" method might be...
Anyway, there you have it.
Thanks!
Generally speaking, the arithmetic mean will suffice. It is much less computationally intensive than the geometric mean (which involves taking an n-th root).
As for the psychology involved, the geometric mean is never greater than the arithmetic mean, so arithmetic is the best choice if you'd prefer higher scores in general. (Note also that since votes can be 0, a single zero vote drags the geometric mean all the way down to zero.)
The median is most useful when the data set is relatively small and the chance of a massive outlier relatively high. Depending on how much precision these votes can take, the median can sometimes end up being a bit arbitrary.
If you really, really want the most precise summary possible, you could compute the arithmetic-geometric mean. However, this involves calculating both arithmetic and geometric means repeatedly, so it is very computationally intensive by comparison.
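For what it's worth, Python's standard library will compute all three, so it's cheap to just eyeball them on real vote data (the votes below are made up):

```python
import statistics

votes = [7, 8, 9, 10, 2]  # hypothetical scores on a 0-10 scale

print(statistics.mean(votes))            # arithmetic mean: 7.2
print(statistics.geometric_mean(votes))  # geometric mean: ~6.32
print(statistics.median(votes))          # median: 8
# Note: statistics.geometric_mean raises an error if any vote is 0.
```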
You want the arithmetic mean, since you aren't measuring an average rate of change or anything like that.
Arithmetic mean is correct.
Your scale is artificial:
It is bounded, from 0 to 10
8.5 is intuitively between 8 and 9
But for other scales, you would need to consider the correct mean to use.
Some other examples
In counting money, it has been argued that wealth has logarithmic utility. So the geometric mean of Bill Gates' wealth and that of a bum in the inner city would be the wealth of a moderately successful business person. (The arithmetic average would give you Larry Page.)
In measuring sound level, decibels are already logarithmic, so you can take the arithmetic average of decibel values.
But if you are measuring volume in watts, then use the quadratic mean (RMS).
The answer depends on the context and your purpose. Percent changes were mentioned as a good case for the geometric mean. I use the geometric mean when calculating antennas and frequencies, since the percentage change matters more than the average or the middle of the frequency range, or the average size of the antenna.
If you have wildly varying numbers, especially if most are similar but one or two are "flyers" (far from the range of the others), the geometric mean will "smooth" the results, not letting the outliers affect the result more than they should. This method is used to calculate bullet group sizes (the "flyer" was probably human error, not the equipment, so the plain average is "unfair" in that case).
A related method is the root mean square (RMS): square the numbers, take the mean of those squares, and then take the square root of the result. (Unlike the geometric mean, this emphasizes the larger values rather than smoothing them.) It is often used in electrical calculations, and most electrical meters report RMS rather than average readings. Hope this helps a little. Here is a web site that explains it pretty well: standardwisdom.com
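And a tiny sketch of RMS as just described, since it's easy to get the order of operations backwards:

```python
import math

def rms(xs):
    # Root mean square: square each value, average the squares, then sqrt.
    return math.sqrt(sum(x * x for x in xs) / len(xs))

print(rms([3.0, 4.0]))  # 3.535..., pulled above the plain mean of 3.5
```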
