What heuristic can I use for the Tower of Hanoi with greedy search?

I want to implement the Tower of Hanoi in Python with different search algorithms. One of the algorithms is Greedy or A*, which needs a heuristic function to work.
I can't think of any correct heuristic. Could someone suggest one?

For search algorithms like Greedy and A*, you can use the Manhattan distance as the heuristic. It measures the distance between two points n and m:
d(n, m) = |x_n − x_m| + |y_n − y_m|
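As a minimal sketch, the formula above can be written as a small Python function. It assumes each point is an (x, y) tuple; the function name is just illustrative.

```python
def manhattan(n, m):
    """Manhattan (L1) distance between two points given as (x, y) tuples."""
    (xn, yn), (xm, ym) = n, m
    return abs(xn - xm) + abs(yn - ym)


print(manhattan((1, 2), (4, 6)))  # 3 + 4 = 7
```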

Inverting feedforward neural network

I have been searching online for papers on inverting feedforward neural networks, and it turns out there are NLP and LP algorithms for inverting them. But most of the papers were interested in obtaining a one-to-many inverse mapping. I am wondering about this kind of problem:
Say I have a function z=x+y and I train my FFNN to approximate this function. Once it has been trained, I would like to take, for example, x and z as inputs and get y as output. So it is not exactly a one-to-many mapping, but is it any easier than the problem of having just z and wanting to compute x and y? Are there any algorithms for performing such a task?
To the best of my knowledge, methods that invert a network are usually adversarial methods (GANs) or work by optimizing the network's output (for example, minimizing |f(x, y') - z| over y', where y' is the unknown in your problem). The first method is more popular.
Let's look at the first method in more detail. Call the network that you trained to learn x + y = z network D. You then train a second network (call it G) that takes x and z and produces y, and D checks whether that is the correct answer (i.e. whether x + y = z). We continue until G learns to satisfy D (I have left some details out; you can learn more by studying GANs). However, this is more like reformulating the problem.
If you're familiar with how NNs work, you'll know that it's hard to train a network by fixing its desired output and part of its input, since we can't use backpropagation.
Finally, you might want to check out this paper. It does not give many technical details, but it proposes precisely what you asked for:
https://openreview.net/forum?id=BJxMQbQ3wm
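To illustrate the second (optimization) approach mentioned above, here is a rough sketch. It assumes the trained network is available as a function f(x, y); a plain x + y stands in for it here. It also replaces |f(x, y') - z| with the squared error so that simple gradient descent behaves well, and estimates the gradient by finite differences. All names and step sizes are made up for the example.

```python
def f(x, y):
    # Stand-in for the trained network D (assumption: it has learned x + y exactly).
    return x + y


def solve_for_y(x, z, y0=0.0, lr=0.1, steps=200, eps=1e-6):
    """Minimise (f(x, y) - z)**2 over y with finite-difference gradient descent."""
    y = y0
    for _ in range(steps):
        loss = (f(x, y) - z) ** 2
        loss_eps = (f(x, y + eps) - z) ** 2
        grad = (loss_eps - loss) / eps      # numerical estimate of d(loss)/dy
        y -= lr * grad
    return y


y = solve_for_y(x=2.0, z=5.0)
print(round(y, 3))
```

With a real network, the gradient with respect to y would normally come from the framework's automatic differentiation rather than finite differences.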

Feature Selection or PCA?

I'm having the below Azure Machine Learning question:
You need to identify which columns are more predictive by using a
statistical method. Which module should you use?
A. Filter Based Feature Selection
B. Principal Component Analysis
I chose A, but the answer is B. Can someone explain why it is B?
PCA is the optimal approximation of a random vector (in N-dimensional space) by a linear combination of M (M < N) vectors. We obtain these vectors by taking the M eigenvectors with the largest eigenvalues. Thus these vectors (features) can be (and usually are) combinations of the original features.
Filter Based Feature Selection chooses the best features as they are (not combining them in any way) based on various scores and criteria.
So, as you can see, PCA results in better features, since it creates a better set of features, while FBFS merely finds the best subset.
hope that helps ;)
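To make the "choosing features as they are" part concrete, here is a small sketch of a filter-style ranking that scores each original column by its absolute Pearson correlation with the target. It is only an illustration of the idea, not the Azure module's implementation, and the column names and data are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_columns(columns, target):
    """Column names sorted from most to least correlated with the target."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


columns = {
    "a": [1, 2, 3, 4, 5],   # moves exactly with the target
    "b": [2, 1, 4, 3, 5],   # loosely related
    "c": [3, 3, 3, 3, 3],   # constant, carries no information
}
target = [2, 4, 6, 8, 10]
print(rank_columns(columns, target))
```

The ranking keeps the original columns untouched, whereas PCA would return new axes that mix them.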

Distance measure for categorical attributes for k-Nearest Neighbor

For my class project, I am working on the Kaggle competition - Don't get kicked
The project is to classify test data as good/bad buy for cars. There are 34 features and the data is highly skewed. I made the following choices:
Since the data is highly skewed, out of 73,000 instances, 64,000 instances are bad buy and only 9,000 instances are good buy. Since building a decision tree would overfit the data, I chose to use kNN - K nearest neighbors.
After trying out kNN, I plan to try out Perceptron and SVM techniques, if kNN doesn't yield good results. Is my understanding about overfitting correct?
Since some features are numeric, I can directly use the Euclidean distance as a measure, but there are other attributes which are categorical. To make proper use of these features, I need to come up with my own distance measure. I read about Hamming distance, but I am still unclear on how to merge two distance measures so that each feature gets equal weight.
Is there a way to find a good approximation for the value of k? I understand that this depends a lot on the use case and varies per problem. But if I am taking a simple vote from each neighbor, what should I set k to? I'm currently trying out various values, such as 2, 3, 10, etc.
I searched around and found these links, but they are not especially helpful:
a) Metric for nearest neighbor, which says that finding out your own distance measure is equivalent to 'kernelizing', but couldn't make much sense from it.
b) Distance independent approximation of kNN talks about R-trees, M-trees etc. which I believe don't apply to my case.
c) Finding nearest neighbors using Jaccard coeff
Please let me know if you need more information.
Since the data is unbalanced, you should either sample an equal number of good/bad (losing lots of "bad" records), or use an algorithm that can account for this. I think there's an SVM implementation in RapidMiner that does this.
You should use Cross-Validation to avoid overfitting. You might be using the term overfitting incorrectly here though.
You should normalize distances so that they have the same weight. By normalize I mean force to be between 0 and 1. To normalize something, subtract the minimum and divide by the range.
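A rough sketch of that normalization, combined with a Hamming-style term for the categorical attributes, could look like this. It averages per-feature differences (rather than using Euclidean distance) so that every feature contributes a value between 0 and 1; the feature values below are invented.

```python
def min_max(values):
    """Rescale numbers to [0, 1]: subtract the minimum, divide by the range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def mixed_distance(a, b, numeric_idx, categorical_idx):
    """Average per-feature distance, so each feature weighs the same.

    Numeric features are assumed to be min-max normalised already;
    categorical features add 0 on a match and 1 on a mismatch (Hamming).
    """
    total = sum(abs(a[i] - b[i]) for i in numeric_idx)
    total += sum(a[i] != b[i] for i in categorical_idx)
    return total / (len(numeric_idx) + len(categorical_idx))


ages = min_max([2, 4, 10])          # hypothetical vehicle ages
car_a = (ages[0], "SUV", "red")
car_b = (ages[1], "SUV", "blue")
print(mixed_distance(car_a, car_b, numeric_idx=[0], categorical_idx=[1, 2]))
```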
The way to find the optimal value of K is to try all possible values of K (while cross-validating) and choose the value of K with the highest accuracy. If a "good" value of K is fine, then you can use a genetic algorithm or similar to find it. Or you could try K in steps of, say, 5 or 10, see which K leads to good accuracy (say it's 55), then try steps of 1 near that "good value" (i.e. 50, 51, 52...), but this may not be optimal.
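Here is a small sketch of that coarse-then-fine search, using leave-one-out cross-validation on a toy one-dimensional data set. The data, the step sizes, and the function names are all invented for the example; with real data you would plug in your own distance measure and larger steps.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query` (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out cross-validation accuracy for a given k."""
    hits = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, k) == label
    return hits / len(data)


def best_k(data, candidates):
    """Candidate k with the highest cross-validated accuracy (first one on ties)."""
    return max(candidates, key=lambda k: loo_accuracy(data, k))


# Toy data, made up for illustration.
data = [(0.1, "bad"), (0.2, "bad"), (0.3, "bad"), (0.35, "bad"),
        (0.7, "good"), (0.8, "good"), (0.9, "good")]
coarse = best_k(data, range(1, 6, 2))                        # coarse pass
fine = best_k(data, range(max(1, coarse - 1), coarse + 2))   # steps of 1 nearby
print(coarse, fine)
```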
I'm looking at the exact same problem.
Regarding the choice of k, an odd value is recommended to avoid "tie votes".
I hope to expand this answer in the future.

A*: Finding a better solution for 15-square puzzle with one given solution

Suppose we have a 15-square puzzle and solve it using A* search. The heuristic function is Manhattan distance.
Now someone provides a solution with cost T, and we are not sure whether this solution is optimal. With this information,
Is it possible to find a better solution with cost < T?
Is it possible to optimize the performance of the search algorithm?
For this question, I have considered several approaches.
h(x) = MAX_INT if g(x) >= T. That is, the f(x) value will be maximal if the cost so far is at least T.
Mark the search node as CLOSED if g(x) >= T.
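Both ideas amount to discarding any node with g(x) >= T. A minimal sketch of that pruning follows, on a small grid rather than the 15-puzzle to keep it short. The grid setup, the function name, and the `bound` parameter (playing the role of T) are assumptions for the example.

```python
import heapq


def astar_bounded(start, goal, walls, size, bound):
    """A* on a size x size grid that discards nodes whose g-cost reaches `bound`.

    Returns the cost of a path cheaper than `bound`, or None if none exists.
    """
    def h(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])   # Manhattan distance

    open_heap = [(h(start), 0, start)]
    best_g = {start: 0}
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node == goal:
            return g
        if g > best_g[node]:
            continue                      # stale heap entry
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size) or nxt in walls:
                continue
            ng = g + 1
            if ng >= bound:
                continue                  # cannot beat the known solution
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None


print(astar_bounded((0, 0), (2, 2), set(), 3, bound=6))  # 4: better than T = 6
print(astar_bounded((0, 0), (2, 2), set(), 3, bound=4))  # None: nothing beats T = 4
```

With an admissible heuristic, pruning on g(x) + h(x) >= T is also safe and cuts more nodes.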
Is it possible to find a better solution?
You need to know if T is the optimal solution. If you do not know the optimal solution, use the average cost; a good path is better than the average. If T is already better than average, you don't need to find a new path.
Is it possible to optimize the performance of the searching algorithm?
Yes. Heuristics are assumptions that help algorithms to make good decisions. The A* algorithm makes the following assumptions:
The best path costs the least (Dijkstra's Algorithm - stay near origin of search)
The best path is the most direct path (Greedy Search - minimize distance to goal)
Good heuristics vastly improve performance (A* is useful for this reason). Bad heuristics lead the search away from good solutions and obliterate performance. My advice is to know the game you are searching; in chess, it's generally best to avoid losing a queen, so that may be a good heuristic to use.
Heuristics will have the largest impact on performance, especially in the case of a 15x15 search space. In larger search spaces (2000x2000), good use of high efficiency data structures like arrays and integers may improve performance.
Potential solutions
Both the solutions you provide are effectively the same; if the path isn't as good as the other paths you have, ignore them. Search algorithms like A* do this for you, as j_random_hacker has said in a roundabout manner.
The OPEN list is a set of possible moves; select the best and ignore the rest. The CLOSED list is the set of moves that have already been selected, not the ones you wish to ignore.
(1) d(x) = Dijkstra's Algorithm
(2) g(x) = Greedy Search
(3) a*(x) = A* Algorithm = d(x) + g(x)
To make your A* more greedy (prefer suboptimal but fast solutions), multiply the cost of g(x) to favour a greedy search; (4) a*(x) = d(x) + 1.1 * g(x)
I actually tested this in a search space of 1500x2000. (3), a standard A*, took about 5 seconds to find the goal on the opposite side. (4) took only milliseconds to find the goal, demonstrating the value of using heuristics well.
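Variants (3) and (4) can be sketched as a single function with a weight on the greedy term. This is only an illustration on an empty grid with unit move costs, not the code used for the timing above; the names and the grid size are invented.

```python
import heapq


def weighted_astar(start, goal, size, weight=1.0):
    """Search an empty size x size grid; returns (path_cost, nodes_expanded).

    Priority is d + weight * g_est, where d is the cost so far (the Dijkstra
    term) and g_est is the Manhattan distance to the goal (the greedy term).
    """
    def greedy_term(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    heap = [(weight * greedy_term(start), 0, start)]
    best = {start: 0}
    expanded = 0
    while heap:
        _, d, node = heapq.heappop(heap)
        if d > best[node]:
            continue                      # stale heap entry
        expanded += 1
        if node == goal:
            return d, expanded
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size:
                nd = d + 1
                if nd < best.get(nxt, float("inf")):
                    best[nxt] = nd
                    heapq.heappush(heap, (nd + weight * greedy_term(nxt), nd, nxt))
    return None


plain = weighted_astar((0, 0), (19, 19), 20, weight=1.0)    # variant (3)
greedy = weighted_astar((0, 0), (19, 19), 20, weight=1.1)   # variant (4)
print(plain, greedy)
```

On an empty grid both runs find a path of the same cost, but the weighted run expands far fewer nodes. With obstacles, a weight above 1 can return a path that is not the shortest.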
You may also add other heuristics to A*, such as:
Depth-first search (prefer a greater number of moves)
Breadth-first (prefer a smaller number of moves)
Stick to Roads (if terrain determines movement speed, increase the cost of choosing bad terrain)
Stay out of enemy territory (if you want to avoid losing units, don't put them in harm's way)

How to do random embedded bracketing of elements

I'm writing a learning algorithm for automatic constituent bracketing. Since the algorithm starts from scratch, the (embedded) bracketing should be random at first. It is then improved through iterations. I'm stuck on how to do the random bracketing. Can you please suggest some code in R or Python, or give a programming idea (pseudocode)? I also need ideas on how to check a random bracketing against a proper one for correctness.
This is what I'm trying to finally arrive at, through the learning process, starting from random bracketing.
This is a sentence.
'He' 'chased' 'the' 'dog.'
Replacing each element with grammatical elements,
N, V, D, N.
Bracketing (first phase) (D, N are constituents):
(N) (V) (D N)
Bracketing (second phase):
(N) ((V) (D N))
Bracketing (third phase):
((N) ((V) (D N)))
Please help. Thank you.
Here's all I can say with the information provided:
A naive way for the bracketing would be to generate some trees (generating all can quickly get very space consuming), having as many leaves as there are words (or components), then selecting a suitable one (at random or according to proper partitioning) and apply it as bracketing pattern. For more efficiency, look for a true random tree generation algorithm (I couldn't find one at the moment).
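One cheap way to get a random tree without enumerating all of them is to split the token list at a random point and recurse. The sketch below does that and returns the bracketing as nested tuples. Note that it does not sample all trees with equal probability, and the function names are made up.

```python
import random


def random_bracketing(tokens, rng=random):
    """Return a random binary bracketing of `tokens` as nested tuples."""
    if len(tokens) == 1:
        return tokens[0]
    split = rng.randint(1, len(tokens) - 1)    # random split point
    return (random_bracketing(tokens[:split], rng),
            random_bracketing(tokens[split:], rng))


def leaves(tree):
    """Flatten a bracketing back into its token sequence."""
    if isinstance(tree, tuple):
        return [tok for child in tree for tok in leaves(child)]
    return [tree]


tree = random_bracketing(["N", "V", "D", "N"], random.Random(0))
print(tree)
```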
Additionally, I'd recommend reading about genetic algorithms/evolutionary programming, especially fitness functions (which are the "check random results for correctness" part). As far as I understood you, you want the program to detect ways of parsing and then keep them in memory as "learned". That quite matches a genetic algorithm with memorization of the "fittest" patterns (and only mutation as the changing factor).
An awesome, very elaborate (if working), but probably extremely difficult approach would be to use genetic programming. But that's probably too different from what you want.
And last, the easiest way to check correctness of bracketing imo would be to keep a table with the grammar/syntax rules and compare with them. You also could improve this to a better fitness function by keeping them in a tree and measuring the distance from the actual pattern ((V D) N) to the correct pattern (V (D N)). (which is just some random idea, I've never actually done this..)
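As a rough take on that distance idea, one option is to compare the sets of word spans that each bracketing groups together and count the ones that differ. This is just one possible measure, and the nested-tuple representation is an assumption.

```python
def spans(tree, start=0):
    """Return (set of (start, end) spans of all bracketed groups, end index)."""
    if not isinstance(tree, tuple):
        return set(), start + 1            # a single word covers one position
    found, pos = set(), start
    for child in tree:
        child_spans, pos = spans(child, pos)
        found |= child_spans
    found.add((start, pos))
    return found, pos


def bracket_distance(candidate, gold):
    """Number of bracketed groups present in one bracketing but not the other."""
    return len(spans(candidate)[0] ^ spans(gold)[0])


gold = ("V", ("D", "N"))       # (V (D N))
guess = (("V", "D"), "N")      # ((V D) N)
print(bracket_distance(guess, gold))   # 2
```

A distance of 0 means the two bracketings are identical, so it could double as a fitness score to minimize.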
