Is there an algorithm that will find cycles as close to Hamiltonian as possible in O(v^2) time? I am running a program that needs to find cycles on a sparse graph (at most 4v edges), and according to my calculations I need O(v^2) or better. I understand that to operate in O(v^2) it would have to be a heuristic, and possibly not very accurate. Please tell me if this is impossible, as I have no idea whether it is possible.
I would like to improve the scalability of SMT solving. I have already implemented incremental solving, but I would like to improve things further. Are there any other general methods to improve scalability without knowledge of the problem itself?
There's no single "trick" that can make z3 scale better for an arbitrary problem. It really depends on what the actual problem is and what sort of constraints you have. Of course, this goes for any general computing problem, but it really applies in the context of an SMT solver.
Having said that, here are some general ideas based on my experience, roughly in order of ease of use:
Read the Programming Z3 book: This is a very nice write-up and will teach you a ton of things about how z3 is architected and what the best idioms are. You might hit something in there that directly applies to your problem: https://theory.stanford.edu/~nikolaj/programmingz3.html
Keep booleans as booleans, not integers: Never use integers to represent booleans (that is, using 1 for true and 0 for false, multiplication for and, and so on). This is a terrible idea that kills the powerful SAT engine underneath. Explicitly convert if necessary. Most problems where people tend to deploy such tricks involve counting how many booleans are true, etc.; such problems should be solved using the pseudo-boolean tactics built into the solver (look up PbEq, PbLe, etc., as in the sketch below).
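For example, here's a minimal z3py sketch (assuming the Python bindings) of counting with pseudo-boolean constraints instead of integer arithmetic; the variable names are made up:

```python
# Minimal z3py sketch (assumes the Python bindings: pip install z3-solver).
# Count how many booleans are true with PbEq/PbLe instead of integer arithmetic.
from z3 import Bools, PbEq, PbLe, Solver, sat

a, b, c, d = Bools("a b c d")
s = Solver()

# "Exactly two of a, b, c, d are true" -- stays inside the SAT engine.
s.add(PbEq([(a, 1), (b, 1), (c, 1), (d, 1)], 2))

# "At most one of a and b is true."
s.add(PbLe([(a, 1), (b, 1)], 1))

if s.check() == sat:
    print(s.model())
```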
Don't optimize unless absolutely necessary: The optimization engine is not incremental, nor is it well optimized (pun intended). It works rather slowly compared to all the other engines, and for good reason: optimization modulo theories is a very tricky thing to do. Avoid it unless you really have an optimization problem to tackle. You might also try to optimize "outside" the solver: make a satisfiability call, get the result, and make subsequent calls asking for "smaller" cost values. You may not hit the optimum using this trick, but the values might be good enough after a couple of iterations. Of course, how good the results will be depends entirely on your problem.
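A rough sketch of that outer loop, under the assumption that you have an integer-valued cost term of your own (minimize_by_iteration and the toy cost below are made-up names, not part of z3):

```python
# Hedged sketch of "optimizing outside the solver": repeatedly ask the plain
# Solver (not Optimize) for a solution with a strictly smaller cost.
from z3 import Int, Solver, sat

def minimize_by_iteration(constraints, cost, max_rounds=10):
    s = Solver()
    s.add(*constraints)
    best = None
    for _ in range(max_rounds):
        if s.check() != sat:
            break                      # no solution below the current bound
        best = s.model()
        s.add(cost < best.eval(cost))  # demand a strictly smaller cost next time
    return best                        # best model seen (may not be the true optimum)

# Example usage with a toy cost term:
x = Int("x")
print(minimize_by_iteration([x > 3, x < 100], x))
```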
Case split: Try reducing your constraints by case-splitting on key variables. Example: if you're dealing with floating-point constraints, do a case split on normal, denormal, infinity, and NaN values separately. Depending on your particular domain, you might have such semantic categories where the underlying algorithms take different paths, and mixing-and-matching them will always give the solver a hard time. Case splitting based on context can speed things up.
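One way to realize this, sketched below with a toy sign-based split (purely illustrative; your real cases would be domain categories like the floating-point classes above), is to use assumption literals with check():

```python
# Hedged sketch of case splitting: solve the same constraints separately under
# each case literal via assumption-based check() calls.
from z3 import Int, Bool, Solver, Implies, sat

x = Int("x")
s = Solver()
s.add(x * x < 100)                      # your real constraints go here

neg, zero, pos = Bool("neg"), Bool("zero"), Bool("pos")
s.add(Implies(neg, x < 0), Implies(zero, x == 0), Implies(pos, x > 0))

for case in (neg, zero, pos):
    if s.check(case) == sat:            # solve just this slice of the problem
        print(case, "->", s.model()[x])
```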
Use a faster machine and more memory: This goes without saying, but having plenty of memory can really speed up certain problems, especially if you have a lot of variables. Get the biggest machine you can!
Make use of your cores: You probably have a machine with many cores, and your operating system most likely provides fine-grained multi-tasking. Make use of this: start many instances of z3 working on the same problem but with different tactics, random seeds, etc., and take the result of the first one that completes. Random seeds can play a significant role if you have a huge constraint set, so running more instances with different seed values can get you "lucky" on average.
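A rough portfolio sketch along those lines, assuming a problem.smt2 file and the z3 binary on your PATH (seed values and worker counts are arbitrary):

```python
# Hedged sketch of a portfolio run: launch several z3 processes on the same
# SMT-LIB file with different random seeds and take the first answer.
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_with_seed(seed):
    # z3 accepts param=value pairs on the command line
    out = subprocess.run(["z3", f"smt.random_seed={seed}", "problem.smt2"],
                         capture_output=True, text=True)
    return seed, out.stdout.strip()

seeds = (1, 7, 42, 1234)
with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
    futures = [pool.submit(run_with_seed, s) for s in seeds]
    seed, answer = next(as_completed(futures)).result()   # first one to finish
    print(f"first answer (seed {seed}): {answer}")
    # note: the remaining runs are still awaited when the pool shuts down
```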
Try to use parallel solving: Most SAT/SMT solver algorithms are sequential in nature. There have been a number of papers on how to parallelize some of the algorithms, but most engines do not have parallel counterparts. z3 has an interface for parallel solving, though it's less advertised and rather finicky. Give it a try and see if it helps. Details are here: https://theory.stanford.edu/~nikolaj/programmingz3.html#sec-parallel-z3
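As a minimal starting point, and assuming the Python bindings, the parallel mode can be toggled through global parameters; whether it helps is very problem-dependent:

```python
# Hedged sketch: turning on z3's parallel mode from the Python API.
from z3 import set_param, Ints, Solver

set_param("parallel.enable", True)      # enable z3's parallel solving mode
set_param("parallel.threads.max", 8)    # cap the number of worker threads

x, y = Ints("x y")
s = Solver()
s.add(x > 0, y > x, x + y == 10)        # toy constraint set, just to exercise check()
print(s.check())
```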
Profile: Profile the z3 source code itself as it runs on your problem, and see where the hot-spots are. See if you can recommend code improvements to the developers to address these. (Better yet, submit a pull request!) Needless to say, this requires a deep study of z3 itself and is probably not suitable for end-users.
Bottom line: There's no free lunch. No single method will make z3 run better for your problems. But the ideas above might help improve run times. If you describe the particular problem you're working on in some detail, you'll most likely get better advice as it applies to your constraints.
I am working on a machine learning scenario where the target variable is the duration of power outages.
The distribution of the target variable is severely skewed right. (You can imagine that most power outages occur and are over fairly quickly, but there are many, many outliers that can last much longer.) A lot of these power outages become less and less 'explainable' by the data as the durations get longer. They become, more or less, 'unique outages', where events are occurring on site that are not necessarily 'typical' of other outages, and no data is recorded on the specifics of those events beyond what's already available for all other 'typical' outages.
This causes a problem when creating models: the unexplainable data mingles with the explainable parts and degrades the model's ability to predict.
I analyzed some percentiles to decide on a point that I considered to encompass as many outages as possible while still believing that the duration was going to be mostly explainable. This was somewhere around the 320-minute mark and contained about 90% of the outages.
This was completely subjective, though, and I know there has to be some kind of procedure for determining a 'best' cut-off point for this target variable. Ideally, this procedure would be robust enough to consider the trade-off of encompassing as much data as possible, and not tell me to make my cut-off 2 hours and thus cut out a significant number of customers, since the purpose is to provide an accurate Estimated Restoration Time to as many customers as possible.
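To illustrate the kind of procedure I'm after, here is a rough sketch of sweeping candidate cut-offs and recording coverage against how well a quick model explains the retained outages (the column names are hypothetical, and scikit-learn is used purely for illustration):

```python
# Hedged sketch: for each candidate quantile of the target, record (a) customer
# coverage and (b) cross-validated R^2 of a quick model on the retained data.
# `df`, the "duration" column, and `features` are hypothetical names.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def sweep_cutoffs(df, features, target="duration",
                  quantiles=(0.80, 0.85, 0.90, 0.95, 0.99)):
    rows = []
    for q in quantiles:
        cutoff = df[target].quantile(q)
        kept = df[df[target] <= cutoff]
        model = RandomForestRegressor(n_estimators=200, random_state=0)
        r2 = cross_val_score(model, kept[features], kept[target],
                             cv=5, scoring="r2").mean()
        rows.append({"quantile": q, "cutoff_minutes": cutoff,
                     "coverage": len(kept) / len(df), "cv_r2": r2})
    return pd.DataFrame(rows)

# Pick the point where coverage is still high but cv_r2 has not collapsed.
```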
FYI: The methods of modeling I am using that appear to be working the best right now are random forests and conditional random forests. Methods I have used in this scenario include multiple linear regression, decision trees, random forests, and conditional random forests. MLR was by far the least effective. :(
I have exactly the same problem! I hope someone more informed shares their knowledge. I wonder at what point a long duration is something we want to discard rather than something we want to predict!
Also, I tried log-transforming my data, and the density plot shows a funny artifact on the left side of the distribution (because I only have integer durations, not floats). I think this helps; you should also log-transform the features that have similar distributions.
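For what it's worth, a tiny sketch of the transform/back-transform step I mean (toy durations, numpy assumed):

```python
# Fit on log1p(duration) and invert predictions with expm1. The integer-valued
# durations are what produce the "comb" artifact on the left of the log-density.
import numpy as np

y = np.array([12, 30, 45, 60, 90, 240, 320, 1440])  # toy durations in minutes
y_log = np.log1p(y)            # train the model on this
y_back = np.expm1(y_log)       # invert model predictions back to minutes
print(np.allclose(y, y_back))  # True
```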
I eventually thought that the solution should be stratified sampling or giving weights to features, but I don't know exactly how to implement that. My attempts didn't produce any good results. Perhaps my data is too stochastic!
I'm looking to analyze and compare the following 'signals':
(Edit: better renderings here: oscillations good and here: oscillations bad)
What you see are plots of neuron activations from a type of artificial neural network, plotted against time. Each line in the plot is a neuron's activation over time, which can take a value between -1 and 1.
In the first plot, the activities are stable and consistent, while the second exemplifies more chaotic activity (for want of a better term): some kind of destructive interference seems to occur every so often.
Anyhow, I would like to do some kind of 'clever' analysis, but since signal analysis is really not my strong point, I thought I'd ask for some advice here...
EDIT: Let me clarify a bit. Ultimately, I would like to characterize the data. This could, for example, involve pinpointing correlations between the individual signals contained in each plot. I would like to measure 'regularity' or data invariance: in the above examples, the upper plot is more regular than the lower plot. I could therefore compute the variance of each signal and take that as a measure, but I was wondering whether some more comprehensive signal-processing technique could be better suited (I'm not sure). In fact, I'm not even sure signal processing is what I really want, now that I think about it. Perhaps some kind of wavelet or Fourier analysis...
For those interested, I am working on the computational modelling of worm locomotion.
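To make the 'variance and correlations' idea above concrete, here is a rough numpy sketch of what I had in mind (the activations array is synthetic, standing in for my real data):

```python
# Baseline sketch: per-neuron variance as a crude regularity measure, plus the
# pairwise correlation matrix between signals.
# `activations` is a synthetic (n_neurons, n_timesteps) array with values in (-1, 1).
import numpy as np

rng = np.random.default_rng(0)
activations = np.tanh(rng.standard_normal((10, 1000)).cumsum(axis=1) * 0.05)

per_neuron_variance = activations.var(axis=1)   # higher = less "flat"
mean_variance = per_neuron_variance.mean()      # one summary number per plot
corr = np.corrcoef(activations)                 # (10, 10) correlation matrix

print(mean_variance)
print(corr.round(2))
```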
You should consult some good books on nonlinear time series analysis. For instance, a measure of the regularity of your signal could be the Lyapunov spectrum. Another possibility would be entropy. If you are interested in the correlation between signals, you could use transfer entropy or Granger causality, or, for neurons, it would be good to look at some measure of phase synchronization. Bayesian methods could also be worth trying.
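As one concrete illustration of an entropy-style regularity measure (permutation entropy, chosen here purely as an example), a rough sketch:

```python
# Hedged sketch: normalized permutation entropy of a 1-D signal.
# Values near 0 indicate a very regular signal, values near 1 an irregular one.
import numpy as np
from math import factorial
from collections import Counter

def permutation_entropy(x, m=3, tau=1):
    x = np.asarray(x)
    n = len(x) - (m - 1) * tau
    # count ordinal patterns of length m
    counts = Counter(tuple(np.argsort(x[i:i + m * tau:tau])) for i in range(n))
    p = np.array(list(counts.values()), dtype=float) / n
    return float(-(p * np.log(p)).sum() / np.log(factorial(m)))

t = np.linspace(0, 20 * np.pi, 2000)
print(permutation_entropy(np.sin(t)))                                        # regular, low
print(permutation_entropy(np.random.default_rng(0).standard_normal(2000)))   # noisy, near 1
```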
But, most important, you first need a proper question about what you really want to know. Once you've got that, it is far easier to pick the right tool.
And one final hint: look for tools outside the engineering community. Their tools are mostly linear, but you are dealing with a highly nonlinear system. Wavelets, FFTs, and the like are useful if you don't know anything about your signal and want another perspective on it, but they are not well suited to your kind of problem.
Consider an optimization problem of some dimension n: given a set of linear equations (or inequalities) constraining the inputs, which together form a convex region, find the maximum/minimum value of an expression that is a linear combination of the inputs (or dimensions).
For larger dimensions, these optimization problems take a long time to solve exactly.
So, can we use machine learning techniques to get an approximate solution in less time?
If we can use machine learning techniques in this context, what should the training set look like?
Do you mean "How big should the training set be?" If so, then that is very much a "how long is a piece of string" question. It needs to be large enough for the algorithm being used, and to represent the data that is being modeled.
This doesn't strike me as being especially focused on machine learning, as is typically meant by the term anyway. It's just a straightforward constrained optimization problem. You say that it takes too long to find solutions now, but you don't mention how you're trying to solve the problem.
The simplex algorithm is designed for this sort of problem, but it's exponential in the worst case. Is that what you're trying that's taking too long? If so, there are tons of metaheuristics that might perform well. Tabu search, simulated annealing, evolutionary algorithms, variable depth search, even simple multistart hill climbers. I would probably try something along those lines before I tried anything exotic.
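For what it's worth, if the problem really is a plain LP, an off-the-shelf solver may get you further than you expect before any metaheuristics are needed; here's a minimal sketch using scipy's linprog (assuming scipy is available):

```python
# Hedged sketch: a linear objective with linear inequality constraints is an LP,
# which scipy's linprog (HiGHS backend) solves directly even at fairly large sizes.
import numpy as np
from scipy.optimize import linprog

# maximize  3*x0 + 2*x1   subject to  x0 + x1 <= 4,  x0 + 3*x1 <= 6,  x >= 0
c = np.array([-3.0, -2.0])          # linprog minimizes, so negate to maximize
A_ub = np.array([[1.0, 1.0],
                 [1.0, 3.0]])
b_ub = np.array([4.0, 6.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (0, None)], method="highs")
print(res.x, -res.fun)              # optimum at x = [4, 0], objective value 12
```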
I am working on testing several machine learning algorithm implementations, checking whether they work as efficiently as described in the papers and making sure they can offer real value to our statistical NLP (Natural Language Processing) platform.
Could you show me some methods for testing an algorithm implementation?
1) What aspects should I test?
2) How?
3) Do I have to follow some basic steps?
4) Do I have to consider situations specific to the different programming languages used?
5) Do I have to understand the algorithm? I mean, does it help if I really know what the algorithm is and how it works?
Basically, we are using C or C++ to implement the algorithms, and our working environment is Linux/Unix. Our testing methods only cover black-box testing and testing the input/output of functions. I am eager to improve them, but I don't have any better ideas right now...
Thanks!
For many machine learning and statistical classification tasks, the standard metrics for measuring quality are precision and recall. Most published algorithms will make some kind of claim about these metrics, or you could implement them and run the tests yourself. This should provide a good indicative measure of the quality you can expect.
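As a minimal illustration, precision and recall can be computed directly from a labelled test set; the toy labels below stand in for your gold standard and your implementation's output:

```python
# Minimal sketch of computing precision and recall from a labelled test set.
def precision_recall(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # gold-standard labels (toy data)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # labels produced by the implementation under test
print(precision_recall(y_true, y_pred))  # (0.75, 0.75)
```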
When you talk about efficiency of an algorithm, this is usually some statement about the time or space performance of an algorithm in terms of the size or complexity of its input (often expressed in Big O notation). Most published algorithms will report an upper bound on the time and space characteristics of the algorithm. You can use that as a comparative indicator, although you need to know a little bit about computational complexity in order to make sure you're not fooling yourself. You could also possibly derive this information from manual inspection of program code, but it's probably not necessary, because this information is almost always published along with the algorithm.
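One rough way to sanity-check a published complexity claim is to time the implementation on growing inputs and see how the runtime scales; in the sketch below, run_algorithm and make_input are placeholders for your own wrapper around the C/C++ binary and your test-data generator:

```python
# Hedged sketch of an empirical scaling check: time the implementation on
# inputs of increasing size and compare the growth against the claimed bound.
import time

def benchmark(run_algorithm, make_input, sizes=(1_000, 2_000, 4_000, 8_000)):
    for n in sizes:
        data = make_input(n)
        start = time.perf_counter()
        run_algorithm(data)
        elapsed = time.perf_counter() - start
        print(f"n={n:>6}  time={elapsed:.4f}s")
    # If the paper claims O(n log n), doubling n should a bit more than double
    # the time; quadrupling at each step points at O(n^2) behaviour instead.

# Example with a stand-in "algorithm" and input generator:
benchmark(sorted, lambda n: list(range(n, 0, -1)))
```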
Finally, understanding the algorithm is always a good idea. It makes it easier to know what you need to do as a user of that algorithm to ensure you're getting the best possible results (and indeed to know whether the results you are getting are sensible or not), and it will allow you to apply quality measures such as those I suggested in the first paragraph of this answer.