Building a MINLP Heuristic Model in Python - heuristics

I am building a MINLP model with around 200k decision variables and up to 100 constraints. I only have access to open-source solvers, namely BONMIN and COUENNE.
When I try to solve the problem, the solver keeps running for more than 2 hours.
I have been reading the BONMIN documentation, which lists various heuristic algorithms as options. Is there an options list I can pass to BONMIN that will trigger a heuristic algorithm and give me a suboptimal solution in ~15 minutes?
I am working with the Pyomo package.
Thanks in Advance!

See this section of the Pyomo documentation on sending options to a solver: https://pyomo.readthedocs.io/en/latest/working_models.html#sending-options-to-the-solver
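As a rough sketch of what that could look like for BONMIN: the snippet below collects a time limit and a few heuristic switches into a dict and passes them through the solver's `options` attribute in Pyomo. The option names (`bonmin.time_limit`, `bonmin.algorithm`, `bonmin.heuristic_feasibility_pump`, `bonmin.heuristic_dive_fractional`, `bonmin.heuristic_RINS`) are assumed from the BONMIN options list and should be checked against the installed version; the helper names `bonmin_heuristic_options` and `solve_with_bonmin` are made up for this example.

```python
def bonmin_heuristic_options(time_limit_s=900):
    """Build a BONMIN options dict (hypothetical helper).

    Option names are assumed from the BONMIN options list;
    verify them against the version you have installed.
    """
    return {
        "bonmin.time_limit": time_limit_s,           # stop after this many seconds
        "bonmin.algorithm": "B-BB",                  # NLP-based branch and bound
        "bonmin.heuristic_feasibility_pump": "yes",  # feasibility pump heuristic
        "bonmin.heuristic_dive_fractional": "yes",   # fractional diving heuristic
        "bonmin.heuristic_RINS": "yes",              # RINS improvement heuristic
    }


def solve_with_bonmin(model, options):
    """Solve a Pyomo model with BONMIN using the given options."""
    # Imported here so the option-building part runs without Pyomo installed.
    from pyomo.environ import SolverFactory

    solver = SolverFactory("bonmin")
    for name, value in options.items():
        solver.options[name] = value
    # tee=True streams the solver log so the incumbent can be watched.
    return solver.solve(model, tee=True)


# 15-minute budget, as asked for in the question.
opts = bonmin_heuristic_options(time_limit_s=15 * 60)
```

With a Pyomo model in hand, `solve_with_bonmin(model, opts)` would run the solve. Whether a feasible point is found within the time limit depends on the model, and for a nonconvex problem BONMIN gives no optimality guarantee either way.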

Related

Deep Learning Algorithm to Predict Bash Commands

I'm new to machine learning, and I want to develop an application that takes the data from multiple users' bash histories and predicts the next command of another user based on the commands the others have executed.
I searched a lot but didn't find any good answer. I'd appreciate help from ML experts who know of sample code for something similar, or who have useful comments, such as which algorithms I should look into.
Look into language modeling, which predicts the next word in a sequence given the words that precede it. You will probably want to work with RNN- or LSTM-based networks for this.
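To make the language-modeling idea concrete, here is a minimal sketch that swaps the RNN/LSTM for a much simpler count-based bigram model: it predicts the next command as the one that most often followed the current command in other users' histories. The sample histories and the function names are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(histories):
    """Count how often each command follows another across all histories."""
    follows = defaultdict(Counter)
    for history in histories:
        for prev_cmd, next_cmd in zip(history, history[1:]):
            follows[prev_cmd][next_cmd] += 1
    return follows


def predict_next(follows, last_cmd):
    """Return the most frequent successor of last_cmd, or None if unseen."""
    if last_cmd not in follows:
        return None
    return follows[last_cmd].most_common(1)[0][0]


# Toy histories (made up); real ones would be read from each user's history file.
histories = [
    ["cd project", "git status", "git add .", "git commit"],
    ["git status", "git add .", "git commit", "git push"],
    ["cd project", "ls", "git status", "git diff"],
]

model = train_bigram(histories)
print(predict_next(model, "git add ."))  # git commit
```

A neural language model plays the same role as `follows` here, but conditions on a longer window of previous commands rather than only the last one.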

Can anyone explain how to get BIDMach's Word2vec to work?

In a paper titled "Machine Learning at the Limit," Canny et al. report substantial word2vec processing speed improvements.
I'm working with the BIDMach library used in this paper, and cannot find any resource that explains how Word2Vec is implemented or how it should be used within this framework.
There are several scripts in the repo:
getw2vdata.sh
getwv2data.ssc
I've tried running them (after building the referenced tparse2.exe file) with no success.
I've tried modifying them to get them to run but have nothing but errors come back.
I emailed the author, and posted an issue on the github repo, but have gotten nothing back. I only got somebody else having the same troubles, who says he got it to run but at much slower speeds than reported on newer GPU hardware.
I've searched all over trying to find anyone that has used this library to achieve these speeds with no luck. There are multiple references floating around that point to this library as the fastest implementation out there, and cite the numbers in the paper:
Intel research references the reported numbers without running the code on GPU (they cite numbers reported in the original paper)
old reddit post pointing to BIDMach as the best (but the OP says "I haven't tested BIDMach myself yet")
SO post citing BIDMach as the best (OP doesn't actually run the library to make this claim...)
many more not worth listing citing BIDMach as the best/fastest without example or claims of "I haven't tested myself..."
When I search for a similar library (gensim) and the import code required to run it, I find thousands of results and tutorials, but a similar search for the BIDMach code yields only the BIDMach repo.
This BIDMach implementation certainly carries the reputation for being the best, but can anyone out there tell me how to use it?
All I want to do is run a simple training process to compare it to a handful of other implementations on my own hardware.
Every other implementation of this concept I can find either works with the original shell script test file, provides actual instructions, or provides shell scripts of its own to test.
UPDATE:
The author of the library has added additional shell scripts to get the previously mentioned scripts running, but exactly what they mean or how they work is still a total mystery and I can't understand how to get the word2vec training procedure to run on my own data.
EDIT (for bounty)
I'll give out the bounty to anyone who can explain how I'd use my own corpus (text8 would be great), train a model, and then save the output vectors and the vocabulary to files that can be read by Omer Levy's Hyperwords.
This is exactly what the original C implementation would do with arguments -binary 1 -output vectors.bin -save-vocab vocab.txt
This is also what Intel's implementation does, and other CUDA implementations, etc, so this is a great way to generate something that can be easily compared with other versions...
UPDATE (bounty expired without answer)
John Canny has updated a few scripts in the repo and added a fmt.txt file, making it possible to run the test scripts that are packaged in the repo.
However, my attempt to run this with the text8 corpus yields near 0% accuracy on the hyperwords test.
Running the training process on the billion word benchmark (which is what the repo scripts now do) also yields well-below-average accuracy on the hyperwords test.
So, either the library never yielded accuracy on these tests, or I'm still missing something in my setup.
The issue remains open on github.
BIDMach's Word2vec is a tool for learning vector representations of words, also known as word embeddings. To use Word2vec in BIDMach, you will need to first download and install BIDMach, which is an open-source machine learning library written in Scala. Once you have BIDMach installed, you can use the word2vec function to train a Word2vec model on a corpus of text data. This function takes a number of parameters, such as the size of the word vectors, the number of epochs to train for, and the type of model to use. You can find more detailed instructions and examples of how to use the word2vec function in the BIDMach documentation.

what software can learn the best structure of a neural network?

Is there software out there that optimises the best combination of learning rate, weight ranges, hidden layer structure, for a certain task? After presumably trying and failing different combinations? What is this called? As far as I can tell, we just do it manually at the moment...
I know this is not directly code related, but I am sure it will help many others too. Cheers.
This is a multivariate optimization problem: use an optimization algorithm and check the results. Particle Swarm Optimization would do it (though there are considerations to weigh before using this algorithm), as long as you have a cost function to optimize, for example the error rate of the network output.

Recommended local search optimization algorithm for control domain

Background: I am trying to find a list of floating point parameters for a low level controller that will lead to balance of a robot while it is walking.
Question: Can anybody recommend local search algorithms that will perform well in the domain I just described? My main criterion is the speed of convergence to the right solution.
Any help will be greatly appreciated!
P.S. I also did some research and found that "Evolutionary Strategy" algorithms are a good fit for continuous state spaces. However, I am not entirely sure they will suit my particular problem.
More info: I am trying to optimize 8 parameters (although I could reduce the number to 4). I do have a simulator, and my criterion is speed in number of trials, because simulation resets are costly (10-15 seconds on average).
One of the best local search algorithms for a low number of dimensions (up to about 10 or so) is the Nelder-Mead simplex method. It is the algorithm behind MATLAB's fminsearch function. I have personally used this method to find the parameters of a textbook 2nd or 3rd degree dynamic system (though a very simple one).
Another option is the evolutionary strategies already mentioned. Currently the best one is the Covariance Matrix Adaptation ES, or CMA-ES. There are variations on this algorithm, e.g. BIPOP-CMA-ES, that are probably better than the vanilla version.
You just have to try what works best for you.
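For reference, a compact pure-Python sketch of Nelder-Mead with an explicit cap on function evaluations, since each evaluation is a costly simulator trial. It is a simplified variant (inside contraction only) with the standard coefficients, and the two-parameter `cost` function is a stand-in for the simulator, not a real controller model.

```python
def nelder_mead(f, x0, step=0.5, max_evals=200, tol=1e-8):
    """Minimise f from x0; returns (best_x, best_f, evals_used)."""
    n = len(x0)
    evals = 0

    def ev(x):
        nonlocal evals
        evals += 1
        return f(x)

    # Initial simplex: x0 plus one point offset along each axis.
    points = [list(x0)]
    for i in range(n):
        p = list(x0)
        p[i] += step
        points.append(p)
    simplex = [(ev(p), p) for p in points]

    while evals < max_evals:
        simplex.sort(key=lambda t: t[0])
        f_best, f_worst = simplex[0][0], simplex[-1][0]
        if abs(f_worst - f_best) < tol:
            break
        worst = simplex[-1][1]
        # Centroid of all points except the worst.
        centroid = [sum(p[i] for _, p in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # Point centroid + t * (centroid - worst).
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        xr = along(1.0)                      # reflection
        fr = ev(xr)
        if fr < f_best:
            xe = along(2.0)                  # expansion
            fe = ev(xe)
            simplex[-1] = (fe, xe) if fe < fr else (fr, xr)
        elif fr < simplex[-2][0]:
            simplex[-1] = (fr, xr)
        else:
            xc = along(-0.5)                 # inside contraction
            fc = ev(xc)
            if fc < f_worst:
                simplex[-1] = (fc, xc)
            else:
                # Shrink every point halfway towards the best one.
                best = simplex[0][1]
                shrunk = [simplex[0]]
                for _, p in simplex[1:]:
                    q = [b + 0.5 * (x - b) for b, x in zip(best, p)]
                    shrunk.append((ev(q), q))
                simplex = shrunk

    simplex.sort(key=lambda t: t[0])
    return simplex[0][1], simplex[0][0], evals


def cost(params):
    # Stand-in for one simulator trial; minimum at (1, -2).
    x, y = params
    return (x - 1) ** 2 + (y + 2) ** 2


best_x, best_f, evals = nelder_mead(cost, [0.0, 0.0])
```

The `max_evals` argument is the knob that matters when trials are expensive: the search stops at the start of the first iteration after the budget is used up and returns the best point seen so far.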
In addition to evolutionary algorithms, I recommend you also look at reinforcement learning.
The right method depends a lot on the details of your problem. How many parameters? Do you have a simulator? Do you work in simulation only, or also with real hardware? Is speed measured in number of trials, or in CPU time?

Map Reduce Algorithms on Terabytes of Data?

This question does not have a single "right" answer.
I'm interested in running Map Reduce algorithms, on a cluster, on Terabytes of data.
I want to learn more about the running time of said algorithms.
What books should I read?
I'm not interested in setting up Map Reduce clusters, or running standard algorithms. I want rigorous theoretical treatments of running time.
EDIT: The issue is not that map reduce changes running time. The issue is -- most algorithms do not distribute well to map reduce frameworks. I'm interested in algorithms that run on the map reduce framework.
Technically, there's no real difference in the runtime analysis of MapReduce in comparison to "standard" algorithms - MapReduce is still an algorithm just like any other (or specifically, a class of algorithms that occur in multiple steps, with a certain interaction between those steps).
The runtime of a MapReduce job is still going to scale how normal algorithmic analysis would predict, when you factor in division of tasks across multiple machines and then find the maximum individual machine time required for each step.
That is, if you have a task which requires M map operations, and R reduce operations, running on N machines, and you expect that the average map operation will take m time and the average reduce operation r time, then you'll have an expected runtime of ceil(M/N)*m + ceil(R/N)*r time to complete all of the tasks in question.
Predicting the values of M, R, m, and r is something that can be accomplished with normal analysis of whatever algorithm you're plugging into MapReduce.
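The estimate above, ceil(M/N)*m + ceil(R/N)*r, is easy to put into a few lines. The task counts and timings below are hypothetical numbers chosen only to show the arithmetic.

```python
import math


def expected_runtime(M, R, N, m, r):
    """Expected runtime: ceil(M/N)*m + ceil(R/N)*r.

    M, R: number of map and reduce operations
    N:    number of machines
    m, r: average time per map and per reduce operation
    """
    return math.ceil(M / N) * m + math.ceil(R / N) * r


# Hypothetical job: 1000 map tasks of 30 s and 50 reduce tasks of 120 s on 40 machines.
print(expected_runtime(M=1000, R=50, N=40, m=30, r=120))  # 990
```

Here the map phase needs 25 waves of 30 s and the reduce phase 2 waves of 120 s, giving 750 + 240 = 990 seconds.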
There are only two published books that I know of, but more are in the works:
Pro Hadoop and Hadoop: The Definitive Guide
Of these, Pro Hadoop is more of a beginner's book, whilst The Definitive Guide is for those who already know what Hadoop is.
I own The Definitive Guide and think it's an excellent book. It provides good technical detail on how HDFS works, and covers a range of related topics such as MapReduce, Pig, Hive, HBase etc. It should also be noted that this book was written by Tom White, who has been involved with the development of Hadoop for a good while and now works at Cloudera.
As far as the analysis of algorithms goes on Hadoop you could take a look at the TeraByte sort benchmarks. Yahoo have done a write up of how Hadoop performs for this particular benchmark: TeraByte Sort on Apache Hadoop. This paper was written in 2008.
More details about the 2009 results can be found here.
There is a great book about data mining algorithms applied to the MapReduce model.
It was written by two Stanford professors and it is available for free:
http://infolab.stanford.edu/~ullman/mmds.html
