I am trying to improve the timing of my Z3 code. I have observed that Z3 always utilizes only one core, no matter how many cores my machine has. Is there a way in which Z3 might be able to use all the cores as it will certainly improve performance?
Related
With respect specifically to CatBoost:
Under what scenarios might one want to use fewer than the max number of threads of one's CPU? I cannot find an answer to this.
Is there a fixed cost/overhead associated with each core utilized? I.e., is more always better for all data set types/sizes?
Do the answers to the questions above generalize to all machine learning algorithms?
I think that most of the reasons for changing the thread_count are not catboost specific. Other libraries like sklearn offer the same feature. Reasons for not running with all CPUs are:
Debugging: If there is a problem it might be handy to only have one thread thus making the process more simple.
You want other processes on your machine to have CPU power. Especially if you have a server for in-memory data analysis shared by a team of data scientists. Your colleagues won't be happy if you take all resources.
Your job is so small that it simply does not need all the resources.
Your parallelize in another way: For example you try different hyper parameters using cross validation. Then it would make sense to dedicate one CPU to training one model rather than training a model with with all CPUs and then move on to train the next model with all CPUs
I hope this answers question 1. This generalizes to other in-memory ml libraries like sklearn.
Regarding question 2 I'm not sure. CatBoost does the parallelisation somewhere in its C++ Code and uses it via Cython in the Python package. I assume it introduces some overhead (since distributed computing always introduces overhead) but it's probably not too much. You could find out by timing some experiments.
I would like to improve the scalability of SMT solving. I have actually implemented the incremental solving. But I would like to improve more. Any other general methods to improve it without the knowledge of the problem itself?
There's no single "trick" that can make z3 scale better for an arbitrary problem. It really depends on what the actual problem is and what sort of constraints you have. Of course, this goes for any general computing problem, but it really applies in the context of an SMT solver.
Having said that, here're some general ideas based on my experience, roughly in the order of ease of use:
Read the Programming Z3 book This is a very nice write-up and will teach you a ton of things about how z3 is architected and what the best idioms are. You might hit something in there that directly applies to your problem: https://theory.stanford.edu/~nikolaj/programmingz3.html
Keep booleans as booleans not integers Never use integers to represent booleans. (That is, use 1 for true, 0 for false; multiplication for and etc. This is a terrible idea that kills the powerful SAT engine underneath.) Explicitly convert if necessary. Most problems where people tend to deploy such tricks involve counting how many booleans are true etc.: Such problems should be solved using the pseudo-boolean tactics that's built into the solver. (Look up pbEq, pbLt etc.)
Don't optimize unless absolutely necessary The optimization engine is not incremental, nor it is well optimized (pun intended). It works rather slowly compared to all other engines, and for good reason: Optimization modulo theories is a very tricky thing to do. Avoid it unless you really have an optimization problem to tackle. You might also try to optimize "outside" the solver: Make a SAT call, get the results, and making subsequent calls asking for "smaller" cost values. You may not hit the optimum using this trick, but the values might be good enough after a couple of iterations. Of course, how well the results will be depends entirely on your problem.
Case split Try reducing your constraints by case-splitting on key variables. Example: If you're dealing with floating-point constraints, say; do a case split on normal, denormal, infinity, and NaN values separately. Depending on your particular domain, you might have such semantic categories where underlying algorithms take different paths, and mixing-and-matching them will always give the solver a hard time. Case splitting based on context can speed things up.
Use a faster machine and more memory This goes without saying; but having plenty of memory can really speed up certain problems, especially if you have a lot of variables. Get the biggest machine you can!
Make use of your cores You probably have a machine with many cores, further your operating system most likely provides fine-grained multi-tasking. Make use of this: Start many instances of z3 working on the same problem but with different tactics, random seeds, etc.; and take the result of the first one that completes. Random seeds can play a significant role if you have a huge constraint set, so running more instances with different seed values can get you "lucky" on average.
Try to use parallel solving Most SAT/SMT solver algorithms are sequential in nature. There has been a number of papers on how to parallelize some of the algorithms, but most engines do not have parallel counterparts. z3 has an interface for parallel solving, though it's less advertised and rather finicky. Give it a try and see if it helps out. Details are here: https://theory.stanford.edu/~nikolaj/programmingz3.html#sec-parallel-z3
Profile Profile z3 source code itself as it runs on your problem, and see
where the hot-spots are. See if you can recommend code improvements to developers to address these. (Better yet, submit a pull request!) Needless to say, this will require a deep study of z3 itself, probably not suitable for end-users.
Bottom line: There's no free lunch. No single method will make z3 run better for your problems. But above ideas might help improve run times. If you describe the particular problem you're working on in some detail, you'll most likely get better advice as it applies to your constraints.
I am using z3py. I am trying to check the satisfiability for different problems with different sizes and verify the scalability of the proposed method. However, to do that I need to know the memory consumed by the solver for each problem. Is there a way to access the memory or make the z3py print it in the STATISTICS section. Thank you so much in advance.
Update-27/5/2015:
I tried with the paython memory profiler but it seems that the generated memory is very large. I am not sure but the reported memory is similar to the memory consumed by the python application and not only Z3(constructing the z3 model and checking the sat and then generating model). Moreover, I used formal modeling checking tools for many years now. I am expecting Z3 to be more efficient and have better scalability, however, I am getting much less memory than what the paython is generating.
What I am thinking to do is to try to measure the design size or the scalability using factors other than memory. In z3py statistics many details are generated to describe the design size and complexity. However, I am not able to find any explanation of these parameters in the tutorial, webpage, or z3 papers.
For example, can you help me understand the following parameters generated in the statistics for one of the basic models I have. Also is there any parameter/parameters which can replace the memory or be good indication of the Z3 mdoel size/complexity.
:added-eqs 152
:assert-lower 2
:assert-upper 2
:binary-propagations 59
:conflicts 6
:datatype-accessor-ax 12
:datatype-constructor-ax 14
:datatype-occurs-check 19
:datatype-splits 12
:decisions 35
:del-clause 2
:eq-adapter 2
:final-checks 1
:mk-clause 9
:offset-eqs 2
:propagations 61
Thank again for your time.
Z3 approximates the global maximum memory usage, but it does not have facilities for tracking the memory usage of one particular call to the API. In cases where the global memory usage decreases, this won't be enough (e.g., should it be negative?). I think that external tools like the Python memory_profiler would do a much better job.
The updated question is a recurring one, please see these earlier answers: Interpretation of Z3 Statistics, What is the unit of memory usage in Z3 statistics?, Z3 statistics: what does time measure?, Z3 real arithmetic and statistics, Statistics in Z3, How to get statistics in Z3 3.2?.
I have added statistics for memory consumption into the unstable branch.
So by now you should be able to access this information from the statistics
returned by Z3.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Improve this question
In the beginning, I would like to describe my current position and the goal that I would like to achieve.
I am a researcher dealing with machine learning. So far have gone through several theoretical courses covering machine learning algorithms and social network analysis and therefore have gained some theoretical concepts useful for implementing machine learning algorithms and feed in the real data.
On simple examples, the algorithms work well and the running time is acceptable whereas the big data represent a problem if trying to run algorithms on my PC. Regarding the software I have enough experience to implement whatever algorithm from articles or design my own using whatever language or IDE (so far have used Matlab, Java with Eclipse, .NET...) but so far haven't got much experience with setting-up infrastructure. I have started to learn about Hadoop, NoSQL databases, etc, but I am not sure what strategy would be the best taking into consideration the learning time constraints.
The final goal is to be able to set-up a working platform for analyzing big data with focusing on implementing my own machine learning algorithms and put all together into production, ready for solving useful question by processing big data.
As the main focus is on implementing machine learning algorithms I would like to ask whether there is any existing running platform, offering enough CPU resources to feed in large data, upload own algorithms and simply process the data without thinking about distributed processing.
Nevertheless, such a platform exists or not, I would like to gain a picture big enough to be able to work in a team that could put into production the whole system tailored upon the specific customer demands. For example, a retailer would like to analyze daily purchases so all the daily records have to be uploaded to some infrastructure, capable enough to process the data by using custom machine learning algorithms.
To put all the above into simple question: How to design a custom data mining solution for real-life problems with main focus on machine learning algorithms and put it into production, if possible, by using the existing infrastructure and if not, design distributed system (by using Hadoop or whatever framework).
I would be very thankful for any advice or suggestions about books or other helpful resources.
First of all, your question needs to define more clearly what you intend by Big Data.
Indeed, Big Data is a buzzword that may refer to various size of problems. I tend to define Big Data as the category of problems where the Data size or the Computation time is big enough for "the hardware abstractions to become broken", which means that a single commodity machine cannot perform the computations without intensive care of computations and memory.
The scale threshold beyond which data become Big Data is therefore unclear and is sensitive to your implementation. Is your algorithm bounded by Hard-Drive bandwidth ? Does it have to feet into memory ? Did you try to avoid unnecessary quadratic costs ? Did you make any effort to improve cache efficiency, etc.
From several years of experience in running medium large-scale machine learning challenge (on up to 250 hundreds commodity machine), I strongly believe that many problems that seem to require distributed infrastructure can actually be run on a single commodity machine if the problem is expressed correctly. For example, you are mentioning large scale data for retailers. I have been working on this exact subject for several years, and I often managed to make all the computations run on a single machine, provided a bit of optimisation. My company has been working on simple custom data format that allows one year of all the data from a very large retailer to be stored within 50GB, which means a single commodity hard-drive could hold 20 years of history. You can have a look for example at : https://github.com/Lokad/lokad-receiptstream
From my experience, it is worth spending time in trying to optimize algorithm and memory so that you could avoid to resort to distributed architecture. Indeed, distributed architectures come with a triple cost. First of all, the strong knowledge requirements. Secondly, it comes with a large complexity overhead in the code. Finally, distributed architectures come with a significant latency overhead (with the exception of local multi-threaded distribution).
From a practitioner point of view, being able to perform a given data mining or machine learning algorithm in 30 seconds is one the key factor to efficiency. I have noticed than when some computations, whether sequential or distributed, take 10 minutes, my focus and efficiency tend to drop quickly as it becomes much more complicated to iterate quickly and quickly test new ideas. The latency overhead introduced by many of the distributed frameworks is such that you will inevitably be in this low-efficiency scenario.
If the scale of the problem is such that even with strong effort you cannot perform it on a single machine, then I strongly suggest to resort to on-shelf distributed frameworks instead of building your own. One of the most well known framework is the MapReduce abstraction, available through Apache Hadoop. Hadoop can be run on 10 thousands nodes cluster, probably much more than you will ever need. If you do not own the hardware, you can "rent" the use of a Hadoop cluster, for example through Amazon MapReduce.
Unfortunately, the MapReduce abstraction is not suited to all Machine Learning computations.
As far as Machine Learning is concerned, MapReduce is a rigid framework and numerous cases have proved to be difficult or inefficient to adapt to this framework:
– The MapReduce framework is in itself related to functional programming. The
Map procedure is applied to each data chunk independently. Therefore, the
MapReduce framework is not suited to algorithms where the application of the
Map procedure to some data chunks need the results of the same procedure to
other data chunks as a prerequisite. In other words, the MapReduce framework
is not suited when the computations between the different pieces of data are
not independent and impose a specific chronology.
– MapReduce is designed to provide a single execution of the map and of the
reduce steps and does not directly provide iterative calls. It is therefore not
directly suited for the numerous machine-learning problems implying iterative
processing (Expectation-Maximisation (EM), Belief Propagation, etc.). The
implementation of these algorithms in a MapReduce framework means the
user has to engineer a solution that organizes results retrieval and scheduling
of the multiple iterations so that each map iteration is launched after the reduce
phase of the previous iteration is completed and so each map iteration is fed
with results provided by the reduce phase of the previous iteration.
– Most MapReduce implementations have been designed to address production needs and
robustness. As a result, the primary concern of the framework is to handle
hardware failures and to guarantee the computation results. The MapReduce efficiency
is therefore partly lowered by these reliability constraints. For example, the
serialization on hard-disks of computation results turns out to be rather costly
in some cases.
– MapReduce is not suited to asynchronous algorithms.
The questioning of the MapReduce framework has led to richer distributed frameworks where more control and freedom are left to the framework user, at the price of more complexity for this user. Among these frameworks, GraphLab and Dryad (both based on Direct Acyclic Graphs of computations) are well-known.
As a consequence, there is no "One size fits all" framework, such as there is no "One size fits all" data storage solution.
To start with Hadoop, you can have a look at the book Hadoop: The Definitive Guide by Tom White
If you are interested in how large-scale frameworks fit into Machine Learning requirements, you may be interested by the second chapter (in English) of my PhD, available here: http://tel.archives-ouvertes.fr/docs/00/74/47/68/ANNEX/texfiles/PhD%20Main/PhD.pdf
If you provide more insight about the specific challenge you want to deal with (type of algorithm, size of the data, time and money constraints, etc.), we probably could provide you a more specific answer.
edit : another reference that could prove to be of interest : Scaling-up Machine Learning
I had to implement a couple of Data Mining algorithms to work with BigData too, and I ended up using Hadoop.
I don't know if you are familiar to Mahout (http://mahout.apache.org/), which already has several algorithms ready to use with Hadoop.
Nevertheless, if you want to implement your own Algorithm, you can still adapt it to Hadoop's MapReduce paradigm and get good results. This is an excellent book on how to adapt Artificial Intelligence algorithms to MapReduce:
Mining of Massive Datasets - http://infolab.stanford.edu/~ullman/mmds.html
This seems to be an old question. However given your usecase, the main frameworks focusing on Machine Learning in Big Data domain are Mahout, Spark (MLlib), H2O etc. However to run Machine Learning algorithms on Big Data you have to convert them to parallel programs based on Map Reduce paradigm. This is a nice article giving a brief introduction to major (not all) big Data frameworks:
http://www.codophile.com/big-data-frameworks-every-programmer-should-know/
I hope this will help.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Improve this question
What can we do more with qubits than normal bits, and how do they work? I read about them some time ago, and it appears that qubits can store not just 0 or 1, but also 0 and 1 at the same time. I don't really understand how they work. Can someone please explain this to me?
What are their pros and cons, and what impact will they have on programming languages like C after quantum computers are actually invented?
How would we manage memory when a bit (which is also a quantum) can take multiple values at once? How can we determine if something is true or false, when there is more than just 1 and 0?
Any "classical" (as it will be called once the technology is in wider use) problem which is solved by "classical" code can be solved using some sort of quantum processor by transforming the problem. For example, to do a database search, instead of using an index-based search/binary search, or a linear search for an unsorted database, you can use Grover's algorithm. Also, to take a step back from the previous poster's mention of BQP problems, problems with a classical "solution" that runs in NP-time can be sped up considerably by Grover's algorithm (a speedup in the time to search through every possible solution). RSA cryptography is also made much more insecure by the advent of Shor's algorithm, since it makes factorising large numbers into their prime factors (the hinge upon which RSA sits) solvable in logarithmic time.
EDIT: Shor's algorithm actually runs in O((log N)^3), which is polynomial-over-logarithmic time.
The conclusion of this sort of thing is that pre-existing programming languages like C will not be able to be used on a quantum computer due to the nature of quantum algorithms (applying certain functions to quantum states), unless someone invents a way to map quantum gates and so forth to logical gates (EDIT: This has apparantly been mostly addressed here), in which case about all we get is a very very fast logical processor when using languages like C.
PS: I'm sure there'll be OpenGL bindings for quantum computing eventually :P
If we can make a working quantum computer (still an open question) then it can efficiently solve certain algorithmic problems that (we think) a classical computer cannot efficiently solve. These are the problems in the complexity class BQP but not in P. One big one is integer factorization. As Will A mentioned, if you can factor enormous integers quickly, you can break a lot of modern ciphers.
The catch is that nobody knows for sure if BQP is actually "bigger" than P — it might be that anything a quantum computer can do quickly, so can a classical computer.
We also don't know if BQP is as big as NP — for instance, nobody has found an efficient way to solve the Traveling Salesman Problem on a quantum computer. This is a common misconception about quantum computers. They might be able to do NP-complete problems quickly, and then again they might not. Nobody knows.
http://scottaaronson.com/blog/?p=208 be good readin' on this topic (as is the rest of the blog).
Regarding what can be solved with quantum computers: A quantum computer would break current asymmetric encryption schemes. It is a common misconception, that quantum computers can solve most optimization problems. They cannot. See
this article for more details what can be solved using quantum computers and what cannot.
qubits doesn't store 0 and 1 simultaneously, actually they are made from the superposition of the 0 and 1 at a time.
so if a normal bit can represent 0 or 1 at a time, but qubits contain 0 and 1 at a time. three normal bits can store any one of the following....
000,001,010,...,111. but qubit can represent all of them at a time(which are in superposition). so a 'n' qubits store 2^n bits simultaneously!
Suppose a qubit an electron and it spins just like dipole momentum particle and when it spins it create an amplitude of multiple intensity and frequencies that minor amplitude can create spin vibration or momentum of particle that momentum can store thousand bits of information !!! (that's called quantum information processing) which is future !