I'm working on a Python project, where I'm currently trying to speed things up in some horrible ways: I set up my Z3 solvers, then I fork the process, and have Z3 perform the solve in the child process and pass a pickle-able representation of the model back to the parent.
This works great, and represents the first stage of what I'm trying to do: the parent process is now no longer CPU-bound. The next step is to multi-thread the parent, so that we can solve multiple Z3 solvers in parallel.
I'm pretty sure I've mutexed away any concurrent accesses of Z3 in the setup phase, and only one thread should be touching Z3 at any one time. However, despite this, I'm getting random segfaults in libz3.so. It's important to note, at this point, that it's not always the same thread that touches Z3 -- the same object (not the solvers themselves, but the expressions) might be handled by different threads at different times.
My question is, is it possible to multi-thread Z3? There is a brief note here (http://research.microsoft.com/en-us/um/redmond/projects/z3/z3.html) saying "It is not safe to access Z3 objects from multiple threads.", which I guess would answer my question, but I'm holding out hope that it means to say that one shouldn't access Z3 from multiple threads simultaneously. Another resource (Again: Installing Z3 + Python on Windows) states, from Leonardo himself, that "Z3 uses thread local storage", which, I guess, would sink this whole undertaking, but a) that answer is from 2012, so maybe things have changed, and b) maybe it uses thread-local storage for some unrelated stuff?
Anyways, is multi-threading Z3 possible (from Python)? I'd hate to have to push the setup phase into the child processes...
Z3 does indeed use thread local storage, but as far as I can see, there is only one point left in the code where it does so (to track how much memory each thread is using; in memory_manager.cpp), but that should not be responsible for the symptoms you see.
Z3 should behave nicely in a multi-threaded setting if every thread strictly uses only its own context object (Z3_context, or in Python class Context). This means that any object created through one of the Contexts cannot in any way interact with any of the other Contexts; if that is required, all objects have to be translated from one Context to another first, e.g. in Python via functions like translate(...) in class ASTRef.
That said, there surely are some bugs left to fix. My first target when seeing random segfaults would be the garbage collector, because it might not interact nicely with Z3's reference counting (which is the case in other APIs). There is also a known bug that's triggered when many Context objects are created at the same time (on my todo list though...)
Related
I hope you are doing well! I am relatively new to Electron and, after reading numerous articles, I am still confused about where I should put heavy computing functions in Electron. I plan on using node libraries in these functions, and I have read numerous articles that state that these functions should be put in the main process. However, isn't there a chance that this would overload my main process and thus block my renderers? This is definitely not desired, and I was wondering why I could not just put these functions in preload.js. Wouldn't it be better for performance? Also, if I am only going to require node modules and only connect to my API, would there still be security concerns if I were to put these functions in preload.js? Sorry for the basic questions and please let me know!
Thanks
You can use web workers created in your renderer thread. They won't block.
However you mentioned planning to use node modules. So depending on what they are, it could make more sense to run them from the main process. (But see also https://www.electronjs.org/docs/latest/tutorial/multithreading which points out that you can set nodeIntegrationInWorker, independently of nodeIntegration)
You can use https://nodejs.org/api/worker_threads.html in Node too, or for a process-level separation there is also https://nodejs.org/api/child_process.html.
Note that web workers in the browser (and therefore in the renderer) do not share ordinary objects with the thread that created them. Instead you have to serialize data to pass it back and forth (a SharedArrayBuffer is the exception, and it only holds raw bytes). If your heavy compute process is working on large data structures, bear this in mind. I notice that the node worker threads documentation says they do allow sharing memory between threads.
According to the Lua 5.1 manual, lua_xmove moves values between stacks of different threads belonging to the same Lua state. However, I accidentally used it to move values across different Lua states and it seemed to work fine! Is there any other API to move values from one Lua state to another (in 5.1), or can lua_xmove be used?
Lua stores garbage collection data in the global state. So, if you move GC or string objects across states, you can potentially confuse the garbage collector and create dangling references.
So, while it might look like it works, it could just as easily cause problems later on.
For reference, see this mailing list thread where developers discuss this exact issue.
Note that lua_xmove does check that the global states are the same:
api_check(from, G(from) == G(to));
Is it possible to store all changes of a set by using some means of logical paths - of the changes as they occur - such that one may revert the changes by essentially "stepping back"? I assume that something would need to map the changes as they occur, and the process of reverting them would thus ultimately be linear.
Apologies for any incoherence; this isn't specific to any particular language. Rather, it's a problem of memory: can a set of finite size (e.g. some store of user input) that's changed continuously (at any given time, for any amount of time, with no limit on how much it can be changed) be mapped procedurally, such that new changes are treated as the consequence of prior changes, in a second, mirror store that can be used to revert the state of the set all the way to its initial state?
You might want to look at some functional data structures. Functional languages, like Erlang, make it easy to roll back to the earlier state, since changes are always made on new data structures instead of mutating existing ones. While this feature can be used repeatedly internally, Erlang programming typically uses it abundantly at the top level of a "process", so that on any kind of failure it aborts both the processing and all the changes in their entirety simply by throwing an exception (in a non-functional language, using mutable data structures, you'd be able to throw an exception to abort, but restoring the originals would be your program's job, not the runtime's job). This is one reason that Erlang has a solid reputation.
Some of this functional style of programming is usefully applied to non-functional languages, in particular, use of immutable data structures, such as immutable sets, lists, or trees.
Regarding immutable sets, for example, one might design a functionally-oriented data structure where modifications always generate a new set given some changes and an existing set (a change set consisting of additions and removals). You'd leave the old set hanging around for reference (by whomever); languages with automatic garbage collection reclaim the old ones when they're no longer being used (referenced).
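As a rough illustration of that idea, here is a minimal Python sketch using the built-in frozenset. The function name apply_changes and the rule that removals win over additions are my own assumptions, not anything prescribed above.

```python
from typing import FrozenSet, Iterable


def apply_changes(
    base: FrozenSet[str],
    additions: Iterable[str] = (),
    removals: Iterable[str] = (),
) -> FrozenSet[str]:
    """Build a new set from `base` plus a change set; `base` is never mutated.

    Assumption: if an item is in both additions and removals, it ends up removed.
    """
    return (base | frozenset(additions)) - frozenset(removals)


v0 = frozenset({"a", "b"})
v1 = apply_changes(v0, additions={"c"}, removals={"a"})
# v0 is still available to whoever holds a reference to it; once nobody
# does, the garbage collector is free to reclaim it.
```

Reverting here is nothing more than going back to using v0 instead of v1.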
You can put an id or tag into your set data structure; this way you can do some introspection to see which data structure id someone has a hold of. You can also capture the id of the base from which each new version was generated; this gives you some history or lineage.
If desired, you can also capture a reference to the entire old data structure in the new one, or you can maintain a global list of all of the sets as they are being generated. If you do, however, you'll have to take over more responsibility for storage management, as an automatic collector will probably not find any unused (unreferenced) garbage to collect without some additional help.
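To make the id and lineage idea concrete, here is a small Python sketch. The names SetVersion and SetHistory are hypothetical, and keeping every version in a dictionary is just one possible choice; as noted above, that choice means nothing is ever collected unless you prune the dictionary yourself.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class SetVersion:
    version_id: int
    parent_id: Optional[int]  # id of the version this one was derived from
    items: FrozenSet[str]


class SetHistory:
    """Keeps every version alive so that changes can be stepped back."""

    def __init__(self, initial: Iterable[str] = ()):
        self._ids = itertools.count(1)
        root = SetVersion(next(self._ids), None, frozenset(initial))
        self.versions: Dict[int, SetVersion] = {root.version_id: root}
        self.current = root

    def change(self, additions=(), removals=()) -> SetVersion:
        items = (self.current.items | frozenset(additions)) - frozenset(removals)
        new = SetVersion(next(self._ids), self.current.version_id, items)
        self.versions[new.version_id] = new
        self.current = new
        return new

    def step_back(self) -> SetVersion:
        """Move to the parent version; the initial version has no parent."""
        if self.current.parent_id is not None:
            self.current = self.versions[self.current.parent_id]
        return self.current


history = SetHistory({"a"})
history.change(additions={"b"})
history.change(removals={"a"})
```

Calling history.step_back() repeatedly walks the lineage one version at a time until it reaches the initial set, which is the linear revert the question asks about.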
Database designs do some of this in their transaction controllers. For the purposes of your question, you can think of a database as a glorified set. You might look into MVCC (Multi-version Concurrency Control) as one example that is reasonably well written up in the literature. This technique keeps old snapshot versions of data structures around (temporarily), meaning that mutations always appear to be in new versions of the data. An old snapshot is maintained until no active transaction references it; then it is discarded.
When two concurrently running transactions both modify the database, they each get a new version based off the same current and latest data set. (The transaction controller knows exactly which version each transaction is based off of, though the transaction's client doesn't see the version information.) Assuming both concurrent transactions choose to commit their changes, the version control in the transaction controller recognizes that the second committer is trying to commit a change set that is not a logical successor to the first (since both change sets, as we postulated above, were based on the same earlier version). If possible, the transaction controller will merge the changes as if the second committer was really working off the other, newer version committed by the first committer. (There are varying definitions of when this is possible; MVCC says it is when there are no write conflicts, which is a less-than-perfect answer but fast and scalable.) If that is not possible, it will abort the second committer's transaction and inform the second committer thereof (they then have the opportunity, should they like, to retry their transaction starting from the newer base).
Under the covers, the various snapshot versions in flight for concurrent transactions will probably share the bulk of the data (with some transaction-specific change sets that are consulted first) in order to make the snapshots cheap.
There is usually no API provided to access older versions, so in this domain, the transaction controller knows that as transactions retire, the original snapshot versions they were using can also be (reference counted and) retired.
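A toy, single-threaded Python sketch of the "no write conflicts" rule might look like the following. Store, Transaction and ConflictError are hypothetical names, the snapshot is a plain full copy rather than the shared structure a real database would use, and there is no locking or retirement of old snapshots, so treat it as an illustration of the commit check only.

```python
class ConflictError(Exception):
    """Raised when a transaction wrote a key that someone else committed first."""


class Store:
    def __init__(self):
        self.data = {}        # latest committed values
        self.version = 0      # incremented on every successful commit
        self.last_write = {}  # key -> version at which it was last committed

    def begin(self):
        # Assumption: a full copy as the snapshot; real systems share the data.
        return Transaction(self, self.version, dict(self.data))


class Transaction:
    def __init__(self, store, base_version, snapshot):
        self.store = store
        self.base_version = base_version
        self.snapshot = snapshot
        self.writes = {}      # transaction-specific change set, consulted first

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        return self.snapshot.get(key)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Abort if any key we wrote was committed after our base version.
        for key in self.writes:
            if self.store.last_write.get(key, 0) > self.base_version:
                raise ConflictError(key)
        self.store.version += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.last_write[key] = self.store.version


store = Store()
t1, t2 = store.begin(), store.begin()
t1.write("x", 1)
t2.write("y", 2)
t1.commit()
t2.commit()  # disjoint keys, so the second commit is merged

t3, t4 = store.begin(), store.begin()
t3.write("x", 10)
t4.write("x", 20)
t3.commit()
try:
    t4.commit()
    conflict = None
except ConflictError as exc:
    conflict = str(exc)  # t4 may retry from the newer base
```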
Another area where this is done is append-only files. Logging is a way of recording changes; some databases are based 100% on log-oriented designs.
BerkeleyDB has a nice log structure. Though used mostly for recovery, it does contain all the history so you can recreate the database from the log (up to the point you purge the log in which case you should also archive the database). Again someone has to decide when they can start a new log file, and when they can purge old log files, which you'd do to conserve space.
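The log idea can be sketched in a few lines of Python. This is an in-memory list standing in for an append-only file, with a made-up record format of (operation, item) pairs; it is not how BerkeleyDB lays out its log.

```python
from typing import List, Optional, Set, Tuple

LogEntry = Tuple[str, str]  # (operation, item), operation is "add" or "remove"


def replay(log: List[LogEntry], upto: Optional[int] = None) -> Set[str]:
    """Rebuild the set by replaying the first `upto` entries (all if None)."""
    state: Set[str] = set()
    for operation, item in log[:upto]:
        if operation == "add":
            state.add(item)
        elif operation == "remove":
            state.discard(item)
        else:
            raise ValueError(f"unknown operation: {operation!r}")
    return state


log: List[LogEntry] = []
log.append(("add", "a"))     # entries are only ever appended, never edited
log.append(("add", "b"))
log.append(("remove", "a"))
log.append(("add", "c"))

latest = replay(log)             # state after every change
earlier = replay(log, upto=2)    # state as of the first two changes
```

Stepping back is then a matter of replaying fewer entries. The cost is that replay is linear in the log length, which is why real systems combine the log with periodic snapshots and purge old log files.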
These database techniques can be applied in memory as well. (Nothing is free, though, of course ;)
Anyway, yes, there are fields where this is done.
Immutable data structures help preserve history, by simply keeping old copies; changes always go to new copies. (And efficiency techniques can make this not as bad as it sounds.)
Ids can help you understand lineage without necessarily holding onto all the old copies.
If you do want to hold onto all the old copies, you have to look at your domain design to understand when/how/if old data structures can possibly get accessed, with an eye toward how to eventually reclaim them. You'll most likely have to get involved in defining how they get released, if ever, or how they get archived for posterity, though at the cost of slower access later.
I am trying to write a customized thread pool suited to my purpose using pthreads, and I am new to pthreads. I read these tutorials online (POSIX threads programming and Linux Tutorial Posix Threads) and they were quite helpful, but I still have some (maybe silly) doubts regarding mutexes and condition variables:
What is the scope of a mutex? Will a global mutex lock all the global variables so that only one thread can access them at a time? If I have two global mutexes, would they lock the same set of variables? What about a mutex that is declared inside a class or a function; what will happen when I lock/unlock it?
If I plan to just read a global variable, and not modify it at all, should I still use a mutex lock?
If I am correct, a condition variable is used to wake up other threads which are sleeping (or blocked using pthread_cond_wait()) on some condition. The wake-up call to sleeping threads is given by pthread_cond_signal() or pthread_cond_broadcast() from some other thread. How is the flow of control supposed to occur so that one or all of the threads wake up to do some work and then wait until the next piece of work is available? I am particularly interested in a scenario with 4 threads.
Is there a way to set the affinity of a thread to a particular processor core before it is created (so that it starts execution on the desired core and no shifting of cores occurs after creation)?
I am sorry if the questions look silly, but as I said, I am new to this. Any help, comments, code or pointers to good resources are appreciated. Thanks in advance for your help.
That's a lot of questions. A few answers.
(1a) The scope of a mutex is whatever you program it to be. In that sense it is no different from any other kind of variable.
(1b) A global mutex will protect whatever variables you program it to protect. I think from your other questions you might have a fundamental misunderstanding here. There is nothing magical about mutexes. You can't just declare one and say "Ok, protect these variables", you have to incorporate the mutex in your code. So if you have two functions that use variable X and one does a mutex lock/unlock around any changes to the variable and the other function completely ignores that a mutex even exists you really aren't protecting anything. The best example I can think of is advisory file locks - one program can use them but if another doesn't then that file isn't locked.
(1c) As a rule, don't have multiple mutexes locking the same data. It is an invitation to problems. Again the use of mutexes depends on programmed cooperation. If function A is protecting data B with mutex C while function D is protecting data B with mutex E then data B isn't protected at all. Function A can hold the lock on mutex C but since function D pays no attention to it it will just overwrite data B anyway.
(1d) Basic scoping rules apply.
(2) No. If the variable isn't going to change in any way that would make it inconsistent among threads then you don't need to lock it.
(3) There are a number of answers on SO that go into considerable detail on this. Search around a bit.
(4) Not that I am aware.
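To illustrate points (1) and (3) together, here is a rough sketch of the usual wait/notify flow with 4 workers. It uses Python's threading module rather than pthreads, so it is an analogy and not pthread code: threading.Condition plays the role of a pthread_cond_t paired with its mutex, notify() corresponds to pthread_cond_signal() and notify_all() to pthread_cond_broadcast(). WorkQueue and run_pool are hypothetical names.

```python
import threading
from collections import deque


class WorkQueue:
    def __init__(self):
        self._cond = threading.Condition()  # condition variable plus its mutex
        self._items = deque()               # protected only because every
        self._closed = False                # access below takes self._cond

    def put(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()             # wake one sleeping worker

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()         # wake every worker so it can exit

    def get(self):
        with self._cond:
            # Re-check the predicate in a loop: a woken thread may find that
            # another worker already took the item.
            while not self._items and not self._closed:
                self._cond.wait()
            if self._items:
                return self._items.popleft()
            return None                     # closed and drained


def run_pool(jobs, num_workers=4):
    queue = WorkQueue()
    results = []
    results_lock = threading.Lock()         # a second mutex for different data

    def worker():
        while True:
            job = queue.get()
            if job is None:
                return
            value = job * job               # the "work"
            with results_lock:
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for job in jobs:
        queue.put(job)
    queue.close()
    for thread in threads:
        thread.join()
    return sorted(results)


squares = run_pool(range(10))
```

The locks protect nothing by themselves; the queue and the results list are safe only because every function that touches them takes the matching lock first, which is the cooperation described in (1b) and (1c).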
Recently, I have encountered many difficulties while developing with C++ and Lua. My situation is this: for some reason, there can be thousands of Lua states in my C++ program, but these states should all be the same just after initialization. Of course, I can call luaL_openlibs() and luaL_loadfile() for each state, but that is pretty heavy (in fact, it takes a rather long time just to initialize one state). So I am wondering about the following scheme: what about keeping a separate Lua state (the only state that has to be initialized), which is then cloned for the other Lua states? Is that possible?
When I started with Lua, like you I once wrote a program with thousands of states, had the same problem and thoughts, until I realized I was doing it totally wrong :)
Lua has coroutines and threads; you need to use these features to do what you need. They can be a bit tricky at first, but you should be able to understand them in a few days, and it'll be well worth your time.
Take a look at the following Lua API call; I think it is exactly what you need.
lua_State *lua_newthread (lua_State *L);
This creates a new thread, pushes it on the stack, and returns a pointer to a lua_State that represents this new thread. The new thread returned by this function shares with the original thread its global environment, but has an independent execution stack.
There is no explicit function to close or to destroy a thread. Threads are subject to garbage collection, like any Lua object.
Unfortunately, no.
You could try Pluto to serialize the whole state. It does work pretty well, but in most cases it costs roughly the same time as normal initialization.
I think it will be hard to do exactly what you're requesting here given that just copying the state would have internal references as well as potentially pointers to external data. One would need to reconstruct those internal references in order to not just have multiple states pointing to the clone source.
You could serialize out the state after one starts up and then load that into subsequent states. If initialization is really expensive, this might be worth it.
I think the closest thing to doing what you want that would be relatively easy would be to put the states in different processes by initializing one state and then forking, however your operating system supports it:
http://en.wikipedia.org/wiki/Fork_(operating_system)
If you want something available from within Lua, you could try something like this:
How do you construct a read-write pipe with lua?