In some Redis loading tests, LuaJIT 2.0 (beta) is performing quite nicely, at about 60% of the runtime of a similar single threaded Python script.
When using Python's multiprocessing module to chunk large text files, results in a significant performance improvement splitting the works across cores.
I am assuming using the same approach in Lua would perform even better, but as a Lua beginner, I have not found the correct approach. Can anyone point me in the right direction?
There is LuaThreads, however, as explained here by Mike Pall, it isn't really the best solution for multi-threading and interlinked task (as all threads will hammer the single lock on the lua state).
However LuaLanes may provide what you need, but, seeing as you are using/favorable to LuaJIT and not just plain Lua, you can probably leverage the FFI to spawn system threads straight from LuaJIT and pass them a Lua callback (I'm not sure on the safety of this however, thats something for the LuaJIT mailing list).
Related
I have been struggling a lot to get this to work.
Can someone provide an example with any LUA api of 2 scripts that pass a message back and forth.
I have tried Oil, lua-ipc and zeromq.
But I face several missing libraries issues.
The ultimate goal is to pass a vector of numbers from one Lua process to another Lua process (with a different version of Lua) without going through disk.
Here is a similar example in python of IPC in a single file. Something similar in lua would be extremely helpful.
I really need an example as my knowledge in pipes or UDP/TCP is not strong.
The equivalent would be to use luasocket. These examples come very close to the python example given. Here socket:receive() is used for the framing.
https://github.com/diegonehab/luasocket/blob/master/samples/listener.lua
https://github.com/diegonehab/luasocket/blob/master/samples/talker.lua
I need to write a compute-intensive simulation program. I tried writing a multi-threaded version of this program, but it's taking too much time. Now I plan to expand to multiple nodes (probably via Amazon EC2 nodes).
I'm already familiar with Python. Is Python outfitted with some parallel module a viable option if I care about speed, or would I be better off going to some other framework/language like Erlang?
Can you even write a simulation program in Erlang?
The project is more about dividing up computation rather than dividing up a dataset, so I didn't consider frameworks based on map reduce
dispy is a framework for distributed computing with Python. It uses asyncoro, a framework for asynchronous, concurrent programming using coroutines, with some features of erlang (broadly speaking). Disclaimer: I am the author of both these frameworks.
If you are familiar with python already I would recommend you keep simulation in python (and speed up critical parts in C) and use Erlang for managing it. Writing simulation in Erlang would be a far away out of your comfort zone (even personally I would do it). You probably can reuse parts of Erlang projects as Disco project or Riak core. Start your project with some sub-optimal POC and tune it in iterations. It means start with python, embed it to Erlang (probably Disco) and then move bits around until you are not happy with performance and features. You can end up with anything including pure Erlang solution or emended Python in BEAM using NIF or anything else which satisfy your needs.
Does your problem parallelize trivially? Then you may want to take a look at Elastic Map Reduce instead of EC2.
EDIT: unfortunately LuaJIT was taken out of the comparison in the link below.
This comparison of programming languages shows that LuaJIT has an over tenfold improvement over the normal Lua implementation.
Why is the change so big? Is there something specific about Lua that makes it benefit a lot from JIT compilation?
Python is dynamically typed and compiled to bytecode as well, so why doesn't PyPy (that has JIT now, I believe) show such a large jump in performance?
Mike Pall has talked about this in a few places:
http://article.gmane.org/gmane.comp.lang.lua.general/58908
http://lambda-the-ultimate.org/node/3851
http://www.reddit.com/user/mikemike
As with every performant system, the answer in the end comes down to two things: algorithms and engineering. LuaJIT uses advanced compilation techniques, and it also has a very finely engineered implementation. For example, when the fancy compilation techniques can't handle a piece of code, LuaJIT falls back to an very fast interpreter written in x86 assembly.
LuaJIT gets double points on the engineering aspect, because not only is LuaJIT itself well-engineered, but the Lua language itself has a simpler and more coherent design than Python and JavaScript. This makes it (marginally) easier for an implementation to provide consistently good performance.
I have an image processing routine that I believe could be made very parallel very quickly. Each pixel needs to have roughly 2k operations done on it in a way that doesn't depend on the operations done on neighbors, so splitting the work up into different units is fairly straightforward.
My question is, what's the best way to approach this change such that I get the quickest speedup bang-for-the-buck?
Ideally, the library/approach I'm looking for should meet these criteria:
Still be around in 5 years. Something like CUDA or ATI's variant may get replaced with a less hardware-specific solution in the not-too-distant future, so I'd like something a bit more robust to time. If my impression of CUDA is wrong, I welcome the correction.
Be fast to implement. I've already written this code and it works in a serial mode, albeit very slowly. Ideally, I'd just take my code and recompile it to be parallel, but I think that that might be a fantasy. If I just rewrite it using a different paradigm (ie, as shaders or something), then that would be fine too.
Not require too much knowledge of the hardware. I'd like to be able to not have to specify the number of threads or operational units, but rather to have something automatically figure all of that out for me based on the machine being used.
Be runnable on cheap hardware. That may mean a $150 graphics card, or whatever.
Be runnable on Windows. Something like GCD might be the right call, but the customer base I'm targeting won't switch to Mac or Linux any time soon. Note that this does make the response to the question a bit different than to this other question.
What libraries/approaches/languages should I be looking at? I've looked at things like OpenMP, CUDA, GCD, and so forth, but I'm wondering if there are other things I'm missing.
I'm leaning right now to something like shaders and opengl 2.0, but that may not be the right call, since I'm not sure how many memory accesses I can get that way-- those 2k operations require accessing all the neighboring pixels in a lot of ways.
Easiest way is probably to divide your picture into the number of parts that you can process in parallel (4, 8, 16, depending on cores). Then just run a different process for each part.
In terms of doing this specifically, take a look at OpenCL. It will hopefully be around for longer since it's not vendor specific and both NVidia and ATI want to support it.
In general, since you don't need to share too much data, the process if really pretty straightforward.
I would also recommend Threading Building Blocks. We use this with the IntelĀ® Integrated Performance Primitives for the image analysis at the company I work for.
Threading Building Blocks(TBB) is similar to both OpenMP and Cilk. And it uses OpenMP to do the multithreading, it is just wrapped in a simpler interface. With it you don't have to worry about how many threads to make, you just define tasks. It will split the tasks, if it can, to keep everything busy and it does the load balancing for you.
Intel Integrated Performance Primitives(Ipp) has optimized libraries for vision. Most of which are multithreaded. For the functions we need that aren't in the IPP we thread them using TBB.
Using these, we obtain the best result when we use the IPP method for creating the images. What it does is it pads each row so that any given cache line is entirely contained in one row. Then we don't divvy up a row in the image across threads. That way we don't have false sharing from two threads trying to write to the same cache line.
Have you seen Intel's (Open Source) Threading Building Blocks?
I haven't used it, but take a look at Cilk. One of the big wigs on their team is Charles E. Leiserson; he is the "L" in CLRS, the most widely/respected used Algorithms book on the planet.
I think it caters well to your requirements.
From my brief readings, all you have to do is "tag" your existing code and then run it thru their compiler which will automatically/seamlessly parallelize the code. This is their big selling point, so you dont need to start from scratch with parallelism in mind, unlike other options (like OpenMP).
If you already have a working serial code in one of C, C++ or Fortran, you should give serious consideration to OpenMP. One of its big advantages over a lot of other parallelisation libraries / languages / systems / whatever, is that you can parallelise a loop at a time which means that you can get useful speed-up without having to re-write or, worse, re-design, your program.
In terms of your requirements:
OpenMP is much used in high-performance computing, there's a lot of 'weight' behind it and an active development community -- www.openmp.org.
Fast enough to implement if you're lucky enough to have chosen C, C++ or Fortran.
OpenMP implements a shared-memory approach to parallel computing, so a big plus in the 'don't need to understand hardware' argument. You can leave the program to figure out how many processors it has at run time, then distribute the computation across whatever is available, another plus.
Runs on the hardware you already have, no need for expensive, or cheap, additional graphics cards.
Yep, there are implementations for Windows systems.
Of course, if you were unwise enough to have not chosen C, C++ or Fortran in the beginning a lot of this advice will only apply after you have re-written it into one of those languages !
Regards
Mark
I am trying to build out a useful 3d game engine out of the Ogre3d rendering engine for mocking up some of the ideas i have come up with and have come to a bit of a crossroads. There are a number of scripting languages that are available and i was wondering if there were one or two that were vetted and had a proper following.
LUA and Squirrel seem to be the more vetted, but im open to any and all.
Optimally it would be best if there were a compiled form for the language for distribution and ease of loading.
One interesting option is stackless-python. This was used in the Eve-Online game.
The syntax is a matter of taste, Lua is like Javascript but with curly braces replaced with Pascal-like keywords. It has the nice syntactic feature that semicolons are never required but whitespace is still not significant, so you can even remove all line breaks and have it still work. As someone who started with C I'd say Python is the one with esoteric syntax compared to all the other languages.
LuaJIT is also around 10 times as fast as Python and the Lua interpreter is much much smaller (150kb or around 15k lines of C which you can actually read through and understand). You can let the user script your game without having to embed a massive language. On the other hand if you rip the parser part out of Lua it becomes even smaller.
The Python/C API manual is longer than the whole Lua manual (including the Lua/C API).
Another reason for Lua is the built-in support for coroutines (co-operative multitasking within the one OS thread). It allows one to have like 1000's of seemingly individual scripts running very fast alongside each other. Like one script per monster/weapon or so.
( Why do people write Lua in upper case so much on SO? It's "Lua" (see here). )
One more vote for Lua. Small, fast, easy to integrate, what's important for modern consoles - you can easily control its memory operations.
I'd go with Lua since writing bindings is extremely easy, the license is very friendly (MIT) and existing libraries also tend to be under said license. Scheme is also nice and easy to bind which is why it was chosen for the Gimp image editor for example. But Lua is simply great. World of Warcraft uses it, as a very high profile example. LuaJIT gives you native-compiled performance. It's less than an order of magnitude from pure C.
I wouldn't recommend LUA, it has a peculiar syntax so takes some time to get used to. Depending on who will be doing the scripting, this may not be a problem, but I would try to use something fairly accessible.
I would probably choose python. It normally compiles to bytecode, so you would need to embed the interpreter. However, if you must you can use PyPy to for example translate the code to C, and then compile it.
Embedding the interpreter is no issue. I am more interested in features and performance at this point in time. LUA and Squirrel are both interpreted, which is nice because one of the games i am building out is to include modifiable code, which has an editor in game.
I would love to hear about python, as i have seen its use within the battlefield series i believe.
python is also nice because it has actual OGRE bindings, just in case you need to modify something lower-level on the fly. I don't know of any equivalent bindings for Lua.
Since it's a C++ library, I would suggest either JavaScript or Squirrel, the latter being my personal favorite of the two for being even closer to C++, in particular to how it handles tables/structs and classes. It would be the easiest to get used to for a C++ coder because of all the similarities.
However, if you go with JavaScript and find an HTML5 version of Ogre3D, you should be able to port your game code directly into the web version with minimal (if any) changes necessary.
Both of these are a good choice, and both have their pros and cons, but both would definitely be the easiest to learn since you're likely already working in C++. If you're working with Java, the same may hold true, and if it's Game Maker, you wouldn't need either one unless you're trying to make an executable that people wouldn't need Game Maker itself to run, in which case, good luck finding an extension to run either of these.