Clone a lua state - lua

Recently, I have encountered many difficulties when I was developing using C++ and Lua. My situation is: for some reason, there can be thousands of Lua-states in my C++ program. But these states should be same just after initialization. Of course, I can do luaL_loadlibs() and lua_loadfile() for each state, but that is pretty heavy(in fact, it takes a rather long time for me even just initial one state). So, I am wondering the following schema: What about keeping a separate Lua-state(the only state that has to be initialized) which is then cloned for other Lua-states, is that possible?

When I started with Lua, like you I once wrote a program with thousands of states, had the same problem and thoughts, until I realized I was doing it totally wrong :)
Lua has coroutines and threads, you need to use these features to do what you need. They can be a bit tricky at first but you should be able to understand them in a few days, it'll be well worth your time.

take a look to the following lua API call I think it is what you exactly need.
lua_State *lua_newthread (lua_State *L);
This creates a new thread, pushes it on the stack, and returns a pointer to a lua_State that represents this new thread. The new thread returned by this function shares with the original thread its global environment, but has an independent execution stack.
There is no explicit function to close or to destroy a thread. Threads are subject to garbage collection, like any Lua object.

Unfortunately, no.
You could try Pluto to serialize the whole state. It does work pretty well, but in most cases it costs roughly the same time as normal initialization.

I think it will be hard to do exactly what you're requesting here given that just copying the state would have internal references as well as potentially pointers to external data. One would need to reconstruct those internal references in order to not just have multiple states pointing to the clone source.
You could serialize out the state after one starts up and then load that into subsequent states. If initialization is really expensive, this might be worth it.
I think the closest thing to doing what you want that would be relatively easy would be to put the states in different processes by initializing one state and then forking, however your operating system supports it:
http://en.wikipedia.org/wiki/Fork_(operating_system)
If you want something available from within Lua, you could try something like this:
How do you construct a read-write pipe with lua?

Related

lua_xmove between different lua states

According to the lua 5.1 manual, lua_xmove moves values between stacks of different threads belonging to the same Lua state. But, I accidentally happened to use it to move values across different Lua states and it seemed to work fine! Is there any other API to move values from one Lua state to another (in 5.1), or can lua_xmove be used?
Lua stores garbage collection data in the global state. So, if you move GC or string objects across states, you can potentially confuse the garbage collector and create dangling references.
So, while it might look like it works, it could just as easily cause problems later on.
For reference, see this mailing list thread where developers discuss this exact issue.
Note that lua_xmove does check that the global states are the same:
api_check(from, G(from) == G(to));

How do I save the memory state of a C program to jumpstart later

In a large complex C program, I'd like to save to a file the contents of all memory that is used by static variables, global structures and dynamically allocated variables. Those memory variables are more than 10,000.
The C program has only single thread, no file operation and program itself is not so complex (calculation is complex).
Then, in a same execution of the program, I want to initialize the memory from this saved state.
If this is even possible, can someone offer an approach to accomplish this?
You have to define a Struct to keep al your data in and then you have to implement a function to save it into a file.
Something like this: Saving struct to file
Please note, however, that this method is the simplest, but comes with no portability at all.
Edit after Comment: basically, what you would like to do is save whatever is happening in the program and then restart it after a load. I don't think this is possible in any simple way. You MUST understand what "status of your application" means.
Think about it: doing a dump of the memory saves not only the data, but also the current Instruction Pointer. So, with that "dumb" dump, you would have also saved the actual instruction currently running. And many more complications you really don't want to care about.
The closest thing you are thinking about is running the program in a Virtual Machine. If you pause the VM the execution status will be "saved", but whenever you restart the VM, the program will restart at the exact same execution point you paused it.
If the configurations are scattered through the application, still you can access a global struct used to save everything.
But still you have to know your program and identify what you have to save. No shortcuts on that.

How would someone create a preemptive scheduler for the Lua VM?

I've been looking at lua and lvm.c. I'd very much like to implement an interface to allow me to control the VM interpreter state.
Cooperative multitasking from within lua would not work for me (user contributed code)
The debug hook gets me only about 50% of the way there, instruction execution limits, but it raises an exception which just crashes the running lua code - but I need to be able to tweak it even further.
I want to create a system where 10's of thousands of lua user scripts are running - individual threads would not work, and the execution limits would cause headache for beginning developers, I'm going to control execution speeds too. but ultimately
while true do
end
will execute forever, and I really don't care that it is.
Any ideas, help or other implementations that I could look at?
EDIT: This is not about sandboxing pretend I'm an expert in that field for this conversation
EDIT: I do not want to use an internally ran lua code coroutine based controller.
EDIT: I want to run one thread, and manage a large number of user contributed lua scripts, an external process level control mechansim would not scale at all.
You can search for Lua Sandbox implementations; for example, this wiki page and SO question provide some pointers. Note that most of the effort in sandboxing is focused on not allowing you to execute bad code, but not necessarily on preventing infinite loops. For better control you may need to combine Lua sandboxing with something like LXC or cpulimit. (not relevant based on the comments)
If you are looking for something Lua-based, lightweight, but not necessarily 100% foolproof, then you can try running your client code in a separate coroutine and set a debug hook on that coroutine that will be triggered every N-th line. In that hook you can check if the process you are running exceeded its quotes. You also need to take care of new coroutines started as those need to have their own hooks set (you either need to disable coroutine.create/wrap or to replace them with something that sets the debug hook you need).
The code in this case may look like:
local coro = coroutine.create(client_func)
debug.sethook(coro, debug_hook, "l", 1000) -- trigger hook on every 1000th line
It's not foolproof, because it may block on some IO operation and the debug hook will not help there.
[Edit based on updated question and comments]
Between "no lua code coroutine based controller" and "no external process control mechanism" I don't think you are left with much choice. It may be that your only option is to run one VM per user script and somehow give ticks to those VMs (there was a recent question on SO on this, but I can't find it). Before going this route, I would still try to do this with coroutines (which should scale to tens of thousands easily; Tir claims supporting 1M active users with coroutine-based architecture).
The mechanism would roughly look like this: you install the debug hook as I shown above and from that hook you yield back to your controller, which then decides what other coroutine (user script) to resume. I have this very mechanism working in the Lua debugger I've been developing (although it only does it for one client script). This doesn't protect you from IO calls that can block and for that you may still need to have a watchdog at the VM level to see if it's been blocked for longer than needed.
If you need to serialize and deserialize running code fragments that preserve upvalues and such, then Pluto is probably your only option.
Look at implementing lua_lock and lua_unlock.
http://www.lua.org/source/5.1/llimits.h.html#lua_lock
Take a look at lulu. It is lua VM written on lua. It's for Lua 5.1
For newer version you need to do some work. But it's then you really can make a schelduler.
Take a look at this,
https://github.com/amilamad/preemptive-task-scheduler-for-lua
I maintain this project. It,s a non blocking preemptive scheduler for running lua code. Suitable for long running game scripts.

Is the process dictionary appropriate in this case?

I've read several comments here and elsewhere suggesting that Erlang's process dictionary was a bad idea and should die. Normally, as a total Erlang newbie, I'd just avoid it. However, in this situation my other options aren't great.
I have a main dispatcher function that looks something like this:
dispatch(State) ->
receive
{cmd1, Params} ->
NewState = do_cmd1_stuff(Params, State),
dispatch(NewState);
{cmd2, Params} ->
NewState = do_cmd2_stuff(Params, State),
dispatch(NewState);
BadMsg ->
log_error(BadMsg),
dispatch(State)
end.
Obviously, my names are more meaningful to me, but that's the gist of it. Deep down in a function called by a function called by a function called by do_cmd2_stuff(), I want to send out messages to all my users telling them about something I've done. In order to do that, I need to get the list of users from the point where I send the messages. The user list doesn't lend itself easily to sticking in the global state, since that's just one data structure representing the only block of data on which I operate.
The way I see it, I have a couple unpleasant options other than using the process dictionary. I can send the user list through all the various levels of functions down to the very bottom one that does the broadcasting. That's unpleasant because it causes all my functions to gain a parameter, whether they really care about it or not.
Alternatively, I could have all the do_cmdN_stuff() functions return a message to send. That's not great either though, since sending the message may not be the last thing I want to do and it clutters up my dispatcher with a bunch of {Msg, NewState} tuples. Furthermore, some of the functions might not have any messages to send some of the time.
Like I said earlier, I'm very new to Erlang. Maybe someone with more experience can point me at a better way. Is there one? Is the process dictionary appropriate in this case?
The general rule is that if you have doubts, you shouldn't use the process dictionary.
If the two options you mentioned aren't good enough (I personally like the one where you return the messages to send) and what you want is some particular piece of code to track users and forward messages to them, maybe what you want to do is have a process holding that info.
Pid ! {forward, Msg}
where Pid will take care of sending everything to a bunch of other processes. Now, you would still need to pass the Pid around, unless you give it a name in some registry to find it. Either with register/2, global or gproc.
A simple answer would be to nest your global within a state record, which is then threaded through the system, at least at the stop level. This makes it easy to add new fields to the state in the future, not an uncommon occurrence, and allow you to keep your global state data structure untouched. So initially
-record(state, {users=[],state_data}).
Defining it as a record makes it easy to access and extend when necessary.
As you mentioned you can always pass the user list as extra param, thats not so bad.
If you don't want to do this just put it in State. You can have a special State just for this part of the calculation that also contains the user list.
Then there always is the possibility of putting it in ETS or in another server process.
What exactly to do is hard to recommend since it depends a lot on your exact application and preferences.
Just choose from the mentioned possibilities as if the process dictionary doesn't exist. Maybe your code needs restructuring if none of the variants look elegant, there always is some better way without the process dictionary.
Its really bad it is still there, because its alluring to many beginning Erlang users.
You really should not use process dictionary. I accept using dictionary only if
It is short living process.
I have full control about the process from spawn to termination i.e. I use minimum and well known set of external modules.
I need performance gain badly. It means avoid copy of data when using ets and dict/gb_tree is too slow (for GC reason).
ad 1. is not your case, you are using in server. ad 2. I don't know if it is your case. ad 3. is not your case because you need list of recipient so you don't gain nothing from that process dictionary is very fast key/value storage. In your case I don't see any reason why you should not include what you need to your State. IMHO State is exactly the right place for it.
Its an interesting question because it involves the fundamentals of functional design.
My opinion:
Try as much as possible to make the function return the messages, then send them. This separates the two different tasks nicely, and separates the purely functional task from the one that causes side effects.
If this isn't possible, pass receivers as argument even if its a bit messy. If the broadcasting function uses that data, it should be given to it explicitly, for clarity and predictability.
Using ETS as Peer Stritzinger suggests is really not any better than the PD, both hides the fact that the broadcasting function uses the receiver list and makes it dependent on global data.
I'm not sure about the Erlang way of encapsulating some state in a process, as I GIVE TERRIBLE ADVICE suggests. Is it really any better that ETS or PD?
clutters up my dispatcher with a bunch
of {Msg, NewState}
This is my experience also, that you often end up like this. It's not particularly pretty, but functional design seems to encourage this. Could some language feature be introduced to make it more beautiful and natural?
EDIT:
6 years ago I wrote:
Could some language feature be introduced to make it more beautiful and natural?
After learning much more about functional programming I have realised that examples of this are state-monads and do-notation that are found in Haskell.
I would consider sending a special message to self() from deep inside the call stack, and handling it at the top level dispatch method that you've sketched, where list of users is available.

Event manager process in erlang. Named processes or Pids?

I have event manager process that dispatches events to subscribers (e.g. http_session_created, http_sesssion_destroyed). If Pid is used instead of named process, I must put it into functions to operate with event manager but if Named process is used, code will be more clear.
Which variant is right?
Thank you!
While there is no actual difference to the process naming a process, registering it, makes it global. You in essence you are telling the system that here is a global service which anyone can use.
From you description it more sounds like you are giving them names to save the, small, effort of carrying them around in your loop. If this is the case I would put their pids in a record with all the other state data you carry around. This much better indicates the type of the processes.
If you have a fixed set of "subscriber" processes, then use registered names IMO.
If, on the contrary, you have a publish/subscribe sort of architecture where subscribers come and go, then you need an infrastructure to track those and from this point it doesn't really matter if you use Pid() or names.
One of the drawbacks of using registered names is that you need to track them in your code base to avoid "collisions". So it is up to you: personally, I tend to favor named processes as, like you say, it makes reading the code clearer. One way or another, OTP doesn't care.

Resources