Using Kernel#fork for backgrounding processes: pros? cons?

I'd like some thoughts on whether using fork{} to 'background' a process from a rails app is such a good idea or not...
From what I gather, fork { my_method; Process.setsid } does in fact do what it's supposed to do:
1) creates another process with a different PID
2) doesn't interrupt the calling process (e.g. it continues w/o waiting for the fork to finish)
3) executes the child until it finishes
...which is cool, but is it a good idea? What exactly is fork doing? Does it create a duplicate instance of my entire Rails Mongrel/Passenger instance in memory? If so, that would be very bad. Or does it somehow do this without consuming a huge swath of memory?
My ultimate goal was to do away with my background daemon/queue system in favor of forking these processes (primarily for sending emails) -- but if this won't save memory then it's definitely a step in the wrong direction.

The fork does make a copy of your entire process and, depending on exactly how you are hooked up to the application server, a copy of that as well. As noted in the other discussion, this is done with copy-on-write, so it's tolerable. Unix is built around fork(2), after all, so it has to do this fairly fast. Note that any partially buffered I/O, open file descriptors, and lots of other state are also copied; the child inherits output buffers that are spring-loaded to be written out a second time, which would be incorrect.
I have a few thoughts:
Are you using Action Mailer? It seems like email would be easily done with AM or by IO.popen of something. (popen does a fork, but it is immediately followed by an exec.)
You could immediately get rid of all that state by calling Process.exec on another Ruby interpreter plus your functionality. If there is too much state to transfer, or you really need to use those duplicated file descriptors, you might do something like IO.popen instead so you can send the subprocess work to do. The system will automatically share the pages containing the text of the subprocess's Ruby interpreter with the parent.
In addition to the above, you might want to consider the daemons gem. While your Rails process is already a daemon, using the gem might make it easier to keep one background task running as a batch-job server, and make it easy to start, monitor, restart if it bombs, and shut down when you do...
If you do exit from a fork(2)ed subprocess, use exit! instead of exit (see the sketch after this list).
Having a message queue and a daemon already set up, like you do, kinda sounds like a good solution to me :-)
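Putting the setsid and exit! advice together, here is a minimal sketch (deliver_welcome_email is a made-up stand-in for the actual work):

pid = fork do
  Process.setsid                 # detach from the parent's session/terminal
  begin
    deliver_welcome_email(user)  # hypothetical stand-in for the mail work
  ensure
    exit!(0)                     # skip at_exit hooks/finalizers inherited from Rails
  end
end
Process.detach(pid)              # reap the child so it doesn't become a zombie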

Be aware that it will prevent you from using JRuby on Rails as fork() is not implemented (yet).

The semantics of fork are to copy the entire memory space of the process into a new process, but many (most?) systems do that by just making a copy of the virtual memory tables and marking everything copy-on-write. That means that (at first, at least) it doesn't use that much more physical memory, just enough to make the new tables and other per-process data structures.
That said, I'm not sure how well Ruby, RoR, etc. interact with copy-on-write forking. In particular, garbage collection could be problematic if it touches many memory pages (causing them to be copied).

Related

How resilient is modern Rails to the antipattern "thread + fork"?

I think this is a popular antipattern that appears either standalone, for example in an ActiveJob task run with the async adapter, or in controllers, where the concurrency strategy of the web server must also be taken into account.
My question is: what precautions should one take in the code when forking inside a thread (think inside an ActiveJob task) and then even spawning threads inside the fork?
The main worries I have seen online are:
You need to drop and reopen the database connections after the fork. It seems that nowadays ActiveRecord takes care of it, doesn't it?
Access to the common Logger could be complicated. Somehow it seems to work.
concurrent-ruby was expected to be problematic too, but current versions are patched to detect that a fork has happened and that its threads are dead. Still, it seems one needs to make sure to perform, at the end of the forked process, a clean shutdown of any concurrent-ruby pool (Rails uses them internally) that could have active or pending jobs. I think it is enough to call
ActiveJob::Base.queue_adapter.shutdown
but perhaps it could miss some tasks that have not started, or tasks on another Concurrent queue. In fact, I think that already happens if one uses Concurrent::Future in a controller managed by the Puma web server. Generically, I try to insert
Concurrent::global_io_executor.shutdown
Concurrent::global_io_executor.wait_for_termination
Extra problems I have found are resource-related: the Postgres server is not ready to manage so many connections by default. Perhaps it would be sensible to reduce the size of the connection pool before the fork. The inotify watcher gem also exhausts resources when launched in development. Production is fine in both cases.
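For illustration, the kind of pre-fork connection handling I mean (just a sketch; the exact behavior varies by Rails version, and HypotheticalJob is a made-up name):

# Shed pooled connections before forking so the child does not share
# database sockets with the parent.
ActiveRecord::Base.connection_pool.disconnect!

pid = fork do
  # The child opens its own connection lazily on first use; on older
  # Rails you may need an explicit ActiveRecord::Base.establish_connection.
  HypotheticalJob.perform_now   # made-up stand-in for the forked work
  exit!(0)
end
Process.detach(pid)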
TL;DR: I'm against doing it, but many of us do it anyway and ignore the fact that it's unsafe... things break too rarely.
It is a simple fact that calling fork in a multi-threaded process may cause the new child to crash / deadlock / spin and may also cause other (harder to isolate) bugs.
This has nothing to do with Ruby; it is related to the locking mechanisms that safeguard critical sections and core process functionality such as opening/closing files, allocating memory, and any user-created mutex / spinlock, etc.
Why is it risky?
When calling fork, the new process inherits all the state of the previous process, but only the thread that called fork (all other threads do not exist in the new process).
This means that if any of the other threads was inside a critical section (i.e., allocating memory, opening a file, etc.), that critical section would remain locked for the lifetime of the new process, possibly causing deadlocks or unexpected errors.
Why do we ignore it?
In practical terms, the risk of something seriously breaking is often very low, and most developers have never both encountered the issue and recognized its cause. Open files can be manually (if not automatically) closed, which leaves us mostly with the question of critical sections.
We can often reset our own critical sections which leaves mostly the system's critical sections...
The system's core critical sections that can be affected by fork are not that many. The main one is the memory allocator, which can hardly ever break. Often the malloc implementation has multiple "arenas", each with its own critical section, and it would be a long shot to hit the system's underlying page allocation (i.e., mmap).
So is it safe?
No. Things still break, they just break rarely, and when they do it isn't always obvious. Also, a parent process can sometimes catch some of these errors and retry / recuperate, and there are other ways to handle the risks.
Should I do it?
I wouldn't recommend doing it, but it depends. If you can handle an error, sure, go ahead. If not, that's a no.
Anyway, it's usually much better to use IPC to forward a message to a background process, so that that process performs any required fork / task.
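A minimal sketch of that idea, assuming the worker is forked once at boot before the app spawns any threads (handle_job and the message format are made up for illustration):

reader, writer = IO.pipe

worker_pid = fork do            # forked while the process is still single-threaded
  writer.close
  reader.each_line do |line|    # block until the parent sends work
    handle_job(line.chomp)      # hypothetical job handler
  end
  exit!(0)
end
reader.close
Process.detach(worker_pid)

# Later, from any thread in the (now multi-threaded) parent:
writer.puts("send_email:42")    # one small write to a pipe is atomic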
The pattern can occur naturally when a Rails controller is combined with a web server. The situation differs depending on whether the web server is threaded, forking or evented, but the final conclusion is the same: that it is safe.
Fork + fork and thread + fork should not present problems of multiple access to the database or multiple execution of the same code, as only the current thread is active in the children.
Evented + fork could be a source of trouble if the event machine is still active in the forked process. Fortunately, most designs run the event loop's control in a separate thread.

Does a gen_server restart copy the state?

The Erlang world does not use try/catch as much as mainstream languages do. I want to know how the performance of restarting a process compares with try/catch in a mainstream language.
An Erlang process has its own small stack and heap, which are actually allocated on the OS heap. Why is it efficient to restart it?
I hope someone can give me deep insight into what BEAM does when a restart operation is invoked on a process.
Also, what about a gen_server, which maintains state in its process: will a copy of the state happen when the gen_server restarts?
Thanks
I recommend having a read of https://ferd.ca/the-zen-of-erlang.html
Here's my understanding: restarting is effective for fixing a "Heisenbug", which only happens when the (Erlang) process is in some weird state and/or trying to handle a "weird" message.
The presumption is that you revert to a known good state (by restarting), which should handle all normal messages correctly. Restarting is not meant to "fix all the problems", and certainly not things like bad configuration or a missing internet connection. By this definition we can see it's very dangerous to copy the state when the crash happened and try to recover from that, because doing so defeats the whole point of going back to a known state.
The second point is: say this process only crashes when handling an action that only 0.001% (or whatever percentage is considered negligible) of all your users actually use, and it's not really important (e.g. a minor UI detail); then it's totally fine to just let it crash and restart, and you don't need to fix it. I think this can be a productivity enabler in such cases.
Regarding your questions in the comment: yes, the restarted process starts with just whatever your init callback returns; you can either build the entire starting state there or source it from other places, depending entirely on the use case.
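A minimal sketch of that point (the module and its state are illustrative): on a restart the supervisor calls init/1 again, so the state is rebuilt from a known good value rather than copied from the crashed process.

-module(counter_server).
-behaviour(gen_server).
-export([start_link/0, init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    %% Known good starting state; it could also be read from ETS or a DB,
    %% but it is never the state of the process that crashed.
    {ok, #{count => 0}}.

handle_call(get, _From, State = #{count := N}) ->
    {reply, N, State}.

handle_cast(increment, State = #{count := N}) ->
    {noreply, State#{count := N + 1}}.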

How would someone create a preemptive scheduler for the Lua VM?

I've been looking at Lua and lvm.c. I'd very much like to implement an interface that allows me to control the VM interpreter state.
Cooperative multitasking from within Lua would not work for me (user-contributed code).
The debug hook gets me only about 50% of the way there (instruction execution limits), but it raises an error that simply crashes the running Lua code, and I need to be able to tweak it even further.
I want to create a system where tens of thousands of Lua user scripts are running. Individual threads would not work, and hard execution limits would cause headaches for beginning developers; I'm going to control execution speeds too. But ultimately
while true do
end
will execute forever, and I really don't care that it does.
Any ideas, help or other implementations that I could look at?
EDIT: This is not about sandboxing; pretend I'm an expert in that field for this conversation.
EDIT: I do not want to use a coroutine-based controller implemented in internally run Lua code.
EDIT: I want to run one thread and manage a large number of user-contributed Lua scripts; an external process-level control mechanism would not scale at all.
You can search for Lua sandbox implementations; for example, this wiki page and SO question provide some pointers. Note that most of the effort in sandboxing is focused on not letting you execute bad code, not necessarily on preventing infinite loops. For better control you may need to combine Lua sandboxing with something like LXC or cpulimit. (Not relevant based on the comments.)
If you are looking for something Lua-based, lightweight, but not necessarily 100% foolproof, then you can try running your client code in a separate coroutine and setting a debug hook on that coroutine that will be triggered every N-th line. In that hook you can check whether the process you are running has exceeded its quota. You also need to take care of newly started coroutines, as those need to have their own hooks set (you either need to disable coroutine.create/wrap or replace them with something that sets the debug hook you need).
The code in this case may look like:
local coro = coroutine.create(client_func)
debug.sethook(coro, debug_hook, "l", 1000) -- fires on every line, plus a "count" event every 1000 VM instructions
It's not foolproof, because it may block on some IO operation and the debug hook will not help there.
[Edit based on updated question and comments]
Between "no Lua-code coroutine-based controller" and "no external process control mechanism" I don't think you are left with much choice. It may be that your only option is to run one VM per user script and somehow give ticks to those VMs (there was a recent question on SO about this, but I can't find it). Before going that route, I would still try to do it with coroutines (which should scale to tens of thousands easily; Tir claims to support 1M active users with a coroutine-based architecture).
The mechanism would roughly look like this: you install the debug hook as shown above, and from that hook you yield back to your controller, which then decides which other coroutine (user script) to resume. I have this very mechanism working in the Lua debugger I've been developing (although it only does it for one client script). This doesn't protect you from IO calls that can block, and for that you may still need a watchdog at the VM level to see whether it has been blocked for longer than needed.
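A rough sketch of that yield-from-the-hook round robin (names are illustrative; note that yielding from a line/count hook requires Lua 5.2 or later):

local scripts = {}   -- user-script coroutines

local function add_script(func)
  local co = coroutine.create(func)
  -- empty mask plus a count: fire the hook every 10000 VM instructions
  debug.sethook(co, function() coroutine.yield() end, "", 10000)
  scripts[#scripts + 1] = co
end

local function run()
  while #scripts > 0 do
    for i = #scripts, 1, -1 do
      local co = scripts[i]
      local ok, err = coroutine.resume(co)  -- runs until the hook yields
      if coroutine.status(co) == "dead" then
        table.remove(scripts, i)            -- finished normally or errored
        if not ok then print("script error: " .. tostring(err)) end
      end
    end
  end
end

With this, even a script stuck in while true do end only ever gets 10000-instruction slices and cannot starve the others.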
If you need to serialize and deserialize running code fragments that preserve upvalues and such, then Pluto is probably your only option.
Look at implementing lua_lock and lua_unlock.
http://www.lua.org/source/5.1/llimits.h.html#lua_lock
Take a look at lulu. It is a Lua VM written in Lua, targeting Lua 5.1.
For newer versions you need to do some work, but then you really can make a scheduler.
Take a look at this:
https://github.com/amilamad/preemptive-task-scheduler-for-lua
I maintain this project. It's a non-blocking preemptive scheduler for running Lua code, suitable for long-running game scripts.

Is membase a good persistence layer for an Erlang game server?

I aim to create a browser game where players can set up buildings.
Each building will have several modules (engines, offices, production lines, ...). Each module will eventually have one or more actions running, like the creation of 200 'item X' from ingredients Y and Z.
The game server will be set up with Erlang: an OTP application as the server itself, and Nitrogen as the web front end.
I need persistence of data. I was thinking about the following:
When somebody or something interacts with a building, or a timer representing some production line ends, a supervisor spawns a gen_server (if not already spawned) which loads the state of the building from a database, so the gen_server can answer messages like 'add this module', 'start this action', 'store this production to the warehouse', 'die', etc.
But when a building doesn't receive any messages for X seconds or minutes, it will terminate (thanks to the gen_server timeout feature) and write its current state back to the database.
So, since it will be a (soft) real-time game, the gen_server must start very quickly. I was thinking of membase as the database, because it's known to have very good response times.
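A minimal sketch of that lifecycle (module and function names are illustrative, and the persistence calls are stubs):

-module(building_server).
-behaviour(gen_server).
-export([start_link/1, init/1, handle_call/3, handle_cast/2,
         handle_info/2, terminate/2]).

-define(IDLE_TIMEOUT, 60000).  %% terminate after 60s without messages

start_link(BuildingId) ->
    gen_server:start_link(?MODULE, BuildingId, []).

init(BuildingId) ->
    State = load_state(BuildingId),  %% e.g. a membase/Riak/Mnesia read
    {ok, State, ?IDLE_TIMEOUT}.

handle_call({add_module, Mod}, _From, State) ->
    {reply, ok, add_module(Mod, State), ?IDLE_TIMEOUT}.

handle_cast(_Msg, State) ->
    {noreply, State, ?IDLE_TIMEOUT}.

handle_info(timeout, State) ->
    {stop, normal, State}.           %% idle long enough: shut down

terminate(_Reason, State) ->
    save_state(State).               %% write the state back to the DB

%% Stubs standing in for real persistence and business logic:
load_state(Id) -> #{id => Id, modules => []}.
add_module(Mod, State = #{modules := Ms}) -> State#{modules := [Mod | Ms]}.
save_state(_State) -> ok.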
My question is: when a gen_server is up and running, its state occupies some memory, and this state is present in the memory handled by membase too, so the state takes up twice its size in memory. Is that a bad design?
Is membase a good solution to handle persistence in my case? Would Mnesia be a better choice, or something else?
I fear Mnesia's 2 GB (or 4?) table size limit, because at the moment I don't know the average state size of my gen_servers (buildings in this example, but also players, production lines, whatever), and I may someday have more than one player :)
Thank you
I agree with Hynek -Pichi- Vychodil. Riak is a great thing for key-value storage.
We use Riak for almost the same thing you described (95% of it). Everything works so far without any issues. If you hit Riak's performance limits, just add more nodes and you're good to go!
Another cool thing about Riak is its very low performance degradation over time. You can find more information about benchmarking Riak here: http://joyeur.com/2010/10/31/riak-smartmachine-benchmark-the-technical-details/
In case you go with it:
a driver: https://github.com/basho/riak-erlang-client
a connection pool you may need to work with it: https://github.com/dweldon/riakpool
About membase and memory usage: I also tried membase, but I found that it is not suitable for my tasks. (Membase declares fault tolerance, but I could not set it up so that it actually worked through faults; even with help from the membase guys I didn't succeed.) So at the moment I use the following architecture: all players that are online and playing the game are represented as player processes (gen_server). All data and business logic for each player live in its player process. From time to time each player process decides to save its state in Riak.
So far this seems to be a very fast and efficient approach.
Update: Now we are on PostgreSQL. It is awesome!
You can look at Bitcask or other Riak backends to store your data. Avoiding IPC is definitely a good idea, so keep it inside Erlang.

Erlang: create filewatcher

I have to implement file-watcher functionality in Erlang: there should be a process that lists the files in a specific directory and does something when files appear.
I have taken a look at OTP. So at the moment I have the following ideas:
1. Create a Supervisor that will control the gen_servers (one server per folder).
2. Create a WatchServer - a gen_server for each folder that I want to monitor.
3. Create a ProcessFileServer - a gen_server that should do something with the files (assume: copy them to a different folder).
So, first problem: the WatchServer should not wait for requests; it should generate one at predefined intervals.
At the moment I have created a timer in the init/1 function and I handle the timer event in the handle_info function.
Now the questions:
1. Are there better ideas?
2. How should I inform the ProcessFileServer that a file was found? It seems to me that it would be much more convenient to create the WatchServers and ProcessFileServers independently, but in that case I do not know whom to send the message to.
Maybe there are some similar projects/libs available?
If you are using Linux, you can use inotify. It is a kernel service that lets you subscribe to file system events. Don't poll the filesystem; let the filesystem call you.
You can try https://github.com/massemanet/inotify for observing your directory.
Ulf
In Erlang it is very cheap to create processes (by orders of magnitude compared to other systems).
Therefore I recommend creating a new ProcessFileServer each time a new file to process appears. When it is done, just terminate the process with exit reason normal.
I would suggest the following structure:
                  top_supervisor
                        |
        +---------------+----------------+
        |                                |
directory_supervisor           processing_supervisor
        |                       (simple_one_for_one,
        |                  starts children transient)
        |                                |
  +-----+----...-----+          +--------+---...----+
  |     |            |          |        |          |
dir_watcher_1 ... dir_watcher_n    proc_file_1 ... proc_file_n
When a dir_watcher notices that a new file has appeared, it calls the processing_supervisor's supervisor:start_child/2 function, with the file path as the extra parameter.
The processing_supervisor should start its children with the transient restart policy.
So if one of the proc_file servers crashes, it will be restarted, but when they terminate with exit reason normal they are not restarted. So you just exit normal when done and crash whenever anything else happens.
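A sketch of that supervisor, using the classic tuple-style child specs (module names follow the diagram; proc_file_server is assumed to export start_link/1):

-module(processing_supervisor).
-behaviour(supervisor).
-export([start_link/0, start_child/1, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Called by a dir_watcher when a new file appears.
start_child(FilePath) ->
    supervisor:start_child(?MODULE, [FilePath]).

init([]) ->
    %% transient: restarted after a crash, left alone after exit normal
    ChildSpec = {proc_file, {proc_file_server, start_link, []},
                 transient, 5000, worker, [proc_file_server]},
    {ok, {{simple_one_for_one, 5, 10}, [ChildSpec]}}.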
If you don't overdo it, cyclic polling for files is OK. If the system becomes loaded because of this polling, you can investigate kernel notification systems (e.g. FreeBSD's kqueue, or the higher-level services building upon it on Mac OS X) that send you a message when a file appears in a directory. These services have a complexity of their own, however, because they have to throw up their hands if too many events happen (otherwise they wouldn't be a performance improvement but the opposite). So you will have to have a robust polling solution as a fallback anyway.
So don't optimize prematurely: start with polling and add improvements (which would be isolated in the dir_watcher servers) when it becomes necessary.
Regarding the comment about which behaviour to use for the dir_watcher process, since it doesn't use much of gen_server's functionality:
There is no problem with using only part of gen_server's possibilities; in fact, it is very common not to use all of them. In your case you only set up a timer in init and use handle_info to do your work. The rest of the gen_server is just the unchanged template.
If you later want changeable parameters, like the poll frequency, it is easy to add them here.
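A minimal dir_watcher sketch along those lines (illustrative; it hands new files to the processing_supervisor from the sketch above):

-module(dir_watcher).
-behaviour(gen_server).
-export([start_link/2, init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link(Dir, IntervalMs) ->
    gen_server:start_link(?MODULE, {Dir, IntervalMs}, []).

init({Dir, IntervalMs}) ->
    erlang:send_after(IntervalMs, self(), poll),
    {ok, #{dir => Dir, interval => IntervalMs, seen => sets:new()}}.

handle_call(_Req, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.

handle_info(poll, State = #{dir := Dir, interval := Int, seen := Seen}) ->
    {ok, Files} = file:list_dir(Dir),
    New = [F || F <- Files, not sets:is_element(F, Seen)],
    [processing_supervisor:start_child(filename:join(Dir, F)) || F <- New],
    erlang:send_after(Int, self(), poll),
    {noreply, State#{seen := sets:union(Seen, sets:from_list(New))}}.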
gen_fsm is much less used, since it only fits a quite limited model and is not very flexible. I use it only when it really fits the requirement 100% (which it almost never does).
In a case where you just want a simple plain Erlang server, you can use the spawn functions in proc_lib to get just the minimal functionality needed to run under a supervisor.
An interesting way to write more natural Erlang code and still have the OTP advantages is plain_fsm: there you get the advantages of selective receive and the flexible message handling needed especially when implementing protocols, paired with the nice features of OTP.
Having said all this: if I were to write a dir_watcher, I'd just use a gen_server and use only what I need. The unused functionality doesn't really cost you anything, and everybody understands what it does.
I have written such a library, based on polling. (It would be nice to extend it to use inotify on platforms where that is supported.) It was originally meant to be used in EUnit, but I turned it into a separate project instead. You can find it here:
https://github.com/richcarl/file_monitor
