My Erlang project has multiple applications: How should I start them?

I'm struggling, as an OTP newbie, to understand how to structure my Erlang project. So far it has several applications under an apps directory managed by rebar:
proj_root
    apps
        app1
        app2
        appN
    rebar.config
I can start app1, say, in the shell with application:start(app1). No doubt I could repeat this through appN. But is there a preferred or better way? Can I, say, write a function that bundles all these starts? If so, where do I put it?
I have several other questions along this line, but will post separately.
Many thanks,
LRP

You can indeed start applications manually as you suggest. This can quickly become burdensome if you have many applications and there are dependencies between them.
Automating the process with a recursive function is quite easy: if you try to start an application while one or more of its dependencies are not running, application:start/1 fails and returns {error, {not_started, App}}. Such a function can live in any of your applications, or even in its own application.
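A minimal sketch of such a helper (the function name is illustrative):

    %% Start App, first starting any dependency that
    %% application:start/1 reports as not yet running.
    ensure_started(App) ->
        case application:start(App) of
            ok -> ok;
            {error, {already_started, App}} -> ok;
            {error, {not_started, Dep}} ->
                ok = ensure_started(Dep),
                ensure_started(App)
        end.

Calling ensure_started(appN) then pulls in the whole dependency chain.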
However, this manual (or automated) way to proceed is not the OTP way, even if it can prove useful (typically for tests…). If you follow OTP principles, you are supposed to create a release with a .rel file containing all your applications. OTP releases are composed of a set of applications (yours and system applications they depend on), an emulator and a boot script that will start all applications (and handle dependencies). Starting a node with your applications can be performed by using the -boot flag to erl pointing to the proper boot script.
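For illustration, a .rel file is just an Erlang term naming the release and the exact version of every application in it (all version numbers below are made up):

    {release, {"proj", "1.0.0"}, {erts, "13.2"},
     [{kernel, "8.5"},
      {stdlib, "4.2"},
      {sasl, "4.2"},
      {app1, "0.1.0"},
      {app2, "0.1.0"}]}.

systools:make_script("proj") then generates the boot script that the -boot flag expects.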
This is quite complex, but rebar can actually build releases for you. It will even produce shell scripts to start nodes with all your applications using the OTP boot mechanism.
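With rebar3, for instance, a relx section in rebar.config is enough to describe such a release (a sketch reusing the app names from the question):

    {relx, [{release, {proj, "0.1.0"},
             [app1, app2, appN]},
            {extended_start_script, true}]}.

rebar3 release then builds it, including a shell script that boots the node with all your applications.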

Related

Erlang applications design (how to short-circuit)

I am having an existential question about how I design my Erlang applications:
I usually create an application, which starts a supervisor and some workers.
Aside from the supervision tree, I have modules with functions (duh).
I also have a web API that calls functions from applications' modules.
When I stop my application (application:stop(foo).), the webserver can still call foo's functions.
I find it "not idiomatic" to not be able to have a proper circuit-breaker for the foo application.
Does it mean that every public function from foo should spawn a process under its supervisor?
Thanks,
Bastien
Not necessarily, for two reasons:
The foo application will have two kinds of functions: those that require the worker processes to be running, and those that don't (most likely pure functions). If the application is stopped, the former will obviously fail when called, while the latter will still work. As per Erlang's "let it crash" philosophy, this is just another error condition that the web server needs to handle (or not handle). If the pure functions still work, there is no reason to prohibit the web server from calling them: it means that a greater portion of the system is functional (a short sketch follows after the second point).
In an Erlang node, stopping an application is not something you'd normally do. An Erlang application declares dependencies, that is, applications that need to be running for it to function correctly. You'll notice that if you try to start an application before its dependencies, it will refuse to start. While it's possible to stop applications manually, this means that the state of the node is no longer in accordance with the assumptions of the application model. When building a "release" consisting of a set of Erlang applications, normally they would all be started as permanent applications, meaning that if any one application crashes, the entire Erlang node would exit, in order not to violate this assumption.
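To make the first point concrete, here is a hypothetical foo module (the module and worker names are made up): the first function is pure and keeps working after application:stop(foo), while the second exits with noproc once the supervised worker is gone:

    -module(foo).
    -export([parse/1, lookup/1]).

    %% Pure: needs no running process, works even when foo is stopped.
    parse(Bin) when is_binary(Bin) ->
        binary:split(Bin, <<",">>, [global]).

    %% Needs the foo_worker process started by foo's supervisor;
    %% calling this after application:stop(foo) exits with noproc.
    lookup(Key) ->
        gen_server:call(foo_worker, {lookup, Key}).

As for the second point, the start type is the second argument to application:start/2 (the default for application:start/1 is temporary, where a crash is merely reported):

    %% If foo later terminates, the whole node shuts down.
    ok = application:start(foo, permanent).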

Cowboy Message Queue Design

I have a web service written in Cowboy and I am planning to use RabbitMQ as the DB layer. My Cowboy service will be one of the producers writing to the queue, and the consumer writes to the database. A couple more asynchronous tasks will come from another service (not Cowboy).
Now the question is where these consumers should go. Should they be part of a single Erlang app, or should I create a separate Erlang app for all the consumers?
Any advice would be highly appreciated.
Since Erlang is not the exclusive producer, and since one can usually imagine consumers running without knowledge of the producers, having separate applications is not a bad idea at all. You can have multiple top-level applications in a single Erlang release (that's what the dependencies are, really), so you can always put all the code in the same repository (I usually have a top level apps/ directory for these), and if needed later on split them out to separate repos.
Having them as separate applications certainly makes it easier to decide later on to distribute the system across multiple Erlang nodes: just start the relevant producer applications on some nodes, and the consumer application on others.
So while either way will work, separate apps make for a cleaner design and keep the door open for future expansion in a slightly nicer way.

Turning on dbg tracing on a remote node?

When running our Erlang application in our system tests, I sometimes want to turn on and capture a debug trace.
The Erlang node is started using a relx start script (called as _rel/bin/foo foreground), so I don't have any control over the startup options. The system test runner (written in Python) is capturing stdout from the node.
How do I connect to an Erlang node, using -remsh, turn on dbg-tracing, and have that output written to stdout on the original node? And how do I do this all in a Python-friendly way (though I'm happy to write an escript if that'll make it easier).
To complicate this further, the relx-generated release doesn't include the runtime_tools library, so dbg isn't actually available; that's a separate problem in itself.
There are quite a few ways you could do that. It all depends on what you are familiar with and what your use case is.
I would start by doing everything by hand. That way you have the greatest control over what's going on and what the effects look like (whether you're enabling too much tracing or not enough). It's what I'm most familiar with, and in my experience you will almost always end up connecting to a remote shell and doing something by hand anyway.
One feature of dbg that not many people talk about is the ability to save and load trace patterns from files. I find it the easiest way to store and share debugging setups between sessions, though the lack of readability might be too big a trade-off.
You don't have to use dbg if you are worried about interfering with your live system too much. You could use erlang:trace, which is available by default, but you must be careful about the state you leave the VM in (dbg turns off all tracing when it exits; with erlang:trace that's your responsibility).
If your debug session is part of a Python script, writing an escript and calling it from Python would be my way to go. Just remember that an escript runs in a new VM, and -remsh will not let you simply run your code on the other VM; you will have to use the rpc module for that.
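A rough sketch of such an escript (the node name, cookie, and the traced my_mod:my_fun are all assumptions for illustration):

    #!/usr/bin/env escript
    %%! -sname dbg_client -setcookie mycookie

    main([NodeArg]) ->
        Node = list_to_atom(NodeArg),
        pong = net_adm:ping(Node),
        %% Start a tracer on the target node and trace calls to my_mod:my_fun.
        {ok, _} = rpc:call(Node, dbg, tracer, []),
        {ok, _} = rpc:call(Node, dbg, p, [all, call]),
        {ok, _} = rpc:call(Node, dbg, tpl, [my_mod, my_fun, cx]),
        %% Collect trace output for a while, then switch tracing off again.
        timer:sleep(10000),
        rpc:call(Node, dbg, stop, []).

Where the trace output ends up depends on group leaders; started this way it should flow back to the escript while it is alive, but treat that as something to verify rather than rely on.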
Since your application is released, you might look into logging instead. One might assume there is already some logging in place, quite possibly lager, which is somewhat of a standard in Erlang and which can change its logging level at runtime.
Personally I would try some mix of the first and last options, and just experiment.

mochiweb and gen_server

[This will only make sense if you've seen Kevin Smith's 'Erlang in Practice' screencasts]
I'm an Erlang noob trying to build a simple Erlang/OTP system with embedded webserver [mochiweb].
I've walked through the EIP screencasts, and I've toyed with simple mochiweb examples created using the new_mochiweb.erl script.
I'm trying to figure out how the webserver should relate to the gen_server modules. In the EIP examples [Ch7], the author creates a web_server.erl gen_server process and links the mochiweb_http process to it. However in a mochiweb project, the mochiweb_http process seems to be 'standalone'; it doesn't seem to be embedded in a separate gen_server process.
My question is, should one of these patterns be preferred over the other ? If so, why ? Or doesn't it matter ?
Thanks in advance.
You link processes to the supervisor hierarchy of your application for two reasons: 1) to be able to restart your worker processes if they crash, and 2) to be able to kill all your processes when you stop the application.
As the other answer says, 1) does not apply to processes handling HTTP requests. However, 2) is valid: if you leave your processes alone, you can't guarantee that all of them will be cleared from the VM after stopping your application (think of processes stuck in endless loops, waiting in receives, etc...).
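As an illustration, putting mochiweb_http under your own supervisor only takes a child spec; a sketch (the port, loop fun, and restart values are assumptions):

    %% In your supervisor's init/1 callback:
    init([]) ->
        Web = {mochiweb_http,
               {mochiweb_http, start,
                [[{port, 8080}, {loop, fun my_web:loop/1}]]},
               permanent, 5000, worker, [mochiweb_http]},
        {ok, {{one_for_one, 10, 10}, [Web]}}.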
The reason to embed a process in a supervision tree is so that you can restart it if it fails.
A process that handles an HTTP request is responding to an event generated externally, in a browser. It is not possible to restart it (that is the prerogative of the person running the browser), so it is not necessary to run it under OTP supervision: you can just spawn it unsupervised.

Learning Erlang? speedbump thread, common, small problems

I just want to know all the small problems that got between you and your final solution when you were new to Erlang.
For example, here are the first speedbumps I had:
Use controlling_process(Socket, Pid) if you hand sockets off to other processes, so the right packets reach the right process.
Going to start talking to another node? Remember to net_adm:ping('car@bsd-server'). in the shell first, or no communication will get through.
timer:sleep(10), if you want to do nothing for a moment. Always useful when debugging.
Learning to browse the standard documentation
Once you learn how the OTP documentation is organised it becomes much easier to find what you're looking for (you tend to need to learn which applications provide which modules or kinds of modules).
Also just browsing the documentation for applications is often quite rewarding - I've discovered lots of really useful code this way - sys, dbg, toolbar, etc.
The difference between shell erlang and module erlang
Shell Erlang is a slightly different dialect from module Erlang. You can't define module functions (only funs), you need to load record definitions in order to work with records (rr/1), and so on. Learning how to write Erlang code in terms of anonymous functions is somewhat tricky, but is essential for working on production systems with a remote shell.
Learning the interaction between the shell and {start,spawn}_link'ed processes - when you run some shell code that crashes (raises an exception), the shell process exits and will broadcast exit signals to anything you linked to. This will in turn shut down that new gen_server you're working on. ("Why does my server process keep disappearing?")
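A short shell transcript shows the dialect difference (the .hrl path and record name are hypothetical):

    1> Square = fun(X) -> X * X end.   % only funs, no named module functions
    2> Square(4).
    16
    3> rr("include/foo.hrl").          % load record definitions into the shell
    [foo_state]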
The difference between erlang expressions and guard expressions
Guard expressions (when clauses) are not Erlang expressions. They may look similar, but they're quite different. Guards cannot call arbitrary erlang functions, only guard functions (length/1, the type tests, element/2 and a few others specified in the OTP documentation). Guards succeed or fail and don't have side effects. Erlang expressions on the other hand can do what they like.
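For example (my_mod:valid/1 is a made-up function):

    %% Fine: is_list/1 and length/1 are guard BIFs.
    first_or_empty(L) when is_list(L), length(L) > 0 -> hd(L);
    first_or_empty(_) -> empty.

    %% Does not compile: arbitrary calls are not allowed in guards.
    %% check(X) when my_mod:valid(X) -> ok.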
Code loading
Working out when and how code upgrades work, and the incantation to get a gen_server to upgrade to the latest version of a callback module (code:load_file(Mod), sys:suspend(Pid), sys:change_code(Pid, Mod, undefined, undefined), sys:resume(Pid).).
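Spelled out with comments (Pid is the running gen_server, Mod its callback module):

    code:load_file(Mod),                              % load the new version of the module
    sys:suspend(Pid),                                 % stop the server handling messages
    sys:change_code(Pid, Mod, undefined, undefined),  % run Mod:code_change/3
    sys:resume(Pid).                                  % carry on running the new code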
The code server path (code:get_path/0) - I can't count how many times I ran into undefined function errors that turned out to be me forgetting to add an ebin directory to the code search path.
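The fix is usually a single line in the shell (the path is an example), or -pa at startup:

    code:add_patha("/path/to/myapp/ebin").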
Building erlang code
Working out a useful combination of emake (make:all/0 and erl -make) and GNU make took quite a long time (about three years so far :).
My current favourite makefiles can be seen at http://github.com/archaelus/esmtp/tree/master
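The erl -make side of that is driven by an Emakefile; a minimal one (the directory layout is an assumption) looks like:

    %% Compile every module under src/ into ebin/, with debug info.
    {'src/*', [debug_info, {outdir, "ebin"}, {i, "include"}]}.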
Erlang distribution
Getting node names, DNS, cookies and all the rest right in order to be able to net_adm:ping/1 the other node. This takes practice.
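The minimum needed to get a pong instead of a pang: both nodes must use the same name style (short or long) and the same cookie (names, host and cookie here are examples):

    $ erl -sname a -setcookie secret
    $ erl -sname b -setcookie secret
    (a@myhost)1> net_adm:ping('b@myhost').
    pong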
Remote shell IO intricacies
Remembering to pass group_leader() to io:format calls run on the remote node so that the output appears in your shell rather than mysteriously disappearing. (I think the SASL report browser rb still has a problem with sending some of its output to the wrong node when used over a remote shell connection.)
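The trick itself is short: capture your shell's group leader and pass it explicitly as the io device to code running on the other node (RemoteNode is a placeholder):

    GL = group_leader(),
    spawn(RemoteNode, fun() ->
        %% With an explicit device, the output lands in *your* shell.
        io:format(GL, "output from ~p~n", [node()])
    end).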
Integrating it into MSVC 6, so I could use the editor and see the results in the output window.
I created a tool with:
Command: path to erlc
Arguments: +debug_info $(FileName)$(FileExt)
Initial Directory: $(fileDir)
and checked "Use Output Window".
Debugging is hard. All I know to do is to stick calls to "error_logger:info_msg" in my code.
Docs have been spotty: they're correct, but very, very terse.
This is my own fault, but: I started coding before I understood eunit, so a lot of my code is harder to test than it should be.
The thing that took me the most time to get my head around was just the idea of structuring my code entirely around function calls and message passing. The rest of it either just fell out from there (spawning, remote nodes) or felt like the usual stuff you've got to learn in any new language (syntax, stdlib).
