I'm looking For help in this matter, what options do I have if I want to sandbox the execution of commands that are typed in a website? I would like to create an online interpreter for a programming language.
I've been looking at docker, how would I use it? Is this the best option?
codecube.io does this. It's open source: https://github.com/hmarr/codecube
The author wrote up his rationale and process. Here's how the system works:
A user types some code in to a box on the website, and specifies the language the code is written in
They click “Run”, the code is POSTed to the server
The server writes the code to a temporary directory, and boots a docker container with the temporary directory mounted
The container runs the code in the mounted directory (how it does this varies according to the code’s language)
The server tails the logs of the running container, and pushes them down to the browser via server-sent events
The code finishes running (or is killed if it runs for too long), and the server destroys the container
The Docker container's entrypoint is entrypoint.sh, which inside a container runs:
prog=$1
<...create user and set permissions...>
sudo -u codecube /bin/bash /run-code.sh $prog
Then run-code.sh checks the extension and runs the relevant compiler or interpreter:
extension="${prog##*.}"
case "$extension" in
"c")
gcc $prog && ./a.out
;;
"go")
go run $prog
;;
<...cut...>
The server that accepts the code examples from the web, and orchestrates the Docker containers was written in Go. Go turned out to be a pretty good choice for this, as much of the server relied on concurrency (tailing logs to the browser, waiting for containers to die so cleanup could happen), which Go makes joyfully simple.
The author also details how he implemented resource limiting, isolation and thoughts of security.
Related
I have implemented the LAMP stack for a 3rd party forum application on its own dedicated virtual server. One of my aims here was to use a composed docker project (under Git) to encapsulate the application fully. I wanted to keep this as simple to understand as possible for the other sysAdmins supporting the forum, so this really ruled out using S6 etc., and this in turn meant that I had to stick to the standard of one container per daemon service using the docker runtime to do implement the daemon functionality.
I had one particular design challenge that doesn't seem to be addressed cleanly through the Docker runtime system, and that is I need to run periodic housekeeping activities that need to interact across various docker containers, for example:
The forum application requires a per-minute PHP housekeeping task to be run using php-cli, and I only have php-cli and php-fpm (which runs as the foreground deamon process) installed in the php container.
Letsencrypt certificate renewal need a weekly certbot script to be run in the apache container's file hierarchy.
I use conventional /var/log based logging for high-volume Apache access logs as these generate Gb access files that I want to retain for ~7 days in the event of needing to do hack analysis, but that are otherwise ignored.
Yes I could use the hosts crontab to run docker exec commands but this involves exposing application internals to the host system and IMO this is breaking one of my design rules. What follows is my approach to address this. My Q is really to ask for comments and better alternative approaches, and if not then this can perhaps serve as a template for others searching for an approach to this challenge.
All of my containers contain two special to container scripts: docker-entrypoint.sh which is a well documented convention; docker-service-callback.sh which is the action mechanism to implement the tasking system.
I have one application agnostic host service systemctl: docker-callback-reader.service which uses this bash script, docker-callback-reader. This services requests on a /run pipe that is volume-mapped into any container that need to request such event processes.
In practice I have only one such housekeeping container see here that implements Alpine crond and runs all of the cron-based events. So for example the following entry does the per-minute PHP tasking call:
- * * * * echo ${VHOST} php task >/run/host-callback.pipe
In this case the env variable VHOST identifies the relevant docker stack, as I can have multiple instances (forum and test) running on the server; the next parameter (php in this case) identifies the destination service container; the final parameter (task) plus any optional parameters are passed as arguments to a docker exec of php containers docker-service-callback.sh and magic happens as required.
I feel that the strengths of the system are that:
Everything is suitably encapsulated. The host knows nothing of the internals of the app other than any receiving container must have a docker-service-callback.sh on its execution path. The details of each request are implemented internally in the executing container, and are hidden from the tasking container.
The whole implementation is simple, robust and has minimal overhead.
Anyone is free to browse my Git repo and cherry-pick whatever of this they wish.
Comments?
I have a development use-case where I use a script to configure a shell with docker-machine or other environment and then open a directory containing source and settings (/.vscode/, .devcontainer/) that I can edit/build/debug in the VS code Remote Containers extension.
In short, I'm looking to implement the following sequence when a "start-development.sh" script/hook runs:
Set up host-side env or remote resources (reverse sshfs to mount source to a remote docker-machine, modprobe, docker buildx, xhost for x-passthrough, etc.)
Run VS Code in that shell so settings aren't thrown away with a specified directory (may be mounted via sshfs or other means) in container, not just open on the host
Run cleanup scripts to clean-up and/or destroy real resources (unmount, modprobe -r, etc.) when the development container is stopped (by either closing VS Code or rebuilding the container).
See this script for a simple example of auto-configuring a shell with an AWS instance via docker-machine. I'll be adding a few more examples to this repository over the coming day or so.
It's easy enough to open VS Code in that directory (code -w -n --folder-uri /path/here) and wait for it to quit (so I can perform cleanup steps like taking down the remote docker-machine, un-mounting reverse-sshfs mounted code or disabling kernel mods I use for development, etc.).
However, VS code currently opens in "host mode" and when I choose "Reopen in container" or "Rebuild container" via the UI or command palette, it kills that process and opens another top-level(?) process, quitting the shell & throwing away my configuration and/or prematurely running my cleanup portion of the script so it has the wrong env. when it finally launches in-container. Sadness.
So finally, my question is:
Is there a way to tell VS code to open a folder "in-container"? This would solve a ton of problems for me, instead of a janky dev. cycle where I have to ensure that the code instance isn't restarting itself and messing things up - whenever I rebuild the container, for example.
Alternatively, it'd be great to not quit the top-level code process I started altogether, enabling me to wait on that, or perhaps monitor it in other ways I'm not aware of to prevent erasure of my settings and premature run of my cleanup script?
Thanks in advance!
PS: Please read the entire question before flagging it as "not related to development". If the idea of a zero-install development environment for a complex native project, live on-device development/debugging or deep learning using cloud instances with giant GPUs for Docker where you don't have to manually manage everything and write pages of readmes appeals to you - this is very much about programming.
After all weekend of trying different things, I finally figured it out! The key was this section in the awesome articles about advanced container configuration.
I put that into a bash script and used jq to merge docker.host and other docker env settings into .vscode/settings.json. See this example here.
After running a script that generates this file, the user will only need to reload/relaunch VS code in that workspace folder (where the settings were created) and yay, everything works as expected.
I plan to add some actual samples now that I have the basics working. Unfortunately, I had to separate my create and teardown as separate activate and deactivate hooks. Not a bad workflow still, IMO.
That container is built when deploying the application.
Looks like its purpose is to share dependencies across modules.
It looks like it is started as a container but nothing is apparently running, a bit like an init container.
Console says it starts/stops that component when using respective wolkenkit start and wolkenkit stop command.
On startup:
On shutdown:
When you docker ps, that container cannot be found:
Can someone explain these components?
When starting a wolkenkit application, the application is boxed in a number of Docker containers, and these containers are then started along with a few other containers that provide the infrastructure, such as databases, a message queue, ...
The reason why the application is split into several Docker containers is because wolkenkit builds upon the CQRS pattern, which suggests separating the read side of an application from the application's write side, and hence there is one container for the read side, and one for the write side (actually there are a few more, but you get the picture).
Now, since you may develop on an operating system other than Linux, the wolkenkit application may run under a different operating system than when you develop it, as within Docker it's always Linux. This means that the start command can not simply copy over the node_modules folder into the containers, as they may contain binary modules, which are then not compatible (imagine installing on Windows on the host, but running on Linux within Docker).
To avoid issues here, wolkenkit runs an npm install when starting the application inside of the containers. The problem now is that if wolkenkit did this in every single container, the start would be super slow (it's not the fastest thing on earth anyway, due to all the Docker building and starting that's happening under the hood). So wolkenkit tries to optimize this as much as possible.
One concept here is to run npm install only once, inside of a container of its own. This is the node-modules container you encountered. This container is then linked as a volume to all the containers that contain the application's code. This way you only have to run npm install once, but multiple containers can use the outcome of this command.
Since this container now contains data, but no code, it only has to be there, it doesn't actually do anything. This is why it gets created, but is not run.
I hope this makes it a little bit clearer, and I was able to answer your question :-)
PS: Please note that I am one of the core developers of wolkenkit, so take my answer with a grain of salt.
I am using Mongooseim 3.2.0 from the source code on the ubuntu server. Below are concern:
What is the best way to run mongooseim as a service so that it automatically restarts if mongooseim crashes or system restarts?
How to interact via terminal with already running mongooseim instance on the ubuntu server like "mongooseimctl live". My guess is running "mongooseimctl live" will try to create another instance. I just want to see the live logs and interaction and don't want to keep scrolling the long log files for this purpose.
I apologize if the answer to above is obvious but just want to follow the best guidance.
mongooseimctl live or mongooseimctl foreground is mostly useful for development or smoke testing a deployment (unless you're running inside a container). For real world use cases you should start the server in the background with mongooseimctl start.
Back to the container - the best approach for containerised applications is to run them in the foreground, therefore in a container startup script use mongooseimctl foreground.
Once the server is running (no matter how it was started) attaching a shell to troubleshoot issues can be done with mongooseimctl debug. This is the command to use when you get the Protocol 'inet_tcp': the name mongooseim#localhost seems to be in use by another Erlang node error. Be careful if it's a production environment - you can easily take the server down with access to this shell.
If you're just interested in watching logs, with no interactive access to the server internals that the shell offers, a simple tail -f /your-configured-mongooseim-log-dir/* should be enough.
Ubuntu nowadays uses systemd for managing its services' lifetimes. A systemd .service file can be found at https://github.com/esl/MongooseIM/blob/master/tools/pkg/platforms/debian_stretch/files/build/mongooseim.service - we use it for packaging into Debian/Ubuntu .deb packages.
I'm running a program that uses several docker images and containers and it's all spawned and managed by the code. At the same time, I need to enter into the docker exec -it cli bash and execute some commands. These commands however cant be manual and must be made into an api. After extensive searching the closest thing I found is docker remote api [https://blog.trifork.com/2013/12/24/docker-from-a-distance-the-remote-api]. However, I'm a bit scared messing with the internals of docker. I want the spawning and management to remain controlled by the program. I only need to run a limited number of commands to docker cli. Is docker remote api the right way to go? Will it handle scale- my application may see ~27000 mobile and webapps use/call the apis from different parts of the world. Tried and tested solutions would be preferred.
Any advice would be highly appreciated.
There’s not an easy answer to this. Since you include “safest” in the question title, I will suggest you probably need to do some redesign of your application architecture.
The first critical detail is this: being able to run any Docker command, or access the Docker API, implies unrestricted root access on the host. You can trivially docker run an image with writeable root-level access to the host’s filesystem and steal public keys, user passwords, give yourself sudo access, and so on. Using it as a core part of your workflow is incredibly dangerous. Turning on the Docker remote API at all is incredibly dangerous.
As a corollary to this, while docker exec is handy as a debugging tool, you can’t really use it as part of your core workflow. As you note running commands by hand as a trusted administrator doesn’t scale. There are also dangers in shell quoting: you need to make sure an argument doesn’t look like foo; docker run -v/:/host ... and inadvertently gain access to the host system.
In my mind your only real option here is to do this “properly”. Take whatever administrative commands you need to do and wrap them in some API, probably HTTP-based. Build a new service (or several) and add it to your Docker deployment. Maybe under the hood that launches a shell script as a subprocess, but the API wrapper has control over the arguments and can double-check things. The plus side is that this approach probably won’t be a choke point if your application does need to scale out.