How does /dev work in Docker? - docker

I had some issues with /dev/random and I replaced it with /dev/urandom on some of my servers:
lrwxrwxrwx 1 root root 12 Jul 22 21:04 /dev/random -> /dev/urandom
I since have replaced some of my infrastructure with Docker. I thought, it would be sufficient to replace /dev/random on my host machine. But when starting the service, I quickly noticed that some RNG operations blocked because it used the original implementation of /dev/random.
I wrote a little test program to proof this behavior:
package main
import (
"fmt"
"os"
)
func main() {
f, err := os.Open("/dev/random")
if err != nil {
panic(err)
}
chunk := make([]byte, 1024)
index := 0
for {
index = index + 1
_, err := f.Read(chunk)
if err != nil {
panic(err)
}
fmt.Println("iteration ", index)
}
}
Executing this program on the host machine (which has the symlink) works as expected - it will not block and run until I shut it down.
When running this in a container, it will block after the first iteration (at least on my machine).
I can obviously fix this problem by mounting my random file into the container:
docker run -it --rm -v /dev/random:/dev/random bla
But this is not the point of this question. I want to know, the following:
How does Docker set up the devices listed in /dev
Why is it not just using (some) of the device files of the host machine.

Docker never uses any of the host system's filesystem unless you explicitly instruct it to. The device files in /dev are whatever (if anything) is baked into the image.
Also note that the conventional Unix device model is that a device "file" is either a character- or block-special device, and a pair of major and minor device numbers, and any actual handling is done in the kernel. Relatedly, the host's kernel is shared across all Docker containers. So if you fix your host's broken implementation of /dev/random then your containers will inherit this fix too; but merely changing what's in the host's /dev will have no effect.

Related

How to use io reader client

I would like to copy a zipped file from host machine to a container using go code running inside a container. The setup has go code running in a container with docker.sock mounted. The idea is to copy zip file from host machine to the container that runs go code. The path parameter is on the host machine. On host machine command line looks like this
docker cp hostFile.zip myContainer:/tmp/
The documentation for docker-client CopyToContainer looks
func (cli *Client) CopyToContainer(ctx context.Context, containerID, dstPath string, content io.Reader, options types.CopyToContainerOptions) error
How to create content io.Reader argument ?
cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
if err != nil {
panic(err)
}
// TODO
// reader := io.Reader()
// reader := file.NewReader()
// tar.NewReader()
cli.CopyToContainer(context.Background(), containerID, dst, reader, types.CopyToContainerOptions{
AllowOverwriteDirWithFile: true,
CopyUIDGID: true,
})
There are a huge variety of things that implement io.Reader. In this case the normal way would be to open a file with os.Open, and then the resulting *os.File pointer is an io.Reader.
As you note in comments, though, this only helps you read and write things from your "local" filesystem. Having access to the host's Docker socket is super powerful but it doesn't directly give you read and write access to the host filesystem. (As #mkopriva suggests in a comment, launching your container with a docker run -v /host/path:/container/path bind mount is much simpler and avoids the massive security problem I'm about to discuss.)
What you need to do instead is launch a second container that bind-mounts the content you need, and read the file out of the container. It sounds like you're trying to write it into the local filesystem, which simplifies things. From a docker exec shell prompt inside the container you might do something like
docker run --rm -v /:/host busybox cat /host/some/path/hostFile.zip \
> /tmp/hostFile.zip
In Go it's more involved but still very doable (untested, imports omitted)
ctx := context.Background()
cid, err := client.ContainerCreate(
ctx,
&container.Config{
Image: "docker.io/library/busybox:latest",
Cmd: strslice.StrSlice{"cat", "/host/etc/shadow"},
},
&container.HostConfig{
Mounts: []mount.Mount{
{
Type: mount.TypeBind,
Source: "/",
Target: "/host",
},
},
},
nil,
nil,
""
)
if err != nil {
return err
}
defer client.ContainerRemove(ctx, cid.ID, &types.ContainerRemoveOptions{})
rawLogs, err := client.ContainerLogs(
ctx,
cid.ID,
types.ContainerLogsOptions{ShowStdout: true},
)
if err != nil {
return err
}
defer rawLogs.close()
go func() {
of, err := os.Create("/tmp/host-shadow")
if err != nil {
panic(err)
}
defer of.Close()
_ = stdcopy.StdCopy(of, io.Discard, rawLogs)
}()
done, cerr := client.ContainerWait(ctx, cid.ID, container. WaitConditionNotRunning)
for {
select {
case err := <-cerr:
return err
case waited := <-done:
if waited.Error != nil {
return errors.New(waited.Error.Message)
} else if waited.StatusCode != 0 {
return fmt.Errorf("cat container exited with status code %v", waited.StatusCode)
} else {
return nil
}
}
}
As I hinted earlier and showed in the code, this approach bypasses all controls on the host; I've decided to read back the host's /etc/shadow encrypted password file because I can, and nothing would stop me from writing it back with my choice of root password using basically the same approach. Owners, permissions, and anything else don't matter: the Docker daemon runs as root and most containers run as root by default (and you can explicitly request it if not).

Why is my containerized Selenium application failing only in AWS Lambda?

I'm trying to get a function to run in AWS Lambda that uses Selenium and Firefox/geckodriver in order to run. I've decided to go the route of creating a container image, and then uploading and running that instead of using a pre-configured runtime. I was able to create a Dockerfile that correctly installs Firefox and Python, downloads geckodriver, and installs my test code:
FROM alpine:latest
RUN apk add firefox python3 py3-pip
RUN pip install requests selenium
RUN mkdir /app
WORKDIR /app
RUN wget -qO gecko.tar.gz https://github.com/mozilla/geckodriver/releases/download/v0.28.0/geckodriver-v0.28.0-linux64.tar.gz
RUN tar xf gecko.tar.gz
RUN mv geckodriver /usr/bin
COPY *.py ./
ENTRYPOINT ["/usr/bin/python3","/app/lambda_function.py"]
The Selenium test code:
#!/usr/bin/env python3
import util
import os
import sys
import requests
def lambda_wrapper():
api_base = f'http://{os.environ["AWS_LAMBDA_RUNTIME_API"]}/2018-06-01'
response = requests.get(api_base + '/runtime/invocation/next')
request_id = response.headers['Lambda-Runtime-Aws-Request-Id']
try:
result = selenium_test()
# Send result back
requests.post(api_base + f'/runtime/invocation/{request_id}/response', json={'url': result})
except Exception as e:
# Error reporting
import traceback
requests.post(api_base + f'/runtime/invocation/{request_id}/error', json={'errorMessage': str(e), 'traceback': traceback.format_exc(), 'logs': open('/tmp/gecko.log', 'r').read()})
raise
def selenium_test():
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
options = Options()
options.add_argument('-headless')
options.add_argument('--window-size 1920,1080')
ffx = Firefox(options=options, log_path='/tmp/gecko.log')
ffx.get("https://google.com")
url = ffx.current_url
ffx.close()
print(url)
return url
def main():
# For testing purposes, currently not using the Lambda API even in AWS so that
# the same container can run on my local machine.
# Call lambda_wrapper() instead to get geckodriver logs as well (not informative).
selenium_test()
if __name__ == '__main__':
main()
I'm able to successfully build this container on my local machine with docker build -t lambda-test . and then run it with docker run -m 512M lambda-test.
However, the exact same container crashes with an error when I try and upload it to Lambda to run. I set the memory limit to 1024M and the timeout to 30 seconds. The traceback says that Firefox was unexpectedly killed by a signal:
START RequestId: 52adeab9-8ee7-4a10-a728-82087ec9de30 Version: $LATEST
/app/lambda_function.py:29: DeprecationWarning: use service_log_path instead of log_path
ffx = Firefox(options=options, log_path='/tmp/gecko.log')
Traceback (most recent call last):
File "/app/lambda_function.py", line 45, in <module>
main()
File "/app/lambda_function.py", line 41, in main
lambda_wrapper()
File "/app/lambda_function.py", line 12, in lambda_wrapper
result = selenium_test()
File "/app/lambda_function.py", line 29, in selenium_test
ffx = Firefox(options=options, log_path='/tmp/gecko.log')
File "/usr/lib/python3.8/site-packages/selenium/webdriver/firefox/webdriver.py", line 170, in __init__
RemoteWebDriver.__init__(
File "/usr/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
self.start_session(capabilities, browser_profile)
File "/usr/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/usr/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status signal
END RequestId: 52adeab9-8ee7-4a10-a728-82087ec9de30
REPORT RequestId: 52adeab9-8ee7-4a10-a728-82087ec9de30 Duration: 20507.74 ms Billed Duration: 21350 ms Memory Size: 1024 MB Max Memory Used: 131 MB Init Duration: 842.11 ms
Unknown application error occurred
I had it upload the geckodriver logs as well, but there wasn't much useful information in there:
1608506540595 geckodriver INFO Listening on 127.0.0.1:41597
1608506541569 mozrunner::runner INFO Running command: "/usr/bin/firefox" "--marionette" "-headless" "--window-size 1920,1080" "-foreground" "-no-remote" "-profile" "/tmp/rust_mozprofileQCapHy"
*** You are running in headless mode.
How can I even begin to debug this? The fact that the exact same container behaves differently depending upon where it's run seems fishy to me, but I'm not knowledgeable enough about Selenium, Docker, or Lambda to pinpoint exactly where the problem is.
Is my docker run command not accurately recreating the environment in Lambda? If so, then what command would I run to better simulate the Lambda environment? I'm not really sure where else to go from here, seeing as I can't actually reproduce the error locally to test with.
If anyone wants to take a look at the full code and try building it themselves, the repository is here - the lambda code is in lambda_function.py.
As for prior research, this question a) is about ChromeDriver and b) has no answers from over a year ago. The link from that one only has information about how to run a container in Lambda, which I'm already doing. This answer is almost my problem, but I know that there's not a version mismatch because the container works on my laptop just fine.
I have exactly the same problem and a possible explanation.
I think what you want is not possible for the time being.
According to AWS DevOps Blog Firefox relies on fallocate system call and /dev/shm.
However AWS Lambda does not mount /dev/shm so Firefox will crash when trying to allocate memory. Unfortunately, this handling cannot be disabled for Firefox.
However if you can live with Chromium, there is an option for chromedriver --disable-dev-shm-usage that disables the usage of /dev/shm and instead writes shared memory files to /tmp.
chromedriver works fine for me on AWS Lambda, if that is an option for you.
According to AWS DevOps Blog you can also use AWS Fargate to run Firefox/geckodriver.
There is an entry in the AWS forum from 2015 that requests mounting /dev/shm in Lambdas, but nothing happened since then.

Is this a bug of spark stream or memory leak?

I submit my code to a spark stand alone cluster. Submit command is like below:
nohup ./bin/spark-submit \
--master spark://ES01:7077 \
--executor-memory 4G \
--num-executors 1 \
--total-executor-cores 1 \
--conf "spark.storage.memoryFraction=0.2" \
./myCode.py 1>a.log 2>b.log &
I specify the executor use 4G memory in above command. But use the top command to monitor the executor process, I notice the memory usage keeps growing. Now the top Command output is below:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12578 root 20 0 20.223g 5.790g 23856 S 61.5 37.3 20:49.36 java
My total memory is 16G so 37.3% is already bigger than the 4GB I specified. And it is still growing.
Use the ps command , you can know it is the executor process.
[root#ES01 ~]# ps -awx | grep spark | grep java
10409 ? Sl 1:43 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms4G -Xmx4G -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip ES01 --port 7077 --webui-port 8080
10603 ? Sl 6:16 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms4G -Xmx4G -XX:MaxPermSize=256m org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://ES01:7077
12420 ? Sl 10:16 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.SparkSubmit --master spark://ES01:7077 --conf spark.storage.memoryFraction=0.2 --executor-memory 4G --num-executors 1 --total-executor-cores 1 /opt/flowSpark/sparkStream/ForAsk01.py
12578 ? Sl 21:03 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms4096M -Xmx4096M -Dspark.driver.port=52931 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler#10.79.148.184:52931 --executor-id 0 --hostname 10.79.148.184 --cores 1 --app-id app-20160511080701-0013 --worker-url spark://Worker#10.79.148.184:52660
Below are the code. It is very simple so I do not think there is memory leak
if __name__ == "__main__":
dataDirectory = '/stream/raw'
sc = SparkContext(appName="Netflow")
ssc = StreamingContext(sc, 20)
# Read CSV File
lines = ssc.textFileStream(dataDirectory)
lines.foreachRDD(process)
ssc.start()
ssc.awaitTermination()
The code for process function is below. Please note that I am using HiveContext not SqlContext here. Because SqlContext do not support window function
def getSqlContextInstance(sparkContext):
if ('sqlContextSingletonInstance' not in globals()):
globals()['sqlContextSingletonInstance'] = HiveContext(sparkContext)
return globals()['sqlContextSingletonInstance']
def process(time, rdd):
if rdd.isEmpty():
return sc.emptyRDD()
sqlContext = getSqlContextInstance(rdd.context)
# Convert CSV File to Dataframe
parts = rdd.map(lambda l: l.split(","))
rowRdd = parts.map(lambda p: Row(router=p[0], interface=int(p[1]), flow_direction=p[9], bits=int(p[11])))
dataframe = sqlContext.createDataFrame(rowRdd)
# Get the top 2 interface of each router
dataframe = dataframe.groupBy(['router','interface']).agg(func.sum('bits').alias('bits'))
windowSpec = Window.partitionBy(dataframe['router']).orderBy(dataframe['bits'].desc())
rank = func.dense_rank().over(windowSpec)
ret = dataframe.select(dataframe['router'],dataframe['interface'],dataframe['bits'], rank.alias('rank')).filter("rank<=2")
ret.show()
dataframe.show()
Actually I found below code will cause the problem:
# Get the top 2 interface of each router
dataframe = dataframe.groupBy(['router','interface']).agg(func.sum('bits').alias('bits'))
windowSpec = Window.partitionBy(dataframe['router']).orderBy(dataframe['bits'].desc())
rank = func.dense_rank().over(windowSpec)
ret = dataframe.select(dataframe['router'],dataframe['interface'],dataframe['bits'], rank.alias('rank')).filter("rank<=2")
ret.show()
Because If I remove these 5 line. The code can run all night without showing memory increase. But adding them will cause the memory usage of executor grow to a very high number.
Basically the above code is just some window + grouby in SparkSQL. So is this a bug?
Disclaimer: this answer isn't based on debugging, but more on observations and the documentation Apache Spark provides
I don't believe that this is a bug to begin with!
Looking at your configurations, we can see that you are focusing mostly on the executor tuning, which isn't wrong, but you are forgetting the driver part of the equation.
Looking at the spark cluster overview from Apache Spark documentaion
As you can see, each worker has an executor, however, in your case, the worker node is the same as the driver node! Which frankly is the case when you run locally or on a standalone cluster in a single node.
Further, the driver takes 1G of memory by default unless tuned using spark.driver.memory flag. Furthermore, you should not forget about the heap usage from the JVM itself, and the Web UI that's been taken care of by the driver too AFAIK!
When you delete the lines of code you mentioned, your code is left without actions as map function is just a transformation, hence, there will be no execution, and therefore, you don't see memory increase at all!
Same applies on groupBy as it is just a transformation that will not be executed unless an action is being called which in your case is agg and show further down the stream!
That said, try to minimize your driver memory and the overall number of cores in spark which is defined by spark.cores.max if you want to control the number of cores on this process, then cascade down to the executors. Moreover, I would add spark.python.profile.dump to your list of configuration so you can see a profile for your spark job execution, which can help you more with understanding the case, and to tune your cluster more to your needs.
As I can see in your 5 lines, maybe the groupBy is the issue , would you try with reduceBy, and see how it performs.
See here and here.

lot of temp magick files created in temporary folder

I am using the imagick library to resizing and cropping the images in a http handler. Which doesn't write anything in /tmp folder. But as i can a lot of these files are being created in that folder and it's size is growing on a daily basis and currently it's consuming around 90% of partition size. Moreover, i am not able to read their content also. So, why these files are being created and would they be delete after a while or do i need to manually delete them.
-rw------- 1 ubuntu ubuntu 16M Feb 22 09:46 magick-d6ascKJV
-rw------- 1 ubuntu ubuntu 16M Feb 22 09:46 magick-46ccZIfq
-rw------- 1 ubuntu ubuntu 1.8M Feb 22 09:47 magick-vUET7vyh
-rw------- 1 ubuntu ubuntu 1.8M Feb 22 09:47 magick-OLkGWTX8
-rw------- 1 ubuntu ubuntu 15M Feb 22 09:48 magick-LNMV7YvE
-rw------- 1 ubuntu ubuntu 16M Feb 22 09:49 magick-0LMYt6Kc
-rw------- 1 ubuntu ubuntu 16M Feb 22 09:50 magick-ceNxX5CY
-rw------- 1 ubuntu ubuntu 16M Feb 22 09:50 magick-nQ1M3y6I
Edit :
I am not using the following two lines in my http Handler. Reason being i couldn't find any explanation to do so. Moreover, the go http handler works fine. So, what is the purpose of these statements?
imagick.Initialize()
defer imagick.Terminate()
I am assuming there would be some reason to include them in the code. So, in go http handler. Where it should be included inside func main() or inside serveHTTP ?
func main() {
myMux := http.NewServeMux()
myMux.HandleFunc("/", serveHTTP)
if err := http.ListenAndServe(":8085", myMux); err != nil {
logFatal("Error when starting or running http server: %v", err)
}
}
func serveHTTP(w http.ResponseWriter, r *http.Request) {
}
Temporary / intermediate files should be automatically cleaned-up by the calling ImageMagick thread. The pattern-behavior you are describing hints at a bug in the application using ImageMagick.
I would suggest...
Check error logs of application.
Evaluate error reporting of calling code.
Ensure proper imagick.Initialize & imagick.Terminate routines are called
And if nothing else works, use the environment variable MAGICK_TMPDIR to control where the imagemagick artifacts are written to.
Update
The imagick.Initialize wraps the underlying MagickCoreGenesis, and imagick.Terminate the MagickCoreTerminus routine. They are important for managing the environment in which ImageMagick operates. They should be called on the worker thread that will handle any ImageMagick tasks, so in your case, serveHTTP method. BUT it would not be optimal to perform such routines with every HTTP request, and an additional condition should be evaluated -- if possible.
func serveHTTP(w http.ResponseWriter, r *http.Request) {
// if request will do image work
imagick.Initialize()
defer imagick.Terminate()
// ... image methods ...
// end if
}
You can set a cron which runs in a day or after an hour cleaning up your temp folder for all temporary magick files. you can use this on line command to delete all magick files as well
sudo find /tmp/ -name "magick-*" -type f -delete
Actually it should be included in the func main() not in the
func serveHTTP(). It is meant to be called once for a long running application and it solved my problem.
func main() {
imagick.Initialize()
defer imagick.Terminate()
myMux := http.NewServeMux()
myMux.HandleFunc("/", serveHTTP)
if err := http.ListenAndServe(":8085", myMux); err != nil {
logFatal("Error when starting or running http server: %v", err)
}
}
func serveHTTP(w http.ResponseWriter, r *http.Request) {
}

Kdump doesn't executed on Kernel Panic

I'am trying to get kdump work an a Debian Squeeze.
I've made a dump kernel with this config
.config
The normal Kernel gets started in grub with the option crashkernel=128M
kdump tells me that the dump kernel is loaded and ready for dumping.
The kexec command that is executed by kdump also seems to work.
kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-2.6.32-kdump root=UUID=ee9dbc96-599a-4c55-b20c-2bc2b2301581 rw 1 irqpoll maxcpus=1 reset_devices" --initrd=/boot/initrd.img-2.6.32-kdump /boot/vmlinuz-2.6.32-kdump
If i use -f instead of -p the dump kernel is loaded.
But when i trigger a crashdump with sysrq nothing happens.
Did you enable the config macro CONFIG_KEXEC? Can you add prints at the panic() function in the kernel/panic.c file and check?

Resources