I have a KStreams application running inside a Docker container which uses a persistent key-value store. My runtime environment is Docker 1.13.1 on RHEL 7.
I have configured state.dir with a value of /tmp/kafka-streams (which is the default).
When I start this container using "docker run", I mount a directory on my host machine, say for example /mnt/storage/kafka-streams, to /tmp/kafka-streams inside the container.
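The docker run invocation looks roughly like this (the image name is just a placeholder):

docker run -v /mnt/storage/kafka-streams:/tmp/kafka-streams <my-image>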
My application.id is "myapp". I have 288 partitions in my input topic, which means my state store / changelog topic will also have that many partitions. Accordingly, when I start my Docker container, I see a folder for each partition, named 0_1, 0_2 ... 0_288, under /mnt/storage/kafka-streams/myapp/.
When I shut down my application, I do not see any .checkpoint file in any of the partition directories.
And when I restart my application, it starts fetching the records from the changelog topic rather than reading from local disk. I suspect this is because there is no .checkpoint file in any of the partition directories. (Note: I can see the .lock file and the rocksdb sub-directory inside the partition directories.)
This is what I see in the startup log. It seems to be bootstrapping the entire state store from the changelog topic, i.e. performing network I/O rather than reading what is already on disk:
2022-05-31T12:08:02.791 [mtx-caf-f6900c0a-50ca-43a0-8a4b-95eaad9e5093-StreamThread-122] WARN o.a.k.s.p.i.ProcessorStateManager - MSG=stream-thread [myapp-f6900c0a-50ca-43a0-8a4b-95eaad9e5093-StreamThread-122] task [0_170] State store MyAppRecordStore did not find checkpoint offsets while stores are not empty, since under EOS it has the risk of getting uncommitted data in stores we have to treat it as a task corruption error and wipe out the local state of task 0_170 before re-bootstrapping
2022-05-31T12:08:02.791 [myapp-f6900c0a-50ca-43a0-8a4b-95eaad9e5093-StreamThread-122] WARN o.a.k.s.p.internals.StreamThread - MSG=stream-thread [mtx-caf-f6900c0a-50ca-43a0-8a4b-95eaad9e5093-StreamThread-122] Detected the states of tasks [0_170] are corrupted. Will close the task as dirty and re-create and bootstrap from scratch.
org.apache.kafka.streams.errors.TaskCorruptedException: Tasks [0_170] are corrupted and hence needs to be re-initialized
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.initializeStoreOffsetsFromCheckpoint(ProcessorStateManager.java:254)
at org.apache.kafka.streams.processor.internals.StateManagerUtil.registerStateStores(StateManagerUtil.java:109)
at org.apache.kafka.streams.processor.internals.StreamTask.initializeIfNeeded(StreamTask.java:216)
at org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:433)
at org.apache.kafka.streams.processor.internals.StreamThread.initializeAndRestorePhase(StreamThread.java:849)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:731)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:583)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:556)
Should I expect to see a .checkpoint file in each of the partition directories under /mnt/storage/kafka-streams/myapp/ when I shut down my application?
Is this an issue because I am running my KStreams app inside a Docker container? If it were a permissions issue, I would have expected to see problems creating the other files as well, such as .lock or the rocksdb folder (and its contents).
If I run this application as a standalone/runnable Spring Boot JAR on my Windows laptop, i.e. not in a Docker container, I can see that it creates the .checkpoint file as expected.
My Java application inside the Docker container is run via an entrypoint script. It turns out that when I stop the container, the script does not pass the TERM signal on to my Java process, so the Kafka Streams application never gets a clean shutdown.
So all I needed to do was find a way to deliver a TERM signal to my Java application inside the container.
For the moment, I just ssh'ed into the container and ran kill -s TERM <pid> for my Java process.
Once I did that, it resulted in a clean shutdown and the .checkpoint files were created as well.
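A more permanent fix is to make the entrypoint script exec the Java process, so that it becomes PID 1 and receives the SIGTERM that docker stop sends. A minimal sketch (the script and JAR names are placeholders, not my actual ones):

#!/bin/sh
# entrypoint.sh - do any setup here, then hand over to the JVM.
# "exec" replaces the shell with the Java process, so the JVM becomes PID 1,
# receives the SIGTERM from "docker stop", and Kafka Streams can shut down
# cleanly and write its .checkpoint files.
exec java -jar /app/myapp.jar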
Related
I'm running my development environment in Docker containers. Since I did some updates, I'm now experiencing difficulties when trying to rebuild the project that runs in my Docker container.
My project is running in a Windows Server Core Docker container running IIS, and I'm running the project from a shared volume on my host. I'm able to build the project before starting the docker container, but after the docker container is started the build fails with the following error:
Could not copy "C:\path\to\dll\name.dll" to "bin\name.dll". Exceeded retry count of 10. Failed. The file is locked by: "vmwp.exe (22604), vmmem (10488)"
It seems that the Hyper-V process is locking the DLL files. This clearly wasn't the case before and this seems to be related to some Docker or Windows updates I have done. How can I solve this issue? Do I need to change the process of building the application and running it in my Docker containers?
I have been searching for a while now, and I can't find much about this specific issue. Any help would be appreciated. Thanks in advance!
I've run into a similar problem. I solved it by stopping/removing the running application container from the Docker for Windows interface; docker rm -f will also do.
Potential solution:
If you use Docker Windows containers, make sure you have at least Windows 10.0.1809 on both environments (your physical machine and the container image) - run cmd in each and you will see the version at the top.
Use process isolation when you run docker: pass --isolation process (see the example below).
On the physical machine, two vmxxx processes (one with a lower and one with a higher PID; I don't remember the exact name) were holding the *.dll file (the build was running on the Docker side, where Build Tools 2019 was used).
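For example, a sketch of such a run command (the image tag and command are placeholders, not from my setup):

docker run --isolation process mcr.microsoft.com/windows/servercore:ltsc2019 cmd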
Short description:
The first MSBuild error occurred because MSBuild tried to delete the file and got access denied - probably because one of the vm processes was holding a handle to it.
The second MSBuild error (caused by the first vmxxx process) showed that copying that same DLL from one location to another was not possible due to a system lock (4).
Both vmxxx processes held on to a single DLL file during the build on Docker. This was visible in Process Explorer (use the full version from Sysinternals).
The vmxxx process with the lower PID locked the DLL file and did not release it before the second process, with the higher PID, tried to do something with it.
And it is a random DLL file (or files) that ends up held by the two different processes.
Also, limiting MSBuild to a single CPU with no parallel build did not solve the issue, and neither did managing CPU and memory on the Docker side. In the end, process isolation on Docker solved the case.
Isolation should take care of these processes when you build the project from a Docker container.
I'm running a Docker container on Compute Engine, using the Container Image VM property.
However, if I stop and restart the VM, my app works but the logs aren't collected any more.
When I run docker ps I only see my own Docker image. However, for a new VM that hasn't been stopped I also see a container image called gcr.io/stackdriver-agents/stackdriver-logging-agent.
Are there any specific steps I need to take to restore the VM as it was before it was stopped? How can I make logging work again, and are there other differences I should be aware of?
I understand you are running a Docker container on Compute Engine, and when you stop/restart the VM the logs are no longer collected. You would also like to know how to restore the VM to its previous state, and what the stackdriver-logging-agent is.
As described in this article [1], you can use GCE snapshots to create backups of persistent disks attached to the instance, including boot volumes. This is useful for backing up your data, recreating a disk that might have been lost, or copying a persistent disk. That being said, this is currently the only method for recovering a deleted disk.
Therefore, unfortunately, if no snapshots were taken of the VM's disk(s) beforehand, a deleted disk volume cannot be recovered; the process is irreversible [2].
In the future, you can set the disk 'auto-delete' option [3] to no when creating an instance; this way the disk will remain even if the instance is deleted.
As for the logging agent image, it's a container image that streams logs from your VM instances and from selected third-party software packages to Stackdriver Logging. It is a best practice to run the Logging agent on all your VM instances, and it answers your question as to why the logs aren't appearing anymore: they are normally collected by that agent and sent to Stackdriver Logging, so with the agent container gone they are no longer recorded.
To get the logs collected again, you can try the following to reset the service:
Please do the following on your affected Windows instance:
Stop the "StackdriverLogging" service. You can do it from command line with "net stop StackdriverLogging"
Navigate to the following directory: "C:\Program Files (x86)\Stackdriver\LoggingAgent\Main\pos\winevtlog.pos\worker0"
Remove the file “storage.json” located in that directory
Restart StackdriverLogging service - execute "net start StackdriverLogging" from command line.
This should reset logging agent state and make logging functional again.
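Put together, the reset amounts to the following (run from an elevated command prompt; these are exactly the commands and path from the steps above):

net stop StackdriverLogging
del "C:\Program Files (x86)\Stackdriver\LoggingAgent\Main\pos\winevtlog.pos\worker0\storage.json"
net start StackdriverLogging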
[1] https://cloud.google.com/compute/docs/disks/create-snapshots
[2] https://cloud.google.com/compute/docs/disks/#pdspecs
[3] https://cloud.google.com/sdk/gcloud/reference/compute/instances/create#--disk
I wrote a simple Go application and added a flock-based lock to prevent it from running twice at the same time:
import "github.com/nightlyone/lockfile"
lock, err := lockfile.New(filepath.Join(os.TempDir(), "pagerduty-read-api.lock"))
if err != nil {
panic(err)
}
if err = lock.TryLock(); err != nil {
fmt.Println("Already running.")
return
}
defer lock.Unlock()
It works well on my host. In Docker, I tried to run it with the host's /tmp shared as a volume:
docker run --rm -it -v /tmp:/tmp my-go-binary
But it does not work. I suppose it's because the lock mechanism does not carry across the shared volume.
My question: does Docker have an option to make flock work between running instances? If not, what are my other options to get the same behavior?
Thanks.
This morning I wrote a little Python test program that just writes one million consecutive integers to a file, with flock() locking, obtaining and releasing the lock once for each number appended. I started up 5 containers, each running that test program, and each writing to the same file in a docker volume.
With the locking enabled, the numbers were all written without interfering with each other, and there were exactly 5 million integers in the file. They weren't consecutive when written this way, but that's expected and consistent with flock() working.
Without locking, many of the numbers were written in a way that shows the writes were trampling on each other. There were only 3,167,546 numbers in the file and there were 13,357 blank lines. That adds up to the 3,180,903 lines in the file - substantially fewer than the desired 5,000,000.
While testing many times cannot definitively prove that there will never be problems, to me that's a pretty convincing argument that Linux flock() works across containers.
Also, it just kinda makes sense that flock() would work across containers; containers are pretty much just a shared kernel with distinct PID namespaces, distinct filesystems (other than volumes) and distinct IP port spaces.
I ran my test on a Linux Mint 19.1 system with Linux kernel 4.15.0-20-generic, Docker 19.03.0 - build aeac949 and CPython 3.6.8.
Go is a cool language, but I don't know why flock() didn't appear to work across volumes in your Go program.
HTH.
I suppose you want to use a Docker volume, or you may need one of the Docker volume plugins.
According to this article, Docker Volume File Permission and Locking, Docker volumes only provide a way to define a volume that can be used by multiple containers, or used by a container after restarting.
Among the Docker volume plugins, Flocker may meet your requirements.
Flocker is an open-source Container Data Volume Manager for your Dockerized applications.
BTW, if you are using Kubernetes, you may want to learn more about persistent volumes, persistent volume claims, and storage classes.
I've been doing some research on this myself recently, and the issue is that nightlyone/lockfile doesn't actually use the flock syscall. Instead, the lockfile it writes is a PID file - a file that just contains the PID (Process IDentifier) of the process that created it.
When checking whether a lock is held, lockfile reads the PID stored in the lockfile, and if it's different from the PID of the current process, it sees the lock as taken.
The issue is that lockfile doesn't have any special logic to know whether it's in a Docker container or not, and PIDs get a little muddled when working with Docker: the PID of a process viewed from inside the container is different from the PID of the same process viewed from outside the container.
Where this often ends up is that we have two containers running your code above, and each of them has PID 1 inside its own container. They'll each try to create a lockfile, writing what they think their PID is (1). They then both think they hold the lock - after all, their PID is the one that wrote it! So the lockfile is ignored.
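You can see this for yourself: a trivial program that prints its own PID typically reports 1 (or some other very low number) inside each container, and that is exactly the value the lockfile records.

package main

import (
	"fmt"
	"os"
)

func main() {
	// Inside a container this typically prints 1 - the same in every container -
	// because each container gets its own PID namespace.
	fmt.Println(os.Getpid())
}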
My advice is to switch to a locking implementation that uses flock. I've switched to flock, and it seems to work okay.
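As a minimal sketch of that approach (my own illustration using the standard library's syscall.Flock rather than any particular locking package; the lock path just mirrors the one in the question):

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

func main() {
	// The lock file must live on a path that all containers see, e.g. a shared volume.
	path := filepath.Join(os.TempDir(), "pagerduty-read-api.lock")
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0644)
	if err != nil {
		panic(err)
	}
	defer f.Close()
	// LOCK_EX|LOCK_NB: take an exclusive lock, failing immediately if another
	// process (in any container sharing the volume) already holds it.
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
		fmt.Println("Already running.")
		return
	}
	defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
	// ... application logic ...
}

Because flock locks live in the kernel and attach to the file itself, every container that mounts the same volume sees the same lock, regardless of PID namespaces.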
I am planning a setup where the Docker containers use a remote volume - a volume that is ssh-ed to another machine and is being read all the time.
Let's say we have 5 containers using that remote volume. In my understanding, Docker ssh-es to the remote machine and constantly reads a certain directory (with about 100 files, none more than a few MB).
Presumably that constant reading will put some load on the remote machine. Will that load be significant, or is it negligible? There are php-fpm and Apache2 on the remote machine; will the constant reading slow down that web server? Also, how often does the volume refresh the files?
Sincerely.
OK, after some testing:
I created a remote volume with the vieux/sshfs driver (the exact commands are shown after these steps).
Created an Ubuntu container with the volume mounted under a certain folder.
Then tailed a txt file from the container itself.
Wrote to that txt file from the remote machine (the one that contains the physical folder).
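For reference, the volume was created roughly like this (host, path and names are placeholders, not my real ones):

docker plugin install vieux/sshfs
# depending on the remote host, password or ssh key options may be needed
docker volume create -d vieux/sshfs -o sshcmd=user@remote-host:/shared/folder sshvolume
docker run -it -v sshvolume:/mnt/shared ubuntu bash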
I found out that if we write to the file continuously (like echo "whatever" >> thefile.txt), the changes appear all at once after a few seconds, not one by one as they are introduced. Also, if I print or list the files in the mounted directory, the response is instant. This makes me think that Docker keeps a local copy of the folder ssh-ed in the volume and refreshes it every 5 seconds or so. Basically, negligible load once the folder has been copied.
Also, when writing from the container to the mounted folder, the changes to the file are reflected almost instantly (allowing for some latency), which makes me think that the daemon propagates write changes immediately.
In conclusion: reading a remote folder puts negligible load on the remote machine. The plan is to use such a setup in a production environment, so we don't have to pull changes in two different places (the prod server and the machine that shares the (local) volume between containers).
If there is anyone who can confirm my findings, that would be great.
Sincerely
I'm trying to write temporary files on the workers executing Dataflow jobs, but it seems like the files are getting deleted while the job is still running. If I SSH into the running VM, I'm able to execute the exact same file-generating command and the files are not destroyed -- perhaps this is a cleanup that happens for the dataflow runner user only. Is it possible to use temp files or is this a platform limitation?
Specifically, I'm attempting to write to the location returned by Files.createTempDir(), which is /tmp/someidentifier.
Edit: Not sure what was happening when I posted, but Files.createTempDirectory() works...
We make no explicit guarantee about the lifetime of files you write to the local disk.
That said, writing to a temporary file inside processElement will work. You can write to and read from it within the same processElement call. Similarly, any files created in DoFn.startBundle will be visible in processElement and finishBundle.
You should avoid writing to /dataflow/logs/taskrunner/harness, since writing files there might conflict with Dataflow's logging. We encourage you to use the standard Java APIs File.createTempFile() and Files.createTempDirectory() instead.
If you want to preserve data beyond finishBundle, you should write it to durable storage such as GCS. You can do this by emitting the data as a sideOutput and then using TextIO or one of the other writers. Alternatively, you could write to GCS directly from inside your DoFn.
Since Dataflow runs inside containers, you won't be able to see the files by ssh'ing into the VM. The container has some of the host VM's directories mounted, but /tmp is not one of them. You would need to attach to the appropriate container, e.g. by running
docker exec -t -i <CONTAINER ID> /bin/bash
That command would start a shell inside a running container.
Dataflow workers run in a Docker container on the VM, which has some of the directories of the host VM mounted, but apparently /tmp is not one of them.
Try writing your temp files, e.g., to /dataflow/logs/taskrunner/harness, which will be mapped to /var/log/dataflow/taskrunner/harness on the host VM.