Adding large file to Docker build gives EOF exception

To restore our production DB locally, I'm adding a Postgres dump to a Docker build. Until recently this was a smooth process, but as the DB steadily grows (now over 80 GB), it seems I've hit an unknown threshold. The build crashes at a simple ADD dmp.sql.gz /tmp/dmp.sql.gz line in the Dockerfile, i.e. before it actually unzips or executes the contents of the file:
Sending build context to Docker daemon 87.42GB
Step 1/6 : FROM ecr.url/postgres96
---> 36f64c15a938
...
Step 5/6 : ADD dmp.sql.gz /tmp/dmp.sql.gz
Error processing tar file(exit status 1): unexpected EOF
The Docker daemon logs don't give me much of a clue:
Aug 15 10:02:55 raf-P775DM3-G dockerd[2498]: time="2018-08-15T10:02:55.902896948+02:00" level=error msg="Can't add file /var/lib/docker/overlay2/84787e6108e9df6739cee9905989e2aab8cc72298cbffa107facda39158b633d/diff/tmp/dmp.sql.gz to tar: io: read/write on closed pipe"
Aug 15 10:02:55 raf-P775DM3-G dockerd[2498]: time="2018-08-15T10:02:55.904099449+02:00" level=error msg="Can't close tar writer: io: read/write on closed pipe"
I followed the actual copying of the file to the overlay filesystem, expecting to see it crash somewhere along the way, but it actually crashes after the whole file has been transferred:
root@raf-P775DM3-G:/home/raf# ls /var/lib/docker/overlay2/e1d241ba14524cff6a7ef3dff8222d4f1ffbc4de05f60cd15d6afbdb2bb9f754/diff/tmp/ -lrta
total 85150928
-rw-r--r-- 1 root root 87194526754 Aug 14 00:01 dmp.sql.gz // -> this is the whole file
drwxr-xr-x 3 root root 4096 Aug 14 17:30 ..
drwxrwxrwt 2 root root 4096 Aug 14 17:30 .
When this dump file was in the 70 GB range, restoring it this way was time-consuming but smooth, on different OSes and Docker versions.
Can anyone help figure out the gist of the problem?
I'm currently experiencing this issue on Docker version 18.06.0-ce, build 0ffa825.
PS: I read about a tar header limit of 8 GB which causes an EOF exception (https://github.com/moby/moby/issues/37581), but again, we were restoring 70 GB+ dumps without issue.
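The 8 GB figure in that issue comes from the tar header format itself. As an illustration only (this uses Python's tarfile module, not Docker's Go implementation), the following sketch checks whether a header for a member of a given size can be encoded in plain ustar versus the PAX and GNU extensions; header_fits is a hypothetical helper name.

```python
import tarfile

# A plain ustar header stores the member size as 11 octal digits plus a NUL,
# so the largest size it can represent is 8**11 - 1 bytes (just under 8 GiB).
USTAR_MAX_SIZE = 8 ** 11 - 1


def header_fits(size: int, fmt: int) -> bool:
    """Return True if a tar header for a `size`-byte member can be built in `fmt`."""
    info = tarfile.TarInfo("tmp/dmp.sql.gz")
    info.size = size
    try:
        info.tobuf(format=fmt)
    except ValueError:  # tarfile raises ValueError when a numeric header field overflows
        return False
    return True


if __name__ == "__main__":
    dump_size = 87194526754  # size of dmp.sql.gz from the ls output above
    print(USTAR_MAX_SIZE)                                # 8589934591
    print(header_fits(dump_size, tarfile.USTAR_FORMAT))  # False
    print(header_fits(dump_size, tarfile.PAX_FORMAT))    # True
    print(header_fits(dump_size, tarfile.GNU_FORMAT))    # True
```

PAX works around the limit with an extended header record and GNU with a base-256 number encoding, so any tar writer handling an 87 GB member has to use one of those extensions.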

Try upgrading to 18.09. They changed the tar backend, which should fix this issue. As for why the 70 GB file worked, I suspect it has something to do with compression in the layers, since you cannot trigger this issue with an 8 GB file of zeros. See https://github.com/moby/moby/pull/37771.
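Since the errors above are logged by dockerd, it is the daemon's version (not the CLI's) that matters. A minimal sketch, assuming the cutoff is the 18.09 release the answer recommends; parse_docker_version, needs_upgrade and server_version are hypothetical helper names, and server_version only works where docker is on PATH.

```python
import re
import subprocess

# The answer above recommends 18.09, where the tar backend was changed.
FIXED_IN = (18, 9)


def parse_docker_version(version: str) -> tuple:
    """Turn a version string such as '18.06.0-ce' into (18, 6, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognised Docker version: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def needs_upgrade(version: str) -> bool:
    """True if the version predates the release the answer recommends."""
    return parse_docker_version(version)[:2] < FIXED_IN


def server_version() -> str:
    """Ask the Docker daemon (not just the CLI) for its version."""
    result = subprocess.run(
        ["docker", "version", "--format", "{{.Server.Version}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(needs_upgrade("18.06.0-ce"))  # True
    print(needs_upgrade("18.09.0"))     # False
```

On a machine with Docker installed, needs_upgrade(server_version()) checks the running daemon directly.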


NixOS - How to configure config property which is a type of types.lines

I tried to install FlexGet. The NixOS options site shows that there is a config option defined here.
How does it work? It says the type is types.lines. I tried writing some random text, but the config file is not created and the daemon fails to start.
Edit:
Here is my config inside configuration.nix:
services.flexget = {
  enable = true;
  config = "asdas\n asdas";
};
When I run sudo nixos-rebuild switch, this is what happens:
building Nix...
building the system configuration...
NOT restarting the following changed units: systemd-fsck@dev-disk-by\x2dlabel-FuHua.service, systemd-fsck@dev-disk-by\x2duuid-25A4\x2d32EA.service
activating the configuration...
setting up /etc...
reloading user units for shalva...
setting up tmpfiles
reloading the following units: dbus.service
the following new units were started: flexget-runner.timer, flexget.service
warning: the following units failed: flexget-runner.service
× flexget-runner.service - FlexGet Runner
Loaded: loaded (/etc/systemd/system/flexget-runner.service; linked; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2022-04-28 10:34:20 UTC; 129ms ago
TriggeredBy: ● flexget-runner.timer
Process: 92136 ExecStart=/nix/store/1pdq67nfjw2mad5s679dfgm5h98bm4xm-flexget-3.1.153/bin/flexget -c /var/lib/deluge/flexget.yml execute (code=exited, status=217/USER)
Main PID: 92136 (code=exited, status=217/USER)
IP: 0B in, 0B out
CPU: 799us
Apr 28 10:34:20 Lenovo-7200 systemd[1]: Started FlexGet Runner.
Apr 28 10:34:20 Lenovo-7200 systemd[92136]: flexget-runner.service: Failed to determine user credentials: No such process
Apr 28 10:34:20 Lenovo-7200 systemd[92136]: flexget-runner.service: Failed at step USER spawning /nix/store/1pdq67nfjw2mad5s679dfgm5h98bm4xm-flexget-3.1.153/bin/flexget: No such process
Apr 28 10:34:20 Lenovo-7200 systemd[1]: flexget-runner.service: Main process exited, code=exited, status=217/USER
Apr 28 10:34:20 Lenovo-7200 systemd[1]: flexget-runner.service: Failed with result 'exit-code'.
warning: error(s) occurred while switching to the new configuration
I think the problem is in /nix/store/1pdq67nfjw2mad5s679dfgm5h98bm4xm-flexget-3.1.153/bin/flexget -c /var/lib/deluge/flexget.yml, because the config file is not created. Here is the output of running it:
shalva in Lenovo-7200 in ~ took 16s
❯ /nix/store/1pdq67nfjw2mad5s679dfgm5h98bm4xm-flexget-3.1.153/bin/flexget -c /var/lib/deluge/flexget.yml
Could not instantiate manager: Config `/var/lib/deluge/flexget.yml` does not appear to be a file.
My expectation is that config = "asdas\n asdas"; should at least create the config file at /var/lib/deluge/flexget.yml, right? I know it will still fail because it won't be a valid config, but at least the file should be created...
FlexGet is meant to be run together with Deluge, so it expects the user deluge and its home directory to exist. The service will not create the user deluge by itself.
Enable deluge and flexget will start up correctly:
services.deluge.enable = true;
services.deluge.package = pkgs.deluge-2_x;
If you want to run FlexGet without Deluge, you need to create the directory yourself and either change the FlexGet user or create the user deluge by hand.
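The log above shows status=217/USER together with "Failed to determine user credentials", i.e. systemd could not resolve the service user. A minimal Python sketch that checks the two prerequisites the answer names, assuming the user deluge and the directory /var/lib/deluge (taken from the ExecStart line); missing_prerequisites is a hypothetical helper.

```python
import os
import pwd


def missing_prerequisites(user: str = "deluge", home: str = "/var/lib/deluge") -> list:
    """List the prerequisites the FlexGet unit relies on that are absent."""
    problems = []
    try:
        pwd.getpwnam(user)
    except KeyError:
        # This is the case the journal above reports as status=217/USER.
        problems.append(f"user {user!r} does not exist")
    if not os.path.isdir(home):
        # FlexGet is pointed at <home>/flexget.yml, so the directory must exist.
        problems.append(f"directory {home!r} does not exist")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```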

symfony 5.3 logfile could not be opened in append mode: Failed to open stream (docker)

I have a rotating log file configured with Monolog, which worked like a charm, but suddenly I get error messages stating that the file cannot be created when the frontend calls my Symfony backend, which runs in an Alpine Docker container.
Creating log files while running my PHPUnit tests causes no trouble at all.
This is my monolog configuration, which didn't change:
monolog:
    handlers:
        frontend:
            type: rotating_file
            path: "%kernel.logs_dir%/%kernel.environment%.frontend.log"
            level: error
            channels: [ frontend ]
            max_files: 3
        main:
            type: rotating_file
            path: "%kernel.logs_dir%/%kernel.environment%.log"
            level: debug
            channels: [ "!event", "!frontend", "!deprecation" ]
            max_files: 3
        console:
            type: console
            process_psr_3_messages: false
            channels: [ "!event", "!doctrine", "!console", "!frontend", "!deprecation" ]
The change I made was disabling sessions, which should have no influence on whether the system is able to create a file for writing.
Why can my server no longer write log files?
I have increased the permissions of my log folder to 777, and the server runs as root in its container:
e9dffe459185:/var/www/var# ll
total 8
drwxrwxr-x 6 root root 192 Jul 22 15:07 ./
drwxr-xr-x 51 root root 1632 Jul 23 10:16 ../
drwxrwxr-x 4 root root 128 Jul 23 11:01 cache/
drwxrwxrwx 3 root root 96 Jul 23 11:02 log/
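To rule out the filesystem independently of Monolog, one can perform the same kind of open the error message refers to ("opened in append mode") from outside PHP. A minimal sketch; can_append and the default probe path are hypothetical, and note that opening in append mode creates the file if it does not exist.

```python
import sys


def can_append(path: str) -> bool:
    """Return True if `path` can be opened for appending (creates the file if missing)."""
    try:
        with open(path, "a", encoding="utf-8"):
            pass
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Run inside the container; the default probe file is a hypothetical name
    # in the log directory shown in the listing above.
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/www/var/log/probe.log"
    print(can_append(target))
```

If this prints True inside the container while Symfony still fails, the cause is more likely in the logging library than in the directory permissions.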
Looks like the problem was fixed with an update of monolog:
Updating dependencies
Lock file operations: 0 installs, 1 update, 0 removals
- Upgrading monolog/monolog (2.3.1 => 2.3.2)
Writing lock file
Installing dependencies from lock file (including require-dev)
Package operations: 0 installs, 1 update, 0 removals
- Downloading monolog/monolog (2.3.2)
- Upgrading monolog/monolog (2.3.1 => 2.3.2): Extracting archive
Looks like version 2.3.1 was buggy. Since the upgrade I don't have further problems.

LINKERD: Unable to build docker image from linkerd

https://github.com/linkerd/linkerd#docker
Following the instructions in the README, I executed the following commands:
; linkerd/docker ;namerd/docker
I get the following exception:
[info] Done packaging.
[trace] Stack trace suppressed: run last linkerd/bundle:docker for the full output.
[error] (linkerd/bundle:docker) java.io.IOException: Cannot run program "docker" (in directory "/home/shaikk/linkerd/linkerd/target/docker"): error=2, No such file or directory
[error] Total time: 284 s, completed Mar 6, 2017 9:13:49 AM
I think the No such file or directory error message is referring to the docker binary itself. Can you try running which docker to see if it's in your path? If it's not there, you can install it using the instructions here: https://docs.docker.com/engine/installation/#platform-support-matrix
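The check the answer suggests with which docker can also be done programmatically. A minimal sketch using shutil.which; find_binary is a hypothetical helper name.

```python
import shutil


def find_binary(name: str = "docker") -> str:
    """Return the absolute path of `name` as resolved through PATH."""
    path = shutil.which(name)
    if path is None:
        # The answer above suspects this condition is behind
        # 'Cannot run program "docker" ... error=2, No such file or directory'.
        raise FileNotFoundError(f"{name!r} was not found on PATH")
    return path


if __name__ == "__main__":
    try:
        print(find_binary("docker"))
    except FileNotFoundError as exc:
        print(exc)
```

Running this with the same environment as the sbt process matters: a binary that is on your interactive shell's PATH may still be invisible to a build tool started differently.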

Jenkins Gerrit polling fails: constantly triggering builds

We're using Gerrit Trigger (2.23.0) on our Jenkins CI build manager and Docker containers for the actual builds.
The issue that has recently popped up is that, in some of our branches, the Gerrit repo polling fails and "detects changes" every time, so Jenkins keeps rebuilding despite there being no changes.
Checking the Gerrit Repo Polling Log for any of the affected jobs gives one of the following outputs:
Started on Feb 1, 2017 3:12:25 PM
Polling SCM changes on aosp-host
[workspace] $ repo init -u http://xxx.xxx.xxx.xxx/git/project/platform/manifest.git -b branch -m branch.xml
Get https://gerrit.googlesource.com/git-repo/clone.bundle
Get https://gerrit.googlesource.com/git-repo
fatal: Not a git repository: '/home/jenkins/workspace/.repo/manifests.git'
fatal: Not a git repository: '/home/jenkins/workspace/.repo/manifests.git'
fatal: cannot obtain manifest http://xxx.xxx.xxx.xxx/git/project/platform/manifest.git
Done. Took 1 min 19 sec
Changes found
Or, if a build was already running (Gerrit waits for the build to finish before doing the SCM poll):
Started on Feb 2, 2017 3:24:15 AM
Polling SCM changes on aosp-host
[workspace] $ repo init -u http://xxx.xxx.xxx.xxx/git/project/platform/manifest.git -b branch -m branch.xml
fatal: cannot make /home/jenkins/workspace/.repo/repo directory: File exists
Done. Took 2 hr 4 min
Changes found
The builds, which are triggered by this failure, use the same commands and work fine:
[workspace] $ repo init -u http://xxx.xxx.xxx.xxx/git/project/platform/manifest.git -b branch -m branch.xml
Navigating to the manifest directory, we see the symptom:
jenkins@f052b3453d95:~/workspace/.repo$ ll
total 32
drwxr-xr-x 1 jenkins jenkins 180 Dec 20 11:08 ./
drwxrwxr-x 1 jenkins jenkins 778 Dec 20 11:07 ../
-rw-r--r-- 1 jenkins jenkins 20087 Dec 20 10:14 .repo_fetchtimes.json
lrwxrwxrwx 1 jenkins jenkins 20 Dec 20 10:13 manifest.xml -> manifests/branch.xml
drwxr-xr-x 1 jenkins jenkins 8 Dec 16 17:33 manifests/
drwxr-xr-x 1 jenkins jenkins 50 Dec 16 17:33 manifests.git/
drwxr-xr-x 1 jenkins jenkins 28 Dec 16 17:43 project-objects/
-rw-r--r-- 1 jenkins jenkins 7756 Dec 20 10:14 project.list
drwxr-xr-x 1 jenkins jenkins 410 Dec 16 17:46 projects/
with
lrwxrwxrwx 1 jenkins jenkins 20 Dec 20 10:13 manifest.xml -> manifests/branch.xml
highlighted in red, because the associated branch.xml is not found. So the Gerrit log above is accurate: repo init is failing to complete properly. This is confirmed by running repo status in the main directory:
jenkins@f052b3453d95:~/workspace$ repo status
Traceback (most recent call last):
File "/home/jenkins/workspace/.repo/repo/main.py", line 531, in <module>
_Main(sys.argv[1:])
File "/home/jenkins/workspace/.repo/repo/main.py", line 507, in _Main
result = repo._Run(argv) or 0
File "/home/jenkins/workspace/.repo/repo/main.py", line 180, in _Run
result = cmd.Execute(copts, cargs)
File "/home/jenkins/workspace/.repo/repo/subcmds/status.py", line 130, in Execute
all_projects = self.GetProjects(args)
File "/home/jenkins/workspace/.repo/repo/command.py", line 140, in GetProjects
all_projects_list = manifest.projects
File "/home/jenkins/workspace/.repo/repo/manifest_xml.py", line 350, in projects
self._Load()
File "/home/jenkins/workspace/.repo/repo/manifest_xml.py", line 407, in _Load
self.manifestProject.worktree))
File "/home/jenkins/workspace/.repo/repo/manifest_xml.py", line 443, in _ParseManifestXml
root = xml.dom.minidom.parse(path)
File "/usr/lib/python2.7/xml/dom/minidom.py", line 1918, in parse
return expatbuilder.parse(file)
File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 922, in parse
fp = open(file, 'rb')
IOError: [Errno 2] No such file or directory: '/home/jenkins/workspace/.repo/manifest.xml'
The puzzling part is that running repo init -u <url> -m branch.xml from the command line works fine and produces a valid repo.
Can anyone offer any insight into this issue?
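The broken state in the listing (manifest.xml pointing at a missing manifests/branch.xml) can be detected directly, without going through repo's traceback, for example as a pre-poll sanity check in the workspace. A minimal sketch; dangling_manifest is a hypothetical helper name.

```python
import os


def dangling_manifest(repo_dir: str) -> bool:
    """True if <repo_dir>/manifest.xml is a symlink whose target does not exist."""
    link = os.path.join(repo_dir, "manifest.xml")
    # islink() inspects the link itself; exists() follows it to the target.
    return os.path.islink(link) and not os.path.exists(link)


if __name__ == "__main__":
    repo_dir = os.path.expanduser("~/workspace/.repo")  # workspace path from the listing
    if dangling_manifest(repo_dir):
        print("manifest.xml ->", os.readlink(os.path.join(repo_dir, "manifest.xml")), "is dangling")
    else:
        print("manifest.xml is not a dangling symlink")
```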

Passenger/mod_rails fails to initialize in Fedora 12 when starting Apache

I am in the process of setting up a server to run a Ruby on Rails application on Fedora 12, using Passenger.
I am at the stage where I've installed Passenger and set it up as prescribed, but I get the following errors when I restart Apache:
[Wed Jan 13 15:41:38 2010] [notice] caught SIGTERM, shutting down
[Wed Jan 13 15:41:40 2010] [notice] SELinux policy enabled; httpd running as context unconfined_u:system_r:httpd_t:s0
[Wed Jan 13 15:41:40 2010] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Wed Jan 13 15:41:40 2010] [error] *** Passenger could not be initialized because of this error: Cannot create FIFO file /tmp/passenger.25235/.guard: Permission denied (13)
[Wed Jan 13 15:41:40 2010] [notice] Digest: generating secret for digest authentication ...
[Wed Jan 13 15:41:40 2010] [notice] Digest: done
[Wed Jan 13 15:41:40 2010] [error] *** Passenger could not be initialized because of this error: Cannot create FIFO file /tmp/passenger.25235/.guard: Permission denied (13)
[Wed Jan 13 15:41:40 2010] [error] python_init: Python version mismatch, expected '2.6', found '2.6.2'.
[Wed Jan 13 15:41:40 2010] [error] python_init: Python executable found '/usr/bin/python'.
[Wed Jan 13 15:41:40 2010] [error] python_init: Python path being used '/usr/lib/python26.zip:/usr/lib/python2.6/:/usr/lib/python2.6/plat-linux2:/usr/lib/python2.6/lib-tk:/usr/lib/python2.6/lib-old:/usr/lib/python2.6/lib-dynload'.
[Wed Jan 13 15:41:40 2010] [notice] mod_python: Creating 4 session mutexes based on 256 max processes and 0 max threads.
[Wed Jan 13 15:41:40 2010] [notice] mod_python: using mutex_directory /tmp
[Wed Jan 13 15:41:40 2010] [notice] Apache/2.2.14 (Unix) DAV/2 Phusion_Passenger/2.2.9 PHP/5.3.0 mod_python/3.3.1 Python/2.6.2 mod_ssl/2.2.14 OpenSSL/1.0.0-fips-beta3 mod_perl/2.0.4 Perl/v5.10.0 configured -- resuming normal operations
As you can see, there is a permissions problem when Passenger is trying to initialize:
[Wed Jan 13 15:41:40 2010] [error] *** Passenger could not be initialized because of this error: Cannot create FIFO file /tmp/passenger.25235/.guard: Permission denied (13)
When Apache starts, it does create a directory in /tmp:
d-ws--x--x. 2 root root 4096 2010-01-13 16:04 passenger.26117
If instead I run the app by firing up mongrel directly with mongrel_rails start -e production, I see the following:
ActiveRecord::StatementInvalid (Mysql::Error: Can't create/write to file '/tmp/#sql_5d3_0.MYI' (Errcode: 13): SHOW FIELDS FROM `users`):
Again the error points to permission issues with the /tmp directory.
I am at a loss as to what the solution is. I'm not sure whether it is simply a directory permission problem or Fedora's SELinux security.
Any help would be appreciated. Thanks.
I did the same as Fred, except that instead of doing it one error at a time:
Go into permissive mode by running setenforce 0
Restart apache, and hit your site and use it for a while as normal
Run grep httpd /var/log/audit/audit.log | audit2allow -M passenger
Install the generated module: semodule -i passenger.pp
Go back to enforcing mode by running setenforce 1
Restart apache and test your site - hopefully it should all be working as before!
Note that this is basically a specific instance of the procedure in the CentOS SELinux help; check it out.
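Since audit2allow grants whatever was denied, it helps to see what that is before loading the module. A minimal sketch, assuming the usual AVC record layout (avc: denied { perms } for ... comm="..." ... tclass=...); it only summarizes denials and does not generate policy, and summarize_denials is a hypothetical helper name.

```python
import re
import sys
from collections import Counter

AVC_RE = re.compile(
    r'avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}'
    r'.*?comm="(?P<comm>[^"]*)"'
    r'.*?tclass=(?P<tclass>\S+)'
)


def summarize_denials(lines, comm="httpd"):
    """Count (permission, target class) pairs denied to processes named `comm`."""
    counts = Counter()
    for line in lines:
        match = AVC_RE.search(line)
        if match and match.group("comm") == comm:
            for perm in match.group("perms").split():
                counts[(perm, match.group("tclass"))] += 1
    return counts


if __name__ == "__main__":
    log_path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/audit/audit.log"
    try:
        with open(log_path, encoding="utf-8", errors="replace") as log:
            denials = summarize_denials(log)
    except OSError as exc:  # audit.log is normally readable only by root
        print(f"cannot read {log_path}: {exc}", file=sys.stderr)
    else:
        for (perm, tclass), count in denials.most_common():
            print(f"{count:5d}  {perm} on {tclass}")
```

For the Passenger error above, one would expect a create denial on a fifo_file to show up in this summary.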
I'm having the same issue in CentOS 5.4, SELinux getting in the way of Passenger.
Setting PassengerTempDir to /var/run/passenger simply gives you the same permission errors in the new directory instead of /tmp :
[Mon Feb 22 11:42:40 2010] [error] *** Passenger could not be initialized because of this error: Cannot create directory '/var/run/passenger/passenger.3686'
I can then change the security context of /var/run/passenger to get past this error:
chcon -R -h -t httpd_sys_content_t /var/run/passenger/
...and that lets Passenger create the temp directory, but not files within that directory:
[Mon Feb 22 12:07:06 2010] [error] *** Passenger could not be initialized because of this error: Cannot create FIFO file /var/run/passenger/passenger.3686/.guard: Permission denied (13)
Oddly, re-running the recursive chcon doesn't get past this error; it keeps dying at this point, and this is where my SELinux knowledge gets murky.
The Phusion Passenger guide sections 6.3.5 and 6.3.7 have some useful thoughts, but they don't seem to completely resolve the problem.
You need more than just the httpd_sys_content_t type. I use the following technique to get things started:
Start a tail on the audit log: tail -f /var/log/audit/audit.log
Reload Apache: apachectl restart
Go to the /tmp directory: cd /tmp
If just one line is added, use the command: tail -1 /var/log/audit/audit.log | audit2allow -M httpdfifo
Note that the name 'httpdfifo' is just a name chosen to reflect the kind of error that has been observed.
This will create a file named 'httpdfifo.pp'. To allow Apache to create a FIFO from then on, issue the command: semodule -i httpdfifo.pp
Continue doing this until all audit errors have been resolved (it took four different kinds of permissions on my system running CentOS 5.4).
Running setenforce 0 before starting will let you test if it's SELinux. Don't forget to run setenforce 1 afterwards.
I tried what Dan Sketcher and Fred Appleman suggested, i.e. repeat the following:
yum install setroubleshoot
echo > /var/log/audit/audit.log # clear irrelevant errors
cd ~
service httpd restart # try booting passenger -- audit.log now shows the relevant permission errors
tail -f /var/log/httpd/error_log # check that passenger is still failing due to permission errors
sealert -a /var/log/audit/audit.log > selinux-diag.txt # translate the permission errors
# read and check that you are happy with selinux-diag.txt
# and either follow its specific advice, or if it just wants you to grep into audit2allow, then:
cat /var/log/audit/audit.log | audit2allow -M mypol # grant everything just denied
semodule -i mypol.pp # commit new permissions
But after doing this 5 or 6 times, I kept coming up against new errors, and some of the same errors came up even after I had tried to permit them with "audit2allow".
In the end I just turned off SELinux, with:
echo 0 >/selinux/enforce
