How to deploy a large docker image to Pivotal Cloud Foundry - docker

I'm trying to push this docker image to my PCF environment:
https://hub.docker.com/r/jupyter/tensorflow-notebook/tags/
The image is 3.9GB when extracted.
When I do:
cf push jupyter-minimal-notebook --docker-image jupyter/minimal-notebook -m 8GB -k 4G
I get the error:
The app is invalid: disk_quota too much disk requested (requested 4096 MB - must be less than 2048 MB)
The default disk space an app gets is 1024 MB. This is set in cloud_controller_ng via the default_app_disk_in_mb parameter.
The maximum amount of disk a user can request is 2048 MB by default. This is set in cloud_controller_ng via the maximum_app_disk_in_mb parameter.
I believe the solution is to increase the value of maximum_app_disk_in_mb; however, after much searching I cannot figure out how to set it.
I have tried the following in the manifest.yml:
---
applications:
- name: jupyter-tensorflow-notebook
  docker:
    image: jupyter/tensorflow-notebook
  cloud_controller_ng:
    maximum_app_disk_in_mb: 4096
  disk_quota: 4G
  memory: 8G
  instances: 1
This does not work and returns the same error:
The app is invalid: disk_quota too much disk requested (requested 4096 MB - must be less than 2048 MB)
UPDATE September 3rd 2019:
I didn't give enough background. I've set up a Small Footprint PCF environment on AWS using the AWS Quickstart, so I have full control over the deployment to tweak whatever parameters I'd like; in effect, I'm the platform operator. So the question is: given that I have the rights to change maximum_app_disk_in_mb, how would I go about doing that? I'd like to change the maximum_app_disk_in_mb parameter but can't see how to do that without redeploying the entire environment.
To get the manifest that was used in Quickstart I figured that I needed to do the following:
bosh -e aws -d [my-deployment] manifest
From what I understand, this is the complete manifest. There are a lot of variable parameters in it, such as ((cloud-controller-internal-api-user-credentials.password)).
Is there a way to update maximum_app_disk_in_mb without redeploying the complete manifest?
If I have to redeploy the complete manifest is the best way to do this by doing:
bosh -e aws -d [my-deployment] manifest > manifest.yml
Then editing the maximum_app_disk_in_mb value in manifest.yml and redeploying? If I do this, will it pick up all the values for parameters that use variables in the manifest, such as ((cloud-controller-internal-api-user-credentials.password))?
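For illustration, here is a sketch of the operations-file approach I'm considering instead of editing the full manifest. This assumes the deployment follows the open-source cf-deployment layout, where the api instance group runs the cloud_controller_ng job; the property path may well differ on an Ops Manager/Quickstart install:
cat > increase-max-disk.yml <<'EOF'
- type: replace
  path: /instance_groups/name=api/jobs/name=cloud_controller_ng/properties/cc/maximum_app_disk_in_mb?
  value: 4096
EOF
bosh -e aws -d [my-deployment] manifest > manifest.yml
bosh -e aws -d [my-deployment] deploy manifest.yml -o increase-max-disk.yml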
When I do:
bosh -e aws deployments
There seem to be two deployments, aws-service-broker-xxxxxxxxxxxxxxx and cf-xxxxxxxxxxxxxxxx (I've replaced the id with x's for anonymity). The former doesn't have any running instances so I guess I don't need to make any changes to that one?

Let me just start by saying that you, as a developer/end user of Cloud Foundry, cannot change settings in Cloud Controller, which is a system-level component. Only your operator can do that.
If you could change settings in Cloud Controller, that would void all the limits set by Cloud Controller and your platform operators, as you could just set them as high as you want.
Cloud Controller will set a default disk size, which takes effect if you do not set one, and it will set a maximum disk size, which limits you from consuming too much disk on the foundation and impacting other users.
I don't believe there is a way to fetch the max disk space allowed by Cloud Controller & your platform operator other than just specifying a large disk quota (try cf push -k 99999G) and looking at the response.
As you can see from your response, Cloud Controller will tell you in the error message the maximum allowed value. In your case, the most it will allow is 2G.
The app is invalid: disk_quota too much disk requested (requested 4096 MB - must be less than 2048 MB)
There's nothing you can do to get a disk quota above 2G, other than write to your platform operator and ask them to increase this value. If this is your company's platform, then that is probably the best option.
If you are using a public provider, then you're probably better off looking at other providers. There are several public, certified providers you can use. See the list here -> https://www.cloudfoundry.org/thefoundry/
Hope that helps!

Related

How to "rsync" docker image to remote host over ssh?

In a deployment bash script, I have two hosts:
localhost, the machine that typically builds the docker images.
$REMOTE_HOST, which is a production web server.
I need to transfer the locally built docker image to $REMOTE_HOST in the most efficient way (fast, reliable, private, storage-friendly). So far, I have the following command in my script:
docker save $IMAGE_NAME:latest | ssh -i $KEY_FILE -C $REMOTE_HOST docker load
This has the following pros:
Uses compression on the fly
Does not store intermediate files on either the source or the destination
Transfers directly (the images may be private), which also reduces transfer time and saves storage.
However, there are also cons: when transferring larger images, you don't see the operation's progress, so you have to wait an unknown but significant amount of time that you can't estimate. I've heard that progress can be tracked with something like rsync --progress,
but rsync transfers files and doesn't fit well into my old UNIX-style pipeline. Of course I could docker load from a file, but how do I avoid creating one?
How can I utilize piping to preserve the above advantages? (Or is there another special tool to copy a built image to a remote docker host that shows progress?)
You could invoke pv as part of your pipeline:
docker save $IMAGE_NAME:latest | pv [options...] | ssh -i $KEY_FILE -C $REMOTE_HOST docker load
pv works like cat, in that it reads from its standard input and writes to its standard output. Except that, as the documentation says,
pv allows a user to see the progress of data through a pipeline, by giving information such as time elapsed, percentage completed (with progress bar), current throughput rate, total data transferred, and ETA.
pv has a number of options to control what kind of progress information it prints. You should read the documentation and choose the output that you want. In order to display a percentage complete or an ETA, you will probably need to supply an expected size for the data transfer using the -s option.
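As a sketch, you could feed pv the image size reported by docker inspect. Note that .Size is the unpacked image size in bytes and only approximates the size of the tar stream produced by docker save, so treat the percentage and ETA as rough guides:
SIZE=$(docker inspect --format '{{.Size}}' "$IMAGE_NAME:latest")
docker save "$IMAGE_NAME:latest" | pv -s "$SIZE" | ssh -i "$KEY_FILE" -C "$REMOTE_HOST" docker load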

Docker save issue

I am on docker version 1.11.2. I am trying to docker save an image but I get an error.
I ran docker images to see the size of the image and the result is:
myimage 0.0.1-SNAPSHOT e0f04657b1e9 10 months ago 1.373 GB
The server I am on is low on space, but it still has 2.2 GB available. However, when I run docker save myimage:0.0.1-SNAPSHOT > img.tar I get
write /dev/stdout: no space left on device
I removed all exited containers and dangling volumes in hopes of making it work but nothing helped.
You don't have enough space left on the device, so free up more space or gzip on the fly:
docker save myimage:0.0.1-SNAPSHOT | gzip > img.tar.gz
To restore it, docker automatically detects that it is gzipped:
docker load < img.tar.gz
In a situation where you can't free enough space locally, you might want to use storage available over a network connection. NFS or Samba are a bit more involved to set up.
The easiest approach may be piping the output through netcat, but keep in mind that netcat is unencrypted by default.
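A rough sketch of the netcat idea (the hostname and port are placeholders, nc flags vary between netcat variants, and docker load detects the gzip stream automatically):
# on the receiving host:
nc -l -p 9000 | docker load
# on the low-disk host:
docker save myimage:0.0.1-SNAPSHOT | gzip | nc receiving-host 9000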
As long as your production server is that low on space, you are vulnerable to a bunch of other problems.
Until you can provide more free space, I wouldn't create files locally, zipped or not. You could bring important services down if you run out of free space.

Docker consuming more HD memory

I have a Play application running in docker 1.10.3. We are hitting this application with 1000 requests per second to do a load test. The application works fine, but we see significant disk space consumed by Docker. In 3 days Docker's disk usage grew from 2.2 GB to 39 GB. This worries us a lot.
[Screenshot: docker info output with the consumed space highlighted]
Is there any way to configure Docker not to consume disk space like this?
Any help will be appreciated.
Docker captures the standard output (STDOUT) of your application and stores it (by default) in an internal log file. You can find this file at /var/lib/docker/containers/$CONTAINER_ID/$CONTAINER_ID-json.log. This file is not rotated by default and may grow large if your application prints to STDOUT verbosely.
Two possible solutions:
Configure log rotation for the Docker log files. I've found a good article here that describes how to enable log rotation for Docker by creating the file /etc/logrotate.d/docker-container with the following contents:
/var/lib/docker/containers/*/*.log {
  rotate 7
  daily
  compress
  size=1M
  missingok
  delaycompress
  copytruncate
}
You can play around with the options. They are all documented in logrotate's man page.
Use alternate logging for your containers by specifying the --log-driver option when creating a container:
$ docker run --log-driver=syslog your_image
Available drivers are documented in the official documentation. You can, for example, use --log-driver=syslog to use the system's syslog daemon, send logs to various cloud services, or disable logging entirely with --log-driver=none.
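If you would rather stay with the default json-file driver, an alternative sketch is to cap its size with the driver's max-size/max-file options (the values below are just examples):
docker run --log-driver=json-file --log-opt max-size=10m --log-opt max-file=3 your_image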

qsub disregarded memory limit

My command is:
qsub -t 1:30:1 -q test.q -l r_core=5 -l r_mem=30 run.sh
It launches 30 instances, each on one server, but they tend to consume more than the specified 30GB of RAM.
What are the reasons for this?
The only real-time resource enforcement you get is A) checking of min/max requests at submission, and B) walltime; and even with walltime, you may not get reliable enforcement, depending on the node. For solid resource enforcement, you should impose default resource restrictions, and then upgrade to a version that supports cgroups and enable that.
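As a hedged illustration, assuming a Grid Engine-style scheduler: r_mem looks like a site-defined consumable that only influences scheduling. Requesting the h_vmem complex instead (where the site has configured it for enforcement) sets a hard virtual-memory limit on the job's processes:
qsub -t 1:30:1 -q test.q -l r_core=5 -l h_vmem=30G run.sh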

How to limit Docker filesystem space available to container(s)

The general scenario is that we have a cluster of servers and we want to set up virtual clusters on top of that using Docker.
For that we have created Dockerfiles for different services (Hadoop, Spark etc.).
Regarding the Hadoop HDFS service, however, we have the situation that the disk space available to the docker containers equals the disk space available to the server. We want to limit the available disk space on a per-container basis so that we can dynamically spawn an additional datanode with some storage size to contribute to the HDFS filesystem.
We had the idea of using loopback files formatted with ext4 and mounting them on directories that we use as volumes in docker containers. However, this implies a large performance loss.
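To make that idea concrete, this is roughly what such a setup would look like (sizes, paths, and the image/volume names are just examples):
truncate -s 10G /srv/datanode1.img        # sparse backing file
mkfs.ext4 -F /srv/datanode1.img
mkdir -p /mnt/datanode1
mount -o loop /srv/datanode1.img /mnt/datanode1
docker run -d -v /mnt/datanode1:/hadoop/dfs/data my-datanode-image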
I found another question on SO (Limit disk size and bandwidth of a Docker container), but the answers are almost 1.5 years old, which, given the speed of Docker development, is ancient.
Which approach or storage backend would allow us to:
limit storage on a per-container basis,
achieve near bare-metal performance, and
avoid repartitioning the server drives?
You can specify runtime constraints on memory and CPU, but not disk space.
The ability to set constraints on disk space has been requested (issue 12462, issue 3804), but isn't yet implemented, as it depends on the underlying filesystem driver.
This feature is going to be added at some point, but not right away. It's a bit more difficult to add this functionality right now because a lot of chunks of code are moving from one place to another. After this work is done, it should be much easier to implement this functionality.
Please keep in mind that quota support can't be added as a hack to devicemapper, it has to be implemented for as many storage backends as possible, so it has to be implemented in a way which makes it easy to add quota support for other storage backends.
Update August 2016: as shown below, and in an issue 3804 comment, PR 24771 and PR 24807 have since been merged. docker run now allows setting storage driver options per container:
$ docker run -it --storage-opt size=120G fedora /bin/bash
This (size) allows setting the container's rootfs size to 120G at creation time.
This option is only available for the devicemapper, btrfs, overlay2, windowsfilter and zfs graph drivers.
Documentation: docker run/#Set storage driver options per container.
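One caveat worth sketching: with the overlay2 driver, the size option only works when the docker data directory is backed by xfs mounted with project quotas, e.g. (the device name is a placeholder):
# /etc/fstab entry for the backing filesystem:
# /dev/xvdb1  /var/lib/docker  xfs  defaults,pquota  0 0
docker run -it --storage-opt size=20G fedora /bin/bash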
