I can deploy to my local Service Fabric cluster and it works fine. When I attempt to deploy it to my Azure Service Fabric cluster, it errors out with:
Error event: SourceId='System.Hosting', Property='Download:1.0:1.0:5fb96531-7b75-42d0-8f23-6a9e42f0bda4'.
There was an error during download.System.Fabric.FabricException (-2147017731)
Container image download failed for ImageName=microsoft/aspnet with unexpected error. Exception=System.Exception: Container image history check failed after successful download. ImageName=microsoft/aspnet.
at Hosting.ContainerActivatorService.ContainerImageDownloader.d__6.MoveNext().
When googling this error, the common answers are that the VM hard drive is full (I checked one of my nodes: over 100 GB available) or that the VM operating system is wrong (verified on the VM scale set that it is running 2016-Datacenter-with-Containers). I've also seen some people mention not having enough resources on the VMs, so I bumped them up to Standard_D3_v2, which should be plenty.
I did see some people mention increasing the container download timeout. The image is over 5 GB, so this is potentially an issue, and it could work locally because it's coming from the local Docker cache. Unfortunately I'm not sure how to increase the timeout easily.
What else could cause this issue?
Make sure you target the correct version of the (base) image.
There are a few to choose from.
The version of the image must be compatible with the version of Windows you're running on the host.
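For example, if the host is running Windows Server 2016 (as the 2016-Datacenter-with-Containers SKU does), pull a tag that explicitly targets that OS version rather than the bare repository name. The tag below is only illustrative; check Docker Hub or the Microsoft Container Registry for the tags actually published for your framework version:

# Illustrative only: a tag built against Windows Server 2016 (ltsc2016).
docker pull microsoft/aspnet:4.7.2-windowsservercore-ltsc2016

# Pulling the bare repository name (implicitly :latest) may resolve to an image
# built for a newer Windows release, which a Server 2016 host cannot run.
docker pull microsoft/aspnet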
For an image of this size, it is likely that the download is timing out.
You could try:
Use a private registry in the same region as your cluster, such as Azure Container Registry; you might get higher download speeds.
If the bottleneck is your network, increase the VM size; bigger VMs have more bandwidth.
Configure the cluster to wait longer to download the image. You can try setting ContainerImageDownloadTimeout as described here.
This is set in the cluster configuration, and your cluster manifest will have a section like this:
{
"name": "Hosting",
"parameters": [
{
"name": "ContainerImageDownloadTimeout",
"value": "1200"
}
]
}
To change the settings on an existing cluster, you can follow the instructions found here and here.
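If you prefer the command line, a setting like this can also be pushed to a running cluster with the Azure CLI. A minimal sketch, assuming the cluster was deployed through Azure Resource Manager; the resource group and cluster names are placeholders:

# Sketch: set Hosting/ContainerImageDownloadTimeout on an existing cluster.
az sf cluster setting set \
    --resource-group my-resource-group \
    --cluster-name my-sf-cluster \
    --section Hosting \
    --parameter ContainerImageDownloadTimeout \
    --value 1200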
I am setting up an Airflow k8s cluster using a kind deployment on a WSL2 setup. When I execute the standard helm install $RELEASE_NAME apache-airflow/airflow --namespace $NS it fails. Further investigation shows that the cluster worker node cannot connect to registry-1.docker.io.
Error log for one of the image pulls:
Failed to pull image "redis:6-buster": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/redis:6-buster": failed to resolve reference "docker.io/library/redis:6-buster": failed to do request: Head "https://registry-1.docker.io/v2/library/redis/manifests/6-buster": dial tcp: lookup registry-1.docker.io on 172.19.0.1:53: no such host
I can access all other websites from this node, e.g. google.com, yahoo.com, merriam-webster.com, etc.; even docker.com works. This issue is very specific to registry-1.docker.io.
All the search results and links seem to be about general internet connection issues.
Current solution:
If I manually change /etc/resolv.conf on the kind worker node to point to the IP address found in /etc/resolv.conf of the main WSL2 Debian instance, then it works.
But this is a dynamic cluster with dynamic nodes, and I cannot do this every time. I am currently searching for a way to make it part of the cluster configuration: some way that makes it work just by running kind create cluster, after which kubectl or helm work by default.
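For reference, the manual workaround boils down to something like this after every cluster creation (a sketch; the cluster name and the nameserver IP taken from the WSL2 host's /etc/resolv.conf are placeholders):

# Sketch: push a working nameserver into every kind node.
# 172.22.0.1 is a placeholder for the nameserver from the WSL2 host's /etc/resolv.conf.
for node in $(kind get nodes --name my-cluster); do
  docker exec "$node" sh -c 'echo "nameserver 172.22.0.1" > /etc/resolv.conf'
done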
However, I am more interested in figuring out why this network setup fails specifically for registry-1.docker.io. Is there some configuration that can be done to avoid changing the DNS to the host IP or Google DNS, given that the current network configuration seems to work for pretty much the rest of the internet?
I have documented all the steps and investigation details, including some of the network configuration details, in a GitHub repository. If you need any further information to help solve the issue, please let me know. I will keep updating the GitHub documentation as I make progress.
Setup:
Windows 11 with WSL2 (no Docker Desktop)
WSL2 image: Debian bullseye (11) with Docker Engine on Linux
Docker version : 20.10.2
Kind version : 0.11.1
Kind image: kindest/node:v1.20.7@sha256:cbeaf907fc78ac97ce7b625e4bf0de16e3ea725daf6b04f930bd14c67c671ff9
I am not sure if this is an answer or not. After spending 2 days trying to find a solution, I thought of changing the node image version. The kind release page lists 1.21 as the latest image for kind version 0.11.1. I had problems even starting the cluster with 1.21, and 1.20 hit this strange DNS issue, so I went with 1.23. It all worked fine with this image.
However, to my surprise, when I changed the cluster configuration back to 1.20, the DNS issue was gone. So I do not know what changed due to the switch of image, but I cannot reproduce the issue again! Maybe it will help someone else.
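For anyone trying the same thing, the node image can be pinned explicitly at cluster creation time. A small sketch; pick the exact tag (and ideally the sha256 digest) listed for your kind version on the release page:

# Sketch: create the cluster with an explicitly pinned node image.
kind create cluster --name airflow --image kindest/node:v1.23.0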
I think I have found the correct workaround for this bug: switching iptables to legacy mode fixed it for me.
https://github.com/docker/for-linux/issues/1406#issuecomment-1183487816
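On a Debian-based system such as the WSL2 image used here, that switch is normally done with update-alternatives (a sketch; afterwards, restart Docker and recreate the kind cluster so the change takes effect):

# Switch iptables and ip6tables to the legacy backend.
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
# Restart Docker so it rebuilds its rules with the legacy backend.
sudo service docker restart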
I'm trying to spin up a fresh server using the AzerothCore Docker installation guide. I have completed all of the early installation steps, up until running the containers. Upon running the containers (for worldserver and authserver) I see the following output from the containers. It appears the destination of the world and auth servers in dist/bin is missing; how may I resolve this issue?
Check your Docker settings and make sure you have enough memory. If the containers have too little memory, they will not finish the compile. Check whether you have build issues.
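A quick way to check how much memory the Docker daemon actually has available (a small sketch; the value is reported in bytes):

# Total memory available to the Docker daemon, in bytes.
docker info --format '{{.MemTotal}}'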
I have a Google Cloud VM with my application installed on it. The installation step is completed, and I:
Turned off the VM instance.
Exported the disk to a disk image called MY_CUSTOM_IMAGE_1.
My wish now is to use MY_CUSTOM_IMAGE_1 as the starting image of my docker image build. For building the images I'm using Google Cloud Build.
My docker file should look like this:
FROM MY_CUSTOM_IMAGE_1 AS BUILD_ENV
...
When I tried to use this image I got the build error:
ERROR: build step 0 "gcr.io/cloud-builders/docker" failed: exit status 1
ERROR
pull access denied for MY_CUSTOM_IMAGE_1, repository does not exist or may require 'docker login'
Step 1/43 : FROM MY_CUSTOM_IMAGE_1 AS BUILD_ENV
The reason is that VM images are not the same as Docker images.
Is it possible to make this transformation (GCP VM image -> Docker image) without external tools (outside GCP, like private Docker repositories)?
Thanks!
If you know everything that is installed on your VM (and all the commands), do the same thing in a Dockerfile. Use the same OS version as your current VM as the base image. Perform some tests and it should quickly be equivalent.
If you have stateful files in your VM application, it's a little bit more complex: you have to mount a disk in your container and update your application's configuration to write to the mounted folder. It's more "complex", but there are tons of examples on the internet!
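As a rough illustration of the first approach (everything here is a placeholder: the base image, the install commands, and the Cloud Build tag), the idea is simply to replay the VM's installation steps in a Dockerfile and build it with Cloud Build:

# Sketch: recreate the VM's installation steps as a container image.
cat > Dockerfile <<'EOF'
FROM debian:11
RUN apt-get update && apt-get install -y my-app-dependencies
COPY app/ /opt/app/
CMD ["/opt/app/start.sh"]
EOF

# Build and push with Cloud Build (project ID and tag are placeholders).
gcloud builds submit --tag gcr.io/my-project/my-app:latest .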
No, this is not possible without a tool to extract your application out of the virtual machine image and recreate it in a container. To the best of my knowledge, no general-purpose tool exists.
There is a big difference between a container image and a virtual machine image. Container images do not contain an operating system kernel, while virtual machine images are a complete operating system plus device data. The two are conceptually similar, but extremely different in how they are implemented at the software and hardware level.
I've got Kubernetes running via Docker (running Linux containers on a Windows host).
I created a deployment (1 pod with 1 container, simple hello world node app) scaled to 3 replicas.
It scaled to 3 just fine; cool, let's scale to 20. Nice, still fine.
So I decided to take it to the extreme to see what happens with 200 replicas (Now I know).
CPU is now at 80%, the dashboard won't run, and I can't even issue a PowerShell command to scale the deployment back down.
I've tried restarting Docker and seeing if I can sneak in a PowerShell command as soon as Docker and Kubernetes are available, but it doesn't seem to take.
Are the Kubernetes deployment configurations stored on disk somewhere, so I can modify them while Kubernetes is down and it definitely picks up the new settings?
If not, is there any other way I can scale down the deployment?
Thanks
https://github.com/docker/for-mac/issues/2536 is a useful thread on this, as it gives tips for getting logs, increasing resources, or, if necessary, doing a factory reset (as discussed in the comments).
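If the API server does come back, even briefly, after freeing resources, the deployment can be scaled down (or removed) directly with kubectl. A minimal sketch; the deployment name is a placeholder:

# Scale the runaway deployment back down.
kubectl scale deployment hello-world --replicas=1
# Or remove it entirely and recreate it later.
kubectl delete deployment hello-world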
When launching a task over Mesos via Aurora that uses a rather large Docker image (~2 GB), there is a long wait before the task actually starts.
Even when the task has been previously launched and we would expect the Docker image to already be available to the worker node, there is still a wait, dependent on image size, before the task actually launches. Using Docker, you can launch a container almost instantly as long as the image is already in your images list; does the Mesos containerizer not support this "caching" as well? Is this functionality something that can be configured?
I haven't tried using the Docker containerizer, but it is my understanding that it will be phased out soon anyway and that GPU resource isolation, which we require, only works with the Mesos containerizer.
I am assuming you are talking about the unified containerizer running Docker images? What backend are you using? By default the Mesos agents use the copy backend, which is why you are seeing it being slow. You can check which backend the agent is using by hitting the flags endpoint on the agent. Switch the backend to aufs or overlayfs to see if that speeds up the launch. You can specify the backend through the flag --image_provisioner_backend=VALUE on the agent.
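For example (a sketch; the agent and ZooKeeper hosts are placeholders, and 5051 is the default agent port):

# Check which backend the agent is currently using.
curl http://agent-host:5051/flags | grep image_provisioner_backend

# Restart the agent with the aufs backend added to its existing flags.
mesos-agent --master=zk://zk-host:2181/mesos \
            --containerizers=mesos \
            --image_providers=docker \
            --image_provisioner_backend=aufs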
NOTE: There are a few bug fixes related to the aufs and overlayfs backends in the latest Mesos release, 1.2.0-rc1, that you might want to pick up. Not to mention that there is an autobackend feature in 1.2.0-rc1 that will automatically select the fastest backend available.