I've been following the instructions at https://cloud.google.com/tpu/docs/custom-setup
and now I'm trying to run a tiny example from https://cloud.google.com/tpu/docs/quickstart,
but it hangs on sess.run(tpu.initialize_system()).
I suspect that it can't access the TPU network endpoint, even though "gcloud beta compute tpus list" returns status "READY".
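One way to narrow this down is to look up the TPU's network endpoint and probe it directly from the VM that runs the script (a rough sketch; the TPU name, zone and the gRPC port 8470 are assumptions, adjust them to your setup):
# Look up the TPU's internal IP address (name and zone are placeholders)
gcloud beta compute tpus describe my-tpu --zone=us-central1-b
# The ipAddress / networkEndpoints fields in the output give the endpoint.
# From the VM running the TensorFlow script, check that the gRPC port is reachable:
nc -vz <tpu-ip> 8470
If that port is unreachable, the hang in initialize_system() is likely a networking or firewall problem rather than a TensorFlow one.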
I'm working on a web application that uses IBM MQ as a message broker. I want to set up an environment for integration tests via Testcontainers, but there is no IBM MQ container image for the ARM architecture, so using Docker as the container manager is not a workable solution.
I replaced Docker with Podman on an Intel machine using this article, but Podman's performance dropped significantly (25 seconds to run a container and podman ps hanging indefinitely), so I don't want to use this mechanism.
I have also heard of Lima and Colima, so now I'm totally confused and can't decide which setup is best for my case.
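For what it's worth, which architectures an image actually publishes can be checked with docker manifest inspect (the image name below is only an example; substitute the IBM MQ image you use):
docker manifest inspect ibmcom/mq:latest
# the platform.architecture entries in the output list the published architectures;
# the absence of an arm64 entry confirms there is no ARM image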
Being architecture-independent is one of the benefits of testcontainers.cloud, a product by the maintainers of the Testcontainers libraries.
When you use Testcontainers Cloud your tests run locally as usual, and the containers are started in an isolated, on-demand cloud environment which your tests provision and connect to via a small, user-space agent application.
Testcontainers Cloud is currently in public beta, and you can evaluate it for your use cases and setup by signing up on the website.
I am looking for some advice on debugging some extremely painful Docker connectivity issues.
In particular, for an Azure DevOps Services Git repository, I am running a self-hosted (local) dockerized Linux CI agent (set up according to https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/docker?view=azure-devops#linux), which has been working fine for a few months now.
All this runs on a company network, and since last week the network connection of my Docker container has become highly unstable:
Specifically, it intermittently loses its network connection, which is also visible in the logs of the Azure DevOps agent, which then keeps trying to reconnect.
This especially happens while downloading Git LFS objects. Enabling extra traces via GIT_TRACE=1 highlights a lot of connection failures and retries:
trace git-lfs: xfer: failed to resume download for "SHA" from byte N: expected status code 206, received 200. Re-downloading from start
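For reference, enabling those traces for an LFS pull looks roughly like this (GIT_CURL_VERBOSE is an optional extra knob, not part of the original setup):
GIT_TRACE=1 GIT_CURL_VERBOSE=1 git lfs pull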
During such an LFS pull/fetch, the container sometimes even stops responding, and a docker container list command only returns:
Error response from daemon: i/o timeout
As a result, the daemon cannot recover on its own and needs a manual restart (to bring the CI back up).
Also I see remarkable differences in network performance:
Manually cloning the same Git repository (including LFS objects, all from scratch) in container instances (created from the same image) on different machines takes less than 2 minutes on my dev laptop (connected from home via VPN), while the same operation easily takes up to 20 minutes (!) in containers running on two different Win10 machines (company network, physically located in offices, hence no VPN).
Clearly this is not about the host network connection itself, since cloning on the same Win10 hosts (company network/offices) outside of the containers takes only 14 seconds!
Hence I suspect some network configuration issue (e.g. something with the Hyper-V vEthernet adapter? Firewall? Proxy? Or some other watchdog going astray?), but after three days of debugging I am not sure how to investigate this further, as I am running out of ideas and expertise. Any thoughts / advice / hints?
I should add that LFS configuration options (such as lfs.concurrenttransfers and lfs.basictransfersonly) did not really help, and neither did git config http.version (or just removing some larger files).
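For completeness, these were set along the following lines (the specific values are just examples, not recommendations):
git config lfs.concurrenttransfers 1     # limit parallel LFS downloads
git config lfs.basictransfersonly true   # restrict git-lfs to the basic transfer adapter
git config http.version HTTP/1.1         # force HTTP/1.1 instead of HTTP/2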
UPDATE
It does not actually seem to be about the self-hosted agent, but rather a more general Docker network configuration issue within my corporate network.
Running the following works consistently fast on my VPN machine (running from home):
docker run -it ubuntu bash -c "apt-get update; apt-get install -y wget; start=$SECONDS; wget http://cdimage.ubuntu.com/lubuntu/releases/18.04/release/lubuntu-18.04-alternate-amd64.iso; echo Duration: $(( SECONDS - start )) seconds"
Comparison with a PowerShell download (on the host):
$start = Get-Date
(New-Object net.webclient).DownloadFile("http://cdimage.ubuntu.com/lubuntu/releases/18.04/release/lubuntu-18.04-alternate-amd64.iso", "e:/temp/lubuntu-18.04-alternate-amd64.iso")
'Duration: {0:mm} min {0:ss} sec' -f ((Get-Date) - $start)
Corporate network:
Docker: 1560 seconds (= 26 min!)
Windows host: Duration: 00 min 15 sec
Dev laptop (VPN, from home):
Docker: 144 seconds (= 2 min 24 sec)
Windows host: Duration: 02 min 16 sec
Looking at the issues discussed in https://github.com/docker/for-win/issues/698 (and the proposed workaround, which didn't work for me), this seems to be a non-trivial problem with Windows / Hyper-V.
The whole issue "solved itself" when my company decided to finally upgrade from Win10 1803 to 1909 (which supports WSL 2, replacing Hyper-V) 😂
Now everything runs super smoothly (I have re-run these tests almost 20 times).
I have Jenkins installed as a service on a Google Cloud Platform Compute Engine VM.
It was working fine until a few days ago.
Recently I am having issues connecting to the Compute Engine VM itself. When I try to connect to the VM using RDP, it won't connect, and when I try to reach Jenkins over HTTPS, the requests time out.
I have to restart the Compute Engine VM itself to make it work again. It works for a few minutes and then starts rejecting requests again.
The weird thing is that I have Jenkins slaves installed on multiple other GCP Compute Engine VMs (other than the Jenkins master), and while I am unable to access the master VM via RDP, I can trigger Jenkins requests from those other GCP slaves.
I have configured SSL for Jenkins.
I have a static IP assigned to the Compute Engine VM that has the Jenkins server installed.
Any direction on what might be causing the Compute Engine VM to reject connection requests would be helpful.
Hi, when trying to use the AWS SDKs inside a Docker container I am getting the following error:
> (InvalidSignatureException) when calling the PutItem operation:
> Signature expired: 20180613T153236Z is now earlier than
> 20180614T223818Z (20180614T225318Z - 15 min.)
When I use the AWS CLI and the default credential providers in the SDKs on my local machine, the AWS API calls work fine. What is going wrong inside my container?
This might be due to the following issue with running Docker on Mac, https://github.com/docker/for-mac/issues/17, where the Docker VM's clock gets out of sync when your system goes to sleep.
Try restarting the Docker daemon on your system as a quick fix. The issue linked above has some more long-term fixes/suggestions.
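A quick way to confirm the drift (a small sketch, assuming any lightweight image such as alpine is available):
date -u                          # clock on the host
docker run --rm alpine date -u   # clock inside the Docker VM / a container
# If the two differ by minutes, restarting the Docker daemon (or Docker Desktop) resyncs the clock.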
I was able to set up the minimesos cluster on my laptop and also deploy a small command-line utility. Now the questions:
1. What is the image "containersol/minimesos" used for? It is pulled, but I don't see it running when I do "docker ps"; "docker images" lists it.
2. How come when I run "top" inside the mesos-agent container, I see all the processes running on my host (laptop)? This is a bit strange.
3. I was trying to figure out what's inside the minimesos script. I see that there's just one "docker run ..." command. I would really appreciate knowing what that command does that results in 4 containers (1 master, 1 slave, 1 zk, 1 marathon) running on my laptop.
containersol/minimesos runs the Java code that is the core of minimesos. It only runs until it has executed the command from the CLI. When you do minimesos up, the command name and the minimesosFile are passed to this container. The container in turn executes the Java code that creates the other containers that form the Mesos cluster specified in the minimesosFile. That should answer #3 as well. Take a look at the MesosCluster class; that's where the magic happens.
I don't know the answer to #2; I will get back to you when I find out.
Every minimesos command runs as a short-lived container whose image is containersol/minimesos.
When you run 'minimesos up' it launches containersol/minimesos with 'up' as the argument. It then launches a cluster by starting other containers such as containersol/mesos-agent and containersol/mesos-master. After the cluster is up, the containersol/minimesos container exits and is removed.
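A rough way to observe this yourself (timing matters, because the CLI container is removed as soon as the command finishes):
# in one terminal
minimesos up
# in a second terminal, while the command above is still running,
# the short-lived CLI container is visible:
docker ps --filter ancestor=containersol/minimesos
# once it has finished, only the cluster containers (master, agent, zookeeper, marathon) remain:
docker ps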
We have separated the CLI and the minimesos core as a refactoring to prepare for the upcoming API module. We are creating an API to support clients for different programming languages. The first client will be a Golang client.
In this new setup minimesos will launch a long-running API server, and the minimesos CLI commands will call that API. The clients will also launch the API server and call the API.