AWS DataSync agent ran into an error connecting to AWS - aws-datasync

I'm trying to sync local S3 object storage to AWS S3 using AWS DataSync.
I configured the agent and it shows as online in the AWS DataSync dashboard under the Agents tab.
Here is a screenshot of the local DataSync agent: Agent screen
It also passes all of the network checks run from the agent: Agent network checks
When I start the task, it finishes after 10 minutes and 5 seconds with this error:
DataSync agent ran into an error connecting to AWS. Please review the
DataSync network requirements and ensure required endpoints are
accessible from the agent. Please contact AWS support if the error
persists.
CloudWatch logs:
2022-11-24T23:51:55.523+04:00 [INFO] Request to start task-07d3b5f3a2145f50c.
2022-11-24T23:54:15.550+04:00 [INFO] Execution exec-089b9bd72e66f49aa started.
2022-11-25T00:04:20.687+04:00 [INFO] Execution exec-089b9bd72e66f49aa finished with status Connection refused.
Please help me figure out the problem.
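If it helps to narrow this down, here is a minimal sketch (not an official AWS check) that probes TCP connectivity from a machine on the agent's network. The endpoint names below are assumptions for a public-endpoint agent; replace them with the endpoints listed for your region and endpoint type (public, FIPS, or VPC) in the DataSync network requirements.

# connectivity_probe.py - rough TCP reachability check, run from the agent's network.
# The endpoints are assumptions for a public-endpoint agent; substitute the list
# from the DataSync network requirements for your region and endpoint type.
import socket

REGION = "us-east-1"  # replace with your region
ENDPOINTS = [
    ("datasync.{}.amazonaws.com".format(REGION), 443),     # service endpoint (assumed)
    ("cp.datasync.{}.amazonaws.com".format(REGION), 443),  # control-plane endpoint (assumed)
]

for host, port in ENDPOINTS:
    try:
        with socket.create_connection((host, port), timeout=5):
            print("OK      {}:{}".format(host, port))
    except OSError as exc:
        print("FAILED  {}:{} -> {}".format(host, port, exc))

If any of these probes fail from the agent's subnet, the "Connection refused" in the task execution is more likely a firewall or proxy between the agent and AWS than a problem with the task configuration itself.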

Related

How to connect local kubernetes with local jenkins

My Kubernetes environment is running on kind, while my Jenkins environment is running as a Docker instance. I have watched every YouTube tutorial on this that I could find and followed all of the steps carefully, but I still can't get past this very specific error. The error doesn't appear in any of the tutorials I watched, and it's very frustrating.
Error testing connection https://127.0.0.1:53883: java.net.ConnectException: Failed to connect to /127.0.0.1:53883
The URL comes from running the command kubectl cluster-info.
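One thing worth checking: with kind, the address reported by kubectl cluster-info (here 127.0.0.1:53883) is a port mapped on the host's loopback, so from inside a Jenkins Docker container 127.0.0.1 refers to the container itself, not the host. A minimal sketch to confirm where that address is actually reachable from; the host and port are the ones from the error above:

# reachability_check.py - run this both on the host and inside the Jenkins container
# to see from where 127.0.0.1:53883 actually reaches a listening API server.
import socket

API_HOST, API_PORT = "127.0.0.1", 53883  # values taken from `kubectl cluster-info`

try:
    with socket.create_connection((API_HOST, API_PORT), timeout=3):
        print("reachable: {}:{}".format(API_HOST, API_PORT))
except OSError as exc:
    print("not reachable from here: {}:{} -> {}".format(API_HOST, API_PORT, exc))

If the address is reachable from the host but not from the Jenkins container, the usual fix is to point Jenkins at an address the container can reach, for example by putting Jenkins on the same Docker network as the kind control plane and using a kubeconfig that advertises the control-plane container's address.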

Jenkins slave pod on kubernetes randomly failing

I have set up a Jenkins master (on a VM) that provisions JNLP slaves as Kubernetes pods.
On rare occasions, the pipeline fails with this message:
java.io.IOException: Pipe closed
at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:260)
at java.io.PipedInputStream.receive(PipedInputStream.java:226)
at java.io.PipedOutputStream.write(PipedOutputStream.java:149)
at java.io.OutputStream.write(OutputStream.java:75)
at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.setupEnvironmentVariable(ContainerExecDecorator.java:510)
at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.doLaunch(ContainerExecDecorator.java:474)
at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.launch(ContainerExecDecorator.java:333)
at hudson.Launcher$ProcStarter.start(Launcher.java:455)
Viewing the Kubernetes logs in Stackdriver, one can see that the pod does manage to connect to the master, e.g.
Handshaking
Agent discovery successful
Trying protocol: JNLP4-Connect
Remote Identity confirmed: <some_hash_here>
Connecting to <jenkins-master-url>:49187
started container
loading plugin ...
but after a while it fails and here are the relevant logs:
org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave$SlaveDisconnector call
INFO: Disabled slave engine reconnects.
hudson.remoting.jnlp.Main$CuiListener status
Terminated
hudson.remoting.Request$2 run
Failed to send back a reply to the request hudson.remoting.Request$2#336ec321: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel#29d0e8b2:JNLP4-connect connection to <jenkins-master-url>/<public-ip-of-jenkins-master>:49187": channel is already closed
"Processing signal 'terminated'"
...
How can I further troubleshoot this random error?
Can you take a look at the Kubernetes pod events in Stackdriver? We had similar behavior with a different CI system (GitLab CI). Our builds were also failing randomly. It turned out that the JVM inside the container exceeded its memory limit and was killed by Kubernetes (OOMKilled), and the CI system recognized this as a build error.
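To add to that: a quick way to confirm the OOMKilled theory without Stackdriver is to look at the last termination state of the agent pod's containers. A rough sketch using the Kubernetes Python client; the pod name and namespace are placeholders for your own values:

# oom_check.py - print why each container in a pod last terminated (e.g. OOMKilled).
# Requires the `kubernetes` Python client and a working kubeconfig; names are placeholders.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

pod = v1.read_namespaced_pod(name="jnlp-slave-abc123", namespace="jenkins")  # hypothetical names
for cs in pod.status.container_statuses or []:
    last = cs.last_state.terminated
    if last is not None:
        print("{}: reason={}, exit_code={}".format(cs.name, last.reason, last.exit_code))
    else:
        print("{}: no previous termination recorded".format(cs.name))

If the reason comes back as OOMKilled, raising the memory limit on the pod template (or capping the JVM heap) is the usual fix.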

Airflow: Could not send worker log to S3

I deployed Airflow webserver, scheduler, worker, and flower on my kubernetes cluster using Docker images.
Airflow version is 1.8.0.
Now I want to send worker logs to S3, so I created an S3 connection in Airflow from the Admin UI (just set S3_CONN as the conn id and s3 as the type; because my Kubernetes cluster is running on AWS and all nodes have S3 access roles, that should be sufficient) and set the Airflow config as follows:
remote_base_log_folder = s3://aws-logs-xxxxxxxx-us-east-1/k8s-airflow
remote_log_conn_id = S3_CONN
encrypt_s3_logs = False
First I tried creating a DAG that just raises an exception immediately after it starts running. This works; the log can be seen on S3.
Then I modified the DAG so that it creates an EMR cluster and waits for it to be ready (waiting status). To apply this, I restarted all 4 Airflow Docker containers.
Now the DAG appears to work: a cluster is started and, once it's ready, the DAG is marked as success. But I can see no logs on S3.
There are no related error logs on the worker or web server, so I can't even see what might be causing this issue. The logs are just not sent.
Does anyone know if there is some restriction on Airflow's remote logging, other than this description in the official documentation?
https://airflow.incubator.apache.org/configuration.html#logs
In the Airflow Web UI, local logs take precedence over remote logs. If
local logs can not be found or accessed, the remote logs will be
displayed. Note that logs are only sent to remote storage once a task
completes (including failure). In other words, remote logs for running
tasks are unavailable.
I didn't expect this, but are logs not sent to remote storage on success?
The boto version installed with Airflow is 2.46.1, and that version doesn't use IAM instance roles.
Instead, you will have to add an access key and secret for an IAM user that has access in the Extra field of your S3_CONN connection, like so:
{"aws_access_key_id":"123456789","aws_secret_access_key":"secret12345"}

Jenkins Docker Push to google cloud fails with an exception

I'm building a Docker image via Jenkins and want to push it to Google Container Registry using the Jenkins plugins (docker-build-step, Google Container Registry Auth Plugin, Google OAuth Credentials plugin), following these instructions: https://wiki.jenkins-ci.org/display/JENKINS/Google+Container+Registry+Auth+Plugin
I have a VM instance on GCE with both Jenkins and Docker installed.
The build works OK, but it fails when trying to push the image to the registry:
Successfully built c2ddc81c66d1
[Docker] INFO: Sucessfully created image eu.gcr.io/$project-id/base
[Docker] INFO: Pushing image eu.gcr.io/$project-id/base
ERROR: Build step failed with exception
javax.ws.rs.ProcessingException: org.apache.http.NoHttpResponseException: 127.0.0.1:2375 failed to respond
at org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:513)
at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:246)
at org.glassfish.jersey.client.JerseyInvocation$1.call(JerseyInvocation.java:667)
at org.glassfish.jersey.client.JerseyInvocation$1.call(JerseyInvocation.java:664)
at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
--
But when I try to push it via SSH, it works:
bash-4.2$ whoami
jenkins
bash-4.2$ gcloud docker push eu.gcr.io/$project-id/base
The push refers to a repository [eu.gcr.io/$project-id/base]
41772e41ab05: Layer already exists
a03f60753e4e: Pushing [=========> ] 9.223 MB/47.44 MB
I believe that if it were some kind of scope issue on the Google VM, I shouldn't be able to push via SSH either. Could it be the jenkins user's environment variables?
Does anyone have a working configuration for a similar scenario? Does anyone who knows Jenkins well know what kind of config could be causing this?
Also, before using http://127.0.0.1:2375 as the Docker URL, I had unix:///var/run/docker.sock, and with that configuration the log showed the NoHttpResponseException for "localhost:80" instead of "127.0.0.1:2375", so using the socket isn't the solution either.
Regards,
JS
The error is related to a failed connection between Jenkins and the Google Registry. I'm assuming there's no problem with the network connection, since it's calling localhost.
[Docker] INFO: Sucessfully created image eu.gcr.io/$project-id/base
[Docker] INFO: Pushing image eu.gcr.io/$project-id/base
ERROR: Build step failed with exception
javax.ws.rs.ProcessingException: org.apache.http.NoHttpResponseException: 127.0.0.1:2375 failed to respond
So that leaves us with an application problem. Check your Google OAuth credentials; it could be an unauthorized-access exception being thrown, in which case you'll have to create the credentials.
In any case, check your logs and raise the log level to see if there's any important info that went unnoticed.
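Before digging into credentials, it may also be worth confirming that the Docker daemon actually answers HTTP on 127.0.0.1:2375 for the jenkins user, since that is where the NoHttpResponseException is raised. A minimal sketch, assuming the requests library is installed; /version is a standard Docker Engine API endpoint:

# docker_api_check.py - verify the Docker daemon responds over TCP on 127.0.0.1:2375.
# Run it as the jenkins user on the VM; requires the `requests` library.
import requests

try:
    resp = requests.get("http://127.0.0.1:2375/version", timeout=5)
    print(resp.status_code, resp.text)
except requests.RequestException as exc:
    print("Docker daemon did not respond:", exc)

If this call fails, the daemon is not listening on TCP 2375 (it has to be started with an explicit -H tcp://127.0.0.1:2375 option, which is not the default), and the plugin error has nothing to do with the registry yet.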

Unable to follow sandbox link in Apache Mesos

I have a Mesos cluster set up with a master and a slave on separate hosts on GCE. In the Mesos console I can see the list of tasks, but the sandbox link isn't working.
I'm getting this error:
Failed to connect to slave '20150806-140802-3049385994-5050-1-S0' on '4c37a1dd950b:5051'.
Potential reasons:
The slave's hostname, '4c37a1dd950b', is not accessible from your network
The slave's port, '5051', is not accessible from your network
The slave timed out or went offline
4c37a1dd950b is on the same server as the master.
Any tips appreciated
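For what it's worth, the first two "potential reasons" are easy to test directly: 4c37a1dd950b looks like a Docker container ID being advertised as the slave's hostname, and your browser has to be able to resolve it and reach it on port 5051. A small sketch to check both from the machine running the browser; the host and port are the ones from the error message:

# slave_reachability.py - check whether the advertised slave hostname resolves and
# whether its port accepts connections; values are taken from the error message.
import socket

HOST, PORT = "4c37a1dd950b", 5051

try:
    addr = socket.gethostbyname(HOST)
    print("{} resolves to {}".format(HOST, addr))
except socket.gaierror as exc:
    print("{} does not resolve: {}".format(HOST, exc))
else:
    try:
        with socket.create_connection((addr, PORT), timeout=5):
            print("{}:{} is reachable".format(addr, PORT))
    except OSError as exc:
        print("{}:{} is not reachable: {}".format(addr, PORT, exc))

If the hostname doesn't resolve, the usual fix is to start the slave with an externally resolvable address (for example via the slave's --hostname flag) so the master advertises something your browser can actually reach.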
