We are deploying several services in docker swarm (16 services in total) in a single master node. Most of these services are developed in Quarkus, some of them compiled in native mode, others are java compiled because of their dependencies.
When the services are in use everything works fine, but if they are waiting to be used for more than 15 minutes we start to receive this message:
Caused by: java.net.SocketException: Connection reset,
at java.net.SocketInputStream.read(SocketInputStream.java:186),
at java.net.SocketInputStream.read(SocketInputStream.java:140),
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137),
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153),
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280),
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138),
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56),
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259),
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163),
at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157),
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273),
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125),
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272),
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186),
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89),
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110),
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185),
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83),
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56),
at org.jboss.resteasy.client.jaxrs.engines.ManualClosingApacheHttpClient43Engine.invoke(ManualClosingApacheHttpClient43Engine.java:302),
It doesn't matter if it is native compilation or Java compilation, in all of them the novelty is reproduced.
This message is presented when trying to consume the API also built in Quarkus, its consumption is through Swarm's internal network using the name of the service to resolve its location.
To consume the api we build a jar, in which we implement the services interfaces as indicated in the guide https://quarkus.io/guides/rest-client using additionally org.eclipse.microprofile.faulttolerance for the retry.
Because of lack of communication through the socket your connection is lost.
I found few solutions here: Apache HttpClient throws java.net.SocketException: Connection reset if I use it as singletone
Related
We have the following scenario:
Current working setup
Web API project using a single DockerFile
A release pipe line with an 'Azure App Service deploy' task.
Proposed new setup
Web API project using multi container Docker Compose file
A release pipe line with an 'Azure Web App for Containers' task.
Upon deploying the new setup we receive the below error message:
ERROR - multi-container unit was not started successfully
Unhandled exception. System.AggregateException: One or more errors occurred.
(Parameters: Connection String: XXX, Resource: https://vault.azure.net, Authority:
https://login.windows.net/xxxxx. Exception Message:
Tried to get token using Managed Service Identity.
Access token could not be acquired. Connection refused)
The exception thrown is because it can't connect to Azure MSI (Managed Service Identity). It does this to obtain a token before connecting to key vault.
I have tried the following based upon some research and solutions others have found:
Connecting with "RunAs=App" (this seems to be the default parameter-less constructor anyway)
Building up the connection string myself manually by pulling the "MSI_SECRET" environment variable from the machine. This is always blank.
Restarting MSI.
Upgrading and downgrading AppAuthentication package
MSI appears to be configured correctly as it works perfectly with our current working setup so we can rule that out.
It's worth noting that this is System assigned identity not a user assigned one.
The documentation that states which services support managed identites only mentions 'Azure Container Instances' not 'Azure Managed Container Instances' and that is for Linux/Preview too so that it could be not supported.
Services that support managed identities for Azure resources
We've spent a considerable amount of time getting to this point with the configuration and deployment and it would be great if we could resolve this last issue.
Any help appreciated.
Unfortunately, there currently is no multi-container support for managed identities. The multi-container feature is in preview and so does not have all its functionality working yet.
However, the documentation you linked to is also not as clear about the supported scenarios, so I am working on getting this documentation updated to better clarify this. I can update this answer once that's done.
I'm experiencing an issue with OpenSSL/DTLS server.
Environment: docker container based on CentOs7
OpenSSL version: OpenSSL-1.1.1d
A DTLS server (non-blocking) using DTLSv1_Listen having a UDP socket with SO_REUSEADDR is unable to accept a second
client connection when it has already been accepted a client connection and serving it.
When the first client has finished, the second client connection is accepted.
I have used the dtls_udp_echo.c (taken from http://web.archive.org/web/20150617012520/http://sctp.fh-muenster.de/dtls-samples.html ) to carry out the test and reproduce the issue.
The test application has been compiled and executed within a docker container, having CentOS7 as base image, but the behaviour has been noticed with other base images OS too (e.g. Redhat, Ubuntu, Debian, SLES).
The same application running on a bare metal works without any issue.
Is there any known compatibility issue between Docker and OpenSSL/DTLS?
Is there any specific configuration to be done to overcome this issue?
Best Regards
When attempting to run Boot inside Docker, using the adzerk/boot-clj image, I receive connection refused errors.
Specifically, when the container starts up, boot is started, and then a stack trace is output. The trace (which is not easy to copy and paste between computers with no connectivity) essentially is to do with downloading - https://github.com/boot-clj/boot/releases/download/2.7.2/boot.jar - and receiving "Connection refused" errors.
I’m asking, and answering this, question in the hope that it might help someone else.
Where to start?
My main problem was with a Docker + Clojure + Boot setup, specifically when running “boot” from inside the container. Doing this spewed out a stack trace. This is where my journey begins.
I’m using the adzerk/boot-clj image. I’ve used it locally (OSX) without issue, the problem I experienced was in using a VM (CentOS 7) hosted within a corporate data center.
docker run -ti adzerk/boot-clj
Issuing this starts up the container, the entry point is Boot, and it starts pulling down some jars, specifically boot.jar from Github. The resulting stack trace details several problems, but the crux of it was
“java.net.ConnectException: Connection refused” (connecting to Clojars.org:443)
Hmmm…
So instead of running Boot straight away in the container, I specified the container entry point as “—-entrypoint bash” so I can prod around a little.
So, wget - connection refused.
What about without Docker in the way. Same thing. Connection refused.
After a little wrangling with the network team, I found that the “https_proxy” env variable needs to be set on CentOS to route traffic out to the internet. A very specific issue to me in the situation.
However….
wget is now fine, both on the host, and inside the adzerk/boot-clj container. Boot however was not.
In an effort to simplify things even more, I took Docker out of the equation entirely, and used boot locally.
Installed java-1.8.0-openjdk.x86_64, installed Boot. Same problem.
So dug around a little, and found this - https ://github.com/boot-clj/boot-bin/issues/2
This was a start. It mentions setting the BOOT_JVM_OPTIONS, specifically https.proxyHost and https.proxyPort.
It still didn’t work… Arrrg.
OK, let’s take Boot out of the equation.
I wrote a test harness in Java, very simple that connects to https ://clojars.org and attempts to read the index page. Copied from https ://docs.oracle.com/javase/tutorial/networking/urls/readingWriting.html, and setting the JVM_OPTS.
It still fails. “Connection refused”
…. Weird beard.
I finally stumbled on this SO - https ://stackoverflow.com/questions/43695299/java-httpurlconnection-works-on-windows-and-fails-on-linux - specifically the answer from Stephen C
“Java doesn't necessarily respect your system's default proxy settings. Since you are able to "curl" the URL on the Linux machine, the most likely explanation is that Java is not using the proxy that you have configured. The following links explains various ways to configure the proxies for Java:”
So taking the first link - https ://stackoverflow.com/questions/120797/how-do-i-set-the-proxy-to-be-used-by-the-jvm - and the answer from Leonel
I issued “java -Dhttps.proxyHost=xxx -Dhttps.proxyPort=80 HelloWorld”
I get an error, but a different one. This is progress. “Unable to tunnel through proxy”
A quick Google of this led me here: http ://www.oracle.com/technetwork/java/javase/8u111-relnotes-3124969.html - “Disable Basic authentication for HTTPS tunneling”
So updated to “java -Dhttps.proxyHost=xxx -Dhttps.proxyPort=80 -Djdk.http.auth.tunneling.disabledSchemes=“” HelloWorld
Profit.
Info:
java -v
openjdk version 1.8.0_144
Openjdk Runtime Environment (build 1.8.0_144-b01)
OpenJDK 64-Bit Server VM (build 25.144-b01, mixed mode)
Sorry for all my profanity Boot.
I have together 6 containers running in docker swarm. Kafka+Zookeeper, MongoDB, A, B, C and Interface. Interface is the main access point from public - only this container publish the port - 5683. The interface container connects to A, B and C during startup. I am using docker-compose file + docker stack deploy, each service has a name which is used as host for interface. Everything starts successfully and works fine. After some time (20 mins,1h,..) I am not able to make request to interface. Interface receives my requests but application lost connection with service A,B,C or all of them. If I restart interface, it's able to reconnect to services A,B,C.
I firstly thought it's problem of application so I expose 2 new ports on each service (interface, A,B,C) and connect with profiler and debugger to them. Application is running properly, no leaks, no blocked threads, normally working and waiting for connections. Debugger shows me that when I make a request to interface and interface tries to request service A, Connection reset by peer exception was thrown.
During this debugging I found out interesting stuff. I attached debugger to interface when the services started and also debugger was disconnected after some time. + I was not able to reconnect it, until I made request to the container -> application. PRoblem - handshake failed.
Another interesting stuff that I found out was that I was not able to request neither interface. So I used wireshark to see what's going on and: SYN - ACK was fine. Then application post some data and interface respond with FIN,ACK. I assume that this also happen when interface tries to request service A and it FIN the connection. Codebase of Interface, A,B and C is the same regarding netty server.
Finally, I don't think it's a application issue. Why? I tried to deploy containers not as services. I run each container separately, published the ports of each and endpoint of services were set to localhost. (not overlay network). And it is working. Containers run without problem. + I didn't say at the beginning, that the the java applications (interface, A,B,C) runs without problem when they are running as standalone application - not in docker.
Could you please help me what could be the issue? Why the docker in case of overlay network is closing sockets?
I am using newest docker. I used also older.
Finally, I was able to solve the problem.
What was happening, one more time. Interface opens permanent TCP connection to A,B,C. When you try to run these services A,B,C as a standalone java applications, everything is working. When we dockerize them and run in swarm, it was working only few minutes. Strange was that the connection between Interface and another service was interrupted in the moment when you made a request from client to interface.
After many many unsuccessful tests and debugging each container I tried to run each docker container separately, with mapped ports and as endpoint I specified localhost. (each container exposed ports and interface was connecting to localhost) Funny thing happen, it was working. When you run containers like this, different network driver for container is used. Bridge one. If you run it in swarm, overlay network driver is used.
So it had to be something with the docker network, not with application itself. Next step was tcpdump from each container after couple of minutes, when it should stop working. It was very interesting.
Client -> Interface (OK, request accepted)
Interface ->(forward request because it belongs to A) A
Interface -> A [POST]
A -> Interface [RESET]
A was reseting opened TCP communication after couple of minutes without communication. Why?
Docker uses IP Virtual Server and IPVS maintains its own connection table. The default timeout for CLOSE_WAIT connections in IPVS table is 60 seconds. Hence when the server sends something after 60 seconds, the IPVS connection is no longer available and the packet looks invalid for a new TCP session and gets RST. On the client side, the connection remains forever in FIN_WAIT2 state because the app still has the socket open; kernel's fin_wait timer kicks in only for orphaned TCP sockets.
This is what I read about it and how understand it. I am not sure if my explanation of problem is correct, but based on these assumptions I implemented ping-pong between Interface and A,B,C services in case there is no communication for <60seconds. And, it’s working.
Got the same issue.
Specified
endpoint_mode: dnsrr
to properties of the service which plays "server" role and it works just fine.
https://forums.docker.com/t/tcp-timeout-that-occurs-only-in-docker-swarm-not-simple-docker-run/58179
I am trying to create a standalone SCTP diameter client using Jdiameter. The jar libraries I am using are jdiameter-api-1.5.9.0-build538-SNAPSHOT and jdiameter-impl-1.5.9.0-build538-SNAPSHOT
But I get this error Unable to create server socket for LocalPeer 'client.test.com' at 127.0.0.1:55555 (org.mobicents.protocols.api.AssociationListener)
It works fine with TCP. I tried to debug but couldn't figure out the problem. Kindly help me with this.
SCTP will not work on windows systems. For linux systems, you might have to install the sctp stack. However, be aware that for some linux distributions you might run into strange issues with it, so e.g. that the port is still being blocked even after all server sockets, client sockets etc are closed and even the processes have been shut down or killed. In these cases, you need to wait about 5-10 minutes until the sctp stack recognizes that there is no one anymore who is interested in this port and releases it by itself.