Last week I set up a Selenium grid using Jenkins and four slave Windows VMs. As part of this I had to unblock ports for both the slave connection and the Selenium connection.
The VMs downloaded the JNLP starter and registered correctly, and by the end of the day Friday I had my tests running and reporting as expected.
Happy Monday: I came in to find that over the weekend the connections to all four VMs had been lost due to connection timeouts. (The initial error indicated the connection had been terminated because the ping took too long; subsequent attempts never successfully connect in the first place.)
My research on SO so far points to issues with the ports, so I checked to make sure they are still enabled, and they are. Next I restarted the Jenkins instance, still with no success.
Interestingly, the connection to the Jenkins Selenium grid IS working: each of the standalone servers starts and registers correctly on the VMs, and they are all able to access the Jenkins UI from their browsers; they just cannot register as slaves through JNLP.
At this point I am at a loss. I've mirrored the exact same setup that was working last week, I checked with our DevOps team, which manages the server, and verified there have been no changes on that end, and the VMs have been untouched.
Found a solution, but it leaves at least one question.
To resolve this I altered the Jenkins global security settings to use a fixed port for inbound TCP agent connections and made sure it was one of my enabled ports; the connection now goes through cleanly.
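For reference, the same setting can be applied from the Jenkins script console (a sketch; 50000 stands in for whichever of your enabled ports you pick):

    import jenkins.model.Jenkins

    def j = Jenkins.getInstance()
    j.setSlaveAgentPort(50000)  // fixed TCP port for inbound JNLP agents, instead of "random"
    j.save()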
That said - this should NOT have worked on its own. When trying to connect earlier, the logs clearly stated that connection attempts on the given port were refused (the exact same port, and it was enabled then as well).
I could understand it if the agent had been trying to connect on a different port, but I don't understand why pinning the port itself would make a difference to the connecting agent.
I know I posted a question that had been bugging me for days and then found a solution just 5 minutes after posting, so now I am posting about a problem I've been hitting for the last two hours. I have a job in Jenkins that executes a series of commands remotely via SSH, but before any connection is established it throws this error: com.jcraft.jsch.JSchException: channel is not opened. In my topology, the Jenkins server is on my main PC and I want to communicate with a CentOS 7 VM. On Jenkins I have configured everything (for example, the SSH agent in the global configuration). On my CentOS 7 VM I don't think there's any need to open port 22. My expected result is obviously to be able to execute the script (let's begin by connecting). My VM has the IP 192.168.127.129. If you need any other information, ask me in the comments; thanks in advance.
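For context, the failing step corresponds to something like this minimal JSch sketch (hypothetical credentials and command; the Jenkins SSH machinery drives JSch in roughly this way). The "channel is not opened" exception is thrown from the channel's connect() when the channel cannot be opened in time, e.g. because the host is unreachable:

    import com.jcraft.jsch.ChannelExec;
    import com.jcraft.jsch.JSch;
    import com.jcraft.jsch.Session;

    public class SshCommand {
        public static void main(String[] args) throws Exception {
            JSch jsch = new JSch();
            Session session = jsch.getSession("user", "192.168.127.129", 22);
            session.setPassword("secret");                     // hypothetical credentials
            session.setConfig("StrictHostKeyChecking", "no");  // demo only
            session.connect(10000);                            // connect timeout in ms

            ChannelExec channel = (ChannelExec) session.openChannel("exec");
            channel.setCommand("uname -a");                    // hypothetical command
            channel.connect(10000);  // throws "channel is not opened" on failure

            channel.disconnect();
            session.disconnect();
        }
    }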
I did not resolve the root problem; however, my VM was using a host-only connection. I changed it to NAT and the problem went away, but this is neither a permanent fix nor best practice: my VM is now connected to the internet and exposed to all of its dangers.
I have a series of microservices that I have been testing. Originally they were using Service Fabric; however, I have switched to Consul, Fabio, and Nomad, which I like better.
In development on my machine things work well; however, I am running into some issues actually getting Fabio to work in a cluster.
I have a cluster of 5 nodes, each running Consul, Fabio, and Nomad.
Each service gets a dynamic port at runtime and successfully registers itself.
On the node where the service is running, Fabio correctly forwards traffic.
However, if the same Fabio URL is used on a different node, traffic is forwarded to the correct node/port, but that port is closed, so the connection doesn't work.
For instance, if ServiceA is running on MachineA on port 1234, then http://MachineA:9999/ServiceA works correctly.
However, http://MachineB:9999/ServiceA fails when MachineB tries to initiate a connection to MachineA on port 1234.
I would imagine a solution would be to add firewall rules; however, this requires all the services to run as Admin, which I don't want.
Is there a way to support this through Fabio?
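For what it's worth, the registration side looks roughly like this Consul service definition (a sketch; the name, port, and health-check path are illustrative). Fabio builds its routing table from services that carry a urlprefix- tag and have a passing check:

    {
      "service": {
        "name": "ServiceA",
        "port": 1234,
        "tags": ["urlprefix-/ServiceA"],
        "check": {
          "http": "http://localhost:1234/health",
          "interval": "10s"
        }
      }
    }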
I currently have a Firebird 2.5 database at a client's premises, installed on a Windows 7 Pro (32-bit) machine. Multiple stations on their local network can connect to the database, and the local machine can connect with our application and with IBExpert.
However, for some of our software packages, a remote connection (from outside the local network) is required. This previously worked but no longer does.
When I connect with FlameRobin from my office (I'm located in a different city / different network), I receive the following error message:
IBPP::SQLException
Context: Database::Connect
Message: isc_attach_database failed
SQL Message: -923
Connection not established
Engine Code : 335544421
Engine Message :
connection rejected by remote interface.
Performing the same connection attempt with IBExpert, both from my office and from other external networks, fails with the same message.
However, I am getting TCP/IP communication from what I can see. Here are the details of my troubleshooting steps for the last week:
Originally, I was receiving the following error when connecting from outside the network:
"Connection not established
Connection refused by remote interface"
Since that time, we have restarted the router and now get the current "connection rejected by remote interface." error message.
I can telnet to the public IP through port 3050 from my office and other outside networks.
I tested port 3050 on sites like YouGetSignal.com and CanYouSeeMe.org and it appears as open.
Other ports that we communicate on publicly are open and communicating.
The site has Kaspersky antivirus installed but all tests to connect via IBExpert while Kaspersky was in sleep mode behaved the same.
Installing Firebird 2.5 on another workstation in the same local network, pointed at port 3051 (both in Firebird.conf and in the Windows Firewall and the router), shows the port as open through telnet and CanYouSeeMe.org, but again it cannot be communicated with from outside via port 3051 (see the Firebird.conf sketch after this list).
IBExpert works from a workstation in the network to the server
The server currently has no entry for RemoteBindAddress in the Firebird.conf
Wireshark shows that when connecting from outside, there are packets coming through.
The TCP/IP test in IBExpert under Communication Diagnostics, with the public IP as the host plus the service, shows the following test results:
Attempt connecting to XX.YY.ZZ.AAA.
Socket for connection obtained.
Found service 'GDS_DB' at port '3050'
Connection established to host 'XX.YY.ZZ.AAA',
on port 3050.
TCP/IP Communication Test Passed!
Database path, username, and password have all been checked multiple times.
Locally on the server, I've changed the security on the database .FDB and security2.FDB files to give Everyone Full Control.
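For reference, the Firebird.conf lines involved in these tests look roughly like this (a sketch; 3051 was the alternate-port experiment, and RemoteBindAddress is shown commented out because the server has no entry for it):

    RemoteServicePort = 3051
    #RemoteBindAddress =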
At this point, we have a scheduled restart of the ISP's modem happening soon, although the fact that we have full TCP/IP communication over the port makes me doubtful that this is the issue.
If anyone can recommend next debugging steps, or any tools that could help in this situation, that would be greatly appreciated.
This turns out to be a networking issue. We performed the following tests:
We power-cycled the ISP's modem, which produced no change in behavior.
We connected a laptop directly to the ISP's modem but couldn't communicate with Firebird, even with proper port-forwarding rules in place on the machine and firewall.
We ran Wireshark on both sides and, on connection attempts, found many attempts to connect with retransmissions that failed.
The client's technical team decided to install a VPN-capable router, and now we're good to go. From what we found, there may be some kind of ISP blocking occurring, as many of the tech team's remote services were failing to connect with similar behavior.
Hopefully this post helps people in the future with remote-connectivity debugging and all of the places you can look when you're running into this problem.
I have six containers running in Docker swarm: Kafka+Zookeeper, MongoDB, A, B, C, and Interface. Interface is the main access point from the public; only this container publishes a port, 5683. The Interface container connects to A, B, and C during startup. I am using a docker-compose file + docker stack deploy, and each service has a name, which Interface uses as the host. Everything starts successfully and works fine. After some time (20 minutes, 1 hour, ...), I am unable to make requests to Interface. Interface receives my requests, but the application has lost its connection to service A, B, C, or all of them. If I restart Interface, it is able to reconnect to services A, B, and C.
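For context, the stack file looks roughly like this (a sketch with hypothetical image names; only Interface publishes a port, and the stack is deployed with docker stack deploy -c docker-compose.yml mystack):

    version: "3.7"
    services:
      interface:
        image: myorg/interface     # hypothetical image
        ports:
          - "5683:5683"            # the only published port
      service-a:
        image: myorg/service-a     # Interface connects to host "service-a"
      service-b:
        image: myorg/service-b
      service-c:
        image: myorg/service-c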
I first thought it was an application problem, so I exposed two new ports on each service (Interface, A, B, C) and connected a profiler and debugger to them. The applications are running properly: no leaks, no blocked threads, working normally and waiting for connections. The debugger shows me that when I make a request to Interface and Interface tries to request service A, a "Connection reset by peer" exception is thrown.
During this debugging I found something interesting. I attached the debugger to Interface when the services started, and the debugger was also disconnected after some time. Furthermore, I was not able to reconnect it until I made a request to the container -> application; the problem was a failed handshake.
Another interesting thing I found was that I was not able to make requests to Interface either. So I used Wireshark to see what was going on: SYN - ACK was fine, then the application posted some data and Interface responded with FIN, ACK. I assume the same thing happens when Interface tries to request service A and A FINs the connection. The codebases of Interface, A, B, and C are the same as far as the Netty server is concerned.
Finally, I don't think it's an application issue. Why? I tried deploying the containers not as services: I ran each container separately, published each one's ports, and set the service endpoints to localhost (no overlay network). And it works; the containers run without problems. Also, I didn't mention at the beginning that the Java applications (Interface, A, B, C) run without problems when they run as standalone applications, not in Docker.
Could you please help me figure out what the issue could be? Why does Docker close sockets when an overlay network is used?
I am using the newest Docker; I have also tried older versions.
Finally, I was able to solve the problem.
What was happening, one more time: Interface opens a permanent TCP connection to A, B, and C. When you run services A, B, and C as standalone Java applications, everything works. When we dockerized them and ran them in swarm, it worked for only a few minutes. The strange part was that the connection between Interface and another service was interrupted at the moment a client made a request to Interface.
After many, many unsuccessful tests and after debugging each container, I tried running each Docker container separately, with mapped ports, and specified localhost as the endpoint (each container exposed its ports and Interface connected to localhost). Funny thing: it worked. When you run containers like this, a different network driver is used: the bridge driver. If you run them in swarm, the overlay network driver is used.
So it had to be something with the Docker network, not with the application itself. The next step was a tcpdump from each container after a couple of minutes, when it should have stopped working. It was very interesting:
Client -> Interface [OK, request accepted]
Interface -> A (forwards the request because it belongs to A)
Interface -> A [POST]
A -> Interface [RESET]
A was resetting the open TCP connection after a couple of minutes without communication. Why?
Docker uses IP Virtual Server (IPVS), and IPVS maintains its own connection table. The default timeout for CLOSE_WAIT connections in the IPVS table is 60 seconds. Hence, when the server sends something after 60 seconds, the IPVS entry is no longer available, the packet looks invalid for a new TCP session, and it gets an RST. On the client side, the connection remains forever in the FIN_WAIT2 state, because the app still has the socket open; the kernel's fin_wait timer kicks in only for orphaned TCP sockets.
This is what I read about it and how I understand it. I am not sure my explanation of the problem is correct, but based on these assumptions I implemented a ping-pong between Interface and the A, B, C services, so that a connection never sits idle for 60 seconds. And it's working.
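Since the services share a Netty codebase, the ping side of that keepalive can be sketched with Netty's IdleStateHandler (a sketch; PingMessage is a hypothetical frame the peer simply discards or echoes):

    import io.netty.channel.ChannelDuplexHandler;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.handler.timeout.IdleState;
    import io.netty.handler.timeout.IdleStateEvent;
    import io.netty.handler.timeout.IdleStateHandler;

    final class PingMessage {}  // hypothetical keepalive frame

    class PingHandler extends ChannelDuplexHandler {
        @Override
        public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
            if (evt instanceof IdleStateEvent
                    && ((IdleStateEvent) evt).state() == IdleState.WRITER_IDLE) {
                // Nothing written for 30s: ping so the IPVS entry never idles past 60s.
                ctx.writeAndFlush(new PingMessage());
            } else {
                super.userEventTriggered(ctx, evt);
            }
        }
    }

    // In the channel pipeline setup, ahead of the protocol handlers:
    // pipeline.addLast(new IdleStateHandler(0, 30, 0));  // writer-idle fires after 30s
    // pipeline.addLast(new PingHandler());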
Got the same issue.
Specifying
endpoint_mode: dnsrr
on the service that plays the "server" role made it work just fine.
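In a version 3 compose file that setting goes under deploy (a sketch with a hypothetical service name); dnsrr hands out task IPs via DNS round robin instead of routing through the IPVS-backed virtual IP:

    services:
      service-a:                 # hypothetical "server" service
        image: myorg/service-a
        deploy:
          endpoint_mode: dnsrr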
https://forums.docker.com/t/tcp-timeout-that-occurs-only-in-docker-swarm-not-simple-docker-run/58179
I have to periodically administer my parents' Linux computer, because they are too old to understand how to do this themselves. The computer is in a remote location. I have always used SSH through port forwarding on the router. However, their provider recently removed the ability to make any inbound connection, and my SSH doesn't connect any more.
My question is: what is the next best way to administer it?
I know that a VPN could possibly be used; I can (maybe) set up a VPN network with this computer.
I could also make it try to connect over SSH to my home computer on a particular port, e.g. every 15 minutes, establishing port forwarding back to itself. A custom shell script would be used for this (see the sketch below).
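That reverse tunnel would look roughly like this (a sketch; the user me, the host myhome.example.com, and port 2222 are all hypothetical, and key-based auth is assumed to be set up):

    # Run on the parents' machine, e.g. from cron every 15 minutes.
    # -R 2222:localhost:22 forwards port 2222 on my home machine back to their sshd;
    # -N opens no remote shell; ExitOnForwardFailure lets a stale tunnel die and retry.
    ssh -N -R 2222:localhost:22 \
        -o ExitOnForwardFailure=yes \
        -o ServerAliveInterval=60 \
        me@myhome.example.com

From home, ssh -p 2222 parent-user@localhost would then reach their machine; autossh is a common way to keep such a tunnel up instead of cron.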
But what are the alternatives?
Is there any other, nicer way to be able to connect to this Linux machine from outside?
Similar to your suggestion: get your parents to run a script (all they would have to do is double-click something) which SSHes to your computer, and then connect back over that?