I have a series of microservices that I have been testing. Originally they ran on Service Fabric; however, I have switched to Consul, Fabio, and Nomad, which I like better.
In development on my machine things work well; however, I am running into issues getting Fabio to work in a cluster.
I have a cluster of 5 nodes each running Consul, Fabio, Nomad.
Each service gets a dynamic port at runtime and successfully registers itself.
On the node on which the service is running, Fabio correctly forwards traffic.
However, if the same Fabio URL is used on a different node, traffic is forwarded to the correct node and port, but that port is closed, so the connection fails.
For instance, if ServiceA is running on MachineA on port 1234, then http://MachineA:9999/ServiceA works correctly.
However, http://MachineB/ServiceA fails after MachineB tries to initiate a connection to MachineA on port 1234.
I imagine one solution would be to add firewall rules; however, that requires all the services to run as Admin, which I don't want.
Is there a way to support this through Fabio?
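One way to confirm the symptom described above is to test, from the node that fails, whether the service's dynamic port on the other node accepts TCP connections at all. The following is a minimal sketch using only the Python standard library; the helper name port_is_open is made up for illustration, and the host and port are the example values from the question.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Run on MachineB, port_is_open("MachineA", 1234) returning False while the same call succeeds on MachineA itself would point at a firewall or bind-address problem rather than at Fabio's routing table.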
I want to have the following setup:
3 Couchbase nodes, each running on a separate container, all in the same cluster
Python application running in another container (querying, inserting, deleting data from the Couchbase cluster)
What I managed to do:
Set up a cluster, bucket, query the bucket via UI (accessed by localhost:8091)
What I didn't manage to do:
Create a connection between a Python application (which will eventually be Dockerized; for simplicity, let's treat it as local for now) and the already working cluster. I cannot reach the cluster via the Docker containers' IPs on port 8091, nor via localhost. The Couchbase documentation is either severely lacking here, or I just don't understand it. I even tried the setting-alternate-address option, but without much success (maybe I used it wrongly, so if you have any how-tos explaining the process, I'd still be grateful).
The connection works if there is one node, but throws a timeout once I set up 3 nodes.
I would really appreciate any tips leading to solving this problem.
EDIT: Adding code and error message:
import os

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

connection_string = "couchbase://localhost"
cluster = Cluster.connect(connection_string, ClusterOptions(PasswordAuthenticator(os.getenv("LOGIN"), os.getenv("PASSWORD"))))
# following a successful authentication, a bucket can be opened.
# access a bucket in that cluster
bucket = cluster.bucket('travel-sample')
coll = bucket.default_collection()
result = coll.get('airline_10')
print(result.content_as[dict])
Error message:
couchbase.exceptions.UnAmbiguousTimeoutException: <ec=14, category=couchbase.common, message=unambiguous_timeout, context=KeyValueErrorContext:{'key': 'airline_10', 'bucket_name': 'travel-sample', 'scope_name': '_default', 'collection_name': '_default', 'opaque': 0}, C Source=C:\Jenkins\workspace\python\sdk\python-scripted-build-pipeline\py-client\src\kv_ops.cxx:209>
Couchbase SDKs need to be able to connect to every node on the cluster.
If you are running an app outside of the Docker host, it cannot connect to every node (you can't expose every node on the same port).
This is exactly why it works fine with one node but not with multiple nodes (more details in the documentation).
If you run the Python app inside of a container that runs in the Docker host, it should connect just fine (or stick to a single node for development - which is mostly fine if you're not testing something specific to clustering/failover/replication).
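As a rough sketch of the in-network setup, the app can list every node as a seed in the connection string, using names that resolve inside the Docker network. The container names couchbase1 to couchbase3 and both helper functions are assumptions for illustration; the SDK calls are the same ones used in the question.

```python
import os


def build_connection_string(hosts):
    """Join seed node hostnames into a couchbase:// connection string."""
    if not hosts:
        raise ValueError("at least one host is required")
    return "couchbase://" + ",".join(hosts)


def connect(hosts):
    """Connect to the cluster using credentials from the environment."""
    # Imported lazily so build_connection_string works without the SDK installed.
    from couchbase.auth import PasswordAuthenticator
    from couchbase.cluster import Cluster
    from couchbase.options import ClusterOptions

    auth = PasswordAuthenticator(os.getenv("LOGIN"), os.getenv("PASSWORD"))
    return Cluster.connect(build_connection_string(hosts), ClusterOptions(auth))


# Hypothetical container names on a shared Docker network.
SEED_NODES = ["couchbase1", "couchbase2", "couchbase3"]
```

Calling connect(SEED_NODES) only helps when the app runs where those names resolve and every node is reachable, which is the case inside the same Docker network but not from outside the Docker host.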
I'm using Docker in swarm mode for the services in my application and Traefik to handle, well, the traffic. My goal is to make a separate service for each API section my application has, so for example requests to domain.com/api/foo_api go to the foo_api service and requests to domain.com/api/bar_api go to the bar_api service.
Now all this is pretty straightforward with Traefik. However, I'm also using the API services from other internal services not related to the API. They use a websocket connection to the internal Docker URL, so currently it's ws://api:api_port/ws. However, if I split up the API part I'd need something like ws://foo_api:foo_api_port/ws, which obviously gives the service access only to foo_api, not to the other ones.
So my question is: can I route this websocket traffic with Traefik similar to how I do it externally, but internally in the Docker network?
Traefik is a north-south reverse proxy. In traditional infrastructure, most people would historically use NGINX or Apache to handle inbound traffic, so it is good to see you using a more modern tool. What you are describing is an east-west pattern of communication inside your firewall, behind Traefik (assuming you control all ingress through Traefik).
Have you considered using service discovery and registry capabilities with tools like Hashicorp Consul - https://consul.io?
The idea of service discovery is that your containers and services inside the swarm register themselves in a registry and can then be found and referenced by name, without the manual labor of building and maintaining complicated name-to-IP lookups. Most people know this from the more persistent model of DNS SRV records, which requires an external query. Consul still supports that legacy integration as well.
This site might help you along: https://attx-project.github.io/Consul-for-Service-Discovery-on-Docker-Swarm.html
They appear to have addressed a similar case to yours. And the work is likely reusable with a few tweaks.
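To give an idea of what a lookup by name looks like, here is a minimal sketch that asks a Consul agent for the instances of a service through its HTTP catalog endpoint (/v1/catalog/service/<name>). The agent URL http://localhost:8500 and the function names are assumptions for illustration, not part of your setup.

```python
import json
import urllib.request


def service_endpoints(entries):
    """Turn Consul catalog entries into (host, port) pairs."""
    endpoints = []
    for entry in entries:
        # ServiceAddress may be empty, in which case the node address applies.
        host = entry.get("ServiceAddress") or entry["Address"]
        endpoints.append((host, entry["ServicePort"]))
    return endpoints


def lookup(service, consul_url="http://localhost:8500"):
    """Ask a Consul agent where instances of `service` are running."""
    url = f"{consul_url}/v1/catalog/service/{service}"
    with urllib.request.urlopen(url, timeout=5) as response:
        return service_endpoints(json.load(response))
```

An internal client could call lookup("foo_api") and open its websocket against one of the returned pairs instead of hard-coding ws://foo_api:foo_api_port/ws.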
We operate a docker cluster with several workers and a manager.
Our current problem:
We have the jwilder nginx-proxy running on all nodes, which does not cause any problems. What causes us problems is running a service, e.g. Grav.
It is only reachable if the domain points to the IP address of the node on which the service is running at that time.
My question now:
Is there a way to route the domain in such a way that we only have to set an A-record and Docker does the internal routing on the respective node where the website is running?
If yes, how would we realize this or are there other alternatives with which this is easier to implement?
Useful information:
1 manager
4 workers
a total of 5 ip addresses (Public)
All Barebone Server with Docker (Without Kubernetes etc.)
1 decentralized data server with NVME
Current state: a website can be reached only if the domain points to the respective worker.
Target: 1 public IP for all domains, with failover and internal routing to the respective workers.
Resources:
We are willing to commit whatever resources are needed to implement this. Other servers could also be used for this scenario.
PS: You are also welcome to contact me in other ways, e.g. for testing purposes.
I have 6 containers in total running in Docker swarm: Kafka+Zookeeper, MongoDB, A, B, C and Interface. Interface is the main public access point; only this container publishes a port (5683). The Interface container connects to A, B and C during startup. I am using a docker-compose file with docker stack deploy, and each service has a name that Interface uses as the host. Everything starts successfully and works fine. After some time (20 minutes, 1 hour, ...) I am no longer able to make requests to Interface. Interface receives my requests, but the application has lost its connection to service A, B, C, or all of them. If I restart Interface, it is able to reconnect to services A, B and C.
At first I thought it was a problem in the application, so I exposed 2 new ports on each service (Interface, A, B, C) and connected a profiler and a debugger to them. The application is running properly: no leaks, no blocked threads, working normally and waiting for connections. The debugger shows that when I make a request to Interface and Interface tries to call service A, a "Connection reset by peer" exception is thrown.
During this debugging I found something interesting. I attached the debugger to Interface when the services started, and the debugger was also disconnected after some time. I was not able to reconnect it until I made a request to the container, and therefore to the application. The problem reported was: handshake failed.
Another interesting thing I found was that I was not able to make requests to Interface either. So I used Wireshark to see what was going on: the SYN/ACK exchange was fine, then the application posted some data and Interface responded with FIN,ACK. I assume the same happens when Interface tries to call service A and the connection is closed with FIN. The codebase of Interface, A, B and C is the same as far as the Netty server is concerned.
Finally, I don't think it's an application issue. Why? I tried deploying the containers not as services: I ran each container separately, published the ports of each, and set the service endpoints to localhost (so no overlay network). And it works; the containers run without problems. Also, I didn't say at the beginning that the Java applications (Interface, A, B, C) run without problems as standalone applications, outside Docker.
Could you please help me figure out what the issue could be? Why does Docker close sockets when an overlay network is used?
I am using the newest Docker. I have also tried older versions.
Finally, I was able to solve the problem.
What was happening, one more time: Interface opens a permanent TCP connection to A, B and C. When you run services A, B and C as standalone Java applications, everything works. When we dockerized them and ran them in swarm, it worked for only a few minutes. The strange part was that the connection between Interface and another service was interrupted at the moment you made a request from the client to Interface.
After many unsuccessful tests and debugging each container, I tried running each Docker container separately, with mapped ports and localhost as the endpoint (each container exposed its ports and Interface connected to localhost). Funnily enough, it worked. When you run containers like this, a different network driver is used: bridge. In swarm, the overlay network driver is used.
So it had to be something in the Docker network, not in the application itself. The next step was a tcpdump from each container after a couple of minutes, when it should stop working. It was very interesting.
Client -> Interface (OK, request accepted)
Interface ->(forward request because it belongs to A) A
Interface -> A [POST]
A -> Interface [RESET]
A was resetting the open TCP connection after a couple of minutes without communication. Why?
Docker uses IP Virtual Server and IPVS maintains its own connection table. The default timeout for CLOSE_WAIT connections in IPVS table is 60 seconds. Hence when the server sends something after 60 seconds, the IPVS connection is no longer available and the packet looks invalid for a new TCP session and gets RST. On the client side, the connection remains forever in FIN_WAIT2 state because the app still has the socket open; kernel's fin_wait timer kicks in only for orphaned TCP sockets.
This is what I read about it and how I understand it. I am not sure my explanation of the problem is correct, but based on these assumptions I implemented a ping-pong between Interface and the A, B, C services that kicks in when there has been no communication for a while (less than 60 seconds). And it's working.
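The idea behind the ping-pong can be sketched independently of Netty. This is not the code I actually used, only a minimal Python illustration: the class name Heartbeat, the send_ping callback and the 30 second idle limit are assumptions, chosen so that a ping always goes out before the 60 seconds mentioned above.

```python
import threading
import time


class Heartbeat:
    """Send a ping over a connection that has been idle for too long."""

    def __init__(self, send_ping, idle_limit=30.0, clock=time.monotonic):
        self._send_ping = send_ping      # callable that writes a ping message
        self._idle_limit = idle_limit    # seconds of silence before pinging
        self._clock = clock              # injectable for testing
        self._last_activity = clock()
        self._lock = threading.Lock()

    def mark_activity(self):
        """Call whenever real traffic is sent or received."""
        with self._lock:
            self._last_activity = self._clock()

    def tick(self):
        """Send a ping if the connection has been idle; return True if sent."""
        with self._lock:
            now = self._clock()
            if now - self._last_activity < self._idle_limit:
                return False
            # The ping itself counts as activity on the connection.
            self._last_activity = now
        self._send_ping()
        return True
```

Calling tick() from a timer every few seconds, and mark_activity() from the normal send and receive paths, keeps traffic flowing on an otherwise silent connection.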
Got the same issue.
I specified
endpoint_mode: dnsrr
in the properties of the service that plays the "server" role, and it works just fine.
https://forums.docker.com/t/tcp-timeout-that-occurs-only-in-docker-swarm-not-simple-docker-run/58179
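With dnsrr, Docker's internal DNS answers with the addresses of the individual tasks instead of a single virtual IP, so connections no longer pass through the IPVS load balancer. A minimal sketch to see what a service name resolves to from inside a container follows; the function name and the service name used in the note below are hypothetical.

```python
import socket


def resolve_ipv4(name: str, port: int = 0):
    """Return the sorted, de-duplicated IPv4 addresses that `name` resolves to."""
    infos = socket.getaddrinfo(
        name, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Inside a container on the same overlay network, resolve_ipv4("service_a") should return one address per running task when the service uses dnsrr, and a single virtual IP otherwise.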
I've developed a Grails application and I want my coworkers to be able to test it. They are on my network so I figure they can access it by using my IP address and the port number (8080). I've tried running it according to the steps laid out here and here to no avail.
I noticed that whenever I run the program, even when I follow those instructions, it says:
Grails application running at http://localhost:8080 in environment: development
Basic networking stuff here.
When a server reports that it started on localhost and some port,
that port is usually available on all of the machine's interfaces, not only on 127.0.0.1.
If you run netstat -plant you will see the ports that are open on the machine.
Whatever ipconfig (or ifconfig under Linux) reports as your internal interface address will be something like 192.168.1.x.
The app is then available at http://192.168.1.x:8080
If you can't access it from other machines on the network, start by trying to ping {your machine ip}.
It sounds like network security is stopping one machine from accessing another.
Or, better still, suspect your good old MS firewall: try temporarily stopping the security software on your desktop.
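Which interface a server binds to decides who can reach it: a listener bound to 127.0.0.1 accepts only connections from the same machine, while one bound to 0.0.0.0 accepts them on every interface. A minimal Python sketch of that distinction, with made-up helper names:

```python
import ipaddress
import socket


def open_listener(bind_host: str) -> socket.socket:
    """Open a TCP listener on bind_host, letting the OS pick a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((bind_host, 0))
    listener.listen(1)
    return listener


def is_loopback_only(listener: socket.socket) -> bool:
    """Return True if the listener is bound to a loopback address."""
    bound_host = listener.getsockname()[0]
    return ipaddress.ip_address(bound_host).is_loopback
```

If the netstat output shows the app's port bound to 127.0.0.1 rather than 0.0.0.0, co-workers cannot reach it no matter what the firewall allows.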
It's not clear whether you can access the app yourself on your own machine. It should be available at:
http://localhost:8080/appname
Your co-workers should be able to access the app by changing localhost to your computer name:
http://mycomputername:8080/appname