How to allow Azure IoT Edge modules to execute HTTP requests?

Running curl -m 20 https://www.google.com on the host works; however, when run from within an edge module container, it times out:
docker exec myModule sh -c 'curl -m 20 https://www.google.com'
curl: (28) Operation timed out after 20000 milliseconds with 0 out of 0 bytes received
We have the same edge environment running on several VMs, where it works as expected and returns a result. I assume a firewall rule or other setting is blocking the request, so I compared the output of sudo iptables -L -v between the two VMs, and the results are similar.
Any suggestions on what can cause this behavior? Or what am I missing?

We found the issue. The device on which the request timed out had two networks: Ethernet and a GSM modem. Docker was bridged to the Ethernet network, which had no internet connection; traffic had to go out through the GSM modem instead. We solved this by running the Azure IoT Edge module on the "host" network, so it uses the same network configuration as the host machine.
"createOptions": {
  "NetworkingConfig": {
    "EndpointsConfig": {
      "host": {}
    }
  },
  "HostConfig": {
    "NetworkMode": "host"
  }
}
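In a deployment manifest, createOptions is normally submitted as a stringified JSON value rather than a nested object. A minimal Python sketch, assuming you generate the manifest yourself, that builds the options shown above and serializes them (the function name is hypothetical):

```python
import json


def host_network_create_options() -> str:
    """Return the host-network createOptions from the answer as a JSON string."""
    options = {
        "NetworkingConfig": {"EndpointsConfig": {"host": {}}},
        "HostConfig": {"NetworkMode": "host"},
    }
    # The manifest field carries a string, so serialize the dict.
    return json.dumps(options)
```

The returned string can then be assigned to the module's createOptions field when the manifest is assembled.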
Running as host network solved the issue on most machines. On some other machines we got another exception, namely:
System.Net.Internals.SocketExceptionFactory+ExtendedSocketException (00000005, 0xFFFDFFFF): Name or service not known
This had to do with a mismatch between /etc/hosts and /etc/hostname, see this issue to fix it.
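The linked issue is not reproduced here, so the following is only a rough check under one assumption: that the "mismatch" means the name in /etc/hostname has no matching entry in /etc/hosts. The helper name is hypothetical.

```python
def hostname_in_hosts(hostname: str, hosts_text: str) -> bool:
    """Return True if hostname is listed as a name or alias in /etc/hosts text."""
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address and names.
        entry = line.split("#", 1)[0].split()
        # entry[0] is the IP address; the remaining fields are names.
        if len(entry) >= 2 and hostname in entry[1:]:
            return True
    return False
```

To use it on a device, pass the stripped contents of /etc/hostname and the full text of /etc/hosts; a False result would point at the kind of mismatch described above.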

Related

Dockerized Zabbix: Server Can't Connect to the Agents by IP

Problem:
I'm trying to configure a fully containerized Zabbix version 6.0 monitoring system on Ubuntu 20.04 LTS using Zabbix's Docker Compose repo found HERE.
The command I used to raise the Zabbix server and also a Zabbix Agent is:
docker-compose -f docker-compose_v3_ubuntu_pgsql_latest.yaml --profile all up -d
The Agent comes up in a broken state and shows a "red" status, but when I change its IP address FROM 127.0.0.1 TO 172.16.239.6 (the default IP Docker Compose assigns to it), the Zabbix Server connects successfully and monitoring is established. HOWEVER: the Zabbix Server cannot connect to any other Dockerized Zabbix Agents on REMOTE hosts, which are started with the docker run command:
docker run --add-host=zabbix-server:172.16.238.3 -p 10050:10050 -d --privileged --name DockerHost3-zabbix-agent -e ZBX_SERVER_HOST="zabbix-server" -e ZBX_PASSIVE_ALLOW="true" zabbix/zabbix-agent:ubuntu-6.0-latest
NOTE: I looked at other Stack groups to post this question, but Stackoverflow appeared to be the go-to group for these Docker/Zabbix issues having over 30 such questions.
Troubleshooting:
Comparative Analysis:
Agent Configuration:
I compared the working ("green") Agent on the same host as the Zabbix Server with Agents on different hosts showing "red" statuses (not contactable by the Zabbix Server). The following commands show that the configurations have parity.
docker exec -u root -it (ID of agent container returned from "docker ps") bash
And then execute:
grep -Ev ^'(#|$)' /etc/zabbix/zabbix_agentd.conf
Ports:
The same ports were open on the "red" Agents as on the "green" Agent running on the same host as the Zabbix Server, according to the output of:
ss -lntu
NOTE: This command was issued from the HOST, not the Docker container for the Agent.
Firewalling:
Review of the iptables rules from the HOST (not container) using the following command didn't reveal anything of concern:
iptables -nvx -L --line-numbers
But to rule out firewalling, I nonetheless allowed everything in the iptables FORWARD chain on both the Zabbix Server and an Agent in a "red" status used for testing.
I also allowed everything on the MikroTik GW router connecting the Zabbix Server to the different physical hosts running the Zabbix Agents.
Routing:
The Zabbix server can ping remote Agent interfaces proving there's a route to the Agents.
AppArmor:
I also stopped AppArmor to exclude it as being causal:
sudo systemctl stop apparmor
sudo systemctl status apparmor
Summary:
So everything is wide open, the Zabbix Server can route to the Agents, and the config of the "red" Agents has parity with the config of the "green" Agent living on the same host as the Zabbix Server itself.
I've set up non-containerized Zabbix installations in production environments successfully, so I'm otherwise familiar with Zabbix.
Why can't the containerized Zabbix Server connect to the containerized Zabbix Agents on different hosts?
Short Answer:
There was NOTHING wrong with the Zabbix config; this was a Docker-induced problem.
docker logs <hostname of Zabbix server> revealed that there appeared to be NAT'ing happening on the Zabbix SERVER, and indeed there was.
Docker was modifying the iptables NAT table on the host running the Zabbix Server container, causing the source address of the Zabbix Server to present as the IP of the physical host itself, not the Docker Compose assigned IP address of 172.16.238.3.
Thus, the Agent was not expecting this address and refused the connection. My experience of Dockerized apps is that they are mostly good at modifying iptables to create the correct connectivity, but not in this particular case ;-).
I now reviewed the NAT table by executing the following command on the HOST (not container):
iptables -t nat -nvx -L --line-numbers
This revealed that Docker was being, erm, "helpful" and NAT'ing the Zabbix Server's traffic.
I deleted the offending rules by their rule number:
iptables -t nat -D <chain> <rule #>
After which the Zabbix server's IP address was now presented correctly to the Agents who now accepted the connections and their statuses turned "green".
The problem is reproducible if you execute:
docker-compose -f docker-compose_v3_ubuntu_pgsql_latest.yaml down
If you then run the up command to raise the containers again, you'll see the offending iptables rule is restored to the NAT table of the host running the Zabbix Server's container, breaking connectivity with the Agents.
Longer Answer:
Below are the steps required to identify and resolve the problem of the Zabbix Server NAT'ing its traffic out of the host's IP:
Identify if the HOST of the Zabbix Server container is NAT'ing:
We need to see how the IP of the Zabbix Server's container presents to the Agents, so we have to get the container ID of a Zabbix AGENT to review its logs:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b2fcf38d601f zabbix/zabbix-agent:ubuntu-6.0-latest "/usr/bin/tini -- /u…" 5 hours ago Up 5 hours 0.0.0.0:10050->10050/tcp, :::10050->10050/tcp DockerHost3-zabbix-agent
Next, supply container ID for the Agent to the docker logs command:
docker logs b2fcf38d601f
Then review the rejected IP address in the log output to determine whether it is NOT the Zabbix Server's IP:
81:20220328:000320.589 failed to accept an incoming connection: connection from "NAT'ed IP" rejected, allowed hosts: "zabbix-server"
The fact that you can see this error proves that there are no routing or connectivity issues: the connection is going through, it's just being rejected by the application, NOT the firewall.
If NAT'ing is confirmed, continue to the next step.
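If the Agent log is long, the rejected source addresses can be pulled out mechanically. This is a small sketch, assuming the log lines follow the exact wording of the message shown above; the function name is hypothetical and the input would be the text captured from docker logs.

```python
import re

# Matches the "connection from ... rejected" message shown in the Agent log.
REJECTED = re.compile(
    r'connection from "(?P<source>[^"]+)" rejected, '
    r'allowed hosts: "(?P<allowed>[^"]*)"'
)


def rejected_sources(log_text: str) -> set[str]:
    """Collect every source address the Agent refused."""
    return {match.group("source") for match in REJECTED.finditer(log_text)}
```

If the only address returned is the IP of the physical host running the Zabbix Server, rather than 172.16.238.3, the traffic is being NAT'ed.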
On Zabbix SERVER's Host:
The remediation happens on the Zabbix Server's host itself, not the Agents, which is good because we can fix the problem in one place rather than many.
Execute below command on the Host running the Zabbix Server's container:
iptables -t nat -nvx -L --line-numbers
Output of command:
Chain POSTROUTING (policy ACCEPT 88551 packets, 6025269 bytes)
num  pkts   bytes    target      prot  opt  in               out               source             destination
1    0      0        MASQUERADE  all   --   *                !br-abeaa5aad213  192.168.24.128/28  0.0.0.0/0
2    73786  4427208  MASQUERADE  all   --   *                !br-05094e8a67c0  172.16.238.0/24    0.0.0.0/0

Chain DOCKER (2 references)
num  pkts   bytes    target      prot  opt  in               out               source             destination
1    0      0        RETURN      all   --   br-abeaa5aad213  *                 0.0.0.0/0          0.0.0.0/0
2    95     5700     RETURN      all   --   br-05094e8a67c0  *                 0.0.0.0/0          0.0.0.0/0
We can see the counters incrementing for rule #2 in both the "POSTROUTING" and "DOCKER" chains.
These rules are clearly matching traffic and taking effect.
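On a host with many Docker networks, picking the right rule by eye gets tedious. Here is a rough sketch that scans the text output of the iptables command above and reports which MASQUERADE rules have a source subnet covering a given container IP. It assumes the ten-column layout shown in the output; the function name is hypothetical.

```python
import ipaddress


def masquerade_rules_for(ip: str, iptables_output: str) -> list[tuple[str, int, int]]:
    """Return (chain, rule number, packet count) for MASQUERADE rules covering ip."""
    address = ipaddress.ip_address(ip)
    chain = ""
    matches = []
    for line in iptables_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "Chain":
            chain = fields[1]
        elif len(fields) >= 10 and fields[0].isdigit() and fields[3] == "MASQUERADE":
            try:
                # Column 9 (index 8) is the source network of the rule.
                source = ipaddress.ip_network(fields[8], strict=False)
            except ValueError:
                continue  # e.g. negated sources, which this sketch ignores
            if address in source:
                matches.append((chain, int(fields[0]), int(fields[1])))
    return matches
```

For the output above and the server IP 172.16.238.3, this points at rule 2 in the POSTROUTING chain.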
Delete the offending rules on the HOST of the Zabbix Server container, which is NAT'ing its traffic to the Agents:
sudo iptables -t nat -D POSTROUTING 2
sudo iptables -t nat -D DOCKER 2
Wait a few moments and the Agents should go "green", assuming there are no other configuration or firewalling issues. If the Agents remain "red" after applying the fix, please work through the troubleshooting steps I documented in the Question section.
Conclusion:
I've tested this, and restarting the zabbix-server container does not recreate the deleted rules. But again, please note that a docker-compose down followed by a docker-compose up WILL recreate the deleted rules and break Agent connectivity.
Hope this saves other folks wasted cycles. I'm both a Linux and a network engineer and this hurt my head, so it would be near impossible to resolve if you're not a dab hand with networking.

Memcached in standalone Docker container time out and port error

I'm running a setup of 3 Ubuntu virtual machines: two run the Python production code base and the third hosts a Memcached Docker container. On the Memcached machine I ran docker run -dit --name production-memcached --publish 11211:11211 memcached:latest.
The code base gets the following error message when trying to interact with it:
"exception": "TimeoutError(10060, 'A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond', None, 10060, None)"
I have run docker exec -it production-memcached memcached stats and got the error message below.
failed to listen on TCP port 11211: Address already in use
However, I've run netstat -plnt and got tcp6 0 0 :::11211 :::* LISTEN 35030/docker-proxy, which looks fine to me.
I was able to get this to work by opening port 80 and using iptables to forward incoming port 80 to port 11211 but would prefer to use the Memcached port number.
The Memcached client is created by the following line:
client = base.Client(("domain.co.uk", 11211))
Any help would be appreciated, thanks.
DigitalOcean doesn't allow inbound traffic on port 11211 to its virtual machines. If you want a Memcached machine, you'll also need a virtual machine to act as a proxy between you and it. I hope this saves someone the headache it caused me!
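Before digging into Memcached itself, it can help to confirm whether the port is reachable at all from the client machines. A minimal sketch using only the standard library; the function name is hypothetical, and the host and port would be the ones passed to base.Client in the question.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

If can_connect("domain.co.uk", 11211) is False from the application machines while port 80 works, the traffic is most likely being filtered somewhere between the two hosts rather than by Docker or Memcached.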

docker-compose internal DNS server 127.0.0.11 connection refused

Suddenly, when I deployed some new containers with docker-compose, internal hostname resolution stopped working.
When I tried to ping one container from another using the service name from the docker-compose.yaml file, I got ping: bad address 'myhostname'
I checked that /etc/resolv.conf was correct, and it was using 127.0.0.11.
When I tried to manually resolve my hostname with either nslookup myhostname. or nslookup myhostname.docker.internal, I got the error:
nslookup: write to '127.0.0.11': Connection refused
;; connection timed out; no servers could be reached
Okay, so the issue is that the Docker DNS server has stopped working. All already-started containers still function, but any new ones have this issue.
I am running Docker version 19.03.6-ce, build 369ce74
I could of course just restart docker to see if it solves it, but I am also keen on understanding why this issue happened and how to avoid it in the future.
I have a lot of containers started on the server and a total of 25 docker networks currently.
Any ideas on what can be done to troubleshoot? Any known issues that could explain this?
The docker-compose.yaml file I use has worked before and no changes have been made to it.
Edit: No DNS names at all can be resolved. 127.0.0.11 refuses all connections. I can ping any external IP addresses, as well as the IP of other containers on the same docker network. It is only the 127.0.0.11 DNS server that is not working. 127.0.0.11 still replies to ping from within the container.
Make sure you're using a custom bridge network, NOT the default one. As per the Docker docs (https://docs.docker.com/network/bridge/), the default bridge network does not allow automatic DNS resolution:
Containers on the default bridge network can only access each other by IP addresses, unless you use the --link option, which is considered legacy. On a user-defined bridge network, containers can resolve each other by name or alias.
I have the same problem. I am using the pihole/pihole docker container as the sole dns server on my network. Docker containers on the same host as the pihole server could not resolve domain names.
I resolved the issue based on "hmario"'s response to this forum post.
In brief, modify the pihole docker-compose.yml from:
---
version: '3.7'
services:
  unbound:
    image: mvance/unbound-rpi:1.13.0
    hostname: unbound
    restart: unless-stopped
    ports:
      - 53:53/udp
      - 53:53/tcp
    volumes: [...]
to
---
version: '3.7'
services:
  unbound:
    image: mvance/unbound-rpi:1.13.0
    hostname: unbound
    restart: unless-stopped
    ports:
      - 192.168.1.30:53:53/udp
      - 192.168.1.30:53:53/tcp
    volumes: [...]
Where 192.168.1.30 is the IP address of the Docker host.
I'm having exactly the same problem. According to the comment here I could reproduce the setting without docker-compose, only using docker:
docker network create alpine_net
docker run -it --network alpine_net alpine /bin/sh -c "cat /etc/resolv.conf; ping -c 4 www.google.com"
After stopping Docker (systemctl stop docker) and starting the daemon with debug output, it gives:
> dockerd --debug
[...]
[resolver] read from DNS server failed, read udp 172.19.0.2:40868->192.168.177.1:53: i/o timeout
[...]
where 192.168.177.1 is the local network IP of the host that Docker runs on, which also runs Pi-hole as the DNS server for all of my systems.
I played around with fixing the iptables configuration, but even switching the rules off completely and opening everything did not help.
The solution I found, without fully understanding the root cause, was to move the DNS to another server. I installed dnsmasq on a second system with IP 192.168.177.2 that does nothing other than forward all DNS queries back to my Pi-hole server on 192.168.177.1.
After starting Docker on 192.168.177.1 again with DNS configured to use 192.168.177.2, everything was working again.
With this in one terminal:
dockerd --debug --dns 192.168.177.2
and the command from above in another, it worked again:
> docker run -it --network alpine_net alpine /bin/sh -c "cat /etc/resolv.conf; ping -c 4 www.google.com"
search mydomain.local
nameserver 127.0.0.11
options ndots:0
PING www.google.com (172.217.23.4): 56 data bytes
64 bytes from 172.217.23.4: seq=0 ttl=118 time=8.201 ms
--- www.google.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 8.201/8.201/8.201 ms
So moving the DNS server to another host and adding "dns" : ["192.168.177.2"] to my /etc/docker/daemon.json fixed it for me.
Maybe someone else can help explain the root cause behind the problem with running the DNS server on the same host as Docker.
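If /etc/docker/daemon.json already contains other settings, the "dns" key has to be merged in rather than overwriting the file. A small sketch of that merge, working on the file's text so it can be tried without touching the real file; the function name is hypothetical.

```python
import json


def with_dns(daemon_json: str, servers: list[str]) -> str:
    """Return daemon.json text with the "dns" key set, keeping other settings."""
    # An empty or missing file is treated as an empty configuration.
    config = json.loads(daemon_json) if daemon_json.strip() else {}
    config["dns"] = list(servers)
    return json.dumps(config, indent=2)
```

Calling with_dns(existing_text, ["192.168.177.2"]) gives the text to write back; Docker then needs a restart to pick up the change.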
First, make sure your container is connected to a custom bridge network. As far as I know, in a custom network DNS requests inside the container are sent to 127.0.0.11#53 by default and forwarded to the DNS server of the host machine.
Second, check iptables -L to see if there are Docker-related rules. If there are none, that's probably because iptables was restarted or reset. You'll need to restart the Docker daemon to re-add the rules and make DNS request forwarding work.
I had the same problem, and the cause was the host machine's hostname. I checked the hostnamectl output and it looked fine, but the problem was only solved by a plain reboot. Before the reboot, the output of cat /etc/hosts looked like this:
# The following lines are desirable for IPv4 capable hosts
127.0.0.1 localhost HostnameSetupByISP
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4
# The following lines are desirable for IPv6 capable hosts
::1 localhost HostnameSetupByISP
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
After the reboot, I got this result:
# The following lines are desirable for IPv4 capable hosts
127.0.0.1 hostnameIHaveSetuped HostnameSetupByISP
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4
# The following lines are desirable for IPv6 capable hosts
::1 hostnameIHaveSetuped HostnameSetupByISP
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6

Cannot Connect from Kafkacat running in docker to Kafka broker running locally on windows machine

I am running kafka locally on a windows machine. The way I am running kafka is using
.\bin\windows\kafka-server-start.bat .\config\server.properties
The server.properties are:
listeners=Listener_BOB://:29092,Listener_Kafkacat://127.0.0.1:9092
advertised.listeners=Listener_BOB://:29092,Listener_Kafkacat://127.0.0.1:9092
listener.security.protocol.map=Listener_BOB:PLAINTEXT,Listener_Kafkacat:PLAINTEXT
inter.broker.listener.name=Listener_BOB
I am running kafkacat using docker
docker run -it --network="host" --name="producer" confluentinc/cp-kafkacat bash
When I run
kafkacat -b host.docker.internal:9092 -C -t test
I get the error message:
% ERROR: Local: Broker transport failure: 127.0.0.1:9092/0: Connect to ipv4#127.0.0.1:9092 failed: Connection refused (after 0ms in state CONNECT)
I understand that I can run Kafka in docker but I would like to know why I cannot connect to the broker and produce or consume messages. I tried different things and tried to understand what the listeners do but I couldn't wrap my head around why this doesn't work.
When I do
kafkacat -b host.docker.internal:9092 -t test -L
I get
1 brokers:
broker 0 at 127.0.0.1:9092
1 topics:
topic "test" with 1 partitions:
partition 0, leader 0, replicas: 0, isrs: 0
Maybe someone can explain what I am doing wrong or tell me why this is not possible.
I downloaded the latest version of Kafka: kafka-2.4
Machine is Windows 10
Docker for Windows running using Linux containers
You need to set host.docker.internal:9092 as an advertised listener, not localhost / 127.0.0.1.
That's the address returned to clients when they connect.
When you do that, you should be able to see this:
1 brokers:
broker 0 at host.docker.internal:9092
Otherwise, as you can tell, the bootstrap connection worked, but the advertised address for reaching that specific broker is 127.0.0.1, which doesn't work because from inside the container that address is the kafkacat container itself.
Note: --network="host" doesn't do what you expect outside of a Linux host machine, so you're best off removing it.
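To spot this kind of misconfiguration quickly, the advertised.listeners line can be checked for loopback hosts. The sketch below works on the text of server.properties and assumes the NAME://host:port form used in the question; the function name is hypothetical.

```python
def loopback_advertised_listeners(properties_text: str) -> list[str]:
    """Return advertised listeners whose host is a loopback name or address."""
    loopback = {"localhost", "127.0.0.1", "::1"}
    flagged = []
    for line in properties_text.splitlines():
        key, _, value = line.strip().partition("=")
        if key.strip() != "advertised.listeners":
            continue
        for listener in value.split(","):
            # Each entry looks like NAME://host:port; the host may be empty.
            _, _, address = listener.strip().partition("://")
            host = address.rpartition(":")[0].strip("[]")
            if host in loopback:
                flagged.append(listener.strip())
    return flagged
```

With the server.properties from the question, this flags Listener_Kafkacat://127.0.0.1:9092, which is the address kafkacat is handed back after the bootstrap connection.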

Docker 1.9.0 "bridge" versus a custom bridge network results in difference in hosts file and SSH_CLIENT env variable

Let me first explain what I'm trying to do, as there may be multiple ways to solve this. I have two containers in docker 1.9.0:
node001 (172.17.0.2) (sudo docker run --net=<<bridge or test>> --name=node001 -h node001 --privileged -t -i -v /sys/fs/cgroup:/sys/fs/cgroup <<image>>)
node002 (172.17.0.3) (,,)
When I launch them with --net=bridge I get the correct value for SSH_CLIENT when I ssh from one to the other:
[root@node001 ~]# ssh root@172.17.0.3
root@172.17.0.3's password:
[root@node002 ~]# env | grep SSH_CLIENT
SSH_CLIENT=172.17.0.3 56194 22
[root@node001 ~]# ping -c 1 node002
ping: unknown host node002
In docker 1.8.3 I could also use the hostnames I supply when I start them, in 1.8.3 that last ping statement works!
In docker 1.9.0 I don't see anything being added in /etc/hosts, and the ping statement fails. This is a problem for me. So I tried creating a custom network...
docker network create --driver bridge test
When I launch the two containers with --net=test I get a different value for SSH_CLIENT:
[root@node001 ~]# ssh root@172.18.0.3
root@172.18.0.3's password:
[root@node002 ~]# env | grep SSH_CLIENT
SSH_CLIENT=172.18.0.1 57388 22
[root@node001 ~]# ping -c 1 node002
PING node002 (172.18.0.3) 56(84) bytes of data.
64 bytes from node002 (172.18.0.3): icmp_seq=1 ttl=64 time=0.041 ms
Note that the ip address is not node001's, it seems to represent the docker host itself. The hosts file is correct though, containing:
172.18.0.2 node001
172.18.0.2 node001.test
172.18.0.3 node002
172.18.0.3 node002.test
My current workaround is using docker 1.8.3 with the default bridge network, but I want this to work with future docker versions.
Is there any way I can customize the test network to make it behave similarly to the default bridge network?
Alternatively:
Maybe make the default bridge network write out the /etc/hosts file in docker 1.9.0?
Any help or pointers towards different solutions will be greatly appreciated..
Edit: 21-01-2016
Apparently the problem is fixed in 1.9.1. With bridge in Docker 1.8 and with a custom network (--net=test) in 1.9.1, the behaviour is now correct:
[root@node001 tmp]# ip route
default via 172.17.0.1 dev eth0
172.17.0.0/16 dev eth0 proto kernel scope link src 172.17.0.5
[root@node002 ~]# env | grep SSH_CLIENT
SSH_CLIENT=172.18.0.3 52162 22
Retried in 1.9.0 to see if I wasn't crazy, and yeah there the problem occurs:
[root@node001 tmp]# ip route
default via 172.18.0.1 dev eth0
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.3
[root@node002 ~]# env|grep SSH_CLI
SSH_CLIENT=172.18.0.1 53734 22
So after removing, stopping, and starting the instances the IP addresses were not exactly the same, but it can easily be seen that the SSH_CLIENT source IP is not correct in the last code block. Thanks @sourcejedi for making me re-check.
Firstly, I don't think it's possible to change any settings on the default network, i.e. to write /etc/hosts. You apparently can't delete the default networks, so you can't recreate them with different options.
Secondly
Docker is careful that its host-wide iptables rules fully expose containers to each other’s raw IP addresses, so connections from one container to another should always appear to be originating from the first container’s own IP address. docs.docker.com
I tried reproducing your issue with the random containers I've been playing with. Running wireshark on the bridge interface for the network, I didn't see my ping packets. From this I conclude my containers are indeed talking directly to each other; the host was not doing routing and NAT.
You need to check the routes on your client container with ip route. Do you have a route for 172.18.0.0/16? If you only have a default route, it could try to send everything through the Docker host, which might then get confused and do masquerading as if it were talking with the outside world.
This might happen if you're running some network configuration in your privileged container. I don't know what's happening if you're just booting it with bash though.
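The route check suggested above can be sketched in a few lines: given the text printed by ip route inside the client container, find the most specific non-default route covering the peer's IP. The function name is hypothetical, and only lines that start with a network prefix are considered.

```python
import ipaddress
from typing import Optional


def on_link_route_for(destination: str, ip_route_output: str) -> Optional[str]:
    """Return the most specific non-default route covering destination, or None."""
    address = ipaddress.ip_address(destination)
    best = None
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "default":
            continue
        try:
            network = ipaddress.ip_network(fields[0], strict=False)
        except ValueError:
            continue  # skip lines that do not begin with a prefix
        if address in network and (best is None or network.prefixlen > best.prefixlen):
            best = network
    return str(best) if best is not None else None
```

A None result for the other container's IP means traffic to it only has the default route to follow, which is the situation described above where the Docker host may end up masquerading it.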

Resources