How to connect to HDFS deployed in Docker? - docker

I deployed Hadoop and Spark from this project: https://github.com/Marcel-Jan/docker-hadoop-spark
I completed the "Quick Start Spark (PySpark)" instructions in the README and got a dataframe from HDFS.
I need to do the same thing, but through Airflow. I can successfully connect to the Spark master:
from pyspark.sql import SparkSession
spark = (SparkSession
    .builder
    # .master('local')
    .master("spark://127.0.0.1:7077")
    .appName("Test")
    .getOrCreate())
But when I tried to reach the Hadoop cluster, I got this error:
Call From SPB-379/127.0.1.1 to namenode:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
I understand that Airflow cannot see the address "namenode", so I ran
docker inspect <hadoop container id> and got:
"Networks": {
"docker-hadoop-spark_default": {
"IPAMConfig": null,
"Links": null,
"Aliases": [
"namenode",
"namenode",
"8a06dd5cca57"
],
"NetworkID": "f57164de9f26ef3a1a33c4ee46b24903c0824009ecfeda06f7f45ba9206f6a0a",
"EndpointID": "6bf662634c9ef925dc435d8a28364b045ef8b572e22f5c5316257ada2a52cc0d",
"Gateway": "172.19.0.1",
"IPAddress": "172.19.0.9",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:42:ac:13:00:09",
"DriverOpts": null
}
But using this IP address also gave an error. I am not sure this approach is correct, and I would appreciate any feedback :)
For a better understanding
It works
It doesn't work
Network config of the docker-compose
Error - allocate worker to airflow
22/12/26 19:54:02 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
java.io.InvalidClassException: org.apache.spark.deploy.DeployMessages$ExecutorUpdated; local class incompatible: stream classdesc serialVersionUID = 1654279024112373855, local class serialVersionUID = -1971851081955655249

The error says namenode:9000 could not be reached from your host. As for Airflow, make sure it also runs as a container rather than on your host, so that the Spark code executes inside the Docker network as well.
I have a Docker example here that works fine from a Spark notebook.
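One way to confirm this from wherever the Spark driver runs is a plain TCP probe. The sketch below is only an illustration: the helper name can_connect is made up for this example, and it says nothing about HDFS itself, only whether a TCP connection to the given host and port can be opened.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers failed name resolution, refused connections and timeouts.
        return False
```

Calling can_connect("namenode", 9000) from the host would be expected to return False in the situation described in the question, while the same call from a container attached to the docker-hadoop-spark_default network should return True if the namenode is listening there.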

Related

TCP server started but connection refused inside a container

I have a Go application which starts a TCP server on port 8080. Everything works fine when I run the app natively.
However, when I run it as a container, I am unable to even telnet to the port from within the container itself.
docker ps
9bb08785b728 dp_local "/bin/dragonpit-linux" 8 seconds ago Up 7 seconds 8080-8081/tcp youthful_villani
docker exec -it youthful_villani sh
/ # telnet localhost 8080
telnet: can't connect to remote host (127.0.0.1): Connection refused
Note: I also tried 0.0.0.0 and 127.0.0.1 in place of localhost.
TCP Server starting code
var err error
var lc net.ListenConfig
th.listener, err = lc.Listen(ctx, "tcp", "0.0.0.0:8080")
if err != nil {
return err
}
clog.Info(ctx, "tcp protocol listening", "listenAddr", th.addr)
I hard-coded the address to see what the issue is.
My Dockerfile
FROM golang:1.18.0 as builder
RUN mkdir -p /build/
ADD . /build
WORKDIR /build
RUN CGO_ENABLED=0 GOOS=linux go build -o tcp_server
FROM alpine:latest
EXPOSE 8080 8081
RUN mkdir -p /server/config
ADD config /server/config/
ENV SMC_PATH /
COPY --from=builder /build/tcp_server /bin/
RUN apk update
RUN apk add busybox-extras
ENTRYPOINT ["/bin/tcp_server"]
Output of docker-inspect
"NetworkSettings": {
"Bridge": "",
"SandboxID": "a70f476a8a7376b1e5a935b67170145f2222e059c5b2a1a63da50519a491babf",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {
"8080/tcp": null,
"8081/tcp": null
},
"SandboxKey": "/var/run/docker/netns/a70f476a8a73",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"EndpointID": "7b6b9d0d5f83b8136919ac0f765167f6d380a8d836f460a0243bedeb3489a013",
"Gateway": "172.17.0.1",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAddress": "172.17.0.2",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"MacAddress": "02:42:ac:11:00:02",
"Networks": {
"bridge": {
"IPAMConfig": null,
"Links": null,
"Aliases": null,
"NetworkID": "16563fdda1d5059bb6e2800455f2e87ac8d02e040386eae595a215692e849d76",
"EndpointID": "7b6b9d0d5f83b8136919ac0f765167f6d380a8d836f460a0243bedeb3489a013",
"Gateway": "172.17.0.1",
"IPAddress": "172.17.0.2",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:42:ac:11:00:02",
"DriverOpts": null
}
}
}
}
Docker running command and starting logs
docker run dp_local
25T14:59:46.367075600Z","caller":"build/main.go:24","msg":"Starting listening","listenAddr":"0.0.0.0:8080"
Edit:
I just printed out the Addr().String() of TCP Listener, I got this
{"level":"info","ts":"2022-04-25T15:44:28.952095700Z","caller":"server/tcp.go:65","msg":"[::]:8080"}
Answering my own question here. It was actually a TCP-related parameter that was set incorrectly and caused the issue. The culprit was the MTU. For reference: https://www.baeldung.com/cs/tcp-max-packet-size#:~:text=The%20maximum%20size%20of%20a,size%20should%20never%20exceed%20MTU.
Two assumptions based on your question:
1. You want to expose ports 8080 and 8081
EXPOSE 8080 8081
2. You are running the Docker image and the client on the same machine
In that case you need to bind the container ports to ports on an interface of the local machine. Let's use localhost. Your docker run command should look like
docker run -p 127.0.0.1:8080:8080/tcp -p 127.0.0.1:8081:8081/tcp dp_local
This binds container port 8080 to port 8080 and container port 8081 to port 8081 on localhost of the local machine.
Read more here
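To make the parts of the -p value explicit, here is a small Python sketch that splits it into host IP, host port, container port and protocol. The names PortMapping and parse_publish are made up for this illustration, and it only handles the two forms ip:host_port:container_port and host_port:container_port (optionally followed by /proto), not everything Docker accepts.

```python
from typing import NamedTuple, Optional


class PortMapping(NamedTuple):
    host_ip: Optional[str]   # None means "all host interfaces"
    host_port: int
    container_port: int
    protocol: str


def parse_publish(spec: str) -> PortMapping:
    """Split a -p value of the form [ip:]host_port:container_port[/proto]."""
    ports, _, protocol = spec.partition("/")
    parts = ports.split(":")
    if len(parts) == 3:
        host_ip, host_port, container_port = parts
    elif len(parts) == 2:
        host_ip = None
        host_port, container_port = parts
    else:
        raise ValueError(f"unsupported publish spec: {spec!r}")
    # Docker defaults to TCP when no protocol is given.
    return PortMapping(host_ip, int(host_port), int(container_port), protocol or "tcp")


print(parse_publish("127.0.0.1:8080:8080/tcp"))
```

The host side always comes first and the container side last, which is the part that is easiest to get backwards.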

Docker - connect from HOST to container

I am using Docker for Windows with a bridge network that I created:
"bridge":"none" (daemon.json)
docker network create --subnet 192.168.23.1/24 --gateway 192.168.23.1 --driver bridge my-network
... and a container running the Jenkins image.
When I configure the connection from Jenkins (container) to GitLab (on the internet), everything works fine. But when I create a webhook in GitLab, I have to enter the URL of Jenkins. I tried localhost and the IP taken from the IPAddress property:
"Networks": {
"my-network": {
"IPAMConfig": null,
"Links": null,
"Aliases": [
"jenkins",
"dff5dcb7c95a"
],
"NetworkID": "xxx",
"EndpointID": "yyy",
"Gateway": "192.168.23.1",
"IPAddress": "192.168.23.2",
"IPPrefixLen": 24,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "zzz",
"DriverOpts": null
}
}
... but neither option works.
Question: How do I determine the correct URL?
How do I connect from the host to my container? Is this the correct approach? What should I know in order to resolve problems like this in the future?
Thanks for the help :)
If your GitLab instance also runs in a Docker container, you just need to add the GitLab container to the same Docker network.
If your GitLab instance really is on the internet, you cannot solve this with localhost or any local IP address. You need to:
find out your public IP address, and maybe use dynamic DNS to get a fixed domain if your IP is dynamic
open a port on your router and configure your firewall
open a port in your local Windows firewall
find out which port Jenkins listens on for webhooks from GitLab
map this port to the Docker container by using
-p <docker-external-port>:<docker-internal-port>
If you provide some more information about your network infrastructure, the answer could be more specific.
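Why the container IP cannot work as a webhook URL can be checked with Python's standard ipaddress module. This is only a rough sketch: the function name is_publicly_routable is made up here, and it answers just one question, whether an address lies in a range that is routable on the public internet.

```python
import ipaddress


def is_publicly_routable(address: str) -> bool:
    """Return True if the address is in a globally routable range."""
    return ipaddress.ip_address(address).is_global


# The container IP from the question is in a private range.
print(is_publicly_routable("192.168.23.2"))  # -> False
# localhost is a loopback address, so it is not routable either.
print(is_publicly_routable("127.0.0.1"))     # -> False
```

Both values from the question come back False, so a GitLab server outside your network has no route to them. It needs your public address plus the forwarded port.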

What is the "gateway" found in `docker inspect`?

I have nginx running in Docker (Docker for Mac, using docker-compose). Here is the "Networks" section of docker inspect:
{
"Networks": {
"laradock_backend": {
"IPAMConfig": null,
"Links": null,
"Aliases": [
"c189cabxfdf9",
"nginx"
],
"NetworkID": "f4f8d8ff07ae90d5758644968d96f2g653fc5188c895f19c2d08de92c46cc075",
"EndpointID": "f8c6d5a8b061c75c44c2e078a65928a9b45dd91833fc05x7f249c64a180e84a1",
"Gateway": "172.21.0.1",
"IPAddress": "172.21.0.4",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:02:fc:15:10:05",
"DriverOpts": null
},
"laradock_frontend": {
"IPAMConfig": null,
"Links": null,
"Aliases": [
"c189cab4ddg1",
"nginx"
],
"NetworkID": "7b410b1bd764617a3f6146862307f886681e57aaxf057e4308f1236e1558ffcb",
"EndpointID": "0caa62bc5bbx600a5b1f260ebg11014e05394671ca347f818bfx819f43f7011e",
"Gateway": "172.22.0.1",
"IPAddress": "172.22.0.3",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "01:41:af:16:00:03",
"DriverOpts": null
}
}
}
I'm not an expert on computer networking, but I use Docker very frequently, and I want to understand a little better what happens whenever I connect to a Docker container from my host machine (localhost).
I noticed that each network has a "Gateway" listed.
I can't find such interfaces on my host machine. Where does the "Gateway" sit? Why do we need such a thing?
Any simple diagrams would be helpful…
Thanks.
The gateway is the device that connects the network to the outside world. When a packet is sent to a destination that is not in the same network, it goes to the gateway, which knows how to send it on to the next router, and so on until the packet arrives at its destination.
In this case the gateway device is virtual, and it is part of a bridge between the container and the host's physical interface. This emulation is needed so that software running in the container can run as it would on the host. It also separates the container's network from the host's network (this separation is the reason you use Docker).
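The "same network or not" decision can be worked through with the values from the laradock_backend entry in the question. The Python sketch below is a simplification of a real routing table lookup: the function name next_hop is made up, and 172.21.0.7 is a hypothetical neighbour container, not something from the question.

```python
import ipaddress


def next_hop(container_ip: str, prefix_len: int, gateway: str, destination: str) -> str:
    """Return where a packet is sent first: straight to the destination if it
    is in the container's subnet, otherwise to the gateway."""
    network = ipaddress.ip_interface(f"{container_ip}/{prefix_len}").network
    if ipaddress.ip_address(destination) in network:
        return destination
    return gateway


# IPAddress 172.21.0.4, IPPrefixLen 16, Gateway 172.21.0.1 (laradock_backend)
print(next_hop("172.21.0.4", 16, "172.21.0.1", "172.21.0.7"))  # -> 172.21.0.7
print(next_hop("172.21.0.4", 16, "172.21.0.1", "8.8.8.8"))     # -> 172.21.0.1
```

Anything inside 172.21.0.0/16 is delivered directly over the bridge, and everything else is handed to 172.21.0.1 first.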

Docker Toolbox for Windows, container is not accessible on the host

I am new to Docker and am working with it on my Windows machine. I installed Docker Toolbox on my machine and ran a container, see below:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
cea8e6cf92b5 seqvence/static-site "/bin/sh -c 'cd /usr…" 21 minutes ago Up 21 minutes 0.0.0.0:32769->80/tcp, 0.0.0.0:32768->443/tcp competent_goodall
Now, this is a Linux container running in an Oracle VM on my Windows machine. I expected to be able to open http://172.17.0.2:32769 on my Windows machine and get a web page served by the Nginx server.
Here is the container inspect:
"Networks": {
"bridge": {
"IPAMConfig": null,
"Links": null,
"Aliases": null,
"NetworkID": "81d64a885b80a000f3b91e9959acf125b170b7acb11a918bf77bf7fa3fea3ae1",
"EndpointID": "6cf13c7007539f0b31c6d8da52844477f13e1debd84a8f3e2ec63ee140e90014",
"Gateway": "172.17.0.1",
"IPAddress": "172.17.0.2",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:42:ac:11:00:02",
"DriverOpts": null
}
I am not sure if any more details would be needed to understand the problem, so please feel free to let me know.
I think you should try either of the following:
http://localhost:32769
http://(ipv4 of your windows machine):32769

cannot reach web app from docker host (not docker-machine)

I have a simple web app container running on docker engine for mac (v.1.12.5) using the following:
docker run --rm -p 80:8089 test-app
I've checked my container's IP under Networks > bridge from the following:
docker inspect $(docker ps -l --format "{{.ID}}")
"Networks": {
"bridge": {
"IPAMConfig": null,
"Links": null,
"Aliases": null,
"NetworkID": "f53f1b93aa0f2fda186498d30e7f6e5b97ba952d1b6fe442663ac6025fd74ce3",
"EndpointID": "178937cf211c2360d9f9c594891985637d1d82a334a40b1b46d3acb2ea8aaf20",
"Gateway": "172.17.0.1",
"IPAddress": "172.17.0.2", // <- use this?
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:42:ac:11:00:02"
}
}
As far as I understand, I am running my web app container on the Docker engine directly on my laptop (not via docker-machine). At this point, I'm less concerned with making it work than with understanding why it doesn't.
My container has the assigned IP 172.17.0.2, which I've pasted above, and I've mapped my web app container (with an EXPOSE 80) to port 8089 via the docker run -p flag.
I'm under the impression that I should be able to reach my web app at http://172.17.0.2:8089, but I just get no response. Why?
If the process in the container listens on port 80, your -p flag should be the other way round: -p 8089:80. The service can then be reached at localhost:8089.
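The reversed mapping can be spotted with a small check. This Python sketch is only an illustration: the helper name host_url is made up, and it assumes the short host_port:container_port form of -p with no IP or protocol.

```python
from typing import Optional


def host_url(publish: str, listen_port: int) -> Optional[str]:
    """Return the URL to open on the host, or None if the container side of
    the mapping does not match the port the process listens on."""
    host_port, container_port = (int(p) for p in publish.split(":"))
    if container_port != listen_port:
        return None
    return f"http://localhost:{host_port}"


# Mapping from the question: traffic is forwarded to container port 8089,
# where nothing listens if the app is on port 80.
print(host_url("80:8089", 80))   # -> None
# Corrected mapping.
print(host_url("8089:80", 80))   # -> http://localhost:8089
```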

Resources