I have an HBase + HDFS setup in which the HBase master, regionservers, HDFS namenode, and datanodes are each containerized.
When running all of these containers on a single host VM, things work fine: I can use the Docker container names directly and set configuration variables such as:
CORE_CONF_fs_defaultFS: hdfs://namenode:9000
for both the regionserver and datanode. The system works as expected in this configuration.
When attempting to distribute these across multiple host VMs, however, I run into issues.
I updated the config variables above to look like:
CORE_CONF_fs_defaultFS: hdfs://hostname:9000
and made sure the namenode container exposes port 9000, mapped to the host machine's port 9000.
It looks like the names are not resolving correctly when I use the hostname, and the error I see in the datanode logs looks like:
2019-08-24 05:46:08,630 INFO impl.FsDatasetAsyncDiskService: Deleted BP-1682518946-<ip1>-1566622307307 blk_1073743161_2337 URI file:/hadoop/dfs/data/current/BP-1682518946-<ip1>-1566622307307/current/rbw/blk_1073743161
2019-08-24 05:47:36,895 INFO datanode.DataNode: Receiving BP-1682518946-<ip1>-1566622307307:blk_1073743166_2342 src: /<ip3>:48396 dest: /<ip2>:9866
2019-08-24 05:47:36,897 ERROR datanode.DataNode: <hostname>-datanode:9866:DataXceiver error processing WRITE_BLOCK operation src: /<ip3>:48396 dst: /<ip2>:9866
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:101)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:786)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:173)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:107)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:290)
at java.lang.Thread.run(Thread.java:748)
Where <hostname>-datanode is the name of the datanode container, and the IPs are various container IPs.
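The UnresolvedAddressException means the JVM could not turn the advertised peer hostname into an IP address at all; it fails before any packet is sent. A quick way to check which names each container can actually resolve, as a minimal Python sketch (the hostnames in the loop are placeholders for your own container names):

```python
import socket

def can_resolve(hostname: str) -> bool:
    """Return True if this host can resolve `hostname` to an IP address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

# Run this inside each container; every peer's advertised name must resolve.
for name in ["localhost", "namenode", "some-datanode"]:
    print(name, can_resolve(name))
```

If a datanode's advertised name prints False from a peer container on another VM, you have found the failing link.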
I'm wondering if I'm missing some configuration variable that would let containers from other VMs connect to the namenode, or some other change that would allow this system to be distributed correctly. For example, is the system expecting the containers to be named a certain way?
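One common cause in this kind of setup is that each datanode registers its own container hostname with the namenode, and containers on other VMs cannot resolve those names. A sketch of the kind of settings that address this, assuming the env-var convention from the snippets above (all hostnames and IPs here are hypothetical):

```yaml
# docker-compose fragment (hostnames and IPs are placeholders)
datanode:
  hostname: datanode1.example.com        # the name other nodes will see
  environment:
    CORE_CONF_fs_defaultFS: hdfs://namenode.example.com:9000
    # make clients and datanodes dial advertised hostnames, not container IPs
    HDFS_CONF_dfs_client_use_datanode_hostname: "true"
    HDFS_CONF_dfs_datanode_use_datanode_hostname: "true"
  ports:
    - "9866:9866"                        # the data-transfer port from the logs must be reachable too
  extra_hosts:                           # static resolution across VMs
    - "namenode.example.com:10.0.0.10"
    - "datanode2.example.com:10.0.0.11"
```

`dfs.client.use.datanode.hostname` and `dfs.datanode.use.datanode.hostname` are standard HDFS settings; the `HDFS_CONF_` env-var spelling assumes the same image convention as `CORE_CONF_fs_defaultFS`.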
Related
I have a private Git repo. My runner is on a separate machine; both are Ubuntu. When I ping $CI_REGISTRY in the .yml file, I see during the build that the $CI_REGISTRY domain name does not resolve to the correct IP address. I need to hit the server's internal address, not the external one, so I set up a hosts file on the host running the GitLab runner with the correct address, but the executor ignores it. Oddly, the address it comes up with is an internal address on the Cloudflare network, not the external address of the host I'm trying to reach, as I would expect from a DNS lookup.
How can I either:
force the docker executor to use the host's hosts file
pass in an environment variable (or something) that the executor can use to resolve the address correctly
This issue was resolved by modifying /etc/gitlab-runner/config.toml:
[[runners]]
...
[runners.docker]
...
privileged = true
extra_hosts = ["repo.mydomain.com:172.23.8.182"]
You need to modify the container's /etc/hosts file, not the host's hosts file. The simplest way of doing this is the --add-host option.
Here's the documentation:
Add entries to container hosts file (--add-host)
You can add other hosts into a container’s /etc/hosts file by using one or more --add-host flags. This example adds a static address for a host named docker:
$ docker run --add-host=docker:10.180.0.1 --rm -it debian
root@f38c87f2a42d:/# ping docker
PING docker (10.180.0.1): 48 data bytes
56 bytes from 10.180.0.1: icmp_seq=0 ttl=254 time=7.600 ms
56 bytes from 10.180.0.1: icmp_seq=1 ttl=254 time=30.705 ms
^C--- docker ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 7.600/19.152/30.705/11.553 ms
(Source.)
I tried several solutions but nothing worked until I simply entered the IP and port instead of my fake domain name:
Enter the GitLab instance URL (for example, https://gitlab.com/):
[http://gitlab_ip:port]
.....
Runner registered successfully. Feel free to start it, but if it's running already the config should be automatically reloaded!
Sometimes it's worth thinking a bit before diving into Stack Overflow :D
docker executor:
[[runners]]
...
executor = "docker"
[runners.docker]
extra_hosts = ["gitlab.someweb.com:10.0.1.1"]
kubernetes executor:
[[runners]]
...
executor = "kubernetes"
[runners.kubernetes]
[[runners.kubernetes.host_aliases]]
ip = "10.0.1.1"
hostnames = ["gitlab.someweb.com"]
You can use:
--docker-extra-hosts domainexample.com:x.x.x.x
I'm attempting to use GCP Memorystore to handle session IDs for an event streaming job running on GCP Dataflow. The job fails with a timeout when trying to connect to Memorystore:
redis.clients.jedis.exceptions.JedisConnectionException: Failed connecting to host 10.0.0.4:6379
at redis.clients.jedis.Connection.connect(Connection.java:207)
at redis.clients.jedis.BinaryClient.connect(BinaryClient.java:101)
at redis.clients.jedis.Connection.sendCommand(Connection.java:126)
at redis.clients.jedis.Connection.sendCommand(Connection.java:117)
at redis.clients.jedis.Jedis.get(Jedis.java:155)
My Memorystore instance has these properties:
Version is 4.0
Authorized network is default-auto
Master is in us-central1-b. Replica is in us-central1-a.
Connection properties: IP address: 10.0.0.4, Port number: 6379
> gcloud redis instances list --region us-central1
INSTANCE_NAME VERSION REGION TIER SIZE_GB HOST PORT NETWORK RESERVED_IP STATUS CREATE_TIME
memorystore REDIS_4_0 us-central1 STANDARD_HA 1 10.0.0.4 6379 default-auto 10.0.0.0/29 READY 2019-07-15T11:43:14
My Dataflow job has these properties:
runner: org.apache.beam.runners.dataflow.DataflowRunner
zone: us-central1-b
network: default-auto
> gcloud dataflow jobs list
JOB_ID NAME TYPE CREATION_TIME STATE REGION
2019-06-17_02_01_36-3308621933676080017 eventflow Streaming 2019-06-17 09:01:37 Running us-central1
My "default" network could not be used since it is a legacy network, which Memorystore would not accept. I failed to find a way to upgrade the default network from legacy to auto and did not want to delete the existing default network since this would require messing with production services. Instead I created a new network "default-auto" of type auto, with the same firewall rules as the default network. The one I believe is relevant for my Dataflow job is this:
Name: default-auto-internal
Type: Ingress
Targets: Apply to all
Filters: IP ranges: 10.0.0.0/20
Protocols/ports:
tcp:0-65535
udp:0-65535
icmp
Action: Allow
Priority: 65534
I can connect to Memorystore using "telnet 10.0.0.4 6379" from a Compute Engine instance.
Things I have tried, which did not change anything:
- Switched Redis library, from Jedis 2.9.3 to Lettuce 5.1.7
- Deleted and re-created the Memorystore instance
Is Dataflow not supposed to be able to connect to Memorystore, or am I missing something?
Figured it out. I was trying to connect to Memorystore from code called directly from the main method of my Dataflow job. Connecting from code running in a Dataflow step worked. On second thought (well, actually more like 1002nd thought) this makes sense, because main() runs on the driver machine (my desktop in this case), whereas the steps of the Dataflow graph run on GCP. I confirmed this theory by connecting to Memorystore on localhost:6379 in my main(). That works because I have an SSH tunnel to Memorystore running on port 6379 (using this trick).
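For anyone hitting the same thing, the fix generalizes to "never open connections at graph-construction time". A minimal Python sketch of the DoFn-style lifecycle (class and method names are illustrative; the tuple is a stand-in for a real Redis client):

```python
class EnrichWithSession:
    """Sketch of a Beam/Dataflow-style step lifecycle (names hypothetical)."""

    def __init__(self, host: str, port: int):
        # __init__ runs on the launcher when the pipeline graph is built,
        # so it must only record configuration -- no network calls here.
        self.host = host
        self.port = port
        self.client = None

    def setup(self):
        # setup() runs on each worker inside the VPC; open connections here,
        # e.g. redis.Redis(self.host, self.port) in the real job.
        self.client = (self.host, self.port)
```

The launcher (your desktop) only sees `__init__`; the workers, which sit inside the authorized network, are the ones that actually call `setup()` and can reach 10.0.0.4:6379.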
I am trying to use Test Containers to run an integration test against HBase launched in a Docker container. The problem I am running into may be a bit unique to how a client interacts with HBase.
When the HBase Master starts in the container, it stores its hostname:port in Zookeeper so that clients can find it. In this case, it stores "localhost:16000".
In my test case running outside the container, the client retrieves "localhost:16000" from Zookeeper and cannot connect. The connection fails because the port has been remapped by TestContainers to some other random port, other than 16000.
Any ideas how to overcome this?
(1) One idea is to find a way to tell the HBase Client to use the remapped port, ignoring the value it retrieved from Zookeeper, but I have yet to find a way to do this.
(2) If I could get the HBase Master to write the externally accessible host:port in Zookeeper that would also fix the problem. But I do not believe the container itself has any knowledge about how Test Containers is doing the port remapping.
(3) Perhaps there is a different solution that Test Containers provides for this sort of situation?
You can take a look at KafkaContainer's implementation where we start a Socat (fast tcp proxy) container first to acquire a semi-random port and use it later to configure the target container.
The algorithm is:
1. In doStart, first start Socat targeting the original container's network alias & port, e.g. 12345
2. Get the mapped port (it will be something like 32109, pointing to 12345)
3. Make the original container use the mapped port in addition to the original one (e.g. via environment variables); if only one port can be configured, see CouchbaseContainer for the more advanced option
4. Return Socat's host & port to the client
We built a new HBase image to be compatible with Testcontainers.
Use this image:
docker run --env HBASE_MASTER_PORT=16000 --env HBASE_REGION_PORT=16020 jcjabouille/hbase-standalone:2.4.9
Then create this container (in Scala here):
private[test] class GenericHbase2Container
    extends GenericContainer[GenericHbase2Container](
      DockerImageName.parse("jcjabouille/hbase-standalone:2.4.9")
    ) {

  private val randomMasterPort: Int = FreePortFinder.findFreeLocalPort(18000)
  private val randomRegionPort: Int = FreePortFinder.findFreeLocalPort(20000)
  private val hostName: String = InetAddress.getLocalHost.getHostName

  val hbase2Configuration: Configuration = HBaseConfiguration.create

  addExposedPort(randomMasterPort)
  addExposedPort(randomRegionPort)
  addExposedPort(2181)

  withCreateContainerCmdModifier { cmd: CreateContainerCmd =>
    cmd.withHostName(hostName)
    ()
  }

  waitingFor(Wait.forLogMessage(".*0 row.*", 1))
  withStartupTimeout(Duration.ofMinutes(10))
  withEnv("HBASE_MASTER_PORT", randomMasterPort.toString)
  withEnv("HBASE_REGION_PORT", randomRegionPort.toString)
  setPortBindings(Seq(s"$randomMasterPort:$randomMasterPort", s"$randomRegionPort:$randomRegionPort").asJava)

  override protected def doStart(): Unit = {
    super.doStart()
    hbase2Configuration.set("hbase.client.pause", "200")
    hbase2Configuration.set("hbase.client.retries.number", "10")
    hbase2Configuration.set("hbase.rpc.timeout", "3000")
    hbase2Configuration.set("hbase.client.operation.timeout", "3000")
    hbase2Configuration.set("hbase.client.scanner.timeout.period", "10000")
    hbase2Configuration.set("zookeeper.session.timeout", "10000")
    hbase2Configuration.set("hbase.zookeeper.quorum", "localhost")
    hbase2Configuration.set("hbase.zookeeper.property.clientPort", getMappedPort(2181).toString)
  }
}
More details here: https://hub.docker.com/r/jcjabouille/hbase-standalone
I have a web-based Java application working in WildFly/JBoss version 10. I am using Docker (1.13.1-cs2) to deploy the application. For an HA (high availability) scenario I want the application to work in cluster mode, so I changed my WildFly configuration to cluster mode in standalone-full-ha.xml. After this change everything works perfectly, but only if I use the default Docker bridge network to start the containers. As per my requirement, however, I want this whole container/application to run as a service under Docker Swarm. When I start it as a service, WildFly/JBoss is not able to start in cluster mode and throws errors like this:
21:01:27,885 ERROR (TransferQueueBundler,ee,WEB-APP-NODE) JGRP000029: WEB-APP-NODE: failed sending message to cluster (38 bytes): java.io.IOException: Operation not permitted, headers: NAKACK2: [HIGHEST_SEQNO, seqno=2631], TP: [cluster_name=ee]
21:01:28,826 ERROR (TransferQueueBundler,ee,WEB-APP-NODE) JGRP000029: WEB-APP-NODE: failed sending message to cluster (4166 bytes): java.io.IOException: Operation not permitted, headers: FORK: ee:activemq-cluster, NAKACK2: [MSG, seqno=2632], TP: [cluster_name=ee]
21:01:29,886 ERROR (TransferQueueBundler,ee,WEB-APP-NODE) JGRP000029: WEB-APP-NODE: failed sending message to cluster (38 bytes): java.io.IOException: Operation not permitted, headers: NAKACK2: [HIGHEST_SEQNO, seqno=2632], TP: [cluster_name=ee]
21:01:30,826 ERROR (TransferQueueBundler,ee,WEB-APP-NODE) JGRP000029: WEB-APP-NODE: failed sending message to cluster (4166 bytes): java.io.IOException: Operation not permitted, headers: FORK: ee:activemq-cluster, NAKACK2: [MSG, seqno=2633], TP: [cluster_name=ee]
Note: I am using default swarm ingress network for port expose and communication.
As per my troubleshooting, this issue is related to the multicast address used by WildFly/JBoss version 10.
I have also tried these steps: multicast address in docker
But it still did not help in my case. Can anyone help me with this? It would be very much appreciated!
Thank you!
The overlay network from Docker Swarm does not currently support IP multicast.
You can fall back to TCP-based unicast for your cluster, but that leaves the challenge of knowing the IP addresses of all the other containers in the service.
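For the TCP-unicast fallback, the usual approach in WildFly 10 is to point the ee channel at the tcp stack and replace multicast-based discovery (MPING) with TCPPING and a static initial_hosts list. A hedged sketch of what that fragment of standalone-full-ha.xml might look like; host names and ports are placeholders, and the rest of the protocol list should be kept from your existing tcp stack:

```xml
<subsystem xmlns="urn:jboss:domain:jgroups:4.0">
  <channels default="ee">
    <!-- switch the ee channel from the default udp stack to tcp -->
    <channel name="ee" stack="tcp"/>
  </channels>
  <stacks>
    <stack name="tcp">
      <transport type="TCP" socket-binding="jgroups-tcp"/>
      <!-- TCPPING replaces multicast discovery with a static member list -->
      <protocol type="TCPPING">
        <property name="initial_hosts">node1[7600],node2[7600]</property>
        <property name="port_range">0</property>
      </protocol>
      <!-- keep the remaining protocols (MERGE3, FD_SOCK, pbcast.NAKACK2, ...)
           from the shipped tcp stack here -->
    </stack>
  </stacks>
</subsystem>
```

The static list is exactly the pain point mentioned above: in Swarm the container IPs are assigned dynamically, so initial_hosts must resolve to stable names or be generated at startup.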
Another way is to create a macvlan-based network, which does support multicast. Tutorial: http://collabnix.com/docker-17-06-swarm-mode-now-with-macvlan-support/
With that variant I had the problem that, as soon as you connect such a network to a container, ingress (routing mesh) and access to the outside world via docker_gwbridge stop working (details: Docker Swarm container with MACVLAN network gets wrong gateway - no internet access).
I am facing a huge problem deploying a dashDB local cluster. After a successful deployment, the following error appears whenever I try to create a single table or run a query. Furthermore, the web server is not working properly, unlike in my previous SMP deployment.
Cannot connect to database "BLUDB" on node "20" because the difference
between the system time on the catalog node and the virtual timestamp
on this node is greater than the max_time_diff database manager
configuration parameter.. SQLCODE=-1472, SQLSTATE=08004,
DRIVER=4.18.60
I followed the official deployment guide, so the following were double-checked:
each physical machine's and Docker container's /etc/hosts file contains all IPs, fully qualified and simple hostnames
there is an NFS share preconfigured and mounted at /mnt/clusterfs on every single server
none of the servers showed an error during the "docker logs --follow dashDB" phase
the nodes config file is located in the /mnt/clusterfs directory
After starting dashDB with following command:
docker exec -it dashDB start
It looks as it should (see below), but the error can be found in /opt/ibm/dsserver/logs/dsserver.0.log.
#
--- dashDB stack service status summary ---
##################################################################### Redirecting to /bin/systemctl status slapd.service
SUMMARY
LDAPrunning: SUCCESS
dashDBtablesOnline: SUCCESS
WebConsole : SUCCESS
dashDBconnectivity : SUCCESS
dashDBrunning : SUCCESS
#
--- dashDB high availability status ---
#
Configuring dashDB high availability ...
Stopping the system
Stopping datanode dashdb02
Stopping datanode dashdb01
Stopping headnode dashdb03
Running sm on head node dashdb03 ..
Running sm on data node dashdb02 ..
Running sm on data node dashdb01 ..
Attempting to activate previously failed nodes, if any ...
SM is RUNNING on headnode dashdb03 (ACTIVE)
SM is RUNNING on datanode dashdb02 (ACTIVE)
SM is RUNNING on datanode dashdb01 (ACTIVE)
Overall status : RUNNING
After several redeployments nothing has changed. Please help me figure out what I am doing wrong.
Many Thanks, Daniel
Always make sure the NTP service is started on every single cluster node before starting the Docker container; otherwise it will have no effect on it.
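The underlying check can be illustrated with a small sketch: DB2 (which dashDB is built on) rejects the connection when the clock difference between the catalog node and another node exceeds the max_time_diff threshold. A minimal Python illustration of that comparison (the 60-minute threshold is an assumed value; the real one is a database manager configuration parameter):

```python
from datetime import datetime, timedelta

def within_max_time_diff(catalog_time: datetime, node_time: datetime,
                         max_time_diff_minutes: int = 60) -> bool:
    """Mimic the max_time_diff check: connections are rejected when a node's
    clock differs from the catalog node's by more than the threshold."""
    return abs(catalog_time - node_time) <= timedelta(minutes=max_time_diff_minutes)

# With NTP running on every node, skew stays near zero and the check passes.
print(within_max_time_diff(datetime(2019, 1, 1, 12, 0),
                           datetime(2019, 1, 1, 12, 5)))
```

This is why starting NTP after the containers are already up does not help: the containers inherited the drifted clocks, and the SQL1472N error keeps firing until the nodes are back in sync and restarted.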