I have set up collectd on 3 Linux VMs and InfluxDB on 1 VM (where collectd is also running), but InfluxDB does not receive data from the remote Linux VMs.
collectd.conf has the same configuration on all three VMs; only the Server entry differs because of the different IP addresses. collectd is running successfully on all 3 VMs, but somehow the data from the remote collectd instances does not reach InfluxDB even though it is listening on the same port.
The network plugin section of collectd.conf and the Hostname parameter are the only differences between these 3 VMs.
<Plugin network>
Server "IP1" "25826"
</Plugin>
<Plugin network>
Server "IP2" "25826"
</Plugin>
<Plugin network>
Server "IP3" "25826"
</Plugin>
The InfluxDB configuration for the collectd input is:
[[collectd]]
enabled = true
bind-address = ":25826"
database = "[COLLECTD DB]"
retention-policy = ""
Since one of the collectd instances runs on the same VM as InfluxDB, its data is populated in InfluxDB, but the data from the other collectd instances is not. Please help me fix this issue.
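For reference, this is how I can check on the InfluxDB VM whether the collectd packets from the remote VMs arrive at all (a rough sketch; it assumes tcpdump is installed and ufw is the firewall in use, which may not match your environment):
# On the InfluxDB VM: watch for incoming collectd traffic from the remote VMs
sudo tcpdump -n -i any udp port 25826
# If nothing arrives, make sure the port is open in the firewall, e.g. with ufw
sudo ufw allow 25826/udp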
My Spring Boot application uses Elasticsearch, so I have to start an Elasticsearch instance for integration testing. To start the Elasticsearch Docker container I use docker-maven-plugin. The integration tests should work on GitLab and on developer machines.
My code works on GitLab's runner (a Docker container) with a Unix socket (see Use Docker socket binding), but not on developer machines.
The internal IP address of the Docker container (172.17.0.2) is not reachable from the Docker host with Docker Desktop for Windows, see Networking features in Docker Desktop for Windows.
Source
<plugin>
<groupId>io.fabric8</groupId>
<artifactId>docker-maven-plugin</artifactId>
<version>0.33.0</version>
<configuration>
<registry>docker.elastic.co</registry>
<imagePullPolicy>always</imagePullPolicy>
<images>
<image>
<alias>elasticsearch</alias>
<name>elasticsearch/elasticsearch:7.6.2</name>
<run>
<env>
<discovery.type>single-node</discovery.type>
</env>
<wait>
<http>
<url>http://${docker.container.elasticsearch.ip}:9200</url>
<method>GET</method>
<status>200</status>
</http>
<time>60000</time>
</wait>
</run>
</image>
</images>
</configuration>
<executions>
<execution>
<id>docker:start</id>
<phase>pre-integration-test</phase>
<goals>
<goal>start</goal>
</goals>
</execution>
<execution>
<id>docker:stop</id>
<phase>post-integration-test</phase>
<goals>
<goal>stop</goal>
</goals>
</execution>
</executions>
</plugin>
Additional information
The internal IP is saved in the Maven property docker.container.elasticsearch.ip by the docker-maven-plugin, see 5.2. docker:start.
The network is bridge by default, see 5.2.5. Network.
I can't change GitLab's runner to use Docker in Docker, see Use Docker-in-Docker workflow with Docker executor.
I can't change GitLab's runner to use shell execution mode, see Use shell executor.
Logs
[INFO] DOCKER> Pulling from elasticsearch/elasticsearch
[INFO] DOCKER> Digest: sha256:59342c577e2b7082b819654d119f42514ddf47f0699c8b54dc1f0150250ce7aa
[INFO] DOCKER> Status: Image is up to date for docker.elastic.co/elasticsearch/elasticsearch:7.6.2
[INFO] DOCKER> Pulled elasticsearch/elasticsearch:7.6.2 in 2 seconds
[INFO] DOCKER> [elasticsearch/elasticsearch:7.6.2] "elasticsearch": Start container 121efac6ba65
[INFO] DOCKER> [elasticsearch/elasticsearch:7.6.2] "elasticsearch": Waiting on url http://172.17.0.2:9200 with method GET for status 200.
[ERROR] DOCKER> [elasticsearch/elasticsearch:7.6.2] "elasticsearch": Timeout after 60700 ms while waiting on url http://172.17.0.2:9200
[ERROR] DOCKER> Error occurred during container startup, shutting down...
[INFO] DOCKER> [elasticsearch/elasticsearch:7.6.2] "elasticsearch": Stop and removed container 121efac6ba65 after 0 ms
[ERROR] DOCKER> I/O Error [[elasticsearch/elasticsearch:7.6.2] "elasticsearch": Timeout after 60700 ms while waiting on url http://172.17.0.2:9200]
Research
Using port mappings and the Docker host's IP address from the property docker.host.address does not work for Unix sockets, see 5.2.9. Wait (a sketch of this variant follows after this list).
Using the host network does not work with Docker Desktop for Windows, see Use host networking.
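For reference, the port-mapping variant mentioned above looked roughly like this (a sketch, not my exact configuration; es.http.port is a placeholder Maven property name that the plugin fills with the dynamically mapped host port):
<run>
  <ports>
    <!-- es.http.port is a placeholder property name for the dynamic host port -->
    <port>es.http.port:9200</port>
  </ports>
  <wait>
    <http>
      <url>http://${docker.host.address}:${es.http.port}</url>
      <method>GET</method>
      <status>200</status>
    </http>
    <time>60000</time>
  </wait>
</run>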
Question
Is it possible to use only one docker-maven-plugin configuration for GitLab and for developer machines?
I have an HBase + HDFS setup in which the HBase master, regionservers, HDFS namenode, and datanodes are each containerized.
When running all of these containers on a single host VM, things work fine: I can use the Docker container names directly and set configuration variables such as:
CORE_CONF_fs_defaultFS: hdfs://namenode:9000
for both the regionserver and datanode. The system works as expected in this configuration.
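Roughly, the single-host layout looks like this (a docker-compose style sketch; the image names are placeholders, not the images I actually use):
# Sketch only: image names below are placeholders
services:
  namenode:
    image: <namenode-image>
  datanode:
    image: <datanode-image>
    environment:
      CORE_CONF_fs_defaultFS: hdfs://namenode:9000
  regionserver:
    image: <regionserver-image>
    environment:
      CORE_CONF_fs_defaultFS: hdfs://namenode:9000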
However, when attempting to distribute these across multiple host VMs, I run into issues.
I updated the config variables above to look like:
CORE_CONF_fs_defaultFS: hdfs://hostname:9000
and made sure the namenode container exposes port 9000 and maps it to the host machine's port 9000, roughly as in the sketch below.
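That mapping looks roughly like this (a sketch; the image name and <namenode-host> are placeholders):
# Placeholder image and hostname; the point is the 9000:9000 mapping on the namenode's host VM
docker run -d --name namenode \
  -p 9000:9000 \
  -e CORE_CONF_fs_defaultFS=hdfs://<namenode-host>:9000 \
  <namenode-image>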
It looks like the names are not resolving correctly when I use the hostname, and the error I see in the datanode logs looks like:
2019-08-24 05:46:08,630 INFO impl.FsDatasetAsyncDiskService: Deleted BP-1682518946-<ip1>-1566622307307 blk_1073743161_2337 URI file:/hadoop/dfs/data/current/BP-1682518946-<ip1>-1566622307307/current/rbw/blk_1073743161
2019-08-24 05:47:36,895 INFO datanode.DataNode: Receiving BP-1682518946-<ip1>-1566622307307:blk_1073743166_2342 src: /<ip3>:48396 dest: /<ip2>:9866
2019-08-24 05:47:36,897 ERROR datanode.DataNode: <hostname>-datanode:9866:DataXceiver error processing WRITE_BLOCK operation src: /<ip3>:48396 dst: /<ip2>:9866
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:101)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:786)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:173)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:107)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:290)
at java.lang.Thread.run(Thread.java:748)
Where <hostname>-datanode is the name of the datanode container, and the IPs are various container IPs.
I'm wondering if I'm missing some configuration variable that would let containers on other VMs connect to the namenode, or some other change that would allow this system to be distributed correctly. For example, is the system expecting the containers to be named in a certain way?
I'm attempting to use GCP Memorystore to handle session IDs for an event streaming job running on GCP Dataflow. The job fails with a timeout when trying to connect to Memorystore:
redis.clients.jedis.exceptions.JedisConnectionException: Failed connecting to host 10.0.0.4:6379
at redis.clients.jedis.Connection.connect(Connection.java:207)
at redis.clients.jedis.BinaryClient.connect(BinaryClient.java:101)
at redis.clients.jedis.Connection.sendCommand(Connection.java:126)
at redis.clients.jedis.Connection.sendCommand(Connection.java:117)
at redis.clients.jedis.Jedis.get(Jedis.java:155)
My Memorystore instance has these properties:
Version is 4.0
Authorized network is default-auto
Master is in us-central1-b. Replica is in us-central1-a.
Connection properties: IP address: 10.0.0.4, Port number: 6379
> gcloud redis instances list --region us-central1
INSTANCE_NAME VERSION REGION TIER SIZE_GB HOST PORT NETWORK RESERVED_IP STATUS CREATE_TIME
memorystore REDIS_4_0 us-central1 STANDARD_HA 1 10.0.0.4 6379 default-auto 10.0.0.0/29 READY 2019-07-15T11:43:14
My Dataflow job has these properties:
runner: org.apache.beam.runners.dataflow.DataflowRunner
zone: us-central1-b
network: default-auto
> gcloud dataflow jobs list
JOB_ID NAME TYPE CREATION_TIME STATE REGION
2019-06-17_02_01_36-3308621933676080017 eventflow Streaming 2019-06-17 09:01:37 Running us-central1
My "default" network could not be used since it is a legacy network, which Memorystore would not accept. I failed to find a way to upgrade the default network from legacy to auto and did not want to delete the existing default network since this would require messing with production services. Instead I created a new network "default-auto" of type auto, with the same firewall rules as the default network. The one I believe is relevant for my Dataflow job is this:
Name: default-auto-internal
Type: Ingress
Targets: Apply to all
Filters: IP ranges: 10.0.0.0/20
Protocols/ports:
tcp:0-65535
udp:0-65535
icmp
Action: Allow
Priority: 65534
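For completeness, a roughly equivalent gcloud command for this rule (reconstructed from the values above, not the exact command I ran):
gcloud compute firewall-rules create default-auto-internal \
    --network=default-auto \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:0-65535,udp:0-65535,icmp \
    --source-ranges=10.0.0.0/20 \
    --priority=65534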
I can connect to Memorystore using "telnet 10.0.0.4 6379" from a Compute Engine instance.
Things I have tried, which did not change anything:
- Switched Redis library, from Jedis 2.9.3 to Lettuce 5.1.7
- Deleted and re-created the Memorystore instance
Is Dataflow not supposed to be able to connect to Memorystore, or am I missing something?
Figured it out. I was trying to connect to Memorystore from code called directly from the main method of my Dataflow job; connecting from code running in a Dataflow step worked. On second thought (well, actually more like the 1002nd thought) this makes sense, because main() runs on the driver machine (my desktop in this case), whereas the steps of the Dataflow graph run on GCP. I confirmed this theory by connecting to Memorystore on localhost:6379 in my main(); that works because I have an SSH tunnel to Memorystore running on port 6379 (using this trick).
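To make that concrete, here is a minimal sketch of connecting from worker code (inside a DoFn) instead of from main(). This is not my actual job code; the class name and element types are invented for the example:
// Sketch only: connect to Memorystore from a Dataflow worker, not from main()
import org.apache.beam.sdk.transforms.DoFn;
import redis.clients.jedis.Jedis;

public class SessionLookupFn extends DoFn<String, String> {
  private transient Jedis jedis;

  @Setup
  public void connect() {
    // 10.0.0.4:6379 is the Memorystore host/port from the question
    jedis = new Jedis("10.0.0.4", 6379);
  }

  @ProcessElement
  public void processElement(@Element String sessionId, OutputReceiver<String> out) {
    // Runs on a Dataflow worker inside the VPC, so the private IP is reachable
    String value = jedis.get(sessionId);
    if (value != null) {
      out.output(value);
    }
  }

  @Teardown
  public void close() {
    if (jedis != null) {
      jedis.close();
    }
  }
}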
I'm running a JGroups application inside Docker containers. The containers run across two nodes (A and B) and are all connected using a Docker Swarm mode overlay network.
I referred to https://docs.docker.com/engine/swarm/networking/
Here is what I did and observed:
The JGroups bind address is set to the overlay network IP of the container.
Containers running on the same node form a cluster.
I used nslookup to verify that the overlay network IP of a container running on node A is reachable from a container running on node B.
docker node ls displays the nodes correctly, and I am able to schedule the container instances successfully.
Docker Version: Docker version 17.03.1-ce, build c6d412e
OS: Ubuntu 16.04.2
Cloud: AWS (Port 7946 TCP/UDP and 4789 UDP are open)
JGroups: 3.6.6
JGroups XML:
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="urn:org:jgroups"
xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
<TCP bind_port="7600"
recv_buf_size="${tcp.recv_buf_size:5M}"
send_buf_size="${tcp.send_buf_size:5M}"
max_bundle_size="64K"
max_bundle_timeout="30"
use_send_queues="true"
sock_conn_timeout="300"
timer_type="new3"
timer.min_threads="4"
timer.max_threads="10"
timer.keep_alive_time="3000"
timer.queue_max_size="500"
thread_pool.enabled="true"
thread_pool.min_threads="2"
thread_pool.max_threads="8"
thread_pool.keep_alive_time="5000"
thread_pool.queue_enabled="true"
thread_pool.queue_max_size="10000"
thread_pool.rejection_policy="discard"
oob_thread_pool.enabled="true"
oob_thread_pool.min_threads="1"
oob_thread_pool.max_threads="8"
oob_thread_pool.keep_alive_time="5000"
oob_thread_pool.queue_enabled="false"
oob_thread_pool.queue_max_size="100"
oob_thread_pool.rejection_policy="discard"/>
<JDBC_PING connection_url="jdbc:mysql://${database.host:localhost}:3306/mydb"
connection_username="root"
connection_password="root"
connection_driver="com.mysql.jdbc.Driver"
initialize_sql="CREATE TABLE JGROUPSPING (
own_addr varchar(200) NOT NULL,
bind_addr varchar(200) NOT NULL,
created timestamp NOT NULL,
cluster_name varchar(200) NOT NULL,
ping_data varbinary(5000) DEFAULT NULL,
PRIMARY KEY (`own_addr`,`cluster_name`))"
insert_single_sql="INSERT INTO JGROUPSPING (own_addr, bind_addr, created, cluster_name, ping_data) values (?,
'${jgroups.bind_addr:127.0.0.1}',now(), ?, ?)"
delete_single_sql="DELETE FROM JGROUPSPING WHERE own_addr=? AND cluster_name=?"
select_all_pingdata_sql="SELECT ping_data FROM JGROUPSPING WHERE cluster_name=?"
/>
<MERGE3 min_interval="10000"
max_interval="30000"/>
<FD_SOCK/>
<FD timeout="3000" max_tries="3" />
<VERIFY_SUSPECT timeout="1500" />
<BARRIER />
<pbcast.NAKACK2 use_mcast_xmit="false"
discard_delivered_msgs="true"/>
<UNICAST3 />
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
max_bytes="4M"/>
<pbcast.GMS print_local_addr="true" join_timeout="2000"
view_bundling="true"/>
<MFC max_credits="2M"
min_threshold="0.4"/>
<FRAG2 frag_size="60K" />
<RSVP resend_interval="2000" timeout="10000"/>
<pbcast.STATE_TRANSFER/>
</config>
Any help is appreciated
You probably need to use host networking (--net=host), or set external_addr in the transport in addition to bind_addr (see the sketch after the links below).
An alternative if you have DNS available is DNS_PING [2].
For details take a look at [1].
[1] https://github.com/belaban/jgroups-docker
[2] http://www.jgroups.org/manual4/index.html#_dns_ping
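A minimal sketch of the external_addr variant on the TCP transport (the jgroups.bind_addr / jgroups.external_addr properties and their defaults are placeholders; external_addr must be an address reachable from the other node):
<!-- Only the address-related attributes are shown; keep the remaining TCP
     attributes from the configuration in the question -->
<TCP bind_port="7600"
     bind_addr="${jgroups.bind_addr:127.0.0.1}"
     external_addr="${jgroups.external_addr:127.0.0.1}" />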
I have a Docker container deployed with a memory restriction of 300M and 1 CPU core.
When the container starts and my program executes, it adheres to the 300M memory restriction and stays on the assigned CPU core.
However, the collectd running inside the container pushes memory and swap metrics for the actual box (16 GB RAM) instead of the restricted container (300 MB RAM).
Is there any configuration that I'm missing?
Docker run command:
docker run -e CONTAINER_NAME='sample_docker_container' -m 300M --memory-swap=300M --cpuset-cpus="1" --net=host --name=sample_docker -p 4000:4000 -p 4001:4001 -p 4002:4002 sample_docker
Graphite metrics:
As seen in the graph, metrics are being pushed for well over 300 MB of RAM.
When I run a high-performance job (which uses more than 4 GB of RAM) on the actual box, the collectd inside the container also shows the RAM usage spiking.
So it is not collecting and pushing metrics scoped to the Docker container; a quick check of this is sketched below.
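A quick way to see this (a sketch; it assumes a cgroup v1 host) is to compare what /proc reports inside the container with the container's cgroup limit, because /proc/meminfo shows the host's memory, not the container's:
# /proc/meminfo inside the container shows the host's full 16 GB
docker exec sample_docker head -n 3 /proc/meminfo
# The cgroup limit reflects the 300M restriction (cgroup v1 path)
docker exec sample_docker cat /sys/fs/cgroup/memory/memory.limit_in_bytes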
collectd.conf:
Hostname sample_docker_container
Interval 60
LoadPlugin logfile
<Plugin logfile>
LogLevel info
File STDOUT
Timestamp true
PrintSeverity false
</Plugin>
Include "/opt/comp/indis-docker/collectd/conf/collectd.d/*"
collectd_perf.conf:
LoadPlugin disk
LoadPlugin load
LoadPlugin memory
LoadPlugin swap
LoadPlugin vmem
LoadPlugin interface
<Plugin interface>
Interface "lo"
Interface "eth0"
Interface "eth1"
IgnoreSelected false
</Plugin>
LoadPlugin df
<Plugin df>
MountPoint "/dev"
MountPoint "/run"
MountPoint "/run/lock"
MountPoint "/run/shm"
IgnoreSelected true
ValuesPercentage true
</Plugin>
LoadPlugin cpu
LoadPlugin "aggregation"
<Plugin "aggregation">
<Aggregation>
Plugin "cpu"
Type "cpu"
SetPlugin "cpu"
SetPluginInstance "%{aggregation}"
GroupBy "Host"
GroupBy "TypeInstance"
CalculateMinimum true
CalculateMaximum true
CalculateAverage true
</Aggregation>
</Plugin>
LoadPlugin "match_regex"
PostCacheChain "Cpumetrics"
<Chain "Cpumetrics">
<Rule>
<Match "regex">
Plugin "^cpu$"
PluginInstance "^[0-9]+$"
</Match>
<Target write>
Plugin "aggregation"
</Target>
Target stop
</Rule>
Target "write"
</Chain>