Cannot connect to GCP Memorystore from GCP Dataflow - google-cloud-dataflow

I'm attempting to use GCP Memorystore to handle session ids for a event streaming job running on GCP Dataflow. The job fails with a timeout when trying to connect to Memorystore:
redis.clients.jedis.exceptions.JedisConnectionException: Failed connecting to host 10.0.0.4:6379
at redis.clients.jedis.Connection.connect(Connection.java:207)
at redis.clients.jedis.BinaryClient.connect(BinaryClient.java:101)
at redis.clients.jedis.Connection.sendCommand(Connection.java:126)
at redis.clients.jedis.Connection.sendCommand(Connection.java:117)
at redis.clients.jedis.Jedis.get(Jedis.java:155)
My Memorystore instance has these properties:
Version is 4.0
Authorized network is default-auto
Master is in us-central1-b. Replica is in us-central1-a.
Connection properties: IP address: 10.0.0.4, Port number: 6379
> gcloud redis instances list --region us-central1
INSTANCE_NAME VERSION REGION TIER SIZE_GB HOST PORT NETWORK RESERVED_IP STATUS CREATE_TIME
memorystore REDIS_4_0 us-central1 STANDARD_HA 1 10.0.0.4 6379 default-auto 10.0.0.0/29 READY 2019-07-15T11:43:14
My Dataflow job has these properties:
runner: org.apache.beam.runners.dataflow.DataflowRunner
zone: us-central1-b
network: default-auto
> gcloud dataflow jobs list
JOB_ID NAME TYPE CREATION_TIME STATE REGION
2019-06-17_02_01_36-3308621933676080017 eventflow Streaming 2019-06-17 09:01:37 Running us-central1
My "default" network could not be used since it is a legacy network, which Memorystore would not accept. I failed to find a way to upgrade the default network from legacy to auto and did not want to delete the existing default network since this would require messing with production services. Instead I created a new network "default-auto" of type auto, with the same firewall rules as the default network. The one I believe is relevant for my Dataflow job is this:
Name: default-auto-internal
Type: Ingress
Targets: Apply to all
Filters: IP ranges: 10.0.0.0/20
Protocols/ports:
tcp:0-65535
udp:0-65535
icmp
Action: Allow
Priority: 65534
I can connect to Memorystore using "telnet 10.0.0.4 6379" from a Compute Engine instance.
Things I have tried, which did not change anything:
- Switched Redis library, from Jedis 2.9.3 to Lettuce 5.1.7
- Deleted and re-created the Memorystore instance
Is Dataflow not supposed to be able to connect to Memorystore, or am I missing something?

Figured it out. I was trying to connect to Memorystore from code called directly from the main method of my Dataflow job. Connecting from code running in a Dataflow step worked. On second though (well, actually more like 1002nd thought) this makes sense because main() is running on the driver machine (my desktop in this case) whereas the steps of the Dataflow graph will run on GCP. I have confirmed this theory by connecting to Memorystore on localhost:6379 in my main(). This works since I have an SSH tunnel to Memorystore running on port 6379 (using this trick).

Related

Unable to export traces to OpenTelemetry Collector on Kubernetes

I am using the opentelemetry-ruby otlp exporter for auto instrumentation:
https://github.com/open-telemetry/opentelemetry-ruby/tree/main/exporter/otlp
The otel collector was installed as a daemonset:
https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector
I am trying to get the OpenTelemetry collector to collect traces from the Rails application. Both are running in the same cluster, but in different namespaces.
We have enabled auto-instrumentation in the app, but the rails logs are currently showing these errors:
E, [2022-04-05T22:37:47.838197 #6] ERROR -- : OpenTelemetry error: Unable to export 499 spans
I set the following env variables within the app:
OTEL_LOG_LEVEL=debug
OTEL_EXPORTER_OTLP_ENDPOINT=http://0.0.0.0:4318
I can't confirm that the application can communicate with the collector pods on this port.
Curling this address from the rails/ruby app returns "Connection Refused". However I am able to curl http://<OTEL_POD_IP>:4318 which returns 404 page not found.
From inside a pod:
# curl http://localhost:4318/
curl: (7) Failed to connect to localhost port 4318: Connection refused
# curl http://10.1.0.66:4318/
404 page not found
This helm chart created a daemonset but there is no service running. Is there some setting I need to enable to get this to work?
I confirmed that otel-collector is running on every node in the cluster and the daemonset has HostPort set to 4318.
The problem is with this setting:
OTEL_EXPORTER_OTLP_ENDPOINT=http://0.0.0.0:4318
Imagine your pod as a stripped out host itself. Localhost or 0.0.0.0 of your pod, and you don't have a collector deployed in your pod.
You need to use the address from your collector. I've checked the examples available at the shared repo and for agent-and-standalone and standalone-only you also have a k8s resource of type Service.
With that you can use the full service name (with namespace) to configure your environment variable.
Also, the Environment variable now is called OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, so you will need something like this:
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=<service-name>.<namespace>.svc.cluster.local:<service-port>
The correct solution is to use the Kubernetes Downward API to fetch the node IP address, which will allow you to export the traces directly to the daemonset pod within the same node:
containers:
- name: my-app
image: my-image
env:
- name: HOST_IP
valueFrom:
fieldRef:
fieldPath: status.hostIP
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: http://$(HOST_IP):4318
Note that using the deployment's service as the endpoint (<service-name>.<namespace>.svc.cluster.local) is incorrect, as it effectively bypasses the daemonset and sends the traces directly to the deployment, which makes the daemonset useless.

Port error when setting up Dev mode of Hyperledger Fabric

I'm setting up the development environment following the instructions on Hyperledger fabric's official website:
https://hyperledger-fabric.readthedocs.io/en/latest/peer-chaincode-devmode.html
I have started the orderer successfully using:
ORDERER_GENERAL_GENESISPROFILE=SampleDevModeSolo orderer
This command didn't work at first but it worked after I cd fabric/sampleconfig
2020-12-21 11:23:15.084 CST [orderer.common.server] Main -> INFO 009 Starting orderer: Version: 2.3.0 Commit SHA: dc2e59b3c Go version: go1.15.6 OS/Arch: darwin/amd64
2020-12-21 11:23:15.084 CST [orderer.common.server] Main -> INFO 00a Beginning to serve requests
but when I start the peer using:
export PATH=$(pwd)/build/bin:$PATH
export FABRIC_CFG_PATH=$(pwd)/sampleconfig
export FABRIC_LOGGING_SPEC=chaincode=debug
export CORE_PEER_CHAINCODELISTENADDRESS=0.0.0.0:7052
peer node start --peer-chaincodedev=true
An error is spotted:
FABRIC_LOGGING_SPEC=chaincode=debug
CORE_PEER_CHAINCODELISTENADDRESS=0.0.0.0:7052
peer node start --peer-chaincodedev=true
2020-12-21 11:25:13.047 CST [nodeCmd] serve -> INFO 001 Starting peer: Version: 2.3.0 Commit SHA: dc2e59b3c Go version: go1.15.6 OS/Arch: darwin/amd64 Chaincode: Base Docker Label: org.hyperledger.fabric Docker Namespace: hyperledger
2020-12-21 11:25:13.048 CST [peer] getLocalAddress -> INFO 002 Auto-detected peer address: 10.200.83.208:7051
2020-12-21 11:25:13.048 CST [peer] getLocalAddress -> INFO 003 Host is 0.0.0.0 , falling back to auto-detected address: 10.200.83.208:7051 Error: failed to initialize operations subsystem: listen tcp 127.0.0.1:9443: bind: address already in use
this is the error:
Error: failed to initialize operations subsystem: listen tcp 127.0.0.1:9443: bind: address already in use
I checked this issue and it seems this happens because the peer node is using the same port 9443 as the orderer node for the same service. How can I get the two nodes running separately? It seems the docker is running as well.
If you see your error, you can easily follow
Error: failed to initialize operations subsystem: listen tcp 127.0.0.1:9443: bind: address already in use
It is said that the 9443 port is already in use.
It seems that you are not running the orderer and peer as separate containers on the docker-based virtual network, but running on the host pc.
This eventually seems to conflict with two servers requesting one port 9443 on your pc.\
Referring to the configuration below of fabric-2.3/sampleconfig, you can see that each port 9443 is assigned to the server. Assigning one of them to the other port solves this.
fabric-2.3/sampleconfig/orderer.yaml
configuration of orderer
# orderer.yaml
...
Admin:
# host and port for the admin server
ListenAddress: 127.0.0.1:9443
...
fabric-2.3/sampleconfig/core.yaml
configuration of peer
# core.yaml
...
operations:
# host and port for the operations server
# listenAddress: 127.0.0.1:9443
listenAddress: 127.0.0.1:10443
...
This is not a direct answer to the port mapping / collision issue, but we've had great success using the new Kubernetes Test Network as a development platform running on a local system with a virtual Kubernetes cluster running in KIND (Kubernetes in Docker).
In this mode, applications can be developed using the Gateway client (exposed via a port forward or ingress), and smart contracts running As a Service can be launched either in the cluster OR run on the local host OS in a container, binary, or launched in a debugger.
The documentation for the development setup is still sparse, but we'd love to hear feedback on the overall approach, as it offers an exponentially better experience for working with a test network in a development context. In general the process of "port juggling" with Compose is no longer relevant when working on a local Kubernetes cluster. In this mode, you can run services on the host network, instructing peers/orderers/etc. to connect to the remote process running on the host OS.

How to fix "failed to ensure load balancer" error for nginx ingress

When setting a new nginx-ingress using helm and a static ip on Azure the nginx controller never gets the static IP assigned. It always says <pending>.
I install the helm chart as follows -
helm install stable/nginx-ingress --name <my-name> --namespace <my-namespace> --set controller.replicaCount=2 --set controller.service.loadBalancerIP="<static-ip-address>"
It says it installs correctly but there is an error listed as well
E0411 06:44:17.063913 13264 portforward.go:303] error copying from
remote stream to local connection: readfrom tcp4
127.0.0.1:57881->127.0.0.1:57886: write tcp4 127.0.0.1:57881->127.0.0.1:57886: wsasend: An established connection was aborted by the software in your host machine.
I then do a kubectl get all -n <my-namespace> and everything is listed correctly just with the external IP as <pending> for the controller.
I then do a kubectl describe -n <my-namespace> service/<my-name>-nginx-ingress-controller and this error is listed under Events -
Warning CreatingLoadBalancerFailed 11s (x4 over 47s)
service-controller Error creating load balancer (will retry): failed
to ensure load balancer for service
my-namespace/my-name-nginx-ingress-controller: timed out waiting for the
condition.
Thank you kindly
For your issue, the possible reason is that your public IP is not in the same resource group and region with the AKS cluster. See the steps in Create an ingress controller with a static public IP address in Azure Kubernetes Service (AKS).
You can get the AKS group through the CLI command like this:
az aks show --resource-group myResourceGroup --name myAKSCluster --query nodeResourceGroup -o tsv
When your public IP in a different group and region, then it will give the time out error as you.
Make sure that your ingress is in the node resource group, and also that the sku for the ingress is Basic not Standard

How to run a redis cluster on a docker cluster?

Context
I am trying to setup a redis cluster so that it runs on top off a docker cluster, to achieve maximum auto-healing.
More precisely, I have a docker compose file, which defines a service that has 3 replicas. Each service replica has a redis-server running on.
Then I have a program inside each replica that listens to changes on the docker cluster and that starts the cluster when conditions are met (each 3 redis-servers know each other).
Setting up the redis cluster works has expected, the cluster is formed and all the redis-servers communicate well, but the communication between redis-servers is inside the docker cluster.
The Problem
When I try to communicate from outside the docker cluster, because of the ingress mode I am able to talk to a redis-server, however when I try to add info (eg: set foo bar) and the client is moved to another redis-server the communication hangs and eventually times out.
Code
This is the docker-compose file.
version: "3.3"
services:
redis-cluster:
image: redis-srv-instance
volumes:
- /var/run/:/var/run
deploy:
mode: replicated
#endpoint_mode: dnsrr
replicas: 3
resources:
limits:
cpus: '0.5'
memory: 512M
ports:
- target: 6379
published: 30000
protocol: tcp
mode: ingress
The flux of commands that show the problem.
Client
~ ./redis-cli -c -p 30000
127.0.0.1:30000>
Redis-server
OK
1506533095.032738 [0 10.255.0.2:59700] "COMMAND"
1506533098.335858 [0 10.255.0.2:59700] "info"
Client
127.0.0.1:30000> set ghb fki
OK
Redis-server
1506533566.481334 [0 10.255.0.2:59718] "COMMAND"
1506533571.315238 [0 10.255.0.2:59718] "set" "ghb" "fki"
Client
127.0.0.1:30000> set rte fgh
-> Redirected to slot [3830] located at 10.0.0.3:6379
Could not connect to Redis at 10.0.0.3:6379: Operation timed out
Could not connect to Redis at 10.0.0.3:6379: Operation timed out
(150.31s)
not connected>
Any ideas? I have also tried making my one proxy/load balancer but didn't work.
Thank you! Have a nice day.
For this use case, sentinel might help. Redis on its own is not capably of high availability. Sentinel on the other side is a distributed system which can do the following for you:
Route the ingress trafic to the current Redis master.
Elect a new Redis master should the current one fail.
While I have previously done research on this topic, I have not yet managed to pull to getter a working example.
redis-cli would get the redis server ip inside the ingress network, and try to access the remote redis server by that ip directly. That is why redis-cli shows Redirected to slot [3830] located at 10.0.0.3:6379. But this internal 10.0.0.3 is not accessible to redis-cli.
One solution is to run another proxy service which attaches to the same network with redis cluster. The application sends all requests to that proxy service, and the proxy service talks with redis cluster.
Or you could create 3 swarm services that uses the bridge network and exposes the redis port to node. Your internal program needs to change accordingly.

Wildfly/Jboss-v10 is not working in cluster mode with docker swarm

I have my web based java application working in wildfly/jboss version 10.I am using docker(1.13.1-cs2) to deploy my application.Now as per some HA(High availability) scenario I want my application to work in cluster mode.So I changes my wildfly configuration to cluster mode inside my standalone-full-ha.xml.After this changes everything works perfect only if I use default docker network and start container with docker bridge network. But As per my requirement I want this whole container/my application to work as a service by docker swarm.But if I start put this as a service than wildfly/jboss is not be able to start in cluster mode and throwing error like this :
21:01:27,885 ERROR (TransferQueueBundler,ee,WEB-APP-NODE) JGRP000029: WEB-APP-NODE: failed sending message to cluster (38 bytes): java.io.IOException: Operation not permitted, headers: NAKACK2: [HIGHEST_SEQNO, seqno=2631], TP: [cluster_name=ee]
21:01:28,826 ERROR (TransferQueueBundler,ee,WEB-APP-NODE) JGRP000029: WEB-APP-NODE: failed sending message to cluster (4166 bytes): java.io.IOException: Operation not permitted, headers: FORK: ee:activemq-cluster, NAKACK2: [MSG, seqno=2632], TP: [cluster_name=ee]
21:01:29,886 ERROR (TransferQueueBundler,ee,WEB-APP-NODE) JGRP000029: WEB-APP-NODE: failed sending message to cluster (38 bytes): java.io.IOException: Operation not permitted, headers: NAKACK2: [HIGHEST_SEQNO, seqno=2632], TP: [cluster_name=ee]
21:01:30,826 ERROR (TransferQueueBundler,ee,WEB-APP-NODE) JGRP000029: WEB-APP-NODE: failed sending message to cluster (4166 bytes): java.io.IOException: Operation not permitted, headers: FORK: ee:activemq-cluster, NAKACK2: [MSG, seqno=2633], TP: [cluster_name=ee]
Note: I am using default swarm ingress network for port expose and communication.
As per my troubleshooting this issue is related to multicast address used by wildfly/jboss version 10 creating the issue.
I have also tried these steps multicast address in docker
But it is still not help in my case.Can anyone help me in this ?it is appreciated very much!
Thank you!
The overlay network from Docker Swarm does currently not support IP multicast.
You can either fallback to TCP based unicast for your cluster. But that leaves the challenge to know the IP addresses of all other containers in the service.
Another way is to create a macvlan based network which supports unicast. Tutorial: http://collabnix.com/docker-17-06-swarm-mode-now-with-macvlan-support/
With that variant I have the problem that as soon as you connect such a network to a container ingress (routing mesh) and access to the ouside world via docker_gwbridge stops working (details: Docker Swarm container with MACVLAN network gets wrong gateway - no internet access)

Resources