Google Cloud Platform Dataflow workers' IP addresses - google-cloud-dataflow

Is it possible to know which range of external IP addresses the Dataflow workers on GCP use? The goal is to set up IP filtering on an external service so that only our Dataflow jobs running on GCP can access it.

The best solution would be to upgrade the external service so that you can use SSL or another strong authentication mechanism.
You can use the --network= option to control which GCE network the worker VMs are assigned to. Take a look at the GCE networking docs for details on how to set up a VPN (as Elmar suggested in a comment). You could also set up a single machine in the network with a static external IP and use it as a proxy for the other VMs in the network.
This is not a usage pattern we have tested, so there may be latency or throughput issues for traffic passing through the proxy/VPN. You will likely need to be careful to send only your own traffic through the proxy, so that you don't accidentally hijack the traffic each worker uses to communicate with the Dataflow service.
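A minimal Python sketch of both halves of the proxy approach, under stated assumptions. `dataflow_worker_args` builds the command-line flags that place the workers on a chosen network (`--network` and `--subnetwork` are Dataflow pipeline options; the project, region, and network names are placeholders). `is_allowed` shows the allowlist check the external service would then apply, assuming the proxy's static external IP is the hypothetical documentation address `203.0.113.10`. Both helper names are illustrative, not part of any SDK.

```python
import ipaddress


def dataflow_worker_args(project, region, network, subnetwork=None):
    """Build flags that put Dataflow workers on a given VPC network.

    All values passed in are placeholders for your own project settings.
    """
    args = [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--network={network}",
    ]
    if subnetwork:
        # Dataflow accepts a partial URL such as regions/REGION/subnetworks/NAME.
        args.append(f"--subnetwork={subnetwork}")
    return args


def is_allowed(client_ip, allowlist):
    """Return True if client_ip falls inside any CIDR block in allowlist."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowlist)
```

In the Beam Python SDK, the returned list could be passed to `PipelineOptions`. On the service side, an allowlist such as `["203.0.113.10/32"]` admits only traffic that arrives through the proxy, which is why the proxy needs a static IP.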

Related

VPC access connector GCP - Cloudrun Services and AlloyDB different Regions

Quick question: I am trying to configure a Cloud Run service to connect to AlloyDB on GCP.
The problem is that AlloyDB is in a different region from the other services: AlloyDB is in central1 and the services are in east1.
Is there any way to connect them?
Thanks in advance,
There is no connectivity issue. You use a Serverless VPC Access connector to bridge the serverless world (where your Cloud Run services live) with your VPC. Therefore, with the default configuration, all traffic going to a private IP will arrive in your VPC.
Your AlloyDB instance is also peered with your VPC. Because a VPC is global, any service inside it (AlloyDB or Cloud Run) can reach any resource, whatever its region.
In fact, your main concern should be the latency and the egress cost.
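To illustrate why the regions do not matter, here is a simplified Python model of the routing decision. With the connector's default `private-ranges-only` egress setting, traffic to private addresses goes through the connector into the VPC. The sketch assumes, for simplicity, that "private" means the three RFC 1918 blocks; the real setting may cover additional ranges, and the helper name is hypothetical.

```python
import ipaddress

# Simplifying assumption: only the three RFC 1918 blocks count as "private" here.
RFC1918_BLOCKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def goes_through_connector(dest_ip):
    """Model of the default egress rule: private destinations go via the connector.

    The destination's region is deliberately not an input: inside a global VPC,
    a private IP in another region is routed the same way.
    """
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net for net in RFC1918_BLOCKS)
```

An AlloyDB private IP such as `10.20.0.5` is routed through the connector whether the instance sits in central1 or east1; only latency and egress cost differ.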

Using traefik for docker internal traffic via websockets

I'm using Docker in swarm mode for the services in my application and Traefik to handle, well, the traffic. My goal is to make a separate service for each API section of my application (for example, requests to domain.com/api/foo_api go to the foo_api service and requests to domain.com/api/bar_api go to the bar_api service).
All of this is pretty straightforward with Traefik. However, I'm also using the API services from other internal services not related to the API. They use a websocket connection to the internal Docker URL, which is currently ws://api:api_port/ws. If I split up the API part, I'd need something like ws://foo_api:foo_api_port/ws, which obviously gives the service access only to foo_api, not to every other API service.
So my question is: can I route this websocket traffic with Traefik inside the Docker network, similar to how I do it externally?
Traefik is a north-south reverse proxy. Historically, most people running traditional infrastructure would use NGINX or Apache to handle inbound traffic; it's good to see you using a more modern tool. What you are describing is an east-west communication pattern inside your firewall, behind Traefik (assuming you control all ingress through Traefik).
Have you considered using the service discovery and registry capabilities of a tool like HashiCorp Consul (https://consul.io)?
The idea of service discovery is that the containers/services inside the swarm register themselves and can be discovered through the registry, so they can reference each other by name without the manual labor of building and maintaining complicated name-to-IP lookups. Historically, most people know this from a more static model based on DNS SRV records, which requires an external query. Consul can still support that legacy DNS integration as well.
This site might help you along: https://attx-project.github.io/Consul-for-Service-Discovery-on-Docker-Swarm.html
They appear to have addressed a case similar to yours, and the work is likely reusable with a few tweaks.
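As a rough sketch of the Consul approach, an internal client can ask a Consul agent for every instance of a service and build websocket URLs from the answer instead of hard-coding ws://api:api_port/ws. This assumes a Consul agent reachable at localhost:8500 (Consul's default HTTP port) and services registered under names like foo_api. `/v1/catalog/service/<name>` is Consul's catalog endpoint; the helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a Consul agent is reachable here (8500 is Consul's default HTTP port).
CONSUL_HTTP = "http://localhost:8500"


def fetch_catalog(service, base=CONSUL_HTTP):
    """Return the raw catalog entries Consul holds for `service`."""
    url = f"{base}/v1/catalog/service/{urllib.parse.quote(service)}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def websocket_urls(entries, path="/ws"):
    """Turn catalog entries into ws:// URLs, one per registered instance."""
    urls = []
    for entry in entries:
        # An empty ServiceAddress means the instance uses its node's Address.
        host = entry.get("ServiceAddress") or entry["Address"]
        urls.append(f"ws://{host}:{entry['ServicePort']}{path}")
    return urls
```

A client would call `websocket_urls(fetch_catalog("foo_api"))` and pick one of the returned URLs, so new API services become reachable as soon as they register.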

How can I integrate my application with Kubernetes cluster running Docker containers?

This is more of a research question. If it does not meet the standards of SO, please let me know and I will ask elsewhere.
I am new to Kubernetes and have a few basic questions. I have read a lot of docs on the internet and was hoping someone could help answer them.
I am trying to create an integration between Kubernetes (user applications running inside Docker containers, to be precise) and my application, which would act as a backup for certain data in the containers.
1. My application currently runs in AWS. Would the Kube cluster need to run in AWS as well? Or can it run in any cloud service, or even on-prem, as long as the APIs are available?
2. Does my application need to know only the IP of the master node's API server to make POST/GET requests, and nothing else?
3. For authentication, can I use AD (my application already uses AD for a few things)? That would also give me role-based policies for each user. Or do I always have to use the Kube TokenReview API for authentication?
4. Would the applications running in Kubernetes use the APIs I provide to communicate with my application?
5. Would my application use POST/GET to communicate with the Kube master API server? Do I need to use kubectl for this and for #4 above?
Thanks for your help.
1. Your application doesn't need to run on the same servers as the k8s cluster. There are several ways to connect to a k8s cluster, depending on your use case: you can expose the built-in k8s API using kubectl proxy, connect directly to the k8s API on the master, or expose services via a load balancer or node port.
2. You only need to know the master node's IP if you're connecting to the cluster directly through the built-in k8s API, but in most cases you should use this API only to administer your cluster internally. The preferred way to access k8s pods is to expose them via a load balancer, which lets you reach a service on any node from a single IP. k8s also lets you reach a service with a nodePort from any k8s node (except the master) through a preassigned port.
3. TokenReview is only one of the k8s authentication strategies. I don't know anything about Active Directory auth, but at a glance OpenID Connect tokens seem to support it. You should review whether you need to give users direct access to the k8s API at all; consider exposing services via LoadBalancer instead.
4. I'm not sure what you mean by this, but if you deploy your APIs as k8s Deployments, you can expose their endpoints through Services to communicate with your external application however you like.
5. Again, the preferred way for external applications to communicate with k8s pods is through Services exposed as load balancers, not through the built-in API on the k8s master. With Services, it's up to the underlying API to decide which kinds of requests it accepts.
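To make the LoadBalancer versus NodePort distinction concrete, here is a small Python sketch that, given a Service object as a dict (for example, parsed from `kubectl get svc NAME -o json`), lists the host:port pairs an external application such as the backup tool could use. The fields read (`spec.type`, `spec.ports[].port`, `spec.ports[].nodePort`, `status.loadBalancer.ingress[].ip`/`hostname`) are standard Service fields; the function name and the node IP list passed in are assumptions for illustration.

```python
def external_endpoints(service, node_ips):
    """Return "host:port" strings an external client could use to reach a Service.

    `service` is a Service object as a dict; `node_ips` is a hypothetical list of
    node addresses reachable from outside the cluster.
    """
    spec = service["spec"]
    svc_type = spec.get("type", "ClusterIP")
    ports = spec.get("ports", [])
    endpoints = []
    if svc_type == "LoadBalancer":
        ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress", [])
        for entry in ingress:
            # Some providers (e.g. AWS) report a hostname instead of an IP.
            host = entry.get("ip") or entry.get("hostname")
            endpoints += [f"{host}:{p['port']}" for p in ports]
    elif svc_type == "NodePort":
        # The same nodePort is opened on every node.
        endpoints += [f"{ip}:{p['nodePort']}" for ip in node_ips for p in ports]
    # ClusterIP services are reachable only from inside the cluster.
    return endpoints
```

A LoadBalancer Service yields a single stable address per port, which is why the answer prefers it over NodePort for an application outside the cluster.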

How to point streaming Dataflow at internal service?

I am running a private service (e.g. Redis) on my Cloud Network, and I would like to access it from my streaming Dataflow job. Is there a good way to configure my job so that, if I need to update the IP address(es) for the private service, I don't need to modify the Dataflow job?
You can add a layer of indirection to this setup using Global Load Balancing with Single Anycast IP (https://cloud.google.com/load-balancing/). With internal load balancing, you can configure this target without exposing it to the Internet.
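One way to keep the job itself free of backend IPs, sketched in Python under assumptions: pass the load balancer's stable address as a custom option and forward everything else to Beam. The `--redis_endpoint` option and the helper name are hypothetical; splitting custom flags off with `argparse.parse_known_args` and handing the rest to `PipelineOptions` is a common pattern in Beam Python pipelines.

```python
import argparse


def split_job_args(argv):
    """Separate the hypothetical --redis_endpoint option from the Beam/Dataflow flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--redis_endpoint",
        required=True,
        help="host:port of the internal load balancer in front of the service",
    )
    known, pipeline_args = parser.parse_known_args(argv)
    # rpartition keeps everything before the last ":" as the host.
    host, _, port = known.redis_endpoint.rpartition(":")
    return host, int(port), pipeline_args
```

Because the job only ever sees the load balancer's address, the backends behind it can be replaced without touching the streaming job.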

Does JIRA work on a Google Compute Engine VM?

Is JIRA supported on GCE? If so, how do we make it work?
We have installed the 64-bit .bin of JIRA (6.4.1) and opened the necessary custom HTTP ports under Networks.
We started JIRA as a service, but we can't reach it from a browser. There is no error message other than a timeout.
Any help would be highly appreciated.
Note: We are new to Google Cloud Platform.
Did you enable HTTP and HTTPS traffic on your instance? By default, a GCE instance does not allow HTTP and HTTPS traffic; you have to enable it manually.
Configuring Jira on Google Compute Engine can be tricky. You need to make sure that:
The firewall rules under Networking allow connections to the Jira HTTP port, or HTTP traffic is enabled in the VM properties
The global networking rules allow TCP traffic on this port
The virtual network has routes configured
If you use Apache as a proxy for Jira (recommended), Apache is configured to point to the Tomcat port
Your Tomcat is configured
If Jira binds to a privileged port (below 1024), you have granted it permission using the setcap utility
Your local machine firewall allows the connection (on Red Hat, iptables is enabled by default and blocks connections)
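When the browser only reports a timeout, it can help to tell "timed out" apart from "connection refused" to see which item on the checklist is failing. A minimal Python probe, assuming you run it from your own machine against the VM's external IP and Jira's default HTTP port 8080 (the helper name is illustrative):

```python
import socket


def probe(host, port, timeout=5.0):
    """Try a TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing accepted the connection on this port.
        return "refused"
    except socket.timeout:
        # No answer at all: packets were probably dropped along the way.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

As a rule of thumb, `"timeout"` usually points at a GCE firewall rule or the VM's own firewall silently dropping packets, while `"refused"` usually means the network path is open but Jira/Tomcat (or Apache) is not listening on that port.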
As you can see, installing Jira on Google Cloud can be tricky. It may be a good idea to use a deployment service like Deploy4Me to do this quickly and automatically.
