Cloud dataflow job using Internal IP? - google-cloud-dataflow

How do I configure my Cloud Dataflow job to run using internal IPs?
Our policy doesn't allow using external IPs to spawn the workers, so I'm looking for an option that prevents workers from getting external IPs. I ran the job and got the error below.
Startup of the worker pool in zone XXX failed to bring up any of the desired 1 workers. Please check for errors in your job parameters, check quota, and retry later, or please try in a different zone/region.
Add instance projects to use external IP with it.

You can use the --usePublicIps=false flag. Here you can look at some examples.

Looks like they updated the flags. For the Python SDK it is now:
--no_use_public_ips or --use_public_ips
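As a minimal sketch with the Python SDK (the project, region, bucket, and subnetwork names below are placeholders, not taken from the question), the flag is passed along with the usual Dataflow options. Note that with no public IPs the chosen subnetwork needs Private Google Access enabled so the workers can still reach Google APIs:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# All resource names below are placeholders.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/tmp",
    "--no_use_public_ips",  # workers get internal IPs only
    # With no public IPs, the subnetwork should have Private Google Access enabled.
    "--subnetwork=regions/us-central1/subnetworks/my-subnet",
])

with beam.Pipeline(options=options) as pipeline:
    pipeline | beam.Create([1, 2, 3]) | beam.Map(print)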

Related

How to configure a Dataflow pipeline to use a Shared VPC?

I know there are configuration arguments where you can specify the network and subnetwork. I tried doing that, but with a Shared VPC network it gives me this error.
Using a subnetwork in Cloud Dataflow requires specifying the subnetwork parameter when running the pipeline. However, for subnetworks located in a Shared VPC network, you must use the complete URL in the following format:
https://www.googleapis.com/compute/v1/projects/<HOST_PROJECT>/regions/<REGION>/subnetworks/<SUBNETWORK>
Additionally, verify that you have added the project's Dataflow service account to the Shared VPC host project's IAM and granted it the "Compute Network User" role, to ensure the service has the required access.
You can take a look at the official Google documentation for the subnetwork parameter, which contains detailed information about this.
Be sure to include the Project ID in the --subnetwork option:
/projects/<PROJECT_ID>/regions/<REGION>/subnetworks/<SUBNETWORK>
and grant the Dataflow service account the Network User role in the host project, which is what I suspect is missing, according to the error message.
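A minimal sketch of building the full Shared VPC subnetwork URL with the Python SDK (the host project, service project, region, and subnetwork names are placeholders):
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholders: replace with your own host project, region, and subnetwork.
HOST_PROJECT = "shared-vpc-host-project"   # the Shared VPC host project
REGION = "us-central1"
SUBNETWORK = "dataflow-subnet"

# Shared VPC subnetworks must be referenced by their complete URL.
subnetwork_url = (
    "https://www.googleapis.com/compute/v1/"
    f"projects/{HOST_PROJECT}/regions/{REGION}/subnetworks/{SUBNETWORK}"
)

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-service-project",   # the service project that runs the job
    f"--region={REGION}",
    f"--subnetwork={subnetwork_url}",
    "--temp_location=gs://my-bucket/tmp",
])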

How to add health check for python code in docker container

I have just started exploring the health check feature in Docker. All the tutorials online show the same type of health check example, like link1 and link2. They use this same command:
HEALTHCHECK CMD curl --fail http://localhost:3000/ || exit 1
I have Python code which I have built into a Docker image, and its container runs fine. There is a service in the container that works, but I want to put a health check on this service. It is started/stopped using:
service <myservice> start
service <myservice> stop
This service is responsible for sending data to a server. I need to put a health check on it but don't know how to do it. I have searched for this and didn't find any examples. Can anyone please point me to the right link or explain it?
Thanks
The health check command is nothing magical; it is simply something you can automate to get a better picture of your service's status.
Some questions you should ask yourself before setting up the health check:
How would I normally verify that the service is running OK, assuming I were running it outside a container and checking its status manually rather than through an automated process?
If the service has no open ports it can be interrogated on, does it instead write its success/failure status to a file on disk?
If the service has open ports but communicates over a custom protocol, do I have any tools I can use to interrogate those ports?
Let's take the curl command you listed: it implies that the health check is monitoring an HTTP service listening on port 3000. With --fail, curl exits with a non-zero code when the server returns an error status, so the check fails. That's a pretty straightforward demonstration of health check usage.
Assuming your service writes success or failure to a file every 30 seconds, your health check would be a script that exits abnormally when it encounters the failure text (see the sketch after these examples).
Assuming your service has an open port but communicates via some custom protocol such as protocol buffers, all you have to do is call it from a script that encodes a payload with protobuf and then checks the output received.
And so on...
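For the file-based case, here is a minimal sketch of such a health check script in Python; the status file path, the "FAIL" marker, and the staleness window are assumptions you would adapt to your own service:
#!/usr/bin/env python3
# healthcheck.py - sketch of a file-based health check.
# Assumes the service writes "OK" or "FAIL" to /var/run/myservice/status
# every 30 seconds; the path and markers are hypothetical.
import sys
import time
from pathlib import Path

STATUS_FILE = Path("/var/run/myservice/status")
MAX_AGE_SECONDS = 90  # a stale file means the service stopped updating it

def main() -> int:
    if not STATUS_FILE.exists():
        print("status file missing")
        return 1
    age = time.time() - STATUS_FILE.stat().st_mtime
    if age > MAX_AGE_SECONDS:
        print(f"status file is stale ({age:.0f}s old)")
        return 1
    if "FAIL" in STATUS_FILE.read_text():
        print("service reported failure")
        return 1
    return 0  # healthy

if __name__ == "__main__":
    sys.exit(main())
You would then point the HEALTHCHECK instruction in the Dockerfile at the script, for example HEALTHCHECK --interval=30s CMD python3 /healthcheck.py (the interval and path are again just examples).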

Why does Jenkins keep telling me it is offline?

I am not getting the option to install plugins in Jenkins. Instead, I get two options: Configure Proxy and Skip Plugin Installations.
Might be worth checking - I made this mistake myself and spent a day tracking it down.
Enter only the IP, not the complete address, in the Server field in Jenkins when configuring the proxy.
So, suppose your proxy is http://x.x.x.x:8080 - then just put x.x.x.x in the Server field.
Navigate to C:\Windows\System32\config\systemprofile\AppData\Local\Jenkins.jenkins,
Modify "hudson.model.UpdateCenter.xml" file by changing the URL property to "http"
Finally, open CMD with admin privileges and run:
net stop jenkins
net start jenkins

How to point streaming Dataflow at internal service?

I am running a private service (e.g. Redis) on my Cloud Network, and I would like to access it from my streaming Dataflow job. Is there a good way to configure my job so that, if I need to update the IP address(es) for the private service, I don't need to modify the Dataflow job?
You can add a layer of indirection to this setup using Google Cloud Load Balancing, which gives you a single stable IP (https://cloud.google.com/load-balancing/). With internal load balancing, you can configure this target without exposing it to the Internet.
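As a sketch of what that indirection looks like from the job's side (assuming the third-party redis-py client and a hypothetical internal load balancer frontend address), the DoFn only ever sees the load balancer's stable address, so the Redis backends can change without touching the pipeline:
import apache_beam as beam
import redis  # third-party redis-py client; an assumption, not something Dataflow provides

class WriteToRedis(beam.DoFn):
    """Writes key/value pairs to whatever backend the load balancer fronts."""

    def __init__(self, host, port=6379):
        self.host = host
        self.port = port
        self.client = None

    def setup(self):
        # Connect once per worker; the internal load balancer forwards the
        # connection to a healthy backend.
        self.client = redis.Redis(host=self.host, port=self.port)

    def process(self, element):
        key, value = element
        self.client.set(key, value)
        yield element

# Usage in the pipeline, passing the load balancer's frontend address
# (a placeholder here), or better, an internal DNS name that resolves to it:
#   records | beam.ParDo(WriteToRedis(host="10.128.0.100"))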

Google Cloud Platform DataFlow workers IP addresses

Is it possible to know what range of external IPs the Dataflow workers on GCP are using? The goal is to set up some kind of IP filtering on an external service, so that only our Dataflow jobs running on GCP can access the service.
The best solution would be to upgrade the external service so that you can use SSL or other mechanisms of strong authentication.
You can use the --network= option to control the GCE Network that the worker VMs are assigned to. Take a look at the GCE docs on networking for details on how to set up a VPN (like the comment from Elmar suggested). You could also look at setting up a single machine in the network with a static, external IP and using it as a proxy for the other VMs in the network.
This is not a use pattern we have tested, so there may be issues with latency or throughput of traffic through the proxy/VPN. You will likely need to be careful to only send your traffic through this proxy so that you don’t accidentally hijack the traffic used by each worker to communicate with the Dataflow service.
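For the --network option itself, a minimal sketch with the Python SDK (the network and other resource names are placeholders):
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholders throughout; the point is only the --network flag.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/tmp",
    "--network=my-network",  # worker VMs are attached to this GCE network
])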
