Custom VM images for Google Cloud Dataflow workers - google-cloud-dataflow

Having skimmed the Google Cloud Dataflow documentation, my impression is that worker VMs run a specific predefined Python 2.7 environment without any option to change that. Is it possible to provide a custom VM image for the workers (built with the libraries and external commands that the particular application needs)? Is it possible to run Python 3 on Google Cloud Dataflow?

2021 Update
As of today, the answer to both of these questions is YES.
Python 3 is supported on Dataflow.
Custom container images are supported on Dataflow, see this SO answer, and this docs page.
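As a rough illustration of the update above, the sketch below assembles the command-line flags a Beam Python pipeline would typically receive to run on Dataflow with a custom container. The helper name `dataflow_args` and the project, bucket, and image values are placeholders invented for this example, not values from this thread. `--sdk_container_image` is the flag name in recent Beam releases; older SDKs used a different name, so check the documentation for your version.

```python
def dataflow_args(project, region, temp_location, sdk_container_image):
    """Build Beam command-line flags for a Dataflow job that uses a custom container.

    Hypothetical helper: it only formats strings and does not contact Google Cloud.
    """
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        f"--sdk_container_image={sdk_container_image}",
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own project, bucket, and image.
    args = dataflow_args(
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
        sdk_container_image="gcr.io/my-project/my-beam-image:latest",
    )
    # In a real pipeline these flags would be handed to
    # apache_beam.options.pipeline_options.PipelineOptions(args).
    print(" ".join(args))
```

The Python version the job runs on is determined by the interpreter and SDK inside the container image, so a Python 3 image gives you Python 3 workers.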
Is it possible to provide a custom VM image for the workers (built with the libraries and external commands that the particular application needs)? Is it possible to run Python 3 on Google Cloud Dataflow?
No to both questions. You can configure the Compute Engine machine type and disk size for a Dataflow job, but you cannot configure things like installed applications. Currently, Apache Beam does not support Python 3.x.
References:
https://cloud.google.com/dataflow/pipelines/specifying-exec-params
https://issues.apache.org/jira/browse/BEAM-1251
https://beam.apache.org/get-started/quickstart-py/

Status of Python 3 support in Apache Beam:
https://beam.apache.org/roadmap/python-sdk/#python-3-support

You cannot provide a custom VM image for the workers, but you can provide a setup.py file to run custom commands and install libraries.
You can find more info about the setup.py file here:
https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
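A minimal sketch of what such a setup.py might look like follows. Everything specific in it is an assumption made for illustration: the package name, the `requests` dependency, the placeholder command, and the choice to hook the commands into `build_py` (which only runs when the project contains at least one Python package or module). The linked Beam page describes the officially recommended layout.

```python
import subprocess
import sys

# Hypothetical commands to run on each worker while the package is built.
# Replace with real steps, for example installing a system tool.
CUSTOM_COMMANDS = [
    [sys.executable, "-c", "print('custom worker setup step')"],
]


def run_custom_commands(commands):
    """Run each command in order; raise CalledProcessError on the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


def main():
    # Imported here so the helper above can be used without setuptools.
    import setuptools
    from setuptools.command.build_py import build_py

    class BuildWithCustomCommands(build_py):
        """Run the custom commands, then continue with the normal build step."""

        def run(self):
            run_custom_commands(CUSTOM_COMMANDS)
            super().run()

    setuptools.setup(
        name="my-dataflow-job",          # placeholder package name
        version="0.0.1",
        packages=setuptools.find_packages(),
        install_requires=["requests"],   # placeholder library dependency
        cmdclass={"build_py": BuildWithCustomCommands},
    )


if __name__ == "__main__":
    main()
```

The file is then passed to the pipeline with the `--setup_file` option, as described on the page linked above.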

Custom containers are now supported on Dataflow.

Related

Is IntelliJ's support for Dockerized Python environments compatible with Python running on a Windows container?

My Python project is very Windows-centric. We want the benefits of containers, but we can't give up Windows just yet.
I'd like to be able to use the Dockerized remote Python interpreter feature that comes with IntelliJ. This works flawlessly with Python running on a standard Linux container, but does not appear to work at all for Python running on a Windows container.
I've built a new image based on a standard Microsoft Server core image. I've installed Miniconda, bootstrapped a Python environment and verified that I can start an interactive Python session from the command prompt.
Whenever I try to set this up I get an error message: "Can't retrieve image ID from build stream". This occurs at the moment when IntelliJ would normally have detected the Python interpreter and its installed libraries.
I also tried giving the full path for the interpreter: c:\miniconda\envs\htp\python.exe
I've never seen any mention that this works in the documentation, but nor have I seen any mention that it does not work. I totally accept that Windows Containers are an oddity, so it's entirely possible that IntelliJ's remote-Python feature was never tested on Python running in Windows containers.
So, has anybody got this feature working with Python running on a Windows container yet? Is there any reason to believe that it does or does not work?
Regrettably, it is not supported yet. Please vote for the feature request https://youtrack.jetbrains.com/issue/PY-45222 in order to increase its priority.

What is the difference between Flask and Docker?

I am lost in these terms. I have watched a lot of videos and read a lot of articles, but I still cannot understand the difference. Flask helps me create a web interface that can be used only on my local system. But what does Docker do? Does it make the application visible to the world through a URL? Please answer in very simple words.
Flask is a web framework: it provides an API for the Python language with which web applications, such as a website or a backend service, can be built.
Docker is a containerization tool which deals with the deployment of applications and the environment in which they run. Docker provides a lightweight alternative to virtual machines: a lightweight software environment in which an application can run independently, with dependencies handled by Docker. Docker environments can vary in operating system, the programming language of the application being deployed, and more.
I may, for example, build a web application in Python using the libraries provided by Flask, then deploy the application on a server in a Docker environment running a Windows operating system.

Step by Step Setup Guide to Neo4j Mazerunner in Windows

I would like to use the Spark GraphX packages available to Neo4j through Mazerunner; however, I am an analyst and not a software person. I am running Windows 7 on my laptop and Neo4j 2.3.0, and would like a step-by-step guide explaining how I can set up Mazerunner for both Community & Enterprise. There's a lot of mention of Docker and containers, and I have no idea what these are, or how to set them up. Simple instructions would be of sooo much help! :)
Docker is primarily an operating-system-level virtualization technology designed to run on Unix-based systems (Linux, Mac, FreeBSD). Luckily, Docker provides a Windows version that does more or less the same thing as on Unix.
After you have installed Docker, it lets you run what are called containers, which behave much like lightweight virtual machines on top of your host (Windows 7 running Docker). This allows you to run services like Neo4j in an isolated environment. Docker also allows you to download and install pre-configured, pre-compiled images of operating systems that usually provide some sort of service or have some software pre-installed.
In your case, I believe all you have to do is:
1. Install Docker.
2. Use "Docker Compose" to download and install the images.
3. Continue reading the tutorial, as you will now have the required Docker images installed.
Note: Some of the operations, like the one in Step 2, require command-line access and the creation of a "docker-compose.yml" file, so be sure to visit all the links I have provided. Spend a little time going through them and you should be all right.
PS: great blog. definitely bookmarking it!

JMX Monitoring using jboss-cli

We have an application which used JBoss 4.2.3.GA and we are migrating it to WildFly 8.2. In the old implementation, the JMX monitoring was done using twiddle. Since twiddle doesn't exist in WildFly, we are using the JBoss CLI for JMX monitoring.
Is it the right approach to use the JBoss CLI for JMX monitoring? Are there any command line tools similar to twiddle which can be used for JMX monitoring in WildFly?
One option to get something similar would be to simply query the JMX MBeans programmatically yourself. The advantage is that your solution does not depend on tools like Twiddle, which may be discontinued, and it remains compatible with other app servers.
Here is an example using Groovy to query an MBean in Tomcat and here is an example using Java to query an MBean in ActiveMQ.
If you choose to go with Groovy, you should be aware that there is a way to use Groovy (or JavaScript or Python) to wrap the CLI and gain more control flow. The CLI is great for simple declarative things, but lacks the versatility of a proper scripting language.
If you want to use pure CLI, then that's fine too, but I would suggest you create files which you can then call through bash e.g.:
$JBOSS_HOME/bin/jboss-cli.sh -c --file="my-jvm-monitoring.cli"
You might find this CLI model reference useful, and also this blog about monitoring WildFly with the CLI.
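As one possible way to wrap the CLI from a scripting language, the sketch below calls the same `jboss-cli.sh -c --file=...` command from Python so the output can be inspected with ordinary control flow. The function names, the `/opt/wildfly` path, and the timeout are assumptions made for the example; only the script location and the `-c` and `--file` flags come from the command shown above.

```python
import os
import subprocess


def build_cli_command(jboss_home, cli_file):
    """Return the argument list for running a CLI script file against a local server."""
    script = os.path.join(jboss_home, "bin", "jboss-cli.sh")
    return [script, "-c", f"--file={cli_file}"]


def run_cli_file(jboss_home, cli_file, timeout=30):
    """Run the CLI script and return its standard output as text.

    Raises subprocess.CalledProcessError if the CLI exits with a non-zero
    status, and subprocess.TimeoutExpired if it runs longer than `timeout`.
    """
    result = subprocess.run(
        build_cli_command(jboss_home, cli_file),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Assumed install location; falls back to /opt/wildfly if JBOSS_HOME is unset.
    home = os.environ.get("JBOSS_HOME", "/opt/wildfly")
    print(run_cli_file(home, "my-jvm-monitoring.cli"))
```

From here the returned text can be parsed and compared against thresholds, which is the kind of logic the plain CLI does not express well.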

Does Google Cloud support neo4j?

I'm new to databases and I want to deploy Neo4j on Google Cloud Platform.
I can find information about deploying MongoDB on Google Cloud, but nothing about Neo4j.
So I wonder: does Google Cloud support Neo4j?
Thanks!
Neo4j is an open source project that can run on your own Linux machine.
You could just create a Google Compute Engine instance and follow the tutorials on the web to set up Neo4j,
like this one: Neo4j setup instructions.
Just follow the Linux part. I suggest using a Debian image to create the instance for Neo4j, because the command-line tooling on Debian is the most similar to Ubuntu's.
Updated answer from 2018.
Yes -- neo4j supports Google Cloud. Instructions can be found on their website. You can use a pre-built image and launch a single node instance, or multi-node clusters on GCP.

Resources