How to run the FP-Growth algorithm of Mahout on Hortonworks?

I am using Hortonworks to run big data jobs.
How can I run Mahout's FP-Growth algorithm on Hortonworks?
Any pointers would be appreciated.

Mahout's MapReduce-based algorithms are deprecated. Try Spark MLlib instead.
;-)
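For anyone who still needs the Mahout route: note that the FP-Growth implementation was removed from Mahout in the 0.9 release, so you need an older Mahout (0.8 or earlier, as bundled with older HDP stacks). A rough sketch of the old mahout fpg driver follows; the HDFS paths, minimum support (-s), and top-k (-k) values are placeholders, not anything specific to Hortonworks.

# Put the transaction data (one basket per line) on HDFS
hadoop fs -put transactions.dat /user/me/fpg/input
# Run parallel FP-Growth as a MapReduce job (-s = minimum support, -k = top patterns kept per item)
mahout fpg -i /user/me/fpg/input -o /user/me/fpg/patterns -method mapreduce -s 2 -k 50
# Inspect the resulting frequent patterns
mahout seqdumper -i /user/me/fpg/patterns/frequentpatterns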

Related

CPU and GPU Stress test for Nvidia Jetson Xavier (Ubuntu 18.04, Jetpack 4.6)

How can I run a simultaneous CPU and GPU stress test on a Jetson Xavier machine (Ubuntu 18.04, JetPack 4.6)?
The only code I found is
https://github.com/JTHibbard/Xavier_AGX_Stress_Test, which has difficult package incompatibility issues and only stresses the CPU.
Can anyone contribute another script or fix the issues with that one? Python code is preferred.
Solution found. For the CPU stress test, the repository linked above works; it needs the numba package to be installed. For the GPU stress test, the CUDA samples bundled with NVIDIA Jetson machines can be used simply and efficiently. The samples live in /usr/local/cuda/samples: choose one and compile it with sudo make. The compiled binary ends up in /usr/local/cuda/samples/bin/aarch64/linux/release (aarch64 may differ on other architectures). Run the test and watch the performance with sudo jtop in another terminal.
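Condensed into commands, the workflow above looks roughly like this; the nbody sample and its flags are just one convenient choice on my part, and any compute-heavy sample will do:

# CPU stress test: the linked repository needs numba
pip3 install numba
# GPU stress test: compile one of the bundled CUDA samples
cd /usr/local/cuda/samples/5_Simulations/nbody
sudo make
# Run the compiled binary (aarch64 may differ on other architectures)
cd /usr/local/cuda/samples/bin/aarch64/linux/release
./nbody -benchmark -numbodies=1000000
# In a second terminal, monitor CPU/GPU load and temperatures
sudo -H pip3 install jetson-stats
sudo jtop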

Cannot infer on Movidius (NCS2) using OpenVINO Workbench through Docker: Drivers setup failed?

I am trying to run some inferences using the OpenVINO Workbench Docker image https://hub.docker.com/r/openvino/workbench . Everything works well using my CPU as the target device (Configuration -> Select Environment), but I get the following error when I select my Intel Movidius Myriad X VPU (a Neural Compute Stick 2):
"Cannot infer this model on Intel(R) Movidius(TM) Neural Compute Stick 2 (NCS 2). Possible causes: Drivers setup failed. Update the drivers or run inference on a CPU." (see the attached screenshot).
I did not change the start_workbench.sh script. Here are my execution params:
./start_workbench.sh -IMAGE_NAME openvino/workbench -TAG latest -ENABLE_MYRIAD -DETACHED -ASSETS_DIR /hdd-raid0/openvino_workbench
However, I can play with the NCS2 using the classification or cross check commands provided by https://hub.docker.com/r/openvino/ubuntu18_dev.
Any ideas?
Thanks!
This is how you can use a Docker* image for Intel® Vision Accelerator Design with Intel® Movidius™ VPUs: https://docs.openvinotoolkit.org/latest/openvino_docs_install_guides_installing_openvino_docker_linux.html
Navigate to the relevant topic; you will find that a few additional steps are needed before the NCS2 can be used with Docker.
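For reference, here is a minimal sketch of what those extra steps boil down to, assuming you launch the container directly rather than through start_workbench.sh. The NCS2 enumerates as a USB device, so the container needs access to the host USB bus; the port mapping is the Workbench default and may differ in your setup, so check the linked guide for your version.

# Hedged sketch, not the official script: expose the host USB bus so the
# Myriad plugin inside the container can reach the NCS2.
docker run -it --rm --device-cgroup-rule='c 189:* rmw' -v /dev/bus/usb:/dev/bus/usb -p 5665:5665 openvino/workbench:latest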

Compile Tensorflow from source with Docker to get CPU speed up

I am looking for a way to set up or modify an existing Docker image for installing TensorFlow so that the SSE4, AVX, AVX2, and FMA instructions can be utilized for CPU speed-up. So far I have found how to install from source using bazel (How to Compile Tensorflow... and CPU instructions not compiled...), but neither of these explains how to do it within Docker. I think what I am looking for is what to add to an existing Docker image (one that installs without these options) so that you end up with a version of TensorFlow compiled with the CPU options enabled. The existing Docker images do not do this because they want the image to run on as many machines as possible. I am using Ubuntu 14.04 on a Linux PC. I am new to Docker, but I have installed TensorFlow and have it working without the CPU warnings I get when I use the Docker images. I may not need this for speed, but I have seen posts claiming the speed-up can be significant. I searched for existing Docker images that do this and could not find anything. I need this to work with GPUs, so it must be compatible with nvidia-docker.
I just found this Docker support for bazel, and it might provide an answer, but I do not understand it well enough to know for sure. I believe it is saying that you cannot build TensorFlow with bazel inside a Dockerfile; you have to build a Dockerfile using bazel. Is my understanding correct, and is this the only way to get a Docker image with TensorFlow compiled from source? If so, I could still use help on how to do it while keeping the other dependencies I would get from an existing TensorFlow Docker image.
Dockerfiles that build with CPU support can be found here.
Hope that helps! Spent many a late night here on Stack Overflow and Github Issues and stuff. Now it's my turn to give back! :)
The GPU stuff in particular is really hairy - especially when enabling the XLA/JIT/AOT stuff as well as the Graph Transform Tools.
Lots of hacks embedded in my Dockerfiles. Feel free to review and ask me questions!
The contributing guidelines mention building TensorFlow from source with Docker to run the unit tests:
Refer to the CPU-only developer Dockerfile and the GPU developer Dockerfile for the required packages. Alternatively, use the said Docker images, e.g., tensorflow/tensorflow:nightly-devel and tensorflow/tensorflow:nightly-devel-gpu, for development to avoid installing the packages directly on your system.
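To make that concrete, here is a hedged sketch of compiling with the CPU instruction flags inside the nightly-devel image. The bazel flags shown are the usual build-from-source options and may need adjusting for your TF version and CPU; for GPU support, use the -gpu image under nvidia-docker and enable CUDA in ./configure.

docker run -it tensorflow/tensorflow:nightly-devel bash
# Inside the container:
cd /tensorflow
./configure
bazel build -c opt --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl
# Optionally commit the container as a reusable image:
# docker commit <container-id> my/tensorflow:optimized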

What is the difference between Apache Mahout and PredictionIO?

What are the differences in their usage, and what was the main reason for developing PredictionIO?
From Wikipedia:
Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily in the areas of collaborative filtering, clustering and classification.
From the PredictionIO website:
PredictionIO is an open source Machine Learning Server built on top of a state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task. It is a full machine learning stack, bundled with Apache Spark, MLlib, HBase, Spray and Elasticsearch, which simplifies and accelerates scalable machine learning infrastructure management.
Apache Mahout is used to implement machine learning algorithms in a Hadoop-based environment.
PredictionIO is a full tech stack used to bring machine learning to a production environment. With PredictionIO, you can more easily build, train, and deploy algorithms. It comes with an HTTP server and a database backend. PredictionIO actually used to be built on top of Apache Mahout but switched to Apache Spark.
related:
https://www.quora.com/What-is-the-difference-between-Prediction-io-and-apache-mahout

Profiling Google Cloud Dataflow job

What would be the best way to profile a dataflow job if the scale does not permit doing so locally?
In the past, I tried using jstack to check what the Java threads are doing on the worker instances, but that doesn't seem to work anymore.
Of course I can use stopwatches and log the measured timing data, but I was hoping maybe there is a better way.
Update: The instructions here still seem to work, the only difference being that instead of installing Java with apt-get install openjdk-7-jdk, I had to download it from Oracle's site.
Thanks,
GB
As mentioned in the question, you can install jstack if you install the JDK.
We have a Github issue tracking the need for user-code profiling -- check there for progress.
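In case it helps others, the jstack route sketched above looks roughly like this; the instance name, zone, and pid are placeholders you have to look up yourself:

# SSH into a Dataflow worker VM (find the name in the Compute Engine console)
gcloud compute ssh <worker-instance-name> --zone=<zone>
# On the worker: install a JDK to get jstack (per the update above, the Oracle JDK may be needed instead of openjdk-7-jdk)
sudo apt-get install openjdk-7-jdk
# Find the worker JVM and dump its thread stacks
ps aux | grep java
# If jstack complains about permissions, run it as the user that owns the JVM
sudo jstack <pid>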
