How to install a working Intel Data Science Environment with Graphviz

Recently, I needed to explore Intel's DAAL (Data Analytics Acceleration Library) and MKL for data science and had difficulty finding all the pieces of a working installation in one place. After several days of trial and error, I arrived at an installation process that I think will benefit other data science enthusiasts looking to get started with their data science adventures, utilizing Visual Studio Code or JupyterLab. Posted below are my recommended steps to get a working environment on Windows 10.

1. Download and install the latest version of Anaconda: https://www.anaconda.com/distribution/
2. Download and install the latest Graphviz installer from their website: https://graphviz.gitlab.io/download/ (in my case, graphviz-2.38.msi was the current version)
a. Install the Graphviz MSI for all users
b. Navigate to Environment Variables: https://t.ly/Gz359
c. Add two new entries for Graphviz to the PATH environment variable (in my case: C:\Program Files (x86)\Graphviz2.38\bin\ and C:\Program Files (x86)\Graphviz2.38\bin\dot.exe)
d. Close all command and environment windows
e. Open a new cmd window and test for the existence of Graphviz: c:\Users\MyDrive>dot -v
i. You should get a report of the version and other info (if it fails, check the environment path entry and possibly repair your Graphviz installation)
ii. Ctrl-C to close the report
iii. Close the cmd window
iv. Reboot your PC
3. Open the Anaconda Command Prompt as Administrator and remain in the (base) environment:
a. In Windows 10, search for "anaconda" and select the Anaconda Command Prompt:
i. Right-click it and select Run as Administrator
b. Navigate to the root of the (base) environment:
i. cd\
c. Get a current list of existing environments:
i. conda env list
d. Remove any unwanted environments:
i. conda env remove -n OldenvironmentName
e. Create the new environment for Intel Data Science (ids) with the most current conda libraries and a supported Python 3.x:
i. conda create -n ids python=3 numpy pandas seaborn matplotlib scikit-learn daal4py jupyterlab -y
f. Activate the new environment:
i. conda activate ids
g. Install Graphviz with pip:
i. pip install graphviz
h. Install Python support for Graphviz:
i. conda install pydot python-graphviz -y
i. Check that dot is accessible via cmd prompt:
i. dot -v
j. Ctrl-C to close the report
k. Register the following for the Intel data science enhancements (this enables the daal4py patch for scikit-learn):
i. set USE_DAAL4PY_SKLEARN=YES
ii. python -c "import sklearn"
l. Reboot your PC
When you return to your desktop, you will be ready to use your new environment for your data science work.
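As a quick sanity check after the reboot, a sequence like the following (my own sketch, not part of the steps above) confirms that Graphviz and the daal4py-patched scikit-learn are both reachable from the ids environment:

conda activate ids
set USE_DAAL4PY_SKLEARN=YES
dot -V
python -c "import graphviz, sklearn; print(graphviz.__version__, sklearn.__version__)"

Note that dot -V (capital V) prints the version and exits, unlike dot -v, which enters verbose mode and waits for input (hence the Ctrl-C in the steps above).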

Related

Why are there different versions of OpenCV packages on the Anaconda website? Which one should I install?

When searching for a package on Anaconda Cloud, there are often multiple commands one could use to install it. For example,
conda install -c conda-forge xxx
conda install -c conda-forge/label/gcc7 xxx
conda install -c conda-forge/label/cf201901 xxx
What's the difference between them?
Labels
Channel maintainers have the option to add labels to their package builds. Anaconda Cloud suggests using labels as a tool for organizing the development cycle. What the labels mean is entirely up to the channel maintainer, so there's no general answer that covers them all. When a label isn't provided, the default label main is assigned.
Only advanced users should ever need to use a label. Most users should simply use the default specification:
conda install -c conda-forge xxx
Example: gcc7
Let's look at a specific use case taken from your example. The gcc7 label is used by the Conda Forge channel maintainers to designate packages that have been compiled under a different toolchain than the packages they provide under their main label. This gcc7 toolchain is designed to more closely match the one used by the official channels (what you'd get from -c defaults) and thereby yield compatible binaries. You can read all about it in this issue on the Conda Forge repo.
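Since main is the default label, the bare channel spec and the explicit label spec are interchangeable (xxx standing in for any package name, as in the question):

conda install -c conda-forge xxx
conda install -c conda-forge/label/main xxx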

Link hadoop installed with sdkman to brew

I installed Hadoop with sdkman and now I'm trying to install Hive with Homebrew, but brew wants to install Hadoop again because it doesn't know Hadoop is already installed on my computer.
I use the --ignore-dependencies flag as a workaround, but it's not best practice.
Do you know how I can link my Hadoop installation done with sdkman to brew?
It is not possible to use a non-Homebrew hadoop with Homebrew hive, see https://docs.brew.sh/Building-Against-Non-Homebrew-Dependencies
To improve quality and reduce variation, Homebrew now exclusively supports using the default formula, as an ordinary dependency, and no longer supports using arbitrary alternatives.
You will have to install Hive manually: https://cwiki.apache.org/confluence/display/hive/gettingstarted#GettingStarted-InstallingHivefromaStableRelease
Installing Hive from a Stable Release
Start by downloading the most recent stable release of Hive from one of the Apache download mirrors (see Hive Releases).
Next you need to unpack the tarball. This will result in the creation of a subdirectory named hive-x.y.z (where x.y.z is the release number):
$ tar -xzvf hive-x.y.z.tar.gz
Set the environment variable HIVE_HOME to point to the installation directory:
$ cd hive-x.y.z
$ export HIVE_HOME=$(pwd)
Finally, add $HIVE_HOME/bin to your PATH:
$ export PATH=$HIVE_HOME/bin:$PATH
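Hive will still need to find your sdkman-managed Hadoop at runtime, so point HADOOP_HOME at it before starting Hive. A sketch, assuming sdkman's usual candidates layout (adjust the path to your setup):

$ export HADOOP_HOME=~/.sdkman/candidates/hadoop/current
$ hive --version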

Mamba installing a package into wrong environment

The background is, I'm responsible for maintaining a fancy Docker image that is used by our team for analytics. It uses a Jupyter notebook image as the base, and then adds various customisations, extra packages, etc.
One of the team members recently wanted to run Tensorflow. No problem, I'll just run mamba install and add it to the image. However, this created an issue: Tensorflow 2.4.3 (the latest version) is somehow incompatible with R 4.1.1 (also the latest version), or something else in the ecosystem, causing R to be downgraded to 3.6.3. So I created a new environment and installed TF into that:
FROM hongooi/jupytermodelrisk:1.0.0
RUN mamba create -n tensorflow --clone base
# Make RUN commands use the new environment
RUN echo "conda activate tensorflow" >> ~/.bashrc
SHELL ["/bin/bash", "--login", "-c"]
RUN mamba install -y 'tensorflow=2.4.3'
But when I rebuilt the image, I found that while the tensorflow env had been created, the Tensorflow package had been installed into the base env, not the tensorflow env. Has anyone else encountered this? I can verify, if I login to the container, that the tensorflow env has been created: it just doesn't contain the Tensorflow package.
I don't get this problem if I run the create, activate and install commands from inside the container. It's only when I try to do it in the Dockerfile.
I use mamba instead of conda because the latter takes forever to run, given the number of packages installed. In fact, trying to run conda install tensorflow crashes after ~5 hours.
Not an expert on Dockerfiles, but in general you can just pass the -n flag to the install command to specify the target environment for the installation, like so:
mamba install -n tensorflow -y tensorflow=2.4.3
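Applied to the Dockerfile above, that might look like the following sketch (the bashrc activation lines are no longer needed for the install itself):

FROM hongooi/jupytermodelrisk:1.0.0
RUN mamba create -n tensorflow --clone base && \
    mamba install -n tensorflow -y 'tensorflow=2.4.3'

As for why the original failed: a likely culprit is that bash --login sources ~/.bash_profile rather than ~/.bashrc, so the conda activate line appended to ~/.bashrc never runs during the build, and mamba falls back to the base environment.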

Plotly shows blank graphs in AWS Sagemaker JupyterLab

Background: I am new to the Python world and am using Plotly to create basic graphs in Python, working in AWS SageMaker's JupyterLab.
Issue: I have been trying to run the basic code samples from Plotly's website; however, even those return blank graphs.
Issue Resolution Tried by myself:
pip installed plotly version 4.6.0
Steps mentioned on https://plotly.com/python/getting-started/ for JupyterLab support have already been executed
Code Example:
import plotly.graph_objects as go
fig = go.Figure(data=go.Bar(y=[2, 3, 1]))
fig.show()
I recently had the same issue. A simple change suggested here helped me. I know this is a temporary workaround until a proper fix is found.
# fig = go.Figure()
fig = go.FigureWidget()  # replace with this
# fig.show()
fig  # remove .show()
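Applied to the question's example, the full cell would be (same data, just swapping Figure for the widget):

import plotly.graph_objects as go

# FigureWidget renders through ipywidgets, which older JupyterLab builds support
fig = go.FigureWidget(data=go.Bar(y=[2, 3, 1]))
fig  # displaying the widget as the cell's last expression draws the chart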
SageMaker notebook instances are, for some reason, using jupyterlab==1.2.21 (as of Jan 2022). You can verify that by running pip freeze | grep lab from the terminal, or !pip freeze | grep lab from a notebook.
According to the documentation, you'll need to install the following jupyterlab extensions (which are not needed if sagemaker was running jupyterlab 3):
jupyterlab-plotly
@jupyter-widgets/jupyterlab-manager
You can install those on an up-and-running instance by running
jupyter labextension install jupyterlab-plotly@5.5.0 @jupyter-widgets/jupyterlab-manager
in the terminal or notebook (prefixed with ! if you are running it in the notebook, of course). Notice that the jupyterlab-plotly extension version (here 5.5.0) should match the plotly version you are installing; mismatches may cause issues. In this case my plotly version is 5.5.0, and thus that's also the jupyterlab-plotly version I've installed.
If you need, like I did, to have it ready upon spinning up a notebook instance, you'll need to:
Create a lifecycle configuration script
To it, add:
PATH=$PATH:/home/ec2-user/anaconda3/envs/JupyterSystemEnv/bin - to ensure the nodejs path, which is needed for the extension installation
pip install plotly==5.5.0 - to pin a specific version
jupyter labextension install jupyterlab-plotly@5.5.0 @jupyter-widgets/jupyterlab-manager - to keep the versions matched
Of course, you can change the versions to the most up-to-date ones.
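Assembled into a single on-start lifecycle script, that might look like this sketch (paths assume the stock SageMaker JupyterSystemEnv, as above):

#!/bin/bash
set -e
# make nodejs visible; labextension install needs it
export PATH=$PATH:/home/ec2-user/anaconda3/envs/JupyterSystemEnv/bin
pip install plotly==5.5.0
jupyter labextension install jupyterlab-plotly@5.5.0 @jupyter-widgets/jupyterlab-manager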
I think that documentation is not up to date. You now need to install the jupyterlab-plotly extension.
jupyter labextension install jupyterlab-plotly
UPDATE
I followed a mix of instructions here and here.
First, enable the Extension Manager from JupyterLab,
then, from the terminal:
conda install -c conda-forge "nbformat" "ipywidgets>=7.5" -y
jupyter labextension install jupyterlab-plotly
jupyter labextension install @jupyter-widgets/jupyterlab-manager plotlywidget
And within your environment
conda install nbformat
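To confirm everything registered, list the installed extensions and check the package versions; both extensions should show as enabled:

jupyter labextension list
python -c "import plotly, nbformat; print(plotly.__version__, nbformat.__version__)"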

How to install YugaByte on Docker for Windows

The instructions at https://docs.yugabyte.com/latest/quick-start/docker/install/ state that Docker for Windows is supported; however, the yb-docker-ctl utility in the step that follows seems to be a *nix app and does not run on Windows 10 Pro. How do I install a 3-node local YugaByte cluster on Docker for Windows? (By the way, Stack Overflow would not let me add a YugaByte tag to the question; I could only add Docker.)
The yb-docker-ctl utility is actually a Python 2 script that will run on Windows 10 Pro if you have Python 2 installed. I prefer to use Chocolatey (https://chocolatey.org) to manage my package installations, so you could install Python 2 (the python2 package, not python, as that defaults to Python 3) using choco install python2 from PowerShell or CMD. You can also install wget in the same manner.
You will then need to make a couple of changes to yb-docker-ctl. The script uses os.path.join, which defaults to the Windows \\ path separator. Add the line import posixpath after line 10 of yb-docker-ctl and substitute posixpath.join for os.path.join at lines 227 and 377.
After you have made those modifications, you can run python yb-docker-ctl create to create your 3 node cluster.
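To see why that substitution matters, here is a standalone demo (not code from yb-docker-ctl itself):

import os.path
import posixpath

base = "/mnt/yb"
# On Windows, os.path.join mixes separators, which breaks paths inside Linux containers:
print(os.path.join(base, "node-1"))    # /mnt/yb\node-1 on Windows
# posixpath.join always uses '/', which is what the containers expect:
print(posixpath.join(base, "node-1"))  # /mnt/yb/node-1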
