How to access the Scala shell using a Docker image for Spark

I just downloaded this Docker image to set up a Spark cluster with two worker nodes. The cluster is up and running, but I want to submit my Scala file to it, and I am not able to start spark-shell.
When I was using another Docker image, I was able to start it with spark-shell.
Can someone please explain whether I need to install Scala separately in the image, or whether there is a different way to start it?
UPDATE
Here is the error:
bash: spark-shell: command not found
root@a7b0682ff17d:/opt/spark# ls /home/shangupta/Scripts/
ProfileData.json demo.scala queries.scala
TestDataGeneration.sql input.scala
root@a7b0682ff17d:/opt/spark# spark-shell /home/shangupta/Scripts/input.scala
bash: spark-shell: command not found
root@a7b0682ff17d:/opt/spark#

You're getting "command not found" because your PATH isn't set up correctly: Spark's bin directory (/opt/spark/bin) is not on it.
Use the absolute path /opt/spark/bin/spark-shell instead.
Also, I'd suggest packaging your Scala project as an uber JAR and submitting it with spark-submit, unless you have no external dependencies or prefer to add them manually with --packages/--jars.
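A minimal Python sketch of that advice, assuming Spark lives under /opt/spark (as the prompt above suggests) or wherever SPARK_HOME points. The helper names, main class and master URL are hypothetical. It prefers whatever is on PATH and otherwise falls back to the absolute path under bin/. It also builds a spark-shell call with -i, commonly used to preload a Scala script into the REPL, and a spark-submit call for an uber JAR using the standard --class, --master and --packages options.

```python
import os
import shutil
from pathlib import Path
from typing import List, Optional, Sequence

# Assumption: the image installs Spark in /opt/spark, as the prompt suggests.
DEFAULT_SPARK_HOME = "/opt/spark"


def find_spark_tool(name: str, spark_home: Optional[str] = None,
                    search_path: Optional[str] = None) -> str:
    """Return a runnable path for a Spark launcher such as spark-shell."""
    # search_path=None means "use the current PATH", like a normal shell lookup.
    on_path = shutil.which(name, path=search_path)
    if on_path:
        return on_path
    # Not on PATH: fall back to the absolute path under $SPARK_HOME/bin.
    home = spark_home or os.environ.get("SPARK_HOME", DEFAULT_SPARK_HOME)
    return str(Path(home) / "bin" / name)


def shell_command(script: str, spark_home: Optional[str] = None,
                  search_path: Optional[str] = None) -> List[str]:
    # -i loads the Scala file into the REPL before handing you the prompt.
    return [find_spark_tool("spark-shell", spark_home, search_path), "-i", script]


def submit_command(jar: str, main_class: str, master: str,
                   packages: Sequence[str] = (),
                   spark_home: Optional[str] = None,
                   search_path: Optional[str] = None) -> List[str]:
    cmd = [find_spark_tool("spark-submit", spark_home, search_path),
           "--class", main_class, "--master", master]
    if packages:
        # --packages expects a comma-separated list of Maven coordinates.
        cmd += ["--packages", ",".join(packages)]
    # The application JAR comes after all spark-submit options.
    return cmd + [jar]
```

For example, subprocess.run(shell_command("/home/shangupta/Scripts/input.scala"), check=True) should start the shell with input.scala loaded, even when /opt/spark/bin is missing from PATH.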

Related

Run !docker build from Managed Notebook cell in GCP Vertex AI Workbench

I am trying to push a Docker image to Google Cloud Platform's Container Registry so that I can define a custom training job directly inside a notebook.
After preparing the Dockerfile and the URI to push the image (which contains my train.py script) to, I try to build and push the image directly from a notebook cell.
The exact command I try to execute is !docker build ./ -t $IMAGE_URI, where IMAGE_URI is the environment variable I defined earlier. However I run this command, I get the error /bin/bash: docker: command not found. I also tried running it with the %%bash cell magic, through the subprocess library, and from a .sh file.
Unfortunately, none of these approaches work; they all return the same "command not found" error with exit code 127.
If I instead run the command from a bash terminal in JupyterLab, it works as expected.
Is there any workaround to make the push run inside the Jupyter notebook? I want to keep the whole custom training process in the same notebook.
If you follow this guide to create a user-managed notebook from Vertex AI Workbench and select Python 3, it comes with Docker available.
You will then be able to use Docker commands such as !docker build . inside the user-managed notebook.
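Once you are in a notebook environment that ships Docker, a small Python guard can fail early with a clear message instead of exit code 127. This is only a sketch: the function names and the gcr.io image URI used in the test are hypothetical, and it simply wraps the same docker build command from the question with subprocess.

```python
import shutil
import subprocess
from typing import List, Optional


def docker_build_command(context: str, image_uri: str) -> List[str]:
    # Same command as the notebook cell: docker build ./ -t $IMAGE_URI
    return ["docker", "build", context, "-t", image_uri]


def build_image(context: str, image_uri: str,
                search_path: Optional[str] = None) -> int:
    """Run docker build only if the kernel can actually see the docker CLI."""
    # search_path=None checks the kernel's own PATH, which a !docker cell also uses.
    if shutil.which("docker", path=search_path) is None:
        # This is the situation the shell reports as exit code 127.
        raise FileNotFoundError(
            "docker is not on this kernel's PATH; use a notebook "
            "environment that ships with Docker")
    # No check=True, so the caller can inspect docker's own exit code.
    return subprocess.run(docker_build_command(context, image_uri)).returncode
```

In a notebook cell this would be called as build_image("./", os.environ["IMAGE_URI"]) after import os.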

Deploying cgal docker

I'm trying to deploy the official CGAL Docker image. From reading the README, I understand that after downloading the specific image (e.g. I want to open a container with Ubuntu 16 + CGAL and all of its dependencies) using the following command:
docker pull cgal/testsuite-docker:ubuntu # get a specific image by replacing TAG with some tag
I need to install the CGAL library itself using the following script:
./test_cgal.py --user **** --passwd **** --images cgal-testsuite/ubuntu
Eventually I want to start the container with an interactive shell, i.e.:
docker run --rm -it -v $(pwd):/source somedocker
But I can't figure out where the image generated by the CGAL installation script ends up.
Those images are not for running CGAL. They are only images we use to define an environment for our test suite and to run tests in it, including compiling CGAL.
test_cgal.py downloads the integration branch, which rarely works, as it is the branch into which we merge our PRs to test them nightly. Don't use it to get a working CGAL. To my knowledge, there is no image like the one you are looking for, at least no official one.
Furthermore, installing CGAL at runtime in this image will not modify the image: once you close the container, your installation will be lost. If you want a "ready to use" image, you need to specify how to install CGAL in the Dockerfile of your image and then build it.
You can use the Dockerfile of the image you found as a starting point for your own, since all the dependencies should be specified in it, but you need to edit it to download CGAL, and maybe build it if you don't want the header-only version. This is not done in test_cgal.py or anywhere else in this Docker repository.

Dockerd with Chocolatey

I am using Chocolatey to install Docker.
When I first run the following command:
choco install docker
and then run "docker --version", everything goes as expected:
Docker version 17.10.0-ce, build f4ffd25
When I try to run the "dockerd" command, however, it is not found on my PATH:
'dockerd' is not recognized as an internal or external command,
Looking at the PATH variable and at the folder where Chocolatey stores the executables, docker.exe is present but dockerd.exe is not. Am I missing an option that tells Chocolatey to add dockerd as well?
The reason I need the dockerd executable is so that I can limit the number of concurrent downloads, as shown in the Docker documentation.
This is a decision that the package maintainer(s) for Docker have made. If you have a look here:
https://chocolatey.org/packages/docker#files
You will see that there is a dockerd.exe.ignore file. This file explicitly instructs Chocolatey not to create a shim for dockerd.exe; the shim is what would make dockerd runnable from the command line, the same way docker is.
My best suggestion would be to reach out to the maintainers of that package to ask them why this was done, and to perhaps get it changed. You can do this by clicking on the Contact Maintainers link on this page:
https://chocolatey.org/packages/docker
As a workaround, you could add the following folder to your Windows PATH environment variable:
C:\ProgramData\chocolatey\lib\docker\tools\docker
which should make dockerd available from the command line.
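A minimal sketch of the same workaround applied only to the current Python process and its children, assuming Chocolatey's default install root. The helper names are hypothetical; the folder is the one given above.

```python
import os
import shutil
from typing import Optional

# Folder from the answer above; assumes Chocolatey's default install root.
CHOCO_DOCKER_TOOLS = r"C:\ProgramData\chocolatey\lib\docker\tools\docker"


def with_extra_dir(path_value: str, extra_dir: str, sep: str = os.pathsep) -> str:
    """Append extra_dir to a PATH-style string unless it is already listed."""
    entries = path_value.split(sep) if path_value else []
    # normcase makes the comparison case-insensitive on Windows.
    if any(os.path.normcase(e) == os.path.normcase(extra_dir) for e in entries):
        return path_value
    return sep.join(entries + [extra_dir])


def find_dockerd() -> Optional[str]:
    # Changes PATH for this process and its children only, not for Windows itself.
    os.environ["PATH"] = with_extra_dir(os.environ.get("PATH", ""), CHOCO_DOCKER_TOOLS)
    return shutil.which("dockerd")
```

If find_dockerd() returns a path, it could then be passed to subprocess.run together with dockerd's --max-concurrent-downloads option, which is the setting the question is after.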

How to run Bazel container images on OSX?

According to the documentation at bazelbuild/rules_docker, it should be possible to work with these container images on OSX, and it also claims that it's possible to do so without docker.
These rules do not require / use Docker for pulling, building, or pushing images. This means:
They can be used to develop Docker containers on Windows / OSX without boot2docker or docker-machine installed.
They do not require root access on your workstation.
How do I do that? Here's a simple rule:
go_image(
    name = "helloworld_image",
    importpath = "github.com/nictuku/helloworld",
    library = ":go_default_library",
    visibility = ["//visibility:public"],
)
I can build the image with bazel build :helloworld_image. It produces a tarball in bazel-bin, but it won't run:
INFO: Running command line: bazel-bin/helloworld_image
Loaded image ID: sha256:08d312b529d30431c68741fd3a31468a02533f27a8c2c29eedc969dae5a39852
Tagging 08d312b529d30431c68741fd3a31468a02533f27a8c2c29eedc969dae5a39852 as bazel:helloworld_image
standard_init_linux.go:185: exec user process caused "exec format error"
ERROR: Non-zero return code '1' from command: Process exited with status 1.
It's trying to run it as Linux, but this is OSX, which is silly.
I also tried running docker load on the .tar file, but it doesn't seem to like that format:
$ docker load -i bazel-bin/helloworld_image-layer.tar
open /var/lib/docker/tmp/docker-import-330829602/app/json: no such file or directory
Help? Thanks!
By default you are building for your host platform, so you need to build for the container platform instead.
Since you are using a Go binary, you can cross-compile by specifying --cpu=k8 on the command line. Ideally we would be able to just say that the Docker image needs a Linux binary (so there would be no need to pass the --cpu flag), but this is still a work in progress in Bazel.
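"exec format error" means the Linux kernel in the container was handed a binary it cannot execute, which is what happens when a macOS (Mach-O) binary ends up in the image. One way to confirm this is to look at the leading bytes of the files in the layer tarball: ELF files start with 7f 45 4c 46, while Mach-O files built for little-endian Macs start with cf fa ed fe (64-bit) or ce fa ed fe (32-bit). A minimal Python sketch; the helper names are hypothetical.

```python
import tarfile
from typing import Dict

# Leading bytes of common executable formats as they appear on disk.
MAGIC = {
    b"\x7fELF": "ELF (Linux)",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit (macOS)",
    b"\xce\xfa\xed\xfe": "Mach-O 32-bit (macOS)",
}


def classify_magic(head: bytes) -> str:
    return MAGIC.get(head[:4], "unknown")


def formats_in_layer(tar_path: str) -> Dict[str, str]:
    """Map each recognised executable in a layer tarball to its format."""
    found = {}
    # Mode "r" transparently handles uncompressed and compressed tarballs.
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            fobj = tar.extractfile(member)
            if fobj is None:
                continue
            fmt = classify_magic(fobj.read(4))
            if fmt != "unknown":
                found[member.name] = fmt
    return found
```

Running formats_in_layer("bazel-bin/helloworld_image-layer.tar") before the fix would be expected to report a Mach-O binary; after rebuilding with --cpu=k8 it should report ELF instead.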

"docker-compose up" fails with error

I want to work on a project that uses Docker to run the app, but the docker-compose up command fails with this error:
System error: exec: "./wait_to_start": stat ./wait_to_start: no such file or directory
The wait_to_start command is an executable python script in the subfolder backend/.
I need to determine why it cannot be executed. Either it's being looked up in the wrong path, or there are permission problems, or maybe the wrong Python version is used.
Can I debug it in more detail, or log in with SSH and check the files on the virtual machine? I'm too inexperienced with Docker...
You can either set the working-directory metadata (the WORKDIR instruction in the Dockerfile) to make sure you are in the right place when the container starts, or simply call /backend/wait_to_start instead of ./wait_to_start, which removes the need to be in the proper directory.
To debug with docker-compose, I would do this:
docker-compose run --entrypoint bash <servicename>
That should give you a prompt and let you inspect the file and the working directory to see what's wrong.
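Once you have that prompt, a small Python check can go through the three suspects from the question (wrong path, permissions, interpreter) in one go. This is a sketch under assumptions: diagnose_script is a hypothetical helper, it expects a Python interpreter inside the container (likely, since wait_to_start is a Python script), and /backend is the location the answer above assumes. The CRLF check covers one common way a script's interpreter line breaks.

```python
import os
import shutil
from typing import List


def diagnose_script(command: str, workdir: str) -> List[str]:
    """List likely reasons why a script like ./wait_to_start fails to start."""
    # A relative command such as ./wait_to_start is resolved against workdir.
    path = command if os.path.isabs(command) else os.path.join(workdir, command)
    if not os.path.isfile(path):
        return [f"{path} does not exist (wrong working directory or mount?)"]
    problems = []
    if not os.access(path, os.X_OK):
        problems.append(f"{path} is not executable (try chmod +x)")
    with open(path, "rb") as f:
        first_line = f.readline()
    if not first_line.startswith(b"#!"):
        problems.append("no #! line, so the kernel does not know which interpreter to use")
        return problems
    if first_line.endswith(b"\r\n"):
        # Windows line endings: the interpreter name then ends in a stray \r.
        problems.append("#! line ends with CRLF; convert the file to LF line endings")
    parts = first_line[2:].decode(errors="replace").split()
    if not parts:
        problems.append("#! line names no interpreter")
    elif os.path.basename(parts[0]) == "env" and len(parts) > 1:
        # #!/usr/bin/env python3 style: look the interpreter up on PATH.
        if shutil.which(parts[1]) is None:
            problems.append(f"interpreter {parts[1]!r} not found on PATH")
    elif not os.path.exists(parts[0]):
        problems.append(f"interpreter {parts[0]} not found")
    return problems
```

Inside the container, print(diagnose_script("./wait_to_start", "/backend")) should return an empty list once the script can be started.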
