Are notebooks accessible on the Spark as a Service file system? - DSX

I would like to investigate if it is possible to use the git command line client using a %%sh cell so that I can work directly with project resources such as scripts and notebooks using a git client. E.g.
%%sh
git clone ... myproj
Are the DSX notebooks stored on the Spark as a Service file system? If so, what folder are they stored in?

The notebooks are managed and stored separately, and the .ipynb files are not exposed directly, because DSX needs the ability to organize notebooks within projects and a collaborative environment.
You can certainly use
%%sh
git clone https://github.com/charles2588/bluemixsparknotebooks
Since the .ipynb files are not exposed, you cannot push them from here.
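If you want to confirm this yourself, a quick check from a %%sh cell (nothing DSX-specific is assumed here) will show that no .ipynb files are present in the working directory:
%%sh
# list the working directory on the Spark service; the managed notebooks will not appear here
pwd
ls -la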
The alternative would be to use the GitHub integration and push files as explained in this thread:
http://datascience.ibm.com/blog/github-integration-available-2/
Thanks,
Charles.

Related

Do I need to share the Docker image if I can just share the Dockerfile along with the source code?

I am just starting to learn about Docker. Is a Docker repository (like Docker Hub) useful? I see the Docker image as a package of source code and environment configuration (the Dockerfile) for deploying my application. Well, if it's just a package, why can't I just share my source code with the Dockerfile (via GitHub, for example)? Then the user just downloads it all and uses docker build and docker run, and there is no need to push the Docker image to a repository.
There are two good reasons to prefer pushing an image somewhere:
As a downstream user, you can just docker run an image from a repository, without additional steps of checking it out or building it.
If you're using a compiled language (C, Java, Go, Rust, Haskell, ...) then the image will just contain the compiled artifacts and not the source code.
Think of this like any other software: for most open-source things you can download its source from the Internet and compile it yourself, or you can apt-get install or brew install a built package using a package manager.
By the same analogy, many open-source things are distributed primarily as source code, and people who aren't the primary developer package and redistribute binaries. In this context, that's the same as adding a Dockerfile to the root of your application's GitHub repository, but not publishing an image yourself. If you don't want to set up a Docker Hub account or CI automation to push built images, but still want to have your source code and instructions to build the image be public, that's a reasonable decision.
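To illustrate the difference, here is a minimal sketch of the two consumption paths (the repository and image names are hypothetical):
# Path 1: pull and run a prebuilt image from a registry
docker run --rm someuser/someapp:1.0

# Path 2: build from the published source and Dockerfile, then run
git clone https://github.com/someuser/someapp.git
cd someapp
docker build -t someapp:local .
docker run --rm someapp:local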
That is how it works: you put the configuration files in your code, i.e. the Dockerfile and docker-compose.yml.

How do you put your source code into Kubernetes?

I am new to Kubernetes, so I'm wondering what the best practices are when it comes to putting your app's source code into containers run in Kubernetes or a similar environment.
My app is in PHP, so I have PHP (FPM) and Nginx containers (running on Google Container Engine).
At first, I had a git volume, but there was no way of changing app versions like this, so I switched to an emptyDir volume and kept my source code in a zip archive in one of the images, which unzipped it into the volume on start. Now I have the source code separately in both images via git, with a separate git directory, so I have /app and /app-git.
This is good because I do not need to share or configure volumes (fewer resources and less configuration), the app's layer is reused in both images so there is no impact on space, and since it is git the "base" is built in, so I can simply adjust the command at the end of my Dockerfile and switch to a different branch or tag easily.
I wanted to download an archive with the source code directly from the repository by providing credentials as arguments during the build process, but that did not work because my repo (Bitbucket) creates archives with the last commit id appended to the directory name, so there was no way of knowing what unpacking the archive would result in. So I got stuck with git itself.
What are your ways of handling the source code?
Ideally, you would use continuous delivery patterns, which means using Travis CI, Bitbucket Pipelines or Jenkins to build the image on code change.
That is, every time your code changes, your automated build is triggered and builds a new Docker image containing your source code. Then you can trigger a Deployment rolling update to update the Pods with the new image.
If you have dynamic content, you likely put this in persistent storage, which will be re-mounted on Pod update.
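As a rough sketch of such a pipeline step (the registry, image and Deployment names are placeholders):
# build an image that bakes in the current source, push it, then roll it out
docker build -t registry.example.com/myapp:v1.3 .
docker push registry.example.com/myapp:v1.3
# update the container image in the Deployment to trigger a rolling update
kubectl set image deployment/myapp myapp=registry.example.com/myapp:v1.3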
What we've done traditionally with PHP is an overlay at runtime. Basically, the container has a volume mounted to it with deploy keys to your git repo. This allows you to perform git pull operations.
The more buttoned-up approach is to have custom, tagged images of your code, extended from fpm or whatever image you're using. That way you would run version 1.3 of YourImage, where YourImage contains version 1.3 of your application code.
Try to leverage continuous integration and continuous deployment. You can use Jenkins as the CI/CD server and create jobs for building, pushing and deploying the image.
I recommend putting your source code into the Docker image instead of a git repo. You can also extract configuration files from the Docker image: Kubernetes v1.2 provides a new feature, ConfigMap, so we can put configuration files in a ConfigMap. When a Pod runs, the configuration files are mounted automatically. It's very convenient.
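For example, assuming a config file at config/app.ini (the names here are hypothetical), you could create the ConfigMap like this and then mount it as a volume in the Pod spec:
# create a ConfigMap from a file; when mounted as a volume, the file
# shows up at the mount path inside the container when the Pod starts
kubectl create configmap myapp-config --from-file=config/app.ini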

OpenShift S2I build strategy from multiple data sources

A web application typically consists of code, config and data. Code can often be made open source on GitHub, but per-instance config and data may contain secrets and are therefore inappropriate to save in GitHub. Data can be imported into persistent storage, so disregard it for now.
Assuming the configs are file based and are saved in another private, secured SVN repo: in order to deploy the web app to OpenShift and implement CI, I need to merge the config files with the code prior to running build scripts. In addition, the build strategy should support GitHub webhooks for automated builds.
My questions are, to be more specific:
Does the OpenShift BuildConfig support multiple data sources, especially from SVN?
If not, how do I deploy such a web app to OpenShift?
The solution I came up with so far:
Instead of relying on OpenShift for CI, use Jenkins instead.
Merge config files with code using Jenkins.
Instead of using the Git source type in the BuildConfig, use the binary source type.
Have Jenkins run
oc start-build --from-dir=<directory>
where <directory> contains merged code/config
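A rough sketch of such a Jenkins shell step, assuming hypothetical SVN repo and BuildConfig names and that Jenkins has checked the code out into src/:
# export the config from the private SVN repo and overlay it onto the checked-out code,
# then feed the merged directory to the binary BuildConfig
svn export https://svn.example.com/myapp-config config-export
cp -r config-export/. src/
oc start-build myapp-build --from-dir=src --follow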

Repository manager that manages binary DLL files (embedded C/C++ project artifacts) and integrates with Jenkins

Is there any repository manager that manages binary DLL files and also integrates well with Jenkins?
Can Nexus be used to manage DLL files? These files are created as part of embedded C/C++ projects, and I am not sure whether Nexus Artifact Manager supports or integrates well with such projects, as it mainly supports Java projects.
Is there a way to automatically manage the upload and download of such project artifacts from Nexus or other artifact managers without the use of a POM file?
Please also suggest other artifact managers that support binary artifacts, if there are any.
Artifactory can be used to store any type of binaries.
Starting with Artifactory 4.0, you can create generic repositories which allows uploading packages of any type. You will not need to upload any POM files and Artifactory will not need to calculate any metadata (for example Maven metadata).
To deploy files you can use the REST API or the UI, for example:
curl -uUSER:PASS -T file.dll http://localhost:8081/artifactory/dll-local/path/to/file.dll
If you have a certain layout you would like to use for this repository you can create a custom layout and associate it with the repository. This can be useful for automatic snapshot/integration versions cleanup and other module management tasks.
Disclaimer: I'm affiliated with Artifactory
The Nexus repository manager is Java oriented, but it can be used to store any files you want: binaries of all types, or even just text configuration files.
To automate the file upload process, you can use maven from command line:
mvn deploy:deploy-file -DgroupId=com.you -DartifactId=file -Dversion=1.0 -Dpackaging=exe -Dfile=c:\out\file.exe -Durl=http://yourserver/nexus/content/repositories/releases -DrepositoryId=releases
Then, to get the file, you should be able to get it directly with the following URL:
wget http://yourserver/nexus/content/repositories/releases/com/you/file/1.0/file-1.0.exe
This is a simple approach to using Nexus as a general artifact repository.
I hope this helps.
The open source version of Nexus (Nexus OSS) supports many repository formats out of the box, including Maven, NuGet, npm, RubyGems and others. Nexus just runs on Java (like Jenkins, for example); it is not Java only.
Depending on how you plan to get the DLL files from the repository, different formats might be more or less suited to your usage. You could even use a custom format, but then you rely on custom tools.
The scenarios I have seen at many customers are:
using a Maven repository and pulling the files in a Maven build together with the Maven NAR Plugin (used for native development with C/C++)
using a Maven repository and pulling via plain HTTP GET calls from your scripting language or build tool of choice
using the NuGet format, storing the DLLs in NuGet packages in the repository, and using NuGet to retrieve them for the projects (see the sketch after this list)
All of these work well.
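As referenced above, a sketch of the NuGet path (the package name is hypothetical, and the push URL is an assumption that depends on your Nexus version and repository setup):
# package the DLL according to a .nuspec file, then push it to the hosted NuGet repo
nuget pack MyLib.nuspec
nuget push MyLib.1.0.0.nupkg -Source http://yourserver/repository/nuget-hosted/ -ApiKey <your-api-key>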

Is it possible to merge changes on a server using Jenkins?

I am looking for a way to mirror a full local directory to a remote server using Jenkins. It is easy to use an FTP plugin to delete the whole remote directory and re-upload all the files, but I would like to only upload new/changed files and remove the deleted files.
Is it possible to do that using Jenkins? Or maybe with some other automation tool?
On Unix or Linux you can run rsync with the two directories as parameters, either on the local or on the remote host.
Just make sure you are not in the middle of some other operation while rsync runs.
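For example, to mirror a local directory to a remote host so that only changed files are transferred and files deleted locally are also removed remotely (the paths and host are placeholders):
# -a preserves permissions and timestamps, -v is verbose, -z compresses during transfer,
# --delete removes files on the remote side that no longer exist locally
rsync -avz --delete /path/to/local/dir/ user@remote.example.com:/path/to/remote/dir/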
