I am using a face recognition model based on TensorFlow. On my local machine (Ubuntu 14.04) everything works.
When I deploy it using Docker, I get the following error:
DataLossError: Unable to open table file /data/model/model.ckpt-80000: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you
need to use a different restore operator?
I am using the Python implementation of TensorFlow.
The model is in the old 11.* format (model.meta & model.ckpt-80000), while the TensorFlow Python version is 12.*. That shouldn't be a problem, as that's the configuration on my local machine, as well as on the machine I took the model from.
The versions of tensorflow, numpy and protobuf are identical on my machine and in the Docker container.
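For context, the restore itself follows the usual pattern for this checkpoint format; a minimal sketch (my actual loading code may differ in details):

import tensorflow as tf

with tf.Session() as sess:
    # rebuild the graph from the .meta file, then restore the weights
    saver = tf.train.import_meta_graph('/data/model/model.meta')
    saver.restore(sess, '/data/model/model.ckpt-80000')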
Any advice?
UPDATE
I created a small script that runs perfectly on my machine. Then I ran the same script on the deployed virtual machine (an AWS instance), but NOT in Docker. It also failed with the same error.
The deployed machine runs Ubuntu 16.04.
It seems I was dealing with a corrupted file.
I'm trying to run the code from this repository: https://github.com/danielgordon10/thor-iqa-cvpr-2018
It has the following requirements:
Python 3.5
CUDA 8 or 9
cuDNN
Tensorflow 1.4 or 1.5
Ubuntu 16.04, 18.04
an installation of darknet
My system satisfies none of these. I don't want to reinstall tf/cuda/cudnn on my machine, especially if I have to do that every time I try to run deep learning code with different TensorFlow requirements.
I'm looking for a way to install the requirements and run the code regardless of the host.
To my knowledge that is exactly what Docker is for.
Looking into this, there exist Docker images from NVIDIA, for example one called "nvidia/cuda:9.1-cudnn7-runtime". Based on the name I assumed that any image built with this as the base comes with CUDA installed. This does not seem to be the case: if I try to install darknet, it fails with the error that "cuda_runtime.h" is missing.
So what my question basically boils down to is: how do I keep multiple different versions of CUDA and TensorFlow on the same machine? Ideally with Docker (or similar), so I won't have to go through the process too many times.
It feels like I'm missing and/or don't understand something obvious, because I can't imagine it can be so hard to run TensorFlow code with different version requirements without reinstalling things from scratch all the time.
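For reference, once I do have a container, this is the kind of quick check I'd run inside it to confirm which TensorFlow build and GPU support it actually has (a minimal sketch, assuming a TensorFlow 1.x install in the container):

import tensorflow as tf

print(tf.__version__)
print("built with CUDA:", tf.test.is_built_with_cuda())
print("visible GPU device:", tf.test.gpu_device_name() or "none")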
My Python project is very Windows-centric; we want the benefits of containers, but we can't give up Windows just yet.
I'd like to be able to use the Dockerized remote Python interpreter feature that comes with IntelliJ. This works flawlessly with Python running in a standard Linux container, but appears not to work at all for Python running in a Windows container.
I've built a new image based on a standard Microsoft Server Core image. I've installed Miniconda, bootstrapped a Python environment, and verified that I can start an interactive Python session from the command prompt.
Whenever I try to set this up, I get an error message: "Can't retrieve image ID from build stream". This occurs at the moment when IntelliJ would normally have detected the Python interpreter and its installed libraries.
I also tried giving the full path for the interpreter: c:\miniconda\envs\htp\python.exe
I've never seen any mention in the documentation that this works, but nor have I seen any mention that it does not. I fully accept that Windows containers are an oddity, so it's entirely possible that IntelliJ's remote-Python feature was never tested against Python running in Windows containers.
So, has anybody got this feature working with Python running on a Windows container yet? Is there any reason to believe that it does or does not work?
Regrettably, it is not supported yet. Please vote for the feature request https://youtrack.jetbrains.com/issue/PY-45222 in order to increase its priority.
I'm trying to train a PyTorch model on Amazon SageMaker Studio.
It works when I use an EC2 instance for training with:
from sagemaker.pytorch import PyTorch

estimator = PyTorch(entry_point='train_script.py',
                    role=role,
                    sagemaker_session=sess,
                    train_instance_count=1,
                    train_instance_type='ml.c5.xlarge',
                    framework_version='1.4.0',
                    source_dir='.',
                    git_config=git_config,
                    )
estimator.fit({'stockdata': data_path})
and it works in local mode in a classic SageMaker notebook (non-Studio) with:
estimator = PyTorch(entry_point='train_script.py',
                    role=role,
                    train_instance_count=1,
                    train_instance_type='local',
                    framework_version='1.4.0',
                    source_dir='.',
                    git_config=git_config,
                    )
estimator.fit({'stockdata': data_path})
But when I run the same code (with train_instance_type='local') on SageMaker Studio, it doesn't work and I get the following error: No such file or directory: 'docker': 'docker'
I tried to install docker with pip install, but the docker command is not found when I use it in the terminal.
This indicates that there is a problem finding the Docker service.
By default, Docker is not installed in SageMaker Studio (confirming the GitHub ticket response).
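A quick way to confirm this from a Studio notebook is to check whether a docker binary is on the PATH at all:

import shutil

# Returns None in a default Studio kernel, i.e. no Docker CLI is installed.
print(shutil.which("docker"))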
Adding more information to an almost two-year-old question.
SageMaker Studio does not natively support local mode. Studio apps are themselves Docker containers, and therefore they would require privileged access to be able to build and run Docker containers.
As an alternative solution, you can create a remote Docker host on an EC2 instance and set up Docker on your Studio app. There is quite a bit of networking and package installation involved, but this solution will enable you to use full Docker functionality. Additionally, as of version 2.80.0 of the SageMaker Python SDK, local mode is supported when you are using a remote Docker host.
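As a rough sketch of what that can look like once the remote daemon is reachable (the host address below is hypothetical, pointing the Docker client at it via DOCKER_HOST is just one way to wire it up, and role/data_path are the same variables as in the question):

import os
from sagemaker.pytorch import PyTorch

# Hypothetical address of the EC2 instance running the remote Docker daemon;
# the Docker tooling that local mode shells out to honours DOCKER_HOST.
os.environ["DOCKER_HOST"] = "tcp://10.0.0.12:2375"

estimator = PyTorch(entry_point="train_script.py",
                    role=role,                 # execution role, as in the question
                    instance_count=1,
                    instance_type="local",     # local mode, executed on the remote host
                    framework_version="1.4.0",
                    py_version="py3",
                    source_dir=".")
estimator.fit({"stockdata": data_path})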
sdocker, a SageMaker Studio Docker CLI extension (see this repo), can simplify deploying the above solution in two simple steps (it only works for Studio domains in VPCOnly mode), and it has an easy-to-follow example here.
UPDATE:
There is now a UI extension (see repo) which can make the experience much smoother and easier to manage.
I have built a docker image with several packages loaded for a development environment.
I plan to use the image to make porting the environment to various machines simple.
In a container of the image, I can build my binary (using cmake & g++) on any of the machines I have loaded the Docker image onto. The binary has executed well within such containers on most machines.
But on one machine, executing the binary in the container results in a core dump with "illegal instruction" reported.
The crash happens on a machine with Intel Xeon CPUs. But it runs fine on another Xeon machine. It also runs fine on an AMD Ryzen machine.
I haven't tried executing the binary outside the container because the environment is so hard to set up, which is why I'm using Docker.
I just wonder if anyone has seen this happen and how to start trying to resolve it?
If it helps and anyone is familiar with it, the base image I pulled from Docker Hub, to add other packages to, is floopcz/tensorflow_cc:ubuntu-shared. It is an Ubuntu image with the TensorFlow C++ API built for CPU use only (not CUDA).
The binary that's crashing does attempt to open a Tensorflow session before doing anything else.
I'm running Docker 19.03 on Ubuntu 16.04 and 18.04. The image has Ubuntu 18.04 loaded.
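In case it helps narrow things down, one guess on my part is an instruction-set mismatch (e.g. AVX/AVX2), since that is a common cause of "illegal instruction"; a quick Linux-only check I can run on both the working and failing hosts would be:

# Compare the CPU feature flags reported on the working and failing machines.
def cpu_flags():
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
print({name: name in flags for name in ("sse4_2", "avx", "avx2", "avx512f", "fma")})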
I have a Python application with the following components:
1. Database
2. Python app on top of Anaconda
3. Linux OS
The idea is to dockerize these three components into isolated containers and then link them together at run time.
It's clear to me how to link the database image with the Linux image, but how can I combine Anaconda and Linux? Isn't Anaconda supposed to be installed on a Linux system?
You will only have two containers. Both your database and your Python app presumably need a Linux OS of one flavor or another. In your Dockerfile you would start with something like FROM ubuntu to pull in a base image and make your changes on top of it. Thanks to the diff-based filesystem, your changes are layered on top of the base image.
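If it helps to see the two-container layout concretely, here is a rough sketch using the Docker SDK for Python (pip install docker); the image names and environment variables are only placeholders for your own database and app images:

import docker

client = docker.from_env()

# A user-defined bridge network lets the app container reach the database by name.
client.networks.create("appnet", driver="bridge")

db = client.containers.run("postgres:13",                  # placeholder database image
                           name="db",
                           network="appnet",
                           environment={"POSTGRES_PASSWORD": "example"},
                           detach=True)

app = client.containers.run("my-python-app:latest",        # placeholder image built from your Dockerfile
                            name="app",
                            network="appnet",
                            environment={"DB_HOST": "db"},  # the app resolves the database by container name
                            detach=True)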