How to verify all the dependencies are installed in SageMaker?

I am creating a SageMaker endpoint and loading a pretrained model from an S3 bucket. The model.tar.gz file has the directory structure documented here: https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#model-directory-structure
model.tar.gz/
|- model.pth
|- code/
   |- inference.py
   |- requirements.txt  # only for versions 1.3.1 and higher
I have put a few dependencies in requirements.txt. Is there a way to verify that all the dependencies were installed correctly?

It is not possible to get shell access or SSH into the machine that runs your deployment, so one way is to assert the versions of your dependencies in the model_fn inside inference.py, something like below.
If your requirements.txt looks like this:
numpy==1.20.3
pandas==1.3.4
get the versions and assert them in `model_fn` like below:
import os

### your other code ###

def model_fn(model_dir):
    # assuming you have numpy and pandas in requirements.txt
    assert os.popen("python3 -m pip freeze | grep -E 'numpy|pandas'").read() == 'numpy==1.20.3\npandas==1.3.4\n'
    ### your other code ###
    return xxxx

### your other code ###
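If you prefer not to shell out to pip, a minimal alternative sketch (assuming the container runs Python 3.8+; the pinned versions are copied from the example requirements.txt above) can query installed versions directly with importlib.metadata:

from importlib.metadata import version, PackageNotFoundError

# Pinned versions copied from the example requirements.txt above.
PINNED = {"numpy": "1.20.3", "pandas": "1.3.4"}

def check_dependencies():
    # Fail fast if anything is missing or at the wrong version.
    for name, expected in PINNED.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            raise RuntimeError(f"{name} is not installed")
        if installed != expected:
            raise RuntimeError(f"{name}: expected {expected}, got {installed}")

Calling check_dependencies() at the top of model_fn makes the endpoint fail at load time, so the mismatch shows up in the endpoint's CloudWatch logs instead of at inference time.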

Related

Conda: how to add packages to environment from log (not yaml)?

I'm doing an internship (= yes I'm a newbie). My supervisor told me to create a conda environment. She passed me a log file containing many packages.
A quick qwant.com search shows me how to create envs via
conda env create --file env_file.yaml
The file I was given, however, is NOT a YAML file; it is structured like so:
# packages in environment at /home/supervisors_name/.conda/envs/pancancer:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
bedtools 2.29.2 hc088bd4_0 bioconda
blas 1.0 mkl
bzip2 1.0.8 h7b6447c_0
The file contains 41 packages (44 lines including the comments above); for simplicity I'm showing only the first few.
Apart from adding the env name (see 2. below), is there a way to use the file as it is to generate an environment with the packages?
I ran the command
conda env create --file supervisors.log.txt
and got:
SpecNotFound: Environment with requirements.txt file needs a name
Where in the file should I put the name?
Alright, so it seems that they gave you the output of conda list rather than the .yml file produced by conda with conda env export > myenv.yml. Therefore you have two solutions:
You ask for the proper file and then proceed to install the env with conda's built-in pipeline.
If you do not have any access to the proper file, you could do one of the following:
i) Parse it with Python into a proper .yml file (see the sketch below) and then do the conda procedure.
ii) Write a bash script that installs the packages listed in the file she gave you.
This is how I would proceed, personally :)
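For option i), here is a minimal Python sketch. The file name supervisors.log.txt, the env name pancancer (taken from the path in the log's header), and the channel list are all assumptions; adjust them to your case.

# Convert `conda list` output into an environment.yml that
# `conda env create --file environment.yml` accepts.
specs = []
with open("supervisors.log.txt") as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip the comment header and blank lines
        name, version = line.split()[:2]  # columns: Name Version Build [Channel]
        specs.append(f"  - {name}={version}")

with open("environment.yml", "w") as out:
    out.write("name: pancancer\n")
    out.write("channels:\n  - defaults\n  - bioconda\n")
    out.write("dependencies:\n")
    out.write("\n".join(specs) + "\n")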
Because there is no other SO post on this error, for people of the future: I got this error just because I named my file conda_environment.txt instead of conda_environment.yml. Looks like the .yml extension is mandatory.

How to revise an existing kernel package in OpenWrt?

I want to make some revisions to the code of package/kernel/mac80211.
I am new to OpenWrt, and after some research I think I should change PKG_SOURCE_URL to my own GitHub repository, which holds my own copy of /linux/kernel/projects/backports/stable/v4.19.120.
So I changed package/kernel/mac80211/Makefile as follows:
PKG_SOURCE_PROTO:=git
PKG_SOURCE_URL:=https://github.com/sheep94lion/openwrt.git
PKG_SOURCE_VERSION:=168bae33318ebd14d8c035b543a2583ea31f9f52
PKG_MIRROR_HASH:=skip
# PKG_SOURCE_URL:=@KERNEL/linux/kernel/projects/backports/stable/v4.19.120/
# PKG_HASH:=2bafd75da301a30a5f2b98f433b6545d7b58c1fc3af15e9e9aa085df7f9db1d4
My question is: am I in the right direction? What is the right/proper way to revise an existing kernel package?
I copied the source files to the package/kernel/mac80211/src directory (creating the directory first), and then revised the Makefile to use the local source files instead of downloading and unpacking the tarball from the official URL.
The revisions to the Makefile are as follows:
# comment out the configurations that download the tarball.
# PKG_SOURCE_URL:=@KERNEL/linux/kernel/projects/backports/stable/v4.19.120/
# PKG_HASH:=2bafd75da301a30a5f2b98f433b6545d7b58c1fc3af15e9e9aa085df7f9db1d4
# PKG_SOURCE:=backports-$(PKG_VERSION).tar.xz
......
define Build/Prepare
	rm -rf $(PKG_BUILD_DIR)
	mkdir -p $(PKG_BUILD_DIR)
	# do not unpack the downloaded tarball.
	# $(PKG_UNPACK)
	# instead, copy the files under src/ to the build directory.
	$(CP) ./src/* $(PKG_BUILD_DIR)/
	$(Build/Patch)
endef
When I want to release the code, I think I should ship my changes as patches.

CircleCI: python -t flag when running tests does not work

I have this run step in my circle.yaml file with no checkout or working directory set:
- run:
    name: Running dataloader tests
    command: venv/bin/python3 -m unittest discover -t dataloader tests
The problem with this is that the working directory from the -t flag does not get set: I get ModuleNotFoundError when the tests try to import the assertions folder inside the dataloader directory.
My tree:
├── dataloader
│   ├── Dockerfile
│   ├── Makefile
│   ├── README.md
│   ├── __pycache__
│   ├── assertions
But this works:
version: 2
defaults: &defaults
  docker:
    - image: circleci/python:3.6
jobs:
  dataloader_tests:
    working_directory: ~/dsys-2uid/dataloader
    steps:
      - checkout:
          path: ~/dsys-2uid
      ...
      - run:
          name: Running dataloader tests
          command: venv/bin/python3 -m unittest discover -t ~/app/dataloader tests
Any idea as to what might be going on?
Why doesn't the first one work with just using the -t flag?
What do working_directory and checkout with a path actually do? I don't even know why my solution works.
The exact path to the tests folder from the top has to be specified for discovery to work, for example: python -m unittest discover src/main/python/tests. That must be why it's working in the second case.
It's most likely a bug in unittest discovery, where discovery works when you explicitly specify a namespace package as the target, but it does not recurse into any namespace packages inside it. So when you simply run python3 -m unittest discover, it doesn't go under all namespace packages (basically folders) in the cwd.
Some PRs are underway (for example, issue35617) to fix this, but they are yet to be released.
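To see what the flags actually map to, here is a minimal sketch using the programmatic equivalent, unittest.TestLoader.discover (the paths are assumptions based on the tree above): start_dir is where test files are searched for, and top_level_dir is the project root placed on sys.path so that imports like dataloader.assertions resolve.

import unittest

# Equivalent of: python3 -m unittest discover -s dataloader/tests -t .
loader = unittest.TestLoader()
suite = loader.discover(start_dir="dataloader/tests", top_level_dir=".")
unittest.TextTestRunner(verbosity=2).run(suite)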
checkout = Special step used to check out source code to the configured path (defaults to the working_directory). The reason this is a special step is because it is more of a helper function designed to make checking out code easy for you. If you require doing git over HTTPS you should not use this step as it configures git to checkout over ssh.
working_directory = In which directory to run the steps. Default: ~/project (where project is a literal string, not the name of your specific project). Processes run during the job can use the $CIRCLE_WORKING_DIRECTORY environment variable to refer to this directory. Note: Paths written in your YAML configuration file will not be expanded; if your store_test_results.path is $CIRCLE_WORKING_DIRECTORY/tests, then CircleCI will attempt to store the test subdirectory of the directory literally named $CIRCLE_WORKING_DIRECTORY, dollar sign $ and all.

Trainer module not found in Google Cloud ML Engine

I am trying to tune my variational autoencoder's hyperparameters using Google Cloud ML Engine. I set up my package with the structure they recommend in the docs, so that I specify "trainer.task" as my main module name. Below is an image of my directory structure.
[image of directory structure]
This works on my own machine when I include the following lines:
import sys
sys.path.append("/path/to/project/directory/")
When I run using the below command, I get the error "No module named trainer". Is there a different path I need to specify or something special I need to do for running on Google Cloud ML Engine?
gcloud ml-engine jobs submit training $JOB_NAME --package-path $TRAINER_PACKAGE_PATH --module-name $MAIN_TRAINER_MODULE --job-dir $JOB_DIR --region $REGION --config config.yaml
Do you have a setup.py file? If so, you might be hitting this issue.
To debug this:
Get the GCS location of the package from the job
gcloud --project=$PROJECT ml-engine jobs describe $JOB_NAME
This will output something like:
jobId: somejob
state: PREPARING
trainingInput:
  jobDir: gs://BUCKET/job
  packageUris:
  - gs://bucket/job/packages/7d2611c7366f266058da5a9e2c93467426c5fdd018491fa33853516d9db533b1/somepackage-0.0.0.tar.gz
  pythonModule: cifar.task
  region: us-central1
trainingOutput: {}
Note the values above are for illustrative purposes only and will differ from your output.
Copy the GCS package to your machine
gsutil cp gs://bucket/job/packages/7d2611c7366f266058da5a9e2c93467426c5fdd018491fa33853516d9db533b1/somepackage-0.0.0.tar.gz /tmp
Unpack the .tar.gz and check that it has a directory trainer containing an __init__.py file and task.py. If not, then you probably specified incorrect values on the command line.
If you include the actual command line (i.e. the values for the variables) and the contents of .tar.gz, I can probably provide a better answer.
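For reference, a minimal setup.py for the recommended layout might look like the sketch below (the package name and version are placeholders, not taken from the question):

from setuptools import find_packages, setup

setup(
    name="trainer",
    version="0.1",
    packages=find_packages(),  # picks up trainer/ via its __init__.py
    install_requires=[],       # list your training dependencies here
)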
Jeremy, I had a similar problem. I downloaded and unzipped my files, but there was no task.py in it.
These are the command line arguments I used:
gcloud ml-engine jobs submit training job11 \
  --package-path=./trainer \
  --module-name='Keras_On_GoogleCloud.trainer.shallownet_train' \
  --job-dir=gs://zubair-gc-bucket/jobs/job11 \
  --region='us-central1' \
  --config=trainer/cloudml-gpu.yaml \
  -- \
  --job_name='zubair-gc-job11' --dataset='dataset/animals' --model='shallownet_weights1.hdf5'

How to create homebrew formula with only scripts

I want to package up a few shell scripts plus support files into a Homebrew formula that installs these scripts somewhere on the user's $PATH. I will serve the formula from my own tap.
Reading through the formula cookbook, the examples seem to assume a CMake or Autotools build system in the upstream project. What if my project only consists of a few scripts and config files? Should I just manually copy those into #{prefix}/ in the formula?
There are two cases here:
Standalone Scripts
Install them under bin using bin.install. You can optionally rename them, e.g. to strip the extension:
class MyFormula < Formula
  # ...
  def install
    # move 'myscript.sh' under #{prefix}/bin/
    bin.install "myscript.sh"
    # OR move 'myscript.sh' to #{prefix}/bin/mybettername
    bin.install "myscript.sh" => "mybettername"
    # OR move all *.sh scripts under bin/
    bin.install Dir["*.sh"]
  end
end
Scripts with Support Files
This case is tricky because you need to get all the paths right. The simplest way is to install everything under #{libexec}/ then write exec scripts under #{bin}/. That’s a very common pattern in Homebrew formulae.
class MyFormula < Formula
  # ...
  def install
    # Move everything under #{libexec}/
    libexec.install Dir["*"]
    # Then write executables under #{bin}/
    bin.write_exec_script libexec/"myscript.sh"
  end
end
Given a tarball (or a git repo) that contains the following content:
script.sh
supportfile.txt
The above formula will create the following hierarchy:
#{prefix}/
  libexec/
    script.sh
    supportfile.txt
  bin/
    script.sh
Homebrew creates that #{prefix}/bin/script.sh with the following content:
#!/bin/bash
exec "#{libexec}/script.sh" "$#"
This means that your script can expect to have a support file in its own directory while not polluting bin/ and not making any assumption regarding the install path (e.g. you don’t need to use things like ../libexec/supportfile.txt in your script).
See this answer of mine for an example with a Ruby script and that one for an example with manpages.
Note that Homebrew also has other helpers, e.g. to not only write an exec script but also set environment variables or execute a .jar.
