How to do Pascal VOC evaluation with the Google object detection API

I'm working with the new Google object_detection API on my own dataset.
In the config file there are eval_config and eval_input_reader fields, but I don't understand how to make them work.
I've also found the file eval.py in tensorflow/models/object_detection/, which seems to run the evaluation, but I don't entirely understand what these args are:
./eval \
--logtostderr \
--checkpoint_dir=path/to/checkpoint_dir \
--eval_dir=path/to/eval_dir \
--pipeline_config_path=pipeline_config.pbtxt
Suppose I have a model checkpoint (3 ckpt files: meta, index and data); what should I do with them?

I am looking for a config file too, or any information on how to build one, with an explanation of the parameters; on this point the documentation is not really helpful.

You shouldn't have to do anything with the eval_config field, but you should change the eval_input_reader.tf_record_input_reader.input_path field to point to your validation dataset, and eval_input_reader.label_map_path should point to your label map.
Replace the PATH_TO_YOUR_TF_RECORD and PATH_TO_YOUR_LABEL_MAP strings in the sample config below.
```
eval_config: {
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_YOUR_TF_RECORD"
  }
  label_map_path: "PATH_TO_YOUR_LABEL_MAP"
  shuffle: false
  num_readers: 1
}
```
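As for the checkpoint question: eval.py only needs the directory that contains the three ckpt files; it should pick up the latest checkpoint listed in that directory's checkpoint file. A hedged example invocation, with placeholder paths:

```
python eval.py \
    --logtostderr \
    --checkpoint_dir=path/to/dir_with_the_ckpt_files \
    --eval_dir=path/to/eval_output \
    --pipeline_config_path=path/to/your_pipeline.config
```

The eval_dir is where the evaluation summaries (viewable in TensorBoard) get written.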

Related

Conftest verify fixture data

I have been writing a few policies using Conftest and wish to verify my configuration with the conftest verify command. So far I have been able to successfully verify my policies like so:
test_deployment_with_security_context {
  no_violations with input as {
    ... json content ...
  }
}
However the omitted JSON content above is rather large and clutters my policy tests. I want to put the JSON into a file and import it into the test. The conftest verify command takes a --data flag allowing files to be loaded as data and made available to the policies. For example, as per the documentation, conftest verify --data policy will recursively load in YAML and JSON files it finds. Therefore a file located in policy/examples/input.json is made available within the policies under import data.examples. My question is how can I use this data in the tests?
There's an open issue around this - the docs currently reflect OPA's behavior of recursively reading data files from dirs and using directory names for namespacing. This behavior is currently not mirrored in conftest. I'd suggest tracking the ticket for progress on that. As a workaround until then you could always "namespace" the data yourself, so that your input.json looks something like this:
{
  "examples": {
    "actual_data": {
      ...
    }
  }
}
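With that in place, the tests should be able to reach the file through the data document. A minimal sketch, reusing the rule name from the question and the key names from the JSON above:

```
test_deployment_with_security_context {
  no_violations with input as data.examples.actual_data
}
```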

How to get the PDB id of a mystery sequence?

I have a bunch of proteins, from something called proteinnet.
Now the sequences there have some sort of ID, but it is clearly not a PDB id, so I need to find that in some other way. For each protein I have the amino acid sequence. I'm using biopython, but I'm not very experienced in it yet and couldn't find this in the guide.
So my question is: how do I find a protein's PDB id given that I have the amino acid sequence of the protein (such that I can download the PDB file for the protein)?
Hi, a while ago I was playing with the RCSB PDB search API and ended up with this piece of code (I can't find the examples on the RCSB PDB website anymore):
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun Dec 27 16:20:43 2020
@author: Pietro
"""
import PDB_searchAPI_5
from PDB_searchAPI_5.rest import ApiException
import json

#"value":"STEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQHKLRKLNPPDESGPGCMSCKCVLS"

# Defining the host is optional and defaults to https://search.rcsb.org/rcsbsearch/v1
# See configuration.py for a list of all supported configuration parameters.
configuration = PDB_searchAPI_5.Configuration(
    host = "http://search.rcsb.org/rcsbsearch/v1"
)

# Sequence search query: find PDB entries similar to the given amino acid sequence
data_entry_1 = '''{
  "query": {
    "type": "terminal",
    "service": "sequence",
    "parameters": {
      "evalue_cutoff": 1,
      "identity_cutoff": 0.9,
      "target": "pdb_protein_sequence",
      "value": "STEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQHKLRKLNPPDESGPGCMSCKCVLS"
    }
  },
  "request_options": {
    "scoring_strategy": "sequence"
  },
  "return_type": "entry"
}'''

# Enter a context with an instance of the API client
with PDB_searchAPI_5.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = PDB_searchAPI_5.SearchServiceApi(api_client)
    try:
        # Submit the JSON sequence query to the search service
        pippo = api_instance.run_json_queries_get(data_entry_1)
    except ApiException as e:
        print("Exception when calling SearchServiceApi->run_json_queries_get: %s\n" % e)
        exit()

# Inspect the response object returned by the generated client
print(type(pippo))
print(dir(pippo))
pippox = pippo.__dict__
print('\n bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb \n', pippox)
print('\n\n ********************************* \n\n')
print(type(pippox))

# result_set is a list of dicts, each with an 'identifier' (the PDB id) and a 'score'
pippoy = pippo.result_set
print(type(pippoy))
for i in pippoy:
    print('\n', i, '\n', type(i))
print('\n LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL\n')
for i in pippoy:
    print('\n', i['identifier'], ' score : ', i['score'])
The search module (import PDB_searchAPI_5) was generated with openapi-generator-cli-4.3.1.jar (link here).
The OpenAPI specs were at version 1.7.3; they are now at 1.7.15, see https://search.rcsb.org/openapi.json.
The data_entry_1 part was copied from the RCSB PDB website, but I can't find it there anymore; it said something about MMseqs2 being the software doing the search. I played with the
"evalue_cutoff": 1,
"identity_cutoff": 0.9,
parameters but didn't find a way to select only 100% identity matches.
Here is the PDB_searchAPI_5 package; install it in a virtual environment with:
pip install PDB-searchAPI-5-1.0.0.tar.gz
It was generated by openapi-generator-cli-4.3.1.jar with:
java -jar openapi-generator-cli-4.3.1.jar generate -g python -i pdb-search-api-openapi.json --additionalproperties=generateSourceCodeOnly=True,packageName=PDB_searchAPI_5
Don't put spaces in the --additionalproperties part (it took me one week to figure that out).
The README.md file is the most important part, as it explains how to use the OpenAPI client.
You need your FASTA sequence here:
"value":"STEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQHKLRKLNPPDESGPGCMSCKCVLS"
A score of 1 should be an exact match.
The Biopython BLAST module is probably easier, but it searches the NIH databases instead of RCSB PDB. Sorry I can't elaborate more on this; I still need to figure out what a JSON file is, and I wasn't able to find a better free tool that automatically generates a better OpenAPI Python client (I believe it's not such an easy task... but we always want more...).
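For reference, a minimal sketch of that Biopython route (an assumption on my part, not from the answer above): it queries NCBI BLAST against its pdb protein database, and the PDB id is usually embedded in each hit title.

```
# Hedged sketch: BLAST a sequence against NCBI's "pdb" protein database with Biopython.
from Bio.Blast import NCBIWWW, NCBIXML

sequence = "YOUR_AMINO_ACID_SEQUENCE"  # placeholder: paste the FASTA sequence here

result_handle = NCBIWWW.qblast("blastp", "pdb", sequence)   # remote BLAST search
blast_record = NCBIXML.read(result_handle)                  # parse the XML result

for alignment in blast_record.alignments[:5]:               # top 5 hits
    hsp = alignment.hsps[0]
    print(alignment.title, "identities:", hsp.identities, "/", hsp.align_length)
```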
To get the API documentation, try:
java -jar openapi-generator-cli-4.3.1.jar generate -g html -i https://search.rcsb.org/openapi.json --skip-validate-spec
You get an HTML document; for a PDF, see https://mrin9.github.io/RapiPdf/.
http://search.rcsb.org/openapi.json works as well as https://search.rcsb.org/openapi.json, so you can look at the exchanges between client and server with Wireshark.
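If you would rather skip the generated client altogether, here is a hedged sketch of the same query sent with the plain requests library (it assumes a /query path on the v1 endpoint used above; the service has since moved to newer versions, so the path may need adjusting):

```
import requests

# Same query structure as data_entry_1 above, as a Python dict
query = {
    "query": {
        "type": "terminal",
        "service": "sequence",
        "parameters": {
            "evalue_cutoff": 1,
            "identity_cutoff": 0.9,
            "target": "pdb_protein_sequence",
            "value": "YOUR_AMINO_ACID_SEQUENCE",  # placeholder
        },
    },
    "request_options": {"scoring_strategy": "sequence"},
    "return_type": "entry",
}

# Endpoint path is an assumption based on the v1 host shown above
resp = requests.post("https://search.rcsb.org/rcsbsearch/v1/query", json=query)
resp.raise_for_status()
for hit in resp.json().get("result_set", []):
    print(hit["identifier"], "score:", hit["score"])
```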

How can I pass a pointer to a file in helm upgrade command?

I have a truststore file (a binary file) that I need to provide during helm upgrade. This file is different for each target env (dev, qa, staging or prod), so I can only provide it at deployment time. helm upgrade --set-file does not take a binary file; this seems to be the issue described here: https://github.com/helm/helm/issues/3276. These truststore files are stored in the Jenkins credential store.
The flag itself is described as follows:
--set-file stringArray set values from respective files specified via the command line (can specify multiple or separate values with commas: key1=path1,key2=path2)
It is also important to know the format and limitations of --set.
The error you see: Error: failed parsing --set-file data... means that the file you are trying to use does not meet the requirements. See the example below:
--set-file key=filepath is another variant of --set. It reads the file and uses its content as a value. An example use case is to inject multi-line text into values without dealing with indentation in YAML. Say you want to create a brigade project with a certain value containing 5 lines of JavaScript code; you might write a values.yaml like:
defaultScript: |
  const { events, Job } = require("brigadier")
  function run(e, project) {
    console.log("hello default script")
  }
  events.on("run", run)
Being embedded in YAML, this makes it harder for you to use IDE features, testing frameworks, and other tooling that supports writing code. Instead, you can use --set-file defaultScript=brigade.js with brigade.js containing:
const { events, Job } = require("brigadier")
function run(e, project) {
  console.log("hello default script")
}
events.on("run", run)
I hope it helps.
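For the binary truststore specifically, one common workaround (my own hedged sketch, not part of the answer above; file and release names are placeholders) is to base64-encode the file first and pass the resulting text file via --set-file:

```
# encode the binary truststore into a plain-text file
base64 -w0 truststore.jks > truststore.b64
# pass the encoded content as a value at deploy time
helm upgrade my-release ./my-chart --set-file truststore=truststore.b64
```

In the chart, the Secret template can then set truststore.jks: {{ .Values.truststore }} under data:, since Secret data values are expected to be base64-encoded anyway.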

YOLO (Darknet): How to detect a whole directory of images?

The Darknet guide to detecting objects in images using pre-trained weights is here: https://pjreddie.com/darknet/yolo/
The command to run is:
./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
The last argument is the path to a single image file; I've tried changing it to data/*.jpg but that didn't work.
How to use Darknet to detect a whole directory of images?
As per the link mentioned below, one can use the cv2.dnn.readNetFromDarknet function to load the Darknet configuration file and trained weights into a model in Python. Once the model is loaded, one can simply loop over the images for prediction.
Please refer to this link for further clarification.
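A rough sketch of that idea (the cfg/weights paths and the input size are assumptions, not from the answer):

```
# Hedged sketch: load YOLOv3 once with OpenCV's dnn module, then run every image in a directory.
import glob
import cv2

net = cv2.dnn.readNetFromDarknet("cfg/yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()    # YOLO output layers

for path in glob.glob("data/*.jpg"):
    image = cv2.imread(path)
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(layer_names)               # raw detections; filter by confidence as needed
    print(path, "->", sum(len(o) for o in outputs), "candidate boxes")
```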
There is a simple way to detect objects on a list of images, based on the AlexeyAB/darknet repository.
./darknet detector test cfg/obj.data cfg/yolov3.cfg yolov3.weights < images_files.txt
You can generate the file list either from the command line (Send folder files to txt ) or using a GUI tool like Nautilus on Ubuntu.
Two extra flags -dont_show -save_labels will disable the user interaction, and save the detection results to text files instead.
There's a trick to make Darknet executable load weights once and infer multiple image files. Use expect to do the trick.
Install expect:
sudo yum install expect -y
#sudo apt install expect -y
Do object detection on multiple images:
expect <<"HEREDOC"
puts "Spawning...";
spawn ./darknet detect cfg/yolov3-tiny.cfg yolov3-tiny.weights;
set I 0;
expect {
"Enter Image Path" {
set timeout -1;
if {$I == 0} {
send "data/dog.jpg\r";
incr I;
} elseif {$I == 1} {
send "data/kite.jpg\r";
incr I;
} else {
exit;
}
exp_continue;
}
}
HEREDOC
Another solution is loading Darknet from Python 2 (not 3; Darknet's Python bindings use Python 2).
1a) Clone darknet as described in https://pjreddie.com/darknet/yolo/
1b) Go to the cloned dir, download yolov3-tiny.weights and yolov3.weights as said in https://pjreddie.com/darknet/yolo/
2) Copy darknet/examples/detector.py to darknet dir
3) Edit the new detector.py
Change .load_net line to use: cfg/yolov3-tiny.cfg and yolov3-tiny.weights
Change .load_meta line to use: cfg/coco.data
4a) Detect objects in images by adding some dn.detect lines in detector.py (see the sketch after these steps)
4b) Run detector.py
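A rough sketch of what steps 3 and 4a could look like (Python 2, run from the darknet directory; the dn module alias and file layout follow the steps above, but treat the exact function signatures as assumptions):

```
# Hedged sketch of a modified detector.py: load the net and metadata once, then detect on many images.
import glob
import darknet as dn

net = dn.load_net("cfg/yolov3-tiny.cfg", "yolov3-tiny.weights", 0)
meta = dn.load_meta("cfg/coco.data")

for path in glob.glob("data/*.jpg"):
    detections = dn.detect(net, meta, path)   # list of (label, confidence, bbox) tuples
    print path, detections
```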

Using label path to check if file location exists

Is there an easy way to get hold of a path object so I can check if a given label path exists? Say, for example, if path.exists("@external_project_name//:filethatmightexist.txt"):. I can see that the repository context has this, but I need to have a wrapping repository rule. Is it possible to do this in a macro or a Skylark native call instead?
Even with a repository_rule, I had a lot of trouble with this due to what you already pointed out:
if you create a Label with a path that doesn't exist, it will cause the build to fail
But if you're willing to do a repository rule, here's a possible solution...
In this example, my rule allows specifying a default configuration if a config file is not present. The configuration file can be added to .gitignore and overridden by individual developers, while still working out of the box for most cases.
I think I understand why the ctx.actions functions have sibling arguments now; it's the same idea here. The trick is that config_file_location is a true label, and config_file is a string attribute. I chose BUILD arbitrarily, but since all workspaces have a top-level BUILD, that seemed legit-ish.
WORKSPACE Definition
...
workspace(name="E02_mysql_database")
json_datasource_configuration(name="E02_datasources",
config_file_location="#E02_mysql_database//:BUILD",
config_file="database.json")
The definition for json_datasource_configuration looks like this:
json_datasource_configuration = repository_rule(
    attrs = {
        "config_file_location": attr.label(
            doc = """
            Path relative to the repository root for a datasource config file.
            """),
        "config_file": attr.string(
            doc = """
            Config file, maybe absent.
            """),
        "default_config": attr.string(
            # better way to do this?
            default = "None",
            doc = """
            If no config is at the path, then this will be the default config.
            Should look something like:
            {
                "datasource_name": {
                    "host": "<host>",
                    "port": <port>,
                    "password": "<password>",
                    "username": "<username>",
                    "jdbc_connection_string": "<optional>"
                }
            }
            There can be more than one datasource configured... maybe, eventually.
            """,
        ),
    },
    local = True,
    implementation = _json_config_impl,
)
Then in the rule implementation I can test for the file's existence and, if it is not present, fall back to other logic.
def _json_config_impl(ctx):
    """
    Allows you to specify a file on disk to use for data connection.
    If you pass a default
    """
    config_path = ctx.path(ctx.attr.config_file_location).dirname.get_child(ctx.attr.config_file)
    config = ""
    if config_path.exists:
        config = ctx.read(config_path)
    elif ctx.attr.default_config == "None":
        fail("Could not find config at %s, you must supply a default_config if this is intentional" % ctx.attr.config_file)
    else:
        config = ctx.attr.default_config
    ...
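The elided part of the implementation is where the repository actually gets materialized. Purely as a hypothetical completion (not the author's code, and the file names are placeholders), a repository rule typically finishes by writing out the files it wants to expose:

```
    # Hypothetical continuation: write the chosen config into the new repository
    # and give the repository a BUILD file so the generated file can be referenced.
    ctx.file("datasources.json", config)
    ctx.file("BUILD", 'exports_files(["datasources.json"])\n')
```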
Probably too late to help, but your question is the only thing I found referencing this goal. If someone knows a better way, I am looking for other options; it's complicated to explain to other developers why the rule has to work the way it does.
Also note: if you change the config file, you have to run a clean to get the workspace to re-read the config. I haven't been able to figure out any way to fix that, and glob() does not work in the WORKSPACE.
