keyerror when trying to load data

keyerror when trying to load data - parsing

I was trying to parse my dataset with the images and annotations with the code below. I am using anaconda with Spyder IDE in a windows machine:
DATA_DIR = 'input'
# Directory to save logs and trained model
ROOT_DIR = 'working'
train_dicom_dir = os.path.join(DATA_DIR, 'stage_1_train_images')
test_dicom_dir = os.path.join(DATA_DIR, 'stage_1_test_images')
print(train_dicom_dir)
print(test_dicom_dir)
def get_dicom_fps(dicom_dir):
dicom_fps = glob.glob(dicom_dir+'/'+'*.jpg')
return list(set(dicom_fps))
def parse_dataset(dicom_dir, anns):
image_fps = get_dicom_fps(dicom_dir)
image_annotations = {fp: [] for fp in image_fps}
for index, row in anns.iterrows():
fp = os.path.join(dicom_dir, row['patientId']+'.jpg')
image_annotations[fp].append(row)
return image_fps, image_annotations
image_fps, image_annotations = parse_dataset(train_dicom_dir, anns=anns)
On running the code, i get the following error:
Traceback (most recent call last):
File "<ipython-input-21-fe2564bb2360>", line 1, in <module>
image_fps, image_annotations = parse_dataset(train_dicom_dir, anns=anns)
File "<ipython-input-12-a5d26437db38>", line 6, in parse_dataset
image_annotations[fp].append(row)
KeyError: 'input\\stage_1_train_images\\1.2.276.0.7230010.3.1.4.8323329.10001.1517874346.163716.jpg'
My current working directory is'C:\Users\rajaramans2\codes\input'

It seems like you are trying to process images in .dcm (DICOM format), not .jpg, so your code should work if you replace “.jpg” for “.dcm”, I hope this helps.

Related

How to convert TensorFlow 2 saved model to be used with OpenCV dnn.readNet

I am struggling to find a way to convert my trained network using TensorFlow 2 Object detection API to be used with OpenCV for deployment purposes. I tried two methods for that but without success.
Could someone help me resolve this issue or propose the best and easy deep learning framework to convert my model to OpenCV (OpenCV friendly)?
I really appreciate any help you can provide.
This is my information system
OS Platform: Windows 10 64 bits
Tensorflow Version: 2.8
Python version: 3.9.7
OpenCV version: 4.5.5
1st Method: Using tf2onnx
I used the following code since I am using TensorFlow 2
python -m tf2onnx.convert --saved-model tensorflow-model-path --output model.onnx --opset 15
The conversion process generates the model.onnx successfully and returns the following:
However, when I try to read the converted model, I get the following error:
File "C:\Tensorflow\testcovertedTF2ToONNX.py", line 10, in <module> net = cv2.dnn.readNetFromONNX('C:/Tensorflow/model.onnx') cv2.error: Unknown C++ exception from OpenCV code
The code used to read the converted network is simple.
import cv2
import numpy as np
image = cv2.imread("img002500.jpg")
if image is None:
print("image emplty")
image_height, image_width, _ = image.shape
net = cv2.dnn.readNetFromONNX('model.onnx')
image = image.astype(np.float32)
input_blob = cv2.dnn.blobFromImage(image, 1, (640,640), 0, swapRB=False, crop=False)
net.setInput(input_blob)
output = net.forward()
2nd Method: Trying to get Frozen graph from saved model
I tried to get frozen_graph.pb from my saved_model using the script below, found in
https://github.com/opencv/opencv/issues/16879#issuecomment-603815872
import tensorflow as tf
print(tf.__version__)
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2
loaded = tf.saved_model.load('models/mnist_test')
infer = loaded.signatures['serving_default']
f = tf.function(infer).get_concrete_function(input_tensor=tf.TensorSpec(shape=[None, 640, 640, 3], dtype=tf.float32))
f2 = convert_variables_to_constants_v2(f)
graph_def = f2.graph.as_graph_def()
# Export frozen graph
with tf.io.gfile.GFile('frozen_graph.pb', 'wb') as f:
f.write(graph_def.SerializeToString())
Then, I tried to generate the text graph representation (graph.pbtxt) using tf_text_graph_ssd.py found in https://github.com/opencv/opencv/wiki/TensorFlow-Object-Detection-API
python tf_text_graph_ssd.py --input path2frozen_graph.pb --config path2pipeline.config --output outputgraph.pbtxt
The execution of this script returns the following error:
cv.dnn.writeTextGraph(modelPath, outputPath)
cv2.error: OpenCV(4.5.5) D:\a\opencv-python\opencv-python\opencv\modules\dnn\src\tensorflow\tf_graph_simplifier.cpp:1052: error: (-215:Assertion failed) permIds.size() == net.node_size() in function 'cv::dnn::dnn4_v20211220::sortByExecutionOrder'
During the handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Tensorflow\generatepBtxtgraph\tf_text_graph_ssd.py", line 413, in <module>
createSSDGraph(args.input, args.config, args.output)
File "C:\Tensorflow\generatepBtxtgraph\tf_text_graph_ssd.py", line 127, in createSSDGraph
writeTextGraph(modelPath, outputPath, outNames)
File "C:\Tensorflow\generatepBtxtgraph\tf_text_graph_common.py", line 320, in writeTextGraph
from tensorflow.tools.graph_transforms import TransformGraph
ModuleNotFoundError: No module named 'tensorflow.tools.graph_transforms'
Trying to read the generated frozen model without a graph.pb using dnn.readNet the code below:
import cv2
import numpy as np
image = cv2.imread("img002500.jpg")
if image is None:
print("image emplty")
image_height, image_width, _ = image.shape
net = cv2.dnn.readNet('frozen_graph_centernet.pb')
image = image.astype(np.float32)
# create blob from image (opencv dnn way of pre-processing)
input_blob = cv2.dnn.blobFromImage(image, 1, (1024,1024), 0, swapRB=False, crop=False)
net.setInput(input_blob)
output = net.forward()
returns the following error
Traceback (most recent call last):
File "C:\Tensorflow\testFrozengraphTF2.py", line 14, in <module>
output = net.forward()
cv2.error: OpenCV(4.5.5) D:\a\opencv-python\opencv-python\opencv\modules\dnn\src\dnn.cpp:621: error: (-2:Unspecified error) Can't create layer "StatefulPartitionedCall" of type "StatefulPartitionedCall" in function 'cv::dnn::dnn4_v20211220::LayerData::getLayerInstance'
I understand that OpenCV doesn't import models with StatefulPartitionedCall (TF Eager mode). Unfortunately, this means the script found to export my saved model to frozen_graph did not work.
saved model
you can get my saved model from the link below
https://www.dropbox.com/s/liw5ff87rz7v5n5/my_model.zip?dl=0
#note: the exported model works well with the TensorFlow script

2nd Method: Trying to get Frozen graph from saved model
make_FB
https://medium.com/#sebastingarcaacosta/how-to-export-a-tensorflow-2-x-keras-model-to-a-frozen-and-optimized-graph-39740846d9eb
use pyopencv
model = cv.dnn.readNetFromTensorflow('./frozen_graph2.pb')

How to address a Snakemake error at the stage of DAG computation?

I'm running in to an error with my Snakemake variant identification pipeline, when the original DAG of jobs is built. I believe this is a memory issue; when I test with a short list of input files, the DAG is constructed without issue, however, when I try with 300+ input paired-fastq, I receive the following error:
Building DAG of jobs...
Traceback (most recent call last):
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/__init__.py", line 633, in snakemake
keepincomplete=keep_incomplete,
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/workflow.py", line 568, in execute
dag.check_incomplete()
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/dag.py", line 281, in check_incomplete
incomplete = self.incomplete_files
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/dag.py", line 402, in incomplete_files
filterfalse(self.needrun, self.jobs),
File "/home/k/.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/dag.py", line 399, in <genexpr>
job.output
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/persistence.py", line 205, in incomplete
return any(map(lambda f: f.exists and marked_incomplete(f), job.output))
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/persistence.py", line 205, in <lambda>
return any(map(lambda f: f.exists and marked_incomplete(f), job.output))
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/persistence.py", line 203, in marked_incomplete
return self._read_record(self._metadata_path, f).get("incomplete", False)
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/persistence.py", line 322, in _read_record_cached
return self._read_record_uncached(subject, id)
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/persistence.py", line 328, in _read_record_uncached
return json.load(f)
File "/home//.conda/envs/snakemake/lib/python3.6/json/__init__.py", line 299, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/home//.conda/envs/snakemake/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/home//.conda/envs/snakemake/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home//.conda/envs/snakemake/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I'm not sure how to resolve this - if this is a known bug or if there is a way to define my pipeline to build a less complex DAG? I am including the first section of my Snakemake file as well. I use the rule all to define all desired output files.
################################
#### Mtb bwa/GATK Snakemake ####
################################
import numpy as np
from collections import defaultdict
import pandas as pd
samples_df = pd.read_table('config/tgen_samples2a.tsv',sep = ',').set_index("sample", drop=False)
sample_names = list(samples_df['sample'])
batch_names = list(samples_df['batch'])
#print(sample_names)
# fastq1 input function definition
def fq1_from_sample(wildcards):
return samples_df.loc[wildcards.sample, "fastq_1"]
# fastq2 input function definition
def fq2_from_sample(wildcards):
return samples_df.loc[wildcards.sample, "fastq_2"]
# Define config file. Stores sample names and other things.
configfile: "config/config.yaml"
# Define a rule for running the complete pipeline.
rule all:
wildcard_constraints:
batch="IS-.+"
input:
trim = expand(['results/{batch}/{samp}/trim/{samp}_trim_1.fq.gz'], zip, samp=sample_names,batch=batch_names),
kraken=expand('results/{batch}/{samp}/kraken/{samp}_trim_kr_1.fq.gz', zip, samp=sample_names,batch=batch_names),
bams=expand('results/{batch}/{samp}/bams/{samp}_{mapper}_{ref}_sorted.bam', zip, samp=sample_names,batch=batch_names, ref = config['ref']*len(sample_names), mapper = config['mapper']*len(sample_names)), # When using zip, need to use vectors of equal lengths for all wildcards.
per_samp_run_stats = expand('results/{batch}/{samp}/stats/{samp}_{mapper}_{ref}_combined_stats.csv', zip, samp=sample_names,batch=batch_names, ref = config['ref']*len(sample_names), mapper = config['mapper']*len(sample_names)),
amr_stats=expand('results/{batch}/{samp}/stats/{samp}_{mapper}_{ref}_amr.csv', samp=sample_names,batch=batch_names, ref=config['ref'], mapper=config['mapper']),
cov_stats=expand('results/{batch}/{samp}/stats/{samp}_{mapper}_{ref}_cov_stats.txt', samp=sample_names,batch=batch_names, ref=config['ref'], mapper=config['mapper']),
all_sample_stats=expand('results/{batch}/stats/combined_per_run_sample_stats.csv',batch = batch_names),
vcfs=expand('results/{batch}/{samp}/vars/{samp}_{mapper}_{ref}_{caller}_qfilt.vcf.gz', samp=sample_names,batch=batch_names, ref=config['ref'], mapper=config['mapper'], caller = config['caller']),
ann_vcfs=expand('results/{batch}/{samp}/vars/{samp}_{mapper}_{ref}_gatk_ann.vcf.gz', samp=sample_names,batch=batch_names, ref=config['ref'], mapper=config['mapper'], caller = config['caller']),
fastas=expand('results/{batch}/{samp}/fasta/{samp}_{mapper}_{ref}_{caller}_{filter}.fa', samp=sample_names,batch=batch_names, ref=config['ref'], mapper=config['mapper'], caller = config['caller'], filter=config['filter']),
profiles=expand('results/{batch}/{samp}/stats/{samp}_{mapper}_{ref}_lineage.csv', samp=sample_names,batch=batch_names, ref=config['ref'], mapper=config['mapper'])
# Trim reads for quality.
rule trim_reads:
input:
p1=fq1_from_sample,
p2=fq2_from_sample
output:
trim1='results/{batch}/{sample}/trim/{sample}_trim_1.fq.gz',
trim2='results/{batch}/{sample}/trim/{sample}_trim_2.fq.gz'
log:
'results/{batch}/{sample}/trim/{sample}_trim_reads.log'
shell:
'{config[scripts_dir]}trim_reads.sh {input.p1} {input.p2} {output.trim1} {output.trim2} &>> {log}'
# Filter reads taxonomically with Kraken.
rule taxonomic_filter:
input:
trim1='results/{batch}/{samp}/trim/{samp}_trim_1.fq.gz',
trim2='results/{batch}/{samp}/trim/{samp}_trim_2.fq.gz'
output:
kr1='results/{batch}/{samp}/kraken/{samp}_trim_kr_1.fq.gz',
kr2='results/{batch}/{samp}/kraken/{samp}_trim_kr_2.fq.gz',
kraken_report='results/{batch}/{samp}/kraken/{samp}_kraken.report',
kraken_stats = 'results/{batch}/{samp}/kraken/{samp}_kraken_stats.csv'
log:
'results/{batch}/{samp}/kraken/{samp}_kraken.log'
threads: 8
shell:
'{config[scripts_dir]}run_kraken.sh {input.trim1} {input.trim2} {output.kr1} {output.kr2} {output.kraken_report} &>> {log}'
Thank you in advance for help using Snakemake!
All the best,

I kind of doubt memory is an issue. 300+ is not much, especially if each of them is processed independently of the others.
Try to start from the subset of samples that you say worked and gradually increase it until you see the problem appearing. Perhaps you have some funny value in your sample sheet or in your config? json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) hints at something like that in my impression.

The answer was from #TroyComi, above: after deleting the .snakemake directory, the issue was resolved. Thank you!

Gensim: error loading pretrained vectors No such file or directory: 'word2vec.kv.vectors.npy'

I am trying to load a Pretrained word2vec embeddings that is in gensim keyedvector 'word2vec.kv'
pretrained = KeyedVectors.load(args.pretrained mmap = 'r')
where arg.pretrained is "/ptembs/word2vec.kv"
and iam getting this error
File "main.py", line 60, in main
pretrained = KeyedVectors.load(args.pretrained, mmap = 'r')
File "C:\Users\ASUS\anaconda3\lib\site-packages\gensim\models\keyedvectors.py", line 1553, in load
model = super(WordEmbeddingsKeyedVectors, cls).load(fname_or_handle, **kwargs)
File "C:\Users\ASUS\anaconda3\lib\site-packages\gensim\models\keyedvectors.py", line 228, in load
return super(BaseKeyedVectors, cls).load(fname_or_handle, **kwargs)
File "C:\Users\ASUS\anaconda3\lib\site-packages\gensim\utils.py", line 436, in load obj._load_specials(fname, mmap, compress, subname)
File "C:\Users\ASUS\anaconda3\lib\site-packages\gensim\utils.py", line 478, in _load_specials
val = np.load(subname(fname, attrib), mmap_mode=mmap)
File "C:\Users\ASUS\anaconda3\lib\site-packages\numpy\lib\npyio.py", line 417, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'ptembs/word2vec.kv.vectors.npy'
i dont understand why it need word2vec.kv.vectors.npy file ? and i dont have it.
Any idea how to solve this problem?
gensim version 3.8.3
tried it on 4.1.2 also same error.

Where did you get the file 'word2vec.kv'?
If loading that file triggers an error mentioning a 2nd file by name, then that 2nd file should've been created alongside 'word2vec.kv' when it was 1st saved using a .save() operation.
That other file needs to be kept alongside 'word2vec.kv' in order for 'word2vec.kv' to be .load()ed again in the future.

Avoiding memory error when reading large csv file

I'm trying to read a large csv file (25 GB) onto a google cloud instance using the following method:
from google.cloud import storage
from io import StringIO
client = storage.Client()
bucket = client.get_bucket('bucket')
blob = bucket.get_blob(f"full_dataset.csv")
bt = blob.download_as_string()
s = str(bt,"utf-8")
s = StringIO(s)
df = pd.read_csv(s)
which gives me the following error:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-18-e919b9b86de2> in <module>
2
3 s = str(bt,"utf-8")
----> 4 s = StringIO(s)
MemoryError:
Is there another method that I could you use to efficiently read this csv file without a memory error ?

The object is too big to fit into a string in memory. You can instead read it chunk by chunk, for example by using google.resumable_media.

OpenCV Haar Cascade Creation

I want to try create my own .xml file for my graduation project with this reference.
But I have a problem which stage 6 doesn't work.It gives error such as:
Traceback (most recent call last):
File "./tools/mergevec.py", line 170, in <module>
merge_vec_files(vec_directory, output_filename)
File "./tools/mergevec.py", line 120, in merge_vec_files
val = struct.unpack('<iihh', content[:12])
TypeError: a bytes-like object is required, not 'str'
I have found a solution which says find 0 size vector files and delete them.
But, I don't know which vector files are 0 size and how I can detect them.
Can you help about this please?

I was able to solve my problem when i changed it:
for f in files:
with open(f, 'rb') as vecfile:
content = ''.join(str(line) for line in vecfile.readlines())
data = content[12:]
outputfile.write(data)
except Exception as e:
exception_response(e)
for it:
for f in files:
with open(f, 'rb') as vecfile:
content = b''.join((line) for line in vecfile.readlines())
outputfile.write(bytearray(content[12:]))
except Exception as e:
exception_response(e)
and like before i changed it:
content = ''.join(str(line) for line in vecfile.readlines())
for it:
content = b''.join((line) for line in vecfile.readlines())
because it was waiting for some str, and now it is able to receive the binary archives that we're in need.
:)

Try following this guide. It's more recent.

Categories

HOME

imagemagick

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

keyerror when trying to load data - parsing

It seems like you are trying to process images in .dcm (DICOM format), not .jpg, so your code should work if you replace “.jpg” for “.dcm”, I hope this helps.

Related

How to convert TensorFlow 2 saved model to be used with OpenCV dnn.readNet

How to address a Snakemake error at the stage of DAG computation?

Gensim: error loading pretrained vectors No such file or directory: 'word2vec.kv.vectors.npy'

Avoiding memory error when reading large csv file

OpenCV Haar Cascade Creation

Categories

Resources