Visualizing the Model using export_graphviz from .pkl file - machine-learning

I have exported the model to a .pkl file. Now I am loading it via joblib, which gives an object of type sklearn.model_selection._search.GridSearchCV.
However, I am not able to use export_graphviz (from sklearn.tree import export_graphviz), as it expects an estimator with a tree_ attribute as its first parameter.
Is there any way to do this? Here is my code:
export_graphviz(model,out_file="out.dot")
Traceback (most recent call last):
File "", line 1, in
File "/home/anaconda3/lib/python3.6/site-packages/sklearn/tree/export.py", line 433, in export_graphviz
recurse(decision_tree.tree_, 0, criterion=decision_tree.criterion)
AttributeError: 'GridSearchCV' object has no attribute 'tree_'

How did you define the GridSearchCV and train it?
If decision_tree is a GridSearchCV object, the underlying fitted tree can most likely be accessed via:
decision_tree.best_estimator_.tree_
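In other words, pass the refitted best estimator to export_graphviz rather than the GridSearchCV wrapper itself. A minimal sketch, assuming the grid search was run over a plain DecisionTreeClassifier and the pickle was written with joblib ("model.pkl" is a placeholder path):

import joblib
from sklearn.tree import export_graphviz

# load the fitted GridSearchCV object
model = joblib.load("model.pkl")

# best_estimator_ is the refitted tree found by the search
best_tree = model.best_estimator_

# export the tree itself, not the GridSearchCV wrapper
export_graphviz(best_tree, out_file="out.dot")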

Related

how to save logs from c++ binary in beam python?

I have a C++ binary that uses glog. I run that binary within Beam Python on Cloud Dataflow. I want to save the C++ binary's stdout, stderr, and any log files for later inspection. What's the best way to do that?
This guide gives an example for Beam Java. I tried to do something similar.
def sample(target, output_dir):
    import os
    import subprocess
    import tensorflow as tf

    log_path = target + ".log"
    with tf.io.gfile.GFile(log_path, mode="w") as log_file:
        subprocess.run(["/app/.../sample.runfiles/.../sample",
                        "--target", target,
                        "--logtostderr"],
                       stdout=log_file,
                       stderr=subprocess.STDOUT)
I got the following error.
...
File "apache_beam/runners/common.py", line 624, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "/home/swang/.cache/bazel/_bazel_swang/09eb83215bfa3a8425e4385b45dbf00d/execroot/__main__/bazel-out/k8-opt/bin/garage/sample_launch.runfiles/pip_parsed_deps_apache_beam/site-packages/apache_beam/transforms/core.py", line 1877, in <lambda>
wrapper = lambda x, *args, **kwargs: [fn(x, *args, **kwargs)]
File "/home/swang/.cache/bazel/_bazel_swang/09eb83215bfa3a8425e4385b45dbf00d/execroot/__main__/bazel-out/k8-opt/bin/garage/sample_launch.runfiles/__main__/garage/sample_launch.py", line 17, in sample
File "/usr/local/lib/python3.8/subprocess.py", line 493, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/local/lib/python3.8/subprocess.py", line 808, in __init__
errread, errwrite) = self._get_handles(stdin, stdout, stderr)
File "/usr/local/lib/python3.8/subprocess.py", line 1489, in _get_handles
c2pwrite = stdout.fileno()
AttributeError: 'GFile' object has no attribute 'fileno' [while running 'Map(functools.partial(<function sample at 0x7f45e8aa5a60>, output_dir='gs://swang/sample/20220815_test'))-ptransform-28']
google.cloud.storage API also does not seem to expose fileno().
import google.cloud.storage
google.cloud.storage.blob.Blob("test", google.cloud.storage.bucket.Bucket(google.cloud.storage.client.Client(), "swang"))
<Blob: swang, test, None>
blob = google.cloud.storage.blob.Blob("test", google.cloud.storage.bucket.Bucket(google.cloud.storage.client.Client(), "swang"))
reader = google.cloud.storage.fileio.BlobReader(blob)
reader.fileno()
Traceback (most recent call last):
File "/usr/lib/python3.8/code.py", line 90, in runcode
exec(code, self.locals)
I also considered writing the logs from the C++ binary rather than passing them to Python. Since glog is implemented on top of C-style FILE rather than iostream, I would have to redirect stdout etc. to GCS at the FILE level (like this) rather than redirecting cout to GCS at the iostream level (like this). But the GCS C++ API is only implemented on top of iostream, so this approach does not work. Using dup2 (like this) is another approach, but it seems too complicated and expensive.
You can use the Filesystems module of Beam to open a writable channel (file handle where you have write permissions) in any of the filesystems supported by Beam. If you are running in Dataflow, this will automatically use the credentials of the Dataflow job to access Google Cloud Storage: https://beam.apache.org/releases/pydoc/current/apache_beam.io.filesystems.html?apache_beam.io.filesystems.FileSystems.create
If you are writing to GCS, make sure that you don't overwrite an existing object; that would produce an error.
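A minimal sketch of that idea, assuming you capture the subprocess output in memory and then write it through Beam's filesystem layer (the binary path, output_dir, and the log naming are placeholders):

import subprocess
from apache_beam.io.filesystems import FileSystems

def sample(target, output_dir):
    # capture the binary's stdout and stderr in memory
    result = subprocess.run(
        ["/app/sample", "--target", target, "--logtostderr"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    # open a writable channel on GCS (or any filesystem Beam supports)
    log_path = FileSystems.join(output_dir, target + ".log")
    with FileSystems.create(log_path) as log_file:
        log_file.write(result.stdout)

FileSystems.create returns a binary file-like writer, which is why result.stdout (bytes) is written to it directly; this avoids the fileno() requirement that GFile and BlobReader cannot satisfy.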

How to convert TensorFlow 2 saved model to be used with OpenCV dnn.readNet

I am struggling to find a way to convert my network, trained with the TensorFlow 2 Object Detection API, so that it can be used with OpenCV for deployment purposes. I tried two methods for that, but without success.
Could someone help me resolve this issue, or suggest the easiest OpenCV-friendly deep learning framework to convert my model to?
I really appreciate any help you can provide.
This is my system information:
OS Platform: Windows 10 64 bits
Tensorflow Version: 2.8
Python version: 3.9.7
OpenCV version: 4.5.5
1st Method: Using tf2onnx
I used the following command, since I am using TensorFlow 2:
python -m tf2onnx.convert --saved-model tensorflow-model-path --output model.onnx --opset 15
The conversion process generates model.onnx successfully.
However, when I try to read the converted model, I get the following error:
File "C:\Tensorflow\testcovertedTF2ToONNX.py", line 10, in <module> net = cv2.dnn.readNetFromONNX('C:/Tensorflow/model.onnx') cv2.error: Unknown C++ exception from OpenCV code
The code used to read the converted network is simple.
import cv2
import numpy as np

image = cv2.imread("img002500.jpg")
if image is None:
    print("image empty")
image_height, image_width, _ = image.shape
net = cv2.dnn.readNetFromONNX('model.onnx')
image = image.astype(np.float32)
input_blob = cv2.dnn.blobFromImage(image, 1, (640,640), 0, swapRB=False, crop=False)
net.setInput(input_blob)
output = net.forward()
2nd Method: Trying to get Frozen graph from saved model
I tried to get frozen_graph.pb from my saved_model using the script below, found in
https://github.com/opencv/opencv/issues/16879#issuecomment-603815872
import tensorflow as tf
print(tf.__version__)
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

loaded = tf.saved_model.load('models/mnist_test')
infer = loaded.signatures['serving_default']
f = tf.function(infer).get_concrete_function(
    input_tensor=tf.TensorSpec(shape=[None, 640, 640, 3], dtype=tf.float32))
f2 = convert_variables_to_constants_v2(f)
graph_def = f2.graph.as_graph_def()
# Export frozen graph
with tf.io.gfile.GFile('frozen_graph.pb', 'wb') as f:
    f.write(graph_def.SerializeToString())
Then, I tried to generate the text graph representation (graph.pbtxt) using tf_text_graph_ssd.py found in https://github.com/opencv/opencv/wiki/TensorFlow-Object-Detection-API
python tf_text_graph_ssd.py --input path2frozen_graph.pb --config path2pipeline.config --output outputgraph.pbtxt
The execution of this script returns the following error:
cv.dnn.writeTextGraph(modelPath, outputPath)
cv2.error: OpenCV(4.5.5) D:\a\opencv-python\opencv-python\opencv\modules\dnn\src\tensorflow\tf_graph_simplifier.cpp:1052: error: (-215:Assertion failed) permIds.size() == net.node_size() in function 'cv::dnn::dnn4_v20211220::sortByExecutionOrder'
During the handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Tensorflow\generatepBtxtgraph\tf_text_graph_ssd.py", line 413, in <module>
createSSDGraph(args.input, args.config, args.output)
File "C:\Tensorflow\generatepBtxtgraph\tf_text_graph_ssd.py", line 127, in createSSDGraph
writeTextGraph(modelPath, outputPath, outNames)
File "C:\Tensorflow\generatepBtxtgraph\tf_text_graph_common.py", line 320, in writeTextGraph
from tensorflow.tools.graph_transforms import TransformGraph
ModuleNotFoundError: No module named 'tensorflow.tools.graph_transforms'
Trying to read the generated frozen model (without a graph.pbtxt) using dnn.readNet with the code below:
import cv2
import numpy as np

image = cv2.imread("img002500.jpg")
if image is None:
    print("image empty")
image_height, image_width, _ = image.shape
net = cv2.dnn.readNet('frozen_graph_centernet.pb')
image = image.astype(np.float32)
# create blob from image (opencv dnn way of pre-processing)
input_blob = cv2.dnn.blobFromImage(image, 1, (1024,1024), 0, swapRB=False, crop=False)
net.setInput(input_blob)
output = net.forward()
This returns the following error:
Traceback (most recent call last):
File "C:\Tensorflow\testFrozengraphTF2.py", line 14, in <module>
output = net.forward()
cv2.error: OpenCV(4.5.5) D:\a\opencv-python\opencv-python\opencv\modules\dnn\src\dnn.cpp:621: error: (-2:Unspecified error) Can't create layer "StatefulPartitionedCall" of type "StatefulPartitionedCall" in function 'cv::dnn::dnn4_v20211220::LayerData::getLayerInstance'
I understand that OpenCV doesn't import models containing StatefulPartitionedCall (TF Eager mode). Unfortunately, this means the script I found for exporting my saved model to a frozen graph did not work.
Saved model:
You can get my saved model from the link below.
https://www.dropbox.com/s/liw5ff87rz7v5n5/my_model.zip?dl=0
Note: the exported model works well with the TensorFlow script.
For the 2nd method (getting a frozen graph from the saved model):
Make the frozen graph (make_FB) by following this guide:
https://medium.com/@sebastingarcaacosta/how-to-export-a-tensorflow-2-x-keras-model-to-a-frozen-and-optimized-graph-39740846d9eb
Then read it with OpenCV's Python bindings:
model = cv.dnn.readNetFromTensorflow('./frozen_graph2.pb')
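A minimal sketch of that route, following the linked guide (the Keras model path, input image, and blob size are placeholders; an Object Detection API saved model may still hit the StatefulPartitionedCall limitation described above):

import cv2 as cv
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

# load a Keras model and freeze it into a single GraphDef
model = tf.keras.models.load_model('my_keras_model')   # placeholder path
full_model = tf.function(lambda x: model(x))
full_model = full_model.get_concrete_function(
    tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype))
frozen_func = convert_variables_to_constants_v2(full_model)
tf.io.write_graph(frozen_func.graph, '.', 'frozen_graph2.pb', as_text=False)

# read the frozen graph with OpenCV dnn
net = cv.dnn.readNetFromTensorflow('./frozen_graph2.pb')
blob = cv.dnn.blobFromImage(cv.imread('img002500.jpg'), 1.0, (640, 640))
net.setInput(blob)
out = net.forward()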

ValueError: [E143] Labels for component 'tagger' not initialized

I've been following this tutorial to create a custom NER. However, I keep getting this error:
ValueError: [E143] Labels for component 'tagger' not initialized. This can be fixed by calling add_label, or by providing a representative batch of examples to the component's initialize method.
This is how I defined the spacy model:
import spacy
from spacy.tokens import DocBin
from tqdm import tqdm
nlp = spacy.blank("ro") # load a new spacy model
source_nlp = spacy.load("ro_core_news_lg")
nlp.tokenizer.from_bytes(source_nlp.tokenizer.to_bytes())
nlp.add_pipe("tagger", source=source_nlp)
doc_bin = DocBin() # create a DocBin object
I just met the same problem. The picture showing how to set up the config file in the tutorial is misleading.
If you just want to run through the tutorial, set up the config so that only the checkbox for ner is ticked.
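In code terms, the workaround amounts to not having an untrained, uninitialized tagger in the pipeline at all; the custom NER component is then declared in the training config generated with only "ner" ticked. A minimal sketch of the data-prep script under that assumption:

import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("ro")                      # blank Romanian pipeline
source_nlp = spacy.load("ro_core_news_lg")
nlp.tokenizer.from_bytes(source_nlp.tokenizer.to_bytes())

# Do NOT add a sourced "tagger" here: a component that is listed in the
# pipeline but never trained or initialized is what triggers E143.
# The "ner" component is added through the training config instead.

doc_bin = DocBin()                           # collect training docs as before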

client.upload_file() for nested modules

I have a project structured as follows:
- topmodule/
  - childmodule1/
    - my_func1.py
  - childmodule2/
    - my_func2.py
  - common.py
  - __init__.py
From my Jupyter notebook on an edge-node of a Dask cluster, I am doing the following
from topmodule.childmodule1.my_func1 import MyFuncClass1
from topmodule.childmodule2.my_func2 import MyFuncClass2
Then I am creating a distributed client and submitting work as follows:
client = Client(YarnCluster())
client.submit(MyFuncClass1.execute)
This errors out, because the workers do not have the files of topmodule.
"/mnt1/yarn/usercache/hadoop/appcache/application_1572459480364_0007/container_1572459480364_0007_01_000003/environment/lib/python3.7/site-packages/distributed/protocol/pickle.py", line 59, in loads return pickle.loads(x) ModuleNotFoundError: No module named 'topmodule'
So what I tried to do is upload every single file under "topmodule". The files directly under "topmodule" seem to get uploaded, but the nested ones do not. Below is what I am talking about:
Code:
from pathlib import Path

for filename in Path('topmodule').rglob('*.py'):
    print(filename)
    client.upload_file(filename)
Console output:
topmodule/common.py # processes fine
topmodule/__init__.py # processes fine
topmodule/childmodule1/my_func1.py # throws error
Traceback:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-13-dbf487d43120> in <module>
3 for filename in Path('nodes').rglob('*.py'):
4 print(filename)
----> 5 client.upload_file(filename)
~/miniconda/lib/python3.7/site-packages/distributed/client.py in upload_file(self, filename, **kwargs)
2929 )
2930 if isinstance(result, Exception):
-> 2931 raise result
2932 else:
2933 return result
ModuleNotFoundError: No module named 'topmodule'
My question is - how can I upload an entire module and its files to workers? Our module is big so I want to avoid restructuring it just for this issue, unless the way we're structuring the module is fundamentally flawed.
Or is there a better way to have all Dask workers understand the modules, perhaps from a git repository?
When you call upload_file on every file individually, you lose the directory structure of your module.
If you want to upload a more comprehensive module, you can package it up into a zip or egg file and upload that.
https://docs.dask.org/en/latest/futures.html#distributed.Client.upload_file
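A minimal sketch of that approach, assuming the topmodule directory sits in the current working directory (the cluster setup is a placeholder):

import shutil
from dask.distributed import Client

# package the whole module, preserving its directory structure
shutil.make_archive("topmodule", "zip", root_dir=".", base_dir="topmodule")

client = Client()  # placeholder; e.g. Client(YarnCluster()) as in the question
# .zip (and .egg) archives are placed on the workers' import path
client.upload_file("topmodule.zip")

from topmodule.childmodule1.my_func1 import MyFuncClass1
client.submit(MyFuncClass1.execute)

Because the zip preserves the package layout, the nested childmodule imports resolve on the workers.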

"invalid sequence" error in seqio.write() of biopython

This question is related to bioinformatics. I did not receive any suggestions in the corresponding forums, so I am writing it here.
I need to remove non-ACTG nucleotides from a fasta file and write the output to a new file using SeqIO from Biopython.
My code is:
import re
import sys
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC

seq_list = []
for seq_record in SeqIO.parse("test.fasta", "fasta", IUPAC.ambiguous_dna):
    sequence = seq_record.seq
    sequence = sequence.tomutable()
    seq_record.seq = re.sub('[^GATC]', "", str(sequence).upper())
    seq_list.append(seq_record)

SeqIO.write(seq_list, "test_out", "fasta")
Running this code gives errors:
Traceback (most recent call last):
File "remove.py", line 18, in <module>
SeqIO.write(list,"test_out","fasta")
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 481, in write
count = writer_class(fp).write_file(sequences)
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages /Bio/SeqIO/Interfaces.py", line 209, in write_file
count = self.write_records(records)
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 194, in write_records
self.write_record(record)
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/FastaIO.py", line 202, in write_record
data = self._get_seq_string(record) # Catches sequence being None
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 100, in _get_seq_string
% record.id)
TypeError: SeqRecord (id=CALB_TCONS_00001015) has an invalid sequence.
If I change this line
seq_record.seq = re.sub('[^GATC]',"",str(sequence).upper())
to, for example, seq_record.seq = sequence + "A", everything works fine. However, re.sub('[^GATC]',"",str(sequence).upper()) should also work in theory.
Thanks
Biopython's SeqIO expects the SeqRecord object's .seq to be a Seq object (or similar), not a plain string. Try:
seq_record.seq = Seq(re.sub('[^GATC]',"",str(sequence).upper()))
For FASTA output there is no need to set the sequence's alphabet.
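Putting it together, a minimal sketch of the corrected loop (same file names as in the question; the IUPAC alphabet argument is simply dropped, which also matches current Biopython, where Bio.Alphabet no longer exists):

import re
from Bio import SeqIO
from Bio.Seq import Seq

seq_list = []
for seq_record in SeqIO.parse("test.fasta", "fasta"):
    # strip everything that is not G, A, T or C, then wrap back into a Seq
    cleaned = re.sub('[^GATC]', '', str(seq_record.seq).upper())
    seq_record.seq = Seq(cleaned)
    seq_list.append(seq_record)

SeqIO.write(seq_list, "test_out", "fasta")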
