Error "StreamModeError: Fasta files must be opened in text mode" in BioPython on a Streamlit App - biopython

Please assist, how can I resolve this error?
I am building a Streamlit app that takes uploaded FASTA files as input and reads them.
I am getting the error below when I try to read an uploaded FASTA file in my Streamlit app.
THE ERROR:
StreamModeError: Fasta files must be opened in text mode.
Traceback:
File "C:\Users\Sir Roberto\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\scriptrunner\script_runner.py", line 475, in _run_script
exec(code, module.__dict__)
File "C:\Users\Sir Roberto\PycharmProjects\SARS_CoV_2_Mutation_Forecasting_GUI\SARS_CoV_2_Mutation_Forecasting_GUI.py", line 45, in <module>
main()
File "C:\Users\Sir Roberto\PycharmProjects\SARS_CoV_2_Mutation_Forecasting_GUI\SARS_CoV_2_Mutation_Forecasting_GUI.py", line 21, in main
protein_sample = SeqIO.read(seq_file, 'fasta')
File "C:\Users\Sir Roberto\AppData\Local\Programs\Python\Python310\lib\site-packages\Bio\SeqIO\__init__.py", line 652, in read
iterator = parse(handle, format, alphabet)
File "C:\Users\Sir Roberto\AppData\Local\Programs\Python\Python310\lib\site-packages\Bio\SeqIO\__init__.py", line 605, in parse
return iterator_generator(handle)
File "C:\Users\Sir Roberto\AppData\Local\Programs\Python\Python310\lib\site-packages\Bio\SeqIO\FastaIO.py", line 183, in __init__
super().__init__(source, mode="t", fmt="Fasta")
File "C:\Users\Sir Roberto\AppData\Local\Programs\Python\Python310\lib\site-packages\Bio\SeqIO\Interfaces.py", line 53, in __init__
raise StreamModeError(
THE CODE GIVING THE ERROR
import streamlit as st
import tensorflow as tf
import tensorflow_io as tfio
from Bio import SeqIO


def main():
    st.title('Covid-19 Mutation Forecasting App')
    menu = ['Forecast Mutation', 'About The App']
    choice = st.sidebar.selectbox('Select Activity', menu)
    if choice == 'Forecast Mutation':
        st.subheader('Mutation Forecasting Workspace')
        seq_file = st.file_uploader('Upload a Sequence File:', type=['fasta'])
        if seq_file is not None:
            protein_sample = SeqIO.read(seq_file, 'fasta')
            st.write(protein_sample)
            loaded_model = tf.keras.models.load_model("")
            next_step = st.checkbox('Forecast')
            if next_step:
                states = None
                next_char = tfio.genome.read_fastaq(protein_sample)
                result = [next_char]
                for n in range(100):
                    next_char, states = one_step_reloaded.generate_one_step(next_char, states=states)
                    result.append(next_char)
                print(tf.strings.join(result)[0].numpy().decode("utf-8"))
    else:
        st.subheader('About The App')
        st.caption('Given the first few codons of a SARS-CoV-2 Spike Protein, '
                   'the App forecasts and display the complete sequence of the mutant')


if __name__ == '__main__':
    main()

The following code should get you started:
import streamlit as st
import tensorflow as tf
import tensorflow_io as tfio
from Bio import SeqIO
from io import StringIO


def main():
    st.title('Covid-19 Mutation Forecasting App')
    menu = ['Forecast Mutation', 'About The App']
    choice = st.sidebar.selectbox('Select Activity', menu)
    if choice == 'Forecast Mutation':
        st.subheader('Mutation Forecasting Workspace')
        seq_file = st.file_uploader('Upload a Sequence File:', type=['fasta'])
        if seq_file is not None:
            # To convert to a string based IO:
            stringio = StringIO(seq_file.getvalue().decode("utf-8"))
            for record in SeqIO.parse(stringio, 'fasta'):
                sequence = str(record.seq)
                st.write(f'Length of sequence: {len(sequence)}')
                # The unique characters in the FASTA file
                vocab = sorted(set(sequence))
                st.write(f"{len(vocab)} unique characters: {', '.join(vocab)}")
                tensor = tfio.genome.sequences_to_onehot(sequence)
                st.write(tensor)
    else:
        st.subheader('About The App')
        st.caption('Given the first few codons of a SARS-CoV-2 Spike Protein, '
                   'the App forecasts and display the complete sequence of the mutant')


if __name__ == '__main__':
    main()
I used tfio.genome.sequences_to_onehot instead of tfio.genome.read_fastaq, because you are reading a FASTA file, not a FASTQ file.
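If only a single record per upload is expected, another option is to wrap the uploaded binary buffer in a text-mode handle so SeqIO can read it directly; this is a minimal sketch, assuming exactly one record per FASTA file:

import io

import streamlit as st
from Bio import SeqIO

seq_file = st.file_uploader('Upload a Sequence File:', type=['fasta'])
if seq_file is not None:
    # Wrap the binary upload in a text-mode handle so BioPython does not
    # raise StreamModeError; SeqIO.read expects exactly one record.
    text_handle = io.TextIOWrapper(seq_file, encoding='utf-8')
    protein_sample = SeqIO.read(text_handle, 'fasta')
    st.write(protein_sample.id, len(protein_sample.seq))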

Related

How to convert TensorFlow 2 saved model to be used with OpenCV dnn.readNet

I am struggling to find a way to convert my network, trained with the TensorFlow 2 Object Detection API, so that it can be used with OpenCV for deployment purposes. I tried two methods for that, but without success.
Could someone help me resolve this issue, or suggest an easy deep learning framework that converts cleanly to an OpenCV-friendly format?
I really appreciate any help you can provide.
This is my system information:
OS Platform: Windows 10 64 bits
Tensorflow Version: 2.8
Python version: 3.9.7
OpenCV version: 4.5.5
1st Method: Using tf2onnx
I used the following command, since I am using TensorFlow 2:
python -m tf2onnx.convert --saved-model tensorflow-model-path --output model.onnx --opset 15
The conversion process generates model.onnx successfully.
However, when I try to read the converted model, I get the following error:
File "C:\Tensorflow\testcovertedTF2ToONNX.py", line 10, in <module> net = cv2.dnn.readNetFromONNX('C:/Tensorflow/model.onnx') cv2.error: Unknown C++ exception from OpenCV code
The code used to read the converted network is simple.
import cv2
import numpy as np

image = cv2.imread("img002500.jpg")
if image is None:
    print("image empty")
image_height, image_width, _ = image.shape
net = cv2.dnn.readNetFromONNX('model.onnx')
image = image.astype(np.float32)
input_blob = cv2.dnn.blobFromImage(image, 1, (640,640), 0, swapRB=False, crop=False)
net.setInput(input_blob)
output = net.forward()
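As a sanity check, the converted file can be validated with the onnx package before handing it to OpenCV, which helps tell a conversion problem apart from an importer problem; a minimal sketch, assuming model.onnx is the file produced by tf2onnx above:

import onnx

# Load and structurally validate the converted model; check_model raises if
# the graph is malformed, which would point at the conversion step rather
# than OpenCV's ONNX importer.
model = onnx.load("model.onnx")
onnx.checker.check_model(model)

# Inspect the declared inputs so the blobFromImage dimensions can be matched.
for inp in model.graph.input:
    dims = [d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)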
2nd Method: Trying to get Frozen graph from saved model
I tried to get frozen_graph.pb from my saved_model using the script below, found in
https://github.com/opencv/opencv/issues/16879#issuecomment-603815872
import tensorflow as tf
print(tf.__version__)

from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

loaded = tf.saved_model.load('models/mnist_test')
infer = loaded.signatures['serving_default']
f = tf.function(infer).get_concrete_function(
    input_tensor=tf.TensorSpec(shape=[None, 640, 640, 3], dtype=tf.float32))
f2 = convert_variables_to_constants_v2(f)
graph_def = f2.graph.as_graph_def()

# Export frozen graph
with tf.io.gfile.GFile('frozen_graph.pb', 'wb') as f:
    f.write(graph_def.SerializeToString())
Then, I tried to generate the text graph representation (graph.pbtxt) using tf_text_graph_ssd.py found in https://github.com/opencv/opencv/wiki/TensorFlow-Object-Detection-API
python tf_text_graph_ssd.py --input path2frozen_graph.pb --config path2pipeline.config --output outputgraph.pbtxt
The execution of this script returns the following error:
cv.dnn.writeTextGraph(modelPath, outputPath)
cv2.error: OpenCV(4.5.5) D:\a\opencv-python\opencv-python\opencv\modules\dnn\src\tensorflow\tf_graph_simplifier.cpp:1052: error: (-215:Assertion failed) permIds.size() == net.node_size() in function 'cv::dnn::dnn4_v20211220::sortByExecutionOrder'
During the handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Tensorflow\generatepBtxtgraph\tf_text_graph_ssd.py", line 413, in <module>
createSSDGraph(args.input, args.config, args.output)
File "C:\Tensorflow\generatepBtxtgraph\tf_text_graph_ssd.py", line 127, in createSSDGraph
writeTextGraph(modelPath, outputPath, outNames)
File "C:\Tensorflow\generatepBtxtgraph\tf_text_graph_common.py", line 320, in writeTextGraph
from tensorflow.tools.graph_transforms import TransformGraph
ModuleNotFoundError: No module named 'tensorflow.tools.graph_transforms'
Trying to read the generated frozen model (without a graph.pbtxt) using dnn.readNet with the code below:
import cv2
import numpy as np

image = cv2.imread("img002500.jpg")
if image is None:
    print("image empty")
image_height, image_width, _ = image.shape
net = cv2.dnn.readNet('frozen_graph_centernet.pb')
image = image.astype(np.float32)
# create blob from image (opencv dnn way of pre-processing)
input_blob = cv2.dnn.blobFromImage(image, 1, (1024,1024), 0, swapRB=False, crop=False)
net.setInput(input_blob)
output = net.forward()
This returns the following error:
Traceback (most recent call last):
File "C:\Tensorflow\testFrozengraphTF2.py", line 14, in <module>
output = net.forward()
cv2.error: OpenCV(4.5.5) D:\a\opencv-python\opencv-python\opencv\modules\dnn\src\dnn.cpp:621: error: (-2:Unspecified error) Can't create layer "StatefulPartitionedCall" of type "StatefulPartitionedCall" in function 'cv::dnn::dnn4_v20211220::LayerData::getLayerInstance'
I understand that OpenCV doesn't import models containing StatefulPartitionedCall (TF eager mode). Unfortunately, this means the script I found for exporting my saved model to a frozen graph did not work.
Saved model
You can get my saved model from the link below:
https://www.dropbox.com/s/liw5ff87rz7v5n5/my_model.zip?dl=0
Note: the exported model works well with the TensorFlow script.
2nd method: getting a frozen graph from the saved model.
Make the frozen graph as described in
https://medium.com/#sebastingarcaacosta/how-to-export-a-tensorflow-2-x-keras-model-to-a-frozen-and-optimized-graph-39740846d9eb
then use the OpenCV Python bindings to read it:
model = cv.dnn.readNetFromTensorflow('./frozen_graph2.pb')
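For reference, a minimal sketch of that route, assuming a plain SavedModel at models/my_model (a hypothetical path) with a fixed 1x640x640x3 input and a serving signature whose argument is named input_tensor, as in the question; models that contain StatefulPartitionedCall-only ops will still not import:

import cv2 as cv
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

# Freeze the SavedModel's serving signature into a single GraphDef.
loaded = tf.saved_model.load('models/my_model')
infer = loaded.signatures['serving_default']
concrete = tf.function(infer).get_concrete_function(
    input_tensor=tf.TensorSpec(shape=[1, 640, 640, 3], dtype=tf.float32))
frozen = convert_variables_to_constants_v2(concrete)
tf.io.write_graph(frozen.graph.as_graph_def(), '.', 'frozen_graph2.pb', as_text=False)

# Read the frozen graph with OpenCV's DNN module.
net = cv.dnn.readNetFromTensorflow('./frozen_graph2.pb')
print(net.getLayerNames()[:5])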

How to address a Snakemake error at the stage of DAG computation?

I'm running into an error with my Snakemake variant identification pipeline when the original DAG of jobs is built. I believe this is a memory issue; when I test with a short list of input files, the DAG is constructed without issue, however, when I try with 300+ paired-end FASTQ inputs, I receive the following error:
Building DAG of jobs...
Traceback (most recent call last):
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/__init__.py", line 633, in snakemake
keepincomplete=keep_incomplete,
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/workflow.py", line 568, in execute
dag.check_incomplete()
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/dag.py", line 281, in check_incomplete
incomplete = self.incomplete_files
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/dag.py", line 402, in incomplete_files
filterfalse(self.needrun, self.jobs),
File "/home/k/.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/dag.py", line 399, in <genexpr>
job.output
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/persistence.py", line 205, in incomplete
return any(map(lambda f: f.exists and marked_incomplete(f), job.output))
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/persistence.py", line 205, in <lambda>
return any(map(lambda f: f.exists and marked_incomplete(f), job.output))
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/persistence.py", line 203, in marked_incomplete
return self._read_record(self._metadata_path, f).get("incomplete", False)
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/persistence.py", line 322, in _read_record_cached
return self._read_record_uncached(subject, id)
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/persistence.py", line 328, in _read_record_uncached
return json.load(f)
File "/home//.conda/envs/snakemake/lib/python3.6/json/__init__.py", line 299, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/home//.conda/envs/snakemake/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/home//.conda/envs/snakemake/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home//.conda/envs/snakemake/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I'm not sure how to resolve this: is this a known bug, or is there a way to define my pipeline so that it builds a less complex DAG? I am including the first section of my Snakefile as well. I use the rule all to define all desired output files.
################################
#### Mtb bwa/GATK Snakemake ####
################################

import numpy as np
from collections import defaultdict
import pandas as pd

samples_df = pd.read_table('config/tgen_samples2a.tsv', sep=',').set_index("sample", drop=False)
sample_names = list(samples_df['sample'])
batch_names = list(samples_df['batch'])
# print(sample_names)

# fastq1 input function definition
def fq1_from_sample(wildcards):
    return samples_df.loc[wildcards.sample, "fastq_1"]

# fastq2 input function definition
def fq2_from_sample(wildcards):
    return samples_df.loc[wildcards.sample, "fastq_2"]

# Define config file. Stores sample names and other things.
configfile: "config/config.yaml"

# Define a rule for running the complete pipeline.
rule all:
    wildcard_constraints:
        batch="IS-.+"
    input:
        trim = expand(['results/{batch}/{samp}/trim/{samp}_trim_1.fq.gz'], zip, samp=sample_names, batch=batch_names),
        kraken = expand('results/{batch}/{samp}/kraken/{samp}_trim_kr_1.fq.gz', zip, samp=sample_names, batch=batch_names),
        # When using zip, need to use vectors of equal lengths for all wildcards.
        bams = expand('results/{batch}/{samp}/bams/{samp}_{mapper}_{ref}_sorted.bam', zip, samp=sample_names, batch=batch_names, ref=config['ref']*len(sample_names), mapper=config['mapper']*len(sample_names)),
        per_samp_run_stats = expand('results/{batch}/{samp}/stats/{samp}_{mapper}_{ref}_combined_stats.csv', zip, samp=sample_names, batch=batch_names, ref=config['ref']*len(sample_names), mapper=config['mapper']*len(sample_names)),
        amr_stats = expand('results/{batch}/{samp}/stats/{samp}_{mapper}_{ref}_amr.csv', samp=sample_names, batch=batch_names, ref=config['ref'], mapper=config['mapper']),
        cov_stats = expand('results/{batch}/{samp}/stats/{samp}_{mapper}_{ref}_cov_stats.txt', samp=sample_names, batch=batch_names, ref=config['ref'], mapper=config['mapper']),
        all_sample_stats = expand('results/{batch}/stats/combined_per_run_sample_stats.csv', batch=batch_names),
        vcfs = expand('results/{batch}/{samp}/vars/{samp}_{mapper}_{ref}_{caller}_qfilt.vcf.gz', samp=sample_names, batch=batch_names, ref=config['ref'], mapper=config['mapper'], caller=config['caller']),
        ann_vcfs = expand('results/{batch}/{samp}/vars/{samp}_{mapper}_{ref}_gatk_ann.vcf.gz', samp=sample_names, batch=batch_names, ref=config['ref'], mapper=config['mapper'], caller=config['caller']),
        fastas = expand('results/{batch}/{samp}/fasta/{samp}_{mapper}_{ref}_{caller}_{filter}.fa', samp=sample_names, batch=batch_names, ref=config['ref'], mapper=config['mapper'], caller=config['caller'], filter=config['filter']),
        profiles = expand('results/{batch}/{samp}/stats/{samp}_{mapper}_{ref}_lineage.csv', samp=sample_names, batch=batch_names, ref=config['ref'], mapper=config['mapper'])

# Trim reads for quality.
rule trim_reads:
    input:
        p1=fq1_from_sample,
        p2=fq2_from_sample
    output:
        trim1='results/{batch}/{sample}/trim/{sample}_trim_1.fq.gz',
        trim2='results/{batch}/{sample}/trim/{sample}_trim_2.fq.gz'
    log:
        'results/{batch}/{sample}/trim/{sample}_trim_reads.log'
    shell:
        '{config[scripts_dir]}trim_reads.sh {input.p1} {input.p2} {output.trim1} {output.trim2} &>> {log}'

# Filter reads taxonomically with Kraken.
rule taxonomic_filter:
    input:
        trim1='results/{batch}/{samp}/trim/{samp}_trim_1.fq.gz',
        trim2='results/{batch}/{samp}/trim/{samp}_trim_2.fq.gz'
    output:
        kr1='results/{batch}/{samp}/kraken/{samp}_trim_kr_1.fq.gz',
        kr2='results/{batch}/{samp}/kraken/{samp}_trim_kr_2.fq.gz',
        kraken_report='results/{batch}/{samp}/kraken/{samp}_kraken.report',
        kraken_stats='results/{batch}/{samp}/kraken/{samp}_kraken_stats.csv'
    log:
        'results/{batch}/{samp}/kraken/{samp}_kraken.log'
    threads: 8
    shell:
        '{config[scripts_dir]}run_kraken.sh {input.trim1} {input.trim2} {output.kr1} {output.kr2} {output.kraken_report} &>> {log}'
Thank you in advance for help using Snakemake!
All the best,
I kind of doubt memory is an issue. 300+ is not much, especially if each of them is processed independently of the others.
Try starting from the subset of samples that you say worked and gradually increase it until you see the problem appear. Perhaps you have some funny value in your sample sheet or in your config? The json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) hints at something like that, in my impression.
The answer was from @TroyComi, above: after deleting the .snakemake directory, the issue was resolved. Thank you!
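For anyone hitting the same JSONDecodeError, a small diagnostic sketch (assuming the default .snakemake/metadata location) that finds the empty or corrupted metadata records before resorting to deleting the whole .snakemake directory:

import json
from pathlib import Path

# Snakemake keeps one small JSON record per output file under .snakemake/metadata;
# an empty or truncated record produces "Expecting value: line 1 column 1 (char 0)".
for path in Path(".snakemake/metadata").rglob("*"):
    if not path.is_file():
        continue
    try:
        json.loads(path.read_text())
    except json.JSONDecodeError:
        print(f"corrupted metadata record: {path}")
        # path.unlink()  # uncomment to remove just the bad records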

Gensim: error loading pretrained vectors No such file or directory: 'word2vec.kv.vectors.npy'

I am trying to load pretrained word2vec embeddings stored as a gensim KeyedVectors file, 'word2vec.kv':
pretrained = KeyedVectors.load(args.pretrained, mmap='r')
where args.pretrained is "/ptembs/word2vec.kv",
and I am getting this error:
File "main.py", line 60, in main
pretrained = KeyedVectors.load(args.pretrained, mmap = 'r')
File "C:\Users\ASUS\anaconda3\lib\site-packages\gensim\models\keyedvectors.py", line 1553, in load
model = super(WordEmbeddingsKeyedVectors, cls).load(fname_or_handle, **kwargs)
File "C:\Users\ASUS\anaconda3\lib\site-packages\gensim\models\keyedvectors.py", line 228, in load
return super(BaseKeyedVectors, cls).load(fname_or_handle, **kwargs)
File "C:\Users\ASUS\anaconda3\lib\site-packages\gensim\utils.py", line 436, in load obj._load_specials(fname, mmap, compress, subname)
File "C:\Users\ASUS\anaconda3\lib\site-packages\gensim\utils.py", line 478, in _load_specials
val = np.load(subname(fname, attrib), mmap_mode=mmap)
File "C:\Users\ASUS\anaconda3\lib\site-packages\numpy\lib\npyio.py", line 417, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'ptembs/word2vec.kv.vectors.npy'
I don't understand why it needs the word2vec.kv.vectors.npy file, and I don't have it.
Any idea how to solve this problem?
gensim version: 3.8.3.
I tried it on 4.1.2 as well; same error.
Where did you get the file 'word2vec.kv'?
If loading that file triggers an error mentioning a 2nd file by name, then that 2nd file should've been created alongside 'word2vec.kv' when it was 1st saved using a .save() operation.
That other file needs to be kept alongside 'word2vec.kv' in order for 'word2vec.kv' to be .load()ed again in the future.
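To illustrate, a small sketch (using the path from the question) that lists the companion files gensim expects next to the .kv file; if word2vec.kv.vectors.npy is missing from that listing, it was lost when the files were copied, or was never produced by the original .save():

import os

from gensim.models import KeyedVectors

kv_path = "ptembs/word2vec.kv"

# KeyedVectors.save() writes the main .kv file plus sidecar .npy files for
# large arrays (e.g. word2vec.kv.vectors.npy); all of them must stay together.
folder = os.path.dirname(kv_path) or "."
base = os.path.basename(kv_path)
print([name for name in os.listdir(folder) if name.startswith(base)])

# Loading (with or without mmap) only works when the sidecars are present.
kv = KeyedVectors.load(kv_path, mmap="r")
print(kv.vectors.shape)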

Problems with OAuth2 and gspread

I've had some working API code for quite a long time, but suddenly (about 30 minutes after the previous use of the API) it stopped working.
Here's the traceback:
row_cells = self.range('%s:%s' % (start_cell, end_cell))
File"/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gspread/models.py", line 72, in wrapper
return method(self, *args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gspread/models.py", line 412, in range
params={'range': name, 'return-empty': 'true'}
File "/Lbrary/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gspread/client.py", line 176, in get_cells_feed
r = self.session.get(url)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gspread/httpsession.py", line 73, in get
return self.request('GET', url, params=params, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gspread/httpsession.py", line 69, in request
response.status_code, response.content))
gspread.exceptions.RequestError: (401, '401: b\'<HTML>\\n<HEAD>\\n<TITLE>Unauthorized</TITLE>\\n</HEAD>\\n<BODY BGCOLOR="#FFFFFF" TEXT="#000000">\\n<H1>Unauthorized</H1>\\n<H2>Error 401</H2>\\n</BODY>\\n</HTML>\\n\'')
And I don't really understand this.
Here's the code:
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import pprint
scope = [ 'https://spreadsheets.google.com/feeds' ]
creds = ServiceAccountCredentials.from_json_keyfile_name('client_secret.json', scope)
client = gspread.authorize(creds)
sheet = client.open('sheet_name').sheet1
I really don't know what to do. I've already created a new API service-account email address and downloaded the JSON key file (client_secret.json), but it still isn't working, and I honestly don't know why.
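One thing worth checking: with service-account credentials, a 401 like this can appear when only the old feeds scope is requested. This is a hedged sketch of the same flow with the Drive scope added; the key file and sheet names are kept from the code above:

import gspread
from oauth2client.service_account import ServiceAccountCredentials

# Request both the Sheets feed scope and the Drive scope; opening a
# spreadsheet by name goes through the Drive API.
scope = ['https://spreadsheets.google.com/feeds',
         'https://www.googleapis.com/auth/drive']
creds = ServiceAccountCredentials.from_json_keyfile_name('client_secret.json', scope)
client = gspread.authorize(creds)

sheet = client.open('sheet_name').sheet1
print(sheet.row_values(1))  # quick sanity check that authorization works

The spreadsheet also has to be shared with the service account's client_email from client_secret.json.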

"invalid sequence" error in seqio.write() of biopython

This question is related to bioinformatics. I did not receive any suggestions in the corresponding forums, so I am writing it here.
I need to remove non-ACTG nucleotides from a FASTA file and write the output to a new file using SeqIO from Biopython.
My code is:
import re
import sys
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC

seq_list = []
for seq_record in SeqIO.parse("test.fasta", "fasta", IUPAC.ambiguous_dna):
    sequence = seq_record.seq
    sequence = sequence.tomutable()
    seq_record.seq = re.sub('[^GATC]', "", str(sequence).upper())
    seq_list.append(seq_record)

SeqIO.write(seq_list, "test_out", "fasta")
Running this code gives errors:
Traceback (most recent call last):
File "remove.py", line 18, in <module>
SeqIO.write(list,"test_out","fasta")
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 481, in write
count = writer_class(fp).write_file(sequences)
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages /Bio/SeqIO/Interfaces.py", line 209, in write_file
count = self.write_records(records)
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 194, in write_records
self.write_record(record)
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/FastaIO.py", line 202, in write_record
data = self._get_seq_string(record) # Catches sequence being None
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 100, in _get_seq_string
% record.id)
TypeError: SeqRecord (id=CALB_TCONS_00001015) has an invalid sequence.
If I change this line
seq_record.seq = re.sub('[^GATC]',"",str(sequence).upper())
to, for example, seq_record.seq = sequence + "A", everything works fine. However, re.sub('[^GATC]', "", str(sequence).upper()) should also work in theory.
Thanks
Biopython's SeqIO expects the SeqRecord object's .seq to be a Seq object (or similar), not a plain string. Try:
seq_record.seq = Seq(re.sub('[^GATC]',"",str(sequence).upper()))
For FASTA output there is no need to set the sequence's alphabet.
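Putting it together, a minimal sketch of the corrected loop against a recent Biopython (Bio.Alphabet no longer exists and is not needed; the output file is given an explicit .fasta extension here):

import re

from Bio import SeqIO
from Bio.Seq import Seq

records = []
for record in SeqIO.parse("test.fasta", "fasta"):
    # Drop every character that is not G, A, T or C, then wrap the result
    # back into a Seq object so that SeqIO.write accepts the record.
    cleaned = re.sub('[^GATC]', "", str(record.seq).upper())
    record.seq = Seq(cleaned)
    records.append(record)

SeqIO.write(records, "test_out.fasta", "fasta")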
