SeqIO.parse throwing error in genbank files - biopython

I'm working with some genbank seq files and have the following code:
from Bio import SeqIO

for seq_record in SeqIO.parse("datafile_location", "genbank"):

While it runs through most of the sequences in the file (which contains multiple records), I get the following error. Any thoughts on how to fix this? Maybe delete the offending record? It gets to record 92126 of 93145 and then throws the error.
I have tried re-downloading the seq file, but that doesn't fix the problem.
  File "C:\python38\lib\site-packages\Bio\GenBank\Scanner.py", line 516, in parse_records
    record = self.parse(handle, do_features)
  File "C:\python38\lib\site-packages\Bio\GenBank\Scanner.py", line 499, in parse
    if self.feed(handle, consumer, do_features):
  File "C:\python38\lib\site-packages\Bio\GenBank\Scanner.py", line 466, in feed
    self._feed_header_lines(consumer, self.parse_header())
  File "C:\python38\lib\site-packages\Bio\GenBank\Scanner.py", line 1801, in _feed_header_lines
    previous_value_line = structured_comment_dict[
KeyError: 'Assembly-Data'

This seems similar to Biopython issue #2844.
A pull request was recently merged to address it.
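Until you are on a Biopython release that contains the fix, one workaround is to iterate manually and skip records whose parsing raises. Be aware this is a sketch under an assumption that may not hold for GenBank files: an error can leave the underlying handle mid-record, so the scanner may not cleanly resume at the next record. The skip-on-error pattern itself, shown with a toy iterator rather than SeqIO.parse:

```python
def skip_parse_errors(records, error=KeyError):
    """Yield items from an iterator, skipping an item whose next() raises."""
    it = iter(records)
    while True:
        try:
            yield next(it)
        except StopIteration:
            return
        except error as exc:
            print("skipping record:", exc)

def toy_records():
    # Stand-in for SeqIO.parse(...); the third record is broken
    yield "rec1"
    yield "rec2"
    raise KeyError("Assembly-Data")

assert list(skip_parse_errors(toy_records())) == ["rec1", "rec2"]
```

Note that once the toy generator raises, it is finished, which mirrors the worst case with a real parser: everything after the bad record is lost, so upgrading is still the real fix.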

Retrieving SwissProt from ExPASy server - error

I've just started learning my way around Biopython and I'm trying to use ExPASy to retrieve SwissProt records, as described on page 180 of the Biopython tutorial (http://biopython.org/DIST/docs/tutorial/Tutorial.pdf) and in a relevant ROSALIND exercise (http://rosalind.info/problems/dbpr/ - click to expand the "Programming shortcut" section).
The code I'm using is basically the same as in the ROSALIND exercise:
from Bio import ExPASy
from Bio import SwissProt
handle = ExPASy.get_sprot_raw('Q5SLP9')
record = SwissProt.read(handle)
However, the SwissProt.read function gives the following error messages (I've trimmed some of the filepaths):
Traceback (most recent call last):
  File "code.py", line 4, in <module>
    record = SwissProt.read(handle)
  File "lib\site-packages\Bio\SwissProt\__init__.py", line 151, in read
    record = _read(handle)
  File "lib\site-packages\Bio\SwissProt\__init__.py", line 255, in _read
    _read_ft(record, line)
  File "lib\site-packages\Bio\SwissProt\__init__.py", line 594, in _read_ft
    assert not from_res and not to_res, line
AssertionError: /note="Single-stranded DNA-binding protein"
I found this has been reported on GitHub (https://github.com/biopython/biopython/issues/2417), so I'm not the first one to get this, but I can't find any updated version of the package or any way to fix the issue. Maybe it's because I'm very new to using packages. Could someone help me, please?
Please update your Biopython to version 1.77. The issue was fixed by pull request 2484.
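To check whether the copy you have is new enough, compare Bio.__version__ against "1.77" numerically rather than as a string, since string comparison mis-orders multi-digit components. A minimal sketch, assuming a plain dotted numeric version string (a dev suffix would need extra handling):

```python
def version_at_least(installed: str, required: str) -> bool:
    """Compare dotted version strings numerically, component by component."""
    to_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return to_tuple(installed) >= to_tuple(required)

# e.g. after upgrading: import Bio; assert version_at_least(Bio.__version__, "1.77")
assert version_at_least("1.78", "1.77")
assert not version_at_least("1.76", "1.77")
assert version_at_least("1.100", "1.99")  # numeric, unlike string comparison
```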

CSV file that lists all student answers to the problem

As I followed the steps to download a CSV of problem responses for a problem, it says “The problem responses report is being created. To view the status of the report, see Pending Tasks below.”
But I am not seeing any pending tasks, nor are the files being generated. I can't see the generated file even after refreshing.
However, I can generate other CSV files, for example the Problem Grade Report.
The issue is only with the problem responses report.
PS: When I checked the admin panel, I could see that the request had failed with this error:
{"exception": "AssertionError", "traceback": "Traceback (most recent call last):\n File \"/openedx/venv/local/lib/python2.7/site-packages/celery/app/trace.py\", line 240, in trace_task\n R = retval = fun(*args, **kwargs)\n File \"/openedx/venv/local/lib/python2.7/site-packages/celery/app/trace.py\", line 438, in __protected_call__\n return self.run(*args, **kwargs)\n File \"/openedx/edx-platform/lms/djangoapps/instructor_task/tasks.py\", line 171, in calculate_problem_responses_csv\n return run_main_task(entry_id, task_fn, action_name)\n File \"/openedx/edx-platform/lms/djangoapps/instructor_task/tasks_helper/runner.py\", line 111, in run_main_task\n task_progress = task_fcn(entry_id, course_id, task_input, action_name)\n File \"/openedx/edx-platform/lms/djangoapps/instructor_task/tasks_helper/grades.py\", line 737, in generate\n usage_key_str=problem_location\n File \"/openedx/edx-platform/lms/djangoapps/instructor_task/tasks_helper/grades.py\", line 674, ...", "message": ""}
Please help.
After many tries, I figured out that my issue was caused by an operation Open edX does not handle well: changing the correct answer of a problem after some students have submitted their responses. I was able to get their grades recalculated against the new answer using the Re-score option, but I am still unable to generate this report.
I will update here if I find a solution.

XMLStreamException on import-graphml

I exported the Neo4j database to GraphML using the neo4j-shell-tools format, but while importing the database back on the production server I am getting the following error.
XMLStreamException: ParseError at [row,col]:[2542885,95]
Message: An invalid XML character (Unicode: 0x8) was found in the element content of the document.
But there is no such character on line 2542885.
I even deleted that line with sed (sed -i '2542885d' on the file), but I still get the same error at the same line while importing. Strange.
It seems the line number sed refers to is not the same as the line at which the error is being thrown.
Please help; I have spent a day trying to resolve this error, with no success.
Thank you. Error resolved.
I used xmllint, which reported the same error at a different line number; replacing that Unicode character resolved the issue.
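For cases like this, a small script can both locate and strip control characters that XML 1.0 forbids, working on raw bytes so the line/column numbers are unambiguous (tools can disagree on line counts when encodings or newline conventions differ). A minimal stdlib sketch; note the reported column is a byte offset, which differs from a character offset in multi-byte UTF-8 text:

```python
import re

# Control bytes invalid in XML 1.0 (tab, LF, CR are allowed and excluded here)
INVALID_XML_CHARS = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def report_invalid_lines(data: bytes):
    """Yield (line_number, byte_column) for each invalid character found."""
    for lineno, line in enumerate(data.splitlines(), 1):
        for match in INVALID_XML_CHARS.finditer(line):
            yield lineno, match.start() + 1

def strip_invalid_xml(data: bytes) -> bytes:
    """Remove bytes that XML 1.0 forbids in element content."""
    return INVALID_XML_CHARS.sub(b"", data)

data = b"<a>ok</a>\n<b>bad\x08char</b>\n"   # 0x8, as in the error message
assert list(report_invalid_lines(data)) == [(2, 7)]
assert strip_invalid_xml(data) == b"<a>ok</a>\n<b>badchar</b>\n"
```

Running the report first shows where the offending bytes actually are before you rewrite the export.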

Erlang with Erlsom and DTD

I am trying to work with a 1 GB XML file and its DTD using Erlsom.
The problem is that parse_sax throws an exception because it cannot handle the DTD file.
I don't actually need this information, so my question is: how do I tell the SAX parser to just ignore it?
Alternatively, could I use try/catch so that when the error is caught, parsing skips that place in the file and continues from there?
This is the exception:
** exception throw: {error,"Malformed: unknown reference: uuml"}
in function erlsom_sax_latin1:nowFinalyTranslate/3 (src/erlsom_sax_latin1.erl, line 1051)
in call from erlsom_sax_latin1:translateReferenceNonCharacter/4 (src/erlsom_sax_latin1.erl, line 1024)
in call from erlsom_sax_latin1:parseTextNoIgnore/3 (src/erlsom_sax_latin1.erl, line 922)
in call from erlsom_sax_latin1:parseContent/2 (src/erlsom_sax_latin1.erl, line 898)
in call from erlsom_sax_latin1:parse/2 (src/erlsom_sax_latin1.erl, line 172)
in call from mapReduce:run/0 (/home/alon/workspace/mapReduce/src/mapReduce.erl, line 26)
The problem is with "uuml", because in the XML file it appears as &uuml;.
Thanks for your help.
I hit the same error and found this in the Erlsom docs under limitations of the SAX parser:
"It doesn't support entities, apart from the predefined ones (&lt; etc.) and character references (&#nnn; and &#xhhh;)."
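Given that limitation, one workaround is to preprocess the file, rewriting HTML named entities such as &uuml; into the numeric character references Erlsom does accept, while leaving the five XML-predefined entities alone. A language-agnostic sketch of the substitution in Python (this assumes the entities carry their closing semicolon; if they are genuinely written as bare &uuml, the pattern needs adjusting):

```python
import re
from html.entities import name2codepoint

# The five entities XML itself predefines; these must stay escaped
XML_PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}

def expand_named_entities(text: str) -> str:
    """Replace HTML named entities (e.g. &uuml;) with numeric character
    references (e.g. &#252;) that a plain XML parser accepts."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined/unknown entities as-is
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)

assert expand_named_entities("M&uuml;ller &lt; 5 &amp; &nosuch;") == \
       "M&#252;ller &lt; 5 &amp; &nosuch;"
```

For a 1 GB file you would stream this line by line rather than load the whole document into memory.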

twitter trends api UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: unexpected code byte

I am trying to follow the sample code of the book "Mining the Social Web", example 1-3.
I know it's old, so I follow the updated sample from the book's web page.
But sometimes I get an error when I run this code:
[ trend.decode('utf-8') for trend in world_trends()[0]['trends'] ]
And the error info is like this:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build/bdist.macosx-10.6-universal/egg/twitter/api.py", line 167, in __call__
File "build/bdist.macosx-10.6-universal/egg/twitter/api.py", line 173, in _handle_response
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: unexpected code byte
It doesn't always happen, but I think no programmer likes such "random" behavior.
Could anyone help me with this issue? What is the problem and how can I solve it?
Thanks!
Byte 0x8b in position 1 usually signals that the data stream is gzipped (the gzip magic number is the two bytes 0x1f 0x8b). For similar problems, see here and here.
To unzip the data stream (Python 2; on Python 3 use io.BytesIO, or gzip.decompress directly):
import gzip
import StringIO

buf = StringIO.StringIO(<response object>.content)
gzip_f = gzip.GzipFile(fileobj=buf)
content = gzip_f.read()
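On Python 3 the same check and unzip can be sketched with the stdlib alone, round-tripping a small string here to stand in for the response body:

```python
import gzip

# Stand-in for a gzipped HTTP response body
raw = gzip.compress("caf\u00e9 trends".encode("utf-8"))

# The gzip magic number: first byte 0x1f, second byte 0x8b --
# the same 0x8b that the UnicodeDecodeError complained about at position 1
assert raw[:2] == b"\x1f\x8b"

text = gzip.decompress(raw).decode("utf-8")
assert text == "caf\u00e9 trends"
```

Checking the first two bytes before decoding is a cheap way to tell "gzipped body" apart from "genuinely corrupt text".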
By default, decode() raises an error if it encounters a byte that it doesn't know how to decode.
You can use trend.decode('utf-8', 'replace') or trend.decode('utf-8', 'ignore') to avoid the error, replacing or silently dropping the offending bytes (though if the data is really gzipped, unzipping it first is the proper fix).
Documentation on decode() here.
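A quick illustration of the two error handlers, using Python 3 bytes:

```python
data = b"trend\x8bname"  # 0x8b alone is not valid UTF-8

assert data.decode("utf-8", "ignore") == "trendname"          # byte dropped
assert data.decode("utf-8", "replace") == "trend\ufffdname"   # U+FFFD inserted

# The default handler ("strict") raises instead:
try:
    data.decode("utf-8")
except UnicodeDecodeError:
    pass
else:
    raise AssertionError("expected UnicodeDecodeError")
```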
