TypeError when attempting to parse pubmed EFetch - biopython

I'm new to this Python/Biopython stuff, so I'm struggling to work out why the following code, pretty much lifted straight out of the Biopython Cookbook, isn't doing what I'd expect.
I'd have thought it'd end up with the interpreter displaying two lists containing the same numbers, but all I get is one list and then a message saying TypeError: 'generator' object is not subscriptable.
I'm guessing something is going wrong with the Medline.parse step and the result of the efetch isn't being processed in a way that allows subsequent iteration to extract the PMID values. Or the efetch isn't returning anything.
Any pointers as to what I'm doing wrong?
Thanks
from Bio import Medline
from Bio import Entrez
Entrez.email = 'A.N.Other#example.com'
handle = Entrez.esearch(db="pubmed", term="biopython")
record = Entrez.read(handle)
print(record['IdList'])
items = record['IdList']
handle2 = Entrez.efetch(db="pubmed", id=items, rettype="medline", retmode="text")
records = Medline.parse(handle2)
for r in records:
    print(records['PMID'])

You're trying to print records['PMID'], but records is a generator, which is why you get the "not subscriptable" error. I think you meant to do print(r['PMID']), which will print the 'PMID' entry of the current record dictionary on each iteration. This is confirmed by the example given in the Bio.Medline.parse() documentation.
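A minimal corrected sketch of the same workflow (same search term and fields as above; the email address is still a placeholder):

from Bio import Entrez, Medline

Entrez.email = 'A.N.Other#example.com'  # placeholder - use your real address

# Search PubMed and collect the matching IDs
handle = Entrez.esearch(db="pubmed", term="biopython")
record = Entrez.read(handle)
items = record['IdList']
print(items)

# Fetch the Medline records and print each PMID
handle2 = Entrez.efetch(db="pubmed", id=items, rettype="medline", retmode="text")
for r in Medline.parse(handle2):
    print(r['PMID'])  # index the record dict, not the generator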


Beautiful soup findAll returns empty list on this website?

I'm trying to extract the property value history from this website: https://www.properly.ca/buy/home/view/ma-tEpHcSzeES-OlhE-V6A/bc/vancouver/1268-w-broadway-%23720/
But my code returns an empty list instead of the property cost history.
I used the following code:
from selenium import webdriver
import time
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
url= "https://www.properly.ca/buy/home/view/ma-tEpHcSzeES-OlhE-V6A/bc/vancouver/1268-w-broadway-%23720/"
driver.maximize_window()
driver.get(url)
time.sleep(5)
content = driver.page_source.encode('utf-8').strip()
soup = BeautifulSoup(content,"html.parser")
officials = soup.findAll("table",{"id":"property-history"})
for entry in officials:
    print(str(entry))
This returns an empty list, even though the page does show a property history table. Any help would be appreciated.
Thanks!
officials = soup.findAll("table",{"id":"property-history"})
In the browser, I don't see a table with id="property-history", but there is a div with that id, so maybe you can instead get the data you want through:
officials = soup.find_all("div", {"id":"property-history"})
Btw, the only table I could find while inspecting the page was inside the map, and I don't think it holds any useful information for you.
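As a rough sketch (assuming the div really is present in the Selenium-rendered page source and that its visible text holds the history entries), you could then pull the text out of that div:

from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
driver.get("https://www.properly.ca/buy/home/view/ma-tEpHcSzeES-OlhE-V6A/bc/vancouver/1268-w-broadway-%23720/")
time.sleep(5)  # crude wait for the JavaScript-rendered content

soup = BeautifulSoup(driver.page_source, "html.parser")
history_div = soup.find("div", {"id": "property-history"})
if history_div is not None:
    # Print the visible text of the history section, one line per entry
    print(history_div.get_text(separator="\n", strip=True))
else:
    print("property-history div not found - the content may need a longer wait")
driver.quit()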

How to retrieve an XMLTYPE data that contains special characters?

I want to retrieve XMLTYPE data from an Oracle table using cx_Oracle.
The data looks like this:
<infos>
<Comment/>
<Observation>àéèç</Observation>
<Level>L3</Level>
<Duration/>
<Cause/>
<Depot> Haren </Depot>
<Resolution/>
</infos>
Here's my code:
#!/usr/bin/python
from __future__ import print_function
import cx_Oracle
# Connection to RTDIAG
try:
    dsn_test = cx_Oracle.makedsn(host='xxxxx', port='1521', service_name='xxxxx')
    con_test = cx_Oracle.connect(user='xxxx', password='xxxxx', dsn=dsn_test)
except cx_Oracle.InterfaceError:
    print("Impossible to connect to the DB!")
    print("***exit script***")
    quit()

ID_record = 1729
cursor = con_test.cursor()
query = """select a.content.getClobVal() from emb_log a
           where ID = :id and uncompleted_record = 1"""
cursor.execute(query, id=ID_record)
xml_retrieved = cursor.fetchone()[0].read()  # string
print (xml_retrieved)
Here's what I get
<infos>
<Comment/>
<Observation>aeec</Observation>
<Level>L3</Level>
<Duration/>
<Cause/>
<Depot> Haren </Depot>
<Resolution/>
</infos>
The special characters contained within the XML children are not being retrieved properly; they are converted into ASCII-like characters.
Why, and how do I fetch the XML exactly the way it appears in the DB?
Thank you.
Set your NLS environment. You will probably find it easiest to use the encoding option when you connect. For performance, you will also want to fetch the CLOB via an output type handler.
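A rough sketch of both suggestions, using placeholder connection details and the standard cx_Oracle output type handler pattern for fetching CLOBs as strings:

import cx_Oracle

def output_type_handler(cursor, name, default_type, size, precision, scale):
    # Fetch CLOB columns directly as long strings instead of LOB locators
    if default_type == cx_Oracle.CLOB:
        return cursor.var(cx_Oracle.LONG_STRING, arraysize=cursor.arraysize)

dsn = cx_Oracle.makedsn(host='xxxxx', port=1521, service_name='xxxxx')
con = cx_Oracle.connect(user='xxxx', password='xxxxx', dsn=dsn,
                        encoding='UTF-8', nencoding='UTF-8')
con.outputtypehandler = output_type_handler

cursor = con.cursor()
cursor.execute(
    """select a.content.getClobVal() from emb_log a
       where ID = :id and uncompleted_record = 1""",
    id=1729,
)
xml_retrieved = cursor.fetchone()[0]  # already a str, no .read() needed
print(xml_retrieved)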

How to correctly return a list of dictionaries in Zapier Code (Python)?

The Zapier code documentation says that the output of a code zap can be either a dictionary or a list of dictionaries (See "Data Variable" section: https://zapier.com/help/code-python/).
When doing this,
output = [{'Booking':'Shirt'},{'Booking':'Jeans'}]
the output of the Code step contains only the first dictionary, however:
runtime_meta__duration_ms: 2
runtime_meta__memory_used_mb: 22
id: [redacted]
Booking: Shirt
Fields with no value:
runtime_meta__logs
What am I doing wrong here? Thanks a lot!
David from the Zapier platform team here. Code steps returning an array is a mostly undocumented (because there's no UI support and it's confusing, as you can tell) feature.
When testing, it'll only show the first item in the array. When it runs for real, all steps after the Code step will run once for each item in the array, and the task history will reflect this.
So set up the zap and turn it on, and it'll work like you expect.
Sorry for the confusion and let me know if you have any other questions!
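To illustrate with the example from the question, once the Zap is switched on:

output = [{'Booking': 'Shirt'}, {'Booking': 'Jeans'}]
# While testing only {'Booking': 'Shirt'} is shown, but in a live run every
# step after this Code step executes once for 'Shirt' and once for 'Jeans'.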
For anyone still looking for an answer to this question, below is what I found out about returning lists in Zapier.
# First, import and convert your input value to a list.
# Note: any line items imported into a Python variable arrive in list format.
my_items = input_data['my_CSV_string']
my_list_of_items = my_items.split(",")

# Create a new list and do all your computations
my_new_list = []
for item in my_list_of_items:
    my_new_list.append(float(item) * 1.5)

# After completing your computations, you can return the list as follows.
# If you are using line items, keep the list in its original format:
return {
    'my_processed_values': my_new_list,
    'original_values': my_list_of_items
}

# Alternatively, if you want to return it as a CSV (basically flattening the array):
my_old_CSV_list = ','.join(map(str, my_list_of_items))
my_new_CSV_list = ','.join(map(str, my_new_list))
return {
    'my_processed_cvs_values': my_new_CSV_list,
    'original_values': my_list_of_items
}
Hope this helps. I am not a Python expert, but in theory the more lists you use, the longer the zap will take to process, so try to keep your Python processing time as low as possible.
Best,

Array.size() returned wrong values (Grails)

I'm developing an app using Grails. I want to get length of array.
I got a wrong value. Here is my code,
def Medias = params.medias
println params.medias // I got [37, 40]
println params.medias.size() // I got 7 but it should be 2
What did I do wrong?
Thanks for the help.
What is params.medias (where is it being set)?
If Grails is treating it as a string, then size() will return the length of the string rather than the length of an array.
Does:
println params.medias.length
also return 7?
You can check what Grails thinks an object is by using the assert keyword.
If it is indeed a string, you can try the following code to convert it into an array:
def mediasArray = Eval.me(params.medias)
println mediasArray.size()
The downside of this is that Eval presents the possibility of unwanted code execution if params.medias is provided by an end user or can be maliciously modified outside of your compiled code.
A good snippet on the "evil (or lack thereof) of eval" is here if you're interested (not mine):
https://javascriptweblog.wordpress.com/2010/04/19/how-evil-is-eval/
I think 7 is the length of the string "[37, 40]".
It seems your medias variable is an array, not a collection.
Try: params.medias.length
Thanks to everyone, I've found my mistake.
Originally I sent an array from the client and params.medias returned null, so I converted it to a string, but that was the wrong approach.
In the end, I sent the array from the client as an array, and in Grails I retrieved the params with:
params."medias[]"
List medias = params.list('medias')
Documentation: http://grails.github.io/grails-doc/latest/guide/single.html#typeConverters

How can you join two or more dictionaries created by Bio.SeqIO.index?

I would like to be able to join the two "dictionaries" stored in "indata" and "pairdata", but this code,
indata = SeqIO.index(infile, infmt)
pairdata = SeqIO.index(pairfile, infmt)
indata.update(pairdata)
produces the following error:
indata.update(pairdata)
TypeError: update() takes exactly 1 argument (2 given)
I have tried using,
indata = SeqIO.to_dict(SeqIO.parse(infile, infmt))
pairdata = SeqIO.to_dict(SeqIO.parse(pairfile, infmt))
indata.update(pairdata)
which does work, but the resulting dictionaries take up too much memory to be practical for the sizes of infile and pairfile I have.
The final option I have explored is:
indata = SeqIO.index_db(indexfile, [infile, pairfile], infmt)
which works perfectly, but is very slow. Does anyone know how/whether I can successfully join the two indexes from the first example above?
SeqIO.index returns a read-only dictionary-like object, so update will not work on it (apologies for the confusing error message; I just checked in a fix for that to the main Biopython repository).
The best approach is to either use index_db, which will be slower but only needs to index the files once, or to define a higher-level object which acts like a dictionary over your multiple files. Here is a simple example:
from Bio import SeqIO

class MultiIndexDict:
    def __init__(self, *indexes):
        self._indexes = indexes

    def __getitem__(self, key):
        # Try each index in turn until the key is found
        for idx in self._indexes:
            try:
                return idx[key]
            except KeyError:
                pass
        raise KeyError("{0} not found".format(key))

indata = SeqIO.index("f001", "fasta")
pairdata = SeqIO.index("f002", "fasta")
combo = MultiIndexDict(indata, pairdata)

print(combo['gi|3318709|pdb|1A91|'].description)
print(combo['gi|1348917|gb|G26685|G26685'].description)
print(combo["key_failure"])  # raises KeyError
If you don't plan to use the index again and memory isn't a limitation (which both appear to be true in your case), you can tell Bio.SeqIO.index_db(...) to use an in-memory SQLite3 index by passing the special index name ":memory:", like so:
indata = SeqIO.index_db(":memory:", [infile, pairfile], infmt)
where infile and pairfile are filenames, and infmt is their format type as defined in Bio.SeqIO (e.g. "fasta").
This is actually a general trick with Python's SQLite3 library. For a small set of files this should be much faster than building the SQLite index on disk.
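For example, a minimal sketch reusing the filenames and record keys from the example above:

from Bio import SeqIO

# Build a combined in-memory SQLite index over both files
combined = SeqIO.index_db(":memory:", ["f001", "f002"], "fasta")
print(combined["gi|3318709|pdb|1A91|"].description)          # found in one file
print(combined["gi|1348917|gb|G26685|G26685"].description)   # found in the other
combined.close()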
