DBSeqRecord cannot access annotations, features using BioSQL under BioPython - biopython

Running on Ubuntu 12.10 with Python 2.7, latest BioPython and BioSQL.
I have successfully set up the MySQL-based BioSQL server, and I can load sequences into the system properly (or so it seems: the tables are populated correctly in MySQL and things are generally error-free).
However, when I retrieve a record via 'lookup', I can only access the id, name, and description of the DBSeqRecords. Annotations and features are supposed to be loaded on demand, but accessing them crashes. For example:
File "/usr/lib/pymodules/python2.7/Bio/SeqRecord.py", line 595, in __str__
lines.append("Number of features: %i" % len(self.features))
File "/usr/lib/pymodules/python2.7/BioSQL/BioSeq.py", line 516, in __get_features
self._primary_id)
File "/usr/lib/pymodules/python2.7/BioSQL/BioSeq.py", line 280, in _retrieve_features
feature.location = SeqFeature.FeatureLocation(start, end)
File "/usr/lib/pymodules/python2.7/Bio/SeqFeature.py", line 561, in __init__
raise TypeError(start)
TypeError: 0
Any idea what is happening here?

I encountered the same error just this week.
A feature's start and end positions are retrieved from MySQL as type long (not int), but SeqFeature.py accepts only int for start and end when instantiating a FeatureLocation. I don't know what has changed in the MySQL server, MySQLdb, BioSeqDatabase or SeqFeature to provoke the problem.
Hopefully one of the Biopython developers will be able to provide a long-term solution, but here are some suggestions for temporary fixes:
1. Try to get at the MySQLdb connection used by BioSeqDatabase and change the conversion behaviour with the conv keyword parameter of its connect function (I haven't been able to do this; a rough sketch follows the code change below).
2. Hack BioSeqDatabase so that start_pos and end_pos are converted to int after the values are fetched from MySQL (I haven't done this).
3. Hack SeqFeature.py to allow the long type for start and end when instantiating a FeatureLocation object (this is what I have done). Change this:
if isinstance(start, AbstractPosition):
    self._start = start
elif isinstance(start, int):
    self._start = ExactPosition(start)
else:
    raise TypeError(start)
if isinstance(end, AbstractPosition):
    self._end = end
elif isinstance(end, int):
    self._end = ExactPosition(end)
else:
    raise TypeError(end)
to this:
if isinstance(start, AbstractPosition):
    self._start = start
elif isinstance(start, int) or isinstance(start, long):
    self._start = ExactPosition(start)
else:
    raise TypeError(start)
if isinstance(end, AbstractPosition):
    self._end = end
elif isinstance(end, int) or isinstance(end, long):
    self._end = ExactPosition(end)
else:
    raise TypeError(end)
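For completeness, here is a minimal sketch of suggestion 1, the conv route. The conv mapping and FIELD_TYPE constants are part of MySQLdb's public API; how the connection gets handed to BioSQL is an assumption (the DBServer wiring, database name and GI are placeholders), so treat it as a starting point rather than a tested fix:
import MySQLdb
from MySQLdb import converters
from MySQLdb.constants import FIELD_TYPE
from BioSQL import BioSeqDatabase

# Copy MySQLdb's default type map and force integer columns back to plain int.
conv = converters.conversions.copy()
conv[FIELD_TYPE.LONG] = int        # 32-bit integer columns
conv[FIELD_TYPE.LONGLONG] = int    # 64-bit integer columns

# Open the connection yourself so you control the conv argument, then hand it
# to BioSQL. The DBServer(conn, module) wiring may differ between BioSQL versions.
conn = MySQLdb.connect(user="me", passwd="secret", host="localhost",
                       db="bioseqdb", conv=conv)
server = BioSeqDatabase.DBServer(conn, MySQLdb)
db = server["my_subdatabase"]        # placeholder sub-database name
record = db.lookup(gi="6273291")     # placeholder GI
print len(record.features)           # should no longer raise TypeError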

Related

Lua program throwing error with stdin after completion

I've written a lua program for OpenComputers in Minecraft which downloads a github repo to the computer, file structure and all. The code:
internet = require("internet")
io = require("io")
filesystem = require("filesystem")
handle = internet.request("https://api.github.com/repos/Vedvart/Minecraft/contents")
result = ""
for chunk in handle do result = result..chunk end
while(not (string.find(result, "download_url") == nil)) do
i, j = string.find(result, "download_url")
qi, qj = string.find(string.sub(result, j+4,-1), '"')
url = string.sub(result, j + 4, j + qi + 2)
if string.sub(url, -14) == '.gitattributes' then goto continue end
ni, nj = string.find(url, "Vedvart")
filepath = string.sub(url, nj+2, -1)
ri, rj = string.find(string.reverse(filepath), "/")
if not filesystem.isDirectory("/home/github/"..string.sub(filepath, 1, -ri)) then
print('had to make')
filesystem.makeDirectory("/home/github/"..string.sub(filepath, 1, -ri))
end
print('here 2')
file_content_handle = internet.request(url)
file_content = ''
for chunk in file_content_handle do file_content = file_content .. chunk end
print(filepath)
print(io)
print(filesystem.isDirectory("/home/github/"..string.sub(filepath, 1, -ri)))
file = io.open('/home/github/'..filepath, 'w')
print(file)
file:write(file_content)
file:close()
print('here 3')
print(file)
::continue::
result = string.sub(result, j+qi+3, -1)
print('here 4')
end
print('done')
The code actually runs fine, reaching the final print statement and successfully generating the file structure and downloading all files. However, after this, the program throws an error:
I've googled around, and struggled to even find the same error being thrown, much less in the same circumstances. It doesn't seem like the error is originating directly in my code, but other code seems to run fine, so something in my program must be triggering this error elsewhere. I'm not sure what is causing it - any help is appreciated.
Not sure why you need to require io. This is not necessary in vanilla Lua, and looking at the example snippets on their website, it shouldn't be necessary in OpenComputers either.
io = require("io") overwrites any existing global io library, and maybe that newly required table does not have a stdin field.
Just an educated guess :)
If your code causes this error after it has completed, a change to a global variable is one of the few possible reasons.

Can't Parse DHCP packets with Ryu's get_protocol(dhcp.dhcp)

I'm using the Ryu SDN controller with an Open vSwitch on mininet using OpenFlow 1.3 to parse DHCP packets. Following online examples and the Ryu resources, I've implemented a DHCP packet parser. However, it does not work as I expected it to, and I'm wondering if anyone has any insight as to why my first solution does not work?
An example of a code snippet for parsing a DHCP packet is below:
from ryu.lib.packet import dhcp
...
...
@set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
def _packet_in_handler(self, ev):
    msg = ev.msg
    datapath = msg.datapath
    pkt = packet.Packet(msg.data)
    dhcpPacket = pkt.get_protocol(dhcp.dhcp)
My code follows a similar vein:
from ryu.lib.packet import dhcp
...
...
@set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
def _packet_in_handler(self, ev):
    pkt = {}
    pkt['msg'] = ev.msg
    pkt['dp'] = pkt['msg'].datapath
    pkt['pkt'] = packet.Packet(pkt['msg'].data)
    pkt['dhcp'] = pkt['pkt'].get_protocol(dhcp.dhcp)
This seems reasonable as I am following this exact sequence with other protocols like ARP, ICMP, IP, etc. Examples below.
pkt['arp'] = pkt['pkt'].get_protocol(arp.arp)
pkt['ip'] = pkt['pkt'].get_protocol(ipv4.ipv4)
pkt['icmp'] = pkt['pkt'].get_protocol(icmp.icmp)
The only problem is that the three parsers I list above actually return data, while the get_protocol for DHCP consistently returns None. I have already tested this by sending DHCP packets through my switch.
What does work is the following code snippet, where I identify packet lists having more than three values. I save the value at index three and set that as my DHCP packet. Within that DHCP packet, I concentrate on parsing the string at index 2, since that contains the data I'm interested in.
# pkt['dhcp'] = pkt['pkt'].get_protocol(dhcp.dhcp)
# Check whether pkt['pkt'] has more than 3 elements; if so, parse the DHCP string.
# Standard pkt['dhcp'] = (None, None, String)

def parseDHCP(pkt_d, start, stop):
    s_value = ''
    stop += 1
    for val in range(start, stop):
        s_value += str(hex(ord(pkt_d[val])))
    return s_value

if len(pkt['pkt']) > 3:
    dhcp_p = pkt['dhcp'] = dhcp.dhcp.parser(pkt['pkt'][3])
    pkt['op'] = hex(ord(dhcp_p[2][0]))
    pkt['htype'] = hex(ord(dhcp_p[2][1]))
    pkt['hlen'] = hex(ord(dhcp_p[2][2]))
    pkt['hops'] = hex(ord(dhcp_p[2][3]))
    pkt['xid'] = parseDHCP(dhcp_p[2], 4, 7)
    pkt['secs'] = parseDHCP(dhcp_p[2], 8, 9)
    pkt['flags'] = parseDHCP(dhcp_p[2], 10, 11)
    pkt['ciaddr'] = parseDHCP(dhcp_p[2], 12, 15)
    pkt['yiaddr'] = parseDHCP(dhcp_p[2], 16, 19)
    pkt['siaddr'] = parseDHCP(dhcp_p[2], 20, 23)
    pkt['giaddr'] = parseDHCP(dhcp_p[2], 24, 27)
    pkt['chaddr'] = parseDHCP(dhcp_p[2], 28, 33)
    pkt['pad'] = parseDHCP(dhcp_p[2], 34, 43)
A print out of these values looks like so:
0x1
0x1
0x6
0x0
0x440x30x980x11
0x00x0
0x00x0
0x00x00x00x0
0x00x00x00x0
0x00x00x00x0
0x00x00x00x0
0x7e0x1d0xcc0xe70xee0x4f
0x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x0
The above code lets me observe the contents of DHCP packets, but I'm still trying to figure out why I can't achieve similar results with the pkt['pkt'].get_protocol(dhcp.dhcp) method.
OK, I found the problem. In lines 200 – 218 of dhcp.py there is a try/except statement. I pulled the cls._parser call out of the try to see the error it throws, and I get this:
Ryuretic_coupler: Exception occurred during handler processing. Backtrace
from offending handler [initial_event] servicing event [EventOFPPacketIn] follows.
Traceback (most recent call last):
File "/home/ubuntu/ryu/ryu/base/app_manager.py", line 290, in _event_loop
handler(ev)
File "/home/ubuntu/ryu/ryu/app/Ryuretic/Ryuretic.py", line 72, in initial_event
pkt = parsPkt.handle_pkt(ev)
File "/home/ubuntu/ryu/ryu/app/Ryuretic/Pkt_Parse13.py", line 81, in handle_pkt
dhcp_p = pkt['dhcp'] = dhcp.dhcp.parser(pkt['pkt'][3])
File "/home/ubuntu/ryu/ryu/lib/packet/dhcp.py", line 212, in parser
return cls._parser(buf)
File "/home/ubuntu/ryu/ryu/lib/packet/dhcp.py", line 192, in _parser
) = struct.unpack_from(unpack_str, buf)
error: unpack_from requires a buffer of at least 233 bytes
So dhcp.py is not receiving the 233 bytes that it requires. Unfortunately, I am using an Open vSwitch on the VM provided by SDN Hub, and 128 bytes seems to be the limit on what reaches the controller, so I have a conflict with Ryu's dhcp.py file. My solution was to modify dhcp.py. How that was done follows.
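Before patching anything, it is easy to confirm from the controller side that the switch really is truncating the packet-in data. A minimal sketch of such a check, assuming a standard Ryu OpenFlow 1.3 app (the handler name is made up; total_len and data are standard OFPPacketIn attributes):
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls

# Inside your RyuApp subclass:
@set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
def _log_packet_in_size(self, ev):
    msg = ev.msg
    # total_len is the length of the original frame on the wire;
    # len(msg.data) is how much of it the switch actually sent to the controller.
    self.logger.info("packet-in: got %d of %d bytes", len(msg.data), msg.total_len)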
Before modifying the code, I recommend you first update your Ryu controller. The procedures are as follows:
Step 1: If you are using a VM, take a snapshot or clone it now.
Step 2: Update Ryu
cd ryu
git pull
If you were running an older version like I was, then the following error may occur when you next attempt to run your Ryu controller:
Traceback (most recent call last):
File "./bin/ryu-manager", line 18, in <module>
from ryu.cmd.manager import main
File "/home/ubuntu/ryu/ryu/cmd/manager.py", line 31, in <module>
from ryu.base.app_manager import AppManager
File "/home/ubuntu/ryu/ryu/base/app_manager.py", line 37, in <module>
from ryu.controller.controller import Datapath
File "/home/ubuntu/ryu/ryu/controller/controller.py", line 74, in <module>
help='Maximum number of unreplied echo requests before datapath is disconnected.')
File "/usr/local/lib/python2.7/dist-packages/oslo_config/cfg.py", line 1033, in __init__
super(IntOpt, self).__init__(name, type=types.Integer(), **kwargs)
TypeError: __init__() got an unexpected keyword argument 'min'
Step 3: Update your oslo.config package
sudo pip install oslo.config --upgrade
The TypeError from step 2 should now be resolved. (Hopefully, you cloned your VM just in case.)
Step 4: Modify Ryu's dhcp.py file
Ryu's dhcp.py file (located at /ryu/ryu/lib/packet) expects to receive a buffer of more than 235 bytes; otherwise it throws an error and returns nothing to the controller. Since my buffer only receives about 81 bytes, I modified the Ryu dhcp.py file as follows.
The error occurs because dhcp.py specifies a string format of '!BBBBIHH4s4s4s4s16s64s128s'. In #1 below, I created a second option for the unpack string. Doing so allows me to insert a few if statements to handle the packet differently when a packet arrives that is smaller than 100 bytes. Likewise, if the packet is greater than 235 bytes, dhcp.py handles it normally and returns the extra values.
class dhcp(packet_base.PacketBase):
    """DHCP (RFC 2131) header encoder/decoder class.
    ....deleted....
    """
    _MIN_LEN = 236
    _HLEN_UNPACK_STR = '!BBB'
    _HLEN_UNPACK_LEN = struct.calcsize(_HLEN_UNPACK_STR)
    _DHCP_UNPACK_STR = '!BIHH4s4s4s4s%ds%ds64s128s'
    ##################################################
    # 1 (mod) Created second option for unpacking string
    _DHCP_UNPACK_STR2 = '!BIHH4s4s4s4s%ds%ds40s'
    _DHCP_PACK_STR = '!BBBBIHH4s4s4s4s16s64s128s'
    #################################################
    _DHCP_CHADDR_LEN = 16
    _HARDWARE_TYPE_ETHERNET = 1
    _class_prefixes = ['options']
    _TYPE = {
        'ascii': [
            'ciaddr', 'yiaddr', 'siaddr', 'giaddr', 'chaddr', 'sname'
        ]
    }

    def __init__(self, op, chaddr, options, htype=_HARDWARE_TYPE_ETHERNET,
                 hlen=0, hops=0, xid=None, secs=0, flags=0,
                 ciaddr='0.0.0.0', yiaddr='0.0.0.0', siaddr='0.0.0.0',
                 giaddr='0.0.0.0', sname='', boot_file=b''):
        super(dhcp, self).__init__()
        # ...deleted, no changes made to __init__...

    @classmethod
    def _parser(cls, buf):
        (op, htype, hlen) = struct.unpack_from(cls._HLEN_UNPACK_STR, buf)
        buf = buf[cls._HLEN_UNPACK_LEN:]
        ####################################################
        # 2 (mod) provided option for smaller packet sizes
        if len(buf) < 100:
            unpack_str = cls._DHCP_UNPACK_STR2 % (hlen,
                                                  (cls._DHCP_CHADDR_LEN - hlen))
        else:
            unpack_str = cls._DHCP_UNPACK_STR % (hlen,
                                                 (cls._DHCP_CHADDR_LEN - hlen))
        #####################################################
        min_len = struct.calcsize(unpack_str)
        ######################################################
        # 3 (mod) provided option for smaller packets, set boot_file to b''
        if min_len > 233:
            (hops, xid, secs, flags, ciaddr, yiaddr, siaddr, giaddr,
             chaddr, dummy, sname, boot_file
             ) = struct.unpack_from(unpack_str, buf)
        else:
            boot_file = b''
            (hops, xid, secs, flags, ciaddr, yiaddr, siaddr, giaddr,
             chaddr, dummy, sname, boot_file
             ) = struct.unpack_from(unpack_str, buf) + (boot_file,)
        ########################################################
        length = min_len
        ########################################################
        # (mod) provided option for smaller packet sizes, no parse_opt
        if len(buf) > 233:
            parse_opt = options.parser(buf[min_len:])
            length += parse_opt.options_len
        else:
            parse_opt = None
            length = min_len
        #########################################################
        return (cls(op, addrconv.mac.bin_to_text(chaddr), parse_opt,
                    htype, hlen, hops, xid, secs, flags,
                    addrconv.ipv4.bin_to_text(ciaddr),
                    addrconv.ipv4.bin_to_text(yiaddr),
                    addrconv.ipv4.bin_to_text(siaddr),
                    addrconv.ipv4.bin_to_text(giaddr),
                    sname.decode('ascii'), boot_file),
                None, buf[length:])
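To see where the 233-byte figure in the traceback (and the roughly 81 bytes I actually receive) comes from, here is a quick check with struct.calcsize, assuming an Ethernet hlen of 6 so the chaddr padding is 16 - 6 = 10:
import struct

full = '!BIHH4s4s4s4s%ds%ds64s128s' % (6, 10)
print struct.calcsize(full)    # 233: the minimum buffer the stock unpack needs
short = '!BIHH4s4s4s4s%ds%ds40s' % (6, 10)
print struct.calcsize(short)   # 81: roughly what a truncated packet-in provides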
The complete file will soon be on https://github.com/Ryuretic/RyureticLabs/tree/master/ryu/ryu/app/Ryuretic/Support_Files.
If you make these adjustments to the dhcp.py file, then you will gain access to the following header fields:
============== ==========================================================
Attribute      Description
============== ==========================================================
op             Message op code / message type.
               1 = BOOTREQUEST, 2 = BOOTREPLY
htype          Hardware address type (e.g. '1' = 10mb ethernet).
hlen           Hardware address length (e.g. '6' = 10mb ethernet).
hops           Client sets to zero, optionally used by relay agent
               when booting via a relay agent.
xid            Transaction ID, a random number chosen by the client,
               used by the client and server to associate messages
               and responses between a client and a server.
secs           Filled in by client, seconds elapsed since client
               began address acquisition or renewal process.
flags          Flags.
ciaddr         Client IP address; only filled in if client is in
               BOUND, RENEW or REBINDING state and can respond
               to ARP requests.
yiaddr         'your' (client) IP address.
siaddr         IP address of next server to use in bootstrap;
               returned in DHCPOFFER, DHCPACK by server.
giaddr         Relay agent IP address, used in booting via a
               relay agent.
chaddr         Client hardware address.
sname(partial) Optional server host name, null terminated string.
============== ==========================================================
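With the modified parser in place, those fields can be read by name from the object that dhcp.dhcp.parser returns, instead of decoding the raw string by hand as in the earlier snippet. A short sketch, reusing pkt['pkt'][3] from the code above:
from ryu.lib.packet import dhcp

# pkt['pkt'][3] is the raw UDP payload saved from the packet-in, as before.
parsed, _cls, _rest = dhcp.dhcp.parser(pkt['pkt'][3])
# Attribute names match the table above.
print parsed.op, parsed.htype, parsed.hlen, parsed.hops
print parsed.xid, parsed.ciaddr, parsed.yiaddr, parsed.chaddr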

Lua script to extract info from wireshark .pcap traces

I want to get the frame time (relative) of the first packet that is not communicating on the sos and DIS ports and whose IP address is not the one mentioned in the if statement. However, the packet should be using port 24111. The code below is not working for this purpose: it works until I add udp_port~=24111, but after that it gives me no results, which means it never enters that conditional statement. I have tried writing the condition in multiple ways, even separating it out into a new if statement, but it doesn't work. What am I doing wrong here? Thanks in advance for any suggestions.
Here is the piece of code that I have at the moment
local first_outpacket = 0
local flag = 0

function stats_first_packet()
    local udp_port
    local frame_time
    local ip_addr

    frame_time = time_relative_extractor()
    udp_port = udp_port_extractor()
    ip_addr = ip_addr_extractor()

    if ( udp_port ) then
        if (not (udp_port == 3000 or udp_port == 3838 or flag == 1 or ip_addr == "192.168.1.2" or udp_port ~= 24111)) then
            first_outpacket = frame_time
            print(frame_time)
            flag = 1
        else
            -- print("tcp_src_port already recorded")
        end
    else
        -- print("no tcp_src_port")
    end
end
The problem apparently lies in the data type returned by the extractor() functions. In order to compare them with other values in the if statement, they have to be converted into strings using the tostring() function.
For example:
if (not (tostring(udp_port) == "3000" or tostring(udp_port)=="3838" or flag==1))

issue 'attempt to index global 'base'' within protocol fields creation

I am trying to write a dissector in Lua for Wireshark. I need to parse a header field version = 4 bytes (0x00000000).
My code:
do
    local asc_sccp = Proto("asc_sccp", "ASC Skinny Client Control Protocol")
    local f = asc_sccp.fields

    f.length = ProtoField.bytes("asc_sccp.length", "length")
    f.version = ProtoField.uint8("asc_sccp.version", "version", base.HEX, 0xC)

    function asc_sccp.init()
    end

    function asc_sccp.dissector(buffer, pinfo, tree)
        local subtree = tree:add(asc_sccp, buffer())
        local offset = 0
        pinfo.cols.protocol = asc_sccp.name

        local length = buffer(offset, 4)
        subtree:add(f.length, length)
        subtree:append_text("Data length: " .. length)
        offset = offset + 4

        local version = buffer(offset, 4)
        subtree:add(f.version, version)
        subtree:append_text(" Version: " .. version)
    end

    local tcp_table = DissectorTable.get("tcp.port")
    tcp_table:add(2000, asc_sccp)
end
Why am I getting the error 'attempt to index global 'base' (a nil value)'?
Could you please help? I have looked through a lot of dissector examples but I can't find a solution.
This may be happening because init.lua isn't installed. This can happen in Redhat-based distributions (Fedora, Centos, RHEL etc.) if the development packages aren't installed. See answer here: https://stackoverflow.com/a/40489742/409638
In this line of code:
f.version =ProtoField.uint8("asc_sccp.version", "version", base.HEX, 0xC)
you are accessing the variable 'base'. Specifically, you are indexing it: you are telling Lua it is a table containing the key "HEX" and trying to retrieve the value at that key. Unless you define the variable 'base' somewhere to be a table (or userdata) and add a value at the key "HEX", Lua will complain that you are attempting to index a global variable called 'base' when it is actually nil (i.e. non-existent).

How can you join two or more dictionaries created by Bio.SeqIO.index?

I would like to be able to join the two "dictionaries" stored in "indata" and "pairdata", but this code,
indata = SeqIO.index(infile, infmt)
pairdata = SeqIO.index(pairfile, infmt)
indata.update(pairdata)
produces the following error:
indata.update(pairdata)
TypeError: update() takes exactly 1 argument (2 given)
I have tried using,
indata = SeqIO.to_dict(SeqIO.parse(infile, infmt))
pairdata = SeqIO.to_dict(SeqIO.parse(pairfile, infmt))
indata.update(pairdata)
which does work, but the resulting dictionaries take up too much memory to be practical for the sizes of infile and pairfile I have.
The final option I have explored is:
indata = SeqIO.index_db(indexfile, [infile, pairfile], infmt)
which works perfectly, but is very slow. Does anyone know how/whether I can successfully join the two indexes from the first example above?
SeqIO.index returns a read-only dictionary-like object, so update will not work on it (apologies for the confusing error message; I just checked in a fix for that to the main Biopython repository).
The best approach is to either use index_db, which will be slower but only needs to index the file once, or to define a higher level object which acts like a dictionary over your multiple files. Here is a simple example:
from Bio import SeqIO

class MultiIndexDict:
    def __init__(self, *indexes):
        self._indexes = indexes

    def __getitem__(self, key):
        for idx in self._indexes:
            try:
                return idx[key]
            except KeyError:
                pass
        raise KeyError("{0} not found".format(key))

indata = SeqIO.index("f001", "fasta")
pairdata = SeqIO.index("f002", "fasta")
combo = MultiIndexDict(indata, pairdata)

print combo['gi|3318709|pdb|1A91|'].description
print combo['gi|1348917|gb|G26685|G26685'].description
print combo["key_failure"]
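If you also want membership tests or a length for the combined index, the same pattern extends naturally; a small sketch along the same lines (not something SeqIO provides itself):
class MultiIndexDict:
    def __init__(self, *indexes):
        self._indexes = indexes

    def __getitem__(self, key):
        for idx in self._indexes:
            try:
                return idx[key]
            except KeyError:
                pass
        raise KeyError("{0} not found".format(key))

    def __contains__(self, key):
        # Supports "key in combo" checks across all the underlying indexes.
        return any(key in idx for idx in self._indexes)

    def __len__(self):
        # Note this double-counts any key present in more than one file.
        return sum(len(idx) for idx in self._indexes)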
If you don't plan to use the index again and memory isn't a limitation (which both appear to be true in your case), you can tell Bio.SeqIO.index_db(...) to use an in-memory SQLite3 index with the special index name ":memory:" like so:
indata = SeqIO.index_db(":memory:", [infile, pairfile], infmt)
where infile and pairfile are filenames, and infmt is their format type as defined in Bio.SeqIO (e.g. "fasta").
This is actually a general trick with Python's SQLite3 library. For a small set of files this should be much faster than building the SQLite index on disk.
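To illustrate that last point outside Biopython, the same ":memory:" name works anywhere the sqlite3 module is used; the database lives entirely in RAM and vanishes when the connection is closed:
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE offsets (key TEXT PRIMARY KEY, offset INTEGER)")
con.execute("INSERT INTO offsets VALUES (?, ?)", ("gi|3318709|pdb|1A91|", 0))
row = con.execute("SELECT offset FROM offsets WHERE key = ?",
                  ("gi|3318709|pdb|1A91|",)).fetchone()
print row    # (0,)
con.close()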
