Biopython fails to access hetero-residue directly

I am able to access directly a residue from the protein 1n31 by using:
residue = structure[0]['A'][100]
However, when I try to access a hetero-residue, like:
residue = structure[0]['A'][2003]
I get the error message:
File "<stdin>", line 1, in <module>
File "/home/azevedo/.local/lib/python3.5/site-packages/Bio/PDB/Chain.py", line 94, in __getitem__
return Entity.__getitem__(self, id)
File "/home/azevedo/.local/lib/python3.5/site-packages/Bio/PDB/Entity.py", line 41, in __getitem__
return self.child_dict[id]
KeyError: (' ', 2003, ' ')
Why is it happening? How can I directly access a hetero-residue?

Short answer
structure[0]['A'][('H_CYS', 2003, ' ')]
will give you the desired residue
<Residue CYS het=H_CYS resseq=2003 icode= >
Biopython's PDB indexes
Biopython's PDB residue index uses a tuple internally. It consists of the hetero flag, the sequence identifier and the insertion code. For your residue 100 it would be (' ', 100, ' '); for your hetero-residue it would be ('H_CYS', 2003, ' ').
If you provide only an integer as an index, it gets translated to (' ', your_int, ' '). That is why structure[0]['A'][2003] looks up (' ', 2003, ' ') and raises the KeyError shown above.
The code can be found in the function _translate_id.
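The translation described above can be sketched without Biopython installed. This is a simplified stand-in, not Biopython's own code: the names translate_id, child_dict and lookup are made up for illustration, and the dictionary only mimics a chain that holds the two residues from the question.

```python
def translate_id(res_id):
    """Expand a bare integer into a full residue id tuple.

    Simplified stand-in for the translation step, not Biopython's own code.
    """
    if isinstance(res_id, int):
        # Blank hetero flag and blank insertion code are assumed.
        return (" ", res_id, " ")
    return res_id


# Toy lookup table keyed by full residue ids (hypothetical contents).
child_dict = {
    (" ", 100, " "): "ASP 100",
    ("H_CYS", 2003, " "): "CYS 2003 (hetero)",
}


def lookup(res_id):
    """Look up a residue the way an integer or tuple index would."""
    return child_dict[translate_id(res_id)]


print(lookup(100))                    # integer index works for a normal residue
print(lookup(("H_CYS", 2003, " ")))   # full tuple works for the hetero-residue

try:
    lookup(2003)                      # integer index misses the hetero-residue
except KeyError as err:
    print("KeyError:", err)
```

The integer 2003 is expanded with a blank hetero flag, so it never matches the key that starts with 'H_CYS'.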
General solution
If you don't know the hetero flag in advance, you could use your own function
def get_residue_by_number(residues, number):
    for residue in residues:
        if residue.id[1] == number:
            return residue
get_residue_by_number(structure[0]['A'].get_residues(), 2003)
<Residue CYS het=H_CYS resseq=2003 icode= >
get_residue_by_number(structure[0]['A'].get_residues(), 100)
<Residue ASP het= resseq=100 icode= >

Related

Snowflake Stored Procedure - Loop through csv files in AWS S3 and COPY INTO tables with same name

I was wondering if someone could help me with the error message I am getting from Snowflake. I am trying to create a stored procedure that will loop through 125 files in S3 and copy them into the corresponding tables in Snowflake. The table names are the same as the csv file names. In the example I only have 2 file names set up (if someone knows a better way than having to list all 125, that would be extremely helpful).
The error message I am getting is the following:
syntax error line 5 at position 11 unexpected '1'.
syntax error line 6 at position 22 unexpected '='. (line 4)
CREATE OR REPLACE PROCEDURE load_data_S3(file_name VARCHAR,table_name VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
FOR i IN 1 to 2 LOOP
CASE i
WHEN 1 THEN
SET file_name = 'file1.csv';
SET table_name = 'FILE1';
WHEN 2 THEN
SET file_name = 'file2.csv';
SET table_name = 'FILE2';
--WILL LIST THE REMAINING 123 WHEN STATEMENTS
ELSE
-- Do nothing
END CASE;
COPY INTO table_name
FROM #externalstg/file_name
FILE_FORMAT = (type='csv');
END LOOP;
RETURN 'Data loaded successfully';
END;
$$;
There are various ways to list the files in a stage (see the post here). You can then loop through the result set and run COPY INTO for each record, instead of writing one WHEN branch per file.
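As a rough illustration of that loop, here is a minimal Python sketch that only builds the COPY INTO statements from a list of file names. It does not connect to Snowflake. The function name copy_statements is hypothetical, the stage name "@externalstg" is an assumption based on the question, and the file list would in practice come from listing the stage.

```python
from pathlib import PurePosixPath


def copy_statements(file_names, stage="@externalstg"):
    """Build one COPY INTO statement per csv file.

    Assumes each table is named after its file (file1.csv -> FILE1)
    and that the stage name is "@externalstg" (an assumption).
    """
    statements = []
    for name in file_names:
        # Table name: file name without extension, upper-cased.
        table = PurePosixPath(name).stem.upper()
        statements.append(
            f"COPY INTO {table} FROM {stage}/{name} "
            "FILE_FORMAT = (TYPE = 'CSV');"
        )
    return statements


# Hypothetical file list; the real one would come from the stage listing.
for statement in copy_statements(["file1.csv", "file2.csv"]):
    print(statement)
```

Because the table name is derived from the file name, no per-file branch is needed. The sketch does not validate the names, so it should only be fed a trusted listing.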

In Sympy, why does get_field fail for the finite field GF(9) (defined as a finite extension of GF(3))?

I'm trying to use the groebner function in Sympy to solve a system of multivariate polynomials in GF(9) = GF(3)[g]/(g**2 + 1). When I do this I get the error
File "/Applications/Spyder.app/Contents/Resources/lib/python3.9/sympy/polys/groebnertools.py", line 37, in groebner
orig, ring = ring, ring.clone(domain=domain.get_field())
File "/Applications/Spyder.app/Contents/Resources/lib/python3.9/sympy/polys/domains/domain.py", line 846, in get_field
raise DomainError('there is no field associated with %s' % self)
DomainError: there is no field associated with GF(3)[g]/(g**2 + 1)
I reduced the code to the minimum that gives this get_field() failure. I also looked at is_field ... which was True!?
I know that GF and similar mechanisms fail for non-prime finite fields, but I thought that all would be well, if I defined my own GF(9) using a finite extension.
Am I missing something, or does get_field() not realize that gf9 is a field?
My minimal code
from sympy import Symbol, Poly
from sympy.polys.agca.extensions import FiniteExtension
g = Symbol('g')
gf9 = FiniteExtension(Poly(g**2 + 1, g, modulus=3)) # correct GF(9)
print('gf9 = ', gf9, 'type =', type(gf9), 'is_field?', gf9.is_Field)
print()
print('field associated with', gf9, '=', gf9.get_field())
Response
gf9 = GF(3)[g]/(g**2 + 1) is_field? True
Traceback (most recent call last):
File "/Applications/Spyder.app/Contents/Resources/lib/python3.9/spyder_kernels/py3compat.py", line 356, in compat_exec
exec(code, globals, locals)
File "/Users/rick/.spyder-py3/temp.py", line 17, in <module>
print('field associated with', gf9, '=', gf9.get_field())
File "/Applications/Spyder.app/Contents/Resources/lib/python3.9/sympy/polys/domains/domain.py", line 846, in get_field
raise DomainError('there is no field associated with %s' % self)
DomainError: there is no field associated with GF(3)[g]/(g**2 + 1)
Any help appreciated :)
Addendum: I just noticed that I never asked my real question: How can I use Sympy's groebner function with GF(3)[g]/(g**2 + 1)?

Cypher: How to get all possible variable-length chains and output concatenated string of node properties?

First time using neo4j. This is what my graph looks like.
The central node is of type Job, and the child nodes are of type Word. Each Word node has property word (i.e. Word.word), which is equivalent to the node labels such as "react", "php" etc. in the attached image.
What I am trying to do is for each chain of child nodes, generate a concatenated string of Word.word property values. For example, for the attached graph, I want to return something like:
[ "php", "react js", "javascript", "full stack development", "multithreaded load-balancing reactor engine"]
My current brute force approach looks like this:
match (webdev:Job {name:"Web Developer"}),
(webdev)-[a00:Appearance]->(w1:Word),
(w1)-[a01:Appearance]->(w2:Word),
(w2)-[a02:Appearance]->(w3:Word)
return w1.word + ' ' + w2.word + ' ' + w3.word as name
union
match (webdev:Job {name:"Web Developer"}),
(webdev)-[a00:Appearance]->(w1:Word),
(w1)-[a01:Appearance]->(w2:Word)
where not ((w2)-->())
return w1.word + ' ' + w2.word as name
union
match (webdev:Job {name:"Web Developer"}),
(webdev)-[a00:Appearance]->(w1:Word)
where not ((w1)-->())
return w1.word as name
which produces the output:
["multithreaded load-balancing reactor","full stack development","react js","php","javascript"]
This works for chains of length <= 3, but obviously it fails for length > 3. Notice how the string "multithreaded load-balancing reactor" should be "multithreaded load-balancing reactor engine".
My question is: how do I generalize this for all chains of variable length?
You can use a variable length relationship pattern:
MATCH p = (:Job {name:"Web Developer"})-[:Appearance*]->(leaf)
WHERE NOT (leaf)-[:Appearance]->()
RETURN REDUCE(s = NODES(p)[1].word, w IN NODES(p)[2..] | s + ' ' + w.word) AS name
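To make clear what that query computes, here is a small Python sketch of the same idea on a toy graph: follow every chain from the job node down to a leaf and join the words along the way. The data and the names chain_names, children and words are hypothetical, and the graph is assumed to be acyclic.

```python
def chain_names(children, words, root):
    """Return one space-joined string per root-to-leaf chain.

    children maps a node id to the ids it points to; words maps a
    Word node id to its word property. Assumes there are no cycles.
    """
    names = []

    def walk(node, parts):
        next_nodes = children.get(node, [])
        # A leaf has no outgoing relationship: emit the collected words.
        if not next_nodes and parts:
            names.append(" ".join(parts))
        for child in next_nodes:
            walk(child, parts + [words[child]])

    walk(root, [])
    return names


# Hypothetical toy graph: job -> php, job -> react -> js,
# job -> multithreaded -> load-balancing -> reactor -> engine
children = {
    "job": ["w1", "w2", "w4"],
    "w2": ["w3"],
    "w4": ["w5"],
    "w5": ["w6"],
    "w6": ["w7"],
}
words = {
    "w1": "php", "w2": "react", "w3": "js", "w4": "multithreaded",
    "w5": "load-balancing", "w6": "reactor", "w7": "engine",
}

names = chain_names(children, words, "job")
print(names)
```

As in the query, the job node itself contributes no word, and only chains that end at a leaf are returned, so the length of a chain does not matter.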

Need to convert string of 1s and 0s as a binary representation into a decimal number

I am creating a subnetting program and have been able to convert the octets to a concatenated binary representation, but the problem I am facing is that the representation is in a string format:
IO.inspect(binary_subnet_address)
"11000000101010001100100000100000"
This is the subnetted address in binary, but how do I change it back into a grouping of 8 bits for each octet and convert it back to a decimal number?
I have found this answer but it doesn't specify how to change the binary back into a decimal number, and I am honestly not sure how to turn the string into a list of 8 items as it is.
Because your string:
"11000000101010001100100000100000"
contains only ASCII characters, each character is one byte long. That allows you to use a bitstring comprehension to extract 8 characters (= 8 bytes) at a time from your string:
defmodule A do
  def split(str) do
    for <<chunk::binary-size(8) <- str>> do
      String.to_integer(chunk, 2)
    end
  end
end
In iex:
iex(13)> c "a.ex"
warning: redefining module A (current version defined in memory)
a.ex:1
[A]
iex(14)> A.split "11000000101010001100100000100000"
[192, 168, 200, 32]
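For comparison, the same chunk-and-parse idea can be sketched in Python. This is only an illustration of the technique, not part of the Elixir solution. The function name split_octets is made up, and the input is assumed to contain only the characters 0 and 1.

```python
def split_octets(bits):
    """Split a string of 0s and 1s into 8-character chunks and
    parse each chunk as a base-2 integer."""
    if len(bits) % 8 != 0:
        raise ValueError("length must be a multiple of 8")
    return [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]


octets = split_octets("11000000101010001100100000100000")
print(octets)                        # [192, 168, 200, 32]
print(".".join(map(str, octets)))    # 192.168.200.32
```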
"have been able to convert the octets to a concatenated binary representation"
You should post what you started with, and then we can show you how to dispense with all the unnecessary stuff you probably did.
It looks like you just want to use String.to_integer/2 with base 2 on each part, right?
iex> for offset <- 0..3 do
...> "11000000101010001100100000100000"
...> |> binary_part(offset * 8, 8)
...> |> String.to_integer(2)
...> end
[192, 168, 200, 32]
Another (faster) option would be to use Bitwise for direct access to the bits, without needing to call String.to_integer/2 at all.
import Bitwise
input = "11000000101010001100100000100000"
for << <<b7, b6, b5, b4, b3, b2, b1, b0>> <- input>>,
do: ((b7 - ?0) <<< 7) +
((b6 - ?0) <<< 6) +
((b5 - ?0) <<< 5) +
((b4 - ?0) <<< 4) +
((b3 - ?0) <<< 3) +
((b2 - ?0) <<< 2) +
((b1 - ?0) <<< 1) +
((b0 - ?0) <<< 0)
#⇒ [192, 168, 200, 32]
The above might be made shorter with a macro, but I do not think it’s worth it here.

How to unparse a Julia expression

I've been trying to understand Julia from a meta-programming viewpoint, and I often find myself wanting to generate the user-facing Julia syntax from an Expr.
Searching through the source code on GitHub I came across the "deparse" function defined in femtolisp. But it doesn't seem to be exposed at all.
What are the ways I can generate a proper Julia expression using just the internal representation?
P. S. There ought to be some sort of prettifying tool for the generated Julia code, do you know of some such (un/registered) pkg?
~#~#~#~#~
UPDATE
I've stored all the Meta.show_sexpr output of a Julia source file in a different file.
# This function is identical to create_adder implementation above.
function create_adder(x)
y -> x + y
end
# You can also name the internal function, if you want
function create_adder(x)
function adder(y)
x + y
end
adder
end
add_10 = create_adder(10)
add_10(3) # => 13
is converted to
(:line, 473, :none),
(:function, (:call, :create_adder, :x), (:block,
(:line, 474, :none),
(:function, (:call, :adder, :y), (:block,
(:line, 475, :none),
(:call, :+, :x, :y)
)),
(:line, 477, :none),
:adder
)),
(:line, 480, :none),
(:(=), :add_10, (:call, :create_adder, 10)),
(:line, 481, :none),
(:call, :add_10, 3))
Now I wish to evaluate these in Julia.
Here's an example of a function that takes an "s_expression" in tuple form, and generates the corresponding Expr object:
"""rxpe_esrap: parse expr in reverse :p """
function rpxe_esrap(S_expr::Tuple)
return Expr( Tuple( isa(i, Tuple) ? rpxe_esrap(i) : i for i in S_expr )... );
end
Demo
Let's generate a nice s_expression tuple to test our function.
(Unfortunately Meta.show_sexpr doesn't generate a string, it just prints to an IOStream, so to get its output as a string that we can parse / eval, we either need to get it from a file, or print straight into something like an IOBuffer)
B = IOBuffer(); # will use to 'capture' the s_expr in
Expr1 = :(1 + 2 * 3); # the expr we want to generate an s_expr for
Meta.show_sexpr(B, Expr1); # push s_expr into buffer B
seek(B, 0); # 'rewind' buffer
SExprStr = read(B, String); # get buffer contents as string
close(B); # please to be closink after finished, da?
SExpr = Meta.parse(SExprStr) |> eval; # final s_expr in tuple form (plain parse on older Julia versions)
resulting in the following s_expression:
julia> SExpr
(:call, :+, 1, (:call, :*, 2, 3))
Now let's test our function:
julia> rpxe_esrap(SExpr)
:(1 + 2 * 3) # Success!
Notes:
1. This is just a bare-bones function to demonstrate the concept, obviously this would need appropriate sanity checks if to be used on serious projects.
2. This implementation just takes a single "s_expr tuple" argument; your example shows a string that corresponds to a sequence of tuples, but presumably you could tokenise such a string first to obtain the individual tuple arguments, and run the function on each one separately.
3. The usual warnings regarding parse / eval and scope apply. Also, if you wanted to pass the s_expr string itself as the function argument, rather than an "s_expr tuple", then you could modify this function to move the parse / eval step inside the function. This may be a better choice, since you can check what the string contains before evaluating potentially dangerous code, etc etc.
4. I'm not saying there isn't an official function that does this. Though if there is one, I'm not aware of it. This was fun to write though.

Resources