Biopython: MMTFParser can't find distances between atoms

I'm using biopython to find the distance between the C alpha atoms of two residues and I keep getting an error. Here's my code and the error:
```
>>> from Bio.PDB.mmtf import MMTFParser
>>> structure = MMTFParser.get_structure_from_url('4mne')
/Library/Python/2.7/site-packages/Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 0.
  PDBConstructionWarning)
/Library/Python/2.7/site-packages/Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain D is discontinuous at line 0.
  PDBConstructionWarning)
/Library/Python/2.7/site-packages/Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain E is discontinuous at line 0.
  PDBConstructionWarning)
/Library/Python/2.7/site-packages/Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain F is discontinuous at line 0.
  PDBConstructionWarning)
/Library/Python/2.7/site-packages/Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain H is discontinuous at line 0.
  PDBConstructionWarning)
/Library/Python/2.7/site-packages/Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 0.
  PDBConstructionWarning)
/Library/Python/2.7/site-packages/Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain C is discontinuous at line 0.
  PDBConstructionWarning)
/Library/Python/2.7/site-packages/Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain G is discontinuous at line 0.
  PDBConstructionWarning)
>>> for c in structure.get_chains():
...     if c.get_id() == 'B':
...         chain = c
...
>>> chain
<Chain id=B>
>>> CAatoms = [a for a in chain.get_atoms() if a.id == 'CA']
>>> for a in CAatoms:
...     for b in CAatoms:
...         distance = a - b
...
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
File "/Library/Python/2.7/site-packages/Bio/PDB/Atom.py", line 124, in __sub__
diff = self.coord - other.coord
TypeError: unsupported operand type(s) for -: 'list' and 'list'
>>>
```
Does this have something to do with the "get_structure_from_url" method of the MMTFParser?
I've tried this with PDBParser().get_structure and it worked fine.

Biopython's Atom class has a custom subtraction method. From the source code:
```
def __sub__(self, other):
    """Calculate distance between two atoms.

    :param other: the other atom
    :type other: L{Atom}

    Examples
    --------
    >>> distance = atom1 - atom2
    """
    diff = self.coord - other.coord
    return numpy.sqrt(numpy.dot(diff, diff))
```
The subtraction itself is not missing for MMTF structures; the problem is how the coordinates are stored. MMTFParser reads the coordinates as a plain Python list (init_atom(str(atom_name), [x, y, z] ..., line 53), unlike PDBParser, which stores them as a NumPy array (coord = numpy.array((x, y, z), "f"), line 187). Subtracting one list from another raises the TypeError you see.
To get the distance, convert the coordinate lists to NumPy arrays and take the norm of the difference:
```
import numpy as np

distance = np.linalg.norm(np.array(a.coord) - np.array(b.coord))
```
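If you need many distances, it is cheaper to convert each coordinate once up front. A minimal sketch building on the CAatoms list from the question (the coords list and the loop are illustrative, not part of Biopython):
```
import numpy as np

# Convert each atom's list coordinates to a NumPy array once.
coords = [np.asarray(atom.coord, dtype=float) for atom in CAatoms]

# Pairwise C-alpha distances, mirroring the loop from the question.
for ca in coords:
    for cb in coords:
        distance = np.linalg.norm(ca - cb)
```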

Related

numpy cross does not support Drake's AutoDiffXd?

The following code:
```
import numpy as np
from pydrake.all import InitializeAutoDiffTuple

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
np.cross(x, y)

(x_ad, y_ad) = InitializeAutoDiffTuple(x, y)
np.cross(x_ad, y_ad)
```
leads to the error:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-f44901f6fd4e> in <module>
----> 1 np.cross(x_ad, y_ad)

<__array_function__ internals> in cross(*args, **kwargs)

~/.local/lib/python3.8/site-packages/numpy/core/numeric.py in cross(a, b, axisa, axisb, axisc, axis)
   1604                         "(dimension must be 2 or 3)")
   1605     if a.shape[-1] not in (2, 3) or b.shape[-1] not in (2, 3):
-> 1606         raise ValueError(msg)
   1607
   1608     # Create the output array

ValueError: incompatible dimensions for cross product
(dimension must be 2 or 3)
```
Does Drake AutoDiffXd not support numpy's cross product?
This is not a lack of support from Drake. np.cross just doesn't accept (3,1) column vectors, which is the default shape coming out of InitializeAutoDiffTuple. Either of the following works instead:
```
np.cross(np.squeeze(x_ad), np.squeeze(y_ad))
np.cross(x_ad.T, y_ad.T)
```
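A quick shape check shows why the original call fails (illustrative REPL output, continuing from the x_ad in the question):
```
>>> x_ad.shape                # InitializeAutoDiffTuple returns 2-D column vectors
(3, 1)
>>> np.squeeze(x_ad).shape    # squeeze drops the singleton dimension
(3,)
```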

arguments and function call of LSTM in pytorch

Could someone please explain the following code?
```
import torch
import torch.nn as nn

input = torch.randn(5, 3, 10)
h0 = torch.randn(2, 3, 20)
c0 = torch.randn(2, 3, 20)
rnn = nn.LSTM(10, 20, 2)
output, (hn, cn) = rnn(input, (h0, c0))
print(input)
```
When calling rnn(input, (h0, c0)) we pass the arguments h0 and c0 in parentheses. What is that supposed to mean? If (h0, c0) represents a single value, then what is that value, and what is the third argument passed here?
However, in the line rnn = nn.LSTM(10, 20, 2) we pass the arguments to LSTM without parentheses.
Can anyone explain how this function call works?
The assignment rnn = nn.LSTM(10, 20, 2) instantiates a new nn.Module using the nn.LSTM class. Its first three arguments are input_size (here 10), hidden_size (here 20), and num_layers (here 2).
On the other hand, rnn(input, (h0, c0)) actually calls the class instance, i.e. runs __call__, which is roughly equivalent to the forward function of that module. The __call__ method of nn.LSTM takes two parameters: input (shaped (sequence_length, batch_size, input_size)) and a tuple of two tensors (h_0, c_0) (both shaped (num_layers, batch_size, hidden_size) in the basic use case of nn.LSTM).
Refer to the PyTorch documentation whenever you use builtins; there you will find the exact definition of the parameter list (the arguments used to initialize the class instance) as well as the input/output specifications (for when you run inference with that module).
You might be confused by the notation; here's a small example that could help.
A tuple as input:
```
def fn1(x, p):
    a, b = p  # unpack the input tuple
    return a*x + b

>>> fn1(2, (3, 1))
7
```
A tuple as output:
```
def fn2(x):
    return x, (3*x, x**2)  # the output is a tuple of an int and a tuple

>>> fn2(2)
(2, (6, 4))
>>> x, (a, b) = fn2(2)  # unpacking
>>> x, a, b
(2, 6, 4)
```
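To make the shapes concrete, here is a quick check of what the example network returns (the shapes follow from the nn.LSTM documentation):
```
import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2)        # input_size=10, hidden_size=20, num_layers=2
input = torch.randn(5, 3, 10)   # (sequence_length=5, batch_size=3, input_size=10)
h0 = torch.randn(2, 3, 20)      # (num_layers=2, batch_size=3, hidden_size=20)
c0 = torch.randn(2, 3, 20)

output, (hn, cn) = rnn(input, (h0, c0))
print(output.shape)  # torch.Size([5, 3, 20]) -- last layer's hidden state at every step
print(hn.shape)      # torch.Size([2, 3, 20]) -- final hidden state of every layer
print(cn.shape)      # torch.Size([2, 3, 20]) -- final cell state of every layer
```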

Error getting categorical features from the train dataset

My train data looks something like this:
(screenshot of the train data)
To extract the categorical features from it, I ran the following code:
```
categorial = [c for c in train.columns if train.columns[c].dtype in ['object']]
```
But I am getting this error:
```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-31-31eb7ac47e21> in <module>
----> 1 categorial=[c for c in train.columns if train.columns[c].dtype in ['object'] ]

<ipython-input-31-31eb7ac47e21> in <listcomp>(.0)
----> 1 categorial=[c for c in train.columns if train.columns[c].dtype in ['object'] ]

/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in __getitem__(self, key)
   4295         if is_scalar(key):
   4296             key = com.cast_scalar_indexer(key, warn_float=True)
-> 4297             return getitem(key)
   4298
   4299         if isinstance(key, slice):

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
```
What is the possible solution?
Use this to select the 'object'-type variables:
```
categorical = train.select_dtypes('object')
```
If you just want the variable names:
```
categorical_cols = train.select_dtypes('object').columns.tolist()
```
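If you prefer the list-comprehension style from the question, the fix is to index the DataFrame itself rather than its column Index (a sketch assuming train is a pandas DataFrame):
```
# train.columns[c] fails because c is a column label, not an integer
# position into the Index. Index the DataFrame with the label instead:
categorical = [c for c in train.columns if train[c].dtype == 'object']
```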

Is np.linalg.solve() not working for AutoDiff?

Does np.linalg.solve() not work with AutoDiff? I use it to solve the manipulator equation; the error message is shown below.
A similar version of the code using plain doubles runs without issue. Please tell me how to fix it, thanks!
Here is the error message:
```
vdot_ad = np.linalg.solve(M_,ggg_ad)
  File "<__array_function__ internals>", line 5, in solve
  File "/usr/local/lib/python3.8/site-packages/numpy/linalg/linalg.py", line 394, in solve
    r = gufunc(a, b, signature=signature, extobj=extobj)
TypeError: No loop matching the specified signature and casting was found for ufunc solve1
```
And here is the code:
```
plant = MultibodyPlant(time_step=0.01)
parser = Parser(plant)
parser.AddModelFromFile("double_pendulum.sdf")
plant.Finalize()
plant_autodiff = plant.ToAutoDiffXd()

# <AutoDiff> -- the error occurs below
xu = np.hstack((x, u))
xu_ad = initializeAutoDiff(xu)[:, 0]
x_ad = xu_ad[:4]
q_ad = x_ad[:2]
v_ad = x_ad[2:4]
u_ad = xu_ad[4:]
(M_, Cv_, tauG_, B_, tauExt_) = ManipulatorDynamics(plant_autodiff, q_ad, v_ad)
vdot_ad = np.linalg.solve(M_, tauG_ + np.dot(B_, u_ad) - np.dot(Cv_, v_ad))
```
Note that in pydrake, AutoDiffXd scalars are exposed to NumPy using dtype=object.
There are some drawbacks to this approach, like the one you have run into now.
This is not necessarily an issue with Drake, but a limitation of NumPy itself, given which ufuncs are implemented in the (super old) version that ships on Ubuntu 18.04.
To illustrate, here is what I see on Ubuntu 18.04, CPython 3.6.9, NumPy 1.13.3:
```
>>> import numpy as np
>>> A = np.eye(2)
>>> b = np.array([1, 2])
>>> np.linalg.solve(A, b)
array([ 1.,  2.])
>>> A = A.astype(object)
>>> b = b.astype(object)
>>> np.linalg.solve(A, b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/numpy/linalg/linalg.py", line 375, in solve
    r = gufunc(a, b, signature=signature, extobj=extobj)
TypeError: No loop matching the specified signature and casting
was found for ufunc solve1
```
The most direct solution would be to expose an analogous routine in pydrake, and have users leverage that.
That is what we had to do for np.linalg.inv as well:
https://github.com/RobotLocomotion/drake/pull/11173/files
Not the best solution :( However, it's simple enough!
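Until then, one workaround is to implement the solve yourself using only scalar arithmetic, which does work on dtype=object arrays. Below is a minimal sketch (plain Gaussian elimination with partial pivoting, written for illustration rather than numerical robustness; it assumes the scalar type supports +, -, *, /, abs, and ordering comparisons, which AutoDiffXd should):
```
import numpy as np

def solve_object(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting,
    using only scalar +, -, *, / so it also works on dtype=object arrays."""
    A = np.array(A, dtype=object)              # work on copies
    b = np.array(b, dtype=object).reshape(-1)
    n = A.shape[0]
    for i in range(n):
        # Pivot on the largest-magnitude entry in column i.
        p = max(range(i, n), key=lambda r: abs(A[r, i]))
        if p != i:
            A[[i, p]] = A[[p, i]]
            b[[i, p]] = b[[p, i]]
        # Eliminate the entries below the pivot.
        for r in range(i + 1, n):
            f = A[r, i] / A[i, i]
            A[r, i:] = A[r, i:] - f * A[i, i:]
            b[r] = b[r] - f * b[i]
    # Back substitution.
    x = np.empty(n, dtype=object)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - np.dot(A[i, i + 1:], x[i + 1:])) / A[i, i]
    return x

# The object-dtype case that np.linalg.solve rejects above:
A = np.eye(2).astype(object)
b = np.array([1, 2]).astype(object)
print(solve_object(A, b))  # [1.0 2.0]
```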

Keeping zeros in data with sklearn

I have a csv dataset that I'm trying to use with sklearn. The goal is to predict future web traffic. However, my dataset contains zeros on days that had no visitors, and I'd like to keep those values. There are more days with zero visitors than there are with visitors (it's a tiny, tiny site). Here's a look at the data.
Col1 is the date:
```
10/1/11
10/2/11
10/3/11
etc....
```
Col2 is the number of visitors:
```
12
1
0
0
1
5
0
0
etc....
```
sklearn seems to interpret the zero values as NaN values, which is understandable. How can I use those zero values in a logistic function (is that even possible)?
Update:
The estimator is https://github.com/facebookincubator/prophet and when I run the following:
```
df = pd.read_csv('~/tmp/datafile.csv')
df['y'] = np.log(df['y'])
df.head()
m = Prophet()
m.fit(df);
future = m.make_future_dataframe(periods=365)
future.tail()
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
m.plot(forecast);
m.plot_components(forecast);
plt.show()
```
I get the following:
```
growthprediction.py:7: RuntimeWarning: divide by zero encountered in log
  df['y'] = np.log(df['y'])
/usr/local/lib/python3.6/site-packages/fbprophet/forecaster.py:307: RuntimeWarning: invalid value encountered in double_scalars
  k = (df['y_scaled'].ix[i1] - df['y_scaled'].ix[i0]) / T
Traceback (most recent call last):
  File "growthprediction.py", line 11, in <module>
    m.fit(df);
  File "/usr/local/lib/python3.6/site-packages/fbprophet/forecaster.py", line 387, in fit
    params = model.optimizing(dat, init=stan_init, iter=1e4)
  File "/usr/local/lib/python3.6/site-packages/pystan/model.py", line 508, in optimizing
    ret, sample = fit._call_sampler(stan_args)
  File "stanfit4anon_model_35bf14a7f93814266f16b4cf48b40a5a_4758371668158283666.pyx", line 804, in stanfit4anon_model_35bf14a7f93814266f16b4cf48b40a5a_4758371668158283666.StanFit4Model._call_sampler (/var/folders/ym/m6j7kw0d3kj_0frscrtp58800000gn/T/tmp5wq7qltr/stanfit4anon_model_35bf14a7f93814266f16b4cf48b40a5a_4758371668158283666.cpp:16585)
  File "stanfit4anon_model_35bf14a7f93814266f16b4cf48b40a5a_4758371668158283666.pyx", line 398, in stanfit4anon_model_35bf14a7f93814266f16b4cf48b40a5a_4758371668158283666._call_sampler (/var/folders/ym/m6j7kw0d3kj_0frscrtp58800000gn/T/tmp5wq7qltr/stanfit4anon_model_35bf14a7f93814266f16b4cf48b40a5a_4758371668158283666.cpp:8818)
RuntimeError: k initialized to invalid value (nan)
```
In this line of your code:
```
df['y'] = np.log(df['y'])
```
you are taking the logarithm of 0 wherever df['y'] is zero, which produces the warnings and NaNs in your resulting dataset, because the logarithm of 0 is not defined.
sklearn itself does NOT interpret zero values as NaNs unless you replace them with NaNs in your preprocessing.
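A common workaround here (standard practice for count data, not something specific to Prophet) is to use log1p, which maps 0 to 0 instead of -inf, and to undo the transform with expm1 after forecasting:
```
import numpy as np

df['y'] = np.log1p(df['y'])   # log(1 + y): zero visitors -> 0.0, no -inf
# ... fit and predict as before ...
forecast['yhat'] = np.expm1(forecast['yhat'])   # back to the visitor scale
```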
