Create a polymer chain of nonstandard residues from a single-residue PDB file - Biopython

I created a simple PDB file containing a nonstandard residue, the repeat unit of polyethylene glycol (CH2-O-CH2), as follows:
REMARK Materials Studio PDB file
REMARK Created: Mon Dec 04 09:52:49 2017
ATOM 1 CT1 EGR H 1 -14.882 2.339 0.134 1.00 0.00 C
ATOM 2 HC11 EGR H 1 -14.677 2.559 1.234 1.00 0.00 H
ATOM 3 HC12 EGR H 1 -14.774 3.298 -0.472 1.00 0.00 H
ATOM 4 OS1 EGR H 1 -13.892 1.317 -0.371 1.00 0.00 O
ATOM 5 CT2 EGR H 1 -12.493 1.852 -0.184 1.00 0.00 C
ATOM 6 HC21 EGR H 1 -12.292 2.009 0.928 1.00 0.00 H
ATOM 7 HC22 EGR H 1 -12.392 2.846 -0.732 1.00 0.00 H
TER 8
CONECT 1 2 3 4
CONECT 2 1
CONECT 3 1
CONECT 4 1 5
CONECT 5 4 7 8 6
CONECT 6 5
CONECT 7 5
END
I can read this PDB file successfully with Biopython's PDBParser class, using the following code:
from Bio.PDB import PDBParser
parser = PDBParser()
structure = parser.get_structure('EGR', pdb_file)
How can I use this structure object to create a PDB file of a polymer chain of n residues?

Let's say you want to replicate your residue 10 times along the x-axis, translating each copy 5 angstroms further than the previous one. You could try something like:
import numpy as np
from Bio.PDB import PDBParser, PDBIO
from Bio.PDB.Residue import Residue
from Bio.PDB.Atom import Atom

pdb_file = 'egr.pdb'  # path to the single-residue file
n_copies = 10         # number of residues to add
gap = 5.0             # translation (angstroms) between consecutive copies along x

parser = PDBParser()
io = PDBIO()
structure = parser.get_structure('EGR', pdb_file)
chain = list(structure.get_chains())[0]
atoms = list(structure.get_atoms())  # template atoms, taken before adding copies
serial_number = len(atoms)

for i in range(n_copies):
    resnum = i + 2  # position along the sequence (the original residue is 1)
    res_id = (' ', resnum, ' ')  # (hetero flag, sequence number, insertion code)
    # Keep the 3-letter name: PDB residue names are limited to 3 characters
    new_res = Residue(res_id, 'EGR', ' ')
    chain.add(new_res)
    shift = np.array([gap * (i + 1), 0.0, 0.0])
    for atom in atoms:
        serial_number += 1
        new_atom = Atom(atom.name, atom.coord + shift, atom.bfactor,
                        atom.occupancy, atom.altloc, atom.fullname,
                        serial_number, element=atom.element)
        new_res.add(new_atom)

# Write the extended chain to a new file
io.set_structure(structure)
io.save('polymer.pdb')
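If you would rather not build Biopython objects at all, the same translation can be written straight out as fixed-column ATOM records. This is a minimal standalone sketch, not part of the answer above: the column layout follows the PDB ATOM record format, the template coordinates are copied from the question's file, and the names TEMPLATE, atom_line and build_chain are hypothetical. It shifts copy i by i * spacing along x and does not write CONECT records, so bonds between consecutive residues are not recorded.

```python
# Template atoms copied from the single-residue file: (name, element, x, y, z)
TEMPLATE = [
    ("CT1", "C", -14.882, 2.339, 0.134),
    ("HC11", "H", -14.677, 2.559, 1.234),
    ("HC12", "H", -14.774, 3.298, -0.472),
    ("OS1", "O", -13.892, 1.317, -0.371),
    ("CT2", "C", -12.493, 1.852, -0.184),
    ("HC21", "H", -12.292, 2.009, 0.928),
    ("HC22", "H", -12.392, 2.846, -0.732),
]


def atom_line(serial, name, resname, chain, resseq, x, y, z, element):
    """Format one fixed-column PDB ATOM record (occupancy 1.00, B-factor 0.00)."""
    # Common convention: atom names shorter than 4 characters start in column 14
    padded = name if len(name) == 4 else " " + name
    return ("ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f          %2s"
            % (serial, padded, resname, chain, resseq, x, y, z, 1.0, 0.0, element))


def build_chain(n_residues, spacing=5.0, resname="EGR", chain="H"):
    """Return PDB text with n copies of TEMPLATE; copy i is shifted i*spacing along x."""
    lines = []
    serial = 0
    for i in range(n_residues):
        dx = spacing * i
        for name, element, x, y, z in TEMPLATE:
            serial += 1
            lines.append(atom_line(serial, name, resname, chain, i + 1,
                                   x + dx, y, z, element))
    lines.append("END")
    return "\n".join(lines) + "\n"


print(build_chain(3), end="")
```

To save a 10-residue chain, write build_chain(10) to a file such as polymer.pdb.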


Substitute variable with value, but don't evaluate

Suppose I have the following expressions:
(%i1) (8*x)*(log(x) / log(10));
(%i2) x^2;
Now, because I want to find out which constant I can pick to make the statement "%i1 is O(%i2)" true, I evaluate both expressions in a loop like so:
for a:1 thru 10 do print(%i1, "=", ev(%i1, numer, x=a), %i2, "=", ev(%i2, x=a));
The output is:
8 x log(x)          2
---------- = 0.0 , x  = 1
 log(10)
8 x log(x)                        2
---------- = 4.816479930623698 , x  = 4
 log(10)
8 x log(x)                        2
---------- = 11.45091011327189 , x  = 9
 log(10)
8 x log(x)                        2
---------- = 19.26591972249479 , x  = 16
 log(10)
8 x log(x)                        2
---------- = 27.95880017344075 , x  = 25
 log(10)
8 x log(x)                        2
---------- = 37.35126001841489 , x  = 36
 log(10)
8 x log(x)                        2
---------- = 47.32549024079837 , x  = 49
 log(10)
8 x log(x)                        2
---------- = 57.79775916748438 , x  = 64
 log(10)
8 x log(x)                        2
---------- = 68.70546067963139 , x  = 81
 log(10)
8 x log(x)           2
---------- = 80.0 , x  = 100
 log(10)
I want to make the output easier to eyeball, something like:
8 1 log(1)          2
---------- = 0.0 , 1  = 1
 log(10)
8 2 log(2)                        2
---------- = 4.816479930623698 , 2  = 4
 log(10)
8 3 log(3)                        2
---------- = 11.45091011327189 , 3  = 9
 log(10)
[snip]
8 10 log(10)            2
------------ = 80.0 , 10  = 100
  log(10)
How can I tell Maxima to substitute the value of a for x in every iteration of the loop without evaluating the expression?
I've searched the manual, but didn't find anything that seemed relevant.
A lot of operations in Maxima are carried out by a process called "simplification", which means applying identities to make a "simpler" expression. E.g. 1 + 1 simplifies to 2, sin(0) simplifies to 0, etc.
In order to get the effect you want, we must disable simplification in general, so that expressions are evaluated but not simplified. But to get the numerical values, we need to enable simplification just for those results.
Here's something to do that.
(%i16) simp : false $
(%i17) for x in [1,2,3,4,5]
do print (ev(%i1) = ev(%i1, simp, numer), ev(%i2) = ev(%i2, simp));
       log(1)          2
(8 1) (-------) = 0.0 1  = 1
       log(10)
       log(2)                        2
(8 2) (-------) = 4.816479930623698 2  = 4
       log(10)
       log(3)                       2
(8 3) (-------) = 11.4509101132719 3  = 9
       log(10)
       log(4)                        2
(8 4) (-------) = 19.26591972249479 4  = 16
       log(10)
       log(5)                        2
(8 5) (-------) = 27.95880017344075 5  = 25
       log(10)
(%o17) done
Note that I wrote "for x in [1, 2, 3, 4, 5] ..." instead of "for x:1 thru 5 ...". That's because the latter steps x using arithmetic, and arithmetic requires simplification. Try it both ways; I think you'll find the difference very enlightening.
Nota bene: I've used the same %i1 and %i2 as you.
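The separation this answer relies on (substitute first, evaluate numerically in a separate step) can be illustrated outside Maxima with a toy expression tree. This is a hypothetical Python sketch, not how Maxima represents or simplifies expressions: subst only swaps the variable for a number, show prints the tree exactly as built (roughly what simp: false gives you), and evaluate plays the role of ev(..., simp, numer).

```python
import math

# Toy expression tree (hypothetical, not Maxima's internal form):
#   ("num", value) | ("var", name) | ("log", arg) | (op, left, right), op in "*", "/", "^"


def subst(expr, name, value):
    """Replace variable `name` with the literal `value`; nothing is simplified."""
    kind = expr[0]
    if kind == "var":
        return ("num", value) if expr[1] == name else expr
    if kind == "num":
        return expr
    if kind == "log":
        return ("log", subst(expr[1], name, value))
    return (kind, subst(expr[1], name, value), subst(expr[2], name, value))


def show(expr):
    """Render the tree as text, exactly as built."""
    kind = expr[0]
    if kind in ("num", "var"):
        return str(expr[1])
    if kind == "log":
        return "log(" + show(expr[1]) + ")"
    return "(" + show(expr[1]) + " " + kind + " " + show(expr[2]) + ")"


def evaluate(expr):
    """Compute a float value for a tree with no free variables."""
    kind = expr[0]
    if kind == "num":
        return float(expr[1])
    if kind == "var":
        raise ValueError("unbound variable " + expr[1])
    if kind == "log":
        return math.log(evaluate(expr[1]))
    a, b = evaluate(expr[1]), evaluate(expr[2])
    if kind == "*":
        return a * b
    if kind == "/":
        return a / b
    return a ** b  # "^"


# (8*x)*(log(x)/log(10)) and x^2 from the question
F = ("*", ("*", ("num", 8), ("var", "x")), ("/", ("log", ("var", "x")), ("log", ("num", 10))))
G = ("^", ("var", "x"), ("num", 2))

for a in [1, 2, 3, 4, 5]:
    f_a, g_a = subst(F, "x", a), subst(G, "x", a)
    print(show(f_a), "=", evaluate(f_a), ",", show(g_a), "=", evaluate(g_a))
```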
Alternatively, use an "empty" function:
(%i1) display2d: false $
(%i2) prefix("") $
(%i3) almost_subst(a, x, e):= subst(""(a), x, e) $
(%i4) almost_subst(10, x, 8*x*log(x)/log(10));
(%o4) (8* 10*log( 10))/log(10)

formatting strings in lua in a pattern

I want to write a script that takes any number, counts up to it, and prints the numbers in a particular format.
For example, this:
for i = 1, 9 do
  print(i)
end
prints:
1
2
3
4
5
6
7
8
9
However, I want it to print like this:
1 2 3
4 5 6
7 8 9
and I want it to work with numbers larger than 9 too, so 20 would produce:
1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
16 17 18
19 20
I'm sure this can be done with Lua's string library, but I'm not sure how to use it.
Any help?
function f(n, per_line)
  per_line = per_line or 3
  for i = 1, n do
    io.write(i, '\t')
    if i % per_line == 0 then io.write('\n') end
  end
end
f(9)
f(20)
The numeric for loop takes an optional third argument, the step (this version assumes the count is a multiple of 3):
for i = 1, 9, 3 do
  print(string.format("%d %d %d", i, i + 1, i + 2))
end
I can think of 2 ways to do this:
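local NUMBER = 20
local str = {}
-- Full rows of three: "1 2 3", "4 5 6", ...
for i = 1, NUMBER - NUMBER % 3 - 2, 3 do
  table.insert(str, i .. " " .. i + 1 .. " " .. i + 2)
end
-- Leftover numbers (zero, one or two of them) form the last row
local left = {}
for i = NUMBER - NUMBER % 3 + 1, NUMBER do
  table.insert(left, i)
end
if #left > 0 then
  table.insert(str, table.concat(left, " "))
end
str = table.concat(str, "\n")
print(str)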
And another one using gsub:
local NUMBER = 20
local str = {}
for i = 1, NUMBER do
  str[i] = i
end
-- Makes "1 2 3 4 ..."
str = table.concat(str," ")
-- Divides it per 3 numbers
-- "%d+ %d+ %d+" matches 3 numbers divided by spaces
-- (You can replace the spaces (including in concat) with "\t")
-- The (...) capture allows us to get those numbers as %1
-- The "%s?" at the end is to remove any trailing whitespace
-- (Else each line would be "N N N " instead of "N N N")
-- (Using the '?' as the last triplet might not have a space)
-- ^ e.g. NUMBER = 6 would make it end with "4 5 6"
-- The "%1\n" just gets us our numbers back and adds a newline
str = str:gsub("(%d+ %d+ %d+)%s?","%1\n")
print(str)
I've benchmarked both code snippets. The upper (first) one is slightly faster, although the difference is negligible. Benchmarked using 10000 iterations, times in ms:
NUMBER     20    20    20    100    100
Upper     256   276   260   1129   1114
Lower     284   280   282   1266   1228
Use a temporary table to contain the values until you print them:
local temp = {}
local cols = 3
for i = 1, 9 do
  if #temp == cols then
    print(table.unpack(temp))
    temp = {}
  end
  temp[#temp + 1] = i
end
-- Last-minute check for leftovers
if #temp > 0 then
  print(table.unpack(temp))
end
temp = nil

How to encode video 3840x2160 with 32x32 and 16x16 CU with depth 2 and 1 in HEVC Encoder HM 13

When I try to encode a video, the encoder crashes after finishing the first GOP.
This is the configuration I'm using:
MaxCUWidth : 16 # Maximum coding unit width in pixel
MaxCUHeight : 16 # Maximum coding unit height in pixel
MaxPartitionDepth : 2 # Maximum coding unit depth
QuadtreeTULog2MaxSize : 3 # Log2 of maximum transform size for
# quadtree-based TU coding (2...5) = MaxPartitionDepth + 2 - 1
QuadtreeTULog2MinSize : 2 # Log2 of minimum transform size for
# quadtree-based TU coding (2...5)
QuadtreeTUMaxDepthInter : 1
QuadtreeTUMaxDepthIntra : 1
#======== Coding Structure =============
IntraPeriod : 8 # Period of I-Frame ( -1 = only first)
DecodingRefreshType : 1 # Random Accesss 0:none, 1:CDR, 2:IDR
GOPSize : 4 # GOP Size (number of B slice = GOPSize-1)
# Type POC QPoffset QPfactor tcOffsetDiv2 betaOffsetDiv2 temporal_id #ref_pics_active #ref_pics reference pictures predict deltaRPS #ref_idcs reference idcs
Frame1: P 4 1 0.5 0 0 0 1 1 -4 0
Frame2: B 2 2 0.5 1 0 1 1 2 -2 2 1 2 2 1 1
Frame3: B 1 3 0.5 2 0 2 1 3 -1 1 3 1 1 3 1 1 1
Frame4: B 3 3 0.5 2 0 2 1 2 -1 1 1 -2 4 0 1 1 0
This also happens with CU = 16x16 and depth = 1.
Note: I encoded CU = 64x64 with depth = 4 using the same GOP configuration, and everything went fine.
This is most probably because you compiled the binary for a 32-bit system.
Please rebuild it for a 64-bit system and the problem will go away.

Extract x-axis value using y-axis data in R

I have a time-series dataset in this format:
Time Val1 Val2
0 0.68 0.39
30 0.08 0.14
35 0.12 0.07
40 0.17 0.28
45 0.35 0.31
50 0.14 0.45
100 1.01 1.31
105 0.40 1.20
110 2.02 0.57
115 1.51 0.58
130 1.32 2.01
Using this dataset, I want to extract (not predict) the Time at which FC1 = 1 and FC2 = 1. Here is a plot I created, annotated with the points I would like to extract.
I am looking for a solution that uses a function to interpolate (or find intercepts) and extract these values. For example, if I draw a horizontal line at fold change 1 on the y-axis, I want to extract all the x-axis values where that line intersects the curves.
Looking forward to suggestions, and thanks in advance!
You can use approxfun to do the interpolation and uniroot to find single roots (places where the curve crosses the line). You would need to run uniroot multiple times to find all the crossings; the rle function can help choose the intervals to search.
The FC values in your data never get close to 1 let alone cross it, so you must either have a lot more data than shown, or mean a different value.
If you can give more detail (possibly include a plot showing what you want) then we may be able to give more detailed help.
Edit
OK, here is some R code that finds where the lines cross:
con <- textConnection(' Time Val1 Val2
0 0.68 0.39
30 0.08 0.14
35 0.12 0.07
40 0.17 0.28
45 0.35 0.31
50 0.14 0.45
100 1.01 1.31
105 0.40 1.20
110 2.02 0.57
115 1.51 0.58
130 1.32 2.01')
mydat <- read.table(con, header=TRUE)
with(mydat, {
  plot( Time, Val1, ylim=range(Val1,Val2), col='green', type='l' )
  lines(Time, Val2, col='blue')
})
abline(h=1, col='red')
afun1 <- approxfun( mydat$Time, mydat$Val1 - 1 )
afun2 <- approxfun( mydat$Time, mydat$Val2 - 1 )
points1 <- cumsum( rle(sign(mydat$Val1 - 1))$lengths )
points2 <- cumsum( rle(sign(mydat$Val2 - 1))$lengths )
xval1 <- numeric( length(points1) - 1 )
xval2 <- numeric( length(points2) - 1 )
for( i in seq_along(xval1) ) {
  tmp <- uniroot(afun1, mydat$Time[ points1[c(i, i+1)] ])
  xval1[i] <- tmp$root
}
for( i in seq_along(xval2) ) {
  tmp <- uniroot(afun2, mydat$Time[ points2[c(i, i+1)] ])
  xval2[i] <- tmp$root
}
abline( v=xval1, col='green' )
abline( v=xval2, col='blue')
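Because approxfun interpolates linearly, each crossing can also be solved in closed form inside the segment where the sign of (value - 1) changes, with no root finder at all. Here is a minimal Python sketch of that idea for comparison; the function name crossings is hypothetical and the data are copied from the question.

```python
def crossings(xs, ys, level=1.0):
    """x values where the piecewise-linear curve through (xs, ys) meets `level`."""
    roots = []
    for i in range(len(xs) - 1):
        y0, y1 = ys[i] - level, ys[i + 1] - level
        if y0 == 0:
            roots.append(xs[i])            # a data point lies exactly on the level
        elif y0 * y1 < 0:
            t = y0 / (y0 - y1)             # fraction of the segment before the crossing
            roots.append(xs[i] + t * (xs[i + 1] - xs[i]))
    if ys and ys[-1] - level == 0:
        roots.append(xs[-1])               # the last point is not covered by the loop
    return roots


TIME = [0, 30, 35, 40, 45, 50, 100, 105, 110, 115, 130]
VAL1 = [0.68, 0.08, 0.12, 0.17, 0.35, 0.14, 1.01, 0.40, 2.02, 1.51, 1.32]
VAL2 = [0.39, 0.14, 0.07, 0.28, 0.31, 0.45, 1.31, 1.20, 0.57, 0.58, 2.01]

print(crossings(TIME, VAL1))
print(crossings(TIME, VAL2))
```

Within each bracketing segment this should agree with the approxfun/uniroot result up to uniroot's tolerance.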

Calculate nCr mod m (n choose r) for large values of n (10^9)

Now that CodeSprint 3 is over, I've been wondering how to solve this problem. We simply need to calculate nCr mod 142857 for large values of n and r (0 <= n <= 10^9; 0 <= r <= n). I used a recursive method that goes through min(r, n-r) iterations to calculate the combination, but it turned out not to be efficient enough. I've tried a few other methods, but none of them seem efficient enough either. Any suggestions?
For a composite modulus, factor it (142857 = 3^3 * 11 * 13 * 37), compute C(n, k) mod p^q for each prime-power factor using the generalized Lucas theorem, and combine the results with the Chinese remainder theorem.
For example, C(234, 44) mod 142857 = 6084, and
C(234, 44) mod 3^3 = 9
C(234, 44) mod 11 = 1
C(234, 44) mod 13 = 0
C(234, 44) mod 37 = 16
The Chinese remainder theorem then involves finding x such that
x ≡ 9 (mod 3^3)
x ≡ 1 (mod 11)
x ≡ 0 (mod 13)
x ≡ 16 (mod 37)
The result is x = 6084.
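The CRT step can be sketched in a few lines of Python. This is a minimal illustration assuming pairwise coprime moduli (true for 27, 11, 13 and 37); it uses the built-in pow(x, -1, m) for modular inverses, which needs Python 3.8+, and the name crt is hypothetical.

```python
def crt(residues, moduli):
    """Return the unique x mod prod(moduli) with x ≡ residues[i] (mod moduli[i]).

    Assumes the moduli are pairwise coprime.
    """
    x, m = 0, 1
    for a, mi in zip(residues, moduli):
        # Find t with x + m*t ≡ a (mod mi); m is invertible mod mi by coprimality
        t = (a - x) * pow(m, -1, mi) % mi
        x += m * t
        m *= mi
    return x % m


print(crt([9, 1, 0, 16], [27, 11, 13, 37]))
```

Each step keeps every earlier congruence satisfied, because it only adds multiples of the product of the moduli already processed.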
Example
C(234, 44) mod 3^3
First convert n, k, and n-k to base p
n = 234_10 = 22200_3
k = 44_10 = 1122_3
r = n-k = 190_10 = 21001_3
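Next, count the carries when adding k and r in base p. Here e[i] = the number of carries out of digit positions i and above:
position i    4 3 2 1 0
carry out     0 0 0 1 1
r             2 1 0 0 1
k             0 1 1 2 2
n = r + k     2 2 2 0 0
e[i]          0 0 0 1 2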
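Now define the factorial-like function needed for the generalized Lucas theorem: f(n, p) is the product of all integers from 1 to n that are not divisible by p.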
def f(n, p):
    # Product of 1..n, skipping multiples of p
    result = 1
    for i in range(1, n + 1):
        if i % p != 0:
            result *= i
    return result
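Since q = 3, read the base-p digits three at a time: for every digit position j = 0, 1, 2, ... of n, take the three-digit window of n, r and k that starts at position j (padded with leading zeros), i.e. floor(x / p^j) mod p^q. Every position counts, including the top ones where the window is mostly padding.
So
f(002_3, 3)/[f(002_3, 3) * f(000_3, 3)] *
f(022_3, 3)/[f(021_3, 3) * f(001_3, 3)] *
f(222_3, 3)/[f(210_3, 3) * f(011_3, 3)] *
f(220_3, 3)/[f(100_3, 3) * f(112_3, 3)] *
f(200_3, 3)/[f(001_3, 3) * f(122_3, 3)] = 53754758200 / 7
(the j = 4 window contributes 1, the j = 3 window contributes 2240 / 280 = 8, and the last three give 6719344775 / 7)
Now
s = 1 if p = 2 and q >= 3 else -1
and the sign factor is s^e[q-1], where e[q-1] = e[2] counts the carries at positions 2 and above.
Then
C(234, 44) ≡ p^e[0] * s^e[2] * 53754758200 / 7 (mod 3^3)
e[0] = 2, so p^e[0] = 3^2 = 9
e[2] = 0, so s^e[2] = 1
p^e[0] * s^e[2] * 53754758200 = 483792823800
Now you have
483792823800 / 7 mod 3^3
This is a linear congruence and can be solved with
ModularInverse(7, 3^3) = 4
4 * 483792823800 mod 27 = 9
Hence C(234, 44) mod 3^3 = 9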
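Putting the pieces together, here is a minimal Python sketch of C(n, k) mod p^q along these lines; the name binom_mod_prime_power is hypothetical. It precomputes f(i, p) mod p^q for all i < p^q (every window value is below p^q), takes one q-digit window at every digit position j of n, and applies the sign as s^e[q-1], where e[q-1] counts carries at positions q-1 and above. pow(x, -1, m) needs Python 3.8+, and the result is cross-checked against math.comb for a small case.

```python
from math import comb


def binom_mod_prime_power(n, k, p, q):
    """C(n, k) mod p**q via the generalized Lucas theorem."""
    if k < 0 or k > n:
        return 0
    pq = p ** q
    r = n - k
    # table[i] = f(i, p) mod p^q: product of 1..i skipping multiples of p
    table = [1] * pq
    for i in range(1, pq):
        table[i] = table[i - 1] * (i if i % p else 1) % pq
    # carries[i] = 1 if adding k and r in base p carries out of digit i
    carries = []
    pw = p
    while pw // p <= n:
        carries.append(1 if k % pw + r % pw >= pw else 0)
        pw *= p
    e0 = sum(carries)              # e[0]: exponent of p dividing C(n, k)
    if e0 >= q:
        return 0
    e_q1 = sum(carries[q - 1:])    # e[q-1]: carries at positions q-1 and above
    prod, pj = 1, 1
    while pj <= n:                 # one q-digit window per digit position j
        N, K, R = (n // pj) % pq, (k // pj) % pq, (r // pj) % pq
        prod = prod * table[N] * pow(table[K] * table[R] % pq, -1, pq) % pq
        pj *= p
    s = 1 if (p == 2 and q >= 3) else -1
    return p ** e0 * s ** e_q1 * prod % pq


# Cross-check against exact arithmetic for the worked example
print(binom_mod_prime_power(234, 44, 3, 3), comb(234, 44) % 27)
```

Running it for the four prime powers of 142857 gives the residues 9, 1, 0 and 16 listed earlier, which the CRT combines into 6084. Each call builds a table of at most p^q entries plus about log_p(n) windows, so n = 10^9 is cheap.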
