Searching for a Python tool to annotate phylogenetic trees - biopython

I am working with the BioPython module Phylo in order to build phylogenetic trees.
https://biopython.org/wiki/Phylo
I have not found any options or other modules for adding annotations or supplementary plots to the tree, although I know this functionality exists in ggtree for R.
https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12628
Does anyone know if there is an annotation module for Phylo?
Here is my script if needed:
from Bio import Phylo, AlignIO
from io import StringIO
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
import matplotlib.pyplot as plt

# Load the IQ-TREE tree file
tree = Phylo.read(PATH + '/data/GVCF_SNPs.min4_RENAMED.phy.treefile', "newick")
tree.ladderize()

fig = plt.figure(figsize=(20, 40), dpi=100)
axes = fig.add_subplot(1, 1, 1)
axes.legend(loc='center left', bbox_to_anchor=(1, 0.5))
Phylo.draw(tree, axes=axes)
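The closest workaround I can think of is that Phylo.draw just renders onto a standard matplotlib Axes, so extra annotations can be drawn on that Axes by hand, e.g. by replacing the final Phylo.draw call above with something like this (the coordinates and label are placeholders, not from my data):
Phylo.draw(tree, axes=axes, do_show=False)
axes.axvspan(0.0, 0.05, color="lightgrey", alpha=0.3)       # hypothetical highlighted region
axes.text(0.06, 1.0, "clade of interest", color="darkred")  # hypothetical free-text label
fig.savefig("annotated_tree.png", bbox_inches="tight")
But this quickly gets tedious, which is why I am looking for a dedicated annotation module.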

Difference between example Acrobot plant A matrix and standard form

In section 3.4.1 of the Underactuated Robotics notes (https://underactuated.mit.edu/acrobot.html#section4), the manipulator equations are linearized around a fixed point and the matrix A_lin is derived.
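For reference, my reading of that standard form (paraphrased, so the exact symbols may differ slightly from the notes) is that, at a fixed point with $\dot{q}^* = 0$ and no joint damping,

$$A_{\mathrm{lin}} = \begin{bmatrix} 0 & I \\ M(q^*)^{-1}\,\partial\tau_g/\partial q\,\big|_{q^*} & 0 \end{bmatrix},$$

i.e. the bottom-right 2x2 block should be zero.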
While verifying the linearization of my own attempt at making an acrobot, I used the python notebook provided in Example 3.5 (LQR for the Acrobot and Cart-pole) to obtain the A matrix of the linearized Acrobot (Plant from the Examples module). I did this by simply adding 'print(linearized_acrobot.A())' on line 21 of the LQR for Acrobot block. Interestingly, I noticed that the bottom right 2x2 block is nonzero, which is different from the form derived in the notes. What is the reason behind the difference? For convenience I'll leave the code below:
import matplotlib.pyplot as plt
import mpld3
import numpy as np
from IPython.display import HTML, display
from pydrake.all import (AddMultibodyPlantSceneGraph, ControllabilityMatrix,
DiagramBuilder, Linearize, LinearQuadraticRegulator,
MeshcatVisualizerCpp, Parser, Saturation, SceneGraph,
Simulator, StartMeshcat, WrapToSystem)
from pydrake.examples.acrobot import (AcrobotGeometry, AcrobotInput,
AcrobotPlant, AcrobotState)
from pydrake.solvers.mathematicalprogram import MathematicalProgram, Solve
from underactuated import FindResource, running_as_notebook
from underactuated.meshcat_cpp_utils import MeshcatSliders
from underactuated.quadrotor2d import Quadrotor2D, Quadrotor2DVisualizer
if running_as_notebook:
    mpld3.enable_notebook()

def UprightState():
    state = AcrobotState()
    state.set_theta1(np.pi)
    state.set_theta2(0.)
    state.set_theta1dot(0.)
    state.set_theta2dot(0.)
    return state

def acrobot_controllability():
    acrobot = AcrobotPlant()
    context = acrobot.CreateDefaultContext()

    input = AcrobotInput()
    input.set_tau(0.)
    acrobot.get_input_port(0).FixValue(context, input)
    context.get_mutable_continuous_state_vector()\
        .SetFromVector(UprightState().CopyToVector())

    linearized_acrobot = Linearize(acrobot, context)
    print(linearized_acrobot.A())
    print(
        f"The singular values of the controllability matrix are: {np.linalg.svd(ControllabilityMatrix(linearized_acrobot), compute_uv=False)}"
    )

acrobot_controllability()
Great question. The AcrobotPlant in Drake has default parameters which include some joint friction, and this is what produces the non-zero elements in the bottom-right corner. If you amend your code with
acrobot = AcrobotPlant()
context = acrobot.CreateDefaultContext()
params = acrobot.get_mutable_parameters(context)
print(params)
params.set_b1(0)
params.set_b2(0)
then the bottom-right 2x2 elements of the linearized A are zero as expected.
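For completeness, here is a sketch of how that snippet could be folded back into the acrobot_controllability() function from the question (same API calls as above, just reordered; the function name is mine and the code is untested):
def acrobot_controllability_without_damping():
    acrobot = AcrobotPlant()
    context = acrobot.CreateDefaultContext()

    # Zero out the default joint friction so the model matches the notes.
    params = acrobot.get_mutable_parameters(context)
    params.set_b1(0)
    params.set_b2(0)

    input = AcrobotInput()
    input.set_tau(0.)
    acrobot.get_input_port(0).FixValue(context, input)
    context.get_mutable_continuous_state_vector()\
        .SetFromVector(UprightState().CopyToVector())

    linearized_acrobot = Linearize(acrobot, context)
    # With b1 = b2 = 0 the bottom-right 2x2 block of A() should now be zero.
    print(linearized_acrobot.A())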

Complex nesting within imblearn pipelines

I have been trying to find a solution to this but unsuccessfully so far.
I am working with some data for which I need to adopt a resampling procedure within a (scikit-learn/imblearn) pipeline, meaning that the size of both the samples and targets has to change within the pipeline. In order to do this I am using FunctionSampler from imblearn.
My problem is that the main pipeline is composed of steps which are themselves pipelines, and this nesting is giving me some problems.
The code below shows an extremely simplified version of the scenario I am working in. Please note this is not the actual code I am using (the transformers/classifiers are different, and there are many more of them in the original code); only the structure is similar.
# pipeline definition
from sklearn.preprocessing import StandardScaler, Normalizer, PolynomialFeatures
from sklearn.feature_selection import VarianceThreshold, SelectKBest
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.svm import SVC
# from sklearn.pipeline import Pipeline
from imblearn.pipeline import Pipeline
from imblearn import FunctionSampler
def outlier_extractor(X, y):
    # just an example
    return X, y

pipe = Pipeline(steps=[("feature_engineering", PolynomialFeatures()),
                       ("variance_threshold", VarianceThreshold()),
                       ("outlier_correction", FunctionSampler(func=outlier_extractor)),
                       ("classifier", QuadraticDiscriminantAnalysis())])
# definition of the feature engineering options
feature_engineering_options = [
    Pipeline(steps=[
        ("scaling", StandardScaler()),
        ("PCA", PCA(n_components=3))
    ]),
    Pipeline(steps=[  # add div and prod features
        ("polynomial", PolynomialFeatures()),
        ("kBest", SelectKBest())
    ])
]
outlier_correction_options = [
    FunctionSampler(func=outlier_extractor),
    Pipeline(steps=[
        ("center_scaling", StandardScaler()),
        ("normalisation", Normalizer(norm="l2"))
    ])
]
# definition of the parameters to optimize in the pipeline
params = [
    # support vector machine
    {"feature_engineering": feature_engineering_options,
     "variance_threshold__threshold": [0, 0.5, 1],
     "outlier_correction": outlier_correction_options,
     "classifier": [SVC()],
     "classifier__C": [0.1, 1, 10, 50],
     "classifier__kernel": ["linear", "rbf"],
     },
    # quadratic discriminant analysis
    {"feature_engineering": feature_engineering_options,
     "variance_threshold__threshold": [0, 0.5, 1],
     "outlier_correction": outlier_correction_options,
     "classifier": [QuadraticDiscriminantAnalysis()]
     }
]
When using GridSearchCV(pipe, param_grid=params) I receive the error TypeError: All intermediate steps of the chain should be estimators that implement fit and transform or fit_resample. I know that I should unpack the pipelines, and I have also tried to follow this and this in order to solve the problem but my case seems (to me, at least) more complicated and I could not get these workarounds to work.
Any help/suggestion is very much appreciated. Thanks

TensorFlow Object Detection API

Is there a way to view the images that the TensorFlow Object Detection API trains on after all preprocessing/augmentation?
I'd like to verify that things look correct. I was able to verify the resizing by looking at the graph post-resize during inference, but I obviously can't do that for the augmentation options.
TIA
I answered a similar question here.
You can utilize the test script provided by the API and make some changes to fit your needs.
I wrote a little test script called augmentation_test.py; it borrows some code from input_test.py.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import os
from absl.testing import parameterized
import numpy as np
import tensorflow as tf
from scipy.misc import imsave, imread
from object_detection import inputs
from object_detection.core import preprocessor
from object_detection.core import standard_fields as fields
from object_detection.utils import config_util
from object_detection.utils import test_case
FLAGS = tf.flags.FLAGS
class DataAugmentationFnTest(test_case.TestCase):

    def test_apply_image_and_box_augmentation(self):
        data_augmentation_options = [
            (preprocessor.random_horizontal_flip, {})
        ]
        data_augmentation_fn = functools.partial(
            inputs.augment_input_data,
            data_augmentation_options=data_augmentation_options)
        tensor_dict = {
            fields.InputDataFields.image:
                tf.constant(imread('lena.jpeg').astype(np.float32)),
            fields.InputDataFields.groundtruth_boxes:
                tf.constant(np.array([[.5, .5, 1., 1.]], np.float32))
        }
        augmented_tensor_dict = data_augmentation_fn(tensor_dict=tensor_dict)
        with self.test_session() as sess:
            augmented_tensor_dict_out = sess.run(augmented_tensor_dict)
            imsave('lena_out.jpeg', augmented_tensor_dict_out[fields.InputDataFields.image])

if __name__ == '__main__':
    tf.test.main()
You can put this script under models/research/object_detection/ and simply run it with python augmentation_test.py (of course you need to install the API first). To run it successfully you should provide an image named 'lena.jpeg', and the output image after augmentation will be saved as 'lena_out.jpeg'.
I ran it with the 'lena' image and here is the result before augmentation and after augmentation.
Note that I used preprocessor.random_horizontal_flip in the script, and the result shows exactly what the input image looks like after random_horizontal_flip. To test other augmentation options, you can replace random_horizontal_flip with other methods (which are all defined in preprocessor.py), or you can append other options to the data_augmentation_options list, for example:
data_augmentation_options = [(preprocessor.resize_image, {
    'new_height': 20,
    'new_width': 20,
    'method': tf.image.ResizeMethod.NEAREST_NEIGHBOR
}), (preprocessor.random_horizontal_flip, {})]

No result after calculating the similarity of two words based on word vectors via Spacy's parser?

I have an example of spaCy code:
from numpy import dot
from numpy.linalg import norm
from spacy.lang.en import English
parser = English()
# you can access known words from the parser's vocabulary
nasa = parser.vocab[u'NASA']
# cosine similarity
cosine = lambda v1, v2: dot(v1, v2) / (norm(v1) * norm(v2))
# gather all known words, take only the lowercased versions
allWords = list({w for w in parser.vocab if w.has_vector and
w.orth_.islower() and w.lower_ != unicode("nasa")})
# sort by similarity to NASA
allWords.sort(key=lambda w: cosine(w.vector, nasa.vector))
allWords.reverse()
print("Top 10 most similar words to NASA:")
for word in allWords[:10]:
    print(word.orth_)
The result is like this:
Top 10 most similar words to NASA:
Process finished with exit code 0
So no similar words came out.
I have tried to install the parser and glove via cmd:
python -m spacy.en.download parser
python -m spacy.en.download glove
But it failed; the output was:
C:\Python\python.exe: No module named en
By the way, I use:
Python 2.7.9
Spacy 2.0.9
What's wrong with it? Thank you
The parser you are instantiating contains no word vectors. Check https://spacy.io/models/ for an overview of models.
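For example, a minimal sketch assuming spaCy 2.x and the medium English model, which ships with word vectors (download it once with python -m spacy download en_core_web_md):
import spacy

nlp = spacy.load('en_core_web_md')   # a pipeline that includes word vectors
nasa = nlp.vocab[u'NASA']
print(nasa.has_vector)               # should be True here, False with the blank English()

# The cosine-similarity loop from the question should then return results
# when run over nlp.vocab instead of parser.vocab.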

Probable issue with LSTM in lasagne

With a simple constructor for the LSTM, as given in the tutorial, and an input of dimension [,,1] one would expect to see an output of shape [,,num_units].
But regardless of the num_units passed during construction, the output has the same shape as the input.
Following is the min code to replicate this issue...
import lasagne
import theano
import theano.tensor as T
import numpy as np
num_batches= 20
sequence_length= 100
data_dim= 1
train_data_3= np.random.rand(num_batches,sequence_length,data_dim).astype(theano.config.floatX)
#As in the tutorial
forget_gate = lasagne.layers.Gate(b=lasagne.init.Constant(5.0))
l_lstm = lasagne.layers.LSTMLayer(
    (num_batches, sequence_length, data_dim),
    num_units=8,
    forgetgate=forget_gate
)
lstm_in= T.tensor3(name='x', dtype=theano.config.floatX)
lstm_out = lasagne.layers.get_output(l_lstm, {l_lstm:lstm_in})
f = theano.function([lstm_in], lstm_out)
lstm_output_np= f(train_data_3)
lstm_output_np.shape
#= (20, 100, 1)
An unqualified LSTM (I mean in its default mode) should produce one output for each unit, right?
The code was run on Kaixhin's CUDA Lasagne Docker image.
What gives?
Thanks !
You can fix that by using a lasagne.layers.InputLayer:
import lasagne
import theano
import theano.tensor as T
import numpy as np
num_batches= 20
sequence_length= 100
data_dim= 1
train_data_3= np.random.rand(num_batches,sequence_length,data_dim).astype(theano.config.floatX)
#As in the tutorial
forget_gate = lasagne.layers.Gate(b=lasagne.init.Constant(5.0))
input_layer = lasagne.layers.InputLayer(shape=(num_batches,                 # <-- change
                                               sequence_length, data_dim))  # <-- change
l_lstm = lasagne.layers.LSTMLayer(input_layer,  # <-- change
                                  num_units=8,
                                  forgetgate=forget_gate
                                  )
lstm_in= T.tensor3(name='x', dtype=theano.config.floatX)
lstm_out = lasagne.layers.get_output(l_lstm, lstm_in) # <-- change
f = theano.function([lstm_in], lstm_out)
lstm_output_np= f(train_data_3)
print lstm_output_np.shape
If you feed your input into the input_layer, it is no longer ambiguous, so you do not even need to specify where the input is supposed to go. Directly specifying a shape and mapping the tensor3 onto the LSTM layer does not work. With num_units=8, the printed shape should now be (20, 100, 8).
