I have a time-series dataset in this format:
Time Val1 Val2
0 0.68 0.39
30 0.08 0.14
35 0.12 0.07
40 0.17 0.28
45 0.35 0.31
50 0.14 0.45
100 1.01 1.31
105 0.40 1.20
110 2.02 0.57
115 1.51 0.58
130 1.32 2.01
Using this dataset I want to extract (not predict) the Time values at which FC1 = 1 and FC2 = 1. Here is a plot I created with the annotated points I would like to extract.
I am looking for a solution in R, using a function to interpolate/intersect to extract the values. For example, if I draw a horizontal line at fold change 1 on the y-axis, I want to extract all the points on the x-axis where that line intersects the curves.
Looking forward to suggestions, and thanks in advance!
You can use approxfun to do the interpolation and uniroot to find single roots (places where the line is crossed). You would need to run uniroot multiple times to find all the crossings; the rle function may help choose the starting intervals.
The FC values in your data never get close to 1, let alone cross it, so you must either have a lot more data than shown, or mean a different value.
If you can give more detail (possibly including a plot showing what you want), then we may be able to give more detailed help.
Edit
OK, here is some R code that finds where the lines cross:
con <- textConnection(' Time Val1 Val2
0 0.68 0.39
30 0.08 0.14
35 0.12 0.07
40 0.17 0.28
45 0.35 0.31
50 0.14 0.45
100 1.01 1.31
105 0.40 1.20
110 2.02 0.57
115 1.51 0.58
130 1.32 2.01')
mydat <- read.table(con, header=TRUE)
with(mydat, {
    plot(Time, Val1, ylim=range(Val1, Val2), col='green', type='l')
    lines(Time, Val2, col='blue')
})
# horizontal reference line at fold change 1
abline(h=1, col='red')

# interpolating functions for each curve's distance from 1
afun1 <- approxfun(mydat$Time, mydat$Val1 - 1)
afun2 <- approxfun(mydat$Time, mydat$Val2 - 1)

# ends of each run of constant sign in (value - 1): consecutive
# entries bracket exactly one crossing of the line
points1 <- cumsum(rle(sign(mydat$Val1 - 1))$lengths)
points2 <- cumsum(rle(sign(mydat$Val2 - 1))$lengths)

xval1 <- numeric(length(points1) - 1)
xval2 <- numeric(length(points2) - 1)

# run uniroot once per bracketing interval to locate each crossing
for (i in seq_along(xval1)) {
    tmp <- uniroot(afun1, mydat$Time[points1[c(i, i + 1)]])
    xval1[i] <- tmp$root
}
for (i in seq_along(xval2)) {
    tmp <- uniroot(afun2, mydat$Time[points2[c(i, i + 1)]])
    xval2[i] <- tmp$root
}

# mark the crossing times on the plot
abline(v=xval1, col='green')
abline(v=xval2, col='blue')
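If you need this more than once, the same approxfun/rle/uniroot idea can be wrapped in a small helper. find_crossings below is a hypothetical name, and the sketch assumes no sampled point sits exactly on the level:

# reusable sketch: all x where the curve y(x) crosses the level 'lev'
find_crossings <- function(x, y, lev = 1) {
    f <- approxfun(x, y - lev)
    ends <- cumsum(rle(sign(y - lev))$lengths)
    sapply(seq_len(length(ends) - 1),
           function(i) uniroot(f, x[ends[c(i, i + 1)]])$root)
}
find_crossings(mydat$Time, mydat$Val1)  # crossing times for Val1
find_crossings(mydat$Time, mydat$Val2)  # crossing times for Val2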
I'm trying to set a lower bound of zero on my InfluxDB query result so that negative values are replaced with zero. e.g. for the query:
SELECT x from measurement
If my raw response is

time                 x
----                 -
1632972969471900180  0
1632972969471988621  -130
1632972969472238055  803

then I want to alter the query so that the result is:

time                 x'
----                 --
1632972969471900180  0
1632972969471988621  0
1632972969472238055  803
My solution was to use the ABS absolute value function, adding the absolute value to the original value and dividing by 2. Since (x + |x|) / 2 = max(x, 0), this maps negative values to zero and leaves positive (and zero) values unchanged. e.g.

SELECT (x + ABS(x)) / 2 from measurement

time                 x     x'
----                 -     --
1632972969471900180  0     (0 + 0) / 2 = 0
1632972969471988621  -130  (-130 + 130) / 2 = 0
1632972969472238055  803   (803 + 803) / 2 = 803
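As a quick sanity check of the identity outside InfluxDB, here is a minimal Python sketch (clamp is a hypothetical helper name):

# verify that (x + |x|) / 2 equals max(x, 0) on the sample values
def clamp(x):
    return (x + abs(x)) / 2

for x in [0, -130, 803]:
    assert clamp(x) == max(x, 0)
    print(x, '->', clamp(x))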
Can anyone explain how to calculate the accuracy, sensitivity and specificity of a multi-class dataset?
Sensitivity of each class can be calculated from its
TP / (TP + FN)
and specificity of each class can be calculated from its
TN / (TN + FP)
For more information about the concepts and equations, see
http://en.wikipedia.org/wiki/Sensitivity_and_specificity
For multi-class classification, you may use the one-against-all approach.
Suppose there are three classes: C1, C2, and C3
"TP of C1" is all C1 instances that are classified as C1.
"TN of C1" is all non-C1 instances that are not classified as C1.
"FP of C1" is all non-C1 instances that are classified as C1.
"FN of C1" is all C1 instances that are not classified as C1.
To find these four terms of C2 or C3 you can replace C1 with C2 or C3.
In simple terms:
In a 2x2 confusion matrix, once you have picked one category as positive, the other is automatically negative. With 9 categories, you basically have 9 different sensitivities, depending on which of the nine categories you pick as "positive". You can calculate these by collapsing to a 2x2 matrix, i.e. Class1 versus not-Class1, then Class2 versus not-Class2, and so on, as sketched in the code below.
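A minimal sketch of that collapsing step with numpy, assuming a square confusion matrix cm with true classes in rows and predicted classes in columns (the counts below are illustrative):

import numpy as np

# cm[i, j] = number of instances of true class i predicted as class j
cm = np.array([[50, 15, 3],
               [16, 47, 6],
               [ 5,  5, 6]])

TP = np.diag(cm)                 # correctly classified, per class
FN = cm.sum(axis=1) - TP         # class i instances predicted as something else
FP = cm.sum(axis=0) - TP         # other instances predicted as class i
TN = cm.sum() - (TP + FN + FP)   # everything else

sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)
print(sensitivity)
print(specificity)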
Example:
We get a confusion matrix for the 7 types of glass:

=== Confusion Matrix ===

  a  b  c  d  e  f  g   <-- classified as
 50 15  3  0  0  1  1 | a = build wind float
 16 47  6  0  2  3  2 | b = build wind non-float
  5  5  6  0  0  1  0 | c = vehic wind float
  0  0  0  0  0  0  0 | d = vehic wind non-float
  0  2  0  0 10  0  1 | e = containers
  1  1  0  0  0  7  0 | f = tableware
  3  2  0  0  0  1 23 | g = headlamps
The true positive rate (sensitivity) is calculated for each type of glass, plus an overall weighted average:
=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
0.714    0.174    0.667      0.714   0.690      0.532  0.806     0.667     build wind float
0.618    0.181    0.653      0.618   0.635      0.443  0.768     0.606     build wind non-float
0.353    0.046    0.400      0.353   0.375      0.325  0.766     0.251     vehic wind float
0.000    0.000    0.000      0.000   0.000      0.000  ?         ?         vehic wind non-float
0.769    0.010    0.833      0.769   0.800      0.788  0.872     0.575     containers
0.778    0.029    0.538      0.778   0.636      0.629  0.930     0.527     tableware
0.793    0.022    0.852      0.793   0.821      0.795  0.869     0.738     headlamps
0.668    0.130    0.670      0.668   0.668      0.539  0.807     0.611     Weighted Avg.
You can also print a classification report (see the link below) to get the overall accuracy of your model.
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report
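A minimal usage sketch (y_true and y_pred are hypothetical label lists):

from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 1, 0]  # hypothetical true labels
y_pred = [0, 2, 2, 2, 1, 0]  # hypothetical predictions
print(classification_report(y_true, y_pred))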
To compute sensitivity and specificity for multi-class classification:

import numpy as np
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support

# y_true and y_prediction are your arrays of true and predicted labels
res = []
for l in [0, 1, 2, 3]:
    # one-vs-rest: treat class l as positive, everything else as negative
    prec, recall, _, _ = precision_recall_fscore_support(np.array(y_true) == l,
                                                         np.array(y_prediction) == l,
                                                         pos_label=True, average=None)
    # recall[1] is recall of the positive class (sensitivity);
    # recall[0] is recall of the negative class (specificity)
    res.append([l, recall[1], recall[0]])

pd.DataFrame(res, columns=['class', 'sensitivity', 'specificity'])
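For a quick self-contained test, you could first define hypothetical labels, e.g.:

# hypothetical labels for four classes 0-3
y_true       = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
y_prediction = [0, 1, 2, 2, 0, 1, 1, 3, 0, 1]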
I created a simple PDB file with a non-standard residue of the repeat unit of polyethylene glycol (CH2-O-CH2) as follows:
REMARK Materials Studio PDB file
REMARK Created: Mon Dec 04 09:52:49 2017
ATOM      1  CT1 EGR H   1     -14.882   2.339   0.134  1.00  0.00           C
ATOM      2 HC11 EGR H   1     -14.677   2.559   1.234  1.00  0.00           H
ATOM      3 HC12 EGR H   1     -14.774   3.298  -0.472  1.00  0.00           H
ATOM      4  OS1 EGR H   1     -13.892   1.317  -0.371  1.00  0.00           O
ATOM      5  CT2 EGR H   1     -12.493   1.852  -0.184  1.00  0.00           C
ATOM      6 HC21 EGR H   1     -12.292   2.009   0.928  1.00  0.00           H
ATOM      7 HC22 EGR H   1     -12.392   2.846  -0.732  1.00  0.00           H
TER       8
CONECT    1    2    3    4
CONECT    2    1
CONECT    3    1
CONECT    4    1    5
CONECT    5    4    7    8    6
CONECT    6    5
CONECT    7    5
END
I'm able to read this PDB file successfully with Bio.PDB using the following code:

from Bio.PDB import PDBParser

parser = PDBParser()
structure = parser.get_structure('EGR', pdb_file)

How can I use this structure object to create a PDB file of a polymer chain of 'n' residues?
Let's say you want to replicate your residue 10 times along the x-axis with a gap of 5 angstroms between copies. You could try something like:

import numpy as np
from Bio.PDB import PDBParser, PDBIO
from Bio.PDB.Residue import Residue
from Bio.PDB.Atom import Atom

parser = PDBParser()
io = PDBIO()
structure = parser.get_structure('EGR', pdb_file)
chain = list(structure.get_chains())[0]
atoms = list(structure.get_atoms())
serial_number = len(atoms)
gap = 5.0

for resnum in range(10):
    resnum += 2  # position along the sequence (residue 1 already exists)
    res_id = (' ', resnum, ' ')  # (hetero flag, sequence number, insertion code)
    res_name = 'EGR'             # keep the 3-character PDB residue name
    res_segid = ' '
    new_res = Residue(res_id, res_name, res_segid)
    chain.add(new_res)
    # copy every atom of the original residue, shifted along x
    for atom in atoms:
        serial_number += 1
        atom_name = atom.name
        atom_coord = atom.coord + [gap * (resnum - 1), 0, 0]  # 5 A per residue
        atom_bfactor = atom.bfactor
        atom_occ = atom.occupancy
        atom_altloc = atom.altloc
        atom_fullname = atom.fullname
        atom_serial = serial_number
        atom_element = atom.element
        new_atom = Atom(atom_name, atom_coord, atom_bfactor, atom_occ, atom_altloc,
                        atom_fullname, atom_serial, element=atom_element)
        new_res.add(new_atom)
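To write the extended chain out (the io object created above is otherwise unused; 'polymer.pdb' is a hypothetical output name):

# save the modified structure to a new PDB file
io.set_structure(structure)
io.save('polymer.pdb')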
I have a MATLAB curve from which I would like to extract concentration values at 17 different time samples.
The time points, in minutes, are:
t = 0, 0.25, 0.50, 1, 1.5, 2, 3, 4, 9, 14, 19, 24, 29, 34, 39, 44, 49
Following is the function I wrote to plot the curve:
function c_t = output_function_constrainedK2(t, a1, a2, a3, b1, b2, b3, td, tmax, k1, k2, k3)
    K_1 = (k1*k2)/(k2+k3);
    K_2 = (k1*k3)/(k2+k3);
    DV_free = k1/(k2+k3);

    c_t = zeros(size(t));

    ind = (t > td) & (t < tmax);
    c_t(ind) = conv(((t(ind) - td) ./ (tmax - td) * (a1 + a2 + a3)), ...
                    (K_1*exp(-(k2+k3)*t(ind)+K_2)), 'same');

    ind = (t >= tmax);
    c_t(ind) = conv((a1 * exp(-b1 * (t(ind) - tmax)) + ...
                     a2 * exp(-b2 * (t(ind) - tmax)) + ...
                     a3 * exp(-b3 * (t(ind) - tmax))), ...
                    (K_1*exp(-(k2+k3)*t(ind)+K_2)), 'same');

    plot(t, c_t);
    axis([0 50 0 1400]);
    xlabel('Time [mins]');
    ylabel('Concentration [MBq]');
    title('Model: Constrained K2');
end
If possible, kindly suggest how I could alter the above function so that I can obtain the concentration values at the 17 time points stated above.
Following are the input values that I used to produce the curve:
output_function_constrainedK2(0:0.1:50, 2501, 18500, 65000, 0.5, 0.7, 0.3, 3, 8, 0.014, 0.051, 0.07)
This will give you the concentration values at the time points you wanted (interp1 performs linear interpolation by default). You will have to put this inside the output_function_constrainedK2 function so that you can access the variables t and c_t.

T = [0 0.25 0.50 1 1.5 2 3 4 9 14 19 24 29 34 39 44 49];
concentration = interp1(t, c_t, T)
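Alternatively, since c_t is the function's return value, a minimal sketch that samples outside the function (reusing the inputs from the question) could be:

% sample the returned curve at the 17 requested time points
t = 0:0.1:50;
c_t = output_function_constrainedK2(t, 2501, 18500, 65000, 0.5, 0.7, 0.3, 3, 8, 0.014, 0.051, 0.07);
T = [0 0.25 0.50 1 1.5 2 3 4 9 14 19 24 29 34 39 44 49];
concentration = interp1(t, c_t, T);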
I am so confused. I have tested a program myself with the following MATLAB code:
feature_train=[1 1 2 1.2 1 1 700 709 708 699 678];
No_of_Clusters = 2;
No_of_Iterations = 10;
[m,v,w]=gaussmix(feature_train,[],No_of_Iterations,No_of_Clusters);
feature_ubm=[1000 1001 1002 1002 1000 1060 70 79 78 99 78 23 32 33 23 22 30];
No_of_Clusters = 3;
No_of_Iterations = 10;
[mubm,vubm,wubm]=gaussmix(feature_ubm,[],No_of_Iterations,No_of_Clusters);
feature_test=[2 2 2.2 3 1 600 650 750 800 658];
[lp_train,rp,kh,kp]=gaussmixp(feature_test,m,v,w);
[lp_ubm,rp,kh,kp]=gaussmixp(feature_test,mubm,vubm,wubm);
However, the result puzzles me, because feature_test should be classified as feature_train, not feature_ubm. As you see below, the probability of feature_ubm is more than feature_train!?!
Can anyone explain what the problem is?
Is the problem related to the gaussmixp and gaussmix MATLAB functions?
sum(lp_ubm)
ans =
-3.4108e+06
sum(lp_train)
ans =
-1.8658e+05
"As you see below the probability of feature_ubm is more than feature_train!?!"

You see exactly the opposite. Although the absolute value for ubm is bigger, these are log-probabilities, so you are comparing negative numbers, and

sum(lp_train) > sum(lp_ubm)

hence

P(test|train) > P(test|ubm)

So your test chunk is correctly classified as train, not as ubm.
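A quick numeric check using the totals printed above:

% log-likelihood totals copied from the output above
sum_lp_train = -1.8658e+05;
sum_lp_ubm   = -3.4108e+06;
sum_lp_train > sum_lp_ubm   % returns 1 (true): the train model fits the test data better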