I work with a lot of histograms. In particular, these histograms are of basecalls along segments on the human genome.
Each point along the x-axis is one of the four nitrogenous bases (A, C, T, G) that compose DNA, and the y-axis represents how many times a base was able to be "called" (i.e., recognized by the sequencing machine; sequencing the genome is simply determining the identity of each base along it).
Many of these histograms display roughly linear dropoffs (when the machines aren't able to get sufficient read depth) that fall from plateau-like regions to 0 (or almost 0). When the count drops to zero, it means the sequencer isn't able to determine the identity of the base; if you've seen the double helix before, it means the sequencer can't figure out the identity of one half of a rung of the helix. Certain regions of the genome are more difficult to characterize than others. Bases (x data points) with high numbers of basecalls, on the order of >=100, can be definitively identified. For example, if there were a total of 250 calls for one base, with 248 T's, 1 G, and 1 A called, we would call that base a T. Regions with 0 basecalls are of concern because we then have to infer the identity of the low-read region from its neighbors. Is there a straightforward algorithm for assigning these plots a score that reflects this tendency? See box.net/shared/nbygq2x03u for an example histogram.
You could just use the count of base positions where the read depth was 0. The slope of the dropoff could also be a useful indicator (a steep negative slope = a drop from a plateau).
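For instance, a minimal sketch of both measures in Python (the function name and the quantile used to estimate the plateau height are illustrative choices, not part of any standard method):

```python
import numpy as np

def dropout_score(depths, plateau_quantile=0.9):
    """Score a basecall histogram by how much it collapses toward zero depth.
    depths: 1-D array of call counts, one entry per base position."""
    depths = np.asarray(depths, dtype=float)
    zero_fraction = np.mean(depths == 0)             # share of uncalled bases
    plateau = np.quantile(depths, plateau_quantile)  # rough plateau height
    # steepest single-step drop, normalized by the plateau height
    steepest_drop = np.max(-np.diff(depths)) / plateau if plateau > 0 else 0.0
    return zero_fraction, steepest_drop

# Example: a plateau around 250 calls that falls off to zero
print(dropout_score([250, 248, 251, 180, 90, 10, 0, 0, 0, 0]))
```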
I have an electromagnetic sensor and electromagnetic field emitter.
The sensor will read power from the emitter. I want to predict the position of the sensor using the reading.
Let me simplify the problem: suppose the sensor and the emitter are in a one-dimensional world where there is only a position X (not X, Y, Z), and the emitter emits power as a function of distance squared.
From the painted image below, you will see that the emitter is drawn as a circle and the sensor is drawn as a cross.
E.g. if the sensor is 5 meters away from the emitter, the reading you get on the sensor will be 5^2 = 25. So the correct position will be either 0 or 10, because the emitter is at position 5.
So, with one emitter, I cannot know the exact position of the sensor. I only know that there is a 50% chance it's at 0 and a 50% chance it's at 10.
So if I have two emitters like the following image:
I will get two readings. And I can know exactly where the sensor is. If the reading is 25 and 16, I know the sensor is at 10.
So from this fact, I want to use 2 emitters to locate the sensor.
Now that I've explained the situation, my problems are these:
1. The emitter has a more complicated function of the distance; it's not just distance squared. It also has noise, so I'm trying to model it using machine learning.
2. In some areas the emitter doesn't work so well. E.g. if you are between 3 and 4 meters away, the emitter will always give you a fixed reading of 9 instead of going from 9 to 16.
3. When I train the machine learning model with 2 inputs, the prediction is very accurate. E.g. if the input is 25,36 then the output will be position 0. But it means that after training, I cannot move the emitters at all. If I move one of the emitters further apart, the prediction breaks immediately, because the reading will be something like 25,49 when the right emitter moves 1 meter to the right, and the prediction can be anything because the model has not seen this input pair before. And I cannot afford to train the model on all possible distances between the 2 emitters.
4. The emitters may not be exactly identical. The difference will be in scale, e.g. one of the emitters may give a 10% bigger reading. But you can ignore this problem for now.
My question is: how do I make the model work when the emitters are allowed to move? Give me some ideas.
Some of my ideas:
1. I think I have to figure out the positions of both emitters relative to each other dynamically. But after knowing the positions of both emitters, how do I tell that to the model?
2. I have tried training each emitter separately instead of pairing them as input. But that means there are many positions that cause conflicts: when you get reading=25, the model will predict the average of 0 and 10 because both are valid positions for reading=25. You might suggest training to predict distance instead of position; that would be possible if there were no problem number 2. But because of problem number 2, the prediction between 3 and 4 meters away will be wrong: the model will get 9 as input, and the output will be the average distance of 3.5 meters, or somewhere between 3 and 4 meters.
3. Use the model to predict a position probability density function instead of predicting the position. E.g. when the reading is 9, the model should predict a uniform density function from 3 to 4 meters. Then you can combine the 2 density functions from the 2 readings somehow (a sketch of this follows the list of ideas below). But I think it's not going to be as accurate as modeling the 2 emitters together, because the density function can be quite complicated; we cannot assume a normal distribution or even a uniform distribution.
4. Use some kind of optimizer to predict the position separately for each emitter, based on the constraint that both predictions must be the same. If the predictions are not the same, the optimizer must move the predictions so that they end up at exactly the same point. Maybe reinforcement learning, where the actions are "move left", "move right", etc.
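For idea 3, here is a minimal sketch of combining two per-emitter densities on a discretized position axis. The grid, the emitter positions, and the Gaussian-noise likelihood are placeholders; the real densities would come from the trained per-emitter model and could be as irregular as needed:

```python
import numpy as np

# Discretize the 1-D world into candidate sensor positions.
positions = np.linspace(0.0, 10.0, 1001)

def likelihood_from_reading(reading, emitter_pos, noise=1.0):
    """Placeholder per-emitter density p(reading | position).
    Here: ideal distance-squared response plus Gaussian noise; in practice
    this would be whatever density the trained model predicts."""
    expected = (positions - emitter_pos) ** 2
    return np.exp(-0.5 * ((reading - expected) / noise) ** 2)

# Two emitters at (placeholder) calibrated positions 0 and 1, with two readings.
p1 = likelihood_from_reading(25.0, emitter_pos=0.0)
p2 = likelihood_from_reading(16.0, emitter_pos=1.0)

posterior = p1 * p2            # combine: both readings must be explained
posterior /= posterior.sum()   # normalize
print("most likely position:", positions[np.argmax(posterior)])
```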
I've told you my ideas in the hope that they evoke some ideas in you, because this is already my best attempt and it doesn't solve the issue elegantly yet.
So ideally, I would want an end-to-end model that is fed the 2 readings and gives me the position even when the emitters are moved. How would I go about that?
PS. The emitters are only allowed to move before usage. During usage or prediction, the model can assume that the emitters will not be moved anymore. This gives you time to run an emitter-position calibration algorithm before usage. Maybe this will be helpful for you to know.
You're confusing memoizing a function with training a model; the former is merely recalling previous results, while the latter is the province of AI. To train with two emitters, you need to give the model useful input data and appropriate labels (right answers), and design your model topology such that it can be trained to a useful functional response for cases it has never seen.
Let the first emitter be at position 0 by definition. Your data then consists of the position of the second emitter and the two readings. The label is the sensor's position. Your given examples would look like this:
emit2   read1   read2   sensor
1       25      36      0
1       25      16      5
2       25      49      0
1.5     25      9       5       (a distance of 3 < d < 4 always reads as 3^2)
Since you know that you have a squared relationship in the underlying physics, you need to include quadratic capability in your model. To handle noise, you'll want some dampening capability, such as an extra node or two in a hidden layer after the first. For more complex relationships, you'll need other topologies, non-linear activation functions, etc.
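As a rough illustration of that topology (a sketch only, using the table above as training data; the scikit-learn pipeline, layer size, and iteration count are arbitrary choices, not a prescribed architecture):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.neural_network import MLPRegressor

# Columns: emit2 position, reading 1, reading 2 -> label: sensor position
X = np.array([[1.0, 25.0, 36.0],
              [1.0, 25.0, 16.0],
              [2.0, 25.0, 49.0],
              [1.5, 25.0,  9.0]])
y = np.array([0.0, 5.0, 0.0, 5.0])

# PolynomialFeatures supplies the quadratic capability; the small hidden
# layer acts as the "dampener" for noise.
model = make_pipeline(PolynomialFeatures(degree=2),
                      MLPRegressor(hidden_layer_sizes=(8,),
                                   max_iter=20000, random_state=0))
model.fit(X, y)
print(model.predict([[1.0, 25.0, 36.0]]))   # should be close to 0
```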
Can you take it from there?
Is there a pixel-based region growing algorithm that can be employed for the extraction of features (segmentation) from an image, by adding pixels to the seed based on the minimization of a certain metric? Potentially, a pixel could be removed if the metric is not optimized when this pixel is added (i.e. the possibility to backtrack and go back to the seed obtained in previous iterations).
I'll try to explain further my objectives:
This algorithm starts from a central pixel selected as an initial seed on the image.
Afterwards, each of the 4 neighbors is explored (right, left, bottom and top neighbors) separately, to see if the metric is optimized by growing the seed in the selected direction.
A neighboring pixel might not optimize the metric immediately, even if the seed created by adding this pixel will be optimal in future iterations.
There is a possibility that a neighboring pixel is added to the seed but is removed later, if the obtained seed is not optimal.
Can anyone suggest an Artificial Intelligence technique (or a greedy approach) that is adequate for solving this kind of problem? Also, what would be a good criterion for judging that the addition of a pixel will optimize the metric, even though this will probably only happen in future iterations?
P.S: I started implementing what's explained above in Python but got stuck on the issue of determining whether a path (neighboring pixel) is worth exploring or not. Right now, I add a neighboring pixel only if the seed produced improves (i.e. minimizes) the error with respect to the metric. However, even if adding the right or left neighbor doesn't optimize the metric, one of these two paths might lead to the optimal solution in the future (as explained in the third objective).
You've basically outlined the most successful algorithm you could get with this approach. Its success will depend heavily on the metric you use to add/remove pixels, but there are a few things you can do to emulate the behavior you want.
Definitions
We'll call the metric we're optimizing M, where M(R) is the metric's value for a region R, and a region R is some collection of pixels. I will assume that optimizing the metric means maximizing M, but this approach works if the goal is to minimize M as well.
Methodology
This approach is going to be slightly backwards to your original outline, but it should satisfy both requirements of adding pixels that lie in non-optimal paths from the seed and removing pixels that do not contribute significantly to the optimization.
We will begin at a seed s, but instead of evaluating paths as we go we will add all pixels in the image (or up to a maximum feature size) iteratively to our region. At each step we will determine a value for the pixel based on how much it improves the metric of the current region, M(p). This is not the same as the value of the region containing the pixel (M(R) where p is in R); rather, it is the difference between the value of the region containing the pixel and the value of the region before the pixel was added: M(p) = M(R) - M(R'), where R = R' + p. If you have the capacity to evaluate a single pixel, you could simply use that instead.
The next change is to include a regularization term in M(R) that penalizes the score based on the number of pixels included: N(R) = M(R) - a * |R|, where a is some arbitrary positive constant and |R| is the cardinality (number of pixels) of our region. Note: if the goal is to minimize M, then a should be negative. This has the effect of penalizing the region's score if it includes too many pixels.
Finally, after all pixels have been added to the region and N(p) has been evaluated for each pixel, we iterate over the region again. This time we begin at the last pixel added and iterate backwards over our set of pixels, ending at the seed s. At each iteration we determine the score of the region, N(R). If N(R) has decreased since the last iteration, we remove the pixel p with the lowest score N(p). This should have the effect of keeping the smallest number of pixels in the region that contribute the most to the score.
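A simplified sketch of that grow-then-prune idea, assuming you can evaluate M on any set of pixels (the metric here is a trivial stand-in based on similarity to the seed intensity, and `a` is the regularization constant defined above):

```python
import numpy as np

def grow_then_prune(image, seed, a=0.1, max_size=200):
    """Greedy growth followed by backward pruning with a size penalty.
    M(R): stand-in metric (negative mean deviation from the seed intensity).
    N(R) = M(R) - a * |R| penalizes oversized regions."""
    h, w = image.shape

    def M(region):
        vals = np.array([image[p] for p in region], dtype=float)
        return -float(np.mean(np.abs(vals - image[seed])))

    def N(region):
        return M(region) - a * len(region)

    def neighbors(p):
        y, x = p
        cand = [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
        return [(yy, xx) for yy, xx in cand if 0 <= yy < h and 0 <= xx < w]

    # Growth phase: always add the best frontier pixel, even if it does not
    # improve the metric right now (so non-optimal paths stay reachable).
    region, order = [seed], [seed]
    frontier = set(neighbors(seed))
    while frontier and len(region) < max_size:
        best = max(frontier, key=lambda p: M(region + [p]))
        region.append(best)
        order.append(best)
        frontier.discard(best)
        frontier.update(p for p in neighbors(best) if p not in region)

    # Pruning phase: walk backwards over the added pixels and drop any pixel
    # whose removal improves the regularized score N.
    for p in reversed(order[1:]):
        without_p = [q for q in region if q != p]
        if N(without_p) > N(region):
            region = without_p
    return region
```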
Additional Considerations
If the remaining pixels lie on non-contiguous paths after pruning, you could run a secondary algorithm to add in adjoining pixels. You'll need to do testing to determine an optimal value of a such that enough pixels are kept to reconstruct the building, but it doesn't include every pixel from the image.
My Opinion (that you didn't ask for)
In general I think you would have more luck with more robust algorithms such as Convolutional Neural Networks for feature classification. They'll likely be faster and definitely more accurate than the algorithm described above.
Let's say I have two frequencies sounding at the same time.
There's a harmonic series above each, right?
I need to know the frequency of the coincidental partial in order to apply a band pass filter around that frequency in real time.
Can this be done?
If you have two frequencies, then you have two sinusoids in superposition.
You have two options to extract one of the frequencies:
Do an FFT; in the output you will get the two frequencies represented as two bins, then select the bin that represents the frequency you are interested in (this is time-consuming if you are doing the FFT especially for this purpose).
You can also implement an IIR (or FIR; IIR is usually more efficient) band-pass filter around the relevant frequency (if you have only two frequencies, a low-pass or high-pass filter will be sufficient).
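For example, a minimal SciPy sketch of an IIR band-pass around a chosen partial (the sample rate, centre frequency, bandwidth, and the two synthetic tones are placeholders for illustration):

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 44100.0          # sample rate (placeholder)
f_center = 880.0      # frequency of the shared (coincidental) partial
bandwidth = 40.0      # pass-band width in Hz

# 4th-order Butterworth band-pass around the partial (normalized by Nyquist)
b, a = butter(4, [(f_center - bandwidth / 2) / (fs / 2),
                  (f_center + bandwidth / 2) / (fs / 2)], btype='bandpass')

# Example: two tones whose harmonic series share a partial near 880 Hz
t = np.arange(0, 1.0, 1 / fs)
tone_a = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 5))  # 220..880
tone_b = sum(np.sin(2 * np.pi * 176 * k * t) / k for k in range(1, 6))  # 176..880
filtered = lfilter(b, a, tone_a + tone_b)  # streamable, so usable in near real time
```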
I'm designing a feed forward neural network learning how to play the game checkers.
For the input, the board has to be given, and the output should give the probability of winning versus losing. But what is the ideal transformation of the checkers board into a row of numbers for input? There are 32 playable squares and 5 different possibilities for each square (white piece, white king, black piece, black king, or free). If I provide an input unit for each possible value of each square, that makes 32 * 5. Another option is:
Free Position: 0 0
Piece of white: 0 0.5 && King Piece of white: 0 1
Piece of black: 0.5 1 && King Piece of black: 1 0
In this case, the input length will be just 64, but I'm not sure which one will give a better result. Could anyone give any insight on this?
In case anyone is still interested in this topic, I suggest encoding the checkers board with a 32-dimensional vector. I recently trained a CNN on an expert checkers database and was able to achieve a surprisingly high level of play with no search, somewhat similar (I suspect) to the supervised learning step that DeepMind used to pretrain AlphaGo. I represented my input as an 8x4 grid, with entries in the set [-3, -1, 0, 1, 3] corresponding to an opposing king, opposing checker, empty square, own checker, and own king, respectively. Thus, rather than encoding the board with a 160-dimensional vector where each dimension corresponds to a location-piece combination, the input space can be reduced to a 32-dimensional vector where each board location is represented by a unique dimension and the piece at that location is encoded by a set of real numbers; this is done without any loss of information.
The more interesting question, at least in my mind, is which output encoding is most conducive to learning. One option is to encode it in the same way as the input. I would advise against this, having found that simplifying the output encoding to a location (of the piece to move) and a direction (along which to move said piece) is much more advantageous for learning. While the reasons for this are likely more subtle, I suspect it is due to the enormous state space of checkers (something like 5 x 10^20 board positions). Considering that the goal of our predictive model is to accept an input from an enormous number of possible states and produce one output (i.e., move) from (at most) 48 possibilities (12 pieces times 4 possible directions, excluding jumps), a top priority in architecting a neural network should be matching the complexity of its input and output space to that of the actual game. With this in mind, I chose to encode the output as a 32 x 4 matrix, with each row representing a board location and each column representing a direction. During training I simply unraveled this into a 128-dimensional, one-hot encoded vector (using the argmax of softmax activations). Note that this output encoding lends itself to many invalid moves for a given board (e.g., moves off the board from edges and corners, moves to occupied locations, etc.); we hope that the neural network can learn valid play given a large enough training set. I found that the CNN did a remarkable job of learning valid moves.
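For what it's worth, a small sketch of that input/output encoding (the helper names are illustrative; the piece values follow the [-3, -1, 0, 1, 3] scheme above):

```python
import numpy as np

# Piece values: -3 opposing king, -1 opposing checker, 0 empty,
#                1 own checker,    3 own king
def encode_board(board_8x4):
    """Input encoding: one real number per playable square (32-dim / 8x4 grid)."""
    return np.asarray(board_8x4, dtype=np.float32).reshape(8, 4)

def encode_move(square_index, direction_index):
    """Output encoding: 32 locations x 4 move directions, unraveled into a
    128-dim one-hot vector (the argmax of the softmax output picks the move)."""
    out = np.zeros((32, 4), dtype=np.float32)
    out[square_index, direction_index] = 1.0
    return out.ravel()

def decode_move(logits_128):
    """Invert the output encoding: flat index -> (square, direction)."""
    idx = int(np.argmax(logits_128))
    return divmod(idx, 4)
```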
I’ve written more about this project at http://www.chrislarson.io/checkers-p1.
I've done this sort of thing with Tic-Tac-Toe. There are several ways to represent the board. One of the most common for TTT is to have input and output layers that represent the entire board. In TTT this becomes 9 x hidden x 9. The input is -1 for X, 0 for none, 1 for O. The input to the neural network is then the current state of the board, and the output is the desired move: whichever output neuron has the highest activation is the move.
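A toy sketch of that representation (the names are illustrative; occupied squares are masked out before the argmax so the chosen move is at least legal):

```python
import numpy as np

def pick_move(board, net):
    """board: length-9 vector with -1 for X, 0 for empty, 1 for O.
    net: any callable mapping a 9-vector to 9 output activations."""
    scores = np.asarray(net(board), dtype=float)
    scores[np.asarray(board) != 0] = -np.inf   # never play on an occupied square
    return int(np.argmax(scores))
```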
Propagation training will not work too well here because you will not have a finite training set. Something like Simulated Annealing, PSO, or anything with a score function would be ideal. Pitting the networks against each other for the scoring function would be great.
This worked somewhat well for TTT. I am not sure how it would work for Checkers. Chess would likely destroy it. For Go it would likely be useless.
The problem is that the neural network will learn patterns only at fixed locations. For example, jumping an opponent in the top-left corner would be a totally different situation from jumping someone in the bottom-left corner. These would have to be learned separately.
Perhaps it is better to represent the exact state of the board in a position-independent way. This would require some thought. For instance, you might communicate what "jump" opportunities exist, what move-toward-king-square opportunities exist, etc., and allow the net to learn to prioritize these.
I've tried several possibilities, and intuitively I can say that the best idea is to separate all the possibilities for each square. Concretely:
0 0 0: free
1 0 0: white piece
0 0 1: black piece
1 1 0: white king
0 1 1: black king
It is also possible to add other parameters describing the situation of the game, like the number of pieces under threat or the number of possible jumps.
Please see this thesis. In Blondie24, page 46, there is a description of the input for the neural network.
My lecturer has slides on edge histograms for image retrieval, in which he states that one must first divide the image into 4x4 blocks and then check for edges at the horizontal, vertical, +45°, and -45° orientations. He then states that this is represented in a 14x1 histogram. I have no idea how he arrived at a 14x1 histogram. Does anyone know how he came up with this value, or how to create an edge histogram?
Thanks.
The thing you are referring to is called the Histogram of Oriented Gradients (HoG). However, the math doesn't work out for your example. Normally you choose spatial binning parameters (the 4x4 blocks). For each block, you compute the gradient magnitude at some number of different directions (in your case, the 4 orientations listed). So in each block you'll have N_directions measurements. Multiply this by the number of blocks (16 for you), and you see that you have 16 * N_directions total measurements.
To form the histogram, you simply concatenate these measurements into one long vector. Any way to do the concatenation is fine as long as you keep track of the way you map the bin/direction combo into a slot in the 1-D histogram. This long histogram of concatenations is then most often used for machine learning tasks, like training a classifier to recognize some aspect of images based upon the way their gradients are oriented.
But in your case, the professor must be doing something special, because if you have 16 different image blocks (a 4x4 grid of blocks), then you'd need fewer than one measurement per block to end up with a total of 14 measurements in the overall histogram.
Alternatively, the professor might mean that you take the range of angles between [-45, +45] and divide it into 14 different values: -45, -45 + 90/14, -45 + 2*90/14, ... and so on.
If that is what the professor means, then you get 14 orientation bins within a single block. Once everything is concatenated, you'd have one very long vector of 14 * 16 = 224 components describing the whole image overall.
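As a rough sketch of that second interpretation in plain NumPy (the 4x4 grid of blocks and the 14 orientation bins follow the numbers above; everything else, including magnitude weighting, is an illustrative choice):

```python
import numpy as np

def block_orientation_histogram(image, grid=(4, 4), n_bins=14,
                                angle_range=(-45.0, 45.0)):
    """Concatenate per-block histograms of gradient orientation.
    Returns a grid[0] * grid[1] * n_bins vector (4 * 4 * 14 = 224 here)."""
    gy, gx = np.gradient(image.astype(float))
    angles = np.degrees(np.arctan2(gy, gx))
    magnitude = np.hypot(gx, gy)

    h, w = image.shape
    bh, bw = h // grid[0], w // grid[1]
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            sl = (slice(i * bh, (i + 1) * bh), slice(j * bw, (j + 1) * bw))
            hist, _ = np.histogram(angles[sl], bins=n_bins,
                                   range=angle_range, weights=magnitude[sl])
            feats.append(hist)
    return np.concatenate(feats)
```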
Incidentally, I have done a lot of testing with Python implementations of the Histogram of Oriented Gradients, so you can see some of that work linked here or here. There is also some example code at that site, though a more well-supported version of HoG appears in scikits.image.