What is the purpose of Tiling layer in Caffe - machine-learning

What is the purpose of the Tiling layer in Caffe? It seems it is a form of reshaping the input, however I'm wondering how exactly it works and where it could be applied?
This is the source code:
template <typename Dtype>
void TilingLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  TilingParameter tiling_param = this->layer_param_.tiling_param();
  tile_dim_ = tiling_param.tile_dim();
  tile_dim_sq_ = tile_dim_ * tile_dim_;
  CHECK(tile_dim_) << "tile_dim must be specified.";
  CHECK_GT(tile_dim_, 0) << "tile_dim must be positive.";
}
template <typename Dtype>
void TilingLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  CHECK_EQ(top.size(), 1);
  input_channels_ = bottom[0]->channels();
  input_height_ = bottom[0]->height();
  input_width_ = bottom[0]->width();
  output_channels_ = bottom[0]->channels() / tile_dim_sq_;
  output_width_ = input_width_ * tile_dim_;
  output_height_ = input_height_ * tile_dim_;
  count_per_output_map_ = output_width_ * output_height_;
  count_per_input_map_ = input_width_ * input_height_;
  CHECK_EQ(0, input_channels_ % tile_dim_sq_)
      << "The number of input channels for tiling layer must be multiples "
      << "of the tile_dim.";
  top[0]->Reshape(bottom[0]->num(), input_channels_ / tile_dim_sq_,
      input_height_ * tile_dim_, input_width_ * tile_dim_);
}

The Tiling layer is different from the Tile layer: the Tiling layer is like a reshape, whereas the Tile layer is like repmat.
=============== edit to add more details ==========
For the Tile layer, as shown in the source code (https://github.com/BVLC/caffe/blob/master/src/caffe/layers/tile_layer.cpp):
const Dtype* bottom_data = bottom[0]->cpu_data();
Dtype* top_data = top[0]->mutable_cpu_data();
for (int i = 0; i < outer_dim_; ++i) {
  for (int t = 0; t < tiles_; ++t) {
    caffe_copy(inner_dim_, bottom_data, top_data);
    top_data += inner_dim_;
  }
  bottom_data += inner_dim_;
}
the top data is simply the input data repeated tiles_ times along the tiled axis, so the element count grows by a factor of tiles_.
For the Tiling layer, on the other hand, nothing is duplicated: it only rearranges the data. For example, if you have an N x C x H x W blob and tile_dim = 8, then after the Tiling layer the count doesn't change, but you get a blob with shape N x (C/64) x (H*8) x (W*8).
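To make the shape bookkeeping concrete, here is a minimal sketch on plain Java arrays. The question does not show the fork's TilingLayer::Forward, so the depth-to-space style index mapping below is an assumption, not the layer's definitive behaviour; only the shapes and the unchanged element count are the point.
public class TilingSketch {
    // Rearranges a (C, H, W) volume into (C / (d*d), H*d, W*d).
    // No value is duplicated, so the element count is unchanged.
    // The exact interleaving is an assumed depth-to-space layout.
    static float[][][] tiling(float[][][] in, int d) {
        int c = in.length, h = in[0].length, w = in[0][0].length;
        float[][][] out = new float[c / (d * d)][h * d][w * d];
        for (int ch = 0; ch < c; ch++) {
            int outCh = ch / (d * d);   // output channel this input slice feeds
            int off = ch % (d * d);     // slot inside the d x d tile
            int dy = off / d, dx = off % d;
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++)
                    out[outCh][y * d + dy][x * d + dx] = in[ch][y][x];
        }
        return out;
    }

    public static void main(String[] args) {
        float[][][] in = new float[64][4][4];   // C=64, H=4, W=4
        float[][][] out = tiling(in, 8);        // -> C=1, H=32, W=32
        System.out.println(out.length + " x " + out[0].length + " x " + out[0][0].length);
    }
}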

"Tile" layer in caffe implements similar operation to numpy's tile, or Matlab's repmat functions: it copies the content of an array along a specified dimension.
For example, suppose you have a 2D "attention" (or "saliency") map, and you want to weight the features according to these weights: give more weight to "salient" regions and less to non-"salient" regions. One way to achieve that is to multiply (element-wise) the 3D feature map by the 2D saliency map. To do so you need to "Tile" the saliency map along the channel dimension (from 2D to 3D) and then apply an "Eltwise" layer.
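A minimal sketch of that Tile-then-Eltwise pattern on plain Java arrays (not Caffe blobs), just to show the broadcasting idea:
public class SaliencyWeighting {
    // Replicate a 2D saliency map across the channel dimension ("Tile")
    // and multiply it element-wise into a (C, H, W) feature volume ("Eltwise").
    static float[][][] weight(float[][][] features, float[][] saliency) {
        int c = features.length, h = features[0].length, w = features[0][0].length;
        float[][][] out = new float[c][h][w];
        for (int ch = 0; ch < c; ch++)
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++)
                    out[ch][y][x] = features[ch][y][x] * saliency[y][x];
        return out;
    }
}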

Related

Detect Image Problems

I really don't know what it is called (distortion or something else), but I would like to detect lens/camera problems for several different types of images using EmguCV (or OpenCV).
Any ideas about which algorithms to use would be appreciated.
The second image seems to have high noise, but is there any way to measure that noise via OpenCV?
This is very difficult to achieve generically without reference data or a homogeneity sample. However, I have developed a recommended approach that analyzes the average SNR (signal-to-noise ratio) of the image. The algorithm divides the input image into a number of "sub-images" based on a specified kernel size, in order to evaluate each independently for local SNR. The computed SNRs for all sub-images are then averaged to provide an indicator of the global SNR of the image.
You will need to test this approach exhaustively; however, it shows promise on the following three images, producing these AvgSNR values:
Image #1 - AvgSNR = 0.9
Image #2 - AvgSNR = 7.0
Image #3 - AvgSNR = 0.6
NOTE: See how the "clean" control image produces a much higher AvgSNR.
The only variable to consider is the kernel size. I would recommend keeping this at a size that will suit even the smallest of your potential input images. 30 pixels square should likely be appropriate for many images.
I enclose my test code with annotation:
class Program
{
static void Main(string[] args)
{
// List of file names to load.
List<string> fileNames = new List<string>()
{
"IifXZ.png",
"o1z7p.jpg",
"NdQtj.jpg"
};
// For each image
foreach (string fileName in fileNames)
{
// Determine local file path
string path = Path.Combine(Environment.CurrentDirectory, @"TestImages\", fileName);
// Load the image
Image<Bgr, byte> inputImage = new Image<Bgr, byte>(path);
// Compute the AvgSNR with a kernel of 30x30
Console.WriteLine(ComputeAverageSNR(30, inputImage.Convert<Gray, byte>()));
// Display the image
CvInvoke.NamedWindow("Test");
CvInvoke.Imshow("Test", inputImage);
while (CvInvoke.WaitKey() != 27) { }
}
// Pause for evaluation
Console.ReadKey();
}
static double ComputeAverageSNR(int kernelSize, Image<Gray, byte> image)
{
// Calculate the number of sub-divisions given the kernel size
int widthSubDivisions, heightSubDivisions;
widthSubDivisions = (int)Math.Floor((double)image.Width / kernelSize);
heightSubDivisions = (int)Math.Floor((double)image.Height / kernelSize);
int totalNumberSubDivisions = widthSubDivisions * heightSubDivisions;
Rectangle ROI = new Rectangle(0, 0, kernelSize, kernelSize);
double avgSNR = 0;
// For each sub-division, calculate the SNR and add it to avgSNR
for (int v = 0; v < heightSubDivisions; v++)
{
for (int u = 0; u < widthSubDivisions; u++)
{
// Iterate the sub-division position
ROI.Location = new Point(u * kernelSize, v * kernelSize);
// Calculate the SNR of this sub-division
avgSNR += ComputeSNR(image.GetSubRect(ROI));
}
}
avgSNR /= totalNumberSubDivisions;
return avgSNR;
}
static double ComputeSNR(Image<Gray, byte> image)
{
// Local variables
double mean, sigma, snr;
// Calculate the mean pixel value for the sub-division
int population = image.Width * image.Height;
mean = CvInvoke.Sum(image).V0 / population;
// Calculate the Sigma of the sub-division population
double sumDeltaSqu = 0;
for (int v = 0; v < image.Height; v++)
{
for (int u = 0; u < image.Width; u++)
{
sumDeltaSqu += Math.Pow(image.Data[v, u, 0] - mean, 2);
}
}
sumDeltaSqu /= population;
sigma = Math.Pow(sumDeltaSqu, 0.5);
// Calculate and return the SNR value
snr = sigma == 0 ? mean : mean / sigma;
return snr;
}
}
NOTE: Without a reference, it is not possible to differentiate between natural variance/fidelity and "noise". For example, a highly textured background, or a scene with few homogeneous regions, will yield a low AvgSNR even when the image is clean. This approach will perform best when the evaluated scene consists mostly of plain, mono-color surfaces, such as the server room or shop front. Grass, for example, contains a large amount of texture and would therefore register as "noise".
An alternative method is to evaluate your images in the frequency domain following a Fourier transform. Principally, the noise examples you have provided are images containing unwanted, high-frequency content. Conduct an FFT and flag images that exceed a threshold of high-frequency energy. Here you will find an example of FFT with Emgu: FFT with Emgu
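As a rough illustration of that idea, here is a hedged sketch using the OpenCV Java bindings (the question uses EmguCV/C#, but the equivalent calls exist there as well). It measures the fraction of spectral energy that lies away from the low-frequency corners of an un-shifted DFT; the corner radius is an arbitrary assumption you would tune.
import org.opencv.core.*;
import java.util.ArrayList;
import java.util.List;

public class SpectralNoiseCheck {
    // Sketch only: fraction of spectral energy outside the low-frequency
    // corners of an un-shifted DFT. A noisier image should score higher.
    // The corner size (lowFreqRadius) is an assumed parameter to tune.
    static double highFrequencyRatio(Mat gray, int lowFreqRadius) {
        Mat floatImg = new Mat();
        gray.convertTo(floatImg, CvType.CV_32F);

        Mat spectrum = new Mat();
        Core.dft(floatImg, spectrum, Core.DFT_COMPLEX_OUTPUT, 0);

        List<Mat> planes = new ArrayList<>();
        Core.split(spectrum, planes);
        Mat magnitude = new Mat();
        Core.magnitude(planes.get(0), planes.get(1), magnitude);

        double total = Core.sumElems(magnitude).val[0];

        // Low frequencies live in the four corners of an un-shifted DFT;
        // zero them out and treat what remains as high-frequency energy.
        int r = lowFreqRadius, w = magnitude.cols(), h = magnitude.rows();
        magnitude.submat(new Rect(0, 0, r, r)).setTo(new Scalar(0));
        magnitude.submat(new Rect(w - r, 0, r, r)).setTo(new Scalar(0));
        magnitude.submat(new Rect(0, h - r, r, r)).setTo(new Scalar(0));
        magnitude.submat(new Rect(w - r, h - r, r, r)).setTo(new Scalar(0));

        double high = Core.sumElems(magnitude).val[0];
        return high / total;
    }
}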

Efficiently tell if one image is entirely comprised of the pixel values of another in OpenCV

I am trying to find an efficient way to see if one image is a subset of another (meaning that each unique pixel value in one image is also found in the other). The repetition or ordering of the pixels does not matter.
I am working in Java, so I would like all of my operations to be completed in OpenCV for efficiency's sake.
My first idea was to export a list of unique pixel values, and compare it to the list from the second image.
As there is no built-in function to extract unique pixels, I abandoned this approach.
I also understand that I can find the locations of a particular color with the inclusive inRange, and findNonZero operations.
Core.inRange(image, color, color, tempMat); // inclusive
Core.findNonZero(tempMat, colorLocations);
Unfortunately, this does not provide an adequate answer, as it would need to be executed per color, and would still require extracting unique pixels.
Essentially, I'm asking if there is a clever way to use the built in OpenCV functions to see if an image is comprised of the pixels found in another image.
I understand that this will not work for slight color differences. I am working on a limited dataset, and care about the exact pixel values.
To put the question more mathematically: is the set of unique pixel values in one image a subset of the set of unique pixel values in the other?
Because the only thing you are interested in is the pixel values, I would suggest the following (a rough sketch in Java follows the steps):
Compute the histogram of image 1 using hist1 = calcHist()
Compute the histogram of image 2 using hist2 = calcHist()
Calculate the difference vector diff = hist1 - hist2
Check that each bin of the sub-image's histogram is less than or equal to the corresponding bin in the histogram of the bigger image
Thanks to Miki for the fix.
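Here is a minimal sketch of those steps for single-channel 8-bit images, assuming the OpenCV Java bindings. It checks bin occupancy rather than counts, since the question says repetition does not matter; for colour images you would first pack or split the channels (as the answer below does).
import org.opencv.core.*;
import org.opencv.imgproc.Imgproc;
import java.util.Arrays;

public class HistSubsetCheck {
    // Sketch only: build a 256-bin histogram of each single-channel image and
    // verify that every bin occupied in the candidate is also occupied in the superset.
    static boolean isValueSubset(Mat candidate, Mat superset) {
        Mat histCand = histogram(candidate);
        Mat histSuper = histogram(superset);
        for (int bin = 0; bin < 256; bin++) {
            boolean inCand = histCand.get(bin, 0)[0] > 0;
            boolean inSuper = histSuper.get(bin, 0)[0] > 0;
            if (inCand && !inSuper) return false;   // value present only in the candidate
        }
        return true;
    }

    static Mat histogram(Mat img) {
        Mat hist = new Mat();
        Imgproc.calcHist(Arrays.asList(img), new MatOfInt(0), new Mat(), hist,
                new MatOfInt(256), new MatOfFloat(0f, 256f));
        return hist;
    }
}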
I will keep Amitay's as the accepted answer, as he absolutely led me down the correct path. I wanted to also share my exact answer for anyone who finds this in the future.
As I stated in my question, I was looking for an efficient way to see if the RGB values of one image were a subset of the RGB values of another image.
I made a function to the following specification: isSubset(subset, subMask, superset) returns true if every RGB value present in subset (within subMask) also appears somewhere in superset.
The Java code is as follows:
private boolean isSubset(Mat subset, Mat subMask, Mat superset) {
// Get unique set of pixels from both images
subset = getUniquePixels(subset, subMask);
superset = getUniquePixels(superset, null);
// See if the superset pixels encapsulate the subset pixels
// OR the unique pixels together
Mat subOrSuper = new Mat();
Core.bitwise_or(subset, superset, subOrSuper);
//See if the ORed matrix is equal to the superset
Mat notEqualMat = new Mat();
Core.compare(superset, subOrSuper, notEqualMat, Core.CMP_NE);
return Core.countNonZero(notEqualMat) == 0;
}
subset and superset are assumed to be CV_8UC3 matrices, while subMask is assumed to be CV_8UC1.
private Mat getUniquePixels(Mat img, Mat mask) {
if (mask == null) {
mask = new Mat();
}
// int bgrValue = (b << 16) + (g << 8) + r;
img.convertTo(img, CvType.CV_32FC3);
Vector<Mat> splitImg = new Vector<>();
Core.split(img, splitImg);
Mat flatImg = Mat.zeros(img.rows(), img.cols(), CvType.CV_32FC1);
Mat multiplier;
for (int i = 0; i < splitImg.size(); i++) {
multiplier = Mat.ones(img.rows(), img.cols(), CvType.CV_32FC1);
// set powTwo to 2^(8*i) so each channel lands in its own byte of the packed value;
int powTwo = (1 << (8 * i));
// Set multiplier matrix equal to powTwo;
Core.multiply(multiplier, new Scalar(powTwo), multiplier);
// n<<i == n * 2^i;
// I'm shifting the RGB values into separate parts of the same 32bit
// integer.
Core.multiply(multiplier, splitImg.get(i), splitImg.get(i));
// Add the shifted RGB components together.
Core.add(flatImg, splitImg.get(i), flatImg);
}
// Create a histogram of the pixel values.
List<Mat> images = new ArrayList<>();
images.add(flatImg);
MatOfInt channels = new MatOfInt(0);
Mat hist = new Mat();
// 16777216 == 256*256*256
MatOfInt histSize = new MatOfInt(16777216);
MatOfFloat ranges = new MatOfFloat(0f, 16777216f);
Imgproc.calcHist(images, channels, mask, hist, histSize, ranges);
Mat uniquePixels = new Mat();
Core.inRange(hist, new Scalar(1), new Scalar(Float.MAX_VALUE), uniquePixels);
return uniquePixels;
}
Please feel free to ask questions, or point out problems!

How to determine the distance between upper lip and lower lip by using webcam in Processing?

Where should I start? I can see plenty of face recognition and analysis using Python and JavaScript, but what about Processing?
I want to determine the distance between two points on the upper and lower lip, at their highest and lowest points, via webcam, to use in a further project.
Any help would be appreciated.
If you want to do it in Processing alone you can use Greg Borenstein's OpenCV for Processing library:
You can start with the Face Detection example
Once you detect a face, you can detect a mouth within the face rectangle using OpenCV.CASCADE_MOUTH.
Once you have the mouth detected, you may be able to get away with using the mouth bounding box height. For more detail you can use OpenCV to threshold that rectangle. Hopefully the open mouth will segment nicely from the rest of the skin, and finding contours should give you lists of points you can work with.
For something a lot more exact, you can use Jason Saragih's CLM FaceTracker, which is available as an OpenFrameworks addon. OpenFrameworks has similarities to Processing. If you do need this sort of accuracy in Processing you can run FaceOSC in the background and read the mouth coordinates in Processing using oscP5
Update
For the first option, using Haar cascade classifiers, it turns out there are a couple of issues:
The OpenCV Processing library can load only one cascade; creating a second instance will override the first.
The OpenCV.CASCADE_MOUTH seems to work better for closed mouths, but not very well with open mouths
To get past the 1st issue, you can use the OpenCV Java API directly, bypassing OpenCV Processing for multiple cascade detection.
There are a couple of parameters that can help the detection, such as having an idea of the bounding box of the mouth beforehand to pass as a hint to the classifier.
I've done a basic test using a webcam on my laptop and measured the bounding boxes for the face and mouth at various distances. Here's an example:
import gab.opencv.*;
import org.opencv.core.*;
import org.opencv.objdetect.*;
import processing.video.*;
Capture video;
OpenCV opencv;
CascadeClassifier faceDetector,mouthDetector;
MatOfRect faceDetections,mouthDetections;
//cascade detections parameters - explanations from Mastering OpenCV with Practical Computer Vision Projects
int flags = Objdetect.CASCADE_FIND_BIGGEST_OBJECT;
// Smallest object size.
Size minFeatureSizeFace = new Size(50,60);
Size maxFeatureSizeFace = new Size(125,150);
Size minFeatureSizeMouth = new Size(30,10);
Size maxFeatureSizeMouth = new Size(120,60);
// How detailed should the search be. Must be larger than 1.0.
float searchScaleFactor = 1.1f;
// How much the detections should be filtered out. This should depend on how bad false detections are to your system.
// minNeighbors=2 means lots of good+bad detections, and minNeighbors=6 means only good detections are given but some are missed.
int minNeighbors = 4;
//laptop webcam face rectangle
//far, small scale, ~50,60px
//typing distance, ~83,91px
//really close, ~125,150
//laptop webcam mouth rectangle
//far, small scale, ~30,10
//typing distance, ~50,25px
//really close, ~120,60
int mouthHeightHistory = 30;
int[] mouthHeights = new int[mouthHeightHistory];
void setup() {
opencv = new OpenCV(this,320,240);
size(opencv.width, opencv.height);
noFill();
frameRate(30);
video = new Capture(this,width,height);
video.start();
faceDetector = new CascadeClassifier(dataPath("haarcascade_frontalface_alt2.xml"));
mouthDetector = new CascadeClassifier(dataPath("haarcascade_mcs_mouth.xml"));
}
void draw() {
//feed cam image to OpenCV, it turns it to grayscale
opencv.loadImage(video);
opencv.equalizeHistogram();
image(opencv.getOutput(), 0, 0 );
//detect face using raw Java OpenCV API
Mat equalizedImg = opencv.getGray();
faceDetections = new MatOfRect();
faceDetector.detectMultiScale(equalizedImg, faceDetections, searchScaleFactor, minNeighbors, flags, minFeatureSizeFace, maxFeatureSizeFace);
Rect[] faceDetectionResults = faceDetections.toArray();
int faces = faceDetectionResults.length;
text("detected faces: "+faces,5,15);
if(faces >= 1){
Rect face = faceDetectionResults[0];
stroke(0,192,0);
rect(face.x,face.y,face.width,face.height);
//detect mouth - only within face rectangle, not the whole frame
Rect faceLower = face.clone();
faceLower.height = (int) (face.height * 0.65);
faceLower.y = face.y + faceLower.height;
Mat faceROI = equalizedImg.submat(faceLower);
//debug view of ROI
PImage faceImg = createImage(faceLower.width,faceLower.height,RGB);
opencv.toPImage(faceROI,faceImg);
image(faceImg,width-faceImg.width,0);
mouthDetections = new MatOfRect();
mouthDetector.detectMultiScale(faceROI, mouthDetections, searchScaleFactor, minNeighbors, flags, minFeatureSizeMouth, maxFeatureSizeMouth);
Rect[] mouthDetectionResults = mouthDetections.toArray();
int mouths = mouthDetectionResults.length;
text("detected mouths: "+mouths,5,25);
if(mouths >= 1){
Rect mouth = mouthDetectionResults[0];
stroke(192,0,0);
rect(faceLower.x + mouth.x,faceLower.y + mouth.y,mouth.width,mouth.height);
text("mouth height:"+mouth.height+"~px",5,35);
updateAndPlotMouthHistory(mouth.height);
}
}
}
void updateAndPlotMouthHistory(int newHeight){
//shift older values by 1
for(int i = mouthHeightHistory-1; i > 0; i--){
mouthHeights[i] = mouthHeights[i-1];
}
//add new value at the front
mouthHeights[0] = newHeight;
//plot
float graphWidth = 100.0;
float elementWidth = graphWidth / mouthHeightHistory;
for(int i = 0; i < mouthHeightHistory; i++){
rect(elementWidth * i,45,elementWidth,mouthHeights[i]);
}
}
void captureEvent(Capture c) {
c.read();
}
One very important note: I've copied the cascade xml files from the OpenCV Processing library folder (~/Documents/Processing/libraries/opencv_processing/library/cascade-files) to the sketch's data folder. My sketch is OpenCVMouthOpen, so the folder structure looks like this:
OpenCVMouthOpen
├── OpenCVMouthOpen.pde
└── data
├── haarcascade_frontalface_alt.xml
├── haarcascade_frontalface_alt2.xml
├── haarcascade_frontalface_alt_tree.xml
├── haarcascade_frontalface_default.xml
├── haarcascade_mcs_mouth.xml
└── lbpcascade_frontalface.xml
If you don't copy the cascade files and use the code as it is, you won't get any errors, but the detection simply won't work. If you want to check, you can do
println(faceDetector.empty())
at the end of the setup() function: if it prints false, the cascade has been loaded; if it prints true, it hasn't.
You may need to play with the minFeatureSize and maxFeatureSize values for the face and mouth for your setup. The second issue, the cascade not detecting a wide-open mouth very well, is tricky. There might be an already trained cascade for open mouths, but you'd need to find it. Otherwise, with this method you may need to train one yourself, and that can be a bit tedious.
Nevertheless, notice that there is an upside-down plot drawn on the left when a mouth is detected. In my tests I noticed that the height isn't super accurate, but there are noticeable changes in the graph. You may not be able to get a steady mouth height, but by comparing the current height to the average of the previous values you should see some peaks (the difference going from positive to negative or vice versa) which give you an idea of a mouth open/close change.
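For example, a rough sketch of that comparison that could be called alongside updateAndPlotMouthHistory(); the tolerance value is a made-up number you would tune:
// Sketch only: compare the newest mouth height against the average of the
// stored history; a jump beyond a (made-up) tolerance counts as a change.
int detectMouthChange(int[] heights, int newHeight) {
  float sum = 0;
  for (int i = 0; i < heights.length; i++) sum += heights[i];
  float avg = sum / heights.length;
  float tolerance = 5;                          // assumed value, tune for your setup
  if (newHeight > avg + tolerance) return  1;   // mouth opening
  if (newHeight < avg - tolerance) return -1;   // mouth closing
  return 0;                                     // no significant change
}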
Although searching through the whole image for a mouth, as opposed to searching only within a face, can be a bit slower and less accurate, it's a simpler setup. If you can get away with less accuracy and more false positives in your project, this could be simpler:
import gab.opencv.*;
import java.awt.Rectangle;
import org.opencv.objdetect.Objdetect;
import processing.video.*;
Capture video;
OpenCV opencv;
Rectangle[] faces,mouths;
//cascade detections parameters - explanations from Mastering OpenCV with Practical Computer Vision Projects
int flags = Objdetect.CASCADE_FIND_BIGGEST_OBJECT;
// Smallest object size.
int minFeatureSize = 20;
int maxFeatureSize = 150;
// How detailed should the search be. Must be larger than 1.0.
float searchScaleFactor = 1.1f;
// How much the detections should be filtered out. This should depend on how bad false detections are to your system.
// minNeighbors=2 means lots of good+bad detections, and minNeighbors=6 means only good detections are given but some are missed.
int minNeighbors = 6;
void setup() {
size(320, 240);
noFill();
stroke(0, 192, 0);
strokeWeight(3);
video = new Capture(this,width,height);
video.start();
opencv = new OpenCV(this,320,240);
opencv.loadCascade(OpenCV.CASCADE_MOUTH);
}
void draw() {
//feed cam image to OpenCV, it turns it to grayscale
opencv.loadImage(video);
opencv.equalizeHistogram();
image(opencv.getOutput(), 0, 0 );
Rectangle[] mouths = opencv.detect(searchScaleFactor,minNeighbors,flags,minFeatureSize, maxFeatureSize);
for (int i = 0; i < mouths.length; i++) {
text(mouths[i].x + "," + mouths[i].y + "," + mouths[i].width + "," + mouths[i].height,mouths[i].x, mouths[i].y);
rect(mouths[i].x, mouths[i].y, mouths[i].width, mouths[i].height);
}
}
void captureEvent(Capture c) {
c.read();
}
I was mentioning segmenting/thresholding as well. Here's a rough example using the lower part of a detected face: just a basic threshold, then some basic morphological filters (erode/dilate) to clean up the thresholded image a bit:
import gab.opencv.*;
import org.opencv.core.*;
import org.opencv.objdetect.*;
import org.opencv.imgproc.Imgproc;
import java.awt.Rectangle;
import java.util.*;
import processing.video.*;
Capture video;
OpenCV opencv;
CascadeClassifier faceDetector,mouthDetector;
MatOfRect faceDetections,mouthDetections;
//cascade detections parameters - explanations from Mastering OpenCV with Practical Computer Vision Projects
int flags = Objdetect.CASCADE_FIND_BIGGEST_OBJECT;
// Smallest object size.
Size minFeatureSizeFace = new Size(50,60);
Size maxFeatureSizeFace = new Size(125,150);
// How detailed should the search be. Must be larger than 1.0.
float searchScaleFactor = 1.1f;
// How much the detections should be filtered out. This should depend on how bad false detections are to your system.
// minNeighbors=2 means lots of good+bad detections, and minNeighbors=6 means only good detections are given but some are missed.
int minNeighbors = 4;
//laptop webcam face rectangle
//far, small scale, ~50,60px
//typing distance, ~83,91px
//really close, ~125,150
float threshold = 160;
int erodeAmt = 1;
int dilateAmt = 5;
void setup() {
opencv = new OpenCV(this,320,240);
size(opencv.width, opencv.height);
noFill();
video = new Capture(this,width,height);
video.start();
faceDetector = new CascadeClassifier(dataPath("haarcascade_frontalface_alt2.xml"));
mouthDetector = new CascadeClassifier(dataPath("haarcascade_mcs_mouth.xml"));
}
void draw() {
//feed cam image to OpenCV, it turns it to grayscale
opencv.loadImage(video);
opencv.equalizeHistogram();
image(opencv.getOutput(), 0, 0 );
//detect face using raw Java OpenCV API
Mat equalizedImg = opencv.getGray();
faceDetections = new MatOfRect();
faceDetector.detectMultiScale(equalizedImg, faceDetections, searchScaleFactor, minNeighbors, flags, minFeatureSizeFace, maxFeatureSizeFace);
Rect[] faceDetectionResults = faceDetections.toArray();
int faces = faceDetectionResults.length;
text("detected faces: "+faces,5,15);
if(faces > 0){
Rect face = faceDetectionResults[0];
stroke(0,192,0);
rect(face.x,face.y,face.width,face.height);
//detect mouth - only within face rectangle, not the whole frame
Rect faceLower = face.clone();
faceLower.height = (int) (face.height * 0.55);
faceLower.y = face.y + faceLower.height;
//submat grabs a portion of the image (submatrix) = our region of interest (ROI)
Mat faceROI = equalizedImg.submat(faceLower);
Mat faceROIThresh = faceROI.clone();
//threshold
Imgproc.threshold(faceROI, faceROIThresh, threshold, width, Imgproc.THRESH_BINARY_INV);
Imgproc.erode(faceROIThresh, faceROIThresh, new Mat(), new Point(-1,-1), erodeAmt);
Imgproc.dilate(faceROIThresh, faceROIThresh, new Mat(), new Point(-1,-1), dilateAmt);
//find contours
Mat faceContours = faceROIThresh.clone();
List<MatOfPoint> contours = new ArrayList<MatOfPoint>();
Imgproc.findContours(faceContours, contours, new Mat(), Imgproc.RETR_EXTERNAL , Imgproc.CHAIN_APPROX_SIMPLE);
//draw contours
for(int i = 0 ; i < contours.size(); i++){
MatOfPoint contour = contours.get(i);
Point[] points = contour.toArray();
stroke(map(i,0,contours.size()-1,32,255),0,0);
beginShape();
for(Point p : points){
vertex((float)p.x,(float)p.y);
}
endShape();
}
//debug view of ROI
PImage faceImg = createImage(faceLower.width,faceLower.height,RGB);
opencv.toPImage(faceROIThresh,faceImg);
image(faceImg,width-faceImg.width,0);
}
text("Drag mouseX to control threshold: " + threshold+
"\nHold 'e' and drag mouseX to control erodeAmt: " + erodeAmt+
"\nHold 'd' and drag mouseX to control dilateAmt: " + dilateAmt,5,210);
}
void mouseDragged(){
if(keyPressed){
if(key == 'e') erodeAmt = (int)map(mouseX,0,width,1,6);
if(key == 'd') dilateAmt = (int)map(mouseX,0,width,1,10);
}else{
threshold = mouseX;
}
}
void captureEvent(Capture c) {
c.read();
}
This could be improved a bit by using the YCrCb colour space to segment skin better, but overall you'll notice that there are quite a few variables to get right, which doesn't make this a very flexible setup.
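As a hedged sketch of that YCrCb idea (it assumes the same OpenCV imports as the sketch above, needs a colour ROI rather than the equalized grayscale one, and the skin bounds below are commonly quoted values rather than numbers tuned for your camera):
Mat skinMask(Mat faceROIBgr) {
  // Sketch only: convert to YCrCb and keep pixels inside an assumed skin range.
  Mat ycrcb = new Mat();
  Imgproc.cvtColor(faceROIBgr, ycrcb, Imgproc.COLOR_BGR2YCrCb);
  Mat mask = new Mat();
  Core.inRange(ycrcb, new Scalar(0, 133, 77), new Scalar(255, 173, 127), mask);
  // The mask could then replace faceROIThresh before the erode/dilate steps.
  return mask;
}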
You will get much better results using FaceOSC and reading the values you need in Processing via oscP5. Here is a slightly simplified version of the FaceOSCReceiver Processing example focusing mainly on the mouth:
import oscP5.*;
OscP5 oscP5;
// num faces found
int found;
// pose
float poseScale;
PVector posePosition = new PVector();
// gesture
float mouthHeight;
float mouthWidth;
void setup() {
size(640, 480);
frameRate(30);
oscP5 = new OscP5(this, 8338);
oscP5.plug(this, "found", "/found");
oscP5.plug(this, "poseScale", "/pose/scale");
oscP5.plug(this, "posePosition", "/pose/position");
oscP5.plug(this, "mouthWidthReceived", "/gesture/mouth/width");
oscP5.plug(this, "mouthHeightReceived", "/gesture/mouth/height");
}
void draw() {
background(255);
stroke(0);
if(found > 0) {
translate(posePosition.x, posePosition.y);
scale(poseScale);
noFill();
ellipse(0, 20, mouthWidth* 3, mouthHeight * 3);
}
}
// OSC CALLBACK FUNCTIONS
public void found(int i) {
println("found: " + i);
found = i;
}
public void poseScale(float s) {
println("scale: " + s);
poseScale = s;
}
public void posePosition(float x, float y) {
println("pose position\tX: " + x + " Y: " + y );
posePosition.set(x, y, 0);
}
public void mouthWidthReceived(float w) {
println("mouth Width: " + w);
mouthWidth = w;
}
public void mouthHeightReceived(float h) {
println("mouth height: " + h);
mouthHeight = h;
}
// all other OSC messages end up here
void oscEvent(OscMessage m) {
if(m.isPlugged() == false) {
println("UNPLUGGED: " + m);
}
}
On OSX you can simply download the compiled FaceOSC app.
On other operating systems you may need to setup OpenFrameworks, download ofxFaceTracker and compile FaceOSC yourself.
It's really hard to answer general "how do I do this" type questions. Stack Overflow is designed for specific "I tried X, expected Y, but got Z instead" type questions. But I'll try to answer in a general sense:
You need to break your problem down into smaller pieces.
Step 1: Can you get a webcam feed showing in your sketch? Don't worry about the computer vision stuff for a second. Just get the camera connected. Do some research and try something out.
Step 2: Can you detect facial features in that video? You might try doing it yourself, or you might use one of the many libraries listed in the Videos and Vision section of the Processing libraries page.
Step 3: Read the documentation on those libraries. Try them out. You might have to make a bunch of little example sketches using each library until you find one you like. We can't do this for you, as which one is right for you depends on you. If you're confused about something specific we can try to help you, but we can't really help you with picking out a library.
Step 4: Once you've done a bunch of example programs and picked out a library, start working towards your goal. Can you detect facial features using the library? Get just that part working. Once you have that working, can you detect changes like opening or closing a mouth?
Work on one small step at a time. If you get stuck, post an MCVE along with a specific technical question, and we'll go from there. Good luck.

Template Matching on various sizes

Right now I am working on an OCR algorithm with Template Matching, using the opencv library. I am comparing pixel by pixel, and till now I have obtained good results. The problem comes when the area I want to match is of different size.
Ex: Template size = 70x100 while ROI = 140x200.
Is there any function that I can use in order to adapt the required size and end up with the same number of rows and columns?
Thanks
Robert Grech
Usually one builds an image scale pyramid and then scans with the fixed 70x100 window across all scales, e.g. as in OpenCV's HOGDescriptor:
double scale = 1.;
double scale0 = 1.05;
int maxLevels = 64;
int nLevels;
Size templateSize(70,100);
cv::Mat testImage = cv::imread("test1.jpg");
vector<double> levelScale;
for( nLevels = 0; nLevels < maxLevels; nLevels++ )
{
levelScale.push_back(scale);
if( cvRound(testImage.cols/scale) < templateSize.width ||
cvRound(testImage.rows/scale) < templateSize.height ||
scale0 <= 1 )
break;
scale *= scale0;
}
nLevels = std::max(nLevels, 1);
levelScale.resize(nLevels);
int level;
for(level =0; level<nLevels; level++)
{
cv::Mat testAtScale;
Size sz(cvRound(testImage.cols/levelScale[level]),
cvRound(testImage.rows/levelScale[level]));
resize(testImage,testAtScale,sz);
//result = match(template,testAtScale);
//cv::imshow("sclale",testAtScale);
//cv::waitKey();
}
You would then need to post-process your results back to the original scale. This is simple with a box, but if you have a heat map / response map / probability map, resizing it back up may be somewhat hacky.
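For the box case, a minimal sketch of that mapping (written in Java for brevity; the names are hypothetical): a box found at pyramid level level is scaled back up by the same levelScale factor that was used to shrink the image.
import java.awt.Rectangle;

public class ScaleBack {
    // Sketch only: a box detected in the image resized by 1/levelScale is
    // mapped back to original coordinates by multiplying by levelScale.
    static Rectangle toOriginalScale(Rectangle boxAtLevel, double levelScale) {
        return new Rectangle(
            (int) Math.round(boxAtLevel.x * levelScale),
            (int) Math.round(boxAtLevel.y * levelScale),
            (int) Math.round(boxAtLevel.width * levelScale),
            (int) Math.round(boxAtLevel.height * levelScale));
    }
}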

What is the correct way to apply filter to a image

I was wondering what the correct way would be to apply a filter to an image. The image processing textbook that I am reading only talks about the mathematical and theoretical aspects of filters but doesn't say much about the programming part of it!
I came up with this pseudocode. Could someone tell me if it is correct, because I applied the Sobel edge filter to an image and I am not satisfied with the output. I think it detected many unnecessary points as edges and missed several points along the edge.
int filter[][] = {{0,-1,0},{-1,8,-1},{0,-1,0}}; // I don't exactly remember the Sobel filter
int total = 0;
for(int i = 2;i<image.getWidth()-2;i++)
for(int j = 2;j<image.getHeight()-2;j++)
{
total = 0;
for(int k = 0;k<3;k++)
for(int l = 0;l<3;l++)
{
total += intensity(image.getRGB(i,j)) * filter[i+k][j+l];
}
if(total >= threshold){
image.setRGB(i,j,WHITE);
}
}
int intensity(int color)
{
return (((color >> 16) & 0xFF) + ((color >> 8) & 0xFF) + (color & 0xFF)) / 3;
}
Two issues:
(1) The Sobel operator has an x-direction and a y-direction kernel; they are
int filter[][] = {{1,0,-1},{2,0,-2},{1,0,-1}}; and
int filter[][] = {{1,2,1},{0,0,0},{-1,-2,-1}};
(2) The convolution part:
total += intensity(image.getRGB(i+k,j+l)) * filter[k][l];
Your code doesn't look quite right to me. In order to apply the filter to the image you must apply the discrete convolution algorithm: http://en.wikipedia.org/wiki/Convolution.
When you do convolution you want to slide the 3x3 filter over the image, moving it one pixel at a time. At each step you multiply the value of the filter 'pixel' by the corresponding value of the image pixel which is under that particular filter 'pixel' (the 9 pixels under the filter are all affected). The values that result should be added up onto a new resulting image as you go.
Thresholding is optional...
The following is your code modified with some notes:
int filter[][] = {{0,-1,0},{-1,8,-1},{0,-1,0}};
//create a new array for the result image on the heap,
//e.g. new int[image.getWidth()][image.getHeight()][3]
int[][][] newImage = ...
//initialize every element in the newImage to 0
for(int i = 0;i<image.getWidth()-1;i++)
for(int j = 0;j<image.getHeight()-1;j++)
for (int k = 0; k<3; k++)
{
newImage[i][j][k] = 0;
}
//Convolve the filter and the image
for(int i = 1;i<image.getWidth()-2;i++)
for(int j = 1;j<image.getHeight()-2;j++)
{
for(int k = -1;k<2;k++)
for(int l = -1;l<2;l++)
{
newImage[i][j][0] += getRed(image.getRGB(i+k, j+l)) * filter[k+1][l+1];
newImage[i][j][1] += getGreen(image.getRGB(i+k, j+l)) * filter[k+1][l+1];
newImage[i][j][2] += getBlue(image.getRGB(i+k, j+l)) * filter[k+1][l+1];
}
}
int getRed(int color)
{
...
}
int getBlue(int color)
{
...
}
int getGreen(int color)
{
...
}
Please note that the code above does not handle the edges of the image exactly right. If you wanted to make it absolutely perfect you'd start by sliding the filter mostly off-screen (so the first position would apply the lower-right corner of the filter to the 0,0 pixel of the image). Doing this is really a pain though, so usually it's easier just to ignore the 2-pixel border around the edges.
Once you've got that working you can experiment by sliding the Sobel filter in the horizontal and then the vertical direction. You will notice that the filter acts most strongly on lines which are perpendicular to its direction of travel. So for the best results apply the filter in the horizontal and then the vertical direction (using the same newImage). That way you will detect vertical as well as horizontal lines equally well. :)
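For instance, a rough sketch of combining both directions into a single gradient magnitude; it reuses the intensity() helper from the question, assumes image is a java.awt.image.BufferedImage, and leaves out any thresholding:
// Sketch only: run both Sobel kernels over the intensity image and combine
// them as the gradient magnitude sqrt(gx^2 + gy^2). intensity() is the
// helper defined in the question.
int[][] sobelMagnitude(BufferedImage image) {
    int[][] gxKernel = {{ 1, 0, -1 }, { 2, 0, -2 }, {  1,  0, -1 }};
    int[][] gyKernel = {{ 1, 2,  1 }, { 0, 0,  0 }, { -1, -2, -1 }};
    int w = image.getWidth(), h = image.getHeight();
    int[][] magnitude = new int[w][h];
    for (int i = 1; i < w - 1; i++) {
        for (int j = 1; j < h - 1; j++) {
            int gx = 0, gy = 0;
            for (int k = -1; k <= 1; k++) {
                for (int l = -1; l <= 1; l++) {
                    int v = intensity(image.getRGB(i + k, j + l));
                    gx += v * gxKernel[k + 1][l + 1];
                    gy += v * gyKernel[k + 1][l + 1];
                }
            }
            magnitude[i][j] = (int) Math.sqrt((double) gx * gx + (double) gy * gy);
        }
    }
    return magnitude;
}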
You have some serious undefined behavior going on here. The array filter is 3x3 but the subscripts you're using i+k and j+l are up to the size of the image. It looks like you've misplaced this addition:
total += intensity(image.getRGB(i+k,j+l)) * filter[k][l];
Use GPUImage; it handles this kind of filtering quite well.
