I'm reading the Caffe source code, and a question occurred to me.
Take caffe/relu_layer.cpp for example. When computing the gradient, from
template <typename Dtype>
void ReLULayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (propagate_down[0]) {
    const Dtype* bottom_data = bottom[0]->cpu_data();
    const Dtype* top_diff = top[0]->cpu_diff();
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    const int count = bottom[0]->count();
    Dtype negative_slope = this->layer_param_.relu_param().negative_slope();
    for (int i = 0; i < count; ++i) {
      bottom_diff[i] = top_diff[i] * ((bottom_data[i] > 0)
          + negative_slope * (bottom_data[i] <= 0));
    }
  }
}
we can see a value is finally assigned to bottom_diff, indicating that value is the gradient of the corresponding bottom blob.
However, when multiple layers take one blob as input, e.g., stacking multiple ReLU layers on one blob, how does Caffe handle the gradient computation? The first ReLU layer writes to bottom_diff, and it seems that the second ReLU layer just overwrites it instead of adding the two gradients.
I didn't see gradient summation performed anywhere, and I am confused. Please let me know if I have missed something important, and thanks a lot.
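Caffe automatically inserts a Split layer when a top blob is used as a bottom by multiple layers. This is done inside Net<Dtype>::Init(...) by a call to InsertSplits(...) from caffe/util/insert_splits.cpp. Each consumer then writes its gradient into the .diff of its own copy of the blob, and the Split layer's backward pass sums those diffs into the .diff of the original blob.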
Original network in NetParameter protobuf object (nodes here are layers):
data ---> conv1 -> conv2 -> ...
      \-> somelayer -> ...
Net Layers in memory after Net::Init():
data -> split ---> conv1 -> conv2 -> ...
               \-> somelayer -> ...
(An interesting detail, by the way: .diff in activation Blobs is assigned to by Backward(), while .diff in layer learnable parameters is added to by Backward().)
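To make the summation concrete, here is a minimal standalone sketch of what a Split layer's backward pass amounts to. It is not Caffe's code: split_backward and the plain std::vector "blobs" are hypothetical stand-ins, and only the accumulate-instead-of-overwrite idea is taken from the answer above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Split layer's backward pass: the gradient of
// the shared bottom blob is the element-wise sum of the gradients arriving
// from every top blob (one top per consumer layer).
std::vector<float> split_backward(
    const std::vector<std::vector<float>>& top_diffs) {
  assert(!top_diffs.empty());
  std::vector<float> bottom_diff(top_diffs[0].size(), 0.0f);
  for (const std::vector<float>& top_diff : top_diffs) {
    assert(top_diff.size() == bottom_diff.size());
    for (std::size_t i = 0; i < bottom_diff.size(); ++i) {
      bottom_diff[i] += top_diff[i];  // accumulate, never overwrite
    }
  }
  return bottom_diff;
}
```

Each consumer (conv1 and somelayer in the diagram) only ever assigns to the diff of its own top of the split, so nothing is lost when the second consumer runs.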
I am trying to backpropagate a very primitive / simple ANN.
I've almost got it working. I'm trying to implement the formulas and the article I'm reading does not specify whether to use dot product or element wise multiplication or some other multiplication.
Article: https://ml-cheatsheet.readthedocs.io/en/latest/backpropagation.html
Here's the formula for calculating the error (or delta) of a single Hidden layer:
Eh = Eo * Wo * R'(Zh)
Or, as I read it in the context of my algorithm,
Delta = prev_delta * prev_weight * zprime
Where delta is the error of this layer, prev_delta is the delta of the previous layer, prev_weight is the weight of the previous layer, and zprime is the derivative of the activation function of the current layer.
Also, for a single Output Layer:
Eo = (yHat - y) * R'(Zo)
Or, as I read it in the context of my algorithm,
Delta = (output - target) % zprime;
Where output is the final output of the feed-forward and target is the target values.
I've written this code to run this calculation:
void Layer::backward(Matrix & prev_delta, Matrix & prev_weight) {
    // all variables are matrices
    // except for prev_layer, that's a pointer to a layer object.
    // I'm using Armadillo for linear algebra / matrices
    // delta, weight, output, and zprime refer to the current layer.
    // prev_delta, prev_weight belong to the previous layer.
    if (next_layer == nullptr) {
        // if next layer is null, this is the output layer.
        // in that case, prev_delta is target.
        // yHat - y * R'(Zo)
        delta = (output - prev_delta) * zprime;
    } else {
        // Eo * Wo * R'(Zh)
        delta = prev_delta * prev_weight * zprime;
    }
    // tell the next layer to backpropagate
    if (prev_layer != nullptr)
        prev_layer->backward(delta, weight);
}
matrix * matrix indicates a matrix multiplication (dot product)
matrix % matrix indicates element-wise multiplication
The issue I'm having is that these matrices don't seem to multiply properly. I've made sure everything lines up the same way the article has it, but these pieces just don't seem to fit. How should these matrices be multiplied to get the result?
Edit: to clarify, I get errors when I try to take the dot product of these matrices. "invalid size". I've tried using element wise multiplication but then things get weird there too.
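For reference, here is one shape-consistent reading of the two formulas, written as a sketch without Armadillo. It assumes a layout the question does not state: one sample per row, and a weight matrix of shape (inputs x outputs). Under that assumption the hidden delta needs the transposed weight and an element-wise product with zprime, which in Armadillo would read (prev_delta * prev_weight.t()) % zprime. The Mat struct and all function names below are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tiny row-major dense matrix; a hypothetical stand-in for arma::mat.
struct Mat {
  std::size_t rows, cols;
  std::vector<double> v;
  Mat(std::size_t r, std::size_t c, double fill = 0.0)
      : rows(r), cols(c), v(r * c, fill) {}
  double& at(std::size_t r, std::size_t c) { return v[r * cols + c]; }
  double at(std::size_t r, std::size_t c) const { return v[r * cols + c]; }
};

// Matrix product (what Armadillo's operator* does).
Mat matmul(const Mat& a, const Mat& b) {
  assert(a.cols == b.rows);  // the "invalid size" condition
  Mat out(a.rows, b.cols);
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t k = 0; k < a.cols; ++k)
      for (std::size_t j = 0; j < b.cols; ++j)
        out.at(i, j) += a.at(i, k) * b.at(k, j);
  return out;
}

// Element-wise product (what Armadillo's operator% does).
Mat hadamard(const Mat& a, const Mat& b) {
  assert(a.rows == b.rows && a.cols == b.cols);
  Mat out(a.rows, a.cols);
  for (std::size_t i = 0; i < a.v.size(); ++i) out.v[i] = a.v[i] * b.v[i];
  return out;
}

Mat transpose(const Mat& a) {
  Mat out(a.cols, a.rows);
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t j = 0; j < a.cols; ++j) out.at(j, i) = a.at(i, j);
  return out;
}

Mat subtract(const Mat& a, const Mat& b) {
  assert(a.rows == b.rows && a.cols == b.cols);
  Mat out(a.rows, a.cols);
  for (std::size_t i = 0; i < a.v.size(); ++i) out.v[i] = a.v[i] - b.v[i];
  return out;
}

// Output layer: (yHat - y) % R'(Zo); every operand is (samples x outputs).
Mat output_delta(const Mat& output, const Mat& target, const Mat& zprime) {
  return hadamard(subtract(output, target), zprime);
}

// Hidden layer: (Eo * Wo^T) % R'(Zh).
// next_delta: (samples x outputs), next_weight: (hidden x outputs),
// zprime: (samples x hidden), result: (samples x hidden).
Mat hidden_delta(const Mat& next_delta, const Mat& next_weight,
                 const Mat& zprime) {
  return hadamard(matmul(next_delta, transpose(next_weight)), zprime);
}
```

With one sample, two hidden units and one output, next_delta is 1x1, next_weight is 2x1 and zprime is 1x2, so the transpose is what makes the inner dimensions agree.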
I really don't know what it is called (distortion or something else), but I would like to detect camera lens problems in several different types of images using EmguCV (or OpenCV).
Any ideas about which algorithms to use would be appreciated.
The second image seems to have high noise; is there any way to measure high noise with OpenCV?
This is very difficult to achieve generically without reference data or a homogeneity sample. However, I have developed a recommendation based on analyzing the average SNR (signal-to-noise ratio) of the image. The algorithm divides the input image into a number of "sub-images" based on a specified kernel size, in order to evaluate each independently for local SNR. The SNRs computed for each sub-image are then averaged to provide an indicator of the global SNR of the image.
You will need to test this approach exhaustively; however, it shows promise on the following three images, producing these AvgSNR values:
Image #1 - AvgSNR = 0.9
Image #2 - AvgSNR = 7.0
Image #3 - AvgSNR = 0.6
NOTE: See how the "clean" control image produces a much higher AvgSNR.
The only variable to consider is the kernel size. I would recommend keeping this at a size that will support even the smallest of your potential input images. 30 pixels square should be appropriate for many images.
I enclose my test code with annotation:
using System;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
using Emgu.CV;
using Emgu.CV.Structure;

class Program
{
    static void Main(string[] args)
    {
        // List of file names to load.
        List<string> fileNames = new List<string>()
        {
            // (file names omitted)
        };
        // For each image
        foreach (string fileName in fileNames)
        {
            // Determine local file path
            string path = Path.Combine(Environment.CurrentDirectory, @"TestImages\", fileName);
            // Load the image
            Image<Bgr, byte> inputImage = new Image<Bgr, byte>(path);
            // Compute the AvgSNR with a kernel of 30x30
            Console.WriteLine(ComputeAverageSNR(30, inputImage.Convert<Gray, byte>()));
            // Display the image
            CvInvoke.Imshow("Test", inputImage);
            // Pause for evaluation (wait for the Esc key)
            while (CvInvoke.WaitKey() != 27) { }
        }
    }

    static double ComputeAverageSNR(int kernelSize, Image<Gray, byte> image)
    {
        // Calculate the number of sub-divisions given the kernel size
        int widthSubDivisions, heightSubDivisions;
        widthSubDivisions = (int)Math.Floor((double)image.Width / kernelSize);
        heightSubDivisions = (int)Math.Floor((double)image.Height / kernelSize);
        int totalNumberSubDivisions = widthSubDivisions * heightSubDivisions;
        Rectangle ROI = new Rectangle(0, 0, kernelSize, kernelSize);
        double avgSNR = 0;
        // For each sub-division, calculate the SNR and add it to avgSNR
        for (int v = 0; v < heightSubDivisions; v++)
        {
            for (int u = 0; u < widthSubDivisions; u++)
            {
                // Iterate the sub-division position
                ROI.Location = new Point(u * kernelSize, v * kernelSize);
                // Calculate the SNR of this sub-division
                avgSNR += ComputeSNR(image.GetSubRect(ROI));
            }
        }
        avgSNR /= totalNumberSubDivisions;
        return avgSNR;
    }

    static double ComputeSNR(Image<Gray, byte> image)
    {
        // Local variables
        double mean, sigma, snr;
        // Calculate the mean pixel value for the sub-division
        int population = image.Width * image.Height;
        mean = CvInvoke.Sum(image).V0 / population;
        // Calculate the sigma of the sub-division population
        double sumDeltaSqu = 0;
        for (int v = 0; v < image.Height; v++)
        {
            for (int u = 0; u < image.Width; u++)
            {
                sumDeltaSqu += Math.Pow(image.Data[v, u, 0] - mean, 2);
            }
        }
        sumDeltaSqu /= population;
        sigma = Math.Pow(sumDeltaSqu, 0.5);
        // Calculate and return the SNR value
        snr = sigma == 0 ? mean : mean / sigma;
        return snr;
    }
}
NOTE: Without a reference, it is not possible to differentiate between natural variance/fidelity and "noise". For example, a highly textured background, or a scene with few homogeneous regions, will yield a low AvgSNR. This approach will perform best when the evaluated scene consists mostly of plain, mono-color surfaces, such as the server room or shop front. Grass, for example, would contain a large amount of texture and therefore "noise".
An alternative method is to evaluate your images in the frequency domain following a Fourier transform. Principally, the noise examples you have provided are images containing unwanted high-frequency content. Compute the FFT and flag images that exceed a threshold for high frequencies. Here you will find an example of FFT with Emgu: FFT with Emgu
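As a rough illustration of the frequency-domain idea, the sketch below measures what fraction of a small grayscale patch's spectral energy lies above a cutoff frequency. It is library-free C++ rather than Emgu code, it uses a naive DFT that is only practical for small patches, and the function name, the cutoff and any decision threshold you put on the result are assumptions to be tuned on your own images.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Fraction of the non-DC spectral energy of an n x n grayscale patch
// (row-major) that lies above `cutoff`, measured in cycles per patch.
// Naive O(n^4) DFT: fine for small patches, far too slow for whole images.
double high_frequency_ratio(const std::vector<double>& patch, std::size_t n,
                            double cutoff) {
  assert(patch.size() == n * n);
  const double pi = std::acos(-1.0);
  double high = 0.0;
  double total = 0.0;
  for (std::size_t ky = 0; ky < n; ++ky) {
    for (std::size_t kx = 0; kx < n; ++kx) {
      if (kx == 0 && ky == 0) continue;  // skip DC (the mean brightness)
      std::complex<double> sum(0.0, 0.0);
      for (std::size_t y = 0; y < n; ++y) {
        for (std::size_t x = 0; x < n; ++x) {
          const double angle =
              -2.0 * pi * static_cast<double>(kx * x + ky * y) /
              static_cast<double>(n);
          sum += patch[y * n + x] * std::polar(1.0, angle);
        }
      }
      // Indices above n/2 are negative frequencies; fold them back.
      const double fx = static_cast<double>(kx <= n / 2 ? kx : n - kx);
      const double fy = static_cast<double>(ky <= n / 2 ? ky : n - ky);
      const double energy = std::norm(sum);  // squared magnitude
      total += energy;
      if (std::sqrt(fx * fx + fy * fy) > cutoff) high += energy;
    }
  }
  if (total < 1e-12) return 0.0;  // flat patch: no AC content at all
  return high / total;
}
```

A smooth gradient should score close to 0 and pixel-level noise close to 1. The same caveat as for AvgSNR applies: fine natural texture also counts as high-frequency content.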
I need to write my own implementation for computing the fundamental matrix between two images from corresponding image coordinates, without using OpenCV.
Is it possible to describe this algorithm in its simplest form, as a simple and straightforward formula, in accordance with the following function?
Input arguments:
points1(x,y) - pixel coordinates in the first image, corresponding to points2 in the second image
points2(x,y) - pixel coordinates in the second image, corresponding to points1 in the first image
Output:
F - the fundamental matrix between the first image and the second image
Yes, it is possible to describe the algorithm in the mentioned form.
If you were using OpenCV, you could just call findFundamentalMat, which also provides the 8-point method for computing the fundamental matrix.
The example (in C++) is taken from the OpenCV documentation, but adapted to use the 8-point algorithm instead of RANSAC:
// Example. Estimation of fundamental matrix using the 8-point algorithm
int point_count = 8; // must be >= 8
vector<Point2f> points1(point_count);
vector<Point2f> points2(point_count);
// initialize the points here ...
for( int i = 0; i < point_count; i++ )
{
    points1[i] = ...;
    points2[i] = ...;
}
Mat fundamental_matrix =
    findFundamentalMat(points1, points2, CV_FM_8POINT);
If you want to write your own function, it would look like this (pseudocode, not valid code):
Matrix findFundamentalMat(Array points1, Array points2)
{
    Matrix fundamentalMatrix;
    // compute fundamental matrix based on input points1 and points2 or call OpenCV's findFundamentalMat
    return fundamentalMatrix;
}
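For the "compute fundamental matrix" step, the core of the 8-point method is the epipolar constraint p2^T F p1 = 0, which is linear in the nine entries of F. The sketch below only builds that linear system and evaluates the constraint; the names Point2, constraint_row, build_design_matrix and epipolar_residual are hypothetical. It deliberately leaves out the remaining steps, which need a linear algebra routine: solving A f = 0 via SVD, enforcing rank 2 on F, and normalising the coordinates beforehand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point2 {
  double x, y;
};

// One row of the 8-point design matrix A for a correspondence p1 <-> p2,
// chosen so that row . f = p2^T F p1, where f is F flattened row by row.
std::array<double, 9> constraint_row(const Point2& p1, const Point2& p2) {
  return {p2.x * p1.x, p2.x * p1.y, p2.x,
          p2.y * p1.x, p2.y * p1.y, p2.y,
          p1.x,        p1.y,        1.0};
}

// Stacks one constraint row per correspondence. F is the null vector of
// this matrix (found with SVD in practice; not implemented here).
std::vector<std::array<double, 9>> build_design_matrix(
    const std::vector<Point2>& points1, const std::vector<Point2>& points2) {
  assert(points1.size() == points2.size());
  assert(points1.size() >= 8);  // the 8-point method needs at least 8 pairs
  std::vector<std::array<double, 9>> rows;
  rows.reserve(points1.size());
  for (std::size_t i = 0; i < points1.size(); ++i) {
    rows.push_back(constraint_row(points1[i], points2[i]));
  }
  return rows;
}

// Algebraic residual p2^T F p1 for a row-major 3x3 matrix F; it is zero
// for a perfect correspondence.
double epipolar_residual(const std::array<double, 9>& f, const Point2& p1,
                         const Point2& p2) {
  const std::array<double, 9> row = constraint_row(p1, p2);
  double residual = 0.0;
  for (std::size_t i = 0; i < row.size(); ++i) residual += row[i] * f[i];
  return residual;
}
```

The residual function is also handy for checking a candidate F against your input points once you have one.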
What is the purpose of the Tiling layer in Caffe? It seems it is a form of reshaping the input, however I'm wondering how exactly it works and where it could be applied?
This is the source code:
template <typename Dtype>
void TilingLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  TilingParameter tiling_param = this->layer_param_.tiling_param();
  tile_dim_ = tiling_param.tile_dim();
  tile_dim_sq_ = tile_dim_ * tile_dim_;
  CHECK(tile_dim_) << "tile_dim must be specified.";
  CHECK_GT(tile_dim_, 0) << "tile_dim must be positive.";
}

template <typename Dtype>
void TilingLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  CHECK_EQ(top.size(), 1);
  input_channels_ = bottom[0]->channels();
  input_height_ = bottom[0]->height();
  input_width_ = bottom[0]->width();
  output_channels_ = bottom[0]->channels() / tile_dim_sq_;
  output_width_ = input_width_ * tile_dim_;
  output_height_ = input_height_ * tile_dim_;
  count_per_output_map_ = output_width_ * output_height_;
  count_per_input_map_ = input_width_ * input_height_;
  CHECK_EQ(0, input_channels_ % tile_dim_sq_)
      << "The number of input channels for tiling layer must be multiples "
      << "of the tile_dim.";
  top[0]->Reshape(bottom[0]->num(),
      input_channels_ / tile_dim_sq_,
      input_height_ * tile_dim_, input_width_ * tile_dim_);
}
The Tiling layer is different from the Tile layer: the Tiling layer is like reshape, while the Tile layer is like repmat.
=============== edit to add more details ==========
For tile layer, as shown in the source code, https://github.com/BVLC/caffe/blob/master/src/caffe/layers/tile_layer.cpp
Dtype* top_data = top[0]->mutable_cpu_data();
for (int i = 0; i < outer_dim_; ++i) {
  for (int t = 0; t < tiles_; ++t) {
    caffe_copy(inner_dim_, bottom_data, top_data);
    top_data += inner_dim_;
  }
  bottom_data += inner_dim_;
}
the top data is just the input data repeated tiles_ times along the chosen axis; for example, with an N*C*H*W blob, axis = 1 and tiles = 8, you get a blob with shape N*(C*8)*H*W.
For the Tiling layer, by contrast, the data is rearranged rather than copied: for example, with an N*C*H*W blob and tile_dim = 8, the element count doesn't change after the Tiling layer, but you get a blob with shape N*(C/64)*(H*8)*(W*8).
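The difference is easiest to see in the output shapes. The sketch below only computes shapes, not data; tiling_shape follows the Reshape code shown in the question, while tile_shape models a repmat-style repeat along one axis. The Shape alias and all three function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Shape = std::array<int, 4>;  // N, C, H, W

// Output shape of the Tiling layer: channels shrink by tile_dim^2 while
// height and width each grow by tile_dim, so the element count is unchanged.
Shape tiling_shape(const Shape& in, int tile_dim) {
  assert(tile_dim > 0);
  const int tile_dim_sq = tile_dim * tile_dim;
  assert(in[1] % tile_dim_sq == 0);
  return {in[0], in[1] / tile_dim_sq, in[2] * tile_dim, in[3] * tile_dim};
}

// Output shape of a repmat-style Tile: one axis grows by `tiles`, so the
// element count grows by the same factor.
Shape tile_shape(Shape in, std::size_t axis, int tiles) {
  assert(axis < in.size() && tiles > 0);
  in[axis] *= tiles;
  return in;
}

int element_count(const Shape& s) { return s[0] * s[1] * s[2] * s[3]; }
```

For a 2x64x5x7 input, tiling with tile_dim = 8 gives 2x1x40x56 with the same number of elements, whereas tiling along axis 1 with tiles = 8 gives 2x512x5x7 with eight times as many.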
"Tile" layer in caffe implements similar operation to numpy's tile, or Matlab's repmat functions: it copies the content of an array along a specified dimension.
For example, suppose you have a 2D "attention" (or "saliency") map, and you want to weigh the features according to these weights: give more weight to "salient" regions and less to non-"salient" regions. One way to achieve that is to multiply (element-wise) the 3D feature map by the 2D saliency map. To do so you need to "Tile" the saliency map along the channel dimension (from 2D to 3D) and then apply an "Eltwise" layer.
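The saliency example can be sketched on flat arrays as follows. This is plain C++ rather than a Caffe net definition, and tile_along_channels and eltwise_prod are hypothetical helpers that mimic a Tile along the channel axis followed by an element-wise product.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Repeats a 2D map (H*W values) `channels` times, turning it into a
// C x H x W volume laid out channel by channel.
std::vector<float> tile_along_channels(const std::vector<float>& map2d,
                                       std::size_t channels) {
  std::vector<float> out;
  out.reserve(map2d.size() * channels);
  for (std::size_t c = 0; c < channels; ++c) {
    out.insert(out.end(), map2d.begin(), map2d.end());
  }
  return out;
}

// Element-wise product of two equally sized volumes.
std::vector<float> eltwise_prod(const std::vector<float>& a,
                                const std::vector<float>& b) {
  assert(a.size() == b.size());
  std::vector<float> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] * b[i];
  return out;
}
```

Calling eltwise_prod(features, tile_along_channels(saliency, C)) scales every channel of the feature map by the same per-pixel weight.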
I am trying to find an efficient way to see if one image is a subset of another (meaning that each unique pixel value in one image is also found in the other). The repetition and ordering of the pixels do not matter.
I am working in Java, so I would like all of my operations to be completed in OpenCV for efficiency's sake.
My first idea was to export a list of unique pixel values, and compare it to the list from the second image.
As there is not a built in function to extract unique pixels, I abandoned this approach.
I also understand that I can find the locations of a particular color with the inclusive inRange, and findNonZero operations.
Core.inRange(image, color, color, tempMat); // inclusive
Core.findNonZero(tempMat, colorLocations);
Unfortunately, this does not provide an adequate answer, as it would need to be executed per color, and would still require extracting unique pixels.
Essentially, I'm asking if there is a clever way to use the built in OpenCV functions to see if an image is comprised of the pixels found in another image.
I understand that this will not work for slight color differences. I am working on a limited dataset, and care about the exact pixel values.
To put the question more mathematically:
Because the only thing you are interested in is the pixel values, I would suggest doing the following:
Compute the histogram of image 1 using hist1 = calcHist()
Compute the histogram of image 2 using hist2 = calcHist()
Calculate the difference vector diff = hist1 - hist2
Check that each bin of the histogram of the sub-image is less than or equal to the corresponding bin in the histogram of the bigger image
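The steps above can be sketched without OpenCV as follows. Note one deliberate difference, which is an assumption on my part: because the question says repetition does not matter, this version only records whether a colour occurs at all instead of comparing bin counts. The Bgr struct and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Bgr {
  std::uint8_t b, g, r;
};

// Packs one pixel into a 24-bit key so each colour maps to a unique bin.
std::size_t pack(const Bgr& p) {
  return (static_cast<std::size_t>(p.b) << 16) |
         (static_cast<std::size_t>(p.g) << 8) |
         static_cast<std::size_t>(p.r);
}

// Presence table over all 2^24 possible colours: a histogram reduced to
// "occurs" / "does not occur".
std::vector<bool> colour_presence(const std::vector<Bgr>& pixels) {
  std::vector<bool> present(static_cast<std::size_t>(1) << 24, false);
  for (const Bgr& p : pixels) present[pack(p)] = true;
  return present;
}

// True when every colour that occurs in `sub` also occurs in `super`.
bool is_colour_subset(const std::vector<Bgr>& sub,
                      const std::vector<Bgr>& super) {
  const std::vector<bool> in_super = colour_presence(super);
  for (const Bgr& p : sub) {
    if (!in_super[pack(p)]) return false;
  }
  return true;
}
```

If the pixel counts do matter for you, keep integer counts per bin instead of booleans and compare them bin by bin, as described in the last step.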
Thanks to Miki for the fix.
I will keep Amitay's as the accepted answer, as he absolutely led me down the correct path. I also wanted to share my exact answer for anyone who finds this in the future.
As I stated in my question, I was looking for an efficient way to see if the RGB values of one image were a subset of the RGB values of another image.
I made a function to the following specification:
The Java code is as follows:
private boolean isSubset(Mat subset, Mat subMask, Mat superset) {
    // Get unique set of pixels from both images
    subset = getUniquePixels(subset, subMask);
    superset = getUniquePixels(superset, null);
    // See if the superset pixels encapsulate the subset pixels
    // OR the unique pixels together
    Mat subOrSuper = new Mat();
    Core.bitwise_or(subset, superset, subOrSuper);
    // See if the ORed matrix is equal to the superset
    Mat notEqualMat = new Mat();
    Core.compare(superset, subOrSuper, notEqualMat, Core.CMP_NE);
    return Core.countNonZero(notEqualMat) == 0;
}
subset and superset are assumed to be CV_8UC3 matrices, while subMask is assumed to be CV_8UC1.
private Mat getUniquePixels(Mat img, Mat mask) {
    if (mask == null) {
        mask = new Mat();
    }
    // packed value = c0 + (c1 << 8) + (c2 << 16), one byte per channel
    img.convertTo(img, CvType.CV_32FC3);
    Vector<Mat> splitImg = new Vector<>();
    Core.split(img, splitImg);
    Mat flatImg = Mat.zeros(img.rows(), img.cols(), CvType.CV_32FC1);
    Mat multiplier;
    for (int i = 0; i < splitImg.size(); i++) {
        multiplier = Mat.ones(img.rows(), img.cols(), CvType.CV_32FC1);
        // set powTwo = 2^(8*i), i.e. 1, 256, 65536
        int powTwo = (1 << (8 * i));
        // Set multiplier matrix equal to powTwo;
        Core.multiply(multiplier, new Scalar(powTwo), multiplier);
        // n << (8*i) == n * 2^(8*i);
        // I'm shifting the RGB values into separate parts of the same 32bit
        // integer.
        Core.multiply(multiplier, splitImg.get(i), splitImg.get(i));
        // Add the shifted RGB components together.
        Core.add(flatImg, splitImg.get(i), flatImg);
    }
    // Create a histogram of the pixel values.
    List<Mat> images = new ArrayList<>();
    images.add(flatImg);
    MatOfInt channels = new MatOfInt(0);
    Mat hist = new Mat();
    // 16777216 == 256*256*256
    MatOfInt histSize = new MatOfInt(16777216);
    MatOfFloat ranges = new MatOfFloat(0f, 16777216f);
    Imgproc.calcHist(images, channels, mask, hist, histSize, ranges);
    // Mark every bin that contains at least one pixel.
    Mat uniquePixels = new Mat();
    Core.inRange(hist, new Scalar(1), new Scalar(Float.MAX_VALUE), uniquePixels);
    return uniquePixels;
}
Please feel free to ask questions, or point out problems!