Weighting Vector Dimensions in Mahout - mahout

Is there a way to weight vector dimensions in Mahout?
I want to build a content-based recommender, so I am experimenting with building vectors for each item and then running a cosine distance measure. See the simple example below.
DenseVector a = new DenseVector(new double[] {0.11, 510, 1});
DenseVector b = new DenseVector(new double[] {0.23, 650, 3});
CosineDistanceMeasure cosineDistanceMeasure = new CosineDistanceMeasure();
System.out.println(cosineDistanceMeasure.distance(a, b));
However, some vector dimensions are more important than others, so I was wondering how I would go about adding a weight to each dimension.
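Mahout's API aside, the underlying math is easy to sketch. Below is a minimal NumPy illustration (not a statement about Mahout's classes) of one common approach: scale each dimension by a per-dimension weight before computing the cosine distance. The weights are made up; the same pre-scaling could be applied to the DenseVectors before handing them to CosineDistanceMeasure.
import numpy as np
def weighted_cosine_distance(a, b, w):
    # scaling both vectors by sqrt(w) turns the plain cosine into the
    # weighted form sum(w_i*a_i*b_i) / (||a||_w * ||b||_w)
    a = np.asarray(a, dtype=float) * np.sqrt(w)
    b = np.asarray(b, dtype=float) * np.sqrt(w)
    return 1.0 - a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
a = [0.11, 510, 1]
b = [0.23, 650, 3]
w = [10.0, 1.0, 0.5]  # hypothetical per-dimension weights
print(weighted_cosine_distance(a, b, w))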

Related

Are ORB feature descriptors affected if the keypoints were detected through image bucketing (i.e. dividing the image into subimages)?

I've been working on a visual odometry project and in this article, it says:
Another thing that we do in this approach is something that is called “bucketing”. If we just run a feature detector over an entire image, there is a very good chance that most of the features would be concentrated in certain rich regions of the image, while certain other regions would not have any representation. This is not good for our algorithm, since it relies on the assumption of a static scene, and to find the “true” static scene, we must look at all of the image, instead of just certain regions of it. In order to tackle this issue, we divide the images into grids (of roughly 100x100px), and extract at most 20 features from each of these grids, thus maintaining a more uniform distribution of features.
So, I divided my image into 100x100 subimages/buckets. So, if the image is of size 1242x375, all the subimages will be of size 100x100, except the ones on the right edge and bottom edge. Those images will be of size 42x100 (right edge), 100x75 (bottom edge), or 42x75 (bottom right corner).
I'm using the ORB features:
static Ptr<ORB> cv::ORB::create (
int nfeatures = 500,
float scaleFactor = 1.2f,
int nlevels = 8,
int edgeThreshold = 31,
int firstLevel = 0,
int WTA_K = 2,
ORB::ScoreType scoreType = ORB::HARRIS_SCORE,
int patchSize = 31,
int fastThreshold = 20
)
For all the subimages, I'm updating the ORB object as:
int orb_patch_size = std::min(subimage_width, subimage_height) / 4;
orb = cv::ORB::create(50, 1.2f, 8, orb_patch_size, 0, 2, cv::ORB::HARRIS_SCORE, orb_patch_size);
orb->detectAndCompute(subimage, cv::Mat(), sub_img_kps, sub_img_desc);
So, once keypoints in the subimage are detected, I'm translating them to the main image coordinate system:
cv::KeyPoint translated_kp;
for (const auto kp : sub_img_kps) {
translated_kp = kp;
translated_kp.pt.x += subimage_rect.x;
translated_kp.pt.y += subimage_rect.y;
all_kps.push_back(translated_kp);
}
However, what should I do about the feature descriptors (cv::Mat)? Just vertically concatenate the subimages' descriptor cv::Mats into one big all-descriptors cv::Mat? For now, I'm vertically concatenating them:
cv::vconcat(all_desc, sub_img_desc, all_desc);
But the feature matches don't look as good as when I don't divide the image into subimages and compute keypoints & descriptors directly. So, is my approach for combining the feature descriptors (and keypoints) from the subimages correct, is there a better way, or can they not really be combined?
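For reference, here is a compact Python/OpenCV sketch of the same bucketing pipeline, keypoint translation, and descriptor stacking described above. The grid size, per-bucket feature cap, and file name are illustrative assumptions, and for simplicity a single ORB instance with default thresholds is reused rather than re-creating it with a per-bucket patch size.
import cv2
import numpy as np
img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input frame
h, w = img.shape
grid = 100
all_kps, all_desc = [], []
orb = cv2.ORB_create(nfeatures=20)  # small per-bucket cap, as in the article
for y0 in range(0, h, grid):
    for x0 in range(0, w, grid):
        sub = img[y0:y0 + grid, x0:x0 + grid]  # edge buckets are smaller
        kps, desc = orb.detectAndCompute(sub, None)
        if desc is None:
            continue
        for kp in kps:
            # translate each keypoint back into full-image coordinates
            all_kps.append(cv2.KeyPoint(kp.pt[0] + x0, kp.pt[1] + y0, kp.size,
                                        kp.angle, kp.response, kp.octave,
                                        kp.class_id))
        all_desc.append(desc)
# one descriptor matrix, row-aligned with all_kps (same idea as cv::vconcat)
all_desc = np.vstack(all_desc) if all_desc else None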

What is a joint histogram and a marginal histogram in image processing?

What are a joint histogram and a marginal histogram in image processing, how do they work, and how do you construct one? Simple examples would be appreciated.
For example, suppose I have a feature space of 10 dimensions and want to build a histogram with each dimension quantized into 20 values. How do I calculate the total number of bins for the joint histogram and for the marginal histograms?
I assume you know what histograms are in general. Joint histograms of data in an N-dimensional feature space are N-dimensional. You just put the data points into N-dimensional bins (typically Cartesian products of N 1-dimensional grids). Marginal histograms are lower-dimensional histograms in which one or more dimensions have been ignored. Joint and marginal histograms are very similar to joint/marginal distributions. For the example in the question, 10 dimensions quantized into 20 values each give a joint histogram with 20^10 bins, while each 1-dimensional marginal histogram has just 20 bins.
How to compute them depends on your specific situation. You could compute marginal histograms from joint histograms by integrating (summing) over some dimensions, or you could build them the same way as joint histograms but with fewer dimensions. In Matlab, for example, histcounts2 computes a joint histogram of 2D data; for higher-dimensional data, accumarray might be of help. In Python with NumPy, histogramdd generates multi-dimensional histograms. Typically the N-dimensional bins are Cartesian products of bins in each dimension, and the resulting histograms are simple NumPy arrays (in Python) or matrices (in Matlab).
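Before the Matlab example below, here is the same idea as a short NumPy sketch, using made-up 2-D data and 20 bins per dimension:
import numpy as np
rng = np.random.default_rng(0)
x = 3 * rng.standard_normal(10_000)
y = rng.standard_normal(10_000)
edges = np.arange(-10, 11)              # 21 edges -> 20 bins per dimension
joint, _ = np.histogramdd((x, y), bins=(edges, edges))   # shape (20, 20)
marginal_x = joint.sum(axis=1)          # integrate out y
marginal_y = joint.sum(axis=0)          # integrate out x
# marginal_x matches np.histogram(x, bins=edges)[0], except for points whose
# other coordinate falls outside the grid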
Simple example in N=2D (in Matlab)
Let's create some data first
x = 3*randn(1e4, 1);
y = randn(1e4, 1);
scatter(x, y, '.');
xlim([-10,10]);
ylim([-10,10]);
pbaspect([1,1,1]);
Let's compute the joint histogram
h = histcounts2(x, y, -10:10, -10:10);
Let's display the joint histogram and on each side the marginal histograms, which could have been obtained either by integrating the joint histogram over one dimension or by creating 1D histograms for the data axes separately. Here the marginal histograms are created by just computing 1D histograms (ignoring the other data dimension).
fig = figure;
subplot('Position', [0.35, 0.35, 0.6, 0.6]);
im = imagesc(-10:10, -10:10, h.');
im.Parent.YDir = 'normal';
axis image;
title('joint histogram (x,y)');
subplot('Position', [0.43, 0.1, 0.45, 0.15]);
histogram(x, -10:10);
camroll(180);
title('marginal histogram x');
subplot('Position', [0.2, 0.4, 0.15, 0.55]);
histogram(y, -10:10);
camroll(90);
title('marginal histogram y');
One can see nicely that the marginal histograms simply correspond to sums of the joint histogram along one direction.

Probability index yolov2 darknet openCV3.4

I'm referring to https://github.com/opencv/opencv/blob/master/samples/dnn/yolo_object_detection.cpp
What is meant by line 136?
const int probability_index = 5;
How do I modify the bounding box calculations if I just want to classify the image (I'm not interested in the object detection, just the image classification)?
YOLO models, at least YoloV2voc and TinyYoloV2voc, produce output matrices of shape Nx(C+4), where N is the number of detections, C is the number of classes (including background), and 4 accounts for the bounding-box vector [centerX, centerY, width, height]. So the classification confidences start from the 5th element.
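As a hedged illustration (mirroring the logic around probability_index in the sample rather than an exact port), parsing a single detection row might look like this; the threshold value is an arbitrary assumption:
import numpy as np
def parse_detection(row, probability_index=5, conf_threshold=0.24):
    cx, cy, w, h = row[:4]                 # bounding-box part
    scores = row[probability_index:]       # per-class confidences
    class_id = int(np.argmax(scores))
    confidence = float(scores[class_id])
    if confidence < conf_threshold:
        return None
    return class_id, confidence, (cx, cy, w, h)
If you only care about the class labels, you can keep class_id and confidence per detection and simply skip the bounding-box calculations and drawing in the sample.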

What is "linear projection" in convolutional neural network [closed]

I am reading through the Residual Learning paper, and I have a question.
What is the "linear projection" mentioned in section 3.2? It probably looks pretty simple once you get it, but I could not get the idea...
Can someone provide a simple example?
First up, it's important to understand what x, y and F are and why they need any projection at all. I'll try to explain in simple terms, but a basic understanding of ConvNets is required.
x is the input data (a tensor) of the layer; in the case of ConvNets its rank is 4. You can think of it as a 4-dimensional array. F is usually a conv layer (conv+relu+batchnorm in this paper), and y combines the two together (forming the output channel). The result of F is also of rank 4, and most of its dimensions are the same as in x, except for one. That's exactly the mismatch the transformation has to patch.
For example, the shape of x might be (64, 32, 32, 3), where 64 is the batch size, 32x32 is the image size and 3 stands for the (R, G, B) color channels. F(x) might be (64, 32, 32, 16): the batch size never changes and, for simplicity, the ResNet conv layer doesn't change the image size either, but it will likely use a different number of filters, e.g. 16.
So, in order for y=F(x)+x to be a valid operation, x must be "reshaped" from (64, 32, 32, 3) to (64, 32, 32, 16).
I'd like to stress that "reshaping" here is not what numpy.reshape does.
Instead, the last (channel) dimension of x is padded with 13 zeros, like this:
pad(x=[1, 2, 3],padding=[7, 6]) = [0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 0]
If you think about it, this is a projection of a 3-dimensional vector onto 16 dimensions. In other words, we start to think that our vector is the same, but there are 13 more dimensions out there. None of the other x dimensions are changed.
Here's the link to the code in Tensorflow that does this.
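A minimal NumPy sketch of this zero-padding view, using the shapes and the [7, 6] split from the example above (purely illustrative):
import numpy as np
x = np.random.rand(64, 32, 32, 3)           # (batch, height, width, channels)
# pad only the channel axis: 7 zeros before, 6 after, so 3 -> 16 channels
x_padded = np.pad(x, pad_width=[(0, 0), (0, 0), (0, 0), (7, 6)])
print(x_padded.shape)                       # (64, 32, 32, 16)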
A linear projection is one where each new feature is simply a weighted sum of the original features. As in the paper, this can be represented by matrix multiplication. If x is the vector of N input features and W is an M-by-N matrix, then the matrix product Wx yields M new features, where each one is a linear projection of x. Each row of W is a set of weights that defines one of the M linear projections (i.e., each row of W contains the coefficients for one of the weighted sums of x).
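A tiny NumPy sketch of that statement, with made-up numbers:
import numpy as np
N, M = 3, 16
x = np.array([0.5, -1.2, 2.0])              # N input features
W = np.random.rand(M, N)                    # each row = weights for one output
y = W @ x                                   # M new features, each a weighted sum of x
print(y.shape)                              # (16,)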
In Pytorch (in particular torchvision/models/resnet.py), at the end of a Bottleneck you will have one of two scenarios:
The number of channels of the input x, say x_c (not the spatial resolution, but the channels), does not match the output of the Bottleneck's conv3 layer, say d dimensions. This is handled by a 1 by 1 convolution with in_planes = x_c and out_planes = d, with stride 1, followed by batch normalization, and then the addition F(x) + x takes place, assuming x and F(x) have the same spatial resolution.
Neither the spatial resolution of x nor its number of channels matches the output of the Bottleneck layer, in which case the 1 by 1 convolution mentioned above needs stride 2 so that both the spatial resolution and the number of channels match for the element-wise addition (again with batch normalization of x before the addition).
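A short PyTorch sketch of that 1x1-convolution shortcut, following the downsample pattern in torchvision/models/resnet.py; the channel counts and stride below are example values:
import torch
import torch.nn as nn
x_c, d, stride = 64, 256, 1      # use stride=2 when the spatial size changes too
downsample = nn.Sequential(
    nn.Conv2d(x_c, d, kernel_size=1, stride=stride, bias=False),
    nn.BatchNorm2d(d),
)
x = torch.randn(1, x_c, 56, 56)
identity = downsample(x)         # shape (1, d, 56, 56), ready for F(x) + identity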

Fast RCNN: Applying ROIs to feature map

In Fast RCNN, I understand that you first apply a CNN to the image in order to get a feature map. Then, you use the ROIs generated by an external object detector (selective search) to get the bounding boxes of potential objects of interest. However, I don't understand how you get the features from the feature map associated with a region of interest.
For example, I apply selective search and get a list of (x, y, width, height). Then I apply a CNN (Inception v3) to get a 2048x1 feature vector (from the pool3 layer). How do I get the regions of interest from my feature vector of the image, or am I interpreting this method incorrectly?
Thanks for your help!
When you use a CNN for a classification task, your network has two parts:
Feature generator. The part that takes an image of size WI x HI with CI channels and generates a feature map of size WF x HF with CF channels. The relation between the image size and the feature map size depends on the structure of your NN (for example, on the number of pooling layers and their strides). We can also multiply the strides of all layers in this part of the CNN to get a Step value (we will use it later).
Classifier. The part that classifies vectors with WF*HF*CF components into classes.
Now, if you have an image of size W x H, with W > WI and H > HI, you can apply the first part of your network (because this part contains only convolution and pooling layers) and get a feature map with WFB > WF and HFB > HF.
Every window of size WF x HF in this feature map corresponds to a window of size WI x HI on the source image.
The rectangle (0, 0, WF, HF) on the feature map corresponds to the rectangle (0, 0, WI, HI) on the image, the rectangle (1, 0, WF+1, HF) corresponds to the rectangle (Step, 0, WI + Step, HI) on the image, etc.
Therefore, if you have the coordinates of an ROI on the feature map, you can map back to the ROI on the source image (and vice versa).
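As a hedged sketch of that image-to-feature-map mapping, with Step assumed to be 16 (a VGG16-like total stride; your network may differ):
def roi_to_feature_map(roi, step=16):
    # map an (x, y, width, height) proposal from image coordinates onto the
    # feature map by dividing by the network's total stride ("Step" above)
    x, y, w, h = roi
    return x // step, y // step, max(1, round(w / step)), max(1, round(h / step))
# e.g. a 224x224 proposal at (160, 96) covers a 14x14 window of the feature map
print(roi_to_feature_map((160, 96, 224, 224)))
The key point is that the mapping happens on the spatial feature map, before it is collapsed into a single global feature vector.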
