I am stuck on my homework about convolution filters. I of course don't want the answer outright, but I would like some guidance on how to approach it. The question is: given two kernels (k1 and k2), how do you find a single equivalent filter that applies the effects of both filters to an image?
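Since you asked for guidance rather than the answer: one way to explore this yourself is to build a small test harness that applies k1 and k2 in sequence and compares the result against a single candidate kernel you hypothesise. A minimal sketch, assuming SciPy is available (the candidate kernel is deliberately left for you to fill in):

```python
import numpy as np
from scipy.signal import convolve2d

def apply_sequentially(img, k1, k2):
    # filter with k1, then filter the result with k2
    return convolve2d(convolve2d(img, k1, mode="full"), k2, mode="full")

def apply_single(img, k):
    # filter once with a single candidate "combined" kernel
    return convolve2d(img, k, mode="full")

img = np.random.rand(16, 16)
k1 = np.random.rand(3, 3)
k2 = np.random.rand(3, 3)

k_candidate = ...  # your hypothesis for the equivalent kernel goes here
# print(np.allclose(apply_sequentially(img, k1, k2), apply_single(img, k_candidate)))
```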
It might not be clear from the question what I mean, so here is an example: how can we apply masked language modelling to paired text and image inputs using a multimodal model like LXMERT? Say there is some text with a masked word ("This is a [MASK]") and an accompanying image (maybe of a cat); how can we apply MLM to predict the masked word as "cat"? How can we implement such a thing and get MLM scores out of it using the Hugging Face library API? A code snippet illustrating this would be great. Any help towards a better understanding would be appreciated.
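Not a full answer, but a rough sketch of how this might look with the Hugging Face transformers API, assuming the unc-nlp/lxmert-base-uncased checkpoint. Note that LXMERT expects region features from an external Faster R-CNN detector; the random tensors below are stand-ins for those, so meaningful predictions require plugging in real features:

```python
import torch
from transformers import LxmertTokenizer, LxmertForPreTraining

tokenizer = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-base-uncased")
model = LxmertForPreTraining.from_pretrained("unc-nlp/lxmert-base-uncased")

# Placeholder visual inputs: 36 region features (2048-d) with normalized
# box coordinates, which a real pipeline would get from a Faster R-CNN.
visual_feats = torch.randn(1, 36, 2048)
visual_pos = torch.rand(1, 36, 4)

text = f"This is a {tokenizer.mask_token}"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    out = model(input_ids=inputs["input_ids"],
                attention_mask=inputs["attention_mask"],
                visual_feats=visual_feats,
                visual_pos=visual_pos)

# prediction_logits holds the masked-LM scores over the vocabulary
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
probs = out.prediction_logits[0, mask_pos].softmax(dim=-1)
top5 = probs.topk(5)
print([tokenizer.decode([idx]) for idx in top5.indices[0]])
```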
Honestly, I am learning about neural networks, but I have a question about the activation part.
I know the question is general and there is a lot of explanation around the internet, but I still don't understand it clearly.
Why do we need to differentiate the sigmoid function? Why can't we just use it as is?
It would be good if you could give a clear explanation. I've watched many videos on YouTube and read many articles about it, but I still don't get it.
Thanks for your help.
Your question is not entirely clear, but I assume you are asking: "Why can't we just use the sigmoid function without having to calculate its derivative?"
Your question is also very broad, so my answer is broad as well; you will need to read more to understand all the details, for which I'll try to provide links.
Activation function: as the name suggests, we want to know whether a given node is "on" or "off", and the sigmoid function provides an easy way to squash a continuous input (x) into the range (0, 1).
Use cases vary, and the sigmoid has particular properties, which is why there are many alternative activation functions such as tanh, ReLU, etc. Read more here: https://en.wikipedia.org/wiki/Sigmoid_function
Differentiate (not "derivate"): in most models we want to find the best-fit weight parameters for all our activation functions. To do this, we typically minimise a "cost" function that describes how well our model predicts the observed data. One way to solve this optimisation problem is gradient descent: each step updates the parameters by following the slope of the multi-dimensional cost surface, and computing that slope requires the gradient, and hence the derivative, of each activation function. This is why backpropagation, which uses gradient descent to optimise the network, requires that the activation functions you use be (in most cases) differentiable.
Read more here: https://en.wikipedia.org/wiki/Gradient_descent
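To make the distinction concrete, here is a small NumPy sketch: the forward pass uses the sigmoid itself, while the backward (gradient-descent) step uses its derivative via the chain rule. The numbers are illustrative only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # the derivative, needed only when training

# One gradient-descent step for a single weight w on one example (x, y),
# with squared-error loss L = 0.5 * (pred - y)**2. Numbers are arbitrary.
x, y, w, lr = 2.0, 1.0, 0.5, 0.1
pred = sigmoid(w * x)                          # forward pass: sigmoid itself
grad = (pred - y) * sigmoid_prime(w * x) * x   # backward pass: chain rule
w -= lr * grad
print(w)
```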
I suggest that if you have a deeper question, you take it to one of the machine-learning Stack Exchange sites.
This question is about the computer algebra system Magma (not the linear algebra library), and is crossposted from scicomp.SE.
Please forgive me if this is off-topic; I am a regular user of the Stack Exchange network, but this is my first post on Stack Overflow, and I am looking for the right home for this kind of question. (In principle it seems to me to be scicomp.SE, but it hasn't gotten an answer there in 4 days, so I wanted to see whether Stack Overflow yielded a different result.)
Suppose one has constructed a polynomial algebra A over a ring R in Magma. How does one construct the sub-R-algebra of A generated by a given list of elements of A?
This seems to me to be a very basic operation, so I can't believe there isn't a way to do it, but so far I haven't found it in the handbook. (I see functionality for constructing subalgebras of matrix algebras and of endomorphism rings of abelian varieties, but not of polynomial rings.)
I have 5000 images, and each image yields a feature vector with about 1000 dimensions (HOG features). Some of the images are very similar, so I want to remove the near-duplicates. Is there a way to achieve this?
EDIT:
As #thedarkside ofthemoon suggested, let me explain a little more about what I am trying to do. I am using SVM + HOG features for image classification. I have prepared some training data, but some of the training images are very similar, and I want to remove the near-duplicates to reduce computation cost. I don't know whether removing similar images has a side effect on the final classification rate, so a good criterion of 'similarity' must be found. That's what I am trying to do.
Another way (not using HOG features) is to compute a color histogram for each image and compare it against the others.
Like this:
Take the first image and compute its histogram.
Then, for each of the other images, calculate its histogram and compare it with the first one.
If you find a close match between the histograms, you can discard that image. Using CV_COMP_CORREL you get a correlation score, where values close to 1 indicate a strong match.
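As a rough sketch of this in OpenCV's Python bindings (file names and the threshold are placeholders; in recent OpenCV versions the constant is spelled cv2.HISTCMP_CORREL):

```python
import cv2

def color_hist(path):
    img = cv2.imread(path)
    # 8x8x8-bin BGR histogram, normalized so images of different sizes compare fairly
    hist = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8],
                        [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

h_ref = color_hist("image_000.jpg")
for path in ["image_001.jpg", "image_002.jpg"]:
    score = cv2.compareHist(h_ref, color_hist(path), cv2.HISTCMP_CORREL)
    if score > 0.9:  # threshold to tune on your data
        print(path, "is a likely near-duplicate, correlation:", score)
```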
Well, it depends on what you mean by "similar". Currently my favorite image-similarity descriptor is the GIST descriptor.
http://people.csail.mit.edu/torralba/code/spatialenvelope/
but it is not in OpenCV. However, it is implemented in C here, so it can be added to a C++ project (via extern "C") if you're using the C++ OpenCV; I'm not sure about Python, sorry.
http://people.rennes.inria.fr/Herve.Jegou/software.html
I have found it to be pretty good, and quite efficient.
(Sorry this is not a direct OpenCV solution, but I feel it is a reasonable answer, as the GIST C code can be added to a C++ project and works nicely.)
EDIT:
If you just want to remove images with similar HOG descriptors, you can use:
http://docs.opencv.org/modules/ml/doc/k_nearest_neighbors.html
or
http://docs.opencv.org/trunk/modules/flann/doc/flann_fast_approximate_nearest_neighbor_search.html
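Since the HOG vectors are fixed-length, a nearest-neighbour search over them is straightforward. Here is a rough sketch using OpenCV's FLANN matcher to flag near-duplicates; the random data and the distance threshold are placeholders you would replace and tune:

```python
import numpy as np
import cv2

# One HOG descriptor per image; random data stands in for the real vectors.
hog_vectors = np.random.rand(5000, 1000).astype(np.float32)

index_params = dict(algorithm=1, trees=5)  # 1 = FLANN_INDEX_KDTREE
flann = cv2.FlannBasedMatcher(index_params, dict(checks=50))

# Match the set against itself with k=2: each vector's nearest neighbour is
# itself, so the second match is its closest distinct neighbour.
matches = flann.knnMatch(hog_vectors, hog_vectors, k=2)

duplicates = set()
for pair in matches:
    if len(pair) == 2 and pair[1].distance < 10.0:  # threshold to tune
        duplicates.add(max(pair[1].queryIdx, pair[1].trainIdx))  # keep one of each pair
```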
I have a large image (5400x3600) that contains multiple CCTV symbols that I need to detect.
The detection takes a lot of time (4-7 minutes) with rotation, and it still fails to resolve certain CCTVs.
What is the best method to match a template like this?
I am using skimage (OpenCV is not an option for me, but I am open to suggestions on that too).
For example, in the images below the template is correctly matched in the second image, but not in the first; I guess this is due to the noise created by the text "BLDG..."
Template:
Source image:
Match result:
The fastest method is probably a cascade of boosted classifiers trained on several variations of your logo, possibly including a few rotations, plus some negative examples (non-logos). You have to roughly scale your overall image so the test and training examples are approximately matched in scale. Unlike SIFT or SURF, which spend a lot of time searching for interest points and creating descriptors for both learning and searching, binary classifiers shift most of the burden to the training stage, so testing or searching will be much faster.
In short, the cascade runs so that the very first test discards a large portion of the image. If the first test passes, the others follow and refine; they are super fast, consisting on average of just a few intensity comparisons around each point. Only a few locations pass the whole cascade, and these can be verified with additional tests such as your rotation-correlation routine.
Thus, the classifiers are effective not only because they quickly detect your object but also because they quickly discard non-object areas. To read more about boosted classifiers, see the corresponding OpenCV documentation section.
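Training the cascade itself happens offline (e.g. with OpenCV's opencv_traincascade tool) and is out of scope here; once you have a trained cascade, the detection side is only a few lines. A minimal sketch, where both file names are placeholders:

```python
import cv2

# Assumes a cascade trained offline with opencv_traincascade and saved as
# 'logo_cascade.xml'; both file names here are placeholders.
cascade = cv2.CascadeClassifier("logo_cascade.xml")
img = cv2.imread("large_plan.png", cv2.IMREAD_GRAYSCALE)

# detectMultiScale slides the cascade over the image at several scales;
# the early stages reject most windows, which is what makes it fast.
boxes = cascade.detectMultiScale(img, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), 255, 2)
cv2.imwrite("detections.png", img)
```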
This problem in general is addressed by Logo Detection. See this for similar discussion.
There are many robust methods for template matching. See this or google for a very detailed discussion.
But from your example I can guess that the following approach would work.
Create a feature for your search image. It essentially has a rectangle enclosing the word "CCTV", so the width, height, angle, and individual character features could be a suitable choice for matching the textual information. (Alternatively, you may use the image containing "CCTV" directly, in which case the method will not be scale-invariant.)
Now, when searching, first detect rectangles. Then use the angle to prune your search space, and apply an image transformation to align the rectangles parallel to the axes (this should take care of the rotation). Then, according to the feature chosen in step 1, match the text content: if you use individual character features, your template-matching step is essentially a classification step; if you use an image for matching, you may use cv::matchTemplate, as in the sketch below.
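For the image-based variant of that last step, here is a minimal sketch of the matching with OpenCV's Python bindings, assuming the scene has already been deskewed so the template is axis-aligned (file names and the threshold are placeholders):

```python
import numpy as np
import cv2

scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)

# normalized cross-correlation; peaks mark likely template locations
result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
h, w = template.shape
ys, xs = np.where(result >= 0.7)  # threshold to tune
for x, y in zip(xs, ys):
    cv2.rectangle(scene, (x, y), (x + w, y + h), 255, 2)
cv2.imwrite("matches.png", scene)
```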
Hope it helps.
Symbol spotting is more complicated than logo spotting because interest points perform poorly on document images such as architectural plans. Many conferences deal with pattern recognition, and every year there are many new algorithms for symbol spotting, so pointing you at the single best method is not possible. You could check the IAPR conferences: ICPR, ICDAR, DAS, GREC (Workshop on Graphics Recognition), etc. These researchers focus on this topic: M. Rusiñol, J. Lladós, S. Tabbone, J.-Y. Ramel, M. Liwicki, etc. They work on several techniques for improving symbol spotting, such as vectorial signatures, graph-based signatures, and so on (check Google Scholar for more papers).
An easy way to start a new approach is to work with simple shapes such as lines, rectangles, and triangles instead of matching everything at once.
Your example can be recognized by shape matching (contour matching), which is much faster than 4 minutes.
For a good match you need careful preprocessing and denoising; a rough sketch follows.
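As a concrete starting point for contour matching, OpenCV's matchShapes compares two contours via Hu moments and is fast. A rough sketch, assuming OpenCV 4's findContours signature and placeholder file names:

```python
import cv2

def main_contour(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # binarize; real plans will need the denoising mentioned above
    _, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)  # take the largest contour as the shape

# Hu-moment based similarity: 0 means identical shapes, larger is less similar.
score = cv2.matchShapes(main_contour("template.png"),
                        main_contour("candidate.png"),
                        cv2.CONTOURS_MATCH_I1, 0.0)
print(score)
```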
Examples can be found at http://www.halcon.com/applications/application.pl?name=shapematch