I am using an HRNet estimator (SimpleHRNet) for extracting human poses and heatmaps.
I was processing a lot of videos to save pose keypoints and heatmaps, and after the processing I realized there is a major error in the pose keypoint files, but the heatmap files seem to be okay.
Is there a way to convert the heatmaps into pose keypoints in a fast and efficient manner? I just need a few ideas to implement.
I'm using the COCO keypoint format (17 keypoints), and the heatmaps are stored as .npy files, each containing heatmaps for all 17 keypoints (shape is (17, 96, 72)).
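A minimal sketch of one way to do this, assuming each .npy file holds a (17, 96, 72) array of per-joint heatmaps: take the argmax of each heatmap as the keypoint location and the peak value as its confidence. The frame size used for rescaling and the file name are hypothetical values.

import numpy as np

def heatmaps_to_keypoints(heatmaps, frame_w=288, frame_h=384):
    # heatmaps: (17, 96, 72) array, one heatmap per COCO joint
    n_joints, hm_h, hm_w = heatmaps.shape
    flat_idx = heatmaps.reshape(n_joints, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (hm_h, hm_w))
    confs = heatmaps[np.arange(n_joints), ys, xs]
    # rescale from heatmap coordinates to (hypothetical) frame coordinates
    xs = xs * (frame_w / hm_w)
    ys = ys * (frame_h / hm_h)
    return np.stack([xs, ys, confs], axis=1)   # (17, 3): x, y, confidence

keypoints = heatmaps_to_keypoints(np.load("some_heatmaps.npy"))  # hypothetical file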
I know that cv2.face.LBPHFaceRecognizer_create() is used to recognize faces in real time,
but I want to know what its function is, what is inside this instruction, and how it works.
I want to know its structure: for example, does it take the image, extract characteristics in LBPH form, then train on the images (with the trainer) and compare the images so it can recognize them?
Any information or document that can help, please share it with me.
LBP (Local Binary Patterns) is one way to extract characteristic features of an object (it could be a face, a coffee cup, or anything that has a visual representation). The LBP algorithm is really straightforward and can even be done manually (pixel thresholding + pixel-level arithmetic operations).
LBP Algorithm:
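A rough sketch of the basic 3x3 LBP computation in Python/NumPy, as an illustration of the thresholding idea rather than OpenCV's exact implementation:

import numpy as np

def lbp_image(gray):
    # gray: 2D uint8 grayscale image; returns the LBP code of each inner pixel
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.int32)
    # 8 neighbours of the centre pixel, one bit per neighbour
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neighbour >= center).astype(np.int32) << bit
    return codes.astype(np.uint8)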
There is a "training" part in OpenCV's FaceRecognizer methods. Don't make this confuse you, there is no deep learning approach here. Just simple math.
OpenCV transforms LBP images into histograms to store the spatial information of faces, using the representation proposed by Timo Ahonen, Abdenour Hadid, and Matti Pietikäinen ("Face recognition with local binary patterns", in Computer Vision - ECCV 2004, pages 469–481, Springer, 2004). It divides the LBP image into local regions of size m, extracts a histogram for each region, and concatenates them.
After having information about one person's (one label's) face, the rest is simple. During inference, it computes the test face's LBP, divides it into regions, and creates a histogram. Then it compares its Euclidean distance to the trained faces' histograms; if the distance is less than the tolerance value, it counts as a match. (Other distance measures can also be used: chi-squared distance, absolute value, etc.)
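A rough sketch of that region-histogram matching step (the grid size and tolerance below are made-up illustration values, not OpenCV's defaults):

import numpy as np

def lbph_descriptor(lbp, grid=(8, 8)):
    # split the LBP image into grid cells and concatenate one 256-bin
    # histogram per cell (grid size is an arbitrary illustration value)
    hists = []
    for row in np.array_split(lbp, grid[0], axis=0):
        for cell in np.array_split(row, grid[1], axis=1):
            hist, _ = np.histogram(cell, bins=256, range=(0, 256))
            hists.append(hist)
    return np.concatenate(hists).astype(np.float32)

def is_match(query_hist, trained_hist, tolerance=100.0):
    # tolerance is a made-up value; OpenCV exposes a similar threshold parameter
    return np.linalg.norm(query_hist - trained_hist) < tolerance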
I am trying to use FREAK in OpenCV to detect features and extract descriptors, then build my BOW vocabulary, and for each image use the vocabulary to match with BOW. You know, the whole thing. I know BOW can be used with other descriptors like SIFT or SURF, but it is not clear to me whether FREAK descriptors, which are binary, can be used with BOW. More specifically, when OpenCV builds a BOW vocabulary, it uses k-means clustering. It is not clear to me what distance function the k-means clustering algorithm uses. For binary descriptors like FREAK, Hamming distance seems to be the only choice.
It looks to me like OpenCV's k-means only uses Euclidean distance when calculating distances, bummer. Looks like I have to build my own k-means and my own vocabulary matching. Any smart people out there know a workaround?
Thanks!
I read in a paper that FREAK is not easy to use. Here is the excerpt from the paper: "....These algorithms cannot be easily used in many retrieval algorithms because they must be compared with a Hamming distance, which is not easily adapted to accelerated search structures such as vocabulary trees or Approximate Nearest Neighbors (ANN)...."
(ORB, FREAK and BRISK)
FREAK works with locality-sensitive hashing. You can use it with FLANN (Fast Library for Approximate Nearest Neighbors), included in OpenCV.
For the BOW, only the first 5 to 8 bytes of the descriptor might be sufficient to construct the tree.
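For reference, a hedged sketch of matching binary descriptors with OpenCV's FLANN matcher and an LSH index (Python API; the LSH parameters are common example values, not tuned ones):

import cv2

# index type 6 selects FLANN's LSH index in OpenCV's Python bindings
FLANN_INDEX_LSH = 6
index_params = dict(algorithm=FLANN_INDEX_LSH,
                    table_number=12,      # example value
                    key_size=20,          # example value
                    multi_probe_level=2)  # example value
search_params = dict(checks=50)

matcher = cv2.FlannBasedMatcher(index_params, search_params)
# des1, des2: uint8 binary descriptor matrices from detectAndCompute()
# matches = matcher.knnMatch(des1, des2, k=2)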
Basically, you first have to do:
SurfFeatureDetector surf(400);
surf.detect(image1, keypoints1);
And then:
SurfDescriptorExtractor surfDesc;
surfDesc.compute(image1, keypoints1, descriptors1);
Why are detect and compute 2 different operations?
Doesn't doing the compute after the detect create redundant loops?
I found that .compute is the most expensive part of my application: .detect is done in 0.2 s, while .compute takes ~1 s. Is there any way to speed up .compute?
The detection of keypoints is just a process for selecting points in the image that are considered "good features".
The extraction of descriptors for those keypoints is a completely different process, which encodes properties of each feature (such as contrast with its neighbours) so it can be compared with keypoints from different images, at different scales and orientations.
The way you describe a keypoint can be crucial for successful matching, and this is really the key factor. The way you describe a keypoint also determines the matching speed. For example, you can describe it as a float vector or as a binary sequence.
There is a difference between detecting the keypoints in an image and computing the descriptors for those keypoints. For example, you can extract SURF keypoints and compute SIFT descriptors for them. Note that in the DescriptorExtractor::compute method, filters are applied to the keypoints:
KeyPointsFilter::runByImageBorder()
KeyPointsFilter::runByKeypointSize();
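To illustrate that separation, a minimal sketch using OpenCV's Python API: detect keypoints with one algorithm (FAST, purely as an example) and compute SIFT descriptors for them. The image path is hypothetical.

import cv2

img = cv2.imread("image1.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path

detector = cv2.FastFeatureDetector_create()  # keypoint detection only
extractor = cv2.SIFT_create()                # descriptor computation only

keypoints = detector.detect(img, None)
keypoints, descriptors = extractor.compute(img, keypoints)
# descriptors: one 128-dimensional row per keypoint that survives filtering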
Picking up from where Jay_Rock left off, you can improve those processing times by using a binary descriptor offered by algorithms like ORB, BRISK or FREAK. Not only are they much more compact (an ORB descriptor is 32 bytes, versus 64 floats for SURF), they also offer methods for computing descriptors that are about as robust as SURF's and much faster.
If you eventually want to perform matching operations between descriptors, this is done by calculating the Hamming distance between them. Given that it's an XOR operation between two binary strings, it takes only a few milliseconds to run.
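A small sketch of that idea with ORB and OpenCV's brute-force Hamming matcher (Python API, hypothetical image paths):

import cv2

img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)  # hypothetical paths
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming distance is the natural metric for binary descriptors
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)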
I have extracted the features using OpenCV (open source).
I have done these steps using these 2 classes:
SiftFeatureDetector
SiftDescriptorExtractor
from which I got a 128*128 matrix of descriptors, and I think I will use this matrix to train the features...
What I'm confused about is the following:
When I want to train the features, I should use a matrix with one row per feature, where every row contains the information about that feature... so it might be a matrix of
number of features * 6
For example, I got 344 features in an image, and I got a 128*128 matrix for the descriptor, which I need in order to train my features,
but as I mentioned, I'm just getting a 128*128 matrix... so what's the problem?
And what should I use to train later on?
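For reference, a minimal sketch of the SIFT detect/compute flow in OpenCV's Python API (hypothetical image path), showing where the descriptor matrix comes from and its shape:

import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path

sift = cv2.SIFT_create()
keypoints = sift.detect(img, None)
keypoints, descriptors = sift.compute(img, keypoints)

# SIFT descriptors are 128-dimensional, so for e.g. 344 keypoints the
# descriptor matrix has shape (344, 128)
print(len(keypoints), descriptors.shape)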
Have you looked at the descriptor_extractor_matcher.cpp, or the matcher_simple.cpp samples from OpenCV? Also, could you post the code you are using to detect the features?
Several questions have been asked about the SIFT algorithm, but they all seem focussed on a simple comparison between two images. Instead of determining how similar two images are, would it be practical to use SIFT to find the closest matching image out of a collection of thousands of images? In other words, is SIFT scalable?
For example, would it be practical to use SIFT to generate keypoints for a batch of images, store the keypoints in a database, and then find the ones that have the shortest Euclidean distance to the keypoints generated for a "query" image?
When calculating the Euclidean distance, would you ignore the x, y, scale, and orientation parts of the keypoints, and only look at the descriptor?
There are several approaches.
One popular approach is the so-called bag-of-words representation, which matches based solely on how many descriptors match, thus ignoring the location part (x, y, scale, and orientation) and looking only at the descriptor (see the sketch below).
Efficient querying of a large database may use approximate methods like locality-sensitive hashing.
Other methods may involve vocabulary trees or other data structures.
For an efficient method that also takes location information into account, check out pyramid match kernels.
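A hedged sketch of the bag-of-words idea mentioned above, using OpenCV's Python API and NumPy; the vocabulary size and image paths are arbitrary illustration values:

import cv2
import numpy as np

sift = cv2.SIFT_create()

def sift_descriptors(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    return desc

# 1. Build a vocabulary of k visual words from all training descriptors
train_paths = ["img1.png", "img2.png"]  # hypothetical paths
all_desc = np.vstack([sift_descriptors(p) for p in train_paths]).astype(np.float32)
k = 100  # arbitrary vocabulary size
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, _, vocabulary = cv2.kmeans(all_desc, k, None, criteria, 3, cv2.KMEANS_PP_CENTERS)

# 2. Represent an image as a normalised histogram over the k visual words
def bow_histogram(desc):
    dists = np.linalg.norm(desc[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=k).astype(np.float32)
    return hist / hist.sum()

# A query image's histogram can then be compared to stored histograms
# (e.g. with Euclidean or cosine distance) to find the closest match.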