Searching OpenCV ORB descriptors using ElasticSearch - opencv

I'm storing ORB Descriptors in an ElasticSearch Vector Field and then performing kNN Searches using the new API in ElasticSearch 8.0+.
# Read the image in via URL and convert to Gray
resp = urllib.request.urlopen(URL)
image = np.asarray(bytearray(, dtype="uint8")
image = cv2.imdecode(image, cv2.COLOR_BGR2GRAY)
# Only looking for 16 features since ElasticSearch
# will not let us index a vector larger than 1024
orb = cv2.ORB_create(nfeatures=16)
kp, des = orb.detectAndCompute(image, None)
# Flatten for saving in ElasticSearch
dsc = des.flatten()
# Finally index it in ElasticSearch
This is my mapping for ElasticSearch
mappings: {
  dynamic: 'true',
  properties: {
    image_dense_vector: {
      type: 'dense_vector',
      dims: 1024,
      index: true,
      similarity: 'cosine'
    }
  }
}
And finally this is my search query.
body = {
res =, knn=body)
The data set consists of ~34,000 records.
This will return results if the image passed in is an exact match. But if the image is off even slightly the results that come back are not even close to accurate.
Any suggestions?

First, ponder the ORB paper.
ORB uses the BRIEF descriptor. BRIEF emits binary vectors.
detectAndCompute may give you an array of uint8 but those byte values aren't scalars. Those bytes merely hold the bits.
For binary vectors, you need to use the Hamming distance. "Cosine similarity" doesn't work. Not unless you blow your data up and interpret the 32*8=256 bits as scalars.
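To make the point about bits versus scalars concrete, here is a minimal NumPy sketch. The descriptors are random stand-ins for rows of `des`, and `hamming_distance` and `unpack_descriptor` are names I made up for illustration. It counts differing bits directly, and also shows the "blow up" variant: once each descriptor is unpacked into 256 values of 0 or 1, the squared Euclidean distance equals the Hamming distance.

```python
import numpy as np


def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two packed uint8 descriptors."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


def unpack_descriptor(d: np.ndarray) -> np.ndarray:
    """Expand a 32-byte descriptor into 256 values that are each 0.0 or 1.0."""
    return np.unpackbits(d).astype(np.float32)


rng = np.random.default_rng(0)
# Stand-ins for two rows of the `des` array returned by detectAndCompute.
d1 = rng.integers(0, 256, size=32, dtype=np.uint8)
d2 = d1.copy()
d2[0] ^= 0b00000001  # flip a single bit

bits1, bits2 = unpack_descriptor(d1), unpack_descriptor(d2)
# For 0/1 vectors, squared Euclidean distance and Hamming distance coincide.
squared_l2 = float(np.sum((bits1 - bits2) ** 2))
print(hamming_distance(d1, d2), squared_l2)
```

Note that unpacking multiplies the vector length by 8, so 16 descriptors would need 16*256 = 4096 dimensions, which is above the 1024 limit mentioned in the question.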


How to apply custom encoders to multiple clients at once? How to use custom encoders in run_one_round?

My goal is to implement global top-k subsampling. Gradient sparsification itself is quite simple, and I have already done it by building on the stateful clients example, but now I would like to use encoders, as you recommended here on page 28. Additionally, I would like to average only the non-zero gradients: if we have 10 clients but only 4 have non-zero gradients at a given position in a communication round, I would like to divide the sum of these gradients by 4, not 10. I hope to achieve this by summing the gradients in the numerator and the masks (1s and 0s) in the denominator. Later I will add randomness to the gradient selection, so it is imperative that I create those masks at the same time as the gradients are selected. The code I have right now is:
import tensorflow as tf
from tensorflow_model_optimization.python.core.internal import tensor_encoding as te

class GrandienrSparsificationEncodingStage(te.core.AdaptiveEncodingStageInterface):
    """An example custom implementation of an `EncodingStageInterface`.

    Note: This is likely not what one would want to use in practice. Rather, this
    serves as an illustration of how a custom compression algorithm can be
    provided to `tff`.

    This encoding stage is expected to be run in an iterative manner, and
    alternatively zeroes out values corresponding to odd and even indices. Given
    the determinism of the non-zero indices selection, the encoded structure does
    not need to be represented as a sparse vector, but only the non-zero values
    are necessary. In the decode method, the state (i.e., params derived from the
    state) is used to reconstruct the corresponding indices.

    Thus, this example encoding stage can realize representation saving of 2x.
    """

    ENCODED_VALUES_KEY = 'stateful_topk_values'
    INDICES_KEY = 'indices'
    SHAPES_KEY = 'shapes'
    ERROR_COMPENSATION_KEY = 'error_compensation'

    def encode(self, x, encode_params):
        shapes_list = [tf.shape(y) for y in x]
        flattened = tf.nest.map_structure(lambda y: tf.reshape(y, [-1]), x)
        gradients = tf.concat(flattened, axis=0)
        error_compensation = encode_params[self.ERROR_COMPENSATION_KEY]
        gradients_and_error_compensation = tf.math.add(gradients, error_compensation)
        percentage = tf.constant(0.1, dtype=tf.float32)
        k_float = tf.multiply(percentage, tf.cast(tf.size(gradients_and_error_compensation), tf.float32))
        k_int = tf.cast(tf.math.round(k_float), dtype=tf.int32)
        values, indices = tf.math.top_k(tf.math.abs(gradients_and_error_compensation), k = k_int, sorted = False)
        indices = tf.expand_dims(indices, 1)
        sparse_gradients_and_error_compensation = tf.scatter_nd(indices, values, tf.shape(gradients_and_error_compensation))
        new_error_compensation = tf.math.subtract(gradients_and_error_compensation, sparse_gradients_and_error_compensation)
        state_update_tensors = {self.ERROR_COMPENSATION_KEY: new_error_compensation}
        encoded_x = {self.ENCODED_VALUES_KEY: values,
                     self.INDICES_KEY: indices,
                     self.SHAPES_KEY: shapes_list}
        return encoded_x, state_update_tensors

    def decode(self,
        del num_summands, decode_params, shape # Unused.
        flat_shape = tf.math.reduce_sum([tf.math.reduce_prod(shape) for shape in encoded_tensors[self.SHAPES_KEY]])
        sizes_list = [tf.math.reduce_prod(shape) for shape in encoded_tensors[self.SHAPES_KEY]]
        scatter_tensor = tf.scatter_nd(
        nonzero_locations = tf.nest.map_structure(lambda x: tf.cast(tf.where(tf.math.greater(x, 0), 1, 0), tf.float32) , scatter_tensor)
        reshaped_tensor = [tf.reshape(flat_tensor, shape=shape) for flat_tensor, shape in
                           zip(tf.split(scatter_tensor, sizes_list), encoded_tensors[self.SHAPES_KEY])]
        reshaped_nonzero = [tf.reshape(flat_tensor, shape=shape) for flat_tensor, shape in
                            zip(tf.split(nonzero_locations, sizes_list), encoded_tensors[self.SHAPES_KEY])]
        return reshaped_tensor, reshaped_nonzero

    def initial_state(self):
        return {self.ERROR_COMPENSATION_KEY: tf.constant(0, dtype=tf.float32)}

    def update_state(self, state, state_update_tensors):
        return {self.ERROR_COMPENSATION_KEY: state_update_tensors[self.ERROR_COMPENSATION_KEY]}

    def get_params(self, state):
        encode_params = {self.ERROR_COMPENSATION_KEY: state[self.ERROR_COMPENSATION_KEY]}
        decode_params = {}
        return encode_params, decode_params

    def name(self):
        return 'gradient_sparsification_encoding_stage'

    def compressible_tensors_keys(self):
        return False

    def commutes_with_sum(self):
        return False

    def decode_needs_input_shape(self):
        return False

    def state_update_aggregation_modes(self):
        return {}
I have run some simple tests manually, following the steps you outlined here on page 45. It works, but I have some questions/problems.
1. When I use a list of tensors of the same shape (e.g. two 2x25 tensors) as the input x of encode, it works without any issues, but when I use a list of tensors of different shapes (2x20 and 6x10), it gives an error saying
InvalidArgumentError: Shapes of all inputs must match: values[0].shape = [2,20] != values[1].shape = [6,10] [Op:Pack] name: packed
How can I resolve this issue? As I said, I want to use global top-k, so it is essential that I encode the entire set of trainable model weights at once. Take the CNN model used here: all the tensors have different shapes.
2. How can I do the averaging I described at the beginning? For example, here you have done
mean_factory = tff.aggregators.MeanFactory(
    tff.aggregators.EncodedSumFactory(mean_encoder_fn), # numerator
    tff.aggregators.EncodedSumFactory(mean_encoder_fn), # denominator
)
Is there a way to repeat this with one output of decode going to the numerator and the other going to the denominator? How can I handle dividing 0 by 0? TensorFlow has a divide_no_nan function; can I use it somehow, or do I need to add eps to each?
3. How is partitioning handled when I use encoders? Does each client get a unique encoder holding a unique state for it? As you have discussed here on page 6, client states are used in cross-silo settings, yet what happens if the client ordering changes?
4. Here you have recommended using the stateful clients example. Can you explain this a bit further? I mean, where exactly do the encoders go in run_one_round, and how are they used/combined with the client update and aggregation?
5. I have some additional information, such as sparsity, that I want to pass to encode. What is the suggested method for doing that?
Here are some answers, hope it helps:
1. If you want to treat all of the aggregated structure just as a single tensor, use concat_factory as the outermost aggregator. That will concatenate the entire structure to a rank-1 Tensor at clients, and then unpack it back to the original structure at the end. Example use: tff.aggregators.concat_factory(tff.aggregators.MeanFactory(...))
Note the encoding stage objects are meant to work with a single tensor, so what you describe with identical tensors probably works only accidentally.
2. There are two options.
a. Modify the client training code such that the weights being passed to the weighted aggregator are already what you want them to be (a zero/one mask). In the stateful clients example you link, that would be here. You will then get what you need by default (by summing the numerator).
b. Modify UnweightedMeanFactory to do exactly the variant of averaging you describe and use that. A starting point would be modifying this
3. (and 4.) I think that is what you would need to implement. The same way existing client states are initialized in the example here, you would need to extend it to contain the aggregator states, and make sure those are sampled together with the clients, as done here. Then, to integrate the aggregators in the example, you would need to replace this hard-coded tff.federated_mean. An example of such integration is in the implementation of tff.learning.build_federated_averaging_process, primarily here
5. I am not sure what the question is. Perhaps get the previous points working (that seems like a prerequisite to me), and then clarify and ask in a new post?
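To illustrate the arithmetic behind points 1 and 2 outside of TFF, here is a minimal NumPy sketch. It flattens tensors of different shapes into one rank-1 vector, applies a global top-k by magnitude, and averages per position over only the clients that sent a value. The helpers `flatten`, `unflatten`, `global_top_k` and `masked_mean` are hypothetical names for this sketch, not TFF APIs, and wiring this into the aggregators is not shown.

```python
import numpy as np


def flatten(tensors):
    """Concatenate tensors of arbitrary shapes into one rank-1 vector."""
    return np.concatenate([t.reshape(-1) for t in tensors])


def unflatten(flat, shapes):
    """Split a rank-1 vector back into tensors with the given shapes."""
    sizes = [int(np.prod(s)) for s in shapes]
    pieces = np.split(flat, np.cumsum(sizes)[:-1])
    return [p.reshape(s) for p, s in zip(pieces, shapes)]


def global_top_k(flat, fraction):
    """Keep the `fraction` largest-magnitude entries; return values and 0/1 mask."""
    k = max(1, int(round(fraction * flat.size)))
    keep = np.argpartition(np.abs(flat), -k)[-k:]
    mask = np.zeros_like(flat)
    mask[keep] = 1.0
    return flat * mask, mask


def masked_mean(sparse_updates, masks):
    """Sum of updates divided by the number of clients that sent each entry."""
    numerator = np.sum(sparse_updates, axis=0)
    denominator = np.sum(masks, axis=0)
    # Entries that no client sent stay 0 instead of becoming 0/0 = NaN.
    return np.divide(numerator, denominator,
                     out=np.zeros_like(numerator), where=denominator > 0)


# Four simulated clients, each holding two tensors of different shapes.
shapes = [(2, 20), (6, 10)]
rng = np.random.default_rng(0)
clients = [[rng.normal(size=s) for s in shapes] for _ in range(4)]
sparse, masks = zip(*(global_top_k(flatten(c), 0.1) for c in clients))
averaged = unflatten(masked_mean(sparse, masks), shapes)
```

In TensorFlow, tf.math.divide_no_nan(numerator, denominator) gives the same "0 where the denominator is 0" behaviour, so no eps is needed.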

How to compare probe face images with gallery images with feature extractor | Python

I have a dataset that contains 1500 face images, and I have selected 150 of them as probes.
The 150 probe images are in the probe folder and the other images are in the gallery folder.
I have a FaceNet feature extractor which extracts features from the images and saves them as .npy arrays for computing the Euclidean distance.
How can I compare these 150 images with the whole gallery folder, draw an accuracy graph for rank-1, rank-5 and rank-10 between similar images, and compute mAP?
First, I would run the feature extractor on the test image. Then calculate the distance between the feature extraction results of each of the 150 images (let's call them the train sets) and the feature extraction results of the test image.
import numpy as np
all_res = []
for feat in train_sets:
    # Euclidean distance between one stored feature vector and the test features
    res = np.linalg.norm(feat - test_res)
    all_res.append(res)
all_res.sort()
The smallest value in the all_res list is then the first rank and the biggest one is the last rank. I hope this is a useful reference. You can also use sklearn to evaluate your model, e.g. SVC, accuracy_score, etc.
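For the rank-1/5/10 and mAP part of the question, here is a minimal NumPy sketch. It assumes the probe and gallery features have already been loaded from the .npy files into 2-D arrays, and that an identity label is available for every image. The function name `rank_metrics` and its arguments are made up for this illustration.

```python
import numpy as np


def rank_metrics(probe_feats, probe_ids, gallery_feats, gallery_ids, ranks=(1, 5, 10)):
    """Return ({rank: accuracy}, mAP) for Euclidean nearest-neighbour retrieval."""
    probe_ids = np.asarray(probe_ids)
    gallery_ids = np.asarray(gallery_ids)

    # Squared Euclidean distances, shape (num_probe, num_gallery).
    # Sorting by squared distance gives the same order as sorting by distance.
    sq_dist = (
        (probe_feats ** 2).sum(axis=1)[:, None]
        + (gallery_feats ** 2).sum(axis=1)[None, :]
        - 2.0 * probe_feats @ gallery_feats.T
    )
    order = np.argsort(sq_dist, axis=1)               # gallery indices, nearest first
    hits = gallery_ids[order] == probe_ids[:, None]   # True where the identity matches

    # Rank-k accuracy: a probe counts if a correct match is within the top k.
    cmc = {r: float(np.mean(hits[:, :r].any(axis=1))) for r in ranks}

    average_precisions = []
    for row in hits:
        positions = np.flatnonzero(row)               # 0-based ranks of correct matches
        if positions.size == 0:
            continue                                  # identity absent from the gallery
        precision_at_hit = np.arange(1, positions.size + 1) / (positions + 1)
        average_precisions.append(precision_at_hit.mean())
    mean_ap = float(np.mean(average_precisions)) if average_precisions else 0.0
    return cmc, mean_ap
```

The values in the returned dictionary are the points to plot for the rank accuracy graph, for example with matplotlib.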
Suppose that 1500 face images are stored in source folder, and 150 images are stored in target folder.
#!pip install deepface
from deepface import DeepFace
targets = ["img1.jpg", "img2.jpg", "img150.jpg"]
resp = DeepFace.find(img_path = targets, db_path = "source", model_name = "Facenet")
BTW, you can set Facenet, VGG-Face, OpenFace, DeepFace or DeepID as model name.
The response object is a list of pandas data frames. Each data frame is sorted from the most similar image to the least similar one, which is why the first row is the best match.
index = 0
for df in resp:
    if df.shape[0] > 0:
        #print(targets[index], ": ", df.head(1))
        # Save under a .csv name so the image file itself is not overwritten
        df.to_csv("%s.csv" % (targets[index]), index = False)
    index = index + 1
This will match identities in two folders.

OpenCV Best way to match the spot patterns

I'm trying to write an app for wild leopard classification and conservation in South Asia. The main challenge is to identify individual leopards by the spot pattern on their forehead.
The current approach I am using is,
Store the known leopard forehead images as a base list
Get the user-provided leopard image and crop the forehead of the leopard
Pre-process the images with the bilateral filter to reduce the noise
Identify the keypoints using the SIFT algorithm
Use FLANN matcher to get KNN matches
Select good matches based on the ratio threshold
Sample code:
# Pre-Process & reduce noise.
img1 = cv.bilateralFilter(baseImg, 9, 75, 75)
img2 = cv.bilateralFilter(userImage, 9, 75, 75)
detector = cv.xfeatures2d_SIFT.create()
keypoints1, descriptors1 = detector.detectAndCompute(img1, None)
keypoints2, descriptors2 = detector.detectAndCompute(img2, None)
# FLANN parameters
FLANN_INDEX_KDTREE = 1  # FLANN's id for the KD-tree index
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50) # or pass empty dictionary
matcher = cv.FlannBasedMatcher(index_params, search_params)
knn_matches = matcher.knnMatch(descriptors1, descriptors2, 2)
allmatchpointcount = len(knn_matches)
ratio_thresh = 0.7
good_matches = []
for m, n in knn_matches:
    if m.distance < ratio_thresh * n.distance:
        good_matches.append(m)
goodmatchpointcount = len(good_matches)
print("Good match count : ", goodmatchpointcount)
matchsuccesspercentage = goodmatchpointcount/allmatchpointcount*100
print("Match percentage : ", matchsuccesspercentage)
Problems I have with this approach:
The method has a medium-low success rate and tends to break when there is a new user image.
The user images are sometimes taken from different angles where some key patterns are not visible or warped.
The user image quality affects the match result significantly.
I appreciate any suggestions to get this improved in any manner.
Sample images (not reproduced here): a base image, a user image that was incorrectly matched to it, and more sample images added on request.

Tensorflow Deprecation Warning

I am trying to create a convolutional neural network for image classification using one of the open-access GitHub codes. I have two classes of images. But when I run one part of the code, I keep getting this warning:
/Users/user/anaconda/envs/tensorflow/lib/python3.5/site-packages/ipykernel/ DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.
This is the part of the code that triggers it. The origin is probably somewhere else, though: my intuition tells me that it lies in the labelling of the images, but I am not sure how to fix that. I tried relabelling multiple times and nothing fixed it.
def print_test_accuracy(show_example_errors=False,
    # Number of images in the test-set.
    num_test = len(test_images)

    # Allocate an array for the predicted classes which
    # will be calculated in batches and filled into this array.
    cls_pred = np.zeros(shape=num_test,

    # Now calculate the predicted classes for the batches.
    # We will just iterate through all the batches.
    # There might be a more clever and Pythonic way of doing this.

    # The starting index for the next batch is denoted i.
    i = 0

    while i < num_test:
        # The ending index for the next batch is denoted j.
        j = min(i + test_batch_size, num_test)

        # Get the images from the test-set between index i and j.
        images = test_images[i:j, :]

        # Get the associated labels.
        labels = test_labels[i:j, :]

        # Create a feed-dict with these images and labels.
        feed_dict = {x: images,
                     y_true: labels}

        # Calculate the predicted class using TensorFlow.
        cls_pred[i:j] =, feed_dict=feed_dict)

        # Set the start-index for the next batch to the
        # end-index of the current batch.
        i = j

    # Convenience variable for the true class-numbers of the test-set.
    cls_true = test_class_labels

    # Create a boolean array whether each image is correctly classified.
    correct = (cls_true == cls_pred)

    # Calculate the number of correctly classified images.
    # When summing a boolean array, False means 0 and True means 1.
    correct_sum = sum(correct)

    # Classification accuracy is the number of correctly classified
    # images divided by the total number of images in the test-set.
    acc = float(correct_sum) / num_test

    # Print the accuracy.
    msg = "Accuracy on Test-Set: {0:.1%} ({1} / {2})"
    print(msg.format(acc, correct_sum, num_test))

    # Plot some examples of mis-classifications, if desired.
    if show_example_errors:
        print("Example errors:")
        plot_example_errors(cls_pred=cls_pred, correct=correct)

    # Plot the confusion matrix, if desired.
    if show_confusion_matrix:
        print("Confusion Matrix:")
Try tf.equal:
correct = tf.equal(cls_pred, cls_true)
or, if it is a probability distribution rather than just the argmax already:
correct = tf.equal(tf.argmax(cls_pred, 1), tf.argmax(cls_true, 1))
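The text of the warning comes from NumPy, which emits it when two arrays compared with == cannot be compared element by element, typically because their shapes differ (recent NumPy versions raise an error instead). Assuming that is what happens with cls_true and cls_pred here, a plain NumPy check like the sketch below would surface the mismatch before the comparison. The helpers `to_class_indices` and `accuracy` are hypothetical names, not part of the original code.

```python
import numpy as np


def to_class_indices(labels):
    """Convert one-hot (or probability) rows to class numbers; pass 1-D through."""
    labels = np.asarray(labels)
    return labels.argmax(axis=1) if labels.ndim == 2 else labels


def accuracy(cls_true, cls_pred):
    """Fraction of matching class numbers, after checking that the shapes agree."""
    cls_true = to_class_indices(cls_true)
    cls_pred = to_class_indices(cls_pred)
    if cls_true.shape != cls_pred.shape:
        raise ValueError(
            f"shape mismatch: true {cls_true.shape} vs predicted {cls_pred.shape}"
        )
    return float(np.mean(cls_true == cls_pred))
```

If the ValueError fires, printing the shapes of test_class_labels and cls_pred should show whether the labels are one-hot, or whether the number of labels differs from the number of test images.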

Blockproc in MATLAB with two output variables

I have the following problem. I have to compute dense SIFT interest points in a very high-resolution image (182 MP). When I run the code on the full image, MATLAB always closes suddenly. So I decided to run the code on image patches.
the code
I tried to use blockproc in MATLAB to call the C++ function that performs the dense SIFT interest point detection this way:
fun = @(block_struct) denseSIFT(, options);
[dsift , infodsift] = blockproc(ndvi,[1000 1000],fun);
where dsift is the sift descriptors (vectors) and infodsift has the information of the interest points, such as the x and y coordinates.
the problem
The problem is that blockproc allows only one output, but I want both outputs. MATLAB gives the following error when I run the code.
Error using blockproc
Too many output arguments.
Is there a way for me doing this?
Would it be a problem for you to "hard code" a version of blockproc?
Assuming for a moment that you can divide your image into NxM smaller images, you could loop around as follows:
bigImage = someFunction();
sz = size(bigImage);
smallSize = sz ./ [N M];
dsift = cell(N,M);
infodsift = cell(N,M);
for ii = 1:N
    for jj = 1:M
        smallImage = bigImage((ii-1)*smallSize(1) + (1:smallSize(1)), (jj-1)*smallSize(2) + (1:smallSize(2)));
        [dsift{ii,jj} infodsift{ii,jj}] = denseSIFT(smallImage, options);
    end
end
The results will then be in the two cell arrays. No real need to pre-allocate, but it's tidier if you do. If the individual matrices are the same size, you can convert into a single large matrix with
dsiftFull = cell2mat(dsift);
Almost magic. This won't work if your matrices are different sizes - but then, if they are, I'm not sure you would even want to put them all in a single one (unless you decide to horzcat them).
If you do decide you want a list of "all the columns as a giant matrix", then you can do
giantMatrix = [dsift{:}];
This will return a matrix with (in your example) 128 rows, and as many columns as there were "interest points" found. It's shorthand for
giantMatrix = [dsift{1,1} dsift{2,1} dsift{3,1} ... dsift{N,M}];
