Face recognition using OpenCV

I need to build a face recognition system using OpenCV LBP, starting from the linked facerec code.
In that code, a CSV file is generated for multiple users, and the code recognizes whether the input face is in the CSV list or not.
My intention is to do face verification against a single user: the user registers his face the first time (I write it to the CSV), and whenever the same user tries to authenticate, I collect a few images of the user and compare them against the previously stored CSV data. How can I do this with the above code?

Consider a threshold on the confidence value returned by the predict function to determine whether the face is known or not.
The first time, the probe face will be predicted as unknown (its confidence will fall on the wrong side of the threshold), so you put it in the DB (the CSV file, too). On the next pass it should be predicted with a somewhat better value; accept it, and so on, until you decide the prediction is good enough for you. A sketch follows below.
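A minimal sketch of that verification loop, assuming OpenCV's Python bindings with the contrib face module (cv2.face.LBPHFaceRecognizer_create); the image paths and the threshold value are placeholders. With LBPH, the confidence returned by predict is a distance, so lower means a closer match.

import cv2
import numpy as np

# Enroll: train an LBPH model on the user's registration images (placeholder paths).
recognizer = cv2.face.LBPHFaceRecognizer_create()
enroll = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in ["reg_0.png", "reg_1.png"]]
recognizer.train(enroll, np.array([0] * len(enroll), dtype=np.int32))  # one user => one label

# Verify: predict on a probe image and threshold the distance-like confidence.
probe = cv2.imread("probe.png", cv2.IMREAD_GRAYSCALE)
label, confidence = recognizer.predict(probe)
THRESHOLD = 80.0  # assumed starting point; tune on your own data
print("verified" if confidence < THRESHOLD else "unknown")

For a single-user setup, the label is always 0 and only the confidence matters; the CSV then only needs to track the registered image paths.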


Use only the first 4 layers of XLNet

First, sorry for my bad English.
Short version: can anyone tell me how to use only the first n layers of XLNet for classification?
Long version:
I have a dataset composed of texts and their summaries. The goal is to detect whether a summary was generated by a bot.
So I thought of using BERT, giving it "[CLS] " + text + " [SEP]" + summary as input, then taking the representation of the "[CLS]" token and detecting with a classifier whether the summary was written by a bot.
The problem is that BERT takes no more than 512 tokens as input.
So I thought of using XLNet. But here another problem appeared: my GPU (an RTX 2060) can't handle even a batch of size 1.
So I thought of using only the first 4 layers of XLNet, but the problem is that I don't know how to do it.
My code to load the model is model = XLNetForSequenceClassification.from_pretrained("xlnet-base-cased", num_labels=2).
Can anyone tell me what to add to use only a part of the network, please?
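One possible approach, as a sketch only: in the Hugging Face transformers implementation, the XLNet blocks live in the ModuleList model.transformer.layer, and nn.ModuleList supports slicing, so the loaded network can be truncated in place (keeping 4 layers mirrors the question, it is not a recommendation):

from transformers import XLNetForSequenceClassification

model = XLNetForSequenceClassification.from_pretrained("xlnet-base-cased", num_labels=2)

# Keep only the first 4 transformer blocks.
model.transformer.layer = model.transformer.layer[:4]
# Keep the config consistent with the truncated network.
model.config.n_layer = 4

The classification head on top is untouched; whether 4 layers retain enough signal for this task has to be checked empirically.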

Caffe mean file creation without database

I run Caffe using an image_data_layer and don't want to create an LMDB or LevelDB for the data, but the compute_image_mean tool only works with LMDB/LevelDB databases.
Is there a simple solution for creating a mean file from a list of image files (in the same format that image_data_layer uses)?
You may notice that recent models (e.g., GoogLeNet) do not use a mean file the same size as the input image, but rather a 3-vector representing the mean value per image channel. These values are quite "immune" to the specific dataset used (as long as it is large enough and contains "natural images").
So, as long as you are working with natural images, you may use the same values that, e.g., GoogLeNet uses: B=104, G=117, R=123.
The simplest solution is to create an LMDB or LevelDB database of the image set.
The more involved solution is to write a tool similar to compute_image_mean that takes image inputs, does the transformations, and finds the mean, as sketched below.
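A minimal Python sketch of that second option, computing a per-channel mean directly from an image_data_layer-style list file (the list file name is a placeholder; each line is assumed to be "<path> <label>"):

import numpy as np
import cv2

channel_sum = np.zeros(3, dtype=np.float64)
pixel_count = 0
with open("train_list.txt") as f:
    for line in f:
        path = line.split()[0]
        img = cv2.imread(path).astype(np.float64)  # BGR order, matching Caffe
        channel_sum += img.sum(axis=(0, 1))        # sum over height and width
        pixel_count += img.shape[0] * img.shape[1]

mean_bgr = channel_sum / pixel_count
print(mean_bgr)  # on large natural-image sets, roughly B=104, G=117, R=123

Writing the result out as a .binaryproto mean file would take some extra protobuf plumbing; for a per-channel mean, the three numbers are usually all you need.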

Why does ELKI need a db.in file in addition to the distance matrix? Also, what should the db.in file contain?

I tried to follow this tutorial on using ELKI with pre-computed distances for clustering.
http://elki.dbs.ifi.lmu.de/wiki/HowTo/PrecomputedDistances
I used the following set of command line options:
-dbc.filter FixedDBIDsFilter -dbc.startid 0 -algorithm clustering.OPTICS
-algorithm.distancefunction external.FileBasedDoubleDistanceFunction
-distance.matrix /path/to/matrix -optics.minpts 5 -resulthandler ResultWriter
ELKI fails with a configuration error saying a db.in file is needed to run the computation.
The following configuration errors prevented execution:
No value given for parameter "dbc.in":
Expected: The name of the input file to be parsed.
No value given for parameter "parser.distancefunction":
Expected: Distance function used for parsing values.
My question is: what is the db.in file? Why should I provide it in addition to the distance matrix file, since the pairwise distance matrix completely specifies all the information about the point cloud? (Also, I don't have access to any information other than the pairwise distances.)
What should I do about db.in? Should I override it, specify some dummy information, etc.? Kindly help me understand.
Thank you.
This is documented in the ELKI HowTos:
http://elki.dbs.ifi.lmu.de/wiki/HowTo/PrecomputedDistances
Using without primary data
-dbc DBIDRangeDatabaseConnection -idgen.count 100
However, there is a bug (a patch is on the HowTo page, and will be in the next release), so right now you can't fully use this; as a workaround you can use a text file that enumerates the objects.
The reason for this is that ELKI is designed to work on multi-relational data; it's not just processing matrices. Some algorithms may, e.g., need a geographic representation of an object, some measurements for this object, and a label for evaluation. That is three relations.
What the DBIDRange data source essentially does is create a single "fake" relation that is just the DBIDs 0 to 99. For algorithms that don't need actual data, but only distances (e.g., LOF, DBSCAN, or OPTICS), it is sufficient to have object IDs and a distance matrix.
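For the workaround mentioned above, such an enumeration file is trivial to generate; whether ELKI's default parser accepts one value per line is an assumption to verify against your version:

# Write a dummy input file that just enumerates object IDs 0..99, one per line.
with open("db.in", "w") as f:
    for i in range(100):
        f.write(str(i) + "\n")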

Unique identifiers for each data point in Mahout

Suppose I have a dataset I want to run a Mahout clustering job on. I want each data point to have a unique identifier, such as an ID number, but I don't want to append the ID to the vector, because that way it would be included in the clustering calculations. How can I include an identifier in the data without the algorithm including the ID number in its calculations? Is there a way to have the input be a key-value pair where the key is the ID and the value is the vector I want to run the algorithm on?
Alison, before worrying about this, look at the output first. In many cases you get lines of assigned cluster IDs where the line order in the input and output files is the same; for example, the node on the first line of your input file will be on the first line of the output file. So you can keep the IDs in a separate file and the vectors in the input file, then combine the ID file with the output file to see which node is assigned to which cluster, as in the sketch below.
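A minimal sketch of that line-order join (both file names are placeholders; it assumes one ID per line and one cluster assignment per line, in matching order):

# Pair each ID with the cluster assignment on the same line number.
with open("ids.txt") as ids_f, open("cluster_assignments.txt") as out_f:
    for point_id, cluster_id in zip(ids_f, out_f):
        print(point_id.strip(), "->", cluster_id.strip())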

How to save a CV_32F cv::Mat to a file without losing precision?

I'm using the cv::PCA class for a face recognition project. I convert photos of faces to one-row vectors, concatenate them into one big array, and feed it to PCA to acquire a new space in which I can try to use distances for recognition. The problem is that calculating the PCA from scratch each time I start the program is really time consuming (almost five minutes). I figured out that I need to save the computed PCA to the hard drive and load it when I start the program again. And here is the problem.
As far as I can see, all cv::Mat objects in cv::PCA are of type CV_32F. When I try to save one as a normal picture, it is converted to an 8-bit image and some data is lost. When I use XML/YAML persistence, the generated file is really big, and data is also lost (I saved it, loaded it into another structure, and ran cerr<<sum(pca_orginal.mean==pca_loaded.mean)[0]<<endl to check how big the difference is).
Right now I'm trying to use std::ofstream::write with the std::ofstream::binary flag, and istream::read, but there are some type issues: out.write(_pca.mean.data,_pca.mean.rows*_pca.mean.cols*4/*CV_32F->4*CV_8U*/); generates error: no matching function for call to ‘std::basic_ofstream<char, std::char_traits<char> >::write(uchar*&, int). I've also heard about the OpenEXR library and its file format, but I would rather avoid using additional libraries. I'm using OpenCV 2.3.1 and OpenCV 2.2.
Edit:
I'm sorry for the confusion. I misread the cv::Mat operator== description and thought it works the opposite way from how it actually does, so sum(pca_orginal.mean==pca_loaded.mean)[0] giving 0 is the worst possible result, not the best. It means that XML/YAML persistence works fine, apart from generating huge files. Also, after using a C-style cast I was able to make the binary streams work, but the generated files are also big (over 150 MB).
In the C interface, there are functions cvSave and cvLoad for saving arbitrary matrices. There are probably C++ interface counterparts, too.
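As the edit above notes, a C-style cast (or reinterpret_cast) makes the raw binary stream compile. For illustration, here is the same lossless round trip sketched in Python, where an OpenCV Mat is a NumPy array and a raw binary dump preserves every CV_32F bit (the file name is a placeholder):

import numpy as np

mean = np.random.rand(1, 10000).astype(np.float32)  # stand-in for pca.mean
np.save("pca_mean.npy", mean)    # raw binary dump, no precision loss
loaded = np.load("pca_mean.npy")
assert (mean == loaded).all()    # bit-exact round trip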
