Extract features from Alphapose keypoints output - pose-estimation

I am currently trying to use Alphapose keypoints output.
I have a few questions.
How can I use this output to extract gait features such as cadence, stride length and more?
How many frames are generated each minute? My JSON file actually contains entries for several hundred JPEGs; the two entries below are just a sample.
Here is an example of the output:
[{'image_id': '3.jpg',
'category_id': 1,
'keypoints': [3084.453125,
766.064453125,
0.27073606848716736,
3109.76806640625,
808.2561645507812,
0.21385984122753143,
3084.453125,
766.064453125,
0.24748951196670532,
3109.76806640625,
875.7628173828125,
0.3147721290588379,
3084.453125,
715.4345092773438,
0.20229308307170868,
3067.576416015625,
1019.2144775390625,
0.4297007918357849,
3050.69970703125,
732.3111572265625,
0.32160425186157227,
2881.93310546875,
968.58447265625,
0.32375261187553406,
2848.1796875,
681.68115234375,
0.3226509690284729,
2730.043212890625,
867.324462890625,
0.5148664116859436,
2730.043212890625,
782.941162109375,
0.33549684286117554,
2763.79638671875,
884.201171875,
0.29956817626953125,
2831.30322265625,
766.064453125,
0.22447820007801056,
2308.12646484375,
884.201171875,
0.2733159065246582,
2325.003173828125,
901.0778198242188,
0.2143297791481018,
1717.4429931640625,
917.9544677734375,
0.3410451412200928,
1683.689697265625,
934.8311767578125,
0.3064221441745758],
'score': 1.1431528329849243,
'box': [1685.37744140625,
665.083251953125,
1296.1279296875,
421.359130859375],
'idx': [0.0]},
{'image_id': '6.jpg',
'category_id': 1,
'keypoints': [2716.578125,
708.694580078125,
0.20404888689517975,
2760.772705078125,
693.9630126953125,
0.24653863906860352,
2731.3095703125,
679.2314453125,
0.24123729765415192,
3003.84375,
878.107666015625,
0.1836661398410797,
2981.746337890625,
679.2314453125,
0.16582731902599335,
3003.84375,
966.4970703125,
0.1983313262462616,
2967.014892578125,
708.694580078125,
0.313667356967926,
3003.84375,
1025.42333984375,
0.3715871572494507,
2760.772705078125,
826.5471801757812,
0.2802092432975769,
2760.772705078125,
856.0103149414062,
0.32904109358787537,
2760.772705078125,
826.5471801757812,
0.27431410551071167,
2701.846435546875,
929.6681518554688,
0.16013894975185394,
2628.188720703125,
885.4734497070312,
0.18557937443256378,
2318.82568359375,
914.9365844726562,
0.33809077739715576,
2333.55712890625,
900.2050170898438,
0.24967674911022186,
1714.8311767578125,
929.6681518554688,
0.40348780155181885,
1744.2943115234375,
914.9365844726562,
0.271779328584671],
'score': 0.8994148969650269,
'box': [1760.4990234375, 702.509521484375, 1131.384765625, 410.12255859375],
'idx': [0.0]}]
Here is the reference I used.
https://github.com/MVIG-SJTU/AlphaPose
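A rough sketch of one way to start, not a validated gait pipeline. It assumes the JSON came from AlphaPose's default COCO 17-keypoint model, so that each 'keypoints' list holds 17 (x, y, confidence) triples in COCO order (the sample above has 51 values per entry, which is consistent with that). The helper names and the fps argument are hypothetical. As far as I know, AlphaPose writes one entry per detected person per image, so the number of frames per minute is set by how the JPEGs were extracted from the video (frame rate times 60), not by AlphaPose itself. Cadence is estimated here by counting peaks in the distance between the two ankles, on the assumption that each peak corresponds to one step and that the file names are frame numbers.

```python
import math

# Assumption: COCO 17-keypoint order, as used by AlphaPose's default model.
COCO_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def named_keypoints(entry):
    """Split the flat list into (x, y, confidence) triples keyed by joint name."""
    flat = entry["keypoints"]
    triples = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
    return dict(zip(COCO_NAMES, triples))


def frame_number(entry):
    """Assumption: the image file name is the frame number, e.g. '3.jpg' -> 3."""
    return int(entry["image_id"].split(".")[0])


def ankle_separation(entry):
    """Pixel distance between the left and right ankle in one frame."""
    kp = named_keypoints(entry)
    lx, ly, _ = kp["left_ankle"]
    rx, ry, _ = kp["right_ankle"]
    return math.hypot(lx - rx, ly - ry)


def count_peaks(values):
    """Count strict local maxima; each one is treated as a single step."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    )


def cadence_steps_per_minute(entries, fps):
    """Estimate cadence for one person; fps is the frame rate of the source video."""
    ordered = sorted(entries, key=frame_number)
    if len(ordered) < 3:
        raise ValueError("need at least three frames to find a peak")
    separations = [ankle_separation(e) for e in ordered]
    duration_s = (frame_number(ordered[-1]) - frame_number(ordered[0])) / fps
    return 60.0 * count_peaks(separations) / duration_s
```

Stride length can be measured in the same way from the ankle positions at consecutive peaks, but it comes out in pixels; converting it to metres needs a known reference length in the scene. With confidences as low as those in the sample (mostly below 0.5), smoothing the signal or filtering on the confidence value before counting peaks is probably needed.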

Related

Correct way to use librosa mfcc with Random Forest

What is the best approach for using librosa.feature.mfcc feature extraction with a Random Forest classifier?
The two cases are as follows:
Case 1:
I have 1000 audio files and use the librosa mfcc feature extraction as is:
def extract_features(file_name):
    # Assumes librosa and numpy (as np) are imported and max_pad_len is defined elsewhere
    try:
        durationSeconds = 1
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        # Pad or truncate the signal to exactly durationSeconds
        trimmed = librosa.util.fix_length(audio, size=int(sample_rate * durationSeconds))
        mfccs = librosa.feature.mfcc(y=trimmed, sr=sample_rate, n_mfcc=40)
        # Zero-pad the time axis up to max_pad_len frames
        pad_width = max_pad_len - mfccs.shape[1]
        mfccs = np.pad(mfccs, pad_width=((0, 0), (0, pad_width)), mode='constant')
    except Exception as e:
        print("Error encountered while parsing file: ", file_name)
        return None
    return mfccs
I would then flatten the 3D array generated by this feature extraction via:
X_train_flat = np.array([features_2d.flatten() for features_2d in X_train])
and then send this to the Random Forest classifier.
Case 2:
For the same 1000 audio files, I use:
def extract_features(file_name):
    # Assumes librosa, numpy (as np) and pandas (as pd) are imported
    try:
        durationSeconds = 1
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        trimmed = librosa.util.fix_length(audio, size=int(sample_rate * durationSeconds))
        mfccs = librosa.feature.mfcc(y=trimmed, sr=sample_rate, n_mfcc=40)
        # Return the per-coefficient mean and standard deviation along time
        return pd.Series(np.hstack((np.mean(mfccs, axis=1), np.std(mfccs, axis=1))))
    except Exception as e:
        print("Error encountered while parsing file: ", file_name)
        return None
I would then pass this X_train data to the Random Forest Classifier.
rfc.fit(X_train, y_train)
Note that in the first use case, flattening the data gives an X_train of size 1000 x 6920. In effect, I am passing 6920 features per file to the Random Forest classifier, compared to 40 in the second use case.
Can you tell me which approach is correct?
Thanks!
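Neither case is wrong in itself; they feed the forest different things. To make the difference concrete, here is a dependency-free sketch on a toy matrix. The function names are made up for illustration, and statistics.fmean / statistics.pstdev stand in for np.mean / np.std along axis=1. Flattening yields n_mfcc * n_frames values, each tied to a fixed frame position, while the summary yields 2 * n_mfcc values that ignore when things happen in the clip.

```python
from statistics import fmean, pstdev


def flatten_features(mfcc):
    """Case 1: concatenate every row, giving n_mfcc * n_frames values."""
    return [value for row in mfcc for value in row]


def summary_features(mfcc):
    """Case 2: per-coefficient mean, then per-coefficient population std, over time."""
    means = [fmean(row) for row in mfcc]
    stds = [pstdev(row) for row in mfcc]  # population std, like np.std's default
    return means + stds


# Toy stand-in for an MFCC matrix: 3 coefficients (rows) x 4 frames (columns).
toy_mfcc = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.0, 2.0, 2.0],
    [0.0, 0.0, 4.0, 4.0],
]

print(len(flatten_features(toy_mfcc)))   # 3 * 4 values
print(len(summary_features(toy_mfcc)))   # 2 * 3 values
```

With n_mfcc=40, the mean-plus-standard-deviation summary gives 80 values per file. Which representation classifies better is an empirical question, so comparing the two with cross-validation on the same split is the safest way to decide.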

Ignite ML with multiple preprocessing

Using Ignite machine learning, say I have a labeled dataset like this:
IgniteCache<Integer, LabeledVector<Integer>> contents = ignite.createCache(cacheConfiguration);
contents.put(1, new LabeledVector<Integer>(new DenseVector(new Serializable[] { 705.2, "HD", 29.97, 1, 1, 96.13 }), 2));
contents.put(2, new LabeledVector<Integer>(new DenseVector(new Serializable[] { 871.3, "HD", 30, 1, 1, 95.35 }), 3));
contents.put(3, new LabeledVector<Integer>(new DenseVector(new Serializable[] { 2890.2, "SD", 29.97, 1, 1, 95.65 }), 10));
contents.put(4, new LabeledVector<Integer>(new DenseVector(new Serializable[] { 1032, "SD", 29.97, 1, 1, 96.8 }), 4));
How would I use the NormalizationTrainer on features 0 and 5 but the EncoderTrainer on feature 1? I think I'm having difficulties understanding how to concatenate multiple preprocessing before finally feeding the model trainer.
What I currently have is this (modified Ignite sample):
Vectorizer<Integer, LabeledVector<Integer>, Integer, Integer> vectorizer = new LabeledDummyVectorizer<Integer, Integer>(0, 5);
Preprocessor<Integer, LabeledVector<Integer>> preprocessor1 = new NormalizationTrainer<Integer, LabeledVector<Integer>>().withP(1).fit(ignite, data, vectorizer);
Preprocessor<Integer, LabeledVector<Integer>> preprocessor2 = new EncoderTrainer<Integer, LabeledVector<Integer>>().withEncoderType(EncoderType.STRING_ENCODER).withEncodedFeature(1).fit(ignite, data, preprocessor1);
KNNClassificationTrainer trainer = new KNNClassificationTrainer();
KNNClassificationModel mdl = trainer.fit(ignite, data, preprocessor2);
Do I understand the multiple preprocessor correctly? If so, how would I add another BinarizationTrainer on feature 2? I think I'm getting confused by where to specify which feature to apply the preprocessing trainer on. For one trainer (NormalizationTrainer) I have to use the Vectorizer to tell which features to use, for the EncoderTrainer I can do this as a method function. How would I then add BinarizationTrainer with another Vectorizer?
One preprocessor builds on top of another.
Coordinates are relative to the preprocessor that comes before.
This example shows how to accomplish what you want to do:
https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/ml/tutorial/Step_6_KNN.java
To see how the String Encoder references coordinates, put a breakpoint here: https://github.com/apache/ignite/blob/eabe50d90d5db2d363da36393cd957ff54a18d90/modules/ml/src/main/java/org/apache/ignite/ml/preprocessing/encoding/EncoderTrainer.java#L93
and examine all the variables:
UpstreamEntry<K, V> entity = upstream.next(); //this is the row from the file
LabeledVector<Double> row = basePreprocessor.apply(entity.getKey(), entity.getValue()); //after the previous preprocessor has been applied
categoryFrequencies = calculateFrequencies(row, categoryFrequencies); //use the given coordinates to calculate results.
More about preprocessing: https://apacheignite.readme.io/docs/preprocessing
Alternatively, you can use the pipelines API for a more streamlined approach to preprocessing: https://apacheignite.readme.io/docs/pipeline-api

How to get class labels from TensorFlow prediction

I have a classification model in TF and can get a list of probabilities for the next class (preds). Now I want to select the highest element (argmax) and display its class label.
This may seem silly, but how can I get the class label that matches a position in the predictions tensor?
feed_dict={g['x']: current_char}
preds, state = sess.run([g['preds'],g['final_state']], feed_dict)
prediction = tf.argmax(preds, 1)
preds gives me a vector of predictions for each class. Surely there must be an easy way to just output the most likely class (label)?
Some info about my model:
x = tf.placeholder(tf.int32, [None, num_steps], name='input_placeholder')
y = tf.placeholder(tf.int32, [None, 1], name='labels_placeholder')
batch_size = tf.shape(x)[0]
x_one_hot = tf.one_hot(x, num_classes)
rnn_inputs = [tf.squeeze(i, squeeze_dims=[1]) for i in
              tf.split(x_one_hot, num_steps, 1)]
tmp = tf.stack(rnn_inputs)
print(tmp.get_shape())
tmp2 = tf.transpose(tmp, perm=[1, 0, 2])
print(tmp2.get_shape())
rnn_inputs = tmp2
with tf.variable_scope('softmax'):
    W = tf.get_variable('W', [state_size, num_classes])
    b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))
rnn_outputs = rnn_outputs[:, num_steps - 1, :]
rnn_outputs = tf.reshape(rnn_outputs, [-1, state_size])
y_reshaped = tf.reshape(y, [-1])
logits = tf.matmul(rnn_outputs, W) + b
predictions = tf.nn.softmax(logits)
A prediction is an array with one entry per class (label). Each entry represents the model's "confidence" that the input belongs to that class. You can check which label has the highest confidence value by using:
prediction = np.argmax(preds, 1)
Once argmax has given you the index of the highest probability, look that index up in your list of class labels to find the class name associated with it:
class_names[prediction]
Please refer to this link for more understanding.
You can use tf.reduce_max() for this. I would refer you to this answer.
Let me know if it works - will edit if it doesn't.
Mind that there are sometimes several ways to load a dataset. For instance, with Fashion MNIST the tutorial could lead you to use load_data() and then build your own structure to interpret a prediction. However, after installing tensorflow-datasets you can also load the data with tensorflow_datasets.load(...), which gives you access to a DatasetInfo object. So, for instance, if your prediction is 9 you can tell it's a boot with:
import tensorflow_datasets as tfds
_, ds_info = tfds.load('fashion_mnist', with_info=True)
print(ds_info.features['label'].names[9])
When you use softmax, the labels you train the model on are either numbers 0..n or one-hot encoded values. So if original labels of your data are let's say string names, you must map them to integers first and keep the mapping as a variable (such as 0 -> "apple", 1 -> "orange", 2 -> "pear" ...).
When using integer labels (with loss='sparse_categorical_crossentropy'), the predictions come back as an array of probabilities, and you just find the array index with the maximum value. You can then use this predicted index to reverse-map to your label:
predictedIndex = np.argmax(predictions)  # e.g. 2
predictedLabel = indexToLabelMap[predictedIndex]  # e.g. "pear"
If you use one-hot encoded labels (with loss='categorical_crossentropy'), the predicted index corresponds with the "hot" index of your label.
Just for reference, I needed this info when I was working with MNIST dataset used in Google's Machine learning crash course. There is also a good classification tutorial in the Tensorflow docs.
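The label mapping described above can be sketched without any framework. The helper names here (build_label_maps, argmax, predicted_label) are hypothetical and are not TensorFlow API; in practice np.argmax would replace the small argmax function, and the mapping would be built once from the training labels and saved alongside the model.

```python
def build_label_maps(labels):
    """Assign each distinct string label an integer index, in sorted order."""
    names = sorted(set(labels))
    label_to_index = {name: i for i, name in enumerate(names)}
    index_to_label = {i: name for name, i in label_to_index.items()}
    return label_to_index, index_to_label


def argmax(probabilities):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


def predicted_label(probabilities, index_to_label):
    """Map a vector of class probabilities back to the original label."""
    return index_to_label[argmax(probabilities)]


# Training labels as they appear in the raw data.
raw_labels = ["pear", "apple", "orange", "apple"]
label_to_index, index_to_label = build_label_maps(raw_labels)

# Integer targets to train on with sparse_categorical_crossentropy.
targets = [label_to_index[name] for name in raw_labels]

# A made-up softmax output for one sample.
print(predicted_label([0.1, 0.2, 0.7], index_to_label))
```

The important design point is that the same mapping must be used at training time and at prediction time; rebuilding it from a differently ordered label list would silently change which index means which class.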

The initial value is very different from what I set in tensorflow

I create a neural network with the initial tensors like this
tensor_dict = {
    'model_conv1_weights': tf.get_variable('model_conv1_weights',
        shape=[4, 4, 1, 64],
        initializer=tf.truncated_normal_initializer(mean=10.0, stddev=2.0)),
    'model_conv1_biases': tf.get_variable('model_conv1_biases',
        shape=[64],
        initializer=tf.truncated_normal_initializer(mean=2.0, stddev=1.0)),
    'model_conv2_weights': tf.get_variable('model_conv2_weights',
        shape=[4, 4, 64, 32],
        initializer=tf.truncated_normal_initializer(mean=10.0, stddev=2.0)),
    'model_conv2_biases': tf.get_variable('model_conv2_biases',
        shape=[32],
        initializer=tf.truncated_normal_initializer(mean=2.0, stddev=1.0)),
}
But when I start to train the model, the initial values of those tensors are very different from what I configured.
Has anyone here run into this issue before?

How to train an SVM with opencv based on a set of images?

I have a folder of positive images and another of negative images in JPG format, and I want to train an SVM on those images. I've done the following, but I receive an error:
Mat classes = new Mat();
Mat trainingData = new Mat();
Mat trainingImages = new Mat();
Mat trainingLabels = new Mat();
CvSVM clasificador;
for (File file : new File(path + "positives/").listFiles()) {
    Mat img = Highgui.imread(file.getAbsolutePath());
    img.reshape(1, 1);
    trainingImages.push_back(img);
    trainingLabels.push_back(Mat.ones(new Size(1, 1), CvType.CV_32FC1));
}
for (File file : new File(path + "negatives/").listFiles()) {
    Mat img = Highgui.imread(file.getAbsolutePath());
    img.reshape(1, 1);
    trainingImages.push_back(img);
    trainingLabels.push_back(Mat.zeros(new Size(1, 1), CvType.CV_32FC1));
}
trainingImages.copyTo(trainingData);
trainingData.convertTo(trainingData, CvType.CV_32FC1);
trainingLabels.copyTo(classes);
CvSVMParams params = new CvSVMParams();
params.set_kernel_type(CvSVM.LINEAR);
clasificador = new CvSVM(trainingData, classes, new Mat(), new Mat(), params);
When I try to run that I obtain:
OpenCV Error: Bad argument (train data must be floating-point matrix) in cvCheckTrainData, file ..\..\..\src\opencv\modules\ml\src\inner_functions.cpp, line 857
Exception in thread "main" CvException [org.opencv.core.CvException: ..\..\..\src\opencv\modules\ml\src\inner_functions.cpp:857: error: (-5) train data must be floating-point matrix in function cvCheckTrainData
]
at org.opencv.ml.CvSVM.CvSVM_1(Native Method)
at org.opencv.ml.CvSVM.<init>(CvSVM.java:80)
I can't manage to train the SVM, any idea? Thanks
Assuming that you know what you are doing by reshaping an image and using it to train SVM, the most probable cause of this is that your
Mat img = Highgui.imread(file.getAbsolutePath());
fails to actually read an image, generating a matrix img with null data property, which will eventually trigger the following in the OpenCV code:
// check parameter types and sizes
if( !CV_IS_MAT(train_data) || CV_MAT_TYPE(train_data->type) != CV_32FC1 )
    CV_ERROR( CV_StsBadArg, "train data must be floating-point matrix" );
Basically train_data fails the first condition (being a valid matrix) rather than failing the second condition (being of type CV_32FC1).
In addition, even though reshape works on the *this object, it acts like a filter and its effect is not permanent. If it's used in a single statement without immediately being used or assigned to another variable it will be useless. Change the following lines in your code:
img.reshape(1, 1);
trainingImages.push_back(img);
to:
trainingImages.push_back(img.reshape(1, 1));
Just as the error says, you need to change the type of your matrix from an integer type, probably CV_8U, to a floating-point one, CV_32F or CV_64F. To do this you can use cv::Mat::convertTo(). Here is a bit about depths and types of matrices.
