Implementing WeightedRandomSampler on imbalanced data set: RuntimeError: invalid multinomial distribution

I am trying to implement a weighted sampler for a very imbalanced data set. There are 182 different classes. Here is an array of the bin counts per class:
array([69487, 5770, 5753, 138, 4308, 10, 1161, 29, 5611,
350, 7, 183, 218, 4, 3, 3872, 5, 950,
33, 3, 443, 16, 20, 330, 4353, 186, 19,
122, 546, 6, 44, 6, 3561, 2186, 3, 48,
8440, 338, 9, 610, 74, 236, 160, 449, 72,
6, 37, 1729, 2255, 1392, 12, 1, 3426, 513,
44, 3, 28, 12, 9, 27, 5, 75, 15,
3, 21, 549, 7, 25, 871, 240, 128, 28,
253, 62, 55, 12, 8, 57, 16, 99, 6,
5, 150, 7, 110, 8, 2, 1296, 70, 1927,
470, 1, 1, 511, 2, 620, 946, 36, 19,
21, 39, 6, 101, 15, 7, 1, 90, 29,
40, 14, 1, 4, 330, 1099, 1248, 1146, 7414,
934, 156, 80, 755, 3, 6, 6, 9, 21,
70, 219, 3, 3, 15, 15, 12, 69, 21,
15, 3, 101, 9, 9, 11, 6, 32, 6,
32, 4422, 16282, 12408, 2959, 3352, 146, 1329, 1300,
3795, 90, 1109, 120, 48, 23, 9, 1, 6,
2, 1, 11, 5, 27, 3, 7, 1, 3,
70, 1598, 254, 90, 20, 120, 380, 230, 180,
10, 10])
In some classes, instances are as low as 1. I am trying to implement torch's WeightedRandomSampler for this dataset. However, because the class imbalance is so large, when I calculate weights using
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

count_occr = np.bincount(dataset.y)
lbl_weights = 1. / count_occr
weights = np.array(lbl_weights)
weights = torch.from_numpy(weights)
sampler = WeightedRandomSampler(weights.type('torch.DoubleTensor'), len(weights*2))
I get two error messages:
RuntimeWarning: divide by zero encountered in true_divide
and
RuntimeError: invalid multinomial distribution (encountering probability entry = infinity or NaN)
Does anyone have a workaround for this? I was considering multiplying lbl_weights by some scalar, but I am not sure whether that is a viable option.
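A minimal sketch of one common workaround (not from the thread), assuming dataset.y holds integer class labels: np.bincount returns 0 for any class id that never occurs, so 1. / count_occr produces inf entries, which is exactly what the multinomial error rejects. Note also that WeightedRandomSampler expects one weight per sample, not per class, and that len(weights*2) equals len(weights), since weights*2 scales the tensor elementwise rather than doubling its length.
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

count_occr = np.bincount(dataset.y)
count_occr = np.maximum(count_occr, 1)   # guard empty classes: avoids 1/0 -> inf
lbl_weights = 1.0 / count_occr
sample_weights = lbl_weights[dataset.y]  # one weight per sample, not per class
sampler = WeightedRandomSampler(
    torch.as_tensor(sample_weights, dtype=torch.double),
    num_samples=len(sample_weights),
    replacement=True,
)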

Related

How to convert 8-bit unsigned int data to signed?

I am getting a List of 8-bit unsigned ints from a mic source for each sample, which looks like this:
[61, 251, 199, 251, 56, 252, 138, 252, 211, 252, 18, 253, 91, 253, 194, 253, 25, 254, 54, 254, 19, 254, 190, 253, 80, 253, 249, 252, 233, 252, 46, 253, 180, 253, 54, 254, 136, 254, 157, 254, 110, 254, 38, 254, 208, 253, 117, 253, 68, 253, 57, 253, 83, 253, 163, 253, 20, 254, 151, 254, 51, 255, 215, 255, 105, 0, 207, 0, 246, 0, 249, 0, 10, 1, 64, 1, 162, 1, 4, 2, 64, 2, 97, 2, 111, 2, 110, 2, 89, 2, 40, 2, 241, 1, 199, 1, 178, 1, 192, 1, 241, 1, 45, 2, 77, 2, 70, 2, 45, 2, 36, 2, 83, 2, 176, 2, 21, 3, 121, 3, 229, 3, 87, 4, 185, 4, 225, 4, 197, 4, 129, 4, 26, 4, 150, 3, 7, 3, 128, 2, 55, 2, 65, 2, 134, 2, 223, 2, 25, 3, 41, 3, 28, 3, 255, 2, 234, 2, 240, 2, 25, 3, 62, 3, 92, 3, 146, 3, 219, 3, 65, 4, 149, 4, 164, 4, 130, 4, 51, 4, 195, 3, 69, 3, 164, 2, 244, 1, 75, 1, 187, 0, 81, 0, 240, 255, 135, 255, 19, 255, 155, 254, 64, 254, 22, 254, 58, 254, 146, 254, 217, 254, 248, 254, 215, 254, 144, 254, 92, 254, 84, 254, 141, 254, 229, 254, 39, 255, 96, 255, 170, 255, 248, 255, 69, 0, 117, 0, 128, 0, 137, 0, 131, 0,
So how can I convert this into signed decimal values, or can someone guide me to the right path?
That depends on what the bytes mean.
Looking at the bytes, every other byte is either very low or very high. That suggests to me that the bytes are really little-endian signed 16-bit values.
In that case, you just need to view them as such. If we assume that the platform is little-endian (most are), you can just do:
import 'dart:typed_data';

List<int> list = ...;
Uint8List bytes = Uint8List.fromList(list); // copy into a typed byte list
Int16List words = Int16List.sublistView(bytes); // reinterpret pairs of bytes as 16-bit values
Then the words list contains signed 16-bit numbers.
(If the list is already a Uint8List, you can skip the first step.)
If that's not what the bytes mean, you'll have to figure out what they do mean.
The Dart int type provides methods to convert from signed to unsigned and from unsigned to signed.
For example:
int a = 16;
int b = 239;
print(a.toSigned(5)); // prints -16
print(b.toSigned(5)); // prints 15
The toSigned parameter is the bit width of the value; the highest retained bit becomes the sign bit.
You can get more information here: https://api.flutter.dev/flutter/dart-core/int/toSigned.html
A toUnsigned method exists too: https://api.flutter.dev/flutter/dart-core/int/toUnsigned.html

Dart Bech32 and Hex encoding and decoding

I'm trying to decode this Bech32 address into hex.
When given cosmos1qpjrq625nglf3xx9chdkq953nhrd3nygte44rt, it breaks it down into its head, which is 'cosmos', and the remainder, which is represented as a List of 8-bit unsigned integers (Uint8List).
When this is encoded to hexadecimal (HEX.encode), I get a value of 00011203001a0a1413081f091106060518170d16000514111317030d11130408.
However, it is meant to be getting me 00643069549a3e9898c5c5db6016919dc6d8cc88 instead.
You can check this if you go to https://slowli.github.io/bech32-buffer/ -> and decode cosmos1qpjrq625nglf3xx9chdkq953nhrd3nygte44rt which gives 00643069549a3e9898c5c5db6016919dc6d8cc88.
I can't figure out the issue. Is the formatting wrong (different bases), or am I doing this completely wrong?
Thanks, and I appreciate any replies.
Here is a snippet of code:
import 'package:bech32/bech32.dart';
import 'package:hex/hex.dart';
Bech32Codec bech32codec = Bech32Codec();
// target address : 00643069549a3e9898c5c5db6016919dc6d8cc88 -> to get to this address
String address = 'cosmos1qpjrq625nglf3xx9chdkq953nhrd3nygte44rt';
Bech32 bech32 = bech32codec.decode(address);
print(bech32.data);
// this returns [0, 1, 18, 3, 0, 26, 10, 20, 19, 8, 31, 9, 17, 6, 6, 5, 24, 23, 13, 22, 0, 5, 20, 17, 19, 23, 3, 13, 17, 19, 4, 8]
print(bech32.hrp);
print(bech32codec.encode(Bech32("cosmos", bech32.data)));
var answer2 = HEX.encode(bech32.data);
print(answer2);
var decode = HEX.decode('00643069549a3e9898c5c5db6016919dc6d8cc88');
print(decode);
// this returns [0, 100, 48, 105, 84, 154, 62, 152, 152, 197, 197, 219, 96, 22, 145, 157, 198, 216, 204, 136]
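What the thread does not spell out, but the numbers imply: bech32codec.decode returns the payload as 5-bit words (every value in bech32.data is below 32), not as bytes, so hex-encoding that list directly gives the wrong string. The 32 five-bit words pack into 160 bits, exactly the expected 20-byte address. A minimal sketch of the 5-to-8-bit regrouping, shown in Python for brevity (a Dart port is mechanical):
def convertbits(data, frombits=5, tobits=8):
    # Regroup a list of 5-bit bech32 words into 8-bit bytes.
    acc, bits, out = 0, 0, []
    maxv = (1 << tobits) - 1
    for value in data:
        acc = (acc << frombits) | value
        bits += frombits
        while bits >= tobits:
            bits -= tobits
            out.append((acc >> bits) & maxv)
    return out

words = [0, 1, 18, 3, 0, 26, 10, 20, 19, 8, 31, 9, 17, 6, 6, 5,
         24, 23, 13, 22, 0, 5, 20, 17, 19, 23, 3, 13, 17, 19, 4, 8]
print(bytes(convertbits(words)).hex())
# -> 00643069549a3e9898c5c5db6016919dc6d8cc88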

How to create an array from 1 to n with a single line of code in Ruby [duplicate]

This question already has answers here:
Create array of n items based on integer value
(6 answers)
Closed 4 years ago.
I need to create an array of the numbers 1 to n with a single line of code in Ruby.
I have tried it using a while loop, but I'm sure there are simpler ways of doing this in Ruby.
a = []
b = 1
while b < 100 do
  a << b
  b += 1
end
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
Convert a range into an array.
(1..n).to_a
Another way: you can just splat a range:
[*1..n]
Example:
[*1..10]
=>[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Or:
a = Array(1..10)
p a # => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Upgrade to OTP 18 breaks usage of public_key library

Building a pem file in Elixir requires several steps, including building an entity. In OTP 17, the following works:
{public, private} = :crypto.generate_key(:ecdh, :secp256k1)
ec_entity = {:ECPrivateKey,
1,
:binary.bin_to_list(private),
{:namedCurve, {1, 3, 132, 0, 10}},
{0, public}}
der_encoded = :public_key.der_encode(:ECPrivateKey, ec_entity)
pem = :public_key.pem_encode([{:ECPrivateKey, der_encoded, :not_encrypted}])
But using OTP 18, the following error occurs:
{public, private} = :crypto.generate_key(:ecdh, :secp256k1)
ec_entity = {:ECPrivateKey,
1,
:binary.bin_to_list(private),
{:namedCurve, {1, 3, 132, 0, 10}},
{0, public}}
der_encoded = :public_key.der_encode(:ECPrivateKey, ec_entity)
** (MatchError) no match of right hand side value: {:error, {:asn1, :badarg}}
public_key.erl:253: :public_key.der_encode/2
What is the source of this error?
The source of the error is a change in the way that the public_key entity is constructed between OTP 17 and OTP 18. If we reverse the process, starting with a pem file, we can see the difference.
OTP 17:
iex(6)> pem = "-----BEGIN EC PRIVATE KEY-----\nMHQCAQEEIJniJF4vtTqE4wS5AkhmMZsHIbil0l3XfRButkw5IJYFoAcGBSuBBAAK\noUQDQgAEtxm+jijBB0JxZTceHnCHE0HpMXJp1ScVUZ5McvDUVsS/Dek8IdAsMOPz\nnnVALflZzXtH/wU9p2LrFdJeuXwL8g==\n-----END EC PRIVATE KEY-----\n\n"
"-----BEGIN EC PRIVATE KEY-----\nMHQCAQEEIJniJF4vtTqE4wS5AkhmMZsHIbil0l3XfRButkw5IJYFoAcGBSuBBAAK\noUQDQgAEtxm+jijBB0JxZTceHnCHE0HpMXJp1ScVUZ5McvDUVsS/Dek8IdAsMOPz\nnnVALflZzXtH/wU9p2LrFdJeuXwL8g==\n-----END EC PRIVATE KEY-----\n\n"
iex(7)> [{type, decoded, _}] = :public_key.pem_decode(pem)
[{:ECPrivateKey,
<<48, 116, 2, 1, 1, 4, 32, 153, 226, 36, 94, 47, 181, 58, 132, 227, 4, 185, 2, 72, 102, 49, 155, 7, 33, 184, 165, 210, 93, 215, 125, 16, 110, 182, 76, 57, 32, 150, 5, 160, 7, 6, 5, 43, 129, 4, 0, 10, ...>>,
:not_encrypted}]
iex(8)> :public_key.der_decode(type, decoded)
{:ECPrivateKey, 1,
[153, 226, 36, 94, 47, 181, 58, 132, 227, 4, 185, 2, 72, 102, 49, 155, 7, 33,
184, 165, 210, 93, 215, 125, 16, 110, 182, 76, 57, 32, 150, 5],
{:namedCurve, {1, 3, 132, 0, 10}},
{0,
<<4, 183, 25, 190, 142, 40, 193, 7, 66, 113, 101, 55, 30, 30, 112, 135, 19, 65, 233, 49, 114, 105, 213, 39, 21, 81, 158, 76, 114, 240, 212, 86, 196, 191, 13, 233, 60, 33, 208, 44, 48, 227, 243, 158, 117, ...>>}}
OTP 18:
iex(5)> [{type, decoded, _}] = :public_key.pem_decode(pem)
[{:ECPrivateKey,
<<48, 116, 2, 1, 1, 4, 32, 153, 226, 36, 94, 47, 181, 58, 132, 227, 4, 185, 2, 72, 102, 49, 155, 7, 33, 184, 165, 210, 93, 215, 125, 16, 110, 182, 76, 57, 32, 150, 5, 160, 7, 6, 5, 43, 129, 4, 0, 10, ...>>,
:not_encrypted}]
iex(6)> entity = :public_key.der_decode(type, decoded)
{:ECPrivateKey, 1,
<<153, 226, 36, 94, 47, 181, 58, 132, 227, 4, 185, 2, 72, 102, 49, 155, 7, 33, 184, 165, 210, 93, 215, 125, 16, 110, 182, 76, 57, 32, 150, 5>>,
{:namedCurve, {1, 3, 132, 0, 10}},
<<4, 183, 25, 190, 142, 40, 193, 7, 66, 113, 101, 55, 30, 30, 112, 135, 19, 65, 233, 49, 114, 105, 213, 39, 21, 81, 158, 76, 114, 240, 212, 86, 196, 191, 13, 233, 60, 33, 208, 44, 48, 227, 243, 158, 117, 64, ...>>}
The difference is in how the public and private keys are represented.
The shape of an ECPrivateKey record is:
#'ECPrivateKey'{version, privateKey, parameters, publicKey}
In OTP 18, both key fields are represented as plain binaries; in OTP 17, the private key is a list of bytes and the public key is part of a tuple, {0, binary}.
So in order to build the pem file correctly, the entity representation has to change.
{public, private} = :crypto.generate_key(:ecdh, :secp256k1)
entity = {:ECPrivateKey,
1,
private,
{:namedCurve, {1, 3, 132, 0, 10}},
public}
Using the new representation of the record will solve the problem.
I didn't really check why your version works on some releases, but I've got some code that works on all these Erlang versions: 19.0, 18.2.1, 18.1, 18.0, 17.5, and R16B03 (running on Travis).
-include_lib("public_key/include/public_key.hrl").
genPEMKey() ->
CurveId = secp256k1,
{PubKey, PrivKey} = crypto:generate_key(ecdh, CurveId),
Key = #'ECPrivateKey'{version = 1,
privateKey = PrivKey,
parameters = {
namedCurve,
pubkey_cert_records:namedCurves(CurveId)},
publicKey = PubKey},
DERKey = public_key:der_encode('ECPrivateKey', Key),
public_key:pem_encode([{'ECPrivateKey', DERKey, not_encrypted}]).
This piece of code was based on the examples found in the OTP codebase:
https://github.com/erlang/otp/blob/master/lib/public_key/test/erl_make_certs.erl#L407

Improve precision algorithm to detect facial expression using LBP

I'm developing a simple algorithm to detect several facial expressions (happiness, sadness, anger...), based on this paper. As preprocessing, I divide the normalized image into 6x6 regions before applying the uniform LBP operator (the original post illustrates the grid with an example image).
Applying uniform LBP extracts 59 features per region, so I end up with 2124 features per image (6 x 6 x 59). I think that is too many features when I have only about 700 images to train a model, and I have read that this hurts precision. My question is: how can I reduce the dimensionality of the features, or what other technique can I use to improve the precision of the algorithm?
A straightforward way to reduce feature dimensionality, and increase robustness at the same time, is to use rotation-invariant uniform patterns. For a circular neighbourhood of P = 8 sampling points (the configuration that yields your 59 uniform bins), the rotation-invariant uniform descriptor represents each region through P + 2 = 10 features. Thus dimensionality is reduced from 2124 to 6 × 6 × 10 = 360.
PCA can help to reduce the size of the descriptor without losing important information. Just google "opencv pca example".
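As a concrete illustration, here is a minimal PCA sketch using scikit-learn (cv::PCA works analogously); X is a random stand-in for the (n_images, 2124) matrix of concatenated LBP histograms:
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(700, 2124).astype(np.float32)  # stand-in for real LBP descriptors
pca = PCA(n_components=0.95)  # keep components explaining 95% of the variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (700, k); on real, correlated descriptors k << 2124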
Another helpful thing is to add rotation invariance to your uniform LBP features. This will improve precision as well as dramatically decrease the size of the descriptor, from 59 to 10 bins per region.
static cv::Mat rotate_table = (cv::Mat_<uchar>(1, 256) <<
0, 1, 1, 3, 1, 5, 3, 7, 1, 9, 5, 11, 3, 13, 7, 15, 1, 17, 9, 19, 5, 21, 11, 23,
3, 25, 13, 27, 7, 29, 15, 31, 1, 33, 17, 35, 9, 37, 19, 39, 5, 41, 21, 43, 11,
45, 23, 47, 3, 49, 25, 51, 13, 53, 27, 55, 7, 57, 29, 59, 15, 61, 31, 63, 1,
65, 33, 67, 17, 69, 35, 71, 9, 73, 37, 75, 19, 77, 39, 79, 5, 81, 41, 83, 21,
85, 43, 87, 11, 89, 45, 91, 23, 93, 47, 95, 3, 97, 49, 99, 25, 101, 51, 103,
13, 105, 53, 107, 27, 109, 55, 111, 7, 113, 57, 115, 29, 117, 59, 119, 15, 121,
61, 123, 31, 125, 63, 127, 1, 3, 65, 7, 33, 97, 67, 15, 17, 49, 69, 113, 35,
99, 71, 31, 9, 25, 73, 57, 37, 101, 75, 121, 19, 51, 77, 115, 39, 103, 79, 63,
5, 13, 81, 29, 41, 105, 83, 61, 21, 53, 85, 117, 43, 107, 87, 125, 11, 27, 89,
59, 45, 109, 91, 123, 23, 55, 93, 119, 47, 111, 95, 127, 3, 7, 97, 15, 49, 113,
99, 31, 25, 57, 101, 121, 51, 115, 103, 63, 13, 29, 105, 61, 53, 117, 107, 125,
27, 59, 109, 123, 55, 119, 111, 127, 7, 15, 113, 31, 57, 121, 115, 63, 29, 61,
117, 125, 59, 123, 119, 127, 15, 31, 121, 63, 61, 125, 123, 127, 31, 63, 125,
127, 63, 127, 127, 255
);
// the well known original uniform2 pattern
static cv::Mat uniform_table = (cv::Mat_<uchar>(1, 256) <<
0,1,2,3,4,58,5,6,7,58,58,58,8,58,9,10,11,58,58,58,58,58,58,58,12,58,58,58,13,58,
14,15,16,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,17,58,58,58,58,58,58,58,18,
58,58,58,19,58,20,21,22,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,
58,58,58,58,58,58,58,58,58,58,58,58,23,58,58,58,58,58,58,58,58,58,58,58,58,58,
58,58,24,58,58,58,58,58,58,58,25,58,58,58,26,58,27,28,29,30,58,31,58,58,58,32,58,
58,58,58,58,58,58,33,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,34,58,58,58,58,
58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,
58,35,36,37,58,38,58,58,58,39,58,58,58,58,58,58,58,40,58,58,58,58,58,58,58,58,58,
58,58,58,58,58,58,41,42,43,58,44,58,58,58,45,58,58,58,58,58,58,58,46,47,48,58,49,
58,58,58,50,51,52,58,53,54,55,56,57
);
static cv::Mat rotuni_table = (cv::Mat_<uchar>(1, 256) <<
0, 1, 1, 2, 1, 9, 2, 3, 1, 9, 9, 9, 2, 9, 3, 4, 1, 9, 9, 9, 9, 9, 9, 9, 2, 9, 9, 9,
3, 9, 4, 5, 1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 2, 9, 9, 9, 9, 9, 9, 9,
3, 9, 9, 9, 4, 9, 5, 6, 1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 2, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
3, 9, 9, 9, 9, 9, 9, 9, 4, 9, 9, 9, 5, 9, 6, 7, 1, 2, 9, 3, 9, 9, 9, 4, 9, 9, 9, 9,
9, 9, 9, 5, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 6, 9, 9, 9, 9, 9, 9, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 7, 2, 3, 9, 4,
9, 9, 9, 5, 9, 9, 9, 9, 9, 9, 9, 6, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 7,
3, 4, 9, 5, 9, 9, 9, 6, 9, 9, 9, 9, 9, 9, 9, 7, 4, 5, 9, 6, 9, 9, 9, 7, 5, 6, 9, 7,
6, 7, 7, 8
);
static void hist_patch_uniform(const cv::Mat_<uchar> &fI, cv::Mat &histo,
                               int histSize, bool norm, bool rotinv)
{
cv::Mat ufI, h, n;
if (rotinv) {
cv::Mat r8;
// rotation invariant transform
cv::LUT(fI, rotate_table, r8);
// uniformity for rotation invariant
cv::LUT(r8, rotuni_table, ufI);
// histSize is max 10 bins
} else {
cv::LUT(fI, uniform_table, ufI);
}
// the upper boundary is exclusive
float range[] = {0, (float)histSize};
const float *histRange = {range};
cv::calcHist(&ufI, 1, 0, cv::Mat(), h, 1, &histSize, &histRange, true, false);
if (norm)
    cv::normalize(h, n);
else
    n = h;
histo.push_back(n.reshape(1, 1));
}
The input is your CV_8U grey-scale patch (one of those rects). The output is the rotation-invariant, uniform, normalized histogram, reshaped to a single row. You then concatenate the patch histograms into the face descriptor, giving 6*6*10 = 360 dimensions. This is good by itself, but with PCA you can reduce it to 300 or fewer without losing important information, and even improve detection quality, because the removed dimensions (say, those with variance below 5%) not only occupy space but also contain mostly noise (coming, for example, from Gaussian sensor noise).
Then you can compare this concatenated histogram against a bank of faces, or use an SVM (an RBF kernel fits better). If you do it correctly, prediction for one face should not take more than 1-15 ms (5 ms on my iPhone 7).
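A minimal sketch of that classifier step, using scikit-learn with synthetic stand-in data (the SVC names and parameters are scikit-learn's, not from the answer; OpenCV's ml::SVM offers the same RBF option):
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((700, 360))              # stand-in for the 6*6*10 face descriptors
y = rng.integers(0, 7, size=700)        # say, 7 expression classes
clf = SVC(kernel="rbf", gamma="scale")  # RBF kernel, as suggested above
clf.fit(X, y)
print(clf.predict(X[:5]))               # predicted expression labels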
Hope this helps.
