Fitting a Support Vector Classifier in scikit-learn with image data produces error - machine-learning

I'm trying to train an SVC classifier for image data. Yet, when I run this code:
classifier = svm.SVC(gamma=0.001)
classifier.fit(train_set, train_set_labels)
I get this error:
ValueError: setting an array element with a sequence.
I read the images into arrays with Matplotlib: plt.imread(image).
The error makes it sound like the data isn't in array form, yet when I check the types of the data and the labels, they're both lists (I append the labels to a list manually):
print(type(train_set))
print(type(train_set_labels))
<class 'list'>
<class 'list'>
If I do a plt.imshow(items[0]) then the image shows correctly in the output.
I also called train_test_split from scikit-learn:
train_set, test_set = train_test_split(items, test_size=0.2, random_state=42)
Example input:
train_set[0]
array([[[212, 134, 34],
[221, 140, 48],
[240, 154, 71],
...,
[245, 182, 51],
[235, 175, 43],
[242, 182, 50]],
[[230, 152, 51],
[222, 139, 47],
[236, 147, 65],
...,
[246, 184, 49],
[238, 179, 43],
[245, 186, 50]],
[[229, 150, 47],
[205, 122, 28],
[220, 129, 46],
...,
[232, 171, 28],
[237, 179, 35],
[244, 188, 43]],
...,
[[115, 112, 103],
[112, 109, 102],
[ 80, 77, 72],
...,
[ 34, 25, 28],
[ 55, 46, 49],
[ 80, 71, 74]],
[[ 59, 56, 47],
[ 66, 63, 56],
[ 48, 45, 40],
...,
[ 32, 23, 26],
[ 56, 47, 50],
[ 82, 73, 76]],
[[ 29, 26, 17],
[ 41, 38, 31],
[ 32, 29, 24],
...,
[ 56, 47, 50],
[ 59, 50, 53],
[ 84, 75, 78]]], dtype=uint8)
Example label:
train_set_labels[0]
'Picasso'
I'm not sure what step I'm missing to get the data in the form that the classifier needs in order to train it. Can anyone see what may be needed?

The error message you are receiving:
ValueError: setting an array element with a sequence,
normally results when you are trying to put a list somewhere that a single value is required. This would suggest to me that your train_set is made up of a list of multidimensional elements, although you do state that your inputs are lists. Would you be able to post an example of your inputs and labels?
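For reference (an illustration, not part of the original answer), this message typically comes from NumPy being asked to build a rectangular numeric array out of ragged nested sequences, which is what scikit-learn does internally when it converts the training data. A minimal sketch that reproduces the same error:
import numpy as np

# Rows of unequal length cannot be packed into a rectangular float array,
# so NumPy raises "setting an array element with a sequence".
try:
    np.array([[1, 2], [3, 4, 5]], dtype=float)
except ValueError as e:
    print(e)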
UPDATE
Yes, it's as I thought. The first element of your training data, train_set[0], is a long list (I can't tell how long), each element of which is itself made up of lists of 3 elements. You are therefore calling the classifier on a list of lists of lists, when the classifier requires a list of lists (m rows, one per training example, each made up of n features). What else is in your train_set array? Is the full data set in train_set[0]? If so, you would need to create a new array with one element for each of the sub-elements of train_set[0], and then I believe your code should run, although I am not too familiar with that classifier. Alternatively you could try running the classifier on train_set[0].
UPDATE 2
I don't have experience with scikit-learn's SVC, so I can't tell you the best way of preprocessing the data to make it acceptable to the algorithm, but one method would be to do as I said previously: for each element of train_set, which is composed of lists of lists, recurse through and place all the elements of each sublist into the list above. For example:
new_train_set = []
for i in range(len(train_set)):
    for j in range(len(train_set[i])):
        new_train_set.append(train_set[i][j])
I would then train with new_train_set and the training labels.
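For image data specifically, the more common preprocessing (a sketch, not part of the original answer) is to flatten each image into a single feature vector so that the training data has shape (n_samples, n_features). Assuming every image in train_set has the same (height, width, 3) shape:
import numpy as np
from sklearn import svm

# Flatten each (H, W, 3) image into one row of H*W*3 features.
X = np.array([img.reshape(-1) for img in train_set])
y = np.array(train_set_labels)

classifier = svm.SVC(gamma=0.001)
classifier.fit(X, y)
If the images differ in size, they would need to be resized to a common shape first, since every row must contain the same number of features.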

Related

darknet mask and anchor values for yolov4

In the README.md of darknet repo https://github.com/AlexeyAB/darknet we have this sentence about anchor boxes:
But you should change indexes of anchors masks= for each [yolo]-layer, so for YOLOv4 the 1st-[yolo]-layer has anchors smaller than 30x30, 2nd smaller than 60x60, 3rd remaining.
It looks like the default anchor boxes for yolov4-sam-mish.cfg are
12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
and the first yolo layer has config:
mask = 0,1,2
Do I understand correctly that this will use those anchors?
(12, 16), (19, 36), (40, 28)
If yes, it seems to contradict the statement, or do I understand it incorrectly?
I'm asking because for my dataset and my image sizes (256, 96) I got those anchors from calc_anchors in darknet
15, 56, 22, 52, 28, 48, 23, 62, 26, 59, 39, 43, 31, 57, 29, 66, 37, 64
and I'm trying to figure out how I should set the masks.
Looks good to me.
12, 16,
19, 36,
40, 28,
36, 75,
76, 55,
72, 146,
142, 110,
192, 243,
459, 401
You may leave the masks as they are. The current config you show will yield higher mAP; supporting documentation here:
https://github.com/WongKinYiu/PartialResidualNetworks/issues/2
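As a side note (an illustration, not darknet code), the way a [yolo] layer's mask indexes into the flat anchors= list can be checked with a few lines of plain Python:
# Pair up the flat anchors= values and pick out the pairs a mask refers to.
anchors = [12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146,
           142, 110, 192, 243, 459, 401]
pairs = list(zip(anchors[0::2], anchors[1::2]))  # [(12, 16), (19, 36), (40, 28), ...]
mask = [0, 1, 2]                                 # first [yolo] layer in the question
print([pairs[i] for i in mask])                  # -> [(12, 16), (19, 36), (40, 28)]
So yes, mask = 0,1,2 selects the first three (width, height) pairs, exactly as the question assumed.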

How can I get output of sentencepiece function as array format?

I am converting words to ID vectors and need the result as plain lists of ints, but I am getting numpy array objects instead.
Can anyone help me with a solution?
def word2idx(statement):
    # here I am using SentencePieceProcessor as sp
    id1 = np.asarray(sp.encode_as_ids(statement)).astype(np.int32)
    return id1

sentence = 'the world', 'hello cherry', 'make me proud'
id2 = [word2idx(s) for s in sentence]
print(id2)
actual output:
[[array([ 34, 1867]), array([ 83, 184, 63, 50, 47, 71, 41]), array([328, 69, 7, 303, 649])]]
Expect output:
[[ 34, 1867], [ 83, 184, 63, 50, 47, 71, 41], [328, 69, 7, 303, 649]]
The problem is that the arrays are of different lengths, so numpy cannot make a tensor out of them.
If you are happy with a list of lists and don't need a numpy array, you can do (dtype=object is needed because the rows have different lengths):
id2 = np.array([[34, 1867], [83, 184, 63, 50, 47, 71, 41]], dtype=object)
id2.tolist()
and get: [[34, 1867], [83, 184, 63, 50, 47, 71, 41]].
If you need a dense numpy array, you need to pad all sequences to the same length. You can do something like:
id2 = np.array([[34, 1867], [83, 184, 63, 50, 47, 71, 41]], dtype=object)
idx = np.zeros((len(id2), max(len(s) for s in id2)))
for i, sent_ids in enumerate(id2):
    idx[i, :len(sent_ids)] = sent_ids
In this case you will get:
array([[ 34., 1867., 0., 0., 0., 0., 0.],
[ 83., 184., 63., 50., 47., 71., 41.]])
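Applied back to the question (a sketch, assuming sp is a loaded SentencePieceProcessor; the model path below is a placeholder): encode_as_ids already returns a plain Python list of ints, so the expected nested-list output falls out by simply not wrapping the result in np.asarray:
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load('your_model.model')  # placeholder path to a trained SentencePiece model

def word2idx(statement):
    # encode_as_ids returns a plain list of ints; no numpy conversion needed
    return sp.encode_as_ids(statement)

sentences = ['the world', 'hello cherry', 'make me proud']
id2 = [word2idx(s) for s in sentences]
# e.g. [[34, 1867], [83, 184, 63, 50, 47, 71, 41], [328, 69, 7, 303, 649]]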

How can I get the values of the 8 neighbors of an image as the third dimension in NumPy

Given 2D image data, for every pixel P1, how can I get the following 3D array out of it?
P9 P2 P3
P8 P1 P4
P7 P6 P5
img[x,y,:] = [P2, P3, P4, P5, P6, P7, P8, P9, P2]
without using a for loop, just NumPy operations (because of performance issues)
Here's one approach with zero padding for boundary elements, using NumPy strides via scikit-image's built-in view_as_windows for efficient sliding-window extraction -
import numpy as np
from skimage.util import view_as_windows as viewW

def patches(a, patch_shape):
    side_size = patch_shape
    ext_size = (side_size[0]-1)//2, (side_size[1]-1)//2
    img = np.pad(a, ([ext_size[0]], [ext_size[1]]), 'constant', constant_values=(0))
    return viewW(img, patch_shape)
Sample run -
In [98]: a = np.random.randint(0,255,(5,6))
In [99]: a
Out[99]:
array([[139, 176, 141, 172, 192, 81],
[163, 115, 7, 234, 72, 156],
[ 75, 60, 9, 81, 132, 12],
[106, 202, 158, 199, 128, 238],
[161, 33, 211, 233, 151, 52]])
In [100]: out = patches(a, [3,3]) # window size = [3,3]
In [101]: out.shape
Out[101]: (5, 6, 3, 3)
In [102]: out[0,0]
Out[102]:
array([[ 0, 0, 0],
[ 0, 139, 176],
[ 0, 163, 115]])
In [103]: out[0,1]
Out[103]:
array([[ 0, 0, 0],
[139, 176, 141],
[163, 115, 7]])
In [104]: out[-1,-1]
Out[104]:
array([[128, 238, 0],
[151, 52, 0],
[ 0, 0, 0]])
If you want a 3D array, you could add a reshape at the end, like so -
out.reshape(a.shape + (9,))
But be mindful that this would create a copy instead of the efficient strided views we would get from the function itself.
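If scikit-image isn't available, newer NumPy (1.20+) offers sliding_window_view, which can express the same idea; a sketch under that assumption, with the same zero padding:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def patches_np(a, patch_shape=(3, 3)):
    # Zero-pad the borders, then take every patch_shape window as a strided view.
    ph, pw = (patch_shape[0] - 1) // 2, (patch_shape[1] - 1) // 2
    padded = np.pad(a, ((ph, ph), (pw, pw)), 'constant', constant_values=0)
    return sliding_window_view(padded, patch_shape)

a = np.random.randint(0, 255, (5, 6))
out = patches_np(a)                       # shape (5, 6, 3, 3)
neighbours = out.reshape(a.shape + (9,))  # 9 values per pixel, centre included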

Finding hamming distance between ORB feature descriptors

I am trying to write a function to match ORB features. I am not using the default matchers (BFMatcher, FLANN matcher) because I just want to match specific features in one image with features in another image.
I saw that the ORB descriptor is a binary array.
My query is how to match 2 features, i.e. how to find the Hamming distance between 2 descriptors?
ORB descriptors:
descriptor1 =[34, 200, 96, 158, 75, 208, 158, 230, 151, 85, 192, 131, 40, 142, 54, 64, 75, 251, 147, 195, 78, 11, 62, 245, 49, 32, 154, 59, 21, 28, 52, 222]
descriptor2 =[128, 129, 2, 129, 196, 2, 168, 101, 60, 35, 83, 18, 12, 10, 104, 73, 122, 13, 2, 176, 114, 188, 1, 198, 12, 0, 154, 68, 5, 8, 177, 128]
Thanks.
ORB descriptors are just 32-byte uchar Mats.
The brute-force and FLANN matchers do some more work than just comparing descriptors, but if that's all you want for now, it would be a straight norm:
Mat descriptor1, descriptor2;
double dist = norm( descriptor1, descriptor2, NORM_HAMMING);
// NORM_HAMMING2 or even NORM_L1 would make sense, too.
// dist is a double, but ofc. you'd only get integer values in this case.
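If you are working from plain Python lists like the ones in the question rather than cv::Mat, here is a small NumPy sketch of the same Hamming distance (XOR the bytes, count the differing bits); cv2.norm with cv2.NORM_HAMMING gives the same value:
import numpy as np

descriptor1 = [34, 200, 96, 158, 75, 208, 158, 230, 151, 85, 192, 131, 40, 142, 54, 64,
               75, 251, 147, 195, 78, 11, 62, 245, 49, 32, 154, 59, 21, 28, 52, 222]
descriptor2 = [128, 129, 2, 129, 196, 2, 168, 101, 60, 35, 83, 18, 12, 10, 104, 73,
               122, 13, 2, 176, 114, 188, 1, 198, 12, 0, 154, 68, 5, 8, 177, 128]

d1 = np.array(descriptor1, dtype=np.uint8)
d2 = np.array(descriptor2, dtype=np.uint8)

# XOR leaves a 1 bit wherever the descriptors differ; unpackbits + sum counts them.
hamming = int(np.unpackbits(d1 ^ d2).sum())
print(hamming)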

Best JSON Data Points for $N/MultistrokeGestureRecognizer-iOS?

I am attempting to use the Objective-C MultistrokeGestureRecognizer-iOS library to detect what shape the user is drawing on the screen of their device. This library uses the $N Multistroke Recognizer's algorithm for determining the correlation between a user's drawing and glyphs written in JSON.
I have tried to determine when a user draws a Circle, Square, Triangle, or Diamond based on the float returned by the WTMGlyphDetector, but it seems to always return horribly inaccurate results.
For example, when simply drawing a dot on the screen, it will return a higher similarity to a diamond than when a user actually attempts to draw an accurate diamond.
I have tried to use a couple different complexity levels of JSON to see if the results would change. For example, with the Square shape, I started off with this JSON object:
[
[
[ 27,19], [348,19], [347,343], [22,344], [23,18]
]
]
Which did not seem to return the results I wanted, probably because it was far too vague. I switched to a much more detailed JSON Object like this one:
[
[
[226, 12], [ 285, 300], [ 214, 12], [ 443, 26], [ 451, 29], [ 102, 111],
[196, 297], [ 394, 299], [ 223, 297], [ 95, 302], [ 108, 87], [ 176, 13],
[ 487, 299], [ 449, 28], [ 103, 99], [ 253, 12], [ 369, 12], [ 458, 269],
[ 150, 297], [ 287, 12], [ 459, 269], [ 216, 297], [ 156, 297], [ 118, 16],
[ 116, 306], [ 236, 300], [ 232, 299], [ 115, 16], [ 399, 15], [ 243, 300],
[ 307, 12], [ 425, 299], [ 107, 89], [ 98, 297], [ 128, 303], [ 101, 284],
[ 246, 300], [ 460, 269], [ 376, 299], [ 163, 297], [ 452, 29], [ 109, 65],
[ 461, 294], [ 148, 299], [ 114, 16], [ 143, 300], [ 450, 29], [ 256, 300],
[ 453, 86], [ 136, 301], [ 113, 16], [ 111, 55], [ 147, 15], [ 456, 106],
[ 185, 297], [ 173, 297], [ 121, 305], [ 157, 14], [ 322, 299], [ 495, 299],
[ 302, 300], [ 473, 299], [ 112, 39], [ 494, 299], [ 340, 12]
]
]
Which still returned poor results. I also tried loading just one JSON object compared to loading all four, and there did not seem to be a noticeable difference in the quality of the result returned.
Am I using the library incorrectly? Is there a preferable setup in terms of accuracy of the JSON objects and/or how many of them you load into the WTMGlyphDetector?
EDIT: I found a pretty good balance
Using Plotly I graphed all of the JSON points and tried several different styles of writing the JSON until I found something that works. At first I tried using very complex shapes with tons of points, as shown above. These complex JSON objects were generated with an app I built solely for this purpose: it printed out the JSON for whatever the user drew on the screen, so I could feed it to the WTMGlyphDetector.
Next I tried using simple data generated in an app bundled with the MultistrokeGestureRecognizer-iOS GitHub repo, but only after using Plotly did I find out (much later) that this app outputted the data upside down! I manually reversed all of the datapoints and found that the app generally now worked.
I found that the scores returned by the WTMGlyphDetector were somewhere in the range of 2 to 3 when a shape was drawn properly and 1 to 2 when drawn incorrectly. I decided to use 1.75 as my threshold for properly drawn shapes because I found (for the most part while testing) that this was the sweet spot separating the bad/incorrect drawings from the genuine attempts.
The only downside to the WTMGlyphDetector is that it takes shapes quite literally. For example, if you drew a triangle but overshot past the tip, the WTMGlyphDetector would not reliably identify the shape (it would only work a percentage of the time).
Anyhow, I hope this write-up helps anyone who encounters the same issue as I did.
TL;DR: Simple JSON objects are better; avoid the bundled point generator and use a service such as Plotly instead. From my results, a float over 1.75 meant that the user drew the shape correctly, and below 1.75 meant they drew it incorrectly.
