Why is my yolo model has one extra output? - opencv

I have trained yolo model to detect 24 different classes and now when I'm trying to extract outputs of it, it returns 29 numbers for each prediction. Here they are:
0.605734 0.0720678 0.0147335 0.0434446 0.999661 0 0 0 0.999577 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I suppose that last 24 numbers are scores for each class, first 4 are parameters of bbox, but what is 5th? It is always bigger than 0.9. I'm confused. Please, help me.

It's the probability that the specific box has an object

Related

How Standard CAN win the bus access in Arbitration with EXT CAN?

I read some documents and all of them say that the Std CAN have higher priority than the Ext CAN because the SRR bit is always Recessive in EXT CAN when they have the same ID, but from my understanding it depends.
https://copperhilltech.com/blog/controller-area-network-can-bus-tutorial-extended-can-protocol/
To simplify, let's say we have message ID 0x1(Std CAN) and 0x1(Ext CAN) sending simultaneously on the same bus.
The arbitration field of the Std CAN be compared to Ext CAN should be like this:
Std CAN: 0 0 0 0 0 0 0 0 0 0 1 0 (The bold bit is RTR)
Ext CAN: 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 (The bold bits are SRR, IDE and RTR)
At the 11th bit, The node that sends Std CAN is sending 1 (Recessive bit), and the node that sends Ext CAN is sending 0 (Dominant bit), so the Ext CAN wins the bus access and the node that sends Std CAN switch to listen mode and not sending anything after that, so the SRR and IDE bits never be reached to decide the message is Ext CAN or Std CAN.
Is my above understanding correct?
Thank you in advance,
Yes, an 29 bit frame with RTR set has higher priority than an 11 bit frame without RTR, given the first 11 bits of the identifiers are identical. So saying that standard frames have higher priority than extended is a simplification.
RTR frames is a bit of an oddball case overall, as they may also have varied length in the DLC area even though there's no data at all in the frame.

Create dummies from column with multiple values in dask

My question is similar to this thread Create dummies from column with multiple values in pandas
Objective: I would like to produce similar result below but using dask
In Pandas
import pandas as pd
df = pd.DataFrame({'fruit': ['Banana, , Apple, Dragon Fruit,,,', 'Kiwi,', 'Lemon, Apple, Banana', ',']})
df['fruit'].str.get_dummies(sep=',')
Which will output the following:
Apple Banana Dragon Fruit Banana Kiwi Lemon
0 1 1 0 1 1 0 0
1 0 0 0 0 0 1 0
2 0 1 1 0 0 0 1
3 0 0 0 0 0 0 0
get_dummies() above is of type <pandas.core.strings.StringMethods>
Now the problem is there is no get_dummies() for dask equivalent <dask.dataframe.accessor.StringAccessor>
How can I solve my problem using dask?
Apparently this is not possible in dask as we wouldn't know the output columns before hand. See https://github.com/dask/dask/issues/4403.

How to use lightGBM in multi-classification problem whose target has multi-labels

I get a multi-class classification problem that the samples can have more than one labels. So I want to know how to use lightGBM in such multi-class classification problems.
For examples, the target is as follows:
id label1 label2 label3 label4 label5
0 0 1 1 0 0 0
1 1 0 1 0 0 0
2 2 1 0 0 0 0
3 3 0 0 0 0 0
And I have checked the documentation of lightGBM but it doesn't offer what I want.
So I am wondering how to use lightGBM in such multi-class classification problems.

Feature engineering, handling missing data

Consider this data table
NumberOfAccidents MeanDistance
1 5
3 0
0 NA
0 NA
6 1.2
2 0
the first feature is the number of accidents and the second is the average distance of these accidents to a certain point. It is obvious for a record with zero accident, there won't be a value for MeanDistance. However, imputing these missing values are not logical!
MY SOLUTION: I have decided to discretize the MeanDistance with NAs being a level (bin) and the rest of the data being in bins like: [0,1), [1,2.5), [2.5, Inf). the final table will look like this:
NumberOfAccidents NAs first_bin sec_bin third_bin
1 0 0 0 1
3 0 1 0 0
0 1 0 0 0
0 1 0 0 0
6 0 0 1 0
2 0 1 0 0
What is your idea with these types of missing values that cannot be imputed?
what is your solution to this problem?
It really depends on the domain and what you are trying to predict. Even though your solution is fine, I wouldn't bin the rest of the data as you did. Giving that the NumberOfAccidents feature already tells what MeanDistance have NA values, I would probably just impute 0 into the NA values (for computations) and leave the rest of the data as it is.
Nevertheless, there is no need to limit yourself, just try different approaches and keep the one that boost your KPI (Key Performance Indicator).

Where are class names stored in a machine learning dataset in Python?

I'm learning machine learning using the iris dataset on Python 3.6 with sklearn, and I don't understand where the class names that are being retrieved are stored. In Iris, there are 3 classes, and each class contains 50 observations. You can use several commands to print the classes, and their associated numerical values:
print(iris.target)
print(iris.target_names)
This will result in the output:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
['setosa' 'versicolor' 'virginica']
So as can be seen, the classes are Setosa, Versicolor, and Virginica. What I don't understand is where these class names are being stored, or how they're called upon within the model. If you use the shape command on the data, or target, the result is (150,4) and (150,) meaning there is 150 observations and 4 rows in the data, and 150 rows in the target. I am just not able to bridge the gap with my mind as to where this is coming from, however.
What I don't understand is where the class names are supposed to be stored. If I made a brand new dataset for pokemon types and had ice, fire, water, flying, where could I store these types? Would they be required to be numerical as well, like iris, with 0,1,2,3?
Sklearn uses a custom type of object to store its datasets, exactly so that they can store metadata along with the raw data.
If you load the iris dataset
In [2]: from sklearn import datasets
In [3]: iris = datasets.load_iris()
You can inspect the type of object with type:
In [4]: type(iris)
Out[4]: sklearn.utils.Bunch
You can look at the attributes inside the object with dir:
In [5]: dir(iris)
Out[5]: ['DESCR', 'data', 'feature_names', 'target', 'target_names']
And then use . notation to take a look at the attributes themselves:
In [6]: type(iris.data)
Out[6]: numpy.ndarray
In [7]: type(iris.target)
Out[7]: numpy.ndarray
In [8]: type(iris.feature_names)
Out[8]: list
If you want to mimic this for your own datasets, you will have to define your own custom object type to mimic this structure. That would involve defining your own class.

Resources