What is the difference between classification_report and a normalized confusion matrix? Why does the last class look inaccurate in classification_report but most accurate in the normalized confusion matrix?
(Screenshots omitted.)
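Without the screenshots it is hard to say exactly, but a small made-up sketch may show how the two relate: with normalize='true' each row of the confusion matrix is divided by the number of true samples of that class, so the diagonal equals the per-class recall, which is only one of the three numbers classification_report prints per class. A class can therefore look "most accurate" on the diagonal while having mediocre precision and f1-score.
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Made-up labels purely for illustration (not taken from the screenshots)
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 2, 1, 1, 2, 2, 2, 2, 2])

print(classification_report(y_true, y_pred))

# normalize='true' makes each row sum to 1, so the diagonal entries are exactly
# the recall values from classification_report; here class 2 has recall 1.0
# (brightest diagonal cell) but precision of only about 0.67
print(confusion_matrix(y_true, y_pred, normalize='true'))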
I've trained a U-Net to segment leaf images showing a symptom of an agricultural pest, so I have 3 classes: background (0), leaf (1), and the symptom (3).
When I predict all the test images, I get black 8-bit images.
But when I open them in ImageJ and change the type to 16-bit, I can see the image and all 3 predicted classes; it is no longer a black square...
How can I predict the masks correctly?
All test and train images are RGB, and all masks are 8-bit (but there, the classes are visible).
import numpy as np
import cv2
import os
import glob

# Predict class probabilities for the whole test set once, then take the
# argmax over the channel axis to get per-pixel class indices
y_pred = model.predict(test_images)
y_pred_argmax = np.argmax(y_pred, axis=3)

# Save each predicted mask; the pixel values are the raw class indices
# (0, 1, 2, ...), which look almost black in an 8-bit viewer
img_number = 1
for image in range(test_images.shape[0]):
    prediction = y_pred_argmax[image]
    cv2.imwrite('/content/drive/MyDrive/BD_filtred/predicted'+str(img_number)+".png", prediction)
    img_number += 1
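One thing that may help when eyeballing the saved files (a sketch, reusing y_pred_argmax from the snippet above; the scale factor and the predicted_vis filename are purely illustrative assumptions): stretch the class indices to the full 8-bit range so the classes become visible in an ordinary viewer, while keeping the unscaled masks for training and metrics.
import numpy as np
import cv2

n_classes = 3                    # assumption: background, leaf, symptom
scale = 255 // (n_classes - 1)   # maps class indices 0, 1, 2 to 0, 127, 254

for i in range(y_pred_argmax.shape[0]):
    # Scaled copy purely for visualisation; do not use it as a training mask
    visible = (y_pred_argmax[i] * scale).astype(np.uint8)
    cv2.imwrite('/content/drive/MyDrive/BD_filtred/predicted_vis'+str(i+1)+".png", visible)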
There is a Naive Bayesian classifier which is created with given training data. In the table, the predicted positive-class probabilities and the actual class labels are shown. I want to prepare the confusion matrix, but I cannot figure out how to do it knowing only the probabilities.
ID | Actual class label | Predicted positive class probability
---|--------------------|-------------------------------------
 1 | +                  | 0.6
 2 | +                  | 0.8
 3 | -                  | 0.2
 4 | +                  | 0.3
 5 | -                  | 0.4
First, you need discrete class labels to compute a confusion matrix. Define a threshold on the predicted positive class probability to obtain predicted class labels (y_pred).
You can then use actual class labels (y_actual) and y_pred to compute the confusion matrix.
from sklearn.metrics import confusion_matrix
confusion_matrix(y_actual, y_pred)
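For instance, with the five rows from the table and an assumed threshold of 0.5 (the threshold is a free choice, not given in the question), a minimal sketch could look like this:
import numpy as np
from sklearn.metrics import confusion_matrix

# Data from the table: actual labels and predicted positive-class probabilities
y_actual = np.array(['+', '+', '-', '+', '-'])
probs = np.array([0.6, 0.8, 0.2, 0.3, 0.4])

threshold = 0.5                                  # assumption
y_pred = np.where(probs >= threshold, '+', '-')  # discrete predicted labels

# Rows are actual classes, columns are predicted classes
print(confusion_matrix(y_actual, y_pred, labels=['+', '-']))
# -> [[2 1]
#     [0 2]]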
I have the following mask of cell nuclei, and my goal is to segment them. However, using what seems to be a very standard approach,
import numpy as np
import scipy.ndimage
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from skimage.segmentation import watershed
from skimage.feature import peak_local_max
from skimage import measure
# load mask
mask = mpimg.imread('mask.png')
# find distance to nearest border
distance = scipy.ndimage.distance_transform_edt(mask)
# find local maxima based on distance to border
local_maxi = peak_local_max(distance, indices=False, footprint=np.ones((125, 125)), labels=mask)
# generate markers for regions
markers = measure.label(local_maxi)
# watershed segmentation
labeled = watershed(-distance, markers, mask=mask, watershed_line = True)
# plot figure
fig, axs = plt.subplots()
axs.imshow(labeled, cmap='flag')
some large, connected components are left unsegmented, while smaller, unconnected components become over-segmented.
Thoroughly browsing answers on Stack Overflow, I haven't been able to find a discussion of which parameters drive "under-segmentation" vs "over-segmentation" in the skimage.segmentation.watershed algorithm.
Which parameter most strongly influences "oversegmentation" in the watershed algorithm? My intuition tells me it could be the footprint size, or the distance transform. What is the most critical parameter that determines the segmentation neighbourhood?
EDIT1: Below I have included the distance transform, the filtering of which others have pointed out is a critically important step. However, I am still unable to diagnose the symptoms of a "bad" distance transform, and I am unaware of rules of thumb for filtering said transform.
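(For context, one rule of thumb that comes up for filtering the distance transform is to smooth it before peak detection, so that a single nucleus yields a single maximum. A minimal sketch, reusing distance and mask from the snippet above; the sigma value is an assumption and needs tuning to the nucleus size.)
import numpy as np
import scipy.ndimage
from skimage.feature import peak_local_max

# Smoothing removes the small ridges in the distance map that would otherwise
# each become a separate marker and hence a separate watershed region
smooth_distance = scipy.ndimage.gaussian_filter(distance, sigma=10)
local_maxi = peak_local_max(smooth_distance, indices=False,
                            footprint=np.ones((125, 125)), labels=mask)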
In your particular case, the origin of some of your over-segmentation is in the result of peak_local_max().
If you run the following code you will be able to see which local maxima are selected for your image. I'm using OpenCV to plot the dots; you may want to adapt it for another library.
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Coordinates of the detected local maxima (local_maxi is the boolean
# array returned by peak_local_max with indices=False)
localMax_idx = np.where(local_maxi)

# Convert the mask to BGR so the markers can be drawn in colour
localMax_img = mask.copy()
localMax_img = cv2.cvtColor(localMax_img, cv2.COLOR_GRAY2BGR)
for i in range(localMax_idx[0].shape[0]):
    x = localMax_idx[1][i]
    y = localMax_idx[0][i]
    localMax_img = cv2.circle(localMax_img, (x, y), radius=5, color=(255, 0, 0), thickness=-1)
plt.imshow(localMax_img)
plt.show()
You will see that there are multiple markers for over-segmented cells. There are some suggested approaches to deal with this issue (for example, this one).
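One such approach, sketched here under the assumption that nucleus centres are at least min_distance pixels apart (the value 60 is purely illustrative): let peak_local_max keep only one peak per neighbourhood via min_distance, and build the watershed markers from the returned coordinates.
import numpy as np
from skimage.feature import peak_local_max
from skimage.segmentation import watershed
from skimage import measure

# Keep at most one peak within any min_distance radius; distance and mask
# are reused from the question's snippet
coords = peak_local_max(distance, min_distance=60, labels=measure.label(mask > 0))

# Turn the peak coordinates into a labelled marker image for watershed
peak_mask = np.zeros(distance.shape, dtype=bool)
peak_mask[tuple(coords.T)] = True
markers = measure.label(peak_mask)

labeled = watershed(-distance, markers, mask=mask, watershed_line=True)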
I understand that the ROC curve for a model is constructed by varying the threshold (which affects the TPR and FPR).
Thus my initial understanding is that, to calculate the AUROC, you need to run the model many times with different thresholds to get that curve and finally calculate the area.
But it seems like you just need some probability estimate of the positive class, as in the code example in sklearn's roc_auc_score below, to calculate AUROC.
>>> import numpy as np
>>> from sklearn.metrics import roc_auc_score
>>> y_true = np.array([0, 0, 1, 1])
>>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> roc_auc_score(y_true, y_scores)
0.75
How does that work? Any recommended read?
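A small sketch of the idea (assuming binary labels and sklearn's roc_curve): the candidate thresholds are simply the distinct score values, so the scores alone are enough to trace the whole curve and integrate it; the model is run only once.
>>> import numpy as np
>>> from sklearn.metrics import roc_curve, auc, roc_auc_score
>>> y_true = np.array([0, 0, 1, 1])
>>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> fpr, tpr, thresholds = roc_curve(y_true, y_scores)  # one (FPR, TPR) point per threshold
>>> auc(fpr, tpr)                                       # area under that piecewise-linear curve
0.75
>>> roc_auc_score(y_true, y_scores)                     # same number, computed directly from the scores
0.75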
I have a 96x96 pixel numpy array, which is a grayscale image. How do I find and plot the x, y coordinate of the maximum pixel intensity in this image?
image = (96,96)
It looks simple, but I couldn't find any snippet of code.
Please may you help :)
Use the argmax function in combination with unravel_index to get the row and column indices:
>>> import numpy as np
>>> a = np.random.rand(96,96)
>>> rowind, colind = np.unravel_index(a.argmax(), a.shape)
As far as plotting goes, if you just want to pinpoint the maximum value using a Boolean mask, this is the way to go:
>>> import matplotlib.pyplot as plt
>>> plt.imshow(a==a.max())
<matplotlib.image.AxesImage object at 0x3b1eed0>
>>> plt.show()
In that case, you don't even need the indices.
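If you would rather mark the maximum on top of the image itself, a quick sketch reusing a, rowind and colind from above (note that imshow puts columns on the x-axis and rows on the y-axis):
>>> import matplotlib.pyplot as plt
>>> plt.imshow(a, cmap='gray')                             # show the image itself
>>> plt.scatter([colind], [rowind], c='red', marker='x')   # column -> x, row -> y
>>> plt.show()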