I have an image of shape (466,394,1) which I want to split into 7x7 patches.
image = tf.placeholder(dtype=tf.float32, shape=[1, 466, 394, 1])
Using
image_patches = tf.extract_image_patches(image, [1, 7, 7, 1], [1, 7, 7, 1], [1, 1, 1, 1], 'VALID')
# shape (1, 66, 56, 49)
image_patches_reshaped = tf.reshape(image_patches, [-1, 7, 7, 1])
# shape (3696, 7, 7, 1)
unfortunately does not work in practice, as image_patches_reshaped mixes up the pixel order (if you view image_patches_reshaped you will only see noise).
So my new approach was to use tf.split:
image_hsplits = tf.split(1, 4, image_resized)
# [<tf.Tensor 'split_255:0' shape=(462, 7, 1) dtype=float32>,...]
image_patches = []
for split in image_hsplits:
    image_patches.extend(tf.split(0, 66, split))
image_patches
# [<tf.Tensor 'split_317:0' shape=(7, 7, 1) dtype=float32>, ...]
This indeed preserves the image pixel order, but unfortunately it creates a lot of ops, which is not very good.
How do I split an image into smaller patches with less OPs?
Update1:
I ported the NumPy answer of this question to TensorFlow:
import math
import tensorflow as tf

def image_to_patches(image, image_height, image_width, patch_height, patch_width):
    # Pad (or crop) the image to the next multiple of the patch size.
    height = int(math.ceil(image_height / patch_height)) * patch_height
    width = int(math.ceil(image_width / patch_width)) * patch_width
    image_resized = tf.squeeze(tf.image.resize_image_with_crop_or_pad(image, height, width))
    # Cut the rows into bands of patch_height, then the columns into bands of patch_width.
    image_reshaped = tf.reshape(image_resized, [height // patch_height, patch_height, -1, patch_width])
    image_transposed = tf.transpose(image_reshaped, [0, 2, 1, 3])
    return tf.reshape(image_transposed, [-1, patch_height, patch_width, 1])
but I think there is still room for improvement.
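Another option that might cut down the op count further is tf.space_to_depth, which packs every non-overlapping block into the channel dimension. This is only a sketch of the idea; I assume here that the image has already been padded to 469x399 (the next multiples of 7), e.g. with tf.image.resize_image_with_crop_or_pad as in the helper above:
import tensorflow as tf

# Assumes the image was already padded to a multiple of the patch size.
padded = tf.placeholder(dtype=tf.float32, shape=[1, 469, 399, 1])
# Each non-overlapping 7x7 block is packed into the channel dimension...
blocks = tf.space_to_depth(padded, 7)        # shape (1, 67, 57, 49)
# ...so a single reshape yields the patches in row-major pixel order.
patches = tf.reshape(blocks, [-1, 7, 7, 1])  # shape (3819, 7, 7, 1)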
Update2:
This will convert patches back to the original image.
def patches_to_image(patches, image_height, image_width, patch_height, patch_width):
    # Same padded size as in image_to_patches.
    height = int(math.ceil(image_height / patch_height)) * patch_height
    width = int(math.ceil(image_width / patch_width)) * patch_width
    # Invert the reshape/transpose from image_to_patches.
    image_reshaped = tf.reshape(tf.squeeze(patches), [height // patch_height, width // patch_width, patch_height, patch_width])
    image_transposed = tf.transpose(image_reshaped, [0, 2, 1, 3])
    image_resized = tf.reshape(image_transposed, [height, width, 1])
    # Crop away the padding to restore the original size.
    return tf.image.resize_image_with_crop_or_pad(image_resized, image_height, image_width)
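A quick way to sanity check the two helpers is to run a random image through both and compare (just a sketch with the TF 1.x session API used elsewhere in this post; I pass a single [height, width, 1] image, and the final crop should undo the padding exactly):
import numpy as np
import tensorflow as tf

image = tf.placeholder(dtype=tf.float32, shape=[466, 394, 1])
patches = image_to_patches(image, 466, 394, 7, 7)     # shape (3819, 7, 7, 1)
restored = patches_to_image(patches, 466, 394, 7, 7)  # shape (466, 394, 1)

with tf.Session() as sess:
    data = np.random.rand(466, 394, 1).astype(np.float32)
    out = sess.run(restored, feed_dict={image: data})
    print(np.abs(out - data).max())  # expected to print 0.0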
I think your issue is somewhere else. I wrote the following code snippet (using a smaller 14x14 image so that I could hand-check all the values), and confirmed that your initial code did the correct operations:
import tensorflow as tf
import numpy as np
IMAGE_SIZE = [1, 14, 14, 1]
PATCH_SIZE = [1, 7, 7, 1]
input_image = np.reshape(np.array(xrange(14*14)), IMAGE_SIZE)
image = tf.placeholder(dtype=tf.int32, shape=IMAGE_SIZE)
image_patches = tf.extract_image_patches(
    image, PATCH_SIZE, PATCH_SIZE, [1, 1, 1, 1], 'VALID')
image_patches_reshaped = tf.reshape(image_patches, [-1, 7, 7, 1])
sess = tf.Session()
(output, output_reshaped) = sess.run(
    (image_patches, image_patches_reshaped),
    feed_dict={image: input_image})
print "Output (shape: %s):" % (output.shape,)
print output
print "Reshaped (shape: %s):" % (output_reshaped.shape,)
print output_reshaped
The output was:
python resize.py
Output (shape: (1, 2, 2, 49)):
[[[[ 0 1 2 3 4 5 6 14 15 16 17 18 19 20 28 29 30 31
32 33 34 42 43 44 45 46 47 48 56 57 58 59 60 61 62 70
71 72 73 74 75 76 84 85 86 87 88 89 90]
[ 7 8 9 10 11 12 13 21 22 23 24 25 26 27 35 36 37 38
39 40 41 49 50 51 52 53 54 55 63 64 65 66 67 68 69 77
78 79 80 81 82 83 91 92 93 94 95 96 97]]
[[ 98 99 100 101 102 103 104 112 113 114 115 116 117 118 126 127 128 129
130 131 132 140 141 142 143 144 145 146 154 155 156 157 158 159 160 168
169 170 171 172 173 174 182 183 184 185 186 187 188]
[105 106 107 108 109 110 111 119 120 121 122 123 124 125 133 134 135 136
137 138 139 147 148 149 150 151 152 153 161 162 163 164 165 166 167 175
176 177 178 179 180 181 189 190 191 192 193 194 195]]]]
Reshaped (shape: (4, 7, 7, 1)):
[[[[ 0]
[ 1]
[ 2]
[ 3]
[ 4]
[ 5]
[ 6]]
[[ 14]
[ 15]
[ 16]
[ 17]
[ 18]
[ 19]
[ 20]]
[[ 28]
[ 29]
[ 30]
[ 31]
[ 32]
[ 33]
[ 34]]
[[ 42]
[ 43]
[ 44]
[ 45]
[ 46]
[ 47]
[ 48]]
[[ 56]
[ 57]
[ 58]
[ 59]
[ 60]
[ 61]
[ 62]]
[[ 70]
[ 71]
[ 72]
[ 73]
[ 74]
[ 75]
[ 76]]
[[ 84]
[ 85]
[ 86]
[ 87]
[ 88]
[ 89]
[ 90]]]
[[[ 7]
[ 8]
[ 9]
[ 10]
[ 11]
[ 12]
[ 13]]
[[ 21]
[ 22]
[ 23]
[ 24]
[ 25]
[ 26]
[ 27]]
[[ 35]
[ 36]
[ 37]
[ 38]
[ 39]
[ 40]
[ 41]]
[[ 49]
[ 50]
[ 51]
[ 52]
[ 53]
[ 54]
[ 55]]
[[ 63]
[ 64]
[ 65]
[ 66]
[ 67]
[ 68]
[ 69]]
[[ 77]
[ 78]
[ 79]
[ 80]
[ 81]
[ 82]
[ 83]]
[[ 91]
[ 92]
[ 93]
[ 94]
[ 95]
[ 96]
[ 97]]]
[[[ 98]
[ 99]
[100]
[101]
[102]
[103]
[104]]
[[112]
[113]
[114]
[115]
[116]
[117]
[118]]
[[126]
[127]
[128]
[129]
[130]
[131]
[132]]
[[140]
[141]
[142]
[143]
[144]
[145]
[146]]
[[154]
[155]
[156]
[157]
[158]
[159]
[160]]
[[168]
[169]
[170]
[171]
[172]
[173]
[174]]
[[182]
[183]
[184]
[185]
[186]
[187]
[188]]]
[[[105]
[106]
[107]
[108]
[109]
[110]
[111]]
[[119]
[120]
[121]
[122]
[123]
[124]
[125]]
[[133]
[134]
[135]
[136]
[137]
[138]
[139]]
[[147]
[148]
[149]
[150]
[151]
[152]
[153]]
[[161]
[162]
[163]
[164]
[165]
[166]
[167]]
[[175]
[176]
[177]
[178]
[179]
[180]
[181]]
[[189]
[190]
[191]
[192]
[193]
[194]
[195]]]]
Based on the reshaped output, you can see it is a 4x7x7x1 with values for the first patch as: [0-7),[14-21), [28-35), [42-49), [56-63), [70-77), and [84-91), which corresponds to the upper left 7x7 grid.
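For a programmatic version of that check (just a sketch; it assumes output_reshaped and input_image from the snippet above are still in scope):
# The first reshaped patch should equal the top-left 7x7 block of the input image.
assert np.array_equal(output_reshaped[0, :, :, 0], input_image[0, :7, :7, 0])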
Perhaps you can explain a bit further what's going on when it doesn't work correctly?
Related
I'm making a model to detect potholes in images. I've done everything right, or so it seems to me, but I can't train the model for some reason. What might be the problem here?
!python train.py --img 640 --cfg yolov5m.yaml --hyp data/hyps/hyp.scratch-med.yaml --batch 20 --epochs 300 --data data/potholeData.yaml --weights yolov5m.pt --workers 4 --name yolo_pothole_det_m
This is the final line of the code, which outputs the following.
train: weights=yolov5m.pt, cfg=yolov5m.yaml, data=data/potholeData.yaml, hyp=data/hyps/hyp.scratch-med.yaml, epochs=300, batch_size=20, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=4, project=runs/train, name=yolo_pothole_det_m, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: up to date with https://github.com/ultralytics/yolov5 ✅
YOLOv5 🚀 v7.0-23-g5dc1ce4 Python-3.9.13 torch-1.13.0 CPU
hyperparameters: lr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.3, cls_pw=1.0, obj=0.7, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.9, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.1, copy_paste=0.0
ClearML: run 'pip install clearml' to automatically track, visualize and remotely train YOLOv5 🚀 in ClearML
Comet: run 'pip install comet_ml' to automatically track and visualize YOLOv5 🚀 runs in Comet
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
Overriding model.yaml nc=80 with nc=1
from n params module arguments
0 -1 1 5280 models.common.Conv [3, 48, 6, 2, 2]
1 -1 1 41664 models.common.Conv [48, 96, 3, 2]
2 -1 2 65280 models.common.C3 [96, 96, 2]
3 -1 1 166272 models.common.Conv [96, 192, 3, 2]
4 -1 4 444672 models.common.C3 [192, 192, 4]
5 -1 1 664320 models.common.Conv [192, 384, 3, 2]
6 -1 6 2512896 models.common.C3 [384, 384, 6]
7 -1 1 2655744 models.common.Conv [384, 768, 3, 2]
8 -1 2 4134912 models.common.C3 [768, 768, 2]
9 -1 1 1476864 models.common.SPPF [768, 768, 5]
10 -1 1 295680 models.common.Conv [768, 384, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 2 1182720 models.common.C3 [768, 384, 2, False]
14 -1 1 74112 models.common.Conv [384, 192, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 2 296448 models.common.C3 [384, 192, 2, False]
18 -1 1 332160 models.common.Conv [192, 192, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 2 1035264 models.common.C3 [384, 384, 2, False]
21 -1 1 1327872 models.common.Conv [384, 384, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 2 4134912 models.common.C3 [768, 768, 2, False]
24 [17, 20, 23] 1 24246 models.yolo.Detect [1, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [192, 384, 768]]
Isn't it supposed to train the model after that? What am I doing wrong that makes it stop right here?
In the console output you can see that it didn't read any image dataset. Make sure that your potholeData.yaml file is in the right location. In this file you have to write something like this:
train: ../train/images #path to train images
val: ../valid/images #path to valid images
nc: 1 #number of classes
names: ['Weapon'] #name of classes
After this you can run the command again and your training will continue.
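If you want to double-check this before launching another run, here is a small sketch (not part of YOLOv5; the file name comes from the command above, and I assume the train/val entries are plain directory paths, so adjust if YOLOv5 resolves them relative to a dataset root) that verifies the paths in the data file actually contain images:
import glob
import os
import yaml  # PyYAML, which YOLOv5 already requires

with open("data/potholeData.yaml") as f:
    cfg = yaml.safe_load(f)

for split in ("train", "val"):
    path = cfg.get(split, "")
    images = glob.glob(os.path.join(path, "*.jpg")) + glob.glob(os.path.join(path, "*.png"))
    print(split, path, "->", len(images), "images found")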
I've looked around for a bit to find a solution to my problem, but I haven't found anything that completely fixes it. Essentially the function does sort, but it only sorts the keys 1 through 10, not the numbers in the table.
local numbers = {18, 45, 90, 77, 65, 18, 3, 57, 81, 10}
local function selectionSort(t) -- t is the table to be sorted
    local t = {18, 45, 90, 77, 65, 18, 3, 57, 81, 10}
    local tkeys = {}
    for k in pairs(t) do table.insert(tkeys, k) end
    table.sort(tkeys)
    for _, k in ipairs(tkeys) do print(k, t[k]) end
    return t -- return the sorted table
end
list = selectionSort(numbers)
and this is what comes out
1 18
2 45
3 90
4 77
5 65
6 18
7 3
8 57
9 81
10 10
and what I want is
3 18
10 45
18 90
18 77
45 65
57 18
65 3
77 57
81 81
90 10
any solutions?
You are taking the key from your input and you want the value.
You can change it to:
local numbers = {18, 45, 90, 77, 65, 18, 3, 57, 81, 10}

local function selectionSort(t) -- t is the table to be sorted
    local tSorted = {}
    -- copy the values (not the keys) and sort them
    for _, v in pairs(t) do
        table.insert(tSorted, v)
    end
    table.sort(tSorted)
    -- print each sorted value next to the original value at the same index
    for i = 1, #t do
        print(tSorted[i], t[i])
    end
    return tSorted -- return the sorted table
end

local sorted = selectionSort(numbers)
and you will get:
sorted original
3 18
10 45
18 90
18 77
45 65
57 18
65 3
77 57
81 81
90 10
Given an image im,
>>> np.random.seed(0)
>>> im = np.random.randint(0, 100, (10,5))
>>> im
array([[44, 47, 64, 67, 67],
[ 9, 83, 21, 36, 87],
[70, 88, 88, 12, 58],
[65, 39, 87, 46, 88],
[81, 37, 25, 77, 72],
[ 9, 20, 80, 69, 79],
[47, 64, 82, 99, 88],
[49, 29, 19, 19, 14],
[39, 32, 65, 9, 57],
[32, 31, 74, 23, 35]])
what is the best way to find a specific segment of this image, for instance
>>> im[6:9, 2:5]
array([[82, 99, 88],
[19, 19, 14],
[65, 9, 57]])
If the specific combination does not exist (maybe due to noise), I would like to have a similarity measure that searches for segments with a similar distribution and tells me, for each pixel of im, how good the agreement is. For instance something like
array([[0.03726647, 0.14738364, 0.04331007, 0.02704363, 0.0648282 ],
[0.02993497, 0.04446428, 0.0772978 , 0.1805197 , 0.08999 ],
[0.12261269, 0.18046972, 0.01985607, 0.19396181, 0.13062801],
[0.03418192, 0.07163043, 0.15013723, 0.12156613, 0.06500945],
[0.00768509, 0.12685481, 0.19178985, 0.13055806, 0.12701177],
[0.19905991, 0.11637007, 0.08287372, 0.0949395 , 0.12470202],
[0.06760152, 0.13495046, 0.06344035, 0.1556691 , 0.18991421],
[0.13250537, 0.00271433, 0.12456922, 0.97 , 0.194389 ],
[0.17563869, 0.10192488, 0.01114294, 0.09023184, 0.00399753],
[0.08834218, 0.19591735, 0.07188889, 0.09617871, 0.13773224]])
The example code is Python.
I think there should be a solution based on correlating a kernel with im. This has the issue, though, that a segment with the same values, only scaled, will give a sharper response.
Template matching would be one of the ways to go about it. Of course deep learning/ML can also be used for more complicated matching.
Most image processing libraries support some sort of matching function which compares a pair of images: a reference and the one to match. In OpenCV it returns a score which can be used to determine a match. The matching methods include variants that support scale- and/or rotation-invariant matching. Beware of licensing constraints in the method you plan to use.
In case the images may not always be exact, you can use the standard deviation (StdDev) to allow for a permissible deviation and still classify them into buckets. Histogram matching may also be used, depending on the condition of the image to be matched (lighting and color can be important, unless you use specific channels). Using a histogram will avoid matching the template in its entirety.
Ref for Template Matching:
OpenCV - https://docs.opencv.org/master/d4/dc6/tutorial_py_template_matching.html
scikit-image - https://scikit-image.org/docs/dev/auto_examples/features_detection/plot_template.html
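To make the OpenCV route concrete, here is a rough sketch (my own, not from the tutorial above) on the random image from the question; cv2.matchTemplate with a normalized method returns a score map whose peak marks the best match:
import cv2  # OpenCV Python bindings, assumed to be installed
import numpy as np

np.random.seed(0)
im = np.random.randint(0, 100, (10, 5)).astype(np.float32)
templ = im[6:9, 2:5].copy()

# Normalized correlation coefficient; the result has shape (10-3+1, 5-3+1).
res = cv2.matchTemplate(im, templ, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
print(max_loc, max_val)  # max_loc is (x, y); expected (2, 6), the template's top-left corner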
Thanks to banerjk for the great answer - template matching is exactly the solution!
some backup method
Considering my correlating-with-a-kernel idea, there is some progress:
When one correlates the image with the template (i.e. what I called the target segment in the question), chances are high that the most intense point in the correlated image (relative to the mean intensity) matches the template position (see im and m in the example). It seems I am not the first to come up with this idea, as can be seen in these lecture notes on page 39.
However, this is not always true. The method more or less just detects weight at the largest values in the template. In the example, im2 is constructed such that it tricks this concept.
Maybe it gets more reliable if one applies some filter (for instance a median filter) to the image beforehand.
I just wanted to mention it here, as it might have advantages for certain situations (it should be more performant compared to the Wikipedia implementation of template matching).
example
import numpy as np
from scipy import ndimage
np.random.seed(0)
im = np.random.randint(0, 100, (10,5))
t = im[6:9, 2:5]
print('t', t, sep='\n')
m = ndimage.correlate(im, t) / ndimage.correlate(im, np.ones(t.shape))
m /= np.amax(m)
print('im', im, sep='\n')
print('m', m, sep='\n')
print("this can be 'tricked', however")
im2 = im.copy()
im2[6:9, :3] = 0
im2[6,1] = 1
m2 = ndimage.correlate(im2, t) / ndimage.correlate(im2, np.ones(t.shape))
m2 /= np.amax(m2)
print('im2', im2, sep='\n')
print('m2', m2, sep='\n')
output
t
[[82 99 88]
[19 19 14]
[65 9 57]]
im
[[44 47 64 67 67]
[ 9 83 21 36 87]
[70 88 88 12 58]
[65 39 87 46 88]
[81 37 25 77 72]
[ 9 20 80 69 79]
[47 64 82 99 88]
[49 29 19 19 14]
[39 32 65 9 57]
[32 31 74 23 35]]
m
[[0.73776208 0.62161208 0.74504705 0.71202601 0.66743979]
[0.70809611 0.70617161 0.70284942 0.80653741 0.67067733]
[0.55047727 0.61675268 0.5937487 0.70579195 0.74351706]
[0.7303857 0.77147963 0.74809273 0.59136392 0.61324214]
[0.70041161 0.7717032 0.69220064 0.72463532 0.6957257 ]
[0.89696894 0.69741108 0.64136612 0.64154719 0.68621613]
[0.48509474 0.60700037 0.65812918 0.68441118 0.68835903]
[0.73802038 0.83224745 0.87301124 1. 0.92272565]
[0.72708573 0.64909142 0.54540817 0.60859883 0.52663327]
[0.72061572 0.70357846 0.61626289 0.71932261 0.75028955]]
this can be 'tricked', however
im2
[[44 47 64 67 67]
[ 9 83 21 36 87]
[70 88 88 12 58]
[65 39 87 46 88]
[81 37 25 77 72]
[ 9 20 80 69 79]
[ 0 1 0 99 88]
[ 0 0 0 19 14]
[ 0 0 0 9 57]
[32 31 74 23 35]]
m2
[[0.53981867 0.45483201 0.54514907 0.52098765 0.48836403]
[0.51811216 0.51670401 0.51427317 0.59014141 0.49073293]
[0.40278285 0.4512764 0.43444444 0.51642621 0.54402958]
[0.5344214 0.56448972 0.54737758 0.43269951 0.44870774]
[0.51248943 0.56465331 0.50648148 0.53021386 0.50906076]
[0.78923691 0.56633529 0.51641414 0.44336403 0.50210263]
[0.88137788 0.89779614 0.63552189 0.55070797 0.50367059]
[0.88888889 1. 0.75544508 0.75694003 0.67515605]
[0.43965976 0.48492221 0.37490287 0.48511085 0.38533625]
[0.30754918 0.32478065 0.27066895 0.46685032 0.548985 ]]
Maybe someone can contribute on the background of the lecture notes.
Update: it is discussed in J. P. Lewis, "Fast Normalized Cross-Correlation", Industrial Light and Magic, on the very first page.
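For comparison, the normalized cross-correlation from the Lewis paper is available off the shelf in scikit-image; a small sketch on the same im and t as above (pad_input=True yields one score per pixel of im, which is what I asked for):
import numpy as np
from skimage.feature import match_template  # scikit-image, assumed to be installed

np.random.seed(0)
im = np.random.randint(0, 100, (10, 5)).astype(float)
t = im[6:9, 2:5]

# Normalized cross-correlation; with pad_input=True the result has im's shape.
ncc = match_template(im, t, pad_input=True)
print(np.unravel_index(np.argmax(ncc), ncc.shape))  # expected (7, 3), the centre of the template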
The input tensor is as below:
input =
[[ 0 0 1 2]
[ 0 3 4 5]
[ 0 6 7 8]
[ 1 9 10 11]
[ 1 12 13 14]
[ 1 15 16 17]
[ 1 18 19 20]
[ 1 21 22 23]
[ 1 24 25 26]
[ 1 27 28 29]
[ 1 30 31 32]
[ 2 33 34 35]
[ 2 36 37 38]
[ 2 39 40 41]]
I want to extract the elements block-wise according to the first element of each row (i.e. 0, 1, 2). Can anyone help me with it? Thanks!
An off-the-shelf function would be great.
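One possible approach (a sketch of my own, not an accepted answer; it uses the TF 1.x API seen elsewhere in this post and assumes the ids in the first column are consecutive integers starting at 0) is tf.dynamic_partition, which is essentially such an off-the-shelf function:
import tensorflow as tf

# A shortened version of the input above.
x = tf.constant([[0, 0, 1, 2],
                 [0, 3, 4, 5],
                 [0, 6, 7, 8],
                 [1, 9, 10, 11],
                 [2, 33, 34, 35]])

ids = x[:, 0]  # the group id in the first column
groups = tf.dynamic_partition(x[:, 1:], ids, num_partitions=3)

with tf.Session() as sess:
    for g in sess.run(groups):  # one array of rows per id 0, 1, 2
        print(g)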
I found that the result of LDA in OpenCV is different from that of other libraries. For example, the input data was
DATA (13 data samples with 4 dimensions)
7 26 6 60
1 29 15 52
11 56 8 20
11 31 8 47
7 52 6 33
11 55 9 22
3 71 17 6
1 31 22 44
2 54 18 22
21 47 4 26
1 40 23 34
11 66 9 12
10 68 8 12
LABEL
0 1 2 0 1 2 0 1 2 0 1 2 0
The OpenCV code is
Mat data = (Mat_<float>(13, 4) <<\
7, 26, 6, 60,\
1, 29, 15, 52,\
11, 56, 8, 20,\
11, 31, 8, 47,\
7, 52, 6, 33,\
11, 55, 9, 22,\
3, 71, 17, 6,\
1, 31, 22, 44,\
2, 54, 18, 22,\
21, 47, 4, 26,\
1, 40, 23, 34,\
11, 66, 9, 12,\
10, 68, 8, 12);
Mat mean;
reduce(data, mean, 0, CV_REDUCE_AVG);
mean.convertTo(mean, CV_64F);
Mat label(data.rows, 1, CV_32SC1);
for (int i=0; i<label.rows; i++)
label.at<int>(i) = i%3;
LDA lda(data, label);
Mat projection = lda.subspaceProject(lda.eigenvectors(), mean, data);
The MATLAB code (using the Matlab Toolbox for Dimensionality Reduction) is
cd drtoolbox\techniques\
load hald
label=[0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
[projection, trainedlda] = lda(ingredients, label)
The eigenvectors are
OpenCV (lda.eigenvectors())
0.4457 4.0132
0.4880 3.5703
0.5448 3.3466
0.5162 3.5794
Matlab Toolbox for Dimensionality Reduction (trainedlda.M)
0.5613 0.7159
0.6257 0.6203
0.6898 0.5884
0.6635 0.6262
Then the projections of data are
OpenCV
1.3261 7.1276
0.8892 -4.7569
-1.8092 -6.1947
-0.0720 1.1927
0.0768 3.3105
-0.7200 0.7405
-0.3788 -4.7388
1.5490 -2.8255
-0.3166 -8.8295
-0.8259 9.8953
1.3239 -3.1406
-0.5140 4.2194
-0.5285 4.0001
Matlab Toolbox for Dimensionality Reduction
1.8030 1.3171
1.2128 -0.8311
-2.3390 -1.0790
-0.0686 0.3192
0.1583 0.5392
-0.9479 0.1414
-0.5238 -0.9722
1.9852 -0.4809
-0.4173 -1.6266
-1.1358 1.9009
1.6719 -0.5711
-0.6996 0.7034
-0.6993 0.6397
The eigenvectors and projections are different even though these LDAs have the same data. I believe there are 2 possibilities.
One of the libraries is wrong.
I am doing it wrong.
Thank you!
The difference is because eigenvectors are not normalized.
The normalized (L2 norm) eigenvectors are
OpenCV
0.44569 0.55196
0.48798 0.49105
0.54478 0.46028
0.51618 0.49230
Matlab Toolbox for Dimensionality Reduction
0.44064 0.55977
0.49120 0.48502
0.54152 0.46008
0.52087 0.48963
They look similar now, although they have quite different eigenvalues.
Even though the PCA in OpenCV returns normalized eigenvectors, LDA does not. My next question is: is normalizing the eigenvectors in LDA not necessary?
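For reference, the column-wise L2 normalization used above is a one-liner in NumPy (a sketch with the OpenCV eigenvectors copied from the question):
import numpy as np

# One eigenvector per column, as returned by lda.eigenvectors().
W = np.array([[0.4457, 4.0132],
              [0.4880, 3.5703],
              [0.5448, 3.3466],
              [0.5162, 3.5794]])

W_normalized = W / np.linalg.norm(W, axis=0, keepdims=True)
print(W_normalized)  # matches the normalized OpenCV values listed above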