I just posted recently, but I'm new to Rust and keep hitting walls with what I imagine is elementary for someone more documentation-literate than I am.
I have been recreating my Python code, listed here:
while True:
    cv2.namedWindow("HSV")
    cv2.resizeWindow("HSV", 640, 480)
    cv2.createTrackbar("HUE MIN", "HSV", 0, 179, empty)
    cv2.createTrackbar("HUE MAX", "HSV", 0, 179, empty)
    cv2.createTrackbar("SAT MIN", "HSV", 0, 255, empty)
    cv2.createTrackbar("SAT MAX", "HSV", 0, 255, empty)
    cv2.createTrackbar("VAL MIN", "HSV", 0, 255, empty)
    cv2.createTrackbar("VAL MAX", "HSV", 0, 255, empty)
    while cap.isOpened():
        ret, frame = cap.read()
        if ret == True:
            hsvframe = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            h_min = cv2.getTrackbarPos("HUE MIN", "HSV")
            h_max = cv2.getTrackbarPos("HUE MAX", "HSV")
            s_min = cv2.getTrackbarPos("SAT MIN", "HSV")
            s_max = cv2.getTrackbarPos("SAT MAX", "HSV")
            v_min = cv2.getTrackbarPos("VAL MIN", "HSV")
            v_max = cv2.getTrackbarPos("VAL MAX", "HSV")
            lower = np.array([h_min, s_min, v_min])
            upper = np.array([h_max, s_max, v_max])
            mask = cv2.inRange(hsvframe, lower, upper)
            result = cv2.bitwise_and(frame, frame, mask=mask)
            print(lower, upper)
            out.write(mask)
            cv2.imshow('mask', mask)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        else:
            break
I have been trying to recreate this while loop in Rust and can't figure out how to read the slider position with opencv::highgui::get_trackbar_pos the way cv2.getTrackbarPos does.
I also don't fully understand loop {} and its nuances, so I'm not sure what the equivalent of ret, frame = cap.read() and if ret == True: would be; it seems like Rust's cvt_color handles that, so I don't know whether it affects where my get_trackbar_pos calls should go.
Lastly, I'm really not sure how I should be writing the lower and upper bound arrays, so any advice there would be helpful (a rough sketch of one approach is included after the code below).
use opencv::{highgui::{self, WINDOW_AUTOSIZE}, imgproc, prelude::*, videoio, Result};

fn run() -> Result<()> {
    let window = "video capture";
    let hsv = "HSV";
    highgui::named_window(window, 1)?;
    highgui::named_window(hsv, WINDOW_AUTOSIZE)?;
    highgui::create_trackbar("HUE MAX", hsv, None, 179, None)?;
    highgui::create_trackbar("HUE MIN", hsv, None, 179, None)?;
    highgui::create_trackbar("SAT MAX", hsv, None, 255, None)?; // 255, to match the Python ranges
    highgui::create_trackbar("SAT MIN", hsv, None, 255, None)?;
    highgui::create_trackbar("VAL MAX", hsv, None, 255, None)?;
    highgui::create_trackbar("VAL MIN", hsv, None, 255, None)?;
    opencv::opencv_branch_32! {
        let mut cam = videoio::VideoCapture::new_default(0)?; // 0 is the default camera
    }
    opencv::not_opencv_branch_32! {
        let mut cam = videoio::VideoCapture::new(0, videoio::CAP_ANY)?; // 0 is the default camera
    }
    let opened = videoio::VideoCapture::is_opened(&cam)?;
    if !opened {
        panic!("Unable to open default camera!");
    }
    loop {
        let mut frame = Mat::default();
        cam.read(&mut frame)?;
        if frame.size()?.width > 0 {
            let mut gray = Mat::default(); // despite the name, this holds the HSV-converted frame
            let h_max = highgui::get_trackbar_pos("HUE MAX", hsv)?;
            let h_min = highgui::get_trackbar_pos("HUE MIN", hsv)?;
            let s_max = highgui::get_trackbar_pos("SAT MAX", hsv)?;
            let s_min = highgui::get_trackbar_pos("SAT MIN", hsv)?;
            let v_max = highgui::get_trackbar_pos("VAL MAX", hsv)?;
            let v_min = highgui::get_trackbar_pos("VAL MIN", hsv)?;
            let _lower = [h_min, s_min, v_min];
            let _upper = [h_max, s_max, v_max];
            imgproc::cvt_color(&frame, &mut gray, imgproc::COLOR_BGR2HSV, 0)?;
            highgui::imshow(window, &gray)?;
        }
        if highgui::wait_key(10)? > 0 {
            break;
        }
    }
    Ok(())
}

fn main() {
    run().unwrap()
}
That is my main.rs; you can see at the bottom my loop and arrays.
The Rust OpenCV documentation seems really simple and clear; I'm just fairly Rust-illiterate, and even reading the Rust Book I'm having trouble.
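For reference, loop {} is just Rust's infinite loop (the equivalent of while True:, exited with break), and the if frame.size()?.width > 0 check plays the role of if ret == True:. Below is a minimal, untested sketch of how the rest of the inner loop body could translate, assuming the same opencv crate version as the code above; the process_frame helper name is only illustrative, and whether core::in_range accepts core::Scalar bounds directly may depend on the crate version (building a small Mat from the three values is the fallback).

use opencv::{core, highgui, imgproc, prelude::*, Result};

// Roughly the body of the Python `if ret == True:` branch, called from
// inside `loop { ... }` after `cam.read(&mut frame)?` has filled `frame`.
// Assumes the trackbars were created on the "HSV" window as above.
fn process_frame(frame: &Mat, hsv_win: &str) -> Result<()> {
    if frame.size()?.width > 0 {                        // equivalent of `if ret == True:`
        let mut hsv_frame = Mat::default();
        imgproc::cvt_color(frame, &mut hsv_frame, imgproc::COLOR_BGR2HSV, 0)?;

        let h_min = highgui::get_trackbar_pos("HUE MIN", hsv_win)? as f64;
        let h_max = highgui::get_trackbar_pos("HUE MAX", hsv_win)? as f64;
        let s_min = highgui::get_trackbar_pos("SAT MIN", hsv_win)? as f64;
        let s_max = highgui::get_trackbar_pos("SAT MAX", hsv_win)? as f64;
        let v_min = highgui::get_trackbar_pos("VAL MIN", hsv_win)? as f64;
        let v_max = highgui::get_trackbar_pos("VAL MAX", hsv_win)? as f64;

        // The equivalent of np.array([h, s, v]) for the bounds; if this crate
        // version does not accept Scalar as an input array, build a Mat instead.
        let lower = core::Scalar::new(h_min, s_min, v_min, 0.0);
        let upper = core::Scalar::new(h_max, s_max, v_max, 0.0);

        let mut mask = Mat::default();
        core::in_range(&hsv_frame, &lower, &upper, &mut mask)?;  // cv2.inRange

        let mut result = Mat::default();
        core::bitwise_and(frame, frame, &mut result, &mask)?;    // cv2.bitwise_and

        highgui::imshow("mask", &mask)?;
    }
    Ok(())
}

Inside the existing loop, this would replace the cvt_color/imshow lines, e.g. process_frame(&frame, hsv)?;.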
Related
TL;DR I'm using:
adaptive thresholding
segmenting by keys (width/height ratio) - see green boxes in image result
psm 10 to treat each key as a character
but it fails to recognize some keys, falsely identifies others, or identifies 2 characters for 1 (see the L character in the image result; it reads it as an L and a P), etc.
Note: I cropped the image and re-ran the results to get it to fit on this site, but before cropping it did slightly better (recognized more keys, fewer false positives, etc).
I just want it to recognize the alphabet keys. Ultimately I will want it to work for realtime video.
config:
'-l eng --oem 1 --psm 10 -c tessedit_char_whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZ"'
I've tried scaling the image differently, scaling the individual key segments, using opening/closing/etc but it doesn't recognize all the keys.
original image
image result
Update: if I make the image straighter (bird's-eye view) and remove the whitelisting, it manages to detect everything for the most part (although it thinks the O is a 0 and the I is a |, which is understandable). Why is this, and how could I make it adaptive enough for dynamic video when it is so sensitive to these conditions?
Code:
import pytesseract
import numpy as np
try:
    from PIL import Image
except ImportError:
    import Image
import cv2
from tqdm import tqdm
from collections import defaultdict


def get_missing_chars(dict):
    capital_alphabet = [chr(ascii) for ascii in range(65, 91)]
    return [let for let in capital_alphabet if let not in dict]


def draw_box_and_char(img, contour_dims, c, box_col, text_col):
    x, y, w, h = contour_dims
    top_left = (x, y)
    bot_right = (x + w, y+h)
    font_offset = 3
    text_pos = (x+h//2+12, y+h-font_offset)
    img_copy = img.copy()
    cv2.rectangle(img_copy, top_left, bot_right, box_col, 2)
    cv2.putText(img_copy, c, text_pos, cv2.FONT_HERSHEY_SIMPLEX, fontScale=.5, color=text_col, thickness=1, lineType=cv2.LINE_AA)
    return img_copy


def detect_keys(img):
    scaling = .25
    img = cv2.resize(img, None, fx=scaling, fy=scaling, interpolation=cv2.INTER_AREA)
    print("img shape", img.shape)
    gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    ratio_min = 0.7
    area_min = 1000
    nbrhood_size = 1001
    bias = 2
    # adapt to different lighting
    bin_img = cv2.adaptiveThreshold(gray_img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                    cv2.THRESH_BINARY_INV, nbrhood_size, bias)
    items = cv2.findContours(bin_img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = items[0] if len(items) == 2 else items[1]
    key_contours = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        ratio = h/w
        area = cv2.contourArea(c)
        # square-like ratio, try to get character
        if ratio > ratio_min and area > area_min:
            key_contours.append(c)
    detected = defaultdict(int)
    n_kept = 0
    img_copy = cv2.cvtColor(bin_img, cv2.COLOR_GRAY2RGB)
    let_to_contour = {}
    n_contours = len(key_contours)
    # offset to get smaller square within the key segment for easier char recognition
    offset = 10
    show_each_char = False
    for _, c in tqdm(enumerate(key_contours), total=n_contours):
        x, y, w, h = cv2.boundingRect(c)
        ratio = h/w
        area = cv2.contourArea(c)
        base = np.zeros(bin_img.shape, dtype=np.uint8)
        base.fill(255)
        n_kept += 1
        new_y = y+offset
        new_x = x+offset
        new_h = h-2*offset
        new_w = w-2*offset
        base[new_y:new_y+new_h, new_x:new_x+new_w] = bin_img[new_y:new_y+new_h, new_x:new_x+new_w]
        segment = cv2.bitwise_not(base)
        # try scaling up individual keys
        # scaling = 2
        # segment = cv2.resize(segment, None, fx=scaling, fy=scaling, interpolation=cv2.INTER_CUBIC)
        # psm 10: treats the segment as a single character
        custom_config = r'-l eng --oem 1 --psm 10 -c tessedit_char_whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZ"'
        d = pytesseract.image_to_data(segment, config=custom_config, output_type='dict')
        conf = d['conf']
        c = d['text'][-1]
        if c:
            # sometimes recognizes multiple keys even though there is only 1
            for sub_c in c:
                # save character and contour to draw on image and show bounds/detection
                if sub_c not in let_to_contour or (sub_c in let_to_contour and conf > let_to_contour[sub_c]['conf']):
                    let_to_contour[sub_c] = {'conf': conf, 'cont': (new_x, new_y, new_w, new_h)}
        else:
            c = "?"
            text_col = (0, 0, 255)
        if show_each_char:
            contour_dims = (new_x, new_y, new_w, new_h)
            box_col = (0, 255, 0)
            text_col = (0, 0, 0)
            segment_with_boxes = draw_box_and_char(segment, contour_dims, c, box_col, text_col)
            cv2.imshow('segment', segment_with_boxes)
            cv2.waitKey(0)
            cv2.destroyAllWindows()
    # draw boxes around recognized keys
    for c, data in let_to_contour.items():
        box_col = (0, 255, 0)
        text_col = (0, 0, 0)
        img_copy = draw_box_and_char(img_copy, data['cont'], c, box_col, text_col)
    detected = {k: 1 for k in let_to_contour}
    for det in let_to_contour:
        print(det, let_to_contour[det])
    print("total detected: ", let_to_contour.keys())
    missing = get_missing_chars(detected)
    print(f"n_missing: {len(missing)}")
    print(f"chars missing: {missing}")
    return img_copy


if __name__ == "__main__":
    img_file = "keyboard.jpg"
    img = cv2.imread(img_file)
    img_with_detected_keys = detect_keys(img)
    cv2.imshow("detected", img_with_detected_keys)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
I am trying to do an object detection problem and have been working with the Aquarium dataset from Roboflow. I have been trying to create bounding boxes for the fishes, but I keep getting the error:
UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7fd42c66d3b0>
I also tried to see which images are corrupted and ran this code:
import PIL
from pathlib import Path
from PIL import UnidentifiedImageError

count = 0
path = Path("/content/drive/MyDrive/archive/Aquarium Combined").rglob("*.jpg")
for img_p in path:
    try:
        img = PIL.Image.open(img_p)
    except PIL.UnidentifiedImageError:
        print(img_p)
    count += 1
print(count)
It gave me a count of 651 images, but my dataset has 662 images. I guess PIL doesn't know how to decode them, or I don't know what the problem is. I will attach a sample image file name:
/content/drive/MyDrive/archive/Aquarium Combined/test/IMG_2301_jpeg_jpg.rf.2c19ae5efbd1f8611b5578125f001695.jpg
Full traceback:
UnidentifiedImageError                    Traceback (most recent call last)
<ipython-input-31-2785d562a97e> in <module>()
      4         sample[1]['boxes'][:, [1, 0, 3, 2]],
      5         [classes[i] for i in sample[1]['labels']],
----> 6         width=4).permute(1, 2, 0)
      7     )

3 frames
/usr/local/lib/python3.7/dist-packages/PIL/Image.py in open(fp, mode)

UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7fd42c66d3b0>
I am also providing the dataset class:
class AquariumDetection(datasets.VisionDataset):
    def __init__(
        self,
        root: str,
        split="train",
        transform=None,
        target_transform=None,
        transforms=None,
    ) -> None:
        super().__init__(root, transforms, transform, target_transform)
        self.split = split
        self.coco = COCO(os.path.join(root, split, "_annotations.coco.json"))
        self.ids = list(sorted(self.coco.imgs.keys()))
        self.ids = [id for id in self.ids if (len(self._load_target(id)) > 0)]

    def _load_image(self, id: int) -> Image.Image:
        path = self.coco.loadImgs(id)[0]["file_name"]
        image = cv2.imread(os.path.join(self.root, self.split, path))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        return image

    def _load_target(self, id: int):
        return self.coco.loadAnns(self.coco.getAnnIds(id))

    def __getitem__(self, index: int):
        id = self.ids[index]
        image = self._load_image(id)
        target = copy.deepcopy(self._load_target(id))
        boxes = [t['bbox'] + [t['category_id']] for t in target]
        if self.transforms is not None:
            transformed = self.transforms(image=image, bboxes=boxes)
            image = transformed['image']
            boxes = transformed['bboxes']
        new_boxes = []
        for box in boxes:
            xmin = box[0]
            ymin = box[1]
            xmax = xmin + box[2]
            ymax = ymin + box[3]
            new_boxes.append([ymin, xmin, ymax, xmax])
        boxes = torch.tensor(new_boxes, dtype=torch.float32)
        _, h, w = image.shape
        targ = {}
        targ["boxes"] = boxes
        targ["labels"] = torch.tensor([t["category_id"] for t in target], dtype=torch.int64)
        targ["image_id"] = torch.tensor([t["image_id"] for t in target])
        targ["area"] = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        targ["iscrowd"] = torch.tensor([t["iscrowd"] for t in target], dtype=torch.int64)
        targ["img_scale"] = torch.tensor([1.0])
        targ['img_size'] = (h, w)
        image = image.div(255)
        normalize = T.Compose([T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
        return normalize(image), targ, index

    def __len__(self) -> int:
        return len(self.ids)
I am trying to write a neural network in Rust + ArrayFire, and while gradient descent works, Adam does not.
fn back_propagate(
    &mut self,
    signals: &Vec<Array<f32>>,
    labels: &Array<u8>,
    learning_rate_alpha: f64,
    batch_size: i32,
) {
    let mut output = signals.last().unwrap();
    let mut error = output - labels;
    for layer_index in (0..self.num_layers - 1).rev() {
        let signal = Self::add_bias(&signals[layer_index]);
        let deriv = self.layer_activations[layer_index].apply_deriv(output);
        let delta = &(deriv * error).T();
        let matmul = matmul(&delta, &signal, MatProp::NONE, MatProp::NONE);
        let gradient_t = (matmul / batch_size).T();
        match self.optimizer {
            Optimizer::GradientDescent => {
                let weight_update = learning_rate_alpha * gradient_t;
                self.weights[layer_index] -= weight_update;
            }
            Optimizer::Adam => {
                let exponents = constant(2f32, gradient_t.dims());
                self.first_moment_vectors[layer_index] = (&self.beta1[layer_index]
                    * &self.first_moment_vectors[layer_index])
                    + (&self.one_minus_beta1[layer_index] * &gradient_t);
                self.second_moment_vectors[layer_index] = (&self.beta2[layer_index]
                    * &self.second_moment_vectors[layer_index])
                    + (&self.one_minus_beta2[layer_index]
                        * arrayfire::pow(&gradient_t, &exponents, true));
                let corrected_first_moment_vector = &self.first_moment_vectors[layer_index]
                    / &self.one_minus_beta1[layer_index];
                let corrected_second_moment_vector = &self.second_moment_vectors[layer_index]
                    / &self.one_minus_beta2[layer_index];
                let denominator = sqrt(&corrected_second_moment_vector) + 1e-8;
                let weight_update =
                    learning_rate_alpha * (corrected_first_moment_vector / denominator);
                self.weights[layer_index] -= weight_update;
            }
        }
        output = &signals[layer_index];
        let err = matmulTT(
            &delta,
            &self.weights[layer_index],
            MatProp::NONE,
            MatProp::NONE,
        );
        error = index(&err, &[seq!(), seq!(1, output.dims()[1] as i32, 1)]);
    }
}
I've stored beta1, beta2, 1-beta1, 1-beta2 in constant arrays for every layer just to avoid having to recompute them. It appears to have made no difference.
GradientDescent converges with a learning rate of alpha=2.0; with Adam, however, if I use alpha > ~0.02 the network appears to get locked in. Funnily enough, if I remove all the hidden layers, it does work, which tells me something, but I'm not sure what.
I figured it out. For anyone else: my alpha=0.01 was still too high; once I reduced it to 0.001, it converged very quickly.
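For anyone comparing against a reference, here is a minimal scalar sketch of the textbook Adam update (the adam_step name and per-scalar form are purely illustrative, not the code above). Note that the textbook bias correction divides by 1 - beta^t using the current timestep t, and that because each step is normalized by sqrt(v_hat), effective step sizes stay on the order of alpha, which is why typical Adam learning rates (around 0.001) are much smaller than what plain gradient descent tolerates.

// Textbook Adam update for a single scalar weight (illustration only).
// `t` is the 1-based timestep; bias correction uses beta^t, not a constant.
fn adam_step(
    w: &mut f32,   // weight being updated
    m: &mut f32,   // running first-moment estimate
    v: &mut f32,   // running second-moment estimate
    grad: f32,     // gradient for this weight
    alpha: f32,    // learning rate, e.g. 0.001
    beta1: f32,    // e.g. 0.9
    beta2: f32,    // e.g. 0.999
    eps: f32,      // e.g. 1e-8
    t: i32,
) {
    *m = beta1 * *m + (1.0 - beta1) * grad;
    *v = beta2 * *v + (1.0 - beta2) * grad * grad;
    let m_hat = *m / (1.0 - beta1.powi(t)); // bias-corrected first moment
    let v_hat = *v / (1.0 - beta2.powi(t)); // bias-corrected second moment
    *w -= alpha * m_hat / (v_hat.sqrt() + eps);
}

With that normalization, each update is roughly alpha in magnitude per weight, which fits the observation that 0.001 converges where plain gradient descent needed 2.0.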
This project is face recognition with a barcode. It needs to detect a face first before it can scan barcodes. The flow is fine, except that after it detects someone's face, the imshow window stops responding and the webcam feed freezes. I want the webcam feed to keep updating while my code is processing; how can I do that?
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise IOError("Cannot open webcam")

temp = ""
while True:
    success, eImgs = cap.read()
    if success:
        font = cv2.FONT_HERSHEY_PLAIN
        datet = str(datetime.now())
        frame = cv2.putText(eImgs, datet, (10, 50), font, 1, (0, 0, 128), 1, cv2.LINE_AA)
        # eImgs_v1 = cv2.resize(eImgs, (0, 0), None, 0.25, 0.25)
        eImgs_v1 = cv2.cvtColor(eImgs, cv2.COLOR_BGR2RGB)
        facesWebcam = face_recognition.face_locations(eImgs_v1)
        encodesWebcam = face_recognition.face_encodings(eImgs_v1, facesWebcam)
        for encodeKnown_v2, faceLoc in zip(encodesWebcam, facesWebcam):
            facesCompared = face_recognition.compare_faces(encodeKnown, encodeKnown_v2)
            faceDistance = face_recognition.face_distance(encodeKnown, encodeKnown_v2)
            faceIndex = np.argmin(faceDistance)
            if facesCompared[faceIndex]:
                employeeName = ListNames[faceIndex]
                y = employeeName
                if temp == "" or temp != name:
                    print(name)
                    temp = name
                if y:
                    print(y)
                    print("AUTHORIZED")
                    time.sleep(1)
                    # Arduino and Python connection
                    arduino = serial.Serial('COM9', 115200, timeout=.1)
                    time.sleep(1)
                    print("The system is ready!")
                    while True:
                        barcode = arduino.readline()[:-2]
                        strbarcode = barcode.decode('utf-8')
                        if strbarcode:
                            x = strbarcode
                            print(x)
                            if y == x:
                                print('Have a nice day!')
                                time.sleep(3)
                                print("Next Employee please!")
                            else:
                                print('This is not yours!')
            else:
                print("UNAUTHORIZED")
            p1, p2, p3, p4 = faceLoc
            cv2.rectangle(eImgs, (p1, p2), (p3, p4), (0, 255, 0), 2)
            cv2.putText(eImgs, y, (p1, p3), cv2.FONT_HERSHEY_PLAIN, 1, (0, 255, 0), 2)
        cv2.imshow('EMPLOYEE', eImgs)
        cv2.waitKey(1)
cap.release()
cv2.destroyAllWindows()
I recommend using deepface. Its stream function applies face recognition with several state-of-the-art face recognition models.
models = ['VGG-Face', 'Facenet', 'OpenFace', 'DeepFace', 'DeepID', 'Dlib', 'ArcFace']
#!pip install deepface
from deepface import DeepFace
DeepFace.stream(db_path='C:/my_db',
                model_name=models[0],
                enable_face_analysis=False  # to disable age, gender, emotion prediction
                )
So, I have what is basically a three-dimensional array in Lua, really a voxel system. It looks like this:
local VoxelTable = {
    [1] = { --X
        [5] = { --Y
            [2] = { --Z
                ["Type"] = "Solid",
                ["Rotation"] = "InverseX",
                ["Material"] = "Grass",
                ["Size"] = Vector3.new(1,1,1)
                --A 1x1x1 Solid grass block with rotation "InverseX"
            }
        }
    }
}
The voxels are generated, and because of that I can't compress them manually; but without compression, rendering lags the game down a lot.
What I want to do is: if there are three grass blocks directly above/below each other with the same rotation value, combine them into one voxel with a size of Vector3.new(1,3,1), positioned at the middle voxel.
So
[1] = { --X
    [5] = { --Y
        [2] = { --Z
            ["Type"] = "Solid",
            ["Rotation"] = "InverseX",
            ["Material"] = "Grass",
            ["Size"] = Vector3.new(1,1,1)
        }
    },
    [6] = { --Y
        [2] = { --Z
            ["Type"] = "Solid",
            ["Rotation"] = "InverseX",
            ["Material"] = "Grass",
            ["Size"] = Vector3.new(1,1,1)
        }
    },
    [7] = { --Y
        [2] = { --Z
            ["Type"] = "Solid",
            ["Rotation"] = "InverseX",
            ["Material"] = "Grass",
            ["Size"] = Vector3.new(1,1,1)
        }
    }
}
becomes
[1] = { --X
    [6] = { --Y
        [2] = { --Z
            ["Type"] = "Solid",
            ["Rotation"] = "InverseX",
            ["Material"] = "Grass",
            ["Size"] = Vector3.new(1,3,1)
        }
    }
}
Here’s a somewhat simplified example. I’ve created a 10 x 10 x 10 cube of voxels, giving each voxel a vec3 size attribute (as you have it) and a random letter attribute (a, b, or c). I then iterate over the voxels, looking up and down. If the voxel I’m on has the same letter attribute as the voxel above and below, then I set the above and below voxels to nil, and increase the size attribute of the middle voxel. I am sure this could all be optimized, and I’m sure more sophisticated logic could look for other voxel relationships besides this hard-coded stack-of-three identical voxels. But this is a start:
local world = {}
local letters = {"a", "b", "c"}

function setup()
    -- Create simplified test data
    for x = 1, 10 do
        world[x] = {}
        for y = 1, 10 do
            world[x][y] = {}
            for z = 1, 10 do
                world[x][y][z] = {}
                local randomIndex = math.random(1, 3)
                world[x][y][z].letter = letters[randomIndex]
                world[x][y][z].size = vec3(1, 1, 1)
            end
        end
    end
    -- Combine common stacks of three
    for x = 1, 10 do
        for y = 2, 9 do -- Ensure there is at least a level below (y == 1) or above (y == 10)
            for z = 1, 10 do
                combineStacks(x, y, z)
            end
        end
    end
end

function combineStacks(x, y, z)
    local low = world[x][y - 1][z]
    local mid = world[x][y][z]
    local high = world[x][y + 1][z]
    if low ~= nil and mid ~= nil and high ~= nil then
        if low.letter == mid.letter and mid.letter == high.letter then
            world[x][y - 1][z] = nil -- low
            world[x][y + 1][z] = nil -- high
            mid.size = vec3(1, 3, 1)
            print("Stack of three identical voxels found!")
        end
    end
end
The above was written and tested (and visualized, shown below) in Codea. The vec3 construct is native to that environment and not to Lua in general, so keep that in mind.
Here’s a 2D visualization of the results, with each square showing a slice of the voxel cube. If you see a yellow point (representing a stack of three!), look at the square slice to the left and right, and at the same location you will see the voxels there are nil: