I'm trying to process my screen live. There is a fishing game in which you have to click on a fish while it is inside a circle. My idea is to capture the screen, find the fish with OpenCV, and click on it with pyautogui.
I did that, but the program is not fast enough to click in time. The game is a mini-game in the MMORPG Metin 2. This is something like a hack or a bot, but I am just wondering whether I can do it.
Here is my code:
import numpy as np
import cv2
from PIL import ImageGrab
import pyautogui
import time
while True:
    img=ImageGrab.grab(bbox=(341,208,430,290))
    img_np=np.array(img)
    #gray=cv2.cvtColor(img_np, cv2.COLOR_BGR2GRAY)
    #lower=np.array([57,91,120])
    #upper=np.array([65,95,160])
    #mask=cv2.inRange(gray,95,130)
    #sonuc=cv2.bitwise_and(gray,gray,mask=mask)
    #cv2.imshow('frame',mask)
    degsk=np.argwhere(img_np==[123,90,57])
    if len(degsk)!=0:
        #print(degsk)
        yerx=341+degsk[int(len(degsk)/2),1]
        yery=208+degsk[int(len(degsk)/2),0]
        #pyautogui.click(x=yerx, y=yery)
        time.sleep(0.8)
    if cv2.waitKey(1)&0xFF==ord('q'):
        break
cv2.destroyAllWindows()
As you can see, I first tried masking the screen, then realised that it was not necessary. Instead I found the BGR value of the fish, searched for it in the NumPy array, took the middle match, and used the mouse move function on it. As I said, this is not fast enough to catch the fish.
So the program works, but it reacts too late to catch the fish. How can I make it faster?
Game Screen Here
Using mss is really fast. Try this:
import time
import cv2
import mss
import numpy
with mss.mss() as sct:
    # Part of the screen to capture
    monitor = {"top": 40, "left": 0, "width": 800, "height": 640}
    while "Screen capturing":
        last_time = time.time()
        # Get raw pixels from the screen, save it to a Numpy array
        img = numpy.array(sct.grab(monitor))
        # Display the picture
        cv2.imshow("OpenCV/Numpy normal", img)
        # Display the picture in grayscale
        # cv2.imshow('OpenCV/Numpy grayscale',
        #            cv2.cvtColor(img, cv2.COLOR_BGRA2GRAY))
        print("fps: {}".format(1 / (time.time() - last_time)))
        # Press "q" to quit
        if cv2.waitKey(25) & 0xFF == ord("q"):
            cv2.destroyAllWindows()
            break
More information on https://python-mss.readthedocs.io/index.html
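Once a frame is in a NumPy array, the colour lookup from the question can be done per pixel rather than per channel. The following is a minimal sketch and not part of the original answer: `find_color_center` is a hypothetical helper, and it assumes a BGRA frame such as the one `numpy.array(sct.grab(monitor))` produces. Reducing with `np.all(..., axis=-1)` reports only pixels where blue, green and red all match, whereas `argwhere(img == [b, g, r])` also reports positions where a single channel matches.

```python
import numpy as np

def find_color_center(frame, bgr):
    """Return (x, y) of the middle pixel matching `bgr`, or None if absent.

    `frame` is an H x W x 4 (BGRA) or H x W x 3 (BGR) uint8 array.
    """
    # True only where blue, green and red all match; any alpha channel is ignored.
    target = np.asarray(bgr, dtype=frame.dtype)
    mask = np.all(frame[:, :, :3] == target, axis=-1)
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    mid = len(xs) // 2  # same "middle match" choice as in the question
    return int(xs[mid]), int(ys[mid])

# Synthetic 5x5 BGRA frame with one matching pixel at x=3, y=1.
frame = np.zeros((5, 5, 4), dtype=np.uint8)
frame[1, 3] = (57, 90, 123, 255)
print(find_color_center(frame, (57, 90, 123)))
```

The returned coordinates are relative to the captured region, so the region's `left` and `top` offsets would need to be added before passing them to a click function.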
I find IPython best for timing things, so if you start IPython and paste in the following code:
from PIL import ImageGrab
img=ImageGrab.grab(bbox=(341,208,430,290))
You can then time a statement with:
%timeit img=ImageGrab.grab(bbox=(341,208,430,290))
and I get this:
552 ms ± 5.91 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
So, grabbing the screen takes over 500ms, so you are only going to get under 2 frames/second - without even processing it.
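Outside IPython, a similar measurement can be made with the standard-library `timeit` module. This is only a sketch: `measure_fps` and `fake_grab` are hypothetical names, and `fake_grab` is a placeholder workload that would be swapped for the real `ImageGrab.grab(bbox=...)` call.

```python
import timeit

def measure_fps(func, repeats=5):
    """Call `func` `repeats` times; return (seconds_per_call, calls_per_second)."""
    # Use the fastest run to reduce noise from other processes.
    best = min(timeit.repeat(func, number=1, repeat=repeats))
    return best, (1.0 / best if best > 0 else float("inf"))

def fake_grab():
    # Placeholder workload standing in for ImageGrab.grab(bbox=...).
    return sum(range(100_000))

seconds, fps = measure_fps(fake_grab)
print(f"{seconds * 1000:.2f} ms per call, about {fps:.1f} calls/second")
```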
If you want to grab the screen faster, I would suggest ffmpeg. I installed it on my iMac running macOS with homebrew using:
brew install ffmpeg
I can then see the list of available video sources and find what I need to record the screen with:
ffmpeg -f avfoundation -list_devices true -i ""
Sample Output
[AVFoundation input device # 0x7fa7dcf05b40] AVFoundation video devices:
[AVFoundation input device # 0x7fa7dcf05b40] [0] FaceTime HD Camera
[AVFoundation input device # 0x7fa7dcf05b40] [1] Capture screen 0 <--- THIS ONE IS THE SCREEN
[AVFoundation input device # 0x7fa7dcf05b40] AVFoundation audio devices:
[AVFoundation input device # 0x7fa7dcf05b40] [0] MacBook Pro Microphone
[AVFoundation input device # 0x7fa7dcf05b40] [1] CalDigit Thunderbolt 3 Audio
So I know I need input 1 for the screen.
So, if I want to record the screen from the top-left corner (0,0) at a width of 400px and a height of 200px at 20fps for 10s, and pass bgr0 data (4 bytes per pixel) to my fishing program, I can do this:
ffmpeg -y -pix_fmt bgr0 -f avfoundation -r 20 -t 10 -i 1 -filter:v "crop=400:200:0:0" -f rawvideo - | ./fish.py
I can now use the following as my fishing program:
#!/usr/bin/env python3
import numpy as np
import pyautogui
import time
import os, sys
# width, height
w, h = 400, 200
# Bytes per frame - assumes bgr0, i.e. 8 bits of blue, 8 bits of green, 8 bits of red and 8 bits of junk
bytesPerFrame = w * h * 4
while True:
    img = sys.stdin.buffer.read(bytesPerFrame)
    if len(img) != bytesPerFrame:
        break
    # Process your video here
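To process each frame, the raw bytes first have to be viewed as an image array. A possible sketch follows, assuming the same bgr0 layout of 4 bytes per pixel; `read_frames` is a hypothetical helper, and an in-memory `io.BytesIO` stands in for `sys.stdin.buffer` so the example runs without ffmpeg.

```python
import io
import numpy as np

def read_frames(stream, width, height):
    """Yield H x W x 4 uint8 arrays (bgr0 layout) until the stream runs out."""
    bytes_per_frame = width * height * 4
    while True:
        raw = stream.read(bytes_per_frame)
        if len(raw) != bytes_per_frame:
            break  # end of stream or truncated final frame
        # View the bytes as an array without copying; the result is read-only.
        yield np.frombuffer(raw, dtype=np.uint8).reshape(height, width, 4)

# Stand-in for sys.stdin.buffer: two tiny frames, 3 pixels wide and 2 high.
w, h = 3, 2
fake_stdin = io.BytesIO(bytes(range(w * h * 4)) * 2)
frames = list(read_frames(fake_stdin, w, h))
print(len(frames), frames[0].shape, frames[0][0, 1, :3])
```

In the real program, `read_frames(sys.stdin.buffer, 400, 200)` would take the place of the fake stream, and `frame[:, :, :3]` gives the three colour channels without the junk byte.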
Keywords: pyautogui, screen grab, screen-grab, screengrab, slow, Mac, macOS, Python, capture, screen capture, ffmpeg, PIL, OpenCV
image of the program execution
I'm trying to make a program that shows a video stream from a USB camera alongside a tkinter app. I use OpenCV to display the image, but I am not drawing the image inside the tkinter app.
The program controls a differential robot using the position and orientation obtained from the camera. The first step is to connect the PC to the Raspberry Pi (which sends commands to the robot over a serial port). When I click the "Conectar" button, the camera stream stops. If I use the laptop camera instead, the stream does not stop, and I don't understand why. I need the stream to keep running, because the same thing happens with the "Ir al punto" button, which runs the function that drives the robot to the destination point. If the stream stops, the position and orientation information is no longer correct and the robot can't reach the point.
I use threading to display the camera:
# Creation of the tkinter app: labels, buttons....
# I'm not showing this because is too long
# Camera parameters
cv2.namedWindow("Zoom")
cv2.moveWindow("Zoom", 0,512)
cv2.moveWindow("Visualization", 0,0)
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FPS, 30)
detector = apriltag.Detector()
zoomed = np.zeros((300, 300, 3), dtype=np.uint8)
# Threading
t1 = threading.Thread(target=show_camera2) # show_camera2() is the typical opencv videocapture imshow loop
t1.start()
root.mainloop()
I've tried to merge the opencv windows in the tkinter app using this:
imgframe = cv2.cvtColor(frame, cv2.COLOR_BGR2RGBA)
img1 = Image.fromarray(imgframe)
imgtk1 = ImageTk.PhotoImage(image=img1)
label_camera.imgtk = imgtk1
label_camera.configure(image=imgtk1)
label_camera.update()
but the stream still stops when I press a button.
I was trying to quickly view my work-in-progress particle simulator, so I used matplotlib to plot the particles. However, matplotlib seems to make small adjustments between images. (The images are written to a video using the cv2 VideoWriter.)
Does anyone know how to hard-set the ranges? (I am currently using xlim and ylim.)
fig.tight_layout()
plt.xlim(xmin-x_axis_buffer,xmax+x_axis_buffer)
plt.ylim(ymin-y_axis_buffer,ymax+y_axis_buffer)
for i in range(nIters):
    plt.scatter(data[i,:,0],data[i,:,1],c=[i for i in range(nParticles)],cmap="gist_rainbow")
    fig.set_size_inches(8, 6)
    _=f'{i:04}.png'
    plt.savefig(_, dpi=100)
    plt.cla()
output_from_video.gif
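One likely cause, offered as an assumption since the full script isn't shown: `plt.cla()` clears the current axes, which also resets the axis limits to autoscaling, so limits set once before the loop apply only to the first frame. A minimal sketch of re-applying the limits after every clear is below; `render_frames` and the random data are illustrative only, and the headless Agg backend is chosen just so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

def render_frames(data, xlim, ylim):
    """Draw each frame with fixed limits; return the limits used per frame."""
    fig, ax = plt.subplots(figsize=(8, 6))
    colors = np.arange(data.shape[1])
    limits_seen = []
    for i in range(data.shape[0]):
        ax.cla()  # clearing also discards any limits set earlier
        ax.scatter(data[i, :, 0], data[i, :, 1], c=colors, cmap="gist_rainbow")
        ax.set_xlim(*xlim)  # re-apply the fixed ranges on every frame
        ax.set_ylim(*ylim)
        limits_seen.append((ax.get_xlim(), ax.get_ylim()))
        # fig.savefig(f"{i:04}.png", dpi=100)  # enable to write the images
    plt.close(fig)
    return limits_seen

rng = np.random.default_rng(0)
data = rng.uniform(-5, 5, size=(3, 10, 2))  # 3 frames of 10 particles
limits = render_frames(data, xlim=(-6, 6), ylim=(-6, 6))
print(limits[0] == limits[-1])
```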
My code processes a frame and it takes a couple of seconds to execute. If I'm streaming from a camera, I will naturally drop frames and get a frame every couple of seconds, right? I want to simulate the same thing in replaying a video file.
Normally, when you call vidcap.read(), you get the next frame in the video. This essentially slows down the video and does not miss a frame. This is not like processing a live camera stream. Is there a way to process the video file and drop frames during processing like when processing the camera stream?
The solution that comes to my mind is to keep track of time myself and call vidcap.set(cv2.CAP_PROP_POS_MSEC, currentTime) before each vidcap.read(). Is this how I should do it, or is there a better way?
One approach is to keep track of the processing time and skip the number of frames that would have arrived in that time:
import cv2, time, math
# Capture saved video that is used as a stand-in for webcam
cap = cv2.VideoCapture('/home/stephen/Desktop/source_vids/ss(6,6)_id_146.MP4')
# Frames per second of the source video (hard-coded here)
fps = 120
# Iterate through video
while True:
    # Record your start time for the frame
    start = time.time()
    # Read the frame
    ok, img = cap.read()
    # Stop when the video runs out of frames
    if not ok: break
    # Show the image
    cv2.imshow('img', img)
    # What ever processing that is going to slow things down should go here
    k = cv2.waitKey(0)
    if k == 27: break
    # Calculate the time it took to process this frame
    total = time.time() - start
    # Print out how many frames to skip
    print(total*fps)
    # Skip the frames
    for skip_frame in range(int(total*fps)): cap.read()
cap.release()
cv2.destroyAllWindows()
This is probably better than nothing, but it does not correctly simulate the way that frames are dropped. It appears that during processing, the webcam data is written to a buffer (until the buffer fills up). A better approach is to capture the video from the webcam while running a dummy process. This processor-intensive dummy process will cause frames to be dropped:
import cv2, time, math
# Capture webcam
cap = cv2.VideoCapture(0)
# Create video writer
vid_writer = cv2.VideoWriter('/home/stephen/Desktop/drop_frames.avi',cv2.VideoWriter_fourcc('M','J','P','G'),30, (640,480))
# Iterate through video
while True:
    # Read the frame
    ok, img = cap.read()
    # Stop if the camera returns no frame
    if not ok: break
    # Show the image
    cv2.imshow('img', img)
    k = cv2.waitKey(1)
    if k == 27: break
    # Do some processing to simulate your program
    for x in range(400):
        for y in range(40):
            for i in range(2):
                dummy = math.sqrt(i+img[x,y][0])
    # Write the video frame
    vid_writer.write(img)
cap.release()
# Release the writer so the output file is finalized
vid_writer.release()
cv2.destroyAllWindows()
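The arithmetic of the first approach (read one frame, then discard `int(total*fps)` frames) can be checked without a video file. The sketch below is illustrative only; `processed_frames` is a hypothetical helper, and it assumes a constant processing time per frame and ignores any camera buffering.

```python
def processed_frames(total_frames, fps, processing_seconds):
    """Return indices of frames that would be processed when each one takes
    `processing_seconds` and the frames arriving meanwhile are skipped."""
    skip = int(processing_seconds * fps)  # frames elapsed during processing
    kept = []
    index = 0
    while index < total_frames:
        kept.append(index)
        index += 1 + skip  # one frame read, then `skip` frames discarded
    return kept

# 20 frames at 10 fps with 0.5 s of processing per frame
print(processed_frames(total_frames=20, fps=10, processing_seconds=0.5))
# prints [0, 6, 12, 18]
```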
I am using FFmpeg to overlay an image/emoji on a video with this command:
"-i "+inputfilePath+" -filter_complex "+"[0][1]overlay=enable='between(t,"+startTime+","+endTime+")'[v1]"+" -map [v0] -map 0:a "+OutputfilePath;
But the above command only overlays the image on the video, and the image stays still.
Instagram and Snapchat have a new pin feature. I want exactly the same thing, e.g. a blur that follows moving faces, or as in the videos below:
Here is link.
Is it possible via FFmpeg?
I think someone with OpenCV or augmented reality knowledge can help with this. It is quite similar to AR, as we need to move/zoom the emoji to exactly where we want it on the video or live camera.
Based on overlay specification:
https://ffmpeg.org/ffmpeg-filters.html#overlay-1
when you specify a time interval, the filter is applied only during that interval:
For example, to enable a blur filter (smartblur) from 10 seconds to 3 minutes:
smartblur = enable='between(t,10,3*60)'
What you need to do is to overlay an image at specific coordinates, for example the following at fixed x and y:
ffmpeg -i rtsp://[host]:[port] -i x.png -filter_complex 'overlay=10:main_h-overlay_h-10' http://[host]:[port]/output.ogg
Now the idea is to calculate those coordinates based on the current frame of the video and force the filter to use the updated coordinates on every frame.
For example based on time:
FFmpeg move overlay from one pixel coordinate to another
ffmpeg -i bg.mp4 -i fg.mkv -filter_complex \
"[0:v][1:v]overlay=enable='between(t,10,20)':x=720+t*28:y=t*10[out]" \
-map "[out]" output.mkv
Or using some other expressions:
http://ffmpeg.org/ffmpeg-utils.html#Expression-Evaluation
Unfortunately, this requires finding a formula for x and y that describes the motion (a cat moving its head, a pen drawing) before those limited expressions can be used. The dependency on time can be linear, trigonometric or something else:
x=sin(t)
With free movement this is not always possible.
To find an object's coordinates more precisely and overlay something on it, it should be possible to write your own filter (FFmpeg is open source) similar to overlay:
https://github.com/FFmpeg/FFmpeg/blob/master/libavfilter/vf_overlay.c
It would calculate x and y either from an external file (into which you can dump x and y for every timestamp, if the video is static) or by doing some image processing to find the specific region.
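Short of writing a custom filter, a small set of tracked positions can sometimes be turned into an expression the stock overlay filter accepts. The sketch below is an assumption-laden illustration, not a method from the FFmpeg documentation: `piecewise_linear_expr` is a hypothetical helper that nests FFmpeg's `if` and `lt` expression functions to interpolate linearly between (time, value) keyframes, which must have distinct times. It only builds the string; quoting may need adjusting for your shell, and long keyframe lists produce unwieldy expressions.

```python
def piecewise_linear_expr(keyframes):
    """Build an FFmpeg expression in `t` that linearly interpolates between
    (time, value) keyframes and holds the last value afterwards.
    Before the first keyframe the first segment is simply extrapolated."""
    if not keyframes:
        raise ValueError("need at least one keyframe")
    points = sorted(keyframes)
    expr = str(points[-1][1])  # value held after the final keyframe
    # Wrap from the last segment backwards so earlier segments are tested first.
    for (t0, v0), (t1, v1) in reversed(list(zip(points, points[1:]))):
        slope = (v1 - v0) / (t1 - t0)
        segment = f"{v0}+({slope})*(t-{t0})"
        expr = f"if(lt(t,{t1}),{segment},{expr})"
    return expr

x_expr = piecewise_linear_expr([(0, 100), (2, 300), (4, 200)])
print(f"overlay=x='{x_expr}':y=50")
```

The printed string is meant to be used as the filter description inside `-filter_complex`, with a second expression built the same way for y.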
Hopefully this gives you an idea and a direction to move in.
It's a very interesting feature.
I'm trying to grab frames from a web cam using OpenCV. I also tried 'cheese'. Both give me a pretty weird picture: distorted, wrong colors. Using mplayer I was able to figure out the correct codec "yuy2". Even mplayer sometimes would select the wrong codec ("yuv"), which makes it look just like using OpenCV / cheese to capture an image.
Can I somehow tell OpenCV which codec to use?
Thanks!
In the latest version of OpenCV you can set the capture format from the camera with the same fourcc-style code you would use for video. See http://docs.opencv.org/modules/highgui/doc/reading_and_writing_images_and_video.html#videocapture
It may still take a bit of trial and error: terms like YUV, YUYV and YUY2 are used a bit loosely by the camera maker, the driver maker, the operating system, the DirectShow layer and OpenCV!
OpenCV automatically selects the first available capture backend (see here). It can be that it is not using V4L2 automatically.
Also set both -D WITH_V4L=ON and -D WITH_LIBV4L=ON if building from source.
To set the pixel format to be used, set the CAP_PROP_FOURCC property of the capture:
capture = cv2.VideoCapture(self.cam_id, cv2.CAP_V4L2)
capture.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'))
width = 1920
height = 1080
capture.set(cv2.CAP_PROP_FRAME_WIDTH, width)
capture.set(cv2.CAP_PROP_FRAME_HEIGHT, height)
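Because a driver may silently ignore the request, it can help to read the property back with `capture.get(cv2.CAP_PROP_FOURCC)` and decode it. The sketch below uses plain Python so it runs without a camera; `fourcc_to_int` and `int_to_fourcc` are hypothetical helpers, and the packing shown (first character in the lowest byte) is the layout `cv2.VideoWriter_fourcc` uses. Some backends may return 0 or another placeholder when the format is unknown.

```python
def fourcc_to_int(code):
    """Pack a four-character code into an integer, first character in the
    lowest byte."""
    if len(code) != 4:
        raise ValueError("a FOURCC has exactly four characters")
    return sum(ord(ch) << (8 * i) for i, ch in enumerate(code))

def int_to_fourcc(value):
    """Unpack an integer (or the float returned by capture.get) into text."""
    value = int(value)
    return "".join(chr((value >> (8 * i)) & 0xFF) for i in range(4))

print(fourcc_to_int("MJPG"))
print(int_to_fourcc(fourcc_to_int("YUYV")))
```

With a real capture, `int_to_fourcc(capture.get(cv2.CAP_PROP_FOURCC))` would show which format is actually in effect after the `set` call.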