Image captioning of videogame screenshots using deep learning - image-processing

I am looking for a way to caption screenshots from fantasy videogames like Genshin Impact. Does anyone have suggestions on which model I should use? In the case below, ClipCap returns the caption "digital art selected for the #".


Recognize Logo in a full image

First, you should know that I'm a beginner in this subject. I'm an embedded systems developer by background, and I have never worked with image recognition.
Let me explain my main goal:
I would like to create my own database of logos and be able to
recognize them in a larger image. A typical application would be, for
example, to build a database of Pepsi and Coca-Cola logos so that,
when I take a photo of a soda bottle, it tells me whether the logo is
one of them or something else.
So, here is my problem:
I first wanted to use the Auto ML Kit from Google. I gave it my
databases so it could train itself on them. My first attempt was to
take photos of whole bottles and then compare. It worked, but not
very well. I then tried giving it only the logos, but after
training it couldn't recognize anything in a whole image of a bottle.
I think I didn't provide enough images in the first case. But I'd prefer the second approach (giving only the logos), so that the model searches for something similar in the image.
Finally, my questions:
If you've worked with ML Kit from Google, were you able to train a
model by giving images that should be recognized in a larger image?
If yes, do you have any hints to give me?
Do you know reliable software that could help me to perform tests of this kind? I thought about Azure Machine Learning Studio from
Microsoft (since I develop on Visual Studio).
To begin with, I'd like to write as little code as I can, just for testing. Maybe later I could try to code my own machine learning system, but I think that's a big challenge.
I also thought that I might need to split my image into smaller images and then send each of them to the model, but that would be time-consuming and I need a fast response (under 2 seconds).
Thanks in advance for your answers. I don't need a complete answer with a full tutorial (Stack Overflow is not intended for that anyway ^^); some advice would already be good.
Have a good day!
Azure’s Custom Vision is great for this:
Let’s say you want to detect a Pepsi logo. Upload 70 images of products with the logo on them. Use Custom Vision to draw a box around the logo in each photo. Click “train”, and you get a TensorFlow model with code.
Look up any tutorial for it, it’s pretty incredible and really easy to use.

How to detect an image in a newspaper and play a video relevant to it using augmented reality?

I plan to detect an image in a newspaper and play the video relevant to it. I have seen several newspaper-reading AR apps that include this feature, but I couldn't find out how it is done. How can I do it?
I don't expect any code, but I would like to know what steps I should follow. Thank you.
You need to browse through the available marker-based AR SDKs. Such SDKs let you define in advance the database of images you would like to detect and respond to; once any of these images is detected at runtime, you get some kind of event with data on the detected image.
Vuforia is considered a good one and it has good samples, so it is supposed to be easier to start with. You should also check out Kudan, and there are more.

Image and logo recognition

I need to achieve this goal: recognize a specific logo painted on a wall using the iPhone camera. I'd like to have a sort of database with N logos that the app should be able to recognize. Could you suggest some useful libraries (premium or free) designed to do this?
I would suggest the Watson Visual Recognition API for this task. You could train a custom classifier for each of the logos you want to recognize. A demo and docs are available for it.
Pricing includes a free plan.

multiple choice test mark reader - where to start?

I was assigned a project (in school) for automated multiple choice test scoring and I do not know where to start.
I think this is a fairly common kind of program that you probably already know: it takes a scanned image of the answer sheet as input and returns the results.
Everything I know about computer vision is a few examples of photo editing with OpenCV. I hope you can give me a few keywords related to the problem or maybe a couple of blog articles, documents and related libraries.
Are there any free, open-source programs that I can refer to?
Edit: I added two examples of the answer sheet (sorry, I could not find a sheet in English):
I think there are basically two steps to the problem:
1. Bring the form into a normalized position.
2. Once that is done, you know where the boxes are and can check each one by thresholding the gray values in that region.
Which methods to use for step 1 depends on your actual images and how much they vary. Do you have some example images you can upload?
Also I think it is a good idea, especially if you are a beginner, to start with some simple examples and work your way up from there by adding more and more variation.
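As a starting point, the second step (thresholding the gray values inside known box regions) can be sketched in plain Python. This is only a minimal sketch under several assumptions that are not in the question: the form has already been normalized, the image is a grayscale grid with 0 for black and 255 for white, and the box coordinates, the `dark_fraction` and `read_question` helpers, and both threshold values are hypothetical choices for illustration.

```python
from typing import Optional, Sequence, Tuple

# (top, left, height, width) in pixels; a hypothetical layout for one answer box
Box = Tuple[int, int, int, int]


def dark_fraction(image: Sequence[Sequence[int]], box: Box,
                  gray_threshold: int = 128) -> float:
    """Return the fraction of pixels inside `box` darker than `gray_threshold`."""
    top, left, height, width = box
    dark = 0
    for row in image[top:top + height]:
        for value in row[left:left + width]:
            if value < gray_threshold:
                dark += 1
    return dark / (height * width)


def read_question(image: Sequence[Sequence[int]], boxes: Sequence[Box],
                  fill_threshold: float = 0.5) -> Optional[int]:
    """Return the index of the single marked box, or None if zero or several are marked."""
    marked = [i for i, box in enumerate(boxes)
              if dark_fraction(image, box) >= fill_threshold]
    return marked[0] if len(marked) == 1 else None


if __name__ == "__main__":
    # Synthetic 10x40 "scan": four 10x10 boxes side by side, the third one filled in.
    image = [[255] * 40 for _ in range(10)]
    for r in range(10):
        for c in range(20, 30):
            image[r][c] = 0
    boxes = [(0, 0, 10, 10), (0, 10, 10, 10), (0, 20, 10, 10), (0, 30, 10, 10)]
    print(read_question(image, boxes))
```

On real scans the two thresholds would need tuning, and returning None for ambiguous rows leaves room to flag them for manual review.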

Computer Vision Website Image Slideshow help

I'm making an online display of the output of a computer vision algorithm. After running the algorithm I am left with a folder of about 1000 16-bit .tiff files. I need to put those on the website in a list so that the researchers can click through them. There also needs to be an image frame with an "animated gif" feel that can be started, stopped, and played in reverse. Any ideas on the best way to do this? What language should I use? I made a simple website in Ruby on Rails, but I don't know if it has the capabilities to do what I require.
ImageMagick is the answer to both parts of your question. Here's a tutorial on how to make an animated gif with it.
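For the animation part, here is a rough sketch of how the ImageMagick call could be scripted from Python. It assumes the `convert` command is on the PATH (newer ImageMagick installs may prefer `magick`), that the frames sort correctly by filename, and that a separate reversed GIF is acceptable for reverse playback. The helper names `build_gif_command` and `make_gif` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_gif_command(frames: Sequence[str], output: str,
                      delay_cs: int = 10, reverse: bool = False) -> List[str]:
    """Build an ImageMagick command line that joins `frames` into a looping GIF.

    `delay_cs` is the per-frame delay in hundredths of a second (-delay),
    and `-loop 0` asks for endless looping.
    """
    ordered = list(reversed(frames)) if reverse else list(frames)
    return ["convert", "-delay", str(delay_cs), "-loop", "0", *ordered, output]


def make_gif(folder: str, output: str, reverse: bool = False) -> None:
    """Collect the .tiff files in `folder` (sorted by name) and run ImageMagick."""
    frames = sorted(str(p) for p in Path(folder).glob("*.tiff"))
    if not frames:
        raise ValueError(f"no .tiff files found in {folder}")
    subprocess.run(build_gif_command(frames, output, reverse=reverse), check=True)


if __name__ == "__main__":
    # Print the command for two hypothetical frames without running ImageMagick.
    print(" ".join(build_gif_command(["frame_0001.tiff", "frame_0002.tiff"], "forward.gif")))
```

Calling `make_gif` once with `reverse=False` and once with `reverse=True` would give one GIF per direction. Starting and stopping playback would still have to be handled on the web page itself.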
