I am using the Google Cloud Vision API to search for similar images (web detection), and it works pretty well. Google detects fully matching images and partially matching images (cropped versions).
I am looking for a way to detect more varied versions. For example, when I search for a logo, I would like to detect large, small, square, rectangular... versions of that logo. For now, I only detect images that exactly match the one I upload, plus cropped versions.
Do you know if this is possible and, if so, how I can do it?
The Web entities feature supports the detection of exact and partial matches of the image, as well as the URLs of visually similar images, as mentioned here; however, keep in mind that candidate images need a certain degree of similarity to the original, which means that images that don't meet that threshold cannot be detected by the model.
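For reference, a short sketch of the web detection request with the Python client (pip install google-cloud-vision); the file name is a placeholder:

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("logo.png", "rb") as f:
    image = vision.Image(content=f.read())

# The response separates exact, partial (e.g. cropped), and merely
# visually similar matches.
web = client.web_detection(image=image).web_detection
for m in web.full_matching_images:
    print("full match:", m.url)
for m in web.partial_matching_images:
    print("partial match:", m.url)
for m in web.visually_similar_images:
    print("visually similar:", m.url)
```

The visually_similar_images list is where looser variants, such as resized versions, would typically appear.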
In case this feature doesn't cover your current needs, you can use the Send Feedback button, located at the lower-left and upper-right corners of the service's public documentation, or take a look at the Issue Tracker tool, in order to raise a Vision API feature request and notify Google about this desired functionality.
I have about 2,000 images of cars, most pointing right, but some pointing left.
I'd like to find a way of automatically tagging a car with its direction (new images will be coming in continually).
I'm struggling to get started and wondered if this kind of image detection problem has a name that may help my searches. Is object orientation detection a thing?
I'm a software developer (not doing much ML or image work) and have a ton of Azure and GCP resources available, but I can't find anything to solve this. Azure Cognitive Services can tell us there's a car in the picture, but not which direction it's facing.
Could just do with a good starting point to get going.
I should add that the images are quite clean, on white backgrounds.
Thanks to Venkata for commenting; it was a bad dataset causing our issues (far more right-facing images than left).
Here's what we did to get it all working:
We set up a training and prediction instance in Azure (using Custom Vision Cognitive Services in our portal).
We then used https://www.customvision.ai/ to set everything up and train the model (it's super simple).
We didn't actually need any left-facing images in the end. We took all the right-facing images we had (about 500 in the final instance) and uploaded them with the tag "Right". We then mirrored all the images with a Photoshop script and uploaded them again with a "Left" tag. It trained for about 15 minutes, and we ended up with a 100% prediction score. We tested it with a load of images that weren't in the training set to confirm it was all working.
We then did the same for a ton of van/truck images. These were taken from a different angle (the cars were all side-profile shots, the vans all front three-quarter), so we weren't sure if we'd have the same success.
Again, we flipped the images ourselves to create the left-facing set, so we only needed to source right-facing vans to build the whole model.
We ended up with a 99.8% score, which is totally acceptable for our use case. We can now detect all car and van directions, and it even handles cars shot front three-quarter and vans in profile (even though we only trained cars in profile and vans in three-quarter).
The Custom Vision portal gives you an API endpoint and a key. Now, when we detect a new image in our system, it goes via the API (using the Custom Vision SDK/NuGet package in our .NET site) and we check the tags to see if it needs flipping. If it does, we flip it and save it back to disk, and it's then cached so it doesn't keep hitting the API.
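For anyone wiring this up without the SDK, here is a rough sketch of the prediction call against the Custom Vision REST endpoint (we used the .NET NuGet package; the Python below, and the resource name, project id, and iteration name, are illustrative placeholders from the portal):

```python
import requests

# Placeholders: copy the real endpoint and key from the Custom Vision portal.
ENDPOINT = ("https://myresource.cognitiveservices.azure.com/customvision/v3.0/"
            "Prediction/<project-id>/classify/iterations/<iteration>/image")
KEY = "<prediction-key>"

def needs_flip(image_path):
    """Return True if the classifier tags the image as left-facing."""
    with open(image_path, "rb") as f:
        resp = requests.post(
            ENDPOINT,
            headers={"Prediction-Key": KEY,
                     "Content-Type": "application/octet-stream"},
            data=f.read(),
        )
    resp.raise_for_status()
    # Pick the tag with the highest probability: "Left" or "Right".
    best = max(resp.json()["predictions"], key=lambda p: p["probability"])
    return best["tagName"] == "Left"
```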
It's pretty amazing: it took us just two days to research the options, pick a provider, and implement the solution into a production platform. It's probably a simple use case for ML, but 10 years ago (or even 5) we couldn't have dreamed that things would come along so far.
tldr; If you need to detect whether an object in an image is pointing left or right, just grab a lot of right-facing examples and then flip them yourself to create a well-balanced model. Obviously, this relies on the object looking the same from one side as from the other.
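If you don't have Photoshop handy, the mirroring step is a few lines with Pillow; this is a minimal sketch, and the folder names are illustrative:

```python
import os
from PIL import Image, ImageOps

SRC, DST = "right_facing", "left_facing"  # hypothetical folders
os.makedirs(DST, exist_ok=True)

# Horizontally mirror every right-facing image to synthesise the "Left" class.
for name in os.listdir(SRC):
    img = Image.open(os.path.join(SRC, name))
    ImageOps.mirror(img).save(os.path.join(DST, name))
```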
I'm sorry, I need your help. I'm having trouble finding a unique technology (an app, system, or tool) in the CBIR topic. Do you have any ideas for a unique app that could be developed using CBIR? I'm drawing a blank and have no ideas about CBIR. I mean, I have searched for CBIR ideas, but they're too ordinary, and my teacher asked me to find a more attractive idea for a CBIR app. An image search engine and an app to identify tourist attractions are my ideas; do you have any others?
NB: Content-based image retrieval (CBIR), also known as query by image content (QBIC) and content-based visual information retrieval (CBVIR), is the application of computer vision techniques to the image retrieval problem, that is, the problem of searching for digital images in large databases (see this survey[1] for a recent scientific overview of the CBIR field). Content-based image retrieval is opposed to traditional concept-based approaches (see concept-based image indexing).
"Content-based" means that the search analyzes the contents of the image rather than the metadata such as keywords, tags, or descriptions associated with the image. The term "content" in this context might refer to colors, shapes, textures, or any other information that can be derived from the image itself.
For ordinary methods, you may use the LIRE library (https://github.com/dermotte/LIRE); there is a demo site built with it: LIRE Demo.
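To see what an "ordinary" content-based method boils down to, here is a minimal sketch that ranks a folder of images by colour-histogram similarity to a query image, using OpenCV rather than LIRE's Java API. The file names, bin count, and correlation metric are all illustrative assumptions:

```python
import glob
import cv2

def histogram(path, bins=32):
    """HSV colour histogram, normalised so image size doesn't matter."""
    img = cv2.imread(path)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [bins, bins], [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()

query = histogram("query.jpg")
scores = []
for path in glob.glob("database/*.jpg"):
    # Correlation: 1.0 means identical histograms.
    score = cv2.compareHist(query, histogram(path), cv2.HISTCMP_CORREL)
    scores.append((score, path))

# Print the ten most similar images.
for score, path in sorted(scores, reverse=True)[:10]:
    print(f"{score:.3f}  {path}")
```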
But if you have enough time and enthusiasm, you should look into deep learning topics, where all the state-of-the-art work in the field is being done. For example, you could look at Karpathy's NeuralTalk on GitHub (https://github.com/karpathy/neuraltalk2) and its wonderful demo page.
I have developed part of an iOS application that uses Facebook's Graph API to access a user's photos and lets the user crop an image into a square at a desired zoom. The images must be squares. Is there a way to use the Graph API with a given rect parameter so it returns a URL of the desired photo cropped to that rect? I have done some research and it seems there isn't, but I was hoping for another set of eyes on it.
Assuming that there isn't, which sounds like the better idea:
1. Uploading the cropped photo to my own servers for future access, or
2. Using my own SQL database to store the crop rect and the URL of the photo (hosted by Facebook), then loading the full Facebook photo and cropping it as needed.
Option 1 offers efficiency when loading data from the internet, but it means storing more data on my own servers (this could get expensive in the future).
Option 2 means I will use less space on my own servers, but it also means the entire photo will have to be loaded, even the parts that won't be used.
I'm leaning towards 2, but I don't deal too much with web/database work so I was hoping for some advice. Thanks.
Facebook already does some transformations of the pictures. It is not exactly what you want, but it might be interesting to take a look:
https://developers.facebook.com/docs/graph-api/reference/photo
https://developers.facebook.com/docs/graph-api/reference/platform-image-source/
@CBroe has a good point about not storing the URLs obtained from Facebook directly. I would try to save the cropped picture on your servers.
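If you go that route, the server-side step is small; a rough sketch with requests and Pillow, where the URL, rect, and output path are placeholders:

```python
import io
import os
import requests
from PIL import Image

def save_cropped(photo_url, rect, out_path):
    """Fetch the photo, crop to rect = (left, top, right, bottom), store it."""
    resp = requests.get(photo_url, timeout=10)
    resp.raise_for_status()
    img = Image.open(io.BytesIO(resp.content)).convert("RGB")
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    img.crop(rect).save(out_path, format="JPEG")

# Hypothetical values: a Graph API photo URL and a square crop rect.
save_cropped("https://example.com/photo.jpg", (100, 100, 612, 612),
             "cropped/user123_photo.jpg")
```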
First post on SO; hopefully I am doing it right :-)
Have a situation where users need to upload and view very high-resolution files (they need to pan, tilt, zoom, and annotate images). A single file sometimes exceeds 1 GB, so loading the complete file on the client side is not an option.
We are thinking about letting users upload files to the server (like everyone does), then doing some processing on the server side to create multiple, relatively small, lower-resolution images at varying sizes. We then give users thumbnails with a canvas-size option on the webpage for them to pick and start their work.
Let's assume a user opens a low-grade image at a 1280 x 1028 canvas size. The image will be broken into tiles before display, and when the user clicks on a tile it will be like zooming in to that specific tile. The client will send a request to the server asking for a higher-resolution image of that tile. The server will send the image, which will be broken into tiles again for the user to click on and get another higher-resolution image from the server, and so on... Having multiple images at varying resolutions will help us break images into tiles and serve user needs (keep zooming in or out using tiles).
Has anyone dealt with humongous image files? Is there a preferred technical design you can suggest? How to handle areas that are split across tiles is bothering me a lot, so I'm not sure how the above approach can be modified to address this issue.
We need to plan for 100 to 200 users connected to the website simultaneously, and ours is a .NET environment, if it matters.
Thanks!
The question is a little vague. I assume you are looking for hints, so here are a few:
I see uploading the images as a problem in the first place. Where I come from, upload speeds are way slower than download speeds. (But there is little you can do if you need your users to upload gigabytes...) Perhaps offer a more stable upload channel than the web; FTP if you must.
Converting into smaller pieces should be no big problem. Use one of the available tools, perhaps ImageMagick. I see there is a .NET wrapper out: https://magick.codeplex.com/
More important than the conversion itself: don't do it on the fly every time (you would need a really big machine), but only once, when the image is uploaded. If you want to scale, you can outsource this to another box in the network.
For the viewer: this is the interesting part. There are some ready-to-use ones. Google has one; it's called 'Maps' :). But there is a free alternative: OpenLayers, from the OpenStreetMap project: http://wiki.openstreetmap.org/wiki/OpenLayers. All you have to do is name your generated files in the right way, plus a little configuration.
Even if you must, for some reason, create the tiles on the fly, or can't use something like OpenLayers, I would try to stick to its naming scheme. Having something working to start with is never a bad idea.
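To make the convert-once-at-upload idea concrete, here is a minimal pyramid-tiling sketch with Pillow: each zoom level halves the resolution of the next, and every level is cut into 256x256 tiles named zoom/x/y.png, roughly the slippy-map layout OpenLayers expects. The zoom range, tile size, and paths are assumptions; a production setup would more likely lean on ImageMagick or a dedicated tiler:

```python
import os
from PIL import Image

Image.MAX_IMAGE_PIXELS = None  # sources can exceed Pillow's default safety cap
TILE = 256

def build_pyramid(src_path, out_dir, max_zoom=5):
    full = Image.open(src_path)
    for zoom in range(max_zoom + 1):
        # zoom 0 is the most reduced level; each level doubles the resolution.
        scale = 2 ** (max_zoom - zoom)
        level = full.reduce(scale) if scale > 1 else full
        for ty in range(0, level.height, TILE):
            for tx in range(0, level.width, TILE):
                tile = level.crop((tx, ty, tx + TILE, ty + TILE))
                tile_dir = os.path.join(out_dir, str(zoom), str(tx // TILE))
                os.makedirs(tile_dir, exist_ok=True)
                tile.save(os.path.join(tile_dir, f"{ty // TILE}.png"))

build_pyramid("upload/huge_scan.tif", "tiles/huge_scan")  # hypothetical paths
```

Because tiles are cut on a fixed grid at every level, an area that straddles a tile boundary is simply served as its neighbouring tiles and the viewer stitches them, which is how the map libraries sidestep the split-area problem.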
Is it possible to integrate an image search engine within an app? I have an idea to incorporate image search engines with the pictures that can be taken with the camera, and then have the app return info about the picture that is recognized.
Google Goggles, Like.com (formerly Riya, now acquired by Google), and Tineye.com are some sites that offer visual search. I'm not sure whether they offer an API.
If you want to whip one up, it is, as you would expect, no trivial task. AFAIK, there are no out-of-the-box solutions available, especially considering your use case of taking an image and getting related information back (known in trade parlance as RST-invariant template matching), and you would need to look at a significant investment of time and money.
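For a taste of what rotation/scale-tolerant matching involves, here is a tiny OpenCV sketch using ORB features; it is an illustration of the matching step only, not a visual search engine, and the file names are placeholders:

```python
import cv2

# Hypothetical inputs: a camera snapshot and one catalogue image to test.
query = cv2.imread("snapshot.jpg", cv2.IMREAD_GRAYSCALE)
reference = cv2.imread("catalog_item.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(query, None)
kp2, des2 = orb.detectAndCompute(reference, None)

# Hamming distance suits ORB's binary descriptors; crossCheck keeps only
# mutual best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
if matches:
    print(f"{len(matches)} matches; best distance {matches[0].distance:.0f}")
```

A real system would repeat this against an indexed database of images and rank candidates by match quality.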
We offer an image search engine for mobile app cameras - www.iqengines.com.