I'm using Google's Vision API to analyze screenshots of error messages from one of our products. The OCR part is easy with these managed services, but are there any best-practice tools for working with the actual text?
More specifically, an error screenshot will contain things like the product name, product version, version of the underlying operating system, whether the OS is 32- or 64-bit, and the actual error message (a C# stack trace).
So all the text is there from the OCR scan, but since the screenshots are taken by users, one cannot assume the different pieces of information above appear in specific areas of the screenshot.
How should I go about analyzing this data? Are we talking simple string manipulation and custom domain knowledge (I tried this and it gets me pretty far), or is this a job for some sort of machine-learning text analysis offered by Google/Microsoft (or is that overkill)?
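For context, the string-manipulation approach boils down to a handful of regular expressions plus domain knowledge over the OCR output. A minimal illustrative sketch (the sample text, product name, and patterns below are made up, not our real product):

    import re

    # Purely illustrative OCR output from an error screenshot; real layouts vary.
    ocr_text = """ExampleProduct 4.2.1
    Windows 10 Pro 64-bit
    System.NullReferenceException: Object reference not set to an instance of an object.
       at ExampleProduct.Core.Startup.Run() in Startup.cs:line 42"""

    # Domain-knowledge patterns: a dotted version number, the OS bitness,
    # and a C# stack trace ("SomethingException" followed by "   at ..." frames).
    version = re.search(r"\b\d+\.\d+(?:\.\d+)?\b", ocr_text)
    bitness = re.search(r"\b(32|64)[- ]?bit\b", ocr_text, re.IGNORECASE)
    stack = re.search(r"^\S*Exception\b.*(?:\n\s+at .+)+", ocr_text, re.MULTILINE)

    print(version.group(0) if version else None)  # 4.2.1
    print(bitness.group(0) if bitness else None)  # 64-bit
    print(stack.group(0) if stack else None)      # the stack-trace block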
1. Using simple template matching, find the error-message window you are looking for in the screenshot.
2. Run Google's Vision API on specific areas relative to the position found in step 1 to obtain the specific pieces of information.
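A minimal sketch of both steps with OpenCV; "dialog_template.png" is a saved crop of the error dialog's title bar, and the match threshold and crop offsets are assumptions to tune for the real UI:

    import cv2

    screenshot = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)
    template = cv2.imread("dialog_template.png", cv2.IMREAD_GRAYSCALE)

    # Step 1: locate the error window via template matching.
    result = cv2.matchTemplate(screenshot, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)

    if max_val > 0.8:  # reasonably confident match
        x, y = max_loc
        h, w = template.shape
        # Step 2: crop an area relative to the match and OCR only that crop
        # (e.g. the 60 pixels below the title bar, where the version string sits).
        region = screenshot[y + h : y + h + 60, x : x + w]
        cv2.imwrite("region_for_vision_api.png", region)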
I have this timeline from a newspaper produced by my Native American tribe. I was trying to use AWS Textract to produce some kind of table from it, but Textract does not recognize any tables in the document, so I don't think that will work (perhaps more is possible if I pay, but it doesn't say so).
Ultimately, I am trying to sift through all the archived newspapers and download all the timelines for all of our election cycles (both "general" and "special advisory") to find the number of days between each item in a timeline.
Since this is all in the public domain, I see no reason I can't paste a picture of the table here. I will include the download URL for the document as well.
Download URL: Download
I started off by using Foxit Reader on individual documents to find the timelines on Windows.
Then I used the ocrmypdf tool on Ubuntu to ensure all these documents are searchable (ocrmypdf --skip-text Notice_of_Special_Election_2023.pdf.pdf ./output/Notice_of_Special_Election_2023.pdf).
Then I just so happened to see an ad for AWS Textract this morning in my Google news feed and saw how powerful it is. But when I tried it, it didn't actually find these human-readable timelines.
I'm hoping to find out whether any ML tools, or other solutions entirely, exist for this type of problem.
I am mainly trying to keep my tech skills up to par. I was sick for the last two years, and this is a fun problem to tackle that I think is pretty fringe.
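For the final step, once a timeline's text has been extracted (via Textract's raw text output, pdftotext, etc.), computing the gaps is straightforward. A minimal sketch; the sample lines and the "Month D, YYYY" date format are assumptions about the notices:

    import re
    from datetime import datetime

    timeline_text = """Candidate filing opens        March 1, 2023
    Candidate filing closes       March 15, 2023
    Ballots mailed                April 10, 2023
    Election day                  May 2, 2023"""

    dates = [datetime.strptime(d, "%B %d, %Y")
             for d in re.findall(r"[A-Z][a-z]+ \d{1,2}, \d{4}", timeline_text)]

    for earlier, later in zip(dates, dates[1:]):
        print((later - earlier).days)  # days between consecutive timeline items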
Our initial use case called for writing an application in Unity3D (written solely in C# and deployed to both iOS and Android simultaneously) that allowed a mobile phone user to hold their camera up to the title of a magazine article, use OCR to read the title, and then we would process that title on the backend to get related stories. Vuforia was far and away the best for this use case because of its fast native character recognition.
After the initial application was demoed a bit, more potential uses came up. Any use case that needed only A-Z characters recognized was easy in Vuforia, but the second it called for number recognition we had to look elsewhere, because Vuforia does not support number recognition (now or anywhere in the near future).
Attempted Workarounds:
Google Cloud Vision - works great, but it is not native, and camera images are sometimes quite large, so it is not nearly as fast as we require. We even thought about using the OpenCV Unity asset to identify the numbers and then send multiple much smaller API calls, but that is still not native and adds an extra step.
Following instructions from SO to use a .NET wrapper for Tesseract - this would probably work great, but after building it and trying to bring the external DLLs into Unity I get the error ".NET Assembly Not Found" (most likely an issue with the version of .NET the DLLs were compiled against).
Installing Tesseract from source on a server and then creating our own API - honestly, it's unclear why we tried this when Google's works so well and is actively maintained.
Has anyone run into this same problem in Unity and ultimately found a good solution?
Vuforia by itself doesn't provide any system for detecting numbers, just letters. To solve this problem I followed the strategy below (just for numbers near a known image):
Recognize the image.
Capture a screenshot just after the target image is recognized (this screenshot must contain the numbers).
Send the screenshot to an OCR web service and get the response.
Extract the numbers from the response.
Use these numbers to do whatever you need and show AR info.
This approach solves the problem, but it doesn't work like a charm. Its success depends on the quality of the screenshot and the OCR service.
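For steps 3 to 5, a rough prototype outside Unity could look like this (Python against Google's Cloud Vision REST endpoint as the OCR web service; the API key and file name are placeholders, and in the app itself this would be the equivalent HTTP request from C#):

    import base64
    import re
    import requests

    with open("screenshot.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    # Step 3: send the screenshot to the OCR web service.
    response = requests.post(
        "https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY",
        json={"requests": [{"image": {"content": image_b64},
                            "features": [{"type": "TEXT_DETECTION"}]}]},
    )
    annotations = response.json()["responses"][0].get("textAnnotations", [])
    full_text = annotations[0]["description"] if annotations else ""

    numbers = re.findall(r"\d+", full_text)  # step 4: extract the numbers
    print(numbers)                           # step 5: hand these to the AR overlay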
Local travel cards in Saint Petersburg, Russia have huge id numbers that aren't easy to read and type into a web page when topping up the card online. So I want to build a small app that takes a photo of a travel card and parses the number out.
The task is a bit easier than free-form recognition:
the card is of a very well-known size
the id numbers are of a known size, are located in a very well-known place on the card, and are digits only, no letters (okay, there are two variations I think, and maybe they will add one or two more in the future)
even the font is known in advance
even the first several digits are the same on most of the cards (so far only two prefixes are used)
How would you approach this? Are there any libraries tuned not for general OCR, but for the kind of "hinted" OCR I need?
Best regards,
Artem.
P.S.
Actually a free/cheap web service for this task would also be good enough
Yes, Google has an OCR library called Tesseract, and there is an iOS SDK on GitHub that you can import into your application. The SDK has documentation explaining how to set it up in your app, and it has methods that return a string with the text of the card. BUT it will be ALL of the text from the card. So the best thing to do would be to:
1 "clip" the original image to extract a sub image that displays only the portion of the card you wish to get the numbers from.
2 Process this sub image through Tesseract to retrieve the string you are looking for.
3 Then parse through the string and pick out the data that you need.
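For illustration, here are the same three steps sketched with the Python Tesseract wrapper (the iOS SDK exposes equivalent calls); the crop box, the digits-only whitelist, and the expected id length are assumptions about the card layout:

    import re
    from PIL import Image
    import pytesseract

    card = Image.open("card_photo.jpg")

    # 1. Clip the region of the card that contains the id number.
    number_region = card.crop((100, 400, 700, 470))  # (left, top, right, bottom)

    # 2. Run Tesseract on that sub-image, restricted to digits on one line.
    raw = pytesseract.image_to_string(
        number_region,
        config="--psm 7 -c tessedit_char_whitelist=0123456789",
    )

    # 3. Parse the string and pick out the number you need.
    match = re.search(r"\d{10,}", raw)
    print(match.group(0) if match else "no id number found")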
But just be warned, it can be a bit quirky. This SDK tends to recognize words best from images that are scanned, not photographed; although it is an advanced piece of technology, it isn't perfect. So to get it to work as well as possible for you, try to use scanned copies of the originals.
Best of luck.
The ideal solution for you would have three components:
1) Detection of the card. This is useful because, with detection, end users have a much easier time actually using the scanner, since they can hold the phone above the card in an arbitrary orientation.
2) An accurate OCR component, ideally customizable for the exact font on the card and the exact position on the card.
3) A parsing mechanism. This enables you to obtain the exact string written on the card without writing a huge amount of OCR post-processing code.
The BlinkID SDK has all of this. It has a preset for detecting cards in the ID-1 format, it has an integrated OCR engine, and it provides a RegexParser where you can define the exact format of the text you're trying to extract from the document.
BlinkID was initially built for scanning ID documents, which have very similar properties to the problem you're trying to solve.
Note: I'm one of the developers working on BlinkID.
I'm sorry, I need your help. I'm having trouble finding a unique technology (app, system, or tool) on the topic of CBIR. Do you have any ideas for unique apps that could be developed using CBIR? I'm completely in the dark and have no ideas about CBIR. I mean, I have searched for ideas about CBIR apps, but they are too ordinary, and my teacher asked me to find a more attractive idea for a CBIR app. An image search engine and an app to identify tourism objects are my ideas; do you have any others?
NB: Content-based image retrieval (CBIR), also known as query by image content (QBIC) and content-based visual information retrieval (CBVIR), is the application of computer vision techniques to the image retrieval problem, that is, the problem of searching for digital images in large databases (see this survey[1] for a recent scientific overview of the CBIR field). Content-based image retrieval is opposed to traditional concept-based approaches (see concept-based image indexing).
"Content-based" means that the search analyzes the contents of the image rather than the metadata such as keywords, tags, or descriptions associated with the image. The term "content" in this context might refer to colors, shapes, textures, or any other information that can be derived from the image itself.
For ordinary methods, you may use the LIRE library (https://github.com/dermotte/LIRE); there is also a demo site built with it (the LIRE demo).
But if you have enough time and enthusiasm, you should look into deep learning topics, which is where all the state-of-the-art work in the field is being done. For example, you may look at Karpathy's NeuralTalk on GitHub (https://github.com/karpathy/neuraltalk2) and its wonderful demo page.
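To give a feel for the "ordinary methods", here is a minimal CBIR sketch that ranks images by colour-histogram similarity to a query image (OpenCV in Python rather than LIRE's Java API; the file names are placeholders):

    import cv2

    def hsv_histogram(path):
        # Describe an image by a 2-D hue/saturation histogram.
        img = cv2.imread(path)
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        return cv2.normalize(hist, hist).flatten()

    query = hsv_histogram("query.jpg")
    database = ["beach.jpg", "forest.jpg", "city.jpg"]

    scores = {name: cv2.compareHist(query, hsv_histogram(name), cv2.HISTCMP_CORREL)
              for name in database}

    # Higher correlation means the image content is more similar to the query.
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(name, round(score, 3))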
I have been working with Carmen (http://carmen.sourceforge.net/) for a while now, and I really like the software, but I need to make some changes inside the source code.
I am therefore interested in any student reports/projects that have worked with Carmen, or any documentation of the source code.
I have been reading the documentation on the Carmen webpage, but with all due respect I think the literature there is a bit outdated and insufficient.
ROS is the new hot navigation toolkit for robotics. It has a professional development group and a very active community. The documentation is okay, but it's the best I've seen for robotic operating systems.
There are a lot of student project teams that are using it.
Check it out at www.ros.org
I'll be more specific about why ROS is awesome:
Built-in visualizer/simulator, rviz.
- It has a record function that records all of the messages published by nodes; this lets you capture a lot of raw data, store it in a "ros bag", and play it back later when you need to test your AI but want to sit in your bed.
Built-in navigation capabilities.
- All you have to do is write the publishers of data for your sensors (see the sketch at the end of this list).
- It has standard messages that you need to fill out so that the stack has enough information.
There is an Extended Kalman Filter, which is pretty awesome because I didn't want to write one. I'm currently implementing it; I'll let you know how that turns out.
It also has built-in message levels; by that I mean you can change which severity of print messages gets printed at runtime, which is fairly handy for debugging.
There's a robot monitor node that you can publish the status of your sensors to, and it bundles all of that information into a GUI for your viewing pleasure.
There are some basic drivers already written; for example, SICK lidars are supported right out of the box.
There is also a built-in transform system to help you move everything into the right coordinate frame.
ROS was made to run across multiple computers, but can work on just one.
Data transfer is handled over TCP ports.
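Here is the kind of sensor publisher I mean, as a minimal ROS 1 (rospy) sketch; the topic name, frame id, and readings are placeholders for whatever your real driver produces:

    #!/usr/bin/env python
    import rospy
    from sensor_msgs.msg import LaserScan

    def main():
        # Fill out the standard LaserScan message and publish it so the
        # navigation stack can consume it.
        rospy.init_node("my_lidar_driver")
        pub = rospy.Publisher("scan", LaserScan, queue_size=10)
        rate = rospy.Rate(10)  # publish at 10 Hz

        while not rospy.is_shutdown():
            msg = LaserScan()
            msg.header.stamp = rospy.Time.now()
            msg.header.frame_id = "laser_frame"
            msg.angle_min = -1.57
            msg.angle_max = 1.57
            msg.angle_increment = 3.14 / 180.0
            msg.range_min = 0.1
            msg.range_max = 10.0
            msg.ranges = [1.0] * 181  # dummy readings; a real driver reads the sensor
            pub.publish(msg)
            rate.sleep()

    if __name__ == "__main__":
        main()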
I hope that's more helpful.