Is there an annotation tool for instance segmentation on iPad?

Is there an annotation tool that produces multi-layer (for overlapping objects within the image) and pixel-exact image annotations on iPad?
Background to my question:
There are a lot of annotation tools for Linux and Windows
(e.g. the ones listed here: https://www.v7labs.com/blog/best-image-annotation-tools
or here: https://humansintheloop.org/10-of-the-best-open-source-annotation-tools-for-computer-vision-2021/)
I haven't tried all of them, but none of them seem to be available for the iPad.
I am using the iPad to make image annotations because it is faster for me to annotate with a stylus than with a mouse on the PC (also, I can do annotations when I am not in the office). Further, most annotation tools feel clunky and too overloaded with bureaucracy (this is only my subjective opinion).
I am currently using Adobe Fresco (sucks only because it's not open source and a little expensive), which works well in combination with a small script that I wrote to convert the .psd files into torch tensors.
My workflow with Fresco is fast and the annotations are very precise. However, I was bashed by a reviewer when submitting a paper mentioning that the annotations were produced with Fresco. The paper was rejected because the reviewer thought annotating images with Fresco was ridiculous and that there are supposedly much better alternatives (which he did not mention)... and which I am still too dumb to find. Any suggestions?

Related

Direct2D versus Direct3D for digital video rendering

I need to render video from multiple IP cameras into several controls within the client application.
On top of the video, I should be able to add some OSD such as timestamp and camera name.
What I'm trying to do has nothing to do with 3D since we're talking about digital video with some text on it.
Which API is more suitable for this purpose? Direct3D or Direct2D?
Performance should also be a consideration here.
It used to be that Direct2D was a poor choice for Windows Phone (if you care about that system) because it wasn't supported, but Win Phone 8.1 has it now, so less of an issue.
My experience with D2D was that it offered fast, high quality 2D rendering, and I would say it is a good choice.
You might want to take a look at this article on Code Project. That looks appropriate for your purposes.
If you are certain you only need MS system support, then you're all set.
Another way to go would be a cross-platform system like nanovg, which offers nice 2D rendering and would work on a Mac. Of course, you'd need to figure out how to do the video part on non-Windows systems.
Regarding D3D, you could certainly do it that way, but my guess would be it would make some things trickier to do. Don't forget you can combine the two as well...

Easiest way to display text in OpenGL ES 2.0

I'm creating a simple Breakout style game and would like a simple way to display the score.
I've been doing some research and found several ways to do text in OpenGL ES, but most methods look fairly complicated.
This looks like it would do the trick, but I couldn't get it to work.
I've looked into FTGL and FreeType, but they look complicated.
I've also read one can display a UILabel over the EAGLContext, but not sure how that would be in the performance department.
I could probably get any of these options to work, I'm just wondering what the best solution is for this situation.
For simple use cases like you're describing, on even vaguely modern hardware (i.e. iPhone 3GS and later, I think), the compositing penalty for layering UIKit/CoreAnimation content on top of OpenGL ES content is negligible. (You can see this if you run your app in Instruments with the "Color OpenGL ES fast path blue" option turned on.)
They say premature optimization is the root of all evil — it's pretty easy to try UILabel, see if it makes a significant difference to your app's performance, and look into third-party libraries and more complicated solutions only if it does.
(Also, it sounds like you might be trying to manage your own CAEAGLLayer. For common use cases, it's a lot easier to use GLKView, plus GLKViewController for animation.)
I'd recommend checking out the Print3D functionality of the PowerVR SDK's PVRTools framework. Print3D is free to use, cross-platform (iOS, Android, Linux, Windows, OS X etc.) and it efficiently renders text within OpenGL ES 1.x, 2.0 & 3.0 applications. The SDK includes an example application with source that demonstrates how to use the Print3D framework (IntroducingPrint3D).
The PowerVR Graphics SDK can be downloaded for free from Imagination's website: http://www.imgtec.com/powervr/insider/sdkdownloads/index.asp
An overview of the source included in the SDK can be found here: http://www.imgtec.com/powervr/insider/sdkdownloads/learn_more.asp

iOS graphics engines

I am new to iOS programming and am interested in working with images. Basically, I want to be able to obtain the RGB tuple (each channel in the range 0 to 255) of every pixel in a given image. What would be the best way of doing this? Would I need to use OpenGL, or something similar?
Thanks
If you want to work with images, get a copy of Apple's 'Quartz 2D Programming Guide'. If you want an even more detailed how-to, get a copy of the "Programming with Quartz" book on Amazon (it says Mac in the title as it predates iOS).
Essentially you are going to take images, draw them into bit map contexts, then determine the rgba layout by querying the image.
If you want to use system resources to assist you in making certain types of changes to images, there is an OS X framework, recently brought to iOS, called the Accelerate framework. It has a lot of functions for image manipulation (vImage).
For reading and writing images to the file system look at Apple's 'Image I/O Guide'. For advanced filtering there is Core Image, which allows you to apply filters to images.
EDIT: If you have any interest in really fast accelerated code that uses the GPU to perform some sophisticated filtering, you can check out Brad Larson's GPUImage project on GitHub.

3D library recommendations for interactive spatial data visualisation?

Our software produces a lot of data that is georeferenced and recorded over time. We are considering ways to improve the visualisation, and showing the (processed) data in a 3D view, given it's georeferenced, seems a good idea.
I am looking for SO's recommendations for what 3D libraries are best to use as a base when building this kind of visualisation in a Delphi- / C++Builder-based Windows application. I'll add a bounty when I can.
The data
Is recorded over time (hours to days) and is GPS-tagged. So, we have a lot of data following a path over time.
Is spatial: it represents real 3D elements of the earth, such as the land, or 3D elements of objects around the earth.
Is high volume: we could have a point cloud, say, of hundreds of thousands to millions of points. Processed data may display as surfaces created from these point clouds.
From that, you can see that an interactive, spatially-based 3D visualisation seems a good approach. I'm envisaging something where you can easily and quickly navigate around in space, and data will load or be generated on the fly depending on what you're looking at. I would prefer we don't try to write our own 3D library from scratch - for something like this, there have to be good existing libraries we can work from.
So, I'm hoping for a library which supports:
good navigation (is the library based on Euler rotations only, for example? Can you 'pick' objects to rotate around or move with easily?);
modern GPUs (shader-only rendering is ok; being able to hook into the pipeline to write shaders that map values to colours and change dynamically would be great - think data values given a colour through a colour lookup table);
dynamic data / objects (data can be added as it's recorded; and if the data volume is too high, we should be able to page things in and out or recreate them, and only show a sensible subset so that whatever the user's viewport is looking at is there onscreen, but other data can be loaded/regenerated, preferably asynchronously, or at least quickly as the user navigates. Obviously data creation is dependent on us, but a library that has hooks for this kind of thing would be great.)
and technologically, works with Delphi / C++Builder and the VCL.
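To make the colour lookup table idea concrete, here is a minimal CPU-side sketch in Python (not shader code, and not tied to any of the libraries discussed here). The function names `build_lut` and `colour_for` and the blue-green-red ramp are assumptions chosen purely for illustration; in a real renderer the table would typically live in a texture sampled by the shader.

```python
def build_lut(stops, size=256):
    """Linearly interpolate (position, (r, g, b)) stops into `size` colours.

    `stops` must cover positions 0.0 through 1.0.
    """
    stops = sorted(stops)
    lut = []
    for i in range(size):
        t = i / (size - 1)
        # Find the pair of neighbouring stops that brackets t.
        for (p0, c0), (p1, c1) in zip(stops, stops[1:]):
            if p0 <= t <= p1:
                f = (t - p0) / (p1 - p0) if p1 > p0 else 0.0
                lut.append(tuple(round(a + (b - a) * f) for a, b in zip(c0, c1)))
                break
    return lut


def colour_for(value, vmin, vmax, lut):
    """Normalise a data value into [0, 1], clamp it, and index the table."""
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))
    return lut[round(t * (len(lut) - 1))]


# Hypothetical ramp: blue (low) -> green (middle) -> red (high).
lut = build_lut([(0.0, (0, 0, 255)), (0.5, (0, 255, 0)), (1.0, (255, 0, 0))])
low = colour_for(0.0, vmin=0.0, vmax=10.0, lut=lut)    # bottom of the ramp
high = colour_for(25.0, vmin=0.0, vmax=10.0, lut=lut)  # clamped to the top
```

Changing the colours dynamically then amounts to swapping the table or adjusting `vmin`/`vmax`, without touching the data itself.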
Libraries
There are two main libraries I've considered so far - I'm looking for knowledgeable opinions about these, or for other libraries I haven't considered.
1. FireMonkey
This is Embarcadero's new UI library, which is only available in XE2 and above. Our app is based on the VCL and we'd want to host this in a VCL window; that seems to be officially unsupported but unofficially works fine, or is available through third-parties.
The mix of UI framework and 3D framework with shaders etc sounds great. But I don't know how complex the library is, what support it has for data that's not a simple object like a cube or sphere, and how well-designed it is. That last link has major criticisms of the 3D side of the library - severe enough I am not sure it's worthwhile in its current state at the time of writing for a non-trivial 3D app.
Is it worth trying to write a new visualisation window in our VCL app using FireMonkey?
2. GLScene
GLScene is a well-known 3D OpenGL framework for Delphi. I have never used it myself so have no experience about how it works or is designed. However, I believe it integrates well into VCL windows and supports shaders and modern GPUs. I do not know how its scene graph or navigation work or how well dynamic data can be implemented.
Its feature list specifically mentions some things I'm interested in, such as easy rotation/movement, procedural objects (implying dynamic data is easy to implement), and helper functions for picking. It seems shaders are Cg only (not GLSL or another non-vendor-specific language.) It also supports "polymorphic image support for texturing (allows many formats as well as procedural textures), easily extendable" - that may just mean many image formats, or it may indicate something where the texture can be dynamically changed, such as for dynamic colour mapping.
Where to from here?
These are the only two major 3D libraries I know of for Delphi or C++Builder. Have I missed any? Are there pros and cons I'm not aware of? Do you have any experience using either of these for this kind of purpose, and what pitfalls should we be wary of or features should we know about and use?
We currently use Embarcadero RAD Studio 2010 and most of our software is written in C++. We have small amounts of Delphi and may consider upgrading IDEs, but we are most likely to wait until the 64-bit C++ compiler is released. For that reason, a library that works in RS2010 might be best.
Thanks for your input :) I'm after high-quality answers, so I'll add a bounty when I can!
I have used GLScene in my 3D geomapping software and although it's not used to the extent you're looking for, I can vouch that it seems the most appropriate for what you're trying to do.
GLScene supports terrain rendering and adding customizable objects to the scene. Objects can be interacted with and you can create complex 3D models of objects using the various building blocks of GLScene.
Unfortunately I cannot state how it will work with millions of points, but I do know that it is quite optimized and performs great on minimal hardware - that being said - the target PC I found required a dedicated graphics card capable of using OpenGL 2.1 extensions or higher (I found small issues with integrated graphics cards).
The other library I looked at was DXscene, which appears quite similar to GLScene albeit using DirectX instead of OpenGL. From memory this was a commercial product, whereas GLScene was licensed under GPL. (EDIT - the page seems to be down at the moment: http://www.ksdev.com/index.html)
GLScene is still in active development and provides a fairly comprehensive library of functions, base objects and texturing etc. Things like rotation, translation, pitch, roll, turn, ray casting - to name a few - are all provided for you. Visibility culling is provided for each base object as well as viewing cameras, lighting and meshes. Base objects include cubes, spheres, pipes, tetrahedrons, cones, terrain, grids, 3d text, arrows to name a few.
Objects can be picked with the mouse and moved along 1, 2 or 3 axes. Helper functions are included to automatically calculate the top-most object under the mouse. Complex 3D shapes can be built by attaching base objects to other base objects in a hierarchical manner. So, for example, a car could be built using a rectangle as the base object and attaching four cylinders to it for the wheels. You can then manipulate the 'car' as a whole, since the four cylinders are attached to the base rectangle.
The only downside I could bring to your attention is the sometimes limited help/support available to you. Yes, there is a reference manual and a number of demo applications to show you how to do things such as select objects and move them around, however the reference manual is not complete and there is potential to get 'stuck' on how to accomplish a certain task. Forum support is somewhat limited/sparse. If you have a sound knowledge of 3D basics and concepts I'm sure you could nut it out.
As for Firemonkey - I have had no experience with this so I can't comment. I believe this is more targeted at mobile applications with lower hardware requirements so you may have issues with larger data sets.
Here are some other links that you may consider - I have no experience with them:
http://www.truevision3d.com/
http://www.3impact.com/
Game Development in Delphi
The last one is targeted at game development - but may provide useful information.
Have you tried glData? http://gldata.sourceforge.net/
It is old (~2004, Delphi 7), and I have not personally used the library, but some of the output is amazing.
You can use GLScene or OpenGL. They are good for 3D rendering and very easy to use.
Since you are already using georeferenced data, maybe you should consider embedding GoogleEarth in your Delphi application like this? Then you can add data to it as points, paths, or objects.

Open Source way to real-time image processing OCR application? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
I have an application in mind that I want to produce. We have wall-mounted schedule boards that are divided into small rectangles using black lines on a white background. Magnetic name tags are placed into a particular partition to indicate this person is to work in that cell. This system works very well for communication among people, but I would like an automatic way of saving this schedule information into a database.
I am envisioning a system where a camera is set in a fixed position focusing on the schedule board. Periodically the camera will take a picture of the board. I want to write some code to decipher which name tags are in which area. This would require some OCR or symbol recognition. There are big numbers on each name tag that I will use to identify the person whose name tag it is.
I naturally go to Python when tackling a new programming problem. I found this post -> python image recognition which looks like a good place to start (with PIL and numpy).
Do you know a good way to do this?
Update: I have tried SimpleCV and it seems good for now.
This is actually a pretty hard problem, even though it looks quite simple. But you can make it a lot easier by doing some stuff to your image to make this manageable. I have the following suggestions:
Try to make it so that your camera is looking straight at the board with a reasonable lens so that there is minimal distortion of the image on the edges, and no perspective distortion.
Given that you'll be shooting the occasional image for analysis I think performance is in no way an issue, so shoot high-resolution images, with a flash or with a long exposure time (because everything you're shooting is stationary) to get the best possible picture quality.
If the number of different tags you expect is not too large, you might find it easier to just try to match reference images of these tags in your image through template matching rather than going for full OCR of numbers. This is a lot easier to get working if your image is good enough. The Python OpenCV interface is very complete.
High Performance Mark has a good comment to your question about including barcodes on the tags. I would add the option of QR codes, but that is just the same thing. Both are easy to detect and there are good libraries to help you read them.
If you decide you do need OCR, you should look into available OCR packages and not try to roll your own. Try pytesser for the tesseract engine or the OCRopus python interface.
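To show what the template-matching suggestion above boils down to, here is a deliberately naive pure-Python sketch using the sum of squared differences. It is an illustration only: images are assumed to be lists of rows of grayscale values, and the helper name `match_template` is made up for this example. For real images you would use OpenCV's `cv2.matchTemplate` (for instance with the `cv2.TM_SQDIFF` method), which does the same kind of search far more efficiently.

```python
def match_template(image, template):
    """Slide `template` over `image` and return (row, col, score) of the best fit.

    The score is the sum of squared differences, so lower is better
    and 0 means the patch is identical to the template.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            score = sum(
                (image[r + dr][c + dc] - template[dr][dc]) ** 2
                for dr in range(th)
                for dc in range(tw)
            )
            if best is None or score < best[2]:
                best = (r, c, score)
    return best


# Toy 5x6 "board" with one 2x2 "tag" pattern whose top-left corner is at row 2, column 3.
template = [[255, 0],
            [0, 255]]
image = [[0] * 6 for _ in range(5)]
image[2][3] = 255
image[3][4] = 255

row, col, score = match_template(image, template)
```

With one reference image per tag, you would run this once per tag and accept a match only when the score is below a threshold you tune on real photos of the board.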
Since you mentioned that you would like to use Python for this problem, perhaps you could take a look at SimpleCV. It provides an easy way to grab the image from the camera and do basic image processing.
I strongly agree with jilles de witt that OCR would be an extremely hard image analysis task to develop from scratch. Code reading would be a better option, but that also will be difficult to program and will require sophisticated or somewhat challenging imaging as others have noted. However, for this app you really do not need to implement OCR or formal bar codes, QR or other 2d codes.
Since your application is constrained to a limited number of targets, perhaps you could make your own simple code. For example, you could place 0 to 4 big dots in a 2x2 array after each person's name. This simple example code distinguishes 16 tags (each of the four positions either has a dot or not, giving 2^4 patterns), and the features will be much easier to image, extract and decode than formal codes. Add a locator line if the code position is not consistent.
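As a rough sketch of the 2x2 dot code, assuming each grid position is treated as one bit (dot present = 1, absent = 0), encoding and decoding could look like the following. The function names and the bit ordering (top-left is the lowest bit) are arbitrary choices for illustration; detecting the dots in the photo is a separate step not shown here.

```python
def encode_tag(tag_id):
    """Return a 2x2 grid of 0/1 values (dot absent/present) for an id in 0..15."""
    if not 0 <= tag_id < 16:
        raise ValueError("a 2x2 dot code can only represent ids 0..15")
    bits = [(tag_id >> i) & 1 for i in range(4)]
    return [bits[0:2], bits[2:4]]


def decode_tag(grid):
    """Turn a detected 2x2 grid of 0/1 values back into the tag id."""
    bits = grid[0] + grid[1]
    return sum(bit << i for i, bit in enumerate(bits))


grid = encode_tag(5)       # dots at top-left and bottom-left
tag_id = decode_tag(grid)  # round-trips back to 5
```

If you need more than 16 people, a 3x3 array gives 512 patterns with the same approach.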

Resources