Google Cloud Vision API Object Localization - number of recognized objects

I'm using the following Python code to detect license plates with the Cloud Vision API:
from google.cloud import vision

client = vision.ImageAnnotatorClient()

# uri points at the image to analyze (e.g. a gs:// or https:// URI).
response = client.annotate_image({
    'image': {'source': {'image_uri': uri}},
    'features': [
        {'max_results': 1000, 'type_': vision.Feature.Type.TEXT_DETECTION},
        {'max_results': 1000, 'type_': vision.Feature.Type.OBJECT_LOCALIZATION},
    ],
})

lo_annotations = response.localized_object_annotations
for obj in lo_annotations:
    print('\n{} (confidence: {})'.format(obj.name, obj.score))
    print('Normalized bounding polygon vertices: ')
    for vertex in obj.bounding_poly.normalized_vertices:
        print(' - ({}, {})'.format(vertex.x, vertex.y))
If I use an image showing several cars, buildings, persons etc., I get about 4-7 recognized objects. The recognized objects are the bigger ones in the scene, like "Car", "Car", "Building", "Building", "Person".
If I snip out just one car from this image and run Object Localization on the new image, I get objects like "Car", "Tire", "Tire", "License plate", which is perfect, because the plate gets recognized and listed.
So it seems the Object Localization algorithm picks out some prominent objects from the image and ignores smaller or less prominent ones.
But in my case I need to localize all license plates in the image. Is there a way to get the underlying model to list all license plates in the image, or at least more objects than just the most prominent ones?
Otherwise, what would be the right approach to get all plates out of an image? Do I have to train a custom model?

Vision API is a pre-trained image detection service provided by Google that performs basic image detection tasks such as detecting text, objects, etc., hence the behavior you have observed, where the API usually detects only the prominent objects in the image.
What I can suggest is: if the objects in your images usually appear in a specific area of the image (e.g. objects appear in the lower half), you can pre-process the image by cropping it with Python libraries like PIL or OpenCV before using Vision API to detect license plates. Alternatively, detect the objects, get the coordinates per object, use those coordinates to crop each object out, and then run Vision API on each crop to detect license plates.
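A minimal sketch of that two-pass idea, assuming the object names "Car" and "License plate" reported in the question (error handling and plate de-duplication omitted):

from io import BytesIO

from google.cloud import vision
from PIL import Image

client = vision.ImageAnnotatorClient()

def find_plates(image_path):
    # First pass: localize the prominent objects in the full image.
    with open(image_path, 'rb') as f:
        content = f.read()
    objects = client.object_localization(
        image=vision.Image(content=content)).localized_object_annotations

    pil_img = Image.open(image_path).convert('RGB')
    width, height = pil_img.size

    plates = []
    for obj in objects:
        if obj.name != 'Car':
            continue
        # Convert normalized vertices to pixel coordinates and crop the car.
        xs = [v.x * width for v in obj.bounding_poly.normalized_vertices]
        ys = [v.y * height for v in obj.bounding_poly.normalized_vertices]
        crop = pil_img.crop((int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))))

        # Second pass: run Object Localization on the cropped car only.
        buf = BytesIO()
        crop.save(buf, format='JPEG')
        crop_objects = client.object_localization(
            image=vision.Image(content=buf.getvalue())).localized_object_annotations
        plates.extend(o for o in crop_objects if o.name == 'License plate')
    return plates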
And also, as you mentioned, you can always create a custom model to detect license plates as an alternative if you are not satisfied with the results of Vision API. With a custom model, you have more freedom to tweak it and increase its accuracy at detecting license plates.

Related

Graph-based representation for land borders

I'm trying to get the 2D vectors from a set of countries. I've built my graph by the following process (see the picture):
each node represents a country
each edge represents the land border between 2 countries (or nodes)
I'm using the Node2vec library to manage it, but the results are not relevant.
import networkx as nx
from node2vec import Node2Vec

countries = [
    "France", "Andorra",
    "Spain", "Italy", "Switzerland",
    "Germany", "Portugal"
]
crossing_borders = [
    ("France", "Andorra"),
    ("France", "Spain"),
    ("Andorra", "Spain"),
    ("France", "Italy"),
    ("France", "Switzerland"),
    ("Italy", "Switzerland"),
    ("Switzerland", "Italy"),  # duplicate of the previous edge; ignored by nx.Graph
    ("Switzerland", "Germany"),
    ("France", "Germany"),
    ("Spain", "Portugal")
]

graph = nx.Graph()
graph.add_nodes_from(countries)
graph.add_edges_from(crossing_borders)

# Generate walks
node2vec = Node2Vec(graph, dimensions=2, walk_length=2, num_walks=50)
# Learn embeddings
model = node2vec.fit(window=1)
I would like countries that share a land border to end up close to each other. As shown below, Spain is too far from France. I only considered direct borders, which is why walk_length=2.
Do you have any idea that would fit my problem?
If I understand correctly, Node2Vec is based on Word2Vec, and thus, like Word2Vec, requires a large amount of varied training data, and shows useful results when learning dense high-dimensional vectors per entity.
A mere 7 'words' (country-nodes) with a mere 10 'sentences' of 2 words each (edge-pairs) thus isn't especially likely to do anything useful. (It wouldn't in Word2Vec.)
These countries literally are regions on a sphere. A sphere's surface can be mapped to a 2-D plane - hence, 'maps'. If you just want a 2-D vector for each country that reflects their relative border/distance relationships, why not just lay your 2-D coordinates over an actual map large enough to show all the countries, and treat each country as its 'geographical center' point?
Or more formally: translate the x-longitude/y-latitude of each country's geographical center into whatever origin-point/scale you need.
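For instance (the centers below are rough approximations, hard-coded purely for illustration):

# Approximate (longitude, latitude) geographical centers; illustrative only.
centers = {
    "France": (2.21, 46.23), "Andorra": (1.52, 42.51),
    "Spain": (-3.75, 40.46), "Italy": (12.57, 41.87),
    "Switzerland": (8.23, 46.82), "Germany": (10.45, 51.17),
    "Portugal": (-8.22, 39.40),
}

# Shift into whatever origin/scale you need, e.g. a non-negative frame.
min_x = min(x for x, _ in centers.values())
min_y = min(y for _, y in centers.values())
vectors = {c: (x - min_x, y - min_y) for c, (x, y) in centers.items()}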
If this simple, physically-grounded approach is inadequate, then being explicit about why it's inadequate might suggest next steps. Something that's an incremental transformation of those starting points to meet whatever extra constraints you want may be the best solution.
For example, if your not-yet-stated formal goal is that "every country-pair with an actual border should be closer than any country-pair without a border", then you could write code to check that, list any deviations, and try to 'nudge' the deviations to be more compliant with that constraint. (It might not be satisfiable; I'm not sure. And if you added other constraints, like "any country pair with just 1 country between them should be closer than any country pair with 2 countries between them", satisfying them all at once could become harder.)
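A sketch of such a constraint check, reusing the hypothetical vectors dict from above and countries/crossing_borders from the question:

import math
from itertools import combinations

def dist(a, b):
    return math.dist(vectors[a], vectors[b])

bordered = {frozenset(p) for p in crossing_borders}
border_dists = [dist(a, b) for a, b in crossing_borders]
non_border_dists = [dist(a, b) for a, b in combinations(countries, 2)
                    if frozenset((a, b)) not in bordered]

# Every bordered pair should be closer than any non-bordered pair.
if max(border_dists) < min(non_border_dists):
    print('constraint satisfied')
else:
    print('violations exist; nudge the worst offenders and re-check')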
Ultimately, next steps may depend on exactly why you want these per-country vectors.
Another thing worth checking out might be the algorithms behind 'force-directed graphs'. There, after specifying a graph's desired edges/edge-lengths and some other parameters, a physics-inspired simulation arrives at a 2-D layout that tries to satisfy the inputs. See, for example, from the JS world:
https://github.com/d3/d3-force
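In Python, networkx ships a basic force-directed (Fruchterman-Reingold) layout, so a quick experiment on the graph built above might look like:

import networkx as nx

# spring_layout is networkx's force-directed layout; seed makes it repeatable.
pos = nx.spring_layout(graph, dim=2, seed=42)
for country, xy in pos.items():
    print(country, xy)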

What is the difference between data association and feature matching in SLAM/VO?

I have read a little bit about it and saw, for instance, the terms used interchangeably, or that feature matching is part of data association. In "An Overview to Visual Odometry and Visual SLAM: Applications to Mobile Robotics" by Yousif et al. it is said that "…feature matching is the process of individually extracting features and matching them over multiple frames", but also that "DA is defined as the process of associating a measurement (or feature) to its corresponding previously extracted feature.", which separates them from each other. Other things I read weren't that clear, but mostly seem to indicate that feature matching is part of DA. I'm a little bit confused.
Data association methods are the ones where we choose how to find the transformations between two images. They are as follows:
Feature points
Image patches around the features (semi-dense / semi-direct methods)
Pixel to pixel (direct / optical-flow-based methods)
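For illustration, a minimal feature-point matching sketch with OpenCV (the frame file names are hypothetical):

import cv2

img1 = cv2.imread('frame1.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('frame2.png', cv2.IMREAD_GRAYSCALE)

# Detect ORB keypoints and compute binary descriptors in both frames.
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force Hamming matching with cross-checking for one-to-one matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(len(matches), 'putative correspondences')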

Can I get some 3D model file(s) from 3D ultrasound?

Does anyone know if it is possible to get a model file from the doctor when he makes a 3D ultrasound of a pregnant woman? I mean something like a DICOM (.dcm) file or an .stl file, or something like that, which I can then work with and finally print with a 3D printer.
Thanks a lot.
A quick search for "dicom 3d ultrasound sample" resulted in one that you might be able to use for internal testing. You can get the file from here.
Hello,
The first problem you will face is the file format.
Because of the way the images are generated, 3D ultrasound data has voxels that are expressed in a spherical coordinate system. DICOM (as it stands now) only supports voxels in a Cartesian system.
So the manufacturers have a few choices:
They can save the data in a proprietary format (e.g. Kretzfile for GE, MVL for Samsung).
They can save the data in private tags inside a DICOM file (GE, Hitachi, Philips).
They can re-format the voxels to be Cartesian, but then the data has been transformed, and nobody likes that. And anyway, since they also need to save the original (untransformed) data, the companies that do offer Cartesian voxels usually save them the same way as the original, so they are not saved in normal DICOM tags but in their proprietary version.
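For reference, the geometric core of that Cartesian re-formatting step is just a spherical-to-Cartesian conversion; a sketch (axis conventions vary by vendor, so treat this as illustrative):

import numpy as np

# r = depth along the beam, theta/phi = the two sweep angles.
def spherical_to_cartesian(r, theta, phi):
    x = r * np.sin(theta) * np.cos(phi)
    y = r * np.sin(theta) * np.sin(phi)
    z = r * np.cos(theta)
    return x, y, z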
This is why most of the standard software that can do 3D from CT or MR will not be able to cope with these data files.
The second problem is the noise. Ultrasound datasets are inherently very noisy! Again, standard 3D reconstruction software was designed for CT or MR and has problems with this.
I do have a product that will read most of the 3D ultrasound files and create an STL model directly from the datasets (spherical or Cartesian). It is called baby SliceO (http://www.tomovision.com/products/baby_sliceo.html).
Unfortunately, it is not free, but you can try it without any license. Give it a try and let me know if you like it...
Yves

How to get nearby city or state name of a geopoint in water in ios?

I am developing a location-based application in which I need to get the nearby location name of any geopoint selected by the user. I'm using the Google Places API, which is working fine for me.
The only problem is that the service returns null for geopoints in water. Is there any way I can retrieve nearby locations for a geopoint in water or the ocean?
AFAIK the API has no way to do that.
So you've got two options, in order of the effort they take:
1. When the user taps water, just throw a dialog saying "Please select a point on land". Next to no effort, and it will slightly annoy the user.
2. Try to find the closest land geopoint yourself and run the API request on it (instead of the original point). Below are some ideas on that.
A good approach can be based on this answer: basically, you can get a KML file with land polygons. For performance reasons, you can simplify the polygons to the extent that makes sense for your zoom levels. Now, if your point is not inside any of those polygons, it's sea. You can then iterate over all polygon edges and pick the one that's closest to your point, pick the point on that edge closest to your point, and take one little epsilon-sized step towards the inside of the polygon to get a land point you can run a geocode request on. The original author also suggests you can use the Haversine formula to determine the nearest land point; I'm not really familiar with the application of that one.
The downside is, you have to deal with KML, iterate over a lot of polygons and simplify them (losing precision in the process, in addition to possible differences between the marineregions.org data and the Google Places data).
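A sketch of that nearest-land-point computation with shapely (the square polygon is a stand-in for real coastline data):

from shapely.geometry import Point, Polygon
from shapely.ops import nearest_points

land = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])  # hypothetical land polygon
tap = Point(15, 5)  # the user's tap, out at sea

if not land.contains(tap):
    # Nearest point on the land polygon's boundary...
    on_edge = nearest_points(land, tap)[0]
    # ...nudged an epsilon towards the interior (here: towards the centroid,
    # a simplification that works for convex-ish polygons).
    eps = 1e-6
    dx = land.centroid.x - on_edge.x
    dy = land.centroid.y - on_edge.y
    norm = (dx * dx + dy * dy) ** 0.5
    land_point = Point(on_edge.x + eps * dx / norm, on_edge.y + eps * dy / norm)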
Another cool trick you could try is running a Sobel filter (edge detection) on the visible map fragment to determine where the coastline is (although you will get some false positives), then tracing it (as in raster->vector) to get some points and edges to calculate the closest land position with, in a manner similar to the former approach. Here's a clumsy drawing of the idea.
For Sobel edge detection, consider GPUImage lib -- they have the filter implemented and it's probably going to work crazy fast since the lib does all the calculations on GPU.
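To illustrate the filtering step in Python with OpenCV (on iOS you would use GPUImage as suggested; 'map_tile.png' is a hypothetical screenshot of the visible map fragment):

import cv2
import numpy as np

img = cv2.imread('map_tile.png', cv2.IMREAD_GRAYSCALE)
# Horizontal and vertical Sobel gradients, combined into an edge magnitude.
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
edges = cv2.convertScaleAbs(np.hypot(gx, gy))  # rough coastline mask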
UPD: It turns out there's also a service called Koordinates that has coastline data available; check the answer here.

Game Terrain Database Model

I am developing a game for the web. The map of this game will be a minimum of 2000km by 2000km. I want to be able to encode elevation and terrain type at some level of granularity - 100m x 100m, for example.
For a 2000km by 2000km map, storing this information in 100m x 100m buckets would mean 20,000 by 20,000 elements, or a total of 400,000,000 records in a database.
Is there some other way of storing this type of information?
MORE INFORMATION
The map itself will not ever be displayed in its entirety. Units will be moved on the map in a turn based fashion and the players will get feedback on where they are located and what the local area looks like. Terrain will dictate speed and prohibition of movement.
I guess I am trying to say that the map will be used for the game and not necessarily for graphical or display purposes.
It depends on how you want to generate your terrain.
For example, you could procedurally generate all of it (using interpolation of a low-resolution terrain/height map - stored as two "bitmaps" - with the random interpolation seeded from the x,y coordinates to ensure the terrain doesn't morph between visits), and use minimal storage. A sketch of this seeded-interpolation idea follows below.
If you wanted areas of terrain that were completely defined, you could store these separately and use them where appropriate, randomly generating the rest.
If you want completely defined terrain, then you're going to need to look into some kind of compression/streaming technique to pull in only the terrain you are currently interested in.
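A minimal sketch of the seeded-interpolation idea, assuming a 100m low-resolution grid (all constants are illustrative):

import random

CELL = 100.0  # metres per low-resolution grid cell

def corner_height(ix, iy):
    # Deterministic "random" height for a grid corner: the integer seed is
    # derived from the coordinates, so the same corner always gets the same
    # height and the terrain never morphs.
    return random.Random(ix * 73856093 ^ iy * 19349663).uniform(0.0, 1000.0)

def height_at(x, y):
    ix, iy = int(x // CELL), int(y // CELL)
    fx, fy = (x % CELL) / CELL, (y % CELL) / CELL
    # Bilinear interpolation between the four surrounding corners.
    h00, h10 = corner_height(ix, iy), corner_height(ix + 1, iy)
    h01, h11 = corner_height(ix, iy + 1), corner_height(ix + 1, iy + 1)
    top = h00 + (h10 - h00) * fx
    bottom = h01 + (h11 - h01) * fx
    return top + (bottom - top) * fy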
I would treat it differently, by separating terrain type and elevation.
Terrain type, I assume, does not change as rapidly as elevation - there are probably sectors of the same type of terrain that stretch over areas much larger than the lowest level of granularity. I would map those sectors into database records or some kind of hash table, depending on performance, memory and other requirements.
Elevation, I would assume, is semi-continuous, as it changes gradually for the most part. I would try to map the values onto sets of continuous functions (different sets between parts that are not continuous, as in a sudden change in elevation). For any set of coordinates over which the terrain is at the same elevation or can be described by a simple function, you just need to define the range the function covers. This should greatly reduce the amount of information you need to record to describe the elevation at each point of the terrain.
So basically, I would break the map down into sectors composed of (x,y) ranges, once for terrain type and once for terrain elevation, and build a hash table for each that can return the appropriate value as needed.
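A toy sketch of the sector lookup (the sector bounds and types are hypothetical):

terrain_sectors = {
    # (x_min, y_min, x_max, y_max) in metres -> terrain type
    (0, 0, 50_000, 80_000): 'plains',
    (50_000, 0, 120_000, 80_000): 'forest',
}

def terrain_at(x, y):
    for (x0, y0, x1, y1), kind in terrain_sectors.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return kind
    return 'unknown'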
If you want the kind of granularity you are looking for, there is no obvious way around storing a lot of data.
You could try a 2-dimensional wavelet transform, but that's pretty complex. Something like a Fourier transform would do quite nicely. Also, you probably wouldn't store the terrain one-record-per-piece-of-land; it makes more sense to have some sort of database field that can store an encoded matrix.
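A sketch of the transform idea with a 2-D FFT (the random array stands in for a real heightmap):

import numpy as np

heights = np.random.rand(1024, 1024)  # stand-in heightmap

coeffs = np.fft.fft2(heights)
# Keep only the strongest 1% of coefficients; store those sparsely.
threshold = np.percentile(np.abs(coeffs), 99)
coeffs[np.abs(coeffs) < threshold] = 0

approx = np.fft.ifft2(coeffs).real  # lossy reconstruction on demand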
I think the usual solution is to break your domain up into "tiles" of a manageable size. You'll have to add a little bit of logic to load the appropriate tiles at any given time, but it's not too bad.
You shouldn't need to access all that info at once -- even if each 100m x 100m bucket occupied a single pixel on the screen, no screen I know of could show 20k x 20k pixels at once.
Also, I wouldn't use a database -- look into height mapping -- effectively using a black & white image whose pixel values represent heights.
Good luck!
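Combining the two suggestions, a sketch of tile-based heightmap lookup with PIL (the tile size and file naming scheme are made up):

from PIL import Image

TILE_CELLS = 1000  # each grayscale tile covers 1000 x 1000 of the 100m cells

def height_at(cell_x, cell_y):
    tile = Image.open('tiles/%d_%d.png' % (cell_x // TILE_CELLS, cell_y // TILE_CELLS))
    # Pixel value (0-255) encodes the height of that 100m x 100m cell.
    return tile.getpixel((cell_x % TILE_CELLS, cell_y % TILE_CELLS))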
That will be an awful lot of information no matter which way you look at it. 400,000,000 grid cells will take their toll.
I see two ways of getting around this. Firstly, since it is a web-based game, you might be able to get a server with a decently sized HDD and store the 400M records in it just as you normally would, or, more likely, create some storage mechanism of your own for efficiency. Then you would only have to devise a way to access the data efficiently, which could be done by exploiting the fact that you will almost certainly not need to use it all at once. ;)
The other way would be some kind of compression. You have to be careful with this, though. Most out-of-the-box compression algorithms won't let you decompress an arbitrary location in the stream. Perhaps your terrain data has some patterns you can use? I doubt it is completely random. More likely, I predict large areas with the same data. Perhaps those can be encoded as such?
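One simple way to keep arbitrary locations reachable is to compress each tile separately, e.g. with zlib, so a lookup only decompresses one small block (a sketch; the tile layout is hypothetical):

import zlib

def compress_tiles(tiles):
    # tiles: dict mapping (tile_x, tile_y) -> raw bytes for that tile
    return {key: zlib.compress(raw) for key, raw in tiles.items()}

def read_cell(compressed, tile_key, offset):
    # Decompress just the one tile that contains the cell of interest.
    raw = zlib.decompress(compressed[tile_key])
    return raw[offset]  # e.g. one byte of elevation per cell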
