Google Cloud ML: I want to recognize Korean license plates - machine-learning

I want to recognize Korean license plates, so I tried to read South Korean plates using Google Cloud ML. But the prediction fails: Google Cloud ML does not recognize the Korean-language part of the plate.
How do I train it to recognize the Korean part?
The final goal is to save Korean license plates using OCR.

I found the answer myself. Sending a DOCUMENT_TEXT_DETECTION request to the Vision API with a Korean language hint works:
{
  "requests": [
    {
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ],
      "image": {
        "source": {
          "imageUri": "http://t1.daumcdn.net/liveboard/mrpic/fac5cab6a8bb4ea2b3447cc01bd8b097.JPG"
        }
      },
      "imageContext": {
        "languageHints": [
          "ko"
        ]
      }
    }
  ]
}
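For reference, the same request through the google-cloud-vision Python client would look roughly like this (a minimal sketch, assuming the client library is installed and credentials are already configured):

from google.cloud import vision

client = vision.ImageAnnotatorClient()

image = vision.Image()
image.source.image_uri = (
    "http://t1.daumcdn.net/liveboard/mrpic/fac5cab6a8bb4ea2b3447cc01bd8b097.JPG"
)

# The "ko" language hint is what lets the API read the Hangul part of the plate.
response = client.document_text_detection(
    image=image,
    image_context={"language_hints": ["ko"]},
)
print(response.full_text_annotation.text)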

Related

Getting explainable predictions from Vertex AI for custom trained model

I've created a custom Docker container to deploy my model on Vertex AI (the model uses LightGBM, so I can't use the pre-built containers Vertex AI provides for TF/SKL/XGBoost).
I'm getting errors while trying to get explainable predictions from the model (I deployed the model, and normal predictions work just fine). I have gone through the official Vertex AI guides for getting predictions and explanations, and I have tried different ways of configuring the explanation parameters and metadata, but it's still not working. The errors are not very informative; this is what they look like:
400 {"error": "Unable to explain the requested instance(s) because: Invalid response from prediction server - the response field predictions is missing. Response: {'error': '400 Bad Request: The browser (or proxy) sent a request that this server could not understand.'}"}
This notebook from the Vertex AI samples provided by Google has examples of how to configure the explanation parameters and metadata for models trained with different frameworks. I'm trying something similar.
https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/ml_ops/stage4/get_started_with_vertex_xai.ipynb
The model is a classifier that takes tabular input with 5 features (2 string, 3 numeric), and the output of model.predict() is 0/1 for each input instance. My custom container returns predictions in this format:
# Input for prediction
raw_input = request.get_json()
instances = raw_input["instances"]
df = pd.DataFrame(instances, columns=["A", "B", "C", "D", "E"])
# Prediction from model
predictions = model.predict(df).tolist()
response = jsonify({"predictions": predictions})
return response
This is how I am configuring the explanation parameters and metadata for the model:
# Explanation parameters
PARAMETERS = {"sampled_shapley_attribution": {"path_count": 10}}
exp_parameters = aip.explain.ExplanationParameters(PARAMETERS)

# Explanation metadata (this is probably the part that is causing the errors)
COLUMNS = ["A", "B", "C", "D", "E"]
exp_metadata = aip.explain.ExplanationMetadata(
    inputs={
        "features": {"index_feature_mapping": COLUMNS, "encoding": "BAG_OF_FEATURES"}
    },
    outputs={"predictions": {}},
)
For getting predictions and explanations, I tried the following format, among others:
instance_1 = {"A": <value>, "B": <>, "C": <>, "D": <>, "E": <>}
instance_2 = {"A": <value>, "B": <>, "C": <>, "D": <>, "E": <>}
inputs = [instance_1, instance_2]
predictions = endpoint.predict(instances=inputs) # Works fine
explanations = endpoint.explain(instances=inputs) # Returns error
Could you please suggest how to correctly configure the explanation metadata, or how to provide input to the explain API in the right format, so that I can get explanations from Vertex AI? I have tried many different formats, but nothing has worked so far. :(

NetLogo: create a small-world network while running

I'm trying to generate a small-world type of network (https://en.wikipedia.org/wiki/Small-world_network) in my NetLogo model, where the network is built while the model itself runs; people get to know one another as the model proceeds.
I know how to generate a small-world network in NetLogo during setup. But how do you generate a small-world network on the go?
My code for generating a small world during setup is as follows.
undirected-link-breed [interlinks interlink]   ; links between different breeds
undirected-link-breed [intralinks intralink]   ; links between same breeds

to set_sw_network
  ask turtles [
    let max-who 1 + max [who] of turtles
    let sorted sort ([who] of turtles)
    foreach sorted [ x ->
      ask turtle x [
        let i 1
        repeat same_degree + dif_degree [
          ifelse [breed] of self = [breed] of turtle ((x + i) mod max-who)
            [ create-intralink-with turtle ((x + i) mod max-who) ]
            [ create-interlink-with turtle ((x + i) mod max-who) ]
          set i i + 1
        ]
      ]
    ]
    repeat round (rewire_prop * number_of_members) [   ; rewire_prop is a slider, 0 - 1 with steps of 0.1
      ask one-of turtles [
        ask one-of my-links [ die ]
        create-intralink-with one-of other turtles with [link-with myself = nobody]
      ]
    ]
  ]
end
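For comparison, this ring-lattice-plus-rewiring construction is what networkx calls a Watts-Strogatz graph. Here is a minimal Python sketch (the parameter values are illustrative, not taken from my model) that also prints the two quantities that characterise a small world:

import networkx as nx

n = 100   # number of turtles
k = 4     # links to the k nearest ring neighbours (same_degree + dif_degree)
p = 0.1   # rewiring probability (rewire_prop)

# Retry until the rewired graph is connected, so path lengths are defined.
G = nx.connected_watts_strogatz_graph(n, k, p, tries=100)

# Small-world signature: clustering stays high while the average
# shortest-path length stays low, close to that of a random graph.
print(nx.average_clustering(G))
print(nx.average_shortest_path_length(G))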
But I am not interested in creating a small world at the beginning; I'm interested in growing a network with small-world properties while the model runs. Currently I do have this on-the-go link-creation feature in my model, but I'm not sure how to tweak it so that it results in a small-world type of network:
to select_interaction
  ; omitted code: sorts pre-existing links and interacts with them
  if count my-links < my_degree [
    repeat number_of_interactions_per_meeting [
      let a select_turtle   ; delivers a turtle with no existing link to the caller
      if a != nobody [
        ifelse [breed] of a = [breed] of myself
          [ create-intralink-with a [
              set color cyan
              interact
            ] ]
          [ create-interlink-with a [
              set color orange + 2
              interact
            ] ]
      ]
    ]
  ]
end
At the moment, my strategy is to give every turtle a my_degree variable based on the degree distribution of the given social network. But the question remains: if this is a good strategy at all, what is the correct distribution for a small-world network?
Pseudo-code for this strategy:
to setup-turtles
  if preferential attachment: set my_degree random-poisson 'mean'
  if small world: set my_degree ????? 'mean'
end
Any insight would be wonderful.

Splitting complex PDF files using Watson Document Conversion Service

We are implementing a question-answering system using the Watson Discovery Service (WDS). We need each answer unit to be available as a single document. Our corpus consists of complex PDF files containing two-column layouts, tables, and images. Instead of ingesting the whole PDF files into WDS and relying on passage retrieval, we are using the Watson Document Conversion Service (WDC) to split each PDF file into answer units, which we then ingest into WDS.
We are facing two issues with the Watson Document Conversion Service when splitting complex PDFs.
We expect each heading to become a title and the corresponding text to become the data (answer). However, the service splits each chapter into a single answer unit. Is there any way to split a two-column document based on its headings?
When the input PDF contains a table, the conversion service reads the structured data as plain text, losing the table formatting. Is there any way to carry structured data from the PDF into an answer unit?
I would recommend that you first convert your PDF to normalized HTML by using this setting:
"conversion_target": "normalized_html"
and inspect the generated HTML. Look for the places where headings (<h1>, <h2>, ..., <h6>) are detected; those are the tags that will be used to split into answer units when you switch back to answer_units.
The reason you are currently seeing each chapter split off as a single answer unit is that each chapter probably starts with a heading, but no headings are detected within the chapter.
In order to generate more answer units, you will need to tweak the PDF input configuration as described here, so that more headings are produced by the PDF-to-HTML conversion step and hence more answer units are generated.
For example, the following configuration will detect headings at 6 different levels, based on certain font characteristics for each level:
{
  "conversion_target": "normalized_html",
  "pdf": {
    "heading": {
      "fonts": [
        {"level": 1, "min_size": 24},
        {"level": 2, "min_size": 18, "max_size": 23, "bold": true},
        {"level": 3, "min_size": 14, "max_size": 17, "italic": false},
        {"level": 4, "min_size": 12, "max_size": 13, "name": "Times New Roman"},
        {"level": 5, "min_size": 10, "max_size": 12, "bold": true},
        {"level": 6, "min_size": 9, "max_size": 10, "bold": true}
      ]
    }
  }
}
You can start with a configuration like this and keep tweaking it until the produced normalized HTML contains headings at the places where you expect answer units to begin. Then take the tweaked configuration, switch back to answer_units, and put it all together:
{
  "conversion_target": "answer_units",
  "answer_units": {
    "selector_tags": ["h1", "h2", "h3", "h4", "h5", "h6"]
  },
  "pdf": {
    "heading": {
      "fonts": [
        {"level": 1, "min_size": 24},
        {"level": 2, "min_size": 18, "max_size": 23, "bold": true},
        {"level": 3, "min_size": 14, "max_size": 17, "italic": false},
        {"level": 4, "min_size": 12, "max_size": 13, "name": "Times New Roman"},
        {"level": 5, "min_size": 10, "max_size": 12, "bold": true},
        {"level": 6, "min_size": 9, "max_size": 10, "bold": true}
      ]
    }
  }
}
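If you are driving the service programmatically, a minimal sketch with the legacy watson-developer-cloud Python SDK might look like the following (the credentials and version date are placeholders, and the exact call signature may vary between SDK releases):

import json

from watson_developer_cloud import DocumentConversionV1

# Placeholder credentials and version date.
document_conversion = DocumentConversionV1(
    username='YOUR_USERNAME',
    password='YOUR_PASSWORD',
    version='2016-02-10',
)

# config.json holds the answer_units configuration shown above.
with open('config.json') as f:
    config = json.load(f)

with open('corpus.pdf', 'rb') as pdf:
    answer_units = document_conversion.convert_document(
        document=pdf,
        config=config,
        media_type='application/pdf',
    )

print(json.dumps(answer_units, indent=2))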
Regarding your second question about tables: unfortunately, there is no way to convert table content into answer units. As explained above, answer-unit generation is based on heading detection. That being said, if a table appears between two detected headings, it will be included in the answer unit like any other content between those headings.

PYTHON: Memory Error - MultinomialNB.partial_fit() - 17k classes

Hi, I am new to Python, scikit-learn, and ML in general. I'm encountering a MemoryError when using MultinomialNB's partial_fit while trying to do multi-label classification on the DMOZ directory data.
My questions:
What am I doing wrong? Is it my lack of memory, or is the data wrong?
Am I using the right approach?
Is there anything I can do to improve my approach?
Approach:
Store DMOZ DB directories into MongoDB/TokuMX
{
  "_id": {
    "$oid": "54e758c91d41c804d8ace196"
  },
  "docs": [
    {
      "url": "http://www.awn.com/",
      "description": "Provides information resources to the international animation community. Features include searchable database archives, monthly magazine, web animation guide, the Animation Village, discussion forums and other useful resources.",
      "title": "Animation World Network"
    }
  ],
  "labels": [
    "Top",
    "Arts",
    "Animation"
  ]
}
Iterate over the docs array and pass its elements into my classifier function.
Vectorizer and Classifier
classifier = MultinomialNB()
vectorizer = HashingVectorizer(
    stop_words='english',
    strip_accents='unicode',
    norm='l2'
)
My classifier function
import lxml.html
import nltk
import requests

def classify(doc, labels, classifier, vectorizer, *args):
    r = requests.get(doc['url'], verify=False)
    print("Retrieving URL = {0}\n".format(doc['url']))
    if r.status_code == 200:
        html = lxml.html.fromstring(r.text)
        doc['content'] = []
        tags = ['font', 'td', 'h1', 'h2', 'h3', 'p', 'title']
        for tag in tags:
            for x in html.xpath('//' + tag):
                try:
                    # Keep only the nouns from each element's text
                    bag_of_words = nltk.word_tokenize(x.text_content())
                    pos_tagged = nltk.pos_tag(bag_of_words)
                    for word, pos in pos_tagged:
                        if pos[:2] == 'NN':
                            doc['content'].append(word)
                except AttributeError as e:
                    print(e)
        x_train = vectorizer.fit_transform(doc['content'])
        # If we are the first one to run partial_fit, pass all classes
        if len(args) == 1:
            classifier.partial_fit(x_train, labels, classes=args[0])
        else:
            classifier.partial_fit(x_train, labels)
    return doc
X: doc['content'] consists of an array of nouns (about 600).
Y: labels consists of an array of the labels inside the Mongo document shown above (3).
Classes: args[0] consists of an array of all the (unique) labels in the database (17490).
Running inside VirtualBox on a quad-core laptop with 4 GB of RAM assigned to the VM.
What are the 17490 unique labels? There will be one coefficient for each label and each feature, which is likely where your memory error comes from.
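To put a number on it: HashingVectorizer defaults to n_features = 2**20, and MultinomialNB keeps dense float64 arrays of shape (n_classes, n_features), so one such array alone needs roughly:

n_classes = 17490        # unique DMOZ labels
n_features = 2 ** 20     # HashingVectorizer default
bytes_per_value = 8      # float64

gib = n_classes * n_features * bytes_per_value / 2 ** 30
print(round(gib, 1))     # ~136.6 GiB for a single dense array

Since the classifier stores more than one array of that shape (feature counts and feature log-probabilities), a 4 GB VM cannot hold the model no matter how the data is streamed in; you would need to drastically reduce the number of classes, the number of hashed features, or both.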

GeoJSON Coordinates?

I have a GeoJSON file that I am trying to process in order to draw some features on top of Google Maps. The problem, however, is that the coordinates are not in the conventional latitude/longitude representation, but rather some large six- or seven-digit numbers. Example:
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": 0,
      "properties": {
        "OBJECTID": 1,
        "YR_BUILT": 1950.0
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [772796.724674999713898, 2960766.746374994516373],
            [772770.131549999117851, 2960764.944537505507469],
            [772794.544237494468689, 2960796.93857],
            [772810.48285000026226, 2960784.77685],
            [772796.724674999713898, 2960766.746374994516373]
          ]
        ]
      }
    },
    .....
  ]
}
I have been reading about the different coordinate systems, but being new to this I have not gotten anywhere. Ideas?
If your coordinate source is in the United States, the coordinate system is most likely some variation of State Plane or UTM. Otherwise, it's some other coordinate system that works best for the country of origin. There are literally thousands of coordinate systems, and it can be difficult to guess which one you have from the coordinates alone.
You'll need to find out from the data provider what the coordinate system is, and then use a library in your programming language of choice to reproject the points. proj4 is a popular one, with bindings in many languages, and it has a JavaScript port called proj4js.
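Once the provider tells you the source system, reprojecting in Python with pyproj takes only a few lines. A minimal sketch (EPSG:2263, New York State Plane Long Island, is purely an illustrative guess; substitute the code your provider specifies):

from pyproj import Transformer

# EPSG:2263 is only an example; replace it with the coordinate system
# your data provider names. EPSG:4326 is ordinary WGS84 lat/lon.
transformer = Transformer.from_crs("EPSG:2263", "EPSG:4326", always_xy=True)

x, y = 772796.724674999713898, 2960766.746374994516373
lon, lat = transformer.transform(x, y)
print(lat, lon)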
