GeoJSON Coordinates? - geolocation

I have a GeoJSON file that I am trying to process in order to draw some features on top of Google Maps. The problem, however, is that the coordinates are not in the conventional latitude/longitude representation, but rather some large six- and seven-digit numbers. Example:
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": 0,
      "properties": {
        "OBJECTID": 1,
        "YR_BUILT": 1950.0
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [772796.724674999713898, 2960766.746374994516373],
            [772770.131549999117851, 2960764.944537505507469],
            [772794.544237494468689, 2960796.93857],
            [772810.48285000026226, 2960784.77685],
            [772796.724674999713898, 2960766.746374994516373]
          ]
        ]
      }
    },
    .....
  ]
}
I have been reading about the different coordinate systems, but being new to this I have not gotten anywhere. Any ideas?

If your coordinate source is in the United States, most likely the coordinate system is some variation of State Plane or UTM. Otherwise, it's some other coordinate system that works best for the country of origin. There are literally thousands of coordinate systems, and it can be difficult to guess which you have based on just the coordinates.
You'll need to find out from the data provider what the coordinate system is, and then use an API in your programming language of choice to reproject the points. proj4 is a popular one, with bindings in many languages, and it has a JavaScript port called proj4js.
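If you happen to be working in Python rather than JavaScript, pyproj (the Python binding for PROJ) does the same job in a few lines. A minimal sketch, assuming EPSG:2249 (Massachusetts State Plane, US feet) purely for illustration; substitute whatever EPSG code your data provider confirms:
from pyproj import Transformer

# EPSG:2249 is an assumed source CRS, used only as an example;
# always_xy=True keeps the (x, y) = (easting, northing) argument order
transformer = Transformer.from_crs("EPSG:2249", "EPSG:4326", always_xy=True)
lon, lat = transformer.transform(772796.724674999713898, 2960766.746374994516373)
print(lat, lon)  # conventional latitude/longitude, ready for Google Maps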

Related

Google Cloud ML: I want to recognize Korean license plates

I want to recognize Korean license plates, so I tried to predict South Korean license plates using Google Cloud ML.
But the prediction fails: Google Cloud ML does not recognize the Korean-language part.
How do I train it to recognize the Korean part?
The end goal is to save Korean license plates using OCR.
I found the answer myself: pass a languageHints entry of "ko" in the imageContext, and use the DOCUMENT_TEXT_DETECTION feature:
{
  "requests": [
    {
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ],
      "image": {
        "source": {
          "imageUri": "http://t1.daumcdn.net/liveboard/mrpic/fac5cab6a8bb4ea2b3447cc01bd8b097.JPG"
        }
      },
      "imageContext": {
        "languageHints": [
          "ko"
        ]
      }
    }
  ]
}
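For completeness, here is roughly how that request body can be sent to the Vision API's images:annotate endpoint from Python; the API key is a placeholder to replace with your own:
import json
import requests

API_KEY = "YOUR_API_KEY"  # placeholder credential
url = "https://vision.googleapis.com/v1/images:annotate?key=" + API_KEY

payload = {
    "requests": [{
        "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
        "image": {"source": {"imageUri": "http://t1.daumcdn.net/liveboard/mrpic/fac5cab6a8bb4ea2b3447cc01bd8b097.JPG"}},
        "imageContext": {"languageHints": ["ko"]}
    }]
}

response = requests.post(url, json=payload)
print(json.dumps(response.json(), indent=2))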

NetLogo: create small-world network while running

I'm trying to generate a small-world type of network (https://en.wikipedia.org/wiki/Small-world_network) in my NetLogo model, where the network is created throughout the run itself; people get to know one another while the model is running.
I know how to generate a small-world network in NetLogo during setup. But how do you generate a small-world network on the go?
My code for generating a small world during setup is as follows.
breed [interlinks interlink]   ; links between different breeds
breed [intralinks intralink]   ; links between same breeds

to set_sw_network
  ask turtles [
    let max-who 1 + max [who] of turtles
    let sorted sort ([who] of turtles)
    foreach sorted [ x ->
      ask turtle x [
        let i 1
        repeat same_degree + dif_degree [
          ifelse [breed] of self = [breed] of turtle ((x + i) mod max-who)
            [ create-intralink-with turtle ((x + i) mod max-who) ]
            [ create-interlink-with turtle ((x + i) mod max-who) ]
          set i i + 1
        ]
      ]
    ]
    repeat round (rewire_prop * number_of_members) [ ; rewire_prop is a slider 0 - 1 with steps of 0.1
      ask one-of turtles [
        ask one-of my-links [ die ]
        create-intralink-with one-of other turtles with [ link-with self = nobody ]
      ]
    ]
  ]
end
But I am not interested in creating a small world at the beginning; I'm interested in creating a network with small-world properties throughout the run. Currently, I do have this on-the-go link-creation feature in my model, but I'm not sure how to tweak it so that it results in a small-world type of network:
to select_interaction
  ; omitted code: sorts pre-existing links and interacts with them
  if count my-links < my_degree [
    repeat number_of_interactions_per_meeting [
      let a select_turtle  ; delivers a turtle with link to self = nobody
      if a != nobody [
        ifelse [breed] of a = [breed] of myself
          [ create-intralink-with a [
              set color cyan
              interact
            ]
          ]
          [ create-interlink-with a [
              set color orange + 2
              interact
            ]
          ]
      ]
    ]
  ]
end
At the moment, my strategy is to give every turtle a my_degree variable based on the degree distribution of the given social network. But the question remains: if this is a good strategy at all, then what is the correct degree distribution for a small-world network?
pseudo-code for this strategy:
to setup-turtles
  If preferential attachment: set my_degree random-poisson 'mean'
  If small world: set my_degree ????? 'mean'
end
Any insight would be wonderful.

Splitting complex PDF files using Watson Document Conversion Service

We are implementing a question-answering system using the Watson Discovery Service (WDS). We require each answer unit to be available as a single document. We have complex PDF files as our corpus; the PDF files contain two-column data, tables, and images. Instead of ingesting the whole PDF files into WDS and using passage retrieval, we are using the Watson Document Conversion service (WDC) to split each PDF file into answer units, and we later ingest these answer units into WDS.
We are facing two issues with the Watson Document Conversion service when splitting complex PDFs.
1. We expect each heading to become a title and the corresponding text to become the data (the answer). However, the service splits each chapter into a single answer unit. Is there any way to split a two-column document based on its headings?
2. When the input PDF contains a table, the Document Conversion service reads the structured data in the PDF as plain text, losing the table formatting. Is there any way to read structured data from a PDF into an answer unit?
I would recommend that you first convert your PDF to normalized HTML by using this setting:
"conversion_target": "normalized_html"
and inspect the generated HTML. Look for the places where headings (<h1>, <h2>, ..., <h6>) are detected. Those are the tags that will be used to split into answer units when you switch back to answer_units.
The reason you are currently seeing each chapter split out as a single answer unit is that each chapter probably starts with a heading, but no headings are detected within the chapters.
In order to generate more answer units, you will need to tweak the PDF input configuration as described here, so that more headings are produced by the PDF-to-HTML conversion step and hence more answer units are generated.
For example, the following configuration will detect headings at 6 different levels, based on certain font characteristics for each level:
{
  "conversion_target": "normalized_html",
  "pdf": {
    "heading": {
      "fonts": [
        { "level": 1, "min_size": 24 },
        { "level": 2, "min_size": 18, "max_size": 23, "bold": true },
        { "level": 3, "min_size": 14, "max_size": 17, "italic": false },
        { "level": 4, "min_size": 12, "max_size": 13, "name": "Times New Roman" },
        { "level": 5, "min_size": 10, "max_size": 12, "bold": true },
        { "level": 6, "min_size": 9, "max_size": 10, "bold": true }
      ]
    }
  }
}
You can start with a configuration like this and keep tweaking it until the produced normalized HTML contains headings at the places where you expect answer units to begin. Then take the tweaked configuration, switch back to answer_units, and put it all together:
{
  "conversion_target": "answer_units",
  "answer_units": {
    "selector_tags": ["h1", "h2", "h3", "h4", "h5", "h6"]
  },
  "pdf": {
    "heading": {
      "fonts": [
        { "level": 1, "min_size": 24 },
        { "level": 2, "min_size": 18, "max_size": 23, "bold": true },
        { "level": 3, "min_size": 14, "max_size": 17, "italic": false },
        { "level": 4, "min_size": 12, "max_size": 13, "name": "Times New Roman" },
        { "level": 5, "min_size": 10, "max_size": 12, "bold": true },
        { "level": 6, "min_size": 9, "max_size": 10, "bold": true }
      ]
    }
  }
}
Regarding your second question about tables: unfortunately, there is no way to convert table content into answer units. As explained above, answer-unit generation is based on heading detection. That said, if a table sits between two detected headings, it will be included in that answer unit, like any other content between the two headings.
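If it helps to see the pieces together, below is a rough Python sketch of posting such a configuration to the service. The endpoint URL, version date, credentials, and multipart field names are assumptions based on the old Document Conversion REST docs, so verify them against your service instance:
import json
import requests

# assumed endpoint and version date for the classic Document Conversion service
url = ("https://gateway.watsonplatform.net/document-conversion/api"
       "/v1/convert_document?version=2015-12-15")

config = {
    "conversion_target": "answer_units",
    "answer_units": {"selector_tags": ["h1", "h2", "h3", "h4", "h5", "h6"]},
    # heading font rules trimmed for brevity; use the full configuration from above
    "pdf": {"heading": {"fonts": [{"level": 1, "min_size": 24}]}},
}

with open("sample.pdf", "rb") as pdf:
    response = requests.post(
        url,
        auth=("username", "password"),  # placeholder service credentials
        files={
            "config": (None, json.dumps(config), "application/json"),
            "file": ("sample.pdf", pdf, "application/pdf"),
        },
    )
print(response.json())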

Creating a specific path in NetLogo for turtles to follow

I am creating a NetLogo model of a zoo. I need my zoo guests (multiple turtles) to follow a circular pathway that starts at the zoo entrance every 24 ticks (1 tick is 1 hour in my model). The path has to go around the cages that hold the animals, because my guests cannot enter the animal areas. The path doesn't have to be fast or the shortest; I just need the turtles not to stray from it. I would prefer not to use GIS to create the pathway.
My world's dimensions are -30 to 30 in both directions, and the world does not wrap.
The locations of the cages are set up as follows:
patches-own [
  tigerhabitat?
  flamingohabitat?
  monkeyhabitat?
  hippohabitat?
  giraffehabitat?
]

to create-habitats
  ask patches with [ pxcor < -12 and pycor > 23 ] [
    set tigerhabitat? true
    set pcolor green
  ]
  ask patches with [ pxcor > 20 and pycor > 20 ] [
    set hippohabitat? true
    set pcolor blue
  ]
  ask patches with [ pxcor > 18 and pycor < 15 and -1 < pycor ] [
    set flamingohabitat? true
    set pcolor 96
  ]
  ask patches with [ pxcor > -10 and pxcor < 10 and pycor < 10 and -10 < pycor ] [
    set monkeyhabitat? true
    set pcolor green
  ]
  ask patches with [ pxcor < -12 and pycor < -20 ] [
    set giraffehabitat? true
    set pcolor 67
  ]
end
Paula, from your comment I think I understand a little better, thanks. One simple way to control where turtles can move is to use logical operators to exclude patches that they "consider" as they walk along. For a basic (non-path, yet) version of what you want, you could tell turtles that they can only move onto patches that are not cages. You could set up a patch variable that explicitly says whether a patch is caged, but in your example above all non-cage patches are black, so you can use that instead: tell turtles that they should only step onto a patch if it is black. For example, you could add the procedures below to your code:
to setup
  ca
  reset-ticks
  crt 10 [ setxy -25 0 ]
  create-habitats
end

to go
  exclude-cage-walk
  tick
end

to exclude-cage-walk
  ask turtles [
    rt random 30 - 15
    let target one-of patches in-cone 1.5 180 with [ pcolor = black ]
    if target != nobody [
      face target
      move-to target
    ]
  ]
end
You can see that before moving, each turtle assesses whether the patch it has chosen to move-to is black; if it is not, the turtle will not move there. Of course, you would have to modify this to suit your needs and have the turtles walk a one-directional circuit, but it is a simple way to constrain turtle movement.

PYTHON: Memory Error - MultinomialNB.partial_fit() - 17k classes

Hi, I am new to Python, scikit-learn, and ML in general. I'm encountering a MemoryError when using MultinomialNB's partial_fit while trying to do multi-label classification on the DMOZ directory data.
My questions:
What am I doing wrong? Is it my lack of memory, or is the data wrong?
Am I using the right approach?
Is there anything I can do to improve my approach?
Approach:
Store the DMOZ directories in MongoDB/TokuMX:
{
  "_id": {
    "$oid": "54e758c91d41c804d8ace196"
  },
  "docs": [
    {
      "url": "http://www.awn.com/",
      "description": "Provides information resources to the international animation community. Features include searchable database archives, monthly magazine, web animation guide, the Animation Village, discussion forums and other useful resources.",
      "title": "Animation World Network"
    }
  ],
  "labels": [
    "Top",
    "Arts",
    "Animation"
  ]
}
Iterate over the docs array and pass its elements into my classifier function.
Vectorizer and Classifier
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

classifier = MultinomialNB()
vectorizer = HashingVectorizer(
    stop_words='english',
    strip_accents='unicode',
    norm='l2'
)
My classifier function
import lxml.html
import nltk
import requests

def classify(doc, labels, classifier, vectorizer, *args):
    r = requests.get(doc['url'], verify=False)
    print "Retrieving URL = {0}\n".format(doc['url'])
    if r.status_code == 200:
        html = lxml.html.fromstring(r.text)
        doc['content'] = []
        tags = ['font', 'td', 'h1', 'h2', 'h3', 'p', 'title']
        for tag in tags:
            for x in html.xpath('//' + tag):
                try:
                    bag_of_words = nltk.word_tokenize(x.text_content())
                    pos_tagged = nltk.pos_tag(bag_of_words)
                    for word, pos in pos_tagged:
                        if pos[:2] == 'NN':
                            doc['content'].append(word)
                except AttributeError as e:
                    print e
        x_train = vectorizer.fit_transform(doc['content'])
        # if we are the first one to run partial_fit, pass all classes
        if len(args) == 1:
            classifier.partial_fit(x_train, labels, classes=args[0])
        else:
            classifier.partial_fit(x_train, labels)
    return doc
X: doc['content'] is an array of nouns (about 600 entries).
Y: labels is the array of labels from the Mongo document shown above (3 entries).
Classes: args[0] is an array of all the unique labels in the database (17,490 entries).
Running inside VirtualBox on a quad-core laptop with 4 GB of RAM assigned to the VM.
What are the 17,490 unique labels? There will be one coefficient for each label and each feature, which is likely where your memory error comes from.
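To put a number on that: MultinomialNB keeps float64 arrays of shape (n_classes, n_features), and HashingVectorizer defaults to 2**20 features, so with 17,490 classes a single such array is on the order of 147 GB, far beyond the 4 GB in your VM. A minimal sketch of the arithmetic and one possible mitigation, shrinking the hashed feature space (the 2**14 value is illustrative, not tuned; alternate_sign is available in recent scikit-learn versions):
from sklearn.feature_extraction.text import HashingVectorizer

n_classes = 17490
default_features = 2 ** 20   # HashingVectorizer default
reduced_features = 2 ** 14   # illustrative value, not tuned

# rough size of one (n_classes, n_features) float64 array kept by MultinomialNB
print(n_classes * default_features * 8 / 1e9)   # ~147 GB: cannot fit in 4 GB
print(n_classes * reduced_features * 8 / 1e9)   # ~2.3 GB: tight, but closer

vectorizer = HashingVectorizer(
    n_features=reduced_features,
    stop_words='english',
    strip_accents='unicode',
    norm='l2',
    alternate_sign=False,   # keep hashed counts non-negative for MultinomialNB
)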
