I have already made a GeoJSON file (called output):
{"type": "FeatureCollection", "features": [ {"type": "Feature", "geometry": {"type": "Point", "coordinates": [103.815381, 1.279109]}, "properties": {"temperature": 24, "marker-symbol": "park", "marker-color": "#AF4646"}}, {"type": "Feature", "geometry": {"type": "MultiLineString", "coordinates": [[[103.809297, 1.294906], [103.799445, 1.283906], [103.815381, 1.294906]]]}, "properties": {"temperature": 24, "stroke": "#AF4646"}}]}
It contains a MultiLineString and a Point. The expected output should look like this (visualised using geojson.io), where all the properties (e.g. the colour of the line and the marker, the forest icon on the marker) are kept:
My goal is to generate an HTML or an image file (whichever works best) of this map, so I turned to folium. However, when I use the following commands:
m = folium.Map(location=[1.2791,103.8154], zoom_start=12)
folium.GeoJson(output, name='test').add_to(m)
m.save('map.html')
The visualisation looks like this:
All the property information has been wiped out. Is there any way to keep that property information? Thanks.
The provided GeoJSON (output) contains styling properties defined in the simplestyle spec, which are not supported by Leaflet's L.geoJSON.
The leaflet-simplestyle plugin, which extends L.geoJSON to support the simplestyle spec, could be utilized instead. Here is an example of how to use it in folium:
import folium
from folium.elements import JSCSSMixin
from folium.map import Layer
from jinja2 import Template
class StyledGeoJson(JSCSSMixin, Layer):
    """
    Creates a GeoJson layer which supports the simplestyle spec.
    """
    _template = Template(u"""
        {% macro script(this, kwargs) %}
            var {{ this.get_name() }} = L.geoJson({{ this.data }},
                {
                    useSimpleStyle: true,
                    useMakiMarkers: true
                }
            ).addTo({{ this._parent.get_name() }});
        {% endmacro %}
        """)

    default_js = [
        ('leaflet-simplestyle', 'https://unpkg.com/leaflet-simplestyle'),
    ]

    def __init__(self, data, name=None, overlay=True, control=True, show=True):
        super(StyledGeoJson, self).__init__(name=name, overlay=overlay,
                                            control=control, show=show)
        self._name = 'StyledGeoJson'
        self.data = data
Usage
output = {"type": "FeatureCollection", "features": [ {"type": "Feature", "geometry": {"type": "Point", "coordinates": [103.815381, 1.279109]}, "properties": {"temperature": 24, "marker-symbol": "park", "marker-color": "#AF4646"}}, {"type": "Feature", "geometry": {"type": "MultiLineString", "coordinates": [[[103.809297, 1.294906], [103.799445, 1.283906], [103.815381, 1.294906]]]}, "properties": {"temperature": 24, "stroke": "#AF4646"}}]}
m = folium.Map(location=[1.2791,103.8154], zoom_start=14)
StyledGeoJson(output).add_to(m)
m
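Outside a notebook, the map can still be written to an HTML file with m.save('map.html'), exactly as in the snippet from the question.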
Result
Related
I want to convert a GRIB2 file to a GeoJSON with the following format:
{
"type": "FeatureCollection",
"features": [
{ "type": "Feature", "properties": { "ID": 0, "sigwaveht": 1.000000 }, "geometry": { "type": "LineString", "coordinates": [ [ 20.5, 77.559374979743737 ], [ 20.756756711040964, 77.5 ], [ 21.0, 77.426829270065582 ], [ 21.5, 77.426829270065582 ] ] } },
{ "type": "Feature", "properties": { "ID": 1, "sigwaveht": 1.000000 }, "geometry": { "type": "LineString", "coordinates": [ [ 17.5, 76.879518074163784 ], [ 18.0, 76.840000001907356 ], [ 18.555555592348554, 77.0 ], [ 18.555555592348554, 77.5 ] ] } },
{ "type": "Feature", "properties": { "ID": 2, "sigwaveht": 1.000000 }, "geometry": { "type": "LineString", "coordinates": [ [ 28.5, 76.732142838136269 ], [ 29.0, 76.634146323734484 ], [ 29.937500058207661, 77.0 ], [ 29.937500058207661, 77.5 ] ] } },
I can accomplish this by using ogr2ogr to convert a shapefile to a GeoJSON in this format, but what can I do to convert a GRIB2 to a GeoJSON of this format?
You can't convert a GRIB, which is a raster format, to GeoJSON, which is a vector format.
What do you expect to achieve? Vector data composed of points where each point is one of the pixels of the raster format?
If this is what you want, you will probably have to code it yourself; I don't think there are any standard tools for this. Just loop over the raster data pixels and write one point feature for every pixel, as in the sketch below.
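For illustration only, here is a minimal sketch of that loop, assuming the pygrib library and a single-message GRIB2 file; the sigwaveht property name comes from the question, the file names are made up:
import json
import pygrib  # assumed dependency: pip install pygrib

grbs = pygrib.open("waves.grb2")   # hypothetical input file
grb = grbs[1]                      # first GRIB message
values = grb.values                # 2-D array of pixel values
lats, lons = grb.latlons()         # 2-D arrays of coordinates

features = []
for i in range(values.shape[0]):
    for j in range(values.shape[1]):
        features.append({
            "type": "Feature",
            "properties": {"sigwaveht": float(values[i, j])},
            "geometry": {"type": "Point",
                         "coordinates": [float(lons[i, j]), float(lats[i, j])]},
        })

with open("waves.geojson", "w") as f:
    json.dump({"type": "FeatureCollection", "features": features}, f)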
I tried different things with the following web UI:
https://schema-registry-ui.landoop.com
I couldn't seem to put the following into the registry:
{
"namespace": "test.avro",
"type": "record",
"name": "test",
"fields": [
{
"name": "field1",
"type": "string"
},
{
"name": "field2",
"type": "record",
"fields":[
{"name": "field1", "type": "string" },
{"name": "field2", "type": "string"},
{"name": "intField", "type": "int"}
]
}
]
}
Also, is there a way to refer to another schema from inside the current one to create a compound/nested schema?
Have a look at the example at
https://github.com/Landoop/schema-registry-ui/issues/43
You need to define the schema as an array, with the nested record as the 1st element and the main Avro record as the 2nd element (see the sketch below).
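For illustration, a sketch of that array form using the field names from the question; the nested record name field2_record is made up:
[
  {
    "namespace": "test.avro",
    "type": "record",
    "name": "field2_record",
    "fields": [
      {"name": "field1", "type": "string"},
      {"name": "field2", "type": "string"},
      {"name": "intField", "type": "int"}
    ]
  },
  {
    "namespace": "test.avro",
    "type": "record",
    "name": "test",
    "fields": [
      {"name": "field1", "type": "string"},
      {"name": "field2", "type": "field2_record"}
    ]
  }
]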
Edit: recreated the logic on jsfiddle https://jsfiddle.net/exLtcgrq/1/
I am trying to parse a simple GeoJSON file into D3 using the D3 v4 API.
My GeoJSON is simple:
{ "type": "FeatureCollection",
"features": [
{ "type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [
[10.0, 10.0], [60.0, 40.0], [50.0, 75.0],[20.0, 60.0]
]
},
"properties": {
"id": "1",
"Type": "campingspot"
}
},
{ "type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [
[20.0, 65.0], [50.0, 80.0], [50.0, 110.0],[20.0, 115.0]
]
},
"properties": {
"id": "1",
"Type": "campingspot"
}
}
]
}
I load this using the d3.json() method and try using the d3-geo API to convert it to paths with this code:
var jsonData2 = d3.json("campingGeojson.json", function(error, json){
svg.selectAll("path")
.data(json.features)
.enter()
.append("path")
.attr("d", d3.geoPath())
.attr("stroke", "black")
.attr("stroke-width", 1)
.attr("fill", "green")
});
The console output in Chrome tells me the following:
Error: <path> attribute d: Expected number, "M,ZM,ZM,ZM,Z".
Any suggestions on what is going wrong with using the geoPath method are highly appreciated.
Thank you.
Coordinates for GeoJSON polygons are an array of coordinate arrays (with the coordinates themselves being arrays). The first array is the exterior ring (shell); any following arrays are holes.
So I think your GeoJSON should look more like:
"coordinates": [
[ [10.0, 10.0], [60.0, 40.0], [50.0, 75.0],[20.0, 60.0] ]
]
I'm dealing with server logs which are in JSON format, and I want to store my logs on AWS S3 in Parquet format (and Parquet requires an Avro schema). First, all logs have a common set of fields; second, logs have many optional fields which are not in the common set.
For example, the following are three logs:
{ "ip": "172.18.80.109", "timestamp": "2015-09-17T23:00:18.313Z", "message":"blahblahblah"}
{ "ip": "172.18.80.112", "timestamp": "2015-09-17T23:00:08.297Z", "message":"blahblahblah", "microseconds": 223}
{ "ip": "172.18.80.113", "timestamp": "2015-09-17T23:00:08.299Z", "message":"blahblahblah", "thread":"http-apr-8080-exec-1147"}
All three logs have 3 shared fields: ip, timestamp and message. Some of the logs have additional fields, such as microseconds and thread.
If I use the following schema then I will lose all the additional fields:
{"namespace": "example.avro",
"type": "record",
"name": "Log",
"fields": [
{"name": "ip", "type": "string"},
{"name": "timestamp", "type": "String"},
{"name": "message", "type": "string"}
]
}
And the following schema works fine:
{"namespace": "example.avro",
"type": "record",
"name": "Log",
"fields": [
{"name": "ip", "type": "string"},
{"name": "timestamp", "type": "String"},
{"name": "message", "type": "string"},
{"name": "microseconds", "type": [null,long]},
{"name": "thread", "type": [null,string]}
]
}
But the only problem is that I don't know all the names of the optional fields unless I scan all the logs; besides, there will be new additional fields in the future.
Then I thought of an idea that combines record and map:
{"namespace": "example.avro",
"type": "record",
"name": "Log",
"fields": [
{"name": "ip", "type": "string"},
{"name": "timestamp", "type": "String"},
{"name": "message", "type": "string"},
{"type": "map", "values": "string"} // error
]
}
Unfortunately this won't compile:
java -jar avro-tools-1.7.7.jar compile schema example.avro .
It throws the following error:
Exception in thread "main" org.apache.avro.SchemaParseException: No field name: {"type":"map","values":"long"}
at org.apache.avro.Schema.getRequiredText(Schema.java:1305)
at org.apache.avro.Schema.parse(Schema.java:1192)
at org.apache.avro.Schema$Parser.parse(Schema.java:965)
at org.apache.avro.Schema$Parser.parse(Schema.java:932)
at org.apache.avro.tool.SpecificCompilerTool.run(SpecificCompilerTool.java:73)
at org.apache.avro.tool.Main.run(Main.java:84)
at org.apache.avro.tool.Main.main(Main.java:73)
Is there a way to store JSON strings in Avro format that is flexible enough to deal with unknown optional fields?
Basically this is a schema evolution problem; Spark can deal with it via schema merging, but I'm seeking a solution with Hadoop.
The map type is a "complex" type in Avro terminology. The snippet below works:
{
"namespace": "example.avro",
"type": "record",
"name": "Log",
"fields": [
{"name": "ip", "type": "string"},
{"name": "timestamp", "type": "string"},
{"name": "message", "type": "string"},
{"name": "additional", "type": {"type": "map", "values": "string"}}
]
}
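As a quick illustration (not part of the original answer), here is a minimal sketch using the fastavro library, where every field outside the common set is pushed into the additional map; the output file name is made up:
import fastavro

schema = fastavro.parse_schema({
    "namespace": "example.avro", "type": "record", "name": "Log",
    "fields": [
        {"name": "ip", "type": "string"},
        {"name": "timestamp", "type": "string"},
        {"name": "message", "type": "string"},
        {"name": "additional", "type": {"type": "map", "values": "string"}},
    ],
})

known = {"ip", "timestamp", "message"}
logs = [
    {"ip": "172.18.80.109", "timestamp": "2015-09-17T23:00:18.313Z",
     "message": "blahblahblah"},
    {"ip": "172.18.80.112", "timestamp": "2015-09-17T23:00:08.297Z",
     "message": "blahblahblah", "microseconds": 223},
]

records = []
for log in logs:
    record = {k: v for k, v in log.items() if k in known}
    # anything not in the common set goes into the map, stringified
    record["additional"] = {k: str(v) for k, v in log.items() if k not in known}
    records.append(record)

with open("logs.avro", "wb") as out:
    fastavro.writer(out, schema, records)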
I have two questions:
Is it possible to use the same reader to parse records that were written with two schemas that are compatible? For example, Schema V2 has only an additional optional field compared to Schema V1, and I want the reader to understand both. I think the answer here is no, but if yes, how do I do that?
I have tried writing a record with Schema V1 and reading it with Schema V2, but I get the following error:
org.apache.avro.AvroTypeException: Found foo, expecting foo
I used avro-1.7.3 and:
writer = new GenericDatumWriter<GenericData.Record>(SchemaV1);
reader = new GenericDatumReader<GenericData.Record>(SchemaV2, SchemaV1);
Here are examples of the two schemas (I have tried adding a namespace as well, but no luck).
Schema V1:
{
"name": "foo",
"type": "record",
"fields": [{
"name": "products",
"type": {
"type": "array",
"items": {
"name": "product",
"type": "record",
"fields": [{
"name": "a1",
"type": "string"
}, {
"name": "a2",
"type": {"type": "fixed", "name": "a3", "size": 1}
}, {
"name": "a4",
"type": "int"
}, {
"name": "a5",
"type": "int"
}]
}
}
}]
}
Schema V2:
{
"name": "foo",
"type": "record",
"fields": [{
"name": "products",
"type": {
"type": "array",
"items": {
"name": "product",
"type": "record",
"fields": [{
"name": "a1",
"type": "string"
}, {
"name": "a2",
"type": {"type": "fixed", "name": "a3", "size": 1}
}, {
"name": "a4",
"type": "int"
}, {
"name": "a5",
"type": "int"
}]
}
}
},
{
"name": "purchases",
"type": ["null",{
"type": "array",
"items": {
"name": "purchase",
"type": "record",
"fields": [{
"name": "a1",
"type": "int"
}, {
"name": "a2",
"type": "int"
}]
}
}]
}]
}
Thanks in advance.
I encountered the same issue. That might be a bug in Avro, but you can probably work around it by adding "default": null to the "purchases" field.
Check my blog for details: http://ben-tech.blogspot.com/2013/05/avro-schema-evolution.html
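For illustration, this is roughly what the "purchases" field from Schema V2 would look like with the default added (my own sketch, not taken from the linked blog):
{
  "name": "purchases",
  "type": ["null", {
    "type": "array",
    "items": {
      "name": "purchase",
      "type": "record",
      "fields": [
        {"name": "a1", "type": "int"},
        {"name": "a2", "type": "int"}
      ]
    }
  }],
  "default": null
}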
You can also do the opposite: write the data with Schema V2 and read it with Schema V1. At write time all the fields are written to the file, and if the reader does not ask for some of them, reading still works. But if you write with fewer fields than the reader expects, the reader will not find the extra fields at read time, so it will give an error (a sketch of this is below).
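A minimal sketch of that resolution behaviour (my own illustration, using the fastavro library and toy schemas rather than the ones from the question):
import io
import fastavro

v1 = fastavro.parse_schema({
    "name": "foo", "type": "record",
    "fields": [{"name": "a1", "type": "string"}],
})
v2 = fastavro.parse_schema({
    "name": "foo", "type": "record",
    "fields": [{"name": "a1", "type": "string"},
               {"name": "a2", "type": ["null", "int"], "default": None}],
})

buf = io.BytesIO()
fastavro.writer(buf, v2, [{"a1": "x", "a2": 1}])      # written with the newer schema
buf.seek(0)
print(list(fastavro.reader(buf, reader_schema=v1)))    # read with the older one: works

buf = io.BytesIO()
fastavro.writer(buf, v1, [{"a1": "x"}])                # written with the older schema
buf.seek(0)
# reading with v2 works here only because a2 has a default;
# without the default this is where the resolution error shows up
print(list(fastavro.reader(buf, reader_schema=v2)))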
The best way is to have a schema registry to maintain the schemas, like the Confluent Avro Schema Registry.
Key takeaways:
1. Unlike Thrift, Avro-serialized objects do not hold any schema.
2. As there is no schema stored in the serialized byte array, one has to provide the schema with which it was written.
3. The Confluent Schema Registry provides a service to maintain schema versions.
4. Confluent provides a cached schema client, which checks its cache first before sending a request over the network.
5. The JSON schema present in the "avsc" file is different from the schema present in the Avro object.
6. All Avro objects extend GenericRecord.
7. During serialization: based on the schema of the Avro object, a schema id is requested from the Confluent Schema Registry.
8. The schema id, which is an integer, is converted to bytes and prepended to the serialized Avro object.
9. During deserialization: the first 4 bytes are removed from the byte array and converted back to an integer (the schema id).
10. The schema is requested from the Confluent Schema Registry, and the byte array is deserialized using this schema (a sketch of this flow follows the link below).
http://bytepadding.com/big-data/spark/avro/avro-serialization-de-serialization-using-confluent-schema-registry/
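For completeness, a minimal sketch of steps 7-10 using the confluent-kafka Python client; the registry URL and topic name are made up, and this is my own illustration rather than code from the linked article:
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer, AvroDeserializer
from confluent_kafka.serialization import SerializationContext, MessageField

schema_str = """
{"namespace": "example.avro", "type": "record", "name": "Log",
 "fields": [{"name": "ip", "type": "string"},
            {"name": "message", "type": "string"}]}
"""

client = SchemaRegistryClient({"url": "http://localhost:8081"})  # assumed registry URL
serializer = AvroSerializer(client, schema_str)
deserializer = AvroDeserializer(client, schema_str)

ctx = SerializationContext("logs", MessageField.VALUE)
# serialization: the schema is registered/looked up and its id travels with the bytes
payload = serializer({"ip": "172.18.80.109", "message": "blahblahblah"}, ctx)
# deserialization: the id is read back and the schema is fetched from the registry
print(deserializer(payload, ctx))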