Avro schema for a record type with an empty object

I am trying to create an Avro schema for the JSON below:
{
  "id": "TEST",
  "status": "status",
  "timestamp": "2019-01-01T00:00:22-03:00",
  "comment": "add comments or replace it with adSummary data",
  "error": {
    "code": "ER1212132",
    "msg": "error message"
  }
}
The error object is optional; it could be:
"error": {}
Below is the Avro schema without a default value:
{
  "type" : "record",
  "name" : "Order",
  "fields" : [
    { "name" : "id", "type" : "string" },
    { "name" : "status", "type" : "string" },
    { "name" : "timestamp", "type" : "string" },
    { "name" : "comment", "type" : ["null", "string"], "default": null },
    {
      "name" : "error",
      "type" : {
        "type" : "record",
        "name" : "error",
        "fields" : [
          { "name" : "code", "type" : "string" },
          { "name" : "msg", "type" : "string" }
        ]
      }
    }
  ]
}
How can I add a default value of {} for the error field in the JSON schema? Here is my attempt:

{
  "type" : "record",
  "name" : "Order",
  "fields" : [
    { "name" : "id", "type" : "string" },
    { "name" : "status", "type" : "string" },
    { "name" : "timestamp", "type" : "string" },
    { "name" : "comment", "type" : ["null", "string"], "default": null },
    {
      "name" : "error",
      "type" : [
        {
          "type" : "record",
          "name" : "error",
          "fields" : [
            { "name" : "code", "type" : "string" },
            { "name" : "msg", "type" : "string" }
          ]
        },
        { "type" : "record", "name" : "emptyError", "fields" : [] }
      ]
    }
  ]
}
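Note that a default of {} for a nested record field only validates if every field of that record has its own default. A more common pattern (a sketch; the nullable inner fields and the "null"-first union are choices of this sketch, not the only option) is to make the error field itself nullable with a null default:
{
  "name" : "error",
  "type" : [ "null", {
    "type" : "record",
    "name" : "error",
    "fields" : [
      { "name" : "code", "type" : ["null", "string"], "default": null },
      { "name" : "msg", "type" : ["null", "string"], "default": null }
    ]
  } ],
  "default": null
}
In Avro, the default value of a union field must match the first branch of the union, which is why "null" comes first here.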

Related

Restricting Values Assignable to Discriminating Property

Swagger/OpenAPI definition:
{
"openapi" : "3.0.1",
"info" : {
"title" : "OpenAPI definition",
"version" : "v0"
},
"servers" : [ {
"url" : "http://sandbox.test.com:8063/api/recs",
"description" : "Generated server url"
} ],
"paths" : {
"/data" : {
"get" : {
"tags" : [ "Data" ],
"operationId" : "getData",
"parameters" : [ {
"name" : "goal",
"in" : "query",
"required" : false,
"schema" : {
"$ref" : "#/components/schemas/GoalsEnum_User"
}
} ],
"responses" : {
"404" : {
"description" : "Not Found",
"content" : {
"*/*" : {
"schema" : {
"type" : "object"
}
}
}
},
"200" : {
"description" : "Result generated successfully",
"content" : {
"application/json" : {
"schema" : {
"type" : "array",
"items" : {
"oneOf" : [ {
"$ref" : "#/components/schemas/EventDataDto"
}, {
"$ref" : "#/components/schemas/FreeRideDataDto"
}]
}
}
}
}
}
}
}
}
},
"components" : {
"schemas" : {
"GoalsEnum_User" : {
"type" : "string",
"enum" : [ "User1", "User2" ]
},
"EventDataDto" : {
"type" : "object",
"allOf" : [ {
"$ref" : "#/components/schemas/ParentDataSchema_UserData"
}, {
"type" : "object",
"properties" : {
"rules" : {
"type" : "array",
"items" : {
"$ref" : "#/components/schemas/RuleDto"
}
}
}
}, {
"$ref" : "#/components/schemas/ParentDataSchema"
} ]
},
"FreeRideDataDto" : {
"type" : "object",
"allOf" : [ {
"$ref" : "#/components/schemas/ParentDataSchema"
}, {
"type" : "object",
"properties" : {
"completedRoutes" : {
"type" : "array",
"items" : {
"type" : "integer",
"format" : "int64"
}
},
"averageDistance" : {
"type" : "number",
"format" : "double"
},
"averageDuration" : {
"type" : "number",
"format" : "double"
}
}
}, {
"$ref" : "#/components/schemas/ParentDataSchema_UserData"
} ]
},
"ParentDataSchema" : {
"required" : [ "type" ],
"type" : "object",
"properties" : {
"type" : {
"type" : "string",
"enum" : [ "FREE_RIDE", "EVENT" ]
}
},
"discriminator" : {
"propertyName" : "type",
"mapping" : {
"EVENT" : "#/components/schemas/EventRecommendationDto",
"FREE_RIDE" : "#/components/schemas/FreeRideRecommendationDto"
}
}
},
"ParentDataSchema_UserData" : {
"required" : [ "type" ],
"type" : "object",
"properties" : {
"type" : {
"type" : "string",
"enum" : [ "FREE_RIDE", "EVENT" ]
}
},
"discriminator" : {
"propertyName" : "type",
"mapping" : {
"EVENT" : "#/components/schemas/EventRecommendationDto",
"FREE_RIDE" : "#/components/schemas/FreeRideRecommendationDto"
}
}
}
}
}
}
Generated Example:
[
{
"type": "FREE_RIDE",
"rules": [
{
"ruleId": 0,
"categoryId": 0,
"name": "string",
"type": "string",
"value": "string"
}
]
},
{
"type": "FREE_RIDE",
"completedRoutes": [
0
],
"averageDistance": 0,
"averageDuration": 0
}
]
Since there are specific values for the discriminating field "type", I expect the generated examples to carry the correct value for each detected type. Although the types were listed correctly, the type field is not set to the discriminating value.
Is there anything I can do to the Swagger/OpenAPI definition or Swagger UI to fix this? I'm even open to contributing a bug fix if you can point me to where the example values for fields are set and how I can choose the discriminating value instead of the first one in the enum.
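One workaround worth trying (an untested sketch): narrow the type property in each child schema so that the intended discriminator value is the only, and therefore the first, enum constant the example generator can pick:
"EventDataDto" : {
  "type" : "object",
  "allOf" : [ {
    "$ref" : "#/components/schemas/ParentDataSchema"
  }, {
    "type" : "object",
    "properties" : {
      "type" : {
        "type" : "string",
        "enum" : [ "EVENT" ],
        "example" : "EVENT"
      }
    }
  } ]
}
Whether Swagger UI honors the per-property example depends on the version, but example generators typically take either the example value or the first enum constant, so both hints here point at "EVENT".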

Hive table to view Avro records streamed using Flume fails with "Block size invalid or too large for this implementation: -40"

I am creating a Hive SerDe external table to view the Twitter records that are being streamed using Flume.
My property file:
# Naming the components on the current agent.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
# Describing/Configuring the source
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = xxx
TwitterAgent.sources.Twitter.consumerSecret = xxx
TwitterAgent.sources.Twitter.accessToken = xxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxx
TwitterAgent.sources.Twitter.keywords = kafka
# Describing/Configuring the sink
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://xxx:8000/topics/flumedata
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 100000
TwitterAgent.sinks.hdfs.serializer=Text
# Describing/Configuring the channel
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 100000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
TwitterAgent.channels.MemChannel.byteCapacity = 6912212
# Binding the source and sink to the channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
Query to create the Hive external table:
CREATE EXTERNAL TABLE twitter_tweets
COMMENT "just drop the schema right into the HQL"
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.literal'='{
"type" : "record",
"name" : "Doc",
"doc" : "adoc",
"fields" : [ {
"name" : "id",
"type" : "string"
}, {
"name" : "user_friends_count",
"type" : [ "int", "null" ]
}, {
"name" : "user_location",
"type" : [ "string", "null" ]
}, {
"name" : "user_description",
"type" : [ "string", "null" ]
}, {
"name" : "user_statuses_count",
"type" : [ "int", "null" ]
}, {
"name" : "user_followers_count",
"type" : [ "int", "null" ]
}, {
"name" : "user_name",
"type" : [ "string", "null" ]
}, {
"name" : "user_screen_name",
"type" : [ "string", "null" ]
}, {
"name" : "created_at",
"type" : [ "string", "null" ]
}, {
"name" : "text",
"type" : [ "string", "null" ]
}, {
"name" : "retweet_count",
"type" : [ "long", "null" ]
}, {
"name" : "retweeted",
"type" : [ "boolean", "null" ]
}, {
"name" : "in_reply_to_user_id",
"type" : [ "long", "null" ]
}, {
"name" : "source",
"type" : [ "string", "null" ]
}, {
"name" : "in_reply_to_status_id",
"type" : [ "long", "null" ]
}, {
"name" : "media_url_https",
"type" : [ "string", "null" ]
}, {
"name" : "expanded_url",
"type" : [ "string", "null" ]
} ]
}');
LOAD DATA INPATH '/topics/flumedata/FlumeData.*' OVERWRITE INTO TABLE twitter_tweets;
After creating the table, when I run select * from twitter_tweets; it does not return any data; instead it throws an error:
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.IOException: Block size invalid or too large for this implementation: -40
Where did I go wrong? I don't know why I am getting this block size issue. Can anyone guide me?
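A first diagnostic step (a sketch; the file name below is a placeholder) is to copy one FlumeData.* file out of HDFS and check whether it is a valid Avro object container at all, since this error typically means the AvroSerDe is parsing bytes that are not an Avro container:
# hdfs dfs -get hdfs://xxx:8000/topics/flumedata/FlumeData.1526587000000 .
with open("FlumeData.1526587000000", "rb") as f:  # placeholder file name
    magic = f.read(4)

# Every valid Avro object container file begins with the bytes 'O', 'b', 'j', 0x01.
print("valid Avro container:", magic == b"Obj\x01")
If the magic bytes are missing, the sink wrote the events as plain text rather than as an Avro container, and the sink format and the table definition need to be reconciled.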

How do I import GeoJSON files for use in Google Earth Engine Code Editor?

I have generated some points of interest from my database using SQL to GeoJSON.
GeoJSON:
{
"FeatureCollection" : [
{
"geometry" : {
"coordinates" : [
-45.927083,
-12.260889
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "626.46"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.916500,
-12.255944
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "565.04"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.949417,
-12.270361
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "631.47"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.958833,
-12.277361
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "591.85"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.942944,
-12.249889
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "644.67"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.930917,
-12.243611
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "644.67"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.871917,
-12.197139
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "574.60"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.866861,
-12.206417
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "574.60"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.967389,
-12.261889
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "592.50"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.973500,
-12.250639
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "592.50"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.962944,
-12.245444
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "621.60"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.952667,
-12.239778
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "592.50"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.931639,
-12.228528
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "574.60"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.908694,
-12.247472
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "557.20"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.918667,
-12.239139
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "644.50"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.897028,
-12.246000
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "557.20"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.906417,
-12.230472
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "64.50"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.895750,
-12.225028
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "644.50"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.927111,
-12.213750
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "564.90"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.917639,
-12.208750
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "564.90"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.897833,
-12.198444
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "584.00"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.881583,
-12.202233
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "574.60"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.876833,
-12.235306
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "574.60"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.867278,
-12.230306
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "574.60"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.856806,
-12.224889
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "574.60"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.861806,
-12.215611
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "574.60"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.887833,
-12.192806
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "12.60"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.877639,
-12.187917
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "564.90"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.941889,
-12.234611
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "644.50"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.887111,
-12.239889
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "644.50"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.907944,
-12.203361
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "591.70"
},
"type" : "Feature"
},
{
"geometry" : {
"coordinates" : [
-45.892722,
-12.208028
],
"type" : "Point"
},
"properties" : {
"grower" : "foo",
"name" : "bar",
"radius" : "574.60"
},
"type" : "Feature"
}
]
}
I would like to import this GeoJSON into my Code Editor on Google Earth Engine. Looking in the docs (assets manager), GEE accepts assets as raster images or shapefiles (.shp, .shx, .dbf, .prj).
Also, I found the import to a feature collection via Fusion Tables, but it still needs shapefiles.
I have found some GeoJSON-to-shapefile converters, though I need a way to directly import my GeoJSON into a feature collection on GEE. Is that possible?
You can also import GeoJSON geometry objects directly into either the JavaScript or Python API using, for example, this format for a MultiPolygon:
feature_geometry = {
    "type": "MultiPolygon",
    "coordinates": [
        [
            [
                [-120, 35],
                [-120.001, 35],
                [-120.001, 35.001],
                [-120, 35.001],
                [-120, 35]
            ]
        ]
    ]
}
Both hash maps (i.e., dictionaries) match the GeoJSON specification exactly (source):
{
    "type": "MultiPolygon",
    "coordinates": [
        [
            [
                [-120, 35],
                [-120.001, 35],
                [-120.001, 35.001],
                [-120, 35.001],
                [-120, 35]
            ]
        ]
    ]
}
Of course, you can also read this data in from a GeoJSON file (Python example shown):
import json

# json.load reads from an open file object (json.loads expects a string)
with open("my_points.geojson") as geojson_file:  # placeholder file name
    data = json.load(geojson_file)
For a simple Python wrapper, there is the pygeoj library, but JSON data is handled well natively in Python and of course in JavaScript.
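For instance, a minimal sketch of loading the parsed features into Earth Engine (assuming the earthengine-api Python package is installed and authenticated, and that the file has already been corrected into a spec-valid FeatureCollection with a "features" array, as discussed below):
import json

import ee

ee.Initialize()  # assumes prior authentication

with open("my_points.geojson") as f:  # placeholder file name
    data = json.load(f)

# ee.Geometry accepts a GeoJSON geometry dictionary directly; wrap each
# feature and build a server-side FeatureCollection from the list.
features = [
    ee.Feature(ee.Geometry(feat["geometry"]), feat["properties"])
    for feat in data["features"]
]
fc = ee.FeatureCollection(features)
print(fc.size().getInfo())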
You can easily use OGR to convert your data to a shapefile (which you can then upload through the Code Editor) or to KML and upload it into Fusion Tables.
ogr2ogr -f KML output.kml input.json
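The shapefile route is analogous (a sketch with standard ogr2ogr flags; untested here):
ogr2ogr -f "ESRI Shapefile" output.shp input.json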
However, your FeatureCollection isn't valid GeoJSON and you'll have to fix that first. The preamble should look like:
{
  "type": "FeatureCollection",
  "features": [
    {
      "geometry" : { ...

How to define type for a specific field in ElasticSearch for Rails

I am struggling with elasticsearch-rails.
I have the following mapping:
{
"listings" : {
"mappings" : {
"listing" : {
"properties" : {
"address" : {
"type" : "string"
},
"authorized" : {
"type" : "boolean"
},
"categories" : {
"properties" : {
"created_at" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"id" : {
"type" : "long"
},
"name" : {
"type" : "string"
},
"parent_id" : {
"type" : "long"
},
"updated_at" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"url_name" : {
"type" : "string"
}
}
},
"cid" : {
"type" : "string"
},
"city" : {
"type" : "string"
},
"country" : {
"type" : "string"
},
"created_at" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"featured" : {
"type" : "boolean"
},
"geojson" : {
"type" : "string"
},
"id" : {
"type" : "long"
},
"latitude" : {
"type" : "string"
},
"longitude" : {
"type" : "string"
},
"name" : {
"type" : "string"
},
"phone" : {
"type" : "string"
},
"postal" : {
"type" : "string"
},
"province" : {
"type" : "string"
},
"thumbnail_filename" : {
"type" : "string"
},
"updated_at" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"url" : {
"type" : "string"
}
}
}
}
}
}
I would like to change the type of the geojson field from string to geo_shape so I can use the geo_shape query on it.
I tried this in my model:
settings index: { number_of_shards: 1 } do
  mappings dynamic: 'false' do
    indexes :geojson, type: 'geo_shape'
  end
end
with peculiar results. When I queried the mapping with $ curl 'localhost:9200/_all/_mapping?pretty', the geojson field still showed as type: string.
Within a Rails console, if I do Listing.mappings.to_hash, it seems to show that the geojson field is of type geo_shape.
And yet when running this query:
response = Listing.search(
  query: { fuzzy_like_this: { fields: [:name], like_text: "gap" } },
  query: { fuzzy_like_this_field: { city: { like_text: "San Francisco" } } },
  query: { geo_shape: { geojson: { shape: { type: :envelope, coordinates: [[37, -122], [38, -123]] } } } }
)
response.results.total
response.results.map { |r| puts "#{r._score} | #{r.name}, #{r.city} (lat: #{r.latitude}, lon: #{r.longitude})" }
ES complains that the geojson field is not of type geo_shape.
What am I missing? How do I tell ES that I want the geojson field to be of type geo_shape and not string?
The issue was that I didn't delete and recreate the index: Elasticsearch cannot change the mapping type of an existing field in place, so the index has to be rebuilt.
In the Rails console, I ran Model.__elasticsearch__.delete_index! and then Model.__elasticsearch__.create_index!, followed by Model.import.

Populating nested records in Avro using a GenericRecord

Suppose I’ve got the following schema:
{
"name" : "Profile",
"type" : "record",
"fields" : [
{ "name" : "firstName", "type" : "string" },
{ "name" : "address" , "type" : {
"type" : "record",
"name" : "AddressUSRecord",
"fields" : [
{ "name" : "address1" , "type" : "string" },
{ "name" : "address2" , "type" : "string" },
{ "name" : "city" , "type" : "string" },
{ "name" : "state" , "type" : "string" },
{ "name" : "zip" , "type" : "int" },
{ "name" : "zip4", "type": "int" }
]
}
}
]
}
I’m using a GenericRecord to represent each Profile that gets created. To add a firstName, it’s easy to do the following:
Schema sch = new Schema.Parser().parse(schemaFile);
DataFileWriter<GenericRecord> fw = new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>()).create(sch, new File(outFile));
GenericRecord r = new GenericData.Record(sch);
r.put("firstName", "John");
fw.append(r);
But how would I set the city, for example? How do I represent the key as a string that the r.put method can understand?
Thanks
For the schema above:
// Build a record for the nested AddressUSRecord schema (retrieved via the
// outer schema's "address" field), set its fields, then attach it to the
// outer record under the "address" key.
GenericRecord t = new GenericData.Record(sch.getField("address").schema());
t.put("city", "beijing");
r.put("address", t);
