Apache Avro UnresolvedUnionException: Not in union ["null",{"type":"int","logicalType":"date"}]: 2001-01-01

Despite examples collected here and there, I haven't been able to produce a correct Avro 1.9.1 schema for my (Lombok-annotated) class; I get the error in the title when serializing my LocalDate field.
Can someone please explain what I'm missing?
@Data
public class Person {
private Long id;
private String firstname;
private LocalDate birth;
private Integer votes = 0;
}
This is the schema:
{
"type": "record",
"name": "Person",
"namespace": "com.example.demo",
"fields": [
{
"name": "id",
"type": "long"
},
{
"name": "firstname",
"type": "string"
},
{
"name": "birth",
"type": [ "null", { "type": "int", "logicalType": "date" }]
},
{
"name": "votes",
"type": "int"
}]
}
The error means that java.time.LocalDate is not found in the union's "index named" map:
org.apache.avro.UnresolvedUnionException: Not in union ["null",{"type":"int","logicalType":"date"}]: 2001-01-01
The map's keys are "null" and "int", which seems logical.
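A minimal sketch of one common fix (not from the original post), assuming the POJO is serialized through Avro's reflection API: register the Java 8 date conversion so the runtime can resolve java.time.LocalDate to the {"type":"int","logicalType":"date"} branch of the union. The variable personSchemaJson is a stand-in for the schema above.
import org.apache.avro.Schema;
import org.apache.avro.data.TimeConversions;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumWriter;

// Without a registered conversion, ReflectData cannot map LocalDate onto the
// int/date branch, so the union lookup fails with UnresolvedUnionException.
ReflectData reflectData = new ReflectData();
reflectData.addLogicalTypeConversion(new TimeConversions.DateConversion());

Schema schema = new Schema.Parser().parse(personSchemaJson); // hypothetical String holding the schema above
ReflectDatumWriter<Person> writer = new ReflectDatumWriter<>(schema, reflectData);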

Related

Avro Tools Failure Expected start-union. Got VALUE_STRING

I've defined the Avro schema below (car_sales_customer.avsc):
{
"type" : "record",
"name" : "topLevelRecord",
"fields" : [ {
"name": "cust_date",
"type": "string"
},
{
"name": "customer",
"type": {
"type": "array",
"items": {
"name": "customer",
"type": "record",
"fields": [
{
"name": "address",
"type": "string"
},
{
"name": "driverlience",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "name",
"type": "string"
},
{
"name": "phone",
"type": "string"
}
]
}
}
}]
}
and my input JSON payload (car_sales_customer.json) is as follows:
{"cust_date":"2017-04-28","customer":[{"address":"SanFrancisco,CA","driverlience":"K123499989","name":"JoyceRidgely","phone":"16504378889"}]}
I'm trying to use avro-tools to convert the above JSON to Avro using that schema:
java -jar ./avro-tools-1.9.2.jar fromjson --schema-file ./car_sales_customer.avsc ./car_sales_customer.json > ./car_sales_customer.avro
I get the error below when I execute that command:
Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_STRING
at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:514)
at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:433)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:283)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:298)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:89)
at org.apache.avro.tool.Main.run(Main.java:66)
at org.apache.avro.tool.Main.main(Main.java:55)
Is there a solution to overcome the error?
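For reference (not part of the original question): avro-tools' fromjson expects Avro's JSON encoding, in which a non-null union value is wrapped in an object keyed by its branch type. Rewriting only the driverlience value makes the payload decode against the schema above:
{"cust_date":"2017-04-28","customer":[{"address":"SanFrancisco,CA","driverlience":{"string":"K123499989"},"name":"JoyceRidgely","phone":"16504378889"}]}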

How to extract a nested nullable Avro Schema

The complete schema is the following:
{
"type": "record",
"name": "envelope",
"fields": [
{
"name": "before",
"type": [
"null",
{
"type": "record",
"name": "row",
"fields": [
{
"name": "username",
"type": "string"
},
{
"name": "timestamp",
"type": "long"
}
]
}
]
},
{
"name": "after",
"type": [
"null",
"row"
]
}
]
}
I wanted to programmatically extract the following sub-schema:
{
"type": "record",
"name": "row",
"fields": [
{
"name": "username",
"type": "string"
},
{
"name": "timestamp",
"type": "long"
}
]
}
As you can see, the field "before" is nullable. I can extract its schema by doing:
schema.getField("before").schema()
But that schema is not a record: it is a UNION with "null" as its first branch, so I can't go inside it to fetch the schema of "row".
["null",{"type":"record","name":"row","fields":[{"name":"username","type":"string"},{"name":"tweet","type":"string"},{"name":"timestamp","type":"long"}]}]
I want to fetch the sub-schema because I want to create a GenericRecord out of it. Basically I want to create two GenericRecords, "before" and "after", and add them to the main GenericRecord created from the full schema.
Any help will be highly appreciated.
Good news: if you have a union schema, you can go inside it to fetch the list of possible branches:
Schema unionSchema = schema.getField("before").schema();
List<Schema> unionSchemaContains = unionSchema.getTypes();
At that point, you can look inside the list to find the one that corresponds to the Type.RECORD.
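Putting that together, a short sketch (the names here are illustrative, not from the original answer) that picks out the record branch and builds a GenericRecord from it:
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

// Walk the union's branches and keep the one that is a record.
Schema rowSchema = unionSchemaContains.stream()
        .filter(s -> s.getType() == Schema.Type.RECORD)
        .findFirst()
        .orElseThrow(() -> new IllegalStateException("union has no record branch"));

// A "before" record can now be created and filled like any other record.
GenericRecord before = new GenericData.Record(rowSchema);
before.put("username", "alice");
before.put("timestamp", 1234567890L);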

Can we refer to only one property of another schema?

I have a REST service that serves the following paths:
http://server/path/AddressResource and
http://server/path/AddressResource/someAnotherPath
I have definitions like below:
"definitions": {
"address": {
"type": "object",
"properties": {
"street_address": { "type": "string" },
"city": { "type": "string" },
"state": { "type": "string" }
},
"required": ["street_address", "city", "state"]
}
}
That is the response of the first path; in the second path I just want to return the "city" property of address.
Can I create a schema that refers to address and uses just one of its properties?
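One way this might work (a sketch, not from the original thread): a JSON Pointer in $ref can reach into a sub-property of another definition, so the second path's response schema could reuse just the "city" member of address:
{
  "type": "object",
  "properties": {
    "city": { "$ref": "#/definitions/address/properties/city" }
  },
  "required": ["city"]
}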

How to store a business ID in Elasticsearch?

I'm trying to store tweets in an Elasticsearch index using Spring Data Elasticsearch (to fetch the tweets, I'm using twitter4j).
I have followed some basic examples and I'm using this basic annotated POJO (metadata with complex types has been removed):
@Document(indexName = "twitter", type = "tweet")
public class StorableTweet {
@Id
private long id;
private String createdAt;
private String text;
private String source;
private boolean isTruncated;
private long inReplyToStatusId;
private long inReplyToUserId;
private boolean isFavorited;
private boolean isRetweeted;
private int favoriteCount;
private String inReplyToScreenName;
private String userScreenName = null;
// Getters/setters removed
}
To store a tweet using this model, I use:
public interface TweetRepository extends ElasticsearchRepository<StorableTweet, Long> {
}
and in my storing service:
tweetRepository.save(storableTweet);
It works fine, but my tweet id is stored in "_id" (fair enough) and some other number, coming from nowhere, is stored in "id" (why?):
{
"_index": "twitter",
"_type": "tweet",
**"_id": "655008947099840512"**, <-- this is the real tweet id
"_version": 1,
"found": true,
"_source":
{
**"id": 655008947099840500**, <-- this number comes from nowhere
"createdAt": "Fri Oct 16 15:14:37 CEST 2015",
"text": "tweet text(...)",
"source": "Twitter for iPhone",
"inReplyToStatusId": -1,
"inReplyToUserId": -1,
"favoriteCount": 0,
"inReplyToScreenName": null,
"user": "971jml",
"favorited": false,
"retweeted": false,
"truncated": false
}
}
What I would like is either my tweet id stored in "_id" (and no "id" field), or my tweet id stored in "id" with a generated value in "_id"; in any case, I want to get rid of this random useless number in "id".
EDIT
Mapping:
{
"twitter":
{
"mappings":
{
"tweet":
{
"properties":
{
"createdAt":
{
"type": "string"
},
"favoriteCount":
{
"type": "long"
},
"favorited":
{
"type": "boolean"
},
"inReplyToScreenName":
{
"type": "string"
},
"inReplyToStatusId":
{
"type": "long"
},
"inReplyToUserId":
{
"type": "long"
},
"retweeted":
{
"type": "boolean"
},
"source":
{
"type": "string"
},
"text":
{
"type": "string"
},
"truncated":
{
"type": "boolean"
},
"tweetId":
{
"type": "long"
},
"user":
{
"type": "string"
}
}
}
}
}
}
EDIT 2: It looks like the problem is not about the @Id annotation but about the "long" type. Some other longs (not all) are altered by a few units when stored into Elasticsearch via Spring Data Elasticsearch.
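A note and a sketch, neither from the original post: the altered values are consistent with a display problem rather than a storage one. JSON tooling that parses numbers as IEEE-754 doubles cannot reliably render integers above 2^53, so 655008947099840512 comes back as the shorter, round-trip-equivalent 655008947099840500. One way to sidestep this, assuming the duplicated field in "_source" is acceptable, is to hold the id as a String; Spring Data Elasticsearch still uses the @Id property as the document's "_id":
@Document(indexName = "twitter", type = "tweet")
public class StorableTweet {
    @Id
    private String id; // e.g. "655008947099840512" - exact, because it is text
    private String createdAt;
    private String text;
    // ... remaining fields unchanged
}
The repository's id type parameter changes accordingly (ElasticsearchRepository<StorableTweet, String>), and the numeric twitter4j id is converted once when mapping: storableTweet.setId(String.valueOf(status.getId())).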

Avro schema definition nesting types

I am fairly new to Avro and going through the documentation for nested types. The example below works nicely, but many different types within the model will have addresses. Is it possible to define an address.avsc file and reference it as a nested type? If so, can you take it a step further and have a list of Addresses for a Customer? Thanks in advance.
{"namespace": "com.company.model",
"type": "record",
"name": "Customer",
"fields": [
{"name": "firstname", "type": "string"},
{"name": "lastname", "type": "string"},
{"name": "email", "type": "string"},
{"name": "phone", "type": "string"},
{"name": "address", "type":
{"type": "record",
"name": "AddressRecord",
"fields": [
{"name": "streetaddress", "type": "string"},
{"name": "city", "type": "string"},
{"name": "state", "type": "string"},
{"name": "zip", "type": "string"}
]}
}
]
}
There are 4 possible ways:
1. Include it in the pom file, as mentioned in this ticket (see the pom sketch after this list).
2. Declare all your types in a single avsc file.
3. Use a single static parser that first parses all the imports and then parses the actual data types.
4. (This is a hack) Use an avdl file and its imports, as described at https://avro.apache.org/docs/1.7.7/idl.html#imports, though IDL is intended for RPC calls.
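A sketch of option 1, assuming the avro-maven-plugin (the ticket's link did not survive, so treat the details as illustrative): the <imports> section makes the plugin parse address.avsc before the schemas that reference it.
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.9.2</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
        <!-- Parsed first, so AddressRecord is already known when Customer is compiled -->
        <imports>
          <import>${project.basedir}/src/main/avro/address.avsc</import>
        </imports>
      </configuration>
    </execution>
  </executions>
</plugin>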
Example for option 2, declaring all your types in a single avsc file (this also answers the array declaration for address):
[
{
"type": "record",
"namespace": "com.company.model",
"name": "AddressRecord",
"fields": [
{
"name": "streetaddress",
"type": "string"
},
{
"name": "city",
"type": "string"
},
{
"name": "state",
"type": "string"
},
{
"name": "zip",
"type": "string"
}
]
},
{
"namespace": "com.company.model",
"type": "record",
"name": "Customer",
"fields": [
{
"name": "firstname",
"type": "string"
},
{
"name": "lastname",
"type": "string"
},
{
"name": "email",
"type": "string"
},
{
"name": "phone",
"type": "string"
},
{
"name": "address",
"type": {
"type": "array",
"items": "com.company.model.AddressRecord"
}
}
]
},
{
"namespace": "com.company.model",
"type": "record",
"name": "Customer2",
"fields": [
{
"name": "x",
"type": "string"
},
{
"name": "y",
"type": "string"
},
{
"name": "address",
"type": {
"type": "array",
"items": "com.company.model.AddressRecord"
}
}
]
}
]
Example for option 3, using a single static parser:
Parser parser = new Parser(); // Make this static and reuse
parser.parse(<location of address.avsc file>);
parser.parse(<location of customer.avsc file>);
parser.parse(<location of customer2.avsc file>);
If we want to get hold of the Schema objects, that is, if we want to create new records, we can either use the getTypes() method (https://avro.apache.org/docs/1.5.4/api/java/org/apache/avro/Schema.Parser.html#getTypes()) to look a schema up by name, or keep the parser's return values:
Parser parser = new Parser(); // Make this static and reuse
Schema addressSchema = parser.parse(<location of address.avsc file>);
Schema customerSchema = parser.parse(<location of customer.avsc file>);
Schema customer2Schema = parser.parse(<location of customer2.avsc file>);
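For completeness, the getTypes() route might look like this (a sketch; the fully-qualified names assume the schemas from the single-file example above):
Parser parser = new Parser(); // Make this static and reuse
parser.parse(<location of address.avsc file>);
parser.parse(<location of customer.avsc file>);

// getTypes() maps fully-qualified type names to their parsed schemas.
Schema addressSchema = parser.getTypes().get("com.company.model.AddressRecord");
Schema customerSchema = parser.getTypes().get("com.company.model.Customer");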
Just to add to @Princey James's answer: the nested type must be defined before it is used.
Another addition to @Princey James's answer, about the example for option 2 (all types declared in a single avsc file):
It works for serializing and deserializing with code generation, but serializing and deserializing without code generation does not; you will get:
org.apache.avro.AvroRuntimeException: Not a record schema: [{"type":" ...
Working example with code generation:
@Test
public void avroWithCode() throws IOException {
UserPerso UserPerso3 = UserPerso.newBuilder()
.setName("Charlie")
.setFavoriteColor("blue")
.setFavoriteNumber(null)
.build();
AddressRecord adress = AddressRecord.newBuilder()
.setStreetaddress("mo")
.setCity("Paris")
.setState("IDF")
.setZip("75")
.build();
ArrayList<AddressRecord> li = new ArrayList<>();
li.add(adress);
Customer cust = Customer.newBuilder()
.setUser(UserPerso3)
.setPhone("0101010101")
.setAddress(li)
.build();
String fileName = "cust.avro";
File a = new File(fileName);
DatumWriter<Customer> customerDatumWriter = new SpecificDatumWriter<>(Customer.class);
DataFileWriter<Customer> dataFileWriter = new DataFileWriter<>(customerDatumWriter);
dataFileWriter.create(cust.getSchema(), a);
dataFileWriter.append(cust);
dataFileWriter.close();
DatumReader<Customer> custDatumReader = new SpecificDatumReader<>(Customer.class);
DataFileReader<Customer> dataFileReader = new DataFileReader<>(a, custDatumReader);
Customer cust2 = null;
while (dataFileReader.hasNext()) {
cust2 = dataFileReader.next(cust2);
System.out.println(cust2);
}
}
And without code generation:
@Test
public void avroWithoutCode() throws IOException {
// One shared parser, so schemas parsed later can resolve types parsed earlier
// (assuming separate user.avsc, address.avsc and customer.avsc files).
Schema.Parser parser = new Schema.Parser();
Schema schemaUserPerso = parser.parse(new File("src/main/resources/avroTest/user.avsc"));
Schema schemaAdress = parser.parse(new File("src/main/resources/avroTest/address.avsc"));
Schema schemaCustomer = parser.parse(new File("src/main/resources/avroTest/customer.avsc"));
System.out.println(schemaUserPerso);
GenericRecord UserPerso3 = new GenericData.Record(schemaUserPerso);
UserPerso3.put("name", "Charlie");
UserPerso3.put("favorite_color", "blue");
UserPerso3.put("favorite_number", null);
GenericRecord adress = new GenericData.Record(schemaAdress);
adress.put("streetaddress", "mo");
adress.put("city", "Paris");
adress.put("state", "IDF");
adress.put("zip", "75");
ArrayList<GenericRecord> li = new ArrayList<>();
li.add(adress);
GenericRecord cust = new GenericData.Record(schemaCustomer);
cust.put("user", UserPerso3);
cust.put("phone", "0101010101");
cust.put("address", li);
String fileName = "cust.avro";
File file = new File(fileName);
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schemaCustomer);
DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(datumWriter);
dataFileWriter.create(schemaCustomer, file);
dataFileWriter.append(cust);
dataFileWriter.close();
File a = new File(fileName);
DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schemaCustomer);
DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(a, datumReader);
GenericRecord cust2 = null;
while (dataFileReader.hasNext()) {
cust2 = dataFileReader.next(cust2);
System.out.println(cust2);
}
}
