Deserialization error when using an evolved schema - Avro

I originally had the following schema:
{
  "type": "record",
  "name": "EntityA",
  "fields": [
    {
      "name": "values",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "SubEntities",
          "fields": [
            {"name": "name", "type": "string"},
            {"name": "valueMap", "type": ["null", {"type": "map", "values": "string"}], "default": null}
          ]
        }
      }
    }
  ]
}
I had stored records serialized with this schema in RocksDB. I then added another field to the schema:
{
  "type": "record",
  "name": "EntityA",
  "fields": [
    {
      "name": "values",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "SubEntities",
          "fields": [
            {"name": "name", "type": "string"},
            {"name": "valueMap", "type": ["null", {"type": "map", "values": "string"}], "default": null},
            {"name": "newMap", "type": ["null", {"type": "map", "values": "int"}], "default": null}
          ]
        }
      }
    }
  ]
}
Now I am using the new schema above, but the original schema is also still registered with the schema registry. Suddenly, I started getting this error:
org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 5
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 27 out of bounds for length 2
at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:460)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:283)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:188)
at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:248)
at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180)
at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:299)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:184)
at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:248)
at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:161)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
Here id 5 corresponds to the original schema above. Can someone please help me understand the reason for this error?
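For reference, here is a minimal sketch of the schema-resolution rule in Python with fastavro (an assumption; the consumer in the question is Java): the decoder must be told which schema the bytes were actually written with, and the reader schema is only applied on top of it. Decoding with a writer schema that does not match the bytes is what typically produces an ArrayIndexOutOfBoundsException like the one above.

import io
from fastavro import parse_schema, schemaless_writer, schemaless_reader

# The two schemas from the question, as Python dicts.
old_schema = parse_schema({
    "type": "record", "name": "EntityA", "fields": [
        {"name": "values", "type": {"type": "array", "items": {
            "type": "record", "name": "SubEntities", "fields": [
                {"name": "name", "type": "string"},
                {"name": "valueMap", "type": ["null", {"type": "map", "values": "string"}], "default": None},
            ]}}},
    ]})

new_schema = parse_schema({
    "type": "record", "name": "EntityA", "fields": [
        {"name": "values", "type": {"type": "array", "items": {
            "type": "record", "name": "SubEntities", "fields": [
                {"name": "name", "type": "string"},
                {"name": "valueMap", "type": ["null", {"type": "map", "values": "string"}], "default": None},
                {"name": "newMap", "type": ["null", {"type": "map", "values": "int"}], "default": None},
            ]}}},
    ]})

record = {"values": [{"name": "a", "valueMap": {"k": "v"}, "newMap": {"n": 1}}]}

buf = io.BytesIO()
schemaless_writer(buf, new_schema, record)   # bytes produced with the NEW schema

buf.seek(0)
# Correct: declare the new schema as the writer schema and resolve into the old
# reader schema; the unknown "newMap" field is simply skipped.
print(schemaless_reader(buf, new_schema, old_schema))

# Incorrect (roughly what happens when the bytes and the registered writer
# schema id disagree): the resolving decoder walks the wrong symbol table and
# fails, much like the stack trace above.
# buf.seek(0); schemaless_reader(buf, old_schema)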

Related

Can't read data from Firebase Realtime database by issuing a GET request. Always returns null

The task is simple, but I couldn't find any solution.
Here is the request I'm sending: https://graf-24561-default-rtdb.firebaseio.com/graf-24561-default-rtdb.json
Rules for reading and writing:
{
  "rules": {
    ".read": "now < 1651165200000",   // 2022-4-29
    ".write": "now < 1651165200000",  // 2022-4-29
  }
}
Data in code:
[
{
"name": "0002 М ( мрамор) 8м пленка с\/м\/20 DEKORON ",
"price": 209.7
},
{
"name": "0007 М ( мрамор) 8м пленка с\/м\/20 DEKORON ",
"price": 209.7
},
{
"name": "0008-2 А (дуб темный) 8м пленка с\/м \/20",
"price": 232.84
},
{
"name": "0008-3 А (темн.махагон) 8м пленка с\/м \/20 ",
"price": 209.7
}
]
The graf-24561-default-rtdb node that you see at the root of the JSON is the name of your database, and is not part of the data structure.
So to get the entire database, the URL would be:
https://graf-24561-default-rtdb.firebaseio.com/.json
It may look a bit odd with that .json at the end, but it is the correct syntax.
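For illustration, a minimal sketch of that request in Python (using the requests library, which is an assumption; any HTTP client works):

import requests

# "/.json" selects the root of the database; the database name is not repeated
# as a path segment.
url = "https://graf-24561-default-rtdb.firebaseio.com/.json"
resp = requests.get(url)
resp.raise_for_status()
print(resp.json())  # the list of {"name": ..., "price": ...} entries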

How To Convert "created_timestamp" Value To A Valid Date In Python

I'm currently working on a Twitter bot that automatically replies to messages. I'm doing this with Tweepy (the official Python Twitter library).
I need to filter messages based on the created time, as I don't want to reply to the same message twice. The problem is that the API endpoint returns created_timestamp as a string representation of a positive integer.
Below is an example of the data returned, as per the docs:
{
"next_cursor": "AB345dkfC",
"events": [
{ "id": "110", "created_timestamp": "1639919665615", ... },
{ "id": "109", "created_timestamp": "1639865141987", ... },
{ "id": "108", "created_timestamp": "1639827437833", ... },
{ "id": "107", "created_timestamp": "1639825389806", ... },
{ "id": "106", "created_timestamp": "1639825389796", ... },
{ "id": "105", "created_timestamp": "1639825389768", ... },
...
]
}
My question is: how do I convert created_timestamp to a valid date using Python?
You might play with timestamps on this resource.
In your case you could use methods like:
timestamp = int(created_timestamp)            # e.g. int("1639919665615")
datetime.fromtimestamp(timestamp / 1000)      # the values are epoch milliseconds
date.fromtimestamp(timestamp / 1000)
from the datetime standard library. Note the division by 1000: created_timestamp is in milliseconds, while fromtimestamp expects seconds. That said, the plain integers from the first line are already directly comparable if the task is only to tell which timestamp is newer.
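Putting it together, a small self-contained sketch (the example value is taken from the response above):

from datetime import datetime, timezone

created_timestamp = "1639919665615"           # string value from the API response
seconds = int(created_timestamp) / 1000       # milliseconds -> seconds
created_at = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(created_at)                             # 2021-12-19 13:14:25.615000+00:00

For simple ordering or de-duplication, comparing the integer millisecond values directly is enough; the datetime conversion is only needed when you want a human-readable date.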

JSON data types cannot be deserialized from a GraphQL query by using Ferry package

I have this GraphQL query:
query QuoteRequests($page: Int!) {
getQuoteRequestsList(page: $page) {
vehicle
body
licensePlate
vin
quality
currency
items
}
}
It generates this sample result:
{
"data": {
"getQuoteRequestsList": [
{
"vehicle": "1997 TOYOTA AVALON 4 DR ",
"body": "Sedan",
"licensePlate": "RHJ456",
"vin": "XBGGDFYYREAXVJJJD",
"quality": [
"GENUINO"
],
"currency": "USD",
"items": [
{
"uid": "74355f85-5312-9999-8acd-709ceccda00a",
"name": "Doble cero que es",
"description": "no me interesa",
"quantity": 11
},
{
"uid": "66db6fe0-1044-4d58-8454-5e51ab7a313f",
"name": "El arenero",
"description": "el duende verde",
"quantity": 2
}
]
},
]
}
}
The items field is a JSON type, and when I try to fetch that data using the Ferry package with
client.request(GQuoteRequestsReq((b) => b..vars.page = 0)).listen((response) => print(response.data.toString()));
I get a null result, but if I leave items out of the query, I get a non-null result. I have no problem if I run that query using the http package.
Is this an error in the package, or do I need to configure something?
Thanks.
UPDATE 1
This is definitely an error, because if I do a hot reload in Flutter I get this error in the debug console:
Reloaded 1 of 1700 libraries in 860ms.
E/flutter ( 2286): [ERROR:flutter/lib/ui/ui_dart_state.cc(209)] Unhandled Exception: Deserializing '[__typename, Query, getQuoteRequestsList, [{__typename: GetQuoteRequestsRecor...' to 'GQuoteRequestsData' failed due to: Deserializing '[{__typename: GetQuoteRequestsRecord, uid: 0bf6709f-7ab7-464e-8ee3-6a94e46f05...' to 'BuiltList<GQuoteRequestsData_getQuoteRequestsList>' failed due to: Deserializing '[__typename, GetQuoteRequestsRecord, uid, 0bf6709f-7ab7-464e-8ee3-6a94e46f057...' to 'GQuoteRequestsData_getQuoteRequestsList' failed due to: Deserializing '[{uid: 16870250-5acb-4c23-a7e4-f4e23bbd23ad, name: Doble cero que es, descrip...' to 'GJSON' failed due to: type 'List<dynamic>' is not a subtype of type 'String?' in type cast
E/flutter ( 2286): #0 BuiltJsonSerializers._deserialize
package:built_value/src/built_json_serializers.dart:178
E/flutter ( 2286): #1 BuiltJsonSerializers.deserialize
package:built_value/src/built_json_serializers.dart:124
It's clear that it fetches the expected result but cannot deserialize it properly, and then returns a null result.
Is there any workaround?
UPDATE 2
I've tried this solution but I'm getting the same error.
In my case, I've followed these steps:
installed the built_value package,
edited the build.yaml file,
ran the command flutter packages pub run build_runner build.
The answer can be found in this issue.

Twitter API 2.0 - Unable to fetch user.fields

I am using API version 2.0 and am unable to fetch the user.fields results. All other parameters seem to return results correctly. I'm following this documentation.
url = "https://api.twitter.com/2/tweets/search/all"
query_params = {
"query": "APPL",
"max_results": "10",
"tweet.fields": "created_at,lang,text,author_id",
"user.fields": "name,username,created_at,location",
"expansions": "referenced_tweets.id.author_id",
}
response = requests.request("GET", url, headers=headers, params=query_params).json()
Sample result:
{
'author_id': '1251347502013521925',
'text': 'All conspiracy. But watch for bad news on Apple. Such a vulnerable stocktechnically for the biggest market cap # $2.1T ( Thanks Jay). This is the glue for the bulls. But, they stopped innovating when Steve died, built a fancy office and split the stock. $appl',
'lang': 'en',
'created_at': '2021-06-05T02:33:48.000Z',
'id': '1401004298738311168',
'referenced_tweets': [{
'type': 'retweeted',
'id': '1401004298738311168'
}]
}
As you can see, the following information is not returned: name, username, and location.
Any idea how to retrieve this info?
Your query does actually return the correct data. I tested this myself.
A full example response will be structured like this:
{
"data": [
{
"created_at": "2021-06-05T02:33:48.000Z",
"lang": "en",
"id": "1401004298738311168",
"text": "All conspiracy. But watch for bad news on Apple. Such a vulnerable stocktechnically for the biggest market cap # $2.1T ( Thanks Jay). This is the glue for the bulls. But, they stopped innovating when Steve died, built a fancy office and split the stock. $appl",
"author_id": "1251347502013521925",
"referenced_tweets": [
{
"type": "retweeted",
"id": "1401004298738311168"
}
]
}
],
"includes": {
"users": [
{
"name": "Gary Casper",
"id": "1251347502013521925",
"username": "Hisel1979",
"created_at": "2020-07-11T13:39:58.000Z"
}
]
}
}
The sample result you provided comes from within the data object. However, the expanded object data will be nested in the includes object (in your case name, username, and location). The corresponding user object can be referenced via the author_id field.
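For completeness, a small sketch of joining the two objects (it reuses the response dict from the question's code; location is read defensively because it only appears when the user has set one):

# Index the expanded user objects by id, then look up each tweet's author.
users_by_id = {u["id"]: u for u in response.get("includes", {}).get("users", [])}

for tweet in response.get("data", []):
    author = users_by_id.get(tweet["author_id"], {})
    print(tweet["id"], author.get("name"), author.get("username"), author.get("location"))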

Default datatype mapping in elasticsearch

The following are the contents of config/default_mapping.json:
{
  "_default_": [
    {
      "int_template": {
        "match": "*",
        "match_mapping_type": "int",
        "mapping": {
          "type": "string"
        }
      }
    }
  ]
}
What I want ES to do is pick out all the numbers from my logs and map them as strings.
Use case:
After clearing all indexes with curl -XDELETE 'http://localhost:9200/_all', I run this to send the following to ES (through fluentd's tailf plugin):
echo "{\"this\" : 134}" >> /home/user/logs/program-data/logs/tiger/tiger.log
Elasticsearch happily creates the initial indexes. Now, to test whether my default_mapping works, I send a string as the value where I previously sent an int.
echo "{\"this\" : \"ABC\"}" >> /home/user/logs/program-data/logs/tiger/tiger.log
Exception thrown by ES:
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [this]
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:398)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:618)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:471)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:513)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:457)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:342)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:401)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:155)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)
Caused by: java.lang.NumberFormatException: For input string: "ABC"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:438)
at java.lang.Long.parseLong(Long.java:478)
at org.elasticsearch.common.xcontent.support.AbstractXContentParser.longValue(AbstractXContentParser.java:89)
What could be wrong here?
Update:
My default_mapping.json now looks like this:
{
  "_default_": {
    "dynamic_templates": [
      {
        "string_template": {
          "match": "*",
          "mapping": {
            "type": "string"
          }
        }
      }
    ]
  }
}
First of all, I'd suggest not using file-system-based configuration for mappings. Just do it via the API.
Your mapping is malformed, as you have the type name (_default_) but you don't specify that what you are submitting is a dynamic template.
As for the content, I'd remove that match_mapping_type if you want to map everything as a string.
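Following that advice, a minimal sketch of creating an index with the corrected mapping through the API using Python's requests (the index name tiger is only a placeholder, and the exact endpoint and mapping syntax vary across Elasticsearch versions):

import requests

mapping = {
    "mappings": {
        "_default_": {
            "dynamic_templates": [
                {
                    "string_template": {
                        "match": "*",
                        "mapping": {"type": "string"}
                    }
                }
            ]
        }
    }
}

# Create the index with the default mapping applied, instead of relying on
# config/default_mapping.json.
resp = requests.put("http://localhost:9200/tiger", json=mapping)
print(resp.json())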
