MetaData in header of GeoJSON - geojson

Would it break most readers (and violate the spec) if I added some meta-data at the top of a GeoJSON file (or packet)?
I looked at: https://gis.stackexchange.com/questions/96158/metadata-and-geojson
But I am not clear if that answered my question here.
For example, can I add more properties to the CRS object, other than "name" and "properties", to get some extended meta-data, rather than putting it on each feature?

Section 6.1 of the GeoJSON specification (https://www.rfc-editor.org/rfc/rfc7946) states:
6.1. Foreign Members
Members not described in this specification ("foreign members") MAY be
used in a GeoJSON document. Note that support for foreign members can
vary across implementations, and no normative processing model for
foreign members is defined. Accordingly, implementations that rely
too heavily on the use of foreign members might experience reduced
interoperability with other implementations.
For example, in the (abridged) Feature object shown below
{
"type": "Feature",
"id": "f1",
"geometry": {...},
"properties": {...},
"title": "Example Feature" }
the name/value pair of "title": "Example Feature" is a foreign member.
When the value of a foreign member is an object, all the descendant
members of that object are themselves foreign members.
GeoJSON semantics do not apply to foreign members and their descendants, regardless of their names and values. For example, in
the (abridged) Feature object below
{
"type": "Feature",
"id": "f2",
"geometry": {...},
"properties": {...},
"centerline": {
"type": "LineString",
"coordinates": [
[-170, 10],
[170, 11]
]
} }
the "centerline" member is not a GeoJSON Geometry object.

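I don't know whether it violates the spec, but I did something similar and it did not break the reader.
For example, I had a GeoJSON file with 10 features and wanted to add a timestamp to it. I accomplished this with JavaScript (Node.js):
// Load the existing GeoJSON document (Node parses .json files on require)
var json_in = require('/path/to/file/input.json');
var jsonfile = require('jsonfile');
var timei = "2016-10-31 12Z";
// Attach the timestamp as a top-level foreign member; the member name "timestamp" is illustrative
json_in.timestamp = timei;
var file = '/path/to/file/output.json';
jsonfile.writeFile(file, json_in, function (err) {
  // Report only real failures; err is null on success
  if (err) console.error(err);
});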
I then mapped out the features on http://geojson.io and confirmed that everything looked right.
FYI You can get the jsonfile package (makes I/O much smoother) here:
https://github.com/jprichardson/node-jsonfile
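A self-contained variant of the same idea, without the jsonfile package, is sketched below. It assumes the document is a FeatureCollection and uses a hypothetical top-level "metadata" member; neither that name nor its contents come from the spec. Under Section 6.1 this is a foreign member, so readers that only walk "features" should be unaffected, though support can vary across implementations.

```javascript
// Minimal sketch: attach file-level metadata as a foreign member (RFC 7946, Section 6.1).
// The member name "metadata" and its fields are illustrative assumptions, not part of GeoJSON.
function withMetadata(featureCollection, metadata) {
  // Copy instead of mutating, so the original document is left untouched
  return Object.assign({}, featureCollection, { metadata: metadata });
}

var collection = {
  type: 'FeatureCollection',
  features: [
    {
      type: 'Feature',
      geometry: { type: 'Point', coordinates: [-170, 10] },
      properties: { name: 'example' }
    }
  ]
};

var tagged = withMetadata(collection, { timestamp: '2016-10-31 12Z' });

// A reader that only looks at "type" and "features" sees the same data as before
console.log(JSON.stringify(tagged, null, 2));
```

Keeping the metadata in a single top-level member, rather than on each feature, means it can be stripped again by deleting one key.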

Related

System for data validation and class generation (Avro vs Json Schema vs OpenAPI)

We want a system that lets us define data schemas that we can use to validate our data and to generate code in specific languages. We found JSON Schema, which lets us do something like:
File "message.json.schema"
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
"title": "Message",
"properties": {
"name": {
"type" : "string"
},
"type": {
"$ref": "type/message_type.schema.json"
},
"message_id":{
"$ref": "type/uuid.schema.json"
}
},
"required": ["name", "message_id"]
}
File "message_type.json.schema"
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
"title": "MessageType",
"enum": ["Message", "Query"]
}
File "uuid_type.json.schema"
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
"title": "UUID",
"type": "string",
"pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
}
File "query.json.schema"
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
"title": "Query",
"allOf" : [ {"$ref": "type/message.schema.json" }],
"required": ["type"]
}
Please ignore anything that doesn't make sense; the point is that we really enjoy this system because it allows us to define types, to refer to types that we create in other files, and even to use them for something like type inheritance.
We then want to use these files for code generation and validation. In Python we use a library called python_jsonschema_objects that can parse these files and, recursively, the files they reference, so we can simply create a Python object with all the validation included.
We also want to use them for Java/Kotlin, but the library we found, jsonschema2pojo, doesn't seem able to parse linked files and expects everything to be in the same file.
This leads us to think that, for some reason, JSON Schema is not that well supported or widely used, unfortunately.
So our question is whether a system like Avro or OpenAPI would be better supported and more widely used, and could be chosen for this type of task.

Complex queries in CouchDB across multiple types of documents

I'm relatively new to CouchDB (more specifically Cloudant if it matters) and I'm having a hard time wrapping my head around something.
Assume the following (simplified) document examples:
{ "docType": "school", "_id": "school1", "state": "CA" }
{ "docType": "teacher", "_id": "teacher1", "age": "40", "school": "school1" }
I want to find all the teachers aged $age (eg. 40) in state $state (eg. CA).
Views only consider one document at a time; that is, queries can't directly combine data from different documents. You can query across multiple fields in the same document using Cloudant Query, and you can write a selector directly in the Cloudant dashboard. Assuming both fields are stored on the same document, something like:
"selector": {
"age": {
"$gte": 40
},
"state": {
"$eq": "CA"
}
}
See https://cloud.ibm.com/docs/services/Cloudant/tutorials?topic=cloudant-creating-an-ibm-cloudant-query
with the full reference here: https://cloud.ibm.com/docs/services/Cloudant/tutorials?topic=cloudant-query
You could also use a so-called linked document to emulate basic joins, as outlined in the CouchDB docs https://docs.couchdb.org/en/stable/ddocs/views/joins.html
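Because the example documents keep state on the school and age on the teacher, one way to apply the advice above is to run two queries: find the schools in the state, then find teachers at those schools. The sketch below only builds the two Cloudant Query selectors. The function names are hypothetical, and sending the selectors to the database is left out.

```javascript
// Sketch of a two-step "join" done on the client, using the example documents above.
// schoolSelector and teacherSelector are hypothetical helper names.

// Step 1: selector for all schools in a given state
function schoolSelector(state) {
  return { docType: 'school', state: state };
}

// Step 2: selector for teachers of a given age at any of the schools found in step 1.
// Age is compared as a string because the example document stores "40" as a string.
function teacherSelector(schoolIds, age) {
  return { docType: 'teacher', age: String(age), school: { $in: schoolIds } };
}

var step1 = schoolSelector('CA');
// Assume step 1 returned the single example school document
var step2 = teacherSelector(['school1'], 40);

console.log(JSON.stringify({ selector: step1 }));
console.log(JSON.stringify({ selector: step2 }));
```

This costs two round trips. If that is too slow, copying the state onto each teacher document would allow a single selector.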

How does one parse nested Avro records correctly in NiFi?

I have incoming Avro records that roughly follow the format below. I am able to read them and convert them in existing NiFi flows. However, a recent change requires me to read from these files and parse the nested record, employers in this example. I read the Apache NiFi blog post, Record-Oriented Data with NiFi, but was unable to figure out how to get the AvroRecordReader to parse nested records.
{
"name": "recordFormatName",
"namespace": "nifi.examples",
"type": "record",
"fields": [
{ "name": "id", "type": "int" },
{ "name": "firstName", "type": "string" },
{ "name": "lastName", "type": "string" },
{ "name": "email", "type": "string" },
{ "name": "gender", "type": "string" },
{ "name": "employers",
"type": "record",
"fields": [
{"name": "company", "type": "string"},
{"name": "guid", "type": "string"},
{"name": "streetaddress", "type": "string"},
{"name": "city", "type": "string"}
]}
]
}
What I hope to achieve is a flow that reads the employers records for each recordFormatName record and uses the PutDatabaseRecord processor to keep track of the employers values seen. The current plan is to insert the records into a MySQL database. As suggested in an answer below, I plan on using PartitionRecord to sort the records based on a value in the employers subrecord. I do not need the top-level details for this particular flow.
I have tried to parse with the AvroRecordReader but cannot figure out how to specify the nested records. Is this something that can be accomplished with the AvroRecordReader alone, or does preprocessing, say a JOLT transform, need to happen first?
EDIT: Added further details about database after receiving a response.
What is your target DB and what does your target table look like? PutDatabaseRecord may not be able to handle nested records unless your DB, driver, and target table support them.
Alternatively you may need to use UpdateRecord to flatten the "employers" object into fields at the top level of the record. This is a manual process (until NIFI-4398 is implemented), but you only have 4 fields. After flattening the records, you could use PartitionRecord to get all records with a specific value for, say, employers.company. The outgoing flow files from PartitionRecord would technically constitute the distinct values for the partition field(s). I'm not sure what you're doing with the distinct values, but if you can elaborate I'd be happy to help.
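To make the flattening step concrete, here is a plain JavaScript sketch of the record shape before and after. It is not NiFi configuration: in a flow the same mapping would be expressed as UpdateRecord properties. The employers_ prefix for the flattened field names and the sample values are assumptions chosen for illustration.

```javascript
// Illustration only: lift the four "employers" fields to the top level of a record.
// The "employers_" prefix is a hypothetical naming choice, not something NiFi requires.
function flattenEmployers(record) {
  var flat = {};
  Object.keys(record).forEach(function (key) {
    if (key !== 'employers') {
      flat[key] = record[key]; // keep top-level fields unchanged
    }
  });
  var employers = record.employers || {};
  Object.keys(employers).forEach(function (key) {
    flat['employers_' + key] = employers[key];
  });
  return flat;
}

// Made-up sample record following the schema in the question
var nested = {
  id: 1,
  firstName: 'Ada',
  employers: { company: 'Example Co', guid: 'abc-123', streetaddress: '1 Main St', city: 'Springfield' }
};

var flatRecord = flattenEmployers(nested);
console.log(flatRecord);
```

After this step every field is a simple top-level column, which is the shape a plain MySQL table expects.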

Getting album, album art, and run time info from musicbrainz

Is there any way of getting a list of albums for an artist (band), along with a link to album art and runtime?
I've been given this endpoint, but the data it returns is confusing:
http://musicbrainz.org/ws/2/recording?query=artist:%22Queen%22%20and%20type:album&fmt=json
The data isn't really organized around albums, and the "length" data returns something like 203000. But it's better if you see it in context, so here's the first bit of it (sorry I couldn't get it indented):
{
"created": "2018-02-17T03:47:57.052Z",
"count": 9533710,
"offset": 0,
"recordings": [
{
"id": "c2e919f7-ecb9-4fdf-9162-3c26d0127fa0",
"score": "100",
"title": "Son and Daughter",
"length": 203000,
"video": null,
"artist-credit": [
{
"artist": {
"id": "0383dadf-2a4e-4d10-a46a-e9e041da8eb3",
"name": "Queen",
"sort-name": "Queen",
"disambiguation": "UK rock group",
"aliases": [
{
"sort-name": "Queen + Adam Lambert",
"name": "Queen + Adam Lambert",
"locale": null,
"type": null,
"primary": null,
"begin-date": "2011",
"end-date": null
}
]
}
}
],
"releases": [
{
"id": "bb19abaf-80b3-4a3e-846d-5f12b12af827",
"title": "Queen",
"status": "Official",
"release-group": {
"id": "810068af-2b3c-3e9c-b2ab-68a3f3e3787d",
"primary-type": "Album"
},
"date": "1994",
"country": "NL",
"release-events": [
{
"date": "1994",
"area": {
"id": "ef1b7cc0-cd26-36f4-8ea0-04d9623786c7",
"name": "Netherlands",
"sort-name": "Netherlands",
"iso-3166-1-codes": [
"NL"
]
}
}
],
"track-count": 10,
"media": [
{
"position": 1,
"format": "CD",
"track": [
{
"id": "3a26455e-2660-30dc-a652-6a2b40f1fbe5",
"number": "8",
"title": "Son and Daughter",
"length": 203400
}
],
"track-count": 10,
"track-offset": 7
}
]
},
{
"id": "1783da6a-9315-3602-a488-1738eb733a0f",
"title": "Queen",
"status": "Official",
"release-group": {
"id": "810068af-2b3c-3e9c-b2ab-68a3f3e3787d",
"primary-type": "Album"
},
"date": "1973-09-04",
"country": "US",
"release-events": [
{
"date": "1973-09-04",
"area": {
"id": "489ce91b-6658-3307-9877-795b68554c98",
"name": "United States",
"sort-name": "United States",
"iso-3166-1-codes": [
"US"
]
}
}
],
If someone can explain this data to me, then I don't need another endpoint. But I've been hunting around the musicbrainz docs and they're not super helpful.
Preferably it would be with one call, but I can do successive calls if necessary.
Thanks for your help.
First off:
Is there any way of getting a list of albums for an artist (band), along with a link to album art and runtime?
Yes, definitely.
First you will want to find the artist, say, the Queen that did Bohemian Rhapsody. They're identified with MusicBrainz Artist ID "0383dadf-2a4e-4d10-a46a-e9e041da8eb3", so you can do a browse request for Releases by this artist: https://musicbrainz.org/ws/2/release/?artist=0383dadf-2a4e-4d10-a46a-e9e041da8eb3&inc=recordings&fmt=json (note the inc=recordings)
This gives you most of what you are asking for. A list of releases and their runtime—kind of. Each Release should have one or more medium properties that in turn have a track-list with a number of tracks. The sum of the length of each of these tracks is what makes up the runtime (the length is given in milliseconds).
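As a rough sketch of the runtime calculation described above, the function below sums track lengths over all media of one release and formats the total. It assumes the JSON response exposes media[].tracks[].length in milliseconds. Check the field names against an actual response, since the search output quoted in the question uses a slightly different shape. The sample release object is made up.

```javascript
// Sum track lengths (milliseconds) across every medium of one release.
// Assumed shape: release.media[].tracks[].length; tracks without a length count as 0.
function releaseRuntimeMs(release) {
  var total = 0;
  (release.media || []).forEach(function (medium) {
    (medium.tracks || []).forEach(function (track) {
      total += track.length || 0;
    });
  });
  return total;
}

// Format milliseconds as m:ss, rounding down to whole seconds
function formatRuntime(ms) {
  var totalSeconds = Math.floor(ms / 1000);
  var minutes = Math.floor(totalSeconds / 60);
  var seconds = totalSeconds % 60;
  return minutes + ':' + (seconds < 10 ? '0' : '') + seconds;
}

// Made-up sample with two tracks; 203400 is the track length quoted in the question
var sampleRelease = {
  media: [{ tracks: [{ length: 203400 }, { length: 180000 }] }]
};

var runtimeMs = releaseRuntimeMs(sampleRelease);
console.log(formatRuntime(runtimeMs));
```

The same formatter applied to the 203000 from the question gives 3:23, which is a plausible length for a single song.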
For cover art, you may notice that the output has a cover-art-archive property. MusicBrainz uses the Cover Art Archive (CAA), which uses MusicBrainz IDs as identifiers. The cover-art-archive property states whether any cover art exists in the CAA, plus a few details: whether the CAA has any images at all (artwork), whether it has a back image (back) and/or a front image (front), and how many images there are in total for the release (count). If cover-art-archive→artwork is true, we can go on and fetch cover art from the CAA. The CAA's API is really simple: to get the "front" image of a release, say the 1974 UK single "Killer Queen" that has MusicBrainz Release ID "a2d12ee8-9aeb-4d91-bfab-5c21f7a577fc", you can simply do https://coverartarchive.org/release/a2d12ee8-9aeb-4d91-bfab-5c21f7a577fc/front
You can also do https://coverartarchive.org/release/a2d12ee8-9aeb-4d91-bfab-5c21f7a577fc to get a JSON document with more details about what cover art images are available (e.g., this one has two images: one Front+Medium and one Back+Medium image).
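The two Cover Art Archive URLs above follow one pattern, sketched here. The helper name is hypothetical; only the URL shapes come from the examples above.

```javascript
// Build Cover Art Archive URLs for a MusicBrainz Release ID.
// coverArtUrl is a hypothetical helper; the URL patterns are the ones shown above.
function coverArtUrl(releaseId, kind) {
  var base = 'https://coverartarchive.org/release/' + encodeURIComponent(releaseId);
  // Without a kind, the URL points at the JSON listing; with "front", at the front image
  return kind ? base + '/' + kind : base;
}

var killerQueen = 'a2d12ee8-9aeb-4d91-bfab-5c21f7a577fc';
console.log(coverArtUrl(killerQueen, 'front'));
console.log(coverArtUrl(killerQueen));
```

It is probably worth checking cover-art-archive→front on the release before requesting the image, so that releases without a front image are skipped.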
The Cover Art Archive API is documented at https://musicbrainz.org/doc/Cover_Art_Archive/API and the MusicBrainz API/web service documentation can be found at https://musicbrainz.org/doc/Development/XML_Web_Service/Version_2
Note that using browse requests you can page through the results using offset and vary the amount of results per query using limit, see the "Paging" section under the browse request section in the MusicBrainz WS documentation.
Secondly: Though you don't ask about this directly, you're using a search query using a generic term in your question, so I thought I'd talk about this for a bit. In MusicBrainz everything is identified using MusicBrainz identifiers (IDs). (I kind of mentioned them in the first section too.)
The reason for this is that many, many names are not unique. There are as of this writing three unique artists known as "Queen" in MusicBrainz: https://musicbrainz.org/search?query=%22queen%22&type=artist&method=advanced – not counting any of the 321 other artists that have "queen" as part of their name. Without more information, it is not possible for MusicBrainz to know which of them you want to find out information from, so your first step will likely be to somehow either narrow the search (e.g., add type:group narrows the search to 123 results, using country:gb limits to 21 results, doing both gives 11 results (see the search syntax documentation for more details)) or somehow filter afterwards.
Once you've narrowed it down to the specific artist you want, you can continue with the steps outlined above to get the details you want. The steps for narrowing it down will depend on your specific application/use case.
Finally: You seem to be missing some understanding, at an abstract level, of how MusicBrainz's data is structured. E.g., all of the above assumes that by album you mean a specific released version like the 1974 UK "Killer Queen" single, and not a more generic concept of a release like any version of the "Killer Queen" single, which in MusicBrainz terminology would be a Release Group.
https://musicbrainz.org/doc/MusicBrainz_Entity is a list of entities used in MusicBrainz. Understanding the differences between a Release Group and a Release as well as between Tracks and Recordings (and Works) will put you in a much better position to effectively use the web service and the MusicBrainz data in general.
https://musicbrainz.org/doc/MusicBrainz_Database/Schema is an introduction to how MusicBrainz is structured. Knowing how artist credits, ("advanced") relationships, and mediums play into things is also likely to save you a lot of headache later.
To understand the format of the data returned, copy the result into a JSON formatting service such as https://jsonformatter.curiousconcept.com/
You will then realise that the returned data contains multiple artists, which is why it's not as simple as "albums by artist".
I’m guessing the "length" data is in milliseconds.

Nested query parameters in Swagger 2.0

I'm documenting a Rails app with Swagger 2.0 and using Swagger-UI as the human-readable documentation/sandbox solution.
I have a resource where clients can store arbitrary metadata to query later. According to the Rails convention, the query would be submitted like so:
/posts?metadata[thing1]=abc&metadata[thing2]=def
which Rails translates to params of:
{ "metadata" => { "thing1" => "abc", "thing2" => "def" } }
which can easily be used to generate the appropriate WHERE clause for the database.
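For reference, the bracketed query string above can be produced from a plain object as in the sketch below. The function name is hypothetical. It encodes keys and values but leaves the brackets literal so the output matches the example, and it handles one level of nesting only, as in the example.

```javascript
// Build a Rails-style nested query string such as metadata[thing1]=abc.
// nestedQuery is a hypothetical helper; it handles one level of nesting only.
function nestedQuery(namespace, values) {
  return Object.keys(values)
    .map(function (key) {
      return namespace + '[' + encodeURIComponent(key) + ']=' + encodeURIComponent(values[key]);
    })
    .join('&');
}

var query = nestedQuery('metadata', { thing1: 'abc', thing2: 'def' });
console.log('/posts?' + query);
```

The keys under the namespace are arbitrary here, and that arbitrariness is the part a fixed parameter list in an API description has trouble expressing.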
Is there any support for something like this in Swagger? I want to ultimately have Swagger-UI give some way to modify the generated request to add on arbitrary params under the metadata namespace.
This doesn't appear supported yet (over 2 years after you asked the question), but there's an ongoing discussion & open ticket about adding support for this on the OpenAPI github repo. They refer to this type of nesting as deepObjects.
There's another open issue where an implementation was attempted here. Using the most recent stable swagger-ui release, however, I have observed it working as I expect:
"parameters": [
{
"name": "page[number]",
"in": "query",
"type": "integer",
"default": 1,
"required": false
},
{
"name": "page[size]",
"in": "query",
"type": "integer",
"default": 25,
"required": false
}
]
This presents the expected dialog box & works with Try it out against a working server.
I don't believe there is a good way to specify arbitrary keys or a selection of values (e.g. an enum), so you may have to add a parameter for every nesting option.
