How to parse JSON to an entity that has a relationship to itself using Sync? - ios

I am using Sync trying to parse some JSON to Core Data.
My "Creature" entity has a parent-children relationship that looks like this:
and the JSON has a format similar to this:
[
{
"id": 1,
"name": "Mad king",
"parent": null,
"children": [
5
]
},
{
"id": 2,
"name": "Drogon",
"parent": 5,
"children": []
},
{
"id": 3,
"name": "Rhaegal",
"parent": 5,
"children": []
},
{
"id": 4,
"name": "Viserion",
"parent": 5,
"children": []
},
{
"id": 5,
"name": "Daenerys",
"parent": 1,
"children": [
2,
3,
4
]
}
]
The Mad king has one child Daenerys who has 3 children (Drogon, Rhaegal and Viserion).
Now, I know that Sync does support this sort of setup (where the JSON contains only the ids of parents/children instead of whole objects), and I suspect I have to parse the file twice: once just to get all the objects, and a second time to create the relationships among them. For the second pass to work, I need to rename children to children_ids and parent to parent_id (as described in their README).
However, I can't work out exactly how I would do that. Is it possible to ignore the parent/children during the first pass and then take them into account (using the modified keys) during the second?
Or could someone maybe propose a better solution that would (ideally) require just one pass?

According to the documentation:
For example, in the one-to-many example, you have a user, that has
many notes. If you already have synced all the notes then your JSON
would only need the notes_ids, this can be an array of strings or
integers. As a side-note only do this if you are 100% sure that all
the required items (notes) have been synced, otherwise this
relationships will get ignored and an error will be logged.
So you can, in theory, just blindly perform a full sync to get all the models (letting it fail on the relationships), and then sync again immediately afterwards to get the relationships.
If you want to avoid the errors, you might want to write some helper functions to create 2 sets of JSON for these models, 1 to define the objects, and then a second to define the relationships. Either way, you'd need to do 2 passes.
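The helper idea above can be sketched roughly as follows. This is only an illustration of how the two JSON payloads could be prepared, written in Python for brevity rather than Swift. The function name split_for_two_passes is made up, and I am assuming the second payload may repeat the plain attributes alongside the renamed parent_id / children_ids keys from the question.

```python
import json

# Mapping from the keys in the original JSON to the id-based keys
# the question says Sync expects (parent_id / children_ids).
RELATIONSHIP_KEYS = {"parent": "parent_id", "children": "children_ids"}


def split_for_two_passes(creatures):
    """Return (objects_pass, relationships_pass) built from the original list."""
    # Pass 1: drop the relationship keys so every object is created
    # without any relationship lookup that could fail.
    objects_pass = [
        {key: value for key, value in creature.items() if key not in RELATIONSHIP_KEYS}
        for creature in creatures
    ]
    # Pass 2: the same objects, with the relationship keys renamed.
    relationships_pass = [
        {RELATIONSHIP_KEYS.get(key, key): value for key, value in creature.items()}
        for creature in creatures
    ]
    return objects_pass, relationships_pass


if __name__ == "__main__":
    raw = """[
      {"id": 1, "name": "Mad king", "parent": null, "children": [5]},
      {"id": 5, "name": "Daenerys", "parent": 1, "children": [2, 3, 4]}
    ]"""
    first, second = split_for_two_passes(json.loads(raw))
    print(json.dumps(first))
    print(json.dumps(second))
```

You would then hand the first payload to Sync, wait for it to finish, and hand it the second one.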

Related

Exact change GLAccount for BankEntryLine

Currently we import our bank transactions. Through the REST API I read all these transactions and try to match them to our internal invoices.
If I find a match, I need to change the GLAccountCode (for example, from 1000 to 2000) for this particular BankEntryLine. As far as I can see, BankEntryLine only supports GET and POST; there is no PUT method.
Is there something wrong with my approach? Like do I have to create something else that reconciles this transaction or is there a different way of updating this transaction line?
Example BankEntryLine:
{
"d": {
"__metadata": {
"uri": "https://start.exactonline.nl/api/v1/000000/financialtransaction/BankEntryLines(guid'123000000-0000-0000-0000-000000000000')",
"type": "Exact.Web.Api.Models.Financial.BankEntryLine"
},
"Document": "00000000-0000-0000-0000-000000000000",
"DocumentNumber": 00000,
"EntryID": "00000000-0000-0000-0000-000000000000",
"EntryNumber": 00000000,
"ExchangeRate": 1,
"GLAccount": "100000000-0000-0000-0000-000000000000",
"GLAccountCode": "1000",
"ID": "123000000-0000-0000-0000-000000000000",
"LineNumber": 1,
"OffsetID": "000000000-0000-0000-0000-000000000000",
"OurRef": null,
"Project": null,
"ProjectCode": null,
"ProjectDescription": null,
"Quantity": null,
"VATCode": "4 "
}
}
API documentation: https://start.exactonline.nl/docs/HlpRestAPIResources.aspx?SourceAction=10
BankEntryLine: https://start.exactonline.nl/docs/HlpRestAPIResourcesDetails.aspx?name=FinancialTransactionBankEntryLines
There is no PUT or DELETE available for this API, and I don't see another direct way to update or delete those lines.
The only possible workaround is to make a general journal entry that moves the amount from that suspense GL account to the one you need. But that will give you more entries and more lines to match.

How does one parse nested Avro records correctly in NiFi?

I have incoming Avro records that roughly follow the format below. I am able to read them and convert them in existing NiFi flows. However, a recent change requires me to read from these files and parse the nested record, employers in this example. I read the Apache NiFi blog post, Record-Oriented Data with NiFi, but was unable to figure out how to get the AvroRecordReader to parse nested records.
{
"name": "recordFormatName",
"namespace": "nifi.examples",
"type": "record",
"fields": [
{ "name": "id", "type": "int" },
{ "name": "firstName", "type": "string" },
{ "name": "lastName", "type": "string" },
{ "name": "email", "type": "string" },
{ "name": "gender", "type": "string" },
{ "name": "employers",
"type": "record",
"fields": [
{"name": "company", "type": "string"},
{"name": "guid", "type": "string"},
{"name": "streetaddress", "type": "string"},
{"name": "city", "type": "string"}
]}
]
}
What I hope to achieve is a flow to read the employers records for each recordFormatName record and use the PutDatabaseRecord processor to keep track of the employers values seen. The current plan is to insert the records to a MySQL database. As suggested in an answer below, I plan on using PartitionRecord to sort the records based on a value in the employers subrecord. I do not need the top level details for this particular flow.
I have tried to parse with the AvroRecordReader but cannot figure out how to specify the nested records. Is this something that can be accomplished with the AvroRecordReader alone, or does preprocessing, say a JOLT transform, need to happen first?
EDIT: Added further details about database after receiving a response.
What is your target DB and what does your target table look like? PutDatabaseRecord may not be able to handle nested records unless your DB, driver, and target table support them.
Alternatively you may need to use UpdateRecord to flatten the "employers" object into fields at the top level of the record. This is a manual process (until NIFI-4398 is implemented), but you only have 4 fields. After flattening the records, you could use PartitionRecord to get all records with a specific value for, say, employers.company. The outgoing flow files from PartitionRecord would technically constitute the distinct values for the partition field(s). I'm not sure what you're doing with the distinct values, but if you can elaborate I'd be happy to help.
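To make the flatten-then-partition idea concrete, here is a small Python sketch of the data transformation only. It is not NiFi configuration: the function names and the employers_ prefix for the flattened field names are illustrative assumptions. In the flow itself, UpdateRecord and PartitionRecord would do this work.

```python
from collections import defaultdict


def flatten_employers(record):
    """Copy the fields of the nested 'employers' record to the top level."""
    flat = {key: value for key, value in record.items() if key != "employers"}
    for key, value in record.get("employers", {}).items():
        # The 'employers_' prefix is an assumption made for this sketch.
        flat[f"employers_{key}"] = value
    return flat


def partition_by(records, field):
    """Group flat records by the value of one field, like a partition step."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record)
    return dict(groups)


if __name__ == "__main__":
    records = [
        {"id": 1, "firstName": "A", "employers": {"company": "Acme", "city": "X"}},
        {"id": 2, "firstName": "B", "employers": {"company": "Acme", "city": "Y"}},
    ]
    flat = [flatten_employers(r) for r in records]
    print(sorted(partition_by(flat, "employers_company")))
```

Each key of the resulting dictionary corresponds to one distinct value of the partition field, which is what the outgoing flow files would represent.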

Getting album, album art, and run time info from musicbrainz

Is there any way of getting a list of albums for an artist (band), along with a link to album art and runtime?
I've been given this endpoint, but the data it returns is confusing:
http://musicbrainz.org/ws/2/recording?query=artist:%22Queen%22%20and%20type:album&fmt=json
The data isn't really organized around albums, and the "length" data returns something like 203000. But it's better if you see it in context, so here's the first bit of it (sorry I couldn't get it indented):
{
"created": "2018-02-17T03:47:57.052Z",
"count": 9533710,
"offset": 0,
"recordings": [
{
"id": "c2e919f7-ecb9-4fdf-9162-3c26d0127fa0",
"score": "100",
"title": "Son and Daughter",
"length": 203000,
"video": null,
"artist-credit": [
{
"artist": {
"id": "0383dadf-2a4e-4d10-a46a-e9e041da8eb3",
"name": "Queen",
"sort-name": "Queen",
"disambiguation": "UK rock group",
"aliases": [
{
"sort-name": "Queen + Adam Lambert",
"name": "Queen + Adam Lambert",
"locale": null,
"type": null,
"primary": null,
"begin-date": "2011",
"end-date": null
}
]
}
}
],
"releases": [
{
"id": "bb19abaf-80b3-4a3e-846d-5f12b12af827",
"title": "Queen",
"status": "Official",
"release-group": {
"id": "810068af-2b3c-3e9c-b2ab-68a3f3e3787d",
"primary-type": "Album"
},
"date": "1994",
"country": "NL",
"release-events": [
{
"date": "1994",
"area": {
"id": "ef1b7cc0-cd26-36f4-8ea0-04d9623786c7",
"name": "Netherlands",
"sort-name": "Netherlands",
"iso-3166-1-codes": [
"NL"
]
}
}
],
"track-count": 10,
"media": [
{
"position": 1,
"format": "CD",
"track": [
{
"id": "3a26455e-2660-30dc-a652-6a2b40f1fbe5",
"number": "8",
"title": "Son and Daughter",
"length": 203400
}
],
"track-count": 10,
"track-offset": 7
}
]
},
{
"id": "1783da6a-9315-3602-a488-1738eb733a0f",
"title": "Queen",
"status": "Official",
"release-group": {
"id": "810068af-2b3c-3e9c-b2ab-68a3f3e3787d",
"primary-type": "Album"
},
"date": "1973-09-04",
"country": "US",
"release-events": [
{
"date": "1973-09-04",
"area": {
"id": "489ce91b-6658-3307-9877-795b68554c98",
"name": "United States",
"sort-name": "United States",
"iso-3166-1-codes": [
"US"
]
}
}
],
If someone can explain this data to me, then I don't need another endpoint. But I've been hunting around the musicbrainz docs and they're not super helpful.
Preferably it would be with one call, but I can do successive calls if necessary.
Thanks for your help.
First off:
Is there any way of getting a list of albums for an artist (band), along with a link to album art and runtime?
Yes, definitely.
First you will want to find the artist, say, the Queen that did Bohemian Rhapsody. They're identified with MusicBrainz Artist ID "0383dadf-2a4e-4d10-a46a-e9e041da8eb3", so you can do a browse request for Releases by this artist: https://musicbrainz.org/ws/2/release/?artist=0383dadf-2a4e-4d10-a46a-e9e041da8eb3&inc=recordings&fmt=json (note the inc=recordings)
This gives you most of what you are asking for. A list of releases and their runtime—kind of. Each Release should have one or more medium properties that in turn have a track-list with a number of tracks. The sum of the length of each of these tracks is what makes up the runtime (the length is given in milliseconds).
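As a rough sketch of that summing step in Python: I am assuming here that, in the JSON output, each release carries a media list whose entries have a tracks list with a length in milliseconds. Check those key names against the actual response before relying on them. Both function names are made up for the example.

```python
def release_runtime_ms(release):
    """Sum the track lengths (milliseconds) over all media of one release."""
    total = 0
    for medium in release.get("media", []):
        for track in medium.get("tracks", []):
            # A track may have no known length; count it as 0.
            total += track.get("length") or 0
    return total


def format_runtime(ms):
    """Render milliseconds as minutes:seconds, dropping the fraction."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes}:{seconds:02d}"


if __name__ == "__main__":
    sample = {
        "title": "Example release",
        "media": [{"tracks": [{"length": 203400}, {"length": None}, {"length": 100000}]}],
    }
    print(format_runtime(release_runtime_ms(sample)))
```

With this, the 203000 from the question works out to 3:23.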
For cover art, you may notice that the output has a cover-art-archive property. For cover art, MusicBrainz uses Cover Art Archive which uses MusicBrainz IDs as identifiers. The cover-art-archive attribute states whether any cover art exists in Cover Art Archive and a few details about this—e.g., does CAA have any images at all (artwork)? Does it have a back image (back) and/or a front image (front)? How many images are there in all for the release (count)? If the cover-art-archive→artwork is true, we can go on and fetch cover art from the CAA. The CAA's API is really simple: to get the "front" image of a release, say the 1974 UK single "Killer Queen" that has MusicBrainz Release ID "a2d12ee8-9aeb-4d91-bfab-5c21f7a577fc", you can simply do https://coverartarchive.org/release/a2d12ee8-9aeb-4d91-bfab-5c21f7a577fc/front
You can also do https://coverartarchive.org/release/a2d12ee8-9aeb-4d91-bfab-5c21f7a577fc to get a JSON document with more details about what cover art images are available (e.g., this one has two images: one Front+Medium and one Back+Medium image).
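Put together, the check and the URL could look something like this in Python. It is a sketch that only builds the URL and does no HTTP request. The function name is made up, and the release is assumed to be a dictionary parsed from the web service's JSON output.

```python
def front_cover_url(release):
    """Return the Cover Art Archive 'front' URL, or None if no front image is listed."""
    caa = release.get("cover-art-archive") or {}
    if caa.get("artwork") and caa.get("front"):
        return f"https://coverartarchive.org/release/{release['id']}/front"
    return None


if __name__ == "__main__":
    release = {
        "id": "a2d12ee8-9aeb-4d91-bfab-5c21f7a577fc",
        "cover-art-archive": {"artwork": True, "front": True, "back": True, "count": 2},
    }
    print(front_cover_url(release))
```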
The Cover Art Archive API is documented at https://musicbrainz.org/doc/Cover_Art_Archive/API and the MusicBrainz API/web service documentation can be found at https://musicbrainz.org/doc/Development/XML_Web_Service/Version_2
Note that using browse requests you can page through the results using offset and vary the amount of results per query using limit, see the "Paging" section under the browse request section in the MusicBrainz WS documentation.
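For illustration, the browse URL with paging parameters could be built like this in Python. The function name and the default values are assumptions made for the sketch; see the WS documentation for the limits that actually apply.

```python
from urllib.parse import urlencode


def browse_releases_url(artist_mbid, limit=25, offset=0):
    """Build a browse request URL for one page of an artist's releases."""
    params = {
        "artist": artist_mbid,
        "inc": "recordings",
        "fmt": "json",
        "limit": limit,
        "offset": offset,
    }
    return "https://musicbrainz.org/ws/2/release/?" + urlencode(params)


if __name__ == "__main__":
    queen = "0383dadf-2a4e-4d10-a46a-e9e041da8eb3"
    # Second page when fetching 25 releases per request.
    print(browse_releases_url(queen, limit=25, offset=25))
```

To fetch everything, keep increasing offset by limit until a page comes back with fewer releases than you asked for.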
Secondly: Though you don't ask about this directly, you're using a search query using a generic term in your question, so I thought I'd talk about this for a bit. In MusicBrainz everything is identified using MusicBrainz identifiers (IDs). (I kind of mentioned them in the first section too.)
The reason for this is that many, many names are not unique. There are as of this writing three unique artists known as "Queen" in MusicBrainz: https://musicbrainz.org/search?query=%22queen%22&type=artist&method=advanced – not counting any of the 321 other artists that have "queen" as part of their name. Without more information, it is not possible for MusicBrainz to know which of them you want to find out information from, so your first step will likely be to somehow either narrow the search (e.g., add type:group narrows the search to 123 results, using country:gb limits to 21 results, doing both gives 11 results (see the search syntax documentation for more details)) or somehow filter afterwards.
Once you've narrowed it down to the specific artist you want, you can continue with the steps outlined above to get the details you want. The steps for narrowing it down will depend on your specific application/use case.
Finally: You seem to be missing some understanding at the abstract level of how MusicBrainz's data is structured. E.g., all of the above is assuming that by album you mean a specific released version like the 1974 UK "Killer Queen" single, and not a more generic concept of a release like any version of the "Killer Queen" single, which in MusicBrainz terminology would be a Release Group.
https://musicbrainz.org/doc/MusicBrainz_Entity is a list of entities used in MusicBrainz. Understanding the differences between a Release Group and a Release as well as between Tracks and Recordings (and Works) will put you in a much better position to effectively use the web service and the MusicBrainz data in general.
https://musicbrainz.org/doc/MusicBrainz_Database/Schema is an introduction to how MusicBrainz is structured. Knowing how artist credits, ("advanced") relationships, and mediums play into things is also likely to save you a lot of headache later.
You need to understand the format of the data returned. Copy the result into a JSON formatting service such as https://jsonformatter.curiousconcept.com/
You will then realise that the returned data contains multiple artists, which is why it's not as simple as "albums by artist".
I’m guessing the "length" data is in milliseconds.

Improving performance of a JSON with many arrays

I'm currently developing a SPA (shift planner) using Rails 5 in API mode and AngularJS.
The problem is it takes the browser 1 second to display the plain JSON.
The structure of the JSON looks like this:
[
{
"id": 2,
"name": "person0",
"skills": [
{
"skill_id": 3,
"person_id": 2,
"name": "skill1"
},
{
"skill_id": 6,
"person_id": 2,
"name": "skill4"
}
],
"roles": [
{
"name": "role4",
"id": 5,
"person_id": 2
},
{
"name": "role8",
"id": 9,
"person_id": 2
}
],
"languages": [
{
"name": "language1",
"id": 2,
"person_id": 2
}
],
"shifts": [
{
"date_of_shift": "2016-02-29T00:00:00+00:00",
"shift_id": 1011,
"shift_type_id": 1,
"name": "shift_type0"
},
{
"date_of_shift": "2016-03-01T00:00:00+00:00",
"shift_id": 1012,
"shift_type_id": 2,
"name": "shift_type1"
},
{
"date_of_shift": "2016-03-02T00:00:00+00:00",
"shift_id": 1013,
"shift_type_id": 4,
"name": "shift_type3"
},
{
"date_of_shift": "2016-03-03T00:00:00+00:00",
"shift_id": 1014,
"shift_type_id": 8,
"name": "shift_type7"
},
{
"date_of_shift": "2016-03-04T00:00:00+00:00",
"shift_id": 1015,
"shift_type_id": 1,
"name": "shift_type0"
}
]
},
So each person has about 40 elements and I have 50 people (for the department I want to display) in my test data, which leads to 2000 json elements.
So I did some research how fast it is to display a json and it looks like even 4 years ago it was way faster than what I experience.
This is the output of the Rails server for the request:
Completed 200 OK in 913ms (Views: 18.8ms | ActiveRecord: 73.3ms)
When I'm getting all people in the db (2000) via the API as JSON, it only takes 300ms to finish displaying the JSON.
Server output:
Completed 200 OK in 335ms (Views: 329.8ms | ActiveRecord: 4.6ms)
Here it takes far more time to render the view on the server side, but the time spent in Active Record and the view adds up to the total time, whereas in the other API call Active Record and the view only take about 100ms and the remaining 800ms are spent doing something else.
The people json would look something like this:
[
{
"id": 1,
"name": "jonny",
"department_id": 1,
"created_at": "2016-02-04T13:33:34.357Z",
"updated_at": "2016-02-04T13:33:34.357Z"
},
only 2000 times.
I get the data using a psql query which gives me exactly the data I need. I don't do anything else with it in rails.
This behavior is very confusing to me. Can someone explain the differences and why they happen and hopefully how to improve the performance in the first case.
edit: I tried removing the person_id from skills, roles and languages array but it had almost no impact on the performance.
From what I see, it seems you are collecting data from multiple models: skills, languages, shifts and so on. What does your controller look like? The Active Record call takes almost 20 times as long as the one for people. Did you use joins? http://edgeguides.rubyonrails.org/active_record_querying.html#joining-tables
If not, I would say that is probably the issue, as Active Record will make a separate DB query for every item.
I discovered a bug in Rails, which I described here.
Formatting the query makes it way slower. My query now takes only 160ms instead of 900ms, just because I removed all unnecessary spaces.

Solr CollapsingQParserPlugin with group.facet=on style facet counts

I have a Solr index of about 5 million documents at 8GB using Solr 4.7.0. I require grouping in Solr, but find it to be too slow. Here is the group configuration:
group=on
group.facet=on
group.field=workId
group.ngroups=on
The machine has ample memory at 24GB and 4GB is allocated to Solr itself. Queries are generally taking about 1200ms compared to 90ms when grouping is turned off.
I ran across a plugin called CollapsingQParserPlugin which uses a filter query to remove all but one of a group.
fq={!collapse field=workId}
It's designed for indexes that have a lot of unique groups. I have about 3.8 million. This approach is much much faster at about 120ms. It's a beautiful solution for me except for one thing. Because it filters out other members of the group, only facets from the representative document are counted. For instance, if I have the following three documents:
"docs": [
{
"id": "1",
"workId": "abc",
"type": "book"
},
{
"id": "2",
"workId": "abc",
"type": "ebook"
},
{
"id": "3",
"workId": "abc",
"type": "ebook"
}
]
once collapsed, only the top one shows up in the results. Because the other two get filtered out, the facet counts look like
"type": ["book":1]
instead of
"type": ["book":1, "ebook":1]
Is there a way to get group.facet counts using the collapse filter query?
According to Yonik Seeley, the correct group facet counts can be gathered using the JSON Facet API. His comments can be found at:
https://issues.apache.org/jira/browse/SOLR-7036?focusedCommentId=15601789&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15601789
I tested out his method and it works great. I still use the CollapsingQParserPlugin to collapse the results, but I exclude the filter when counting up the facets like so:
fq={!tag=workId}{!collapse field=workId}
json.facet={
type: {
type: terms,
field: type,
facet: {
workCount: "unique(workId)"
},
domain: {
excludeTags: [workId]
}
}
}
And the result:
{
"facets": {
"count": 3,
"type": {
"buckets": [
{
"val": "ebook",
"count": 2,
"workCount": 1
},
{
"val": "book",
"count": 1,
"workCount": 1
}
]
}
}
}
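If you want the per-group counts as a simple mapping, the buckets can be reduced like this. This is a Python sketch that assumes the response has already been parsed into a dictionary shaped like the result above. The function name is made up, and workCount is the name I gave the unique(workId) sub-facet in my request.

```python
def group_facet_counts(response, facet_name, count_key="workCount"):
    """Map each facet value to its number of distinct groups."""
    buckets = response["facets"][facet_name]["buckets"]
    return {bucket["val"]: bucket[count_key] for bucket in buckets}


if __name__ == "__main__":
    response = {
        "facets": {
            "count": 3,
            "type": {
                "buckets": [
                    {"val": "ebook", "count": 2, "workCount": 1},
                    {"val": "book", "count": 1, "workCount": 1},
                ]
            },
        }
    }
    print(group_facet_counts(response, "type"))
```

That gives one work for book and one for ebook, which is the grouped count the question asked for.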
I was unable to find a way to do this with Solr or plugin configuration, so I developed a workaround that effectively creates group facet counts while still using the CollapsingQParserPlugin.
I do this by making a duplicate of the fields I'll be faceting on and making sure all facet values for the entire group are in each document like so:
"docs": [
{
"id": "1",
"workId": "abc",
"type": "book",
"facetType": [
"book",
"ebook"
]
},
{
"id": "2",
"workId": "abc",
"type": "ebook",
"facetType": [
"book",
"ebook"
]
},
{
"id": "3",
"workId": "abc",
"type": "ebook",
"facetType": [
"book",
"ebook"
]
}
]
When I ask Solr to generate facet counts, I use the new field:
facet.field=facetType
This ensures that all facet values are accounted for and that the counts represent groups. But when I use a filter query, I revert back to using the old field:
fq=type:book
This way the correct document is chosen to represent the group.
I know this is a dirty, complex way to make it work, but it does work and that's what I needed. Also it requires the ability to query your documents before insertion into Solr, which calls for some development. If anyone has a simpler solution I would still love to hear it.
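The pre-insertion step can be sketched like this in Python. It assumes all documents of a group are available in memory before indexing. The function name and its parameters are made up for the example, with defaults matching the field names used above.

```python
from collections import defaultdict


def add_group_facet_values(docs, group_field="workId",
                           source_field="type", facet_field="facetType"):
    """Give every document the full set of facet values found in its group."""
    values_by_group = defaultdict(set)
    for doc in docs:
        values_by_group[doc[group_field]].add(doc[source_field])
    # Return copies so the input documents are left untouched.
    return [
        {**doc, facet_field: sorted(values_by_group[doc[group_field]])}
        for doc in docs
    ]


if __name__ == "__main__":
    docs = [
        {"id": "1", "workId": "abc", "type": "book"},
        {"id": "2", "workId": "abc", "type": "ebook"},
        {"id": "3", "workId": "abc", "type": "ebook"},
    ]
    for doc in add_group_facet_values(docs):
        print(doc)
```

Whenever a document is added to or removed from a group, the other documents in that group need to be reindexed too, which is part of why this approach is more work than the JSON Facet API route.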
