Complex queries in CouchDB across multiple types of documents - join

I'm relatively new to CouchDB (more specifically Cloudant if it matters) and I'm having a hard time wrapping my head around something.
Assume the following (simplified) document examples:
{ "docType": "school", "_id": "school1", "state": "CA" }
{ "docType": "teacher", "_id": "teacher1", "age": "40", "school": "school1" }
I want to find all the teachers aged $age (eg. 40) in state $state (eg. CA).

Views only consider one document at a time; that is queries can't directly combine data from different documents. You can query across multiple fields in the same document using Cloudant Query. You can write a selector directly in the Cloudant dashboard. Something like
"selector": {
"age": {
"$gte": 40
},
"state": {
"$eq": "CA"
}
}
See https://cloud.ibm.com/docs/services/Cloudant/tutorials?topic=cloudant-creating-an-ibm-cloudant-query
with the full reference here: https://cloud.ibm.com/docs/services/Cloudant/tutorials?topic=cloudant-query
You could also use a so-called linked document to emulate basic joins, as outlined in the CouchDB docs https://docs.couchdb.org/en/stable/ddocs/views/joins.html

Related

Getting album, album art, and run time info from musicbrainz

Is there any way of getting a list of albums for an artist (band), along with a link to album art and runtime?
I've been given this endpoint, but the data it returns is confusing:
http://musicbrainz.org/ws/2/recording?query=artist:%22Queen%22%20and%20type:album&fmt=json
The data isn't really organized around albums, and the "length" data returns something like 203000. But it's better if you see it in context, so here's the first bit of it (sorry I couldn't get it indented):
{
"created": "2018-02-17T03:47:57.052Z",
"count": 9533710,
"offset": 0,
"recordings": [
{
"id": "c2e919f7-ecb9-4fdf-9162-3c26d0127fa0",
"score": "100",
"title": "Son and Daughter",
"length": 203000,
"video": null,
"artist-credit": [
{
"artist": {
"id": "0383dadf-2a4e-4d10-a46a-e9e041da8eb3",
"name": "Queen",
"sort-name": "Queen",
"disambiguation": "UK rock group",
"aliases": [
{
"sort-name": "Queen + Adam Lambert",
"name": "Queen + Adam Lambert",
"locale": null,
"type": null,
"primary": null,
"begin-date": "2011",
"end-date": null
}
]
}
}
],
"releases": [
{
"id": "bb19abaf-80b3-4a3e-846d-5f12b12af827",
"title": "Queen",
"status": "Official",
"release-group": {
"id": "810068af-2b3c-3e9c-b2ab-68a3f3e3787d",
"primary-type": "Album"
},
"date": "1994",
"country": "NL",
"release-events": [
{
"date": "1994",
"area": {
"id": "ef1b7cc0-cd26-36f4-8ea0-04d9623786c7",
"name": "Netherlands",
"sort-name": "Netherlands",
"iso-3166-1-codes": [
"NL"
]
}
}
],
"track-count": 10,
"media": [
{
"position": 1,
"format": "CD",
"track": [
{
"id": "3a26455e-2660-30dc-a652-6a2b40f1fbe5",
"number": "8",
"title": "Son and Daughter",
"length": 203400
}
],
"track-count": 10,
"track-offset": 7
}
]
},
{
"id": "1783da6a-9315-3602-a488-1738eb733a0f",
"title": "Queen",
"status": "Official",
"release-group": {
"id": "810068af-2b3c-3e9c-b2ab-68a3f3e3787d",
"primary-type": "Album"
},
"date": "1973-09-04",
"country": "US",
"release-events": [
{
"date": "1973-09-04",
"area": {
"id": "489ce91b-6658-3307-9877-795b68554c98",
"name": "United States",
"sort-name": "United States",
"iso-3166-1-codes": [
"US"
]
}
}
],
If someone can explain this data to me, then I don't need another endpoint. But I've been hunting around the musicbrainz docs and they're not super helpful.
Preferably it would be with one call, but I can do successive calls if necessary.
Thanks for your help.
First off:
Is there any way of getting a list of albums for an artist (band), along with a link to album art and runtime?
Yes, definitely.
First you will want to find the artist, say, the Queen that did Bohemian Rhapsody. They're identified with MusicBrainz Artist ID "0383dadf-2a4e-4d10-a46a-e9e041da8eb3", so you can do a browse request for Releases by this artist: https://musicbrainz.org/ws/2/release/?artist=0383dadf-2a4e-4d10-a46a-e9e041da8eb3&inc=recordings&fmt=json (note the inc=recordings)
This gives you most of what you are asking for. A list of releases and their runtime—kind of. Each Release should have one or more medium properties that in turn have a track-list with a number of tracks. The sum of the length of each of these tracks is what makes up the runtime (the length is given in milliseconds).
For cover art, you may notice that the output has a cover-art-archive property. For cover art, MusicBrainz uses Cover Art Archive which uses MusicBrainz IDs as identifiers. The cover-art-archive attribute states whether any cover art exists in Cover Art Archive and a few details about this—e.g., does CAA have any images at all (artwork)? Does it have a back image (back) and/or a front image (front)? How many images are there in all for the release (count)? If the cover-art-archive→artwork is true, we can go on and fetch cover art from the CAA. The CAA's API is really simple: to get the "front" image of a release, say the 1974 UK single "Killer Queen" that has MusicBrainz Release ID "a2d12ee8-9aeb-4d91-bfab-5c21f7a577fc", you can simply do https://coverartarchive.org/release/a2d12ee8-9aeb-4d91-bfab-5c21f7a577fc/front
You can also do https://coverartarchive.org/release/a2d12ee8-9aeb-4d91-bfab-5c21f7a577fc to get a JSON document with more details about what cover art images are available (e.g., this one has two images: one Front+Medium and one Back+Medium image).
The Cover Art Archive API is documented at https://musicbrainz.org/doc/Cover_Art_Archive/API and the MusicBrainz API/web service documentation can be found at https://musicbrainz.org/doc/Development/XML_Web_Service/Version_2
Note that using browse requests you can page through the results using offset and vary the amount of results per query using limit, see the "Paging" section under the browse request section in the MusicBrainz WS documentation.
Secondly: Though you don't ask about this directly, you're using a search query using a generic term in your question, so I thought I'd talk about this for a bit. In MusicBrainz everything is identified using MusicBrainz identifiers (IDs). (I kind of mentioned them in the first section too.)
The reason for this is that many, many names are not unique. There are as of this writing three unique artists known as "Queen" in MusicBrainz: https://musicbrainz.org/search?query=%22queen%22&type=artist&method=advanced – not counting any of the 321 other artists that have "queen" as part of their name. Without more information, it is not possible for MusicBrainz to know which of them you want to find out information from, so your first step will likely be to somehow either narrow the search (e.g., add type:group narrows the search to 123 results, using country:gb limits to 21 results, doing both gives 11 results (see the search syntax documentation for more details)) or somehow filter afterwards.
Once you've narrowed it down to the specific artist you want, you can continue with the steps outlined above to get the details you want. The steps for narrowing it down will depend on your specific application/use case.
Finally: You seem to have some missing understanding at the asbstract level about how MusicBrainz's data is structured. E.g., all of the above is assuming that by album you mean a specific released version like the 1974 UK "Killer Queen" single, and not a more generic concept of a release like any version of the "Killer Queen" single, which in MusicBrainz terminology would be a Release Group.
https://musicbrainz.org/doc/MusicBrainz_Entity is a list of entities used in MusicBrainz. Understanding the differences between a Release Group and a Release as well as between Tracks and Recordings (and Works) will put you in a much better position to effectively use the web service and the MusicBrainz data in general.
https://musicbrainz.org/doc/MusicBrainz_Database/Schema is a introduction to how MusicBrainz is structured. Knowing how artist credits, ("advanced") relationships, and mediums play into things is also likely to save you a lot of headache later.
You need to understand the format of the data returned, copy the result in to a JSON formatting service such as https://jsonformatter.curiousconcept.com/
You will then realise you have multiple artists in the returned data, which is why it's not as simple as "albums by artist"
I’m guessing the "length" data is in milliseconds.

Join two nodes in Firebase

I'm working on an app, which is supposed to show data from two nodes(Firebase). Firebase DB is structured as:
{
"College": {
"4F2EAB65": {
"id": "4F2EAB65",
"name": "SomeCollege"
},
"A3C2ED31": {
"id": "A3C2ED31",
"name": "OtherCollege"
},
"F967B5A0": {
"id": "F967B5A0",
"name": "CoolCollege"
}
},
"Student": {
"3E20545B": {
"college-ID": "4F2EAB65",
"id": "3E20545B",
"name": "A"
},
"6FDEE194": {
"college-ID": "F967B5A0",
"id": "6FDEE194",
"name": "B"
}
}
I want to fetch student details having details: "id", "name", "college-ID", "college-Name"(Need to fetch "college-Name" by "college-ID").
I've achieved this using for loop at front end. Is there any way to get this achieved at Firebase server, also can we make something like join (SQL).
Thanks.
There is no support for server-side joins in the Firebase Realtime Database. Client-side joins are quite normal.
The alternative is to duplicate the data upon writing, so that you don't have to read from two locations.
What's best for your application is a matter of personal preference, your comfort level with the code involved vs data duplication, and the use-cases of your app.
Client-side jons are likely not as slow as you may think. See http://stackoverflow.com/questions/35931526/speed-up-fetching-posts-for-my-social-network-app-by-using-query-instead-of-obse/35932786#35932786

How do I connect my database to API.AI?

How do I connect my database to API.AI
Making every sentence into INTENT and creating entities for each doesn't seem to be a good idea? So what is the best possible way to go about?
As far as I know it is not possible yet, but you can switch to row mode and past your entities inCVS or JSON format OR import a JSON/CSV file containing all your entities.
The file should look like below (JSON format):
[
{
"value": "val1",
"synonyms": [
"syn1",
"syn2"
]
},
{
"value": "val2",
"synonyms": [
"syn21",
"syn22"
]
},
]
So you can image of writing a small job that reads entities from you DB and make a JSON/CSV file according the wanted format.
Once the job done, this process may dramatically facilitate the creation of your entities on api.ai.
If you use a webhook for an intent, you can pass params to your endpoint where you can do all the queries to your db
I did a demo where I was querying news (cheating as I was getting it from the web, but I could plug a DB).
The was getting requests such as:
"What are the latest news about France"
latest and France would be params that I send through to the webhook endpoint.
You would get the following JSON sent your endpoint by API.AI
"result": {
"source": "agent",
"resolvedQuery": "latest news about France",
"action": "show.news",
"actionIncomplete": false,
"parameters": {
"adjective": "latest",
"subject": "France"
}
Then you can query all the news for France and order them by latest
In my understanding the idea is to create entities that are "placeholders" for the values you need to query.
Then you teach the AI with few examples by tagging in the request what did the person ask. Let say someone asks:
"what is the oldest news about France?"
The AI may not know what is oldest thus you tell it is is an adjective and from now on you can get oldest as a param

Importing web service with Core Data or SQLite or something else?

I'm creating an Events app which needs to pull data from a JSON web service to get information about the artists and the shows that are being played. The data will be used to display the line up of artists (a to z) on one view, artists by date and time on another view, and artists by location and sorted by date/time on a third view. We will also allow the user to add shows to their schedule.
The JSON data is similar to this:
Artists feed:
[
{
"artists": {
"3": {
"id": "3",
"title": "Kendrick Lamar",
"subtitle": null,
"imageURL":
"//goevent-images.s3.amazonaws.com/.../web/artist_3_20140331112744_d57b5a70.jpg",
"gcInfo": "artist$kendrick-lamar/3",
"shows": [ {
"id": 153,
"venueTitle": "Sapporo scene",
"formattedDate": "Sunday, August 31",
"date": "2014-08-31",
"title": "Kendrick Lamar"
}
],
"tags": ",8,159,164,",
"color": "#00a0a0",
"dates": [
"2014-08-31"
]
},... },
]
Shows feed:
[
{
"items": {
"197": {
"id": 197,
"title": "Arcade Fire",
"type": "artist",
"dateStart": "2014-08-30",
"timeStart": "16:00:00",
"formattedTimeStart": " 4:00 PM",
"gcInfo": "artist$arcade-fire/127",
"venueId": "1",
"tags": ",80,",
"color": "#337FC3"
}
}
]
Shows and artists will have a many to many relationship. I'll also need to create an entity/table for storing the user's shows that will be added to their personal schedule.
Bands/shows do have the possibility of being removed from the feed, so I think I'll likely need to clear out the artists and shows entities/tables before importing. I'm worried this will break the relationship to the user's scheduled shows.
I also need to download as much of the high-level data as possible upfront so that the app can be used offline as well.
So my question is:
What's the best approach to importing and storing the data for this?
“Best” is subjective. As I understand it, Core Data uses SQLite under the covers, so it's really more a matter of what you're comfortable with.
Have you used Core Data before? If so, use that.
Have you used an SQL Database before? if so, use SQLite.
If you're starting from square one, I supposed I'd recommend Core Data.
Here are a few links to get you started:
Data Management in iOS by Apple
Core Data Programming Guide from Apple.
Core Data Tutorial for iOS: Getting Started
How to Use SQLite to Manage Data in iOS Apps
SQLite Tutorial for iOS: Creating and Scripting

Solr CollapsingQParserPlugin with group.facet=on style facet counts

I have a Solr index of about 5 million documents at 8GB using Solr 4.7.0. I require grouping in Solr, but find it to be too slow. Here is the group configuration:
group=on
group.facet=on
group.field=workId
group.ngroups=on
The machine has ample memory at 24GB and 4GB is allocated to Solr itself. Queries are generally taking about 1200ms compared to 90ms when grouping is turned off.
I ran across a plugin called CollapsingQParserPlugin which uses a filter query to remove all but one of a group.
fq={!collapse field=workId}
It's designed for indexes that have a lot of unique groups. I have about 3.8 million. This approach is much much faster at about 120ms. It's a beautiful solution for me except for one thing. Because it filters out other members of the group, only facets from the representative document are counted. For instance, if I have the following three documents:
"docs": [
{
"id": "1",
"workId": "abc",
"type": "book"
},
{
"id": "2",
"workId": "abc",
"type": "ebook"
},
{
"id": "3",
"workId": "abc",
"type": "ebook"
}
]
once collapsed, only the top one shows up in the results. Because the other two get filtered out, the facet counts look like
"type": ["book":1]
instead of
"type": ["book":1, "ebook":1]
Is there a way to get group.facet counts using the collapse filter query?
According to Yonik Seeley, the correct group facet counts can be gathered using the JSON Facet API. His comments can be found at:
https://issues.apache.org/jira/browse/SOLR-7036?focusedCommentId=15601789&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15601789
I tested out his method and it works great. I still use the CollapsingQParserPlugin to collapse the results, but I exclude the filter when counting up the facets like so:
fq={!tag=workId}{!collapse field=workId}
json.facet={
type: {
type: terms,
field: type,
facet: {
workCount: "unique(workId)"
},
domain: {
excludeTags: [workId]
}
}
}
And the result:
{
"facets": {
"count": 3,
"type": {
"buckets": [
{
"val": "ebook",
"count": 2,
"workCount": 1
},
{
"val": "book",
"count": 1,
"workCount": 1
}
]
}
}
}
I was unable to find a way to do this with Solr or plugin configurations, so I developed a work around to effectively create group facet counts while still using the CollapsingQParserPlugin.
I do this by making a duplicate of the fields I'll be faceting on and making sure all facet values for the entire group are in each document like so:
"docs": [
{
"id": "1",
"workId": "abc",
"type": "book",
"facetType": [
"book",
"ebook"
]
},
{
"id": "2",
"workId": "abc",
"type": "ebook",
"facetType": [
"book",
"ebook"
]
},
{
"id": "3",
"workId": "abc",
"type": "ebook",
"facetType": [
"book",
"ebook"
]
}
]
When I ask Solr to generate facet counts, I use the new field:
facet.field=facetType
This ensures that all facet values are accounted for and that the counts represent groups. But when I use a filter query, I revert back to using the old field:
fq=type:book
This way the correct document is chosen to represent the group.
I know this is a dirty, complex way to make it work, but it does work and that's what I needed. Also it requires the ability to query your documents before insertion into Solr, which calls for some development. If anyone has a simpler solution I would still love to hear it.

Resources