Return certain fields first in Elasticsearch - ruby-on-rails

I have a Rails application with term => definition pairs stored as nodes in Neo4j that I want my users to search using Elasticsearch. Through usage we've found they far more commonly want to match on the term name than on the description, but I'm having trouble finding the feature that ranks results matching a certain field above results matching other fields.
[
  {
    "id": 1,
    "data": {
      "name": "Foo",
      "description": "Something super awesome."
    }
  },
  {
    "id": 2,
    "data": {
      "name": "Bar",
      "description": "Something that depends on Foo"
    }
  }
]
Search for "Foo":
Because both terms contain the word Foo in either name or description, my app returns both in alphabetical order, and since Bar sorts alphabetically before Foo, Bar appears first. This gets very tiring when my users search for a common term that is used in many other definitions.
How do I return results that match the name field first, followed by secondary results that only match the description?
I have a feeling this has more to do with Neo4j than Elasticsearch.

It's possible by adjusting how term and field frequencies affect scoring in your type mapping; see http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scoring-theory.html
name": {
"type": "string",
"store": true,
"norms": {
"enabled": false
},
"index_options": "docs"
}
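Independent of the mapping, a query-time boost on the name field is the usual way to rank name matches first. Below is a minimal sketch (an editorial addition, not from the original answer) using the elasticsearch-py client; the index name terms is hypothetical, and the field names follow the mapping above (adjust the paths, e.g. data.name, to match your actual documents). Note that results must be sorted by relevance score (the default) rather than alphabetically for the boost to have any effect:

from elasticsearch import Elasticsearch

es = Elasticsearch()

# multi_match with a per-field boost: "^3" weights matches in "name"
# three times higher than matches in "description".
result = es.search(
    index="terms",  # hypothetical index name
    body={
        "query": {
            "multi_match": {
                "query": "Foo",
                "fields": ["name^3", "description"],
            }
        }
    },
)

# Hits arrive ordered by _score, so name matches come first.
for hit in result["hits"]["hits"]:
    print(hit["_score"], hit["_source"])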
Let me know if you have any queries.

Related

FastAPI swagger doesn't like list of strings passed via query parameter but endpoint works in browser

I have a problem with a REST API endpoint in FastAPI that accepts a list of strings via a single query parameter. An example of this endpoint's usage is:
http://127.0.0.1:8000/items/2?short=false&response=this&response=that
Here, the parameter named 'response' accepts a list of strings, as documented in the FastAPI tutorial's section on Query Parameters and String Validations. The endpoint works as expected in the browser.
However, it does not work in Swagger docs. The button labeled 'Add string item' shakes upon clicking 'Execute' to test the endpoint. Swagger UI seems unable to create the expected URL with the embedded query parameters (as shown in Fig 1.).
The code for the endpoint is as follows. I have tried with and without validation.
from typing import List, Optional

from fastapi import FastAPI, Path, Query

app = FastAPI()

@app.get("/items/{item_ID}")
async def getQuestion_byID(item_ID: int = Path(
    ...,
    title="Numeric ID of the question",
    description="Specify a number between 1 and 999",
    ge=1,
    le=999
), response: Optional[List[str]] = Query(
    [],
    title="Furnish an answer",
    description="Answer can only have letters of the alphabet and is case-insensitive",
    min_length=3,
    max_length=99,
    regex="^[a-zA-Z]+$"
), short: bool = Query(
    False,
    title="Set flag for short result",
    description="Acceptable values are 1, True, true, on, yes"
)):
    """
    Returns the quiz question or the result.
    Accepts item ID as path parameter and
    optionally response as query parameter.
    Returns result when the response is passed with the item ID.
    Otherwise, returns the quiz question.
    """
    # question_bank and evaluate_response are defined elsewhere in the app
    item = question_bank.get(item_ID, None)
    if not item:
        return {"question": None}
    if response:
        return evaluate_response(item_ID, response, short)
    else:
        return {"question": item["question"]}
Grateful for any help.
As described here, this happens because OpenAPI applies the pattern (as well as the maxLength and minLength constraints) to the schema of the array itself, not just to the individual items in the array. If you check the OpenAPI schema at http://127.0.0.1:8000/openapi.json, you will see that the schema for the response parameter appears as shown below (i.e., the validations are applied to the array itself as well):
{
  "description": "Answer can only have letters of the alphabet and is case-insensitive",
  "required": false,
  "schema": {
    "title": "Furnish an answer",
    "maxLength": 99,
    "minLength": 3,
    "pattern": "^[a-zA-Z]+$",
    "type": "array",
    "items": {
      "maxLength": 99,
      "minLength": 3,
      "pattern": "^[a-zA-Z]+$",
      "type": "string"
    },
    "description": "Answer can only have letters of the alphabet and is case-insensitive",
    "default": []
  },
  "name": "response",
  "in": "query"
}
Solution 1
As mentioned here, you could use a Pydantic constr instead to specify items with that constraint:
from pydantic import constr

my_constr = constr(regex="^[a-zA-Z]+$", min_length=3, max_length=99)
response: Optional[List[my_constr]] = Query([], title="Furnish an...", description="Answer can...")
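Put together, a minimal runnable sketch of Solution 1 (an editorial addition; it assumes Pydantic v1, where constr takes regex=, while in Pydantic v2 the keyword is pattern=; the /echo route is hypothetical):

from typing import List, Optional

from fastapi import FastAPI, Query
from pydantic import constr

app = FastAPI()

# The constraint is attached to each list item, so OpenAPI no longer
# applies it to the schema of the array itself.
answer_str = constr(regex="^[a-zA-Z]+$", min_length=3, max_length=99)

@app.get("/echo")
async def echo(response: Optional[List[answer_str]] = Query([])):
    return {"response": response}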
Solution 2
Keep your response parameter as is. Copy the OpenAPI schema from http://127.0.0.1:8000/openapi.json, remove the pattern (as well as the maxLength and minLength attributes) from response's (array) schema, and save the OpenAPI schema to a new file (e.g., my_openapi.json). It should look like this:
...
{
"description": "Answer can only have letters of the alphabet and is case-insensitive",
"required": false,
"schema": {
"title": "Furnish an answer",
"type": "array",
"items": {
"maxLength": 99,
"minLength": 3,
"pattern": "^[a-zA-Z]+$",
"type": "string"
},
"description": "Answer can only have letters of the alphabet and is case-insensitive",
"default": []
},
"name": "response",
"in": "query"
},
...
Then, in your app, instruct FastAPI to use that schema instead:
import json

with open("my_openapi.json") as f:
    app.openapi_schema = json.load(f)
Solution 3
Since the above solution requires you to copy and edit the schema every time you make a change or add new endpoints/parameters, you may prefer to modify the OpenAPI schema programmatically, as described here. This saves you from copying/editing the schema file. Make sure to add the below at the end of your code (after defining all the routes).
from fastapi.openapi.utils import get_openapi

def custom_openapi():
    if app.openapi_schema:
        return app.openapi_schema
    openapi_schema = get_openapi(
        title="FastAPI",
        version="0.1.0",
        description="This is a very custom OpenAPI schema",
        routes=app.routes,
    )
    # Strip the array-level constraints from the "response" query parameter
    del openapi_schema["paths"]["/items/{item_ID}"]["get"]["parameters"][1]["schema"]["maxLength"]
    del openapi_schema["paths"]["/items/{item_ID}"]["get"]["parameters"][1]["schema"]["minLength"]
    del openapi_schema["paths"]["/items/{item_ID}"]["get"]["parameters"][1]["schema"]["pattern"]
    app.openapi_schema = openapi_schema
    return app.openapi_schema

app.openapi = custom_openapi
In all the above solutions, the constraints annotation that would normally be shown in OpenAPI under response (i.e., (query) maxLength: 99 minLength: 3 pattern: ^[a-zA-Z]+$) won't appear, since Swagger creates that annotation from the constraints applied to the array, not the items, and there doesn't seem to be a way to preserve it. In Solutions 2 and 3, however, you could modify the "in" attribute, shown in the JSON code snippet above, to manually add the annotation. But, as the HTML elements, etc., are controlled by Swagger, the whole annotation would appear inside parentheses and without line breaks between the constraints. Nevertheless, you could still inform users about the constraints applied to items by specifying them in the description of your Query parameter.

Jira set user using API

I have been able to find various questions similar to this one, but none of them solve this problem.
So I have this custom field
"customfield_10039": {
"required": false,
"schema": {
"type": "user",
"custom": "com.atlassian.jira.plugin.system.customfieldtypes:userpicker",
"customId": 10039
},
"name": "user",
"key": "customfield_10039",
"autoCompleteUrl": "https://integrationtr.atlassian.net/rest/api/1.0/users/picker?fieldName=customfield_10039&fieldConfigId=10140&projectId=10001&showAvatar=true&query=",
"hasDefaultValue": false,
"operations": [
"set"
]
},
So, as you can see, this field allows one to set the user. I set the user using this:
{"fields":{"customfield_10039" : {"name":"admin"}}}
I have tried so many combinations for name: email ID, display name, even concatenating first name and last name. Each one gives the same error:
{"errorMessages":[],"errors":{"customfield_10039":"user is required."}}

Getting album, album art, and run time info from musicbrainz

Is there any way of getting a list of albums for an artist (band), along with a link to album art and runtime?
I've been given this endpoint, but the data it returns is confusing:
http://musicbrainz.org/ws/2/recording?query=artist:%22Queen%22%20and%20type:album&fmt=json
The data isn't really organized around albums, and the "length" field returns something like 203000. But it's better if you see it in context, so here's the first bit of it:
{
  "created": "2018-02-17T03:47:57.052Z",
  "count": 9533710,
  "offset": 0,
  "recordings": [
    {
      "id": "c2e919f7-ecb9-4fdf-9162-3c26d0127fa0",
      "score": "100",
      "title": "Son and Daughter",
      "length": 203000,
      "video": null,
      "artist-credit": [
        {
          "artist": {
            "id": "0383dadf-2a4e-4d10-a46a-e9e041da8eb3",
            "name": "Queen",
            "sort-name": "Queen",
            "disambiguation": "UK rock group",
            "aliases": [
              {
                "sort-name": "Queen + Adam Lambert",
                "name": "Queen + Adam Lambert",
                "locale": null,
                "type": null,
                "primary": null,
                "begin-date": "2011",
                "end-date": null
              }
            ]
          }
        }
      ],
      "releases": [
        {
          "id": "bb19abaf-80b3-4a3e-846d-5f12b12af827",
          "title": "Queen",
          "status": "Official",
          "release-group": {
            "id": "810068af-2b3c-3e9c-b2ab-68a3f3e3787d",
            "primary-type": "Album"
          },
          "date": "1994",
          "country": "NL",
          "release-events": [
            {
              "date": "1994",
              "area": {
                "id": "ef1b7cc0-cd26-36f4-8ea0-04d9623786c7",
                "name": "Netherlands",
                "sort-name": "Netherlands",
                "iso-3166-1-codes": [
                  "NL"
                ]
              }
            }
          ],
          "track-count": 10,
          "media": [
            {
              "position": 1,
              "format": "CD",
              "track": [
                {
                  "id": "3a26455e-2660-30dc-a652-6a2b40f1fbe5",
                  "number": "8",
                  "title": "Son and Daughter",
                  "length": 203400
                }
              ],
              "track-count": 10,
              "track-offset": 7
            }
          ]
        },
        {
          "id": "1783da6a-9315-3602-a488-1738eb733a0f",
          "title": "Queen",
          "status": "Official",
          "release-group": {
            "id": "810068af-2b3c-3e9c-b2ab-68a3f3e3787d",
            "primary-type": "Album"
          },
          "date": "1973-09-04",
          "country": "US",
          "release-events": [
            {
              "date": "1973-09-04",
              "area": {
                "id": "489ce91b-6658-3307-9877-795b68554c98",
                "name": "United States",
                "sort-name": "United States",
                "iso-3166-1-codes": [
                  "US"
                ]
              }
            }
          ],
If someone can explain this data to me, then I don't need another endpoint. But I've been hunting around the musicbrainz docs and they're not super helpful.
Preferably it would be with one call, but I can do successive calls if necessary.
Thanks for your help.
First off:
Is there any way of getting a list of albums for an artist (band), along with a link to album art and runtime?
Yes, definitely.
First you will want to find the artist, say, the Queen that did Bohemian Rhapsody. They're identified with MusicBrainz Artist ID "0383dadf-2a4e-4d10-a46a-e9e041da8eb3", so you can do a browse request for Releases by this artist: https://musicbrainz.org/ws/2/release/?artist=0383dadf-2a4e-4d10-a46a-e9e041da8eb3&inc=recordings&fmt=json (note the inc=recordings)
This gives you most of what you are asking for: a list of releases and, kind of, their runtime. Each Release should have one or more media that in turn have a track list with a number of tracks. The sum of the length of each of these tracks is what makes up the runtime (lengths are given in milliseconds). For a worked example, see the sketch below.
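A minimal sketch of that computation (an editorial addition; it assumes the requests library and that the per-medium track key is tracks in the ws/2 JSON, whereas the search output quoted in the question shows track):

import requests

ARTIST_MBID = "0383dadf-2a4e-4d10-a46a-e9e041da8eb3"  # Queen, from above

# MusicBrainz asks clients to send a meaningful User-Agent.
resp = requests.get(
    "https://musicbrainz.org/ws/2/release/",
    params={"artist": ARTIST_MBID, "inc": "recordings", "fmt": "json"},
    headers={"User-Agent": "my-app/0.1 (me@example.com)"},  # placeholder
)

for release in resp.json()["releases"]:
    # Sum track lengths (milliseconds) across all media; lengths may be null.
    total_ms = sum(
        track.get("length") or 0
        for medium in release.get("media", [])
        for track in medium.get("tracks", [])
    )
    print(release["title"], release.get("date"), total_ms // 60000, "min")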
For cover art, you may notice that the output has a cover-art-archive property. For cover art, MusicBrainz uses Cover Art Archive which uses MusicBrainz IDs as identifiers. The cover-art-archive attribute states whether any cover art exists in Cover Art Archive and a few details about this—e.g., does CAA have any images at all (artwork)? Does it have a back image (back) and/or a front image (front)? How many images are there in all for the release (count)? If the cover-art-archive→artwork is true, we can go on and fetch cover art from the CAA. The CAA's API is really simple: to get the "front" image of a release, say the 1974 UK single "Killer Queen" that has MusicBrainz Release ID "a2d12ee8-9aeb-4d91-bfab-5c21f7a577fc", you can simply do https://coverartarchive.org/release/a2d12ee8-9aeb-4d91-bfab-5c21f7a577fc/front
You can also do https://coverartarchive.org/release/a2d12ee8-9aeb-4d91-bfab-5c21f7a577fc to get a JSON document with more details about what cover art images are available (e.g., this one has two images: one Front+Medium and one Back+Medium image).
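Reading that JSON document could look like this (an editorial sketch; the MBID is the "Killer Queen" release from above):

import requests

release_mbid = "a2d12ee8-9aeb-4d91-bfab-5c21f7a577fc"
caa = requests.get(f"https://coverartarchive.org/release/{release_mbid}")
if caa.ok:  # a 404 means the CAA has no images for this release
    for image in caa.json()["images"]:
        # "types" is e.g. ["Front"], "image" is the full-size image URL
        print(image["types"], image["front"], image["image"])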
The Cover Art Archive API is documented at https://musicbrainz.org/doc/Cover_Art_Archive/API and the MusicBrainz API/web service documentation can be found at https://musicbrainz.org/doc/Development/XML_Web_Service/Version_2
Note that using browse requests you can page through the results using offset and vary the amount of results per query using limit, see the "Paging" section under the browse request section in the MusicBrainz WS documentation.
Secondly: though you don't ask about this directly, your question uses a search query with a generic term, so I thought I'd talk about this for a bit. In MusicBrainz everything is identified using MusicBrainz identifiers (IDs). (I kind of mentioned them in the first section too.)
The reason for this is that many, many names are not unique. There are as of this writing three unique artists known as "Queen" in MusicBrainz: https://musicbrainz.org/search?query=%22queen%22&type=artist&method=advanced – not counting any of the 321 other artists that have "queen" as part of their name. Without more information, it is not possible for MusicBrainz to know which of them you want to find out information from, so your first step will likely be to somehow either narrow the search (e.g., add type:group narrows the search to 123 results, using country:gb limits to 21 results, doing both gives 11 results (see the search syntax documentation for more details)) or somehow filter afterwards.
Once you've narrowed it down to the specific artist you want, you can continue with the steps outlined above to get the details you want. The steps for narrowing it down will depend on your specific application/use case.
Finally: you seem to be missing some understanding at the abstract level of how MusicBrainz's data is structured. E.g., all of the above assumes that by album you mean a specific released version, like the 1974 UK "Killer Queen" single, and not the more generic concept of a release, like any version of the "Killer Queen" single, which in MusicBrainz terminology would be a Release Group.
https://musicbrainz.org/doc/MusicBrainz_Entity is a list of entities used in MusicBrainz. Understanding the differences between a Release Group and a Release as well as between Tracks and Recordings (and Works) will put you in a much better position to effectively use the web service and the MusicBrainz data in general.
https://musicbrainz.org/doc/MusicBrainz_Database/Schema is an introduction to how MusicBrainz is structured. Knowing how artist credits, ("advanced") relationships, and mediums play into things is also likely to save you a lot of headache later.
You need to understand the format of the data returned; paste the result into a JSON formatting service such as https://jsonformatter.curiousconcept.com/
You will then realise you have multiple artists in the returned data, which is why it's not as simple as "albums by artist".
The "length" values are in milliseconds (203000 ms is about 3 minutes 23 seconds).

neo4jClient create node with dynamic label using parameters

I am building an app that gives users the ability to construct their own graphs. I have been using parameters for all queries and creates, but now I want to give users the ability to create a node they can label anything they want (respecting Neo4j's restrictions on empty-string labels). How would I parameterize this type of transaction?
I tried this:
.CREATE("(a:{dynamicLabel})").WithParams(new {dynamicLabel = dlabel})...
But this yields a syntax error from Neo4j. I am tempted to concatenate, but am worried that this may expose an injection risk in my application.
I am also tempted to build my own class that reads the intended string and rejects any kind of Neo4j syntax, but this would limit my users a bit and I would rather not.
There is an open neo4j issue 4334, which is a feature request for adding the ability to parameterize labels during CREATE. So, this is not yet possible.
That issue contains a comment suggesting that you generate CREATE statements with hardcoded labels, which will work. It is, unfortunately, not as performant as using parameters would be (should they ever be supported in this case). If you do interpolate labels into the query string, quote them defensively; see the sketch below.
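A sketch of that quoting idea using the official Python driver (an editorial addition; Neo4jClient users would apply the same quoting when building the Cypher string):

from neo4j import GraphDatabase

def quoted_label(label: str) -> str:
    # Backtick-quote the identifier and double any embedded backticks,
    # so user input cannot break out of the label position.
    if not label:
        raise ValueError("labels must be non-empty")
    return "`" + label.replace("`", "``") + "`"

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def create_node(tx, label, props):
    # Only the safely quoted label is interpolated; properties stay parameterized.
    tx.run("CREATE (a:" + quoted_label(label) + ") SET a = $props", props=props)

with driver.session() as session:
    session.execute_write(create_node, "Computer", {"Id": "1"})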
I searched like hell and finally found it out.
You can do it like this:
// create or update nodes with a dynamic label from import data
WITH "file:///query.json" AS url
CALL apoc.load.json(url) YIELD value AS u
UNWIND u.cis AS ci
CALL apoc.merge.node([ci.label], {Id: ci.Id}, {}, {}) YIELD node
RETURN node;
The JSON looks like this:
{
  "cis": [
    {
      "label": "Computer",
      "Id": "1"
    },
    {
      "label": "Service",
      "Id": "2"
    },
    {
      "label": "Person",
      "Id": "3"
    }
  ],
  "relations": [
    {
      "end1Id": "1",
      "Id": "4",
      "end2Id": "2",
      "label": "USES"
    },
    {
      "end1Id": "3",
      "Id": "5",
      "end2Id": "1",
      "label": "MANAGED_BY"
    }
  ]
}
If you are using the embedded Java API, you can do it like this:
GraphDatabaseService graphDb = ...; // obtain your database service
Node node = graphDb.createNode();
Label label = new Label() {
    @Override
    public String name() {
        return dynamicLabelVal;
    }
};
node.addLabel(label);
You can then have a LabelCache to avoid creating a Label object for every node.

Solr CollapsingQParserPlugin with group.facet=on style facet counts

I have a Solr index of about 5 million documents (about 8GB) on Solr 4.7.0. I require grouping in Solr, but find it to be too slow. Here is the group configuration:
group=on
group.facet=on
group.field=workId
group.ngroups=on
The machine has ample memory at 24GB and 4GB is allocated to Solr itself. Queries are generally taking about 1200ms compared to 90ms when grouping is turned off.
I ran across a plugin called CollapsingQParserPlugin, which uses a filter query to remove all but one member of each group.
fq={!collapse field=workId}
It's designed for indexes that have a large number of unique groups; I have about 3.8 million. This approach is much, much faster, at about 120ms. It's a beautiful solution for me except for one thing: because it filters out the other members of the group, only facets from the representative document are counted. For instance, if I have the following three documents:
"docs": [
{
"id": "1",
"workId": "abc",
"type": "book"
},
{
"id": "2",
"workId": "abc",
"type": "ebook"
},
{
"id": "3",
"workId": "abc",
"type": "ebook"
}
]
Once collapsed, only the top document shows up in the results. Because the other two get filtered out, the facet counts look like
"type": ["book":1]
instead of
"type": ["book":1, "ebook":1]
Is there a way to get group.facet counts using the collapse filter query?
According to Yonik Seeley, the correct group facet counts can be gathered using the JSON Facet API. His comments can be found at:
https://issues.apache.org/jira/browse/SOLR-7036?focusedCommentId=15601789&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15601789
I tested out his method and it works great. I still use the CollapsingQParserPlugin to collapse the results, but I exclude the filter when counting up the facets like so:
fq={!tag=workId}{!collapse field=workId}
json.facet={
  type: {
    type: terms,
    field: type,
    facet: {
      workCount: "unique(workId)"
    },
    domain: {
      excludeTags: [workId]
    }
  }
}
And the result:
{
  "facets": {
    "count": 3,
    "type": {
      "buckets": [
        {
          "val": "ebook",
          "count": 2,
          "workCount": 1
        },
        {
          "val": "book",
          "count": 1,
          "workCount": 1
        }
      ]
    }
  }
}
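For reference, a sketch of sending this request from code (an editorial addition; the Solr URL and core name are placeholders, and json.facet is passed as an ordinary request parameter):

import requests

params = {
    "q": "*:*",
    "fq": "{!tag=workId}{!collapse field=workId}",
    "json.facet": (
        '{type: {type: terms, field: type,'
        ' facet: {workCount: "unique(workId)"},'
        ' domain: {excludeTags: [workId]}}}'
    ),
}
resp = requests.get("http://localhost:8983/solr/mycore/select", params=params)
print(resp.json().get("facets"))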
I was unable to find a way to do this with Solr or plugin configuration, so I developed a workaround to effectively create group facet counts while still using the CollapsingQParserPlugin.
I do this by making a duplicate of the fields I'll be faceting on and making sure all facet values for the entire group are present in each document, like so:
"docs": [
{
"id": "1",
"workId": "abc",
"type": "book",
"facetType": [
"book",
"ebook"
]
},
{
"id": "2",
"workId": "abc",
"type": "ebook",
"facetType": [
"book",
"ebook"
]
},
{
"id": "3",
"workId": "abc",
"type": "ebook",
"facetType": [
"book",
"ebook"
]
}
]
When I ask Solr to generate facet counts, I use the new field:
facet.field=facetType
This ensures that all facet values are accounted for and that the counts represent groups. But when I use a filter query, I revert back to using the old field:
fq=type:book
This way the correct document is chosen to represent the group.
I know this is a dirty, complex way to make it work, but it does work, and that's what I needed. It also requires the ability to preprocess your documents before insertion into Solr, which calls for some development (a sketch of that step follows below). If anyone has a simpler solution, I would still love to hear it.
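A sketch of that pre-insertion step (an editorial addition; pysolr is one hypothetical way to index the enriched documents afterwards):

from collections import defaultdict

docs = [
    {"id": "1", "workId": "abc", "type": "book"},
    {"id": "2", "workId": "abc", "type": "ebook"},
    {"id": "3", "workId": "abc", "type": "ebook"},
]

# Collect every "type" seen within each work group...
types_by_work = defaultdict(set)
for doc in docs:
    types_by_work[doc["workId"]].add(doc["type"])

# ...and copy the full set onto each document as the facet-only field.
for doc in docs:
    doc["facetType"] = sorted(types_by_work[doc["workId"]])

# The enriched docs can now be indexed, e.g. pysolr.Solr(url).add(docs)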
