Use Annotation tool configuration / Automatic annotation service from brat - named-entity-recognition

I'd like to use a personnal API for named entity recognition (NER), and use brat for visualisation. It seems brat offers an Automatic annotation tool, but documentation about its configuration is sparse.
Are there available working examples of this features ?
Could someone explain me what should be the format of the response of the API ?

I finally manage to understand how it works, thanks to this topic in the GoogleGroup diffusion list of BRAT
https://groups.google.com/g/brat-users/c/shX1T2hqzgI
The text is sent to the Automatic Annotator API as a byte string in the body of a POST request, and the format BRAT required in response from this API is in the form of a dictionary of dictionaries, namel(
{
"T1": {
"type": "WhatEverYouWantString", # must be defined in the annotation.conf file
"offsets": [(0, 2), (10, 12)], # list of tuples of integers that correspond to the start and end position of
"texts": ["to", "go"]
}
"T2" : {
"type": "SomeString",
"offsets":[(start1, stop1), (start2, stop2), ...]
"texts":["string[start1:stop1]", "string[start2:stop2]", ...
}
"T3" : ....
}
THEN, you put this dictionary in a JSON format and you send it back to BRAT.
Note :
"T1", "T2", ... are mandatory keys (and corresponds to the Term index in the .ann file that BRAT generates during manual annotation)
the keys "type", "offsets" and "texts" are mandatory, otherwise you get some error in the log of BRAT (you can consult these log as explained in the GoogleGroup thread linked above)
the format of the values are strict ("type" gets a string, "offsets" gets a list of tuple (or list) or integers, "texts" gets a list of strings), otherwise you get BRAT errors
I suppose that the strings in "texts" must corresponds to the "offsets", otherwise there should be an error, or at least a problem with the display of tags (this is already the case if you generate the .ann files from an automatic detection algorithm and have different start and stop than the associated text)
I hope it helps. I managed to make the API using Flask this morning, but I needed to construct a flask.Response object to get the correct output format. Also, the incoming format from BRAT to the Flask API could not be catch until I used a flask.request object with request.get_body() method.
Also, I have to mention that I was not able to use the examples given in the BRAT GitHub :
https://github.com/nlplab/brat/blob/master/tools/tokenservice.py
https://github.com/nlplab/brat/blob/master/tools/randomtaggerservice.py
I mean I could not make them working, but I'm not familiar at all with API and HTTP packages in Python. At least I figured out what was the correct format for the API response.
Finally, I have no idea how to make relations among entities (i.e. BRAT arrows) format from the API, though
https://github.com/nlplab/brat/blob/master/tools/restoataggerservice.py
seems to work with such thing.
The GoogleGroup discussion
https://groups.google.com/g/brat-users/c/lzmd2Nyyezw/m/CMe9FenZAAAJ
seems to mention that it is not possible to send relations between entities back from the Automatic Annotation API and make them work with BRAT.
I may try it later :-)

Related

Accept/Content-Type header based processing in Quart and Quart-Schema

Because I am rewriting a legacy app, I cannot change what the clients either send or accept. I have to accept and return JSON, HTML, and an in-house XML-like serialization.
They do, fortunately set headers that describe what they are sending and what they accept.
So right now, what I do is have a decoder module and an encoder module with methods that are basically if/elif/else chains. When a route is ready to process/return something, I call the decoder/encoder module with the python object and the header field, which returns the formatted object as a string and the route processes the result or returns Response().
I am wondering if there is a more Quart native way of doing this.
I'm also trying to figure out how to make this work with Quart-Schema. I see from the docs that one can do app.json_encoder = <class> and I suppose I could sub in a different processor there, but it seems application global, there's no way to set it based on what the client sends. Optimally, it would be great if I could just pass the results of a dynamically chosen parser to Quart-Schema and let it do it's thing on python objects.
Thoughts and suggestions welcome. Thanks!
You can write your own decorator like the quart-schema #validation_headers(). Inside the decorator, check the header for the Content-Type, parse it, and pass the parsed object to the func(...).

Having a POJO like feature in KarateAPI?

I have been using Karate and RestAssured for sometime. There are advantages and downside of both tools of course. Right now I have a RestAssured project where I have Request and Response object and POJOs. My requests wraps my endpoint and send my POJOs to those endpoint. I do all my Headers, etc configuration in an abstract layer. In case I need to override them, I override them during the test. If not, Its a two lines of code for me to trigger an endpoint.
My way of working with happy path and negative path of an edpoint is that I initialize the POJO before every test with new values in the constructor. Then I override the value that I want in test scope. For example, if I want to test a negative case for password field, I set this field to empty string during the test. But other fields are already set to some random stuff before the test.
But I dont know how to achieve this with Karate.
Karate allows me to create a JSON representation of my request body and define my parameters as seen below example.
{
"firstName": "<name>",
"lastName": "<lastName>",
"email": "<email>",
"role": <role>
}
Then in every test I have to fill all the fields with some data.
|token |value|
|name |'canberk'|
|lastName |''|
|email |'canberk#blbabla.com'|
|role |'1'|
and
|token |value|
|name |''|
|lastName |'akduygu'|
|email |'canberk#blbabla.com'|
|role |'1'|
It goes on like this.
It's ok with a 4 fields JSON body but when the body starts to have more than 20 fields, it become a pain to initialise every field for every test.
Does Karate have a way of achieving this problem with a predefined steps of I need to come up with a solution?
There are advantages and downside of both tools of course.
I'm definitely biased, but IMHO the only disadvantage of Karate compared to REST-assured is that you don't get compile time safety :) I hope that you have seen this comparison.
Karate has multiple ways to do what you want. Here's what I would do.
create a JSON file that has all your "happy path" values set
use the read() syntax to load the file (which means this is re-usable across multiple tests)
use the set keyword to update only the field for your scenario or negative test
You can get even more fancier if you use embedded expressions.
create a JSON file that has all your "happy path" values set and the values you want to vary look like foo: '##(foo)'
before using read() you init some variables for e.g. * def foo = 'bar' and if you use null that JSON key will even be removed from the JSON
read() the JSON. it is ready for use !
You can refer to this file that demonstrates some of these concepts for XML, and you may get more ideas: xml.feature

What is this url string concept called?

I have just been on a website and I noticed they have a strange query string structure in the URL, they seem to be key value pairs and when you make a change in the website form the values update in the URL.
Here is the URL:
http://www.holidaysplease.co.uk/holiday-finder/#{"d":"2016-06-1","a":[],"t":20,"r":200,"f":13,"tr":180,"s":[5,4,3],"ac":[],"c":[],"sh":[],"dh":[],"du":null,"b":"500-4407"}
Does anyone know what this concept is called? I recall seeing it once in a Java based web application but can someone reassure me how this is achieved and in what language?
It looks like this is a fragment identifier.
Wikipedia says:
The fragment identifier introduced by a hash mark # is the optional last part of a URL for a document. It is typically used to identify a portion of that document. The generic syntax is specified in RFC 3986. The hash mark separator in URIs does not belong to the fragment identifier.
RFC 3986 is defined here.
Before, I never saw that before. The upper information is what a little bit of research gave me back. I hope this is not completely wrong.
The text after a # in a URL is a fragment identifier, which is normally used to refer to a section in the document, but can contain any data which won't be sent in the request to the server, but can be read by the client using JavaScript.
In your example, the fragment identifier contains a data structure encoded with JSON, which is a serialization format supporting key-value pairs and arrays.
Here's the JSON from your example in a more readable form:
{
"d": "2016-06-1",
"a": [],
"t": 20,
"r": 200,
"f": 13,
"tr": 180,
"s": [
5,
4,
3
],
"ac": [],
"c": [],
"sh": [],
"dh": [],
"du": null,
"b": "500-4407"
}
The concept behind this is the data which the user entered is send to the server as the JSON structure in the URL. The server reads the string as a JSON and it process the request.
This process is very effective in WebForm and it can be done using the method called encodeURIComponent.
I think you noticed that when the date is changed, it just update the JSON in the URL. So, they send the data filled to the server in a JSON format.
In your URL,
d - days and year
du - duration
a - holiday type
t - temperature
r - rainfall
f- fight time
tr - travel
s - star for the hotels
b - budgets
Hope this information helps you :)
The part after the # is called a fragment identifier. Client-side javascript code can access the content of the fragment using location.hash. In this case the fragment contains json data. The browser will typically not even send the fragment to the web server, so it's only used client-side.
The most common use of the fragment identifier is to link to a specific element on a web page using it's id attribute. This is used for subsections of wikipedia articles:
https://en.wikipedia.org/wiki/Fragment_identifier#Basics
When the fragment contains json, you can inspect the data by opening up your browser's javascript console and calling this code.
JSON.parse(location.hash.substr(1))
This kind of scheme can be used by single page applications to store state in the url, so that you can bookmark it and share the url.

Is a url query parameter valid if it has no value?

Is a url like http://example.com/foo?bar valid?
I'm looking for a link to something official that says one way or the other. A simple yes/no answer or anecdotal evidence won't cut it.
Valid to the URI RFC
Likely acceptable to your server-side framework/code
The URI RFC doesn't mandate a format for the query string. Although it is recognized that the query string will often carry name-value pairs, it is not required to (e.g. it will often contain another URI).
3.4. Query
The query component contains non-hierarchical data that, along with
data in the path component (Section 3.3), serves to identify a
resource within the scope of the URI's scheme and naming authority
(if any). ...
... However, as query components
are often used to carry identifying information in the form of
"key=value" pairs and one frequently used value is a reference to
another URI, ...
HTML establishes that a form submitted via HTTP GET should encode the form values as name-value pairs in the form "?key1=value1&key2=value2..." (properly encoded). Parsing of the query string is up to the server-side code (e.g. Java servlet engine).
You don't identify what server-side framework you use, if any, but it is possible that your server-side framework may assume the query string will always be in name-value pairs and it may choke on a query string that is not in that format (e.g. ?bar). If its your own custom code parsing the query string, you simply have to ensure you handle that query string format. If its a framework, you'll need to consult your documentation or simply test it to see how it is handled.
They're perfectly valid. You could consider them to be the equivalent of the big muscled guy standing silently behind the mob messenger. The guy doesn't have a name and doesn't speak, but his mere presence conveys information.
"The "http" scheme is used to locate network resources via the HTTP protocol. This section defines the scheme-specific syntax and semantics for http URLs." http://www.w3.org/Protocols/rfc2616/rfc2616.html
http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
So yes, anything is valid after a question mark. Your server may interpret differently, but anecdotally, you can see some languages treat that as a boolean value which is true if listed.
Yes, it is valid.
If one simply want to check if the parameter exists or not, this is one way to do so.
URI Spec
The only relevant part of the URI spec is to know everything between the first ? and the first # fits the spec's definition of a query. It can include any characters such as [:/.?]. This means that a query string such as ?bar, or ?ten+green+apples is valid.
Find the RFC 3986 here
HTML Spec
isindex is not meaningfully HTML5.
It's provided deprecated for use as the first element in a form only, and submits without a name.
If the entry's name is "isindex", its type is "text", and this is the first entry in the form data set, then append the value to result and skip the rest of the substeps for this entry, moving on to the next entry, if any, or the next step in the overall algorithm otherwise.
The isindex flag is for legacy use only. Forms in conforming HTML documents will not generate payloads that need to be decoded with this flag set.
The last time isindex was supported was HTML3. It's use in HTML5 is to provide easier backwards compatibility.
Support in libraries
Support in libraries for this format of URI varies however some libraries do provide legacy support to ease use of isindex.
Perl URI.pm (special support)
Some libraries like Perl's URI provide methods of parsing these kind of structures
$uri->query_keywords
$uri->query_keywords( $keywords, ... )
$uri->query_keywords( \#keywords )
Sets and returns query components that use the keywords separated by "+" format.
Node.js url (no special support)
As another far more frequent example, node.js takes the normal route and eases parsing as either
A string
or, an object of keys and values (using parseQueryString)
Most other URI-parsing APIs following something similar to this.
PHP parse_url, follows as similar implementation but only returns the string for the query. Parsing into an object of k=>v requires parse_string()
It is valid: see Wikipedia, RFC 1738 (3.3. HTTP), RFC 3986 (3. Syntax Components).
isindex deprecated magic name from HTML5
This deprecated feature allows a form submission to generate such an URL, providing further evidence that it is valid for HTML. E.g.:
<form action="#isindex" class="border" id="isindex" method="get">
<input type="text" name="isindex" value="bar"/>
<button type="submit">Submit</button>
</form>
generates an URL of type:
?bar
Standard: https://www.w3.org/TR/html5/forms.html#naming-form-controls:-the-name-attribute
isindex is however deprecated as mentioned at: https://stackoverflow.com/a/41689431/895245
As all other answers described, it's perfectly valid for checking, specially for boolean kind stuff
Here is a simple function to get the query string by name:
function getParameterByName(name, url) {
if (!url) {
url = window.location.href;
}
name = name.replace(/[\[\]]/g, "\\$&");
var regex = new RegExp("[?&]" + name + "(=([^&#]*)|&|#|$)"),
results = regex.exec(url);
if (!results) return null;
if (!results[2]) return '';
return decodeURIComponent(results[2].replace(/\+/g, " "));
}
and now you want to check if the query string you are looking for exists or not, you may do a simple thing like:
var exampleQueryString = (getParameterByName('exampleQueryString') != null);
the exampleQueryString will be false if the function can't find the query string, otherwise will be true.
The correct resource to look for this is RFC6570. Please refer to section 3.2.9 where in examples empty parameter is presented as below.
Example Template Expansion
{&x,y,empty} &x=1024&y=768&empty=

Multiple key/value pairs in HTTP POST where key is the same name

I'm working on an API that accepts data from remote clients, some of which where the key in an HTTP POST almost functions as an array. In english what this means is say I have a resource on my server called "class". A class in this sense, is the type a student sits in and a teacher educates in. When the user submits an HTTP POST to create a new class for their application, a lot of the key value pairs look like:
student_name: Bob Smith
student_name: Jane Smith
student_name: Chris Smith
What's the best way to handle this on both the client side (let's say the client is cURL or ActiveResource, whatever..) and what's a decent way of handling this on the server-side if my server is a Ruby on Rails app? Need a way to allow for multiple keys with the same name and without any namespace clashing or loss of data.
My requirement has to be that the POST data is urlencoded key/value pairs.
There are two ways to handle this, and it's going to depend on your client-side architecture how you go about doing it, as the HTTP standards do not make the situation cut and dry.
Traditionally, HTTP requests would simply use the same key for repeated values, and leave it up to the client architecture to realize what was going on. For instance, you could have a post request with the following values:
student_name=Bob+Smith&student_name=Jane+Smith&student_name=Chris+Smith
When the receiving architecture got that string, it would have to realize that there were multiple keys of student_name and act accordingly. It's usually implemented so that if you have a single key, a scalar value is created, and if you have multiples of the same key, the values are put into an array.
Modern client-side architectures such as PHP and Rails use a different syntax however. Any key you want to be read in as an array gets square brackets appended, like this:
student_name[]=Bob+Smith&student_name[]=Jane+Smith&student_name[]=Chris+Smith
The receiving architecture will create an array structure named "student_name" without the brackets. The square bracket syntax solves the problem of not being able to send an array with only a single value, which could not be handled with the "traditional" method.
Because you're using Rails, the square bracket syntax would be the way to go. If you think you might switch server-side architectures or want to distribute your code, you could look into more agnostic methods, such as JSON-encoding the string being sent, which adds overhead, but might be useful if it's a situation you expect to have to handle.
There's a great post on all this in the context of JQuery Ajax parameters here.
Send your data as XML or JSON and parse whatever you need out of it.

Resources