What is this URL string concept called?

I have just been on a website and noticed a strange query string structure in the URL. It seems to consist of key-value pairs, and when you change something in the website's form the values update in the URL.
Here is the URL:
http://www.holidaysplease.co.uk/holiday-finder/#{"d":"2016-06-1","a":[],"t":20,"r":200,"f":13,"tr":180,"s":[5,4,3],"ac":[],"c":[],"sh":[],"dh":[],"du":null,"b":"500-4407"}
Does anyone know what this concept is called? I recall seeing it once in a Java-based web application, but can someone explain how this is achieved and in what language?

It looks like this is a fragment identifier.
Wikipedia says:
The fragment identifier introduced by a hash mark # is the optional last part of a URL for a document. It is typically used to identify a portion of that document. The generic syntax is specified in RFC 3986. The hash mark separator in URIs does not belong to the fragment identifier.
RFC 3986 is defined here.
I had never seen this before; the information above is what a little research turned up. I hope it is not completely wrong.

The text after a # in a URL is a fragment identifier, which is normally used to refer to a section in the document, but can contain any data which won't be sent in the request to the server, but can be read by the client using JavaScript.
In your example, the fragment identifier contains a data structure encoded with JSON, which is a serialization format supporting key-value pairs and arrays.
Here's the JSON from your example in a more readable form:
{
  "d": "2016-06-1",
  "a": [],
  "t": 20,
  "r": 200,
  "f": 13,
  "tr": 180,
  "s": [
    5,
    4,
    3
  ],
  "ac": [],
  "c": [],
  "sh": [],
  "dh": [],
  "du": null,
  "b": "500-4407"
}

The idea is that the data the user entered is kept in the URL as a JSON structure. Because it sits after the #, the browser does not send it to the server with the page request; the page's JavaScript reads the string as JSON and can then pass the values on to the server in a separate request.
This approach works well with web forms, and the JSON can be made safe for use in a URL with encodeURIComponent.
I think you noticed that when the date is changed, only the JSON in the URL is updated. So the state of the form is kept in JSON format.
In your URL,
d - days and year
du - duration
a - holiday type
t - temperature
r - rainfall
f - flight time
tr - travel
s - star for the hotels
b - budgets
Hope this information helps you :)

The part after the # is called a fragment identifier. Client-side JavaScript code can access the content of the fragment using location.hash. In this case the fragment contains JSON data. The browser will typically not even send the fragment to the web server, so it's only used client-side.
The most common use of the fragment identifier is to link to a specific element on a web page using its id attribute. This is used for subsections of Wikipedia articles:
https://en.wikipedia.org/wiki/Fragment_identifier#Basics
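When the fragment contains JSON, you can inspect the data by opening your browser's JavaScript console and running the code below. The decodeURIComponent call is there because some browsers return location.hash with characters such as the quotes percent-encoded.
JSON.parse(decodeURIComponent(location.hash.slice(1)))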
This kind of scheme can be used by single page applications to store state in the url, so that you can bookmark it and share the url.
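The same idea can be sketched outside the browser. The following is a minimal illustration in Python (my choice of language for the example, not something the site itself uses) that splits the URL from the question with the standard urllib.parse module and decodes the fragment as JSON. The variable names are only for illustration.

```python
import json
from urllib.parse import unquote, urlsplit

# The URL from the question, split over several lines for readability.
url = (
    'http://www.holidaysplease.co.uk/holiday-finder/'
    '#{"d":"2016-06-1","a":[],"t":20,"r":200,"f":13,"tr":180,'
    '"s":[5,4,3],"ac":[],"c":[],"sh":[],"dh":[],"du":null,"b":"500-4407"}'
)

parts = urlsplit(url)

# Only the path (and query, if any) would be part of the HTTP request;
# the fragment stays on the client.
request_path = parts.path

# unquote() undoes percent-encoding in case the fragment was copied
# from a browser that encodes the quotes as %22.
state = json.loads(unquote(parts.fragment))

print(request_path)
print(state["s"], state["du"])
```

Here state is an ordinary dictionary, so a single page application could update one key and write the re-serialized JSON back into the fragment.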

Related

Use Annotation tool configuration / Automatic annotation service from brat

I'd like to use a personal API for named entity recognition (NER), and use brat for visualisation. It seems brat offers an automatic annotation tool, but documentation about its configuration is sparse.
Are there any working examples of this feature?
Could someone explain what format the API response should have?
I finally managed to understand how it works, thanks to this thread on the BRAT Google Group mailing list:
https://groups.google.com/g/brat-users/c/shX1T2hqzgI
The text is sent to the Automatic Annotator API as a byte string in the body of a POST request, and the format BRAT requires in the response from this API is a dictionary of dictionaries, namely:
{
    "T1": {
        "type": "WhatEverYouWantString",  # must be defined in the annotation.conf file
        "offsets": [(0, 2), (10, 12)],  # list of tuples of integers that correspond to the start and end position of
        "texts": ["to", "go"]
    },
    "T2": {
        "type": "SomeString",
        "offsets": [(start1, stop1), (start2, stop2), ...],
        "texts": ["string[start1:stop1]", "string[start2:stop2]", ...]
    },
    "T3": ....
}
Then you serialize this dictionary to JSON and send it back to BRAT.
Notes:
"T1", "T2", ... are mandatory keys (and correspond to the Term index in the .ann file that BRAT generates during manual annotation)
the keys "type", "offsets" and "texts" are mandatory, otherwise you get errors in the BRAT log (you can consult these logs as explained in the Google Group thread linked above)
the format of the values is strict ("type" gets a string, "offsets" gets a list of tuples (or lists) of integers, "texts" gets a list of strings), otherwise you get BRAT errors
I suppose that the strings in "texts" must correspond to the "offsets", otherwise there should be an error, or at least a problem with the display of tags (this is already the case if you generate the .ann files from an automatic detection algorithm and have start and stop positions that differ from the associated text)
I hope it helps. I managed to build the API using Flask this morning, but I needed to construct a flask.Response object to get the correct output format. Also, I could not read the incoming data sent from BRAT to the Flask API until I used the flask.request object with its request.get_data() method.
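To make the response format above concrete, here is a minimal sketch that builds such a payload using only the Python standard library. The function name build_brat_response, the sample text and the "Verb" type are hypothetical; the type would have to exist in your annotation.conf. Since JSON has no tuples, the offsets are emitted as lists, which the notes above say is accepted.

```python
import json


def build_brat_response(text, entities):
    """Build the JSON body described above.

    `entities` is a list of (type, start, end) tuples, e.g. as produced
    by a NER model (hypothetical input format for this sketch).
    """
    response = {}
    for index, (entity_type, start, end) in enumerate(entities, start=1):
        response[f"T{index}"] = {
            "type": entity_type,          # must be defined in annotation.conf
            "offsets": [[start, end]],    # one [start, end] pair per span
            "texts": [text[start:end]],   # kept consistent with the offsets
        }
    return json.dumps(response)


text = "I want to go home"
payload = build_brat_response(text, [("Verb", 7, 9), ("Verb", 10, 12)])
print(payload)
```

In a Flask view, the returned string would be the body of the response object; deriving "texts" from the offsets avoids the mismatch mentioned in the notes.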
Also, I have to mention that I was not able to use the examples given in the BRAT GitHub :
https://github.com/nlplab/brat/blob/master/tools/tokenservice.py
https://github.com/nlplab/brat/blob/master/tools/randomtaggerservice.py
I mean I could not get them to work, but I'm not at all familiar with API and HTTP packages in Python. At least I figured out the correct format for the API response.
Finally, I have no idea how to format relations among entities (i.e. BRAT arrows) in the API response, though
https://github.com/nlplab/brat/blob/master/tools/restoataggerservice.py
seems to work with such thing.
The GoogleGroup discussion
https://groups.google.com/g/brat-users/c/lzmd2Nyyezw/m/CMe9FenZAAAJ
seems to mention that it is not possible to send relations between entities back from the Automatic Annotation API and make them work with BRAT.
I may try it later :-)

Is there any way to parse JSON with trailing commas in Ruby?

I'm currently coding a transition from a system that used hand-crafted JSON files to one that can automatically generate the JSON files. The old system works; the new system works; what I need to do is transfer data from the old system to the new one.
The JSON files are used by an iOS app to provide functionality, and have never been read by our server software in Ruby On Rails before. To convert between the original system and the new system, I've started work on parsing the existing JSON files.
The problem is that one of my first two sample files has trailing commas in the JSON:
{ "sample data": [1, 2, 3,] }
This apparently went through just fine with the iOS app, because that file has been in use for a while. Now I need some way to parse the data provided in the file in my Ruby on Rails server, which (quite rightfully) throws an exception over the illegal trailing comma in the JSON file.
I can't just JSON.parse the code, because the parser, quite rightfully, rejects it as invalid JSON. Is there some way to parse it -- either an option I can pass to JSON.parse, or a gem that adds something, etc etc? Or do I need to report back that we're going to have to hand-fix the broken files before the automated process can process them?
Edit:
Based on comments and requests, it looks like some additional data is called for. The JSON files in question are stored in .zip files on S3, stored via ActiveStorage. The process I'm writing needs to download, unpack, and parse the zip files, using the 'manifest.json' file as a key to convert the archived file into a database structure with multiple, smaller files stored on S3 instead of a single zip that contains everything. A (very) long term goal is for clients to stop downloading a unitary zip file, and instead download the files individually. The first step towards that is to break the zip files up on the server, which means the server needs to read in the zip files. A more detailed sample of the data follows. (Note that the structure contains several design decisions I later came to regret; one of the original ideas was to be able to re-use files rather than pack multiple copies of the same identical file, but YAGNI bit me in the rear there)
The following includes comments that are not legal in JSON format:
{
  "defined_key": [
    {
      "name": "Object_with_subkeys",
      "key": "filename",
      "subkeys": [
        {
          "id":"1"
        },
        {
          "id":"2"
        },
        {
          "id":"3" // references to identifier on another defined key
        }, // Note trailing comma
      ]
    }
  ],
  "another_defined_key":[
    {
      "identifier": "should have made parent a hash with id as key instead of an array",
      "data":"metadata",
      "display_name":"Names: Can be very arbitrary",
      "user text":"Wait for the right {moment}", // I actually don't expect { or } in the strings, but they're completely legal and may have been used
      "thumbnail":"filename-2.png",
      "video-1":"filename-3.mov"
    }
  ]
}
The problem is that you are trying to parse something that looks a lot like JSON but is not actually JSON as defined by the spec.
Arrays: An array structure is a pair of square bracket tokens surrounding zero or more values. The values are separated by commas.
Since you have a trailing comma, another value is expected, and most JSON parsers will raise an error due to this violation.
All that being said, json-next will parse this appropriately, so maybe give that a shot.
It can parse JSON like representations that completely violate the JSON spec depending on the flavor you use. (HanSON, SON, JSONX as defined in the gem)
Example:
require 'json/next'
json = "{ \"sample data\": [1, 2, 3,] }"
HANSON.parse(json)
#=> {"sample data"=>[1, 2, 3]}
but the following is equivalent and completely violates spec
JSONX.parse("{ \"sample data\": [1 2 3] }")
#=> {"sample data"=>[1, 2, 3]}
So if you choose this route, do not expect to use it to validate the JSON data or structure in any fashion; you could end up with unintended results.

Why is a rails POST/PUT format different to GET by default?

Say I have a ruby model which has a name and age attribute. A GET request for one of these objects returns something like this when using rails generate scaffold:
{
  "id": 1,
  "name": "foo",
  "age": 21,
  "parent_id": 1
}
By default a POST/PUT to this resource expects:
{
  "user": {
    "name": "foo",
    "age": 21,
    "parent_id": 1
  }
}
When using nested resources configured in routes the default behaviour is to add the parent id outside of this nested hash too, e.g.: PUT /parents/1/users:
{
  "parent_id": 1,
  "user": {
    "name": "foo",
    "age": 21
  }
}
I can go to the controller simply enough and alter what parameters are expected, but I'd like to know why that is the case and if I risk breaking anything if changing it.
More specifically this is a Rails API and I'd like to add swagger doc generation to the API, so having this asymmetrical request body is annoying.
So in summary my questions are:
What are the advantages of this, why is it the Rails default and what do I risk breaking by changing it?
How best to add swagger support to the API in a way which doesn't have different GET responses vs PUT/POST (which seems like bad design to me, but maybe I'm wrong)?
How best/should I make the API automatically add the parent id when making a call like POST /parents/1/users, because again the default generation doesn't support it and I'm wondering if there's a reason
What are the advantages of this?
This is perhaps an opinion-based answer, which is generally frowned upon by StackOverflow, but here's my 2 cents.
In the GET request, you are simply being returned a resource. So the attributes are all you need to know:
{
  "id": 1,
  "name": "foo",
  "age": 21,
  "parent_id": 1
}
On the other hand, for this PUT request:
{
  "parent_id": 1,
  "user": {
    "name": "foo",
    "age": 21
  }
}
You can think of the parameters as being split into two "sections": The parent_id (which would normally get sent as a path param, not part of the request body!) is something to "search/filter" by, whereas the user params are the attributes of the user resource to update.
This logical separation of concerns is particularly useful in the context of web forms (which is what Rails was originally/primarily designed for), especially when dealing with complex queries or "nested" attributes.
what do I risk breaking by changing it?
Nothing really.
That format, however, was "optimised" for the context of RESTful APIs and web forms.
If you'd rather use some other format then go ahead; Rails isn't forcing you to use anything here. Just beware that a naive "better design" may come back to bite you down the line.
How best to add swagger support to the API in a way which doesn't have different GET responses vs PUT/POST (which seems like bad design to me, but maybe I'm wrong)?
You can design the API any way you like. If you want "flat parameters" everywhere, then just build the Rails application like that.
How best/should I make the API automatically add the parent id when making a call like POST /parents/1/users, because again the default generation doesn't support it and I'm wondering if there's a reason
I'm not sure what you mean by "the default generation doesn't support it". The default generation of what? The swagger docs? The rails application?
Anyway... That should be implemented as a path parameter. The swagger docs should look something like this:
/parents/{parent_id}/users:
  get:
    description: '.....'
    parameters:
      - name: parent_id
        in: path
        description: 'ID of the parent'
        required: true
        type: integer
Tom Lord’s answer and note is probably better than mine.
My guess is that this mimics the behaviour of HTTP. If you GET data, you can add parameters (?name=foo). However, if you POST data, you tend to put the payload in the body of the request. And not have any parameters in the URL.
It’s likely that Rails assumes you’re going to put that JSON object into the body of the request, whereas for a GET request it’s going to split the key/value pairs apart and send them as parameters.
The advantage of keeping things the way they are is that it’ll avoid a gotcha later. I’d argue this is always the best thing to do in programming, but especially with something like Rails. But, if you’re making an API I can see why you’d want to let people send data as parameters rather than a body that needs validating.
As for Swagger, let the user know they need to send the data as a JSON string, and then use the parameters feature as expected.
The last one is a bit tricky. I guess it’s up to the design of your API. You could pass it as part of the request. Maybe take a look through something like RESTful API Design to clarify your goal.

Is a url query parameter valid if it has no value?

Is a url like http://example.com/foo?bar valid?
I'm looking for a link to something official that says one way or the other. A simple yes/no answer or anecdotal evidence won't cut it.
Valid to the URI RFC
Likely acceptable to your server-side framework/code
The URI RFC doesn't mandate a format for the query string. Although it is recognized that the query string will often carry name-value pairs, it is not required to (e.g. it will often contain another URI).
3.4. Query
The query component contains non-hierarchical data that, along with
data in the path component (Section 3.3), serves to identify a
resource within the scope of the URI's scheme and naming authority
(if any). ...
... However, as query components
are often used to carry identifying information in the form of
"key=value" pairs and one frequently used value is a reference to
another URI, ...
HTML establishes that a form submitted via HTTP GET should encode the form values as name-value pairs in the form "?key1=value1&key2=value2..." (properly encoded). Parsing of the query string is up to the server-side code (e.g. Java servlet engine).
You don't identify what server-side framework you use, if any, but it is possible that your server-side framework assumes the query string will always be in name-value pairs, and it may choke on a query string that is not in that format (e.g. ?bar). If it's your own custom code parsing the query string, you simply have to ensure you handle that query string format. If it's a framework, you'll need to consult your documentation or simply test it to see how it is handled.
They're perfectly valid. You could consider them to be the equivalent of the big muscled guy standing silently behind the mob messenger. The guy doesn't have a name and doesn't speak, but his mere presence conveys information.
"The "http" scheme is used to locate network resources via the HTTP protocol. This section defines the scheme-specific syntax and semantics for http URLs." http://www.w3.org/Protocols/rfc2616/rfc2616.html
http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
So yes, anything is valid after a question mark. Your server may interpret differently, but anecdotally, you can see some languages treat that as a boolean value which is true if listed.
Yes, it is valid.
If one simply wants to check whether the parameter exists or not, this is one way to do so.
URI Spec
The only relevant part of the URI spec is to know everything between the first ? and the first # fits the spec's definition of a query. It can include any characters such as [:/.?]. This means that a query string such as ?bar, or ?ten+green+apples is valid.
Find the RFC 3986 here
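As a rough illustration of how a parser may treat a bare ?bar, here is a minimal sketch using Python's standard urllib.parse (Python is my choice for the example; the question does not name a language). By default parse_qs drops a key that has no value, and only keeps it when keep_blank_values=True is passed, so whether the parameter "exists" depends on how the server-side code parses it.

```python
from urllib.parse import parse_qs, urlsplit

# The URL from the question: a query component with a key and no value.
query = urlsplit("http://example.com/foo?bar").query

# Default behaviour: keys without a value are silently dropped.
default_parse = parse_qs(query)

# Opt in to keeping them; the value is then an empty string.
lenient_parse = parse_qs(query, keep_blank_values=True)

# Treat the bare key as a boolean flag.
bar_is_set = "bar" in lenient_parse

print(query, default_parse, lenient_parse, bar_is_set)
```

The URL itself is split without complaint in both cases; only the interpretation of the query as name-value pairs differs.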
HTML Spec
isindex is not meaningfully HTML5.
It's provided deprecated for use as the first element in a form only, and submits without a name.
If the entry's name is "isindex", its type is "text", and this is the first entry in the form data set, then append the value to result and skip the rest of the substeps for this entry, moving on to the next entry, if any, or the next step in the overall algorithm otherwise.
The isindex flag is for legacy use only. Forms in conforming HTML documents will not generate payloads that need to be decoded with this flag set.
The last time isindex was supported was HTML3. Its use in HTML5 is to provide easier backwards compatibility.
Support in libraries
Support in libraries for this format of URI varies; however, some libraries do provide legacy support to ease the use of isindex.
Perl URI.pm (special support)
Some libraries, like Perl's URI, provide methods for parsing these kinds of structures:
$uri->query_keywords
$uri->query_keywords( $keywords, ... )
$uri->query_keywords( \@keywords )
Sets and returns query components that use the keywords separated by "+" format.
Node.js url (no special support)
As another far more frequent example, node.js takes the normal route and eases parsing as either
A string
or, an object of keys and values (using parseQueryString)
Most other URI-parsing APIs follow something similar to this.
PHP's parse_url follows a similar implementation but only returns the string for the query. Parsing into an array of k=>v requires parse_str().
It is valid: see Wikipedia, RFC 1738 (3.3. HTTP), RFC 3986 (3. Syntax Components).
isindex deprecated magic name from HTML5
This deprecated feature allows a form submission to generate such a URL, providing further evidence that it is valid for HTML. E.g.:
<form action="#isindex" class="border" id="isindex" method="get">
  <input type="text" name="isindex" value="bar"/>
  <button type="submit">Submit</button>
</form>
generates a URL of the form:
?bar
Standard: https://www.w3.org/TR/html5/forms.html#naming-form-controls:-the-name-attribute
isindex is however deprecated as mentioned at: https://stackoverflow.com/a/41689431/895245
As all the other answers describe, it's perfectly valid, and it is especially handy for boolean-style flags.
Here is a simple function to get the query string by name:
function getParameterByName(name, url) {
    if (!url) {
        url = window.location.href;
    }
    name = name.replace(/[\[\]]/g, "\\$&");
    var regex = new RegExp("[?&]" + name + "(=([^&#]*)|&|#|$)"),
        results = regex.exec(url);
    if (!results) return null;
    if (!results[2]) return '';
    return decodeURIComponent(results[2].replace(/\+/g, " "));
}
Now, if you want to check whether the query string parameter you are looking for exists, you can do something simple like:
var exampleQueryString = (getParameterByName('exampleQueryString') != null);
exampleQueryString will be false if the function can't find the parameter; otherwise it will be true.
The correct resource for this is RFC 6570. Please refer to section 3.2.9, where the examples show an empty parameter as below.
Example Template Expansion
{&x,y,empty} &x=1024&y=768&empty=

Multiple key/value pairs in HTTP POST where key is the same name

I'm working on an API that accepts data from remote clients, and in some of that data a key in the HTTP POST almost functions as an array. In plain English: say I have a resource on my server called "class". A class, in this sense, is the kind a student sits in and a teacher educates in. When the user submits an HTTP POST to create a new class for their application, a lot of the key-value pairs look like:
student_name: Bob Smith
student_name: Jane Smith
student_name: Chris Smith
What's the best way to handle this on both the client side (let's say the client is cURL or ActiveResource, whatever..) and what's a decent way of handling this on the server-side if my server is a Ruby on Rails app? Need a way to allow for multiple keys with the same name and without any namespace clashing or loss of data.
My requirement has to be that the POST data is urlencoded key/value pairs.
There are two ways to handle this, and it's going to depend on your client-side architecture how you go about doing it, as the HTTP standards do not make the situation cut and dry.
Traditionally, HTTP requests would simply use the same key for repeated values, and leave it up to the client architecture to realize what was going on. For instance, you could have a post request with the following values:
student_name=Bob+Smith&student_name=Jane+Smith&student_name=Chris+Smith
When the receiving architecture got that string, it would have to realize that there were multiple keys of student_name and act accordingly. It's usually implemented so that if you have a single key, a scalar value is created, and if you have multiples of the same key, the values are put into an array.
Modern server-side frameworks such as PHP and Rails use a different syntax, however. Any key you want to be read in as an array gets square brackets appended, like this:
student_name[]=Bob+Smith&student_name[]=Jane+Smith&student_name[]=Chris+Smith
The receiving architecture will create an array structure named "student_name" without the brackets. The square bracket syntax solves the problem of not being able to send an array with only a single value, which could not be handled with the "traditional" method.
Because you're using Rails, the square bracket syntax would be the way to go. If you think you might switch server-side architectures or want to distribute your code, you could look into more agnostic methods, such as JSON-encoding the string being sent, which adds overhead, but might be useful if it's a situation you expect to have to handle.
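To see both encodings side by side, here is a minimal client-side sketch using Python's standard urllib.parse (Python is only my choice for the illustration; cURL or ActiveResource would produce the same bodies). It also shows that the brackets carry no meaning in HTTP itself: a generic parser keeps "student_name[]" as a literal key, and it is the receiving framework that turns it into an array.

```python
from urllib.parse import parse_qs, urlencode

students = ["Bob Smith", "Jane Smith", "Chris Smith"]

# "Traditional" style: the same key repeated once per value.
repeated_body = urlencode([("student_name", name) for name in students])

# Bracket style understood by Rails and PHP; safe="[]" keeps the
# brackets readable instead of percent-encoding them.
bracket_body = urlencode(
    [("student_name[]", name) for name in students], safe="[]"
)

# A generic parser collects repeated keys into a list ...
parsed_repeated = parse_qs(repeated_body)

# ... and treats the bracketed key as just another key name.
parsed_bracket = parse_qs(bracket_body)

print(repeated_body)
print(bracket_body)
```

Either string can be sent as the body of a POST with the application/x-www-form-urlencoded content type, which satisfies the requirement for urlencoded key/value pairs.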
There's a great post on all this in the context of JQuery Ajax parameters here.
Send your data as XML or JSON and parse whatever you need out of it.

Resources