How to get a unique filename in Erlang Yaws

I am creating an application that uploads media files (audio, video, and images) using Erlang with the Yaws web server. Multiple users can upload files concurrently, so how can I get a unique filename every time?
Please suggest a solution.

Usually, you have two ways to do that:
1: You store a base ID in a database (like "a"), which will be the name of the next uploaded file, and you increment the ID each time you add a file ("b", "c", ..., "aa", "ab", ...). For example, this is the kind of technique used by YouTube or URL shorteners.
2: You hash the file (with MD5, SHA-1, CRC32, ...) and the resulting hash becomes the name of the file on your server. Since the hash should be unique for each file, you shouldn't get name collisions.
The hash technique is usually more CPU-intensive (because you need to hash every uploaded file), but you can then give the hash to your clients so they can check the integrity of the file.
If you want to keep the original name of the file, you will need a database (like Mnesia if you are using Erlang) to store each relation (ID -> original name, or hash -> original name).
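For illustration, here is a rough Erlang sketch of both approaches (the module and function names are made up; binary:encode_hex/1 needs OTP 24+):

%% Hypothetical module for generating upload filenames.
-module(upload_names).
-export([hash_name/1, unique_name/1]).

%% Approach 2: name the file after its SHA-1 digest, hex-encoded.
hash_name(Bin) when is_binary(Bin) ->
    binary_to_list(binary:encode_hex(crypto:hash(sha, Bin))).

%% Approach 1, simplified: a strictly increasing unique integer,
%% safe under concurrent uploads on a single node (it is only
%% unique per node, so cluster-wide you still need a shared counter).
unique_name(Ext) ->
    integer_to_list(erlang:unique_integer([positive, monotonic])) ++ Ext.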

Related

Use Annotation tool configuration / Automatic annotation service from brat

I'd like to use a personal API for named entity recognition (NER) and use brat for visualisation. It seems brat offers an automatic annotation tool, but documentation about its configuration is sparse.
Are there any working examples of this feature available?
Could someone explain what the format of the API's response should be?
I finally managed to understand how it works, thanks to this topic on the brat-users Google Group mailing list:
https://groups.google.com/g/brat-users/c/shX1T2hqzgI
The text is sent to the Automatic Annotator API as a byte string in the body of a POST request, and the format BRAT requires in response from this API is a dictionary of dictionaries, namely:
{
    "T1": {
        "type": "WhatEverYouWantString",  # must be defined in the annotation.conf file
        "offsets": [(0, 2), (10, 12)],  # list of integer tuples giving the start and end positions of each annotated span
        "texts": ["to", "go"]
    },
    "T2": {
        "type": "SomeString",
        "offsets": [(start1, stop1), (start2, stop2), ...],
        "texts": ["string[start1:stop1]", "string[start2:stop2]", ...]
    },
    "T3": ...
}
Then you serialize this dictionary to JSON and send it back to BRAT.
Notes:
"T1", "T2", ... are mandatory keys (and correspond to the term index in the .ann file that BRAT generates during manual annotation).
The keys "type", "offsets" and "texts" are mandatory; otherwise you get errors in BRAT's log (you can consult these logs as explained in the Google Group thread linked above).
The formats of the values are strict ("type" takes a string, "offsets" takes a list of tuples (or lists) of integers, "texts" takes a list of strings); otherwise you get BRAT errors.
I suppose the strings in "texts" must correspond to the "offsets"; otherwise there should be an error, or at least a problem with the display of tags (this is already the case if you generate the .ann files from an automatic detection algorithm and the start and stop positions differ from the associated text).
I hope this helps. I managed to build the API using Flask this morning, but I needed to construct a flask.Response object to get the correct output format. Also, I could not read the incoming request body from BRAT until I used the flask.request object's get_data() method.
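For illustration, here is a minimal Flask sketch of such an annotator service (the endpoint name and the toy tagging rule are made up; the response shape follows the format described above):

import json
from flask import Flask, request, Response

app = Flask(__name__)

@app.route("/annotate", methods=["POST"])
def annotate():
    # BRAT sends the document text as a byte string in the POST body.
    text = request.get_data(as_text=True)
    annotations = {}
    start, idx = 0, 1
    # Toy rule: tag every occurrence of the substring "to".
    # A real service would run its NER model here instead.
    while (pos := text.find("to", start)) != -1:
        annotations["T%d" % idx] = {
            "type": "WhatEverYouWantString",  # must exist in annotation.conf
            "offsets": [[pos, pos + 2]],
            "texts": [text[pos:pos + 2]],
        }
        idx += 1
        start = pos + 2
    # BRAT expects the dictionary serialized as JSON.
    return Response(json.dumps(annotations), mimetype="application/json")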
Also, I have to mention that I was not able to use the examples given in the BRAT GitHub repository:
https://github.com/nlplab/brat/blob/master/tools/tokenservice.py
https://github.com/nlplab/brat/blob/master/tools/randomtaggerservice.py
That is, I could not get them working, but I'm not familiar at all with API and HTTP packages in Python. At least I figured out the correct format for the API response.
Finally, I have no idea how to produce relations between entities (i.e. BRAT arrows) from the API, though
https://github.com/nlplab/brat/blob/master/tools/restoataggerservice.py
seems to work with such things.
The Google Group discussion
https://groups.google.com/g/brat-users/c/lzmd2Nyyezw/m/CMe9FenZAAAJ
seems to mention that it is not possible to send relations between entities back from the Automatic Annotation API and have them work with BRAT.
I may try it later :-)

Is there any way to parse JSON with trailing commas in Ruby?

I'm currently coding a transition from a system that used hand-crafted JSON files to one that can automatically generate the JSON files. The old system works; the new system works; what I need to do is transfer data from the old system to the new one.
The JSON files are used by an iOS app to provide functionality, and have never been read by our server software in Ruby On Rails before. To convert between the original system and the new system, I've started work on parsing the existing JSON files.
The problem is that one of my first two sample files has trailing commas in the JSON:
{ "sample data": [1, 2, 3,] }
This apparently went through just fine with the iOS app, because that file has been in use for a while. Now I need some way to parse the data provided in the file in my Ruby on Rails server, which (quite rightfully) throws an exception over the illegal trailing comma in the JSON file.
I can't just JSON.parse the code, because the parser, quite rightfully, rejects it as invalid JSON. Is there some way to parse it -- either an option I can pass to JSON.parse, or a gem that adds something, etc etc? Or do I need to report back that we're going to have to hand-fix the broken files before the automated process can process them?
Edit:
Based on comments and requests, it looks like some additional data is called for. The JSON files in question are stored in .zip files on S3, stored via ActiveStorage. The process I'm writing needs to download, unpack, and parse the zip files, using the 'manifest.json' file as a key to convert the archived file into a database structure with multiple, smaller files stored on S3 instead of a single zip that contains everything. A (very) long term goal is for clients to stop downloading a unitary zip file, and instead download the files individually. The first step towards that is to break the zip files up on the server, which means the server needs to read in the zip files. A more detailed sample of the data follows. (Note that the structure contains several design decisions I later came to regret; one of the original ideas was to be able to re-use files rather than pack multiple copies of the same identical file, but YAGNI bit me in the rear there)
The following includes comments that are not legal in JSON format:
{
    "defined_key": [
        {
            "name": "Object_with_subkeys",
            "key": "filename",
            "subkeys": [
                {
                    "id": "1"
                },
                {
                    "id": "2"
                },
                {
                    "id": "3" // reference to an identifier on another defined key
                }, // Note trailing comma
            ]
        }
    ],
    "another_defined_key": [
        {
            "identifier": "should have made parent a hash with id as key instead of an array",
            "data": "metadata",
            "display_name": "Names: Can be very arbitrary",
            "user text": "Wait for the right {moment}", // I actually don't expect { or } in the strings, but they're completely legal and may have been used
            "thumbnail": "filename-2.png",
            "video-1": "filename-3.mov"
        }
    ]
}
The problem is that you are trying to parse something that looks a lot like JSON but is not actually JSON as defined by the spec:
Arrays - An array structure is a pair of square bracket tokens surrounding zero or more values. The values are separated by commas.
Since you have a trailing comma, another value is expected, and most JSON parsers will raise an error over this violation.
All that being said, json-next will parse this appropriately, so maybe give that a shot.
It can parse JSON-like representations that completely violate the JSON spec, depending on the flavor you use (HanSON, SON, or JSONX, as defined in the gem).
Example:
require 'json/next'
json = "{ \"sample data\": [1, 2, 3,] }"
HANSON.parse(json)
#=> {"sample data"=>[1, 2, 3]}
but the following also parses and is treated as equivalent, even though it completely violates the spec:
JSONX.parse("{ \"sample data\": [1 2 3] }")
#=> {"sample data"=>[1, 2, 3]}
So if you choose this route, do not expect to use it to validate the JSON data or structure in any fashion, or you could end up with unintended results.
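If you'd rather keep the strict parser in the normal path, a small wrapper (hypothetical helper name) can fall back to the lenient parser only when strict parsing fails, so well-formed files are never reinterpreted:

require 'json'
require 'json/next'

def parse_manifest(raw)
  JSON.parse(raw)          # strict: accepts only valid JSON
rescue JSON::ParserError
  HANSON.parse(raw)        # lenient fallback for legacy hand-crafted files
end

parse_manifest('{ "sample data": [1, 2, 3,] }')
#=> {"sample data"=>[1, 2, 3]}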

Rails/Dragonfly: is it possible to decode URL (/media/W1siZiIsInRzYW...) to get the model instance it refers to?

This is what I want to accomplish:
On an extranet with a feed wall, users just type unformatted text (no styling, just \n and link recognition).
Quite often, though, users want to add a link to a document stored on the same extranet (using Dragonfly). Obviously, the link is quite awful to display (e.g. https://extranet.com/media/l0ngUiD/original_filename.pdf?sha=31310881DAEF1).
The URL refers to a Document instance which has a nice title, e.g. "Original Filename (PDF)".
I would like those links to be (automatically) replaced by:
Original Filename (PDF)
The problem is: how do I find which model and which document the URL refers to, using the UID and sha?
I guess this is possible, since Dragonfly itself decodes the URL, but I can't find out how (there aren't many comments in the code).
The variant's UID is Base64-encoded, so decoding the UID gives you a JSON-encoded array of Dragonfly processor steps.
Here is an actual example from a live database:
uid = 'W1siZiIsIjIwMTUvMDcvMDkvMDkvMTMvMDIvOTE3LzE5NDg3ODk0MDc1XzE4NGYzMjc0MWVfay5qcGciXV0'
Base64.decode64 uid
# => "[[\"f\",\"2015/07/09/09/13/02/917/19487894075_184f32741e_k.jpg\"]]"
Each step is an array with the step's operation in the first element and the step's arguments in the remaining elements.
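As a hedged sketch of the lookup (assuming a Document model whose Dragonfly accessor is file, so the stored uid lives in a file_uid column; adjust the names to your schema):

require 'base64'
require 'json'

def document_for(uid)
  steps = JSON.parse(Base64.decode64(uid))
  # The "f" (fetch) step's argument is the uid stored on the model.
  fetch = steps.find { |step| step.first == 'f' }
  fetch && Document.find_by(file_uid: fetch.last)
end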

Parse id in URL in encrypted form to prevent sql injection

Tearing my hair out over this problem, but I can't fix it.
I am passing an id in the URL to pull data on the next page according to that id. Rather than passing id=123, I encrypted it into something like process.php?token=TG4n6iv_aoO7sU3AngFY4WLSppLvueEoh-MnYE6k7NA, and decrypted it on the process.php page after reading it from $_GET, before using it in the SQL query. But that is not the URL I want: I need a URL like process.php?token=9878799889, and I need to decode 9878799889 on process.php to get back the original user id.
So I can't use md5 or base64_encode here, which give me ugly strings.
What would be the best thing to do here?
The id is unique, so the generated digits should be unique as well, and not easy to guess.
Right now I am using an encryption algorithm with a salt. I actually want URLs like www.sitename.com/process/token/9878799889; this can be achieved with .htaccess, so I'm not worried about that part.
Any help will be much appreciated.
What you could do is add an association table in your database, which would contain a UUID-like token as its primary key (a randomly generated number) and the true ID it references (plus any other information you may want to store there, like a "valid until" date).
You'd have to generate the entry in that table when you build the link,
say, INSERT INTO uuid_table (uuid, real_id) VALUES (9878799889, 123);
Now, when you process the URL process.php?token=9878799889,
you only have to SELECT real_id FROM uuid_table WHERE uuid = 9878799889;
and it will return the real id, 123.
You should also DELETE FROM uuid_table WHERE uuid = 9878799889; when you're done.
Note that this would also work with md5 or base64, but indeed it makes the URL uglier.
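A PHP sketch of that flow (the uuid_table names follow the SQL above; the $pdo connection is assumed to exist; prepared statements also take care of the SQL injection concern):

// When generating the link: create a random numeric token for id 123.
// On a duplicate-key error you would simply retry with a new number.
$token = random_int(1000000000, 9999999999);
$pdo->prepare('INSERT INTO uuid_table (uuid, real_id) VALUES (?, ?)')
    ->execute([$token, 123]);

// On process.php: resolve the token back to the real id.
$stmt = $pdo->prepare('SELECT real_id FROM uuid_table WHERE uuid = ?');
$stmt->execute([$_GET['token']]);
$realId = $stmt->fetchColumn();

// One-time use: remove the token once it has been consumed.
$pdo->prepare('DELETE FROM uuid_table WHERE uuid = ?')
    ->execute([$_GET['token']]);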

Dart Read support for binary files

There is some sample code for an HTTP server in the dart:io section.
Now I want to distribute images with this server. To achieve this, I read the requested image file and send its contents to the client via request.response.write().
The problem is the format of the read data:
I can read the image file either as a 16-bit string or as a byte array. Neither of them is compatible with the raw 8-bit array that I have to send to the client.
Can someone help me?
There are several kinds of write methods in the response class:
write
writeCharCode
add
While "write" writes the data 'as seen', "writeCharCode" transforms the data back to raw-format. However, writeCharCode prepends some "magic byte" (C2) at the beginning, so it corrupts the data.
Another function, called add( List < int > ) processes the readAsBytes-result as desired.
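A minimal sketch of serving an image's raw bytes with add() (the file path and port are made up):

import 'dart:io';

Future<void> main() async {
  final server = await HttpServer.bind(InternetAddress.anyIPv4, 8080);
  await for (final request in server) {
    // readAsBytes() yields the raw 8-bit data, which add() sends
    // unmodified, with no string re-encoding.
    final bytes = await File('image.png').readAsBytes();
    request.response.headers.contentType = ContentType('image', 'png');
    request.response.add(bytes);
    await request.response.close();
  }
}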
Best regards,
Alex
