Which settings should be used for TokensRegexNER (named-entity recognition)?

When I try regexner, it works as expected with the following settings and data:
props.setProperty("annotators", "tokenize, cleanxml, ssplit, pos, lemma, regexner");
Bachelor of Laws DEGREE
Bachelor of (Arts|Laws|Science|Engineering|Divinity) DEGREE
What I would like to do is the same thing using TokensRegex. For example:
Bachelor of Laws DEGREE
Bachelor of ([{tag:NNS}] [{tag:NNP}]) DEGREE
I read that to do this, I should use TokensRegexNERAnnotator.
I tried to use it as follows, but it did not work.
Pipeline.addAnnotator(new TokensRegexNERAnnotator("expressions.txt", true));
I also tried setting the annotator another way:
props.setProperty("annotators", "tokenize, cleanxml, ssplit, pos, lemma, tokenregexner");
props.setProperty("customAnnotatorClass.tokenregexner", "edu.stanford.nlp.pipeline.TokensRegexNERAnnotator");
I tried different TokensRegex formats, but either the annotator could not find the expression or I got a SyntaxException.
What is the proper way to use TokensRegex (querying tokens by POS tag) in an NER data file?
BTW, I just saw a comment in the TokensRegexNERAnnotator.java file. Not sure if it is related; maybe POS tags do not work with TokensRegexNERAnnotator:
if (entry.tokensRegex != null) {
// TODO: posTagPatterns...
pattern = TokenSequencePattern.compile(env, entry.tokensRegex);
}

First you need to make a TokensRegex rule file (sample_degree.rules). Here is an example:
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
{ pattern: (/Bachelor/ /of/ [{tag:NNP}]), action: Annotate($0, ner, "DEGREE") }
To explain the rule a bit: the pattern field specifies the token sequence to match. The action field says to annotate every token in the overall match ($0 represents the overall match), setting the ner field (note that we bound ner = ... at the top of the rule file) to the String "DEGREE".
Then make this .props file (degree_example.props) for the command:
customAnnotatorClass.tokensregex = edu.stanford.nlp.pipeline.TokensRegexAnnotator
tokensregex.rules = sample_degree.rules
annotators = tokenize,ssplit,pos,lemma,ner,tokensregex
Then run this command:
java -Xmx8g edu.stanford.nlp.pipeline.StanfordCoreNLP -props degree_example.props -file sample-degree-sentence.txt -outputFormat text
You should see the three tokens you wanted tagged as "DEGREE".
I think I will push a change to the code to link tokensregex to TokensRegexAnnotator, so you won't have to specify it as a custom annotator.
But for now you need to add that line to the .props file.
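If you'd rather configure this in Java than via a .props file, the same settings can be passed programmatically. A minimal sketch (the class name and input sentence are just for illustration):

import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class DegreeExample {
    public static void main(String[] args) {
        // same configuration as the .props file above, set programmatically
        Properties props = new Properties();
        props.setProperty("customAnnotatorClass.tokensregex", "edu.stanford.nlp.pipeline.TokensRegexAnnotator");
        props.setProperty("tokensregex.rules", "sample_degree.rules");
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,tokensregex");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation document = new Annotation("He holds a Bachelor of Laws.");
        pipeline.annotate(document);

        // print each token with the NER label the rule assigned
        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                System.out.println(token.word() + "\t" + token.ner());
            }
        }
    }
}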
This example should help you implement this. Here are some more resources if you want to learn more:
http://nlp.stanford.edu/software/tokensregex.shtml#TokensRegexRules
http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/tokensregex/SequenceMatchRules.html
http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/tokensregex/types/Expressions.html

Is it possible to apply [Rule Chain] after [Data Converter]?

I am currently working on a POC using ThingsBoard PE.
Our raw data contains [Asset] [Attributes].
Data flow:
[IoT cloud] --HTTPS webhook carrying raw data--> [ThingsBoard PE HTTP INTEGRATION] --uplink--> [ThingsBoard PE Data Converter]
My question is: is it possible to apply a [Rule Chain] after the [ThingsBoard PE Data Converter], so that the device can automatically create a relationship with the [Asset] by the [Attribute] instead of the [AssetName]?
Example data after data converter process:
{
  "deviceName": "ABC",
  "deviceType": "temperature",
  "attributes": {
    "asset_id": 6 // <-- the id is used in asset attribute
  },
  "telemetry": {
    "temperature": 39.43
  }
}
Answering your two separate questions:
is it possible to apply [Rule Chain] after [ThingsBoard PE Data Converter]?
Yes, it is possible. Once your data is successfully integrated and you are receiving it, you can access it using the [Input] Rule Node (the default green one that is always there when you create a Rule) and route it to any other node you need.
Therefore, the device can auto create relationship with [Asset] by the [Attribute], instead of [AssetName].
So, you want the relationship to take your custom attribute and use it as the pattern that identifies the Asset you want to create the relationship from.
The PE edition already has the Create Relation Node. However, it seems that, as it stands, it cannot do what you seek (it has no option to specify a custom Asset id).
You have two options:
1. Create a Custom Rule Node that does what you want. For that, check the Rule Node Development page from ThingsBoard. You can use the Create Relation Node as a base and work from there. This can be a longer solution than the next one.
2. Enrich your incoming message's metadata with your desired attribute. The Create Relation Node allows you to use variables from your message's metadata in its Name and Type patterns.
This gives us a workaround for what you want to do: add a Script Transformation Node that copies attributes.asset_id into the metadata, for example as metadata.asset_id, so you can then use it as ${asset_id} in the Name and Type patterns.
For example, the Transform() method of that Script Transformation Node could look something like this:
function Transform(msg, metadata, msgType) {
    // assuming your desired id is msg.attributes.asset_id, add it to the metadata
    metadata.asset_id = msg.attributes.asset_id;
    // return the message, in your case to the Create Relation Node
    return {msg: msg, metadata: metadata, msgType: msgType};
}
Finally, your Rule should be connected like this:
[Input] -> [Script Node] -> [Create Relation Node] -> [...whatever else you like]

How to dynamically set binding type's "formatOptions" and "constraints" in XML with binding?

I have a list of elements (OData set) and use a binding to show this list.
One field is for a quantity value and this value could sometimes need some decimal places.
The requirement is to show only as many decimal places as the OData service provides.
Annotation techniques can't be used.
I 'hacked' something that misuses a formatter to update the type of a binding. But this is a hack, and it cannot be converted to XML views (the reason is the different handling of the scope in which the formatter is called).
So I am searching for a working solution for XML views.
The following code would not work but shows the issue:
new sap.m.Input({ // looking for an XML solution with bindings
    value: {
        path: "Quantity",
        type: new sap.ui.model.type.Float({
            // formatOptions
            maxFractionDigits: "{QuantityDecimals}",
            // ...
        }, {
            // constraints
            minimum: 0
        }),
        // ...
    }
});
The maxFractionDigits: "{QuantityDecimals}" should be dynamic, not a constant value.
Setting formatOptions and constraints dynamically in XML (via bindings or a declared function) is unfortunately not (yet) supported. But IMHO this is a valid enhancement request that app developers would greatly benefit from, since it encourages declarative programming.
I already asked for the same feature some years ago but in a comment at https://github.com/SAP/openui5/issues/2449#issuecomment-474304965 (See my answer to Frank's question "If XMLViews would allow a way to specify the dynamic constraints as well (e.g. enhanced syntax), would that fix the problem?").
Please create a new issue via https://github.com/SAP/openui5/issues/new with a clear description of what kind of problems the feature would resolve and possibly other use cases (You can add a link to my comment). I'll add my 👍 to your GitHub issue, and hopefully others too.
I'll update this answer as soon as the feature is available.
Get your dynamic number from your model and store it in a JS variable.
var nQuantityDecimals = this.getModel().getProperty("/QuantityDecimals");
new sap.m.Input({
    value: {
        path: "Quantity",
        type: new sap.ui.model.type.Float({
            maxFractionDigits: nQuantityDecimals,
            source: {
                groupingSeparator: ",",
                decimalSeparator: ".",
                groupingEnabled: false
            }
        }, {
            minimum: 0
        })
    }
});

Check if entered text is valid in Xtext

Let's say we have a grammar like this:
Model:
greeting+=Greeting*;
Greeting:
'Hello' name=ID '!';
I would like to check whether the text entered for name is valid.
All the valid words are saved in an array.
The array should also be filled with words from a given file.
So is it possible to check this at runtime, and maybe also use these words as suggestions?
Thanks
For this purpose you can use a validator.
A simple video tutorial about it can be found here.
In your case the function in the validator could look like this:
public static val INVALID_NAME = "greeting_InvalidName"

@Check
def nameIsValid(Greeting grt) {
    val name = grt.getName() // or just grt.name
    val validNames = newArrayList
    // add all valid names to this list
    if (!validNames.contains(name)) {
        val errorMsg = "Name is not valid"
        error(errorMsg, GreetingsPackage.eINSTANCE.greeting_Name, INVALID_NAME)
    }
}
You might have to replace the "GreetingsPackage" if your DSL isn't named Greetings.
The static String passed to the error method serves to identify the error. This becomes important when you want to implement quickfixes, which is the second thing you asked for, as they make it possible to give the programmer a few ideas on how to actually fix this particular problem.
Because I don't have any experience with implementing quickfixes myself, I can just give you this as a reference.
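Since the question also asks to fill the array with words from a given file, you can load them once and reuse the set inside the validator. A minimal Java sketch, assuming one word per line in a hypothetical file valid_names.txt:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

final class ValidNames {
    private static Set<String> cached;

    // lazily read the word list so the file is only loaded once
    static synchronized Set<String> get() {
        if (cached == null) {
            cached = new HashSet<>();
            try {
                cached.addAll(Files.readAllLines(Paths.get("valid_names.txt")));
            } catch (IOException e) {
                // unreadable file: fall back to an empty set, so every name is flagged
            }
        }
        return cached;
    }
}

In the validator above, you would then replace newArrayList with ValidNames.get() and keep the contains check as it is.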

Stanford typed dependency, cannot extract "prepositional modifier"

I am trying to extract the prepositional modifier, as stated in the Dependency Manual.
I try to parse the sentence "I saw a cat with a telescope" using the following code:
List<CoreMap> sentences = stanfordDocument.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    Tree tree = sentence.get(TreeAnnotation.class);
    TreebankLanguagePack languagePack = new PennTreebankLanguagePack();
    GrammaticalStructureFactory grammaticalStructureFactory = languagePack.grammaticalStructureFactory();
    GrammaticalStructure structure = grammaticalStructureFactory.newGrammaticalStructure(tree);
    Collection<TypedDependency> typedDependencies = structure.typedDependenciesCollapsed();
    for (TypedDependency td : typedDependencies) {
        System.out.println(td.reln());
    }
}
As stated in the manual, I was expecting to get prep(saw, with).
In the Collection of TypedDependency I only get
"nsubj; root; det; dobj; det; prep_with" as relation types, and not the "prep/prepc" stated in http://robotics.usc.edu/~gkoch/DependencyManual.pdf (page 8).
I have also tried to extract pcomp: prepositional complement (page 7 of the manual), and it doesn't find it.
Has anybody encountered the same problem? Am I doing anything wrong?
In my experience, CoreNLP outputs "collapsed dependencies preserving a tree structure" (section 4.4 of the manual). I think that is what is happening here (e.g. prep_with is the collapsed form of prep(saw, with)).
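If you want the plain prep relation instead of the collapsed prep_with, GrammaticalStructure can also return the basic, uncollapsed dependencies. A minimal sketch against the code above, swapping typedDependenciesCollapsed() for typedDependencies():

// basic (uncollapsed) dependencies keep the plain prep relation
Collection<TypedDependency> basicDependencies = structure.typedDependencies();
for (TypedDependency td : basicDependencies) {
    System.out.println(td); // e.g. prep(saw-2, with-5) and pobj(with-5, telescope-7)
}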

How to create a parser which tokenizes a list of words taken from a file?

I am trying to build a syntax corrector for my compilers class. The idea is: I have some rules inherent to the language (in my case, Portuguese), like "a valid phrase is SUBJECT VERB ADJECTIVE", as in "Ruby is great".
Ok, so first I have to tokenize the input "Ruby is great". So I have a text file "verbs" with a lot of verbs, one per line. Then I have one file "adjectives", one "pronouns", etc.
I am trying to use Ragel to create a parser, but I don't know how I could do something like:
%%{
machine test;
subject = <open-the-subjects-file-and-accept-each-one-of-them>;
verb = <open-the-verbs-file-and-accept-each-one-of-them>;
adjective = <open-the-adjective-file-and-accept-each-one-of-them>;
main = subject verb adjective # { print "Valid phrase!" } ;
}%%
I looked at ANTLR, Lex/Yacc, Ragel, etc., but couldn't find one that seemed to solve this problem. The only way I could think of was to preprocess Ragel's input file, so that my program reads the word files and writes their contents in the right place. But I don't like that solution either.
Does anyone know how I could do this? It doesn't have to be with Ragel; I just want to solve this problem. I would like to use Ruby or Python, but that's not strictly necessary either.
Thanks.
If you want to read the files at compile time, make them be of this format:
subject = \
ruby|\
python|\
c++
then use Ragel's 'include' or 'import' statement (I forget which; check the manual) to import it.
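If your word lists are plain one-word-per-line files, a small generator can rewrite them into that alternation format before Ragel runs. This is essentially the preprocessing step the question mentions, kept to one trivial tool; a hedged Java sketch (the file names subjects.txt and subjects.rl are hypothetical):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

public class WordListToRagel {
    public static void main(String[] args) throws IOException {
        List<String> words = Files.readAllLines(Paths.get("subjects.txt"));
        // join the words into a Ragel alternation: subject = 'ruby'|'python'|...
        String machine = "subject = " + words.stream()
                .map(w -> "'" + w + "'")
                .collect(Collectors.joining("|")) + ";\n";
        Files.write(Paths.get("subjects.rl"), machine.getBytes());
    }
}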
If you want to check the list of subjects at run time, maybe just make Ragel read 3 words, then have an action associated with each word. The action can read the file and look up whether the word is good or not at runtime.
Each action reads the corresponding text file and compares the word's contents:
%%{
    machine test;

    action startWord {
        lastWordStart = p;
    }
    action checkSubject {
        word = input[lastWordStart:p+1]
        for possible in open('subjects.txt'):
            if possible.strip() == word:
                fgoto verb
        # If we get here, do whatever ragel does to go to an error, or just raise a python exception
        raise Exception("Invalid subject '%s'" % word)
    }
    action checkVerb { .. exercise for reader .. ;) }
    action checkAdjective { .. put adjective checking code here .. }

    subject = ws* . (alnum*) >startWord %checkSubject;
    verb := ws* . (alnum*) >startWord %checkVerb;
    adjective := ws* . (alnum*) >startWord %checkAdjective;
    main := subject;
}%%
With Bison, I would write the lexer by hand and have it look up the words in the predefined dictionary.
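In that same spirit (hand-written tokenizer plus dictionary lookup, no parser generator), a minimal, hedged Java sketch; the file names are hypothetical and the phrase structure is fixed to SUBJECT VERB ADJECTIVE:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class PhraseChecker {
    static Set<String> load(String file) throws IOException {
        // one word per line, as in the question's "verbs"/"adjectives" files
        return new HashSet<>(Files.readAllLines(Paths.get(file)));
    }

    public static void main(String[] args) throws IOException {
        Set<String> subjects = load("subjects.txt");
        Set<String> verbs = load("verbs.txt");
        Set<String> adjectives = load("adjectives.txt");

        String[] tokens = "Ruby is great".split("\\s+");
        // a valid phrase is exactly SUBJECT VERB ADJECTIVE
        boolean valid = tokens.length == 3
                && subjects.contains(tokens[0])
                && verbs.contains(tokens[1])
                && adjectives.contains(tokens[2]);
        System.out.println(valid ? "Valid phrase!" : "Invalid phrase.");
    }
}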
