ROUGE evaluation method gives zero value - summarization

I have set all parameters as discribed in http://kavita-ganesan.com/rouge-howto. But I get zero values of precision recall and f-1. Please, Help me what can i do?

If you have set all parameters right and are not getting any error while running rouge then probably you are doing the following mistake while making your summary files in html format.
rouge does not handle whitespaces properly
thus
<a name="1">[1]</a> <a href="#1" id= 1>
<a name="1">[1]</a> <a href="#1" id=1>
are not the same
In the first case you will not be shown any error but the output will be zero. In second case you get the right output.
Hope this helps..

The settings.xml file should look something like this:
<ROUGE_EVAL version="1.5.5">
<EVAL ID="1">
<PEER-ROOT>systems</PEER-ROOT>
<MODEL-ROOT>models</MODEL-ROOT>
<INPUT-FORMAT TYPE="SPL" />
<PEERS>
<P ID="1">peer.txt</P>
</PEERS>
<MODELS>
<M ID="A">modelA.txt</M>
<M ID="B">modelB.txt</M>
<M ID="C">modelC.txt</M>
<M ID="D">modelD.txt</M>
</MODELS>
</EVAL>
</ROUGE_EVAL>
Although your input format type might be different, I find that SPL works for .txt and SEE is for HTML.
One thing that tripped me up was: <M ID="A">modelA.txt</M>, I had it as <P ID="A">modelA.txt</P>, ROUGE didn't complain, it just came out with 0 for every value. So keep an eye out for small things like that.

Related

Carrying style IDs/names from HTML to .docx?

Is it possible to somehow tell pandoc to carry the names of styles from original HTML to .docx?
I understand that in order to tune the actual styles, I should be using reference.docx file generated by pandoc. However, reference.docx is limited to what styles it has to: headings, body text, block text, etc.
I'd like to:
specify "myStyle" style in the input HTML (via a "class" attribute, via any other HTML attribute or even via a filter code written in Lua),
<html>
<body>
<p>Hello</p>
<p class="myStyle">World!</p>
</body>
</html>
add a custom "myStyle" to reference.docx using Word,
run a html->docx conversion an expect pandoc generate a paragraph element with "myStyle" (instead of BodyText, which I believe it sets by default), so the end result looks like this (contents of word/document.xml inside the resulting output.docx was cut for brevity):
<w:p>
<w:pPr>
<w:pStyle w:val="BodyText" />
</w:pPr>
<w:r>
<w:txml:space="preserve">Hello</w:t>
</w:r>
</w:p>
<w:p>
<w:pPr>
<w:pStyle w:val="myStyle" />
</w:pPr>
<w:r>
<w:txml:space="preserve">World!</w:t>
</w:r>
</w:p>
There's some evidence styleId can be passed around, but I don't really understand it and am unable to find any documentation about it.
Doc on filtering in Lua states you can access attrs when manipulating a pandoc.div, but it says nothing about whether any of the attrs will be interpreted by pandoc in any meaningful way.
Finally, found what I needed – Custom styles. It's limited, but better than what I arrived earlier, and of course much better than nothing at all :)
I'll leave a step-by-step guide here in case anyone stumbles upon a similar question.
First, generate a reference.docx file like this:
pandoc --print-default-data-file reference.docx > styles.docx
Then open the file in MS Word (I was using a macOS version) you'll see this:
Click the "New style..." button on the right, and create a style to your liking. In my case I made change the style of text to be bold, in blue color:
Since I am converting from HTML to DOCX, here's my input.html:
<html>
<body>
<div>Page 1</div>
<div custom-style="eugene-is-testing">Page 2</div>
<div>Page 3</div>
</body>
</html>
Run:
pandoc --standalone --reference-doc styles.docx --output output.docx input.html
Finally, enjoy the result:

How do I escape HTML for th:errors?

Let's say I have:
<span th:if="${#fields.hasErrors('firstName')}" class="color--error" th:errors="*{firstName}"></span>
How do I escape the text if the error text contains HTML? I know for normal text, we can use th:utext.
As of 3.0.8-SNAPSHOT, Thymeleaf-Spring has th:uerrors.
See this GitHub issue for the discussion: https://github.com/thymeleaf/thymeleaf-spring/issues/153
And this change log for 3.0.8: http://forum.thymeleaf.org/Thymeleaf-3-0-8-JUST-PUBLISHED-td4030687.html
th:errors is just a shortcut. You still use th:utext for this, you just have to manually output your errors. In your case, the code could look something like:
<div th:if="${#fields.hasErrors('firstName')}" th:each="err: ${#fields.errors('firstName')}" th:utext="${err}" class="color--error" />

google translate misses up the coding of my file

i am trying to use google translate for localization of an XML file, it has near 350K lines, but some of them contain coding for in-game font size and color, like so:
<replacement><p horizontalalignment="center"><br/><image enablescale="false" imagesetpath="00015590.InterD_Jeryoung_3"/><br/><image enablescale="true" imagesetpath="00015590.Tag_Dungeon_Six_Superior" scalerate="1.5"/><image enablescale="true" imagesetpath="00015590.Tag_Dungeon_Four_Superior" scalerate="1.5"/><br/><image enablescale="true" imagesetpath="00009499.Field_Boss" scalerate="1.4"/>Хмельной лик<br/><br/></p>Уничтожить зараженных насекомых<br/>возле мест обитания их королевы。<br/></replacement>
now for god knows what reason, google translate alters that code in the process of translation into some unacceptable coding, like so:
<replacement> <p horizontalalignment="center"> <br/> <image enablescale="false" imagesetpath="00015590.InterD_Jeryoung_3"/> <br/> <image enablescale = "true "imagesetpath =" 00015590.Tag_Dungeon_Six_Superior "scalerate =" 1.5 "/> <image enablescale="true" imagesetpath="00015590.Tag_Dungeon_Four_Superior" scalerate="1.5"/> <br/> <image enablescale = "true" imagesetpath = "00009499.Field_Boss" scalerate = "1.4" /> Intoxicated face <br/> <br/> </ p> Destroy infected insects <br/> habitats near their queen. <br/> </ replacement>
is there any way to avoid that, why is it happening exactly? anyhelp is appreciated on that matter,thanks
EDIT : i am also looking for a way to input my text and have it out in the same exact language with only the coding mishaps changing, so i can isolate those,build a comparison table and then use that to fix the errors after the actual translation is done, but i don't see a way for selecting the same language as input AND output in google translate, it always forces me choose a different one in input or output, kind of makes sense but if there is a way to do that, i might be able to work around it..
Do not feed Google translate with your Xml file, as far as I know it doesn't understand Xml.
Extract the text from the Xml file.
Feed the text to translate.
Transform the text back to Xml.
You could simply transform the Xml to a text document with a single line per Xml element so it would be easier to turn it back into Xml.
More detail
According to the Toolkit you can upload:
HTML (.HTML)
Microsoft Word (.DOC/.DOCX)
OpenDocument Text (.ODT)
Plain Text (.TXT)
Rich Text (.RTF)
Wikipedia URLs
And a couple of extras such as JSON. So no Xml.
The best way I see is to transform your Xml document into one of these types (I would probably use JSON) and transform it is such a way that it can easily be transformed back again by using either position (1 line in the text file is the first element in the Xml document) or by an id (add the Id or position of the element in the xml hierarchy to the JSON element)
My guess is that the toolkit recognizes the html tags in the xml and escapes them. So another option might be to un-escape the > to > and &lt to <

xpath with contains throws error if string starts with a number

I'm running into a strange problem with nokogiri and xpath. I want to parse a HTML document and get all links by href value and the anchor text they contain.
Here's my xpath so far:
xpath = "//a[contains(text(), #{link['anchor_text']}) and #href='#{link['target_url']}']"
a = doc.search(xpath)
This works fine so far as long as link['anchor_text'] is a string without numbers.
If I'm trying to get a link with the anchor text "11example" it throws the following error:
Invalid expression: //a[contains(text(), 11example) and #href='http://www.example.com/']
Maybe it's just a stupid mistake, but I'm not seeing why this error occurs. If I put some quotes around the #{link['anchor_text']} in the xpath, nothing is working.
Edit: Here's the sample HTML:
<!DOCTYPE html>
<head>
<title>Example.com</title>
</head>
<body>
<p>
<strong>Here is some text</strong><br />
11exampleSome text here and there
</p>
<p>
<strong>Another text</strong><br />
example.comSome text here and there
</p>
</body>
Edit2: If I run these queries manually in irb console everything works as expected, but only if I put the text in quotes.
Thanks in advance!
Kind regards,
madhippie
The simple answer is that you are missing quotes around #{link['anchor_text']}, like you have around #{link['target_url']}. The full XPath should be
xpath = "//a[contains(text(), '#{link['anchor_text']}') and #href='#{link['target_url']}']"
The reason it appears to work (at least not produce an error) when you don’t start with a number is that the string is being interpreted as a node query. For example Nokogiri is looking for a tag named <example.com> inside the <a> tag, then converting it to a string and seeing if the text nodes of the <a> tag contain that string. If the tag isn’t there (as in this case) then the result of contains is always true.
As a demonstration, with the HTML:
<q>foo</q>example
<q>foo</q>foo
foo
Then the query
doc.search("//a[contains(text(), q)]")
doesn’t match the first <a> tag, but does match the second and third.
When the string starts with a number, it can’t be parsed into a node query since names starting with digits aren’t valid XML (or HTML) element names, so you get an error.

MODX: Snippet strips and hangs string when parsing the vars

i have a snippet call like this:
[!mysnippet?&content=`[*content*]` !]
What happen is that, if i send some html like this:
[!mysnippet?&content=`<p color='red'>Yeah</p>` !]
it will return this:
<p colo
the [test only] snippet code (mysnippet) is:
<?php
return $content;
?>
Why is this happening?
My actual snippet is converting html to pdf, so i really need this.
Thank you all ;D
EDIT: I'm using Modx Evo 1.0.2
MODx Evolution has a limitation whereby you can't use "=" (equals signs) in Snippet parameter values. Best solution is to place the content in a chunk or TV and then call it. This is not an issue in MODx Revolution.

Resources