Using go-html-transform to preprocess HTML: Replace fails

Using go-html-transform to preprocess HTML: Replace fails - html-parsing

Following on from this question on whitelisting HTML tags, I've been experimenting with Jeremy Wall's go-html-transform. In the hopes of improving searchable documentation I'm asking this here rather than pestering the author directly... hopefully this isn't too tool-specific for SO.
App Engine, latest SDK. Post.Body is a []byte. This works:
package posts
import (
// ...
"html/template"
"code.google.com/p/go-html-transform/html/transform"
"code.google.com/p/go-html-transform/h5"
)
// ...
// Pre-process post body, then return it to the template as HTML()
// to avoid html/template's escaping allowable tags
func (p *Post) BodyHTML() template.HTML {
doc, _ := transform.NewDoc(string(p.Body))
t := transform.NewTransform(doc)
// Add some text to the end of any <strong></strong> nodes.
t.Apply(transform.AppendChildren(h5.Text("<em>Foo</em>")), "strong")
return template.HTML(t.String())
}
Result:
<strong>Blarg.<em>Foo</em></strong>
However, if instead of AppendChildren() I use something like the following:
t.Apply(transform.Replace(h5.Text("<em>Foo</em>")), "strong")
I get an internal server error. Have I misunderstood the use of Replace()? The existing documentation suggests this sort of thing should be possible.

Running your transform code outside of App Engine, it panics and you can see a TODO in the source at that point. Then it's not too much harder to read the code and see that it's going to panic if given a root node.

Related

How do I parse wikitext using built-in mediawiki support for lua scripting?

The wiktionary entry for faint lies at https://en.wiktionary.org/wiki/faint
The wikitext for the etymology section is:
From {{inh|en|enm|faynt}}, {{m|enm|feynt||weak; feeble}}, from
{{etyl|fro|en}} {{m|fro|faint}}, {{m|fro|feint||feigned; negligent;
sluggish}}, past participle of {{m|fro|feindre}}, {{m|fro|faindre||to
feign; sham; work negligently}}, from {{etyl|la|en}}
{{m|la|fingere||to touch, handle, usually form, shape, frame, form in
thought, imagine, conceive, contrive, devise, feign}}.
It contains various templates of the form {{xyz|...}}
I would like to parse them and get the text output as it shows on the page:
From Middle English faynt, feynt (“weak; feeble”), from Old French
faint, feint (“feigned; negligent; sluggish”), past participle of
feindre, faindre (“to feign; sham; work negligently”), from Latin
fingere (“to touch, handle, usually form, shape, frame, form in
thought, imagine, conceive, contrive, devise, feign”).
I have about 10000 entries extracted from the freely available dumps of wiktionary here.
To do this, my thinking is to extract templates and their expansions (in some form). To explore the possibilites I've been fiddling with the lua scripting facility on mediawiki. By trying various queries inside the debug console on edit pages of modules, like here:
https://en.wiktionary.org/w/index.php?title=Module:languages/print&action=edit
mw.log(p)
>> table
mw.logObject(p)
>> table#1 {
["code_to_name"] = function#1,
["name_to_code"] = function#2,
}
p.code_to_name("aaa")
>>
p.code_to_name("ab")
>>
But, I can't even get the function calls right. p.code_to_name("aaa") doesn't return anything.
The code that presumably expands the templates for the etymology section is here:
https://en.wiktionary.org/w/index.php?title=Module:etymology/templates
How do I call this code correctly?
Is there a simpler way to achieve my goal of parsing wikitext templates?
Is there some function available in mediawiki that I can call like "parse-wikitext("text"). If so, how do I invoke it?

To expand templates (and other stuff) in wikitext, use frame.preprocess, which is called as a method on a frame object. To get a frame object, use mw.getCurrentFrame. For instance, type = mw.getCurrentFrame():preprocess('{{l|en|word}}') in the console to get the wikitext resulting from {{l|en|word}}. That currently gives <span class="Latn" lang="en">[[word#English|word]]</span>.
You can also use the Expandtemplates action in the MediaWiki API ( https://en.wiktionary.org/w/api.php?action=expandtemplates&text={{l|en|word}}), or the Special:ExpandTemplates page, or JavaScript (if you open the browser console while browsing a Wiktionary page):
new mw.Api().get({
action: 'parse',
text: '{{l|en|word}}',
title: mw.config.values.wgPageName,
}).done(function (data) {
const wikitext = data.parse.text['*'];
if (wikitext)
console.log(wikitext);
});
If the mw.api library hasn't already been loaded and you get a TypeError ("mw.Api is not a constructor"):
mw.loader.using("mediawiki.api", function() {
// Use mw.Api here.
});
So these are some of the ways to expand templates.

firefox addon webrequest.addListener misbehaving

I want to examine http requests in an extension for firefox. To begin figuring out how to do what I want to do I figured I'd just log everything and see what comes up:
webRequest.onResponseStarted.addListener(
(stuff) => {console.log(stuff);},
{urls: [/^.*$/]}
);
The domain is insignificant, and I know the regex works, verified in the console. When running this code I get no logging. When I take out the filter parameter I get every request:
webRequest.onResponseStarted.addListener(
(stuff) => {console.log(stuff);}
);
Cool, I'm probably doing something wrong, but I can't see what.
Another approach is to manually filter on my own:
var webRequest = Components.utils.import("resource://gre/modules/WebRequest.jsm", {});
var makeRequest = function(type) {
webRequest[type].addListener(
(stuff) => {
console.log(!stuff.url.match(/google.com.*/));
if(!stuff.url.match(/google.com.*/))
return;
console.log(type);
console.log(stuff);
}
);
}
makeRequest("onBeforeRequest");
makeRequest("onBeforeSentHeaders");
makeRequest("onSendHeaders");
makeRequest("onHeadersReceived");
makeRequest("onResponseStarted");
makeRequest("onCompleted");
With the console.log above the if, I can see the regex returning true when I want it to and the code making it past the if. When I remove the console.log above the if the if no longer gets executed.
My question is then, how do I get the filtering parameter to work or if that is indeed broken, how can I get the code past the if to be executed? Obviously, this is a fire hose, and to begin searching for a solution I will need to reduce the data.
Thanks

urls must be a string or an array of match patterns. Regular expressions are not supported.
WebRequest.jsm uses resource://gre/modules/MatchPattern.jsm. Someone might get confused with the util/match-pattern add-on sdk api, which does support regular expressions.

Prestashop all translatable-field display none for product page

Just new in Prestashop (1.6.0.6), I've a problem with my product page in admin. All translatable-field are to display:none (I inspect the code with chrome).
So when I want to create a new product I can't because the name field is required.
I thought that it was simple to find the .js whose do that but it isn't.
If somebody could help me, I would be happy.
Thank you for your help
Hi,
I make some searches and see that the function hideOtherLanguage(id) hide and show translatable-field element.
function hideOtherLanguage(id)
{
console.log(id_language);
$('.translatable-field').hide();
$('.lang-' + id).show();
var id_old_language = id_language;
id_language = id;
if (id_old_language != id)
changeEmployeeLanguage();
updateCurrentText();
}
When I set the Id to 1 (default language), it works. It seems that when I load the page, the function is called twice and the last calling, the id value is undefined. So the show() function will not work.
If somebody could help me. Thank you.
In my console, I see only one error
undefined is not a function.
under index.php / Line 1002
...
$("#product_form").validate({
...
But I find the form.tpl template and set this lines in comment but nothing change.

EDIT: According to comment on this link http://forge.prestashop.com/browse/PSCFV-2928 this can possibly be caused by corrupted installation file(s) - so when on clean install - try to re-download and reinstall...
...otherwise:
I got into a similar problem - in module admin page, when creating configuration form using PrestaShop's HelperForm. I will provide most probable cases and their possible solutions.
The solution for HelperForm was tested on PS 1.6.0.14
Generally there are 2 cases when this will happen.
First, you have to check what html you recieve.
=> Display source code - NOT in developer tools/firebug/etc...!
=> I really mean the pure recieved (JavaScript untouched) html.
Check if your translatable-fields have already the inline style "display: none":
Case 1 - fields already have inline style(s) for "display: none"
This means the template/html was already prepared this way - most probably in some TPL file I saw codes similar to these:
<div class="translatable-field lang-{$language.id_lang}"
{if $language.id_lang != $id_lang_default}style="display:none"{/if}>
Or particularly in HelperForm template:
<div class="translatable-field lang-{$language.id_lang}"
{if $language.id_lang != $defaultFormLanguage}style="display:none"{/if}>
Case 1 is the most easy to solve, you just have to find, where to set this default language.
Solutions
HelperForm
Look where you've (or someone else) prepared the HelperForm object - something like:
$formHelper = new HelperForm();
...
Somewhere there will be something like $formHelper->default_form_language = ...;
My wrong first solution was to get default form language from context - which might not be set:
$this->context->controller->default_form_language; //THIS IS WRONG!
The correct way is to get the default language from configuration - something like:
$default_lang = new Language((int)Configuration::get('PS_LANG_DEFAULT'));
$formHelper->default_form_language = $default_lang->id;
...this particularly solved my problem...
Other form-creations
If there is something else than HelperForm used for form creations, the problem is still very similar.
You have to find where in files(probably tpls) is a condition for printing display:none for your case - then find where is the check-against-variable set and set it correctly yourself.
Case 2 - fields don't have inline style(s) for "display: none"
This means it is done after loading HTML by JavaScript. There are two options:
There is a call for hideOtherLanguage(), but there is wrongly set input language - that means no language will be displayed and all hidden.Solution for this one can be often solved by solving Case 1 (see above). In addition there can be programming error in not setting the after-used language id variable at all... then you would have to set it yourself (assign in JavaScript).
Some script calls some sort of .hide() on .translatable-field - you will have to search for it the hard way and remove/comment it out.
PS: Of course you can set the language to whatever you want, it is just common to set it to default language, because it is the most easier and the most clear way how to set it.

Why Won't this Clipboard Code Pass Mozilla Validation?

Salve! When I try Mozilla's Validator on my addon, it get the following error related to my treatment of clipboard usage:
nsITransferable has been changed in Gecko 16.
Warning: The nsITransferable interface has changed to better support
Private Browsing Mode. After instantiating the object, you should call
the init function on it before any other functions are called.
See https://developer.mozilla.org/en-US/docs/Using_the_Clipboard for more
information.
var trans = Components.classes["#mozilla.org/widget/transferable;1"].createInstance(Components.interfaces.nsITransferable);
if ('init' in trans){ trans.init(null);};
I can't understand this.
Here is my code - I am clearly calling trans.init:
var clip = Components.classes["#mozilla.org/widget/clipboard;1"].getService(Components.interfaces.nsIClipboard);
if (!clip) return "";
var trans = Components.classes["#mozilla.org/widget/transferable;1"].createInstance(Components.interfaces.nsITransferable);
if ('init' in trans){ trans.init(null);}; //<--IT DOESN'T LIKE THIS
if (!trans) return false;
trans.addDataFlavor("text/unicode");
I've also tried the Transferable function from Mozilla's example here, but get the same non-validation report.
One of the Mozilla AMO editors told me to write exactly this, and it still doesn't validate.
I've also tried, simply:
var trans = Components.classes["#mozilla.org/widget/transferable;1"].createInstance(Components.interfaces.nsITransferable);
trans.init(null); //<---LOOK HERE
if (!trans) return false;
trans.addDataFlavor("text/unicode");
The Validator does not report any errors - just this warning. Everything works properly. Mozilla updated their Gecko engine, and they want devlopers to match up to the new standard.
In my usage, we want to be able to use the contents of the clipboard that was probably gotten from outside the application, too, so we do want to call the init function with null instead of window.
Any advice would be wonderful!

trans.init(null) is valid in some circumstances, such as yours. It can also cause privacy leaks if used in the wrong circumstances, so the validator flags all uses of it as potentially requiring changing. Therefore, it is a warning that you can ignore in this case.

Nokogiri pull parser (Nokogiri::XML::Reader) issue with self closing tag

I have a huge XML(>400MB) containing products. Using a DOM parser is therefore excluded, so i tried to parse and process it using a pull parser. Below is a snippet from the each_product(&block) method where i iterate over the product list.
Basically, using a stack, i transform each <product> ... </product> node into a hash and process it.
while (reader.read)
case reader.node_type
#start element
when Nokogiri::XML::Node::ELEMENT_NODE
elem_name = reader.name.to_s
stack.push([elem_name, {}])
#text element
when Nokogiri::XML::Node::TEXT_NODE, Nokogiri::XML::Node::CDATA_SECTION_NODE
stack.last[1] = reader.value
#end element
when Nokogiri::XML::Node::ELEMENT_DECL
return if stack.empty?
elem = stack.pop
parent = stack.last
if parent.nil?
yield(elem[1])
elem = nil
next
end
key = elem[0]
parent_childs = parent[1]
# ...
parent_childs[key] = elem[1]
end
The issue is on self-closing tags (EG <country/>), as i can not make the difference between a 'normal' and a 'self-closing' tag. They both are of type Nokogiri::XML::Node::ELEMENT_NODE and i am not able to find any other discriminator in the documentation.
Any ideas on how to solve this issue?

There is a feature request on project page regarding this issue (with the corresponding failing test).
Until it will be fixed and pushed into the current version, we'll stick with good'ol
input_text.gsub! /<([^<>]+)\/>/, '<\1></\1>'

Hey Vlad, well I am not a Nokogiri expert but I've done a test and saw that the self_closing?() method works fine on determining the self closing tags. Give it a try.
P.S. : I know this is an old post :P / the documentation is HERE

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Using go-html-transform to preprocess HTML: Replace fails - html-parsing

Running your transform code outside of App Engine, it panics and you can see a TODO in the source at that point. Then it's not too much harder to read the code and see that it's going to panic if given a root node.

Related

How do I parse wikitext using built-in mediawiki support for lua scripting?

firefox addon webrequest.addListener misbehaving

Prestashop all translatable-field display none for product page

Why Won't this Clipboard Code Pass Mozilla Validation?

Nokogiri pull parser (Nokogiri::XML::Reader) issue with self closing tag

Categories

Resources