Nokogiri/Ruby array question - ruby-on-rails

I have a quick question. I am currently writing a Nokogiri/Ruby script and have the following code:
fullId = doc.xpath("/success/data/annotatorResultBean/annotations/annotationBean/concept/fullId")
fullId.each do |e|
e = e.to_s()
g.write(e + "\n")
end
This spits out the following text:
<fullId>D001792</fullId>
<fullId>D001792</fullId>
<fullId>D001792</fullId>
<fullId>D008715</fullId>
I wanted the just the numbers text in between the "< fullid>" saved, without the < fullId>,< /fullId> markup. What am I missing?
Bobby

I think you want to use the text() accessor (which returns the child text values), rather than to_s() (which serializes the entire node, as you see here).
I'm not sure what the g object you're calling write on is, but the following code should give you an array containing all of the text in the fullId nodes:
doc.xpath(your_xpath).map {|e| e.text}

Related

How to detect if a field contains a character in Lua

I'm trying to modify an existing lua script that cleans up subtitle data in Aegisub.
I want to add the ability to delete lines that contain the symbol "♪"
Here is the code I want to modify:
-- delete commented or empty lines
function noemptycom(subs,sel)
progress("Deleting commented/empty lines")
noecom_sel={}
for s=#sel,1,-1 do
line=subs[sel[s]]
if line.comment or line.text=="" then
for z,i in ipairs(noecom_sel) do noecom_sel[z]=i-1 end
subs.delete(sel[s])
else
table.insert(noecom_sel,sel[s])
end
end
return noecom_sel
end
I really have no idea what I'm doing here, but I know a little SQL and LUA apparently uses the IN keyword as well, so I tried modifying the IF line to this
if line.text in (♪) then
Needless to say, it didn't work. Is there a simple way to do this in LUA? I've seen some threads about the string.match() & string.find() functions, but I wouldn't know where to start trying to put that code together. What's the easiest way for someone with zero knowledge of Lua?
in is only used in the generic for loop. Your if line.text in (♪) then is no valid Lua syntax.
Something like
if line.comment or line.text == "" or line.text:find("\u{266A}") then
Should work.
In Lua every string have the string functions as methods attached.
So use gsub() on your string variable in loop like...
('Text with ♪ sign in text'):gsub('(♪)','note')
...thats replace the sign and output is...
Text with note sign in text
...instead of replacing it with 'note' an empty '' deletes it.
gsub() is returning 2 values.
First: The string with or without changes
Second: A number that tells how often the pattern matches
So second return value can be used for conditions or success.
( 0 stands for "pattern not found" )
So lets check above with...
local str,rc=('Text with strange ♪ sign in text'):gsub('(♪)','notation')
if rc~=0 then
print('Replaced ',rc,'times, changed to: ',str)
end
-- output
-- Replaced 1 times, changed to: Text with strange notation sign in text
And finally only detect, no change made...
local str,rc=('Text with strange ♪ sign in text'):gsub('(♪)','%1')
if rc~=0 then
print('Found ',rc,'times, Text is: ',str)
end
-- output is...
-- Found 1 times, Text is: Text with strange ♪ sign in text
The %1 holds what '(♪)' found.
So ♪ is replaced with ♪.
And only rc is used as a condition for further handling.

0 Checking if TextBox.Text contains the string in the table. But it doesn't work? Lua

I am making a script inside TextButton script that will check if the TextBox contains any of the word or string inside the table.
text = script.Parent.Parent:WaitForChild('TextBox')
label = script.Parent.Parent:WaitForChild('TextLabel')
a = {'test1','test2','test3'}
script.Parent.MouseButton1Click:connect(function()
if string.match(text.Text, a) then
label.Text = "The word "..text.Text.." was found in the table."
else
label.Text = "The word "..text.Text.." was not found in the table."
end
end)
But it gives an error string expected, got table. from line 7 which is refering to the line if string.match....
Is there any way to get all text in the table?
What's the right way to do it?
Oh boy, there's a lot to say about this.
The error message
Yes.
No, seriously, the answer is yes. The error message is exactly right. a is a table value; you can clearly see that on the third line of code. string.match needs a string as its second argument, so it obviously crashes.
Simple solution
use a for loop and check for each string in a separately.
found = false
for index, entry in ipairs(a) do
if entry == text.Text then
found = true
end
end
if found then
... -- the rest of your code
The better* solution
In Lua, if we want to know if a single element is in a set, we usually take advantage of the fact that tables are implemented as hashmaps, meaning they are very fast when looking up keys.
For that to work, one first needs to change the way the table looks:
a = {["test1"] = true, ["test2"] = true, ["test3"] = true}
Then we can just index a with a string to find out if it is contained int eh set.
if a[text.Text] then ...
* In practice this is just as good as the first solution as long as you only have a few elements in your table. It only becomes relevant when you have a few hundred entries or your code needs to run absolutely as fast as possible.

Confirming existence of a string in an xml table Lua

Good afternoon everyone,
My problem is that I have 2 XML lists
<List1> <Agency>String</Agency> </List1>
and
<List2><Agency2>String</Agency2><List2>.
In Lua I need to create a program which is parsing this list and when the user inputs a matching string from List 1 or List 2, the program needs to actually confirm to the user if the string belongs to either L1 or L2 or if the string is inexistent. I'm new to Lua and to programming generally speaking and I would be very grateful for you answers. I have LuaExpat as a plugin but I can't seem to be able to actually read from file, I can only do some beginner tricks if the xml list is written in the code. At a later time this small program will be fed by an RSS.
require("lxp")
local stuff = {}
xmldata="<Top><A/> <B a='1'/> <B a='2'/><B a='3'/><C a='3'/></Top>"
function doFunc(parser, name, attr)
if not (name == 'B') then return end
stuff[#stuff+1]= attr
end
local xml = lxp.new{StartElement = doFunc}
xml:parse(xmldata)
xml:close()
print(stuff[3].a)
This code is a tutorial over the web that works, everything is just fine it prints nr. 3. Now I want to know how to do that from an actual file, as if I input io.read:(file, "r" or "rb" ) under xmldata variable and run the same thing it returns either empty space or nil.

Nokogiri pull parser (Nokogiri::XML::Reader) issue with self closing tag

I have a huge XML(>400MB) containing products. Using a DOM parser is therefore excluded, so i tried to parse and process it using a pull parser. Below is a snippet from the each_product(&block) method where i iterate over the product list.
Basically, using a stack, i transform each <product> ... </product> node into a hash and process it.
while (reader.read)
case reader.node_type
#start element
when Nokogiri::XML::Node::ELEMENT_NODE
elem_name = reader.name.to_s
stack.push([elem_name, {}])
#text element
when Nokogiri::XML::Node::TEXT_NODE, Nokogiri::XML::Node::CDATA_SECTION_NODE
stack.last[1] = reader.value
#end element
when Nokogiri::XML::Node::ELEMENT_DECL
return if stack.empty?
elem = stack.pop
parent = stack.last
if parent.nil?
yield(elem[1])
elem = nil
next
end
key = elem[0]
parent_childs = parent[1]
# ...
parent_childs[key] = elem[1]
end
The issue is on self-closing tags (EG <country/>), as i can not make the difference between a 'normal' and a 'self-closing' tag. They both are of type Nokogiri::XML::Node::ELEMENT_NODE and i am not able to find any other discriminator in the documentation.
Any ideas on how to solve this issue?
There is a feature request on project page regarding this issue (with the corresponding failing test).
Until it will be fixed and pushed into the current version, we'll stick with good'ol
input_text.gsub! /<([^<>]+)\/>/, '<\1></\1>'
Hey Vlad, well I am not a Nokogiri expert but I've done a test and saw that the self_closing?() method works fine on determining the self closing tags. Give it a try.
P.S. : I know this is an old post :P / the documentation is HERE

Truncate Markdown?

I have a Rails site, where the content is written in markdown. I wish to display a snippet of each, with a "Read more.." link.
How do I go about this? Simple truncating the raw text will not work, for example..
>> "This is an [example](http://example.com)"[0..25]
=> "This is an [example](http:"
Ideally I want to allow the author to (optionally) insert a marker to specify what to use as the "snippet", if not it would take 250 words, and append "..." - for example..
This article is an example of something or other.
This segment will be used as the snippet on the index page.
^^^^^^^^^^^^^^^
This text will be visible once clicking the "Read more.." link
The marker could be thought of like an EOF marker (which can be ignored when displaying the full document)
I am using maruku for the Markdown processing (RedCloth is very biased towards Textile, BlueCloth is extremely buggy, and I wanted a native-Ruby parser which ruled out peg-markdown and RDiscount)
Alternatively (since the Markdown is translated to HTML anyway) truncating the HTML correctly would be an option - although it would be preferable to not markdown() the entire document, just to get the first few lines.
So, the options I can think of are (in order of preference)..
Add a "truncate" option to the maruku parser, which will only parse the first x words, or till the "excerpt" marker.
Write/find a parser-agnostic Markdown truncate'r
Write/find an intelligent HTML truncating function
Write/find an intelligent HTML truncating function
The following from http://mikeburnscoder.wordpress.com/2006/11/11/truncating-html-in-ruby/, with some modifications will correctly truncate HTML, and easily allow appending a string before the closing tags.
>> puts "<p><b>Something</p>".truncate_html(5, at_end = "...")
=> <p><b>Someth...</b></p>
The modified code:
require 'rexml/parsers/pullparser'
class String
def truncate_html(len = 30, at_end = nil)
p = REXML::Parsers::PullParser.new(self)
tags = []
new_len = len
results = ''
while p.has_next? && new_len > 0
p_e = p.pull
case p_e.event_type
when :start_element
tags.push p_e[0]
results << "<#{tags.last}#{attrs_to_s(p_e[1])}>"
when :end_element
results << "</#{tags.pop}>"
when :text
results << p_e[0][0..new_len]
new_len -= p_e[0].length
else
results << "<!-- #{p_e.inspect} -->"
end
end
if at_end
results << "..."
end
tags.reverse.each do |tag|
results << "</#{tag}>"
end
results
end
private
def attrs_to_s(attrs)
if attrs.empty?
''
else
' ' + attrs.to_a.map { |attr| %{#{attr[0]}="#{attr[1]}"} }.join(' ')
end
end
end
Here's a solution that works for me with Textile.
Convert it to HTML
Truncate it.
Remove any HTML tags that got cut in half with
html_string.gsub(/<[^>]*$/, "")
Then, uses Hpricot to clean it up and close unclosed tags
html_string = Hpricot( html_string ).to_s
I do this in a helper, and with caching there's no performance issue.
You could use a regular expression to find a line consisting of nothing but "^" characters:
markdown_string = <<-eos
This article is an example of something or other.
This segment will be used as the snippet on the index page.
^^^^^^^^^^^^^^^
This text will be visible once clicking the "Read more.." link
eos
preview = markdown_string[0...(markdown_string =~ /^\^+$/)]
puts preview
Rather than trying to truncate the text, why not have 2 input boxes, one for the "opening blurb" and one for the main "guts". That way your authors will know exactly what is being show when without having to rely on some sort of funkly EOF marker.
I will have to agree with the "two inputs" approach, and the content writer would need not to worry, since you can modify the background logic to mix the two inputs in one when showing the full content.
full_content = input1 + input2 // perhaps with some complementary html, for a better formatting
Not sure if it applies to this case, but adding the solution below for the sake of completeness. You can use strip_tags method if you are truncating Markdown-rendered contents:
truncate(strip_tags(markdown(article.contents)), length: 50)
Sourced from:
http://devblog.boonecommunitynetwork.com/rails-and-markdown/
A simpler option that just works:
truncate(markdown(item.description), length: 100, escape: false)

Resources