Arel producing different SQL output - ruby-on-rails

I'm writing a gem for Rails that makes use of Arel. I have run into a case where the output generated by a subnode is different depending on how the output is generated. Below is a test case. Is this a bug or am I doing something wrong?
it 'should generate the same output' do
def ts_language; 'english'; end
def searchable_columns; [:id, :name]; end
def ts_column_sets
column_sets = if searchable_columns[0].is_a?(Array)
searchable_columns.map do |array|
array.map { |c| Table.new(:users)[c] }
end
else
searchable_columns.map { |c| Table.new(:users)[c] }.map { |x| [x] }
end
end
def ts_vectors
ts_column_sets.map do |columns|
coalesce = columns[1..-1].inject(columns[0]) do |memo, column|
Arel::Nodes::InfixOperation.new('||', memo, column)
end
coalesce = Arel::Nodes::InfixOperation.new('::', Arel::Nodes::NamedFunction.new('COALESCE', [coalesce, '']), Arel::Nodes::SqlLiteral.new('text'))
Arel::Nodes::NamedFunction.new('to_tsvector', [ts_language, coalesce])
end
end
def ts_query(query)
querytext = query.is_a?(Array) ? query.map(&:to_s).map(&:strip) : query.to_s.strip.split(" ")
querytext = querytext[1..-1].inject(querytext[0]) { |memo, c| memo + ' & ' + c }
querytext << ':*'
querytext = Arel::Nodes::InfixOperation.new('::', querytext, Arel::Nodes::SqlLiteral.new('text'))
Arel::Nodes::NamedFunction.new('to_tsquery', ['english', querytext])
end
node = Arel::Nodes::InfixOperation.new('##', ts_vectors[0], ts_query(0))
assert_equal 'to_tsvector(\'english\', COALESCE("users"."id", 0) :: text)', node.left.to_sql
assert_equal 'to_tsquery(\'english\', \'0:*\' :: text)', node.right.to_sql
assert_equal 'to_tsvector(\'english\', COALESCE("users"."id", 0) :: text) ## to_tsquery(0, \'0:*\' :: text)', node.to_sql
end
The line
assert_equal 'to_tsquery(\'english\', \'0:*\' :: text)', node.right.to_sql
is correct but the line
assert_equal 'to_tsvector(\'english\', COALESCE("users"."id", 0) :: text) ## to_tsquery(0, \'0:*\' :: text)', node.to_sql
results in an error and the output is:
'to_tsvector(\'english\', COALESCE("users"."id", 0) :: text) ## to_tsquery(0, 0 :: text)'


Ruby hash path to each leaf

First of all, I beg your pardon if this question already exists. I searched thoroughly for a solution here but haven't been able to find one, which seems strange for such a common problem.
My struggle is the following: given a hash, I need to return all the PATHS to each leaf as an array of strings; so, for example:
{:a=> 1} gives ['a']
{:a=>{:b=>3, :c=>4}} returns an array with two results: ["a.b", "a.c"]
{:a=>[1, {:b=>2}]} will result in ["a.0", "a.1.b"]
and so on...
I have found only partial solutions, each dozens of lines long, like this one:
def pathify
self.keys.inject([]) do |acc, element|
return acc if element.blank?
if !(element.is_a?(Hash) || element.is_a?(Array))
if acc.last.is_a?(Array)
acc[acc.size-1] = acc.last.join('.')
else
acc << element.to_s
end
end
if element.is_a?(Hash)
element.keys.each do |key|
if acc.last.is_a?(Array)
acc.last << key.to_s
else
acc << [key.to_s]
end
element[key].pathify
end
end
if element.is_a?(Array)
acc << element.map(&:pathify)
end
acc
end
end
But it does not work in all cases and is extremely inefficient. Summarizing: is there any way to "pathify" a hash so that it returns the paths to every leaf as an array of strings?
Thank you for the help!
Edited
Adding some specs
for {} it returns []
for {:a=>1} it returns ["a"]
for {:a=>1, :b=>1} it returns ["a", "b"]
for {:a=>{:b=>1}} it returns ["a.b"] (FAILED - 1) got: ["a"]
for {:a=>{:b=>1, :c=>2}} it returns ["a.b", "a.c"] (FAILED - 2) got: ["a"]
for {:a=>[1]} it returns ["a.0"] (FAILED - 3) got: ["a"]
for {:a=>[1, "b"]} it returns ["a.0", "a.1"] (FAILED - 4) got: ["a"]
def show(key, path)
if path.is_a? Array
path.map {|p| "#{key}.#{p}"}
else
path == "" ? key.to_s : "#{key}.#{path}"
end
end
def pathify(input)
if input.is_a? Hash
input.map do |k,v|
sub_path = pathify(v)
show(k, sub_path)
end.flatten
elsif input.is_a? Array
input.map.with_index do |v, i|
sub_path = pathify(v)
show(i, sub_path)
end.flatten
else
""
end
end
def leaf_paths(enum)
return unless [Hash, Array].include? enum.class
[].tap do |result|
if enum.is_a?(Hash)
enum.each { |k, v| result = attach_leaf_paths(k, v, result) }
elsif enum.is_a?(Array)
enum.each_with_index { |elem, index| result = attach_leaf_paths(index, elem, result) }
end
end
end
def attach_leaf_paths(key, value, result)
if (children = leaf_paths(value))
children.each { |child| result << "#{key}.#{child}" }
else
result << key.to_s
end
result
end
This is very similar to https://github.com/wteuber/yaml_normalizer/blob/b85dca7357df00757c471acb5dadb79a53dd27c1/lib/yaml_normalizer/ext/namespaced.rb
So I tweaked the code a bit to fit your needs:
module Leafs
def leafs(namespace = [], tree = {})
each do |key, value|
child_ns = namespace.dup << key
if value.instance_of?(Hash)
value.extend(Leafs).leafs child_ns, tree
elsif value.instance_of?(Array)
value.each.with_index.inject({}) {|h, (v,k)| h[k]=v; h}.extend(Leafs).leafs child_ns, tree
else
tree[child_ns.join('.')] = value
end
end
tree.keys.to_a
end
end
Here is how to use it:
h = {a: [1, "b"], c: {d:1}}
h.extend(Leafs)
h.leafs
# => ["a.0", "a.1", "c.d"]
I hope you find this helpful.
def pathify(what)
paths = []
if what.is_a?(Array)
what.each_with_index do | element, index |
paths+= pathify(element).map{|e| index.to_s + '.' + e.to_s}
end
elsif what.is_a?(Hash)
what.each do |k,v|
paths+= pathify(v).map{|e| k.to_s + '.' + e.to_s}
end
else
paths.append('')
end
paths.map{|e| e.delete_suffix('.')}
end
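For comparison, here is a compact recursive sketch of the same idea, checked against the specs listed in the question. The method name leaf_paths_sketch and its prefix parameter are made up for this illustration and do not come from any of the answers above.

```ruby
# Hypothetical helper (not from the answers above): returns the dotted path
# to every leaf of a nested Hash/Array structure.
def leaf_paths_sketch(node, prefix = nil)
  # Turn both containers into [key, value] pairs; array keys are the indexes.
  pairs = case node
          when Hash  then node.to_a
          when Array then node.each_with_index.map { |value, index| [index, value] }
          end
  # Anything that is neither a Hash nor an Array is a leaf.
  return [prefix.to_s] if pairs.nil?

  pairs.flat_map do |key, value|
    leaf_paths_sketch(value, [prefix, key].compact.join('.'))
  end
end

p leaf_paths_sketch({ a: [1, { b: 2 }], c: { d: 1 } })
# => ["a.0", "a.1.b", "c.d"]
```

An empty container contributes no paths in this sketch, so {} gives [] as the specs require.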

Convert a structured string to a tree

I have strings formatted like this: cookie,sandwich(hotdog,burger),cake(chocolate(tiramisu)),candy. I'd like to convert them into a tree-like structure (can be hash/array):
cookie
sandwich
|__hotdog
|__burger
cake
|__chocolate
|__tiramisu
candy
What's the simplest way to do this? I looked at Treetop but it seems overkill.
str = "cookie,sandwich(hotdog,burger(cheese,onions)),cake(chocolate(tiramisu)),candy"
Let's first create a helper method that splits a string on the top-level commas only, that is, on the commas that are not enclosed in parentheses.
def separate(str)
start_idx = 0
left_paren_count = 0
str.each_char.with_index.with_object([]) do |(c,i),a|
case c
when '('
left_paren_count += 1
when ')'
left_paren_count -= 1
when ','
if left_paren_count.zero?
a << str[start_idx..i-1]
start_idx = i+1
end
end
end << str[start_idx..-1]
end
For example,
separate(str)
#=> ["cookie",
# "sandwich(hotdog,burger(cheese,onions))",
# "cake(chocolate(tiramisu))",
# "candy"]
separate("hotdog,burger(cheese,onions)")
#=> ["hotdog",
# "burger(cheese,onions)"]
separate("cheese,onions")
#=> ["cheese", "onions"]
We may now write a recursive expression.
def recurse(str)
separate(str).map do |s|
s1, s2 = s.split('(', 2)
s.include?('(') ? [s1, recurse(s2[0..-2])] : s1
end
end
Try it.
recurse(str)
#=> ["cookie",
# ["sandwich", ["hotdog", ["burger", ["cheese", "onions"]]]],
# ["cake", [["chocolate", ["tiramisu"]]]], "candy"]
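To get from that nested array to the layout shown in the question, something like the following sketch could be used. The name render_tree and the two-space indent per level are assumptions made for this illustration; the input is the array printed above, where each element is either a name or a [name, children] pair.

```ruby
# Hypothetical helper: turns the nested array produced by recurse into
# printable lines. Each element is either a String (a leaf) or a
# two-element array [name, children].
def render_tree(nodes, depth = 0)
  nodes.flat_map do |node|
    name, children = node.is_a?(Array) ? node : [node, []]
    # Top-level names get no marker; deeper levels get "|__" plus padding.
    marker = depth.zero? ? '' : ('  ' * (depth - 1)) + '|__'
    [marker + name] + render_tree(children, depth + 1)
  end
end

tree = ["cookie",
        ["sandwich", ["hotdog", ["burger", ["cheese", "onions"]]]],
        ["cake", [["chocolate", ["tiramisu"]]]],
        "candy"]
puts render_tree(tree)
```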
class Node
attr_reader :name
attr_reader :children
def initialize(str)
@name, children_str = str.match(/^(\w+)\((.+)\)$/)&.captures
nodes = (children_str || str)
.scan(/(\w+\([\w,]+\))|(\w+\([\w,()]+\))|(\w+)/)
.flatten.compact
if @name.nil? && nodes.size == 1
@name = nodes.first
else
@children = nodes.map { |s|
Node.new(s)
}
end
end
def show_tree(level=0)
str = ""
unless @name.nil?
str = " " * (level - 1) if level > 1
str += "|__" if level > 0
str += "#{@name}\n"
level += 1
end
@children&.each do |node|
str += "#{node.show_tree(level)}"
end
str
end
end
test
foods = Node.new("cookie,sandwich(hotdog,burger),cake(chocolate(tiramisu)),candy")
puts foods.show_tree
# cookie
# sandwich
# |__hotdog
# |__burger
# cake
# |__chocolate
# |__tiramisu
# candy
rails = Node.new("root(config,db(migrate,seeds),lib,app(controllers(concerns,api),models(concerns),views))")
puts rails.show_tree
# root
# |__config
# |__db
# |__migrate
# |__seeds
# |__lib
# |__app
# |__controllers
# |__concerns
# |__api
# |__models
# |__concerns
# |__views

How to see if the characters of one string appear in order in a second string

Given two strings, "word" and "key", how can I write a method sequence_search(word, key) that returns true (else false) if the characters in "key" appear in the same order (but not necessarily contiguous) in "word"?
Suppose, for example, key = 'cat'. The method should return true if "word" equals "arcata", "c1a2t3" or "coat", but return false if "word" equals "cta".
Here is my attempt to answer the question:
def sequence_search(word, key)
arr = []
i = 0
while i < word.length
word[i].include?(key)
arr >> word[i]
end
i+= 1
end
if arr.join == key # line raising exception
return true
else false
end
end
When I run my code I get the exception:
NameError (undefined local variable or method `arr' for main:Object)
in the line indicated. Why? And is there a better way to write the method?
To determine the problem with your code it is helpful to first format it properly.
def sequence_search(word, key)
arr = []
i = 0
while i < word.length
word[i].include?(key)
arr >> word[i]
end
i += 1
end
if arr.join == key # line raising exception
return true
else
false
end
end
As you see, the method does not end where you thought it did and there is an extra end. Some of the problems are as follows:
the while loop will never end because i is not incremented within the loop;
arr >> word[i] should be arr << word[i]
word[i].include?(key) has no effect, as its return value is not used (you may want arr << word[i] if word[i].include?(key));
the logic is wrong: after correcting the code if word = "acat" (which contains "cat") you are trying to construct the array arr #=> ["a", "c", "a", "t"], which you will join (to produce "acat") and compare with "cat" (if arr.join == key), which would (erroneously) fail.
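With those points addressed, a while-loop version close in spirit to the original attempt might look like the sketch below. The name sequence_search_loop and the matched counter are illustrative choices, not part of the question's code.

```ruby
# Sketch only: walks through word once and advances through key each time
# the next wanted character is found.
def sequence_search_loop(word, key)
  matched = 0 # number of key characters found so far, in order
  i = 0
  while i < word.length && matched < key.length
    matched += 1 if word[i] == key[matched]
    i += 1    # incremented inside the loop, so the loop terminates
  end
  matched == key.length
end

sequence_search_loop("arcata", "cat") #=> true
sequence_search_loop("cta", "cat")    #=> false
```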
Here are two ways you could write the method.
Use String#index to step through word looking for each character of key
def sequence_search(word, key)
i = -1
key.each_char.all? { |c| i = word.index(c,i+1) }
end
sequence_search("arcata", "cat") #=> true
sequence_search("c1a2t3", "cat") #=> true
sequence_search("cta", "cat") #=> false
sequence_search("coat", "cat") #=> true
See String#index, with particular attention to the optional second argument, the offset into the string at which the search is to begin.
Use a regular expression
def sequence_search(word, key)
word.match?(/#{key.chars.join(".*")}/)
end
sequence_search("arcata", "cat") #=> true
sequence_search("c1a2t3", "cat") #=> true
sequence_search("cta", "cat") #=> false
sequence_search("coat", "cat") #=> true
When key = "cat",
/#{key.chars.join(".*")}/
#=> /c.*a.*t/
The regular expression reads, "match a 'c' followed by zero or more characters followed by 'a' followed by zero or more characters followed by 't'".
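One caveat with interpolating key directly: any character in key that has a special meaning in a regular expression (such as "+" or ".") is read as regex syntax. A hedged variant that escapes each character first is sketched below; the name sequence_search_escaped is made up for this example, and the MULTILINE flag is an added choice so that "." also matches newlines.

```ruby
# Sketch: same idea as the regex approach, but each key character is
# escaped so it is matched literally.
def sequence_search_escaped(word, key)
  pattern = key.each_char.map { |c| Regexp.escape(c) }.join('.*')
  word.match?(Regexp.new(pattern, Regexp::MULTILINE))
end

sequence_search_escaped("arcata", "cat") #=> true
sequence_search_escaped("a+b", "+b")     #=> true
sequence_search_escaped("axb", "a.b")    #=> false
```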
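Delete every character that is not in the key, and check whether the key is included in the remainder. Note that this misses a match when a key character is repeated in between: for word "caat" and key "cat" the remainder is "caat", which does not include "cat".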
def sequence_search(str, key)
str.delete("^#{key}").include?(key) # ^ means "everything but"
end
I don't know what kind of solution you are looking for, but this is a quick one that works for me:
def sequence_search(word, key)
arr = key.split('')
arr.each do |c|
return false if word.index(c) == nil
word.slice!(0, word.index(c) + 1)
return true if arr.last == c
end
end
sequence_search('cat', 'cat') #=> true
sequence_search('cdadsas', 'cat') #=> false
sequence_search('gdfgddfgcgddadsast', 'cat') #=> true
sequence_search('gdfgddfgcgddadsast', 'cat4') #=> false
Useful information:
require 'benchmark'
N = 100_000
puts 'Ruby %s' % RUBY_VERSION
def cary1(word, key)
i = -1
key.each_char.all? { |c| i = word.index(c,i+1) }
end
def cary2(word, key)
word.match?(/#{key.chars.join(".*")}/)
end
def steenslag(str, key)
str.delete("^#{key}").include?(key) # ^ means "everything but"
end
def estebes(word, key)
arr = key.split('')
arr.each do |c|
return false if word.index(c) == nil
word.slice!(0, word.index(c) + 1)
return true if arr.last == c
end
end
Benchmark.bmbm do |x|
x.report('cary1') { N.times { cary1("arcata", "cat") } }
x.report('cary2') { N.times { cary2("arcata", "cat") } }
x.report('steenslag') { N.times { steenslag("arcata", "cat") } }
x.report('estebes') { N.times { estebes("arcata", "cat") } }
end
# >> Ruby 2.7.1
# >> Rehearsal ---------------------------------------------
# >> cary1 0.128231 0.000218 0.128449 ( 0.128572)
# >> cary2 0.461305 0.000509 0.461814 ( 0.462048)
# >> steenslag 0.055794 0.000026 0.055820 ( 0.055847)
# >> estebes 0.263030 0.000185 0.263215 ( 0.263399)
# >> ------------------------------------ total: 0.909298sec
# >>
# >> user system total real
# >> cary1 0.131944 0.000141 0.132085 ( 0.132227)
# >> cary2 0.453452 0.000626 0.454078 ( 0.454374)
# >> steenslag 0.055342 0.000026 0.055368 ( 0.055394)
# >> estebes 0.255280 0.000156 0.255436 ( 0.255607)
Using Fruity:
require 'fruity'
puts 'Ruby %s' % RUBY_VERSION
def cary1(word, key)
i = -1
key.each_char.all? { |c| i = word.index(c,i+1) }
end
def cary2(word, key)
word.match?(/#{key.chars.join(".*")}/)
end
def steenslag(str, key)
str.delete("^#{key}").include?(key) # ^ means "everything but"
end
def estebes(word, key)
arr = key.split('')
arr.each do |c|
return false if word.index(c) == nil
word.slice!(0, word.index(c) + 1)
return true if arr.last == c
end
end
compare do
_cary1 { cary1("arcata", "cat") }
_cary2 { cary2("arcata", "cat") }
_steenslag { steenslag("arcata", "cat") }
_estebes { estebes("arcata", "cat") }
end
# >> Ruby 2.7.1
# >> Running each test 8192 times. Test will take about 2 seconds.
# >> _steenslag is faster than _cary1 by 2x ± 0.1
# >> _cary1 is faster than _estebes by 2x ± 0.1
# >> _estebes is faster than _cary2 by 2x ± 0.1

Test-first-ruby 13_xml_document

I am working on test-first-ruby-master (you can find it at https://github.com/appacademy/test-first-ruby).
The 13_xml_document_spec.rb is the Rspec test that my code must pass. This test has several tasks, but it is the last one (called "indents") that my code doesn't accomplish.
Here is the Rspec test:
require "13_xml_document"
describe XmlDocument do
before do
@xml = XmlDocument.new
end
it "renders an empty tag" do
expect(@xml.hello).to eq("<hello/>")
end
it "renders a tag with attributes" do
expect(@xml.hello(:name => "dolly")).to eq('<hello name="dolly"/>')
end
it "renders a randomly named tag" do
tag_name = (1..8).map{|i| ("a".."z").to_a[rand(26)]}.join
expect(@xml.send(tag_name)).to eq("<#{tag_name}/>")
end
it "renders block with text inside" do
expect(@xml.hello { "dolly" }).to eq("<hello>dolly</hello>")
end
it "nests one level" do
expect(@xml.hello { @xml.goodbye }).to eq("<hello><goodbye/></hello>")
end
it "nests several levels" do
xml = XmlDocument.new
xml_string = xml.hello do
xml.goodbye do
xml.come_back do
xml.ok_fine(:be => "that_way")
end
end
end
expect(xml_string).to eq('<hello><goodbye><come_back><ok_fine be="that_way"/></come_back></goodbye></hello>')
end
it "indents" do
@xml = XmlDocument.new(true)
xml_string = @xml.hello do
@xml.goodbye do
@xml.come_back do
@xml.ok_fine(:be => "that_way")
end
end
end
expect(xml_string).to eq(
"<hello>\n" +
" <goodbye>\n" +
" <come_back>\n" +
" <ok_fine be=\"that_way\"/>\n" +
" </come_back>\n" +
" </goodbye>\n" +
"</hello>\n"
)
end
end
And here is my code:
class XmlDocument
def initialize(indentation = false)
@indentation = indentation
@counter = 0
end
def method_missing(method, *args, &block)
hash = {}
if block
if @indentation == false
"<#{method}>#{yield}</#{method}>"
elsif @indentation == true
string = ""
string << indent1
string << "<#{method}>\n"
(###)
add_indent
string << indent1
string << yield + "\n"
sub_indent
string << indent2
string << "</#{method}\>"
string
end
elsif args[0].is_a?(Hash)
args[0].map { |key,value| "<#{method.to_s} #{key.to_s}=\"#{value.to_s}\"/>" }.join(" ")
elsif hash.empty?
"<#{method.to_s}/>"
end
end
def indent1
" " * @counter
end
def indent2
" " * @counter
end
def add_indent
@counter += 1
end
def sub_indent
@counter -= 1
end
end
This is the output I get for the "indents" part:
<hello>
<goodbye>
<come_back>
+ <ok_fine be="that_way"/>
</come_back>
</goodbye>
</hello>
Compared with the expected output, the 4th line ('ok_fine be="that_way"/') seems to be two indents too far to the left. Unlike the other lines, the 4th line does not come from a method called with a block; it comes from a method called with an argument, inside the block of 'come_back'.
I cannot see where my mistake is. Even writing an exception in the code (where the (###) is in my code) doesn't seem to have any effect on the 4th line.
Here is the exception (###):
if args[0].is_a?(Hash)
add_indent
string << indent
arg[0].map{|key, value| string << "<#{method.to_s} #{key.to_s}=\"#{value.to_s}\"/>"}
end
NOTE: I assume that if I manage to give the 4th line the right numbers of indents, that also will increase the number of indents of the lines after it, so the method 'indent2' will need to be modified.
I figured out what the problem was. As I said in my question, in the Rspec test they have the following input:
xml_string = xml.hello do
xml.goodbye do
xml.come_back do
xml.ok_fine(:be => "that_way")
end
end
end
where the 4th line (xml.ok_fine(:be => "that_way")) doesn't have a nested block, but an argument. In my code I set up a condition (if block) for when a block is present and, inside it, a second condition (if @indentation == true) for when @indentation is true:
if block
if @indentation == false
"<#{method}>#{yield}</#{method}>"
elsif @indentation == true
...
It is inside this second condition that I create the variable 'string' where I shovel in the different parts:
elsif @indentation == true
string = ""
string << indent1
string << "<#{method}>\n"
(###)
add_indent
string << indent1
string << yield + "\n"
sub_indent
string << indent2
string << "</#{method}\>"
string
end
But because the 4th line doesn't carry a block, the first condition (if block) doesn't return true for it and therefore this 4th line is skipped.
I've re-written my code so now it passes the Rspec test:
class XmlDocument
def initialize(indentation = false)
@indentation = indentation
@counter = 0
end
def method_missing(method, args = nil, &block)
string = ""
arguments = args
if @indentation == false
if (arguments == nil) && (block == nil)
"<#{method.to_s}/>"
elsif arguments.is_a?(Hash)
arguments.map { |key,value| "<#{method.to_s} #{key.to_s}=\"#{value.to_s}\"/>" }.join(" ")
elsif block
"<#{method}>#{yield}</#{method}>"
end
elsif @indentation == true
if (block) || (arguments.is_a?(Hash))
string << indent1
string << "<#{method}>\n" unless !block
add_indent
string << indent1 unless !block
if block
string << yield + "\n"
elsif arguments.is_a?(Hash)
arguments.map { |key,value| string << "<#{method.to_s} #{key.to_s}=\"#{value.to_s}\"/>" }
end
sub_indent
string << indent2 unless !block
string << "</#{method}\>" unless !block
if indent2 == ""
string << "\n"
end
end
string
end
end
def indent1
" " * @counter
end
def indent2
" " * @counter
end
def add_indent
@counter += 1
end
def sub_indent
@counter -= 1
end
end
In contrast to the code in my question, here the two main conditions are @indentation == false and @indentation == true, and inside each of them I handle the different cases (block or no block, argument or no argument...). Specifically, for elsif @indentation == true I created a condition that accepts the 4th line: if (block) || (arguments.is_a?(Hash)); in other words, it accepts methods that have a block or an argument (specifically a hash).
Now, I shovel in the different parts in 'string', and when I reach a block to yield there is a bifurcation:
if block
string << yield + "\n"
elsif arguments.is_a?(Hash)
arguments.map { |key,value| string << "<#{method.to_s} #{key.to_s}=\"#{value.to_s}\"/>" }
If there is a block I "yield" it, and if there is an argument that is a hash I shovel it into 'string'.
Also, there is the guard unless !block wherever I indent or shovel in a tag, because otherwise a method without a block (like the 4th line) would introduce unwanted indents and '\n'.
Finally, I had to add at the end
if indent2 == ""
string << "\n"
end
because the solution requires a '\n' at the end.
I hope this answer can help others.
NOTE: I wrote a 'NOTE' in my question where I assumed I would have to modify 'indent2'. Obviously I didn't have to, because the output I was getting did not take the 4th line into account (it doesn't have a block), so the bigger indentation (" ") of 'indent2' is all right.
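For readers comparing approaches, here is a shorter alternative sketch that tracks nesting depth with a single counter. It is not the code from this answer: the class name XmlDocumentSketch, the @depth variable and the two-space indent per level are all assumptions made for this illustration.

```ruby
# Alternative sketch (not the answer's code): one depth counter decides the
# padding for both self-closing tags and tags that take a block.
class XmlDocumentSketch
  def initialize(indent = false)
    @indent = indent
    @depth = 0
  end

  def method_missing(name, attributes = {}, &block)
    attrs = attributes.map { |key, value| %( #{key}="#{value}") }.join
    pad = @indent ? '  ' * @depth : ''   # assumed: two spaces per level
    newline = @indent ? "\n" : ''
    return "#{pad}<#{name}#{attrs}/>#{newline}" unless block

    @depth += 1
    inner = block.call                   # nested tags are rendered one level deeper
    @depth -= 1
    "#{pad}<#{name}#{attrs}>#{newline}#{inner}#{pad}</#{name}>#{newline}"
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

xml = XmlDocumentSketch.new(true)
puts xml.hello { xml.goodbye { xml.ok_fine(:be => "that_way") } }
```

Because the tag without a block goes through the same padding logic as the others, no separate branch is needed for the ok_fine line.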

Escape non HTML tags in plain text (convert plain text to HTML)

Using Rails, I need to get a plain text and show it as HTML, but I don't want to use <pre> tag, as it changes the format.
I needed to subclass HTML::WhiteListSanitizer to escape non-whitelisted tags (by changing process_node), monkey patch HTML::Node so that it doesn't downcase tag names, and monkey patch HTML::Text to apply <wbr /> word splitting:
class Text2HTML
def self.convert text
text = simple_format text
text = auto_link text, :all, :target => '_blank'
text = NonHTMLEscaper.sanitize text
text
end
# based on http://www.ruby-forum.com/topic/87492
def self.wbr_split str, len = 10
fragment = /.{#{len}}/
str.split(/(\s+)/).map! { |word|
(/\s/ === word) ? word : word.gsub(fragment, '\0<wbr />')
}.join
end
protected
extend ActionView::Helpers::TagHelper
extend ActionView::Helpers::TextHelper
extend ActionView::Helpers::UrlHelper
class NonHTMLEscaper < HTML::WhiteListSanitizer
self.allowed_tags << 'wbr'
def self.sanitize *args
self.new.sanitize *args
end
protected
# Copy, just to reference this Node definition
def tokenize(text, options)
options[:parent] = []
options[:attributes] ||= allowed_attributes
options[:tags] ||= allowed_tags
tokenizer = HTML::Tokenizer.new(text)
result = []
while token = tokenizer.next
node = Node.parse(nil, 0, 0, token, false)
process_node node, result, options
end
result
end
# gsub <> instead of returning nil
def process_node(node, result, options)
result << case node
when HTML::Tag
if node.closing == :close
options[:parent].shift
else
options[:parent].unshift node.name
end
process_attributes_for node, options
options[:tags].include?(node.name) ? node : node.to_s.gsub(/</, "&lt;").gsub(/>/, "&gt;")
else
bad_tags.include?(options[:parent].first) ? nil : node.to_s
end
end
class Text < HTML::Text
def initialize(parent, line, pos, content)
super parent, line, pos, content
@content = Text2HTML.wbr_split content
end
end
# remove tag/attributes downcases and reference this Text
class Node < HTML::Node
def self.parse parent, line, pos, content, strict=true
if content !~ /^<\S/
Text.new(parent, line, pos, content)
else
scanner = StringScanner.new(content)
unless scanner.skip(/</)
if strict
raise "expected <"
else
return Text.new(parent, line, pos, content)
end
end
if scanner.skip(/!\[CDATA\[/)
unless scanner.skip_until(/\]\]>/)
if strict
raise "expected ]]> (got #{scanner.rest.inspect} for #{content})"
else
scanner.skip_until(/\Z/)
end
end
return HTML::CDATA.new(parent, line, pos, scanner.pre_match.gsub(/<!\[CDATA\[/, ''))
end
closing = ( scanner.scan(/\//) ? :close : nil )
return Text.new(parent, line, pos, content) unless name = scanner.scan(/[^\s!>\/]+/)
unless closing
scanner.skip(/\s*/)
attributes = {}
while attr = scanner.scan(/[-\w:]+/)
value = true
if scanner.scan(/\s*=\s*/)
if delim = scanner.scan(/['"]/)
value = ""
while text = scanner.scan(/[^#{delim}\\]+|./)
case text
when "\\" then
value << text
value << scanner.getch
when delim
break
else value << text
end
end
else
value = scanner.scan(/[^\s>\/]+/)
end
end
attributes[attr] = value
scanner.skip(/\s*/)
end
closing = ( scanner.scan(/\//) ? :self : nil )
end
unless scanner.scan(/\s*>/)
if strict
raise "expected > (got #{scanner.rest.inspect} for #{content}, #{attributes.inspect})"
else
# throw away all text until we find what we're looking for
scanner.skip_until(/>/) or scanner.terminate
end
end
HTML::Tag.new(parent, line, pos, name, attributes, closing)
end
end
end
end
end
end
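The core idea (keep whitelisted tags, escape the angle brackets of everything else, and leave tag-name case untouched) can also be illustrated without the Rails sanitizer classes. The sketch below is plain Ruby and deliberately simplistic: the names ALLOWED_TAGS_SKETCH and escape_unknown_tags and the tag list are assumptions for this example, and unlike the sanitizer above it does not clean attributes or handle malformed markup.

```ruby
# Plain-Ruby sketch of the escaping step only; not a replacement for a real
# sanitizer. The whitelist below is an arbitrary example.
ALLOWED_TAGS_SKETCH = %w[a b i br p wbr].freeze

def escape_unknown_tags(text, allowed = ALLOWED_TAGS_SKETCH)
  # Match anything that looks like an opening or closing tag and capture its name.
  text.gsub(%r{</?([A-Za-z][\w:-]*)[^<>]*>}) do |tag|
    if allowed.include?(Regexp.last_match(1).downcase)
      tag # whitelisted: keep as written, original case included
    else
      tag.gsub(/[<>]/, '<' => '&lt;', '>' => '&gt;')
    end
  end
end

puts escape_unknown_tags("<b>hi</b> <Foo bar>")
# prints: <b>hi</b> &lt;Foo bar&gt;
```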
