Get the attribute from CSS selector - ruby-on-rails

I'm trying to access the sender attribute of an XML document:
<adi:ADI2 createDateTime="2015-04-10T15:36:03+02:00" docNumber="777"
sender="test" relativePriority="1"...
with the following command:
xml.css('/adi|ADI2[sender]')
But it doesn't work, it gives the exact same result as:
xml.css('/adi|ADI2')
To get the value of the attribute, I'm forced to use:
xml.css('/adi|ADI2[sender]').attribute('sender')
Is there a way of getting the attribute directly from the CSS selector?

You're missing a document root and name-space declaration in your XML sample but here's a simple example of what to do:
require 'nokogiri'
doc = Nokogiri::XML('<root xmlns:adi="http://foo.com"><adi:ADI2 createDateTime="2015-04-10T15:36:03+02:00" docNumber="777" sender="test" relativePriority="1"/></root>')
doc.at('adi|ADI2')['sender'] # => "test"
Once we have a pointer to a Node, it can be treated much like a hash. From the Node documentation:
A Nokogiri::XML::Node may be treated similarly to a hash with regard to attributes.
irb(main):004:0> node
=> <a href="#foo" id="link">link</a>
irb(main):005:0> node['href']
=> "#foo"
irb(main):006:0> node.keys
=> ["href", "id"]
irb(main):007:0> node.values
=> ["#foo", "link"]
irb(main):008:0> node['class'] = 'green'
=> "green"
irb(main):009:0> node
=> <a href="#foo" id="link" class="green">link</a>
irb(main):010:0>
Your syntax using
xml.css('/adi|ADI2[sender]')
is incorrect.
/adi|ADI2[sender] looks like an attempt to mix CSS and XPath syntax in one selector. I'd recommend sticking to CSS as it's simpler and easier to read, unless you need the power of XPath.
Also, instead of using css, you might want to use at. css returns a NodeSet, and you can't return the specific attribute of every Node found using the [attr] syntax unless you iterate over the NodeSet using map. If you'll have multiple instances of that tag, then css, xpath or the generic search will work, otherwise use at, or the language-specific at_css or at_xpath, to find the first such occurrence. at is equivalent to search('...').first.
Nokogiri's "Searching an HTML / XML Document" tutorial covers this.

To get an attribute one could use the XPath @ selector:
▶ xml = '<tag sender="test">'
#⇒ "<tag sender=\"test\">"
▶ xml = Nokogiri::XML(xml, nil, "UTF-8")
#⇒ #<Nokogiri::XML::Document:0x5ca6f16 name="document" children=...>
#                  ⇓⇓⇓⇓⇓⇓⇓ attribute
▶ xml.xpath('//tag/@sender').text
#⇒ "test"

Related

Nokogiri Use Ruby-Core Method as XML Node Name

How do I get Nokogiri to accept a ruby-core method as a node name e.g.
xml.hash Digest::SHA256.file form.survey_xml should return something like this
<hash>cde6f0dd030aac1d3aa6d231b7c0cc30a34686a6f6780c468ccc64a4822f01e0</hash>
Instead I am getting an error ArgumentError: wrong number of arguments (1 for 0) in hash of course because hash is a ruby method.
How do I set the node name to hash using the Nokogiri DSL since the API I am interacting with expects that node.
I can just create the xml manually but the answer I am looking for is specifically using nokogiri
More Info
Here is the xml I am trying to create:
<?xml version=\"1.0\"?>
<xforms xmlns=\"http://openrosa.org/xforms/xformsList\">
<xform>
<formID>1</formID>
<name>BLAH BLAH</name>
<version>1</version>
<hash>892734982SDHFK238479823749234934</hash>
<downloadUrl>/Users/me/workspace/dashboard/public/uploads/survey_xml/survey_xml/2/S_1_.xml</downloadUrl>
</xform>
</xforms>
Here is my code:
require 'nokogiri'
require 'digest'
def mine
xml = Nokogiri::XML::Builder.new{ |xml|
xml.xforms xmlns: 'http://openrosa.org/xforms/xformsList' do
@forms.each do |form|
xml.xform do
xml.formID form.id
xml.name form.name
xml.version 1
xml.hash Digest::SHA256.file form.survey_xml.survey_xml.file.file
xml.downloadUrl form.survey_xml.survey_xml.file.file
end
end
end
}.to_xml
end
Based on dimakura's answer:
You can use other Nokogiri methods.
xpath:
node.xpath('hash').first
search:
node.search('hash').first
children:
xml.children.select{|x| x.name == 'hash'}
If you are creating new elements rather than looking them up, you can add them like this:
xml.add_child '<hash>hash-code</hash>'
Update: when working with Nokogiri::XML::Builder, tag names that clash with existing methods should be written with a trailing underscore (_):
xml.hash_ 'your-hash'

How to stream large xml in Rails 3.2?

I'm migrating our app from 3.0 to 3.2.x. Earlier, streaming was done by assigning a proc to response_body, like so:
self.response_body = proc do |response, output|
target_obj = StreamingOutputWrapper.new(output)
lib_obj.xml_generator(target_obj)
end
As you can imagine, the StreamingOutputWrapper responds to <<.
This way is deprecated in Rails 3.2.x. The suggested way is to assign an object that responds to each.
The problem I'm facing now is in making the lib_obj.xml_generator each-aware.
The current version of it looks like this:
def xml_generator(target, conditions = [])
builder = Builder::XmlMarkup.new(:target => target)
builder.root do
builder.elementA do
Model1.find_each(:conditions => conditions) { |model1| target << model1.xml_chunk_string }
end
end
end
where target is a StreamingOutputWrapper object.
The question is, how do I modify the code - the xml_generator, and the controller code, to make the response xml stream properly.
Important stuff: Building the xml in memory is not an option as the model records are huge. The typical size of the xml response is around 150MB.
What you are looking for is SAX parsing. SAX reads a file in chunks instead of loading the whole document into a DOM. This is very convenient, and fortunately plenty of people before you have wanted to do the same thing. Nokogiri offers XML::SAX methods, but the documentation is confusing and the syntax is messy. I would suggest looking into something that sits on top of Nokogiri and makes the job a lot simpler.
Here are a few options -
SAX_stream:
Mapping out objects in sax_stream is super simple:
require 'sax_stream/mapper'
class Product
include SaxStream::Mapper
node 'product'
map :id, :to => '@id'
map :status, :to => '@status'
map :name_confirmed, :to => 'name/@confirmed'
map :name, :to => 'name'
end
and calling the parser is also simple:
require 'sax_stream/parser'
require 'sax_stream/collectors/naive_collector'
collector = SaxStream::Collectors::NaiveCollector.new
parser = SaxStream::Parser.new(collector, [Product])
parser.parse_stream(File.open('products.xml'))
However, working with the collectors (or writing your own) can end up being slightly confusing, so I would actually go with:
Saxerator:
Saxerator gets the job done and has some really handy methods for traversing into nodes that can be a little less complex than sax_stream. Saxerator also has a few really great configuration options that are well documented. A simple Saxerator example:
parser = Saxerator.parser(File.new("rss.xml"))
parser.for_tag(:item).each do |item|
# where the xml contains <item><title>...</title><author>...</author></item>
# item will look like {'title' => '...', 'author' => '...'}
puts "#{item['title']}: #{item['author']}"
end
# a String is returned here since the given element contains only character data
puts "First title: #{parser.for_tag(:title).first}"
If you end up having to pull the XML from an external source (or it is updated frequently and you don't want to update the copy on your server manually), check out THIS QUESTION and its accepted answer; it works great.
You could always monkey-patch the response object:
response.stream.instance_eval do
alias :<< :write
end
builder = Builder::XmlMarkup.new(:target => response.stream)
...
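On the generation side of the question, one way to get an object that responds to each without rewriting the generator is to wrap it in an Enumerator. The following is a minimal sketch in plain Ruby, not Rails-specific code: xml_generator is a simplified, hypothetical stand-in for lib_obj.xml_generator, and streaming_body is an illustrative name. It relies on Enumerator::Yielder responding to <<, so the yielder can play the role of StreamingOutputWrapper.

```ruby
# Hypothetical stand-in for lib_obj.xml_generator: it writes XML chunks to
# any target that responds to <<.
def xml_generator(target, records)
  target << "<root><elementA>"
  records.each { |record| target << "<item>#{record}</item>" }
  target << "</elementA></root>"
end

# Enumerator::Yielder responds to <<, so it can be passed as the target.
# Each chunk is handed to the consumer's block as soon as it is written,
# so the whole document is never held in memory at once.
def streaming_body(records)
  Enumerator.new do |yielder|
    xml_generator(yielder, records)
  end
end

body = streaming_body(%w[a b c])

# A consumer (for example, the web server) would call each on the body.
chunks = []
body.each { |chunk| chunks << chunk }
```

In a controller, the result of streaming_body could presumably be assigned to response_body; whether the chunks actually reach the client incrementally also depends on the web server and middleware in use.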

trying to understand specific bit of ruby syntax

I'm new to both Ruby and Rails, and as I go over various tutorials, I occasionally hit on a bit of Ruby syntax that I just can't grok.
For instance, what does this actually do?
root to: "welcome#index"
I can gather that this is probably a method named "root", but I'm lost after that. "To" isn't a symbol, is it? The colon would be before, as in ":to" if it were. Is this some form of keyword argument utilizing hashes? I can't make this syntax work when trying it in irb with ruby1.9.3.
I know this might be a RTFM question, but I can't even think of what to google for this.
Thanks!
I'm still playing around with this syntax,
def func(h)
puts h[:to]
end
x = { :to => "welcome#index" }
y = :to => "welcome#index"
z = to: "welcome#index"
func to: "welcome#index"
I see that this example only works with the lines defining "y" and "z" commented out. So the braceless and the "colon-after" syntax are only valid in the context of calling a method?
First, that's right - root is a method call.
Now
to: 'welcome#index'
is equivalent to
:to => 'welcome#index'
and it's a Hash where the key is the symbol :to and the value is the string 'welcome#index'. You can use this syntax to define hashes since Ruby 1.9.
It's equivalent to
root(:to => "welcome#index")
I'm having trouble finding the official documentation on the new hash syntax, but when you see foo: bar, it means that foo is a symbol used as a key in the hash and has a value bar.
Here is an example of defining a function foo which takes a hash, and prints to screen.
def foo(hash)
puts hash.inspect
puts hash[:to]
end
foo to: "welcome#index" # method call without parentheses
Output of method call above:
{:to=>"welcome#index"}
welcome#index
Equivalent declarations:
h = {:to => "welcome#index"}
h = {to: "welcome#index"}
Also, you can use Ripper (part of Ruby standard library) to understand how Ruby parses code. In the example below, I have already defined foo as above. Now, I call foo without using Ripper. Then I use Ripper to see how Ruby parses the method call.
[2] pry(main)> foo to: "welcome#index"
{:to=>"welcome#index"}
welcome#index
=> nil
[3] pry(main)> require 'ripper'
=> true
[4] pry(main)> Ripper.sexp 'foo to: "welcome#index"'
=> [:program,
[[:command,
[:@ident, "foo", [1, 0]],
[:args_add_block,
[[:bare_assoc_hash,
[[:assoc_new,
[:@label, "to:", [1, 4]],
[:string_literal,
[:string_content, [:@tstring_content, "welcome#index", [1, 9]]]]]]]],
false]]]]
In Ruby, parentheses in method calls are optional, so it can be rewritten as:
root(to: "welcome#index")
and it can be rewritten again as
root(:to => "welcome#index")
Hashes as keyword arguments (Ruby 1.9) are explained here as well: hash-constructor-parameter-in-1-9
P.S. By the way, the general rule of thumb for Rails newcomers is "learn Ruby first, then learn Rails" ;)
As you correctly gathered, root is a method call. Or rather, it's a message send. Ruby, like Smalltalk, builds upon a messaging metaphor, where objects send messages to other objects, and those objects (called the receiver) respond to those messages.
In this case, you pass an argument to root, that's how you know it's a message send. Message sends are the only thing that can take arguments; if you see an argument, then it must be a message send. There are no functions, no static methods, no constructors, no procedures, only methods and message sends.
So, what is the argument? Well, in Ruby, a lot of things that are syntactically required in other languages are optional. For example, parentheses around the argument list:
foo.bar(baz)
# can also be written as
foo.bar baz
If the very last argument to the message send is a Hash literal, you can leave off the curly braces:
foo.bar({ :baz => 23, :quux => 42 })
# can also be written as
foo.bar(:baz => 23, :quux => 42)
Put the two together, and you get:
foo.bar({ :baz => 23, :quux => 42 })
# can also be written as
foo.bar :baz => 23, :quux => 42
In Ruby 1.9, a new alternative Hash literal syntax was introduced. This literal syntax is very limited compared to the original one, because it can only express Hashes whose keys are Symbols which are also valid Ruby identifiers, whereas with the original syntax, you can write down a Hash with any arbitrary object as key. But, for that limited use case, it is very readable:
{ :baz => 23, :quux => 42 }
# can also be written as
{ baz: 23, quux: 42 }
If we put that feature together with the other two, we get the message send syntax you are asking about:
foo.bar baz: 23, quux: 42
If we have method declared with a single argument like this:
def foo.bar(opts) p opts end
opts will be bound to a single Hash with two key-value pairs.
These features were often used to emulate keyword arguments as found in other languages. And it has long been a desire of the Ruby community to get support for real keyword arguments. This support was implemented in two steps: first, the new Hash literal syntax was introduced in Ruby 1.9, which allows you to make message sends which look like they are using keyword arguments, even though they are really just a Hash. And then in a second step, in Ruby 2.0 real keyword arguments were introduced. The modified method signature would look like this:
def foo.bar(baz: nil, quux: nil) p baz, quux end
Note that in Ruby 2.0 it is not possible to have required keyword arguments; they always need a default value and are thus always optional. You can, however, use the fact that default values can be arbitrary expressions and do something like this:
def foo.bar(baz: raise(ArgumentError, '`baz` must be supplied!'),
quux: raise(ArgumentError, '`quux` must be supplied!')) p baz, quux end
In a future version of Ruby (it was actually already implemented in February and will likely be in 2.1), required keyword arguments can be specified by omitting the default value:
def foo.bar(baz:, quux:) p baz, quux end
Note that there is a syntactic ambiguity now:
foo.bar baz: 23, quux: 42
# is this sending the message `bar` to `foo` with *one* `Hash` or *two* keywords?
This ambiguity is actually intentional, because it allows old client code written against APIs which use a Hash argument to work unchanged with new APIs that use keyword arguments. There are some semi-complex rules which determine whether that syntax will be interpreted as a Hash or as keywords, but mostly those rules work out the way you would expect them to.
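The difference between a trailing Hash and real keyword arguments can be seen in a small sketch. The method names with_hash and with_keywords are made up for illustration, and the example assumes Ruby 2.1 or later, where required keyword arguments are available. The exact wording of the error message varies between Ruby versions, so it is only captured here, not spelled out.

```ruby
# Takes a single positional argument; a brace-less Hash at the call site
# arrives here as one Hash object.
def with_hash(opts)
  opts
end

# Takes real keyword arguments: baz is required, quux has a default.
def with_keywords(baz:, quux: 42)
  [baz, quux]
end

# The same call syntax works for both definitions.
hash_result    = with_hash baz: 23, quux: 42
keyword_result = with_keywords baz: 23

# Omitting a required keyword raises ArgumentError.
missing = begin
  with_keywords quux: 1
rescue ArgumentError => e
  e.message
end
```

Here hash_result is a Hash with the keys :baz and :quux, while keyword_result is built from two separate local variables.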

Create a dictionary as an attribute in FactoryGirl

Factory Girl lets you do something like:
FactoryGirl.define do
factory :post do
content "some content"
styles "styles here"
team 1
end
end
However, if I try something inside the factory block like:
factory :post do
content "some content"
styles "styles here"
team 1
my_dictionary {'a' => 1, 'b' => 2}
end
The my_dictionary does not get interpreted as a dictionary type. I don't know how to make a dictionary as an attribute inside FactoryGirl. Can anyone help me ?
The issue you observe comes from a syntax ambiguity in Ruby. The language uses curly braces both for defining hashes (which you call dictionaries) as well as for blocks (e.g. when using each loops). As you now use the hash as the only parameter to the my_dictionary method, it is unclear to the parser whether that opening curly brace is to be interpreted as the start of a block or a hash. In this case, Ruby defaults to the block assumption.
To enforce the interpretation as a method parameter, you can use parentheses like so:
my_dictionary({'a' => 1, 'b' => 2})
Then, the statement can be parsed without any ambiguity. What you have here is just one of the rare cases where you can't easily omit the parentheses for method calls.
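The parser's choice can be observed outside of FactoryGirl with a small sketch. The my_dictionary method below is a hypothetical stand-in that simply reports what it received; it is not how FactoryGirl defines attributes.

```ruby
# Hypothetical method that reports whether it was given a block or a value.
def my_dictionary(value = nil, &block)
  block ? :block : value
end

# With parentheses, the braces are unambiguously a Hash literal.
with_parens = my_dictionary({'a' => 1, 'b' => 2})

# Inside parentheses the braces of a trailing Hash may also be dropped.
inner_braces = my_dictionary('a' => 1, 'b' => 2)

# Without parentheses, braces directly after the method name start a block,
# so this passes an empty block rather than an empty Hash.
bare_braces = my_dictionary {}
```

The first two calls receive the same Hash, while the last one receives a block and no argument at all.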

Is there a Ruby library/gem that will generate a URL based on a set of parameters?

Rails' URL generation mechanism (most of which routes through polymorphic_url at some point) allows for the passing of a hash that gets serialized into a query string at least for GET requests. What's the best way to get that sort of functionality, but on top of any base path?
For instance, I'd like to have something like the following:
generate_url('http://www.google.com/', :q => 'hello world')
# => 'http://www.google.com/?q=hello+world'
I could certainly write my own that strictly suits my application's requirements, but if there existed some canonical library to take care of it, I'd rather use that :).
Yes, in Ruby's standard library you'll find a whole module of classes for working with URIs. There's one for HTTP. You can call #build with some arguments, much like you showed.
http://www.ruby-doc.org/stdlib/libdoc/uri/rdoc/classes/URI/HTTP.html#M009497
For the query string itself, just use Rails' Hash addition #to_query. i.e.
uri = URI::HTTP.build(:host => "www.google.com", :query => { :q => "test" }.to_query)
Late to the party, but let me highly recommend the Addressable gem. In addition to its other useful features, it supports writing and parsing URIs via RFC 6570 URI templates. To adapt the given example, try:
gsearch = Addressable::Template.new('http://www.google.com/{?query*}')
gsearch.expand(query: {:q => 'hello world'}).to_s
# => "http://www.google.com/?q=hello%20world"
or
gsearch = Addressable::Template.new('http://www.google.com/{?q}')
gsearch.expand(:q => 'hello world').to_s
# => "http://www.google.com/?q=hello%20world"
With vanilla Ruby, use URI.encode_www_form:
require 'uri'
query = URI.encode_www_form({ :q => "test" })
url = URI::HTTP.build(:host => "www.google.com", query: query).to_s
#=> "http://www.google.com?q=test"
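The generate_url helper from the question can be sketched on top of the same standard-library calls. This is only one possible shape: the name generate_url comes from the question, while the decision to keep any query parameters already present in the base URL is an assumption made for this example.

```ruby
require 'uri'

# Appends params to the query string of base and returns the URL as a String.
# Parameters already present in base are kept (an assumption of this sketch).
def generate_url(base, params = {})
  uri = URI.parse(base)
  pairs = URI.decode_www_form(uri.query.to_s) + params.to_a
  uri.query = URI.encode_www_form(pairs) unless pairs.empty?
  uri.to_s
end

url = generate_url('http://www.google.com/', :q => 'hello world')
# => "http://www.google.com/?q=hello+world"
```

URI.encode_www_form uses form encoding, which is why the space becomes + rather than %20.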
I would suggest my iri gem, which makes it easy to build a URL through a fluent interface:
require 'iri'
url = Iri.new('http://google.com/')
.append('find').append('me') # -> http://google.com/find/me
.add(q: 'books about OOP', limit: 50) # -> ?q=books+about+OOP&limit=50
.del(:q) # remove this query parameter
.del('limit') # remove this one too
.over(q: 'books about tennis', limit: 10) # replace these params
.scheme('https') # replace 'http' with 'https'
.host('localhost') # replace the host name
.port('443') # replace the port
.path('/new/path') # replace the path of the URI, leaving the query untouched
.cut('/q') # replace everything after the host and port
.to_s # convert it to a string
