Slow XML generation from a collection of model objects - ruby-on-rails

class GenericFormatter < Formatter
  attr_accessor :tag_name, :objects

  def generate_xml
    builder = Nokogiri::XML::Builder.new do |xml|
      xml.send(tag_name.pluralize) {
        objects.each do |obj|
          xml.send(tag_name.singularize) {
            self.generate_obj_row obj, xml
          }
        end
      }
    end
    builder.to_xml
  end

  def initialize(tag_name, objects)
    self.tag_name = tag_name
    self.objects = objects
  end

  def generate_obj_row(obj, xml)
    obj.attributes.except("updated_at").map do |key, value|
      xml.send(key, value)
    end
    xml.updated_at obj.updated_at.try(:strftime, "%m/%d/%Y %H:%M:%S") if obj.attributes.key?('updated_at')
  end
end
In the above code, I have implemented a formatter using the Nokogiri XML Builder to generate XML from the objects passed in. It generates the XML quickly when the data set is small, but with larger data sets (more than 10,000 records) it slows down and takes at least 50-60 seconds.
Problem: Is there any way to generate the XML faster? I have tried XML builders in the view as well, but that didn't work. How can I generate the XML faster? The solution should work on Rails 3; suggestions for optimizing the above code are welcome.

Your main problem is processing everything in one go instead of splitting your data into batches. It all requires a lot of memory: first to build all those ActiveRecord models, and then to build an in-memory representation of the whole XML document. Metaprogramming is also quite expensive (I mean those send methods).
Take a look at this code:
class XmlGenerator
  attr_accessor :tag_name, :ar_relation

  def initialize(tag_name, ar_relation)
    @ar_relation = ar_relation
    @tag_name = tag_name
  end

  def generate_xml
    singular_tag_name = tag_name.singularize
    plural_tag_name = tag_name.pluralize
    xml = ""
    xml << "<#{plural_tag_name}>"
    ar_relation.find_in_batches(batch_size: 1000) do |batch|
      batch.each do |obj|
        xml << "<#{singular_tag_name}>"
        obj.attributes.except("updated_at").each do |key, value|
          xml << "<#{key}>#{value}</#{key}>"
        end
        if obj.attributes.key?("updated_at")
          xml << "<updated_at>#{obj.updated_at.strftime('%m/%d/%Y %H:%M:%S')}</updated_at>"
        end
        xml << "</#{singular_tag_name}>"
      end
    end
    xml << "</#{plural_tag_name}>"
    xml
  end
end
# example usage
XmlGenerator.new("user", User.where("age < 21")).generate_xml
Major improvements are:
fetching data from the database in batches; you need to pass an ActiveRecord relation instead of an array of ActiveRecord models
generating XML by constructing strings; this has a risk of producing invalid XML (values are not escaped), but it is much faster than using the builder
I tested it on over 60k records. It took around 40 seconds to generate such an XML document.
There is much more that can be done to improve this even further, but it all depends on your application.
Here are some ideas:
do not use ActiveRecord to fetch data; instead use a lighter library or a plain database driver
fetch only the data that you need
tweak the batch size
write the generated XML directly to a file (if that is your use case) to save memory
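The string-building and file-writing ideas can be sketched without ActiveRecord at all. Below, plain arrays stand in for the batches that find_in_batches would yield, and CGI.escapeHTML addresses the invalid-XML risk mentioned above; the method and tag names are illustrative, not from the original code:

```ruby
require "cgi"
require "stringio"

# Sketch: stream string-built XML to any IO-like target, escaping values.
# `batches` stands in for what find_in_batches would yield from a relation.
def write_xml(io, plural_tag, singular_tag, batches)
  io << "<#{plural_tag}>"
  batches.each do |batch|
    batch.each do |attrs|
      io << "<#{singular_tag}>"
      attrs.each do |key, value|
        # Escaping values avoids emitting invalid XML for strings like "A & B"
        io << "<#{key}>#{CGI.escapeHTML(value.to_s)}</#{key}>"
      end
      io << "</#{singular_tag}>"
    end
  end
  io << "</#{plural_tag}>"
end

out = StringIO.new
write_xml(out, "users", "user", [[{ "name" => "A & B" }]])
out.string  # => "<users><user><name>A &amp; B</name></user></users>"
```

With a real relation you would replace the outer loop with `ar_relation.find_in_batches(batch_size: 1000)` and pass a `File` opened for writing instead of the StringIO, so the document never has to fit in memory.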

The Nokogiri gem has a nice interface for creating XML from scratch; Nokogiri is a wrapper around libxml2.
Gemfile: gem 'nokogiri'
To generate XML, simply use the Nokogiri XML Builder like this:
xml = Nokogiri::XML::Builder.new { |xml|
  xml.body do
    xml.test1 "some string"
    xml.test2 890
    xml.test3 do
      xml.test3_1 "some string"
    end
    xml.test4 "with attributes", :attribute => "some attribute"
    xml.closing
  end
}.to_xml
Output:
<?xml version="1.0"?>
<body>
  <test1>some string</test1>
  <test2>890</test2>
  <test3>
    <test3_1>some string</test3_1>
  </test3>
  <test4 attribute="some attribute">with attributes</test4>
  <closing/>
</body>
Demo: http://www.jakobbeyer.de/xml-with-nokogiri

Related

Trying to navigate XML file using nokogiri and xpath

I have an XML file downloaded from: https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml
What I'm trying to do is navigate through the currencies, so that I can save them in my database.
I have:
open('app/assets/forex/eurofxref-daily.xml', 'wb') do |file|
  file << open('https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml').read
end
then
doc = File.open("app/assets/forex/eurofxref-daily.xml") { |f| Nokogiri::XML(f) }
I am having a hard time accessing the nodes I'm interested in to extract currencies and values.
I'm not familiar with Nokogiri, but from this tutorial, it looks like you can apply the following XPath: /*/e:Cube/e:Cube/e:Cube to select all of the innermost Cube elements (the feed nests three levels of Cube; there is no Cubes element).
From there, you can iterate over each of those Cube elements and select their #currency and #rate attributes:
require 'open-uri'

doc = Nokogiri::XML(open("https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml"))
doc.xpath('/*/e:Cube/e:Cube/e:Cube', 'e' => 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref').each do |node|
  currency = node.attr('currency')
  rate = node.attr('rate')
  # do stuff with currency and rate
end

Nokogiri Use Ruby-Core Method as XML Node Name

How do I get Nokogiri to accept a Ruby core method as a node name? E.g.
xml.hash Digest::SHA256.file form.survey_xml
should return something like this:
<hash>cde6f0dd030aac1d3aa6d231b7c0cc30a34686a6f6780c468ccc64a4822f01e0</hash>
Instead I am getting an error (ArgumentError: wrong number of arguments (1 for 0)), of course, because hash is a Ruby method.
How do I set the node name to hash using the Nokogiri DSL, since the API I am interacting with expects that node?
I can just create the XML manually, but the answer I am looking for specifically uses Nokogiri.
More Info
Here is the xml I am trying to create:
<?xml version="1.0"?>
<xforms xmlns="http://openrosa.org/xforms/xformsList">
  <xform>
    <formID>1</formID>
    <name>BLAH BLAH</name>
    <version>1</version>
    <hash>892734982SDHFK238479823749234934</hash>
    <downloadUrl>/Users/me/workspace/dashboard/public/uploads/survey_xml/survey_xml/2/S_1_.xml</downloadUrl>
  </xform>
</xforms>
Here is my code:
require 'nokogiri'
require 'digest'

def mine
  xml = Nokogiri::XML::Builder.new { |xml|
    xml.xforms xmlns: 'http://openrosa.org/xforms/xformsList' do
      @forms.each do |form|
        xml.xform do
          xml.formID form.id
          xml.name form.name
          xml.version 1
          xml.hash Digest::SHA256.file form.survey_xml.survey_xml.file.file
          xml.downloadUrl form.survey_xml.survey_xml.file.file
        end
      end
    end
  }.to_xml
end
Based on dimakura's answer:
You can use other Nokogiri methods.
xpath:
node.xpath('hash').first
search:
node.search('hash').first
children:
xml.children.select{|x| x.name == 'hash'}
If you are creating new elements rather than getting them, you can add them, for example, like this:
xml.add_child '<hash>hash-code</hash>'
Update: When working with Nokogiri::XML::Builder, special names should be given a trailing underscore (_):
xml.hash_ 'your-hash'

How to stream large xml in Rails 3.2?

I'm migrating our app from 3.0 to 3.2.x. Earlier, the streaming was done by assigning the response_body a proc, like so:
self.response_body = proc do |response, output|
  target_obj = StreamingOutputWrapper.new(output)
  lib_obj.xml_generator(target_obj)
end
As you can imagine, the StreamingOutputWrapper responds to <<.
This way is deprecated in Rails 3.2.x; the suggested way is to assign an object that responds to each.
The problem I'm facing now is in making the lib_obj.xml_generator each-aware.
The current version of it looks like this:
def xml_generator(target, conditions = [])
  builder = Builder::XmlMarkup.new(:target => target)
  builder.root do
    builder.elementA do
      Model1.find_each(:conditions => conditions) { |model1| target << model1.xml_chunk_string }
    end
  end
end
where target is a StreamingOutputWrapper object.
The question is: how do I modify the code (the xml_generator and the controller code) to make the response XML stream properly?
Important stuff: Building the xml in memory is not an option as the model records are huge. The typical size of the xml response is around 150MB.
What you are looking for is SAX parsing. SAX reads files a chunk at a time instead of loading the whole file into the DOM. This is super convenient, and fortunately there are a lot of people before you who have wanted to do the same thing. Nokogiri offers XML::SAX methods, but it can get really confusing in the disastrous documentation, and syntactically it's a mess. I would suggest looking into something that sits on top of Nokogiri and makes getting your job done a lot simpler.
Here are a few options -
SAX_stream:
Mapping out objects in sax_stream is super simple:
require 'sax_stream/mapper'

class Product
  include SaxStream::Mapper

  node 'product'
  map :id, :to => '#id'
  map :status, :to => '#status'
  map :name_confirmed, :to => 'name/#confirmed'
  map :name, :to => 'name'
end
and calling the parser is also simple:
require 'sax_stream/parser'
require 'sax_stream/collectors/naive_collector'
collector = SaxStream::Collectors::NaiveCollector.new
parser = SaxStream::Parser.new(collector, [Product])
parser.parse_stream(File.open('products.xml'))
However, working with the collectors (or writing your own) can end up being slightly confusing, so I would actually go with:
Saxerator:
Saxerator gets the job done and has some really handy methods for traversing into nodes that can be a little less complex than sax_stream. Saxerator also has a few really great configuration options that are well documented. A simple Saxerator example below:
parser = Saxerator.parser(File.new("rss.xml"))
parser.for_tag(:item).each do |item|
  # where the xml contains <item><title>...</title><author>...</author></item>
  # item will look like {'title' => '...', 'author' => '...'}
  puts "#{item['title']}: #{item['author']}"
end

# a String is returned here since the given element contains only character data
puts "First title: #{parser.for_tag(:title).first}"
If you end up having to pull the XML from an external source (or it is getting updated frequently and you don't want to have to update the version on your server manually), check out THIS QUESTION and the accepted answer; it works great.
You could always monkey-patch the response object:
response.stream.instance_eval do
  alias :<< :write
end
builder = Builder::XmlMarkup.new(:target => response.stream)
...
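Alternatively, instead of monkey-patching the stream, the generator can be wrapped in a small object that responds to each, which is what Rails 3.2 expects of response_body. A minimal sketch under that assumption; the class and method names here are made up for illustration:

```ruby
# Wraps a chunk-producing generator in an object responding to #each.
# Rails 3.2 calls #each on response_body and writes every yielded chunk
# to the socket, so the XML is never buffered whole in memory.
class XmlStream
  def initialize(&generator)
    @generator = generator
  end

  def each(&block)
    # Hand the generator a target whose << forwards each chunk to the caller
    target = Object.new
    target.define_singleton_method(:<<) { |chunk| block.call(chunk) }
    @generator.call(target)
  end
end

# In the controller this would look something like:
#   self.response_body = XmlStream.new { |target| lib_obj.xml_generator(target) }
chunks = []
XmlStream.new { |t| t << "<root>"; t << "</root>" }.each { |c| chunks << c }
chunks  # => ["<root>", "</root>"]
```

This keeps the existing xml_generator untouched, since it already writes through << just as it did with StreamingOutputWrapper.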

How to open, parse and process XML file with Ox gem like with Nokogiri gem?

I want to open an external XML file, parse it, and store the data in my database. I do this with Nokogiri quite easily:
file = '...external.xml'
xml = Nokogiri::XML(open(file))
xml.xpath('//Element').each do |element|
  # process elements and save to the database, e.g.:
  @data = Model.new(:attr => element.at('foo').text)
  @data.save
end
Now I want to try the (maybe faster) Ox gem (https://github.com/ohler55/ox), but I cannot work out from the documentation how to open and process a file.
Any equivalent code examples for the above code would be awesome! Thank you!
You can't use XPath to locate nodes in Ox, but Ox does provide a locate method. You can use it like so:
xml = Ox.parse(%Q{
<root>
  <Element>
    <foo>ex1</foo>
  </Element>
  <Element>
    <foo>ex2</foo>
  </Element>
</root>
}.strip)

xml.locate('Element/foo/^Text').each do |t|
  @data = Model.new(:attr => t)
  @data.save
end

# or if you need to do other stuff with the element first
xml.locate('Element').each do |elem|
  # do stuff
  @data = Model.new(:attr => elem.locate('foo/^Text').first)
  @data.save
end
If your query doesn't find any matches, it will return an empty array. For a brief description of the locate query parameter, see the source code at element.rb.
From the documentation:
doc2 = Ox.parse(xml)
To read the contents of a file in Ruby you can use xml = IO.read('filename.xml') (among others). So:
doc = Ox.parse(IO.read(filename))
If your XML file is UTF-8 encoded, then alternatively:
doc = Ox.parse( File.open(filename,"r:UTF-8",&:read) )

Array to XML -- Rails

I have a multi-dimensional array that I'd like to use to build XML output.
The array stores a CSV import, where people[0][...] holds the column names that will become the XML tags, and people[...>0][...] holds the values.
For instance, array contains:
people[0][0] => first-name
people[0][1] => last-name
people[1][0] => Bob
people[1][1] => Dylan
people[2][0] => Sam
people[2][1] => Shepard
XML needs to be:
<person>
  <first-name>Bob</first-name>
  <last-name>Dylan</last-name>
</person>
<person>
  <first-name>Sam</first-name>
  <last-name>Shepard</last-name>
</person>
Any help is appreciated.
I suggest using FasterCSV to import your data and to convert it into an array of hashes. That way to_xml should give you what you want:
people = []
FasterCSV.foreach("yourfile.csv", :headers => true) do |row|
  people << row.to_hash
end
people.to_xml
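FasterCSV was merged into Ruby's standard csv library in 1.9, so on a current Ruby the same idea can be sketched with plain CSV; note that to_xml itself still comes from ActiveSupport, so only the array-of-hashes step is shown here:

```ruby
require "csv"

# Parse CSV with a header row into an array of hashes, the shape to_xml expects
csv = "first-name,last-name\nBob,Dylan\nSam,Shepard\n"
people = CSV.parse(csv, headers: true).map(&:to_h)
people
# => [{"first-name"=>"Bob", "last-name"=>"Dylan"},
#     {"first-name"=>"Sam", "last-name"=>"Shepard"}]
```

In a Rails app, calling people.to_xml on that result yields one record per row, with the header names as tags.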
There are two main ways I can think of achieving this: one using an XML serializer, the second pushing out the raw string.
Here's an example of the second:
xml = ''
1.upto(people.size - 1) do |row_idx|
  xml << "<person>\n"
  people[0].each_with_index do |column, col_idx|
    xml << "  <#{column}>#{people[row_idx][col_idx]}</#{column}>\n"
  end
  xml << "</person>\n"
end
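The raw-string loop above can be exercised directly with the sample data from the question; the array literal below is reconstructed from the people[...][...] values listed there:

```ruby
# Column names in row 0, one person per subsequent row (from the question)
people = [
  ["first-name", "last-name"],
  ["Bob", "Dylan"],
  ["Sam", "Shepard"],
]

xml = ''
1.upto(people.size - 1) do |row_idx|
  xml << "<person>\n"
  people[0].each_with_index do |column, col_idx|
    xml << "  <#{column}>#{people[row_idx][col_idx]}</#{column}>\n"
  end
  xml << "</person>\n"
end
```

Each data row becomes one <person> element whose child tags are taken from the header row.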
Another way:
hash = {}
hash['person'] = []
1.upto(people.size - 1) do |row_idx|
  row = {}
  people[0].each_with_index do |column, col_idx|
    row[column] = people[row_idx][col_idx]
  end
  hash['person'] << row
end
hash.to_xml
Leaving this answer here in case someone needs to convert an array like this that didn't come from a CSV file (or can't use FasterCSV).
Using Hash#to_xml is a good idea, due to its support in core Rails; it's probably the simplest way to export hash-like data to simple XML. That covers most simple cases; more complex cases require more complex tools.
Thanks to everyone that posted. Below is the solution that seems to work best for my needs. Hopefully others may find this useful.
This solution grabs a CSV file from a remote URL, stores it in a multi-dimensional array, then exports it as XML:
require 'rio'
require 'fastercsv'

url = 'http://remote-url.com/file.csv'
people = FasterCSV.parse(rio(url).read)

xml = ''
1.upto(people.size - 1) do |row_idx|
  xml << "  <record>\n"
  people[0].each_with_index do |column, col_idx|
    xml << "    <#{column.parameterize}>#{people[row_idx][col_idx]}</#{column.parameterize}>\n"
  end
  xml << "  </record>\n"
end
There are better solutions out there; using hash.to_xml would have been great, except I needed to run the CSV header line through parameterize to use the names as XML tags. But this code works, so I'm happy.
