How to stream large xml in Rails 3.2? - ruby-on-rails

I'm migrating our app from 3.0 to 3.2.x. Earlier, streaming was done by assigning a proc to response_body, like so:
self.response_body = proc do |response, output|
  target_obj = StreamingOutputWrapper.new(output)
  lib_obj.xml_generator(target_obj)
end
As you can imagine, the StreamingOutputWrapper responds to <<.
This way is deprecated in Rails 3.2.x. The suggested way is to assign an object that responds to each.
The problem I'm facing now is in making the lib_obj.xml_generator each-aware.
The current version of it looks like this:
def xml_generator(target, conditions = [])
  builder = Builder::XmlMarkup.new(:target => target)
  builder.root do
    builder.elementA do
      Model1.find_each(:conditions => conditions) { |model1| target << model1.xml_chunk_string }
    end
  end
end
where target is a StreamingOutputWrapper object.
The question is: how do I modify the code (both xml_generator and the controller code) so that the XML response streams properly?
Important stuff: Building the xml in memory is not an option as the model records are huge. The typical size of the xml response is around 150MB.

What you are looking for is SAX parsing. SAX reads a file in chunks instead of loading the whole document into a DOM. This is very convenient, and fortunately plenty of people before you have wanted to do the same thing. Nokogiri offers XML::SAX methods, but the documentation is hard to follow and the syntax is messy. I would suggest looking into something that sits on top of Nokogiri and makes the job a lot simpler.
Here are a few options -
SAX_stream:
Mapping out objects in sax_stream is super simple:
require 'sax_stream/mapper'
class Product
  include SaxStream::Mapper
  node 'product'
  map :id, :to => '#id'
  map :status, :to => '#status'
  map :name_confirmed, :to => 'name/#confirmed'
  map :name, :to => 'name'
end
and calling the parser is also simple:
require 'sax_stream/parser'
require 'sax_stream/collectors/naive_collector'
collector = SaxStream::Collectors::NaiveCollector.new
parser = SaxStream::Parser.new(collector, [Product])
parser.parse_stream(File.open('products.xml'))
However, working with the collectors (or writing your own) can end up being slightly confusing, so I would actually go with:
Saxerator:
Saxerator gets the job done and has some really handy methods for traversing into nodes, which can be a little less complex than sax_stream. Saxerator also has a few really useful configuration options that are well documented. A simple Saxerator example:
parser = Saxerator.parser(File.new("rss.xml"))
parser.for_tag(:item).each do |item|
  # where the xml contains <item><title>...</title><author>...</author></item>
  # item will look like {'title' => '...', 'author' => '...'}
  puts "#{item['title']}: #{item['author']}"
end
# a String is returned here since the given element contains only character data
puts "First title: #{parser.for_tag(:title).first}"
If you end up having to pull the XML from an external source (or it is updated frequently and you don't want to update the copy on your server manually), check out THIS QUESTION and the accepted answer; it works great.

You could always monkey-patch the response object:
response.stream.instance_eval do
  alias :<< :write
end
builder = Builder::XmlMarkup.new(:target => response.stream)
...
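For the each-based approach that the question mentions, here is a minimal sketch in plain Ruby. It assumes the generator only ever calls << on its target; XmlStream and YieldingTarget are hypothetical names used for illustration, not part of Rails or Builder. The idea is to turn every << write into a yield to the block passed to each, so nothing is accumulated in memory.

```ruby
# Hypothetical adapter: forwards every "<<" write to the block given to #each.
class YieldingTarget
  def initialize(&block)
    @block = block
  end

  def <<(chunk)
    @block.call(chunk.to_s)
    self # allow chained writes: target << a << b
  end
end

# Hypothetical each-aware body: runs the generator lazily, only when
# the consumer starts iterating.
class XmlStream
  def initialize(&generator)
    @generator = generator
  end

  def each(&block)
    @generator.call(YieldingTarget.new(&block))
  end
end

# Stand-in for lib_obj.xml_generator(target): writes chunks one at a time.
stream = XmlStream.new do |target|
  target << "<root>"
  3.times { |i| target << "<item>#{i}</item>" }
  target << "</root>"
end

stream.each { |chunk| print chunk }
puts
```

In the controller this could presumably be wired up as self.response_body = XmlStream.new { |target| lib_obj.xml_generator(target) }, with the caveat that whether chunks actually reach the client one by one also depends on the web server and any buffering middleware.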

Related

Rails how to censor some parameters from logs

I'm doing some custom logging in my Rails application and I want to automatically censor some parameters. I know that we have filter_parameter_logging.rb, which does this for the params object. How can I achieve something like this for my custom hash?
Let's say I'm logging something like this:
Rails.logger.info({name: 'me', secret: '1231234'}.inspect)
So my secret key should be censored in the logs.
I know I can manually delete the key before logging, but it adds noise to my application.
The question title talks about removing the parameters, but your question refers to censoring the parameters similar to how Rails.application.config.filter_parameters works. If it's the latter, it looks like that's already been answered in Manually filter parameters in Rails. If it's the former, assuming a filter list, and a hash:
FILTER_LIST = [:password, :secret]
hash = {'password' => 123, :secret => 321, :ok => "this isn't going anywhere"}
then you could do this:
hash.reject { |k,v| FILTER_LIST.include?(k.to_sym) }
That'll cope with both string and symbol key matching, assuming the filter list is always symbols. Additionally, you could always use the same list as config.filter_parameters if they are going to be the same and you don't need a separate filter list:
hash.reject { |k,v| Rails.application.config.filter_parameters.include?(k.to_sym) }
And if you wanted to make this easier to use within your own logging, you could consider monkey patching the Hash class:
class Hash
  def filter_like_parameters
    self.reject { |k,v| Rails.application.config.filter_parameters.include?(k.to_sym) }
  end
end
Then your logging code would become:
Rails.logger.info({name: 'me', secret: '1231234'}.filter_like_parameters.inspect)
If you do monkey patch custom functionality onto core classes like that, though, for calls you're going to be making a lot, it's best to use a fairly unusual method name to reduce the likelihood of a clash with any other library that might define the same method name.
Hope that helps!
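If the goal is censoring rather than removing, a small variation on the reject approach can keep the key and replace its value. This is only a sketch in plain Ruby: mask_filtered and the "[FILTERED]" placeholder are names chosen here for illustration, and FILTER_LIST is the same assumed list of symbols as above.

```ruby
FILTER_LIST = [:password, :secret].freeze

# Returns a copy of the hash in which the values of filtered keys are masked.
# Works for both string and symbol keys, assuming the list holds symbols.
def mask_filtered(hash, filter_list = FILTER_LIST)
  hash.each_with_object({}) do |(key, value), result|
    result[key] = filter_list.include?(key.to_sym) ? "[FILTERED]" : value
  end
end

hash = { 'password' => 123, :secret => 321, :ok => "this isn't going anywhere" }
puts mask_filtered(hash).inspect
```

The original hash is left untouched, so it can still be used after logging.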

Using a frozen constant as a hash key?

I built a small API that makes a few calls that have similar payloads, so I have a base payload that I merge in the call-specific elements, like so:
def foo_call
  base_payload.merge({"request.id" => request_id})
end
def biz_call
  base_payload.merge({"request.id" => some_other_thing})
end
def base_payload
  {
    bar: bar,
    baz: baz,
    "request.id" => default_id
  }
end
My coworker suggested that I make "request.id" a frozen constant, arguing that with a frozen constant we won't be allocating a new string object on each call, saving a bit of memory. That would look like this:
REQUESTID = "request.id".freeze
def foo_call
  base_payload.merge({REQUESTID => request_id})
end
def biz_call
  base_payload.merge({REQUESTID => some_other_thing})
end
def base_payload
  {
    bar: bar,
    baz: baz,
    REQUESTID => default_id
  }
end
I'm a little apprehensive, but I can't quite pin down any reason why not to do this (other than my latent resistance at having a new commit :>). I feel it might cause weirdness to have the same object be the key in multiple hashes -- that merging in an object "request.id" might not overwrite the original string from the base_payload -- or that we won't actually see any memory saved since the hash would have to clone the string for its hash key anyway, but I'm not really sure.
Am I just being overly paranoid/resistant?
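Both worries can be checked empirically with a few lines of plain Ruby. The sketch below is an illustration, not a benchmark: the variable names are made up, and it relies on the documented behavior of Hash#[]= that an unfrozen String key is duplicated and frozen, while an already frozen one is stored as is.

```ruby
REQUEST_ID = "request.id".freeze

# Worry 1: does merging an equal but distinct string overwrite the entry?
# Hash keys are compared by value (hash/eql?), not by object identity.
base = { REQUEST_ID => "default" }
merged = base.merge({ String.new("request.id") => "override" })
merged_size = merged.size
merged_value = merged[REQUEST_ID]

# Worry 2: does the hash copy the key anyway?
frozen_keyed = {}
frozen_keyed[REQUEST_ID] = 1
frozen_key_reused = frozen_keyed.keys.first.equal?(REQUEST_ID)

unfrozen = String.new("request.id") # explicitly unfrozen string
unfrozen_keyed = {}
unfrozen_keyed[unfrozen] = 1
unfrozen_key_reused = unfrozen_keyed.keys.first.equal?(unfrozen)

puts "entries after merge: #{merged_size}, value: #{merged_value}"
puts "frozen key object reused: #{frozen_key_reused}"
puts "unfrozen key object reused: #{unfrozen_key_reused}"
```

So sharing one frozen string as the key of several hashes does not cause overwrite weirdness; whether the saved allocations matter in practice is a separate question that only measurement can answer.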

Rails: build for difference between relationships

A doc has many articles and can have many edits.
I want to build an edit for each article up to the total number of @doc.articles. This code works with the first build (i.e., when no edits yet exist).
def editing
  @doc = Doc.find(params[:id])
  unbuilt = @doc.articles - @doc.edits
  unbuilt.reverse.each do |article|
    @doc.edits.build(:body => article.body, :article_id => article.id, :doc_id => @doc.id)
  end
end
But when edits already exist, it keeps those edits and still builds for the full @doc.articles total, ending up with too many edits and some duplicates if only one article was changed.
I want to add a condition on :article_id, which exists in both edits and articles, to say something like this (in pseudocode):
unbuilt = @doc.articles - @doc.edits
unbuilt.where('article_id not in (?)', @doc.edits).reverse.each do |article|
  @doc.edits.build(...)
end
Any help would be excellent! Thank-you so much.
You are doing something weird here:
unbuilt = @doc.articles - @doc.edits
You probably want this instead:
unbuilt = @doc.articles - @doc.edits.map(&:article)
This works if @doc.articles and @doc.edits are small collections; otherwise a SQL solution would be preferred.
-- EDIT: added explanation --
This piece of Ruby
@doc.edits.map(&:article)
is equivalent to
@doc.edits.map do |edit| edit.article end
The first form is much more compact and relies on Symbol#to_proc, which has been part of core Ruby since 1.8.7 and 1.9.
It basically takes a symbol (:article) and calls the 'to_proc' method on it (that is what the '&' character triggers). You can think of the 'to_proc' method as something very similar to this:
def to_proc
  proc { |object| object.send(self) }
end
In Ruby, blocks and procs are generally equivalent (kind of), so this works!
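The equivalence, and the array difference it feeds into, can be tried out without Rails. In this sketch Article and Edit are plain Struct stand-ins for the ActiveRecord models, so only the Ruby mechanics are shown.

```ruby
# Stand-ins for the models; real ActiveRecord objects are assumed to
# behave the same way for map and Array#-.
Article = Struct.new(:id)
Edit = Struct.new(:article)

articles = [Article.new(1), Article.new(2), Article.new(3)]
edits = [Edit.new(articles[0]), Edit.new(articles[1])]

with_symbol = edits.map(&:article)               # Symbol#to_proc shorthand
with_block = edits.map { |edit| edit.article }   # explicit block
explicit_proc = :article.to_proc                 # what & builds behind the scenes

# Articles that do not have an edit yet.
unbuilt = articles - with_symbol

puts unbuilt.map(&:id).inspect # prints [3]
```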

Struct with types and conversion

I am trying to accomplish the following in Ruby:
person_struct = StructWithType.new "Person",
  :name => String,
  :age => Fixnum,
  :money_into_bank_account => Float
And I would like it to accept both:
person_struct.new "Some Name",10,100000.0
and
person_struct.new "Some Name","10","100000.0"
That is, I'd like it to do data conversion stuff automatically.
I know Ruby is dynamically typed and I should not care about data types, but this kind of conversion would be handy.
What I am asking for is something similar to what ActiveRecord already does: convert a String to the data type defined in the table column.
After searching through ActiveModel I could not figure out how to build a table-less model that does this conversion.
After all, I think my problem may require much less than what the ActiveModel modules offer.
Of course I could implement a class myself that provides this conversion feature, but I would rather confirm this has not already been done, so as not to reinvent the wheel.
Thanks in advance.
I think that the implementation inside a class is so easy, and there is no overhead at all, so I don't see the reason to use StructWithType at all. Ruby is not only dynamic, but very efficient in storing its instances. As long as you don't use an attribute, there is none.
The implementation in a class should be:
def initialize(name, age, money_into_bank_account)
  self.name = name
  self.age = age.to_i
  self.money_into_bank_account = money_into_bank_account.to_f
end
The implementation in StructWithType would then be one layer higher:
Implement a converter for each type.
Bind an instance of that converter in the class.
In the new implementation, have StructWithType instances (not the class) use the class's converters to do the conversion.
A very first sketch of it could look like this:
class StructWithType
  def create(*args)
    # <Some code to create new_inst>
    args.each_with_index do |arg, index|
      new_value = self.converter[index].convert(arg)
      new_inst[argname[index]] = new_value
    end
    new_inst
  end
end
The ideas here are:
You have an instance method named create that creates from the factory a new struct instance.
The factory iterates through all args (with the index) and searches for each arg the converter to use.
It converts the arg with the converter.
It stores in the new instance at the argname (method argname[] has to be written) the new value.
So you have to implement the creation of the struct, the lookup for converter, the lookup for the argument name and the setter for the attributes of the new instance. Sorry, no more time today ...
I have used create because new has a different meaning in Ruby, I did not want to mess this up.
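For comparison, here is one way the converter idea could be made runnable on top of Ruby's built-in Struct. It is a sketch under several assumptions: TypedStruct and its CONVERTERS table are hypothetical names, only three types are supported, and Integer is used in place of Fixnum, which newer Rubies no longer define.

```ruby
module TypedStruct
  # One converter per supported type; Integer() and Float() raise on
  # input they cannot convert instead of silently returning 0.
  CONVERTERS = {
    String  => ->(value) { value.to_s },
    Integer => ->(value) { Integer(value) },
    Float   => ->(value) { Float(value) }
  }.freeze

  # Builds a Struct class whose constructor converts each positional
  # argument with the converter registered for that field's type.
  def self.build(fields)
    converters = fields.values.map { |type| CONVERTERS.fetch(type) }
    Struct.new(*fields.keys) do
      define_method(:initialize) do |*args|
        converted = args.each_with_index.map { |arg, i| converters[i].call(arg) }
        super(*converted)
      end
    end
  end
end

Person = TypedStruct.build(:name => String, :age => Integer, :money_into_bank_account => Float)

person = Person.new("Some Name", "10", "100000.0")
puts person.age.inspect                     # prints 10
puts person.money_into_bank_account.inspect # prints 100000.0
```

Here new can be kept as the constructor because Struct.new already returns a class, so the conversion lives in that class's initialize rather than in a separate create method.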
I found a project on GitHub that fulfills some of my requirements: ActiveHash.
I still have to create a class for each type, but the type conversion comes for free.
I am giving it a try.
Usage example:
class Country < ActiveHash::Base
  self.data = [
    {:id => 1, :name => "US"},
    {:id => 2, :name => "Canada"}
  ]
end
country = Country.new(:name => "Mexico")
country.name # => "Mexico"
country.name? # => true

How to test a scope in Rails 3

What's the best way to test scopes in Rails 3? In Rails 2, I would do something like:
RSpec:
it 'should have a top_level scope' do
  Category.top_level.proxy_options.should == {:conditions => {:parent_id => nil}}
end
This fails in rails 3 with a "undefined method `proxy_options' for []:ActiveRecord::Relation" error.
How are people testing that a scope is specified with the correct options? I see you could examine the arel object and might be able to make some expectations on that, but I'm not sure what the best way to do it would be.
Leaving the question of 'how-to-test' aside... here's how to achieve similar stuff in Rails3...
In Rails3 named scopes are different in that they just generate Arel relational operators.
But, investigate!
If you go to your console and type:
# All the guts of arel!
Category.top_level.arel.inspect
You'll see internal parts of Arel. It's used to build up the relation, but can also be introspected for current state. You'll notice public methods like #where_clauses and such.
However, the scope itself has a lot of helpful introspection public methods that make it easier than directly accessing #arel:
# Basic stuff:
=> [:table, :primary_key, :to_sql]
# and these to check-out all parts of your relation:
=> [:includes_values, :eager_load_values, :preload_values,
:select_values, :group_values, :order_values, :reorder_flag,
:joins_values, :where_values, :having_values, :limit_value,
:offset_value, :readonly_value, :create_with_value, :from_value]
# With 'where_values' you can see the whole tree of conditions:
Category.top_level.where_values.first.methods - Object.new.methods
=> [:operator, :operand1, :operand2, :left, :left=,
:right, :right=, :not, :or, :and, :to_sql, :each]
# You can see each condition to_sql
Category.top_level.where_values.map(&:to_sql)
=> ["`categories`.`parent_id` IS NULL"]
# More to the point, use #where_values_hash to see rails2-like :conditions hash:
Category.top_level.where_values_hash
=> {"parent_id"=>nil}
Use this last one: #where_values_hash to test scopes in a similar way to #proxy_options in Rails2....
Ideally your unit tests should treat models (classes) and instances thereof as black boxes. After all, it's not really the implementation you care about but the behavior of the interface.
So instead of testing that the scope is implemented in a particular way (i.e. with a particular set of conditions), try testing that it behaves correctly—that it returns instances it should and doesn't return instances it shouldn't.
describe Category do
  describe ".top_level" do
    it "should return root categories" do
      frameworks = Category.create(:name => "Frameworks")
      Category.top_level.should include(frameworks)
    end
    it "should not return child categories" do
      frameworks = Category.create(:name => "Frameworks")
      rails = Category.create(:name => "Ruby on Rails", :parent => frameworks)
      Category.top_level.should_not include(rails)
    end
  end
end
If you write your tests in this way, you'll be free to re-factor your implementations as you please without needing to modify your tests or, more importantly, without needing to worry about unknowingly breaking your application.
This is how I check them. Consider this scope:
scope :item_type, lambda { |item_type|
  where("game_items.item_type = ?", item_type )
}
which gets all the game_items where item_type equals a given value (like 'Weapon'):
it "should get a list of all possible game weapons if called like GameItem.item_type('Weapon'), with no arguments" do
  Factory(:game_item, :item_type => 'Weapon')
  Factory(:game_item, :item_type => 'Gloves')
  weapons = GameItem.item_type('Weapon')
  weapons.each { |weapon| weapon.item_type.should == 'Weapon' }
end
I test that the weapons array holds only Weapon item_types and not something else like Gloves that are specified in the spec.
Don't know if this helps or not, but I'm looking for a solution and ran across this question.
I just did this and it works for me
it { User.nickname('hello').should == User.where(:nickname => 'hello') }
FWIW, I agree with your original method (Rails 2). Creating models just for testing them makes your tests way too slow to run in continuous testing, so another approach is needed.
Loving Rails 3, but definitely missing the convenience of proxy_options!
Quickly Check the Clauses of a Scope
I agree with others here that testing the actual results you get back and ensuring they are what you expect is by far the best way to go, but a simple check to ensure that a scope is adding the correct clause can also be useful for faster tests that don't hit the database.
You can use where_values_hash to test where conditions. Here's an example using RSpec:
it 'should have a top_level scope' do
  Category.top_level.where_values_hash.should eq({"parent_id" => nil})
end
Although the documentation is very slim and sometimes non-existent, there are similar methods for other condition-types, such as:
order_values
Category.order(:id).order_values
# => [:id]
select_values
Category.select(:id).select_values
# => [:id]
group_values
Category.group(:id).group_values
# => [:id]
having_values
Category.having(:id).having_values
# => [:id]
etc.
Default Scope
For default scopes, you have to handle them a little differently. Check this answer out for a better explanation.

Resources