rails 3, xml formatting and builder - ruby-on-rails

I have an xml tag that needs to be formatted like so:
<AddDealRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
I can't seem to get this to work properly, using builder. I am attempting the following code in builder:
xml.AddDealRequest(:xmlns:xsi => "http://www.w3.org/2001/XMLSchema-instance", :xmlns:xsd => "http://www.w3.org/2001/XMLSchema" ) do
but obviously that second colon is throwing off the symbol. Is there any way to escape that second colon? Or is this declaration entirely necessary?

Try quoting your symbols:
:'xmlns:xsi' => "http://www.w3.org/2001/XMLSchema-instance",
:'xmlns:xsd' => "http://www.w3.org/2001/XMLSchema"
You could also try using strings instead of symbols:
'xmlns:xsi' => "http://www.w3.org/2001/XMLSchema-instance",
'xmlns:xsd' => "http://www.w3.org/2001/XMLSchema"
I'm not sure Builder will be happy with that, but its documentation includes examples like this:
xm.target("name"=>"compile", "option"=>"fast")
# => <target option="fast" name="compile"/>
so strings for the attribute names should work.
A bit of time in irb might help clarify things:
>> 'where_is:pancakes_house'.to_sym
=> :"where_is:pancakes_house"
>> :'xmlns:xsi'.to_s
=> "xmlns:xsi"
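The quotes only change how Ruby parses the symbol, not the symbol itself. A core-Ruby sketch of that (the "key": value form assumes Ruby 2.2 or later; start_tag is a hypothetical helper written for illustration, not part of Builder, and it does no escaping of values):

```ruby
# Quoted symbols: the quotes let the parser accept the embedded colon.
attrs = {
  :'xmlns:xsi' => "http://www.w3.org/2001/XMLSchema-instance",
  :'xmlns:xsd' => "http://www.w3.org/2001/XMLSchema"
}

# Ruby 2.2+ shorthand: a quoted key followed by a colon is also a symbol.
attrs_22 = {
  "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
  "xmlns:xsd": "http://www.w3.org/2001/XMLSchema"
}

p attrs == attrs_22        # true
p attrs.keys.map(&:to_s)   # ["xmlns:xsi", "xmlns:xsd"]

# Hypothetical helper (not Builder's API): turns a tag name and an
# attribute hash into an XML start tag. Values are NOT escaped.
def start_tag(name, attributes)
  pairs = attributes.map { |key, value| %(#{key}="#{value}") }
  "<#{name} #{pairs.join(' ')}>"
end

puts start_tag("AddDealRequest", attrs)
```

With the builder gem loaded, the same hash can be passed as the attribute argument, e.g. xml.AddDealRequest(attrs) do ... end.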

Rather than expect anyone to read through all the comments in the earliest answer, I'll just post the outcome here:
Firefox doesn't display the xmlns attribute (at least not when it matches a default). If you view the source (Ctrl+U) or use Chrome as your browser, you'll see that the missing attributes are appearing in the xml output.


How to find text on a page using Nokogiri

I am trying to find the best way to find a word on a page using Nokogiri.
I have a page which has the following text.
<p>Modelo: ABC123-A</p>
I would like to find the "Modelo:" text, and then get the model number after it.
I have had a look around but can't seem to find anything, so I thought I would post here and see if anyone with experience of Nokogiri could shed some light on this for me.
Use the p:contains selector to get the matching p nodes:
doc = Nokogiri::HTML('<html><body><p>Modelo: ABC123-A</p><br/><p>Nothing here</p><p>Modelo: 4321</p></body></html>')
doc.css('p:contains("Modelo")').map { |x| x.text.split(': ').last }
#=> ["ABC123-A", "4321"]
A simple example:
doc = Nokogiri::HTML('<html><body><p>Modelo: ABC123-A</p></body></html>')
str = doc.css('p').first.content # => "Modelo: ABC123-A"
str.split( ': ' )[-1] # => "ABC123-A"
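Once the paragraph text is out of Nokogiri, the rest is plain string work. A minimal sketch, assuming the texts have already been extracted (plain strings stand in for them here); model_number is a hypothetical helper, not part of Nokogiri:

```ruby
# Stand-in for the paragraph texts Nokogiri would return, e.g. from
# doc.css('p').map(&:text) in the answers above.
texts = ["Modelo: ABC123-A", "Nothing here", "Modelo: 4321"]

# Hypothetical helper: the first run of non-space characters after
# "Modelo:", or nil when the text does not start with that label.
def model_number(text)
  text[/\AModelo:\s*(\S+)/, 1]
end

models = texts.map { |text| model_number(text) }.compact
p models  # ["ABC123-A", "4321"]
```

Unlike split(': ').last, the anchored regex returns nil for paragraphs without the label, so compact drops them, and it still works if the space after the colon is missing.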
You could also try Oga; it's lighter than Nokogiri.

Rails ActiveRecord invalid byte sequence in UTF-8 issue

I am using MSSQL 2005.
I have a StringIO object that contains my zip file content.
Here is how I obtain zip binary data:
stringio = Zip::ZipOutputStream::write_buffer do |zio|
  Eclaim.find_by_sql("SET TEXTSIZE 67108864")
  #zio.write @claim_db[:xml]
  biblio = Nokogiri::XML('<?xml version="1.0" encoding="utf-8"?>' + @claim_db[:xml], &:noblanks)
  zio.write biblio.to_xml
  builder = Nokogiri::XML::Builder.new(:encoding => 'utf-8') do |xml|
    xml.documents {
      docs.where("ext not in (#{PROHIBITED_EXTS.collect{|v| "'#{v}'"}.join(', ')})").each{|doc|
        zio.write doc[:efile]
        xml.document(:id => doc[:materialtitle_id]) {
          xml.title doc[:title]
          xml.code doc[:code]
          xml.filename "#{doc[:materialtitle_id]}.#{doc[:ext]}"
          xml.extname doc[:ext]
        }
      }
    }
  end
  zio.write builder.to_xml
end
In my controller I try:
:title => 'Some file',
:ext => 'zip',
:size => data.length,
:receive_date => Time.now,
:efile => data.sysread
But Rails complains: invalid byte sequence in UTF-8
Please help me with it.
If the stream is configured as a UTF-8 stream, you can't write compressed binary data to it (it may contain any byte value).
I think marking the data as binary before writing it might help:
data.force_encoding "ASCII-8BIT"
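force_encoding is a String method, so this sketch assumes the zip content is available as a String (for example, what data.sysread returns); the payload below is made up. Bytes such as 0xFF never occur in valid UTF-8, so a string labelled UTF-8 that holds them is "broken", and regex operations on it raise the same ArgumentError. Relabelling the bytes as ASCII-8BIT (binary) makes the string valid without changing a single byte:

```ruby
# Hypothetical zip-like payload: "PK\x03\x04" is the zip file signature,
# and bytes such as 0xFF never occur in valid UTF-8.
raw = "PK\x03\x04\xFF\xFE".dup.force_encoding(Encoding::UTF_8)
p raw.valid_encoding?   # false

begin
  raw =~ /PK/           # regex operations refuse a broken UTF-8 string
rescue ArgumentError => e
  puts e.message        # invalid byte sequence in UTF-8
end

# force_encoding only relabels the bytes and mutates its receiver,
# so relabel a copy and leave raw as it was.
binary = raw.dup.force_encoding(Encoding::ASCII_8BIT)
p binary.valid_encoding?     # true
p binary.bytes == raw.bytes  # true: the bytes are unchanged
match_index = binary =~ /PK/
p match_index                # 0
```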
What is complaining: Ruby, ActiveRecord, or SQL Server? My guess is SQL Server. Make sure the data type of the efile field in the database is a binary BLOB.
You should break your problem down into its constituent parts and troubleshoot smaller units. Remove complexity until you get to the source of the issue. For instance, have you tried just doing a simple Document.create with the attributes in, say, the console, to rule out the possibility that your controller code is buggy? Something like Document.create :efile => File.read('sometiny.zip') and just go from there.
Whether that works or breaks, you will have a much simpler support request and a better signal-to-noise ratio. Right now I suspect your controller code, not the SQL Server adapter or the connection mode, as I have tested both to the hilt with simple binary data. If the above does not work, you can then move on to examining smaller components.
For instance, what is the data type of the efile column? To find out, run Document.columns_hash['efile'] in the console and look at its #sql_type. Is it something suitable like varbinary(max)?
Moving on from there, what connection mode are you using with the SQL Server adapter? TinyTDS? By default TinyTDS converts everything to UTF-8 as needed and is really smart about things; I have tested it with everything from binary data to many different encodings. By the way, if you are using TinyTDS, did you make sure that you compiled FreeTDS with libiconv so it can do all this properly? You can easily check by running tsql -C in a shell, assuming FreeTDS's binaries are in your path. It should output a few lines; look for "iconv library: yes". Also make sure you are running version 0.91 or later!
Lastly, a bit of advice: that SET TEXTSIZE call does not belong there. You only want to do that once per connection. See https://github.com/rails-sqlserver/activerecord-sqlserver-adapter#configure-connection--app-name

Rails 2.3.2/Ruby 1.8.6 Encoding Question - ActionController returning UTF-8?

I have a pretty simple Rails question regarding encoding that I can't find an answer to.
Rails 2.3.2/Ruby 1.8.6
I am not setting any encoding options within the Rails environment currently, have left everything to defaults.
If I read a String from disk from a text file - and send it via Rails render :text functionality using Apache/Phusion, what encoding should the client expect?
Thank you for any answers,
Since about Rails 1.2, Rails sets Ruby 1.8's $KCODE magic variable to "UTF8". It includes ActiveSupport::CoreExtensions::String::Multibyte to patch around issues with otherwise ambiguous per-character/per-byte operators. Your text file should be UTF-8; Ruby will pass it through, and your application layout should specify a META tag declaring the document's charset to be UTF-8 too:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Then it should all 'just work', but there are some gotchas described below.
If you're on a Mac, running "script/console" in Terminal.app and then pasting unusual character sequences directly into the terminal from e.g. the Character Viewer is a good way to play around and demonstrate this to your own satisfaction, since the whole OS works in UTF-8. I don't know what the equivalent would be for Windows or an arbitrary Linux distribution.
For example, "⇒" (RIGHTWARDS DOUBLE ARROW) is Unicode U+21D2, encoded in UTF-8 as 0xE2 (226), 0x87 (135), 0x92 (146). If I paste that into Terminal and ask for the byte values, I get the expected result:
>> $KCODE
=> "UTF8"
>> "⇒"
=> "\342\207\222"
>> puts "⇒"
⇒
=> nil
>> "⇒"[0]
=> 226
>> "⇒"[1]
=> 135
>> "⇒"[2]
=> 146
>> "⇒"[3]
=> nil
Note how you're still getting byte access with "[]". See the documentation on the Multibyte extensions in the Rails API (for Rails 2.2, e.g. at http://railsapi.com/) if you want to do string operations, otherwise things like "foo.reverse" will do the wrong thing; "foo.mb_chars.reverse" gets it right by using the "mb_chars" proxy.
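For comparison, a sketch of the same checks in Ruby 2.0 or later (an assumption: not the Ruby 1.8.6 the question uses). There, strings carry their own encoding, [] and reverse work on characters, and byte access is explicit:

```ruby
# Ruby 2.0+ sketch: strings carry their own encoding, so there is no
# $KCODE setting and no mb_chars proxy involved.
arrow = "\u21D2"   # RIGHTWARDS DOUBLE ARROW

p arrow.encoding                              # #<Encoding:UTF-8>
p arrow.length                                # 1 character
p arrow.bytesize                              # 3 bytes
p arrow.bytes                                 # [226, 135, 146]
p arrow.bytes.map { |b| format("0x%02X", b) } # ["0xE2", "0x87", "0x92"]
p arrow.ord.to_s(16)                          # "21d2"

# [] and reverse work on characters, not bytes, in Ruby 1.9 and later.
word = "a#{arrow}b"
p word[1] == arrow              # true
p word.reverse == "b#{arrow}a"  # true
```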

Can I use a regular expression to extract the domain from a URL?

Suppose I want to turn this:
http://en.wikipedia.org/wiki/Anarchy
into this:
en.wikipedia.org
or, even better, this:
wikipedia.org
Is this even possible in regex?
Why use a regex when Ruby has a library for it? The URI library:
ruby-1.9.1-p378 > require 'uri'
=> true
ruby-1.9.1-p378 > uri = URI.parse("http://en.wikipedia.org/wiki/Anarchy")
=> #<URI::HTTP:0x000001010a2270 URL:http://en.wikipedia.org/wiki/Anarchy>
ruby-1.9.1-p378 > uri.host
=> "en.wikipedia.org"
ruby-1.9.1-p378 > uri.host.split('.')
=> ["en", "wikipedia", "org"]
Splitting the host is one way to separate the domains, but I'm not aware of a reliable way to get the base domain: you can't just count labels, given URLs like "http://somedomain.otherdomain.school.ac.uk" vs "www.google.com".
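A short sketch of why the library beats a hand-written pattern: URI.parse separates the scheme, user info, port, path and query from the host. The second URL and the last_two lambda are made up for illustration; the lambda shows the naive "count the labels" rule failing on the ac.uk example above:

```ruby
require 'uri'

urls = [
  "http://en.wikipedia.org/wiki/Anarchy",
  "https://user@en.wikipedia.org:8080/wiki/Anarchy?action=raw"
]

# URI separates scheme, user info, port, path and query from the host.
hosts = urls.map { |url| URI.parse(url).host }
p hosts                      # ["en.wikipedia.org", "en.wikipedia.org"]
p URI.parse(urls.last).port  # 8080

# Hypothetical naive rule: keep the last two labels of the host.
last_two = ->(host) { host.split('.').last(2).join('.') }
p last_two.call("www.google.com")                      # "google.com"
p last_two.call("somedomain.otherdomain.school.ac.uk") # "ac.uk"
```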
/http:\/\/([^\/]*).*/ will produce en.wikipedia.org from the string you provided.
/http:\/\/.{0,3}\.([^\/]*).*/ will produce wikipedia.org.
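In Ruby, String#[] with a regex and a group index returns that capture directly, so the two patterns above can be tried like this. The extra URLs are made up to show where the patterns stop matching:

```ruby
url = "http://en.wikipedia.org/wiki/Anarchy"

# String#[] with a regex and a group index returns that capture (or nil).
full_host  = url[/http:\/\/([^\/]*).*/, 1]
short_host = url[/http:\/\/.{0,3}\.([^\/]*).*/, 1]

p full_host   # "en.wikipedia.org"
p short_host  # "wikipedia.org"

# Both patterns need the literal "http://", so an https URL gives nil.
p "https://en.wikipedia.org/wiki/Anarchy"[/http:\/\/([^\/]*).*/, 1]  # nil
# The second one also needs a dot within the first four host characters.
p "http://example.com/"[/http:\/\/.{0,3}\.([^\/]*).*/, 1]           # nil
```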
Now, I know you haven't asked how, and you haven't specified a language, but I'll answer anyway (note that this works for all language subsites, not just en.wikipedia):
Perl:
$url =~ s,http://[a-z]{2}\.(wikipedia\.org)/.*,$1,;
Ruby:
url = url.sub(/http:\/\/[a-z]{2}\.(wikipedia\.org)\/.*/, '\1')
PHP:
$url = preg_replace('|http://[a-z]{2}\.(wikipedia\.org)/.*|', '$1', $url);
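A quick check of the Ruby one-liner on a couple of subsites; wikipedia_domain is a hypothetical wrapper, and the German URL is made up for illustration. When the pattern does not match, sub returns the string unchanged:

```ruby
# The Ruby one-liner from the answer, wrapped in a hypothetical helper.
def wikipedia_domain(url)
  url.sub(/http:\/\/[a-z]{2}\.(wikipedia\.org)\/.*/, '\1')
end

p wikipedia_domain("http://en.wikipedia.org/wiki/Anarchy")     # "wikipedia.org"
p wikipedia_domain("http://de.wikipedia.org/wiki/Anarchismus") # "wikipedia.org"
# No match (https, not http), so sub returns the string unchanged.
p wikipedia_domain("https://en.wikipedia.org/wiki/Anarchy")
```

Note the [a-z]{2} part: a subsite with a longer prefix would not match either.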
Of course, for this particular example, you don't even need a regex, just this will do:
url = 'wikipedia.org'
but I jest...
You probably want to handle any URL and pull out the domain part, and it should also work for domains in different countries, e.g. foo.co.uk.
In which case, I'd use Mark Rushakoff's solution to get the hostname and then a regex to pull out the domain:
domain = host.sub(/^.*\.([^.]+\.[^.]+(\.[a-z]{2})?)$/, '\1')
Hope this helps
Also, if you want to learn more, I have a regex tute online: http://tech.bluesmoon.info/2006/04/beginning-regular-expressions.html
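A quick way to check the domain regex against a few hosts. GREEDY is the pattern from the answer; LAZY and base_domain are hypothetical additions for comparison. In this sketch the greedy .* prefix consumes as many labels as it can, so the optional country-code group never gets used and www.foo.co.uk comes out as co.uk. A lazy prefix fixes that case but not every case, which echoes the earlier point that no simple label count is reliable:

```ruby
require 'uri'

# The regex from the answer above, with a greedy .* prefix.
GREEDY = /^.*\.([^.]+\.[^.]+(\.[a-z]{2})?)$/
# Hypothetical variant with a lazy .*? prefix, for comparison only.
LAZY = /^.*?\.([^.]+\.[^.]+(\.[a-z]{2})?)$/

# Hypothetical helper: replace the whole host with capture group 1.
def base_domain(host, pattern)
  host.sub(pattern, '\1')
end

host = URI.parse("http://en.wikipedia.org/wiki/Anarchy").host
p base_domain(host, GREEDY)             # "wikipedia.org"

# The greedy prefix grabs "www.foo", leaving only two labels to capture.
p base_domain("www.foo.co.uk", GREEDY)  # "co.uk"
# The lazy prefix stops at "www", so the country-code group can match.
p base_domain("www.foo.co.uk", LAZY)    # "foo.co.uk"
# Without a subdomain, even the lazy version captures just "co.uk".
p base_domain("foo.co.uk", LAZY)        # "co.uk"
```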
Sure, all you would have to do is search on http://(.*)/wiki/Anarchy
In Perl (sorry, I don't know Ruby, but I expect it's similar):
$string_to_search =~ s/http:////(.)//. should give you wikipedia.org
to get rid of the en, you can simply search on http:////en(.)//......
That should do it.
Update: In case you're not familiar with regexes, I would recommend picking up a book on them; Mastering Regular Expressions really rocks and I like it (I saw it on half.com the other day for 14.99 used). To clarify what I suggested above: look for the string http://en, then for anything until you find a /. That part is all captured in $1 (in Perl; I'm not sure if it's the same in Ruby), so a simple print $1 will print the string.
Update #2: Sorry, the star in the regex is not showing up for some reason, so wherever you see a . inside the () and after the //, just imagine a * after it. Oh, and I forgot: for the en part, add a /. at the end so that you don't end up with .wikipedia.org
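In Ruby, a successful =~ sets $1 just as in Perl. A sketch of the approach described above (look for http://en. and capture up to the next slash), using [^\/]* instead of .* so the capture stops at the first slash without a trailing pattern:

```ruby
url = "http://en.wikipedia.org/wiki/Anarchy"

# As in Perl, a successful =~ sets $1 to the first capture group.
# [^\/]* stops at the first slash, so no trailing .* is needed.
domain = (url =~ /http:\/\/en\.([^\/]*)/) ? $1 : nil

p domain  # "wikipedia.org"
```

Note that =~ returns the match position (0 here), and 0 is truthy in Ruby, so the ternary picks $1.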

Is it possible to generate plain-old XML using Haml?

I've been working on a piece of software where I need to generate a custom XML file to send back to a client application. The current solutions in the Ruby/Rails world for generating XML files are slow, at best. Builder and even Nokogiri have a nice syntax and are maintainable solutions, but they consume too much time and processing.
I could definitely go to ERB, which is fast, at the expense of building the whole XML by hand.
HAML is a great tool: it has a nice, straightforward syntax and is fairly fast. But I'm struggling to build pure XML files using it, which makes me wonder: is it possible at all?
Does anyone have pointers to code or docs showing how to do this, i.e. build a full, valid XML document from HAML?
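For comparison, a minimal sketch of the ERB route mentioned in the question, using Ruby's standard erb library (the trim_mode: keyword assumes Ruby 2.6 or later). The items data and the template are hypothetical; ERB::Util.html_escape handles the escaping that is easy to forget when writing XML by hand:

```ruby
require 'erb'

# Hypothetical data standing in for whatever the client needs.
items = [{ name: "blah" }, { name: "Tom & Jerry" }]

# Trim mode "-": <%- strips indentation before the tag and -%> drops
# the newline after it, so the loop lines leave no blank lines behind.
template = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <items>
  <%- items.each do |item| -%>
    <item name="<%= ERB::Util.html_escape(item[:name]) %>"/>
  <%- end -%>
  </items>
XML

xml = ERB.new(template, trim_mode: "-").result(binding)
puts xml
```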
Doing XML in HAML is easy; just start off your template with:
!!! XML
which produces
<?xml version='1.0' encoding='utf-8' ?>
Then, as @beanish said earlier, you "make up your own tags":
%test2 hello
%item{:name => "blah"}
to get
<test2>hello</test2>
<item name='blah'></item>
%test2 hello
%item{:name => "blah"}
run it through haml
haml hamltest.haml test.xml
open the file in a browser
<test2>hello</test2>
<item name='blah'></item>
The HAML reference talks about html tags and gives some examples.
This demonstrates some things that could be useful for XML documents:
!!! XML
%root{'xmlns:foo' => 'http://myns'}
  -# Note: :dashed-attr is invalid syntax
  %dashed-tag{'dashed-attr' => 'value'} Text
  %underscore_tag Text
  - ['apple', 'orange', 'pear'].each do |fruit|
    - haml_tag(fruit, "Yummy #{fruit.capitalize}!", 'fruit-code' => fruit.upcase)
  %foo:nstag{'foo:nsattr' => 'value'}
which produces:
<?xml version='1.0' encoding='utf-8' ?>
<root xmlns:foo='http://myns'>
  <dashed-tag dashed-attr='value'>Text</dashed-tag>
  <underscore_tag>Text</underscore_tag>
  <apple fruit-code='APPLE'>Yummy Apple!</apple>
  <orange fruit-code='ORANGE'>Yummy Orange!</orange>
  <pear fruit-code='PEAR'>Yummy Pear!</pear>
  <foo:nstag foo:nsattr='value'></foo:nstag>
</root>
Look at the Haml::Helpers link on the haml reference for more methods like haml_tag.
If you want to use double quotes for attributes, see: https://stackoverflow.com/a/967065/498594
Or, outside of Rails, use:
>> Haml::Engine.new("%tag{:name => 'value'}", :attr_wrapper => '"').to_html
=> "<tag name=\"value\"></tag>\n"
Haml can produce XML just as easily as HTML (I've used it for FBML and XHTML). What problems are you having?
I've not used HAML, but if you can't make it work, another option is Builder.
What about creating the XML header, e.g. <?xml version="1.0" encoding="UTF-8"?>
It should be possible. After all you can create plain old XML with Notepad.
