AXLSX_RAILS converting html to text

I'm saving Rails data into an Excel spreadsheet using the gem AXLSX_RAILS.
I have some text fields that are stored as HTML in the database.
This is my attempt to convert the HTML to text:
sheet.add_row ['REVENUE DESCRIPTION', strip_tags(@costproject.revenue).gsub!(" ", "")]
That works to remove the HTML tags.
But, I would like to replace with the Excel new line (code 10 - vbLf).
How can I do that?
I tried this:
sheet.add_row ['DESCRIPTION', strip_tags(@costproject.description).gsub!(" ", vbLf)]
Thanks for the help!

Try "\x0A". Something like:
sheet.add_row ['REVENUE DESCRIPTION', strip_tags(@costproject.revenue).gsub(" ", "\x0A")]
That should be the hex equivalent of vbLf. Note that this uses gsub rather than gsub!, because gsub! returns nil when nothing is replaced.
If you want both carriage return and line feed, use "\x0D\x0A".
Incidentally, I do not see any such constant defined within Axlsx.
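As a rough illustration of the idea, here is a plain-Ruby sketch that does not depend on Rails or Axlsx. The helper name html_to_cell_text is hypothetical, and the regex-based tag stripping is only a crude stand-in for strip_tags; it assumes simple markup where <br> and </p> mark the end of a line and &nbsp; is the only entity present.

```ruby
# Hypothetical helper: turn simple HTML into plain text with "\x0A" line breaks.
# The regex tag stripping below is a stand-in for Rails' strip_tags, not a replacement.
def html_to_cell_text(html)
  text = html.gsub(/<br\s*\/?>|<\/p>/i, "\x0A") # line-ending tags become a line feed
  text = text.gsub(/<[^>]+>/, "")               # drop every remaining tag
  text = text.gsub("&nbsp;", " ")               # decode the one entity handled here
  text.strip                                    # trim surrounding whitespace, including a final line feed
end

puts html_to_cell_text("<p>First line<br>Second&nbsp;line</p><p>Third line</p>")
```

The returned string could then be passed as a cell value to sheet.add_row in place of the strip_tags(...).gsub(...) expression.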

Related

Rails Roo gem .xlsx output contains the object not the output of method

I am using the Roo gem to output a spreadsheet from a Rails app. One of my columns is a hash (Postgres DB). I would like to format the cell contents into something more readable. I am using a method to return a human readable cell.
The column data looks like this:
Inspection.first.results
=> {"soiled"=>"oil on back",
"assigned_to"=>"Warehouse@firedatasolutions.com",
"contaminated"=>"blood on left cuff",
"inspection_date"=>"01/01/2017",
"physical_damage_seam_integrity"=>"",
"physical_damage_thermal_damage"=>"",
"physical_damage_reflective_trim"=>"",
"physical_damage_rips_tears_cuts"=>"small tear on right sleeve",
"correct_assembly_size_compatibility_of_shell_liner_and_drd"=>"",
"physical_damage_damaged_or_missing_hardware_or_closure_systems"=>""}
In my Inspections model I defined the following method:
def print_results
  self.results.each do |k,v|
    puts "#{k.titleize}:#{v.humanize}\r\n"
  end
end
So in the console I get this:
Inspection.first.print_results
Soiled:Oil on back
Assigned To:Warehouse
Contaminated:Blood on left cuff
Inspection Date:01/01/2017
Physical Damage Seam Integrity:
Physical Damage Thermal Damage:
Physical Damage Reflective Trim:
Physical Damage Rips Tears Cuts:Small tear on right sleeve
Correct Assembly Size Compatibility Of Shell Liner And Drd:
Physical Damage Damaged Or Missing Hardware Or Closure Systems:
=> {"soiled"=>"oil on back",
"assigned_to"=>"Warehouse",
"contaminated"=>"blood on left cuff",
"inspection_date"=>"01/01/2017",
"physical_damage_seam_integrity"=>"",
"physical_damage_thermal_damage"=>"",
"physical_damage_reflective_trim"=>"",
"physical_damage_rips_tears_cuts"=>"small tear on right sleeve",
"correct_assembly_size_compatibility_of_shell_liner_and_drd"=>"",
"physical_damage_damaged_or_missing_hardware_or_closure_systems"=>""}
But when I put this in the index.xlsx.axlsx file
wb = xlsx_package.workbook
wb.add_worksheet(name: "Inspections") do |sheet|
  sheet.add_row ['Serial Number', 'Category', 'Inspection Type', 'Date',
                 'Pass/Fail', 'Assigned To', 'Inspected By', 'Inspection Details']
  @inspections.each do |inspection|
    sheet.add_row [inspection.ppe.serial, inspection.ppe.category,
                   inspection.advanced? ? 'Advanced' : 'Routine',
                   inspection.results['inspection_date'],
                   inspection.passed? ? 'Pass' : 'Fail',
                   inspection.ppe.user.last_first_name,
                   inspection.user.last_first_name,
                   inspection.print_results]
  end
end
The output in the spreadsheet is the original hash, not the results of the print statement.
{"soiled"=>"oil on back",
"assigned_to"=>"Warehouse",
"contaminated"=>"blood on left cuff", "inspection_date"=>"01/01/2017",
"physical_damage_seam_integrity"=>"",
"physical_damage_thermal_damage"=>"",
"physical_damage_reflective_trim"=>"",
"physical_damage_rips_tears_cuts"=>"small tear on right sleeve",
"correct_assembly_size_compatibility_of_shell_liner_and_drd"=>"",
"physical_damage_damaged_or_missing_hardware_or_closure_systems"=>""}
Is it possible to get the output of the method into the cell rather than the hash object?

The problem is that your print_results method prints out what you want to stdout (that is, the console), but still returns the original hash. The return value of the method is all that matters to Roo.
What you want to do is rewrite print_results to return the formatted string:
def print_results
  self.results.map do |k,v|
    "#{k.titleize}:#{v.humanize}\r\n"
  end.join
end
This will return a string (note the use of .join to combine the array of strings returned by .map) that you can throw into Roo and get your desired output.
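To make the difference between printing and returning concrete, here is a small plain-Ruby sketch. The method names and the sample hash are illustrative stand-ins for the model method, and titleize/humanize are left out so that it runs without Rails.

```ruby
# Illustrative data; not the real model.
results = { "soiled" => "oil on back", "inspection_date" => "01/01/2017" }

# Prints each pair, but the method's value is the hash that `each` returns.
def print_results(results)
  results.each { |k, v| puts "#{k}:#{v}" }
end

# Builds and returns a single string, which is what a spreadsheet cell needs.
def format_results(results)
  results.map { |k, v| "#{k}:#{v}" }.join("\n")
end

puts print_results(results).class   # Hash
puts format_results(results).class  # String
```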

Rails Axlsx New Line in Cell

Is there a way that I can add a new line to a cell using the Axlsx gem in Rails?
Basically, I want to replicate what Excel does when you press Alt + Enter after entering a value, which adds text on a new line within the same cell. I tried
sheet.add_row ["Testing cell row 1" + \r\n + "Testing cell row 2"]
but that throws an error.

I recently had the same problem and I found a solution that works.
I used this to setup:
p = Axlsx::Package.new
p.use_shared_strings = true
And this code adds a wrap style that makes the \r line breaks work correctly:
wrap = p.workbook.styles.add_style alignment: {wrap_text: true}
sheet.add_row ["1\r2\r3"], style: wrap
Now the new line in cell works, and the output is:
1
2
3
Notes:
The new line in cell doesn't work (@Gary Pinkham)
The "\x0D\x0A" didn't work (@noel)

For a forced line feed use "\x0A" (breaks between paragraphs.)
If you want both carriage return and line feed, use "\x0D\x0A".
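For reference, the hex escapes are just another spelling of the usual ones, and they only work inside double-quoted strings. A bare \r\n outside the quotes, as in the question, is a syntax error. The helper name cell_text below is hypothetical, and this sketch says nothing about which variant a given Excel version renders as a line break.

```ruby
# Hypothetical helper: join the lines of one cell with a line feed ("\x0A").
def cell_text(*lines)
  lines.join("\x0A")
end

text = cell_text("Testing cell row 1", "Testing cell row 2")
puts text.inspect

puts "\x0A" == "\n"        # true: same single character
puts "\x0D\x0A" == "\r\n"  # true: carriage return plus line feed
puts '\n'.length           # 2: single quotes keep the backslash and the letter
```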

I couldn't comment on the "doesn't work in Mac Excel" comment, so I'm adding this as an answer: use package.use_shared_strings = true, which is needed for Mac Excel.

Generate a link_to on the fly if a URL is found inside the contents of a db text field?

I have an automated report tool (corp intranet) where the admins have a few text area boxes to enter some text for different parts of the email body.
What I'd like to do is parse the contents of the text area and wrap any hyperlinks found with link tags (so when the report goes out there are links instead of text urls).
Is there a simple way to do this without having to parse the text myself and add link tags around each match (from 'http:', 'https:', or 'ftp:' up to the first space after it)?
Thank You!
Ruby 1.8.7, Rails 2.3.5

Make a helper:
def make_urls(text)
  urls = %r{(?:https?|ftp|mailto)://\S+}i
  text.gsub(urls, '<a href="\0">\0</a>')
end
In the view, just call this function and you will get the expected output, like:
irb(main):001:0> string = 'here is a link: http://google.com'
=> "here is a link: http://google.com"
irb(main):002:0> urls = %r{(?:https?|ftp|mailto)://\S+}i
=> /(?:https?|ftp|mailto):\/\/\S+/i
irb(main):003:0> html = string.gsub urls, '<a href="\0">\0</a>'
=> "here is a link: <a href=\"http://google.com\">http://google.com</a>"

There are many ways to accomplish your goal. One way would be to use a regular expression (regex). If you have never heard of regex, the Wikipedia entry on regular expressions should bring you up to speed.
For example:
content_string = "Blah ablal blabla lbal blah blaha http://www.google.com/ adsf dasd dadf dfasdf dadf sdfasdf dadf dfaksjdf kjdfasdf http://www.apple.com/ blah blah blah."
content_string.split(/\s+/).find_all { |u| u =~ /^https?:/ }
Which will return: ["http://www.google.com/", "http://www.apple.com/"]
Now, for the second half of the problem, you will use the array returned above to replace the text links with hyperlinks.
links = ["http://www.google.com/", "http://www.apple.com/"]
links.each do |l|
  content_string.gsub!(l, "<a href='#{l}'>#{l}</a>")
end
content_string will now be updated to contain HTML hyperlinks for all http/https URLs.
As I mentioned earlier, there are numerous ways to tackle this problem - to find the URLs you could also do something like:
require 'uri'
URI.extract(content_string, ['http', 'https'])
I hope this helps you.
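The find and replace steps can also be combined into a single pass. This is only a sketch: the helper name linkify is made up, it escapes the URL itself but not the surrounding text, and it makes no attempt to trim trailing punctuation from a match.

```ruby
require 'cgi'

# Hypothetical helper: wrap each http/https/ftp URL in an anchor tag in one gsub pass.
def linkify(text)
  url_pattern = %r{(?:https?|ftp)://\S+}i
  text.gsub(url_pattern) do |url|
    safe = CGI.escapeHTML(url)          # escape &, <, >, and quotes inside the URL
    %(<a href="#{safe}">#{safe}</a>)
  end
end

puts linkify("here is a link: http://google.com")
```

Doing it in one pass avoids a pitfall of replacing each found link in turn: if one URL is a prefix of another, or the same URL appears twice, later replacements can land inside anchors that were already inserted.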

Remove empty paragraphs

I'm importing an RSS feed which has a series of empty paragraphs "<p> </p>".
I am using gsub however it's not stripping the elements from the document:
document.gsub(/<p>\s*<\/p>/,"") or gsub(/<p> <\/p>/,"")
Is there an alternative method or a mistake in the above?
The below appears to work?
gsub(/<p>.<\/p>/,"")

Here is a regex that works, shown with an example:
>> document = "<p>\n\n\n \n</p>aaa<p> </p>bbb"
=> "<p>\n\n\n \n</p>aaa<p> </p>bbb"
>> document.gsub(/<p>[\s$]*<\/p>/, '')
=> "aaabbb"

If the paragraph elements in your RSS feed use id and class attributes, try this:
gsub(/<p(\s+(class|id)=['"][A-Za-z0-9\s_-]+['"])*\s*>\s*<\/p>/, "")
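One possible explanation for why \s* failed in the question while . matched: Ruby's \s only covers ASCII whitespace, so a paragraph holding a non-breaking space (U+00A0) or a literal &nbsp; entity is not matched by it. Assuming that is what the feed contains, a sketch along these lines handles both cases; the helper name strip_empty_paragraphs is hypothetical.

```ruby
# Hypothetical helper: remove paragraphs containing only whitespace,
# non-breaking spaces (U+00A0), or literal &nbsp; entities.
def strip_empty_paragraphs(html)
  html.gsub(/<p\b[^>]*>(?:\s|\u00A0|&nbsp;)*<\/p>/i, "")
end

document = "<p>\u00A0</p>aaa<p> </p>bbb<p class=\"x\">&nbsp;</p><p>keep</p>"
puts strip_empty_paragraphs(document)
```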

Truncate Markdown?

I have a Rails site, where the content is written in markdown. I wish to display a snippet of each, with a "Read more.." link.
How do I go about this? Simply truncating the raw text will not work, for example..
>> "This is an [example](http://example.com)"[0..25]
=> "This is an [example](http:"
Ideally I want to allow the author to (optionally) insert a marker to specify what to use as the "snippet", if not it would take 250 words, and append "..." - for example..
This article is an example of something or other.
This segment will be used as the snippet on the index page.
^^^^^^^^^^^^^^^
This text will be visible once clicking the "Read more.." link
The marker could be thought of like an EOF marker (which can be ignored when displaying the full document)
I am using maruku for the Markdown processing (RedCloth is very biased towards Textile, BlueCloth is extremely buggy, and I wanted a native-Ruby parser which ruled out peg-markdown and RDiscount)
Alternatively (since the Markdown is translated to HTML anyway) truncating the HTML correctly would be an option - although it would be preferable to not markdown() the entire document, just to get the first few lines.
So, the options I can think of are (in order of preference)..
1. Add a "truncate" option to the maruku parser, which will only parse the first x words, or till the "excerpt" marker.
2. Write/find a parser-agnostic Markdown truncate'r
3. Write/find an intelligent HTML truncating function

Write/find an intelligent HTML truncating function
The following from http://mikeburnscoder.wordpress.com/2006/11/11/truncating-html-in-ruby/, with some modifications will correctly truncate HTML, and easily allow appending a string before the closing tags.
>> puts "<p><b>Something</p>".truncate_html(5, at_end = "...")
=> <p><b>Someth...</b></p>
The modified code:
require 'rexml/parsers/pullparser'

class String
  # Truncate HTML to roughly len characters of text, closing any open tags.
  def truncate_html(len = 30, at_end = nil)
    p = REXML::Parsers::PullParser.new(self)
    tags = []
    new_len = len
    results = ''
    while p.has_next? && new_len > 0
      p_e = p.pull
      case p_e.event_type
      when :start_element
        tags.push p_e[0]
        results << "<#{tags.last}#{attrs_to_s(p_e[1])}>"
      when :end_element
        results << "</#{tags.pop}>"
      when :text
        results << p_e[0][0..new_len]
        new_len -= p_e[0].length
      else
        results << "<!-- #{p_e.inspect} -->"
      end
    end
    # Append the caller's suffix (e.g. "...") before closing the open tags.
    results << at_end if at_end
    tags.reverse.each do |tag|
      results << "</#{tag}>"
    end
    results
  end

  private

  # Render an attribute hash back into ' name="value"' pairs.
  def attrs_to_s(attrs)
    if attrs.empty?
      ''
    else
      ' ' + attrs.to_a.map { |attr| %{#{attr[0]}="#{attr[1]}"} }.join(' ')
    end
  end
end

Here's a solution that works for me with Textile:
1. Convert it to HTML.
2. Truncate it.
3. Remove any HTML tag that got cut in half:
html_string = html_string.gsub(/<[^>]*$/, "")
4. Use Hpricot to clean it up and close any unclosed tags:
html_string = Hpricot( html_string ).to_s
I do this in a helper, and with caching there's no performance issue.

You could use a regular expression to find a line consisting of nothing but "^" characters:
markdown_string = <<-eos
This article is an example of something or other.
This segment will be used as the snippet on the index page.
^^^^^^^^^^^^^^^
This text will be visible once clicking the "Read more.." link
eos
preview = markdown_string[0...(markdown_string =~ /^\^+$/)]
puts preview
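Building on that, the fallback described in the question (250 words plus "..." when the author adds no marker) could be sketched as follows. The method name snippet and the max_words parameter are made up for illustration, and the fallback joins words with single spaces, so it flattens line breaks and does not protect Markdown syntax from being cut.

```ruby
# Hypothetical helper: text before a line made only of "^" characters,
# or the first max_words words plus "..." when no marker is present.
def snippet(markdown, max_words: 250)
  marker_at = markdown =~ /^\^+$/
  return markdown[0...marker_at].rstrip if marker_at

  words = markdown.split
  return markdown if words.length <= max_words

  words.first(max_words).join(" ") + "..."
end

puts snippet("Intro line.\n^^^^\nRest of the text.")
puts snippet("one two three four", max_words: 2)
```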

Rather than trying to truncate the text, why not have 2 input boxes, one for the "opening blurb" and one for the main "guts"? That way your authors will know exactly what is being shown when, without having to rely on some sort of funky EOF marker.

I will have to agree with the "two inputs" approach. The content writer need not worry, since you can modify the background logic to combine the two inputs into one when showing the full content.
full_content = input1 + input2 # perhaps with some complementary HTML, for better formatting

Not sure if it applies to this case, but adding the solution below for the sake of completeness. You can use strip_tags method if you are truncating Markdown-rendered contents:
truncate(strip_tags(markdown(article.contents)), length: 50)
Sourced from:
http://devblog.boonecommunitynetwork.com/rails-and-markdown/

A simpler option that just works:
truncate(markdown(item.description), length: 100, escape: false)
