Generate a file list based on an array - ruby-on-rails

I tried a few things, but this week I feel like my brain is on holiday and I need to get this done, so I hope someone can help me.
I need to create a file list based on an array which is saved in a database. The array looks like this:
['file1', 'dir1/file2', 'dir1/subdir1/file3']
Output should be like this:
file1
dir1
  file2
  subdir1
    file3
in HTML, preferably like this (so I can extend it with JS to fold and multiselect):
<ul>
  <li>file1</li>
  <li>dir1</li>
  <ul>
    <li>file2</li>
    <li>subdir1</li>
    <ul>
      <li>file3</li>
    </ul>
  </ul>
</ul>
I'm using Ruby on Rails and trying to achieve this in an RJS template, but that doesn't really matter. Detailed pseudo-code would help too.
Does anyone know how to solve this?
Edit
Thanks to everyone for these solutions. The listing works, and I extended it into a foldable solution to show/hide directory contents. I still have one problem: the code needs to have complete file paths in checkboxes behind the entries, for a synchronisation. Based on sris' solution, I can only read the current file and its subs, but not the whole path from the root. For a better understanding:
Currently:
[x] dir1
  [x] dir2
    [x] file1
gives me
a checkbox with the same value as the text displays, e.g. "file1" for [x] file1. But what I need is the full path, e.g. "dir1/dir2/file1" for [x] file1.
Does someone have another hint how to add this?

Here's a quick implementation you can use for inspiration. This implementation disregards the order of files in the input Array.
I've updated the solution to save the entire path as you required.
dirs = ['file1', 'dir1/file2', 'dir1/subdir1/file3', 'dir1/subdir1/file5']
tree = {}
# Build a nested Hash keyed by the full path of each entry
dirs.each do |path|
  current = tree
  path.split("/").inject("") do |sub_path,dir|
    sub_path = File.join(sub_path, dir)
    current[sub_path] ||= {}
    current = current[sub_path]
    sub_path
  end
end
# Print the tree as nested lists: [full path] basename
def print_tree(prefix, node)
  puts "#{prefix}<ul>"
  node.each_pair do |path, subtree|
    puts "#{prefix}  <li>[#{path[1..-1]}] #{File.basename(path)}</li>"
    print_tree(prefix + "  ", subtree) unless subtree.empty?
  end
  puts "#{prefix}</ul>"
end
print_tree "", tree
This code will produce properly indented HTML like your example. But since Hashes in Ruby 1.8.6 aren't ordered, the order of the files can't be guaranteed.
The output produced will look like this:
<ul>
  <li>[dir1] dir1</li>
  <ul>
    <li>[dir1/subdir1] subdir1</li>
    <ul>
      <li>[dir1/subdir1/file3] file3</li>
      <li>[dir1/subdir1/file5] file5</li>
    </ul>
    <li>[dir1/file2] file2</li>
  </ul>
  <li>[file1] file1</li>
</ul>
I hope this serves as an example of how you can get both the path and the filename.
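For the checkbox follow-up, here is a minimal sketch of one way to carry the full path down the recursion instead of storing it in the hash keys. The function names, the `paths[]` field name and the choice to nest each sub-list inside its `<li>` are my own assumptions, not part of the code above.

```ruby
require 'erb'

# Build a nested Hash keyed by path component: {"dir1" => {"file2" => {}}}
def build_tree(paths)
  paths.each_with_object({}) do |path, tree|
    path.split('/').inject(tree) { |node, part| node[part] ||= {} }
  end
end

# Render nested lists; every checkbox value is the full path from the root.
# "paths[]" is a hypothetical form field name.
def render_tree(node, parent = nil, indent = '')
  return '' if node.empty?
  items = node.sort.map do |name, children|
    full = parent ? "#{parent}/#{name}" : name
    label = %(<input type="checkbox" name="paths[]" value="#{ERB::Util.html_escape(full)}"> #{ERB::Util.html_escape(name)})
    if children.empty?
      "#{indent}  <li>#{label}</li>\n"
    else
      "#{indent}  <li>#{label}\n#{render_tree(children, full, indent + '    ')}#{indent}  </li>\n"
    end
  end
  "#{indent}<ul>\n#{items.join}#{indent}</ul>\n"
end

tree = build_tree(['file1', 'dir1/file2', 'dir1/subdir1/file3'])
puts render_tree(tree)
```

Because the path is rebuilt while walking, the hash keys stay plain names, and sorting them gives a stable order even on Ruby versions where Hashes are unordered.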

Think tree.
# setup phase
for each pathname p in list
do
    add_path_to_tree(p)
od
walk tree depth first, emitting HTML

add_path_to_tree is recursive:
    given pathname p
    parse p into first_element, rest
    # that is, "foo/bar/baz" becomes "foo", "bar/baz"
    add first_element to tree (if not already present)
    if rest is not empty
        add_path_to_tree(rest), below first_element
I'll leave the optimal data structure for the tree (list of lists) as an exercise.
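As a rough Ruby rendering of that pseudocode, here is a sketch that uses nested Hashes rather than a list of lists (my choice, since Hash lookup makes "add if not already present" a one-liner). The explicit `tree` argument and the `walk` helper are assumptions added for illustration.

```ruby
# Recursively insert one pathname into a nested Hash.
def add_path_to_tree(tree, path)
  first, rest = path.split('/', 2)  # "foo/bar/baz" => "foo", "bar/baz"
  return tree if first.nil?         # empty path: nothing to add
  subtree = (tree[first] ||= {})
  add_path_to_tree(subtree, rest) if rest
  tree
end

# Depth-first walk, yielding each name with its nesting depth.
def walk(tree, depth = 0, &block)
  tree.each do |name, subtree|
    block.call(name, depth)
    walk(subtree, depth + 1, &block)
  end
end

tree = {}
['file1', 'dir1/file2', 'dir1/subdir1/file3'].each { |p| add_path_to_tree(tree, p) }
walk(tree) { |name, depth| puts "#{'  ' * depth}#{name}" }
```

Emitting HTML is then a matter of replacing the `puts` in the walk with the `<ul>`/`<li>` output you need.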

Expanding on sris's answer, if you really want everything sorted and the files listed before the directories, you can use something like this:
def files_first_traverse(prefix, node = {})
  puts "#{prefix}<ul>"
  node_list = node.sort
  # Files (entries with no children) first
  node_list.each do |base, subtree|
    puts "#{prefix}  <li>#{base}</li>" if subtree.empty?
  end
  # Then directories, recursing into each
  node_list.each do |base, subtree|
    next if subtree.empty?
    puts "#{prefix}  <li>#{base}</li>"
    files_first_traverse(prefix + '  ', subtree)
  end
  puts "#{prefix}</ul>"
end

Related

Multiple `gsub` in multiple `each` loops getting overridden one by another

Trying to iterate over some phrases, and whenever I find a word, I need to replace it with a link.
phrases = ["hello world", "worldwide"]
words_to_link = ["world", "world"]
I am trying to get:
"hello <a href='world'>world</a><br />worldwide"
My code is:
phrases.each do |ph|
  words_to_link.each do |w|
    ph.gsub!(w, "<a href='#{w}'>#{w}</a>")
  end
end.join("<br />").html_safe
The output of this is:
"hello <a href='<a href='world'>world</a>'><a href='world'>world</a></a><br /><a href='<a href='world'>world</a>'><a href='world'>world</a></a>wide"
On the first run it finds all occurrences of world, but on the second, it goes inside the generated world and gsubs again.
Another problem is finding the proper regex to match only whole words. I thought it would be /\b(word)\b/, but that didn't work.
Any pointers?
I'm a little confused by your question, so may have got the wrong end of the stick here. However, here is an answer by my interpretation:
phrases = ["hello world", "worldwide"]
substitutions = { /\bworld\b/ => "world" }
phrases.each do |ph|
  substitutions.each do |pattern, replacement|
    ph.gsub!(pattern, "<a href='#{replacement}'>#{replacement}</a>")
  end
end
phrases.join("<br />").html_safe
You can use \b in a regex to mark a word boundary, to avoid altering the "worldwide" string. And (I think this is what you wanted?) you can define some mapping between the search/replace terms rather than looping through twice, to avoid the double-replacement.
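If the word list comes in as an array, possibly with duplicates as in the question, one way to avoid re-matching already generated links is to build a single pattern and make one gsub pass per phrase. This is only a sketch; `link_words` is a name I made up, and `html_safe` is left out so it runs outside Rails.

```ruby
# Link every whole-word occurrence of any of the words, in one pass per phrase.
def link_words(phrases, words)
  # Regexp.union escapes each word; \b keeps "world" from matching "worldwide"
  pattern = /\b(?:#{Regexp.union(words.uniq).source})\b/
  phrases.map { |ph| ph.gsub(pattern) { |w| "<a href='#{w}'>#{w}</a>" } }
end

puts link_words(["hello world", "worldwide"], ["world", "world"]).join("<br />")
# prints: hello <a href='world'>world</a><br />worldwide
```

Since each phrase is scanned once and the originals are not mutated, the replacement text is never scanned again.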

Rails Microsoft Word, XML databinding, repeat rows

Those willing to jump straight to my questions can go to the paragraph "Please help with". You will find there my beginning of implementation, along with short XML samples
The story
The famous problem of inserting repeating content, like table rows, into a word template, using the rails framework.
I decided to implement a 'cleaner' solution for replacing some variables in a Word document with rails, using XML databinding. This solution works very well for non-repetitive content, but for repetitive content, a little extra dirty work must be done and I need help with it.
No C#, No Visual, just plain olde ruby on rails & XML
The databinded document
I have a Word document with some content controls, tagged with "human-readable" text, so my users know what should be inside.
I have used Word 2007 Content Control Toolkit to add some custom XML to a .docx file. Therefore in each .docx I have some customXml/itemsx.xml that contains my custom XML.
I have manually bound this XML to the text content controls I have in my Word template, using drag & drop in the Word 2007 Content Control Toolkit.
The replacing process with nokogiri
Basically I already have some code that replaces every XML node by the corresponding value from a hash. For example if I provide this hash to my function :
variables = {
"some_xml-node" => "some_value"
}
It will properly replace XML in customXml/itemsx.xml of .docx file :
<root> <some> <xml-node>some_value</xml-node></some> </root>
So this is taken care of !
The repetitive content
Now as I said, this works perfectly for non-repetitive content. For repetitive content (in my case I want to repeat some <w:tr> in a document), the solution I'd like to go with is:
1. Manually insert some tags in word/document.xml of the .docx file (this is dirty, but hell, I can't think of anything else) before every <tr> that needs to be duplicated
2. In Rails, parse the XML and locate the <tr> that needs duplicating using Nokogiri
3. Copy the <tr> as many times as I need
4. Look at some text inside this <tr> and find the databinding (which looks like <w:dataBinding w:xpath="/root[1]/movies[1]/movie[1]/name[1]">)
5. Replace movie[1] by movie[index]
6. Repeat for every table that needs <tr> duplication
With this solution I ensure 100% compatibility with my existing system! It's a kind of preprocessing...
Please help with
1. Finding an XML comment containing a custom string, and selecting the node just below it (using Nokogiri)
2. Changing attributes in many sub-nodes of the node found in 1.
XML/Hash samples that could be used (my beginning of implementation after that):
Sample of .docx word/document.xml
<w:document>
  <!-- My_Custom_Tag_ID -->
  <w:tr someparam="something">
    <w:td></w:td>
    <w:td><w:sthelse></w:sthelse><w:dataBinding w:xpath="/root[1]/movies[1]/movie[1]/name[1]"/><w:sth>Value</w:sth></w:td>
    <w:td></w:td>
  </w:tr>
</w:document>
Sample of input parameter repeat_tag hash
repeat_tags_sample = [
  {
    "tag" => "My_Custom_Tag_ID",
    "repeatable-content" => "movie"
  },
  {
    "tag" => "My_Custom_Tag_ID_2",
    "repeatable-content" => "cartoons"
  }
]
Sample of input parameter contents hash
contents_sample =
{
  "movies" => [{ "name" => "X-Men",
                 "year" => 1998,
                 "property-xxx" => 42
               }, { "name" => "X-Men-4",
                 "year" => 2007,
                 "property-xxx" => 42
               }],
  "cartoons" => [{ "name" => "Tom_Jerry",
                   "year" => 1995,
                   "property-yyy" => "cat"
                 }, { "name" => "Random_name",
                   "year" => 2008,
                   "property-yyy" => 42
                 }]
}
My beginning of implementation :
def dynamic_table_content(zip, repeat_tags, contents)
  doc = zip.find_entry("word/document.xml")
  xml = Nokogiri::XML.parse(doc.get_input_stream)
  # repeat_tags_sample = [ {
  #   "tag" => "My_Custom_Tag_ID",
  #   "repeatable-content" => "movie"},
  #   ...]
  repeat_tags.each do |rpt|
    content = contents[rpt["repeatable-content"]]
    # content now looks like [
    #   {"name" => "X-Men",
    #    "year" => 1998,
    #    "property-xxx" => 42, ...},
    #   ...]
    content_name = rpt["repeatable-content"].to_s
    # the 'movie' of '/root[1]/movies[1]/movie[1]/name[1]' (see below)
    puts "Processing #{rpt['tag']}, adding #{content_name}s"
    # Word document.xml sample code looks like this :
    # <!-- My_Custom_Tag_ID_inserted_manually -->
    # <w:tr ...>
    #   ...
    #   <w:dataBinding w:xpath="/root[1]/movies[1]/movie[1]/name[1]">
    #   ...
    # </w:tr>
    # QUESTION 1: find a comment containing a custom string, and select the node just below
    # Find starting <w:tr> tag located after <!-- rpt['tag'] -->
    base_tr_node = nil # TODO: find the node after the comment
    # Duplicate it as many times as we want.
    content.each_with_index do |item, index|
      puts "Adding #{content_name} : #{item}"
      # add a copy; a node cannot be added as a sibling of itself
      new_tr_node = base_tr_node.add_next_sibling(base_tr_node.dup)
      # inside this new node there are many
      # <w:dataBinding w:xpath="/root[1]/movies[1]/movie[1]/name[1]">
      # <w:dataBinding w:xpath="/root[1]/movies[1]/movie[1]/year[1]">
      # ..../movie[1]/property-xxx[1]
      # GOAL : replace every movie[1] by movie[index]
      # QUESTION 2: change attributes in many sub-nodes of the node found in 1.
      # TODO: new_tr_node.change_attributes (see GOAL in previous comments)
      # Maybe, it would be something like
      # new_tr_node.gsub("(#{content_name})\[([1-9]+)\]", "\1\[#{index}\]")
      # ... But new_tr_node is a nokogiri element so .gsub doesn't exist
    end
  end
  #replace["word/document.xml"] = xml.serialize :save_zip_with => 0
end
I have looked at the DoPE extension for Word documents. It looks great ! But alas I had already done a lot of work, and just now I (almost) finished building my own preprocessor.
What I needed was more complicated than what I originally asked. But nevertheless, the answers would be :
EDIT : fixed bad regex/xpath
# 1. Find a comment containing a custom string, and select the node just below
comment_nodes = doc.xpath("//comment()")
# Loop like comment_nodes.each do |comment|
base_tr_node = comment.next_sibling.next_sibling
# next_sibling has to be applied twice even though the comment is just above the <w:tr> node,
# most likely because the whitespace between them is a text node of its own
# 2. Change attributes in many sub-nodes of the node found in 1.
matches = base_tr_node.search(".//*[name()='w:dataBinding']")
matches.each do |databinding_node|
  # replace '.*phase[1].*' by '.*phase[index].*'
  # gsub returns a new string, so assign it back to the attribute
  databinding_node['w:xpath'] = databinding_node['w:xpath'].gsub("#{comment.text}\[1\]", "#{comment.text}\[#{index}\]")
end
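The index rewrite itself can be isolated as a plain string function, which makes it easy to check without a Word document at hand. A small sketch, where `reindex_xpath` is a hypothetical helper name and the sample row names come from the question; note that XPath positions are 1-based while each_with_index starts at 0, hence the `index + 1`.

```ruby
# Replace the positional index of one element in a databinding XPath.
# "movie" must not touch "movies", which the literal "[" after the name ensures.
def reindex_xpath(xpath, element, position)
  xpath.gsub(/\b#{Regexp.escape(element)}\[\d+\]/) { "#{element}[#{position}]" }
end

template = "/root[1]/movies[1]/movie[1]/name[1]"
%w[X-Men X-Men-4].each_with_index do |_name, index|
  puts reindex_xpath(template, "movie", index + 1)
end
# prints:
# /root[1]/movies[1]/movie[1]/name[1]
# /root[1]/movies[1]/movie[2]/name[1]
```

Inside the Nokogiri loop the result would then be assigned back to the node's w:xpath attribute.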

Generate a link_to on the fly if a URL is found inside the contents of a db text field?

I have an automated report tool (corp intranet) where the admins have a few text area boxes to enter some text for different parts of the email body.
What I'd like to do is parse the contents of the text area and wrap any hyperlinks found with link tags (so when the report goes out there are links instead of text urls).
Is there a simple way to do something like this without having to hand-write a parser that adds link tags around anything from a scheme (['http:', 'https:', 'ftp:']) to the first space after it?
Thank you!
Ruby 1.8.7, Rails 2.3.5
Make a helper:
def make_urls(text)
  urls = %r{(?:https?|ftp|mailto)://\S+}i
  html_text = text.gsub urls, '<a href="\0">\0</a>'
  html_text
end
In the view, just call this function and you will get the expected output. For example:
irb(main):001:0> string = 'here is a link: http://google.com'
=> "here is a link: http://google.com"
irb(main):002:0> urls = %r{(?:https?|ftp|mailto)://\S+}i
=> /(?:https?|ftp|mailto):\/\/\S+/i
irb(main):003:0> html = string.gsub urls, '<a href="\0">\0</a>'
=> "here is a link: <a href=\"http://google.com\">http://google.com</a>"
There are many ways to accomplish your goal. One way would be to use Regex. If you have never heard of regex, this wikipedia entry should bring you up to speed.
For example:
content_string = "Blah ablal blabla lbal blah blaha http://www.google.com/ adsf dasd dadf dfasdf dadf sdfasdf dadf dfaksjdf kjdfasdf http://www.apple.com/ blah blah blah."
content_string.split(/\s+/).find_all { |u| u =~ /^https?:/ }
Which will return: ["http://www.google.com/", "http://www.apple.com/"]
Now, for the second half of the problem, you will use the array returned above to substitute hyperlinks for the text links.
links = ["http://www.google.com/", "http://www.apple.com/"]
links.each do |l|
  content_string.gsub!(l, "<a href='#{l}'>#{l}</a>")
end
content_string will now be updated to contain HTML hyperlinks for all http/https URLs.
As I mentioned earlier, there are numerous ways to tackle this problem - to find the URLs you could also do something like:
require 'uri'
URI.extract(content_string, ['http', 'https'])
I hope this helps you.
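Since the text comes from a free-form text area and ends up in an HTML email, it may be worth escaping it before adding the links. A minimal sketch of that idea follows; `auto_link_urls` and the exact pattern are my own assumptions, and trailing punctuation such as a full stop directly after a URL would still be swallowed into the link.

```ruby
require 'erb'

# Scheme list taken from the question; stop at whitespace or "<"
URL_PATTERN = %r{\b(?:https?|ftp)://[^\s<]+}i

# Escape the admin's text first, then wrap each URL in an anchor tag.
def auto_link_urls(text)
  escaped = ERB::Util.html_escape(text)
  escaped.gsub(URL_PATTERN) { |url| %(<a href="#{url}">#{url}</a>) }
end

puts auto_link_urls('here is a link: http://google.com')
# prints: here is a link: <a href="http://google.com">http://google.com</a>
```

In a Rails view the result would still need to be marked as safe for output, which is only reasonable because the input was escaped first.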

Nokogiri/Ruby array question

I have a quick question. I am currently writing a Nokogiri/Ruby script and have the following code:
fullId = doc.xpath("/success/data/annotatorResultBean/annotations/annotationBean/concept/fullId")
fullId.each do |e|
  e = e.to_s()
  g.write(e + "\n")
end
This spits out the following text:
<fullId>D001792</fullId>
<fullId>D001792</fullId>
<fullId>D001792</fullId>
<fullId>D008715</fullId>
I wanted just the text in between the <fullId> tags saved, without the <fullId>, </fullId> markup. What am I missing?
Bobby
I think you want to use the text() accessor (which returns the child text values), rather than to_s() (which serializes the entire node, as you see here).
I'm not sure what the g object you're calling write on is, but the following code should give you an array containing all of the text in the fullId nodes:
doc.xpath(your_xpath).map {|e| e.text}

Truncate Markdown?

I have a Rails site, where the content is written in markdown. I wish to display a snippet of each, with a "Read more.." link.
How do I go about this? Simply truncating the raw text will not work, for example..
>> "This is an [example](http://example.com)"[0..25]
=> "This is an [example](http:"
Ideally I want to allow the author to (optionally) insert a marker to specify what to use as the "snippet", if not it would take 250 words, and append "..." - for example..
This article is an example of something or other.
This segment will be used as the snippet on the index page.
^^^^^^^^^^^^^^^
This text will be visible once clicking the "Read more.." link
The marker could be thought of like an EOF marker (which can be ignored when displaying the full document)
I am using maruku for the Markdown processing (RedCloth is very biased towards Textile, BlueCloth is extremely buggy, and I wanted a native-Ruby parser which ruled out peg-markdown and RDiscount)
Alternatively (since the Markdown is translated to HTML anyway) truncating the HTML correctly would be an option - although it would be preferable to not markdown() the entire document, just to get the first few lines.
So, the options I can think of are (in order of preference)..
Add a "truncate" option to the maruku parser, which will only parse the first x words, or till the "excerpt" marker.
Write/find a parser-agnostic Markdown truncate'r
Write/find an intelligent HTML truncating function
Write/find an intelligent HTML truncating function
The following from http://mikeburnscoder.wordpress.com/2006/11/11/truncating-html-in-ruby/, with some modifications will correctly truncate HTML, and easily allow appending a string before the closing tags.
>> puts "<p><b>Something</p>".truncate_html(5, at_end = "...")
=> <p><b>Someth...</b></p>
The modified code:
require 'rexml/parsers/pullparser'
class String
  def truncate_html(len = 30, at_end = nil)
    p = REXML::Parsers::PullParser.new(self)
    tags = []
    new_len = len
    results = ''
    while p.has_next? && new_len > 0
      p_e = p.pull
      case p_e.event_type
      when :start_element
        tags.push p_e[0]
        results << "<#{tags.last}#{attrs_to_s(p_e[1])}>"
      when :end_element
        results << "</#{tags.pop}>"
      when :text
        results << p_e[0][0..new_len]
        new_len -= p_e[0].length
      else
        results << "<!-- #{p_e.inspect} -->"
      end
    end
    # Append the caller's suffix (e.g. "...") before closing the open tags
    results << at_end if at_end
    tags.reverse.each do |tag|
      results << "</#{tag}>"
    end
    results
  end
  private
  def attrs_to_s(attrs)
    if attrs.empty?
      ''
    else
      ' ' + attrs.to_a.map { |attr| %{#{attr[0]}="#{attr[1]}"} }.join(' ')
    end
  end
end
Here's a solution that works for me with Textile.
Convert it to HTML
Truncate it.
Remove any HTML tags that got cut in half with
html_string.gsub(/<[^>]*$/, "")
Then use Hpricot to clean it up and close unclosed tags
html_string = Hpricot( html_string ).to_s
I do this in a helper, and with caching there's no performance issue.
You could use a regular expression to find a line consisting of nothing but "^" characters:
markdown_string = <<-eos
This article is an example of something or other.
This segment will be used as the snippet on the index page.
^^^^^^^^^^^^^^^
This text will be visible once clicking the "Read more.." link
eos
preview = markdown_string[0...(markdown_string =~ /^\^+$/)]
puts preview
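Building on that, here is a sketch that also covers the case where the author did not insert a marker, falling back to the first 250 words plus "..." as described in the question. The name `markdown_snippet` is hypothetical, and the word-based fallback is naive: it collapses line breaks and can still cut a link whose text contains spaces.

```ruby
# A line made only of "^" characters marks the end of the snippet
SNIPPET_MARKER = /^\^+\s*$/

def markdown_snippet(source, max_words = 250)
  marker_at = source =~ SNIPPET_MARKER
  # Marker present: everything before it is the snippet
  return source[0...marker_at].rstrip if marker_at
  words = source.split
  return source if words.length <= max_words
  words.first(max_words).join(' ') + '...'
end

text = "This segment is the snippet.\n^^^^^^^^^^^^^^^\nThis text is only in the full article.\n"
puts markdown_snippet(text)
# prints: This segment is the snippet.
```

When displaying the full document, the marker line can be dropped with `source.sub(SNIPPET_MARKER, '')` before it goes through the Markdown parser.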
Rather than trying to truncate the text, why not have two input boxes, one for the "opening blurb" and one for the main "guts"? That way your authors will know exactly what is being shown, and when, without having to rely on some sort of funky EOF marker.
I have to agree with the "two inputs" approach. The content writer need not worry, since you can modify the background logic to combine the two inputs into one when showing the full content.
full_content = input1 + input2 # perhaps with some complementary HTML, for better formatting
Not sure if it applies to this case, but adding the solution below for the sake of completeness. You can use strip_tags method if you are truncating Markdown-rendered contents:
truncate(strip_tags(markdown(article.contents)), length: 50)
Sourced from:
http://devblog.boonecommunitynetwork.com/rails-and-markdown/
A simpler option that just works:
truncate(markdown(item.description), length: 100, escape: false)
