Traversing directories and reading from files in ruby on rails


I'm having some trouble figuring out how to 1) traverse a directory and 2) take each .txt file and save its contents as a string. I'm obviously pretty new to both Ruby and Rails.
I know that I could open the file with f=File.open("/path/*.txt") and then output it with puts f.read, but I would rather save the contents as a string, not as a .txt file, and I don't know how to do this for each file.
Thanks!

You could use Dir.glob and iterate over the file names, reading each file into a string with IO.read. For example (note that the inject block has to return the accumulator):
file_names_with_contents = Dir.glob('/path/*.txt').inject({}) { |results, file_name| results[file_name] = IO.read(file_name); results }

You could probably also use tap here:
file_names_with_contents = {}.tap do |h|
Dir.glob('/path/*.txt').each{|file_name| h[file_name] = IO.read(file_name)}
end
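The same idea can be sketched a little more compactly with the block form of to_h. This assumes Ruby 2.6 or later, and the method name read_text_files is made up for illustration; it is not part of either answer above.

```ruby
# Hypothetical helper: maps each .txt path directly inside dir
# to the contents of that file as a string.
def read_text_files(dir)
  Dir.glob(File.join(dir, '*.txt')).sort.to_h do |file_name|
    [file_name, File.read(file_name)]
  end
end
```

Calling read_text_files('/path') should give a hash like the ones built above, keyed by full file path.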

The following is based on Python's os.walk function, which returns a list of tuples of the form (dirname, dirs, files). Since this is Ruby, you get a list of arrays of the form
[dirname, dirs, files]. This should be easier to process than trying to recursively walk the directory yourself. To run the code, you'll need to provide a demo_folder.
def walk(dir)
dir_list = []
def _walk(dir, dir_list)
fns = Dir.entries(dir)
dirs = []
files = []
dirname = File.expand_path(dir)
list_item = [dirname, dirs, files]
fns.each do |fn|
next if [".",".."].include? fn
path_fn = File.join(dirname, fn)
if File.directory? path_fn
dirs << fn
_walk(path_fn, dir_list)
else
files << fn
end
end
dir_list << list_item
end
_walk(dir, dir_list)
dir_list
end
if __FILE__ == $0
require 'json'
dir_list = walk('demo_folder')
puts JSON.pretty_generate(dir_list)
end
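If only a flat list of files is needed rather than the os.walk-style structure, Ruby's standard Find module is a possible alternative. This is a different technique from the walk method above, and the name files_under is hypothetical.

```ruby
require 'find'

# Hypothetical helper: returns the path of every regular file
# below dir, at any depth, in sorted order.
def files_under(dir)
  Find.find(dir).select { |path| File.file?(path) }.sort
end
```

Each returned path can then be passed to File.read to get the contents as a string.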

Jake's answer is good enough, but each_with_object will make it slightly shorter. I also made it recursive.
def read_dir dir
Dir.glob("#{dir}/*").each_with_object({}) do |f, h|
if File.file?(f)
h[f] = File.read(f) # File.read closes the file; open(f).read leaves the handle open
elsif File.directory?(f)
h[f] = read_dir(f)
end
end
end
When the directory is like:
--+ directory_a
+----file_b
+-+--directory_c
| +-----file_d
+----file_e
then
read_dir(directory_a)
will return:
{file_b => contents_of_file_b,
directory_c => {file_d => contents_of_file_d},
file_e => contents_of_file_e}

Related

extracting path within a string using ruby

I have written a ruby script where I iterate through folders, and search for file names ending with ".xyz" . Within these files I search then for lines which have the following structure:
<ClCompile Include="..\..\..\Projects\Project_A\Applications\Modules\Sources\myfile.c"/>
This works so far with the script:
def parse_xyz_files
files = Dir["./**/*.xyz"]
files.each do |file_name|
puts file_name
File.open(file_name) do |f|
f.each_line { |line|
if line =~ /<ClCompile Include=/
puts "Found #{line}"
end
}
end
end
end
Now I would like to extract only the string between double quotes, in this example:
..\..\..\Projects\Project_A\Applications\Modules\Sources\myfile.c
I'm trying to do it with something like this (with match method):
def parse_xyz_files
files = Dir["./**/*.xyz"]
files.each do |file_name|
puts file_name
File.open(file_name) do |f|
f.each_line { |line|
if line =~ /<ClCompile Include=/.match(/"([^"]*)"/)
puts "Found #{line}"
end
}
end
end
end
The regular expression is OK so far (checked with Rubular). Any idea how to do this in a simple way? I'm relatively new to Ruby.

You can use the String#scan method:
line = '<ClCompile Include="..\..\..\Projects\Project_A\Applications\Modules\Sources\myfile.c"/>'
path = line.scan(/".*"/).first
or, if your <ClCompile> tag can have other attributes:
path = line.scan(/Include="(.*)"/).first.first
But using an XML parser is definitely a much better idea.
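For the line-by-line script in the question, a small sketch using String#[] with a capture group may be enough. The helper name include_path is made up here, and the pattern assumes the Include attribute directly follows the tag name, as in the sample line.

```ruby
# Hypothetical helper: returns the value of the Include attribute
# from a ClCompile line, or nil when the line does not match.
def include_path(line)
  # [^"]* stops at the first closing quote, so later attributes are ignored.
  line[/<ClCompile\s+Include="([^"]*)"/, 1]
end

line = '<ClCompile Include="..\..\..\Projects\Project_A\Applications\Modules\Sources\myfile.c"/>'
puts include_path(line)
```

Inside the each_line block, the result can be tested for nil instead of using =~ and match together.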

Use Nokogiri to parse the XML and not regex.
require 'nokogiri'
xml = '<foo><bar><ClCompile Include="..\..\..\Projects\Project_A\Applications\Modules\Sources\myfile.c"/></bar></foo>'
document = Nokogiri::XML xml
document.xpath('//ClCompile/@Include').text

Fluentd record with source filename parts

I'm using fluentd on a server to export logs.
My configuration uses something like this to capture several log files:
<source>
type tail
path /my/path/to/file/*/*.log
</source>
The different files are tracked properly, however, I have one more feature needed:
The two wildcards parts of the path should be added to the record as well (let's call them directory and filename).
If the in_tail plugin would add the filename to the record, I could write a formatter to split and edit.
Is there anything I'm missing, or is rewriting in_tail to my heart's content the best way to go?

So, yes. Extending in_tail is the way to go.
I've written a new plugin that inherits from NewTailInput and uses a slightly different parse_singleline and parse_multilines to add the path to the record.
Much better than expected.
Update 6/3/2020:
I've dug up the code, this was the least Ruby I could muster to solve the problem.
Customize convert_line_to_event_with_path_names for your needs to add custom data to the records.
module Fluent
class DirParsingTailInput < NewTailInput
Plugin.register_input('dir_parsing_tail', self)
def initialize
super
end
def receive_lines(lines, tail_watcher)
es = @receive_handler.call(lines, tail_watcher)
unless es.empty?
tag = if @tag_prefix || @tag_suffix
@tag_prefix + tail_watcher.tag + @tag_suffix
else
@tag
end
begin
router.emit_stream(tag, es)
rescue
# ignore errors. Engine shows logs and backtraces.
end
end
end
def convert_line_to_event_with_path_names(line, es, path)
begin
directory = File.basename(File.dirname(path))
filename = File.basename(path, ".*")
line.chomp! # remove \n
@parser.parse(line) { |time, record|
if time && record
if directory != "logs"
record["parent"] = directory
record["child"] = filename
else
record["parent"] = filename
end
es.add(time, record)
else
log.warn "pattern not match: #{line.inspect}"
end
}
rescue => e
log.warn line.dump, :error => e.to_s
log.debug_backtrace(e.backtrace)
end
end
def parse_singleline(lines, tail_watcher)
es = MultiEventStream.new
lines.each { |line|
convert_line_to_event_with_path_names(line, es, tail_watcher.path)
}
es
end
def parse_multilines(lines, tail_watcher)
lb = tail_watcher.line_buffer
es = MultiEventStream.new
if @parser.has_firstline?
lines.each { |line|
if @parser.firstline?(line)
if lb
convert_line_to_event_with_path_names(lb, es, tail_watcher.path)
end
lb = line
else
if lb.nil?
log.warn "got incomplete line before first line from #{tail_watcher.path}: #{line.inspect}"
else
lb << line
end
end
}
else
lb ||= ''
lines.each do |line|
lb << line
@parser.parse(lb) { |time, record|
if time && record
convert_line_to_event_with_path_names(lb, es, tail_watcher.path)
lb = ''
end
}
end
end
tail_watcher.line_buffer = lb
es
end
end
end
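The path handling inside convert_line_to_event_with_path_names can be tried on its own, outside Fluentd. The sketch below only mirrors that part of the plugin; the method name path_fields is hypothetical and no Fluentd API is involved.

```ruby
# Hypothetical helper mirroring the plugin's path logic: derives the
# "parent" and "child" record fields from a tailed file's path.
def path_fields(path)
  directory = File.basename(File.dirname(path)) # last directory component
  filename  = File.basename(path, '.*')         # file name without extension
  if directory != 'logs'
    { 'parent' => directory, 'child' => filename }
  else
    { 'parent' => filename }
  end
end
```

For a path such as /my/path/to/file/app/server.log, this yields a parent of app and a child of server.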

How to edit docx with nokogiri and rubyzip

I'm using a combination of rubyzip and nokogiri to edit a .docx file. I'm using rubyzip to unzip the .docx file and then nokogiri to parse and change the body of the word/document.xml file, but every time I close rubyzip at the end it corrupts the file, and I can't open or repair it. When I unzip the .docx file on the desktop and check word/document.xml, the content is updated to what I changed it to, but all the other files are messed up. Could someone help me with this issue? Here is my code:
require 'rubygems'
require 'zip/zip'
require 'nokogiri'
zip = Zip::ZipFile.open("test.docx")
doc = zip.find_entry("word/document.xml")
xml = Nokogiri::XML.parse(doc.get_input_stream)
wt = xml.root.xpath("//w:t", {"w" => "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}).first
wt.content = "New Text"
zip.get_output_stream("word/document.xml") {|f| f << xml.to_s}
zip.close

I ran into the same corruption problem with rubyzip last night. I solved it by copying everything to a new zip file, replacing files as necessary.
Here's my working proof of concept:
#!/usr/bin/env ruby
require 'rubygems'
require 'zip/zip' # rubyzip gem
require 'nokogiri'
class WordXmlFile
def self.open(path, &block)
self.new(path, &block)
end
def initialize(path, &block)
@replace = {}
if block_given?
@zip = Zip::ZipFile.open(path)
yield(self)
@zip.close
else
@zip = Zip::ZipFile.open(path)
end
end
def merge(rec)
xml = @zip.read("word/document.xml")
doc = Nokogiri::XML(xml) {|x| x.noent}
(doc/"//w:fldSimple").each do |field|
if field.attributes['instr'].value =~ /MERGEFIELD (\S+)/
text_node = (field/".//w:t").first
if text_node
text_node.inner_html = rec[$1].to_s
else
puts "No text node for #{$1}"
end
end
end
@replace["word/document.xml"] = doc.serialize :save_with => 0
end
def save(path)
Zip::ZipFile.open(path, Zip::ZipFile::CREATE) do |out|
@zip.each do |entry|
out.get_output_stream(entry.name) do |o|
if @replace[entry.name]
o.write(@replace[entry.name])
else
o.write(@zip.read(entry.name))
end
end
end
end
@zip.close
end
end
if __FILE__ == $0
file = ARGV[0]
out_file = ARGV[1] || file.sub(/\.docx/, ' Merged.docx')
w = WordXmlFile.open(file)
# w.force_settings # not defined in the class shown above, so it would raise NoMethodError
w.merge('First_Name' => 'Eric', 'Last_Name' => 'Mason')
w.save(out_file)
end

I stumbled across this post and know nothing about Ruby or Nokogiri, but it looks like you are re-zipping the new content incorrectly.
I don't know about rubyzip, but you need a way to tell it to update the entry word/document.xml and then re-save/re-zip the file.
It looks like you are just overwriting the entry with new data, which of course is going to be a different size and will corrupt the rest of the zip file.
I give an example for Excel in this post, "Parse text file and create an excel report", which may be of use even though I am using a different zip library and VB (I'm still doing exactly what you are trying to do; my code is about halfway down).
Here is the part that applies:
Using z As ZipFile = ZipFile.Read(xlStream.BaseStream)
'Grab Sheet 1 out of the file parts and read it into a string.
Dim myEntry As ZipEntry = z("xl/worksheets/sheet1.xml")
Dim msSheet1 As New MemoryStream
myEntry.Extract(msSheet1)
msSheet1.Position = 0
Dim sr As New StreamReader(msSheet1)
Dim strXMLData As String = sr.ReadToEnd
'Grab the data in the empty sheet and swap out the data that I want
Dim str2 As XElement = CreateSheetData(tbl)
Dim strReplace As String = strXMLData.Replace("<sheetData/>", str2.ToString)
z.UpdateEntry("xl/worksheets/sheet1.xml", strReplace)
'This just rezips the file with the new data it doesnt save to disk
z.Save(fiRet.FullName)
End Using

According to the official GitHub documentation, you should use write_buffer instead of open. There's also a code example at the link.

Rails, use the content of file in the controller

I've a file in the config directory, let's say my_policy.txt.
I want to use the content of that file in my controller like a simple string.
@policy = # content of /config/my_policy.txt
How can I achieve that? Does Rails provide its own way to do it?
Thanks

Rails doesn't provide a way, but Ruby does:
@policy = IO.read("#{Rails.root}/config/my_policy.txt")

@policy = File.read(RAILS_ROOT + '/config/my_policy.txt')
To also cache the content (if you don't want to read it every time the variable is used):
def policy
@@policy ||= File.read(RAILS_ROOT + '/config/my_policy.txt')
end
If you need something more elegant for configuration, check configatronic.
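The caching idea can be tried outside Rails with a plain Ruby sketch. The class PolicyReader and its root argument are hypothetical; root stands in for the application root (Rails.root in current Rails versions), and an instance variable is used for the cache here instead of a class variable.

```ruby
require 'pathname'

# Hypothetical stand-in for a controller: root plays the role of the app root.
class PolicyReader
  def initialize(root)
    @root = Pathname.new(root)
  end

  # Reads config/my_policy.txt on the first call and reuses the string afterwards.
  def policy
    @policy ||= @root.join('config', 'my_policy.txt').read
  end
end
```

Because of ||=, later edits to the file are not picked up until a new PolicyReader is created.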

You could read the file into a string like this:
def get_file_as_string(filename)
data = ''
# The block form closes the file automatically.
File.open(filename, "r") do |f|
f.each_line do |line|
data += line
end
end
data
end
##### MAIN #####
@policy = get_file_as_string 'path/to/my_policy.txt'
# print out the string
puts @policy

code refactoring using pathname and writing to a file

I am using Ruby on Rails. The code below works. However, I was wondering if it could be written better.
# Usage: write 'hello world' to tmp/hello.txt file
# Util.write_to_file('hello world', 'a+', 'tmp', 'hello.txt')
def self.write_to_file(data, mode, *args)
input = args
filename = input.pop
dir = Rails.root.join(*input).cleanpath.to_s
FileUtils.mkdir_p(dir)
file = File.join(dir, filename)
File.open(file, mode) {|f| f.puts(data) }
end

How often are you going to be changing the mode? If not very often, I'd put it directly in the method, and do the rest like so:
def self.write_to_file(data, *args)
file = Rails.root.join(*args)
FileUtils.mkdir_p(file.dirname)
File.open(file, "a+") { |f| f.puts(data) }
end

You can just leverage the existing API instead of doing all the dirty work yourself, which cleans things up a lot:
def self.write_to_file(data, mode, *path)
path = File.expand_path(File.join(path.flatten), Rails.root)
FileUtils.mkdir_p(File.dirname(path))
File.open(path, mode) do |fh|
fh.print(data)
end
end
There are a few things to note here.
File.expand_path will resolve any "../" parts to the path.
File.dirname is great at determining the directory of an arbitrary file path.
Use File#print to write data to a file. File#puts appends a linefeed.
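The three notes above can be checked with a plain Ruby sketch that runs without Rails. The extra root parameter is an assumption that stands in for Rails.root; everything else follows the method above.

```ruby
require 'fileutils'

# Hypothetical plain-Ruby variant: root stands in for Rails.root.
def write_to_file(root, data, mode, *parts)
  # expand_path resolves "../" segments relative to root.
  path = File.expand_path(File.join(*parts), root)
  # dirname gives the directory that has to exist before writing.
  FileUtils.mkdir_p(File.dirname(path))
  # print writes data as-is; puts would append a linefeed.
  File.open(path, mode) { |fh| fh.print(data) }
  path
end
```

For example, write_to_file(root, 'hello world', 'a+', 'tmp', 'extra', '..', 'hello.txt') should write to tmp/hello.txt under root, with no trailing newline in the file.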
