I'm having some trouble figuring out how to 1) traverse a directory and 2) taking each file (.txt) and saving it as a string. I'm obviously pretty new to both ruby and rails.
I know that I could save the file with f=File.open("/path/*.txt") and then output it with puts f.read but I would rather save it as a string, not .txt, and dont know how to do this for each file.
Thanks!
You could use Dir.glob and map over the filenames to read each filename into a string using IO.read. This is some pseudo code:
file_names_with_contents = Dir.glob('/path/*.txt').inject({}){|results, file_name| result[file_name] = IO.read(file_name)}
You could prob also use tap here:
file_names_with_contents = {}.tap do |h|
Dir.glob('/path/*.txt').each{|file_name| h[file_name] = IO.read(file_name)}
end
The following based on python os.walk function, which returns a list of tuples with: (dirname, dirs, files ). Since this is ruby, you get a list of arrays with:
[dirname, dirs, files]. This should be easier to process than trying to recursively walk the directory yourself. To run the code, you'll need to provide a demo_folder.
def walk(dir)
dir_list = []
def _walk(dir, dir_list)
fns = Dir.entries(dir)
dirs = []
files = []
dirname = File.expand_path(dir)
list_item = [dirname, dirs, files]
fns.each do |fn|
next if [".",".."].include? fn
path_fn = File.join(dirname, fn)
if File.directory? path_fn
dirs << fn
_walk(path_fn, dir_list)
else
files << fn
end
end
dir_list << list_item
end
_walk(dir, dir_list)
dir_list
end
if __FILE__ == $0
require 'json'
dir_list = walk('demo_folder')
puts JSON.pretty_generate(dir_list)
end
Jake's answer is good enough, but each_with_object will make it slightly shorter. I also made it recursive.
def read_dir dir
Dir.glob("#{dir}/*").each_with_object({}) do |f, h|
if File.file?(f)
h[f] = open(f).read
elsif File.directory?(f)
h[f] = read_dir(f)
end
end
end
When the directory is like:
--+ directory_a
+----file_b
+-+--directory_c
| +-----file_d
+----file_e
then
read_dir(directory_a)
willl return:
{file_b => contents_of_file_b,
directory_c => {file_d => contents_of_file_d},
file_e => contents_of_file_e}
Related
I have written a ruby script where I iterate through folders, and search for file names ending with ".xyz" . Within these files I search then for lines which have the following structure:
<ClCompile Include="..\..\..\Projects\Project_A\Applications\Modules\Sources\myfile.c"/>
This works so far with the script:
def parse_xyz_files
files = Dir["./**/*.xyz"]
files.each do |file_name|
puts file_name
File.open(file_name) do |f|
f.each_line { |line|
if line =~ /<ClCompile Include=/
puts "Found #{line}"
end
}
end
end
end
Now I would like to extract only the string between double quotes, in this example:
..\..\..\Projects\Project_A\Applications\Modules\Sources\myfile.c
I'm trying to do it with something like this (with match method):
def parse_xyz_files
files = Dir["./**/*.xyz"]
files.each do |file_name|
puts file_name
File.open(file_name) do |f|
f.each_line { |line|
if line =~ /<ClCompile Include=/.match(/"([^"]*)"/)
puts "Found #{line}"
end
}
end
end
end
The regular expression is so far ok (checked with rubular). Any idea how to do it in a simple way? I'm relative new to ruby.
You can use the String#scan method:
line = '<ClCompile Include="..\..\..\Projects\Project_A\Applications\Modules\Sources\myfile.c"/>'
path = line.scan(/".*"/).first
or in the case if your <CICompile> tag can have some other attributes then:
path = line.scan(/Include="(.*)"/).first.first
But using an XML parser is definitely a much better idea.
Use Nokogiri to parse the XML and not regex.
require 'nokogiri'
xml = '<foo><bar><ClCompile Include="..\..\..\Projects\Project_A\Applications\Modules\Sources\myfile.c"/></bar></foo>'
document = Nokogiri::XML xml
d.xpath('//ClCompile/#Include').text
I'm using fluentd on a server to export logs.
My configuration uses something like this to capture several log files:
<source>
type tail
path /my/path/to/file/*/*.log
</source>
The different files are tracked properly, however, I have one more feature needed:
The two wildcards parts of the path should be added to the record as well (let's call them directory and filename).
If the in_tail plugin would add the filename to the record, I could write a formatter to split and edit.
Anything that I'm missing or rewriting in_tail to my heart wishes is the best way to go?
So, yes. Extending in_tail is the way to go.
I've written a new plugin that inherits from NewTailInput and uses a slightly different parse_singleline and parse_multilines to add the path to the record.
Much better than expected.
Update 6/3/2020:
I've dug up the code, this was the least Ruby I could muster to solve the problem.
Customize convert_line_to_event_with_path_names for your needs to add custom data to the records.
module Fluent
class DirParsingTailInput < NewTailInput
Plugin.register_input('dir_parsing_tail', self)
def initialize
super
end
def receive_lines(lines, tail_watcher)
es = #receive_handler.call(lines, tail_watcher)
unless es.empty?
tag = if #tag_prefix || #tag_suffix
#tag_prefix + tail_watcher.tag + #tag_suffix
else
#tag
end
begin
router.emit_stream(tag, es)
rescue
# ignore errors. Engine shows logs and backtraces.
end
end
end
def convert_line_to_event_with_path_names(line, es, path)
begin
directory = File.basename(File.dirname(path))
filename = File.basename(path, ".*")
line.chomp! # remove \n
#parser.parse(line) { |time, record|
if time && record
if directory != "logs"
record["parent"] = directory
record["child"] = filename
else
record["parent"] = filename
end
es.add(time, record)
else
log.warn "pattern not match: #{line.inspect}"
end
}
rescue => e
log.warn line.dump, :error => e.to_s
log.debug_backtrace(e.backtrace)
end
end
def parse_singleline(lines, tail_watcher)
es = MultiEventStream.new
lines.each { |line|
convert_line_to_event_with_path_names(line, es, tail_watcher.path)
}
es
end
def parse_multilines(lines, tail_watcher)
lb = tail_watcher.line_buffer
es = MultiEventStream.new
if #parser.has_firstline?
lines.each { |line|
if #parser.firstline?(line)
if lb
convert_line_to_event_with_path_names(lb, es, tail_watcher.path)
end
lb = line
else
if lb.nil?
log.warn "got incomplete line before first line from #{tail_watcher.path}: #{line.inspect}"
else
lb << line
end
end
}
else
lb ||= ''
lines.each do |line|
lb << line
#parser.parse(lb) { |time, record|
if time && record
convert_line_to_event_with_path_names(lb, es, tail_watcher.path)
lb = ''
end
}
end
end
tail_watcher.line_buffer = lb
es
end
end
end
I'm using a combination of rubyzip and nokogiri to edit a .docx file. I'm using rubyzip to unzip the .docx file and then using nokogiri to parse and change the body of the word/document.xml file but ever time I close rubyzip at the end it corrupts the file and I can't open it or repair it. I unzip the .docx file on desktop and check the word/document.xml file and the content is updated to what I changed it to but all the other files are messed up. Could someone help me with this issue? Here is my code:
require 'rubygems'
require 'zip/zip'
require 'nokogiri'
zip = Zip::ZipFile.open("test.docx")
doc = zip.find_entry("word/document.xml")
xml = Nokogiri::XML.parse(doc.get_input_stream)
wt = xml.root.xpath("//w:t", {"w" => "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}).first
wt.content = "New Text"
zip.get_output_stream("word/document.xml") {|f| f << xml.to_s}
zip.close
I ran into the same corruption problem with rubyzip last night. I solved it by copying everything to a new zip file, replacing files as necessary.
Here's my working proof of concept:
#!/usr/bin/env ruby
require 'rubygems'
require 'zip/zip' # rubyzip gem
require 'nokogiri'
class WordXmlFile
def self.open(path, &block)
self.new(path, &block)
end
def initialize(path, &block)
#replace = {}
if block_given?
#zip = Zip::ZipFile.open(path)
yield(self)
#zip.close
else
#zip = Zip::ZipFile.open(path)
end
end
def merge(rec)
xml = #zip.read("word/document.xml")
doc = Nokogiri::XML(xml) {|x| x.noent}
(doc/"//w:fldSimple").each do |field|
if field.attributes['instr'].value =~ /MERGEFIELD (\S+)/
text_node = (field/".//w:t").first
if text_node
text_node.inner_html = rec[$1].to_s
else
puts "No text node for #{$1}"
end
end
end
#replace["word/document.xml"] = doc.serialize :save_with => 0
end
def save(path)
Zip::ZipFile.open(path, Zip::ZipFile::CREATE) do |out|
#zip.each do |entry|
out.get_output_stream(entry.name) do |o|
if #replace[entry.name]
o.write(#replace[entry.name])
else
o.write(#zip.read(entry.name))
end
end
end
end
#zip.close
end
end
if __FILE__ == $0
file = ARGV[0]
out_file = ARGV[1] || file.sub(/\.docx/, ' Merged.docx')
w = WordXmlFile.open(file)
w.force_settings
w.merge('First_Name' => 'Eric', 'Last_Name' => 'Mason')
w.save(out_file)
end
I stumbled accross the post and know nothing about ruby or nokogiri but ...
It looks like you are reziping the new content incorrectly.
I don't know about rubyzip, but you need a way to tell it to update the entry word/document.xml
and then resave/rezip the file.
It looks like you are just overwriting the entry with new data wich of course is going to be a different size and totally screw up the rest of the zip file.
I give an example for excel in this post Parse text file and create an excel report
which may be of use even though i am using a different zip library and VB (Im still doing exactly what you are trying to do, my code is about half way down)
here is the part that applies
Using z As ZipFile = ZipFile.Read(xlStream.BaseStream)
'Grab Sheet 1 out of the file parts and read it into a string.
Dim myEntry As ZipEntry = z("xl/worksheets/sheet1.xml")
Dim msSheet1 As New MemoryStream
myEntry.Extract(msSheet1)
msSheet1.Position = 0
Dim sr As New StreamReader(msSheet1)
Dim strXMLData As String = sr.ReadToEnd
'Grab the data in the empty sheet and swap out the data that I want
Dim str2 As XElement = CreateSheetData(tbl)
Dim strReplace As String = strXMLData.Replace("<sheetData/>", str2.ToString)
z.UpdateEntry("xl/worksheets/sheet1.xml", strReplace)
'This just rezips the file with the new data it doesnt save to disk
z.Save(fiRet.FullName)
End Using
According to the official Github documentation, you should Use write_buffer instead open. There's also a code example at the link.
I've a file in the config directory, let's say my_policy.txt.
I want to use the content of that file in my controller like a simple string.
#policy = #content of /config/my_policy.txt
How to achieve that goal, does Rails provide its own way to do that?
Thanks
Rails doesn't provide a way, but Ruby does:
#policy = IO.read("#{Rails.root}\config\my_policy.txt")
#policy = File.read(RAILS_ROOT + '/config/my_policy.txt')
To also cache the content (if you don't want to read it every time the variable is used):
def policy
##policy ||= File.read(RAILS_ROOT + '/config/my_policy.txt')
end
If you need something more elegant for configuration, check configatronic.
Read file as string like that?!
def get_file_as_string(filename)
data = ''
f = File.open(filename, "r")
f.each_line do |line|
data += line
end
return data
end
##### MAIN #####
#policy = get_file_as_string 'path/to/my_policy.txt'
# print out the string
puts #policy
I am using ruby on rails. Below given code works. However I was wondering if it can be written better.
# Usage: write 'hello world' to tmp/hello.txt file
# Util.write_to_file('hello world', 'a+', 'tmp', 'hello.txt')
def self.write_to_file(data, mode, *args)
input = args
filename = input.pop
dir = Rails.root.join(*input).cleanpath.to_s
FileUtils.mkdir_p(dir)
file = File.join(dir, filename)
File.open(file, mode) {|f| f.puts(data) }
end
How often are you going to be changing the mode? If not very often, I'd put it directly in the method, and do the rest like so:
def self.write_to_file(data, *args)
file = Rails.root.join(*args)
FileUtils.mkdir_p(file.dirname)
File.open(file, "a+") { |f| f.puts(data) }
end
You can just leverage the existing API instead of having to do all the dirty work yourself which cleans things up a lot:
def self.write_to_file(data, mode, *path)
path = File.expand_path(File.join(path.flatten), Rails.root)
FileUtils.mkdir_p(File.dirname(path))
File.open(path, mode) do |fh|
fh.print(data)
end
end
There's a few things to note here.
File.expand_path will resolve any "../" parts to the path.
File.dirname is great at determining the directory of an arbitrary file path.
Use File#print to write data to a file. File#puts appends a linefeed.