How to split a text file into 2 files using number of lines? - ruby-on-rails

I have been having some trouble with file handling. I have a text file containing 1000 lines, and I want to split it into two files of 500 lines each.
I wrote the following code, but it splits the file by a fixed number of bytes rather than lines:
class Hello
  def chunker(f_in, out_pref, chunksize = 500)
    File.open(f_in, "r") do |fh_in|
      until fh_in.eof?
        ch_path = "/my_applications/#{out_pref}_#{"%05d" % (fh_in.pos / chunksize)}.txt"
        puts "choose path: "
        puts ch_path
        File.open(ch_path, "w") do |fh_out|
          fh_out << fh_in.read(chunksize)
          puts "FH out : "
          puts fh_out
        end
      end
    end
  end
end

f = Hello.new
f.chunker "/my_applications/hello.txt", "output_prefix"
I am able to split the parent file by byte count (read(chunksize) copies 500 bytes at a time), but I want it split by number of lines. How can I achieve that? Please help me.

One approach: count the lines to find the middle pivot, then write each line to the appropriate output file.
out1 = File.open('output_prefix1', 'w')
out2 = File.open('output_prefix2', 'w')

File.open('/my_applications/hello.txt') do |file|
  pivot = file.each_line.count / 2   # IO#lines is gone in modern Ruby; each_line works everywhere
  file.rewind                        # counting consumed the stream, so start over
  file.each_line.with_index do |line, index|
    if index < pivot
      out1.write(line)
    else
      out2.write(line)
    end
  end
end

out1.close
out2.close
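The same pivot idea can also use block-form File.open for the outputs, which guarantees both files are closed even if something raises mid-copy. A sketch under the same file names:

# Block-form opens close out1/out2 automatically, even on exceptions.
File.open('/my_applications/hello.txt') do |file|
  pivot = file.each_line.count / 2
  file.rewind
  File.open('output_prefix1', 'w') do |out1|
    File.open('output_prefix2', 'w') do |out2|
      file.each_line.with_index do |line, index|
        (index < pivot ? out1 : out2).write(line)
      end
    end
  end
end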

If the file comfortably fits in memory, you can also just read all the lines and slice the array:
file = File.readlines('hello.txt')
File.open('first_half.txt', 'w') { |new_file| new_file.puts file[0...500] }
File.open('second_half.txt', 'w') { |new_file| new_file.puts file[500...1000] }
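For files too big to slurp, here is a sketch that streams the input and starts a new numbered output file every chunk_lines lines; the method name and defaults are illustrative, not from the answers above:

# Split a file into pieces of chunk_lines lines each without loading
# the whole file into memory. Names here are illustrative.
def split_by_lines(f_in, out_pref, chunk_lines = 500)
  File.foreach(f_in).each_slice(chunk_lines).with_index do |lines, i|
    File.open("#{out_pref}_#{"%05d" % i}.txt", "w") do |out|
      out.puts lines   # puts won't double up the newlines the lines already carry
    end
  end
end

split_by_lines("/my_applications/hello.txt", "output_prefix")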

Related

Move line from one text file to another

I have a list of names (names.txt) separated by line. After I loop through each line, I'd like to move it to another file (processed.txt).
My current implementation to loop through each line:
open("names.txt") do |csv|
csv.each_line do |line|
url = line.split("\n")
puts url
# Remove line from this file amd move it to processed.txt
end
end
def readput
#names = File.readlines("names.txt")
File.open("processed.txt", "w+") do |f|
f.puts(#names)
end
end
You can do it like this:
File.open('processed.txt', 'a') do |file|
  open("names.txt") do |csv|
    csv.each_line do |line|
      url = line.chomp
      # Do something interesting with url...
      file.puts url
    end
  end
end
This will result in processed.txt containing all of the urls that were processed with this code.
Note: Removing the line from names.txt is not practical using this method. See How do I remove lines of data in the middle of a text file with Ruby for more information. If this is a real goal of this solution, it will be a much larger implementation with some design considerations that need to be defined.
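If removing the processed lines really is required, the usual pattern is to rewrite the file: stream names.txt into a temp file while skipping the lines you have handled, then swap the temp file into place. A rough sketch; processed? is a placeholder predicate, not something defined above:

require 'tempfile'
require 'fileutils'

# Rewrite the file without the lines the (hypothetical) processed?
# predicate flags, then replace the original in one move.
def remove_processed(path)
  temp = Tempfile.new('names')
  File.foreach(path) do |line|
    temp.write(line) unless processed?(line)
  end
  temp.close
  FileUtils.mv(temp.path, path)
end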

Switch case with regex in Ruby?

I made a Ruby script which uses a regex to cut lines from a file and paste them into a new file. That part works fine. Now I want to type a region into the console, and the output should show me just the entries for that region.
For example, if I'm looking for "ipc-abc-01" and type in abc, it should show me all entries containing abc.
found = 0
words = []
puts "Please enter a word to search the list"
input = gets.chomp

# Open text file in read mode
File.open("list.txt", "r+") do |f|
  f.each do |line|
    if m = line.match(/\b#{input}\b/i)
      puts "#{m[0]} "
      # ... etc.
      found = 1
    end
  end
end
Is this what you were looking for? It searches every word against the user-supplied input as a case-insensitive regex:
words = []
print "Please enter a word to search the braslist: "
input = gets.chomp

# Open text file in read mode
File.open("braslist.txt", "r+") do |f|
  f.each do |line|
    words += line.split(" ").grep(/#{input}/i)
  end
end

puts "#{words.length} words found!"
puts words
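Since the title asks about case/when: Ruby's case matches regexes directly (via Regexp#===), so the same filter can be phrased as a switch. A sketch reusing input and the file name from above:

# case/when tests each line against the pattern with Regexp#===.
pattern = /#{input}/i
File.foreach("braslist.txt") do |line|
  case line
  when pattern
    puts line   # line mentions the region
  else
    # skip lines that don't match
  end
end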
To match all words containing zhb in zhb440 zhb440 izhh550 ipc-zhb790-r-br-01 you can use:
\S*?(?:zhb)\S*
if line =~ /\S*?(?:zhb)\S*/
  match = $~[0]   # the (?:...) group is non-capturing, so $~[1] would be nil
else
  match = ""
end
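A more concise idiom for the same thing is String#[] with a regex, which returns the matched substring or nil:

match = line[/\S*zhb\S*/] || ""   # nil falls back to an empty string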

RUBY: Read text file into 2D array, while also removing newlines?

I'm trying to read text from a file, ignoring the first line, and add each character to a 2D array ([[],[]]). So far I can add the characters at their own indices, but I can't remove the newline characters with .chomp or the like.
I want my end result to be
[['*','.','.','.'],['.','.','*','.'],['.','.','.','.']]
So that [0][0] will return * for example, and [0] will return *...
Right now I'm returning
[['*','.','.','.',"\n"],['.','.','*','.',"\n"],['.','.','.','.',"\n"]]
The code in question is:
def load_board(file)
  board = File.readlines(file)[1..-1].map do |line|
    line.split("").map(&:to_s)
  end
end

origin_board = load_board('mines.txt')
print origin_board
If I try the following code:
def load_board(file)
  board = File.readlines(file)[1..-1].map do |line|
    line.split.map(&:to_s)
  end
end

origin_board = load_board('mines.txt')
print origin_board
I end up with a 2D array like:
[["*..."],["..*."],["...."]]
Stripping your input should help:
def load_board(file)
  board = File.readlines(file)[1..-1].map do |line|
    #    ⇓⇓⇓⇓⇓
    line.strip.split("").map(&:to_s)
  end
end
String#strip removes leading and trailing whitespace from a string. The other option is to skip readlines in favour of read and split by $/ (the line separator):
def load_board(file)
  board = File.read(file).split($/)[1..-1].map do |line|
    line.split("").map(&:to_s)
  end
end
You can add a .chomp before the .split, like this:
def load_board(file)
  board = File.readlines(file)[1..-1].map do |line|
    line.chomp.split("").map(&:to_s)
  end
end
Haven't exactly tested it but did some fiddling and it should work http://repl.it/hgJ/1.
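As a side note, String#chars is the idiomatic spelling of split(""), and the trailing map(&:to_s) is redundant since the pieces are already strings, so the body can shrink to this sketch:

def load_board(file)
  # chomp drops the trailing newline; chars yields one-character strings
  File.readlines(file)[1..-1].map { |line| line.chomp.chars }
end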

prepending numbers to a file

I've been trying to open a file, read the contents, number them, and save the result. So for example the file contains:
This is line 1.
This is line 2.
This is line 3.
the output should be:
1. This is line 1.
2. This is line 2.
3. This is line 3.
I'm incredibly new to Ruby, so I've only gotten as far as adding the lines to an array, but now I don't know how to add numbers to each item of the array. Here is what I have:
class AddNumbers
  def insert_numbers_to_file(file)
    @file_array = []
    line_file = File.open(file)
    line_file.each do |line|
      @file_array << [line]
    end
  end
end
Any help or hints would be appreciated.
Thank you
Enumerators have an #each_with_index method that you can use:
class AddNumbers
  def insert_numbers_to_file(file)
    @file_array = []
    File.open(file).each_with_index do |line, index|
      @file_array << "%d. %s" % [index + 1, line]   # index is 0-based, so add 1
    end
  end
end
The magic variable $. is your ticket to ride here:
class AddNumbers
  def insert_numbers_to_file(file)
    @file_array = []
    line_file = File.open(file)
    line_file.each do |line|
      @file_array << "#{$.}: #{line}"
    end
    @file_array
  end
end
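For what it's worth, the whole job also fits in one expression with File.foreach and with_index; a sketch assuming the same "N. line" format (the file names are illustrative):

# with_index(1) starts counting at 1, so no manual offset is needed.
numbered = File.foreach("input.txt").with_index(1).map do |line, i|
  "#{i}. #{line}"
end
File.write("numbered.txt", numbered.join)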

Buffered/RingBuffer IO in Ruby + Amazon S3 non-blocking chunk reads

I have huge CSV files (100MB+) on Amazon S3 and I want to read them in chunks and process them using Ruby's CSV library. I'm having a hard time creating the right IO object for the CSV processing:
buffer = TheRightIOClass.new
bytes_received = 0

RightAws::S3Interface.new(<access_key>, <access_secret>).retrieve_object(bucket, key) do |chunk|
  bytes_received += buffer.write(chunk)
  if bytes_received >= 1*MEGABYTE
    bytes_received = 0
    csv(buffer).each do |row|
      process_csv_record(row)
    end
  end
end

def csv(io)
  @csv ||= CSV.new(io, headers: true)
end
I don't know what the right setup here should be, or what TheRightIOClass is. I don't want to load the entire file into memory with StringIO. Is there a buffered IO or ring buffer in Ruby to do this?
If anyone has a good solution using threads (no processes) and pipes, I would love to see it.
You can use StringIO and do some clever error handling to ensure you have an entire row in a chunk before handling it. The Packer class in this example just accumulates the parsed rows in memory until you flush them to disk or a database.
require 'csv'
require 'stringio'

packer = Packer.new
object = AWS::S3.new.buckets[bucket].objects[path]
io = StringIO.new
csv = ::CSV.new(io, headers: true)

object.read do |chunk|
  # Append the most recent chunk and rewind the IO
  io << chunk
  io.rewind
  last_offset = 0
  begin
    while row = csv.shift do
      # Store the parsed row unless we're at the end of a chunk
      unless io.eof?
        last_offset = io.pos
        packer << row.to_hash
      end
    end
  rescue ArgumentError, ::CSV::MalformedCSVError => e
    # Only rescue malformed UTF-8 and CSV errors if we're at the end of a chunk
    raise e unless io.eof?
  end
  # Seek to our last offset, create a new StringIO with that partial row & advance the cursor
  io.seek(last_offset)
  io.reopen(io.read)
  io.read
  # Flush our accumulated rows to disk every 1 meg
  packer.flush if packer.bytes > 1*MEGABYTES
end

# Read the last row
io.rewind
packer << csv.shift.to_hash
packer
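Packer itself is never shown; here is a minimal, illustrative sketch of what it could look like, assuming flush just appends the buffered rows to a file (a real implementation might batch-insert into a database instead):

# Hypothetical Packer: buffers row hashes and appends them to a file
# when flushed. Only the << / bytes / flush interface is assumed above.
class Packer
  attr_reader :bytes

  def initialize(path = "rows.out")
    @path = path
    @rows = []
    @bytes = 0
  end

  def <<(row_hash)
    @rows << row_hash
    @bytes += row_hash.to_s.bytesize   # rough size accounting
  end

  def flush
    File.open(@path, "a") { |f| @rows.each { |r| f.puts r } }
    @rows.clear
    @bytes = 0
  end
end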
