Write to multiple files simultaneously in Julia - printing

How do I print to multiple files simultaneously in Julia? Is there a cleaner way other than:
for f in [open("file1.txt", "w"), open("file2.txt", "w")]
    write(f, "content")
    close(f)
end

From your question I assume that you do not mean writing in parallel (which would probably not speed things up, since the operation is most likely I/O bound).
Your solution has one small problem: it does not guarantee that f is closed if write throws an exception.
Here are three alternative ways to do it that make sure the file is closed even on error:
for fname in ["file1.txt", "file2.txt"]
    open(fname, "w") do f
        write(f, "content")
    end
end
for fname in ["file1.txt", "file2.txt"]
    open(f -> write(f, "content"), fname, "w")
end
foreach(fn -> open(f -> write(f, "content"), fn, "w"),
        ["file1.txt", "file2.txt"])
They give the same result, so the choice is a matter of taste (you can derive more variations along the same lines).
All of these methods rely on the following method of the open function:
open(f::Function, args...; kwargs...)
Apply the function f to the result of open(args...; kwargs...) and close the resulting file descriptor upon completion.
Note that processing will still stop if an exception is actually thrown somewhere (only the closing of the file descriptor is guaranteed). To ensure that every write operation is actually attempted, you can do something like:
for fname in ["file1.txt", "file2.txt"]
    try
        open(fname, "w") do f
            write(f, "content")
        end
    catch ex
        # here decide what should happen on error
        # you might want to investigate the value of ex here
    end
end
See https://docs.julialang.org/en/latest/manual/control-flow/#The-try/catch-statement-1 for the documentation of try/catch.

If you really want to write in parallel (using multiple processes), you can do it as follows:
using Distributed
addprocs(4) # add, say, 4 worker processes
function ppwrite()
    @sync @distributed for i in 1:10
        open("file$(i).txt", "w") do f
            write(f, "content")
        end
    end
end
For comparison, the serial version would be:
function swrite()
    for i in 1:10
        open("file$(i).txt", "w") do f
            write(f, "content")
        end
    end
end
On my machine (SSD + quad-core) this leads to a ~70% speedup:
julia> using BenchmarkTools
julia> @btime ppwrite();
  3.586 ms (505 allocations: 25.56 KiB)
julia> @btime swrite();
  6.066 ms (90 allocations: 6.41 KiB)
However, be aware that these timings might change drastically for real content, which may have to be transferred to the worker processes. They also probably won't scale, as I/O will typically become the bottleneck at some point.
Update: larger (string) content
julia> using Distributed, Random, BenchmarkTools
julia> addprocs(4);
julia> global const content = [string(rand(1000,1000)) for _ in 1:10];
julia> function ppwrite()
           @sync @distributed for i in 1:10
               open("file$(i).txt", "w") do f
                   write(f, content[i])
               end
           end
       end
ppwrite (generic function with 1 method)
julia> function swrite()
           for i in 1:10
               open("file$(i).txt", "w") do f
                   write(f, content[i])
               end
           end
       end
swrite (generic function with 1 method)
julia> @btime swrite()
  63.024 ms (110 allocations: 6.72 KiB)
julia> @btime ppwrite()
  23.464 ms (509 allocations: 25.63 KiB) # ~2.7x speedup
Doing the same thing with string representations of larger 10000x10000 matrices (3 of them instead of 10) results in:
julia> @time swrite()
  7.189072 seconds (23.60 k allocations: 1.208 MiB)
julia> @time swrite()
  7.293704 seconds (37 allocations: 2.172 KiB)
julia> @time ppwrite();
 16.818494 seconds (2.53 M allocations: 127.230 MiB) # > 2x slowdown on the first call
julia> @time ppwrite(); # ~30% slowdown on the second call
  9.729389 seconds (556 allocations: 35.453 KiB)

Just to add a coroutine (task-based) version that does I/O concurrently like the multi-process one, but also avoids the data duplication and transfer:
julia> using Distributed, Random
julia> global const content = [randstring(10^8) for _ in 1:10];
julia> function swrite()
           for i in 1:10
               open("file$(i).txt", "w") do f
                   write(f, content[i])
               end
           end
       end
swrite (generic function with 1 method)
julia> @time swrite()
  1.339323 seconds (23.68 k allocations: 1.212 MiB)
julia> @time swrite()
  1.876770 seconds (114 allocations: 6.875 KiB)
julia> function awrite()
           @sync for i in 1:10
               @async open("file$(i).txt", "w") do f
                   write(f, "content")
               end
           end
       end
awrite (generic function with 1 method)
julia> @time awrite()
  0.243275 seconds (155.80 k allocations: 7.465 MiB)
julia> @time awrite()
  0.001744 seconds (144 allocations: 14.188 KiB)
julia> addprocs(4)
4-element Array{Int64,1}:
 2
 3
 4
 5
julia> function ppwrite()
           @sync @distributed for i in 1:10
               open("file$(i).txt", "w") do f
                   write(f, "content")
               end
           end
       end
ppwrite (generic function with 1 method)
julia> @time ppwrite()
  1.806847 seconds (2.46 M allocations: 123.896 MiB, 1.74% gc time)
Task (done) @0x00007f23fa2a8010
julia> @time ppwrite()
  0.062830 seconds (5.54 k allocations: 289.161 KiB)
Task (done) @0x00007f23f8734010

If you only need to read the files line by line, you could do something like this:
for (line_a, line_b) in zip(eachline("file_a.txt"), eachline("file_b.txt"))
    # do stuff
end
This works because eachline returns an iterable EachLine object, which has its own I/O stream attached to it.

Related

Elixir/Erlang - Trace when a message arrives in the mailbox

Fairly straightforward question: is it possible to trace messages arriving in (the mailbox of) a Process/GenServer? Note that this is different from tracing when a message is received (that is, once it leaves the mailbox and is handled). I haven't found a way of doing this so far.
In Erlang you have flags for this in dbg:p/2: s for sending and r for receiving:
1> dbg:tracer().
{ok,<0.82.0>}
2> dbg:p(self(), r).
(<0.80.0>) << {dbg,{ok,[{matched,nonode@nohost,1}]}}
(<0.80.0>) << {io_reply,#Ref<0.2586582558.1779957764.183997>,319}
{ok,[{matched,nonode@nohost,1}]}
(<0.80.0>) << {io_reply,#Ref<0.2586582558.1779957764.184000>,
[{expand_fun,#Fun<group.0.82824323>},
{echo,true},
{binary,false},
{encoding,latin1}]}
(<0.80.0>) << {io_reply,#Ref<0.2586582558.1779957764.184002>,ok}
3> self() ! trace_me.
(<0.80.0>) << {shell_cmd,<0.73.0>,
{eval,[{op,{1,8},
'!',
{call,{1,1},{atom,{1,1},self},[]},
{atom,{1,10},trace_me}}]},
cmd}
(<0.80.0>) << trace_me
(<0.80.0>) << {io_reply,#Ref<0.2586582558.1779957764.184006>,319}
trace_me
(<0.80.0>) << {io_reply,#Ref<0.2586582558.1779957764.184008>,
[{expand_fun,#Fun<group.0.82824323>},
{echo,true},
{binary,false},
{encoding,latin1}]}
(<0.80.0>) << {io_reply,#Ref<0.2586582558.1779957764.184011>,ok}

Is it possible to render/generate pdf in rails without using a gem?

There are some gems that can generate PDFs, but I don't want to use a gem. I tried the following:
def show
  respond_to do |format|
    format.html
    format.pdf
  end
end
And for the link:
link_to show_path(@show, format: :pdf)
I do get PDF output, but the viewer says the PDF document might not be displayed properly.
Rails does not generate PDF out of the box. PDF is a 7-bit text format with binary parts, so technically you can generate it manually using an ERB template, show.pdf.erb:
%PDF-1.1
%¥±ë
1 0 obj << /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj << /Type /Pages /Kids [3 0 R] /Count 1 /MediaBox [0 0 300 300] >>
endobj
3 0 obj << /Type /Page /Parent 2 0 R
/Resources
<< /Font
<< /F1
<< /Type /Font /Subtype /Type1 /BaseFont /Times-Roman >>
>>
>>
/Contents 4 0 R
>>
endobj
4 0 obj << /Length 90 >>
stream
BT
/F1 18 Tf
90 150 Td
(Hello World!) Tj
ET
endstream
endobj
trailer << /Root 1 0 R /Size 4 >>
%%EOF
This minimal PDF is viewable in some apps, but it contains errors: there's no xref section, and the object byte counts are wrong. Also, once you need anything more complex than a single page with a couple of text labels on it, it becomes hard to maintain.
A better way to generate PDFs is to use a gem like prawn or wicked_pdf.
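The missing xref table and wrong byte counts mentioned above can be avoided by measuring the output while it is built. Below is a minimal plain-Ruby sketch (no gems) of that idea, assuming a single 300x300 page, the built-in Times-Roman font, and ASCII text; the helper name `minimal_pdf` is hypothetical and this is not a general PDF writer.

```ruby
# Hypothetical helper: builds a one-page PDF string with a correct stream
# /Length, xref table and startxref offset. Assumes ASCII-only text.
def minimal_pdf(text)
  # Backslash and parentheses are special inside PDF string literals.
  escaped = text.gsub(/[\\()]/) { |c| "\\#{c}" }
  content = "BT /F1 18 Tf 90 150 Td (#{escaped}) Tj ET"

  objects = [
    "<< /Type /Catalog /Pages 2 0 R >>",
    "<< /Type /Pages /Kids [3 0 R] /Count 1 /MediaBox [0 0 300 300] >>",
    "<< /Type /Page /Parent 2 0 R " \
      "/Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Times-Roman >> >> >> " \
      "/Contents 4 0 R >>",
    "<< /Length #{content.bytesize} >>\nstream\n#{content}\nendstream"
  ]

  pdf = +"%PDF-1.1\n"
  # Record the byte offset at which each "N 0 obj" starts.
  offsets = objects.each_with_index.map do |body, i|
    offset = pdf.bytesize
    pdf << "#{i + 1} 0 obj\n#{body}\nendobj\n"
    offset
  end

  xref_offset = pdf.bytesize
  pdf << "xref\n0 #{objects.size + 1}\n"
  pdf << "0000000000 65535 f \n" # entry 0: head of the free list
  # Each xref entry is exactly 20 bytes: 10-digit offset, generation, flag, EOL.
  offsets.each { |o| pdf << format("%010d 00000 n \n", o) }
  pdf << "trailer\n<< /Root 1 0 R /Size #{objects.size + 1} >>\n"
  pdf << "startxref\n#{xref_offset}\n%%EOF\n"
  pdf
end
```

In a Rails controller you could return it from the `format.pdf` branch with `send_data minimal_pdf("Hello World!"), type: "application/pdf", disposition: "inline"`. Anything beyond this (multiple pages, images, non-ASCII text) is where a gem pays off.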

How to Parse with Commas in CSV file in Ruby

I am parsing a CSV file with Ruby and am having trouble because the delimiter is a comma and my data contains commas.
In the portions of the data that contain commas, the data is surrounded by "", but I am not sure how to make CSV ignore commas that are inside quotation marks.
Example CSV Data (File.csv)
NCB 14591 BLK 13 LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR
Example Code:
require 'csv'
CSV.foreach("File.csv", encoding:'iso-8859-1:utf-8', :quote_char => "\x00").each do |x|
  puts x[1]
end
Current Output: " 84.07 FT OF 25
Expected Output: 84.07 FT OF 25, ALL OF 26,
Link to the gist to view the example file and code.
https://gist.github.com/markscoin/0d6c2d346d70fd627203317c5fe3097c
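Keep the default quote character ("). With quote_char: "\x00" the double quotes are treated as ordinary characters, so every comma starts a new field. (The force_quotes option only affects writing, so it is not needed here.)
require 'csv'
CSV.foreach("data.csv", encoding: 'iso-8859-1:utf-8', quote_char: '"').each do |x|
  puts x[1]
end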
Result:
84.07 FT OF 25, ALL OF 26,
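To see the difference directly, the question's example row can be parsed with and without the `"\x00"` override. This sketch uses only the standard `csv` library's `CSV.parse_line`; the variable names are illustrative.

```ruby
require "csv"

# The example row from the question.
line = 'NCB 14591 BLK 13 LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR'

# Default quote_char is '"', so the quoted field keeps its embedded commas.
fields = CSV.parse_line(line)
p fields[1] # => " 84.07 FT OF 25, ALL OF 26,"

# With quote_char "\x00" the double quotes are ordinary characters,
# so every comma starts a new field (the behaviour seen in the question).
broken = CSV.parse_line(line, quote_char: "\x00")
p broken[1] # => "\" 84.07 FT OF 25"
```

Note the leading space in the correctly parsed field; call `strip` on it if you don't want it.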
An "Illegal quoting" error (CSV::MalformedCSVError) occurs when a line has quotes that don't wrap the entire column. For instance, if you had a CSV that looks like:
NCB 14591 BLK 13 LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR
NCB 14592 BLK 14 LOT W IRR,84.07 FT OF "25",TWENTY-FOUR SAC HOLDING COR
You could parse each line individually and change the quote character only for the lines that use bad quoting:
require 'csv'
def parse_file(file_name)
  File.foreach(file_name) do |line|
    parse_line(line) do |x|
      puts x.inspect
    end
  end
end
def parse_line(line)
  options = { encoding: 'iso-8859-1:utf-8' }
  begin
    # ** passes the hash as keyword arguments (required on Ruby 3)
    yield CSV.parse_line(line, **options)
  rescue CSV::MalformedCSVError
    # this line is misusing quotes, change the quote character and try again
    options.merge! quote_char: "\x00"
    retry
  end
end
parse_file('./File.csv')
and running this gives you:
["NCB 14591 BLK 13 LOT W IRR", " 84.07 FT OF 25, ALL OF 26,", "TWENTY-THREE SAC HOLDING COR"]
["NCB 14592 BLK 14 LOT W IRR", "84.07 FT OF \"25\"", "TWENTY-FOUR SAC HOLDING COR"]
But if you have a mix of bad and good quoting in a single row, this falls apart again. Ideally, you should clean up the CSV so that it is valid.

How to Calculate sum of all the digits in text file

I have a text file t.txt, and I want to calculate the sum of all the digits in it.
Example
--- t.txt ---
The rahul jumped in 2 the well. The water was cold at 1 degree Centigrade. There were 3 grip holes on the walls. The well was 17 feet deep.
--- EOF --
sum 2+1+3+1+7
My Ruby code to calculate the sum is:
ruby -e "File.read('t.txt').split.inject(0){|mem, obj| mem += obj.to_f}"
But I am not getting any output.
str = "The rahul jumped in 2 the well. The water was cold at 1 degree Centigrade. There were 3 grip holes on the walls. The well was 17 feet deep."
To get the sum of all integers:
str.scan(/\d+/).sum(&:to_i)
# => 23
Or, to get the sum of all digits as in your example:
str.scan(/\d+?/).sum(&:to_i)
# => 14
PS: I used sum because of the Rails tag. Array#sum is also built into Ruby 2.4 and later; on older plain Ruby you can use inject instead.
Example with inject
str.scan(/\d/).inject(0) { |sum, a| sum + a.to_i }
# => 14
str.scan(/\d+/).inject(0) { |sum, a| sum + a.to_i }
# => 23
Your statement computes the correct value; it just never prints it. Add puts before File.read:
ruby -e "puts File.read('t.txt').split.inject(0){|mem, obj| mem += obj.to_f}"
# => 23.0
To sum only single digits:
ruby -e "puts File.read('t.txt').scan(/\d/).inject(0){|mem, obj| mem += obj.to_f}"
# => 14.0
Thanks
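Both answers read the whole file into memory with File.read. For a large file, a line-by-line variant is a reasonable alternative; the sketch below (the method name `digit_and_number_sums` is hypothetical) returns both sums as integers, which also avoids the `23.0`-style float output that `to_f` produces.

```ruby
# Hypothetical helper: streams the file line by line and returns
# [sum of every single digit, sum of every whole integer].
def digit_and_number_sums(path)
  digits = 0
  numbers = 0
  File.foreach(path) do |line|
    digits  += line.scan(/\d/).sum(&:to_i)  # 17 counts as 1 + 7
    numbers += line.scan(/\d+/).sum(&:to_i) # 17 counts as 17
  end
  [digits, numbers]
end
```

For the example t.txt from the question, `p digit_and_number_sums("t.txt")` should print `[14, 23]`. A number cannot be split across lines by `\d+`, so reading line by line does not change the result.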

How to count lines of code?

I tried rake stats, but it seems highly inaccurate. Perhaps it ignores some directories?
I use the free Perl script cloc. Sample usage:
phrogz$ cloc .
180 text files.
180 unique files.
77 files ignored.
http://cloc.sourceforge.net v 1.56  T=1.0 s (104.0 files/s, 19619.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Javascript                      29           1774           1338          10456
Ruby                            61            577            185           4055
CSS                             10            118            133            783
HTML                             1             13              3            140
DOS Batch                        2              6              0             19
Bourne Shell                     1              4              0             15
-------------------------------------------------------------------------------
SUM:                           104           2492           1659          15468
-------------------------------------------------------------------------------
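Here's a simple solution. It counts every line of every file in your Rails project's app folder (CSS, Ruby, CoffeeScript, and all). At the root of your project, run this command:
find ./app -type f -exec cat {} + | wc -l
EDIT
The command above also counts blank lines, comments, and non-Ruby files. To count only non-blank, non-comment lines in Ruby files, try this instead (\s and \| in the sed expression are GNU sed extensions):
find ./app -type f -name "*.rb" -exec cat {} + | sed "/^\s*\(#\|$\)/d" | wc -l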
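The same count (non-blank lines that are not whole-line `#` comments) can also be done in plain Ruby with the standard `find` library, which sidesteps differences between shell tools across platforms. This is a minimal sketch; the method name `count_code_lines` is hypothetical, and like the shell version it does not understand `=begin`/`=end` blocks or `#` inside strings.

```ruby
require "find"

# Hypothetical helper: counts lines that are neither blank nor whole-line
# "#" comments in every file under root whose name ends with ext.
def count_code_lines(root = "./app", ext = ".rb")
  total = 0
  Find.find(root) do |path|
    next unless File.file?(path) && path.end_with?(ext)
    File.foreach(path) do |line|
      # \A\s*(#|\z): only whitespace, optionally followed by a comment marker
      total += 1 unless line.match?(/\A\s*(#|\z)/)
    end
  end
  total
end
```

Run it from the project root with `puts count_code_lines`, or pass another directory and extension, e.g. `count_code_lines("./lib", ".rake")`.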
You can try out these two options:
Hack rake stats
Rake stats snippet from a blog post:
namespace :spec do
  desc "Add files that DHH doesn't consider to be 'code' to stats"
  task :statsetup do
    require 'code_statistics'
    class CodeStatistics
      alias calculate_statistics_orig calculate_statistics
      def calculate_statistics
        @pairs.inject({}) do |stats, pair|
          if 3 == pair.size
            stats[pair.first] = calculate_directory_statistics(pair[1], pair[2]); stats
          else
            stats[pair.first] = calculate_directory_statistics(pair.last); stats
          end
        end
      end
    end
    ::STATS_DIRECTORIES << ['Views', 'app/views', /\.(rhtml|erb|rb)$/]
    ::STATS_DIRECTORIES << ['Test Fixtures', 'test/fixtures', /\.yml$/]
    ::STATS_DIRECTORIES << ['Email Fixtures', 'test/fixtures', /\.txt$/]
    # note, I renamed all my rails-generated email fixtures to add .txt
    ::STATS_DIRECTORIES << ['Static HTML', 'public', /\.html$/]
    ::STATS_DIRECTORIES << ['Static CSS', 'public', /\.css$/]
    # ::STATS_DIRECTORIES << ['Static JS', 'public', /\.js$/]
    # prototype is ~5384 LOC all by itself - very hard to filter out
    ::CodeStatistics::TEST_TYPES << "Test Fixtures"
    ::CodeStatistics::TEST_TYPES << "Email Fixtures"
  end
end
task :stats => "spec:statsetup"
metric_fu - A Ruby Gem for Easy Metric Report Generation
PS: I haven't tried any of the above, but metric_fu sounds interesting, see the screenshots of the output.
This one calculates the number of files, total lines of code, comment lines, and average LOC per file. It also excludes any file whose path contains "vendor".
Usage:
count_lines('rb')
Code:
def count_lines(ext)
  o = 0 # Number of files
  n = 0 # Number of lines of code (blank lines are counted too)
  m = 0 # Number of lines of comments
  files = Dir.glob('./**/*.' + ext)
  files.each do |f|
    next if f.index('vendor')
    next if FileTest.directory?(f)
    o += 1
    i = 0
    # File.foreach closes the file when done (File.new(f).each_line left it open)
    File.foreach(f) do |line|
      if line.strip[0] == '#'
        m += 1
        next
      end
      i += 1
    end
    n += i
  end
  puts "#{o} files."
  puts "#{n} lines of code."
  puts "#{(n.to_f / o).round(2)} LOC/file."
  puts "#{m} lines of comments."
end
If your code is hosted on GitHub, you can use this line count website. Just enter your GitHub URL and wait for the result.
Example for Postgres: https://line-count.herokuapp.com/postgres/postgres
File Type   Files   Lines of Code   Total lines
Text         1336               0        472106
C            1325         1069379       1351222
Perl          182           23917         32443
Shell           5             355           533
...
