Does Ruby on Rails have read stream for files? - ruby-on-rails

Does Rails have a way to implement read streams for file reading, like Node.js does?
i.e.
fs.createReadStream(__dirname + '/data.txt');
as opposed to
fs.readFile(__dirname + '/data.txt');
whereas I see Ruby has
file = File.new("data.txt")
I am unsure of the equivalent in Ruby/Rails for creating a stream and would like to know if this is possible. The reason I ask is memory management: a stream is delivered piece by piece, as opposed to the whole file at once.

If you want to read a file in Ruby piece-by-piece, there are a host of methods available to you.
IO#each_line and IO::foreach (both also available on File) iterate over the file line by line. Neither reads the whole file into memory; each reads up to the next newline, yields that line, and pauses until the next one is requested, apart from whatever the internal buffer holds.
IO#read and IO::read take a length parameter, which makes them read at most length bytes from the file rather than the whole thing.
IO::binread does the same as IO::read, but will open the file in binary mode.
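IO#readpartial is similar to IO#read with a length, but it returns as soon as any data is available instead of waiting for length bytes, and it raises EOFError at end of file rather than returning nil. It is also worth looking at.
IO#getc reads a single character and IO#gets reads a single line, so neither loads more of the file than it returns (again, barring the buffer).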
There are several more that I'm looking for right now.
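To make the length-based reading concrete, here is a minimal sketch of chunked reading with IO#read, plus a line-by-line count with File.foreach. The helper names each_chunk and count_lines and the 4096-byte default are my own choices for illustration, not anything Ruby or Rails provides.

```ruby
# Yield a file's contents in fixed-size chunks instead of loading it whole.
# chunk_size is an arbitrary illustrative default.
def each_chunk(path, chunk_size = 4096)
  File.open(path, "rb") do |file|
    # IO#read(length) returns nil at end of file, which ends the loop.
    while (chunk = file.read(chunk_size))
      yield chunk
    end
  end
end

# Line-by-line alternative: File.foreach hands back one line at a time.
def count_lines(path)
  File.foreach(path).count
end
```

Something like each_chunk("data.txt") { |chunk| process(chunk) } would then touch only one chunk at a time, where process stands in for whatever you do with the data.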

Related

ID3 Parser and Editor

I'm writing an ID3 parser and editor. It does already support ID3v1, v2.1-2.3. Are there any other widely used ID3 versions or extensions? For example, I've read about Enhanced ID3v1 tag (which goes before ID3v1) and starts with "TAG+", but I've never seen it inside MP3 files. Should I implement support for it anyway?
"ID3v2.1" never existed.
Yes, the Enhanced TAG is identified by TAG+ and extends ID3v1.
For a list of all metadata systems to expect in MP3 files, see https://stackoverflow.com/a/62366354. ID3v2.4 should be your top priority, as you will encounter it most often aside from ID3v2.3. Then go for the informal and/or legacy ones, because those can still be encountered (just because files become old doesn't mean they cease to exist).
Keep the following things in mind when parsing files:
A file can have both ID3v1 and ID3v2 tags.
A file can have multiple ID3v2 tags (e.g. ID3v2.3 and ID3v2.4). Although it shouldn't occur, your parser should have no problem accepting multiple tags of the same version as well.
ID3v2 is not limited to MP3 files (but ID3v1 and all its informal extensions are).
Consider the following parsing order in an MP3 file:
Check for ID3v1 at the end of the file.
Check for ID3v1.2 in front of ID3v1.
Check for Enhanced TAG in front of ID3v1.
Check for one or more ID3v2 tags at the start of the file and, for ID3v2.4, for a tag with a footer at the end of the file, in front of all ID3v1-like tags.
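A rough sketch of part of that order, written in Ruby only for illustration. It relies on the usual layouts: ID3v1 is the last 128 bytes and starts with TAG, the Enhanced TAG is a 227-byte block directly in front of it and starts with TAG+, and an ID3v2 header is 10 bytes starting with ID3 and ending in a 4-byte synchsafe size. The function names are hypothetical, and it deliberately skips ID3v1.2, multiple ID3v2 tags, and the ID3v2.4 footer.

```ruby
# Decode a 4-byte synchsafe integer (only the low 7 bits of each byte count).
def synchsafe_to_int(bytes)
  bytes.bytes.reduce(0) { |acc, b| (acc << 7) | (b & 0x7F) }
end

# Report which tags are present in an IO-like object (File or StringIO).
def detect_id3(io)
  found = {}
  size = io.size

  # ID3v1: the last 128 bytes, starting with "TAG".
  if size >= 128
    io.seek(size - 128)
    found[:id3v1] = true if io.read(3) == "TAG"
  end

  # Enhanced TAG: assumed to be 227 bytes directly in front of ID3v1.
  if found[:id3v1] && size >= 128 + 227
    io.seek(size - 128 - 227)
    found[:enhanced_tag] = true if io.read(4) == "TAG+"
  end

  # ID3v2: 10-byte header at the start of the file.
  io.seek(0)
  header = io.read(10)
  if header && header.bytesize == 10 && header.byteslice(0, 3) == "ID3"
    found[:id3v2] = {
      major: header.getbyte(3),
      revision: header.getbyte(4),
      size: synchsafe_to_int(header.byteslice(6, 4))
    }
  end

  found
end
```

The size reported for ID3v2 excludes the 10-byte header itself, so a second ID3v2 tag, if any, would start at offset 10 + size (plus another 10 bytes if the footer flag is set).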

What is the recommended way to make & load a library?

I want to make a small "library" to be used by my future Maxima scripts, but I am not quite sure how to proceed (I use wxMaxima). Maxima's documentation covers the save(), load() and loadfile() functions, yet does not provide examples. Therefore, I am not sure whether I am using the proper/best way or not. My current solution, which is based on this post, stores my library in the *.lisp format.
As a simple example, let's say that my library defines the cosSin(x) function. I open a new session and define this function as
(%i0) cosSin(x) := cos(x) * sin(x);
I then save it to a lisp file located in the /tmp/ directory.
(%i1) save("/tmp/lib.lisp");
I then open a new instance of maxima and load the library
(%i0) loadfile("/tmp/lib.lisp");
The cosSin(x) function is now defined and can be called
(%i1) cosSin(%pi/4);
(%o1) 1/2
However, I noticed that a substantial number of the libraries shipped with maxima are of *.mac format: the /usr/share/maxima/5.37.2/share/ directory contains 428 *.mac files and 516 *.lisp files. Is it a better format? How would I generate such files?
More generally, what are the different ways a library can be saved and loaded? What is the recommended approach?
Usually people put the functions they need in a file named something.mac, and then load("something.mac"); loads the functions into Maxima.
A file can contain any number of functions. A file can load other files, so if you have somethingA.mac and somethingB.mac, then you can have another file that just says load("somethingA.mac"); load("somethingB.mac");.
One can also create Lisp files and load them too, but it is not required to write functions in Lisp.
Unless you are specifically interested in writing Lisp functions, my advice is to write your functions in the Maxima language and put them in a file, using an ordinary text editor. Also, I recommend that you don't use save to save the functions to a file as Lisp code; just type the functions into a file, as Maxima code, with a plain text editor.
Take a look at the files in share to get a feeling for how other people have gone about it. I am looking right now at share/contrib/ggf.mac and I see it has a lengthy comment header describing its purpose -- such comments are always a good idea.
For beginners, like me:
In the menu, go to Edit : Configure : Startup commands.
Copy all the functions you have verified into the first box (this writes your wxmaxima-init.mac to the location indicated below it).
Restart wxMaxima.
Now you can access the functions without any load() command.

ChromeWorker to write a huge file

In my extension, I need to write a huge file (say around 20 gigs) to the disk. Currently I am doing it in the main thread, but file creation is a very expensive operation. I was about to move the whole file creation process to a ChromeWorker, but based on https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Functions_and_classes_available_to_workers I cannot have access to nsIFile from a ChromeWorker.
So my questions are:
1. Is it possible to access Cc, Ci, and Cu from within a ChromeWorker?
2. If not, what would be the most efficient way to create and fill large files in Firefox? Note that I need to write the file based on segments and offsets (Ci.nsISeekableStream).
It's not possible to access nsIFile from a ChromeWorker. But nsIFile is a horrible synchronous option anyway.
Go with OS.File: https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/OSFile.jsm
On that page go to the link for usage on workers: https://developer.mozilla.org/docs/Mozilla/JavaScript_code_modules/OSFile.jsm/OS.File_for_workers
On the main thread, OS.File returns promises.
In a worker, the calls are synchronous. Wrap your OS.File calls in the worker in a try-catch: when an error occurs (such as OS.File.remove with the ignoreAbsent option set to false), the catch will receive the OS.File.Error object.
Great move to ChromeWorker btw! I'm a huge fan of ChromeWorkers. I wrote a simple example of jsm using chromeworker here: https://github.com/Noitidart/jpm-chromeworker
For segments, you'll have to call OS.File.open and then call .setPosition() on the returned value; after that you can read or write a certain number of bytes from that position. It's awesome stuff. OS.File is the new and recommended way to do file operations. It's been around for a while now, though, since Firefox 29 or earlier.

How to process a GCS filepattern, full file at a time?

I need to process a (GCS) bucket of files, where each file is compressed and contains a single multi-line JSON record. Also, the name of the file being processed is significant and I need to know it within my transform.
Starting with examples in the docs, TextIO looks pretty close, but it looks like it's designed to process each file line by line and does not allow me to read the entire file at once. Also, I don't see any way to get the name of the file that's being processed.
PCollectionTuple results = p.apply(TextIO.Read
    .from("gs://bucket/a/*.gz")
    .withCompressionType(TextIO.CompressionType.GZIP)
    .withCoder(MyJsonCoder.of()));
Looks like I need to write a custom IO reader, or some such? Any tips for best place to start?
You are correct that right now none of the existing classes do exactly what you want. There are 2 reasonable approaches:
Match the filepattern yourself (using IOChannelUtils and IOChannelFactory) and wrap the resulting files into a PCollection<String> where the String will be a filename, using Create.of(filenames). Then apply a ParDo with a function which reads the given filename.
Write your own subclass of Source (there's also FileBasedSource, but it's not quite right for your use case). It would be configured by the filepattern, and splitIntoBundles would match the filepattern and expand into individual sources each corresponding to one file.
I would recommend the first approach because it seems like less code and your use case does not require the full power of Source.

Is original_filename method in ActionDispatch::Http::UploadedFile safe?

Is the original_filename method in ActionDispatch::Http::UploadedFile safe to use as the name of a file saved on the host system, without sanitizing it further?
Looking at the source, it doesn't look like they do any checking of the filename, so unless they do it somewhere else (which would be bad design and thus unlike the Rails team), the real question is: what harm can a filename do? The only cases I can think of where it might be used maliciously are:
if the file is named "." or "..". Those can be hard to delete if you actually succeed in creating them. I doubt that Ruby would let you save a file by that name; you can try it and see. If it doesn't, then this point can be ignored.
or maybe a really long name might cause a buffer overflow somewhere deeper in the API code of the OS.
Note that neither of those should be a problem. OSes try to make it impossible to create . files, but I've seen it done. And since (most?) filesystems already have a max filename limit, they should just error or truncate the file for you; truncating yourself is a borderline paranoid measure to protect against buffer overflow exploits that may be found in the OS's API code in the future. Such an exploit is very unlikely to exist.
So, if you really want to, just check for these two cases and you should be okay. You might want to do this by subclassing the UploadedFile class and adding the functionality that, if the name is "." or "..", then you just give it a random name; and if it is over, say 100 chars, then truncate it.
But I would say that neither of these is likely enough to warrant the introduction of a nonstandard class into your code base. I would just try saving the file under the given name, rely on the underlying file-saving API to catch errors, check for those errors, and report them back to the user.
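If you do decide to add the two checks, a plain helper may be enough and avoids subclassing UploadedFile. This is only a sketch: safe_upload_name is a hypothetical name, the 100-character limit is the arbitrary figure mentioned above, and treating an empty name like "." is my own addition.

```ruby
require "securerandom"

# Arbitrary limit, taken from the "say 100 chars" suggestion above.
MAX_FILENAME_LENGTH = 100

# Hypothetical helper covering only the two cases discussed:
# dot names get a random replacement, long names get truncated.
def safe_upload_name(original_filename)
  name = original_filename.to_s
  # An empty name is handled like "." and ".." (an extra assumption).
  return SecureRandom.hex(16) if name.empty? || name == "." || name == ".."

  name[0, MAX_FILENAME_LENGTH]
end
```

You would call it as safe_upload_name(uploaded_file.original_filename) before building the destination path, where uploaded_file stands for the object Rails hands you in params.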
