Ruby: Using a regular expression to find and open a file based on its filename? - ruby-on-rails

I am trying to test the contents of a file that is generated from code. The problem is that the full name of the file is based on a timestamp: abc123_#{d.strftime('%Y%m%d%I%M%S')}.log
How could I use File to find this file and read it? I tried doing File.exists?() with a regular expression as the parameter but that didn't work.
I found this in another question on Stack Overflow:
File.basename(file_path).match(/_.*(css|scss|sass)/)
How would I be able to use that in my case, where the file is located in my public folder?
ANSWER
So the two answers below both work and I used a combination of them.
Dir['public/*.log'].select { |f| f =~ /purge_cc_website/}
The * is a glob wildcard, a simple pattern language of its own, so Dir[] already narrows down the candidates. You then filter the resulting array with an actual regex.
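A minimal sketch of that combination, assuming the logs sit in one directory and are named exactly like abc123_<14-digit timestamp>.log from the question. The helper find_logs and the LOG_NAME regex are illustrative names, not part of any library:

```ruby
# Hypothetical regex for names like abc123_20240105093015.log: the fixed
# prefix, exactly 14 digits from strftime('%Y%m%d%I%M%S'), then ".log".
LOG_NAME = /\Aabc123_\d{14}\.log\z/

# Hypothetical helper: the glob narrows the candidates to *.log files in
# `dir`, then the regex checks the exact shape of each file name.
def find_logs(dir)
  Dir[File.join(dir, "*.log")].select { |path| LOG_NAME.match?(File.basename(path)) }
end

# Usage, e.g. in a test:
#   logs = find_logs("public")
#   contents = File.read(logs.first) unless logs.empty?
```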

Dir[] takes a file glob so, if your pattern isn't too complicated, you can just do:
Dir['public/abc123_*.log']
See the Ruby documentation for Dir.glob for more on glob syntax.
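The glob alone can also pin down the timestamp: in a Ruby glob, [0-9] matches a single digit, so repeating it 14 times matches the 14-digit timestamp without any regex. A small sketch; TIMESTAMP_GLOB and log_paths are illustrative names:

```ruby
# "[0-9]" matches one digit in a glob; 14 of them cover %Y%m%d%I%M%S.
TIMESTAMP_GLOB = "[0-9]" * 14

# Hypothetical helper returning every abc123_<timestamp>.log file in `dir`.
def log_paths(dir)
  Dir[File.join(dir, "abc123_#{TIMESTAMP_GLOB}.log")]
end
```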

File is for reading one file. You need to use Dir to find files by name.
files = Dir['*'].select {|x| x =~ /_.*(css|scss|sass)/ }
If there are several matches and you just want the last one (in sorted order):
files = Dir['*'].select {|x| x =~ /_.*(css|scss|sass)/ }.sort.last
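Note that sort.last returns the newest file only if the names sort chronologically. With the question's %I (12-hour clock) they do not: a file stamped at 1 PM (...010000) sorts before one stamped at 11 AM (...110000) on the same day. Assuming the files' modification times reflect when they were written, comparing File.mtime avoids depending on the name. newest_log is an illustrative helper:

```ruby
# Hypothetical helper: pick the matching log with the latest modification
# time instead of the one whose name happens to sort last.
def newest_log(dir)
  Dir[File.join(dir, "abc123_*.log")].max_by { |path| File.mtime(path) }
end
```

Alternatively, formatting the timestamp with %H (24-hour clock) instead of %I makes the fixed-width names sort chronologically, so sort.last works again.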

Related

Apache Beam ReadFromText() pattern match returns no results

I'm writing an Apache Beam pipeline in Python and trying to load multiple text files, but I get an error when using a pattern match. When I pass in an exact filename, the pipeline runs correctly.
For example:
files = p | 'Read' >> ReadFromText('lyrics.txt')
However, when using pattern match an error occurs:
files = p | 'Read' >> ReadFromText('lyrics*')
IOError: No files found based on the file pattern
In this example, I have several files that start with "lyrics".
I've tried many different pattern types but haven't had any success with anything except passing the complete file name. Is there a different way to apply pattern match in this case?
Updated with answer
If you're on Windows, don't forget to use a backslash instead of a forward slash when specifying directories, for example: ReadFromText('.\lyrics*')
This looks like a bug. I've filed https://issues.apache.org/jira/browse/BEAM-7560. In the meantime, try an absolute path or ReadFromText('./lyrics*').

Can I set directory pattern when using TAILDIR source on Apache Flume?

I'm using Flume 1.8.0.
The documentation says that I cannot set a directory pattern:
(Regular expression (and not file system patterns) can be used for filename only.)
But I need a directory pattern to collect logs from another system, which is controlled by another team.
Is there some way to set a directory path pattern that matches paths like /dir/201801/0101.log, /dir/201802/0001.log, ... ?
Use something like this for the file groups, i.e. a regular expression pattern (see https://en.wikipedia.org/wiki/Regular_expression for more details):
a1.sources.r1.filegroups.f2 = /path/to/files/with/pattern/databundle_cnt_[0-9]{4}-[0-9]{2}-[0-9]{2}.csv
In your case I would suggest:
a1.sources.r1.filegroups.f2 = /dir/[0-9]{6}/[0-9]{4}.log
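To sanity-check a pattern like this before putting it in the agent config, you can try it against the example paths from the question. The sketch below does that in Ruby; Flume is a Java application, and the constructs used here ([0-9], {n}, \.) mean the same thing in java.util.regex syntax. Two hedges: the . before log is escaped here, because an unescaped . matches any character; and this only checks the regex itself, since the Flume 1.8 documentation quoted above says regular expressions apply to the file name only. FLUME_PATTERN and matches_flume_pattern? are illustrative names:

```ruby
# Suggested pattern, anchored to the whole path, with the "." before "log"
# escaped so it only matches a literal dot.
FLUME_PATTERN = %r{\A/dir/[0-9]{6}/[0-9]{4}\.log\z}

# Hypothetical helper: true if `path` matches the pattern exactly.
def matches_flume_pattern?(path)
  FLUME_PATTERN.match?(path)
end

%w[/dir/201801/0101.log /dir/201802/0001.log /dir/2018/0101.log].each do |path|
  puts "#{path}: #{matches_flume_pattern?(path)}"
end
```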

Ruby glob: Exclude dir and file

For the import of my Sass files, I use sass-rails' (https://github.com/rails/sass-rails) glob feature. It says
Any valid ruby glob may be used
I want to exclude a directory and a file when using @import. Ruby code that uses blocks doesn't work in this scenario, but even trying to exclude a single file doesn't work the way I want.
Consider this tree structure
/_bar.scss
/_foo.scss
/all.scss
For example, I want to exclude the file _foo.scss. I read at https://stackoverflow.com/a/27707682/228370 that you can negate a pattern with !.
I tried the following:
Dir["{[!_foo]}*.scss"]
=> ["all.scss"]
But this skips _bar.scss. When looking into the glob reference of Ruby (http://ruby-doc.org/core-2.2.0/Dir.html#method-c-glob) it becomes clear why:
[set]
Matches any one character in set. Behaves exactly like character sets in Regexp, including set negation ([^a-z]).
(apparently, negation can be achieved with ! AND ^)
Because the set contains an underscore, [!_foo] matches any single character except _, f and o, so every file whose name starts with an underscore gets excluded.
So what would be the solution for excluding one fixed file?
There's probably a regex way of doing it. But if you're talking about one specific file, it might be easier to just do:
Dir["*.scss"].reject { |i| i == '_foo.scss' }
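Outside of the sass-rails @import (where, as the question notes, blocks aren't available), the same reject idea extends to excluding a directory too. A sketch assuming a directory named vendor should be skipped (a hypothetical name) and that paths are compared relative to the root; the base: keyword of Dir.glob requires Ruby 2.5 or later:

```ruby
# Hypothetical defaults: skip the file _foo.scss and everything under vendor/.
def scss_files(root, exclude_files: ["_foo.scss"], exclude_dirs: ["vendor"])
  # base: makes the glob return paths relative to root, e.g. "sub/y.scss";
  # "**/" also matches zero directories, so top-level files are included.
  Dir.glob("**/*.scss", base: root).reject do |path|
    exclude_files.include?(path) ||
      exclude_dirs.any? { |dir| path.start_with?("#{dir}/") }
  end
end
```

For a single file, array subtraction (Dir["*.scss"] - ["_foo.scss"]) gives the same result as the reject above.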

rename file name with eloquent way

File.rename(blog_path + '/' + project_path, File.expand_path(topic_name, blog_path))
I use this code to rename a file in Ruby, but I think there is a better way to write it with less code, since it includes blog_path twice.
The code is OK, but I think there is no need for expand_path here; that method builds an absolute path from a relative one.
It is also better to build paths with File.join than to concatenate them with a slash, because File.join handles the separators for you (File.join("a/", "b") returns "a/b"). So I would write your code like this:
File.rename(File.join(blog_path, project_path), File.join(blog_path, topic_name))
Or if you want to get rid of doubled blog_path, change working directory before doing a rename:
Dir.chdir(blog_path)
File.rename(project_path, topic_name)
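Keep in mind that a bare Dir.chdir changes the working directory for the rest of the process. The block form restores the previous directory when the block returns, which is usually safer. A sketch of both variants; rename_with_join and rename_in_dir are illustrative names:

```ruby
# Build both paths with File.join instead of concatenating "/" by hand.
def rename_with_join(blog_path, project_path, topic_name)
  File.rename(File.join(blog_path, project_path), File.join(blog_path, topic_name))
end

# Block form of Dir.chdir: the previous working directory is restored when
# the block returns, so the rest of the program is unaffected.
def rename_in_dir(blog_path, project_path, topic_name)
  Dir.chdir(blog_path) { File.rename(project_path, topic_name) }
end
```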
You can find more on working with files and directories in Ruby in the article Ruby for Admins: Files and Directories.

Slash at the end of url

I think (correct me if I am wrong) that it is better to put a / at the end of most URLs, like this: http://www.myweb/file/
and not to put a / at the end of filenames: http://www.myweb/name.html
I have to fix this on a website with a lot of links. Is there a quick way to do it? For instance, some programs like Dreamweaver have find and replace.
The second case is quite easy with Dreamweaver:
- Find: .html/"
- Replace: .html"
But how can I say something like:
- Find: all the links that end with a directory. Like http://www.myweb/file
- Replace: the same link but with a / at the end. Like http://www.myweb/file/
Your approach may work but it is based on the assumption that all files have a file extension.
There is a distinct difference between the URLs http://www.myweb/file and http://www.myweb/file/, because the latter could resolve to http://www.myweb/file/index.php, or to any other default index page configured in your web server. The URL without the slash could also reference a perfectly valid file that has no file extension, such as a REST endpoint.
So you are correct insofar as you should explicitly add a "/" if you are referring to a directory, for example if you are expecting the web server to look up the correct index page to respond, or doing a directory listing.
To replace the incorrect URLs, regular expressions are your friend.
To find all files which have an erroneous "/" you could use /\.(html|php|jpg|png)\//, adding as many different file extensions into that pipe-separated list as you like. You can then replace that with .$1 or .\1 depending on your tool.
An example of doing this with Perl would be:
perl -pi -e 's/\.(html|php|jpg|png)\//.\1/g' theFileYouWantToCheck.html
Or (if you're using a Linux-based system) you can automate that nicely with find:
find path/to/html/root -type f -name "*.html" -print0 | xargs -0 perl -pi -e 's/\.(html|php|jpg|png)\//.\1/g'
which finds all .html files under that directory (recursively) and does an in-place find and replace. Assuming you're using version control, it's then easy to review the changes it applied :)
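If you would rather stay in Ruby, the same substitution can be written with String#gsub and applied to every .html file under a directory. A sketch with the same extension list as the Perl version (extend it as needed); strip_file_slashes and strip_file_slashes_in_tree are illustrative names:

```ruby
# Same pattern as the Perl one-liner: a "/" directly after a file extension.
TRAILING_SLASH_AFTER_EXT = %r{\.(html|php|jpg|png)/}

# Drop that "/". In a single-quoted replacement string, \1 refers to the
# captured extension.
def strip_file_slashes(text)
  text.gsub(TRAILING_SLASH_AFTER_EXT, '.\1')
end

# Rewrite every .html file under root (recursively), in place, but only
# touch files whose contents actually change.
def strip_file_slashes_in_tree(root)
  Dir.glob("**/*.html", base: root).each do |rel|
    path = File.join(root, rel)
    original = File.read(path)
    updated = strip_file_slashes(original)
    File.write(path, updated) unless updated == original
  end
end
```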
Update
Solving the problem of adding a slash to directories isn't trivial. The approach I'd take:
- Write a script that recurses through your website structure locally, making a list of all files.
- Parse the HTML files to extract every href=".*" and replace it with href=".*/" only if the URL does not point to one of the files in that list.
Any text-based find and replace is not going to be aware of whether the link is actually to a file or not.
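A minimal Ruby sketch of that two-step approach, under loudly simplifying assumptions: URLs are site-root-relative (href="/..."), double-quoted, and have no query string or fragment; a regex stands in for a real HTML parser; and site_files and add_directory_slashes are hypothetical helpers:

```ruby
require "set"

# Step 1: every file under `root`, as a site-root-relative URL path
# (e.g. "/name.html"). Directories themselves are not included.
def site_files(root)
  Dir.glob("**/*", base: root)
     .select { |rel| File.file?(File.join(root, rel)) }
     .map { |rel| "/#{rel}" }
end

# Only double-quoted, site-root-relative hrefs without "?" or "#" match.
HREF = %r{href="(/[^"?#]*)"}

# Step 2: append "/" to each matching href that neither ends in "/" already
# nor names a file from step 1.
def add_directory_slashes(html, files)
  known = files.to_set
  html.gsub(HREF) do
    url = Regexp.last_match(1)
    if url.end_with?("/") || known.include?(url)
      %(href="#{url}")
    else
      %(href="#{url}/")
    end
  end
end
```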
