Is it safe to split a path on '/' to parse it? - path

I am trying to get all directories in a path. For example, from a/b/c/d.e I would like to get a, a/b, and a/b/c. I can achieve this by calling functions like posixdirname several times. The problem is with paths like a/b/c/. I would like to get a, a/b, and a/b/c. Since there is a / following c, c should be a directory that I would like to list. But functions like dirname return a/b instead of a/b/c when given a/b/c/ as input.
Can I just split on / to get the list of directories or is there an edge case where this wouldn't work?

The only special cases I know of are:
a file name at the end
a / at the end
a folder with a file-like name such as a.e (which may or may not be at the end)
symbolic links
Windows paths (which use \, though I don't think that applies in your case)
user input errors such as repeated slashes, e.g. /tmp//something, which some programs accept
If your function handles all of the above cases, I think it is fine.
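As a rough illustration of the split-based approach, here is a minimal Python sketch. The function name directory_prefixes is hypothetical, and the sketch assumes POSIX-style paths only: it tolerates trailing and repeated slashes, but it does not resolve symbolic links or "." and ".." components, and it treats the last component as a file unless the path ends in /.

```python
def directory_prefixes(path):
    """Return every directory prefix of a '/'-separated path.

    The last component counts as a directory only when the path ends in '/'.
    """
    is_absolute = path.startswith("/")
    ends_with_slash = path.endswith("/")
    # Drop empty components produced by leading, trailing, or repeated slashes.
    parts = [p for p in path.split("/") if p]
    if not ends_with_slash:
        parts = parts[:-1]  # the final component is treated as a file name
    prefixes = []
    for i in range(1, len(parts) + 1):
        prefix = "/".join(parts[:i])
        prefixes.append("/" + prefix if is_absolute else prefix)
    return prefixes
```

With this sketch, both a/b/c/d.e and a/b/c/ give a, a/b, and a/b/c, which is the behaviour the question asks for.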

Related

Can I expand glob in context of a macro?

let's say that I want to generate multiple rules in a macro based on repository contents - something like:
def mymacro(dests):
    for d in dests:
        myrule(name = d, ...)

# in a BUILD:
mymacro(dests = glob(["some/pkg/path/**"]))
So far, I've always gotten an empty list when I try this (although the path has many entries). Is such a thing possible or am I doing something wrong?
Using glob like that should work fine. Glob also works within a macro (though you have to do native.glob(...))
The glob pattern is probably just not matching anything. Glob will happily return an empty list if the pattern matches nothing (you can pass allow_empty=False to change that behavior).
Note that glob will not traverse into subpackages. So with your example some/pkg/path/**, if there's a BUILD file in one of the subdirectories (some/BUILD, some/pkg/BUILD, some/pkg/path/BUILD, etc.) then glob won't look for anything in that subdirectory. To make files visible from one package to another, you'll typically have a filegroup (maybe with its own glob) in one package that another package depends on.

Can I avoid hardcoding file locations in SPSS syntax?

I'm using SPSS 25 syntax to open and process a set of datafiles. I would like these syntax files to be as portable as possible. For that reason, I want the user to be able to select the file locations at runtime without having to recode the syntax itself.
I'm running Windows 10, although hopefully that doesn't matter. I do have the Python plugin for SPSS, although ideally this would be a base SPSS syntax solution.
In SPSS right now, I'm doing this:
GET
FILE='C:\Users\xkcd\studies\project\rawdata'+
'\reallyraw\veryraw.sav'
PASSWORD='CorrectHorseBatteryStaple'.
DATASET NAME Demo WINDOW=FRONT.
In R, I would do this:
message("Where is the veryraw.sav file?")
demo<-fread(file.choose())
Ideally, the user would, at runtime, select the individual files one at a time.
Less ideally, the user would select a folder in which all of the files, with known names, are located.
I could use FILE HANDLE so that the user would only have to hardcode a few folder locations, but that's less than ideal - I really would rather that the user isn't editing the syntax at all.
Thanks in advance!
Following up on the idea of a fully automated process - the following code will work assuming there is a specific file name you need to run your code on, and only one copy exists in the folder you are searching. This is possible to run on drive C: directly, but will take much less time to run if you can narrow down the path:
* this will create a text file that has the path of the required file.
HOST COMMAND=['dir /s /b "C:\Users\somename\*required file name.sav" > C:\Users\somename\tempname.sps'].
* now read the name and put it in a handle.
DATA LIST file = "C:\Users\somename\tempname.sps" fixed / pth 1-500 (a).
exe.
string cmd(a500).
compute cmd=concat("file handle myfile / name='", rtrim(pth), "'.").
write out="C:\Users\somename\tempname.sps" /cmd.
exe.
* inserting the new syntax will activate the handle.
insert file = "C:\Users\somename\tempname.sps".
Now you can use the handle myfile in the syntax, e.g:
get file=myfile.

Slash at the end of url

I think (correct me if I am wrong) that it is better to put a / at the end of most URLs. Like this: http://www.myweb/file/
And not to put a / at the end of filenames: http://www.myweb/name.html
I have to correct that in a website with a lot of links. Is there a fast way to do that? For instance, in some programs like Dreamweaver I can use find and replace.
The second case is quite easy with Dreamweaver:
- Find: .html/"
- Replace: .html"
But how can I say something like:
- Find: all the links that end with a directory. Like http://www.myweb/file
- Replace: the same link but with a / at the end. Like http://www.myweb/file/
Your approach may work but it is based on the assumption that all files have a file extension.
There is a distinct difference between the urls http://www.myweb/file and http://www.myweb/file/ because the latter could resolve to http://www.myweb/file/index.php, or any other in the default set configured in your web server. That URL could also reference a perfectly valid file which doesn't contain a file extension, such as if it were a REST endpoint.
So you are correct insofar as you should explicitly add a "/" if you are referring to a directory, for example if you are expecting the web server to look up the correct index page to respond, or doing a directory listing.
To replace the incorrect URLS, regular expressions are your friend.
To find all files which have an erroneous "/" you could use /\.(html|php|jpg|png)\//, adding as many different file extensions into that pipe-separated list as you like. You can then replace that with .$1 or .\1 depending on your tool.
An example of doing this with Perl would be:
perl -pi -e 's/\.(html|php|jpg|png)\//.\1/g' theFileYouWantToCheck.html
Or (if you're using a Linux-based system) you can automate that nicely with find:
find path/to/html/root -type f -name "*.html" | xargs perl -pi -e 's/\.(html|php|jpg|png)\//.\1/g'
which will find all html files in the directory and do an inline find and replace. Assuming you're using version control, it's then easy to see the changes it's applied :)
Update
Solving the problem of adding a slash to directories isn't trivial. The approach I'd take:
- Write a script to recurse through your website structure locally, making a list of all files
- Parse the HTML files to extract all href=".*" and replace them with href=".*/" only if the end of the URL isn't present in the list extracted by the first script.
Any text-based find and replace is not going to be aware of whether the link is actually to a file or not.
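The two steps above could be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the function names list_site_files and add_directory_slashes are hypothetical, files are recorded as site-relative paths with a leading slash (e.g. /name.html), every double-quoted href in the input is assumed to point into your own site, and links with a query string or fragment are left alone.

```python
import os
import re

# Matches double-quoted href attributes and captures the URL.
HREF_RE = re.compile(r'href="([^"]*)"')


def list_site_files(root):
    """Collect site-relative paths such as '/name.html' for every file under root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.add("/" + rel.replace(os.sep, "/"))
    return found


def add_directory_slashes(html, known_files):
    """Append '/' to href targets whose end does not match any known file."""
    known = tuple(known_files)

    def fix(match):
        url = match.group(1)
        # Skip empty links, links already ending in '/', and links
        # carrying a query string or fragment.
        if not url or url.endswith("/") or "?" in url or "#" in url:
            return match.group(0)
        # Skip links that end in the path of a real file.
        if url.endswith(known):
            return match.group(0)
        return 'href="' + url + '/"'

    return HREF_RE.sub(fix, html)
```

Because mailto: links and links to other sites would also gain a slash here, you would want to filter those out before running something like this on a real site, and review the result in version control.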

Matlab 'addpath/rmpath' not working in my case

Let me explain my situation with some dummy file names.
I am working in directory 'A' which has a sub directory 'a'. I am running a function 'func1' which is present in both folders. 'func1' needs 'file1' & 'file2' during its execution. 'file1' & 'file2' are present in both folders with some parameters changed inside them. It is not possible for me to change file names at all.
Now, the problem is that when I am running 'func1' in 'A', everything is working fine. But, when I run 'func1' in 'a' using 'addpath/rmpath', rather than using 'file1' & 'file2' from 'a', it is using 'file1' & 'file2' from 'A' which is producing wrong results.
Please tell me how I can change the path so that when I run 'func1' in sub directory 'a', it always uses 'file1' & 'file2' from 'a' rather than from directory 'A'.
I hope I am clear in my explanation :S
If I have understood correctly, you are hoping that if you use addpath to add the subdirectory to the search path, Matlab will give the search path precedence over the current directory. Unfortunately, it is precisely the other way around, as per the Matlab documentation: "Functions in the current folder take precedence over functions with the same file name that reside anywhere on the search path." - and this also applies to the load function when reading data files. (incidentally, I suspect that for this reason you are also not running the version of func1 that you think you are running - try typing which func1 to find out).
Anyway, the solution here is to make sure that Matlab picks the right version of file1 and file2, which you could do in several ways:
Change your working directory to a, since the working directory has precedence: cd a
Put the two versions into separate subfolders, e.g. a and b, and use addpath to add them separately
Change the different versions of func1 to have explicit references to the files, i.e. load('./a/file1')
With addpath and rmpath you modify the search path in Matlab. Your search path basically is a list of folders where Matlab looks for functions. Not for files you want to open.
If you have your files in folder A and this is your current working directory, Matlab will look for the files in A. If you change to a and change your working directory accordingly, Matlab will open the files in a - this has nothing to do with your search path. If you want to open files from a specific directory, use the entire path in the open command:
fileID = fopen('/path/to/A/file1');
In your case, it may be that fopen is called with a full path, as shown above. If you want Matlab to always open files from the current working directory, change it to:
fileID = fopen('file1');

Compose path (with boost::filesystem)

I have a file that describes input data, which is split into several other files. In my descriptor file, I first give the path A that tells where all the other files are found.
The originator may set either a relative (to location of the descriptor file) or absolute path.
When my program is called, the user gives the name of the descriptor file. It may not be in the current working directory, so the filename B given may also contain directories.
For my program to always find the input files at the right places, I need to combine this information. If the path A given is absolute, I need to use just that one. If it is relative, I need to append it to the path B (i.e. the directory portion of the filename).
I thought boost::filesystem::complete might do the job for me. Unfortunately, it seems it does not. I also did not understand how to test whether a given path is absolute or not.
Any ideas?
Actually I was quite misguided first but now found the solution myself. When "base" holds the path A, and filename holds B:
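boost::filesystem::path basepath(base), filepath(filename);
// is_complete() and remove_leaf() are the older names of is_absolute() and remove_filename().
if (!basepath.is_complete())
    basepath = filepath.remove_leaf() /= basepath;  // directory of B, then A appended
base = basepath.string();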
It works with Linux at least (where it would be very easy to do without boost, but oh well..), still have to test with Windows.
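For comparison, the same resolution rule can be sketched outside Boost. The Python below is only an illustration, not the answer's code: the function name resolve_base is hypothetical, and PurePosixPath is used so that the example treats paths the POSIX way on any operating system (so it says nothing about the Windows case).

```python
from pathlib import PurePosixPath


def resolve_base(base, descriptor_filename):
    """Resolve path A (base) against the directory of descriptor file B."""
    base_path = PurePosixPath(base)
    if base_path.is_absolute():
        # An absolute A is used as-is.
        return str(base_path)
    # A relative A is appended to the directory portion of B.
    return str(PurePosixPath(descriptor_filename).parent / base_path)
```

Note that this sketch does not collapse ".." components; a relative A such as ../raw simply ends up appended to the descriptor's directory.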
