What's a "canonical path"?

So, an absolute path is a way to get to a certain file or location by describing the full route to it, and it's OS dependent (absolute paths on Windows and Linux, for example, look different). A relative path, on the other hand, is a route to a file or location described from the current location, with .. (two dots) indicating the parent level in the directory tree. That has been clear to me for several years now.
When searching, I've even seen that there are canonicalized files too!
All I know is that "canonical" means something like "according to the rules".
Can somebody enlighten me in terms of theory about canonical stuff?

The whole point of making anything "canonical" is so that you can compare two things. For example, both ../../here/bar/x and ./test/../../bar/x may refer to the same location (they do when the parent of the current directory is named here), but you can't do a textual comparison on the two paths. However, if you turn them into their canonical representation, they both become ../bar/x, and we see that they actually refer to the same thing.
In short, it is often the case that you have many ways of referring to one thing, and in that case you may be able to define a canonical representation which is unique and which allows you to get a handle on collections of such things.
(If you're looking for more examples, all of mathematics is full of "canonical" constructions for all sorts of objects, and very much with the same purpose in mind. Maybe this Wikipedia article can provide some additional directions.)
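The comparison above can be sketched with Python's posixpath module. The current directory /a/here/c is a hypothetical choice whose parent is named here, which is what makes the two paths coincide; normpath is purely lexical and does not look at the disk or resolve symlinks.

```python
import posixpath

def canonical(path, cwd):
    """Lexically canonicalize `path` against `cwd` (no symlink resolution)."""
    return posixpath.normpath(posixpath.join(cwd, path))

cwd = "/a/here/c"  # hypothetical current directory whose parent is named "here"
p1 = "../../here/bar/x"
p2 = "./test/../../bar/x"

print(p1 == p2)  # False: a textual comparison fails
c1, c2 = canonical(p1, cwd), canonical(p2, cwd)
print(c1, c2, c1 == c2)  # /a/here/bar/x /a/here/bar/x True
print(posixpath.relpath(c1, cwd))  # ../bar/x, the relative form used above
```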

A good way to define a canonical path would be: the shortest absolute path (shortest in terms of string length).
This is an example of the difference between an absolute path and a canonical path:
absolute path: C:\abc\..\abc\file.txt
canonical path: C:\abc\file.txt
Canonicalization is a type of normalization which allows an object to be identified in a unique way. A relative path cannot do it, by definition.
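As a rough illustration, Python's ntpath module applies Windows path rules on any OS. Its normpath collapses abc\.. lexically; it does not consult the file system, so on a real Windows machine os.path.realpath would additionally be needed to resolve links. The relative path used in the last line is a made-up example showing that such a path is not anchored anywhere on its own.

```python
import ntpath

absolute = r"C:\abc\..\abc\file.txt"
canonical = ntpath.normpath(absolute)  # lexical: "abc\.." cancels out
print(canonical)                       # C:\abc\file.txt

# A relative path has no fixed anchor, so it cannot identify a file by itself.
print(ntpath.isabs(r"..\abc\file.txt"))  # False
```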
For more info:
https://en.wikipedia.org/wiki/Canonicalization
https://en.wikipedia.org/wiki/Canonical_form

What a canonical path is (or its difference from an absolute path) is system dependent.
Typically, if a (full) path contains aliases, shortcuts or symbolic links, the canonical path resolves all of these into the actual directories they refer to.
Example: if /bin/a is a symlink, /bin/a is what you get whenever you request an absolute path, e.g. from java.io.File#getAbsolutePath, while the real file (i.e. the actual target of the link), /usr/local/bin/a, is what is returned as the canonical path, e.g. from java.io.File#getCanonicalPath.
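The same contrast can be sketched in Python on a Unix-like system, using os.path.abspath and os.path.realpath as rough analogues of the two Java methods (they are not identical APIs). The layout is a hypothetical copy of the example, built inside a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: <root>/bin/a is a symlink to <root>/usr/local/bin/a.
    real = os.path.join(root, "usr", "local", "bin", "a")
    link = os.path.join(root, "bin", "a")
    os.makedirs(os.path.dirname(real))
    os.makedirs(os.path.dirname(link))
    open(real, "w").close()
    os.symlink(real, link)

    absolute = os.path.abspath(link)    # like getAbsolutePath: the link's own path
    canonical = os.path.realpath(link)  # like getCanonicalPath: the link resolved
    target = os.path.realpath(real)     # the real file, for comparison
    print(absolute)
    print(canonical)
```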

A good definition of a canonical path is given in the documentation of readlink in GNU Coreutils. It is specified that 'Canonicalize mode' returns an equivalent path that doesn't have any of these things:
hard links to self (.) and parent (..) directories
repeated separators (/)
symbolic links
The string length is irrelevant, as is demonstrated in the following example.
You can experiment with readlink -f (canonicalize mode) or its preferred equivalent command realpath to see the difference between an 'absolute path' and a 'canonical absolute path' for some programs on your system if you are running Linux or are using GNU Coreutils.
I can get the path of 'java' on my system using which:
$ which java
/usr/bin/java
This path, however, is actually a symbolic link to another symbolic link. This symbolic link chain can be displayed using namei.
$ namei $(which java)
f: /usr/bin/java
 d /
 d usr
 d bin
 l java -> /etc/alternatives/java
   d /
   d etc
   d alternatives
   l java -> /usr/lib/jvm/java-17-openjdk-amd64/bin/java
     d /
     d usr
     d lib
     d jvm
     d java-17-openjdk-amd64
     d bin
     - java
The canonical path can be found using the previously mentioned realpath command.
$ realpath $(which java)
/usr/lib/jvm/java-17-openjdk-amd64/bin/java
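To see the hop-by-hop chain from a program, here is a small sketch built on os.readlink. The helper link_chain is hypothetical; unlike namei it only follows links in the final path component (not in intermediate directories), and the three files are temporary stand-ins for /usr/bin/java, /etc/alternatives/java and the JDK binary.

```python
import os
import tempfile

def link_chain(path):
    """Follow a symlink chain hop by hop (final component only)."""
    chain = [path]
    seen = set()
    while os.path.islink(path):
        if path in seen:
            raise RuntimeError(f"symlink loop at {path}")
        seen.add(path)
        target = os.readlink(path)
        # A relative target is interpreted relative to the link's own directory.
        path = os.path.join(os.path.dirname(path), target)
        chain.append(path)
    return chain

with tempfile.TemporaryDirectory() as root:
    # Hypothetical stand-ins for the java symlink chain shown above.
    real = os.path.join(root, "jvm", "bin", "java")
    alt = os.path.join(root, "alternatives", "java")
    usr_bin = os.path.join(root, "bin", "java")
    for p in (real, alt, usr_bin):
        os.makedirs(os.path.dirname(p), exist_ok=True)
    open(real, "w").close()
    os.symlink(real, alt)
    os.symlink(alt, usr_bin)

    chain = link_chain(usr_bin)
    for hop in chain:
        print(hop)
    # realpath does the whole resolution in one call, like the realpath command.
    same_file = os.path.realpath(usr_bin) == os.path.realpath(real)
    print(same_file)  # True
```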

Most issues with canonical paths occur when you pass the name of a directory rather than a file. For a file, an absolute path (without symlinks, . or .. components) is also the canonical path. For a directory, however, the canonical form also omits the trailing "/". For example, "/var/tmp/foo" is a canonical path while "/var/tmp/foo/" is not.
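A quick way to check the trailing-slash point is Python's posixpath.normpath, which strips the trailing "/" along with repeated separators and "." components. This is lexical normalization only; symlinks are not resolved.

```python
import posixpath

samples = ["/var/tmp/foo", "/var/tmp/foo/", "/var//tmp/./foo/"]
normalized = [posixpath.normpath(p) for p in samples]
for before, after in zip(samples, normalized):
    print(f"{before!r} -> {after!r}")  # every entry becomes '/var/tmp/foo'
```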

Related

What do the last two entries of PATH tell you?

I'm still learning how PATH works. What exactly is echo $PATH's output meant to be? What do the entries tell us?
What exactly is echo $PATH's output meant to be?
The current path.
What do the last two entries tell you?
The last two directories on your path.
In other words, it's unclear what you're asking. Do you not know what a path is? Are you asking what a particular sequence of special characters mean in the context of a path? What is the explicit path you're asking about?
The $PATH variable is simply a list of directories that your system checks, in order, whenever you run a command in, say, your bash terminal.
PATH is a list separated by colons (:) on Unix-like systems or by semicolons (;) on Windows.
Whenever you run a command like ls -lrt, your system looks for the definition of the command (or function) ls, searching the PATH directories for an executable of that name. With the PATH variable defined, we can answer your two questions:
what exactly is echo $PATH's output meant to be
echo $PATH prints that list of directories.
What do the last two entries of PATH tell you?
PATH is a list of all the directories where your system will look for a command, so the last two entries are simply the last two directories in that list, i.e. the last ones searched.
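The lookup described above can be sketched as follows. find_command and last_two_entries are hypothetical helpers, and the sketch ignores shell aliases, functions, builtins and Windows' PATHEXT; Python's standard library already offers shutil.which for real use.

```python
import os
import tempfile

def find_command(name, path_value, pathsep=os.pathsep):
    """Return the first executable named `name` in a PATH-style string, or None."""
    for directory in path_value.split(pathsep):
        # On Unix-like systems an empty PATH entry means the current directory.
        candidate = os.path.join(directory or os.curdir, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

def last_two_entries(path_value, pathsep=os.pathsep):
    """The last two directories in the list, i.e. the last ones searched."""
    return path_value.split(pathsep)[-2:]

print(last_two_entries("/usr/local/bin:/usr/bin:/bin", ":"))  # ['/usr/bin', '/bin']

with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    # Put a hypothetical "mytool" in both directories; the earlier entry wins.
    for d in (first, second):
        tool = os.path.join(d, "mytool")
        with open(tool, "w") as f:
            f.write("#!/bin/sh\n")
        os.chmod(tool, 0o755)
    fake_path = os.pathsep.join([first, second])
    found = find_command("mytool", fake_path)
    print(found == os.path.join(first, "mytool"))  # True
```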

What is the conceptual difference between bin and gen?

For ordinary rules, output gets written to the bin directory. Genrule output, for example, is written to the genfiles directory. While this is not surprising given the latter's name, I wonder why there are two directories and what the conceptual difference is.
There isn't a particularly good reason (and you can actually write genrules' output to bin with the output_to_bindir attribute and put Skylark outputs anywhere you want).
It's just historical. There are actually a couple of other output directories like those (e.g., testlogs, include); bin and genfiles are just the most common.

TFS drop, exclude obj folder using minimatch pattern

I'm setting up TFS 2015 on-prem and I'm having an issue on my last build step, Publish Build Artifacts. For some reason, the build agent appears to be archiving old binaries and I'm left with a huge filepath:
E:\TFSBuildAgent\_work\1a4e9e55\workspace\application\Development\project\WCF\WCF\obj\Debug\Package\Archive\Content\E_C\TFSBuildAgent\_work\1a4e9e55\workspace\application\Development\project\WCF\WCF\obj\Debug\Package\PackageTmp\bin
I'm copying the files using the example minimatch pattern to begin with:
**\bin
I'm only testing at the moment so this is not a permanent solution but how can I copy all binaries that are in a bin folder but not a descendant of obj?
From research I think that this should work, but it doesn't (It doesn't match anything):
**!(obj)**\bin
I'm using www.globtester.com to test. Any suggestions?
On a separate note, I'll look into the archiving issue later but if anyone has any pointers on it, feel free to comment. Thanks
In VSTS there are two kinds of file path pattern matching built into the SDKs. Most tasks nowadays use the Minimatch pattern as described in Matt's answer. However, some use the pattern that was used by the 1.x Agent's PowerShell SDK. That format is still available in the 2.x Agent's PowerShell SDK, by the way.
So that means there are 5 kinds of tasks:
1.x agent - Powershell SDK
2.x agent - Node SDK
2.x agent - Powershell 1 Backwards compatibility
2.x agent - Powershell 3 SDK - Using find-files
2.x agent - Powershell 3 SDK - Using find-match
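Of these, the 1.x agent PowerShell SDK, the PowerShell 1 backwards compatibility mode and the PowerShell 3 SDK's find-files don't use Minimatch, but the format documented in the VSTS-Task-SDK's find-files method.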
The original question was posted in 2015, at which point in time the 2.x agent wasn't yet around. In that case, the pattern would, in all likelihood, be:
**\bin\$(BuildConfiguration)\**\*;-:**\obj\**
The -: prefix excludes matching items from the set selected by the patterns in front of it.
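The include-then-exclude idea can be sketched with Python's fnmatch. This is not the find-files or Minimatch implementation: in fnmatch, * also matches "/", so here it plays the role of **. The helper select and the sample paths (forward slashes, shortened from the question) are hypothetical.

```python
from fnmatch import fnmatchcase

def select(paths, includes, excludes):
    """Keep paths matching any include pattern and no exclude pattern."""
    return [p for p in paths
            if any(fnmatchcase(p, pat) for pat in includes)
            and not any(fnmatchcase(p, pat) for pat in excludes)]

paths = [
    "WCF/bin/WCF.dll",
    "WCF/obj/Debug/Package/PackageTmp/bin/WCF.dll",  # a bin folder under obj
    "Web/bin/Web.dll",
]
print(select(paths, includes=["*/bin/*"], excludes=["*/obj/*"]))
# ['WCF/bin/WCF.dll', 'Web/bin/Web.dll']
```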
According to Microsoft's documentation, here is a list of file matching patterns you can use. The most important rules are:
Match with ?
? matches any single character within a file or directory name. As a pattern list, ?(hello|world) matches hello or world zero or one times.
Match with * or +
* matches zero or more characters within a file or directory name; as a pattern list, +(hello|world) matches one or more occurrences of hello or world.
Match with @ sign
@(hello|world) matches exactly once.
Match with parentheses (, ) and |
Inside parentheses, | is treated as a logical OR, e.g. *(hello|world) means "zero or more occurrences of hello or world".
Match with Double-asterisk **
** recursive wildcard. For example, /hello/**/* matches all descendants of /hello.
Exclude patterns with !
Leading ! changes the meaning of an include pattern to exclude. Interleaved exclude patterns are supported.
Character sets with [ and ]
[] matches a set or range of characters within a file or directory name.
Comments with #
Patterns that begin with # are treated as comments.
Escaping
Wrapping special characters in [] can be used to escape literal glob characters in a file name. For example the literal file name hello[a-z] can be escaped as hello[[]a-z].
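Several of these single-name rules can be tried out with Python's fnmatch as a stand-in. It is not Minimatch (there, ? and * also match "/", and pattern lists such as ?(a|b) are not supported), but ?, *, [] and the [[] escape behave the same way within a file name.

```python
from fnmatch import fnmatchcase

# (file name, pattern) pairs; fnmatch stands in for Minimatch here.
cases = [
    ("file1.txt", "file?.txt"),      # ? matches exactly one character
    ("file10.txt", "file?.txt"),     # ...so two characters do not match
    ("report.txt", "*.txt"),         # * matches zero or more characters
    ("b.log", "[a-c].log"),          # [] matches one character from a set or range
    ("hello[a-z]", "hello[[]a-z]"),  # [[] matches a literal "["
    ("helloa", "hello[[]a-z]"),      # ...so [a-z] is no longer a range
]
results = [fnmatchcase(name, pattern) for name, pattern in cases]
for (name, pattern), ok in zip(cases, results):
    print(f"{pattern:<14} {name:<12} {ok}")
```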
Example
The following expressions can be used in the Contents field of the "Copy Files" build step to create a deployment package for a web project:
**\?(*.config|*.dll|*.sitemap)
**\?(*.exe|*.dll|*.pdb|*.xml|*.resx)
**\?(*.js|*.css|*.html|*.aspx|*.ascx|*.asax|*.Master|*.cshtml|*.map)
**\?(*.gif|*.png|*.jpg|*.ico|*.pdf)
Note: You might need to add more extensions, depending on the needs of your project.
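Since Python's fnmatch has no ?(a|b) syntax, the four patterns can be approximated by listing their alternatives and testing only the file name (the **\ prefix allows any depth). The helper in_package is hypothetical, the duplicated *.dll is listed once, and the case-insensitive comparison is an assumption about Windows behavior rather than something the task documentation states.

```python
from fnmatch import fnmatchcase
from pathlib import PureWindowsPath

# Alternatives from the four Copy Files patterns above.
PATTERNS = [
    "*.config", "*.dll", "*.sitemap",
    "*.exe", "*.pdb", "*.xml", "*.resx",
    "*.js", "*.css", "*.html", "*.aspx", "*.ascx", "*.asax", "*.Master", "*.cshtml", "*.map",
    "*.gif", "*.png", "*.jpg", "*.ico", "*.pdf",
]

def in_package(path):
    """True if the file name matches one of the alternatives (case-insensitively)."""
    name = PureWindowsPath(path).name.lower()
    return any(fnmatchcase(name, pattern.lower()) for pattern in PATTERNS)

for p in [r"src\Web\Web.config", r"src\Web\Views\Site.Master", r"src\Web\Web.csproj"]:
    print(p, in_package(p))  # True, True, False
```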

What do the last lines in Lua's `package.config` mean?

The Lua specs say about package.config (numbering added by me):
1. The first line is the directory separator string. Default is '\' for Windows and '/' for all other systems.
2. The second line is the character that separates templates in a path. Default is ';'.
3. The third line is the string that marks the substitution points in a template. Default is '?'.
4. The fourth line is a string that, in a path in Windows, is replaced by the executable's directory. Default is '!'.
5. The fifth line is a mark to ignore all text before it when building the luaopen_ function name. Default is '-'.
My paraphrasing:
1. Absolutely clear (the example for Windows/other systems makes it foolproof).
2. There can be multiple paths in a path string. They are separated by this symbol (; by default).
3. Wherever Lua finds this character in the path string (? by default), it will replace it with the module name supplied to the require or package.searchpath functions and check whether that file exists.
So far, so good, but the last two lines aren't entirely clear to me.
4. Why does it say "in a path in Windows"? Does that mean on other platforms, this doesn't have any significance? If so, why?
5. It took me a while to make sense of this, but eventually another part of the specs gave me a hint:
The name of this C function is the string "luaopen_" concatenated with a copy of the module name where each dot is replaced by an underscore. Moreover, if the module name has a hyphen, its prefix up to (and including) the first hyphen is removed. For instance, if the module name is a.v1-b.c, the function name will be luaopen_b_c.
So is this symbol (- by default) intended to make different versions of a library available at the same time – potentially with an unprefixed symlink to the newest version so that the same library would be accessible on two paths (i.e. under two module names), but with only one C symbol name?
4: On Linux, libraries are usually installed system-wide in fixed locations; on Windows, however, libraries are often installed alongside the application, so a path needs a way to refer to the executable's directory.
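To make lines 1 to 4 concrete, here is a rough Python sketch of how a package.path-style string could expand into candidate file names. candidate_files is a hypothetical helper, not Lua's code: it only lists candidates instead of checking that they exist, and replacing "!" with the executable's directory is an assumed model of what Lua does on Windows when it sets up its paths.

```python
def candidate_files(name, path, dirsep="/", pathsep=";", subst="?",
                    exec_mark="!", exe_dir=None):
    """List the file names a package.path-style string would make Lua try."""
    # Dots in the module name become directory separators ("foo.bar" -> "foo/bar").
    name = name.replace(".", dirsep)
    if exe_dir is not None:
        # Assumption: models Windows Lua replacing "!" with the executable's directory.
        path = path.replace(exec_mark, exe_dir)
    # Empty entries are skipped in this sketch.
    return [t.replace(subst, name) for t in path.split(pathsep) if t]

print(candidate_files("foo.bar", "./?.lua;/usr/local/share/lua/5.4/?/init.lua"))
print(candidate_files("socket.core", "!\\lua\\?.lua;!\\?.dll",
                      dirsep="\\", exe_dir="C:\\Lua"))
```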
5: Versioning and project forking, I believe, would be the reason behind this.
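The quoted naming rule is easy to express as a function, which also shows why versioned module names can share one C symbol, as the question suspects. luaopen_name is a hypothetical helper that follows the rule exactly as quoted; later Lua manuals word the hyphen rule differently, so check the manual for your Lua version.

```python
def luaopen_name(modname, ignore_mark="-"):
    """Build the C entry-point name following the rule quoted above."""
    if ignore_mark in modname:
        # Drop the prefix up to and including the first ignore mark.
        modname = modname.split(ignore_mark, 1)[1]
    # Each dot becomes an underscore.
    return "luaopen_" + modname.replace(".", "_")

for name in ["a.v1-b.c", "mylib", "v2-mylib", "socket.core"]:
    print(name, "->", luaopen_name(name))
# a.v1-b.c -> luaopen_b_c, while mylib and v2-mylib both map to luaopen_mylib
```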

Confusion about globbing patterns

On the GruntJS site, it has a section on globbing patterns, but there is something I'm a little confused on.
foo/**/*.js will match all files ending with .js in the foo/ subdirectory and all of its subdirectories.
I see that the double asterisk matches all paths including the / but if a file was in the foo path, would that mean that it's trying to match a path called foo//*.js?
Before I found that, I was trying something like foo/{,**}*.js but that never really did what I wanted and I am a little confused on why that didn't work.
The double asterisk means that the pattern should perform a recursive match; i.e. look through all subdirectories it finds. For example, the pattern will match:
1. foo/bar.js
2. foo/baz.js
3. foo/bar/baz.js
4. foo/bar/baz/qux.js
It will not match a foo.txt file, although a pattern such as foo/** will match everything recursively (txt, js, css, etc.).
By contrast, a pattern such as foo/*.js will only match 1 and 2, because it is not a recursive pattern.
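The same behavior can be reproduced with Python's glob module in recursive mode, where ** may match zero directories; that is why foo/bar.js matches and no literal foo//*.js path is involved. Python's glob is used here as a stand-in, and its rules are close to, but not guaranteed identical to, the globbing library Grunt uses. The file layout is created in a temporary directory.

```python
import glob
import os
import tempfile

FILES = ["foo/bar.js", "foo/baz.js", "foo/bar/baz.js", "foo/bar/baz/qux.js", "foo/notes.txt"]

def matches(root, pattern):
    """Glob `pattern` under `root`; return sorted, '/'-separated relative paths."""
    hits = glob.glob(os.path.join(root, pattern), recursive=True)
    return sorted(os.path.relpath(h, root).replace(os.sep, "/") for h in hits)

with tempfile.TemporaryDirectory() as root:
    for rel in FILES:
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    recursive = matches(root, "foo/**/*.js")  # ** may match zero directories
    flat = matches(root, "foo/*.js")

print(recursive)  # ['foo/bar.js', 'foo/bar/baz.js', 'foo/bar/baz/qux.js', 'foo/baz.js']
print(flat)       # ['foo/bar.js', 'foo/baz.js']
```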
