urlreq.pathname2url not returning enough slashes?

Windows 7, Python 3.3. I'm using the following method to generate URLs to files and folders on our shared drive:
import urllib.request as urlreq
...
urlreq.urljoin('file:', urlreq.pathname2url(path))
If path starts with a drive letter, then the above adds three slashes to the front and returns:
file:///Z:/foo
Which is exactly what I need. But if path starts with our network path "//WDSHARESPACE" (Correction: "\WDSHARESPACE") then I'm getting
file://WDSHARESPACE/Public/foo
Which works with IE, but not with Firefox. (Firefox wants the three slashes, plus the original two), so:
file://///WDSHARESPACE/Public/foo
Is there an elegant way to accomplish this, or do I need to test for the different cases? I'm not real strong in HTML coding, so would prefer not to.

From the docs:
Convert the pathname path from the local syntax for a path to the form used in the path component of a URL
The 'local syntax' on Windows uses backslashes, not forward slashes. So if you pass //WDSHARESPACE, the forward slashes are not treated specially in any way.
Just have a look at the implementation to see what's really going on. If the path doesn't start with a drive letter or two backslashes, the function just converts backslashes to forward slashes and quotes the rest.
Also note this part of the docstring:
not recommended for general use
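
If testing for the two cases turns out to be unavoidable, it can be kept small. The following is only a sketch under stated assumptions: windows_path_to_file_url is a hypothetical helper (not part of urllib), it accepts either slash style, and the five-slash UNC form is simply the one the question reports Firefox accepting.

```python
from urllib.parse import quote

def windows_path_to_file_url(path):
    """Hypothetical helper: build a file: URL from a Windows path string."""
    # Normalise separators so '\\\\server\\share' and '//server/share' behave alike.
    normalized = path.replace('\\', '/')
    if normalized.startswith('//'):
        # UNC path: 'file:///' plus the original two slashes gives five slashes.
        return 'file:///' + quote(normalized, safe='/')
    # Drive-letter path: keep the colon unquoted and use the usual three slashes.
    return 'file:///' + quote(normalized.lstrip('/'), safe='/:')

print(windows_path_to_file_url(r'\\WDSHARESPACE\Public\foo'))
print(windows_path_to_file_url('Z:\\foo'))
```

Because it only manipulates strings, the sketch behaves the same on any operating system, unlike pathname2url, whose behaviour depends on the platform it runs on.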

Related

Electron command line replaces single backslash with double

I'm building an app in Electron that requires a file path to be passed to it via the command line. I'm able to retrieve the command line arguments and values fine, but the backslash ('\') character keeps getting replaced with two backslashes ('\\') regardless of how I format the file path in the arguments.
For example, this is the test command: npx electron . --path='test/path\\hello\foo\/baz'
In main.js I use getSwitchValue('path') to get the value and immediately print it out:
'test/path\\\\hello\\foo\\/baz'
Notice that the forward slash stays the same, while all the backslashes are doubled.
The real problem is that trying to replace the doubled backslashes with a single backslash or a forward slash using the string.replace method has been to no avail.
Obviously, I can instruct users to use forward slashes only, but when copying a file path from Windows file explorer it always uses backslashes, and this is the behavior users expect.
Does anyone know why this happens and how to stop it or at least circumvent it?

Revit Ironpython Shell - Parsing a list of filenames with a number after a backslash in the path

I want to read in a list of files (inc path) from either a spreadsheet or a text file for some downstream processing. The list has been generated as a log from another process and the path includes a 2 digit year folder followed by a project number folder as follows:
\\servername\projects\19\1901001\project files\filetobeprocessed.abc
The problem is as soon as the above string is read in, it is interpreted as
\\servername\\projects\x019\x01901001\\project files\x0ciletobeprocessed.abc
Which then means that I cannot use the path to access the file.
Assigning the path string to a variable, I have tried:
thePath = repr(pathreadfromfile)
After assigning the path string I have tried fixing the string using
thePath.replace('\x0','\\')
thePath.replace('\\x0','\\')
thePath.replace(r'\x0','\\')
Nothing seems to fix the path so that it can be used to open the file.
I can't find anything in either Python or IronPython that suggests a fix for this programmatically. I know that you can fix this if the path is known within the code by using r'' to create the path as raw text.
Any help appreciated

Obviously, the backslash \ is interpreted as an escape character.
For a really simple solution, hopefully the simplest, I would suggest using forward slash / for all your path separators instead of backslash.
If you really need the backslash somewhere further down the road, you can replace them back again.
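
A minimal sketch of that suggestion follows. Text read from a file is normally not escape-processed in Python, so the mangled form in the question looks like what an ordinary string literal produces; that is an assumption about the asker's setup. The two helper names are hypothetical.

```python
# Raw literal: backslashes are kept exactly as typed.
raw_path = r'\\servername\projects\19\1901001\project files\filetobeprocessed.abc'

# In an ordinary literal, '\1' is an octal escape (one control character),
# which matches the '\x01' seen in the question's output.
assert len('\19') == 2
assert len(r'\19') == 3

def to_forward_slashes(path):
    """Swap backslashes for forward slashes (hypothetical helper)."""
    return path.replace('\\', '/')

def to_backslashes(path):
    """Swap forward slashes back to backslashes (hypothetical helper)."""
    return path.replace('/', '\\')

portable = to_forward_slashes(raw_path)
print(portable)  # //servername/projects/19/1901001/project files/filetobeprocessed.abc
```

Calling to_backslashes(portable) gives back the original string if a backslash form is needed later.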

Slash at the end of url

I think (correct me if I am wrong) that it is better to put a / at the end of most URLs. Like this: http://www.myweb/file/
And not put / at the end of filenames: http://www.myweb/name.html
I have to correct that in a website with a lot of links. Is there a fast way to do that? For instance, in some programs like Dreamweaver I can use find and replace.
The second case is quite easy with Dreamweaver:
- Find: .html/"
- Replace: .html"
But how can I say something like:
- Find: all the links that end with a directory. Like http://www.myweb/file
- Replace: the same link but with a / at the end. Like http://www.myweb/file/

Your approach may work but it is based on the assumption that all files have a file extension.
There is a distinct difference between the urls http://www.myweb/file and http://www.myweb/file/ because the latter could resolve to http://www.myweb/file/index.php, or any other in the default set configured in your web server. That URL could also reference a perfectly valid file which doesn't contain a file extension, such as if it were a REST endpoint.
So you are correct insofar as you should explicitly add a "/" if you are referring to a directory, for example if you are expecting the web server to look up the correct index page to respond, or doing a directory listing.
To replace the incorrect URLs, regular expressions are your friend.
To find all files which have an erroneous "/" you could use /\.(html|php|jpg|png)\//, adding as many different file extensions into that pipe-separated list as you like. You can then replace that with .$1 or .\1 depending on your tool.
An example of doing this with Perl would be:
perl -pi -e 's/\.(html|php|jpg|png)\//.\1/g' theFileYouWantToCheck.html
Or (if you're using a Linux-based system) you can automate that nicely with find:
find path/to/html/root -type f -name "*.html" | xargs perl -pi -e 's/\.(html|php|jpg|png)\//.\1/g'
which will find all html files in the directory and do an inline find and replace. Assuming you're using version control, it's then easy to see the changes it's applied :)
Update
Solving the problem for adding a slash to directories isn't trivial. The approach I'd take:
1. Write a script to recurse through your website structure locally, making a list of all files
2. Parse the HTML files to extract all href=".*" and replace them with href=".*/" only if the end of the URL isn't present in the list extracted by the first script.
Any text-based find and replace is not going to be aware of whether the link is actually to a file or not.
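
The two-step approach above could be sketched in Python roughly as follows. This is an illustration under assumptions, not a finished tool: the function names are hypothetical, only root-relative links such as href="/file" are handled, and links carrying a query string or fragment are deliberately left alone.

```python
import os
import re

def list_site_files(root):
    """Return site-relative paths ('/a/b.html') for every file under root."""
    files = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            files.add('/' + rel.replace(os.sep, '/'))
    return files

# Matches href="..." values that contain no '#' or '?'.
HREF_RE = re.compile(r'href="([^"#?]*)"')

def add_directory_slashes(html, known_files):
    """Append '/' to root-relative hrefs that do not name a known file."""
    def fix(match):
        url = match.group(1)
        is_root_relative = url.startswith('/') and not url.startswith('//')
        if not is_root_relative or url.endswith('/') or url in known_files:
            return match.group(0)  # leave the link untouched
        return 'href="%s/"' % url
    return HREF_RE.sub(fix, html)

known = {'/name.html', '/README'}
page = '<a href="/file">a</a> <a href="/name.html">b</a> <a href="/README">c</a>'
print(add_directory_slashes(page, known))
```

In real use, known would come from list_site_files pointed at the local copy of the site. Extension-less files such as /README are left alone because they appear in the file list, which is the case a plain regex on file extensions cannot handle.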

File.dirname windows path returns

I want to extract the directory name from a windows path. The windows path is a string, something like the following:
"c:\\some\path\name"
when I do the following:
File.dirname("c:\\some\\path\\name")
The result is
"."
If I run this on the unix path it works fine
File.dirname("/some/path/name") => "/some/path"
Do I need to somehow set File::ALT_SEPARATOR? I have tried different variations of the path to no avail.

One solution I found is to replace all the backslashes with a forward slash. This works decently well. However there still must be a better solution.
File.dirname("c:\\some\\path\\name".gsub('\\', '/')).gsub('/', '\\')
=> "c:\\some\\path"
I sub the backslashes back in after the dirname call in order to keep the representation consistent.

The recommended way is to always use Unix-style forward slashes as path separators in Ruby code. Even on Windows, they will be correctly mapped internally to its backslash path separators.
If the backslash comes from user input, you need to detect whether the OS allows a backslash in a file name (Windows does not; Unix does). If it is not allowed, convert the backslashes to forward slashes during validation. In the Ruby code, keep all separators as forward slashes, so you should not have to care about backslashes when calling methods such as File.dirname.

What are the rules for file extensions in Windows and Unix?

I'm currently using File::Basename's fileparse to separate out a file's directory, base file name and its extension using something like this:
my($myfile_name,$mydirectory, $file_extension) = fileparse($$rhash_params{'storage_full_path_location'},'\..{1,4}');
But I see that there's a variation where you can actually provide an array of suffixes to the function; the array would contain all the known file extensions.
So I'm trying to find a safe way to do this, as I've seen that I've got some strange file names to process, e.g. file.0f1.htm.
Question:
Is there a list of commonly used extension for Windows and Unix systems? But in my case it's mainly for Windows.
And is it safe to assume that all file names in Windows should have an extension ending with three letter characters?
And if there's an even better way to do this, please share.
Thanks.
Updates:
So obviously I must have been drunk to forget about those other extensions. :)
Thus I've updated the current regex to allow 1-4 characters.
In this case, how should I change my regex line to properly match it?
Or is it an even better idea to look up all the commonly used extensions on Google and put them into an array to be passed to the function instead? My users are usually either students or teachers.

1. Is there a list of commonly used extension for Windows and Unix systems? But in my case it's mainly for Windows.
Yes, loads, all over the internet: http://www.google.com/search?q=common+file+extensions
2. And is it safe to assume that all file names in Windows should have an extension ending with three letter characters?
No, it's perfectly possible to use '.c', '.java', etc in Windows.

There are several faulty assumptions in your code:
- files need not have extensions. For example, most binary executables on Unix/Linux/... don't have an extension at all. They are simply called "bash", "wget", "sed", "Xorg", ...
- extensions need not be three characters long, as @Alnitak already told you: ".c", ".java", ".mpeg", ".jpeg", ".html" are all perfectly fine and rather widespread extensions
- cutting at the last "." is probably safer, but can still fail for files with no extension or with multiple (or multi-part) extensions such as ".tar.gz" and ".tar.bz2", which occur rather often in the Unix/Linux/... world
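
The points above can be illustrated with a short sketch, written here in Python rather than the thread's Perl. It cuts at the last dot, returns an empty extension when there is none, and checks a small list of multi-part suffixes first. The helper name and the suffix list are assumptions for illustration only.

```python
import os.path

# Assumed, non-exhaustive list of multi-part suffixes; extend as needed.
MULTI_PART_SUFFIXES = ('.tar.gz', '.tar.bz2')

def split_extension(filename):
    """Hypothetical helper: return (base, extension) for a bare file name."""
    lower = filename.lower()
    for suffix in MULTI_PART_SUFFIXES:
        # Require a non-empty base so a file literally named '.tar.gz' is skipped.
        if lower.endswith(suffix) and len(filename) > len(suffix):
            return filename[:-len(suffix)], filename[-len(suffix):]
    # Otherwise cut at the last dot; files without a dot get an empty extension.
    return os.path.splitext(filename)

for name in ('file.0f1.htm', 'bash', 'main.c', 'backup.tar.gz'):
    print(name, split_extension(name))
```

No fixed extension length is assumed anywhere, so names like file.0f1.htm split as ('file.0f1', '.htm').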
