How to configure Jetty to correctly interpret UTF-8 in filenames - character-encoding

After upgrading to eXist-db 4.7.0, we now get Jetty 404 errors for filenames containing UTF-8 accented or Chinese characters.
Is there a config file that controls this?
For example:
HTTP ERROR 404
Problem accessing /.../dicoEnviro-fr/humanit%C3%A9.xml.
Reason: Document /.../dicoEnviro-fr/humanité.xml not found
Powered by Jetty:// 9.4.14.v20181114

Use Jetty 9.4.20.v20190813. You need the updates to UTF-8 handling of resources on java.nio.file.FileSystem that have been present since 9.4.16.v20190411.
Since I don't know what eXist-db does to start Jetty, I'm going to assume it's embedded, and answer based on that assumption.
Make sure your ServletContextHandler or WebAppContext is declared to use a Base Resource that is defined as a PathResource object pointing to your directory location defined as a java.nio.file.Path object.
Advice about Base Resource declaration:
Do not use a String to define it: that winds up being a URLResource, which works with URL references rather than file system paths, and you'll get the problem you are experiencing.
Do not use a File to define it, as that winds up being a FileResource, which is deprecated functionality and is known to have problems with UTF-8 references.
Ensure your java.nio.file.Path is an absolute path. (no relative paths)
Ensure your java.nio.file.Path is normalized. (no "//" or "/../" segments)
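
The advice above might be sketched as follows. The class name, the helper prepare, and the directory "webapp/../data" are hypothetical placeholders. The Jetty wiring appears only as comments because it needs the Jetty 9.4 jars on the classpath, and how eXist-db actually starts Jetty is not known here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class BaseResourcePath {
    /**
     * Turns a directory name into an absolute java.nio.file.Path
     * with "." and ".." segments removed.
     */
    static Path prepare(String directory) {
        return Paths.get(directory).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        // "webapp/../data" is a placeholder; use your real content directory.
        Path base = prepare("webapp/../data");
        System.out.println(base);

        // Assumed wiring for an embedded Jetty 9.4 (not compiled here):
        //   ServletContextHandler context = new ServletContextHandler();
        //   context.setBaseResource(new PathResource(base));
    }
}
```

The point of the helper is only that the Path handed to PathResource is already absolute and normalized, so no String or File ever reaches the Base Resource declaration.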

Related

what is url shorthand in a filesystem

This should be pretty simple: I need to know what the dots mean in a URL such as "../../../Program Files (x86)/Filed/examples/tmw_desert_spacing.png"
I'm assuming this is some kind of shorthand that means "the same as the current directory"/etc/folder/file.png. A link to an article that explains this would be nice too; my Google search turned up nothing, since I'm not even sure this is called a URL. Thanks.
More info: the program I'm writing won't accept this as the file name. I need to know what needs to change for it to become acceptable.
According to RFC 3986:
The path segments "." and "..", also known as dot-segments, are
defined for relative reference within the path name hierarchy. They
are intended for use at the beginning of a relative-path reference
(Section 4.2) to indicate relative position within the hierarchical
tree of names.
The takeaway is that they have the same meaning as in paths on a Linux or Windows system: a single dot means "the directory specified by the preceding part of the path", and two dots mean "the parent directory of the directory specified by the preceding part of the path".
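
To make the meaning of the dots concrete, here is a minimal sketch that collapses "." and ".." in a slash-separated path. It is a simplification written for illustration, not the exact algorithm from RFC 3986, and the class and method names are made up.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class DotSegments {
    /** Collapses "." and ".." segments in a slash-separated path. */
    static String collapse(String path) {
        boolean absolute = path.startsWith("/");
        Deque<String> kept = new ArrayDeque<>();
        int leadingParents = 0; // ".." segments that climb above a relative start

        for (String segment : path.split("/")) {
            if (segment.isEmpty() || segment.equals(".")) {
                continue; // "." means "this directory": nothing to do
            }
            if (segment.equals("..")) {
                if (!kept.isEmpty()) {
                    kept.removeLast(); // ".." cancels the preceding directory
                } else if (!absolute) {
                    leadingParents++; // relative path: keep climbing
                }
                // absolute path: ".." at the root is dropped
            } else {
                kept.addLast(segment);
            }
        }

        StringBuilder out = new StringBuilder(absolute ? "/" : "");
        for (int i = 0; i < leadingParents; i++) {
            out.append("../");
        }
        out.append(String.join("/", kept));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("/home/user/docs/../pics/./cat.png"));
        System.out.println(collapse("a/b/../../../c"));
    }
}
```

A path that consists only of leading ".." segments followed by names, like the one in the question, comes back unchanged: it can only be turned into a concrete location once it is combined with the directory it is relative to.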

Delphi : how to check if a file exists (path over 255 characters)

I need to make my Delphi app able to check whether a file copied using Robocopy is there when its path exceeds 255 characters.
I have tried the usual "If FileExists(MyFile) then ... " but it always returns "false" even if the file is there.
I also tried to get the file's date but I get 1899/12/30 which can be considered as an empty date.
A File search does not return anything either.
Prefix the file name with \\?\ to enable extended-length path parsing. For example you would write
if FileExists('\\?\'+FileName) then
....
Note that this will only work if you are calling the Unicode versions of the Win32 API functions. So if you use a Unicode Delphi then this will do the job. Otherwise you'll have to roll your own version of FileExists that calls Unicode versions of the API functions.
These issues are discussed at great length over on MSDN: Naming Files, Paths, and Namespaces.
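
The prefixing rule itself is plain string handling. The sketch below shows it in Java, and the same logic carries over to Delphi directly. The class and method names are hypothetical, the input is assumed to be a fully qualified drive or UNC path, and the \\?\UNC\ form for network shares comes from the MSDN article rather than from the answer above.

```java
final class ExtendedLengthPath {
    private static final String PREFIX = "\\\\?\\";           // \\?\
    private static final String UNC_PREFIX = "\\\\?\\UNC\\";  // \\?\UNC\

    /** Adds the extended-length prefix to a fully qualified Windows path. */
    static String toExtended(String absolutePath) {
        if (absolutePath.startsWith(PREFIX)) {
            return absolutePath; // already prefixed
        }
        if (absolutePath.startsWith("\\\\")) {
            // \\server\share\... becomes \\?\UNC\server\share\...
            return UNC_PREFIX + absolutePath.substring(2);
        }
        return PREFIX + absolutePath; // C:\... becomes \\?\C:\...
    }

    public static void main(String[] args) {
        System.out.println(toExtended("C:\\Folder\\file.txt"));
        System.out.println(toExtended("\\\\server\\share\\file.txt"));
    }
}
```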

Why should I use @Url.Content("~/blah-blah-blah")?

I can't understand the benefit(s) that I can get from the Url.Content() method in ASP.NET MVC. For example, you see src='@Url.Content("~/Contents/Scripts/jQuery.js")'. Why should I use it? What reasons might exist for using it? What benefits, advantages, etc. does it have over plain old simple references like src='/scripts/jquery.js'?
Update: Based on the answers, I'd like to know if there is any other reason for using it, other than handling virtual folders? Because I haven't seen using virtual applications that much (which of course doesn't mean that it hasn't been used that much).
Usually, your web application is published as: www.yoursite.com/. The ~ character matches the root of the site /.
However, if you publish your site within a virtual directory www.yoursite.com/mywebapp/, then the ~ character would match "/mywebapp/".
Hard-coding URLs with the "/" character would cause wrong page references.
Mapping virtual paths is its only purpose.
If you never need to map them and are sure your app or its folders will not sit under other apps, then it won't serve you any purpose.
Note from the docs that if you don't use ~, the result is unchanged anyway:
"Remarks
If the specified content path does not start with the tilde (~) character, this method returns contentPath unchanged.
"
It is useful if your application's root path is not the root path of your server. Url.Content("~/") returns the root folder of your application.
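
The behavior described in these answers can be modeled in a few lines. This is a Java sketch of the described rule only, not ASP.NET's implementation; the class name, method name, and the applicationRoot parameter are hypothetical.

```java
final class ContentUrl {
    /**
     * Models the rule described above: a leading "~" is replaced by the
     * application root, and any other path is returned unchanged.
     */
    static String content(String applicationRoot, String contentPath) {
        if (!contentPath.startsWith("~")) {
            return contentPath; // no tilde: nothing to map
        }
        String root = applicationRoot.endsWith("/") ? applicationRoot : applicationRoot + "/";
        String rest = contentPath.substring(1); // drop the "~"
        if (rest.startsWith("/")) {
            rest = rest.substring(1); // avoid a doubled slash
        }
        return root + rest;
    }

    public static void main(String[] args) {
        // Site published at the server root
        System.out.println(content("/", "~/Contents/Scripts/jQuery.js"));
        // Same site published under a virtual directory
        System.out.println(content("/mywebapp/", "~/Contents/Scripts/jQuery.js"));
    }
}
```

The same "~/Contents/Scripts/jQuery.js" reference yields a working URL in both deployments, whereas a hard-coded "/Contents/Scripts/jQuery.js" would only be right in the first.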

relative pathnames in XSLT 2.0

Consider:
<xsl:result-document
    href="{string-join(
        ($scripts-offset, $metadata-directory, $redirect-file),
        '/'
    )}"
    format="text">
in which the net effect of the string-join is "../resources/foo.txt".
What is this supposed to be relative to? The style sheet? The input document?
EDIT
Dear answerers: after posing this question, I had a burst of energy and coffee and read the spec for xsl:result-document carefully, and I also read the implementation of Saxon-B. The spec calls for the href to be relative to the 'primary output document'. Depending on how you call Saxon, it might set that up correctly from the File object you supply it as a target ... or it might require you to make an extra call to set it up. So upvotes all around, and sorry for all the trouble.
This is implementation defined.
From http://www.w3.org/TR/xslt20/#creating-result-trees
The href attribute is optional. The
default value is the zero-length
string. The effective value of the
attribute must be a URI Reference,
which may be absolute or relative.
There may be implementation-defined
restrictions on the form of absolute
URI that may be used, but the
implementation is not required to
enforce any restrictions. Any legal
relative URI must be accepted. Note
that the zero-length string is a legal
relative URI.
The base URI of the document node at
the root of the final result tree is
based on the effective value of the
href attribute. If the effective value
is a relative URI, then it is resolved
relative to the base output URI. If
the implementation provides an API to
access final result trees, then it
must allow a final result tree to be
identified by means of this base URI.
And from http://www.w3.org/TR/xslt20/#dt-base-output-uri
This document does not specify any
application programming interfaces or
other interfaces for initiating a
transformation. This section, however,
describes the information that is
supplied when a transformation is
initiated. Except where otherwise
indicated, the information is
required.
A base output URI. [Definition: The base output URI is a URI to
be used as the base URI when resolving a relative URI allocated to a final
result tree. If the transformation generates more than one final result tree, then typically each one will be allocated a URI relative to this base
URI. ] The way in which a base output URI is established is implementation-defined.
But more important, think about this note:
Note:
The base URI of the final result tree
is not necessarily the same thing as
the URI of its serialized
representation on disk, if any. For
example, a server (or browser client)
might store final result trees only in
memory, or in an internal disk cache.
As long as the processor satisfies
requests for those URIs, it is
irrelevant where they are actually
written on disk, if at all.
In Saxon and AltovaXML it's relative to the directory from which the XSLT processor was called. For example:
cd somePath
java -classpath lib\saxon9he.jar net.sf.saxon.Transform -o:output.xml xml\input.xml xsl\stylesheet.xsl
"C:\Program Files (x86)\Altova\AltovaXML2011\AltovaXML.exe" -xslt2 xsl\stylesheet.xsl -in xml\input.xml -out output.xml
In your case it would be:
somePath\..\resources\foo.txt
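
The resolution step can be tried out with java.net.URI. In this sketch the base output URI file:///work/somePath/output.xml is a made-up stand-in for whatever the processor derives from its output target, and the class and method names are hypothetical.

```java
import java.net.URI;

final class BaseOutputUri {
    /** Resolves an xsl:result-document href against a base output URI. */
    static URI resolveHref(URI baseOutputUri, String href) {
        return baseOutputUri.resolve(href);
    }

    public static void main(String[] args) {
        // Hypothetical base output URI, as if the processor had been
        // run from /work/somePath with output.xml as its target.
        URI base = URI.create("file:///work/somePath/output.xml");
        URI result = resolveHref(base, "../resources/foo.txt");
        System.out.println(result.getPath()); // /work/resources/foo.txt
    }
}
```

The ".." climbs out of somePath, so the file lands in a resources directory next to somePath rather than inside it.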

Find long (>255) filenames

There is a folder with more than 100 files in it.
But all the file and folder names are broken by the wrong encoding (UTF->ANSI).
"C:\...\Госдача-Лечебни корпус\вертолетка\Госдача-Лечебни корпус\Госдача-Лечебни корпус\вертолетка\Госдача-Лечебни корпус\вертолетка\Госдача-Лечебни корпус\Госдача-Лечебни корпус\Госдача-Лечебни корпус\вертолетка\Госдача-Лечебни корпус\Госдача-Лечебни корпус\вертолетка\Госдача-Лечебни корпус\..."
The regular Utf8ToAnsi function fixes that, but FindFirst can't search folders with names longer than 255 characters.
It gives me only 70 of the 100 files.
FindFirst wraps the Win32 API function FindFirstFile, and the Unicode version of that function can search paths up to 32,767 characters long if you prepend \\?\ to the path you're passing in, like \\?\C:\Folder\Folder\*.
Since Delphi 2009 and newer call the Unicode functions for you, you can just use FindFirst and co there. For Delphi 2007 and earlier (ANSI versions), you'll need to call FindFirstFile/FindNextFile/FindClose from Windows.pas directly. For more information check the Naming a file section of the platform SDK.
Do note that using \\?\ disables various bits of path processing though, so make sure it's a fully qualified path without any '.' or '..' entries. You can use the same trick to open file streams, rename, or copy files with longer paths.
The shell (Explorer) doesn't support this though, so you still need to limit those to at most MAX_PATH characters for things like SHFileOperation (to delete to the recycle bin) or ShellExecute. In many cases you can work around the problem by passing in the DOS 8.3 names instead of the long ones. FindFirst's TSearchRec doesn't expose the short names, but FindFirstFile's TWin32FindData structure does as cAlternateFileName.
Change the current directory (ChDir) to the deepest one you know about, and then pass a relative path to FindFirst or FindFirstFile.
No path component in that file name is longer than MAX_PATH characters, so you should be able to work your way into the directories one step at a time.
Beware that multithreaded programs may be sensitive to changes in the current directory since a process has only one current directory shared by all the threads.

Resources