Is this substring-after and concat breaking my encoding? - xslt-2.0

Our GSA uses a FileConnector to index different shares which are targets of DFS links. I am trying to rewrite file://filesrv01.example.com/share$/dir/file.ext to file://R:/share/dir/file.ext in the frontend XSL.
There is an xsl:choose element which tests for different protocols but not file://, so I assume the default handling for my source links would be this node:
<xsl:otherwise>
<xsl:value-of disable-output-escaping='yes' select="U"/>
</xsl:otherwise>
We created a new xsl:when node like this:
<xsl:when test="starts-with(U, 'file://server.example.com/share$/')">
<xsl:value-of disable-output-escaping='yes'
select="concat('file://R:/share/',
substring-after(U,'file://server.example.com/share$/') )"/>
</xsl:when>
This works for almost all entries in our index, but it fails when the path contains a German umlaut. Here are the input, the actual output, and the expected output:
file://server/share$/dir/FileWithUmläut.txt
file://R:/share/dir/FileWithUmläut.txt
file://R:/share/dir/FileWithUmläut.txt
Why is the default xsl:otherwise working without changing umlauts but our concat+substring is not? Anything I could check or change?
Edit #1
There is only one output element in the XSL file: <xsl:output method="html"/>. The XSL itself is recognised as ANSI in Notepad++ with some Umlauts in UI texts. Output to the browser is utf-8 xhtml.
Edit #2
When I replace the xsl:when with the following block, the encoding is not broken and the link can be opened (not via the DFS root but directly via the UNC path). Because of this I believe the encoding of the XML or XSL is not the cause. Thanks for your input nevertheless, @MathiasMüller.
<xsl:when test="starts-with(U, 'file://server.example.com/share$/')">
<xsl:value-of disable-output-escaping='yes' select="U"/>
</xsl:when>

My specific problem vanished as soon as I used file:///R:/ instead of file://R:/ (an additional forward slash), but I am still trying to figure out why that helped. In the GSA XSL there is a JavaScript snippet to "fix" encoding issues in IE, but it does not care whether the protocol has 2 or 3 slashes.
Although Firefox does not allow the file protocol out of the box, neither syntax works when copied from there. This leads me to believe that my currently installed IE 9 fixes some encoding issues on its own when using the correct file:/// prefix and Firefox does not.
As we would like the links to work in Firefox too, I will continue my quest for glory in the land of unicode, plagued by the ancient dragon of file:/// and home to the houses of IE and FF.
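One possible explanation for the extra slash, sketched with Java's java.net.URI used purely as a reference parser: in file://R:/..., everything between // and the next / is the authority, so R: is read as a host rather than as a drive. The class and method names below are hypothetical, and how IE or Firefox actually treat these links is not confirmed by this sketch.

```java
import java.net.URI;
import java.net.URISyntaxException;

class FileUriSketch {
    // Builds a file URI with an empty authority (three slashes) and
    // percent-encodes non-ASCII characters as UTF-8 bytes.
    static String toFileUri(String drivePath) {
        try {
            return new URI("file", "", "/" + drivePath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI twoSlashes = URI.create("file://R:/share/dir/file.ext");
        URI threeSlashes = URI.create("file:///R:/share/dir/file.ext");
        // Compare which part of each URI is treated as authority and which as path.
        System.out.println(twoSlashes.getAuthority() + " | " + twoSlashes.getPath());
        System.out.println(threeSlashes.getAuthority() + " | " + threeSlashes.getPath());
        // \u00e4 is the umlaut from the example file name.
        System.out.println(toFileUri("R:/share/dir/FileWithUml\u00e4ut.txt"));
    }
}
```

With three slashes the drive letter stays in the path, and the umlaut is emitted as a percent-encoded UTF-8 sequence, which leaves only ASCII characters in the link.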

Related

What encoding is this and how do I turn it into something I can see properly?

I'm writing a script that will operate on the subtitle files of a popular streaming service (Netfl*x).
The subtitle files have strange characters in them and I can't get them to render in a way that my text editors or web browser will display in a readable way. The xml encoding says UTF-8, but some characters are not readable.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<tt xmlns:tt="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xmlns:ttp="http://www.w3.org/ns/ttml#parameter" xmlns:tts="http://www.w3.org/ns/ttml#styling" ttp:tickRate="10000000" ttp:timeBase="media" xmlns="http://www.w3.org/ns/ttml">
<p>de 15 % la nuit dernière.</span></p>
<p>if youâve got things to doâ¦</span></p>
And in Vim:
This is what it looks like in the browser:
How can I convert this into something I can use?
I'll go out on a limb and say that file is UTF-8 encoded just fine, and you're merely looking at it using the wrong encoding. The character À encoded in UTF-8 is C3 80. C3 in ISO-8859-1 is Ã, which in your screenshot is followed by an 80. So looks like you're looking at a UTF-8 file using the (wrong) ISO-8859 encoding.
Use the correct encoding when opening the file.
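The effect described above can be reproduced with a minimal Java sketch that encodes text as UTF-8 and then decodes the bytes as ISO-8859-1. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class MisreadUtf8Sketch {
    // Encodes the text as UTF-8, then decodes those bytes with the wrong
    // charset (ISO-8859-1), so every byte turns into its own character.
    static String readAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // \u00c0 is the capital A with grave accent from the answer above.
        for (char c : readAsLatin1("\u00c0").toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```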
My terminal is set to en_US.UTF-8, but was also rendering this supposedly UTF-8 encoded file incorrectly (sonné -> sonné). I was able to solve this by using iconv to encode the file in ISO8859-1.
iconv original.xml -t ISO8859-1 -o converted.xml
In the new file, the characters were properly rendered, although I don't quite understand why.
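One possible reason the conversion helped, stated here as an assumption rather than a confirmed diagnosis: if the text had been encoded as UTF-8 twice, converting it to ISO-8859-1 strips one layer and leaves bytes that are valid UTF-8 again. A minimal Java sketch of that round trip, with hypothetical names:

```java
import java.nio.charset.StandardCharsets;

class DoubleEncodingSketch {
    // Assumes the input is text whose UTF-8 bytes were once misread as
    // ISO-8859-1. Mapping each character back to a single byte and decoding
    // the result as UTF-8 undoes that one layer.
    static String undoOneLayer(String garbled) {
        byte[] bytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "sonn" followed by U+00C3 U+00A9 is the garbled form of "sonn\u00e9".
        System.out.println(undoOneLayer("sonn\u00c3\u00a9"));
    }
}
```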

Saxon 9.8: Which patterns are supported in EXPath File Module function file:list?

Good afternoon,
I am working with Saxon 9.8.0.4 for Java. I would like to use the EXPath File Module function "file:list" with its third parameter, "pattern", but I am not sure which style of pattern is supported.
I have read both the Saxon documentation and the EXPath documentation, but I still do not know which patterns are supported in Saxon 9.8.0.4. Regular expression support would be great, although I understand it is overkill for most users. I tried several blind tests, but only the * and ? wildcards work for me, as defined in the EXPath documentation.
Yes, I can quite easily do regexp post-processing in a for-each, but knowing more about the list function would help.
Thank You in advance for Your help, Stepan
P.S.: My use case is to collect all files without an extension ("test" but not "test.txt") recursively from a large, deep directory structure and process every matching file with XSLT 3.0. Most of these files share the same file name, so I cannot pre-process them by copying them into one folder for a single Saxon -s:directory -o:directory invocation, and starting Java (Saxon) once per file is of course a terrible time overhead. So I would like to read all matching files into a sequence and process each item with for-each (the files are text files, and I read them with unparsed-text). And no, GAWK is not a solution, because my whole XML-to-SQL transformation infrastructure is already written in XSLT and 95 % of the files are XML.
--ADDED code and explanation below:
Example of my test files.
XML file "a.xml":
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="a.xsl"?>
<root/>
XSL-T file "a.xsl":
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:saxon="http://saxon.sf.net/"
xmlns:expathFile="http://expath.org/ns/file"
exclude-result-prefixes="xs saxon"
version="3.0">
<xsl:output method="text" />
<xsl:template match="/root">
<xsl:variable name="list" select="expathFile:list('C:\temp\temp\test\', false(), '^.*$')"/>
<xsl:for-each select="$list">
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
My folder "C:\temp\temp\test\" contains 6 test files: "a.txt", "b.txt", "c.txt", "e", "f", "g".
But after testing with the online Java regular expression tester at "http://www.regexplanet.com/advanced/java/index.html" I found that the problem is solely on my side, because Java regular expressions behave a little differently from PCRE (Perl), sed, and gawk regular expressions. So it is my fault, and I need to learn Java regular expressions.
Saxon uses the same code for this pattern as for the filter in select="pattern" in collection URIs, which is described at http://www.saxonica.com/documentation/index.html#!sourcedocs/collections
Extracting the relevant details:
The pattern used in the select parameter can use glob-like syntax, for
example *.xml selects all files with extension "xml". More generally,
the pattern is converted to a regular expression by prepending "^",
appending "$", replacing "." by "\.", "*" by ".*", and "?" by ".?",
and it is then used to match the file names appearing in the directory
using the Java regular expression rules. So, for example, you can
write ?select=*.(xml|xhtml) to match files with either of these two
file extensions. Note however, that special characters used in the URL
(that is, characters such as backslash and curly braces that are not
allowed in the query part of a URI) must be escaped using the %HH
convention. For example, vertical bar needs to be written as %7C. This
escaping can be achieved using the encode-for-uri() function.
Note that Saxon's collection() function now also supports match=pattern in the URI, where the pattern is a standard XPath 3.1 regular expression.
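The glob-to-regex conversion described in the quoted documentation can be sketched in Java as follows. This is an illustration of the quoted rules only, not Saxon's actual code, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class GlobPatternSketch {
    // Applies the quoted rules in order: "." becomes "\.", "*" becomes ".*",
    // "?" becomes ".?", and the result is wrapped in "^" and "$".
    static String globToRegex(String glob) {
        String body = glob
                .replace(".", "\\.")
                .replace("*", ".*")
                .replace("?", ".?");
        return "^" + body + "$";
    }

    // Tests a file name against the converted pattern using Java regex rules.
    static boolean matches(String glob, String fileName) {
        return Pattern.compile(globToRegex(glob)).matcher(fileName).find();
    }

    public static void main(String[] args) {
        System.out.println(globToRegex("*.xml"));
        System.out.println(matches("*.(xml|xhtml)", "a.xhtml"));
        System.out.println(globToRegex("^.*$"));
    }
}
```

Under these rules a pattern that is already a full regular expression is not passed through unchanged: in this sketch, '^.*$' from the question turns into a pattern whose dot is literal. Whether Saxon 9.8.0.4 behaves exactly this way for file:list is not confirmed here.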

Adding a datamodule to the delphi object repository

I am using D10 Pro. I added a datamodule to the object repository by right clicking it and selecting "Add to Repository" on the popup menu.
The datamodule shows up in the New>Other dialog and I am able to click the icon for it. When I do, I get the following exception: "Unable to find both a form () and source file ()". The same exception occurs with forms I place there. The objects that came with Delphi load without any problem. How do I fix this?
When adding items to the repository, you should avoid using dotnet style names for your files. For example, I originally named the file "MyLib.Datamodule.TextImporter.pas" and I received the error in my question. I experienced the same problem with a form using the same dotnet style naming. After changing the file name to "TextImporterDatamodule.pas" and adding it to the repository, I was able to use it to create new datamodules without a problem. This is something Embarcadero needs to address.
I can't answer your question, but maybe this will help you track down your problem.
Contrary to what the DocWiki says for Seattle, the repository .Xml file is actually named "Repository.Xml" and in my case is located here:
C:\Users\MA\AppData\Roaming\Embarcadero\BDS\17.0\Repository.Xml
I added a data module to it, resulting in the entry shown below being added.
Notice that for a datamodule, the path to it is stored in its IDString
attribute along with the filename, unlike a form, where the path+name is stored
in the Value attribute of the FormName node.
With that entry in place, unlike you I can then include a copy of it in a project
by going to File | New | Other in the IDE. However, if I then change the
on-disk name of the folder where the item is located, and try to use it, I get the error
message you quoted. Of course, that doesn't mean that's why you're getting
it, but I thought it might help to see the repository entry for something that's known to work.
<Item IDString="D:\Delphi\Code\SO\Devex\DM1" CreatorIDString="BorlandDelphiRepositoryCreator">
<Name Value="AAADataModule"/>
<Icon Value=""/>
<Description Value="MA datamodule"/>
<Author Value="MA"/>
<Personality Value="Delphi.Personality"/>
<Platforms Value=""/>
<Frameworks Value=""/>
<Identities Value="RADSTUDIO"/>
<Categories>
<Category Value="InternalRepositoryCategory.MyCategory" Parent="Borland.Delphi.NewFiles">MyCategory</Category>
<Category Value="Borland.Delphi.NewFiles" Parent="Borland.Delphi.New">Delphi Files</Category>
<Category Value="Borland.Delphi.New" Parent="Borland.Root">Delphi Projects</Category>
</Categories>
<Type Value="FormTemplate"/>
<Ancestor Value=""/>
<FormName Value=""/>
<Designer Value="Any"/>
</Item>
If this doesn't help, the best I can suggest is to post your question in the IDE section
of EMBA's newsgroups here:
https://forums.embarcadero.com/forum.jspa?forumID=62
I don't think that should provoke cross-posting complaints, seeing as your question has been up here for a while without getting a definitive answer.

How to provide an empty Source in xslTransformer.transform() method?

I have an xslt 2.0 file which is being used to transform a csv file to an xml file. The xsl has been taken from here:
http://p2p.wrox.com/xslt/40898-transform-csv-file-xml.html#post164344
Now I am trying to execute this through a Java transformer (using the Saxon9 XSL transformer factory). Since the csv file is passed into the xsl as a parameter, there is no need for me to pass anything in the Source parameter of the transform method.
The javadocs for the Transformer.transform method state the following:
"An empty Source is represented as an empty document as constructed by DocumentBuilder.newDocument(). The result of transforming an empty Source depends on the transformation behavior; it is not always an empty Result."
I tried to create an empty document and try the transformation as seen below:
TransformerFactory transformerFactory = TransformerFactory.newInstance("net.sf.saxon.TransformerFactoryImpl",null);
Source xsltSource = new StreamSource("file:///C:/my.xsl");
Transformer xsltTransformer = transformerFactory.newTransformer(xsltSource);
xsltTransformer.setParameter("pathToCSV", "'file:///C:/input.csv'");
StringWriter writer = new StringWriter();
xsltTransformer.transform(new DOMSource(DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument()), new StreamResult(writer));
The code above produces no output. I think this is because the empty document given as input is processed instead of the csv file, which is passed in through the following lines in the xsl:
<xsl:param name="pathToCSV" />
<xsl:variable name="input" select="unparsed-text($pathToCSV)"/>
Could anyone give me pointers on how to accomplish what I am trying to achieve?
Consider using the Saxon s9api API (http://saxonica.com/documentation/html/using-xsl/embedding/s9api-transformation.html) rather than JAXP if you want to use XSLT 2.0 features such as starting with a named template, which the XSLT you linked to requires. Alternatively, if you want to use JAXP with an empty dummy document, you at least need to add a template like this:
<xsl:template match="/">
<xsl:call-template name="main"/>
</xsl:template>
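The JAXP variant can be sketched as follows, assuming a tiny inline stylesheet as a hypothetical stand-in for the real one. To stay self-contained it uses whatever TransformerFactory the JDK provides by default (an XSLT 1.0 processor unless Saxon is on the classpath), so it only shows the dispatch from match="/" to the named template, not the CSV parsing.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class EmptySourceSketch {
    // Hypothetical stylesheet: the match="/" template fires on the empty
    // document and hands over to the named template "main".
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='pathToCSV'/>"
      + "<xsl:template match='/'><xsl:call-template name='main'/></xsl:template>"
      + "<xsl:template name='main'>csv=<xsl:value-of select='$pathToCSV'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet against an empty DOM document and returns the output.
    static String run(String pathToCsv) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            transformer.setParameter("pathToCSV", pathToCsv);
            StringWriter out = new StringWriter();
            DOMSource empty = new DOMSource(
                    DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
            transformer.transform(empty, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("file:///C:/input.csv"));
    }
}
```

The parameter is passed as a plain Java string here. Quote characters placed inside that string would become part of the value the stylesheet sees.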

PartCover browser not opening code files

We're generating PartCover reports via the command line tool along with our CruiseControl.Net unit tests. This generates an xml file that displays the results nicely on the CruiseControl dashboard. The included xslt transforms only show the percentage of coverage for an individual class, but we want to know exactly which lines are not being covered. The problem is that when we open the report in the PartCover browser and double-click a method, it doesn't show us our .cs files. I know the PartCover browser is capable of showing the files because of the following.
Here's a screenshot of PartCover browser with the lines of code showing: http://kjkpub.s3.amazonaws.com/blog/img/partcover-browse.png.
The information looks like it should be available to the browser because the report contains this:
<Method name="get_DeviceType" sig="Cathexis.IDBlue.DeviceType ()" bodysize="19" flags="0" iflags="0">
<pt visit="2" pos="0" len="1" fid="82" sl="35" sc="13" el="35" ec="14" />
<pt visit="2" pos="1" len="4" fid="82" sl="36" sc="17" el="36" ec="39" />
<pt visit="2" pos="5" len="2" fid="82" sl="37" sc="13" el="37" ec="14" />
</Method>
and this:
<File id="66" url="D:\sandbox\idblue\idblue\trunk\software\code\driver\dotnet\Common\AsyncEventQueue.cs" />
All I want to be able to do is view what lines of code are not being covered in my test cases without having to figure out what the xml above is trying to tell me.
Thanks to anyone in advance who replies.
I figured out why the .cs files were not displaying. The paths in the xml file were incorrect because our test project was being built on a different machine than the one PartCover was on (PartCover presumably takes the .cs file paths from the pdb files). Once I did a search and replace on the file, switching the base directory of our Subversion checkout to the one on the other machine, all was well.
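The search and replace can be sketched in Java as a plain string substitution on the url attributes of the report. The class name, the method name and the new base directory used in main are hypothetical.

```java
class ReportPathRebase {
    // Swaps the base directory at the start of every url="..." attribute.
    // String.replace is a literal replacement, so backslashes need no escaping.
    static String rebase(String reportXml, String oldBase, String newBase) {
        return reportXml.replace("url=\"" + oldBase, "url=\"" + newBase);
    }

    public static void main(String[] args) {
        String line = "<File id=\"66\" url=\"D:\\sandbox\\idblue\\idblue\\trunk\\software"
                + "\\code\\driver\\dotnet\\Common\\AsyncEventQueue.cs\" />";
        // C:\build\idblue is an invented target directory for illustration.
        System.out.println(rebase(line, "D:\\sandbox\\idblue", "C:\\build\\idblue"));
    }
}
```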

Resources