We have been reading and writing Sticky Notes/Annotations/Comments to pdfs via an activex control in our application for a number of years. We have recently upgraded to Delphi2009 with Unicode Support. The following is causing problems.
When we call
CAcroPDAnnot.GetContents
The results seem to be rather strange and we lose our Unicode Chars. It is not like saving as an ansi string which would usually result in returning ????? instead we get a string such as
‚És‚“ú‚É•—Ž×‚ð‚Ђ¢‚½‚ç
For a string of Japanese characters.
However if I save the comments in the pdf to a datafile via the menu in the pdf itself it is written to file as something like
0kˆL0Oeå0k˜¨ª0’0r0D0_0‰
The latter can be export and reimported into an acrobat pdf and will recreate the correct unicode characters. However once I call CAcroPDAnnot.GetContents in my code it is coming back as something else.
Is CAcroPDAnnot.GetContents broken?
Is there an encoding scheme I should be aware of?
Is there an alternative I might be able to do?
Thanks
‚És‚“ú‚É•—Ž×‚ð‚Ђ¢‚½‚ç
That's the string:
に行く日に風邪をひいたら
in CP-932 aka Shift-JIS encoding, an awful but lamentably still-popular encoding in Japan.
You're currently interpreting it in as CP-1252 (Windows Western European). If your PDF-reading component won't convert it for you automatically, you'll need to find a way to detect what encoding the document is in and convert it manually.
I don't know what Delphi provides for reading encodings, but have you got the encodings for Shift-JIS installed in Windows, from the Control Panel -> Regional Options -> "Install files for East Asian languages" option? If not, that might explain why it'd be failing to convert automatically, perhaps.
You're not exactly giving us a lot of information to work with.
I take it you're talking about the "Acrobat.CAcroPDAnnot" class' method GetContents here. Which version of Acrobat are you using? Have you perhaps switched versions (or run an update) around the time you started programming with Delphi 2009?
Then: how did you instantiate the object? If using a *_TLB.pas file generated from the DLL, are you certain it still matches it? (Try re-generating it, if uncertain).
Third: how are you calling the method? What type of variable are you assigning the result to?
What might also help, is if you could provide a sample of an annotation (preferably including non-ASCII chars); and for that annotation:
what it should look like (and what it does look like inside Reader)
what it returns when using a pre-2009 version of Delphi*
what it returns when using Delphi 2009*
(* preferably the HEX byte codes of the (ansi/wide)strings; but output from the Ctrl-F7 inspector should do)
Then maybe someone could provide a more meaningful answer.
Ok, one of the main differences between Delphi 2009 and the earlier versions is that the default string type is an unicode string. That means that if you use the same ActiveX component as in previous versions, you are passing unicode strings to ascii strings and that is usually not a good idea.
There are a couple of solutions for this problem:
Try if you can upgrade your activeX component so that it supports full unicode strings.
Use AnsiString and not string to communicate with the activeX component. In this case, you can still use the old interface, but you are still bound to the same limitations.
Use an other control that creates pdf. There is a lot to find, but be prepared to change a big chunk of your software. (Some controls are XML based and use encoding. )
Related
I try to convert a code from Delphi 7 to delphi xe8 and I cannot find a solution to the following case.
Our old application creates a txt file which first row is like that
±HEADER―ID°N1799―USER_ID°N1―PATH_NAME―R_DATABASE°TC:\DATA―R_SERVER°TTEST_SRV―R_COMPUTER°TMYPC―
Char(―) is chr(175).
We tried to read the already created file from our new application with Delphi xe8 like that:
StrData := TStringList.Create;
StrData.LoadFromFile(sFile);
StrData.Text returns the desired text but chr(175) is replaced with chr(8213).
In order to go on I did the followings:
StrData.LoadFromFile(sFile,TEncoding.ANSI);
StrData.Text := StringReplace(StrData.Text,Chr(8213),Chr(175),[rfReplaceAll]);
What I cannot solve is the opposite case.
I have to create that file from Delphi xe8 so as it would be exactly the same with the one produced from the old delphi 7 application.
At the beginning I used the same code we had:
StrData.SavetoFile(sFile); //returns text but chr(175) is replaced with (?)
Also i tried all encodings with no results.
StrData.SavetoFile(sFile,Ansi);//returns text but chr(175) is replaced with (?) etc.
The same results also when converts the code to TStreamFile or textfile.
base64 encode files
Old one - Correct one (StrData.SavetoFile(sFile)) //Delphi 7
wrFIRUFERVLigJVJRMKwTjE4NjbigJVSSUTCsE4zNjHigJVDU0lURV9JRMKwTjHigJVSU0lURV9JRMKwTjEwMeKAlVNTSVRFX0lEwrBOMeKAlVRSTl9EQVRFwrBENDIzODIuNjA2NzkzOTgxNeKAlVVTRVJfSUTCsE4x4oCVUEFUSF9OQU1FwrBUXFxkZWxwaGkyMDEycjJcQkVORUZJVF9URVNUXFBBX09GRklDRV9WU0xcVFJBTlNGRVJcRVhQT1JU4oCVRklMRV9OQU1FwrBUQU5BRDM2MU0udHh04oCVRklMRV9UWVBFwrBO4oCVUENLX1NFTkRFUsKwVEFkbWlu4oCVUENLX05PVEVTwrBU4oCVX1JWX0lEMcKwTuKAlVNWX0lEwrBO4oCVUlZfSUTCsE7igJVSREJfSUTCsE4xMeKAlVNEQl9JRMKwTjEx4oCVUENLX0NOWFTCsFQwMywxNSw3MiwwMOKAlUtFWUlEwrBUezczRDIwMDU3LTM3NTgtNDlDMi05NTlGLTA4QzYxMDY4NEZGNn3igJVGTF9UWVBFwrBOMuKAlUZMX1NUQVRVU8KwTjDigJVTVEFSVF9EVMKwRDQyMzgyLjYwNjQ5MzA1NTbigJVSX1NUQVJUX0RUwrBE4oCVUl9FTkRfRFTCsETigJVSX1VTRVLCsFTigJVSX1BBVEjCsFTigJVTSVpFX1BDS8KwTuKAlVNJWkVfREFUQcKwTuKAlVNJWkVfQVRDSMKwTuKAlVNJWkVfRE9DU8KwTuKAlURBVEVfSU7CsETigJVSX0RBVEFCQVNFwrBU4oCVUl9TRVJWRVLCsFTigJVSX0NPTVBVVEVSwrBU4oCV
StrData.SavetoFile(sFile,Tencoding.Ansi); & StrData.SavetoFile(sFile); //XE8
wrFIRUFERVI/SUTCsE4xODY3P1JJRMKwTjM2Mj9DU0lURV9JRMKwTjE/UlNJVEVfSUTCsE4xMDE/U1NJVEVfSUTCsE4xP1RSTl9EQVRFwrBENDIzODIuNzA3NzA4MzMzMz9VU0VSX0lEwrBOMT9QQVRIX05BTUXCsFRcXGRlbHBoaTIwMTJyMlxCRU5FRklUX1RFU1RcUEFfT0ZGSUNFX1ZTTFxUUkFOU0ZFUlxFWFBPUlQ/RklMRV9OQU1FwrBUQU5BRDM2Mk0udHh0P0ZJTEVfVFlQRcKwTj9QQ0tfU0VOREVSwrBUQWRtaW4/UENLX05PVEVTwrBOP19SVl9JRDHCsE4/U1ZfSUTCsE4/UlZfSUTCsE4/UkRCX0lEwrBOMTE/U0RCX0lEwrBOMTE/UENLX0NOWFTCsFQwMywxNSw3MiwwMD9LRVlJRMKwVHtGQzM1N0QyNC1EQjRBLTRBOUMtQkE3My0xQ0FBMEVFRDUzOUJ9P0ZMX1RZUEXCsE4yP0ZMX1NUQVRVU8KwTjA/U1RBUlRfRFTCsEQ0MjM4Mi43MDcyNjg1MTg1P1JfU1RBUlRfRFTCsEQ/Ul9FTkRfRFTCsEQ/Ul9VU0VSwrBUP1JfUEFUSMKwVD9TSVpFX1BDS8KwTj9TSVpFX0RBVEHCsE4/U0laRV9BVENIwrBOP1NJWkVfRE9DU8KwTj9EQVRFX0lOwrBEP1JfREFUQUJBU0XCsFQ/Ul9TRVJWRVLCsFQ/Ul9DT01QVVRFUsKwVD8=
StrData.SavetoFile(sFile,Tencoding.UTF8); //XE8
wrFIRUFERVLCr0lEwrBOMTg3MMKvUklEwrBOMzY1wq9DU0lURV9JRMKwTjHCr1JTSVRFX0lEwrBOMTAxwq9TU0lURV9JRMKwTjHCr1RSTl9EQVRFwrBENDIzODIuNzIyODEyNcKvVVNFUl9JRMKwTjHCr1BBVEhfTkFNRcKwVFxcZGVscGhpMjAxMnIyXEJFTkVGSVRfVEVTVFxQQV9PRkZJQ0VfVlNMXFRSQU5TRkVSXEVYUE9SVMKvRklMRV9OQU1FwrBUQU5BRDM2NU0udHh0wq9GSUxFX1RZUEXCsE7Cr1BDS19TRU5ERVLCsFRBZG1pbsKvUENLX05PVEVTwrBOwq9fUlZfSUQxwrBOwq9TVl9JRMKwTsKvUlZfSUTCsE7Cr1JEQl9JRMKwTjExwq9TREJfSUTCsE4xMcKvUENLX0NOWFTCsFQwMywxNSw3MiwwMMKvS0VZSUTCsFR7ODE5REQ0NDQtQzEwQi00MTY1LUFEQjAtQkI2NDAyRjA3NUI4fcKvRkxfVFlQRcKwTjLCr0ZMX1NUQVRVU8KwTjDCr1NUQVJUX0RUwrBENDIzODIuNzIyNTIzMTQ4McKvUl9TVEFSVF9EVMKwRMKvUl9FTkRfRFTCsETCr1JfVVNFUsKwVMKvUl9QQVRIwrBUwq9TSVpFX1BDS8KwTsKvU0laRV9EQVRBwrBOwq9TSVpFX0FUQ0jCsE7Cr1NJWkVfRE9DU8KwTsKvREFURV9JTsKwRMKvUl9EQVRBQkFTRcKwVMKvUl9TRVJWRVLCsFTCr1JfQ09NUFVURVLCsFTCrw==
Any ideas?
The file saved by your Delphi 7 program is UTF-8 encoded. I decoded the base64 that you supplied and look at it in a hex editor. It looks like this:
The first two bytes are C2 B1. That is the UTF-8 encoding for ±. You can check that here: https://mothereff.in/utf-8.
Use LoadFromFile(..., TEncoding.UTF8) to load the file, and SaveToFile(..., TEncoding.UTF8) to save it. That's all you need to do. Note that when you save this way then a BOM will be included in the file. If that is not desired then it possible to omit the BOM, as has been covered here before.
Do note that you must remove the call to StringReplace. That modifies the text and serves no useful purpose. You absolutely do not wish to replace U+2015 ― with U+00AF ¯.
Based on the comments to this answer it seems that you have some Delphi 7 code that produced UTF-8 encoded text which behaves incorrectly when executed by Delphi XE8. That's not surprising due to the change from ANSI to UTF-16. You will need to revisit this code and adapt it appropriately. It's impossible for us to say more given the fact the only you have this code.
It feels very much as though you are trying things almost at random and hoping for a quick fix. That is not productive. You will only make progress with a clear understanding of Unicode, and your program. You will need to step back, slow down, and fill in the gaps in your knowledge.
Currently I am trying to set application name using
net.rim.blackberry.api.homescreen.HomeScreen.setName("これはある");
but it throws exception: IllegalArgumentException.
Can anyone provide the solution?
I am using Blackberry JDE 5.0.
This is probably a string encoding problem. Try
new String(new String("これはある").getBytes("UTF-16BE"), "UTF-16BE");
It's not pretty but I think that will work.
Here's a link to the Blackberry string spec: http://www.blackberry.com/developers/docs/5.0.0api/java/lang/String.html
By default it's ISO-8859-1 which does not include Japanese characters.
The problem you are facing is how to get a string represented in your source code into your application with the same characters. For latin characters, this is pretty straightforward, as we can just put the characters in quotes, and get a string, like "Hello world"
When you go to non-latin, like Japanese, it gets harder. You can still directly write Japanese in your source code, but you need to make sure your editor and your compiler agree on an encoding so that the characters can be interpreted correctly. The Java-SE compiler takes an argument "-encoding" which allows you to specify the encoding of your java source files.
Unfortunately, rapc, the BlackBerry compiler, does not offer an option to specify encoding, even though it is invoking javac itself. So rapc uses the platform default, which is utf-8 on Linux and OSX and iso-8859-1 on Windows.
The way around this problem is to use a feature of the Java language for parsing strings - unicode escaping. By entering the six character sequence "\u3053" in a string, the java compiler will parse that number as hexidecimal and use the corresponding unicode code point, solving problems with source file encoding.
So "Hello world" and "\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064" will result in the same strings appearing in your class files.
Because of this, Svetlin's answer from the comments is the right approach here:
net.rim.blackberry.api.homescreen.HomeScreen.setName("\u3053\u308C\u306F\u3042\u308B");
My program is written in Delphi 7 and I want to avoid a Russian or a Chinese,
Korean try to use my soft because file paths contains Unicode chars and my program can t handle them yet (as long as I do not port my program on a new Delphi version supporting UNICODE).
How do I write a function detecting the "Unicode language" in Delphi 7?
A Delphi 7 program (in its VCL part) can handle Russian, Chinese or Korean characters without any problem.
If the Windows system language is properly set, the charset will match the corresponding encoding, and the file names will be able to have Unicode chars as available in this charset. In fact, default string=AnsiString is converted into Unicode when the VCL calls Windows APIs (all ....A() calls will do the conversion then call the ....W() version).
You can force the default code page (the one which will select the charset to be used) by calling code like this:
if GetThreadLocale<>LCID then // force locale settings if different
if SetThreadLocale(LCID) then
GetFormatSettings; // resets all locale-specific variables
In this case, the TFileName (=AnsiString) in the current system charset will be converted by Windows into the corresponding Unicode characters, and you'll be able to use it in your Delphi 7 application.
What you can't do with the standard VCL AnsiString use it to directly mix charsets, as you can since Delphi 2009, thanks to the new string = UnicodeString default paradigm.
PS:
Since the CharSet only involve #128..#255 chars (i.e. all with bit 7 set), if you use only #0..#127 chars, your string will be consistent whatever the current charset/codepage setting is. If you use only English chars and numbers e.g., your path will always work, whatever the charset/codepage is. But if you use non English chars, the path will only work if the charset/codepage is correctly set, which is the case for a path used by an end-user (using a TOpenDialog at runtime for instance).
Is it possible to unmangle names like these in Delphi?
If so, where do I get more information?
Example of an error message where it cannot find a certain entry in the dbrtl100.bpl
I want to know which exact function it cannot find (unit, class, name, parameters, etc).
---------------------------
myApp.exe - Entry Point Not Found
---------------------------
The procedure entry point #Dbcommon#GetTableNameFromSQLEx$qqrx17System#WideString25Dbcommon#IDENTIFIEROption could not be located in the dynamic link library dbrtl100.bpl.
---------------------------
OK
---------------------------
I know it is the method GetTableNameFromSQLEx in the Dbcommon unit (I have Delphi with the RTL/VCL sources), but sometimes I bump into apps where not all code is available for (yes, clients should always buy all the source code for 3rd party stuff, but sometimes they don't).
But say this is an example for which I do not have the code, or only the interface files (BDE.INT anyone?)
What parameters does it have (i.e. which potential overload)?
What return type does it have?
Is this mangling the same for any Delphi version?
--jeroen
Edit 1:
Thanks to Rob Kennedy: tdump -e dbrtl100.bpl does the trick. No need for -um at all:
C:\WINDOWS\system32>tdump -e dbrtl100.bpl | grep GetTableNameFromSQLEx
File STDIN:
00026050 1385 04AC __fastcall Dbcommon::GetTableNameFromSQLEx(const System::WideString, Dbcommon::IDENTIFIEROption)
Edit 2:
Thanks to TOndrej who found this German EDN article (English Google Translation).
That article describes the format pretty accurately, and it should be possible to create some Delphi code to unmangle this.
Pitty that the website the author mentions (and the email) are now dead, but good to know this info.
--jeroen
There is no function provided with Delphi that will unmangle function names, and I'm not aware of it being documented anywhere. Delphi in a Nutshell mentions that the "tdump" utility has a -um switch to make it unmangle symbols it finds. I've never tried it.
tdump -um -e dbrtl100.bpl
If that doesn't work, then it doesn't look like a very complicated scheme to unmangle yourself. Evidently, the name starts with "#" and is followed by the unit name and function name, separated by another "#" sign. That function name is followed by "$qqrx" and then the parameter types.
The parameter types are encoded using the character count of the type name followed by the same "#"-delimited format from before.
The "$" is necessary to mark the end of the function name and the start of the parameter types. The remaining mystery is the "qqrx" part. That's revealed by the article Tondrej found. The "qqr" indicates the calling convention, which in this case is register, a.k.a. fastcall. The "x" applies to the parameter and means that it's constant.
The return type doesn't need to be encoded in the mangled function name because overloading doesn't consider return types anyway.
Also see this article (in German).
I guess the mangling is probably backward-compatible, and new mangling schemes are introduced in later Delphi versions for new language features.
If you have C++Builder, check out $(BDS)\source\cpprtl\Source\misc\unmangle.c - it contains the source code for the unmangling mechanism used by TDUMP, the debugger and the linker. (C++Builder and Delphi use the same mangling scheme.)
From the Delphi 2007 source files:
function GetTableNameFromSQLEx(const SQL: WideString; IdOption: IDENTIFIEROption): WideString;
This seems to be the same version, since I also have the same .BPL in my Windows\System32 folder.
Source can be found in [Program Files folders]\CodeGear\RAD Studio\5.0\source\Win32\db
Borland/Codegear/Embarcadero has used this encoding for a while now and never gave many details about the .BPL format. I've never been very interested in them since I hate using runtime libraries in my projects. I prefer to compile them into my projects, although this will result in much bigger executables.
I have a set of compiled Delphi dcu files, without source. Is there a way to determine what types are defined inside that dcu?
To find out what's in a unit named FooUnit, type the following in your editor:
unit Test;
interface
uses FooUnit;
var
x: FooUnit.
Press Ctrl+Space at the end, and the IDE will present a list of possible completion values, which should consist primarily, if not exclusively, of type names.
You could have a look at DCU32INT, a Delphi DCU decompiler. It generates an .int file that is somehow readable but not compilable, but if you only want to determine the types defined, this could be enough.
The DCU format is undocumented, last I checked. However, there is a tool I found that might give you some basic info called DCUtoPAS. It's not very well rated on the site, but it might at least extract the types for you. There is also DCU32INT, which might help as well.
Otherwise, you might just have to open the file with a hex editor and dig around for strings.