TStringList splitting bugs - delphi

Recently I've been informed by a reputable SO user, that TStringList has splitting bugs which would cause it to fail parsing CSV data. I haven't been informed about the nature of these bugs, and a search on the internet including Quality Central did not produce any results, so I'm asking. What are TStringList splitting bugs?
Please note, I'm not interested in unfounded opinion based answers.
What I know:
Not much... One is that, these bugs show up rarely with test data, but not so rarely in real world.
The other is, as stated, they prevent proper parsing of CSV. Thinking that it is difficult to reproduce the bugs with test data, I am (probably) seeking help from whom have tried using a string list as a CSV parser in production code.
Irrelevant problems:
I obtained the information on a 'Delphi-XE' tagged question, so failing parsing due to the "space character being considered as a delimiter" feature do not apply. Because the introduction of the StrictDelimiter property with Delphi 2006 resolved that. I, myself, am using Delphi 2007.
Also since the string list can only hold strings, it is only responsible for splitting fields. Any conversion difficulty involving field values (f.i. date, floating point numbers..) arising from locale differences etc. are not in scope.
Basic rules:
There's no standard specification for CSV. But there are basic rules inferred from various specifications.
Below is demonstration of how TStringList handles these. Rules and example strings are from Wikipedia. Brackets ([ ]) are superimposed around strings to be able to see leading or trailing spaces (where relevant) by the test code.
Spaces are considered part of a field and should not be ignored.
Test string: [1997, Ford , E350]
Items: [1997] [ Ford ] [ E350]
Fields with embedded commas must be enclosed within double-quote characters.
Test string: [1997,Ford,E350,"Super, luxurious truck"]
Items: [1997] [Ford] [E350] [Super, luxurious truck]
Fields with embedded double-quote characters must be enclosed within double-quote characters, and each of the embedded double-quote characters must be represented by a pair of double-quote characters.
Test string: [1997,Ford,E350,"Super, ""luxurious"" truck"]
Items: [1997] [Ford] [E350] [Super, "luxurious" truck]
Fields with embedded line breaks must be enclosed within double-quote characters.
Test string: [1997,Ford,E350,"Go get one now
they are going fast"]
Items: [1997] [Ford] [E350] [Go get one now
they are going fast]
In CSV implementations that trim leading or trailing spaces, fields with such spaces must be enclosed within double-quote characters.
Test string: [1997,Ford,E350," Super luxurious truck "]
Items: [1997] [Ford] [E350] [ Super luxurious truck ]
Fields may always be enclosed within double-quote characters, whether necessary or not.
Test string: ["1997","Ford","E350"]
Items: [1997] [Ford] [E350]
Testing code:
var
SL: TStringList;
rule: string;
function GetItemsText: string;
var
i: Integer;
begin
for i := 0 to SL.Count - 1 do
Result := Result + '[' + SL[i] + '] ';
end;
procedure Test(TestStr: string);
begin
SL.DelimitedText := TestStr;
Writeln(rule + sLineBreak, 'Test string: [', TestStr + ']' + sLineBreak,
'Items: ' + GetItemsText + sLineBreak);
end;
begin
SL := TStringList.Create;
SL.Delimiter := ','; // default, but ";" is used with some locales
SL.QuoteChar := '"'; // default
SL.StrictDelimiter := True; // required: strings are separated *only* by Delimiter
rule := 'Spaces are considered part of a field and should not be ignored.';
Test('1997, Ford , E350');
rule := 'Fields with embedded commas must be enclosed within double-quote characters.';
Test('1997,Ford,E350,"Super, luxurious truck"');
rule := 'Fields with embedded double-quote characters must be enclosed within double-quote characters, and each of the embedded double-quote characters must be represented by a pair of double-quote characters.';
Test('1997,Ford,E350,"Super, ""luxurious"" truck"');
rule := 'Fields with embedded line breaks must be enclosed within double-quote characters.';
Test('1997,Ford,E350,"Go get one now'#10#13'they are going fast"');
rule := 'In CSV implementations that trim leading or trailing spaces, fields with such spaces must be enclosed within double-quote characters.';
Test('1997,Ford,E350," Super luxurious truck "');
rule := 'Fields may always be enclosed within double-quote characters, whether necessary or not.';
Test('"1997","Ford","E350"');
SL.Free;
end;
If you've read it all, the question was :), what are "TStringList splitting bugs?"

Not much... One is that, these bugs show up rarely with test data, but not so rarely in real world.
All it takes is one case. Test data is not random data, one user with one failure case should submit the data and voilà, we've got a test case. If no one can provide test data, maybe there's no bug/failure?
There's no standard specification for CSV.
That one sure helps with the confusion. Without a standard specification, how do you prove something is wrong? If this is left to one's own intuition, you might get into all kinds of troubles. Here's some from my own happy interaction with government issued software; My application was supposed to export data in CSV format, and the government application was supposed to import it. Here's what got us into a lot of trouble several years in a row:
How do you represent empty data? Since there's no CSV standard, one year my friendly gov decided anything goes, including nothing (two consecutive commas). Next they decided only consecutive commas are OK, that is, Field,"",Field is not valid, should be Field,,Field. Had a lot of fun explaining to my customers that the gov app changed validation rules from one week to the next...
Do you export ZERO integer data? This was probably an bigger abuse, but my "gov app" decided to validate that also. At one time it was mandatory to include the 0, then it was mandatory NOT to include the 0. That is, at one time Field,0,Field was valid, next Field,,Field was the only valid way...
And here's an other test-case where (my) intuition failed:
1997, Ford, E350, "Super, luxurious truck"
Please note the space between , and "Super, and the very lucky comma that follows "Super. The parser employed by TStrings only sees the quote char if it immediately follows the delimiter. That string is parsed as:
[1997]
[ Ford]
[ E350]
[ "Super]
[ luxurious truck"]
Intuitively I'd expect:
[1997]
[ Ford]
[ E350]
[Super luxurious truck]
But guess what, Excel does it the same way Delphi does it...
Conclusion
TStrings.CommaText is fairly good and nicely implemented, at least the Delphi 2010 version I looked at is quite effective (avoids multiple string allocations, uses a PChar to "walk" the parsed string) and works about the same as Excel's parser does.
In the real world you'll need to exchange data with other software, written using other libraries (or no libraries at all), where people might have miss-interpreted some of the (missing?) rules of CSV. You'll have to adapt, and it'll probably not be a case of right-or-wrong but a case of "my clients need to import this crap". If that happens, you'll have to write your own parser, one that adapts to the requirements of the 3rd party app you'd be dealing with. Until that happens, you can safely use TStrings. And when it does happen, it might not be TString's fault!

I'm going to go out on a limb and say that the most common failure case is the embedded linebreak. I know most of the CSV parsing I do ignores that. I'll use 2 TStringLists, 1 for the file I'm parsing, the other for the current line. So I'll end up with code similar to the following:
procedure Foo;
var
CSVFile, ALine: TStringList;
s: string;
begin
CSVFile := TStringList.Create;
ALine := TStringList.Create;
ALine.StrictDelimiter := True;
CSVFile.LoadFromFile('C:\Path\To\File.csv');
for s in CSVFile do begin
ALine.CommaText := s;
DoSomethingInteresting(ALine);
end;
end;
Of course, since I'm not taking care to make sure that each line is "complete", I can potentially run into cases where the input contains a quoted linebreak in a field and I miss it.
Until I run into real world data where it's an issue, I'm not going to bother fixing it. :-P

Another example... this TStringList.CommaText bug exists in Delphi 2009.
procedure TForm1.Button1Click(Sender: TObject);
var
list : TStringList;
begin
list := TStringList.Create();
try
list.CommaText := '"a""';
Assert(list.Count = 1);
Assert(list[0] = 'a');
Assert(list.CommaText = 'a'); // FAILS -- actual value is "a""
finally
FreeAndNil(list);
end;
end;
The TStringList.CommaText setter and related methods corrupt the memory of the string that holds the a item (its null terminator character is overwritten by a ").

Already tried use TArray<String> split?
var
text: String;
arr: TArray<String>;
begin
text := '1997,Ford,E350';
arr := text.split([',']);
So arr would be:
arr[0] = 1997;
arr[1] = Ford;
arr[2] = E350;

Related

Searching for words in MS Word Document from Delphi with RegEx and import to Delphi app

I am working with our lab report system and want to automate some of the tasks. The system we use is not intuitive and uses word documents to enter data. There are several paragraphs with headings (protected headings).
I want to copy a phrase in one of the paragraphs and paste it into another paragraph using a Delphi app
GetActiveOleObject('Word.Application');
How can I use a RegEx for that. The good thing is the searchable phrases I want to copy are in uppercase while everything else is sentence case. example:
3rd paragraph heading:---> Receiver Notes <---- this is not editable in the document (protected)
the specimen is received in CONTAINER OF FORMALIN at this workstation
the specimen is received FRESH WITH NO FIXATIVE at this workstation
my result has to be something like:
4th paragraph heading --->Methods of Receiving <------ protected again
CONTAINER OF FORMALIN <----- here is where I want to paste from the first match
FRESH WITH NO FIXATIVE <----- and here the second match … etc
So my feeling is to have a delphi code to search between paragraph heading "Receiver Note" and "Methods of Receiving" for those in upper case and list them in the next paragraph.
I use delphi xe3 and I know how to use regex with other files but not in word using delphi. Any input, code snippets, examples, etc would be much appreciated!
Ok I finally got this to work and I am posting the code if incase someone needs this. I had to copy the document to my delphi Memo and work it there with regex and then paste it back where I want. Although the process may seem cumbersome, it executes very fast. The word documents I work with are usually one or two pages anyways.
procedure TForm1.Button1Click(Sender: TObject);
var
DXRANGE, DXWORD: OleVariant;
n : Integer;
regexpr: TRegEx;
Match: TMatch;
begin
try
DXWORD := GetActiveOleObject('Word.Application');
DXRANGE := DXWORD.Documents.Item(1)
.Range(DXWORD.Documents.Item(1).Range.Start, DXWORD.Documents.Item(1)
.Range.End);
DXRANGE.Copy;
Memo1.Clear;
Memo1.PasteFromClipBoard;
regexpr := TRegEx.Create('\b[A-Z][A-Z][A-Z]+(?:\s+[A-Z]+)*\b');
Match := regexpr.Match(Memo1.Text);
n := 1;
Memo2.Clear;
while Match.Success do
begin
Memo2.Lines.Add(IntToStr(n) + Match.Value);
Memo2.Lines.Add('');
Match := Match.NextMatch;
n := n + 1;
end;
Memo2.SelectAll;
Memo2.CopyToClipboard;
DXWORD.Selection.PasteSpecial(wdPasteRTF)
except
on E: exception do
begin
ShowMessage(E.Message);
end;
end;
end;
As a general rule when working with Word (or any office app) and ActiveX Delphi component, is to use the amazing macro recorder to see how it would do it.
eg.
Open your word document
Select [Record Macro] from the tools menu
Do your search
Copy it to the clipboard
Replace your code
Do Whatever else you need to do.
Stop Macro
Now open up the macro VBA organiser and look at the code VBA has generated for what you did. This will give you a very good idea of the functions you need to get your delphi code to call.

Error because of quote char after converting file to string with Delphi XE?

I have incorrect result when converting file to string in Delphi XE. There are several ' characters that makes the result incorrect. I've used UnicodeFileToWideString and FileToString from http://www.delphidabbler.com/codesnip and my code :
function LoadFile(const FileName: TFileName): ansistring;
begin
with TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite) do
begin
try
SetLength(Result, Size);
Read(Pointer(Result)^, Size);
// ReadBuffer(Result[1], Size);
except
Result := '';
Free;
end;
Free;
end;
end;
The result between Delphi XE and Delphi 6 is different. The result from D6 is correct. I've compared with result of a hex editor program.
Your output is being produced in the style of the Delphi debugger, which displays string variables using Delphi's own string-literal format. Whatever function you're using to produce that output from your own program has actually been fixed for Delphi XE. It's really your Delphi 6 output that's incorrect.
Delphi string literals consist of a series of printable characters between apostrophes and a series of non-printable characters designated by number signs and the numeric values of each character. To represent an apostrophe, write two of them next to each other. The printable and non-printable series of characters can be written right not to each other; there's no need to concatenate them with the + operator.
Here's an excerpt from the output you say is correct:
#$12'O)=ù'dlû'#6't
There are four lone apostrophes in that string, so each one either opens or closes a series of printable characters. We don't necessarily know which is which when we start reading the string at the left because the #, $, 1, and 2 characters are all printable on their own. But if they represent printable characters, then the 0, ), =, and ù characters are in the non-printable region, and that can't be. Therefore, the first apostrophe above opens a printable series, and the #$12 part represents the character at code 18 (12 in hexadecimal). After the ù is another apostrophe. Since the previous one opened a printable string, this one must close it. But the next character after that is d, which is not #, and therefore cannot be the start of a non-printable character code. Therefore, this string from your Delphi 6 code is mal-formed.
The correct version of that excerpt is this:
#$12'O)=ù''dlû'#6't
Now there are three lone apostrophes and one set of doubled apostrophes. The problematic apostrophe from the previous string has been doubled, indicating that it is a literal apostrophe instead of a printable-string-closing one. The printable series continues with dlû. Then it's closed to insert character No. 6, and then opened again for t. The apostrophe that opens the entire string, at the beginning of the file, is implicit.
You haven't indicated what code you're using to produce the output you've shown, but that's where the problem was. It's not there anymore, and the code that loads the file is correct, so the only place that needs your debugging attention is any code that depended on the old, incorrect format. You'd still do well to replace your code with that of Robmil since it does better at handling (or not handling) exceptions and empty files.
Actually, looking at the real data, your problem is that the file stores binary data, not string data, so interpreting this as a string is not valid at all. The only reason it works at all in Delphi 6 is that non-Unicode Delphi allows you to treat binary data and strings the same way. You cannot do this in Unicode Delphi, nor should you.
The solution to get the actual text from within the file is to read the file as binary data, and then copy any values from this binary data, one byte at a time, to a string if it is a "valid" Ansi character (printable).
I will suggest the code:
function LoadFile(const FileName: TFileName): AnsiString;
begin
with TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite) do
try
SetLength(Result, Size);
if Size > 0 then
Read(Result[1], Size);
finally
Free;
end;
end;

Wrong Unicode conversion, how to store accent characters in Delphi 2010 source code and handle character sets?

We are upgrading our project from Delphi 2006 to Delphi 2010. Old code was:
InputText: string;
InputText := SomeTEditComponent.Text;
...
for i := 1 to length(InputText) do
if InputText[i] in ['0'..'9', 'a'..'z', 'Ř' { and more special characters } ] then ...
Trouble is with accent letters - compare will fail.
I tried switch source code from ANSI to UTF8 and LE UCS-2 but without luck. Only cast as AnsiChar works:
if CharInSet(AnsiChar(InputText[i]), ['0'..'9', 'a'..'z', 'Ř']) then
Funny is how Delphi works with that letters - try this in Evaluate during debugging:
Ord('Ř') = Ord('Ø')
(yes, Delphi says True, on Windows 7 Czech)
Question is: How can I store and compare simple strings without forcing them as AnsiStrings? Because if this is not working why we should use Unicode?
Thanks all for reply
Right now we are using in some parts simple CharInSet(AnsiChar(...
The declaration of CharInSet is
function CharInSet(C: AnsiChar; const CharSet: TSysCharSet): Boolean; overload; inline;
function CharInSet(C: WideChar; const CharSet: TSysCharSet): Boolean; overload; inline;
while TSysCharSet is
TSysCharSet = set of AnsiChar;
Thus CharInSet can only compare to a set of AnsiChar. That is why your accented character is converted to AnsiChar.
There is no equivalent to a set of WideChar as sets are limited to 256 elements. You have to implement some other means to check the character.
Something like
const
specials: string = 'Ř';
if CharInSet(InputText[i], ['0'..'9', 'a'..'z']) or (Pos(InputText[I], specials) > 0) then
might be a try. You can add more characters to specials as needed.
Don't rely on the encoding of your Delphi source code files.
It might be mangled when using any non-Unicode tool to work on your text files (or even buggy Unicode aware tools).
The best way is to specify your characters as a 4-digit Unicode code point.
const
MyEuroSign = #$20AC;
See also my blog posting about this.
As mentioned by Uwe Raabe, the problem with Unicode char is, they're pretty large. If Delphi allowed you to create an "set of Char" it would be 8 Kb in size! An "set of AnsiChar" is only 32 bytes in size, pretty manageable.
I'd like to offer some alternatives. First is a sort of drop-in replacement for the CharInSet function, one that uses an array of CHAR to do the tests. It's only merit is that it can be called immediately from almost anywhere, but it's benefits stop there. I'd avoid this if I can:
function UnicodeCharInSet(UniChr:Char; CharArray:array of Char):Boolean;
var i:Integer;
begin
for i:=0 to High(CharArray) do
if CharArray[i] = UniChr then
begin
Result := True;
Exit;
end;
Result := False;
end;
The trouble with this function is that it doesn't handle the x in ['a'..'z'] syntax and it's slow! The alternatives are faster, but aren't as close to a drop-in replacement as one might want. The first set of alternatives to be investigated are the string functions from Microsoft. Amongst them there's IsCharAlpha and IsCharAlphanumeric, they might fix lots of issues. The problem with those, all "alpha" chars are the same: You might end up with valid Alpha chars in non-enlgish non-czech languages. Alternatively you can use the TCharacter class from Embarcadero - the implementation is all in the Character.pas unit, and it looks effective, I have no idea how effective Microsoft's implementation is.
An other alternative is to write your own functions, using an "case" statement to get things to work. Here's an example:
function UnicodeCharIs(UniChr:Char):Boolean;
var i:Integer;
begin
case UniChr of
'ă': Result := True;
'ş': Result := False;
'Ă': Result := True;
'Ş': Result := False;
else Result := False;
end;
end;
I inspected the assembler generated for this function. While Delphi has to implement a series of "if" conditions for this, it does it very effectively, way better then implementing the series of IF statements from code. But it could use a lot of improvement.
For tests that are used ALOT you might want to look for some bit-mask based implementation.
You should either use IFs instead of IN or find a WideCharSet implementation. This might help if you have a lot of sets: http://code.google.com/p/delphilhlplib/source/browse/trunk/Library/src/Extensions/DeHL.WideCharSet.pas.
You have stumbled onto a case where an idiom from Pre-Unicode Pascal should not be translated directly into the most visually similar idiom in Unicode era pascal.
First, let's deal with unicode string literals. If you can always be sure you will never have any body ever use your source code with any tool that could mess up your encodings
then you could use Unicode literals. Personally, I would not like to see Unicode codepoints in string literals in any of my code, for various reasons, the strongest reason being that my code may need to be reviewed for internationalization at some point, and having literals that belong to your local language peppered through your code is even more of a problem when you use a language other than those which use the simple Ascii/Ansi codepage symbols. Your source code will be more readable if you keep in mind the assumption that your accented characters, and even non-accented character literals would be better declared, as Jeroen says to declare them, in the const section, away from your actual place in the code that you use them.
Consider the case where you use the same string literal thirty three times throughout your code. Why should it be repeated instead of a constant? And even when it is used only once, isn't the code more readable if you declare a sane constant name?
So, first you should declare constants like he shows.
Second, the CharInSet function is deprecated for all uses other than the use it was intended for which is where you must continue to use the "Set of AnsiChar" types. This is no longer a recommended approach in Delphi 2009/2010, and using arrays of literal unicode characters, in your constant section, would be more readable, and more up-to-date.
I suggest you use the JCL StrContainsChars function and avoid character sets, since
you can not declare an inline SET of Unicode Characters at all, the language does not allow it. Instead use this, and be sure to comment it:
implementation
uses
JclStrings;
const
myChar1 = #$2001;
myChar2 = #$2002;
myChar3 = #$2003;
myMatchList1 : Array[0..2] of Char = (myChar1,myChar2,myChar3);
function Match(s:String):Boolean;
begin
result := StrContainsChars( s, myMatchList1,false);
end;
String, and Character Literals are bad to have peppering your code, especially character or numeric literals, are called "Magic values" and are to be avoided.
P.S. Your debug assertion shows that Ord('?') is downcasting the unicode character quietly to an AnsiChar byte-size character in the debugger. This behaviour is unexpected and should probably logged in QC.

Convert Hi-Ansi chars to Ascii equivalent (é -> e)

Is there a routine available in Delphi 2007 to convert the characters in the high range of the ANSI table (>127) to their equivalent ones in pure ASCII (<=127) according to a locale (codepage)?
I know some chars cannot translate well but most can, esp. in the 192-255 range:
À → A
à → a
Ë → E
ë → e
Ç → C
ç → c
– (en dash) → - (hyphen - that can be trickier)
— (em dash) → - (hyphen)
WideCharToMultiByte does best-fit mapping for any characters that aren't supported by the specified character set, including stripping diacritics. You can do exactly what you want by using that and passing 20127 (US-ASCII) as the codepage.
function BestFit(const AInput: AnsiString): AnsiString;
const
CodePage = 20127; //20127 = us-ascii
var
WS: WideString;
begin
WS := WideString(AInput);
SetLength(Result, WideCharToMultiByte(CodePage, 0, PWideChar(WS),
Length(WS), nil, 0, nil, nil));
WideCharToMultiByte(CodePage, 0, PWideChar(WS), Length(WS),
PAnsiChar(Result), Length(Result), nil, nil);
end;
procedure TForm1.Button1Click(Sender: TObject);
begin
ShowMessage(BestFit('aÀàËëÇç–—€¢Š'));
end;
Calling that with your examples produces results you're looking for, including the emdash-to-minus case, which I don't think is handled by Jeroen's suggestion to convert to Normalization form D. If you did want to take that approach, Michael Kaplan has a blog post the explicitly discusses stripping diacritics (rather than normalization in general), but it uses C# and an API that was introduces in Vista. You can get something similar using the FoldString api (any WinNT release).
Of course if you're only doing this for one character set, and you want to avoid the overhead from converting to and from a WideString, Padu is correct that a simple for loop and a lookup table would be just as effective.
Just to extend Craig's answer for Delphi 2009:
If you use Delphi 2009 and newer, you can use a more readable code with the same result:
function OStripAccents(const aStr: String): String;
type
USASCIIString = type AnsiString(20127);//20127 = us ascii
begin
Result := String(USASCIIString(aStr));
end;
Unfortunately, this code does work only on MS Windows. On Mac, the accents are not replaced by best-fitted characters but by question marks.
Obviously, Delphi internally uses WideCharToMultiByte on Windows whereas on Mac iconv is used (see LocaleCharsFromUnicode in System.pas).
The question is if this different behaviour on different OS should be considered as bug and reported to CodeCentral.
I believe your best bet is creating a lookup table.
What you are looking for is normalization.
Michael Kaplan wrote a nice blog article about normalization.
It does not immediately solve your problem, but points you in the right direction.
--jeroen

Delphi 2009 RawByteString vagaries

Suppose that for some perverse reason you want to display the raw byte contents of a UTF8String.
var
utf8Str : UTF8String;
begin
utf8Str := '€ąćęłńóśźż';
end;
(1) This doesn't do, it displays the readable form:
memo1.Lines.Add( RawByteString( utf8Str ));
// output: '€ąćęłńóśźż'
(2) This, however, does "work" - note the concatenation:
memo1.Lines.Add( 'x' + RawByteString( utf8Str ));
// output: 'x€ąćęłńóśźż'
I understand (1), though the compiler's forced coerction to UnicodeString seems to prevent ever displaying a RawByteString var as-is. However, why does the behavior change in (2)?
(3) Stranger still - let's reverse the concatenation:
memo1.Lines.Add( RawByteString( utf8Str ) + 'x' );
// output: '€ąćęłńóśźżx'
I've been reading up on the newfangled string types in Delphi and thought I understood how they work, but this is a puzzle.
RawByteString only exists to minimize the number of overloads required for functions that work with various flavours of AnsiStrings with different codepage affinities.
In general, don't declare variables of type RawByteString. Don't typecast values to that type. Don't do concatenations on variables of that type. About the only things you can do are:
Declaring a parameter of this type (the original intent)
Indexing on such a parameter
Searching in such a parameter
Intelligent operations that check the actual code page of the string, using the StringCodePage function.
For example, you'll note that the StringCodePage function itself uses RawByteString as its argument type. This way, it will work with any AnsiString, rather than doing a codepage translation before passing it as an argument.
For your case, things like concatenations are largely undefined. The behaviour changed between RTM and Update 2, but when the RTL string concatenation functions receive multiple strings with different code pages, there's no easy way for it to figure out what code page should be used for the final string. That's just one reason why you shouldn't concatenate them like you do here.
You cannot add a string to a TMemo "as is". You always need to so some kind of conversion to Unicode, because that's all TMemo knows about in Delphi 2009.
If you want to pretend that your UTF8String uses code page 1252, do this:
var
utf8Str : UTF8String;
Raw: RawByteString;
begin
utf8Str := '€ąćęłńóśźż';
Raw := utf8Str;
SetCodePage(Raw, 1252, False);
Memo.Lines.Add(Raw);
end;
For more details, see my article Using RawByteString Effectively

Resources