Why can't I perform this operation:
var
data:pbyte;
x:int64;
o:pointer;
begin
o:=data+x;
end;
PChar is a pointer to char, but it receives special support from the compiler to allow pointer arithmetic to make C-like string manipulations easier in Delphi. PByte is just a plain old typed pointer, and does not receive any special attention from the compiler to allow pointer arithmetic.
In Delphi 2009, a new compiler directive was introduced ($POINTERMATH ON/OFF) which allows you to add compiler support for pointer arithmetic to your own pointer type declarations.
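As a minimal sketch of the directive described above (assuming Delphi 2009 or later, or Free Pascal in Delphi mode), the program below declares its own pointer type while $POINTERMATH is on and then uses `+` and `[]` on it. The names `PSample` and `Values` are made up for illustration.

```pascal
program PointerMathDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$POINTERMATH ON}   // enable arithmetic and indexing on typed pointers

type
  PSample = ^Integer;   // hypothetical pointer type, declared while the directive is on

var
  Values: array[0..3] of Integer = (10, 20, 30, 40);
  P, Q: PSample;
begin
  P := @Values[0];
  Q := P + 2;        // advances by 2 * SizeOf(Integer) bytes
  Writeln(Q^);       // prints 30
  Writeln(P[3]);     // prints 40
end.
```

Note that the offset is scaled by the size of the pointed-to type, so `P + 2` moves two elements, not two bytes.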
pbyte = ^byte;
pchar = ^char;
In old Delphi versions (prior to D2009), SizeOf(char)=SizeOf(byte), i.e., 8-bit.
In D2009 and later, char is 16-bit whereas byte remains 8-bit, so that:
SizeOf(byte)=1
SizeOf(char)=2
To allow modifying pointers by e.g. adding values etc., you can use $POINTERMATH ON (available in D2009 and later, see here). The alternative is to follow the pattern:
NewPointer:= Pointer(Integer(OldPointer)+IntegerValue)
Edit1 -- Note that (as pointed out in comments to another answer), also inc() and dec() work with typed pointers; they will increment/decrement a PMyType by SizeOf(TMyType).
Edit2 -- For future-proofing your code, you should consider that SizeOf(Pointer) will probably change in future 64-bit Delphi versions, so that the relationship SizeOf(Integer)=SizeOf(Pointer) will no longer hold. To circumvent this, recent Delphi versions define the types NativeInt and NativeUInt, which are integers that have the same size as a pointer.
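A possible pointer-sized version of the cast pattern, sketched under the assumption that NativeUInt is available (as Edit2 describes). The helper name `OffsetPointer` is hypothetical, and it only handles forward (non-negative) offsets.

```pascal
program NativeUIntOffset;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: moves P forward by Offset bytes.
// NativeUInt has the same size as Pointer on 32-bit and 64-bit targets,
// so the cast does not truncate the address.
function OffsetPointer(P: Pointer; Offset: NativeUInt): Pointer;
begin
  Result := Pointer(NativeUInt(P) + Offset);
end;

var
  Buffer: array[0..3] of Byte = (11, 22, 33, 44);
  P: PByte;
begin
  P := OffsetPointer(@Buffer[0], 2);
  Writeln(P^);   // prints 33
end.
```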
I'm converting some old code from Delphi 5 to XE5.
It has this piece of code:
Boolean(RecBuf[0]) := False;
RecBuf is PChar.
This works in Delphi 7, but not in XE5.
In XE5, it gives a "Left side cannot be assigned to" error.
How to implement this code in XE5?
In Delphi 7, PChar is an alias for PAnsiChar. That is, pointer to 8 bit ANSI character. Which means that RecBuf[0] has type AnsiChar. Since AnsiChar and Boolean have the same size, the cast is valid.
In Delphi XE5, PChar is an alias for PWideChar, pointer to 16 bit wide character. And so RecBuf[0] has type WideChar. Thus the cast is invalid because WideChar and Boolean have different size.
Exactly how best to fix the problem cannot be discerned from the code that you have shown here. Quite possibly you need to redeclare RecBuf. Perhaps it needs to be declared as PAnsiChar. Although one does wonder why you are casting a character to a Boolean.
Another possibility is that the reason RecBuf is declared as PChar is to allow you to use the index operator [], something that in older versions of Delphi is a special capability of pointers to character types. In modern Delphi you can use {$POINTERMATH ON} to enable that functionality for all typed pointers. So, if you did that then perhaps RecBuf should be PBoolean.
The bottom line is that whilst we can explain why the compiler complains about your code, we cannot give you a definitive solution.
Judging from the name RecBuf, it is used not as a pointer to a string but as a pointer to a buffer allocated in memory.
If my assumption is correct, you may want to redeclare RecBuf as a variable of type PByte.
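A rough sketch of that suggestion, assuming RecBuf really is a raw memory buffer. The `Storage` array is a stand-in invented for this example, since the original allocation is not shown.

```pascal
program RecBufDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

var
  Storage: array[0..15] of Byte;   // hypothetical stand-in for the real record buffer
  RecBuf: PByte;
begin
  FillChar(Storage, SizeOf(Storage), $FF);
  RecBuf := @Storage[0];
  // Byte and Boolean are both one byte wide, so the variable typecast is accepted.
  Boolean(RecBuf^) := False;
  Writeln(RecBuf^);      // prints 0
  Writeln(Storage[1]);   // prints 255, the neighbouring byte is untouched
end.
```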
You can't assign cast stuff on the left, you need to cast the right side.
Instead of
Newtype(Whatever) := NewtypeConstant;
you should use
Whatever := TUnknown(constant)
In your case, if RecBuf is an array of, say, Byte, then the code puts False (0) in it, so it should be
Recbuf[0] := Byte(False);
Or you could, you know, just put zero in there.
Now that there is a 64-bit Delphi compiler, we have to deal with 64-bit pointers.
I'm wondering whether it makes any difference if we use NativeInt or NativeUInt. For example,
Should I use
Pointer(NativeUInt(Pointer(Buffer)) + LongWord(datawrote))^,
or
Pointer(NativeInt(Pointer(Buffer)) + LongWord(datawrote))^,
Does it matter? which is better style?
The simplest thing to do is cast the pointer to PByte. Then you can perform arithmetic on that:
PByte(Buffer) + offset
That expression is of type PByte and so you may need to cast it back to some other pointer type.
As a general rule, pointers are not integers and you should resist the temptation to cast them to integers. It is almost always best to let pointers be pointers. You can always perform pointer arithmetic on PAnsiChar, PWideChar and PByte, and for other pointer types you can use {$POINTERMATH ON} to enable pointer arithmetic.
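A small sketch of the PByte approach, assuming Delphi 2009 or later. The extra directive for Free Pascal is an assumption added so the same file has a chance of building there, and the variable names are illustrative.

```pascal
program PByteOffsetDemo;
{$IFDEF FPC}{$MODE DELPHI}{$POINTERMATH ON}{$ENDIF}  // FPC settings are an assumption
{$APPTYPE CONSOLE}

var
  Buffer: Pointer;
  Offset: Integer;
  Target: PByte;
begin
  GetMem(Buffer, 16);
  try
    FillChar(Buffer^, 16, 0);
    Offset := 4;
    // No integer casts: the pointer stays a pointer on 32-bit and 64-bit targets.
    Target := PByte(Buffer) + Offset;
    Target^ := $7F;
    Writeln((PByte(Buffer) + 4)^);   // prints 127
  finally
    FreeMem(Buffer);
  end;
end.
```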
Is there any function in freepascal to show the Unicode symbol by its code (e.g. U+1D15E)? Unfortunately Chr() works only with ANSI symbols (with codes less than 127).
I want to use symbols from custom symbolic font and it is very inconvenient to put them into sourcecode directly (they are shown in Lazarus as ? or something else because they are absent in system fonts).
Take a look at this page. I assume that Freepascal either uses UTF-16, in which it becomes a surrogate pair of two WideChars (see table) or UTF-8, in which it becomes a sequence of byte values (see table again).
UTF-8:
const
HalfNoteString = UTF8String(#$F0#$9D#$85#$9E);
UTF-16:
const
HalfNoteString = UnicodeString(#$D834#$DD5E);
The names of the string types may differ, as I don't know FreePascal very well. Perhaps AnsiString and WideString.
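The UTF-16 values above can be derived rather than looked up. The sketch below applies the standard surrogate-pair arithmetic to U+1D15E; the procedure name `CodePointToSurrogates` is made up for this example, and it assumes the code point is above $FFFF.

```pascal
program SurrogateDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: splits a code point in the range $10000..$10FFFF
// into its UTF-16 high and low surrogates.
procedure CodePointToSurrogates(CodePoint: Cardinal; out HighSur, LowSur: Word);
var
  V: Cardinal;
begin
  V := CodePoint - $10000;              // 20 significant bits remain
  HighSur := Word($D800 + (V shr 10));  // top 10 bits
  LowSur := Word($DC00 + (V and $3FF)); // bottom 10 bits
end;

var
  H, L: Word;
begin
  CodePointToSurrogates($1D15E, H, L);
  Writeln(IntToHex(H, 4), ' ', IntToHex(L, 4));   // prints D834 DD5E
end.
```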
I have never used Free Pascal, but if I were you, I'd try
var
s: WideChar; // 16-bit character type; plain Char is 8-bit in Free Pascal by default
begin
s := WideChar($222b); // Just cast a word
or, if the compiler is really stubborn,
var
s: WideChar;
begin
PWord(@s)^ := $222b; // Forcibly write a word
Current unicode status of FPC to my best knowledge
The codepage of literals can be set with $codepage http://www.freepascal.org/docs-html/prog/progsu81.html
FPC 2.4.x+ does have unicodestring (since it is +/- Kylix widestring), but with only basic routine support (pos and copy, not routines like format), and the "record" lacks the codepage field.
Lazarus widgets expect UTF8 in normal ansistrings (D7..D2007 ansistrings without codepage data), and programmers must manually insert conversions if necessary. So on Windows the widgets ARE mostly using unicode (-W) calls, but take ansistrings with UTF8 in them.
FPC doesn't follow the utf8 in ansistring scheme, so for some string-accepting routines in sysutils, there are special routines in Lazarus that assume UTF8 and call the -W variants.
FPC ansistring is the system default 1-byte encoding. ansi on Windows, utf8 on most other platforms.
Trunk, 2.7.1, provides support for the new D2009+ ansistring (with codepages).
There has been no discussion yet how to deal with the default stringtype (e.g. will "string" be utf8string on *nix and unicodestring on Windows, or unicodestring or utf8string everywhere?)
Other unicodestring related enhancement (like encoding parameters to tstringlist.savetofile) are not implemented. Likewise for the pseudo objects (like TCharacter which are afaik mostly static)
Update: 2.7.1 has a variable encoding ansistring type, and lazarus has been fixed to keep working. Nothing is really taking advantage from it yet though, e.g. most of the RTL still uses -A calls, and prototypes of sysutils and system procedures that takes strings haven't changed to rawbytestring yet.
I assume the problem is to convert from UCS4 encoding (which is actually a Unicode codepoint number) to UTF16.
In Delphi, you can use UCS4StringToUnicodeString function.
Warning: Be careful with UCS4String type. It is actually a zero-terminated dynamic array, not a string (that means it is zero-based).
var
S1: UCS4String;
S: string;
begin
SetLength(S1, 2);
S1[0]:= UCS4Char($1D15E);
S1[1]:= UCS4Char(0);
S:= UCS4StringToUnicodeString(S1);
ShowMessage(Format('%d, %x, %x', [Length(S), Ord(S[1]), Ord(S[2])]));
end;
I had a similar question to this here: Delphi XE - should I use String or AnsiString? . After deciding that it is right to use ANSI strings in a (large) library of mine, I have realized that I can actually use RawByteString instead of ANSI. Because I mix UNICODE strings with ANSI strings, my code now has quite a few places where it does conversions between them. However, it looks like if I use RawByteString I get rid of those conversions.
Please let me know your opinion about it.
Thanks.
Update:
This seems to be disappointing. It looks like the compiler still makes a conversion from RawByteString to string.
procedure TForm1.FormCreate(Sender: TObject);
var x1, x2: RawByteString;
s: string;
begin
x1:= 'a';
x2:= 'b';
x1:= x1+ x2;
s:= x1; { <------- Implicit string cast from 'RawByteString' to 'string' }
end;
I think it does some internal workings (such as copying data) and my code will not be much faster and I will still have to add lots of typecasts in my code in order to silence the compiler.
RawByteString is an AnsiString with no code page set by default.
When you assign another string to this RawByteString variable, you'll copy the code page of the source string. And this will include a conversion. Sorry.
But there is another use of RawByteString, which is to store plain byte content (e.g. the content of a database BLOB field, just like an array of byte).
To summarize:
RawByteString should be used as a "code page agnostic" parameter to a method or function;
RawByteString can be used as a variable type to store some BLOB data.
If you want to reduce conversions, and would rather use 8-bit char strings in your application, you should:
Not use the generic AnsiString type, which depends on the current system code page, and with which you'll lose data;
Rely on UTF-8 encoding, i.e. an 8-bit code page / charset which won't lose any data when converted from or to a UnicodeString;
Not let the compiler show warnings about implicit conversions: all conversions should be made explicit;
Use your own dedicated set of functions to handle your UTF-8 content.
That is exactly what we did for our framework. We wanted to use UTF-8 in its kernel because:
We rely on UTF-8 encoded JSON for data transmission;
Memory consumption will be smaller;
The used SQLite3 engine will store text as UTF-8 in its database file;
We wanted a way of handling Unicode text with no loss of data in all versions of Delphi (from Delphi 6 up to XE), and WideString was not an option because it's dead slow and you've got the same problem of implicit conversions.
But, in order to achieve the best speed, we wrote some optimized functions to handle our custom string type:
{{ RawUTF8 is an UTF-8 String stored in an AnsiString
- use this type instead of System.UTF8String, which behavior changed
between Delphi 2009 compiler and previous versions: our implementation
is consistent and compatible with all versions of Delphi compiler
- mimic Delphi 2009 UTF8String, without the charset conversion overhead
- all conversion to/from AnsiString or RawUnicode must be explicit }
{$ifdef UNICODE} RawUTF8 = type AnsiString(CP_UTF8); // Codepage for an UTF8string
{$else} RawUTF8 = type AnsiString; {$endif}
/// our fast RawUTF8 version of Trim(), for Unicode only compiler
// - this Trim() is seldom used, but this RawUTF8 specific version is needed
// by Delphi 2009/2010/XE, to avoid two unnecessary conversions into UnicodeString
function Trim(const S: RawUTF8): RawUTF8;
/// our fast RawUTF8 version of Pos(), for Unicode only compiler
// - this Pos() is seldom used, but this RawUTF8 specific version is needed
// by Delphi 2009/2010/XE, to avoid two unnecessary conversions into UnicodeString
function Pos(const substr, str: RawUTF8): Integer; overload; inline;
And we reserved the RawByteString type for handling BLOB data:
{$ifndef UNICODE}
/// define RawByteString, as it does exist in Delphi 2009/2010/XE
// - to be used for byte storage into an AnsiString
// - use this type if you don't want the Delphi compiler to do any
// code page conversions when you assign a typed AnsiString to a RawByteString,
// i.e. a RawUTF8 or a WinAnsiString
RawByteString = AnsiString;
/// pointer to a RawByteString
PRawByteString = ^RawByteString;
{$endif}
/// create a File from a string content
// - uses RawByteString for byte storage, whatever the codepage is
function FileFromString(const Content: RawByteString; const FileName: TFileName;
FlushOnDisk: boolean=false): boolean;
Source code is available in our repository. In this unit, UTF-8 related functions were deeply optimized, with versions in both Pascal and asm for better speed. We sometimes overloaded default functions (like Pos) to avoid conversion. More information about how we handled text in the framework is available here.
Last word:
If you are sure that you will only have 7-bit content in your application (no accented characters), you may use the default AnsiString type in your program. But in this case, you should add the AnsiStrings unit to your uses clause to get overloaded string functions which will avoid most unwanted conversions.
RawByteString is still an "AnsiString." It is best described as a "universal receiver" which means it will take on whatever the source-string's codepage is at the point of assignment without forcing a codepage conversion. RawByteString was intended to be used only as a function parameter so that you will, as you've discovered, not incur a conversion between AnsiStrings with differing code-page affinities when calling utility functions which take AnsiStrings.
However, in the case above, you're assigning what is essentially an AnsiString to a UnicodeString which will incur a conversion. It must do a conversion because the RawByteString has a payload of 8bit-based characters, whereas a string (UnicodeString) has a payload of 16bit-based characters.
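One way to make such a conversion explicit, sketched for Delphi 2009 or later only (Free Pascal's string defaults differ, so it is not targeted here). It uses the RTL functions UTF8Encode and UTF8ToString; the variable names are illustrative.

```pascal
program ExplicitConversionDemo;
{$APPTYPE CONSOLE}

var
  U: UTF8String;
  S: string;      // UnicodeString in Delphi 2009+
begin
  S := 'caf'#$00E9;        // four UTF-16 code units, the last one is U+00E9
  U := UTF8Encode(S);      // explicit 16-bit to 8-bit conversion
  Writeln(Length(S));      // prints 4
  Writeln(Length(U));      // prints 5, because U+00E9 takes two bytes in UTF-8
  S := UTF8ToString(U);    // explicit conversion back, no implicit-cast warning
  Writeln(Length(S));      // prints 4
end.
```

The data is still converted at each assignment; spelling it out only makes the cost visible and silences the implicit-cast warning.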
I'm upgrading some ancient (from 2003) Delphi code to Delphi Architect XE and I'm running into a few problems. I am getting a number of errors where there are incompatible types. These errors don't happen in Delphi 6 so I must assume that this is because things have been upgraded.
I honestly don't know what the difference between PAnsiChar and PWideChar is, but Delphi sure knows the difference and won't let me compile. If I knew what the differences were maybe I could figure out which to use or how to fix this.
The short version: prior to Delphi 2009, the native string type in Delphi was based on ANSI characters: each char in every string was represented as an 8-bit char. Starting with Delphi 2009, Delphi's strings became Unicode, using the UTF-16 encoding: the basic Char now uses 16 bits of data (2 bytes), and you probably don't need to know much about the Unicode code points that are represented as two consecutive 16-bit chars.
The 8-bit chars are called "Ansi Chars". A PAnsiChar is a pointer to 8-bit chars.
The 16-bit chars are called "Wide Chars". A PWideChar is a pointer to 16-bit chars.
Delphi knows the difference and is right not to let you mix the two!
More info
Here's a popular link on Unicode: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets
You can find some more information on migrating Delphi to Unicode here: New White Paper: Delphi Unicode Migration for Mere Mortals
You may also search SO for "Delphi Unicode migration".
A couple years ago, the default character type in Delphi was changed from AnsiChar (single-byte variable representing an ANSI character) to WideChar (two-byte variable representing a UTF16 character.) The char type is now an alias to WideChar instead of AnsiChar, the string type is now an alias to UnicodeString (a UTF-16 Unicode version of Delphi's traditional string type) instead of AnsiString, and the PChar type is now an alias to PWideChar instead of PAnsiChar.
The compiler can take care of a lot of the conversions itself, but there are a few issues:
If you're using string-pointer types, such as PChar, you need to make sure your pointer is pointing to the right type of data, and the compiler can't always verify this.
If you're passing strings to var parameters, the variable type needs to be exactly the same. This can be more complicated now that you've got two string types to deal with.
If you're using string as a convenient byte-array buffer for holding arbitrary data instead of a variable that holds text, that won't work as a UnicodeString. Make sure those are declared as RawByteString as a workaround.
Anyplace you're dealing with string byte lengths, for example when reading or writing to/from a TStream, make sure your code isn't assuming that a char is one byte long.
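The last point can be illustrated with a short sketch that writes a string to a stream using a byte count rather than a character count. It assumes a Delphi-style Classes unit (named System.Classes in newer versions); the variable names are illustrative.

```pascal
program StreamByteLength;
{$APPTYPE CONSOLE}

uses
  Classes;   // System.Classes when unit scope names are not configured

var
  Stream: TMemoryStream;
  S: string;
begin
  S := 'Hello';
  Stream := TMemoryStream.Create;
  try
    // Length(S) counts characters; multiply by SizeOf(Char) to get bytes.
    Stream.WriteBuffer(PChar(S)^, Length(S) * SizeOf(Char));
    // 10 bytes where Char is 16-bit (Delphi 2009+), 5 where Char is 8-bit.
    Writeln(Stream.Size);
  finally
    Stream.Free;
  end;
end.
```

Writing only `Length(S)` bytes would silently drop half of the text once Char is two bytes wide.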
Take a look at Delphi Unicode Migration for Mere Mortals for some more tricks and advice on how to get this to work. It's not as hard as it sounds, but it's not trivial either. Good luck!