How to program a small program? - delphi

I would like to program productively and keep my file size very small.
I would like a few tips on how to accomplish that.
For example, which of these is better for a small file size?
Either:
if .... = '1' then begin
...
end;
or:
if ..... = inttostr(1) then begin
...
end;
or:
if .... = inttostr($0001) then begin
...
end;
or:
case intvar of
1: ...
2: ...
end;
Then there is something I tried that surprised me.
I made another unit in my project that stores strings as constants, and I use those constants to replace the string literals in my project. For some reason this increases my file size, even though strings that were used twice are now replaced by a single constant.
Also, is it better to store things in variables than to put them directly into the code?
For example:
Function1(param1, param2, param3); // this code is used about 20 times in my project
or is it better if I:
Avar := Function1(param1, param2, param3); // store the result once in a variable and reuse it
And what about:
if ... = TRUE
or:
if ....
Same as:
if .... = FALSE
or:
if not(...)...
Any other tips for programming productively with a smaller file size?
Thanks in advance.
I use Delphi 7.

I'm sorry to be blunt, but you are putting the cart before the horse.
If you want to make your executable smaller but do not yet know what difference the code variations in your examples make, you should stop right now and read, learn, and practice until you know more about the language and the compiler.
Then you'll understand that your question makes little sense as asked, as you can already see from all the pertinent comments you got.
The exact same source code can result in vastly different executables, and different source code can result in the same executable, depending on the compiler options and optimizations.
If your main goal is to micro-manage the generated exe, program directly in assembler.
If you want to know what the compiler generates, learn how to use the CPU View.
Program for correctness first, then for readability and maintainability.
Only then, if needed (which implies using correct metrics and tools), optimize for speed, memory usage, or file size (probably the least useful or needed).

A long time ago, I tried to make a program as small as possible, because it had to fit onto a floppy disk (yes, I know I'm a dinosaur). This Splitter tool was written in Delphi and is about 50 KB in size.
To get it this small, it was necessary to do without a lot of Delphi's units (especially the Forms unit and all units that reference it). The only way was to use the Windows API directly for the GUI, and I can't think of a reason to do this nowadays, other than out of interest.
As soon as you use the VCL, the exe size will grow far more than all the micro-optimizations in your code together can reduce it.
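To illustrate the point about avoiding the Forms unit, here is a minimal sketch of a program that uses only the Windows unit. The program name and the message text are made up for illustration, and the resulting exe size depends on the compiler version and options, so no figure is claimed here.

```pascal
program TinySketch;

uses
  Windows; // only the Windows API declarations, no Forms or other VCL units

begin
  // MessageBox is a plain Windows API call, so no VCL form is needed.
  MessageBox(0, 'Hello from a program without the VCL', 'TinySketch', MB_OK);
end.
```

Comparing the size of this exe with that of an empty VCL forms application built with the same settings should show how much the VCL itself contributes.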


Why can't I use an Int64 in a for loop?

I can write a for..do loop with an Integer counter, but I can't write one with an Int64 counter.
For example:
var
  i: Int64;
begin
  for i := 1 to 1000 do
    ; // empty loop body
end;
The compiler refuses to compile this. Why?
The Delphi compiler simply does not support Int64 loop counters yet.
The loop counter of a for loop has to be a 32-bit Integer (or a smaller ordinal type).
This is an optimization to speed up the execution of a for loop.
Internally Delphi always uses an Int32, because on x86 this is the fastest datatype available.
This is documented somewhere deep in the manual, but I don't have a link handy right now.
If you must have a 64 bit loop counter, use a while..do or repeat..until loop.
Even if the compiler did allow "int64" in a Delphi 7 for-loop (Delphi 7???), it probably wouldn't complete iterating through the full range until sometime after the heat death of the Sun.
So why can't you just use an "integer"?
If you must use an int64 value ... then simply use a "while" loop instead.
Problem solved :)
Why use an Int64 in a for loop?
That is easy to answer:
You do not need a lot of iterations to need an Int64; just loop from 5E9 to 5E9+2 (three iterations in total).
It is simply that the values being iterated over are bigger than an Int32 can hold.
An example:
procedure Why_Int64_Would_Be_Great_On_For_Loop;
const
  StartValue = 5000000000; // start from 5E9 (five billion)
  Quantity = 10; // do it ten times
var
  Index: Int64;
begin
  for Index := StartValue to StartValue + Quantity - 1 do
  begin
    // do something really fast (only ten times)
  end;
end;
That code would take no time at all; it is just that the index values need to be far beyond the 32-bit integer limit.
The solution is to do it with a while loop:
procedure Equivalent_For_Loop_With_Int64_Index;
const
  StartValue = 5000000000; // start from 5E9 (five billion)
  Quantity = 10; // do it ten times
var
  Index: Int64;
begin
  Index := StartValue;
  while Index <= StartValue + Quantity - 1 do // same upper bound as the for loop above
  begin
    // do something really fast (only ten times)
    Inc(Index);
  end;
end;
So why does the compiler refuse to compile the for loop? I see no real reason. Any for loop can be automatically translated into a while loop, and a pre-compiler pass could do that before compilation (like other optimizations that are done). The only reason I can see is that the people who created the compiler were lazy and did not think of it.
If the for loop is optimized so that it can only use a 32-bit index, then code that uses a 64-bit index cannot be optimized in the same way, so why not let the optimizer change it for us? It only gives programmers a bad image!
I do not want to make anyone angry.
I am only saying something obvious.
By the way, not everyone starts a for loop at zero (or one); sometimes there is a need to start at really huge values.
It is always said that if you need to do something a fixed number of times, you should use a for loop instead of a while loop.
I can also say this: the two versions, the for loop and the while loop that uses Inc(Index), are equally fast. You might expect the while loop to be slower if you write the step as Index:=Index+1; but it is not, because the optimizer sees that and uses Inc(Index) instead. You can see it by doing the following:
// I want the loop to start from zero, but I first do some maths on Index so that the optimizer cannot convert Index:=Index+Step; into Inc(Index,Step); or, better still, into Inc(Index);
Index:=2;
Step:=Index-1; // Do not write Step:=1; or the optimizer will do the conversion to Inc()
Index:=Step-1; // Now fix the start, so that the loop starts from zero
while Index<1000000 do // 1E6, one million iterations, from 0 to 999999
begin
  // Do something
  Index:=Index+Step; // The optimizer will not change this into Inc(Index), since it sees that Step was assigned a computed value before
end;
The optimizer can see that a variable does not change its value, so it can treat it as a constant; an increment that adds a constant (variable:=variable+constant) is then optimized to Inc(variable,constant), and when that constant is 1 it is optimized further to Inc(variable). Such optimizations are very noticeable at the machine-code level.
At the machine-code level:
A normal add (variable:=variable1+variable2) implies two memory reads, one addition, and one memory write: a lot of work.
But if it is (variable:=variable+othervariable), it can be optimized by holding variable inside the processor cache.
If it is (variable:=variable1+constant), it can also be optimized by holding the constant in the processor cache.
And if it is (variable:=variable+constant), both are held in the processor cache, so it is hugely faster than the other options; no access to RAM is needed.
In the same way, the optimizer does another important optimization: for-loop index variables are held in processor registers, which are much faster than the processor cache.
Most modern processors do an extra optimization as well (at hardware level, inside the processor): some cache areas (32-bit variables, for us) that are seen to be used intensively are stored in special registers to speed up access, and for-loop and while-loop indexes are among them. But as I said, it is most modern AMD processors (the ones that use MP technology) that do this; I do not yet know of any Intel processor that does. Such an optimization is more relevant for multi-core and super-computing, so maybe that is why AMD has it and Intel does not.
I only wanted to show one "why"; there are a lot more. Another could be as simple as the index being stored in a database field of type Int64. There are a lot of reasons I know of and a lot more I do not know yet.
I hope this helps to explain the need for a loop with an Int64 index, and how to do it without losing speed by converting the loop efficiently into a while loop.
Note: for x86 builds (not for 64-bit compilation), beware that an Int64 is managed internally as two Int32 parts, so modifying values needs extra code. For additions and subtractions the extra cost is very low, but for multiplications and divisions it is noticeable. Still, if you really need Int64, you need it, so what else can you do? And imagine if you needed Float or Double!
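If the number of iterations fits in 32 bits and only the values are large, another option is to keep an Integer loop counter and derive the Int64 value from it. The following is a minimal sketch under that assumption; the program name is hypothetical, and StartValue and Quantity mirror the constants used in the example above.

```pascal
program Int64OffsetSketch;
{$APPTYPE CONSOLE}

const
  StartValue: Int64 = 5000000000; // 5E9, beyond the 32-bit range
  Quantity = 10;                  // number of iterations, fits in an Integer

var
  i: Integer;   // 32-bit loop counter, accepted by the for statement
  Index: Int64; // 64-bit value derived from the counter

begin
  for i := 0 to Quantity - 1 do
  begin
    Index := StartValue + i; // the addition is carried out in 64 bits
    Writeln(Index);
  end;
end.
```

This does not help when the iteration count itself exceeds the 32-bit range; in that case the while loop shown earlier is still needed.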

TStringList of objects taking up tons of memory in Delphi XE

I'm working on a simulation program.
One of the first things the program does is read in a huge file (28 MB, about 79'000 lines), parse each line (about 150 fields), create an object for it, and add it to a TStringList.
It also reads in another file, which adds more objects during the run. At the end, it ends up being about 85'000 objects.
I was working with Delphi 2007, and the program used a lot of memory, but it ran OK. I upgraded to Delphi XE, and migrated the program over and now it's using a LOT more memory, and it ends up running out of memory half way through the run.
In Delphi 2007 it would end up using 1.4 GB after reading in the initial file, which is obviously a huge amount, but in XE it ends up using almost 1.8 GB, which is really huge and leads to running out of memory and getting the error.
So my questions are:
Why is it using so much memory?
Why is it using so much more memory in XE than in 2007?
What can I do about this? I can't change how big or long the file is, and I do need to create an object for each line and store it somewhere.
Thanks
Just one idea which may save memory.
You could let the data stay on the original files, then just point to them from in-memory structures.
For instance, this is what we do for browsing big log files almost instantly: we memory-map the log file content, then we parse it quickly to create indexes of useful information in memory, then we read the content dynamically. No string is created during the reading. There are only pointers to the beginning of each line, with dynamic arrays containing the needed indexes. Calling TStringList.LoadFromFile would definitely be much slower and more memory consuming.
The code is here - see the TSynLogFile class. The trick is to read the file only once, and make all indexes on the fly.
For instance, here is how we retrieve a line of text from the UTF-8 file content:
function TMemoryMapText.GetString(aIndex: integer): string;
begin
  if (self=nil) or (cardinal(aIndex)>=cardinal(fCount)) then
    result := '' else
    result := UTF8DecodeToString(fLines[aIndex],GetLineSize(fLines[aIndex],fMapEnd));
end;
We use the exact same trick to parse JSON content. Such a mixed approach is used by the fastest XML access libraries.
To handle your high-level data and query it fast, you may try dynamic arrays of records with our optimized TDynArray and TDynArrayHashed wrappers (in the same unit). Arrays of records will consume less memory, will be faster to search because the data won't be fragmented (even faster if you use ordered indexes or hashes), and will still give you high-level access to the content (you can define custom functions to retrieve the data from the memory-mapped file, for instance). Dynamic arrays don't suit fast deletion of items (or you'll have to use lookup tables), but you wrote that you are not deleting much data, so that won't be a problem in your case.
So you won't have any duplicated structure any more, only logic in RAM, and data on memory-mapped file(s) - I added a "s" here because the same logic could perfectly map to several source data files (you need some "merge" and "live refresh" AFAIK).
It's hard to say, without seeing the code and the class declarations, why your 28 MB file expands to 1.4 GB when you parse it into objects. Also, you say you're storing it in a TStringList instead of a TList or TObjectList. This sounds like you're using it as some sort of string->object key/value mapping. If so, you might want to look at the TDictionary class in the Generics.Collections unit in XE.
As for why you're using more memory in XE, it's because the string type changed from an ANSI string to a UTF-16 string in Delphi 2009. If you don't need Unicode, you could use a TDictionary with AnsiString keys to save space.
Also, to save even more memory, there's another trick you could use if you don't need all 79,000 of the objects right away: lazy loading. The idea goes something like this:
Read the file into a TStringList. (This will use about as much memory as the file size. Maybe twice as much if it gets converted into Unicode strings.) Don't create any data objects.
When you need a specific data object, call a routine that checks the string list and looks up the string key for that object.
Check if that string has an object associated with it. If not, create the object from the string and associate it with the string in the TStringList.
Return the object associated with the string.
This will keep both your memory usage and your load time down, but it's only helpful if you don't need all (or a large percentage) of the objects immediately after loading.
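The lazy-loading steps above can be sketched roughly as follows. This is only an illustration under several assumptions: TDataObject, CreateFromLine and GetDataObject are hypothetical names, the fields are assumed to be comma-separated, the sample lines stand in for the real file (which would be read with LoadFromFile), and the lookup is done by index rather than by a string key.

```pascal
program LazyLoadSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical data class; the real one would hold the parsed fields.
  TDataObject = class
  public
    Fields: TStringList;
    constructor CreateFromLine(const Line: string);
    destructor Destroy; override;
  end;

constructor TDataObject.CreateFromLine(const Line: string);
begin
  inherited Create;
  Fields := TStringList.Create;
  Fields.CommaText := Line; // assumes comma-separated fields
end;

destructor TDataObject.Destroy;
begin
  Fields.Free;
  inherited;
end;

// Returns the object for a line, creating it only on first access.
function GetDataObject(Lines: TStringList; Index: Integer): TDataObject;
begin
  Result := TDataObject(Lines.Objects[Index]);
  if Result = nil then
  begin
    Result := TDataObject.CreateFromLine(Lines[Index]);
    Lines.Objects[Index] := Result; // remember it for the next lookup
  end;
end;

var
  Lines: TStringList;
  i: Integer;
begin
  Lines := TStringList.Create;
  try
    Lines.Add('1,Smith,A'); // sample data standing in for the real file
    Lines.Add('2,Jones,B');
    // Only the second line is parsed here; the first stays a plain string.
    Writeln(GetDataObject(Lines, 1).Fields[1]);
    for i := 0 to Lines.Count - 1 do
      Lines.Objects[i].Free; // Free is safe to call on nil
  finally
    Lines.Free;
  end;
end.
```

The string list does not own the objects in this sketch, so they are freed explicitly before the list itself.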
In Delphi 2007 (and earlier), a string is an Ansi string, that is, every character occupies 1 byte of memory.
In Delphi 2009 (and later), a string is a Unicode string, that is, every character occupies 2 bytes of memory.
AFAIK, there is no way to make a Delphi 2009+ TStringList object use Ansi strings. Are you really using any of the features of the TStringList? If not, you could use an array of strings instead.
Then, naturally, you can choose between
type
TAnsiStringArray = array of AnsiString;
// or
TUnicodeStringArray = array of string; // In Delphi 2009+,
// string = UnicodeString
Reading through the comments, it sounds like you need to lift the data out of Delphi and into a database.
From there it is easy to match organ donors to receivers *):
SELECT pw.* FROM patients_waiting pw
INNER JOIN organs_available oa ON (pw.bloodtype = oa.bloodtype)
AND (pw.tissuetype = oa.tissuetype)
AND (pw.organ_needed = oa.organ_offered)
WHERE oa.id = '15484'
This query shows the patients that might match new organ donor 15484.
In memory you only handle the few patients that match.
*) simplified beyond all recognition, but still.
In addition to Andreas' post:
Before Delphi 2009, a string header occupied 8 bytes. Starting with Delphi 2009, a string header takes 12 bytes. So every unique string uses 4 bytes more than before, + the fact that each character takes twice the memory.
Also, starting with Delphi 2010 I believe, TObject started using 8 bytes instead of 4. So for each single object created by delphi, delphi now uses 4 more bytes. Those 4 bytes were added to support the TMonitor class I believe.
If you're in desperate need of saving memory, here's a little trick that could help if you have a lot of string values that repeat themselves.
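var
  uUniqueStrings: TStringList; // must be created with Sorted := True, because Find needs a sorted list
function ReduceStringMemory(const S: string): string;
var
  idx: Integer;
begin
  // Reuse the pooled copy of S if there is one; otherwise add S to the pool.
  if not uUniqueStrings.Find(S, idx) then
    idx := uUniqueStrings.Add(S);
  Result := uUniqueStrings[idx];
end;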
Note that this will help ONLY if you have a lot of string values that repeat themselves. For example, this code uses 150 MB less on my system:
var
  sl: TStringList;
  I: Integer;
begin
  sl := TStringList.Create;
  try
    for I := 0 to 5000000 do
      sl.Add(ReduceStringMemory(StringOfChar('A', 5)));
  finally
    sl.Free;
  end;
end;
I also read in a lot of strings in my program that can approach a couple of GB for large files.
Short of waiting for 64-bit XE2, here is one idea that might help you:
I found storing individual strings in a stringlist to be slow and wasteful in terms of memory. I ended up blocking the strings together. My input file has logical records, which may contain between 5 and 100 lines. So instead of storing each line in the stringlist, I store each record. Processing a record to find the line I need adds very little time to my processing, so this is possible for me.
If you don't have logical records, you might just want to pick a blocking size, and store every (say) 10 or 100 strings together as one string (with a delimiter separating them).
The other alternative is to store them in a fast and efficient on-disk file. The one I'd recommend is the open source Synopse Big Table by Arnaud Bouchez.
May I suggest you try the JEDI Code Library (JCL) class TAnsiStringList, which is like the TStringList from Delphi 2007 in that it is made up of AnsiStrings.
Even then, as others have mentioned, XE will be using more memory than Delphi 2007.
I really don't see the value of loading the full text of a giant flat file into a stringlist. Others have suggested a bigtable approach such as Arnaud Bouchez's one, or using SqLite, or something like that, and I agree with them.
I think you could also write a simple class that will load the entire file you have into memory, and provide a way to add line-by-line object links to a giant in-memory ansichar buffer.
Starting with Delphi 2009, not only strings but also every TObject has doubled in size. (See Why Has the Size of TObject Doubled In Delphi 2009?). But this would not explain the increase if there are only 85,000 objects. Their size could be a relevant part of the memory usage only if these objects contain many nested objects.
Are there many duplicate strings in your list? Storing only unique strings might help reduce the memory size. See my question about a string pool for a possible (but maybe too simple) answer.
Are you sure you don't suffer from a case of memory fragmentation?
Be sure to use the latest FastMM (currently 4.97), then take a look at the UsageTrackerDemo demo that contains a memory map form showing the actual usage of the Delphi memory.
Finally take a look at VMMap that shows you how your process memory is used.

Does avoiding functions increase the performance?

Here is a little test:
function inc(n:integer):integer;
begin
  n := n+1;
  result := n;
end;
procedure TForm1.Button1Click(Sender: TObject);
var
  start,i,n:integer;
begin
  n := 0;
  start := getTickCount;
  for i := 0 to 10000000 do begin
    inc(n);//calling inc function takes 73 ms
    //n := n+1; writing it directly takes 16 ms
  end;
  showMessage(inttostr(getTickCount-start));
end;
Yes, calling a function introduces an overhead. Before calling the function it's necessary to save the current state - which instruction was planned to execute next - and also to copy the function parameters. This requires extra work and extra time.
That's where inlining is helpful. If the compiler supports it, it can inject the function code directly at the call site and avoid the overhead. With good optimization of the surrounding code it can even decrease the amount of generated code.
This doesn't mean you need to avoid functions. In most cases the function body takes much longer to execute than the time needed to organize the call. Only in quite rare cases is the overhead worth optimizing. This should never be done without the help of a profiler; otherwise you waste time and most likely just get a lot of unmaintainable code.
Calling a function (whichever language you're working with) generally involves doing a bit more things, like saving some context, pushing parameters to some kind of stack, calling the function itself, reading the parameters, and then pushing the result back somewhere, returning from the function, extracting the return value, ...
So, of course, calling functions generally means having some overhead.
But the main point of functions is re-using parts of the code: maybe it will take a few microseconds more at execution, but if you only have to write some code once instead of 10 (or more) times, there is a huge gain, and that code will be much easier to maintain, which is really important in the long term.
Then again, you might want to avoid functions for really small pieces of code like the one you provided as an example (unless the language you're using provides some kind of inlining; that's the case for C, if I remember correctly, though I'm not sure about Delphi): the overhead of calling the function will be significant compared to the number of lines of code the function saves you from writing (here: none! On the contrary ^^).
But for bigger pieces of code, the overhead will be much smaller compared to the time taken to execute the code the function contains.
Premature optimization is the root of all evil...
Write correct and maintainable code using the known features (here the built-in pseudo(magic) procedure inc), benchmark it and refactor where it's needed for performance reason (if any).
I bet that in 99.9% of the cases, avoiding calling a function or procedure is not the solution.
Here is an example where adding a call to a procedure actually IS the optimization.
Only optimize when there is a bottleneck.
Your current code is perfectly fine for about 99.9% of the cases.
If it gets slow, use a profiler to point you at the bottleneck.
When the bottleneck appears to be in the inc function, then you can always inline your function by marking it with the 'inline' directive.
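As a rough illustration of the 'inline' directive mentioned above, here is a minimal sketch. The function is renamed to the hypothetical MyInc so that it does not clash with the built-in inc, and it assumes a compiler version that supports inline (as far as I know, Delphi 2005 and later, so not Delphi 7).

```pascal
program InlineSketch;
{$APPTYPE CONSOLE}

// The inline directive asks the compiler to expand the body at the call site.
// It is a request, not a guarantee.
function MyInc(n: Integer): Integer; inline;
begin
  Result := n + 1;
end;

var
  n: Integer;
begin
  n := 0;
  n := MyInc(n); // may compile to a plain increment, with no call
  Writeln(n);
end.
```

Whether the call is actually inlined can be checked in the CPU View; profiling first is still the way to find out whether it matters.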
I totally agree with Francois on this one.
One of the most expensive parts of a function call is the returning of the result.
If you did want to keep your program modular, but wanted to save a bit of time, change your function to a procedure and use a var parameter to retrieve the result.
So for your example:
procedure inc(var n:integer);
begin
n := n+1;
end;
should be considerably faster than using your inc function.
Also, in the loop in your example, you have the statement:
inc(n)
but this will not update the value of n. The loop will finish and n will have the value of 0. What you need instead is:
n := inc(n);
For your timings, do you have optimization on? If you do, then it may not be timing what you think it is. The value of n is not used by the program and may be optimized right out of it.
To make sure that n is used for the timings, you can simply display the value of n in your showMessage line.
Finally, inc is a built in procedure. It is not good practice to use the same function name as that of a built in procedure as it can cause doubts as to which procedure is being executed - yours or the built in one.
Change your function's name to myinc, and then do a third test with the built in inc procedure itself, to see if it is faster than n := n + 1;
As others before me said: yes, it does. Every line of code you write does. Functions need to store the current state of registers etc. before they can execute, and restore it afterwards.
But the overhead is so minimal that optimizing it means nothing. It is more important to have readable, well-structured code. Almost always. There may be rare cases when every nanosecond is important, but I cannot imagine one right now.
Look here for general guidelines about performance in delphi programs:
http://effovex.com/OptimalCode/opguide.htm
I just want to add some comments specific to Delphi:
I seem to remember that the resolution of getTickCount() is a bit too coarse for this kind of test (+/- 10-15 ms). You could use QueryPerformanceCounter() for a better result.
For small functions that are called many times (inside a processing loop, data conversion, ...), use INLINE (search the help).
But to really know how long a function takes and whether you should do something about it, use a profiler! I use http://www.prodelphi.de/; it's pretty simple, very useful, and the price is very reasonable compared to other profilers (i.e. +/- 50€ instead of 500€).
In Delphi, there is the inc() function. It's faster than "n := n+1" (because inc() is not really a function; the compiler replaces it with asm, i.e. there is no source code for the function inc()).
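A timing loop based on QueryPerformanceCounter() could look roughly like the sketch below. It is a console program rather than the button handler from the question, MyInc is a hypothetical stand-in for the question's function, and the result n is printed so that the loop cannot be optimized away. No timing figures are claimed; they depend on the machine and compiler settings.

```pascal
program TimingSketch;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical stand-in for the function being measured.
function MyInc(n: Integer): Integer;
begin
  Result := n + 1;
end;

var
  Freq, T0, T1: Int64;
  i, n: Integer;
begin
  QueryPerformanceFrequency(Freq); // counter ticks per second
  n := 0;
  QueryPerformanceCounter(T0);
  for i := 1 to 10000000 do
    n := MyInc(n);
  QueryPerformanceCounter(T1);
  // Printing n makes sure the loop result is actually used.
  Writeln(Format('n = %d, elapsed = %.3f ms', [n, (T1 - T0) * 1000.0 / Freq]));
end.
```

Running the measurement several times and comparing the runs gives a better picture than a single figure.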
All good comments.
Functions are supposed to be useful, that's why they're in the language. The assumption is that if they have a nominal cost, you are willing to pay that to get the utility they provide.
Here's the real problem with functions, no matter who writes them, but especially if somebody other than you wrote them.
They have an implied contract for what they're supposed to do, but they have no contract for how long they should take.
Usually the person who writes the function thinks "This function does something valuable, so the person who calls it will respect that, and use it sparingly."
Then the person who calls it thinks "This function does so much in only a single call that I can make my code really clean and powerful by calling it lots of times."
Now, with multiple layers of abstraction, this effect acts like compound interest.
So, the real performance problem with functions is not the cost of calls, it is the psychology of programmers, leading to exponential slowdown.
Fortunately, experience in performance tuning can ameliorate this problem.

Delphi constant bitwise expressions

Probably a stupid question, but it's an idle curiosity for me.
I've got a bit of Delphi code that looks like this;
const
KeyRepeatBit = 30;
...
// if bit 30 of lParam is set, mark this message as handled
if (Msg.lParam and (1 shl KeyRepeatBit) > 0) then
Handled:=true;
...
(the purpose of the code isn't really important)
Does the compiler see "(1 shl KeyRepeatBit)" as something that can be computed at compile time, and thus it becomes a constant? If not, would there be anything to gain by working it out as a number and replacing the expression with a number?
Yes, the compiler evaluates the expression at compile time and uses the result value as a constant. There's no gain in declaring another constant with the result value yourself.
EDIT: The_Fox is correct. Assignable typed constants (see {$J+} compiler directive) are not treated as constants and the expression is evaluated at runtime in that case.
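The difference between a true constant and a typed constant can be sketched as follows. The names KeyRepeatMask and KeyRepeatBitTyped and the test value are assumptions made for this illustration; the claim that the first mask is folded at compile time is the one made in the answer above and can be checked in the CPU View.

```pascal
program ConstFoldSketch;
{$APPTYPE CONSOLE}

const
  KeyRepeatBit = 30;                   // true constant
  KeyRepeatMask = 1 shl KeyRepeatBit;  // constant expression, usable in a const declaration
  KeyRepeatBitTyped: Integer = 30;     // typed constant, stored in memory
  // As far as I know, the next line would be rejected, because typed
  // constants are not allowed in constant expressions:
  // BadMask = 1 shl KeyRepeatBitTyped;

var
  lParam: Integer;
begin
  lParam := $40000001; // hypothetical message parameter with bit 30 set
  if (lParam and KeyRepeatMask) <> 0 then
    Writeln('bit 30 set (true constant)');
  // Here the shift is an ordinary expression on a variable-like value.
  if (lParam and (1 shl KeyRepeatBitTyped)) <> 0 then
    Writeln('bit 30 set (typed constant)');
end.
```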
You can make sure like this, for readability alone:
const
  KeyRepeatBit = 30;
  KeyRepeatMask = 1 shl KeyRepeatBit;
It converts it to a constant at compile time.
However, even if it didn't, this would have no noticeable impact on your application's performance.
You might handle a few thousand messages per second if your app is busy. Your old Pentium I can do gazillions of shifts and ands per second.
Keep your code readable, and profile it to find bottlenecks that you then optimize - usually by looking at the algorithm, and not such a low level as whether you're shifting or not.
I doubt that using a number (it would be 1073741824, by the way) here would really improve performance. You seem to be in some Windows message context here, and that will possibly add more delay than a single and, which is lightning fast even if the number is not optimized at compile time (anyway, I think it is optimized).
The only exception I could imagine would be the case that this particular piece of code is run really often, but as I said I think this gets optimized at compile time and so even in this case it won't make a difference at all.
Maybe it's off-topic for your question, but I use a case record (variant record) for this kind of thing, for example:
TlParamRecord = record
case Integer of
0: (
RepeatCount: Word;
ScanCode: Byte;
Flags: Set of (lpfExtended, lpfReserved=4, lpfContextCode,
lpfPreviousKeyState, lpfTransitionState);
);
1: (lParam: LPARAM);
end;
See the article on my blog for more details.

How to ensure 16byte code alignment of Delphi routines?

Background:
I have a unit of optimised Delphi/BASM routines, mostly for heavy computations. Some of these routines contain inner loops for which I can achieve a significant speed-up if the loop start is aligned to a DQWORD (16-byte) boundary. I can ensure that the loops in question are aligned as desired IF I know the alignment at the routine entry point.
As far as I can see, the Delphi compiler aligns procedures/functions to DWORD boundaries, and e.g. adding functions to the unit may change the alignment of subsequent ones. However, as long as I pad the end of routines to multiples of 16, I can ensure that subsequent routines are likewise aligned -- or misaligned, depending on the alignment of the first routine. I therefore tried to place the critical routines at the beginning of the unit's implementation section, and put a bit of padding code before them so that the first procedure would be DQWORD aligned.
This looks something like below:
interface
procedure FirstProcInUnit;
implementation
procedure __PadFirstProcTo16;
asm
  // variable number of NOP instructions here to get the desired code length
end;
procedure FirstProcInUnit;
asm // should start at DQWORD boundary
  // do something
  // padding to align the following label to DQWORD boundary
@Some16BAlignedLabel:
  // code, looping back to @Some16BAlignedLabel
  // do something else
  ret #params
  // padding to get code length to multiple of 16
end;
initialization
  __PadFirstProcTo16; // call this here so that it isn't optimised out
  ASSERT((NativeUInt(Pointer(@FirstProcInUnit)) AND $0F) = 0, 'FirstProcInUnit not DQWORD aligned');
end.
This is a bit of a pain in the neck, but I can get this sort of thing to work when necessary. The problem is that when I use such a unit in different projects, or make some changes to other units in the same project, this may still break the alignment of __PadFirstProcTo16 itself. Likewise, recompiling the same project with different compiler versions (e.g. D2009 vs. D2010) typically also breaks the alignment. So the only way I have found of doing this sort of thing is by hand, as pretty much the last thing to be done when all the rest of the project is in its final form.
Question 1:
Is there any other way to achieve the desired effect of ensuring that (at least some specific) routines are DQWORD-aligned?
Question 2:
Which are the exact factors that affect the compiler's alignment of code and (how) could I use such specific knowledge to overcome the problem outlined here?
Assume that for the sake of this question "don't worry about code alignment/the associated presumably small speed benefits" is not a permissible answer.
As of Delphi XE, the problem of code alignment is now easily solved using the $CODEALIGN compiler directive (see this Delphi documentation page):
{$CODEALIGN 16}
procedure MyAlignedProc;
begin
..
end;
One thing that you could do is add a 'magic' signature at the end of each routine, after an explicit ret instruction:
asm
...
ret
db <magic signature bytes>
end;
Now you could create an array containing pointers to each routine, scan the routines at run-time once for the magic signature to find the end of each routine and therefore its length. Then, you can copy them to a new block of memory that you allocate with VirtualAlloc using PAGE_EXECUTE_READWRITE, ensuring this time that each routine starts on a 16-byte boundary.
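The allocation and the 16-byte offset arithmetic for such a relocation could be sketched as below. This is only a partial illustration: the routine lengths are hypothetical values standing in for the lengths found by the signature scan, the block size of 4096 bytes is an arbitrary assumption, and the actual copying is not shown. In particular, copied code that contains relative calls or jumps to targets outside the routine would need fixing up, which this sketch does not attempt.

```pascal
program AlignSketch;
{$APPTYPE CONSOLE}

uses
  Windows;

const
  // Hypothetical routine lengths in bytes, as a signature scan might report them.
  RoutineSizes: array[0..2] of Cardinal = (37, 64, 5);

// Rounds Value up to the next multiple of 16.
function AlignUp16(Value: Cardinal): Cardinal;
begin
  Result := (Value + 15) and not Cardinal(15);
end;

var
  Block: Pointer;
  Offset: Cardinal;
  i: Integer;
begin
  // VirtualAlloc returns page-aligned memory, so offset 0 is 16-byte aligned.
  Block := VirtualAlloc(nil, 4096, MEM_COMMIT or MEM_RESERVE, PAGE_EXECUTE_READWRITE);
  if Block = nil then
    Halt(1);
  try
    Offset := 0;
    for i := Low(RoutineSizes) to High(RoutineSizes) do
    begin
      // Each routine would be copied to Block + Offset at this point.
      Writeln('routine ', i, ' would be copied to offset ', Offset);
      Offset := AlignUp16(Offset + RoutineSizes[i]);
    end;
  finally
    VirtualFree(Block, 0, MEM_RELEASE);
  end;
end.
```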

Resources