How to use 16-bits Assembly inline on Delphi on Windows98? - delphi

Today I was playing with my old computer and trying to use 16-bits Assembly inside Delphi. It's works fine with 32-bits but I always had problem when I used interrupts. Blue Screen or Freezing, that was making me believe that's not possible to do it. I'm on Windows 98 and using Delphi 7, using this simple code.
program Project1;
{$APPTYPE CONSOLE}
uses
SysUtils, Windows;
begin
asm
mov ax,$0301
mov bx,$0200
mov cx,$0001
xor dx,dx
int $13
int $20
end;
MessageBox(0,'Okay','Okay',MB_OK);
end.
To "format" a diskkete on the Floppy drive. There's a way to use it on Delphi 7 without freezing and blues screens? Or Delphi only allows to use 32-bits Assembly? Am I doing something wrong?

As long as your application is built as "32-bit Windows" application, the interrupts cannot work since these interrupts are simply not mapped.
You could try to compile your application as a "16-bit Console" application. I don't know if Delphi supports this, but that's my best guess for getting the emulation of int 0x13 and int 0x10.
By the way, shouldn't your assembly code use hexadecimal numbers, like this:?
mov ax, $0301
mov bx, $0200
mov cx, $0001
xor dx, dx
int $13
int $20
As it is now, you are probably calling interrupt $0d, which according to Ralf Brown's Interrupt List means:
INT 0D C - IRQ5 - FIXED DISK (PC,XT), LPT2 (AT), reserved (PS/2)

Delphi 7 produces 32 bit executables. Your 16 bit assembly code is therefore not compatible with the compiler you use.
You might have some luck with a 16 bit compiler, e.g. Turbo Pascal or Delphi 1. But it would make more sense, I suspect, to use the Win32 API to achieve your goals.

Related

Is there a way to manually change BIOS POST codes on motherboard LCD?

I was wondering if anyone knew if it was possible to change the BIOS POST Code that is displayed on the motherboard LCD. I want to develop a program that can manipulate the LCD screen on the motherboard to display any set of desired characters. I haven't been able to find anyone who has done something similar. Does anyone have any ideas on if this is possible? Thank You!
POST codes are usually displayed on LED devices on the motherboard, not LCD. Historically, POST codes can be output via IO port 0x80 on IBM/Intel compatible systems. Been a while since I have done x86 assembly, but would be something like this:
mov al, 41h ;41h, the value to output
out 80h, al ;send the value to IO port 80h
This will make "41" display on the POST code LEDs. If you have 4 LEDs (a four digit value), then use AX instead of AL or use port 81h and a second write.
mov ax, 5150h ;5150h, the value to output
out 80h, al ;send the value to IO port 80h
Note: as I recall in/out instructions are protected instructions and will generate an exeception when the CPU is in protected mode (e.g. from the Windows command line)

Memory usage in assembler 8086

I made a program in assembler 8086 for my class and everything is working just fine.
But beside making working program we have to make it use as low memory as possible. Could you give me some tips in that aspect? What should I write and what should I avoid?
The program is supposed to first print letter A on the screen and then in avery new line two more of letters of next letter in the alphabet, stop at Z and after pressing any key end program. For stopping until key is pressed i'm using:
mov ah,00h
int 16h
Is it good way to do it?
Most of what you want can be done in zero memory (counting only data, not the code itself). In general:
use registers rather than variables in memory
do not use push/pop
do not use subroutines
But to interact with the OS, you need to make BIOS calls and/or OS system calls; these require some memory (typically a small amount of stack space). In your case, you have to:
output characters to screen
wait for keypress
exit back to the OS
However, if you are serious about doing this in minimal memory, then there are a few hacks you can use.
Output characters to screen
On a PC, in traditional text mode, you can write characters straight to video RAM (address B800:0000 and further). This requires zero memory.
Wait for keypress
The cheapest way is to wait for a change of the BIOS keyboard buffer head (a change of the 16-bit content at address 041A hex). This requires zero memory.
See also: http://support.microsoft.com/kb/60140
Exit back to the OS
Try a simple ret; it is not recommended but it might just work in some versions of MS-DOS. An even uglier escape is to jump to F000:FFF0, which will reboot the machine. That's guaranteed to work in zero memory.
Use these instructions:
INC (Register*) instead of ADD (Register*), 1
DEC (Register*) instead of SUB (Register*), 1
XOR (Register)(same register) instead of MOV (Register), 0 (Doesn't work with variables)
SHR (Register*), 1 instead of DIV (Register*), 2
SHR (Register*), 2 instead of DIV (Register*), 4
..
SHL (Register*), 1 instead of MUL (Register*), 2
..
*Register or variable
These optimizations makes the program faster AND the size larger

Using SSE to round in Delphi

I wrote this function to round singles to integers:
function Round(const Val: Single): Integer;
begin
asm
cvtss2si eax,Val
mov Result,eax
end;
end;
It works, but I need to change the rounding mode. Apparently, per this, I need to set the MXCSR register.
How do I do this in Delphi?
The reason I am doing this in the first place is I need "away-from-zero" rounding (like in C#), which is not possible even via SetRoundingMode.
On modern Delphi, to set MXCSR you can call SetMXCSR from the System unit. To read the current value use GetMXCSR.
Do beware that SetMXCSR, just like Set8087CW is not thread-safe. Despite my efforts to persuade Embarcadero to change this, it seems that this particular design flaw will remain with us forever.
On older versions of Delphi you use the LDMXCSR and STMXCSR opcodes. You might write your own versions like this:
function GetMXCSR: LongWord;
asm
PUSH EAX
STMXCSR [ESP].DWord
POP EAX
end;
procedure SetMXCSR(NewMXCSR: LongWord);
//thread-safe version that does not abuse the global variable DefaultMXCSR
var
MXCSR: LongWord;
asm
AND EAX, $FFC0 // Remove flag bits
MOV MXCSR, EAX
LDMXCSR MXCSR
end;
These versions are thread-safe and I hope will compile and work on older Delphi versions.
Do note that using the name Round for your function is likely to cause a lot of confusion. I would advise that you do not do that.
Finally, I checked in the Intel documentation and both of the Intel floating point units (x87, SSE) offer just the rounding modes specified by the IEEE754 standard. They are:
Round to nearest (even)
Round down (toward −∞)
Round up (toward +∞)
Round toward zero (Truncate)
So, your desired rounding mode is not available.

Does Delphi support all MMX/SSE instructions?

I have this snippet of code:
#combinerows:
mov esi,eax
and edi,Row1Mask
and ebx,Row2Mask
or ebx,edi
//NewQ:= (Row1 and Row1Mask) or (Row2 and Row2Mask);
//Result:= NewQ xor q;
PUNPCKDQ mm4,mm5 <-- I get an error here
//mov eax,[eax].q
movd eax,mm4
//q:= NewQ;
mov [esi].q,ebx
xor eax,ebx //Return difference.
I get this error:
[Pascal Error] SDIMAIN.pas(718): E2003 Undeclared identifier: 'PUNPCKDQ'
Am I doing something wrong, or does Delphi 2007 not support a full set of MMX/SSE instructions?
Delphi 2007 supports the MMX and SSE instruction sets. Certainly, Delphi 2010 and XE support up to the SSE4.2 instruction sets (but so far no support for AVX).
However, Delphi is correct to complain about your "PUNPCKDQ" instruction: If you search the Intel® 64 and IA-32 Architectures Software Developer’s Manual (especially Volumes 2A and 2B would be relevant), you will NOT find an instruction by that name. I.e., it is your mistake, not Delphi's lack of support for this instruction.
A quick Google gives information on a PUNPCKLDQ rather than PUNPCKDQ.
D2007 accepts PUNPCKLDQ
and even better it also supports PUNPCKHDQ, which lets you transfer a high order dword to a low dword enabling you to load it into a general purpose register.

Most Efficient Unicode Hash Function for Delphi 2009

I am in need of the fastest hash function possible in Delphi 2009 that will create hashed values from a Unicode string that will distribute fairly randomly into buckets.
I originally started with Gabr's HashOf function from GpStringHash:
function HashOf(const key: string): cardinal;
asm
xor edx,edx { result := 0 }
and eax,eax { test if 0 }
jz #End { skip if nil }
mov ecx,[eax-4] { ecx := string length }
jecxz #End { skip if length = 0 }
#loop: { repeat }
rol edx,2 { edx := (edx shl 2) or (edx shr 30)... }
xor dl,[eax] { ... xor Ord(key[eax]) }
inc eax { inc(eax) }
loop #loop { until ecx = 0 }
#End:
mov eax,edx { result := eax }
end; { HashOf }
But I found that this did not produce good numbers from Unicode strings. I noted that Gabr's routines have not been updated to Delphi 2009.
Then I discovered HashNameMBCS in SysUtils of Delphi 2009 and translated it to this simple function (where "string" is a Delphi 2009 Unicode string):
function HashOf(const key: string): cardinal;
var
I: integer;
begin
Result := 0;
for I := 1 to length(key) do
begin
Result := (Result shl 5) or (Result shr 27);
Result := Result xor Cardinal(key[I]);
end;
end; { HashOf }
I thought this was pretty good until I looked at the CPU window and saw the assembler code it generated:
Process.pas.1649: Result := 0;
0048DEA8 33DB xor ebx,ebx
Process.pas.1650: for I := 1 to length(key) do begin
0048DEAA 8BC6 mov eax,esi
0048DEAC E89734F7FF call $00401348
0048DEB1 85C0 test eax,eax
0048DEB3 7E1C jle $0048ded1
0048DEB5 BA01000000 mov edx,$00000001
Process.pas.1651: Result := (Result shl 5) or (Result shr 27);
0048DEBA 8BCB mov ecx,ebx
0048DEBC C1E105 shl ecx,$05
0048DEBF C1EB1B shr ebx,$1b
0048DEC2 0BCB or ecx,ebx
0048DEC4 8BD9 mov ebx,ecx
Process.pas.1652: Result := Result xor Cardinal(key[I]);
0048DEC6 0FB74C56FE movzx ecx,[esi+edx*2-$02]
0048DECB 33D9 xor ebx,ecx
Process.pas.1653: end;
0048DECD 42 inc edx
Process.pas.1650: for I := 1 to length(key) do begin
0048DECE 48 dec eax
0048DECF 75E9 jnz $0048deba
Process.pas.1654: end; { HashOf }
0048DED1 8BC3 mov eax,ebx
This seems to contain quite a bit more assembler code than Gabr's code.
Speed is of the essence. Is there anything I can do to improve either the pascal code I wrote or the assembler that my code generated?
Followup.
I finally went with the HashOf function based on SysUtils.HashNameMBCS. It seems to give a good hash distribution for Unicode strings, and appears to be quite fast.
Yes, there is a lot of assembler code generated, but the Delphi code that generates it is so simple and uses only bit-shift operations, so it's hard to believe it wouldn't be fast.
ASM output is not a good indication of algorithm speed. Also, from what I can see, the two pieces of code are doing almost the identical work. The biggest difference seem to be the memory access strategy and the first is using roll-left instead of the equivalent set of instructions (shl | shr -- most higher-level programming languages leave out the "roll" operators). The latter may pipeline better than the former.
ASM optimization is black magic and sometimes more instructions execute faster than fewer.
To be sure, benchmark both and pick the winner. If you like the output of the second but the first is faster, plug the second's values into the first.
rol edx,5 { edx := (edx shl 5) or (edx shr 27)... }
Note that different machines will run the code in different ways, so if speed is REALLY of the essence then benchmark it on the hardware that you plan to run the final application on. I'm willing to bet that over megabytes of data the difference will be a matter of milliseconds -- which is far less than the operating system is taking away from you.
PS. I'm not convinced this algorithm creates even distribution, something you explicitly called out (have you run the histograms?). You may look at porting this hash function to Delphi. It may not be as fast as the above algorithm but it appears to be quite fast and also gives good distribution. Again, we're probably talking on the order of milliseconds of difference over megabytes of data.
We held a nice little contest a while back, improving on a hash called "MurmurHash"; Quoting Wikipedia :
It is noted for being exceptionally
fast, often two to four times faster
than comparable algorithms such as
FNV, Jenkins' lookup3 and Hsieh's
SuperFastHash, with excellent
distribution, avalanche behavior and
overall collision resistance.
You can download the submissions for that contest here.
One thing we learned was, that sometimes optimizations don't improve results on every CPU. My contribution was tweaked to run good on AMD, but performed not-so-good on Intel. The other way around happened too (Intel optimizations running sub-optimal on AMD).
So, as Talljoe said : measure your optimizations, as they might actually be detrimental to your performance!
As a side-note: I don't agree with Lee; Delphi is a nice compiler and all, but sometimes I see it generating code that just isn't optimal (even when compiling with all optimizations turned on). For example, I regularly see it clearing registers that had already been cleared just two or three statements before. Or EAX is put into EBX, only to have it shifted and put back into EAX. That sort of thing. I'm just guessing here, but hand-optimizing that sort of code will surely help in tight spots.
Above all though; First analyze your bottleneck, then see if a better algorithm or datastructure can be used, then try to optimize the pascal code (like: reduce memory-allocations, avoid reference counting, finalization, try/finally, try/except blocks, etc), and then, only as a final resort, optimize the assembly code.
I've written two assembly "optimized" functions in Delphi, or more implemented known fast hash algorithms in both fine-tuned Pascal and Borland Assembler. The first was a implementation of SuperFastHash, and the second was a MurmurHash2 implementation triggered by a request from Tommi Prami on my blog to translate my c# version to a Pascal implementation. This spawned a discussion continued on the Embarcadero Discussion BASM Forums, that in the end resulted in about 20 implementations (check the latest benchmark suite) which ultimately showed that it would be difficult to select the best implementation due to the big differences in cycle times per instruction between Intel and AMD.
So, try one of those, but remember, getting the fastest every time would probably mean changing the algorithm to a simpler one which would hurt your distribution. Fine-tuning an implementation takes lots of time and better create a good validation and benchmarking suite to make check your implementations.
There has been a bit of discussion in the Delphi/BASM forum that may be of interest to you. Have a look at the following:
http://forums.embarcadero.com/thread.jspa?threadID=13902&tstart=0

Resources