Memory usage in assembler 8086

I made a program in 8086 assembler for my class and everything is working just fine.
But besides making a working program, we have to make it use as little memory as possible. Could you give me some tips in that aspect? What should I write and what should I avoid?
The program is supposed to first print the letter A on the screen, then on every new line two more letters of the next letter of the alphabet, stopping at Z; after pressing any key the program ends. For stopping until a key is pressed I'm using:
mov ah,00h
int 16h
Is it a good way to do it?

Most of what you want can be done in zero memory (counting only data, not the code itself). In general:
use registers rather than variables in memory
do not use push/pop
do not use subroutines
But to interact with the OS, you need to make BIOS calls and/or OS system calls; these require some memory (typically a small amount of stack space). In your case, you have to:
output characters to screen
wait for keypress
exit back to the OS
However, if you are serious about doing this in minimal memory, then there are a few hacks you can use.
Output characters to screen
On a PC, in traditional text mode, you can write characters straight to video RAM (address B800:0000 and further). This requires zero memory.
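For example, a minimal sketch (real mode, NASM syntax) that prints one character this way, using no data memory:
mov ax, 0B800h    ; text-mode video segment
mov es, ax
xor di, di        ; offset 0 = row 0, column 0
mov ax, 0741h     ; AL = 'A' (41h), AH = attribute 07h (grey on black)
mov [es:di], ax   ; each cell is a character byte plus an attribute byte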
Wait for keypress
The cheapest way is to wait for a change of the BIOS keyboard buffer head (a change of the 16-bit content at address 041A hex). This requires zero memory.
See also: http://support.microsoft.com/kb/60140
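A minimal sketch of that approach (real mode, NASM syntax); as a variation on watching 041Ah alone, it compares the buffer's head pointer at 0040:001Ah against its tail pointer at 0040:001Ch, spinning until a key is waiting:
xor ax, ax
mov es, ax           ; ES = segment 0
waitkey:
mov ax, [es:041Ah]   ; keyboard buffer head
cmp ax, [es:041Ch]   ; keyboard buffer tail
je waitkey           ; equal means the buffer is empty: keep waiting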
Exit back to the OS
Try a simple ret; it is not recommended but it might just work in some versions of MS-DOS. An even uglier escape is to jump to F000:FFF0, which will reboot the machine. That's guaranteed to work in zero memory.
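A sketch of both escapes; the ret variant relies on the fact that, for a .com program, DOS pushes a zero word on the stack at entry and places an INT 20h instruction at PSP:0000:
ret                 ; jumps to PSP:0000 -> INT 20h -> back to DOS
; or, the scorched-earth variant:
jmp 0F000h:0FFF0h   ; far jump to the reset vector: reboots the machine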

Use these instructions:
INC (Register*) instead of ADD (Register*), 1
DEC (Register*) instead of SUB (Register*), 1
XOR (Register), (same Register) instead of MOV (Register), 0 (doesn't work with memory variables)
SHR (Register*), 1 instead of dividing by 2 with DIV
SHR (Register*), 2 instead of dividing by 4 with DIV
...
SHL (Register*), 1 instead of multiplying by 2 with MUL
...
*Register or variable
These optimizations make the program faster AND smaller.
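For a rough idea of the savings, here are the encoded sizes in 16-bit code (NASM flat binary):
xor ax, ax    ; 2 bytes,  vs. mov ax, 0  ; 3 bytes
inc ax        ; 1 byte,   vs. add ax, 1  ; 3 bytes
shr ax, 1     ; 2 bytes,  vs. a DIV by 2, which also needs DX and a divisor register set up first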

Related

What instruction set would be easiest to implement on a homemade ALU?

I'm designing a basic 8- or 16-bit computer (haven't really decided yet) using EEPROM chips, SRAM, and an ALU made (mostly) out of individual transistors on a PCB using CMOS logic, which I have already partially designed and tested. I thought it would be cool to use an already existing instruction set so I can compile C++ code for it instead of writing everything in machine code.
I looked at the AVR gcc compiler on Compiler Explorer and the machine code it produces; it looks very simple, and I think it is only 8-bit. Or should I go for 32 bits and try to use x86? That would make the ALU a lot bigger. Are there compilers that let you use a limited set of instructions so I don't have to implement every single one? Or would it be even easier to just write an interpreter for a custom instruction set? Any advice is welcome, thank you.
After a bit of research it has become apparent that trying to recreate modern ALUs and instruction sets would be very complicated and time-consuming; I should definitely make my own simplistic architecture, and if I really want to compile C code for it I could probably just interpret x86 or AVR assembly from gcc.
I would also love some feedback on my design: I came up with a really weird ISA last night that is focused mainly on being easy to engineer in hardware.
There are two registers in the ALU; all other registers hold functions of those two numbers, computed all at the same time. For instance, there is a register that holds the added result of A and B, one that holds the result of A shifted right B times, a "jump if A > B" branch, and so on.
So to add a number, it would take 3 clock cycles: you would move two values from RAM into A and B, then copy the result back to RAM afterwards. It would look like this:
setA addressInRam1 (6-bit opcode, 18-bit address/value)
setB addressInRam2
copyAddedResult addressInRam1
And program code is executed directly from EEPROM memory. I don't know if I should think of it as having two general-purpose registers or as having 2^18 registers. Either way, it makes the machine much easier and simpler to build when you're executing instructions one at a time like that. Again, any advice is welcome; I am somewhat of a noob in this field, thank you!
Oh, and then there is an additional C register that holds a value to be stored into RAM on the next clock cycle, at the address specified by the set instruction. This is what the Fibonacci sequence would look like:
1: setC 1; // setting C reg to 1
2: set 0; // setting address 0 in ram to the C register
3: setA 0; // copying value in address 0 of ram into A reg
// repeat for B reg
4: set 1; // setting this to the same as the other
5: setB 1;
6: jumpIf> 9; // jump to line 9 if A > B
7: getSum 0; // put sum of A and B into address 0 of ram
8: setA 0; // set the A register to address 0 of ram
9: getSum 1; // "else" put the sum into the second variable
10: setB 1;
11: jump 6; // loop back to line 6 forever
I made a C++ equivalent and put it through Compiler Explorer, and despite the many drawbacks of this architecture it uses the same number of clock cycles as x64 in the loop, and only two more in total. But I think this function in particular works pretty well with it, as I don't have to reassign A and B often.
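For comparison, here is a direct x86 rendering of lines 6-11 (the register assignments are mine, and it keeps the fall-through behaviour of the listing above):
mov eax, 1     ; the A register
mov ebx, 1     ; the B register
line6:
cmp eax, ebx
ja line9       ; 6: jumpIf> 9
add eax, ebx   ; 7-8: ram[0] = A + B, then A = ram[0]
line9:
add ebx, eax   ; 9-10: ram[1] = A + B, then B = ram[1]
jmp line6      ; 11: loop back to line 6 forever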

What happens when memory "wraps" on an IA-32 supporting machine?

I'm creating a 64-bit model of IA-32 and am representing memory as a 0-based array of 2**64 bytes (the language I'm modeling this in uses ** as the exponentiation operator). This means that valid indices into the array are from 0 to 2**64-1. Now, to model the possible modes of accessing that memory, one can treat one element as an 8-bit number, two elements as a (little-endian) 16-bit number, etc.
My question is, what should my model do if they ask for a 16-bit (or 32-bit, etc.) number at location 2**64-1? Right now, what the model does is say that the returned value is Memory(2**64-1) + (2**8 * Memory(0)), i.e. the access wraps around to address 0. I'm not updating any flags (which feels wrong). Is wrapping like this the correct behavior? Should I be setting any flags when the wrapping happens?
I have a copy of Intel-64-ia-32-ISA.pdf which I'm using as a reference, but it's 1,479 pages, and I'm having a hard time finding the answer to this particular question.
The answer is in Volume 3A, section 5.3: "Limit checking."
For IA-32:
When the effective limit is FFFFFFFFH (4 GBytes), these accesses [which extend beyond the end of the segment] may or may not cause the indicated exceptions. Behavior is implementation-specific and may vary from one execution to another.
For 64-bit mode:
In 64-bit mode, the processor does not perform runtime limit checking on code or data segments. However, the processor does check descriptor-table limits.
I tested it (did anyone expect that?) for 64-bit numbers with this code:
mov dword [0], 0xDEADBEEF     ; store at linear address 0
mov dword [-4], 0x01020304    ; store at 0xFFFFFFFFFFFFFFFC, the top of memory
mov rdi, [-4]                 ; 8-byte read that crosses the wrap-around
call writelonghex
In a custom OS, with pages mapped as appropriate, running in VirtualBox. writelonghex just writes rdi to the screen as a 16-digit hexadecimal number. The result: DEADBEEF01020304
So yes, it does just wrap. Nothing funny happens.
No flags should be affected (though the manual doesn't say explicitly that no flags are set on address wrapping, it does say that mov reg, [mem] never affects them, and that includes this case), and no interrupt/trap/whatever happens (unless of course one or both of the pages touched are not present).

How to write programs larger than 64KB for 8086 processor?

A segment is only 64KB long, so a program must be at most 64KB in size to fit into a single memory segment (i.e. if the segment register value is not to be changed).
Suppose we want to write a larger-than-64KB program for an 8086 system. Presumably this requires changing segment register values somewhere in the middle of the program? Do we change them explicitly inside the program, or do we just write the code and let the OS handle it? How would an OS like DOS handle such a large program?
x86 processors have variants of JMP and CALL where you specify a new value for CS (the code segment register). This is known as a far JMP/CALL, and the exact syntax differs between different assemblers. If we use NASM as an example, you'd write:
; Do an inter-segment jump to the label named foobar
jmp (seg foobar):foobar
; Do an inter-segment call to the subroutine named foobar
call (seg foobar):foobar
There might be assemblers that are smart enough to figure out that they should generate a far jump even if you just write jmp foobar and foobar is located in a different segment, though I can't name any examples since this isn't something I've tested.
If your program will be compiled to a .com file, your code cannot be larger than ~63 KB.
If you want to add code beyond that, you have to write it to a separate file and load it from that file at runtime.
To create arrays or add code or data outside of the segment that the .com file has been loaded into, your program has to reserve free memory.
To do this, use the DOS function for allocating memory (INT 21h, AH=48h). Before doing this, use the DOS function for resizing a memory block (INT 21h, AH=4Ah) to give back all memory used by your program except the current segment.
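A sketch of that sequence for a .com program (NASM syntax); the function numbers are standard DOS INT 21h services, the sizes are example values, and error handling is omitted:
mov ah, 4Ah     ; DOS: resize memory block (ES = our PSP segment)
mov bx, 1000h   ; keep 1000h paragraphs = 64 KB for ourselves
int 21h
mov ah, 48h     ; DOS: allocate memory block
mov bx, 0800h   ; request 0800h paragraphs = 32 KB
int 21h         ; on success (CF clear), AX = segment of the new block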

How to draw a pixel on the screen in protected mode in x86 assembly?

I am creating a little bootloader+kernel, and so far I have managed to read the disk, load the second sector, load the GDT, open the A20 line and enable protected mode.
I jumped to the 32-bit function that shows a character on the screen, using the video memory for text content (starting at 0x000B8000):
pusha
mov edi, 0xB8000    ; base of color text-mode video memory
mov bl, '.'
mov dl, bl          ; DL = the character
mov dh, 63          ; DH = the attribute byte
mov word [edi], dx  ; write character + attribute into the first cell
popa
Now, I would like to go a little further and draw a single pixel on the screen. As I read on some websites, if I want to use the graphics mode of the VGA, I have to write my pixel at location 0x000A0000. Is that right?
Now, what is the format of a single pixel? For a single character you need the ASCII code and an attribute, but what do you need to define a pixel (if it works the same way as text mode)?
Unfortunately, it's a little more than a little further.
The rules for writing to video memory depend on the graphics mode. Among traditional video modes, VGA mode 320x200 (8bpp) is the only one where video memory behaves like a normal kind of memory: you write a byte corresponding to a pixel you want to the video buffer starting from 0xA000:0000 (or 0xA0000 linear), and that's all.
For other VGA (pre-SVGA) modes, the rules are more complicated: when you write a byte to video memory, you address a group of pixels, and some VGA registers which I have long since forgotten specify which planes of those pixels are updated and how the old value of them is used. It's not just memory any more.
There are SVGA modes (starting with 800x600x8bpp); you can switch to them in a hardware-independent way using the VESA Video BIOS Extensions (VBE). In those modes, video memory behaves like memory again, with 1, 2, 3 or 4 bytes per pixel and no VGA-like 8-pixel groups which you touch with one byte access. The problem is that the real-mode video buffer is no longer large enough to address the whole screen.
VESA VBE 1.2 addressed this problem by providing functions to modify the memory window base: at any particular moment, the segment at linear 0xA0000 addresses a 64KB region of video memory, but you can control which 64KB of the whole framebuffer is available at this address (the minimal unit of base address adjustment, a.k.a. window granularity, depends on the hardware, but you can rely on the ability to map an N*64KB offset at 0xA0000). The downside is that it requires a VBE BIOS call each time you start working with a different 64KB chunk.
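A sketch of such a window switch (real mode, NASM syntax); the window position value here is just an example, and in a real program it comes from the granularity information in the VBE mode info block:
mov ax, 4F05h   ; VBE function 05h: display window control
xor bh, bh      ; BH = 0: set window position
xor bl, bl      ; BL = 0: window A
mov dx, 3       ; window position, in granularity units (example value)
int 10h         ; the 64KB at 0xA0000 now maps that part of the framebuffer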
VESA VBE 2.0 added a flat framebuffer, available at some high address in protected mode (and in unreal mode). Thus a VBE BIOS call is required for entering the video mode, but not for drawing pixels.
VESA VBE 3.0, which might not be portable enough yet, provides a way to call VBE functions in protected mode. (I didn't have a chance to try it, it was not there during my "OS in assembly" age).
Anyway, you have to switch to graphics mode first. There are several variants of doing that:
The easiest thing to do is to use a BIOS call before you enter protected mode. With VBE 2.0, you won't need video memory window adjustment calls.
Another way is creating a V8086-mode environment which is good enough for BIOS. The hardest part is forwarding interrupts to real-mode interrupt handlers. It's not easy, but when it's done, you'll be able to switch video modes in PM and use some other BIOS functions (for disk I/O, for example).
Yet another way is to use VESA VBE 3.0 protected mode interface. No idea on how easy or complicated it might be.
And the real Jedi way is digging out the information on your specific video card and switching modes by setting its registers directly. Been there, done that for some Cirrus card in the past -- getting a big plain framebuffer in PM was not too complicated. It's unportable, but maybe it's just what you need if the aim is understanding the internals of your machine.
It depends on the graphics mode in use, and there are a lot of differences. BIOS VGA video mode 13h (320x200 at 8 bits/pixel) is probably the easiest one to get started with (and it's the only BIOS VGA video mode with 256 colors; however, you can create your own modes by writing directly to the ports of the video card). In BIOS video mode 13h the video memory mapped to the screen begins at 0x0A0000 and runs continuously, 1 byte for each pixel, with only 1 bit plane, so the memory address of each coordinate is 0x0A0000 + 320*y + x.
To change to BIOS video mode 13h (320 x 200 at 8 bits/pixel) while in real mode:
mov ax,0x13
int 0x10
To draw a pixel in the upper left corner (in video mode 13h) while in protected mode:
mov edi,0x0A0000
mov al,0x0F ; the color of the pixel
mov [edi],al
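And a sketch for an arbitrary coordinate, assuming flat addressing; putting x in ECX and y in EDX is my own choice:
imul edi, edx, 320     ; EDI = y * 320 (pixels per row)
add edi, ecx           ; EDI = y * 320 + x
mov [0xA0000+edi], al  ; AL = the color of the pixel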
Here is a complete example (pixel.asm) that uses the BIOS instead: it switches to video mode 12h (640x480, 16 colors) and plots one pixel via BIOS function 0Ch. The coordinate setup, page register and final ret are additions to make the fragment a runnable .com program:
org 100h
bits 16
cpu 386
section .text
START:
mov ax,12h    ; BIOS: set video mode 12h (640x480, 16 colors)
int 10h
mov al,02h    ; AL = pixel color
mov ah,0ch    ; BIOS function 0Ch: write graphics pixel
xor bh,bh     ; BH = display page 0 (added)
mov cx,100    ; CX = column (added)
mov dx,100    ; DX = row (added)
int 10h
ret           ; back to DOS (added)
Assemble it with:
c:\>nasm pixel.asm -f bin -o pixel.com

Heap overflow exploit

I understand that overflow exploitation requires three steps:
1. Injecting arbitrary code (shellcode) into the target process memory space.
2. Taking control over eip.
3. Setting eip to execute the arbitrary code.
I read Ben Hawkes' articles about heap exploitation and understood a few tactics for ultimately overwriting a function pointer so it points to my code.
In other words, I understand step 2.
I do not understand steps 1 and 3.
How do I inject my code into the process memory space?
During step 3 I overwrite a function pointer with a pointer to my shellcode. How can I calculate/know what address my injected code was injected at? (In stack overflows this problem is solved by using "jmp esp".)
In a heap overflow, supposing that the system does not have ASLR activated, you will know the address of the memory chunks (aka, the buffers) you use in the overflow.
One option is to place the shellcode where the buffer is, given that you can control the contents of the buffer (as the application user). Once you have placed the shellcode bytes in the buffer, you only have to jump to that buffer address.
One way to perform that jump is by, for example, overwriting a .dtors entry. Once the vulnerable program finishes, the shellcode - placed in the buffer - will be executed. The complicated part is the .dtors overwriting. For that you will have to use the published heap exploiting techniques.
The prerequisites are that ASLR is deactivated (to know the address of the buffer before executing the vulnerable program) and that the memory region where the buffer is placed must be executable.
One more thing: steps 2 and 3 are essentially the same. If you control eip, it's only logical that you will point it at the shellcode (the arbitrary code).
P.S.: Bypassing ASLR is more complex.
Step 1 requires a vulnerability in the attacked code.
Common vulnerabilites include:
buffer overflow (common in C code; happens if the program reads an arbitrarily long string into a fixed-size buffer)
evaluation of unsanitized data (common in SQL and script languages, but can occur in other languages as well)
Step 3 requires detailed knowledge of the target architecture.
How do I inject my code into process space?
This is quite a statement/question. It requires an 'exploitable' region of code in said process space. For example, Windows is currently rewriting most strcpy() calls to strncpy() wherever possible. I say wherever possible because not all areas of code that use strcpy can successfully be changed over to strncpy. Why? Because of the crux of the difference shown below:
strcpy(buffer, copied);
or
strncpy(buffer, copied, sizeof(buffer)); /* the length argument must be the destination's size */
This is what makes strncpy so difficult to apply in real-world scenarios: a 'magic number' has to be installed on most strncpy operations (the sizeof() operator creates this magic number).
As coders we are taught that using hard-coded values such as char buffer[1024]; is really bad coding practice, but in comparison, using buffer[]=""; or buffer[1024]=""; is the heart of the exploit. However, if for example we change this code to the following, we get another exploit introduced into the system...
char *buffer;           /* uninitialized pointer */
char *copied;           /* uninitialized pointer */
strcpy(buffer, copied); /* overflow this right here... */
OR THIS:
int size = 1024;
char buffer[size];
char copied[size];
strncpy(buffer, copied, size);
This will stop overflows, but it introduces an exploitable region in RAM, because the size is predictable and the data is structured into 1024-byte blocks.
Therefore, original poster: looking for strcpy, for example, in a program's address space will tell you whether the program is exploitable; if strcpy is present, it is.
There are many reasons why strcpy is favoured by programmers over strncpy: magic numbers, variable input/output data sizes, programming styles, etc.
HOW DO I FIND MYSELF IN MY CODE (MY LOCATION)
Check various hacker books for examples of this, but try:
label:
pop eax
pop eax
call pointer
jmp label
pointer:
mov esp, eax
jmp $
This example is non-working on purpose, because I do NOT want to be held responsible for writing the next Morris Worm! But any decent programmer will get the gist of this code and know immediately what I am talking about here.
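For reference, the idea the code above hints at is the well-known call/pop idiom: a CALL pushes the address of the next instruction, and a POP immediately retrieves it, which is how position-independent shellcode commonly learns its own address:
call getpc
getpc:
pop eax     ; EAX now holds the address of the getpc label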
I hope your overflow techniques work in the future, my son!
