How can you tell if a memory page is marked as read-only? - memory

When using copy-on-write semantics to share memory among processes, how can you test if a memory page is writable or if it is marked as read-only? Can this be done by calling a specific assembler code, or reading a certain spot in memory, or through the OS's API?

On Linux you can examine /proc/pid/maps:
$ cat /proc/self/maps
002b3000-002cc000 r-xp 00000000 68:01 143009 /lib/ld-2.5.so
002cc000-002cd000 r-xp 00018000 68:01 143009 /lib/ld-2.5.so
002cd000-002ce000 rwxp 00019000 68:01 143009 /lib/ld-2.5.so
002d0000-00407000 r-xp 00000000 68:01 143010 /lib/libc-2.5.so
00407000-00409000 r-xp 00137000 68:01 143010 /lib/libc-2.5.so
00409000-0040a000 rwxp 00139000 68:01 143010 /lib/libc-2.5.so
0040a000-0040d000 rwxp 0040a000 00:00 0
00c6f000-00c70000 r-xp 00c6f000 00:00 0 [vdso]
08048000-0804d000 r-xp 00000000 68:01 379298 /bin/cat
0804d000-0804e000 rw-p 00004000 68:01 379298 /bin/cat
08326000-08347000 rw-p 08326000 00:00 0
b7d1b000-b7f1b000 r--p 00000000 68:01 226705 /usr/lib/locale/locale-archive
b7f1b000-b7f1c000 rw-p b7f1b000 00:00 0
b7f28000-b7f29000 rw-p b7f28000 00:00 0
bfe37000-bfe4d000 rw-p bfe37000 00:00 0 [stack]
The first column is the virtual memory address range, the second column contains the permissions (read, write, execute, and private), columns 3-6 contain the offset, major and minor device numbers, the inode, and the name of memory mapped files.
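To make the column layout above concrete, here is a minimal sketch that parses maps-style text and reports whether an address falls in a writable mapping. The names Mapping, parse_maps and is_writable are made up for illustration, and SAMPLE just reuses three lines from the listing above. Keep in mind that the permission column describes the mapping as a whole; for a private (copy-on-write) mapping it shows "w" even while the kernel may still have an individual page shared and write-protected.

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    start: int   # first address of the range
    end: int     # one past the last address of the range
    perms: str   # e.g. "rw-p": read, write, execute, private/shared
    path: str    # backing file or pseudo-name such as [stack]; may be empty


def parse_maps(text: str) -> List[Mapping]:
    """Parse text in the /proc/<pid>/maps format into Mapping records."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        start_text, end_text = fields[0].split("-")
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(int(start_text, 16), int(end_text, 16), fields[1], path))
    return mappings


def is_writable(address: int, maps_text: str) -> Optional[bool]:
    """True/False if the address is mapped, None if no mapping covers it."""
    for mapping in parse_maps(maps_text):
        if mapping.start <= address < mapping.end:
            return "w" in mapping.perms
    return None


# Three lines copied from the listing above (hypothetical test input).
SAMPLE = """\
08048000-0804d000 r-xp 00000000 68:01 379298 /bin/cat
0804d000-0804e000 rw-p 00004000 68:01 379298 /bin/cat
bfe37000-bfe4d000 rw-p bfe37000 00:00 0 [stack]
"""

if __name__ == "__main__":
    for address in (0x08048100, 0x0804D800, 0x1000):
        print(hex(address), is_writable(address, SAMPLE))
```

On a live Linux system you would pass the contents of /proc/self/maps (or /proc/<pid>/maps) instead of SAMPLE.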

On Win32, the best way is to use VirtualQuery. It returns a MEMORY_BASIC_INFORMATION for the page an address falls in. One of its members is Protect, which holds a combination of the memory protection flags (PAGE_READONLY, PAGE_READWRITE, PAGE_WRITECOPY, and so on). The function also tells you whether the memory is free, committed, or reserved, and whether it is private, part of an image, or a shared memory section.
The OS's API is the best way to determine a page's protection. The CPU reads the protection mode from a page descriptor (the page table entry), which is only accessible from kernel mode.
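As a rough illustration of how the Protect value could be interpreted once VirtualQuery has filled in the structure, here is a small sketch. The numeric constants are the documented Win32 memory protection values; describe_protect is a hypothetical helper, and the sketch does not call VirtualQuery itself, so it runs anywhere.

```python
# Win32 memory protection constants (values as documented for winnt.h).
PAGE_NOACCESS = 0x01
PAGE_READONLY = 0x02
PAGE_READWRITE = 0x04
PAGE_WRITECOPY = 0x08
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80
PAGE_GUARD = 0x100  # modifier: first access raises a guard-page exception

COPY_ON_WRITE = PAGE_WRITECOPY | PAGE_EXECUTE_WRITECOPY
WRITABLE = PAGE_READWRITE | PAGE_EXECUTE_READWRITE | COPY_ON_WRITE


def describe_protect(protect: int) -> dict:
    """Summarize a MEMORY_BASIC_INFORMATION.Protect value (hypothetical helper)."""
    return {
        "writable": bool(protect & WRITABLE),
        "copy_on_write": bool(protect & COPY_ON_WRITE),
        "guard": bool(protect & PAGE_GUARD),
    }


if __name__ == "__main__":
    for value in (PAGE_READONLY, PAGE_WRITECOPY, PAGE_READWRITE | PAGE_GUARD):
        print(hex(value), describe_protect(value))
```

For the copy-on-write case in the question, the interesting values are PAGE_WRITECOPY and PAGE_EXECUTE_WRITECOPY: a write is allowed but gives the process its own private copy of the page.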

Are you talking about the variety of shared memory allocated via shmget (on Unix)? I.e.
int shmget(key_t, size_t, int);
If so, you can query that memory using
int shmctl(int, int, struct shmid_ds *);
For example:
key_t key = /* your choice of memory api */;
int flag = /* set of flags for your app */;
int shmid = shmget(key, 4096, flag);
struct shmid_ds buf;
int result = shmctl(shmid, IPC_STAT, &buf);
/* on success (result == 0), buf.shm_perm.mode contains the permissions for the memory segment */
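The mode value that shmctl reports uses the same owner/group/other read and write bits as file permissions. A minimal sketch of decoding it follows; segment_access is a hypothetical helper name, and the mode values in the demo are made-up examples rather than output from a real segment. Note that these bits govern who may attach the segment for reading or writing, which is a different thing from the protection of an individual page.

```python
import stat


def segment_access(mode: int) -> dict:
    """Decode the low nine permission bits of a shared-memory mode value."""
    return {
        "owner_read": bool(mode & stat.S_IRUSR),
        "owner_write": bool(mode & stat.S_IWUSR),
        "group_read": bool(mode & stat.S_IRGRP),
        "group_write": bool(mode & stat.S_IWGRP),
        "other_read": bool(mode & stat.S_IROTH),
        "other_write": bool(mode & stat.S_IWOTH),
    }


if __name__ == "__main__":
    for mode in (0o600, 0o444):  # example values only
        print(oct(mode), segment_access(mode))
```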

If you're using Win32, there are the calls IsBadReadPtr and IsBadWritePtr. However, their use is discouraged:
"The general consensus is that the IsBad family of functions (IsBadReadPtr, IsBadWritePtr, and so forth) is broken and should not be used to validate pointers."
The title of Raymond Chen's take on this says it all: "IsBadXxxPtr should really be called CrashProgramRandomly"
Chen has some helpful advice about how to deal with this issue here.
The upshot is, you shouldn't be testing this kind of thing at run-time. Code so that you know what you're being handed, and if it's not what's expected, treat it as a bug. If you really have no choice, look into SEH for handling the exception.

Related

thread keeps generating page fault

I was recently trying to exploit a kernel-mode driver (HEVD) and I ran into a problem that doesn't make sense to me.
After I jumped to my shellcode (which resides in user mode and was made executable with VirtualProtect), the thread generates a page fault. The thread then keeps generating this page fault in an endless loop (it doesn't throw an exception).
So I investigated what kind of page fault was triggered and got the following output in the page fault handler:
Breakpoint 0 hit
nt!KiPageFault+0x8:
fffff806`7d201d08 488dac2480000000 lea rbp,[rsp+80h]
0: kd> dd %rsp + 0x158
ffff9307`717e9768 51dce820 ffffad82 00000011 00000000
ffff9307`717e9778 ee49cf8e 000001a4 00000010 00000000
ffff9307`717e9788 00010206 00000000 717e97a0 ffff9307
ffff9307`717e9798 00000018 00000000 00000003 00000000
ffff9307`717e97a8 c00000bb 00000000 0000004d 00000000
ffff9307`717e97b8 00000018 00000000 00000003 00000000
ffff9307`717e97c8 7a0e643b fffff806 00000000 00000000
ffff9307`717e97d8 55555555 55555555 7a0e8b10 fffff806
Here we see the stack after the page fault occurred. The breakpoint is triggered immediately after I resume execution (and yes, the breakpoint is conditional on this thread only). The frame is exactly the same every time the breakpoint is triggered.
So I tried to decode the stack frame. A page fault pushes some information onto the stack. I decoded it manually (I don't know a better way to do this) and got the following:
RFLAGS 10206
CS 10
RIP 000001a4ee49cf8e
Errcode 11 --> P, I bits set
CR2 000001a4ee49cf8e
This should be decoded correctly, though I may have made a mistake. (The top of the stack was ffff9307`717e9770, so the 0x...11 was the last thing pushed onto the stack.) So the error code says that the fault was raised while the page was present and that it happened during an instruction fetch.
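For anyone who wants to double-check the manual decoding, here is a small sketch that names the bits of an x86 page-fault error code. The bit positions follow the architectural definition (bit 0 P, bit 1 W/R, bit 2 U/S, bit 3 RSVD, bit 4 I/D, bit 5 PK); decode_page_fault_error is a hypothetical helper name.

```python
# Bit positions of the x86 page-fault error code.
ERROR_CODE_BITS = [
    (0, "P"),     # 1: protection violation on a present page, 0: page not present
    (1, "W/R"),   # 1: the access was a write
    (2, "U/S"),   # 1: the access originated in user mode
    (3, "RSVD"),  # 1: a reserved bit was set in a paging-structure entry
    (4, "I/D"),   # 1: the access was an instruction fetch
    (5, "PK"),    # 1: protection-key violation
]


def decode_page_fault_error(code: int) -> list:
    """Return the names of the bits that are set in the error code."""
    return [name for bit, name in ERROR_CODE_BITS if code & (1 << bit)]


if __name__ == "__main__":
    print(decode_page_fault_error(0x11))
```

For the value 0x11 from the dump this yields P and I/D, which agrees with the reading above. U/S being clear additionally says that the faulting access was made in supervisor mode.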
Now, I think this should mean an access violation because of the NX bit. But that cannot be, because the address is executable, as this snippet shows:
VA 000001a4ee49cf8e
PXE at FFFFA8D46A351018    PPE at FFFFA8D46A203498    PDE at FFFFA8D440693B90    PTE at FFFFA880D27724E0
contains 8A00000027704867  contains 0A0000010DA05867  contains 0A0000010F9BE867  contains 0100000119B6C825
pfn 27704 ---DA--UW-V      pfn 10da05 ---DA--UWEV     pfn 10f9be ---DA--UWEV     pfn 119b6c ----A--UREV
The executable bit is set for the PTE. So I asked myself why this happens. It's also a bit weird that it doesn't raise an access violation and trigger a bugcheck.
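One way to cross-check the debugger's flag strings is to decode the raw entries by hand. The sketch below assumes the standard x86-64 4-level paging layout (bit 0 valid, bit 1 writable, bit 2 user, bit 5 accessed, bit 6 dirty, bits 12..51 page frame number, bit 63 no-execute); decode_entry is a hypothetical helper and ENTRIES just copies the four values quoted above.

```python
NX_BIT = 1 << 63            # no-execute (XD) bit
PFN_MASK = (1 << 40) - 1    # bits 12..51 once shifted right by 12


def decode_entry(entry: int) -> dict:
    """Decode the commonly inspected bits of an x86-64 paging-structure entry."""
    return {
        "valid": bool(entry & 0x01),
        "writable": bool(entry & 0x02),
        "user": bool(entry & 0x04),
        "accessed": bool(entry & 0x20),
        "dirty": bool(entry & 0x40),
        "no_execute": bool(entry & NX_BIT),
        "pfn": (entry >> 12) & PFN_MASK,
    }


# Raw values copied from the !pte output above.
ENTRIES = {
    "PXE": 0x8A00000027704867,
    "PPE": 0x0A0000010DA05867,
    "PDE": 0x0A0000010F9BE867,
    "PTE": 0x0100000119B6C825,
}

if __name__ == "__main__":
    for level, raw in ENTRIES.items():
        info = decode_entry(raw)
        print(level, hex(info["pfn"]), "NX" if info["no_execute"] else "exec",
              "W" if info["writable"] else "R", "U" if info["user"] else "K")
```

On x86-64 an address is executable only if the no-execute bit is clear at every level of the walk, so it can be worth looking at all four entries and not only the PTE. In the values above, bit 63 is set in the PXE, which matches the missing E in its flag string. Whether that accounts for the fault described here is not something this post establishes.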
Also, the OS runs on VirtualBox. I looked at the settings and noticed that SMEP and SMAP aren't supported by my virtual machine (although my host system supports them; VirtualBox intercepts cpuid instructions and reports the features as disabled). So this shouldn't be the problem either. Also, if you dump the registers, the SMEP/SMAP bits in CR4 are not set.
I really have no clue what could be causing this problem. Maybe it's because of the virtual machine. It could also be that Windows gets confused when a thread runs user code like this, or something else. I have thought about it for a long time, but maybe I just overlooked a simple reason.
Thanks in advance

Poke opcodes into memory

Hi, I am trying to understand whether it is possible to take instruction opcodes and 'poke' them into memory, or somehow convert them to a binary program. I have found an abandoned Lisp project here: http://common-lisp.net/viewvc/cl-x86-asm/cl-x86-asm/ which takes x86 asm instructions and converts them into opcodes (please see the example below). The project does not go further and actually create the binary executable, so I would need to do that 'manually'. Any ideas would help. Thanks.
;; assemble some code in it
(cl-x86-asm::assemble-forms
'((.Entry :PUSH :EAX)
(:SUB :EAX #XFFFEA)
(:MOV :EAX :EBX)
(:POP :EAX)
(:PUSH :EAX)
(.Exit :RET))
Processing...
;; print the assembled segment
(cl-x86-asm::print-segment)
* Segment type DATA-SEGMENT
Segment size 0000000C bytes
50 81 05 00 0F FF EA 89
03 58 50 C3
Clozure Common Lisp for example has this built-in. This is usually called LAP, Lisp Assembly Program.
See defx86lapfunction.
Example:
(defx86lapfunction fast-mod ((number arg_y) (divisor arg_z))
(xorq (% imm1) (% imm1))
(mov (% number) (% imm0))
(div (% divisor))
(mov (% imm1) (% arg_z))
(single-value-return))
SBCL can do something similar with VOPs (Virtual Operations).
http://g000001.cddddr.org/2011-12-08
I learned that it can be done using CFFI/FFI. For example, take this very simple asm code:
(:movl 12 :eax)
(:ret)
This will be converted to the following sequence of octets: #(184 12 0 0 0 195) which in hex it is: #(B8 C 0 0 0 C3). The next step is to send it to a location in memory as such:
(defparameter pointer (cffi:foreign-alloc :unsigned-char :initial-contents #(184 12 0 0 0 195)))
;; and then execute it as such to return the integer 12:
(cffi:foreign-funcall-pointer pointer () :int)
=> result: 12
Thanks to the experts in #lisp (freenode irc channel) for helping out with this solution.
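To show where the octets in that answer come from, here is a short sketch of the encoding: opcode B8 is "mov eax, imm32" with the immediate stored little-endian, and C3 is "ret". The function name assemble_mov_eax_ret is made up for illustration, and the sketch only builds the bytes; it does not execute them.

```python
import struct


def assemble_mov_eax_ret(value: int) -> bytes:
    """Encode 'mov eax, value; ret' as raw x86 machine code."""
    mov_eax_imm32 = bytes([0xB8]) + struct.pack("<I", value)  # B8 + 32-bit little-endian immediate
    ret = bytes([0xC3])
    return mov_eax_imm32 + ret


if __name__ == "__main__":
    code = assemble_mov_eax_ret(12)
    print(list(code))   # decimal octets, as in the #(...) vector above
    print(code.hex(" "))
```

One caveat when actually running such bytes: memory obtained from a general-purpose allocator may not be executable on systems that enforce no-execute protection, in which case the buffer would first have to be placed in (or switched to) an executable mapping.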

How to find the event handler for a DispatcherTimer in windbg

I have a Silverlight 3 application which seems to be leaking DispatcherTimer objects. At least, over time when the application runs I find more of them on the heap:
!dumpheap -type DispatcherTimer
returns an increasing number of them.
I'd like to find the Tick event handler method for these so I can identify where they're created in my code.
When I try dumping one of these in windbg, I get something like:
!do 098b9980
Name: System.Windows.Threading.DispatcherTimer
MethodTable: 0bfd4ba0
EEClass: 0bc98d18
Size: 20(0x14) bytes
File: C:\Program Files (x86)\Microsoft Silverlight\4.0.50524.0\System.Windows.dll
Fields:
MT Field Offset Type VT Attr Value Name
0bfd1538 40008be 4 ...eObjectSafeHandle 0 instance 098b9994 m_nativePtr
0bfd3d0c 40008bf 8 ...reTypeEventHelper 0 instance 098b99ac _coreTypeEventHelper
506a07e4 40008c0 c System.Boolean 1 instance 1 _isEnabled
0bfd3c68 40008c1 cec ...ependencyProperty 0 shared static IntervalProperty
>> Domain:Value 086d3f38:NotInit 086daeb8:098b99b8 <<
But from here, I don't know how to find the method handling the Tick event. I suspect it's something to do with _coreTypeEventHelper, but when I dump that, I get:
!do 098b99ac
Name: MS.Internal.CoreTypeEventHelper
MethodTable: 0bfd3d0c
EEClass: 0bc98420
Size: 12(0xc) bytes
File: C:\Program Files (x86)\Microsoft Silverlight\4.0.50524.0\System.Windows.dll
Fields:
MT Field Offset Type VT Attr Value Name
00000000 40009f5 4 0 instance 098b9ae4 _eventAndDelegateTable
506a0e94 40009f4 514 System.Int32 1 shared static _nextAvailableTableIndex
>> Domain:Value 086d3f38:NotInit 086daeb8:669 <<
Then I dump the _eventAndDelegateTable:
Name: System.Collections.Generic.Dictionary`2[[System.Int32, mscorlib],[MS.Internal.CoreTypeEventHelper+EventAndDelegate, System.Windows]]
MethodTable: 0bfcc0a0
EEClass: 5026c744
Size: 52(0x34) bytes
File: C:\Program Files (x86)\Microsoft Silverlight\4.0.50524.0\mscorlib.dll
Fields:
MT Field Offset Type VT Attr Value Name
5068f2d0 4000648 4 System.Int32[] 0 instance 098b9b18 buckets
50691060 4000649 8 ...non, mscorlib]][] 0 instance 098b9b30 entries
506a0e94 400064a 20 System.Int32 1 instance 1 count
506a0e94 400064b 24 System.Int32 1 instance 1 version
506a0e94 400064c 28 System.Int32 1 instance -1 freeList
506a0e94 400064d 2c System.Int32 1 instance 0 freeCount
50697f08 400064e c ...Int32, mscorlib]] 0 instance 098b9650 comparer
506ccfb0 400064f 10 ...Canon, mscorlib]] 0 instance 00000000 keys
506ceaac 4000650 14 ...Canon, mscorlib]] 0 instance 00000000 values
506a02e4 4000651 18 System.Object 0 instance 00000000 _syncRoot
506895d8 4000652 1c ...SerializationInfo 0 instance 00000000 m_siInfo
And then I'm kind of lost!
Before attempting to find the relevant event handler, you could also search for the source of the leak by investigating why the DispatcherTimer instances don't get released.
After you have the output of the !dumpheap -type DispatcherTimer, execute the !gcroot command on a couple of instances of DispatcherTimer. You should be able to see which object holds a reference to the timer.
Also, you could place appropriate breakpoints (using !bpmd), in order to obtain helpful stacktraces.

BSS, Stack, Heap, Data, Code/Text - Where each of these start in memory?

Segments of memory - BSS, Stack, Heap, Data, Code/Text (Are there any more?).
Say I have 128 MB of RAM. Can someone tell me:
How much memory is allocated for each of these memory segments?
Where do they start? Please specify the address range or something like that for better clarity.
What factors influence which should start where?
The answer depends on the number of variables used. Since you did not specify the compiler, the language, or even the operating system, it is difficult to pin down. It all rests with the operating system, which is responsible for managing the memory of applications. In short, there is no definite answer to this question. Think of it this way: the compiler and linker lay out the sections of the program, and at run time the operating system is asked to allocate memory for them; how much is allocated depends on how many variables there are, how big they are, and their scope and usage. For instance, take this simple C program, in a file called simpletest.c:
#include <stdio.h>
int main(int argc, char **argv){
int num = 42;
printf("The number is %d!\n", num);
return 0;
}
Supposing the environment was Unix/Linux based and was compiled like this:
gcc -o simpletest simpletest.c
If you run objdump or nm on the binary image simpletest, you will see the sections of the executable, in this instance 'bss' and 'text'. Make a note of the sizes of these sections. Now add a global int var[100]; to the code above, recompile, and run objdump or nm again: you will find that the section holding the array has grown (.bss if the array is uninitialized, .data if it has an initializer). Why? Because we added an array variable of type int with 100 elements.
This simple exercise shows that the sections grow, and hence the binary gets bigger. It also shows that you cannot predetermine how much memory will be allocated, as the runtime implementation varies from compiler to compiler and from operating system to operating system.
In short, the OS calls the shots on memory management!
You can get all this information by compiling your program
# gcc -o hello hello.c // you might compile with -static for simplicity
and then readelf:
# readelf -l hello
Elf file type is EXEC (Executable file)
Entry point 0x80480e0
There are 3 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x08048000 0x08048000 0x55dac 0x55dac R E 0x1000
LOAD 0x055dc0 0x0809edc0 0x0809edc0 0x01df4 0x03240 RW 0x1000
NOTE 0x000094 0x08048094 0x08048094 0x00020 0x00020 R 0x4
Section to Segment mapping:
Segment Sections...
00 .init .text .fini .rodata __libc_atexit __libc_subfreeres .note.ABI-tag
01 .data .eh_frame .got .bss
02 .note.ABI-tag
The output shows the overall structure of hello. The first program header corresponds to the process' code segment, which will be loaded from file at offset 0x000000 into a memory region that will be mapped into the process' address space at address 0x08048000. The code segment will be 0x55dac bytes large and must be page-aligned (0x1000). This segment will comprise the .text and .rodata ELF sections discussed earlier, plus additional sections generated during the linking procedure. As expected, it's flagged read-only (R) and executable (E), but not writable (W).
The second program header corresponds to the process' data segment. Loading this segment follows the same steps mentioned above. However, note that the segment size is 0x01df4 on file and 0x03240 in memory. This is due to the .bss section, which is to be zeroed and therefore doesn't need to be present in the file. The data segment will also be page-aligned (0x1000) and will contain the .data and .bss ELF sections. It will be flagged readable and writable (RW). The third program header results from the linking procedure and is irrelevant for this discussion.
If you have a proc file system, you can check this, as long as you get "Hello World" to run long enough (hint: gdb), with the following command:
# cat /proc/`ps -C hello -o pid=`/maps
08048000-0809e000 r-xp 00000000 03:06 479202 .../hello
0809e000-080a1000 rw-p 00055000 03:06 479202 .../hello
080a1000-080a3000 rwxp 00000000 00:00 0
bffff000-c0000000 rwxp 00000000 00:00 0
The first mapped region is the process' code segment, the second and third build up the data segment (data + bss + heap), and the fourth, which has no correspondence in the ELF file, is the stack. Additional information about the running hello process can be obtained with GNU time, ps, and /proc/pid/stat.
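The numbers in the readelf output and in the maps listing can be tied together with a little page arithmetic. The sketch below is only a cross-check under the assumption of 4 KiB pages (the 0x1000 alignment shown above); Segment, file_backed_range and zero_fill_bytes are hypothetical names, and the values are copied from the two LOAD headers.

```python
from typing import NamedTuple, Tuple

PAGE_SIZE = 0x1000  # matches the Align column of the LOAD headers


class Segment(NamedTuple):
    vaddr: int
    filesz: int
    memsz: int


def page_down(address: int) -> int:
    return address & ~(PAGE_SIZE - 1)


def page_up(address: int) -> int:
    return (address + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)


def file_backed_range(segment: Segment) -> Tuple[int, int]:
    """Page-aligned range covered by the bytes that come from the file."""
    return page_down(segment.vaddr), page_up(segment.vaddr + segment.filesz)


def zero_fill_bytes(segment: Segment) -> int:
    """Bytes present in memory but not in the file (the zeroed .bss part)."""
    return segment.memsz - segment.filesz


CODE = Segment(vaddr=0x08048000, filesz=0x55DAC, memsz=0x55DAC)
DATA = Segment(vaddr=0x0809EDC0, filesz=0x01DF4, memsz=0x03240)

if __name__ == "__main__":
    for name, segment in (("code", CODE), ("data", DATA)):
        start, end = file_backed_range(segment)
        print(name, f"{start:08x}-{end:08x}", "zero-fill:", hex(zero_fill_bytes(segment)))
```

Rounded to page boundaries, the two file-backed ranges come out as 08048000-0809e000 and 0809e000-080a1000, the same as the first two lines of the maps listing, and the data segment carries 0x144c bytes that exist only in memory.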
example taken from:
http://www.lisha.ufsc.br/teaching/os/exercise/hello.html
The memory used depends on the program's global and local variables.

Incremental Checksums

I am looking for a checksum algorithm where for a large block of data the checksum is equal to the sum of checksums from all the smaller component blocks. Most of what I have found is from RFCs 1624/1141 which do provide this functionality. Does anyone have any experience with these checksumming techniques or a similar one?
If it's just a matter of quickly combining the checksums of the smaller blocks to get to the checksums of the larger message (not necessarily by a plain summation) you can do this with a CRC-type (or similar) algorithm.
The CRC-32 algorithm is as simple as this:
uint32_t update(uint32_t state, unsigned bit)
{
if (((state >> 31) ^ bit) & 1) state = (state << 1) ^ 0x04C11DB7;
else state = (state << 1);
return state;
}
Mathematically, the state represents a polynomial over the field GF2 that is always reduced modulo the generator polynomial. Given a new bit b the old state is transformed into the new state like this
state --> (state * x^1 + b * x^32) mod G
where G is the generator polynomial and addition is done in GF2 (xor). This checksum is linear in the sense that you can write the message M as a sum (xor) of messages A,B,C,... like this
10110010 00000000 00000000 = A = a 00000000 00000000
00000000 10010001 00000000 = B = 00000000 b 00000000
00000000 00000000 11000101 = C = 00000000 00000000 c
-------------------------------------------------------------
= 10110010 10010001 11000101 = M = a b c
with the following properties
M = A + B + C
checksum(M) = checksum(A) + checksum(B) + checksum(C)
Again, I mean the + in GF2 which you can implement with a binary XOR.
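The linearity claim can be checked directly with the example messages above. This is a minimal sketch that ports the update function to Python; it assumes an initial state of zero and no final XOR (unlike the CRC-32 variants used by zlib and friends, which add an initial value and a final inversion), and the checksum helper is a name introduced here.

```python
POLY = 0x04C11DB7   # generator polynomial without the x^32 term
MASK = 0xFFFFFFFF   # keep the state to 32 bits, as uint32_t does in C


def update(state: int, bit: int) -> int:
    """Python port of the bitwise update step shown earlier."""
    feedback = ((state >> 31) ^ bit) & 1
    state = (state << 1) & MASK
    if feedback:
        state ^= POLY
    return state


def checksum(data: bytes, state: int = 0) -> int:
    """Feed the message most significant bit first, starting from state 0."""
    for byte in data:
        for shift in range(7, -1, -1):
            state = update(state, (byte >> shift) & 1)
    return state


A = bytes([0b10110010, 0, 0])
B = bytes([0, 0b10010001, 0])
C = bytes([0, 0, 0b11000101])
M = bytes(a ^ b ^ c for a, b, c in zip(A, B, C))

if __name__ == "__main__":
    combined = checksum(A) ^ checksum(B) ^ checksum(C)
    print(hex(checksum(M)), hex(combined), checksum(M) == combined)
```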
Finally, it's possible to compute checksum(B) based on checksum(b) and the position of the subblock b relative to B. The simple part is leading zeros. Leading zeros don't affect the checksum at all. So checksum(0000xxxx) is the same as checksum(xxxx). If you want to compute the checksum of a zero-padded (to the right -> trailing zeros) message given the checksum of the non-padded message it is a bit more complicated. But not that complicated:
zero_pad(old_check_sum, number_of_zeros)
:= ( old_check_sum * x^{number_of_zeros} ) mod G
= ( old_check_sum * (x^{number_of_zeros} mod G) ) mod G
So, getting the checksum of a zero-padded message is just a matter of multiplying the "checksum polynomial" of the non-padded message with some other polynomial (x^{number_of_zeros} mod G) that only depends on the number of zeros you want to add. You could precompute this in a table or use the square-and-multiply algorithm to quickly compute this power.
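Here is one way the zero_pad formula might look in code, again assuming a zero initial state and no final XOR. States are treated as polynomials over GF(2) with bit i holding the coefficient of x^i; times_x, mulmod and x_power are hypothetical helper names, and x_power uses square-and-multiply as suggested above.

```python
POLY = 0x04C11DB7
MASK = 0xFFFFFFFF


def checksum(data: bytes, state: int = 0) -> int:
    """Bitwise CRC with zero initial state, most significant bit first."""
    for byte in data:
        for shift in range(7, -1, -1):
            feedback = ((state >> 31) ^ (byte >> shift)) & 1
            state = (state << 1) & MASK
            if feedback:
                state ^= POLY
    return state


def times_x(a: int) -> int:
    """Multiply a reduced polynomial by x and reduce modulo G."""
    a <<= 1
    if a & (1 << 32):
        a ^= (1 << 32) | POLY
    return a


def mulmod(a: int, b: int) -> int:
    """Carry-less product of two reduced polynomials, modulo G."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = times_x(a)
        b >>= 1
    return result


def x_power(n: int) -> int:
    """x^n mod G via square-and-multiply."""
    result, base = 1, 2  # 1 is the polynomial '1', 2 is the polynomial 'x'
    while n:
        if n & 1:
            result = mulmod(result, base)
        base = mulmod(base, base)
        n >>= 1
    return result


def zero_pad(old_check_sum: int, number_of_zeros: int) -> int:
    """Checksum of the message followed by number_of_zeros zero bits."""
    return mulmod(old_check_sum, x_power(number_of_zeros))


if __name__ == "__main__":
    a, b, c = b"\xb2", b"\x91", b"\xc5"
    combined = zero_pad(checksum(a), 16) ^ zero_pad(checksum(b), 8) ^ checksum(c)
    print(hex(combined), combined == checksum(a + b + c))
```

The demo rebuilds checksum(M) for the three-byte example purely from the checksums of the one-byte sub-blocks and their positions.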
Suggested reading: Painless Guide to CRC Error Detection Algorithms
I have only used Adler/Fletcher checksums which work as you describe.
There is a nice comparison of crypto++ hash/checksum implementations here.
To answer Amigable Clark Kent's bounty question, for file identity purposes you probably want a cryptographic hash function, which tries to guarantee that any two given files have an extremely low probability of producing the same value, as opposed to a checksum which is generally used for error detection only and may provide the same value for two very different files.
Many cryptographic hash functions, such as MD5 and SHA-1, use the Merkle–Damgård construction, in which there is a computation to compress a block of data into a fixed size, and then combine that with a fixed size value from the previous block (or an initialization vector for the first block). Thus, they are able to work in a streaming mode, incrementally computing as they go along.
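The streaming behaviour is easy to see with Python's hashlib. This is just a small sketch (digest_in_chunks is a made-up helper) showing that feeding the data piece by piece gives the same digest as hashing it in one go.

```python
import hashlib
from typing import Iterable


def digest_in_chunks(chunks: Iterable[bytes], algorithm: str = "sha1") -> str:
    """Hash a sequence of chunks incrementally and return the hex digest."""
    hasher = hashlib.new(algorithm)
    for chunk in chunks:
        hasher.update(chunk)  # internal state carries over from chunk to chunk
    return hasher.hexdigest()


if __name__ == "__main__":
    pieces = [b"hello ", b"world"]
    streamed = digest_in_chunks(pieces)
    one_shot = hashlib.sha1(b"".join(pieces)).hexdigest()
    print(streamed == one_shot)
```

Note that this is incremental only in the streaming sense: the chunks must be fed in order through one hasher, and the digests of separately hashed blocks cannot simply be combined into the digest of the whole, as the CRC sums can.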

Resources