I'm working on a virtual machine under Debian with EGLIBC 2.13 in order to learn memory address.
So I wrote a simple code giving me the address of a test variable, but everytime I exec this script, I'm getting a totally different address.
Here's two screens from 2 distincts executions :
What's causing this ? The fact I'm working on a VM or my GLIBC version ?
I guess it's GLIBC to prevent buffer overflow but I can't find my answer on the web.
And is it totally random ?
First, Glib (from GTK) is not GNU libc (a.k.a. glibc)
Then, you are observing the effect of ASLR (address space layout randomization). Don't try to disable it on a server directly connected to the Internet, it is a valuable security measure.
ASLR is mostly provided by the Linux kernel (e.g. when handing mmap(2) without MAP_FIXED, as most implementations of malloc do, and probably also at execve(2) time for the initial stack). Changing your libc (e.g. to musl-libc) won't disable it.
You could disable system-wide ASLR on a laptop (or on a Linux system running inside some VM) using proc(5): run
echo 0 > /proc/sys/kernel/randomize_va_space
as root. Be careful, by doing that you are decreasing the security of your system.
I don't know what you call totally random, but ASLR is random enough. IIRC, (but I might be wrong) the middle 32 bits of the 64-bits address (assuming a 64 bits Linux system) are quite random, to the point of making result of mmap (hence of malloc using it) practically unpredictable and non-reproducible.
BTW, to see ASLR in practice, try several times (with ASLR enabled) the following command
cat /proc/self/maps
this command displays a textual representation of the address space (in virtual memory) of the process running that cat command. You'll see different outputs when you run it several times !
For debugging memory leaks, use valgrind. With a recent GCC 4.9 or better (or recent Clang/LLVM) compiler, the address sanitizer is also useful, so you could compile with gcc then -Wall -Wextra to get all the warnings even the extra ones, then -g to get debug info, then -fsanitize=address
Related
Not sure if anyone here can answer this.
I've learned that an Operating System checks if an instruction of a program changes something outside of its allocated memory, and if it does then the OS won't allow the program to do this.
But, if the OS has to check this for every instruction, won't this take up at least 5/6 of the CPU? I tried to replicate this, and this is how many clock cycles I've come up with to check this for every instruction.
If I've understood something wrong, please correct me, because I can't imagine that an OS takes up that much of the CPU.
There are several safe-guards in place to ensure a non-privileged process behaves. I will discuss two of them in the context of the x86_64 architecture, but these concepts (mostly) extend to other major platforms.
Privilege Levels
There is a bit in a particular CPU register that indicates the current privilege level. These privileges are often called rings, where ring 0 corresponds to the kernel (ie. highest privilege), and ring 3 corresponds to a userspace process (ie. lowest privilege). There are other rings, but they're not relevant to this introduction.
Certain instructions in x86_64 may only be executed by privileged processes. The current ring must be 0 to execute a privileged instruction. If you try to execute this instruction without the correct privileges, the processor raises a general protection fault. The kernel synchronously processes this interrupt, and will almost certainly kill the userspace process.
The ring level can only be changed while in ring 0, so the userspace process can't simply change from ring 3 to ring 0 by itself.
Execute Permission in Page Tables
All instructions to be executed are stored in memory. Many architectures (including x86_64) use page tables to store mappings from virtual addresses to physical addresses. These page tables have several bookkeeping entries as well, one of which is an execute permission bit. If this bit is not set for a page that corresponds to the instruction trying to be executed, then the processor will produce a general protection fault. As before, the kernel will synchronously process this interrupt, and likely kill the offending process.
When are these execute bits set? They can be dynamically set via mmap(2), but in most cases the compiler emits special CODE sections in the binaries it generates, and when the OS loads the binary into memory it sets the execute bit in the page table entries for the pages that correspond to the CODE sections.
Who's checking these bits?
You're right to ask about the performance penalty of an OS checking these bits for every single instruction. If the OS were doing this, it would be prohibitively expensive. Instead, the processor supports privilege levels and page tables (with the execute bit). The OS can set these bits, and rely on the processor to generate interrupts when a process acts outside its privileges.
These hardware checks are very fast.
Above image indicates that the Update Ramdisk loads the kernel during an iOS update. If so which binary (ASR, etc.) in the iOS 10.3.1 Update Ramdisk loads the Kernel?
None of them, that's not how ramdisks work.
For starters, the kernel operates on and with the ramdisk, not the other way round. This is true for any kernel-ramdisk pair I've seen on any platform so far.
Furthermore, binaries from the iOS ramdisks are all userland binaries, which means:
They rely on the dynamic linker (/usr/lib/dyld) and system libraries.
They rely on system calls.
They rely on the availability of a file system.
They run in EL0 ("userland"), the least privileged processor mode.
If any of those wanted to load the kernel, there would be a number of problems with that:
The kernel runs in EL1. If you run in EL0, then you are not privileged to access anything in EL1 and thus cannot put any kernel there.
Linking, libraries and system calls work very differently in EL1:
System libraries are not available in EL1. I suppose they could be made available, but since there can only be one binary executing in EL1 at any given time, that sounds like a huge overkill.
There exists a linker for EL1 in iOS (KXLD), but it is part of the iOS kernel and its designed to link kernel extensions to the kernel. It doesn't operate on userland binaries.
While technically you can generate an exception from EL1 targeting EL1 with the svc instruction, you yourself will be invoked to handle it, which means that until you load the kernel, you are the kernel. Userland binaries are not prepared for that.
That said, I'm not sure what your image is trying to express. My best guess would be that it means that the denoted ramdisk is passed to the kernel. In any case though, iBoot is the one loading and setting up the kernel.
I start to explore in the area of computer architecture. There are 2 questions about ISA that confuse me.
As far as I know, there are different kinds of ISA such as ARM, MIPS, 80x86, etc. I wonder whether a CPU can only specifically read one kind of ISA. For example, can a processor read both 80x86 and MIPS.
If a CPU is unique to an ISA, how can I check which ISA my PC processor is using? Can I find it out manually?
Thank you
All the CPU/MCU's I know of support just single instruction set.
There is capability of loading microcode to some of the newer architectures that may allow to change the instruction set behavior to some point bot strongly doubt it you can change the instruction set with it. Instruction set and internal CPU/MCU circuitry are strongly dependent. Making universal CPU with changeable instruction set is possible (for example with FPGA) but would be very slow in comparison to direct DIE encoded CPU. With similar technology of Die the clock speed would be may be just few MHz.
Some architectures like i80x86 supports modes that can switch to different kind of operation (16/32/64 bit,real,protected) but its hard to say it is different instruction set or just subset of the same thing ...(matter of perspective)
detection of instruction set.
This is madness. Yes it is possible to detect which type of instruction set you have via program but all the CPU/MCU's have different pinout, interfaces, architectures and are not interchangeable (even in the same architecture class) so you detecting instruction set is meaningless as you alredy know the architecture you are doing the wiring for ...
Anyway the detection would work like this:
have set of test programs of each supported instruction set/architecture that will set specific memory or IO to predefined state if working properly
have watch dog cycling between all the detections and stop on first valid result.
Yes, each type of CPU is unique to an instruction set. The instruction set for ARM will not work with x86, SPARC, etc. There may be some overlap by coincidence, but programs are not compatible between architectures.
Depending on your operating system, there are commands you can run to see this information. For unix/Linux, uname -a will show you what architecture you're running, as well as dmidecode. For Windows OS's, right-clicking on My Computer and selecting Properties should show you your architecture.
For example (Windows 7):
For Linux (I know, it's a super-old distro!):
$ uname -a
Linux hostname 2.6.35-22-generic #33-Ubuntu SMP Sun Sep 19 20:32:27 UTC 2010 x86_64 GNU/Linux
(In this example, the architecture is x86_64), which is 64-bit Intel or AMD. To tell for sure, you can run dmidecode as I mentioned earlier:
~# dmidecode |grep -i proc
Processor Information
Type: Central Processor
Version: AMD Opteron(tm) Processor 154
Processor Information
Type: Central Processor
Version: AMD Opteron(tm) Processor 154
It can actually read any instruction set if the support is implemented. Most of the CPUs nowadays support two/three instructions set that only slightly differ because of 32-bit/64-bit addressing.
x86 supports 16-bit, 32-bit and 64-bit instructions set, ARM support 32-bit, 64-bit, for both Thumb and Thumb-2, etc. Similarly for MIPS for example.
Original Transmeta I believe was flexible about it and supposed to transcompile any instruction set into internal set and run it natively. However it failed and nowadays there is nothing similar to it.
Anyway, once you run application, it's bound to specific instruction set in its header so it can't change it during the runtime. Well, ARM is exception to that - it's able to switch between full and Thumb versions but they are just different encoding for the same...
For the second part - either in your OS GUI or you can usually read it - in Linux by reading /proc/cpuinfo, on Windows in the environment variable PROCESSOR_ARCHITECTURE.
I've a program where argv[0] gets overwritten from time to time. This happens (only) on a production machine which I cannot access and where I cannot use a debugger. In order to find the origin of this corruption, I'd like to write protect this stack page, so that any write access would be turned in a fault, and I could get the address of the culprit instruction.
The system is an AIX 5.3 64 bits based. When I try to invoke mprotect on my stack page, I get an ENOMEM error. I'm using gcc to generate my program.
On a Linux system (x86 based) I can set a similar protection using mprotect without trouble.
Is there any way to achieve this on AIX. Or is this a hopeless attempt?
On AIX, mprotect() requires that requested pages be shared memory or memory mapped files only. On AIX 6.1 and later, you can extend this to the text region, shared libraries, etc, with the MPROTECT_TXT environment variable.
You can however use the -qstackprotect option on XLC 11/AIX 6.1TL4 and later. "Stack Smashing Protection" is designed to protect against exactly the situation you're describing.
On AIX 5.3, my only suggestion would be to look into building with a toolset like Parasoft's Insure++. It would locate errant writes to your stack at runtime. It's pretty much the best (and now only) tool in the business for AIX development. We use it in house and its invaluable when you need it.
For the record, a workaround for this problem is to move processing over to a pthread thread. On AIX, pthread thread stacks live in the data segment which can be mprotected (as opposed to the primordial thread, which cannot be mprotected). This is the way the JVM (OpenJDK) on AIX implements stack guards.
I've compiled and am currently running a program with g++. I expect that it's going to take a while to run but I'm hoping I might be able to speed it up. I'm currently using Ubuntu. Checking the system monitor I found the terminal I'm running the program from. While it certainly is using a chunk of memory, there's far more memory available. Is there some sort of command for the terminal or something that will allow me to allocate more memory to the program so it will run a bit faster? Or a command for g++? Or just something to put in the C++ code?
Thanks!
Giving the program more memory will not make it run faster; it will ask for more memory from the operating system as needed. You're thinking of the behavior of languages with garbage collection, like Java. Normal C++ programs don't include a garbage collector, and thus won't run any faster with a larger heap.