How do programs find and call device drivers' routines?

Let's say a device driver has been written, compiled and loaded by the OS.
To call its subroutines, I assume one has to know which subroutines the driver provides, i.e. its interface/API, and I'll assume we know that interface (correct me if I'm wrong).
The question, then, is how we find the entry points of these subroutines in order to call them; or, put better, what the exact procedure is for calling a device driver's subroutines.
I am guessing it works something like loading a dynamic library, where a linker ultimately places the addresses of the subroutines into the calling program's address space.
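For reference, here is the user-space mechanism I am picturing - a minimal dlopen/dlsym sketch (libm and cos are only example names):

    /* What I'm picturing: user-space dynamic loading, where the loader
       resolves an entry point by name at run time. */
    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        void *lib = dlopen("libm.so.6", RTLD_NOW);    /* load the library */
        if (!lib)
            return 1;

        /* look up the entry point by name, then call through the pointer */
        double (*cosine)(double) = (double (*)(double))dlsym(lib, "cos");
        printf("%f\n", cosine(0.0));

        dlclose(lib);
        return 0;
    }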
NOTE
I am completely new to this field, so any information/link/illustration is truly appreciated.

The details of this are platform-specific.
For true device drivers, i.e. where the driver is the software interface for a physical device, the driver usually conforms to a standard interface that the O/S calls when it needs to display something on screen, write a sector to disk, and so on.
On Windows, if your driver exists to provide a software service that can only be performed in kernel mode, you can define your own call interface within your driver, which can then be accessed from user mode via the DeviceIoControl function.
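For example, a user-mode program would open the driver's device object and call it roughly like this; the device name \\.\MyDriver and the IOCTL code are made up for illustration, the real values come from the driver in question:

    /* Sketch: calling a kernel-mode driver's custom interface from user mode.
       "\\.\MyDriver" and IOCTL_MYDRV_QUERY are hypothetical; a real driver
       publishes its own device name and IOCTL codes. */
    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    #define IOCTL_MYDRV_QUERY \
        CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

    int main(void)
    {
        HANDLE h = CreateFileA("\\\\.\\MyDriver", GENERIC_READ | GENERIC_WRITE,
                               0, NULL, OPEN_EXISTING, 0, NULL);
        if (h == INVALID_HANDLE_VALUE)
            return 1;

        DWORD value = 0, returned = 0;
        /* The driver's IRP_MJ_DEVICE_CONTROL dispatch routine handles this. */
        if (DeviceIoControl(h, IOCTL_MYDRV_QUERY, NULL, 0,
                            &value, sizeof(value), &returned, NULL))
            printf("driver returned %lu\n", (unsigned long)value);

        CloseHandle(h);
        return 0;
    }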


STM32 Memory Dump and Extracting Secret Key

I am quite new at embedded development and started with an STM32F429 board to improve my skills.
I have just developed a basic Caesar-encryption application for my board. It works well, and I defined the secret key as "3". Now I would like to extract this super secret(!) key from my device.
How can I do it? Should I dump the memory or firmware of my device, and how?
Can you suggest any software for this process? (Not the ST Utility or other ST software, please, because I would like to apply the experience gained to other devices as well.)
Thanks!
I take it the value you're looking for is hardcoded. In that case it resides in the internal flash. So yes, a memory dump will be necessary.
I will go the long way and assume that you know very little about how it works, so if you know some of this stuff, well, good for you. I will try to give a few pointers.
Specifically about STM32:
You have the option to boot the microcontroller from the so-called system memory, a read-only memory that comes preprogrammed from the factory with a bootloader. You can talk to that running bootloader via UART (the most common way; it comes with ST-Link, but any cheap USB-UART bridge also works), or via some other protocol. You can ask that bootloader to read its flash out to you, among other things. This is covered in AN2606.pdf, which has some useful links in it, such as the names of the documents where you can find the specific bootloader commands for each interface. Of course, you only care about the interfaces that the bootloader of your specific MCU (the F429) supports, which are listed in the same AN2606, page 172 (for bootloader version 0.7; there is also 0.9 for those MCUs, and I have no idea how to tell which one you have, so... try? The UART configuration seems identical anyway).
So what exactly needs to be done? Flip the state of the MCU's BOOT0 pin - permanently - and reset it (power cycle or reset pin, both are fine). This will boot the MCU into the bootloader instead of booting the program from flash. You can read about it in the STM32F429 Reference manual, page 69, which describes the states of the BOOT0 and BOOT1 pins at boot. As for which pin is BOOT0 - if it's not marked on your board, you'll have to consult the F429 datasheet, page 69 (I swear, it's a coincidence). Depending on your specific IC, it will be one pin or another.
The bootloader will activate all the MCU's peripherals as per the docs above and wait on its UART and other pins for commands. Let's take a look at AN3155, which covers the bootloader's USART protocol.
The commands are all in that document; the table of contents in the PDF really helps to find things quickly. Of course, if you need specific details - and you will need specific information about specific commands - it's all in there too: how many bytes are in a command, how many bytes at a time you can read from flash, etc. Basically, you can either write your own program that does this (you could even program another microcontroller to talk to the victim's bootloader), or use any other software that knows what commands to send to the bootloader. It can be the ST utility, or it can be any other program; they all implement the very same command set, so it doesn't actually matter much. I couldn't find many programs that do this; the only one that stood out was stm32flash. I've never used it myself. I'm OK with the ST stuff, since I know what it does (I think).
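If you want to roll your own, here is a minimal C sketch of the Read Memory (0x11) exchange from AN3155. It assumes fd is a serial port already opened and configured for 8 data bits, even parity, 1 stop bit; error handling is kept minimal:

    /* Sketch: read flash over the STM32 UART bootloader (AN3155).
       Every command byte is followed by its XOR complement; the bootloader
       answers each step with ACK (0x79) or NACK (0x1F). */
    #include <unistd.h>

    #define ACK 0x79

    static int wait_ack(int fd)
    {
        unsigned char b = 0;
        return (read(fd, &b, 1) == 1 && b == ACK) ? 0 : -1;
    }

    /* Read len bytes (len <= 256) starting at addr. Returns bytes read or -1. */
    int bl_read_memory(int fd, unsigned long addr, unsigned char *buf, int len)
    {
        unsigned char cmd[2] = { 0x11, 0xEE };  /* Read Memory + complement */
        unsigned char a[5]   = { addr >> 24, addr >> 16, addr >> 8, addr, 0 };
        unsigned char n[2]   = { len - 1, (unsigned char)~(len - 1) };

        a[4] = a[0] ^ a[1] ^ a[2] ^ a[3];       /* address checksum (XOR) */

        write(fd, cmd, 2); if (wait_ack(fd)) return -1;
        write(fd, a, 5);   if (wait_ack(fd)) return -1;  /* NACK here usually
                                                            means readout protection */
        write(fd, n, 2);   if (wait_ack(fd)) return -1;
        return (int)read(fd, buf, len);
    }

Before the first command you send a single 0x7F byte so the bootloader can detect the baud rate; after that, looping bl_read_memory over the flash address range (main flash starts at 0x08000000) and writing the buffers to a file gives you the dump.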
Oh yeah, back to getting the secret value out; I almost forgot about that. Well, you open the dump in a hex viewer/editor and scroll around looking for interesting combinations of values. Yeah, that's kinda what it looks like. One can also run it through a disassembler: scroll the disassembled code around and see if any numeric values stand out. You know, some random number like 0xD35B581 hardcoded in the middle of an otherwise tidy program could mean something, like a serial number or a secret value. Unfortunately, I'm reaching the boundaries of my competence here, so I won't go any further into what one can do with the dump.

Does simulating memory-mapped I/O using VMX require instruction decoding?

I am wondering how a hypervisor using Intel's VMX / VT technology would simulate memory-mapped I/O (so that the guest thinks it is performing memory-mapped I/O against a device).
I think the basic principle would be to set up the EPT page tables so that accesses to the memory addresses in question cause an EPT violation (i.e. a VM exit), by marking those pages neither readable nor writable. The next question is how to process the VM exit. Such a VM exit fills out the exit qualification fields, including the guest-linear and guest-physical address, etc. But what I am missing in these exit qualification fields is, in the case of a write instruction, some field indicating the value that was being written and the size of the write. Likewise, for a read instruction it would be nice to have some bit fields indicating the destination of the read, say a register or a memory location (in the case of memory-to-memory string operations). This would make it very easy for the hypervisor to figure out what the guest was trying to do and then simulate the device's behavior towards the guest.
But the trouble is, I can't find such fields among the exit qualifications. I can see an instruction pointer to the faulting instruction, so I could walk the page tables, read in the instruction, decode it, and then simulate the I/O behavior. However, this requires the hypervisor to have a fairly complete picture of all x86 instructions and be able to decode them. That seems quite a heavy burden to place on the hypervisor, and it would also have to stay current with later instruction additions. And the CPU should already have this information.
There's a chance that I am missing the relevant fields because the documentation is quite extensive; I have tried to search carefully but have not been able to find them. Maybe someone can point me in the right direction OR confirm that the hypervisor will need to contain an instruction decoder.
I believe most VMs decode the instruction. It's not actually that hard, and most VMs have software emulators to fall back on when the CPU's VM extensions aren't available or up to the task. You don't need to handle every instruction, just those that can take memory operands, and you can probably ignore everything that isn't a 1-, 2-, or 4-byte memory operand, since you're not likely to be emulating device registers of other sizes. (For memory-mapped device buffers, like video memory, you don't want to be trapping every memory access because that's too slow, so you'll have to take a different approach.)
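To make that concrete, here is a rough sketch of what such an exit handler looks like. The VMCS field names and the decode_mmio_insn / device_mmio_* / set_guest_reg helpers are illustrative stand-ins, not any particular hypervisor's API:

    /* Sketch of an EPT-violation VM-exit handler that emulates MMIO. */
    #define EPT_QUAL_READ  (1ULL << 0)   /* exit qualification bit 0: data read */
    #define EPT_QUAL_WRITE (1ULL << 1)   /* bit 1: data write */

    void handle_ept_violation(struct vcpu *vcpu)
    {
        uint64_t qual = vmcs_read(EXIT_QUALIFICATION);
        uint64_t gpa  = vmcs_read(GUEST_PHYSICAL_ADDRESS);

        /* The VMCS says *where* and *read-or-write*, but not the value or
           the width, so the instruction itself has to be fetched and decoded. */
        struct mmio_op op;
        decode_mmio_insn(vcpu, &op);     /* fills op.len, op.size, op.reg, op.value */

        if (qual & EPT_QUAL_WRITE)
            device_mmio_write(gpa, op.value, op.size);
        else if (qual & EPT_QUAL_READ)
            set_guest_reg(vcpu, op.reg, device_mmio_read(gpa, op.size));

        /* Resume the guest just past the emulated instruction. */
        vmcs_write(GUEST_RIP, vmcs_read(GUEST_RIP) + op.len);
    }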
However, there is one way you can let the CPU do the work for you, but it's much slower than decoding the instruction yourself, and it's not entirely perfect. You can single-step the instruction while temporarily mapping in a valid page of RAM. The VM exit will tell you the guest-physical address accessed and whether it was a read or a write. Unfortunately it doesn't reliably tell you whether it was a read-modify-write instruction; those may just set the write flag, and with some device registers that can make a difference. It might be easier to copy the instruction (it can be at most 15 bytes, but watch out for page boundaries) and execute it in the host, but that requires that you can map the page to the same virtual address in the host as in the guest.
You could combine these techniques: decode the common instructions that are actually used to access memory-mapped device registers, and fall back to single-stepping for the instructions you don't recognize.
Note that by choosing to write your own hypervisor you've put a heavy burden on yourself. Having to decode instructions in software is a pretty minor burden compared to the task of emulating an entire IBM PC compatible computer. The Intel virtualisation extensions aren't designed to make this easier, they're just designed to make it more efficient. It would be easier to write a pure software emulator that interpreted the instructions. Handling memory mapped I/O would be just a matter of dispatching the reads and writes to the correct function.
I don't know in detail how VT-x works, but I think I see a flaw in the way you wish it worked:
Remember that x86 is not a load/store machine. The load part of add [rdi], 2 doesn't have an architecturally-visible destination, so your proposed solution of telling the hypervisor where to find or put the data doesn't really work, unless there's some temporary location that isn't part of the guest's architectural state, used only for communication between the hypervisor and the VMX hardware.
To handle a read-modify-write instruction with a memory destination efficiently, the VM should do the whole thing with one VM exit. So you can't just provide separate load and store interfaces.
More importantly, handling atomic read-modify-writes is a special case. lock add [rdi], 2 can't just be done as a separate load and store.

Write-Only Memory

I know there exist read-only values in many languages (final in Java, const in C++, etc.), but does such a thing as a "write-only" value exist? I've heard variations of this in jokes, such as write-only code, but I'm wondering if it is actually a legitimate concept in computer science. To be honest, I can't see how it would be helpful in any situation, but I'm just wondering.
In unix shell scripting there is a concept of write-only memory. But it's not part of any shell or scripting language; it's a device: /dev/null.
The write-only device /dev/null is used to discard output you don't want. Generally by allowing the caller to redirect stdout and/or stderr to it.
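From a C program the same trick is a couple of lines: open /dev/null and point the standard output descriptor at it (a minimal sketch):

    /* Sketch: discarding output by redirecting stdout to /dev/null. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/null", O_WRONLY);
        if (fd < 0)
            return 1;

        dup2(fd, STDOUT_FILENO);          /* stdout now points at the bit bucket */
        close(fd);

        printf("silently discarded\n");   /* written, accepted, gone */
        return 0;
    }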
There are other kinds of write-only memory on a computer. One example is your sound card, which on some (older) unix machines is mapped to /dev/audio or /dev/dsp. Writing values to it makes your speaker produce sound, but reading from it gets you nothing.
At the lower level of the device drivers themselves, these hardware devices are often wired to a specific memory or I/O address (some CPU architectures don't have separate memory and I/O address spaces - just a single address space shared by RAM and all other hardware). So in a real sense these memory locations are truly write-only.
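In driver code such a location is usually just a volatile store to a fixed address; the address below is made up for illustration:

    /* Sketch: a write-only, memory-mapped transmit register. The address
       0x4000C000 is hypothetical; a real one comes from the chip's memory map. */
    #define UART_TX_REG (*(volatile unsigned int *)0x4000C000u)

    static void hw_putc(char c)
    {
        UART_TX_REG = (unsigned int)c;  /* the write goes to the device;
                                           reading back yields nothing useful */
    }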
There were certainly some FPUs for PCs that used a somewhat odd setup, existing as memory-mapped devices. To perform an operation, you would write the value you wanted to operate on to a memory address that indicated which operation you wanted performed; the result would then (eventually) be available at another address.
I don't know whether you would define this, strictly, as "write-only memory"; it is rather memory where (part of) the address is used as an opcode.

What exactly is a character driver?

I know the definition
A character device driver is one that transfers data directly to and
from a user process.
But can someone explain this in a more intuitive way? First of all, there should be a device. What is the device in the above definition?
If you say it can be a file, then can we say that reading a file and putting the data on the console is an example of a character driver?
A device driver is the integration of two pieces of code. The first piece determines how the driver's services are made available to applications (user space).
The second piece is the hardware-access part: the instructions that carry out the physical operations on the target hardware.
Based on the first piece of code we have three models - the character model, the block model, and the network model - so we speak of character drivers, block drivers, and network drivers.
The first piece of code is entirely kernel-specific: it depends on what interface the kernel provides for accessing the hardware. Implementing it on Linux or Windows (or another OS) may differ, so we need to know what interface the OS offers for providing services to applications.
In Linux, from the user's perspective, every piece of hardware is a file. Why? Because at boot time all the hardware devices present are detected and added to the device tree, and based on the hardware type a corresponding device node is created automatically in the /dev directory: a character device gets a character device node, a block device gets a block device node.
To write a character driver, we create a device node in the /dev directory, assign it a major number, and the application then performs read/write operations on that device file.
We implement the driver operations and assign them to the file-operations structure (fops) pointer.
A device-file request is received by the VFS, which turns device-file operations into the corresponding driver operations:
APP ---> dev file ---> fops ---> driver ---> device
The dev file is the interface between the application and the driver; the driver in turn interacts with the device.
You can refer to http://lwn.net/images/pdf/LDD3/ch03.pdf.
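To make that flow concrete, here is a minimal read-only character driver sketch in the style of that LDD3 chapter; the name "mychar" and the message are illustrative:

    /* Sketch: a tiny Linux character driver that hands bytes straight
       to the reading user process. */
    #include <linux/module.h>
    #include <linux/fs.h>

    static int major;
    static const char msg[] = "hello from mychar\n";

    static ssize_t mychar_read(struct file *f, char __user *buf,
                               size_t len, loff_t *off)
    {
        /* copy bytes directly to the user process - the "character" part */
        return simple_read_from_buffer(buf, len, off, msg, sizeof(msg) - 1);
    }

    static const struct file_operations mychar_fops = {
        .owner = THIS_MODULE,
        .read  = mychar_read,
    };

    static int __init mychar_init(void)
    {
        major = register_chrdev(0, "mychar", &mychar_fops); /* 0 = dynamic major */
        return major < 0 ? major : 0;
    }

    static void __exit mychar_exit(void)
    {
        unregister_chrdev(major, "mychar");
    }

    module_init(mychar_init);
    module_exit(mychar_exit);
    MODULE_LICENSE("GPL");

After insmod, create the node with mknod /dev/mychar c <major> 0; a cat /dev/mychar then travels APP ---> dev file ---> VFS ---> mychar_fops ---> mychar_read.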
To the best of my knowledge:
The device can be your own private structure or a system object.
The driver is called a char driver because data is read and written at byte granularity.
If you are writing your own char driver you can use a char buffer or a kfifo to read from and write to the device.
You can create your device file in procfs and read/write it as you wish; this is accomplished through your char driver.
I hope my answer helps you.
Think of a device driver as an abstraction of the various hardware the kernel has to deal with. The device driver knows the details of the hardware it is communicating with, so the kernel can read and write data as it would with any other filesystem file.
If you do some reading on device drivers, you will find that the open, read, write, and close system calls are all mapped to driver-specific function calls.
There are two types of drivers: block drivers and char drivers. The difference is that the first deals with blocks of data while the second receives/transmits data byte by byte.
OK, for example you have a device and you want to talk to it; how do you do it?
You have a driver: simply a number of functions that will be called by the kernel at the right time. The driver by itself can't do anything (it neither sends nor receives data); you have to tell it what to do. As you should know by now, you can't directly communicate with the kernel from user space, but there are some tricks you can use. What you are talking about is called a device file. Your driver creates a file for your device, and when you write something to this file from user space, the driver is notified that there is information to handle; it takes this information and hands it to its write function, which transmits the data to your device (in this case, byte by byte).
Hopefully, this is what you wanted to know.

How is external memory, internal memory, and cache organized?

Consider a system as follows: a hardware board with, say, an ARM Cortex-A8 and a Neon vector coprocessor, running an embedded Linux OS on the Cortex-A8. In this environment, if some application - say, a video decoder - is executing, then:
How is it decided which buffers will be in external memory and which ones will be allocated in internal SRAM, etc.?
When one calls calloc/malloc on such a system, from which memory does the returned pointer come: internal or external?
Can a user have buffers allocated in the memory of his choice (internal/external)?
In ARM architectures, there is another memory called "tightly coupled memory" (TCM). What is that, and how can a user enable and use it? Can I declare buffers in this memory?
Do I need to see the memory map (if any) of the hardware board to understand all the different physical memories present on a typical board?
How much of a role does the OS play in distinguishing these different memories?
Sorry for the multiple questions, but I think they are all interlinked.
Please note that I'm not familiar with ARM nor with embedded Linux specifically, so all of my comments will be from a general point of view.
First, about cache: Very early during boot, the operating system will do some amount of cache initialization. Exactly what this entails will vary from processor to processor, but the net effect is to ensure cache is initialized properly, and then enable its use by the processor. After this, the cache is operated exclusively by the processor with no further interaction by the operating system or your programs.
Now, on to external (off-chip) and internal (on-chip) memories:
The operating system owns all the hardware on the system, including the internal and external memories, and so is ultimately responsible for discovering, configuring, and allocating these resources within the kernel and to user processes. In a typical system (e.g., your desktop or a 1U server) there won't usually be any special internal (on-chip) RAM, so the operating system can treat all DRAM equally. It will go into a general pool of pages (usually 4k) for allocation to processes, filesystem buffers, etc. On a system with special memory of various sorts (NVRAM, high-speed on-chip memory, and a few others), the operating system's general policies usually aren't correct.
How this is presented to the user will depend on choices made while porting the OS to this system.
One could modify the OS to be explicitly aware of this special memory and provide special system calls to allocate it to user-land processes. However, this could be quite a bit of work unless the embedded Linux being used has at least some support for this sort of thing.
The approach I'd probably take would be to avoid modifying the kernel itself, and instead write a device driver for the internal memory. A driver of this sort would typically provide some sort of mmap interface to allow user processes to get simple address-based access to the internal memory.
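The user-space side of such a driver might look like the sketch below; the node name /dev/ocram and the 64 KiB size are assumptions for illustration:

    /* Sketch: user-space access to on-chip SRAM through a driver's mmap
       interface. "/dev/ocram" and the 64 KiB length are hypothetical. */
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/ocram", O_RDWR);
        if (fd < 0)
            return 1;

        size_t len = 64 * 1024;
        /* the driver's mmap handler maps the SRAM's physical pages here */
        void *sram = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (sram == MAP_FAILED)
            return 1;

        ((volatile unsigned int *)sram)[0] = 0xdeadbeef;  /* lands in on-chip RAM */

        munmap(sram, len);
        close(fd);
        return 0;
    }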
Here are answers to some of your concrete questions.
How much of a role does the OS play in distinguishing these different memories?
If your system has taken the device-driver approach described above, then the OS probably knows only about external memory, or perhaps just enough about the internal memories to initialize them properly (although that initialization would likely live in the device driver too, if at all possible). If the OS knows more explicitly about the on-chip memory, then it will definitely contain any needed initialization code, as well as some sort of scheme for providing access to user processes.
How is it decided which buffers will be in external memory and which ones will be allocated in internal SRAM, etc.?
It seems unlikely to me that the operating system would try to automate such choices. Instead, I suspect that either the OS or a device driver would provide a generic interface giving access to the on-chip memory, and leave it up to your user code to decide what to do with it.
When one calls calloc/malloc on such a system, from which memory does the returned pointer come: internal or external?
Almost certainly, malloc and friends will return pointers into the general off-chip memory. In the driver-based approach suggested above, you'd use mmap to gain access to the on-chip memory. If you need finer-grained allocation than that, you'd have to write your own allocator, or find one that can be given an explicit region of memory to work in.
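Such an allocator can be very small; here's a toy bump allocator as a sketch, where pool_init would be handed the pointer from the mmap sketch above:

    /* Toy bump allocator over an explicit memory region, e.g. the mmap'd
       on-chip SRAM. No free(): it only moves a cursor forward. */
    #include <stddef.h>

    static unsigned char *pool;
    static size_t pool_off, pool_len;

    void pool_init(void *base, size_t len)
    {
        pool = base;
        pool_off = 0;
        pool_len = len;
    }

    void *pool_alloc(size_t n)
    {
        n = (n + 7) & ~(size_t)7;            /* keep 8-byte alignment */
        if (pool_off + n > pool_len)
            return NULL;                     /* region exhausted */
        void *p = pool + pool_off;
        pool_off += n;
        return p;
    }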
Can a user have buffers allocated in the memory of his choice (internal/external)?
If by buffers you mean the regions returned from the standard malloc calls, probably not. But if you mean "can a user program somehow get a pointer to the on-chip memory", then the answer is almost certainly yes, though the mechanism will depend on choices made when porting Linux to this system.
In ARM architectures, there is another memory called "tightly coupled memory" (TCM). What is that, and how can a user enable and use it? Can I declare buffers in this memory?
I don't know what this is. If I had to guess, I'd assume it's just another form of on-chip RAM, but since it has a different name, perhaps I'm wrong.
Do I need to see the memory map (if any) of the hardware board to understand all the different physical memories present on a typical board?
If the OS and/or device drivers provide some sort of abstract access to these memory regions, then you won't need to know the address map explicitly. That knowledge is, however, needed to implement this access in either the kernel or a device driver.
I hope this helps somewhat.
