STM32 Memory Dump and Extracting Secret Key

I am quite new at embedded development and started with a STM32F429 board to improve myself.
I have just developed a basic Caesar encryption application for my board. It is working well, and I defined the secret key as "3". Now I would like to extract this super secret(!) key from my device.
How can I do it? Should I dump the memory or firmware of my device, and how?
Could you suggest any software for this process? (Not the ST Utility or other ST software, please, because I would like to apply the experience gained to other devices as well.)
Thanks!

I take it the value you're looking for is hardcoded. In that case it resides in the internal flash. So yes, a memory dump will be necessary.
I will go the long way and assume that you know very little about how it works, so if you know some of this stuff, well, good for you. I will try to give a few pointers.
Specifically about STM32:
You have the option to boot the microcontroller from the so-called system memory, which is read-only memory already preprogrammed at the factory with a bootloader. You can talk to that running bootloader via UART (the most common way; it comes with ST-Link, but any cheapo USB-UART bridge also works) or via some other protocol. You can ask that bootloader to read its flash out to you, among other things. This is covered in AN2606.pdf. It has some useful links in it, such as:
the names of the documents where you can find the specific bootloader commands for any interface you wish. Of course, you only care about the interfaces that the bootloader of your specific MCU (the F429) supports, which are listed in the same AN2606, page 172 (for bootloader version 0.7; there is also 0.9 for those MCUs, and I have no idea how to tell which one you have, so... try? The UART configuration seems identical anyway).
So what exactly needs to be done? Flip the state of the BOOT0 pin - permanently - and reset the MCU (power cycle or reset pin, both are fine). This will boot the MCU into the bootloader instead of running the program from flash. You can read about it in the STM32F429 Reference manual, page 69, which describes the states of the BOOT0 and BOOT1 pins on boot. As for which pin is BOOT0 - if it's not marked on your board, you'll have to consult the F429 datasheet, page 69 (I swear, it's a coincidence). Depending on your specific IC, it will be one pin or another.
It will activate all the MCU's peripherals as per the docs above and wait on its UART and other pins for commands, which are listed in the documents I mentioned above. Let's take a look at AN3155, which covers the USART side of the bootloader.
The commands are all in that document; the table of contents in the PDF really helps to find things quickly. If you need specific details (and you will need specifics about particular commands: how many bytes are in a command, how many bytes at a time you can read from flash, etc.), it's all in there too. Basically, you can either write your own program that does this (you could even program another microcontroller to drive the victim's bootloader), or use any other software that knows what commands to send to the bootloader. It can be the ST utility, it can be any other program; they all implement the very same command set, so it doesn't actually matter much. I couldn't find many programs that do this; the only one that stood out was stm32flash. Never used it myself. I'm OK with the ST stuff, since I know what it does (I think).
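If you want to roll your own, here is a rough sketch of the USART bootloader conversation in C, under a few assumptions of mine (Linux host, bridge on /dev/ttyUSB0, a single fixed 256-byte read from the start of flash at 0x08000000, hardly any error handling). The 0x7F auto-baud byte, the command byte + XOR complement framing, the 0x79 ACK and the Read Memory (0x11) sequence come from AN3155; note that Read Memory is refused if readout protection is enabled:

/* Rough sketch of reading flash via the STM32 USART bootloader (AN3155).
 * Assumptions, not from the question: Linux host, adapter on /dev/ttyUSB0,
 * 8 data bits, even parity, 1 stop bit. Treat this as an illustration of
 * the protocol, not a finished tool. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

#define ACK 0x79

static int read_byte(int fd, uint8_t *b) { return read(fd, b, 1) == 1 ? 0 : -1; }

static int wait_ack(int fd)
{
    uint8_t b = 0;
    return (read_byte(fd, &b) == 0 && b == ACK) ? 0 : -1;
}

static int send_cmd(int fd, uint8_t cmd)
{
    uint8_t buf[2] = { cmd, (uint8_t)~cmd };   /* command byte + XOR complement */
    write(fd, buf, 2);
    return wait_ack(fd);
}

int main(void)
{
    int fd = open("/dev/ttyUSB0", O_RDWR | O_NOCTTY);
    if (fd < 0) { perror("open"); return 1; }

    struct termios tio;
    tcgetattr(fd, &tio);
    cfmakeraw(&tio);
    cfsetspeed(&tio, B57600);                  /* bootloader auto-bauds on 0x7F */
    tio.c_cflag |= PARENB | CLOCAL | CREAD;    /* even parity, per AN3155 */
    tio.c_cflag &= ~PARODD;
    tio.c_cc[VMIN] = 0;
    tio.c_cc[VTIME] = 10;                      /* 1 s read timeout */
    tcsetattr(fd, TCSANOW, &tio);

    uint8_t init = 0x7F;                       /* auto-baud byte, expect ACK back */
    write(fd, &init, 1);
    if (wait_ack(fd) < 0) { fprintf(stderr, "no ACK to 0x7F\n"); return 1; }

    /* Read Memory (0x11): ACKed only if readout protection is disabled. */
    if (send_cmd(fd, 0x11) < 0) { fprintf(stderr, "Read Memory refused (RDP?)\n"); return 1; }

    uint32_t addr = 0x08000000;                /* start of internal flash */
    uint8_t a[5];
    a[0] = addr >> 24; a[1] = addr >> 16; a[2] = addr >> 8; a[3] = addr;
    a[4] = a[0] ^ a[1] ^ a[2] ^ a[3];          /* XOR checksum of the address bytes */
    write(fd, a, 5);
    if (wait_ack(fd) < 0) return 1;

    uint8_t n[2] = { 0xFF, 0x00 };             /* count-1 = 255, plus its complement */
    write(fd, n, 2);
    if (wait_ack(fd) < 0) return 1;

    uint8_t data[256];
    for (int i = 0; i < 256; i++)
        if (read_byte(fd, &data[i]) < 0) return 1;

    fwrite(data, 1, sizeof data, stdout);      /* pipe into a file or hex viewer */
    close(fd);
    return 0;
}

Repeat with increasing addresses to dump the whole flash, or just let stm32flash do the same loop for you.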
Oh yeah, back to getting the secret value out; I almost forgot about that. Well, you open the dump in a hex viewer/editor and scroll around looking for interesting combinations of values. Yeah, that's kinda what it looks like. One can also run it through a disassembler: scroll the disassembled code around and see if any numeric values stand out. You know, some random number like 0xD35B581 hardcoded in the middle of an otherwise tidy program could mean something, like a serial number or a secret. Unfortunately, I'm reaching the boundaries of my competence here, so I won't go any further on what one can do with the dump.


Multiple boards: Windows Device Manager reports a resource conflict, error code 12

I inserted multiple identical boards into the system. The device's PCIe is implemented with a Xilinx IP core. After each FPGA is programmed, I manually refresh Device Manager to check whether the device and driver are working properly.
My confusion is that this approach seems to work with only two boards at the same time. After the third board is programmed and Device Manager is refreshed, the system reports insufficient resources: "This device cannot find enough free resources that it can use (Code 12)".
I tried to disable the other two boards, but the device still prompted conflicts. I don't know how to query conflicting resources.
My boards have 2 BARs (BAR0: 2 KB, BAR1: 16 MB) and 1 IRQ.
I did a few experiments and felt that it was caused by the conflict of memory resources. The conflicting party was the AMD integrated graphics card that came with the motherboard.
1st, 2nd and 3rd indicate my board numbers; the 3rd is not recognized because of conflicts.
After I shut the system down, I plugged in all three boards and turned it on again. During startup the system suddenly lost power and restarted, and after that all three boards were normal. At that point I found that the memory address of the graphics card had changed.
I want to know how to resolve the conflict. Should I modify my driver code or the FPGA configuration?
PCIe enumeration should resolve memory allocation issues; however, there are a couple of implementation issues to be aware of. Case in point: I have used Xilinx XDMAs with a 64-bit BAR of 2 GB in size and have literally bricked a Dell XPS motherboard. But I have done the same with an IBM system and it just worked. The point here is that enumeration can be a firmware-, hardware- or OS-driven event. If you're doing this through Device Manager, that sounds OS-driven, but when I toasted the XPS board, that was some kind of FW issue related to BAR size that resulted in a permanent failure. 16 MB isn't big and it should not be a problem, but I would recommend going with the Xilinx defaults first and showing reliability there. I think it's one BAR at 1 MB. I have run 3x 64-bit BARs at 1 MB apiece without issue, but keep it simple and show reliability, then move up. This will help isolate whether it is system flakiness or not.
I have seen some systems use FW-based enumeration that comes up really fast, before the FPGA has configured, in which case there is no PCIe target to ID. If you frequently find that your FPGA is not detected on power-up but is detected on a rescan, this can be a symptom. How to resolve this is a bit of a pain; we ended up using partial reconfiguration. Start with the PCIe interface, then have a reconfig load the remaining image. Let's hope it isn't this problem.
The next thing to be aware of is your reset mechanism within the FPGA. You probably hooked the PCIe IP reset to the bus reset, which is great; however, I have in the past also hooked that reset to internal PLL locked signals that may not be up. For troubleshooting purposes, keep that reset simple and get rid of everything else to show that the PCIe IP by itself is reliable first.
You have to be careful here too. If you strip things down, make sure it is clean. If you ignore the PLL lock and try to use a Xilinx driver, such as the XDMA driver, it has a routine where it tries to identify the XDMA with data transactions. It looks for the DMA BAR one BAR at a time, but when it does this, the transaction it attempts may go out on the AXI bus if the BAR isn't the XDMA control BAR. If the AXI bus isn't out of reset or clocked when this happens, you will lock up the AXI bus; I have locked up a Linux box this way on several occasions. AXI requires that transactions complete, otherwise it just sits there waiting.
BTW, on a Linux box, you can look at the enumeration output in the kernel log. I'm not sure if Windows shows you the same thing or not. But this can be helpful if you see that the device was initially probed but then something invalid was detected in the config register, versus not being seen at all.
So a couple things to look at.

Does simulating memory-mapped I/O using VMX require instruction decoding?

I am wondering how a hypervisor using Intel's VMX / VT technology would simulate memory-mapped I/O (so that the guest could think it was performing memory-mapped I/O against a device).
I think the basic principle would be to set up the EPT page tables in such a way that the memory addresses in question cause an EPT violation (i.e. a VM exit), by marking them as neither readable nor writable. The next question, however, is how to process the VM exit. Such a VM exit fills out all the exit qualification fields, including the guest-linear and guest-physical address, etc. But what I am missing in these exit qualification fields is some field indicating, in the case of a write instruction, the value that was being written and the size of the write. Likewise, for a read instruction it would be nice to have some bit fields indicating the destination of the read, say a register or a memory location (in the case of memory-to-memory string operations). This would make it very easy for the hypervisor to figure out what the guest was trying to do and then simulate the device behavior towards the guest.
But the trouble is, I can't find such fields among the exit qualifications. I can see an instruction pointer to where the faulting instruction is, so I could walk the page tables to read in the instruction and then decode it to understand the instruction, then simulate the I/O behavior. However, this requires the hypervisor to have a fairly complete picture of all x86 instructions, and be able to decode them. That seems to be quite a heavy burden on the hypervisor, and will also require it to stay current with later instruction additions. And the CPU should already have this information.
There's a chance that I am missing these relevant fields because the documentation is quite extensive, but I have tried to search carefully and have not been able to find them. Maybe someone can point me in the right direction OR confirm that the hypervisor will need to contain an instruction decoder.
I believe most VMs decode the instruction. It's not actually that hard, and most VMs have software emulators to fall back on when the CPU VM extensions aren't available or up to the task. You don't need to handle every instruction, just those that can take memory operands, and you can probably ignore everything that isn't a 1-, 2- or 4-byte memory operand, since you're not likely to be emulating device registers of other sizes. (For memory-mapped device buffers, like video memory, you don't want to be trapping every memory access because that's too slow, so you'll have to take a different approach.)
However, there is one way you can let the CPU do the work for you, but it's much slower than decoding the instruction yourself and it's not entirely perfect. You can single-step the instruction while temporarily mapping in a valid page of RAM. The VM exit will tell you the guest physical address accessed and whether it was a read or write. Unfortunately it doesn't reliably tell you whether it was a read-modify-write instruction; those may just set the write flag, and with some device registers that can make a difference. It might be easier to copy the instruction (it can only be at most 15 bytes, but watch out for page boundaries) and execute it in the host, but that requires that you can map the page to the same virtual address in the host as in the guest.
You could combine these techniques, decode the common instructions that are actually used to access memory mapped device registers, while using single stepping for the instructions you don't recognize.
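To make the decode-it-yourself approach concrete, here is a skeleton of what the EPT-violation handler might look like. The vmcs_read()/vmcs_write() wrappers, the decoder and the device_* callbacks are hypothetical placeholders you would have to supply; only the meaning of exit-qualification bits 0 and 1 (data read / data write) comes from the Intel SDM:

/* Hypothetical sketch of an EPT-violation handler for emulated MMIO.
 * vmcs_read()/vmcs_write(), decode_mmio_insn(), the device model and the
 * register accessors are placeholders, not a real API. */
#include <stdbool.h>
#include <stdint.h>

enum vmcs_field {                 /* symbolic stand-ins for the SDM encodings */
    EXIT_QUALIFICATION,
    GUEST_PHYSICAL_ADDRESS,
    GUEST_RIP,
};

uint64_t vmcs_read(enum vmcs_field f);
void     vmcs_write(enum vmcs_field f, uint64_t v);

/* What your software decoder has to recover from the instruction bytes. */
struct mmio_op {
    bool     is_write;
    uint8_t  size;        /* 1, 2 or 4 bytes for typical device registers */
    uint64_t value;       /* value to store (writes) */
    int      dst_reg;     /* destination register index (reads) */
    uint8_t  insn_len;    /* needed to advance RIP past the instruction */
};

int      decode_mmio_insn(uint64_t guest_rip, struct mmio_op *op);  /* your decoder */
uint64_t device_read(uint64_t gpa, uint8_t size);                   /* device model */
void     device_write(uint64_t gpa, uint8_t size, uint64_t val);
void     set_guest_reg(int reg, uint64_t value);

void handle_ept_violation(void)
{
    uint64_t qual = vmcs_read(EXIT_QUALIFICATION);
    uint64_t gpa  = vmcs_read(GUEST_PHYSICAL_ADDRESS);
    uint64_t rip  = vmcs_read(GUEST_RIP);
    struct mmio_op op;

    /* The CPU tells you *where* and *whether* it was a read or a write... */
    bool is_write = qual & 2;     /* bit 1: data write (bit 0: data read) */

    /* ...but not *what* was read or written, hence the decoding step. */
    if (decode_mmio_insn(rip, &op) != 0)
        return;                   /* unknown instruction: fall back to single-stepping */

    if (is_write)
        device_write(gpa, op.size, op.value);
    else
        set_guest_reg(op.dst_reg, device_read(gpa, op.size));

    vmcs_write(GUEST_RIP, rip + op.insn_len);   /* skip the emulated instruction */
}

The decoder is the only genuinely tedious part, and as noted above it only has to cover the handful of instructions the guest actually uses against device registers.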
Note that by choosing to write your own hypervisor you've put a heavy burden on yourself. Having to decode instructions in software is a pretty minor burden compared to the task of emulating an entire IBM PC compatible computer. The Intel virtualisation extensions aren't designed to make this easier, they're just designed to make it more efficient. It would be easier to write a pure software emulator that interpreted the instructions. Handling memory mapped I/O would be just a matter of dispatching the reads and writes to the correct function.
I don't know in detail how VT-x works, but I think I see a flaw in the way you wish it could work:
Remember that x86 is not a load/store machine. The load part of add [rdi], 2 doesn't have an architecturally-visible destination, so your proposed solution of telling the hypervisor where to find or put the data doesn't really work, unless there's some temporary location that isn't part of the guest's architectural state, used only for communication between the hypervisor and the VMX hardware.
To handle a read-modify-write instruction with a memory destination efficiently, the VM should do the whole thing with one VM exit. So you can't just provide separate load and store interfaces.
More importantly, handling atomic read-modify-writes is a special case. lock add [rdi], 2 can't just be done as a separate load and store.

Watchdog timeout during call to file.format?

This question is entirely unrelated to my code, but to satisfy the obligatory show your code directive:
file.format()
Before the call above returns, on this one SoC I always get a wdt reset. Sometimes, but not always, the flash does appear to be formatted when the chip is started again. And sometimes it freezes after the wdt reset message and has to be powered off (it looks like wrong comm parameters after pressing hardware reset, but none of the terminal app options seemed to match).
(Note: since starting this draft I built another copy of my device, using another new, recently received ESP8266-12E, and it behaves identically. Previously built copies still work normally, with the identical firmware.)
So this must be a bad chip, right? Or maybe bad on-board flash? It is a brand new one I just bought. I've also seen file.write issues, with buffer size always 255 bytes or less, though no read issues at all.
One other quirk, after burning a cloud-built nodemcu image to this ESP8266-12E device, adc.read returned 65535 and adc.readvdd33 returned an apparently valid value. (I corrected that by burning esp_init_data_default.bin to 0x3FC000.) This was the first (out of 15, maybe 20) I have seen that was like that. I did not check to see if an older version of nodemcu was already on it.
This wouldn't be the first chip with which I've had issues on arrival; it's at least the 2nd, likely the 3rd or 4th.
So maybe the larger question, what percentage of the ESP8266's that you buy, are either DOA or suffer infant mortality? (Not counting the ones that you have reason to believe were inadvertently killed.)
The problem can be something other than the ESPs, like an inadequate power supply. I know from my own experience that the Arduino Uno and most USB-TTL converters cannot safely deliver enough current to ESPs. If you're not already, consider using a dedicated power supply circuit that is connected to a USB power source.
It does indeed seem to be a hardware issue, 2 bad out of 6, not good! I think it might be a certain vendor but don't want to name names without being sure... Whatever is wrong with the chip hangs it up long enough to make the watchdog bark.
Much more than the cost of the part, the time spent figuring out whether it's the Lua code, firmware, supporting connections, peripherals or the chip itself is the costly thing (not to mention frustration, and wasted storage on SO).

Detecting bad sectors using Delphi or freepascal

Thanks to help from David Heffernan I have a program written in Freepascal (but a Delphi solution to my question would suffice) that reads a physical disk sector by sector. It does so using the Windows API CreateFileW function for the disk handle, then FileRead, FileSeek etc. to navigate and read. If all the sectors are OK, it works fine. However, if the disk has bad sectors, I need to treat them differently.
My question is, are there any procedures or libraries that can be used, while reading these sectors, to determine if they are bad? If not, how might I go about it? I gather it is the disk controller that knows which sectors are bad and which are not, so I don't think my program can actually access a bad sector; so how can I detect which are the bad ones and act accordingly? Does one need to query SMART, and if so, how?
I have searched this site (only found this C post, which relates to a program, not code) and Googled it and no obvious solutions came to my attention.
Generally speaking, you can't access bad sectors at all (they have already been remapped, so they are outside the LBA space). What you can access are pending sectors; attempts to read them will always cause a read error. SMART will tell you nothing but the number of bad/pending sectors. So you should probably continue using your chosen API, interpreting persistent read errors as the diagnostic for "bad" sectors; just make sure they aren't caused by an access sharing violation.
If you want to obtain a P-list or G-list somehow, it is only possible (for PATA/SATA, not SCSI) in terminal mode, which requires a connection to the HDD's service port via a USB-to-COM adapter and is vendor- and product-specific, if it is possible at all.
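For what it's worth, here is a minimal sketch of the "interpret persistent read errors" approach in C (the Win32 calls map one-to-one to Free Pascal/Delphi imports). The drive number, sector size and scan range are assumptions for illustration; real code should query the actual sector size (e.g. via IOCTL_DISK_GET_DRIVE_GEOMETRY_EX) and needs administrator rights to open a physical drive:

/* Sketch: read a physical drive sector by sector and flag unreadable ones.
 * Drive number and SECTOR_SIZE are assumptions; query the drive in real code. */
#include <windows.h>
#include <stdio.h>

#define SECTOR_SIZE 512

int main(void)
{
    HANDLE h = CreateFileW(L"\\\\.\\PhysicalDrive1", GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                           OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "open failed: %lu\n", GetLastError());
        return 1;
    }

    BYTE buf[SECTOR_SIZE];
    for (LONGLONG sector = 0; sector < 1000; sector++) {
        LARGE_INTEGER pos;
        pos.QuadPart = sector * SECTOR_SIZE;    /* physical reads must be sector-aligned */
        SetFilePointerEx(h, pos, NULL, FILE_BEGIN);

        DWORD got = 0;
        if (!ReadFile(h, buf, SECTOR_SIZE, &got, NULL) || got != SECTOR_SIZE) {
            DWORD err = GetLastError();
            /* ERROR_CRC and ERROR_SECTOR_NOT_FOUND are the usual symptoms of an
             * unreadable ("pending") sector; other errors, e.g. a sharing
             * violation, are not the disk's fault. */
            if (err == ERROR_CRC || err == ERROR_SECTOR_NOT_FOUND)
                printf("sector %lld looks bad (error %lu)\n", sector, err);
            else
                printf("sector %lld: other error %lu\n", sector, err);
        }
    }
    CloseHandle(h);
    return 0;
}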
Sectors and their hardware status are not things that normal user-level code needs to deal with so there is no easy copy/paste API available for this purpose.
Also, in general, the sector concept is abstracted away on multiple levels. For one example see Wikipedia: logical disk address translation. Physical sector status is a very low-level concept; some hardware vendors don't even expose it through a public API at all. Bad (or suspicious) sectors are often detected in the hardware itself and automatically redirected elsewhere. So, in general, the bad disk-sector concept does not exist.
MSDN Logging Guidelines
...Bad sectors. If a disk driver encounters a bad sector, it may be able to read from or write to the sector after retrying the operation, but the sector will go bad eventually. If the disk driver can proceed, it should log a Warning event; otherwise, it should log an Error event. If a file system driver finds a large number of bad sectors and fixes them, logging Warning events might help an administrator determine that the disk may be about to fail...
If you really need to work with these low-level concepts, then first forget about Pascal or Delphi as your requirement.
Learn how to use the Windows API and once you know it bind to the API in your language of choice (you can map any Win32 user-level API function to Free Pascal easily).
For understanding how user-level code sees the disk abstraction start reading documentation at MSDN → Dev Center - Desktop → Device Management Reference → Device Management Functions → DeviceIOControl function
For understanding how the kernel-level code sees the hardware and how does it communicate with user-level code start reading documentation at MSDN → Dev Center - Hardware → Develop → Drivers → Concepts for all driver developers
For example of reading S.M.A.R.T. disk information see WinSim Inc. DISKID32 source code function ReadPhysicalDriveInNTUsingSmart() in diskid32.cpp
In my opinion you are going to swim in dark & deep waters without a flashlight or a swim ring, and you should think twice about what you (or your users) really need/want, and perhaps improve the question to get a reasonably sized, on-topic answer.

How to log mallocs

This is a bit hypothetical and grossly simplified but...
Assume a program that will be calling functions written by third parties. These parties can be assumed to be non-hostile but can't be assumed to be "competent". Each function will take some arguments, have side effects and return a value. They have no state while they are not running.
The objective is to ensure they can't cause memory leaks by logging all mallocs (and the like) and then freeing everything after the function exits.
Is this possible? Is this practical?
p.s. The important part to me is ensuring that no allocations persist so ways to remove memory leaks without doing that are not useful to me.
You don't specify the operating system or environment, this answer assumes Linux, glibc, and C.
You can set __malloc_hook, __free_hook, and __realloc_hook to point to functions which will be called from malloc(), free(), and realloc() respectively. There is a __malloc_hook manpage showing the prototypes. You can track allocations in these hooks, then return to let glibc handle the memory allocation/deallocation.
It sounds like you want to free any live allocations when the third-party function returns. There are ways to have gcc automatically insert calls at every function entrance and exit using -finstrument-functions, but I think that would be inelegant for what you are trying to do. Can you have your own code call a function in your memory-tracking library after calling one of these third-party functions? You could then check if there are any allocations which the third-party function did not already free.
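A minimal sketch of the hook approach, close to the example in the __malloc_hook manpage (note that these hook variables are deprecated and were removed in glibc 2.34, so treat this as an illustration of the idea rather than something to rely on going forward):

/* Log every malloc/free via the (deprecated, pre-2.34) glibc hooks. */
#include <stdio.h>
#include <malloc.h>

static void *(*old_malloc_hook)(size_t, const void *);
static void (*old_free_hook)(void *, const void *);

static void *my_malloc_hook(size_t size, const void *caller)
{
    void *result;
    __malloc_hook = old_malloc_hook;          /* restore to avoid recursion */
    result = malloc(size);
    old_malloc_hook = __malloc_hook;
    fprintf(stderr, "malloc(%zu) from %p -> %p\n", size, caller, result);
    /* a real tracker would record `result` in a table here */
    __malloc_hook = my_malloc_hook;
    return result;
}

static void my_free_hook(void *ptr, const void *caller)
{
    __free_hook = old_free_hook;
    free(ptr);
    old_free_hook = __free_hook;
    fprintf(stderr, "free(%p) from %p\n", ptr, caller);
    /* ...and remove `ptr` from the table here */
    __free_hook = my_free_hook;
}

static void install_hooks(void)
{
    old_malloc_hook = __malloc_hook;
    old_free_hook = __free_hook;
    __malloc_hook = my_malloc_hook;
    __free_hook = my_free_hook;
}

int main(void)
{
    install_hooks();
    void *p = malloc(42);                     /* both calls show up on stderr */
    free(p);
    return 0;
}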
First, you have to provide the entrypoints for malloc() and free() and friends. Because this code is compiled already (right?) you can't depend on #define to redirect.
Then you can implement these in the obvious way and log that they came from a certain module by linking those routines to those modules.
The fastest way involves no logging at all. If the amount of memory they use is bounded, why not pre-allocate all the "heap" they'll ever need and write an allocator out of that? Then when it's done, free the entire "heap" and you're done! You could extend this idea to multiple heaps if it's more complex than that.
If you really do need to "log" and not make your own allocator, here's some ideas. One, use a hash table with pointers and internal chaining. Another would be to allocate extra space in front of every block and put your own structure there containing, say, an index into your "log table," then keep a free-list of log table entries (as a stack so getting a free one or putting a free one back is O(1)). This takes more memory but should be fast.
Is it practical? I think it is, so long as the speed-hit is acceptable.
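To make the "extra space in front of every block" idea concrete, here is a rough sketch. The names xmalloc/xfree/free_all_allocations are made up for illustration, and a doubly linked list stands in for the log table and free-list described above; the third-party code would have to be linked against these wrappers instead of the real malloc/free:

/* Sketch: prepend a header to every allocation and keep a list of live blocks. */
#include <stdlib.h>

struct alloc_header {
    struct alloc_header *prev, *next;
    size_t size;
};

static struct alloc_header *live_list;        /* all blocks not yet freed */

void *xmalloc(size_t size)
{
    struct alloc_header *h = malloc(sizeof *h + size);
    if (!h)
        return NULL;
    h->size = size;
    h->prev = NULL;
    h->next = live_list;                       /* push onto the live list */
    if (live_list)
        live_list->prev = h;
    live_list = h;
    return h + 1;                              /* caller sees memory after the header */
}

void xfree(void *p)
{
    if (!p)
        return;
    struct alloc_header *h = (struct alloc_header *)p - 1;
    if (h->prev) h->prev->next = h->next;      /* unlink from the live list */
    else         live_list = h->next;
    if (h->next) h->next->prev = h->prev;
    free(h);
}

/* Call this after the third-party function returns to reclaim everything it "leaked". */
void free_all_allocations(void)
{
    while (live_list) {
        struct alloc_header *h = live_list;
        live_list = h->next;
        free(h);
    }
}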
You could run the third party functions in a separate process and close the process when you are done using the library.
A better solution than attempting to log mallocs might be to sandbox the functions when you call them—give them access to a fixed segment of memory and then free that segment when the function is done running.
Unconfined, incompetent memory usage can be just as damaging as malicious code.
Can't you just force them to allocate all their memory on the stack? That way it would be guaranteed to be freed after the function exits.
In the past I wrote a software library in C that had a memory management subsystem that contained the ability to log allocations and frees, and to manually match each allocation and free. This was of some use when attempting to find memory leaks, but it was difficult and time consuming to use. The number of logs was overwhelming, and it took an extensive amount of time to understand the logs.
That being said, if your third party library has extensive allocations, it's more than likely impractical to track this via logging. If you're running in a Windows environment, I would suggest using a tool such as Purify[1] or BoundsChecker[2] that should be able to detect leaks in your third party libraries. The investment in the tool should pay for itself in time saved.
[1]: http://www-01.ibm.com/software/awdtools/purify/ Purify
[2]: http://www.compuware.com/products/devpartner/visualc.htm BoundsChecker
Since you're worried about memory leaks and talking about malloc/free, I assume you're in C. I'm also assuming based on your question that you do not have access to the source code of the third party library.
The only thing I can think of is to examine memory consumption of your app before & after the call, log error messages if they're different and convince the third party vendor to fix any leaks you find.
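As a rough illustration of the before/after comparison, here is a glibc-specific sketch using mallinfo2() (added in glibc 2.33; older versions have the similar but overflow-prone mallinfo()). The numbers are approximate, since allocators cache freed memory, so treat a difference as a hint rather than proof of a leak:

/* Compare heap usage before and after calling into the third-party code. */
#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

static void third_party_function(void)        /* stand-in that deliberately "leaks" */
{
    void *p = malloc(1024);
    (void)p;
}

int main(void)
{
    struct mallinfo2 before = mallinfo2();
    third_party_function();
    struct mallinfo2 after = mallinfo2();

    if (after.uordblks > before.uordblks)      /* uordblks: bytes currently allocated */
        fprintf(stderr, "possible leak: %zu bytes still allocated\n",
                after.uordblks - before.uordblks);
    return 0;
}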
If you have money to spare, then consider using Purify to track issues. It works wonders, and does not require source code or recompilation. There are also other debugging malloc libraries available that are cheaper. Electric Fence is one name I recall. That said, the debugging hooks mentioned by Denton Gentry seem interesting too.
If you're too poor for Purify, try Valgrind. It is a lot better than it was 6 years ago and a lot easier to dive into than Purify.
Microsoft Windows provides (use SUA if you need POSIX) quite possibly the most advanced heap (plus other APIs known to use the heap) infrastructure of any shipping OS today.
The __malloc() debug hooks and the associated CRT debug interfaces are nice for cases where you have the source code to the tests; however, they can often miss allocations made by standard libraries or other code which is linked in. This is expected, as they are the Visual Studio heap debugging infrastructure.
gflags is a very comprehensive and detailed set of debugging capabilities which has been included with Windows for many years, with advanced functionality for source and binary-only use cases (as it is the OS heap debugging infrastructure).
It can log full stack traces (resolving symbolic information in a post-process operation) of all heap users, for all heap-modifying entry points, serially if needed. It can also modify the heap with pathological cases which align allocations such that the page protection offered by the VM system is optimally assigned (i.e. allocate your requested heap block at the end of a page, so even a single-byte overflow is detected at the time of the overflow).
umdh is a tool which can help assess the status at various checkpoints; however, the data is continually accumulated during the execution of the target, so it is not a simple checkpointing debug stop in the traditional sense. Also, WARNING: last I checked, at least, the total size of the circular buffer which stores the stack information for each request is somewhat small (64k entries (entries + stack)), so you may need to dump rapidly for heavy heap users. There are other ways to access this data, but umdh is fairly simple.
NOTE: there are 2 modes:
MODE 1, umdh {-p:Process-id|-pn:ProcessName} [-f:Filename] [-g]
MODE 2, umdh [-d] {File1} [File2] [-f:Filename]
I do not know what insanity gripped the developer who chose to alternate between -p:foo argument specifiers and naked ordering of arguments, but it can get a little confusing.
The debugging SDK works with a number of other tools. memsnap is a tool which apparently focuses on memory leaks and such, but I have not used it; your mileage may vary.
Execute gflags with no arguments for the UI mode; +args and /args are different "modes" of use as well.
On Linux I've successfully used mtrace(3) to log allocations and freeings. Its usage is as simple as
Modify your program to call mtrace() when you need to begin tracing (e.g. at the top of main()),
Set environment variable MALLOC_TRACE to the file path where the trace should be saved and run the program.
After that the output file will contain something like this (excerpt from the middle to show a failed allocation):
# /usr/lib/tls/libnvidia-tls.so.390.116:[0xf44b795c] + 0x99e5e20 0x49
# /opt/gcc-7/lib/libstdc++.so.6:(_ZdlPv+0x18)[0xf6a80f78] - 0x99beba0
# /usr/lib/tls/libnvidia-tls.so.390.116:[0xf44b795c] + 0x9a23ec0 0x10
# /opt/gcc-7/lib/libstdc++.so.6:(_ZdlPv+0x18)[0xf6a80f78] - 0x9a23ec0
# /opt/Xorg/lib/video-libs/libGL.so.1:[0xf668ee49] + 0x99c67c0 0x8
# /opt/Xorg/lib/video-libs/libGL.so.1:[0xf668f14f] - 0x99c67c0
# /opt/Xorg/lib/video-libs/libGL.so.1:[0xf668ee49] + (nil) 0x30000000
# /lib/libc.so.6:[0xf677f8eb] + 0x99c21f0 0x158
# /lib/libc.so.6:(_IO_file_doallocate+0x91)[0xf677ee61] + 0xbfb00480 0x400
# /lib/libc.so.6:(_IO_setb+0x59)[0xf678d7f9] - 0xbfb00480
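For completeness, a minimal program that produces such a trace might look like this (the deliberate leak below is just for illustration):

/* Enable glibc's mtrace logging; the unfreed block shows up in the trace file. */
#include <mcheck.h>
#include <stdlib.h>

int main(void)
{
    mtrace();                    /* start logging; needs MALLOC_TRACE to be set */

    void *kept  = malloc(64);    /* never freed: reported as a leak */
    void *freed = malloc(128);
    free(freed);                 /* matched pair: not reported */
    (void)kept;

    muntrace();                  /* stop logging */
    return 0;
}

Run it along the lines of MALLOC_TRACE=./trace.log ./a.out, then post-process the log with glibc's mtrace script: mtrace ./a.out ./trace.log.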
