Finding and using memory offsets in an existing program?

Most game botting applications use a series of memory offsets they have found for that particular version of a game client to facilitate botting. They might have a memory offset for health, x/y position, etc. Every time the game releases an update the offsets for the various pieces of information the bot program uses must be re-found and updated as well.
I'm interested in writing a Solitaire bot as a pet project. If you look here, mmoglider (a commercial bot) has already accomplished this as a demo for their botting program (which normally is used to bot WoW): YouTube video of MMOGlider botting Vista Solitaire.
What is a common method of accurately locating various useful memory offsets? How might I go about locating the memory offset that points to the "deck" in the solitaire program and use that to determine what cards are on the stack? I know from experience with the glider guys that once they were able to locate the offsets for the deck itself they said that every card value for the entire deck was there.
So, does anyone have any experience with reverse engineering and pulling memory offsets out of existing programs? And once you have those offsets, how do you pull and read the values from that "deck" structure in memory?

Typically there are two approaches to such tasks. For simplicity, let us consider a game with an integer amount of "health" for the player.
The first is to manipulate the process memory while the program is running. This is good for finding known values. When you have 100 health in a game, search the memory space for 100 (most likely as an integer) and record every location where it is found. Then, when your health changes to 99, cross-search those same locations to see which have changed appropriately. Continue until you have narrowed down the precise location(s) of the health variable. In most modern games what you will actually find is a dynamically allocated memory address that is part of a struct. That struct will be referenced by a pointer within the program, so you then have to search the program memory for values that may be a pointer to the space near the health variable, and repeat the narrowing-down process over multiple game runs to establish the position of that pointer. This method is most useful for classic PC and console games, particularly any game where the memory space is small and easy to manipulate.
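For illustration, here is a minimal sketch of that narrowing scan on Windows, using VirtualQueryEx and ReadProcessMemory against a handle opened with OpenProcess(PROCESS_VM_READ | PROCESS_QUERY_INFORMATION, ...). The structure and function names are mine, not from any particular bot:

#include <windows.h>
#include <cstdint>
#include <cstring>
#include <vector>

// First pass: walk every committed, writable region and record the
// address of every aligned 4-byte int equal to `value`.
std::vector<uintptr_t> initial_scan(HANDLE proc, int value)
{
    std::vector<uintptr_t> hits;
    std::vector<char> buf;
    MEMORY_BASIC_INFORMATION mbi;
    uintptr_t addr = 0;
    while (VirtualQueryEx(proc, (LPCVOID)addr, &mbi, sizeof(mbi)) == sizeof(mbi)) {
        if (mbi.State == MEM_COMMIT && (mbi.Protect & (PAGE_READWRITE | PAGE_WRITECOPY))) {
            buf.resize(mbi.RegionSize);
            SIZE_T read = 0;
            if (ReadProcessMemory(proc, mbi.BaseAddress, buf.data(), mbi.RegionSize, &read)) {
                for (SIZE_T i = 0; i + sizeof(int) <= read; i += sizeof(int)) {
                    int v;
                    std::memcpy(&v, buf.data() + i, sizeof(v));
                    if (v == value)
                        hits.push_back((uintptr_t)mbi.BaseAddress + i);
                }
            }
        }
        addr = (uintptr_t)mbi.BaseAddress + mbi.RegionSize;  // next region
    }
    return hits;
}

// Narrowing pass: after the value changes in-game (100 -> 99), keep only
// the candidates that now hold the new value. Repeat until few remain.
std::vector<uintptr_t> narrow(HANDLE proc, const std::vector<uintptr_t>& candidates, int new_value)
{
    std::vector<uintptr_t> kept;
    for (uintptr_t a : candidates) {
        int v = 0;
        if (ReadProcessMemory(proc, (LPCVOID)a, &v, sizeof(v), nullptr) && v == new_value)
            kept.push_back(a);
    }
    return kept;
}

The same narrowing loop, run over candidate pointer values instead of candidate health values, is how you establish the position of the pointer described above.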
The second method requires you to disassemble the application binary (I use IDA Pro for this), then locate functions that are known to use the data that you want. For example, say you see "Health: 99" on the screen. Search the binary for the "Health: " string, then find references to it (you will likely find a call to sprintf or similar) and see what other memory locations those same functions reference; this will usually lead you to the "health" variable or the struct containing it. This method is most common in more modern games, with massive memory spaces and more advanced programming practices.


Wasm Hot Reloading Experiment: Debunking Assumptions, and How to specify where the data section is?

First, to avoid making this seem like an XY problem, I'd like to give some context (note that I am not using Emscripten):
I am trying to see if I can implement a form of hot reloading for Wasm programs written in C++, hosted on the web. To do this, I want to have a section of memory that I call my "world state" (to anyone who has watched Handmade Hero ( https://handmadehero.org/ ), this will be familiar):
struct State {
// put everything here
} state;
Typically for a full C++ program with a platform layer, you'd allocate this struct on the platform side and feed a pointer to that memory, through a function pointer, into the reloadable (DLL/dylib) part of the code. The reloadable code puts EVERYTHING into this persistent memory, so if the code needs to be recompiled and reloaded, all the state continues to exist, since the memory was allocated in the part of the program that wasn't reloaded. As far as I can tell, this is impossible in Wasm, though.
Firstly, is my assumption correct that I have to use WebAssembly.Memory? Or can I allocate a Uint8Array in JS and use that for my persistent state, separate from the program memory? If so, is that slower?
So this will work as long as I don't use a dynamic allocator like WASI, and instead use a push allocator I can control. (I think this because, suppose I use malloc to get memory addresses and reload--malloc's internal state will reload and think all the heap memory is available when it's not, so future allocations might clobber previous ones.)
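For what it's worth, a push (bump) allocator of that kind can keep its entire bookkeeping (a single offset) inside the persistent block, so nothing about the allocator goes stale across a reload. A minimal sketch, with illustrative names and sizes:

#include <cstddef>
#include <cstdint>

struct Arena {
    static constexpr std::size_t kCapacity = 1 << 20;  // 1 MiB, pick to taste
    std::size_t used;               // offset of the next free byte
    unsigned char bytes[kCapacity];
};

// Bump the offset forward; there is no free(), and therefore no hidden
// allocator state outside the arena that a reload could clobber.
void* arena_push(Arena* a, std::size_t size,
                 std::size_t align = alignof(std::max_align_t))
{
    std::size_t p = (a->used + align - 1) & ~(align - 1);
    if (p + size > Arena::kCapacity) return nullptr;    // arena exhausted
    a->used = p + size;
    return a->bytes + p;
}

Note this only fixes the allocator-state problem; pointers into the arena still break if the block itself moves, which is exactly the issue raised below.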
Upon reload, I can first copy the struct into a temporary buffer on the js side, reload, get the memory location of the struct from Wasm (I will require that it exists), and copy the saved memory from js back into position.
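For that copy to work, the module has to export the struct's location and size. A minimal sketch of those exports, assuming clang's export_name attribute for Wasm targets (the function names are illustrative):

struct State {
    // put everything here
} state;

extern "C" {

// JS reads get_state_size() bytes at get_state_ptr() from the old
// instance's memory, then writes them back at the new instance's
// get_state_ptr() after the reload.
__attribute__((export_name("get_state_ptr")))
State* get_state_ptr() { return &state; }

__attribute__((export_name("get_state_size")))
unsigned int get_state_size() { return sizeof(State); }

}

On the JS side, the save is then just a new Uint8Array(exports.memory.buffer, ptr, size) copied into a temporary, and the restore is the reverse.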
However, this falls apart if I use pointers, because if I change the program (which is the point), __data_end might change, which would offset all of the addresses! I checked the linker flags here https://lld.llvm.org/WebAssembly.html to see what I could control. I can specify that the stack comes before the data segment, but the heap would still come after that, which results in the same problem. I can also specify where the global data are located, but I believe that's not the data segment, so the variable-size data segment could still offset all of my addresses.
Here's a nice page that can help us visualize the Wasm memory: https://dassur.ma/things/c-to-webassembly/
Would anyone have any thoughts on how to achieve what I'd like? The only options I can think of are:
using memory outside the Wasm memory (possibly slower or impossible);
using only stack memory and no pointers (unrealistic unless I can auto-recalculate all pointer offsets after a recompile, which would be painful and bug-prone);
finding a way to make the data segment come after the stack and heap at a fixed address, which would guarantee that the stack and heap segments don't get offset when the data segment grows;
fixing the max size of the data segment, if that's possible.
The Wasm spec/documentation aren't really great when it comes to memory manipulation like this, so I'd appreciate some clarification about what's possible, too. Lastly, maybe I could use two Wasm modules (but wouldn't that sort of indirection be slow)? I might be missing something crucial related to the memory layout.
Please let me know if you need more details. I've done something like this before in C, as I mentioned, and it's a common rapid iteration game-dev technique. Basically I'm trying to recreate it in Wasm.
EDIT: Apparently you can call Wasm functions from another module directly. Firstly, how do you do it, and secondly, what would the performance characteristics be for accessing the memory of another module?
EDIT2: Maybe some form of dynamic linking if that's supported? https://webassembly.org/docs/dynamic-linking/
WebAssembly modules hold variable state in three distinct places:
Linear memory
Local variables associated with the execution stack
Global variables
Of these, only global variables and linear memory are accessible to the host environment, and potentially serialisable in order to cache them as you hot-reload your module. There is of course no way to directly access and store the current call-stack.
If I were looking to achieve this, I'd create my own state machine within WebAssembly, storing it at a known location in linear memory.
Wasm is organised into modules, and modules define four relevant kinds of entities: functions, memories, tables, globals. The code is in the functions, while the other three represent a module's state.
Now, the interesting thing is that all four of these entity kinds can be imported and exported. Moreover, all of them can be created externally to the module, e.g., by the JS API.
Consequently, a way to emulate code swapping is to set up your module such that all three pieces of state are created externally and imported into the module. That way, you can keep them alive externally and pass them to the upgraded module once available. (You also need to make sure that the upgraded module doesn't use data/element segments or start functions in a way that paves over preexisting state.)
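For linear memory specifically, a sketch of the module side with clang/wasm-ld (the flags are my reading of the lld options page linked above; verify against your toolchain version):

// state.cpp -- a module whose linear memory is imported from the host,
// so the same bytes survive swapping in a recompiled module.
// Build sketch:
//   clang++ --target=wasm32 -nostdlib -Wl,--no-entry \
//           -Wl,--import-memory -Wl,--export-all -o state.wasm state.cpp

struct State {
    int frame_count;
    // ... everything else ...
};
State state;

extern "C" State* get_state_ptr() { return &state; }

// Host side (JS, shown as comments to keep to one code language):
//   const memory = new WebAssembly.Memory({ initial: 32 });
//   const imports = { env: { memory } };
//   // Instantiate v1 with `imports`, run, then instantiate v2 with the
//   // SAME imports. No copying is needed, but v2's data segments are
//   // re-applied at instantiation -- the caveat mentioned above.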
Of course, this only works if the shape of the module's state does not change between upgrades: no new globals, no new data layout in memory; otherwise the new code won't understand the old state. That is actually the hard part of the problem, but it's independent of Wasm specifics.

Finding a specific memory location in game?

I'm using Cheat Engine to find where in the game's memory certain properties are stored, for example my player's health. Ultimately I want to write a program that knows where in memory to look, so that it can make decisions based on the current game state. I can, and have, found where in memory certain things are stored; the problem is that each time the game is opened, the memory locations change. What do I need to do so that my program can work around the changing memory locations?
The issue is that in your Cheat Engine table you're using hard-coded addresses for these variables. The variables are either dynamically allocated or allocated statically relative to the base address of a module. To fix this you can use pointers to the variables, where the pointers themselves are statically located or calculated at runtime using relative offsets from module base addresses. You would use "Find out what accesses" or the Pointer Scanner to find such pointers. You grab the dynamic base addresses of modules using the ToolHelp32 snapshot Windows API (CreateToolhelp32Snapshot). You can also use signature scanning to scan for an array of bytes that represents the instructions that access the variable at runtime, then grab the address from the instruction operands.
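A sketch of what that looks like in your own program: grab the module base via the ToolHelp32 snapshot API, then follow the pointer chain the Pointer Scanner produced. The offsets in a real bot come from your scan; everything below is illustrative, and it assumes your program matches the target's bitness:

#include <windows.h>
#include <tlhelp32.h>
#include <cstddef>
#include <cstdint>
#include <vector>

// Find the load address of e.g. L"game.exe" in the target process;
// ASLR moves it on every launch, so this must be done at run time.
uintptr_t module_base(DWORD pid, const wchar_t* name)
{
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPMODULE | TH32CS_SNAPMODULE32, pid);
    if (snap == INVALID_HANDLE_VALUE) return 0;
    MODULEENTRY32W me{ sizeof(me) };
    uintptr_t base = 0;
    if (Module32FirstW(snap, &me)) {
        do {
            if (lstrcmpiW(me.szModule, name) == 0) {
                base = (uintptr_t)me.modBaseAddr;
                break;
            }
        } while (Module32NextW(snap, &me));
    }
    CloseHandle(snap);
    return base;
}

// Follow a pointer chain: dereference at each hop, add the next offset.
// e.g. resolve_chain(proc, base, {0x1A2B0, 0x10, 0x54}) for a 3-level chain.
uintptr_t resolve_chain(HANDLE proc, uintptr_t base, const std::vector<uintptr_t>& offsets)
{
    uintptr_t addr = base;
    for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
        uintptr_t next = 0;
        if (!ReadProcessMemory(proc, (LPCVOID)(addr + offsets[i]), &next, sizeof(next), nullptr))
            return 0;
        addr = next;
    }
    return addr + offsets.back();
}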

Opening camera in multiple program in openCV

How can I open a single webcam in multiple OpenCV programs simultaneously? By the way, I have attached three webcams and all of them work fine in any single OpenCV program, but why can't two programs use them at the same time?
Is this a restriction or is there any workaround?
Yes, this restriction is intentional.
Why? The conceptual view is tied to the hardware control layer. The operating system assumes there are peripherals that can be used on demand, but it keeps their context-of-use non-shareable.
As an example, consider a USB mouse. While it could be used by several processes, some reasoning told the architects that one mouse should not feed events into more than one windowed context (yes, right, a.k.a. process).
Some other peripherals are instances of both an event sensor and an event collector; USB cameras, for example, can receive and process signals from the operating system to re-adjust their physical state (pan-tilt-zoom, for example).
Here the 1:1 relation assumption becomes all the more obvious, even though it is something we may sometimes wish were not hardcoded there. On the other hand, what would the poor device do if one process instructed it to go left and another to go right at the same time?
Similarly, who would be happy if one USB mouse sent its motion and other interaction events to all currently running processes? At the very least, drag-and-drop UI navigation would become a funny lottery.
Yes, there is a workaround
The simplest "just-enough" scenario includes aCameraControllerPROCESS, which keeps the 1:1 relation with the USB device and has the visual data-acquisition responsibility. Here nanoseconds matter, so do not spend any CPU_CLK on anything other than moving bytes into a buffer. All other processing should be left to aVisualDataViewConsumerPROCESS, where OpenCV may (and will) spend tens and hundreds of microseconds of its own.
The controller also has a communication / service-provisioning part, which allows other distributed processes to access the acquired visual data in parallel (in a non-blocking, concurrent-view manner, which is a must for distributed soft real-time processing).
If one's architecture requires more features or alternatives, this layered approach allows one to add them while keeping both the control and performance overheads within acceptable envelopes.
A good implementation may remain quite smart, fast, low-latency and slim (resource-wise) if one uses ZeroMQ with its zero-copy inproc: transport class, where one gives up only a little of the almost-zero latency as the cost of the trick, but gains the immense power of a robust, massively distributable design that can scale out beyond one, two, a dozen, thousands, ... you name it ... hosts for free.
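As a rough sketch of the acquisition side of such a design, using OpenCV's C++ API and the cppzmq bindings (the endpoint, the PUB/SUB pattern, and the raw-bytes message format are illustrative choices, not the only ones):

#include <opencv2/opencv.hpp>
#include <zmq.hpp>

int main()
{
    cv::VideoCapture cap(0);               // the ONE owner of the camera
    if (!cap.isOpened()) return 1;

    zmq::context_t ctx(1);
    zmq::socket_t pub(ctx, zmq::socket_type::pub);
    pub.bind("tcp://127.0.0.1:5555");      // consumers connect and subscribe ""

    cv::Mat frame;
    while (cap.read(frame)) {
        // Publish raw BGR bytes; a real design would prepend a small
        // header (rows, cols, type) so consumers can rebuild the Mat.
        zmq::message_t msg(frame.data, frame.total() * frame.elemSize());
        pub.send(msg, zmq::send_flags::none);  // PUB drops, never blocks on consumers
    }
}

Each consumer process connects a SUB socket, receives frames, and does its OpenCV processing without ever opening the camera device itself.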

Determining available video memory

When developing an OpenGL program, is there a way to poll from the system to find out just how many megabytes are available to store textures, etc?
Or is the standard approach these days just allocate memory and forget about everything?
Although the official stance remains "you don't need to know, you don't want to know, and it would not help you anyway", luckily at least two IHVs have shown a little more insight lately and offer extensions to query that information:
NVX_gpu_memory_info
ATI_meminfo
One nice thing about these extensions is that they have a least common denominator which is just what most people need, and you don't need to query extension support or do anything special, as they both work via glGetIntegerv.
In the easiest case, you can just initialize an array of 4 integers to zero (or some minimum default value that you'll assume in case the extensions don't work), then you call glGetIntegerv twice (with GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX and TEXTURE_FREE_MEMORY_ATI, respectively), and finally call glGetError to clear the error state. glGetIntegerv does not modify the pointed-to memory if it fails, nor does it crash or any other bad thing -- it merely sets the error state to GL_INVALID_ENUM.
Both extensions return a value in the first array position, the ATI one returns some values in the other 3 too.
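In code, the whole least-common-denominator query described above fits in a few lines. A minimal sketch, with the enum values taken from the two published extension specs (both report sizes in kilobytes):

#include <GL/gl.h>

#define GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX 0x9049
#define TEXTURE_FREE_MEMORY_ATI                      0x87FC

GLint query_free_vidmem_kb(GLint fallback_kb)
{
    GLint info[4] = { fallback_kb, 0, 0, 0 };

    // On unsupported drivers these calls leave `info` untouched and only
    // raise GL_INVALID_ENUM, which we clear afterwards.
    glGetIntegerv(GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, info);
    glGetIntegerv(TEXTURE_FREE_MEMORY_ATI, info);
    while (glGetError() != GL_NO_ERROR) { /* clear the error state */ }

    return info[0];   // both extensions put free memory in element 0
}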
n.b.: glAreTexturesResident has not been supported for almost a decade on mainstream hardware, in the same manner as texture priorities. The common mantra is that the driver writer knows much better than you anyway.
OpenGL doesn't give you this information. And frankly, there's little benefit, simply because today we have multitasking operating systems. The OpenGL driver is responsible for swapping texture data to/from system memory if there's demand for it.
What OpenGL can do for you is tell you whether the textures you've uploaded are still resident in fast memory. The function is called glAreTexturesResident. You can use this to gradually upload stuff to the GPU until you've filled up the GPU's memory. But keep in mind that you're not the only user of the GPU.

How to get the root cause of a memory corruption in an embedded environment?

I have detected memory corruption in my embedded environment (my program is running on a set-top box with a proprietary OS), but I couldn't get to the root cause of it.
The memory corruption itself is detected after a stress test of launching and exiting an application multiple times. Given that I couldn't set a memory breakpoint (the corrupted variable changes its address every time the application is launched), is there any way to catch the root cause of this corruption?
(A memory breakpoint is a breakpoint triggered when the environment changes the value of a given memory address.)
Note also that all my software is developed in C.
Thanks for your help.
These are always difficult problems on embedded systems and there is no easy answer. Some tips:
Look at the value the memory gets corrupted with. This can give a clear hint.
Look at the data structures next to your memory corruption.
See if there is a pattern in the memory corruption. Is it always at a similar address?
See if you can set up the memory breakpoint at run-time.
Does the embedded system allow memory areas to be sandboxed? Set up sandboxes to safeguard your data memory.
Good luck!
Where is the data stored and how is it accessed by the two processes involved?
If the structure was allocated off the heap, try allocating a much larger block and putting large guard areas before and after the structure. This should give you an idea of whether it is one of the surrounding heap allocations which has overrun into the same allocation as your structure. If you find that the memory surrounding your structure is untouched, and only the structure itself is corrupted then this indicates that the corruption is being caused by something which has some knowledge of your structure's location rather than a random memory stomp.
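A minimal sketch of such a guarded allocation, written in the C-compatible style of the question (the pattern value, guard size, and stand-in struct are arbitrary):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTES 256
#define GUARD_FILL  0xA5

typedef struct { int data[64]; } Target;   /* stand-in for the real struct */

static unsigned char* block;               /* layout: guard | Target | guard */

Target* guarded_alloc(void)
{
    block = (unsigned char*)malloc(GUARD_BYTES + sizeof(Target) + GUARD_BYTES);
    if (!block) return NULL;
    memset(block, GUARD_FILL, GUARD_BYTES);
    memset(block + GUARD_BYTES + sizeof(Target), GUARD_FILL, GUARD_BYTES);
    return (Target*)(block + GUARD_BYTES);
}

/* Call from strategic points (timers, task-switch hooks, ...). A dirty
   guard means a neighbouring overrun; clean guards with a corrupted
   Target point at code that knows the structure's address. */
int guards_intact(void)
{
    for (size_t i = 0; i < GUARD_BYTES; ++i) {
        if (block[i] != GUARD_FILL) return 0;
        if (block[GUARD_BYTES + sizeof(Target) + i] != GUARD_FILL) return 0;
    }
    return 1;
}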
If the structure is in a data section, check your linker map output to determine what other data exists in the vicinity of your structure. Check whether those have also been corrupted, introduce guard areas, and check whether the problem follows the structure if you force it to move to a different location. Again this indicates whether the corruption is caused by something with knowledge of your structure's location.
You can also test this by switching data from the heap into a data section or vice versa.
If you find that the structure is no longer corrupted after moving it elsewhere or introducing guard areas, you should check the linker map or track the heap to determine what other data is in the vicinity, and check accesses to those areas for buffer overflows.
You may find, though, that the problem does follow the structure wherever it is located. If this is the case then audit all of the code surrounding references to the structure. Check the contents before and after every access.
To check whether the corruption is being caused by another process or interrupt handler, add hooks to each task switch and before and after each ISR is called. The hook should check whether the contents have been corrupted. If they have, you will be able to identify which process or ISR was responsible.
If the structure is ever read onto a local process stack, try increasing the process stack and check that no array overruns etc have occurred. Even if not read onto the stack, it's likely that you will have a pointer to it on the stack at some point. Check all sub-functions called in the vicinity for stack issues or similar that could result in the pointer being used erroneously by unrelated blocks of code.
Also consider whether the compiler or RTOS may be at fault. Try turning off compiler optimisation, and failing that inspect the code generated. Similarly consider whether it could be due to a faulty context switch in your proprietary RTOS.
Finally, if you are sharing the memory with another hardware device or CPU and you have data cache enabled, make sure you take care of this through using uncached accesses or similar strategies.
Yes these problems can be tough to track down with a debugger.
A few ideas:
Do regular code reviews (not fast at tracking down a specific bug, but valuable for catching such problems in general)
Comment-out or #if 0 out sections of code, then run the cut-down application. Try commenting-out different sections to try to narrow down in which section of the code the bug occurs.
If your architecture allows you to easily disable certain processes/tasks from running, by the process of elimination perhaps you can narrow down which process is causing the bug.
If your OS is cooperatively multitasking, e.g. round robin (this would be too hard, I think, for preemptive multitasking): add code to the end of the task that "owns" the structure to save a "check" of the structure. That check could be a memcpy (if you have the time and space) or a CRC. Then, after every other task runs, add some code to verify the structure against the saved check. This will detect any changes; see the sketch below.
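A sketch of that last idea, with a simple checksum standing in for a proper CRC (the hook names and the stand-in struct are illustrative; where the hooks are called from depends on your scheduler):

#include <stddef.h>
#include <stdint.h>

typedef struct { int data[64]; } SharedState;  /* stand-in for the real struct */
SharedState g_state;

static uint32_t saved_sum;

static uint32_t checksum(const void* p, size_t n)
{
    const unsigned char* b = (const unsigned char*)p;
    uint32_t s = 0;
    while (n--) s = s * 31 + *b++;
    return s;
}

void owner_task_done_hook(void)   /* run at the end of the owning task */
{
    saved_sum = checksum(&g_state, sizeof(g_state));
}

void after_other_task_hook(void)  /* run after each other task */
{
    if (checksum(&g_state, sizeof(g_state)) != saved_sum) {
        /* the task that just ran corrupted the structure:
           log its ID, dump state, or park on a breakpoint here */
    }
}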
I'm assuming by your question you mean that you suspect some part of the proprietary code is causing the problem.
I have dealt with a similar issue in the past using what a colleague so tastefully calls a "suicide note". I would allocate a buffer capable of storing a number of copies of the structure that is being corrupted. I would use this buffer like a circular list, storing a copy of the current state of the structure at regular intervals. If corruption was detected, the "suicide note" would be dumped to a file or to serial output. This would give me a good picture of what was changed and how, and by increasing the logging frequency I was able to narrow down the corrupting action.
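A minimal sketch of such a ring of snapshots (the slot count and stand-in struct are illustrative):

#include <stdio.h>
#include <stddef.h>

typedef struct { int data[64]; } SharedState;  /* the watched structure */
SharedState g_state;

#define NOTE_SLOTS 16
static SharedState note[NOTE_SLOTS];           /* the "suicide note" ring */
static unsigned note_head;

void note_record(void)           /* call at regular intervals */
{
    note[note_head % NOTE_SLOTS] = g_state;
    ++note_head;
}

void note_dump(void)             /* call when corruption is detected */
{
    for (unsigned i = 0; i < NOTE_SLOTS && i < note_head; ++i) {
        const unsigned char* b =
            (const unsigned char*)&note[(note_head - 1 - i) % NOTE_SLOTS];
        printf("snapshot -%u:\n", i);
        for (size_t j = 0; j < sizeof(SharedState); ++j)   /* hex dump */
            printf("%02x%c", b[j], (j % 16 == 15) ? '\n' : ' ');
        printf("\n");
    }
}

Increasing how often note_record is called narrows the window in which the corrupting write must have happened.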
Depending on your OS, you may be able to react to detected corruption by looking at all running processes and seeing which ones are currently holding a semaphore (you are using some kind of access control mechanism with shared memory, right?). By taking snapshots of this data too, you perhaps can log the culprit grabbing the lock before corrupting your data. Along the same lines, try holding the lock to the shared memory region for an absurd length of time and see if the offending program complains. Sometimes they will give an error message that has important information that can help your investigation (for example, line numbers, function names, or code offsets for the offending program).
If you feel up to doing a little linker kung fu, you can most likely specify the address of any statically-allocated data with respect to the program's starting address. This might give you a consistent-enough memory address to set a memory breakpoint.
Unfortunately, this sort of problem is not easy to debug, especially if you don't have the source for one or more of the programs involved. If you can get enough information to understand just how your data is being corrupted, you may be able to adjust your structure to anticipate and expect the corruption (sometimes needed when working with code that doesn't fully comply with a specification or a standard).
You detect memory corruption. Could you be more specific about how? Is it a crash with a core dump, for example?
Normally the OS will completely free all resources and handles your program has when the program exits, gracefully or otherwise. Even proprietary OSes manage to get this right, although it's not a given.
So an intermittent problem could seem to be triggered after stress but just be chance, or could be in the initialisation of drivers or other processes the program communicates with, or could be bad error handling around say memory allocations that fail when the OS itself is under stress e.g. lazy tidying up of the closed programs.
Printf calls in custom malloc/realloc/free proxy functions, or even an Electric Fence-style custom allocator, might help if it's as simple as a buffer overflow.
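A sketch of such proxy functions (the macro trick is one common way to wire them in; it must come after the real allocator declarations):

#include <stdio.h>
#include <stdlib.h>

void* dbg_malloc(size_t n, const char* file, int line)
{
    void* p = malloc(n);
    printf("malloc(%zu) = %p at %s:%d\n", n, p, file, line);
    return p;
}

void dbg_free(void* p, const char* file, int line)
{
    printf("free(%p) at %s:%d\n", p, file, line);
    free(p);
}

/* In a project-wide header, included after the system headers: */
#define malloc(n) dbg_malloc((n), __FILE__, __LINE__)
#define free(p)   dbg_free((p), __FILE__, __LINE__)

The log then shows which allocation and free pattern immediately precedes the corruption.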
Use memory-allocation debugging tools like ElectricFence, dmalloc, etc - at minimum they can catch simple errors and most moderately-complex ones (overruns, underruns, even in some cases write (or read) after free), etc. My personal favorite is dmalloc.
A proprietary OS might limit your options a bit. One thing you might be able to do is run the problem code on a desktop machine (assuming you can stub out the hardware-specific code), and use the more-sophisticated tools available there (i.e. guardmalloc, electric fence).
The C library that you're using may include some routines for detecting heap corruption (glibc does, for instance). Turn those on, along with whatever tracing facilities you have, so you can see what was happening when the heap was corrupted.
First, I am assuming you are on a bare-metal chip that isn't running Linux or some other POSIX-capable OS (if you are, there are much better techniques available, such as Valgrind and ASan).
Here's a couple tips for tracking down embedded memory corruption:
Use JTAG or similar to set a memory watchpoint on the area of memory that is being corrupted. You might be able to catch the moment that memory is accidentally written, as opposed to a correct write. Many JTAG debuggers include IDE plugins that let you capture stack traces as well.
In your hard-fault handler, try to generate a call stack that you can print, so you get a rough idea of where the code is crashing. Note that since memory corruption can occur some time before the crash actually happens, the stack traces you get are unlikely to be helpful on their own, but combined with the techniques below they will help. Generating a backtrace on bare metal can be a very difficult task, though; if you happen to be using a Cortex-M line processor, check out https://github.com/armink/CmBacktrace, or try searching the web for advice on generating a back/stack trace for your particular chip.
If your compiler supports it, use stack canaries to detect writes over the stack and crash immediately when one occurs; for details, search the web for "Stack Protector" for GCC or Clang.
If you are running on a chip that has an MPU, such as an ARM Cortex-M3, you can use the MPU to write-protect the region of memory that is being corrupted, or a small region right before it. This will cause the chip to fault at the moment of the corruption rather than much later; see the sketch after this list.
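As a sketch of that MPU trick on an ARMv7-M part using CMSIS register names (the device header, region number, and size are assumptions to adapt; MPU regions must be power-of-two sized and size-aligned):

#include "stm32f4xx.h"   /* assumption: any CMSIS device header exposing MPU and SCB */

/* Make `addr` read-only for everyone; log2_size = 8 protects 256 bytes. */
void protect_region(uint32_t addr, uint32_t log2_size)
{
    __DMB();
    MPU->CTRL = 0;                                /* disable while configuring */
    MPU->RNR  = 0;                                /* use region 0 */
    MPU->RBAR = addr & MPU_RBAR_ADDR_Msk;         /* region base address */
    MPU->RASR = (0x6u << MPU_RASR_AP_Pos)         /* AP=0b110: read-only for all */
              | ((log2_size - 1u) << MPU_RASR_SIZE_Pos)
              | MPU_RASR_ENABLE_Msk;
    MPU->CTRL = MPU_CTRL_PRIVDEFENA_Msk           /* default memory map elsewhere */
              | MPU_CTRL_ENABLE_Msk;
    __DSB(); __ISB();

    SCB->SHCSR |= SCB_SHCSR_MEMFAULTENA_Msk;      /* route to MemManage_Handler */
}

Any write into the region now faults at the exact corrupting instruction instead of failing silently.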
