I have a weird issue. Lua 5.3.5, compiled for an STM32F429. Free RAM is about 1 MB (memory allocation uses external SDRAM, not the more limited internal RAM on the STM32). Note that working with things like strings is fine as well; only division seems to cause the problem.
This script works:
a=100
b=20
c=a+b
print(c)
This script returns "memory allocation error: block too big:"
a=100
b=20
c=a/b
print(c)
Further research shows that the problem is not with the division at all. It is with tostring(), which is called by print(). For some reason, tostring() tries to allocate far too much memory when dealing with the result of the division.
In lstring.c, luaS_newlstr() contains the following check:
if (l >= (MAX_SIZE - sizeof(TString))/sizeof(char))
  luaM_toobig(L);
When the issue occurs, l == 0xd0600f56 (interestingly, that is an address in the range of the external SDRAM rather than a valid string length).
If I modify the Lua script to do the following, it works fine:
a=100
b=20
c=math.floor(a/b)
print(c)
I checked, and in both cases c is of type number.
As for the question regarding memory allocation, we are using the dlmalloc library, configured like this during Lua startup:
ezCmdLua = lua_newstate(ezlua_poolalloc, NULL);
int error = luaL_loadbuffer(ezCmdLua, bfr, len, "ezCmdLua");
if (!error)
{
    error = lua_pcall(ezCmdLua, 0, 0, 0);
    if (error) {
        ...
    }
}
....
static void *ezlua_poolalloc (void *ud, void *ptr, size_t osize, size_t nsize) {
    (void)ud; (void)osize;  /* not used */
    if (nsize == 0) {
        dlfree(ptr);
        return NULL;
    }
    else
        return dlrealloc(ptr, nsize);
}
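As a debugging aid (a sketch, not something already in the code above), a guard can be added in the allocator so the debugger halts on the first implausible request; the 1 MB threshold below is arbitrary and simply matches the size of the pool:

static void *ezlua_poolalloc (void *ud, void *ptr, size_t osize, size_t nsize) {
    (void)ud; (void)osize;
    /* Debugging aid: any request larger than the whole pool is bogus,
       so stop here and inspect the backtrace to see who asked for it. */
    if (nsize > 1024u * 1024u)
        __builtin_trap();
    if (nsize == 0) {
        dlfree(ptr);
        return NULL;
    }
    return dlrealloc(ptr, nsize);
}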
I have confirmed that memory allocation is working properly, and I can do things like string manipulation and printing of strings with no problem at all. In fact, when debugging this issue, the luaS_newlstr() function is called several times prior to the issue occurring, and each time l (the length of the string) is a reasonable value. That is, until I try to print the result of the division. Moving the division around in the script (i.e., adding things before it, like other print statements) makes no difference, so I doubt the stack is being trashed.
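For reference while digging further: in Lua 5.3, a/b always produces a float (math.floor(a/b) produces an integer), and a float reaches luaS_newlstr() through luaO_tostring() in lobject.c, which formats the value with the C library's snprintf. Abridged from the 5.3 sources (the comment at the end is a hypothesis to check, not a confirmed diagnosis):

/* lobject.c (Lua 5.3, abridged): how a number becomes a string for print() */
void luaO_tostring (lua_State *L, StkId obj) {
  char buff[MAXNUMBER2STR];
  size_t len;
  if (ttisinteger(obj))
    len = lua_integer2str(buff, sizeof(buff), ivalue(obj));   /* math.floor(a/b) path */
  else
    len = lua_number2str(buff, sizeof(buff), fltvalue(obj));  /* a/b path: float */
  /* ... */
  setsvalue2s(L, obj, luaS_newlstr(L, buff, len));  /* len is the value that blows up */
}

/* luaconf.h: lua_number2str is a thin wrapper around snprintf */
#define lua_number2str(s,sz,n)  l_sprintf(s, sz, LUA_NUMBER_FMT, (LUAI_UACNUMBER)(n))
#define l_sprintf(s,sz,f,i)     snprintf(s, sz, f, i)   /* when LUA_USE_C89 is not set */

/* Hypothesis worth checking on this toolchain: if the target libc's snprintf
   cannot format floats (e.g. newlib-nano built without float support), its
   return value can be garbage, and that garbage becomes the huge 'len' that
   luaS_newlstr() rejects. The integer path avoids float formatting entirely. */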
In C, before using the scanf or gets functions from stdio.h to read and store user input, the programmer has to manually allocate memory in which to store the data that is read. In Rust, the std::io::Stdin.read_line function can seemingly be used without the programmer having to allocate memory beforehand. All it needs is a mutable String variable to store the data it reads. How does it do this, seemingly without knowing how much memory will be required?
Well, if you want a detailed explanation, you can dig a bit into the read_line method, which is part of the BufRead trait. Heavily simplified, the function looks like this.
fn read_line(&mut self, target: &mut String) {
    loop {
        // That method fills the internal buffer of the reader (here stdin)
        // and returns a slice reference to whatever part of the buffer was filled.
        // That buffer is actually what you need to allocate in advance in C.
        let available = self.fill_buf();
        match memchr(b'\n', available) {
            Some(i) => {
                // A '\n' was found, we can extend the string and return.
                target.push_str(&available[..=i]);
                return;
            }
            None => {
                // No '\n' found, we just have to extend the string.
                target.push_str(available);
            },
        }
    }
}
So basically, that method extends the string as long as it does not find a \n character in stdin.
If you want to allocate a bit of memory in advance for the String that you pass to read_line, you can create it with String::with_capacity. This will not prevent the String from reallocating if it turns out not to be large enough, though.
I am working with embedded C and recently ran the MathWorks Polyspace Code Prover (static analysis) on the whole project to check for critical run-time errors. It found one bug (red warning) at a while loop where I am copying some ROM data into RAM via memory registers.
The code works as expected, but I would like to ask if there is any way to safely remove this warning. Please find the code example below:
register int32 const *source;
uint32 i=0;
uint32 *dest;
source= (int32*)&ADDR_SWR4_BEGIN;
dest = (uint32*)&ADDR_ARAM_BEGIN;
if ( source != NULL )
{
    while ( i < 2048 )
    {
        dest[i] = (uint32)source[i];
        i++;
    }
}
My guess is that ADDR_SWR4_BEGIN and ADDR_ARAM_BEGIN are defined in the linker script, and Polyspace didn't compile and link the project, which is why it is complaining about a possible run-time error or infinite loop.
ADDR_SWR4_BEGIN and ADDR_ARAM_BEGIN are defined as extern in the respective header file.
extern uint32_t ADDR_SWR4_BEGIN;
extern uint32_t ADDR_ARAM_BEGIN;
The warning is red, and the exact warning is as follows:
Check: Non-terminating Loop
Detail: The Loop is infinite or contains a run-time error
Severity: Unset
Any suggestions would be appreciated.
The code is overall quite fishy.
Bugs
if ( source != NULL ). You just set this pointer to point at an address, so it will obviously not be NULL. This line is superfluous.
You aren't using volatile when accessing registers/memory, so if this code is executed multiple times, the compiler might make all kinds of strange assumptions. This might be the cause of the diagnostic message; a volatile-qualified sketch follows after the nit-picks below.
Bad style/code smell (should be fixed)
Using the register keyword is fishy. It was a thing in the 1980s, when compilers were poor at allocating registers; nowadays they do this far better than the programmer, so any presence of register in new source code is a code smell.
Accessing a register or memory location as int32 and then casting it to an unsigned type doesn't make any sense. If the data isn't signed, why use a signed type in the first place?
Using home-brewed uint32 types instead of stdint.h is poor style.
Nit-picks (minor remarks)
The (int32*) cast should be const qualified.
The loop is needlessly ugly, could be replaced with a for loop:
for(uint32_t i=0; i<2048; i++)
{
    dest[i] = source[i];
}
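On the volatile point above, a minimal sketch of what the copy could look like with volatile-qualified accesses (the symbol names are taken from the question and the function name is made up; whether this actually silences the Polyspace check is not verified here):

#include <stdint.h>

extern uint32_t ADDR_SWR4_BEGIN;
extern uint32_t ADDR_ARAM_BEGIN;

void copy_rom_to_ram(void)
{
    /* volatile so the compiler cannot cache, reorder, or optimize away
       these memory accesses */
    volatile const uint32_t *source = (volatile const uint32_t *)&ADDR_SWR4_BEGIN;
    volatile uint32_t *dest = (volatile uint32_t *)&ADDR_ARAM_BEGIN;

    for (uint32_t i = 0; i < 2048u; i++)
    {
        dest[i] = source[i];
    }
}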
If PolySpace does not know the value of ADDR_ARAM_BEGIN, it will assume it could be NULL (or any other value valid for its type). While you explicitly test for source being NULL, you do not do the same for dest.
Since both source and dest are assigned from linker constants, and in normal circumstances neither should be NULL, it is unnecessary to test for NULL explicitly in the control flow; an assert() would be preferable. PolySpace recognises assertions and will apply the constraint in subsequent analysis, but assert() resolves to nothing when NDEBUG is defined (normally in release builds), so it does not impose unnecessary overhead:
const uint32_t* source = (const uint32_t*)&ADDR_SWR4_BEGIN ;
uint32_t* dest = (uint32_t*)&ADDR_ARAM_BEGIN;

// PolySpace constraints asserted
assert( source != NULL ) ;
assert( dest != NULL ) ;

for( int i = 0; i < 2048; i++ )
{
    dest[i] = source[i] ;
}
An alternative is to provide PolySpace with a "forced include" (the -include option) containing explicit definitions, so that PolySpace will not consider all possible values to be valid in its analysis. That will probably also have the effect of speeding up the analysis.
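As an illustration only, such a forced-include stub might look like the following; the file name is made up, and it is fed only to the analysis run (via -include), never compiled into the real firmware, so that the analysis sees concrete definitions for the linker-defined symbols:

/* polyspace_stubs.h - hypothetical analysis-only stub */
#include <stdint.h>

uint32_t ADDR_SWR4_BEGIN;
uint32_t ADDR_ARAM_BEGIN;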
The reason why Polyspace gives a red error here is that source and dest are pointers to a single uint32. Indeed, when you write:
source= (int32*)&ADDR_SWR4_BEGIN
you take the address of the variable ADDR_SWR4_BEGIN and assign it to source.
Hence both pointers are pointing to a buffer of 4 bytes only.
It is then not possible to use these pointers like arrays of 2048 elements.
You should also see an orange check on source[i] giving you information on what's happening with the pointer source.
It seems that ADDR_SWR4_BEGIN and ADDR_ARAM_BEGIN actually contain addresses.
And in this case, the code should be:
source = (uint32*)ADDR_SWR4_BEGIN;
dest = (uint32*)ADDR_ARAM_BEGIN;
If you do this change in the code, the red error disappears.
Is it possible to start a process in Windows with exactly the same address structure as a previous run of the process?
To clarify the goal of this question, I should mention that I use Cheat Engine (http://www.cheatengine.org/) to cheat in some games! It involves several scan iterations to find a parameter (e.g. ammunition) and freeze it. However, each time I restart the game, the memory structure of the game changes, so I need to go through the time-consuming iterations again. If there were a method to bring up the game with exactly the same memory structure as before, I wouldn't need to go through the iterations.
Not to say it's impossible, but this is essentially too much work due to the dynamic memory allocation routines the process will be using, including the new operator and malloc(). Additionally, when the DLLs imported by the executable are loaded into memory, they have a preferred image base, but if that address is already in use, the OS will load them at a different memory location. On top of that, Address Space Layout Randomization (ASLR) can be enabled on the process, a security measure that randomizes the memory addresses of code sections.
The solution to your problem is much easier than what you're asking for. To defeat the dynamic memory allocation described above, you can still resolve the correct address of a variable by utilizing:
Relative offsets from module bases
Multi-level pointers
Pattern Scanning
Cheat Engine has all 3 of these built into it. When you save an address to your table, it often saves it as a module base plus a relative offset. You can pointer scan for the address and save it as a multi-level pointer, or reverse the pointer yourself and manually place it in the table. Pattern scanning is achieved by using a CE script, which you can put right in the cheat table (a simple external pattern-scan sketch appears further below).
In this case the ammo variable may be at a "static address", which means it's at a fixed offset relative to the base address of the module. You may see it listed in Cheat Engine as "client.dll + 0xDEADCODE". You simply get the base address of the module at runtime and add the relative offset.
If you're looking to make an external hack in C++ you can get started like this.
In an external hack you do this by walking a ToolHelp32Snapshot:
uintptr_t GetModuleBase(const wchar_t * ModuleName, DWORD ProcessId) {
    // This structure contains lots of goodies about a module
    MODULEENTRY32 ModuleEntry = { 0 };
    // Grab a snapshot of all the modules in the specified process
    HANDLE SnapShot = CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, ProcessId);
    if (SnapShot == INVALID_HANDLE_VALUE)
        return 0;
    // You have to initialize the size, otherwise it will not work
    ModuleEntry.dwSize = sizeof(ModuleEntry);
    // Get the first module in the process
    if (!Module32First(SnapShot, &ModuleEntry)) {
        CloseHandle(SnapShot);
        return 0;
    }
    do {
        // Check if the module name matches the one we're looking for
        if (!wcscmp(ModuleEntry.szModule, ModuleName)) {
            // If it does, close the snapshot handle and return the base address
            CloseHandle(SnapShot);
            return (uintptr_t)ModuleEntry.modBaseAddr;
        }
        // Grab the next module in the snapshot
    } while (Module32Next(SnapShot, &ModuleEntry));
    // We couldn't find the specified module, so close the handle and return 0
    CloseHandle(SnapShot);
    return 0;
}
To get the Process ID you would do:
bool GetPid(const wchar_t* targetProcess, DWORD* procID)
{
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    if (snap && snap != INVALID_HANDLE_VALUE)
    {
        PROCESSENTRY32 pe;
        pe.dwSize = sizeof(pe);
        if (Process32First(snap, &pe))
        {
            do
            {
                // Compare each process's executable name with the target
                if (!wcscmp(pe.szExeFile, targetProcess))
                {
                    CloseHandle(snap);
                    *procID = pe.th32ProcessID;
                    return true;
                }
            } while (Process32Next(snap, &pe));
        }
        // Not found: close the snapshot handle before returning
        CloseHandle(snap);
    }
    return false;
}
Using my example you would combine these functions and do:
DWORD procId;
GetPid(L"game.exe", &procId);
uintptr_t modBaseAddr = GetModuleBase(L"client.dll", procId);
uintptr_t ammoAddr = modBaseAddr + 0xDEADCODE;
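From there, reading or freezing the value from an external process is just ReadProcessMemory/WriteProcessMemory on a handle opened with OpenProcess. A minimal sketch, reusing procId and modBaseAddr from above (0xDEADCODE is still a placeholder offset):

HANDLE hProc = OpenProcess(PROCESS_VM_READ | PROCESS_VM_WRITE | PROCESS_VM_OPERATION, FALSE, procId);
if (hProc) {
    uintptr_t ammoAddr = modBaseAddr + 0xDEADCODE;
    int ammo = 0;
    // Read the current value
    ReadProcessMemory(hProc, (LPCVOID)ammoAddr, &ammo, sizeof(ammo), NULL);
    // Write a new value; "freezing" is just doing this repeatedly in a loop
    int newAmmo = 999;
    WriteProcessMemory(hProc, (LPVOID)ammoAddr, &newAmmo, sizeof(newAmmo), NULL);
    CloseHandle(hProc);
}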
If the address is not "static", you can find a pointer to it; the base address of that pointer chain must be static. Then you just follow the above guide, dereference each level of the pointer, and add an offset.
Of course I have a function for that too :)
uintptr_t FindDmaAddy(HANDLE hProcHandle, uintptr_t BaseAddress, uintptr_t Offsets[], int PointerLevel)
{
    uintptr_t pointer = BaseAddress;
    uintptr_t pTemp = 0;
    uintptr_t pointerAddr = 0;
    for (int i = 0; i < PointerLevel; i++)
    {
        if (i == 0)
        {
            // First level: read the pointer stored at the static base address
            ReadProcessMemory(hProcHandle, (LPCVOID)pointer, &pTemp, sizeof(pTemp), NULL);
        }
        // Apply the offset for this level, then read the next pointer in the chain
        pointerAddr = pTemp + Offsets[i];
        ReadProcessMemory(hProcHandle, (LPCVOID)pointerAddr, &pTemp, sizeof(pTemp), NULL);
    }
    // Return the address of the final value (not the value itself)
    return pointerAddr;
}
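And since pattern scanning was mentioned above but not shown, here is a rough sketch of the underlying technique for an external hack (this is not Cheat Engine's script engine): read the module's bytes and search for a byte signature, where entries marked as wildcards are ignored.

#include <windows.h>
#include <vector>
#include <cstdint>

// Returns the address of the first match inside the target process, or 0.
// 'pattern' holds the bytes to match; wildcard[i] == true means byte i is ignored.
uintptr_t PatternScan(HANDLE hProc, uintptr_t base, size_t size,
                      const std::vector<uint8_t>& pattern,
                      const std::vector<bool>& wildcard)
{
    std::vector<uint8_t> buffer(size);
    SIZE_T bytesRead = 0;
    if (!ReadProcessMemory(hProc, (LPCVOID)base, buffer.data(), size, &bytesRead))
        return 0;
    for (size_t i = 0; i + pattern.size() <= bytesRead; i++) {
        bool found = true;
        for (size_t j = 0; j < pattern.size(); j++) {
            if (!wildcard[j] && buffer[i + j] != pattern[j]) {
                found = false;
                break;
            }
        }
        if (found)
            return base + i;
    }
    return 0;
}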
I would highly recommend watching some YouTube tutorials to see how it's done; it's much better explained in video format.
I am trying to use the C/C++ API of Z3 to parse fixed-point constraints in the SMT-LIB2 format (specifically files produced by SeaHorn). However, my application crashes when parsing the string (I am using the Z3_fixedpoint_from_string method). The Z3 version I'm working with is 4.5.1, 64-bit.
The SMT-LIB file I try to parse works fine with the Z3 binary, which I have compiled from the sources, but it runs into a segmentation fault when calling Z3_fixedpoint_from_string. I narrowed the problem down to the point that I think the issue is related to adding relations to the fixed point context. A simple example that produces a segfault on my machine is the following:
#include "z3.h"
int main()
{
Z3_context c = Z3_mk_context(Z3_mk_config());
Z3_fixedpoint f = Z3_mk_fixedpoint(c);
Z3_fixedpoint_from_string (c, f, "(declare-rel R ())");
Z3_del_context(c);
}
Running this code with valgrind reports a lot of invalid reads and writes. So either this is not how the API is supposed to be used, or there is a problem somewhere. Unfortunately, I could not find any examples of how to use the fixed point engine programmatically. However, calling Z3_fixedpoint_from_string (c, f, "(declare-var x Int)"); for instance works just fine.
BTW, where is Z3_del_fixedpoint()?
The fixedpoint object "f" is reference counted. The caller is responsible for taking a reference count immediately after it is created. It is easier to use C++ smart pointers to control this, similar to how we control it for other objects. The C++ API does not have a wrapper for fixedpoint objects, so you would have to create your own in the style of the other wrappers.
Instead of del_fixedpoint one uses reference counters:
class fixedpoint : public object {
    Z3_fixedpoint m_fp;
public:
    fixedpoint(context& c): object(c) { m_fp = Z3_mk_fixedpoint(c); Z3_fixedpoint_inc_ref(c, m_fp); }
    ~fixedpoint() { Z3_fixedpoint_dec_ref(ctx(), m_fp); }
    operator Z3_fixedpoint() const { return m_fp; }
    void from_string(char const* s) {
        Z3_fixedpoint_from_string(ctx(), m_fp, s);
    }
};

int main()
{
    context c;
    fixedpoint f(c);
    f.from_string("....");
}
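If you want to stay with the plain C API, the same fix applied to the snippet from the question looks roughly like this (a sketch):

#include "z3.h"

int main(void)
{
    Z3_config cfg = Z3_mk_config();
    Z3_context c = Z3_mk_context(cfg);
    Z3_del_config(cfg);

    Z3_fixedpoint f = Z3_mk_fixedpoint(c);
    Z3_fixedpoint_inc_ref(c, f);    /* take the reference right after creation */

    Z3_fixedpoint_from_string(c, f, "(declare-rel R ())");

    Z3_fixedpoint_dec_ref(c, f);    /* release it before deleting the context */
    Z3_del_context(c);
    return 0;
}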
I notice that Rust's test framework has a benchmark mode that measures execution time in ns/iter, but I could not find a way to measure memory usage.
How would I implement such a benchmark? Let us assume that, for the moment, I only care about heap memory (though stack usage would certainly be interesting as well).
Edit: I found this issue which asks for the exact same thing.
You can use the jemalloc allocator to print the allocation statistics. For example,
Cargo.toml:
[package]
name = "stackoverflow-30869007"
version = "0.1.0"
edition = "2018"
[dependencies]
jemallocator = "0.5"
jemalloc-sys = {version = "0.5", features = ["stats"]}
libc = "0.2"
src/main.rs:
use libc::{c_char, c_void};
use std::ptr::{null, null_mut};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

extern "C" fn write_cb(_: *mut c_void, message: *const c_char) {
    print!("{}", String::from_utf8_lossy(unsafe {
        std::ffi::CStr::from_ptr(message as *const i8).to_bytes()
    }));
}

fn mem_print() {
    unsafe { jemalloc_sys::malloc_stats_print(Some(write_cb), null_mut(), null()) }
}

fn main() {
    mem_print();
    let _heap = Vec::<u8>::with_capacity(1024 * 128);
    mem_print();
}
In a single-threaded program that should allow you to get a good measurement of how much memory a structure takes. Just print the statistics before the structure is created and after, and calculate the difference (the "total:" of "allocated" in particular).
You can also use Valgrind (Massif) to get the heap profile. It works just like with any other C program. Make sure you have debug symbols enabled in the executable (e.g. using debug build or custom Cargo configuration). You can use, say, http://massiftool.sourceforge.net/ to analyse the generated heap profile.
(I verified this to work on Debian Jessie, in a different setting your mileage may vary).
(In order to use Rust with Valgrind you'll probably have to switch back to the system allocator).
P.S. There is now also a better DHAT (Valgrind's heap analysis tool).
jemalloc can be told to dump a memory profile. You can probably do this with the Rust FFI but I haven't investigated this route.
As far as measuring data structure sizes is concerned, this can be done fairly easily through the use of traits and a small compiler plugin. Nicholas Nethercote in his article Measuring data structure sizes: Firefox (C++) vs. Servo (Rust) demonstrates how it works in Servo; it boils down to adding #[derive(HeapSizeOf)] (or occasionally a manual implementation) to each type you care about. This is a good way of allowing precise checking of where memory is going, too; it is, however, comparatively intrusive as it requires changes to be made in the first place, where something like jemalloc’s print_stats() doesn’t. Still, for good and precise measurements, it’s a sound approach.
Currently, the only way to get allocation information is the alloc::heap::stats_print(); method (behind #![feature(alloc)]), which calls jemalloc's print_stats().
I'll update this answer with further information once I have learned what the output means.
(Note that I'm not going to accept this answer, so if someone comes up with a better solution...)
There is now the jemalloc_ctl crate, which provides a convenient, safe, typed API. Add it to your Cargo.toml:
[dependencies]
jemalloc-ctl = "0.3"
jemallocator = "0.3"
Then configure jemalloc to be the global allocator and use the methods from the jemalloc_ctl::stats module:
jemalloc_ctl::stats::allocated
jemalloc_ctl::stats::resident
Here is the official example:
use std::thread;
use std::time::Duration;
use jemalloc_ctl::{stats, epoch};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

fn main() {
    loop {
        // many statistics are cached and only updated when the epoch is advanced.
        epoch::advance().unwrap();

        let allocated = stats::allocated::read().unwrap();
        let resident = stats::resident::read().unwrap();
        println!("{} bytes allocated/{} bytes resident", allocated, resident);
        thread::sleep(Duration::from_secs(10));
    }
}
There's a neat little solution someone put together here: https://github.com/discordance/trallocator/blob/master/src/lib.rs
use std::alloc::{GlobalAlloc, Layout};
use std::sync::atomic::{AtomicU64, Ordering};

pub struct Trallocator<A: GlobalAlloc>(pub A, AtomicU64);

unsafe impl<A: GlobalAlloc> GlobalAlloc for Trallocator<A> {
    unsafe fn alloc(&self, l: Layout) -> *mut u8 {
        self.1.fetch_add(l.size() as u64, Ordering::SeqCst);
        self.0.alloc(l)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, l: Layout) {
        self.0.dealloc(ptr, l);
        self.1.fetch_sub(l.size() as u64, Ordering::SeqCst);
    }
}

impl<A: GlobalAlloc> Trallocator<A> {
    pub const fn new(a: A) -> Self {
        Trallocator(a, AtomicU64::new(0))
    }
    pub fn reset(&self) {
        self.1.store(0, Ordering::SeqCst);
    }
    pub fn get(&self) -> u64 {
        self.1.load(Ordering::SeqCst)
    }
}
Usage: (from: https://www.reddit.com/r/rust/comments/8z83wc/comment/e2h4dp9)
// needed for Trallocator struct (as written, anyway)
#![feature(integer_atomics, const_fn_trait_bound)]

use std::alloc::System;

// assumes the Trallocator type from above is in scope (e.g. defined in this crate)
#[global_allocator]
static GLOBAL: Trallocator<System> = Trallocator::new(System);

fn main() {
    GLOBAL.reset();
    println!("memory used: {} bytes", GLOBAL.get());
    {
        let mut vec = vec![1, 2, 3, 4];
        for i in 5..20 {
            vec.push(i);
            println!("memory used: {} bytes", GLOBAL.get());
        }
        for v in vec {
            println!("{}", v);
        }
    }
    // For some reason this does not print zero =/
    println!("memory used: {} bytes", GLOBAL.get());
}
I've just started using it, and it seems to work well! Straightforward, realtime, requires no external packages, and doesn't require changing your base memory allocator.
It's also nice that, because it intercepts the allocate/deallocate calls, you should be able to add custom logic if desired (e.g. if memory usage goes above X, print the stack trace to see what's triggering the allocations), although I haven't tried this yet.
I also haven't yet tested to see how much overhead this approach adds. If someone does a test for this, let me know!