how to get the available memory on the device - memory

I'm trying to obtain how much free memory I have on the device. To do this I call the cuda function cuMemGetInfo from a fortran code, but it returns negative values for the free amount of memory, so there's clearly something wrong.
Does anyone know how I can do that?
Thanks
EDIT:
Sorry, in fact my question was not very clear. I'm using OpenACC in Fortran and I call the C++ cuda function cudaMemGetInfo. Finally I could fix the code, the problem was effectively the kind of variables that I was using. Switching to size_ fixed everything. This is the interface in fortran that I'm using:
interface
subroutine get_dev_mem(total,free) bind(C,name="get_dev_mem")
use iso_c_binding
integer(kind=c_size_t)::total,free
end subroutine get_dev_mem
end interface
and this the cuda code
#include <cuda.h>
#include <cuda_runtime.h>
extern "C" {
void get_dev_mem(size_t& total, size_t& free)
{
cuMemGetInfo(&free, &total);
}
}
There's one last question: I pushed an array on the gpu and I checked its size using cuMemGetInfo, then I computed it's size counting the number of bytes, but I don't have the same answer, why? In the first case it is 3052mb large, in the latter 3051mb. This difference of 1mb could be the size of the array descriptor? Here there's the code that I used:
integer, parameter:: long = selected_int_kind(12)
integer(kind=c_size_t) :: total, free1,free2
real(8), dimension(:),allocatable::a
integer(kind=long)::N, eight, four
allocate(a(four*N))
!some OpenACC stuff in order to init the gpu
call get_dev_mem(total,free1)
!$acc data copy(a)
call get_dev_mem(total,free2)
print *,"size a in the gpu = ",(free1-free2)/1024/1024, " mb"
print *,"size a in theory = ", (eight*four*N)/1024/1024, " mb"
!$acc end data
deallocate(a)

Right, so, like commenters have suggested, we're not sure exactly what you're running, but filling in the missing details by guessing, here's a shot:
Most CUDA API calls return a status code (or error code if you will); this is true both in C/C++ and in Fortran, as we can see in the Portland Group's CUDA Fortran Manual:
Most of the runtime API routines are integer functions that return an error code; they return a value of zero if the call was successful, and a nonzero value if there was an error. To interpret the error codes, refer to “Error Handling,” on page 48.
This is the case for cudaMemGetInfo() specifically:
integer function cudaMemGetInfo( free, total )
integer(kind=cuda_count_kind) :: free, total
The two integers for free and total are cuda_count_kind, which if I am not mistaken are effectively unsigned... anyway, I would guess that what you're getting is an error code. Have a look at the Error Handling section on page 48 of the manual.

Related

Why are printed memory addresses in Rust a mix of both 40-bit and 48-bit addresses?

I'm trying to understand the way Rust deals with memory and I've a little program that prints some memory addresses:
fn main() {
let a = &&&5;
let x = 1;
println!(" {:p}", &x);
println!(" {:p} \n {:p} \n {:p} \n {:p}", &&&a, &&a, &a, a);
}
This prints the following (varies for different runs):
0x235d0ff61c
0x235d0ff710
0x235d0ff728
0x235d0ff610
0x7ff793f4c310
This is actually a mix of both 40-bit and 48-bit addresses. Why this mix? Also, can somebody please tell me why the addresses (2, 3, 4) do not fall in locations separated by 8-bytes (since std::mem::size_of_val(&a) gives 8)? I'm running Windows 10 on an AMD x-64 processor (Phenom || X4) with 24GB RAM.
All the addresses do have the same size, Rust is just not printing trailing 0-digits.
The actual memory layout is an implementation detail of your OS, but the reason that a prints a location in a different memory area than all the other variables is, that a actually lives in your loaded binary, because it is a value that can already be calculated by the compiler. All the other variables are calculated at runtime and live on the stack.
See the compilation result on https://godbolt.org/z/kzSrDr:
.L__unnamed_4 contains the value 5; .L__unnamed_5, .L__unnamed_6 and .L__unnamed_1 are &5 &&5 and &&&5.
So .L__unnamed_1 is what on your system is at 0x7ff793f4c310. While 0x235d0ff??? is on your stack and calculated in the red and blue areas of the code.
This is actually a mix of both 40-bit and 48-bit addresses. Why this mix?
It's not really a mix, Rust just doesn't display leading zeroes. It's really about where the OS maps the various components of the program (data, bss, heap and stack) in the address space.
Also, can somebody please tell me why the addresses (2, 3, 4) do not fall in locations separated by 8-bytes (since std::mem::size_of_val(&a) gives 8)?
Because println! is a macro which expands to a bunch of stuff in the stackframe, so your values are not defined next to one another in the frame final code (https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=5b812bf11e51461285f51f95dd79236b). Though even if they were there'd be no guarantee the compiler wouldn't e.g. be reusing now-dead memory to save up on frame size.

How to get this sqrt inline assembly working for iOS

I am trying to follow another SO post and implement sqrt14 within my iOS app:
double inline __declspec (naked) __fastcall sqrt14(double n)
{
_asm fld qword ptr [esp+4]
_asm fsqrt
_asm ret 8
}
I have modified this to the following in my code:
double inline __declspec (naked) sqrt14(double n)
{
__asm__("fld qword ptr [esp+4]");
__asm__("fsqrt");
__asm__("ret 8");
}
Above, I have removed the "__fastcall" keyword from the method definition since my understanding is that it is for x86 only. The above gives the following errors for each assembly line respectively:
Unexpected token in argument list
Invalid instruction
Invalid instruction
I have attempted to read through a few inline ASM guides and other posts on how to do this, but I am generally just unfamiliar with the language. I know MIPS quite well, but these commands/registers seem to be very different. For example, I don't understand why the original author never uses the passed in "n" value anywhere in the assembly code.
Any help getting this to work would be greatly appreciated! I am trying to do this because I am building an app where I need to calculate sqrt (ok, yes, I could do a lookup table, but for right now I care a lot about precision) on every pixel of a live-video feed. I am currently using the standard sqrt, and in addition to the rest of the computation, I'm running at around 8fps. Hoping to bump that up a frame or two with this change.
If it matters: I'm building the app to ideally be compatibly with any current iOS device that can run iOS 7.1 Again, many thanks for any help.
The compiler is perfectly capable of generating fsqrt instruction, you don't need inline asm for that. You might get some extra speed if you use -ffast-math.
For completeness' sake, here is the inline asm version:
__asm__ __volatile__ ("fsqrt" : "=t" (n) : "0" (n));
The fsqrt instruction has no explicit operands, it uses the top of the stack implicitly. The =t constraint tells the compiler to expect the output on the top of the fpu stack and the 0 constraint instructs the compiler to place the input in the same place as output #0 (ie. the top of the fpu stack again).
Note that fsqrt is of course x86-only, meaning it wont work for example on ARM cpus.

Calculating factorial on FORTRAN with integer variables. Memory overflow

I'm doing a program with FORTRAN that is a bit special. I can only use integer variables, and as you know with these you've got a memory overflow when you try to calculate a factorial superior to 12 or 13. So I made this program to avoid this problem:
http://lendricheolfiles.webs.com/codigo.txt
But something very strange is happening. The program calculates the factorial well 4 or 5 times and then gives a memory overflow message. I'm using Windows 8 and I fear it might be the cause of the failure, or if it's just that I've done something wrong.
Thanks.
Try compiling with run-time subscript checking. In Fortran segmentation faults are generally caused either by subscript errors or by mismatches between actual and dummy arguments (i.e., between arguments in the call to a procedure and the arguments as declared in the procedure). I'll make a wild guess from glancing at your code that you have have a subscript error -- let the compiler find it for you by turning on run-time subscript checking. Most Fortran compilers have this as an compilation option.
P.S. You can also do calculations like this by using already written packages, e.g., the arbitrary precision arithmetic software of David Bailey, et al., available in Fortran 90 at http://crd-legacy.lbl.gov/~dhbailey/mpdist/
M.S.B.'s answer has the gist of your problem: your array indices go out of bounds at a couple of places.
In three loops, cifra - 1 == 0 is out of bounds:
do cifra=ncifras,1,-1
factor(1,cifra-1) = factor(1,cifra)/10 ! factor is (1:2, 1:ncifras)
factor(1,cifra) = mod(factor(1,cifra),10)
enddo
! :
! Same here:
do cifra=ncifras,1,-1
factor(2,cifra-1) = factor(2,cifra)/10
factor(2,cifra) = mod(factor(2,cifra),10)
enddo
!:
do cifra=ncifras,1,-1
sumaprovisional(cifra-1) = sumaprovisional(cifra-1)+(sumaprovisional(cifra)/10)
sumaprovisional(cifra) = mod(sumaprovisional(cifra),10)
enddo
In the next case, the value of cifra - (fila - 1) goes out of bounds:
do fila=1,nfilas
do cifra=1,ncifras
! Out of bounds for all cifra < fila:
sumando(fila,cifra-(fila-1)) = factor(1,cifra)*factor(2,ncifras-(fila-1))
enddo
sumaprovisional = sumaprovisional+sumando(fila,:)
enddo
You should be fine if you rewrite the first three loops as do cifra = ncifras, 2, -1 and the inner loop of the other case as do cifra = fila, ncifras. Also, in the example program you posted, you first have to allocate resultado properly before passing it to the subroutine.

The art of exploitation - exploit_notesearch.c

i've got a question regarding the exploit_notesearch program.
This program is only used to create a command string we finally call with the system() function to exploit the notesearch program that contains a buffer overflow vulnerability.
The commandstr looks like this:
./notesearch Nop-block|shellcode|repeated ret(will jump in nop block).
Now the actual question:
The ret-adress is calculated in the exploit_notesearch program by the line:
ret = (unsigned int) &i-offset;
So why can we use the address of the i-variable that is quite at the bottom of the main-stackframe of the exploit_notesearch program to calculate the ret address that will be saved in an overflowing buffer in the notesearch program itself ,so in an completely different stackframe, and has to contain an address in the nop block(which is in the same buffer).
that will be saved in an overflowing buffer in the notesearch program itself ,so in an completely different stackframe
As long as the system uses virtual memory, another process will be created by system() for the vulnerable program, and assuming that there is no stack randomization,
both processes will have almost identical values of esp (as well as offset) when their main() functions will start, given that the exploit was compiled on the attacked machine (i.e. with vulnerable notesearch).
The address of variable i was chosen just to give an idea about where the frame base is. We could use this instead:
unsigned long sp(void) // This is just a little function
{ __asm__("movl %esp, %eax");} // used to return the stack pointer
int main(){
esp = sp();
ret = esp - offset;
//the rest part of main()
}
Because the variable i will be located on relatively constant distance from esp, we can use &i instead of esp, it doesn't matter much.
It would be much more difficult to get an approximate value for ret if the system did not use virtual memory.
the stack is allocated in a way as first in last out approach. The location of i variable is somewhere on the top and lets assume that it is 0x200, and the return address is located in a lower address 0x180 so in order to determine the where about to put the return address and yet to leave some space for the shellcode, the attacker must get the difference, which is: 0x200 - 0x180 = 0x80 (128), so he will break that down as follows, ++, the return address is 4 bytes so, we have only 48 bytes we left before reaching the segmentation. that is how it is calculated and the location i give approximate reference point.

Lua Alien Module - Trouble using WriteProcessMemory function, unsure on types (unit32)

require "alien"
--the address im trying to edit in the Mahjong game on Win7
local SCOREREF = 0x0744D554
--this should give me full access to the process
local ACCESS = 0x001F0FFF
--this is my process ID for my open window of Mahjong
local PID = 1136
--function to open proc
local op = alien.Kernel32.OpenProcess
op:types{ ret = "pointer", abi = "stdcall"; "int", "int", "int"}
--function to write to proc mem
local wm = alien.Kernel32.WriteProcessMemory
wm:types{ ret = "long", abi = "stdcall"; "pointer", "pointer", "pointer", "long", "pointer" }
local pRef = op(ACCESS, true, PID)
local buf = alien.buffer("99")
-- ptr,uint32,byte arr (no idea what to make this),int, ptr
print( wm( pRef, SCOREREF, buf, 4, nil))
--prints 1 if success, 0 if failed
So that is my code. I am not even sure if I have the types set correctly.
I am completely lost and need some guidance. I really wish there was more online help/documentation for alien, it confuses my poor brain.
What utterly baffles me is that it WriteProcessMemory will sometimes complete successfully (though it does nothing at all, to my knowledge) and will also sometimes fail to complete successfully. As I've stated, my brain hurts.
Any help appreciated.
It looks like your buffer contains only 2 bytes ("99"), but you specify 4 bytes in the call to WriteProcessMemory.
If your intention was to write the 32-bit value 99 into memory (as a number, not an ASCII string), you can use:
alien.buffer("\99\0\0\0")
You can convert arbitrary integers to string representations using alien.struct.pack:
require "alien.struct"
s = alien.struct.pack('i', 99)
buf = alien.buffer(s)
I know this question is long forgotten, but I ran into the same issue (with the same function), and there was nothing on the web except this question, and then I solved it myself, so I'm leaving my solution here.
SHORT ANSWER
The type of the second argument of WriteProcessMemory is not "pointer". I mean, officially it is, but alien cannot cast a raw address to a "pointer", so you are better off pretending it's a "long" instead. So your types declaration should look like
wm:types{ ret = "long", abi = "stdcall"; "pointer", "long", "pointer", "long", "pointer" }
LONG ANSWER
I was playing around with ReadProcessMemory, since I figured that before writing something you need to verify that this something actually exists. So one time I called ReadProcessMemory, and it returned a buffer that wasn't what I was looking for, but it wasn't empty either. In fact, it seemed something was written there - as in, an ASCII string. Not text, though, just some digits. But that was enough to convince me that the data actually came from somewhere.
So I grabbed Cheat Engine, opened the same process and ran a search for this string. And guess what - it actually was there, but the address was completely wrong. That led me to believe that the address is specified wrongly. After trying to find a way to generate a "pointer" object from a Lua number, I gave up and changed the types declaration - after all, a pointer is just a differently interpreted integer.
After all that, I did some investigating, including reading the sources of both lua and alien, and stepping through the relevant parts with a debugger. It turns out, the full walkthrough of the error is as follows:
The "pointer" keyword has special behaviour for strings: if your "pointer"-declared argument is actually a Lua string, then a new buffer is instantly created, the string is copied there, and it is used as the real argument.
Alien uses the lua_isstring function to implement this
lua_isstring returns "true" not only for actual strings, but for numbers as well, since they are auto-convertible into strings.
As a result, your SCOREREF is turned into a string, copied into a newly created buffer, and the address of THAT is passed into WriteProcessMemory as a void*.
Since the layouts of most processes in their respective address spaces are similar, this void* more often than not happens to coincide with an address of some thing or another in the target process. That is why the system call sometimes succeedes, it just writes into a completely wrong place.

Resources