How to use constant memory on several cuda devices - memory

I'm trying to utilize two cuda devices I have (I'm now experimenting with GF 690GTX which is actually 2 separate devices), and my program uses constant memory to transfer data to the device.
I clearly understand what I should do with global memory to use it in two devices:
//working with device 0
cudaSetDevice(0);
void* mem_on_dev_0 = cudaMalloc(...);
cudaMemcpy(mem_on_dev_0, mem_on_host, ...);
kernel_call<<<...>>>(mem_on_dev_0);
//working with device 1
cudaSetDevice(1);
void* mem_on_dev_1 = cudaMalloc(...);
cudaMemcpy(mem_on_dev_1, mem_on_host, ...);
kernel_call<<<...>>>(mem_on_dev_1);
But when working with constant memory the ordinary way to use it is to declare the constant variable somewhere in the file and then use "Symbol"-functions to work with it:
// What device this memory is on?
__device__ __constant__ float g_const_memory[CONST_MEM_SIZE];
// dev_func can be told to be called on any device
__global__ void dev_func()
{
//using const memory
float f = g_const_memory[const_index];
}
void host_func()
{
//cudaSetDevice(0); //any sense?
cudaMemcpyToSymbol(g_const_memory, host_mem, ...);
dev_func<<<...>>>();
}
I've googled much before asking this question but I didn't find any answer. Is there really no way to do it?

Related

Cannot access memory above 0x8000000 in protected mode x86

I am quite enthusiastic about computers and I managed to create my own bootloader and kernel.
So far it jumps to protected mode from real mode and can execute C code. But the thing is that I cannot write to memory addresses above 0x8000000. The code runs fine but whenever I read from that address it seems that the data was not written.
I know that all memory addresses are not mapped to ram but according to this article addresses from 100000-FEBFFFFF are mapped to ram. So my code should work.
http://staff.ustc.edu.cn/~xyfeng/research/cos/resources/machine/mem.htm
Here is a snippet to help you understand
char *data1 = (char*)0x8000000; // This cannot be written to
char *data2 = (char*)0x7ffffff; // This can be written to
*data1 = 'A'; // Will run but not actually write to that ram address
*data2 = 'A'; // Will run properly
Note - I am running my code in qemu(an emulator) and have not tested this on raw hardware yet.
Any help would be appreciated.

Detect if Cycript/Substrate or gdb is attached to an iOS app's process?

I am building an iOS app that transmits sensitive data to my server, and I'm signing my API requests as an additional measure. I want to make reverse engineering as hard as possible, and having used Cycript to find signing keys of some real-world apps, I know it's not hard to find these keys by attaching to a process. I am absolutely aware that if someone is really skilled and tries hard enough, they eventually will exploit, but I'm trying to make it as hard as possible, while still being convenient for myself and users.
I can check for jailbroken status and take additional measures, or I can do SSL pinning, but both are still easy to bypass by attaching to the process and modifying the memory.
Is there any way to detect if something (whether it be Cycript, gdb, or any similar tool that can be used for cracking the process) is attached to the process, while not being rejected from App Store?
EDIT: This is not a duplicate of Detecting if iOS app is run in debugger. That question is more related to outputting and it checks an output stream to identify if there's an output stream attached to a logger, while my question is not related to that (and that check doesn't cover my condition).
gdb detection is doable via the linked stackoverflow question - it uses the kstat to determine if the process is being debugged. This will detect if a debugger is currently attached to the process.
There is also a piece of code - Using the Macro SEC_IS_BEING_DEBUGGED_RETURN_NIL in iOS app - which allows you to throw in a macro that performs the debugger attached check in a variety of locations in your code (it's C/Objective-C).
As for detecting Cycript, when it is run against a process, it injects a dylib into the process to deal with communications between the cycript command line and the process - the library has part of the name looking like cynject. That name doesn't look similar to any libraries that are present on a typical iOS app. This should be detectable with a little loop like (C):
BOOL hasCynject() {
int max = _dyld_image_count();
for (int i = 0; i < max; i++) {
const char *name = _dyld_get_image_name(i);
if (name != NULL) {
if (strstr(name, "cynject") == 0) return YES;
}
}
}
Again, giving it a better name than this would be advisable, as well as obfuscating the string that you're testing.
These are only approaches that can be taken - unfortunately these would only protect you in some ways at run-time, if someone chooses to point IDA or some other disassembler at it then you would not be protected.
The reason that the check for debugger is implemented as a macro is that you would be placing the code in a variety of places in the code, and as a result someone trying to fix it would have to patch the app in a variety of places.
Based on #petesh's answer, I found the below code achieved what I wanted on a jailbroken phone with Cycript. The existence of printf strings is gold to a reverse engineer, so this code is only suitable for demo / crack-me apps.
#include <stdio.h>
#include <string.h>
#include <mach-o/dyld.h>
int main ()
{
int max = _dyld_image_count();
for (int i = 0; i < max; i++) {
const char *name = _dyld_get_image_name(i);
const char needle[11] = "libcycript";
char *ret;
if ((ret = strstr(name, needle)) != NULL){
printf("%s\nThe substring is: %s\n", name, ret);
}
}
return 0;
}
As far as I know, Cycript process injection is made possible by debug symbols. So, if you strip out debug symbols for the App Store release (the default build setting for the Release configuration), that would help.
Another action you could take, which would have no impact on the usability of the App, would be to use an obfuscator. However, this would render any crash reports useless, since you wouldn't be able to make sense of the symbols, even if the crash report was symbolicated.

Difference between writing data in memory directly and using asm instruction

I am reading the Linux kernel. I am curious about the way to write data in memory.
In some part of drivers, they use the writel() function defined in asm/io.h and in definition of that function, they use the movnti instruction - actually I don't understand what this instruction means except it is a kind of mov instruction.
Anyway, when writing data in memory, what's the difference between using writel() and directly writing in memory, e.g. **address = data;.
Here is the case:
static inline void __writel(__u32 val, volatile void __iomem *addr)
{
volatile __u32 __iomem *target = addr;
asm volatile("movnti %1,%0"
: "=m" (*target)
: "r" (val) : "memory");
}
and this is another case:
*(unsigned int*)(MappedAddr+pageOffset) = result;
writel looks like it's intended for memory mapped IO, there are a few things to support this, first the use of the volatile pointer (which prevents optimization such as reordering calls or optimizing them out among other things) and the non-temproal instruction (IO writes/reads shouldn't be cached) and of course the iomem annotation seems to support this too.
If I understand this correctly then using the moventi instruction will minimise the impact on the processor's data caches. Using *(unsigned int*)(MappedAddr+pageOffset) = result; instead leaves the the compiler free to choose whichever move instruction it likes, and its likely to choose one that causes the cache line to be pulled into the cache. Which is probably not what you want if you're interacting with a memory mapped device.

What is the correct way to clear sensitive data from memory in iOS?

I want to clear sensitive data from memory in my iOS app.
In Windows I used to use SecureZeroMemory. Now, in iOS, I use plain old memset, but I'm a little worried the compiler might optimize it:
https://buildsecurityin.us-cert.gov/bsi/articles/knowledge/coding/771-BSI.html
code snippet:
NSData *someSensitiveData;
memset((void *)someSensitiveData.bytes, 0, someSensitiveData.length);
Paraphrasing 771-BSI (link see OP):
A way to avoid having the memset call optimized out by the compiler is to access the buffer again after the memset call in a way that would force the compiler not to optimize the location. This can be achieved by
*(volatile char*)buffer = *(volatile char*)buffer;
after the memset() call.
In fact, you could write a secure_memset() function
void* secure_memset(void *v, int c, size_t n) {
volatile char *p = v;
while (n--) *p++ = c;
return v;
}
(Code taken from 771-BSI. Thanks to Daniel Trebbien for pointing out for a possible defect of the previous code proposal.)
Why does volatile prevent optimization? See https://stackoverflow.com/a/3604588/220060
UPDATE Please also read Sensitive Data In Memory because if you have an adversary on your iOS system, your are already more or less screwed even before he tries to read that memory. In a summary SecureZeroMemory() or secure_memset() do not really help.
The problem is NSData is immutable and you do not have control over what happens. If the buffer is controlled by you, you could use dataWithBytesNoCopy:length: and NSData will act as a wrapper. When finished you could memset your buffer.

pin_ptr a native void* help

The Setup
I have a PDF API which has a native function that is defined below.
typdef void* PDF_DOCUMENT;
unsigned long PDF_GetMetaText(PDF_DOCUMENT document,
const char tag,
void* buffer,
unsigned long bufferlen)
//Calling it "natively" in C++/CLI function to get the PDF Creator tag
WCHAR result[32];
void* pdoc = PDF_LoadDoc("C:\test.pdf");
int numChars = PDF_GetMetaText(pdoc, "Creator", result, 32);
PDF_CloseDoc(pdoc);
if I call the above code in my C++/CLI wrapper function, it returns the correct string but throws an AccessViolationException when I call PDF_CloseDoc. WOOPS. I forgot to pin_ptr the pointer the document.
The Problem
When I pin_ptr pdoc, i can successfully call these native functions, however the buffer no longer contains my string when PDF_GetMetaText returns.
String^ Wrapper::GetCreator(String^ filename)
{
WCHAR buffer[32];
void *pdoc = PDF_LoadDoc(SystemStringToCStr(filename));
pin_ptr<void*> p = &pdoc; //added
int numPages = PDF_GetMetaText(p, "Creator", buffer, 32);
PDF_CloseDocument(p); //doesnt crash, but at this line buffer is an empty string
return gcnew String(buffer);
}
I have also tried pinning buffer[0] but that causes an accessviolation exception at GetMetaText.
The Question
I cant say what is happening in GetMetaText, so I am not sure what is happing to pdoc. Any suggestions to the above code?
This doesn't make any sense. You can only pin managed objects, the return value of PDF_LoadDoc() sure doesn't look like a managed object to me. Same goes for result, it is not a managed array<WCHAR>, just a plain vanilla C array that gets allocated on the stack frame. Unfortunately, pin_ptr<> doesn't complain about this.
The result array could only get 'empty' if code is stomping the stack frame. Which you can diagnose by setting a data breakpoint on the first element. Fwiw, SystemStringToCStr() looks like a candidate. This cannot work without releasing the buffer for the native string somewhere. Another candidate is the PDF API function declarations. Pay attention to the value of the ESP register and make sure it doesn't change. If it does, the stack is get imbalanced because you don't have the proper calling convention. Which is usually __stdcall for DLL exports.

Resources