Hey there,
I have the following piece of code:
#if USE_CONST == 1
__constant__ double PNT[ SIZE ];
#else
__device__ double *PNT;
#endif
and a bit later I have:
#if USE_CONST == 0
cudaMalloc((void **)&PNT, sizeof(double)*SIZE);
cudaMemcpy(PNT, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
#else
cudaMemcpyToSymbol(PNT, point, sizeof(double)*SIZE);
#endif
where point is defined earlier in the code. When working with USE_CONST=1 everything works as expected, but without it, it doesn't. I access the array in my kernel function via
PNT[ index ]
Where's the problem between the two variants?
Thanks!
The correct usage of cudaMemcpyToSymbol prior to CUDA 4.0 is:
cudaMemcpyToSymbol("PNT", point, sizeof(double)*SIZE)
or alternatively:
double *cpnt;
cudaGetSymbolAddress((void **)&cpnt, "PNT");
cudaMemcpy(cpnt, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
which might be a bit faster if you are planning to access the symbol from the host API more than once.
EDIT: I misunderstood the question. For the global-memory version, do something similar to the second constant-memory version:
double *gpnt;
cudaGetSymbolAddress((void **)&gpnt, "PNT");
cudaMemcpy(gpnt, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
Although this is an old question, I'll add this for future googlers:
The problem is here:
cudaMalloc((void **)&PNT, sizeof(double)*SIZE);
cudaMemcpy(PNT, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
The cudaMalloc writes to the host version of PNT, which is actually a device variable that must not be accessed from the host. So the correct approach is to allocate memory, copy the address to the device symbol, and then copy the data to the memory pointed to by that symbol:
void* memPtr;
cudaMalloc(&memPtr, sizeof(double)*SIZE);
cudaMemcpyToSymbol(PNT, &memPtr, sizeof(memPtr));
// In other places you'll need an additional:
// cudaMemcpyFromSymbol(&memPtr, PNT, sizeof(memPtr));
cudaMemcpy(memPtr, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
Easier would be:
#if USE_CONST == 1
__constant__ double PNT[ SIZE ];
#else
__device__ double PNT[ SIZE ];
#endif
// No #if required anymore:
cudaMemcpyToSymbol(PNT, point, sizeof(double)*SIZE);
On iOS 11, when I intentionally create objects which would turn out to be tagged pointers, they start with 0xB, instead of the 0x0000000000000026, 0x000000000000001c, 0x000000000000005a values that are showing up in my crash reports as invalid addresses. I think these are likely tagged pointers, but they aren't formatted like tagged pointers that I see in the debugger.
What about 0x0000000000000010, 0x0000000000000020, 0x0000000000000030? They all have a trailing 0, but they sure look suspiciously small to be real pointers.
The implementation details of tagged pointers change from release to release and architecture to architecture. With that said, those really don't look like tagged pointers.
What is most likely is that some piece of code is dereferencing into a struct or object that is unexpectedly NULL or nil.
Run this code:
struct bob {
    void *a;
    void *b;
    void *c;
    char d[42];
};

int main(int argc, const char * argv[]) {
    struct bob *fred = NULL;
    fred->d[2] = 'q';
    return 0;
}
You'll get this crash (on x86_64): Thread 1: EXC_BAD_ACCESS (code=1, address=0x1a)
That is, it is trying to dereference through 0x0. So, more likely than not, you have a struct/object reference that is NULL, and your code is trying to dereference an element or instance variable that sits at one of the hex offsets you listed from the beginning.
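You can verify this with offsetof. On x86_64 the three pointers occupy 24 bytes, so d starts at offset 0x18 and fred->d[2] touches 0x18 + 2 = 0x1a, exactly the faulting address above. A minimal check in plain C:

#include <stdio.h>
#include <stddef.h>

struct bob {
    void *a;
    void *b;
    void *c;
    char d[42];
};

int main(void) {
    /* Prints 0x18 (24) on x86_64; with a NULL base pointer, d[2] is
     * therefore at absolute address 0x1a, matching the crash address. */
    printf("offsetof(struct bob, d) = 0x%zx\n", offsetof(struct bob, d));
    return 0;
}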
Good morning.
I have a program that uses self-modifying code. Really, it builds the binaries, which are then changed by ELFPatch, which modifies some functions' prologues.
I am working with Windriver WorkBench 3.3 & VxWorks 6.9 Update3.
I created a standard simulator (PENTIUM). When I run my code on the simulator:
void replace_prolog(void* func_ptr) {
    char* p = (char*)func_ptr;
    for (int i = 0; i < PROLOGUE_SIZE; ++i)
        p[i] = m_prologue[i]; // << m_prologue is a member array.
    ...
}
Let's call the real prologue the Original Prologue; the one patched into the binary by ELFPatch the Changed Prologue; and the one that is placed at run time the Replacement Prologue.
I get an Exception (signal 11 - Segmentation Fault).
!! I realized it is VxWorks's .text segment protection. So I created a SimPC-based VIP to be my simulator BSP, excluded INCLUDE_PROTECT_TEXT (and all its relevant kernel components), and ran the simulator. Now there is no exception!
Facts
Looking at the Memory Browser, I see the Changed Prologue bytes (the memory didn't change)!
Printing the buffer to the console prints the Replacement Prologue byte values! (Weird.)
Looking at the assembly view (mega weird): it shows the Changed Prologue hex values but the Original Prologue asm commands (push bp; ...), even though the byte values do not match them.
My Questions
Has anyone had any experience with modifying the .text segment?
Has anyone encountered memory that would not change (without an exception/signal) on a simulator, and that is not a memory-mapped port or volatile?
Long Shot Assumption
I have an assumption that it is about caching: VxWorks knows this region shouldn't change, so it doesn't write through, but I don't know how I can check that...
EDIT 2: I tried setting my pointers to be volatile => same behavior!
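If the caching assumption is right, one way to test it is to explicitly synchronize the caches after patching. This is a sketch only: the function and parameter names are made up to mirror the question's byte copy, and it uses cacheTextUpdate() from VxWorks's cacheLib, which flushes the data cache and invalidates the instruction cache over a range of freshly written code:

#include <vxWorks.h>
#include <cacheLib.h>
#include <string.h>

/* Patch a function prologue, then make sure the CPU will not keep
 * executing stale instructions from the instruction cache. */
void replace_prolog_synced(void *func_ptr, const char *new_prologue, size_t size) {
    memcpy(func_ptr, new_prologue, size);
    if (cacheTextUpdate(func_ptr, size) != OK) {
        /* cache synchronization failed; handle the error here */
    }
}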
Please Help.
This may not be the answer, but since you are seeing the expected output, it confirms that the .text section is changed. The only explanation I can think of is that if you are using host tools to look at the .text memory, the information may be read from the host. Did you type commands on the target to look at the memory location?
I forgot about the question, but I still have an answer:
There is an issue with the host tools, which do not show the changes to the .text section,
while on the target the bytes actually changed.
The function didn't work because my transformation was ruining dynamic linking:
my function code had a call to a function with a constant string "Whatever".
When I transformed the function code I unintentionally changed the reference of a relocation pointer, which at loading time got a bad absolute pointer.
Lucky for me, it pointed to a 0x00 buffer, and therefore printed an empty string without crashing.
Suggested Solutions:
Do not touch the relocated pointers, either when altering the executable or when altering at run time.
Create a static, self-contained executable with an absolute footprint => no dynamic relocation occurs that way.
Alter dl() to transform the altered relocated pointers back to their expected relocated values.
Alter dl() to infer the expected altered absolute pointer from the altered relocated pointers, so the transformation creates a correct absolute pointer.
Note: I chose #2 because it is the simplest, and because in my system I do not need shared objects anyway.
I just upgraded Xcode + iOS SDK to the latest versions (5.1/7.1) and am now getting a bunch of errors about implicit conversions losing precision (NSInteger to int etc).
Does anybody know if there is a compiler flag or something that allows me to tell the compiler to treat those as warnings rather than errors again? I couldn't find anything so far. I really don't want to go through the code and add explicit casts everywhere as this would be in a lot of places.
This is an error for good reason. Using NSInteger throughout your codebase makes sure the code is handled consistently when you compile it for both 32-bit and 64-bit iOS devices. In the 32-bit world NSInteger and int were the same, but with the advent of the iPhone 5S and the iPad Air, iOS is no longer 32-bit only.
As others have said, there really is no way around this if you don't want trouble with modern devices.
If you just want to get back to work and deal with this problem later, then you need to remove arm64 from the 'Valid Architectures' and 'Architectures' items in your project's Build Settings.
As others have said, you really should fix the warnings, but this was a nasty little surprise by Apple (that is, adding the arm64 architecture to Build Settings in Xcode 5.1), so I can totally understand wanting to put these warnings away for a while and work on whatever you were working on before you decided to upgrade.
Indeed, Xcode now includes the arm64 architecture.
NSInteger is something completely different now, as it is defined in NSObjCRuntime.h:
#if __LP64__ || (TARGET_OS_EMBEDDED && !TARGET_OS_IPHONE) || TARGET_OS_WIN32 || NS_BUILD_32_LIKE_64
typedef long NSInteger;
typedef unsigned long NSUInteger;
#else
typedef int NSInteger;
typedef unsigned int NSUInteger;
#endif
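To see concretely what the "loses precision" error is protecting against: under __LP64__ an NSInteger is a long, and narrowing a long to an int silently truncates. A minimal C illustration for a 64-bit build (the values are only for demonstration):

#include <stdio.h>

int main(void) {
    long big = 5000000000L;   /* fits in a 64-bit long (NSInteger under __LP64__) */
    int narrowed = (int)big;  /* an explicit cast silences the diagnostic,
                                 but the value no longer fits and is truncated */
    printf("big = %ld, narrowed = %d\n", big, narrowed);
    return 0;
}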
To deal with it you should improve your codebase. First of all, you have to be really consistent: assign an NSInteger only to an NSInteger, not to an int. Avoid anything like:
int i = [aString integerValue]; (as it returns an NSInteger)
and write instead:
NSInteger i = [aString integerValue]; (then, if it is a long underneath, you won't have any trouble)
Moreover, another issue you might have is when you want to create a string from a value.
What you could do is something like:
#define cL(v) (long)(v)
#define cUL(v) (unsigned long)(v)
NSLog(@"array.count: %lu", cUL(anArray.count));
array.count returns an unsigned int under armv7(s) and an unsigned long under arm64. By always casting to an unsigned long you will not face any warnings anymore and, more importantly, will not get any bugs.
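The same idea in a plain C sketch, where my_uinteger_t is a hypothetical stand-in for NSUInteger that mirrors the typedef logic shown above; casting to the widest type and using its matching format specifier is exactly what the cL/cUL macros encapsulate:

#include <stdio.h>

#if __LP64__
typedef unsigned long my_uinteger_t;  /* 8 bytes, like NSUInteger on arm64 */
#else
typedef unsigned int my_uinteger_t;   /* 4 bytes, like NSUInteger on armv7 */
#endif

int main(void) {
    my_uinteger_t count = 42;
    /* Casting to unsigned long and printing with %lu is correct
     * regardless of which branch of the #if was taken. */
    printf("count: %lu\n", (unsigned long)count);
    return 0;
}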
This "logic" have been introduced by Apple itself on some tech-talks videos there:
https://developer.apple.com/tech-talks/videos/ (video "Architecting Modern iOS Games". Play the video around 10m00s)
You should go and cast them the way they ought to be. Apple isn't complaining for no reason: it also ensures that you don't get any weird, unexpected behaviour later down the line. I suggest you go and cast them all. It's extreme, but well, it's clean.
Is it possible in iOS to detect the current processor of the device?
The project I am working on requires checking programmatically whether the processor is ARMv7 or ARMv7s.
I don't know for certain that this will provide the information you'd like but I would try sysctl:
int32_t value = 0;
size_t length = sizeof(value);
sysctlbyname("hw.cpusubtype", &value, &length, NULL, 0);
The values for subtype are in mach/machine.h.
/*
* ARM subtypes
*/
#define CPU_SUBTYPE_ARM_ALL ((cpu_subtype_t) 0)
#define CPU_SUBTYPE_ARM_V4T ((cpu_subtype_t) 5)
#define CPU_SUBTYPE_ARM_V6 ((cpu_subtype_t) 6)
#define CPU_SUBTYPE_ARM_V5TEJ ((cpu_subtype_t) 7)
#define CPU_SUBTYPE_ARM_XSCALE ((cpu_subtype_t) 8)
#define CPU_SUBTYPE_ARM_V7 ((cpu_subtype_t) 9)
#define CPU_SUBTYPE_ARM_V7F ((cpu_subtype_t) 10) /* Cortex A9 */
#define CPU_SUBTYPE_ARM_V7K ((cpu_subtype_t) 12) /* Kirkwood40 */
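Putting the two snippets together, a sketch of the complete check (error handling kept minimal; note that the excerpt above is from an older header, and newer SDKs also define CPU_SUBTYPE_ARM_V7S with the value 11):

#include <stdio.h>
#include <stdint.h>
#include <sys/sysctl.h>
#include <mach/machine.h>

int main(void) {
    int32_t value = 0;
    size_t length = sizeof(value);

    /* Query the kernel for the CPU subtype of the running device. */
    if (sysctlbyname("hw.cpusubtype", &value, &length, NULL, 0) == 0) {
        if (value == CPU_SUBTYPE_ARM_V7)
            printf("ARMv7\n");
        else
            printf("other subtype: %d\n", (int)value);
    }
    return 0;
}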
I'm not aware of any API to check this, but you could perhaps write one yourself by providing v7 and v7s assembler implementations for the same symbol that simply return true or false as required.
Assuming that the v7s implementation will be used if and only if the processor supports v7s, it should work.
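A compile-time sketch of the same idea, using predefined macros instead of assembler: clang defines __ARM_ARCH_7S__ when it compiles the armv7s slice of a fat binary, so each slice gets its own answer baked in. As with the assembler variant, this relies on the assumption that the v7s slice is loaded if and only if the processor supports v7s:

/* Returns 1 if this code was built as (and therefore loaded from)
 * the armv7s slice of the binary, 0 otherwise. */
int running_armv7s_slice(void) {
#if defined(__ARM_ARCH_7S__)
    return 1;
#else
    return 0;
#endif
}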
This way:
sysctlbyname("hw.cpusubtype", &value, &length, NULL, 0);
is good, but there is an easier way:
#include <mach-o/ldsyms.h>
int processorType = _mh_execute_header.cputype;
int processorSubtype = _mh_execute_header.cpusubtype;
Note, though, that this shows you which CPU the current code was compiled for, since it is determined at compile time.
You don't.
That is, it is not advisable to programmatically detect ARM architecture version support at runtime on iOS. This is the answer I got from an Apple engineer when I specifically asked this question, and given the pitfalls (e.g. what do you do when you get an unexpected value?), I can believe them.
(yes, sometimes the correct answer to an "How do I?" question is "You don't.")
I have three questions for you, all related to dyld :)
I have been using this dyld man page as a basis. I have compiled the following code and successfully executed the binary on my jailbroken device.
#include <stdio.h>
#include <mach-o/dyld.h>

int main(int argc, const char* argv[]) {
    uint32_t image_count, i;

    image_count = _dyld_image_count();
    for (i = 0; i < image_count; i++) {
        printf("%s\n", _dyld_get_image_name(i));
    }
    return 0;
}
I thought that these functions let me find all the shared libraries that are loaded in my program's address space. On my Mac, the output is pretty straightforward: it shows the paths to all the libraries that are currently loaded in memory. On my iPhone the output is nearly the same (I also get file paths), but there are no files at the specified locations. (On my Mac, on the other hand, I can locate the files!)
This is a sample line from the output:
/usr/lib/system/libdyld.dylib
According to ls, iFile and all the other tools I've used, this directory (/usr/lib/system/) is empty. Why? Where are those files?
Another thing I'd like to know: is it possible to locate a library in memory? From what offset to what offset is the library mapped into memory? I think I know how to find the beginning, but I have no idea how to find the end of the library. To find the beginning, I'd use the address returned by _dyld_get_image_header. Is that correct?
Last question: I wanted to load a dynamic lib system-wide, so I assumed I could use DYLD_INSERT_LIBRARIES to do just that. However, every binary I try to execute after inserting my lib crashes and produces a bus error! Did I forget something, or is it the dynamic library that causes the crash?
The libraries are located at:
/System/Library/Caches/com.apple.dyld/dyld_shared_cache_armv6 (_armv7)
This is a big file where all the individual libraries have been joined into one large one.
See http://iphonedevwiki.net/index.php/MobileSubstrate for hooking on jailbroken devices.
Yes, one can determine the position of a dylib in memory, even on non-jailbroken devices.
Parse the LC_SEGMENT load command of the library's __TEXT segment; from it you get the base address (vmaddr) and the size (vmsize) of the __TEXT segment. Then query for the vm slide and add it to the base address of __TEXT.
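A sketch of that walk for 32-bit images (for 64-bit images you would use mach_header_64, segment_command_64, and LC_SEGMENT_64 instead); everything below comes from mach-o/dyld.h and mach-o/loader.h:

#include <stdio.h>
#include <string.h>
#include <mach-o/dyld.h>
#include <mach-o/loader.h>

/* Print the in-memory start and end of the __TEXT segment of image i. */
static void print_text_range(uint32_t i) {
    const struct mach_header *mh = _dyld_get_image_header(i);
    intptr_t slide = _dyld_get_image_vmaddr_slide(i);
    const struct load_command *lc =
        (const struct load_command *)((const char *)mh + sizeof(*mh));
    uint32_t n;

    for (n = 0; n < mh->ncmds; ++n) {
        if (lc->cmd == LC_SEGMENT) {
            const struct segment_command *seg =
                (const struct segment_command *)lc;
            if (strcmp(seg->segname, SEG_TEXT) == 0) {
                uintptr_t start = (uintptr_t)seg->vmaddr + (uintptr_t)slide;
                printf("%s: __TEXT %p .. %p\n", _dyld_get_image_name(i),
                       (void *)start, (void *)(start + seg->vmsize));
            }
        }
        lc = (const struct load_command *)((const char *)lc + lc->cmdsize);
    }
}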
A detailed description of the mach-o file format can be found here:
https://developer.apple.com/library/mac/#documentation/DeveloperTools/Conceptual/MachORuntime/Reference/reference.html. Pay special attention to the "segment_command" structure.