cudaMemcpy allows programmers to explicitly specify the direction of memory transfer.
Is there any advantage to manually specifying the memory transfer direction (cudaMemcpyDeviceToHost/cudaMemcpyHostToDevice/cudaMemcpyDeviceToDevice) instead of letting CUDA automatically infer it (cudaMemcpyDefault) from the pointer values?
tl;dr: Almost certainly no advantage.
cudaMemcpyDefault was added, IIRC, when GPUs became capable of easily identifying the memory space by inspecting the address ("unified virtual addressing"). Before that, you had to specify the direction. See, for example, the CUDA 3 documentation: look for cudaMemcpyKind in the API reference and you'll find no Default, just H2H, H2D, D2H and D2D.
When this changed, I guess it made sense to NVIDIA not to overload the function or name it differently, but just to add a new constant value for the new capability.
I'm not 100% certain there's no difference; it just seems very reasonable, and speaking from anecdotal personal experience, I've not seen any advantage or difference. The copying is certainly not faster.
From the docs of cudaMemcpy():
[...] Passing cudaMemcpyDefault is recommended, in which case the type of transfer is inferred from the pointer values. However, cudaMemcpyDefault is only allowed on systems that support unified virtual addressing. [...]
Therefore, if you have a GPU that supports unified virtual addressing, use cudaMemcpyDefault; otherwise you have no option but to be explicit.
You can query whether your system supports it with cudaGetDeviceProperties(), checking the device property cudaDeviceProp::unifiedAddressing.
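A minimal sketch putting both together, assuming device 0 and omitting error checking for brevity:

    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);   /* query device 0 */

        float host_buf[256];
        float *dev_buf;
        memset(host_buf, 0, sizeof(host_buf));
        cudaMalloc((void **)&dev_buf, sizeof(host_buf));

        if (prop.unifiedAddressing) {
            /* UVA available: the direction is inferred from the pointers. */
            cudaMemcpy(dev_buf, host_buf, sizeof(host_buf), cudaMemcpyDefault);
        } else {
            /* No UVA: the direction must be stated explicitly. */
            cudaMemcpy(dev_buf, host_buf, sizeof(host_buf), cudaMemcpyHostToDevice);
        }

        cudaFree(dev_buf);
        printf("unifiedAddressing = %d\n", prop.unifiedAddressing);
        return 0;
    }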
Communication in AUTOSAR can take two major paths: Com and LdCom. I understand that LdCom is a more efficient version of Com (achieved by removing most COM features). But are there any general rules or criteria that can help decide which one to stick with?
If you are reimplementing all the features of Com within your SWC(s), then you are definitely using LdCom for the wrong reasons.
And if your SWCs now also depend too much on the definition of the network (e.g. you have to change your SWCs whenever the network description changes), then you are also using LdCom for the wrong reasons.
I now understand how virtual memory works and what is responsible for setting up this virtual memory. However, a few days ago I encountered memory segmentation, which splits the address space into segments like data and text. I cannot find any clear, unambiguous resources (at least to me) that explain memory segmentation. For instance, I would like to know:
What is responsible for splitting up address spaces into segments?
How exactly does it work? Like how are segments translated to physical addresses, and what checks if an address within a certain segment has been accessed?
I have found this wiki article but it does not really answer such questions.
The term "segment" appears in at least two distinct memory contexts.
In ye olde days, segmentation was a method used for memory protection. Intel chips continued the use of segments for decades after they were obsolete. Intel finally dropped the use of segments in 64-bit mode, but they still exist in vestigial form, and they still exist in 32-bit mode.
That is the type of "segmentation" described in the Wikipedia link.
The "code" and "data"-type segmentation is something entirely different. Another term for this is "program section."
When you link your code, the linker usually groups memory with the same attributes into "program sections" (aka "segments"). Typically you will have memory that:
is read-only and executable (code)
is read/write and initialized to zero
is read/write and initialized to specified values
is read-only (constant data)
In order to control the grouping of related memory, linkers generally use named segments/program sections. A linker may, by default, create a program section/segment called "Code" and place all the executable code in that segment. It may create, by default, a segment called "Data" and place the read-only data in that segment.
Powerful linkers allow the programmer to override these defaults. Some assembly languages and systems languages let you specify program sections directly (see the sketch below).
"Segments" in this context only exist only in the linking process. There is no area in memory marked "Code" or "Data" (unless you are using the olde Intel system).
What is responsible for splitting up address spaces into segments?
The address space is not split up into segments of this second type on modern systems (i.e. those designed after 1970 and not from Intel). Some confusing books use this as a pedagogical concept in diagrams, but a process can (and usually does) have code pages interspersed with data pages.
Like how are segments translated to physical addresses, and what checks if an address within a certain segment has been accessed?
That question relates to the first use of the term "segment" described at the top. The translation is done using hardware registers.
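As a concrete illustration, the classic x86 real-mode calculation can be written out in C; protected mode instead looks the segment's base address up in a descriptor table selected by the segment register:

    #include <stdint.h>
    #include <stdio.h>

    /* Real mode: physical address = (segment << 4) + offset.
       The segment register supplies the upper bits directly; no tables. */
    static uint32_t real_mode_address(uint16_t segment, uint16_t offset)
    {
        return ((uint32_t)segment << 4) + offset;
    }

    int main(void)
    {
        /* 0x1234:0x0010 -> 0x12350 */
        printf("0x%05X\n", (unsigned)real_mode_address(0x1234, 0x0010));
        return 0;
    }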
Well, to be honest, I would rather you consult books that cover the basics and the material thoroughly than read articles, because articles tend to be specific and pitched above the basic level (at least to me).
Every term in your question is a separate topic, each very well described in the reference below. If you really want answers and clear concepts, then you should go through this:
Read Abraham Silberschatz's "Operating System Concepts".
Chapter 8: Memory Management
Subtopics: Paging (basic method and hardware support), Segmentation
Can I use more than one stack in a microprocessor?
And if I can, how can I program with them?
Sure you can. Several CPU architectures have multiple stack pointers, even lowly 8-bit processors such as the M6809. And even if the concept is not implemented in the CPU hardware, you can easily create multiple stacks in software. A stack pointer is basically just an index register, so you could (for example) use the IX and IY registers of the Z80 to implement multiple stacks.
If your microprocessor has more than one hardware stack, then yes, you can. You would have to write assembler though, since C/C++ implementations generally don't make use of multiple stacks.
It would be easier to help if you could say exactly what architecture you're talking about.
As for the how of it: generally there is a special register or memory location that points to the stack. Switching to another stack is as simple as changing that value. This is all processor- and architecture-dependent, so it depends on the one you are using.
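As a software illustration of the idea, here is a minimal sketch in C (names made up), where each stack is just an array plus an index serving as its stack pointer, and "switching" stacks means using a different one:

    #include <stdio.h>

    #define STACK_SIZE 64

    struct stack {
        int data[STACK_SIZE];
        int top;                 /* index of the next free slot */
    };

    static void push(struct stack *s, int v) { s->data[s->top++] = v; }
    static int  pop(struct stack *s)         { return s->data[--s->top]; }

    int main(void)
    {
        struct stack a = { .top = 0 };
        struct stack b = { .top = 0 };

        push(&a, 1);    /* operate on one stack...                    */
        push(&b, 99);   /* ..."switching" is just using the other one */

        printf("%d %d\n", pop(&a), pop(&b));
        return 0;
    }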
On some platforms, the stack used for return addresses is entirely separate from the one used for parameter passing. Indeed, on some platforms, C compilers don't permit recursion and don't use any stack for parameter passing. Frankly, I like such designs, since they minimize the likelihood of stack problems causing errant program behavior.
Most of the literature on virtual memory points out that, as an application developer, understanding virtual memory can help me harness its powerful capabilities. I have been involved in developing applications on Linux for some time, but I haven't cared about virtual memory intricacies while coding. Am I missing something? If so, please shed some light on how I can leverage the workings of virtual memory. Otherwise, let me know if I am not making sense with the question!
Well, the concept is pretty simple actually. I won't repeat it here, but you should pick up any book on OS design and it will be explained there. I recommend "Operating System Concepts" by Silberschatz and Galvin; it's what I had to use at university, and it's good.
A couple of things that virtual memory knowledge might give you:
Learning to allocate memory on page boundaries to avoid waste (applies only to virtual memory allocations, not the usual heap/stack memory);
Locking some pages in RAM so they don't get swapped out to disk;
Guard pages;
Reserving an address range and committing actual memory later (a sketch of this and the page-locking item follows below);
Perhaps using the NX (non-executable) bit to increase security, but I'm not sure on this one;
PAE for accessing more than 4 GB on a 32-bit system.
Still, all of these things have uses only in quite specific scenarios. Indeed, 99% of applications need not concern themselves with this.
Added: That said, it's definitely good to know all these things, so that you can identify such scenarios when they arise. Just beware: with power comes responsibility.
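To make two of the items above concrete, here is a minimal POSIX sketch, assuming Linux (MAP_ANONYMOUS) and sufficient privileges for mlock(); error handling is kept minimal:

    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        size_t reserve = 16 * page;

        /* Reserve an address range without committing usable memory. */
        void *base = mmap(NULL, reserve, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        /* Later, commit just the first page by making it accessible. */
        if (mprotect(base, page, PROT_READ | PROT_WRITE) != 0) {
            perror("mprotect"); return 1;
        }

        /* Pin that page in RAM so it cannot be swapped out. */
        if (mlock(base, page) != 0)
            perror("mlock");     /* may require privileges or rlimits */

        ((char *)base)[0] = 42;  /* the committed page is now usable */

        munlock(base, page);
        munmap(base, reserve);
        return 0;
    }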
It's a bit of a vague question.
The chief way you can use virtual memory is through memory-mapped files. See the mmap() man page for more details, and the sketch after the lists below.
You are probably using them implicitly anyway, as any dynamic library is implemented as a mapped file, and many database libraries use them too.
The interface to use mapped files from higher level languages is often quite inconvenient, which makes them less useful.
The chief benefits of using mapped files are:
No system call overhead when accessing parts of the file (this might actually be a disadvantage, as a page fault probably has as much overhead anyway, if it happens)
No need to copy data from OS buffers to application buffers - this can improve performance
Ability to share memory between processes.
Some drawbacks are:
32-bit machines can run out of address space easily
Tricky to handle file extending correctly
No easy way to see how many / which pages are currently resident (there may be some ways however)
Not good for real-time applications, as a page fault may cause an IO request, which blocks the thread (the file can be locked in memory, however, but only if there is enough physical memory).
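To tie the above together, here is a minimal sketch of reading a file through mmap(); "example.txt" is a placeholder filename:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("example.txt", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return 1; }

        /* The file's contents appear directly in the address space; pages
           are faulted in on first access instead of copied via read(). */
        char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        fwrite(p, 1, (size_t)st.st_size, stdout);  /* use it like an array */

        munmap(p, (size_t)st.st_size);
        close(fd);
        return 0;
    }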
In maybe 9 out of 10 cases you need not worry about virtual memory management; that's the job of the kernel. Only in some highly specialized applications do you need to tweak things.
I know of one article that talks about computer memory management with an emphasis on Linux [ http://lwn.net/Articles/250967 ]. Hope this helps.
For most applications today, the programmer can remain unaware of the workings of computer memory without any harm. But sometimes, for example when you want to reduce the footprint of your program, you do end up having to manipulate memory yourself. In such situations, knowing how memory is designed to work is essential.
In other words, although you can indeed survive without it, learning about virtual memory will only make you a better programmer.
And I would think the Wikipedia article can be a good start.
If you are concerned with performance -- understanding memory hierarchy is important.
For small data sets which are fully contained in physical memory you need to be concerned with caching (accessing memory from the cache is much faster).
When dealing with large data sets -- which may be paged out due to lack of physical memory -- you need to be careful to keep your access patterns localized.
For example, if you declare a matrix in C (int a[rows][cols]), it is laid out by rows. Thus, when scanning the matrix, you should scan by rows rather than by columns; otherwise you will be paging the same data in and out many times, as the sketch below illustrates.
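A sketch of the two access patterns (the names sum_by_rows/sum_by_columns are just for illustration):

    #include <stdio.h>

    #define ROWS 1024
    #define COLS 1024
    static int a[ROWS][COLS];

    /* Good: walks memory sequentially, one cache line / page at a time. */
    static long sum_by_rows(void)
    {
        long s = 0;
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                s += a[i][j];
        return s;
    }

    /* Bad: strides by a whole row (COLS * sizeof(int)) on every step,
       touching many more pages per element processed. */
    static long sum_by_columns(void)
    {
        long s = 0;
        for (int j = 0; j < COLS; j++)
            for (int i = 0; i < ROWS; i++)
                s += a[i][j];
        return s;
    }

    int main(void)
    {
        printf("%ld %ld\n", sum_by_rows(), sum_by_columns());
        return 0;
    }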
Another issue is the difference between dirty and clean data held in memory. Clean data is information loaded from a file that has not been modified by the program. The OS may page out clean data (perhaps depending on how it was loaded) without writing it to disk. Dirty pages must first be written to the swap file.
What is "tagged memory" and how does it help in reducing program size?
You may be referring to a tagged union, or more specifically a hardware implementation like the tagged architecture used in LISP machines. Basically a method for storing data with type information.
In a LISP machine, this was done in-memory by using a longer word length and using some of the extra bits to store type information. Handling and checking of tags was done implicitly in hardware.
For a type-safe C++ implementation, see boost::variant.
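As a rough sketch of the idea in plain C (here the tag is checked in software, whereas a LISP machine checked it in hardware):

    #include <stdio.h>

    enum tag { TAG_INT, TAG_DOUBLE, TAG_STRING };

    struct value {
        enum tag tag;            /* records which union member is valid */
        union {
            int         i;
            double      d;
            const char *s;
        } u;
    };

    static void print_value(const struct value *v)
    {
        switch (v->tag) {        /* always consult the tag before the data */
        case TAG_INT:    printf("%d\n", v->u.i); break;
        case TAG_DOUBLE: printf("%f\n", v->u.d); break;
        case TAG_STRING: printf("%s\n", v->u.s); break;
        }
    }

    int main(void)
    {
        struct value v = { .tag = TAG_INT, .u.i = 42 };
        print_value(&v);
        return 0;
    }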
Not sure, but it is possible that you are referring to garbage collection, which is the process of automatically disposing of no longer used objects created when running a program.
"Tagged memory" can be a synonym for mark-and-sweep, which is the most basic way to implement garbage collection.
If this is all wrong, please edit your question to clarify.
The Windows DDK makes use of "pool tags" when allocating memory out of the kernel page pool. It costs 4 bytes of memory per allocation, but allows you to label (i.e. tag) portions of kernel memory, which might help with debugging and detecting memory leaks.
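As a hedged sketch of what that looks like in kernel-mode C (the tag value 'gTyM' is made up for illustration; pool tags are conventionally written reversed so tools display them as "MyTg"):

    #include <ntddk.h>

    #define MY_TAG 'gTyM'   /* hypothetical tag; displayed as "MyTg" by tools */

    VOID ExampleAllocation(VOID)
    {
        PVOID buf = ExAllocatePoolWithTag(NonPagedPool, 256, MY_TAG);
        if (buf != NULL) {
            /* ... use buf ... */
            ExFreePoolWithTag(buf, MY_TAG);  /* tag must match the allocation */
        }
    }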
BTW I don't see how anything called "tagged memory" could reduce program code size. It sounds like extra work, which translates to "more code" and "bigger program." Maybe it's meant to reduce the memory footprint somehow?
Here's a more technical description going into the implementation details of how this is used for garbage collection. You may also want to check out the Wikipedia article about tagged pointers.
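For a flavor of the tagged-pointer idea, here is a minimal sketch in C, assuming only that malloc() returns pointers aligned to at least 4 bytes; it illustrates the general technique, not any particular collector's scheme:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define TAG_MASK 0x3u    /* low two bits are free on >= 4-byte alignment */

    static void *tag_ptr(void *p, unsigned tag)
    {
        return (void *)((uintptr_t)p | (tag & TAG_MASK));
    }

    static unsigned get_tag(void *p)
    {
        return (unsigned)((uintptr_t)p & TAG_MASK);
    }

    static void *strip_tag(void *p)
    {
        return (void *)((uintptr_t)p & ~(uintptr_t)TAG_MASK);
    }

    int main(void)
    {
        int *n = malloc(sizeof *n);
        if (n == NULL) return 1;
        *n = 7;

        void *tagged = tag_ptr(n, 2);   /* say tag 2 means "integer" here */
        printf("tag=%u value=%d\n", get_tag(tagged), *(int *)strip_tag(tagged));

        free(n);
        return 0;
    }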