memcpy from user space to kernel space - virtual

Is it possible to copy data from a user-space address into kernel space? If so, who handles the VMA translation so that there are no page faults? Would a plain memcpy be guaranteed safe?

copy_from_user
should do what you want
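A minimal sketch of how copy_from_user is typically used in an ioctl-style handler. The struct and function names here are hypothetical, not from the question; copy_from_user validates the user pointer, faults pages in as needed, and returns the number of bytes it could NOT copy:

```c
/* Hypothetical kernel-module sketch; my_msg and my_ioctl_write are
 * illustrative names, not from the original question. */
#include <linux/uaccess.h>   /* copy_from_user */
#include <linux/errno.h>

struct my_msg {
	size_t len;
	char data[256];
};

static long my_ioctl_write(void __user *arg)
{
	struct my_msg msg;

	/* copy_from_user walks the user VMA and faults in pages as
	 * needed; a nonzero return means some bytes were not copied. */
	if (copy_from_user(&msg, arg, sizeof(msg)))
		return -EFAULT;

	if (msg.len > sizeof(msg.data))
		return -EINVAL;

	/* msg now lives in kernel memory and is safe to use. */
	return 0;
}
```

A bare memcpy gives you none of this: if the user page is not present or the pointer is bogus, you get a kernel oops instead of a clean -EFAULT.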

Related

How to efficiently overwrite the memory region on GPU?

I allocate a data block on the GPU, and I have an algorithm that generates new data to replace the old one; the new buffer has the same size. One solution is to bring the old data back to the CPU and erase it there, but I think that's highly inefficient and very slow. Is it possible to overwrite the old elements with the new data at the same location?
If your kernels accept a pointer that's pointing to some buffer region, you may be able to just pass the original data pointer to that kernel, causing your input data to be overwritten by the results of the kernel.
Or if you're working with an algorithm that requires using a buffer, you could use cudaMemcpy to copy the results stored in the buffer to the region of memory holding your input data, overwriting it in the process.
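The second approach can be sketched as follows, assuming the results already live in a separate device scratch buffer (names are illustrative):

```cuda
// Hedged sketch: overwrite a device buffer in place without a host round trip.
int *d_data, *d_scratch;
size_t bytes = n * sizeof(int);
cudaMalloc((void **)&d_data, bytes);
cudaMalloc((void **)&d_scratch, bytes);

// ... a kernel writes its results into d_scratch ...

// Device-to-device copy replaces the old contents of d_data directly.
cudaMemcpy(d_data, d_scratch, bytes, cudaMemcpyDeviceToDevice);
```

The cudaMemcpyDeviceToDevice direction keeps the transfer entirely on the GPU, which avoids the slow PCIe round trip the question worries about.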

Can a userspace process kfree() memory with GFP_USER?

I have a kernel module that handles IOCTL calls from userspace. One of the calls needs to return a variable length buffer from the kernel into userspace. From the module, I can kmalloc( ..., GFP_USER) a buffer for the userspace process to use. But, my question is, can this buffer be free'd from userspace or does it need to be free'd from kernel space?
Alternatively, is there a better way to handle data transfer with variable length data?
No, user space can't free kernel memory. Your module would have to offer another call/ioctl to let user space tell your kernel code to free the memory. You would also have to track your allocations so you can free them when the user-space process exits, or you will leak memory. Also, kernel memory is not swappable: if user space makes you allocate again and again, it could run the kernel out of memory, so you have to guard against that, too.
The easier method is to just let user space offer the buffer from its own memory. Include a maximum length argument in the call so that you won't write more than user space expects and return partial data or an error if the size is too small, as appropriate.
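The caller-supplied-buffer contract described above can be sketched in plain userspace C (the function name and signature are hypothetical, not a real driver API):

```c
#include <sys/types.h>
#include <string.h>
#include <errno.h>

/* Illustration of the recommended contract: the caller supplies the
 * buffer and its maximum length; the producer writes at most that
 * much and reports how many bytes it actually delivered. */
ssize_t fill_buffer(char *dst, size_t maxlen,
                    const char *src, size_t srclen)
{
	if (dst == NULL)
		return -EINVAL;
	/* Truncate rather than overrun: return partial data if the
	 * caller's buffer is smaller than the available data. */
	size_t n = srclen < maxlen ? srclen : maxlen;
	memcpy(dst, src, n);
	return (ssize_t)n;
}
```

In the real module the write would go through copy_to_user instead of memcpy, but the length-capping logic is the same.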
GFP_USER means the allocation is kernel-space memory that you may allow user space to access (it serves as a marker for shared kernel/user pages). Note that the allocation can sleep/block, so it may only be used in process context.
However, memory allocated in kernel space must always be freed in kernel space, and vice versa for user space.

Calculate hash of the BIOS

I've read that the BIOS is mapped into memory at segment f000. At f000:fff0 I see a JMP to f000:e05b. At e05b, another jump. So the code jumps around many times within the f000 segment. So, the questions:
1) If I calculate a hash of the segment f000:0000 - f000:ffff, will I get a hash of the BIOS code?
2) Are all bytes of that segment constant across a warm reboot?
Not necessarily. The BIOS ROM may map to a larger or smaller area than that (though some early BIOSes did map to exactly that memory range).
Probably, but again, not necessarily.
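For what it's worth, once you have obtained the 64 KiB range (on Linux, for example, by mmap-ing /dev/mem at physical 0xF0000, a root-only and OS-specific step not shown here), hashing it is straightforward. A sketch using 32-bit FNV-1a, chosen only for brevity; a real integrity check would use a cryptographic hash:

```c
#include <stdint.h>
#include <stddef.h>

/* 32-bit FNV-1a over an arbitrary byte range. Pass the mapped
 * F000:0000-F000:FFFF region (64 KiB) to hash the candidate
 * BIOS area; remember the answer's caveat that the ROM may map
 * to a larger or smaller range than this. */
uint32_t fnv1a(const uint8_t *p, size_t len)
{
	uint32_t h = 2166136261u;          /* FNV offset basis */
	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;            /* FNV prime */
	}
	return h;
}
```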

Is it possible to use cudaMemcpy with src and dest as different types?

I'm using a Tesla, and for the first time, I'm running low on CPU memory instead of GPU memory! Hence, I thought I could cut the size of my host memory by switching all integers to short (all my values are below 255).
However, I want my device memory to use integers, since the memory access is faster. So is there a way to copy my host memory (in short) to my device global memory (in int)? I guess this won't work:
short *buf_h = new short[100];
int *buf_d = NULL;
cudaMalloc((void **)&buf_d, 100*sizeof(int));
cudaMemcpy( buf_d, buf_h, 100*sizeof(short), cudaMemcpyHostToDevice );
Any ideas? Thanks!
There isn't really a way to do what you are asking directly. The CUDA API doesn't support "smart copying" with padding or alignment, or "deep copying" of nested pointers, or anything like that. Memory transfers require linear host and device memory, and alignment must be the same between source and destination memory.
Having said that, one approach to circumvent this restriction would be to copy the host short data to an allocation of short2 on the device. Your device code can retrieve a short2 containing two packed shorts, extract the value it needs and then cast the value to int. This will give the code 32 bit memory transactions per thread, allowing for memory coalescing, and (if you are using Fermi GPUs) good L1 cache hit rates, because adjacent threads within a block would be reading the same 32 bit word. On non Fermi GPUs, you could probably use a shared memory scheme to efficiently retrieve all the values for a block using coalesced reads.
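The short2 trick can be illustrated on the host side in plain C (this is not device code; it assumes a little-endian machine such as x86, and the function name is illustrative). Each aligned 32-bit load covers two packed 16-bit elements, and the consumer sign-extends the half it needs:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Host-side illustration of the short2 scheme: one 32-bit read
 * brings in elements 2k and 2k+1; the caller widens the one it
 * wants to int. On the device this read would be a coalesced
 * 32-bit transaction per thread. Assumes little-endian layout. */
int extract_as_int(const int16_t *buf, size_t idx)
{
	uint32_t word;
	/* one aligned 32-bit load covering the pair containing idx */
	memcpy(&word, &buf[idx & ~(size_t)1], sizeof(word));
	int16_t half = (int16_t)((idx & 1) ? word >> 16 : word & 0xFFFF);
	return (int)half;  /* sign-extend to int, as the kernel would */
}
```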

address space and byte addressability

A microprocessor is byte-addressable with a 24-bit address bus and a 16-bit data bus, and one word contains two bytes. I was asked a question regarding attaching peripherals, adding memory, and address space, and there are a few general concepts I don't see why they work.
Why is it that to calculate the address space you use the address bus not the data bus? Is the address space a function of the address bus or does it have to do with the microprocessor? How is it relevant that one word contains two bytes?
Why is it that to calculate the address space you use the address bus not the data bus?
Because it's the address bits that go out to the memory subsystem to tell them which memory location you want to read or write. The data bits just carry the data being read or written.
Is the address space a function of the address bus or does it have to do with the microprocessor?
Yes, the address space is a function of the address bus though there are tricks you can use to expand how much memory you can use.
An example of that is bank switching which gives you more accessible memory but no more address space (multiple blocks of memory co-exist at the same address, one at a time).
Another example is shown below where you can effectively double the usable memory, provided you're willing to only read and write words.
How is it relevant that one word contains two bytes?
The data bus size generally dictates the size of a memory cell. Larger memory cells can mean you can have more memory available to you but not more memory cells.
With your example, assuming you can only access words, you could get 16 megawords which is 32 megabytes.
This depends, of course, on how the memory is put together. It may be that you are able to access memory on individual byte boundaries (e.g., bytes 0/1 or 1/2 or 2/3) rather than just word boundaries, which would mean you don't actually get the full 32MB but only 16MB (plus maybe one extra byte when you read the word at address FFFFFF).
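The arithmetic behind the answer can be checked directly: address-space size is a function of the address bits, and the data-bus width only scales the capacity per addressable cell:

```c
#include <stdint.h>

/* Capacity in bytes for a given address-bus width and cell size:
 * 2^addr_bits addressable cells, each bytes_per_cell wide.
 * With 24 address bits: 16M byte-cells = 16 MB, but 16M two-byte
 * word-cells = 32 MB (word access only). */
uint64_t address_space_bytes(unsigned addr_bits, unsigned bytes_per_cell)
{
	return ((uint64_t)1 << addr_bits) * bytes_per_cell;
}
```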