OpenCL 1.2 Mem Object flags - memory

So reading through the OpenCL 1.2 reference pages, I noticed a difference in clCreateBuffer.
There are three new cl_mem_flags that pertain to host usage: CL_MEM_HOST_READ_ONLY, CL_MEM_HOST_WRITE_ONLY, and CL_MEM_HOST_NO_ACCESS. I'm a bit confused about how these differ from the cl_mem_flags available in earlier versions. Wouldn't CL_MEM_READ_ONLY and CL_MEM_WRITE_ONLY accomplish the same thing?
Also, do these flags affect how you call functions such as clEnqueueRead/Write/Map/UnmapBuffer?

These new flags are basically the "host-counterparts" of the original flags.
For example, consider the CL_MEM_READ_ONLY flag:
This flag specifies that the memory object is a read-only memory object when used inside a kernel.
In contrast to that, for CL_MEM_HOST_READ_ONLY:
This flag specifies that the host will only read the memory object
(from the clCreateBuffer documentation, emphasis by me)
They thus allow a more fine-grained specification of what you will do with the memory:
Host read, Kernel write
Host write, Kernel write
Host read+write, Kernel read
....
This enables the OpenCL implementation to perform optimizations under the hood, quoting from the above mentioned documentation:
This flag specifies that the host will only write to the memory object (using OpenCL APIs that enqueue a write or a map for write). This can be used to optimize write access from the host (e.g. enable write combined allocations for memory objects for devices that communicate with the host over a system bus such as PCIe).
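Putting the two kinds of flags together, a buffer that the kernel only reads and that the host only writes could be created roughly like this (a minimal sketch; the context, size and error handling are assumed to exist in your code):

#include <CL/cl.h>

/* Sketch only: `context` and `size` come from the surrounding code. */
cl_int err;
cl_mem buf = clCreateBuffer(context,
                            CL_MEM_READ_ONLY | CL_MEM_HOST_WRITE_ONLY,
                            size,   /* size in bytes            */
                            NULL,   /* no host pointer supplied */
                            &err);
/* The host may now fill `buf` via clEnqueueWriteBuffer or a map-for-write,
   while kernels may only read from it. */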
Of course this will affect how the buffer may be used. For example, if you create a host-read-only buffer with CL_MEM_HOST_READ_ONLY, then an attempt to write to this buffer with clEnqueueWriteBuffer will fail - again, referring to the documentation:
Errors
clEnqueueWriteBuffer returns CL_SUCCESS if the function is executed successfully. Otherwise, it returns one of the following errors:
...
CL_INVALID_OPERATION if clEnqueueWriteBuffer is called on buffer which has been created with CL_MEM_HOST_READ_ONLY

Is it possible to modify command line parameters programmatically in Delphi?

I'm working with some legacy code that uses ParamCount() and ParamStr() in various places, and now I need to provide different values than those that were actually passed on the command line.
The simplest solution would be to programmatically add/modify the existing command line parameters, since the alternative is to change A LOT of the legacy code to accept function parameters rather than directly accessing ParamCount() and ParamStr().
Is this possible? Can I somehow add/modify parameters from within the program itself so that ParamCount() and ParamStr() will pick up my new/modified parameters?
Edit, clarification of the legacy code:
The legacy code makes some database requests, using WHERE arguments taken from the command line (after sanitizing them). This is fine in 99.9% of all cases, as these arguments are fundamental to the purpose of the legacy units. However, I'm working on a new feature that "breaks the mold", where one of these fundamental arguments is unknown and needs to be fetched from the database and provided internally.
Yes, I could search and replace, but my objective here is to not touch the legacy code, as it's in a unit that is shared among many different programs.
Restarting the program and/or executing a new copy of it from itself is one solution, but seems a bit risky and cumbersome. This is a production program that executes on a server and needs to be as simple and robust as possible.
Is this possible? Can I somehow add/modify parameters from within the program itself so that ParamCount() and ParamStr() will pick up my new/modified parameters?
Technically yes, but it is not something that the RTL itself exposes functionality for, so you will have to implement it manually.
Since you are working with legacy code, I'm assuming you are working on Windows only, in which case ParamStr() and ParamCount() parse the string returned by the Win32 GetCommandLine() function in kernel32.dll.
So, one option is to simply hook the GetCommandLine() function itself at runtime, such as with Microsoft's Detours or another similar library. Then your hook can return whatever string you want 1.
1: for that matter, you could just hook ParamCount() and ParamStr() instead, and make them return whatever you want.
Another option - which requires messing around with lower-level memory that you don't own, and which I don't advise - is to obtain a pointer to the PEB structure of the calling process. You can obtain that pointer using NtQueryInformationProcess(ProcessBasicInformation). The PEB contains a ProcessParameters field, which is a pointer to an RTL_USER_PROCESS_PARAMETERS struct, which contains the actual CommandLine string as a UNICODE_STRING struct. If your altered string is less than or equal to the length of the original command line string, you can just alter the content of ProcessParameters.CommandLine in place. Otherwise, you would have to allocate new memory to hold your altered string and then update ProcessParameters.CommandLine to point at that new memory.
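To illustrate that sequence, here is a rough sketch in C (purely to show the Win32/NTDLL calls involved; a Delphi version would import and call the same functions, and this only handles the in-place case):

#include <windows.h>
#include <winternl.h>
#include <wchar.h>
#include <string.h>

/* Sketch: overwrite the PEB command line in place. Only valid when the
   new string fits in the existing buffer; everything here is illustrative. */
typedef NTSTATUS (NTAPI *NtQIP_t)(HANDLE, PROCESSINFOCLASS, PVOID, ULONG, PULONG);

void ReplaceCommandLineInPlace(const wchar_t *newCmd)
{
    PROCESS_BASIC_INFORMATION pbi;
    ULONG len = 0;
    NtQIP_t NtQIP = (NtQIP_t)GetProcAddress(GetModuleHandleW(L"ntdll.dll"),
                                            "NtQueryInformationProcess");
    if (!NtQIP || NtQIP(GetCurrentProcess(), ProcessBasicInformation,
                        &pbi, sizeof pbi, &len) != 0)
        return;

    UNICODE_STRING *cmd = &pbi.PebBaseAddress->ProcessParameters->CommandLine;
    USHORT newBytes = (USHORT)(wcslen(newCmd) * sizeof(wchar_t));
    if (newBytes <= cmd->MaximumLength) {
        memcpy(cmd->Buffer, newCmd, newBytes);   /* alter the content in place */
        cmd->Length = newBytes;
    }
}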

Get lua state inside lua

I would like to get the Lua state from inside Lua so I can pass it to an external program that cannot be hooked up using FFI or a DLL. I just need a pointer to it and the ability to share it (shared memory across program boundaries).
That, or I can create a Lua state in my program and then pass that, so I would simply need to set the Lua state to it inside Lua (and it would have to work with shared memory).
I've thought about sharing data using json but ideally I would like to directly access objects.
Lua is pretty good about avoiding hidden heap allocations and global pointers to allocated memory: lua_newstate takes an allocator function as a parameter. The provided function will be used to allocate and deallocate all memory associated with the lua_State object, including the memory behind the pointer returned by lua_newstate itself.
So hypothetically, you could provide an allocator function that allocates/deallocates interprocess shared memory. And then, you can just pass the lua_State to some other process and access it.
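For concreteness, a minimal allocator hook looks like this (here it just wraps realloc/free; the idea above would be to replace those calls with ones that carve blocks out of an interprocess shared-memory region):

#include <stdlib.h>
#include <lua.h>

/* Sketch: the lua_Alloc callback handed to lua_newstate. Replace realloc/free
   with shared-memory allocation routines to place the whole state in shared
   memory, as described above. */
static void *my_alloc(void *ud, void *ptr, size_t osize, size_t nsize)
{
    (void)ud; (void)osize;
    if (nsize == 0) {            /* Lua is asking us to free the block */
        free(ptr);
        return NULL;
    }
    return realloc(ptr, nsize);  /* allocate, grow, or shrink the block */
}

lua_State *create_state(void)
{
    return lua_newstate(my_alloc, NULL);
}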
First, you clearly cannot do this "from inside lua"; that kind of low-level thing just ain't happening. You cannot access the lua_State object from within Lua. You must be in control of the lua_State creation process for that to be a possibility. So we're talking about C (equivalent) code here, not in-Lua code.
Now, you can expose a C function to Lua which returns a light userdata that just so happens to be the exact lua_State* in question. But Lua can't really do much with light userdata other than pass it to other C function APIs.
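Such a function is only a couple of lines (a sketch; the value is only useful to other C code that receives the pointer back):

/* Sketch: exposes the lua_State* itself to Lua as a light userdata. */
static int get_state(lua_State *L)
{
    lua_pushlightuserdata(L, L);
    return 1;
}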
Second, while the Lua system provides a guarantee that it will only allocate memory through the allocator, the system does not provide a guarantee that what you're trying to do will work. It is entirely possible that the Lua implementation does use process global memory, so long as it does it in such a way that different threads can access that global memory without breaking threading guarantees.
Obviously, you can inspect the Lua implementation to see if it does anything of the kind. But my point is that the guarantees are that each independent lua_State will be thread-isolated from each other and that each lua_State will only allocate memory through the given allocator. There is no guarantee that Lua's implementation doesn't have some global storage that it uses for some purpose.
So simply sharing the memory allocated by the Lua state may not be enough.
Also, even if this works, the two processes cannot access the same lua_State object at the same time, just like two threads in the same process cannot access the lua_State at the same time.
The Lua state is not designed to leave the program/thread it is executing in.
Doing a query on a running lua_State could result in a crash, because the state is only guaranteed to be consistent when a Lua call returns or when a C API function is called. During execution, unsynchronized modifications could cause uninitialized memory accesses, or infinite loops due to internal lists being temporarily inconsistent.

Questions about passing strings and other data from UI to LV2 plugin

I need to pass a string from the UI to the plugin. From the eg-sample, it appears that an LV2 atom should be written to an atom port.
If I understand it correctly
First allocate an LV2_Atom_Forge. May that object be allocated on the stack, or does it have to survive after the UI event callback has returned?
Call lv2_atom_forge_set_buffer. How do I know the required size of the buffer? The example sets it to 1024 bytes without explanation. May the buffer be allocated on the stack, or does it have to survive after the UI event callback has returned?
The forge is just a utility for writing atoms. The buffer it writes to is provided by the code that uses it, so the lifetime of the forge itself is irrelevant. Allocating it on the stack is fine, though it may be more convenient to keep one around in your UI struct for use in various places.
You can estimate the space required by knowing the format of atoms as described in the documentation, or simply implement everything with a massive buffer at first and check the size field of the top-level atom in your output. Keep in mind that this will change if you have variable-sized elements like strings in there. The data passed to the UI callback(s) is const and only valid during the call; it must be copied by the receiver if it needs to be available later.
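As an illustration, a UI-side helper along these lines would forge a string atom into a stack buffer and post it to the plugin's atom port (a sketch under assumptions: the write function, controller, port index, URID map and the mapped atom:eventTransfer URID all come from your UI's instantiate() data; error checking is omitted):

#include <string.h>
#include <lv2/lv2plug.in/ns/ext/atom/forge.h>
#include <lv2/lv2plug.in/ns/ext/atom/util.h>
#include <lv2/lv2plug.in/ns/extensions/ui/ui.h>

/* Sketch: forge a string atom on the stack and send it to the plugin. */
static void send_string(LV2UI_Write_Function write_function,
                        LV2UI_Controller     controller,
                        uint32_t             port_index,
                        LV2_URID_Map*        map,
                        LV2_URID             atom_eventTransfer,
                        const char*          text)
{
    uint8_t        buf[1024];   /* generously sized scratch buffer     */
    LV2_Atom_Forge forge;       /* the forge itself can live on the stack */

    lv2_atom_forge_init(&forge, map);
    lv2_atom_forge_set_buffer(&forge, buf, sizeof(buf));
    lv2_atom_forge_string(&forge, text, (uint32_t)strlen(text));

    const LV2_Atom* atom = (const LV2_Atom*)buf;
    write_function(controller, port_index,
                   lv2_atom_total_size(atom), atom_eventTransfer, atom);
}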

Reading ARM CPU registers from LKM

I want to read the values stored in the Link Register or Frame Pointer from a linux kernel module and I am not sure the syntax to use. For context, I've compiled Android goldfish 3.4 kernel and am using insmod to load my module into the kernel.
My knowledge of this area is entirely hobbyist in nature; someone else might know something really stylish that obviates this dangerous and hackish method.
As a philosophical issue, the kernel doesn't tamper with user-mode operation as part of its normal duties. This means you are going to have to tamper with the direct operation of the kernel and potentially cause crashes, corruption and other problematic c-words.
There are two ways to go about doing this. You can go through the syscall entry/exit mechanism: switching a single running thread from running user-mode code to running kernel code in the context of that thread, while slyly replacing its stored registers before it goes back again. The second is the context switch mechanism itself, which switches in kernel mode from running in the context of one thread to another, again replacing relevant stored register material.
The operating theory behind all of this is that each user thread has both a user-mode stack and a kernel-mode stack. When a thread enters the kernel, the current value of the user-mode stack and instruction pointer are saved to the thread's kernel-mode stack, and the CPU switches to the kernel-mode stack. The remaining register values and flags are then also saved to the kernel stack.
At this stage, you can directly read and modify those values prior to the process being returned off the run queue. After this, when your thread returns from the kernel to user-mode, the register values and flags are popped from the kernel-mode stack, then the user-mode stack and instruction pointer values are restored from the modified values on the kernel-mode stack.
The scheduler has an internal mechanism that selects the process to run next, calling switch_to(). As the name implies, this function essentially just switches the kernel stacks - it saves the current value of the stack pointer into the TCB for the current thread (called struct task_struct in Linux), and loads a previously-saved stack pointer from the TCB for the next thread. You can use this to locate the user-mode process in question (possibly requiring a cross-reference of existing kernel-mode process structures).
The way to look at the state of the current userspace process from kernel-side is current_pt_regs() (cf. task_pt_regs() for a specific task). This gets you a pointer to a struct pt_regs, which is the same thing you'd find in the mcontext_t in a signal handler (on ARM at least). The kernel even provides nice accessor macros to make the whole caboodle rather civilised - reading through existing uses in the source should give a good feel of how to do it, but for the sake of completeness here's a trivial example*:
#include <linux/printk.h>   /* pr_info() */
#include <linux/ptrace.h>   /* current_pt_regs(), struct pt_regs */
#include <asm/ptrace.h>     /* ARM_lr register accessor macro */

void func(void)
{
        struct pt_regs *regs = current_pt_regs();

        pr_info("User LR was %p\n", (void *)regs->ARM_lr);
}
You'd have to know the ABI details of the userspace binary to know which, if any, register is being used as a frame pointer, but if there is one it's typically in r11 or r7.
*Code typed directly into browser late at night, usual disclaimers apply, etc.

Executable Ada code on the stack

I've just watched a talk on security considerations for railway systems from last year's 32C3.
At minute 25 the speaker briefly talks about Ada. Specifically he says:
Typical Ada implementations have a mechanism called "(tramp / trunk / ?) lines". And that means it will execute code on [the] stack, which is not very good for C programs. And [...] if you want to link Ada code with C libraries, one of the security mechanisms won't work.
Here is a link (YouTube) to the respective part of the talk. This is the slide in the background. As you see I am unsure about one of the words. Perhaps it's trampolines?
Now my blunt question: Is there any truth in this statement? If so, can anyone elaborate on this mysterious feature of the Ada language and the security mechanism it apparently influences?
Until now I always assumed that code lives in a code segment (aka "text") whereas data (including the stack) is placed in a data segment at a different memory location (as depicted in this graphic). And reading about memory management in Ada suggests it should not be much different there.
While there are ways to circumvent such a layout (see e.g. this "C on stack" question and this "C on heap" answer), I believe modern OSes would commonly prevent such attempts via executable space protection unless the stack is explicitly made executable. - However, for embedded systems it may be still an issue if the code is not kept on ROM (can anyone clarify?).
They're called "trampolines". Here's my understanding of what they're for, although I'm not a GNAT expert, so some of my understanding could be mistaken.
Background: Ada (unlike C) supports nested subprograms. A nested subprogram is able to access the local variables of enclosing subprograms. For example:
procedure Outer is
   Some_Variable : Integer;

   procedure Inner is
   begin
      ...
      Some_Variable := Some_Variable + 1;
      ...
   end Inner;
begin
   ...
end Outer;
Since each procedure has its own stack frame that holds its own local variables, there has to be a way for Inner to get at Outer's stack frame, so that it can access Some_Variable, either when Outer calls Inner, or Outer calls some other nested subprograms that call Inner. A typical implementation is to pass a hidden parameter to Inner, often called a "static link", that points to Outer's stack frame. Now Inner can use that to access Some_Variable.
The fun starts when you use Inner'Access, which yields a value of an access-to-procedure type. This can be used to store the address of Inner in a variable of an access procedure type. Other subprograms can later use that variable to call the procedure indirectly. If you use 'Access, the variable has to be declared inside Outer--you can't store the procedure access in a variable outside Outer, because then someone could call it later, after Outer has exited and its local variables no longer exist. GNAT and other Ada compilers have an 'Unrestricted_Access attribute that gets around this restriction, so that Outer could call some outside subprogram that indirectly calls Inner. But you have to be very careful when using it, because if you call it at the wrong time, the result would be havoc.
Anyway, the problem arises because when Inner'Access is stored in a variable and later used to call Inner indirectly, the hidden parameter with the static link has to be used when calling Inner. So how does the indirect caller know what static link to pass?
One solution (Irvine Compiler's, and probably others) is to make variables of this access type have two values--the procedure address, and the static link (so an access procedure is a "fat pointer", not a simple pointer). Then a call to that procedure will always pass the static link, in addition to other parameters (if any). [In Irvine Compiler's implementation, the static link inside the pointer will be null if it's actually pointing to a global procedure, so that the code knows not to pass the hidden parameter in that case.] The drawback is that this doesn't work when passing the procedure address as a callback parameter to a C routine (something very commonly done in Ada libraries that sit on top of C graphics libraries like gtk). The C library routines don't know how to handle fat pointers like this.
GNAT uses, or at one time used, trampolines to get around this. Basically, when it sees Inner'Unrestricted_Access, it will generate new code (a "trampoline") on the fly. This trampoline calls Inner with the correct static link (the value of the link will be embedded in the code). The access value will then be a thin pointer, just one address, which is the address of the trampoline. Thus, when the C code calls the callback, it calls the trampoline, which then adds the hidden parameter to the parameter list and calls Inner.
This solves the problem, but creates a security issue when the trampoline is generated on the stack.
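The same mechanism can be seen with GCC's C nested-function extension, which is what GCC's own documentation on trampolines describes; the following is only an illustration of the concept in (GNU) C, not the Ada implementation itself:

#include <stdio.h>

/* GNU C only: taking the address of a nested function that uses its
   enclosing frame forces GCC to build a trampoline on the stack, which
   in turn requires the stack to be executable. */
void outer(void)
{
    int some_variable = 0;

    void inner(void)              /* nested function (GNU C extension) */
    {
        some_variable += 1;       /* uses the enclosing stack frame    */
    }

    void (*fp)(void) = inner;     /* taking its address => trampoline  */
    fp();
    printf("some_variable = %d\n", some_variable);
}

int main(void)
{
    outer();
    return 0;
}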
Edit: I erred when referring to GNAT's implementation in the present tense. I last looked at this a few years ago, and I don't really know whether GNAT still does things this way. [Simon has better information about this.] By the way, I do think it's possible to use trampolines but not put them on the stack, which I think would reduce the security issues. When I last investigated this, if I recall correctly, Windows had started preventing code on the stack from being executed, but it also allowed programs to request memory that could be used to dynamically generate code that could be executed.
A presentation in 2003 on Ada for secure applications (D. Wheeler, SigAda 2003) supports this on page 7:
How do Ada and security match poorly?
...
Ada implementations typically need to execute code on the stack ("trampolines", e.g. for access values to nested subprograms).
In other (C) words, for function pointers, where the subprograms are nested within other subprograms.
(Speculating: presumably these function pointers are on the stack so that they go out of scope when you leave the scope of the outer subprogram.)
HOWEVER
A quick search also showed this gcc mailing list message:
[Ada] remove trampolines from front end
dated 2007, which refers to enabling Gnat executables to run on systems with DEP (data execution protection), by eliminating precisely this problematic feature.
This is NOT an authoritative answer but it seems that while "typical" Ada implementations do (or did) so, it may not be the case for at least Gnat this side of 2007, thanks to protection systems on newer hardware driving the necessary changes to the compiler.
Or : definitely true at one time, but possibly no longer true today, at least for Gnat.
I would welcome a more in-depth answer from a real expert...
EDIT: Adam's thorough answer states this is still true of Gnat, so my optimism should be tempered pending further information.
FSF GCC 5 generates trampolines under circumstances documented here. This becomes problematic when the trampolines are actually used; in particular, when code takes ’Access or ’Unrestricted_Access of nested subprograms.
You can detect when your code does this by using
pragma Restrictions (No_Implicit_Dynamic_Code);
which needs to be used as a configuration pragma (although you won’t necessarily get warnings at compilation time, see PR 67205). The pragma is documented here.
You used to set configuration pragmas simply by including them in a file gnat.adc. If you’re using gnatmake you can also use the switch -gnatec=foo.adc. gprbuild doesn’t see gnat.adc; instead set global configurations in package Builder in your project file,
package Builder is
   for Global_Configuration_Pragmas use "foo.adc";
end Builder;
Violations end up with compilation errors like
$ gprbuild -P trampoline tramp
gcc -c tramp.adb
tramp.adb:26:12: violation of restriction "No_Implicit_Dynamic_Code" at /Users/simon/cortex-gnat-rts/test-arduino-due/gnat.adc:1
