I have been reading about environment variables in Linux. I understand how to set and read them.
What I want to ask is: conceptually, why were environment variables added to Linux? Is there a system requirement that cannot be fulfilled without them?
To my mind, it seems like an additional configuration layer was needed for some reason, something decoupled from the actual process functionality. I want to know the opinion of more experienced developers.
Environment variables [herein: env] fulfill a need that can't [conveniently] be handled in other ways.
So, we ask the question: How do programs get configuration data that may change on each invocation?
We could pass everything as program arguments: pgmA PATH=... DISPLAY=... but the program would have to parse that. And, when pgmA invokes pgmB, it would have to pass this data along as arguments. In other words, every program would have to be aware of every variable, even if it had no use for the variable itself.
We could put everything in a config file, but we'd need a different one for each invocation. Where to put these files, how to guarantee they have unique names, how/when to delete them when they're no longer needed [even in the face of an aborted program], becomes intractable.
Env variables reside in a special memory section that the kernel creates when the program is execed, and the standard execvp et al. (with some kernel help) will cheerfully pass this environment along without most programs having to do anything.
However, pgmA is at liberty, after a fork and in the child before execvp, to change something for pgmB (e.g. add an extra directory to PATH). In other words, the env is hierarchical between parent and child (changing the child's copy does not change the parent's--a good thing).
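As a rough sketch of that pattern (the /opt/tools/bin directory and the use of printenv are purely illustrative, not from the answer), the child can adjust PATH without affecting the parent:
/* Minimal sketch: the child adds a directory to PATH before exec;
   the parent's PATH is untouched. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0) {                              /* child */
        const char *old = getenv("PATH");
        char newpath[4096];
        snprintf(newpath, sizeof(newpath), "/opt/tools/bin:%s", old ? old : "");
        setenv("PATH", newpath, 1);              /* changes only the child's env */
        char *args[] = { "printenv", "PATH", NULL };
        execvp("printenv", args);                /* child's PATH includes /opt/tools/bin */
        perror("execvp");
        _exit(1);
    }
    wait(NULL);
    printf("parent PATH unchanged: %s\n", getenv("PATH"));
    return 0;
}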
The env also allows things to be passed around that lower-level programs need. Consider that pgmA fork/execs pgmB, which in turn fork/execs pgmC. pgmA and pgmB are just ordinary programs [and don't use any env variables themselves]. But pgmC is xterm, which needs to know what X11 display to output on. It gets that from the DISPLAY env var.
Consider that we ran the above from the main GUI console from within a window terminal program. DISPLAY would [probably] be :0. The xterm shows up on the local screen. Now consider that we do the same exact thing from an ssh login. Here DISPLAY will be w.x.y.z:0 and the xterm will execute on the local machine, but will display itself on the remote system's screen.
Another major use of the environment is to provide configuration for things that don't have [legitimate] access to argv or config files. Namely, shared libraries [.sos]. Here are two examples:
When an ELF program is execed by the kernel, it maps the executable file into memory. It then looks in a special section for the "ELF loader", which, under Linux is (e.g.) /lib64/ld-linux-x86-64.so.2. The kernel maps the loader into the application memory and turns control of the program over to the loader. The loader, in turn, resolves references to, and loads, the shared libraries the program needs, and then transfers control to the program's start function.
The ldd program will print out the shared libraries that a given program uses. But, it doesn't actually do that itself. It sets an env variable, then execs the target program. The ELF loader sees this variable, and instead of executing the program, it merely loads it, printing the names of the shared libraries. The ELF loader has many env vars that can affect its operation (e.g. see man ld.so).
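A hedged sketch of that trick in C (LD_TRACE_LOADED_OBJECTS is the variable documented in man ld.so; /bin/ls is just an example target):
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    /* With this variable set, the ELF loader prints ls's shared libraries
       and exits instead of actually running ls. */
    setenv("LD_TRACE_LOADED_OBJECTS", "1", 1);
    execl("/bin/ls", "ls", (char *)NULL);
    return 1;   /* only reached if execl fails */
}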
Another library that uses the environment is glibc. When glibc encounters a fatal error, such as double freeing a pointer, or heap corruption, it will print an error message. Normally, glibc will output this to /dev/tty. Sometimes, that's not desirable and we'd rather the error message go to stderr [where we've opened an error log file]. To get glibc to honor our wishes, we set the env var LIBC_FATAL_STDERR_ to 1 before we invoke the program.
This config stuff could be handled by a Windows-registry-like interface, and the data could reside in per-process kernel memory. But that's cumbersome for both the kernel and the program. The kernel doesn't want to carry this variable-sized information in [precious] kernel address space, and an application doesn't want the overhead of using syscalls to get at it.
Most C programmers write their main function as int main(int argc, char **argv), but given what actually gets passed it could [more properly] be written as int main(int argc, char **argv, char **envp). At startup, envp points to the same array as the global environ:
char *environ[] = {
    "DISPLAY=:0",
    "PATH=/usr/bin:/bin",
    /* ... */
    NULL
};
The libc functions getenv, setenv, and putenv operate on this global. But when calling execvpe you can pass a different env array altogether, filled with whatever you want. So you can manipulate the array [and, hence, the environment] directly.
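For instance, a small sketch of handing a child a hand-built environment (execvpe is a GNU extension, hence _GNU_SOURCE; the GREETING variable is made up for illustration):
#define _GNU_SOURCE
#include <unistd.h>

int main(void)
{
    char *child_argv[] = { "printenv", NULL };
    char *child_envp[] = {
        "PATH=/usr/bin:/bin",
        "GREETING=hello",        /* illustrative variable */
        NULL
    };
    execvpe("printenv", child_argv, child_envp);   /* the child sees only these two */
    return 1;   /* only reached if execvpe fails */
}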
Historical footnote: environment variables aren't specific to Linux. They weren't added--they have always been there [in Linux]. And, the environment has been part of just about any unix-like system, unchanged(!), since the earliest incarnations, just like argc/argv. This goes back to [at least] Bell Labs' Unix V7 [and probably earlier].
Variables are used for writing scripts and for program management. For example, you can write a greeting script and reference the variable $USER in it, and it will display the name of the current user. Programs can read variables and use them in their work. For example, for programs running in the GUI on Linux, you can specify via a variable which X server they should use to display their output.
Environment variables are really helpful for running programs on Linux. We often need to read some system configuration in our programs, and environment variables are a good place from which to access those settings.
I'm working with a modding API for a game (for those curious, it's Factorio, but that's not really relevant), and the Lua environment is HEAVILY limited, blocking functions like setfenv. It's a 5.1 environment and I do have access to loadstring, pcall, etc. My question is: how would you recommend running 'unsafe' code provided by a user while limiting what functions they can access, without access to environment-modification functions? (Preferably whitelist functions/values instead of blacklisting, but I'll take whatever I can get.)
In Lua 5.1 you need setfenv to create a secure sandbox (see this answer for a typical procedure). So if you don't have access to setfenv, then I don't think it can be done.
Then again, if the environment you're working in has disabled setfenv and has put a wrapper around loadstring to avoid malicious bytecode loading (again, see the answer I linked) then you might be able to run the script without setting up a special environment for it. It really depends on the details of your current environment as to whether it's safe or not.
I apologize for a late answer (you've probably moved on by now), but it is possible to do this using the built-in load function. You can supply a fourth argument, which is a custom environment, and load returns a function. You can pass a function, a string, or possibly even a thread (I think) to load and get the result you want. I was also having this problem and I thought I'd answer it for future users.
Here is a link to the documentation on the lua site for load: https://www.lua.org/manual/5.2/manual.html#pdf-load
I have tested this to ensure it works properly in Factorio and it appears to work as intended.
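A minimal sketch of that approach (the names user_code and sandbox_env, and the whitelist contents, are just illustrative):
-- Lua 5.2+: compile user code against a restricted environment.
local user_code = 'print("hello from the sandbox")'

-- Whitelist only what the untrusted code may touch.
local sandbox_env = { print = print, pairs = pairs, string = string }

-- "t" forces text-only chunks, so precompiled (possibly malicious) bytecode is rejected.
local chunk, err = load(user_code, "user chunk", "t", sandbox_env)
if not chunk then
    print("compile error: " .. err)
else
    local ok, run_err = pcall(chunk)
    if not ok then print("runtime error: " .. tostring(run_err)) end
end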
Right now I am doing a lot of this:
local env = {
    print = print,
}
setfenv(func, env)   -- func is the untrusted user function
and then using metamethods to lock properties on Instances, but it is really inefficient and has lots of bypasses. I googled it, and everything I find is the same as this: it doesn't work.
In Lua 5.1, sandboxing is pretty simple. If you have a Lua script in a file somewhere, and you want to prevent it from accessing any functions or anything other than the parameters you provide, you do this:
local script = loadfile("user_script.lua")   -- load it via whatever means (loadfile, loadstring, ...). DO NOT RUN IT YET!
setfenv(script, {})
script is now sandboxed. It cannot access anything other than the values you directly provide. Functions it creates cannot access anything outside of this sandbox environment. Your original global environment is completely cut off from them, except for what you permit it to access.
Obviously you can put whatever you like in that table; that table will contain whatever globally accessible stuff you like. You should probably give Lua scripts access to basic Lua standard library functions; most of those are pure functions that can't do anything unpleasant.
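For instance, a sketch of a whitelist environment for Lua 5.1 (the choice of entries here is just one reasonable starting point):
-- Run untrusted code with only whitelisted functions visible.
local whitelist = {
    print  = print,
    pairs  = pairs,
    ipairs = ipairs,
    type   = type,
    string = string,   -- mostly pure functions
    table  = table,
    math   = math,
}

local untrusted = loadstring('print(math.sqrt(2))')   -- or loadfile(...)
setfenv(untrusted, whitelist)
local ok, err = pcall(untrusted)
if not ok then print("sandboxed script failed: " .. tostring(err)) end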
Here's a list of Lua standard library stuff that you must not give the user access to, if you want to maintain the integrity of your sandbox:
getfenv: There are valid reasons for a user to be able to setfenv, so that it can create mini-sandboxes of its own within your sandbox. But you cannot allow access to the environment of any functions you put in the sandbox if you want to maintain its integrity.
getmetatable: Same reasoning as above; setting metatables is OK. Malicious code can break an object by changing its metatable, but then again, malicious code can break your entire system just by running an infinite loop.
The entire debug library. All manner of chicanery is possible through the debug library.
You also apparently need to solve the problem Lua 5.1 has with loading bytecode from within a Lua script. That can be used to break the sandbox. Unfortunately, Lua 5.1 doesn't really have good tools for that. In Lua 5.2+, you can encapsulate load and loadfile such that you internally pass "t" as the mode parameter no matter what the user provides. But with Lua 5.1, you need some way to encapsulate load et al. such that you can tell when the data is text and when it's not. You could probably find the code that Lua uses to distinguish bytecode from text by reading the Lua source.
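For Lua 5.2+, a rough sketch of that encapsulation (safe_load and sandbox_env are my own names, not part of the standard library):
-- A wrapped load that always forces text-only chunks, no matter what mode
-- the sandboxed code asks for.
local real_load = load
local sandbox_env = {}            -- the whitelist table handed to user code

local function safe_load(chunk, chunkname)
    -- "t" rejects precompiled bytecode; user chunks compile into the sandbox.
    return real_load(chunk, chunkname, "t", sandbox_env)
end

sandbox_env.print = print
sandbox_env.load  = safe_load     -- user code only ever sees the wrapper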
Or you can just disallow load and its friends altogether.
If you want to prevent the user from doing ugly things to the system, then disallow the os and io libraries.
In a large, complex C program, I'd like to save to a file the contents of all memory used by static variables, global structures, and dynamically allocated variables. There are more than 10,000 such variables.
The C program has only a single thread and no file operations, and the program itself is not very complex (the calculation is complex).
Then, in a later execution of the program, I want to initialize the memory from this saved state.
If this is even possible, can someone offer an approach to accomplish this?
You have to define a struct to keep all your data in, and then you have to implement a function to save it to a file.
Something like this: Saving struct to file
Please note, however, that this method is the simplest, but comes with no portability at all.
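A hedged sketch of that idea (the struct fields and the file name are purely illustrative, and padding/endianness make the file non-portable):
#include <stdio.h>

struct program_state {            /* illustrative fields only */
    double results[1024];
    long   iteration;
    int    flags;
};

static int save_state(const struct program_state *s, const char *path)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t n = fwrite(s, sizeof *s, 1, f);   /* raw dump of the whole struct */
    fclose(f);
    return n == 1 ? 0 : -1;
}

static int load_state(struct program_state *s, const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t n = fread(s, sizeof *s, 1, f);    /* read it back in one shot */
    fclose(f);
    return n == 1 ? 0 : -1;
}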
Edit after comment: basically, what you would like to do is save whatever is happening in the program and then restart it after a load. I don't think this is possible in any simple way. You MUST understand what "status of your application" means.
Think about it: doing a dump of the memory saves not only the data, but also the current instruction pointer. So with that "dumb" dump, you would also have saved the actual instruction currently running, along with many more complications you really don't want to deal with.
The closest thing to what you are describing is running the program in a virtual machine. If you pause the VM, the execution state is "saved", and whenever you resume the VM, the program continues at the exact execution point where you paused it.
Even if the configuration data is scattered throughout the application, you can still gather it into a global struct used to save everything.
But you still have to know your program and identify what you have to save. There are no shortcuts for that.
I just played around a bit with Lua and tried the Koneki Eclipse plugin, which is quite nice. The problem is that when I make changes to a function I'm currently debugging, the changes do not take effect when I save them, so I'm forced to restart the application. It would be nice if I could make changes in the debugger and have them take effect on the fly, as in Smalltalk, or to some extent as with hot code replacement in Java. Does anybody have a clue whether this is possible?
It is possible to some degree with some limitations. I've been developing an IDE/debugger that provides this functionality. It gives you access to a remote console to execute commands in the context/environment of your running application. The IDE also supports live coding, which reloads modified code as you make changes to it; see demos here.
The main limitation is that you can't modify a currently running function (at least without changes to Lua VM). This means that the effect of your changes to the currently running function will only be seen after you exit and re-enter that function. It works well for environments that call the same function repeatedly (for example a game engine calling draw), but may not work in your case.
Another challenge is dealing with upvalues (values that are created outside of your function and are referenced inside it). There are methods to "read" current upvalues and re-create them when the (new) function is created, but it requires some code analysis to find which functions will be re-created, to query them for upvalues, to get the current values, and then to create a new environment with those upvalues and assign proper values to them. My current implementation doesn't do this, which means you need to use global variables as a workaround.
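For illustration, reading upvalues is straightforward with the standard debug library (this sketch is mine, not the IDE's actual mechanism):
-- Enumerate the upvalues of a function.
local function list_upvalues(f)
    local i = 1
    while true do
        local name, value = debug.getupvalue(f, i)
        if not name then break end
        print(i, name, value)
        i = i + 1
    end
end

local counter = 0
local function bump() counter = counter + 1 end
list_upvalues(bump)   --> 1  counter  0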
There was also relevant discussion just the other day on the Lua mailing list.
I'm trying to understand mmap and got the following link to read:
http://duartes.org/gustavo/blog/post/page-cache-the-affair-between-memory-and-files
I understand the text in general and it makes sense to me. But at the end there is a paragraph which I don't really understand, or which doesn't fit my understanding.
The read-only page table entries shown above do not mean the mapping is read only, they’re merely a kernel trick to share physical memory until the last possible moment. You can see how ‘private’ is a bit of a misnomer until you remember it only applies to updates. A consequence of this design is that a virtual page that maps a file privately sees changes done to the file by other programs as long as the page has only been read from. Once copy-on-write is done, changes by others are no longer seen. This behavior is not guaranteed by the kernel, but it’s what you get in x86 and makes sense from an API perspective. By contrast, a shared mapping is simply mapped onto the page cache and that’s it. Updates are visible to other processes and end up in the disk. Finally, if the mapping above were read-only, page faults would trigger a segmentation fault instead of copy on write.
The following two lines don't make sense to me.
A consequence of this design is that a virtual page that maps a file privately sees changes done to the file by other programs as long as the page has only been read from.
It is private. So it can't see changes by others!
Finally, if the mapping above were read-only, page faults would trigger a segmentation fault instead of copy on write.
I don't know what the author means by this. Is there a flag "MAP_READ_ONLY"? Until a write occurs, every mapping from the program's virtual pages to the page-table entries in the page cache is read-only.
Can you help me understand these two lines?
Thanks
Update
It seems I got it, with some help.
A consequence of this design is that a virtual page that maps a file privately sees changes done to the file by other programs as long as the page has only been read from.
Although a mapping is private, the virtual page really can see changes made by others, until it modifies a page itself. The modification then becomes private and is visible only to the mapping of the writing program.
Finally, if the mapping above were read-only, page faults would trigger a segmentation fault instead of copy on write.
I'm told that pages themselves can also have permissions (read/write/execute).
Tell me if I'm wrong.
This fragment:
A consequence of this design is that a virtual page that maps a file privately sees changes done to the file by other programs as long as the page has only been read from.
is telling you that the kernel cheats a little bit in the name of optimization. Even though you've asked for a private mapping, the kernel will actually give you a shared one at first. Then, if you write the page, it becomes private.
Observe that this "cheating" doesn't matter (doesn't make any difference) if all processes which are accessing the file are doing it with MAP_PRIVATE, because no actual changes to the file will ever occur in that case. Different processes' mappings will simply be upgraded from "fake cheating MAP_PRIVATE" to true "MAP_PRIVATE" at different times according to whenever each process first writes to the file. This is probably a common scenario. It's only if the file is being concurrently updated by other means (MAP_SHARED with PROT_WRITE or else regular, non-mmap I/O operations) that it makes a difference.
I'm told that pages themselves can also have permissions (read/write/execute).
Sure, they can. You have to ask for the permissions you want when you initially map the file, in fact: the third argument to mmap, which will be a combination of PROT_READ, PROT_WRITE, PROT_EXEC, and PROT_NONE.
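For example, a minimal sketch of a private read/write mapping (the file name data.bin is just illustrative):
/* Map a file privately: reads see the shared page-cache pages until the first
   write, after which copy-on-write gives this process its own private copy. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    /* PROT_READ|PROT_WRITE is fine with MAP_PRIVATE even on a read-only fd,
       because writes never reach the file.  With PROT_READ alone, writing
       through the mapping would segfault instead of triggering copy-on-write. */
    unsigned char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    printf("first byte: %d\n", p[0]);
    p[0] ^= 0xff;      /* triggers copy-on-write; the underlying file is unchanged */

    munmap(p, st.st_size);
    close(fd);
    return 0;
}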