My goal is to analyze a piece of code illustrating CWE-122 that I found in the NIST SARD database: https://samate.nist.gov/SARD/test-cases/234158/versions/2.0.0
I cannot get the tool to emit an alarm for the flaw described in the link above using the Eva plugin. I tried different parameters, but it does not work.
Could you tell me the right parameters, please?
Probable cause: different machdep
I believe you are using a 64-bit machdep (architectural configuration), which is the default in Frama-C 24. Adding option -machdep x86_32 should suffice to reproduce the alarm.
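Assuming the test file is in the current directory, a minimal invocation would look like the following (the file name is illustrative; substitute the actual SARD source file):

```shell
frama-c -machdep x86_32 -eva CWE122_Heap_Based_Buffer_Overflow__sizeof_double_31.c
```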
Detailed explanation
Here's a simplified version of the SARD test case:
#include <stdio.h>
#include <stdlib.h>

void main() {
    double * data = NULL;
    /* FLAW: Using sizeof the pointer and not the data type in malloc() */
    data = (double *)malloc(sizeof(data));
    if (data == NULL) { exit(-1); }
    *data = 1.7E300;
    {
        double * dataCopy = data;
        double * data = dataCopy;
        /* POTENTIAL FLAW: Attempt to use data, which may not have enough memory allocated */
        printf("%lf", *data);
        free(data);
    }
}
On a 32-bit architecture (with option -machdep x86_32 or -machdep gcc_x86_32), sizeof(data) equals 4, which is less than the size of a double (8 bytes). In this case (which is the setting used in Frama-C's reproduction of SATE), as soon as we try to assign *data = 1.7E300, there is a buffer overflow and Frama-C/Eva emits a red alarm.
However, on a 64-bit architecture, sizeof(data) is 8, which is by coincidence also the size of a double. Therefore, there is no buffer overflow and no undefined behavior in the code.
Note that the flaw is a "human-level" flaw, that is, it requires knowing that the programmer intended the sizeof argument to match the type used in the cast ((double *)). Short of applying some syntactic heuristics for such cases, there is no way to detect it directly on a 64-bit architecture, because there is no actual buffer overflow nor undefined behavior.
It's possible to imagine a portability analysis which tries all possible combinations of sizes and architectures, and reports issues if any of the combinations produces a warning, but I'm not aware of any tools which do it. Instead, what some analyzers do (and so could Frama-C) is to perform one analysis per architectural configuration (e.g. 32-bit, ILP64, LP64, LLP64, etc.) and perform the union/join of alarms for each test case. This would require a fair amount of scripting and double (or more) the overall execution time, but it is technically feasible, and already done by some users concerned with portability.
Historical note: when Frama-C participated in SATE 5, it used a 32-bit architecture configuration by default, and this was carried over to SATE 6 by adding -machdep x86_32.
General remarks on reproducing tests from SATE
The options used for the SATE 6 reproduction are located in the fc_analyze.sh script, more specifically the flags assigned to FCFLAGS and EVAFLAGS (the former concern parsing and general flags; the latter are specific to the Eva plugin):
FCFLAGS="\
-no-autoload-plugins -load-module from,inout,report,eva,variadic \
-kernel-warn-key parser:decimal-float=inactive \
-kernel-warn-key typing:no-proto=inactive \
-kernel-warn-key typing:implicit-conv-void-ptr=inactive \
-eva-warn-key locals-escaping=inactive \
-add-symbolic-path $TESTCASESUPPORT_DIR:TESTCASESUPPORT_DIR \
-machdep x86_32 \
"
EVAFLAGS="\
-eva-msg-key=-initial-state,-final-states \
-eva-no-show-progress \
-eva-print-callstacks \
-eva-slevel 300 \
-warn-special-float none \
-warn-signed-downcast \
-warn-unsigned-overflow \
-eva-warn-copy-indeterminate=-#all \
-eva-no-remove-redundant-alarms \
-eva-domains equality,sign \
-eva-verbose 0 \
"
(Options -no-autoload-plugins and -load-module are mostly for performance when running 40k+ tests and can be removed; -add-symbolic-path is for versioning test oracles and can equally be removed.)
Obtaining the command-line used by Frama-C
If you are able to re-run the SATE6 test scripts, that is, if you can go to the directory:
C/testcases/CWE122_Heap_Based_Buffer_Overflow/s11
And run:
make CWE122_Heap_Based_Buffer_Overflow__sizeof_double_31_bad.res
Or:
make CWE122_Heap_Based_Buffer_Overflow__sizeof_double_31_bad.gui
Then a quick hack to obtain the direct command line used by Frama-C for that test is to modify the fc_analyze.sh script and add a -x flag to the first line:
- #!/bin/bash -eu
+ #!/bin/bash -eux
That way, whenever you run a test case, you will get a (fairly large) command line which should hopefully help reproduce the test case.
But, overall, copying the options from FCFLAGS and EVAFLAGS should be sufficient to reproduce the analyses.
Related
I am trying to implement device drivers for the /dev/null and /dev/zero pseudo-devices in C.
First, I would like to know how these two differ in usage.
My plan is to register them as /dev/null and /dev/zero, and then in the corresponding write and read methods just always return success.
Is the above implementation correct?
This documentation covers these pseudo-device files pretty well:
http://tldp.org/LDP/abs/html/zeros.html
You won't be able to register them as /dev/null and /dev/zero, since those already exist. You could arrange to remove them, but I don't recommend that unless you have some very specific scenario requiring it, which it does not sound like you do.
/dev/zero is supposed to produce zeros, so just returning success from the driver's read handler (which services user-space read()) would not be adequate: it has to fill the buffer with zero bytes. Likewise, /dev/null's write handler must report that it consumed the data (return the byte count), not merely return. Etc. etc.
I'm just starting to repeat that link, which is pointless. Check the link for a full description of the intended behavior.
I'm playing around with a binary, and when I load it into my debugger, or even run readelf, I notice the entry point is 0x530 instead of the usual 0x80****** that I'd learned ELFs were loaded at.
Why is this? Is there anything else going on? The binary is linked and not stripped.
instead of the usual 0x80****** that I'd learned ELFs were loaded at.
You learned wrong.
While 0x8048000 is the usual address that 32-bit x86 Linux binaries are linked at, that address is by no means universal or special.
64-bit x86_64 and aarch64 binaries are linked at a default address of 0x400000, and powerpc64le binaries at a default address of 0x10000000.
There is no reason a binary could not be linked at any other (page-aligned) address, so long as it is not 0 and leaves room for sufficient stack at the high end of the address space.
Why is this?
The binary was likely linked with a custom linker script. Nothing wrong with that.
As mentioned by Employed, the entry address is not fixed.
Just to verify, I've tried on x86_64:
gcc -Wl,-Ttext-segment=0x800000 hello_world.c
which sets the start of the text segment to 0x800000 instead of the default 0x400000 (the ELF header gets loaded at 0x800000 in memory).
Then both:
readelf -h a.out
and gdb -ex 'b _start' tell me the entry is at 0x800440, as expected (the startup code sits 0x440 bytes into the segment, after the ELF and program headers).
This is because that value is an input that tells the Linux kernel where to set the program counter when it starts executing the new program (i.e. at execve time).
The default 0x400000 comes from the default linker script used. You can also modify the linker script as mentioned in https://stackoverflow.com/a/31380105/895245 , change 0x400000 there, and use the new script with -T script
If I set it to anything at or below 0x200000 (2 MiB) or other low addresses, the program gets killed. I think this is because ld aligns segments to multiples of 2 MiB, which is the largest supported page size (huge pages), so anything lower would place the segment at address 0, which is invalid: Why is the ELF execution entry point virtual address of the form 0x80xxxxx and not zero 0x0?
In Linux, the mmap(2) man page explains that an anonymous mapping
. . . is not backed by any file; its contents are initialized to zero.
The FreeBSD mmap(2) man page does not make a similar guarantee about zero-filling, though it does promise that bytes after the end of a file in a non-anonymous mapping are zero-filled.
Which flavors of Unix promise to return zero-initialized memory from anonymous mmaps? Which ones return zero-initialized memory in practice, but make no such promise on their man pages?
It is my impression that zero-filling is partially for security reasons. I wonder if any mmap implementations skip the zero-filling for a page that was mmapped, munmapped, then mmapped again by a single process, or if any implementations fill a newly mapped page with pseudorandom bits, or some non-zero constant.
P.S. Apparently, even brk and sbrk used to guarantee zero-filled pages. My experiments on Linux seem to indicate that, even if full pages are zero-filled upon page fault after an sbrk call allocates them, partial pages are not:
#include <stdint.h>   /* intptr_t */
#include <stdio.h>
#include <unistd.h>

int main() {
    const intptr_t many = 100;
    char * start = sbrk(0);
    sbrk(many);
    for (intptr_t i = 0; i < many; ++i) {
        start[i] = 0xff;
    }
    printf("%d\n", (int)start[many/2]);
    /* shrink by half, then regrow: is the regrown half still 0xff? */
    sbrk(many/-2);
    sbrk(many/2);
    printf("%d\n", (int)start[many/2]);
    /* shrink fully, then regrow half: is the start still 0xff? */
    sbrk(-1 * many);
    sbrk(many/2);
    printf("%d\n", (int)start[0]);
}
Which flavors of Unix promise to return zero-initialized memory from anonymous mmaps?
GNU/Linux
As you said in your question, the Linux version of mmap promises to zero-fill anonymous mappings:
MAP_ANONYMOUS
The mapping is not backed by any file; its contents are initialized to zero.
NetBSD
The NetBSD version of mmap promises to zero-fill anonymous mappings:
MAP_ANON
Map anonymous memory not associated with any specific file. The file descriptor is not used for creating MAP_ANON regions, and must be specified as -1. The mapped memory will be zero filled.
OpenBSD
The OpenBSD manpage of mmap does not promise to zero-fill anonymous mappings. However, Theo de Raadt (prominent OpenBSD developer), declared in November 2019 on the OpenBSD mailing list:
Of course it is zero filled. What else would it be? There are no plausible alternatives.
I think it detracts from the rest of the message to say something so obvious.
And the other OpenBSD developers did not contradict him.
IBM AIX
The AIX version of mmap promises to zero-fill anonymous mappings:
MAP_ANONYMOUS
Specifies the creation of a new, anonymous memory region that is initialized to all zeros.
HP-UX
According to nixdoc.net, the HP-UX version of mmap promises to zero-fill anonymous mappings:
If MAP_ANONYMOUS is set in flags, a new memory region is created and initialized to all zeros.
Solaris
The Solaris version of mmap promises to zero-fill anonymous mappings:
When MAP_ANON is set in flags, and fildes is set to -1, mmap() provides a direct path to return anonymous pages to the caller. This operation is equivalent to passing mmap() an open file descriptor on /dev/zero with MAP_ANON elided from the flags argument.
This Solaris man page gives us a way to get zero-filled memory pages without relying on the behavior of mmap used with the MAP_ANONYMOUS flag: do not use the MAP_ANONYMOUS flag, and create a mapping backed by the /dev/zero file. It would be useful to know the list of Unix-like operating systems providing the /dev/zero file, to see if this approach is more portable than using the MAP_ANONYMOUS flag (neither /dev/zero nor MAP_ANONYMOUS are POSIX).
Interestingly, the Wikipedia article about /dev/zero claims that MAP_ANONYMOUS was introduced to remove the need of opening /dev/zero when creating an anonymous mapping.
It's hard to say which ones promise what without simply exhaustively enumerating all man pages or other release documentation, but the underlying code that handles MAP_ANON is (usually? always?) also used to map in bss space for executables, and bss space needs to be zero-filled. So it's pretty darn likely.
As for "giving you back your old values" (or some non-zero values but most likely, your old ones) if you unmap and re-map, it certainly seems possible, if some system were to be "lazy" about deallocation. I have only used a few systems that support mmap in the first place (BSD and Linux derivatives) and neither one is lazy that way, at least, not in the kernel code handling mmap.
The reason sbrk might or might not zero-fill a "regrown" page is probably tied to history, or lack thereof. The current FreeBSD code matches with what I recall from the old, pre-mmap days: there are two semi-secret variables, minbrk and curbrk, and both brk and sbrk will only invoke SYS_break (the real system call) if they are moving curbrk to a value that is at least minbrk. (Actually, this looks slightly broken: brk has the at-least behavior but sbrk just adds its argument to curbrk and invokes SYS_break. Seems harmless since the kernel checks, in sys_obreak() in /sys/vm/vm_unix.c, so a too-negative sbrk() will fail with EINVAL.)
I'd have to look at the Linux C library (and then perhaps kernel code too) but it may simply ignore attempts to "lower the break", and merely record a "logical break" value in libc. If you have mmap() and no backwards compatibility requirements, you can implement brk() and sbrk() entirely in libc, using anonymous mappings, and it would be trivial to implement both of them as "grow-only", as it were.
I understand that OpenMP is in fact just a set of macros which is compiled into pthreads. Is there a way of seeing the pthread code before the rest of the compilation occurs? I am using GCC to compile.
First, OpenMP is not a simple set of macros. It may be seen as a simple transformation into pthread-like code, but OpenMP requires more than that, including runtime support.
Back to your question: in GCC, at least, you can't see the pthreaded code, because GCC's OpenMP implementation is done in the compiler back-end (or middle-end). The transformation is done at the IR (intermediate representation) level, so from the programmer's viewpoint it's not easy to see how the code is actually transformed.
However, there are some references.
(1) An Intel engineer provided a great overview of the implementation of OpenMP in Intel C/C++ compiler:
http://www.drdobbs.com/parallel/how-do-openmp-compilers-work-part-1/226300148
http://www.drdobbs.com/parallel/how-do-openmp-compilers-work-part-2/226300277
(2) You may take a look at the implementation of GCC's OpenMP:
https://github.com/mirrors/gcc/tree/master/libgomp
Note that libgomp.h uses pthreads, and loop.c contains the implementation of the parallel-loop construct.
OpenMP is a set of compiler directives, not macros. In C/C++ those directives are implemented with the #pragma extension mechanism while in Fortran they are implemented as specially formatted comments. These directives instruct the compiler to perform certain code transformations in order to convert the serial code into parallel.
Although it is possible to implement OpenMP as transformation to pure pthreads code, this is seldom done. Large part of the OpenMP mechanics is usually built into a separate run-time library, which comes as part of the compiler suite. For GCC this is libgomp. It provides a set of high level functions that are used to easily implement the OpenMP constructs. It is also internal to the compiler and not intended to be used by user code, i.e. there is no header file provided.
With GCC it is possible to get a pseudocode representation of what the code looks like after the OpenMP transformation. You have to supply it the -fdump-tree-all option, which would result in the compiler spewing a large number of intermediate files for each compilation unit. The most interesting one is filename.017t.ompexp (this comes from GCC 4.7.1, the number might be different on other GCC versions, but the extension would still be .ompexp). This file contains an intermediate representation of the code after the OpenMP constructs were lowered and then expanded into their proper implementation.
Consider the following example C code, saved as fun.c:
void fun(double *data, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        data[i] += data[i]*data[i];
}
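For reference, a command along these lines produces the dumps (-fopenmp is needed for the OpenMP lowering to happen at all; the exact dump file name varies with the GCC version, as noted above):

```shell
gcc -fopenmp -fdump-tree-all -c fun.c
# produces fun.c.017t.ompexp among many other dump files
```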
The content of fun.c.017t.ompexp is:
fun (double * data, int n)
{
...
struct .omp_data_s.0 .omp_data_o.1;
...
<bb 2>:
.omp_data_o.1.data = data;
.omp_data_o.1.n = n;
__builtin_GOMP_parallel_start (fun._omp_fn.0, &.omp_data_o.1, 0);
fun._omp_fn.0 (&.omp_data_o.1);
__builtin_GOMP_parallel_end ();
data = .omp_data_o.1.data;
n = .omp_data_o.1.n;
return;
}
fun._omp_fn.0 (struct .omp_data_s.0 * .omp_data_i)
{
int n [value-expr: .omp_data_i->n];
double * data [value-expr: .omp_data_i->data];
...
<bb 3>:
i = 0;
D.1637 = .omp_data_i->n;
D.1638 = __builtin_omp_get_num_threads ();
D.1639 = __builtin_omp_get_thread_num ();
...
<bb 4>:
... this is the body of the loop ...
i = i + 1;
if (i < D.1644)
goto <bb 4>;
else
goto <bb 5>;
<bb 5>:
<bb 6>:
return;
...
}
I have omitted big portions of the output for brevity. This is not exactly C code. It is a C-like representation of the program flow. <bb N> are the so-called basic blocks - collection of statements, treated as single blocks in the program's workflow. The first thing that one sees is that the parallel region gets extracted into a separate function. This is not uncommon - most OpenMP implementations do more or less the same code transformation. One can also observe that the compiler inserts calls to libgomp functions like GOMP_parallel_start and GOMP_parallel_end, which are used to bootstrap and then to finish the execution of a parallel region (the __builtin_ prefix is removed later on). Inside fun._omp_fn.0 there is a for loop, implemented in <bb 4> (note that the loop itself is also expanded). Also all shared variables are put into a special structure that gets passed to the implementation of the parallel region. <bb 3> contains the code that computes the range of iterations over which the current thread would operate.
Well, not quite a C code, but this is probably the closest thing that one can get from GCC.
I haven't tested it with OpenMP, but the compiler option -E should give you the code after preprocessing. (Note, though, that #pragma directives survive preprocessing, so this will not reveal the OpenMP transformation itself.)
I am using Nvidia's OpenCL development software with a GTX 550 Ti graphics card and have encountered a strange problem. (I am new to OpenCL.)
My kernel code is like this:
__kernel void kernel_name(...)
{
    size_t d = get_local_id(0);
    char abc[8];
    ...
}
Actually, char abc[8] is useless (dead code) in my case. But if I keep char abc[8] in the kernel code, the result is totally messy and the kernel's running time is much longer (2095712 ns). If I comment out char abc[8], the result becomes correct and the running time becomes shorter (697856 ns). Doesn't the kernel compiler eliminate dead code?
The above is just an explicit example that I can reproduce. I have also encountered an even stranger case where one program gets different results when run at different times in exactly the same environment.
Is this related to memory allocation, or something else? Can anyone give me some advice on how to find the problem?
By the way, oclDeviceQuery output information is listed as follows:
Platform Version = OpenCL 1.1
CUDA 4.2.1,
SDK Revision = 7027912
My OS is Windows XP.
Today is 2012-07-17, and I think I have resolved this problem.
don't use #include in the kernel source file.
don't use extremely long lines in the kernel source file (for example, when a program generates line data for the kernel source).
You're right, that shouldn't affect anything.
That's not your real code though, and I suspect given those run-times that your kernel isn't a simple thing. Possibly you're pushing your locals over some limit which means that variables are having to be stored in some slower memory which pushes your run-times up.
Something like that might also cause a change in behaviour if you had an uninitialised variable bug somewhere. In the fast store it happens to get a value that works. In the slow store it gets something else.
To check this theory I'd try to remove some other local data structure and see if it has the same effect. Anything else 8 bytes or larger should have the same effect.
...of course it's possible you've found a bug in the OpenCL implementation, but that's easy to check: just compile the kernel for a different OpenCL device, e.g. the CPU. This is worth doing anyway, because different compilers pick up different issues.
Other than that I think you're back to standard debug techniques.
BTW: at one point in your question you call the array abs[8] rather than abc[8]. I assume that's a typo, but if it isn't then that could be your problem as the abs name will clash with the abs() function. That could confuse a stupid compiler.