I am working on a static analyzer for C code. As far as I understand, Frama-C uses Clang for its AST.
Assume that your code contains calls to system functions such as strcpy or strcmp.
In that case, is it possible to predict statically (without executing the code and tracing memory accesses) how each function parameter is accessed (read or write)? For instance, both strcmp and strcpy take strings as input arguments, but only strcpy modifies its first parameter.
If you know of any additional software or tools that can help with this kind of static analysis, please let me know.
I am making a compiler with JFlex and Bison. JFlex does the lexical analysis; Bison does the parsing.
The lexical analysis (in a .l file) works perfectly. It tokenizes the input and passes the tokens to the .y file for Bison to parse.
I need the parser to print an error for redeclared/undeclared variables. My thought is that it would need some sort of memory to remember all the variables declared so far, so that it can produce an error when it sees a redeclared variable or an undeclared variable being used. For example, given "bool", "test", "=", "true", ";", and on a new line, "test2", "=", "false", ";", the parser would need to remember "test", and when it parses the second line it could consult that memory, see that "test2" is undeclared, and print an error.
What I'm confused about is how to build a memory like that with Bison using Java in the .y file. With C, you would use the -d flag, which generates two files, including a header file with enum types, that would keep track of the declared variables. In Java I'm not sure I can do the same, since I can't structure the grammar so that it remembers variable names.
I could make a symbol table in Java code to check for redeclared variables, but in the main() in the .y file I have
public static void main(String args[]) throws IOException {
    EXAMPLELexer lexer = new EXAMPLELexer(System.in);
    EXAMPLE parser = new EXAMPLE(lexer);
    if (parser.parse()) {
        System.out.println("VALID FROM PARSER");
    }
    else {
        System.out.println("ERROR FROM PARSER");
    }
    return;
}
There is no way to get the tokens individually and pass them into another Java instance or whatever. %union{} doesn't work with Java, so I don't know how this is even possible.
I can't find a single piece of documentation explaining this so I would love some answers!
It's actually a lot simpler to add your own data to a Bison-generated Java parser than it is to a C parser (or even a C++ parser).
Note that Bison's Java API does not have unions, mostly because Java doesn't have unions. All semantic values are non-primitive types, so they derive from Object. If you need to, you can cast them to a more precise type, or even a primitive type.
(There is an option to define a more precise base class for semantic value types, but Object is probably a good place to start.)
The %code { ... } blocks are just copied into the parser class. So you can add your own members, as well as methods to manipulate them. If you want a symbol table, just add it as a HashMap to the parser class, and then you can add whatever you like to it in your actions.
Since all the parser actions are within the parser class, they have direct access to whatever members and member functions you add to the parser. All of Bison's internal members and member functions have names starting with yy, except for the member functions documented in the manual, so you can use almost any names you want without fear of name collision.
You can also use %parse-param to add arguments to the constructor; each argument corresponds to a class member. But that's probably not necessary for this particular exercise.
Of course, you'll have to figure out what an appropriate value type for the symbol is; that depends completely on what you're trying to do with the symbols. If you only want to validate that the symbols are defined when they are used, I suppose you could get away with a HashSet, but I'm sure eventually you'll want to store some more useful information.
As per the documentation (https://www.lua.org/manual/5.3/manual.html#lua_pushliteral):
This macro is equivalent to lua_pushstring, but should be used only when s is a literal string.
But I can't understand this explanation at all.
As far as I can see from the macro definition, lua_pushliteral is no different from lua_pushstring:
#define lua_pushliteral(L, s) lua_pushstring(L, "" s)
The documentation for lua_pushliteral in Lua 5.4 is the same as 5.3, except it adds "(Lua may optimize this case.)". So while it is currently the same as calling lua_pushstring, the Lua devs are giving themselves the option to optimize it in the future.
EDIT: As an example, the doc for lua_pushstring says:
Lua will make or reuse an internal copy of the given string, so the memory at s can be freed or reused immediately after the function returns.
But a C string literal is read-only, so it's impossible for the C code to free or reuse the memory. Also, Lua strings are immutable. It's basically useless to copy one immutable object to another immutable object when you could just refer to the same memory from both places. That means one possible way to optimize lua_pushliteral would be to just not make the copy that lua_pushstring does.
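A side note on the "" s in the macro quoted in the question: in C, adjacent string literals are concatenated at compile time, so the expansion only compiles when s is itself a string literal. The following is a minimal sketch of that mechanism without the Lua headers; push_string and push_literal are hypothetical stand-ins for lua_pushstring and lua_pushliteral, not Lua's own code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for lua_pushstring: it only reports the length
   of the string it was given, so the sketch needs no Lua headers. */
static size_t push_string(const char *s)
{
    return strlen(s);
}

/* Same shape as the macro in the question. "" s relies on adjacent
   string literal concatenation, so it is valid only for a literal s. */
#define push_literal(s) push_string("" s)

/* push_literal("abc") expands to push_string("" "abc"), i.e. "abc".
 *
 * By contrast, the following would be rejected by the compiler, because
 * "" p is not valid syntax when p is a variable:
 *
 *     const char *p = "abc";
 *     push_literal(p);
 */
```

So the macro enforces the "only when s is a literal string" rule at compile time, which is what would let an implementation treat that case specially.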
I wrote the following code in the file "origin.lua":
if test==nil then
print(aa["bb"]["cc"]) -- to produce a crash
end
print(1120)
When it crashes, it produces the following message:
lua: origin.lua:3: attempt to index global 'aa' (a nil value)
In order to prevent decompilation and make sure the code is safe, I use the following command to convert my code:
luac -s -o test.lua origin.lua
I know that the -s argument strips debug information, so the line number is no longer shown when it crashes:
lua: ?:0: attempt to index global 'aa' (a nil value)
But how can I keep debugging information when encrypting the Lua code with luac? Is there any solution?
There is no way to do this built into Lua, but there are some work-arounds.
If you only need line numbers, then one option is to leave the line numbers in the chunk. Line numbers are not that useful for reverse engineering (unluac currently doesn't use them at all), so it shouldn't affect security. Lua doesn't provide an option for this, but it is easy to modify Lua to leave them in when stripping. From ldump.c
n = (D->strip) ? 0 : f->sizelineinfo;
can be changed to
n = f->sizelineinfo;
(Disclaimer: untested)
A more complicated option would be to modify the Lua runtime to output the virtual machine program counter instead of the line number, and also output information describing the location of the current function in the chunk (e.g. top level, first function, second function nested in third function, etc). Then the line number could be looked up by the developer in a non-stripped version of the chunk. (Here is a reference to someone using this approach on lua-l -- no source code was provided, though.)
Note that preventing decompilation is not true security. It may help against casual attacks, but Lua bytecode is not hard to read.
luac does not encrypt the output. It compiles your Lua source code to bytecode, that's all. The code is neither encrypted nor does it run any faster; only the load time is shorter, since the compilation step is not needed.
If you want your code to be encrypted, I suggest encrypting the bytecode using e.g. AES-256 and then decrypting it in memory just before handing it to the Lua state. This way the bytecode is encrypted on disk, but decrypted in memory.
The overhead is low. We have used this technique for years.
This is a rather technical question about the compilation process of ABAP code.
I know that there are ABAP parser and scanner classes that actually call C kernel functions to do the real work. Then there is code completion functionality with a transaction that returns and prints the AST (abstract syntax tree) of a program as an ABAP list or XML.
Now my question is: would it be possible to 'skip' the ABAP source code and directly produce such an AST by other means than writing and then executing an ABAP program in SE80 or so, and give it to some function that compiles and executes it as if it had been written in and parsed from ABAP code?
That is, can I skip scanning and parsing of sources and directly give an AST to the compiler? If so, in what format? ABAP lists look more like a printing format, not like e.g. Lisp lists surrounded by parentheses.
Unfortunately, the ABAP Compiler does not use ASTs to generate the VM code.
The ABAP compiler works sequentially and translates statement by statement (i.e. everything that is between two ".") into one or more virtual machine opcodes.
If you are curious, you could take a look at transaction SYNT which shows the compiler output. You could also take a look at report RSLOAD00 which shows the ABAP VM code that has been generated for a program.
ASTs have only been built on top to allow for code completion or high-level analyses.
If you want to invoke the ABAP compiler, you will need to generate textual ABAP source code.
In fortran you can declare an array with any suitable (integral) range, for example:
real* 8 array(-10:10)
I believe that Fortran, when passing by reference, will always pass around array(1) as the reference, but I'm not sure.
I'm using Fortran pointers, and I believe that Fortran points at the address of the "1st" element, i.e. array(1), not array(-10). However, I'm not sure.
How does Fortran deal with negative array indexing in memory? And is it implementation defined?
Edit: To add a little more detail, I'm passing a malloc'd block from C to Fortran by using a Fortran pointer to point at the address, which is done by calling a Fortran routine from within C. I.e. the C side does:
void * pointer = malloc(blockSize*sizeof(double));
fortranpoint_(pointer);
And the Fortran point routine looks like:
real*8, target :: block(5, -6:6, 0:0)
real*8, pointer :: array(:,:,:)
entry fortranPoint(block)
array => block
return
The problem arises when it later tries to access, say:
array(1, -6, 0)
I am not sure if this is accessing the address at the beginning of the block or somewhere before it. I now think this is implementation defined, but would like to know the details of each implementation.
Fortran array argument ABI depends on the compiler, and perhaps more crucially, on whether the called procedure has an explicit or implicit interface.
For an implicit interface, typically the address of the first element is passed [1]. In the callee, the procedure then adds an offset depending on how the array dummy argument is declared. E.g. if the array dummy argument is declared somearray(-10:10), then a reference to somearray(x) is calculated as
address_of_first_element_passed_in_to_the_procedure + x + 10
If the procedure has an explicit interface, typically an array descriptor structure is passed rather than the address of the first element. In this structure, the callee can find information on the bounds of each dimension and, of course, a pointer to the actual data, allowing it to calculate the correct offset, similarly to the case of an implicit interface.
[1] Note that this is the first element in memory, that is, the lowest index for each dimension. Not somearray(1) regardless of how the array was declared.
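The offset arithmetic described above can be sketched in C as follows. This is only an illustration under stated assumptions: the array is contiguous and stored in column-major order (as Fortran arrays are), and the bounds are the ones declared for the dummy argument. The names dim_bounds and column_major_offset are hypothetical and do not belong to any compiler's runtime.

```c
#include <stddef.h>

/* Bounds of one dimension, as written in a Fortran declaration lower:upper. */
typedef struct {
    long lower;
    long upper;
} dim_bounds;

/* Zero-based element offset of idx[0..rank-1] from the first element in
   memory, for a contiguous column-major array with the given bounds.
   Assumes every index lies within its declared bounds. */
static size_t column_major_offset(const dim_bounds *dims, const long *idx,
                                  size_t rank)
{
    size_t offset = 0;
    size_t stride = 1; /* elements between consecutive indices of dimension d */

    for (size_t d = 0; d < rank; ++d) {
        /* Subtracting the lower bound maps the lowest index to offset 0. */
        offset += (size_t)(idx[d] - dims[d].lower) * stride;
        stride *= (size_t)(dims[d].upper - dims[d].lower + 1);
    }
    return offset;
}
```

With the bounds from the question, block(5, -6:6, 0:0), the indices (1, -6, 0) give offset 0, i.e. the very start of the malloc'd block, and (5, 6, 0) give offset 64, the last of the 65 elements. For the one-dimensional somearray(-10:10), index x gives x + 10, matching the formula above.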
To answer your updated question, for C/Fortran interoperability, use the ISO_C_BINDING feature which is nowadays widely available. This provides a standardized way to pass information between C and Fortran.
If the dummy argument for a regular array in Fortran is declared A(:) (or with more dimensions), the SHAPE is passed, not the specific index range. So the procedure will default to one-indexing. You can override this with a declaration in the procedure of A(-10:), or A(StartIndex:), where StartIndex is another argument.
Fortran pointers do include the index range, but the passing mechanism will be compiler dependent. Code interfacing this to C is likely to be OS and compiler dependent. As already suggested, I'd use a regular array and the ISO C Binding. It is MUCH easier than the old ways of figuring out the compiler passing mechanisms, and it is standard and portable. If you have a large existing Fortran code, you could write a "glue" Fortran procedure that maps between the regular Fortran variable declarations and the ISO C Binding names. While the types will have formally different names, in practice they will be the same if you select the correct ISO C types. The ISO C Binding has been available for many years now -- can you upgrade the compiler on the problem target platform? If not, I'd use a regular Fortran array and either use zero-indexing on the C side, or explicitly pass the desired indices as arguments.
There are examples of ISO C Binding usage on other Stack Overflow questions.
The interface to a procedure is explicit if it is declared so that it is known to the compiler in the caller. The simplest way is to place the procedures in a module and "use" the module in the caller. Having explicit interfaces helps avoid bugs, since the compiler can check consistency between the arguments of the caller and the callee. It is a little bit like C header files, only easier.