How to parse instructions in assembly into binary(MIPS)? - parsing

I'm trying to do a MIPS project for Comp Arch where I have a text file with a bunch of random mips instructions and I have to do what MARS does and translate these instructions into binary and write the result into a new file.
What I've thought of so far:
-I know how to read a file and write to one
-I think I'll need to hard code the registers binary values and the instructions op codes and function codes
My question is how do I parse through the file and read the instructions to figure out which binary pieces to put? Like if it was add $t1, $t2, $t1 I want it to read "add" and jump to a loop that has the op code and function for add then keep going and if it reads $t1 to pick up the binary value for t1 and so forth until it finishes the line and then have a cue to jump to the next line and keep going.
Any advice on how to parse the file or tips for the project in general are welcome. I probably did a bad job of explaining it but basically we are trying to create what mars does but on a much smaller scale with a select few instructions and registers

Related

In Cobol why would you have a PERFORM THRU with non existent paragraph names

I am trying to figure out what the purpose of the PERFORM command is below. Code was written 20 years ago. ACPY-READ-FIRST, ACPY-READ-NEXT, and ACPY-EXIT don't exist anywhere in the program.
MOVE ACPY-ID TO WS-ACPY-ID.
PERFORM ACPY-READ-FIRST THRU ACPY-EXIT.
150-PYMTS.
PERFORM ACPY-READ-NEXT THRU ACPY-EXIT.
IF NOT SUCCESSFUL OR
ACCT-ID NOT = ACPY-ACCT-ID
GO TO 160-DONE.
Answer: You wouldn't as this would create a syntax error with every compiler.
The paragraphs (or even sections, but I'd look for the former) have to be somewhere in the source unit, I'd say: 95% likeliness to find it in a copybook named in the COPY statement (= COBOL's "include"), 4% that it is inserted by a code generator that was used to process this and 1% that you've just overlooked it (COBOL is case-insensitive, just in case).
Hint: If you have all necessary sources you can use GnuCOBOL to process it and create a listing which shows you the copybook it the paragraphs are included in.

How to bring debugging information when encryption then lua code use luac

I wrote the following code in the file "orgin.lua"
if test==nil then
print(aa["bb"]["cc"]) -- to produce a crash
end
print(1120)
when it crash ,it will generate the following information:
lua: origin.lua:3: attempt to index global 'aa' (a nil value)
In order to prevent decompilation and make sure the code is safe,I use the following command to convert my code:
luac -o -s test.lua origin.lua
I know the argument -s is strip debug information, then it do not show the number of rows when crash:
lua: ?:0: attempt to index global 'aa' (a nil value)
but how to bring debugging information when encryption then lua code use luac?Is there any solution?
There is no way to do this built into Lua, but there are some work-arounds.
If you only need line numbers, then one option is to leave the line numbers in the chunk. Line numbers are not that useful for reverse engineering (unluac currently doesn't use them at all), so it shouldn't affect security. Lua doesn't provide an option for this, but it is easy to modify Lua to leave them in when stripping. From ldump.c
n = (D->strip) ? 0 : f->sizelineinfo;
can be changed to
n = f->sizelineinfo;
(Disclaimer: untested)
A more complicated option would be to modify the Lua runtime to output the virtual machine program counter instead of the line number, and also output information describing the location of the current function in the chunk (e.g. top level, first function, second function nested in third function, etc). Then the line number could be looked up by the developer in a non-stripped version of the chunk. (Here is a reference to someone using this approach on lua-l -- no source code was provided, though.)
Note that preventing decompilation is not true security. It may help against casual attacks, but Lua bytecode is not hard to read.
luac does not encrypt the output. It compiles your Lua source code to bytecode, that's all. The code is neither encrypted nor does it run any faster, only the loadtime is shorter since the compilation step is not needed.
If you want your code to be encrypted, I suggest to encrypt the bytecode using e.g. AES-256 and then decode it in memory just before handing it to the Lua state. This way the bytecode is encrypted on disk, but decripted in memory.
The overhead is low. We use this technique since years.

Dissassembly of Forth code words with 'see'

I am preparing overall knowledge on building a Forth interpreter and want to disassemble some of the generic Forth code words such as +, -, *, etc.
My Gforth (I currently have version 0.7.3, installed on Ubuntu Linux) will allow me to disassemble colon definitions that I make with the command see, as well as the single code word .. But when I try it with other code words, see + or see /, I get an error that says, Code +, and then I'm not able to type in my terminal anymore, even when I press control-c.
I should be able to decompile/disassemble the code words, as shown by the Gforth manual: https://www.complang.tuwien.ac.at/forth/gforth/Docs-html/Decompilation-Tutorial.html
Has anyone else had this issue, and do you know how to fix it?
Reverting to the old ptrace method did it for me.
First, from the command line as user root run:
echo 0 >/proc/sys/kernel/yama/ptrace_scope
After which see should disassemble whatever it can't decompile. Command line example (need not be root):
gforth -e "see + bye"
Output:
Code +
0x000055a9bf6dad66 <gforth_engine+2454>: mov %r14,0x21abf3(%rip) # 0x55a9bf8f5960 <saved_ip>
0x000055a9bf6dad6d <gforth_engine+2461>: lea 0x8(%r13),%rax
0x000055a9bf6dad71 <gforth_engine+2465>: mov 0x0(%r13),%rdx
0x000055a9bf6dad75 <gforth_engine+2469>: add $0x8,%r14
0x000055a9bf6dad79 <gforth_engine+2473>: add %rdx,(%rax)
0x000055a9bf6dad7c <gforth_engine+2476>: mov %rax,%r13
0x000055a9bf6dad7f <gforth_engine+2479>: mov -0x8(%r14),%rcx
0x000055a9bf6dad83 <gforth_engine+2483>: jmpq *%rcx
end-code
Credit: Anton Ertl
Most versions of SEE that I've seen are meant only for decompiling colon definitions. + and / and other arithmetic operations are usually written in assembly code and SEE doesn't know what to do with them. That's why you were getting the CODE error message: they're written in code, not Forth. There are several Forth implementations I've seen that have built in assemblers, but I don't think I've ever seen a dis-assembler. Your best bet for seeing the inner workings of + or / or other such words might be to use DUMP or another such word to get a list of the bytes in the word and either disassemble the word by hand or feed the data into an external disassembler. Or see if you can find the source code for your implementation or a similar one.
SEE is a word that has not a tightly controlled behaviour. It is a kind of best effort to show the code of a word X if invoked as
SEE X
It behaves slightly different according how difficult it is to do this. If you defined the word yourself in the session, you're pretty much guaranteed to get your code back. If it is a built in word, especially if it is a very elementary word like + , it is harder. It may look nothing much like the original definition, because of optimisation or compilation into machine code.
Specifically for gforth, if it gets hard gforth invokes the standard tools that are present on the system to analyse object files. So it may be necessary to install gdb and/or investigate how gforth tries to connect to it. For the concrete example of Ubuntu and gforth 0.7.3 Lutz Mueller gives a recipee.
.
I think SEE does it's job as designed.
There are words in FORTH defined in machine code (often called as primitives) and also there is a possibility to define machine code via assembler by the user ie.:
: MYCODE assembler memonics ;CODE
So the output of SEE shows not Code error, but that (ie.) + word was defined as machine code and one can see the disassembled mnenonics on the right of it's output.

COBOL - Microfocus - Generic I/O

I am responsible for converting an old UNIX based COBOL batch application that was developed by a consultant back in the 1990s to a Windows environment but still in COBOL using Microfocus (Eclipse, etc).
This is a pretty straight-forward task except for one little glitch.
The old application never did any explicit file handling within the COBOL. That is there are no FDs, OPENs, READs, WRITEs or CLOSE commands in the COBOL programs. Instead they wrote a C program that would do one of those different functions based on parameters passed to it (including, but not limited to file name, rec length, and the function desired.)
I would like to rewrite that subroutine in COBOL, which would require very little modifications to the COBOL main programs being converted. That is, it would still call that subroutine, but it would now be in COBOL instead of C.
But the challenge is how to write that subroutine so that it is able to act on most any file. I would think I have to go the route of variable length records because they could literally be any length up to to-be-determined maximum size, but seems like it would be vulnerable to error (as it tries to open different types of files).
Does anybody have any experience on this or ideas on a task like this? If not,l I may have to go the blunt force route of replacing each call statement to that subroutine with the specific COBOL command (Open, Read, etc) that needs to be performed and obviously FD and SELECT for every file would need to be added to the main program.
Thanks in advance.
You might be able to
CALL "subprogram" USING fd-name
where fd-name is
FD fd-name.
...
So, yes? maybe?, you might be able to pull off a subprogram that can take generic COBOL files. But, then you get into matching record layouts and other fun things, so, be wary. This might not work COBOL to COBOL, but it does work COBOL to C and back, as you end up passing a reference to the file control block.
You'll likely be better off looking into stock system libraries. Things like CBL_OPEN_FILE and CBL_READ_FILE if they are available. This will give you a much closer match to streaming IO that will be assumed in the current C subprogram.
Or, as Bill is suggesting in the comments, try and figure out why C was used and if you don't want the foreign functions, just dig in and write new COBOL procedures, as that will likely read better in the end.

what is appropriate for me? generateAllGrams() or is generateCollocations() enough for me?

I am developing a project on wordnet-based document summarizer.in that i need to extract collocations. i tried to research as much as I could, but since i have not worked with Mahout before I am having difficulty in understanding how CollocDriver.java works (in API context)
while scouring through the web, i landed on this :
Mahout Collocations
this is the problem: i have a POSTagged input text. i need to identify collocations in it.i have got collocdriver.java code..now i need to know how do i use it? whether to use generateAllGrams() method or only generateCollocations() method is enough for my subtask within my summarizer..??
and most importantly HOW to use it? i raise this question coz I admit, i dont know the API well,
i also got a grepcode version of collocdriver the two implementations seem to be slightly different..the inputs are in string for the grepcode version and in the form of Path object in the original...
my questions: what is configuration object in input params and how to use it?? will the source / destn will be in string (as in grepcode) or Path (as in original)??
what will be the output?
i have done some further R & D on collocdriver program...i found out that it uses a sequence file and then vector generation...i wanna know how this sequence file / vector generation works..plz help..
To get collocation using mahout,u need to follow some simple steps
1) You must make a sequence file from ur input text file.
/bin/mahout seqdirectory -i /home/developer/Desktop/colloc/ -o /home/developer/Desktop/colloc/test-seqdir -c UTF-8 -chunk 5
2)There are two ways to generate collocations from a sequence file.
a)Convert sequence file to sparse vector and find out the collocation
b)Directly find out the collocation from the sequence file (with out creating the sparse vector)
3)Here i am considering choice b.
/bin/mahout org.apache.mahout.vectorizer.collocations.llr.CollocDriver -i /home/developer/Desktop/colloc/test-seqdir -o /home/developer/Desktop/colloc/test-colloc -a org.apache.mahout.vectorizer.DefaultAnalyzer -ng 3 -p
Just check out the output folder,the files u need is over there !!! (in sequence file format)
/bin/mahout seqdumper -s /home/developer/Desktop/colloc/test-colloc/ngrams/part-r-00000 >> out.txt will give u a text output !!!

Resources