writing a parser for GUI debugger

writing a parser for GUI debugger - parsing

I want to write a GUI frontend to gdb, using MI. Currently I can communicate with gdb via pipe, but a GUI debugger should be able to display source code and allow users to check/modify data using thier mouse.
The question is, in order to know what variable the user is pointing at, I think I need to write a parser. However, I don't want to implement the whole lexer and parser things. How can I get the locations of those identifiers in the source code?
[EDIT]
In short, I want the user to be able to check the value of a variable by hover over the variable using mouse, so I have to parse the code to know where does each variable appear. I want to achieve functions like this:

How can I get the locations of those identifiers in the source code?
... without writing a parser.
You can't. You would need to either write your own (for all programming languages your GUI will support), or hook one of the existing ones.
Clang makes it relatively easy to incorporate the C/C++ parser into a GUI, but ...
not everything can be parsed with Clang
this one aspect of writing a GUI is likely to be 100x more complicated than all the others, so perhaps not worth the effort.

Related

Is there a sandboxable compiled programming language simililar to lua

I'm working on a crowd simulator. The idea is people walking around a city in 2D. Think gray rectangles for the buildings and colored dots for the people. Now I want these people to be programmable by other people, without giving them access to the core back end.
I also don't want them to be able to use anything other than the methods I provide for them. Meaning no file access, internet access, RNG, nothing.
They will receive get events like "You have just been instructed to go to X" or "You have arrived at P" and such.
The script should then allow them to do things like move_forward or how_many_people_are_in_front_of me and such.
Now I have found out that Lua and python are both thousands of times slower than compiled languages (I figured it would be in order of magnitude of 10s times slower), which is way to slow for my simulation.
So heres my question: Is there a programming language that is FOSS, allows me to restrict system access (sandboxing) the entire language to limit the amount of information the script has by only allowing it to use my provided functions, that is reasonably fast, something like <10x slower than Java, where I can send events to objects inside that language with which I can load in new Classes/Objects on the fly.

Don't you think that if there was a scripting language faster than lua and python, then it'd be talked about at least as much as they are?
The speed of a scripting language is rather vague term. Scripting languages essentially are converted to a series of calls to functions written in fast compiled languages. But the functions are usually written to be general with lots of checks and fail-safes, rather than to be fast. For some problems, not a lot of redundant actions stacks up and the script translation results in essentially same machine code as the compiled program would have. For other problems, a person, knowledgeable about the language, might coerce it to translate to essentially same machine code. For other problems the price of convenience stay forever with the script.
If you look at the timings of benchmark tasks, you'll find that there's no consistent winner across them. For one task the language is fastest, for the other it is way behind.
It would make sense to gauge language speed at your task by looking at similar tasks in benchmarks. So, which of those problem maps the closest to yours? My guess would be: none.
Now, onto the question of user programs inside your program.
That's how script languages came to existence in the first place. You can read up on why such a language may be slow for example in SICP.
If you evaluate what you expect people to write in their programs, you might decide, that you don't need to give them whole programming language. Then you may give them a simple set of instructions they can use to describe a few branching decisions and value lookups. Then your own very performant program will construct an object that encompasses the described logic. This tric is described here and there.
However if you keep adding more and more complex commands for users to invoke, you'll just end up inventing your own language. At that point you'll likely wish you'd went with Lua from the very beginning.
That being said, I don't think the snippet below will run significantly different in compiled code, your own interpreter object, or any embedded scripting language:
if event = "You have just been instructed to go to X":
set_front_of_me(X) # call your function
n = how_many_people_are_in_front_of_me() #call to your function
if n > 3:
move_to_side() #call to function provided by you
else:
move_forward() #call to function provided by you
Now, if the users would need to do complex computer-sciency stuff, solve np-problems, do machine learning or other matrix multiplications, then yes, that would be slow, provided someone would actually trouble themselves with implementing that.
If you get to that point, it seem that there are at least some possibilities to sandbox the compiled dlls (at least in some languages). Or you could do compilation of users' code yourself to control the functionality they invoke and then plug it in as a library.

What's the fastest way to read a large file in Ruby?

I've seen answers to this question but I couldn't figure out which of the answers would perform the fastest. These are the answers I've seen- which is best?
Read one line at a time using each or each_line
Read one line at a time using gets
Save it all into an array of lines using readlines and then use each
Use grep (not sure what exactly to do with grep...)
Use sed (not sure what exactly to do with sed...)
Something else?
Also, would it be better to just use another language or should Ruby be fine?
EDIT:
More details: Each line contains something like "id1 attr1_1 attr2_1 id2 attr1_2 attr2_2... idn attr1_n attr2_n" (n is very big) and I need to insert those into a database. For that example line, I would need to insert n rows into the database.

Ruby will likely be using the same or very similar low-level code (written in C) to do the actual reading from disk for the first three options, so they should perform similarly. Given that, you should choose whichever is most convenient for you; the ability to do that is what makes languages like Ruby so useful! You will be reading a lot of data from disk, so I would suggest using each_line and processing each line as you read it.
I would not recommend bringing grep, sed, or any other such external utilities into the picture unless you have a very good reason, as they will make your code less portable and expose you to failures that may be difficult to diagnose.

If you're using Ruby then there's no need to worry about performance. The language is such that it suits an iterative approach to reading a file, line by line, and works very nicely. So long as you're using the language the way it's designed you can let the interpreter people worry about performance. Job done.
If one particular readLargeFileFast method is needed then it should be because it's really hindering the program somehow. Now, you write a C program to do it and popen it as a separate process within your ruby code. You could call it read_large.c and (perhaps) use command line arguments to tell it how to behave.
This is championing the idea that a scripting language is used for a fast development rather than a fast run time. As such a developer can be very productive by swiftly 'prototyping' a program in something like Ruby and only later rewriting the components warrant some low level code. Often, however, once it's working in script, it's not necessary to do anything else at all.
The Ruby Docs describe launching a separate process and treating it as a file. It's easy-peasy! A good start is The Art of Linux Programming's introductory paragraph on program modularity. This book also makes a great example of using linux's standard stream editor, called sed, which you could probably use from Ruby right now.
If you need to parse or edit a lot of text then many interpreters or editors have been written around sed's functionality. Further, it may save you a lot of effort writing something super efficient if you don't know C. Good is the Introduction to SED by Bruce Barnett.

How to get started with embeddable scripting?

I am working on a game in C++. I've been told, though, that I should also use an embeddable scripting language like Lua or Angelscript, but to be honest, I have no idea how or why. What advantages would this bring me, over storing all of my data in some sort of text file? How do I get started? I tried to read some Lua examples, but I don't see how it works, or how exactly I am supposed to use it.

First the "why" question:
If you've made reasonable progress so far, you have game scenery where the action happens, and then a kind of GUI with your visible game controls: Maps, compass, hotkeys, chat box, whatever.
If you make the GUI (positions, sizes, settings, defaults, etc) configurable through a configuration file, that's OK for starters. But if you make it controllable by code then you can do many very cool things. Example: Minimize the map when entering a city. Show other player's portraits when in group. Update the map. Display different hot keys in combat. That kinda thing.
Now you can do your code-controlling of your GUI in C/C++ code, but one problem is that whenever you want to change the behavior, even if only a little, you need to recompile the whole dang game client. If you have a billion players, you have to ship them all a new game client. That's a turn-off. Another problem is that there's no way on earth that a player can customize the GUI.
A simple embedded language solves both problem. You can put that kind of code in separate files that get loaded at runtime and can be fiddled with to anyone's heart's content. If you want to update the GUI in some minor way, you can deliver updates of the GUI code separately from the game proper.
As for the how:
The simplest thing to do is to call a (e.g.) Lua "main" routine once for every frame, perhaps passing a bunch of parameters with the latest updatable information, and let that main routine call other functions to do whatever's needed. The thing to be aware of is that your embedded code only gets control for a short time, namely the time between two screen refreshes; so it does a little updating and painting, then it exits again and returns control to your C/C++ main program loop.
Technically, embedding a Lua interpreter in your program is pretty easy. The Lua interpreter has C source code, or there's pre-compiled libraries (DLLs) for Windows. Just link them into your program, initialize once, call the entry point on every iteration of the main frame loop, done.

Scripts are more powerful than storing all of your data in text files. You can assign arbitrary behavior, construct data from other data (e.g., orc captains are orcs with a bit more), and so on.
Scripts allow for faster development and easier maintenance than C++. No compile / edit / link cycle, you can even tweak the scripts while the game is running, and they're easier to update on end users' machines.
As far as the how, one suggestion would be to see how other games do it. For example, TOME, a Roguelike RPG written in C, uses Lua extensively.

For some inspiration, check out the Alternate Hard and Soft Layers pattern described on the C2 wiki.
As for my two cents, why embed a scripting language? Some reasons that I've experienced include,
REPL
easy string manipulation tools
leverage the power of loops, macros, and recursion within your data set
create dynamically generated content
wrappers to fetch content from the web
logic to provide default values if data is missing
unit tests written at the data set level

What do people do with Parsers, like antlr javacc?

Out of curiosity, I wonder what can people do with parsers, how they are applied, and what do people usually create with it?
I know it's widely used in programming language industry, however I think this is just a tiny portion of it, right?

Besides special-purpose languages, my most ambitious use of a parser generator yet (with good old yacc back in C, and again later with pyparsing in Python) was to extract, validate and possibly alter certain meta-info from SQL queries -- parsing SQL properly is a real challenge (especially if you hope to support more than one dialect!-), a parser generator (and a lexer it sits on top of) at least remove THAT part of the job!-)

They are used to parse text....
To give a more concrete example, where I work we use lexx/yacc to parse strings coming over sockets.
Also from the name it should give you an idea what javacc is used for (java compiler compiler!)

Generally to parse Domain Specific Languages or scripting languages, or similar support for code snipits.

Previously I have seen it used to parse the command line based output of another software tool. This way the outer tool (VPN software) could re-use the base router IPSec code without modification. As lots of what was being parsed was IP Route tables and other structured repeated text.
Using a parser allowed simple changes when the formatting changed, instead of trying to find and tweak the a hand written parser. And the output did change a few times of the life of the product.

I used parsers to help process +/- 800 Clipper source files into similar PRGs that could be compiled with Alaksa Xbase 32.

You can use it to extend your favorite language by getting its language definition from their repository and then adding what you've always wanted to have. You can pass the regular syntax to your application and handle the extension in your own program.

How do I implement forward references in a compiler?

I'm creating a compiler with Lex and YACC (actually Flex and Bison). The language allows unlimited forward references to any symbol (like C#). The problem is that it's impossible to parse the language without knowing what an identifier is.
The only solution I know of is to lex the entire source, and then do a "breadth-first" parse, so higher level things like class declarations and function declarations get parsed before the functions that use them. However, this would take a large amount of memory for big files, and it would be hard to handle with YACC (I would have to create separate grammars for each type of declaration/body). I would also have to hand-write the lexer (which is not that much of a problem).
I don't care a whole lot about efficiency (although it still is important), because I'm going to rewrite the compiler in itself once I finish it, but I want that version to be fast (so if there are any fast general techniques that can't be done in Lex/YACC but can be done by hand, please suggest them also). So right now, ease of development is the most important factor.
Are there any good solutions to this problem? How is this usually done in compilers for languages like C# or Java?

It's entirely possible to parse it. Although there is an ambiguity between identifiers and keywords, lex will happily cope with that by giving the keywords priority.
I don't see what other problems there are. You don't need to determine if identifiers are valid during the parsing stage. You are constructing either a parse tree or an abstract syntax tree (the difference is subtle, but irrelevant for the purposes of this discussion) as you parse. After that you build your nested symbol table structures by performing a pass over the AST you generated during the parse. Then you do another pass over the AST to check that identifiers used are valid. Follow this with one or more additional parses over the AST to generate the output code, or some other intermediate datastructure and you're done!
EDIT: If you want to see how it's done, check the source code for the Mono C# compiler. This is actually written in C# rather than C or C++, but it does use .NET port of Jay which is very similar to yacc.

One option is to deal with forward references by just scanning and caching tokens till you hit something you know how to real with (sort of like "panic-mode" error recovery). Once you have run thought the full file, go back and try to re parse the bits that didn't parse before.
As to having to hand write the lexer; don't, use lex to generate a normal parser and just read from it via a hand written shim that lets you go back and feed the parser from a cache as well as what lex makes.
As to making several grammars, a little fun with a preprocessor on the yacc file and you should be able to make them all out of the same original source

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart