What are some real-world projects done in concatenative languages like Forth, Factor, Joy, etc.?
factorcode.org, concatenative.org and tinyvid.tv are powered by Furnace, a Factor web server and framework.
PostScript is concatenative, and there's obviously a huge number of applications of PostScript. It's just not a general purpose programming language.
As Greg wrote, postscript is the mammoth example.
Concatenative languages pop up everywhere, quite naturally, because of the trivial nature of the language runtime. It's a favourite for many firmwares: I first encountered Forth "in the flesh" in the bootloader for a Sun Sparcstation. It powers the firmware for the OLPC.
Ocaml's parent, Caml was based on realising the semantics of functional programming as the Categorical Abstract Machine (the CAM in Caml).
Bibtex uses a concatenative language to compile style files.
There is the somewhat-obsolete but very cool Quartus Forth for Palm which allowed full compiled application development on the Palm device (Forth as a minimalist language works rather well in those circumstances). Their home page lists several Palm apps.
This FIG page has a list of mostly-embedded projects including a reference to the very cool use of Forth by NASA.
I met a guy at an Apple conference in Queensland back in about 1991 who had retailed a road planning application written in MacForth.
Christopher Diggins was talking about his Cat language being used inside Microsoft to help optimise compilers but I don't know if that went anywhere.
I suspect PowerMOPS (the successor to Neon) may elude the definition of concatenative because its big deal is adding object-orientation, which implies instances.
Take a look at FORTH Inc, They list several projects that they and their customers did, using their FORTH.
Eserv and nncron are written in SP-Forth.
Bitcoin protocol, and most of the other cryptocoins, uses pubkey scripts and signature scripts for validation of transactions:
Pubkey scripts and signature scripts combine secp256k1 pubkeys and signatures with conditional logic, creating a programable authorization mechanism.
These scripts are written in a concatenative language:
The script language is a Forth-like stack-based language deliberately designed to be stateless and not Turing complete. Statelessness ensures that once a transaction is added to the block chain, there is no condition which renders it permanently unspendable. Turing-incompleteness (specifically, a lack of loops or gotos) makes the script language less flexible and more predictable, greatly simplifying the security model.
Part of the firmware on Macs (at least in the older PowerPC models) was written in Forth.
See: Link
Related
I'm currently working on a project that makes use of a custom language with a simple context-free grammar.
Due to the project's characteristics the same language will have to be used on several platforms, especially mobile ones. Currently, I'm using my small hand-written Java parser (for the Android platform). Soon, I'll have to write basically the same parser for JavaScript and later possibly also for C# (Windows Phone) and Objective C (iOS). There is an additional chance that I'll also have to write it for PHP.
My question is: What options are there to simplify the parser development process? Do I really have to write basically the same parser for each platform or is there a less work-intensive way?
From a development process point of view the best alternative would enable me to write a grammar definition which would then automatically be compiled into a parser.
However, basically the only cross-platform parser generator I've found so far it the GOLD Parser which supports two of my target platforms (Java and C#). It would really be awesome if you could point me to other alternatives.
In case you don't know about other cross-platform compiler-compilers: Do you have hints how to structure the code towards future language extensibility?
I commend https://en.wikipedia.org/wiki/Comparison_of_parser_generators to your attention: if we restrict the domain to Java and C/C++, it suggests APG, GOLD, SableCC, and SLK (amongst others) as being cross-language enough for your stated goals. (I'm also requiring that the action code be separated from the grammar rather than inline, since the latter would defeat the purpose.) If you want JavaScript as well, it looks like your choices are APG (GPL-licensed) and WaxEye (MIT-licensed).
If your language is reasonably simple then I would say to just go with whichever you think will be easiest to integrate into your build environment(s) and has a reasonable match with how you think. Unless parsing time is a huge fraction of your application's total workload, parsing speed should not be an issue -- although table size and memory usage might matter in a mobile context. If your grammar is "simple enough," (i.e. not Perl, for instance) I would expect any of those tools to work.
Have a look in Antlr, I am using it for transforming java code and it is really great. Moreover you can find different grammars here.
REx parser generator supports the required targets, except for Objective C and PHP (code generators for those might be possible). It has not yet been published as open source, though, and there is no decent documentation, just sample grammars. But there are projects that are using it successfully, e.g. xqlint. Here is a paper describing the experience from that project.
I'm potentially interested in exploring a stack-based language like Forth (or Factor). What I'd like to see is how an application might be built from the ground up, step by step. The tutorials I've found are rudimentary and are not helping me to understand the bigger picture. It's confusing to think of how one might manage the stack when dealing with lots of parts.
I've always thought (maybe wrongly) that a good way to learn a language would be to use it to write a Roguelike game. I'm having trouble trying figure out how one would juggle a lot of things on a stack: a maze, dozens of creatures, treasures, character stats, etc.
In some sense, all languages are equivalent; you write a program by breaking a problem down into smaller pieces, then you code up those pieces and make them all work together. Forth has unusual syntax features, but it's still a programming language.
In fact, Forth places a great deal of power at your fingertips, much as Lisp does. In Lisp, using "macros", you can write your own control structures that are just as good as anything built-in; in Forth, you can do the same.
If you are interested in finding out more about Forth, I suggest you get the classic books Starting Forth and Thinking Forth by Leo Brodie.
Oh! Google just told me that both books are available free online now:
http://www.forth.com/starting-forth/
http://thinking-forth.sourceforge.net/
I'd point you to Factor rather than pure Forth; there are plenty of sample apps, GUIs, web apps, etc. If you're specifically interested in a web framework, look at Furnace.
Ultimately I don't understand the question; what does being stack-based have to do with getting anything done? Back in the old days it was the language of choice for embedded systems; I wrote everything from cereal boxing robots to calculators to... well, mostly everything.
I attempt to directly answer all of your questions at the bottom of this answer. But before that, here are 7 points or aspects of learning stack-based Forth, or generally about stack-based-languages, or ways to proceed with your goals:
1: Here is a key way to think about this: The stack is not the place where you keep your data -- just the way that you pass parameters from one function (or method) to another.
2: There is a "Basic Language" interpreter written in Forth. This would be one way to write an application in Forth -- write your own Basic interpreter (or type in one you find), then write the Roguelike game in the Basic language you just created.
3: If you go through the old "Forth Dimensions" magazine, several different object-oriented extensions to Forth were created and presented (including source code). Type one of those in, and use it to write your rogue-like game in one of those object-oriented ways. If you're still stuck, write it in your favorite C language -- C++, C#, or Java, and here's the key... write it as primitively as you can. Then take your own application apart -- each method, enum, constant, etc. constitutes part of a Domain Specific Language that you created in order to implement the game. Write down those parts, and figure out how to do it in Forth, ideally using only the idioms of Forth. You may have to work in Forth a while to get to that stage.
4: Learn how to program in assembly language, an assembly language for which a Forth exists that has an embedded assembler for the same microcontroller or CPU. This is where Forth really crushed the competition when computers were just becoming more widely available -- bringing a completely new microcontroller up, creating the BIOS and the OS. Once you've done this in pure assembly language, then go into the Forth that has the same assembly language and program the same application, but using the Forth assembler to build the building blocks (the DSL's) and then writing the BIOS and the OS in those DSL's. Forths happen to be both low-level and high-level at the same time, and the best-written applications are written in multiple DSL's, not in stack-based primitives. Once you've written in assembly language, when you try to learn Forth, you will find that it is easy -- the basic mechanisms are simple, but the combination of basic mechanisms is simple, powerful and elegant.
5: Look at either C# or Java, and realize that they are both based on Virtual Machines, which is a specific kind of DSL. But also realize that both Virtual Machines are stack machines, and the language that powers both of them are stack based languages. But if you program in Java, or if you program in C#, you really don't think at all about the stack foundation upon which you are running -- and that's a really good thing! You want to follow the same pattern. A. Realize what basic mechanisms / tools you need, B. Build the tools you need, C. Build your application with the tools you built. You may have to translate your own infix pseudocode to the postfix VM you've created. Do you need Object Orientation?--You can build it with Forth. Do you need multiprocessing?--You can build it. Do you need an Erlang kind of light multithreading?--You can build it. Do you need a Basic, a Pascal, a C#, a Java, a Lua?--You can build it. Build your Roguelike game in your favorite language, then perhaps a different language, then assembly language. Then build it in Forth.
6: At this link, there is a way to make a microprocessor that natively runs CIL (Common Intermediate Language -- the stack based language of both .NET and Mono, which used to be MSIL Microsoft Intermediate Language, before Microsoft Open Sourced the specification). You buy the IP logic block and then program it into an FPGA. Here is the Abstract of a paper that describes the object-oriented stack-based language and implementation of CIL:
Embedded systems and their applications are becoming ubiquitous and
transparent. Nowadays, the designers need to implement both hardware
and software as fast as they can to face the competition. Hence tools
and IPs became an important factor of the equation. In this paper, we
present a synthesizable core similar to the microarchitecture of
Tanenbaum's (2006) JVM processor. The core is an implementation of a
subset of Microsoft's CIL (Common Intermediate Language). We seek to
accelerate the development of embedded software by providing a
platform onto which the whole .NET framework (C#, Visual Basic.NET...)
(along with its object-oriented approach) could execute. We used a
Xilinx Virtex II Pro as the prototyping platform.
7: Go to the current work-place of Charles Moore, the father of Forth, GreenArrays, Inc. and look under Features for "Implements the colorForth instruction set", -- and you will find that he, too, has taken his virtual machine, having optimized Forth his entire life, and reduced it to an insanely fast 144 core microchip with a claimed capability of 100 BIPS. Here, you should be able to look at the "colorForth instruction set" and the description of it -- he has removed any fluff and bloat, and reduced it to an extremely spartan instruction set. We should learn from this genius before it is too late (he is advanced in years, and Covid19 currently threatens many of our patriarchs).
Below, I try to more directly handle each aspect of your question:
How would one code an application in Forth (or Factor)?
Same way as you would with any other application, learn the building blocks, and build the simplest thing that could possibly work, then grow it from there.
I'm potentially interested in exploring a stack-based language like Forth (or Factor).
There are a multitude of stack-based languages around. They seem to be the foundation, not only of most processors, but also of most virtual machines, as in Java and the JVM, and C# and the CIL.
What I'd like to see is how an application might be built from the ground up, step by step.
That's a great idea for a Forth book... Anybody game? I know there are a lot of Forth people out there, and because of Covid19, a lot of you are at home, too. What kink of application would you like to see? Just looking at FPC teaches a lot.
The tutorials I've found are rudimentary and are not helping me to understand the bigger picture.
There are higher level tutorials out there. But widen your search -- include JVM, CIL, and even LUA (because I think it has a VM, too). Write an app in Lisp. Then take the primitives you built in Lisp, and implement them in Forth, or Factor, or CIL, or JVM instructions.
It's confusing to think of how one might manage the stack when dealing with lots of parts.
That's why I said that you should learn an assembly language -- because most stack-based languages are a micro-step up from that. You have to learn the basic levers, springs, latches, and other primitive mechanisms out of which all of our software is ultimately made. But you only think in terms of stacks to make the primitive mechanisms -- once you have those made, in general, don't even touch the stack stuff -- think only in terms of the basic mechanisms, then build progressively higher layers, even progressing to object orientation, as exemplified in the CIL and JVM. The better applications are layers of implicit DSL's, each DSL targeted at a step up from the previous one. The layers are: Assembly-Language, CIL or other VM, BIOS, Operating-System, Library, Business-Language, Application. Most people only think of the Business-Language layer as a DSL, but every single layer is its own DSL, and really, every class, method, object, config-file, and specialized language, like HTML, CSS, JavaScript, etc. -- they are ALL DSL's, either explicitly (formally specified) or implicitly (created by the programmer as things are built.) Understanding gained from book "Code Complete" second edition -- read over and over is my suggestion.
I've always thought (maybe wrongly) that a good way to learn
a language would be to use it to write a Roguelike game.
Sure... but pick anything you like, really. But perhaps try getting an Arm-based Arduino, programming it, disassembling that, then writing your own assembly language -- then get a Forth for that Arm chip, and program it in that. Then implement your Roguelike game in the Arm-Forth.
I'm having trouble trying figure out how one would juggle a lot of
things on a stack: a maze, dozens of creatures, treasures, character
stats, etc.
Build your "Rogue" Domain-Specific-Language, make that into a VM, then implement the VM instructions first in a high level stack language, like CIL, then try to figure out how to do it in one of the low-level stack-languages. There are many flavors of object-orientation implemented in Forth -- spelunk the old Forth Dimensions articles
and some of the old Forth conference presentations to find them. Google, and the "Internet Archive", or Way-Back Machine, are your friends for this.
There are a lot of books and articles about creating compilers which do all the compilation job at a time. And what about design of incremental compilers/parsers, which are used by IDEs? I'm familiar with first class of compilers, but I have never work with the second one.
I tried to read some articles about Eclipse Java Development Tools, but they describe how to use complete infrastructure(i.e. APIs) instead of describing internal design(i.e. how it works internally).
My goal is to implement incremental compiler for my own programming language. Which books or articles would you recommend me?
This book is worth a look: Builing a Flexible Incremental Compiler Back-End.
Quote from Ch. 10 "Conclusions":
This paper has explored the design of
the back-end of an incremental
compilation system. Rather than
building a single fixed incremental
compiler, this paper has presented a
flexible framework for constructing such
systems in accordance with user needs.
I think this is what you are looking for...
Edit:
So you plan to create something that is known as a "cross compiler"?!
I started a new attempt. Until now, I can't provide the ultimate reference. If you plan such a big project, I'm sure you are an experienced programmer. Therefore it is possible, that you already know these link(s).
Compilers.net
List of certain compilers, even cross compilers (Translators). Unfortunately with some broken links, but 'Toba' is still working and has a link to its source code. May be that this can inspire you.
clang: a C language family frontend for LLVM
Ok, it's for LVVM but source is available in a SVN repository and it seems to be a front end for a compiler (translator). May be that this can inspire you as well.
I'm going to disagree with conventional wisdom on this one because most conventional wisdom makes unwritten assumptions about your goals, such as complete language designs and the need for extreme efficiency. From your question, I am assuming these goals:
learn about writing your own language
play around with your language until it looks elegant
try to emit code into another language or byte code for actual execution.
You want to build a hacking harness and a recursive descent parser.
Here is what you might want to build for a harness, using just a text based processor.
Change the code fragment (now "AT 0700 SET HALLWAY LIGHTS ON FULL")
Compile the fragment
Change the code file (now "tests.l")
Compile from file
Toggle Lexer output (now ON)
Toggle Emitter output (now ON)
Toggle Run on home hardware (now OFF)
Your command, sire?
You will probably want to write your code in Python or some other scripting language. You are optimizing your speed of play, not execution. A recursive descent parser might look like:
def cmd_at():
if next_token.type == cTIME:
num = next_num()
emit("events.setAlarm(events.DAILY, converttime(" + time[0:1] + ", "
+ time[2:] + ", func_" + num + ");")
match_token(cTIME)
match_token(LOCATION)
...
So you need to write:
A little menu for hacking.
Some lexing routines, to return different tokens for numbers, reserved words, and the like.
A bunch of logic for what your language
This approach is aimed at speeding up the cycle for hacking together the language. When you have finished this approach, then you reach for BISON, test harnesses, etc.
Making your own language can be a wonderful journey! Expect to learn. Do not expect to get rich.
I see that there is an accepted answer, but I think that some additional material could be usefully included on this page.
I read the Wikipedia article on this topic and it linked to a DDJ article from 1997:
http://www.drdobbs.com/cpp/codestore-and-incremental-c/184410345?pgno=1
The meat of the article is the first page. It explains that the code in the editor is divided into pieces that are "incorporated" into a "CodeStore" (database). The pieces are incorporated via a work queue which contains unincorporated pieces. A piece of code may be parsed and returned to the work queue multiple times, with some failure on each attempt, until it goes through successfully. The database includes dependencies between the pieces so that when the source code is edited the effects on the edited piece and other pieces can be seen and these pieces can be reprocessed.
I believe other systems approach the problem differently. Java presents different problems than C/C++ but has advantages as well, so Eclipse perhaps has a different design.
I have a task to convert COBOL code to .NET. Are there any converters available? I am trying to understand COBOL code in high level. I have a trouble understanding the COBOL code. Is there any flowchart generators? I appreciate any help.
Thank you..
Migrating software systems from one language or operating environment to another is always a challenge. Here are
a few things to consider:
Legacy code tends to be poorly structured as a result of a
long history of quick fixes and problem work-arounds. This really ups the signal-to-noise ratio
when trying to warp your head around what is really going on.
Converting code leads to further "de-structuring"
to compensate for mis-matches between the source and
target implementation platforms. When you start from a poorly structured base (legacy system),
the end result may be totally un-intelligible.
Documentation of the legacy architecture and/or business processes is generally so far out of
date that it is worse than useless, it may actually be misleading.
Complexity of COBOL code is almost always under estimated.
A number of "features" will be promulgated into the converted system that were originally
built to compensate for things that "couldn't be done" at one time (due to smaller memories,
slower computers etc.). Many of these may now be non-issues and you really don't want them.
There are no obvious or straight forward ways to refactor legacy process driven
systems into an equivalent object oriented system (at least not in a meaningful way).
There have been successful projects that migrated COBOL directly into Java. See naca.
However, the end result is only something its mother (or another COBOL programmer) could love, see this discussion
In general I would be suspicious of any product or tool making claims to convert your COBOL legacy
system into anything but another version of COBOL (e.g. COBOL.net). To this end you still
end up with what is essentially a COBOL system. If this approach is acceptable then you
might want to review this white paper from Micro Focus.
IMHO, your best bet for replacing COBOL is to re-engineer your system. If you ever find
a silver bullet to get from where you are to where you want to be - write a book, become
a consultant and make many millions of dollars.
Sorry to have provided such a negative answer, but if you are working with anything
but a trivial legacy system, the problem is going to be anything but trivial to solve.
Note: Don't bother with flowcharting the existing system. Try to get a handle on process input/output and program to program data transformation and flow. You need to understand the business function here, not a specific implementation of it.
Micro Focus and Fujitsu both have COBOL products that work with .NET. Micro Focus allow you to download a product trial, while the Fujitsu NetCOBOL site has a number of articles and case studies.
Micro Focus
http://www.microfocus.com/products/micro-focus-developer/micro-focus-cobol/windows-and-net/micro-focus-visual-cobol.aspx
Fujitsu
http://www.netcobol.com/products/Fujitsu-NetCOBOL-for-.NET/overview
[Note: I work for Micro Focus]
Hi
Actually, making COBOL applications available on the .NET framework is pretty straightforward (contrary to claim made in one of the earlier responses). Fujitsu and Micro Focus both have COBOL compilers that can create ILASM code for execution in the CLR.
Micro Focus Visual COBOL (http://www.microfocus.com/visualcobol) makes it particularly easy to deploy traditional, procedural COBOL as managed code with full support for COBOL data types, file systems etc. It also includes an updated OO COBOL syntax that takes away a lot of the verbosity & complexity of the syntax to be very easy to write COBOL code based on C# examples. It's unique approach also makes it easy to use all the Visual Studio tools such as IntelliSense.
The original question mentioned "convert" and I would strongly recommend against any approach that requires the source code to be converted to some other language before being used in a .NET environment. The amount of effort and risk involved is highly unlikely to be worth any benefits accrued. On the contrary, keeping the code in COBOL maintains the existing, working code and allows for the option to deploy onto other platforms in the future. For example, how about having a single set of source code and having the option to deploy into .NET as a native language and into a Java environment without changing a line of source code?
I recommend you get a trial copy of Visual COBOL from the link above and see how you can use your existing code in .NET without making any changes.
This is not an easy task. COBOL has fundamental ideas about data types that do not map well with the object-oriented .NET framework (e.g. in COBOL, all data types are represented in terms of fixed-size buffers) and in particular the way groups and arrays work do not map well to .NET classes.
I believe there are COBOL compilers that can actually compile .NET bytecode, but they would have their own runtime libraries to manage all of that. It might be worth looking at one of these compilers and simply leaving the legacy code in COBOL.
Other than that, line-by-line translation is probably not possible. Look at the code at a higher level and translate blocks of code at a time (e.g. at the procedure level or even higher).
There are a lot mechanisms how to convert COBOL to modern scalable environments, such as .NET or Java.
The first is a migration to a new environment with saving the existing COBOL code with some minor modifications (NET Microfocus COBOL);
The second is a migration to a new platform with simulation of COBOL statements and constructions. When there are some additional NET/Java libraries to simulate some specific COBOL logic:
ACCEPT goes to NETLibrary.Accept and so on.
The third approach is the most valuable one, when you migrate to "pure" NET/Java code with all the benefits of the new environment. It can be easily maintained and developed in the future.
However, the unique expertise and toolkits are required for this approach, and there are only a few players on the global market that can help you in this case.
If we are talking about automatic migration, the number of players decreases greatly and, unfortunately for you, you have to pay for the specific technologies and tools (like ours).
However, it is a better idea to invest your money in your future growth in the modern environment, than to spend your money on the "simulation" of old technologies.
Translations is not an easy task. Besides Micro Focus and Fujitsu there is also Raincode that offers a free version of Cobol that nicely integrates with Visual Studio.
I'm doing some small projects which involve having different syntaxes for something, however sometimes these syntaxes are so easy that using a parser generator might be overkill.
Now, when should I use a hand-made parser, and when should I use a parser generator?
Thanks,
William van Doorn
There is no hard-and-fast answer, other than "use whatever is easiest for the particular situation".
My experience is that parsers tend to get more complicated over their lifetimes, so using a parser generator up front usually pays off. Even if the language doesn't get more complicated, using a generator forces you to create a formal specification of the syntax, which is itself valuable.
The downsides are that other programmers may not know how to use the generator, so it makes it difficult for others to help out, and it makes your project dependent on that generator.
It's worth coding the parser by hand if, and only if, you're super-keen to have it be extremely fast even on a machine of very modest speed. For example, in this article on the history of Turbo Pascal from before it got its name, you can see how and why the prototype impressed the small (then Danish) firm "Borland" to hire the prototype's author (Anders Hejlsberg), fully develop the compiler, and launch it as its main product, and I quote...:
with no great expectations I hit the
compile key - AND THEN I WAS
COMPLETELY FLOORED! My test program,
that took minutes to compile and link
using Digital Research’s Pascal MT+,
was compiled and running before I
could blink an eye! That was a great
WOW moment!
Turbo Pascal's amazing compile speed -- coming first and foremost from a carefully hand-coded and highly tuned recursive descent parser coded in assembly language -- allowed it to use a very different strategy from most compilers: no separate compilation pass generating object files and libraries, and then a linker to put them together, rather, Turbo Pascal 1.0 was a single-pass compiler that directly turned source code into a single executable binary.
I remember just the same amazing experience on the tiny personal computers of that era (when a Z80, 64K or RAM, and two floppies was a lot;-) -- Turbo Pascal, with its amazing parser and the IDE and everything else, fit comfortably in memory together with a substantial program in both source and compiled form -- no floppies were needed, which meant many orders of magnitude of difference in program turnaround time.
If Hejlsberg had stuck to what was already the traditional wisdom at the time -- always use parser generators -- Turbo Pascal would probably never have emerged as a commercial product, and definitely not achieved the dominance in the Pascal world it enjoyed for years.
Of course, on a typical PC of today, such extreme parsing speed would not be needed for most compilers. Possible exceptions include compilers that must run seamlessly as part of an "interpreter-like" environment (the simple compilers for languages such as Perl and Python are typically hand-coded, to substantial extents, for that reason -- that was an implementation choice that made them viable in the '90s, although today it's not clear it's still needed), or compilers that run on very limited hardware resources, such as smartphones or low-end netbooks.
In the vast majority of cases in which you'll be writing a compiler, none of these performance considerations probably apply, and you'll be happier with a parser generator.
Your question title suggests that using a grammar is optional. It really isn't - even if I was going to implement a tiny language, I'd sketch out a grammar on a single sheet of paper.
As for when to use parser generators, this is really personal preference. Many people believe in hand-writing recursive descent parsers, rather than using the table-driven approach, for example. The important thing is to be comfortable in understanding the capabilities of the generator.
And don't be thinking that using parser generators is somehow the more professional, or even the easier approach. Bjarne Stroustrup when writing the first C++ compiler intended to use recursive descent, but got talked out of it by some keen colleagues at Bell Labs, much to his eventual chagrin. See section 3.3.2 of The Design and Evolution of C++ for more details.