Proof of Turing Completeness for a stack-based language - stack

I'm writing a joke language that is based on stack operations. I've tried to find the minimum amount of instructions necessary to make it Turing complete, but have no idea if a language based on one stack can even be Turing complete. Will these instructions be enough?
IF (top of stack is non-zero)
WHILE (top of stack is non-zero)
PUSH [n-bit integer (where n is a natural number)]
POP
SWAP (top two values)
DUPLICATE (top value)
PLUS (adds top two values, pops them, and pushes result)
I've looked at several questions and answers (like this one and this one) and believe that the above instructions are sufficient. Am I correct? Or do I need something else like function calls, or variables, or another stack?
If those instructions are sufficient, are any of them superfluous?
EDIT: By adding the ROTATE command (changes the top three values of the stack from A B C to B C A) and eliminating the DUPLICATE, PLUS, and SWAP commands it is possible to implement a 3 character version of the Rule 110 cellular automaton. Is this sufficient to prove Turing completeness?
If there is an example of a Turing complete one-stack language without variables or functions that would be great.

If you want to prove that your language is Turing complete, then you should look at this Q&A on the Math StackExchange site.
How to Prove a Programming Language is Turing Complete?
One approach is to see if you can write a program using your language that can simulate an arbitrary Turing Machine. If you can, that is a proof of Turing completeness.
If you want to know if any of those instructions are superfluous, see if you can simplify your TM emulator to not use one of the instructions.
But if you want to know if a smaller Turing complete language is possible, look at SKI Combinator Calculus. Arguably, there are three instructions: the S, K and I combinators. And I is apparently redundant.

A language based only on a single stack can't be Turing complete (unless you "cheat" by allowing stuff like temporary variables or access to values "deeper" in the stack than the top item). Such a language is, as I understand it, equivalent to a Pushdown Automata, which can implement some stuff (e.g. context-free grammars) but certainly not as much as a full Turing machine.
With that said, Turing machines are actually a much lower bar than you'd intuitively expect - as originally formulated, they were little more than a linked list, the ability to read and modify the linked list, and branching. You don't even need to add all that much to a purely stack-oriented language to make it equivalent to a Turing machine - a second stack will technically do it (although I certainly wouldn't want to program against it), as would a linked list or queue.
Correct me if I'm wrong, but I'd think that establishing that you can read from and write to memory, can do branching, and have at least one of those data structures (two stacks, one queue, one linked list, or the equivalent) would be adequate to establish Turing completeness.
Take a look, too, at nested stack automata.
You may also want to look at the Chomsky hierarchy (it seems like you may be floating somewhere in the vicinity of a Type 1 or a Type 2 language).

As others have pointed, if you can simulate any Turing machine, then your language is Turing-complete.
Yet Turing machines, despite their conceptual simplicity and their amenity to mathematical treatment, are not the easiest machines to simulate.
As a shortcut, you can simulate some simple language that has already been proved Turing-complete.
My intuition tells me that a functional language, particularly LISP, might be a good choice. This SO Q&A has pointers to what a minimum Turing-complete LISP looks like.

Related

Is subroutine executed without stack? Justify your answer with valid arguments

Is subroutine executed without stack? Justify your answer with valid arguments
This is more of a homework (or exam) question; particularly the arguments expected will be in the lectures and/or course material.
In practice, many languages do both, but in such a way that it's indistinguishable from always using the stack, because the stack is needed to handle recursion (and, these days, reentrancy), and executing a subroutine without using the stack is treated purely as an optimisation (often, "inlining").
A few very old languages (eg FORTRAN and COBOL) default to not supporting recursion (much less reentrancy), and therefore may or may not use the stack unless a subroutine is specifically marked as "recursive". Whether they use the stack or not for non-recursive subroutines is up to the compiler (and may differ from version to version or even from subroutine to subroutine).
How this maps to the expected answer on your homework (or exam) depends on how these aspects were covered in the lectures and/or course materials; different courses will emphasise different parts, particularly if they deal with a specific programming language (eg. Python vs C/C++/C# vs FORTRAN).

Is there a sandboxable compiled programming language simililar to lua

I'm working on a crowd simulator. The idea is people walking around a city in 2D. Think gray rectangles for the buildings and colored dots for the people. Now I want these people to be programmable by other people, without giving them access to the core back end.
I also don't want them to be able to use anything other than the methods I provide for them. Meaning no file access, internet access, RNG, nothing.
They will receive get events like "You have just been instructed to go to X" or "You have arrived at P" and such.
The script should then allow them to do things like move_forward or how_many_people_are_in_front_of me and such.
Now I have found out that Lua and python are both thousands of times slower than compiled languages (I figured it would be in order of magnitude of 10s times slower), which is way to slow for my simulation.
So heres my question: Is there a programming language that is FOSS, allows me to restrict system access (sandboxing) the entire language to limit the amount of information the script has by only allowing it to use my provided functions, that is reasonably fast, something like <10x slower than Java, where I can send events to objects inside that language with which I can load in new Classes/Objects on the fly.
Don't you think that if there was a scripting language faster than lua and python, then it'd be talked about at least as much as they are?
The speed of a scripting language is rather vague term. Scripting languages essentially are converted to a series of calls to functions written in fast compiled languages. But the functions are usually written to be general with lots of checks and fail-safes, rather than to be fast. For some problems, not a lot of redundant actions stacks up and the script translation results in essentially same machine code as the compiled program would have. For other problems, a person, knowledgeable about the language, might coerce it to translate to essentially same machine code. For other problems the price of convenience stay forever with the script.
If you look at the timings of benchmark tasks, you'll find that there's no consistent winner across them. For one task the language is fastest, for the other it is way behind.
It would make sense to gauge language speed at your task by looking at similar tasks in benchmarks. So, which of those problem maps the closest to yours? My guess would be: none.
Now, onto the question of user programs inside your program.
That's how script languages came to existence in the first place. You can read up on why such a language may be slow for example in SICP.
If you evaluate what you expect people to write in their programs, you might decide, that you don't need to give them whole programming language. Then you may give them a simple set of instructions they can use to describe a few branching decisions and value lookups. Then your own very performant program will construct an object that encompasses the described logic. This tric is described here and there.
However if you keep adding more and more complex commands for users to invoke, you'll just end up inventing your own language. At that point you'll likely wish you'd went with Lua from the very beginning.
That being said, I don't think the snippet below will run significantly different in compiled code, your own interpreter object, or any embedded scripting language:
if event = "You have just been instructed to go to X":
set_front_of_me(X) # call your function
n = how_many_people_are_in_front_of_me() #call to your function
if n > 3:
move_to_side() #call to function provided by you
else:
move_forward() #call to function provided by you
Now, if the users would need to do complex computer-sciency stuff, solve np-problems, do machine learning or other matrix multiplications, then yes, that would be slow, provided someone would actually trouble themselves with implementing that.
If you get to that point, it seem that there are at least some possibilities to sandbox the compiled dlls (at least in some languages). Or you could do compilation of users' code yourself to control the functionality they invoke and then plug it in as a library.

Who first come up with the idea of the call stack?

It's simple yet fast and effective because of the locality property.
You also manage the memory, a finite resource, by adjusting just one pointer.
I think it's a brilliant idea.
Who first come up with the idea of the call stack?
Since when does computers have supporting instructions for stack?
Is there any historically significant paper on it?
As far as I know, computers have used a call stack since its earliest days.
Stacks themselves were first proposed by Alan Turing, dating back to 1946. I believe that stacks were first used as a theoretical concept to define pushdown automatons.
The first article about call stacks I could find was written by Dijkstra on Numerische Mathematik journal, titled "Recursive Programming" (http://link.springer.com/article/10.1007%2FBF01386232).
Also note that the call stack is there mainly because of recursion. It may be difficult to actually know who really got the idea for the call stack in the first place, since it is pretty intuitive that a stack is needed if you want to support recursion. Consider this quote from Expert C Programming - Deep C Secrets, by Peter Van Der Linden:
A stack would not be needed except for recursive calls. If not for
these, a fixed amount of space for local variables, parameters, and
return addresses would be known at compile time and could be allocated
in the BSS. [...] Allowing recursive calls means that we must find a
way to permit multiple instances of local variables to be in existence
at one time, though only the most recently created will be accessed -
the classic specification of a stack.
This is from chapter 6, page 143 / 144 - if you like this kind of stuff, I highly recommend you to read it.
It is easy to understand that a stack is probably the right structure to use when one wants to keep track of the call chain, since function calls on hold will return in a LIFO manner.

Relationship between parsing, highlighting and completion

For some time now I've been thinking about designing a small toy language from scratch, nothing that will "Rule The World", but mostly as an exercise. I realize there is a lot to learn in order to accomplish this.
This question is about three different concepts (parsing, code highlighting and completion) that strike me as extremely similar. Of course, parsing and ASTgen is part of the compilation, while code highlighting and completion is more of a feature of the IDE, yet I wonder what are the similarities and differences.
I need some hints from someone more experienced in this topic. What code can be shared between these concepts and what are the architecture considerations that could help in this sense?
What you want is a syntax-directed structure editor. This is one that combines parsing with AST building and uses the parser to predict what you can type next (either syntax completion), or has a tie to the compiler's last run, so that it can interpret the edit point to see what valid identifiers might come next by inspecting the compiler's symbol table that was last relevant at that point in the code.
The most difficult part is offering the user a seamless experience; she pretty much has to believe she is editing text or (experience with structure editors shows) she will reject it as awkward.
This is a lot of machinery to coordinate and quite a big effort. The good news is that you need a parser anyway for the compiler; if editing also parses, the AST needed by the compiler is essentially available. (Of course you have to worry about batch compiling, too). The compiler has to build a symbol table; so you can use that in the editing completion process. The more difficult news is that the parsers are a lot harder to build; they can't just declare a user-visible syntax error and quit; rather they have to be tolerant of a number of errors extant at the same moment, hold partial ASTs for the pieces, and stitch them together as the errors are removed by the user.
The Berkeley Harmonia people are doing good work in this area. It is well worth your trouble to read some of their papers to get a detailed sense of the problems and one approach to handling them.
THe other major approach people (notably Intentional Programming and XText) seem to be trying are object-oriented editors, where you attach editing actions to each AST node, and associate every point on the screen with an AST node. Then editing actions invoke AST-node specific actions (insert-character, go right, go up, ...) and it can decide how to act and how to modify the screen. Arguably you can make these editors do anything; its a little harder in practice. I've used these editors; they don't feel like text editors. There are some enthusiastic users, but YMMV.
I think you probably ought to choose between trying to build such an editor, vs. trying to define a new langauge. Doing both at once is likely to overwhelm you with troubles.

Thoughts on minimize code and maximize data philosophy

I have heard of the concept of minimizing code and maximizing data, and was wondering what advice other people can give me on how/why I should do this when building my own systems?
Typically data-driven code is easier to read and maintain. I know I've seen cases where data-driven has been taken to the extreme and winds up very unusable (I'm thinking of some SAP deployments I've used), but coding your own "Domain Specific Languages" to help you build your software is typically a huge time saver.
The pragmatic programmers remain in my mind the most vivid advocates of writing little languages that I have read. Little state machines that run little input languages can get a lot accomplished with very little space, and make it easy to make modifications.
A specific example: consider a progressive income tax system, with tax brackets at $1,000, $10,000, and $100,000 USD. Income below $1,000 is untaxed. Income between $1,000 and $9,999 is taxed at 10%. Income between $10,000 and $99,999 is taxed at 20%. And income above $100,000 is taxed at 30%. If you were write this all out in code, it'd look about as you suspect:
total_tax_burden(income) {
if (income < 1000)
return 0
if (income < 10000)
return .1 * (income - 1000)
if (income < 100000)
return 999.9 + .2 * (income - 10000)
return 18999.7 + .3 * (income - 100000)
}
Adding new tax brackets, changing the existing brackets, or changing the tax burden in the brackets, would all require modifying the code and recompiling.
But if it were data-driven, you could store this table in a configuration file:
1000:0
10000:10
100000:20
inf:30
Write a little tool to parse this table and do the lookups (not very difficult, right?) and now anyone can easily maintain the tax rate tables. If congress decides that 1000 brackets would be better, anyone could make the tables line up with the IRS tables, and be done with it, no code recompiling necessary. The same generic code could be used for one bracket or hundreds of brackets.
And now for something that is a little less obvious: testing. The AppArmor project has hundreds of tests for what system calls should do when various profiles are loaded. One sample test looks like this:
#! /bin/bash
# $Id$
# Copyright (C) 2002-2007 Novell/SUSE
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as
# published by the Free Software Foundation, version 2 of the
# License.
#=NAME open
#=DESCRIPTION
# Verify that the open syscall is correctly managed for confined profiles.
#=END
pwd=`dirname $0`
pwd=`cd $pwd ; /bin/pwd`
bin=$pwd
. $bin/prologue.inc
file=$tmpdir/file
okperm=rw
badperm1=r
badperm2=w
# PASS UNCONFINED
runchecktest "OPEN unconfined RW (create) " pass $file
# PASS TEST (the file shouldn't exist, so open should create it
rm -f ${file}
genprofile $file:$okperm
runchecktest "OPEN RW (create) " pass $file
# PASS TEST
genprofile $file:$okperm
runchecktest "OPEN RW" pass $file
# FAILURE TEST (1)
genprofile $file:$badperm1
runchecktest "OPEN R" fail $file
# FAILURE TEST (2)
genprofile $file:$badperm2
runchecktest "OPEN W" fail $file
# FAILURE TEST (3)
genprofile $file:$badperm1 cap:dac_override
runchecktest "OPEN R+dac_override" fail $file
# FAILURE TEST (4)
# This is testing for bug: https://bugs.wirex.com/show_bug.cgi?id=2885
# When we open O_CREAT|O_RDWR, we are (were?) allowing only write access
# to be required.
rm -f ${file}
genprofile $file:$badperm2
runchecktest "OPEN W (create)" fail $file
It relies on some helper functions to generate and load profiles, test the results of the functions, and report back to users. It is far easier to extend these little test scripts than it is to write this sort of functionality without a little language. Yes, these are shell scripts, but they are so far removed from actual shell scripts ;) that they are practically data.
I hope this helps motivate data-driven programming; I'm afraid I'm not as eloquent as others who have written about it, and I certainly haven't gotten good at it, but I try.
In modern software the line between code and data can become awfully thin and blurry, and it is not always easy to tell the two apart. After all, as far as the computer is concerned, everything is data, unless it is determined by existing code - normally the OS - to be otherwise. Even programs have to be loaded into memory as data, before the CPU can execute them.
For example, imagine an algorithm that computes the cost of an order, where larger orders get lower prices per item. It is part of a larger software system in a store, written in C.
This algorithm is written in C and reads a file that contains an input table provided by the management with the various per-item prices and the corresponding order size thresholds. Most people would argue that a file with a simple input table is, of course, data.
Now, imagine that the store changes its policy to some sort of asymptotic function, rather than pre-selected thresholds, so that it can accommodate insanely large orders. They might also want to factor in exchange rates and inflation - or whatever else the management people come up with.
The store hires a competent programmer and she embeds a nice mathematical expression parser in the original C code. The input file now contains an expression with global variables, functions such as log() and tan(), as well as some simple stuff like the Planck constant and the rate of carbon-14 degradation.
cost = (base * ordered * exchange * ... + ... / ...)^13
Most people would still argue that the expression, even if not as simple as a table, is in fact data. After all it is probably provided as-is by the management.
The store receives a large amount of complaints from clients that became brain-dead trying to estimate their expenses and from the accounting people about the large amount of loose change. The store decides to go back to the table for small orders and use a Fibonacci sequence for larger orders.
The programmer gets tired of modifying and recompiling the C code, so she embeds a Python interpretter instead. The input file now contains a Python function that polls a roomfull of Fib(n) monkeys for the cost of large orders.
Question: Is this input file data?
From a strict technical point, there is nothing different. Both the table and the expression needed to be parsed before usage. The mathematical expression parser probably supported branching and functions - it might not have been Turing-complete, but it still used a language of its own (e.g. MathML).
Yet now many people would argue that the input file just became code.
So what is the distinguishing feature that turns the input format from data into code?
Modifiability: Having to recompile the whole system to effect a change is a very good indication of a code-centric system. Yet I can easily imagine (well, more like I have actually seen) software that has been designed incompetently enough to have e.g. an input table built-in at compile time. And let's not forget that many applications still have icons - that most people would deem data - built in their executables.
Input format: This is the - in my opinion, naively - most common factor that people consider: "If it is in a programming language then it is code". Fine, C is code - you have to compile it after all. I would also agree that Python is also code - it is a full blown language. So why isn't XML/XSL code? XSL is a quite complex language in its own right - hence the L in its name.
In my opinion, none of these two criteria is the actual distinguishing feature. I think that people should consider something else:
Maintainability: In short, if the user of the system has to hire a third party to make the expertise needed to modify the behaviour of the system available, then the system should be considered code-centric to a degree.
This, of course, means that whether a system is data-driven or not should be considered at least in relation to the target audience - if not in relation to the client on a case-by-case basis.
It also means that the distinction can be impacted by the available toolset. The UML specification is a nightmare to go through, but these days we have all those graphical UML editors to help us. If there was some kind of third-party high-level AI tool that parses natural language and produces XML/Python/whatever, then the system becomes data-driven even for far more complex input.
A small store probably does not have the expertise or the resources to hire a third party. So, something that allows the workers to modify its behaviour with the knowledge that one would get in an average management course - mathematics, charts etc - could be considered sufficiently data-driven for this audience.
On the other hand, a multi-billion international corporation usually has in its payroll a bunch of IT specialists and Web designers. Therefore, XML/XSL, Javascript, or even Python and PHP are probably easy enough for it to handle. It also has complex enough requirements that something simpler might just not cut it.
I believe that when designing a software system, one should strive to achieve that fine balance in the used input formats where the target audience can do what they need to, without having to frequently call on third parties.
It should be noted that outsourcing blurs the lines even more. There are quite a few issues, for which the current technology simply does not allow the solution to be approachable by the layman. In that case the target audience of the solution should probably be considered to be the third party to which the operation would be outsourced to.
That third party can be expected to employ a fair number of experts.
One of five maxims under the Unix Philosophy, as presented by Rob Pike, is this:
Data dominates. If you have chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
It is often shortened to, "write stupid code that uses smart data."
Other answers have already dug into how you can often code complex behavior with simple code that just reacts to the pattern of its particular input. You can think of the data as a domain-specific language, and of your code as an interpreter (maybe a trivial one).
Given lots of data you can go further: the statistics can power decisions. Peter Norvig wrote a great chapter illustrating this theme in Beautiful Data, with text, code, and data all available online. (Disclosure: I'm thanked in the acknowledgements.) On pp. 238-239:
How does the data-driven approach compare to a more traditional software development
process wherein the programmer codes explicit rules? ... Clearly, the handwritten rules are difficult to develop and maintain. The big
advantage of the data-driven method is that so much knowledge is encoded in the data,
and new knowledge can be added just by collecting more data. But another advantage is
that, while the data can be massive, the code is succinct—about 50 lines for correct, compared to over 1,500 for ht://Dig’s spelling code. ...
Another issue is portability. If we wanted a Latvian spelling-corrector, the English
metaphone rules would be of little use. To port the data-driven correct algorithm to another
language, all we need is a large corpus of Latvian; the code remains unchanged.
He shows this concretely with code in Python using a dataset collected at Google. Besides spelling correction, there's code to segment words and to decipher cryptograms -- in just a couple pages, again, where Grady Booch's book spent dozens without even finishing it.
"The Unreasonable Effectiveness of Data" develops the same theme more broadly, without all the nuts and bolts.
I've taken this approach in my work for another search company and I think it's still underexploited compared to table-driven/DSL programming, because most of us weren't swimming in data so much until the last decade or two.
In languages in which code can be treated as data it is a non-issue. You use what's clear, brief, and maintainable, leaning towards data, code, functional, OO, or procedural, as the solution requires.
In procedural, the distinction is marked, and we tend to think about data as something stored in an specific way, but even in procedural it is best to hide the data behind an API, or behind an object in OO.
A lookup(avalue) can be reimplemented in many different ways during its lifetime, as long as its starts as a function.
...All the time I desing programs for nonexisting machines and add: 'if we now had a machine comprising the primitives here assumed, then the job is done.'
... In actual practice, of course, this ideal machine will turn out not to exist, so our next task --structurally similar to the original one-- is to program the simulation of the "upper" machine... But this bunch of programs is written for a machine that in all probability will not exist, so our next job will be to simulate it in terms of programs for a next lower level machine, etc., until finally we have a program that can be executed by our hardware...
E. W. Dijkstra in Notes on Structured Programming, 1969, as quoted by John Allen, in Anatomy of Lisp, 1978.
When I think of this philosophy which I agree with quite a bit, the first thing that comes to mind is code efficiency.
When I'm making code I know for sure it isn't always anything close to perfect or even fully knowledgeable. Knowing enough to get close to maximum efficiency out of a machine when it is needed and good efficiency the rest of the time (perhaps trading off for better workflow) has allowed me to produce high quality finished products.
Coding in a data driven way, you end up using code for what code is for. To go and 'outsource' every variable to files would be foolishly extreme, the functionality of a program needs to be in the program and the content, settings and other factors can be managed by the program.
This also allows for much more dynamic applications and new features.
If you have even a simple form of database, you are able to apply the same functionality to many states. You may also do all manner of creative things like changing the context of what your program is doing based on file header data or perhaps directory, file name or extension, though not all data is necessarily stored on a filesystem.
Finally keeping your code in a state where it is simply handling data puts you in a state of mind where you are closer to envisioning what is actually going on. This also keeps the bulk out of your code, greatly reducing bloatware.
I believe it makes code more maintainable, more flexible and more efficient aaaand I like it.
Thank you to the others for your input on this as well! I found it very encouraging.

Resources