Is there a programming language with semantics close to English? - parsing

Most languages allow you to tweak, to a certain extent, parts of the syntax (C++, C#) and/or the semantics that you will be using in your code (Katahdin, Lua). But I have not heard of a language that lets you completely define what your code will look like. So isn't there some existing language that has the capability to override all syntax and define semantics?
An example of what I want to do, starting from the C# code below:
foreach (Fruit fruit in Fruits)
{
    if (fruit is Apple)
    {
        fruit.Price = fruit.Price / 2;
    }
}
I want to be able to write the above code in my perfect language like this:
Check if any fruits are Macintosh apples and discount the price by 50%.
The advantages that come to my mind looking from a coder's perspective in this "imaginary" language are:
It's very clear what is going on (self-descriptive): it's plain English after all, so even a kid would understand my program.
It hides all the complexities that I have to write in C#. Why should I care to learn
if statements, arithmetic operators, etc., since they are already implemented?
The disadvantages that I see for a coder who will maintain this program are:
Maybe you would express this program differently from me so you may not get all the
information that I've expressed in my sentence
Programs can be quite verbose and hard to debug, but if it's possible to even approximate the type of syntax above, maybe more people would start programming, right? That would be amazing, I think. I could go to work and just write an essay to draw a square on a WinForm like this:
Create a form called MyGreetingForm. Draw a square in the middle of
MyGreetingForm with a side of 100 points. In the middle of the square write "Hello! Click here to continue" in Arial font.
In the above code the parser must basically guess that I want to use
the unnamed square from the previous sentence, it'd be hard to write such a smart parser I guess, yet it's so simple what I want to do.
If the user clicks on square in the middle of MyGreetingForm show MyMainForm.
In the above code the compiler must basically: 1) generate an event handler, 2) check whether there is any square in the middle of the form, and if there is, 3) hide the form and show another form.
It looks very hard to do, but IMO it doesn't look impossible to at least approximate this (I can personally write a parser to perform the 3 steps above, no problem, and it's basically what has to happen anyway when you add a.MyEvent += handler; in C#, so I don't see a problem here). So I'm thinking maybe somebody has already done something like this? Or is there some practical burden of complexity in creating such an 'essay style' programming language that I can't see? I mean, what's the worst that can happen if the parser is not that good? Your program will crash, so you have to re-word it :)

Check out:
The Osmosian Order of Plain English Programmers
Code Example:
The background is a picture.
A button has a box and a name.
To clear the status:
Clear the status' string.
Show everything.
To create the background:
Draw the screen's box with the white color.
Loop.
Pick a spot anywhere in the screen's box.
Pick a color between the lightest gray color and the white color.
Dab the color on the spot.
If a counter is past 80000, break.
If the counter is evenly divisible by 1000, refresh the screen.
Repeat.
Extract the background given the screen's box. \or Create the background from the screen. Or something.

Some Interactive fiction designers use a language syntax extremely close to the English language. Here's some Inform 7 code, which you can play online:
The foyer is a room.
The apple is in the foyer. It is edible. The description is "This is a ripe,
green granny smith apple."
The apple core is a thing. The description is "This apple core all that is
left of that granny smith apple you just consumed."
After eating the apple:
now the apple core is in the player;
say "You gobble down the apple careful not to eat any of those cyanide-
laced seeds you heard about."
I tutored a course that used Inform 7. One of the tutors had the impression the assignment was to design, not write a game. So he marked the programs by reading them, without realising they were actual programs.

I don't think this would be an easy task, nor do I think it is going to make debugging any easier.
How would you deal with these issues?
spelling mistakes
different dialects in different parts of world
different dialects in the same part of the world
synonyms
which part of sentence do you parse first?
homographs: tear (rip) and tear (from the eye) are spelled the same but mean two different things
Bring back COBOL or can you remember "Walk West", "Examine Door", "Push Door", "Open Door", "Use key on door" :)
edit - how would you strongly type this?

I have written an extensible English-to-Python compiler called EngScript, which converts structured English into working Python code.
This is an example of EngScript code:
print{create a string from the file called "README.txt"}
print{save the string "Woohoo!" to a file called "ExampleText.txt"}
print{the first 3 letters of "EngScript"}
This is the output that was generated by the EngScript compiler:
print(pythonFunctions.stringFromTextFile("README.txt"))
print(pythonFunctions.writeStringToFile("ExampleText.txt", "Woohoo!"))
print("EngScript"[0:(3 - 1)+1])

LiveCode!
There are a few "natural language", high-level, English-like programming or scripting languages. Probably all of them were inspired by the oldest, COBOL. My personal favorite of these languages is LiveCode. LiveCode is a descendant of MetaCard, a Linux clone of Apple’s now-defunct HyperCard, which used an English-like scripting language called HyperTalk. HyperTalk was inspired by Smalltalk, and in turn inspired JavaScript (as well as the entire World-Wide-Web). HyperTalk was also the basis for another English-like scripting language called AppleScript (and later AppleScriptObjC), which still comes with macOS to this very day.
LiveCode uses a language called LiveCodeScript, or LCS, which, like the languages of other HyperCard clones that have existed over the years (SuperCard, Adobe’s Lingo/Flash ActionScript, Open Xion, Oracle’s Toolbook, etc.), is very similar to HyperTalk at its core; these are often referred to as X-Talk languages. LiveCode has several advantages: it’s very much still in production, it has a dual license (open source and commercial versions), the engine is cross-platform (Mac, Win, Linux, HTML5, iOS, Android, and a server version), and like HyperCard it is also a GUI toolkit and it is extensible.
The LiveCode team is currently working on a new lower-level programming language called LiveCode Builder, or LCB. LCB is also English-like, although a bit less readable than LCS. Its goals are to have capabilities on par with lower-level languages like C++, Objective C, etc., to allow the LiveCode platform to be extended with code libraries and frameworks written in other programming languages, and ultimately to allow the LiveCode IDE to be written in its own language.

Try using the programming language called 'Google' - it has a natural English interface and your code fragment throws back all the answers you are suggesting. Interestingly just six minutes after you asked this question, this very page is #1 for the query:
Check if any fruits are Macintosh
apples and discount the price by 50%
Use the Google API and I think you have the basis of a natural English programming language.

Related

Detect when to use a vs an

I have a service that allows users (admins) to change the terminology the site uses. My designer wants me to use the format "A Group". The problem is, for some terminology, it should be "An" not "A".
Is there any way to reliably detect which to use? What about localization?
I can brute force it and get 90% of the way by checking the first letter for consonant vs vowel. That won't work for all words though. And that doesn't cover any language except English.
In my opinion you've got only 2 ways:
1- Check the first letter, and scan the rest of the sentence to see whether it contains any non-English letters.
2- Provide a dictionary of English nouns; then you can easily look up your word to find out whether it needs an "a" or an "an".
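A minimal Python sketch of how the two ideas above might be combined: a small lookup table for words whose spelling is misleading, with the first-letter check as a fallback. The `OVERRIDES` table and the function names are hypothetical and deliberately incomplete; it handles English only and does not cover acronyms such as "FAQ".

```python
# Hypothetical, incomplete override table: words whose first letter is
# misleading because the article depends on sound, not spelling.
OVERRIDES = {
    "hour": "an", "honest": "an", "heir": "an",      # silent "h"
    "user": "a", "unit": "a", "university": "a",     # "yoo" sound
    "one": "a", "european": "a",
}

def indefinite_article(term: str) -> str:
    """Return "a" or "an" for the first word of an English term."""
    words = term.strip().split()
    if not words:
        raise ValueError("empty term")
    first = words[0].lower()
    # Dictionary lookup first (way 2), then the first-letter rule (way 1).
    if first in OVERRIDES:
        return OVERRIDES[first]
    return "an" if first[0] in "aeiou" else "a"

def with_article(term: str, capitalize: bool = True) -> str:
    """Prefix the term with its article, e.g. "A Group"."""
    article = indefinite_article(term)
    if capitalize:
        article = article.capitalize()
    return f"{article} {term.strip()}"
```

Since admins choose the terminology, another option would be to let them enter the article alongside each term, which also sidesteps the localization question.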
Although the "a versus an" issue is very specific, what you're describing here is a natural language processing issue. Essentially you are being asked to write code that generates a grammatically correct piece of text.
I think you should try to explain the implications to the designer, especially if you end up localizing into other languages. Your time is probably better spent working on your app's business logic than on language processing.

Profanity checking for promotional codes

I have a slightly unusual profanity-related question.
Now we're used to dealing with profanity-filtering of user-generated content — any method is imperfect, but products like CleanSpeak and WebPurify do a good-enough job.
The problem we have at the moment, though, is that we've been building an engine to run promotional-code–based competitions, that will be used internationally. We could do with checking that none of these codes is profane in Latin American Spanish or Malay (at least in the first instance), to make sure we don't send out a code that's equivalent to FUCK23 or PEN15 or something.
We've tried Googling around and asking people we know, but we can't find an easy way of getting hold of an es-419 or an ms profanity list to filter the codes against. As there are literally millions of codes per locale, we'd rather do an offline check than hit an API for each code (which would be expensive both in terms of bandwidth and usage fees).
I know this is a bit of a long shot, but does anyone know of a good source for profanity lists in different languages?
#disclaim: We know that no profanity filtering is perfect, that it's essentially futile with user-generated content and we have read SO #273516: How do you implement a good profanity filter? — that's not what we're asking.
Building or finding lists in other languages is extremely time consuming and difficult (trust me, we've built many of them at Inversoft). You might be better off tweaking the code generators instead (from what I could tell your code is generating the promotional codes rather than humans).
The best way to tweak a generator is to ensure that the codes can't easily form words based on the general use of consonants and vowels in most European languages. Things get a bit dicey in Polish and others, but it usually works.
Generally, most codes that start with a vowel are followed by another vowel or a non-joining consonant (like 'q' without a 'u'). If the code starts with a consonant then the next character is the same consonant or one that has a low probability of being used. For example, if you start with 's' then adding 'g' is a good choice.
You could also use Wiktionary or other similar sources (like Linux dictionary files) to build a statistical approach to this. By extracting the probability of characters being next to each other, you should be able to generate codes that are very unlikely to be words in any language.
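The statistical idea can be sketched roughly as follows in Python: count how often each letter pair occurs in a word list, then build codes by always choosing the next letter from those that rarely follow the previous one. The function names, the `keep` parameter and the word list you feed in are assumptions for illustration; this lowers the chance of producing a word but does not guarantee it.

```python
import random
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def bigram_counts(words):
    """Count adjacent letter pairs across a word list (e.g. a dictionary file)."""
    counts = Counter()
    for word in words:
        w = word.lower()
        for a, b in zip(w, w[1:]):
            if a in ALPHABET and b in ALPHABET:
                counts[a + b] += 1
    return counts

def rare_followers(counts, prev, keep=8):
    """Return the `keep` letters that least often follow `prev`."""
    ranked = sorted(ALPHABET, key=lambda c: (counts[prev + c], c))
    return ranked[:keep]

def generate_code(counts, length=6, rng=None):
    """Build a code in which every adjacent pair is a rare bigram."""
    rng = rng or random.Random()
    code = rng.choice(ALPHABET)
    while len(code) < length:
        code += rng.choice(rare_followers(counts, code[-1]))
    return code.upper()
```

In practice the counts would come from word lists in every target language merged together, so that a pair common in Spanish or Malay is avoided even if it is rare in English.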
However, if I misread your question and you aren't generating the codes programmatically, you can ignore my response completely. :)
I have had the same thoughts while trying to generate 6-character codes for a project I am doing.
I decided to reduce the likelihood of obviously profane codes, so I removed the vowels that I found in as many "bad" words as I could think of from my initial base 36 generation code. That left me with something more like a base 28 system that did not include a, e, i, o, u, 1 or 0. The one and zero were removed to reduce confusion between those characters and I, L and O in some fonts.
So far I have not seen a "profane" code generated, although base 28 still has about 480 million unique 6-character combinations.
I cannot vouch for other languages, and had not even considered them...
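A rough Python sketch of the reduced-alphabet generator described in this answer. Removing the seven listed characters from the 36 base-36 symbols leaves 29, so the alphabet below is an assumption and may differ slightly from the one the answer actually used; the names are hypothetical.

```python
import secrets
import string

# Characters dropped from base 36: the vowels, plus 1 and 0, which are
# easily confused with I, L and O in some fonts.
EXCLUDED = set("aeiou10")
ALPHABET = "".join(
    c for c in string.digits + string.ascii_lowercase if c not in EXCLUDED
)

def generate_code(length: int = 6) -> str:
    """Return a random code drawn only from the reduced alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length)).upper()

def code_space(length: int = 6) -> int:
    """Number of distinct codes of the given length."""
    return len(ALPHABET) ** length
```

With this alphabet, `code_space(6)` is 29 to the power 6, a little under 600 million codes. Digit look-alikes such as 5 for S or 3 for E can still suggest words, so this reduces rather than eliminates the risk.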

Crowdsourcing translation for mobile developers?

I am developing applications for mobile phones with different operating systems (Android, Symbian, iPhone). The applications are sold internationally, so they need to be translated into different languages in addition to the English version.
I assume most mobile developers do the translations using some paid external service each time. This approach does not look very cost-effective to me. Would it make sense to have a website where simple translations would be done using crowdsourcing (by other developers)? Most strings in mobile applications are very simple and short, for example "OK", "Cancel", "Are you sure?", "Please enter your password". Also, the same strings are used in hundreds of applications. Instead of paying for translating all strings, developers could save money by only buying their difficult application-specific translations.
Does anyone agree with this idea? I have seen many open source projects doing their translations successfully using volunteers.
I just found a solution that works for me. Many users find this question through Google, so I think my post may be helpful:
This is the solution for us: crowdin.com - agile localization solution for tech companies
Microsoft allows you to view their terminology database: https://www.microsoft.com/Language/en-US/Default.aspx
That covers about 90 languages and will get you the things you mention such as common button captions, etc.
The problem you are facing after that is to try to get only the strings translated that you want. Most translators are going to charge you a minimum number of words. And they are going to want the entire resource file (regardless if you translated them yourself or not). Makes sense because localizing a product means that they need to have the whole picture to ensure consistency, etc. Professional translators will probably not charge you for what they call 100% matches.
I would never ever trust the translation of my product to crowd sourcing. Ever. You get what you pay for. Besides, just because you speak a language natively doesn't mean that you can write well, etc.
How do you check the crowd sourcing translation results for accuracy and quality? In a famous and documented occurrence recently the phrase "No lorries by this route please use the main road" was translated into "We are out of the office until Monday please contact us again then" and turned into road signs that were erected.
Crowd sourcing translation has been used, and Facebook is probably the largest company I know of that has tried it. I have not tracked their progress, but you could investigate it to see its success or otherwise. Their method of quality checking was to get other people using the translations to vote for the one they preferred, so this was a case of crowd sourcing quality control. At this point the proposal that a camel is a horse designed by a committee jumps unbidden into my mind.
Translation, in spite of all the machine pumped into it, is still more of an art than a science. To translate correctly you need to have a native speaker translating from another language into their own. So for English to German you need a native German speaker who can speak English very well to do it. Within the profession very, very few translators will translate to a language in which they are non native. The reasons for this are many but boil down to the colloquial nature of language.
To be positive, you could look at how Facebook fared and follow that route. Another route would be to approach not translators but a translation agency; there are quite a number of these. Present them with the whole corpus you want translated in the original English and get them to quote you for the whole job. This would mean someone else managing the job and the quality, and they may have shortcuts, especially if the translations are of fairly standard "computerese" phrases, e.g. 'Home', 'Back', 'Next', 'Click here', etc.

Reverse engineering and patching a DirectX game?

Background
I am playing Imperishable Night, one of the Touhou series of games. The shoot button is 'z', moving slower is 'shift', and the arrow keys move. Unfortunately for me, using shift-z ghosts my right arrow key, so I can't move to the right while shooting. This ghosting happens in all applications, and switching keyboards fixes it.
Goal
I want to locate in the disassembled code the DirectX function that gets the keyboard input and compares it against the 'z' key, and change that key to 'a'. I'm considering this a fun project. Assuming the size of the scan codes is the same, this should be fairly simple. And because the executable is only 400k, maybe this will provide a unique opportunity for me to explore the dark side of the computing underworld (kidding).
Relevant experience
I have some experience with coding in assembly, but not with disassembling it. I have no experience with the DirectX APIs.
Question
I need some guidance. I've found a listing of DirectX keyboard scan codes, and a program called PEExplorer that looks like it will do what I need.
Is there a means by which I can turn some of the assembly into C-style function calls so it's more easily read? I will need to locate where the game retrieves the currently pressed keys and compares them against a list; it's that list I need to modify.
Any input would be greatly appreciated.
You might be interested in the Detours library from Microsoft Research; it allows you to hook function calls and alter their arguments. That way your code can change the scan codes of keys that don't ghost into the ones the game expects.

free to use, in a programmer-friendly format, dictionaries for european languages

I want to experiment with an idea I have of automatically localizing software, or at least suggesting a reasonable translation if a localized string is not available.
I'm not sure this will be working satisfactorily tomorrow morning but I just wanted to play with this idea.
Does anybody know of a dictionary that is free to use and in an easy-to-parse format, and that can help me automatically translate words from English to other European languages (French, German, Spanish, etc.)?
The FreeDict project has quite a few relatively complete dictionaries. Most are from one language to English or vice versa, but some are between two non-English languages as well.
I don't know of any dictionary, but I would like to point something out. You have to bear in mind that translating is not a direct word-for-word technique in any sense. The rules of the language change as well, so a word-for-word translation leaves sentences unreadable. This is why even companies like Google have trouble making good translation software. Context is very hard to detect programmatically, and context means everything in choosing the right word, the right structure and so on.
Maybe use a Translation API, if there is one. Google only seem to do a JavaScript API for Language.
You can't even expect to get a reasonable translation with an automatic method. Translating full texts is too hard for a computer to handle completely correctly; translating short phrases correctly is impossible.
Take for example the simple text "Open": without a context it's not even possible to tell if it's a verb or an adjective. I know that at least in German the verb and the adjective translate into two different words.
Also, computer specific concepts often borrow words from similar concepts outside the computer sphere. Those concepts often have a specific translation, but an automatic translation would sometimes try to translate it as if it was the original meaning, which can give you very strange translations.
After a while of searching I solved the problem myself by starting to create my own dictionary. I do a lot of translations in my free time. In the beginning it is really boring work... but after a while you get a really good dictionary. Some friends of mine are using it too... so we all benefit from every new word we translate.

Resources