Delphi component or library to display mathematical expressions

I'm looking for a simple component that displays mathematical expressions in Delphi. When I started out I thought it would be easy to find something on the net, but it turns out it was harder than anticipated. There are lots and lots of components that will parse mathematical expressions, but few (none?) that will display them.
Ideally I would like a component as simple as a TLabel, where I could set the caption to some expression and it would be displayed correctly, but some sort of library that lets me draw expressions to a canvas would also be sufficient for my needs.
I'm not talking about plotting graphs of functions or something like that. I want to display (for instance)
like this:
MBo's answer was just what I was looking for. Some people may be put off by the fact that all comments and documentation are in Russian, but don't let that scare you. It was really easy to use.
Installation: Unzip the files (at least "ExprMake.pas" and "ExprDraw.pas") to a directory in your library path. That's it.
Use: I haven't experimented extensively with it, but these few lines demonstrate how easy it is:
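procedure TForm1.Button1Click(Sender: TObject);
var
  vExprC : TExprClass;    // expression object (from the ExprDraw/ExprMake units)
  vExprB : TExprBuilder;  // builds an expression object from a string
begin
  vExprB := TExprBuilder.Create;
  try
    vExprC := vExprB.BuildExpr('(X^2+3)/X');  // parse the expression string
    vExprC.Canvas := Canvas;                  // render onto the form's canvas
    vExprC.Font.Size := 50;                   // the drawing call itself is omitted here
  finally
    vExprB.Free;
  end;
end;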

A native Delphi module by Anton Grigoriev for drawing mathematical expressions. The accompanying demo program is in Russian. This is how it looks.
A note about credits:
The modules are free. The author asks only that you mention (in an About box, etc.) that the mathematical expressions were drawn using the ExprDraw and ExprMake modules written by Anton Grigoriev.
(raw translation from readme.txt)

I don't know of a native Delphi implementation, but maybe this question is helpful to you: How to render a formula in WPF or WinForms. It mentions some C/C# solutions which could possibly be translated or used as a DLL (see the OP's solution).
Another alternative could be this Formulator ActiveX Control.
Furthermore it may broaden your search results if you use some other search criteria, especially without the "Delphi" keyword. ;-)
renderer, formula, math, MathML, expression, engine, tex, ...
And as we can learn from MBo's answer, it could also be a good idea to search in other languages :-)
delphi математических формул рисования (roughly: "delphi drawing mathematical formulas")
I'm sure you searched for something like that, but possibly there is one keyword that you have forgotten.

I was looking for a similar component for some time and MBo's solution would be acceptable.
I was convinced that it could also be done another way: embedding a TWebBrowser and using an existing JavaScript renderer for LaTeX and MathML formulas, but...
I just tried QDSEquations and I think it's even a better solution!
A Delphi equation editor component that allows you to enter and display math formulas of any complexity, from simple Greek symbols to matrices and complex integral expressions. You can use the equation editor in your projects written in Delphi, for example in programs testing knowledge of different fields of mathematics (mathematical analysis, discrete mathematics, probability theory and so on), physics and others.
It’s quite easy to enter formulas in it:
simple symbols are entered similarly to entering data in a text field
special symbols and formula elements are entered with the help of an additional menu
It's better because you can edit a formula directly in a "textfield" component with the help of an additional button-menu component and/or using a math expression string and/or using predefined methods.
Hope it helped!

I had the same problem several months ago, I solved it by getting a LaTeX renderer DLL which could be called from Delphi. Then you just called it, giving it the expression as a string, and it returned you a bitmap with the rendered expression in it.
I forgot the name unfortunately :( but you should be able to find it again by looking for "latex dll delphi"?


Determine Cobol coding style

I'm developing an application that parses Cobol programs. Some of these programs follow the traditional coding style (program text in columns 8 to 72), and some are newer and don't follow this style.
In my application I need to determine the coding style in order to know if I should parse content after column 72.
I've been able to determine whether a program starts at column 1 or 8, but programs that start at column 1 can also follow the rule that everything after column 72 is a comment.
So I'm trying to find rules that will allow me to determine if texts after column 72 are comments or valid code.
I've found some, but it's hard to tell whether they will work every time:
a period after column 72 marks the end of a sentence, but I fear that periods can appear in comments too
find the closing character of a statement after column 72: " ' ) }
look at the characters in columns 71, 72 and 73; if they are not spaces, find the whole word and check whether it's a keyword or a variable. Problem: it can be a variable from a COPY or a replacement, etc.
I'd like to know what you think of these rules, and whether you have any ideas to help me determine the coding style of a Cobol program.
I don't need an API or something just solid rules that I will be able to rely on.
I think you need to know the COBOL compiler for each program. Its documentation should tell you what conventions/configurations/switches it uses to decide if the source code ends at column 72 or not.
So.... which compiler(s)?
And if you think the column 72 issue is a pain, wait till you get around to actually parsing the COBOL itself. If you are not well prepared to handle the lexical issues of the language, you are probably very badly prepared to handle the syntactic ones.
There is no absolutely reliable way to determine if a COBOL program
is in fixed or free format based only on the source code. Heck it is sometimes difficult to identify
the programming language based only on source code. Check out
this classic polyglot - it is valid under 8 different language compilers. That
said, you could try a few heuristics that might yield
the correct answer more often than not.
Compiler directives embedded in source code
Watch for certain compiler directives that determine code format.
Unfortunately, every compiler vendor uses its own flavour of directive.
For example, Micro Focus COBOL uses the SOURCEFORMAT directive. This directive will appear near the top of the program, so a short pre-scan could be used to find it. On the other hand, OpenCOBOL uses >>SOURCE FORMAT IS FREE and >>SOURCE FORMAT IS FIXED to toggle between free and fixed format, so different parts of the same program could be formatted differently!
The bottom line here is that you will have to support the conventions of multiple COBOL compilers.
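A short pre-scan along these lines might look like the following Python sketch. The two regular expressions are assumptions about how the directives are spelled (Micro Focus style such as `$SET SOURCEFORMAT"FREE"`, OpenCOBOL style `>>SOURCE FORMAT IS FREE`); compilers accept more variants than these, so check them against each vendor's manual. Because OpenCOBOL directives can switch formats mid-program, the sketch records every directive and reports the format in effect at a given line, defaulting to fixed format when none is seen (also an assumption).

```python
import re
from typing import List, Tuple

# Assumed directive spellings; real compilers accept more variants.
#   Micro Focus:  $SET SOURCEFORMAT"FREE"  /  $SET SOURCEFORMAT(FIXED)
#   OpenCOBOL:    >>SOURCE FORMAT IS FREE  /  >>SOURCE FORMAT IS FIXED
MF_DIRECTIVE = re.compile(r"SOURCEFORMAT\s*[\"'(]\s*(FREE|FIXED)", re.IGNORECASE)
OC_DIRECTIVE = re.compile(r">>\s*SOURCE\s+(?:FORMAT\s+)?(?:IS\s+)?(FREE|FIXED)", re.IGNORECASE)


def find_format_directives(lines: List[str]) -> List[Tuple[int, str]]:
    """Return (line_index, "FREE" or "FIXED") for every directive found.

    Note: comment lines are not skipped, so a commented-out directive
    would still be reported.
    """
    found = []
    for index, line in enumerate(lines):
        match = MF_DIRECTIVE.search(line) or OC_DIRECTIVE.search(line)
        if match:
            found.append((index, match.group(1).upper()))
    return found


def format_at(line_index: int, directives: List[Tuple[int, str]], default: str = "FIXED") -> str:
    """Format in effect at line_index: the last directive at or before it, else default."""
    current = default
    for index, fmt in directives:  # directives are in line order
        if index > line_index:
            break
        current = fmt
    return current
```

Usage: call `find_format_directives` once per file, then `format_at(i, directives)` before lexing line `i`.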
Compiler switches
Source code format can also be specified using a compiler switch. In this case, there are no concrete
clues to go on. However, you can be reasonably sure that the entire source program will be either
fixed or free. All you can do here is guess. Unless the programmer is out to "mess with
your head" (and some will), a program in free format will have the keywords IDENTIFICATION DIVISION or ID DIVISION, starting before column 8.
Every COBOL program will begin with these keywords so you can use them as the anchor point for determining code format in the
absence of embedded compiler directives.
Warning: this is far from foolproof, but it might be a good start.
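The IDENTIFICATION DIVISION anchor can be checked with a few lines of Python. This is a minimal sketch that assumes tabs have already been expanded to spaces and counts columns from 1, as COBOL does; a header starting before column 8 is taken to mean free format. As the warning says, the result is only a guess.

```python
import re
from typing import Iterable, Optional

# "IDENTIFICATION DIVISION" or its abbreviation "ID DIVISION", any case.
ID_DIVISION = re.compile(r"\b(?:IDENTIFICATION|ID)\s+DIVISION\b", re.IGNORECASE)


def guess_format_from_id_division(lines: Iterable[str]) -> Optional[str]:
    """Guess "FREE" or "FIXED" from the column where the ID DIVISION header starts.

    Fixed format puts division headers in Area A, which begins at column 8,
    so a header starting earlier suggests free format. Returns None if the
    header is not found (e.g. for a fragment).
    """
    for line in lines:
        match = ID_DIVISION.search(line)
        if match:
            column = match.start() + 1  # 0-based index -> 1-based column
            return "FREE" if column < 8 else "FIXED"
    return None
```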
There won't be an algorithm to do this with 100% certainty, because if comments can be anything, they can also be compilable COBOL code. So you could theoretically write a program that means one thing if the comments are ignored, and something else entirely if the comments are treated as part of the COBOL.
But that's extremely unlikely. What's most likely to happen is that if you try to compile the code under the wrong convention, it will simply fail. So the only accurate way to do this is to try compiling/parsing the program one way, and if you come to a line that can't make sense, switch to the other style. You could also support passing an argument to the compiler when the style is already known.
You can try using heuristics like what you've described, but that will never be totally accurate. The most they can give you is a probability that the code is one or the other style, which will increase as they examine more and more lines of code. They could be useful for helping you guess the style before you start compiling, or for figuring out when the problem is really just a typo in the code.
Regarding ideas for heuristics, it's hard to say. If there were a standard comment sigil like // or # in other languages, this would be a lot easier (actually, there is, but it sounds like your code doesn't follow this convention). The only thing I can think of would be to check whether every line (or maybe 99% of lines, and not counting empty lines or lines commented with *) has a period somewhere before position 72.
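A rough Python version of that period check follows. The skipped lines (blank lines and lines commented with `*`, here taken as an asterisk in column 7 or a line whose first non-blank character is `*`) and the 99% threshold come from the paragraph above; the function names are made up for illustration.

```python
from typing import Iterable, List


def period_ratio(lines: Iterable[str], limit: int = 72) -> float:
    """Fraction of code lines that have a period within the first `limit` columns.

    Blank lines and comment lines (asterisk in column 7, or a line whose
    first non-blank character is '*') are not counted.
    """
    counted = 0
    with_period = 0
    for line in lines:
        stripped = line.strip()
        if not stripped or line[6:7] == "*" or stripped.startswith("*"):
            continue
        counted += 1
        if "." in line[:limit]:
            with_period += 1
    return with_period / counted if counted else 0.0


def looks_fixed(lines: List[str], threshold: float = 0.99) -> bool:
    """True if at least `threshold` of the code lines have a period in the first 72 columns."""
    return period_ratio(lines) >= threshold
```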
One thing you DON'T want to do is apply any heuristics to the part after position 72. That is, you don't want to be checking the comments to see if they're valid COBOL. You want to check what you know is COBOL first, and see if that works by itself. There are several reasons for this:
Comments written in English are likely to have periods and quotes in them, so your first and second bullet points are out.
Natural languages are WAY harder to parse than something like COBOL.
The comments could easily have COBOL in them (maybe someone commented out the previous version of the line).
An important rule for comments is that they should never affect what the program does. If changing the comments can change how the program is compiled, you violate that.
All that in mind, my opinion is that you shouldn't use heuristics at all. You should always try to compile the program under both conventions unless one is explicitly specified. There's a chance that code will compile successfully under both conventions, and then you'll have two different programs and no way to tell which one is correct.
If that happens, you need to compare the two results (perhaps with a hash or something) to see if they're the same program. If they're the same, great, but if not, you'll need to force the user to explicitly choose a convention.
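The try-both-conventions approach could be wired up as below. `parse` stands for your own parser (a hypothetical hook, not a real library call) that returns a normalized form of the program, or `None` when it fails; the SHA-256 digest is just one way to do the "hash or something" comparison, and comparing the strings directly would work equally well.

```python
import hashlib
from typing import Callable, Optional

# parse(source, fmt) is a hypothetical hook into your own parser: it should
# return a normalized form of the program (e.g. preprocessed text or a
# serialized AST) for fmt "FIXED" or "FREE", or None if parsing fails.
Parse = Callable[[str, str], Optional[str]]


class AmbiguousFormat(Exception):
    """Both conventions parse, but they describe different programs."""


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def choose_format(source: str, parse: Parse) -> str:
    results = {fmt: parse(source, fmt) for fmt in ("FIXED", "FREE")}
    succeeded = {fmt: out for fmt, out in results.items() if out is not None}
    if not succeeded:
        raise ValueError("source does not parse in either format")
    if len(succeeded) == 1:
        return next(iter(succeeded))
    if len({digest(out) for out in succeeded.values()}) == 1:
        return "FIXED"  # same program either way, so the choice does not matter
    raise AmbiguousFormat("both formats parse but differ; ask the user to choose")
```

An `AmbiguousFormat` error is the point where the user has to pick a convention explicitly.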
Most COBOL compilers let you generate and inspect the output of the text-manipulation (preprocessing) phase.
The text preprocessor output can be seen with (using OpenCOBOL for the example):
cobc -E program.cob
The text manipulation processor deals with any COPY ... REPLACING compiler directives, as well as converting SOURCE FORMAT IS FIXED (with line continuations, string literal concatenations, comment line removal, among other things) to the actual free format that the compiler lexical analyzer needs. A lot of the OpenCOBOL toolkits (Cross referencer and Animator, to name two) use source code AFTER the preprocessor pass. I don't think you'll lose any street cred if your parser program relies on post processed source code files.

How do I compare Unicode strings containing non-English characters to sort them alphabetically?

I am trying to sort arrays/lists/whatever of data based upon the Unicode string values in them, which contain non-English characters, and I want them sorted correctly alphabetically.
I have written a lot of code (D2010, Win XP) which I thought was pretty solid for future internationalisation, but it is not. It's all using the UnicodeString (string) data type, into which up until now I have only been putting English characters.
It seems I have to own up to making a very serious Unicode mistake. I talked to my German friend and tried out some German ß's (ß is 'ss' and should come after S and before T in the alphabet) and ö's etc. (note the umlaut), and none of my sorting algorithms work anymore. Results are very mixed up. Garbage.
Since then I have been reading up extensively and have learnt a lot of unpleasant things with regard to Unicode collation. Things are looking grim, much grimmer than I ever expected; I have seriously messed this up. I hope I am missing something and things are not actually quite as grim as they appear at present. I have been tinkering with Windows API calls (RtlCompareUnicodeString) with no success (protection faults); I could not get it to work. The problem with API calls, I learnt, is that they change on various newer Windows platforms, and with Delphi going cross-platform soon (with Linux later) and my app being client-server, I need to be concerned about this. But to be honest, with the situation being what it is (bad), I would be grateful for any forward progress, even a Windows API specific one.
Is the Windows API function RtlCompareUnicodeString the obvious solution? If so, I should really try again with it, but to be honest I have been taken aback by all of the issues involved with Unicode collation, and I am not at all clear what I should be doing to compare these strings this way anyway.
I learnt of the IBM ICU C++ open-source project; there is a Delphi wrapper for it, albeit for an older version of ICU. It seems a very comprehensive solution which is platform independent. Surely I cannot be looking at creating a Delphi wrapper for this (or updating the existing one) to get a good solution for Unicode collation?
I would be extremely glad to hear advice at two levels :-
A) A Windows-specific, non-portable solution. I would be glad of that at the moment; forget the client-server ramifications!
B) A more portable solution which is immune from the various XP/vista/win7 variations of unicode api functions, therefore putting me in good stead for XE2 mac support and future linux support, not to mention the client server complications.
By the way, I don't really want to be doing 'make-do' solutions, scanning strings prior to comparison and replacing certain tricky characters etc., which I have read about. I gave the German example above, but that's just an example; I want to get it working for all (or at least most, e.g. Far East, Russian) languages, and I don't want workarounds for a specific language or two. I also do not need any advice on the sorting algorithms; they are fine, it's just the string comparison bit that's wrong.
I hope I am missing/doing something stupid, this all looks to be a headache.
Thank you.
EDIT, Rudy, here is how I was trying to call RtlCompareUnicodeString. Sorry for the delay I have been having a horrible time with this.
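program Project26; {$APPTYPE CONSOLE}
type
  TUnicodeString = record Length, MaximumLength: Word; Buffer: PWideChar; end;
procedure RtlInitUnicodeString(var DestinationString: TUnicodeString;
  SourceString: PWideChar); stdcall; external 'NTDLL';
function RtlCompareUnicodeString(var String1, String2: TUnicodeString;
  CaseInSensitive: Boolean): Integer; stdcall; external 'NTDLL';
var x, y: string; k, l: TUnicodeString;
begin
  x := 'Straße'; y := 'Strasse';  // sample values
  RtlInitUnicodeString(k, PWideChar(x));
  RtlInitUnicodeString(l, PWideChar(y));
  Writeln(RtlCompareUnicodeString(k, l, True)); // ordinal compare, not locale collation
end.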
I realise this is most likely wrong; I am not used to calling API functions directly, so this is my best guess.
About your CompareStringEx API function: that looked really good, but it is available on Vista and later only, and I'm using XP. CompareString is on XP, but that's not Unicode!
To recap, the basic task at hand is to compare two strings based on the character sort order specified by the current Windows locale.
Can anyone say for sure whether AnsiCompareText should do this or not? It doesn't work for me, but others have said it should, and other things I have read suggest it should.
This is what I get with 31 test strings when using AnsiCompareText when in German Locale (space delimited - no strings contain spaces) :-
arß Asß asß aßs no nö ö ön oo öö oöo öoö öp pö ss SS ßaß ßbß sß Sßa
Sßb ßß ssss SSSS ßßß ssßß SSßß ßz ßzß z zzz
I am still keen to hear whether I should expect AnsiCompareText to work using the locale info, as lkessler has said it should; lkessler has also posted about these subjects before and seems to have been through this already.
However, following on from Rudy's advice I have also been checking out CompareStringW, which shares its documentation with CompareString, so it is NOT non-Unicode as I stated earlier.
Even if AnsiCompareText is not going to work (although I think it should), the Win32 API function CompareStringW should indeed work. Now I have defined my API function, I can call it, and I get a result and no error... but I get the same result every time regardless of the input strings! It returns 1 every time, which means less than. Here's my code:
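function CompareStringW(Locale: LCID; dwCmpFlags: DWORD; lpString1: PWideChar; cchCount1: Integer;
  lpString2: PWideChar; cchCount2: Integer): Integer; stdcall; external 'Kernel32.dll';
// LCID, DWORD and LOCALE_USER_DEFAULT come from the Windows unit; -1 = null-terminated.
// Passing @s1 here instead of PWideChar(s1) was the bug (see the fix described below).
r := CompareStringW(LOCALE_USER_DEFAULT, 0, PWideChar(s1), -1, PWideChar(s2), -1);
writeln(r); // result is 1=less than, 2=equal, 3=greater than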
I feel I am getting somewhere now after much pain. Would be glad to know about AnsiCompareText, and what I am doing wrong with the above CompareStringW api call. Thank you.
Firstly, I fixed the API call to CompareStringW myself: I was passing in @mystring when I should have passed PString(mystring). Now it all works correctly.
Now, you can imagine my dismay when I still got the same sort result as I did right at the beginning...
arß asß aßs Asß no nö ö ön oo öö oöo öoö öp pö ss SS ßaß ßbß sß Sßa
Sßb ßß ssss SSSS ßßß ssßß SSßß ßz ßzß z zzz
You may also imagine my EXTREME dismay, not to mention simultaneous joy, when I realised the sort order IS CORRECT, and IT WAS CORRECT RIGHT BACK AT THE BEGINNING! It makes me sick to say it, but there was never any problem in the first place; this is all down to my lack of German knowledge. I believed the sort was wrong, since you can see above that some strings start with S, then later ones start with ß, then s again and back to ß, and so on. I can't speak German, but I could still clearly see that they were not sorted correctly; my German friend told me ß comes after S and before T... I WAS WRONG! What is happening is that the string functions (both AnsiCompareText and the WinAPI CompareStringW) are effectively SUBSTITUTING every 'ß' with 'ss' and every 'ö' with a plain 'o'... so if I take the results above and do a search and replace as described, I get...
arss asss asss Asss no no o on oo oo ooo ooo op po ss SS ssass ssbss
sss Sssa Sssb ssss ssss SSSS ssssss ssssss SSssss ssz sszss z zzz
Looks pretty correct to me! And it always was.
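That search-and-replace check can be automated. This Python sketch (Python rather than Delphi, purely for brevity) models it with Unicode NFD decomposition to strip the umlaut and `str.casefold()` to turn ß into "ss"; it only approximates what AnsiCompareText/CompareStringW do and is not how Windows collation is implemented. It confirms that the 31 strings above are in order under that simplified key, while a plain code-point comparison says they are not.

```python
import unicodedata
from typing import Callable, List


def simplified_key(s: str) -> str:
    """Rough model of the search-and-replace: ö -> o, ß -> ss, case ignored."""
    # NFD splits ö into o + a combining diaeresis; drop the combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() lowercases and also expands ß to "ss".
    return stripped.casefold()


def is_sorted_by(items: List[str], key: Callable[[str], str]) -> bool:
    keys = [key(item) for item in items]
    return all(a <= b for a, b in zip(keys, keys[1:]))


RESULT = ("arß asß aßs Asß no nö ö ön oo öö oöo öoö öp pö ss SS ßaß ßbß sß Sßa "
          "Sßb ßß ssss SSSS ßßß ssßß SSßß ßz ßzß z zzz").split()

print(is_sorted_by(RESULT, simplified_key))  # True
print(is_sorted_by(RESULT, lambda s: s))     # False: code-point order disagrees
```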
I am extremely grateful for all the advice given, and extremely sorry to have wasted your time like this. Those German ß's got me all confused; there was never anything wrong with the built-in Delphi function or anything else. It just looked like there was. I made the mistake of combining them with normal 's' in my test data; any other letter would not have created this illusion of un-sortedness! The squiggly ß's have made me look a fool! ßs!
Rudy and lkessler were both especially helpful, thank you both. I have to accept lkessler's answer as most correct, sorry Rudy.
You said you had problems calling Windows API functions yourself. Could you post the code, so people here can see why it failed? It is not as hard as it may seem, but it does require some care. ISTM that RtlCompareUnicodeString() is too low level.
I found a few solutions:
You could use the Windows API function CompareStringEx. This will compare using Unicode specific collation types. You can specify how you want this done (see link). It does require wide strings, i.e. PWideChar pointers to them. If you have problems calling it, give a holler and I'll try to add some demo code.
More or less portable
To make this more or less portable, you could write a function that compares two strings and use conditional defines to choose the different comparison APIs for the platform.
Try using CompareStr for case sensitive, or CompareText for case insensitive if you want your sorts exactly the same in any locale.
And use AnsiCompareStr for case sensitive, or AnsiCompareText for case insensitive if you want your sorts to be specific to the locale of the user.
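The difference between the two families can be seen in a small Python analogue (not Delphi code): plain `sorted()` compares code points, much like CompareStr, while `locale.strxfrm` applies the collation rules of the current locale, in the spirit of AnsiCompareStr. Which locale is used depends on the machine; note that `setlocale` changes process-wide state, and the sketch keeps the current locale if the user's default is unavailable.

```python
import locale
from typing import List

WORDS = ["zebra", "Straße", "Strasse", "apple", "Zoo", "öl"]


def ordinal_sorted(words: List[str]) -> List[str]:
    # Code-point order (the CompareStr idea): uppercase before lowercase,
    # and ß (U+00DF) and ö (U+00F6) land after z.
    return sorted(words)


def locale_sorted(words: List[str]) -> List[str]:
    # Locale-aware order (the AnsiCompareStr idea): collate with the rules
    # of the user's default locale.
    try:
        locale.setlocale(locale.LC_COLLATE, "")
    except locale.Error:
        pass  # default locale not available here; keep the current one
    return sorted(words, key=locale.strxfrm)


print(ordinal_sorted(WORDS))  # ['Strasse', 'Straße', 'Zoo', 'apple', 'zebra', 'öl']
print(locale_sorted(WORDS))   # order depends on the machine's locale
```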
See: How can I get TStringList to sort differently in Delphi for a lot more information on this.
In Unicode the numeric order of the characters is certainly not the sorting sequence. AnsiCompareText as mentioned by HeartWare does take locale specifics into consideration when comparing characters, but, as you found out, does nothing wrt the sorting order. What you are looking for is called the collation sequence of a language, which specifies the alphabetic sorting order for a language taking diacritics etc into consideration. They were sort of implied in the old Ansi Code pages, though those didn't account for sorting difference between languages using the same character set either.
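To make the idea of a collation sequence concrete, here is a deliberately toy three-level sort key in Python: base letters first, then accents, then case. It is a hypothetical illustration of multi-level comparison, not the rules of any real locale and not what Windows or ICU implement.

```python
import unicodedata
from typing import Tuple


def toy_collation_key(s: str) -> Tuple[str, str, str]:
    """A toy three-level sort key (hypothetical, not a real locale's rules).

    Level 1: base letters only, case-folded (ö -> o, ß -> ss, A -> a).
    Level 2: accents, so "ol" sorts before "öl" when the base letters tie.
    Level 3: case, with lowercase before uppercase (via swapcase).
    """
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    primary = base.casefold()
    secondary = decomposed.casefold()  # keeps the combining marks
    tertiary = s.swapcase()            # lowercase letters now compare first
    return primary, secondary, tertiary


words = ["Öl", "ol", "Ol", "öl", "zebra", "apfel"]
print(sorted(words, key=toy_collation_key))
# ['apfel', 'ol', 'Ol', 'öl', 'Öl', 'zebra']
```

Only when two strings tie on a level does the next level decide, which is why "Öl" lands next to "ol" instead of after "zebra" as it would in code-point order.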
I checked the D2010 docs. Apart from some TIB* components I didn't find any links. C++ builder does seem to have a compare function that takes collation into account, but that's not much use in Delphi. There you will probably have to use some Windows' API functions directly.
Sorting collate all out:
Collation terminology: (though that pertains to MS SQL 2005, it may be helpful)
The 'Sorting "Collate" all out' article is by Michael Kaplan, someone who has great in-depth knowledge of all things Unicode and all intricacies of various languages. His blog has been invaluable to me when porting from D2006 to D2009.
Have you tried AnsiCompareText ? Even though it is called "Ansi", I believe it calls on to an OS-specific Unicode-able comparison routine...
It should also make you safe from cross-platform dependencies (provided that Embarcadero supplies a compatible version in the various OS's they target).
I do not know how good the comparison works with the various strange Unicode ways to encode strings, but try it out and let us know the result...

I've heard that LaTeX is Turing complete. Are there any programs written in LaTeX?

It's possible to do interesting things with what would ordinarily be thought of as typesetting languages. For example, you can construct the Mandelbrot set using PostScript.
It is suggested in this MathOverflow question that LaTeX may be Turing-complete. This implies the ability to write arbitrary programs (although it may not be easy!). Does anyone know of any concrete example of such a program in LaTeX, which does something highly unusual with the language?
In issue 13 of The Monad Reader, Stephen Hicks writes about implementing the solution to an ICFP contest (involving Mars rover navigation) in TeX, with copious use of macros. Amusingly, the solution's output when typeset is a PostScript map of the rover's path.
Alternatively, Andrew Greene wrote a BASIC interpreter in TeX (more details). This may count as slightly perverse.
The pgfmath library still amazes me. But on a more Turing-related note: it is possible to write an actual Turing machine in TeX, as per It's just a nifty way of using expansions in TeX.
PostScript is Turing complete as well; if you read the manual, you'll be amazed by its general programming capabilities (at least, I was).
I'm not sure if this qualifies as programming per se, but I've recently started doing something a bit like Object Oriented stuff in LaTeX. (You don't need to know any maths to follow the following.) In recent papers, I've been writing about categories, which have objects and morphisms. Since there've been quite a few of those, I wanted a consistent style so that, say, 𝒞 was a category with typical object C and typical morphism c. Then I'd also have 𝒟 with D and d. So I define a "class", say "category" (you need to be a mathematician to understand the joke there), and declare that C is an instance of this class, and then have access to \ccat, \cobj, \cmor and so forth. The reason for not doing \cat{c}, \obj{c}, and \mor{c}, and so forth, is that sometimes these categories have special names and so after declaring the instance, I can modify its name very easily (simply redefine \ccat - well, actually \mathccat since \ccat is a wrapper which selects \mathccat in math mode and \textccat in text mode). (Of course, it's a little more complicated than the above suggests and the OO stuff really comes in useful when I want to define a new category as a variant of an old one (it can even deal with the case where the old one doesn't exist yet.).)
Although it may not qualify as actual programming, I am using it in papers and do find it useful - the other answers (so far) have more of the feel of showing off the capabilities of LaTeX than of a sensible solution to a practical problem.
I know of someone who wrote the answer to an ACM contest problem in LaTeX.

Parsing Source Code - Unique Identifiers for Different Languages?

I'm building an application that receives source code as input and analyzes several aspects of the code. It can accept code from many common languages, e.g. C/C++, C#, Java, Python, PHP, Pascal, SQL, and more (however many languages are unsupported, e.g. Ada, Cobol, Fortran). Once the language is known, my application knows what to do (I have different handlers for different languages).
Currently I'm asking the user to input the programming language the code is written in, and this is error-prone: although users know the programming languages, a small percentage of them (on rare occasions) click the wrong option just due to recklessness, and that breaks the system (i.e. my analysis fails).
It seems to me like there should be a way to figure out (in most cases) what the language is, from the input text itself. Several notes:
I'm receiving pure text and not file names, so I can't use the extension as a hint.
The user is not required to input complete source code, and can also input code snippets (i.e. the include/import part may not be included).
It's clear to me that any algorithm I choose will not be 100% foolproof, certainly for very short inputs (e.g. code that could be accepted by both Python and Ruby), in which case I will still need the user's assistance; however, I would like to minimize user involvement in the process to minimize mistakes.
If the text contains "x->y()", I may know for sure it's C++ (?)
If the text contains "public static void main", I may know for sure it's Java (?)
If the text contains "for x := y to z do begin", I may know for sure it's Pascal (?)
My question:
Are you familiar with any standard library/method for figuring out automatically what the language of an input source code is?
What are the unique code "tokens" with which I could certainly differentiate one language from another?
I'm writing my code in Python but I believe the question to be language agnostic.
Vim has an autodetect filetype feature. If you download the Vim source code, you will find a /vim/runtime/filetype.vim file.
For each language it checks the extension of the file and also, for some of them (most common), it has a function that can get the filetype from the source code. You can check that out. The code is pretty easy to understand and there are some very useful comments there.
Build a generic tokenizer and then use a Bayesian filter on the tokens. Use the existing "user checks a box" system to train it.
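A minimal version of the tokenizer-plus-Bayesian-filter idea in Python (the asker's language). It is a multinomial naive Bayes classifier with add-one smoothing; the token regex and the class and method names are assumptions for illustration, and a real system would train on many confirmed samples from the "user checks a box" feedback rather than a handful of snippets.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set

# Generic tokenizer (an assumption): identifiers, integers, or runs of punctuation.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[^\sA-Za-z0-9_]+")


def tokenize(code: str) -> List[str]:
    return TOKEN.findall(code)


class NaiveBayesLanguageGuesser:
    """Multinomial naive Bayes over tokens with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()
        self.vocabulary: Set[str] = set()

    def train(self, code: str, language: str) -> None:
        """Add one confirmed sample, e.g. from the user's checkbox choice."""
        tokens = tokenize(code)
        self.token_counts[language].update(tokens)
        self.doc_counts[language] += 1
        self.vocabulary.update(tokens)

    def scores(self, code: str) -> Dict[str, float]:
        """Log-probability score per language; higher means more likely."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary) + 1  # +1 reserves mass for unseen tokens
        tokens = tokenize(code)
        result = {}
        for language, counts in self.token_counts.items():
            total_tokens = sum(counts.values())
            score = math.log(self.doc_counts[language] / total_docs)  # prior
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            result[language] = score
        return result

    def guess(self, code: str) -> str:
        scores = self.scores(code)
        return max(scores, key=scores.get)
```

Sorting the `scores` dictionary instead of taking the maximum gives a ranked list of closest matches to offer the user when the top scores are close.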
Here is a simple way to do it. Just run the parser on every language. Whatever language gets the farthest without encountering any errors (or has the fewest errors) wins.
This technique has the following advantages:
You already have most of the code necessary to do this.
The analysis can be done in parallel on multi-core machines.
Most languages can be eliminated very quickly.
This technique is very robust. Languages that might appear very similar when using a fuzzy analysis (Bayesian, for example) would likely have many errors when the actual parser is run.
If a program is parsed correctly in two different languages, then there was never any hope of distinguishing them in the first place.
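The run-every-parser approach reduces to ranking languages by error count. In this Python sketch each parser is wrapped as a hypothetical callable returning its number of syntax errors; only `python_errors`, which uses the standard `ast` module, is real. The parsers run one after another here; `concurrent.futures.ProcessPoolExecutor` could spread them across cores as the answer suggests. A tie at the top is treated as "cannot tell", so the user can be asked.

```python
import ast
from typing import Callable, Dict, List, Optional, Tuple

# Maps a language name to a callable returning the number of syntax errors
# found in the input (0 = parsed cleanly). Wrapping your existing handlers
# this way is an assumption about your code base.
ErrorCounter = Callable[[str], int]


def python_errors(code: str) -> int:
    """Example counter using Python's own parser (it stops at the first error)."""
    try:
        ast.parse(code)
        return 0
    except SyntaxError:
        return 1


def rank_languages(code: str, counters: Dict[str, ErrorCounter]) -> List[Tuple[str, int]]:
    """(language, error count) pairs, fewest errors first."""
    scored = [(language, count(code)) for language, count in counters.items()]
    return sorted(scored, key=lambda pair: pair[1])


def best_language(code: str, counters: Dict[str, ErrorCounter]) -> Optional[str]:
    """The language with the fewest errors, or None if there is a tie at the top."""
    ranked = rank_languages(code, counters)
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # e.g. parses cleanly in two languages: ask the user
    return ranked[0][0]
```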
I think the problem is impossible. The best you can do is to come up with some probability that a program is in a particular language, and even then I would guess producing a solid probability is very hard. Problems that come to mind at once:
use of features like the C pre-processor can effectively mask the underlying language altogether
looking for keywords is not sufficient as the keywords can be used in other languages as identifiers
looking for actual language constructs requires you to parse the code, but to do that you need to know the language
what do you do about malformed code?
Those seem enough problems to solve to be going on with.
One program I know which even can distinguish several different languages within the same file is ohcount. You might get some ideas there, although I don't really know how they do it.
In general you can look for distinctive patterns:
Operators might be an indicator, such as := for Pascal/Modula/Oberon, => or the whole of LINQ in C#
Keywords would be another one as probably no two languages have the same set of keywords
Casing rules for identifiers, assuming the piece of code was written conforming to best practices. Probably a very weak rule
Standard library functions or types. Especially for languages that usually rely heavily on them, such as PHP you might just use a long list of standard library functions.
You may create a set of rules, each of which indicates a possible set of languages if it matches. Intersecting the resulting lists will hopefully get you only one language.
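The rule-intersection idea might look like this in Python. The rules are a few examples drawn from this thread (`:=`, `initialization`/`implementation`, `End Sub`/`Option Strict`, `public static void main`) and are hypothetical and far from complete. The patterns run on raw text, so comments and string literals can fool them, which is the tokenizing problem raised in the next paragraph.

```python
import re
from typing import FrozenSet, List, Pattern, Set, Tuple

# Example rules only (hypothetical and incomplete): if a pattern matches,
# the language must be one of the listed set.
RULES: List[Tuple[Pattern[str], FrozenSet[str]]] = [
    (re.compile(r":="), frozenset({"Pascal", "Delphi", "Modula", "Oberon"})),
    # Case-sensitive on purpose: Modula-2 writes IMPLEMENTATION MODULE in upper case.
    (re.compile(r"\b(?:initialization|implementation)\b"), frozenset({"Delphi"})),
    (re.compile(r"\bEnd Sub\b|\bOption Strict\b"), frozenset({"VB"})),
    (re.compile(r"\bpublic static void main\b"), frozenset({"Java"})),
]


def candidate_languages(code: str, all_languages: Set[str]) -> Set[str]:
    """Intersect the language sets of every rule that matches.

    If nothing matches, every language stays possible; an empty result means
    the rules contradict each other, so fall back to asking the user.
    """
    candidates = set(all_languages)
    for pattern, languages in RULES:
        if pattern.search(code):
            candidates &= languages
    return candidates
```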
The problem with this approach however, is that you need to do tokenizing and compare tokens (otherwise you can't really know what operators are or whether something you found was inside a comment or string). Tokenizing rules are different for each language as well, though; just splitting everything at whitespace and punctuation will probably not yield a very useful sequence of tokens. You can try several different tokenizing rules (each of which would indicate a certain set of languages as well) and have your rules match to a specified tokenization. For example, trying to find a single-quoted string (for trying out Pascal) in a VB snippet with one comment will probably fail, but another tokenizer might have more luck.
But since you want to perform analysis anyway you probably have parsers for the languages you support, so you can just try running the snippet through each parser and take that as indicator which language it would be (as suggested by OregonGhost as well).
Some thoughts:
$x->y() would be valid in PHP, so ensure that there's no $ symbol if you think C++ (though I think you can store function pointers in a C struct, so this could also be C).
public static void main is Java if it is cased properly - write Main and it's C#. This gets complicated if you take case-insensitive languages like many scripting languages or Pascal into account. The [] attribute syntax in C# on the other hand seems to be rather unique.
You can also try to use the keywords of a language - for example, Option Strict or End Sub are typical for VB and the like, while yield is likely C# and initialization/implementation are Object Pascal / Delphi.
If your application is analyzing the source code anyway, you could try to throw your analysis code at it for every language, and if it fails really badly, it was the wrong language :)
My approach would be:
Create a list of strings or regexes (with and without case sensitivity), where each element has assigned a list of languages that the element is an indicator for:
class => C++, C#, Java
interface => C#, Java
implements => Java
[attribute] => C#
procedure => Pascal, Modula
create table / insert / ... => SQL
etc. Then parse the file line-by-line, match each element of the list, and count the hits.
The language with the most hits wins ;)
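That indicator table translates almost directly into Python. This sketch uses the listed indicators, with "create table / insert" narrowed to `CREATE TABLE` and `INSERT INTO`; the regex for a C# `[attribute]` line is an assumption about what such a line looks like. It counts hits per language line by line and picks the language with the most.

```python
import re
from collections import Counter
from typing import List, Optional, Pattern, Tuple

# Indicators from the list above; each pattern votes for every language listed.
INDICATORS: List[Tuple[Pattern[str], List[str]]] = [
    (re.compile(r"\bclass\b"), ["C++", "C#", "Java"]),
    (re.compile(r"\binterface\b"), ["C#", "Java"]),
    (re.compile(r"\bimplements\b"), ["Java"]),
    # Assumed shape of a C# attribute on its own line, e.g. [Serializable] or [Obsolete("x")]
    (re.compile(r"^\s*\[[A-Z]\w*(?:\(.*\))?\]\s*$"), ["C#"]),
    (re.compile(r"\bprocedure\b", re.IGNORECASE), ["Pascal", "Modula"]),
    (re.compile(r"\b(?:create\s+table|insert\s+into)\b", re.IGNORECASE), ["SQL"]),
]


def count_hits(code: str) -> Counter:
    """Number of matching (line, indicator) pairs per language."""
    hits: Counter = Counter()
    for line in code.splitlines():
        for pattern, languages in INDICATORS:
            if pattern.search(line):
                hits.update(languages)
    return hits


def most_likely(code: str) -> Optional[str]:
    """The language with the most hits, or None if nothing matched."""
    hits = count_hits(code)
    return hits.most_common(1)[0][0] if hits else None
```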
How about word frequency analysis (with a twist)? Parse the source code and categorise it much like a spam filter does. This way when a code snippet is entered into your app which cannot be 100% identified you can have it show the closest matches which the user can pick from - this can then be fed into your database.
Here's an idea for you. For each of your N languages, find some files in the language, something like 10-20 per language would be enough, each one not too short. Concatenate all files in one language together. Call this lang1.txt. GZip it to lang1.txt.gz. You will have a set of N langX.txt and langX.txt.gz files.
Now, take the file in question and append it to each of the langX.txt files, producing langXapp.txt, and the corresponding gzipped langXapp.txt.gz. For each X, find the difference between the sizes of langXapp.txt.gz and langX.txt.gz. The smallest difference will correspond to the language of your file.
Disclaimer: this will work reasonably well only for longer files. Also, it's not very efficient. But on the plus side, you don't need to know anything about the language; it's completely automatic. It can even detect natural languages and tell French from Chinese, just in case you need it :) But the main reason is that I just think it's an interesting thing to try :)
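The gzip trick can be sketched in memory with Python's standard `gzip` module instead of files on disk. This is a minimal sketch, assuming `corpora` maps each language name to its concatenated sample text (the contents of langX.txt); the function names are illustrative.

```python
import gzip


def compressed_size(data: bytes) -> int:
    """Size in bytes of the gzip-compressed data."""
    return len(gzip.compress(data))


def guess_by_compression(corpora: dict, snippet: str) -> str:
    """Pick the language whose corpus grows least when the snippet is appended."""
    snippet_bytes = snippet.encode("utf-8")
    best_language, best_delta = None, None
    for language, corpus in corpora.items():
        base = corpus.encode("utf-8")
        # size(langXapp.txt.gz) - size(langX.txt.gz)
        delta = compressed_size(base + b"\n" + snippet_bytes) - compressed_size(base)
        if best_delta is None or delta < best_delta:
            best_language, best_delta = language, delta
    return best_language
```

One design caveat: gzip's DEFLATE can only refer back about 32 KB, so only the tail of each langX.txt actually helps compress the appended snippet. With larger corpora, a compressor with a bigger window (for example Python's `lzma`) fits the idea better.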
The most bulletproof but also most work-intensive way is to write a parser for each language and run them in sequence to see which one accepts the code. This won't work well if the code has syntax errors, though, and you will most probably have to deal with such code, because people do make mistakes. One fast way to implement this is to get common compilers for every language you support, run them, and check how many errors each one produces.
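The "run every parser and see which accepts" idea can be sketched with parsers that ship with Python. The choice of languages is an assumption for illustration only: `ast.parse` stands in for a Python compiler, `json.loads` for JSON, and `xml.etree.ElementTree.fromstring` for XML; real support for C#, Java and so on would mean calling their compilers instead.

```python
import ast
import json
import xml.etree.ElementTree as ET

# Stand-in parsers (an assumption): each raises an exception on input it rejects.
PARSERS = {
    "Python": ast.parse,
    "JSON": json.loads,
    "XML": ET.fromstring,
}


def accepting_languages(source: str) -> list:
    """Return every language whose parser accepts the source without errors."""
    accepted = []
    for language, parse in PARSERS.items():
        try:
            parse(source)
        except (SyntaxError, ValueError):  # ET.ParseError subclasses SyntaxError
            continue
        accepted.append(language)
    return accepted
```

Note that `{"a": 1}` is accepted as both JSON and Python, a small example of why parser acceptance alone may still need to be combined with other heuristics.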
Heuristics work up to a certain point, and the more languages you support, the less help you will get from them. But for the first few versions they are a good start, mostly because they are fast to implement and work well enough in most cases. You could check for specific keywords, function/class names from commonly used APIs, certain language constructs, etc. The best approach is to count how many of these language-specific items a file contains for each candidate language. That makes the result robust against some syntax errors, against user-defined functions named like keywords (e.g. this() in a language that has no such keyword), and against words that only appear in comments and string literals.
Either way, you will most likely guess wrong sometimes, so some mechanism for the user to override the language choice is still necessary.
I think you should never rely on one single feature, since its absence in a fragment (e.g. somebody systematically using WHILE instead of FOR) might confuse you.
Also try to stay away from global identifiers like "IMPORT" or "MODULE" or "UNIT" or INITIALIZATION/FINALIZATION, since they might not always exist, be optional in complete sources, and totally absent in fragments.
Dialects and similar languages (e.g. Modula2 and Pascal) are dangerous too.
I would create simple lexers for a bunch of languages that keep track of key tokens, and then simply calculate the ratio of key tokens to "other" identifiers. Give each token a weight, since some might be key indicators for disambiguating between dialects or versions.
Note that this is also a convenient way to let users plug in "known" keywords to increase the detection ratio, e.g. by providing identifiers of runtime library routines or types.
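A weighted key-token ratio could look like the following sketch. The weight tables, the default weight of 0.5 for user-supplied identifiers, and the lower-casing of all tokens (which suits case-insensitive Pascal and Modula-2 but not case-sensitive languages) are assumptions made for illustration. A single regex stands in for the per-language lexers the answer proposes.

```python
import re

IDENT_RE = re.compile(r"[A-Za-z_]\w*")


class KeyTokenScorer:
    """Score languages by the weighted ratio of key tokens to other identifiers."""

    def __init__(self, weights: dict) -> None:
        # language -> {lower-cased key token: weight}
        self.weights = {lang: {k.lower(): w for k, w in table.items()}
                        for lang, table in weights.items()}

    def add_known_identifiers(self, language: str, identifiers, weight: float = 0.5) -> None:
        """Let users plug in known names (runtime-library routines, types)."""
        table = self.weights.setdefault(language, {})
        for ident in identifiers:
            table.setdefault(ident.lower(), weight)

    def scores(self, source: str) -> dict:
        idents = [t.lower() for t in IDENT_RE.findall(source)]
        result = {}
        for language, table in self.weights.items():
            key_weight = sum(table[t] for t in idents if t in table)
            others = sum(1 for t in idents if t not in table)
            result[language] = key_weight / max(others, 1)  # avoid division by zero
        return result

    def best(self, source: str):
        scores = self.scores(source)
        return max(scores, key=scores.get) if scores else None
```

With the hypothetical tables in the test, `procedure Foo; var x: Integer; begin x := 1 end;` scores 5/4 for Pascal but only 3/5 for Modula-2, because `var` is weighted only for Pascal.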
Very interesting question, I don't know if it is possible to be able to distinguish languages by code snippets, but here are some ideas:
One simple way is to watch out for single quotes: in some languages they delimit a single character, whereas in others they can enclose a whole string.
A unary asterisk or a unary ampersand operator is a certain indication that it's either of C/C++/C#.
Pascal is the only language (of the ones given) to use two characters for assignment (:=). Pascal has many unique keywords, too (begin, procedure, end, ...).
The class initialization with a function could be a nice hint for Java.
Functions that do not belong to a class rule out Java (there is no free-standing max(), for example).
Naming of basic types (bool vs boolean)
Which reminds me: C++ can look very different across projects (#define boolean int), so you can never guarantee that you found the correct language.
If you run the source code through a hashing algorithm and it looks the same, you're most likely analyzing Perl
Indentation is a good hint for Python
You could use functions provided by the languages themselves - like token_get_all() for PHP - or third-party tools - like pychecker for python - to check the syntax
Summing it up: This project would make an interesting research paper (IMHO) and if you want it to work well, be prepared to put a lot of effort into it.
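The single-quote hint from the list above can be turned into a small heuristic. This sketch assumes backslash escapes inside quotes (as in C-family languages) and does not handle Pascal's doubled '' escape; it only says whether single-quoted literals look like characters or strings. It would misclassify, for instance, Python code that happens to quote one-letter strings.

```python
import re

# A single-quoted literal: any run of non-quote characters or backslash escapes.
SINGLE_QUOTED_RE = re.compile(r"'((?:[^'\\\n]|\\.)*)'")


def single_quote_hint(source: str) -> str:
    """Classify single-quoted literals as char-like or string-like."""
    literals = SINGLE_QUOTED_RE.findall(source)
    if not literals:
        return "no single-quoted literals"

    def is_char(lit: str) -> bool:
        # 'a' or an escape such as '\n'
        return len(lit) == 1 or (len(lit) == 2 and lit.startswith("\\"))

    if all(is_char(lit) for lit in literals):
        return "char-like (C, C++, C#, Java)"
    return "string-like (e.g. Pascal, SQL)"
```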
There is no way of making this foolproof, but I would personally start with operators, since they are in most cases "set in stone" (I can't say this holds for every language, since I know only a limited set). This would narrow it down quite considerably, but not nearly enough. For instance, "->" is used in many languages (at least C, C++ and Perl).
I would go for something like this:
Create a list of features for each language; these could be operators or commenting style (since most languages use some easily detectable character or character combination).
For instance:
Some languages have lines that start with the character "#"; these include C, C++ and Perl. Does any language other than the first two use #include and #define? If you detect this character at the beginning of a line, the language is probably one of those. If the character is in the middle of the line, the language is most likely Perl.
Also, if you find the pattern := this would narrow it down to some likely languages.
I would have a two-dimensional table with languages and patterns found and after analysis I would simply count which language had most "hits". If I wanted it to be really clever I would give each feature a weight which would signify how likely or unlikely it is that this feature is included in a snippet of this language. For instance if you can find a snippet that starts with /* and ends with */ it is more than likely that this is either C or C++.
The problem with keywords is that someone might use them as ordinary variable names or even inside comments. They can be used as a decider (e.g. the word "class" is much more likely in C++ than in C if everything else is equal), but you can't rely on them.
After the analysis I would offer the most likely language as the default choice, with the remaining languages listed in order so they can also be selected. The user could then accept your guess by simply clicking a button, or switch it easily.
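The weighted language-by-pattern table and the ranked choice list can be sketched together. All patterns and weights here are made-up assumptions that loosely follow the examples above ("#" at the start of a line vs. mid-line, ":=", /* ... */, "class" as a C vs. C++ decider); tuning them would be the real work.

```python
import re

# Illustrative feature table (an assumption): (pattern, {language: weight}).
FEATURES = [
    (re.compile(r"^#\s*(?:include|define)\b", re.MULTILINE), {"C": 3.0, "C++": 3.0}),
    (re.compile(r"\S[ \t]*#"), {"Perl": 1.0}),  # '#' after code on the same line
    (re.compile(r":="), {"Pascal": 1.0, "Modula-2": 1.0}),
    (re.compile(r"/\*.*?\*/", re.DOTALL), {"C": 1.0, "C++": 1.0}),
    (re.compile(r"\bclass\b"), {"C++": 1.0}),  # tips C vs C++ when all else is equal
]


def rank_languages(source: str) -> list:
    """Weighted hit counts per language, most likely first (only languages with hits)."""
    scores: dict = {}
    for pattern, weights in FEATURES:
        hits = len(pattern.findall(source))
        for language, weight in weights.items():
            scores[language] = scores.get(language, 0.0) + hits * weight
    candidates = [lang for lang, score in scores.items() if score > 0]
    # sorted() is stable, so ties keep the table's order
    return sorted(candidates, key=lambda lang: scores[lang], reverse=True)
```

The first entry would be the preselected guess, and the rest of the list is what the user could switch to.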
In answer to 2: if there's a "#!" and the name of an interpreter at the very beginning, then you definitely know which language it is. (Can't believe this wasn't mentioned by anyone else.)
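Checking for a "#!" line takes only a few lines. This sketch handles both `#!/usr/bin/perl` and the common `#!/usr/bin/env python3` form; the interpreter-to-language mapping and the stripping of version suffixes such as `3.11` are illustrative assumptions.

```python
import os
import re

SHEBANG_RE = re.compile(r"#!\s*(\S+)(?:[ \t]+(\S+))?")

# Hypothetical mapping from interpreter name to language.
INTERPRETERS = {"python": "Python", "perl": "Perl", "ruby": "Ruby",
                "sh": "shell", "bash": "shell", "node": "JavaScript", "php": "PHP"}


def language_from_shebang(source: str):
    """Return the language named by a '#!' first line, or None if there is none."""
    first_line = source.splitlines()[0] if source else ""
    match = SHEBANG_RE.match(first_line)
    if not match:
        return None
    interpreter = os.path.basename(match.group(1))
    if interpreter == "env" and match.group(2):  # "#!/usr/bin/env python3"
        interpreter = match.group(2)
    name = re.sub(r"[\d.]+$", "", interpreter) or interpreter  # python3.11 -> python
    return INTERPRETERS.get(name, name)
```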

Is there a calculator with LaTeX-syntax?

When I write math in LaTeX I often need to perform simple arithmetic on numbers in my LaTeX source, like 515.1544 + 454 = ???.
I usually copy-paste the LaTeX code into Google to get the result, but I still have to manually change the syntax, e.g.
\frac{154,7}{25} - (289 - \frac{1337}{42})
must be changed to
154,7/25 - (289 - 1337/42)
It seems trivial to write a program to do this for the most commonly used operations.
Is there a calculator which understand this syntax?
I know that doing this perfectly is impossible (because of the halting problem). Doing it for the simple cases I need is trivial. \frac, \cdot, \sqrt and a few other tags would do the trick. The program could just return an error for cases it does not understand.
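A tool for exactly this subset is indeed small. The sketch below is a hypothetical recursive-descent evaluator, not any of the tools named in the answers. It supports numbers, + - * / ^, parentheses, braces, \frac, \sqrt, \cdot, \times, \div, \pi and ignores \left/\right; anything else raises an error. It assumes that a comma between digits (as in 154,7 or 3{,}141) is a German decimal comma, which would be wrong for input that uses commas as separators.

```python
import math
import re

# Number (decimal point or comma), a LaTeX command, or one operator/bracket.
TOKEN_RE = re.compile(r"\s*(\d+(?:[.,]\d+)?|\\[A-Za-z]+|[-+*/^(){}])")
IGNORED = {r"\left", r"\right"}  # sizing commands carry no arithmetic meaning


def tokenize(tex: str) -> list:
    tex = tex.replace("{,}", ",").strip()  # 3{,}141 -> 3,141
    tokens, pos = [], 0
    while pos < len(tex):
        match = TOKEN_RE.match(tex, pos)
        if not match:
            raise ValueError(f"unsupported input at {tex[pos:]!r}")
        if match.group(1) not in IGNORED:
            tokens.append(match.group(1))
        pos = match.end()
    return tokens


class _Parser:
    def __init__(self, tokens: list) -> None:
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None) -> str:
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of input")
        if expected is not None and token != expected:
            raise ValueError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def expr(self) -> float:  # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            op, rhs = self.take(), self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self) -> float:  # term := factor ((* | / | \cdot | \times | \div) factor)*
        value = self.factor()
        while self.peek() in ("*", "/", r"\cdot", r"\times", r"\div"):
            op, rhs = self.take(), self.factor()
            value = value / rhs if op in ("/", r"\div") else value * rhs
        return value

    def factor(self) -> float:  # factor := ('+' | '-') factor | atom ('^' factor)?
        if self.peek() in ("+", "-"):
            sign = -1.0 if self.take() == "-" else 1.0
            return sign * self.factor()
        base = self.atom()
        if self.peek() == "^":
            self.take("^")
            return base ** self.factor()
        return base

    def braced(self) -> float:  # { expr }
        self.take("{")
        value = self.expr()
        self.take("}")
        return value

    def atom(self) -> float:
        if self.peek() == "(":
            self.take("(")
            value = self.expr()
            self.take(")")
            return value
        if self.peek() == "{":
            return self.braced()
        token = self.take()
        if token == r"\frac":
            numerator = self.braced()
            return numerator / self.braced()
        if token == r"\sqrt":
            return math.sqrt(self.braced())
        if token == r"\pi":
            return math.pi
        if token[0].isdigit():
            return float(token.replace(",", "."))
        raise ValueError(f"unsupported token {token!r}")


def evaluate_latex(tex: str) -> float:
    """Evaluate a small LaTeX arithmetic subset; raise ValueError otherwise."""
    parser = _Parser(tokenize(tex))
    value = parser.expr()
    if parser.peek() is not None:
        raise ValueError(f"unexpected {parser.peek()!r}")
    return value
```

For the example from the question, `evaluate_latex(r"\frac{154,7}{25} - (289 - \frac{1337}{42})")` computes the same value as the plain expression 154.7/25 - (289 - 1337/42).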
WolframAlpha can take input in TeX form.
The LaTeXCalc project is designed to do just that. It will read a TeX file and do the computations. For more information check out the home page at
The calc package allows you to do some calculations in source, but only within commands like \setcounter and \addtolength. As far as I can tell, this is not what you want.
If you already use Sage, then the sagetex package is pretty awesome (if not, it's overkill). It allows you to get nicely formatted output from input like this:
The square of
$\begin{pmatrix}
1 & 2 \\
3 & 4
\end{pmatrix}$
is $\sage{matrix([[1, 2], [3,4]])^2}$.
The prime factorization of the current page number is $\sage{factor(\thepage)}$.
As Andy says, the answer is yes there is a calculator that can understand most latex formulas: Emacs.
Try the following steps (assuming vanilla emacs):
Open Emacs
Open your .tex file (or activate latex-mode)
Position the point somewhere between the two $ signs, or e.g. inside the begin/end environment of the formula (or even of a matrix)
Use calc embedded mode for maximum awesomeness
So with point in the formula you gave above:
$\frac{154,7}{25} - (289 - \frac{1337}{42})$
Press C-x * d to duplicate the formula in the line below and enter calc embedded mode, which should already have activated a LaTeX variant of calc for you. Your buffer now looks like this:
$\frac{154,7}{25} - (289 - \frac{1337}{42})$
Note that the fraction has already been transformed as far as possible. Doing the same again (C-x * d) and pressing c f to convert the fraction into a floating-point number yields the following buffer:
$\frac{154,7}{25} - (289 - \frac{1337}{42})$
I used C-x * d to duplicate the formula and then enter embedded mode in order to keep the intermediate values; however, there is also C-x * e, which avoids the duplication and simply enters embedded mode for the current formula.
If you are interested, you should really have a look at the info page for Emacs Calc - Embedded Mode, and in general at the help for the GNU Emacs Calculator together with its awesome interactive tutorial.
You can run an R function called Sweave on a file (mostly TeX with some R) to replace the embedded R expressions with their results in TeX.
A tutorial can be found here:
My calculator can do that. To get the formatted output, double-click the result formula and press ctrl+c to copy it.
It can do fairly advanced stuff too (differentiation, easy integrals (and not that easy ones)...).
A sample computation:
There is a way to do what you want just not quite how you describe.
You can use the fp package (\usepackage[options]{fp}). This fixed-point arithmetic package can do much of what you want: solving equations, addition, division and much more. Unfortunately, it will not read LaTeX math, so you have to do something a little different instead. The documentation is very poor, so I'll give an example here.
For instance, if you want to compute (2x3)/5, you would type:
\FPmul\p{2}{3} % stores the result of 2x3 in \p
\FPupn\p{\p{} 7 round} % evaluates "\p 7 round" in reverse Polish notation (upn), i.e. rounds \p to 7 decimal places
\FPdiv\q{\p}{5} % divides \p by 5 and stores the result in \q
\FPupn\q{\q{} 4 round} % rounds \q to 4 decimal places
$\frac{2\times3}{5}=\FPprint\q$ % prints the stored result inside the math
The FP commands are always invisible; only \FPprint prints the associated result, so your documents will not get messy. FP commands can be placed wherever you wish (except inside \verb) as long as they come before the associated \FPprint.
You could just paste it into Symbolab, which as a bonus has free step-by-step solutions. Also, since Symbolab uses MathQuill, it instantly formats your LaTeX.
Considering that LaTeX itself is a Turing-complete markup language, I strongly doubt you can build something like this that isn't built directly into LaTeX. Furthermore, LaTeX math markup itself has next to no semantic meaning; it merely describes the visual appearance.
That being said, you can probably hack together something which recognizes a non-programmable subset of LaTeX math markup and spits out the result in the same way. If all you're interested in is simple arithmetic with fractions and integers (careful with decimal fractions, though, as they may appear as 3{,}141... in German texts :)), this shouldn't be too hard. But once you start with integrals, matrices, etc., I fear that LaTeX lacks the expressiveness to accurately describe your intentions. It is a document preparation system, after all, and thus not very suitable as input for computer algebra systems.
Side note: you can switch to Word, which in its current version has a math markup language that is sufficiently LaTeX-like (by now it even supports LaTeX markup) and yet still Google-friendly for simpler terms:
With the free Microsoft Math add-in you can even let Word calculate expressions in-place:
There is none, because it is generally not possible.
LaTeX math mode markup is presentational markup and there are cases in which it does not provide enough information to calculate the expression.
That was one of the reasons MathML content markup was created and also why MathML is used in Mathematica. MathML actually is sort of two languages in one:
presentation markup
content markup
To accomplish what you are after, you'll need MathML with combined presentation and content markup (see the MathML spec).
In my opinion your best bet is to use MathML (even if it is verbose) and convert to LaTeX when necessary. That said, I also like LaTeX syntax best and maybe what we need is a compact syntax for MathML (something similar in spirit to RelaxNG compact syntax).
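To illustrate why content markup is computable where presentation markup is not: presentation MathML for a fraction is just `<mfrac>` (draw one thing over another), while content MathML says `<apply><divide/>...</apply>`. The sketch below evaluates a small content-MathML tree with Python's standard `xml.etree.ElementTree`. It handles only `cn` numbers and the `plus`, `minus`, `times`, `divide` and `power` operators; that restriction, and the function names, are assumptions of this example. `math.prod` needs Python 3.8 or later.

```python
import math
import xml.etree.ElementTree as ET

# Content MathML for \frac{154,7}{25} - (289 - \frac{1337}{42}).
CONTENT = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><minus/>
    <apply><divide/><cn>154.7</cn><cn>25</cn></apply>
    <apply><minus/>
      <cn>289</cn>
      <apply><divide/><cn>1337</cn><cn>42</cn></apply>
    </apply>
  </apply>
</math>
"""


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix ElementTree adds


def evaluate(node: ET.Element) -> float:
    tag = local_name(node.tag)
    if tag == "math":
        return evaluate(node[0])
    if tag == "cn":
        return float(node.text.strip())
    if tag != "apply":
        raise ValueError(f"unsupported element <{tag}>")
    head, *operands = list(node)  # first child names the operation
    values = [evaluate(child) for child in operands]
    op = local_name(head.tag)
    if op == "plus":
        return sum(values)
    if op == "times":
        return math.prod(values)
    if op == "minus":  # binary subtraction or unary negation
        return values[0] - values[1] if len(values) == 2 else -values[0]
    if op == "divide":
        return values[0] / values[1]
    if op == "power":
        return values[0] ** values[1]
    raise ValueError(f"unsupported operator <{op}/>")
```

Usage: `evaluate(ET.fromstring(CONTENT.strip()))` returns the value of the expression from the question.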
For calculations with LaTeX you can use the CalcTeX package.
This package understands elements of the LaTeX language and performs calculations; for example, your problem is available on
or just write
For calculations, use the following environments:
$515.1544 + 454$
\[ \frac{154.7}{25}-(289-\frac{1337}{42.}) \]
For more info, please visit the project web site or contact the author of this project.
For performing the math within your LaTeX itself, you might also look into the pgfmath package, which is more powerful and convenient than the calc package. You can find out how to use it from Part VI of The TikZ and PGF Packages Manual, which you can find here (version 2.10 currently):
Emacs calc-mode accepts LaTeX input. I use it daily. Press "d", followed by "L", to enter LaTeX input mode. Press "'" to open a prompt where you can paste your TeX.
Anyone saying it is not possible is wrong.
IIRC Mathematica can do it.
"There is none, because it is generally not possible. LaTeX math mode markup is presentational markup and there are cases in which it does not provide enough information to calculate the expression."
You are right. LaTeX as it is does not provide enough information to make any calculations. Moreover, it is not designed to carry such information. But nothing prevents you from writing text in LaTeX format that does contain such information.
It is a difficult path, because you need to build a system of rules defining what the LaTeX text must contain so that your interpreter can recognize it. And then you have to convince users that they need to learn those rules, etc.
The easiest way is to create a logical and intuitive calculator of mathematical expressions whose expressions can then be converted to LaTeX. That's almost what you described, and it is implemented in the program I pointed to. AnEasyCalc lets you type an expression the same way you type plain text in any text editor. It then checks and calculates the expression and generates the LaTeX string by itself. It's quick and easy; just try it and you will see.
This is not exactly what you are asking for, but it is a nice package that you can include in a LaTeX document to do all kinds of operations, including arithmetic, calculus and even vectors and matrices:
The package name is "calculator"
The latex2sympy2 Python library can parse LaTeX math expressions.
from latex2sympy2 import latex2sympy

tex_str = r"\frac{9\pi}{\ln(12)}+22"  # example TeX math; replace with your own
sympy_object = latex2sympy(tex_str)  # parse the LaTeX into a SymPy expression
evaluated_tex = float(sympy_object.evalf())  # evaluate it numerically
print(evaluated_tex)
This Python 3 code evaluates 9𝜋/ln(12)+22 (in its LaTeX form above) to 33.37842899841745.
The snippet above only handles basic algebraic simplification (math expressions without variables). Since the library converts LaTeX math to SymPy objects, the above code can easily be tweaked and extended to handle much more complicated LaTeX math (including derivatives, integrals, etc.).
The latex2sympy2 library can be installed via the pip command: pip install --user latex2sympy2
Try the AnEasyCalc program. It makes getting the LaTeX formula very easy:
