ActionScript - same opcode for two distinct instructions?

Looking at the AVM2 specs (here, pages 66-67) I found that there are two instructions which do different things but have the same opcode:
greaterequals, 0xAF
greaterthan, 0xAF
Both have the same format (no arguments) and the same stack transition. Since these instructions do different things, I am a bit confused. Does anyone know the correct opcodes?

The greaterequals opcode is 0xB0.
In the online documentation the error is partially fixed (the decimal value is still wrong).

Z3: Complex numbers?

I have been searching for whether Z3 supports complex numbers and found the following: https://leodemoura.github.io/blog/2013/01/26/complex.html
The author states that (1) Complex numbers are not yet implemented in Z3 as a built-in (this was written in 2013), and (2) that Complex numbers can be encoded on top of the Real numbers provided by Z3.
The basic idea is to represent a Complex number as a pair of Real numbers. He defines the basic imaginary number with I=(0,1), that is: I means the real part equals 0 and the imaginary part equals 1.
He offers the encoding (which we can test on our own machines), with which we can solve the equation x^2+2=0. I received the following result:
sat
x = (-1.4142135623?)*I
The sat result makes sense, since this equation is solvable in the simulation of the theory of complex numbers we have just made (a consequence of the theory of algebraically closed fields). However, the root does not make sense to me: what about (1.4142135623?)*I?
I would understand receiving both roots, but if only one is returned, I do not understand why I get the negated one.
Maybe I misread something or I missed something.
Also, I would like to know whether complex numbers have by now been implemented as a built-in in Z3. I mean, with a standard:
x = Complex("x")
And with tactics of an NCA kind (for nonlinear complex arithmetic).
I have not seen any reference to this theory in SMT-LIB either.
AFAIK there is no plan to add complex numbers to SMT-LIB. There's a Google group for SMT-LIB, and it might make sense to post there to see if there is any interest.
Note that the particular blog post says "find a root"; this is just satisfiability, i.e. it finds one solution, not all of them. (But you can ask for another one by adding an assertion that says x should be different from the first result.)
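For example, here is a minimal z3py sketch of the pair-of-reals encoding, including blocking the first model to get the other root. The variable names a and b are mine, not the blog post's exact code.

from z3 import Real, Solver, Or

# Encode x = a + b*I as a pair of Reals (the idea from the blog post).
a, b = Real('a'), Real('b')

s = Solver()
# x^2 + 2 = 0 expands to (a^2 - b^2 + 2) + (2*a*b)*I = 0,
# so both the real part and the imaginary part must be zero.
s.add(a * a - b * b + 2 == 0, 2 * a * b == 0)

print(s.check())        # sat
m = s.model()
print(m[a], m[b])       # one root, e.g. a = 0, b = -1.4142135623?

# Ask for a different root by excluding the first model.
s.add(Or(a != m[a], b != m[b]))
print(s.check())        # sat again
print(s.model())        # the other root, b = +1.4142135623?

Each check only has to find one model; which root the solver returns first is up to it.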

Parsing one number into multiple tokens in ANTLR

I am trying to use ANTLR as a parser for my company's latest project. I have been unable to find any information on how to parse one number, say 0005039906179210835699175654, into multiple tokens (a 5-digit number, a 3-digit number, a 14-digit number, and a 6-digit number).
My current code spits back an error:
line 1:1 no viable alternative at input '0005039906179210835699175654'
Also, on another note, does anyone know how to get the name of a token by using a listener? That's just a bonus question I guess :) Thanks in advance to everyone who responds!
EDIT:
To clarify the whole problem, my company receives information in a legacy format from automated systems. This information must be parsed into POJOs for further processing. I am trying to use ANTLR as an easy, smooth, readable, and expandable solution to this. One example is this line:
U0005138606179090232769522950 0863832 18322862 0284785 3
This must be parsed into the sections: U, 00051, 386, 06179090232769, 522950, 0863832, 18322862, 0284785, and 3. Obviously the sections separated by whitespace are easy to parse, but I have been unable to find a way in ANTLR to split the values that are not separated by whitespace. Any help would be appreciated, thanks!
EDIT2:
To be perfectly clear as to why I'm using ANTLR instead of plain Java: my company receives messages in 5 legacy formats, and the system implemented to parse them must be easily expandable to accommodate more in the future. ANTLR is easy to read and understand. Plus, it will be easier to construct additional grammars and listeners than to maintain a tangled mess of Java.
EDIT3:
I thought of a solution, but it is pretty janky. My idea is to parse the 28-character number as one token, then split it with Java in a listener, since it is broken up the same way each time. I'll report back later today on whether I got it to work.
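Roughly, the split itself would look something like this (sketched here in Python purely for illustration; the real version would live in the Java listener, and the field widths 5, 3, 14, 6 come from the original question):

# Fixed-width split of the 28-character token.
# Widths are taken from the question: 5 + 3 + 14 + 6 = 28.
FIELD_WIDTHS = [5, 3, 14, 6]

def split_fixed(token, widths=FIELD_WIDTHS):
    if len(token) != sum(widths):
        raise ValueError("expected %d characters, got %d" % (sum(widths), len(token)))
    fields, pos = [], 0
    for w in widths:
        fields.append(token[pos:pos + w])
        pos += w
    return fields

print(split_fixed("0005039906179210835699175654"))
# ['00050', '399', '06179210835699', '175654']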
EDIT4, FINAL UPDATE:
I have chosen to go with my solution mentioned in EDIT3. It is not pretty, but it works and it is fast enough. Thank you very much to everyone who commented, shared ideas, and stimulated thought!

ASCII Representation of Hexadecimal

I have a string that, after running each character through string.format("%02X", char), gives the following:
74657874000000EDD37001000300
In the end, I'd like that string to look like the following:
t e x t NUL NUL NUL í Ó p SOH NUL ETX NUL (the spaces are there just to clarify the individual characters desired in the example).
I've tried to use \x..(hex#) and string.char(0x..(hex#)) (where (hex#) is the hexadecimal representation of my desired character), and I am still having issues getting the result I'm looking for. After reading another thread about this topic, what is the way to represent a unichar in lua, and the links provided in its answers, I still don't fully understand what I need to do in my final code to make this work.
I'm looking for some help in better understanding an approach that would let me achieve the desired result shown above.
ETA:
Well I thought that I had fixed it with the following code:
function hexToAscii(input)
    local convString = ""
    for char in input:gmatch("(..)") do
        convString = convString..(string.char("0x"..char))
    end
    return convString
end
It appeared to work, but I didn't think about characters above 127. Rookie mistake. Now I'm unsure how to get the additional characters up to 255 to display their ASCII values.
I did the following to check since I couldn't truly "see" them in the file.
function asciiSub(input)
    input = input:gsub(string.char(0x00), "<NUL>") -- suggested by a coworker
    print(input)
end
I did a few more gsub substitutions for other characters and the file comes back with the replacement strings. But when I ran into characters from the extended ASCII table, it all fell apart.
Can anyone assist me in understanding a fix or new approach to this problem? As I've stated before, I read other topics on this and am still confused as to the best approach towards this issue.
The simple way to transform a base16-encoded string is just to
function unhex( input )
    return (input:gsub( "..", function(c)
        return string.char( tonumber( c, 16 ) )
    end))
end
This is basically what you have, just a bit cleaner. (There's no need to say "(..)", ".." is enough – if you specify no captures, you'll automatically get the whole match. And while it might work if you write string.char( "0x"..c ), it's just evil – you concatenate lots of strings and then trigger the automatic conversion to numbers. Much better to just specify the base when explicitly converting.)
The resulting string should be exactly what went into the hex-dumper, no matter the encoding.
If you cannot correctly display the result, your viewer will also be unable to display the original input. If you used different viewers for the original input and the resulting output (e.g. a text editor and a terminal), try writing the output to a file instead and looking at it with the same viewer you used for the original input; then the two should be exactly the same.
Getting viewers that assume different encodings (e.g. one of the "old" 8-bit code pages or one of the many versions of Unicode) to display the same thing will require conversion between different formats, which tends to be quite complicated or even impossible. As you did not mention what encodings are involved (nor any other information like OS or programs used that might hint at the likely encodings), this could be just about anything, so it's impossible to say anything more specific on that.
You actually have a couple of problems:
First, make sure you know the meaning of the term character encoding, and that you know the difference between characters and bytes. A popular post on the topic is The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Then, what encoding was used for the bytes you just received? You need to know this, otherwise you don't know what byte 234 means. For example it could be ISO-8859-1, in which case it is U+00EA, the character ê.
The characters 0 to 31 are control characters (e.g. 0 is NUL). Use a lookup table for these.
Then, displaying the characters on the terminal is the hard part. There is no platform-independent way to display ê on the terminal. It may well be impossible with the standard print function. If you can't figure this step out you can search for a question dealing specifically with how to print Unicode text from Lua.
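To illustrate the lookup-table idea end to end (sketched in Python rather than Lua, and assuming the bytes really are ISO-8859-1, as discussed above): decode the hex pairs, name the control characters, and pass everything else through.

# Hedged sketch: decode hex pairs, then show control characters by name.
# Assumes the bytes are ISO-8859-1 (Latin-1); adjust the encoding if not.
CONTROL_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS",  "HT",  "LF",  "VT",  "FF",  "CR",  "SO",  "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM",  "SUB", "ESC", "FS",  "GS",  "RS",  "US",
]

def describe_hex(hex_string, encoding="latin-1"):
    raw = bytes.fromhex(hex_string)
    parts = []
    for b in raw:
        if b < 32:
            parts.append(CONTROL_NAMES[b])              # control character -> its name
        else:
            parts.append(bytes([b]).decode(encoding))   # printable byte (DEL not handled)
    return " ".join(parts)

print(describe_hex("74657874000000EDD37001000300"))
# t e x t NUL NUL NUL í Ó p SOH NUL ETX NUL

The sample output matches the desired result given in the question.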

Determine Cobol coding style

I'm developing an application that parses COBOL programs. Some of these programs respect the traditional coding style (program text from column 8 to 72), and some are newer and don't follow this style.
In my application I need to determine the coding style in order to know if I should parse content after column 72.
I've been able to determine whether a program starts at column 1 or 8, but programs that start at column 1 can also follow the rule of comments after column 72.
So I'm trying to find rules that will allow me to determine if texts after column 72 are comments or valid code.
I've found some, but it's hard to tell if they will work every time:
a dot after column 72 marks the end of a sentence, but I fear a dot can appear in comments too
find the closing character of a statement after column 72: " ' ) }
look at the characters at columns 71, 72 and 73; if there is no space, find the whole word and check whether it's a keyword or a variable. Problem: it can be a variable from a COPY or a replacement, etc.
I'd like to know what you think of these rules and whether you have any ideas to help me determine the coding style of a COBOL program.
I don't need an API or anything like that, just solid rules that I will be able to rely on.
I think you need to know the COBOL compiler for each program. Its documentation should tell you what conventions/configurations/switches it uses to decide if the source code ends at column 72 or not.
So.... which compiler(s)?
And if you think the column 72 issue is a pain, wait till you get around to actually parsing the COBOL itself. If you are not well prepared to handle the lexical issues of the language, you are probably very badly prepared to handle the syntactic ones.
There is no absolutely reliable way to determine whether a COBOL program is in fixed or free format based only on the source code. Heck, it is sometimes difficult to identify the programming language based only on source code. Check out this classic polyglot - it is valid under 8 different language compilers. That said, you could try a few heuristics that might yield the correct answer more often than not.
Compiler directives embedded in source code
Watch for certain compiler directives that determine code format. Unfortunately, every compiler vendor uses their own flavour of directive. For example, Micro Focus COBOL uses the SOURCEFORMAT directive. This directive will appear near the top of the program, so a short pre-scan could be used to find it. On the other hand, OpenCOBOL uses >>SOURCE FORMAT IS FREE and >>SOURCE FORMAT IS FIXED to toggle between free and fixed format, so different parts of the same program could be formatted differently!
The bottom line here is that you will have to support the conventions of multiple COBOL compilers.
Compiler switches
Source code format can also be specified using a compiler switch. In this case, there are no concrete clues to go on. However, you can be reasonably sure that the entire source program will be either fixed or free. All you can do here is guess. Unless the programmer is out to "mess with your head" (and some will), a program in free format will have the keywords IDENTIFICATION DIVISION or ID DIVISION starting before column 8. Every COBOL program will begin with these keywords, so you can use them as the anchor point for determining code format in the absence of embedded compiler directives.
Warning - this is far from foolproof, but it might be a good start.
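As a rough sketch of those two heuristics (Python, purely illustrative; real directive spellings vary by vendor, as noted above):

import re

def guess_cobol_format(lines):
    """Guess 'free' or 'fixed' for a COBOL source file, using heuristics only:
    1. honour an embedded >>SOURCE FORMAT directive if one is found;
    2. otherwise, look at the column where IDENTIFICATION (or ID) DIVISION starts.
    Returns 'free', 'fixed', or None if neither heuristic fires."""
    directive = re.compile(r">>\s*SOURCE\s+FORMAT\s+(?:IS\s+)?(FREE|FIXED)", re.I)
    division = re.compile(r"\b(IDENTIFICATION|ID)\s+DIVISION\b", re.I)

    for line in lines:
        m = directive.search(line)
        if m:
            return m.group(1).lower()

    for line in lines:
        m = division.search(line)
        if m:
            # Columns are 1-based; fixed format puts Area A at column 8.
            column = m.start() + 1
            return "free" if column < 8 else "fixed"

    return None

As warned above, this can misfire; treat the result as a guess, not a verdict.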
There won't be an algorithm to do this with 100% certainty, because if comments can be anything, they can also be compilable COBOL code. So you could theoretically write a program that means one thing if the comments are ignored, and something else entirely if the comments are treated as part of the COBOL.
But that's extremely unlikely. What's most likely to happen is that if you try to compile the code under the wrong convention, it will simply fail. So the only accurate way to do this is to try compiling/parsing the program one way, and if you come to a line that can't make sense, switch to the other style. You could also support passing an argument to the compiler when the style is already known.
You can try using heuristics like what you've described, but that will never be totally accurate. The most they can give you is a probability that the code is one or the other style, which will increase as they examine more and more lines of code. They could be useful for helping you guess the style before you start compiling, or for figuring out when the problem is really just a typo in the code.
EDIT:
Regarding ideas for heuristics, it's hard to say. If there were a standard comment sigil like // or # in other languages, this would be a lot easier (actually, there is, but it sounds like your code doesn't follow this convention). The only thing I can think of would be to check whether every line (or maybe 99% of lines, and not counting empty lines or lines commented with *) has a period somewhere before position 72.
One thing you DON'T want to do is apply any heuristics to the part after position 72. That is, you don't want to be checking the comments to see if they're valid COBOL. You want to check what you know is COBOL first, and see if that works by itself. There are several reasons for this:
Comments written in English are likely to have periods and quotes in them, so your first and second bullet points are out.
Natural languages are WAY harder to parse than something like COBOL.
The comments could easily have COBOL in them (maybe someone commented out the previous version of the line).
An important rule for comments is that they should never affect what the program does. If changing the comments can change how the program is compiled, you violate that.
All that in mind, my opinion is that you shouldn't use heuristics at all. You should always try to compile the program under both conventions unless one is explicitly specified. There's a chance that code will compile successfully under both conventions, and then you'll have two different programs and no way to tell which one is correct.
If that happens, you need to compare the two results (perhaps with a hash or something) to see if they're the same program. If they're the same, great, but if not, you'll need to force the user to explicitly choose a convention.
Most COBOL compilers will allow you to generate and analyze the source after the text-manipulation phase.
The text preprocessor output can be seen (using OpenCOBOL for the example)
cobc -E program.cob
The text manipulation processor deals with any COPY ... REPLACING compiler directives, as well as converting SOURCE FORMAT IS FIXED (with line continuations, string literal concatenations, comment line removal, among other things) to the actual free format that the compiler's lexical analyzer needs. A lot of the OpenCOBOL toolkits (the cross-referencer and Animator, to name two) use source code AFTER the preprocessor pass. I don't think you'll lose any street cred if your parser program relies on post-processed source code files.

How do I compare Unicode strings containing non-English characters to sort alphabetically?

I am trying to sort arrays/lists/whatever of data based upon the Unicode string values in them, which contain non-English characters; I want them sorted correctly alphabetically.
I have written a lot of code (D2010, Win XP) which I thought was pretty solid for future internationalisation, but it is not. It's all using the UnicodeString (string) data type, into which up until now I have only been putting English characters.
It seems I have to own up to making a very serious Unicode mistake. I talked to my German friend and tried out some German ß's (ß is 'ss' and should come after S and before T in the alphabet) and ö's etc. (note the umlaut), and none of my sorting algorithms work anymore. The results are very mixed up. Garbage.
Since then I have been reading up extensively and have learnt a lot of unpleasant things about Unicode collation. Things are looking grim, much grimmer than I ever expected; I have seriously messed this up. I hope I am missing something and things are not actually as grim as they appear at present. I have been tinkering around with Windows API calls (RtlCompareUnicodeString) with no success (protection faults); I could not get it to work. The problem with API calls, I learnt, is that they change on the newer Windows platforms, and with Delphi going cross-platform soon and Linux later, and my app being client-server, I need to be concerned about this. But to be honest, with the situation being what it is (bad), I would be grateful for any forward progress, even Windows-API-specific.
Is using the Windows API function RtlCompareUnicodeString the obvious solution? If so I should really try again with it, but to be honest I have been taken aback by all of the issues involved in Unicode collation, and I am not at all clear what I should be doing to compare these strings this way anyway.
I learnt of the IBM ICU C++ open-source project; there is a Delphi wrapper for it, albeit for an older version of ICU. It seems a very comprehensive solution which is platform independent. Surely I am not looking at creating a Delphi wrapper for this (or updating the existing one) just to get a good solution for Unicode collation?
I would be extremely glad to hear advice at two levels :-
A) A Windows-specific, non-portable solution; I would be glad of that at the moment, forget the client-server ramifications!
B) A more portable solution which is immune to the various XP/Vista/Win7 variations of the Unicode API functions, therefore putting me in good stead for XE2 Mac support and future Linux support, not to mention the client-server complications.
Btw I don't really want to be doing 'make-do' solutions, scanning strings prior to comparison and replacing certain tricky characters, etc., which I have read about. I gave the German example above, but that's just an example; I want to get this working for all (or at least most: Far East, Russian) languages, not do workarounds for a specific language or two. I also do not need any advice on the sorting algorithms; they are fine, it's just the string comparison bit that's wrong.
I hope I am missing/doing something stupid, this all looks to be a headache.
Thank you.
EDIT, Rudy, here is how I was trying to call RtlCompareUnicodeString. Sorry for the delay; I have been having a horrible time with this.
program Project26;
{$APPTYPE CONSOLE}
uses
  SysUtils;
var
  a, b: ansistring;
  k, l: string;
  x, y: widestring;
  r: integer;

procedure RtlInitUnicodeString(
  DestinationString: pstring;
  SourceString: pwidechar) stdcall; external 'NTDLL';

function RtlCompareUnicodeString(
  String1: pstring;
  String2: pstring;
  CaseInSensitive: boolean
  ): integer stdcall; external 'NTDLL';

begin
  x := 'wef';
  y := 'fsd';
  RtlInitUnicodeString(@k, pwidechar(x));
  RtlInitUnicodeString(@l, pwidechar(y));
  r := RtlCompareUnicodeString(@k, @l, false);
  writeln(r);
  readln;
end.
I realise this is most likely wrong; I am not used to calling API functions directly, this is my best guess.
About your StringCompareEx API function: that looked really good, but it is available on Vista+ only, and I'm using XP. StringCompare is on XP, but that's not Unicode!
To recap, the basic task afoot, is to compare two strings, and to do so based on the character sort order specified in the current windows locale.
Can anyone say for sure whether AnsiCompareText should do this or not? It doesn't work for me, but others have said it should, and other things I have read suggest it should.
This is what I get with 31 test strings when using AnsiCompareText in a German locale (space delimited - no strings contain spaces):
arß Asß asß aßs no nö ö ön oo öö oöo öoö öp pö ss SS ßaß ßbß sß Sßa
Sßb ßß ssss SSSS ßßß ssßß SSßß ßz ßzß z zzz
EDIT 2.
I am still keen to hear whether I should expect AnsiCompareText to work using the locale info, as lkessler has said it should; lkessler has also posted about these subjects before and seems to have been through this before.
However, following on from Rudy's advice, I have also been checking out CompareStringW - which shares the same documentation with CompareString, so it is NOT non-Unicode as I stated earlier.
Even if AnsiCompareText is not going to work (although I think it should), the Win32 API function CompareStringW should indeed work. Now I have defined my API function, I can call it, and I get a result and no error... but I get the same result every time regardless of the input strings! It returns 1 every time - which means less than. Here's my code:
var
  k, l: string;

function CompareStringW(
  Locale: integer;
  dwCmpFlags: longword;
  lpString1: pstring;
  cchCount1: integer;
  lpString2: pstring;
  cchCount2: integer
  ): integer stdcall; external 'Kernel32.dll';

begin
  k := 'zzz';
  l := 'xxx';
  writeln(length(k));
  r := comparestringw(LOCALE_USER_DEFAULT, 0, @k, 3, @l, 3);
  writeln(r); // result is 1=less than, 2=equal, 3=greater than
  readln;
end;
I feel I am getting somewhere now after much pain. I would be glad to know about AnsiCompareText, and what I am doing wrong with the above CompareStringW API call. Thank you.
EDIT 3
Firstly, I fixed the API call to CompareStringW myself: I was passing in @mystring when I should have used PString(mystring). Now it all works correctly.
r:=comparestringw(LOCALE_USER_DEFAULT,0,pstring(k),-1,pstring(l),-1);
Now, you can imagine my dismay when I still got the same sort result as I did right at the beginning...
arß asß aßs Asß no nö ö ön oo öö oöo öoö öp pö ss SS ßaß ßbß sß Sßa
Sßb ßß ssss SSSS ßßß ssßß SSßß ßz ßzß z zzz
You may also imagine my EXTREME dismay, not to mention simultaneous joy, when I realised the sort order IS CORRECT, and IT WAS CORRECT RIGHT BACK AT THE BEGINNING! It makes me sick to say it, but there was never any problem in the first place - this is all down to my lack of German knowledge. I believed the sort was wrong, since you can see above that the strings start with S, then later they start with ß, then s again, and back to ß and so on. Well, I can't speak German, but I could still 'clearly' see that they were not sorted correctly - my German friend told me ß comes after S and before T... I WAS WRONG! What is happening is that the string functions (both AnsiCompareText and the WinAPI CompareStringW) are collating every 'ß' as 'ss' and every 'ö' as a plain 'o'... so if I take those results above and do that search and replace, I get...
arss asss asss Asss no no o on oo oo ooo ooo op po ss SS ssass ssbss
sss Sssa Sssb ssss ssss SSSS ssssss ssssss SSssss ssz sszss z zzz
Looks pretty correct to me! And it always was.
I am extremely grateful for all the advice given, and extremely sorry to have wasted your time like this. Those German ß's got me all confused; there was never anything wrong with the built-in Delphi functions or anything else. It just looked like there was. I made the mistake of combining them with a normal 's' in my test data; any other letter would not have created this illusion of un-sortedness! The squiggly ß's have made me look a fool! ßs!
Rudy and lkessler were both especially helpful, thank you both. I have to accept lkessler's answer as the most correct; sorry Rudy.
You said you had problems calling the Windows API yourself. Could you post the code, so people here can see why it failed? It is not as hard as it may seem, but it does require some care. ISTM that RtlCompareUnicodeString() is too low level.
I found a few solutions:
Non-portable
You could use the Windows API function CompareStringEx. This will compare using Unicode specific collation types. You can specify how you want this done (see link). It does require wide strings, i.e. PWideChar pointers to them. If you have problems calling it, give a holler and I'll try to add some demo code.
More or less portable
To make this more or less portable, you could write a function that compares two strings and use conditional defines to choose the different comparison APIs for the platform.
Try using CompareStr for case sensitive, or CompareText for case insensitive if you want your sorts exactly the same in any locale.
And use AnsiCompareStr for case sensitive, or AnsiCompareText for case insensitive if you want your sorts to be specific to the locale of the user.
See: How can I get TStringList to sort differently in Delphi for a lot more information on this.
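The same distinction - a code-point sort that is identical everywhere versus a locale-specific collation - can be illustrated outside Delphi. Here is a small Python sketch, where locale.strxfrm plays the role of the locale-aware comparison; the locale name de_DE.UTF-8 is an assumption and must be available on the machine.

import locale

words = ["Asß", "asß", "aßs", "oo", "ö", "ss", "SS", "ßz", "z"]

# Plain sort: code-point order, identical in every locale,
# but strings starting with 'ß' or 'ö' end up after 'z'.
print(sorted(words))

# Locale-aware sort: with a German locale, 'ß' collates like 'ss' and
# 'ö' sorts near 'o' - the behaviour the question eventually observed
# from AnsiCompareText / CompareStringW on Windows.
locale.setlocale(locale.LC_COLLATE, "de_DE.UTF-8")  # assumed locale name
print(sorted(words, key=locale.strxfrm))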
In Unicode, the numeric order of the characters is certainly not the sorting sequence. AnsiCompareText, as mentioned by HeartWare, does take locale specifics into consideration when comparing characters but, as you found out, does nothing with regard to the sorting order. What you are looking for is called the collation sequence of a language, which specifies the alphabetic sorting order for a language, taking diacritics etc. into consideration. Collation was sort of implied in the old ANSI code pages, though those didn't account for sorting differences between languages using the same character set either.
I checked the D2010 docs. Apart from some TIB* components I didn't find any links. C++Builder does seem to have a compare function that takes collation into account, but that's not much use in Delphi. There you will probably have to use some Windows API functions directly.
Docs:
Sorting "Collate" all out: http://www.siao2.com/2008/12/06/9181413.aspx
Collation terminology: http://msdn.microsoft.com/en-us/library/ms143726(SQL.90).aspx (though that pertains to MS SQL 2005, it may be helpful)
The 'Sorting "Collate" all out' article is by Michael Kaplan, someone who has great in-depth knowledge of all things Unicode and all intricacies of various languages. His blog has been invaluable to me when porting from D2006 to D2009.
Have you tried AnsiCompareText? Even though it is called "Ansi", I believe it calls on an OS-specific Unicode-capable comparison routine...
It should also make you safe from cross-platform dependencies (provided that Embarcadero supplies a compatible version on the various OSes they target).
I do not know how well the comparison works with the various strange Unicode ways to encode strings, but try it out and let us know the result...
