How to read 2 bytes in flex? - flex-lexer

I want to read utf-16 by 2 bytes in flex and use it.
I will use 2 byte data converted to multibyte.
I opened the file with "rb" and read it.
But I can't read the following.
4C AE C0 C9 20 00 00 B3 9C B0 ...
flex read like this.
4C AE C0 C9 20 00 00 B3 9C B0 ...
I tried to read with this code.
<string>[\x00-\xff][\x00-\xff]{
...
}
How to read 2 bytes at a time in flex?

I couldn't reproduce the problem using the hints in the question. It appears to work perfectly, processed with several different Flex versions (2.6.4, 2.5.39 and 2.5.4) in case you are using some antiquated version. I didn't try any non-default command-line options.
So it's impossible to say what you are doing wrong without seeing everything you are doing. Of course, we don't want to wade through your entire program; instead, we ask that you provide a minimal reproducible example (MRE). Note that a MRE is not your program, nor is it a few lines of your program. It's a complete, compilable program which shows the same problem. Writing a MRE and verifying that it shows the same issue is a good debugging technique; in many cases, the exercise will let you find the problem yourself.
That description might be a bit theoretical, so I'm including the test code I wrote. Had it produced the output you claim to receive, it would have served as an MRE, so it might serve as a starting point for your investigation:
%option noinput nounput noyywrap nodefault
%x U16STRING
%%
u["] { fputs("UTF-16LE string:", stdout);
BEGIN(U16STRING); }
.|\n ; /* Ignore everything else */
<U16STRING>{
\x22\x00 { putchar('\n'); BEGIN(INITIAL); }
[\x00-\xff]{2} { fputs(" <", stdout);
for (int i = 0; i < yyleng; ++i)
printf(" %02hhx", (unsigned char)yytext[i]);
fputs(" >", stdout);
}
[\x00-\xff] { printf(" %02x !!", yytext[0]); BEGIN(INITIAL); }
<<EOF>> { puts(" EOF"); return 0; }
}
That code identifies sequences which start with Ascii/UTF-8 u" and end with a UTF-16LE double quote; the sequences are expected to be UTF-16LE, although the code doesn't check for surrogates. All it does is dump the bytes in hex to demonstrate that the input is being correctly tokenised.
Here's a sample run, including the build commands (the Flex source is called b2.l):
$ flex --version
flex 2.6.4
$ flex -o b2.c b2.l
$ gcc -o b2 -Wall b2.c -lfl
$ printf 'u"\x4c\xae\xc0\xc9\x20\x00\x00\xb3\x9c\xb0\x22\x00' | ./b2
UTF-16LE string: < 4c ae > < c0 c9 > < 20 00 > < 00 b3 > < 9c b0 >
$

Related

Repetitive regular expression in Lua

I need to find a pattern of 6 pairs of hexadecimal numbers (without 0x), eg.
"00 5a 4f 23 aa 89"
This pattern works for me, but the question is if there any way to simplify it?
[%da-f][%da-f]%s[%da-f][%da-f]%s[%da-f][%da-f]%s[%da-f][%da-f]%s[%da-f][%da-f]%s[%da-f][%da-f]
Lua patterns do not support limiting quantifiers and many more features that regular expressions support (hence, Lua patterns are not even regular expressions).
You can build the pattern dynamically since you know how many times you need to repeat a part of a pattern:
local text = '00 5a 4f 23 aa 89'
local answer = text:match('[%da-f][%da-f]'..('%s[%da-f][%da-f]'):rep(5) )
print (answer)
-- => 00 5a 4f 23 aa 89
See the Lua demo.
The '[%da-f][%da-f]'..('%s[%da-f][%da-f]'):rep(5) can be further shortened with %x hex char shorthand:
'%x%x'..('%s%x%x'):rep(5)
Lua supports %x for hexadecimal digits, so you can replace all every [%da-f] with %x:
%x%x%s%x%x%s%x%x%s%x%x%s%x%x%s%x%x
Lua doesn't support specific quantifiers {n}. If it did, you could make it quite a lot shorter.
Also you can use a "One or more" with the Plus-Sign to shorten up...
print(('Your MAC is: 00 5a 4f 23 aa 89'):match('%x+%s%x+%s%x+%s%x+%s%x+%s%x+'))
-- Tested in Lua 5.1 up to 5.4
It is described under "Pattern Item:" in...
https://www.lua.org/manual/5.4/manual.html#6.4.1
final solution:
local text = '00 5a 4f 23 aa 89'
local pattern = '%x%x'..('%s%x%x'):rep(5)
local answer = text:match(pattern)
print (answer)

How do I calculate the FCS field in an Ethernet Frame?

I see a few implementations, but I decided to look at exactly how the specification calls out the FCS for encoding.
So say my input is as follows:
dst: 0xAA AA AA AA AA AA
src: 0x55 55 55 55 55 55
len: 0x00 04
msg: 0xDE AD BE EF
concatenating this in the order that seems to be specified in the format (and the order expressed in the spec later on) seems to indicate my input is:
M(x) = 0xAA AA AA AA AA AA 55 55 55 55 55 55 00 04 DE AD BE EF
a) "The first 32 bits of the frame are complemented."
complemented first 32 MSB of M(x): 0x55 55 55 55 AA AA 55 55 55 55 55 55 00 04 DE AD BE EF
b) "The n bits of the protected fields are then considered to be the coefficients of a polynomial M(x) of
degree n ā€“ 1. (The first bit of the Destination Address field corresponds to the x(nā€“1) term and the last
bit of the MAC Client Data field (or Pad field if present) corresponds to the x0 term.)"
I did this in previous. see M(x)
c) "M(x) is multiplied by x^32 and divided by G(x), producing a remainder R(x) of degree <=31."
Some options online seem to ignore the 33rd bit to represent x^32. I am going to ignore those simplified shortcuts for this exercise since it doesn't seem to specify that in the spec.
It says to multiply M(x) by x^32. So this is just padding with 32 zeroes on the LSBs. (i.e. if m(x) = 1x^3 + 1, then m(x) * x^2 = 1x^5 + 1x^2 + 0)
padded: 0x55 55 55 55 AA AA 55 55 55 55 55 55 00 04 DE AD BE EF 00 00 00 00
Next step is to divide. I am dividing the whole M(x) / G(x). Can you use XOR shifting directly? I see some binary division examples have the dividened as 101 and the divisior as 110 and the remainder is 11. Other examples explain that by converting to decimal, you cannot divide. Which one is it for terms for this standard?
My remainder result is for option 1 (using XOR without carry bit consideration, shifting, no padding) was:
0x15 30 B0 FE
d) "The coefficients of R(x) are considered to be a 32-bit sequence."
e) "The bit sequence is complemented and the result is the CRC."
CRC = 0xEA CF 4F 01
so my entire Ethernet Frame should be:
0xAA AA AA AA AA AA 55 55 55 55 55 55 00 04 DE AD BE EF EA CF 4F 01
In which my dst address is its original value.
When I check my work with an online CRC32 BZIP2 calculator, I see this result: 0xCACF4F01
Is there another option or online tool to calculate the Ethernet FCS field? (not just one of many CRC32 calculators)
What steps am I missing? Should I have padded the M(x)? Should I have complemented the 32 LSBs instead?
Update
There was an error in my CRC output in my software. It was a minor issue with copying a vector. My latest result for CRC is (before post-complement) 35 30 B0 FE.
The post-complement is: CA CF 4F 01 (matching most online CRC32 BZIP2 versions).
So my ethernet according to my programming is currently:
0xAA AA AA AA AA AA 55 55 55 55 55 55 00 04 DE AD BE EF CA CF 4F 01
The CRC you need is commonly available in zlib and other libraries as the standard PKZip CRC-32. It is stored in the message in little-endian order. So your frame with the CRC would be:
AA AA AA AA AA AA 55 55 55 55 55 55 00 04 DE AD BE EF B0 5C 5D 85
Here is an online calculator, where the first result listed is the usual CRC-32, 0x855D5CB0.
Here is simple example code in C for calculating that CRC (calling with NULL for mem gives the initial CRC):
unsigned long crc32iso_hdlc(unsigned long crc, void const *mem, size_t len) {
unsigned char const *data = mem;
if (data == NULL)
return 0;
crc ^= 0xffffffff;
while (len--) {
crc ^= *data++;
for (unsigned k = 0; k < 8; k++)
crc = crc & 1 ? (crc >> 1) ^ 0xedb88320 : crc >> 1;
}
return crc ^ 0xffffffff;
}
The 0xedb88320 constant is 0x04c11db7 reflected.
The actual code used in libraries is more complex and faster.
Here is the calculation of that same CRC (in Mathematica), using the approach described in the IEEE 802.3 document with polynomials, so you can see the correct resulting powers of x used for the remainder calculation:
If you click on the image, it will embiggen to make it easier to read the powers.
The confusing factor here is the 802.3 spec. It mentions that first bit is LSB (least significant bit == bit 0) only in one place, in section 3.2.3-b, and it mentions that for CRC, "the first bit of the Destination Address field corresponds to the x^(n-1) term", so each byte input to the CRC calculation is bit reflected.
Using this online calculator:
http://www.sunshine2k.de/coding/javascript/crc/crc_js.html
Select CRC-32 | CRC32, click on custom, input reflected on, result reflected off. With this data:
AA AA AA AA AA AA 55 55 55 55 55 55 00 04 DE AD BE EF
per the spec, the calculated CRC is 0x0D3ABAA1, stored and transmitted as shown:
bit 0 first | bit 7 first
AA AA AA AA AA AA 55 55 55 55 55 55 00 04 DE AD BE EF | 0D 3A BA A1
To simplify the output to always transmit bit 0 first, bit reflect the CRC bytes:
bit 0 first | bit 0 first
AA AA AA AA AA AA 55 55 55 55 55 55 00 04 DE AD BE EF | B0 5C 5D 85
Note that the bit 0 always first method results in transmitted bits identical to the spec.
Change result setting for the CRC calculator, input reflected on, result reflected on. With this data:
AA AA AA AA AA AA 55 55 55 55 55 55 00 04 DE AD BE EF
the calculated CRC is 0x855D5CB0, stored least significant byte first and transmitted as shown:
bit 0 first | bit 0 first
AA AA AA AA AA AA 55 55 55 55 55 55 00 04 DE AD BE EF | B0 5C 5D 85
For verifying received data, rather than compare a calculated CRC of received data versus the received CRC, the process can calculate a CRC on received data and CRC. Assuming the alternative setup where all bytes are received bit 0 first, then with this received frame or any frame without error
bit 0 first | bit 0 first
AA AA AA AA AA AA 55 55 55 55 55 55 00 04 DE AD BE EF | B0 5C 5D 85
the calculated CRC will always be 0x2144DF1C. In the case of a hardware implementation, the post complement of the CRC is usually performed one bit at a time as bits are shifted out, outside of the logic used to calculate the CRC, and in this case, after receiving a frame without error, the CRC register will always contain 0xDEBB20E3 (0x2144DF1C ^ 0xFFFFFFFF). So verification is done by computing CRC on a received frame and comparing the CRC to a 32 bit constant (0x2144DF1C or 0xDEBB20E3).

Linking objects compiled with gcc, containing SSE instructions, to a Delphi program

Is there a way to link objects that have been compiled with GCC without disabling SSE instructions, statically to a Delphi program (Windows x86_64)?
Embarcadero's Delphi compiler has supported COFF object format at least since Delphi XE2. However, there seems to be a serious limitation: If SSE instructions aren't disabled with gcc's -mno-sse2, there's a good chance to get access violations. More specifically, gcc places constants in .rdata section that is ignored by Delphi. (Tested with XE5, XE7, 10.1, 10.2, 10.3)
With GCC 8.2 the thing got even more difficult: Until now, it was enough to disable SSE2.
The following code, when compiled with gcc 8.2.1, results in SSE instructions even if -mno-sse2 is given as an option:
unsigned *pIndexTable;
...
// Initialize index values
for (int i = 0; i < count; i++) {
*pIndexTable = i;
pIndexTable++;
}
Objdump output:
*pIndexTable = i;
a56: 0f 28 05 00 00 00 00 movaps xmm0,XMMWORD PTR [rip+0x0] # a5d <GenerateTable+0x79d> a59: R_X86_64_PC32 .rdata
a5d: c1 e8 02 shr eax,0x2
a60: 0f 11 04 9d 00 00 00 00 movups XMMWORD PTR [rbx*4+0x0],xmm0
If SSE is totally disabled with -mno-sse, things get even more broken: SSE is a part of x86_64 ABI.

What is this most likely if not Lua bytecode?

I considered posting this in reverse engineering but because of the brevity of the question and general irrelevance I decided to post it here.
This may be a really easy question but I haven't been able to find an answer - I should probably read a bit of Lua's source before asking this, but here goes: in a program that has integrated Lua, this is the first few bytes of the buffer being executed:
11 16 A5 F1 9E A8 8B 64 78 8E 2F EA 1C 31 D3 B6 D3 D5 77 23 77 79 1B 73
I've never understood Lua very well, but that doesn't look like byte code. Is there anything else it could be or is it just certainly something custom? I'm pretty sure actual Lua opcodes haven't been modified.
if possible put whole file somewhere. Lua bytecode usualy starts with 1B 4C 75 61 and then some debug infos however there are zeros for spliting of the informations which your sample doesnt have
other way is asking on http://forum.xentax.com/
Put this into a file and try running luac -l filename on it. This will disassemble the binary to the VM instructions.
If it is Lua code you'll get some meaningful output.

iOS Base64 Lib that prevents CRLF

I'm having troubles decoding/encoding a base64 string because of the CRLF on it.
I've tried this lib Base64.h and this one NSData+Base64.h but both do not handle well the CRLF.
Anyone had this problem before?
Anyone has an advice on how to avoid these CRLF? I think Android's Java lib is replacing this with a '0', am I correct?
public static final int CRLF = 4;
Base64 encodes 64 characters, namely 'A-Za-z0-9+/' with a possible trailing '=' to indicate a non mod 3 length. CR+LF may be used as a line separator, generally decode each line separately.
See Wikipedia Base64 for more information on CR+LF variants.
"+vqbiP7s3oe7/puJ8v2a3fOYnf3vmpap"
decoded is:
"FA FA 9B 88 FE EC DE 87 BB FE 9B 89 F2 FD 9A DD F3 98 9D FD EF 9A 96 A9"
The last character is not 0.

Resources