Addressing memory data in 32 bit protected mode with nasm - memory

So my book says i can define a table of words like so:
table: dw "13,37,99,99"
and that i can snatch values from the table by incrementing the index into the address of the table like so:
mov ax, [table+2] ; should give me 37
but instead it places 0x2c33 in ax rather than 0x3337
is this because of a difference in system architecture? maybe because the book is for 386 and i'm running 686?

0x2C is a comma , and 0x33 is the character 3, and they appear at positions 2 and 3 in your string, as expected. (I'm a little confused as to what you were expecting, since you first say "should give me 37" and later say "rather than 0x3337".)

You have defined a string constant when I suspect that you didn't mean to. The following:
dw "13,37,99,99"
Will produce the following output:
Offset 00 01 02 03 04 05 06 07 08 09 0A 0B
31 33 2C 33 37 2C 39 39 2C 39 39 00
Why? Because:
31 is the ASCII code for '1'
33 is the ASCII code for '3'
2C is the ASCII code for ','
...
39 is the ASCII code for '9'
NASM also null-terminates your string by putting 0 byte at the end (If you don't want your strings to be null-terminated use single quotes instead, '13,37,99,99')
Take into account that ax holds two bytes and it should be fairly clear why ax contains 0x2C33.
I suspect what you wanted was more along the lines of this (no quotes and we use db to indicate we are declaring byte-sized data instead of dw that declares word-sized data):
db 13,37,99,99
This would still give you 0x6363 (ax holds two bytes / conversion of 99, 99 to hex). Not sure where you got 0x3337 from.
I recommend that you install yourself a hex editor and have an experiment inspecting the output from NASM.

Related

What is this unusual text being used with loadstring() in Lua?

I have some Lua code which appears to be an attempt to secure the code by obscurity. My understanding of the loadstring() function is a text string is composed of Lua source code text and then converted to executable Lua code by the loadstring() method.
With the following Lua source, I tried to read the contents of the variable code by invoking print on the variable code; while I did see some valid source text in the converted string, a majority of the characters were not displayed (I assume ones with character codes below 40 and above 176). Note that there are some particularly high values in there for ASCII, e.g. 231 is obviously in the extended set, being the trademark sign. Additionally, there are several null characters in there. All this makes me doubt if it is indeed ASCII.
Could someone please tell me if the string is valid Lua source, and how to be able to get Lua to return the string as printable characters so that I can see what this code does?
When I run my version with print in the Lua console on Windows I get many empty boxes, presumably the console can only print pure ASCII?
Note that the code is executed using Lua version 5.0.2
code='\27\76\117\97\80\1\4\4\4\6\8\9\9\8\182\9\147\104\231\245\125\65\12\0\0\0\64\108\117\97\101\109\103\46\108\117\97\0\1\0\0\0\0\0\0\5\23\0\0\0\8\0\0\0\16\0\0\0\17\0\0\0\17\0\0\0\17\0\0\0\17\0\0\0\17\0\0\0\18\0\0\0\18\0\0\0\19\0\0\0\20\0\0\0\21\0\0\0\35\0\0\0\35\0\0\0\26\0\0\0\49\0\0\0\49\0\0\0\37\0\0\0\59\0\0\0\59\0\0\0\54\0\0\0\61\0\0\0\66\0\0\0\2\0\0\0\4\0\0\0\104\52\120\0\1\0\0\0\22\0\0\0\7\0\0\0\77\111\100\117\108\101\0\12\0\0\0\22\0\0\0\0\0\0\0\12\0\0\0\4\13\0\0\0\122\122\97\78\111\100\101\78\97\109\101\115\0\4\6\0\0\0\90\90\65\48\49\0\4\6\0\0\0\90\90\65\48\50\0\4\14\0\0\0\122\122\97\84\101\120\116\90\101\105\108\101\110\0\4\12\0\0\0\122\122\97\80\111\115\105\116\105\111\110\0\3\0\0\0\0\0\0\240\63\4\8\0\0\0\122\122\97\84\101\120\116\0\4\1\0\0\0\0\4\20\0\0\0\122\122\97\67\117\114\114\101\110\116\84\101\120\116\86\97\108\117\101\0\4\9\0\0\0\122\122\97\83\101\116\117\112\0\4\10\0\0\0\122\122\97\83\101\108\101\99\116\0\4\9\0\0\0\122\122\97\82\101\115\101\116\0\4\0\0\0\0\0\0\0\2\0\0\0\0\1\0\7\14\0\0\0\3\0\0\0\3\0\0\0\4\0\0\0\4\0\0\0\4\0\0\0\5\0\0\0\5\0\0\0\5\0\0\0\5\0\0\0\4\0\0\0\5\0\0\0\7\0\0\0\7\0\0\0\8\0\0\0\4\0\0\0\7\0\0\0\115\116\114\116\98\108\0\0\0\0\0\13\0\0\0\16\0\0\0\40\102\111\114\32\103\101\110\101\114\97\116\111\114\41\0\5\0\0\0\11\0\0\0\12\0\0\0\40\102\111\114\32\115\116\97\116\101\41\0\5\0\0\0\11\0\0\0\2\0\0\0\118\0\5\0\0\0\11\0\0\0\0\0\0\0\2\0\0\0\4\7\0\0\0\98\117\102\102\101\114\0\4\1\0\0\0\0\0\0\0\0\14\0\0\0\65\0\0\1\7\0\0\1\0\0\0\1\3\128\1\2\222\0\128\1\5\0\0\4\198\0\0\5\83\1\2\4\7\0\0\4\29\0\0\1\84\254\127\0\5\0\0\1\27\0\1\1\27\128\0\0\0\0\0\0\26\0\0\0\1\1\0\4\18\0\0\0\27\0\0\0\28\0\0\0\28\0\0\0\29\0\0\0\29\0\0\0\30\0\0\0\32\0\0\0\32\0\0\0\32\0\0\0\32\0\0\0\32\0\0\0\33\0\0\0\33\0\0\0\33\0\0\0\33\0\0\0\33\0\0\0\27\0\0\0\35\0\0\0\2\0\0\0\8\0\0\0\122\122\97\70\105\108\101\0\0\0\0\0\17\0\0\0\6\0\0\0\122\101\105\108\101\0\3\0\0\0\16\0\0\0\1\0\0\0\7\0\0\0\77\111\100\117\108\101\0\5\0\0\0\4\5\0\0\0\114\101\97\100\0\0\4\14\0\0\0\122\122\97\84\101\120\116\90\101\105\108\101\110\0\4\12\0\0\0\122\122\97\80\111\115\105\116\105\111\110\0\3\0\0\0\0\0\0\240\63\0\0\0\0\18\0\0\0\148\3\128\0\139\62\0\1\153\0\1\1\85\128\125\0\20\0\128\0\148\2\128\0\4\0\0\2\6\63\1\2\4\0\0\3\70\191\1\3\73\128\1\2\4\0\0\2\4\0\0\3\70\191\1\3\140\191\1\3\201\128\126\2\212\251\127\0\27\128\0\0\0\0\0\0\37\0\0\0\1\2\0\7\21\0\0\0\39\0\0\0\39\0\0\0\39\0\0\0\39\0\0\0\39\0\0\0\40\0\0\0\40\0\0\0\40\0\0\0\40\0\0\0\43\0\0\0\46\0\0\0\46\0\0\0\46\0\0\0\46\0\0\0\46\0\0\0\46\0\0\0\46\0\0\0\46\0\0\0\46\0\0\0\46\0\0\0\49\0\0\0\3\0\0\0\6\0\0\0\118\97\108\117\101\0\0\0\0\0\20\0\0\0\9\0\0\0\110\111\100\101\78\97\109\101\0\0\0\0\0\20\0\0\0\20\0\0\0\122\122\97\83\101\108\101\99\116\101\100\80\111\115\105\116\105\111\110\0\10\0\0\0\20\0\0\0\1\0\0\0\7\0\0\0\77\111\100\117\108\101\0\7\0\0\0\4\8\0\0\0\122\122\97\84\101\120\116\0\4\14\0\0\0\122\122\97\84\101\120\116\90\101\105\108\101\110\0\4\20\0\0\0\122\122\97\67\117\114\114\101\110\116\84\101\120\116\86\97\108\117\101\0\4\5\0\0\0\67\97\108\108\0\4\5\0\0\0\90\90\65\48\0\4\14\0\0\0\58\65\99\116\105\118\97\116\101\78\111\100\101\0\3\0\0\0\0\0\0\240\63\0\0\0\0\21\0\0\0\4\0\0\2\4\0\0\3\198\190\1\3\6\128\1\3\201\0\125\2\4\0\0\2\4\0\0\3\134\190\1\3\201\0\126\2\0\0\0\2\197\0\0\3\1\1\0\4\0\128\0\5\65\1\0\6\147\1\2\4\1\1\0\5\0\0\1\6\147\129\2\5\129\1\0\6\89\0\2\3\27\128\0\0\0\0\0\0\54\0\0\0\1\0\0\4\19\0\0\0\56\0\0\0\56\0\0\0\56\0\0\0\56\0\0\0\56\0\0\0\56\0\0\0\56\0\0\0\56\0\0\0\56\0\0\0\57\0\0\0\57\0\0\0\57\0\0\0\57\0\0\0\57\0\0\0\57\0\0\0\57\0\0\0\57\0\0\0\57\0\0\0\59\0\0\0\0\0\0\0\1\0\0\0\7\0\0\0\77\111\100\117\108\101\0\7\0\0\0\4\5\0\0\0\67\97\108\108\0\4\13\0\0\0\122\122\97\78\111\100\101\78\97\109\101\115\0\3\0\0\0\0\0\0\240\63\4\14\0\0\0\58\65\99\116\105\118\97\116\101\78\111\100\101\0\4\4\0\0\0\97\108\108\0\3\0\0\0\0\0\0\0\0\3\0\0\0\0\0\0\0\64\0\0\0\0\19\0\0\0\5\0\0\0\4\0\0\1\198\190\0\1\6\191\0\1\193\0\0\2\147\128\0\1\1\1\0\2\65\1\0\3\89\0\2\0\5\0\0\0\4\0\0\1\198\190\0\1\6\192\0\1\193\0\0\2\147\128\0\1\1\1\0\2\65\1\0\3\89\0\2\0\27\128\0\0\23\0\0\0\34\0\0\0\202\0\0\1\10\0\1\2\65\0\0\3\129\0\0\4\95\0\0\2\137\0\125\1\10\0\0\2\137\128\126\1\201\63\127\1\73\64\128\1\73\64\129\1\98\0\0\2\0\128\0\0\137\128\129\1\162\0\0\2\0\128\0\0\137\0\130\1\226\0\0\2\0\128\0\0\137\128\130\1\27\0\1\1\27\128\0\0';
return loadstring(code)();
This string is valid chunk of Lua code precompiled into bytecode. Header say it's for Lua 5.0. It's not a text, it doesn't need decoding, so can be run directly with loadstring()
To provide a few more details than Vlad's answer for anyone who may come across this posting.
The Lua loadstring() function accepts a string of characters that are either Lua source text or Lua bytecode. It appears that the function determines type of the text by looking at the first character of the string to see if it is an escape character (0x1b or decimal 27) or not.
The loadstring() function returns an anonymous function so in the code sample:
code='\27\76\117\97\80\1\4\4\4\6\8\9\9\8\182\9\147\104\231\245\125\65\12\0\0\0\64\108\117\97\101\109\103\46\108\117\97\0\1\0\0\0\0\0\0\5\23\0\0\0\8\0\0\0\16\0\0\0\17\0\0\0\17\0\0\0\17\0\0\0\17\0\0\0\17\0\0\0\18\0\0\0\18\0\0\0\19\0\0\0\20\0\0\0\21\0\0\0\35\0\0\0\35\0\0\0\26\0\0\0\49\0\0\0\49\0\0\0\37\0\0\0\59\0\0\0\59\0\0\0\54\0\0\0\61\0\0\0\66\0\0\0\2\0\0\0\4\0\0\0\104\52\120\0\1\0\0\0\22\0\0\0\7\0\0\0\77\111\100\117\108\101\0\12\0\0\0\22\0\0\0\0\0\0\0\12\0\0\0\4\13\0\0\0\122\122\97\78\111\100\101\78\97\109\101\115\0\4\6\0\0\0\90\90\65\48\49\0\4\6\0\0\0\90\90\65\48\50\0\4\14\0\0\0\122\122\97\84\101\120\116\90\101\105\108\101\110\0\4\12\0\0\0\122\122\97\80\111\115\105\116\105\111\110\0\3\0\0\0\0\0\0\240\63\4\8\0\0\0\122\122\97\84\101\120\116\0\4\1\0\0\0\0\4\20\0\0\0\122\122\97\67\117\114\114\101\110\116\84\101\120\116\86\97\108\117\101\0\4\9\0\0\0\122\122\97\83\101\116\117\112\0\4\10\0\0\0\122\122\97\83\101\108\101\99\116\0\4\9\0\0\0\122\122\97\82\101\115\101\116\0\4\0\0\0\0\0\0\0\2\0\0\0\0\1\0\7\14\0\0\0\3\0\0\0\3\0\0\0\4\0\0\0\4\0\0\0\4\0\0\0\5\0\0\0\5\0\0\0\5\0\0\0\5\0\0\0\4\0\0\0\5\0\0\0\7\0\0\0\7\0\0\0\8\0\0\0\4\0\0\0\7\0\0\0\115\116\114\116\98\108\0\0\0\0\0\13\0\0\0\16\0\0\0\40\102\111\114\32\103\101\110\101\114\97\116\111\114\41\0\5\0\0\0\11\0\0\0\12\0\0\0\40\102\111\114\32\115\116\97\116\101\41\0\5\0\0\0\11\0\0\0\2\0\0\0\118\0\5\0\0\0\11\0\0\0\0\0\0\0\2\0\0\0\4\7\0\0\0\98\117\102\102\101\114\0\4\1\0\0\0\0\0\0\0\0\14\0\0\0\65\0\0\1\7\0\0\1\0\0\0\1\3\128\1\2\222\0\128\1\5\0\0\4\198\0\0\5\83\1\2\4\7\0\0\4\29\0\0\1\84\254\127\0\5\0\0\1\27\0\1\1\27\128\0\0\0\0\0\0\26\0\0\0\1\1\0\4\18\0\0\0\27\0\0\0\28\0\0\0\28\0\0\0\29\0\0\0\29\0\0\0\30\0\0\0\32\0\0\0\32\0\0\0\32\0\0\0\32\0\0\0\32\0\0\0\33\0\0\0\33\0\0\0\33\0\0\0\33\0\0\0\33\0\0\0\27\0\0\0\35\0\0\0\2\0\0\0\8\0\0\0\122\122\97\70\105\108\101\0\0\0\0\0\17\0\0\0\6\0\0\0\122\101\105\108\101\0\3\0\0\0\16\0\0\0\1\0\0\0\7\0\0\0\77\111\100\117\108\101\0\5\0\0\0\4\5\0\0\0\114\101\97\100\0\0\4\14\0\0\0\122\122\97\84\101\120\116\90\101\105\108\101\110\0\4\12\0\0\0\122\122\97\80\111\115\105\116\105\111\110\0\3\0\0\0\0\0\0\240\63\0\0\0\0\18\0\0\0\148\3\128\0\139\62\0\1\153\0\1\1\85\128\125\0\20\0\128\0\148\2\128\0\4\0\0\2\6\63\1\2\4\0\0\3\70\191\1\3\73\128\1\2\4\0\0\2\4\0\0\3\70\191\1\3\140\191\1\3\201\128\126\2\212\251\127\0\27\128\0\0\0\0\0\0\37\0\0\0\1\2\0\7\21\0\0\0\39\0\0\0\39\0\0\0\39\0\0\0\39\0\0\0\39\0\0\0\40\0\0\0\40\0\0\0\40\0\0\0\40\0\0\0\43\0\0\0\46\0\0\0\46\0\0\0\46\0\0\0\46\0\0\0\46\0\0\0\46\0\0\0\46\0\0\0\46\0\0\0\46\0\0\0\46\0\0\0\49\0\0\0\3\0\0\0\6\0\0\0\118\97\108\117\101\0\0\0\0\0\20\0\0\0\9\0\0\0\110\111\100\101\78\97\109\101\0\0\0\0\0\20\0\0\0\20\0\0\0\122\122\97\83\101\108\101\99\116\101\100\80\111\115\105\116\105\111\110\0\10\0\0\0\20\0\0\0\1\0\0\0\7\0\0\0\77\111\100\117\108\101\0\7\0\0\0\4\8\0\0\0\122\122\97\84\101\120\116\0\4\14\0\0\0\122\122\97\84\101\120\116\90\101\105\108\101\110\0\4\20\0\0\0\122\122\97\67\117\114\114\101\110\116\84\101\120\116\86\97\108\117\101\0\4\5\0\0\0\67\97\108\108\0\4\5\0\0\0\90\90\65\48\0\4\14\0\0\0\58\65\99\116\105\118\97\116\101\78\111\100\101\0\3\0\0\0\0\0\0\240\63\0\0\0\0\21\0\0\0\4\0\0\2\4\0\0\3\198\190\1\3\6\128\1\3\201\0\125\2\4\0\0\2\4\0\0\3\134\190\1\3\201\0\126\2\0\0\0\2\197\0\0\3\1\1\0\4\0\128\0\5\65\1\0\6\147\1\2\4\1\1\0\5\0\0\1\6\147\129\2\5\129\1\0\6\89\0\2\3\27\128\0\0\0\0\0\0\54\0\0\0\1\0\0\4\19\0\0\0\56\0\0\0\56\0\0\0\56\0\0\0\56\0\0\0\56\0\0\0\56\0\0\0\56\0\0\0\56\0\0\0\56\0\0\0\57\0\0\0\57\0\0\0\57\0\0\0\57\0\0\0\57\0\0\0\57\0\0\0\57\0\0\0\57\0\0\0\57\0\0\0\59\0\0\0\0\0\0\0\1\0\0\0\7\0\0\0\77\111\100\117\108\101\0\7\0\0\0\4\5\0\0\0\67\97\108\108\0\4\13\0\0\0\122\122\97\78\111\100\101\78\97\109\101\115\0\3\0\0\0\0\0\0\240\63\4\14\0\0\0\58\65\99\116\105\118\97\116\101\78\111\100\101\0\4\4\0\0\0\97\108\108\0\3\0\0\0\0\0\0\0\0\3\0\0\0\0\0\0\0\64\0\0\0\0\19\0\0\0\5\0\0\0\4\0\0\1\198\190\0\1\6\191\0\1\193\0\0\2\147\128\0\1\1\1\0\2\65\1\0\3\89\0\2\0\5\0\0\0\4\0\0\1\198\190\0\1\6\192\0\1\193\0\0\2\147\128\0\1\1\1\0\2\65\1\0\3\89\0\2\0\27\128\0\0\23\0\0\0\34\0\0\0\202\0\0\1\10\0\1\2\65\0\0\3\129\0\0\4\95\0\0\2\137\0\125\1\10\0\0\2\137\128\126\1\201\63\127\1\73\64\128\1\73\64\129\1\98\0\0\2\0\128\0\0\137\128\129\1\162\0\0\2\0\128\0\0\137\0\130\1\226\0\0\2\0\128\0\0\137\128\130\1\27\0\1\1\27\128\0\0';
return loadstring(code)();
you have a text string that contains Lua bytecode, as indicated by the leading escape character of \27, and then a call to loadstring() to create a function which is then executed.
The first few characters of the text string contain the precompiled Lua header (see Lua 5.2 Bytecode and Virtual Machine). The length of this header varies depending on the version of Lua. However the first few characters seem to be fairly standard. code='\27\76\117\97\80 ... contains the escape character (0x1b or decimal 27), the capital letter L (decimal 76), the lower case letter u (decimal 117), the lower case letter a (decimal 97), and the Lua version (decimal 80 is 0x50 indicating version 5.0).
The following example is from Lua 5.2 Bytecode and Virtual Machine.
What exactly is in the bytecode? Here is the hexdump of hello.luac
(made by hd on my system).
00000000 1b 4c 75 61 52 00 01 04 04 04 08 00 19 93 0d 0a |.LuaR...........|
00000010 1a 0a 00 00 00 00 00 00 00 00 00 01 04 07 00 00 |................|
00000020 00 01 00 00 00 46 40 40 00 80 00 00 00 c1 80 00 |.....F##........|
00000030 00 96 c0 00 01 5d 40 00 01 1f 00 80 00 03 00 00 |.....]#.........|
00000040 00 04 06 00 00 00 48 65 6c 6c 6f 00 04 06 00 00 |......Hello.....|
The format is not officially documented, and needs to be
reverse-engineered. The necessary material is in the Lua source code,
of course, in several places, mainly ldump.c and lundump.c. I have
also cross-checked with NFI and LAT, but any remaining errors are
mine.
The code starts with an 18-byte file header, which is the same for all
official Lua 5.2 bytecode compiled on a machine like yours, whether by
luac or load or loadfile. Lua 5.1 only had a 12-byte header, similar
to the first 12 bytes of this one.
Byte numbers are in origin-1 decimal (mostly showing the arithmetic)
and origin-0 hex.
1 x00: 1b 4c 75 61 LUA_SIGNATURE from lua.h.
5 x04: 52 00
Binary-coded decimal 52 for the Lua version, 00 to say the bytecode is
compatible with the "official" PUC-Rio implementation.
5+2 x06: 01 04
04 04 08 00 Six system parameters. On x386 machines they mean:
little-endian, 4-byte integers, 4-byte VM instructions, 4-byte size_t
numbers, 8-byte Lua numbers, floating-point. These parameters must all
match up between the bytecode file and the Lua interpreter, otherwise
the bytecode is invalid.
7+6 x0c: 19 93 0d 0a 1a 0a
Present in all
bytecode produced by Lua 5.2 from PUC-Rio. Described in lundump.h as
"data to catch conversion errors". Might be constructed from
binary-coded decimal 1993 (the year it all started), Windows line
terminator, MS-DOS text file terminator, Unix line terminator.
After these 18 bytes come the functions defined in the file. Each function
starts with an 11-byte function header.
13+6 x12: 00 00 00 00 Line number in source code where chunk starts.
0 for the main chunk.
19+4 x16: 00 00 00 00 Line number in source
code where chunk stops. 0 for the main chunk.
23+4 x1a: 00 01 04
Number of parameters, vararg flag, number of registers used by this
function (not more than 255, obviously). Local variables are stored in
registers; there may not be more than 200 of them (see lparser.c).

The DHT in JPEG does not contain the actual huffman code, how come?

The DHT contains 16 bytes that just contains count of how many values were encoded with huffman code of each length from 1 bit all the way to 16 bits. After this, it contains the actual values that were encoded, all these value are 8 bits in size.
Q: Why is huffman code not stored, how does decoder derive the codes?
Q: If there are say 4 values that have huffman code of 3 bits long, we shall write them as 4 bytes. Does it matter what order they are in or they have to be in ascending or descending order? I do know that the values must be in order such that the values with 1 bit huffman code are then followed by values with 2 bit huffman code e.t.c.
Q: I have used jpegsnoop to look at huffman table of different files, some made in MS paint and some were downloaded. I find that they all have the SAME table.
Here are the DHT entries I got from JPEG snoop:
Destination ID = 1
Class = 1 (AC Table)
Codes of length 01 bits (000 total):
Codes of length 02 bits (002 total): 00 01
Codes of length 03 bits (001 total): 02
Codes of length 04 bits (002 total): 03 11
Codes of length 05 bits (004 total): 04 05 21 31
Codes of length 06 bits (004 total): 06 12 41 51
Codes of length 07 bits (003 total): 07 61 71
Codes of length 08 bits (004 total): 13 22 32 81
Codes of length 09 bits (007 total): 08 14 42 91 A1 B1 C1
Codes of length 10 bits (005 total): 09 23 33 52 F0
Codes of length 11 bits (004 total): 15 62 72 D1
Codes of length 12 bits (004 total): 0A 16 24 34
Codes of length 13 bits (000 total):
Codes of length 14 bits (001 total): E1
Codes of length 15 bits (002 total): 25 F1
Codes of length 16 bits (119 total): 17 18 19 1A 26 27 28 29 2A 35 36 37 38 39 3A 43
44 45 46 47 48 49 4A 53 54 55 56 57 58 59 5A 63
64 65 66 67 68 69 6A 73 74 75 76 77 78 79 7A 82
83 84 85 86 87 88 89 8A 92 93 94 95 96 97 98 99
9A A2 A3 A4 A5 A6 A7 A8 A9 AA B2 B3 B4 B5 B6 B7
B8 B9 BA C2 C3 C4 C5 C6 C7 C8 C9 CA D2 D3 D4 D5
D6 D7 D8 D9 DA E2 E3 E4 E5 E6 E7 E8 E9 EA F2 F3
F4 F5 F6 F7 F8 F9 FA
Total number of codes: 162
And
Destination ID = 1
Class = 0 (DC / Lossless Table)
Codes of length 01 bits (000 total):
Codes of length 02 bits (003 total): 00 01 02
Codes of length 03 bits (001 total): 03
Codes of length 04 bits (001 total): 04
Codes of length 05 bits (001 total): 05
Codes of length 06 bits (001 total): 06
Codes of length 07 bits (001 total): 07
Codes of length 08 bits (001 total): 08
Codes of length 09 bits (001 total): 09
Codes of length 10 bits (001 total): 0A
Codes of length 11 bits (001 total): 0B
Codes of length 12 bits (000 total):
Codes of length 13 bits (000 total):
Codes of length 14 bits (000 total):
Codes of length 15 bits (000 total):
Codes of length 16 bits (000 total):
Total number of codes: 012
The AC table compresses RRRRSSSS that contain zero-run length and AC coefficient magnitude while the DC table compresses SSSS. Thus, I think that the AC table must contain total of 255 entries (exlcuded 0) while the DC table must be 15 entries (excluded 0). However, neither of the tables contain this many total number of codes. WHY?
Q: Why is huffman code not stored, how does decoder derive the codes?
The reason the Huffman tables is defined as they are rather than with the actual codes is that it is much smaller and simpler to encode that way. PNG uses a similar but different method.
Keep in mind that to store the Huffman codes in the JPEG stream you would need to include both the length and the code itself. The result would be much larger than encoding a count of lengths.
Q: If there are say 4 values that have huffman code of 3 bits long, we shall write them as 4 bytes. Does it matter what order they are in or they have to be in ascending or descending order?
If the Huffman code has 3 bits, it is written as three bits to the JPEG stream. The codes are generated in ascending order.
Q: I have used jpegsnoop to look at huffman table of different files, some made in MS paint and some were downloaded. I find that they all have the SAME table.
The encoder is being lazy and using a fixed Huffman table. There is a sample Huffman table in the JPEG standard that they often use. To generate optimal Huffman codes, the encoder must make two passes over the data. With a preset table, the encoder only needs to make one pass.
F.1.2.1.2 and F.1.2.2.1 of the JPEG Specification explain why the Huffman tables are not fully populated. For baseline encoding DC difference values are limited to 11 bits (table F.1) and AC values are limited to 10 bits (table F.2).
Since DC Huffman symbols only need SSSS values from 0 to 11 their Huffman trees need only 12 codes as you've reported.
AC Huffman symbols have a prefix zero run count from 0 to 15. With 11 bit sizes that works out to 16 * 11 = 176 symbols. However, they don't include the symbols 0x10, 0x20, ... 0xE0 because they are redundant. They encode a run of 1, 2, ... 14 zeros followed by a 0 value. If an encoder has, say, 7 zero values followed by a 3 bit value it can encode that as 0x73. There would be no point encoding it with two symbols 0x60;0x03.
Ignoring those 14 useless symbols we end up with 162 codes as you have reported.
By the way, the 0xF0 (ZRL) value is needed because there isn't a symbol that can express a run of 16 zeros followed by a value thus is cannot be merged.
I don't know why the JPEG spec limits the DC and AC values to a certain number of bits. I speculate that the extra precision would have no effect or is typically thrown away by quantization. Or maybe it has to do with the mathematics of the Inverse Discrete Cosine Transform. Keep in mind that these Huffman encoded values are (quantized) coefficients for the IDCT and are only indirectly related to the continuous tone RGB output.
The Huffman encoding is almost fully determined by the relative frequencies of all 256 symbols (except tiebreaker rules). This means you can choose many, many formats to encode those relative frequencies; the most simple one would be to simply store all these frequencies. The receiver can then rebuild the encoding from that order.
Background: the two least frequent characters of a Huffman encoding share the same (long) prefix, and differ only in the last bit. This combination is then assigned a joint frequency (sum of both combinations), which is used recursively to determine the prefix. Finally, you end up with two sets, one holding X characters and the other holding 256-X characters. The first set has prefix 0 and the second set has prefix 1.
Yes, that's arbitrary, you could swap those 0 and 1, and have a similar table and the same compression ratio - a 0 is just as long as a one. That's why you have detailed rules (e.g. most common set gets 0, tiebreaker is first byte in set)
Back to encoding. You want to store these relative frequencies efficiently, as we're using compression here. Now, as I pointed out, when we have codes suffix-0 and suffix-1, they're both equally long (namely the suffix length plus one). So we know from the fact that there are 119 unique 16 bits codes that there are 60 unique prefixes with length 15. Calculating backwards, we also know that there are two unique symbols with length 15, total 62, so there must be 31 prefixes of length 14. We can backtrack again to prefixes of length 1.
Again, it's necessary to point out that we don't know here the exact values of those prefixes, and the matching symbols. This depends on the tiebreaker rules, as pointed out, but those rules are fixed for JPEG.
JPEG does have a bit of a special case: Huffman codes for very rare symbols should be longer than 16 bits. That's inconvenient, so in building the table you don't choose the two sets with the least combined frequency if either of them already has a long suffix - combining those two subsets would just make the suffix even longer. You see this with all the 16-bit codes in the example: most should have had longer codes in a pure Huffman encoding.
I think the worst case is if the most frequent character appears 50%, the next 25%, etc. You'll get codes 0, 10, 110, 1110 etcetera. That's unary counting, which is indeed optimal for that case, but the longest code is now 256 bits. You'd need a document with 2^256 bytes to have a frequency of 2^-256, though.

How to add filter to field in Wireshark

I'm trying to add a filter to a field in Wireshak.
My dissector name is: "basic".
it has 3 fields - field1, field2, field3.
each field can have a value of string.
I want that on Wireshark i'll be able to filter by a particular field, for example: basic.field1. (just the same as you look for tcp.len)
How can i do this?
You must declare the fields, assign them to your protocol and add them to the tree when appropriate. There are currently 2 different types of strings supported by Lua, those of type ftypes.STRING, which is used for strings of a known, fixed length, and type ftypes.STRINGZ, which is a NULL (zero)-terminated string, so how you declare the fields will depend upon which of the 2 types they are.
Unfortunately, despite the documentation listing ftypes.UINT_STRING as a supported type, it isn't as can be seen in the source code for wslua_proto_field.c. This type of string is applicable when a length field precedes the string to indicate the length of the string in bytes. In any case, it isn't currently available for Lua-based dissectors.
So, as an example, let's suppose your protocol uses UDP/33333 as its transport and port number, and its 3 fields consist of each of the 3 types of strings described above, namely:
field1: a fixed-length string of 12 bytes.
field2: a NULL-terminated string of arbitrary length.
field3: a counted string preceded by a 2 byte length field in big-endian (network) byte order.
Given these assumptions, the following will dissect the packets:
-- Protocol
local p_basic = Proto("basic", "Basic Protocol")
-- Fields
local f_basic_field1 = ProtoField.string("basic.field1", "Field1")
local f_basic_field2 = ProtoField.stringz("basic.field2", "Field2")
local f_basic_field3 = ProtoField.string("basic.field3", "Field3")
p_basic.fields = { f_basic_field1, f_basic_field2, f_basic_field3 }
-- Dissection
function p_basic.dissector(buf, pinfo, tree)
local basic_tree = tree:add(p_basic, buf(0,-1))
pinfo.cols.protocol:set("BASIC")
basic_tree:add(f_basic_field1, buf(0, 12))
local strz = buf(12):stringz()
local field2_len = string.len(strz) + 1
basic_tree:add(f_basic_field2, buf(12, field2_len))
local field3_len = buf:range(12 + field2_len, 2):uint()
basic_tree:add(f_basic_field3, buf(12 + field2_len + 2, field3_len))
end
-- Registration
local udp_table = DissectorTable.get("udp.port")
udp_table:add(33333, p_basic)
If you want to test this, first save the above lua code to a file such as basic.lua in your personal plugins directory (Found via Help -> About Wireshark -> Folders -> Personal Plugins). You can then use the following hex bytes to test it:
0000 00 0e b6 00 00 02 00 0e b6 00 00 01 08 00 45 00
0010 00 37 00 00 40 00 40 11 b5 ea c0 00 02 65 c0 00
0020 02 66 82 35 82 35 00 23 22 32 48 65 6c 6c 6f 20
0030 77 6f 72 6c 64 21 48 69 20 74 68 65 72 65 00 00
0040 04 42 79 65 21
Save these bytes into a text file, e.g., basic.txt. Start Wireshark and import the file via File -> Import from Hex Dump... -> Filename:basic.txt -> OK. You should see the 3 fields dissected as part of the "Basic Protocol".
For further help with Lua dissectors, you might want to refer to one or more of the following:
Wireshark Lua (wiki page)
Wireshark Lua Examples (wiki page)
Wireshark Lua Dissectors (wiki page)
Wireshark Contributed scripts, macros, colouring rules and plugins (wiki page)
Chapter 10. Lua Support in Wireshark (Wireshark Developer's Guide)
https://www.wireshark.org/docs/wsdg_html_chunked/wsluarm_modules.html (Wireshark Developer's Guide)
Wireshark Lua API Reference Manual Addendum (wiki page)
Lua Directory
Sharkfest '15 presentation and accompanying YouTube video, by Graham Bloice

COM Port Commands CRC XOR

I have a check printer that I want to connect in Delphi 7 by COM port and operate.
I have a command that I extracted with Serial Port Monitor:
STX "PIRI(781" FS NULL ETX "0B" wich is 02 50 49 52 49 28 37 38 31 1c 00 03 30 42 hex
The manual says the following:
CRC (which is the last two digits after the ETX) - packet checksum. It
is calculated by the following algorithm: executing XOR for every
byte of the block including ETX by excluding STX. The data of the
checksum take up two bytes and are a symbolic representation of the
numeric in a hexadecimal calculation system.
I tried to an ONLINE CRC calculator and return a 1B result and a 27 numeric.
How to do it? For "PIRI(781" FS NULL ETX it should be 0B
The documentation incorrectly identifies the check value as a CRC. It is not. It is simply the exclusive-or of the noted bytes. The exclusive-or of 50 49 52 49 28 37 38 31 1c 00 03 is 0b. You then convert the 0b to hex (with an upper case B, i.e. 0B), and get 30 42.

Convert DeDe constant to valid declaration or other interface extraction tool?

I am using DeDe to create an API (Interface) I can compile to. (Strictly legit: while we wait for the vendor to deliver a D2010 version in two months, we can at least get our app compiling...)
We'll stub out all methods.
Dede emits constant declarations like these:
LTIMGLISTCLASS =
00: ÿÿÿÿ....LEADIMGL|FF FF FF FF 0D 00 00 00 4C 45 41 44 49 4D 47 4C|
10: IST32. |49 53 54 33 32 00|;
DS_PREFIX =
0: ÿÿÿÿ....DICM.|FF FF FF FF 04 00 00 00 44 49 43 4D 00|;
How would I convert these into a compilable statement?
In theory, I don't care about the actual values, since I doubt they're use anywhere, but I'd like to get their size correct. Are these integers, LongInts or ???
Any other hints on using DeDe would be welcome.
Those are strings. The first four bytes are the reference count, which for string literals is always -1 ($ffffffff). The next four bytes are the character count. Then comes the characters an a null terminator.
const
LTIMGLISTCLASS = 'LEADIMGLIST32'; // 13 = $0D characters
DS_PREFIX = 'DICM'; // 4 = $04 characters
You don't have to "doubt" whether those constants are used anywhere. You can confirm it empirically. Compile your project without those constants. If it compiles, then they're not used.
If your project doesn't compile, then those constants must be used somewhere in your code. Based on the context, you can provide your own declarations. If the constant is used like a string, then declare a string; if it's used like an integer, then declare an integer.
Another option is to load your project in a version of Delphi that's compatible with the DCUs you have. Use code completion to make the IDE display the constant and its type.

Resources