I know that Common Lisp discourages a programmer from touching raw memory, but I would like to know whether it is possible to see how an object is stored on a byte level. Of course, a garbage collector moves objects in memory space and two subsequent calls of a function (obj-as-bytes obj) could yield different results, but let us assume that we need just a memory snapshot. How would you implement such function?
My attempt with SBCL looks as follows:
(defun obj-as-bytes (obj)
(let* ((addr (sb-kernel:get-lisp-obj-address obj)) ;; get obj address in memory
(ptr (sb-sys:int-sap addr)) ;; make pointer to this area
(size (sb-ext:primitive-object-size obj)) ;; get object size
(output))
(dotimes (idx size)
(push (sb-sys:sap-ref-64 ptr idx) output)) ;; collect raw bytes into list
(nreverse output))) ;; return bytes in the reversed order
Let's try:
(obj-as-bytes #(1)) =>
(0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 111 40 161 4 16 0 0 0 23 1 16 80 0 0 0)
(obj-as-bytes #(2) =>
(0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 95 66 161 4 16 0 0 0 23 1 16 80 0 0 0)
From this output I conclude that there is a lot of garbage, which occupies space for future memory allocations. And we see it because (sb-ext:primitive-object-size obj) seems to return a chunk of memory which is large enough to fit the object.
This code demonstrates it:
(loop for n from 0 below 64 collect
(sb-ext:primitive-object-size (make-string n :initial-element #\a))) =>
(16 32 32 32 32 48 48 48 48 64 64 64 64 80 80 80 80 96 96 96 96 112 112 112 112 128 128 128 128 144 144 144 144 160 160 160 160 176 176 176 176 192 192 192 192 208 208 208 208 224 224 224 224 240 240 240 240 256 256 256 256 272 272 272)
So, obj-as-bytes would give a correct result if sb-ext:primitive-object-size were more accurate. But I cannot find any alternative.
Do you have any suggestions how to fix this function or how to implement it differently?
As I mentioned in a comment the layout of objects in memory is very implementation-specific and the tools to explore it are necessarily also implementation-dependent.
This answer discusses the layout for 64-bit versions of SBCL and only for 64-bit versions which have 'wide fixnums'. I'm not sure in which order these two things arrived in SBCL as I haven't looked seriously at any of this since well before SBCL and CMUCL diverged.
This answer also may be wrong: I'm not an SBCL developer and I'm only adding it because no one who is has (I suspect tagging the question properly might help with this).
Information below comes from looking at the GitHub mirror, which seems to be very up to date with the canonical source but a lot faster.
Pointers, immediate objects, tags
[Information from here.] SBCL allocates on two-word boundaries. On a 64-bit system this means that the low four bits of any address are always zero. These low four bits are used as a tag (the documentation calls this the 'lowtag') to tell you what sort of thing is in the rest of the word.
A lowtag of xyz0 means that the rest of the word is a fixnum, and in particular xyz will then be the low bits of the fixnum, rather than tag bits at all. This means both that there are 63 bits available for fixnums and that fixnum addition is trivial: you don't need to mask off any bits.
A lowtag of xy01 means that the rest of the word is some other immediate object. Some of the bits to the right of the lowtag (which I think SBCL calls a 'widetag' although I am confused about this as the term seems to be used in two ways) will say what the immediate object is. Examples of immediate objects are characters and single-floats (on a 64-bit platform!).
the remaining lowtag patterns are xy11, and they all mean that things are pointers to some non-immediate object:
0011 is an instance of something;
0111 is a cons;
1011 is a function;
1111 is something else.
Conses
Because conses don't need any additional type information (a cons is a cons) the lowtag is enough: a cons is then just two words in memory, each of which in turn has lowtags &c.
Other non-immediate objects
I think (but am not sure) that all other non-immediate objects have a word which says what they are (which may also be called a 'widetag') and at least one other word (because allocation is on two-word boundaries). I suspect that the special tag for functions means that function call can just jump to the entry point of the function's code.
Looking at this
room.lisp has a nice function called hexdump which knows how to print out non-immediate objects. Based on that I wrote a little shim (below) which tries to tell you useful things. Here are some examples.
> (hexdump-thing 1)
lowtags: 0010
fixnum: 0000000000000002 = 1
1 is a fixnum and its representation is just shifted right one bit as described above. Note that the lowtags actually contain the whole value in this case!
> (hexdump-thing 85757)
lowtags: 1010
fixnum: 0000000000029DFA = 85757
... but not in this case.
> (hexdump-thing #\c)
lowtags: 1001
immediate: 0000000000006349 = #\c
> (hexdump-thing 1.0s0)
lowtags: 1001
immediate: 3F80000000000019 = 1.0
Characters and single floats are immediate: some of the bits to the left of the lowtag tells the system what they are, I think?
> (hexdump-thing '(1 . 2))
lowtags: 0111
cons: 00000010024D6E07 : 00000010024D6E00
10024D6E00: 0000000000000002 = 1
10024D6E08: 0000000000000004 = 2
> (hexdump-thing '(1 2 3))
lowtags: 0111
cons: 00000010024E4BC7 : 00000010024E4BC0
10024E4BC0: 0000000000000002 = 1
10024E4BC8: 00000010024E4BD7 = (2 3)
Conses. In the first case you can see the two fixnums sitting as immediate values in the two fields of the cons. In the second, if you decoded the lowtag of the second field it would be 0111: it's another cons.
> (hexdump-thing "")
lowtags: 1111
other: 00000010024FAE8F : 00000010024FAE80
10024FAE80: 00000000000000E5
10024FAE88: 0000000000000000 = 0
> (hexdump-thing "x")
lowtags: 1111
other: 00000010024FC22F : 00000010024FC220
10024FC220: 00000000000000E5
10024FC228: 0000000000000002 = 1
10024FC230: 0000000000000078 = 60
10024FC238: 0000000000000000 = 0
> (hexdump-thing "xyzt")
lowtags: 1111
other: 00000010024FDDAF : 00000010024FDDA0
10024FDDA0: 00000000000000E5
10024FDDA8: 0000000000000008 = 4
10024FDDB0: 0000007900000078 = 259845521468
10024FDDB8: 000000740000007A = 249108103229
Strings. These have some type information, a length field, and then characters are packed two to a word. A single-character string needs four words, the same as a four-character one. You can read the character codes out of the data.
> (hexdump-thing #())
lowtags: 1111
other: 0000001002511C3F : 0000001002511C30
1002511C30: 0000000000000089
1002511C38: 0000000000000000 = 0
> (hexdump-thing #(1))
lowtags: 1111
other: 00000010025152BF : 00000010025152B0
10025152B0: 0000000000000089
10025152B8: 0000000000000002 = 1
10025152C0: 0000000000000002 = 1
10025152C8: 0000000000000000 = 0
> (hexdump-thing #(1 2))
lowtags: 1111
other: 000000100252DC2F : 000000100252DC20
100252DC20: 0000000000000089
100252DC28: 0000000000000004 = 2
100252DC30: 0000000000000002 = 1
100252DC38: 0000000000000004 = 2
> (hexdump-thing #(1 2 3))
lowtags: 1111
other: 0000001002531C8F : 0000001002531C80
1002531C80: 0000000000000089
1002531C88: 0000000000000006 = 3
1002531C90: 0000000000000002 = 1
1002531C98: 0000000000000004 = 2
1002531CA0: 0000000000000006 = 3
1002531CA8: 0000000000000000 = 0
Same deal for simple vectors: header, length, but now each entry takes a word of course. Above all entries are fixnums and you can see them in the data.
And so it goes on.
The code that did this
This may be wrong and an earlier version of it definitely did not like small bignums (I think hexdump doesn't like them). If you want real answers either read the source or ask an SBCL person. Other implementations are available, and will be different.
(defun hexdump-thing (obj)
;; Try and hexdump an object, including immediate objects. All the
;; work is done by sb-vm:hexdump in the interesting cases.
#-(and SBCL 64-bit)
(error "not a 64-bit SBCL")
(let* ((address/thing (sb-kernel:get-lisp-obj-address obj))
(tags (ldb (byte 4 0) address/thing)))
(format t "~&lowtags: ~12T~4,'0b~%" tags)
(cond
((zerop (ldb (byte 1 0) tags))
(format t "~&fixnum:~12T~16,'0x = ~S~%" address/thing obj))
((= (ldb (byte 2 0) tags) #b01)
(format t "~&immediate:~12T~16,'0x = ~S~%" address/thing obj))
((= (ldb (byte 2 0) tags) #b11) ;must be true
(format t "~&~A:~12T~16,'0x : ~16,'0x~%"
(case (ldb (byte 2 2) tags)
(#b00 "instance")
(#b01 "cons")
(#b10 "function")
(#b11 "other"))
address/thing (dpb #b0000 (byte 4 0) address/thing))
;; this tells you at least something (and really annoyingly
;; does not pad addresses on the left)
(sb-vm:hexdump obj))
;; can't happen
(t (error "mutant"))))
(values))
I have an alarm system that I have configured to send SMS messages to my phone as well as over Ethernet.
Here a few of the SMSes I receive:
5522 18 1137 00 003 1C76
5522 18 3137 00 003 3278
5522 18 1130 00 002 E36E
5522 18 1401 00 001 ED6E
5522 18 1302 00 003 ED70
5522 18 1302 00 004 EE71
5522 18 1302 00 009 F376
5522 18 3147 00 009 417F
5522 18 1137 00 004 1D77
5522 18 3137 00 009 3379
5522 18 1602 00 000 0870
The first 4 bytes are the account number, the next 2 are always 18, the next 4 are event codes, 2 group bytes and 3 zone numbers. At the end there are 4 bytes which I suspect is some kind of checksum.
This is some kind of Ademco Contact ID format. However, I do not recognize the checksum.
It's not a time stamp as the last message (0870) is sent periodically and is always the same.
When sending via DTMF 0 should have value 10, but I do not know if that is the case with SMSes. Most likely not.
The checksum formula in Ademco's Contact ID is calculated using the formula:
S= HEX Checksum which is one digit.
(Sum of all message digits + S) MOD 15 = 0
and if the value is equal to 10 the checksum digit is 0.
The official Contact ID specification is here:http://li0r.files.wordpress.com/2012/07/sia-dc-05-1999-09_contact_id.pdf
So, using 5522 18 1602 00 000 0870 as an example:
LET C = checksum
5+5+2+2+1+8+1+6+2=32
(32+S) modulo 15 is congruent to 0
We then need the closest multiple of 15 going higher than 32 which would be 45.
45-32=13
Lets test that.
45 modulo 15 is congruent to 0
It is correct however, as Contact ID is 16 digits and you have 19 I would suspect that your panel is using a different proprietary implementation of Contact ID. If you post the make/model of the panel that this came from I may be able to explain things further.
I hope this answers your question!
-Alex
P.S.: To calculate mod use a percent sign in Google
P.P.S: The document that describes Contact ID is actually: DC-05-1999.09 the document you referenced is actually the computer interface communication protocol specifications.
I just want to correct AdemcoGuy's calculation as it seems to be incorrect:
So, the example was 5522 18 1602 00 000 0870
We need to replace each 0 by 10.
So:
5+5+2+2+1+8+1+6+10+2+10+10+10+10+10 = 92
than 100-92 = 8
So the cheksum is 8
Anyway in the question the checksum seems missing and what is the last 4 digit only knows who manufactured the panel which has been send it :)
#ACCT MT QXYZ GG CCC where:
ACCT: 4 Digit Account number (0-9, B-F)
MT: Message Type - Always 18
Q: Event qualifier, which gives specific event information:
1: New Event or Opening
3: New Restore or Closing
6: Previously reported condition still present (Status report)
XYZ: Event code (3 Hex digits 0-9,B-F)
GG: Group or Partition number (2 Hex digits 0-9, B-F). Use 00 to indicate that no specific group or partition information applies.
CCC: Zone number (Event reports) or User (Open / Close reports) (3 Hexdigits 0-9,B-F ). Use 000 to indicate that no specific zone or user information applies.
To look up event codes see this document (pdf).
I came across this post while trying to figure out the checksums of my own alarm system (Woonveilig/Egardia) that seems to be using the same format. I found a forum post on the german alarm forum that contains a snippet of C code to calculate CRCs for the LUPUS alarm system. This CRC calculation method seems to match both my own and Lasse's SMS based system. Here's the C code converted to a simple calculation tool:
#include <stdio.h>
#include <string.h>
// Code from: https://www.alarmforum.de/showthread.php?tid=12037&pid=75893
/**
* Fletcher Checksum.(LUPUS version, 16-bit)
*/
static unsigned int fletcher_sum(char* data, int len) {
unsigned int sum1 = 0x0, sum2 = 0x0;
while (len) {
unsigned int tlen = (len > 256) ? 256 : len;
len -= tlen;
do {
sum1 += *data++;
sum1 = (sum1 & 0xff);
sum2 += sum1;
sum2 = (sum2 & 0xff);
} while (--tlen);
}
return sum2 << 8 | sum1;
}
int main() {
char input[50];
int sum;
printf("Enter input: ");
fgets(input, sizeof(input), stdin);
sum = fletcher_sum(input, strlen(input)-1);
printf("%x\n", sum);
return 0;
}
Example (first SMS from the question post):
# cc checksum.c
# ./a.out
Enter input: 5522 18 1137 00 003
1c76
This is an Erlang question.
I have run into some unexpected behavior by io:fread.
I was wondering if someone could check whether there is something wrong with the way I use io:fread or whether there is a bug in io:fread.
I have a text file which contains a "triangle of numbers"as follows:
59
73 41
52 40 09
26 53 06 34
10 51 87 86 81
61 95 66 57 25 68
90 81 80 38 92 67 73
30 28 51 76 81 18 75 44
...
There is a single space between each pair of numbers and each line ends with a carriage-return new-line pair.
I use the following Erlang program to read this file into a list.
-module(euler67).
-author('Cayle Spandon').
-export([solve/0]).
solve() ->
{ok, File} = file:open("triangle.txt", [read]),
Data = read_file(File),
ok = file:close(File),
Data.
read_file(File) ->
read_file(File, []).
read_file(File, Data) ->
case io:fread(File, "", "~d") of
{ok, [N]} ->
read_file(File, [N | Data]);
eof ->
lists:reverse(Data)
end.
The output of this program is:
(erlide#cayle-spandons-computer.local)30> euler67:solve().
[59,73,41,52,40,9,26,53,6,3410,51,87,86,8161,95,66,57,25,
6890,81,80,38,92,67,7330,28,51,76,81|...]
Note how the last number of the fourth line (34) and the first number of the fifth line (10) have been merged into a single number 3410.
When I dump the text file using "od" there is nothing special about those lines; they end with cr-nl just like any other line:
> od -t a triangle.txt
0000000 5 9 cr nl 7 3 sp 4 1 cr nl 5 2 sp 4 0
0000020 sp 0 9 cr nl 2 6 sp 5 3 sp 0 6 sp 3 4
0000040 cr nl 1 0 sp 5 1 sp 8 7 sp 8 6 sp 8 1
0000060 cr nl 6 1 sp 9 5 sp 6 6 sp 5 7 sp 2 5
0000100 sp 6 8 cr nl 9 0 sp 8 1 sp 8 0 sp 3 8
0000120 sp 9 2 sp 6 7 sp 7 3 cr nl 3 0 sp 2 8
0000140 sp 5 1 sp 7 6 sp 8 1 sp 1 8 sp 7 5 sp
0000160 4 4 cr nl 8 4 sp 1 4 sp 9 5 sp 8 7 sp
One interesting observation is that some of the numbers for which the problem occurs happen to be on 16-byte boundary in the text file (but not all, for example 6890).
I'm going to go with it being a bug in Erlang, too, and a weird one. Changing the format string to "~2s" gives equally weird results:
["59","73","4","15","2","40","0","92","6","53","0","6","34",
"10","5","1","87","8","6","81","61","9","5","66","5","7",
"25","6",
[...]|...]
So it appears that it's counting a newline character as a regular character for the purposes of counting, but not when it comes to producing the output. Loopy as all hell.
A week of Erlang programming, and I'm already delving into the source. That might be a new record for me...
EDIT
A bit more investigation has confirmed for me that this is a bug. Calling one of the internal methods that's used in fread:
> io_lib_fread:fread([], "12 13\n14 15 16\n17 18 19 20\n", "~d").
{done,{ok,"\f"}," 1314 15 16\n17 18 19 20\n"}
Basically, if there's multiple values to be read, then a newline, the first newline gets eaten in the "still to be read" part of the string. Other testing suggests that if you prepend a space it's OK, and if you lead the string with a newline it asks for more.
I'm going to get to the bottom of this, gosh-darn-it... (grin) There's not that much code to go through, and not much of it deals specifically with newlines, so it shouldn't take too long to narrow it down and fix it.
EDIT^2
HA HA! Got the little blighter.
Here's the patch to the stdlib that you want (remember to recompile and drop the new beam file over the top of the old one):
--- ../erlang/erlang-12.b.3-dfsg/lib/stdlib/src/io_lib_fread.erl
+++ ./io_lib_fread.erl
## -35,9 +35,9 ##
fread_collect(MoreChars, [], Rest, RestFormat, N, Inputs).
fread_collect([$\r|More], Stack, Rest, RestFormat, N, Inputs) ->
- fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, More);
+ fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, [$\r|More]);
fread_collect([$\n|More], Stack, Rest, RestFormat, N, Inputs) ->
- fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, More);
+ fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, [$\n|More]);
fread_collect([C|More], Stack, Rest, RestFormat, N, Inputs) ->
fread_collect(More, [C|Stack], Rest, RestFormat, N, Inputs);
fread_collect([], Stack, Rest, RestFormat, N, Inputs) ->
## -55,8 +55,8 ##
eof ->
fread(RestFormat,eof,N,Inputs,eof);
_ ->
- %% Don't forget to count the newline.
- {more,{More,RestFormat,N+1,Inputs}}
+ %% Don't forget to strip and count the newline.
+ {more,{tl(More),RestFormat,N+1,Inputs}}
end;
Other -> %An error has occurred
{done,Other,More}
Now to submit my patch to erlang-patches, and reap the resulting fame and glory...
Besides the fact that it seems to be a bug in one of the erlang libs I think you could (very) easily circumvent the problem.
Given the fact your file is line-oriented I think best practice is that you process it line-by-line as well.
Consider the following construction. It works nicely on an unpatched erlang and because it uses lazy evaluation it can handle files of arbitrary length without having to read all of it into memory first. The module contains an example of a function to apply to each line - turning a line of text-representations of integers into a list of integers.
-module(liner).
-author("Harro Verkouter").
-export([liner/2, integerize/0, lazyfile/1]).
% Applies a function to all lines of the file
% before reducing (foldl).
liner(File, Fun) ->
lists:foldl(fun(X, Acc) -> Acc++Fun(X) end, [], lazyfile(File)).
% Reads the lines of a file in a lazy fashion
lazyfile(File) ->
{ok, Fd} = file:open(File, [read]),
lazylines(Fd).
% Actually, this one does the lazy read ;)
lazylines(Fd) ->
case io:get_line(Fd, "") of
eof -> file:close(Fd), [];
{error, Reason} ->
file:close(Fd), exit(Reason);
L ->
[L|lazylines(Fd)]
end.
% Take a line of space separated integers (string) and transform
% them into a list of integers
integerize() ->
fun(X) ->
lists:map(fun(Y) -> list_to_integer(Y) end,
string:tokens(X, " \n")) end.
Example usage:
Eshell V5.6.5 (abort with ^G)
1> c(liner).
{ok,liner}
2> liner:liner("triangle.txt", liner:integerize()).
[59,73,41,52,40,9,26,53,6,34,10,51,87,86,81,61,95,66,57,25,
68,90,81,80,38,92,67,73,30|...]
And as a bonus, you can easily fold over the lines of any (lineoriented) file w/o running out of memory :)
6> lists:foldl( fun(X, Acc) ->
6> io:format("~.2w: ~s", [Acc,X]), Acc+1
6> end,
6> 1,
6> liner:lazyfile("triangle.txt")).
1: 59
2: 73 41
3: 52 40 09
4: 26 53 06 34
5: 10 51 87 86 81
6: 61 95 66 57 25 68
7: 90 81 80 38 92 67 73
8: 30 28 51 76 81 18 75 44
Cheers,
h.
I noticed that there are multiple instances where two numbers are merged, and it appears to be at the line boundaries on every line starting at the fourth line and beyond.
I found that if you add a whitespace character to the beginning of every line starting at the fifth, that is:
59
73 41
52 40 09
26 53 06 34
10 51 87 86 81
61 95 66 57 25 68
90 81 80 38 92 67 73
30 28 51 76 81 18 75 44
...
The numbers get parsed properly:
39> euler67:solve().
[59,73,41,52,40,9,26,53,6,34,10,51,87,86,81,61,95,66,57,25,
68,90,81,80,38,92,67,73,30|...]
It also works if you add the whitespace to the beginning of the first four lines, as well.
It's more of a workaround than an actual solution, but it works. I'd like to figure out how to set up the format string for io:fread such that we wouldn't have to do this.
UPDATE
Here's a workaround that won't force you to change the file. This assumes that all digits are two characters (< 100):
read_file(File, Data) ->
case io:fread(File, "", "~d") of
{ok, [N] } ->
if
N > 100 ->
First = N div 100,
Second = N - (First * 100),
read_file(File, [First , Second | Data]);
true ->
read_file(File, [N | Data])
end;
eof ->
lists:reverse(Data)
end.
Basically, the code catches any of the numbers which are the concatenation of two across a newline and splits them into two.
Again, it's a kludge that implies a possible bug in io:fread, but that should do it.
UPDATE AGAIN The above will only work for two-digit inputs, but since the example packs all digits (even those < 10) into a two-digit format, that will work for this example.