Quartus initializing RAM - memory

I made an entity that Quartus successfully recognizes as RAM, and it instantiates a RAM megafunction for it. It would be nice if I could initialize that RAM from a file. I found tutorials for making such a file (a .mif file). Now that I have created that file, I don't know how to make Quartus initialize the module from it. Any help is appreciated.
Here is my RAM entity:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity RAM is
port (
clk: in std_logic;
we: in std_logic;
data_in: in std_logic_vector (7 downto 0);
read_addr: in integer range 0 to 65535;
write_addr: in integer range 0 to 65535;
data_out: out std_logic_vector (7 downto 0)
);
end entity RAM;
architecture RAM_arch of RAM is
type memory is array (65535 downto 0) of std_logic_vector (7 downto 0);
signal content: memory;
begin
process(clk)
begin
if (RISING_EDGE(clk)) then
if (we = '1') then
content(write_addr) <= data_in;
end if;
data_out <= content(read_addr);
end if;
end process;
end architecture;

Possibly the best way to initialise the memory is to ... put an initialisation clause on the memory signal. There may be Quartus-specific ways to load .MIF files, but this is probably simpler, definitely more portable (to Xilinx for example), and more flexible: because you get to define the file format, you don't have to generate .mif files.
Given the following code:
type memory is array (65535 downto 0) of std_logic_vector (7 downto 0);
signal content: memory;
you could simply write
type memory is array (65535 downto 0) of std_logic_vector (7 downto 0);
signal content: memory := init_my_RAM(filename => "ram_contents.txt");
Now it is possible but unlikely that Quartus doesn't support initialisation this way,
so we can test it by writing a simple init_my_ram function ignoring the actual file contents:
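-- Requires "use std.textio.all;" in the context clause, and must be declared
-- after the memory type but before the signal that calls it.
impure function init_my_ram (filename : string) return memory is
file f : text;
variable m : memory;
begin
file_open(f, filename, read_mode);
for i in memory'range loop
m(i) := X"55";
end loop;
file_close(f);
return m;
end function init_my_ram;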
Because the function call is an initialiser, and called at elaboration time when the design is synthesised, this is all synthesisable.
If this compiles and Quartus generates a memory full of X"55", you are good to go, to parse whatever file format you want, in the init_my_ram function. (Binary files are harder, and the reader code may not be so portable between tools, but not impossible).
The .MIF approach has one potential advantage though : you can update just the memory contents without requiring another synthesis/place and route cycle.

One simple way to initialize a RAM area is as follows:
(tested with Quartus 15.1)
(* ram_init_file = "Bm437_IBM_VGA8.mif" *) reg [7:0] Bm437_IBM_VGA8[4096];
Best regards,
Johi.

The simplest method of initializing is to write the .mif file with any plain-text editor such as Notepad. The .mif listing below is for a ROM used as a decoder: a 6-bit address selects a 64-bit one-hot data word, and the remaining addresses up to DEPTH are filled with zeros. A .mif file can hold any data word size (8, 16, 32, 64 bits, etc.), in either hex or binary. This works reliably. The file must be in the same directory as the project.
WIDTH=64;
DEPTH=128;
ADDRESS_RADIX=HEX;
DATA_RADIX=HEX;
CONTENT BEGIN
000 : 0000000000000001;-- 0
001 : 0000000000000002;-- 1
002 : 0000000000000004;-- 2
003 : 0000000000000008;-- 3
004 : 0000000000000010;-- 4
005 : 0000000000000020;-- 5
006 : 0000000000000040;-- 6
007 : 0000000000000080;-- 7
008 : 0000000000000100;-- 8
009 : 0000000000000200;-- 9
00A : 0000000000000400;-- 10
00B : 0000000000000800;-- 11
00C : 0000000000001000;-- 12
00D : 0000000000002000;-- 13
00E : 0000000000004000;-- 14
00F : 0000000000008000;-- 15
010 : 0000000000010000;-- 16
011 : 0000000000020000;-- 17
012 : 0000000000040000;-- 18
013 : 0000000000080000;-- 19
014 : 0000000000100000;-- 20
015 : 0000000000200000;-- 21
016 : 0000000000400000;-- 22
017 : 0000000000800000;-- 23
018 : 0000000001000000;-- 24
019 : 0000000002000000;-- 25
01A : 0000000004000000;-- 26
01B : 0000000008000000;-- 27
01C : 0000000010000000;-- 28
01D : 0000000020000000;-- 29
01E : 0000000040000000;-- 30
01F : 0000000080000000;-- 31
020 : 0000000100000000;-- 32
021 : 0000000200000000;-- 33
022 : 0000000400000000;-- 34
023 : 0000000800000000;-- 35
024 : 0000001000000000;-- 36
025 : 0000002000000000;-- 37
026 : 0000004000000000;-- 38
027 : 0000008000000000;-- 39
028 : 0000010000000000;-- 40
029 : 0000020000000000;-- 41
02A : 0000040000000000;-- 42
02B : 0000080000000000;-- 43
02C : 0000100000000000;-- 44
02D : 0000200000000000;-- 45
02E : 0000400000000000;-- 46
02F : 0000800000000000;-- 47
030 : 0001000000000000;-- 48
031 : 0002000000000000;-- 49
032 : 0004000000000000;-- 50
033 : 0008000000000000;-- 51
034 : 0010000000000000;-- 52
035 : 0020000000000000;-- 53
036 : 0040000000000000;-- 54
037 : 0080000000000000;-- 55
038 : 0100000000000000;-- 56
039 : 0200000000000000;-- 57
03A : 0400000000000000;-- 58
03B : 0800000000000000;-- 59
03C : 1000000000000000;-- 60
03D : 2000000000000000;-- 61
03E : 4000000000000000;-- 62
03F : 8000000000000000;-- 63
[40..7F] : 0000000000000000;
END;
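A table like this does not have to be typed by hand. The following Python sketch regenerates the same one-hot listing. The function name make_onehot_mif and its parameters are made up for illustration, and it only uses the header keywords and the address-range syntax that appear in the listing above.

```python
def make_onehot_mif(addr_bits=6, depth=128):
    """Return the text of a .mif file holding a one-hot decoder table.

    addr_bits and depth are illustrative parameters (assumptions):
    2**addr_bits one-hot words are written, the rest of the memory is zeroed.
    """
    width = 1 << addr_bits          # 6 address bits -> 64-bit data words
    hex_digits = width // 4         # number of hex digits per data word
    lines = [
        f"WIDTH={width};",
        f"DEPTH={depth};",
        "ADDRESS_RADIX=HEX;",
        "DATA_RADIX=HEX;",
        "CONTENT BEGIN",
    ]
    for addr in range(width):
        # One bit set per address; the trailing comment is the decimal address.
        lines.append(f"{addr:03X} : {1 << addr:0{hex_digits}X};-- {addr}")
    if depth > width:
        # Fill the unused upper addresses with zeros using a range entry.
        lines.append(f"[{width:X}..{depth - 1:X}] : {0:0{hex_digits}X};")
    lines.append("END;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_onehot_mif(), end="")
```

Redirecting the output to a file next to the project gives a .mif equivalent to the hand-written one.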

As specified in this document this is the proper way to init memory from file:
signal content: memory;
attribute ram_init_file : string;
attribute ram_init_file of content:
signal is "init.mif";

If you generated one of the RAM modules using the wizard but forgot to add a memory initialization file to it you can add one later by doing the following:
Tools > MegaWizard Plug-In Manager > Edit an existing custom megafunction variation > {Select your file} > Next > Mem Init > Yes, use this file for the memory content data > Browse

Related

Object memory layout in Common Lisp

I know that Common Lisp discourages a programmer from touching raw memory, but I would like to know whether it is possible to see how an object is stored on a byte level. Of course, a garbage collector moves objects in memory space and two subsequent calls of a function (obj-as-bytes obj) could yield different results, but let us assume that we need just a memory snapshot. How would you implement such a function?
My attempt with SBCL looks as follows:
(defun obj-as-bytes (obj)
(let* ((addr (sb-kernel:get-lisp-obj-address obj)) ;; get obj address in memory
(ptr (sb-sys:int-sap addr)) ;; make pointer to this area
(size (sb-ext:primitive-object-size obj)) ;; get object size
(output))
(dotimes (idx size)
(push (sb-sys:sap-ref-8 ptr idx) output)) ;; collect raw bytes into list
(nreverse output))) ;; return the bytes in memory order
Let's try:
(obj-as-bytes #(1)) =>
(0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 111 40 161 4 16 0 0 0 23 1 16 80 0 0 0)
(obj-as-bytes #(2)) =>
(0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 95 66 161 4 16 0 0 0 23 1 16 80 0 0 0)
From this output I conclude that there is a lot of garbage, which occupies space for future memory allocations. And we see it because (sb-ext:primitive-object-size obj) seems to return a chunk of memory which is large enough to fit the object.
This code demonstrates it:
(loop for n from 0 below 64 collect
(sb-ext:primitive-object-size (make-string n :initial-element #\a))) =>
(16 32 32 32 32 48 48 48 48 64 64 64 64 80 80 80 80 96 96 96 96 112 112 112 112 128 128 128 128 144 144 144 144 160 160 160 160 176 176 176 176 192 192 192 192 208 208 208 208 224 224 224 224 240 240 240 240 256 256 256 256 272 272 272)
So, obj-as-bytes would give a correct result if sb-ext:primitive-object-size were more accurate. But I cannot find any alternative.
Do you have any suggestions how to fix this function or how to implement it differently?
As I mentioned in a comment the layout of objects in memory is very implementation-specific and the tools to explore it are necessarily also implementation-dependent.
This answer discusses the layout for 64-bit versions of SBCL and only for 64-bit versions which have 'wide fixnums'. I'm not sure in which order these two things arrived in SBCL as I haven't looked seriously at any of this since well before SBCL and CMUCL diverged.
This answer also may be wrong: I'm not an SBCL developer and I'm only adding it because no one who is has (I suspect tagging the question properly might help with this).
Information below comes from looking at the GitHub mirror, which seems to be very up to date with the canonical source but a lot faster.
Pointers, immediate objects, tags
[Information from here.] SBCL allocates on two-word boundaries. On a 64-bit system this means that the low four bits of any address are always zero. These low four bits are used as a tag (the documentation calls this the 'lowtag') to tell you what sort of thing is in the rest of the word.
A lowtag of xyz0 means that the rest of the word is a fixnum, and in particular xyz will then be the low bits of the fixnum, rather than tag bits at all. This means both that there are 63 bits available for fixnums and that fixnum addition is trivial: you don't need to mask off any bits.
A lowtag of xy01 means that the rest of the word is some other immediate object. Some of the bits to the left of the lowtag (which I think SBCL calls a 'widetag' although I am confused about this as the term seems to be used in two ways) will say what the immediate object is. Examples of immediate objects are characters and single-floats (on a 64-bit platform!).
The remaining lowtag patterns are xy11, and they all mean that things are pointers to some non-immediate object:
0011 is an instance of something;
0111 is a cons;
1011 is a function;
1111 is something else.
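The tagging rules above can be written down as a small decoder. This is only a Python sketch of the scheme as described in this answer, not SBCL code: classify_word is a hypothetical helper, and the test values are the raw words that appear in the hexdump-thing examples further down.

```python
def classify_word(word):
    """Classify a raw 64-bit word by its low four bits (the lowtag).

    Returns (kind, value): for fixnums the decoded integer, for immediates the
    raw word, for pointers the address with the lowtag bits cleared.
    """
    lowtag = word & 0xF
    if lowtag & 0b1 == 0:
        # xyz0: fixnum, stored shifted left by one bit.
        if word >= 1 << 63:          # reinterpret as a signed 64-bit value
            word -= 1 << 64
        return ("fixnum", word >> 1)
    if lowtag & 0b11 == 0b01:
        # xy01: some other immediate object (character, single-float, ...).
        return ("immediate", word)
    # xy11: pointer; the two upper lowtag bits select the pointer kind.
    kind = {0b00: "instance", 0b01: "cons", 0b10: "function", 0b11: "other"}[
        (lowtag >> 2) & 0b11
    ]
    return (kind, word & ~0xF)


if __name__ == "__main__":
    for raw in (0x2, 0x29DFA, 0x6349, 0x10024D6E07, 0x10024FAE8F):
        kind, value = classify_word(raw)
        print(f"{raw:016X} -> {kind}: {value:#x}")
```

Clearing the low four bits of a pointer is the same operation the Lisp code at the end of this answer performs with dpb.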
Conses
Because conses don't need any additional type information (a cons is a cons) the lowtag is enough: a cons is then just two words in memory, each of which in turn has lowtags &c.
Other non-immediate objects
I think (but am not sure) that all other non-immediate objects have a word which says what they are (which may also be called a 'widetag') and at least one other word (because allocation is on two-word boundaries). I suspect that the special tag for functions means that function call can just jump to the entry point of the function's code.
Looking at this
room.lisp has a nice function called hexdump which knows how to print out non-immediate objects. Based on that I wrote a little shim (below) which tries to tell you useful things. Here are some examples.
> (hexdump-thing 1)
lowtags: 0010
fixnum: 0000000000000002 = 1
1 is a fixnum and its representation is just the value shifted left one bit, as described above. Note that the lowtags actually contain the whole value in this case!
> (hexdump-thing 85757)
lowtags: 1010
fixnum: 0000000000029DFA = 85757
... but not in this case.
> (hexdump-thing #\c)
lowtags: 1001
immediate: 0000000000006349 = #\c
> (hexdump-thing 1.0s0)
lowtags: 1001
immediate: 3F80000000000019 = 1.0
Characters and single floats are immediate: some of the bits to the left of the lowtag tell the system what they are, I think?
> (hexdump-thing '(1 . 2))
lowtags: 0111
cons: 00000010024D6E07 : 00000010024D6E00
10024D6E00: 0000000000000002 = 1
10024D6E08: 0000000000000004 = 2
> (hexdump-thing '(1 2 3))
lowtags: 0111
cons: 00000010024E4BC7 : 00000010024E4BC0
10024E4BC0: 0000000000000002 = 1
10024E4BC8: 00000010024E4BD7 = (2 3)
Conses. In the first case you can see the two fixnums sitting as immediate values in the two fields of the cons. In the second, if you decoded the lowtag of the second field it would be 0111: it's another cons.
> (hexdump-thing "")
lowtags: 1111
other: 00000010024FAE8F : 00000010024FAE80
10024FAE80: 00000000000000E5
10024FAE88: 0000000000000000 = 0
> (hexdump-thing "x")
lowtags: 1111
other: 00000010024FC22F : 00000010024FC220
10024FC220: 00000000000000E5
10024FC228: 0000000000000002 = 1
10024FC230: 0000000000000078 = 60
10024FC238: 0000000000000000 = 0
> (hexdump-thing "xyzt")
lowtags: 1111
other: 00000010024FDDAF : 00000010024FDDA0
10024FDDA0: 00000000000000E5
10024FDDA8: 0000000000000008 = 4
10024FDDB0: 0000007900000078 = 259845521468
10024FDDB8: 000000740000007A = 249108103229
Strings. These have some type information, a length field, and then characters are packed two to a word. A single-character string needs four words, the same as a four-character one. You can read the character codes out of the data.
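To make the "two characters to a word" packing concrete, here is a Python sketch that reads the characters back out of the data words shown in the dumps. It assumes what the dumps suggest: 32 bits per character, with the first character in the low half of each word. The helper name decode_string_words is made up for illustration.

```python
def decode_string_words(length_word, data_words):
    """Rebuild a string from the length word and data words of a dump.

    length_word is the raw fixnum-encoded length (so 8 means 4 characters);
    each data word is assumed to hold two 32-bit character codes, low half first.
    """
    length = length_word >> 1            # undo the fixnum shift
    chars = []
    for word in data_words:
        chars.append(chr(word & 0xFFFFFFFF))   # low 32 bits: first character
        chars.append(chr(word >> 32))          # high 32 bits: second character
    return "".join(chars[:length])


if __name__ == "__main__":
    print(decode_string_words(0x8, [0x0000007900000078, 0x000000740000007A]))
    print(decode_string_words(0x2, [0x0000000000000078, 0x0]))
```

The two calls use the words from the "xyzt" and "x" dumps above.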
> (hexdump-thing #())
lowtags: 1111
other: 0000001002511C3F : 0000001002511C30
1002511C30: 0000000000000089
1002511C38: 0000000000000000 = 0
> (hexdump-thing #(1))
lowtags: 1111
other: 00000010025152BF : 00000010025152B0
10025152B0: 0000000000000089
10025152B8: 0000000000000002 = 1
10025152C0: 0000000000000002 = 1
10025152C8: 0000000000000000 = 0
> (hexdump-thing #(1 2))
lowtags: 1111
other: 000000100252DC2F : 000000100252DC20
100252DC20: 0000000000000089
100252DC28: 0000000000000004 = 2
100252DC30: 0000000000000002 = 1
100252DC38: 0000000000000004 = 2
> (hexdump-thing #(1 2 3))
lowtags: 1111
other: 0000001002531C8F : 0000001002531C80
1002531C80: 0000000000000089
1002531C88: 0000000000000006 = 3
1002531C90: 0000000000000002 = 1
1002531C98: 0000000000000004 = 2
1002531CA0: 0000000000000006 = 3
1002531CA8: 0000000000000000 = 0
Same deal for simple vectors: header, length, but now each entry takes a word of course. Above all entries are fixnums and you can see them in the data.
And so it goes on.
The code that did this
This may be wrong and an earlier version of it definitely did not like small bignums (I think hexdump doesn't like them). If you want real answers either read the source or ask an SBCL person. Other implementations are available, and will be different.
(defun hexdump-thing (obj)
;; Try and hexdump an object, including immediate objects. All the
;; work is done by sb-vm:hexdump in the interesting cases.
#-(and SBCL 64-bit)
(error "not a 64-bit SBCL")
(let* ((address/thing (sb-kernel:get-lisp-obj-address obj))
(tags (ldb (byte 4 0) address/thing)))
(format t "~&lowtags: ~12T~4,'0b~%" tags)
(cond
((zerop (ldb (byte 1 0) tags))
(format t "~&fixnum:~12T~16,'0x = ~S~%" address/thing obj))
((= (ldb (byte 2 0) tags) #b01)
(format t "~&immediate:~12T~16,'0x = ~S~%" address/thing obj))
((= (ldb (byte 2 0) tags) #b11) ;must be true
(format t "~&~A:~12T~16,'0x : ~16,'0x~%"
(case (ldb (byte 2 2) tags)
(#b00 "instance")
(#b01 "cons")
(#b10 "function")
(#b11 "other"))
address/thing (dpb #b0000 (byte 4 0) address/thing))
;; this tells you at least something (and really annoyingly
;; does not pad addresses on the left)
(sb-vm:hexdump obj))
;; can't happen
(t (error "mutant"))))
(values))

Question regarding HEX editor and retrieving file content from raw format

How do I view the content of a file in raw format in a hex editor? And how do I find the header offset and trailer offset of a document in raw format in a hex editor?
1. How to view the content of a file in raw format in a hex editor?
On Linux / Mac you can use xxd, which also has many output formatting options. A simple example:
xxd file.pdf | less
00000000: 2550 4446 2d31 2e37 0d25 e2e3 cfd3 0d0a %PDF-1.7.%......
00000010: 3131 3837 3420 3020 6f62 6a0d 3c3c 2f4c 11874 0 obj.<</L
00000020: 696e 6561 7269 7a65 6420 312f 4c20 3330 inearized 1/L 30
00000030: 3934 3237 392f 4f20 3131 3837 372f 4520 94279/O 11877/E
00000040: 3133 3334 3538 2f4e 2037 362f 5420 3238 133458/N 76/T 28
00000050: 3536 3638 312f 4820 5b20 3136 3733 2034 56681/H [ 1673 4
00000060: 3331 315d 3e3e 0d65 6e64 6f62 6a0d 2020 311]>>.endobj.
...
...
002f36c0: 4134 3534 3437 3444 4434 3337 3e3c 3036 A454474DD437><06
002f36d0: 3839 3542 4133 4234 4341 3434 3044 4232 895BA3B4CA440DB2
002f36e0: 3435 3937 3645 3545 3331 3231 3738 3e5d 45976E5E312178>]
002f36f0: 3e3e 0d73 7461 7274 7872 6566 0d31 3136 >>.startxref.116
002f3700: 0d25 2545 4f46 0d .%%EOF.
You can also open any file using popular hex editor HxD on Windows ( screenshot from https://mh-nexus.de/en/graphics/HxDShotLarge.png )
2. How to find the header offset and trailer offset
Let's take a look at file signatures and magic bytes. As you can see, their lengths can differ:
Hex signature    ISO 8859-1   Offset   Extension   Description
1F 9D            ..           0        z tar.z     compressed file (often tar zip) using Lempel-Ziv-Welch algorithm
25 50 44 46 2d   %PDF-        0        pdf         PDF document
ed ab ee db      í«îÛ         0        rpm         RedHat Package Manager (RPM) package
If you don't want to inspect files manually against such a list, but would rather identify file signatures programmatically, there are libraries for different languages, such as pyfsig, which maintain a list of the file signatures they can recognize.
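For a feel of what such a library does, here is a minimal Python sketch using only the standard library. It is not pyfsig's API: the SIGNATURES table, identify and pdf_offsets are hypothetical names, and the table contains only the three signatures listed above. For a PDF it reports the offset of the %PDF- header and of the last %%EOF marker, which correspond to the first and last lines of the xxd dump.

```python
# Magic bytes taken from the three example rows above (all at offset 0).
SIGNATURES = {
    b"\x1f\x9d": "tar.z",
    b"%PDF-": "pdf",
    b"\xed\xab\xee\xdb": "rpm",
}


def identify(data):
    """Return the file type whose signature starts the data, or None."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def pdf_offsets(data):
    """Return (header_offset, trailer_offset) for a PDF; -1 if not found."""
    header = data.find(b"%PDF-")      # first occurrence of the header
    trailer = data.rfind(b"%%EOF")    # last occurrence of the end marker
    return header, trailer


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        content = handle.read()
    print(identify(content), pdf_offsets(content))
```

Run it as a script with a file name as its only argument; reading the whole file into memory is fine for a sketch but not for very large files.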

Nvidia-smi showing fan speed as not available

My machine has an NVIDIA Tesla K20m GPU. I would like to know the GPU utilization, memory utilization, temperature and fan speed, so I used nvidia-smi to get the details. The nvidia-smi log is as follows:
==============NVSMI LOG==============
Timestamp : Tue Dec 10 11:06:11 2013
Driver Version : 319.49
Attached GPUs : 1
GPU 0000:84:00.0
Product Name : Tesla K20m
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Enabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 128
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0325212069909
GPU UUID : GPU-8b890015-e683-4061-6596-d27716c2900b
VBIOS Version : 80.10.11.00.0B
Inforom Version
Image Version : 2081.0208.01.07
OEM Object : 1.1
ECC Object : 3.0
Power Management Object : N/A
GPU Operation Mode
Current : Compute
Pending : Compute
PCI
Bus : 0x84
Device : 0x00
Domain : 0x0000
Device Id : 0x102810DE
Bus Id : 0000:84:00.0
Sub System Id : 0x101510DE
GPU Link Info
PCIe Generation
Max : 2
Current : 1
Link Width
Max : 16x
Current : 16x
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Unknown : Not Active
Memory Usage
Total : 4799 MB
Used : 11 MB
Free : 4788 MB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending : No
Temperature
Gpu : 29 C
Power Readings
Power Management : Supported
Power Draw : 25.44 W
Power Limit : 225.00 W
Default Power Limit : 225.00 W
Enforced Power Limit : 225.00 W
Min Power Limit : 150.00 W
Max Power Limit : 225.00 W
Clocks
Graphics : 324 MHz
SM : 324 MHz
Memory : 324 MHz
Applications Clocks
Graphics : 705 MHz
Memory : 2600 MHz
Default Applications Clocks
Graphics : 705 MHz
Memory : 2600 MHz
Max Clocks
Graphics : 758 MHz
SM : 758 MHz
Memory : 2600 MHz
Compute Processes : None
How can I find out the fan speed? Is there any plug-in? Can anyone help me?
Have you tried nvclock? It works for my Tesla K40M.

How do I calculate this checksum?

I have an alarm system that I have configured to send SMS messages to my phone as well as over Ethernet.
Here a few of the SMSes I receive:
5522 18 1137 00 003 1C76
5522 18 3137 00 003 3278
5522 18 1130 00 002 E36E
5522 18 1401 00 001 ED6E
5522 18 1302 00 003 ED70
5522 18 1302 00 004 EE71
5522 18 1302 00 009 F376
5522 18 3147 00 009 417F
5522 18 1137 00 004 1D77
5522 18 3137 00 009 3379
5522 18 1602 00 000 0870
The first 4 digits are the account number, the next 2 are always 18, then come 4 digits of event code, 2 group digits and 3 zone digits. At the end there are 4 hex digits which I suspect are some kind of checksum.
This is some kind of Ademco Contact ID format. However, I do not recognize the checksum.
It's not a time stamp as the last message (0870) is sent periodically and is always the same.
When sending via DTMF 0 should have value 10, but I do not know if that is the case with SMSes. Most likely not.
The checksum in Ademco's Contact ID is calculated using the formula:
S = HEX checksum, which is one digit.
(Sum of all message digits + S) MOD 15 = 0
If the value is equal to 10, the checksum digit is 0.
The official Contact ID specification is here: http://li0r.files.wordpress.com/2012/07/sia-dc-05-1999-09_contact_id.pdf
So, using 5522 18 1602 00 000 0870 as an example:
Let S = checksum
5+5+2+2+1+8+1+6+2=32
(32+S) must be congruent to 0 modulo 15
We then need the next multiple of 15 above 32, which is 45.
45-32=13
Let's test that.
(32+13) modulo 15 = 45 modulo 15 = 0
That is correct. However, as Contact ID is 16 digits and you have 19, I would suspect that your panel is using a different proprietary implementation of Contact ID. If you post the make/model of the panel that this came from I may be able to explain things further.
I hope this answers your question!
-Alex
P.S.: To calculate mod, use a percent sign in Google.
P.P.S.: The document that describes Contact ID is actually DC-05-1999.09; the document you referenced is actually the computer interface communication protocol specification.
I just want to correct AdemcoGuy's calculation as it seems to be incorrect:
So, the example was 5522 18 1602 00 000 0870
We need to replace each 0 by 10.
So:
5+5+2+2+1+8+1+6+10+2+10+10+10+10+10 = 92
Then the next multiple of 15 above 92 is 105, and 105-92 = 13
So the checksum digit is 13 (hex D)
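The mod-15 rule quoted in the first answer is easy to check mechanically. The Python sketch below is only an illustration of that rule as stated in this thread; the function name and the zero_value parameter are made up, and how a result of 0 would be encoded is not covered. For the example digits it returns 13 whether zeros count as 0 or as 10, because the six zeros then contribute 60, which is a multiple of 15.

```python
def contact_id_checksum(message, zero_value=10):
    """Return S such that (sum of digits + S) % 15 == 0.

    message is a string of hex digits; spaces are ignored.
    zero_value is what a digit 0 contributes to the sum (10 in the second
    answer, 0 in the first one).
    """
    total = 0
    for ch in message:
        if ch.isspace():
            continue
        digit = int(ch, 16)
        total += zero_value if digit == 0 else digit
    return (-total) % 15


if __name__ == "__main__":
    body = "5522 18 1602 00 000"
    print(contact_id_checksum(body, zero_value=0))
    print(contact_id_checksum(body, zero_value=10))
```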
In any case, the Contact ID checksum seems to be missing from the messages in the question, and only the manufacturer of the panel that sent them knows what the last 4 digits mean :)
#ACCT MT QXYZ GG CCC where:
ACCT: 4 Digit Account number (0-9, B-F)
MT: Message Type - Always 18
Q: Event qualifier, which gives specific event information:
1: New Event or Opening
3: New Restore or Closing
6: Previously reported condition still present (Status report)
XYZ: Event code (3 Hex digits 0-9,B-F)
GG: Group or Partition number (2 Hex digits 0-9, B-F). Use 00 to indicate that no specific group or partition information applies.
CCC: Zone number (Event reports) or User (Open / Close reports) (3 Hexdigits 0-9,B-F ). Use 000 to indicate that no specific zone or user information applies.
To look up event codes see this document (pdf).
I came across this post while trying to figure out the checksums of my own alarm system (Woonveilig/Egardia) that seems to be using the same format. I found a forum post on the german alarm forum that contains a snippet of C code to calculate CRCs for the LUPUS alarm system. This CRC calculation method seems to match both my own and Lasse's SMS based system. Here's the C code converted to a simple calculation tool:
#include <stdio.h>
#include <string.h>
// Code from: https://www.alarmforum.de/showthread.php?tid=12037&pid=75893
/**
* Fletcher Checksum.(LUPUS version, 16-bit)
*/
static unsigned int fletcher_sum(char* data, int len) {
unsigned int sum1 = 0x0, sum2 = 0x0;
while (len) {
unsigned int tlen = (len > 256) ? 256 : len;
len -= tlen;
do {
sum1 += *data++;
sum1 = (sum1 & 0xff);
sum2 += sum1;
sum2 = (sum2 & 0xff);
} while (--tlen);
}
return sum2 << 8 | sum1;
}
int main() {
char input[50];
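unsigned int sum;
printf("Enter input: ");
if (fgets(input, sizeof(input), stdin) == NULL) return 1;
input[strcspn(input, "\r\n")] = '\0'; // strip the trailing newline, if any
sum = fletcher_sum(input, (int)strlen(input));
printf("%04x\n", sum); // pad to 4 hex digits so that 0x0870 prints as 0870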
return 0;
}
Example (first SMS from the question post):
# cc checksum.c
# ./a.out
Enter input: 5522 18 1137 00 003
1c76
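To check the claim against more than one message, the same 8-bit running sums can be written in a few lines of Python. This is a sketch equivalent to the C routine above for ASCII input, not the panel vendor's code; it splits each SMS from the question into payload and trailing four hex digits and prints the computed value next to the received one.

```python
def fletcher_sum(text):
    """Two 8-bit running sums over the ASCII bytes, combined as sum2:sum1."""
    sum1 = 0
    sum2 = 0
    for byte in text.encode("ascii"):
        sum1 = (sum1 + byte) & 0xFF   # plain byte sum, modulo 256
        sum2 = (sum2 + sum1) & 0xFF   # sum of the running sums, modulo 256
    return (sum2 << 8) | sum1


if __name__ == "__main__":
    messages = (
        "5522 18 1137 00 003 1C76",
        "5522 18 3137 00 003 3278",
        "5522 18 1602 00 000 0870",
    )
    for sms in messages:
        payload, received = sms.rsplit(" ", 1)
        computed = format(fletcher_sum(payload), "04X")
        print(payload, computed, received)
```

For these three messages from the question the computed and received values agree.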

Unexpected behavior of io:fread in Erlang

This is an Erlang question.
I have run into some unexpected behavior by io:fread.
I was wondering if someone could check whether there is something wrong with the way I use io:fread or whether there is a bug in io:fread.
I have a text file which contains a "triangle of numbers" as follows:
59
73 41
52 40 09
26 53 06 34
10 51 87 86 81
61 95 66 57 25 68
90 81 80 38 92 67 73
30 28 51 76 81 18 75 44
...
There is a single space between each pair of numbers and each line ends with a carriage-return new-line pair.
I use the following Erlang program to read this file into a list.
-module(euler67).
-author('Cayle Spandon').
-export([solve/0]).
solve() ->
{ok, File} = file:open("triangle.txt", [read]),
Data = read_file(File),
ok = file:close(File),
Data.
read_file(File) ->
read_file(File, []).
read_file(File, Data) ->
case io:fread(File, "", "~d") of
{ok, [N]} ->
read_file(File, [N | Data]);
eof ->
lists:reverse(Data)
end.
The output of this program is:
(erlide@cayle-spandons-computer.local)30> euler67:solve().
[59,73,41,52,40,9,26,53,6,3410,51,87,86,8161,95,66,57,25,
6890,81,80,38,92,67,7330,28,51,76,81|...]
Note how the last number of the fourth line (34) and the first number of the fifth line (10) have been merged into a single number 3410.
When I dump the text file using "od" there is nothing special about those lines; they end with cr-nl just like any other line:
> od -t a triangle.txt
0000000 5 9 cr nl 7 3 sp 4 1 cr nl 5 2 sp 4 0
0000020 sp 0 9 cr nl 2 6 sp 5 3 sp 0 6 sp 3 4
0000040 cr nl 1 0 sp 5 1 sp 8 7 sp 8 6 sp 8 1
0000060 cr nl 6 1 sp 9 5 sp 6 6 sp 5 7 sp 2 5
0000100 sp 6 8 cr nl 9 0 sp 8 1 sp 8 0 sp 3 8
0000120 sp 9 2 sp 6 7 sp 7 3 cr nl 3 0 sp 2 8
0000140 sp 5 1 sp 7 6 sp 8 1 sp 1 8 sp 7 5 sp
0000160 4 4 cr nl 8 4 sp 1 4 sp 9 5 sp 8 7 sp
One interesting observation is that some of the numbers for which the problem occurs happen to be on a 16-byte boundary in the text file (but not all, for example 6890).
I'm going to go with it being a bug in Erlang, too, and a weird one. Changing the format string to "~2s" gives equally weird results:
["59","73","4","15","2","40","0","92","6","53","0","6","34",
"10","5","1","87","8","6","81","61","9","5","66","5","7",
"25","6",
[...]|...]
So it appears that it's counting a newline character as a regular character for the purposes of counting, but not when it comes to producing the output. Loopy as all hell.
A week of Erlang programming, and I'm already delving into the source. That might be a new record for me...
EDIT
A bit more investigation has confirmed for me that this is a bug. Calling one of the internal methods that's used in fread:
> io_lib_fread:fread([], "12 13\n14 15 16\n17 18 19 20\n", "~d").
{done,{ok,"\f"}," 1314 15 16\n17 18 19 20\n"}
Basically, if there's multiple values to be read, then a newline, the first newline gets eaten in the "still to be read" part of the string. Other testing suggests that if you prepend a space it's OK, and if you lead the string with a newline it asks for more.
I'm going to get to the bottom of this, gosh-darn-it... (grin) There's not that much code to go through, and not much of it deals specifically with newlines, so it shouldn't take too long to narrow it down and fix it.
EDIT^2
HA HA! Got the little blighter.
Here's the patch to the stdlib that you want (remember to recompile and drop the new beam file over the top of the old one):
--- ../erlang/erlang-12.b.3-dfsg/lib/stdlib/src/io_lib_fread.erl
+++ ./io_lib_fread.erl
@@ -35,9 +35,9 @@
fread_collect(MoreChars, [], Rest, RestFormat, N, Inputs).
fread_collect([$\r|More], Stack, Rest, RestFormat, N, Inputs) ->
- fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, More);
+ fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, [$\r|More]);
fread_collect([$\n|More], Stack, Rest, RestFormat, N, Inputs) ->
- fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, More);
+ fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, [$\n|More]);
fread_collect([C|More], Stack, Rest, RestFormat, N, Inputs) ->
fread_collect(More, [C|Stack], Rest, RestFormat, N, Inputs);
fread_collect([], Stack, Rest, RestFormat, N, Inputs) ->
@@ -55,8 +55,8 @@
eof ->
fread(RestFormat,eof,N,Inputs,eof);
_ ->
- %% Don't forget to count the newline.
- {more,{More,RestFormat,N+1,Inputs}}
+ %% Don't forget to strip and count the newline.
+ {more,{tl(More),RestFormat,N+1,Inputs}}
end;
Other -> %An error has occurred
{done,Other,More}
Now to submit my patch to erlang-patches, and reap the resulting fame and glory...
Besides the fact that it seems to be a bug in one of the erlang libs I think you could (very) easily circumvent the problem.
Given the fact your file is line-oriented I think best practice is that you process it line-by-line as well.
Consider the following construction. It works nicely on an unpatched erlang and because it uses lazy evaluation it can handle files of arbitrary length without having to read all of it into memory first. The module contains an example of a function to apply to each line - turning a line of text-representations of integers into a list of integers.
-module(liner).
-author("Harro Verkouter").
-export([liner/2, integerize/0, lazyfile/1]).
% Applies a function to all lines of the file
% before reducing (foldl).
liner(File, Fun) ->
lists:foldl(fun(X, Acc) -> Acc++Fun(X) end, [], lazyfile(File)).
% Reads the lines of a file in a lazy fashion
lazyfile(File) ->
{ok, Fd} = file:open(File, [read]),
lazylines(Fd).
% Actually, this one does the lazy read ;)
lazylines(Fd) ->
case io:get_line(Fd, "") of
eof -> file:close(Fd), [];
{error, Reason} ->
file:close(Fd), exit(Reason);
L ->
[L|lazylines(Fd)]
end.
% Take a line of space separated integers (string) and transform
% them into a list of integers
integerize() ->
fun(X) ->
lists:map(fun(Y) -> list_to_integer(Y) end,
string:tokens(X, " \n")) end.
Example usage:
Eshell V5.6.5 (abort with ^G)
1> c(liner).
{ok,liner}
2> liner:liner("triangle.txt", liner:integerize()).
[59,73,41,52,40,9,26,53,6,34,10,51,87,86,81,61,95,66,57,25,
68,90,81,80,38,92,67,73,30|...]
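The line-by-line idea does not depend on Erlang. As a cross-check of what the parsed data should look like, here is the same approach as a short Python sketch; parse_triangle is a made-up name, and the sample text uses the carriage-return/new-line endings described in the question.

```python
def parse_triangle(text):
    """Split the text into lines and each line into a list of integers."""
    rows = []
    for line in text.splitlines():      # handles both "\n" and "\r\n"
        fields = line.split()           # split on runs of whitespace
        if fields:                      # skip blank lines
            rows.append([int(field) for field in fields])
    return rows


if __name__ == "__main__":
    sample = "59\r\n73 41\r\n52 40 09\r\n26 53 06 34\r\n10 51 87 86 81\r\n"
    for row in parse_triangle(sample):
        print(row)
```

Because each line is tokenised on its own, a number at the end of one line can never merge with the first number of the next.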
And as a bonus, you can easily fold over the lines of any (lineoriented) file w/o running out of memory :)
6> lists:foldl( fun(X, Acc) ->
6> io:format("~.2w: ~s", [Acc,X]), Acc+1
6> end,
6> 1,
6> liner:lazyfile("triangle.txt")).
1: 59
2: 73 41
3: 52 40 09
4: 26 53 06 34
5: 10 51 87 86 81
6: 61 95 66 57 25 68
7: 90 81 80 38 92 67 73
8: 30 28 51 76 81 18 75 44
Cheers,
h.
I noticed that there are multiple instances where two numbers are merged, and it appears to be at the line boundaries on every line starting at the fourth line and beyond.
I found that if you add a whitespace character to the beginning of every line starting at the fifth, that is:
59
73 41
52 40 09
26 53 06 34
10 51 87 86 81
61 95 66 57 25 68
90 81 80 38 92 67 73
30 28 51 76 81 18 75 44
...
The numbers get parsed properly:
39> euler67:solve().
[59,73,41,52,40,9,26,53,6,34,10,51,87,86,81,61,95,66,57,25,
68,90,81,80,38,92,67,73,30|...]
It also works if you add the whitespace to the beginning of the first four lines, as well.
It's more of a workaround than an actual solution, but it works. I'd like to figure out how to set up the format string for io:fread such that we wouldn't have to do this.
UPDATE
Here's a workaround that won't force you to change the file. This assumes that all numbers are two digits (< 100):
read_file(File, Data) ->
case io:fread(File, "", "~d") of
{ok, [N] } ->
if
N >= 100 ->
First = N div 100,
Second = N rem 100,
%% Data is built in reverse order, so Second goes in front of First
read_file(File, [Second, First | Data]);
true ->
read_file(File, [N | Data])
end;
eof ->
lists:reverse(Data)
end.
Basically, the code catches any number that is the concatenation of two numbers across a newline and splits it in two.
Again, it's a kludge that implies a possible bug in io:fread, but that should do it.
UPDATE AGAIN: The above will only work for two-digit inputs, but since the example pads all numbers (even those < 10) to two digits, it will work for this example.

Resources