I've been reading a book about Erlang to evaluate whether it's suitable for my project, and stumbled upon the bit syntax part of Learn You Some Erlang for Great Good.
Simply put, here's the code:
1> Color = 16#F09A29.
15768105
2> Pixel = <<Color:24>>.
<<240,154,41>>
What's confusing me is this: the Color variable is 24 bits, but how does Erlang know that it has to divide the variable (in line 2) into three segments? How is the rule read?
I've tried to read the rest of the chapter, but it keeps getting more confusing, because I don't understand how it divides the numbers. Could you please explain how the bit syntax works? How does it know that there are 3 segments, and how does it become <<154,41>> when I do this:
1> Color = 16#F09A29.
15768105
2> Pixel = <<Color:16>>.
<<154,41>>
Thanks in advance.
Color = 16#F09A29 is an integer that can be written as 15768105 in decimal representation, as well as
00000000111100001001101000101001
in binary representation.
When you define a binary Pixel = << Color:24 >>., it just means "match the 24 least significant bits of Color with the binary Pixel". So Pixel is bound to
111100001001101000101001,
without any split! When the shell prints it out, it does so byte by byte in decimal representation, that is:
11110000 = 15 * 16 = 240, 10011010 = 9 * 16 + 10 = 154, 00101001 = 2 * 16 + 9 = 41 => << 240,154,41 >>
In the same way, when you define Pixel = << Color:16 >>, it takes only the 16 least significant bits and assigns them to the binary =
1001101000101001,
which is printed as 10011010 = 9 * 16 + 10 = 154, 00101001 = 2 * 16 + 9 = 41 => << 154,41 >>.
In the case of << Color:21 >>, the binary now equals
100001001101000101001
(the 21 least significant bits), and when the shell prints them, it starts as usual, dividing the binary into bytes, so
10000100 = 8 * 16 + 4 = 132, 11010001 = 13 * 16 + 1 = 209. Since only 5 bits remain (01001), the last chunk of data is printed as 9:5 (value:size) to tell us that the size of the last value is not 8 bits = 1 byte as usual, but only 5 bits =>
<< 132,209,9:5 >>.
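The rule described above (keep the N least significant bits, then print them 8 bits at a time) can be sketched in Python. This is only an illustration of the printing rule, not how the Erlang shell is implemented; `erlang_bits_repr` is a hypothetical helper name.

```python
def erlang_bits_repr(value: int, size: int) -> str:
    """Mimic how the shell would print <<Value:Size>> (hypothetical helper)."""
    # Keep only the `size` least significant bits, as <<Value:Size>> does.
    value &= (1 << size) - 1
    bits = format(value, "0{}b".format(size)) if size else ""
    parts = []
    for i in range(0, size, 8):
        chunk = bits[i:i + 8]
        if len(chunk) == 8:
            # A full byte is printed as a plain decimal number.
            parts.append(str(int(chunk, 2)))
        else:
            # A trailing chunk shorter than a byte is printed as Value:Size.
            parts.append("{}:{}".format(int(chunk, 2), len(chunk)))
    return "<<" + ",".join(parts) + ">>"


color = 0xF09A29
print(erlang_bits_repr(color, 24))  # <<240,154,41>>
print(erlang_bits_repr(color, 16))  # <<154,41>>
print(erlang_bits_repr(color, 21))  # <<132,209,9:5>>
```

The byte boundaries exist only in the printed form; the underlying value is one run of bits.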
The nice thing with binaries is that you can "decode" them using size specifications (maybe it is clearer with the example below).
(exec#WXFRB1824L)43> Co=16#F09A29.
15768105
(exec#WXFRB1824L)44> Pi = <<Co:24>>.
<<240,154,41>>
(exec#WXFRB1824L)45> <<R:8,V:8,B:8>> = Pi.
<<240,154,41>>
(exec#WXFRB1824L)46> R.
240
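For comparison, here is a rough Python equivalent of that pattern match. Python has no bit syntax, so the sketch uses tuple unpacking for the byte-aligned case and shifts and masks for fields that do not fall on byte boundaries. The names `high4` and `rest20` are made up for the illustration.

```python
# The same three bytes as <<Co:24>>.
pixel = (0xF09A29).to_bytes(3, "big")

# Equivalent of <<R:8,V:8,B:8>> = Pi: one 8-bit field per byte.
r, v, b = pixel
print(r, v, b)  # 240 154 41

# Fields do not have to be byte-aligned. Something like <<High:4, Rest:20>>
# can be emulated with a shift and a mask on the integer value.
as_int = int.from_bytes(pixel, "big")
high4 = as_int >> 20          # the first 4 bits
rest20 = as_int & 0xFFFFF     # the remaining 20 bits
print(high4, rest20)  # 15 39465
```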
Erlang doesn't really "divide" anything. Binaries are just contiguous blocks of data. The default human-readable representation printed by the REPL is simply a comma-separated list of byte values.
It's just showing the 8-bit bytes that make up the binary. You're telling it to get 24 bits, and it's rendering them in the numeric representation (0-255) of each individual byte.
While working on a routine to open TIFF files generated by microscope software in Matlab, I got stuck on reading compressed images. At this point it's a matter of honour for me to understand how to decode TIFF data in the PackBits format.
With little experience in real computer science, I have trouble understanding the guidelines in the TIFF documentation, more specifically:
In the inverse routine, it is best to encode a 2-byte repeat run as a
replicate run except when preceded and followed by a literal run. In
that case, it is best to merge the three runs into one literal run.
Always encode 3-byte repeats as replicate runs. That is the essence of
the algorithm. Here are some additional rules:
• Pack each row separately. Do not compress across row boundaries.
• The number of uncompressed bytes per row is defined to be (ImageWidth + 7) / 8. If the uncompressed bitmap is required to have an even number of bytes per row, decompress into word-aligned buffers.
• If a run is larger than 128 bytes, encode the remainder of the run as one or more additional replicate runs.
source: https://www.fileformat.info/format/tiff/corion-packbits.htm
I understand how to implement the pseudocode, and I can decode a sample string compressed with PackBits in Matlab. However, I'm lost when parsing a chunk of a 16-bit greyscale TIFF file. How do I go about it? I don't really understand what a replicate run is, nor what a word-aligned buffer is.
When I start decoding the data from the first byte, I just get nonsense.
Help with understanding the logic of decompression would be appreciated; a link to code that decompresses TIFF PackBits would also be helpful.
~Jakub
Edit: I got the decompression algorithm to work; my error was interpreting the bytes wrongly. Here is the code, in case anyone is interested in a similar problem in the future.
Tiff_file = 'compressed.tiff';
% open the TIFF file and read its raw bytes
imInfo = imfinfo(Tiff_file);
fId = fopen(Tiff_file);
im = fread(fId);
fclose(fId);
% parse the file
output = zeros(1,imInfo.Width * imInfo.Height * 2); % preallocate: 2 bytes per 16-bit pixel
thisLoc = 1;
for strip = 1:length(imInfo.StripOffsets)
    thisLength = imInfo.StripByteCounts(strip);
    thisOffset = imInfo.StripOffsets(strip);
    thisStrip = im(thisOffset + 1 : thisOffset + thisLength);
    pntr = 1; % start at the first byte
    % loop through the coded data
    while pntr < thisLength
        key = thisStrip(pntr);
        if key >= 129 % replicate run: repeat the next byte 257-key times
            key = 257 - key;
            datTmp = repmat(thisStrip(pntr+1), [1 key]);
            output(thisLoc:thisLoc+key-1) = datTmp;
            thisLoc = thisLoc+key;
            pntr = pntr + 2;
        elseif key == 128 % no-op
            pntr = pntr + 1;
        else % literal run: copy the next key+1 bytes unchanged
            datTmp = thisStrip(pntr + 1 : pntr + 1 + key);
            output(thisLoc:thisLoc+key) = datTmp;
            thisLoc = thisLoc + key+1;
            pntr = pntr + key + 2;
        end
    end
end
% combine byte pairs into 16-bit values (typecast uses the machine's byte order)
im = typecast(uint8(output),'uint16');
% reshape decoded data
im = reshape(im,[imInfo.Width imInfo.Height])';
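For readers who do not use Matlab, the same decoding loop can be sketched in Python. Each control byte n is read as unsigned. If n is below 128, it starts a literal run and the next n + 1 bytes are copied as they are. If n is above 128, it starts a replicate run and the single next byte is repeated 257 - n times. A value of 128 is a no-op. The function names `packbits_decode` and `to_uint16` are hypothetical, and the byte order passed to `to_uint16` is an assumption that has to match the TIFF header ("II" little-endian or "MM" big-endian).

```python
def packbits_decode(data: bytes) -> bytes:
    """Decode one PackBits-compressed strip (illustrative sketch)."""
    out = bytearray()
    i = 0
    while i < len(data):
        n = data[i]
        if n < 128:
            # Literal run: copy the next n + 1 bytes unchanged.
            out += data[i + 1:i + 2 + n]
            i += n + 2
        elif n > 128:
            # Replicate run: repeat the next byte 257 - n times.
            out += data[i + 1:i + 2] * (257 - n)
            i += 2
        else:
            # 128 is a no-op; skip it.
            i += 1
    return bytes(out)


def to_uint16(raw: bytes, byteorder: str = "little") -> list:
    """Combine consecutive byte pairs into 16-bit pixel values."""
    return [int.from_bytes(raw[k:k + 2], byteorder)
            for k in range(0, len(raw) - 1, 2)]


# Small hand-checked example: 15 compressed bytes expand to 24 bytes.
packed = bytes.fromhex("FEAA0280002AFDAA0380002A22F7AA")
unpacked = packbits_decode(packed)
print(unpacked.hex().upper())
```

Each strip listed in StripOffsets is decoded on its own and the results are concatenated before the byte pairs are combined into pixels.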
Can someone explain why s is a string with 4096 chars
iex(9)> s = String.duplicate("x", 4096)
... lots of "x"
iex(10)> String.length(s)
4096
but its memory size is only 6 words?
iex(11)> :erts_debug.size(s)
6 # WHAT?!
And why is s2 a much shorter string than s
iex(13)> s2 = "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
"1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
iex(14)> String.length(s2)
50
but 3 words larger than s in size?
iex(15)> :erts_debug.size(s2)
9 # WHAT!?
And why do the sizes of these strings not match their lengths?
Thanks
A first clue to why these values are shown can be found in this question. Quoting the size/1 docs:
%% size(Term)
%% Returns the size of Term in actual heap words. Shared subterms are
%% counted once. Example: If A = [a,b], B =[A,A] then size(B) returns 8,
%% while flat_size(B) returns 12.
A second clue can be found in the Erlang documentation on how bitstrings are implemented.
In the first case the string is too big to be stored on the process heap, so it is a refc binary: the data lives outside the process heap, and the heap holds only a small reference to it.
In the second case the string is no larger than 64 bytes, so it is a heap binary, which is just an array of bytes stored directly on the heap. That gives us 8 bytes per word (64-bit) * 9 = 72 bytes, and the documentation on memory overhead in the VM says that Erlang uses 3..6 words per binary plus the data, where the data can be shared.
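The two observed sizes can be reproduced with a rough model in Python. The constants are assumptions for illustration, not values read from the VM: 8-byte words, a 64-byte threshold for heap binaries, a 2-word header in front of heap binary data, and a fixed 6-word reference for refc binaries (the 6 is simply the value observed in the question and may differ between ERTS versions). `binary_heap_words` is a hypothetical helper.

```python
WORD_BYTES = 8         # assumed: 64-bit VM
HEAP_BIN_LIMIT = 64    # assumed: binaries up to 64 bytes live on the process heap
HEAP_BIN_HEADER = 2    # assumed: header words in front of heap binary data
PROC_BIN_WORDS = 6     # assumed: size of the on-heap reference to a refc binary


def binary_heap_words(n_bytes: int) -> int:
    """Estimate the process-heap words used by a binary of n_bytes bytes."""
    if n_bytes <= HEAP_BIN_LIMIT:
        data_words = -(-n_bytes // WORD_BYTES)  # ceiling division
        return HEAP_BIN_HEADER + data_words
    # Larger binaries keep their data off-heap; only the reference counts.
    return PROC_BIN_WORDS


print(binary_heap_words(50))    # 9, the 50-byte string s2
print(binary_heap_words(4096))  # 6, the 4096-byte string s
```

Under this model the on-heap size stops growing once a binary passes the threshold, which is why the longer string reports the smaller number.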
I am not able to understand bit packing in erlang.
Suppose:
R=4, G=6 and B=8
then why is the output like this:
<< R:5,G:5,B:6 >>
output: <<33,136>>.
I don't get it. Can anyone please explain?
<< R:5,G:5,B:6 >>
This expression allocates 5, 5 and 6 bits, so the result is a 2-byte (16-bit) binary. To better understand why this happens, do the reverse conversion. Convert the numbers 33 and 136 to binary form:
integer_to_list(33,2).
integer_to_list(136,2).
"100001"
"10001000"
We get the strings above. Since each byte of the binary is 8 bits wide, pad the representation of 33 with zeros on the left.
L2=lists:append("00",lists:append(integer_to_list(33,2),integer_to_list(136,2))).
"0010000110001000"
Now proceed to the decoding. In lists:sublist/3 the second argument is the start position and the third argument is the number of bits to take.
V1 = list_to_integer(lists:sublist(L2,5),2).
V2 = list_to_integer(lists:sublist(L2,6,5),2).
V3 = list_to_integer(lists:sublist(L2,11,6),2).
4
6
8
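The same string-based reasoning can be run in both directions in Python. `pack_fields` and `unpack_fields` are hypothetical helpers written for this illustration. They concatenate fixed-width binary strings, which is easy to follow but not how a real implementation would do it.

```python
def pack_fields(fields):
    """fields is a list of (value, width) pairs, like << V1:W1, V2:W2, ... >>."""
    # Write each value as exactly `width` binary digits (low bits only).
    bits = "".join(
        format(value & ((1 << width) - 1), "0{}b".format(width))
        for value, width in fields
    )
    if len(bits) % 8:
        raise ValueError("total width must be a multiple of 8 in this sketch")
    # Cut the bit string into 8-bit bytes.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def unpack_fields(data, widths):
    """Split the bits of `data` back into integers of the given widths."""
    bits = "".join(format(byte, "08b") for byte in data)
    values, pos = [], 0
    for width in widths:
        values.append(int(bits[pos:pos + width], 2))
        pos += width
    return values


packed = pack_fields([(4, 5), (6, 5), (8, 6)])
print(list(packed))                      # [33, 136]
print(unpack_fields(packed, [5, 5, 6]))  # [4, 6, 8]
```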
Sorry for my English; I hope I explained it clearly.
I'm trying to get my head around the bit syntax in Erlang and I'm having some trouble understanding how this works:
Red = 10.
Green = 61.
Blue = 20.
Color = << Red:5, Green:6, Blue:5 >> .
I've seen this example in Programming Erlang: Software for a Concurrent World by Joe Armstrong (second edition), and this code will create a 16-bit memory area containing a single RGB triplet.
My question is: how can 3 bytes be packed into a 16-bit memory area? I'm not at all familiar with bit shifting, and I wasn't able to find anything relevant to this subject for Erlang. My understanding so far is that the segment is made up of 16 parts, of which Red occupies 5, Green 6 and Blue 5; however, I'm not sure how this is even possible.
Given that
61 = 0011011000110001
which alone is 16 bits, how is this packing possible?
To start with, 61 is only equal to 00110110 00110001 if you store it as two ASCII digits. When written in binary, 61 is 111101.
Note that the binary representation requires six binary digits, or six "bits" for short. That's what we're taking advantage of in this line:
Color = << Red:5, Green:6, Blue:5 >> .
We're using 5 bits for the red value, 6 bits for the green value, and 5 bits for the blue value, for a total of 16 bits. This works since the red value and the blue value are both less than 32 (since 31 is the largest number that can be represented with 5 bits), and the green value is less than 64 (since 63 is the largest number that can be represented with 6 bits).
The complete value is 01010 111101 10100 (three segments for red, green and blue), or if we split it into two bytes, 01010111 10110100.
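Since the question mentions bit shifting, here is the same packing written with shifts and masks in Python, as a sketch of the arithmetic rather than of Erlang's implementation. Red is shifted past the 11 bits of green and blue, green past the 5 bits of blue, and the three parts are combined with bitwise OR.

```python
red, green, blue = 10, 61, 20

# 5 bits red | 6 bits green | 5 bits blue = 16 bits in total.
color = (red << 11) | (green << 5) | blue
print(format(color, "016b"))           # 0101011110110100
print(list(color.to_bytes(2, "big")))  # [87, 180]

# Unpacking reverses the shifts and masks off the neighbouring fields.
r = color >> 11           # top 5 bits
g = (color >> 5) & 0x3F   # middle 6 bits
b = color & 0x1F          # low 5 bits
print(r, g, b)            # 10 61 20

# A value that does not fit its field loses its high bits:
# 61 needs 6 bits, so in a 5-bit field only 61 & 0b11111 = 29 would remain.
print(61 & 0b11111)       # 29
```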
I have 30000 files to process, each with 80000 rows and 5 columns. I need to read all the files and process them to find the average of each line. I have written Fortran code that reads and extracts all the data from the files. It needs an array of size (30000 x 80000), but my program cannot go beyond (3300 x 80000). I need to add up the 4th column of the files in steps of 300 files, that is, the 4th column of the 1st file with the 4th column of the 301st file, the 4th column of the 2nd file with the 4th column of the 302nd file, and so on. Do you think this is because of a limit on the size of array that Fortran can handle? If so, is there any way to increase the size of array that Fortran can handle? What about the number of files? My code looks like this:
This program, with nf = 3300, runs well.
      implicit double precision (a-h,o-z),integer(i-n)
      dimension x(78805,5),y(78805,5),den(78805,5)
      dimension b(3300,78805),bb(78805)
      character*70,fn
      nf = 3300     ! NUMBER OF FILES
      nj = 78804    ! Number of rows in file.
      ns = 300      ! No. of steps for files.
      ncores = 11   ! No of Cores
c--------------------------------------------------------------------
c--------------------------------------------------------------------
!Initialization
      do i = 1,nf   ! b is indexed from 1
        do j = 1, nj
          x(j,1) = 0.0
          y(j,2) = 0.0
          den(j,4) = 0.0
c         a(i,j) = 0.0
          b(i,j) = 0.0
c         aa(j) = 0.0
          bb(j) = 0.0
        end do
      end do
c-------!Body program-----------------------------------------------
      iout = 6      ! Output Files upto "ns" no.
      DO i= 1,nf    ! LOOP FOR THE NUMBER OF FILES
        write(fn,10)i
        open(1,file=fn)
        do j=1,nj   ! Loop for the no of rows in the domain
          read(1,*)x(j,1),y(j,2),den(j,4)
          if(i.le.ns) then
c           a(i,j) = prob(j,3)
            b(i,j) = den(j,4)
          else
c           a(i,j) = prob(j,3) + a(i-ns,j)
            b(i,j) = den(j,4) + b(i-ns,j)
          end if
        end do
        close(1)
c ----------------------------------------------------------
c -----Write Out put [Probability and density matrix]-------
c ----------------------------------------------------------
        if(i.ge.(nf-ns)) then
          do j = 1, nj
c           aa(j) = a(i,j)/(ncores*1.0)
            bb(j) = b(i,j)/(ncores*1.0)
            write(iout,*) int(x(j,1)),int(y(j,2)),bb(j)
          end do
          close(iout)
          iout = iout + 1
        end if
      END DO
 10   format(i0,'.txt')
      END
It's hard to say for sure because you haven't given all the details yet, but your problem is quite possibly that you are using a 32 bit compiler producing 32 bit executables and you are simply running out of address space.
Although your operating system supports 64 bit address space, your 32 bit process is still limited to 32 bit addresses.
You have found a limit at 3300*78805*8 bytes, which is just under 2GB, and this supports my theory.
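The arithmetic behind that estimate is easy to check. The sketch below assumes 8-byte double precision values and takes 2^31 bytes as the limit, which is a common user address space for a 32-bit process but is an assumption about the asker's system.

```python
BYTES_PER_DOUBLE = 8
LIMIT = 2**31  # 2147483648 bytes, an assumed 32-bit user address space

working = 3300 * 78805 * BYTES_PER_DOUBLE   # the size that still runs
wanted = 30000 * 78805 * BYTES_PER_DOUBLE   # the size the asker needs

print(working, working < LIMIT)  # 2080452000 True
print(wanted, wanted < LIMIT)    # 18913200000 False
print(round(wanted / 2**30, 1))  # 17.6 (GiB)
```

Even with a 64-bit executable, the full array would need roughly 17.6 GiB, so reducing the memory footprint is worth considering regardless of the compiler.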
Whatever the cause of your immediate problem, your fundamental problem is that you appear to be loading everything into memory at once. I've not closely studied your algorithm, but on first inspection it seems likely that you could rearrange it to avoid having everything in memory at once.
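One possible rearrangement, sketched in Python rather than Fortran, keeps only ns running sums instead of one row per file. It assumes the goal is what the loop in the question computes: file i is added to file i - ns, so every file lands in one of ns accumulators. `folded_sums` is a hypothetical name, and `columns` stands for whatever reads the 4th column of each file in order.

```python
def folded_sums(columns, ns):
    """Sum columns i, i + ns, i + 2*ns, ... into ns accumulators.

    `columns` is any iterable that yields one list of numbers per file,
    in file order. Only ns lists are kept in memory at any time.
    """
    sums = []
    for i, column in enumerate(columns):
        if i < ns:
            # The first ns files initialise the accumulators.
            sums.append(list(column))
        else:
            # Later files are added to the accumulator of file i - ns.
            acc = sums[i % ns]
            for j, value in enumerate(column):
                acc[j] += value
    return sums


# Tiny example: 4 "files" of 2 rows each, folded in steps of 2.
files = [[1.0, 2.0], [10.0, 20.0], [3.0, 4.0], [30.0, 40.0]]
print(folded_sums(files, 2))  # [[4.0, 6.0], [40.0, 60.0]]
```

With ns = 300 and 78805 rows, the accumulators hold 300 * 78805 values of 8 bytes each, about 189 MB, no matter how many files are read. Dividing each sum by the number of files folded into it gives the averages.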