About Neo4j internal data storage - neo4j

I was looking at the data files storage in Neo4j and I came up with interrogations.
I followed those two articles :
http://digitalstain.blogspot.jp/2010/10/neo4j-internals-file-storage.html
http://digitalstain.blogspot.jp/2011/11/rooting-out-redundancy-new-neo4j.html
I'm working with Neo4j 2.0.0
Here are my questions :
The .id files have a 8 bytes id while the other files just need 4 bytes.
Why is the size different?
.
In the neostore.nodestore.db, a node is represented by 14 bytes. We have :
1 : in-use flag
2 to 5 : id of the first relation
6 to 9 : id of the first property
10 to 13 : id of the first label
The 14th byte is always 00 for me, what does it mean?
.
About how you make the difference between
a short string (which can be stocked like the primitives properties)
and a long string (which has to be stocked in a separate file).
In the article, it is said that the limit is 3 blocks and a half.
There may be no direct relation to neo4j but I've tried several strings
the same length strings don't produce the same result (one goes with other properties, the other goes to the separate file).
Can someone explain why?
.
When a long string is stocked in the neostore.relationshiptypestore.db.names file,
each record takes 128 bytes. I don't fully understand the structure :
1 to 3 : ???
4 : length of the string
5 to 8 : ??? (always ff ff ff ff for me, which makes me think about a link but...)
9 to 128 : content of the string
One last question :
In the file neostore.propertystore.db.index, the 2 to 5 bytes are always 00 to me.
In the article, it is said that it represents the property count.
Any additionnal information?
--
Sorry for the long post, didn't know if I had to make one or several posts...
Hope my questions are understandable.
Thanks!

Related

Understanding AArch64 Translation Tables

I'm doing a hobby OS project and I an trying to get Virtual Memory set up. I had another project in an x86 architecture working with Page Tables but I am now learning ArmV8 now.
Now, I now that the maximum amount of bits used for addressing is 48[1]. The last 12 to 16 bits are used "as-is" to index within the selected region (depending on which granule size is selected[2]).
I just don't understand how we get those intermediate bits. Obviously the documentation is showing that intermediate tables are used[3] but it is quite unclear on how those tables are used.
In the first half of the following image, we see translation of an address with 4k granules and using 38 address bits.
I can't understand this image in the slightest. The "offsets", for example bits 38 to 30 point to an entry in the L1 table. How and where is this table defined ?
What I think is happening is, this a 12+8+8+8 address translation scheme. Starting from the right, 12 bits to find an offset within a 4096 block of memory. Right of that is 8 bits for L3, meaning that L3 indexes 256 blocks of 4096 bytes (1MB). Right of this, L2, has 8 bits also so 256 entries of (256*4096), totalling 256MB per L2 entry. Right of L2 is L1 with also 8 bits, 256 entries of 256MB means the total addressable memory is 64GB of physical RAM.
I don't think this is correct because that would only allow a 1:1 mapping of memory. Each table descriptor needs to carry some access flags and what not. Thus going back to the question of: how are those table defined. Each offset section is 8 bits and that's not enough to contain the address of a translation table.
Anyway, I am completely lost. I would appreciate if someone could give me a "plain english" explanation of how a translation table walk is done ? A graph would be nice but probably too much effort, I'll make one and share if after to help me synthesize the information. Or at least, if someone has one, a link to a good video/guide where the information isn't totally obfuscated ?
Here is the list of materials I have consulted:
https://developer.arm.com/documentation/den0024/a/The-Memory-Management-Unit/Translating-a-Virtual-Address-to-a-Physical-Address
https://forums.raspberrypi.com/viewtopic.php?t=227139
https://armv8-ref.codingbelief.com/en/chapter_d4/d42_4_translation_tables_and_the_translation_proces.html
https://github.com/bztsrc/raspi3-tutorial/blob/master/10_virtualmemory/mmu.c
[1]https://developer.arm.com/documentation/den0024/a/The-Memory-Management-Unit/Translation-tables-in-ARMv8-A
[2]https://developer.arm.com/documentation/den0024/a/The-Memory-Management-Unit/Translation-tables-in-ARMv8-A/Effect-of-granule-sizes-on-translation-tables
[3]https://developer.arm.com/documentation/den0024/a/The-Memory-Management-Unit/Translating-a-Virtual-Address-to-a-Physical-Address
The entire model behind translation tables arises from three values: the size of a translation table entry (TTE), the hardware page size (aka "translation granule"), and the amount of bits used for virtual addressing.
On arm64, TTEs are always 8 bytes. The hardware page size can be one of 4KiB, 16KiB or 64KiB (0x1000, 0x4000 or 0x10000 bytes), depending on both hardware support and runtime configuration. The amount of bits used for virtual addressing similarly depends on hardware support and runtime configuration, but with a lot more complex constraints.
By example
For the sake of simplicity, let's consider address translation under TTBR0_EL1 with no block mappings, no virtualization going on, no pointer authentication, no memory tagging, no "large physical address" support and the "top byte ignore" feature being inactive. And let's pick a hardware page size of 0x1000 bytes and 39-bit virtual addressing.
From here, I find it easiest to start at the result and go backwards in order to understand why we arrived here. So suppose you have a virtual address of 0x123456000 and the hardware maps that to physical address 0x800040000 for you. Because the page size is 0x1000 bytes, that means that for 0 <= n <= 0xfff, all accesses to virtual address 0x123456000+n will go to physical address 0x800040000+n. And because 0x1000 = 2^12, that means the lowest 12 bytes of your virtual address are not used for address translation, but indexing into the resulting page. Though the ARMv8 manual does not use this term, they are commonly called the "page offset".
63 12 11 0
+------------------------------------------------------------+-------------+
| upper bits | page offset |
+------------------------------------------------------------+-------------+
Now the obvious question is: how did we get 0x800040000? And the obvious answer is: we got it from a translation table. A "level 3" translation table, specifically. Let's defer how we found that for just a moment and suppose we know it's at 0x800037000. One thing of note is that translation tables adhere to the hardware page size as well, so we have 0x1000 bytes of translation information there. And because we know that one TTE is 8 bytes, that gives us 0x1000/8 = 0x200, or 512 entries in that table. 512 = 2^9, so we'll need 9 bits from our virtual address to index into this table. Since we already use the lower 12 bits as page offset, we take bits 20:12 here, which for our chosen address yield the value 0x56 ((0x123456000 >> 12) & 0x1ff). Multiply by the TTE size, add to the translation table address, and we know that the TTE that gave us 0x800040000 is written at address 0x8000372b0.
63 21 20 12 11 0
+------------------------------------------------------------+-------------+
| upper bits | L3 index | page offset |
+------------------------------------------------------------+-------------+
Now you repeat the same process over for how you got 0x800037000, which this time came from a TTE in a level 2 translation table. You again take 9 bits off your virtual address to index into that table, this time with an value of 0x11a ((0x123456000 >> 21) & 0x1ff).
63 30 29 21 20 12 11 0
+------------------------------------------------------------+-------------+
| upper bits | L2 index | L3 index | page offset |
+------------------------------------------------------------+-------------+
And once more for a level 1 translation table:
63 40 39 30 29 21 20 12 11 0
+------------------------------------------------------------+-------------+
| upper bits | L1 index | L2 index | L3 index | page offset |
+------------------------------------------------------------+-------------+
At this point, you used all 39 bits of your virtual address, so you're done. If you had 40-bit addressing, then there'd be another L0 table to go through. If you had 38-bit addressing, then we would've taken the L1 table all the same, but it would only span 0x800 bytes instead of 0x1000.
But where did the L1 translation table come from? Well, from TTBR0_EL1. Its physical address is just in there, serving as the root for address translation.
Now, to perform the actual translation, you have to do this whole process in reverse. You start with a translation table from TTBR0_EL1, but you don't know ad-hoc whether it's L0, L1, etc. To figure that out, you have to look at the translation granule and the number of bits used for virtual addressing. With 4KiB pages there's a 12-bit page offset and 9 bits for each level of translation tables, so with 39 bits you're looking at an L1 table. Then you take bits 39:30 of the virtual address to index into it, giving you the address of the L2 table. Rinse and repeat with bits 29:21 for L2 and 20:12 for L3, and you've arrived at the physical address of the target page.

Converting Address Sequence to Reference String In Page replacement

Recently when i was going through the basic concepts of Operating Systems , In the unit Virtual Memory Management(Albert Silberschatz 7th Edition) , I came across this concept for page replacement-
For example, if we trace a particular process, we might record the following
address sequence:
0100,0432,0101,0612,0102,0103,0104,0101,0611,0102,0103,
0104,0101,0610,0102,0103,0104,0101,0609,0102,0105
At 100 bytes per page, this sequence is reduced to the following reference
string:
1, 4, 1, 6, 1, 6, 1, 6, 1, 6,
I could not possibly understand how the recorded address sequence could be converted to reference string.
Wow, yet another confused reader of that book. To begin with, a page size is always going to be a power of 2. So let's make your question marginally realistic and say that those are all hexadecimal digits.
What you are doing here is then converting byte accesses into page references. The zeroth page starts at 0000 and ends at 00FF. The 1st page starts at 0100 ends at 01FF. Thus a reference to the 0100 is the first byte of page 1.
The page that starts at 0400 extends to 04FF. Thus the address 0432 is to the 4th page.
I don't know what point the authors are trying to get to.
I read somewhere that the 2nd digit from the MSB is the page reference string , for eg: if it's 0100 then the reference string is 1 , similarly 0323 gives 3 and so on.., and if you find a combination of 0100 0102 0113 then , all the three combined gives one reference string i.e 1. But I don't have any proper evidence in support of this logic or exactly to what the writers wanted to infer.
Reference String is set of successfully unique pages referred in the given list of Virtual Address or Logical Address.
Consider the Logical Address given as 3001,1002,1003,2001,2002,3003,2601,2602,1004,1005,1507,1510,2003,2008,3501,3603,4001,4002,1020,1021 and page size as 250 .
The Reference String of the given memory address sequence is calculated as
Reference String = Logical Address / Page Size                 
String Reference :  12 , 4 , 8 , 12 , 10 , 4 , 6 , 8 , 14 , 16 ,  4 
Address sequence: 0100, 0432, 0101, 0612, 0102, 0103, 0104, 0101, 0611, 0102, 0103, 0104, 0101, 0610, 0102, 0103, 0104, 0101, 0609, 0102, 0105
Page size: 100 bytes
Reference string: 1 4 1 6 1 6 1 6 1 6 1
The reference in the reference string is obtained by dividing (integer division) each address reference by the page size.
Reference = Address / Page Size
Consecutive occurrences of the same reference are replaced by a single reference.
Source:
https://notesmilenge.files.wordpress.com/2014/09/mod3b.pdf

Length and size of strings in Elixir / Erlang needs explanation

Can someone explain why s is a string with 4096 chars
iex(9)> s = String.duplicate("x", 4096)
... lots of "x"
iex(10)> String.length(s)
4096
but its memory size are a few 6 words?
iex(11)> :erts_debug.size(s)
6 # WHAT?!
And why s2 is a much shorter string than s
iex(13)> s2 = "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
"1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
iex(14)> String.length(s)
50
but its size has more 3 words than s?
iex(15)> :erts_debug.size(s2)
9 # WHAT!?
And why does the size of these strings does not match their lengths?
Thanks
First clue why this is showing that values can be found in this question. Quoting size/1 docs:
%% size(Term)
%% Returns the size of Term in actual heap words. Shared subterms are
%% counted once. Example: If A = [a,b], B =[A,A] then size(B) returns 8,
%% while flat_size(B) returns 12.
Second clue can be found in Erlang documentation about bitstrings implementation.
So in the first case the string is too big to fit on heap alone, so it uses refc binaries which are stored on stack and on heap there is only pointer to given binary.
In second case string is shorter than 64 bytes and it uses heap binaries which is just array of bytes stored directly in on the heap, so that gives us 8 bytes per word (64-bit) * 9 = 72 and when we check documentation about exact memory overhead in VM we see that Erlang uses 3..6 words per binary + data, where data can be shared.

translate virtual address to physical address

The following page table is for a system with 16-bit virtual and physical addresses and with 4,096-byte pages. The reference bit is set to 1 when the page has been referenced. Periodically, a thread zeroes out all values of the reference bit.All numbers are provided in decimal.
I want to convert the following virtual addresses (in hexadecimal) to the equivalent physical addresses. Also I want to set the reference bit for the appropriate entry in the page table.
• 0xE12C
• 0x3A9D
• 0xA9D9
• 0x7001
• 0xACA1
I know the answers are but I want to know how can I achieve these answers:
0xE12C → 0x312C
0x3A9D → 0xAA9D
0xA9D9 → 0x59D9
0x7001 → 0xF001
0xACA1 → 0x5CA1
I found and tried This but it did not help me much.
It is given that virtual address is 16 bit long.Hence, there are 2^16 addresses in the virtual address space.
Page Size is given to be 4 KB ( there are 4K (4 * (2 ^ 10) )addresses in a page), so the number of pages will be ( 2^16 ) / ( 2 ^ 12 ) = 2 ^ 4.
To address each page 4 bits are required.
The most significant 4 bits in the virtual address will denote the page number being referred and the remaining 12 bits will be the page offset.
One thing to remember is page size (in the virtual address space ) is always same as the frame size in the main memory. Hence the last 12 bits will remain same in the physical address as that of the virtual address.
To get the frame address in the main memory just use the first 4 bits.
Example: Consider the virtual address 0xACA1
Here A in ACA1 denotes the page number ( 10 ) and corresponding frame no is 5 ( 0101) hence the resulting physical address will be → 0x5CA1.
To translate a virtual address to a physical address (applies ONLY to this homework question), we need to know 2 things:
Page Size
Number of bits for virtual address
In this example: 16-bit system, 4KB page size and physical memory size is 64KB.
First of all we need to determine the number of needed bits to act as offset inside page. log2(Page-Size) = log2(4096) = 12 bits for offset
Out of the 16 bits for virtual address, 12 are for offset, that means each process has 2^4 = 16 virtual pages. Each entry in page table stores the corresponding frame accommodating the page. For example:
Now lets translate!
First of all for ease of work lets convert 0xE12C to binary.
0xE12C = (1110 0001 0010 1100) in base 2
1110 = 14 in decimal
Entry 14 in P.T => Page frame 3.
Lets concatenate it to the 12 offset bits
Answer: (0011 0001 0010 1100) = 0x312C
Another example: 0x3A9D
0x3A9D = 0011 1010 1001 1101
0011 = 3
PageTable[3] = 10
10 in decimal = 1010 in binary
1010 1010 1001 1101 in binary = 0xAA9D
To help you solve this question, we need to get our details right:
16 bit of virtual address space = 2^16 = 65,536 address space
16 bit of physical address space = 2^16 = 65,536 address space
4096 Byte page size determines the offset, which is Log(4096) / Log (2) = 12 bit. This means, 2^12 for Page size
As per #Akash Mahapatra, the offset from virtual address is directly mapped to the offset onto physical address
As such, we now have:
2^16 (16bit) for virtual address - 2^12 (12bit) for offset = 4-bit for pages, or rather total number of pages available.
I won't repeat the calculation for physical since it's the same numbers.
2^4 (4bit) for pages = 16, which correlates to the number of table entries above!
We're getting there... be patient! :)
Memory Address 0xE12C in hex notation is also known to be holding 16-bit of address. (Since it's stated in the question.)
Let's butcher the address now...
We first remove '0x' from the info.
We can convert E12C to binary notation like #Tony Tannous, but I am going to apply a little short-cut.
I simply use a ratio. Well, the address is notated in 4 characters above, and since 16/4 = 4, I can define the first letter as virtual address, while the other 3 are offset address.
With the information, 'E' in hexadecimal format, I need to convert to Decimal = 14. Then I look at your table provided, and I found page frame '3'. Page frame 3 is noted in decimal format, which then need to be converted back to Hexadecimal format... Duh!... which is 3!
So, the Physical address mapping of the virtual memory location of 0xE12C can be found at 0x312C on the physical memory.
You will then go back to the table, and refer to the reference bit column and put a '1' to the row 14.
Apply the same concept for these -
0x3A9D → 0xAA9D
0xA9D9 → 0x59D9
0x7001 → 0xF001
0xACA1 → 0x5CA1
If you notice, the last 3 digits are the same (which determines the offset).
And the 1st of the 4-digits are mapped according to the table:
table entry 3 -> page frame 10 -> hex notation A
table entry A (10) -> page frame 5 -> hex notation 5
table entry 7 -> page frame 15 -> hex notation F
table entry A (10) -> page frame 5 -> hex notation 5
Hope this explanation helps you and others like me! :)

COBOL Packed data type: Type = P5

This may be a very basic question for COBOL experts. But I till date had nothing to do with COBOL. We are processing some files based on character position. The files are being sent to us from mainframe machines and we have a layout file for that that says somethings like this.
POSITION : LENGTH : TYPE : DESCRIPTION
----------:--------:------:-------------------------------
61-70 : 10 : P5 : FIELD-1 9(13)V(05)
71-80 : 10 : P5 : Field-2 9(13)V(05)
81-81 : 1 : A/N : FLAG
82-84 : 3 : N : NUMBER OF DAYS 9(3)
I understand that the type A/N means it is alpha-numeric. N means numeric and P means Packed data type. What i dont understand is what P5 means. What is the significance of 5 that comes next to P?
What is the significance of 5 that comes next to P?
I'm not sure. Five 16 bit words, maybe.
Your packed fields are 10 bytes and holding 19 characters (18 digits plus the sign). The decimal point is implied.
If the sign byte (the rightmost byte) is anything other than hexadecimal F, update your question.
If you could update your question with five hexadecimal strings representing five of the numbers, that would be great.
Right now, I'm guessing that it's an ordinary packed decimal field.
P - packed decimal (i.e. Cobol Comp-3) a 18 digit packed decimal would occupy 10 bytes which agrees with the lengths give
5 - number of digits after the decimal point (at a guess).
Field definition in cobol is probably
03 FIELD-1 pic s9(13)V(05) comp-3.
in packed decimal, the sign is held in the last nyble (4 bits) and each nyble (4 bits) holds one decimal digit.
i.e.
121 is represented as x'121c'
while
-121 is represented as x'121d'
If you are using java and can get the cobol copybook, there are packages that can read the file using the cobol copybook.
I would bet it means 5 decimal places.

Resources