On the ARMv7 [1] / AArch32 [2] MMU, when using the Long-descriptor format, if the virtual address space described by ttbr0 is small enough (1 GB here), the level 1 translation can be skipped, leaving only two levels of translation.
However, I see nothing similar in the AArch64 translation description. Does anyone know whether it is still possible to reduce the number of translation tables used by ttbr0 when using AArch64?
A reference in the ARM ARM would be great if it exists.
Best,
V.
[1]: ARM ARM v7, B3.6 Long-descriptor translation table format, Fig. B3-12 General view of stage 1 address translation using Long-descriptor format
[2]: ARM ARM v8, G4.6.1 Overview of VMSAv8-32 address translation using Long-descriptor translation tables, Fig. G4-8
As discussed here, it is still possible, and the mechanism is extended to several scenarios. It mostly depends on the granule size chosen and on the size of the virtual address space. There are some tables (like Table D4-11, TCR.TnSZ values and IA ranges when there is no concatenation of translation tables) which help you understand at which level of translation you will start, depending on your configuration.
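As an illustration, here is a small Python sketch of the rule behind that table, based on my reading of it: with 8-byte descriptors, a granule of 2^g bytes resolves g-3 input-address bits per level, the low g bits are the page offset, and the starting level is whatever is left over. The function name and structure are my own, not from the ARM ARM.

import math

def start_level(ia_bits, granule_bytes=4096):
    offset_bits = granule_bytes.bit_length() - 1  # log2 of the granule size
    bits_per_level = offset_bits - 3              # e.g. 512 eight-byte entries per 4 KB table
    levels_needed = math.ceil((ia_bits - offset_bits) / bits_per_level)
    return 4 - levels_needed

# 4 KB granule, TnSZ = 34, i.e. a 30-bit (1 GB) space as in the question:
print(start_level(64 - 34))  # 2 -> the walk performs only levels 2 and 3

So the two-level walk from the AArch32 Long-descriptor case carries over: shrink the space via TCR.TnSZ and the initial lookup level moves down accordingly.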
I'm writing a game in Forth (for learning purposes).
The game is played on a "10 cell board". I'm trying new stuff so I did
here 10 [char] - fill
to set up the space for the board.
Then, to play 'X' in position 3
[char] X here 3 + c!
This has been working fine, but raises the question
Is this OK?
What if the board was a million cells wide?
Thanks
The described approach has certain environmental dependencies, so your program just has to match the environmental restrictions on programs of your Forth system (i.e. the one that you use).
1. Size of the data space
The word UNUSED returns "the amount of space remaining in the region addressed by HERE". So, a program can check the available space.
Also, according to the subsection 4.1.3 Other system documentation of the Forth Standard:
A system shall provide the following information:
[...]
program data space available, in address units;
So, you just have to check whether your Forth system provides enough data space for your program, and how the available data space can be configured (if it can be).
2. Transient regions
In the general case, it is not safe for a portable program to use the data space without reserving it.
According to section 3.3.3.6 Other transient regions of the Forth Standard, the contents of the data space regions identified by PAD, WORD, and #> may become invalid after the data space is allocated. Consequently, the contents of the region identified by HERE may become invalid after the contents of the regions identified by PAD, WORD, and #> are changed.
See also A.3.3.3.6 Other transient regions:
In many existing Forth systems, these areas are at HERE or just beyond it, hence the many restrictions.
Moreover, some Forth systems may use the region identified by HERE for internal purposes during translation. E.g., Gforth 0.7.9 uses this region when decoding escaped strings.
The phrase:
s\" test\-passed" cr here over type cr type cr
outputs:
test-passed
test-passed
So, you have to check the restrictions of your Forth system to learn whether you may use the region identified by HERE without reserving the space (and under what conditions).
I am trying to build a Sieve of Eratosthenes in Lua. I tried several things, but I find myself confronted with the following problem:
Lua's tables are too small for this scenario. If I just want to create a table with all numbers (see example below), the table is too "small" even with only 1/8 (...) of the number (the number is pretty big, I admit)...
max = 600851475143
numbers = {}
for i = 1, max do
  table.insert(numbers, i)
end
If I execute this script on my Windows machine, there is an error message saying: C:\Program Files (x86)\Lua\5.1\lua.exe: not enough memory. I tried it with Lua 5.3 running on my Linux machine too; the process was simply killed. So it is pretty obvious that Lua can't handle that number of entries.
I don't really know whether it is just impossible to store that many entries in a Lua table or whether there is a simple solution for this (I tried using a long string as well...). And what exactly is the largest number of entries a Lua table can hold?
Update: Would it be possible to somehow manually allocate more memory for the table?
Update 2 (solution for the second question): The second question is an easy one. I just tested it by increasing the count until the program broke: 33,554,432 (2^25) entries fit in one one-dimensional table on my 12 GB RAM system. Why 2^25? 64 bits per number * 2^25 entries is 2^31 bits, i.e. 256 MB of raw values; presumably Lua's per-entry overhead and the temporary copies made while the table grows push that into the 2 GB address-space limit of the 32-bit Lua for Windows build.
P.S. You may have noticed that this number is from Project Euler, Problem 3. Yes, I am trying to solve that. Please don't give specific hints (..). Thank you :)
The Sieve of Eratosthenes only requires one bit per number, representing whether the number has been marked non-prime or not.
One way to reduce memory usage would be to use bitwise math to pack multiple flags into each table entry. Current Lua implementations have intrinsic support for bitwise or, and, etc. Depending on the underlying implementation, you should be able to represent 32 or 64 flags per table entry.
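Here is a minimal sketch of that packing idea, written in Python for illustration (the class name and word size are my own choices; the same index/mask arithmetic ports directly to Lua's bitwise operators):

class BitSet:
    WORD_BITS = 64  # flags packed per entry; use 32 on implementations with 32-bit bitops

    def __init__(self, size):
        # one word for every WORD_BITS flags
        self.words = [0] * ((size + self.WORD_BITS - 1) // self.WORD_BITS)

    def mark(self, i):
        self.words[i // self.WORD_BITS] |= 1 << (i % self.WORD_BITS)

    def is_marked(self, i):
        return (self.words[i // self.WORD_BITS] >> (i % self.WORD_BITS)) & 1 == 1

This stores one bit per number instead of one 8-byte table value, roughly a 64x saving before per-entry overhead.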
Another option would be to use one or more very long strings instead of a table. You only need a linear array, which is really what a string is. Just have a long string with "t" or "f", or "0" or "1", at every position.
Caveat: String manipulation in Lua always involves duplication, which rapidly turns into n² or worse complexity in terms of performance. You wouldn't want one continuous string for the whole massive sequence, but you could probably break it up into blocks of a thousand, or of some power of 2. That would reduce your memory usage to 1 byte per number while minimizing the overhead.
Edit: After noticing a point made elsewhere, I realized your maximum number is so large that, even with a bit per number, your memory requirements would still be roughly 70 gigabytes, which is extremely impractical. I would recommend following the advice Piglet gave in their answer: look at Jon Sorenson's segmented version of the sieve, which works on segments of the range instead of the whole thing.
I'll leave my suggestion, as it still might be useful for Sorenson's sieve, but yeah, you have a bigger problem than you realize.
Lua uses double-precision floats to represent numbers. That's 64 bits, i.e. 8 bytes, per number.
600851475143 numbers therefore come to almost 4.5 terabytes of memory for the values alone, before counting the table's per-entry overhead.
So it's not Lua's or its tables' fault. The error message even says
not enough memory
You just don't have enough RAM to allocate that much.
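For the record, the back-of-the-envelope calculation (a Python one-liner; the 8 bytes per value ignores Lua's table overhead, so the real figure is higher):

print(600851475143 * 8 / 2**40)  # ~4.37 TiB just for the raw 64-bit values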
Had you read the linked Wikipedia article carefully, you would have found the following section:
As Sorenson notes, the problem with the sieve of Eratosthenes is not the number of operations it performs but rather its memory requirements.[8] For large n, the range of primes may not fit in memory; worse, even for moderate n, its cache use is highly suboptimal. The algorithm walks through the entire array A, exhibiting almost no locality of reference.
A solution to these problems is offered by segmented sieves, where only portions of the range are sieved at a time.[9] These have been known since the 1970s, and work as follows
...
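To make the segmented approach concrete, here is a short Python sketch (the names and the segment size are my own; the same loop structure translates to Lua with a table or string as the segment buffer):

def simple_sieve(limit):
    # ordinary sieve, used only for the base primes up to sqrt(n)
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
    return [p for p, flag in enumerate(is_prime) if flag]

def segmented_sieve(n, segment_size=2**20):
    # yields all primes <= n while only ever holding one segment in memory
    base_primes = simple_sieve(int(n ** 0.5))
    yield from base_primes
    low = int(n ** 0.5) + 1
    while low <= n:
        high = min(low + segment_size - 1, n)
        segment = [True] * (high - low + 1)
        for p in base_primes:
            # first multiple of p inside [low, high]
            start = ((low + p - 1) // p) * p
            for m in range(start, high + 1, p):
                segment[m - low] = False
        for i, flag in enumerate(segment):
            if flag:
                yield low + i
        low = high + 1

Memory use is bounded by the segment size plus the list of base primes, instead of by n itself.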
I am working on a problem of finding similar content in a log file. Let's say I have a log file which looks like this:
show version
Operating System (OS) Software
Software
BIOS: version 1.0.10
loader: version N/A
kickstart: version 4.2(7b)
system: version 4.2(7b)
BIOS compile time: 01/08/09
kickstart image file is: bootflash:/m9500-sf2ek9-kickstart-mz.4.2.7b.bin
kickstart compile time: 8/16/2010 13:00:00 [09/29/2010 23:10:48]
system image file is: bootflash:/m9500-sf2ek9-mz.4.2.7b.bin
system compile time: 8/16/2010 13:00:00 [09/30/2010 00:46:36]
Hardware
xxxx MDS 9509 (9 Slot) Chassis ("xxxxxxx/xxxxx-2")
xxxxxxx, xxxx with 1033100 kB of memory.
Processor Board ID xxxx
Device name: xxx-xxx-1
bootflash: 1000440 kB
slot0: 0 kB (expansion flash)
To a human eye, it is easy to see that "Software" and the data below it form one section and "Hardware" and the data below it form another. Is there a way I can model this, using machine learning or some other technique, to cluster similar sections based on their pattern? Note that while I have shown two sections with similar patterns here, the pattern may vary between sections, and such sections should be identified as different. I have tried cosine similarity, but it doesn't help much, because it is the pattern, not the words, that is similar.
I actually see two separate machine learning problems here:
1) If I understood you correctly, the first problem you want to solve is splitting each log into distinct sections: one for Hardware, one for Software, etc.
One approach could be to try to extract the headings which mark the beginning of a new section. To do so, you could take a set of different logs and manually label each row as heading=true or heading=false.
Now you can train a classifier which takes your labeled data as input and produces a model; a minimal sketch of that idea follows.
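Something along these lines, using scikit-learn (the feature set and the tiny hand-labeled sample are invented for illustration; a real training set would need many labeled logs):

from sklearn.linear_model import LogisticRegression

def line_features(line):
    stripped = line.strip()
    return [
        len(stripped),                            # headings tend to be short
        len(line) - len(line.lstrip()),           # indentation depth
        int(any(c.isdigit() for c in stripped)),  # data lines often contain numbers
        int(":" in stripped),                     # "key: value" rows are rarely headings
    ]

train_lines = ["Software", "BIOS: version 1.0.10", "Hardware", "bootflash: 1000440 kB"]
train_labels = [1, 0, 1, 0]  # 1 = heading, 0 = not a heading

clf = LogisticRegression().fit([line_features(l) for l in train_lines], train_labels)
print(clf.predict([line_features("Processor Board ID xxxx")]))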
2) Now that you have the different sections, you can split each log into those sections and treat each section as a separate document.
I would first try straightforward document clustering using a standard NLP pipeline:
Tokenize your document to get the tokens
Normalize them (maybe stemming is not the best idea for logs)
Create a tf-idf vector for each document
Start with a simple clustering algorithm like k-means to try to cluster the different sections
After the clustering, sections that are similar to each other should end up in the same cluster; a minimal sketch of this pipeline follows.
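For example, with scikit-learn (the section strings are placeholders standing in for the sections extracted in step 1, and the number of clusters is picked by hand here):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

sections = [
    "BIOS: version 1.0.10 kickstart: version 4.2(7b) system: version 4.2(7b)",
    "MDS 9509 (9 Slot) Chassis Processor Board ID bootflash: 1000440 kB",
    # ... one string per extracted section, across many logs
]

vectors = TfidfVectorizer().fit_transform(sections)    # tokenize, normalize, tf-idf
labels = KMeans(n_clusters=2, n_init=10).fit_predict(vectors)
print(labels)  # sections sharing a label landed in the same cluster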
I hope this helps. I think the first task especially is quite hard, and maybe hand-tailored patterns will perform better.
I'm looking at the AVX programming reference. The new Haswell instructions include some eagerly awaited "gather" loads. However, I can't figure out what the alignment restrictions are on the indexed data items. Section 2.5 "Memory alignment" of the reference seems like it ought to list the various VGATHER* instructions in one of tables 2.4 or 2.5... but it doesn't.
Background: while gather instructions' supported data sizes are 4 and 8 bytes, my application could benefit from gather-loading adjacent 16-bit data value pairs to DWORDS. Odd indices with a 2-byte scale will produce 2-byte aligned 4-byte loads and it's not clear to me from the manual whether this will fault or otherwise fail to work as intended (I rather suspect I'm out of luck given all the instructions supporting unaligned accesses seem to have a 'U' in them).
This is the first time I have heard about AVX2, but I'm guessing the memory alignment restriction won't be different from the current implementation of AVX on Sandy Bridge with the new VEX coding scheme, i.e. no alignment required unless explicitly using an aligned VMOV instruction with an A in the name. Most instructions allow access with any byte-granularity alignment.
In fact, see section 2.5, page 35 of the Intel(R) Advanced Vector Extensions Programming Reference, which states exactly this.
I've been looking into how programming languages work, and some of them have a so-called virtual machine. I understand that this is some form of emulation of the programming language within another programming language, and that it works the way a compiled language would be executed, with a stack. Did I get that right?
With the proviso that I did, what bamboozles me is that many non-compiled languages allow variables with "liberal" type systems. In Python for example, I can write this:
x = "Hello world!"
x = 2**1000
Strings and big integers are completely unrelated and occupy different amounts of space in memory, so how can this code even be represented in a stack-based environment? What exactly happens here? Is x pointed to a new place on the stack and the old string data left unreferenced? Do these languages not use a stack? If not, how do they represent variables internally?
Your question should probably be titled "How do dynamic languages work?".
It's simple: they store the type information along with the value in memory. And this is done not only in interpreted or JIT-compiled languages but also in natively compiled languages such as Objective-C.
In most VM languages, variables can be conceptualized as pointers (or references) to memory in the heap, even if the variable itself is on the stack. For languages that have primitive types (int and bool in Java, for example) those may be stored on the stack as well, but they cannot be assigned new types dynamically.
Ignoring primitive types, all variables that exist on the stack have their actual values stored in the heap. Thus, if you dynamically reassign a value to them, the original value is abandoned (and the memory cleaned up via some garbage collection algorithm), and the new value is allocated in a new bit of memory.
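You can watch this happen in CPython (a quick demonstration of the rebinding, not a specification of how every VM does it):

x = "Hello world!"
print(type(x), id(x))
x = 2 ** 1000
print(type(x), id(x))  # different type, different object identity; the old
                       # string becomes garbage once nothing references it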
The VM has nothing to do with the language. Any language can run on top of a VM (the Java VM has hundreds of languages already).
A VM enables a different kind of "assembly language" to be run, one that is a better fit as a compiler target. Everything done in a VM could be done in a CPU, so think of the VM as a CPU. (Some actually are implemented in hardware.)
It's extremely low level, and in many cases heavily stack-based: instead of registers, machine-level math is all done on locations relative to the current stack pointer.
With normal compiled languages, many instructions are required for a single step. A + might look like: grab an item from a location relative to the stack pointer into register a, grab another into register b, add registers a and b, put the result back in a location relative to the stack pointer.
The VM does all this with a single, short instruction, possibly one or two bytes instead of 4 or 8 bytes per instruction in machine language (depending on 32- or 64-bit architecture), which (guessing) should mean around 16 or 32 bytes of x86 for 1-2 bytes of VM code. (I could be wrong; my last x86 coding was in the 80286 era.)
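You can see this compactness for yourself with CPython's disassembler (opcode names vary between CPython versions, so the exact output differs):

import dis
dis.dis(lambda a, b: a + b)
# prints a handful of short stack instructions: load a, load b, one
# add instruction, return -- the whole addition is a single bytecode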
Microsoft used (probably still uses) VMs in their office products to reduce the amount of code.
The procedure for creating the VM code is the same as creating machine language, just a different processor type essentially.
VMs can also implement their own security, error recovery and memory mechanisms that are very tightly related to the language.
Some of my description here is summary and from memory. If you want to explore the bytecode definition yourself, it's kinda fun:
http://java.sun.com/docs/books/jvms/second_edition/html/Instructions2.doc.html
The key to many of the "how do VMs handle variables like this or that" questions really comes down to metadata. The meta information stored, and then updated, gives the VM a much better handle on how to allocate and then do the right thing with variables.
In many cases this is the type of overhead that can really get in the way of performance. However, modern implementations have come a long way in doing the right thing.
As for your specific questions: treating variables as vanilla objects comes down to reassigning / re-evaluating the meta information on each new assignment. That's why x can look one way and then another.
To answer part of your question, I'd recommend a Google tech talk about Python, where some of your questions concerning dynamic languages are answered; for example, what a variable is (it is not a pointer, nor a reference, but, in the case of Python, a label).