how to get page size - memory

I was asked this question in an interview Plz tell me the answer :-
You have no documentation of the kernel. You only knows that you kernel supports paging.
How will you find that page size ? There is no flag or macro you have that can tell you about page size.
I was given the hint as you can use Time to get the answer. I still have no clue for it.

Run code like the following:
for (int stride = 1; stride < maxpossiblepagesize; stride += searchgranularity) {
char* somemem = (char*)malloc(veryverybigsize*stride);
starttime = getcurrentveryaccuratetime();
for (pos = somemem; pos < somemem+veryverybigsize*stride; pos += stride) {
// iterate over "veryverybigsize" chunks of size "stride"
*pos = 'Q'; // Just write something to force the page back into physical memory
}
endtime = getcurrentveryaccuratetime();
printf("stride %u, runtime %u", stride, endtime-starttime);
}
Graph the results with stride on the X axis and runtime on the Y axis. There should be a point at stride=pagesize, where the performance no longer drops.
This works by incurring a number of page faults. Once stride surpasses pagesize, the number of faults ceases to increase, so the program's performance no longer degrades noticeably.
If you want to be cleverer, you could exploit the fact that the mprotect system call must work on whole pages. Try it with something smaller, and you'll get an error. I'm sure there are other "holes" like that, too - but the code above will work on any system which supports paging and where disk access is much more expensive than RAM access. That would be every seminormal modern system.

It looks to me like a question about 'how does paging actually work'
They want you to explain the impact that changing the page size will have on the execution of the system.
I am a bit rusty on this stuff, but when a page is full, the system starts page swapping, which slows everything down. So you want to run something that will fill up the memory to different sizes, and measure the time it takes to do a task. At some point there will be a jump, where the time taken to do the task will suddenly jump.
Like I said I am a bit rusty on the implementation of doing this. But i'm pretty sure that is the shape of the answer they were after.

Whatever answer they were expecting it would almost certainly be a brittle solution. For one thing you can have multiple pages sizes so any answer you may have gotten for one small allocation may be irrelevant for the next multi-megabyte allocation (see things like Linux's Large Page support).
I suspect the question was more aimed at seeing how you approached the problem rather than the final solution you came up with.
By the way this question isn't about linux because you do have documentation for that as well as POSIX compliance, for which you just call sysconf(_SC_PAGE_SIZE).

Related

How to efficiently create a large vector of items initialized to the same value?

I'm looking to allocate a vector of small-sized structs.
This takes 30 milliseconds and increases linearly:
let v = vec![[0, 0, 0, 0]; 1024 * 1024];
This takes tens of microseconds:
let v = vec![0; 1024 * 1024];
Is there a more efficient solution to the first case? I'm okay with unsafe code.
Fang Zhang's answer is correct in the general case. The code you asked about is a little bit special: it could use alloc_zeroed, but it does not. As Stargateur also points out in the question comments, with future language and library improvements it is possible both cases could take advantage of this speedup.
This usually should not be a problem. Initializing a whole big vector at once probably isn't something you do extremely often. Big allocations are usually long-lived, so you won't be creating and freeing them in a tight loop -- the cost of initializing the vector will only be paid rarely. Sooner than resorting to unsafe, I would take a look at my algorithms and try to understand why a single memset is causing so much trouble.
However, if you happen to know that all-bits-zero is an acceptable initial value, and if you absolutely cannot tolerate the slowdown, you can do an end-run around the standard library by calling alloc_zeroed and creating the Vec using from_raw_parts. Vec::from_raw_parts is unsafe, so you have to be absolutely sure the size and alignment of the allocated memory is correct. Since Rust 1.44, you can use Layout::array to do this easily. Here's an example:
pub fn make_vec() -> Vec<[i8; 4]> {
let layout = std::alloc::Layout::array::<[i8; 4]>(1_000_000).unwrap();
// I copied the following unsafe code from Stack Overflow without understanding
// it. I was advised not to do this, but I didn't listen. It's my fault.
unsafe {
Vec::from_raw_parts(
std::alloc::alloc_zeroed(layout) as *mut _,
1_000_000,
1_000_000,
)
}
}
See also
How to perform efficient vector initialization in Rust?
vec![0; 1024 * 1024] is a special case. If you change it to vec![1; 1024 * 1024], you will see performance degrades dramatically.
Typically, for non-zero element e, vec![e; n] will clone the element n times, which is the major cost. For element equal to 0, there is other system approach to init the memory, which is much faster.
So the answer to your question is no.

Largest amount of entries in lua table

I am trying to build a Sieve of Eratosthenes in Lua and i tried several things but i see myself confronted with the following problem:
The tables of Lua are to small for this scenario. If I just want to create a table with all numbers (see example below), the table is too "small" even with only 1/8 (...) of the number (the number is pretty big I admit)...
max = 600851475143
numbers = {}
for i=1, max do
table.insert(numbers, i)
end
If I execute this script on my Windows machine there is an error message saying: C:\Program Files (x86)\Lua\5.1\lua.exe: not enough memory. With Lua 5.3 running on my Linux machine I tried that too, error was just killed. So it is pretty obvious that lua can´t handle the amount of entries.
I don´t really know whether it is just impossible to store that amount of entries in a lua table or there is a simple solution for this (tried it by using a long string aswell...)? And what exactly is the largest amount of entries in a Lua table?
Update: And would it be possible to manually allocate somehow more memory for the table?
Update 2 (Solution for second question): The second question is an easy one, I just tested it by running every number until the program breaks: 33.554.432 (2^25) entries fit in one one-dimensional table on my 12 GB RAM system. Why 2^25? Because 64 Bit per number * 2^25 = 2147483648 Bits which are exactly 2 GB. This seems to be the standard memory allocation size for the Lua for Windows 32 Bit compiler.
P.S. You may have noticed that this number is from the Euler Project Problem 3. Yes I am trying to accomplish that. Please don´t give specific hints (..). Thank you :)
The Sieve of Eratosthenes only requires one bit per number, representing whether the number has been marked non-prime or not.
One way to reduce memory usage would be to use bitwise math to represent multiple bits in each table entry. Current Lua implementations have intrinsic support for bitwise-or, -and etc. Depending on the underlying implementation, you should be able to represent 32 or 64 bits (number flags) per table entry.
Another option would be to use one or more very long strings instead of a table. You only need a linear array, which is really what a string is. Just have a long string with "t" or "f", or "0" or "1", at every position.
Caveat: String manipulation in Lua always involves duplication, which rapidly turns into n² or worse complexity in terms of performance. You wouldn't want one continuous string for the whole massive sequence, but you could probably break it up into blocks of a thousand, or of some power of 2. That would reduce your memory usage to 1 byte per number while minimizing the overhead.
Edit: After noticing a point made elsewhere, I realized your maximum number is so large that, even with a bit per number, your memory requirements would optimally be about 73 gigabytes, which is extremely impractical. I would recommend following the advice Piglet gave in their answer, to look at Jon Sorenson's version of the sieve, which works on segments of the space instead of the whole thing.
I'll leave my suggestion, as it still might be useful for Sorenson's sieve, but yeah, you have a bigger problem than you realize.
Lua uses double precision floats to represent numbers. That's 64bits per number.
600851475143 numbers result in almost 4.5 Terabytes of memory.
So it's not Lua's or its tables' fault. The error message even says
not enough memory
You just don't have enough RAM to allocate that much.
If you would have read the linked Wikipedia article carefully you would have found the following section:
As Sorenson notes, the problem with the sieve of Eratosthenes is not
the number of operations it performs but rather its memory
requirements.[8] For large n, the range of primes may not fit in
memory; worse, even for moderate n, its cache use is highly
suboptimal. The algorithm walks through the entire array A, exhibiting
almost no locality of reference.
A solution to these problems is offered by segmented sieves, where
only portions of the range are sieved at a time.[9] These have been
known since the 1970s, and work as follows
...

using z3 for ALLSAT

I'm using Z3 as a black box to find all possible combinations of some real-world objects with C# code like this:
while (solver.Check() == Status.SATISFIABLE)
{
SATModel = solver.Model;
....
//invert the Model
....
solver.Assert(InvertedModel)
}
For most of my problems the program is working fine, but now I have a bigger problem, where there would be 8.5E+64 possible combinations without constraints.
I'm starting with some 6000 constraints.
What I observe is that the check action takes less than .02 seconds at the beginning and builds up slowly. After 100000 found solutions it takes already 1 second per turn and after 130000 turns I measure 2 seconds.
Is there an easy way to improve the performance?
It's not unreasonable that the solver is taking longer and longer with each constraint. But to make sure it's not some sort of a memory-leak on the C# part, you should check that the time taken in your while loop is really in the Check part and not in the invert/assert part. If you determine z3 is the responsible party, perhaps filing it at https://github.com/Z3Prover/z3/issues might solicit a better answer from the developers.

What algorithm can I use to turn a drunkards walk into a correlated RNG?

I'm a novice programmer (the only reason I say this is because I'm not super familiar with all the terms yet) and I'm trying to make walls generate in respect to the wall before it. I've posted a question about it on here before
Randomly generated tunnel walls that don't jump around from one to the next
and sort of got the answer. What I was mainly looking for was the for loop that was used (I think). Th problem is I didn't know how to implement it properly without getting errors.
My problem ended up being "I couldn't figure out how to inc. this in to it. I have 41 walls altogether that i'm using and the walls are named Left1 and Right1. i had something like this
CGFloat Left1 = 14; for( int i = 0; i < 41; i++ ){
CGFloat offset = (CGFloat)arc4random_uniform(2*100) - 100;
Left1 += offset;
Right1 = Left1 + 100;
but it was telling me as a yellow text that Local declaration of "Left1" hides instance variable and then in a red text it says "Assigning to 'UIImageView *__strong' from incompatible type 'float'. i'm not sure how to fix this"
and I wasn't sure how to fix it. I realize (I think) that arc4random and arc4random_uniform are pretty much the same thing, as far as i know, with slight differences, but not the difference i'm looking for.
As I said before, i'm pretty novice so any example would really be helpful, especially with the variables i'm trying to use. Thank you.
You want a "hashing" function, and preferably a "cryptographic" one because they tend to be significantly higher quality - at the expense of requiring additional CPU resources. But on modern hardware the extra CPU power usually isn't a problem.
The basic idea is you can give any data to the function, and it will spit out a completely random result, but always the same result if you provide the same input.
Have a read up on them here:
http://en.wikipedia.org/wiki/Hash_function
http://en.wikipedia.org/wiki/Cryptographic_hash_function
There are hundreds of different algorithms in common use, which is best will depend on what you need.
Personally I recommend sha256. A quick search of "sha256 ios" here on stack overflow will show you how to make one, with the CommonCrypto library. The gist is you should create an NSString or NSData object that contains every offset, then run the entire thing through sha256. The result will be a perfectly random 256 bit number.
If 256 bits is too much, just cut it up. For example you could grab just the first 16 bits of the number, and you will have a perfectly random 16 bit number.

How is RAM able to acess any place in memory at O(1) speed

We are taught that the abstraction of the RAM memory is a long array of bytes. And that for the CPU it takes the same amount of time to access any part of it. What is the device that has the ability to access any byte out of the 4 gigabytes (on my computer) in the same time? As this does not seem as a trivial task for me.
I have asked colleagues and my professors, but nobody can pinpoint to the how this task can be achieved with simple logic gates, and if it isn't just a tricky combination of logic gates, then what is it?
My personal guess is that you could achieve the access of any memory in O(log(n)) speed, where n would be the size of memory. Because each gate would split the memory in two and send you memory access instruction to the next split the memory in two gate. But that requires ALOT of gates. I can't come up with any other educated guess, and I don't even know the name of the device that I should look up in Google.
Please help my anguished curiosity, and thanks in advance.
edit<
This is what I learned!
quote from yours "the RAM can send the value from cell addressed X to some output pins", here is where everyone skip (again) the thing that is not trivial for me. The way that I see it, In order to build a gate that from 64 pins decides which byte out of 2^64 to get, each pin needs to split the overall possible range of memory into two. If bit at index 0 is 0 -> then the address is at memory 0-2^64/2, else address is at memory 2^64/2-2^64. And so on, However the amout of gates (lets call them) that the memory fetch will go through will be 64, (a constant). However the amount of gates needed is N, where N is the number of memory bytes there are.
Just because there is 64 pins, it doesn't mean that you can still decode it into a single fetch from a range of 2^64. Does 4 gigabytes memory come with a 4 gigabytes gates in the memory control???
now this can be improved, because as I read furiously more and more about how this memory is architectured, if you place the memory into a matrix with sqrt(N) rows and sqrt(N) columns, the amount of gates that a fetch memory will need to go through is O(log(sqrt(N)*2) and the amount of gates that will be required will be 2*sqrt(N), which is much better, and I think that its probably a trade secret.
/edit<
What the heck, I might as well make this an answer.
Yes, in the physical world, memory access cannot be constant time.
But it cannot even be logarithmic time. The O(log n) circuit you have in mind ultimately involves some sort of binary (or whatever) tree, and you cannot make a binary tree with constant-length wires in a 3D universe.
Whatever the "bits per unit volume" capacity of your technology is, storing n bits requires a sphere with radius O(n^(1/3)). Since information can only travel at the speed of light, accessing a bit at the other end of the sphere requires time O(n^(1/3)).
But even this is wrong. If you want to talk about actual limitations of our universe, our physics friends say the absolute maximum number of bits you can store in any sphere is proportional to the sphere's surface area, not its volume. So the actual radius of a minimal sphere containing n bits of information is O(sqrt(n)).
As I mentioned in my comment, all of this is pretty much moot. The models of computation we generally use to analyze algorithms assume constant-access-time RAM, which is close enough to the truth in practice and allows a fair comparison of competing algorithms. (Although good engineers working on high-performance code are very concerned about locality and the memory hierarchy...)
Let's say your RAM has 2^64 cells (places where it is possible to store a single value, let's say 8-bit). Then it needs 64 pins to address every cell with a different number. When at the input pins of your RAM there 'appears' a binary number X the RAM can send the value from cell addressed X to some output pins, and your CPU can get the value from there. In hardware the addressing can be done quite easily, for example by using multiple NAND gates (such 'addressing device' from some logic gates is called a decoder).
So it is all happening at the hardware-level, this is just direct addressing. If the CPU is able to provide 64 bits to 64 pins of your RAM it can address every single memory cell (as 64 bit is enough to represent any number up to 2^64 -1). The only reason why you do not get the value immediately is a kind of 'propagation time', so time it takes for the signal to go through all the logic gates in the circuit.
The component responsible for dealing with memory accesses is the memory controller. It is used by the CPU to read from and write to memory.
The access time is constant because memory words are truly layed out in a matrix form (thus, the "byte array" abstraction is very realistic), where you have rows and columns. To fetch a given memory position, the desired memory address is passed on to the controller, which then activates the right column.
From http://computer.howstuffworks.com/ram1.htm:
Memory cells are etched onto a silicon wafer in an array of columns
(bitlines) and rows (wordlines). The intersection of a bitline and
wordline constitutes the address of the memory cell.
So, basically, the answer to your question is: the memory controller figures it out. Of course that, given a memory address, the mapping to column and row must be easy to calculate in a constant time.
To fully understand this topic, I recommend you to read this guide on how memory works: http://computer.howstuffworks.com/ram.htm
There are so many concepts to master that it is difficult to explain it all in one answer.
I've been reading your comments and questions until I answered. I think you are on the right track, but there is some confusion here. The random access in which you are implying doesn't exist in the same way you think it does.
Reading, writing, and refreshing are done in a continuous cycle. A particular cell in memory is only read or written in a certain interval if a signal is detected to do so in that cycle. There is going to be support circuitry that includes "sense amplifiers to amplify the signal or charge detected on a memory cell."
Unless I am misunderstanding what you are implying, your confusion is in how easy it is to read/write to a cell. It's different dependent on chip design but there IS a minimum number of cycles it takes to read or write data to a cell.
These are my sources:
http://www.doc.ic.ac.uk/~dfg/hardware/HardwareLecture16.pdf
http://www.electronics.dit.ie/staff/tscarff/memory/dram_cycles.htm
http://www.ece.cmu.edu/~ece548/localcpy/dramop.pdf
To avoid a humungous answer, I left most of the detail out but all three of these will describe the process you are looking for.

Resources