How does a multilevel page table save memory space?

So I have learned that a multilevel page table is good at saving memory space during paging/memory mapping.
Suppose I have a page directory with n entries; then I will have n page tables, so it costs just as much space as a single-level page table.
Is there anything I missed?

In a multilevel page table there is one root page table that is always present in main memory, while the other page tables can be kept in virtual memory. When a particular page table is needed it is loaded into main memory, and the rest stay in virtual memory; that is, the lower-level page tables can be swapped in and out as and when required.
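Two things are at work here. The root (the page directory) is the only table that has to stay resident, as described above; the second-level tables can be paged out like ordinary data. The other point, and the one the question overlooks, is that second-level tables for unmapped regions of the address space never have to exist at all: the directory entry is simply marked not-present. With a 32-bit address space, 4 KB pages and 4-byte entries, a flat single-level table needs 2^20 entries (4 MB) per process no matter how little of the address space is used, whereas a small process under a two-level scheme needs the 4 KB directory plus only the few second-level tables that cover its code, data and stack. A rough C sketch of the idea (the structure names and the 10/10/12 address split are illustrative, not taken from any particular OS):
#include <stdint.h>

#define DIR_ENTRIES   1024
#define TABLE_ENTRIES 1024

typedef struct {
    uint32_t frames[TABLE_ENTRIES];    /* frame number per page, 0 = not mapped */
} page_table_t;

typedef struct {
    page_table_t *tables[DIR_ENTRIES]; /* NULL = this 4 MB region is unused, so
                                          no second-level table was ever built */
} page_directory_t;

/* Translate a virtual address: unused regions cost only a NULL pointer in the
   directory, not a whole second-level table. */
int translate(const page_directory_t *dir, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t dir_idx   = vaddr >> 22;           /* top 10 bits    */
    uint32_t table_idx = (vaddr >> 12) & 0x3FF; /* middle 10 bits */
    uint32_t offset    = vaddr & 0xFFF;         /* low 12 bits    */

    const page_table_t *pt = dir->tables[dir_idx];
    if (pt == NULL || pt->frames[table_idx] == 0)
        return -1;                              /* fault: region or page not mapped */

    *paddr = (pt->frames[table_idx] << 12) | offset;
    return 0;
}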

Related

How does an inverted page table deal with multiple processes accessing the same frame?

I've seen the other posts, but none has the same problem. From my understanding, an inverted page table entry depends on both the process ID and the virtual page number: if the process ID and virtual page number both match an entry, then that entry's index is the physical page number (frame number). My question is: what happens when more than one process needs that frame of physical memory? You can't store both PIDs or both VPNs in the same entry.
Interesting question! My guess is that this is one of the reasons that inverted page tables aren't very popular.
One solution could be to append a "shared" bit to each entry and create a hash table for each PID. When a page frame with the "shared" bit set is encountered, the MMU should trigger a fault that causes the OS to use the PID of the requesting process and the virtual address to index into that hash table. At this point it's the same behavior as the global hash table, where the hashed entry contains the mapping pairs.
An upside to this is that we can still use the PID field in the page table entry, so one of the sharing PIDs will still hit directly.
Some downsides are that the hash table per PID could be the same size as the global hash table, so we effectively increase the size of the global hash table by a factor of 2^16 (or however many PIDs are supported)! Of course, the hash table per PID probably won't be that large, so we could dynamically change its size based on how many entries are in use. But this has its own side effects, where we may have to evict other pages whenever we want to increase the size.
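To make the "shared" bit idea a little more concrete, here is a rough sketch of the fault path; every structure, field and helper here (including per_pid_hash_lookup) is hypothetical and only illustrates the lookup order, it does not describe any real MMU or OS:
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t pid;      /* owning pid recorded in the inverted table entry */
    uint32_t vpn;      /* virtual page number recorded in the entry       */
    bool     shared;   /* set when more than one process maps this frame  */
} ipt_entry_t;

/* Stub for the per-pid software hash table the OS would maintain; a real
   implementation would hash (pid, vpn) and return the stored frame number. */
static int per_pid_hash_lookup(uint16_t pid, uint32_t vpn, uint32_t *frame)
{
    (void)pid; (void)vpn; (void)frame;
    return -1;                              /* not found */
}

/* Handler for the fault raised when a lookup lands on a shared entry. */
static int shared_frame_fault(const ipt_entry_t *entry, uint32_t entry_index,
                              uint16_t req_pid, uint32_t req_vpn, uint32_t *frame)
{
    if (!entry->shared)
        return -1;                          /* not a shared frame: genuine miss */

    /* One sharer is still recorded in the entry itself, so that pid hits
       without consulting any hash table. */
    if (entry->pid == req_pid && entry->vpn == req_vpn) {
        *frame = entry_index;               /* in an IPT the index is the frame */
        return 0;
    }

    /* Every other sharer goes through its per-pid hash table. */
    return per_pid_hash_lookup(req_pid, req_vpn, frame);
}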
I'm sure there are better solutions out there, and I would love to hear them.
Another possible solution:
Add a "remap" bit to the page table entry. If a page table entry has this "remap" bit set, the given physical address is not the real physical address; instead it's an index into the Remap Table. The Remap Table is a software-controlled data structure that contains mappings from indices to physical addresses. It is shared by all processes, since a page table entry already has the PID. The size of the Remap Table is dynamic; the Remap Table can reside anywhere in contiguous memory, pointed to by a Remap Table Base Register. The Remap Table Tail Register keeps track of the top of the Remap Table, which can grow or shrink.
When a brand new shared page table entry is created, it should be created as a standard page table entry with PID, physical address and virtual address. When other processes request a page for this shared physical address, they should have a page table entry created with the "remap" bit set and with the physical address pointing to their remap table entry. Software can determine the remap table entry by searching from the base to the tail for any available entry. If none is found, software should increase the tail pointer, evicting the page residing there if necessary. This remap table entry should contain the physical address of the shared page.
When removing a shared page table entry, software should remove the corresponding remap table entry. If the remap table entry to be removed is at the top of the remap table, software should decrease the Remap Table Tail Register to the highest valid entry.
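A rough sketch of the data structures this describes; the field layout, the register stand-ins and all names are made up purely for illustration:
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Page table entry with the proposed "remap" bit: when remap is set, addr is
   an index into the Remap Table rather than a physical frame number. */
typedef struct {
    uint16_t pid;
    uint32_t vpn;
    uint32_t addr;     /* physical frame number, or remap-table index */
    bool     remap;
} pte_t;

/* Stand-ins for the hypothetical Remap Table Base and Tail Registers; the
   table itself is a contiguous, software-managed array of frame numbers. */
static uint64_t *remap_table_base = NULL;
static uint32_t  remap_table_tail = 0;

/* Resolve a PTE to a physical frame, following one level of indirection
   when the remap bit is set. */
static uint64_t resolve_frame(const pte_t *pte)
{
    if (!pte->remap)
        return pte->addr;                   /* ordinary, unshared entry    */
    if (pte->addr >= remap_table_tail)
        return UINT64_MAX;                  /* stale index: treat as fault */
    return remap_table_base[pte->addr];     /* shared: real frame is here  */
}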
Simplest Solution:
Just create another page table entry for each of the processes that share the same page. Each should have the PID and virtual address specific to that process, and the same physical page frame number of the shared page. These page table entries should have a "shared" bit that indicates the page can be / is being shared with other processes, and that it should be written back to disk when evicted. This way, when each process's page table entry is removed, each will write back the contents of the shared page to disk. This results in n-1 "wasted" write-backs (n being the number of processes sharing the page), but it avoids the overhead of trying to keep track of all of the processes sharing a page.
A simple solution is to map only one virtual address to the shared physical address in the page table, so references through the other virtual addresses mapped to the same shared physical address will result in page faults. (Source: Operating System Principles, Galvin)

If I have to fetch the complete table all the time, what will be the effect of indexing?

I'm creating a registration app where users register themselves.
Now the admin has the option to export all registered users to a CSV/Excel file. There could be thousands of records. I can't fetch them all at once in one query, so how do I make it fast? Will indexing help?
When exporting the entire table, all data must be read.
Adding an index cannot reduce the amount of data.
An index would help only if you sort the output by some column(s).
Indexing will generally help when reading or updating a subset of the data in the table, such as a single row or group of rows.
For example, if you wanted to allow all the data to be exported, or even viewed, one page at a time, a primary key field with an index can be used to retrieve a subset, such as 20 rows at a time, resulting in better "perceived" performance rather than waiting for the entire table to be exported before viewing any of the data.
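The app here is presumably a web stack talking to MySQL or similar, but the paging idea in the last paragraph is database-agnostic: walk an indexed primary key and fetch a small batch per query. A minimal sketch against SQLite's C API, purely to illustrate the pattern (the users table and its columns are invented):
#include <stdio.h>
#include <sqlite3.h>

/* Keyset pagination sketch: export 20 rows at a time by walking the indexed
   primary key instead of pulling the whole table in one statement. */
int export_in_pages(sqlite3 *db)
{
    sqlite3_stmt *stmt;
    const char *sql =
        "SELECT id, name, email FROM users WHERE id > ? ORDER BY id LIMIT 20;";
    sqlite3_int64 last_id = 0;
    int rows;

    if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) != SQLITE_OK)
        return -1;

    do {
        rows = 0;
        sqlite3_bind_int64(stmt, 1, last_id);
        while (sqlite3_step(stmt) == SQLITE_ROW) {
            last_id = sqlite3_column_int64(stmt, 0);
            /* write one CSV line per row (quoting/escaping omitted) */
            printf("%lld,%s,%s\n", (long long)last_id,
                   (const char *)sqlite3_column_text(stmt, 1),
                   (const char *)sqlite3_column_text(stmt, 2));
            rows++;
        }
        sqlite3_reset(stmt);   /* reuse the prepared statement for the next page */
    } while (rows == 20);      /* a short page means we reached the end */

    sqlite3_finalize(stmt);
    return 0;
}
The WHERE id > ? / ORDER BY id / LIMIT combination lets the primary-key index locate each page directly; OFFSET-based paging, by contrast, gets slower as the offset grows because the skipped rows still have to be scanned.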

MongoDB / Mongoid embedded document loading with ror

I have a document (DataSet) with many embedded documents (DataPoint) (a 1:N relation). Since this appears to me as an array in Rails, if I want to read every 20th element, for example, will it load every element into memory, or only every 20th element?
I am trying to figure out if this will be inefficient. I would like ideally only to load what I need from the DB.
Here is an example:
a = DataSet.first
# take every 20th embedded DataPoint by index (exclusive range avoids a trailing nil)
points = a.data_points.values_at(*(0...a.data_points.count).step(20))
Is this bad? Is there a mongoid specific way to do this?
Embedded documents aren't relations (in the typical RDBMS fashion) but are actually embedded (hence the name) within the parent record, just like any other attribute. So when you call DataSet.first, you're loading the entire document, as well as its embedded records, into memory.
Depending on how your application is structured, you may see a benefit from denormalizing every 20th DataPoint into a separate embedded relation (during a callback, or in a background task, or something like that), and then when you load the document, load only those points with DataSet.only(:datapoints_sample).first - which will load only that relation into memory (and no other attributes).

Reading from SQLite in Xcode iOS SDK

I have a program with a SQLite table of about 100 rows and 6 columns of text (no more than a hundred characters each). Each time the user clicks a button, the program displays the contents of one row of the table in the view.
My question is: should I copy the contents of the whole table into an array and read from the array each time the user clicks the button, or should I access the table in the database each time the user clicks the button? Which one is more efficient?
It all depends, but retrieving from the database as you need data (rather than storing the whole thing in an array) would generally be the most efficient use of memory, which is a pretty precious resource. My most extravagant use of memory would be to store an array of solely the table's unique identifiers (i.e. a list of primary keys, returned in the correct order); that way I'm not scanning through the database every time, but since my unique identifiers are always numeric, it doesn't use up too much memory. So, for something of this size, I generally:
open the database;
load an array of solely the table's unique identifiers (which were returned in the right order for my tableview using the SQL ORDER BY clause);
as table cells are built, I'll go back to the database and get the few fields I need for that one row corresponding to the unique identifier that I've kept track of for that one table row;
when I go to the details view, I'll again get that one row from the database; and
when I'm all done, I'll close the database
This yields good performance while not imposing too much of a drain on memory. If the table was different (much larger or much smaller) I might suggest different approaches, but this seems reasonable to me given your description of your data.
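Since iOS ultimately talks to SQLite through its C API, a stripped-down sketch of that approach might look like the following; the entries table, its columns and the buffer handling are invented, and real code should check every return value:
#include <stdio.h>
#include <sqlite3.h>

/* Step 1: load just the ordered list of row ids (small, numeric, cheap). */
int load_row_ids(sqlite3 *db, sqlite3_int64 *ids, int max_ids)
{
    sqlite3_stmt *stmt;
    int n = 0;

    sqlite3_prepare_v2(db, "SELECT id FROM entries ORDER BY title;", -1, &stmt, NULL);
    while (n < max_ids && sqlite3_step(stmt) == SQLITE_ROW)
        ids[n++] = sqlite3_column_int64(stmt, 0);
    sqlite3_finalize(stmt);
    return n;                  /* number of rows found */
}

/* Step 2: when a row is tapped, fetch only that one row's text. */
int load_row_text(sqlite3 *db, sqlite3_int64 row_id, char *buf, int buf_len)
{
    sqlite3_stmt *stmt;
    int ok = -1;

    sqlite3_prepare_v2(db, "SELECT body FROM entries WHERE id = ?;", -1, &stmt, NULL);
    sqlite3_bind_int64(stmt, 1, row_id);
    if (sqlite3_step(stmt) == SQLITE_ROW) {
        snprintf(buf, buf_len, "%s", (const char *)sqlite3_column_text(stmt, 0));
        ok = 0;
    }
    sqlite3_finalize(stmt);
    return ok;
}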
For me, it was much easier to re-read the database on each view load and drop the contents when done.
The overhead of keeping the contents in memory just wasn't worth it compared with the quick read of a small dataset.
100 rows and 6 columns is not a lot of data. iOS is capable of handling much larger data than that, and very efficiently. So don't worry about creating a new array; reading from the database should work just fine.

Indexed View vs. Aggregate Table

It appears that indexed views and aggregate tables are used for the same purpose: To precompute aggregates in order to improve query performance. What are the benefits to using one approach over another? Is it ease of maintenance when using the views versus having to maintain the ETL required for the aggregate table?
You seem to be using SQL Server, so here are some points to consider.
An indexed view may or may not contain aggregations.
There is a list of functions (operators, keywords) that cannot be used in indexed views, many of them aggregates.
An indexed view requires schema binding to the tables referenced by the view.
Also, disabling an index on the view physically deletes the data. In data warehousing, all indexes are usually dropped or disabled during loading, so rebuilding this index would have to re-aggregate the whole table after every major (daily?) load, as opposed to an aggregate table, which may be updated only for the last day or so.
