Possible to have two working sets? 1) data 2) code

In regards to Operating System concepts... Can a process have two working sets, one that represents data and another that represents code?

A "Working Set" is a term associated with Virtual Memory Manangement in Operating systems, however it is an abstract idea.
A working set is just the concept that there is a set of virtual memory pages that the application is currently working with, and other pages that it isn't working with. Any page that is currently being used by the application is by definition part of the 'Working Set', so it's impossible to have two.
Operating systems often do distinguish between code and data in a process using various page permissions and memory protection, but this is a different concept from a "Working Set".

This depends on the OS.
But on common OSes like Windows there is no real difference between data and code, so no, a process can't split up its working set into data and code.

As you know, the working set is the set of pages that a process needs to have in primary store to avoid thrashing. If some of these are code, and others data, it doesn't matter - the point is that the process needs regular access to these pages.
If you want to subdivide the working set into code and data and possibly other categorizations, to try to model what pages make up the working set, that's fine, but the working set as a whole is still all the pages needed, regardless of how these pages are classified.
EDIT: Blocking on I/O - does this affect the working set?
Remember that the working set is a model of the pages used over a given time period. When the length of time the process is blocked is short compared to the time period being modelled, then it changes little - the wait is insignificant and the working set over the time period being considered is unaffected.
But when the I/O wait is long compared to the modelled period, then it changes a lot. During the period the process is blocked, its working set is empty. An OS could theoretically swap out all the process's pages on the basis of this.
The working set model attempts to predict what pages the process will need based on its past behaviour. In this case, if the process is still blocked at time t+1, then the model of an empty working set is correct, but as soon as the process is unblocked, its working set will be non-empty - the prediction by the model still says no pages are needed, so the predictive power of the model breaks down. But this is to be expected - you can't really predict the future. Normally. And the working set is expected to change over time.
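To make the model concrete, here is a minimal sketch (Python, with an invented page reference string) of the classic working-set definition: the set of distinct pages referenced in the last tau references. Note that code pages and data pages simply land in the same set.

    # Illustrative sketch of the working-set model; the page numbers and the
    # reference string below are made up for demonstration.

    def working_set(reference_string, t, tau):
        """Return the set of pages referenced in the window (t - tau, t]."""
        start = max(0, t - tau)
        return set(reference_string[start:t])

    # A toy reference string mixing "code" pages (1-3) and "data" pages (7-9).
    refs = [1, 2, 7, 1, 8, 2, 7, 3, 9, 1]

    print(working_set(refs, t=6, tau=4))   # pages 1, 2, 7, 8 - one combined set
    print(working_set(refs, t=10, tau=4))  # pages 1, 3, 7, 9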

This question is from the book "Operating System Concepts". The answer they are looking for (found elsewhere on the web) is:
Yes, in fact many processors provide two TLBs for this very reason. As
an example, the code being accessed by a process may retain the same
working set for a long period of time. However, the data the code
accesses may change, thus reflecting a change in the working set for
data accesses.
Which seems reasonable but is completely at odds with some of the other answers...


any downsides to writing the same file 1000's of times in iOS?

I'm considering overwriting the same small file 1,000's - 100,000's of times in an iOS app. Are there any downsides to this, given that flash memory is rated for 1000's of writes (but not, say, 100,000's)?
Will the system file cache save me if I stick to standard FileHandle operations? (without me having to implement my own such cache)
This has been addressed before: Reading/Writing to/from iPhone's Documents folder performance
Any new insights?
Update in response to some comments below: in general I agree with you that sometimes examining the choice of solution is more critical than helping with the proposed solution itself.
However, for this case, I feel the question is legit. Basically, it applies to any program where there is a small amount of very volatile data that needs to be persisted often: say, a position in a game, or a stock tick, or some counter, or the last key pressed, or something like that. It needs to be reliably read after process restart, so the app can pick up where it left off, hence the question:
Can I use the iOS file system for that? I know I can't write 10,000's of times to actual flash memory - that would burn it out. But will file system operations solve this for me, through some form of caching? Or do I need to do that myself, 'by hand'?
I sort of assume 'yes' (file system will solve) - otherwise other apps that do this (there must be some) would be burning out phones all the time! But: hard to know for sure...
Update again: I asked this question on the Apple forums:
https://forums.developer.apple.com/thread/116740
Still no clear answer. One answer is: just cache it yourself to avoid any such potential problems (and there can be problems: a file write can fail, and increasing the frequency increases the probability of failure in weird ways). Another is: iOS logs so much stuff that there's no way I can write more frequently than that, and that's fine, so no worries... I guess I'll leave this question open for now.
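For what it's worth, the "cache it yourself" suggestion usually boils down to something like the sketch below (Python for brevity; the same pattern maps onto Swift and FileHandle). The class name and the one-second flush interval are assumptions for illustration: keep the volatile value in memory and only flush it to disk at a bounded rate, so thousands of logical updates become a handful of physical writes.

    # Sketch only: coalesce many in-memory updates into few physical writes.
    import json, time, threading

    class ThrottledStore:
        def __init__(self, path, min_interval=1.0):   # interval is an assumption
            self.path = path
            self.min_interval = min_interval
            self.value = None
            self.dirty = False
            self.last_write = 0.0
            self.lock = threading.Lock()

        def update(self, value):
            with self.lock:
                self.value = value
                self.dirty = True
                if time.monotonic() - self.last_write >= self.min_interval:
                    self._flush_locked()

        def flush(self):
            # Call this on app backgrounding / termination so nothing is lost.
            with self.lock:
                if self.dirty:
                    self._flush_locked()

        def _flush_locked(self):
            with open(self.path, "w") as f:
                json.dump(self.value, f)
            self.last_write = time.monotonic()
            self.dirty = False

    store = ThrottledStore("state.json")
    for i in range(100_000):        # 100,000 logical updates...
        store.update({"counter": i})
    store.flush()                   # ...but only a handful of physical writes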

How can I create a golden master for an MVC 4 application

I was wondering how to create a golden master approach to start creating some tests for my MVC 4 application.
"Gold master testing refers to capturing the result of a process, and
then comparing future runs against the saved “gold master” (or known
good) version to discover unexpected changes." - #brynary
It's a large application with no tests, and it will be good to start development with a golden master to ensure the changes we are making to increase test coverage (and hopefully decrease complexity in the long run) don't break the application.
I am thinking about capturing a day's worth of real-world traffic from the IIS logs and using that to create the golden master; however, I am not sure of the easiest or best way to go about it. There is nothing out of the ordinary in the app: lots of controllers with post backs, etc.
I am looking for a way to create a suitable golden master for an MVC 4 application hosted in IIS 7.5.
NOTES
To clarify something in regard to the comments: the "golden master" is a test you can run to verify the output of the application. It is like journalling your application and being able to replay that journal every time you make a change to ensure you haven't broken anything.
When working with legacy code, it is almost impossible to understand
it and to write code that will surely exercise all the logical paths
through the code. For that kind of testing, we would need to
understand the code, but we do not yet. So we need to take another
approach.
Instead of trying to figure out what to test, we can test everything,
a lot of times, so that we end up with a huge amount of output, about
which we can almost certainly assume that it was produced by
exercising all parts of our legacy code. It is recommended to run the
code at least 10,000 (ten thousand) times. We will write a test to run
it twice as much and save the output.
Patkos Csaba - http://code.tutsplus.com/tutorials/refactoring-legacy-code-part-1-the-golden-master--cms-20331
My question is: how do I go about doing this for an MVC application?
Regards
Basically you want to compare two large sets of results and control for variations - in practice, an integration test. I believe that real traffic can't give you the control that I think you need.
Before making any change to the production code, you should do the following:
Create X number of random inputs, always using the same random seed, so you can always generate the same set over and over again. You will probably want a few thousand random inputs.
Bombard the class or system under test with these random inputs.
Capture the outputs for each individual random input.
When you run it for the first time, record the outputs in a file (or database, etc). From then on, you can start changing your code, run the test and compare the execution output with the original output data you recorded. If they match, keep refactoring; otherwise, revert your change and you should be back to green.
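As a rough illustration of that record-and-compare loop (Python for brevity; in practice you would drive an MVC action from a test framework such as NUnit or xUnit), where run_system() and the file name are placeholders standing in for the real system under test:

    # Sketch of a golden master harness: seeded inputs, record once, compare later.
    import json, os, random

    GOLDEN_FILE = "golden_master.json"

    def run_system(x):
        # Placeholder for the real code under test (e.g. an MVC controller call).
        return {"input": x, "result": x * 2}

    def generate_inputs(n=1000, seed=42):
        rng = random.Random(seed)    # fixed seed => identical inputs every run
        return [rng.randint(0, 10_000) for _ in range(n)]

    def capture():
        return [run_system(x) for x in generate_inputs()]

    if not os.path.exists(GOLDEN_FILE):
        with open(GOLDEN_FILE, "w") as f:
            json.dump(capture(), f)          # first run: record the golden master
        print("golden master recorded")
    else:
        with open(GOLDEN_FILE) as f:
            golden = json.load(f)
        assert capture() == golden, "output changed - revert or inspect the diff"
        print("still green")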
This doesn't fit your approach, though. Imagine a scenario in which a user purchases a certain product: you cannot determine the outcome of the transaction (insufficient credit, non-availability of the product), so you cannot trust your input.
However, what you need is a way to replay that data automatically, and browser automation in this case cannot help you much.
You can try a different approach, something like the Lightweight Test Automation Framework or MvcIntegrationTestFramework, which are more appropriate to your scenario.

How can I determine whether a page is volatile, or predict the next time a page's content will be modified?

I'm running a virtual machine, so I can get all the system information. How can I use it to detect whether a page, or relevant pages, are volatile? The result can be just an approximate volatility interval based on empirical observation. I want to use time series analysis to predict the next time a page's content will be modified - is that possible and accurate? Are there any better methods? Thanks very much!
I'm going to answer for pages inside a process, as the question gets very complex if it relates to the OS as a whole.
You can use VirtualQuery() and VirtualQueryEx() to determine the status of a given memory page. This includes whether it is read only, a guard page, a DLL image section, writeable, etc. From these statuses you can infer the volatility of some pages.
All the read-only pages can be assumed to be non-volatile. But that isn't strictly accurate, since you can use VirtualProtect() to change the protection status of a page, and VirtualProtectEx() to do the same in a different process. So you'd need to re-check these.
What about the other pages? Any writeable page you're going to have to check periodically. For example, calculate a checksum and compare it to previous checksums to see if the page has changed, and then record the time between changes.
You could use the NTDLL Function NtQueryInformationProcess() with ProcessWorkingSetWatch to get data on the page faults for the system.
Not sure if this is what you're looking for, but it's the simplest approach I can think of. It's potentially a bit CPU hungry, and reading each page regularly to calculate the checksums will trash your cache.
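As a rough sketch of that checksum-polling idea (Python; the page contents here are plain byte buffers for illustration - in a real tool you would enumerate writable regions with VirtualQuery and read them from the target process), with the prediction done as a simple exponential moving average of past modification intervals, which is a cheap stand-in for a fuller time-series model:

    # Sketch: hash a page periodically, record the time between changes, and
    # predict the next change with an exponential moving average (EMA).
    import hashlib, time

    class PageWatcher:
        def __init__(self, alpha=0.3):      # EMA smoothing factor (assumption)
            self.alpha = alpha
            self.last_hash = None
            self.last_change = None
            self.avg_interval = None

        def poll(self, page_bytes, now=None):
            now = time.monotonic() if now is None else now
            h = hashlib.sha256(page_bytes).hexdigest()
            if self.last_hash is not None and h != self.last_hash:
                if self.last_change is not None:
                    interval = now - self.last_change
                    self.avg_interval = (interval if self.avg_interval is None
                                         else self.alpha * interval
                                              + (1 - self.alpha) * self.avg_interval)
                self.last_change = now
            self.last_hash = h

        def predicted_next_change(self):
            if self.last_change is None or self.avg_interval is None:
                return None                 # not enough history yet
            return self.last_change + self.avg_interval

    w = PageWatcher()
    w.poll(b"page contents v1", now=0.0)
    w.poll(b"page contents v2", now=5.0)    # first change observed at t=5
    w.poll(b"page contents v3", now=12.0)   # 7-second interval recorded
    print(w.predicted_next_change())        # 19.0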

Neo4j Speed when invoked for the first time

I have a simple jQuery call which hits a servlet via GET, and then Neo4j is used to return data in JSON format.
The system is workable after the FIRST query, but the very first time it is used the system is unbelievably slow. This is some kind of initialisation issue. I am using Heroku web hosting.
The code is fairly long so I am not posting now, but are there any known issues regarding the first invocation of Neo4j?
I have done limited testing so far for performance as I had a lot of JSON problems anyway and they only just got resolved.
Summary:
JQuery(LINUX)<--> get (JSON) <---> Neo4j
First Query - response is 10-20 secs
Second Query - time is 2-3 secs
More queries - 2/3 secs.
This is not a one-off; I tested this a few times and always the same pattern comes up.
This is a normal behaviour of Neo4j where store files are mapped into memory lazily for parts of the files that become hot, and becoming hot requires perhaps thousands of requests to such a part. This is a behaviour that has big stores in mind, whereas for smaller stores it merely gets in the way (why not map the whole thing if it fits in memory?).
Then on top of that there is an "object" cache that further optimizes access, which gets populated lazily for requested entities.
Using an SSD instead of spinning media will usually speed up the initial non-memory-mapped random access quite a bit, but in your scenario I recognize that's not viable.
There are thoughts on being more sensitive to hot parts of the store (i.e. memory-mapping them even if they are not as hot) at the start of a database's lifecycle, or more precisely having the heat sensitivity be a function of how much is currently memory mapped versus how much can be mapped at maximum. This has been shown to make initial requests much more responsive.

Storing Data In Memory: Session vs Cache vs Static

A bit of backstory: I am working on a web application that requires quite a bit of time to prep / crunch data before giving it to the user to edit / manipulate. The data request takes ~15-20 secs to complete and a couple of secs to process. Once there, the user can manipulate values on the fly. Any manipulation of values will require the data to be reprocessed completely.
Update: To avoid confusion, I am only making the data call 1 time (the 15 sec hit) and then wanting to keep the results in memory so that I will not have to call it again until the user is 100% done working with it. So, the first pull will take a while, but, using Ajax, I am going to hit the in-memory data to constantly update and keep the response time to around 2 secs or so (I hope).
In order to make this efficient, I am moving the initial data into memory and using Ajax calls back to the server so that I can reduce the processing time needed to handle the recalculation that occurs w/ this user's updates.
Here is my question: with performance in mind, what would be the best way to store this data, assuming that only 1 user will be working w/ this data at any given moment?
Also, the user could potentially be working in this process for a few hours. When the user is working w/ the data, I will need some kind of failsafe to save the user's current data (either in a db or in a serialized binary file) should their session be interrupted in some way. In other words, I will need a solution that has an appropriate hook to allow me to dump out the memory object's data in the case that the user gets disconnected / distracted for too long.
So far, here are my musings:
Session State - Pros: Locked to one user. Has the Session End event which will meet my failsafe requirements. Cons: Slowest perf of my current options. It is sometimes tricky to ensure the Session End event fires properly.
Caching - Pros: Good Perf. Has access to dependencies which could be a bonus later down the line but not really useful in current scope. Cons: No easy failsafe step other than a write based on time intervals. Global in scope - will have to ensure that users do not collide w/ each other's work.
Static - Pros: Best Perf. Easiest to maintain as I can directly leverage my current class structures. Cons: No easy failsafe step other than a write based on time intervals. Global in scope - will have to ensure that users do not collide w/ each other's work.
Does anyone have any suggestions / comments on which option I should choose?
Thanks!
Update: Forgot to mention, I am using VB.Net, Asp.Net, and Sql Server 2005 to perform this task.
I'll vote for secret option #4: use the database for this. If you're talking about a 20+ second turnaround time on the data, you are not going to gain anything by trying to do this in-memory, given the limitations of the options you presented. You might as well set this up in the database (give it a table of its own, or even a separate database if the requirements are that large).
I'd go with the caching method for storing the data across page loads. You can name the cache entry you store the data in to avoid conflicts.
For tracking user-made changes, I'd go with a more old-school approach: append to a text file each time the user makes a change and then sweep that file at intervals to save changes back to DB. If you name the files based on the user/account or some other session-unique indicator then there's no issue with conflict and the app (or some other support app, which might be a better idea in general) can sweep through all such files and update the DB even if the session is over.
The first part of this can be adjusted to stagger the write out more: save changes to Session, then write that to file at intervals, then sweep the file at larger intervals. You can tune it for performance and choose what level of possible user-change loss is acceptable.
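A minimal sketch of that append-and-sweep idea (Python for brevity; the original context is ASP.NET/VB.Net, and the per-session file naming and change format here are assumptions):

    # Sketch: append each user change to a per-session log file, then sweep
    # the file at intervals (or after the session ends) into the database.
    import json, os

    def append_change(session_id, change):
        # One small append per user action - cheap, and conflict-free per session.
        with open(f"changes_{session_id}.log", "a") as f:
            f.write(json.dumps(change) + "\n")

    def sweep(session_id, save_to_db):
        path = f"changes_{session_id}.log"
        if not os.path.exists(path):
            return
        with open(path) as f:
            for line in f:
                save_to_db(json.loads(line))   # replay each change into the DB
        os.remove(path)

    # Usage:
    append_change("abc123", {"field": "price", "value": 9.99})
    sweep("abc123", save_to_db=print)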
Use the Session, but don't rely on it.
Simply, let the user "name" the dataset, and make a point of actively persisting it for the user, either automatically, or through something as simple as a "save" button.
You cannot rely on the session simply because it is (typically) tied to the user's browser instance. If they accidentally close the browser (click the X button), or their PC crashes, etc., then they lose all of their work. Which would be nasty.
Once the user has that kind of control over the "persistent" state of the data, you can rely on the Session to keep it in memory and leverage that as a cache.
I think you've pretty much just answered your question with the pros/cons. But if you are looking for some peer validation, my vote is for the Session. Although the performance is slower (do you know by how much slower?), your processing is going to take a long time regardless. Do you think the user will know the difference between 15 seconds and 17 seconds? Both are "forever" in web terms, so go with the one that seems easiest to implement.
Perhaps a bit off topic, but I'd recommend putting those long processing calls in asynchronous pages (not to be confused with AJAX's asynchronous calls).
Take a look at this article and ping me back if it doesn't make sense.
http://msdn.microsoft.com/en-us/magazine/cc163725.aspx
I suggest to create a copy of the data in a new database table (let's call it EDIT) as you send the initial results to the user. If performance is an issue, do this in a background thread.
As the user edits the data, update the table (also in a background thread if performance becomes an issue). If you have to use threads, you must make sure that the first thread is finished before you start updating the rows.
This allows a user to walk away, come back, even restart the browser and commit whenever she feels satisfied with the result.
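A minimal sketch of that shadow EDIT-table flow (Python with sqlite3 for brevity; the original context is SQL Server from ASP.NET, and the table and column names are assumptions):

    # Sketch: copy live data into an EDIT table, persist edits there, and
    # commit back to the live table only when the user is satisfied.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, value REAL)")
    con.execute("CREATE TABLE edit (id INTEGER PRIMARY KEY, value REAL)")
    con.executemany("INSERT INTO data VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

    # 1. When the initial results are sent to the user, copy them into EDIT.
    con.execute("INSERT INTO edit SELECT * FROM data")

    # 2. As the user edits, persist each change to EDIT (survives disconnects).
    con.execute("UPDATE edit SET value = ? WHERE id = ?", (42.0, 1))

    # 3. When the user commits, overwrite the live table from EDIT.
    con.execute("DELETE FROM data")
    con.execute("INSERT INTO data SELECT * FROM edit")
    con.commit()

    print(con.execute("SELECT * FROM data").fetchall())   # [(1, 42.0), (2, 20.0)]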
One possible alternative to what the others mentioned, is to store the data on the client.
Assuming the dataset is not too large and the code that manipulates it can run client side, you could store the data as an XML data island or JSON object. This data could then be manipulated/processed and handled entirely client side, with no round trips to the server. If you need to persist this data back to the server, the end result could be posted via AJAX or a standard postback.
If this does not work with your requirements I'd go with just storing it on the SQL server as the other comment suggested.
