Exam 70-496: Limit TFS proxy data cache to 15GB

One of the questions in exam 70-496 is about limiting disk usage for the cache. Which of the following would be the right answer:
PercentageBasedPolicy or FixedSizeBasedPolicy?
The question is:
You are the administrator of a TFS system that uses version control proxies at remote sites to reduce the burden on the WAN. The hard disk that stores the cache for a version control proxy server is upgraded to a larger size.
Management wants to ensure that more of the disk is used but not all of it.
You need to ensure that the proxy always uses a maximum of 15 GB for caching.
What should you do?
The link below has the answer, and in my view it should be fixed-size, but various forums say it would be percentage-based. Please advise.
Link to the Microsoft documentation:
http://msdn.microsoft.com/en-us/library/ms400763(v=vs.100).aspx

Fixed-size; the hint is in "always" and "maximum".
You could probably do it with a percentage-based policy if you know the total disk size, but that would be a roundabout way of doing it, unless something like "Fixed: 15000" is not among the possible answers.
The size is specified in megabytes.
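For reference, both policies are set in the cache-limit section of the proxy's Proxy.config file. A rough sketch of what the fixed-size answer would look like (element names as I recall them from the linked MSDN page, so double-check the exact nesting there; the value is illustrative):

<CacheLimitPolicy>
  <!-- Fixed cap in megabytes: 15 GB = 15360 MB, so the cache never grows beyond 15 GB -->
  <FixedSizeBasedPolicy>15360</FixedSizeBasedPolicy>
  <!-- A PercentageBasedPolicy element would instead cap the cache at a percentage of the disk,
       which only works out to 15 GB if you back-calculate from the total disk size -->
</CacheLimitPolicy>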
See also:
https://msdn.microsoft.com/en-us/library/ms400763(v=vs.80).aspx

Related

Perfino System Requirements

We're planning to evaluate and potentially purchase perfino. I went quickly through the docs and cannot find the system requirements for the installation. I also cannot find its compatibility with JBoss 7.1. Can you provide details, please?
There are no hard system requirements for disk space; it depends on the number of business transactions that you're recording. All data is consolidated, so the database reaches a maximum size after a while, but it's not possible to say what that size will be. Consolidation times can be configured in the general settings.
There are also no hard system requirements for CPU and physical memory. A low-end machine will have no problem monitoring 100 JVMs, but the exact details again depend on the number of monitored business transactions.
JBoss 7.1 is supported. "Supported" means that web service and EJB calls can be tracked between JVMs; apart from that, all application servers work with perfino.
I haven't found any official system requirements, but this is what we figured out experimentally.
We collect about 10,000 transactions a minute from 8 JVMs. We have a lot of distinct and long SQL queries. We use an AWS machine with 2 vCPUs and 8 GB RAM.
When the Perfino GUI is not being used, the CPU load is low. However, for the GUI to work properly, we had to modify perfino_service.vmoptions:
-Xmx6000m. Before that we had experienced multiple OutOfMemoryErrors in Perfino when filtering in the transactions view. After changing the memory settings, the GUI runs fine.
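For reference, the whole change is one line in the service's VM options file (pick a value that fits your own load; 6000m is simply what worked for ours):

# perfino_service.vmoptions -- raise the maximum heap of the perfino service
-Xmx6000m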
This means that you need a machine with about 8GB RAM. I guess this depends on the number of distinct transactions you collect. Our limit is high, at 30,000.
After 6 weeks of usage, there's 7GB of files in the perfino directory. Perfino can clear old recordings after a configurable time.

Neo4j: Advice for hardware sizing and config [closed]

I've been playing with a neo4j 2.0 db for a few months and I plan to install the db on a dedicated server. I've already tried several neo4j configs (JVM, caches, ...) but I'm still not sure I've found the best one.
Therefore, it seems better to ask the experts :)
Context
Db primitives:
Nodes = 224,114,478
Relationships = 417,681,104
Properties = 224,342,951
Db files:
nodestore.db = 3.064 GB
relationshipstore.db = 13.460 GB
propertystore.db = 8.982 GB
propertystore.db.string = 5.998 GB
propertystore.db.arrays = 1 kB
OS server:
Windows Server 2012 (64-bit)
Db usage:
Mostly graph traversals using cypher queries.
Performance is not too bad on my dev laptop, even if some queries have huge lags (I suspect the main reason is swapping caused by a lack of RAM).
Graph specificities:
I suspect that some nodes may be huge hubs (up to 1M relationships), but that should remain exceptional.
What would be your advice regarding:
hardware sizing,
neo4j configuration:
heap size,
use of memory-mapped buffers (is there any reason to keep the value set to false on Windows?),
cache type,
recommended JVM settings for Windows,
...
Thanks in advance !
laurent
The on-disk size of your graph is ~32 GB in total. Neo4j has a two-layered cache architecture. The first layer is the file buffer cache; ideally it should have the same size as the on-disk graph, so ~32 GB in your case.
IMPORTANT: When running Neo4j on Windows, the file buffer cache is part of the Java heap (a shortcoming of the Windows implementation); on Linux/Mac it is off-heap. That is the reason why I generally do not recommend Windows for Neo4j production environments.
cache_type should be hpc when using enterprise edition and soft for community.
To leave a reasonable amount for the second cache layer (the object cache), I'd suggest a machine with at least 64 GB RAM. Since the file buffer cache and the object cache are both on the heap here, make the heap large and consider using G1: -Xmx60G -XX:+UseG1GC. Observe GC behaviour by uncommenting the GC logging settings in neo4j-wrapper.conf and tweak the settings step by step.
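As a sketch of where those settings go in a 2.x install, assuming the stock config file names (values are illustrative for a 64 GB box):

# conf/neo4j.properties
# hpc requires the enterprise edition; use soft on community
cache_type=hpc

# conf/neo4j-wrapper.conf  (memory values are in MB; 61440 MB is roughly -Xmx60G)
# keep initmemory equal to maxmemory to avoid heap resizing
wrapper.java.initmemory=61440
wrapper.java.maxmemory=61440
wrapper.java.additional=-XX:+UseG1GC
# re-enable the GC logging lines that ship commented out in this file to watch GC behaviour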
Please note that Neo4j 2.2 might come with a different file buffer cache implementation that works off heap on Windows as well.
Please take a look at these two guides for performance tuning and hardware sizing calculations:
http://neo4j.com/developer/guide-performance-tuning/
http://neo4j.com/developer/guide-sizing-and-hardware-calculator/

Large Storage Solution

We are a small bootstrapped ISP in a third-world country where bandwidth is usually expensive and slow. We recently got a customer who needs a storage solution for tens of TB of mostly video files (it's a TV station). The thing is, I know my way around Linux, but I have never done anything like this before. We have a Backblaze Storage Pod 3 casing which we are thinking of using as a storage server. The server will be connected to the customer directly, so traffic is not going to go through the internet, because 100+ Mbps speeds are unheard of in this part of the world.
I was thinking of using 4 TB HDDs, all formatted with ext4, and using LVM to make them one large volume (50-70 TB at least). The customer then logs in with an FTP-like client and dumps whatever files they want. The customer only sees a single volume, and we can add space as their requirements increase. Of course, this is just on paper from preliminary research, as I don't have prior experience with this kind of system. I also have to take cost into consideration, so I can't go for any proprietary solution.
My questions are:
Is this a sensible way to handle this problem, or are there equally good or better solutions out there?
For large storage solutions (at least, large for me), what are my cost-effective options when it comes to dealing with data corruption and HD failure?
Would love to hear any other solutions and tips you might have. Thanks!
ZFS might be a good option, but there is no native, bug-free ZFS implementation for Linux yet; I would recommend a different operating system if you want to go that route.
Today I would recommend Linux MD RAID 5 on enterprise disks or RAID 6 on consumer/desktop disks. I would not assign more than 6 disks to an array. LVM can then be used to tie the arrays together into a logical volume suitable for ext4.
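As a rough sketch of that layout with the standard tools (device names and sizes are placeholders; adapt and test before pointing it at real data):

# one 6-disk RAID 6 array out of the 4 TB drives
mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[b-g]

# pool the array(s) into a single volume group and one big logical volume
pvcreate /dev/md0
vgcreate storage /dev/md0        # later arrays can be added with: vgextend storage /dev/md1
lvcreate -l 100%FREE -n data storage

# filesystem; volumes larger than 16 TiB need ext4's 64bit feature (-O 64bit on older e2fsprogs)
mkfs.ext4 /dev/storage/data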
The ext4 filesystem is well tested and stable, while XFS might be better for large-file storage. The downside to XFS is that it is not possible to shrink an XFS filesystem. I would prefer ext4 because of its more flexible nature.
Please also take into consideration that backups are still required even if you are storing your data on RAID arrays. The data can silently corrupt or be accidentally deleted.
In the end, everything depends on what the customer wants. Telling the customer the price of the service usually has an effect on the requirements.
I would like to add to the answer that mingalsuo gave. As he stated, it really comes down to the customer requirements. You don't say what, specifically, the customer will do with this data. Is it for archive only? Will they be actively streaming the data? What is your budget for this project? These types of answers will better determine the proposed solution. Here are some options based on a great many assumptions. Maybe one of them will be a good fit for your project.
CAPACITY:
In this case you are not that concerned about performance but are more interested in capacity, so the number of spindles doesn't really matter much. As Mingalsuo stated, put together a set of RAID-6 SATA arrays and use LVM to produce a large volume.
SMALL BUSINESS PERFORMANCE:
In this case, you need performance. The customer is going to store files but also requires a small number of simultaneous data streams. Here you want as many spindles as possible; for streaming, it does little good to focus on the size of the controller cache. Keep in mind that the time to rebuild a failed drive increases with the size of the drive, and during a rebuild your performance will suffer. For these reasons I'd suggest smaller drives, maybe 1 TB at most. This will give you faster rebuild times and more spindles for streaming.
ENTERPRISE PERFORMANCE:
Here you need high performance, similar to what an enterprise demands, with many simultaneous data streams. In this case I would stay away from SATA drives and use 900 GB or 1.2 TB SAS drives instead. I would also suggest that you consider abstracting the storage layer from the server layer: create a Linux server and use iSCSI (or Fibre Channel) to connect to the storage device. This will allow you to load balance if possible, or at the very least make recovery from disaster easier.
NON TRADITIONAL SOLUTIONS:
You stated that the environment has few high-speed connections to the internet. Again, depending on the requirements, you still might consider cloud storage. Hear me out :) Let's assume that the files will be uploaded today, used for the next week or month, and then rarely read. In this case, these files are sitting on (potentially) expensive disks for no reason except archive. Wouldn't it be better to keep those active files on expensive (local) disk until they "retire" and then move them to less expensive disk? There are solutions that do just that. One, for example, is called StorSimple. This is an appliance that contains SAS (and even flash) drives and uses cloud storage to automatically migrate "retired" data from the local storage to cloud storage. Because this data is retired it wouldn't matter if it took longer than normal to move it to the cloud. And, this appliance automatically pulls it back from the cloud to local storage when it is accessed. This solution might be too expensive for your project but there are similar ones that you might find will work for you. The added benefit of this is that your data is automatically backed up by the cloud provider and you have an unlimited supply of storage at your disposal.

How to reduce disk read/write overhead?

I have a website mainly composed of JavaScript, hosted on IIS.
The website requests images from a particular folder on the hard disk and displays them to the end user.
The image requests are very frequent and come in quickly.
Is there any way to reduce the overhead of these disk read operations?
I have heard about memory mapping, where a portion of the hard disk can be mapped and used as if it were primary memory.
Can somebody tell me whether I am right or wrong about this, and if I am right, what the steps are to do it?
If I am wrong, is there any other solution for this?
While memory mapping is a viable idea, I would suggest using memcached. It runs as a distinct process, permits horizontal scaling, and is tried and tested and in active deployment at some of the most demanding websites. A well-implemented memcached setup can reduce disk access significantly.
It also has client bindings for many languages, available online. I assume you want a solution for Java (most of your tags relate to that language). Read this article for installation and other admin tasks. It also has code samples (for Java) that you can start off with.
In pseudocode terms, what you need to do when you receive a request for, let's say, Moon.jpeg is:
String key = md5_hash("Moon.jpeg"); /* Or some other key generation mechanism */
IF key IN memcached
    SUPPLY FROM memcached   /* No disk access */
ELSE
    READ Moon.jpeg FROM DISK
    STORE IN memcached ASSOCIATED WITH key
    SUPPLY
END
This is a very crude algorithm; you can read more about cache algorithms in this Wiki article.
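If a concrete starting point helps, here is roughly the same flow in Java using the spymemcached client (the key prefix, expiry, and directory handling are placeholders, not anything IIS-specific):

import net.spy.memcached.MemcachedClient;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ImageCache {
    private final MemcachedClient cache;
    private final String imageDir;

    public ImageCache(String imageDir) throws IOException {
        // assumes a memcached daemon is running locally on the default port
        this.cache = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        this.imageDir = imageDir;
    }

    public byte[] getImage(String fileName) throws IOException {
        String key = "img:" + fileName;              // or an md5 of the name, as in the pseudocode
        byte[] bytes = (byte[]) cache.get(key);      // cache hit: no disk access
        if (bytes == null) {                         // cache miss: read from disk once
            bytes = Files.readAllBytes(Paths.get(imageDir, fileName));
            cache.set(key, 3600, bytes);             // keep it cached for an hour
        }
        return bytes;
    }
}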
That is the wrong direction: you don't want to use the disk as memory, you want to reduce I/O to the (relatively) slow disks by having the files held in physical memory. In simple scenarios the OS will handle this automagically with its file cache. You may want to check whether Windows provides any tunable parameters, or at least see which performance metrics you can gather.
If I remember correctly (from years ago), IIS handles static files very efficiently thanks to a kernel-mode driver linked to IIS, but only if the request doesn't pass through further ISAPI filters and the like. You can probably find some info related to this on Channel 9.
Longer term, you should look at moving static assets to a CDN such as CloudFront.
Like any problem though... are you sure you have a problem?

What size is considered 'big' for the tbl_version table in TFS_Main database

We are currently experiencing significant waits in the TFS database and are trying to understand whether these are a consequence of the size of the tbl_Version version-history table in that database.
Currently this table contains just over 20 million records, and is taking up approximately 6GB of storage space (total index space is just over 10GB). Looking at the queries that SQL Server is having to deal with, we have high PAGEIOLATCH_SH waits whenever this table is accessed. Obviously we don't have control over the queries being thrown at the database (all part of TFS).
Currently we have TFS on a virtual machine, and in essence we want to understand whether we should (a) move to a physical machine, (b) attempt to reduce the size of tbl_Version, or (c) follow a combination of these.
In our organisation it will be non-trivial to move to a physical server, so I'd like to get a feel for whether our table sizes are 'normal' or not before making any such decision.
PAGEIOLATCH_SH typically indicates a wait for a page to be loaded from disk into memory. From the sounds of it, tbl_Version is not being kept in memory. There are two things you can do to improve the situation:
a. Get more RAM (not sure how much RAM you have on the server).
b. Get a faster disk subsystem.
In TFS 2010 we enable page compression if you have Enterprise Edition of SQL. This should help with the problem.
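Before adding RAM or faster disks, it can be worth double-checking at the SQL Server level that I/O latch waits really dominate; this is a plain DMV query, nothing TFS-specific:

-- top waits since the last restart; PAGEIOLATCH_* near the top suggests the buffer
-- pool is too small for the working set or the disk subsystem is too slow
SELECT TOP (10) wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;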
Based on some 2007 stats from Microsoft: http://blogs.msdn.com/b/bharry/archive/2007/03/13/march-devdiv-dogfood-statistics.aspx probably not the biggest.
But MS (as documented on that blog) had done some DB tuning; I believe this is in TFS 2010, but for earlier versions you'll probably need to talk to MS directly.
Caveat: We're using TFS 2008.
We're currently sitting with about 9GB of data (18GB index) with 31M rows. This is after about a year and a half of usage in an IS shop with 50-60 active developers.
Part of our problem, which we still need to address, is large binaries stored in the version control system. The answer to my question here may provide some information as to whether or not there are a few major offenders that are causing the size of that table to be bigger than you want.
