How to increase resources available to Plastic Merge Tool? - plasticscm

When trying to merge changes in UE4's massive Commit.gitdeps.xml, I get the error "The files have too many differences, there aren't enough resources to complete the operation.". Is there a setting I can change in order to work around this error, such as specifying a higher RAM usage limit for mergetool?

f you are getting the "Unable to calculate the differences. The files have too many differences, there aren't enough resources to complete the operation" message, it means that you have reached the maximum differences limit.
The limit is 32k differences per operation. The file size shouldn´t be a problem.
It needs more or less 1,5Gb memory. We decided to limit the max differences because after this limit the memory increases a lot (we have the algorithm optimized until these values).
In order to reduce the differences, you can also try changing the comparison method in the external merge tool. This way, the differences number should be reduced.
In the Plastic "Preferences" panel, you can also configure a different merge/diff tool.

Related

How do I calculate memory bandwidth on a given (Linux) system, from the shell?

I want to write a shell script/command which uses commonly-available binaries, the /sys fileystem or other facilities to calculate the theoretical maximum bandwidth for the RAM available on a given machine.
Notes:
I don't care about latency, just bandwidth.
I'm not interested in the effects of caching (e.g. the CPU's last-level cache), but in the bandwidth of reading from RAM proper.
If it helps, you may assume a "vanilla" Intel platform, and that all memory DIMMs are identical; but I would rather you not make this assumption.
If it helps, you may rely on root privileges (e.g. using sudo)
#einpoklum you should have a look at Performance Counter Monitor available at https://github.com/opcm/pcm. It will give you the measurements that you need. I do not know if it supports kernel 2.6.32
Alternatively you should also check Intel's EMON tool which promises support for kernels as far back as 2.6.32. The user guide is listed at https://software.intel.com/en-us/download/emon-user-guide, which implies that it is available for download somewhere on Intel's software forums.
I'm not aware of any standalone tool that does it, but for Intel chips only, if you know the "ARK URL" for the chip, you could get the maximum bandwidth using a combination of a tool to query ARK, like curl, and something to parse the returned HTML, like xmllint --html --xpath.
For example, for my i7-6700HQ, the following works:
curl -s 'https://ark.intel.com/products/88967/Intel-Core-i7-6700HQ-Processor-6M-Cache-up-to-3_50-GHz' | \
xmllint --html --xpath '//li[#class="MaxMemoryBandwidth"]/span[#class="value"]/span/text()' - 2>/dev/null
This returns 34.1 GB/s which is the maximum theoretical bandwidth of my chip.
The primary difficulty is determining the ARK URL, which doesn't correspond in an obvious way to the CPU brand string. One solution would be to find the CPU model on an index page like this one, and follow the link.
This gives you the maximum theoretical bandwidth, which can be calculated as (number of memory channels) x (trasfer width) x (data rate). The data rate is the number of transfers per unit time, and is usually the figure given in the name of the memory type, e.g., DDR-2133 has a data rate of 2133 million transfers per second. Alternately you can calculate it as the product of the bus speed (1067 MHz in this case) and the data rate multiplier (2 for DDR technologies).
For my CPU, this calculation gives 2 memory channels * 8 bytes/transfer * 2133 million transfers/second = 34.128 GB/s, consistent with the ARK figure.
Note that theoretical maximum as reported by ARK might be lower or higher than the theoretical maximum on your particular system for various reasons, including:
Fewer memory channels populated than the maximum number of channels. For example, if I only populated one channel on my dual channel system, theoretical bandwidth would be cut in half.
Not using the maximum speed supported RAM. My CPU supports several RAM types (DDR4-2133, LPDDR3-1866, DDR3L-1600) with varying speeds. The ARK figure assumes you use the fastest possible supported RAM, which is true in my case, but may not be true on other systems.
Over or under-clocking of the memory bus, relative to the nominal speed.
Once you get the correct theoretical figure, you won't actually reach this figure in practice, due to various factors including the following:
Inability to saturate the memory interface from one or more cores due to limited concurrency for outstanding requests, as described in the section "Latency Bound Platforms" in this answer.
Hidden doubling of bandwidth implied by writes that need to read the line before writing it.
Various low-level factors relating the DRAM interface that prevents 100% utilization such as the cost to open pages, the read/write turnaround time, refresh cycles, and so on.
Still, using enough cores and non-termporal stores, you can often get very close to the theoretical bandwidth, often 90% or more.

Spreadsheet Gear -- Generating large report via copy and paste seems to use a lot of memory and processor

I am attempting to generate a large workbook based report with 3 supporting worksheets of 100,12000 and 12000 rows and a final output sheet all formula based that ends up representing about 120 entities at 100 rows a piece. I generate a template range and copy and paste it replacing the entity ID cell after pasting each new range. It is working fine but I noticed that memory usage in the IIS Express process is approx 500mb and it is taking 100% processor usage as well.
Are there any guidelines for generating workbooks in this manner?
At least in terms of memory utilization, it would help to have some comparison, maybe against Excel, in how much memory is utilized to simply have the resultant workbook opened. For instance, if you were to open the final report in both Excel and the "SpreadsheetGear 2012 for Windows" application (available in the SpreadsheetGear folder under the Start menu), what does the Task Manager measure for each of these applications in terms of memory consumption? This may provide some insight as to whether the memory utilization you are seeing in the actual report-building process is unusually high (is there a lot of extra overhead for your routine?), or just typical given the size of the workbook you are generating.
In terms of CPU utilization, this one is a bit more difficult to pinpoint and is certainly dependent on your hardware as well as implementation details in your code. Running a VS Profiler against your routine certainly would be interesting to look into, if you have this tool available to you. Generally speaking, the CPU time could potentially be broken up into a couple broad categories—CPU cycles used to "build" your workbook and CPU cycles to "calculate" it. It could be helpful to better determine which of these is dominating the CPU. One way to do this might be to, if possible, ensure that calculations don't occur until you are finished actually generating the workbook. In fact, avoiding any unnecessary calculations could potentially speed things up...it depends on the workbook, though. You could avoid calculations by setting IWorkbookSet.Calculation to Manual mode and not calling any of the IWorkbook’s "Calculate" methods (Calculate/CalculateFull/CalculateFullRebuild) until you are fished up with this process. If you don't have access to a Profiler too, maybe set some timers, Console.WriteLines and monitor the Task Manager to see how your CPU fluctuates during different parts of your routine. With any luck you might be able to better isolate what part of the routine is taking the most amount of time.

Measuring TLB miss handling cost in x86-64

I want to estimate the performance overhead due to TLB misses on a x86-64 (Intel Nehalem) machine running Linux. I wish to get this estimate by using some performance counters. Does anybody has some pointers on what is the best way to estimate this?
Thanks
Arka
If you can get access to a "Westmere" based system the performance characteristics of your code should be quite similar to what you have on the "Nehalem", but you will have access to a new hardware performance counter event that measures almost exactly what you want.
On Westmere, the best estimate of performance lost while waiting for TLB misses to be handled is probably from the hardware performance counter Event 08H, Mask 04H "DTLB_LOAD_MISSES.WALK_CYCLES", which is described as counting "Cycles Page Miss Handler is busy with a page walk due to a load miss in the Second Level TLB".
This is described in "Intel® 64 and IA-32 Architectures Software Developer’s Manual
Volume 3B: System Programming Guide, Part 2" (document number: 253669), available online at
http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.html
The reason this event is necessary is that TLB miss processing time is dominated by the time required to read the cache line containing the page table entry. If that cache line is in the L2 cache, then the overhead of a TLB misses will be very small (of the order of 10 cycles). If the line is in the L3 cache, then maybe 25 cycles. If the line is in memory, then ~200 cycles.
If there is also a miss in the upper-level page translation caches, it will take multiple trips to memory to find and retrieve the desired page table entry (e.g., https://stackoverflow.com/a/9674980/1264917).
On some processors the L2 cache counters can tell you how many table walks hit and missed in the L2, but not on Nehalem. (It would not help a lot in this case since TLB walks that hit in the L3 are also fairly fast and what you really want are the TLB walks that have to go to memory.)

Limiting ImageMagick Memory Use

Basically I'm trying to keep the memory use on my Nginx server under a certain amount, both because I'm insane (according to my friends) & I want to save money. However I'm worried ImageMagick may push it over the edge.
I'm using -limit area 20MiB and I've also tried -limit memory 15MiB -limit map 15MiB but when checking the process (as it runs) through top -c (with Shift-M) and ps aux it shows it using, sometimes, considerably more memory than I've set in the limits. To give numbers it may be using 35MB or 40MB, instead of the 20MB/30MB I would expect. I wouldn't be bothered for 2MB or 3MB but that's quite a large offset.
I've been told the extra memory may be the ImageMagick's overhead as it loads the interpreter etc, but I'm not super familiar with Unix programs so haven't a clue in that department.
If anyone can explain why this is happening, that would be great. If it's a normal thing, great. I'll just adjust things to take into account the fact that it may use my limit plus a certain amount, but if it isn't and the -limit parameter doesn't limit memory to a certain amount, what exactly is the point in having that parameter in ImageMagick?
Again thanks for your help in advance, it's much appreciated, as always.
According the documentation ImageMagick is moving all memory operations to mmaped files, so it will start to swap if you have enough disk space, see the manual:
SNIP from manual -limit:
The value for File is in number of files. The Disk limit is in
Gigabutes and the values for the other resources are in Megabytes. By
default the limits are 768 files, 1024MB memory, 4096MB map, and
unlimited disk, but these are adjusted at startup time on platforms
that can provide information about available resources. When the limit
is reached, ImageMagick will fail in some fashion, or take
compensating actions if possible. For example, -limit memory 32 -limit
map 64 limits memory When the pixel cache reaches the memory limit it
uses memory mapping. When that limit is reached it goes to disk. If
disk has a hard limit, the program will fail.
The Limits only affect ImageMagick's pixel cache. The program code and anything the libraries / delegates may do to load or process the images are not influenced by these settings at all.
You don't specify what you're looking at in top, the proper column would obviously be RES or RSIZE. With such small limits as 20MiB, the program and library code will represent a significant fraction of resident set size.
To verify that you're using the right units for your environment variables, use identify -list resource . If the size of the memory pixel cache (MAGICK_MEMORY_LIMIT) is insufficient for an image, an mmap-ed file will be used (MAGICK_MAP_LIMIT) and if that limit is too low, a conventional disk file (MAGICK_DISK_LIMIT) is used instead. If all the limits are too low, ImageMagick will fail immediately with an error such as cache resources exhausted, Memory allocation failed or corrupt image.

2 Files, Half the Content, vs. 1 File, Twice the Content, Which is Greater?

If I have 2 files each with this:
"Hello World" (x 1000)
Does that take up more space than 1 file with this:
"Hello World" (x 2000)
What are the drawbacks of dividing content into multiple smaller files (assuming there's reason to divide them into more files, not like this example)?
Update:
I'm using a Macbook Pro, 10.5. But I'd also like to know for Ubuntu Linux.
Marcelos gives the general performance case. I'd argue worrying about this is premature optimization. you should split things into different files where it is logical to split them.
also if you really care about file size of such repetitive files then you can compress them.
your example even hints at this, a simple run length encoding of
"Hello World"x1000
is much more space efficient than actually having "hello world" written out 1000 times.
Files take up space in the form of clusters on the disk. A cluster is a number of sectors, and the size depends on how the disk was formatted.
A typical size for clusters is 8 kilobytes. That would mean that the two smaller files would use two clusters (16 kilobytes) each and the larger file would use three clusters (24 kilobytes).
A file will by average use half a cluster more than it's size. So with a cluster size of 8 kilobytes each file will by average have an overhead of 4 kilobytes.
Most filesystems use a fixed-size cluster (4 kB is typical but not universal) for storing files. Files below this cluster size will all take up the same minimum amount.
Even above this size, the proportional wastage tends to be high when you have lots of small files. Ignoring skewness of size distribution (which makes things worse), the overall wastage is about half the cluster size times the number of files, so the fewer files you have for a given amount of data, the more efficiently you will store things.
Another consideration is that metadata operations, especially file deletion, can be very expensive, so again smaller files aren't your friends. Some interesting work was done in ReiserFS on this front until the author was jailed for murdering his wife (I don't know the current state of that project).
If you have the option, you can also tune the file sizes to always fill up a whole number of clusters, and then small files won't be a problem. This is usually too finicky to be worth it though, and there are other costs. For high-volume throughput, the optimal file size these days is between 64 MB and 256 MB (I think).
Practical advice: Stick your stuff in a database unless there are good reasons not to. SQLite substantially reduces the number of reasons.
I think the usage of file(s) is to take into consideration, according to the API and the language used to read/write them (and hence eventually API restrictions).
Fragmentation of the disk, that will tend to decrease with only big files, will penalize data access if you're reading one big file in one shot, whereas several access spaced out time to small files will not be penalized by fragmentation.
Most filesystems allocate space in units larger than a byte (typically 4KB nowadays). Effective file sizes get "rounded up" to the next multiple of that "cluster size". Therefore, dividing up a file will almost always consume more total space. And of course there's one extra entry in the directory, which may cause it to consume more space, and many file systems have an extra intermediate layer of inodes where each file consumes one entry.
What are the drawbacks of dividing
content into multiple smaller files
(assuming there's reason to divide
them into more files, not like this
example)?
More wasted space
The possibility of running out of inodes (in extreme cases)
On some filesystems: very bad performance when directories contain many files (because they're effectively unordered lists)
Content in a single file can usually be read sequentially (i.e. without having to move the read/write head) from the HD, which is the most efficient way. When it spans multiple files, this ideal case becomes much less likely.

Resources