I need to profile a virtual machine's memory accesses in terms of the number of page faults generated per second and the number of last-level cache misses encountered per second. Is there a standard test suite that helps me achieve this?
Below I describe the exact scenario I need to achieve:
1. Run a program / test suite on a virtual machine to generate an enormous number of page faults.
2. Run a program / test suite on a virtual machine to generate a large number of last-level cache misses.
3. Monitor the number of page faults per second and last-level cache misses per second on the virtual machine.
4. Monitor the corresponding number of page faults and last-level cache misses on the hosting bare-metal machine.
From these measurements I then need to generate a set of analysis results.
Query 1:
Is there a standard test suite that helps me achieve my objective? Please point me to a reference if so. I browsed through the SPEC benchmarks, but I did not find anything of much use for my work.
Query 2:
If there is no such suite, is there a way I can write a program to emulate the scenario described above?
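To make Query 2 concrete, the rough kind of program I have in mind is sketched below. This is only a sketch, not a standard suite: it assumes a Linux guest, a 4 KB page size, and a last-level cache much smaller than the 512 MB buffer, and I would sample the counters externally with something like perf stat -e page-faults,LLC-load-misses while it runs.

# Sketch only: a hand-rolled load generator, not a standard benchmark.
# The sizes below are placeholders and would need tuning to the machine.
import mmap
import random

PAGE = 4096                      # assumed page size
BUF_BYTES = 512 * 1024 * 1024    # should be several times the last-level cache size

def fault_pages(n_pages=100_000):
    # Map fresh anonymous memory and touch every page once; each first touch
    # triggers a (minor) page fault. Closing the map lets the next call fault again.
    buf = mmap.mmap(-1, n_pages * PAGE)
    for off in range(0, n_pages * PAGE, PAGE):
        buf[off] = 1
    buf.close()

def miss_cache(iterations=5_000_000):
    # Read random offsets in a buffer much larger than the LLC so that most
    # accesses miss the last-level cache.
    buf = bytearray(BUF_BYTES)
    total = 0
    for _ in range(iterations):
        total += buf[random.randrange(BUF_BYTES)]
    return total

if __name__ == "__main__":
    while True:                  # keep the counters busy while they are sampled
        fault_pages()
        miss_cache()

Would a loop of page touches and random reads like this be representative enough, or is there a better-established workload for this purpose?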
Any pointers in either direction are appreciated.
Thanks!
I'm running MobileNet SSD and getting around 14 ms per input image. Is it possible for me to run two of these models at the same time on the same Dev Board TPU? For example, I have a backlog of 100 images I want to get through, and the only thing that matters to me is how long it takes to get through all 100, so if I could run 2 or 4 at a time that would be amazing. I tried to read through the docs and I looked at pipelining, but the Edge TPU compiler tells me "~$ Warning: For the given model, you're creating more segments than is necessary". Everything else I've read about running in parallel is about using two physical Edge TPUs. If it's not possible that's fine, I just want to know for sure :)
Thank you
You can run multiple models, but the TPU has limited memory and will swap your models in and out so you may not see a performance improvement by delegating your task to multiple models. However, you could co-compile your models. This process 'compiles' each model with the same identifier (a caching token) which enables them both to run on the TPU without getting swapped in and out.
Compiling models is done with the edgetpu_compiler; the process works like this:
edgetpu_compiler someModel.tflite someOtherModel.tflite
Or with the same model:
edgetpu_compiler someModelA.tflite someModelA_duplicate.tflite
There are some nuances to the process: the order in which you feed the models to the edgetpu_compiler can impact performance, as can the case where the combined models are too big to fit into the TPU's RAM. I suggest starting with this documentation about multiple models.
So I inherited a dated schema that used to work well at the beginning, but it's now experiencing some scaling issues.
Among them, the space used by the indexes is catching my attention, so I would like to know whether they are being used, how many times, etc.
Other than explaining/profiling queries, is there anything else I could use to get this kind of information?
The information you are looking for would fall under metrics monitoring, but index accesses are not one of the metrics Neo4j provides. (Neo4j supports Prometheus, but I don't know whether Prometheus captures that info either.)
But there are some indirect ways you can get this data.
Assuming you have a test server that replicates production, with appropriate load tests, you can try removing the index and seeing how it affects the load tests. (This way is a bit cumbersome, but it probably gives the most accurate measure of how various DB changes affect performance, though only if the load tests accurately reflect production use.)
Alternatively, for a more static analysis: you should only be executing pre-defined, parameterized Cypher queries, so you can PROFILE/EXPLAIN those queries against the DB at different scales, and compare those notes against the Cypher logs (either on the calling end, or using Neo4j metrics monitoring) to get an idea of how often each one is called.
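For example, something like the sketch below could automate that profiling step. It uses the Neo4j Python driver purely as an illustration; the connection details and the example query are placeholders for your own parameterized Cyphers.

# Sketch: PROFILE a parameterized Cypher and check whether the plan touches an index.
# The URI, credentials and query below are placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

def uses_index(plan):
    # Recursively look for index operators (e.g. NodeIndexSeek) in the profiled plan.
    if "Index" in plan["operatorType"]:
        return True
    return any(uses_index(child) for child in plan.get("children", []))

with driver.session() as session:
    summary = session.run(
        "PROFILE MATCH (p:Person {name: $name}) RETURN p", name="Alice"
    ).consume()
    print("index used:", uses_index(summary.profile))

Running something like this against each of your pre-defined queries at different data scales gives you a repeatable record of which plans still rely on which indexes.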
We are capturing information for consumer sites in multiple different report suites.
Is it possible to merge all this data into a parent report suite without adding that parent report suite's ID to the s_account variable?
For example
Site 1 uses report-suite1
s_account = "report-suite1";
Site 2 uses report-suite2
s_account = "report-suite2"
Instead of using
s_account = "report-suite1,report-suite2"
is it possible to merge the data to a 3rd virtual account from the Reports console itself?
The only way you can route data to a separate fully fledged report suite is either via JavaScript (e.g. setting s_account as you have shown in your post), or to ask Adobe to create a VISTA rule.
You didn't state your reasons for not wanting to throw a "global" rsid into your js code. Is it because you don't have the technical resources/ability to do it? If so, and if you want a full 3rd rsid for all the data to go to, then you can ask Adobe to create a VISTA rule. It should be fairly easy for them to set up, but they will charge you for it, and I think they will create one for each report suite. I don't generally recommend going this route unless you really have to, though, mostly because of the cost, but also because you don't have personal visibility into it.
Alternatively, if you do have the tech resources to update the js code, but the cost of throwing another rsid into the mix is an issue (from extra server hits), then you may want to consider replacing all of your report suites with a single global report suite, e.g.
s_account='report-global';
Then, create a Virtual Report Suite for each site. You can go to Components > Virtual Report Suites to set them up. The TL;DR is you create them by pointing at your report-global rsid as the source and then creating a segment based off something unique to the site (e.g. the domain, or maybe some eVar with a site-specific value).
The major downside to going the virtual report suite route is that historical data from your previous report suites will not be available in the same place as this new global report suite and its virtual report suites. But it's a "one time migration" thing, and the historical data won't be lost; you'll just have some extra work on your end referencing it in the old rsids, especially if you want to compare historical to current in the new (virtual) rsids.
The 2nd major thing to consider is unique limits. I'm not sure how much traffic / how many unique values your vars get on your sites, but there is a monthly unique value limit you may have to consider with all of the sites going to the same report suite. Beyond looking at tricks to make values less unique on a case-by-case basis (e.g. removing the query string from URLs), there isn't a good way to solve this except to stick with separate rsids. Well.. Adobe will increase the unique limit on certain vars if you ask them, but it will cost you..
Another alternative to consider is a Rollup report suite. Go to Admin > Report Suites, where your current report suites are listed; to the left you should see Rollups and an Add link next to it. This will create a Rollup report suite made up of data from one or more report suites.
Note though that a Rollup report suite is not the same as a fully fledged report suite. Please refer to the link above for full details/limitations, but the main benefit is that it won't cost you anything except the couple of minutes to set it up in the interface. As for its limitations, the main points of note are that you only get aggregated data, data is not deduped between the rsids, and many reports are limited or not available. In practice, I rarely ever see anybody actually go this route because it's too limited. But hey, maybe it's good enough for you.
I'm running a job which reads roughly 70 GB of compressed data.
In order to speed up processing, I tried to start the job with a large number of instances (500), but after 20 minutes of waiting it doesn't seem to have started processing the data (I have a counter for the number of records read). The reason for using a large number of instances is that, in one of the steps, I need to produce an output similar to an inner join, which results in a much bigger intermediate dataset for later steps.
What is the average delay between when a job is submitted and when it starts executing? Does it depend on the number of machines?
While I might have a bug that causes this behavior, I still wonder what that number/logic is.
Thanks,
G
The time necessary to start VMs on GCE grows with the number of VMs you start, and in general VM startup/shutdown performance can have high variance. 20 minutes would definitely be much higher than normal, but it is somewhere in the tail of the distribution we have been observing for similar sizes. This is a known pain point :(
To verify whether VM startup is actually at fault this time, you can look at Cloud Logs for your job ID, and see if there's any logging going on: if there is, then some VMs definitely started up. Additionally you can enable finer-grained logging by adding an argument to your main program:
--workerLogLevelOverrides=com.google.cloud.dataflow#DEBUG
This will cause workers to log detailed information, such as receiving and processing work items.
Meanwhile, I suggest enabling autoscaling instead of specifying a large number of instances manually - it should gradually scale to the appropriate number of VMs at the appropriate moment in the job's lifetime.
Another possible (and probably more likely) explanation is that you are reading a compressed file that needs to be decompressed before it is processed. It is impossible to seek in the compressed file (since gzip doesn't support it directly), so even though you specify a large number of instances, only one instance is being used to read from the file.
The best way to address this would be to split the single compressed file into many files that are compressed separately.
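If the input is newline-delimited, a one-off re-sharding step like the sketch below would do it. This is only a sketch; the file names and shard size are placeholders, and it assumes each record fits on a single line.

# Sketch: re-shard one large gzip file into many independently compressed pieces
# so that more than one worker can read the input in parallel.
# File names and the shard size are placeholders.
import gzip
import itertools

INPUT = "big-input.gz"
LINES_PER_SHARD = 1_000_000

with gzip.open(INPUT, "rt") as src:
    for shard in itertools.count():
        lines = list(itertools.islice(src, LINES_PER_SHARD))
        if not lines:
            break
        with gzip.open("shard-%05d.gz" % shard, "wt") as out:
            out.writelines(lines)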
The best way to debug this problem would be to try it with a smaller compressed input and take a look at the logs.
I have 357 tests (534 assertions) for my app (using Shoulda). The whole test suite runs in around 80 seconds. Is this time OK? I'm just curious, since this is one of my first apps where I write tests extensively. No fancy stuff in my app.
By the way: I tried to use an in-memory SQLite3 database, but the results were surprisingly worse (around 83 seconds). Any clues here?
I'm using a MacBook with 2 GB of RAM and a 2 GHz Intel Core Duo processor as my development machine.
I don't feel this question is Rails-specific, so I'll chime in.
The main thing about testing is that it should be fast enough for you to run them a lot (as in, all the time). Also, you may wish to split your tests into a few different sets, specifically things like 'long running tests' and 'unit tests'.
One last option to consider, if your database setup is time consuming, would be to create your domain by restoring from a backup, rather than doing a whole bunch of inserts.
Good luck!
You should try this method: https://github.com/dchelimsky/rspec/wiki/spork---autospec-==-pure-bdd-joy- (using Spork to spin up a couple of processes that stay running and batch out your tests). I found it to be pretty quick.
It really depends on what your tests are doing. Test code can be written efficiently or not in exactly the same way as any other code can.
One obvious optimisation in many cases is to write your test code in such a way that everything (or as much as possible) is done in memory, as opposed to many read/writes to the database. However, you may have to change your application code to have the right interfaces to achieve this.
Large test suites can take some time to run.
I generally use "autospec -f" when developing; this only runs the specs that have changed since the last run, which makes it much more efficient to keep your tests running.
Of course, if you are really serious, you will run a Continuous Integration setup like CruiseControl - this will automate your build process and run in the background, checking out your latest code and running the suite.
If you're looking to speed up the runtime of your test suite, then I'd use a test server such as this one from Roman Le Négrate.
You can experiment with preloading fixtures, but it will be harder to maintain and, IMHO, not worth its speed improvements (20% maximum I think, but it depends).
It's known that SQLite is slower than MySQL/PostgreSQL, except for very small, tiny DBs.
As someone already said, you can put the MySQL (or other DB) data files on some kind of RAM disk (I use tmpfs on Linux).
PS: we have 1319 RSpec examples now, and they run in 230 seconds on a 3 GHz Core 2 Duo with 4 GB of RAM, and I think that's fine. So yours is fine too.
As an alternative to in-memory SQLite, you can put a MySQL database on a RAM disk (on Windows) or on tmpfs (on Linux).
MySQL has very efficient buffering, so putting the database in memory does not help much unless you update a lot of data really often.
More significant is how you isolate tests and prepare data for each test.
You can use transactional fixtures. That means each test will be wrapped in a transaction, so the next test will start from the initial state.
This is faster than cleaning up the database before each test.
There are situations when you want to use both transactions and explicit data erasing; here is a good article about it: http://www.3hv.co.uk/blog/2009/05/08/switching-off-transactions-for-a-single-spec-when-using-rspec/