I am using ELKI to run the LOF algorithm, but every time I run LOF on the same data set, ELKI reports a different runtime.
I am confused why this is happening.
As @Anony-Mousse mentioned in the comments, getting reliable runtime values is hard.
This is not specific to ELKI. The LOF algorithm does not involve randomness, so one might assume you would get the same runtime every time; in reality, you just don't.
Here are some hints on getting more reliable runtimes:
Do not run multiple experiments in the same process. Start a fresh JVM for each; otherwise, the JVM will likely have to garbage collect the objects of the previous run at some point.
Runtime differences of less than 1 second are not significant. To measure runtimes below a second, you should use specialized tools such as JMH instead.
Run several times, and use the median value, or a truncated mean.
Disable turbo boost, and make sure your CPU is well cooled. On Linux (and I would suggest that you use a dedicated Linux system for benchmarking!), you can disable CPU boost via:
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
My CPU has a 3.4 GHz base clock, but can boost to 3.9 GHz. That is 15% faster, but it cannot run at 3.9 GHz when using all cores. Depending on the number of threads running, the actual clock could be 3.4, 3.7, 3.8, or 3.9 GHz; and it could even temporarily throttle down to as little as 1.6 GHz. So you may also need to monitor your CPU clock speeds (e.g. with grep "^cpu MHz" /proc/cpuinfo). But definitely disable turbo boost.
Do not use -verbose. Logging has a noticeable impact on runtime.
Do not use the MiniGUI. Notice that the MiniGUI displays a command line for you to copy, like KDDCLIApplication -dbc.in file.csv -algorithm ..., which allows you to run the experiment from the console, without GUI overhead.
Do not use visualization. Visualization needs a lot of memory and CPU, and that will have a measurable impact. ELKI tries to measure the algorithm time independently and visualize afterwards, but there might still be some memory cleanup in the background even if you have closed the visualization window. Preferably, for runtime benchmarking, use -resulthandler DiscardResultHandler to discard the actual result, and get the timing from the -time log (the log should be only a few lines long; avoid excessive output).
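Putting these hints together, a console invocation for benchmarking could look roughly like this (a sketch only: the jar name, the KDDCLIApplication class path, the LOF class name, and the -lof.k value are assumptions that may vary between ELKI versions):
java -cp elki.jar de.lmu.ifi.dbs.elki.application.KDDCLIApplication -dbc.in file.csv -algorithm outlier.lof.LOF -lof.k 20 -resulthandler DiscardResultHandler -time
Run it several times, each in a fresh JVM, and take the median of the reported runtimes.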
Neo4j 3.5.12 Community Edition
Ubuntu Server 20.04.2
RAM: 32 GB
EC2 instance with 4 or 8 CPUs (I change it to accommodate for processing at the moment)
Database files: 6.5 GB
Python, WSGI, Flask
dbms.memory.heap.initial_size=17g
dbms.memory.heap.max_size=17g
dbms.memory.pagecache.size=11g
I'm seeing high CPU use on the server in what appears to be a random pattern. I've profiled all the queries for the pages that I know people are visiting at those times, and they are all optimised, with executions under 50 ms in all cases. The CPU use doesn't seem linked to user numbers, which are very low at most times anyway (max 40 concurrent users). I've checked all queries in cron jobs too.
I reduced the number of database nodes significantly and that made no difference to performance.
I warm the database by preloading all nodes into RAM with MATCH (n) OPTIONAL MATCH (n)-[r]->() RETURN count(n.prop) + count(r.prop);
The pattern is that there will be a few minutes of very low CPU use (as I would expect from this setup with these user numbers), and then processing on most CPU cores goes up to the high 90%s and the machine becomes unresponsive to new requests. Changing to an 8-CPU instance sorts it, but that shouldn't be needed for this level of traffic.
I would like to profile the queries with query logging, but the community edition doesn't support that.
Thanks.
Run a CPU profiler such as perf to record where CPU time is spent. You can then visualize it as a FlameGraph or, since your bursts only occur at random intervals, visualize it over time with Netflix's FlameScope.
Since Neo4j is a Java application, it might also be worthwhile to have a look at async-profiler, which is invaluable when it comes to profiling Java applications (it generates similar FlameGraphs and can output log files compatible with FlameScope or JMC).
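For example, a minimal perf recipe could look like this (a sketch; the 99 Hz sampling rate and 60 s duration are arbitrary choices, and stackcollapse-perf.pl / flamegraph.pl come from Brendan Gregg's FlameGraph repository):
sudo perf record -F 99 -a -g -- sleep 60
sudo perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > cpu.svg
Start the recording when a burst begins, then open cpu.svg in a browser to see which stacks dominate.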
I wonder why increasing --ram_utilization_factor is not recommended. From the docs:
This option, which takes an integer argument, specifies what percentage of the system's RAM Bazel should try to use for its subprocesses. This option affects how many processes Bazel will try to run in parallel. The default value is 67. If you run several Bazel builds in parallel, using a lower value for this option may avoid thrashing and thus improve overall throughput. Using a value higher than the default is NOT recommended. Note that Bazel's estimates are very coarse, so the actual RAM usage may be much higher or much lower than specified. Note also that this option does not affect the amount of memory that the Bazel server itself will use.
Since Bazel has no way of knowing how much memory an action/worker uses or will use, --ram_utilization_factor seems to be the only way of setting this up.
That comment is very old and I believe was the result of some trial and error when --ram_utilization_factor was first implemented. The comment was added to make sure that developers would have some memory left over for other applications to run on their machines. As far as I can tell, there is no deeper reason for it.
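If you do decide to raise it anyway, it is an ordinary flag you pass to the build; for example (the value 80 is just an illustrative guess, not a recommendation, and //your:target is a placeholder):
bazel build --ram_utilization_factor=80 //your:target
Watch for swapping while such a build runs before adopting a higher value permanently.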
I wanted to start Erlang while varying the number of cores, in order to test the scalability of my program. I expect that running the program on more cores should be faster than running on fewer cores.
How can I specify the core limits?
In fact, I have tried -smp disable (and I supposed that it would then run on 1 core, wouldn't it?), but the execution time is still the same as with more cores.
I also tried +S 1:1 (assuming 1 scheduler, so as to run on 1 core, as well as other scheduler numbers), but it seems nothing has changed.
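For reference, here is roughly how I started the VM in the two cases (my_module:start stands in for my actual entry point):
erl -smp disable -s my_module start
erl +S 1:1 -s my_module start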
Was that because of the characteristics of my program, or did I do something wrong in specifying the core limits?
And, if possible, could someone give some tips on how to scale Erlang programs?
Thank you very much.
This is the first time that I've actually run into timing issues with a task I have to tackle. I need to do a calculation (running against a web service) with approximately 7M records. This would take more than 180 hours, so I was thinking about running multiple instances of the web service on EC2 and just running rake tasks in parallel.
Since I have never done this before, I was wondering what needs to be considered.
More precisely:
What's the maximum number of rake tasks I can run (is there any limit at all besides your own machine power)?
What's the maximum number of concurrent connections to a Postgres 9.3 db?
Are there any things to be considered when running multiple active_record.save actions at the same time?
I am looking forward to hearing your thoughts.
Best,
Phil
rake instances
Every time you run rake, you are running a new instance of your Ruby server, with all the associated memory and dependency-loading costs. Look in your Rakefile for the inits.
your number of instances is limited by the memory and CPU used
you must profile each instance's memory and CPU use to know how many can be run
you could write a program to monitor and calculate what's possible, but heuristics will work better for one-off jobs and first experiments.
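As a starting point, a small driver script can launch a handful of rake workers and wait for them all (a sketch; records:process is a hypothetical task that takes a shard index and a shard count):
# launch N parallel rake workers, each handling one shard of the records
N = 4
pids = (0...N).map { |i| spawn("bundle exec rake 'records:process[#{i},#{N}]'") }
pids.each { |pid| Process.wait(pid) }
Start with a small N, watch memory and CPU, and raise it until you run out of headroom.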
datastore
heuristically explore your database capacity, too.
watch for write-locks that create blocking
watch for slow reads due to missing indices
look at your postgres configs to see concurrency limits, cache size, etc.
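For example, you can check the server-side connection ceiling from a Rails console (assuming the pg adapter; remember that each rake process holds its own ActiveRecord connection pool):
# how many concurrent connections Postgres will accept (commonly 100 by default)
ActiveRecord::Base.connection.select_value("SHOW max_connections")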
.save
each rake task is its own Ruby server, so multiple active_record.save actions impact:
blocking/waiting due to write-locking
one instance getting 'old' data that was read prior to another instance's .save
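To avoid acting on stale data, you can take a row lock before updating (a sketch; Record, id, and the counter column are placeholders):
record = Record.find(id)
record.with_lock do        # re-reads the row under SELECT ... FOR UPDATE inside a transaction
  record.counter += 1
  record.save!
end
Note that row locks are exactly the write-locking mentioned above, so this trades stale reads for blocking.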
operational complexity
the number of records (7MM) is just a multiplier for all of the operations that occur upon each record. The operational complexity is the source of limitation, since theoretically, running 7MM workers would solve the problem in the minimum timescale
if 180hr is accurate (dubious), then (180 * 60 * 60 * 1000) / 7000000 == 92.57 ms per process.
Look for any shared-resource that is an IO blocker.
look for any common calculation that you can do in advance and cache. A lookup beats a calc.
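For instance, if every record needs the same reference data, fetch it once up front instead of querying per record (a sketch; Rate and its code column are placeholders):
# one query plus an in-memory hash instead of a database lookup per record
RATES = Rate.all.index_by(&:code).freeze
rate = RATES["EUR"]   # O(1) hash lookup inside the per-record loop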
errata
leave headroom for base OS processes. These will vary by your environment; you mention AWS, but it is best to learn conceptually how to monitor any system for activity
run top in a separate screen / terminal as the rakes are running.
Prefer to run 2 tops in different screens: sort one by memory, sort the other by CPU
have a way to monitor the rakes
watch for events that bubble up to the top of the process list.
if you do this long / well enough, you've profiled your headroom
run more rakes to fill your headroom
don't overrun your memory or you'll get swapping
You may want to consider beanstalk instead, but my guess is you'll find that more complicated than learning all these good foundations, first.
I'm using the savon gem to interact with a SOAP API. I'm trying to send three parallel requests to the API using the parallel gem. Normally each request takes around 13 seconds to complete, so three requests take around 39 seconds. After switching to the parallel gem and sending three parallel requests using 3 threads, it takes around 23 seconds to complete all three, which is really nice, but I'm not able to figure out why it's not completing in 14-15 seconds. I really need to lower the total time, as it directly affects the response time of my website. Any ideas on why this is happening? Are network requests blocking in nature?
I'm sending the requests as follows
Parallel.map(["GDSSpecialReturn", "Normal", "LCCSpecialReturn"], :in_threads => 3){ |promo_plan| self.search_request(promo_plan) }
I also tried using multiple processes, but to no avail.
I have 2 theories:
Part of the workload can't run in parallel, so you don't see a 3x speedup, but a bit less than that. It's very rare to see multithreaded tasks speed up in exact proportion to the number of CPUs used, because there are always a few bits that have to run one at a time. See Amdahl's Law, which provides equations to describe this, and states that:
The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program
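To make that concrete with your numbers (a back-of-the-envelope estimate, assuming Amdahl's model applies cleanly): the speedup with n workers is 1 / ((1 - p) + p/n), where p is the fraction of the work that parallelizes. You observed 39 s / 23 s ≈ 1.7x with n = 3; solving 1 / ((1 - p) + p/3) = 1.7 gives p ≈ 0.6, i.e. roughly 40% of each request behaves as if it runs sequentially.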
Disk I/O is involved, and this runs slower in parallel because of disk seek time, limiting the IO per second. Remember that unless you're on an SSD, the disk has to make part of a physical rotation every time you look for something different on it. With 3 requests at once, the disk is skipping repeatedly over the disk to try to fulfill I/O requests in 3 different places. This is why random I/O on hard drives is much slower than sequential I/O. Even on an SSD, random I/O can be a bit slower, especially if small-block read-write is involved.
I think option 2 is the culprit if you're running your database on the same system. The problem is that when the SOAP calls hit the DB, it gets hit on both of these factors. Even blazing-fast 15000 RPM server hard drives can only manage ~200 IO operations per second. SSDs will do 10,000-100,000+ IO/s. See figures on Wikipedia for ballparks. Though, most databases do some clever memory caching to mitigate the problems.
A clever way to test whether it's factor 2 is to run an in-memory H2 database and test the SOAP calls against that. They'll probably complete much faster, and you should see similar execution times for 1, 3, or $CPU-COUNT requests at once.
That's actually a big question; it depends on many factors.
1. Ruby language implementation
It could differ between MRI, Rubinius, and JRuby, though I am not sure if the parallel gem supports Rubinius and JRuby.
2. Your Machine
How many CPU cores does your machine have? If you have multiple cores, you can leverage them with parallel processes. Have you tried using processes to do this?
Parallel.map(["GDSSpecialReturn", "Normal", "LCCSpecialReturn"]){ |promo_plan| self.search_request(promo_plan) } # by default it will use [number] of processes if you have [number] of CPUs
3. What happens inside self.search_request?
If you are running this in an MRI environment, then because of the GIL your code does not actually run concurrently. Or, to put it precisely: blocking IO calls release the GIL (in MRI's implementation), so only the network call part runs concurrently, but not everything else. That's why I am interested in what other work you do inside self.search_request, because that would have an impact on the overall performance.
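As a quick illustration of that point, plain threads are enough to overlap the network-bound part (a sketch; search_request stands for your method, and any non-IO work in each block still serializes on the GIL under MRI):
threads = ["GDSSpecialReturn", "Normal", "LCCSpecialReturn"].map do |promo_plan|
  Thread.new { search_request(promo_plan) }   # the GIL is released while the socket waits
end
results = threads.map(&:value)                # value joins the thread and returns its result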
So I recommend testing your code in different environments and on different machines (it could behave differently on your local machine and the real production machine, so please do tune and benchmark) to get the best result.
Btw, if you want to know more about threads/processes in Ruby, I highly recommend Jesse Storimer's Working with Ruby Threads; he did a pretty good job explaining all these things.
Hope it helps, thanks.