Jena tdbloader performance and limits

When trying to load a current Wikidata dump as documented in Get Your Own Copy of WikiData, following the procedure described in https://muncca.com/2019/02/14/wikidata-import-in-apache-jena/, I am running into performance problems and limits of Apache Jena's tdbloader commands.
There seem to be two versions of it:
tdbloader2
tdb2.tdbloader
The name tdbloader2 for the TDB1 loader is confusing and led me to use it as my first attempt (both invocations are sketched below).
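For reference, both loaders are invoked in essentially the same way from the command line; a minimal sketch, with the database directory and dump file names being examples only:
tdbloader2 --loc /data/wikidata-tdb1 latest-all.ttl.gz        # TDB1 bulk loader, despite the "2" in its name
tdb2.tdbloader --loc /data/wikidata-tdb2 latest-all.ttl.gz    # TDB2 bulk loader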
The experience with TDB1/tdbloader2 was that loading went quite well for the first few billion triples.
The speed was 150k triples/second initially. It then fell to some 100k triples/second at around 9 billion triples. At around 10 billion triples the speed dropped to 15,000 triples/second, and it stayed around 5,000 triples/second when moving towards 11 billion triples.
I had expected the import to have finished by then, so I am now even doubting that the progress counter reports triples rather than lines of Turtle input, which is not the same thing: the input has some 15 billion lines, but only some 11 billion triples are expected.
Since the import had already run for 3.5 days at this point, I had to decide whether to abort it and look for better import options or simply wait a while longer.
So I placed this question on Stack Overflow. Based on AndyS's hint that there are two versions of tdbloader, I aborted the TDB1 import after some 4.5 days, with over 11 billion triples reported as imported in the "data" phase. The performance was down to 2.3k triples/second at that point.
With the modified script using tdb2.tdbloader, the import has been run again in multiple attempts, as documented in the wiki. Two tdb2.tdbloader attempts have already failed with crashing Java VMs, so I switched the hardware from my MacPro to the old Linux box (which is unfortunately slower) and later back again.
I changed the Java virtual machine to a recent OpenJDK after the older Oracle JVM crashed in a first attempt with tdb2.tdbloader. The new Java VM crashed with the same symptoms: # Internal Error (safepoint.cpp:310), see e.g. https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8169477
For the attempts with tdb2.tdbloader I'll assume that 15.7 billion triples need to be imported (one per line of the Turtle file). For a truthy dataset the number would be some 13 billion triples.
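To double-check the lines-versus-triples question independently of the loader's progress output, both numbers can be counted directly; a rough sketch (the dump file name is an example, and riot's --count option is assumed to be available in the Jena release used):
gzip -dc latest-all.ttl.gz | wc -l    # lines of Turtle input
riot --count latest-all.ttl.gz        # triples as parsed by Jena's riot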
If you look at the performance results shown in the wiki article, you'll find a logarithmic performance degradation. For rotating disks the degradation is so bad that the import takes so long that it is not worthwhile waiting for the result (we are talking multiple months here ...).
In the diagram below both axes have a logarithmic scale.
The x-axis shows the log of the total number of triples imported (up to 3 billion when the import was aborted)
The y-axis shows the log of the batch / avg sizes - the number of triples imported in a given time frame.
The more triples are imported, the slower things get, from a top speed of 300,000 triples per second down to as low as 300 triples per second.
With the 4th attempt, the performance was some 1k triples/second after 11 days, with some 20% of the data imported. That would put the estimated finish of the import at 230 days after its start; given the ongoing degradation of the speed, probably quite a bit longer (more than a year).
The target database size was 320 GByte, so hopefully the result will fit into the 4 TByte of disk space allocated for the target and disk space is not the limiting factor.
Since Jonas Sourlier reported success after some 7 days with an SSD disk, I finally asked my project lead to finance a 4 TB SSD disk and lend it to me for experiments. With that disk a fifth attempt was successful: for the truthy dataset, some 5.2 billion triples were imported after about 4.5 days. The bad news is that this is exactly what I didn't want; I had hoped to solve the problem by software and configuration settings and not by throwing quicker and more costly hardware at the problem. Nevertheless, here is the diagram for this import:
I intend to import the full 12 billion triples soon, and for that it would still be good to know how to improve the speed with software/configuration settings or other non-hardware approaches.
I have not yet tuned the Java VM arguments or split the files, as mentioned in the Apache Jena users mailing list discussion from the end of 2017.
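If I do try JVM tuning, the Jena wrapper scripts appear to honour a JVM_ARGS environment variable, so a first attempt would presumably look something like this (the heap size is just an illustrative value):
export JVM_ARGS="-Xmx16G"    # larger heap for the loader JVM (example value)
tdb2.tdbloader --loc /data/wikidata-tdb2 latest-all.ttl.gz
As for splitting: naive line-based splitting of the dump is only safe for a line-oriented format such as N-Triples, not for Turtle with prefix declarations.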
The current import speed is obviously unacceptable. On the other hand, heavily investing in extra hardware is not an option due to a limited budget.
There are some questions that are not answered by the links you'll find in the wiki article mentioned above:
https://www.wikidata.org/wiki/Wikidata:Database_download#RDF_dumps
Failed to install wikidata-query-rdf / Blazegraph
Wikidata on local Blazegraph : Expected an RDF value here, found '' [line 1]
Wikidata import into virtuoso
Virtuoso System Requirements
https://muncca.com/2019/02/14/wikidata-import-in-apache-jena/
https://users.jena.apache.narkive.com/J1gsFHRk/tdb2-tdbloader-performance
What is proven to speed up the import without investing into extra hardware?
e.g. splitting the files, changing VM arguments, running multiple processes ...
What explains the decreasing speed at higher numbers of triples and how can this be avoided?
What successful multi-billion triple imports for Jena do you know of and what are the circumstances for these?

Related

Query on Estimated Savings

We have a question regarding the time saved by rectifying the opportunities presented on Google Lighthouse.
Question 1:
We have embedded an example of a Lighthouse scan from our company's website.
These are the opportunities identified:
Reduce initial server response time (estimated savings: 1.57s)
Enable text compression (estimated savings: 1.02s)
Reduce unused CSS (estimated savings: 0.72s)
Reduce unused JavaScript (estimated savings: 0.55s)
Eliminate render-blocking resources (estimated savings: 0.55s)
May I understand that, assuming all issues identified by Lighthouse are addressed by our developers, we should be able to save at least 4 minutes and 41 seconds of performance speed (4m41s is derived by totaling all the estimated savings identified by Lighthouse)?
Question 2:
If it is true that I am able to save 4 minutes and 41 seconds of performance speed, which of the 6 metrics do I match this data against?
Thank you.
First of all, summing the time estimates in the Opportunities section gives you 4.41 seconds, not 4 minutes 41 seconds. In addition, these are just time estimates, so the actual savings could vary. These estimates are mostly there to help identify the most important areas that can be focused on, prioritizing engineering efforts.
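To make the arithmetic explicit, the five estimates add up as follows:
1.57 s + 1.02 s + 0.72 s + 0.55 s + 0.55 s = 4.41 s
That is about four and a half seconds, not four and a half minutes.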
The metrics shown in the second image are what the score is calculated with, not the opportunities in the first image (though improving those will likely help your score). web.dev (a resource by Google, who makes Lighthouse) has some great resources explaining what the metrics mean and also how the score is calculated from those metrics. These metrics taken together can help you see a bigger picture of how your page is performing.

Castalia Memory Issue

My application layer protocol works fine, but when the number of nodes is large (more than 600) it exits without any error.
I traced the code and didn't find any problem. It seems to be a memory problem, since the number of nodes is large and many operations are being performed.
Update:
In my application:
Each node broadcasts 2 msg/second during the whole simulation time.
The messages contain a lot of information related to my application.
All the nodes are static.
Using BypassRouting, BypassMAC, Radio cc2420.
From my previous experiments, Castalia works for more than 600 nodes and reaches up to 2500, but only with a short simulation time ... so it depends on the relation between the number of nodes, the simulation time, and the number of messages sent per second.
A single experiment runs successfully... but when running, for example, with 30 seeds (i.e. -r 30) and 110 nodes,
it stops after experiment 13 if the simulation time is 1000s,
and it stops after experiment 22 if the simulation time is 600s.
How can I free memory from unnecessary things during the simulation runs?
(Note: previously I increased the swap memory, and that worked up to a certain limit.)
Thanks,
Without more information on your application and the simulation scenario it's hard to provide very specific suggestions. At the very least, you could provide your ini file and information about any custom modules you are using (your application module, for example). Are you using any mobile nodes, for example? Which protocols are you using? What does your app module do? In general, Castalia should be able to handle 600 nodes. In the past, we have tested Castalia with thousands of (static) nodes.
You could use a memory profiler. An excellent tool (a suite of tools really) is valgrind. You can find memory leaks, and you can also memory profile your program. The heap profiler tool of valgrind is called 'massif':
Massif is a heap profiler. It performs detailed heap profiling by taking regular snapshots of a program's heap. It produces a graph showing heap usage over time, including information about which parts of the program are responsible for the most memory allocations. The graph is supplemented by a text or HTML file that includes more information for determining where the most memory is being allocated. Massif runs programs about 20x slower than normal.
Read the valgrind documentation for more info. This is the way you invoke the tool:
valgrind --tool=massif <executable> <arguments>
The executable in this case is CastaliaBin (not the Castalia python script, which is a higher level execution tool).
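After the run finishes, massif writes its snapshots to a file named massif.out.<pid> in the working directory; the ms_print tool that ships with valgrind renders it as a text graph of heap usage:
ms_print massif.out.<pid>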

Torch: RNN clones run out of GPU memory

Karpathy's char-rnn (based on wojciechz's learning_to_execute) uses a common RNN hack:
clone a prototype network as many times as there are time steps per sequence
share the parameters between the clones
I can watch my 5GB GPU memory run out when I clone 217 times (the threshold is likely lower), resulting in this:
opt/torch/install/share/lua/5.1/torch/File.lua:270: cuda runtime error (2) : out of memory at /mounts/Users/student/davidk/opt/torch/extra/cutorch/lib/THC/THCStorage.cu:44
The problem is the clone_many_times() function (linked above). The clones seem to point to the same physical parameter storage in the prototype, but for some reason memory use still explodes.
Has anyone encountered this and/or have an idea how to train really long sequences?
(Same question asked here: https://github.com/karpathy/char-rnn/issues/108)
To run the model, I had to increase the memory capacity on the GPUs. With Sun's Grid Engine, use -l h_vmem=8G for 8 GB.
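For example, a Grid Engine submission requesting 8 GB per slot could look like this (the job script name is just a placeholder):
qsub -l h_vmem=8G train_char_rnn.sh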
Otherwise, you can try torch-rnn. It uses Adam for optimization and hard-codes the RNN/LSTM forward/backward passes for space/time efficiency. This also avoids headaches with cloning models.

Neo4j inserting large files - huge difference in time between

I am inserting a set of files (pdfs, of each 2 MB) in my database.
Inserting 100 files at once takes about 15 seconds, while inserting 250 files at once takes 80 seconds.
I am not quite sure why there is such a big difference, but I assume it is because free memory runs out somewhere between those two batch sizes. Could this be the problem?
If there is any more detail I can provide, please let me know.
Not exactly sure of what is happening on your side but it really looks like what is described here in the neo4j performance guide.
It could be:
Memory issues
If you are experiencing poor write performance after writing some data
(initially fast, then massive slowdown) it may be the operating system
that is writing out dirty pages from the memory mapped regions of the
store files. These regions do not need to be written out to maintain
consistency so to achieve highest possible write speed that type of
behavior should be avoided.
Transaction size
Are you using multiple transactions to upload your files?
Many small transactions result in a lot of I/O writes to disc and
should be avoided. Too big transactions can result in OutOfMemory
errors, since the uncommitted transaction data is held on the Java
Heap in memory.
If you are on linux, they also suggest some tuning to improve performance. See here.
You can look up the details on the page.
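As an illustration of the kind of dirty-page tuning such guides describe (the values below are placeholders only; use whatever the linked guide recommends for your kernel and workload):
sudo sysctl -w vm.dirty_background_ratio=50    # let more dirty pages accumulate before background writeback starts
sudo sysctl -w vm.dirty_ratio=80               # threshold at which writing processes are forced to flush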
Also, if you are on linux, you can check memory usage by yourself during import by using this command:
$ free -m
I hope this helps!

How accurate is Delphi's 'Now' timestamp

I'm going to be working on a project that will require (fairly) accurate time-stamping of incoming RS232 serial and network data from custom hardware. As the data will be coming from a number of independent hardware sources, I will need to timestamp all data so it can be deskewed/interpolated to a nominal point in time.
My immediate thought was just to use the built-in Now function for timestamping; however, a quick Google search seems to indicate that this is only going to be accurate to around 50 msecs or so.
Unfortunately, the more I read the more confused I become. There seems to be a lot of conflicting advice on GetTickCount and QueryPerformanceCounter, with complications due to today's multicore processors and CPU throttling. I have also seen posts recommending the Windows multimedia timers, but I cannot seem to find any code snippets for them.
So, can anyone advise me:
1) How accurate 'Now' will be.
2) Whether there is a simple, higher accuracy alternative.
Note: I would be hoping to timestamp to within, say, 10 milliseconds, and I am not looking for a timer as such, just a better time-stamping method. This will be running on a Windows 7 32-bit low-power micro-PC. I will be using either Delphi XE or Delphi 2007, if it makes any difference.
According to the documentation, Now is accurate only to the nearest second:
Although TDateTime values can represent milliseconds, Now is accurate only to the nearest second.
Despite this, looking at the current implementation, Now is as accurate as the GetLocalTime Windows API allows.
A quick test shows that Now returns values for every millisecond on the clock, for example:
program Project1;
{$APPTYPE CONSOLE}
uses System.SysUtils;
var I: Integer;
begin
  // set the time format so milliseconds are included in the output
  System.SysUtils.FormatSettings.LongTimeFormat := 'hh:mm:ss.zzz';
  for I := 1 to 5000 do
    Writeln(TimeToStr(Now()));
end.
When I executed this console program from the command line as project1 >times.txt on a Windows 7 64-bit machine, I got a file that runs through 29 consecutive milliseconds with none of them missing.
You have to face the fact that, running in a Windows environment, your application/thread may get processor slices at varying intervals, depending on how busy the system is and on the priority of your application/threads relative to all the other threads running in the system.

Resources