I'm trying to import data from a relational database into Neo4j. The process goes like this (simplified a bit):
while (!sarBatchService.isFinished()) {
    logger.info("New batch started");
    Date loadDeklFrom = sarBatchService.getStartDateForNewBatch();
    Date loadDeklTo = sarBatchService
            .getEndDateForNewBatch(loadDeklFrom); // = loadDeklFrom + 2 hours
    logger.info("Dates calculated");
    Date startTime = new Date();
    List<Dek> deks = dekLoadManager
            .loadAllDeks(loadDeklFrom, loadDeklTo); // loading data from the relational database (POINT A)
    Date endLoadTime = new Date();
    logger.info("Deks loaded");
    GraphDatabase gdb = template.getGraphDatabase();
    Transaction tx = gdb.beginTx();
    logger.info("Transaction started!");
    try {
        for (Dek dek : deks) {
            /* transform dek into nodes, and save
               these nodes with Neo4jTemplate.save() */
        }
        logger.info("Deks saved");
        Date endImportTime = new Date();
        int aff = sarBatchService.insertBatchData(loadDeklFrom,
                loadDeklTo, startTime, endLoadTime, endImportTime,
                deks.size()); // (POINT B)
        if (aff != 1) {
            String msg = "Something went wrong";
            throw new RuntimeException(msg);
        }
        logger.info("Batch data saved into relational database");
        tx.success();
        logger.info("Transaction marked as success.");
    } catch (NoSuchFieldException | SecurityException
            | IllegalArgumentException | IllegalAccessException
            | NoSuchMethodException | InstantiationException
            | InvocationTargetException e1) {
        logger.error("Something bad happened :(");
        logger.error(e1.getStackTrace().toString());
    } finally {
        logger.info("Closing transaction...");
        tx.close(); // (POINT C)
        logger.info("Transaction closed!");
        logger.info("Need more work? " + !sarBatchService.isFinished());
    }
}
So, the data in the relational database has a timestamp that indicates when it was stored, and I'm loading it in two-hour intervals (POINT A in the code). After that, I iterate through the loaded data, transform it into nodes (spring-data-neo4j nodes), store them in Neo4j, and store information about the current batch (POINT B) in the relational database. I'm logging almost every step to make debugging easier.
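For reference, the two-hour windowing described above amounts to something like the following sketch (the class and method names here are illustrative only, not the real sarBatchService API):

import java.util.Date;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the batch windowing: each batch covers [from, from + 2h),
// and the next batch starts exactly where the previous one ended.
public class BatchWindow {
    private static final long WINDOW_MS = TimeUnit.HOURS.toMillis(2);

    static Date endOf(Date from) {
        return new Date(from.getTime() + WINDOW_MS);
    }

    static Date nextStart(Date previousEnd) {
        return previousEnd; // windows are contiguous: no gap, no overlap
    }
}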
The program successfully finishes 158 batches. The problem starts with the 159th batch: the program stops at POINT C in the code (tx.close()) and waits there for about 4 hours, even though that step usually takes a few seconds. After that it continues normally.
I've tried running it on Tomcat 7 with a 10 GB heap and with a 4 GB heap. The result is the same (it blocks on the 159th batch). The maximum number of nodes in one transaction is between 10k and 15k (7k on average), and the 159th batch has fewer than 10k nodes.
The interesting part is that everything goes well if the data is loaded in 4-hour or 12-hour intervals. Also, if I restart Tomcat or execute only the 159th batch on its own, everything passes without problems.
I'm using Spring 3.2.8 with spring-data-neo4j 3.0.2.
This is Neo4j's messages.log:
...
2014-11-24 15:21:38.080+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for an additional 418ms [total block time: 150.973s]
2014-11-24 15:21:45.722+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for an additional 377ms [total block time: 151.35s]
...
2014-11-24 15:23:57.381+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for an additional 392ms [total block time: 156.593s]
2014-11-24 15:24:06.758+0000 INFO [o.n.k.i.t.x.XaLogicalLog]: Rotating [/home/pravila/data/neo4j/nioneo_logical.log.1] # version=22 to /home/pravila/data/neo4j/nioneo_logical.log.2 from position 26214444
2014-11-24 15:24:06.763+0000 INFO [o.n.k.i.t.x.XaLogicalLog]: Rotate log first start entry # pos=24149878 out of [339=Start[339,xid=GlobalId[NEOKERNL|5889317606667601380|364|-1], BranchId[ 52 49 52 49 52 49 ],master=-1,me=-1,time=2014-11-24 15:23:13.021+0000/1416842593021,lastCommittedTxWhenTransactionStarted=267]]
2014-11-24 15:24:07.401+0000 INFO [o.n.k.i.t.x.XaLogicalLog]: Rotate: old log scanned, newLog # pos=2064582
2014-11-24 15:24:07.402+0000 INFO [o.n.k.i.t.x.XaLogicalLog]: Log rotated, newLog # pos=2064582, version 23 and last tx 267
2014-11-24 15:24:07.684+0000 INFO [o.n.k.i.t.x.XaLogicalLog]: Rotating [/home/pravila/data/neo4j/index/lucene.log.1] # version=6 to /home/pravila/data/neo4j/index/lucene.log.2 from position 26214408
2014-11-24 15:24:07.772+0000 INFO [o.n.k.i.t.x.XaLogicalLog]: Rotate log first start entry # pos=25902494 out of [134=Start[134,xid=GlobalId[NEOKERNL|5889317606667601380|364|-1], BranchId[ 49 54 50 51 55 52 ],master=-1,me=-1,time=2014-11-24 15:23:13.023+0000/1416842593023,lastCommittedTxWhenTransactionStarted=133]]
2014-11-24 15:24:07.871+0000 INFO [o.n.k.i.t.x.XaLogicalLog]: Rotate: old log scanned, newLog # pos=311930
2014-11-24 15:24:07.878+0000 INFO [o.n.k.i.t.x.XaLogicalLog]: Log rotated, newLog # pos=311930, version 7 and last tx 133
2014-11-24 15:24:10.919+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for an additional 214ms [total block time: 156.807s]
2014-11-24 15:24:17.486+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for an additional 405ms [total block time: 157.212s]
...
2014-11-24 15:25:28.692+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for an additional 195ms [total block time: 159.316s]
2014-11-24 15:25:33.238+0000 INFO [o.n.k.i.t.x.XaLogicalLog]: Rotating [/home/pravila/data/neo4j/nioneo_logical.log.2] # version=23 to /home/pravila/data/neo4j/nioneo_logical.log.1 from position 26214459
2014-11-24 15:25:33.242+0000 INFO [o.n.k.i.t.x.XaLogicalLog]: Rotate log first start entry # pos=24835943 out of [349=Start[349,xid=GlobalId[NEOKERNL|-6436474643536791121|374|-1], BranchId[ 52 49 52 49 52 49 ],master=-1,me=-1,time=2014-11-24 15:25:27.038+0000/1416842727038,lastCommittedTxWhenTransactionStarted=277]]
2014-11-24 15:25:33.761+0000 INFO [o.n.k.i.t.x.XaLogicalLog]: Rotate: old log scanned, newLog # pos=1378532
2014-11-24 15:25:33.763+0000 INFO [o.n.k.i.t.x.XaLogicalLog]: Log rotated, newLog # pos=1378532, version 24 and last tx 277
2014-11-24 15:25:37.031+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for an additional 148ms [total block time: 159.464s]
2014-11-24 15:25:45.891+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for an additional 153ms [total block time: 159.617s]
....
2014-11-24 15:26:48.447+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for an additional 221ms [total block time: 161.641s]
I don't know what's going on here...
Please help.
It very much looks like you have a leaking, perhaps outer, transaction there.
In that case the inner transaction that you show actually finishes, but the outer one continues to accumulate state. As Neo4j doesn't suspend outer transactions but purely nests inner ones within them, there will be no real commit until you hit the outer tx.success(); tx.close();.
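For illustration, here is a minimal sketch of that behaviour against the embedded Neo4j 2.x API (assuming a plain GraphDatabaseService; the GraphDatabase returned by template.getGraphDatabase() delegates to the same mechanism):

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Transaction;

public class NestedTxSketch {
    // Neo4j nests transactions flatly: if a transaction is already open on this
    // thread, the inner beginTx() returns a placebo whose success()/close()
    // commit nothing. All state accumulates until the OUTER transaction closes.
    static void importBatch(GraphDatabaseService graphDb) {
        try (Transaction outer = graphDb.beginTx()) {
            try (Transaction inner = graphDb.beginTx()) { // placebo transaction
                // ... create and save nodes here ...
                inner.success();
            } // nothing is written to the store yet
            outer.success();
        } // the real commit happens only here
    }
}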
You should be able to see it if you take a thread dump when it blocks, to check whether it is actually stuck in the commit.
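If attaching jstack from the outside is inconvenient, the dump can also be taken from inside the application; a rough sketch using the standard JMX thread bean (nothing Neo4j-specific, the class name is mine):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Prints every live thread's stack trace, e.g. from a watchdog thread,
// so you can see whether the importer is really stuck inside tx.close().
public class ThreadDumper {
    public static void dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            System.err.print(info);
        }
    }
}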
After hours and hours of searching and testing, I tried to rerun the whole batch job with the 4-hour interval. It also stopped, after the 145th batch (transaction). The difference was that this time it threw an error ("Too many open files"). I set the ulimit for open files to unlimited and now it works. The only mystery is why the program didn't throw an error the first time.
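One way to catch this sort of leak earlier is to log the process's file-descriptor usage once per batch. A sketch using the com.sun.management extension (only available on HotSpot/OpenJDK on Unix-like systems, so treat it as an optional diagnostic):

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Logs how many file descriptors the JVM currently holds versus the limit.
// A count that grows with every batch points at a descriptor leak.
public class FdMonitor {
    public static void logOpenFiles() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            System.out.println("open fds: " + unix.getOpenFileDescriptorCount()
                    + " / max: " + unix.getMaxFileDescriptorCount());
        }
    }
}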
#define READ_COL 4

// Walks the heap tuple column by column with the EXTRACT_HEAP_* macros and
// reads one 32-bit column into slot 3 of the output arrays.
void read_data(kern_colmeta *colmeta,
               int ncols,
               HeapTupleHeaderData *htup,
               cl_char tup_dclass[READ_COL],
               cl_long tup_values[READ_COL])
{
    char *addr; // __attribute__((unused));

    EXTRACT_HEAP_TUPLE_BEGIN(addr, colmeta, ncols, htup);
    EXTRACT_HEAP_TUPLE_NEXT(addr);
    EXTRACT_HEAP_TUPLE_NEXT(addr);
    EXTRACT_HEAP_READ_32BIT(addr, tup_dclass[3], tup_values[3]);
    EXTRACT_HEAP_TUPLE_NEXT(addr);
    EXTRACT_HEAP_TUPLE_NEXT(addr);
    EXTRACT_HEAP_TUPLE_NEXT(addr);
    //EXTRACT_HEAP_READ_POINTER(addr, tup_dclass[1], tup_values[1]);
    EXTRACT_HEAP_TUPLE_NEXT(addr);
    //EXTRACT_HEAP_READ_POINTER(addr, tup_dclass[2], tup_values[2]);
    EXTRACT_HEAP_TUPLE_END();
}

// Top-level function for HLS synthesis: copies the column metadata out of AXI
// memory, casts the tuple buffer, and calls read_data().
void accel(char *a, char *b) //, int* o)
{
#pragma HLS INTERFACE m_axi depth=125 port=a
#pragma HLS INTERFACE m_axi depth=1984 port=b
    kern_colmeta col[16];
    memcpy(col, b, sizeof(kern_colmeta) * 16);

    HeapTupleHeaderData *htup;
    htup = (HeapTupleHeaderData *)a;

    cl_char tup_dclass[READ_COL];
    cl_long tup_values[READ_COL];

    read_data(col, 16, htup, tup_dclass, tup_values);
}
The top function is accel(); the error occurs when EXTRACT_HEAP_READ_32BIT() is called. The C simulation results are normal, but as soon as I try to run synthesis, it fails. The error log looks like this:
INFO: [HLS 200-1510] Running: csynth_design
INFO: [HLS 200-111] Finished File checks and directory preparation: CPU user time: 0 seconds. CPU system time: 0 seconds. Elapsed time: 0 seconds; current allocated memory: 205.967 MB.
INFO: [HLS 200-10] Analyzing design file '/root/gyf/hls/Unable_to_Schedule/kernel_fun.cpp' ...
INFO: [HLS 200-111] Finished Source Code Analysis and Preprocessing: CPU user time: 0.23 seconds. CPU system time: 0.08 seconds. Elapsed time: 0.21 seconds; current allocated memory: 207.566 MB.
INFO: [HLS 200-777] Using interface defaults for 'Vitis' flow target.
INFO: [HLS 200-111] Finished Command csynth_design CPU user time: 3.3 seconds. CPU system time: 0.35 seconds. Elapsed time: 3.56 seconds; current allocated memory: 209.139 MB.
Pre-synthesis failed.
while executing
"source accel.tcl"
("uplevel" body line 1)
invoked from within
"uplevel \#0 [list source $arg] "
INFO: [HLS 200-112] Total CPU user time: 5.81 seconds. Total CPU system time: 0.99 seconds. Total elapsed time: 5.43 seconds; peak allocated memory: 208.758 MB.
INFO: [Common 17-206] Exiting vitis_hls at Fri Feb 25 13:28:33 2022..
The Project code
I have exactly the same error:
INFO: [HLS 200-111] Finished Command csynth_design CPU user time: 45.76 seconds. CPU system time: 0.94 seconds. Elapsed time: 46.21 seconds; current allocated memory: 194.761 MB.
Pre-synthesis failed.
while executing
"source ../scripts/ip_v6.tcl"
("uplevel" body line 1)
invoked from within
"uplevel \#0 [list source $arg] "
I would be interested to know how I can track down the source of the error. I also get the same error message if I launch the vitis_hls GUI.
Yesterday I had a stub problem, which turned out to be an ActiveStorage problem. I work on a system that stores cached data with ActiveStorage. We are using Rails 5.2.3 with the multiverse gem.
I want to load a specific file in the setup of a request spec.
So I create a user object:
user = create :user, name: 'bob'
using this factory:
FactoryBot.define do
  factory :user do
    name { 'Robert' }
    email { 'robert@email.com' }
  end
end
and then try to attach a file to it like so:
user.cached_data.attach(
  io: File.open(Rails.root.join('spec', 'support', 'sample_data.json')),
  filename: 'sample_data.json',
  content_type: 'application/json'
)
My spec gets stuck there; it doesn't even launch, and I get this error:
ActiveRecord::LockWaitTimeout:
Mysql2::Error::TimeoutError: Lock wait timeout exceeded; try restarting transaction: INSERT INTO `active_storage_attachments` (`name`, `record_type`, `record_id`, `blob_id`, `created_at`) VALUES ('cached_data', 'User', 102, 60, '2019-08-22 16:24:07')
One of my coworkers told me that if I removed the config.use_transactional_fixtures = true line from my rails_helper, that would get rid of my transaction problem. But I want to keep that line, which seems like a healthy standard to me.
EDIT 26/08/19
I used the MySQL console to run the SHOW ENGINE INNODB STATUS; command during the lock. From the output I got this list of transactions:
------------
TRANSACTIONS
------------
Trx id counter 295543
Purge done for trx's n:o < 295541 undo n:o < 0 state: running but idle
History list length 66
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 295531, not started
MySQL thread id 26, OS thread handle 0x700008449000, query id 214 localhost db_user
---TRANSACTION 0, not started
MySQL thread id 20, OS thread handle 0x7000083fc000, query id 251 localhost root init
SHOW ENGINE INNODB STATUS
---TRANSACTION 295427, not started
MySQL thread id 1, OS thread handle 0x7000083af000, query id 0 Waiting for requests
---TRANSACTION 295542, ACTIVE 40 sec
2 lock struct(s), heap size 360, 1 row lock(s), undo log entries 1
MySQL thread id 29, OS thread handle 0x700008530000, query id 242 localhost db_user
Trx #rec lock waits 0 #table lock waits 0
Trx total rec lock wait time 0 SEC
Trx total table lock wait time 0 SEC
---TRANSACTION 295541, ACTIVE 40 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1184, 1 row lock(s), undo log entries 1
MySQL thread id 28, OS thread handle 0x7000084e3000, query id 247 localhost db_user update
INSERT INTO `active_storage_attachments` (`name`, `record_type`, `record_id`, `blob_id`, `created_at`) VALUES ('cached_data', 'User', 99, 55, '2019-08-26 10:08:49')
Trx read view will not see trx with id >= 295542, sees < 295532
Trx #rec lock waits 1 #table lock waits 0
Trx total rec lock wait time 0 SEC
Trx total table lock wait time 0 SEC
------- TRX HAS BEEN WAITING 40 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 5557 page no 3 n bits 96 index `PRIMARY` of table `test`.`active_storage_blobs` trx table locks 2 total table locks 2 trx id 295541 lock mode S locks rec but not gap waiting lock hold time 40 wait time before grant 0
------------------
---TRANSACTION 295532, ACTIVE 40 sec
2 lock struct(s), heap size 360, 0 row lock(s), undo log entries 3
MySQL thread id 27, OS thread handle 0x700008496000, query id 243 localhost db_user
Trx #rec lock waits 0 #table lock waits 0
Trx total rec lock wait time 0 SEC
Trx total table lock wait time 0 SEC
I'm running a simple insertion query in the Enterprise edition, neo4j-3.1.3, in the Neo4j Browser. The very first insertion takes 410 ms to execute; subsequent insertions drop to 4 ms.
CQL:
create (n:City {name:"Trichy", lat:50.25, lng:12.21});
//Execution Time : Completed after 410 ms.
Even fetching a single node takes a lot of time.
My CQL Query:
MATCH (n:City) RETURN n LIMIT 25
//Execution time : Started streaming 1 record after 105 ms and completed after 111 ms
I have allotted: dbms.memory.pagecache.size=15g
System:
total used free shared buffers cached
Mem: 11121 8171 2950 611 956 2718
-/+ buffers/cache: 4496 6625
Swap: 3891 0 3891
Why is it taking so much time for a single insertion? Even 4 ms seems too costly for a single insertion with a minimal set of properties.
Fetching is also taking a lot of time.
I'm having difficulty getting ipcluster to start all of the ipengines that I ask for. It appears to be some sort of timeout issue. I'm using IPython 2.0 on a linux cluster with 192 processors. I run a local ipcontroller, and start ipengines on my 12 nodes using SSH. It's not a configuration problem (at least I don't think it is) because I'm having no problems running about 110 ipengines. When I try for a larger amount, some of them seem to die during start up, and I do mean some of them - the final number I have varies a little. ipcluster reports that all engines have started. The only sign of trouble that I can find (other than not having use of all of the requested engines) is the following in some of the ipengine logs:
2014-06-20 16:42:13.302 [IPEngineApp] Loading url_file u'.ipython/profile_ssh/security/ipcontroller-engine.json'
2014-06-20 16:42:13.335 [IPEngineApp] Registering with controller at tcp://10.1.0.253:55576
2014-06-20 16:42:13.429 [IPEngineApp] Starting to monitor the heartbeat signal from the hub every 3010 ms.
2014-06-20 16:42:13.434 [IPEngineApp] Using existing profile dir: u'.ipython/profile_ssh'
2014-06-20 16:42:13.436 [IPEngineApp] Completed registration with id 49
2014-06-20 16:42:25.472 [IPEngineApp] WARNING | No heartbeat in the last 3010 ms (1 time(s) in a row).
2014-06-20 18:09:12.782 [IPEngineApp] WARNING | No heartbeat in the last 3010 ms (1 time(s) in a row).
2014-06-20 19:14:22.760 [IPEngineApp] WARNING | No heartbeat in the last 3010 ms (1 time(s) in a row).
2014-06-20 20:00:34.969 [IPEngineApp] WARNING | No heartbeat in the last 3010 ms (1 time(s) in a row).
I did some googling to see if I could find some wisdom, and the only thing I've come across is http://permalink.gmane.org/gmane.comp.python.ipython.devel/12228. The author seems to think it's a timeout of sorts.
I also tried tripling (90 seconds as opposed to the default 30) the IPClusterStart.early_shutdown and IPClusterEngines.early_shutdown times without any luck.
Thanks - in advance - for any pointers on getting the full use of my cluster.
When I try to execute ipcluster start --n=200 I get: OSError: [Errno 24] Too many open files
This could be what is happening to you too. Try raising the operating system's open-file limit.
There is a part of my app where I perform operations concurrently. They consist of initializing many CALayers and rendering them to bitmaps.
Unfortunately, during these operations (each takes about 2 seconds to complete on an iPhone 4), the Dirty Size indicated by VM Tracker spikes to ~120 MB. Allocations spike to ~12 MB (and do not accumulate). From my understanding, the dirty size is memory that cannot be freed, so my app and all the other apps in the background often get killed.
Incident Identifier: 7E6CBE04-D965-470D-A532-ADBA007F3433
CrashReporter Key: bf1c73769925cbff86345a576ae1e576728e5a11
Hardware Model: iPhone3,1
OS Version: iPhone OS 5.1.1 (9B206)
Kernel Version: Darwin Kernel Version 11.0.0: Sun Apr 8 21:51:26 PDT 2012; root:xnu-
1878.11.10~1/RELEASE_ARM_S5L8930X
Date: 2013-03-18 19:44:51 +0800
Time since snapshot: 38 ms
Free pages: 1209
Active pages: 3216
Inactive pages: 1766
Throttled pages: 106500
Purgeable pages: 0
Wired pages: 16245
Largest process: Deja Dev
Processes
Name UUID Count resident pages
geod <976e1080853233b1856b13cbd81fdcc3> 338
LinkedIn <24325ddfeed33d4fb643030edcb12548> 6666 (jettisoned)
Music~iphone <a3a7a86202c93a6ebc65b8e149324218> 935
WhatsApp <a24567991f613aaebf6837379bbf3904> 2509
MobileMail <eed7992f4c1d3050a7fb5d04f1534030> 945
Console <9925a5bd367a7697038ca5a581d6ebdf> 926 (jettisoned)
Test Dev <c9b1db19bcf63a71a048031ed3e9a3f8> 81683 (active)
MobilePhone <8f3f3e982d9235acbff1e33881b0eb13> 867
debugserver <2408bf4540f63c55b656243d522df7b2> 92
networkd <80ba40030462385085b5b7e47601d48d> 158
notifyd <f6a9aa19d33c3962aad3a77571017958> 234
aosnotifyd <8cf4ef51f0c635dc920be1d4ad81b322> 438
BTServer <31e82dfa7ccd364fb8fcc650f6194790> 275
CommCenterClassi <041d4491826e3c6b911943eddf6aaac9> 722
SpringBoard <c74dc89dec1c3392b3f7ac891869644a> 5062 (active)
aggregated <a12fa71e6997362c83e0c23d8b4eb5b7> 383
apsd <e7a29f2034083510b5439c0fb5de7ef1> 530
configd <ee72b01d85c33a24b3548fa40fbe519c> 465
dataaccessd <473ff40f3bfd3f71b5e3b4335b2011ee> 871
fairplayd.N90 <ba38f6bb2c993377a221350ad32a419b> 169
fseventsd <914b28fa8f8a362fabcc47294380c81c> 331
iapd <0a747292a113307abb17216274976be5> 323
imagent <9c3a4f75d1303349a53fc6555ea25cd7> 536
locationd <cf31b0cddd2d3791a2bfcd6033c99045> 1197
mDNSResponder <86ccd4633a6c3c7caf44f51ce4aca96d> 201
mediaremoted <327f00bfc10b3820b4a74b9666b0c758> 257
mediaserverd <f03b746f09293fd39a6079c135e7ed00> 1351
lockdownd <b06de06b9f6939d3afc607b968841ab9> 279
powerd <133b7397f5603cf8bef209d4172d6c39> 173
syslogd <7153b590e0353520a19b74a14654eaaa> 178
wifid <3001cd0a61fe357d95f170247e5458f5> 319
UserEventAgent <dc32e6824fd33bf189b266102751314f> 409
launchd <5fec01c378a030a8bd23062689abb07f> 126
**End**
On closer inspection, the dirty memory consists mostly of Image IO and Core Animation pages: multiple entries of hundreds to thousands of pages each. What do Image IO and Core Animation do exactly, and how can I reduce the dirty memory?
Edit: I tried doing this on a serial queue and saw no improvement in the size of the dirty memory.
Another question: how large is too large for dirty memory and allocations?
Updated:
- (void) render
{
    for (id thing in mylist) {
        @autoreleasepool {
            CALayer *layer = createLayerFromThing(thing);
            UIImage *img = [self renderLayer:layer];
            [self writeToDisk:img];
        }
    }
}
In createLayerFromThing(thing) I'm actually creating a layer with a huge number of sublayers.
UPDATED
The first screenshot is for maxConcurrentOperationCount = 4, the second for maxConcurrentOperationCount = 1.
Since bringing down the number of concurrent operations barely made a dent, I decided to try maxConcurrentOperationCount = 10.
It's difficult to say what's going wrong without any details but here are a few ideas.
A. Use @autoreleasepool. CALayers generate bitmaps in the background, which in aggregate can take up lots of space if they are not freed in time. If you are creating and rendering many layers, I suggest adding an @autoreleasepool block inside your rendering loop. This won't help if ALL your layers are nested and needed at the same time for rendering.
- (void) render
{
    for (id thing in mylist) {
        @autoreleasepool {
            CALayer *layer = createLayerFromThing(thing);
            [self renderLayer:layer];
        }
    }
}
B. Also, if you are using CGBitmapContextCreate for rendering, are you calling the matching CGContextRelease? The same goes for every CGColorRef.
C. If you are allocating memory with malloc or calloc, are you freeing it when done? One way to ensure that this happens is to pair every allocation with a matching free on every code path.
Post the code for the rendering loop to provide more context.
I believe there are two possibilities here:
The items you create are not autoreleased.
The memory you are taking is what it is, due to the number of concurrent operations you are doing.
In the first case the solution is simple. Send an autorelease message to the layers and images upon their creation.
In the second case, you could limit the number of concurrent operations by using an NSOperationQueue. Operation queues have a property called maxConcurrentOperationCount. I would try with a value of 4 and see how the memory behavior changes from what you currently have. Of course, you might have to try different values to get the right memory vs performance balance.
Autorelease will wait until the end of the run loop to clean up. If you release explicitly, the object can be removed from the heap without filling up the pool.
- (void) render {
    for (id thing in mylist) {
        CALayer *layer = createLayerFromThing(thing); // assuming this thing is retained
        [self renderLayer:layer];
        [layer release]; // this layer no longer needed
    }
}
Also run Build and Analyze, see if you have leaks, and fix those too.