nvprof Warning: The path to CUPTI and CUDA Injection libraries might not be set in LD_LIBRARY_PATH

I get the message in the subject when I try to run a program I developed with OpenACC through Nvidia's nvprof profiler like this:
nvprof ./SFS 4
If I run nvprof with -o [output_file] the warning message doesn't appear, but the output file is not created. What could be wrong here?
The LD_LIBRARY_PATH is set in my .bashrc to /opt/nvidia/hpc_sdk/Linux_x86_64/20.7/cuda/11.0/lib64/ because I found these files there (they have "cupti" and "inj" in their names, so I assumed they are the ones needed):
lrwxrwxrwx 1 root root 19 Aug 4 05:27 libaccinj64.so -> libaccinj64.so.11.0
lrwxrwxrwx 1 root root 23 Aug 4 05:27 libaccinj64.so.11.0 -> libaccinj64.so.11.0.194
...
lrwxrwxrwx 1 root root 16 Aug 4 05:27 libcupti.so -> libcupti.so.11.0
lrwxrwxrwx 1 root root 20 Aug 4 05:27 libcupti.so.11.0 -> libcupti.so.2020.1.0
...
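For reference, the line in my .bashrc is essentially:
export LD_LIBRARY_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/20.7/cuda/11.0/lib64/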
I am on an Ubuntu 18.04 workstation with an Nvidia GeForce RTX 2070, and have CUDA version 11 installed.
The nvidia-smi command gives me this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66 Driver Version: 450.66 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 2070 Off | 00000000:02:00.0 On | N/A |
| 30% 40C P2 58W / 185W | 693MiB / 7981MiB | 3% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
The compilers I have (nvidia and portland) are from the latest Nvidia HPC-SDK, version 20.7-0.
I compile my programs with the -acc -Minfo=accel options; I'm not sure how to set -ta=, or whether it is needed at all.
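For context, my compile line is essentially this (the compiler driver and source file name here are just illustrative):
nvfortran -acc -Minfo=accel -o SFS sfs.f90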
P.S. I am also not sure whether running my code, with or without nvprof, uses the GPU at all, although I did set ACC_DEVICE_TYPE to nvidia.
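The only crude check I know of is to watch nvidia-smi from a second terminal while the program runs, to see whether GPU utilization and memory usage move:
watch -n 1 nvidia-smi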
Any advice would be very welcome.
Cheers

Which nvprof are you using? The one that ships with NV HPC 20.7 or your own install?
This looks very similar to an issue reported yesterday on the NVIDIA DevTalk user forums:
https://forums.developer.nvidia.com/t/new-20-7-version-where-is-the-detail-release-bugfix/146168/4
Granted, this was for Nsight-systems, but it may be the same issue. It appears to be a problem with the 2020.3 version of the profilers, which is the version we ship with the NV HPC 20.7 SDK. As I note there, the Nsight-Systems 2020.4 release should have this fixed, so the workaround would be to download and install 2020.4, or to use a prior release.
https://developer.nvidia.com/nsight-systems
There does seem to be a temporary issue with the Nsight-systems download, which will hopefully be corrected before you see this note.
Also, nvprof is in the process of being deprecated, so you should consider moving to Nsight-systems and Nsight-compute.
https://developer.nvidia.com/blog/migrating-nvidia-nsight-tools-nvvp-nvprof/
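In the meantime, an Nsight-systems run roughly equivalent to nvprof -o would look like this sketch (assuming nsys is on your PATH after installation; the report name is just illustrative):
nsys profile -o sfs_report ./SFS 4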

Related

Unable to start SonarQube, Getting Error in Terminal

I am trying to set up SonarQube on my Mac Pro M1.
I have followed the steps from: here
I have also installed JDK 11.
But I am getting this error:
Running SonarQube...
wrapper | --> Wrapper Started as Console
wrapper | Launching a JVM...
jvm 1 | Wrapper (Version 3.2.3) http://wrapper.tanukisoftware.org
jvm 1 | Copyright 1999-2006 Tanuki Software, Inc. All Rights Reserved.
jvm 1 |
jvm 1 |
jvm 1 | WARNING - Unable to load the Wrapper's native library because none of the
jvm 1 | following files:
jvm 1 | libwrapper-macosx-aarch64-64.dylib
jvm 1 | libwrapper-macosx-universal-64.dylib
jvm 1 | libwrapper.dylib
jvm 1 | could be located on the following java.library.path:
jvm 1 | /Applications/SonarQube/bin/macosx-universal-64/./lib
jvm 1 | Please see the documentation for the wrapper.java.library.path
jvm 1 | configuration property.
jvm 1 | System signals will not be handled correctly.
jvm 1 |
jvm 1 | 2022.08.02 16:08:58 INFO app[][o.s.a.AppFileSystem] Cleaning or creating temp directory /Applications/SonarQube/temp
jvm 1 | 2022.08.02 16:08:58 INFO app[][o.s.a.es.EsSettings] Elasticsearch listening on [HTTP: 127.0.0.1:9001, TCP: 127.0.0.1:58182]
jvm 1 | 2022.08.02 16:08:59 INFO app[][o.s.a.ProcessLauncherImpl] Launch process[ELASTICSEARCH] from [/Applications/SonarQube/elasticsearch]: /Applications/SonarQube/elasticsearch/bin/elasticsearch
jvm 1 | 2022.08.02 16:08:59 INFO app[][o.s.a.SchedulerImpl] Waiting for Elasticsearch to be up and running
jvm 1 | Exception in thread "main" java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release
jvm 1 | at java.base/java.lang.System.setSecurityManager(System.java:416)
jvm 1 | at org.elasticsearch.bootstrap.Security.setSecurityManager(Security.java:99)
jvm 1 | at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:70)
jvm 1 | 2022.08.02 16:08:59 WARN app[][o.s.a.p.AbstractManagedProcess] Process exited with exit value [ElasticSearch]: 1
jvm 1 | 2022.08.02 16:08:59 INFO app[][o.s.a.SchedulerImpl] Process[ElasticSearch] is stopped
jvm 1 | 2022.08.02 16:08:59 INFO app[][o.s.a.SchedulerImpl] SonarQube is stopped
jvm 1 | 2022.08.02 16:08:59 ERROR app[][o.s.a.p.EsManagedProcess] Failed to check status
jvm 1 | org.elasticsearch.ElasticsearchException: java.lang.InterruptedException
jvm 1 | at org.elasticsearch.client.RestHighLevelClient.performClientRequest(RestHighLevelClient.java:2695)
jvm 1 | at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:2171)
jvm 1 | at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:2137)
jvm 1 | at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:2105)
jvm 1 | at org.elasticsearch.client.ClusterClient.health(ClusterClient.java:151)
jvm 1 | at org.sonar.application.es.EsConnectorImpl.getClusterHealthStatus(EsConnectorImpl.java:64)
jvm 1 | at org.sonar.application.process.EsManagedProcess.checkStatus(EsManagedProcess.java:92)
jvm 1 | at org.sonar.application.process.EsManagedProcess.checkOperational(EsManagedProcess.java:77)
jvm 1 | at org.sonar.application.process.EsManagedProcess.isOperational(EsManagedProcess.java:62)
jvm 1 | at org.sonar.application.process.ManagedProcessHandler.refreshState(ManagedProcessHandler.java:223)
jvm 1 | at org.sonar.application.process.ManagedProcessHandler$EventWatcher.run(ManagedProcessHandler.java:288)
jvm 1 | Caused by: java.lang.InterruptedException: null
jvm 1 | at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1048)
jvm 1 | at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:243)
jvm 1 | at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:75)
jvm 1 | at org.elasticsearch.client.RestHighLevelClient.performClientRequest(RestHighLevelClient.java:2692)
jvm 1 | ... 10 common frames omitted
wrapper | <-- Wrapper Stopped
I have checked lots of Stack Overflow questions and solutions regarding the same issue, but I still can't figure it out.
I would really appreciate your help.

tensorflow-gpu is not working with Blas GEMM launch failed

I installed tensorflow-gpu to run my tensorflow code on my GPU, but I can't make it run. It keeps giving the above-mentioned error. Following is my sample code, followed by the error stack trace:
import tensorflow as tf
import numpy as np

def check(W, X):
    return tf.matmul(W, X)

def main():
    W = tf.Variable(tf.truncated_normal([2, 3], stddev=0.01))
    X = tf.placeholder(tf.float32, [3, 2])
    check_handle = check(W, X)
    with tf.Session() as sess:
        tf.initialize_all_variables().run()
        num = sess.run(check_handle,
                       feed_dict={X: np.reshape(np.arange(6), (3, 2))})
        print(num)

if __name__ == '__main__':
    main()
My GPU is a pretty good GeForce GTX 1080 Ti with 11 GB of VRAM, and there is nothing else significant running on it (just Chrome), as you can see in nvidia-smi:
Fri Aug 4 16:34:49 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 381.22 Driver Version: 381.22 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 0000:07:00.0 On | N/A |
| 30% 55C P0 79W / 250W | 711MiB / 11169MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 7650 G /usr/lib/xorg/Xorg 380MiB |
| 0 8233 G compiz 192MiB |
| 0 24226 G ...el-token=963C169BB38ADFD67B444D57A299CE0A 136MiB |
+-----------------------------------------------------------------------------+
Following is the error stack trace:
2017-08-04 15:44:21.585091: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.585110: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.585114: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.585118: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.585122: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.853700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:07:00.0
Total memory: 10.91GiB
Free memory: 9.89GiB
2017-08-04 15:44:21.853724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-08-04 15:44:21.853728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2017-08-04 15:44:21.853734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:07:00.0)
2017-08-04 15:44:24.948616: E tensorflow/stream_executor/cuda/cuda_blas.cc:365] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2017-08-04 15:44:24.948640: W tensorflow/stream_executor/stream.cc:1601] attempting to perform BLAS operation using StreamExecutor without BLAS support
2017-08-04 15:44:24.948805: W tensorflow/core/framework/op_kernel.cc:1158] Internal: Blas GEMM launch failed : a.shape=(1, 5), b.shape=(5, 10), m=1, n=10, k=5
[[Node: layer1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_Placeholder_0_0/_11, layer1/weights/read)]]
Traceback (most recent call last):
File "test.py", line 51, in <module>
_, loss_out, res_out = sess.run([train_op, loss, res], feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(1, 5), b.shape=(5, 10), m=1, n=10, k=5
[[Node: layer1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_Placeholder_0_0/_11, layer1/weights/read)]]
[[Node: layer2/MatMul/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_158_layer2/MatMul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op u'layer1/MatMul', defined at:
File "test.py", line 18, in <module>
pre_activation = tf.matmul(input_ph, weights)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1816, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1217, in _mat_mul
transpose_b=transpose_b, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(1, 5), b.shape=(5, 10), m=1, n=10, k=5
[[Node: layer1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_Placeholder_0_0/_11, layer1/weights/read)]]
[[Node: layer2/MatMul/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_158_layer2/MatMul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
To add to it, my previous CPU-only tensorflow installation worked pretty well. Any help is appreciated. Thanks!
Note: I have CUDA 8.0 with cuDNN 5.1 installed, and their paths are added in my .bashrc.
I had a very similar problem. For me it coincided with an NVIDIA driver update, so I thought it was a problem with the driver. But changing the driver had no effect. What eventually worked for me was cleaning out the NVIDIA cache:
sudo rm -rf ~/.nv/
Found this suggestion in the NVIDIA developer forum:
https://devtalk.nvidia.com/default/topic/1007071/cuda-setup-and-installation/cuda-error-when-running-matrixmulcublas-sample-ubuntu-16-04/post/5169223/
I suspect that during the driver update there were still some compiled files of the old version that were not compatible, or even that were corrupted during the process. Assumptions aside, this solved the problem for me.
For me, the reason for this error was that my CUDA directory and all of its subdirectories and files required root privileges, so tensorflow required root privileges as well to be able to use CUDA. Uninstalling tensorflow and installing it again as the root user solved the problem for me.
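A minimal sketch of that fix, assuming a pip-based installation of the tensorflow-gpu package:
sudo pip uninstall tensorflow-gpu
sudo pip install tensorflow-gpu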
Installing the right NVIDIA driver and CUDA versions for my NVIDIA graphics card (e.g. an NVIDIA RTX 2070 in my case) worked for me.

Neo4j Performance Challenge - How to Improve?

I've been wrangling with Neo4J for the last few weeks, trying to resolve some extremely challenging performance problems. At this point, I need some additional help because I can't determine how to move forward.
I have a graph with a total of approx 12.5 Million nodes and 64 Million relationships. The purpose of the graph is going to be analyzing suspicious financial behavior, so it is customers, accounts, transactions, etc.
Here is an example of the performance challenge:
This query for total nodes takes 96,064ms to complete, which is extremely long.
neo4j-sh (?)$ MATCH (n) RETURN count(n);
+----------+
| count(n) |
+----------+
| 12519940 |
+----------+
1 row
96064 ms
The query for total relationships takes 919,449ms to complete, which seems silly.
neo4j-sh (?)$ MATCH ()-[r]-() return count(r);
+----------+
| count(r) |
+----------+
| 64062508 |
+----------+
1 row
919449 ms
I have 6.6M Transaction nodes. When I attempt to search for transactions that have an amount above $8,000, the query takes 653,637ms, which is also far too long.
neo4j-sh (?)$ MATCH (t:Transaction) WHERE t.amount > 8000.00 return count(t);
+----------+
| count(t) |
+----------+
| 10696 |
+----------+
1 row
653637 ms
Relevant Schema
ON :Transaction(baseamount) ONLINE
ON :Transaction(type) ONLINE
ON :Transaction(amount) ONLINE
ON :Transaction(currency) ONLINE
ON :Transaction(basecurrency) ONLINE
ON :Transaction(transactionid) ONLINE (for uniqueness constraint)
Profile of Query:
neo4j-sh (?)$ PROFILE MATCH (t:Transaction) WHERE t.amount > 8000.00 return count(t);
+----------+
| count(t) |
+----------+
| 10696 |
+----------+
1 row
ColumnFilter
|
+EagerAggregation
|
+Filter
|
+NodeByLabel
+------------------+---------+----------+-------------+------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+------------------+---------+----------+-------------+------------------------------------------+
| ColumnFilter | 1 | 0 | | keep columns count(t) |
| EagerAggregation | 1 | 0 | | |
| Filter | 10696 | 13216382 | | Property(t,amount(62)) > { AUTODOUBLE0} |
| NodeByLabel | 6608191 | 6608192 | t, t | :Transaction |
+------------------+---------+----------+-------------+------------------------------------------+
I am running these in the neo4j shell.
The performance challenges here are starting to create substantial doubt about whether I can even use Neo4j, and seem at odds with the potential the platform offers.
I fully admit that I may have misconfigured something (I'm relatively new to Neo4j), so guidance on what to fix or what to look at is much appreciated.
Here are details of my setup:
System: Linux, Ubuntu, 16GB RAM, 3.5 i5 Proc, 256GB SSD HD
CPU
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
stepping : 3
microcode : 0x12
cpu MHz : 4230.625
cache size : 6144 KB
Memory
$ cat /proc/meminfo
MemTotal: 16115020 kB
MemFree: 224856 kB
MemAvailable: 8807160 kB
Buffers: 124356 kB
Cached: 8429964 kB
SwapCached: 8388 kB
Disk
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/data1--vg-root 219G 32G 177G 16% /
Neo4J.properties
neostore.nodestore.db.mapped_memory=200M
neostore.relationshipstore.db.mapped_memory=1G
neostore.relationshipgroupstore.db.mapped_memory=200M
neostore.propertystore.db.mapped_memory=500M
neostore.propertystore.db.strings.mapped_memory=500M
neostore.propertystore.db.arrays.mapped_memory=50M
neostore.propertystore.db.index.keys.mapped_memory=200M
relationship_auto_indexing=true
Neo4J-Wrapper.properties
wrapper.java.additional=-Dorg.neo4j.server.properties=conf/neo4j-server.properties
wrapper.java.additional=-Djava.util.logging.config.file=conf/logging.properties
wrapper.java.additional=-Dlog4j.configuration=file:conf/log4j.properties
#********************************************************************
# JVM Parameters
#********************************************************************
wrapper.java.additional=-XX:+UseConcMarkSweepGC
wrapper.java.additional=-XX:+CMSClassUnloadingEnabled
wrapper.java.additional=-XX:-OmitStackTraceInFastThrow
# Uncomment the following lines to enable garbage collection logging
wrapper.java.additional=-Xloggc:data/log/neo4j-gc.log
wrapper.java.additional=-XX:+PrintGCDetails
wrapper.java.additional=-XX:+PrintGCDateStamps
wrapper.java.additional=-XX:+PrintGCApplicationStoppedTime
wrapper.java.additional=-XX:+PrintPromotionFailure
wrapper.java.additional=-XX:+PrintTenuringDistribution
# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size in MB.
wrapper.java.initmemory=4096
wrapper.java.maxmemory=6144
Other:
Changed the open file settings for Linux to 40k
I am not running anything else on this machine, no X Windows, no other DB server. Here is a snippet of top while running a query:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15785 neo4j 20 0 12.192g 8.964g 2.475g S 100.2 58.3 227:50.98 java
1 root 20 0 33464 2132 1140 S 0.0 0.0 0:02.36 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
The total file size in the graph.db directory is:
data/graph.db$ du --max-depth=1 -h
1.9G ./schema
36K ./index
26G .
Data loading was extremely hit or miss. Some merges would take less than 60 seconds (even for ~200K to 300K inserts), while some would run for over 3 hours (11,898,514ms for a CSV file with 189,999 rows, merging on one date).
I get constant GC thread blocking:
2015-03-27 14:56:26.347+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 15422ms.
2015-03-27 14:56:39.011+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 12363ms.
2015-03-27 14:56:57.533+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 13969ms.
2015-03-27 14:57:17.345+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 14657ms.
2015-03-27 14:57:29.955+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 12309ms.
2015-03-27 14:58:14.311+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 1928ms.
Please let me know if I should add anything else that would be salient to the discussion.
Update 1
Thank you very much for your help; I just moved, so I was delayed in responding.
Size of Neostore Files:
/data/graph.db$ ls -lah neostore.*
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.id
-rw-rw-r-- 1 neo4j neo4j 110 Apr 2 13:03 neostore.labeltokenstore.db
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.labeltokenstore.db.id
-rw-rw-r-- 1 neo4j neo4j 874 Apr 2 13:03 neostore.labeltokenstore.db.names
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.labeltokenstore.db.names.id
-rw-rw-r-- 1 neo4j neo4j 200M Apr 2 13:03 neostore.nodestore.db
-rw-rw-r-- 1 neo4j neo4j 41 Apr 2 13:03 neostore.nodestore.db.id
-rw-rw-r-- 1 neo4j neo4j 68 Apr 2 13:03 neostore.nodestore.db.labels
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.nodestore.db.labels.id
-rw-rw-r-- 1 neo4j neo4j 2.8G Apr 2 13:03 neostore.propertystore.db
-rw-rw-r-- 1 neo4j neo4j 128 Apr 2 13:03 neostore.propertystore.db.arrays
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.propertystore.db.arrays.id
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.propertystore.db.id
-rw-rw-r-- 1 neo4j neo4j 720 Apr 2 13:03 neostore.propertystore.db.index
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.propertystore.db.index.id
-rw-rw-r-- 1 neo4j neo4j 3.1K Apr 2 13:03 neostore.propertystore.db.index.keys
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.propertystore.db.index.keys.id
-rw-rw-r-- 1 neo4j neo4j 1.7K Apr 2 13:03 neostore.propertystore.db.strings
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.propertystore.db.strings.id
-rw-rw-r-- 1 neo4j neo4j 47M Apr 2 13:03 neostore.relationshipgroupstore.db
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.relationshipgroupstore.db.id
-rw-rw-r-- 1 neo4j neo4j 1.1G Apr 2 13:03 neostore.relationshipstore.db
-rw-rw-r-- 1 neo4j neo4j 1.6M Apr 2 13:03 neostore.relationshipstore.db.id
-rw-rw-r-- 1 neo4j neo4j 165 Apr 2 13:03 neostore.relationshiptypestore.db
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.relationshiptypestore.db.id
-rw-rw-r-- 1 neo4j neo4j 1.3K Apr 2 13:03 neostore.relationshiptypestore.db.names
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.relationshiptypestore.db.names.id
-rw-rw-r-- 1 neo4j neo4j 3.5K Apr 2 13:03 neostore.schemastore.db
-rw-rw-r-- 1 neo4j neo4j 25 Apr 2 13:03 neostore.schemastore.db.id
I read that mapped memory settings are replaced by another cache, and I have commented out those settings.
Java Profiler
JvmTop 0.8.0 alpha - 16:12:59, amd64, 4 cpus, Linux 3.16.0-33, load avg 0.30
http://code.google.com/p/jvmtop
Profiling PID 4260: org.neo4j.server.Bootstrapper
68.67% ( 14.01s) org.neo4j.kernel.impl.nioneo.store.StoreFileChannel.read()
18.73% ( 3.82s) org.neo4j.kernel.impl.nioneo.store.StoreFailureException.<init>()
2.86% ( 0.58s) org.neo4j.kernel.impl.cache.ReferenceCache.put()
1.11% ( 0.23s) org.neo4j.helpers.Counter.inc()
0.87% ( 0.18s) org.neo4j.kernel.impl.cache.ReferenceCache.get()
0.65% ( 0.13s) org.neo4j.cypher.internal.compiler.v2_1.parser.Literals$class.PropertyKeyName()
0.63% ( 0.13s) org.parboiled.scala.package$.getCurrentRuleMethod()
0.62% ( 0.13s) scala.collection.mutable.OpenHashMap.<init>()
0.62% ( 0.13s) scala.collection.mutable.AbstractSeq.<init>()
0.62% ( 0.13s) org.neo4j.kernel.impl.cache.AutoLoadingCache.get()
0.61% ( 0.13s) scala.collection.TraversableLike$$anonfun$map$1.apply()
0.61% ( 0.12s) org.neo4j.kernel.impl.transaction.TxManager.assertTmOk()
0.61% ( 0.12s) org.neo4j.cypher.internal.compiler.v2_1.commands.EntityProducerFactory.<init>()
0.61% ( 0.12s) scala.collection.AbstractTraversable.<init>()
0.61% ( 0.12s) scala.collection.immutable.List.toStream()
0.60% ( 0.12s) org.neo4j.kernel.impl.nioneo.store.NodeStore.getRecord()
0.57% ( 0.12s) org.neo4j.kernel.impl.transaction.TxManager.getTransaction()
0.37% ( 0.08s) org.parboiled.scala.Parser$class.rule()
0.06% ( 0.01s) scala.util.DynamicVariable.value()
Unfortunately, the schema indexes (i.e. those created using CREATE INDEX ON :Label(property)) do not yet support greater-than/less-than conditions. Neo4j therefore falls back to scanning all nodes with the given label and filtering on their properties, which is of course expensive.
I do see two different approaches to tackle this:
1) If your condition always has a pre-defined maximum granularity, e.g. tens of USD, you can build up an "amount tree" similar to a time-tree (see http://graphaware.com/neo4j/2014/08/20/graphaware-neo4j-timetree.html); a sketch follows at the end of this answer.
2) If you don't know the granularity upfront, the other option is to set up a manual or auto index for the amount property; see http://neo4j.com/docs/stable/indexing.html. The easiest approach is probably the auto index. In neo4j.properties, set the following options:
node_auto_indexing=true
node_keys_indexable=amount
Note that this will not automatically add all existing transactions to that index; it only indexes those that have been written since auto indexing was enabled.
You can do an explicit range query on the auto index using:
START t=node:node_auto_index("amount:[6000 TO 999999999]")
RETURN count(t)
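For completeness, here is a minimal, flattened sketch of the amount-tree idea from option 1). The AmountBucket label, IN_BUCKET relationship type, and the $1,000 bucket width are illustrative, and toInt() is assumed to be available in your Cypher version:
// link each transaction to a node representing its $1,000 amount band
// (in practice, batch this over the 6.6M transactions rather than one big tx)
MATCH (t:Transaction)
WITH t, toInt(t.amount / 1000) * 1000 AS lower
MERGE (b:AmountBucket {lower: lower})
MERGE (t)-[:IN_BUCKET]->(b);

// a range query then scans only the few thousand bucket nodes instead of
// all transactions (filter t.amount exactly at the boundary bucket)
MATCH (b:AmountBucket)<-[:IN_BUCKET]-(t:Transaction)
WHERE b.lower >= 8000
RETURN count(t);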

Why autoextend on Oracle XE not worked

We had a problem with our prod environment: suddenly, this exception began to appear:
ORA-01654: unable to extend index EMA.TRANSFERI2 by 128 in tablespace SYSTEM
To solve the problem, my colleague added a new datafile. But the question is: why didn't the autoextend mechanism work? I'm not a DBA, but I checked the configuration and it seems OK to me. It occurs only in the prod environment, so I would rather avoid experimenting.
We have the table in the SYSTEM tablespace, which I already know should be moved to the USERS tablespace. But regardless, autoextend should also work on the SYSTEM tablespace. Here is the configuration of the table, datafiles and tablespace:
TABLESPACE_NAME | PCT_FREE | PCT_USED | INITIAL_EXTENT | NEXT_EXTENT | MIN_EXTENTS | MAX_EXTENTS | PCT_INCREASE
SYSTEM | 10 | 40 | 65536 | 1048576 | 1 | 2147483645 | null
FILE_NAME | FILE_ID | TABLESPACE_NAME | BYTES | BLOCKS | STATUS | RELATIVE_FNO | AUTOEXTENSIBLE | MAXBYTES | MAXBLOCKS | INCREMENT_BY | USER_BYTES | USER_BLOCKS | ONLINE_STATUS
/u01/app/oracle/oradata/XE/system.dbf | 1 | SYSTEM | 629145600 | 76800 | AVAILABLE | 1 | YES | 629145600 | 76800 | 1280 | 628097024 | 76672 | SYSTEM
/u01/app/oracle/oradata/XE/system2.dbf | 5 | SYSTEM | 1048576000 | 128000 | AVAILABLE | 5 | YES | 2147483648 | 262144 | 25600 | 1047527424 | 127872 | SYSTEM
TABLESPACE_NAME | BLOCK_SIZE | INITIAL_EXTENT | NEXT_EXTENT | MIN_EXTENTS | MAX_EXTENTS | MAX_SIZE | PCT_INCREASE | MIN_EXTLEN | STATUS | CONTENTS | ALLOCATION_TYPE | SEGMENT_SPACE_MANAGEMENT | BIGFILE
SYSTEM | 8192 | 65536 | null | 1 | 2147483645 | 2147483645 | 65536 | ONLINE | PERMANENT | LOCAL | SYSTEM | MANUAL | NO
The MAXBYTES value for your system.dbf file is set to 629145600, so when your file size reached that limit, it couldn't be extended any further. It had autoextended up to that point, but wouldn't extend beyond the soft limit that had been specified for the file. That was set when the tablespace was created, using the autoextend MAXSIZE clause.
The limit may have been set because of the size of the underlying file system, to cause an error in case of runaway/unexpected growth, unintentionally, or for some other reason now known only to whoever set the database up.
As an alternative to adding a second data file, your DBA could have increased the soft limit on the existing file with alter database. But neither should be done lightly; the reason for the original restriction should be understood (especially if the filesystem could run out of space as a result of an increase) and the reason for growth should be examined too.
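For reference, a sketch of that alter database statement (the datafile path is taken from the listing above; the NEXT and MAXSIZE values are purely illustrative and must fit the underlying filesystem):
ALTER DATABASE DATAFILE '/u01/app/oracle/oradata/XE/system.dbf'
  AUTOEXTEND ON NEXT 100M MAXSIZE 2G;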

cudnn error :: CUDNN_STATUS_SUCCESS (1 vs. 0) CUDNN_STATUS_NOT_INITIALIZED

I am trying to install the open-source software "openpose", for which I needed to install CUDA, cuDNN and the NVIDIA drivers. The output of nvidia-smi is:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59 Driver Version: 440.59 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce 940MX Off | 00000000:01:00.0 Off | N/A |
| N/A 47C P8 N/A / N/A | 107MiB / 2004MiB | 7% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1513 G /usr/lib/xorg/Xorg 63MiB |
| 0 1698 G /usr/bin/gnome-shell 41MiB |
+-----------------------------------------------------------------------------+
And the output of cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2 gives:
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
After successfully installing all of the above software and libraries, I finally ran openpose with:
./build/examples/openpose/openpose.bin --video examples/media/video.avi
But the output was:
Starting OpenPose demo...
Configuring OpenPose...
Starting thread(s)...
Auto-detecting all available GPUs... Detected 1 GPU(s), using 1 of them starting at GPU 0.
F0214 01:02:35.327615 3433 cudnn_conv_layer.cpp:53] Check failed: status == CUDNN_STATUS_SUCCESS (1 vs. 0) CUDNN_STATUS_NOT_INITIALIZED
*** Check failure stack trace: ***
# 0x7fabb8f390cd google::LogMessage::Fail()
# 0x7fabb8f3af33 google::LogMessage::SendToLog()
# 0x7fabb8f38c28 google::LogMessage::Flush()
# 0x7fabb8f3b999 google::LogMessageFatal::~LogMessageFatal()
# 0x7fabb89459d3 caffe::CuDNNConvolutionLayer<>::LayerSetUp()
# 0x7fabb8a42308 caffe::Net<>::Init()
# 0x7fabb8a441e0 caffe::Net<>::Net()
# 0x7fabbaa2ccaa op::NetCaffe::initializationOnThread()
# 0x7fabbaa500a1 op::addCaffeNetOnThread()
# 0x7fabbaa51518 op::PoseExtractorCaffe::netInitializationOnThread()
# 0x7fabbaa57163 op::PoseExtractorNet::initializationOnThread()
# 0x7fabbaa4be61 op::PoseExtractor::initializationOnThread()
# 0x7fabbaa46a51 op::WPoseExtractor<>::initializationOnThread()
# 0x7fabbaa8aff1 op::Worker<>::initializationOnThreadNoException()
# 0x7fabbaa8b120 op::SubThread<>::initializationOnThread()
# 0x7fabbaa8d2d8 op::Thread<>::initializationOnThread()
# 0x7fabbaa8d4a7 op::Thread<>::threadFunction()
# 0x7fabba32566f (unknown)
# 0x7fabb9a476db start_thread
# 0x7fabb9d8088f clone
Aborted
I have gone through a lot of online discussions but could not figure out how to resolve this.
I have been having the same problem with CUDNN.
Although not ideal, I have been running without cuDNN: in cmake-gui, uncheck USE_CUDNN and then recompile. When running openpose I have also had to reduce -net_resolution.
For example: ./build/examples/openpose/openpose.bin -net_resolution 256x192
The greater the resolution, the lower the FPS, though.
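If you prefer the command line to cmake-gui, the equivalent would be roughly this sketch (run from an out-of-source build directory, which is an assumption about your setup):
cmake -DUSE_CUDNN=OFF ..
make -j`nproc`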
