Neo4j Performance Challenge - How to Improve?

I've been wrangling with Neo4j for the last few weeks, trying to resolve some extremely challenging performance problems. At this point I need some additional help, because I can't figure out how to move forward.
I have a graph with roughly 12.5 million nodes and 64 million relationships. The graph will be used to analyze suspicious financial behavior, so it contains customers, accounts, transactions, etc.
Here is an example of the performance challenge:
This query for total nodes takes 96,064ms to complete, which is extremely long.
neo4j-sh (?)$ MATCH (n) RETURN count(n);
+----------+
| count(n) |
+----------+
| 12519940 |
+----------+
1 row
96064 ms
The query for total relationships takes 919,449ms to complete, which seems silly.
neo4j-sh (?)$ MATCH ()-[r]-() return count(r);
+----------+
| count(r) |
+----------+
| 64062508 |
+----------+
1 row
919449 ms
I have 6.6M Transaction nodes. When I search for transactions with an amount above $8,000, the query takes 653,637 ms, which is also far too long.
neo4j-sh (?)$ MATCH (t:Transaction) WHERE t.amount > 8000.00 return count(t);
+----------+
| count(t) |
+----------+
| 10696 |
+----------+
1 row
653637 ms
Relevant Schema
ON :Transaction(baseamount) ONLINE
ON :Transaction(type) ONLINE
ON :Transaction(amount) ONLINE
ON :Transaction(currency) ONLINE
ON :Transaction(basecurrency) ONLINE
ON :Transaction(transactionid) ONLINE (for uniqueness constraint)
Profile of Query:
neo4j-sh (?)$ PROFILE MATCH (t:Transaction) WHERE t.amount > 8000.00 return count(t);
+----------+
| count(t) |
+----------+
| 10696 |
+----------+
1 row
ColumnFilter
|
+EagerAggregation
|
+Filter
|
+NodeByLabel
+------------------+---------+----------+-------------+------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+------------------+---------+----------+-------------+------------------------------------------+
| ColumnFilter | 1 | 0 | | keep columns count(t) |
| EagerAggregation | 1 | 0 | | |
| Filter | 10696 | 13216382 | | Property(t,amount(62)) > { AUTODOUBLE0} |
| NodeByLabel | 6608191 | 6608192 | t, t | :Transaction |
+------------------+---------+----------+-------------+------------------------------------------+
I am running these in the neo4j shell.
The performance problems here are starting to create substantial doubt about whether I can use Neo4j at all, and they seem at odds with the potential the platform offers.
I fully admit that I may have misconfigured something (I'm relatively new to Neo4j), so guidance on what to fix or what to look at would be much appreciated.
Here are details of my setup:
System: Linux (Ubuntu), 16GB RAM, 3.5GHz i5 processor, 256GB SSD
CPU
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
stepping : 3
microcode : 0x12
cpu MHz : 4230.625
cache size : 6144 KB
Memory
$ cat /proc/meminfo
MemTotal: 16115020 kB
MemFree: 224856 kB
MemAvailable: 8807160 kB
Buffers: 124356 kB
Cached: 8429964 kB
SwapCached: 8388 kB
Disk
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/data1--vg-root 219G 32G 177G 16% /
Neo4J.properties
neostore.nodestore.db.mapped_memory=200M
neostore.relationshipstore.db.mapped_memory=1G
neostore.relationshipgroupstore.db.mapped_memory=200M
neostore.propertystore.db.mapped_memory=500M
neostore.propertystore.db.strings.mapped_memory=500M
neostore.propertystore.db.arrays.mapped_memory=50M
neostore.propertystore.db.index.keys.mapped_memory=200M
relationship_auto_indexing=true
Neo4J-Wrapper.properties
wrapper.java.additional=-Dorg.neo4j.server.properties=conf/neo4j-server.properties
wrapper.java.additional=-Djava.util.logging.config.file=conf/logging.properties
wrapper.java.additional=-Dlog4j.configuration=file:conf/log4j.properties
#********************************************************************
# JVM Parameters
#********************************************************************
wrapper.java.additional=-XX:+UseConcMarkSweepGC
wrapper.java.additional=-XX:+CMSClassUnloadingEnabled
wrapper.java.additional=-XX:-OmitStackTraceInFastThrow
# Uncomment the following lines to enable garbage collection logging
wrapper.java.additional=-Xloggc:data/log/neo4j-gc.log
wrapper.java.additional=-XX:+PrintGCDetails
wrapper.java.additional=-XX:+PrintGCDateStamps
wrapper.java.additional=-XX:+PrintGCApplicationStoppedTime
wrapper.java.additional=-XX:+PrintPromotionFailure
wrapper.java.additional=-XX:+PrintTenuringDistribution
# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size in MB.
wrapper.java.initmemory=4096
wrapper.java.maxmemory=6144
Other:
Changed the open file settings for Linux to 40k
I am not running anything else on this machine, no X Windows, no other DB server. Here is a snippet of top while running a query:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15785 neo4j 20 0 12.192g 8.964g 2.475g S 100.2 58.3 227:50.98 java
1 root 20 0 33464 2132 1140 S 0.0 0.0 0:02.36 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
The total file size in the graph.db directory is:
data/graph.db$ du --max-depth=1 -h
1.9G ./schema
36K ./index
26G .
Data loading was extremely hit or miss. Some merges would take less than 60 seconds (even for ~200-300K inserts), while others would run for over 3 hours (11,898,514 ms for a CSV file with 189,999 rows, merging on one date).
I get constant GC thread blocking:
2015-03-27 14:56:26.347+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 15422ms.
2015-03-27 14:56:39.011+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 12363ms.
2015-03-27 14:56:57.533+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 13969ms.
2015-03-27 14:57:17.345+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 14657ms.
2015-03-27 14:57:29.955+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 12309ms.
2015-03-27 14:58:14.311+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 1928ms.
Please let me know if I should add anything else that would be salient to the discussion.
Update 1
Thank you very much for your help; I just moved, so I was delayed in responding.
Size of Neostore Files:
/data/graph.db$ ls -lah neostore.*
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.id
-rw-rw-r-- 1 neo4j neo4j 110 Apr 2 13:03 neostore.labeltokenstore.db
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.labeltokenstore.db.id
-rw-rw-r-- 1 neo4j neo4j 874 Apr 2 13:03 neostore.labeltokenstore.db.names
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.labeltokenstore.db.names.id
-rw-rw-r-- 1 neo4j neo4j 200M Apr 2 13:03 neostore.nodestore.db
-rw-rw-r-- 1 neo4j neo4j 41 Apr 2 13:03 neostore.nodestore.db.id
-rw-rw-r-- 1 neo4j neo4j 68 Apr 2 13:03 neostore.nodestore.db.labels
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.nodestore.db.labels.id
-rw-rw-r-- 1 neo4j neo4j 2.8G Apr 2 13:03 neostore.propertystore.db
-rw-rw-r-- 1 neo4j neo4j 128 Apr 2 13:03 neostore.propertystore.db.arrays
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.propertystore.db.arrays.id
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.propertystore.db.id
-rw-rw-r-- 1 neo4j neo4j 720 Apr 2 13:03 neostore.propertystore.db.index
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.propertystore.db.index.id
-rw-rw-r-- 1 neo4j neo4j 3.1K Apr 2 13:03 neostore.propertystore.db.index.keys
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.propertystore.db.index.keys.id
-rw-rw-r-- 1 neo4j neo4j 1.7K Apr 2 13:03 neostore.propertystore.db.strings
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.propertystore.db.strings.id
-rw-rw-r-- 1 neo4j neo4j 47M Apr 2 13:03 neostore.relationshipgroupstore.db
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.relationshipgroupstore.db.id
-rw-rw-r-- 1 neo4j neo4j 1.1G Apr 2 13:03 neostore.relationshipstore.db
-rw-rw-r-- 1 neo4j neo4j 1.6M Apr 2 13:03 neostore.relationshipstore.db.id
-rw-rw-r-- 1 neo4j neo4j 165 Apr 2 13:03 neostore.relationshiptypestore.db
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.relationshiptypestore.db.id
-rw-rw-r-- 1 neo4j neo4j 1.3K Apr 2 13:03 neostore.relationshiptypestore.db.names
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.relationshiptypestore.db.names.id
-rw-rw-r-- 1 neo4j neo4j 3.5K Apr 2 13:03 neostore.schemastore.db
-rw-rw-r-- 1 neo4j neo4j 25 Apr 2 13:03 neostore.schemastore.db.id
I read that the mapped memory settings have been replaced by another cache, so I have commented out those settings.
Java Profiler
JvmTop 0.8.0 alpha - 16:12:59, amd64, 4 cpus, Linux 3.16.0-33, load avg 0.30
http://code.google.com/p/jvmtop
Profiling PID 4260: org.neo4j.server.Bootstrapper
68.67% ( 14.01s) org.neo4j.kernel.impl.nioneo.store.StoreFileChannel.read()
18.73% ( 3.82s) org.neo4j.kernel.impl.nioneo.store.StoreFailureException.<init>()
2.86% ( 0.58s) org.neo4j.kernel.impl.cache.ReferenceCache.put()
1.11% ( 0.23s) org.neo4j.helpers.Counter.inc()
0.87% ( 0.18s) org.neo4j.kernel.impl.cache.ReferenceCache.get()
0.65% ( 0.13s) org.neo4j.cypher.internal.compiler.v2_1.parser.Literals$class.PropertyKeyName()
0.63% ( 0.13s) org.parboiled.scala.package$.getCurrentRuleMethod()
0.62% ( 0.13s) scala.collection.mutable.OpenHashMap.<init>()
0.62% ( 0.13s) scala.collection.mutable.AbstractSeq.<init>()
0.62% ( 0.13s) org.neo4j.kernel.impl.cache.AutoLoadingCache.get()
0.61% ( 0.13s) scala.collection.TraversableLike$$anonfun$map$1.apply()
0.61% ( 0.12s) org.neo4j.kernel.impl.transaction.TxManager.assertTmOk()
0.61% ( 0.12s) org.neo4j.cypher.internal.compiler.v2_1.commands.EntityProducerFactory.<init>()
0.61% ( 0.12s) scala.collection.AbstractTraversable.<init>()
0.61% ( 0.12s) scala.collection.immutable.List.toStream()
0.60% ( 0.12s) org.neo4j.kernel.impl.nioneo.store.NodeStore.getRecord()
0.57% ( 0.12s) org.neo4j.kernel.impl.transaction.TxManager.getTransaction()
0.37% ( 0.08s) org.parboiled.scala.Parser$class.rule()
0.06% ( 0.01s) scala.util.DynamicVariable.value()

Unfortunately the schema indexes (i.e. those created using CREATE INDEX ON :Label(property)) do not yet support greater-than/less-than conditions. Neo4j therefore falls back to scanning all nodes with the given label and filtering on their properties, which is of course expensive.
I see two different approaches to tackle this:
1) If your condition always has a pre-defined maximum granularity, e.g. tens of USD, you can build up an "amount-tree" similar to a time-tree (see http://graphaware.com/neo4j/2014/08/20/graphaware-neo4j-timetree.html).
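For illustration, a bucketed version of that idea could look roughly like the sketch below. This is only an outline: the AmountBucket label, the IN_BUCKET relationship type, and the 1000-unit bucket width are assumptions, and for 6.6M transactions you would want to build the buckets in batches rather than in one transaction.
// Index bucket floors first so the MERGE lookups stay fast (equality lookups do use schema indexes).
CREATE INDEX ON :AmountBucket(floor);
// Attach every transaction to a coarse bucket node, one bucket per 1000 units of amount.
MATCH (t:Transaction)
WITH t, toInt(t.amount / 1000) * 1000 AS bucketFloor
MERGE (b:AmountBucket {floor: bucketFloor})
MERGE (t)-[:IN_BUCKET]->(b);
// A range query then starts from the small set of buckets instead of scanning all 6.6M transactions.
MATCH (b:AmountBucket)
WHERE b.floor >= 8000
WITH b
MATCH (b)<-[:IN_BUCKET]-(t:Transaction)
WHERE t.amount > 8000.00
RETURN count(t);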
2) If you don't know the granularity up front, the other option is to set up a manual or auto index for the amount property; see http://neo4j.com/docs/stable/indexing.html. The easiest approach is probably the auto index. In neo4j.properties, set the following options:
node_auto_indexing=true
node_keys_indexable=amount
Note that this will not automatically add all existing transactions to that index; it only indexes nodes whose amount property has been written since auto indexing was enabled.
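If you do want the existing transactions in the auto index, one common trick is to touch the property so it is re-written; a rough sketch (for 6.6M nodes, run it in batches rather than in one huge transaction):
// Re-setting amount to itself counts as a write and pushes the node into the auto index.
MATCH (t:Transaction)
SET t.amount = t.amount;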
You can then run an explicit range query against the auto index using
START t=node:node_auto_index("amount:[6000 TO 999999999]")
RETURN count(t)

Related

nvprof Warning: The path to CUPTI and CUDA Injection libraries might not be set in LD_LIBRARY_PATH

I get the message in the subject when I try to run a program I developed with OpenACC through Nvidia's nvprof profiler like this:
nvprof ./SFS 4
If I run nvprof with -o [output_file] the warning message doesn't appear, but the output file is not created. What could be wrong here?
The LD_LIBRARY_PATH is set in my .bashrc to /opt/nvidia/hpc_sdk/Linux_x86_64/20.7/cuda/11.0/lib64/ because I found these files there (they have "cupti" and "inj" in their names, and I thought they were the ones needed):
lrwxrwxrwx 1 root root 19 Aug 4 05:27 libaccinj64.so -> libaccinj64.so.11.0
lrwxrwxrwx 1 root root 23 Aug 4 05:27 libaccinj64.so.11.0 -> libaccinj64.so.11.0.194
...
lrwxrwxrwx 1 root root 16 Aug 4 05:27 libcupti.so -> libcupti.so.11.0
lrwxrwxrwx 1 root root 20 Aug 4 05:27 libcupti.so.11.0 -> libcupti.so.2020.1.0
...
I am on an Ubuntu 18.04 workstation with an Nvidia GeForce RTX 2070 and have CUDA version 11 installed.
nvidia-smi command gives me this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66 Driver Version: 450.66 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 2070 Off | 00000000:02:00.0 On | N/A |
| 30% 40C P2 58W / 185W | 693MiB / 7981MiB | 3% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
The compilers I have (nvidia and portland) are from the latest Nvidia HPC SDK, version 20.7-0.
I compile my programs with the -acc -Minfo=accel options; I'm not sure how to set -ta= or whether it is needed at all.
P.S. I am also not sure whether running my code, with or without nvprof, uses the GPU at all, although I did set ACC_DEVICE_TYPE to nvidia.
Any advice would be very welcome.
Cheers
Which nvprof are you using? The one that ships with NV HPC 20.7 or your own install?
This looks very similar to an issue reported yesterday on the NVIDIA DevTalk user forums:
https://forums.developer.nvidia.com/t/new-20-7-version-where-is-the-detail-release-bugfix/146168/4
Granted, this was for Nsight Systems, but it may be the same issue. It appears to be a problem with the 2020.3 version of the profilers, which is the version we ship with the NV HPC 20.7 SDK. As I note there, the Nsight Systems 2020.4 release should have this fixed, so the workaround would be to download and install 2020.4 or use a prior release.
https://developer.nvidia.com/nsight-systems
There does seem to be a temporary issue with the Nsight Systems download, which will hopefully be corrected before you see this note.
Also, nvprof is in the process of being deprecated, so you should consider moving to Nsight Systems and Nsight Compute.
https://developer.nvidia.com/blog/migrating-nvidia-nsight-tools-nvvp-nvprof/
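If you do move over, the Nsight Systems command line is close to a drop-in replacement for this use case. A rough sketch, assuming the nsys binary from the HPC SDK is on your PATH (the report name is arbitrary):
# Collect a trace of the OpenACC run and write it to a report file
nsys profile -o sfs_report ./SFS 4
# Open the resulting report (sfs_report.qdrep on this version) in the Nsight Systems GUI (nsys-ui),
# or summarize it on the command line if your nsys build provides the stats subcommand:
nsys stats sfs_report.qdrep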

grep --exclude-from: how to include multiple files

If I use --exclude-from multiple times (to include multiple files), will grep just use the last --exclude-from or will it "or" all the filters together as if they had been in one file (and using one --exclude-from)?
Although this is not clear from either the man pages or grep's source code, --exclude-from is additive. Here's my example:
Make six files named 'file.a' through 'file.f', each containing 'test_string', and six filter files 'ex_a.lst' through 'ex_f.lst' containing 'file.a' through 'file.f', respectively:
$ for X in a b c d e f; do echo test_string > file.$X; echo file.$X > ex_$X.lst; done
$ ls -l
total 48
-rw-rw-r-- 1 user group 7 Jan 17 17:02 ex_a.lst
-rw-rw-r-- 1 user group 7 Jan 17 17:02 ex_b.lst
-rw-rw-r-- 1 user group 7 Jan 17 17:02 ex_c.lst
-rw-rw-r-- 1 user group 7 Jan 17 17:02 ex_d.lst
-rw-rw-r-- 1 user group 7 Jan 17 17:02 ex_e.lst
-rw-rw-r-- 1 user group 7 Jan 17 17:02 ex_f.lst
-rw-rw-r-- 1 user group 12 Jan 17 17:02 file.a
-rw-rw-r-- 1 user group 12 Jan 17 17:02 file.b
-rw-rw-r-- 1 user group 12 Jan 17 17:02 file.c
-rw-rw-r-- 1 user group 12 Jan 17 17:02 file.d
-rw-rw-r-- 1 user group 12 Jan 17 17:02 file.e
-rw-rw-r-- 1 user group 12 Jan 17 17:02 file.f
$ cat file.a
test_string
$ cat ex_a.lst
file.a
Search for 'test_string' in the current directory, without filters:
$ grep -R test_string | sort
file.a:test_string
file.b:test_string
file.c:test_string
file.d:test_string
file.e:test_string
file.f:test_string
All files matched. Now add three filters from three files:
$ grep -R test_string --exclude-from=ex_a.lst --exclude-from=ex_c.lst --exclude-from=ex_e.lst | sort
file.b:test_string
file.d:test_string
file.f:test_string
We are left with only three results: the ones that weren't filtered out. For that to happen, grep must have applied all three 'ex_[ace].lst' filter files.
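If you would rather feed grep a single combined list, the same exclusions can go through one --exclude-from by concatenating the files on the fly, e.g. with process substitution in a bash-like shell:
$ grep -R test_string --exclude-from=<(cat ex_a.lst ex_c.lst ex_e.lst) | sort
This should yield the same three matches, since the combined pattern list is identical.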

GFS2 flags 0x00000005 blocked,join

I have a RHEL 6 cluster with cman, corosync, and pacemaker.
After adding new members I got an error during GFS mounting; GFS never mounts on the servers.
group_tool
fence domain
member count 4
victim count 0
victim now 0
master nodeid 1
wait state none
members 1 2 3 4
dlm lockspaces
name clvmd
id 0x4104eefa
flags 0x00000000
change member 4 joined 1 remove 0 failed 0 seq 1,1
members 1 2 3 4
gfs mountgroups
name lv_gfs_01
id 0xd5eacc83
flags 0x00000005 blocked,join
change member 3 joined 1 remove 0 failed 0 seq 1,1
members 1 2 3
In processes:
root 2695 2690 0 08:03 pts/1 00:00:00 /bin/bash /etc/init.d/gfs2 start
root 2702 2695 0 08:03 pts/1 00:00:00 /bin/bash /etc/init.d/gfs2 start
root 2704 2703 0 08:03 pts/1 00:00:00 /sbin/mount.gfs2 /dev/mapper/vg_shared-lv_gfs_01 /mnt/share -o rw,_netdev,noatime,nodiratime
fsck.gfs2 -yf /dev/vg_shared/lv_gfs_01
Initializing fsck
jid=1: Replayed 0 of 0 journaled data blocks
jid=1: Replayed 20 of 21 metadata blocks
Recovering journals (this may take a while)
Journal recovery complete.
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
RGs: Consistent: 183 Cleaned: 1 Inconsistent: 0 Fixed: 0 Total: 184
2 blocks may need to be freed in pass 5 due to the cleaned resource groups.
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Block 11337799 (0xad0047) bitmap says 1 (Data) but FSCK saw 0 (Free)
Fixed.
Block 11337801 (0xad0049) bitmap says 1 (Data) but FSCK saw 0 (Free)
Fixed.
RG #11337739 (0xad000b) free count inconsistent: is 65500 should be 65502
RG #11337739 (0xad000b) Inode count inconsistent: is 15 should be 13
Resource group counts updated
Pass5 complete
The statfs file is wrong:
Current statfs values:
blocks: 12057320 (0xb7fae8)
free: 9999428 (0x989444)
dinodes: 15670 (0x3d36)
Calculated statfs values:
blocks: 12057320 (0xb7fae8)
free: 9999432 (0x989448)
dinodes: 15668 (0x3d34)
The statfs file was fixed.
Writing changes to disk
gfs2_fsck complete
gfs2_edit -p 0xad0047 field di_size /dev/vg_shared/lv_gfs_01
10 (Block 11337799 is type 10: Ext. attrib which is not implemented)
How can I drop the blocked,join flag from GFS?
I solved it by rebooting all the servers that use GFS; that is one of the unpleasant behaviors of GFS.
GFS locking is kernel-based, and in a few cases it can only be resolved with a reboot.
There is a very useful manual here: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/Global_File_System_2/index.html

find what process is leaving tmp files in my home directory

We have a large number of Linux machines in a compute farm, and we launch jobs on the farm using LSF.
Sporadically, and seemingly at random, some job creates and deletes thousands of 'tmp' files in my home directory:
ls /home/cpp_home/tmp*
---------- 1 cpp_home dev 0 Dec 10 14:25 tmpxJL9In
-rw------- 1 cpp_home dev 0 Dec 10 14:25 tmpnvAtiS
-rw------- 1 cpp_home dev 0 Dec 10 14:25 tmphSrnk7
-rw------- 1 cpp_home dev 0 Dec 10 14:25 tmpJFO5Cr
---------- 1 cpp_home dev 0 Dec 10 14:25 tmpRIzn7A
-rw------- 1 cpp_home dev 0 Dec 10 14:25 tmpvulwsT
---------- 1 cpp_home dev 0 Dec 10 14:25 tmpeSz_gN
---------- 1 cpp_home dev 0 Dec 10 14:25 tmpEcatTM
-rw------- 1 cpp_home dev 0 Dec 10 14:25 tmpOy1jdi
---------- 1 cpp_home dev 0 Dec 10 14:26 tmp4oB8ua
How the hell can I find out which process is doing this?
They look suspiciously like standard C library or standard Python tempfiles... but since they don't stick around for long, I can't work out which job (of the thousands running via LSF) is creating them.
I don't have source code for all the jobs... There are a great many third-party CAD/EDA tools in use, so it could be one of them. Or it could be Perl or Python scripts, or...
If the variety of jobs is high, this could be difficult to track down; however, if the job types are not too diverse, you should be able to correlate the times at which these files are created with jobs owned by the user who owns the files. By running similar jobs in isolation, you could then verify the behavior.
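If correlation alone doesn't narrow it down, the kernel audit subsystem can name the culprit directly. A sketch, assuming auditd is installed and you have root on the execution hosts (the key name tmpwatch is arbitrary):
# Watch the home directory for writes and attribute changes, tagging events with a key
auditctl -w /home/cpp_home -p wa -k tmpwatch
# After the tmp files reappear, see which executables/commands touched the directory
ausearch -k tmpwatch --interpret | grep -E 'exe=|comm=' | sort | uniq -c | sort -rn | head
# Remove the watch when finished
auditctl -W /home/cpp_home -p wa -k tmpwatch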
There is perhaps another possibility, though. Depending on LSF settings, some output/logging files may be created as temp files and copied into their final location. I could imagine this explaining the phenomenon, but normally these would fall under a $HOME/.lsbatch directory; settings might be able to adjust this location. See this text from the bsub command reference:
If the parameter LSB_STDOUT_DIRECT in lsf.conf is set to Y or y, and you use
the -o or -oo option, the standard output of a job is written to the file you
specify as the job runs. If LSB_STDOUT_DIRECT is not set, and you use -o or -oo,
the standard output of a job is written to a temporary file and copied to the
specified file after the job finishes.
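To see whether that behavior could be in play on your cluster, you could check the setting directly (assuming LSF_ENVDIR is set in your environment, as it normally is on LSF hosts):
# Empty output (or a commented-out line) means LSB_STDOUT_DIRECT is not enabled
grep LSB_STDOUT_DIRECT "$LSF_ENVDIR/lsf.conf"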

Thinking Sphinx not working in test mode

I'm trying to get Thinking Sphinx to work in test mode in Rails. Basically this:
ThinkingSphinx::Test.init
ThinkingSphinx::Test.start
freezes and never comes back.
My database configuration is the same for test and development:
dry_setting: &dry_setting
  adapter: mysql
  host: localhost
  encoding: utf8
  username: rails
  password: blahblah
development:
  <<: *dry_setting
  database: proj_devel
  socket: /tmp/mysql.sock # sphinx requires it
test:
  <<: *dry_setting
  database: proj_test
  socket: /tmp/mysql.sock # sphinx requires it
and sphinx.yml
development:
  enable_star: 1
  min_infix_len: 2
  bin_path: /opt/local/bin
test:
  enable_star: 1
  min_infix_len: 2
  bin_path: /opt/local/bin
production:
  enable_star: 1
  min_infix_len: 2
The generated config files, config/development.sphinx.conf and config/test.sphinx.conf, only differ in database names, directories, and similar things; nothing functional.
Generating the index for development goes without an issue:
$ rake ts:in
(in /Users/pupeno/proj)
default config
Generating Configuration to /Users/pupeno/proj/config/development.sphinx.conf
Sphinx 0.9.8.1-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file '/Users/pupeno/proj/config/development.sphinx.conf'...
indexing index 'user_core'...
collected 7 docs, 0.0 MB
collected 0 attr values
sorted 0.0 Mvalues, 100.0% done
sorted 0.0 Mhits, 99.8% done
total 7 docs, 422 bytes
total 0.098 sec, 4320.80 bytes/sec, 71.67 docs/sec
indexing index 'user_delta'...
collected 0 docs, 0.0 MB
collected 0 attr values
sorted 0.0 Mvalues, nan% done
total 0 docs, 0 bytes
total 0.010 sec, 0.00 bytes/sec, 0.00 docs/sec
distributed index 'user' can not be directly indexed; skipping.
but when I try to do it for test it freezes:
$ RAILS_ENV=test rake ts:in
(in /Users/pupeno/proj)
DEPRECATION WARNING: require "activeresource" is deprecated and will be removed in Rails 3. Use require "active_resource" instead.. (called from /Users/pupeno/.rvm/gems/ruby-1.8.7-p249/gems/activeresource-2.3.5/lib/activeresource.rb:2)
default config
Generating Configuration to /Users/pupeno/proj/config/test.sphinx.conf
Sphinx 0.9.8.1-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file '/Users/pupeno/proj/config/test.sphinx.conf'...
indexing index 'user_core'...
It's been sitting there for more than 10 minutes, and the user table has only 4 records.
The index directories look quite different, but I don't know what to make of them:
$ ls -l db/sphinx/development/
total 96
-rw-r--r-- 1 pupeno staff 196 Mar 11 18:10 user_core.spa
-rw-r--r-- 1 pupeno staff 4982 Mar 11 18:10 user_core.spd
-rw-r--r-- 1 pupeno staff 417 Mar 11 18:10 user_core.sph
-rw-r--r-- 1 pupeno staff 3067 Mar 11 18:10 user_core.spi
-rw-r--r-- 1 pupeno staff 84 Mar 11 18:10 user_core.spm
-rw-r--r-- 1 pupeno staff 6832 Mar 11 18:10 user_core.spp
-rw-r--r-- 1 pupeno staff 0 Mar 11 18:10 user_delta.spa
-rw-r--r-- 1 pupeno staff 1 Mar 11 18:10 user_delta.spd
-rw-r--r-- 1 pupeno staff 417 Mar 11 18:10 user_delta.sph
-rw-r--r-- 1 pupeno staff 1 Mar 11 18:10 user_delta.spi
-rw-r--r-- 1 pupeno staff 0 Mar 11 18:10 user_delta.spm
-rw-r--r-- 1 pupeno staff 1 Mar 11 18:10 user_delta.spp
$ ls -l db/sphinx/test/
total 0
-rw-r--r-- 1 pupeno staff 0 Mar 11 18:11 user_core.spl
-rw-r--r-- 1 pupeno staff 0 Mar 11 18:11 user_core.tmp0
-rw-r--r-- 1 pupeno staff 0 Mar 11 18:11 user_core.tmp1
-rw-r--r-- 1 pupeno staff 0 Mar 11 18:11 user_core.tmp2
-rw-r--r-- 1 pupeno staff 0 Mar 11 18:11 user_core.tmp7
Nothing gets added to a log when this happens. Any ideas where to go from here?
I can run the command line manually:
/opt/local/bin/indexer --config config/test.sphinx.conf --all
which generates the same output as rake ts:in, so no help there.
The problem was the random IDs generated by fixtures. The solution is described at http://freelancing-god.github.com/ts/en/common_issues.html#slow_indexing
Slow Indexing
If Sphinx is taking a while to process all your records, there are a few common reasons for this happening. Firstly, make sure you have database indexes on any foreign key columns and any columns you filter or sort by.
Secondly – are you using fixtures? Rails' fixtures have randomly generated IDs, which are usually extremely large integers, and Sphinx isn't set up to process disparate IDs efficiently by default. To get around this, you'll need to set sql_range_step in your config/sphinx.yml file for the appropriate environments:
development:
  sql_range_step: 10000000
I added it to both the development and test environments.
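For reference, the relevant part of config/sphinx.yml then looks roughly like this (a sketch based on the settings shown earlier; tune the value to your ID spread):
development:
  enable_star: 1
  min_infix_len: 2
  bin_path: /opt/local/bin
  sql_range_step: 10000000
test:
  enable_star: 1
  min_infix_len: 2
  bin_path: /opt/local/bin
  sql_range_step: 10000000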
