We use an XGBoost model for regression, with grid search for hyperparameter tuning.
We run this model on a 90 GB H2O cluster. This process has been running for over 1.2 years, but it suddenly stopped with the error "Closing connection _sid_af1c at exit".
The training data set has 800 000 rows; because of this error we decreased it to 500 000, but the same error occurred.
ntrees - 300,400
depth - 8, 10
variables - 382
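For reference, the grid is set up along these lines from the Python client (a minimal sketch; the file path, response column, and seed are placeholders, not our actual code):

import h2o
from h2o.estimators import H2OXGBoostEstimator
from h2o.grid.grid_search import H2OGridSearch

h2o.init()  # attach to the running 2-node cluster

train = h2o.import_file("training_data.csv")   # placeholder path; ~800 000 rows, 382 predictors
response = "target"                             # placeholder response column
predictors = [c for c in train.columns if c != response]

hyper_params = {"ntrees": [300, 400], "max_depth": [8, 10]}

grid = H2OGridSearch(model=H2OXGBoostEstimator(seed=1),
                     hyper_params=hyper_params)
grid.train(x=predictors, y=response, training_frame=train)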
I have attached the H2O memory log and our application error log. Could you please help us fix this issue?
----------------------------------------H2o Log [Start]----------------------
**We start H2O as a 2-node cluster, but the H2O log was created on one node.**
INFO water.default: ----- H2O started -----
INFO water.default: Build git branch: master
INFO water.default: Build git hash: 0588cccd72a7dc1274a83c30c4ae4161b92d9911
INFO water.default: Build git describe: jenkins-master-5236-4-g0588ccc
INFO water.default: Build project version: 3.33.0.5237
INFO water.default: Build age: 1 year, 3 months and 17 days
INFO water.default: Built by: 'jenkins'
INFO water.default: Built on: '2020-10-27 19:21:29'
WARN water.default:
WARN water.default: *** Your H2O version is too old! Please download the latest version from http://h2o.ai/download/ ***
WARN water.default:
INFO water.default: Found H2O Core extensions: [XGBoost, KrbStandalone]
INFO water.default: Processed H2O arguments: [-flatfile, /usr/local/h2o/flatfile.txt, -port, 54321]
INFO water.default: Java availableProcessors: 20
INFO water.default: Java heap totalMemory: 962.5 MB
INFO water.default: Java heap maxMemory: 42.67 GB
INFO water.default: Java version: Java 1.8.0_262 (from Oracle Corporation)
INFO water.default: JVM launch parameters: [-Xmx48g]
INFO water.default: JVM process id: 83043@masterb.xxxxx.com
INFO water.default: OS version: Linux 3.10.0-1127.10.1.el7.x86_64 (amd64)
INFO water.default: Machine physical memory: 62.74 GB
INFO water.default: Machine locale: en_US
INFO water.default: X-h2o-cluster-id: 1644769990156
INFO water.default: User name: 'root'
INFO water.default: IPv6 stack selected: false
INFO water.default: Possible IP Address: ens192 (ens192), xxxxxxxxxxxxxxxxxxxx
INFO water.default: Possible IP Address: ens192 (ens192), xxxxxxxxxxx
INFO water.default: Possible IP Address: lo (lo), 0:0:0:0:0:0:0:1%lo
INFO water.default: Possible IP Address: lo (lo), 127.0.0.1
INFO water.default: H2O node running in unencrypted mode.
INFO water.default: Internal communication uses port: 54322
INFO water.default: Listening for HTTP and REST traffic on http://xxxxxxxxxxxx:54321/
INFO water.default: H2O cloud name: 'root' on /xxxxxxxxxxxx:54321, discovery address /xxxxxxxxxxxx:57653
INFO water.default: If you have trouble connecting, try SSH tunneling from your local machine (e.g., via port 55555):
INFO water.default: 1. Open a terminal and run 'ssh -L 55555:localhost:54321 root@xxxxxxxxxxxx'
INFO water.default: 2. Point your browser to http://localhost:55555
INFO water.default: Log dir: '/tmp/h2o-root/h2ologs'
INFO water.default: Cur dir: '/usr/local/h2o/h2o-3.33.0.5237'
INFO water.default: Subsystem for distributed import from HTTP/HTTPS successfully initialized
INFO water.default: HDFS subsystem successfully initialized
INFO water.default: S3 subsystem successfully initialized
INFO water.default: GCS subsystem successfully initialized
INFO water.default: Flow dir: '/root/h2oflows'
INFO water.default: Cloud of size 1 formed [/xxxxxxxxxxxx:54321]
INFO water.default: Registered parsers: [GUESS, ARFF, XLS, SVMLight, AVRO, PARQUET, CSV]
INFO water.default: XGBoost extension initialized
INFO water.default: KrbStandalone extension initialized
INFO water.default: Registered 2 core extensions in: 2632ms
INFO water.default: Registered H2O core extensions: [XGBoost, KrbStandalone]
INFO hex.tree.xgboost.XGBoostExtension: Found XGBoost backend with library: xgboost4j_gpu
INFO hex.tree.xgboost.XGBoostExtension: XGBoost supported backends: [WITH_GPU, WITH_OMP]
INFO water.default: Registered: 217 REST APIs in: 353ms
INFO water.default: Registered REST API extensions: [Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4]
INFO water.default: Registered: 291 schemas in 112ms
INFO water.default: H2O started in 4612ms
INFO water.default:
INFO water.default: Open H2O Flow in your web browser: http://xxxxxxxxxxxx:54321
INFO water.default:
INFO water.default: Cloud of size 2 formed [mastera.xxxxxxxxxxxx.com/xxxxxxxxxxxx:54321, masterb.xxxxxxxxxxxx.com/xxxxxxxxxxxx:54321]
INFO water.default: Locking cloud to new members, because water.rapids.Session$1
INFO hex.tree.xgboost.task.XGBoostUpdater: Initial Booster created, size=448
ERROR water.default: Got IO error when sending a batch of bytes:
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:51)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:468)
at water.H2ONode$SmallMessagesSendThread.sendBuffer(H2ONode.java:605)
at water.H2ONode$SmallMessagesSendThread.run(H2ONode.java:588)
----------------------------------------H2o Log [End]--------------------------------
----------------------------------------Application Log [Start]----------------------
Checking whether there is an H2O instance running at http://localhost:54321 . connected.
Warning: Your H2O cluster version is too old (1 year, 3 months and 17 days)! Please download and install the latest version from http://h2o.ai/download/
-------------------------- ------------------------------------------------------------------
H2O_cluster_uptime: 19 mins 49 secs
H2O_cluster_timezone: Asia/Colombo
H2O_data_parsing_timezone: UTC
H2O_cluster_version: 3.33.0.5237
H2O_cluster_version_age: 1 year, 3 months and 17 days !!!
H2O_cluster_name: root
H2O_cluster_total_nodes: 2
H2O_cluster_free_memory: 84.1 Gb
H2O_cluster_total_cores: 40
H2O_cluster_allowed_cores: 40
H2O_cluster_status: locked, healthy
H2O_connection_url: http://localhost:54321
H2O_connection_proxy: {"http": null, "https": null}
H2O_internal_security: False
H2O_API_Extensions: Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4
Python_version: 3.7.0 final
-------------------------- ------------------------------------------------------------------
release memory here...
Checking whether there is an H2O instance running at http://localhost:54321 . connected.
Warning: Your H2O cluster version is too old (1 year, 3 months and 17 days)! Please download and install the latest version from http://h2o.ai/download/
-------------------------- ------------------------------------------------------------------
H2O_cluster_uptime: 19 mins 49 secs
H2O_cluster_timezone: Asia/Colombo
H2O_data_parsing_timezone: UTC
H2O_cluster_version: 3.33.0.5237
H2O_cluster_version_age: 1 year, 3 months and 17 days !!!
H2O_cluster_name: root
H2O_cluster_total_nodes: 2
H2O_cluster_free_memory: 84.1 Gb
H2O_cluster_total_cores: 40
H2O_cluster_allowed_cores: 40
H2O_cluster_status: locked, healthy
H2O_connection_url: http://localhost:54321
H2O_connection_proxy: {"http": null, "https": null}
H2O_internal_security: False
H2O_API_Extensions: Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4
Python_version: 3.7.0 final
-------------------------- ------------------------------------------------------------------
Parse progress: |█████████████████████████████████████████████████████████| 100%
xgboost Grid Build progress: |████████Closing connection _sid_af1c at exit
H2O session _sid_af1c was not closed properly.
Closing connection _sid_9313 at exit
H2O session _sid_9313 was not closed properly.
----------------------------------------Application Log [End]----------------------
This typically means one of the nodes crashed. It can happen for many different reasons; memory pressure is the most common one.
I see your machine has about 64 GB of physical memory and H2O is getting 48 GB of that. XGBoost runs in native memory, not in the JVM heap. For XGBoost we recommend splitting the physical memory roughly 50-50 between H2O and XGBoost.
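On this ~63 GB machine that would mean launching each node with roughly half the physical memory as JVM heap, e.g. -Xmx30g instead of -Xmx48g, leaving the rest for XGBoost's off-heap allocations. An illustrative launch line based on the arguments visible in your log (adjust for your environment):

java -Xmx30g -jar h2o.jar -flatfile /usr/local/h2o/flatfile.txt -port 54321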
You are running a development version of H2O (3.33); I suggest upgrading to the latest stable release.
Related
We are experiencing performance problems with Stardog requests (about 500 000 ms minimum to get an answer). We followed the Debian-based systems installation described in the Stardog documentation and have a stardog service installed on our Ubuntu VM.
Azure machine: Standard D4s v3 (4 virtual processors, 16 GB memory)
Total VM memory = 16 GiB
We tested several JVM memory settings:
-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g
-Xms8g -Xmx8g -XX:MaxDirectMemorySize=8g
We also tried upgrading the VM to a larger machine, but without success:
Azure: Standard D8s v3 (8 virtual processors, 32 GB memory)
Running systemctl status stardog on the machine with 32 GiB of memory, we get:
stardog.service - Stardog Knowledge Graph
Loaded: loaded (/etc/systemd/system/stardog.service; enabled; vendor prese>
Active: active (running) since Tue 2023-01-17 15:41:40 UTC; 1min 35s ago
Docs: https://www.stardog.com/
Process: 797 ExecStart=/opt/stardog/stardog-server.sh start (code=exited, s>
Main PID: 969 (java)
Tasks: 76 (limit: 38516)
Memory: 1.9G
CGroup: /system.slice/stardog.service
└─969 java -Dstardog.home=/var/opt/stardog/ -Xmx8g -Xms8g XX:MaxD
stardog-admin server status :
Access Log Enabled : true
Access Log Type : text
Audit Log Enabled : true
Audit Log Type : text
Backup Storage Directory : .backup
CPU Load : 1.88 %
Connection Timeout : 10m
Export Storage Directory : .exports
Memory Heap : 305M (Max: 8.0G)
Memory Mode : DEFAULT{Starrocks.block_cache=20, Starrocks.dict_block_cache=10, Native.starrocks=70, Heap.dict_value=50, Starrocks.txn_block_cache=5, Heap.dict_index=50, Starrocks.untracked_memory=20, Starrocks.memtable=40, Starrocks.buffer_pool=5, Native.query=30}
Memory Query Blocks : 0B (Max: 5.7G)
Memory RSS : 4.3G
Named Graph Security : false
Platform Arch : amd64
Platform OS : Linux 5.15.0-1031-azure, Java 1.8.0_352
Query All Graphs : false
Query Timeout : 1h
Security Disabled : false
Stardog Home : /var/opt/stardog
Stardog Version : 8.1.1
Strict Parsing : true
Uptime : 2 hours 18 minutes 51 seconds
Given that only the Stardog server is installed on this VM, with an 8 GB JVM heap and 20 GB of direct memory for Java, is it normal to see 1.9 GB in memory when no query is in progress
and 4.1 GB when a query is in progress?
"databases.xxxx.queries.latency": {
"count": 7,
"max": 471.44218324400003,
"mean": 0.049260736982859085,
"min": 0.031328932000000004,
"p50": 0.048930366,
"p75": 0.048930366,
"p95": 0.048930366,
"p98": 0.048930366,
"p99": 0.048930366,
"p999": 0.048930366,
"stddev": 0.3961819852037625,
"m15_rate": 0.0016325388459502614,
"m1_rate": 0.0000015369791915358426,
"m5_rate": 0.0006317127755974434,
"mean_rate": 0.0032760240366080024,
"duration_units": "seconds",
"rate_units": "calls/second"
Of all your queries, the slowest took about 8 minutes to complete while the others completed very quickly. It is best to identify the slow query and profile it.
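The 8-minute figure comes straight from the max field of the latency metric above, which is reported in seconds; a trivial illustration in Python:

latency_max_seconds = 471.44218324400003   # "max" from databases.xxxx.queries.latency
print(latency_max_seconds / 60)            # ~7.9 minutes, while p50 is ~0.049 s for the other queries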
I have a Ktor app. It works fine when I run it in development mode. I package it in a Docker image by copying over what the Gradle application plugin produced. That also works fine on my local machine (8 cores). But now the strange part: when I do exactly the same thing on a rented V-Server, also running Ubuntu 20.04 like my local system, Ktor is incredibly slow.
docker-compose logs server:
server | 2021-08-24 08:00:23.337 [main] INFO ktor.application - Autoreload is disabled because the development mode is off.
server | 2021-08-24 08:25:35.048 [main] INFO ktor.application - Autoreload is disabled because the development mode is off.
server | 2021-08-24 09:18:48.246 [main] INFO c.e.e.s.TemplateStore - Starting to parse Sentences
server | 2021-08-24 09:18:48.345 [main] INFO c.e.e.s.TemplateStore - Finished parsing sentences
server | 2021-08-24 09:18:48.346 [main] INFO ktor.application - Responding at http://0.0.0.0:8080
server | 2021-08-24 09:18:48.347 [main] INFO ktor.application - Application started in 3193.32 seconds.
Application started in 3193.32 seconds
The source code can be found at https://github.com/1-alex98/whatisthat; it has a docker-compose.yml defining the whole Docker container being started.
Local system: 32 GB RAM + 8 cores. V-Server: 4 GB RAM + 2 cores (htop shows plenty of resources are free).
I am looking for ideas on what could cause this behavior, or for ways to debug it.
Update:
The main thread seems to be stuck reading a file forever:
"main" #1 prio=5 os_prio=0 cpu=652.14ms elapsed=173.92s tid=0x00007f01d4016000 nid=0xe runnable [0x00007f01dace6000]
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(java.base@11.0.12/Native Method)
at java.io.FileInputStream.read(java.base@11.0.12/FileInputStream.java:279)
at java.io.FilterInputStream.read(java.base@11.0.12/FilterInputStream.java:133)
at sun.security.provider.NativePRNG$RandomIO.readFully(java.base@11.0.12/NativePRNG.java:424)
at sun.security.provider.NativePRNG$RandomIO.ensureBufferValid(java.base@11.0.12/NativePRNG.java:526)
at sun.security.provider.NativePRNG$RandomIO.implNextBytes(java.base@11.0.12/NativePRNG.java:545)
- locked <0x00000000c7571158> (a java.lang.Object)
at sun.security.provider.NativePRNG$Blocking.engineNextBytes(java.base@11.0.12/NativePRNG.java:268)
at java.security.SecureRandom.nextBytes(java.base@11.0.12/SecureRandom.java:751)
at kotlin.random.AbstractPlatformRandom.nextBytes(PlatformRandom.kt:47)
at kotlin.random.Random.nextBytes(Random.kt:260)
at com.example.routes.websocket.WebsocketRoutingKt.<clinit>(WebsocketRouting.kt:40)
at com.example.plugins.RoutingKt$routing$1.invoke(Routing.kt:13)
at com.example.plugins.RoutingKt$routing$1.invoke(Routing.kt:11)
at io.ktor.routing.Routing$Feature.install(Routing.kt:106)
at io.ktor.routing.Routing$Feature.install(Routing.kt:88)
at io.ktor.application.ApplicationFeatureKt.install(ApplicationFeature.kt:68)
at io.ktor.routing.RoutingKt.routing(Routing.kt:129)
at com.example.plugins.RoutingKt.routing(Routing.kt:11)
at com.example.ApplicationKt$main$1.invoke(Application.kt:18)
at com.example.ApplicationKt$main$1.invoke(Application.kt:14)
at io.ktor.server.engine.internal.CallableUtilsKt.executeModuleFunction(CallableUtils.kt:50)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading$launchModuleByName$1.invoke(ApplicationEngineEnvironmentReloading.kt:317)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading$launchModuleByName$1.invoke(ApplicationEngineEnvironmentReloading.kt:316)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading.avoidingDoubleStartupFor(ApplicationEngineEnvironmentReloading.kt:341)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading.launchModuleByName(ApplicationEngineEnvironmentReloading.kt:316)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading.access$launchModuleByName(ApplicationEngineEnvironmentReloading.kt:30)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading$instantiateAndConfigureApplication$1.invoke(ApplicationEngineEnvironmentReloading.kt:304)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading$instantiateAndConfigureApplication$1.invoke(ApplicationEngineEnvironmentReloading.kt:295)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading.avoidingDoubleStartup(ApplicationEngineEnvironmentReloading.kt:323)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading.instantiateAndConfigureApplication(ApplicationEngineEnvironmentReloading.kt:295)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading.createApplication(ApplicationEngineEnvironmentReloading.kt:136)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading.start(ApplicationEngineEnvironmentReloading.kt:268)
at io.ktor.server.netty.NettyApplicationEngine.start(NettyApplicationEngine.kt:174)
at com.example.ApplicationKt.main(Application.kt:21)
at com.example.ApplicationKt.main(Application.kt)
It is a fresh rented server, but I guess something is wrong with it.
docker-compose being slow and my program not starting turned out to be due to insufficient (poor-quality) input to /dev/urandom. Installing https://github.com/smuellerDD/jitterentropy-rngd resolved the problem.
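For anyone hitting the same thing, a quick way to confirm entropy starvation before installing such a daemon is to read the kernel's entropy pool; a minimal Linux-only check in Python:

# Persistently tiny values here explain why the blocking NativePRNG hangs at startup.
with open("/proc/sys/kernel/random/entropy_avail") as f:
    print("entropy_avail:", f.read().strip())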
I run OrientDB with Docker. The server was powered off; after restarting the server, starting OrientDB fails with the error shown below, but can I find the missing file in the 'metadata' folder?
2020-04-05 11:37:39:282 INFO Detected limit of amount of simultaneously open files is 1048576, limit of open files for disk cache will be set to 523776 [ONative]
2020-04-05 11:37:39:298 INFO Loading configuration from: /orientdb/config/orientdb-server-config.xml... [OServerConfigurationLoaderXml]
2020-04-05 11:37:39:587 INFO OrientDB Server v3.0.22 - Veloce (build 8afc634a2ea9c351898ae57dee8d739ff4851252, branch 3.0.x) is starting up... [OServer]
2020-04-05 11:37:39:619 INFO OrientDB config DISKCACHE=800MB [orientechnologies]
2020-04-05 11:37:39:697 INFO System is started under an effective user : `root` [OEngineLocalPaginated]
2020-04-05 11:37:39:697 INFO Allocation of 12160 pages. [OEngineLocalPaginated]
2020-04-05 11:37:39:880 INFO WAL maximum segment size is set to 32,241 MB [OrientDBDistributed]
2020-04-05 11:37:39:882 INFO Databases directory: /orientdb/databases [OServer]
2020-04-05 11:37:39:987 INFO Direct IO for WAL located in /orientdb/databases/OSystem is allowed with block size 4096 bytes. [OCASDiskWriteAheadLog]
2020-04-05 11:37:39:988 INFO Page size for WAL located in /orientdb/databases/OSystem is set to 4096 bytes. [OCASDiskWriteAheadLog]
2020-04-05 11:37:40:587 INFO Storage 'plocal:/orientdb/databases/OSystem' is opened under OrientDB distribution : 3.0.22 - Veloce (build 8afc634a2ea9c351898ae57dee8d739ff4851252, branch 3.0.x) [OLocalPaginatedStorage]
2020-04-05 11:37:41:013 INFO Listening binary connections on 0.0.0.0:2424 (protocol v.37, socket=default) [OServerNetworkListener]
2020-04-05 11:37:41:019 INFO Listening http connections on 0.0.0.0:2480 (protocol v.10, socket=default) [OServerNetworkListener]
2020-04-05 11:37:41:024 INFO Found ORIENTDB_ROOT_PASSWORD variable, using this value as root's password [OServer]
2020-04-05 11:37:41:764 INFO Installing dynamic plugin 'orientdb-studio-3.0.22.zip'... [OServerPluginManager]
2020-04-05 11:37:41:767 INFO Installing dynamic plugin 'orientdb-etl-3.0.22.jar'... [OServerPluginManager]
2020-04-05 11:37:41:773 INFO ODefaultPasswordAuthenticator is active [ODefaultPasswordAuthenticator]
2020-04-05 11:37:41:776 INFO OServerConfigAuthenticator is active [OServerConfigAuthenticator]
2020-04-05 11:37:41:777 INFO OSystemUserAuthenticator is active [OSystemUserAuthenticator]
2020-04-05 11:37:41:778 INFO [OVariableParser.resolveVariables] Property not found: distributed [orientechnologies]
2020-04-05 11:37:41:785 WARNI Authenticated clients can execute any kind of code into the server by using the following allowed languages: [sql] [OServerSideScriptInterpreter]
2020-04-05 11:37:41:788 INFO OrientDB Studio available at http://172.18.0.2:2480/studio/index.html [OServer]
2020-04-05 11:37:41:788 INFO OrientDB Server is active v3.0.22 - Veloce (build 8afc634a2ea9c351898ae57dee8d739ff4851252, branch 3.0.x). [OServer]
2020-04-05 11:38:16:755 INFO Direct IO for WAL located in /orientdb/databases/metadata is allowed with block size 4096 bytes. [OCASDiskWriteAheadLog]
2020-04-05 11:38:16:755 INFO Page size for WAL located in /orientdb/databases/metadata is set to 4096 bytes. [OCASDiskWriteAheadLog]
Exception `6EA2C93F` in storage `plocal:/orientdb/databases/metadata`: 3.0.22 - Veloce (build 8afc634a2ea9c351898ae57dee8d739ff4851252, branch 3.0.x)
com.orientechnologies.orient.core.exception.OStorageException: File with name internal.pcl does not exist in storage metadata
at com.orientechnologies.orient.core.storage.cache.local.OWOWCache.loadFile(OWOWCache.java:732)
at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurableComponent.openFile(ODurableComponent.java:188)
at com.orientechnologies.orient.core.storage.cluster.v0.OPaginatedClusterV0.open(OPaginatedClusterV0.java:192)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.openClusters(OAbstractPaginatedStorage.java:532)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.open(OAbstractPaginatedStorage.java:391)
at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthenticate(OrientDBEmbedded.java:236)
at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthenticate(OrientDBEmbedded.java:59)
at com.orientechnologies.orient.server.OServer.openDatabase(OServer.java:936)
at com.orientechnologies.orient.server.OServer.openDatabase(OServer.java:908)
at com.orientechnologies.orient.server.network.protocol.http.command.OServerCommandAuthenticatedDbAbstract.authenticate(OServerCommandAuthenticatedDbAbstract.java:162)
at com.orientechnologies.orient.server.network.protocol.http.command.OServerCommandAuthenticatedDbAbstract.beforeExecute(OServerCommandAuthenticatedDbAbstract.java:122)
at com.orientechnologies.orient.server.network.protocol.http.command.get.OServerCommandGetConnect.beforeExecute(OServerCommandGetConnect.java:50)
at com.orientechnologies.orient.server.network.protocol.http.ONetworkProtocolHttpAbstract.service(ONetworkProtocolHttpAbstract.java:225)
at com.orientechnologies.orient.server.network.protocol.http.ONetworkProtocolHttpAbstract.execute(ONetworkProtocolHttpAbstract.java:707)
at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:69)
Recently I've reinstalled my VPS and have a fresh install of Neo4j on it.
I'm using PuTTY to connect from my machine, tunneling port 7474 as I've done in the past. I'm new to Neo4j 3.2 and am getting this error when I try to connect to the server in the Neo4j browser:
N/A: WebSocket connection failure. Due to security constraints in your
web browser, the reason for the failure is not available to this Neo4j
Driver.
After trying a lot of different suggestions from somewhat related topics, I ended up allowing remote connections, and discovered that when I access the browser remotely, e.g. http://my_vps_ip:7474/browser/, I have no issues at all.
This is the output of neo4j status:
● neo4j.service - Neo4j Graph Database
Loaded: loaded (/lib/systemd/system/neo4j.service; disabled; vendor preset: enabled)
Active: active (running) since Fri 2017-05-12 04:47:11 CEST; 2h 1min ago
Main PID: 17040 (java)
Tasks: 38
Memory: 272.1M
CPU: 1min 6.731s
CGroup: /system.slice/neo4j.service
└─17040 /usr/bin/java -cp /var/lib/neo4j/plugins:/etc/neo4j:/usr/share/neo4j/lib/*:/var/lib/neo4j/plugins/* -server -XX:
May 12 04:47:11 vps276997 neo4j[17040]: import: /var/lib/neo4j/import
May 12 04:47:11 vps276997 neo4j[17040]: data: /var/lib/neo4j/data
May 12 04:47:11 vps276997 neo4j[17040]: certificates: /var/lib/neo4j/certificates
May 12 04:47:11 vps276997 neo4j[17040]: run: /var/run/neo4j
May 12 04:47:11 vps276997 neo4j[17040]: Starting Neo4j.
May 12 04:47:12 vps276997 neo4j[17040]: 2017-05-12 02:47:12.417+0000 INFO ======== Neo4j 3.2.0 ========
May 12 04:47:12 vps276997 neo4j[17040]: 2017-05-12 02:47:12.844+0000 INFO Starting...
May 12 04:47:13 vps276997 neo4j[17040]: 2017-05-12 02:47:13.950+0000 INFO Bolt enabled on 0.0.0.0:7687.
May 12 04:47:18 vps276997 neo4j[17040]: 2017-05-12 02:47:18.196+0000 INFO Started.
May 12 04:47:20 vps276997 neo4j[17040]: 2017-05-12 02:47:20.274+0000 INFO Remote interface available at http://localhost:7474/
Any ideas why this might be happening?
Please ensure that public access to port 7687 is enabled in your 'neo4j.conf' file. In the latest version, it should be these two lines in your 'neo4j.conf':
dbms.connector.bolt.enabled=true
dbms.connector.bolt.listen_address=0.0.0.0:7687
That is because Neo4j's Bolt protocol uses port 7687.
Also ensure you expose port 7687 on your instance to the public. If you are using AWS EC2, set the protocol to TCP, because Bolt is based on TCP.
If you are using Docker/Kubernetes, also ensure that you expose all ports (7474, 7473, 7687 by default) on your containers or Kubernetes service.
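Once the Bolt listener and firewall are open, reachability can be verified from a remote machine with the official Neo4j Python driver; a minimal sketch (the host and credentials below are placeholders):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://my_vps_ip:7687", auth=("neo4j", "your_password"))
with driver.session() as session:
    print(session.run("RETURN 1 AS ok").single()["ok"])   # prints 1 if Bolt is reachable
driver.close()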
There is a Neo4j knowledge base article about this exact issue.
Quote:
This error can be resolved by editing the file
$NEO4J_HOME/conf/neo4j.conf and uncommenting:
# To have Bolt accept non-local connections, uncomment this line:
dbms.connector.bolt.address=0.0.0.0:7687
We have installed WSO2 Message Broker v2.2.0 on a SUSE 64-bit OS, single core. We have configured master-datasources.xml to point to an Oracle database. The startup of the MB takes minutes, especially:
TID: [0] [MB] [2014-06-11 15:57:53,039] INFO {org.apache.cassandra.thrift.ThriftServer} - Listening for thrift clients... {org.apache.cassandra.thrift.ThriftServer}
TID: [0] [MB] [2014-06-11 15:57:53,219] INFO {org.apache.cassandra.service.GCInspector} - GC for MarkSweepCompact: 407 ms for 1 collections, 60663688 used; max is 1037959168 {org.apache.cassandra.service.GCInspector}
TID: [0] [MB] [2014-06-11 15:58:39,137] WARN {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent} - Waiting for required OSGi services: org.wso2.carbon.server.admin.common.IServerAdmin,org.wso2.carbon.throttling.agent.ThrottlingAgent, {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent}
TID: [0] [MB] [2014-06-11 15:59:39,136] WARN {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent} - Waiting for required OSGi services: org.wso2.carbon.server.admin.common.IServerAdmin,org.wso2.carbon.throttling.agent.ThrottlingAgent, {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent}
TID: [0] [MB] [2014-06-11 16:00:39,136] WARN {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent} - Waiting for required OSGi services: org.wso2.carbon.server.admin.common.IServerAdmin,org.wso2.carbon.throttling.agent.ThrottlingAgent, {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent}
Is there a reason for this?
With WSO2 MB 2.2.0 we get these kinds of errors when the ZooKeeper/Cassandra server does not start properly. Ideally, if clustering is enabled, the ZooKeeper server (internal or external) should be started properly before MB starts.
Furthermore, if you are trying to run an MB cluster on a single machine and want to run two ZooKeeper nodes there, you will most probably end up with these OSGi-level errors. Please follow the blog post at http://indikasampath.blogspot.com/2014/05/wso2-message-broker-cluster-setup-in.html for configuration details on setting up a WSO2 Message Broker cluster on a single machine.