Install Apache Kylin in custom environment

I'm trying to install Apache Kylin on Ubuntu 16.04.
I installed:
hadoop 3.1.2 in pseudo distributed mode (fs.default.name: hdfs://localhost:9000)
apache hive 3.1.2 and db derby 10.14.2.0 (config hive use db derby)
hbase 1.4.10 in pseudo distributed mode (using hdfs://localhost:9000/hbase)
But when I call:
hbase shell
hbase(main):001:0> list
I get this error:
ERROR: Can't get master address from ZooKeeper; znode data == null
Here is some help for this command:
List all user tables in hbase. Optional regular expression parameter could
be used to filter the output. Examples:
hbase> list
hbase> list 'abc.*'
hbase> list 'ns:abc.*'
hbase> list 'ns:.*'
And when I call:
ssh localhost
kylin.sh start
I get this error:
2019-09-27 09:26:41,029 INFO [main] client.ZooKeeperRegistry:107 : ClusterId read in ZooKeeper is null
Exception in thread "main" java.lang.IllegalArgumentException: Failed to find metadata store by url: kylin_metadata#hbase
at org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:99)
at org.apache.kylin.common.persistence.ResourceStore.getStore(ResourceStore.java:111)
at org.apache.kylin.rest.service.AclTableMigrationTool.checkIfNeedMigrate(AclTableMigrationTool.java:99)
at org.apache.kylin.tool.AclTableMigrationCLI.main(AclTableMigrationCLI.java:43)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:92)
... 3 more
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location for replica 0
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:372)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:275)
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:436)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:310)
at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:639)
at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:366)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:409)
at org.apache.kylin.storage.hbase.HBaseConnection.tableExists(HBaseConnection.java:281)
at org.apache.kylin.storage.hbase.HBaseConnection.createHTableIfNeeded(HBaseConnection.java:306)
at org.apache.kylin.storage.hbase.HBaseResourceStore.createHTableIfNeeded(HBaseResourceStore.java:114)
at org.apache.kylin.storage.hbase.HBaseResourceStore.<init>(HBaseResourceStore.java:88)
... 8 more

From the error, your HBase is clearly not running; please make sure HBase is healthy before starting Kylin.
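As a quick sanity check, a sketch assuming the default ZooKeeper port 2181 on localhost (these commands ship with the HBase and ZooKeeper distributions):
jps                               # HMaster and HRegionServer should both be listed
zkCli.sh -server localhost:2181
# inside the ZooKeeper shell, the active master registers itself here:
get /hbase/master
# if the znode is empty (data == null), restart HBase:
stop-hbase.sh
start-hbase.sh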

Hadoop has a long history and is complex, so we recommend using a well-tested Hadoop distribution such as CDH or HDP rather than a custom Hadoop environment.
If you are doing a PoC and want to learn Kylin quickly, please use the Docker image at https://hub.docker.com/r/apachekylin/apache-kylin-standalone. If you want to use Kylin in a more formal Hadoop environment, please use a CDH 5.x or HDP 2.x Hadoop distribution.
If you have more questions, please contact the Kylin community via the user mailing list.
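For the PoC route, a minimal sketch of running that image (the memory limit and container name are illustrative; 7070 is Kylin's web UI port):
docker run -d -m 8G -p 7070:7070 --name kylin-standalone apachekylin/apache-kylin-standalone
# then browse to http://localhost:7070/kylin (default login ADMIN / KYLIN)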

Related

Kubeflow on Mac M1

I'm trying to install Kubeflow on a Mac M1.
I have a single-node Kubernetes cluster running from Docker Desktop. The version is v1.25.0. kubectl get nodes returns a single node.
I am trying to install Kubeflow with kfctl
kfctl version gives kfctl v1.2.0-0-gbc038f9
When I execute the command
kfctl apply -V -f https://raw.githubusercontent.com/kubeflow/manifests/v1.2-branch/kfdef/kfctl_k8s_istio.v1.2.0.yaml
I'm getting the following error
clusterrole.rbac.authorization.k8s.io/application-controller-cluster-role unchanged
clusterrolebinding.rbac.authorization.k8s.io/application-controller-cluster-role-binding unchanged
service/application-controller-service unchanged
statefulset.apps/application-controller-stateful-set configured
WARN[0012] Encountered error applying application application: (kubeflow.error): Code 500 with message: Apply.Run : [unable to recognize "/tmp/kout706984614": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1", unable to recognize "/tmp/kout706984614": no matches for kind "Application" in version "app.k8s.io/v1beta1"] filename="kustomize/kustomize.go:284"
WARN[0012] Will retry in 4 seconds. filename="kustomize/kustomize.go:285"
serviceaccount/application-controller-service-account unchanged
clusterrole.rbac.authorization.k8s.io/application-controller-cluster-role unchanged
clusterrolebinding.rbac.authorization.k8s.io/application-controller-cluster-role-binding unchanged
service/application-controller-service unchanged
statefulset.apps/application-controller-stateful-set configured
WARN[0018] Encountered error applying application application: (kubeflow.error): Code 500 with message: Apply.Run : [unable to recognize "/tmp/kout783197161": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1", unable to recognize "/tmp/kout783197161": no matches for kind "Application" in version "app.k8s.io/v1beta1"] filename="kustomize/kustomize.go:284"
WARN[0018] Will retry in 6 seconds. filename="kustomize/kustomize.go:285"
It looks like a version mismatch; any idea what the right version combination should be between K8s, kfctl, and the YAML files?
I could not set this up on the M1, as a 64-bit VM is not supported. As a workaround I used MiniKF from Arrikto, hosted in the AWS Marketplace:
https://aws.amazon.com/marketplace/pp/prodview-7shm7yqkubjhg?sr=0-1&ref_=beagle&applicationId=AWSMPContessa
This sets up Kubeflow on a single EC2 instance.
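On the version question itself: apiextensions.k8s.io/v1beta1 was removed in Kubernetes 1.22, so the v1.2.0 manifests cannot apply on a 1.25 cluster no matter which kfctl build you use. You can confirm which API versions your cluster serves with:
kubectl api-versions | grep apiextensions
# on 1.25 this prints only apiextensions.k8s.io/v1, which is why every
# v1beta1 CustomResourceDefinition fails with "no matches for kind"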

Upgrading Neo4j data fails on "logs contains entries with prefix 2"

I'm running a database on Neo4j v3.5.11 CE via a Docker volume on AWS. I want to upgrade to 4.4.9, so I created a tar of ./graph.db and brought it back to my dev box. I extracted it to /var/lib/neo4j/data/databases. I mounted it to a neo4j v3.5.11 container and it starts fine. I can see all the data via localhost:7474.
Next I try mounting it to neo4j v4.0.0 via:
docker run -d -p 7474:7474 -p 7687:7687 -v /var/lib/neo4j/data:/var/lib/neo4j/data -v /var/lib/neo4j/plugins:/plugins -v /var/lib/neo4j/logs:/var/log/neo4j -e NEO4J_AUTH=none -e NEO4J_dbms_allow__upgrade=true --name neo4j neo4j:4.0.0
Neo4j fails: "Transaction logs contains entries with prefix 2, and the highest supported prefix is 1. This indicates that the log files originates from a newer version of neo4j." This is odd because it was upgraded from 3.5.5 and has been running on 3.5.11--never touched by a newer version.
docker logs neo4j-apoc
Fetching versions.json for Plugin 'apoc' from https://neo4j-contrib.github.io/neo4j-apoc-procedures/versions.json
Installing Plugin 'apoc' from https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/4.0.0.7/apoc-4.0.0.7-all.jar to /plugins/apoc.jar
Applying default values for plugin apoc to neo4j.conf
Skipping dbms.security.procedures.unrestricted for plugin apoc because it is already set
Directories in use:
home: /var/lib/neo4j
config: /var/lib/neo4j/conf
logs: /logs
plugins: /plugins
import: /var/lib/neo4j/import
data: /var/lib/neo4j/data
certificates: /var/lib/neo4j/certificates
run: /var/lib/neo4j/run
Starting Neo4j.
2022-09-10 14:18:32.888+0000 WARN Unrecognized setting. No declared setting with name: apoc.export.file.enabled
2022-09-10 14:18:32.892+0000 WARN Unrecognized setting. No declared setting with name: apoc.import.file.enabled
2022-09-10 14:18:32.893+0000 WARN Unrecognized setting. No declared setting with name: apoc.import.file.use_neo4j_config
2022-09-10 14:18:32.921+0000 INFO ======== Neo4j 4.0.0 ========
2022-09-10 14:18:32.934+0000 INFO Starting...
2022-09-10 14:18:48.713+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabaseService#123d7057' was successfully initialized, but failed to start. Please see the attached cause exception "Transaction logs contains entries with prefix 2, and the highest supported prefix is 1. This indicates that the log files originates from a newer version of neo4j.". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabaseService#123d7057' was successfully initialized, but failed to start. Please see the attached cause exception "Transaction logs contains entries with prefix 2, and the highest supported prefix is 1. This indicates that the log files originates from a newer version of neo4j.".
I tried a couple things:
1.) Deleting the transaction logs: sudo rm graph.db/neostore.transaction.db.* It throws the same exact transaction log error, even though there are no transaction logs in the directory;
2.) Tried a database recovery by adding this to the run command: -e NEO4J_unsupported_dbms_tx__log_fail__on__corrupted__log__files=false This fails with "Unknown store version 'SF4.3.0'":
2022-09-10 15:39:48.458+0000 INFO Starting...
2022-09-10 15:40:34.529+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabaseService#2a39aa2b' was successfully initialized, but failed to start. Please see the attached cause exception "Unknown store version 'SF4.3.0'". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabaseService#2a39aa2b' was successfully initialized, but failed to start. Please see the attached cause exception "Unknown store version 'SF4.3.0'".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabaseService#2a39aa2b' was successfully initialized, but failed to start. Please see the attached cause exception "Unknown store version 'SF4.3.0'".
Any ideas appreciated! Thanks!
Deleting transaction logs is never a good idea. What you want to do is add the setting:
dbms.allow_upgrade=true
(with the official Docker image, that is the environment variable NEO4J_dbms_allow__upgrade=true). Then it should work, as the docs state that you can upgrade the latest 3.5 release to Neo4j 4.0.0.
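A minimal sketch of the run command with that setting (the official image expects the data volume mounted at /data; the tag and paths are illustrative). Note that the later "Unknown store version 'SF4.3.0'" error suggests a 4.3+ container did write to this store at some point, which would also explain the prefix-2 transaction-log entries:
docker run -d -p 7474:7474 -p 7687:7687 \
  -v /var/lib/neo4j/data:/data \
  -e NEO4J_AUTH=none \
  -e NEO4J_dbms_allow__upgrade=true \
  --name neo4j neo4j:4.0.0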

Kylin build fails at first step

My Kylin metadata is corrupt, so I removed all metadata and reinstalled Kylin on the same server.
I tried running:
$KYLIN_HOME/bin/sample.sh
It does not give any error.
So I tried to create a simple cube with 1 fact table and 2 dimension tables.
But my cube build failed at its first step, with this error:
java.lang.NullPointerException
at org.apache.kylin.source.hive.CreateFlatHiveTableStep.getCubeSpecificConfig(CreateFlatHiveTableStep.java:100)
at org.apache.kylin.source.hive.CreateFlatHiveTableStep.doWork(CreateFlatHiveTableStep.java:105)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:136)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I had the same problem and fixed it.
The reason is that ZooKeeper already had a /kylin directory. After I removed it from ZooKeeper, the cube built successfully:
1. Use zkCli.sh to connect to ZooKeeper.
2. rmr /kylin
3. Restart Kylin.
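Concretely, a sketch assuming ZooKeeper at localhost:2181 and a default Kylin install:
$KYLIN_HOME/bin/kylin.sh stop
zkCli.sh -server localhost:2181
# inside the ZooKeeper shell:
rmr /kylin        # on ZooKeeper 3.5+, use: deleteall /kylin
quit
$KYLIN_HOME/bin/kylin.sh start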

Neo4j interoperability between community and enterprise edition databases

I am having an issue switching between Neo4j Enterprise and Community editions. Since I was unable to do a GraphML import, I switched to Enterprise, where I can import GraphML databases. Once I am done and try to open the database file created in the Enterprise edition in the Community edition, it gives an error:
org.neo4j.server.database.LifeCycleManagingDatabase was succesfully initialized but failed to start
Is it possible to open a DB created in the Enterprise edition in Community? What am I doing wrong here?
Please find the error I get when opening the DB from Java:
Exception in thread "main" java.lang.RuntimeException: Error starting org.neo4j.kernel.EmbeddedGraphDatabase, D:\roshni\graph.db
at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:314)
at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:59)
at org.neo4j.graphdb.factory.GraphDatabaseFactory.newDatabase(GraphDatabaseFactory.java:107)
at org.neo4j.graphdb.factory.GraphDatabaseFactory$1.newDatabase(GraphDatabaseFactory.java:94)
at org.neo4j.graphdb.factory.GraphDatabaseBuilder.newGraphDatabase(GraphDatabaseBuilder.java:176)
at org.neo4j.graphdb.factory.GraphDatabaseFactory.newEmbeddedDatabase(GraphDatabaseFactory.java:66)
at Testing.main(Testing.java:15)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.impl.transaction.state.DataSourceManager#f1cb476' was successfully initialized, but failed to start. Please see attached cause exception.
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:499)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:108)
at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:309)
... 6 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.NeoStoreDataSource#2ad13d80' was successfully initialized, but failed to start. Please see attached cause exception.
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:499)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:108)
at org.neo4j.kernel.impl.transaction.state.DataSourceManager.start(DataSourceManager.java:117)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:493)
... 8 more
Caused by: org.neo4j.kernel.impl.storemigration.StoreUpgrader$UpgradingStoreVersionNotFoundException: 'neostore.nodestore.db' does not contain a store version, please ensure that the original database was shut down in a clean state.
at org.neo4j.kernel.impl.storemigration.UpgradableDatabase.checkUpgradeable(UpgradableDatabase.java:86)
at org.neo4j.kernel.impl.storemigration.StoreMigrator.needsMigration(StoreMigrator.java:158)
at org.neo4j.kernel.impl.storemigration.StoreUpgrader.getParticipantsEagerToMigrate(StoreUpgrader.java:259)
at org.neo4j.kernel.impl.storemigration.StoreUpgrader.migrateIfNeeded(StoreUpgrader.java:134)
at org.neo4j.kernel.NeoStoreDataSource.upgradeStore(NeoStoreDataSource.java:532)
at org.neo4j.kernel.NeoStoreDataSource.start(NeoStoreDataSource.java:434)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:493)
... 11 more
It's better to have the same version of Neo4j Community and Enterprise.
If your Enterprise version is older than your Community version, I suggest changing the following property to upgrade the datastore:
conf/neo4j.properties
allow_store_upgrade=true
In addition to what @MicTech said, you cannot downgrade a datastore; Neo4j supports upgrades only. So when moving from Community to Enterprise, the Enterprise variant needs to be the same version or a newer one.
Before doing a store upgrade, it's crucial to do a clean shutdown with the old version.
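A sketch of the safe sequence, with illustrative paths assuming tarball installs of each version:
old-neo4j/bin/neo4j stop          # clean shutdown with the OLD version first
echo 'allow_store_upgrade=true' >> new-neo4j/conf/neo4j.properties
new-neo4j/bin/neo4j start         # the new version migrates the store on startup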
As per their documentation, on Ubuntu and Debian you can do an upgrade as follows, for Neo4j 2.3.1.
The Neo4j Debian repository can be used on Debian or Ubuntu.
To use the repository follow these steps:
wget -O - https://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -
echo 'deb http://debian.neo4j.org/repo stable/' > /tmp/neo4j.list
sudo mv /tmp/neo4j.list /etc/apt/sources.list.d
sudo apt-get update
Installing Neo4j
To install the latest Neo4j Community Edition:
sudo apt-get install neo4j
To install the latest Neo4j Enterprise Edition:
sudo apt-get install neo4j-enterprise
The installation process will guide you through the upgrade.

Could not connect to a primary node for replica set <Moped::Cluster nodes=[<Moped::Node resolved_address="127.0.0.1:27017">]>

I'm following along with the RailsApps tutorial with Devise and Mongoid (http://railsapps.github.io/tutorial-rails-mongoid-devise.html) and am encountering the following error when I get to rake db:seed, down at the 'Set Up a Database Seed File' section.
Could not connect to a primary node for replica set <Moped::Cluster nodes=[<Moped::Node resolved_address="127.0.0.1:27017">]>
I've tried the instructions from nixoncd on this page, but it has not fixed the issue. It tells me 'file exists' and 'Already loaded': https://groups.google.com/forum/#!topic/mongodb-user/Hhh8iNCciMk
I get this if I type mongod in the terminal:
ERROR: could not read from config file
Any help welcome. I'm on Mac OS X Mountain Lion with Mongoid installed using Homebrew, though MongoDB was installed using the download package from mongodb.org.
MongoDB shell version: 2.4.6
Thanks
EDIT: I'm not sure if this issue is related or not. I'm also having issues launching MongoDB. I also posted the issue here:
mongoDB, could not read from config file -- config in different folder / Uninstall it?
First, see if your database is running by typing mongo. If yes, use these commands:
sudo rm /var/lib/mongodb/mongod.lock
mongod --repair
sudo service mongodb start
Your database will work.
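Since the question is on macOS with Homebrew, the paths differ; a sketch assuming Homebrew's default layout (adjust to your own dbpath and config file):
sudo rm /usr/local/var/mongodb/mongod.lock    # Homebrew's default dbpath
mongod --config /usr/local/etc/mongod.conf --repair
mongod --config /usr/local/etc/mongod.conf    # start again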
Installing MongoDB solved this for me:
sudo apt-get install mongodb-server
The answers above will work for you in the majority of the cases where this error occurs.
However, I would like to note that you can also get the Could not connect to a primary node for replica set error when trying to write exceptionally large batches of records to MongoDB in one request. I have encountered this error when writing more than 200,000 1 KB documents to a remote MongoDB server in one request. The remote server had 8 GB of memory and would handle several requests at once. This error can be avoided by cutting down the batch size of your requests.
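One way to do that from the command line, as a sketch assuming a JSON-lines export (the host, database, and collection names are placeholders):
split -l 50000 dump.json chunk_    # 50k documents per request instead of 200k+
for f in chunk_*; do
  mongoimport --host remote-host --db mydb --collection mycoll --file "$f"
done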
