How to configure an HBase cluster in fully distributed mode using Docker

I'm setting up an HBase cluster in fully distributed mode, with 1 HMaster node and 3 RegionServer nodes.
My hbase-site.xml file contains:
<configuration>
  <property>
    <name>hbase.master</name>
    <value>hbase.master:60000</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop.master:9000/data/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/tmp</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hbase.master</value>
  </property>
</configuration>
My Hadoop cluster is running normally.
I run ZooKeeper on the same machine as the HBase master, and the zoo.cfg configuration file keeps its default values.
When I start the cluster and view the HBase Master web UI, all the RegionServers appear in the list, but when I try to create a table or run any other command such as hbase> status, it always throws an exception:
ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:1889)
at org.apache.hadoop.hbase.master.MasterRpcServices.getClusterStatus(MasterRpcServices.java:695)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:42406)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2033)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)
So what's wrong with my cluster?

It turned out that I was using Docker version 1.8.2.
After completely removing Docker and installing an older version (1.7.0), my script ran normally.
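For reference, a minimal sketch of that downgrade on a Debian/Ubuntu host; the package name (docker-engine) and the exact version string are assumptions that depend on your distribution and Docker repository:

docker --version                       # confirm the currently installed version (1.8.2 here)
sudo apt-get purge docker-engine       # remove the current Docker packages completely
apt-cache madison docker-engine        # list which older builds your repository offers
sudo apt-get install docker-engine=1.7.1-0~trusty   # example version string; pick the exact one shown above
docker --version                       # verify the downgrade before restarting the HBase containers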

Related

How to take a backup and restore of Apache Ignite Docker?

I am using the Apache Ignite Docker image in my project as persistent storage. I need to take a full backup of the Ignite data under the 'config' folder, such as the marshaller, db, binary, and log directories. I am using ignite:2.3.0.
Below is the Docker Compose configuration for Ignite:
ignite:
  hostname: ignite
  image: apacheignite/ignite:2.3.0
  environment:
    - CONFIG_URI=file:///opt/ignite/apache-ignite-fabric-2.3.0-bin/work/ignite_config.xml
  volumes:
    - ./volumes/ignite/config:/opt/ignite/apache-ignite-fabric-2.3.0-bin/work
  network_mode: "host"
  ports:
    - 0.0.0.0:${IGNITE_EXPOSED_PORT1}:${IGNITE_EXPOSED_PORT1}
    - 0.0.0.0:${IGNITE_EXPOSED_PORT2}:${IGNITE_EXPOSED_PORT2}
    - 0.0.0.0:${IGNITE_EXPOSED_PORT3}:${IGNITE_EXPOSED_PORT3}
ignite_config.xml
<bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
<!-- Enabling Apache Ignite Persistent Store. -->
<property name="peerClassLoadingEnabled" value="true"/>
<property name="dataStorageConfiguration">
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
<property name="defaultDataRegionConfiguration">
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="persistenceEnabled" value="true"/>
<!-- Setting the size of the default region to 5GB. -->
<property name="maxSize" value="#{5L * 1024 * 1024 * 1024}"/>
</bean>
</property>
</bean>
</property>
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="localPort" value="48500"/>
<property name="localPortRange" value="30"/>
<property name="ipFinder">
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
<property name="addresses">
<list>
<value>IP:48500..48530</value>
</list>
</property>
</bean>
</property>
</bean>
</property>
</bean>
Can you please suggest how I should take a backup and restore it when I bring the Ignite container up again?
If you need to take a backup of a live cluster node, make sure to deactivate the cluster before taking the backup and reactivate it once the backup is finished.
You can also make use of the Apache Ignite snapshot feature introduced recently, or of the more advanced GridGain snapshot features.
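A rough sketch of that backup flow from the host, assuming the container can be addressed as ignite and that your Ignite version ships bin/control.sh with activation commands (2.3.0 may require activating/deactivating through the Java API or REST instead, and the snapshot commands only exist in much newer releases):

# Deactivate the cluster so the persistence files on disk are quiescent
# (control.sh may ask for confirmation; the path assumes the fabric 2.3.0 layout)
docker exec -it ignite /opt/ignite/apache-ignite-fabric-2.3.0-bin/bin/control.sh --deactivate
# Archive the mounted work directory (marshaller, db, binary, log) from the host
tar czf ignite-backup-$(date +%F).tar.gz ./volumes/ignite/config
# Reactivate the cluster once the copy is finished
docker exec -it ignite /opt/ignite/apache-ignite-fabric-2.3.0-bin/bin/control.sh --activate

To restore, stop the container, unpack the archive back into ./volumes/ignite/config, and bring the container up again so it reuses the same work directory.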

Hadoop. How to avoid workers file due to Docker automatic names

Some tools like Hadoop need the worker names specified explicitly (see the Slaves File section in the docs), but when you deploy with Docker Swarm it assigns container names automatically, so the workers file no longer works because the names in it don't exist. Is there any way to avoid this file or, at least, to assign aliases to containers (independently of the container name) to make it work?
Maybe I can't use a docker-compose.yml file and must create the services manually over the cluster... Any kind of light on the subject would be really appreciated.
Well, Hadoop documentation sucks... Apparently, if you set the alias of the master node in the core-site.xml file, you can omit the workers file. These are the steps I followed:
I customized the core-site.xml file (in my docker-compose.yml I named my master service nodemaster). This file must be present on both the master and the worker nodes:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nodemaster:9000</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://nodemaster:9000</value>
  </property>
</configuration>
Now when you run:
start-dfs.sh
start-yarn.sh
the daemons will connect to the master automatically.
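For context, a minimal docker-compose.yml sketch of that naming; the image name my-hadoop-image is hypothetical, and the only part that matters is that the master service is called nodemaster, the name core-site.xml points at:

version: "3"
services:
  nodemaster:                # the service name is the DNS name workers resolve
    image: my-hadoop-image   # hypothetical image name
    hostname: nodemaster
  worker1:
    image: my-hadoop-image
  worker2:
    image: my-hadoop-image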

Maven-built dockerized project that runs a Node application

I have a project that is built via Maven; it is a dockerized project for a Node application.
I want to be able to customize my CMD/ENTRYPOINT based on the Maven build arguments.
I know that when we do docker run and provide it the arguments, they are accepted and that works fine,
but I want to do the same thing from the Maven command line.
Is there a way to let docker run know about an argument passed on the Maven command line?
Or, even better, can I edit the Dockerfile, read the command-line args of Maven, and use them in the Dockerfile ENTRYPOINT?
Thanks in advance,
Minakshi
Based on this, you can either use the maven-resources-plugin to replace instances of ${...} with the values set in Maven before you build the Docker image.
Example:
<plugin>
  <artifactId>maven-resources-plugin</artifactId>
  <version>3.0.2</version>
  <executions>
    <execution>
      <id>filter-dockerfile</id>
      <phase>generate-resources</phase>
      <goals>
        <goal>copy-resources</goal>
      </goals>
      <configuration>
        <outputDirectory>${project.build.directory}</outputDirectory>
        <resources>
          <resource>
            <directory>src/main/docker</directory>
            <filtering>true</filtering>
          </resource>
        </resources>
      </configuration>
    </execution>
  </executions>
</plugin>
This assumes your Dockerfile lives under the src/main/docker/ path. The filtered Dockerfile is copied to the ${project.build.directory} path, as illustrated below.
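As an illustration, a Dockerfile template in src/main/docker could reference Maven properties like this; the base image, file names, and CMD are assumptions for the sketch. After mvn generate-resources, the copy in target/ has the placeholders replaced with the real artifactId and groupId:

# src/main/docker/Dockerfile (template; base image and file names are assumptions)
FROM node:16
# These placeholders are literal text here; maven-resources-plugin writes a
# filtered copy to target/ with the actual values substituted in
LABEL artifact="${project.artifactId}" \
      group="${project.groupId}"
COPY app/ /usr/src/app/
WORKDIR /usr/src/app
CMD ["node", "server.js"]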
Or, based on this comment, you could pass build arguments to the Dockerfile.
Example:
In your Maven Docker plugin configuration:
<configuration>
  <buildArgs>
    <artifactId>${project.artifactId}</artifactId>
    <groupId>${project.groupId}</groupId>
  </buildArgs>
</configuration>
Then access those properties as ARGs in the Dockerfile:
ARG artifactId
ARG groupId
ENV ARTIFACT_ID=${artifactId} GROUP_ID=${groupId}
Hope this helps answer your question.
Thank you for the responses.
I used resource filtering in Maven to solve my problem:
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <!-- variable that you want to pass along -->
  <userdefvariable></userdefvariable>
</properties>

<build>
  <resources>
    <resource>
      <!-- path to the files to filter (can be anything) -->
      <directory>src/main/resource</directory>
      <!-- must be true; this is what does the replacement -->
      <filtering>true</filtering>
    </resource>
  </resources>
</build>
Add to the Maven command line: "resources:resources -Duserdefvariable=value"
This setup generates a file in the target folder after running the mvn command, with the variable injected with the value given by the user.
In the Dockerfile you can now instead put in a command that runs that filtered script:
CMD ["sh", "path to the script in target folder"]
(this script should contain the commands that you want to run)
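For instance, the script being filtered might look like this as a template (the file name run.sh and its contents are hypothetical); the copy that lands in the target folder already has ${userdefvariable} replaced, and it is that copy the CMD should point at:

#!/bin/sh
# src/main/resource/run.sh (template; file name and commands are hypothetical)
# maven-resources-plugin writes a filtered copy to target/ in which
# ${userdefvariable} is replaced by the value passed via -Duserdefvariable=...
echo "starting with userdefvariable=${userdefvariable}"
exec node /usr/src/app/server.js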

ERROR: Cannot set priority of journalnode process 6520

I have three physical nodes with Docker installed on them. I have configured a highly available Hadoop cluster across these nodes. The configuration is like this:
core-site.xml:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/tmp/hadoop/dfs/jn</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://mycluster</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>10.32.0.1:2181,10.32.0.2:2181,10.32.0.3:2181</value>
</property>
hdfs-site.xml:
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>10.32.0.1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>10.32.0.2:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>10.32.0.1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>10.32.0.2:50070</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://10.32.0.1:8485;10.32.0.2:8485;10.32.0.3:8485/mycluster</value>
</property>
<property>
  <name>dfs.permissions.enable</name>
  <value>false</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
</property>
<property>
  <name>dfs.permissions.superusergroup</name>
  <value>hdfs</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///usr/local/hadoop_store/hdfs/datanode</value>
</property>
<property>
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
I created an hdfs user and set up passwordless SSH. When I try to start the JournalNode in order to format the NameNode, via this command:
sudo /opt/hadoop/bin/hdfs --daemon start journalnode
I receive this error:
ERROR: Cannot set priority of journalnode process 6520
Could you please tell me what is wrong with my configuration that causes this error?
Thank you in advance.
Problem solved. I checked the logs in /opt/hadoop/logs/*.log and saw this line:
Cannot make directory of /tmp/hadoop/dfs/journalnode.
First, I moved the journal node directory configuration into hdfs-site.xml and created the journal node directory. Then I started the journal node again and faced this error:
directory is not writable. So, I ran these commands to make the directory writable:
chmod 777 /tmp/hadoop/dfs/journalnode
chown -R root /tmp/hadoop/dfs/journalnode
Then I could start the journal node.
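For reference, a sketch of those two steps; the property mirrors what the answer describes, and the ownership line assumes the daemon runs as the hdfs user (prefer a proper owner over chmod 777 where possible):

<!-- hdfs-site.xml -->
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/tmp/hadoop/dfs/journalnode</value>
</property>

# create the directory on every JournalNode host and make it writable
mkdir -p /tmp/hadoop/dfs/journalnode
chown -R hdfs:hdfs /tmp/hadoop/dfs/journalnode   # assumes the daemon runs as the hdfs user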

Can you specify Arquillian to use a specific Wildfly configuration?

We are using WildFly 8.0.0 Final but are in the process of moving to WildFly 8.2. We use Arquillian to run our unit tests in the container. I have noticed that Arquillian always seems to use the WildFly standalone.xml.
It would be useful to be able to tell Arquillian which configuration to use when starting WildFly. WildFly comes with several different configuration files. It would be useful to have Arquillian run WildFly with a specific configuration, or even to tell Arquillian which configuration to use for a given test.
We use the WildFly CLI to configure WildFly properties. This configuration is stored in the configuration file. If we could specify which configuration to use when starting WildFly for a test, we could then test our different configurations.
This seems reasonable, but I haven't found a way to do this.
The WildFly configuration file is specified by the startup parameter --server-config.
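For example, when starting a standalone server by hand (standalone-full.xml is just one of the stock profiles shipped with WildFly):

bin/standalone.sh --server-config=standalone-full.xml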
As John wrote, you can. Adding an example /arquillian.xml:
(This is for WildFly 10.x, but it's been the same since AS 7, I think.)
<arquillian xmlns="http://jboss.org/schema/arquillian"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="
                http://jboss.org/schema/arquillian
                http://jboss.org/schema/arquillian/arquillian_1_0.xsd">
  <container qualifier="jbossas-managed" default="true">
    <configuration>
      <property name="jbossHome">target/wildfly-10.1.0.Final</property>
      <property name="serverConfig">standalone-full.xml</property>
      <property name="javaVmArguments">-Xms64m -Xmx2048m -Dorg.jboss.resolver.warning=true -Djboss.socket.binding.port-offset=100</property>
      <property name="managementPort">10090</property>
      <!--<property name="javaVmArguments">-Xms64m -Xmx2048m -Dorg.jboss.resolver.warning=true -agentlib:jdwp=transport=dt_socket,address=8000,server=y,suspend=y</property>-->
    </configuration>
  </container>
</arquillian>
Yes, just specify serverConfig in your arquillian.xml. By default it will be standalone.xml
