pyhdfs.HdfsIOException: Failed to find datanode, suggest to check cluster health. excludeDatanodes=null - docker

I am trying to run hadoop using docker provided here:
https://github.com/big-data-europe/docker-hadoop
I use the following command:
docker-compose up -d
to bring the service up, and I am able to access it and browse the file system at localhost:9870. The problem arises whenever I try to use pyhdfs to put a file on HDFS. Here is my sample code:
from pyhdfs import HdfsClient

hdfs_client = HdfsClient(hosts='localhost:9870')

# Determine the output_hdfs_path
output_hdfs_path = 'path/to/test/dir'

# Does the output path exist? If not then create it
if not hdfs_client.exists(output_hdfs_path):
    hdfs_client.mkdirs(output_hdfs_path)

hdfs_client.create(output_hdfs_path + 'data.json', data='This is test.', overwrite=True)
If the test directory does not exist on HDFS, the code successfully creates it, but when it gets to the .create part it throws the following exception:
pyhdfs.HdfsIOException: Failed to find datanode, suggest to check cluster health. excludeDatanodes=null
What surprises me is that my code is able to create the empty directory but fails to put the file on HDFS. My docker-compose.yml file is exactly the same as the one provided in the github repo. The only change I've made is in the hadoop.env file, where I changed:
CORE_CONF_fs_defaultFS=hdfs://namenode:9000
to
CORE_CONF_fs_defaultFS=hdfs://localhost:9000
I have seen another post on Stack Overflow and tried the following command:
hdfs dfs -mkdir hdfs:///demofolder
which works fine in my case. Any help is much appreciated.

I would keep the default CORE_CONF_fs_defaultFS=hdfs://namenode:9000 setting and point the client at the namenode host rather than localhost. Creating a directory is a namenode-only metadata operation, while creating a file also needs a reachable datanode, which is most likely why mkdirs succeeds but create fails.
It works fine for me after adding a leading forward slash to the paths:
import pyhdfs

fs = pyhdfs.HdfsClient(hosts="namenode")
output_hdfs_path = '/path/to/test/dir'
if not fs.exists(output_hdfs_path):
    fs.mkdirs(output_hdfs_path)
fs.create(output_hdfs_path + '/data.json', data='This is test.')

# check that it's present
list(fs.walk(output_hdfs_path))
[('/path/to/test/dir', [], ['data.json'])]

Related

how to resolve influxdb.conf parse config error

While running influxdb:1.0 in Docker I get the following error:
[run] 2019/01/13 09:04:14 InfluxDB starting, version 1.0.2, branch master, commit ff307047057b7797418998a4ed709b0c0f346324
[run] 2019/01/13 09:04:14 Go version go1.6.2, GOMAXPROCS set to 1
[run] 2019/01/13 09:04:14 Using configuration at: /etc/influxdb/influxdb.conf
run: parse config: Near line 1 (last key parsed 'Merging'): Expected key separator '=', but got 'w' instead.
These are the first 3 lines of my .conf file:
Merging with configuration at: /etc/influxdb/influxdb.conf
reporting-disabled = false
bind-address = ":8088"
Does anybody know how to resolve this?
Thanks
According to the configuration file that you posted, the issue is in the first line:
Merging with configuration at: /etc/influxdb/influxdb.conf
It is being treated as a normal config line, so the parser expects something of the form:
Merging = with configuration at: /etc/influxdb/influxdb.conf
What you need to do is comment out the first line to avoid this issue.
A second note on your config file - not related to this specific issue, but you may get other errors - is that the third line needs to be under [admin], so the final config should look like this:
# Merging with configuration at: /etc/influxdb/influxdb.conf
reporting-disabled = false
[admin]
bind-address = ":8088"
It would be better to start from the original config file and then modify it, so that you follow the same standards without introducing new issues related to the file format.

How to change the Python version in Azure Machine Learning sdk ContainerImage with CondaDependencies

I am trying to get my Faster R-CNN model into a Container Instance on ACI. For that I need my Docker image to possess Python version 3.5.*. I specify that in my conda YAML file, but every time I spin an instance up and docker run -it *** /bin/bash into it, I see that it only has Python 3.6.7.
https://user-images.githubusercontent.com/21140767/50680590-82b20b80-1008-11e9-9bfe-4a0e71084ce0.png
How can I get my Docker image to have Python version 3.5.*? I already tried conda installing Python version 3.5.2, but that didn't work, as eventually it didn't possess 3.5.2, but only 3.6.7. (dfimage lets you see the Dockerfile from which the image was created, https://hub.docker.com/r/chenzj/dfimage/).
https://user-images.githubusercontent.com/21140767/50680673-d6245980-1008-11e9-9d48-71a7c150d925.png
My yaml:
name: project_environment
dependencies:
  - python=3.5.2
  - pip:
    - matplotlib
    - opencv-python==3.4.3.18
    - azureml-core==1.0.6
    - numpy
    - cntk
    - cython
channels:
  - anaconda
Notebook cell:
from azureml.core.conda_dependencies import CondaDependencies

svmandss = CondaDependencies.create(python_version="3.5.2", pip_packages=[
    "matplotlib",
    "opencv-python==3.4.3.18",
    "azureml-core",
    "numpy",
    "cntk",
    "cython"])
svmandss.add_channel('anaconda')

with open("fasterrcnn.yml", "w") as f:
    f.write(svmandss.serialize_to_string())
Another notebook cell with ContainerImage specifications.
image_config = ContainerImage.image_configuration(execution_script="score_fasterrcnn.py",
                                                  runtime="python",
                                                  conda_file="./fasterrcnn.yml",
                                                  dependencies=listdir("utils"),
                                                  docker_file="./Dockerfile")

service = Webservice.deploy_from_model(workspace=ws,
                                       name='faster-rcnn',
                                       deployment_config=aciconfig,
                                       models=[Model(workspace=ws, name='Faster-RCNN')],
                                       image_config=image_config)
service.wait_for_deployment(show_output=True)
Note
For better readability see my GitHub issue: (https://github.com/Azure/MachineLearningNotebooks/issues/163).
Currently, when deploying the web service, the version of Python is fixed to what's in Azure ML's base image. We're investigating removing this limitation in the future.
Since this is one of the top Google answers when searching for "azureml python version" I'm posting the answer here. The documentation is not very clear when it comes to this, but the following will work:
from azureml.core import Workspace
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

ws = Workspace.from_config()

# This is the important part
conda_dep = CondaDependencies(conda_dependencies_file_path="pipeline/environment.yml")
aml_run_config = RunConfiguration(conda_dependencies=conda_dep)

# Define compute target - must be preconfigured in the workspace
compute_target = ws.compute_targets['my-azureml-target']
aml_run_config.target = compute_target

from azureml.pipeline.steps import PythonScriptStep

script_source_dir = "./pipeline"
step_1_script = "test.py"
step_1 = PythonScriptStep(
    script_name=step_1_script,
    source_directory=script_source_dir,
    compute_target=compute_target,
    runconfig=aml_run_config,
    allow_reuse=True
)

from azureml.pipeline.core import Pipeline

# Build the pipeline
pipeline1 = Pipeline(workspace=ws, steps=[step_1])

from azureml.core import Experiment

# Submit the pipeline to be run
pipeline_run1 = Experiment(ws, 'Test-pipeline').submit(pipeline1)
pipeline_run1.wait_for_completion(show_output=True)
This assumes the following directory structure:
root/
    create_pipeline.py
    pipeline/
        test.py
        environment.yml
where create_pipeline.py is the file above, test.py is the script you would like to run, and environment.yml is the conda environment file, including the Python version.
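For reference, a minimal sketch of what pipeline/environment.yml could look like (the environment name and package list here are placeholders, not from the original answer; the relevant part is the pinned Python version):
name: my-pipeline-env        # placeholder environment name
dependencies:
  - python=3.5.2             # pinned Python version picked up by the run configuration
  - pip:
    - azureml-core
    - numpy
channels:
  - anaconda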
I was able to change the Python version by registering the environment in Azure ML Workspace:
from azureml.core.environment import Environment, Workspace
environment = Environment.from_conda_specification(name='myenv', file_path='environment.yml')
environment.python.user_managed_dependencies = False
workspace = Workspace.from_config()
environment = environment.register(workspace=workspace)
env_build = environment.build(workspace=workspace)
Then, configure the endpoint for publishing as follows:
from azureml.core.model import InferenceConfig

environment = Environment.get(workspace=workspace, name='myenv')
inference_config = InferenceConfig(
    entry_script='inference.py',
    source_directory='.',
    environment=environment
)
This is using Azure ML SDK 1.29.0. Perhaps this has already been fixed and the original method works as well, but I didn't test that.
EDIT:
This is no longer an issue for me. I found another way to get my code to work with Python version 3.6.7.
In my opinion this is still an issue, though. If in the future I do need Python version 3.5, there is no solution as of now.
You can still post an answer if you would like.

Docker, ENOENT: no such file or directory

I have a Storage constant that is used in a file called listingController.js
// assumption: using the pre-2.0 @google-cloud/storage package, whose export can be called directly
const Storage = require('@google-cloud/storage');

const storage = Storage({
    keyFilename: "../key/keyname.json"
});
Everything works fine when I'm not using Docker, but after I create a Docker image and deploy it on a server I get the following error:
ENOENT: no such file or directory, open '/key/keyname.json'
at wrapError (/app/node_modules/gcs-resumable-upload/build/src/index.js:17:12)
at /app/node_modules/gcs-resumable-upload/build/src/index.js:235:19
at getToken (/app/node_modules/google-auto-auth/index.js:27:9)
at getAuthClient (/app/node_modules/google-auto-auth/index.js:233:9)
at <anonymous>
Here I can see that the '..' in front of the path seems to be ignored, which I think is why the file is not found.
Here is my project structure:
src
--- key
----- keyname.json
----- firebasekeyfilename.json
--- controller
----- listingController.js
----- firebaseController.js
I have tried all different combinations of file names and paths but I cannot get it to find that file.
Does anyone have a clue why this is happening?
In my firebaseController I have the following reference to a similar file in the same folder and it works fine.
var serviceAccount = require("../key/firebasekeyfilename");
The only difference is that this path is passed to require(), which I guess resolves paths differently.
I've been stuck on this for a couple of days now; any pointers would be appreciated, thanks!

symfony/yaml backed symfony/config not parsing environment variables

I have recreated a simple example in this tiny github repo. I am attempting to use symfony/dependency-injection to configure monolog/monolog to write logs to php://stderr. I am using a yaml file called services.yml to configure dependency injection.
This all works fine if my yml file looks like this:
parameters:
  log.file: 'php://stderr'
  log.level: 'DEBUG'

services:
  stream_handler:
    class: \Monolog\Handler\StreamHandler
    arguments:
      - '%log.file%'
      - '%log.level%'
  log:
    class: \Monolog\Logger
    arguments: [ 'default', ['@stream_handler'] ]
However, my goal is to read the path of the log file and the log level from environment variables, APP_LOG and LOG_LEVEL respectively. According to the Symfony documentation on external parameters, the correct way to do that in the services.yml file is like this:
parameters:
  log.file: '%env(APP_LOG)%'
  log.level: '%env(LOGGING_LEVEL)%'
In my sample app I verified PHP can read these environment variables with the following:
echo "Hello World!\n\n";
echo 'APP_LOG=' . (getenv('APP_LOG') ?: '__NULL__') . "\n";
echo 'LOG_LEVEL=' . (getenv('LOG_LEVEL') ?: '__NULL__') . "\n";
which writes the following to the browser when I use my original services.yml with hard-coded values:
Hello World!
APP_LOG=php://stderr
LOG_LEVEL=debug
However, if I use the %env(VAR_NAME)% syntax in services.yml, I get the following error:
Fatal error: Uncaught UnexpectedValueException: The stream or file "env_PATH_a61e1e48db268605210ee2286597d6fb" could not be opened: failed to open stream: Permission denied in /var/www/vendor/monolog/monolog/src/Monolog/Handler/StreamHandler.php:107 Stack trace: #0 /var/www/vendor/monolog/monolog/src/Monolog/Handler/AbstractProcessingHandler.php(37): Monolog\Handler\StreamHandler->write(Array) #1 /var/www/vendor/monolog/monolog/src/Monolog/Logger.php(337): Monolog\Handler\AbstractProcessingHandler->handle(Array) #2 /var/www/vendor/monolog/monolog/src/Monolog/Logger.php(532): Monolog\Logger->addRecord(100, 'Initialized dep...', Array) #3 /var/www/html/index.php(17): Monolog\Logger->debug('Initialized dep...') #4 {main} thrown in /var/www/vendor/monolog/monolog/src/Monolog/Handler/StreamHandler.php on line 107
What am I doing wrong?
OK, you need a few things here. First of all, you need version 3.3 of Symfony, which is still in beta (3.2 was the released version when I encountered this). Second, you need to "compile" the environment variables.
Edit your composer.json with the following values and run composer update. You might need to update other dependencies. You can substitute ^3.3 with dev-master.
"symfony/config": "^3.3",
"symfony/console": "^3.3",
"symfony/dependency-injection": "^3.3",
"symfony/yaml": "^3.3",
You will likely have to do this for symfony/__WHATEVER__ if you have other symfony components.
Now, in your code, after you load your YAML configuration into your dependency container, you compile it.
So after your lines here (perhaps in bin/console):
$container = new ContainerBuilder();
$loader = new YamlFileLoader($container, new FileLocator(__DIR__ . DIRECTORY_SEPARATOR . '..'));
$loader->load('services.yml');
Do this:
$container->compile(true);
Your IDE's intellisense might tell you compile takes no parameters. That's ok. That's because compile() grabs its args indirectly via func_get_arg().
public function compile(/*$resolveEnvPlaceholders = false*/)
{
    if (1 <= func_num_args()) {
        $resolveEnvPlaceholders = func_get_arg(0);
    } else {
        . . .
    }
References
Github issue where this was discussed
Pull request to add compile(true)
Using this command after loading your services.yaml file should help:
$containerBuilder->compile(true);
Your file also gets validated by the proper-configuration checks that this method performs. The parameter is $resolveEnvPlaceholders, which makes environment variables accessible to the YAML services configuration.

RMySQL not working with a cnf file

I am trying to connect to a MySQL server through R and it works perfectly with the following line:
con <- dbConnect(MySQL(), user="user", password="password",dbname="dbname", host="localhost", port=3306)
But I would like to use a cnf file so that my user/password credentials do not appear in my code, so I tried the following:
rmysql.settingsfile <- "mydefault.cnf"
rmysql.db <- "test_db"
drv <- dbDriver("MySQL")
con <- dbConnect(drv, default.file = rmysql.settingsfile, group = rmysql.db)
And this is how my cnf file looks:
[test_db]
user=user
password=password
database=dbname
host=localhost
port=3306
It is in the same folder as my R script, which is my current working directory. But I run into the following error:
Error in mysqlNewConnection(drv, ...) :
RS-DBI driver: (Failed to connect to database: Error: Access denied for user 'ODBC'#'localhost' (using password: NO)
)
Any suggestions, please?
Thanks so much
I had this problem very recently. RMySQL looks in the root directory for these files, so you need to fully qualify the location of the file, e.g.:
rmysql.settingsfile <- "/home/MD-Tech/mydefault.cnf"
or
rmysql.settingsfile <- "C:\\Users\\MD-Tech\\rfiles\\mydefault.cnf"
Two things could be going on.
The CNF file should be encrypted, and the password line should show as password = ****. The MySQL documentation shows how to create a CNF file. The following would work for your code to create the CNF:
shell> mysql_config_editor set --login-path=test_db --host=localhost --user=user --password
Press Enter without typing the password; you will be prompted to enter it.
The second thing is that user = NULL and password = NULL are missing, as referenced in the src_mysql documentation:
rmysql.settingsfile <- "~/.mylogin.cnf"
rmysql.db <- "test_db"
drv <- dbDriver("MySQL")
con <- dbConnect(drv, default.file = rmysql.settingsfile, group = rmysql.db, user = NULL, password = NULL)
When you add these and run the code, you should be set.
Found something working at: https://www.r-bloggers.com/mysql-and-r/
It does not use a configuration file... but it works.
con <- dbConnect(MySQL(),
user="me", password="nuts2u",
dbname="my_db", host="localhost")
Ya, getting this setup for the first time can be like pulling cats' teeth! Here is what I did while running R on a Droplet (Ubuntu 16.04, MySQL 5.7.16).
First, make sure you can at least log in successfully to MySQL through the terminal:
mysql -u kevin -p
Next, run R and verify that you can log in directly with dbConnect() using a user name and password:
drv <- dbDriver("MySQL")
mydb <- dbConnect(drv, user='kevin', password='ilovecats', dbname='catnapdb', host='127.0.0.1', port=3306)
Edit your mysql.cnf text file and at the bottom add a new group (the exact name of this file and its location will depend on your operating system and version).
[whiskerpatrol]
user = kevin
password = ilovecats
host = 127.0.0.1
port = 3306
database = catnapdb
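To then connect from R using that group, here is a minimal sketch (this last step is not spelled out in the original answer; the group name matches the [whiskerpatrol] block above, and the cnf file path is an assumption for your system):
library(RMySQL)

# assumption: the cnf file shown above is saved as ~/.my.cnf (adjust to your actual path)
con <- dbConnect(MySQL(),
                 default.file = path.expand("~/.my.cnf"),
                 group = "whiskerpatrol")
dbListTables(con)
dbDisconnect(con)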
