EMR 6 Beta with Docker Support has S3 Access Issue - docker

I am exploring the new EMR 6.0.0 with Docker support to decide whether we want to use it. One of our projects is written in Scala 2.11, but EMR 6.0.0 comes with Spark built with Scala 2.12. So I switched to trying 6.0.0-beta, which is Spark 2.4.3 built with Scala 2.11. If it works on 6.0.0-beta, then we will upgrade our code to Scala 2.12 and use 6.0.0.
I ran into a few issues when I tried to run my Scala Spark job:
When it tried to read Parquet from S3, I got the error: java.lang.RuntimeException: Cannot create temp dirs: [/mnt/s3]
When I tried to make an API call over HTTPS, I got the error: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target.
When it tried to read files from S3, I got the error: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found. I was able to hack around this one by passing the jar path with --jars (see the sketch below). Maybe not the best solution.
I am guessing there must be something I need to set, either during bootstrap or in the Dockerfile.
Can someone please help? Thanks!
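For reference, the --jars workaround for the third issue looks roughly like the sketch below. The EMRFS jar location and version are assumptions (check /usr/share/aws/emr/emrfs/lib/ on the cluster), and the main class and application jar are placeholders.
# Sketch of the workaround for "Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found".
# The EMRFS jar path/version is an assumption; com.example.MyJob and my-job_2.11.jar are placeholders.
$ spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --jars /usr/share/aws/emr/emrfs/lib/emrfs-hadoop-assembly-2.39.0.jar \
    --class com.example.MyJob \
    my-job_2.11.jar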

I figured out the S3 issue. In the beta version, /mnt/s3 is not mounted into the container with read and write permission.
So I needed to add it to "docker.allowed.rw-mounts" in the container-executor configuration, like below:
docker.allowed.rw-mounts=/etc/passwd,/mnt/s3
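For context, that key goes in the [docker] section of the container-executor configuration (on EMR this is typically /etc/hadoop/conf/container-executor.cfg). A minimal sketch; apart from the rw-mounts line above, the entries are only illustrative defaults, not prescriptive values:
# /etc/hadoop/conf/container-executor.cfg (sketch; only the rw-mounts line comes from the fix above)
[docker]
  module.enabled=true
  docker.binary=/usr/bin/docker
  docker.allowed.ro-mounts=/etc/passwd,/etc/group
  docker.allowed.rw-mounts=/etc/passwd,/mnt/s3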

Related

Use of experiments=no_use_multiple_sdk_containers in Google Cloud Dataflow

Issue Summary:
Hi,
I am using avro version 1.11.0 for parsing and decoding an Avro file. We have a custom requirement, so I am not able to use ReadFromAvro. When trying this with Dataflow, a dependency issue arises because avro-python3 version 1.82 is already available. The issue is with the class TimestampMillisSchema, which is not present in avro-python3; it fails stating that the attribute TimestampMillisSchema was not found in avro.schema. I then tried passing a requirements file with avro==1.11.0, but then Dataflow was not able to start, giving the error "Error syncing pod", which seems to be caused by dependency conflicts.
To solve the issue, we set an experiment flag (--experiments=no_use_multiple_sdk_containers), which ran fine.
I want to know whether there is a better solution to my issue, and also whether the above flag will affect pipeline performance.
Please try the Dataflow run command with:
--prebuild_sdk_container_engine=cloud_build --experiments=use_runner_v2
This will use Cloud Build to build the container with your extra dependencies and then use it within the Dataflow run.
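Putting it together, the run command looks roughly like this sketch; the module name, project, region and bucket are placeholders:
# Sketch only: my_pipeline, my-project and my-bucket are placeholders.
$ python -m my_pipeline \
    --runner=DataflowRunner \
    --project=my-project \
    --region=us-central1 \
    --temp_location=gs://my-bucket/tmp \
    --requirements_file=requirements.txt \
    --prebuild_sdk_container_engine=cloud_build \
    --experiments=use_runner_v2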

How to update a Python library that's already present in GCP Dataflow

I am using avro version 1.11.0 for parsing and decoding an Avro file. We have a custom requirement, so I am not able to use ReadFromAvro. When trying this with Dataflow, a dependency issue arises because avro-python3 version 1.82 is already available. The issue is with the class TimestampMillisSchema, which is not present in avro-python3; it fails stating that the attribute TimestampMillisSchema was not found in avro.schema.
I then tried passing a requirements file with avro==1.11.0, but then Dataflow was not able to start, giving the error "Error syncing pod", which seems to be caused by dependency conflicts.
Any idea/help on how this should be resolved?
Thanks
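A commonly documented alternative to the experiment flag is to build a custom Beam SDK container image that already has the newer avro installed and point the job at it. A hedged sketch only; the base image tag, image URL and pipeline module are assumptions to adapt to your SDK version and project:
# Dockerfile (sketch): pin avro on top of the Beam base image matching your SDK version
FROM apache/beam_python3.8_sdk:2.41.0
RUN pip install --no-cache-dir avro==1.11.0

# build and push the image, then run against it (names are placeholders)
$ docker build -t gcr.io/my-project/beam-avro:latest .
$ docker push gcr.io/my-project/beam-avro:latest
$ python -m my_pipeline \
    --runner=DataflowRunner \
    --experiments=use_runner_v2 \
    --sdk_container_image=gcr.io/my-project/beam-avro:latest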

Hyperledger Iroha: Can't generate genesis-block

I am trying to generate a new genesis block in Hyperledger Iroha, as suggested in
https://iroha.readthedocs.io/en/latest/getting_started/index.html#starting-iroha-node
and
https://hyperledger.github.io/iroha-api/#create-genesis-block
but unfortunately I can't do it because I am always getting the same error message.
$ cat peer.list
localhost:10001
$ ./iroha-cli --genesis_block --peers_address peer.list
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::out_of_range> >'
what(): bimap<>: invalid key
Aborted (core dumped)
I am receiving this error both on my local machine where I had compiled Iroha from scratch using the source code, as well as within an Iroha container.
I think I have the correct dependencies; otherwise I would not have been able to build Iroha from scratch. Also, note that I can start irohad correctly by using the configuration example from https://iroha.readthedocs.io/en/latest/getting_started/index.html#launching-iroha-daemon.
Any help or suggestion is greatly appreciated.
There was, indeed, a bug affecting the permissions needed to generate a block. It is fixed now and should not occur: https://github.com/hyperledger/iroha/pull/1351
This is a known issue in the development of Hyperledger Iroha; see here: https://github.com/hyperledger/iroha/issues/1362.
It arises when Iroha is compiled with an Ansible playbook.
Try uninstalling Ansible from your system and recompiling Iroha, and you shouldn't encounter the same error.
Obviously this is just a workaround, and you won't be able to take advantage of the Ansible capabilities.

Grails 3 - Gradle: Binary file gets corrupted during build on Heroku

I am trying to use the Google REST API from a Heroku instance. I am having problems with my certificate file, but everything works as expected locally.
The certificate is a PKCS 12 certificate, and the exception I get is:
java.io.IOException: DerInputStream.getLength(): lengthTag=111, too big.
I finally found the source of this problem: somewhere along the way the certificate file is modified. Locally it is 1732 bytes, but on the Heroku instance it is 3024 bytes. I have no idea when this occurs, though. I build with the same command locally (./gradlew stage) and execute the resulting jar with the same command.
The file is stored in grails-app/conf; I don't know of any better place to put it. I am reading it using this.getClass().getClassLoader().getResourceAsStream(...)
I found similar problems can occur when using Maven with resource filtering. But I haven't found any signs of Grails or Gradle doing the same kind of resource filtering.
Does anyone have any clues about what this can be?
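Not an answer, but one way to narrow down where the bytes change is to checksum the certificate at each stage, locally and on the dyno. A sketch, assuming unzip is available on the dyno; the certificate and jar names are placeholders, and the member path inside the jar may differ:
# Sketch: compare checksums at each stage (google-api.p12 and myapp-0.1.jar are placeholders)
$ sha256sum grails-app/conf/google-api.p12
$ ./gradlew stage
$ unzip -p build/libs/myapp-0.1.jar '*google-api.p12' | sha256sum
$ heroku run "unzip -p build/libs/myapp-0.1.jar '*google-api.p12' | sha256sum"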

symfony plugin installation failing [bhLDAPAuthPlugin]

I'm working on a symfony project and I need user access connected to an LDAP server. So I searched for something ready-made to add to my app and found this plugin, which has everything I wanted.
So I tried to install it with the command $ php symfony plugin:install bhLDAPAuthPlugin
For some reason it throws this error:
No release available for plugin "bhLDAPAuthPlugin"
I don't really understand what that message means. I've checked the spelling of the command (and also copied the command given on the plugin's page) and the same error appears. If I didn't have all the requirements for installation, other errors would be thrown, right?
PS: If you know of an easy way to implement the communication with LDAP (Microsoft Active Directory) myself, that would also be appreciated.
Not exactly sure how to solve the error message; perhaps it helps to explicitly specify which version or stability you wish to install.
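For example (a sketch; these options assume symfony 1.x's plugin:install task, and the release number is a placeholder):
$ php symfony plugin:install --stability=beta bhLDAPAuthPlugin
$ php symfony plugin:install --release=1.0.0 bhLDAPAuthPlugin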
Otherwise there's an easy workaround:
Just download the tgz file from here:
http://www.symfony-project.org/plugins/bhLDAPAuthPlugin/6_0_0
and do
php symfony plugin:install bhLDAPAuthPlugin-etc-etc.tgz
