Apache Beam - CassandraIO write Async - 2.6.0 error - google-cloud-dataflow

I am using the following libraries for Apache Beam to run a Dataflow job that reads data from BigQuery and stores/writes it to Cassandra.
beam-sdks-java-io-cassandra - 2.6.0
beam-sdks-java-io-jdbc - 2.6.0
beam-sdks-java-io-google-cloud-platform - 2.6.0
beam-sdks-java-core - 2.6.0
google-cloud-dataflow-java-sdk-all - 2.5.0
google-api-client - 1.25.0
Since versions of beam-sdks-java-io-cassandra later than 2.3 support saveAsync, I upgraded all my libraries to 2.6.0.
After the library update, I get the following error when inserting/saving data to Cassandra.
java.lang.NoSuchMethodError: com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
at org.apache.beam.sdk.io.cassandra.CassandraServiceImpl$WriterImpl.write(CassandraServiceImpl.java:435)
at org.apache.beam.sdk.io.cassandra.CassandraIO$WriteFn.processElement(CassandraIO.java:493)
It looks like a conflict between the Cassandra driver and the Guava ListenableFuture that Beam repackages.

I have a workaround for this: use beam-sdks-java-io-cassandra 2.4.0.
I'm working on a fix for this (along with some other things) and will update here.
Update: I've most likely found the issue and pushed a fix to my own fork. However, it might take some time before this makes it into a PR and is released by the Beam maintainers. If anyone wants to use the version I built, you can take a look at how it's done here.
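In Maven terms, the workaround amounts to pinning only the Cassandra IO module back to 2.4.0 while the rest of the stack stays at 2.6.0. A minimal sketch, assuming the standard org.apache.beam coordinates:

```xml
<!-- Sketch: pin only the Cassandra IO module back to 2.4.0;
     the remaining Beam modules stay at 2.6.0 -->
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-cassandra</artifactId>
  <version>2.4.0</version>
</dependency>
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-core</artifactId>
  <version>2.6.0</version>
</dependency>
```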


Neo4J - How to recover a backup (or dump) from a previous version (3.0.3) to the actual 4.1.3 Community

How can we restore a dump from a database at version 3.0.3 into the current version, 4.1.3 Community?
I've searched and tried many different alternatives, but none of them has worked so far.
One of the alternatives we tried was the command
neo4j-admin backup --backup-dir=/home/ec2-user --verbose
but the commands to restore are not working, for example:
neo4j-admin restore --from=/var/lib/neo4j/data/databases/neo4j --database=system --force
We are getting the error message:
WARNING: Max 1024 open files allowed, minimum of 40000 recommended. See the Neo4j manual.
Unmatched arguments from index 0: 'restore', '--from=/var/lib/neo4j/data/databases/neo4j', '--verbose', '--database=neo4j', '--force'
Did you mean: store-info or report or memrec?
This seems like a reasonable (but time-consuming) approach:
1. If you do not have Neo4j 3.0.3 installed, get it from here (as documented here) and install it.
2. Follow the simple 3.0 instructions for restoring a backup (to your 3.0.3 installation). The 3.0 restore process is very different from the 4.x process.
3. Upgrade the 3.0.3 installation to 4.1.3 stepwise, by:
   - upgrading to 3.3, then
   - upgrading to 3.5.22, then
   - migrating to 4.x
Good luck.
The Community edition of Neo4j doesn't have the restore command; only the Enterprise edition does. You can take a look at the table here.

Issue while running dataflow

I am getting the error below while running a Dataflow job. I am trying to upgrade my existing Beam version to 2.11.0, but I get this error at runtime.
java.lang.IncompatibleClassChangeError: Class org.apache.beam.model.pipeline.v1.RunnerApi$StandardPTransforms$Primitives does not implement the requested interface com.google.protobuf.ProtocolMessageEnum
    at org.apache.beam.runners.core.construction.BeamUrns.getUrn(BeamUrns.java:27)
    at org.apache.beam.runners.core.construction.PTransformTranslation.<clinit>(PTransformTranslation.java:58)
    at org.apache.beam.runners.core.construction.UnconsumedReads$1.visitValue(UnconsumedReads.java:49)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:666)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:649)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:649)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:649)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:311)
    at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:245)
    at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:458)
    at org.apache.beam.runners.core.construction.UnconsumedReads.ensureAllReadsConsumed(UnconsumedReads.java:40)
    at org.apache.beam.runners.dataflow.DataflowRunner.replaceTransforms(DataflowRunner.java:868)
    at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:660)
    at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:173)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
This usually means that the version of com.google.protobuf:protobuf-java that Beam was built with does not match the version at runtime. Does your pipeline code also depend on protocol buffers?
UPDATE: I have filed https://issues.apache.org/jira/browse/BEAM-6839 to track this. It is not expected.
I don't have enough rep to leave a comment, but I ran into this issue and later figured out that my problem was that I had different Beam versions in my pom.xml: some dependencies were at 2.19 and some at 2.20.
I would do a quick search of the versions in your pom or Gradle file to make sure they are all the same.
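That "quick search" can also be automated. Below is a hypothetical helper (not part of Beam or Maven) that scans a pom for org.apache.beam dependencies and reports any that disagree on version, which is exactly the mismatch described above:

```python
# Hypothetical helper: group org.apache.beam dependencies in a pom.xml
# by version, so mixed Beam versions stand out immediately.
import re
from collections import defaultdict

def beam_versions(pom_text):
    """Return {version: [artifactIds]} for org.apache.beam dependencies."""
    versions = defaultdict(list)
    # Match <groupId>org.apache.beam</groupId> followed by artifactId and version.
    pattern = re.compile(
        r"<groupId>org\.apache\.beam</groupId>\s*"
        r"<artifactId>([^<]+)</artifactId>\s*"
        r"<version>([^<]+)</version>"
    )
    for artifact, version in pattern.findall(pom_text):
        versions[version].append(artifact)
    return dict(versions)

# Illustrative pom excerpt with a deliberate version mismatch.
sample = """
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-core</artifactId>
  <version>2.19.0</version>
</dependency>
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
  <version>2.20.0</version>
</dependency>
"""

found = beam_versions(sample)
if len(found) > 1:
    print("Mixed Beam versions:", found)
```

In a real project you would read the pom from disk instead of an inline string; more than one key in the result means the versions need aligning.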
This may be caused by incompatible dependencies. I successfully upgraded Beam from 2.2.0 to 2.20.0 by upgrading its dependencies at the same time:
beam.version: 2.20.0
guava.version: 29.0-jre
bigquery.version: v2-rev20191211-1.30.9
google-api-client.version: 1.30.9
google-http-client.version: 1.34.0
pubsub.version: v1-rev20200312-1.30.9

Google Cloud Dataflow Stuck

Recently I've been getting this error when running Dataflow jobs written in Python. The thing is, it used to work and no code has changed, so I'm thinking it has something to do with the environment.
Error syncing pod d557f64660a131e09d2acb9478fad42f (""), skipping:
failed to "StartContainer" for "python" with CrashLoopBackOff:
"Back-off 20s restarting failed container=python pod=dataflow-)
Can anyone help me with this?
In my case, I was using Apache Beam SDK version 2.9.0 and had the same problem.
I used setup.py, and the "install_requires" field was filled dynamically by loading the contents of a requirements.txt file. That is fine if you're using DirectRunner, but DataflowRunner is very sensitive to dependencies on local files, so abandoning that technique and hard-coding the dependencies from requirements.txt into "install_requires" solved the issue for me.
If you are stuck on that, try to investigate your dependencies and minimize them as much as you can. Please refer to the Managing Python Pipeline Dependencies documentation for help. Avoid complex or nested code structures and dependencies on the local filesystem.
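A sketch of that change: instead of reading requirements.txt at build time, list the pins directly in setup.py. The project name and the second pin here are illustrative, not from the original question:

```python
# setup.py - sketch of hard-coding dependencies instead of loading
# requirements.txt dynamically; package names below are illustrative.
import setuptools

setuptools.setup(
    name="my-dataflow-pipeline",   # hypothetical project name
    version="0.1.0",
    install_requires=[
        # previously: open("requirements.txt").read().splitlines()
        "apache-beam[gcp]==2.9.0",
        "requests==2.21.0",        # illustrative pin
    ],
    packages=setuptools.find_packages(),
)
```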
Neri, thanks for your pointer to the SDK. I noticed that my requirements file was using an older version of the SDK, 2.4.0. I've now changed everything to 2.6.0 and it's no longer stuck.

Annoying SNAPSHOT releases in Grails list-plugin-updates command

Is there a way to tell the Grails list-plugin-updates command to ignore SNAPSHOT releases?
bash-3.2$ grails list-plugin-updates
Plugins with available updates are listed below:
-------------------------------------------------------------
<Plugin> <Current> <Available>
resources 1.2.8 1.2.9-SNAPSHOT
remote-pagination 0.4.6 0.4.7
platform-core 1.0.0 1.0.1-SNAPSHOT
mongodb 2.0.1 3.0.1
mail 1.0.5 1.0.6-SNAPSHOT
joda-time 1.4 1.5-SNAPSHOT
hibernate 3.6.10.15 3.6.10.16-SNAPSHOT
There have been many times when a "real" update has been obscured by a SNAPSHOT release, so I still have to check the Grails website to see the current versions. We are building a production system and almost never want to include a SNAPSHOT release of any plugin.
Documentation:
http://grails.org/doc/latest/ref/Command%20Line/list-plugin-updates.html
Not currently, no, but it does seem like it should ignore snapshots, so feel free to raise a JIRA issue (and even contribute a fix!).
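Until the command supports this natively, one workaround is to post-filter its output and drop any row whose suggested update is a SNAPSHOT. A minimal sketch (the filter is hypothetical, not a Grails feature):

```python
# Hypothetical post-filter for `grails list-plugin-updates` output:
# drop any row whose available version is a -SNAPSHOT release.
def without_snapshots(output):
    """Return the output with SNAPSHOT-update rows removed."""
    return "\n".join(
        line for line in output.splitlines()
        if "-SNAPSHOT" not in line
    )

# Illustrative excerpt of the command's output from the question above.
sample_output = """\
resources 1.2.8 1.2.9-SNAPSHOT
remote-pagination 0.4.6 0.4.7
mongodb 2.0.1 3.0.1
mail 1.0.5 1.0.6-SNAPSHOT
"""

print(without_snapshots(sample_output))
```

In practice you would pipe the real command's output through a script like this (or simply through grep -v SNAPSHOT on a Unix shell).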

Creating new rails project throws exception when installing json

I'm kind of a newbie in Ruby and I'm trying to create a new project on my Windows 8.1 PC.
I'm using the latest RailsInstaller. The installation ends successfully, but when I run rails new my_project I get this error while it is installing json.
Installing json (1.8.1) creating Makefile
0 [main] make 5852 handle_exceptions: Exception: STATUS_ACCESS_VIOLATION
439 [main] make 5852 open_stackdumpfile: Dumping stack trace to make.exe.stackdump
MSYS-1.0.17 Build:2011-04-24 23:39
Exception: STATUS_ACCESS_VIOLATION at eip=10002840
eax=00000000 ebx=00000000 ecx=75BE6DB4 edx=00000003 esi=00000024 edi=00000001
ebp=0028D638 esp=0028D4A0 program=C:\RailsInstaller\DevKit\bin\make.exe
cs=0023 ds=002B es=002B fs=0053 gs=002B ss=002B
I've tried installing as administrator, in compatibility mode (Win7), and replacing the DevKit, but nothing works.
Thanks
If you are using Rails on Windows, you should expect to face many issues with gem installation and the like. I suggest you switch to a Unix-based system, as suggested by Serge Vinogradoff.
If you still want to continue with Windows, you need to check whether a C compiler is properly installed on your machine. If not, the RubyInstaller Development Kit may help you: http://rubyinstaller.org/add-ons/devkit/
The DevKit installs a C compiler (and some other tools) to compile the C-written parts of gems.
Install it and try to install the gem again, perhaps with the option --platform=ruby.
You can also try json_pure, which is written in pure Ruby.
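If you go the json_pure route, a sketch of the Gemfile change (the version constraint is an assumption based on the json 1.8.1 in the error above):

```ruby
# Gemfile - hypothetical workaround: depend on the pure-Ruby JSON
# implementation instead of the C extension that fails to compile.
gem 'json_pure', '~> 1.8'
```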
I would suggest switching to a Unix-based system if you want to work with Rails.
