Custom source in Flume

I have created a custom source for Flume and installed the jar file as follows:
mkdir -p /usr/lib/flume-ng/plugins.d/MyFlumeSource/lib/MyFlumeSource.jar
chown -R flume:flume /var/lib/flume-ng/
I also set the following in /etc/flume-ng/conf/flume-env.sh:
FLUME_CLASSPATH="/usr/lib/flume-ng/plugins.d/MyFlumeSource/lib/MyFlumeSource.jar"
I updated the Flume configuration file as follows:
# Name the components on this agent
tail1.sources = seq-source
tail1.channels = mem-channel
tail1.sinks = hdfs-sink
# Describe/configure Source
tail1.sources.seq-source.type = org.custom.flume.source.MySource
# Describe the sink
tail1.sinks.hdfs-sink.type = hdfs
tail1.sinks.hdfs-sink.hdfs.path = /user/flume
tail1.sinks.hdfs-sink.hdfs.filePrefix = log
tail1.sinks.hdfs-sink.hdfs.rollInterval = 0
tail1.sinks.hdfs-sink.hdfs.rollCount = 10000
tail1.sinks.hdfs-sink.hdfs.fileType = DataStream
# Use a channel which buffers events in file
tail1.channels.mem-channel.type = memory
tail1.channels.mem-channel.capacity = 1000
tail1.channels.mem-channel.transactionCapacity = 100
# Bind the source and sink to the channel
tail1.sources.seq-source.channels = mem-channel
tail1.sinks.hdfs-sink.channel = mem-channel
I am trying to run the Flume agent as:
flume-ng agent --conf /var/lib/flume-ng/plugins.d/MyFlumeSource/lib/MyFlumeSource.jar --conf-file /etc/flume-ng/conf/flume-conf.properties --name tail1
flume-ng agent --conf-file /etc/flume-ng/conf/flume-conf.properties --name tail1
In both cases I am getting the following error:
ERROR node.PollingPropertiesFileConfigurationProvider: Failed to load configuration data. Exception follows.
org.apache.flume.FlumeException: Unable to create source: seq-source, type: org.custom.flume.source.MySource, class: org.custom.flume.source.MySource
at org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:48)
at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:322)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:97)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InstantiationException
at sun.reflect.InstantiationExceptionConstructorAccessorImpl.newInstance(InstantiationExceptionConstructorAccessorImpl.java:48)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at java.lang.Class.newInstance(Class.java:379)
at org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:44)
... 10 more
If anyone is aware of this issue, please help me.

You installed the plugin under /usr/lib/flume-ng, but you are trying to run it from /var/lib/flume-ng and /etc/flume-ng.
In addition, the --conf option should point to the entire configuration folder (e.g. /etc/flume-ng/conf), not to a jar file.
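For example, a corrected launch command under those assumptions (the plugin stays under /usr/lib/flume-ng, where the plugins.d directory should be picked up automatically, and --conf points at the configuration directory) might be:
flume-ng agent --conf /etc/flume-ng/conf --conf-file /etc/flume-ng/conf/flume-conf.properties --name tail1
Note also that the Caused by: java.lang.InstantiationException in the trace is what Class.newInstance() throws when a class is found but cannot be instantiated, i.e. when it is abstract, an interface, or lacks a public no-arg constructor. A minimal sketch of the shape a custom source class needs (the property name myProperty is illustrative):
package org.custom.flume.source;

import org.apache.flume.Context;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.AbstractSource;

// Must be a public, concrete class; Flume instantiates it via reflection,
// so it also needs a public no-arg constructor (the implicit one is fine,
// but it disappears if you declare only a parameterized constructor).
public class MySource extends AbstractSource implements Configurable, PollableSource {

    private String myProperty;

    @Override
    public void configure(Context context) {
        // reads tail1.sources.seq-source.myProperty from the agent config
        myProperty = context.getString("myProperty", "default");
    }

    @Override
    public Status process() throws EventDeliveryException {
        // push one event into the configured channel(s)
        getChannelProcessor().processEvent(
                EventBuilder.withBody("hello".getBytes()));
        return Status.READY;
    }

    // On Flume 1.7+, PollableSource also requires implementing
    // getBackOffSleepIncrement() and getMaxBackOffSleepInterval().
}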

Are you looking for a Flume interceptor? You can add your class as an interceptor that processes each message.
If yes, you can do it in two simple steps:
1) Add your jar file to Flume's lib directory.
2) Add configuration in flume-conf.properties referencing your Builder class name, as sketched below.
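A minimal sketch of that wiring, reusing the agent and source names from the question (the interceptor class name is hypothetical):
# attach an interceptor chain to the source
tail1.sources.seq-source.interceptors = i1
# the type must point at the nested Builder class of your interceptor
tail1.sources.seq-source.interceptors.i1.type = org.custom.flume.interceptor.MyInterceptor$Builder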

Related

jib gradle plugin + static docker client: cannot build image due to permission error: layer.tar: A required privilege is not held by the client

I want to build a Docker image with the jib Gradle plugin on Windows, use a Windows Docker client to load it into dockerd running in my WSL 2 distribution, and use WSL 2 as the server. Resource-wise I think this is the lightest solution.
On the WSL 2 side, I run the dockerd service in Ubuntu 20 on WSL 2, and it's listening on [::]:2375. TLS is disabled (--tls=false), HTTP only.
On the Windows side, I only downloaded the Docker client (the static client, from https://download.docker.com/win/static/stable/x86_64/) and added the dynamic WSL 2 IP to the insecure-registries list in daemon.json. This file is placed in the same directory as the docker.exe client.
On the IntelliJ IDEA side, I use the Gradle 5.2.1 wrapper and jib plugin 3.2.1. I configure jib as follows:
jib {
    dockerClient.executable = 'E:\\coding\\environment\\docker\\docker.exe'
    dockerClient.environment = [DOCKER_HOST: '172.21.169.180:2375',
                                DOCKER_INSECURE_REGISTRIES: "172.21.169.180:5000"]
    from.image = 'docker://mini/java@sha256:d3ded1fd0df592c33185d930d976304994bbc539c7bf70a6091cb3da0f7e11fa'
    to.image = 'spring-plugins-demo'
    container.mainClass = 'dev.westerngun.oldway.ApplicationV1'
}
I know it can connect to dockerd in my WSL 2, because before I added the dynamic IP of Ubuntu, the error was that it could not connect to the daemon. Now it can load the base image and start building.
Then, when I run jibDockerBuild --stacktrace, I see this error:
Execution failed for task ':jibDockerBuild'.
> com.google.cloud.tools.jib.plugins.common.BuildStepsExecutionException: C:\Users\WESTER~1\AppData\Local\Temp\16164656866093264693\33e3f3775358985441c3bea658f06f5307326c83f9c0bcbf8aa4acb327abffde\layer.tar: 客户端没有所需的特权。
* Try:
Run with --info or --debug option to get more log output. Run with --scan to get full insights.
* Exception is:
org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':jibDockerBuild'.
at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$2.accept(ExecuteActionsTaskExecuter.java:121)
at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$2.accept(ExecuteActionsTaskExecuter.java:117)
at org.gradle.internal.Try$Failure.ifSuccessfulOrElse(Try.java:184)
at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.execute(ExecuteActionsTaskExecuter.java:110)
at org.gradle.api.internal.tasks.execution.ResolveIncrementalChangesTaskExecuter.execute(ResolveIncrementalChangesTaskExecuter.java:84)
at org.gradle.api.internal.tasks.execution.ResolveTaskOutputCachingStateExecuter.execute(ResolveTaskOutputCachingStateExecuter.java:91)
at org.gradle.api.internal.tasks.execution.ResolveBeforeExecutionStateTaskExecuter.execute(ResolveBeforeExecutionStateTaskExecuter.java:74)
at org.gradle.api.internal.tasks.execution.ValidatingTaskExecuter.execute(ValidatingTaskExecuter.java:58)
at org.gradle.api.internal.tasks.execution.SkipEmptySourceFilesTaskExecuter.execute(SkipEmptySourceFilesTaskExecuter.java:109)
at org.gradle.api.internal.tasks.execution.ResolveBeforeExecutionOutputsTaskExecuter.execute(ResolveBeforeExecutionOutputsTaskExecuter.java:67)
at org.gradle.api.internal.tasks.execution.ResolveAfterPreviousExecutionStateTaskExecuter.execute(ResolveAfterPreviousExecutionStateTaskExecuter.java:46)
at org.gradle.api.internal.tasks.execution.CleanupStaleOutputsExecuter.execute(CleanupStaleOutputsExecuter.java:93)
at org.gradle.api.internal.tasks.execution.FinalizePropertiesTaskExecuter.execute(FinalizePropertiesTaskExecuter.java:45)
at org.gradle.api.internal.tasks.execution.ResolveTaskExecutionModeExecuter.execute(ResolveTaskExecutionModeExecuter.java:94)
at org.gradle.api.internal.tasks.execution.SkipTaskWithNoActionsExecuter.execute(SkipTaskWithNoActionsExecuter.java:57)
at org.gradle.api.internal.tasks.execution.SkipOnlyIfTaskExecuter.execute(SkipOnlyIfTaskExecuter.java:56)
at org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:36)
at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.executeTask(EventFiringTaskExecuter.java:63)
at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:49)
at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:46)
at org.gradle.internal.operations.DefaultBuildOperationExecutor$CallableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:416)
at org.gradle.internal.operations.DefaultBuildOperationExecutor$CallableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:406)
at org.gradle.internal.operations.DefaultBuildOperationExecutor$1.execute(DefaultBuildOperationExecutor.java:165)
at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:250)
at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:158)
at org.gradle.internal.operations.DefaultBuildOperationExecutor.call(DefaultBuildOperationExecutor.java:102)
at org.gradle.internal.operations.DelegatingBuildOperationExecutor.call(DelegatingBuildOperationExecutor.java:36)
at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter.execute(EventFiringTaskExecuter.java:46)
at org.gradle.execution.plan.LocalTaskNodeExecutor.execute(LocalTaskNodeExecutor.java:43)
at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:355)
at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:343)
at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:336)
at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:322)
at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker$1.execute(DefaultPlanExecutor.java:134)
at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker$1.execute(DefaultPlanExecutor.java:129)
at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.execute(DefaultPlanExecutor.java:202)
at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.executeNextNode(DefaultPlanExecutor.java:193)
at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.run(DefaultPlanExecutor.java:129)
at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
Caused by: org.gradle.internal.UncheckedException: com.google.cloud.tools.jib.plugins.common.BuildStepsExecutionException: C:\Users\WESTER~1\AppData\Local\Temp\16164656866093264693\33e3f3775358985441c3bea658f06f5307326c83f9c0bcbf8aa4acb327abffde\layer.tar: 客户端没有所需的特权。
at org.gradle.internal.UncheckedException.throwAsUncheckedException(UncheckedException.java:67)
at org.gradle.internal.UncheckedException.throwAsUncheckedException(UncheckedException.java:41)
at org.gradle.internal.reflect.JavaMethod.invoke(JavaMethod.java:106)
at org.gradle.api.internal.project.taskfactory.StandardTaskAction.doExecute(StandardTaskAction.java:48)
at org.gradle.api.internal.project.taskfactory.StandardTaskAction.execute(StandardTaskAction.java:41)
at org.gradle.api.internal.project.taskfactory.StandardTaskAction.execute(StandardTaskAction.java:28)
at org.gradle.api.internal.AbstractTask$TaskActionWrapper.execute(AbstractTask.java:705)
at org.gradle.api.internal.AbstractTask$TaskActionWrapper.execute(AbstractTask.java:672)
at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$4.run(ExecuteActionsTaskExecuter.java:338)
at org.gradle.internal.operations.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:402)
at org.gradle.internal.operations.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:394)
at org.gradle.internal.operations.DefaultBuildOperationExecutor$1.execute(DefaultBuildOperationExecutor.java:165)
at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:250)
at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:158)
at org.gradle.internal.operations.DefaultBuildOperationExecutor.run(DefaultBuildOperationExecutor.java:92)
at org.gradle.internal.operations.DelegatingBuildOperationExecutor.run(DelegatingBuildOperationExecutor.java:31)
at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.executeAction(ExecuteActionsTaskExecuter.java:327)
at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.executeActions(ExecuteActionsTaskExecuter.java:312)
at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.access$200(ExecuteActionsTaskExecuter.java:75)
at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$TaskExecution.execute(ExecuteActionsTaskExecuter.java:158)
at org.gradle.internal.execution.impl.steps.ExecuteStep.execute(ExecuteStep.java:46)
at org.gradle.internal.execution.impl.steps.CancelExecutionStep.execute(CancelExecutionStep.java:34)
at org.gradle.internal.execution.impl.steps.TimeoutStep.executeWithoutTimeout(TimeoutStep.java:69)
at org.gradle.internal.execution.impl.steps.TimeoutStep.execute(TimeoutStep.java:49)
at org.gradle.internal.execution.impl.steps.CatchExceptionStep.execute(CatchExceptionStep.java:34)
at org.gradle.internal.execution.impl.steps.CreateOutputsStep.execute(CreateOutputsStep.java:49)
at org.gradle.internal.execution.impl.steps.SnapshotOutputStep.execute(SnapshotOutputStep.java:42)
at org.gradle.internal.execution.impl.steps.SnapshotOutputStep.execute(SnapshotOutputStep.java:28)
at org.gradle.internal.execution.impl.steps.CacheStep.executeWithoutCache(CacheStep.java:133)
at org.gradle.internal.execution.impl.steps.CacheStep.lambda$execute$5(CacheStep.java:83)
at org.gradle.internal.execution.impl.steps.CacheStep.execute(CacheStep.java:82)
at org.gradle.internal.execution.impl.steps.CacheStep.execute(CacheStep.java:37)
at org.gradle.internal.execution.impl.steps.PrepareCachingStep.execute(PrepareCachingStep.java:33)
at org.gradle.internal.execution.impl.steps.StoreSnapshotsStep.execute(StoreSnapshotsStep.java:38)
at org.gradle.internal.execution.impl.steps.StoreSnapshotsStep.execute(StoreSnapshotsStep.java:23)
at org.gradle.internal.execution.impl.steps.SkipUpToDateStep.executeBecause(SkipUpToDateStep.java:95)
at org.gradle.internal.execution.impl.steps.SkipUpToDateStep.lambda$execute$0(SkipUpToDateStep.java:88)
at org.gradle.internal.execution.impl.steps.SkipUpToDateStep.execute(SkipUpToDateStep.java:52)
at org.gradle.internal.execution.impl.steps.SkipUpToDateStep.execute(SkipUpToDateStep.java:36)
at org.gradle.internal.execution.impl.DefaultWorkExecutor.execute(DefaultWorkExecutor.java:34)
at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.execute(ExecuteActionsTaskExecuter.java:109)
... 37 more
Caused by: com.google.cloud.tools.jib.plugins.common.BuildStepsExecutionException: C:\Users\WESTER~1\AppData\Local\Temp\16164656866093264693\33e3f3775358985441c3bea658f06f5307326c83f9c0bcbf8aa4acb327abffde\layer.tar: 客户端没有所需的特权。
at com.google.cloud.tools.jib.plugins.common.JibBuildRunner.runBuild(JibBuildRunner.java:285)
at com.google.cloud.tools.jib.gradle.BuildDockerTask.buildDocker(BuildDockerTask.java:126)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at org.gradle.internal.reflect.JavaMethod.invoke(JavaMethod.java:103)
... 75 more
Caused by: java.nio.file.FileSystemException: C:\Users\WESTER~1\AppData\Local\Temp\16164656866093264693\33e3f3775358985441c3bea658f06f5307326c83f9c0bcbf8aa4acb327abffde\layer.tar: 客户端没有所需的特权。
at com.google.cloud.tools.jib.tar.TarExtractor.extract(TarExtractor.java:93)
at com.google.cloud.tools.jib.tar.TarExtractor.extract(TarExtractor.java:49)
at com.google.cloud.tools.jib.builder.steps.LocalBaseImageSteps.cacheDockerImageTar(LocalBaseImageSteps.java:217)
at com.google.cloud.tools.jib.builder.steps.LocalBaseImageSteps.lambda$retrieveDockerDaemonLayersStep$0(LocalBaseImageSteps.java:133)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
The error message in Chinese is
C:\Users\WESTER~1\AppData\Local\Temp\16164656866093264693\33e3f3775358985441c3bea658f06f5307326c83f9c0bcbf8aa4acb327abffde\layer.tar: 客户端没有所需的特权。
which translates to "A required privilege is not held by the client".
I suspect this is because my user is not added to the docker-users group, as stated here. But I uninstalled Docker Toolbox, as it sets DOCKER_HOST and interferes with my setup, and I don't see this group anymore. Secondly, I don't have Local Users and Groups available; it seems Windows 10 Home edition does not have it.
Should I try to install gpedit on my Home edition, add the group, and try again? But without Docker Toolbox, I doubt it would work. The Docker documentation explains here that it creates the group and configures it to ensure separation of permissions between root/admin and non-root/non-admin users; I think creating that group alone will not work. https://docs.docker.com/desktop/windows/permission-requirements/
But when I use docker.exe to connect to WSL 2 and save a tar file to C:\Users\WESTER~1\AppData\Local\Temp, it works. The tar file is created and is not corrupted. So I don't think it's a permission error; anyone can access that directory.
The tar.exe bundled with Windows (bsdtar) has nothing to do with it; after renaming the tar.exe in System32 and building again, the error is the same.
It is solved when I run cmd as admin, cd to the project directory, and run gradlew jibDockerBuild. The image is built and loaded into the WSL 2 daemon successfully. So it is indeed a file system permission error.
It is still very strange (I had granted permission on that folder to everyone), but at least this is one workaround.
Another, even better workaround:
As per https://github.com/microsoft/WSL/issues/4983, I changed the jib config to set the Docker host to http://[::1]:2375, and suddenly it works. It seems dockerd was bound only on IPv6.
Now not only is the host reachable, the permission error disappears as well; no insecure-registries setting is needed either. A sketch of the adjusted config follows.
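For reference, a sketch of the adjusted jib block under that assumption (everything as in the question except DOCKER_HOST, with the now-unneeded insecure-registries entry dropped):
jib {
    dockerClient.executable = 'E:\\coding\\environment\\docker\\docker.exe'
    // WSL 2 localhost forwarding: dockerd turned out to be reachable only
    // via the IPv6 loopback, per microsoft/WSL#4983
    dockerClient.environment = [DOCKER_HOST: 'http://[::1]:2375']
    from.image = 'docker://mini/java@sha256:d3ded1fd0df592c33185d930d976304994bbc539c7bf70a6091cb3da0f7e11fa'
    to.image = 'spring-plugins-demo'
    container.mainClass = 'dev.westerngun.oldway.ApplicationV1'
}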

PredictionIO text classification quick start failing when reading the data

I'm following this quick start, after starting this ready-to-use PredictionIO Amazon EC2 instance, and after running these commands it fails at pio train:
pio app new MyTextApp
pio import --appid 1 --input data/stopwords.json
pio import --appid 1 --input data/emails.json
pio build
pio train
...
Data set is empty, make sure event fields match imported data.
Exception in thread "main" java.lang.IllegalStateException: Haven't seen any document yet.
at org.apache.spark.mllib.feature.IDF$DocumentFrequencyAggregator.idf(IDF.scala:132)
at org.apache.spark.mllib.feature.IDF.fit(IDF.scala:56)
at uk.co.news.PreparedData.<init>(Preparator.scala:70)
at uk.co.news.Preparator.prepare(Preparator.scala:47)
at uk.co.news.Preparator.prepare(Preparator.scala:43)
Since there is no error when running the command to import the emails, I don't understand why the data set is still empty. I double-checked the emails.json file and the data is indeed there, and this is the result of running
pio import --appid 1 --input data/emails.json
ubuntu@ip-172-31-0-60:~/pio-textclassification$ pio import --appid 1 --input data/emails.json
[INFO] [Runner$] Submission command: /opt/spark-1.4.1-bin-hadoop2.6/bin/spark-submit --class io.prediction.tools.imprt.FileToEvents --files file:/opt/PredictionIO/conf/log4j.properties --driver-class-path /opt/PredictionIO/conf file:/opt/PredictionIO/lib/pio-assembly-0.9.4.jar --appid 1 --input file:/home/ubuntu/pio-textclassification/data/emails.json --env PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/home/ubuntu/.pio_store,PIO_HOME=/opt/PredictionIO,PIO_FS_ENGINESDIR=/home/ubuntu/.pio_store/engines,PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio,PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc,PIO_FS_TMPDIR=/home/ubuntu/.pio_store/tmp,PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL,PIO_CONF_DIR=/opt/PredictionIO/conf
[INFO] [Remoting] Starting remoting
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://sparkDriver@172.31.0.60:49257]
[INFO] [FileToEvents$] Events are imported.
[INFO] [FileToEvents$] Done.
EDIT:
pio build --verbose
showed an exception that was being swallowed. The problem is with the database connection, but it's still not clear what is wrong, since parts of the exception are replaced with "...":
[DEBUG] [ConnectionPool$] Registered connection pool : ConnectionPool(url:jdbc:postgresql://localhost/pio, user:pio) using factory : <default>
[DEBUG] [ConnectionPool$] Registered singleton connection pool : ConnectionPool(url:jdbc:postgresql://localhost/pio, user:pio)
[DEBUG] [StatementExecutor$$anon$1] SQL execution completed
[SQL Execution]
create table if not exists pio_meta_enginemanifests ( id varchar(100) not null primary key, version text not null, engineName text not null, description text, files text not null, engineFactory text not null); (10 ms)
[Stack Trace]
...
io.prediction.data.storage.jdbc.JDBCEngineManifests$$anonfun$1.apply(JDBCEngineManifests.scala:37)
io.prediction.data.storage.jdbc.JDBCEngineManifests$$anonfun$1.apply(JDBCEngineManifests.scala:29)
scalikejdbc.DBConnection$class.autoCommit(DBConnection.scala:222)
scalikejdbc.DB.autoCommit(DB.scala:60)
scalikejdbc.DB$$anonfun$autoCommit$1.apply(DB.scala:215)
scalikejdbc.DB$$anonfun$autoCommit$1.apply(DB.scala:214)
scalikejdbc.LoanPattern$class.using(LoanPattern.scala:18)
scalikejdbc.DB$.using(DB.scala:138)
scalikejdbc.DB$.autoCommit(DB.scala:214)
io.prediction.data.storage.jdbc.JDBCEngineManifests.<init>(JDBCEngineManifests.scala:29)
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstance(Constructor.java:526)
io.prediction.data.storage.Storage$.getDataObject(Storage.scala:293)
...
[INFO] [RegisterEngine$] Registering engine JmhjlGoEjJuKXhXpY70MbEkuGHMuOZzL 8ccd38126d56ed48adaa9f85547131467f7629f7
[DEBUG] [StatementExecutor$$anon$1] SQL execution completed
[SQL Execution]
update pio_meta_enginemanifests set engineName = 'pio-textclassification', description = 'pio-autogen-manifest', files = 'file:/home/ubuntu/pio-textclassification/target/scala-2.10/uk.co.news-assembly-0.1-SNAPSHOT-deps.jar... (192)', engineFactory = '' where id = 'JmhjlGoEjJuKXhXpY70MbEkuGHMuOZzL' and version = '8ccd38126d56ed48adaa9f85547131467f7629f7'; (3 ms)
[Stack Trace]
...
io.prediction.data.storage.jdbc.JDBCEngineManifests$$anonfun$7.apply(JDBCEngineManifests.scala:85)
io.prediction.data.storage.jdbc.JDBCEngineManifests$$anonfun$7.apply(JDBCEngineManifests.scala:78)
scalikejdbc.DBConnection$$anonfun$3.apply(DBConnection.scala:297)
scalikejdbc.DBConnection$class.scalikejdbc$DBConnection$$rollbackIfThrowable(DBConnection.scala:274)
scalikejdbc.DBConnection$class.localTx(DBConnection.scala:295)
scalikejdbc.DB.localTx(DB.scala:60)
scalikejdbc.DB$.localTx(DB.scala:257)
io.prediction.data.storage.jdbc.JDBCEngineManifests.update(JDBCEngineManifests.scala:78)
io.prediction.tools.RegisterEngine$.registerEngine(RegisterEngine.scala:50)
io.prediction.tools.console.Console$.build(Console.scala:813)
io.prediction.tools.console.Console$$anonfun$main$1.apply(Console.scala:698)
io.prediction.tools.console.Console$$anonfun$main$1.apply(Console.scala:684)
scala.Option.map(Option.scala:145)
io.prediction.tools.console.Console$.main(Console.scala:684)
io.prediction.tools.console.Console.main(Console.scala)
...
[DEBUG] [StatementExecutor$$anon$1] SQL execution completed
[SQL Execution]
INSERT INTO pio_meta_enginemanifests VALUES( 'JmhjlGoEjJuKXhXpY70MbEkuGHMuOZzL', '8ccd38126d56ed48adaa9f85547131467f7629f7', 'pio-textclassification', 'pio-autogen-manifest', 'file:/home/ubuntu/pio-textclassification/target/scala-2.10/uk.co.news-assembly-0.1-SNAPSHOT-deps.jar... (192)', ''); (1 ms)
[Stack Trace]
...
io.prediction.data.storage.jdbc.JDBCEngineManifests$$anonfun$2.apply(JDBCEngineManifests.scala:48)
io.prediction.data.storage.jdbc.JDBCEngineManifests$$anonfun$2.apply(JDBCEngineManifests.scala:40)
scalikejdbc.DBConnection$$anonfun$3.apply(DBConnection.scala:297)
scalikejdbc.DBConnection$class.scalikejdbc$DBConnection$$rollbackIfThrowable(DBConnection.scala:274)
scalikejdbc.DBConnection$class.localTx(DBConnection.scala:295)
scalikejdbc.DB.localTx(DB.scala:60)
scalikejdbc.DB$.localTx(DB.scala:257)
io.prediction.data.storage.jdbc.JDBCEngineManifests.insert(JDBCEngineManifests.scala:40)
io.prediction.data.storage.jdbc.JDBCEngineManifests.update(JDBCEngineManifests.scala:89)
io.prediction.tools.RegisterEngine$.registerEngine(RegisterEngine.scala:50)
io.prediction.tools.console.Console$.build(Console.scala:813)
io.prediction.tools.console.Console$$anonfun$main$1.apply(Console.scala:698)
io.prediction.tools.console.Console$$anonfun$main$1.apply(Console.scala:684)
scala.Option.map(Option.scala:145)
io.prediction.tools.console.Console$.main(Console.scala:684)
...
[INFO] [Console$] Your engine is ready for training.
A few things to check:
1) Does "pio app list" show that MyTextApp has appId 1?
2) Download https://github.com/yipjustin/pio-event-distribution-checker and change engine.json so that appId reads 1, then "pio build" and "pio train" to see if the data is actually imported; a hypothetical engine.json fragment is sketched after this answer.
P.S. There is a Google group (https://groups.google.com/forum/#!forum/predictionio-user) where your question will be answered more quickly by the community of PredictionIO users.
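For the second check, a hypothetical engine.json fragment setting the app id (the exact keys vary between engine template versions, so treat this as a sketch):
{
  "datasource": {
    "params": {
      "appId": 1
    }
  }
}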
The solution was to change DataSource.scala to match the schema in the emails.json file before running pio build.
This is the only method I had to change in the file:
private def readEventData(sc: SparkContext): RDD[Observation] = {
  // Get RDD of Events.
  PEventStore.find(
    appName = dsp.appName,
    entityType = Some("content"),
    eventNames = Some(List("e-mail"))
  )(sc).map(e => {
    // Convert the collected RDD of events to an RDD of Observation objects.
    val label: String = e.properties.get[String]("label")
    Observation(
      if (label == "spam") 1.0 else 0.0,
      e.properties.get[String]("text"),
      label
    )
  }).cache
}
I had to change the previous values to "content", "e-mail", and "spam" to match the events in emails.json.
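For context, an event line in emails.json consistent with that reader would look roughly like this (the entityId and property values are illustrative; the PredictionIO event fields are event, entityType, entityId, and properties):
{"event": "e-mail", "entityType": "content", "entityId": "42", "properties": {"label": "spam", "text": "You have won a prize..."}}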

OAuth2 Cygnus and Cosmos sink doesn't seem to work

Since the last update, I haven't been able to upload my data to Cosmos using Cygnus. I am aware that we now need to use an OAuth2 token to do it, so I made the request for the token:
curl -k -X POST "https://cosmos.lab.fiware.org:13000/cosmos-auth/v1/token" -H "Content-Type: application/x-www-form-urlencoded" -d "grant_type=password&username=guillaume.jourdain@4planet.eu&password=XXXXX"
I get a token, but then I try to check the token:
curl -X GET "http://cosmos.lab.fiware.org:14000/webhdfs/v1/guillaume.jourdain/hostabee?op=liststatus&user.name=guillaume.jourdain@4planet.eu" -H "X-Auth-Token: TheToken"
and even this:
curl -X GET "http://cosmos.lab.fiware.org:14000/webhdfs/v1/guillaume.jourdain/hostabee?op=liststatus&user.name=guillaume.jourdain" -H "X-Auth-Token: TheToken"
Every time, for each of these commands and for every token I tried, I get this:
User token not authorized
Next I tried to put the OAuth parameter in my Cygnus conf file, and this occurred every time:
2015-07-17 16:17:17,797 (lifecycleSupervisor-1-1) [INFO - es.tid.fiware.orionconnectors.cosmosinjector.hdfs.HttpFSBackend.createDir(HttpFSBackend.java:71)] HttpFS response: HTTP/1.1 401 Unauthorized
2015-07-17 16:17:17,798 (lifecycleSupervisor-1-1) [ERROR - es.tid.fiware.orionconnectors.cosmosinjector.OrionHDFSSink.start(OrionHDFSSink.java:108)] The directory could not be created in HDFS. HttpFS response: 401 Unauthorized
So for the moment I'm kind of stuck. Do you have any information to help me resolve this problem?
EDIT:
Here's my Cygnus configuration file; maybe the problem is located here:
APACHE_FLUME_HOME/conf/cygnus.conf
orionagent.sources = http-source
orionagent.sinks = hdfs-sink
orionagent.channels = notifications
# Flume source, must not be changed
orionagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
# channel name where to write the notification events
orionagent.sources.http-source.channels = notifications
# listening port the Flume source will use for receiving incoming notifications
orionagent.sources.http-source.port = 5050
# Flume handler that will parse the notifications, must not be changed
orionagent.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.OrionRestHandler
# regular expression for the orion version the notifications will have in their headers
orionagent.sources.http-source.handler.orion_version = 0\.23\.*
# URL target
orionagent.sources.http-source.handler.notification_target = /notify
# channel name from where to read notification events
orionagent.sinks.hdfs-sink.channel = notifications
# Flume sink that will process and persist in HDFS the notification events, must not be changed
orionagent.sinks.hdfs-sink.type = com.telefonica.iot.cygnus.sinks.OrionHDFSSink
# IP address of the Cosmos deployment where the notification events will be persisted
orionagent.sinks.hdfs-sink.cosmos_host = 130.206.80.46
# port of the Cosmos service listening for persistence operations; 14000 for httpfs, 50070 for webhdfs and free choice for inifinty
orionagent.sinks.hdfs-sink.cosmos_port = 14000
# username allowed to write in HDFS (/user/myusername)
orionagent.sinks.hdfs-sink.cosmos_username = guillaume.jourdain
orionagent.sinks.hdfs-sink.cosmos_password = XXXXX
# dataset where to persist the data (/user/myusername/mydataset)
orionagent.sinks.hdfs-sink.cosmos_dataset = hostABee
orionagent.sinks.hdfs-sink.attr_persistence = column
orionagent.sinks.hdfs-sink.hive_host = 130.206.80.46
orionagent.sinks.hdfs-sink.hive_port = 10000
orionagent.sinks.hdfs-sink.oauth2_token = TheTOKEN
# HDFS backend type (webhdfs, httpfs or infinity)
orionagent.sinks.hdfs-sink.hdfs_api = webhdfs
# channel name
orionagent.channels.notifications.type = memory
# capacity of the channel
orionagent.channels.notifications.capacity = 1000
# amount of bytes that can be sent per transaction
orionagent.channels.notifications.transactionCapacity = 100
Now I get this error (and others); the sink and handler classes don't seem to be found:
2015-07-27 14:27:10,562 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:40)] Creating instance of sink: hdfs-sink, type: com.telefonica.iot.cygnus.sinks.OrionHDFSSink
2015-07-27 14:27:10,562 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:142)] Failed to load configuration data. Exception follows.
org.apache.flume.FlumeException: Unable to load sink type: com.telefonica.iot.cygnus.sinks.OrionHDFSSink, class: com.telefonica.iot.cygnus.sinks.OrionHDFSSink
at org.apache.flume.sink.DefaultSinkFactory.getClass(DefaultSinkFactory.java:69)
at org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:41)
at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:415)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)
Caused by: java.lang.ClassNotFoundException: com.telefonica.iot.cygnus.sinks.OrionHDFSSink
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.flume.sink.DefaultSinkFactory.getClass(DefaultSinkFactory.java:67)
... 12 more
Thank you for reading.
Regarding the WebHDFS command for listing an HDFS folder:
curl -X GET "http://cosmos.lab.fiware.org:14000/webhdfs/v1/guillaume.jourdain/hostabee?op=liststatus&user.name=guillaume.jourdain@4planet.eu" -H "X-Auth-Token: TheToken"
The user.name should be user.name=guillaume.jourdain (without the @4planet.eu part).
Regarding Cygnus, have you upgraded to 0.8.2? It is the only Cygnus version supporting OAuth2. I guess you did not upgrade, because of the es.tid.fiware.orionconnectors.cosmosinjector.OrionHDFSSink logs (those package names predate 0.8.0). You have all the details for upgrading here.

Flume-ng agent not starting and giving NullPointerException

I am writing a custom source, sink, and channel, and my config file is as follows:
agent.sources = source
agent.sinks = sink
agent.channels = channel
agent.sources.source.type = com.flume.FlumeSource
agent.sources.source.channels = channel
agent.channels.channel.type = com.flume.FlumeChannel$Builder
agent.channels.channel.type = file
agent.sinks.sink.type = com.flume.FlumeSink
agent.sinks.sink.hdfs.path = <hdfs path>
agent.sources.source.channels = channel
agent.sinks.sink.channel = channel
I am trying to start the agent, adding the jar to the Flume classpath, using the command:
bin/flume-ng agent --conf-file flume.config --classpath /usr/lib/flume-ng/agent.jar --name nab-agent -Dflume.root.logger=DEBUG,console
It says the HDFS path is not present, whereas it does exist. It gives a NullPointerException saying the hostname does not exist, and the main class is not found in org.apache.flume.node.Application.

Flume NullPointerExceptions on checkpoint

I've set up a file-to-file source/sink, just as a test of basic Flume functionality.
I'm currently using the "exec" source, with the command being "tail -F mytmpfile".
In my script, I continuously echo "....." >> mytmpfile, so that the tail command constitutes a stream.
However, I've started seeing the following exception in the flume logs:
java.lang.IllegalStateException: Channel closed [channel=c1]. Due to
java.lang.NullPointerException: null
at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:353)
at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:122)
at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:183)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.NullPointerException
at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:895)
at org.apache.flume.channel.file.Log.replay(Log.java:406)
at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:303)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
... 1 more
Any thoughts on where this NullPointerException is coming from? From scanning the code it appears that it may be related to a missing folder or directory, but I can't find the exact line on the GitHub branches.
This is using apache-flume-1.3.1.23-...
In the past I've had problems with file channels, and they've normally boiled down to two problems:
1) If you're running multiple agents on the same box, make sure you configure them with separate dataDirs and checkpointDir (see the sketch below).
2) On Linux boxes, check that your tmpfs isn't near its capacity. If it's getting full, Flume will complain. Try stopping the Flume agent, unmounting tmpfs, enlarging it, then remounting and restarting the agent.
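A minimal sketch of point 1, assuming an agent named a1 with a file channel c1 (the paths are illustrative and must be distinct for every agent on the box):
a1.channels.c1.type = file
# each agent needs its own checkpoint and data directories
a1.channels.c1.checkpointDir = /var/lib/flume-ng/a1/checkpoint
a1.channels.c1.dataDirs = /var/lib/flume-ng/a1/data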
