PredictionIO text classification quick start failing when reading the data

I'm following this quick start on a ready-to-use PredictionIO Amazon EC2 instance, and after running these commands it fails at pio train:
pio app new MyTextApp
pio import --appid 1 --input data/stopwords.json
pio import --appid 1 --input data/emails.json
pio build
pio train
...
Data set is empty, make sure event fields match imported data.
Exception in thread "main" java.lang.IllegalStateException: Haven't seen any document yet.
at org.apache.spark.mllib.feature.IDF$DocumentFrequencyAggregator.idf(IDF.scala:132)
at org.apache.spark.mllib.feature.IDF.fit(IDF.scala:56)
at uk.co.news.PreparedData.<init>(Preparator.scala:70)
at uk.co.news.Preparator.prepare(Preparator.scala:47)
at uk.co.news.Preparator.prepare(Preparator.scala:43)
Since there is no error when running the command to import the emails, I don't understand why the data set is still empty. I double-checked the emails.json file and the data is indeed there. This is the result when running:
pio import --appid 1 --input data/emails.json
ubuntu@ip-172-31-0-60:~/pio-textclassification$ pio import --appid 1 --input data/emails.json
[INFO] [Runner$] Submission command: /opt/spark-1.4.1-bin-hadoop2.6/bin/spark-submit --class io.prediction.tools.imprt.FileToEvents --files file:/opt/PredictionIO/conf/log4j.properties --driver-class-path /opt/PredictionIO/conf file:/opt/PredictionIO/lib/pio-assembly-0.9.4.jar --appid 1 --input file:/home/ubuntu/pio-textclassification/data/emails.json --env PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/home/ubuntu/.pio_store,PIO_HOME=/opt/PredictionIO,PIO_FS_ENGINESDIR=/home/ubuntu/.pio_store/engines,PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio,PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc,PIO_FS_TMPDIR=/home/ubuntu/.pio_store/tmp,PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL,PIO_CONF_DIR=/opt/PredictionIO/conf
[INFO] [Remoting] Starting remoting
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://sparkDriver@172.31.0.60:49257]
[INFO] [FileToEvents$] Events are imported.
[INFO] [FileToEvents$] Done.
EDIT:
pio build --verbose
showed an exception that was being swallowed. The problem is with the database connection, but it's still not clear what is wrong since parts of the exception are being replaced with "..."
[DEBUG] [ConnectionPool$] Registered connection pool : ConnectionPool(url:jdbc:postgresql://localhost/pio, user:pio) using factory : <default>
[DEBUG] [ConnectionPool$] Registered singleton connection pool : ConnectionPool(url:jdbc:postgresql://localhost/pio, user:pio)
[DEBUG] [StatementExecutor$$anon$1] SQL execution completed
[SQL Execution]
create table if not exists pio_meta_enginemanifests ( id varchar(100) not null primary key, version text not null, engineName text not null, description text, files text not null, engineFactory text not null); (10 ms)
[Stack Trace]
...
io.prediction.data.storage.jdbc.JDBCEngineManifests$$anonfun$1.apply(JDBCEngineManifests.scala:37)
io.prediction.data.storage.jdbc.JDBCEngineManifests$$anonfun$1.apply(JDBCEngineManifests.scala:29)
scalikejdbc.DBConnection$class.autoCommit(DBConnection.scala:222)
scalikejdbc.DB.autoCommit(DB.scala:60)
scalikejdbc.DB$$anonfun$autoCommit$1.apply(DB.scala:215)
scalikejdbc.DB$$anonfun$autoCommit$1.apply(DB.scala:214)
scalikejdbc.LoanPattern$class.using(LoanPattern.scala:18)
scalikejdbc.DB$.using(DB.scala:138)
scalikejdbc.DB$.autoCommit(DB.scala:214)
io.prediction.data.storage.jdbc.JDBCEngineManifests.<init>(JDBCEngineManifests.scala:29)
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstance(Constructor.java:526)
io.prediction.data.storage.Storage$.getDataObject(Storage.scala:293)
...
[INFO] [RegisterEngine$] Registering engine JmhjlGoEjJuKXhXpY70MbEkuGHMuOZzL 8ccd38126d56ed48adaa9f85547131467f7629f7
[DEBUG] [StatementExecutor$$anon$1] SQL execution completed
[SQL Execution]
update pio_meta_enginemanifests set engineName = 'pio-textclassification', description = 'pio-autogen-manifest', files = 'file:/home/ubuntu/pio-textclassification/target/scala-2.10/uk.co.news-assembly-0.1-SNAPSHOT-deps.jar... (192)', engineFactory = '' where id = 'JmhjlGoEjJuKXhXpY70MbEkuGHMuOZzL' and version = '8ccd38126d56ed48adaa9f85547131467f7629f7'; (3 ms)
[Stack Trace]
...
io.prediction.data.storage.jdbc.JDBCEngineManifests$$anonfun$7.apply(JDBCEngineManifests.scala:85)
io.prediction.data.storage.jdbc.JDBCEngineManifests$$anonfun$7.apply(JDBCEngineManifests.scala:78)
scalikejdbc.DBConnection$$anonfun$3.apply(DBConnection.scala:297)
scalikejdbc.DBConnection$class.scalikejdbc$DBConnection$$rollbackIfThrowable(DBConnection.scala:274)
scalikejdbc.DBConnection$class.localTx(DBConnection.scala:295)
scalikejdbc.DB.localTx(DB.scala:60)
scalikejdbc.DB$.localTx(DB.scala:257)
io.prediction.data.storage.jdbc.JDBCEngineManifests.update(JDBCEngineManifests.scala:78)
io.prediction.tools.RegisterEngine$.registerEngine(RegisterEngine.scala:50)
io.prediction.tools.console.Console$.build(Console.scala:813)
io.prediction.tools.console.Console$$anonfun$main$1.apply(Console.scala:698)
io.prediction.tools.console.Console$$anonfun$main$1.apply(Console.scala:684)
scala.Option.map(Option.scala:145)
io.prediction.tools.console.Console$.main(Console.scala:684)
io.prediction.tools.console.Console.main(Console.scala)
...
[DEBUG] [StatementExecutor$$anon$1] SQL execution completed
[SQL Execution]
INSERT INTO pio_meta_enginemanifests VALUES( 'JmhjlGoEjJuKXhXpY70MbEkuGHMuOZzL', '8ccd38126d56ed48adaa9f85547131467f7629f7', 'pio-textclassification', 'pio-autogen-manifest', 'file:/home/ubuntu/pio-textclassification/target/scala-2.10/uk.co.news-assembly-0.1-SNAPSHOT-deps.jar... (192)', ''); (1 ms)
[Stack Trace]
...
io.prediction.data.storage.jdbc.JDBCEngineManifests$$anonfun$2.apply(JDBCEngineManifests.scala:48)
io.prediction.data.storage.jdbc.JDBCEngineManifests$$anonfun$2.apply(JDBCEngineManifests.scala:40)
scalikejdbc.DBConnection$$anonfun$3.apply(DBConnection.scala:297)
scalikejdbc.DBConnection$class.scalikejdbc$DBConnection$$rollbackIfThrowable(DBConnection.scala:274)
scalikejdbc.DBConnection$class.localTx(DBConnection.scala:295)
scalikejdbc.DB.localTx(DB.scala:60)
scalikejdbc.DB$.localTx(DB.scala:257)
io.prediction.data.storage.jdbc.JDBCEngineManifests.insert(JDBCEngineManifests.scala:40)
io.prediction.data.storage.jdbc.JDBCEngineManifests.update(JDBCEngineManifests.scala:89)
io.prediction.tools.RegisterEngine$.registerEngine(RegisterEngine.scala:50)
io.prediction.tools.console.Console$.build(Console.scala:813)
io.prediction.tools.console.Console$$anonfun$main$1.apply(Console.scala:698)
io.prediction.tools.console.Console$$anonfun$main$1.apply(Console.scala:684)
scala.Option.map(Option.scala:145)
io.prediction.tools.console.Console$.main(Console.scala:684)
...
[INFO] [Console$] Your engine is ready for training.

A few things to check:
Does "pio app list" show MyTextApp has appId 1?
Download https://github.com/yipjustin/pio-event-distribution-checker and change engine.json so that appId reads 1, then "pio build" and "pio train" to see if the data is actually imported.
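If you want a quick sanity check from the shell first, something like the following shows whether any events are actually stored for the app. This is only a sketch: it assumes the event server is running locally on the default port 7070, and ACCESS_KEY stands for the access key printed by pio app list.
# 1. Check the appId and access key that MyTextApp was created with
pio app list
# 2. Query a few stored events back from the event server (replace ACCESS_KEY)
curl -s "http://localhost:7070/events.json?accessKey=ACCESS_KEY&limit=5"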
P.S. There is a Google group (https://groups.google.com/forum/#!forum/predictionio-user) where your question will be answered more quickly by the community of PredictionIO users.

The solution was to change the DataSource.scala to match the schema in the emails.json file before running pio build.
This is the only method I had to change in the file:
private def readEventData(sc: SparkContext): RDD[Observation] = {
  // Get RDD of Events.
  PEventStore.find(
    appName = dsp.appName,
    entityType = Some("content"),
    eventNames = Some(List("e-mail"))
    // Convert the collected RDD of events to an RDD of Observation
    // objects.
  )(sc).map(e => {
    val label: String = e.properties.get[String]("label")
    Observation(
      if (label == "spam") 1.0 else 0.0,
      e.properties.get[String]("text"),
      label
    )
  }).cache
}
I had to change the previous values to "content", "e-mail" and "spam".
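For reference, here is roughly what a single line of emails.json has to look like for this readEventData to find anything. It is only a sketch: the entityId, text and eventTime are invented, but the event name, entityType and the label/text properties are exactly the ones the code above reads.
# Illustrative event line in PredictionIO's batch-import format (one JSON object per line)
cat > /tmp/sample-email-event.json <<'EOF'
{"event": "e-mail", "entityType": "content", "entityId": "1", "properties": {"label": "spam", "text": "You have won a prize..."}, "eventTime": "2015-10-13T00:00:00.000Z"}
EOF
pio import --appid 1 --input /tmp/sample-email-event.json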

Related

Liquibase checksum validation failed when running via Jenkins but not from terminal

I am running the below command from a Linux (CentOS) terminal:
mvn --settings /home/centos/.m2/jenkins/liquibase-settings.xml -e resources:resources -Pdev -Dliquibase.promptOnNonLocalDatabase=false -Dliquibase.defaultSchemaName=MYDEV_SCHEMA liquibase:updateSQL liquibase:update -Dsettings.security=/home/centos/.m2/jenkins/liquibase-security-settings.xml -Dfile.encoding=UTF-8
Everything went fine.
When I run the same thing via Jenkins, I get the following:
[ERROR] Failed to execute goal org.liquibase:liquibase-maven-plugin:4.2.0:updateSQL (default-cli) on project project-db:
[ERROR] Error setting up or running Liquibase:
[ERROR] Validation Failed:
[ERROR] 16 change sets check sum
[ERROR] db/changelog/ABCD.xml::1234-23::User1 was: 8:67913d9505606eeaaa4998fd594a8ccf but is now: 8:9d985650b579319df50f30732d66909c
[ERROR] db/changelog/ABCD.xml::1234-78::User1 was: 8:3b3babd5d0712f846402af13ede528f7 but is now: 8:0214bf10acfd160fc6f7d709edab2f2e
[ERROR] db/changelog/ABCD.xml::1234-142::User1 was: 8:5e3c8fc77fc87f0e9740c0bff717f579 but is now: 8:53094dd8c32ec71b8d76fdd71009c548
[ERROR] db/changelog/ABCD.xml::1234-200::User1 was: 8:c40ec5c77f7b10961ee550edd756f51f but is now: 8:9bef09eb0681f7ea7bf827b6ac136433
[ERROR] db/changelog/ABCD.xml::1234-923::User1 was: 8:747cbcbda155679dd2fc1bfcc40991c4 but is now: 8:68c8046c220b8d2eb46ed3ac07ebc2a2
[ERROR] db/changelog/ABCD.xml::1234-952::User1 was: 8:ecaad2afacf6c61f18e08cb3e235292a but is now: 8:0f7f9087de5cc2e62a96a86988d07a9d
[ERROR] db/changelog/ABCD.xml::1234-955::User1 was: 8:3ddd6fd25fb4a68accf50190b3ab6738 but is now: 8:8ebed2810bad45ace402f99a957a2c5a
[ERROR] db/changelog/ABCD.xml::1234-957::User1 was: 8:cc6144775a784d10bc4523dccae02c2e but is now: 8:f0fb84fb3a677e760b5bbad3149e8a17
[ERROR] db/changelog/ABCD.xml::1234-958::User1 was: 8:b0c71a212949df4863ce622e61315cee but is now: 8:9c6ea7b8f8cb3f6e65871085527fa4c5
[ERROR] db/changelog/ABCD.xml::1234-960::User1 was: 8:b0966c55100b0a2daae7dd34b7d1849f but is now: 8:5db8b313d34612e1a0035caa73bfae2d
[ERROR] db/changelog/ABCD.xml::1234-961::User1 was: 8:3e3b96c656362b5bed959428772efbdf but is now: 8:622c3530660fa51cfb806cc454736a8e
[ERROR] db/changelog/ABCD.xml::1234-964::User1 was: 8:50e079098e7d2be9e1299d68717af265 but is now: 8:13ab1763f5f21e80dc5f7aa714916f01
[ERROR] db/changelog/ABCD.xml::1234-971::User1 was: 8:fe000258281e834309f9454077e4935d but is now: 8:b238dad4489c9683a3e362820a0ba715
[ERROR] db/changelog/ABCD.xml::1234-974::User1 was: 8:578a1f3510ac700373b40d83ffbfcdde but is now: 8:3eeb6e61dec24eac4148a6c66033e125
[ERROR] db/changelog/ABCD.xml::100000011::User1 was: 8:5d0882f8413b6d1063ab023e7c4ec917 but is now: 8:e019e12a40add4536a128ba7b9b06f69
[ERROR] db/changerequest/ABCD/ABCD.1_Base.xml::ABCD1-100000211::User1 was: 8:9ea4f4f4b5a2db0d1c7e439887e9129c but is now: 8:ebe390648144994233ecd6101e04380c
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.liquibase:liquibase-maven-plugin:4.2.0:updateSQL (default-cli) on project project-db:
My Jenkins code,
dir("${liquibase_working_dir}"){
configFileProvider([
configFile(fileId: 'liquibase-settings.xml', variable: 'LIQUIBASE_SETTINGS'),
configFile(fileId: 'liquibase-security-settings.xml', variable: 'LIQUIBASE_SECURITY_SETTINGS'),
]) {
withMaven(maven:'maven', mavenSettingsFilePath: "${LIQUIBASE_SETTINGS}") {
sh "mvn -e resources:resources liquibase:updateSQL liquibase:update -P${env_lowercase} \"-Dsettings.security=${LIQUIBASE_SECURITY_SETTINGS}\" -Dliquibase.promptOnNonLocalDatabase=false -Dliquibase.defaultSchemaName=${schema} -Dfile.encoding=UTF-8"
}
}
sh "cp target/liquibase/migrate.sql target/liquibase/${env_lowercase}-${currentBuild.number}-${schema}-updates.sql"
}
I missed an important point: there are no commits to the Liquibase changelog repository.
When Liquibase reaches a changeset, it computes a checksum for it and stores it in the DATABASECHANGELOG table. The value of storing the checksum for Liquibase is to know if the changeset has been modified since it was run.
If the changeset has been changed since it was run, Liquibase will exit the migration with an error message like Validation failed: change set check sums <changeset identifier> was: <old checksum> but is now: <new checksum>. This is because Liquibase cannot identify what was changed, and the database may be in a state different from what the changelog is expecting.
To resolve this error for a valid change made to a changeset, there are the following options:
1. clearCheckSums: clears all checksums and nullifies the MD5SUM column of the DATABASECHANGELOG table so they will be re-computed on the next database update (a command sketch follows this list). Changesets that have already been deployed will have their checksums re-computed, and pending changesets will be deployed. For more details about this approach, please visit this link.
2. runOnChange attribute: executes the change the first time it is seen and each time the changeset is modified. For more details about this approach, please visit this link.
3. runAlways attribute: executes the changeset on every run, even if it has been run before. To use this, set the attribute runAlways="true" in your changeset. Example below:
<changeSet id="liquibase-0" author="liquibase" runAlways="true">
<sqlFile relativeToChangelogFile="true" path="db/file.sql"/>
</changeSet>
4. The <validCheckSum> element: add a <validCheckSum> element to the changeset. Its text content should contain the old checksum from the error message.
5. Manual update of the DATABASECHANGELOG table: manually update the DATABASECHANGELOG table so that the row with the corresponding id/author/filepath has a null value for its checksum. You would need to do this in every environment where the changeset has been deployed. The next time you run the Liquibase update command, it will set the checksum to the new, correct value.
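As a hedged sketch of option 1: the same Maven plugin already used in this build exposes a clearCheckSums goal, so it can be run with the settings files from the question (once per environment whose DATABASECHANGELOG still holds the old checksums):
# Null out MD5SUM in DATABASECHANGELOG so checksums are recomputed on the next update
mvn --settings /home/centos/.m2/jenkins/liquibase-settings.xml \
    -Pdev \
    -Dsettings.security=/home/centos/.m2/jenkins/liquibase-security-settings.xml \
    -Dliquibase.defaultSchemaName=MYDEV_SCHEMA \
    liquibase:clearCheckSums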
Cheers!!

tdb2.tdbcompact command line tool returns Failed to get a lock: file

I'm running apache-jena-fuseki-3.13.1 and just found tdb2.tdbcompact in its bin directory. I should run tdb2.tdbcompact nightly to prevent my Jena Fuseki instance from running out of disk space, but now I get an error message (Failed to get a lock: file) when running it:
miettinj@ramen:~/jena> ./apache-jena-3.13.1/bin/tdb2.tdbcompact --loc=./apache-jena-fuseki-3.13.1/run/databases/test_TDB2
org.apache.jena.dboe.DBOpEnvException: Failed to get a lock: file='/srv/work/miettinj/jena/apache-jena-fuseki-3.13.1/run/databases/test_TDB2/tdb.lock': held by process 6136
ps -x|grep 6136
6136 ? Sl 30:48 /usr/lib64/jvm/java/bin/java -Xmx1200M -cp /srv/work/miettinj/jena/apache-jena-fuseki-3.13.1/fuseki-server.jar
"held by process 6136"
Another process is using the database. Compaction has to happen from the process using the database.
Apache Jena Fuseki 3.17.0 added an administration endpoint so that the administrator can ask for compaction on a running Fuseki server.
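A minimal sketch of both approaches, using the dataset name from the question (the HTTP path is the Fuseki administration protocol's compact endpoint; verify it against the docs for your version and adjust host/port as needed):
# Option 1: stop the Fuseki server that holds tdb.lock, then compact offline
./apache-jena-3.13.1/bin/tdb2.tdbcompact --loc=./apache-jena-fuseki-3.13.1/run/databases/test_TDB2
# Option 2 (Fuseki 3.17.0+): ask the running server to compact the dataset in place
curl -XPOST 'http://localhost:3030/$/compact/test_TDB2'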

Starting neo4j from docker container shows "Neo4j is not running"

I have been trying to use Neo4j Community in a container and am getting errors. I think this might be more of a Docker usage issue than a Neo4j one.
I have built container images from https://github.com/neo4j/docker-neo4j-publish for 2.3.9, 3.3.3, 3.3.4 and 3.3.5 (the only differences being some new ports in later versions). I have even pulled a native 3.3.3 from dockerhub.com.
mkdir /tmp/data
chmod 777 /tmp/data
docker run --detach=true --name=neo4j --publish=7474:7474 --publish=7687:7687 --publish=7473:7473 --volume=/tmp/data:/data neo4j:3.3.3
docker exec -it neo4j find / -name '*.log'
and although it seems to be working with
neo4j> CREATE (n);
0 rows available after 50 ms, consumed after another 0 ms
Added 1 nodes
neo4j> CREATE (m),(o);
0 rows available after 15 ms, consumed after another 0 ms
Added 2 nodes
neo4j> MATCH (n) RETURN n;
+----+
| n |
+----+
| () |
| () |
| () |
+----+
3 rows available after 21 ms, consumed after another 8 ms
I actually get errors like this:
docker exec -it neo4j neo4j status
Neo4j is not running
Now this one looks like I am mistakenly trying to start another instance of Neo4j over a running instance:
docker exec -it neo4j neo4j console
Active database: graph.db
Directories in use:
home: /var/lib/neo4j
config: /var/lib/neo4j/conf
logs: /var/lib/neo4j/logs
plugins: /var/lib/neo4j/plugins
import: /var/lib/neo4j/import
data: /var/lib/neo4j/data
certificates: /var/lib/neo4j/certificates
run: /var/lib/neo4j/run
Starting Neo4j.
2018-04-15 06:30:13.119+0000 WARN Unknown config option: causal_clustering.discovery_listen_address
2018-04-15 06:30:13.123+0000 WARN Unknown config option: causal_clustering.raft_advertised_address
2018-04-15 06:30:13.123+0000 WARN Unknown config option: causal_clustering.raft_listen_address
2018-04-15 06:30:13.123+0000 WARN Unknown config option: ha.host.coordination
2018-04-15 06:30:13.124+0000 WARN Unknown config option: causal_clustering.transaction_advertised_address
2018-04-15 06:30:13.124+0000 WARN Unknown config option: causal_clustering.discovery_advertised_address
2018-04-15 06:30:13.124+0000 WARN Unknown config option: ha.host.data
2018-04-15 06:30:13.124+0000 WARN Unknown config option: causal_clustering.transaction_listen_address
2018-04-15 06:30:13.146+0000 INFO ======== Neo4j 3.3.3 ========
2018-04-15 06:30:13.186+0000 INFO Starting...
2018-04-15 06:30:13.997+0000 INFO Bolt enabled on 0.0.0.0:7687.
2018-04-15 06:30:14.094+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@44a59da3' was successfully initialized, but failed to start. Please see the attached cause exception "Store and its lock file has been locked by another process: /var/lib/neo4j/data/databases/graph.db/store_lock. Please ensure no other process is using this database, and that the directory is writable (required even for read-only access)". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@44a59da3' was successfully initialized, but failed to start. Please see the attached cause exception "Store and its lock file has been locked by another process: /var/lib/neo4j/data/databases/graph.db/store_lock. Please ensure no other process is using this database, and that the directory is writable (required even for read-only access)".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@44a59da3' was successfully initialized, but failed to start. Please see the attached cause exception "Store and its lock file has been locked by another process: /var/lib/neo4j/data/databases/graph.db/store_lock. Please ensure no other process is using this database, and that the directory is writable (required even for read-only access)".
Does anybody have experience with Neo4j's docker implementation? Is it a single threaded issue meaning I need to call the CLI tools differently from the container?
The neo4j status command only works if you've started Neo4j with neo4j start. start creates a neo4j.pid file that status uses to see whether Neo4j is running. Starting under Docker uses the console option instead of the start option; this does not create the PID file, so status doesn't work. But that hardly matters, because neo4j is just about the only process running in the container: if Neo4j dies, the container exits. If docker ps -a says the container is up, then Neo4j is up.
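So, instead of neo4j status, liveness checks against the container look roughly like this (the container name is the one from the question; cypher-shell ships with the 3.3 image, and $NEO4J_PASSWORD is a placeholder for whatever auth you configured, e.g. via NEO4J_AUTH):
# Is the container (and therefore Neo4j) still up?
docker ps --filter name=neo4j
# Read the server logs instead of asking `neo4j status`
docker logs --tail 50 neo4j
# Run a trivial query against the live instance
echo 'RETURN 1;' | docker exec -i neo4j cypher-shell -u neo4j -p "$NEO4J_PASSWORD"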

Graylog 2.2.0-beta.1 in Docker with UDP input: Unable to load default stream

I'm trying to use Graylog 2 to collect logs from Docker containers. The docs say that only a UDP GELF input is supported for this purpose.
I'm using docker-compose to run the graylog server. See gist for all files used: https://gist.github.com/olegabr/7f5190c453bb63c71dabf151d2373c2f.
And I'm using this command to test it:
sendip -p ipv4 -is 127.0.0.1 -p udp -us 5070 -ud 12201 -d '{"version": "1.1","host":"example.org","short_message":"Short message","full_message":"Backtrace here\n\nmore stuff","level":1,"_user_id":9001,"_some_info":"foo","_some_env_var":"bar"}' -v 127.0.0.1
The server receives this message, but it cannot process it. I see the following in the graylog2 logs:
2016-12-09 11:53:20,125 WARN : org.graylog2.bindings.providers.DefaultStreamProvider - Unable to load default stream, tried 1 times, retrying every 500ms. Processing is blocked until this succeeds.
2016-12-09 11:53:25,129 WARN : org.graylog2.bindings.providers.DefaultStreamProvider - Unable to load default stream, tried 11 times, retrying every 500ms. Processing is blocked until this succeeds.
etc. (many, many similar lines).
The API call curl http://admin:123456@127.0.0.1:9000/api/count/total returns
{"events":0}
In the server logs I see that the default stream was initialized:
mongo_1 | 2016-12-09T11:51:12.522+0000 I INDEX [conn3] build index on: graylog.pipeline_processor_pipelines_streams properties: { v: 2, unique: true, key: { stream_id: 1 }, name: "stream_id_1", ns: "graylog.pipeline_processor_pipelines_streams" }
graylog_1 | 2016-12-09 11:51:13,408 INFO : org.graylog2.periodical.Periodicals - Starting [org.graylog.plugins.pipelineprocessor.periodical.LegacyDefaultStreamMigration] periodical, running forever.
graylog_1 | 2016-12-09 11:51:13,424 INFO : org.graylog.plugins.pipelineprocessor.periodical.LegacyDefaultStreamMigration - Legacy default stream has no connections, no migration needed.
graylog_1 | 2016-12-09 11:51:13,487 INFO : org.graylog2.migrations.V20160929120500_CreateDefaultStreamMigration - Successfully created default stream: All messages
graylog_1 | 2016-12-09 11:51:13,653 INFO : org.graylog2.migrations.V20161125142400_EmailAlarmCallbackMigration - No streams needed to be migrated.
graylog_1 | 2016-12-09 11:51:13,662 INFO : org.graylog2.migrations.V20161125161400_AlertReceiversMigration - No streams needed to be migrated.
graylog_1 | 2016-12-09 11:51:13,672 INFO : org.graylog2.migrations.V20161130141500_DefaultStreamRecalcIndexRanges - Cluster not connected yet, delaying migration until it is reachable.
So why can it not be loaded when the message arrives? Why is it needed in the first place?
I've tried to find similar reports on the web, but with no success.
This has nothing to do with the UDP input per se.
Graylog 2.2.0-beta.1 is broken and shouldn't be used. Please downgrade to Graylog 2.1.2 (the latest stable version) or wait for Graylog 2.2.0-beta.2.
See https://groups.google.com/forum/#!searchin/graylog2/docker|sort:date/graylog2/gCycC3_K3vU/EL-Lz_uNDQAJ for a related post on the Graylog mailing list.
Same trouble here. I just set up Graylog and configured a GELF UDP input on port 12209, then tested it twice with:
docker run --log-driver=gelf --log-opt gelf-address=udp://127.0.0.1:12209 busybox echo Hello Graylog
In the UI I saw:
2 messages in process buffer
2 unprocessed messages are currently in the journal, in 1 segments.
0 messages have been appended in the last second, 0 messages have been read in the last second.
and still getting:
2016-12-09 12:41:23,715 INFO : org.graylog2.inputs.InputStateListener - Input [GELF UDP/584aa67308813b00010d009e] is now RUNNING
2016-12-09 12:41:43,666 WARN : org.graylog2.bindings.providers.DefaultStreamProvider - Unable to load default stream, tried 1 times, retrying every 500ms. Processing is blocked until this succeeds.
Has anyone found a solution?

Why does Hadoop only launch a local job by default?

I have written my own Hadoop program and I can run it in pseudo-distributed mode on my own laptop. However, when I put the program on the cluster (which can run the Hadoop example jar), it launches a local job by default even though I specify HDFS file paths. The output is below; any suggestions?
./hadoop -jar MyRandomForest_oob_distance.jar hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1
12/03/16 16:21:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/03/16 16:21:25 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/03/16 16:21:25 INFO mapred.JobClient: Running job: job_local_0001
12/03/16 16:21:25 INFO mapred.MapTask: io.sort.mb = 100
12/03/16 16:21:25 INFO mapred.MapTask: data buffer = 79691776/99614720
12/03/16 16:21:25 INFO mapred.MapTask: record buffer = 262144/327680
12/03/16 16:21:25 WARN mapred.LocalJobRunner: job_local_0001
java.io.FileNotFoundException: File /user/randomforest/input/genotype1.txt does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
at Data.Data.loadData(Data.java:103)
at MapReduce.DearMapper.loadData(DearMapper.java:261)
at MapReduce.DearMapper.setup(DearMapper.java:332)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
12/03/16 16:21:26 INFO mapred.JobClient: map 0% reduce 0%
12/03/16 16:21:26 INFO mapred.JobClient: Job complete: job_local_0001
12/03/16 16:21:26 INFO mapred.JobClient: Counters: 0
Total Running time is: 1 secs
LocalJobRunner has been chosen because your configuration most probably has the mapred.job.tracker property set to local, or does not set it at all (in which case the default is local). To check, go to "wherever you extracted/installed hadoop"/etc/hadoop/ and see if the file mapred-site.xml exists (for me it did not; a file called mapred-site.xml.template was there). In that file (or create it if it doesn't exist) make sure it has the following property:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
See the source for org.apache.hadoop.mapred.JobClient.init(JobConf)
What is the value of this configuration property in the hadoop configuration on the machine you are submitting this from? Also confirm that the hadoop executable you are running references this configuration (and that you don't have 2+ installations configured differently) - type which hadoop and trace any symlinks you come across.
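A quick way to run those checks from the shell (the config path here is just a common default; use whatever directory your installation actually reads):
# Which hadoop launcher is on the PATH, and where does it really live?
which hadoop
readlink -f "$(which hadoop)"
# Is the client config pointing at a real JobTracker, or defaulting to "local"?
grep -A1 'mapred.job.tracker' /etc/hadoop/conf/mapred-site.xml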
Alternatively you can override this when you submit your job, if you know the JobTracker host and port number using the -jt option:
hadoop jar MyRandomForest_oob_distance.jar -jt hostname:port hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1
If you're using Hadoop 2 and your job is running locally instead of on the cluster, ensure that you have setup mapred-site.xml to contain the mapreduce.framework.name property with a value of yarn. You also need to set up an aux-service in yarn-site.xml
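As a rough check of that aux-service setup (the property names are the standard Hadoop 2 ones; the config path is an assumption, adjust it to your install):
# The NodeManager shuffle service normally lives in yarn-site.xml
grep -A2 'yarn.nodemanager.aux-services' /etc/hadoop/conf/yarn-site.xml
# Expected entries:
#   yarn.nodemanager.aux-services                         -> mapreduce_shuffle
#   yarn.nodemanager.aux-services.mapreduce_shuffle.class -> org.apache.hadoop.mapred.ShuffleHandler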
Checkout the Cloudera Hadoop 2 operator migration blog for more information.
I had the same problem: every MapReduce v2 (MRv2) / YARN task only ran with mapred.LocalJobRunner:
INFO mapred.LocalJobRunner: Starting task: attempt_local284299729_0001_m_000000_0
The Resourcemanager and Nodemanagers were accessible and the mapreduce.framework.name was set to yarn.
Setting the HADOOP_MAPRED_HOME before executing the job fixed the problem for me.
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
cheers
dan
