Data in HBase is not structured as it should be - Twitter Flume


Greetings, users!
I have installed Flume on my Cloudera 4.6 setup, and I am trying to collect tweets from Twitter.
I created an HDFS sink and an HBase sink, and both are collecting tweets, but the data in HBase is not well structured.
Because the data is not structured, I can't query it with Impala.
I created the table like this: create 'tweets', {NAME => 'tweet'}, {NAME => 'retweet'}, {NAME => 'entities'}, {NAME => 'user'}
My Flume configuration is here: http://pastebin.com/4b5d3R8Q
I am following this tutorial, but I don't know what to do with its author's serializer:
https://github.com/AronMacDonald/Twitter_Hbase_Impala
Do I have to package it into a jar?
This is what I currently have in HBase: http://pastebin.com/aNGBsvB7
Everything is in the column tweets...
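The serializer is the piece that turns each Flume event into HBase cells. If it writes the whole event body into one column, everything lands in a single column, which matches what you see. Flume serializers are Java classes, so the tutorial's serializer typically has to be compiled into a jar and placed on Flume's classpath. The Python sketch below only illustrates the mapping such a serializer performs: it splits one tweet's JSON into a row key plus family:qualifier cells for the four column families of the tweets table. The field-to-family assignment in FAMILY_FIELDS is a hypothetical choice, not the tutorial's actual mapping, and nested objects are flattened only one level.

```python
import json

# Hypothetical mapping of top-level tweet fields to the table's column families.
FAMILY_FIELDS = {
    "tweet": ["id_str", "created_at", "text", "lang"],
    "retweet": ["retweeted_status"],
    "entities": ["entities"],
    "user": ["user"],
}


def to_cell_value(value):
    """Store scalars as text and nested lists/objects as JSON text."""
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return str(value)


def tweet_to_cells(raw_json):
    """Split one tweet into a row key and a {"family:qualifier": value} dict."""
    tweet = json.loads(raw_json)
    row_key = tweet["id_str"]
    cells = {}
    for family, fields in FAMILY_FIELDS.items():
        for field in fields:
            value = tweet.get(field)
            if value is None:
                continue  # e.g. no retweeted_status on an original tweet
            if isinstance(value, dict):
                # Flatten one level: user.screen_name -> user:screen_name
                for sub_key, sub_value in value.items():
                    cells[f"{family}:{sub_key}"] = to_cell_value(sub_value)
            else:
                cells[f"{family}:{field}"] = to_cell_value(value)
    return row_key, cells
```

With the values split into separate qualifiers like this, each family:qualifier can be mapped to its own column when the table is exposed to Impala, instead of one opaque blob.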

I recompiled the flume-sources-1.0-SNAPSHOT.jar from the Git repository https://github.com/cloudera/cdh-twitter-example and used it, and after that there was no problem when using 'TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource'.
Install Maven, then download the cdh-twitter-example repository.
Unzip it, then run the following inside it (as mentioned):
$ cd flume-sources
$ mvn package
$ cd ..
This problem appeared when twitter4j was updated from 2.2.6 to 3.x: the setIncludeEntities method was removed, and the prebuilt JAR has not been updated.
PS: Do not download the prebuilt version; it is still the old one.

Related

Neo4j 3.1.1 LOAD CSV

I have been using Neo4j for several months now and am getting pretty exasperated.
It appears that every new version breaks the previous one.
I have multiple Cypher Load scripts that I can no longer run via the command line.
I can run the following from the Browser:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///person.csv" AS csvLine
MERGE (p:Person {sysurn : csvLine.urn})
ON CREATE SET p.dob = trim(csvLine.dob)
ON CREATE SET p.forename = trim(csvLine.forename)
ON CREATE SET p.surname = trim(csvLine.surname );
Previously in version 3.0.3 (Community Edition) I ran the following:
java -cp "C:\Program Files\Neo4j CE 3.0.3\bin\neo4j-desktop-3.0.3.jar" org.neo4j.shell.StartClient -file "D:/nosql/Load data/load_person.cql"
This no longer works in 3.1.1:
java -cp "C:\Program Files\Neo4j CE 3.1.1\bin\neo4j-desktop-3.1.1.jar" org.neo4j.shell.StartClient -file "D:/nosql/Load data/load_person.cql"
I get a Java error. The general consensus is to run the full .tar version, so I installed that.
I can now run the Cypher from the browser or using cypher-shell. However, this is of no use, as there is no way to call an external script, so I would have to do this by hand for possibly hundreds of scripts.
The recommendation is to use neo4j-shell (now deprecated!).
I try neo4j-shell, but it doesn't accept spaces in the path!
I move the file and try to run the following:
"C:\Program Files\neo4j-community-3.1.1\bin\neo4j-shell" -path "D:/nosql/neoDB/databases/graph.db" -config "neo4j.conf" -file "D:/nosql/Loaddata/load_person.cql"
I get the following error:
ERROR (-v for expanded information):
Error starting org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory,
D:\nosql\neoDB\databases\graph.db
I have tried various combinations, including adding the host name as prompted:
non-JRMP server at remote endpoint
I try adding the -config parameter; however, this again doesn't allow spaces!
With every new version it seems to get more difficult to actually import data into Neo4j.
My question is: is it possible in version 3.1.1 to run more than one Cypher script at a time without manually running each one?
Is it possible to use neo4j-shell in version 3.1.1?
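One way to avoid running hundreds of scripts by hand is to loop over them and feed each file to cypher-shell on standard input. This sketch assumes that your cypher-shell accepts statements piped on stdin (as in cypher-shell -u neo4j -p secret < load_person.cql), that all scripts sit in one folder, and that they should run in file-name order. The user name, password and the shell path are placeholders; on Windows the launcher may be bin\cypher-shell.bat, which you can pass as shell.

```python
import subprocess
from pathlib import Path


def cypher_shell_commands(script_dir, user, password, shell="cypher-shell"):
    """Return (argv, script_path) pairs, one per .cql file, sorted by file name."""
    argv = [shell, "-u", user, "-p", password]
    return [(list(argv), path) for path in sorted(Path(script_dir).glob("*.cql"))]


def run_all(script_dir, user, password, shell="cypher-shell"):
    """Pipe each script into cypher-shell's stdin; stop at the first failing script."""
    for argv, path in cypher_shell_commands(script_dir, user, password, shell):
        print(f"running {path.name}")
        with open(path, "rb") as script:
            subprocess.run(argv, stdin=script, check=True)
```

Because Python opens each script itself and only hands its contents to the shell, a folder name with a space such as "D:/nosql/Load data" is never parsed as a command-line argument.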

Try using the APOC procedure apoc.cypher.runFile from within cypher-shell. Here is an example (with a file URL formatted for Windows):
CALL apoc.cypher.runFile("file:d:/nosql/Load data/load_person.cql");
If the space in the path still presents problems, you could rename the "Load data" folder to "LoadData" and modify the above query accordingly.
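With hundreds of scripts, the runFile calls themselves can be generated rather than typed. This sketch builds one Cypher text containing a CALL apoc.cypher.runFile(...) line per .cql file in a folder, using the same file:D:/... URL style as the query above. It assumes APOC is installed and that file access is enabled in the way your APOC version requires; the folder layout and name ordering are assumptions.

```python
from pathlib import Path


def build_runfile_script(script_dir):
    """Return Cypher text with one apoc.cypher.runFile call per .cql file, in name order."""
    calls = []
    for path in sorted(Path(script_dir).glob("*.cql")):
        # file: URL with forward slashes, e.g. file:D:/nosql/Load data/load_person.cql
        url = "file:" + path.resolve().as_posix()
        calls.append(f'CALL apoc.cypher.runFile("{url}");')
    return "\n".join(calls) + "\n"
```

The generated text can then be pasted into cypher-shell once, instead of once per script.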

Well, I have managed to find a workaround:
Install Neo4j 3.1.1
Create a database
Uninstall Neo4j
Install Neo4j 3.0.8
Run my Cypher scripts
e.g. java -cp "C:\Program Files\Neo4j CE 3.0.8\bin\neo4j-desktop-3.0.8.jar" org.neo4j.shell.StartClient -file "D:/nosql/Load data/load_person.cql"
Uninstall Neo4j
Install Neo4j 3.1.1
I don't think this is going to cut it in a production environment, though :)

How to add a new module to the MongooseIM chat server built on ejabberd

I am trying to add the mod_zeropush module to an existing MongooseIM (ejabberd-based) server.
I copied the .beam file to the location in the rel folder where all the other .beam files are.
When I run sudo bin/mongooseimctl debug and list all the entries matching mod_, I see every module except my mod_zeropush.
Can anyone explain how they added this module to their chat server?
I have raised this issue on GitHub as well: MongooseIM GitHub

I got this working with some help and would like to share how it is added to MongooseIM.
This setup is done on a server running Ubuntu 16.04.
After you have downloaded mod_zeropush.erl (maybe from here), put it in the following location:
`<GitSourceMongooseFolder>/apps/ejabberd/src/mod_zeropush.erl`
Run sudo make in the MongooseIM directory.
After the build is done, the .beam file is created in the rel folder at the following location:
/MongooseIM/rel/mongooseim/lib/ejabberd-2.1.8+mim-2.0.0beta2-312-g3cec442/ebin
Add the following to the modules section of ejabberd.cfg:
{mod_zeropush, [
{sound, "default"},
{auth_token, "myapp-chat-token"},
{post_url, "http://my.url/mypath"}
]},
Go to the rel/mongooseim folder and run sudo bin/mongooseimctl debug
To check, type mod_ at the shell prompt and press Tab; you should see mod_zeropush among the completions.
Go to the root/rel GitHub directory and run sudo bin/mongooseim restart
Done. You should receive offline messages on your web server.
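To confirm the last step, a throwaway HTTP endpoint at the post_url can record whatever the module sends. This is a minimal sketch using only the Python standard library; the exact payload mod_zeropush posts is an assumption here (it is treated as form-encoded, with the raw body kept as well), and the host and port must be ones MongooseIM can actually reach.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

received = []  # one entry per POST: path, raw body, parsed form fields


class OfflineMessageHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8", "replace")
        # Assumption: form-encoded fields; the raw body is kept in case it is not.
        received.append({"path": self.path, "raw": body, "fields": parse_qs(body)})
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def start_server(host="127.0.0.1", port=0):
    """Serve in a background thread; port=0 lets the OS pick a free port."""
    server = HTTPServer((host, port), OfflineMessageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Point post_url in ejabberd.cfg at this server's address and path (for example /mypath), send a message to an offline user, and inspect received.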

Building the openfootball database to sqlite

Fellow programmers,
I am trying to create a football pool for my friends and me. I would like to automate this process so that I don't have to enter any match statistics or teams by hand. I found this excellent repository hosted at this GitHub URL:
https://github.com/openfootball/build
As indicated in the examples, you need a directory structure like this for building the actual database:
openmundi/ # -> create folder (e.g. mkdir openmundi)
world.db # -> git clone (see github.com/openmundi)
openfootball/ # -> create folder
build # -> git clone
national-teams # ..
world-cup # ..
So I have this directory structure in my application:
public/tmp/openmundi
world.db # from https://github.com/openmundi/world.db
public/tmp/openfootball
build # from https://github.com/openfootball/build.git
europe-champions-league # from https://github.com/openfootball/europe-champions-league.git
Now, if I go to my build directory at public/tmp/openfootball/build, there is a Rakefile which I can run (I only want the Champions League data). But if I run the following in the build directory, I get all sorts of errors saying that the database can't be built:
rake build DATA=cl201314 # is the command I'm running
I can see in the following file what the rake task is doing: public/tmp/openfootball/build/tasks/setups/cl.rake
The contents of cl.rake are:
################################
# football clubs n leagues
task :cl201314 => :importbuiltin do
SportDb.read_setup( 'setups/teams', CLUBS_INCLUDE_PATH )
SportDb.read_setup( 'setups/teams', AT_INCLUDE_PATH )
SportDb.read_setup( 'setups/teams', DE_INCLUDE_PATH )
SportDb.read_setup( 'setups/teams', EN_INCLUDE_PATH )
SportDb.read_setup( 'setups/teams', ES_INCLUDE_PATH )
SportDb.read_setup( 'setups/teams', IT_INCLUDE_PATH )
SportDb.read_setup( 'setups/2013_14', EUROPE_CHAMPIONS_LEAGUE_INCLUDE_PATH )
end
If I run the above command, the world.db data builds just fine. But when it tries to import the Champions League data into the database, I get the following error:
deprecated manifest/setup format [SportDb.Reader]; use new plain text format
[info] parsing data 'setups/teams' (../clubs/setups/teams.yml)...
rake aborted!
No such file or directory - ../clubs/setups/teams.yml
So I guessed I needed the clubs repository as well. But after cloning that repository and running the build again, I got the same error message. And if I look into the clubs repository, it is true that a file called public/tmp/openfootball/clubs/setups/teams.yml doesn't exist. The only two files in that directory are public/tmp/openfootball/clubs/setups/all.txt and public/tmp/openfootball/clubs/setups/clubs.txt.
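Before re-running rake, it can help to list which of the expected folders and files are absent. This sketch checks paths relative to a root (public/tmp in the layout above); the EXPECTED list is only an example assembled from the directory structure and the error message in this post, and it does not include the country repos behind AT_INCLUDE_PATH, DE_INCLUDE_PATH and the other include paths, whose names are not given here.

```python
from pathlib import Path

# Example list only, taken from the layout and the error message above.
EXPECTED = [
    "openmundi/world.db",
    "openfootball/build",
    "openfootball/clubs",
    "openfootball/clubs/setups/teams.yml",
    "openfootball/europe-champions-league",
]


def missing_paths(root, expected=EXPECTED):
    """Return the entries of `expected` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]
```

For the clubs checkout described above, this would report openfootball/clubs/setups/teams.yml as missing even though the repository itself is present.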
Why doesn't it build out of the box? Am I supposed to change the files myself, or am I missing crucial parts?
The next part is reading this sqlite file and importing it into my mysql database, but I'll save that for another post. I would really like to solve this first.
In case it makes any difference, this is my Ruby version:
ruby 1.9.3p484 (2013-11-22 revision 43786) [x86_64-linux]

Thanks for trying football.db, and sorry for the trouble. The clubs repo got reorganized and two new country repos were added, e.g. switzerland and france. The build script will be updated shortly so that everything works as expected out of the box. You're welcome to post your question to the opensport/football.db forum for more detailed answers and insights, or check back for updates/news. Cheers.
PS:
The next part is reading this sqlite file and importing it into my mysql database.
You can import the data into MySQL directly (there's no need to import into SQLite first). Use the "official" build script (see the /build repo) and change the database.yml file, using the same config syntax as Rails (it's ActiveRecord).

Importing data from spreadsheet to Neo4j

I am new to Neo4j. I downloaded the software from www.neo4j.org and I was able to create the Movie graph that came with the download.
Now I am trying to import data from a spreadsheet into Neo4j. Here is the procedure I am following.
I am stuck at the last step, "THEN EXECUTE THE FOLLOWING COMMAND, making sure that Neo4j is NOT running":
cat import.txt | <neo4j directory>/bin/neo4j-shell -config conf/neo4j.properties -path <neo4j directory>/data/graph.db
I am not sure where to find neo4j-shell and conf/neo4j.properties; I don't have them in my download. Then I found that I have to download neo4j-community-2.0.1-windows.
I downloaded it, and I see neo4j-shell and also conf/neo4j.properties. Now that I have everything required to execute the above statement, I am not sure where to execute it.
I am using Windows and I am not familiar with scripting. Can you guide me on how and where to execute the command, so that I can see the nodes and relationships created in Neo4j?
Thank you!
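The command above pipes import.txt, a file of Cypher statements each ending in a semicolon, into neo4j-shell. As a sketch of how such a file could be produced from the spreadsheet, the Python below turns a CSV export into one CREATE statement per row. The CSV export step, the Person label and storing every column as a string property are all assumptions for illustration; column names are wrapped in backticks so headers with spaces still form valid property keys.

```python
import csv
import io


def escape(value):
    """Escape backslashes and double quotes for a Cypher string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def csv_to_cypher(csv_text, label="Person"):
    """Turn each CSV row into one CREATE statement with every column as a string property."""
    statements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        props = ", ".join(f'`{col}`: "{escape(val)}"' for col, val in row.items())
        statements.append(f"CREATE (:{label} {{{props}}});")
    return "\n".join(statements) + "\n"
```

Writing the result to import.txt gives a file that can then be fed to neo4j-shell as in the command above.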

On Windows, the installer does not officially install the shell.
See this blog post for an explanation: http://java.dzone.com/articles/solving-problems-neo4j-shell
You can still start it from the installation directory:
C:\Program Files\Neo4j Community>jre\bin\java -cp bin\neo4j-desktop-2.0.0.jar org.neo4j.shell.StartClient
By the way, for other cool examples check out http://blog.bruggen.com

Ferret search not working for my rails app

First, I logged into the console with ruby script/console -e production and tried to index each table using
Model.rebuild_index
It worked fine and returned true.
I then started the ferret server using the command
ruby script/ferret_server start -e production
Then I started my application, and everything works fine except the search. When I try searching on the search tab, I get the following error:
Words::BadWordnetDataset in HomeController#search
Failed to locate the wordnet database. Please ensure it is installed and that
if it resides at a custom path that path is given as an argument when
constructing the Words object.
Search works fine in the console:
result = ActsAsFerret.find("admin",[User], :limit => 2) does return results.

I installed a copy of the WordNet data files for the OS using:
sudo apt-get install wordnet-base
The Words gem actually relies on WordNet:
require 'words'
data = Words::Wordnet.new

Nothing surprising: ActiveRecord has no method rebuild_index.
