Twitter data streaming - dynamic keyword value change

I am currently doing sentiment analysis on Twitter data in Hadoop.
I have configured Flume to pull in data based on certain keywords, which I maintain in the "flume.conf" file, as shown below:
TwitterAgent.sources.Twitter.keywords = Kitkat, Nescafe, Carnation, Milo, Cerelac
Given this, I would like to know whether it is possible to change the keywords dynamically, based on a Java program that prompts the user for keywords.
Also, apart from this method, please suggest any other ways to hide the complexity of updating the flume.conf file.
Best Regards,
Ram

Flume provides an Application class that lets you start an agent from a Java program, pointing it at a configuration file.
import org.apache.flume.node.Application;
import org.apache.log4j.BasicConfigurator;

public class Flume {
    public static void main(String[] args) throws Exception {
        // Prompt the user and write the keywords into flumee.conf first
        new Conf().setConfiguration();
        // Must be set before the agent starts
        System.setProperty("hadoop.home.dir", "/");
        BasicConfigurator.configure();
        // Run flumee.conf, the copy that actually contains the appended keywords
        String[] args1 = new String[] { "agent", "-nTwitterAgent", "-fflumee.conf" };
        Application.main(args1);
    }
}
Before the agent starts, you get a chance to edit your conf file using some file utilities, like this:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Scanner;

class Conf {
    public void setConfiguration() {
        Scanner sc = new Scanner(System.in);
        System.out.println("Enter the keyword");
        String keyword = sc.nextLine();
        // Copy the base flume.conf to flumee.conf, then append the keyword to the
        // trailing "TwitterAgent.sources.Twitter.keywords=" line; the agent is
        // started against flumee.conf afterwards.
        try (FileInputStream src = new FileInputStream("flume.conf");
             FileOutputStream fp = new FileOutputStream("flumee.conf")) {
            int ch;
            while ((ch = src.read()) != -1) {
                fp.write(ch);
            }
            fp.write(keyword.getBytes());
        } catch (Exception e) {
            System.out.println("file exception: " + e);
        }
    }
}
Now keep the TwitterAgent.sources.Twitter.keywords= line at the end of your Flume configuration file, so that it is easy to append the keywords. My flume.conf file looks like this:
TwitterAgent.sources= Twitter
TwitterAgent.channels= MemChannel
TwitterAgent.sinks=HDFS
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels=MemChannel
TwitterAgent.sources.Twitter.consumerKey=xxx
TwitterAgent.sources.Twitter.consumerSecret=xxx
TwitterAgent.sources.Twitter.accessToken=xxx
TwitterAgent.sources.Twitter.accessTokenSecret=
TwitterAgent.sinks.HDFS.channel=MemChannel
TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.sinks.HDFS.hdfs.path=hdfs://localhost:9000/user/flume/direct
TwitterAgent.sinks.HDFS.hdfs.fileType=DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat=Text
TwitterAgent.sinks.HDFS.hdfs.batchSize=1000
TwitterAgent.sinks.HDFS.hdfs.rollSize=0
TwitterAgent.sinks.HDFS.hdfs.rollCount=10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval=600
TwitterAgent.channels.MemChannel.type=memory
TwitterAgent.channels.MemChannel.capacity=10000
TwitterAgent.channels.MemChannel.transactionCapacity=1000
TwitterAgent.sources.Twitter.keywords=
When I run the program, it asks for the keywords first and then starts collecting tweets about those keywords.
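Two further notes. First, Flume's default configuration provider polls the config file (every 30 seconds by default) and reloads the agent when the file changes, so rewriting the file while the agent runs can take effect without a restart. Second, instead of appending to the end of the file, a small utility can rewrite the keywords line in place so the file stays valid across runs. A minimal sketch of that idea (my addition, not from the original answer; it assumes flume.conf already contains a TwitterAgent.sources.Twitter.keywords line):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

class KeywordRewriter {
    // Rewrite the keywords line in place instead of appending to the file
    static void setKeywords(String keywords) throws IOException {
        Path conf = Paths.get("flume.conf");
        List<String> lines = Files.readAllLines(conf).stream()
                .map(line -> line.startsWith("TwitterAgent.sources.Twitter.keywords")
                        ? "TwitterAgent.sources.Twitter.keywords = " + keywords
                        : line)
                .collect(Collectors.toList());
        Files.write(conf, lines);
    }
}

Unlike plain appending, rerunning this does not accumulate stale keywords at the end of the file.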

Related

Spring Integration - List file names being downloaded via a RemoteFileTemplate

I am downloading files via SFTP using a Spring Integration RemoteFileTemplate. How do I perform some processing on each file name that is being downloaded? I see that the line
.log(LoggingHandler.Level.INFO, "sftp.inbound", Message::getHeaders)
logs the file names, but I need the file names available directly.
All I need to do is write the downloaded file names as a list into a POJO to pass as a response to a later process. My code is attached below.
@Configuration
@EnableIntegration
public class SftpInboundFlowIntegrationConfig {
private static final Logger log = LoggerFactory.getLogger(SftpInboundFlowIntegrationConfig.class);
private String sftpRemoteDirectory = "/";
@Bean
public SessionFactory<ChannelSftp.LsEntry> inboundSftpSessionFactory() {
DefaultSftpSessionFactory factory = new DefaultSftpSessionFactory(true);
factory.setHost("localhost");
factory.setPort(2222);
factory.setUser("local");
factory.setPassword("local");
factory.setAllowUnknownKeys(true);
return new CachingSessionFactory<>(factory);
}
@Bean
public IntegrationFlow sftpInboundFlow(RemoteFileTemplate<ChannelSftp.LsEntry>
inboundRemoteFileTemplate) {
return e -> e
.log(LoggingHandler.Level.INFO, "sftp.inbound", Message::getPayload)
.log(LoggingHandler.Level.INFO, "sftp.inbound", Message::getHeaders)
.handle(
Sftp.outboundGateway(inboundRemoteFileTemplate, AbstractRemoteFileOutboundGateway.Command.MGET, "payload")
.localDirectory(new File("c:/tmp"))
);
}
@Bean
public RemoteFileTemplate<ChannelSftp.LsEntry> inboundRemoteFileTemplate(SessionFactory<ChannelSftp.LsEntry> inboundSftpSessionFactory) {
RemoteFileTemplate<ChannelSftp.LsEntry> template = new SftpRemoteFileTemplate(inboundSftpSessionFactory);
template.setRemoteDirectoryExpression(new LiteralExpression(sftpRemoteDirectory));
template.setAutoCreateDirectory(true);
template.afterPropertiesSet();
template.setUseTemporaryFileName(false);
return template;
}
}
Sorry all, I was trying to accomplish this in the wrong area of the code. I realized that when I call my outbound gateway to download the files, sftpOutboundGateway.mget("/") returns the list of files downloaded, which is what I needed.
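For reference, the MGET command of the SFTP outbound gateway replies with a message whose payload is the List<java.io.File> of downloaded files, so a messaging gateway can hand that list straight back to calling code. A minimal sketch (the interface and the channel name are my assumptions, not part of the original post; a flow defined as a lambda gets an input channel named "<beanName>.input"):

import java.io.File;
import java.util.List;
import org.springframework.integration.annotation.Gateway;
import org.springframework.integration.annotation.MessagingGateway;

@MessagingGateway
public interface SftpDownloadGateway {
    // Payload is the remote directory; the reply payload is the MGET file list
    @Gateway(requestChannel = "sftpInboundFlow.input")
    List<File> mget(String remoteDirectory);
}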

Jenkins API to retrieve a build log in chunks

For a custom monitoring tool I need an API (REST) to fetch the console log of a Jenkins build in chunks.
I know about the /consoleText and /logText/progressive{Text|HTML} APIs, but the problem with this is that sometimes, our build logs get really huge (up to a few GB). I have not found any way using those existing APIs that avoids fetching and transferring the whole log in one piece. This then normally drives the Jenkins master out of memory.
I already have the Java code to efficiently fetch chunks from a file, and I have a basic Jenkins plugin that gets loaded correctly.
What I'm missing is the correct extension point so that I could call my plugin via REST, for example like
http://.../jenkins/job/<jobname>/<buildnr>/myPlugin/logChunk?start=1000&size=1000
Or also, if that is easier
http://.../jenkins/myPlugin/logChunk?start=1000&size=1000&job=<jobName>&build=<buildNr>
I tried to register my plugin with something like this (the code below does not work!):
@Extension
public class JobLogReaderAPI extends TransientActionFactory<T> implements Action {
public void doLogChunk(StaplerRequest req, StaplerResponse rsp) throws IOException {
LOGGER.log(Level.INFO, "## doLogFragment req: {}", req);
LOGGER.log(Level.INFO, "## doLogFragment rsp: {}", rsp);
}
But I failed to find the right incantation to register my plugin action.
Any tips or pointers to existing plugins where I can check how to register this?
This was indeed simpler than I expected :-) As always: once you understand the plugin system, it just needs a few lines of code.
It turns out all I needed to do was write two very simple classes:
The "action factory" that gets called by Jenkins and registers an action on the object in question (in my case a "build", or Run):
@Extension // registers the factory with Jenkins so it gets picked up
public class ActionFactory extends TransientBuildActionFactory {
    public Collection<? extends Action> createFor(Run target) {
        ArrayList<Action> actions = new ArrayList<Action>();
        if (target.getLogFile().exists()) {
            LogChunkReader newAction = new LogChunkReader(target);
            actions.add(newAction);
        }
        return actions;
    }
}
The class that implements the logic:
public class LogChunkReader implements Action {
private Run build;
public LogChunkReader(Run build) {
this.build = build;
}
public String getIconFileName() {
return null;
}
public String getDisplayName() {
return null;
}
public String getUrlName() {
return "logChunk";
}
public Run getBuild() {
return build;
}
    public void doReadChunk(StaplerRequest req, StaplerResponse rsp) throws IOException, ServletException {
        // Minimal sketch of the missing body (an assumption of mine; the
        // original post is truncated here): stream `size` bytes of the build's
        // log file starting at offset `start`.
        long start = Long.parseLong(req.getParameter("start"));
        int size = Integer.parseInt(req.getParameter("size"));
        rsp.setContentType("text/plain");
        try (RandomAccessFile log = new RandomAccessFile(build.getLogFile(), "r")) {
            log.seek(start);
            byte[] buf = new byte[size];
            int read = log.read(buf);
            if (read > 0) {
                rsp.getOutputStream().write(buf, 0, read);
            }
        }
    }
}
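Since getUrlName() returns "logChunk" and Stapler maps a doXyz method to the URL segment xyz, the chunk should be reachable at something like http://.../jenkins/job/<jobname>/<buildnr>/logChunk/readChunk?start=1000&size=1000 (my reading of the Stapler conventions, not something the original poster spelled out).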

How do I use SimpleFileVisitor in Java to find a file name whose encoding may vary?

I'm using SimpleFileVisitor to search for a file. It works fine on Windows and Linux. However, when I try using it on Unix-like operating systems, it doesn't work as expected. I get errors like this:
java.nio.file.NoSuchFileException:
/File/Location/MyFolder/\u0082\u0096\u0096âĜu0099\u0081\u0097K
\u0097\u0099\u0096\u0097\u0085\u0099Ĝu0089\u0085
It looks like the obtained name is in a different character encoding, and maybe that is what is causing the issue. Somewhere between obtaining the name and trying to access the file, the encoding gets messed up. The result is that preVisitDirectory is called once, then visitFileFailed is called for every file it tries to visit. I'm not sure why the walkFileTree method is doing that. Any idea?
My code using SimpleFileVisitor looks like this:
Files.walkFileTree(serverLocation, finder);
My SimpleFileVisitor class:
public class Finder extends SimpleFileVisitor<Path> {
private final PathMatcher matcher;
private final List<Path> matchedPaths = new ArrayList<Path>();
private String usedPattern = null;
Finder(String pattern) {
this.usedPattern = pattern;
matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
}
void match(Path file) { //Compare pattern against file or dir
Path name = file.getFileName();
if (name != null && matcher.matches(name))
matchedPaths.add(file);
}
// Check each file.
@Override
public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
match(file);
return CONTINUE;
}
// Check each directory.
@Override
public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
match(dir);
return CONTINUE;
}
@Override
public FileVisitResult visitFileFailed(Path file, IOException e) {
System.out.println("Issue: " + e );
return CONTINUE;
}
}
Try using "Charset.defaultCharset()" when you create those "file" and "dir" strings you pass around. Otherwise, you could very likely mangle the names in the process of creating those strings to pass them to your visit methods.
You might also check your default encoding on the JVM your are running, if it is out of sync with the file system you are reading, your results will be, err, unpredictable.
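A quick way to inspect those defaults side by side (a small aside of mine, not from the original answer; sun.jnu.encoding is the non-standard property most JVMs use when decoding file names):

import java.nio.charset.Charset;

public class EncodingCheck {
    public static void main(String[] args) {
        // Default charset used for byte/char conversions
        System.out.println("defaultCharset:   " + Charset.defaultCharset());
        // Property many JVMs use for file and path names (non-standard)
        System.out.println("sun.jnu.encoding: " + System.getProperty("sun.jnu.encoding"));
    }
}

If these disagree with the locale the file system was written under (say, a LANG=C shell on a UTF-8 volume), walkFileTree can hand you names that no longer resolve when you try to open them.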

NEO4J Spatial: tips about batch inserter

This is my scenario: we are building a routing system using Neo4j and the spatial plugin. We start from an OSM file, read it, and import nodes and relationships into our graph (a custom graph model).
Now, without the Neo4j batch inserter, importing a compressed OSM file (compressed size around 140MB, around 2GB uncompressed) takes around 3 days on a dedicated server with the following characteristics: CentOS 6.5 64-bit, quad core, 8GB RAM. Please note that most of that time goes into creating the Neo4j nodes and relationships; in fact, if we read the same file without doing anything with Neo4j, it is read in around 7 minutes (I'm sure about this because our process first reads the file to store the correct OSM node ids, and then reads it again to create the Neo4j graph).
Obviously we need to improve the import process, so we are trying the BatchInserter. So far, so good (I still need to measure how much faster the BatchInserter is, but I guess it will be); the first thing I did was try it in a simple test case (very similar to our code, but without modifying our code directly).
I list my software versions:
Neo4j: 2.0.2
Neo4jSpatial: 0.13-neo4j-2.0.1
Neo4jGraphCollections: 0.7.1-neo4j-2.0.1
Osmosis: 0.43.1
Since I'm using osmosis in order to read the osm file, I wrote the following Sink implementation:
public class BatchInserterSinkTest implements Sink
{
public static final Map<String, String> NEO4J_CFG = new HashMap<String, String>();
private static File basePath = new File("/home/angelo/Scrivania/neo4j");
private static File dbPath = new File(basePath, "db");
private GraphDatabaseService graphDb;
private BatchInserter batchInserter;
// private BatchInserterIndexProvider batchIndexService;
private SpatialDatabaseService spatialDb;
private SimplePointLayer spl;
static
{
NEO4J_CFG.put( "neostore.nodestore.db.mapped_memory", "100M" );
NEO4J_CFG.put( "neostore.relationshipstore.db.mapped_memory", "300M" );
NEO4J_CFG.put( "neostore.propertystore.db.mapped_memory", "400M" );
NEO4J_CFG.put( "neostore.propertystore.db.strings.mapped_memory", "800M" );
NEO4J_CFG.put( "neostore.propertystore.db.arrays.mapped_memory", "10M" );
NEO4J_CFG.put( "dump_configuration", "true" );
}
@Override
public void initialize(Map<String, Object> arg0)
{
batchInserter = BatchInserters.inserter(dbPath.getAbsolutePath(), NEO4J_CFG);
graphDb = new SpatialBatchGraphDatabaseService(batchInserter);
spatialDb = new SpatialDatabaseService(graphDb);
spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");
//batchIndexService = new LuceneBatchInserterIndexProvider(batchInserter);
}
@Override
public void complete()
{
// TODO Auto-generated method stub
}
@Override
public void release()
{
// TODO Auto-generated method stub
}
@Override
public void process(EntityContainer ec)
{
Entity entity = ec.getEntity();
if (entity instanceof Node) {
Node osmNodo = (Node)entity;
org.neo4j.graphdb.Node graphNode = graphDb.createNode();
graphNode.setProperty("osmId", osmNodo.getId());
graphNode.setProperty("latitudine", osmNodo.getLatitude());
graphNode.setProperty("longitudine", osmNodo.getLongitude());
spl.add(graphNode);
} else if (entity instanceof Way) {
//do something with the way
} else if (entity instanceof Relation) {
//do something with the relation
}
}
}
Then I wrote the following test case:
public class BatchInserterTest
{
private static final Log logger = LogFactory.getLog(BatchInserterTest.class.getName());
@Test
public void batchInserter()
{
File file = new File("/home/angelo/Scrivania/MilanoPiccolo.osm");
try
{
boolean pbf = false;
CompressionMethod compression = CompressionMethod.None;
if (file.getName().endsWith(".pbf"))
{
pbf = true;
}
else if (file.getName().endsWith(".gz"))
{
compression = CompressionMethod.GZip;
}
else if (file.getName().endsWith(".bz2"))
{
compression = CompressionMethod.BZip2;
}
RunnableSource reader;
if (pbf)
{
reader = new crosby.binary.osmosis.OsmosisReader(new FileInputStream(file));
}
else
{
reader = new XmlReader(file, false, compression);
}
reader.setSink(new BatchInserterSinkTest());
Thread readerThread = new Thread(reader);
readerThread.start();
while (readerThread.isAlive())
{
try
{
readerThread.join();
}
catch (InterruptedException e)
{
/* do nothing */
}
}
}
catch (Exception e)
{
logger.error("Errore nella creazione di neo4j con batchInserter", e);
}
}
}
By executing this code, I get this exception:
Exception in thread "Thread-1" java.lang.ClassCastException: org.neo4j.unsafe.batchinsert.SpatialBatchGraphDatabaseService cannot be cast to org.neo4j.kernel.GraphDatabaseAPI
at org.neo4j.cypher.ExecutionEngine.<init>(ExecutionEngine.scala:113)
at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(ExecutionEngine.java:53)
at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(ExecutionEngine.java:43)
at org.neo4j.collections.graphdb.ReferenceNodes.getReferenceNode(ReferenceNodes.java:60)
at org.neo4j.gis.spatial.SpatialDatabaseService.getSpatialRoot(SpatialDatabaseService.java:76)
at org.neo4j.gis.spatial.SpatialDatabaseService.getLayer(SpatialDatabaseService.java:108)
at org.neo4j.gis.spatial.SpatialDatabaseService.containsLayer(SpatialDatabaseService.java:253)
at org.neo4j.gis.spatial.SpatialDatabaseService.createLayer(SpatialDatabaseService.java:282)
at org.neo4j.gis.spatial.SpatialDatabaseService.createSimplePointLayer(SpatialDatabaseService.java:266)
at it.eng.pinf.graph.batch.test.BatchInserterSinkTest.initialize(BatchInserterSinkTest.java:46)
at org.openstreetmap.osmosis.xml.v0_6.XmlReader.run(XmlReader.java:95)
at java.lang.Thread.run(Thread.java:744)
This is related to this code:
spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");
So now I'm wondering: how can I use the BatchInserter for my case? I have to add the created nodes to the SimplePointLayer... so how can I create it using the BatchInserter graph database service?
Is there any simple sample?
Any tip is really appreciated.
Cheers,
Angelo
The OSMImporter class in the code has an example of using the batch inserter to import OSM data. The main thing is that the batch inserter is not really supported by neo4j spatial, so you need to do a few things manually. If you look at the class OSMImporter.OSMBatchWriter, you will see how it does things. It is not using the SimplePointLayer at all, since that does not support the batch inserter. It is creating the graph structure it wants directly. The simple point layer is quite simple, certainly much simpler than the OSM model created by the code I'm referencing, so I think you should be able to write a batch-inserter compatible version yourself without too much trouble.
What I would recommend is that you create the layer and nodes using the batch inserter to create the correct graph structure, then switch to the normal embedded API and use that to iterate through the nodes and add them to the spatial index.
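A rough sketch of that two-phase approach (hedged: written from memory against the Neo4j 2.0-era APIs, with the property and layer names taken from the question; treat it as a starting point, not tested code):

import org.neo4j.gis.spatial.SimplePointLayer;
import org.neo4j.gis.spatial.SpatialDatabaseService;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.tooling.GlobalGraphOperations;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class TwoPhaseImport {
    public static void main(String[] args) {
        String dbPath = "/home/angelo/Scrivania/neo4j/db";

        // Phase 1: batch-insert plain nodes with coordinate properties,
        // without touching the spatial index at all.
        BatchInserter inserter = BatchInserters.inserter(dbPath);
        inserter.createNode(MapUtil.map("latitudine", 45.46, "longitudine", 9.19));
        inserter.shutdown();

        // Phase 2: reopen with the normal embedded API, create the layer,
        // and add the nodes to the spatial index inside a transaction.
        GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase(dbPath);
        SpatialDatabaseService spatial = new SpatialDatabaseService(db);
        try (Transaction tx = db.beginTx()) {
            SimplePointLayer layer =
                    spatial.createSimplePointLayer("testBatch", "latitudine", "longitudine");
            for (Node n : GlobalGraphOperations.at(db).getAllNodes()) {
                if (n.hasProperty("latitudine")) {
                    layer.add(n);
                }
            }
            tx.success();
        }
        db.shutdown();
    }
}

Iterating over all nodes is fine for a one-off import; for very large graphs you would rather record the node ids during the batch phase and look them up directly in the second phase.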

writing UI content to a file on a server

I've got a pop-up textarea where the user writes some lengthy comments. I would like to store the content of the textarea in a file on the server on "submit". What is the best way to do it, and how?
Thanks,
This would be very easy to do. The text could be just a String (or a StringBuffer, for size and formatting); then just pass that to your Java code and use file operations to write it to a file.
This is some GWT code, but it's still Ajax, so it will be similar. Get a handler for an event to capture the button submittal, then get the text in the text area.
textArea.addChangeHandler(new ChangeHandler() {
public void onChange(ChangeEvent changeEvent) {
String text = textArea.getText();
}
});
The hand-off mechanism I don't know, because you don't show any code, but I just created a file of file names, line by line, by reading the names out of a list of files with this:
private void writeFilesListToFile(List<File> filesList) {
    for (File file : filesList) {
        String fileName = file.getName();
        appendToFile(fileName);
    }
}

private void appendToFile(String text) {
    try {
        // The second FileWriter argument opens the file in append mode;
        // replace the placeholder with your real path.
        BufferedWriter out = new BufferedWriter(new FileWriter("<file path and file name>", true));
        out.write(text);
        out.newLine();
        out.close();
    } catch (IOException e) {
        System.out.println("Error appending file with filename: " + text);
    }
}
You could do something similar, only writing out the few lines you got from the textarea. Without more to go on, I can't really get more specific.
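If it helps, the receiving end could be a plain servlet that appends whatever the page posts. A minimal sketch (the servlet name, parameter name, and file path are all placeholders of mine, not from this exchange):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class SaveCommentServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // Assumes the client posts the textarea content as a "comment" parameter
        String comment = req.getParameter("comment");
        // Append mode, so repeated submissions accumulate in one file
        try (BufferedWriter out = new BufferedWriter(new FileWriter("/path/on/server/comments.txt", true))) {
            out.write(comment);
            out.newLine();
        }
        resp.setStatus(HttpServletResponse.SC_OK);
    }
}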
HTH,
James
