Aggregate logs into an Oracle DB with Flume

I want to aggregate log files (10 GB every day), extract the ERROR logs, and then write them into an Oracle DB. Can I use Apache Flume to achieve this?
I read the documentation but did not find anything like an "Oracle Sink", so I am going to create a custom sink to write Flume events to Oracle. Is that a good idea?

Yeah, creating a custom sink is the right way to go. It's fairly easy to do. Look at the source code for the built-in sinks, such as the Avro Sink, to get started.
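To give a feel for what that involves, here is a minimal sketch of a custom sink that writes event bodies to Oracle over plain JDBC. The config keys (jdbcUrl, user, password) and the error_logs table with a single message column are made up for this example; a real sink would also batch events per transaction and pool connections.

    import org.apache.flume.*;
    import org.apache.flume.conf.Configurable;
    import org.apache.flume.sink.AbstractSink;

    import java.nio.charset.StandardCharsets;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class OracleSink extends AbstractSink implements Configurable {
        private String jdbcUrl;
        private String user;
        private String password;
        private Connection connection;

        @Override
        public void configure(Context context) {
            // Read connection settings from the agent configuration file.
            jdbcUrl = context.getString("jdbcUrl");
            user = context.getString("user");
            password = context.getString("password");
        }

        @Override
        public synchronized void start() {
            try {
                connection = DriverManager.getConnection(jdbcUrl, user, password);
                connection.setAutoCommit(false);
            } catch (SQLException e) {
                throw new RuntimeException("Could not open Oracle connection", e);
            }
            super.start();
        }

        @Override
        public Status process() throws EventDeliveryException {
            Channel channel = getChannel();
            Transaction txn = channel.getTransaction();
            txn.begin();
            try (PreparedStatement ps = connection.prepareStatement(
                    "INSERT INTO error_logs (message) VALUES (?)")) {
                Event event = channel.take();
                if (event == null) {               // nothing in the channel right now
                    txn.commit();
                    return Status.BACKOFF;
                }
                ps.setString(1, new String(event.getBody(), StandardCharsets.UTF_8));
                ps.executeUpdate();
                connection.commit();
                txn.commit();
                return Status.READY;
            } catch (Exception e) {
                txn.rollback();
                throw new EventDeliveryException("Failed to write event to Oracle", e);
            } finally {
                txn.close();
            }
        }

        @Override
        public synchronized void stop() {
            try { connection.close(); } catch (Exception ignored) { }
            super.stop();
        }
    }

The sink class then goes on the Flume classpath and is referenced by its fully qualified name in the agent's sink type property.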

Related

How to create a custom connector for Presto and InfluxDB

I am trying to create a custom connector for Presto and InfluxDB in order to make it possible for Presto to run SQL queries on InfluxDB. Are there any examples of such a connector already available?
Connectors are the source of all data for queries in Presto. Even if your data source doesn’t have underlying tables backing it, as long as you adapt your data source to the API expected by Presto, you can write queries against this data.
The only documentation that I found for writing a connector is:
https://prestodb.io/docs/current/develop/example-http.html
If anyone has other examples, can you please share them?
There are multiple connectors in the Presto source tree.
When you're connecting to a data source that has a JDBC driver (probably not your case), extending the presto-base-jdbc module gives you almost everything you need. See for example https://github.com/trinodb/trino/tree/master/presto-postgresql
When you're connecting to a non-JDBC data source (or you need more than is possible with presto-base-jdbc), you need to implement all the relevant connector interfaces. There isn't good documentation for this beyond the Java interfaces and source code, but you can follow examples, e.g. https://github.com/trinodb/trino/tree/master/presto-cassandra or https://github.com/trinodb/trino/tree/master/presto-accumulo
Yet another option is Greg Leclercq's suggestion to implement a Thrift connector. See his answer for directions.
Another option, if you prefer to code in a programming language other than Java, is to implement a Thrift service and use the Thrift connector.
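Whichever route you take, the SPI entry point looks roughly the same: a Plugin that publishes a ConnectorFactory, which Presto finds through connector.name in a catalog file. The skeleton below is only an illustration of that wiring, using the prestodb SPI package names; the "influxdb" name is a placeholder and the TODO methods are where the real work (metadata, splits, record sets) would go, as in the example-http connector.

    import com.facebook.presto.spi.ConnectorHandleResolver;
    import com.facebook.presto.spi.Plugin;
    import com.facebook.presto.spi.connector.Connector;
    import com.facebook.presto.spi.connector.ConnectorContext;
    import com.facebook.presto.spi.connector.ConnectorFactory;

    import java.util.Collections;
    import java.util.Map;

    public class InfluxDbPlugin implements Plugin {
        @Override
        public Iterable<ConnectorFactory> getConnectorFactories() {
            return Collections.singletonList(new InfluxDbConnectorFactory());
        }

        private static class InfluxDbConnectorFactory implements ConnectorFactory {
            @Override
            public String getName() {
                return "influxdb";   // referenced as connector.name=influxdb in a catalog file
            }

            @Override
            public ConnectorHandleResolver getHandleResolver() {
                // TODO: return a resolver mapping the SPI handle interfaces to your classes
                throw new UnsupportedOperationException("not implemented in this sketch");
            }

            @Override
            public Connector create(String catalogName, Map<String, String> config, ConnectorContext context) {
                // TODO: wire up ConnectorMetadata, ConnectorSplitManager and a
                // ConnectorRecordSetProvider that talk to InfluxDB's HTTP API
                throw new UnsupportedOperationException("not implemented in this sketch");
            }
        }
    }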

Neo4j sparql plugin data loading

I want to insert RDF data from a file containing 10M triples (Berlin SPARQL Benchmark), and I want to use the Neo4j SPARQL plugin for this. I have the following questions:
A somewhat similar question was asked at "Turn Neo4j into a triplestore", but I couldn't find answers to the questions below.
Is there any other way to load data than using http://localhost:7474/db/data/ext/SPARQLPlugin/graphdb/insert_quad, so that I can still query it via http://localhost:7474/db/data/ext/SPARQLPlugin/graphdb/execute_sparql? If there is, how do I do it, and how do I query the data afterwards?
How can I load data that is in Turtle (.ttl) form? Do I have to have my data in quad form?
Thanks in advance!
What do you want to achieve in the first place? Using Neo4j as a plain RDF store won't make you happy.
Try to look at your use cases, create a sensible property-graph model for them, and import into that.
You can certainly read your .ttl file with one of the available tools or libraries and then drive the Neo4j import from that, along the lines of the sketch below.
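As one possible shape of that approach, here is a small sketch using Apache Jena to parse the Turtle file and the Neo4j Java (Bolt) driver to write each triple as a pair of :Resource nodes joined by a relationship. The bolt URL, credentials and the Resource/REL naming are assumptions, and for 10M triples you would batch the writes (e.g. with UNWIND) rather than run one Cypher statement per triple.

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Statement;
    import org.apache.jena.rdf.model.StmtIterator;
    import org.apache.jena.riot.RDFDataMgr;
    import org.neo4j.driver.AuthTokens;
    import org.neo4j.driver.Driver;
    import org.neo4j.driver.GraphDatabase;
    import org.neo4j.driver.Session;

    import static org.neo4j.driver.Values.parameters;

    public class TurtleToNeo4j {
        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            RDFDataMgr.read(model, "dataset.ttl");   // parses the Turtle file into memory

            try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                     AuthTokens.basic("neo4j", "password"));
                 Session session = driver.session()) {

                StmtIterator it = model.listStatements();
                while (it.hasNext()) {
                    Statement st = it.next();
                    // One node per subject/object URI, one relationship per predicate.
                    session.run(
                        "MERGE (s:Resource {uri: $s}) " +
                        "MERGE (o:Resource {uri: $o}) " +
                        "MERGE (s)-[:REL {predicate: $p}]->(o)",
                        parameters("s", st.getSubject().toString(),
                                   "p", st.getPredicate().getURI(),
                                   "o", st.getObject().toString()));
                }
            }
        }
    }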

Can I connect directly to the output of a Mahout model with other data-related tools?

My only experience with Machine Learning / Data Mining is via SQL Server Analysis Services.
Using SSAS, I can set up models and fire direct singleton queries against them to do things like real-time market basket analysis and product suggestions. I can grab the "results" from the model as a flattened resultset and visualize them elsewhere.
Can I connect directly to the output of a Mahout model with other data-related tools in the same manner? For example, is there any way I can pull out a tabular resultset so I can render it with the visualization tool of my choice? An ODBC driver, maybe?
Thanks!
The output of Mahout is generally a file on HDFS, though you could dump it out anywhere Hadoop can write data. With another job to translate it into whatever form you need, it's readable. And if you can find an ODBC driver for the data store you put it in, then yes.
So I suppose the answer is, no, there is not by design any integration with any particular consumer. But you can probably hook up whatever you imagine.
There are some bits that are designed to be real-time systems queried via API, but I don't think it's what you mean.
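For a sense of what that translation step can look like: Mahout jobs typically leave SequenceFiles on HDFS, and a few lines of Hadoop client code can dump them to a flat format that any visualization tool or database bulk loader can pick up. The path below is a placeholder, and the key/value Writable types depend on which Mahout job produced the output.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class DumpMahoutOutput {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path path = new Path("hdfs:///user/me/mahout-output/part-r-00000");

            try (SequenceFile.Reader reader =
                     new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {
                Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
                Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
                while (reader.next(key, value)) {
                    // toString() is fine for a quick export; parse the value Writable
                    // properly if you need individual columns.
                    System.out.println(key + "," + value);
                }
            }
        }
    }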

Formatting, organizing and filtering data from text files

I'm looking to go through a bunch of text files in a bunch of folders. I'd like to go through each file line by line and do some basic statistics, like grabbing timestamps and counting repeating values. Is there any tool or scripting solution that someone could recommend for doing this?
Another possibility is to have a script/tool that could just parse these files and add them to a database like sqlite or access, for easy filtering.
So far I tried using AIR, but it looks like there might be too much data for it to process, and it hangs, but that could be because of some inefficient filtering.
I have used QuickMacros for things like this. It can do just about anything to a text file (some of it illegal in 7 states), as well as connect to databases and perform SQL tasks like creating and modifying tables.
I routinely used it to extract data, parse it, and then load it into another database. Especially useful with Scheduled Tasks.
Here's the website
I recommend Perl and CPAN
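In the same spirit, though in Java rather than Perl, here's a small sketch of the "parse into SQLite" route: walk a folder tree, pull a timestamp off each line with a regex, and bulk-insert the rows so the filtering and counting can be done in SQL. It assumes the xerial sqlite-jdbc driver on the classpath; the timestamp pattern and the "logs" directory are placeholders.

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Statement;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class LogsToSqlite {
        // Placeholder format: "2024-01-31 12:34:56  rest of the line"
        private static final Pattern TS =
            Pattern.compile("^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\s+(.*)$");

        public static void main(String[] args) throws Exception {
            List<Path> textFiles;
            try (Stream<Path> walk = Files.walk(Paths.get("logs"))) {
                textFiles = walk.filter(p -> p.toString().endsWith(".txt"))
                                .collect(Collectors.toList());
            }

            try (Connection db = DriverManager.getConnection("jdbc:sqlite:logs.db")) {
                try (Statement ddl = db.createStatement()) {
                    ddl.execute("CREATE TABLE IF NOT EXISTS log_lines(file TEXT, ts TEXT, msg TEXT)");
                }
                db.setAutoCommit(false);
                try (PreparedStatement insert = db.prepareStatement(
                         "INSERT INTO log_lines(file, ts, msg) VALUES (?, ?, ?)")) {
                    for (Path file : textFiles) {
                        for (String line : Files.readAllLines(file)) {
                            Matcher m = TS.matcher(line);
                            if (!m.matches()) continue;   // skip lines without a leading timestamp
                            insert.setString(1, file.toString());
                            insert.setString(2, m.group(1));
                            insert.setString(3, m.group(2));
                            insert.addBatch();
                        }
                    }
                    insert.executeBatch();
                }
                db.commit();
            }
        }
    }

Once the data is in log_lines, counting repeating values is a plain GROUP BY query.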

Logging / Log4J to database

In my Grails application, I run some batch processes asynchronously, and would like the process to log various status messages so that the administrator later can examine them.
I thought of using the log4j JDBC appender as the simplest solution, but from what I can tell it doesn't use a DataSource. Has anybody gotten it to work, or written their own Grails DB appender?
Did anybody have a similar requirement, and how did you implement it?
I can create a notion of Job, which hasMany LoggingMessages, but thought perhaps there is a standard way or a plugin that does this.
P.S. There was a somewhat related discussion a few weeks ago, but that one was about a different aspect than what I need.
http://grails.1312388.n4.nabble.com/Async-Event-that-publishes-progress-td2303653.html
Someone seems to have written a version of the log4j JDBC appender that does use a datasource and adds some other nice features. Have a look at this blog entry by the author.
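The general idea behind such an appender is straightforward: subclass AppenderSkeleton and borrow connections from a javax.sql.DataSource (in Grails, the Spring-managed dataSource bean) instead of opening them yourself. The sketch below is only an illustration of that idea, not the code from the blog post, and the job_log table and its columns are made up.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.Timestamp;
    import javax.sql.DataSource;

    import org.apache.log4j.AppenderSkeleton;
    import org.apache.log4j.spi.LoggingEvent;

    public class DataSourceAppender extends AppenderSkeleton {
        private final DataSource dataSource;

        public DataSourceAppender(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        @Override
        protected void append(LoggingEvent event) {
            String sql = "INSERT INTO job_log (logged_at, level, message) VALUES (?, ?, ?)";
            try (Connection con = dataSource.getConnection();
                 PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setTimestamp(1, new Timestamp(event.getTimeStamp()));
                ps.setString(2, event.getLevel().toString());
                ps.setString(3, event.getRenderedMessage());
                ps.executeUpdate();
            } catch (Exception e) {
                errorHandler.error("Could not write log event to the database", e, 0);
            }
        }

        @Override
        public void close() {
            // Connections are pooled by the DataSource; nothing to release here.
        }

        @Override
        public boolean requiresLayout() {
            return false;
        }
    }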
If you want to consider a stand-alone log server, I'd recommend this tool. It accepts log data over sockets and persists it, and it works with many database brands. Very simple to set up. It's not a free tool, though...
