Which Flume source is better suited for watching a growing local file (e.g., a log file)?
The Spooling Directory Source isn't suitable because it only picks up new files and doesn't allow changes to existing files.
You can always tail the file in question using the Exec source.
Something like this:
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure
a1.sources.r1.channels = c1
Please note that this isn't the most reliable approach; see the warning in the documentation (https://flume.apache.org/FlumeUserGuide.html#exec-source).
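If your Flume version is 1.7 or newer, the Taildir Source is worth a look as a more robust alternative for this use case: it tails files and keeps a JSON position file so it can resume without data loss after a restart. A minimal sketch, with an assumed position-file path:
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = TAILDIR
a1.sources.r1.channels = c1
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /var/log/secure
# where read offsets are persisted across restarts (example path)
a1.sources.r1.positionFile = /var/flume/taildir_position.json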
I have installed Hadoop and Spark on Docker. Now I want to verify that the installations were successful by running a simple word-count example JAR, but I do not know how to do it. Any ideas?
Thank you in advance.
You don't need Hadoop for this. Spark can run word count on plain-text local files.
You can use the interactive shell rather than building any JAR. The example below is Python, so start pyspark (spark-shell is the Scala equivalent) and run the word-count code from the documentation, but with a file:/// path:
# read local files directly (no HDFS needed) via a file:/// URI
text_file = sc.textFile("file:///some_local_folder")
# split lines into words, pair each word with 1, and sum per-word counts
counts = text_file.flatMap(lambda line: line.split(" ")) \
                  .map(lambda word: (word, 1)) \
                  .reduceByKey(lambda a, b: a + b)
counts.saveAsTextFile("file:///tmp/results")
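To try it, start a local Python shell and paste the snippet (the two-core local master is just an example setting):
pyspark --master local[2]
The counts then end up in plain-text part files, which you can inspect with:
cat /tmp/results/part-*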
I'm trying to dump a Jena database as triples.
There seems to be a command that sounds perfectly suited to the task: tdb2.tdbdump
jena@debian-clean:~$ ./apache-jena-3.8.0/bin/tdb2.tdbdump --help
tdbdump : Write a dataset to stdout (defaults to N-Quads)
Output control
--output=FMT Output in the given format, streaming if possible.
--formatted=FMT Output, using pretty printing (consumes memory)
--stream=FMT Output, using a streaming format
--compress Compress the output with gzip
Location
--loc=DIR Location (a directory)
--tdb= Assembler description file
Symbol definition
--set Set a configuration symbol to a value
--mem=FILE Execute on an in-memory TDB database (for testing)
--desc= Assembler description file
General
-v --verbose Verbose
-q --quiet Run with minimal output
--debug Output information for debugging
--help
--version Version information
--strict Operate in strict SPARQL mode (no extensions of any kind)
jena@debian-clean:~$
But I've not succeeded in getting it to write anything to STDOUT.
When I use the --loc parameter to point to a DB, a new copy of that DB appears in the subfolder Data-0001, but nothing appears on STDOUT.
When I try the --tdb parameter and point it to a .ttl file, I get a stack trace complaining about its formatting.
Google has turned up the Jena documentation telling me the command exists, and that's it. So any help appreciated.
"--loc" should be the same as used to create the database.
Suppose that's "DB2". For TDB2 (not TDB1) after the database is created, then "DB2/Data-0001" will already exist. Do not use this for --loc. Use "--loc DB2".
If it is a TDB1 database (the files are directly in the directory given by "--loc", with no "Data-0001"), then use tdbdump instead. An empty database has no triples/quads in it, so you would get no output.
Fuseki currently (up to 3.16.0) has to be called with the same setup each time it is run, which is fragile regarding TDB1/TDB2. If you created the TDB2 database outside Fuseki and only use command-line arguments, you'll need "--tdb2" each time.
The next Fuseki release (3.17.0) detects the type of an existing database.
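With command-line arguments only, starting the server against that database would look something like this (the dataset name /ds is just an example):
fuseki-server --tdb2 --loc=DB2 /ds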
Hi everyone, I need some help, please.
I am using Telegraf as a log feeder for my InfluxDB database. The idea is that Telegraf reads a log file and sends the parsed results to InfluxDB.
[[inputs.logparser]]
files = ["/here/is/the/directory/logname.log"]
from_beginning = false
It works as expected when the log file name is logname.log. But today I need to change the naming scheme to logname.20170320.log, where 20170320 is the date of the log. What is the right configuration for:
files = ["/here/is/the/directory/logname.log"]
so that it can read the daily log whose name changes every day, like:
files = ["/here/is/the/directory/logname.20170320.log"]
files = ["/here/is/the/directory/logname.20170321.log"]
Thanks for your help.
Based on @Luv33preet's comment here, I made a script that changes the configuration daily using sed. Here is the code:
/bin/sed -i "s/`date +'%Y%m%d' -d '1 day ago'`/`date +'%Y%m%d'`/" /etc/telegraf/conf.d/my-config.conf
This replaces yesterday's date stamp in the Telegraf configuration with today's.
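To apply it automatically, you could wrap the sed command in a small script (e.g. /usr/local/bin/rotate-telegraf-conf.sh, a hypothetical name; % is special in crontab lines, so calling a script is safer than inlining date there) and run it just after midnight, restarting Telegraf so the new file name is picked up (the restart line assumes a systemd host):
0 0 * * * /usr/local/bin/rotate-telegraf-conf.sh && systemctl restart telegraf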
Why don't you just use a wildcard for your logfile?
[[inputs.logparser]]
# match all .log files one directory below /var/log
files = ["/var/log/*/*.log"]
from_beginning = false
All,
I am looking for a way to create a file every hour which captures the output of 24 commands. These commands output the replication status of 24 consistency groups into the same file. The file, or the information in it, then needs to be emailed to a DL. My hang-up seems to be the file check: if the file exists, rename/move it, and so on.
Thanks
I think you need the Test-Path cmdlet:
$bool = Test-Path $fileName
if ($bool) {
    Rename-Item $fileName $newFileName
}
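Putting the whole hourly flow together, a minimal sketch might look like this (the command list, paths, and mail settings are placeholders you would adapt; Send-MailMessage ships with PowerShell):
$fileName = "C:\reports\replication_status.txt"

# If a file from the previous run exists, rename it with a timestamp
if (Test-Path $fileName) {
    $stamp = (Get-Item $fileName).LastWriteTime.ToString("yyyyMMdd_HHmmss")
    Rename-Item $fileName "C:\reports\replication_status_$stamp.txt"
}

# Run each status command and append its output to the new file
# (these script blocks are placeholders for your 24 consistency-group queries)
$commands = @(
    { Get-ReplicationStatusCG1 },
    { Get-ReplicationStatusCG2 }
)
foreach ($cmd in $commands) {
    & $cmd | Out-File -FilePath $fileName -Append
}

# Email the file to the distribution list (SMTP details are assumptions)
Send-MailMessage -To "dl@example.com" -From "reports@example.com" `
    -Subject "Replication status" -Attachments $fileName `
    -SmtpServer "smtp.example.com"
Schedule the script to run hourly with Task Scheduler and the flow is complete.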
I have a little script that reads my PATH and stores it in a file, which I would like to schedule to run daily.
-- store the current value of PATH in a temp file
path = os.getenv("PATH")
file_name = "C:\\temp.txt"
file = io.open(file_name, "w")
file:write(path)
file:close()
If I run it from the command line it works, but when I create a batch file (I work on Windows XP) and double-click it, os.getenv("PATH") returns false. The batch file:
"C:\Program Files\Lua\5.1\lua" store_path.lua
I read in the comments to this question that PATH "is not a process environment variable, it's provided by the shell, so it won't work". And indeed, some other environment variables (like username) work fine.
The two questions I have are:
Why does the script not have access to PATH? I thought it would get a copy of the environment (so only setting an environment variable would be a problem).
What would be the best way to read PATH in such a way that I can add the script to a scheduler?
Have the batch file run it from a shell so that you get shell variables:
cmd /c C:\path\to\lua myfile.lua
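For the scheduler part, a Task Scheduler entry created from the command line could look like this (the task name, start time, and paths are placeholders, and a space-free Lua install path is used because quoting spaces inside /tr is fiddly):
schtasks /create /tn "StorePath" /sc daily /st 08:00 /tr "cmd /c C:\Lua51\lua.exe C:\scripts\store_path.lua"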