Flume HDFS sink

I am trying to use Flume with HDFS as the sink. The file is getting exported, but I want to customize the name of the output file. I am using the hdfs.filePrefix property for it, yet it always creates a file named FlumeData.timestamp.

Please paste your configuration.
I tried it and it did work.
My setting:
agent.sinks.flumeHDFS.hdfs.filePrefix = stackoverflow
and I get the expected result.
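For reference, a fuller sink definition might look like this; the agent, channel, and HDFS path names here are hypothetical, and only the hdfs.filePrefix setting comes from the answer above:

# Hypothetical agent/channel/path names; with this config, files come out
# as stackoverflow.<epoch> instead of FlumeData.<epoch>
agent.sinks.flumeHDFS.type = hdfs
agent.sinks.flumeHDFS.channel = memoryChannel
agent.sinks.flumeHDFS.hdfs.path = /flume/events
agent.sinks.flumeHDFS.hdfs.filePrefix = stackoverflow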

Related

How to read an XML file which is not well-formed?

I am trying to read an XML file with xml.etree.ElementTree, but it gives me an error when it reaches a specific line of the file. I was assuming that this is a regular XML file. Please help me figure out how to resolve it.
import xml.etree.ElementTree as ET
xml_file = ET.parse("/Users/arash/project/my_project/extracting_features/TraceData.log").getroot()
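No answer was posted here, but one way to narrow the failure down, as a minimal sketch (the path is from the question; everything else is illustrative): ElementTree raises ParseError with the exact line and column at which the document stops being well-formed.

import xml.etree.ElementTree as ET

try:
    root = ET.parse("/Users/arash/project/my_project/extracting_features/TraceData.log").getroot()
except ET.ParseError as e:
    # e.position is a (line, column) tuple pointing at the offending input
    print("Not well-formed at line %d, column %d: %s" % (e.position[0], e.position[1], e))

Note that a .log file is frequently not XML at all, or is a series of XML fragments without a single root element, either of which would explain a parse error at a specific line.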

Loading whole file from source into HDFS in flume

How do I get the source file name as-is from the source into HDFS in Flume?
Ex: source file /usr/sample.txt should land in HDFS as /tmp/sample.txt, not as flumeevents.23343.tmp.
How do I stop Flume from appending the timestamp and .tmp? Ex: flumeevent.12334343.tmp (here I don't want the 12334343.tmp part).
How do I read a file as a whole in Flume?
How do I read a CSV file in Flume?
You need to set a parameter on the spooldir source that adds a header carrying the file name; it is false by default:
agentname.sources.sourcename.fileHeader = true
With that header in place, the sink can keep the same file name when pushing into HDFS; a fuller sketch follows below.
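Putting the pieces together, a minimal sketch; the agentname/sourcename/sinkname identifiers follow the answer above, and the directories are hypothetical:

# fileHeader=true makes the source attach the absolute source path in a
# header named "file"; the HDFS sink can interpolate it with %{file}
agentname.sources.sourcename.type = spooldir
agentname.sources.sourcename.spoolDir = /usr/spool
agentname.sources.sourcename.fileHeader = true
agentname.sinks.sinkname.type = hdfs
agentname.sinks.sinkname.hdfs.path = /tmp
agentname.sinks.sinkname.hdfs.filePrefix = %{file}

Newer Flume releases also offer basenameHeader = true on the spooling directory source, whose %{basename} header carries just the file name rather than the full path. As for .tmp: that suffix only marks a file the sink still has open, and it disappears once the sink rolls and closes the file.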

Fuse File System- general input/output error while accessing office files

I have written a FUSE mirror file system using FUSE-JNA, which mirrors a local directory.
This mirror file system lets me open all types of files correctly with no issue, but it does not open office files, e.g. .doc, .xls, etc., and gives me the below error when opening any office file.
Note:
I thought it was a LibreOffice issue, so I removed it and installed OpenOffice, but I get the same issue.
Secondly, the error only pops up when I try to access an office file from my MirrorFileSystem. Office files open correctly when accessed normally via Ubuntu's default file system.
So something is wrong with my file system.
Finally (I don't know whether it is related to the question or not), in my mirror file system, when I right-click on a file > Properties > Permissions, all the fields are shown disabled, as below.
This is my getattr() method:
public int getattr(final String path, final StatWrapper stat)
{
    ....
    if (f.isFile())
    {
        // Regular file, readable/writable/executable by everyone
        stat.setMode(NodeType.FILE, true, true, true, true, true, true, true, true, true);
        stat.size(f.length());
        stat.atime(f.lastModified() / 1000L); // seconds since epoch
        stat.mtime(0);
        stat.nlink(1);
        stat.uid(0); // owner: root
        stat.gid(0); // group: root
        stat.blocks((int) ((f.length() + 511L) / 512L));
        return 0;
    }
    ...
}
Please guide me on how to fix this general input/output error with office files.
Office files are not special. There is some other problem with your filesystem implementation, and you need to do more debugging work to find out precisely what the trigger and the cause are. It's very unlikely that the trigger truly is "the file is an office file", unless your filesystem code operates differently based on the type of file it's dealing with (in which case you should look there). As a first debugging step, you could compare the sha1sum and stat output of the files from the FUSE filesystem and from the root filesystem to see if they match; if they don't, adjust the filesystem code so that they do. You could also enable logging on your filesystem class and check whether it is returning an I/O error code anywhere. The error message "general input/output error" makes it sound like that is the case.
As for the reason the permissions fields are disabled, that's because the file is owned by root, and you are not root so you can't change the permissions. The reason the file is owned by root is because you set stat.uid(0); and stat.gid(0); in getattr. UID 0 and GID 0 are for the root user and root group respectively. Fuse-JNA already puts the current UID and GID as default stat attributes in getattr, so if you want to use these then just don't call stat.uid(0); or stat.gid(0);.
Thanks for the answer.
I searched the web, and many sites point to file locking as the reason, e.g. https://forum.openoffice.org/en/forum/viewtopic.php?f=10&t=2020 etc.
So in FUSE I implemented the file lock function and simply return 0 (sketched below).
My problem is solved; now I can open all types of office files.
But I do not know whether this is the perfect solution.
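For reference, the lock override described above might look roughly like this; the exact callback signature is an assumption here, so verify it against the FuseFilesystem base class in your Fuse-JNA version:

// Signature assumed from Fuse-JNA's FuseFilesystem; verify locally.
@Override
public int lock(final String path, final FileInfoWrapper info,
        final FlockCommand command, final FlockWrapper flock)
{
    // Report success for every lock request. Office suites probe file
    // locking when opening documents, so returning 0 satisfies them,
    // but note that no actual lock is enforced between writers.
    return 0;
}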

How to call an input file which is already in the package

In my Hadoop MapReduce application I have one input file. I want the input file to be picked up automatically when I execute the jar of my application. To do this I wrote one class to specify the input, the output, and the file itself, but at the point where I call the file I need to specify the file path. For that I used this code:
QueriesTest.class.getResourceAsStream("/src/main/resources/test")
but it is not working (it cannot read the input file from the generated jar), so I used this one:
URL url = this.getClass().getResource("/src/main/resources/test")
Here I am getting a problem with the URL. So please help me out. I am using Hadoop 0.21.
I'm not sure what you want to tell us with your resource loading, but the usual way to add an input file is this:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

Configuration conf = new Configuration();
Job job = new Job(conf);
Path in = new Path("YOUR_PATH_IN_HDFS");
FileInputFormat.addInputPath(job, in);
job.setInputFormatClass(TextInputFormat.class); // could be a SequenceFile also
// set mapper, reducer, output path, etc.
job.waitForCompletion(true);
Make sure your file resides in HDFS then.
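On the resource-loading part of the question: a minimal sketch, assuming the file was placed under src/main/resources/test, which the build packages at the root of the jar as /test (the src/main/resources prefix is not part of the path inside the jar). The HDFS destination path here is hypothetical. The idea is to copy the bundled resource into HDFS first, then point the job at that path:

// Continues from the conf/job above; needs java.io.InputStream,
// java.io.OutputStream, and org.apache.hadoop.fs.FileSystem imports.
InputStream in = QueriesTest.class.getResourceAsStream("/test"); // jar-root path, not /src/main/resources/test
FileSystem fs = FileSystem.get(conf);
Path input = new Path("/tmp/test"); // hypothetical HDFS location
OutputStream out = fs.create(input);
IOUtils.copyBytes(in, out, conf, true); // org.apache.hadoop.io.IOUtils; closes both streams
FileInputFormat.addInputPath(job, input);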

NLog File splitting

I'm using NLog to log to file. Is there a way to configure it to create a new log file when the current one reaches a certain threshold (eg ~50mb)? Can it be done from the configuration file or code?
Yes, via the File target's archive settings:
fileName="${basedir}/logs/logfile.txt"
archiveFileName="${basedir}/archives/log.{#####}.txt"
archiveAboveSize="5242880"
archiveNumbering="Sequence"
concurrentWrites="true"
See http://nlog-project.org/doc/2.0/sl2/html/P_NLog_Targets_FileTarget_ArchiveAboveSize.htm. Note that archiveAboveSize is in bytes: 5242880 is 5 MB, so use 52428800 for the ~50 MB threshold in the question.
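In the configuration file, those attributes belong on a File target. A minimal complete nlog.config might look like this; the target and rule names are illustrative, with archiveAboveSize set to 52428800 for the ~50 MB threshold from the question:

<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <targets>
    <!-- Starts a new numbered archive file once the log passes ~50 MB -->
    <target name="file" xsi:type="File"
            fileName="${basedir}/logs/logfile.txt"
            archiveFileName="${basedir}/archives/log.{#####}.txt"
            archiveAboveSize="52428800"
            archiveNumbering="Sequence"
            concurrentWrites="true" />
  </targets>
  <rules>
    <logger name="*" minlevel="Debug" writeTo="file" />
  </rules>
</nlog>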
