Writing a source and a custom sink using flume-ng

I am new to Flume NG. I want my source to send some unique XML files to the channel one by one. The channel will validate the XML files and send the validity (either true or false) along with the XML file to the custom sink. This sink will write the valid files and the invalid files to different directories in HDFS. I am not sure which source to use. Please help.

None of the stock sources will fit your use case. The SpoolingDirectorySource is line-oriented, so it will get confused by XML files and they will not arrive in one piece.
I suggest you write a custom source for your application.
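Below is a minimal sketch of what such a source could look like, assuming Flume 1.7+ and an inputDir property naming a local directory to poll; the class name, property name, and header name are illustrative, not part of any existing Flume source. Each XML file is read whole and emitted as a single event, so it cannot be split the way a line-oriented source would split it.

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.PollableSource;
    import org.apache.flume.conf.Configurable;
    import org.apache.flume.event.EventBuilder;
    import org.apache.flume.source.AbstractSource;

    public class WholeXmlFileSource extends AbstractSource
        implements Configurable, PollableSource {

      private File inputDir;

      @Override
      public void configure(Context context) {
        // Directory to poll for new XML files (set via agent config: ...inputDir = /path)
        inputDir = new File(context.getString("inputDir", "/var/flume/xml-in"));
      }

      @Override
      public Status process() throws EventDeliveryException {
        File[] files = inputDir.listFiles((dir, name) -> name.endsWith(".xml"));
        if (files == null || files.length == 0) {
          return Status.BACKOFF;              // nothing new, slow down polling
        }
        for (File f : files) {
          try {
            byte[] body = Files.readAllBytes(f.toPath()); // whole file = one event
            Map<String, String> headers = new HashMap<>();
            headers.put("filename", f.getName());
            Event event = EventBuilder.withBody(body, headers);
            getChannelProcessor().processEvent(event);
            // Rename so the same file is not picked up on the next poll.
            f.renameTo(new File(f.getParent(), f.getName() + ".COMPLETED"));
          } catch (IOException e) {
            throw new EventDeliveryException("Failed to read " + f, e);
          }
        }
        return Status.READY;
      }

      // Required by PollableSource since Flume 1.7; tune to taste.
      @Override
      public long getBackOffSleepIncrement() { return 1000L; }

      @Override
      public long getMaxBackOffSleepInterval() { return 5000L; }
    }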

Related

Avoid reading the same file multiple times using Telegraf and the file input plugin

I need to read CSV files inside a folder. New CSV files are generated every time a user submits a form. I'm using the "file" input plugin to read the data and send it to InfluxDB. These steps are working fine.
The problem is that the same file is read multiple times, once every data collection interval. I was thinking of a solution where I could move a file to a different folder once it has been read, but I couldn't do that with Telegraf's "exec" output plugin.
PS: I can't change the way the CSV files are generated.
Any ideas on how to avoid reading the same CSV file multiple times?
As you discovered, the file input plugin reads entire files at each collection interval.
My suggestion is to use the directory monitor input plugin instead. It reads files in a directory, monitors the directory for new files, and parses only the ones that have not been picked up yet. That plugin also has some configuration settings that make it easier to control when new files are read.
Another option is the tail input plugin, which tails a file and only reads new updates to that file as they come in. However, I think the directory monitor is more likely what you are after for your scenario.
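For illustration, a minimal directory_monitor configuration might look like the following (available in Telegraf 1.18+; the directory paths are placeholders). Files are parsed once and then moved to finished_directory, which also covers the "move after reading" part of your question:

    [[inputs.directory_monitor]]
      ## Directory to watch for new CSV files (placeholder path)
      directory = "/data/csv_incoming"
      ## Parsed files are moved here, so they are never read twice
      finished_directory = "/data/csv_done"
      data_format = "csv"
      csv_header_row_count = 1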
Thanks!

Read/write file properties with a PropertyHandler shell extension

I'm trying to create a PropertyHandler shell extension.
What's the best way to embed properties like Title, Author, ... so the same file can be used on multiple computers or devices?
Is StgCreateStorageEx the way, or are there other ways to do it?
I ask because StgCreateStorageEx deals with NTFS files only, and I'm not sure the file keeps these properties with it if I open it on another device with the same PropertyHandler.
Is there any way to save properties inside the file itself?
The StgCreateStorageEx function creates a new storage object using the IStorage interface. This allows storing multiple data objects within a single binary file; see for example https://en.wikipedia.org/wiki/COM_Structured_Storage. So, technically, you can save almost anything in this file, including embedded properties.
I don't think this is limited to NTFS: the old Microsoft Office .doc format (and many other Microsoft products) uses this storage format, and it also works with FAT32.
Whether you want to use this binary file format is a completely different question. As you did not provide any information about the content and format of your file, I cannot recommend anything. One alternative would be to store the content of your file in an XML file. Properties like Title and Author could then be added easily.
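To illustrate that alternative, here is a small, self-contained Java sketch that writes Title and Author into the file itself as XML elements, so the properties travel with the file to any machine; the element names are made up for the example and are not any standard schema:

    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    public class XmlEmbeddedProperties {
      public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element root = doc.createElement("document");
        doc.appendChild(root);

        // Properties live inside the file, independent of the file system.
        Element props = doc.createElement("properties");
        root.appendChild(props);
        Element title = doc.createElement("title");
        title.setTextContent("My Title");
        props.appendChild(title);
        Element author = doc.createElement("author");
        author.setTextContent("Jane Doe");
        props.appendChild(author);

        // The actual payload of the file would go in its own element.
        root.appendChild(doc.createElement("body"));

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.transform(new DOMSource(doc), new StreamResult(new File("sample.xml")));
      }
    }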

Copy only newly added files from one folder to another, without moving the existing files from the source folder

I am doing file integration using Mirth. There is a piece of software that generates HL7 files. I want to read data from those files without moving them to another destination. The next time I read, it should ignore the files that have already been read (i.e., just read the data from files generated after the last read).
I got this working, but only by modifying the original filename; if I don't modify the filename, duplicate data is read.
Is there any solution to this problem, so we can read data only from newly generated files? I am using Mirth 3.5.1 and HL7 v2 messages.
Thanks in advance.
Thanks @daveloyall, I am posting your comment as an answer here.
When you rename a file at the time you process it, for example to add a .DONE suffix to the filename, you are adding information that can be used later: the part of the channel that reads files can be configured to skip files that have the .DONE suffix. You also add information if you move the files, or if you store the filenames in some database table. I don't know if Mirth has an internal feature that tracks which HL7 messages it has already processed, but if such a feature exists, the keyword 'deduplication' might be associated with it.
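A generic sketch of that rename-on-completion pattern (plain Java, not Mirth-specific; the suffix and method names are illustrative):

    import java.io.File;

    public class DoneSuffixPoller {
      private static final String DONE_SUFFIX = ".DONE";

      // One polling pass: process every file not yet marked, then mark it.
      public static void pollOnce(File dir) {
        File[] pending = dir.listFiles((d, name) -> !name.endsWith(DONE_SUFFIX));
        if (pending == null) {
          return;                              // directory missing or unreadable
        }
        for (File f : pending) {
          processHl7File(f);                   // hand the file to your pipeline
          // Renaming marks the file as read, so the next poll skips it.
          f.renameTo(new File(f.getParent(), f.getName() + DONE_SUFFIX));
        }
      }

      private static void processHl7File(File f) {
        // Placeholder: parse and forward the HL7 message here.
      }
    }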

Creating a MERGED file from a ditamap and then filtering it

I'm a newbie with DITA OT, and I am trying to get just the merged file from my ditamap and then apply XSL to it; I don't need any other output.
I was thinking that I could use the part of the DITA OT source code that does the merging, or make a sort of "cut-down" plugin that produces just the merged XML file, processes the XSL against it, and then gives back the filtered XML file.
As I understand it, there is a build.xml that does this job using dost.jar, but I cannot figure out what exactly I need to use from it. Or is it possible to create just the merged file separately, without starting a full transformation, so I can use it for my later needs?
I would appreciate any help.
I created a special DITA OT plugin which can be integrated into the DITA OT and used to create just the merged document:
https://github.com/oxygenxml/dita-merged/tree/master/com.oxygenxml.merged
Alternatively, if you publish to PDF and set the parameter clean.temp to no, then after the transformation is over you should find, in the transformation's temporary files directory, a file called mapFileName_MERGED.xml which has all topic references expanded.
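For example, with the dita command-line tool that ships with DITA-OT 2.x and later (the map name here is a placeholder), that PDF route would be:

    dita --input=my.ditamap --format=pdf --clean.temp=no

The _MERGED.xml file is then left behind in the temporary directory alongside the other intermediate files.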

Download Directory and Contents

Is it possible to persuade the stream result to download an entire directory and its contents? And if so, how? I have no problem getting it to download individual files, but I need to download a series of files that must be in a specific directory structure.
I don't think so.
The stream result lets you download ONE piece of content, with its MIME type, its name, etc.
That makes it unsuitable for a lot of files with different names and content types.
What you can do is:
Render in a JSP the list of files (in anchor tags, for example), each one targeting the Action that will download that single file;
Call multiple Actions via scripting, opening multiple pages (target="_blank"), one for every file you have (dangerous, annoying, almost useless...);
Create a zip with Java on the server side, containing all your files and directories, then output the zip with the stream result.
I think you may consider the third option; a sketch follows.
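A hedged sketch of that third option for Struts 2, zipping a directory in memory and exposing it to the stream result as an InputStream; the action name, field names, and directory path are illustrative assumptions:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.UncheckedIOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.stream.Stream;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    import com.opensymphony.xwork2.ActionSupport;

    public class DownloadDirectoryAction extends ActionSupport {
      private InputStream inputStream;            // read by the stream result
      private final String fileName = "bundle.zip";

      @Override
      public String execute() throws IOException {
        Path root = Paths.get("/data/export");    // directory to download (placeholder)
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer);
             Stream<Path> paths = Files.walk(root)) {
          paths.filter(Files::isRegularFile)
               .forEach(p -> addEntry(zip, root, p));
        }
        inputStream = new ByteArrayInputStream(buffer.toByteArray());
        return SUCCESS;
      }

      private void addEntry(ZipOutputStream zip, Path root, Path file) {
        try {
          // Keep the relative directory structure inside the archive;
          // forward slashes keep the entries portable across platforms.
          String entryName = root.relativize(file).toString().replace('\\', '/');
          zip.putNextEntry(new ZipEntry(entryName));
          Files.copy(file, zip);
          zip.closeEntry();
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }

      public InputStream getInputStream() { return inputStream; }
      public String getFileName() { return fileName; }
    }

In struts.xml the action would use the stream result type with contentType set to application/zip and inputName set to inputStream; for very large directories, stream to a temporary file instead of a byte array.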