NiFi: How to concatenate a flowfile to already existing tables in a directory?

This is a question about NiFi.
I built a NiFi pipeline that converts flowfiles from XML format to CSV format.
Now I would like to concatenate (union) each converted CSV flowfile onto an existing table, matched by filename (the filename is also the table name).
Simply put, my processor flow is as follows:
1. GetFile (from a particular directory) -> 2. Convert XML to CSV -> 3. Update the flowfile with the table name -> 4. PutFile (to a different directory)
But at the end of the flow, the PutFile processor throws an error saying "file with the same name already exists".
I have no idea how a flowfile can be appended to an existing CSV table.
Any advice, tips, or ideas are appreciated.
Thank you in advance.

There is no built-in support for appending to a file; however, you could use ExecuteGroovyScript to do it:
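def ff = session.get()
if (!ff) return
ff.read().withStream { s ->
    String path = "./out_folder/${ff.filename}"
    // sync on the interned path so that threads writing the same file share one lock object
    // (a freshly built String is a new object each time and would not lock anything)
    synchronized (path.intern()) {
        new File(path).append(s)
    }
}
REL_SUCCESS << ff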
If you need to work with text (reader) content rather than byte (stream) content,
the following example shows how to skip the header line of the flowfile when the destination file already exists:
def ff = session.get()
if (!ff) return
ff.read().withReader("UTF-8") { r ->
    String path = "./.data/${ff.filename}"
    // sync on the interned path so that threads writing the same file share one lock object
    synchronized (path.intern()) {
        def fout = new File(path)
        if (fout.exists()) r.readLine() // skip 1 line (header) only if the output file already exists
        fout.append(r) // append the rest of the reader content to the file
    }
}
REL_SUCCESS << ff
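The header-skipping append logic above can also be tried outside NiFi. The following Python sketch is only an illustration and not part of the original answer: `append_csv`, the sample rows, and the temporary directory are hypothetical, and it does no locking, so it assumes a single writer per file.

```python
import os
import tempfile


def append_csv(dest_path, csv_text):
    """Append csv_text to dest_path, dropping its header line if dest_path already exists."""
    lines = csv_text.splitlines(keepends=True)
    if os.path.exists(dest_path):
        lines = lines[1:]  # destination already has a header, so skip ours
    # newline="" keeps the line endings exactly as they appear in csv_text
    with open(dest_path, "a", encoding="utf-8", newline="") as out:
        out.writelines(lines)


# Hypothetical demo: two "flowfiles" destined for the same table file.
demo_path = os.path.join(tempfile.mkdtemp(), "table.csv")
append_csv(demo_path, "id,name\n1,alpha\n")
append_csv(demo_path, "id,name\n2,beta\n")

with open(demo_path, encoding="utf-8") as f:
    combined = f.read()
print(combined)
```

The second call finds the file already present, so only its data row is appended and the table keeps a single header.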

Related

How to change new File method in Groovy?

How do I replace the new File method with a secure one? Is it possible to create a python script and connect it?
Part of the code where I have a problem:
def templateName = new File(file: "${template}").normalize.name.replace(".html", "").replace(".yaml", "")
But when I run my pipeline, I get the error
java.lang.SecurityException: Unable to find constructor: new java.io.File java.util.LinkedHashMap
This method is prohibited and is blacklisted. How do I replace it and with what?
If you're reading the contents of the file, you can replace that "new File" with "readFile".
See https://www.jenkins.io/doc/pipeline/steps/workflow-basic-steps/#readfile-read-file-from-workspace
readFile: Read file from workspace
Reads a file from a relative path (with root in current directory, usually workspace) and returns its content as a plain string.
file : String
Relative (/-separated) path to file within a workspace to read.
encoding : String (optional)
The encoding to use when reading the file. If left blank, the platform default encoding will be used. Binary files can be read into a Base64-encoded string by specifying "Base64" as the encoding.
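The snippet in the question never reads the file; it only derives a name from the path, so plain string and path operations are enough. Since the question asks about Python, here is a minimal sketch of that derivation. The function name `template_name` and the sample path are hypothetical, and how such a script would be called from the pipeline is not covered here.

```python
import posixpath


def template_name(template_path):
    """Return the file name of template_path with '.html' and '.yaml' removed."""
    # normpath collapses segments such as './' before the base name is taken
    name = posixpath.basename(posixpath.normpath(template_path))
    # mirror the two replace() calls from the question
    return name.replace(".html", "").replace(".yaml", "")


example = template_name("templates/./mail/welcome.html")
print(example)
```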

Store the content of a txt file in a variable in QlikView

I have a path stored in a txt file in QlikView 11. What is the right syntax to store that path in a variable?
txt file found in: C:\Projects\X\Shared-COntent\filepath.txt
Content of the txt file is a network path
Thank you!
You can load the txt file as a normal table file and then peek the loaded value. Something like this:
NetowrkPath:
LOAD
#1 as NetworkPath
FROM
[C:\Projects\X\Shared-COntent\filepath.txt] (txt, utf8, explicit labels, delimiter is '\t', msq)
;
let vNetworkPath = peek('NetworkPath'); // <-- this is the variable that will contain the network path
Drop Table NetowrkPath;

python xlrd: convert xls to csv using tempfiles. Tempfile is empty

I am downloading an xls file from the internet. It is in .xls format but I need 'Sheet1' to be in csv format. I use xlrd to make the conversion but seem to have run into an issue where the file I write to is empty?
import urllib2
import tempfile
import csv
import xlrd
url_2_fetch = ____
u = urllib2.urlopen(url_2_fetch)
wb = xlrd.open_workbook(file_contents=u.read())
sh = wb.sheet_by_name('Sheet1')
csv_temp_file = tempfile.TemporaryFile()
with open('csv_temp_file', 'wb') as f:
    writer = csv.writer(f)
    for rownum in xrange(sh.nrows):
        writer.writerow(sh.row_values(rownum))
That seemed to have worked. But now I want to inspect the values by doing the following:
with open('csv_temp_file', 'rb') as z:
    reader = csv.reader(z)
    for row in reader:
        print row
But I get nothing:
>>> with open('csv_temp_file', 'rb') as z:
...     reader = csv.reader(z)
...     for row in reader:
...         print row
...
>>>
I am using a tempfile because I want to do more parsing of the content and then use SQLAlchemy to store the contents of the csv post more parsing to a mySQL database.
I appreciate the help. Thank you.
This is completely wrong:
csv_temp_file = tempfile.TemporaryFile()
with open('csv_temp_file', 'wb') as f:
    writer = csv.writer(f)
The tempfile.TemporaryFile() call returns "a file-like object that can be used as a temporary storage area. The file will be destroyed as soon as it is closed (including an implicit close when the object is garbage collected)."
So your variable csv_temp_file contains a file object, already open, that you can read and write to, and will be deleted as soon as you call .close() on it, overwrite the variable, or cleanly exit the program.
So far so good. But then you proceed to open another file with open('csv_temp_file', 'wb') that is not a temporary file, is created in the script's current directory with the fixed name 'csv_temp_file', is overwritten every time this script is run, can cause security holes, strange bugs and race conditions, and is not related to the variable csv_temp_file in any way.
You should trash the with open statement and use the csv_temp_file variable you already have. You can try to .seek(0) on it before using it again with the csv reader, it should work. Call .close() on it when you are done with it and the temporary file will be deleted.
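A minimal sketch of that advice follows. It is written for Python 3, whereas the question's code is Python 2, and it uses hypothetical sample rows in place of the xlrd sheet so that it runs without the download. The helper name `rows_via_tempfile` is also an assumption.

```python
import csv
import tempfile


def rows_via_tempfile(rows):
    """Write rows as CSV to a temporary file, rewind it, and read them back."""
    # Text mode with newline="" is what the csv module expects on Python 3.
    with tempfile.TemporaryFile(mode="w+", newline="") as tmp:
        writer = csv.writer(tmp)
        for row in rows:
            writer.writerow(row)
        tmp.seek(0)  # rewind: after writing, the file position is at the end
        return list(csv.reader(tmp))
    # leaving the with block closes the file, which also deletes it


sample_rows = [["id", "name"], [1, "alpha"], [2, "beta"]]  # stand-in for sh.row_values(...)
result = rows_via_tempfile(sample_rows)
print(result)
```

The same file object is used for both writing and reading, so no second open() call is involved. Values come back as strings because CSV carries no type information.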

Save current document as .html with same name and path

I'm working on a script for FoldingText which will convert a FoldingText outline (basically a Markdown Text file) into a Remark presentation (an HTML script which makes slideshows from Markdown files). The script works, but I'd like to make the following improvement:
Instead of asking for the name and location to save the HTML file, I'd like to grab the name of the current document and save it as an HTML file (with the same name) in the current folder. The script should fail gracefully if there is already a document with that name (offering to either overwrite the document or save as a new document with a different name).
The script I'm using for writing to the file was from these forums. The first part is:
on write_to_file(this_data, target_file, append_data) -- (string, file path as string, boolean)
    try
        set the target_file to the target_file as text
        set the open_target_file to ¬
            open for access file target_file with write permission
        if append_data is false then ¬
            set eof of the open_target_file to 0
        write this_data to the open_target_file starting at eof as «class utf8»
        close access the open_target_file
        return true
    on error
        try
            close access file target_file
        end try
        return false
    end try
end write_to_file
And the second part is:
set theFile to choose file name with prompt "Set file name and location:"
my write_to_file(finalText, theFile, true)
Thanks.
FoldingText should have some way of retrieving the path of the document. I have not found a free download of the application, so I have not been able to check this myself, but you should be able to find it by viewing the application's dictionary.
My guess is that there is a property like 'path of' or 'file of' for the FoldingText document.
You will probably end up with something like this:
-- 'file of' is a guess; check FoldingText's dictionary for the actual property
set theDocPath to file of front document of application "FoldingText"
tell application "Finder"
    set theContainer to (container of theDocPath) as text
end tell
set newPath to theContainer & "export.html"
tell application "Finder"
    repeat while (exists file newPath)
        set newPath to (choose file name with prompt "Set file name and location:") as text
    end repeat
end tell
my write_to_file(finalText, newPath, true)

How to open Excel file written with incorrect character encoding in VBA

I read an Excel 2003 file with a text editor to see some markup language.
When I open the file in Excel it displays incorrect characters. On inspection of the file I see that the encoding is Windows 1252 or some such. If I manually replace this with UTF-8, my file opens fine. Ok, so far so good, I can correct the thing manually.
Now the trick is that this file is generated automatically, and I need to process it automatically (no human interaction) with limited tools on my desktop (no perl or other scripting language).
Is there any simple way to open this XL file in VBA with the correct encoding (and ignore the encoding specified in the file)?
Note, Workbook.ReloadAs does not function for me, it bails out on error (and requires manual action as the file is already open).
Or is the only way to correct the file to go through some hoops? Either: text in, check line for encoding string, replace if required, write each line to new file...; or export to csv, then import from csv again with specific encoding, save as xls?
Any hints appreciated.
EDIT:
ADODB did not work for me (XL says user defined type, not defined).
I solved my problem with a workaround:
name2 = Replace(name, ".xls", ".txt")
Set wb = Workbooks.Open(name, True, True) ' open read-only
Set ws = wb.Worksheets(1)
ws.SaveAs FileName:=name2, FileFormat:=xlCSV
wb.Close False ' close workbook without saving changes
Set wb = Nothing ' free memory
Workbooks.OpenText FileName:=name2, _
    Origin:=65001, _
    DataType:=xlDelimited, _
    Comma:=True
Well, I think you can do it from another workbook. Add a reference to Microsoft ActiveX Data Objects, then add this sub:
Sub Encode(ByVal sPath$, Optional SetChar$ = "UTF-8")
    Dim stream As ADODB.stream
    Set stream = New ADODB.stream
    With stream
        .Open
        .LoadFromFile sPath ' Loads a File
        .Charset = SetChar ' sets stream encoding (UTF-8)
        .SaveToFile sPath, adSaveCreateOverWrite
        .Close
    End With
    Set stream = Nothing
    Workbooks.Open sPath
End Sub
Then call this sub with the path to the file that has the wrong encoding.

Resources