I have a script component that splits my data down two file streams into two flat file destinations. Unfortunately, even if there are no records for one of the streams, the destination file is still created. How can this be prevented?
Well, this is how SSIS behaves, i.e. it creates destination files even if there is no data. You would need a workaround to get around this:
Insert a 'Row Count' transformation between the script component and your flat file destination and assign a variable to it, say @intNumberOfRows.
Connect the Data Flow Task to a 'File System Task' that deletes the empty file, using a precedence constraint whose expression checks that @intNumberOfRows is 0 (for example, @[User::intNumberOfRows] == 0).
I have a Foreach Loop for Excel files. The issue is that I need to truncate the table and extract new rows only if there is at least one file. If there isn't, the process should continue without the Foreach Loop.
I configured the control flow and data flow, and it works if there are files, but if there are none I get an error.
Control Flow
I want to handle two situations: 1) when any file exists in the directory, truncate the table and run the data flow; 2) if the directory is empty (0 .xlsx files), skip the whole process and continue with the next tasks.
CALL apoc.import.csv(
[{fileName: 'file:/persons.csv', labels: ['Person']}],
[{fileName: 'file:/knows.csv', type: 'KNOWS'}],
{delimiter: '|', arrayDelimiter: ',', stringIds: false}
)
For this example, internally, does the import use MERGE or CREATE to add nodes, relationships and properties? I tested it, and it seems to use CREATE, adding new rows even for an ID that already exists. Is there a way to control this? Also, when should apoc.load be used versus apoc.import? It seems apoc.load is a lot more flexible, since users can choose which Cypher commands to run for their specific purpose. Right?
From the source of CsvEntityLoader (which seems to be doing the work under the covers), nodes are blindly created rather than being merged.
While there's an ignoreDuplicateNodes configuration property you can set, it just ignores IDs duplicated within the incoming CSV (i.e. it's not de-duplicating the incoming records against your existing graph). You could protect yourself from creating duplicate nodes by creating an appropriate unique constraint on any uniquely-identifying properties, which would at least prevent you from accidentally running the same import twice.
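For example, a uniqueness constraint along these lines (a minimal sketch, assuming Person nodes carry a unique id property and using the 4.4+/5.x constraint syntax; the constraint name is made up) would make an accidental second import fail rather than silently create duplicates:

// Hypothetical constraint name and property key; adjust to your data model.
CREATE CONSTRAINT person_id_unique IF NOT EXISTS
FOR (p:Person) REQUIRE p.id IS UNIQUE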
Personally I'd only use apoc.import.csv to do a one-off bulk load of data into a fresh graph (or to load a dump from another graph that was exported as a CSV by something like apoc.export.csv.*). And even then, you've got the batch import tool that'll do that job with higher performance for large datasets.
I tend to use either the built-in LOAD CSV command or apoc.load.csv for most things, as you can control exactly what you do with each record coming in from the file (such as performing a MERGE rather than a CREATE).
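As a rough sketch of that approach, assuming a persons.csv with id and name columns and the question's | delimiter (the column names are assumptions), a LOAD CSV import that merges on the ID could look like this:

LOAD CSV WITH HEADERS FROM 'file:///persons.csv' AS row FIELDTERMINATOR '|'
// Reuse an existing node with this ID instead of creating a duplicate.
MERGE (p:Person {id: row.id})
// Set or update the remaining properties either way.
SET p.name = row.name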
As indicated by Pablissimo's answer, the ignoreDuplicateNodes config option (when explicitly set to true) does not actually check for duplicates in the DB - it just checks within the file. A request to address this gap was brought up before, but nothing has been done yet to address it. So, if this is a concern for your use case, then you should not use apoc.import.csv.
The rest of this answer applies iff your files never specify nodes that already exist in your DB.
If your node CSV file follows the neo4j-admin import command's import file header format and has a header that specifies the :ID field for the column containing the node's unique ID, then the apoc.import.csv procedure should, by default, fail when it encounters duplicate node IDs (within the same file). That is because the procedure's ignoreDuplicateNodes config value defaults to false (you can specify true to skip duplicate IDs instead of failing).
However, since your node imports are not failing but are generating duplicate nodes, that implies your node CSV file does not specify the :ID field appropriately (see the sample header below). To fix this, you need to add the :ID field and call the procedure with the config option ignoreDuplicateNodes:true. Or, you can modify those CSV files somehow to remove the duplicate rows.
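For reference, a minimal node CSV for that header format might look like the following, using the question's | delimiter (the id and name column names here are purely illustrative; the important part is the :ID marker on the unique-ID column):

id:ID|name
1|Alice
2|Bob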
In an Informatica mapping design there must be a target table, but in my design I only use Informatica to call stored procedures, and after they are called all the work is done, so I don't need a target table to be inserted or updated.
I used a non-existent table as the target table, and one dummy field as the input port (because there must be at least one input port!), then unchecked all of the options (insert, update, delete) in the session configuration, so that Informatica would not generate DML SQL statements, avoiding "no table" errors.
But then Informatica treated the input row as a rejected row and tried to write it into a bad file. And because I had unchecked the insert option, the session log showed an error saying the row couldn't be inserted into the bad file!
Strangely, this error never showed up in the Monitor, and all sessions ran successfully! It only appeared in Informatica's metadata tables.
Is there a better way to avoid this problem, even though it has no effect on my result? Is it possible to use a non-existent table and do nothing with it at all (not even reject the input rows)?
Use a Filter transformation just before the target and set the filter condition to FALSE.
No rows will go to the target
I ran into this same issue when I wanted to just execute a stored procedure and nothing else.
I solved this by creating a dummy source object that had one port and a dummy target with one port of the same datatype. In the source qualifier I added a SQL statement select 1 from dual (since it's Oracle).
I then added a filter object that was set to false. Then I connected the single port from the source/qualifier through the filter and finally to the target.
When the mapping runs, the Source Qualifier returns one row with a single value; this passes through to the filter, but nothing comes out of the filter because its condition is FALSE. The mapping is always valid and successful because all ports are connected, and nothing makes it to the "dummy" target, so there are no bad-file logs, failures, etc.
Let me know if you need any clarification and I can update this answer.
No, you always need a target for the mapping to be valid. But I would rather work with a flat file target instead of a database table; you'll have much less work to do.
If you're on Linux / Unix, you can even route the file to /dev/null (use folder:/dev/, file:null) so the file is not actually written to the filesystem.
And using one dummy port is the right way. As you have said, you need at least one port, even if you don't really use it.
As odd as this may sound (on Unix systems): neither the source nor the target needs to exist.
Source (flat file): /dev/null, column DUMMY
Target (flat file): /dev/null, column DUMMY
And you don't need to use any databases for the session to succeed, nor use any filters. It runs.
I am looking for some help on how to create a trailer record in a flat file using SSIS. I have created an SSIS package that writes a custom header and loads the other records from the database into the flat file; it is a fixed-width flat file. Now, at the end of the file, I want to add a trailer record consisting of some static text and the record count. I tried searching Google but could not find a good example. Any help is much appreciated.
Use a Script Task. Pass in the file path of the fixed-width flat file as an input variable. Then, within the Script Task, use .NET code to append the data that you need. I have written a post on it - https://karbimantras.wordpress.com/2016/08/12/adding-record-count-to-flat-file/
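As a minimal C# sketch of such a Script Task (not the exact code from the linked post), assuming the package exposes a User::FilePath variable holding the file path and a User::RecordCount variable holding the row count, and that the trailer text and the padding width of 10 are placeholders to adapt to your fixed-width layout:

// Main() of the Script Task, with User::FilePath and User::RecordCount
// listed as ReadOnlyVariables in the Script Task editor.
public void Main()
{
    // Assumed variable names; adjust to match your package.
    string filePath = Dts.Variables["User::FilePath"].Value.ToString();
    int recordCount = (int)Dts.Variables["User::RecordCount"].Value;

    // Build a fixed-width trailer: static text plus a zero-padded record count.
    string trailer = "TRAILER" + recordCount.ToString().PadLeft(10, '0');

    // Append the trailer as the last line of the flat file.
    System.IO.File.AppendAllText(filePath, System.Environment.NewLine + trailer);

    Dts.TaskResult = (int)ScriptResults.Success;
}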
I need some help with the following issue:
Inside a foreach loop, I have a Data Flow Task that reads each file from the collection folder. If it fails while processing a certain file, that file is copied to an error folder (using a file system task called "Copy Work to Error").
I would like to set up a Send Email Task that warns me if there were any files sent to the error folder during the package execution. I could easily add this task after the "Copy Work to Error" task, but if there are many files that fail the Data Flow Task, my inbox would get filled.
Instead, I would like the Send Mail Task to run only once (after the foreach loop completes), and only if the "Copy Work to Error" task was executed at least once. Is there any way I could achieve that?
Thanks,
Ovidiu
Here's one way that I can think of:
Create an integer variable, @Total, outside the ForEach container and set it to 0.
Create an integer variable, @PerIteration, inside the ForEach container.
Add a Script Task as an event handler to the File System Task. This task should increment @Total by @PerIteration (see the sketch after this list).
Add your Send Mail task after the ForEach container. In the precedence constraint, set the evaluation operation to Expression and specify the condition @Total > 0 (i.e. @[User::Total] > 0). This should ensure that your task is triggered only if the File System Task was executed in the loop at least once.
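A minimal C# sketch of that event-handler Script Task, assuming the variables are named User::Total and User::PerIteration as above and that Total is listed as a ReadWriteVariable (and PerIteration as a ReadOnlyVariable) in the Script Task editor:

public void Main()
{
    // Assumed variable names from the steps above; adjust if yours differ.
    int total = (int)Dts.Variables["User::Total"].Value;
    int perIteration = (int)Dts.Variables["User::PerIteration"].Value;

    // Bump the running total each time the error-handling branch fires.
    Dts.Variables["User::Total"].Value = total + perIteration;

    Dts.TaskResult = (int)ScriptResults.Success;
}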
You could achieve this using just a Boolean variable, say @IsError, created outside the scope of the Foreach Loop with a default value of False. You can set it to True immediately after the "Copy Work to Error" task succeeds, using an Expression Task (SSIS 2012 and later) or an Execute SQL Task. Finally, your Send Mail task would be connected to the Foreach Loop with the precedence constraint expression set to @IsError.
When the error happens, create a record in a table with the information you would like to include in the email - e.g.
1. The file that failed, with its full path
2. The specific error
3. Date/time
Then at the end of the package, send a consolidated email. This way, you have a central location to turn to in case you want to revisit the issue, or if the email is lost/not delivered.
If you need implementation help, please let me know.