How to filter a Flat File Source using a Script Component - SSIS 2012

I have the following scenario:
I have thousands of text files in the format below. The column names are written on separate lines, whereas the row values are delimited by pipe (|).
START-OF-FILE
PROGRAMNAME=getdata
DATEFORMAT=yyyymmdd
#Some Text
#Some Text
#Some Text
#Some Text
#Some Text
START-OF-FIELDS
Field1
Field2
Field3
------
FieldN
END-OF-FIELDS
TIMESTARTED=Tue May 12 16:04:42 JST 2015
START-OF-DATA
Field1Value|Field2value|Field3Value|...|Field N Value
Field1Value|Field2value|Field3Value|...|Field N Value
------|...........|----|-------
END-OF-DATA
DATARECORDS=30747
TIMEFINISHED=Tue May 12 16:11:53 JST 2015
END-OF-FILE
I have a corresponding SQL Server table that I can use as the destination. Since I am new to SSIS, I am having trouble working out how to write the Script Component so that I can filter the source text files and load the data into the SQL Server table.
Thanks in advance!

There are a few ways to do it. If the format of the files is constant, there are some useful properties in the Flat File Connection Manager Editor. For example, you can add a new flat file connection to the connection managers. One such property is "Rows to skip"; for the above file you could set this to 18, so that reading starts at the pipe-delimited column line.
Another property of the flat file connection manager that may be useful: if you open the flat file connection manager and click Columns in the side menu, you can set the column delimiter to the pipe ("|").
But if the format of the file can change, e.g. a variable number of header rows, you can use a Script Task to remove any non-piped rows, e.g. the header and footer.
For example, you can call a method such as File.ReadAllLines, edit or remove the lines as needed, and then save the file.
Info about that method is here:
https://msdn.microsoft.com/en-us/library/s2tte0y1%28v=vs.110%29.aspx
e.g. to remove the last line in a Script Task:
string[] lines = File.ReadAllLines("input.txt");
StringBuilder sb = new StringBuilder();
int count = lines.Length - 1; // all except the last line
for (int i = 0; i < count; i++)
{
    sb.AppendLine(lines[i]);
}
File.WriteAllText("output.txt", sb.ToString());
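Along the same lines, since the advice here is to drop every row that is not pipe-delimited (header and footer alike), a small variation keeps only the lines that contain a "|". This is just a sketch of that idea; the input and output paths are placeholders:
// Keep only the pipe-delimited data rows; header and footer lines are dropped.
// Requires System.IO and System.Text; file paths are placeholders.
string[] allLines = File.ReadAllLines("input.txt");
StringBuilder dataRows = new StringBuilder();
foreach (string line in allLines)
{
    if (line.Contains("|")) // data rows are pipe-delimited
    {
        dataRows.AppendLine(line);
    }
}
File.WriteAllText("output.txt", dataRows.ToString());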

Use the VB.NET script below in your SSIS Script Component (configured as a source):
Imports System
Imports System.Data
Imports System.Math
Imports System.IO
Imports Microsoft.SqlServer.Dts.Runtime
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent

    'Private strSourceDirectory As String
    'Private strSourceFileName As String
    Private strSourceSystem As String
    Private strSourceSubSystem As String
    Private dtBusinessDate As Date

    Public Overrides Sub PreExecute()
        MyBase.PreExecute()
        ' Add your code here for preprocessing or remove if not needed
    End Sub

    Public Overrides Sub PostExecute()
        MyBase.PostExecute()
        ' Add your code here for postprocessing or remove if not needed
        ' You can set read/write variables here, for example:
        Dim strSourceDirectory As String = Me.Variables.GLOBALSourceDirectory.ToString()
        Dim strSourceFileName As String = Me.Variables.GLOBALSourceFileName.ToString()
        'Dim strSourceSystem As String = Me.Variables.GLOBALSourceSystem.ToString()
        'Dim strSourceSubSystem As String = Me.Variables.GLOBALSourceSubSystem.ToString()
        'Dim dtBusinessDate As Date = Me.Variables.GLOBALBusinessDate.Date
    End Sub

    Public Overrides Sub CreateNewOutputRows()
        ' Add rows by calling the AddRow method on the member variable named "<Output Name>Buffer".
        ' For example, call MyOutputBuffer.AddRow() if your output was named "MyOutput".
        Dim sr As System.IO.StreamReader
        Dim strSourceDirectory As String = Me.Variables.GLOBALSourceDirectory.ToString()
        Dim strSourceFileName As String = Me.Variables.GLOBALSourceFileName.ToString()
        'Dim strSourceSystem As String = Me.Variables.GLOBALSourceSystem.ToString()
        'Dim strSourceSubSystem As String = Me.Variables.GLOBALSourceSubSystem.ToString()
        'Dim dtBusinessDate As Date = Me.Variables.GLOBALBusinessDate.Date

        'sr = New System.IO.StreamReader("C:\QRM_SourceFiles\BBG_BONDS_OUTPUT_YYYYMMDD.txt")
        sr = New System.IO.StreamReader(strSourceDirectory & strSourceFileName)

        Dim lineIndex As Integer = 0
        While (Not sr.EndOfStream)
            Dim line As String = sr.ReadLine()
            If (lineIndex <> 0) Then 'skip the first line of the file
                Dim columnArray As String() = line.Split(Convert.ToChar("|"))
                If (columnArray.Length > 1) Then 'only pipe-delimited rows become output rows
                    Output0Buffer.AddRow()
                    Output0Buffer.Col0 = columnArray(0).ToString()
                    Output0Buffer.Col3 = columnArray(3).ToString()
                    Output0Buffer.Col4 = columnArray(4).ToString()
                    Output0Buffer.Col5 = columnArray(5).ToString()
                    Output0Buffer.Col6 = columnArray(6).ToString()
                    Output0Buffer.Col7 = columnArray(7).ToString()
                    Output0Buffer.Col8 = columnArray(8).ToString()
                    Output0Buffer.Col9 = columnArray(9).ToString()
                    Output0Buffer.Col10 = columnArray(10).ToString()
                    Output0Buffer.Col11 = columnArray(11).ToString()
                    Output0Buffer.Col12 = columnArray(12).ToString()
                    Output0Buffer.Col13 = columnArray(13).ToString()
                    Output0Buffer.Col14 = columnArray(14).ToString()
                    Output0Buffer.Col15 = columnArray(15).ToString()
                    Output0Buffer.Col16 = columnArray(16).ToString()
                    Output0Buffer.Col17 = columnArray(17).ToString()
                    Output0Buffer.Col18 = columnArray(18).ToString()
                    Output0Buffer.Col19 = columnArray(19).ToString()
                    Output0Buffer.Col20 = columnArray(20).ToString()
                    Output0Buffer.Col21 = columnArray(21).ToString()
                    Output0Buffer.Col22 = columnArray(22).ToString()
                    Output0Buffer.Col23 = columnArray(23).ToString()
                    Output0Buffer.Col24 = columnArray(24).ToString()
                End If
            End If
            lineIndex = lineIndex + 1
        End While
        sr.Close()
    End Sub
End Class

Related

Match Data In Two Files Then Email Each Person

For simplicity, file1.txt is a log file from which I extract logonIDs into an array. File2.txt contains rows of logonID,emailAddress,other,needless,data
I need to take all of the logonIDs read into my array from file1 and extract their email addresses from file2. Once I have this information, I can then send each person in file1 an email. I can't just use file2.txt, because it contains users who should not receive an email.
I wrote a VBScript that extracts logonIDs from file1.txt into an array and pulls the logonID and email from file2.txt:
inFile1 = "C:\Scripts\testvbs\wscreatestatus.txt"
inFile2 = "C:\Scripts\testvbs\WSBatchCreateBuildsList.txt"
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objInFile1 = objFSO.OpenTextFile(inFile1, ForReading)
Set objInFile2 = objFSO.OpenTextFile(inFile2, ForReading)
'Creates Array of all DomainIDs for successful deployments
Do Until objInFile1.AtEndOfStream
strNextLine = objInFile1.Readline
arrLogons = Split(strNextLine , vbTab)
If arrLogons(0) = "DEPLOYSUCCESS" Then
arrUserIDList = arrUserIDList & arrLogons(5) & vbCrLf
End If
Loop
Do Until objInFile2.AtEndOfStream
strNextLine = objInFile2.Readline
arrAddressList = Split(strNextLine , ",")
arrMailList = arrMailList & arrAddressList(0) & vbTab & arrAddressList(1) & vbCrLf
Loop
What I need to do next is take my list of user IDs, arrUserIDList, and extract their email addresses from arrMailList. With this information I can send each user in file1.txt (wscreatestatus.txt) an email.
Thanks!
From the way you construct your arrMailList, I presume you want the selected LogonIDs and corresponding email addresses output to a new tab-delimited text file.
If that is the case, I recommend using ArrayList objects to store the values in. ArrayLists have an easy-to-use Add method, and for testing whether an item is in an ArrayList there is the Contains method.
In code:
Option Explicit
Const ForReading = 1
Const ForWriting = 2
Const ForAppending = 8

Dim inFile1, inFile2, outFile, objFSO, objInFile, objOutFile
Dim strNextLine, fields, arrUserIDList, arrMailList

inFile1 = "C:\Scripts\testvbs\wscreatestatus.txt"
inFile2 = "C:\Scripts\testvbs\WSBatchCreateBuildsList.txt"
outFile = "C:\Scripts\testvbs\WSEmailDeploySuccess.txt"

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objInFile = objFSO.OpenTextFile(inFile1, ForReading)

'Create an ArrayList of all DomainIDs for successful deployments
Set arrUserIDList = CreateObject( "System.Collections.ArrayList" )
Do Until objInFile.AtEndOfStream
    strNextLine = objInFile.Readline
    fields = Split(strNextLine , vbTab)
    If fields(0) = "DEPLOYSUCCESS" Then
        arrUserIDList.Add fields(5)
    End If
Loop

'close the first input file
objInFile.Close

'Now read the second file and read the logonID's from it
Set objInFile = objFSO.OpenTextFile(inFile2, ForReading)

'Create an ArrayList of all LogonID's, a TAB character and the EmailAddress
Set arrMailList = CreateObject( "System.Collections.ArrayList" )
Do Until objInFile.AtEndOfStream
    strNextLine = objInFile.Readline
    fields = Split(strNextLine , ",")
    If arrUserIDList.Contains(fields(0)) Then
        ' store the values in arrMailList as TAB separated values
        arrMailList.Add fields(0) & vbTab & fields(1)
    End If
Loop

'close the file and destroy the object
objInFile.Close
Set objInFile = Nothing

Set objOutFile = objFSO.OpenTextFile(outFile, ForWriting, True)
For Each strNextLine In arrMailList
    objOutFile.WriteLine(strNextLine)
Next

'close the file and destroy the object
objOutFile.Close
Set objOutFile = Nothing

'clean up the other objects
Set objFSO = Nothing
Set arrUserIDList = Nothing
Set arrMailList = Nothing
Hope that helps
This is how I solved my problem, but I think Theo took a better approach.
Const ForReading = 1
Const ForWriting = 2

Dim inFile1, inFile2, strNextLine, arrServiceList, arrUserList
Dim arrUserDeployList, strLine, arrList, outFile, strItem1, strItem2

inFile1 = Wscript.Arguments.Item(0) 'wscreatestatus.txt file (tab delimited)
inFile2 = Wscript.Arguments.Item(1) 'WSBatchCreateBuildsList.txt (comma delimited)
outFile = Wscript.Arguments.Item(2) 'SuccessWSMailList###.txt (for Welcome emails)

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objInFile1 = objFSO.OpenTextFile(inFile1, ForReading)
Set objInFile2 = objFSO.OpenTextFile(inFile2, ForReading)
Set objOutFile = objFSO.CreateTextFile(outFile, ForWriting, True)

'Extracts Logon ID's for successful deployments into an Array
'#================================================================
i = 0
Do Until objInFile1.AtEndOfStream
    ReDim Preserve arrUsers(i)
    strNextLine = objInFile1.Readline
    arrLogons = Split(strNextLine , vbTab)
    If arrLogons(0) = "PENDINGREQUESTS" Then
        arrUsers(i) = arrLogons(5)
        i = i + 1
    End If
Loop

'Extracts success deploy email addresses and writes to file
'#================================================================
Do Until objInFile2.AtEndOfStream
    ReDim Preserve arrMailList(i)
    strNextLine = objInFile2.Readline
    arrAddressList = Split(strNextLine , ",")
    strItem1 = arrAddressList(0)
    strItem2 = arrAddressList(1)
    For Each strArrayEntry In arrUsers
        If strArrayEntry = strItem1 Then
            objOutFile.WriteLine strItem1 & "," & strItem2
        End If
    Next
    i = i + 1
Loop

objOutFile.Close
objInFile1.Close
objInFile2.Close

VBA to read in tab delimited file

I have some code which reads in a tab-delimited file, pulling in the rows where the value in the first column matches cell B2. This works fine when the text file is small. The code below works on a sample file with aa, bb and cc as the headers and dummy data.
Option Explicit
Sub TestImport()
    Call ImportTextFile(Sheet1.Range("B1"), vbTab, Sheet2.Range("A4"))
End Sub

Public Sub ImportTextFile(strFileName As String, strSeparator As String, rngTgt As Range)
    Dim lngTgtRow As Long
    Dim lngTgtCol As Long
    Dim varTemp As Variant
    Dim strWholeLine As String
    Dim intPos As Integer
    Dim intNextPos As Integer
    Dim intTgtColIndex As Integer
    Dim wks As Worksheet

    Set wks = rngTgt.Parent
    intTgtColIndex = rngTgt.Column
    lngTgtRow = rngTgt.Row

    Open strFileName For Input Access Read As #1
    While Not EOF(1)
        Line Input #1, strWholeLine
        varTemp = Split(strWholeLine, strSeparator)
        If CStr(varTemp(0)) = CStr(Range("B2")) Then
            wks.Cells(lngTgtRow, intTgtColIndex).Resize(, UBound(varTemp) + 1).Value = varTemp
            lngTgtRow = lngTgtRow + 1
        End If
    Wend
    Close #1

    Set wks = Nothing
End Sub
I am trying to get the code below to work using ADO, as this will run much faster on a text file with a couple of million records. However, I am getting an error on the '.Open str' part of the code ("No value given for one or more required parameters").
It looks like it is to do with how I am defining the string. Could you review it and see if there is something I am missing?
Sub QueryTextFile()
    t = Timer
    Dim cnn As Object
    Dim str As String
    Set cnn = CreateObject("ADODB.Connection")
    cnn.Provider = "Microsoft.Jet.OLEDB.4.0"
    cnn.ConnectionString = "Data Source=C:\Users\Davids Laptop\Documents\Other Ad Hoc\Test Files\;Extended Properties=""text;HDR=Yes;FMT=Delimited;"""
    cnn.Open
    Dim rs As Object
    Set rs = CreateObject("ADODB.Recordset")
    str = "SELECT * FROM [test1.txt] WHERE [aa]=" & Chr(34) & Range("B2") & Chr(34)
    With rs
        .ActiveConnection = cnn
        .Open str
        Sheet1.Range("A4").CopyFromRecordset rs
        .Close
    End With
    cnn.Close
    MsgBox Timer - t
End Sub

How to read quoted field from CSV using VBScript

In a sample.csv file, which has a fixed number of columns, I have to extract a particular field value and store it in a variable using VBScript.
sample.csv
100,SN,100.SN,"100|SN| 435623| serkasg| 15.32|
100|SN| 435624| serkasg| 15.353|
100|SN| 437825| serkasg| 15.353|"," 0 2345"
101,SN,100.SN,"100|SN| 435623| serkasg| 15.32|
100|SN| 435624| serkasg| 15.353|
100|SN| 437825| serkasg| 15.353|"," 0 2346"
I want to parse the last two fields, which are within double quotes, and store them in two different array variables for each row.
You could try using an ADO connection:
Option Explicit
dim ado: set ado = CreateObject("ADODB.Connection")
ado.ConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\txtFilesFolder\;Extended Properties=""text;HDR=No;FMT=Delimited"";"
ado.open
dim recordSet: set recordSet = ado.Execute("SELECT * FROM [sample.csv]")
dim field3, field4
do until recordSet.EOF
    field3 = recordSet.Fields(3).Value
    field4 = recordSet.Fields(4).Value
    ' use your fields here
    recordSet.MoveNext
loop
recordSet.close
ado.close
You may have an issue if those fields are longer than 255 characters - if they are, they may come back truncated. You may also have better luck with ODBC or ACE connection strings instead of the Jet one I've used here.
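As a rough illustration of that last point, switching to the ACE provider mostly amounts to changing the Provider part of the connection string. Below is a minimal sketch of the same text-file query written in C# (the other language used on this page) rather than VBScript; it assumes the Microsoft.ACE.OLEDB.12.0 provider is installed, and the folder and file name are placeholders:
// Minimal sketch: same text-file query as above, but via the ACE OLE DB provider.
// Assumes Microsoft.ACE.OLEDB.12.0 is installed; folder and file name are placeholders.
using System;
using System.Data.OleDb;

class ReadCsvWithAce
{
    static void Main()
    {
        string connStr = "Provider=Microsoft.ACE.OLEDB.12.0;" +
                         "Data Source=c:\\txtFilesFolder\\;" +
                         "Extended Properties=\"text;HDR=No;FMT=Delimited\";";

        using (OleDbConnection conn = new OleDbConnection(connStr))
        using (OleDbCommand cmd = new OleDbCommand("SELECT * FROM [sample.csv]", conn))
        {
            conn.Open();
            using (OleDbDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Fields 3 and 4 are the quoted, multi-line columns from the question
                    Console.WriteLine(reader.GetValue(3));
                    Console.WriteLine(reader.GetValue(4));
                }
            }
        }
    }
}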
Since CSVs are comma-separated, you can use the Split() function to separate the fields into an array:
' Read a line from the CSV...
strLine = myCSV.ReadLine()
' Split by comma into an array...
a = Split(strLine, ",")
Since you have a static number of columns (5), the last field will always be a(4) and the second-to-last field will be a(3).
Your CSV data seems to contain 2 embedded hard returns (CR, LF) per line. Then the first line ReadLine returns is:
100,SN,100.SN,"100|SN| 435623| serkasg| 15.32|
The solution below unwraps these lines before extracting the required fields.
Option Explicit
Const ForReading = 1
Const ForAppending = 8
Const TristateUseDefault = 2 ' Opens the file using the system default.
Const TristateTrue = 1 ' Opens the file as Unicode.
Const TristateFalse = 0 ' Opens the file as ASCII.

Dim FSO, TextStream, Line, LineNo, Fields, Field4, Field5

ExtractFields "sample.csv"

Sub ExtractFields(FileName)
    Set FSO = CreateObject("Scripting.FileSystemObject")
    If FSO.FileExists(FileName) Then
        Line = ""
        LineNo = 0
        Set TextStream = FSO.OpenTextFile(FileName, ForReading, False, TristateFalse)
        Do While Not TextStream.AtEndOfStream
            Line = Line & TextStream.ReadLine()
            LineNo = LineNo + 1
            If LineNo mod 3 = 0 Then
                Fields = Split(Line, ",")
                Field4 = Fields(3)
                Field5 = Fields(4)
                MsgBox "Line " & LineNo / 3 & ": " & vbNewLine & vbNewLine _
                    & "Field4: " & Field4 & vbNewLine & vbNewLine _
                    & "Field5: " & Field5
                Line = ""
            End If
        Loop
        TextStream.Close()
    Else
        MsgBox "File " & FileName & " ... Not found"
    End If
End Sub
Here is an alternative solution that allows for single or multiline CSV records. It uses a regular expression which simplifies the logic for handling multiline records. This solution does not remove CRLF characters embedded in a record; I've left that as an exercise for you :)
Option Explicit
Const ForReading = 1
Const ForAppending = 8
Const TristateUseDefault = 2 ' Opens the file using the system default.
Const TristateTrue = 1 ' Opens the file as Unicode.
Const TristateFalse = 0 ' Opens the file as ASCII.

Dim FSO, TextStream, Text, MyRegExp, MyMatches, MyMatch, Field4, Field5

ExtractFields "sample.csv"

Sub ExtractFields(FileName)
    Set FSO = CreateObject("Scripting.FileSystemObject")
    If FSO.FileExists(FileName) Then
        Set MyRegExp = New RegExp
        MyRegExp.Multiline = True
        MyRegExp.Global = True
        MyRegExp.Pattern = """([^""]+)"",""([^""]+)"""
        Set TextStream = FSO.OpenTextFile(FileName, ForReading, False, TristateFalse)
        Text = TextStream.ReadAll
        Set MyMatches = MyRegExp.Execute(Text)
        For Each MyMatch in MyMatches
            Field4 = MyMatch.SubMatches(0)
            Field5 = MyMatch.SubMatches(1)
            MsgBox "Field4: " & vbNewLine & Field4 & vbNewLine & vbNewLine _
                & "Field5: " & vbNewLine & Field5, 0, "Found Match"
        Next
        Set MyMatches = Nothing
        TextStream.Close()
    Else
        MsgBox "File " & FileName & " ... Not found"
    End If
End Sub

SQL CLR User Defined Function (C#) adds null character (\0) in between every existing character in String being returned

This one has kept me stumped for a couple of days now.
It's my first dabble with CLR and UDFs...
I have created a user-defined function that takes a multiline String as input, scans it, and replaces a certain line in the string with an alternative if found. If it is not found, it simply appends the desired line at the end. (See code.)
The problem, it seems, comes when the final String (or StringBuilder) is converted to an SqlString or SqlChars. The converted, returned String always contains the NUL character as every second character (viewed via console output, they are displayed as spaces).
I'm probably missing something fundamental on UDFs and/or CLR.
Please help!!
Code (I have left in the commented StringBuilder which was my initial attempt... changed to a normal String in a desperate attempt to find the issue):
[Microsoft.SqlServer.Server.SqlFunction]
[return: SqlFacet(MaxSize = -1, IsFixedLength = false)]
//public static SqlString udf_OmaChangeJob(String omaIn, SqlInt32 jobNumber) {
public static SqlChars udf_OmaChangeJob(String omaIn, SqlInt32 jobNumber) {
    if (omaIn == null || omaIn.ToString().Length <= 0) return new SqlChars("");
    String[] lines = Regex.Split(omaIn.ToString(), "\r\n");
    Regex JobTag = new Regex(@"^JOB=.+$");
    //StringBuilder buffer = new StringBuilder();
    String buffer = String.Empty;
    bool matched = false;
    foreach (var line in lines) {
        if (!JobTag.IsMatch(line))
            //buffer.AppendLine(line);
            buffer += line + "\r\n";
        else {
            //buffer.AppendLine("JOB=" + jobNumber);
            buffer += ("JOB=" + jobNumber + "\r\n");
            matched = true;
        }
    }
    if (!matched) //buffer.AppendLine("JOB=" + jobNumber);
        buffer += ("JOB=" + jobNumber) + "\r\n";
    //return new SqlString(buffer.ToString().Replace("\0",String.Empty)) + "blablabla";
    // buffer = buffer.Replace("\0", "|");
    return new SqlChars(buffer + "\r\nTheEnd");
}
In my experience, the omaIn parameter should be of type SqlString, and when you go to collect its value and process it, set a local variable:
string omaString = !omaIn.IsNull ? omaIn.Value : string.Empty;
Then when you return on any code path, to rewrap the string in C#, you'd need to use:
return omaString == string.Empty ? SqlString.Null : new SqlString(omaString);
I have had some fun wrestling matches learning the intricate hand-off between local and outbound types, especially with CLR TVFs.
Hope that can help!
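Putting those two pieces together, the overall shape of a null-safe function might look like the sketch below. This is only an outline, not the asker's actual fix: the function name is hypothetical, the line-replacing logic is elided, and IsNull is used rather than comparing against SqlString.Null (which yields a SqlBoolean rather than a plain bool):
// Minimal sketch of the SqlString wrap/unwrap pattern described above.
// udf_OmaChangeJobSketch is a made-up name; the JOB= replacement logic is omitted.
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    [SqlFunction]
    [return: SqlFacet(MaxSize = -1, IsFixedLength = false)]
    public static SqlString udf_OmaChangeJobSketch(SqlString omaIn, SqlInt32 jobNumber)
    {
        // Unwrap the SqlString into a plain .NET string, guarding against NULL input
        string omaString = omaIn.IsNull ? string.Empty : omaIn.Value;

        if (omaString.Length == 0)
            return SqlString.Null;

        // ... scan omaString here and replace or append the "JOB=" + jobNumber line ...
        string result = omaString;

        // Rewrap the plain string on the way out
        return new SqlString(result);
    }
}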

I only want to import the last 20 text files into Excel from a folder

Currently, the folder I'm importing from has 16,000 files and I only need the latest ones. The number of lines wouldn't be so bad if it didn't break Excel every time it tried to run. The code I'm using imports them all:
Sub ReadFilesIntoActiveSheet()
    Dim fso As FileSystemObject
    Dim folder As folder
    Dim file As file
    Dim FileText As TextStream
    Dim TextLine As String
    Dim Items() As String
    Dim i As Long
    Dim cl As Range

    ' Get a FileSystem object
    Set fso = New FileSystemObject
    ' get the directory you want
    Set folder = fso.GetFolder("X:\TMS\TRUCK_OUT")
    ' set the starting point to write the data to
    Set cl = ActiveSheet.Cells(1, 1)

    ' Loop thru all files in the folder
    For Each file In folder.Files
        ' Open the file
        Set FileText = file.OpenAsTextStream(ForReading)
        ' Read the file one line at a time
        Do While Not FileText.AtEndOfStream
            TextLine = FileText.ReadLine
            ' Parse the line into | delimited pieces
            Items = Split(TextLine, "|")
            ' Put data on one row in active sheet
            For i = 0 To UBound(Items)
                cl.Offset(0, i).Value = Items(i)
            Next
            ' Move to next row
            Set cl = cl.Offset(1, 0)
        Loop
        ' Clean up
        FileText.Close
    Next file

    Set FileText = Nothing
    Set file = Nothing
    Set folder = Nothing
    Set fso = Nothing
End Sub
Check out the following code - I have commented the extra lines of code that I have added:
Sub ReadFilesIntoActiveSheet()
    Dim fso As FileSystemObject
    Dim folder As folder
    Dim file As file
    Dim FileText As TextStream
    Dim TextLine As String
    Dim Items() As String
    Dim i As Long
    Dim cl As Range
    Dim fileModDate As Date 'PANKAJ - Added the variable

    ' Get a FileSystem object
    Set fso = New FileSystemObject
    ' get the directory you want
    Set folder = fso.GetFolder("C:\Users\pankaj.jaju\Desktop")
    ' set the starting point to write the data to
    Set cl = ActiveSheet.Cells(1, 1)

    ' Loop thru all files in the folder
    For Each file In folder.Files
        fileModDate = file.DateLastModified 'PANKAJ - Added this to get the last modified date for each file
        If fileModDate > #1/20/2014 1:00:00 PM# Then 'PANKAJ - Added this to get the desired file based on modified date
            ' Open the file
            Set FileText = file.OpenAsTextStream(ForReading)
            ' Read the file one line at a time
            Do While Not FileText.AtEndOfStream
                TextLine = FileText.ReadLine
                ' Parse the line into | delimited pieces
                Items = Split(TextLine, "|")
                ' Put data on one row in active sheet
                For i = 0 To UBound(Items)
                    cl.Offset(0, i).Value = Items(i)
                Next
                ' Move to next row
                Set cl = cl.Offset(1, 0)
            Loop
            ' Clean up (moved after the read loop so the stream is closed only once the file has been fully read)
            FileText.Close
        End If
    Next

    Set FileText = Nothing
    Set file = Nothing
    Set folder = Nothing
    Set fso = Nothing
End Sub
