My first post (be kind):
PROBLEM: I need to extract the View Name from a text field that contains a full SQL statement, so I can link the field to a different data source. There are two text strings that always exist on both sides of the target view name. I was hoping to use these as identifying "anchors", along with a substring, to bring in the View Name text from between them.
EXAMPLE:
from v_mktg_dm.**VIEWNAME** as lead_sql
(UPPER CASE/BOLD is what I want to extract)
I tried using
SELECT
SUBSTR(SQL_FIELD,INSTR(SQL_FIELD,'FROM V_MKTG_TRM_DM.',19),20) AS PARSED_FIELD
FROM DATABASE.SQL_STORAGE_DATA
But I am not getting good results.
Any help is appreciated
You can apply a Regular Expression:
RegExp_Substr_gpl(SQL_FIELD, '(v_mktg_dm\.)(.*?)( AS lead_sql)',1,1,'i',2)
This looks for the string between 'v_mktg_dm.' and ' AS lead_sql'.
RegExp_Substr_gpl is an undocumented variation of RegExp_Substr which simplifies the syntax for ignoring parts of the match.
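Applied to the table from the question, the full statement might look something like this (a sketch only, assuming the same SQL_FIELD column and DATABASE.SQL_STORAGE_DATA table from your attempt):
SELECT
RegExp_Substr_gpl(SQL_FIELD, '(v_mktg_dm\.)(.*?)( AS lead_sql)',1,1,'i',2) AS PARSED_FIELD
FROM DATABASE.SQL_STORAGE_DATA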
I am currently using this formula to get all the data from everyone whose first name is "Peter", but my problem is that if someone is called "Simon Peter", that person's data is also going to show up in the formula output.
=QUERY('Data'!1:1000,"select * where B contains 'Peter'")
I know that for the other formulas, if I add an * to the string this issue is resolved. But for the QUERY formula the same logic does not apply.
Does someone know the correct syntax or a workaround?
How about classic SQL syntax:
=QUERY('Data'!1:1000,"select * where B like 'Peter %'")
The LIKE keyword allows the % wildcard to stand in for any characters around the known part of the searched string.
See the query reference: developers.google.com/chart/interactive/docs/querylanguage. You could split first name and last name into separate columns, then only search for first names exactly equal to 'Peter'. You may also want to handle upper/lower case (where lower(B) contains 'peter') and whitespace in unexpected places (e.g., trim()). You could also search only for values that start with Peter by using starts with instead of contains, or use a regular expression with matches. – Brian D
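For example, the starts with variant suggested above might be written like this (reusing the range and column from the question):
=QUERY('Data'!1:1000,"select * where B starts with 'Peter'")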
It seems that for my case using 'starts with' is a perfect fit. Thank you!
I was wondering if there was a syntax to select all account names that include a certain string of text.
For example, if I have an SPSS file that has 3 million account names, I'd want to look at only the account names that have a / TKS at the end. The account name could look like Stack Overflow / TKS.
You can use char.index to check whether a string includes a specific substring.
So for example:
compute containsTKS=0.
if char.index(account_name,"/ TKS")>0 containsTKS=1.
execute.
You can then use containsTKS to filter or select cases.
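For example, to keep only the flagged cases you could follow it with something like this (a sketch using the flag created above):
select if (containsTKS=1).
execute.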
The solution eli-k provided checks if / TKS is inside the account_name, at any position.
If you want to check whether the "/ TKS" text is at the end of your account_name, you need a slightly different syntax:
compute containsTKS=0.
if char.index(account_name,"/ TKS")=char.len(rtrim(account_name))-4 containsTKS=1.
execute.
Then, as eli-k mentioned, "You can then use containsTKS to filter or select cases."
I'm using SPSS Modeler and I have a variable that the software recognizes as numeric, so the missing values are $null$. I want the missing values of the variable to be selectable as '', i.e. as a character value.
So I would either transform the variable from numeric to character, or change only the missing values from $null$ to ''.
How can I fix this?
Thanks in advance.
The best way to select null values in a numeric field is to use the @NULL() function from the Blanks and Null section of the Expression Builder.
For example, if you wanted to keep only the null values so that you could inspect them, you might use a Select node. Leave the radio button set to Include. Press the Expression Builder (calculator) button. Change the filter in the drop-down menu on the left side from General Functions to Blanks and Null (press B 2 or 3 times). Double-click on @NULL(ITEM). Go to the right side and double-click on your numeric field name. Put a Table node at the end and run it.
Using Select @NULL in IBM SPSS Modeler
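The resulting Select node condition is just that function wrapped around your field, e.g. for a hypothetical numeric field named Amount:
@NULL(Amount)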
Another way to view just the null rows is to enter the @NULL(varname) function into the "Highlight records where" section of the Table dialog box.
"Highlight records where" dialog
When you run the table, any row that is true for this condition will be shown in red.
If you really need the variable to be a string, then use a Compute node to create a copy of this field under a new name and use the to_string() function in the Conversion section of the Expression Builder to change the type of the variable. Now you will be able to use the Select node to grab "" as the missing value. Or you could use the Filler node to replace the column, but then you would not be able to compare before and after.
The dialog examples shown in this answer use this sample stream that is installed with your IBM SPSS Modeler software:
C:\Program Files\IBM\SPSS\Modeler\18.0\Demos\streams\featureselection.str
The easiest way to do it is to use the Filler node with the following configuration:
A) FIELD
B) Condition = @NULL(@FIELD)
C) Replace by = ' '
This node will replace all $null$ values with ' ' in the same variable chosen in option A.
I don't think you can customize how $null$ values are displayed (I know it's possible in SQL databases, though).
So I'd suggest that you work with the numbers, and when you want to visualize or export the results, turn the field into a string and then replace the nulls:
Filler node > to_string(@FIELD)
Filler node > blanks and nulls > @FIELD = ''
I have a table that is linked to Access to return the results of emails into a folder. All of the emails being returned will be answering the same questions. I need to parse this email body text from this table and update several fields of another table with this data. The problem is that the linked table brings the text in super messy. Even though I have the email that is being returned all nicely formatted in a table, it comes back into Access a hot mess full of extra spacing.
I want to open a recordset based on the linked table (LinkTable), and then parse the LinkTable.Body field somehow so I can update another table with clean data. The data that is coming back into LinkTable looks like this:
Permit? (Note: if yes, provide specific permit type in Additional Requirements section)
No
Phytosanitary Certificate? (Note: if recommended, input No and complete Additional Requirements section)
Yes
Additional Requirements: if not applicable, indicate NA or leave blank (Type of permit required, container labeling, other agency documents, other)
Double containment, The labeling or declaration must provide the following information: -The kind, variety, and origin of each lot of seed -The designation “hybrid” when the lot contains hybrid seed -If the seed was treated, the name of the substance or p
The answers to the first two should either be yes or no, so I figured I could set up code with Case statements and, based on a match, place yes or no in the corresponding field in my real table (not sure how to deal with the extra spaces here). The third one could have any number of responses, but it is the last question, so anything after "(Type of permit required, container labeling, other agency documents, other)" could be taken and placed in the other table. Does anyone have any ideas how I could set this up? I am at a bit of a loss, especially with how to deal with all of the extra spaces and how to grab all of the text after the Additional Requirements paragraph. Thank you in advance!
My select statement to get the body text looks like this:
Set rst1 = db.OpenRecordset("SELECT Subject, Contents FROM LinkTable WHERE Subject like '*1710'")
There are multiple ways to do this. One is using Instr() and Len() to find the beginning and end of the fixed questions, then Mid() to extract the answers, for example:
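A minimal sketch of that Instr()/Len()/Mid() route, for the first answer only (GetPermitAnswer is a hypothetical helper; the question strings are copied from the email text above, and strContents would hold LinkTable.Contents):
Public Function GetPermitAnswer(ByVal strContents As String) As String
    ' Question texts from the email body, used as fixed anchors
    Const strQ1 = "Permit? (Note: if yes, provide specific permit type in Additional Requirements section)"
    Const strQ2 = "Phytosanitary Certificate? (Note: if recommended, input No and complete Additional Requirements section)"
    Dim lngStart As Long
    Dim lngEnd As Long
    lngStart = InStr(1, strContents, strQ1)
    If lngStart = 0 Then Exit Function              ' question 1 not found
    lngStart = lngStart + Len(strQ1)                ' first character after question 1
    lngEnd = InStr(lngStart, strContents, strQ2)    ' position where question 2 starts
    If lngEnd = 0 Then Exit Function                ' question 2 not found
    GetPermitAnswer = Trim(Mid(strContents, lngStart, lngEnd - lngStart))
End Function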
But I think using Split() is easier. It's best explained with commented code.
Public Sub TheParsing()

    ' A string constant that you expect to never come up in the Contents, used as separator for Split()
    Const strSeparator = "##||##"

    Dim db As DAO.Database
    Dim rst1 As DAO.Recordset
    Dim S As String
    Dim arAnswers As Variant
    Dim i As Long
    Dim strPermit As String
    Dim strCertificate As String
    Dim strRequirements As String

    ' Open the recordset from the question
    Set db = CurrentDb
    Set rst1 = db.OpenRecordset("SELECT Subject, Contents FROM LinkTable WHERE Subject Like '*1710'")

    Do While Not rst1.EOF
        S = Nz(rst1!Contents, "")
        ' Replace all the constant parts (questions) with the separator
        S = Replace(S, "Permit? (Note: if yes, provide specific permit type in Additional Requirements section)", strSeparator)
        ' etc. for the other questions

        ' Split the remaining string into a 0-based array with the answers
        arAnswers = Split(S, strSeparator)
        ' arAnswers(0) contains everything before the first question (probably ""), ignore that.

        ' Check that there are 3 answers
        If UBound(arAnswers) <> 3 Then
            ' Houston, we have a problem
            Stop
        Else
            For i = 1 To 3
                ' Extract each answer
                S = arAnswers(i)
                ' Remove whitespace: CrLf, Tab
                S = Replace(S, vbCrLf, "")
                S = Replace(S, vbTab, "")
                ' Trim the remaining string
                S = Trim(S)
                ' Now you have the cleaned up string and can use it
                Select Case i
                    Case 1: strPermit = S
                    Case 2: strCertificate = S
                    Case 3: strRequirements = S
                End Select
            Next i
            ' etc.: update your target table with strPermit, strCertificate, strRequirements here
        End If
        rst1.MoveNext
    Loop

    rst1.Close

End Sub
This will fail if the constant parts (the questions) have been altered. But so will all other straightforward methods.
We are using NexusDB for a small database. We have a table with a FulltextIndex defined on it.
The index is configured with the following options:
Character separator
ccPunctuationDash
ccPunctuationOther
The user enters a search text in an edit box, and then an SQL statement is constructed with the following WHERE clause (%s substituted with the Editbox.text of course):
WHERE CONTAINS(FullIdx, ''%s'')
When the user enters multiple words in the editbox this goes wrong as the two separate words should have been embedded in the WHERE clause like this:
WHERE CONTAINS(FullIdx, 'word1' and 'word2')
So I have to parse the textbox value, scan it for spaces and split the text at those points. That made me wonder whether it is possible to parse the search text according to every setting of the fulltext index, using the actual definition of the FulltextIndex to create the correct WHERE clause.
So if ccPunctuationDash is enabled in the FulltextIndex definition, then the search text is also split on a '-'.
If you think about it, it is exactly the same process as when the index is created and all strings are tokenized ...
My question: what is the easiest way of tokenizing a search string according to the settings of a FulltextIndex?
The easiest way is... to create an empty #temporary table with a string field, with the same fulltext index settings as your real table. Set the TnxTable.Options to include dsoAddKeyAsVariantField. Load the string to tokenize into the string field, then view the table indexed by the fulltext index. Presto, you get an extra field displayed which contains the sorted tokens. You can now iterate over the table to read the tokens.