I have a table that is linked to Access to return the results of emails into a folder. All of the emails being returned will be answering the same questions. I need to parse this email body text from this table and update several fields of another table with this data. The problem is that the linked table brings the text in super messy. Even though I have the email that is being returned all nicely formatted in a table, it comes back into access a hot mess full of extra spacing. I want to open a recordset based on the linked table (LinkTable), and then parse the LinkTable.Body field somehow so I can update another table with clean data. The data that is coming back into LinkTable looks like this:
Permit? (Note: if yes, provide specific permit type in Additional Requirements section)
No
Phytosanitary Certificate? (Note: if recommended, input No and complete Additional Requirements section)
Yes
Additional Requirements: if not applicable, indicate NA or leave blank (Type of permit required, container labeling, other agency documents, other)
Double containment, The labeling or declaration must provide the following information: -The kind, variety, and origin of each lot of seed -The designation “hybrid” when the lot contains hybrid seed -If the seed was treated, the name of the substance or p
The answer of the first two should either be yes or no, so I figured I could set up code with case statements and based on a match I should place yes or no in the corresponding field in my real table (not sure how to deal with the extra spaces here), The third one could have any number of responses, but it is the last question so anything after the "(Type of permit required, container labeling, other agency documents, other)" could be taken and placed in the other table. Does anyone have any ideas how I could set this up? I am at a bit of a loss, especially with how to deal with all of the extra spaces and how to grab all of the text after the Additional Requirements paragraph. Thank you in advance!
My select statement to get the body text looks like this:
Set rst1 = db.OpenRecordset("SELECT Subject, Contents FROM LinkTable WHERE Subject like '*1710'")
There are multiple ways to do this, one is using Instr() and Len() to find beginning and end of the fixed questions, then Mid() to extract the answers.
But I think using Split() is easier. It's best explained with commented code.
Public Sub TheParsing()
' A string constant that you expect to never come up in the Contents, used as separator for Split()
Const strSeparator = "##||##"
Dim rst1 As Recordset
Dim S As String
Dim arAnswers As Variant
Dim i As Long
S = Nz(rst1!Contents, "")
' Replace all the constant parts (questions) with the separator
S = Replace(S, "Permit? (Note: if yes, provide specific permit type in Additional Requirements section)", strSeparator)
' etc. for the other questions
' Split the remaining string into a 0-based array with the answers
arAnswers = Split(S, strSeparator)
' arAnswers(0) contains everything before the first question (probably ""), ignore that.
' Check that there are 3 answers
If UBound(arAnswers) <> 3 Then
' Houston, we have a problem
Stop
Else
For i = 1 To 3
' Extract each answer
S = arAnswers(i)
' Remove whitespace: CrLf, Tab
S = Replace(S, vbCrLf, "")
S = Replace(S, vbTab, "")
' Trim the remaining string
S = Trim(S)
' Now you have the cleaned up string and can use it
Select Case i
Case 1: strPermit = S
Case 2: strCertificate = S
Case 3: strRequirements = S
End Select
Next i
End If
rst1.MoveNext
' etc
End Sub
This will fail if the constant parts (the questions) have been altered. But so will all other straightforward methods.
Related
I have a list of data with a title column (among many other columns) and I have a Power BI parameter that has, for example, a value of "a,b,c". What I want to do is loop through the parameter's values and remove any rows that begin with those characters.
For example:
Title
a
b
c
d
Should become
Title
d
This comma separated list could have one value or it could have twenty. I know that I can turn the parameter into a list by using
parameterList = Text.Split(<parameter-name>,",")
but then I am unsure how to continue to use that to filter on. For one value I would just use
#"Filtered Rows" = Table.SelectRows(#"Table", each Text.StartsWith([key], <value-to-filter-on>))
but that only allows one value.
EDIT: I may have worded my original question poorly. The comma separated values in the parameterList can be any number of characters (e.g.: a,abcd,foo,bar) and I want to see if the value in [key] starts with that string of characters.
Try using List.Contains to check whether the starting character is in the parameter list.
each List.Contains(parameterList, Text.Start([key], 1)
Edit: Since you've changed the requirement, try this:
Table.SelectRows(
#"Table",
(C) => not List.AnyTrue(
List.Transform(
parameterList,
each Text.StartsWith(C[key], _)
)
)
)
For each row, this transforms the parameterList into a list of true/false values by checking if the current key starts with each text string in the list. If any are true, then List.AnyTrue returns true and we choose not to select that row.
Since you want to filter out all the values from the parameter, you can use something like:
= Table.SelectRows(#"Changed Type", each List.Contains(Parameter1,Text.Start([Title],1))=false)
Another way to do this would be to create a custom column in the table, which has the first character of title:
= Table.AddColumn(#"Changed Type", "FirstChar", each Text.Start([Title],1))
and then use this field in the filter step:
= Table.SelectRows(#"Added Custom", each List.Contains(Parameter1,[FirstChar])=false)
I tested this with a small sample set and it seems to be running fine. You can test both and see if it helps with the performance. If you are still facing performance issues, it would probably be easier if you can share the pbix file.
This seems to work fairly well:
= List.Select(Source[Title], each Text.Contains(Parameter1,Text.Start(_,1))=false)
Replace Source with the name of your table and Parameter1 with the name of your Parameter.
I get a daily email that lists upcoming appointments, and their length. The number of appointments vary from day to day.
The emails go like this:
================
Today's Schedule
9:30 AM
3h
Brazilian Blowout
[Client #1 name]
12:30 PM
1h
Women's Cut
[Client 2 name]
6:00 PM
45m
Men's Cut
[Client #3 name]
Projected Revenue
===================
I want to create an event in a Google Calendar for each appointment, and it seems like zapier MIGHT be able to do this, but all the help resources I can find are very general in nature.
Is this do-able on Zapier? If so, any nudges in the right direction would be awesome.
Any thoughts greatly appreciated.
I had some time to kill and enjoy the odd challenge. So I have put together a solution that should do what you are looking for. I will break it down by steps.
TEMPLATE
Zapier Trigger - Step 1
Type: Trigger
Module: Gmail
Criteria: User Dependent
Comments: For the trigger zap you will want to use a Gmail specific trigger, something to the effect of "execute trigger on emails titled 'xyz'", or "emails labeled 'xyz'" if you setup a filter in your inbox.
Input screenshot:
Output Screenshot:
Zapier Action - Step 2
Type: Action
Module: Code (Python 3)
Comments: The Code offered by Zapier executes whatever (properly written) code you place in its container. It is especially handy as it allows you to incorporate data from previous steps in it through the use of a dictionary variable titled 'input_data'. Zapier offers the Code module in two languages: Javascript and Python. As I am most familiar with Python my solution for this step was written in Python. I will append the code to the end of this answer. Using the data held in the body of the email (retrieved in step 1) we can execute some string manipulations and datetime conversions to break apart the email into its component parts and pass those on to the following Action Step: Create Calendar Event.
Input Screenshot:
Output Screenshot:
Zapier Action - Step 3
Type: Action
Module: Google Calendar - Create Event
Comments: Using the data outputted from the previous code step we can fill out the required fields for creating a new appointment.
Input Screenshot:
Output Screenshot:
PYTHON CODE
from datetime import timedelta, date, datetime
'''
Goal: Extract individual appointment details from variable length email
Steps:
Remove all extraneous and new line characters.
Isolate each individual appointment and group its relevant details.
Derive appointment start and end times using appointment time and duration.
Return all appointments in a list.
'''
def format_appt_times(appt_dict):
appt_start_str = appt_dict.get("appt_start")
appt_dur_str = appt_dict.get("appt_length")
# isolate hour and minutes from appointment time
appt_s_hour = int(appt_start_str[:appt_start_str.find(":")])
if ("pm" in appt_start_str.lower()):
appt_s_hour = 12 if appt_s_hour + 12 >= 24 else appt_s_hour + 12
appt_s_min = int(appt_start_str[appt_start_str.find(":") + 1 :
appt_start_str.find(":") + 3])
# isolate hour and minutes from duration time
appt_d_hour = 0
appt_d_min = 0
if ("h" in appt_dur_str):
appt_d_hour = int(appt_dur_str[:appt_dur_str.find("h")])
if ("m" in appt_dur_str):
appt_d_min = int(appt_dur_str[appt_dur_str.find("m") - 2 : appt_dur_str.find("m")])
# NOTE: adjust timedelta hours depending on your relation to UTC
# create datetime objects for appointment start and end times
time_zone = timedelta(hours=0)
tdy = date.today() - time_zone
duration = timedelta(hours=appt_d_hour, minutes=appt_d_min)
appt_start_dto = datetime(year=tdy.year,
month=tdy.month,
day=tdy.day,
hour=appt_s_hour,
minute=appt_s_min)
appt_end_dto = appt_start_dto + duration
# return properly formatted datetime as string for use in next step.
return (appt_start_dto.strftime("%Y-%m-%dT%H:%M"),
appt_end_dto.strftime("%Y-%m-%dT%H:%M"))
def partition_list(target, part_size):
for data in range(0, len(target), part_size):
yield target[data : data + part_size]
def main():
# Remove all extraneous and new line characters.
email_body = input_data.get("email_body")
head,delin,*email_body,delin,foot = [text for text in email_body.splitlines() if text != ""]
appointment_list = []
# Isolate each individual appointment and group its relevant details.
for text in partition_list(email_body, 4):
template = {
"appt_start" : text[0],
"appt_end" : None,
"appt_length" : text[1],
"appt_title" : text[2],
"appt_client" : text[3]
}
appointment_list.append(template)
for appt in appointment_list:
appt["appt_start"], appt["appt_end"] = format_appt_times(appt)
return appointment_list
return main()
I am not sure of your familiarity with Python, or programming more generally, but the comments in the code explain what each section is doing. If you have any specific questions regarding aspects of the code let me know. Assuming your email template does not change this setup should work exactly as needed. Let me know if anything is unclear.
UPDATE
I thought it best to address your question in the original answer should anyone else have similar questions.
explaining how this code is removing the extra characters:
There is actually a fair bit going on in the first line, so I will do my best to break it down, and provide resources where necessary.
The code in question:
head,delin,*email_body,delin,foot = [text for text in email_body.splitlines() if text != ""]
First step here was to break the text into manageable chunks. I did so with the line email_body.splitlines() which, by default, breaks strings into a list at each newline character found (you can specify your own delimiter).
If we were to inspect the list at this moment its contents would be something of the following:
["================", "", "Today's Schedule", "", "9:30 AM", "", "3h", ..., "[Client #3 name]", "", "Projected Revenue", "", "==================="]
You will notice there is a fair amount of information in there that we really don't want.
First lets look at the "" elements. These are left over as a result of the blank lines between each line of text, which even though they are blank do still have newline characters at the end of them. There a number of ways you could address this within python. We could simply write a for-loop to go through and copy all elements that are not "" to a new list.
To me this felt like additional work, and besides, Python offers list comprehension for just such a scenario. I won't go too deep into list comprehension as there is a lot that can be said about it, and in more insightful ways than I could muster, but it essentially allows you to provide logic against a set of 'data' to form a list. In this case, I specifically wanted to filter out the "" elements returned from the call to splitlines().
And so you will see I address this with the following line
[text for text in email_body.splitlines() if text != ""]
With that we have a list as above less the "" elements. Now we must turn our attention towards the more 'dynamic' garbage strings. Again there are a number of ways to do this. A, not particularly flexible, option could be to simply store the strings we want to remove in variables something to the effect of:
garb_1 = "==================="
garb_2 = "Projected Revenue"
garb_3 = ...
and once again filter the list with yet another for-loop. I instead chose to leverage Python's list unpacking idiom. Which allows us to 'unpack' list objects (and I believe tuples) into variables. As an example:
one, two, three = ["a", "b", "c"]
I'm sure you can guess what is happening above, as long as we provide the same number of variables as are in the list we can 'unpack' it in this fashion. But wait! In our case we don't know how long the list is going to be as it is entirely dependent on the number of appointments you have for any given day. Well this is where star unpacking enters to elevate the functionality. Using my code as the example:
head,delin,*email_body,delin,foot = [text for text in email_body.splitlines() if text != ""]
The *, in plain-English, is saying "I don't know how many elements to expect just give me all of them in a list". As we know that there will always be two lines of garbage at the beginning and end of the email we can assign them to throw away variables and capture everything in between using our variable length *email_body container.
With all of this complete we now have a list with only the data we are looking to capture. If, as you say, there are additional lines of garbage before or after the email_body, you can simply add additional throw away variables to account for them.
Once again feel free to ask any follow up questions.
Michael
Resources
List Comprehension
Star Unpacking
My first Post: (be kind )
PROBLEM: I need to extract the View Name from a Text field that contains a full SQL Statements so I can link the field a different data source. There are two text strings that always exist on both sides of the target view. I was hoping to use these as identifying "anchors" along with a substring to bring in the View Name text from between them.
EXAMPLE:
from v_mktg_dm.**VIEWNAME** as lead_sql
(UPPER CASE/BOLD is what I want to extract)
I tried using
SELECT
SUBSTR(SQL_FIELD,INSTR(SQL_FIELD,'FROM V_MKTG_TRM_DM.',19),20) AS PARSED_FIELD
FROM DATABASE.SQL_STORAGE_DATA
But am not getting good results -
Any help is appreciated
You can apply a Regular Expression:
RegExp_Substr_gpl(SQL_FIELD, '(v_mktg_dm\.)(.*?)( AS lead_sql)',1,1,'i',2)
This looks for the string between 'v_mktg_dm.' and ' AS lead_sql'.
RegExp_Substr_gpl is an undocumented variation of RegExp_Substr which simplifies the syntax for ignoring parts of the match
Shown below is the bottom half of the query I'm working on. The query returns all the values I want, but CSV is breaking apart account names that include commas (e.g. Sales, General and admin) into multiple columns.
I started looking at VBA code a few weeks ago and despite finding numerous pages on the replace function, I couldn't figure out how to get it to run inside my code, specifically after the query and before the delimiter so that the account names are kept intact/separate from the data.
Account names are changed fairly frequently, so ultimately I need a code that allows me to either enter specific account names that include commas so the code knows to ignore those commas, or a parse type function. Thanks in advance.
QueryQuote:
With Sheets("Income").QueryTables.Add(Connection:="URL;" & qurl, Destination:=Sheets("Income").Range("a1"))
.BackgroundQuery = True
.TablesOnlyFromHTML = False
.Refresh BackgroundQuery:=False
.SaveData = True
End With
Sheets("Income").Range("a1").CurrentRegion.TextToColumns Destination:=Sheets("Income").Range("a1"), DataType:=xlDelimited, _
TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=True, _
Semicolon:=False, Comma:=True, Space:=False, other:=False
Sheets("Income").Columns("A").ColumnWidth = 20
Sheets("Income").Columns("B:L").ColumnWidth = 8
End Sub
Find and Replace:
Cells.Replace What:="insert text here", Replacement:"insert replacement text here", _
LookAt:=xlPart, SearchOrder:=xlByRows, _
MatchCase:=False, SearchFormat:=False, ReplaceFormat:=False
This is the vBA for the Find and Replace function. Use this for each phrase that needs to be found and replaced.
This is a Text To Columns formula:
Sheets("Income").Range("a1").CurrentRegion.TextToColumns _ Destination:=Sheets"Income").Range("a1"), DataType:=xlDelimited, _
TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=True, Semicolon:=False, _
Comma:=True, Space:=False, other:=False
The Comma:=True is what is separating your data. Try changing that to false.
I have a table with 1 million+ records that contain names. I would like to be able to sort the list by the first letter in the name.
.. ABCDEFGHIJKLMNOPQRSTUVWXYZ
What is the most efficient way to setup the db table to allow for searching by the first character in the table.name field?
The best idea right now is to add an extra field which stores the first character of the name as an observer, index that field and then sort by that field. Problem is it's no longer necessarily alphabetical.
Any suggestions?
You said in a comment:
so lets ignore the first letter part. How can I all records that start with A? All A's no B...z ? Thanks – AnApprentice Feb 21 at 15:30
I issume you meant How can I RETURN all records...
This is the answer:
select * from t
where substr(name, 1, 1) = 'A'
I agree with the questions above as to why you would want to do this -- a regular index on the whole field is functionally equivalent. PostgreSQL (with some new ones in v. 9) has some rather powerful indexing capabilities for special cases which you might want to read about here http://www.postgresql.org/docs/9.1/interactive/sql-createindex.html