Extracting specific data out of big chunk of data [closed]


Closed. This question needs debugging details. It is not currently accepting answers.
Closed 1 year ago.
I want to extract the data in my sheet from this huge paragraph. The text before and after the highlighted data that I need to extract is fixed and never changes, so I guess it can serve as a guide.
I tried using REGEXEXTRACT, but for some reason it returns some extra data along with the data I need, and I can't seem to use TRIM to cut it :(

Could you try this implementation I just tested? It should work alright for your purpose.
import re
text = ' 2021-06-09 00:58:49 48s Redz#gmail.com selectedPoliyCode: a8b8a620poliList: {"code":"a8b8a620","isPolicyList":true,"poliyChecked":[{"label":["Audio"],"version":"8","results":[{"results":{"action_audio":"violation"},"area_code":"ALL"}],"description":"Bullying statements in","level":"L1","pt":"ccde764f","categories":["Harassment and bullying"],"poliy":"Bullying statements in","language":"en","isRecommend":false,"checked":true,"pseudo":"","tags":["audio"],"code":"a8b8a620","keywords":["audio"],"content":"<span style=\"background: #ffff00;\">Bull</span>ying statements in NPGA","id":"ccde764f_a8b8a620_en"}]} selectedTitle: Bullying statements;pipeline_infos: {"review_target":"mt_music_report_queue","config_key":"mt_music_report","create_task_logid":"","use_hawk2_config":1,"object_id":"6943268167347227394","env":"prod","object_type":"music_report","create_time":1623166571,"fr_idc":"alisg","mos_extra_data":{}}action: Delete '
pattern_mail = r'(?:\d+s)(.*)(?:selectedPoliyCode)'
pattern_title = r'(?:selectedTitle:)(.*)?;'
pattern_object_id = r'(?:"object_id":")(.*?)(?:")'
mail = re.findall(pattern_mail, text)[0].strip()
title = re.findall(pattern_title, text)[0].strip()
object_id = re.findall(pattern_object_id, text)[0].strip()
Note that the text is the one you posted in the spreadsheet. It spells the marker "selectedPoliyCode"; if your real data spells it "selectedPolicyCode", change the mail pattern to match.
The three variables should contain the desired values.
My solution is in python, but the regex should work the same. Let me know if it works.
mail: (?:\d+s)(.*)(?:selectedPoliyCode)
title: (?:selectedTitle:)(.*)?;
id: (?:"object_id":")(.*?)(?:")
Here you can find them used in the black row of words (formula is there):
https://docs.google.com/spreadsheets/d/1m7Z9R1KKwcvGN0K_2TwOO19KsehfDWIcxoL9potYG7M/edit?usp=sharing
mail: =REGEXEXTRACT(B5, "(?:\d+s)(.*)(?:selectedPoliyCode)")
title: =REGEXEXTRACT(B5, "(?:selectedTitle:)(.*)?;")
id: =REGEXEXTRACT(B5, "(?:""object_id"":"")(.*?)(?:"")")
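For reuse, the three patterns can be wrapped in a small helper. This is only a minimal sketch and not part of the original answer: the name `extract_fields` and the shortened sample string are made up for illustration, and the mail and title patterns are written non-greedy here (an assumption about the intent) so that they stop at the first marker rather than the last.

```python
import re

# Patterns adapted from the answer above. The non-greedy (.*?) is an
# assumption: it stops at the first marker instead of the last one.
PATTERNS = {
    "mail": r'\d+s(.*?)selectedPoliyCode',
    "title": r'selectedTitle:(.*?);',
    "object_id": r'"object_id":"(.*?)"',
}

def extract_fields(text):
    """Return a dict of stripped matches; a field is None when its pattern is absent."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        result[name] = match.group(1).strip() if match else None
    return result

# Shortened, hypothetical stand-in for the cell contents
sample = (' 2021-06-09 00:58:49 48s Redz#gmail.com selectedPoliyCode: a8b8a620 '
          'selectedTitle: Bullying statements;pipeline_infos: '
          '{"object_id":"6943268167347227394","env":"prod"}action: Delete ')
fields = extract_fields(sample)
print(fields)
```

Returning None for a missing field avoids the IndexError that indexing `re.findall(...)[0]` raises when a pattern does not match.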

Related

Can I pull a list of info out of an email?

I get a daily email that lists upcoming appointments, and their length. The number of appointments vary from day to day.
The emails go like this:
================
Today's Schedule
9:30 AM
3h
Brazilian Blowout
[Client #1 name]
12:30 PM
1h
Women's Cut
[Client 2 name]
6:00 PM
45m
Men's Cut
[Client #3 name]
Projected Revenue
===================
I want to create an event in a Google Calendar for each appointment, and it seems like Zapier MIGHT be able to do this, but all the help resources I can find are very general in nature.
Is this doable in Zapier? If so, any nudges in the right direction would be awesome.
Any thoughts greatly appreciated.

I had some time to kill and enjoy the odd challenge. So I have put together a solution that should do what you are looking for. I will break it down by steps.
TEMPLATE
Zapier Trigger - Step 1
Type: Trigger
Module: Gmail
Criteria: User Dependent
Comments: For the trigger zap you will want to use a Gmail-specific trigger, something to the effect of "execute trigger on emails titled 'xyz'", or "emails labeled 'xyz'" if you set up a filter in your inbox.
Zapier Action - Step 2
Type: Action
Module: Code (Python 3)
Comments: The Code module offered by Zapier executes whatever (properly written) code you place in its container. It is especially handy as it allows you to incorporate data from previous steps through a dictionary variable titled 'input_data'. Zapier offers the Code module in two languages: JavaScript and Python. As I am most familiar with Python, my solution for this step is written in Python. I will append the code to the end of this answer. Using the data held in the body of the email (retrieved in step 1), we can execute some string manipulations and datetime conversions to break the email into its component parts and pass those on to the following Action step: Create Calendar Event.
Zapier Action - Step 3
Type: Action
Module: Google Calendar - Create Event
Comments: Using the data outputted from the previous code step we can fill out the required fields for creating a new appointment.
PYTHON CODE
from datetime import timedelta, datetime

'''
Goal: Extract individual appointment details from variable length email
Steps:
    1. Remove all extraneous and new line characters.
    2. Isolate each individual appointment and group its relevant details.
    3. Derive appointment start and end times using appointment time and duration.
    4. Return all appointments in a list.
'''

def format_appt_times(appt_dict):
    appt_start_str = appt_dict.get("appt_start")
    appt_dur_str = appt_dict.get("appt_length")

    # isolate hour and minutes from appointment time
    appt_s_hour = int(appt_start_str[:appt_start_str.find(":")])
    if "pm" in appt_start_str.lower():
        # 12 PM stays 12; 1 PM to 11 PM become 13 to 23
        appt_s_hour = 12 if appt_s_hour + 12 >= 24 else appt_s_hour + 12
    elif "am" in appt_start_str.lower() and appt_s_hour == 12:
        # 12 AM is midnight
        appt_s_hour = 0
    appt_s_min = int(appt_start_str[appt_start_str.find(":") + 1 :
                                    appt_start_str.find(":") + 3])

    # isolate hour and minutes from duration time
    appt_d_hour = 0
    appt_d_min = 0
    if "h" in appt_dur_str:
        appt_d_hour = int(appt_dur_str[:appt_dur_str.find("h")])
    if "m" in appt_dur_str:
        # minutes sit between the "h" (if any) and the "m"; find() returns -1
        # when there is no "h", so the slice then starts at index 0
        appt_d_min = int(appt_dur_str[appt_dur_str.find("h") + 1 : appt_dur_str.find("m")])

    # NOTE: adjust timedelta hours depending on your relation to UTC
    # create datetime objects for appointment start and end times
    time_zone = timedelta(hours=0)
    # shift the current time before taking the date; subtracting hours from a
    # bare date object would have no effect
    tdy = (datetime.now() - time_zone).date()
    duration = timedelta(hours=appt_d_hour, minutes=appt_d_min)
    appt_start_dto = datetime(year=tdy.year,
                              month=tdy.month,
                              day=tdy.day,
                              hour=appt_s_hour,
                              minute=appt_s_min)
    appt_end_dto = appt_start_dto + duration

    # return properly formatted datetime as string for use in next step.
    return (appt_start_dto.strftime("%Y-%m-%dT%H:%M"),
            appt_end_dto.strftime("%Y-%m-%dT%H:%M"))

def partition_list(target, part_size):
    for data in range(0, len(target), part_size):
        yield target[data : data + part_size]

def main():
    # Remove all extraneous and new line characters.
    email_body = input_data.get("email_body")
    head,delin,*email_body,delin,foot = [text for text in email_body.splitlines() if text != ""]
    appointment_list = []

    # Isolate each individual appointment and group its relevant details.
    for text in partition_list(email_body, 4):
        template = {
            "appt_start" : text[0],
            "appt_end" : None,
            "appt_length" : text[1],
            "appt_title" : text[2],
            "appt_client" : text[3]
        }
        appointment_list.append(template)

    for appt in appointment_list:
        appt["appt_start"], appt["appt_end"] = format_appt_times(appt)
    return appointment_list

# Zapier's Code step accepts a top-level return; outside Zapier, call main() directly
return main()
I am not sure of your familiarity with Python, or programming more generally, but the comments in the code explain what each section is doing. If you have any specific questions regarding aspects of the code, let me know. Assuming your email template does not change, this setup should work exactly as needed. Let me know if anything is unclear.
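The start and end time arithmetic can also be tried outside Zapier. The sketch below is not the code from this answer: it swaps the manual string slicing for `datetime.strptime` and a regular expression, and the helper names `parse_duration` and `appointment_window` are made up for illustration. It assumes times look like "9:30 AM" and durations like "3h", "45m" or "1h30m".

```python
import re
from datetime import date, datetime, timedelta

def parse_duration(text):
    """Turn strings such as '3h', '45m' or '1h30m' into a timedelta."""
    hours = re.search(r'(\d+)\s*h', text)
    minutes = re.search(r'(\d+)\s*m', text)
    return timedelta(hours=int(hours.group(1)) if hours else 0,
                     minutes=int(minutes.group(1)) if minutes else 0)

def appointment_window(day, start_text, duration_text):
    """Return (start, end) strings for one appointment on the given date."""
    # %I is the 12-hour clock and %p the AM/PM marker
    clock = datetime.strptime(start_text.strip(), "%I:%M %p")
    start = datetime.combine(day, clock.time())
    end = start + parse_duration(duration_text)
    fmt = "%Y-%m-%dT%H:%M"
    return start.strftime(fmt), end.strftime(fmt)

# Example with a fixed, arbitrary date
print(appointment_window(date(2021, 6, 9), "6:00 PM", "45m"))
```

Passing the date in as an argument, rather than reading today's date inside the function, keeps the helper easy to test.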
UPDATE
I thought it best to address your question in the original answer should anyone else have similar questions.
explaining how this code is removing the extra characters:
There is actually a fair bit going on in the first line, so I will do my best to break it down, and provide resources where necessary.
The code in question:
head,delin,*email_body,delin,foot = [text for text in email_body.splitlines() if text != ""]
First step here was to break the text into manageable chunks. I did so with the call email_body.splitlines(), which breaks a string into a list at each line boundary (to split on a delimiter of your own choosing, use str.split() instead).
If we were to inspect the list at this moment its contents would be something of the following:
["================", "", "Today's Schedule", "", "9:30 AM", "", "3h", ..., "[Client #3 name]", "", "Projected Revenue", "", "==================="]
You will notice there is a fair amount of information in there that we really don't want.
First let's look at the "" elements. These are left over from the blank lines between each line of text, which, even though they are blank, still have newline characters at the end of them. There are a number of ways you could address this within Python. We could simply write a for-loop to go through and copy all elements that are not "" to a new list.
To me this felt like additional work, and besides, Python offers list comprehension for just such a scenario. I won't go too deep into list comprehension as there is a lot that can be said about it, and in more insightful ways than I could muster, but it essentially allows you to provide logic against a set of 'data' to form a list. In this case, I specifically wanted to filter out the "" elements returned from the call to splitlines().
And so you will see I address this with the following line
[text for text in email_body.splitlines() if text != ""]
With that we have a list as above, less the "" elements. Now we must turn our attention towards the more 'dynamic' garbage strings. Again there are a number of ways to do this. A not particularly flexible option would be to simply store the strings we want to remove in variables, something to the effect of:
garb_1 = "==================="
garb_2 = "Projected Revenue"
garb_3 = ...
and once again filter the list with yet another for-loop. I instead chose to leverage Python's list unpacking idiom, which allows us to 'unpack' lists (as well as tuples and other iterables) into variables. As an example:
one, two, three = ["a", "b", "c"]
I'm sure you can guess what is happening above: as long as we provide the same number of variables as there are elements in the list, we can 'unpack' it in this fashion. But wait! In our case we don't know how long the list is going to be, as it depends entirely on the number of appointments you have on any given day. This is where star unpacking enters to elevate the functionality. Using my code as the example:
head,delin,*email_body,delin,foot = [text for text in email_body.splitlines() if text != ""]
The *, in plain-English, is saying "I don't know how many elements to expect just give me all of them in a list". As we know that there will always be two lines of garbage at the beginning and end of the email we can assign them to throw away variables and capture everything in between using our variable length *email_body container.
With all of this complete we now have a list with only the data we are looking to capture. If, as you say, there are additional lines of garbage before or after the email_body, you can simply add additional throw away variables to account for them.
Once again feel free to ask any follow up questions.
Michael
Resources
List Comprehension
Star Unpacking

finding an element in an array without using include? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I have an array of objects in which I need to check if it meets a specific criteria.
What I've done is loop through the array and match each element using Ruby's include? method. The problem is I've noticed that there are instances where this causes some codes to return true when they really should return false.
group.plan_codes.each do |code|
normalized_plan_code = code.upcase.gsub(" ", "").gsub("+", "")
normalized_plan_code.include? coverage['plan_description'].upcase.gsub(" ", "").gsub("+", "")
end
I'm basically taking these group.plan_codes and matching them against coverage['plan_description']. The problem I found was that if the code was something like "group plan", submitting a code like "not group plan" would still return true, because "group plan" is included in the plan description.
Would anyone know a better way of doing this? I was thinking it could stop looking after the first element is completed, but I am a little caught up on Ruby's detect.

Use a Regex or a straight equality test (==). For sake of clarity, let's assume (that I'm understanding your question correctly and) that you have an array such as:
plans = [ 'not group plan', 'group plan' ]
and you are trying to find the second element:
including = 'group plan'
plans.detect { |plan| plan.include?(including) }
This returns "not group plan", the first element, because it also includes the string 'group plan'. To remedy that with a regex, you could use something like:
plans.detect { |plan| plan.match?(/\A#{Regexp.escape(including)}\z/) }
Now, this returns the second element, because you're looking for an exact match. Since it is an exact match, though, you could also use something simpler:
plans.detect { |plan| plan == including }
Where the regex helps is when each plan can contain multiple items:
plans = ['plan a,not group plan,plan b', 'plan a,group plan,plan b']
Each element is a comma-separated list of plan codes, and you're looking for any plan that includes 'group plan'. Now you can use the regex:
plans.detect { |plan| plan.match?(/(?:\A|,)#{Regexp.escape(including)}(?:,|\z)/) }
and have the second element returned. The (?:\A|,) and (?:,|\z) groups require the code to be bounded by a comma or by the start or end of the string, so it also matches when it is the first or last item in the list. You'll need to adapt the regex to however you are saving the plan codes (in this example I chose a comma-separated list; you might have tabs, semicolons, or something else). If you have just a whitespace-separated list of codes that can themselves contain whitespace, you need to do more work and reject any items that contain a longer code that includes the code you're looking for.
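The difference between substring matching and whole-item matching can be sketched without a regex at all. The example below is in Python rather than Ruby, and the helper names `normalize`, `find_exact` and `find_in_list` are made up for illustration; the normalization mirrors the upcase and gsub calls from the question.

```python
def normalize(code):
    """Upper-case and drop spaces and plus signs, as in the question."""
    return code.upper().replace(" ", "").replace("+", "")

def find_exact(plans, wanted):
    """Return the first plan equal to `wanted` after normalization, else None."""
    target = normalize(wanted)
    return next((p for p in plans if normalize(p) == target), None)

def find_in_list(plans, wanted, sep=","):
    """Return the first delimited plan string containing `wanted` as a whole item."""
    target = normalize(wanted)
    for plan in plans:
        if target in (normalize(item) for item in plan.split(sep)):
            return plan
    return None

print(find_exact(['not group plan', 'group plan'], 'group plan'))
print(find_in_list(['plan a,not group plan,plan b',
                    'plan a,group plan,plan b'], 'group plan'))
```

Splitting on the delimiter and comparing whole items sidesteps the first-item and last-item edge cases that a pattern anchored on commas has to handle explicitly.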

Adding Array Contents Together in rails [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
Trying to add the contents of an array together.
["75.00", "50.00", "25.00"] the way I'm getting that info is with
c = Credit.last
c.payments.map(&:payment_amount).to_a
I'm trying to add up all the values together in the array.

The other posters are correct that your question doesn't conform to the How to Ask guidelines. The responses are not intended to put you down but rather to maintain the quality of content on Stack Overflow. That said, this should get you where you need to go. In the future, please read the guidelines and submit accordingly.
Credit.last.payments.map { |payment| payment.payment_amount.to_f }.sum

One thing you may not have considered is that the array ["75.00", "50.00", "25.00"] contains a bunch of strings.
If you were to sum them together like this:
["75.00", "50.00", "25.00"].sum
# or like this as one commenter suggested
["75.00", "50.00", "25.00"].reduce(&:+)
# or the long-handed version
["75.00", "50.00", "25.00"].reduce {|str, val| str + val }
You would actually get "75.0050.0025.00". This is because the individual strings in the array are getting concatenated together.
So in fact, you would need to convert the array to floats or integers first. This can be done like this:
floats = ["75.00", "50.00", "25.00"].collect(&:to_f)
# or the long-handed version
["75.00", "50.00", "25.00"].collect {|val| val.to_f }
Then you can sum the values:
sum = floats.sum
Edit:
I just tried summing a string column via ActiveRecord and got an ActiveRecord::StatementInvalid exception (TinyTds::Error: Operand data type nvarchar is invalid for sum operator):
payment_total = Credit.last.payments.sum(:payment_amount)
# returns ActiveRecord::StatementInvalid:
# TinyTds::Error: Operand data type nvarchar is invalid for sum
# operator.
Looks like that won't be an option for you. Although, you could change the datatype of the column so that it is something other than a string. If you change the column datatype then you will be able to use aggregate functions.
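The same pitfall, strings being joined instead of added, is easy to reproduce in other languages. Here is a minimal sketch in Python rather than Ruby, using only the values from the question; the choice of Decimal is my own assumption, made because the values look like currency amounts.

```python
from decimal import Decimal

amounts = ["75.00", "50.00", "25.00"]

# Joining the strings shows what "adding" them as text produces
concatenated = "".join(amounts)

# Converting first gives a numeric total; Decimal keeps the cents exact
total = sum(Decimal(amount) for amount in amounts)

print(concatenated)  # 75.0050.0025.00
print(total)         # 150.00
```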

Microsoft Access: Complex string search to update field in another table

I have a table that is linked to Access to return the results of emails into a folder. All of the emails being returned will be answering the same questions. I need to parse this email body text from this table and update several fields of another table with this data. The problem is that the linked table brings the text in super messy. Even though I have the email that is being returned all nicely formatted in a table, it comes back into access a hot mess full of extra spacing. I want to open a recordset based on the linked table (LinkTable), and then parse the LinkTable.Body field somehow so I can update another table with clean data. The data that is coming back into LinkTable looks like this:
Permit? (Note: if yes, provide specific permit type in Additional Requirements section)
No
Phytosanitary Certificate? (Note: if recommended, input No and complete Additional Requirements section)
Yes
Additional Requirements: if not applicable, indicate NA or leave blank (Type of permit required, container labeling, other agency documents, other)
Double containment, The labeling or declaration must provide the following information: -The kind, variety, and origin of each lot of seed -The designation “hybrid” when the lot contains hybrid seed -If the seed was treated, the name of the substance or p
The answer to each of the first two should be either yes or no, so I figured I could set up code with case statements and, based on a match, place yes or no in the corresponding field in my real table (not sure how to deal with the extra spaces here). The third one could have any number of responses, but it is the last question, so anything after "(Type of permit required, container labeling, other agency documents, other)" could be taken and placed in the other table. Does anyone have any ideas how I could set this up? I am at a bit of a loss, especially with how to deal with all of the extra spaces and how to grab all of the text after the Additional Requirements paragraph. Thank you in advance!
My select statement to get the body text looks like this:
Set rst1 = db.OpenRecordset("SELECT Subject, Contents FROM LinkTable WHERE Subject like '*1710'")

There are multiple ways to do this, one is using Instr() and Len() to find beginning and end of the fixed questions, then Mid() to extract the answers.
But I think using Split() is easier. It's best explained with commented code.
Public Sub TheParsing()
    ' A string constant that you expect to never come up in the Contents, used as separator for Split()
    Const strSeparator = "##||##"
    Dim rst1 As Recordset
    Dim S As String
    Dim arAnswers As Variant
    Dim i As Long
    Dim strPermit As String, strCertificate As String, strRequirements As String

    ' Open rst1 with the SELECT statement from the question before reading from it
    S = Nz(rst1!Contents, "")
    ' Replace all the constant parts (questions) with the separator
    S = Replace(S, "Permit? (Note: if yes, provide specific permit type in Additional Requirements section)", strSeparator)
    ' etc. for the other questions

    ' Split the remaining string into a 0-based array with the answers
    arAnswers = Split(S, strSeparator)
    ' arAnswers(0) contains everything before the first question (probably ""), ignore that.
    ' Check that there are 3 answers
    If UBound(arAnswers) <> 3 Then
        ' Houston, we have a problem
        Stop
    Else
        For i = 1 To 3
            ' Extract each answer
            S = arAnswers(i)
            ' Remove whitespace: CrLf, Tab
            S = Replace(S, vbCrLf, "")
            S = Replace(S, vbTab, "")
            ' Trim the remaining string
            S = Trim(S)
            ' Now you have the cleaned up string and can use it
            Select Case i
                Case 1: strPermit = S
                Case 2: strCertificate = S
                Case 3: strRequirements = S
            End Select
        Next i
    End If
    rst1.MoveNext
    ' etc
End Sub
This will fail if the constant parts (the questions) have been altered. But so will all other straightforward methods.
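The replace-then-split idea is not specific to VBA. As a language-neutral illustration, here is a minimal sketch in Python; the function name `parse_answers` and the sample body are made up, and the question texts are copied from the post above.

```python
QUESTIONS = [
    "Permit? (Note: if yes, provide specific permit type in "
    "Additional Requirements section)",
    "Phytosanitary Certificate? (Note: if recommended, input No and "
    "complete Additional Requirements section)",
    "Additional Requirements: if not applicable, indicate NA or leave blank "
    "(Type of permit required, container labeling, other agency documents, other)",
]
SEPARATOR = "##||##"  # assumed never to occur in the email body

def parse_answers(body):
    """Replace each fixed question with a separator, split, and tidy the answers."""
    for question in QUESTIONS:
        body = body.replace(question, SEPARATOR)
    parts = body.split(SEPARATOR)
    if len(parts) != len(QUESTIONS) + 1:
        raise ValueError("expected %d answers, found %d"
                         % (len(QUESTIONS), len(parts) - 1))
    # parts[0] is whatever preceded the first question; " ".join(part.split())
    # collapses runs of spaces, tabs and line breaks into single spaces
    return [" ".join(part.split()) for part in parts[1:]]

# Hypothetical messy body, standing in for LinkTable.Contents
messy = (QUESTIONS[0] + "\r\n   \t No  \r\n" + QUESTIONS[1] + "\r\n\r\n Yes \r\n"
         + QUESTIONS[2] + "\r\n  Double   containment,\t labeling required ")
print(parse_answers(messy))
```

Raising an error when the number of parts is unexpected plays the same role as the Stop statement in the VBA version: it flags emails whose question text has been altered.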

Using a text file as a set of values swift [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I'm working on a project that selects one or more words containing keywords entered by the user. That's why I'm using a set. But the set is a Turkish dictionary; that is to say, it contains 68k (most-used) words. How can I load the data from a text file to avoid overload?

It isn't very clear what you are asking.
If you want to load words from a text file you should look at using NSString's string parsing methods. The NSString method componentsSeparatedByString: will break a large string up into an array of pieces using the specified string as a delimiter. (You can use "\n" to separate words that are on separate lines, or " " to separate your words by spaces.)
If you want a set it's easy to load a set with the items in an array.
However I would recommend using an array and arc4random_uniform to select one of the items randomly.
Something like this
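import Foundation

var working_array = [String]()
let path = "your_path"
// Read the whole file; fall back to an empty string if it cannot be read.
// String(contentsOfFile:encoding:) and components(separatedBy:) are the current
// Swift spellings of the NSString calls mentioned above.
let textFile = (try? String(contentsOfFile: path, encoding: .utf8)) ?? ""
let full_array = textFile.components(separatedBy: "\n").filter { !$0.isEmpty }
Loading array of random words
// Refill the working copy whenever it runs out
if working_array.isEmpty {
    working_array += full_array
}
fetching an element:
// Remove a random word so it is not picked again until the next refill
if !working_array.isEmpty {
    let random_index = Int(arc4random_uniform(UInt32(working_array.count)))
    let a_word = working_array.remove(at: random_index)
    print(a_word)
}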
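The question also asks about selecting words that contain a user-entered keyword. As a rough illustration of loading a word list once and filtering it, here is a sketch in Python rather than Swift. The helper names and the three-word sample file are hypothetical, and a real 68k-word file would simply replace the temporary one.

```python
import random
import tempfile
from pathlib import Path

def load_words(path):
    """Read one word per line, skipping blank lines."""
    return [w for w in Path(path).read_text(encoding="utf-8").splitlines() if w]

def words_containing(words, keyword):
    """Return the words that contain the keyword (case-insensitive)."""
    needle = keyword.casefold()
    return [w for w in words if needle in w.casefold()]

# Hypothetical three-word file standing in for the 68k-word dictionary
with tempfile.TemporaryDirectory() as tmp:
    word_file = Path(tmp) / "words.txt"
    word_file.write_text("kitap\nkalem\nkitaplık\n", encoding="utf-8")
    words = load_words(word_file)

matches = words_containing(words, "kitap")
picked = random.choice(words)
print(matches, picked)
```

A linear scan like this is usually acceptable for a list of this size; whether it is fast enough for the asker's project is something only measurement can confirm.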
