I want to run a script for GSheets by a trigger every minute.
The script works without an error if I run it manually, but the trigger will throw an error message 100% of the time.
The idea is to populate a spreadsheet via GForms. The script should update the entries when a new form of an existing data set is submitted (identifier is in row 4). It does this by looking for duplicates and deleting the old set of data.
The trigger should run the script every minute or so to ensure data quality and to eliminate multiple duplicates if they occur.
This is the code for my script:
function updateExisting(columnWithUniqueIdentifier,sheetTabName) {
var dataFromColumnToMatch,lastColumn,lastRow,rowWithExistingUniqueValue,rowOfDataJustSaved,
sh,ss,valueToSearchFor;
// USER SETTINGS - if the values where not passed in to the function
if (!columnWithUniqueIdentifier) {//If you are not passing in the column number
columnWithUniqueIdentifier = 4;//Hard code column number if you want
}
if (!sheetTabName) {//The sheet tab name was not passed in to the function
sheetTabName = "Formularantworten 2";//Hard code if needed
}
//end of user settings
ss = SpreadsheetApp.getActiveSpreadsheet();//Get the active spreadsheet - this code must be in a project bound to spreadsheet
sh = ss.getSheetByName(sheetTabName);
lastRow = sh.getLastRow();
lastColumn = sh.getLastColumn();
Logger.log('lastRow: ' + lastRow)
rowOfDataJustSaved = sh.getRange(lastRow, 1, 1, lastColumn).getValues();//Get the values that were just saved
valueToSearchFor = rowOfDataJustSaved[0][columnWithUniqueIdentifier-1];
Logger.log('valueToSearchFor: ' + valueToSearchFor)
dataFromColumnToMatch = sh.getRange(1, columnWithUniqueIdentifier, lastRow-1, 1).getValues();
dataFromColumnToMatch = dataFromColumnToMatch.toString().split(",");
Logger.log('dataFromColumnToMatch: ' + dataFromColumnToMatch)
rowWithExistingUniqueValue = dataFromColumnToMatch.indexOf(valueToSearchFor);
Logger.log('rowWithExistingUniqueValue: ' + rowWithExistingUniqueValue)
if (rowWithExistingUniqueValue === -1) {//There is no existing data with the unique identifier
return;
}
sh.getRange(rowWithExistingUniqueValue + 1, 1, 1, rowOfDataJustSaved[0].length).setValues(rowOfDataJustSaved);
sh.deleteRow(lastRow);//delete the row that was at then end
SpreadsheetApp.getActiveSpreadsheet().toast("Updating Data Set Successful")
}
The Trigger will throw this Error Message:
23.11.2020, 10:27:03 Info lastRow: 7
23.11.2020, 10:27:03 Info valueToSearchFor: undefined
23.11.2020, 10:27:03 Fehler Exception: Konvertierung von "[object Object]" in int nicht möglich.
at updateExisting(Code:28:30)
GSheet_Trigger_Error_Messeage
Any idea of what I am doing wrong here?
Thanks so much in advance.
EDIT:
Here is a debugger log for rowofdatajustsaved & columnWithUniqueIdentifier.
Seems like they work as intended.
Debugger_Log_UpdateExisting
Problem solved: there has to be a trigger var in the var statement. I figured that there is an unnecessary if statement in the code as well and fixed it. Updated code below.
function updateExisting(trigger, columnWithUniqueIdentifier,sheetTabName) {
var dataFromColumnToMatch,lastColumn,lastRow,rowWithExistingUniqueValue,rowOfDataJustSaved,
sh,ss,valueToSearchFor;
Logger.log('columnWithUniqueIdentifier' + columnWithUniqueIdentifier)
// USER SETTINGS - if the values where not passed in to the function
columnWithUniqueIdentifier = 4;//Hard code column number if you want
Logger.log('sheetTabName' + sheetTabName)
if (!sheetTabName) {//The sheet tab name was not passed in to the function
sheetTabName = "Formularantworten 2";//Hard code if needed
}
//end of user settings
ss = SpreadsheetApp.getActiveSpreadsheet();//Get the active spreadsheet - this code must be in a project bound to spreadsheet
sh = ss.getSheetByName(sheetTabName);
lastRow = sh.getLastRow();
lastColumn = sh.getLastColumn();
Logger.log('lastRow: ' + lastRow)
rowOfDataJustSaved = sh.getRange(lastRow, 1, 1, lastColumn).getValues();//Get the values that were just saved
Logger.log(rowOfDataJustSaved)
debugger
valueToSearchFor = rowOfDataJustSaved[0][columnWithUniqueIdentifier-1];
Logger.log('valueToSearchFor: ' + valueToSearchFor)
dataFromColumnToMatch = sh.getRange(1, columnWithUniqueIdentifier, lastRow-1, 1).getValues();
dataFromColumnToMatch = dataFromColumnToMatch.toString().split(",");
Logger.log('dataFromColumnToMatch: ' + dataFromColumnToMatch)
rowWithExistingUniqueValue = dataFromColumnToMatch.indexOf(valueToSearchFor);
Logger.log('rowWithExistingUniqueValue: ' + rowWithExistingUniqueValue)
if (rowWithExistingUniqueValue === -1) {//There is no existing data with the unique identifier
return;
}
sh.getRange(rowWithExistingUniqueValue + 1, 1, 1, rowOfDataJustSaved[0].length).setValues(rowOfDataJustSaved);
sh.deleteRow(lastRow);//delete the row that was at then end
SpreadsheetApp.getActiveSpreadsheet().toast("Updating Data Set Successful")
}
I'm currently starting work on a text adventure game in Lua--no addons, just pure Lua for my first project. In essence, here is my problem; I'm trying to find out how I can do a "reverse lookup" of a table using one of its variables. Here's an example of what I've tried to do:
print("What are you trying to take?")
bag = {}
gold = {name="Gold",ap=3}
x = io.read("*l")
if x == "Gold" then
table.insert(bag,gold)
print("You took the " .. gold.name .. ".")
end
Obviously, writing a line like this with every single object in the game would be very... exhausting--especially since I think I'll be able to use this solution for not just taking items but movement from room to room using a reverse lookup with each room's (x,y) coordinates. Anyone have any ideas on how to make a more flexible system that can find a table by the player typing in one of its variables? Thanks in advance!
-blockchainporter
This doesn't directly answer your question as you asked it, but I think it would serve the purpose of what you are trying to do. I create a table called 'loot' which can hold many objects, and the player can place any of these in their 'bag' by typing the name.
bag = {}
loot = {
{name="Gold", qty=3},
{name="Axe", qty=1},
}
print("What are you trying to take?")
x = io.read("*l")
i = 1
while loot[i] do
if (x == loot[i].name) then
table.insert(bag, table.remove(loot,i))
else
i = i + 1
end
end
For bonus points, you could check 'bag' to see if the player has some of that item already and then just update the quantity...
while loot[i] do
if (x == loot[i].name) then
j, found = 1, nil
while bag[j] do
if (x == bag[j].name) then
found = true
bag[j].qty = bag[j].qty + loot[i].qty
table.remove(loot,i)
end
j = j + 1
end
if (not found) then
table.insert(bag, table.remove(loot,i))
end
else
i = i + 1
end
end
Again, this isn't a 'reverse lookup' solution like you asked for... but I think it is closer to what you are trying to do by letting a user choose to loot something.
My disclaimer is that I don't use IO functions in my own lua usage, so I have to assume that your x = io.read("*l") is correct.
PS. If you only ever want objects to have a name and qty, and never any other properties (like condition, enchantment, or whatever) then you could also simplify my solution by using key/val pairs:
bag = {}
loot = { ["Gold"] = 3, ["Axe"] = 1 }
print("What are you trying to take?")
x = io.read("*l")
for name, qty in pairs(loot) do
if x == name then
bag.name = (bag.name or 0) + qty
loot.name = nil
end
end
I have a few notes to start before I specifically address your question. (I just want to do this before I forget, so please bear with me!)
I recommend printing to the terminal using stderr instead of stdout--the Lua function print uses the latter. When I am writing a Lua script, I often create a C-style function called eprintf to print formatted output to stderr. I implement it like this:
local function eprintf(fmt, ...)
io.stderr:write(string.format(fmt, ...))
return
end
Just be aware that, unlike print, this function does not automatically append a newline character to the output string; to do so, remember to put \n at the end of your fmt string.
Next, it may be useful to define a helper function that calls io.read("*l") to get an entire line of input. In writing some example code to help answer your question, I called my function getline--like the C++ function that has similar behavior--and defined it like this:
local function getline()
local read = tostring(io.read("*l"))
return read
end
If I correctly understand what it is you are trying to do, the player will have an inventory--which you have called bag--and he can put items into it by entering item names into stdin. So, for instance, if the player found a treasure chest with gold, a sword, and a potion in it and he wanted to take the gold, he would type Gold into stdin and it would be placed in his inventory.
Based on what you have so far, it looks like you are using Lua tables to create these items: each table has a name index and another called ap; and, if a player's text input matches an item's name, the player picks that up item.
I would recommend creating an Item class, which you could abstract nicely by placing it in its own script and then loading it as needed with require. This is a very basic Item class module I wrote:
----------------
-- Item class --
----------------
local Item = {__name = "Item"}
Item.__metatable = "metatable"
Item.__index = Item
-- __newindex metamethod.
function Item.__newindex(self, k, v)
local err = string.format(
"type `Item` does not have member `%s`",
tostring(k)
)
return error(err, 2)
end
-- Item constructor
function Item.new(name_in, ap_in)
assert((name_in ~= nil) and (ap_in ~= nil))
local self = {
name = name_in,
ap = ap_in
}
return setmetatable(self, Item)
end
return Item
From there, I wrote a main driver to encapsulate some of the behavior you described in your question. (Yes, I know my Lua code looks more like C.)
#!/usr/bin/lua
-------------
-- Modules --
-------------
local Item = assert(require("Item"))
local function eprintf(fmt, ...)
io.stderr:write(string.format(fmt, ...))
return
end
local function printf(fmt, ...)
io.stdout:write(string.format(fmt, ...))
return
end
local function getline()
local read = tostring(io.read("*l"))
return read
end
local function main(argc, argv)
local gold = Item.new("Gold", 3)
printf("gold.name = %s\ngold.ap = %i\n", gold.name, gold.ap)
return 0
end
main(#arg, arg)
Now, as for the reverse search which you described, at this point all you should have to do is check the user's input against an Item's name. Here it is in the main function:
local function main(argc, argv)
local gold = Item.new("Gold", 3)
local bag = {}
eprintf("What are you trying to take? ")
local input = getline()
if (input == gold.name) then
table.insert(bag, gold)
eprintf("You took the %s.\n", gold.name)
else
eprintf("Unrecognized item `%s`.\n", input)
end
return 0
end
I hope this helps!
I have this function:
function SecondsFormat(X)
if X <= 0 then return "" end
local t ={}
local ndays = string.format("%02.f",math.floor(X / 86400))
if tonumber(ndays) > 0 then table.insert(t,ndays.."d ") end
local nHours = string.format("%02.f",math.floor((X/3600) -(ndays*24)))
if tonumber(nHours) > 0 then table.insert(t,nHours.."h ") end
local nMins = string.format("%02.f",math.floor((X/60) - (ndays * 1440) - (nHours*60)))
if tonumber(nMins) > 0 then table.insert(t,nMins.."m ") end
local nSecs = string.format("%02.f", math.fmod(X, 60));
if tonumber(nSecs) > 0 then table.insert(t,nSecs.."s") end
return table.concat(t)
end
I would like to add weeks and months to it but cant get my head around the month part to move on to the week part just because the days in a month aren't always the same so can anyone offer some help?
The second question is, is using a table to store the results the most efficient way of dealing with this given the function will be called every 3 seconds for up to 100 items (in a grid)?
Edit:
function ADownload.ETA(Size,Done,Tranrate) --all in bytes
if Size == nil then return "--" end
if Done == nil then return "--" end
if Tranrate == nil then return "--" end
local RemS = (Size - Done) / Tranrate
local RemS = tonumber(RemS)
if RemS <= 0 or RemS == nil or RemS > 63072000 then return "--" end
local date = os.date("%c",RemS)
if date == nil then return "--" end
local month, day, year, hour, minute, second = date:match("(%d+)/(%d+)/(%d+) (%d+): (%d+):(%d+)")
month = month - 1
day = day - 1
year = year - 70
if tonumber(year) > 0 then
return string.format("%dy %dm %dd %dh %dm %ds", year, month, day, hour, minute, second)
elseif tonumber(month) > 0 then
return string.format("%dm %dd %dh %dm %ds",month, day, hour, minute, second)
elseif tonumber(day) > 0 then
return string.format("%dd %dh %dm %ds",day, hour, minute, second)
elseif tonumber(hour) > 0 then
return string.format("%dh %dm %ds",hour, minute, second)
elseif tonumber(minute) > 0 then
return string.format("%dm %ds",minute, second)
else
return string.format("%ds",second)
end
end
I merged the function into the main function as I figured it would probably be quicker but I now have two questions:
1: I had to add
if date == nil then return "--" end
because it errors occasionally with date:match trying to compare with "nil" however os.date mentions nothing in the literature about returning nil as its a string or a table so although the extra line of code fixes the issue I'm wondering why that behaviour occurs as I'm sure I caught all the non events in the previous returns?
2: Sometimes I see functions written like myfunction(...) and I'm sure that just does away with the arguments and if so is there a one line of code that could do away with the first 3 "if" statements?
You can use the os.date function to get date values in a useable format. The '*t' formating parameter makes the returned date into a table instead of a string.
local t = os.date('*t')
print(t.year, t.month, t.day, t.hour, t.min, t.sec)
print(t.wday, t.yday)
os.data uses the current time by default, you can pass it an explicit time if you want (see the os.data docs for more info on this)
local t = os.date('*t', x)
As for table performance, I wouldn't worry about that. Not only is your function not called all that often, but table handling is much cheaper than other things you might be doing (calling os.date, lots of string formatting, etc)
Why not let Lua's os library do the hard work for you?
There is probably an easier (read: better) way to figure out the difference to 01/01/70, but here is a quick idea of how you could use it:
function SecondsFormat(X)
if X <= 0 then return "" end
local date = os.date("%c", X) -- will give something like "01/03/70 03:40:00"
local inPattern = "(%d+)/(%d+)/(%d+) (%d+):(%d+):(%d+)"
local outPattern = "%dy %dm %dd %dh %dm %ds"
local month, day, year, hour, minute, second = date:match(inPattern)
month = month - 1
day = day - 1
year = year - 70
return string.format(outPattern, year, month, day, hour, minute, second)
end
I think that this should also be a lot quicker than constructing the table and calling string.format multiple times - but you'd have to profile that.
EDIT: I ran a quick test with two functions that concatenate "abc", "def" and "ghi" using both methods. Inserting those strings into a table an concatenating took 14 seconds (for several million runs of course) and using a single string.format() took 6 seconds. This does not take into account, that your code calls string.format() anyway (multiple times) - nor the difference between you figuring out the values by division and I by pattern matching. Pattern matching is certainly slower, but I doubt that it outweighs the gains from not having a table - and it's certainly convenient to be able to leverage os.time(). The fastest way would probably be figuring out the month and day manually and then using a single string.format(). But again - you'd have to profile that.
EDIT2: missingno has a good point with using the "*t" option with os.date to give you the values separately in the first place. Again, this depends on whether you want to have a table for convenience vs. a string for storage or whatever reasons. Combining "*t" and a single string.format:
function SecondsFormat(X)
if X <= 0 then return "" end
local date = os.date("*t", X) -- will give you a table
local outPattern = "%dy %dm %dd %dh %dm %ds"
date.month = date.month - 1
date.day = date.day - 1
date.year = date.year - 70
return string.format(outPattern, date.year, date.month, date.day, date.hour, date.min, date.sec)
end
We do business largely in the United States and are trying to improve user experience by combining all the address fields into a single text area. But there are a few problems:
The address the user types may not be correct or in a standard format
The address must be separated into parts (street, city, state, etc.) to process credit card payments
Users may enter more than just their address (like their name or company with it)
Google can do this but the Terms of Service and query limits are prohibitive, especially on a tight budget
Apparently, this is a common question:
PHP script to parse address?
How do I parse the free format address to save into the DataBase
java postal address parser
More efficient way to extract address components
How can i show a pre populated postal address in contacts screen with street, city, zip on android
PHP regexp US address
Is there a way to isolate an address from the text around it and break it into pieces? Is there a regular expression to parse addresses?
I saw this question a lot when I worked for an address verification company. I'm posting the answer here to make it more accessible to programmers who are searching around with the same question. The company I was at processed billions of addresses, and we learned a lot in the process.
First, we need to understand a few things about addresses.
Addresses are not regular
This means that regular expressions are out. I've seen it all, from simple regular expressions that match addresses in a very specific format, to this:
/\s+(\d{2,5}\s+)(?![a|p]m\b)(([a-zA-Z|\s+]{1,5}){1,2})?([\s|,|.]+)?(([a-zA-Z|\s+]{1,30}){1,4})(court|ct|street|st|drive|dr|lane|ln|road|rd|blvd)([\s|,|.|;]+)?(([a-zA-Z|\s+]{1,30}){1,2})([\s|,|.]+)?\b(AK|AL|AR|AZ|CA|CO|CT|DC|DE|FL|GA|GU|HI|IA|ID|IL|IN|KS|KY|LA|MA|MD|ME|MI|MN|MO|MS|MT|NC|ND|NE|NH|NJ|NM|NV|NY|OH|OK|OR|PA|RI|SC|SD|TN|TX|UT|VA|VI|VT|WA|WI|WV|WY)([\s|,|.]+)?(\s+\d{5})?([\s|,|.]+)/i
... to this where a 900+ line-class file generates a supermassive regular expression on the fly to match even more. I don't recommend these (for example, here's a fiddle of the above regex, that makes plenty of mistakes). There isn't an easy magic formula to get this to work. In theory and by theory, it's not possible to match addresses with a regular expression.
USPS Publication 28 documents the many formats of addresses that are possible, with all their keywords and variations. Worst of all, addresses are often ambiguous. Words can mean more than one thing ("St" can be "Saint" or "Street") and there are words that I'm pretty sure they invented. (Who knew that "Stravenue" was a street suffix?)
You'd need some code that really understands addresses, and if that code does exist, it's a trade secret. But you could probably roll your own if you're really into that.
Addresses come in unexpected shapes and sizes
Here are some contrived (but complete) addresses:
1) 102 main street
Anytown, state
2) 400n 600e #2, 52173
3) p.o. #104 60203
Even these are possibly valid:
4) 829 LKSDFJlkjsdflkjsdljf Bkpw 12345
5) 205 1105 14 90210
Obviously, these are not standardized. Punctuation and line breaks not guaranteed. Here's what's going on:
Number 1 is complete because it contains a street address and a city and state. With that information, there's enough to identify the address, and it can be considered "deliverable" (with some standardization).
Number 2 is complete because it also contains a street address (with secondary/unit number) and a 5-digit ZIP code, which is enough to identify an address.
Number 3 is a complete post office box format, as it contains a ZIP code.
Number 4 is also complete because the ZIP code is unique, meaning that a private entity or corporation has purchased that address space. A unique ZIP code is for high-volume or concentrated delivery spaces. Anything addressed to ZIP code 12345 goes to General Electric in Schenectady, NY. This example won't reach anyone in particular, but the USPS would still be able to deliver it.
Number 5 is also complete, believe it or not. With just those numbers, the full address can be discovered when parsed against a database of all possible addresses. Filling in the missing directionals, secondary designator, and ZIP+4 code is trivial when you see each number as a component. Here's what it looks like, fully expanded and standardized:
205 N 1105 W Apt 14
Beverly Hills CA 90210-5221
Address data is not your own
In most countries that provide official address data to licensed vendors, the address data itself belongs to the governing agency. In the US, the USPS owns the addresses. The same is true for Canada Post, Royal Mail, and others, though each country enforces or defines ownership a little differently. Knowing this is important, since it usually forbids reverse-engineering the address database. You have to be careful how to acquire, store, and use the data.
Google Maps is a common go-to for quick address fixes, but the TOS is rather prohibitive; for example, you can't use their data or APIs without showing a Google Map, and for non-commercial purposes only (unless you pay), and you can't store the data (except for temporary caching). Makes sense. Google's data is some of the best in the world. However, Google Maps does not verify the address. If an address does not exist, it will still show you where the address would be if it did exist (try it on your own street; use a house number that you know doesn't exist). This is useful sometimes, but be aware of that.
Nominatim's usage policy is similarly limiting, especially for high volume and commercial use, and the data is mostly drawn from free sources, so it isn't as well maintained (such is the nature of open projects) -- however, this may still suit your needs. It is supported by a great community.
The USPS itself has an API, but it goes down a lot and comes with no guarantees nor support. It might also be hard to use. Some people use it sparingly with no problems. But it's easy to miss that the USPS requires that you use their API only for confirming addresses to ship through them.
People expect addresses to be hard
Unfortunately, we've conditioned our society to expect addresses to be complicated. There's dozens of good UX articles all over the Internet about this, but the fact is, if you have an address form with individual fields, that's what users expect, even though it makes it harder for edge-case addresses that don't fit the format the form is expecting, or maybe the form requires a field it shouldn't. Or users don't know where to put a certain part of their address.
I could go on and on about the bad UX of checkout forms these days, but instead I'll just say that combining the addresses into a single field will be a welcome change -- people will be able to type their address how they see fit, rather than trying to figure out your lengthy form. However, this change will be unexpected and users may find it a little jarring at first. Just be aware of that.
Part of this pain can be alleviated by putting the country field out front, before the address. When they fill out the country field first, you know how to make your form appear. Maybe you have a good way to deal with single-field US addresses, so if they select United States, you can reduce your form to a single field, otherwise show the component fields. Just things to think about!
Now we know why it's hard; what can you do about it?
The USPS licenses vendors through a process called CASS™ Certification to provide verified addresses to customers. These vendors have access to the USPS database, updated monthly. Their software must conform to rigorous standards to be certified, and they don't often require agreement to such limiting terms as discussed above.
There are many CASS-Certified companies that can process lists or have APIs: Melissa Data, Experian QAS, and SmartyStreets to name a few.
(Due to getting flak for "advertising" I've truncated my answer at this point. It's up to you to find a solution that works for you.)
The Truth: Really, folks, I don't work at any of these companies. It's not an advertisement.
There are many street address parsers. They come in two basic flavors - ones that have databases of place names and street names, and ones that don't.
A regular expression street address parser can get up to about a 95% success rate without much trouble. Then you start hitting the unusual cases. The Perl one in CPAN, "Geo::StreetAddress::US", is about that good. There are Python and Javascript ports of that, all open source. I have an improved version in Python which moves the success rate up slightly by handling more cases. To get the last 3% right, though, you need databases to help with disambiguation.
A database with 3-digit ZIP codes and US state names and abbreviations is a big help. When a parser sees a consistent postal code and state name, it can start to lock on to the format. This works very well for the US and UK.
Proper street address parsing starts from the end and works backwards. That's how the USPS systems do it. Addresses are least ambiguous at the end, where country names, city names, and postal codes are relatively easy to recognize. Street names can usually be isolated. Locations on streets are the most complex to parse; there you encounter things such as "Fifth Floor" and "Staples Pavillion". That's when a database is a big help.
UPDATE: Geocode.xyz now works worldwide. For examples see https://geocode.xyz
For USA, Mexico and Canada, see geocoder.ca.
For example:
Input: something going on near the intersection of main and arthur kill rd new york
Output:
<geodata>
<latt>40.5123510000</latt>
<longt>-74.2500500000</longt>
<AreaCode>347,718</AreaCode>
<TimeZone>America/New_York</TimeZone>
<standard>
<street1>main</street1>
<street2>arthur kill</street2>
<stnumber/>
<staddress/>
<city>STATEN ISLAND</city>
<prov>NY</prov>
<postal>11385</postal>
<confidence>0.9</confidence>
</standard>
</geodata>
You may also check the results in the web interface or get output as Json or Jsonp. eg. I'm looking for restaurants around 123 Main Street, New York
No code? For shame!
Here is a simple JavaScript address parser. It's pretty awful for every single reason that Matt gives in his dissertation above (which I almost 100% agree with: addresses are complex types, and humans make mistakes; better to outsource and automate this - when you can afford to).
But rather than cry, I decided to try:
This code works OK for parsing most Esri results for findAddressCandidate and also with some other (reverse)geocoders that return single-line address where street/city/state are delimited by commas. You can extend if you want or write country-specific parsers. Or just use this as case study of how challenging this exercise can be or at how lousy I am at JavaScript. I admit I only spent about thirty mins on this (future iterations could add caches, zip validation, and state lookups as well as user location context), but it worked for my use case: End user sees form that parses geocode search response into 4 textboxes. If address parsing comes out wrong (which is rare unless source data was poor) it's no big deal - the user gets to verify and fix it! (But for automated solutions could either discard/ignore or flag as error so dev can either support the new format or fix source data.)
/*
address assumptions:
- US addresses only (probably want separate parser for different countries)
- No country code expected.
- if last token is a number it is probably a postal code
-- 5 digit number means more likely
- if last token is a hyphenated string it might be a postal code
-- if both sides are numeric, and in form #####-#### it is more likely
- if city is supplied, state will also be supplied (city names not unique)
- zip/postal code may be omitted even if has city & state
- state may be two-char code or may be full state name.
- commas:
-- last comma is usually city/state separator
-- second-to-last comma is possibly street/city separator
-- other commas are building-specific stuff that I don't care about right now.
- token count:
-- because units, street names, and city names may contain spaces token count highly variable.
-- simplest address has at least two tokens: 714 OAK
-- common simple address has at least four tokens: 714 S OAK ST
-- common full (mailing) address has at least 5-7:
--- 714 OAK, RUMTOWN, VA 59201
--- 714 S OAK ST, RUMTOWN, VA 59201
-- complex address may have a dozen or more:
--- MAGICICIAN SUPPLY, LLC, UNIT 213A, MAGIC TOWN MALL, 13 MAGIC CIRCLE DRIVE, LAND OF MAGIC, MA 73122-3412
*/
var rawtext = $("textarea").val();
var rawlist = rawtext.split("\n");
function ParseAddressEsri(singleLineaddressString) {
var address = {
street: "",
city: "",
state: "",
postalCode: ""
};
// tokenize by space (retain commas in tokens)
var tokens = singleLineaddressString.split(/[\s]+/);
var tokenCount = tokens.length;
var lastToken = tokens.pop();
if (
// if numeric assume postal code (ignore length, for now)
!isNaN(lastToken) ||
// if hyphenated assume long zip code, ignore whether numeric, for now
lastToken.split("-").length - 1 === 1) {
address.postalCode = lastToken;
lastToken = tokens.pop();
}
if (lastToken && isNaN(lastToken)) {
if (address.postalCode.length && lastToken.length === 2) {
// assume state/province code ONLY if had postal code
// otherwise it could be a simple address like "714 S OAK ST"
// where "ST" for "street" looks like two-letter state code
// possibly this could be resolved with registry of known state codes, but meh. (and may collide anyway)
address.state = lastToken;
lastToken = tokens.pop();
}
if (address.state.length === 0) {
// check for special case: might have State name instead of State Code.
var stateNameParts = [lastToken.endsWith(",") ? lastToken.substring(0, lastToken.length - 1) : lastToken];
// check remaining tokens from right-to-left for the first comma
while (2 + 2 != 5) {
lastToken = tokens.pop();
if (!lastToken) break;
else if (lastToken.endsWith(",")) {
// found separator, ignore stuff on left side
tokens.push(lastToken); // put it back
break;
} else {
stateNameParts.unshift(lastToken);
}
}
address.state = stateNameParts.join(' ');
lastToken = tokens.pop();
}
}
if (lastToken) {
// here is where it gets trickier:
if (address.state.length) {
// if there is a state, then assume there is also a city and street.
// PROBLEM: city may be multiple words (spaces)
// but we can pretty safely assume next-from-last token is at least PART of the city name
// most cities are single-name. It would be very helpful if we knew more context, like
// the name of the city user is in. But ignore that for now.
// ideally would have zip code service or lookup to give city name for the zip code.
var cityNameParts = [lastToken.endsWith(",") ? lastToken.substring(0, lastToken.length - 1) : lastToken];
// assumption / RULE: street and city must have comma delimiter
// addresses that do not follow this rule will be wrong only if city has space
// but don't care because Esri formats put comma before City
var streetNameParts = [];
// check remaining tokens from right-to-left for the first comma
while (2 + 2 != 5) {
lastToken = tokens.pop();
if (!lastToken) break;
else if (lastToken.endsWith(",")) {
// found end of street address (may include building, etc. - don't care right now)
// add token back to end, but remove trailing comma (it did its job)
tokens.push(lastToken.endsWith(",") ? lastToken.substring(0, lastToken.length - 1) : lastToken);
streetNameParts = tokens;
break;
} else {
cityNameParts.unshift(lastToken);
}
}
address.city = cityNameParts.join(' ');
address.street = streetNameParts.join(' ');
} else {
// if there is NO state, then assume there is NO city also, just street! (easy)
// reasoning: city names are not very original (Portland, OR and Portland, ME) so if user wants city they need to store state also (but if you are only ever in Portlan, OR, you don't care about city/state)
// put last token back in list, then rejoin on space
tokens.push(lastToken);
address.street = tokens.join(' ');
}
}
// when parsing right-to-left hard to know if street only vs street + city/state
// hack fix for now is to shift stuff around.
// assumption/requirement: will always have at least street part; you will never just get "city, state"
// could possibly tweak this with options or more intelligent parsing&sniffing
if (!address.city && address.state) {
address.city = address.state;
address.state = '';
}
if (!address.street) {
address.street = address.city;
address.city = '';
}
return address;
}
// get list of objects with discrete address properties
var addresses = rawlist
.filter(function(o) {
return o.length > 0
})
.map(ParseAddressEsri);
$("#output").text(JSON.stringify(addresses));
console.log(addresses);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<textarea>
27488 Stanford Ave, Bowden, North Dakota
380 New York St, Redlands, CA 92373
13212 E SPRAGUE AVE, FAIR VALLEY, MD 99201
1005 N Gravenstein Highway, Sebastopol CA 95472
A. P. Croll & Son 2299 Lewes-Georgetown Hwy, Georgetown, DE 19947
11522 Shawnee Road, Greenwood, DE 19950
144 Kings Highway, S.W. Dover, DE 19901
Intergrated Const. Services 2 Penns Way Suite 405, New Castle, DE 19720
Humes Realty 33 Bridle Ridge Court, Lewes, DE 19958
Nichols Excavation 2742 Pulaski Hwy, Newark, DE 19711
2284 Bryn Zion Road, Smyrna, DE 19904
VEI Dover Crossroads, LLC 1500 Serpentine Road, Suite 100 Baltimore MD 21
580 North Dupont Highway, Dover, DE 19901
P.O. Box 778, Dover, DE 19903
714 S OAK ST
714 S OAK ST, RUM TOWN, VA, 99201
3142 E SPRAGUE AVE, WHISKEY VALLEY, WA 99281
27488 Stanford Ave, Bowden, North Dakota
380 New York St, Redlands, CA 92373
</textarea>
<div id="output">
</div>
For U.S. address parsing, I prefer using the usaddress package that is available in pip.
python3 -m pip install usaddress
Usage sample:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# address_parser.py
import sys
from usaddress import tag
from json import dumps, loads
if __name__ == '__main__':
tag_mapping = {
'Recipient': 'recipient',
'AddressNumber': 'addressStreet',
'AddressNumberPrefix': 'addressStreet',
'AddressNumberSuffix': 'addressStreet',
'StreetName': 'addressStreet',
'StreetNamePreDirectional': 'addressStreet',
'StreetNamePreModifier': 'addressStreet',
'StreetNamePreType': 'addressStreet',
'StreetNamePostDirectional': 'addressStreet',
'StreetNamePostModifier': 'addressStreet',
'StreetNamePostType': 'addressStreet',
'CornerOf': 'addressStreet',
'IntersectionSeparator': 'addressStreet',
'LandmarkName': 'addressStreet',
'USPSBoxGroupID': 'addressStreet',
'USPSBoxGroupType': 'addressStreet',
'USPSBoxID': 'addressStreet',
'USPSBoxType': 'addressStreet',
'BuildingName': 'addressStreet',
'OccupancyType': 'addressStreet',
'OccupancyIdentifier': 'addressStreet',
'SubaddressIdentifier': 'addressStreet',
'SubaddressType': 'addressStreet',
'PlaceName': 'addressCity',
'StateName': 'addressState',
'ZipCode': 'addressPostalCode',
}
try:
address, _ = tag(' '.join(sys.argv[1:]), tag_mapping=tag_mapping)
except:
with open('failed_address.txt', 'a') as fp:
fp.write(sys.argv[1] + '\n')
print(dumps({}))
else:
print(dumps(dict(address)))
Running address_parser.py:
python3 address_parser.py 9757 East Arcadia Ave. Saugus MA 01906
{"addressStreet": "9757 East Arcadia Ave.", "addressCity": "Saugus", "addressState": "MA", "addressPostalCode": "01906"}
I'm late to the party, but here is an Excel VBA script I wrote years ago for Australia. It can be easily modified to support other Countries. I've made a GitHub repository of the C# code here. I've hosted it on my site and you can download it here: http://jeremythompson.net/Rocks/ParseAddress.xlsm
Strategy
For any country with a PostCode that's numeric or can be matched with a RegEx my strategy works very well:
First we detect the First and Surname which are assumed to be the top line. Its easy to skip the name and start with the address by unticking the checkbox (called 'Name is top row' as shown below).
Next its safe to expect the Address consisting of the Street and Number come before the Suburb and the St, Pde, Ave, Av, Rd, Cres, loop, etc is a separator.
Detecting the Suburb vs the State and even Country can trick the most sophisticated parsers as there can be conflicts. To overcome this I use a PostCode look up based on the fact that after stripping Street and Apartment/Unit numbers as well as the PoBox,Ph,Fax,Mobile etc, only the PostCode number will remain. This is easy to match with a regEx to then look up the suburb(s) and country.
Your National Post Office Service will provide a list of post codes with Suburbs and States free of charge that you can store in an excel sheet, db table, text/json/xml file, etc.
Finally, since some Post Codes have multiple Suburbs we check which suburb appears in the Address.
Example
VBA Code
DISCLAIMER, I know this code is not perfect, or even written well however its very easy to convert to any programming language and run in any type of application. The strategy is the answer depending on your country and rules, take this code as an example:
Option Explicit
Private Const TopRow As Integer = 0
Public Sub ParseAddress()
Dim strArr() As String
Dim sigRow() As String
Dim i As Integer
Dim j As Integer
Dim k As Integer
Dim Stat As String
Dim SpaceInName As Integer
Dim Temp As String
Dim PhExt As String
On Error Resume Next
Temp = ActiveSheet.Range("Address")
'Split info into array
strArr = Split(Temp, vbLf)
'Trim the array
For i = 0 To UBound(strArr)
strArr(i) = VBA.Trim(strArr(i))
Next i
'Remove empty items/rows
ReDim sigRow(LBound(strArr) To UBound(strArr))
For i = LBound(strArr) To UBound(strArr)
If Trim(strArr(i)) <> "" Then
sigRow(j) = strArr(i)
j = j + 1
End If
Next i
ReDim Preserve sigRow(LBound(strArr) To j)
'Find the name (MUST BE ON THE FIRST ROW UNLESS CHECKBOX UNTICKED)
i = TopRow
If ActiveSheet.Shapes("chkFirst").ControlFormat.Value = 1 Then
SpaceInName = InStr(1, sigRow(i), " ", vbTextCompare) - 1
If ActiveSheet.Shapes("chkConfirm").ControlFormat.Value = 0 Then
ActiveSheet.Range("FirstName") = VBA.Left(sigRow(i), SpaceInName)
Else
If MsgBox("First Name: " & VBA.Mid$(sigRow(i), 1, SpaceInName), vbQuestion + vbYesNo, "Confirm Details") = vbYes Then ActiveSheet.Range("FirstName") = VBA.Left(sigRow(i), SpaceInName)
End If
If ActiveSheet.Shapes("chkConfirm").ControlFormat.Value = 0 Then
ActiveSheet.Range("Surname") = VBA.Mid(sigRow(i), SpaceInName + 2)
Else
If MsgBox("Surame: " & VBA.Mid(sigRow(i), SpaceInName + 2), vbQuestion + vbYesNo, "Confirm Details") = vbYes Then ActiveSheet.Range("Surname") = VBA.Mid(sigRow(i), SpaceInName + 2)
End If
sigRow(i) = ""
End If
'Find the Street by looking for a "St, Pde, Ave, Av, Rd, Cres, loop, etc"
For i = 1 To UBound(sigRow)
If Len(sigRow(i)) > 0 Then
For j = 0 To 8
If InStr(1, VBA.UCase(sigRow(i)), Street(j), vbTextCompare) > 0 Then
'Find the position of the street in order to get the suburb
SpaceInName = InStr(1, VBA.UCase(sigRow(i)), Street(j), vbTextCompare) + Len(Street(j)) - 1
'If its a po box then add 5 chars
If VBA.Right(Street(j), 3) = "BOX" Then SpaceInName = SpaceInName + 5
If ActiveSheet.Shapes("chkConfirm").ControlFormat.Value = 0 Then
ActiveSheet.Range("Street") = VBA.Mid(sigRow(i), 1, SpaceInName)
Else
If MsgBox("Street Address: " & VBA.Mid(sigRow(i), 1, SpaceInName), vbQuestion + vbYesNo, "Confirm Details") = vbYes Then ActiveSheet.Range("Street") = VBA.Mid(sigRow(i), 1, SpaceInName)
End If
'Trim the Street, Number leaving the Suburb if its exists on the same line
sigRow(i) = VBA.Mid(sigRow(i), SpaceInName) + 2
sigRow(i) = Replace(sigRow(i), VBA.Mid(sigRow(i), 1, SpaceInName), "")
GoTo PastAddress:
End If
Next j
End If
Next i
PastAddress:
'Mobile
For i = 1 To UBound(sigRow)
If Len(sigRow(i)) > 0 Then
For j = 0 To 3
Temp = Mb(j)
If VBA.Left(VBA.UCase(sigRow(i)), Len(Temp)) = Temp Then
If ActiveSheet.Shapes("chkConfirm").ControlFormat.Value = 0 Then
ActiveSheet.Range("Mobile") = VBA.Mid(sigRow(i), Len(Temp) + 2)
Else
If MsgBox("Mobile: " & VBA.Mid(sigRow(i), Len(Temp) + 2), vbQuestion + vbYesNo, "Confirm Details") = vbYes Then ActiveSheet.Range("Mobile") = VBA.Mid(sigRow(i), Len(Temp) + 2)
End If
sigRow(i) = ""
GoTo PastMobile:
End If
Next j
End If
Next i
PastMobile:
'Phone
For i = 1 To UBound(sigRow)
If Len(sigRow(i)) > 0 Then
For j = 0 To 1
Temp = Ph(j)
If VBA.Left(VBA.UCase(sigRow(i)), Len(Temp)) = Temp Then
'TODO: Detect the intl or national extension here.. or if we can from the postcode.
If ActiveSheet.Shapes("chkConfirm").ControlFormat.Value = 0 Then
ActiveSheet.Range("Phone") = VBA.Mid(sigRow(i), Len(Temp) + 3)
Else
If MsgBox("Phone: " & VBA.Mid(sigRow(i), Len(Temp) + 3), vbQuestion + vbYesNo, "Confirm Details") = vbYes Then ActiveSheet.Range("Phone") = VBA.Mid(sigRow(i), Len(Temp) + 3)
End If
sigRow(i) = ""
GoTo PastPhone:
End If
Next j
End If
Next i
PastPhone:
'Email
For i = 1 To UBound(sigRow)
If Len(sigRow(i)) > 0 Then
'replace with regEx search
If InStr(1, sigRow(i), "#", vbTextCompare) And InStr(1, VBA.UCase(sigRow(i)), ".CO", vbTextCompare) Then
Dim email As String
email = sigRow(i)
email = Replace(VBA.UCase(email), "EMAIL:", "")
email = Replace(VBA.UCase(email), "E-MAIL:", "")
email = Replace(VBA.UCase(email), "E:", "")
email = Replace(VBA.UCase(Trim(email)), "E ", "")
email = VBA.LCase(email)
If ActiveSheet.Shapes("chkConfirm").ControlFormat.Value = 0 Then
ActiveSheet.Range("Email") = email
Else
If MsgBox("Email: " & email, vbQuestion + vbYesNo, "Confirm Details") = vbYes Then ActiveSheet.Range("Email") = email
End If
sigRow(i) = ""
Exit For
End If
End If
Next i
'Now the only remaining items will be the postcode, suburb, country
'there shouldn't be any numbers (eg. from PoBox,Ph,Fax,Mobile) except for the Post Code
'Join the string and filter out the Post Code
Temp = Join(sigRow, vbCrLf)
Temp = Trim(Temp)
For i = 1 To Len(Temp)
Dim postCode As String
postCode = VBA.Mid(Temp, i, 4)
'In Australia PostCodes are 4 digits
If VBA.Mid(Temp, i, 1) <> " " And IsNumeric(postCode) Then
If ActiveSheet.Shapes("chkConfirm").ControlFormat.Value = 0 Then
ActiveSheet.Range("PostCode") = postCode
Else
If MsgBox("Post Code: " & postCode, vbQuestion + vbYesNo, "Confirm Details") = vbYes Then ActiveSheet.Range("PostCode") = postCode
End If
'Lookup the Suburb and State based on the PostCode, the PostCode sheet has the lookup
Dim mySuburbArray As Range
Set mySuburbArray = Sheets("PostCodes").Range("A2:B16670")
Dim suburbs As String
For j = 1 To mySuburbArray.Columns(1).Cells.Count
If mySuburbArray.Cells(j, 1) = postCode Then
'Check if the suburb is listed in the address
If InStr(1, UCase(Temp), mySuburbArray.Cells(j, 2), vbTextCompare) > 0 Then
'Set the Suburb and State
ActiveSheet.Range("Suburb") = mySuburbArray.Cells(j, 2)
Stat = mySuburbArray.Cells(j, 3)
ActiveSheet.Range("State") = Stat
'Knowing the State - for Australia we can get the telephone Ext
PhExt = PhExtension(VBA.UCase(Stat))
ActiveSheet.Range("PhExt") = PhExt
'remove the phone extension from the number
Dim prePhone As String
prePhone = ActiveSheet.Range("Phone")
prePhone = Replace(prePhone, PhExt & " ", "")
prePhone = Replace(prePhone, "(" & PhExt & ") ", "")
prePhone = Replace(prePhone, "(" & PhExt & ")", "")
ActiveSheet.Range("Phone") = prePhone
Exit For
End If
End If
Next j
Exit For
End If
Next i
End Sub
Private Function PhExtension(ByVal State As String) As String
Select Case State
Case Is = "NSW"
PhExtension = "02"
Case Is = "QLD"
PhExtension = "07"
Case Is = "VIC"
PhExtension = "03"
Case Is = "NT"
PhExtension = "04"
Case Is = "WA"
PhExtension = "05"
Case Is = "SA"
PhExtension = "07"
Case Is = "TAS"
PhExtension = "06"
End Select
End Function
Private Function Ph(ByVal Num As Integer) As String
Select Case Num
Case Is = 0
Ph = "PH"
Case Is = 1
Ph = "PHONE"
'Case Is = 2
'Ph = "P"
End Select
End Function
Private Function Mb(ByVal Num As Integer) As String
Select Case Num
Case Is = 0
Mb = "MB"
Case Is = 1
Mb = "MOB"
Case Is = 2
Mb = "CELL"
Case Is = 3
Mb = "MOBILE"
'Case Is = 4
'Mb = "M"
End Select
End Function
Private Function Fax(ByVal Num As Integer) As String
Select Case Num
Case Is = 0
Fax = "FAX"
Case Is = 1
Fax = "FACSIMILE"
'Case Is = 2
'Fax = "F"
End Select
End Function
Private Function State(ByVal Num As Integer) As String
Select Case Num
Case Is = 0
State = "NSW"
Case Is = 1
State = "QLD"
Case Is = 2
State = "VIC"
Case Is = 3
State = "NT"
Case Is = 4
State = "WA"
Case Is = 5
State = "SA"
Case Is = 6
State = "TAS"
End Select
End Function
Private Function Street(ByVal Num As Integer) As String
Select Case Num
Case Is = 0
Street = " ST"
Case Is = 1
Street = " RD"
Case Is = 2
Street = " AVE"
Case Is = 3
Street = " AV"
Case Is = 4
Street = " CRES"
Case Is = 5
Street = " LOOP"
Case Is = 6
Street = "PO BOX"
Case Is = 7
Street = " STREET"
Case Is = 8
Street = " ROAD"
Case Is = 9
Street = " AVENUE"
Case Is = 10
Street = " CRESENT"
Case Is = 11
Street = " PARADE"
Case Is = 12
Street = " PDE"
Case Is = 13
Street = " LANE"
Case Is = 14
Street = " COURT"
Case Is = 15
Street = " BLVD"
Case Is = 16
Street = "P.O. BOX"
Case Is = 17
Street = "P.O BOX"
Case Is = 18
Street = "PO BOX"
Case Is = 19
Street = "POBOX"
End Select
End Function
How would I go about displaying detailed distance between words.
For example, the output of the program could be:
Words are "car" and "cure":
Replace "a" with "u".
Add "e".
The Levenshtein distance does not fulfill my needs (I think).
Try the following. The algorithm is roughly following Wikipedia (Levenshtein distance). The language used below is ruby
Use as an example, the case of changing s into t as follows:
s = 'Sunday'
t = 'Saturday'
First, s and t are turned into arrays, and an empty string is inserted at the beginning. m will eventually be the matrix used in the argorithm.
s = ['', *s.split('')]
t = ['', *t.split('')]
m = Array.new(s.length){[]}
m here, however, is different from the matrix given if the algorithm in wikipedia for the fact that each cell includes not only the Levenshtein distance, but also the (non-)operation (starting, doing nothing, deletion, insertion, or substitution) that was used to get to that cell from an adjacent (left, up, or upper-left) cell. It may also include a string describing the parameters of the operation. That is, the format of each cell is:
[Levenshtein distance, operation(, string)]
Here is the main routine. It fills in the cells of m following the algorithm:
s.each_with_index{|a, i| t.each_with_index{|b, j|
m[i][j] =
if i.zero?
[j, "started"]
elsif j.zero?
[i, "started"]
elsif a == b
[m[i-1][j-1][0], "did nothing"]
else
del, ins, subs = m[i-1][j][0], m[i][j-1][0], m[i-1][j-1][0]
case [del, ins, subs].min
when del
[del+1, "deleted", "'#{a}' at position #{i-1}"]
when ins
[ins+1, "inserted", "'#{b}' at position #{j-1}"]
when subs
[subs+1, "substituted", "'#{a}' at position #{i-1} with '#{b}'"]
end
end
}}
Now, we set i, j to the bottom-right corner of m and follow the steps backwards as we unshift the contents of the cell into an array called steps, until we reach the start.
i, j = s.length-1, t.length-1
steps = []
loop do
case m[i][j][1]
when "started"
break
when "did nothing", "substituted"
steps.unshift(m[i-=1][j-=1])
when "deleted"
steps.unshift(m[i-=1][j])
when "inserted"
steps.unshift(m[i][j-=1])
end
end
Then we print the operation and the string of each step unless that is a non-operation.
steps.each do |d, op, str=''|
puts "#{op} #{str}" unless op == "did nothing" or op == "started"
end
With this particular example, it will output:
inserted 'a' at position 1
inserted 't' at position 2
substituted 'n' at position 2 with 'r'
class Solution:
def solve(self, text, word0, word1):
word_list = text.split()
ans = len(word_list)
L = None
for R in range(len(word_list)):
if word_list[R] == word0 or word_list[R] == word1:
if L is not None and word_list[R] != word_list[L]:
ans = min(ans, R - L - 1)
L = R
return -1 if ans == len(word_list) else ans
ob = Solution()
text = "cat dog abcd dog cat cat abcd dog wxyz"
word0 = "abcd"
word1 = "wxyz"
print(ob.solve(text, word0, word1))