Extract URLs automatically by industry names - url

So I searched a lot about this, but all I found was how to extract titles from URLs.
What I want for example is, I have this name "AB Electronics Inc."
So when you type that in google the first thing that pops up is www.abelectronicsinc.com/
That's all I want, I want to know how to automate this, because I have like 1000 names, I dont want to type all these names one by one. I have a text file with all these names. Like:
ABIOMED, Inc.
Accumold®
Accuratus Lab Services
Accutron Inc.
Acme Monaco
Acorn Product Development
And SO on.
So how do I find the url for those names directly is what I wish to know.
Thank you

Related

How to design a system in Google Sheets that allows for people who don't speak the same language to know they're typing the same thing

I admit this is a strange request. Essentially myself and another person who speaks Mandarin need to work on scheduling asynchronously through a spreadsheet. If either of us enters something in, in our respective sections, it should update the other person's section to match. So If I changed Order 1 on Day 1 from Apple to Butter, it should look at the translated text for Butter in Chinese and update the dropdown list entry for Order 1 on Day 1 from Apple to Butter
Unfortunately it doesn't seem like there's anyway to add formulas to dropdown lists. Any advice here?
I created a super simplified spreadsheet of what I'm looking for Spreadsheet
there is a GOOGLETRANSLATE formula:
also, you have DETECTLANGUAGE that outputs the language code:
both of them (DETECTLANGUAGE is able to work with vertical arrays only) are not supported under ARRAYFORMULA so you will need to drag them around. also, it's worth mentioning that formulae are always 1-directional so you can have a dropdown to be translated but that translated output can't be used directly as the input for back-translation creating a paradox. with a scripted solution, you may have more flexibility tho.

Combining multiple spreadsheet files in to one master overview (read only)

Let me start by saying that I know too little about coding etc to translate some of the solutions given on this platform to solve my issue. So hopefully someone can help me get started..
I am trying to combine a certain section of multiple google spreadsheet files with multiple tabs into one file. The name and number of the various tabs are different (and change over time).
To explain. We have for various person an overview of their projects (each project on its own tab). Each project/tab contains a number of to do's. What I need to achieve is to import al the to do's to a master list so that we have 1 master overview (basically a big to do list that I can sort on date).
Two exmples with dummy information. The relevant information starts on line 79
https://docs.google.com/spreadsheets/d/1FsQd9sKaAG7hKynVIR3sxqx6_yR2_hCMQWAWsOr4tj0/edit?usp=sharing
https://docs.google.com/spreadsheets/d/155J24uQpRC7uGvZEhQdkiSBnYU28iodAn-zR7rUhg1o/edit?usp=sharing
Since this information is dynamic and you are restricted from using app script, you can create a "definitions" or "parameters" sheet where the person must either report the NAMES of their projects and the ROW the tasks starts on and total length. From there you can use importrange function to get their definitions. From the definitions you can use other import range functions to get their tasks list. Concatenating it is gonna be a pretty big issue for you though.
This unfortunately would be much easier for you to accomplish with a different architecture to your docs / sheets. The more a spreadsheet looks like a database (column heads and rows of data that match those headers), the easier they are to work with. The more they look like forms / paper worksheets the more code you would need to parse that format.

How can I smartly extract information from an HTML page?

I am building something that can more or less extract key information from an arbitrary web site. For example, if I crawled a McDonalds page and wanted to figure out programatically the opening and closing time of McDonalds, what is an intelligent way to do it?
In a general case, maybe I also want to find out whether McDonalds sells chicken wings, or the address of McDonalds.
What I am thinking is that I will have a specific case for time, wings, and address and have code that is unique for each of those 3 cases.
But I am not sure how I can approach this. I have the sites crawled and HTML and related information parsed into JSON already. My current approach is something like finding the title tag and checking if the title tag contains key words like address or location, etc. If the title contains those key words, then I will look through the current page and identify chunks of content that resemble an address, such as content that are cities or countries or content that has the word St or Street inside.
I am wondering if there is a better approach to look for key data, and looking for a nicer starting point or bounce some ideas and whatnot. Or even if there are good articles to read about this would be great as well.
Let me know if this is unclear.
Thanks for the help.
In order to parse such HTML pages you have to have knowlege about their structure. There's no general solution for this problem. Each webpage needs its own solution. However, a good approach would be to ensure the HTML code is valid XML too and then use XPath to access elements at known positions. Maybe there's even an XPath like solution for standard HTML (which is not always valid xml). This way you can define a set of XPaths for each page which give you the specific elements if they exist.

Localized country names

Where can I get the country names in all languages? I need these to localize an application.
The proper location to get this information from is CLDR - Unicode Common Locale Data Repository.
There you can find an updated list of countries (core/common/main), the data is available in numerous formats.
I recommend this site: https://github.com/umpirsky/country-list
List of all countries with names and ISO 3166-1 codes in all languages
and data formats.
There's probably an ISO standard document you can buy (a useful standard is ISO 3166-1, I think).
On the other hand, you might just be able to scrape through the various language versions of this wikipedia page, since it has a list of country names. I did a random check and it seemed the entire list was in at least one non-English language, too.
A know this is an old post but I found something that might help others who end up viewing this post via a google search.
This alternative to a select list gives (some) localised country names.
selectToAutocomplete by Jamie Appleseed
Take a look at the data-alternate-spelling tag for the items within the select menu.
IP2Location provide a free CSV formatted list of country names in 81 different languages. I've found this the most useful list for this purpose. The data can be fairy easily transformed into different formats if required:
https://www.ip2location.com/free/country-multilingual

User input parsing - city / state / zipcode / country

I'm looking for advice on parsing input from a user in multiple combinations of City / State / Zip Code / Country.
A common example would be what Google maps does.
Some examples of input would be:
"City, State, Country"
"City, Country"
"City, Zip Code, Country"
"City, State, Zip Code"
"Zip Code"
What would be an efficient and correct way to parse this input from a user?
If you are aware of any example implementations please share :)
The first step would be to break up the text into individual tokens using spaces or commas as the delimiting characters. For scalability, you can then hand each token to a thread or server (if using a Map-Reducer like architecture) to figure out what each token is. For instance,
If we have numbers in the pattern, then it's probably a zip code.
Is the item in the list of known states?
Countries are also fairly easy to handle like states, there's a limited number.
What order are the tokens in compared to the common ways of writing an address? Most input will probably follow the local post office custom for address formats.
Once you have the individual token results, you can glue the parts back together to get a full address. In the cases where there are questions, you can prompt the user what they really meant (like Google maps) and add that information to a learned list.
The easiest method to add that support to an applications, assuming you're not trying to build a map system, is to query Google or Yahoo and ask them to parse the date for you.
I am myself very fascinated with how Google handles that. I do not remember seeing anything similar anywhere else.
I believe, you try to separate an input string in words trying various delimeters - space, comma, semicolon etc. Then you have several combinations. For each combination, you take each words and match it against country, city, town, postal code database. Then you define some metric on how to evaluate the group match result for each combination. Here should also be cross rules, like if the postal code does not match well, but country, city, town match well and in combination refer to a valid address then the metric yields a high mark.
It is sure difficult and not an evening code exercise. It also requires strong computational resources - a shared hosting would probably crack under just 10 requests, but a data center could serve it well.
Not sure if there is an example implementation. Many geographical services are offered on paid basis. Something that sophisticated as GoogleMaps would likely cost a fortune.
Correct me if I'm wrong.
I found a simple PHP implementation
http://www.eotz.com/2008/07/parsing-location-string-php/
Yahoo seems to have a webservice that offers the functionality (sort of)
http://developer.yahoo.com/geo/placemaker/
Openstreetmap seems to offer the same search functionality on its homepage
http://www.openstreetmap.org/
Assuming you're only dealing with those four fields (City Zip State Country), there are finite values for all fields except for City, and even that I guess if you have a big city list is also finite. So just split each field by comma then check against each field list.
Assuming we're talking US addresses-
Zip is most obvious, so check for
that first.
State has 50x2 options
(California or CA), check that next
Country has ~190x2 options, depending
on how encompassing you want to be
(US, United States, USA).
Whatever is left over is probably your City.
As far as efficiency goes, it might make sense to check a handful of 'standard' formats first, like Dan suggests.

Resources