Scrape from website and save into different columns in a spreadsheet [closed] - google-sheets

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 11 months ago.
Suppose there's a website that lists details of several companies, for example name, HQ area, turnover, etc. How do I scrape that data and fill it into different columns (like name and turnover), with each row holding the details of a separate company?

Google Sheets lets you import HTML tables or lists with the IMPORTHTML(url, query, index) function.
For example, take the Wikipedia page List of largest companies by revenue.
We want the data from the main table, so the first thing we have to do is find out which index it occupies on the page. To do this, we can run document.querySelectorAll('table') (or its console shorthand $$('table')) in the browser's developer console. As you can see from the result, the table we want is the fifth one on the page; the console array is zero-based, so it shows up at index 4, while IMPORTHTML counts from 1. Inside our Google Sheet we can therefore use:
=IMPORTHTML("https://en.wikipedia.org/wiki/List_of_largest_companies_by_revenue","table",5)
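To avoid counting tables by eye and mixing up zero-based and one-based positions, a small helper like the one below can be pasted into the console. It is a minimal sketch: the function name and the "Revenue" text used to recognise the table are assumptions, and the number only matches IMPORTHTML if the page serves the same tables without running JavaScript, since the live DOM can contain tables that are not in the fetched HTML.

```javascript
// Hypothetical helper: return the 1-based index IMPORTHTML expects for the
// first element that satisfies `matches`, or -1 if nothing matches.
function importHtmlIndex(elements, matches) {
  // Array.from accepts a NodeList as well as a plain array; findIndex is zero-based.
  const zeroBased = Array.from(elements).findIndex(matches);
  // IMPORTHTML counts from 1, so shift by one.
  return zeroBased === -1 ? -1 : zeroBased + 1;
}

// In the browser console (the "Revenue" text is an assumption about the page):
// importHtmlIndex(document.querySelectorAll("table"),
//                 t => t.textContent.includes("Revenue"));
```

Matching on text returns the first table that contains it, which may be a small infobox rather than the main table, so check the result against the page before putting it in the formula.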
If the data on your site is in a list rather than a table, change the query parameter to "list" and find the list's index using the method described above. Alternatively, you can always use IMPORTXML(url, xpath_query): if you know the XPath of the information, you can build a similar solution.
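If neither import function can read the site (for example, when the data may be rendered by JavaScript), you might end up scraping it yourself, say with Apps Script's UrlFetchApp. Whatever does the scraping, the last step is the same: one row per company and one column per field. The sketch below, in plain JavaScript (the language Apps Script uses), shows only that reshaping step; the field names name, hq and turnover are hypothetical.

```javascript
// Turn scraped company records into the rectangular 2D array that
// Range.setValues() expects: one header row, then one row per company.
// The default column names are hypothetical; use whatever your scraper produces.
function toSheetRows(companies, columns = ["name", "hq", "turnover"]) {
  const header = columns.slice();
  const body = companies.map(c =>
    // Missing fields become "" so every row has the same length.
    columns.map(col => (c[col] === undefined ? "" : c[col]))
  );
  return [header, ...body];
}
```

In Apps Script the result could then be written in one call, e.g. sheet.getRange(1, 1, rows.length, rows[0].length).setValues(rows), where sheet is the Sheet you are writing to. setValues requires the array to match the range's dimensions, which is why missing fields are filled with empty strings.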

Related

Data Validation - List from range [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 12 months ago.
I have two sheets. The first, Listings, holds items that need to be assigned to a master category and, IF AVAILABLE, a subcategory, so it has two columns: one for the master category and one for the subcategory.
The second sheet, Categories, also has two columns: the first lists the master categories and the second the subcategories, some of which are empty.
I added data validation to the master category column on my Listings sheet, so I can select the master category from a dropdown list. What I would like is a dropdown for the subcategory that pulls the subcategories from the Categories sheet, but ONLY the ones that belong to the selected master category.
I tried to do it with a LOOKUP formula, but the same formula that works in the sheet returns a "Please enter a valid range" error in the data validation.
Here is an example:
https://docs.google.com/spreadsheets/d/1ukmc8T1jDsxWy5aQJZVU3q2YNJvUxaATOzvAvO7WdKA/edit?usp=sharing
OK, I figured it out; I was taking the wrong approach. I created a data validation for the first column that pulls all the categories, and in the second column I added a VLOOKUP that returns the matching master category (if available).
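Both the dependent list originally asked for and the VLOOKUP-style lookup used in the answer come down to simple filtering over the Categories rows. A minimal sketch in plain JavaScript (the language of Apps Script), assuming the Categories sheet is read as [master, subcategory] rows with empty strings for blank cells, which is how Range.getValues() returns them; the function names are hypothetical.

```javascript
// Rows as read from the Categories sheet: [masterCategory, subcategory],
// with "" for blank cells.

// Dependent list: the subcategories of one master category, without blanks
// or duplicates, in sheet order.
function subcategoriesFor(categories, master) {
  const subs = categories
    .filter(([m, sub]) => m === master && sub !== "")
    .map(([, sub]) => sub);
  return [...new Set(subs)];
}

// Reverse lookup, as in the accepted approach: the master category of a
// chosen subcategory, or null where VLOOKUP would show #N/A.
function masterFor(categories, subcategory) {
  if (subcategory === "") return null; // a blank choice matches nothing
  const hit = categories.find(([, sub]) => sub === subcategory);
  return hit ? hit[0] : null;
}
```

In Apps Script, the list from subcategoriesFor could be turned into a dropdown with SpreadsheetApp.newDataValidation().requireValueInList(list, true).build() and applied with range.setDataValidation(rule), typically from an onEdit trigger so it refreshes when the master category changes; treat that wiring as an untested outline.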

How to consolidate data in google sheet using query function [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
Hello,
I ran into a problem while combining several sheets (tabs) into one master sheet. For example, I have three sheets named Cola, Pepsi and Thumbsup, and I consolidated all their results in my Master sheet using a QUERY formula with a null check. Whenever anyone enters data in the Cola, Pepsi or Thumbsup sheet, it shows up in my master sheet somewhere in between the existing rows. I need new entries to come in queue order (added at the end). Is that possible?
The problem is that I write a remark on my master sheet for each entry, so whenever someone adds new data to Cola, Pepsi or Thumbsup, it is inserted in between the existing data and breaks the alignment of my remarks.
For example, the remark I wrote for Cola 103 shifts to Cola 102 when someone enters new data in the Cola sheet.
QUERY only displays the values, so it is practically impossible to control the row order the way you describe. I recommend using Apps Script (a macro) instead: add rows with appendRow, copy values with getValues and setValues, or use getRange().copyTo() with a paste-values option, so the data is stored as static values instead of just being displayed by a formula.
If each row of your data has a timestamp, sorting by the date and time added might work. You only need to sort on that one column, e.g. in the QUERY: ... order by A asc (where A is the timestamp column).
Anyway, I still recommend using Apps Script (a macro).
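The append-only idea can be sketched in plain JavaScript (the language of Apps Script). The layout is a hypothetical assumption: the first cell of each source row is an ID that is unique within its sheet, and each master row starts with [sheet name, ID]. Rows already on the master keep their position, and with it the remark next to them; only entries not seen before are returned, to be added at the bottom.

```javascript
// Append-only consolidation sketch (hypothetical layout):
//   master rows: [sheetName, id, ...data, remark]
//   source rows: [id, ...data]
// Existing master rows are never moved; new entries are returned for appending.
function newRowsToAppend(masterRows, sources) {
  // Keys like "Cola|103" for every entry already copied to the master.
  const seen = new Set(masterRows.map(r => `${r[0]}|${r[1]}`));
  const toAppend = [];
  // Object.entries keeps the order in which the sheets were listed.
  for (const [sheetName, rows] of Object.entries(sources)) {
    for (const row of rows) {
      const key = `${sheetName}|${row[0]}`;
      if (row[0] === "" || seen.has(key)) continue; // skip blank or already-copied rows
      seen.add(key);
      toAppend.push([sheetName, ...row]);
    }
  }
  return toAppend;
}
```

Each returned row could then be written with sheet.appendRow(row). New rows are collected sheet by sheet, so if the true entry order across Cola, Pepsi and Thumbsup matters, sort them by a timestamp column before appending.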

How can I list each row that matches a string, set by a dropdown? Filter? VLookup? Query? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
https://docs.google.com/spreadsheets/d/1mvBFBuRUl2qiPktE1Y4lCTeaSK4BwJ7S4Nf0S33LTRg/edit?usp=sharing
On Sheet2 I want to show all the columns of every row that matches a specific string chosen from a data-validation dropdown. I tried VLOOKUP, but it only returns the first match; I want every matching row, with all of its columns.
I put in an example of what I want the result to look like, but I'm not sure whether VLOOKUP, QUERY, FILTER or something else is what I need.
In your example spreadsheet, on Sheet2, delete everything in the range F3:I and then enter this in F3:
=query(A:D, "where D='"&F2&"'",1)
If you want to reference the data on the Draft Board sheet instead, you'll have to use:
=query('Draft Board'!A:D, "where D='"&F2&"'",1)
See if that works?
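The logic of that QUERY can be spelled out in plain JavaScript: keep the one header row (the final 1 argument), then every row whose column D equals the dropdown value. This is only a sketch of the filtering the formula performs, not a replacement for it; the function name is hypothetical.

```javascript
// Mirror of =query(A:D, "where D='"&F2&"'", 1):
// keep `headers` header rows, then every data row whose column D
// (index 3) equals the selected value.
function rowsMatching(rows, selected, headers = 1) {
  const header = rows.slice(0, headers);
  const body = rows.slice(headers).filter(r => r[3] === selected);
  return header.concat(body);
}
```

One caveat with the formula itself: the value from F2 is pasted into the query string between single quotes, so a selection that contains an apostrophe will break the query.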

How can you map IDs even when duplicates are eliminated? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have a large data set in SAS (more than 100,000 records), each with about 60 columns of data, including an ID number. Many of the ID numbers are duplicated. My goal is to convert my current ID numbers into another ID format using a piece of software I have. The problem is that when I feed the IDs into the software, the converted output comes back without the duplicates, which I need. Is there any way to use the output IDs to create a list of output IDs that keeps the duplicates the original data set had? Any language or software would be fine.
Here is an illustration of what I described above:
Original IDs: 086516 677240 449370 677240 941053 449370
Output: 147244 147947 147957 148021
Preferred Output: 147244 147947 147957 147947 148021 147957
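You can merge on the ID with a DATA step MERGE statement; it attaches the converted ID to every record that has the same original ID. For this to work, the converted data set (newIDs below) must contain both the original id and the converted ID, and both data sets must be sorted by id.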
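proc sort data=have; by id; run;
proc sort data=newIDs; by id; run;

/* One-to-many merge: every record in HAVE gets the new ID for its id */
data want;
  merge have(in=a) newIDs(in=b);
  by id;
  if a and b; /* keep only IDs found in both data sets */
run;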
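A BY-group MERGE returns the records in ID order. If you would rather rebuild the list in the original order, a lookup table does the same job outside SAS. This minimal JavaScript sketch assumes (hypothetically) that the conversion software returns one new ID per distinct input ID, in the order each ID first appears, which is consistent with the example above; if that is not guaranteed, you need original/new pairs from the software instead.

```javascript
// Rebuild the converted list with duplicates, in the original order.
// Assumption: convertedUnique[i] is the new ID for the i-th distinct
// original ID, counted in order of first appearance.
function expandConvertedIds(originalIds, convertedUnique) {
  // Distinct IDs in first-appearance order (a Set preserves insertion order).
  const distinct = [...new Set(originalIds)];
  if (distinct.length !== convertedUnique.length) {
    throw new Error(`expected ${distinct.length} converted IDs, got ${convertedUnique.length}`);
  }
  // Lookup table: original ID -> converted ID.
  const lookup = new Map(distinct.map((id, i) => [id, convertedUnique[i]]));
  // Map every original record, duplicates included.
  return originalIds.map(id => lookup.get(id));
}

// IDs are kept as strings so the leading zero in "086516" survives.
const original = ["086516", "677240", "449370", "677240", "941053", "449370"];
const output = ["147244", "147947", "147957", "148021"];
console.log(expandConvertedIds(original, output).join(" "));
// 147244 147947 147957 147947 148021 147957
```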

A program that gathers search results from Google and Yahoo [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
I want to search Google and Yahoo for forum and blog posts limited to a specific country. The results will be saved to a database for sorting and further processing.
From each search result, I need:
the URL itself
date and time
the domain
I am working on a program that accepts keywords as input, automatically searches Google and Yahoo, and saves the results to a database.
function OnLoad() {
// Create a search control
var searchControl = new google.search.SearchControl();
// Add in a full set of searchers
var localSearch = new google.search.LocalSearch();
searchControl.addSearcher(localSearch);
searchControl.addSearcher(new google.search.WebSearch());
searchControl.addSearcher(new google.search.VideoSearch());
searchControl.addSearcher(new google.search.BlogSearch());
searchControl.addSearcher(new google.search.NewsSearch());
searchControl.addSearcher(new google.search.ImageSearch());
searchControl.addSearcher(new google.search.BookSearch());
searchControl.addSearcher(new google.search.PatentSearch());
// Set the Local Search center point
localSearch.setCenterPoint("New York, NY");
// tell the searcher to draw itself and tell it where to attach
searchControl.draw(document.getElementById("searchcontrol"));
// execute an initial search
searchControl.execute("VW GTI");
}
google.setOnLoadCallback(OnLoad);
This code is from the Google AJAX Search API; however, there seems to be no way to specify the domain, country, date or time as search criteria. Moreover, it returns the results as HTML, which is hard to slice up and save as individual search-result entries in the database.
EDITED to describe my specific problem.
Parsing the raw HTML should be your last resort here. If they change the markup, you will have to rewrite your parser, and that is pretty much guaranteed to happen within the "3 years" time period you mentioned for Google's AJAX Search API.
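Once results come from a structured source rather than rendered HTML, extracting the fields the question lists is straightforward. A minimal JavaScript sketch: the input shape (objects with a link property) is an assumption, and because a post's own date is not generally available in a result, it records the retrieval time instead.

```javascript
// Turn structured search results (hypothetical shape: { link, ... }) into
// database-ready records with the URL, its domain and a timestamp.
function toResultRows(results, retrievedAt = new Date()) {
  const rows = [];
  for (const r of results) {
    let url;
    try {
      url = new URL(r.link); // throws a TypeError on malformed URLs
    } catch {
      continue; // skip entries without a valid URL
    }
    rows.push({
      url: url.href,
      domain: url.hostname, // e.g. "forum.example.de"
      retrievedAt: retrievedAt.toISOString(), // retrieval time, not post date
    });
  }
  return rows;
}
```

The domain comes from the standard URL class, so no hand-written string parsing is needed, and the records map directly onto table columns.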
