I can't use IMPORTXML correctly - google-sheets

I'm a beginner with Google Sheets.
I'm trying to import the XML from this page: https://www.flashscore.com.br/basquete/eua/nba
I want to get all the team names and compare them with a game list, to check whether it is the correct game. If it matches, I want to get the URL of the game on the same site.
Can anyone help me?
For information, I have already tried every way I could think of to solve this, but I always get an N/A error. Because of that, I didn't include any formula here.

Related

Using IMPORTXML in Google Sheets to scrape data from Transfermarkt

I'm trying to see if it's possible to scrape data from Transfermarkt.com to import into a Google Sheets doc.
Currently, I'm trying to import the team data from a player's profile page (see example URL here: https://www.transfermarkt.co.uk/joao-palhinha/profil/spieler/257455) but in future may want to import other data as well.
I wonder if it's possible to scrape data in this way from Transfermarkt, but if it is, any advice on what I'm doing wrong would be very much appreciated!
Right now I'm using, =IMPORTXML(B1,"/html/body/div[3]/main/header/div[4]/div/span[1]/a") where the URL is in B1. I copied the Full XPath from the HTML, but have also tried this by copying just the XPath too.
It says Loading for a few seconds before returning N/A, with an error message 'Resource at URL not found'.
I'm expecting the result in this instance to be Fulham.
Thanks

Solved: Extract date from my Substack webpage to Google Sheets

longtime lurker, first-time poster. I usually solve my issues & upvote without needing to post, but I've been stumped all weekend!
Edit: Erik solved it:
I was looking for an answer to extract the "datePublished" or "dateModified" from a Substack article in a Google Sheet.
Goal: This will tell me the last date/time I updated, for example, my PS5 restock guide, my Walmart PS5 restock guide, etc. If a guide is too stale, I try to add relevant information. Having it in Google Sheets streamlines this, as there are dozens of guides.
Test Google Sheet:
https://docs.google.com/spreadsheets/d/1hLBFMWCTc2hpC-1C8Sxd5OVREdNHTVTtrJsAAU5Jl94/edit#gid=0
I've done this before for other sites I've worked at, but there appears to be no date in the meta data on Substack :/ (I could be wrong, as I'm no expert at reading XPATH)
I do see this in the body for the linked example:
<time datetime="2022-07-29T11:52:00.000Z">Jul 29</time>
I've been trying things like this (where E17 is where I put the article URL in Google Sheets) to no effect.
=REGEXEXTRACT(IMPORTXML(E17, "//time[@datetime='datePublished']/@content"), "(.+)T")
I've been mostly working off of this StackOverflow solution, but I haven't been able to apply the same finding to Substack's formatting.
If you want to grab it directly using a Google Sheets formula, this should work for you:
=ArrayFormula(IFERROR(VLOOKUP("*",FLATTEN(IFERROR(REGEXEXTRACT(IMPORTXML("https://www.theshortcut.com/p/ps5-restock","//div[2]"),"Swider(.?.?.?.?\d\d{1}[hrago\s]*)"))),1,FALSE),"???"))
To set realistic expectations, I usually can't invest this much time into working out such a solution on this forum. But I'm on vacation at the moment and filling time while my guest is otherwise occupied.
One further note: this is specific to the two sites you gave as examples. It will only work for sites where the second <div> holds this information and only where the data exists as strings exactly like those found on these two sites (including the poster's last name as "Swider").
ADDENDUM:
Looking at this further, did you try simply the following?
=IMPORTXML(C2, "//time")
(assuming your URL is in C2, etc.)
This seems to work for me, given that it appears the date/time data you want is contained within the first <time> element on the web page.
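To illustrate what the //time XPath picks up, here is a minimal Python sketch, not something from the thread. It assumes a tiny hypothetical snippet wrapped around the <time> element quoted in the question (the real page is much larger), and it also reads the datetime attribute, which in Sheets would presumably correspond to an XPath like //time/@datetime (untested against the live page).

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the article page; only the <time> element
# itself is taken from the question.
snippet = '<div><time datetime="2022-07-29T11:52:00.000Z">Jul 29</time></div>'

root = ET.fromstring(snippet)

# First <time> element anywhere below the root, the same idea as //time.
first_time = root.find(".//time")

# The visible text is what a plain //time query returns.
print(first_time.text)

# The full timestamp lives in the datetime attribute.
stamp = first_time.get("datetime")

# Keep only the part before the "T", as the REGEXEXTRACT in the question
# was trying to do.
date_only = stamp.split("T")[0]
print(date_only)
```

The point is only that the element's text ("Jul 29") and its datetime attribute are two different things, so which one you query decides whether you get a short label or a full date.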

=importhtml Google Sheets table

Unfortunately I don't know how to source this and was wondering if someone could show me how. I am trying to learn here, so the correct answer is great, but the "how to get the answer" is more important to me.
I am using Google Sheets and looking to bring in a table or data point from a website. I know =importhtml works for this, but I don't know how to tell it what to get. People keep giving me the answers instead of the how-to.
The one I am currently looking at is "https://www.marketbeat.com/stocks/NASDAQ/TSLA/earnings/", using TSLA as the example stock. I am looking to bring in the table that has all the earnings dates. The header is "TESLA (NASDAQ:TSLA) EARNINGS HISTORY BY QUARTER". I am just looking for all the dates in that table, but if it is easier to bring in the whole table I'll do that instead. I know these websites update and that can change how =importhtml works, so I would like to be able to fix it myself.
Also I am on a Mac
Thanks for any help.
=index(IMPORTHTML("https://www.marketbeat.com/stocks/NASDAQ/TSLA/earnings/","table",2),,1)
"2" is the table index for the one you are looking for and "1" is the column that you need.
Delete the ",,1" and you will get the whole table:
=index(IMPORTHTML("https://www.marketbeat.com/stocks/NASDAQ/TSLA/earnings/","table",2))
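On the "how" part: the table index is simply the position of the table in the page, counting from 1, so one way to find it is to list the tables in order and look at their header rows. Below is a rough Python sketch of that idea, not part of the original answer. The HTML is a made-up two-table page with placeholder values (not MarketBeat's real markup or data), and the TableCollector class is a hypothetical helper that ignores nested tables.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collects every <table> as a list of rows, in document order.

    Hypothetical helper for illustration; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.tables[-1][-1].append("".join(self._cell).strip())
            self._cell = None


# Made-up page: a small layout table first, then the table we care about.
page = """
<table><tr><th>Menu</th></tr><tr><td>Home</td></tr></table>
<table>
  <tr><th>Date</th><th>Quarter</th></tr>
  <tr><td>d1</td><td>q1</td></tr>
  <tr><td>d2</td><td>q2</td></tr>
</table>
"""

collector = TableCollector()
collector.feed(page)

# List each table with its 1-based number and header row.
for number, table in enumerate(collector.tables, start=1):
    print(number, table[0])

wanted = collector.tables[2 - 1]            # like IMPORTHTML(url, "table", 2)
first_column = [row[0] for row in wanted]   # like INDEX(..., , 1)
print(first_column)
```

In Sheets itself the usual equivalent is to try the index 1, 2, 3, and so on until the table with the header you recognise shows up. If the site changes its layout later, repeating that search is how you would fix the formula.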

IMPORTHTML on Google Sheets returning a #N/A error, but only in one document

I have a Google Sheets document where I track the prices of several stocks. I made this a couple of months ago, and have been experiencing this issue for the past couple of weeks:
This formula returns "#N/A", the error description is: "Could not fetch url: https://finviz..."
=substitute(INDEX(IMPORTHTML("https://finviz.com/quote.ashx?t=VOO","table",11),8,2),"*","")
However, if I create a new Google Sheets document and use this exact formula, it works. Does anyone know what could be the problem?
I am having the same issue. Something must have been changed at finviz / google :(
There are also some discussions in the google support groups.
One possible solution could be to put all the symbols you're interested in into one URL, e.g. https://finviz.com/screener.ashx?v=161&t=FB,AAPL,GOOG,TSLA&ta=0&p=w
and then parse the resulting table.
Unfortunately I am not very good at the parsing part and have to do it by trial and error.
But for example
=importxml("https://finviz.com/screener.ashx?v=161&t=FB,AAPL,GOOG,TSLA&ta=0&p=w";"//*[@id='screener-content']/table/tbody/tr[4]/td/table")
is at least showing some results in google docs. So this might be something to work with.
It will work again by removing 'SUBSTITUTE' and switching to table 8.
A2 = stock ticker
=ÍNDICE(IMPORTHTML("https://finviz.com/quote.ashx?t="&A2;"table";8);7;2)

Performing Google Search In Spreadsheet And Scraping The SERP Data

I hope that you guys are fine. I want to build a simple spreadsheet. I thought I would be able to make one myself, but a blank sheet looks daunting to me. I am sure that you guys are kind enough to help me out.
I want to perform multiple Google search queries in a Google spreadsheet and parse the results of each search (the top 10 results of each search).
Something like this: https://www.youtube.com/watch?v=tBwEbuMRFlI
But when I tried the formula given in the video description to test it, Google returned #Error, and I don't know why.
Can you guys please help me make a simple spreadsheet that handles multiple queries at once? For example, one column for keywords (where I could paste my list of keywords) and then 10 columns of search results. All results for one keyword should go in one row.
Something like this:
My 1st Example Query = 1st search result, 2nd search result, 3rd result and so on.
My 2nd Example Query = 1st search result, 2nd search result, 3rd result and so on.
It must be easy to code, but it might be time-consuming, and I would be very grateful if any of you could help me with it.
Looking forward to your help guys.
The problem is that you want to scrape from inside Spreadsheets. That's a bad approach and is almost certainly not going to work. Even if you manage to write a scraper inside that limited environment, it will easily be spotted by Google.
As you said time is not a problem, I would suggest another route.
Use a backend tool/script that scrapes the data
Use a backend tool/script that creates/modifies the Google spreadsheet
You can run such scripts manually on your PC, or fully automated from a server using a scheduler/cron job.
To create/modify spreadsheets look here: How do I access the Google Spreadsheets API in PHP?
To scrape Google look here: Is it ok to scrape data from Google results?
Those links use PHP as the language of choice, but you can do exactly the same in Java, Python or C#.
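As a rough illustration of the two-script route, here is a minimal Python sketch of the data-shaping step only: one row per keyword, followed by up to 10 result columns, written out as CSV text that could then be imported into or pushed to a spreadsheet. The fetch_results function is a hypothetical placeholder that returns dummy strings. The actual scraping or API call, and the upload to Google Sheets, are deliberately left out.

```python
import csv
import io


def fetch_results(keyword):
    """Hypothetical stand-in for whatever scraper or API the backend uses.

    Returns placeholder strings so the sketch runs offline.
    """
    return [f"{keyword} result {n}" for n in range(1, 11)]


def build_rows(keywords, top_n=10):
    """Build one row per keyword: the keyword, then its top_n results."""
    rows = []
    for keyword in keywords:
        results = fetch_results(keyword)[:top_n]
        # Pad short result lists so every row has the same width.
        results += [""] * (top_n - len(results))
        rows.append([keyword] + results)
    return rows


keywords = ["My 1st Example Query", "My 2nd Example Query"]
rows = build_rows(keywords)

# Write the rows as CSV text; a real script might save this to a file
# or send the rows to the spreadsheet through an API instead.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the fetching behind one function means the scraper can be swapped for a paid API later without touching the code that lays out the rows.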
There are also third-party solutions, such as SerpApi, that you could use for this. It's a paid API with a free trial.
Google Sheets Add-on: SerpApi - Search Engine Results and Ranks
Example code to extract title from the first result:
=SERPAPI_RESULT("engine=google&q=coffee&location=Austin, Texas, United States&google_domain=google.com&gl=us&hl=en", "organic_results.0.title")
