Longtime lurker, first-time poster. I usually solve my issues and upvote without needing to post, but I've been stumped all weekend!
Edit: Erik solved it:
I was looking for a way to extract the "datePublished" or "dateModified" value from a Substack article into a Google Sheet.
Goal: This tells me the last date/time I updated, for example, my PS5 restock guide, my Walmart PS5 restock guide, etc. If a guide is too stale, I try to add relevant information. Having it in Google Sheets streamlines this, as there are dozens of guides.
Test Google Sheet:
https://docs.google.com/spreadsheets/d/1hLBFMWCTc2hpC-1C8Sxd5OVREdNHTVTtrJsAAU5Jl94/edit#gid=0
I've done this before for other sites I've worked at, but there appears to be no date in the metadata on Substack :/ (I could be wrong, as I'm no expert at reading XPath.)
I do see this in the body for the linked example:
<time datetime="2022-07-29T11:52:00.000Z">Jul 29</time>
I've been trying things like this (where E17 is the cell holding the article URL in Google Sheets), to no effect:
=REGEXEXTRACT(IMPORTXML(E17, "//time[@datetime='datePublished']/@content"), "(.+)T")
I've been mostly working off of this StackOverflow solution, but I haven't been able to apply the same finding to Substack's formatting.
If you want to grab it directly using a Google Sheets formula, this should work for you:
=ArrayFormula(IFERROR(VLOOKUP("*",FLATTEN(IFERROR(REGEXEXTRACT(IMPORTXML("https://www.theshortcut.com/p/ps5-restock","//div[2]"),"Swider(.?.?.?.?\d\d{1}[hrago\s]*)"))),1,FALSE),"???"))
To set realistic expectations, I usually can't invest this much time into working out such a solution on this forum. But I'm on vacation at the moment and filling time while my guest is otherwise occupied.
One further note: this is specific to the two sites you gave as examples. It will only work for sites where the second <div> holds this information and only where the data exists as strings exactly like those found on these two sites (including the poster's last name as "Swider").
ADDENDUM:
Looking at this further, did you try simply the following?
=IMPORTXML(C2, "//time")
(assuming your URL is in C2, etc.)
This seems to work for me, since the date/time data you want appears to be in the first <time> element on the page.
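If you want to check the same thing outside Sheets, here is a minimal Python sketch using only the standard library's html.parser. It assumes, as the addendum does, that the first <time> element holds the date you want, and it reads that element's datetime attribute rather than its visible text ("Jul 29"). In XPath terms this corresponds to "//time/@datetime", which is the attribute form you could also try in IMPORTXML.

```python
from datetime import datetime
from html.parser import HTMLParser
from typing import Optional


class FirstTimeParser(HTMLParser):
    """Records the datetime attribute of the first <time> element in the page."""

    def __init__(self) -> None:
        super().__init__()
        self.seen = False
        self.value: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only the first <time> tag counts, mirroring the addendum's assumption.
        if tag == "time" and not self.seen:
            self.seen = True
            self.value = dict(attrs).get("datetime")


def first_time_datetime(html: str) -> Optional[str]:
    parser = FirstTimeParser()
    parser.feed(html)
    return parser.value


def to_date(iso: str) -> str:
    # Python < 3.11 fromisoformat() rejects a trailing "Z", so use an explicit UTC offset.
    return datetime.fromisoformat(iso.replace("Z", "+00:00")).date().isoformat()


sample = '<article><time datetime="2022-07-29T11:52:00.000Z">Jul 29</time></article>'
print(first_time_datetime(sample))           # 2022-07-29T11:52:00.000Z
print(to_date(first_time_datetime(sample)))  # 2022-07-29
```

The to_date step plays the same role as the REGEXEXTRACT "(.+)T" in the question: it keeps only the date part of the timestamp.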
Unfortunately I don't know how to source this and was wondering if someone could show me how. I am trying to learn here, so the correct answer is great, but the "how to get the answer" is more important to me.
I am using Google Sheets and want to bring in a table or data point from a website. I know =importhtml works for this, but I don't know how to tell it what to get; people keep giving me the answers instead of the how-to.
The current one I am looking for is the website "https://www.marketbeat.com/stocks/NASDAQ/TSLA/earnings/" Using TSLA as the example stock. I am looking to bring in the table that has all the earnings dates. The header is "TESLA (NASDAQ:TSLA) EARNINGS HISTORY BY QUARTER". I am just looking for all the dates in that table, but if it is easier to bring in the whole table I'll do that instead. But I know these websites update and that can change how the =importhtml works, so I would like to be able to fix it myself.
Also, I am on a Mac.
Thanks for any help.
=index(IMPORTHTML("https://www.marketbeat.com/stocks/NASDAQ/TSLA/earnings/","table",2),,1)
"2" is the table index for the one you are looking for and "1" is the column that you need.
Drop the column argument (the ",,1") and you will get the whole table:
=index(IMPORTHTML("https://www.marketbeat.com/stocks/NASDAQ/TSLA/earnings/","table",2))
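On the "how" part: the table number in IMPORTHTML is the position of that <table> among the tables on the page, counting from 1, so finding it comes down to counting tables until you reach the one you want. Below is a minimal Python sketch of that search, assuming the table is present in the page's raw HTML (not added later by JavaScript) and using "Quarter" as a hypothetical column label to search for; any text that only appears in your target table will do.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableTextCollector(HTMLParser):
    """Collects the text inside each <table>, in the order the tables appear."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: List[str] = []
        self.stack: List[int] = []  # indices of the tables we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append("")
            self.stack.append(len(self.tables) - 1)

    def handle_endtag(self, tag):
        if tag == "table" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack:
            # Text belongs to the innermost table that is still open.
            self.tables[self.stack[-1]] += data


def find_table_index(html: str, needle: str) -> Optional[int]:
    """Return the 1-based index of the first table whose text contains `needle`."""
    collector = TableTextCollector()
    collector.feed(html)
    for index, text in enumerate(collector.tables, start=1):
        if needle.lower() in text.lower():
            return index
    return None


sample = (
    "<table><tr><td>Navigation</td></tr></table>"
    "<h2>Earnings History</h2>"
    "<table><tr><th>Date</th><th>Quarter</th></tr>"
    "<tr><td>1/25/2023</td><td>Q4 2022</td></tr></table>"
)
print(find_table_index(sample, "Quarter"))  # 2
```

To use it on the real page you would first download and decode the HTML (for example with urllib.request.urlopen). A site may serve different HTML to a script than to a browser, so treat the result as a starting guess and confirm it in Sheets, trying the neighbouring index numbers if the first one is wrong.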
This question already has answers here:
Scraping data to Google Sheets from a website that uses JavaScript
Looking for a way to get the stock price for a specific date (e.g., 31.1.2020).
I know I can use IMPORTHTML or IMPORTXML together with INDEX to get the table. However, when I search for a specific date on investing.com in the browser, no date-specific URL is produced, and the page shows the latest stock prices instead. This is the stock I'm looking for
I'm afraid that investing.com does not provide an API:
https://www.investing-support.com/hc/en-us/articles/115005473825-Do-you-provide-an-API-
So you won't be able to do this easily (if at all) with Google Sheets or Apps Script. Most of the content on the site appears to be generated with JavaScript, so it is not part of the original HTML the server sends when you first load the page, and that original HTML is all IMPORTHTML gets.
Getting the information without an API would involve browser automation: simulating the clicks a user would make and then reading the data. This can be very finicky and is prone to break whenever the website changes its layout or HTML for whatever reason (something that tends to happen quite often on busy websites).
I would recommend using a different service that has a Sheets-friendly HTML format. Better still, look into a service that has an API and interact with it from Apps Script. Finally, if it has to be investing.com, you could look into something like Puppeteer, which can automate a browser (though it's a fair bit more complex than a formula or an API).
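A quick way to tell whether a value is in that original HTML is to fetch the page without running JavaScript (for example with urllib, or View Source in the browser) and search its text for a number you can see on screen. Below is a rough Python sketch of that check; it uses regex tag-stripping as a crude stand-in for a real HTML parser, and the two sample snippets are made up for illustration.

```python
import re
from html import unescape


def visible_in_raw_html(raw_html: str, value: str) -> bool:
    """Return True if `value` appears in the page text before any JavaScript runs."""
    # Remove <script>/<style> bodies: a value that only exists inside a script
    # is not rendered markup that IMPORTHTML can return as a table cell.
    cleaned = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", raw_html)
    # Strip the remaining tags and decode entities such as &amp;.
    text = unescape(re.sub(r"(?s)<[^>]+>", " ", cleaned))
    return value in text


static_page = "<table><tr><td>Jan 31, 2020</td><td>1,234.5</td></tr></table>"
js_page = "<div id='prices'></div><script>render({price: '1,234.5'})</script>"
print(visible_in_raw_html(static_page, "1,234.5"))  # True
print(visible_in_raw_html(js_page, "1,234.5"))      # False
```

If the value you see in the browser is missing from the raw HTML, IMPORTHTML and IMPORTXML will not find it either, and one of the alternatives above is needed.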
You can use IMPORTHTML to import the historical data for the last 30 days and then look up the date you need in that data.
To get historical data I use:
query(IMPORTHTML("https://investing.com/equities/STOCK-historical-data"; "table"; 2);"SELECT Col1, Col2")
I don't know whether you can import more than 30 days; I'm still looking for that answer myself.
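The lookup step amounts to scanning the imported rows for the date you want; in the sheet itself a VLOOKUP on the imported range does this job. The Python sketch below shows the same logic, assuming the first column holds dates written like "Jan 31, 2020" and the second holds the price with thousands separators. Both are assumptions about the table layout, so adjust date_format and the column positions to match what your import actually returns.

```python
from datetime import date, datetime
from typing import Optional, Sequence


def price_on(rows: Sequence[Sequence[str]], target: date,
             date_format: str = "%b %d, %Y") -> Optional[float]:
    """Return the price for `target`, or None if that date is not in the table."""
    for row in rows:
        try:
            row_date = datetime.strptime(row[0].strip(), date_format).date()
        except ValueError:
            continue  # header row, or a date written in an unexpected format
        if row_date == target:
            # Remove thousands separators before converting, e.g. "1,234.50" -> 1234.5
            return float(row[1].replace(",", ""))
    return None


# Hypothetical rows shaped like the imported table (Col1 = date, Col2 = price).
rows = [["Date", "Price"], ["Feb 03, 2020", "780.00"], ["Jan 31, 2020", "1,234.50"]]
print(price_on(rows, date(2020, 1, 31)))  # 1234.5
```

Because the import only covers roughly the last 30 days, a date outside that window simply returns None.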
We have a huge collection of spreadsheets with statistical data. There is one "master sheet" with links to all the other sheets. Most of these links have been there for a long time. It seems Google has changed its link formats over time, including the IDs used to identify the sheets.
Old link format, used often in our master sheet:
http://spreadsheets.google.com/pub?key=rcTO3doih5lvJCjgLSvlajA
Newer link format, used occasionally in our master sheet:
https://docs.google.com/spreadsheet/pub?key=0AkBd6lyS3EmpdDlSTTVWUkU3Z254aEhERmVuQWZaeWc
Newest link format, where Google redirects when you visit a link in the "newer" format:
https://docs.google.com/spreadsheets/d/1WipPWXQqXSjj9vPTu1LXD8IxeTfIn4RIBrGaOBd0DXc/pub
Recently (within the past week or so), Google seems to have dropped support for the first format. Most of our links are now dead, so we can't access our spreadsheets, and we have no way to find out what the new, working links are.
Does anyone know how to retrieve the spreadsheets when all you have is the old link? We don't have a Google Drive folder with the spreadsheets, so that solution doesn't work.
Thank you so much for any ideas!
You can take the ID from the old link and put it in place of the ID in the newer link format (not the newest!); then it will work.
For example, old link:
http://spreadsheets.google.com/pub?key=rcTO3doih5lvJCjgLSvlajA
Take rcTO3doih5lvJCjgLSvlajA and append it to:
https://docs.google.com/spreadsheet/pub?key=
Result: https://docs.google.com/spreadsheet/pub?key=rcTO3doih5lvJCjgLSvlajA
You can then follow the redirect to get the newest version of the link.
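With dozens of links in the master sheet, the rewrite is easy to script. Here is a minimal Python sketch, assuming every old link carries its ID in a key= query parameter as in the example above; any other query parameters are dropped.

```python
from urllib.parse import parse_qs, urlencode, urlparse

NEWER_BASE = "https://docs.google.com/spreadsheet/pub"


def old_to_newer(old_url: str) -> str:
    """Rewrite an old spreadsheets.google.com/pub?key=... link into the "newer" format."""
    keys = parse_qs(urlparse(old_url).query).get("key")
    if not keys:
        raise ValueError(f"no key= parameter in {old_url!r}")
    return f"{NEWER_BASE}?{urlencode({'key': keys[0]})}"


print(old_to_newer("http://spreadsheets.google.com/pub?key=rcTO3doih5lvJCjgLSvlajA"))
# https://docs.google.com/spreadsheet/pub?key=rcTO3doih5lvJCjgLSvlajA
```

To get the newest link as well, you could open each rewritten URL with urllib.request.urlopen, which follows redirects, and read the final address from the response's geturl().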
I hope you're all well. I want to build a simple spreadsheet, and I thought I could make one myself, but a blank sheet looks overwhelming to me. I'm sure you're kind enough to help me out.
I want to run multiple Google search queries from a Google spreadsheet and parse the results of each search (the top 10 results for each).
Something like this: https://www.youtube.com/watch?v=tBwEbuMRFlI
But when I tried the formula given in the video description, Google returned #Error, and I don't know why.
Can you please help me make a simple spreadsheet that handles multiple queries at once? One column for keywords (where I could paste my list of keywords), then 10 columns of search results, with all the results for one keyword in one row.
Something like this:
My 1st Example Query = 1st search result, 2nd search result, 3rd result and so on.
My 2nd Example Query = 1st search result, 2nd search result, 3rd result and so on.
It must be easy to code, but it might be time-consuming, and I would be very grateful if any of you could help me with it.
Looking forward to your help guys.
The problem is that you want to do the scraping from within Sheets. That's a bad approach and is almost certainly not going to work: even if you manage to write a scraper inside that limited environment, Google will easily spot it.
Since, as you said, time is not a problem, I would suggest another route:
1. Use a backend tool/script that scrapes the data.
2. Use a backend tool/script that creates/modifies the Google spreadsheet.
You can run such scripts manually on your PC, or fully automated from a server using a scheduler/cron job.
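For the second step, most of the work is shaping the data into the layout the question asks for: one row per keyword, followed by up to 10 result columns. The Python sketch below covers only that shaping step. The scraping step is left out on purpose, and the results dictionary is a hypothetical stand-in for whatever it produces. Instead of calling the Sheets API, it swaps in a simpler technique: writing CSV text that you can bring into Google Sheets with File > Import.

```python
import csv
import io
from typing import Dict, List

RESULTS_PER_ROW = 10


def results_to_rows(results: Dict[str, List[str]]) -> List[List[str]]:
    """One row per keyword: the keyword, then up to 10 results, padded with blanks."""
    header = ["Keyword"] + [f"Result {i}" for i in range(1, RESULTS_PER_ROW + 1)]
    rows = [header]
    for keyword, hits in results.items():
        hits = hits[:RESULTS_PER_ROW]
        rows.append([keyword] + hits + [""] * (RESULTS_PER_ROW - len(hits)))
    return rows


def rows_to_csv(rows: List[List[str]]) -> str:
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Hypothetical output of the scraping step.
scraped = {"my 1st example query": ["https://example.com/a", "https://example.com/b"]}
print(rows_to_csv(results_to_rows(scraped)))
```

Padding every row to the same width keeps the columns aligned in the sheet even when a search returns fewer than 10 results.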
To create/modify spreadsheets look here: How do I access the Google Spreadsheets API in PHP?
To scrape Google look here: Is it ok to scrape data from Google results?
These use PHP as the language of choice, but you can do exactly the same in Java, Python, or C#.
There is a third party solution like SerpApi you could use for this. It's a paid API with a free trial.
Google Sheets Add-on: SerpApi - Search Engine Results and Ranks
Example formula to extract the title of the first result:
=SERPAPI_RESULT("engine=google&q=coffee&location=Austin, Texas, United States&google_domain=google.com&gl=us&hl=en", "organic_results.0.title")
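The second argument reads as a path into SerpApi's JSON response: the organic_results list, then element 0, then its title field. How the add-on resolves that path internally isn't described here, so the following is only a Python sketch of how a dotted path like that can be walked through parsed JSON. It uses a made-up, heavily trimmed response.

```python
import json
from typing import Any


def resolve_path(data: Any, path: str) -> Any:
    """Walk a dotted path such as "organic_results.0.title" through parsed JSON."""
    current = data
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # numeric segments index into lists
        else:
            current = current[part]       # other segments are dictionary keys
    return current


# Hypothetical, trimmed response used only for illustration.
response = json.loads(
    '{"organic_results": [{"title": "Example result", "link": "https://example.com"}]}'
)
print(resolve_path(response, "organic_results.0.title"))  # Example result
```

Changing the path to "organic_results.1.title", "organic_results.2.title", and so on would select later results in the same way.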
I have been using LabVIEW to collect measurement data, and I would like to know whether LabVIEW can send the results to a Google spreadsheet. If so, where could I find resources to learn how to make LabVIEW transmit information to the Google spreadsheet?
Thanks!
EDIT AND FOLLOW-UP: I used Jonathan's suggestion below and experimented with the LabVIEW http Post.vi. It's very simple: all you need to do is enter the URL of the Google form (replacing the final "viewform" with "formResponse") and a string with the data you want to enter (with rough syntax = ). A big thanks for that answer, it was really helpful!
However, when I use this method with a Google form that has more than one page, the data isn't read properly. The form is still submitted, but every field that isn't on the first page of the form stays blank in the spreadsheet. I suspect this is linked to the fact that, in the Google form, the URL of every page after page 1 is the URL of page 1 with the final "viewform" replaced by "formResponse". Is this what is causing the error, or is it something else altogether, and how can I fix it?
I can think of two ways to do this:
You can create a form in Google Sheets. The form appears as an HTML document with standard tags. From there, I would use LabVIEW's HTTP functionality to submit data to that form with a POST request. This is the easiest way to get data in.
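The POST approach is small enough to show in full. Here is a Python sketch of the request LabVIEW would send, following the recipe from the question's follow-up: swap the final "viewform" for "formResponse" and send the fields URL-encoded. The form URL and the field name "entry.1111" are hypothetical placeholders; the real field names come from the form's HTML. The function only builds the request, so nothing is sent until you pass it to urllib.request.urlopen.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(viewform_url: str, fields: dict) -> Request:
    """Build (but do not send) a POST that submits `fields` to a Google Form."""
    if not viewform_url.endswith("viewform"):
        raise ValueError("expected a URL ending in 'viewform' (strip any ?query first)")
    # Same trick as in the question's follow-up: replace the final "viewform" with "formResponse".
    action = viewform_url[: -len("viewform")] + "formResponse"
    body = urlencode(fields).encode("utf-8")
    return Request(
        action,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical form URL and field name, for illustration only.
req = build_form_post("https://docs.google.com/forms/d/FORM_ID/viewform", {"entry.1111": "23.5"})
print(req.get_method(), req.full_url)  # POST https://docs.google.com/forms/d/FORM_ID/formResponse
print(req.data)                        # b'entry.1111=23.5'
```

In LabVIEW, the same URL and the same encoded string go into the HTTP POST VI.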
Using the Google Apps API, you can manipulate Google spreadsheets and write data into them directly. This takes more development time, but it is more configurable in the long run. https://developers.google.com/google-apps/spreadsheets/#what_can_this_api_do There are .NET and Java code examples throughout the documentation, so it would take some work to port this to LabVIEW, but it could be done.