I'm trying to pull the sector for companies from MarketWatch with this formula:
=INDEX(IMPORTHTML("https://www.marketwatch.com/investing/stock/AAPL/company-profile", "table", 5), 2, 2)
I don't know how to determine which table number to use or how to find the correct column.
You will need to get the XPath of the HTML element and then use IMPORTXML.
First you need to get the XPath; the easiest way is with Chrome's Inspect feature.
Right Click on the sector name
Select Inspect
Find the line of HTML that is highlighted blue (it doesn't always appear in the middle of the tool)
Click on the three dots to the left of the line
Click Copy
Click Copy full XPath
Then go to your sheet and use IMPORTXML(), passing the page URL as the first argument and the copied XPath as the second.
To go one step further, you can do it for a list of stocks by building the URL from each company's stock symbol, like this:
"https://www.marketwatch.com/investing/stock/"&LOWER(D9)&"/company-profile"
From a ticker (French locale, where MAJUSCULE is the localized name of UPPER and arguments are separated by semicolons):
To get the full name:
=IMPORTXML("https://www.marketwatch.com/investing/stock/"&MAJUSCULE(A2)&"/company-profile?countrycode=fr&mod=mw_quote_tab";"/html/body/div[3]/div[6]/div[1]/div[1]/div/div/h4")
To get the industry or sector:
=IMPORTXML("https://www.marketwatch.com/investing/stock/"&MAJUSCULE(A2)&"/company-profile?countrycode=fr&mod=mw_quote_tab";"/html/body/div[3]/div[6]/div[1]/div[1]/div/ul/li[1]/span")
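Building the profile URL from a ticker, as the formulas above do, can be sketched outside Sheets as follows. This is only an illustration: the function name profile_url and the sample tickers are assumptions, and the ticker is lower-cased as in the LOWER(D9) example (the French formulas upper-case it instead).

```python
BASE_URL = "https://www.marketwatch.com/investing/stock/"


def profile_url(ticker: str) -> str:
    """Build the company-profile URL for one ticker symbol."""
    # Mirrors "…/stock/" & LOWER(D9) & "/company-profile" from the sheet.
    return BASE_URL + ticker.strip().lower() + "/company-profile"


# Hypothetical list of symbols, standing in for a column of tickers.
tickers = ["AAPL", "MSFT", "GOOG"]
urls = [profile_url(t) for t in tickers]

for url in urls:
    print(url)
```

Each resulting URL is what you would pass as the first argument of IMPORTXML for that row.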
I am trying to scrape a website for financials of Indian companies as a side project & put it in Google Sheets using XPATH
Link: https://ticker.finology.in/company/AFFLE
I am able to extract data from elements that have a specific id, like cash, net debt, etc. However, I am stuck extracting data for labels like Sales Growth.
What I tried:
Copying the full XPath from the console, //*[@id="mainContent_updAddRatios"]/div[13]/p/span. This works, but I am reliant on the index of the div (13), which may change for different companies, so I am unable to automate it.
Please assist with a scalable solution.
PS: I am a Product Manager with basic coding expertise, as I was a developer a few years ago.
At some point you need to "hardcode" something unless you have some other means of mapping the content of the page to your spreadsheet. In your example you appear to be targeting "Sales Growth" percentage. If you are not comfortable hardcoding the index of the div (13), you could identify it by the id of the "Sales Growth" label which is mainContent_lblSalesGrowthorCasa.
For example, change your
//*[@id="mainContent_updAddRatios"]/div[13]/p/span
to:
//*[@id = "mainContent_updAddRatios"]/div[.//span/@id = "mainContent_lblSalesGrowthorCasa"]/p/span
which selects the div that contains a span with id="mainContent_lblSalesGrowthorCasa", wherever that div sits among its siblings. Ultimately, whether you "hardcode" the exact index of the div or "hardcode" the ids of the nodes, you are still embedding assumptions about the structure of the page.
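A minimal sketch of the same idea outside Sheets, using Python's standard-library ElementTree. ElementTree supports only a subset of XPath, so the "div that contains a span with this id" predicate is emulated with a loop. The sample markup is hypothetical: it only mimics the structure implied by the XPath above, and the second label id and the numbers are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page (real HTML may need an HTML parser).
PAGE = """
<html><body>
  <div id="mainContent_updAddRatios">
    <div><small><span id="mainContent_lblOther">Other Ratio</span></small><p><span>1.1</span></p></div>
    <div><small><span id="mainContent_lblSalesGrowthorCasa">Sales Growth</span></small><p><span>12.5</span></p></div>
  </div>
</body></html>
"""


def value_by_label_id(root, container_id, label_id):
    """Return the text of p/span in the child div that contains the label id."""
    container = root.find(f".//*[@id='{container_id}']")
    if container is None:
        return None
    for div in container.findall("div"):
        # Select the div by what it contains, not by its position.
        if div.find(f".//span[@id='{label_id}']") is not None:
            value = div.find("p/span")
            return value.text if value is not None else None
    return None


root = ET.fromstring(PAGE)
print(value_by_label_id(root, "mainContent_updAddRatios",
                        "mainContent_lblSalesGrowthorCasa"))  # prints 12.5
```

Reordering the inner divs does not change the result, which is the point of selecting by id rather than by index.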
Thanks @david, that helped.
Two questions
What if the structure of the page changes? For example, if the website removed the p tag, would my sheet fail? How do we avoid failure in such cases?
Also, since every id is unique, the probability of an id changing is lower than that of the index changing. Correct me if I am wrong.
What do we do when the elements don't have an id, like Profit Growth, RoE, RoCE, etc.?
I have two worksheets in a Google spreadsheet.
Sheet-A: Treat this like an “order booking” page. Consists of 10 empty line items, where the user can select an item from a dynamically generated dropdown list. The values in the dynamic list come from Sheet-B’s 1st column range
Sheet-B: Treat this like a “menu details” page. It consists of Menu item name, description, ingredients, etc
What I want to do is:
When users try to place an order, they select a menu item from the dropdown in Sheet A.
If they want to know more about an item, they should click on the hyperlink on top of the dropdown value and be navigated to the respective menu item description in Sheet B.
To summarize, the dynamic values coming in the dropdown list should hold a hyperlink within itself which points to where the value is coming from.
This is straightforward: use the HYPERLINK() function together with either the CONCATENATE() function or the concatenation operator "&". Here is an example from one of my projects:
=HYPERLINK(CONCATENATE("https://tracker.telenetwork.com/admin/reports/SCReport/report_emp.asp?emp=",$B$4,"&nt=",$A$4,"&sd=",A7,"&ed=",B7,"&dur=99999&per=15&client=",C4),"Call Recordings")
I built an example for another person asking a similar question. Between that example and the formula above, you should be able to figure out how to implement this for your specific situation. Feel free to make a copy of this sheet:
https://docs.google.com/spreadsheets/d/1qbLOjTdzISICTKyUp_jK6gZbQCt-OwtDYYy3HNJygeE/edit#gid=795322028
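The concatenation in the HYPERLINK formula above can be sketched in Python as follows. This is an illustration only: the function names and the sample values are assumptions, and the query parameter names are simply the ones that appear in the formula. Unlike plain concatenation, urlencode also escapes characters that are not safe in a URL.

```python
from urllib.parse import urlencode

BASE = "https://tracker.telenetwork.com/admin/reports/SCReport/report_emp.asp"


def report_url(emp, nt, start, end, client):
    """Assemble the report URL from the same pieces the formula concatenates."""
    params = {
        "emp": emp,        # $B$4 in the formula
        "nt": nt,          # $A$4
        "sd": start,       # A7
        "ed": end,         # B7
        "dur": 99999,      # fixed values copied from the formula
        "per": 15,
        "client": client,  # C4
    }
    return BASE + "?" + urlencode(params)


def hyperlink_formula(url, label):
    """Return the text of a Sheets HYPERLINK formula for the given URL and label."""
    return f'=HYPERLINK("{url}","{label}")'


# Hypothetical sample values standing in for the referenced cells.
url = report_url("42", "7", "2020-01-01", "2020-01-31", "acme")
print(hyperlink_formula(url, "Call Recordings"))
```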
I'm looking to extract the author name from articles. I'm currently using =IMPORTXML(G2,"//*[@class='author-details']")
When I do this, it creates 4 cells underneath which contain the word 'By', which I can't get rid of.
Very new to code - what am I doing wrong?
Attached example: https://docs.google.com/spreadsheets/d/1Mi1D5G1-_gNsQwVQ6I_ealDqcWixKA2p-hFqJpjlGt4/edit?usp=sharing
You can use:
=index(IMPORTXML(G2,"//*[@class='author-details']"),1,2)
This displays only the first row of the second column of what is returned, which is the information you are after.
Edit:
Additionally, since you highlighted that you want the author name: if all the names are in the format "By FIRST LAST @TwitterHandle Affiliation", then you can use this to get just the author's name:
=trim(split(right(index(IMPORTXML(G2,"//*[@class='author-details']"),1,2),len(index(IMPORTXML(G2,"//*[@class='author-details']"),1,2))-3),"@",true,true))
It will likely look like voodoo, but paste it in; it works. It removes the first 3 characters ("By "), splits the text at the "@" symbol, and then keeps only the text on the left side of it, which is the name.
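The same clean-up steps can be sketched in Python to make the logic easier to follow. The function name and the sample byline are hypothetical, and the handle marker is assumed to be "@" by default. One small difference from the formula: the sketch drops the "By " prefix only when it is actually present, instead of always removing the first 3 characters.

```python
def extract_author(byline: str, handle_marker: str = "@") -> str:
    """Return the author name from 'By FIRST LAST <marker>Handle Affiliation'."""
    text = byline.strip()
    # Step 1: drop the leading "By " (3 characters), if it is there.
    if text.startswith("By "):
        text = text[3:]
    # Step 2: split at the handle marker and keep only the left-hand part.
    name = text.split(handle_marker, 1)[0]
    # Step 3: trim the space left between the name and the handle.
    return name.strip()


print(extract_author("By Jane Doe @janedoe Example News"))  # prints Jane Doe
```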
I'm building a search sheet that takes a dropdown cell as its input. Users can select from the list or type in a keyword. The matching data is populated in rows and columns by my QUERY formula. The search result itself works just fine.
However, the dropdown list doesn't.
Here's a picture of my search.
As you can see in the picture, autocomplete only suggests 6 items, while my search query returns many more, which is exactly what I need. The cause seems clear: autocomplete compares against the whole text, not word by word.
Is there any way to change this behavior?
My sheet is for other people to look up a specific item. I cannot expect them to know what I have entered in my database, or force them to search by those exact keywords.
Anything from a script to a formula is fine with me. I just need a lead, at least.
I want to know if there is any way to get keywords (KW) from campaigns that have a specific campaign label.
For example:
How can I get the top 10 keywords in campaigns that have a campaign label called "abc"?
Thanks
The best method for this is to attach the campaign name as a label on the keyword level. If you have only a few campaigns, you can go into each campaign, then keywords view within the UI, select all and add a new label with the campaign's name.
If you have a giant account, use the steps below. (This won't work if you already have other keyword-level labels set; if you do, use the first method.)
Go to the Keywords view from within the UI.
Download the Keywords report (check "editable").
Then copy the contents of column C and paste them into the Labels column, making sure the column is still called "Labels" (don't change anything else).
Then click the Reports and Uploads tab in the left navigation window.
Select the Uploads subtab.
Click "Browse for file" to locate your edited and saved report.
Click the "Upload" button.
A yellow box will appear reminding you that uploading the report will immediately update your account. (Beyond this point, there is no "undo" or "cancel" option.) Clicking "Yes, I understand" will immediately begin the process of applying your changes to your account.
Now all of your keywords are labeled with your campaign names!
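For a large report, the copy-and-paste step could also be done with a short script before uploading. The sketch below is only an illustration under stated assumptions: it assumes the report is a CSV whose third column is named "Campaign" and which has a "Labels" column, and the sample rows are made up. A real downloaded report has more columns and may use a different layout.

```python
import csv
import io


def label_with_campaign(report_csv: str) -> str:
    """Copy each row's Campaign value into its Labels column."""
    reader = csv.DictReader(io.StringIO(report_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Overwrites any existing label, which is why the answer warns
        # against this method when keyword-level labels already exist.
        row["Labels"] = row["Campaign"]
        writer.writerow(row)
    return out.getvalue()


# Hypothetical miniature report; column names are assumptions.
SAMPLE = (
    "Keyword,Match type,Campaign,Labels\n"
    "red shoes,Broad,abc,\n"
    "blue shoes,Exact,xyz,\n"
)

print(label_with_campaign(SAMPLE))
```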