I am trying to import a table from a webpage into a Google spreadsheet.
I have tried the following two functions, and both give me the error "Imported content is empty".
=importhtml("http://financials.morningstar.com/ratios/r.html?t=AAPL","table",1)
And
=importxml("http://financials.morningstar.com/ratios/r.html?t=AAPL", "//*[@id='tab-profitability']/table[2]")
P.S. The imported data is for personal use only and will not be used against the website's policies.
It's not possible with your URL (http://financials.morningstar.com/ratios/r.html?t=AAPL).
=IMPORTHTML() only works if the webpage contains an HTML table.
Here is an example.
Take this URL: http://fr.wikipedia.org/wiki/Démographie_de_l'Inde
This page contains a table, and it is a real HTML table.
Code in the page:
<table class="wikitable centre" style="text-align: center;">
<tr>
<th colspan="3" scope="col" width="60%">Évolution de la population</th>
</tr>
<tr>
<th>Année</th>
<th>Population</th>
<th><abbr title="Croissance démographique">%±</abbr></th>
</tr>
<tr>
<td>1951</td>
<td>361 088 000</td>
<td>—</td>
</tr>
<tr>
<td>1961</td>
<td>439 235 000</td>
<td>+ 21,6 %</td>
</tr>
<!-- Other values -->
<tr>
<td colspan="3" align="center"><small>Source : <a rel="nofollow" class="external autonumber" href="http://indiabudget.nic.in/es2006-07/chapt2007/tab97.pdf">[1]</a></small></td>
</tr>
</table>
In your Google spreadsheet you can then import the data with:
=IMPORTHTML("http://fr.wikipedia.org/wiki/Démographie_de_l'Inde"; "table";
The page at http://financials.morningstar.com/ratios/r.html?t=AAPL does not contain any HTML table, so you can't extract the values this way.
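IMPORTHTML only sees what the server sends, before any JavaScript runs. One way to check a URL is to download its raw source and look for `<table>` tags. The minimal Java sketch below does that with the JDK's `java.net.http.HttpClient`; the class and method names are hypothetical, the regex is a rough check rather than an HTML parser, and Google's fetcher may not receive exactly the same HTML as this client.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RawTableCheck {
    // Matches an opening <table> tag, with or without attributes, in any letter case.
    private static final Pattern TABLE_TAG =
            Pattern.compile("<table(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    // Counts opening <table> tags in the HTML exactly as the server sent it (no JavaScript run).
    static int countTables(String rawHtml) {
        Matcher m = TABLE_TAG.matcher(rawHtml);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Downloads the page source; redirects are followed, scripts are never executed.
    static String fetchRawHtml(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String url = args.length > 0 ? args[0]
                : "http://financials.morningstar.com/ratios/r.html?t=AAPL";
        int tables = countTables(fetchRawHtml(url));
        System.out.println(tables + " <table> tag(s) in the raw HTML of " + url);
    }
}
```

Pass a URL as the first argument. A count of 0 suggests IMPORTHTML has no table to import from that page.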
I have the following table element from a website.
Using this formula, it only extracts the first td, i.e. class=TTRow_left.
I want to extract both class=TTRow_left and class=TTRow_right into a Google Sheet.
Formula:
=IMPORTHTML("https://www.bsesme.com/","table",6)
HTML:
<table width="305" border="0" cellspacing="0" cellpadding="0">
<tbody><tr>
<td class="TTRow_left" style="height:22px;" width="230px">No. of Companies Listed on SME till Date</td>
<td class="TTRow_right" style="height:22px;" id="AL">386</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;" width="230px">Mkt Cap of Cos. Listed on SME till Date (Rs.Cr.)</td>
<td class="TTRow_right" style="height:22px;" id="MCL">58,225.56</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;" width="230px">Total Amount of Money Raised till Date (Rs. Cr.)</td>
<td class="TTRow_right" style="height:22px;" id="Td13">4,132.16</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;" width="230px">No. of Companies Migrated to Main Board</td>
<td class="TTRow_right" style="height:22px;" id="MB">150</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;" width="230px">No. of Companies Listed as of Date </td>
<td class="TTRow_right" style="height:22px;" id="CL"> 236</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;">No. of Companies Suspended</td>
<td class="TTRow_right" style="height:22px;" id="CS">32</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;">No. of Companies Eligible for Trading</td>
<td class="TTRow_right" style="height:22px;" id="CET">201</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;">No. of Companies Traded</td>
<td class="TTRow_right" style="height:22px;" id="CT">110</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;">Advances/ Declines/ Unchanged</td>
<td class="TTRow_right" style="height:22px;" id="Adv">73/ 32/ 5</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;">Mkt Cap of BSE SME Listed Cos. (Rs.Cr.)</td>
<td class="TTRow_right" style="height:22px;" id="Dec">15,095.93</td>
</tr>
<!--<tr>
<td class="TTRow_left" style="height:22px;" width="230px">No. of SME companies migrated to main board</td>
<td class="TTRow_right" style="height:22px;" >3</td>
</tr>-->
</tbody></table>
</td>
</tr>
</tbody></table>
There is a way: you could extract that data with Google Apps Script, i.e. by writing a function that reads the values (they are returned by a separate request).
You need to make a request to this URL, which is the one that loads the data:
https://www.bsesme.com/markets/MarketStat.aspx?&292022849
The response looks like this:
bse$#$237|32|202|104|58|37|9|15,110.69|12|3,364.25|150|387|58,387.68|4,144.97
Then extract the data from that string.
I checked the page's source code: the main page (https://www.bsesme.com/) uses JavaScript to read this data and rearrange it on the page.
Tip: in the main page's source, look for a function called GetNotices(str); it appears to contain the logic that rearranges the data.
You will have to dig deeper to figure out how to get this data into your spreadsheet.
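The answer above suggests Google Apps Script; here only the parsing step is sketched, in Java. It assumes the response has the shape of the sample above: a `bse$#$` prefix followed by `|`-separated values. Which position holds which statistic is not established here and would have to be read out of `GetNotices(str)`; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class BseSmeStats {
    // Separator between the "bse" prefix and the values in the sample response.
    private static final String PREFIX_SEPARATOR = "$#$";

    // Splits a response such as "bse$#$237|32|...|4,144.97" into its raw string values.
    // The meaning of each index is NOT decided here; map it using the page's GetNotices(str).
    static List<String> parseValues(String response) {
        String body = response.trim();
        int sep = body.indexOf(PREFIX_SEPARATOR);
        if (sep >= 0) {
            body = body.substring(sep + PREFIX_SEPARATOR.length());
        }
        return Arrays.asList(body.split("\\|"));
    }

    // Converts a value like "15,110.69" to a number by dropping the thousands separators.
    static double toNumber(String value) {
        return Double.parseDouble(value.replace(",", "").trim());
    }

    public static void main(String[] args) {
        String sample = "bse$#$237|32|202|104|58|37|9|15,110.69|12|3,364.25|150|387|58,387.68|4,144.97";
        List<String> values = parseValues(sample);
        for (int i = 0; i < values.size(); i++) {
            System.out.println(i + ": " + values.get(i));
        }
    }
}
```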
IMPORTHTML can only return table data that is not generated by JavaScript. I checked the web page you are trying to scrape, and the missing content appears to be generated by JavaScript: it is not shown when JavaScript is disabled in the browser.
With JavaScript disabled, the values are not displayed. The TTRow_left cells, however, are hard-coded in the HTML, which is why the function can get that information from the page:
<td class="TTRow_left" style="height:22px;" width="230px">No. of Companies Listed on SME till Date</td>
The TTRow_right values are not displayed, so the function cannot scrape them.
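To check which cells actually carry text in the static HTML, you can pull the cell contents out by class. This minimal regex-based Java sketch assumes lowercase tags, double-quoted attributes and plain-text cells; `cellTexts` is a hypothetical helper, and the markup in `main` is a made-up stand-in for a static page whose TTRow_right cells are left empty, not the actual bsesme.com source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CellsByClass {
    // Returns the trimmed inner text of every <td> whose class attribute is exactly cssClass.
    // Deliberately simple: lowercase tags, double-quoted attributes, plain-text cell contents.
    static List<String> cellTexts(String html, String cssClass) {
        Pattern p = Pattern.compile(
                "<td[^>]*\\bclass=\"" + Pattern.quote(cssClass) + "\"[^>]*>(.*?)</td>",
                Pattern.DOTALL);
        Matcher m = p.matcher(html);
        List<String> texts = new ArrayList<>();
        while (m.find()) {
            texts.add(m.group(1).trim());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Hypothetical static markup: the label is hard-coded, the value is left for JavaScript.
        String staticHtml =
                "<tr><td class=\"TTRow_left\" style=\"height:22px;\">No. of Companies Suspended</td>"
                + "<td class=\"TTRow_right\" style=\"height:22px;\" id=\"CS\"></td></tr>";
        System.out.println("TTRow_left:  " + cellTexts(staticHtml, "TTRow_left"));
        System.out.println("TTRow_right: " + cellTexts(staticHtml, "TTRow_right"));
    }
}
```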
I want my table to be visible only if the list opleidingen is not empty.
But although the list is empty, it still shows the table header and the icons. What am I doing wrong?
Thank you
<table th:if="${opleidingen.size() != 0}">
<thead>
<tr>
<th>Code</th>
<th>Titel</th>
<th>Thema</th>
<th>Delete</th>
<th>Pas Aan</th>
</tr>
</thead>
<tbody>
<tr th:each="opleiding: ${opleidingen}">
<td><span th:text="${opleiding.getCode()}"/></td>
<td><span th:text="${opleiding.getTitel()}"/></td>
<td><span th:text="${opleiding.getThema()}"/></td>
<td><img src="../static/Delete.gif"/></td>
<td><img src="../static/Edit.gif"/></td>
</tr>
</tbody>
</table>
What stays visible is the table header text and the images. Why? Since there are no items to show, shouldn't the whole table be hidden?
You need to have Thymeleaf render the template; otherwise, Thymeleaf-specific attributes such as th:if do nothing, because a browser simply ignores attributes it does not know.
I spent the last few hours looking for a solution to a problem with the latest Puppeteer (2.0.0) / Chromium 78.0.x, which I need to solve to get our printing system working. We allow page breaks to be set up in tables; this worked fine with the PhantomJS renderer, but not with Puppeteer/Chromium.
Besides many small differences in global CSS and in printing the PDF header/footer, printing tables was the last remaining problem (hopefully).
It turns out that the "page-break-before: always" is simply ignored.
Example:
<table>
<thead> ... </thead>
<tbody> ...
<tr style="page-break-before: always;"> ...should be on next page ... </tr>
</tbody>
</table>
Some Chrome forum articles say this has been fixed.
So the question is: what is causing the problem?
Regards,
Andre
P.S. We later found that putting "display: block" on all elements of the table solves the problem. Maybe that helps someone. Any comments on that?
<table style="display: block;">
<thead style="display: block;"> ... </thead>
<tbody style="display: block;"> ...
<tr style="display: block; page-break-before: always;"> ...is now on the next page ... </tr>
</tbody>
</table>
Bad news about the solution above: it breaks the feature of repeating the table header on each page.
setup 1)
Setting "display: block;" on the thead disables the repetition of the table header on each page.
==> no table headers on each page
setup 2)
Setting the thead to "display: table-header-group;" and the tbody to "display: table-row-group;" makes Chrome ignore the page breaks.
==> no page break
setup 3)
Setting the thead to "display: table-header-group;" and the tbody to "display: block;" destroys the column structure: the body is rendered only in the first column.
==> destroys the table
Here is our hack to solve the problem. We use setup 3, with these changes:
- we build a table with just one column
- that column contains a table with all the columns we really want to render
- the column widths are set to fixed values (which was already the case in our rendering system)
<table>
<thead>
<tr>
<td> <table> .... the header of the real table </table> </td>
</tr>
</thead>
<tbody style="display:block;">
<tr>
<td>
<table> .... one row of the real table </table>
</td>
</tr>
<tr>
<td>
<table> .... another row of the real table </table>
</td>
</tr>
</tbody>
</table>
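The one-column workaround can be generated instead of written by hand. This Java sketch (hypothetical class and method names) wraps the header and each real row in its own fixed-width inner table and sets display: block on the outer tbody, as in setup 3. Where the page-break-before rule goes is not stated above; putting it on the outer tr together with display: block mirrors the combination from the P.S. and is an assumption, not something verified here in Chromium.

```java
import java.util.List;

class PrintTableBuilder {
    // Escapes the characters that are significant inside HTML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Renders one logical row as its own inner table with fixed pixel widths,
    // so every inner table lines up with the header's inner table.
    static String innerTable(List<String> cells, int[] widthsPx, String cellTag) {
        int total = 0;
        for (int w : widthsPx) {
            total += w;
        }
        StringBuilder sb = new StringBuilder();
        sb.append("<table style=\"table-layout: fixed; width: ").append(total).append("px;\"><tr>");
        for (int i = 0; i < cells.size(); i++) {
            sb.append('<').append(cellTag)
              .append(" style=\"width: ").append(widthsPx[i]).append("px;\">")
              .append(escape(cells.get(i)))
              .append("</").append(cellTag).append('>');
        }
        return sb.append("</tr></table>").toString();
    }

    // Builds the outer one-column table: the header stays in <thead>, <tbody> is display: block
    // (setup 3), and each logical row is wrapped in its own inner table.
    static String build(List<String> header, List<List<String>> rows,
                        int[] widthsPx, boolean[] breakBefore) {
        if (breakBefore.length != rows.size()) {
            throw new IllegalArgumentException("breakBefore must have one entry per row");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("<table>\n<thead>\n<tr><td>")
          .append(innerTable(header, widthsPx, "th"))
          .append("</td></tr>\n</thead>\n<tbody style=\"display: block;\">\n");
        for (int r = 0; r < rows.size(); r++) {
            // Assumption: the break goes on the outer <tr>, combined with display: block as in the P.S.
            String style = breakBefore[r] ? " style=\"display: block; page-break-before: always;\"" : "";
            sb.append("<tr").append(style).append("><td>")
              .append(innerTable(rows.get(r), widthsPx, "td"))
              .append("</td></tr>\n");
        }
        return sb.append("</tbody>\n</table>\n").toString();
    }

    public static void main(String[] args) {
        String html = build(
                List.of("Code", "Amount"),
                List.of(List.of("A-1", "10"), List.of("B-2", "20")),
                new int[] {120, 80},
                new boolean[] {false, true});
        System.out.println(html);
    }
}
```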
I'm new to Jsoup. How do I extract particular text from multiple blocks that share the same class and attributes?
For example, here I want to extract the information in the third row of the HTML. How do I specify that in my Jsoup code?
<tr>
<td align="center" colspan="2" class="maintitle">Active Stats</td>
</tr>
<tr>
<td class="row2" valign="top"><b>User's local time</b></td>
<td class="row1">Oct 22 2013, 07:23 PM</td>
</tr>
<tr>
<td class="row2" width="30%" valign="top"><b>Total Cumulative Posts</b></td>
<td width="70%" class="row1"><b>4</b>
<br />( 0 posts per day / 0.00% of total forum posts )
</td>
</tr>
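Use the CSS selector syntax to specify which row to select. :eq(n) matches by zero-based sibling index, so tr:eq(2) is the third row:
import org.jsoup.nodes.Element;
// doc is the org.jsoup.nodes.Document of the page, e.g. from Jsoup.connect(url).get()
Element e = doc.select("tr:eq(2) td.row2").first();
System.out.println(e.text());
prints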
Total Cumulative Posts
Tip: look through the Jsoup documentation before asking questions; all of this is easy to find in the API.
Jsoup - Use selector syntax
I have the following markup as a part of a Razor view:
<table>
<caption>Presidents</caption>
<thead>
<tr>
<th scope="col">Name</th>
<th scope="col">Born</th>
<th scope="col">Died</th>
</tr>
</thead>
<tbody>
<tr>
<th scope="row">Washington</th>
<td>1732</td>
<td>1799</td>
</tr>
<!-- etc -->
</tbody>
</table>
With the "target schema for validation" set to HTML5, Visual Studio complains thusly:
Warning 1 Validation (HTML5): Element 'th' must not be nested within element 'tbody tfoot'.
Is this really true? If so, could someone link to the spec?
My understanding was that using <th> for row headers was not just legal but encouraged. It certainly seems fairly common; I could link dozens of tutorials explaining (seemingly sensibly) that it helps with accessibility.
Is this a VS bug? A real change coming with HTML5 (a good one? a bad one?)? What's the story?
My understanding was that using <th> for row headers was not just legal but encouraged
As far as I know, this was always legal in HTML 4 (and possibly its predecessors), and hasn't changed in HTML5.
W3C's HTML5 validator, while still experimental, reports no warnings or errors. Then again, I'm sure the HTML5 validation Visual Studio is using is experimental as well since HTML5 itself hasn't yet been finalized.
The HTML5 spec on marking up tabular data, specifically section 4.9.13, shows the use of <th> within <tbody> and <tfoot> to scope row data:
<table>
<thead>
<tr>
<th>
<th>2008
<th>2007
<th>2006
<tbody>
<tr>
<th>Net sales
<td>$ 32,479
<td>$ 24,006
<td>$ 19,315
<tr>
<th>Cost of sales
<td> 21,334
<td> 15,852
<td> 13,717
<tbody>
<tr>
<th>Gross margin
<td>$ 11,145
<td>$ 8,154
<td>$ 5,598
<tfoot>
<tr>
<th>Gross margin percentage
<td>34.3%
<td>34.0%
<td>29.0%
</table>
So it's perfectly legitimate to have <th> elements inside <tr> elements inside either a <tbody> or <tfoot>. As it should be anyway, since table headings aren't found only in table headers.
The HTML5 spec only requires that it be inside a tr, and the spec actually includes an example with a th nested inside a tbody.
Generally a TH in a THEAD will have a scope value of "col" while a TH in a TBODY will have a scope value of "row".