Import table from HTML into Google Sheets

I have the following table element from a website.
This formula only extracts the first td in each row, i.e. class=TTRow_left.
I want to extract both class=TTRow_left and class=TTRow_right into a Google Sheet.
Formula:
=IMPORTHTML("https://www.bsesme.com/","table",6)
HTML:
<table width="305" border="0" cellspacing="0" cellpadding="0">
<tbody><tr>
<td class="TTRow_left" style="height:22px;" width="230px">No. of Companies Listed on SME till Date</td>
<td class="TTRow_right" style="height:22px;" id="AL">386</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;" width="230px">Mkt Cap of Cos. Listed on SME till Date (Rs.Cr.)</td>
<td class="TTRow_right" style="height:22px;" id="MCL">58,225.56</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;" width="230px">Total Amount of Money Raised till Date (Rs. Cr.)</td>
<td class="TTRow_right" style="height:22px;" id="Td13">4,132.16</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;" width="230px">No. of Companies Migrated to Main Board</td>
<td class="TTRow_right" style="height:22px;" id="MB">150</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;" width="230px">No. of Companies Listed as of Date </td>
<td class="TTRow_right" style="height:22px;" id="CL"> 236</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;">No. of Companies Suspended</td>
<td class="TTRow_right" style="height:22px;" id="CS">32</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;">No. of Companies Eligible for Trading</td>
<td class="TTRow_right" style="height:22px;" id="CET">201</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;">No. of Companies Traded</td>
<td class="TTRow_right" style="height:22px;" id="CT">110</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;">Advances/ Declines/ Unchanged</td>
<td class="TTRow_right" style="height:22px;" id="Adv">73/ 32/ 5</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;">Mkt Cap of BSE SME Listed Cos. (Rs.Cr.)</td>
<td class="TTRow_right" style="height:22px;" id="Dec">15,095.93</td>
</tr>
<!--<tr>
<td class="TTRow_left" style="height:22px;" width="230px">No. of SME companies migrated to main board</td>
<td class="TTRow_right" style="height:22px;" >3</td>
</tr>-->
</tbody></table>
</td>
</tr>
</tbody></table>

There is a way: you could extract that data with Google Apps Script, i.e. by writing a function that reads the values (they are returned by a separate request).
You need to make a request to this URL, which is the one that loads the data:
https://www.bsesme.com/markets/MarketStat.aspx?&292022849
Values are:
bse$#$237|32|202|104|58|37|9|15,110.69|12|3,364.25|150|387|58,387.68|4,144.97
Then extract the data from that response.
I checked the page's source code, and the page uses JavaScript to read the data and rearrange it on the main page (i.e. https://www.bsesme.com/).
Tip: check the main page's source code for a function called GetNotices(str); it appears to contain the logic for rearranging the data.
You will have to dig deeper to figure out how to get this data into your spreadsheet.
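As a rough illustration of the "extract the data" step, the sample response above can be split mechanically. The sketch is written in Ruby only to show the logic; an Apps Script function would use the equivalent JavaScript split calls. It assumes, based solely on the one sample shown, that `$#$` separates a prefix from the body and that `|` separates the values. The method name `parse_market_stat` is hypothetical, and which position corresponds to which table row is not established here.

```ruby
# Sample response copied from the request above.
# Single quotes matter: in double quotes Ruby would try to interpolate "#$...".
SAMPLE_RESPONSE = 'bse$#$237|32|202|104|58|37|9|15,110.69|12|3,364.25|150|387|58,387.68|4,144.97'

# Hypothetical helper: splits the payload into its prefix and raw string fields.
# Assumes "$#$" separates the prefix from the body and "|" separates values.
def parse_market_stat(payload)
  prefix, body = payload.split('$#$', 2)
  { prefix: prefix, fields: body.to_s.split('|') }
end

result = parse_market_stat(SAMPLE_RESPONSE)
puts result[:prefix]
puts result[:fields].length
```

The fields are kept as strings because values such as 15,110.69 contain thousands separators; converting them to numbers is left to whatever writes them into the sheet.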

IMPORTHTML can only return table data that is not generated by JavaScript. I checked the web page you are trying to scrape, and the content that is missing is generated through JavaScript: it is not shown when JavaScript is disabled in the browser.
With JavaScript disabled, that content is not displayed. The TTRow_left cells, however, are hard-coded in the page, which is why the function is able to get them:
<td class="TTRow_left" style="height:22px;" width="230px">No. of Companies Listed on SME till Date</td>
TTRow_right is not displayed, so the function cannot scrape data from it.

Related

Is commenting faster than th:remove="all-but-first"?

There's an example in the Thymeleaf docs that I'm curious about.
Is commenting out a block using Thymeleaf-style commenting faster than using th:remove="all-but-first"?
Example:
<table>
<tr th:each="user : ${users}">
<td th:text="${user.name}">Jamie Dimon</td>
</tr>
<!--/* Hidden from evaluation -->
<tr>
<td>Jeff Bezos</td>
</tr>
<tr>
<td>Warren Buffett</td>
</tr>
<!--*/-->
</table>
vs.
<table th:remove="all-but-first">
<tr th:each="user : ${users}">
<td th:text="${user.name}">Jamie Dimon</td>
</tr>
<tr>
<td>Jeff Bezos</td>
</tr>
<tr>
<td>Warren Buffett</td>
</tr>
</table>
In both cases, prototyping would show the same HTML, but I am wondering whether the low precedence of th:remove would make it less desirable since it would be removing the tags after evaluating the th:each.

Unexpected <tr> when looping

I have an array of arrays @iterated_orders like this:
[[1, "Don", 3], [nil, nil, 4], [2, "Vri", nil]]
And code in my view like this:
%table
  - @iterated_orders.each do |day, day_name, order_id|
    - unless day.blank?
      %tr
        %td.day= day
    %td= order_id
I would expect it to output this html:
<tr>
<td class="day">1 Don</td>
<td>3</td>
<td>4</td>
</tr>
<tr>
<td class="day">2 Vri</td>
<td></td>
<td></td>
</tr>
But it outputs this HTML:
<tr>
<td class="day">1 Don</td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>4</td>
</tr>
<tr>
<td class="day">2 Vri</td>
<td></td>
</tr>
<tr>
<td></td>
</tr>
Why is there an extra <tr>, and why is the <td> with the order_id not added to the existing <tr>?
Your Haml actually renders:
<table>
<tr>
<td class='day'>1</td>
</tr>
<td>3</td>
<td>4</td>
<tr>
<td class='day'>2</td>
</tr>
<td></td>
</table>
When you view it in a browser, the browser will correct this to be valid HTML, including adding extra tr elements, which I suspect is where you are seeing your result (although I get something different in Chrome).
The tds with the order_ids are not added to the previous tr because that tr has already been closed at that point. Your Haml reads as “unless day is blank, insert a new row containing a cell with the day (and close it), and then insert some table cells with the order_ids”.
The best way to achieve what you are trying to do with Haml is to first get your data into a form that matches your intended output. Being familiar with the Enumerable methods can help here. In particular in this case chunk_while is probably what we want:
@sorted_orders = @iterated_orders.chunk_while {|before, after| after[0].blank? }
Now you can iterate over this structure to produce the HTML:
%table
  - @sorted_orders.each do |day|
    %tr
      -# the first sub-array contains the day:
      %td.day #{day[0][0]} #{day[0][1]}
      -# then add a td for each order_id (including the first):
      - day.each do |d|
        %td= d[2]
This produces (with your example data):
<table>
<tr>
<td class='day'>1 Don</td>
<td>3</td>
<td>4</td>
</tr>
<tr>
<td class='day'>2 Vri</td>
<td></td>
</tr>
</table>
which isn’t exactly what your goal is (you have an extra td in the second row). You may have to fix the data a bit more to get equal numbers of elements for each day.
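To see what chunk_while does with the example data, and one possible way to even out the number of cells per day, here is a minimal plain-Ruby sketch. It runs outside Rails, so it uses nil? where the answer uses blank?. The method names group_by_day and pad_order_ids are hypothetical, and padding with nil is just one assumption about how the data could be fixed.

```ruby
# Sample data from the question.
ITERATED_ORDERS = [[1, "Don", 3], [nil, nil, 4], [2, "Vri", nil]].freeze

# Groups rows so that a row without a day joins the preceding day's chunk,
# then turns each chunk into a label plus its list of order ids.
def group_by_day(orders)
  orders.chunk_while { |_before, after| after[0].nil? }.map do |chunk|
    day, day_name, _order_id = chunk.first
    { label: "#{day} #{day_name}", order_ids: chunk.map { |row| row[2] } }
  end
end

# Pads every day's order ids with nil so all rows get the same number of cells.
def pad_order_ids(days)
  width = days.map { |d| d[:order_ids].size }.max || 0
  days.map do |d|
    d.merge(order_ids: d[:order_ids] + [nil] * (width - d[:order_ids].size))
  end
end

pad_order_ids(group_by_day(ITERATED_ORDERS)).each do |day|
  puts "#{day[:label]}: #{day[:order_ids].inspect}"
end
```

With the padded structure, a view can emit one td per entry in order_ids and every row ends up with the same number of cells.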

I want only published children in Umbraco

I want only the children that are published in the content folder.
This is my code:
<umbraco:Macro runat="server" language="cshtml">
@foreach (var item in Model.Children)
{
<h3 class="vacancyH">@item.jobTitle</h3>
<table class="vaccTbl">
<tr>
<td class="vaccDetailTitle">Salary & Benefits:</td>
<td class="vaccDetailDesc">@item.salaryBenefits</td>
</tr>
<tr>
<td class="vaccDetailTitle">Employment Type:</td>
<td>@item.employmentType</td>
</tr>
<tr>
<td class="vaccDetailTitle">Department:</td>
<td>@item.department</td>
</tr>
<tr>
<td class="vaccDetailTitle">Report to Position:</td>
<td>@item.reportToPosition</td>
</tr>
<tr>
<td class="vaccDetailTitle">Location:</td>
<td>@item.location</td>
</tr>
<tr>
<td class="vaccDetailTitle">Date of Description:</td>
<td>@item.businessArea</td>
</tr>
<tr>
<td class="vaccDetailTitle" valign="top">Summary:</td>
<td class="tablep">@item.vacancySummary</td>
</tr>
<tr>
<td colspan="2" valign="middle"><img src="/images/wordicon.jpg" alt="" class="docIcon" />Download the Full Job Description</td>
</tr>
</table>
<div class="vaccCloseDate">Application Deadline: @item.applicationDeadline.ToString("dd MMMM yyyy")</div>
<div class="vaccApplyForPosition">Click here to apply</div>
}
</umbraco:Macro>
With this I also get the children that are not published.
Now I want only the published children.
What do you mean by published? What you are doing will only display published items; this is how Umbraco works. Using where("visible") relies on you having created a property called umbracoNaviHide on one of your doc types and setting it to true in order to hide items. If what you have is not working, then there is another reason for it.
Are your unpublished items greyed out in the content tree?
Try right-clicking the top-level content node and republishing the entire site.
Make sure your browser isn't caching anything; clear the cache.
Failing all this, simply delete umbraco.config in your app_data folder.
Umbraco does not render unpublished items.

Check specific words in the records using Cucumber, Capybara

My page has one table. The number of rows in that table is not fixed; rows may be added by anyone.
In that table the text of the third column in every row should be "Pending". That is the condition. I don't know how to check that the third column of every row contains "Pending".
I tried this, but I don't know whether it is right or not.
page.should have_selector('tbody tr td:nth-child(3)', text: Pending)
This is my HTML:
<table id="thisis" class="table table-bordered table-striped">
<thead>
<tr>
<th>Name</th>
<th>Default</th>
<th>Status</th>
<th>Action</th>
</tr>
</thead>
<tbody>
<tr>
<td>Test1</td>
<td>true</td>
<td>
<span class="label label-success">Pending</span>
</td>
<td>
<span>View</span>
<span>/</span>
<span>Edit</span>
<span>/</span>
<span>Publish</span>
</td>
</tr>
<tr>
<td>test2</td>
<td>true</td>
<td>
<span class="label label-success">Pending</span>
</td>
<td>
<span>View</span>
<span>/</span>
<span>Edit</span>
<span>/</span>
<span>Publish</span>
</td>
</tr>
</tbody>
</table>
Thanks for your valuable answers.
Method 1: Use count
Say you have 10 rows on the page and your status cells have the class "status". Then:
expect(page).to have_css(".status", text: "Pending", count: 10)
Method 2: Use scope
When building a data table, a common convention is to assign a unique id to at least each row. This helps many things besides tests.
What you need to do is:
Assign a unique CSS id, built from the record id, to each row
Add a "status" class to the status column so it is easy to identify
Your view will look like this:
<tr id="123-row">
<td>bla blah</td>
<td><span class="label label-success status">Pending</span></td>
...
</tr>
Then, in the test, you can do this in Capybara:
within "##{item.id}-row .status" do
  expect(page).to have_content("Pending")
end

Locate specific table row based on text string

I am working with RSpec and Capybara and have run into a problem while trying to select a specific row based on the :textContent or :text attributes: regardless of the string entered in the test, the first row is always selected.
The HTML code is as follows:
<table class="LearningAssetList admin" data-id="1">
<tbody>
<tr class="CategoryHeader">
<td class="expandCell" colspan="9">
<span>Admin Pro / Scheduling</span>
</td>
</tr>
<tr class="headerRow ui-droppable">
<td class="blank"></td>
<td></td>
<td>Name</td>
<td>Description</td>
<td class="center">Length</td>
<td class="center">User Rating</td>
<td style="width:20px;padding:0px;"></td>
<td style="width:20px;padding:0px;"></td>
</tr>
<tr class="assetRow ui-draggable ui-droppable" data-id="49">
<td class="blank"> </td>
<td class="assetPlay icon">
<td class="assetName">
<a onclick="openModal('http://www.youtube.com/v/C0DPdy98e4c','Learning Asset
Test Upload')" href="#">Learning Asset Test Upload</a>
</td>
<td class="assetDescription">
<td class="assetDuration">
<td class="assetRating icon">
<td class="assetFunctions center">
<td class="assetDrag center">
<td class="blank"> </td>
</tr>
</tbody>
</table>
My RSpec code is as follows:
it "should allow asset to be deleted by Admins" do
  visit 'http://localhost:3000/'
  click_link 'Admin'
  within(:xpath, '//*[@class="LearningAssetList admin"]') do
    #row = find('tr>td.assetName>a', :textContent => "Learning Asset Test Upload")
    row = find('tr>td.assetName>a', :textContent => "Learning Asset Test Upload".to_s)
    within(row) do
      find(:xpath, '//*[@class="popupMenu"]').click
    end
    sleep 5
    find(:xpath, '//*[@class="delete"]').click
    popup = page.driver.browser.switch_to.alert
    popup.text.should eq('Are you sure you would like to delete this asset?')
    popup.accept
    assetList = find(:xpath, '//*[@class="LearningAssetList admin"]')
    assetList.should have_content('Learning Asset Test Upload')
    sleep 5
  end
end
I have another row in the table above this entry where the assetName is simply "Test". Regardless of whether I use text or textContent, or change the string, that row is always selected and its more-options button is pressed, which ends up deleting the wrong asset.
Can anyone see a problem with the RSpec code or the logic behind selecting the row? I had thought that the text in the assetName td would have to match for the row to be found, but this does not seem to be happening.
Your HTML is completely invalid. You can't nest multiple <tr>s inside each other and you haven't closed any of the tags.

Resources