Apache Qpid: List of REST Endpoints

Applicable to the Qpid Java broker 0.32 onwards; I am not sure about other Qpid brokers (C++ etc.).
For my projects I was trying to explore the various REST endpoints, but I could not find them in the Qpid documentation or online.
So I figured out a way to extract which REST endpoints are accessible and which operations and parameters they support. This helped me a great deal, and I hope it helps other community members too.
I am sharing the REST endpoints here for reference.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head>
<meta http-equiv="content-type" content="text/html; charset=windows-1252">
<link rel="stylesheet" type="text/css" href="QpidApiDocs_files/apidocs.css">
<title>Qpid API</title>
</head>
<body>
<table class="api">
<thead>
<tr>
<th class="type">Type</th>
<th class="path">Path</th>
<th class="description">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="type" rowspan="2">AccessControlProvider</td>
<td class="path">/api/latest/accesscontrolprovider</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/accesscontrolprovider</td>
</tr>
<tr>
<td class="type" rowspan="2">AuthenticationProvider</td>
<td class="path">/api/latest/authenticationprovider</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/authenticationprovider</td>
</tr>
<tr>
<td class="type" rowspan="2">Binding</td>
<td class="path">/api/latest/binding</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/binding</td>
</tr>
<tr>
<td class="type" rowspan="2">Broker</td>
<td class="path">/api/latest/broker</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/broker</td>
</tr>
<tr>
<td class="type" rowspan="2">BrokerLogInclusionRule</td>
<td class="path">/api/latest/brokerloginclusionrule</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/brokerloginclusionrule</td>
</tr>
<tr>
<td class="type" rowspan="2">BrokerLogger</td>
<td class="path">/api/latest/brokerlogger</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/brokerlogger</td>
</tr>
<tr>
<td class="type" rowspan="2">Connection</td>
<td class="path">/api/latest/connection</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/connection</td>
</tr>
<tr>
<td class="type" rowspan="2">Consumer</td>
<td class="path">/api/latest/consumer</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/consumer</td>
</tr>
<tr>
<td class="type" rowspan="2">Exchange</td>
<td class="path">/api/latest/exchange</td>
<td class="description" rowspan="2"><p>An Exchange is a named entity
within the Virtualhost which receives messages from producers and routes
them to matching Queues within the Virtualhost.</p><p>The server provides a set of exchange types with each exchange type implementing a different routing algorithm.</p></td>
</tr>
<tr>
<td class="path">/api/v6/exchange</td>
</tr>
<tr>
<td class="type" rowspan="2">Group</td>
<td class="path">/api/latest/group</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/group</td>
</tr>
<tr>
<td class="type" rowspan="2">GroupMember</td>
<td class="path">/api/latest/groupmember</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/groupmember</td>
</tr>
<tr>
<td class="type" rowspan="2">GroupProvider</td>
<td class="path">/api/latest/groupprovider</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/groupprovider</td>
</tr>
<tr>
<td class="type" rowspan="2">KeyStore</td>
<td class="path">/api/latest/keystore</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/keystore</td>
</tr>
<tr>
<td class="type" rowspan="2">Plugin</td>
<td class="path">/api/latest/plugin</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/plugin</td>
</tr>
<tr>
<td class="type" rowspan="2">Port</td>
<td class="path">/api/latest/port</td>
<td class="description" rowspan="2"><p>The Broker supports configuration
of Ports to specify the particular AMQP messaging and HTTP/JMX
management connectivity it offers for use.</p><p>Each Port is configured
with the particular Protocols and Transports it supports, as well as
the Authentication Provider to be used to authenticate connections.
Where SSL is in use, the Port configuration also defines which Keystore
to use and (where supported) which TrustStore(s) and whether Client
Certificates should be requested/required.</p></td>
</tr>
<tr>
<td class="path">/api/v6/port</td>
</tr>
<tr>
<td class="type" rowspan="2">PreferencesProvider</td>
<td class="path">/api/latest/preferencesprovider</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/preferencesprovider</td>
</tr>
<tr>
<td class="type" rowspan="2">Publisher</td>
<td class="path">/api/latest/publisher</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/publisher</td>
</tr>
<tr>
<td class="type" rowspan="2">Queue</td>
<td class="path">/api/latest/queue</td>
<td class="description" rowspan="2"><p>Queues are named entities within a
VirtualHost that hold/buffer messages for later delivery to consumer
applications. Consumers subscribe to a queue in order to receive
messages for it.</p><p>The Broker supports different queue types, each
with different delivery semantics. It also allows for messages on a
queue to be treated as a group.</p></td>
</tr>
<tr>
<td class="path">/api/v6/queue</td>
</tr>
<tr>
<td class="type" rowspan="2">RemoteReplicationNode</td>
<td class="path">/api/latest/remotereplicationnode</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/remotereplicationnode</td>
</tr>
<tr>
<td class="type" rowspan="2">Session</td>
<td class="path">/api/latest/session</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/session</td>
</tr>
<tr>
<td class="type" rowspan="2">TrustStore</td>
<td class="path">/api/latest/truststore</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/truststore</td>
</tr>
<tr>
<td class="type" rowspan="2">User</td>
<td class="path">/api/latest/user</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/user</td>
</tr>
<tr>
<td class="type" rowspan="2">VirtualHost</td>
<td class="path">/api/latest/virtualhost</td>
<td class="description" rowspan="2"><p>A virtualhost is a namespace in
which messaging is performed. Virtualhosts are independent; the
messaging that goes on within a virtualhost is independent of any messaging
that goes on in another virtualhost. For instance, a queue named <i>foo</i> defined in one virtualhost is completely independent of a queue named <i>foo</i> in another virtualhost.</p><p>A virtualhost is backed by storage which is used to store the messages.</p></td>
</tr>
<tr>
<td class="path">/api/v6/virtualhost</td>
</tr>
<tr>
<td class="type" rowspan="2">VirtualHostAlias</td>
<td class="path">/api/latest/virtualhostalias</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/virtualhostalias</td>
</tr>
<tr>
<td class="type" rowspan="2">VirtualHostLogInclusionRule</td>
<td class="path">/api/latest/virtualhostloginclusionrule</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/virtualhostloginclusionrule</td>
</tr>
<tr>
<td class="type" rowspan="2">VirtualHostLogger</td>
<td class="path">/api/latest/virtualhostlogger</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/virtualhostlogger</td>
</tr>
<tr>
<td class="type" rowspan="2">VirtualHostNode</td>
<td class="path">/api/latest/virtualhostnode</td>
<td class="description" rowspan="2"></td>
</tr>
<tr>
<td class="path">/api/v6/virtualhostnode</td>
</tr>
</tbody>
</table>
</body></html>
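As an illustration of calling the endpoints in this table, here is a minimal Ruby sketch using only the standard library. It assumes the broker's HTTP management port is reachable at http://localhost:8080 with admin/admin credentials, and that individual objects are addressed by appending names as extra path segments after the type (for a queue: virtualhost node, then virtualhost, then queue name). That path convention is how I understand the Qpid Java broker to work, so verify it against your own broker. The base URL, credentials and the helper names qpid_uri, qpid_get_request and qpid_get are all hypothetical.

```ruby
require "net/http"
require "uri"
require "json"
require "erb"

# Hypothetical base URL of the broker's HTTP management port (an assumption).
BASE_URL = "http://localhost:8080"

# Builds the URL for a type from the table, e.g. qpid_uri("queue") gives
# http://localhost:8080/api/latest/queue. Any names (assumed to be the
# ancestors' names, then the object's own name) become extra path segments.
def qpid_uri(type, version: "latest", names: [])
  segments = ["api", version, type] + names.map { |n| ERB::Util.url_encode(n) }
  URI("#{BASE_URL}/#{segments.join('/')}")
end

# Prepares, but does not send, a GET request with HTTP Basic authentication.
# The admin/admin defaults are placeholders, not real credentials.
def qpid_get_request(uri, user: "admin", password: "admin")
  req = Net::HTTP::Get.new(uri)
  req.basic_auth(user, password)
  req["Accept"] = "application/json"
  req
end

# Sends the request and parses the JSON body; raises on a non-2xx status.
def qpid_get(uri, **auth)
  Net::HTTP.start(uri.host, uri.port) do |http|
    res = http.request(qpid_get_request(uri, **auth))
    res.value
    JSON.parse(res.body)
  end
end

puts qpid_uri("queue", names: ["default", "default"])
```

Under those assumptions, qpid_get(qpid_uri("queue", names: ["default", "default"])) would return the queues of a virtualhost named default on a virtualhost node named default, if both exist. Swapping version: "latest" for version: "v6" targets the versioned paths listed alongside each type.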

Related

Import table from HTML into Google Sheets

I have the following table element from a website.
Using this formula, it only extracts the first td, i.e. class=TTRow_left.
I want to extract both class=TTRow_left and class=TTRow_right into a Google Sheet.
Formula:
=IMPORTHTML("https://www.bsesme.com/","table",6)
HTML:
<table width="305" border="0" cellspacing="0" cellpadding="0">
<tbody><tr>
<td class="TTRow_left" style="height:22px;" width="230px">No. of Companies Listed on SME till Date</td>
<td class="TTRow_right" style="height:22px;" id="AL">386</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;" width="230px">Mkt Cap of Cos. Listed on SME till Date (Rs.Cr.)</td>
<td class="TTRow_right" style="height:22px;" id="MCL">58,225.56</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;" width="230px">Total Amount of Money Raised till Date (Rs. Cr.)</td>
<td class="TTRow_right" style="height:22px;" id="Td13">4,132.16</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;" width="230px">No. of Companies Migrated to Main Board</td>
<td class="TTRow_right" style="height:22px;" id="MB">150</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;" width="230px">No. of Companies Listed as of Date </td>
<td class="TTRow_right" style="height:22px;" id="CL"> 236</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;">No. of Companies Suspended</td>
<td class="TTRow_right" style="height:22px;" id="CS">32</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;">No. of Companies Eligible for Trading</td>
<td class="TTRow_right" style="height:22px;" id="CET">201</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;">No. of Companies Traded</td>
<td class="TTRow_right" style="height:22px;" id="CT">110</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;">Advances/ Declines/ Unchanged</td>
<td class="TTRow_right" style="height:22px;" id="Adv">73/ 32/ 5</td>
</tr>
<tr>
<td class="TTRow_left" style="height:22px;">Mkt Cap of BSE SME Listed Cos. (Rs.Cr.)</td>
<td class="TTRow_right" style="height:22px;" id="Dec">15,095.93</td>
</tr>
<!--<tr>
<td class="TTRow_left" style="height:22px;" width="230px">No. of SME companies migrated to main board</td>
<td class="TTRow_right" style="height:22px;" >3</td>
</tr>-->
</tbody></table>
</td>
</tr>
</tbody></table>
There is a way: you could extract that data with Google Apps Script, i.e. by writing a function that reads the values (they are returned by a separate request).
You need to make a request to this URL, which is the one that loads the data:
https://www.bsesme.com/markets/MarketStat.aspx?&292022849
The values returned are:
bse$#$237|32|202|104|58|37|9|15,110.69|12|3,364.25|150|387|58,387.68|4,144.97
Then extract the data from that response.
I checked the page's source code: the page uses JavaScript to read this data and rearrange it on the main page (i.e. https://www.bsesme.com/).
Tip: check the main page's source code for a function called GetNotices(str); it appears to contain the logic that rearranges the data.
You will have to dig deeper to figure out how to get this data into your spreadsheet.
In this case, IMPORTHTML can only return table data that is not generated by JavaScript. I checked the web page you are trying to scrape, and the exact content that is missing appears to be generated by JavaScript: it is not shown when JavaScript is disabled in the browser.
The TTRow_left cells, however, are hard-coded in the HTML, which is why the function is able to get that information from the page:
<td class="TTRow_left" style="height:22px;" width="230px">No. of Companies Listed on SME till Date</td>
The TTRow_right values are not in that static HTML, so the function cannot scrape data from them.

Is commenting faster than th:remove="all-but-first"?

There's an example in the Thymeleaf docs that I'm curious about.
Is commenting out a block using Thymeleaf-style commenting faster than using th:remove="all-but-first"?
Example:
<table>
<tr th:each="user : ${users}">
<td th:text="${user.name}">Jamie Dimon</td>
</tr>
<!--/* Hidden from evaluation -->
<tr>
<td>Jeff Bezos</td>
</tr>
<tr>
<td>Warren Buffett</td>
</tr>
<!--*/-->
</table>
vs.
<table th:remove="all-but-first">
<tr th:each="user : ${users}">
<td th:text="${user.name}">Jamie Dimon</td>
</tr>
<tr>
<td>Jeff Bezos</td>
</tr>
<tr>
<td>Warren Buffett</td>
</tr>
</table>
In both cases, prototyping would show the same HTML, but I am wondering whether the low precedence of th:remove would make it less desirable since it would be removing the tags after evaluating the th:each.

I want only published children in Umbraco

I want only those children that are published in the content folder.
This is my code:
<umbraco:Macro runat="server" language="cshtml">
@foreach (var item in Model.Children)
{
<h3 class="vacancyH">@item.jobTitle</h3>
<table class="vaccTbl">
<tr>
<td class="vaccDetailTitle">Salary & Benefits:</td>
<td class="vaccDetailDesc">@item.salaryBenefits</td>
</tr>
<tr>
<td class="vaccDetailTitle">Employment Type:</td>
<td>@item.employmentType</td>
</tr>
<tr>
<td class="vaccDetailTitle">Department:</td>
<td>@item.department</td>
</tr>
<tr>
<td class="vaccDetailTitle">Report to Position:</td>
<td>@item.reportToPosition</td>
</tr>
<tr>
<td class="vaccDetailTitle">Location:</td>
<td>@item.location</td>
</tr>
<tr>
<td class="vaccDetailTitle">Date of Description:</td>
<td>@item.businessArea</td>
</tr>
<tr>
<td class="vaccDetailTitle" valign="top">Summary:</td>
<td class="tablep">@item.vacancySummary</td>
</tr>
<tr>
<td colspan="2" valign="middle"><img src="/images/wordicon.jpg" alt="" class="docIcon" />Download the Full Job Description</td>
</tr>
</table>
<div class="vaccCloseDate">Application Deadline: @item.applicationDeadline.ToString("dd MMMM yyyy")</div>
<div class="vaccApplyForPosition">Click here to apply</div>
}
</umbraco:Macro>
With this I get all the children, including the ones that are not published.
I want only the published children.
What do you mean by published? What you are doing will only display published items; that is how Umbraco works. Using Where("Visible") relies on you having created a property called umbracoNaviHide on one of your document types and setting it to true to hide items. If what you have is not working, there is another reason for it.
Are your unpublished items greyed out in the content tree?
Try right-clicking the top-level content node and republishing the entire site.
Make sure your browser isn't caching anything; clear the cache.
Failing all this, simply delete umbraco.config in your App_Data folder.
Umbraco does not render unpublished items.

Mule Community edition vs Enterprise edition - Feature Comparison?

This should be a simple Google search, but I have drawn a blank. I assume it must be out there somewhere; can anyone help me find it?
I need a simple comparison that tells me what is included in the Community Edition versus the Enterprise Edition.
For example, DataMapper is not included in Community, but this is not clear until you try to deploy. I'd really like to avoid a lot of wasted effort upfront.
Thank you.
MuleSoft provides a list of comparisons and features here:
http://www.mulesoft.com/platform/soa/mule-esb-enterprise
This states that DataMapper is Enterprise-only. It makes a blanket statement about enterprise connectors, but you can see which connectors are Enterprise or Community via mulesoft.org/connectors?class=premium
Some transports (mainly JDBC) have Enterprise equivalents, which are documented on the individual transport documentation pages: http://www.mulesoft.org/documentation/display/current/JDBC+Transport+Reference
Bumping an old thread, but this page provides a good, simple list of what is and is not supported, along with the impact of each feature being missing from CE.
http://www.whishworks.com/blog/mule-esb-community-vs-enterprise-edition/
I am adding the same content here so that it is not lost if the link dies.
<p><strong>High Availability and Performance</strong>
</p>
<table class="matrix" style="height: 247px;" width="650">
<tbody>
<tr>
<td width="173"><b>Feature</b>
</td>
<td width="173"><b>Community</b><b> Edition</b>
</td>
<td width="173"><b>Enterprise Edition (G) / (S)</b>
</td>
<td width="173"><b>Enterprise Edition (P)</b>
</td>
<td width="173"><b>Impact</b>
</td>
</tr>
<tr>
<td width="173">High Availability</td>
<td width="173">No Support</td>
<td width="173">No Support</td>
<td width="173">Supported</td>
<td width="173">Message Loss and Transaction failure</td>
</tr>
<tr>
<td width="173">Resilience</td>
<td width="173">No Support</td>
<td width="173">No Support</td>
<td width="173">Supported</td>
<td width="173">Extra effort to handle stateful and failure scenarios</td>
</tr>
<tr>
<td width="173">Caching</td>
<td width="173">No Support</td>
<td width="173">Supported</td>
<td width="173">Supported</td>
<td width="173">Performance Impact</td>
</tr>
</tbody>
</table>
<p> </p>
<p><strong>Development</strong>
</p>
<table class="matrix" width="650">
<tbody>
<tr>
<td width="173"><b>Feature</b>
</td>
<td width="173"><b>Community</b><b> Edition</b>
</td>
<td width="173"><b>Enterprise Edition (G) / (S)</b>
</td>
<td width="173"><b>Enterprise Edition (P)</b>
</td>
<td width="173"><b>Impact</b>
</td>
</tr>
<tr>
<td width="173">Anypoint Templates</td>
<td width="173">No Support</td>
<td width="173">No Support</td>
<td width="173">Supported</td>
<td width="173">Saves development and design effort by using templates; roughly estimated at a 40 to 60% time saving, depending on how closely the use case matches the template.</td>
</tr>
<tr>
<td width="173">Transaction Management</td>
<td width="173">No Support</td>
<td width="173">Supported</td>
<td width="173">Supported</td>
<td width="173">Data loss and Impact on development effort</td>
</tr>
<tr>
<td width="173">Batch Manager</td>
<td width="173">No Support</td>
<td width="173">Supported</td>
<td width="173">Supported</td>
<td width="173">Impact on Development & Support Effort</td>
</tr>
<tr>
<td width="173">Batch Process component</td>
<td width="173">No Support</td>
<td width="173">Supported</td>
<td width="173">Supported</td>
<td width="173">Impact on Development & Support Effort</td>
</tr>
<tr>
<td width="173">JDBC Enterprise Connector</td>
<td width="173">No Support</td>
<td width="173">Supported</td>
<td width="173">Supported</td>
<td width="173">Handles batch statements, as used in data integration projects; without it, expect a performance hit.</td>
</tr>
<tr>
<td width="173">Anypoint Datamapper</td>
<td width="173">No Support</td>
<td width="173">Supported</td>
<td width="173">Supported</td>
<td width="173">Impact on Development</td>
</tr>
</tbody>
</table>
<p> </p>
<p><strong>Operational Support</strong>
</p>
<table class="matrix" width="650">
<tbody>
<tr>
<td width="173"><b>Feature</b>
</td>
<td width="173"><b>Community</b><b> Edition</b>
</td>
<td width="173"><b>Enterprise Edition (G) / (S)</b>
</td>
<td width="173"><b>Enterprise Edition (P)</b>
</td>
<td width="173"><b>Impact</b>
</td>
</tr>
<tr>
<td width="173">Mule Management Console</td>
<td width="173">No Support</td>
<td width="173">Supported</td>
<td width="173">Supported</td>
<td width="173">Impact on Support</td>
</tr>
<tr>
<td width="173">SLA and email Alerts</td>
<td width="173">No Support</td>
<td width="173">Supported</td>
<td width="173">Supported</td>
<td width="173">Impact on Support and Availability</td>
</tr>
<tr>
<td width="173">SNMP Monitoring</td>
<td width="173">No Support</td>
<td width="173">Supported</td>
<td width="173">Supported</td>
<td width="173">Impact on Support and Availability</td>
</tr>
<tr>
<td width="173">HTTP Polling</td>
<td width="173">No Support</td>
<td width="173">Supported</td>
<td width="173">Supported</td>
<td width="173">Impact on Support and Availability. Mule provides HTTP polling of services to check their availability.</td>
</tr>
<tr>
<td width="173">Deployment Management</td>
<td width="173">No Support</td>
<td width="173">Supported</td>
<td width="173">Supported</td>
<td width="173">Impact on Support</td>
</tr>
</tbody>
</table>
<p> </p>
<p><strong>Security</strong>
</p>
<table class="matrix" width="650">
<tbody>
<tr>
<td width="173"><b>Feature</b>
</td>
<td width="173"><b>Community</b><b> Edition</b>
</td>
<td width="173"><b>Enterprise Edition (G) / (S)</b>
</td>
<td width="173"><b>Enterprise Edition (P)</b>
</td>
<td width="173"><b>Impact</b>
</td>
</tr>
<tr>
<td width="173">Role based security</td>
<td width="173">Not Supported</td>
<td width="173">Supported</td>
<td width="173">Supported</td>
<td width="173">Major effort to custom develop</td>
</tr>
<tr>
<td width="173">OAuth 2.0 – Secure Token Provider</td>
<td width="173">Not Supported</td>
<td width="173">Supported</td>
<td width="173">Supported</td>
<td width="173">Major effort to custom develop</td>
</tr>
<tr>
<td width="173">Message Encryption</td>
<td width="173">Not Supported</td>
<td width="173">Supported</td>
<td width="173">Supported</td>
<td width="173">Major effort to custom develop</td>
</tr>
<tr>
<td width="173">SAML 2.0 Module</td>
<td width="173">Not Supported</td>
<td width="173">Supported</td>
<td width="173">Supported</td>
<td width="173">Major effort to custom develop</td>
</tr>
<tr>
<td width="173">Secure Property Holder</td>
<td width="173">Not Supported</td>
<td width="173">Supported</td>
<td width="173">Supported</td>
<td width="173">Keeps passwords and other confidential text in encrypted form. This cannot be custom built, as it links directly to your endpoint.</td>
</tr>
<tr>
<td width="173">IP Based Filtering</td>
<td width="173">Not Supported</td>
<td width="173">Supported</td>
<td width="173">Supported</td>
<td width="173">An IP-based filter is available in EE for filtering endpoint requests by inbound IP; requests can also be filtered using LDAP.</td>
</tr>
</tbody>
</table>
<p> </p>
<p><strong>Support</strong>
</p>
<table class="matrix" width="650">
<tbody>
<tr>
<td width="173"><b>Feature</b>
</td>
<td width="173"><b>Community</b><b> Edition</b>
</td>
<td width="173"><b>Enterprise Edition (G) / (S)</b>
</td>
<td width="173"><b>Enterprise Edition (P)</b>
</td>
<td width="173"><b>Impact</b>
</td>
</tr>
<tr>
<td width="173">License</td>
<td width="173">Free</td>
<td width="173">Purchase Minimum 2 Cores</td>
<td width="173">Purchase Minimum 4 (2+2) for HA</td>
<td width="173">Licence Cost</td>
</tr>
<tr>
<td width="173">Hardened Code</td>
<td width="173">No Support</td>
<td width="173">Yes</td>
<td width="173">Yes</td>
<td width="173">Impact on stability and performance</td>
</tr>
<tr>
<td width="173">SLA</td>
<td width="173">Forums</td>
<td width="173">8/5, 24 Hours Response Time</td>
<td width="173">24/7, 2 Hours Response Time</td>
<td width="173">Impact on support</td>
</tr>
<tr>
<td width="173">Hot patches & Service packs</td>
<td width="173">No Support</td>
<td width="173">Supported</td>
<td width="173">Supported</td>
<td width="173">Impact on support and availability</td>
</tr>
</tbody>
</table>
-Shanky G.

Ruby Mechanize table scraping doesn't capture entire row

I am trying to scrape a table from a website with Mechanize.
I want to scrape the second row.
When I run:
agent.page.search('table.ea').search('tr')[-2].search('td').map{ |n| n.text }
I would expect it to scrape the whole row, but instead it only scrapes: ["2011-02-17", "0,00"]
Why isn't it scraping all of the columns in the row, but just the first and the last column?
XPath:
/html/body/center/table/tbody/tr[2]/td[2]/table/tbody/tr[3]/td/table/tbody/tr[2]/td/table/tbody/tr[2]
CSS path:
html body center table tbody tr td table tbody tr td table tbody tr td table.ea tbody tr td.total
The page is similar to this:
<table><table><table>
<table width="100%" border="0" cellpadding="0" cellspacing="1" class="ea">
<tr>
<th>Date</th>
<th>One</th>
<th>Two</th>
<th>Three</th>
<th>Four</th>
<th>Five</th>
<th>Six</th>
<th>Seven</th>
<th>Eight</th>
</tr>
<tr>
<td>2011-02-17</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0,00</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">387</td>
<td align="right">0,00</td> <!-- FOV -->
<td align="right">0,00</td>
</tr>
<tr>
<td class="total">Ialt</td>
<td class="total" align="right">0</td>
<td class="total" align="right">40</td>
<td class="total" align="right">0,46</td>
<td class="total" align="right">2</td>
<td class="total" align="right">0</td>
<td class="total" align="right">0</td>
<td class="total" align="right">0</td>
<td class="total" align="right">3.060</td>
<td class="total" align="right">0,00</td>
<td class="total" align="right">18,58</td>
</tr>
</table>
</table></table></table>
Using the following Ruby code (https://gist.github.com/835603):
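require 'mechanize'
require 'pp'

# Create a Mechanize agent that identifies itself as Safari on macOS.
a = Mechanize.new { |agent|
  agent.user_agent_alias = 'Mac Safari'
}

# Fetch a copy of the sample page and print the text of every cell
# in the second-to-last row of the table with class "ea".
a.get('http://binarymuse.net/table.html') do |page|
  pp page.search('table.ea').search('tr')[-2].search('td').map{ |n| n.text }
end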
I get the following output:
["2011-02-17", "0", "0", "0,00", "0", "0", "0", "0", "387", "0,00", "0,00"]
I would recommend saving Mechanize for harder jobs than scraping a page.
Nokogiri is much simpler to use than Mechanize here (though of course you can do it with Mechanize), since you can just query the page.
Try it out!
Here is a link to an answer regarding Nokogiri.
Personally, I have used Mechanize when I needed to submit forms and the like, although it has plenty of other uses!
