Ruby pluck second number from this scraped HTML (wombat) - ruby-on-rails

Here's a section of HTML I'm trying to pull some info from:
<div class="pagination">
<p>
<span>Showing</span>
1-30
of 3744
<span>results</span>
</p>
</div>
I just want to store 3744 from the bit I pull (everything inside the <p>), but I'm having a hard time since the "of 3744" text isn't wrapped in an element with a CSS class of its own and I don't understand XPaths at all :)
<span>Showing</span>1-30\nof 3744<span>results</span>
How would you parse the above string to only retrieve the total number of results?

As long as it always looks the same you could also use #scan to get just the last number.
str = '<div class="pagination">
<p>
<span>Showing</span>
1-30
of 3744
<span>results</span>
</p>
</div>'
str.scan(/\d+/).pop.to_i
#=> 3744
Update: explanation of how it works
scan returns an Array of all the runs of digits, e.g. ["1","30","3744"]; pop then takes the last element, "3744", and to_i converts it to the Integer 3744.
Please note that if the number you want is not the last element in the Array, this will not work as you want, e.g.
str = '<div class="pagination">
<p>
<span>Showing</span>
1-30
of 3744
<span>results 14</span>
</p>
</div>'
str.scan(/\d+/).pop.to_i
#=> 14
As you can see, since I added the number 14 to the results span, it is now the last number in the Array and your result is off. So you could modify it to something like this:
str.gsub(/\s+/,'').scan(/\d+-\d+of(\d+)/).flatten.pop.to_i
#=> 3744
This removes all whitespace with gsub, then looks for a pattern of the form digits-digits, the word "of", then digits (\d+-\d+of(\d+)), and captures the last group #=> [["3744"]]. It then flattens the Array #=> ["3744"], pops the last element and converts it to an Integer. This seems like a better solution, as it will make sure to match the "of ####" section every time.
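A minimal sketch of the same idea that skips the gsub step, assuming the total always follows the word "of". The method name total_results is hypothetical, not something from the question:

```ruby
# Hypothetical helper: returns the number after "of", or nil if it is absent.
def total_results(html)
  # \bof\s+(\d+) anchors on the word "of", so digits elsewhere are ignored.
  match = html.match(/\bof\s+(\d+)/)
  match && match[1].to_i
end

str = "<p>\n<span>Showing</span>\n1-30\nof 3744\n<span>results 14</span>\n</p>"
puts total_results(str)
```

Returning nil when nothing matches lets the caller tell "no total found" apart from a real total of 0.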

Use a regexp; see this example (tested on Rubular):
<span>\w+<\/span>\d\-\d+\\[a-z]+\s(\d+)<span>\w+<\/span>
Match groups:
3744
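The pattern above can be tried directly in Ruby. This is only a sketch, assuming the input is the one-line string from the question, in which \n is a literal backslash followed by n (hence the single quotes):

```ruby
# Single quotes keep \n as two literal characters, which is what \\[a-z]+ expects.
str = '<span>Showing</span>1-30\nof 3744<span>results</span>'
pattern = /<span>\w+<\/span>\d\-\d+\\[a-z]+\s(\d+)<span>\w+<\/span>/

# String#[] with a capture index returns the first match group, or nil.
puts str[pattern, 1]
```

Note that the pattern will not match if the string contains real newlines instead of the literal \n.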

Related

importXML xpath to google sheets returns #N/A

Under the following span class I am looking to extract the number (416) 123-1234. This number is written after data-number or at the end of the span.
<span class="_Xbe _ZWk kno-fv"> ## The Unique ID is _Xbe _ZWk kno-fv
<a class="fl r-idASUKPhOV34" href="#" data-number="+14161231234" data-pstn-
-call-url="" title="Call via Hangouts" jsaction="r.oVdbr2mIpA8"
data-rtid="idASUKPhOV34" jsl="$t t-6xg4lalHw8M;$x 0;" data-ved="0ahUKEwiDtKTG-snZAhUDzIMKHcntCfAQkAgImAEoADAU">(416) 123-1234</a></span>
The problem comes from the XPATH, I am not specifying the right xpath, I tried to copy xpath from the source code and it returned #N/A. My best guess is the following xpath with importxml, though it still returns #N/A.
=IMPORTXML("https://www.google.com/search?q="&A10,"//span[@class='_Xbe _ZWk kno-fv']/a/@href")
How can I write XPATH to extract the number in either form?

Retrieve contents of a tag using XPath in Google Sheets, but *not* the contents of nested/child tags

I'm trying to retrieve one portion of an <h1> tag using the Google Sheets IMPORTXML function. The function below gets everything within the <h1>, but I only want the text starting at Lenovo..... and ending at Silver. I do not want the Item #... that is in the nested <span>.
Is there a way to get the <h1>, but ignore the nested span?
Working Function
=IMPORTXML("https://www.officedepot.com/a/products/"&A2,"//div[@id='skuHeading']/h1[@class='semi_bold fn']")
Structure
<div id="skuHeading" data-auid="productDetail_text_skuDescription">
<h1 itemprop="name" class="semi_bold fn">
Office Depot® Brand White Copy Paper, Letter Paper Size, 20 Lb, 500 Sheets Per Ream, Case Of 10 Reams
<small class="item_sku" data-auid="productDetail_text_sku">
<span itemprop="sku">
Item # 273646
</span>
</small>
</h1>
</div>

Limit the importxml to a defined span

Currently I am using a transpose and then another column to count the results and give me what I want. But because Tanaike is awesome and helped me on another section, I am trying to wrap my head around what he did and apply it to this.
Starting with this URL in A1,
https://www.zillow.com/homedetails/307-N-Rosedale-Ave-Tulsa-OK-74127/22151896_zpid/
This is the formula in A2:
=If($A$1:A="","",Transpose(importxml($A1:$A,"//span[@class='snl phone']")))
Based on the listing sometimes there are three phone numbers, sometimes four, and sometimes eight that get spread across as many columns as needed.
I am looking for the Property Owner phone number. This is the ELEMENT from the inspection.
<div class="info flat-star-ratings sig-col" id="yui_3_18_1_2_1506365934526_2361"> <span class="snl name notranslate">Property Owner</span> <span class="snl phone" id="yui_3_18_1_2_1506365934526_2360">(918) 740-1698 </span> </div>
So I tried this, and it reports that the content is empty. My thinking was to look at the div with class info flat..., then the snl phone span within it, and stop before the end of the span.
=importXML(B17,"//div[@class='info flat-star-ratings sig-col']//span[@class='snl phone']/@span")
What I really need is ONLY the property owner phone number with 95% or greater accuracy.
How about this modification of the XPath query?
Modified XPath query :
=importxml(A1,"//div[@class='info flat-star-ratings sig-col']//span[@class='snl phone']")
Result :
If this is not the data you want, I'm sorry.
Edit :
My understanding is that the 4th and 8th numbers are the same. Is that correct? If so, please put the URL in "A1" and the following formula in "A2".
=QUERY(ARRAYFORMULA(IF(IMPORTXML(A1,"//div[@class='info flat-star-ratings sig-col']//span[@class='snl name notranslate']")="Property Owner",IMPORTXML(A1,"//div[@class='info flat-star-ratings sig-col']//span[@class='snl phone']"), "")),"Select * where Col1<>''")
Result :

Parsing text content in ColdFusion

I am attempting to parse text from a <cfoutput query="...">. I am interested in finding the number of times every word in the text is displayed. For example:
"My name is Bob and I like to Bob".
should result in
Bob - 2
Name - 1
etc, etc, etc.
I take my <cfoutput> from a twitter RSS feed. Here is my code:
<blink>
<cfset feedurl="http://twitter.com/statuses/user_timeline/47847839.rss" />
<cftry>
<cffeed source="#feedurl#" properties="feedmeta" query="feeditems" />
<cfcatch></cfcatch>
</cftry>
<ol>
<cfoutput query="feeditems">
#content# #id# <br><br>
</cfoutput>
</ol>
</blink>
I output a pretty great ordered list, but I can't figure out for the life of me how to parse the content and list how many times each word is used.
Thanks for any help you can provide, I am new to these forums!
You can find a solution here:
http://www.coldfusionjedi.com/index.cfm/2007/8/2/Counting-Word-Instances-in-a-String
Basically, split the string up using regex and then loop over the results. There are some darn good comments here as well.
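The split-and-loop idea can be sketched as follows. It is shown in Ruby rather than ColdFusion, and it is not the code from the linked post; the method name word_counts and the choice to ignore case are assumptions made for illustration:

```ruby
# Hypothetical helper: counts how often each word occurs, ignoring case.
def word_counts(text)
  counts = Hash.new(0)
  # scan yields each run of letters, digits, and apostrophes as one word.
  text.downcase.scan(/[a-z0-9']+/) { |word| counts[word] += 1 }
  # Most frequent first; ties are ordered alphabetically.
  counts.sort_by { |word, n| [-n, word] }
end

word_counts("My name is Bob and I like to Bob").each do |word, n|
  puts "#{word} - #{n}"
end
```

The same two steps (a regex split, then a loop that increments a counter per word) carry over to a ColdFusion struct keyed by word.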

Markdown: How to reference an item in a numbered list, by number (like LaTeX's \ref / \label)?

Is there any way in markdown to do the equivalent of the cross-referencing in this LaTeX snippet? (Taken from here.)
\begin{enumerate}
\item \label{itm:first} This is a numbered item
\item Another numbered item \label{itm:second}
\item \label{itm:third} Same as \ref{itm:first}
\end{enumerate}
Cross-referencing items \ref{itm:second} and \ref{itm:third}.
This LaTeX produces
1. This is a numbered item
2. This is another numbered item
3. Same as 1
Cross-referencing items 2 and 3.
That is, I would like to be able to refer to items in a markdown list without explicitly numbering them, so that I could change the above list to the following without having to manually update the cross references:
1. This is the very first item
2. This is a numbered item
3. This is another numbered item
4. Same as 2
Cross-referencing items 3 and 4.
HTML can't even do that and Markdown is a subset of HTML, so the answer is no.
For example, your list would be represented like so (when rendered by Markdown):
<ol>
<li>This is a numbered item</li>
<li>This is another numbered item</li>
<li>Same as 1</li>
</ol>
Notice that there is no indication of which item is which as far as the numbering goes. That is all inferred at render time by the browser. However, the number values are not stored within the document and are not referenceable or linkable. They are for display only and serve no other purpose.
Now you could write some custom HTML to uniquely identify each list item and make them referenceable:
<ol>
<li id="item1">This is a numbered item</li>
<li id="item2">This is another numbered item</li>
<li id="item3">Same as <a href="#item1">1</a></li>
</ol>
However, those IDs are hardcoded and have no relation to the numbers used to display the items. Although, I suppose that's what you want. To make your updated changes:
<ol>
<li id="item0">This is the very first item</li>
<li id="item1">This is a numbered item</li>
<li id="item2">This is another numbered item</li>
<li id="item3">Same as <a href="#item1">2</a></li>
</ol>
The IDs stay with the item as intended. However, let's move on to the links to those list items. Note that in the first iteration we had:
<a href="#item1">1</a>
And with the update we had:
<a href="#item1">2</a>
The only difference being the link's label (changed from "1" to "2"). That is actually changing the document text through some sort of macro magic stuff. Not something HTML can do, at least not without JavaScript and/or CSS to help.
In other words, the text of every reference to the item would need to be manually updated throughout the document every time the list is updated. And that is for HTML. What about Markdown? As the rules state:
Markdown is not a replacement for HTML, or even close to it. Its syntax is very small, corresponding only to a very small subset of HTML tags.
Therefore in standard Markdown there is not even any way to assign IDs to the list items.
Seems to me you either need to use something other than lists or use something other than Markdown/HTML.
Maybe you could use headings (H1 to H6) instead; many Markdown renderers then generate an anchor for each heading that you can link to:
# H1
## H2
### H3
#### H4
##### H5
###### H6
Something like:
###### 1. This is a numbered item
###### 2. This is another numbered item
###### 3. Same as 1
Generates:
<h6 id="1-this-is-a-numbered-item">1. This is a numbered item</h6>
<h6 id="2-this-is-another-numbered-item">2. This is another numbered item</h6>
<h6 id="3-same-as-1">3. Same as 1</h6>
Pandoc allows you to use labels in example lists:
Numbered example lists
Extension: example_lists
The special list marker @ can be used for sequentially numbered examples. The first list item with a @ marker will be numbered '1', the next '2', and so on, throughout the document. The numbered examples need not occur in a single list; each new list using @ will take up where the last stopped. So, for example:
(@) My first example will be numbered (1).
(@) My second example will be numbered (2).
Explanation of examples.
(@) My third example will be numbered (3).
Numbered examples can be labeled and referred to elsewhere in the document:
(@good) This is a good example.
As (@good) illustrates, ...
The label can be any string of alphanumeric characters, underscores, or hyphens.
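To illustrate what this kind of label resolution involves, here is a rough Ruby sketch that numbers (@) and (@label) markers at the start of a line and then rewrites references to them. It is an illustration only, not how Pandoc is implemented; the method name number_examples is hypothetical, and the sketch assumes a label is defined once, at the start of a line:

```ruby
# Hypothetical helper: resolves (@) and (@label) markers to sequential numbers.
def number_examples(text)
  labels = {}
  counter = 0
  # First pass: number every marker that starts a line and record its label.
  numbered = text.gsub(/^\(@([\w-]*)\)/) do
    counter += 1
    label = Regexp.last_match(1)
    labels[label] = counter unless label.empty?
    "(#{counter})"
  end
  # Second pass: replace the remaining references; unknown labels are left as-is.
  numbered.gsub(/\(@([\w-]+)\)/) do |ref|
    label = Regexp.last_match(1)
    labels.key?(label) ? "(#{labels[label]})" : ref
  end
end

doc = "(@) First example.\n(@good) This is a good example.\n\nAs (@good) illustrates, ..."
puts number_examples(doc)
```

Because references are resolved in a second pass, inserting a new item above (@good) renumbers both the item and every reference to it.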
