Retrieve contents of a tag using XPath in Google Sheets, but *not* the contents of nested/child tags - google-sheets

I'm trying to retrieve one portion of a <h1> tag using the Google Sheets IMPORTXML function. The function below gets everything within the <h1>, but I only want the text starting at Lenovo..... and ending at Silver. I do not want the Item #... that is in the nested <span>.
Is there a way to get the <h1>, but ignore the nested span?
Working Function
=IMPORTXML("https://www.officedepot.com/a/products/"&A2,"//div[@id='skuHeading']/h1[@class='semi_bold fn']")
Structure
<div id="skuHeading" data-auid="productDetail_text_skuDescription">
<h1 itemprop="name" class="semi_bold fn">
Office Depot® Brand White Copy Paper, Letter Paper Size, 20 Lb, 500 Sheets Per Ream, Case Of 10 Reams
<small class="item_sku" data-auid="productDetail_text_sku">
<span itemprop="sku">
Item # 273646
</span>
</small>
</h1>
</div>
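In XPath, appending /text() to the path selects only the text nodes that are direct children of the <h1>, which leaves out anything inside the nested <small>/<span>. A candidate formula would therefore be the working one with /text() added at the end; this has not been checked against the live page. The Python sketch below illustrates the same idea offline with the standard library, using the structure above as a hard-coded string. The helper name own_text is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The structure from the question, hard-coded so the sketch runs offline.
SNIPPET = """
<div id="skuHeading" data-auid="productDetail_text_skuDescription">
<h1 itemprop="name" class="semi_bold fn">
Office Depot® Brand White Copy Paper, Letter Paper Size, 20 Lb, 500 Sheets Per Ream, Case Of 10 Reams
<small class="item_sku" data-auid="productDetail_text_sku">
<span itemprop="sku">
Item # 273646
</span>
</small>
</h1>
</div>
"""


def own_text(element):
    """Return only the element's direct text, skipping text inside child tags."""
    parts = [element.text or ""]
    for child in element:
        # child.tail is the text that follows the child but still belongs to the parent.
        parts.append(child.tail or "")
    # Collapse newlines and repeated spaces into single spaces.
    return " ".join("".join(parts).split())


root = ET.fromstring(SNIPPET.strip())
h1 = root.find("./h1[@class='semi_bold fn']")
title = own_text(h1)
print(title)
```

The nested "Item # 273646" text never appears in title because it lives in the <span>, not directly in the <h1>.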

Related

importXML xpath to google sheets returns #N/A

Under the following span class I am looking to extract the number (416) 123-1234. This number is written after data-number or at the end of the span.
<span class="_Xbe _ZWk kno-fv"> ## The Unique ID is _Xbe _ZWk kno-fv
<a class="fl r-idASUKPhOV34" href="#" data-number="+14161231234" data-pstn-
-call-url="" title="Call via Hangouts" jsaction="r.oVdbr2mIpA8"
data-rtid="idASUKPhOV34" jsl="$t t-6xg4lalHw8M;$x 0;" data-ved="0ahUKEwiDtKTG-snZAhUDzIMKHcntCfAQkAgImAEoADAU">(416) 123-1234</a></span>
The problem is the XPath: I am not specifying the right one. I tried copying the XPath from the source code, and it returned #N/A. My best guess is the following XPath with IMPORTXML, though it still returns #N/A.
=IMPORTXML("https://www.google.com/search?q="&A10,"//span[@class='_Xbe _ZWk kno-fv']/a/@href")
How can I write XPATH to extract the number in either form?
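One thing visible from the markup alone: the href attribute of that <a> holds "#", so a path ending in the href attribute cannot yield the number. The number sits in the data-number attribute and in the link text. The Python sketch below shows both lookups on a simplified copy of the snippet (most attributes dropped for brevity, which is an assumption of this example). It does not address whether the live search page serves the same markup to IMPORTXML, which may be a separate cause of #N/A.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the markup in the question (most attributes dropped).
SNIPPET = (
    '<span class="_Xbe _ZWk kno-fv">'
    '<a href="#" data-number="+14161231234" title="Call via Hangouts">'
    "(416) 123-1234</a></span>"
)

span = ET.fromstring(SNIPPET)
link = span.find("./a")

# Attribute form; the XPath counterpart would end in /a/@data-number
raw_number = link.get("data-number")

# Displayed form; the XPath counterpart would end in /a or /a/text()
shown_number = link.text.strip()

print(raw_number, shown_number)
```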

Tablesorter: filtering by multiple, but not all, columns

I'm using tablesorter with an external filter (ctrl-F 'filter_external' on the linked page). From the docs:
These external inputs have one requirement, they must have a
data-column="#", where the # targets the column (zero-based index),
pointing to a specific column to search.
<input class="search" type="search" data-column="0" placeholder="Search first column">
If you want to search all columns, using the updated "any match"
method, set the data column value to "all":
<input class="search" type="search" data-column="all" placeholder="Search entire table">
What I'd like is to apply my external filter to a collection of columns (more than one, less than all). Ideally, the html could look something like this:
<input class="search" type="search" data-column="0,1" placeholder="Search first two columns">
or this:
<input class="search" type="search" data-column="0" data-column="1" placeholder="Search first two columns">
(Is the second one even valid html?)
I've been up and down the tablesorter docs and I haven't had any luck applying the sort that I want. One workaround that I attempted was to present a single input to the user and have it write to hidden inputs which were bound to their respective columns:
<input class="search" type="search" placeholder="Search first two columns">
// javascript populates the hidden inputs as the user types in the visible one
<input class="search" type="search" data-column="0" style="display: none;">
<input class="search" type="search" data-column="1" style="display: none;">
This 'works', except that it now 'AND's the filtering from each individual filter, so that BOTH columns have to match the search term for the row to remain visible rather than EITHER column matching the search term for the row to remain visible. The data-column="all" option 'OR's the searches - this is what I want.
That's a great suggestion!
I just added the ability to include multiple columns in an external search input. This change is currently only available within the working branch of the repository.
With this change, you can include a range, or multiple columns separated by commas (demo):
<input type="search" class="search" data-column="0-1,3,5-7,9">
Note:
This input behaves the same as an input with data-column="all" in that "range", "notMatch" and "operators" searches are ignored.
If multiple inputs exist, the last search will override all previous searches.
All spaces in the data-column attribute are ignored.
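To make the semantics of a value like "0-1,3,5-7,9" concrete, here is a small Python sketch of how such a spec could be expanded into column indexes and then used for an OR-style match across those columns. It is only an illustration of the behaviour described above, not tablesorter's actual implementation; parse_columns and row_matches are hypothetical names.

```python
from typing import List, Sequence


def parse_columns(spec: str) -> List[int]:
    """Expand a spec such as '0-1,3,5-7,9' into a sorted list of column indexes."""
    columns = set()
    # Spaces are ignored, mirroring the note about the data-column attribute.
    for part in spec.replace(" ", "").split(","):
        if not part:
            continue
        if "-" in part:
            start, end = (int(n) for n in part.split("-", 1))
            low, high = min(start, end), max(start, end)
            columns.update(range(low, high + 1))
        else:
            columns.add(int(part))
    return sorted(columns)


def row_matches(row: Sequence[str], spec: str, term: str) -> bool:
    """True if the term appears in ANY of the selected columns (OR, not AND)."""
    needle = term.lower()
    return any(
        needle in str(row[index]).lower()
        for index in parse_columns(spec)
        if index < len(row)
    )


print(parse_columns("0-1, 3, 5-7, 9"))
print(row_matches(["apple", "pear", "kiwi"], "0,1", "pear"))
```

With the spec "0,1", a row stays visible when either of the first two columns contains the term, which is the OR behaviour asked for in the question.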

Ruby pluck second number from this scraped HTML (wombat)

Here's a section of HTML I'm trying to pull some info from:
<div class="pagination">
<p>
<span>Showing</span>
1-30
of 3744
<span>results</span>
</p>
</div>
I just want to store 3744 from the bit I pull (everything inside the <p>), but I'm having a hard time since the of 3744 doesn't have any CSS styling and I don't understand XPaths at all :)
<span>Showing</span>1-30\nof 3744<span>results</span>
How would you parse the above string to only retrieve the total number of results?
As long as it always looks the same you could also use #scan to get just the last number.
str = '<div class="pagination">
<p>
<span>Showing</span>
1-30
of 3744
<span>results</span>
</p>
</div>'
str.scan(/\d+/).pop.to_i
#=> 3744
Update: explanation of how it works
The scan will pull an Array of all the numbers e.g. ["1","30","3744"] then it will pop the last element from the Array "3744" and then convert that to an integer 3744.
Please note that if the number you want is not the last element in the Array then this will not work as you want e.g.
str = '<div class="pagination">
<p>
<span>Showing</span>
1-30
of 3744
<span>results 14</span>
</p>
</div>'
str.scan(/\d+/).pop.to_i
#=> 14
As you can see since I added the number 14 to the results span this is now the last number in the Array and your results are off. So you could modify it to something like this:
str.gsub(/\s+/,'').scan(/\d+-\d+of(\d+)/).flatten.pop.to_i
#=> 3744
What this will do is remove all whitespace with gsub, then look for a pattern along the lines of #{1,}-#{1,}of#{1,} and capture the last group #=> [["3744"]], then flatten the Array #=> ["3744"], then pop and convert to Integer. This seems like a better solution, as it will make sure to match the "of ####" section every time.
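For readers working outside Ruby, the same strip-whitespace-then-capture idea can be sketched in Python. The function name total_results is made up for this example, and returning None when nothing matches is a choice of this sketch rather than something from the answer above.

```python
import re


def total_results(html):
    """Return the number after 'of' in an 'X-Y of N' fragment, or None if absent."""
    compact = re.sub(r"\s+", "", html)            # drop all whitespace first
    match = re.search(r"\d+-\d+of(\d+)", compact)  # capture only the total
    return int(match.group(1)) if match else None


SAMPLE = """<div class="pagination">
<p>
<span>Showing</span>
1-30
of 3744
<span>results 14</span>
</p>
</div>"""

total = total_results(SAMPLE)
print(total)
```

The extra 14 in the last span does not affect the result, because the pattern is anchored on the "X-Y of N" shape rather than on the position of the number.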
Use a regexp; see this example on Rubular:
<span>\w+<\/span>\d\-\d+\\[a-z]+\s(\d+)<span>\w+<\/span>
Match groups:
3744

Parsing text content in ColdFusion

I am attempting to parse text from a <cfoutput query="...">. I am interested in finding the number of times every word in the text is displayed. For example:
"My name is Bob and I like to Bob".
should result in
Bob - 2
Name - 1
etc, etc, etc.
I take my <cfoutput> from a twitter RSS feed. Here is my code:
<blink>
<cfset feedurl="http://twitter.com/statuses/user_timeline/47847839.rss" />
<cftry>
<cffeed source="#feedurl#" properties="feedmeta" query="feeditems" />
<cfcatch></cfcatch>
</cftry>
<ol>
<cfoutput query="feeditems">
#content# #id# <br><br>
</cfoutput>
</ol>
</blink>
I output a pretty great ordered list, but I can't figure out for the life of me how to parse the content and list how many times each word is used.
Thanks for any help you can provide, I am new to these forums!
You can find a solution here:
http://www.coldfusionjedi.com/index.cfm/2007/8/2/Counting-Word-Instances-in-a-String
Basically, split the string up using regex and then loop over the results. There are some darn good comments here as well.
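The split-and-loop approach is language independent. As a rough illustration (in Python rather than ColdFusion, and not taken from the linked post), the sketch below splits a string into words with a regex and tallies them. Lowercasing the words and the exact word pattern are assumptions of this example.

```python
import re
from collections import Counter


def word_counts(text):
    """Count how often each word occurs, ignoring case and punctuation."""
    # Treat runs of letters, digits and apostrophes as words.
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    return Counter(words)


counts = word_counts("My name is Bob and I like to Bob")
for word, count in counts.most_common():
    print(word, "-", count)
```

In ColdFusion the same shape would apply: split each feed item's content into words, then increment a per-word counter in a struct while looping over them.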

Markdown: How to reference an item in a numbered list, by number (like LaTeX's \ref / \label)?

Is there any way in markdown to do the equivalent of the cross-referencing in this LaTeX snippet? (Taken from here.)
\begin{enumerate}
\item \label{itm:first} This is a numbered item
\item Another numbered item \label{itm:second}
\item \label{itm:third} Same as \ref{itm:first}
\end{enumerate}
Cross-referencing items \ref{itm:second} and \ref{itm:third}.
This LaTeX produces
1. This is a numbered item
2. This is another numbered item
3. Same as 1
Cross-referencing items 2 and 3.
That is, I would like to be able to refer to items in a markdown list without explicitly numbering them, so that I could change the above list to the following without having to manually update the cross references:
1. This is the very first item
2. This is a numbered item
3. This is another numbered item
4. Same as 2
Cross-referencing items 3 and 4.
HTML can't even do that and Markdown is a subset of HTML, so the answer is no.
For example, your list would be represented like so (when rendered by Markdown):
<ol>
<li>This is a numbered item</li>
<li>This is another numbered item</li>
<li>Same as 1</li>
</ol>
Notice that there is no indication of which item is which as far as the numbering goes. That is all inferred at render time by the browser. However, the number values are not stored within the document and are not referenceable or linkable. They are for display only and serve no other purpose.
Now you could write some custom HTML to uniquely identify each list item and make them referenceable:
<ol>
<li id="item1">This is a numbered item</li>
<li id="item2">This is another numbered item</li>
<li id="item3">Same as <a href="#item1">1</a></li>
</ol>
However, those IDs are hardcoded and have no relation to the numbers used to display the items. Although, I suppose that's what you want. To make your updated changes:
<ol>
<li id="item0">This is the very first item</li>
<li id="item1">This is a numbered item</li>
<li id="item2">This is another numbered item</li>
<li id="item3">Same as <a href="#item1">2</a></li>
</ol>
The IDs stay with the item as intended. However, let's move on to the links to those list items. Note that in the first iteration we had:
<a href="#item1">1</a>
And with the update we had:
<a href="#item1">2</a>
The only difference being the link's label (changed from "1" to "2"). That is actually changing the document text through some sort of macro magic stuff. Not something HTML can do, at least not without JavaScript and/or CSS to help.
In other words, the text of every reference to the item would need to be manually updated throughout the document every time the list is updated. And that is for HTML. What about Markdown? As the rules state:
Markdown is not a replacement for HTML, or even close to it. Its syntax is very small, corresponding only to a very small subset of HTML tags.
Therefore in standard Markdown there is not even any way to assign IDs to the list items.
Seems to me you either need to use something other than lists or use something other than Markdown/HTML.
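One form that "something other than Markdown" could take is a small preprocessing step that runs before the Markdown renderer and rewrites labels into numbers. The Python sketch below is only an illustration of that idea: the {label:name} and {ref:name} markers are invented for this example and are not part of Markdown or of any tool mentioned here, and the code assumes a single ordered list per document.

```python
import re

# Hypothetical marker syntax, invented for this sketch.
LABEL = re.compile(r"\{label:([\w-]+)\}\s*")
REF = re.compile(r"\{ref:([\w-]+)\}")
ITEM = re.compile(r"^\s*\d+\.\s")


def resolve_refs(markdown):
    """Replace {ref:name} with the list number of the item carrying {label:name}."""
    lines = markdown.splitlines()
    numbers = {}
    counter = 0
    # Pass 1: number the list items in order and remember where each label sits.
    for line in lines:
        if ITEM.match(line):
            counter += 1
            for name in LABEL.findall(line):
                numbers[name] = counter
    # Pass 2: strip the labels and substitute the references.
    resolved = []
    for line in lines:
        line = LABEL.sub("", line)
        line = REF.sub(lambda m: str(numbers.get(m.group(1), "??")), line)
        resolved.append(line.rstrip())
    return "\n".join(resolved)


SOURCE = """1. This is the very first item
1. {label:first} This is a numbered item
1. Another numbered item {label:second}
1. {label:third} Same as {ref:first}

Cross-referencing items {ref:second} and {ref:third}."""

result = resolve_refs(SOURCE)
print(result)
```

Because the numbers are computed from item order, inserting a new first item shifts every reference automatically, which is the behaviour the LaTeX \label / \ref pair provides.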
Maybe you need to use the H1.. H6 and then Markdown generates an anchor that you can link to:
# H1
## H2
### H3
#### H4
##### H5
###### H6
Something like:
###### 1. This is a numbered item
###### 2. This is another numbered item
###### 3. Same as 1
Generates:
<h6 id="1-this-is-a-numbered-item">1. This is a numbered item</h6>
<h6 id="2-this-is-another-numbered-item">2. This is another numbered item</h6>
<h6 id="3-same-as-1">3. Same as 1</h6>
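How the id is derived from the heading text depends on the Markdown implementation, so the exact rules vary. As a rough sketch that reproduces the three ids shown above (and is not claimed to match any particular renderer), a slug function in Python could look like this; heading_id is a made-up name.

```python
import re


def heading_id(text):
    """Build an anchor id: lowercase, drop punctuation, join words with hyphens."""
    slug = text.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)  # remove punctuation such as the dot after the number
    slug = re.sub(r"\s+", "-", slug)      # whitespace runs become single hyphens
    return slug


for heading in ("1. This is a numbered item", "3. Same as 1"):
    print(heading_id(heading))
```

Note that these ids embed the item number, so renumbering the list changes the anchors and any links pointing at them still have to be updated by hand.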
Pandoc allows you to use labels in example lists:
Numbered example lists
Extension: example_lists
The special list marker @ can be used for sequentially numbered examples. The first list item with a @ marker will be numbered '1', the next '2', and so on,
throughout the document. The numbered examples need not occur in a single list; each new list using @ will take up where the last stopped. So, for example:
(@) My first example will be numbered (1).
(@) My second example will be numbered (2).
Explanation of examples.
(@) My third example will be numbered (3).
Numbered examples can be labeled and referred to elsewhere in the document:
(@good) This is a good example.
As (@good) illustrates, ...
The label can be any string of alphanumeric characters, underscores, or hyphens.

Resources