Parsing text content in ColdFusion

I am attempting to parse text from a <cfoutput query="...">. I want to find the number of times each word appears in the text. For example:
"My name is Bob and I like to Bob".
should result in
Bob - 2
Name - 1
and so on.
The <cfoutput> content comes from a Twitter RSS feed. Here is my code:
<cfset feedurl="http://twitter.com/statuses/user_timeline/47847839.rss" />
<cftry>
    <cffeed source="#feedurl#" properties="feedmeta" query="feeditems" />
    <cfcatch>
        <!--- If the feed cannot be read, fall back to an empty query so the loop below still runs --->
        <cfset feeditems = queryNew("content,id") />
    </cfcatch>
</cftry>
<ol>
<cfoutput query="feeditems">
    <li>#content# #id#</li>
</cfoutput>
</ol>
I output a pretty great ordered list, but I can't figure out for the life of me how to parse the content and count how many times each word is used.
Thanks for any help you can provide; I am new to these forums!

You can find a solution here:
http://www.coldfusionjedi.com/index.cfm/2007/8/2/Counting-Word-Instances-in-a-String
Basically, split the string up using a regex and then loop over the results. There are some darn good comments on that post as well.
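A minimal sketch of the split-then-count idea, written in Ruby rather than ColdFusion; a ColdFusion version would follow the same pattern. It assumes words are compared case-insensitively and that any run of non-word characters separates words; `word_counts` is a hypothetical helper name, not something from the linked post.

```ruby
# Count how many times each word occurs in a string.
# Assumption: matching is case-insensitive and punctuation separates words.
def word_counts(text)
  counts = Hash.new(0)
  # Split on runs of non-word characters; reject drops the empty string
  # produced when the text starts with punctuation.
  text.downcase.split(/\W+/).reject(&:empty?).each { |word| counts[word] += 1 }
  counts
end

counts = word_counts("My name is Bob and I like to Bob")
# Print the most frequent words first, ties in alphabetical order
counts.sort_by { |word, n| [-n, word] }.each { |word, n| puts "#{word} - #{n}" }
```

For the feed, you would concatenate the `content` column of every row into one string (or call the helper per row and merge the hashes) before counting.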

Related

Web Scraping Google-Sheets ImportXML - xpath - specific Number in URL

I am trying to get a specific number from a URL that is hyperlinked on the website.
Please see here a copy of my spreadsheet.
In column I, I wrote a formula that goes directly to the eBay search results by appending the EAN number: ="https://www.ebay.de/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw="&""&D2
This is the outcome:
https://www.ebay.de/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=8713439712292
Up to here it works.
On that page, I want the eBay category ID for the article, which appears as a hyperlink in the Categories navigation on the left.
In the URL it is always the first number, e.g. 158817 in https://www.ebay.de/sch/158817/i.html?_from=R40&_nkw=650135421227
All I want is to put the number 158817 into my Google spreadsheet.
With this formula
=IMPORTXML(I2;"//*[@id='x-refine__group__0']/ul/li/ul/li/ul")
I only get the category name, but I need the number to make my CSV upload work.
What formula do I need? Can someone please guide me?
Thank you,
Lisa
With A1 = https://www.ebay.de/sch/158817/i.html?_from=R40&_nkw=650135421227, try this:
=regexextract(IMPORTXML(A1;"//*[@id='x-refine__group__0']/ul/li/ul/li/ul/li/a/@href");"[0-9]+")
assuming that the link is always at the same position in the category tree,
or, to get all the numbers:
=arrayformula(regexextract(IMPORTXML(A1;"//*[@id='x-refine__group__0']/ul/li//a/@href");"[0-9]+"))
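Why "[0-9]+" works here: REGEXEXTRACT returns the first run of digits, and in a category link the first digits are the ID right after /sch/. The Ruby sketch below mirrors that, plus a stricter variant that only accepts a number in the /sch/<id>/ path position. Both helper names are hypothetical, and the stricter variant assumes every category link has the same shape as the example URL.

```ruby
require "uri"

# Mirrors REGEXEXTRACT(url, "[0-9]+"): the first run of digits anywhere in the string.
def first_number(url)
  url[/[0-9]+/]
end

# Stricter variant (assumption: the category ID is the path segment right
# after /sch/, as in the example URL). Returns nil when there is none.
def category_id(url)
  URI.parse(url).path[%r{\A/sch/(\d+)/}, 1]
end

category_url = "https://www.ebay.de/sch/158817/i.html?_from=R40&_nkw=650135421227"
search_url   = "https://www.ebay.de/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=8713439712292"

puts first_number(category_url)       # 158817
puts category_id(category_url)        # 158817
puts category_id(search_url).inspect  # nil: no category segment in the path
puts first_number(search_url)         # 40, taken from _from=R40, not an ID
```

The last line is the caveat: the loose pattern is only safe when it is applied to a category link's href, which is what the XPath in the answer selects.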

EXCEL VLOOKUP/MATCH/VARIABLE

Good evening, and thanks in advance for taking the time to read and help.
I have a three-column Excel file, and I am trying to populate the third column with a value taken from the row where a match is found.
For example, I want to take the value Cheryl Rommelfanger from column MANAGERSFULLNAME and find the match in column FULLNAME. Once the match is found, I want to populate MANAGERSX2FULLNAME not with the value found in FULLNAME, but with the value next to it in column MANAGERSFULLNAME.
So for this example, we look in MANAGERSFULLNAME for Cheryl Rommelfanger, find the match Cheryl Rommelfanger in FULLNAME, and then populate MANAGERSX2FULLNAME with William Dearth.
FULLNAME              MANAGERSFULLNAME      MANAGERSX2FULLNAME
Dena Peters           Cheryl Rommelfanger
Kyle Marsh            Melissa Hall
Cheryl Rommelfanger   William Dearth
I've tried a few things, but I can only get a match position, not the value next to it.
=MATCH($E2&$F2,INDEX($B2:B4000&$C2:C4000,),)
=IF(ISERROR(MATCH(E2,F2,$B$2:B$4000,$C$2:C$4000,0)),"",E2)
=IF(ISERROR(MATCH(L2,$K$2:K$4000,0)),"",L20)
Any help would be greatly appreciated.
I apologize, I am having a bit of trouble understanding your columns, but the general idea is clear.
Your attempts are really close. You want INDEX(MATCH(...)) rather than MATCH(INDEX(...)). The link below describes how to do this.
Index match formula
If I'm understanding you correctly, it sounds like you're trying to find and list each person's boss's boss, to display a hierarchy of sorts. I'm using just columns A, B, and C (C being MANAGERSX2FULLNAME), and this formula in C2 should work fine:
=index(B$2:B$4000,match(B2,A$2:A$4000,0))
You will, of course, need to change the columns to fit your needs. Don't put a dollar sign in B2, because you want that reference to increment as you drag the formula down the column. The link below shows a screenshot from my test. In it we see that in row 2 John is Adam's boss, who in turn is Joe's boss. I think that's what you're shooting for here.
Screen shot
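To make the INDEX/MATCH logic concrete, here is a small Ruby sketch using the three sample rows from the question: for each person, look up their manager in the FULLNAME column and return that row's MANAGERSFULLNAME. It is an illustration of the lookup, not a replacement for the spreadsheet formula; `nil` stands in for Excel's #N/A when the manager is not listed as a FULLNAME.

```ruby
# Sample rows from the question: [FULLNAME, MANAGERSFULLNAME]
rows = [
  ["Dena Peters", "Cheryl Rommelfanger"],
  ["Kyle Marsh", "Melissa Hall"],
  ["Cheryl Rommelfanger", "William Dearth"],
]

# MATCH(..., 0) returns the first exact match, so keep only the first
# occurrence of each FULLNAME when building the lookup table.
manager_of = {}
rows.each { |name, manager| manager_of[name] = manager unless manager_of.key?(name) }

# INDEX(B, MATCH(B2, A, 0)): find each person's manager in FULLNAME and
# return that row's MANAGERSFULLNAME (nil plays the role of #N/A).
with_x2 = rows.map { |name, manager| [name, manager, manager_of[manager]] }

with_x2.each { |row| puts row.map(&:to_s).join(" | ") }
```

Only Dena Peters gets a value (William Dearth); the other two managers do not appear in FULLNAME, which is where the spreadsheet would show #N/A unless wrapped in IFERROR.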

importXML xpath to google sheets returns #N/A

Within the following span, I am looking to extract the number (416) 123-1234. The number appears both in the data-number attribute and as the link text at the end of the span.
<span class="_Xbe _ZWk kno-fv"> <!-- the unique class is _Xbe _ZWk kno-fv -->
<a class="fl r-idASUKPhOV34" href="#" data-number="+14161231234" data-pstn-
-call-url="" title="Call via Hangouts" jsaction="r.oVdbr2mIpA8"
data-rtid="idASUKPhOV34" jsl="$t t-6xg4lalHw8M;$x 0;" data-ved="0ahUKEwiDtKTG-snZAhUDzIMKHcntCfAQkAgImAEoADAU">(416) 123-1234</a></span>
The problem is the XPath: I am not specifying the right one. I tried copying the XPath from the page source, and it returned #N/A. My best guess is the following IMPORTXML formula, though it still returns #N/A:
=IMPORTXML("https://www.google.com/search?q="&A10,"//span[@class='_Xbe _ZWk kno-fv']/a/@href")
How can I write an XPath expression to extract the number in either form?
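A hedged sketch of the two XPath forms, tested with Ruby's REXML against a simplified, well-formed copy of the snippet above (some attributes dropped). Note that `@href` on that link is just "#", so the attribute to select is `@data-number`; the visible text comes from `text()`. The same XPath strings could in principle go into IMPORTXML, but whether Google actually serves this markup to Sheets is an assumption this sketch does not check.

```ruby
require "rexml/document"

# Simplified, well-formed copy of the snippet from the question
# (assumption: the live page serves this structure).
html = <<~XML
  <span class="_Xbe _ZWk kno-fv"><a class="fl r-idASUKPhOV34" href="#" data-number="+14161231234" title="Call via Hangouts">(416) 123-1234</a></span>
XML

doc  = REXML::Document.new(html)
link = "//span[@class='_Xbe _ZWk kno-fv']/a"

# Attributes are selected with @, so @data-number (not @href, which is "#")
raw_number = REXML::XPath.first(doc, "#{link}/@data-number").value

# text() selects the visible link text
shown_number = REXML::XPath.first(doc, "#{link}/text()").value

puts raw_number
puts shown_number
```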

Limit the importxml to a defined span

Currently I am using TRANSPOSE and then another column to count the results and give me what I want. But because Tanaike is awesome and helped me with another section, I am trying to wrap my head around what he did and apply it here.
Starting with this URL in A1,
https://www.zillow.com/homedetails/307-N-Rosedale-Ave-Tulsa-OK-74127/22151896_zpid/
This is the formula in A2:
=If($A$1:A="","",Transpose(importxml($A1:$A,"//span[@class='snl phone']")))
Depending on the listing, there are sometimes three phone numbers, sometimes four, and sometimes eight, spread across as many columns as needed.
I am looking for the Property Owner phone number. This is the element from the inspector:
<div class="info flat-star-ratings sig-col" id="yui_3_18_1_2_1506365934526_2361"> <span class="snl name notranslate">Property Owner</span> <span class="snl phone" id="yui_3_18_1_2_1506365934526_2360">(918) 740-1698 </span> </div>
So I tried this, but it says the content is empty. My idea was to select the div with class "info flat-star-ratings sig-col", then the "snl phone" span inside it, and stop before the closing </span>.
=importXML(B17,"//div[@class='info flat-star-ratings sig-col']//span[@class='snl phone']/@span")
What I really need is ONLY the property owner phone number with 95% or greater accuracy.
How about this modification of the XPath query?
Modified XPath query:
=importxml(A1,"//div[@class='info flat-star-ratings sig-col']//span[@class='snl phone']")
If this is not the data you want, I'm sorry.
Edit:
The 4th and 8th numbers are the same. Is my understanding correct? If so, please put the URL in "A1" and the following formula in "A2".
=QUERY(ARRAYFORMULA(IF(IMPORTXML(A1,"//div[@class='info flat-star-ratings sig-col']//span[@class='snl name notranslate']")="Property Owner",IMPORTXML(A1,"//div[@class='info flat-star-ratings sig-col']//span[@class='snl phone']"), "")),"Select * where Col1<>''")
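The QUERY/IF formula works by pulling the names and the phones as two parallel lists and keeping the phones whose name is "Property Owner". The Ruby sketch below reproduces that with REXML on markup modelled on the element in the question, and also shows a single-XPath alternative using the `following-sibling` axis, which is not from the answer. Assumptions: the first div is a hypothetical extra entry added only to show the filtering, and the positional pairing only holds if each div has exactly one name span and one phone span.

```ruby
require "rexml/document"

# Markup modelled on the element in the question; the "Agent" div is hypothetical.
html = <<~XML
  <root>
    <div class="info flat-star-ratings sig-col"><span class="snl name notranslate">Agent</span><span class="snl phone">(555) 555-0100 </span></div>
    <div class="info flat-star-ratings sig-col"><span class="snl name notranslate">Property Owner</span><span class="snl phone">(918) 740-1698 </span></div>
  </root>
XML

doc = REXML::Document.new(html)
div = "//div[@class='info flat-star-ratings sig-col']"

# Approach 1, like the QUERY/IF formula: two parallel lists paired by position
names  = REXML::XPath.match(doc, "#{div}/span[@class='snl name notranslate']").map(&:text)
phones = REXML::XPath.match(doc, "#{div}/span[@class='snl phone']").map { |e| e.text.strip }
owner_phones = names.zip(phones).select { |name, _| name == "Property Owner" }.map(&:last)

# Approach 2: select the phone span that follows the "Property Owner" name span
owner_xpath = "#{div}/span[@class='snl name notranslate'][text()='Property Owner']" \
              "/following-sibling::span[@class='snl phone']"
owner_phone = REXML::XPath.first(doc, owner_xpath).text.strip

puts owner_phones.inspect
puts owner_phone
```

In Sheets, the second XPath string could be tried directly inside IMPORTXML; whether Zillow serves this markup to Sheets is not something the sketch verifies.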

Ruby pluck second number from this scraped HTML (wombat)

Here's a section of HTML I'm trying to pull some info from:
<div class="pagination">
<p>
<span>Showing</span>
1-30
of 3744
<span>results</span>
</p>
</div>
I just want to store 3744 from the bit I pull (everything inside the <p>), but I'm having a hard time, since "of 3744" isn't wrapped in any element with a CSS class and I don't understand XPath at all :)
<span>Showing</span>1-30\nof 3744<span>results</span>
How would you parse the above string to only retrieve the total number of results?
As long as it always looks the same, you could also use #scan to get just the last number.
str = '<div class="pagination">
<p>
<span>Showing</span>
1-30
of 3744
<span>results</span>
</p>
</div>'
str.scan(/\d+/).pop.to_i
#=> 3744
Update: explanation of how it works
scan returns an Array of all the numbers, e.g. ["1","30","3744"]; pop then removes the last element, "3744", and to_i converts it to the integer 3744.
Please note that if the number you want is not the last element in the Array, this will not work as you want, e.g.
str = '<div class="pagination">
<p>
<span>Showing</span>
1-30
of 3744
<span>results 14</span>
</p>
</div>'
str.scan(/\d+/).pop.to_i
#=> 14
As you can see, since I added the number 14 to the results span, 14 is now the last number in the Array and your result is off. So you could modify it to something like this:
str.gsub(/\s+/,'').scan(/\d+-\d+of(\d+)/).flatten.pop.to_i
#=> 3744
This removes all whitespace with gsub, then looks for a pattern along the lines of #{1,}-#{1,}of#{1,} (digits, a hyphen, digits, "of", digits) and captures the final number #=> [["3744"]], then flattens the Array #=> ["3744"], then pops and converts to an Integer. This seems like a better solution, as it makes sure to match the "of ####" section every time.
Use a regexp; see this example on Rubular:
<span>\w+<\/span>\d\-\d+\\[a-z]+\s(\d+)<span>\w+<\/span>
Match groups:
3744
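The Rubular pattern above matches the string exactly as printed in the question, where \n is a literal backslash followed by n. If the scraped text contains a real newline instead, the \\ in the pattern has nothing to match. A quick Ruby check; the looser `tolerant` pattern is an alternative added here for comparison (an assumption, not from the answer):

```ruby
# The string as printed in the question: '\n' is a backslash followed by n
literal = '<span>Showing</span>1-30\nof 3744<span>results</span>'
# The same text with a real newline character
real = "<span>Showing</span>1-30\nof 3744<span>results</span>"

rubular  = /<span>\w+<\/span>\d\-\d+\\[a-z]+\s(\d+)<span>\w+<\/span>/
# Looser alternative: digits-digits, optional whitespace, "of", whitespace, digits
tolerant = /\d+-\d+\s*of\s+(\d+)/

p literal[rubular, 1]   # "3744"
p real[rubular, 1]      # nil: there is no backslash to match
p real[tolerant, 1]     # "3744"
```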
