iMacros - How to Scrape Results from Bing Related searches as TXT for each value? - firefox-addon

I need to scrape or grab Related searches when searching with any keyword.
However, in a red box I mark. iMacros can be scrape with EXTRACT:HTM.
So, I must manually edit to extract them as TXT line by line for each time.
Is there any solution to separate the result to TXT for each before saving to CSV?
This is my code:
TAG POS=1 TYPE=DIV ATTR=CLASS:b_rich EXTRACT=HTM
Thanks

Related

Google ImportXML from QGIS metadata file

I am trying to capture elements of an qmd file (that is xml markup) using Google Sheets importxml. Based on How to use importXML function with a file from Google Drive? I think I've got the file importing correctly but can't seem to capture any of the tags.
Here's what I am trying -
=importXML("https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download","\\identifier")
Here's what the qmd/xml file looks like
<!DOCTYPE qgis PUBLIC 'http://mrcc.com/qgis.dtd' 'SYSTEM'>
<qgis version="3.9.0-Master">
<identifier>Z:/My Drive/Mangoesmapping/Spatial Projects/2019/DSC/132_Ongoing_Asset_Updates/Working/Sewerage_Updates/Sewerage_Manholes_InspectionShafts.TAB</identifier>
<parentidentifier>Sewerage Manhole Infrastructure</parentidentifier>
<language>AUS</language>
<type>dataset</type>
<title>Sewerage Manholes within Douglas Shire Council</title>
<abstract>Sewerage Manholes within Douglas Shire Council. Most data has been updated based on field work, review of existing AsCon files and discussion with council staff responsible for the assets in 2018/2019. In Port Douglas most of the infrastructure has been surveyed in. </abstract>
<keywords vocabulary="gmd:topicCategory">
<keyword>Infrastructure</keyword>
<keyword>Sewerage</keyword>
If I use
=importXML("https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download","*")
I get
But I really would like to just get the elements I want by placing the importxml for each tag in the cell I need it in.
You want to retrieve ### of <identifier>###</identifier> from https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download
I could understand like above. If my understanding is correct, how about this answer?
Issue:
In your question, the formula of =importXML("https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download","\\identifier") uses \\identifier as the xpath. From your data you want to retrieve the values, it seems that you are trying to retrieve ### of <identifier>###</identifier>.
In this case, in order to Selects nodes in the document from the current node that match the selection no matter where they are, // is required to be used instead of \\. This can be seen at the document of here.
Modified formula:
So =importXML("https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download","\\identifier") can be modified as follows.
=importXML("https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download","//identifier")
As other xpath, from your data in your question, you can also use the xpath of /qgis/identifier instead of //identifier. So you can also use the following formula.
=importXML("https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download","/qgis/identifier")
References:
IMPORTXML
XPath Tutorial

Sensenet : Search documents by its content

I can see the document list can be searched by matching a value of document properties.
But I need to search document list which has a specific word in their content.
How can I achieve this by oData.svc query?
Is it possible using Lucene Index concept?
If yes then how?
Updated
I am working with Sense/Net 6.3.0.6337 Community Version.
I have tried to achieve it by following query
https://example.com/OData.svc/workspaces/Document/abcd_gmail_com/Document_Library/?metadata=no&$select=*&query=Taruna
It is working for .docx and .txt files only but doesn't work for .xml and .pdf file.
Is it sensenet version issue?
Thanks
You can add a query string argument called "query" to each and every GET request, e.g. http://www.example.com/OData.svc/?$select=Name,Index,Icon&query=about
returns content that contain "about" from the whole requested site
You can find more examples here in the Custom Query Options section

How to extract all links from page using iMacros & save them to file

I want to extract all urls from a single page using iMacros, then save it to CSV. I got nothing so far :)
TAG POS={{!LOOP}} TYPE=A ATTR=HREF:* EXTRACT=HREF
SAVEAS TYPE=EXTRACT FOLDER=* FILE=urls.csv
Play the above macro in loop mode with the ‘Max:’ value > than the number of links on your page. (E.g. set it to 99999 and manually stop the macro when tags aren’t found any more).

Google XPATH importxml can find "show" but not "showcount" or "count" [duplicate]

This question already has answers here:
Scraping data to Google Sheets from a website that uses JavaScript
(2 answers)
Closed last month.
Using this webpage as an example http://forums.macrumors.com/showthread.php?t=1688317
On a google spreadsheet, the following DO NOT work with importxml():
//a[contains(#href,"showpost")]/#href
//a[contains(#href,"showcount")]/#href
//*[#id="postcount18545482"]
The last one (//*[#id="postcount18545482"]) was copied directly from Chrome's element viewer.
The following DO work but exclude any results with the word "showcount", "postcount", or "showpost":
//div[contains(#id,"post_message")]/#id
//a[contains(#href,"show")]/#href
//a[contains(#href,"post")]/#href
Is there something special about the word "count" when working with importxml() or XPATH? How can I get the missing entries?
ImportXML function in Google Docs spreadsheet can not process data that is created in a two-step process. For example, when an authentication token must be retrieved first before making the url request, or when the URL tells the server to dynamically create an xml output after which the user is redirected to the output, even when the URL stays the same. You might want to look into Google Apps Scripts (http://code.google.com/googleapps/appsscript/index.html) to handle this case.
Taken from here
In your particular case the anchor parameters get set in the vbulletin_post_loader.js script called after the page container is loaded.
...
pc_obj=fetch_object("postcount"+this.postid);
openWindow("showpost.php?"+(SESSIONURL?"s="+SESSIONURL:"")
+(pc_obj!=null?"&postcount="+PHP.urlencode(pc_obj.name):"")+"&p="+A)
...
In other words, when importXML() scans the page, the nodes containing 'showpost' or 'postcount' in href are not yet on the page:
Looks like importXML() works with static pages only and not able to handle dynamically loaded content.
Try to find another way of obtaining the number of post in a thread.

Yahoo Pipes RSS: Copy link to looped results from Twitter search

I am using Yahoo pipes to make automated Twitter Searches using terms from the description fields of an RSS feed.
Pipes makes one search from each item in the feed. Each search returns a set of results which are assigned as item.twitloop (all results)
I would like to replace the link from each item in the results with the link from the original query item;
So far I am only able to assign the original link to the first item in the results list rather than to each item.
http://pipes.yahoo.com/pipes/pipe.edit?_id=01f5f60eb8f3c22b45aa3708e5ae057a
Can anyone see where I'm going wrong?
The pipe isn't loading for me - perhaps you didn't set it as public? In any event, I have solved similar problems in the past by using the Loop module. You put the assignment into the loop (usually a string builder works well), and then have the Loop put that original link into item.link.

Resources