Extracting data using Mechanize gem. Parsing data with CSS header

I am trying to extract data from a website (http://oregonpinotnoirwine.com/) using Mechanize.
I can open the site and fill in the search field, but I cannot get the resulting data. I am doing this in IRB.
require 'mechanize'
agent = Mechanize.new
agent.get("http://oregonpinotnoirwine.com/search.php")
form = agent.page.forms[0]
form["wineava"] = "Dundee Hills"
form.submit
Next I want to extract the full list of wines shown on the site. To do that, I looked at the site's CSS and decided to search for the .a class, like this:
agent.page.search(".a")
That returned nothing. But when I type
agent.page.search(".")
it returns all the data from the website. Now I am just trying different variations. When I type
agent.page.search("#wineava")
it returns the dropdown options from the site, but not the wine list.

It turns out I was overthinking this.
All the data I needed was already in the dropdown menu. So after loading the page with
agent.get("http://oregonpinotnoirwine.com/search.php")
I was able to get the data I needed with
agent.page.search("#winemaker")
But this approach would not work if the items were not listed in the dropdown menu, would it?
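A minimal sketch of pulling the individual option labels out of such a dropdown. It only works because the names are already present in the HTML of the fetched page; anything that appears only after the form is submitted would have to be read from the page returned by the submit. To stay dependency-free, the sketch uses Ruby's REXML and an XPath query on a made-up, well-formed fragment rather than Mechanize itself. SAMPLE_PAGE, its option values, and the helper name dropdown_options are assumptions for illustration, not the real site's markup.

```ruby
require 'rexml/document'

# Hypothetical, simplified stand-in for the search page's <select id="winemaker">.
SAMPLE_PAGE = <<~HTML
  <form>
    <select id="winemaker" name="winemaker">
      <option value="">Any winemaker</option>
      <option value="1">Example Winery A</option>
      <option value="2">Example Winery B</option>
    </select>
  </form>
HTML

# Return the visible text of every <option> with a non-empty value
# inside the <select> that has the given id.
def dropdown_options(html, select_id)
  doc = REXML::Document.new(html)
  REXML::XPath.match(doc, "//select[@id='#{select_id}']/option")
              .reject { |opt| opt.attributes["value"].to_s.empty? }
              .map { |opt| opt.text.to_s.strip }
end

puts dropdown_options(SAMPLE_PAGE, "winemaker")

# With Mechanize, whose pages are parsed by Nokogiri, a comparable call would be:
#   agent.page.search("#winemaker option").map(&:text)
```

An id that is not on the page simply yields an empty array, which is a quick way to check whether the data is in the fetched HTML at all.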

Related

Migrate autocomplete field to dynamic dropdown

I'm using Orbeon Forms version 2019.2 CE.
I want to migrate an autocomplete field to a dynamic dropdown. As the resource (Resource URL) I'm using the address: <my_service_address>/name={$fr-search-value}&param2={../c_field_name}
c_field_name is another dynamic dropdown.
With the autocomplete field everything is fine (the service responds correctly), but when I switch to a dynamic dropdown the response is empty. It looks like {../c_field_name} evaluates to an empty value.
When I pass a test value to my service instead of {../c_field_name}, for example:
<my_service_address>/name={$fr-search-value}&param2=1
everything works fine, so my service itself is working.
I tried using $c_field_name instead of {../c_field_name}, but that did not work either.
Do you have any suggestions?

Instead of ../c_field_name, use xxf:instance('fr-form-instance')//c_field_name.
Also see the form attached to this message, which uses this technique to create chained dropdowns, where each dropdown passes to the service the value selected by the user in the previous dropdown. And ideally, you should be able to just write $c_field_name, which is covered by request for enhancement #309.

Change structure of automatically generated URLs in PrestaShop

A client of mine has a website that someone built in PrestaShop. It has a search input, and a search displays a list of matching products, each linking to its page with a URL like this:
www.website.com/category/full-product-name.html?search_query=search_phrase&results=2
whereas the regular URL of the product page looks like this:
www.website.com/category/full-product-name.html
The problem is that Google now indexes the duplicate URLs as separate pages.
I've never worked with PrestaShop before, but I looked into the template files and found what I assume is the file that generates this content. The line responsible for the link looks like this:
<a class="product_img_link" href="{$product.link|escape:'html':'UTF-8'}" title="{$product.name|escape:'html':'UTF-8'}" itemprop="url">
Since I don't know much about PrestaShop, I don't want to change things blindly. How can I make the links in the search results use the same structure as the normal product page URLs?

I don't see the point of letting search engines index search pages, but that is where the problem lies. For whatever reason, the developers decided to append the query string to the links in the search results.
You can create an override of the search controller (a custom search module would be even better), remove the line that appends it, and you should get normal product links.
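PrestaShop itself is PHP and Smarty, so the following is not code for the override. It is only a small Ruby sketch of the transformation the override should achieve: the same product URL with the search-related query string removed. The helper name canonical_product_url and the http:// scheme added to the example URL are assumptions for illustration.

```ruby
require 'uri'

# Hypothetical helper: return the URL without its query string or fragment,
# i.e. the "regular" product page URL.
def canonical_product_url(url)
  uri = URI.parse(url)
  uri.query = nil
  uri.fragment = nil
  uri.to_s
end

search_link = "http://www.website.com/category/full-product-name.html?search_query=search_phrase&results=2"
puts canonical_product_url(search_link)
```

A URL that has no query string passes through unchanged, so the same helper can be applied to every product link.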

Google Spreadsheet getting text with importxml

I've tried this and other versions to no avail. Can anyone help, please?
=IMPORTXML("http://performance.morningstar.com/fund/ratings-risk.action?t=MWTRX", "//*[@id='div_ratings_risk']/table/tbody/tr[4]/td[3]/text()")

As explained in the comments on your original question, the div element with the id div_ratings_risk is initially empty and does not contain a table.
Google Sheets cannot parse content that is not there yet and still has to be loaded.
The table you are trying to fetch into your spreadsheet is loaded dynamically with jQuery from another URL. You can find that URL with, for example, the Chrome developer tools by filtering for XHR requests.
If you parse the HTML served at that URL directly, it will work. So you need to point your formula at that URL and adapt your XPath like so:
=IMPORTXML("http://performance.morningstar.com/ratrisk/RatingRisk/fund/rating-risk.action?&t=XNAS:MWTRX&region=usa&culture=en-US&cur=&ops=clear&s=0P00001G5L&ep=true&comparisonRemove=null&benchmarkSecId=&benchmarktype=", "//table/tbody/tr[4]/td[3]/text()")
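The difference between the two formulas can be reproduced offline. The sketch below uses Ruby's REXML as a stand-in XPath engine, on the assumption that it evaluates these simple expressions in a comparable way to IMPORTXML. INITIAL_PAGE, LOADED_FRAGMENT, the cell values, and the helper first_text are made up for illustration and are not the real Morningstar markup.

```ruby
require 'rexml/document'

# Evaluate an XPath against well-formed markup and return the first
# matching text node as a string, or nil when nothing matches.
def first_text(markup, xpath)
  node = REXML::XPath.first(REXML::Document.new(markup), xpath)
  node&.to_s
end

# Hypothetical stand-in for the page as served: the div exists but is empty.
INITIAL_PAGE = "<html><body><div id='div_ratings_risk'></div></body></html>"

# Hypothetical stand-in for the fragment that the XHR request returns.
LOADED_FRAGMENT = <<~HTML
  <table>
    <tbody>
      <tr><td>r1c1</td><td>r1c2</td><td>r1c3</td></tr>
      <tr><td>r2c1</td><td>r2c2</td><td>r2c3</td></tr>
      <tr><td>r3c1</td><td>r3c2</td><td>r3c3</td></tr>
      <tr><td>r4c1</td><td>r4c2</td><td>r4c3</td></tr>
    </tbody>
  </table>
HTML

# The original XPath finds nothing, because the table is not in the served page.
p first_text(INITIAL_PAGE, "//*[@id='div_ratings_risk']/table/tbody/tr[4]/td[3]/text()")

# The adapted XPath selects the third cell of the fourth row of the fragment.
p first_text(LOADED_FRAGMENT, "//table/tbody/tr[4]/td[3]/text()")
```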

JSoup: parse Twitter list

I want to parse a Twitter list (e.g. https://twitter.com/spdbt/lists/spd-bundestagsabgeordnete/members) using JSoup. My problem is that the page is dynamic, so I only get the first 20 results. Is there any way for JSoup to fetch the whole page?
Currently, my code looks as follows:
Document doc = Jsoup.connect(listAdress).get();
Elements usernames = doc.select(".username.js-action-profile-name");
Elements realNames = doc.select(".fullname.js-action-profile-name");
// iterate over usernames and realNames and do something
Thanks in advance!

A workaround to achieve this:
Launch a browser with the above URL using Selenium.
Let the page load fully.
Get the page source using Selenium's getPageSource() method.
Pass this content to JSoup.
Parse it.
Code:
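import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

WebDriver driver = new FirefoxDriver();
driver.get("https://twitter.com/spdbt/lists/spd-bundestagsabgeordnete/members");
// Scroll here (in code or manually) until every list member has loaded
String pageContent = driver.getPageSource();
driver.quit(); // the browser is no longer needed once the source is captured
Document doc = Jsoup.parse(pageContent);
// From here, write your logic to select the required values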

I finally solved the problem by using a Twitter library, but thanks for your help.

Yahoo Pipes RSS: Copy link to looped results from Twitter search

I am using Yahoo Pipes to run automated Twitter searches using terms from the description fields of an RSS feed.
Pipes runs one search for each item in the feed. Each search returns a set of results, which are assigned as item.twitloop (all results).
I would like to replace the link of each item in the results with the link from the original query item.
So far I have only been able to assign the original link to the first item in the results list, rather than to each item.
http://pipes.yahoo.com/pipes/pipe.edit?_id=01f5f60eb8f3c22b45aa3708e5ae057a
Can anyone see where I'm going wrong?

The pipe isn't loading for me - perhaps you didn't set it as public? In any event, I have solved similar problems in the past by using the Loop module. You put the assignment into the loop (usually a string builder works well), and then have the Loop put that original link into item.link.
