Is there a way to parse external RSS Feeds with Jekyll?

I have several websites and would like to show their content, such as RSS headlines, in a Jekyll project. Is it possible to parse external RSS feeds with Jekyll and use them?

Yes. You'd either want to create a plugin to fetch and parse the external feeds during jekyll build or, plan B, you could always fetch and parse the feeds client-side with AJAX. Since you asked for a Jekyll answer, here's a rough approximation of the former approach:
# _plugins/rss_feed_collector.rb -- runs during jekyll build
class RssFeedCollector < Jekyll::Generator
  safe true
  priority :high

  def generate(site)
    # TODO: Insert code here to fetch RSS feeds
    rss_item_coll = nil

    # Create a new on-the-fly Jekyll collection called "external_feed"
    jekyll_coll = Jekyll::Collection.new(site, 'external_feed')
    site.collections['external_feed'] = jekyll_coll

    # Add virtual documents to the collection, one per feed item
    rss_item_coll.each do |item|
      title = item[:title]
      content = item[:content]
      guid = item[:guid]
      path = "_rss/" + guid + ".md"
      path = site.in_source_dir(path)
      doc = Jekyll::Document.new(path, { :site => site, :collection => jekyll_coll })
      doc.data['title'] = title
      doc.data['feed_content'] = content
      jekyll_coll.docs << doc
    end
  end
end
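For the TODO above, one possibility is to use Ruby's standard rss and open-uri libraries. A rough sketch, assuming hard-coded RSS 2.0 feed URLs (the URL and helper name below are made up, not part of Jekyll):
require 'rss'
require 'open-uri'

FEED_URLS = ['https://example.com/feed.xml'] # hypothetical feed URLs

# Returns an array of hashes with the keys used above (:title, :content, :guid).
# Assumes RSS (not Atom) feeds; Atom items use different accessors.
def fetch_rss_items
  FEED_URLS.flat_map do |url|
    feed = RSS::Parser.parse(open(url).read, false)
    feed.items.map do |item|
      {
        :title   => item.title,
        :content => item.description,
        # guid may need sanitising before being used as part of a filename
        :guid    => item.guid ? item.guid.content : item.link
      }
    end
  end
end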
You can then access the collection in your template like so:
{% for item in site.collections['external_feed'].docs %}
<h2>{{ item.title }}</h2>
<p>{{ item.feed_content }}</p>
{% endfor %}
There are a lot of possible variations on the theme but that's the idea.

Well, I don't think Jekyll per se can do that, because Jekyll is more of a CMS. However, Jekyll is written in Ruby, and I believe you can easily run Ruby/Rake tasks alongside it (that is probably what happens when you build a Jekyll site anyway), so I would do this as a Ruby script.
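For example, a standalone script run before jekyll build could write the headlines into Jekyll's _data directory, where templates can read them via site.data. A rough sketch (the file name and feed URL are made up):
# fetch_feeds.rb -- run this before `jekyll build`
require 'rss'
require 'open-uri'
require 'yaml'

feed_urls = ['https://example.com/feed.xml'] # hypothetical feed URLs

headlines = feed_urls.flat_map do |url|
  RSS::Parser.parse(open(url).read, false).items.map do |item|
    { 'title' => item.title, 'link' => item.link }
  end
end

# Templates can then read this as site.data.headlines
File.write('_data/headlines.yml', headlines.to_yaml)
You could wire this up as a Rake task that runs the script and then shells out to jekyll build.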

Related

Jobs update with Dashing and Ruby

I use Dashing to monitor trends and website statistics.
I created jobs to check Google News trends and Twitter trends.
The data is displayed correctly, but it only appears on the first load and is never updated afterwards. Here is the code for twitter_trends.rb:
require 'nokogiri'
require 'open-uri'

url = 'http://trends24.in/france/~cloud'
data = Nokogiri::HTML(open(url))
list = data.xpath('//ol/li')

tags = list.collect do |tag|
  tag.xpath('a').text
end
tags = tags.take(10)

tag_counts = Hash.new({value: 0})

SCHEDULER.every '10s' do
  tag = tags.sample
  tag_counts[tag] = {label: tag}
  send_event('twitter_trends', {items: tag_counts.values})
end
I think I am using rufus-scheduler incorrectly to schedule my jobs: https://gist.github.com/pushmatrix/3978821#file-sample_job-rb
How can I make the data update correctly on a regular basis?
Your scheduler looks fine, but it looks like you're making one call to the website:
data = Nokogiri::HTML(open(url))
But you never call it again. Do you really intend to fetch that site only once, when the job file is first loaded?
I assume you'd really want to wrap more of your logic into the scheduler loop; only the code inside it will be rerun when the scheduled job fires.
When you moved everything into the scheduler, you are only taking one sample every 10 seconds (http://ruby-doc.org/core-2.2.0/Array.html#method-i-sample) and adding it to tag_counts, which starts empty on every run. The thing to remember about schedulers is that each run is basically a clean slate. I'd recommend looping through tags and adding all of them to tag_counts instead of sampling; sampling is unnecessary anyway, since you already reduce the list to 10 items each time the scheduler runs.
If I move the SCHEDULER block like this (right after the url assignment at the top), it works, but then only one random item appears every 10 seconds.
require 'nokogiri'
require 'open-uri'

url = 'http://trends24.in/france/~cloud'

SCHEDULER.every '10s' do
  data = Nokogiri::HTML(open(url))
  list = data.xpath('//ol/li')
  tags = list.collect do |tag|
    tag.xpath('a').text
  end
  tags = tags.take(10)
  tag_counts = Hash.new({value: 0})
  tag = tags.sample
  tag_counts[tag] = {label: tag}
  send_event('twitter_trends', {items: tag_counts.values})
end
How can I display a list of 10 items that is updated regularly?
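One way to do that, following the advice above, is to build the whole list inside the scheduler instead of sampling a single tag. A rough sketch, assuming the same Dashing SCHEDULER and twitter_trends widget (the interval is widened to 10 minutes so the site isn't hammered):
require 'nokogiri'
require 'open-uri'

url = 'http://trends24.in/france/~cloud'

SCHEDULER.every '10m' do
  data = Nokogiri::HTML(open(url))
  tags = data.xpath('//ol/li').collect { |tag| tag.xpath('a').text }.take(10)

  # Send all 10 tags on every run instead of sampling one
  items = tags.map { |t| { label: t } }
  send_event('twitter_trends', { items: items })
end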

In need of an explanation of Web scraping with Nokogiri in Rails

I am utterly confused and lost with Nokogiri and web scraping in Rails. I need someone to explain how I can get article titles from a web site and list them in a view in my Rails application. I can retrieve the data in irb, however I have no clue how to get that same data displayed in a view I made.
I have watched a number of tutorials and read documentation, and one thing that confuses me the most is this: when they require nokogiri or open-uri in their example Ruby file, what directory is that Ruby file supposed to be placed in? Also, is that file associated with any controller so it can be displayed in the particular view that I made?
I hope I am explaining my issue as clearly as possible without adding to the confusion.
What I am trying to do is create an application where the user can register and sign in; after they are signed in they are redirected to a page with 3 links: Audi, BMW and Mercedes-Benz. Depending on which link is clicked, the user is then directed to another page that returns a list of articles mentioning their chosen make.
I hope this explanation was helpful and I really hope someone can offer to help or give me some kind of documentation that will benefit me.
Thank you!
This is what I did in irb:
2.1.1 :001 > require 'rubygems'
=> false
2.1.1 :002 > require 'nokogiri'
=> true
2.1.1 :003 > require 'open-uri'
=> true
2.1.1 :004 > page = Nokogiri::HTML(open("http://www.dtm.com/de/News/Archiv/index.html"))
I then got this returned:
=> #<Nokogiri::HTML::Document:0x814e3b40 name="document" children=[#<Nokogiri::XML::DTD:0x814e37f8 name="HTML">, #<Nokogiri::XML::Element:0x814e358c name="html" children=[#<Nokogiri::XML::Text:0x814e3384 "\r\n">, #<Nokogiri::XML::Element:0x814e32d0 name="head" children=[#<Nokogiri::XML::Text:0x814e30f0 "\r\n">, #<Nokogiri::XML::Element:0x814e3028 name="title" children=[#<Nokogiri::XML::Text:0x814e2e48 "DTM | Newsarchiv">]>, #<Nokogiri::XML::Text:0x814e2c90 "\r\n">, #<Nokogiri::XML::Element:0x814e2bc8 name="meta" attributes=[#<Nokogiri::XML::Attr:0x814e2b64 name="charset" value="utf-8">]>, #<Nokogiri::XML::Text:0x814e2718 "\r\n">, #<Nokogiri::XML::Element:0x814e2664 name="meta" ...
(I got more but just put up a few lines of what was returned) I am assuming this is the raw data from the page.
I then put:
2.1.1 :008 > puts page
Which returned the raw HTML content.
Finally I entered:
2.1.1 :014 > page.css("a")
Which returned all the links on the page.
I am hoping to help you with a real-world example. Let's get some data from Reuters.
In your console try this:
# require your tools; make sure you have run: gem install nokogiri
pry(main)> require 'nokogiri'
pry(main)> require 'open-uri'
# set the url
pry(main)> url = "http://www.reuters.com/finance/stocks/overview?symbol=0005.HK"
# load and assign to a variable
pry(main)> doc = Nokogiri::HTML(open(url))
# take a piece of the site that has an element style .sectionQuote you can use ids also
pry(main)> quote = doc.css(".sectionQuote")
Now if you have a look in quote you will see that you have Nokogiri elements. Let's have a look inside:
pry(main)> quote.size
=> 6
pry(main)> quote.first
=> #(Element:0x43ff468 {
name = "div",
attributes = [ #(Attr:0x43ff404 { name = "class", value = "sectionQuote nasdaqChange" })],
children = [
#(Text "\n\t\t\t"),
#(Element:0x43fef18 {
name = "div",
attributes = [ #(Attr:0x43feeb4 { name = "class", value = "sectionQuoteDetail" })],
children = [
#(Text "\n\t\t\t\t"),
#(Element:0x43fe9c8 { name = "span", attributes = [ #(Attr:0x43fe964 { name = "class", value = "nasdaqChangeHeader" })], children = [ #(Text "0005.HK on Hong Kong Stock")] }),
.....
}),
#(Text "\n\t\t")]
})
You can see that nokogiri has essentially encapsulated each DOM element, so that you can search and access it quickly.
If you want to simply display this div element you can:
pry(main)> quote.first.to_html
=> "<div class=\"sectionQuote nasdaqChange\">\n\t\t\t<div class=\"sectionQuoteDetail\">\n\t\t\t\t<span class=\"nasdaqChangeHeader\">0005.HK on Hong Kong Stock</span>\n\t\t\t\t<br class=\"clear\"><br class=\"clear\">\n\t\t\t\t<span style=\"font-size: 23px;\">\n\t\t\t\t82.85</span><span>HKD</span><br>\n\t\t\t\t<span class=\"nasdaqChangeTime\">14 Aug 2014</span>\n\t\t\t</div>\n\t\t</div>"
It is possible to use this directly in the view of a Rails application.
If you want to be more specific and take individual components, traversing elements one level down by looping over the quote variable, you can:
pry(main)> quote.each{|p| puts p.inspect}
Or be very specific and get the value of an element, i.e. the name of the stock in our example:
pry(main)> quote.at_css(".nasdaqChangeHeader").content
=> "0005.HK on Hong Kong Stock"
This is a very useful link: http://nokogiri.org/tutorials/searching_a_xml_html_document.html
Really hope this helps
PS: A tip for looking inside objects
(http://ruby-doc.org/core-2.1.1/Object.html#method-i-inspect)
puts quote.inspect
First, you can add nokogiri to the Gemfile of your Rails app (open-uri already ships with Ruby); with that in place you don't need to require these libraries in your code.
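For example, the Gemfile entry would simply be the following, followed by a bundle install:
# Gemfile
gem 'nokogiri'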
Your flow to scrape the sites should be:
# put this code in your controller
web_site = params[:web_site] # could be http://www.bmw.com/com/en/
@doc = Nokogiri::HTML(open(web_site))

# then you can iterate over the document in your view
<% @doc.css('.standardTeaser').each do |teaser_bmw| %>
  <p><%= teaser_bmw.css('.headline').text %></p>
  <%# other content of the teaser you can search for here %>
<% end %>
So, to scrape a web site you need to fetch its HTML and find the content you want to grab.
If you know the basics of CSS selectors it will be very easy to do. My example doesn't take into account saving the data in a database, but if you want to do that, you just need to create a table with the fields you need and then create a record after parsing the HTML.
Does that make sense to you?

Rails + Amazon API + Integration

I'm a Rails newb, in a little over my head, and could use some help.
I have an existing Rails app, and I'm trying to integrate the Amazon Products API with the "ruby-aaws" gem, i.e. place items inside a model, show them in the view, etc.
I've never worked with an external API before, so I'm not sure where to begin. Any help at all is much appreciated!
Here is some of the code that I've used to pull data with the API:
require 'amazon/aws'
require 'amazon/aws/search'

include Amazon::AWS
include Amazon::AWS::Search

is = ItemSearch.new('Watches', { 'Keywords' => 'Gucci' })
rg = ResponseGroup.new('Large')
req = Request.new
req.locale = 'us'

resp = req.search(is, rg)
items = resp.item_search_response[0].items[0].item

# Available properties for the first item:
puts items[0].properties

items.each do |item|
  attribs = item.item_attributes[0]
  puts attribs.label
  if attribs.list_price
    puts attribs.title, attribs.list_price[0].formatted_price, item.medium_image, ''
  end
end
I am also a newbie and trying to do something similar. I found this example on GitHub that looks really promising.
https://github.com/hundredwatt/Amazon-Product-Search-Example
But there are also some great related questions here that have answers for you:
Ruby Amazon book search
Good luck!
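As a rough sketch of the integration, the search code from the question could be wrapped in a plain Ruby class under app/models so a controller can call it and hand the results to a view. The class and method names here are made up, not part of ruby-aaws:
# app/models/amazon_search.rb
require 'amazon/aws'
require 'amazon/aws/search'

class AmazonSearch
  include Amazon::AWS
  include Amazon::AWS::Search

  # Returns the raw item list for a keyword search, e.g.
  #   AmazonSearch.items('Watches', 'Gucci')
  def self.items(search_index, keywords)
    is  = ItemSearch.new(search_index, { 'Keywords' => keywords })
    rg  = ResponseGroup.new('Large')
    req = Request.new
    req.locale = 'us'
    resp = req.search(is, rg)
    resp.item_search_response[0].items[0].item
  end
end
In a controller action you could then assign @items = AmazonSearch.items('Watches', 'Gucci') and loop over @items in the view, reading item.item_attributes[0] just as the script above does.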

Generate a link_to on the fly if a URL is found inside the contents of a db text field?

I have an automated report tool (corp intranet) where the admins have a few text area boxes to enter some text for different parts of the email body.
What I'd like to do is parse the contents of the text area and wrap any hyperlinks found with link tags (so when the report goes out there are links instead of text urls).
Is there a simple way to do this without writing my own parser that wraps link tags around anything starting with 'http:', 'https:', or 'ftp:' up to the first space after it?
Thank You!
Ruby 1.87, Rails 2.3.5
Make a helper:
def make_urls(text)
  urls = %r{(?:https?|ftp|mailto)://\S+}i
  html_text = text.gsub urls, '<a href="\0">\0</a>'
  html_text
end
In the view just call this function and you will get the expected output, for example:
irb(main):001:0> string = 'here is a link: http://google.com'
=> "here is a link: http://google.com"
irb(main):002:0> urls = %r{(?:https?|ftp|mailto)://\S+}i
=> /(?:https?|ftp|mailto):\/\/\S+/i
irb(main):003:0> html = string.gsub urls, '<a href="\0">\0</a>'
=> "here is a link: <a href=\"http://google.com\">http://google.com</a>"
There are many ways to accomplish your goal. One way would be to use Regex. If you have never heard of regex, this wikipedia entry should bring you up to speed.
For example:
content_string = "Blah ablal blabla lbal blah blaha http://www.google.com/ adsf dasd dadf dfasdf dadf sdfasdf dadf dfaksjdf kjdfasdf http://www.apple.com/ blah blah blah."
content_string.split(/\s+/).find_all { |u| u =~ /^https?:/ }
Which will return: ["http://www.google.com/", "http://www.apple.com/"]
Now, for the second half of the problem, you will use the array returned above to substitute HTML hyperlinks for the text links.
links = ["http://www.google.com/", "http://www.apple.com/"]
links.each do |l|
  content_string.gsub!(l, "<a href='#{l}'>#{l}</a>")
end
content_string will now be updated to contain HTML hyperlinks for all http/https URLs.
As I mentioned earlier, there are numerous ways to tackle this problem - to find the URLs you could also do something like:
require 'uri'
URI.extract(content_string, ['http', 'https'])
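Combining the two steps, a minimal sketch:
require 'uri'

content_string = "Blah blah http://www.google.com/ and http://www.apple.com/ blah."

# Wrap each extracted URL in an anchor tag
URI.extract(content_string, ['http', 'https']).uniq.each do |url|
  content_string.gsub!(url, "<a href='#{url}'>#{url}</a>")
end
# content_string now contains HTML hyperlinks for each URL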
I hope this helps you.

Magento multilanguage - double change in language results in 404 (or how to change language within stores not views)

I have a problem with a Magento installation. I used Magento ver. 1.5.0.1, Community Edition, to develop this website: http://cissmarket.com/.
The problem appears when I change the language from the EU version to French and after that to German. The change to French is fine, but when I then change to German on the same page I receive a 404 error. This is also generating 404 errors in Google Webmaster Tools, and when I take such a link and paste it into the browser it gives me a 404 as well. I have some 50 products and ~550 404 errors in Google Webmaster Tools, and I understand the problem comes from what I described.
Moreover I have an SEO problem, since I have this page in French:
http://cissmarket.com/de/cartouches-refilables.html
And when I switch to the German version of the website it takes me to this link:
http://cissmarket.com/de/cartouches-refilables.html?___from_store=fr (if I now try to switch to UK I get the 404 mentioned above)
instead of going to this one:
http://cissmarket.com/de/nachfullpatronen.html
I already checked "404 error when switching between stores when in a category on Magento" but it does not relate to my problem.
About settings:
I use the caching service and I have also indexed all the content.
The product or category I am trying to access is available and set active for all the languages.
System > General > Web > URL options > Add Store Code to Urls is set to yes.
System > General > Web > Search Engines Optimization > Use Web Server Rewrites is set to yes.
No other changes have been made to the .htaccess file except for the ones that the system itself made.
So to conclude: the problem is the 404 caused by two successive language changes, and the wrong URL when I switch from one page to another.
Any suggestions would be appreciated.
UPDATE: I tried this http://www.activo.com/how-to-avoid-the-___from_store-query-parameter-when-switching-store-views-in-magento but it results in a 404 at the first language change.
Edit #1:
I found the problem: the file languages.phtml contained this code: <?php echo str_replace("/fr/", "/de/", $_lang->getCurrentUrl()); ?> It only replaces the store code in the URL and not the whole path with the corresponding translated URL key.
So applied to this
http://cissmarket.com/fr/cartouches-refilables.html
it will return
http://cissmarket.com/de/cartouches-refilables.html
So does anyone know how to get the corresponding URL of the current page for the other languages available in the store?
Edit #2 (using @Vinai's solution):
It works on product pages but not on category pages yet.
There is no such thing in the native Magento as far as I know.
That said, you can use the following code to get the current page URL for each store.
$resource = Mage::getSingleton('core/resource');
$requestPath = Mage::getSingleton('core/url')->escape(
    trim(Mage::app()->getRequest()->getRequestString(), '/')
);
$select = $resource->getConnection('default_read')->select()
    ->from(array('c' => $resource->getTableName('core/url_rewrite')), '')
    ->where('c.request_path=?', $requestPath)
    ->where('c.store_id=?', Mage::app()->getStore()->getId())
    ->joinInner(
        array('t' => $resource->getTableName('core/url_rewrite')),
        "t.category_id=c.category_id AND t.product_id=c.product_id AND t.id_path=c.id_path",
        array('t.store_id', 't.request_path')
    );
$storeUrls = (array) $resource->getConnection('default_read')
    ->fetchPairs($select);
This will give you an array with the array key being the store IDs and the array values being the request path after the Magento base URL, e.g. assuming your French store has the ID 1 and the German one has the ID 2, you would get:
Array
(
[1] => cartouches-refilables.html
[2] => nachfullpatronen.html
)
Then, in the foreach loop where the URL for each store is output, use
<?php $url = isset($storeUrls[$_lang->getId()]) ? $_lang->getUrl($storeUrls[$_lang->getId()]) : $_lang->getCurrentUrl() ?>
The call to $_lang->getUrl() will add the base URL, so you will get the full URL for each store (e.g. http://cissmarket.com/de/nachfullpatronen.html). If no store view value is found in the core_url_rewrite table it will revert to the default behaviour.
You still need the ___store=fr query parameter because otherwise Magento will think you are trying to access the new path in the context of the old store. Luckily, the getUrl() call on the store model adds that for you automatically.
The code querying the database can of course live anywhere (since it's PHP), even in the template, but please don't put it there. The correct place for code that accesses the database is a resource model. I suggest you create a resource model and put this in a method there.
I've found an ugly patch until a better approach comes up.
In the admin section, I've added the following JavaScript inside the WYSIWYG editor in CMS > Pages > (my 404 pages), at the beginning of the content:
<script type="text/javascript" language="javascript">// <![CDATA[
var lang = "en";
var rooturl = "{{config path="web/unsecure/base_url"}}";
var url = document.location.href;
if (!(url.match("/" + lang + "/"))) {
    var newUrl = url.replace(rooturl, rooturl + lang + "/");
    window.location.href = newUrl;
}
// ]]></script>
(Note: you need to do this for all of your translated 404 pages. In each 404 page you need to change lang = "en" to the URL code of that store view.)
Because the WYSIWYG editor (TinyMCE) strips out JavaScript by default, you'll also have to modify js/mage/adminhtml/wysiwyg/tinymce/setup.js. Add the following code under line 97 (under "var settings ="):
extended_valid_elements : 'script[language|type|src]',
For Magento 1.7.0.2 and 1.8.0.0 this is the bugfix:
Starting from line 251 of /app/code/core/Mage/Core/Model/Url/Rewrite.php:
Mage::app()->getCookie()->set(Mage_Core_Model_Store::COOKIE_NAME, $currentStore->getCode(), true);

// endur 02-03-2013 fix for missed store code
// $targetUrl = $request->getBaseUrl(). '/' . $this->getRequestPath();
if (Mage::getStoreConfig('web/url/use_store') && $storeCode = Mage::app()->getStore()->getCode()) {
    $targetUrl = $request->getBaseUrl() . '/' . $storeCode . '/' . $this->getRequestPath();
} else {
    $targetUrl = $request->getBaseUrl() . '/' . $this->getRequestPath();
}
// endur 02-03-2013 end
Make sure to create a custom copy of the file in:
/app/code/local/Mage/Core/Model/Url/Rewrite.php or:
/app/code/local/YourTheme/Mage/Core/Model/Url/Rewrite.php
source:
It looks like a bug in Magento 1.7. Here is a hack that worked for me.
It should work for a two-language store with the store code in the URL.
in var/www/html/shop1/app/code/core/Mage/Core/Model/Url/Rewrite.php
remove this line
// $targetUrl = $request->getBaseUrl(). '/' . $this->getRequestPath();
and add these:
$storecode = Mage::app()->getStore()->getCode();
if ($storecode == 'en') {
    $targetUrl = $request->getBaseUrl() . '/' . $storecode . '/' . $this->getRequestPath();
} else {
    $targetUrl = $request->getBaseUrl() . '/' . $this->getRequestPath();
}
Here is another solution for this problem. Just add this code after "$this->load($pathInfo, 'request_path');" in app/code/core/Mage/Core/Model/Url/Rewrite.php:
if (!$this->getId() && !isset($_GET['___from_store'])) {
    $db = Mage::getSingleton('core/resource')->getConnection('default_read');
    $result = $db->query('select store_id from core_url_rewrite WHERE request_path = "' . $pathInfo . '"');
    if ($result) {
        $storeIds = array();
        if ($row = $result->fetch(PDO::FETCH_ASSOC)) {
            $storeId = $row['store_id'];
            $storeCode = Mage::app()->getStore($storeId)->getCode();
            header("HTTP/1.1 301 Moved Permanently");
            header("Location: http://" . $_SERVER['HTTP_HOST'] . "/" . $pathInfo . "?___store=" . $storeCode);
            exit();
        }
    }
}
For this error there is a Magento module that rewrites two models:
http://www.magentocommerce.com/magento-connect/fix-404-error-in-language-switching.html
