Following links to get RSS entry content with feedzirra - ruby-on-rails

I have a Rails app (3.2.11, Ruby 1.9.3) and I'm trying to read the feed at http://www2c.cdc.gov/podcasts/createrss.asp?t=r&c=66 using feedzirra. Looking at the source XML of the feed, this is what the entries look like:
<item>
<title>In the News - Novel (New) Coronavirus in the Arabian Peninsula and United Kingdom</title>
<description>Novel (New) Coronavirus in the Arabian Peninsula and United Kingdom</description>
<link>http://wwwnc.cdc.gov/travel/notices/in-the-news/coronavirus-arabian-peninsula-uk.htm</link>
<guid isPermaLink="true">http://wwwnc.cdc.gov/travel/notices/in-the-news/coronavirus-arabian-peninsula-uk.htm</guid>
<pubDate>Thu, 07 Mar 2013 05:00:00 EST</pubDate>
</item>
<item>
<title>Outbreaks - Dengue in Madeira, Portugal</title>
<description>Dengue in Madeira, Portugal</description>
<link>http://wwwnc.cdc.gov/travel/notices/outbreak-notice/dengue-madeira-portugal.htm</link>
<guid isPermaLink="true">http://wwwnc.cdc.gov/travel/notices/outbreak-notice/dengue-madeira-portugal.htm</guid>
<pubDate>Wed, 20 Feb 2013 05:00:00 EST</pubDate>
</item>
As you can see, this feed doesn't seem to expose the entry contents, just a link to the underlying article. My question is this: can I use feedzirra to access the content of the original article? If not, are there any recommendations for good tools out there? wget? mechanize? httparty? Thanks!

Well, I don't know if it's possible with feedzirra, but from what I see in the XML, all you can get is the title and a few other fields such as the description and publication date. I can, however, recommend a tool for this: check out FeedsAPI. It has a nice, simple-to-use RSS feeds API and can do what you are trying to achieve. I hope this helps.
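Since each entry carries only a link, another option is to follow the links yourself. The following is a minimal sketch, not feedzirra's own API: it reads the <link> of every <item> with REXML and fetches each page with Net::HTTP. The helper names entry_links and fetch_articles are hypothetical, and the fetcher is injectable so the sketch can be tried without network access. Note that a plain Net::HTTP.get does not follow redirects.

```ruby
require "net/http"
require "rexml/document"
require "uri"

# Pull the <link> of every <item> out of an RSS 2.0 document.
# (Hypothetical helper, not part of feedzirra.)
def entry_links(feed_xml)
  doc = REXML::Document.new(feed_xml)
  REXML::XPath.match(doc, "//item/link").map { |node| node.text.to_s.strip }
end

# Fetch the page behind each link and return a hash of url => body.
# The default fetcher performs a plain HTTP GET; pass another callable
# (mechanize, httparty, a stub) to change how pages are retrieved.
def fetch_articles(links, fetcher = lambda { |url| Net::HTTP.get(URI.parse(url)) })
  links.each_with_object({}) { |url, pages| pages[url] = fetcher.call(url) }
end
```

The returned bodies are raw HTML, so extracting the article text still needs an HTML parser and knowledge of the target site's markup.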

Related

How to index xml data directly on elasticsearch server

I have almost 250 XML data files (each file contains 1000 XML-formatted items) and one Elasticsearch server. My application is built on Ruby on Rails. I know how to index a model in a Rails application (ModelName.import), which indexes the records on the Elasticsearch server.
But is there another way to index the XML data files directly on the Elasticsearch server instead of using the .import method?
The XML file looks like this (a file may contain 1000 items):
<?xml version="1.0" encoding="UTF-8"?>
<catalog items="2" total-pages="260" page="1" per-page="2" status="complete">
<item>
<sku>1</sku>
<vbid>1</vbid>
<created>Sun, 05 Oct 2014 03:35:58 +0000</created>
<updated>Sun, 06 Mar 2016 12:44:48 +0000</updated>
<subjects>
<subject schema="bisac" code="HIS027090">World War I</subject>
<subject schema="coursesmart" code="cs.soc_sci.hist.milit_hist">Social Sciences -> History -> Military History</subject>
</subjects>
<aliases>
<eisbn-canonical>1</eisbn-canonical>
<isbn-canonical>1</isbn-canonical>
<print-isbn-canonical>9780752460864</print-isbn-canonical>
<fpid/>
<isbn13>1</isbn13>
<isbn10>0750951796</isbn10>
<additional-isbns>
<isbn type="print-isbn-10">0752460862</isbn>
<isbn type="print-isbn-13">9780752460864</isbn>
</additional-isbns>
</aliases>
</item>
<item>
<sku>2</sku>
<vbid>2</vbid>
<created>Sun, 05 Oct 2014 03:35:58 +0000</created>
<updated>Sun, 06 Mar 2016 12:44:48 +0000</updated>
<subjects>
<subject schema="bisac" code="HIS027090">World War I</subject>
<subject schema="coursesmart" code="cs.soc_sci.hist.milit_hist">Social Sciences -> History -> Military History</subject>
</subjects>
<aliases>
<eisbn-canonical>2</eisbn-canonical>
<isbn-canonical>2</isbn-canonical>
<print-isbn-canonical>9780752460864</print-isbn-canonical>
<fpid/>
<isbn13>2</isbn13>
<isbn10>0750951796</isbn10>
<additional-isbns>
<isbn type="print-isbn-10">0752460862</isbn>
<isbn type="print-isbn-13">9780752460864</isbn>
</additional-isbns>
</aliases>
</item>
</catalog>
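Elasticsearch accepts JSON rather than XML, so one way to skip the model layer is to convert each file into a request body for the bulk API. The sketch below assumes that approach; the helper name bulk_body, the index name passed in, and the choice of fields are all hypothetical, and older Elasticsearch versions may also expect a "_type" in the action line.

```ruby
require "json"
require "rexml/document"

# Convert one catalog XML document into a newline-delimited _bulk body:
# an action line followed by a source line for every <item>.
# (Hypothetical helper; the field selection is only an illustration.)
def bulk_body(xml, index_name)
  doc = REXML::Document.new(xml)
  lines = []
  REXML::XPath.each(doc, "/catalog/item") do |item|
    sku = item.elements["sku"].text
    source = {
      "sku"      => sku,
      "vbid"     => item.elements["vbid"].text,
      "created"  => item.elements["created"].text,
      "updated"  => item.elements["updated"].text,
      "subjects" => item.get_elements("subjects/subject").map(&:text),
      "isbn10"   => item.elements["aliases/isbn10"].text
    }
    lines << JSON.generate("index" => { "_index" => index_name, "_id" => sku })
    lines << JSON.generate(source)
  end
  # The bulk API requires the body to end with a newline.
  lines.join("\n") + "\n"
end
```

The resulting string can then be sent in a POST request to the server's _bulk endpoint, one request per file.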

Consistent Encoding for iCal file import

I'm trying to use the iCalendar gem to import some iCal files on a Rails 4 site.
Sometimes the file is of type 'text/calendar;charset=utf-8' and sometimes it's 'text/calendar; charset=UTF-8;'
I am retrieving it like this:
uri = URI.parse(url)
calendar = Net::HTTP.get_response(uri)
new_calendar = Icalendar.parse(calendar.body)
When it's text/calendar;charset=utf-8 it works fine, but when it's text/calendar; charset=UTF-8 I get escaped UTF-8 byte sequences in the string:
SUMMARY:Tech Job Fair – City(ST) – Jul 1, 2015
ends up being
["Tech Job Fair \xE2\x80\x93 City(ST) \xE2\x80\x93 Jul 1", " 2015"]
Which is then saved to the database and that is undesirable.
Is the charset/content-type revealing the problem here or could it actually just be encoded wrong from the source?
How do I change my retrieval commands to strip those codes out effectively, or tell it it's a UTF-8 string so it doesn't include them in the first place?
Update: it looks like some are text/calendar;charset=utf-8 and some are text/calendar;charset=UTF-8 and some are text/calendar; charset=UTF-8. Note the last one has a space between the two segments. Could this be causing an issue?
Update2: Opening up my three example iCal files in Notepad++ shows them encoded as "UTF-8 without BOM" in the menu.
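One likely cause, offered as an assumption rather than a confirmed diagnosis: Net::HTTP returns the body as a binary (ASCII-8BIT) string whatever the Content-Type header says, and \xE2\x80\x93 is simply the UTF-8 byte sequence for the en dash, shown escaped because the string was never labeled as UTF-8. A minimal sketch that relabels the bytes before parsing (utf8_body is a hypothetical helper, and String#scrub needs Ruby 2.1 or later):

```ruby
# Relabel a binary response body as UTF-8 without changing its bytes.
# If the bytes turn out not to be valid UTF-8, replace the bad sequences
# with "?" so later processing does not raise.
def utf8_body(raw)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  text.valid_encoding? ? text : text.scrub("?")
end
```

With that in place the parse call would become Icalendar.parse(utf8_body(calendar.body)). The different spacing in the header values is probably unrelated, since the body bytes are the same either way.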

Open URI Wrong Output

I am trying to download images from the web and upload them back to Cloudinary. The code I have works for some images, but not for others. I have isolated the problem down to this line (it requires open-uri):
image = open(params[:product_image][:main])
For this image, it works fine. image is
#<Tempfile:/var/folders/49/bmhbmmzj5fl31dm9j6m6gxr00000gn/T/open-uri20150526-7662-1b676ws>
and cloudinary accepts this. However, when I try to pull this image, image becomes
#<StringIO:0x007fa0267c8f80 #base_uri=#<URI::HTTP:0x007fa0267c92c8 URL:http://www.spiresources.net/WebImages/480/swatch/CELW.JPG>,
#meta={"date"=>"Tue, 26 May 2015 22:17:47 GMT", "server"=>"Apache/2.2.22 (Ubuntu)",
"last-modified"=>"Mon, 29 Jun 2009 00:00:00 GMT", "etag"=>"\"44700f-c35-46d715f090000\"",
"accept-ranges"=>"bytes", "content-length"=>"3125", "content-type"=>"image/jpeg"}, #metas={"date"=>["Tue, 26 May 2015 22:17:47 GMT"], "server"=>["Apache/2.2.22 (Ubuntu)"],
"last-modified"=>["Mon, 29 Jun 2009 00:00:00 GMT"], "etag"=>["\"44700f-c35-46d715f090000\""], "accept-ranges"=>["bytes"],
"content-length"=>["3125"], "content-type"=>["image/jpeg"]}, #status=["200", "OK"]>
which Cloudinary rejects, raising the error "No conversion of StringIO to string". Why does open-uri return different objects for what would seem like similar images? How can I make open-uri return a tempfile, or at least turn my StringIO into a tempfile?
You can simply give the URL to the Cloudinary upload method. Then Cloudinary will fetch the remote resource directly.
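As for why the objects differ: open-uri buffers small responses (under 10 KB, per OpenURI::Buffer::StringMax) in a StringIO and only switches to a Tempfile for larger ones, which matches the 3125-byte image in the question. If a Tempfile is needed regardless, a minimal sketch of the conversion follows; to_tempfile is a hypothetical helper, not part of open-uri.

```ruby
require "stringio"
require "tempfile"

# Return a Tempfile holding the contents of io.
# Tempfiles are passed through untouched; anything else (such as the
# StringIO that open-uri yields for small responses) is copied to disk.
def to_tempfile(io, basename = "download")
  return io if io.is_a?(Tempfile)

  file = Tempfile.new(basename)
  file.binmode          # image data must not be newline-translated
  io.rewind if io.respond_to?(:rewind)
  file.write(io.read)
  file.rewind           # leave the file ready for the next reader
  file
end
```

The result of to_tempfile(open(url)) can then be used wherever the Tempfile from the working image was accepted.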

Parsing Description tag in Rss Feed in iOS

I am facing a problem handling the description tag of an RSS feed in iOS.
Below is an example of an RSS feed I have received.
I cannot handle this description field without knowing the feed beforehand, so I cannot make this parser generic.
My question is: can we make a generic RSS feed parser? If yes, then how? I have tried using NSScanner, but it did not seem very efficient. Is there a better alternative?
EDIT:
I have already parsed the feed using NSXMLParser. I am getting the description field including the HTML tags, and I want to extract the original values from it:
<item>
<title>End slavery in the U.S., world</title>
<guid isPermaLink="false">http://www.cnn.com/2013/10/23/opinion/myles-slavery/index.html</guid>
<link>http://rss.cnn.com/~r/rss/cnn_topstories/~3/Z13FFqE4z54/index.html</link>
<description>The extraordinary new film "12 Years a Slave" immerses us in the reality of historical slavery at a deep level of complexity and nuance. The film is an opportunity to honor all who were held in chattel slavery, treated like property, and subjected to levels of violence, torture, and control that no human should ever endure.<div class="feedflare">
<a href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=Z13FFqE4z54:pYCgKZFqbkU:yIl2AUoC8zA"><img
src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?d=yIl2AUoC8zA" border="0"></img></a> <a
href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=Z13FFqE4z54:pYCgKZFqbkU:7Q72WNTAKBA"><img
src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?d=7Q72WNTAKBA" border="0"></img></a> <a
href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=Z13FFqE4z54:pYCgKZFqbkU:V_sGLiPBpWU"><img
src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?i=Z13FFqE4z54:pYCgKZFqbkU:V_sGLiPBpWU" border="0"></img></a>
<a href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=Z13FFqE4z54:pYCgKZFqbkU:qj6IDK7rITs"><img
src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?d=qj6IDK7rITs" border="0"></img></a> <a
href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=Z13FFqE4z54:pYCgKZFqbkU:gIN9vFwOqvQ"><
img src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?i=Z13FFqE4z54:pYCgKZFqbkU:gIN9vFwOqvQ" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/rss/cnn_topstories/~4/Z13FFqE4z54" height="1" width="1"/>
</description>
<pubDate>Wed, 23 Oct 2013 09:05:27 EDT</pubDate>
<feedburner:origLink>http://www.cnn.com/2013/10/23/opinion/myles-slavery/index.html</feedburner:origLink>
</item>
RSS is just XML and is a well-defined format, so you can use NSXMLParser to parse the feed and extract the information you need.

JqGrid DataBinding exception while exporting to excel file

I am trying to export a JqGrid to Excel, so I followed these instructions and used the code below.
var grid = new JqGridModelParticipiant().JqGridParticipiant;
var query = db.ReservationSet.Select(r => new
{
r.Id,
Name = r.Doctor.Name,
Identity = r.Doctor.Identity,
Title = r.Doctor.Title.Name,
Total = r.TotalTL,
Organization = r.Organization.Name
});
grid.ExportToExcel(query,"file.xls");
And I get the exception below on the line grid.ExportToExcel(query,"file.xls");
Data binding directly to a store query (DbSet, DbQuery, DbSqlQuery) is
not supported. Instead populate a DbSet with data, for example by
calling Load on the DbSet, and then bind to local data. For WPF bind
to DbSet.Local. For WinForms bind to DbSet.Local.ToBindingList().
As far as I understand, it expects an ObservableCollection, which is available through the DbSet.Local member. But I am working with a projected query, so I can't do that.
What is the solution to this problem?
In the answer I posted the demo, which shows how to implement export to Excel (a real *.XLSX file instead of the HTML fragment renamed to *.XLS that is used here).
The method used for exporting to Excel in jqSuite (the demo) produces an HTML fragment like
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: application/excel; charset=utf-8
Server: Microsoft-IIS/7.0
X-AspNetMvc-Version: 2.0
content-disposition: attachment; filename=grid.xls
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Fri, 29 Jun 2012 14:24:54 GMT
Connection: close
<table cellspacing="0" rules="all" border="1" id="_exportGrid" style="border-collapse:collapse;">
<tr>
<td>OrderID</td><td>CustomerID</td><td>OrderDate</td><td>Freight</td><td>ShipName</td>
</tr><tr>
<td>10248</td><td>VINET</td><td>1996/07/04</td><td>32.3800</td><td>Vins et alcools Chevalier</td>
</tr><tr>
<td>10249</td><td>TOMSP</td><td>1996/07/05</td><td>11.6100</td><td>Toms Spezialitäten</td>
</tr><tr>
<td>10250</td><td>HANAR</td><td>1996/07/08</td><td>65.8300</td><td>Hanari Carnes</td>
</tr><tr>
...
</table>
instead of creating a real Excel file. This approach is very unsafe because, on opening, the "Standard" data type will always be used. For example, if you export data like
<td>10249</td><td>TOMSP</td><td>1996/07/05</td><td>11.02.12</td><td>Toms Spezialitäten</td>
the text "11.02.12" will be automatically converted to the date 11.02.2012 if the German locale is used as the default.
The name "Toms Spezialitäten" will also be displayed incorrectly because of the "ä" character.
It can be especially dangerous in the case of a large table where some small part of the data in the middle of the grid is converted incorrectly. In one project I displayed information about software, and some software versions were wrongly converted to the Date type.
Because of these and other similar problems, I create a real Excel file on the server using Open XML SDK 2.5 or Open XML SDK 2.0. That way one has none of the problems described above. So I recommend that you follow the approach described in my old answer.
