Download, Unzip, and Parse a file in Ruby/Rails

I'm working with the very frustrating Bing Ads API (SOAP), and while I've successfully executed the majority of the SOAP requests I need, the last one is giving me trouble.
The team there tells me that to get an ad campaign's stats (clicks, impressions, conversions, etc.) I need to request that a report be generated (passing it the parameters), then take the reportID from the response and "poll" the report with another SOAP request, which yields a download URL for a zip file.
I've successfully done all of the above, and the download URL (which is only good for 5 minutes) looks like this:
https://download.api.bingads.microsoft.com/ReportDownload/Download.aspx?q=k471B%2fhtf62jwhaelHhu0EqMSfWCvWSpOOBRu76%2bUC%2bgATLEobf%2bMYiVKX0CBOr52d95ViPXJeKbvAbnb%2bSK%2bGumYlSYQT80kTtt5waa5z%2fmbeXT%2fPFqde95DFR1%2b4yQgekl5T6gKipbMFcQJOn5aGYmtI1ALcREIwJRA%2bi%2b3jOE55Cl69TAzBOUWvB73NAKX6S0Y7zF%2bERnSu7TJnJfmqHopWihGtkeMzoqqwsJVgVDEKz84RrPPaDOs2pxg3qE%2bLSrEwu2cpa7bP%2f9t%2fjUVtIgiZMbMjzSf73VnAUSpYNz
When I go to that URL, it starts to download a zip file that, once unzipped, does contain the XML that I need to parse to report to users of the web app I'm creating.
My question is: what is the best way to get at that XML consistently within the app? This really seems like an arduous approach for the app to take, considering all of the above would have to take place every time a user loads the Bing page or changes the date range, but they tell me it's the only way to do it.
The path I've been heading down is to get the report with HTTParty and then unzip it with RubyZip (so far unsuccessfully, because of undefined conversion errors), but I'm unsure what to do from there. Store it in a database (maybe temporarily)?
Any help would be greatly appreciated.

If there is no better way to use the API, cache the results (in your db or on the filesystem) and refresh the data using a rake task that will run periodically. If you do this, consider adding an option for the user to request an immediate refresh.
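A minimal sketch of that caching idea, assuming a file-based cache. ReportCache, the cache path, and the one-hour maximum age are hypothetical names and values, not part of the Bing Ads API; the block passed to fetch stands in for the whole request/poll/download/parse sequence.

```ruby
require "json"
require "fileutils"

# Hypothetical file-backed cache for parsed report data.
# The data should be JSON-compatible (hashes with string keys, arrays, numbers).
class ReportCache
  def initialize(path, max_age: 3600)
    @path = path
    @max_age = max_age # seconds before the cached data counts as stale
  end

  # Returns the cached data while it is fresh. Otherwise runs the block
  # (the slow report download), stores its result, and returns it.
  # Pass force: true for a user-requested immediate refresh.
  def fetch(force: false)
    return JSON.parse(File.read(@path)) if !force && fresh?

    data = yield
    FileUtils.mkdir_p(File.dirname(@path))
    File.write(@path, JSON.generate(data))
    data
  end

  def fresh?
    File.exist?(@path) && (Time.now - File.mtime(@path)) < @max_age
  end
end
```

A periodic rake task could call fetch(force: true) to refresh the data, while page loads call fetch without arguments and only hit the API when the cache is stale.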

Use a background task to download the zip file and then process it. Something like delayed_job or Resque could be used to start the job with the URL.
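A rough sketch of the download step of such a job, using only the standard library. One common cause of Encoding::UndefinedConversionError when saving a downloaded archive is writing the binary body through a text-mode file handle; whether that is what happened in the question is an assumption. ReportDownloadJob, download_report, and save_report_zip are hypothetical names. delayed_job can enqueue any object that responds to perform.

```ruby
require "net/http"
require "uri"

# Downloads the report archive and returns the response body.
def download_report(url)
  response = Net::HTTP.get_response(URI(url))
  response.value # raises unless the status is 2xx
  response.body
end

# Writes the archive to disk byte for byte. File.binwrite opens the file in
# binary mode, so no encoding conversion is attempted on the zip data.
def save_report_zip(bytes, path)
  File.binwrite(path, bytes)
  path
end

# Hypothetical job object: downloads the archive and stores it at path.
ReportDownloadJob = Struct.new(:url, :path) do
  def perform
    save_report_zip(download_report(url), path)
  end
end
```

Since the download URL is only good for 5 minutes, the job needs to run soon after the URL is obtained. Once the file is on disk, it can be opened with RubyZip and the XML handed to a parser.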

Related

Check if resource at NSURL exists BEFORE downloading

I want to pull down new images from a website that is updated regularly. However, there is no API for that site, so I'm trying to fetch them by URL, as the images are numbered. The way I've decided to do this is to have the app query the website on first launch and loop through the URLs until I get a 404. Then I store that URL and on the next launch loop through until I get another 404, etc. Now I know that is less than ideal, but I'm just building a prototype. Anyway…
Since these are images we're talking about, I can't just download them all on first launch. It would be a waste of bandwidth and could take minutes or even hours… So I just need a way, using NSURLSession or whatever, to get the HTTP status code for any given image without actually downloading it.

You can do this by requesting only the HTTP headers: a HEAD request returns the status code and headers for a URL without the body.
This answer will help, if you're comfortable porting Objective-C. I can help you with it if required.
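The idea is not specific to iOS, so here is a small sketch in Ruby rather than Objective-C (with NSURLSession the equivalent is setting the request's HTTP method to HEAD). resource_status, last_available_index, and the URL template are hypothetical; the checker argument exists only so the loop can be exercised without a network.

```ruby
require "net/http"
require "uri"

# Sends a HEAD request and returns the numeric status code.
# The server answers with headers only, so no image data is transferred.
def resource_status(url)
  uri = URI(url)
  request = Net::HTTP::Head.new(uri)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  response.code.to_i
end

# Walks numbered URLs starting at first and returns the last index that
# answered 200. Returns first - 1 when even the first URL is missing.
def last_available_index(first, url_template, checker = method(:resource_status))
  index = first
  index += 1 while checker.call(format(url_template, index)) == 200
  index - 1
end
```

Storing the returned index lets the next launch resume from there instead of starting over.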

Get Info about several files in 1 request

I have a bunch of files downloaded for offline viewing from Box together with their IDs/modifiedAt.
I want to check them all for updated modifiedAt.
I am currently using files/FILE_ID (http://developers.box.com/docs/#files-get), but then I need to send a request for each file.
Is there any way to get file info in bulk, many files at once?
(I am using the iOS SDK)
Thank you

The API currently does not support bulk operations, unfortunately, though it's something we're considering.

Update callback when doc resource changes?

I'm using Google Docs API. How can I know when a file resource changes?
I don't want to repeatedly poll data.

Firstly, you should be using the Google Drive API, which has replaced the Google Documents List API.
Right now, there is no notification system, but it is a feature we are working on. Polling the changes feed every few minutes is not too bad, but it is really not good in some situations.

You can't get a callback, I'm afraid. To avoid polling the file data, however, you could poll the metadata only and test the md5 checksum field to determine when the file has changed.
The md5 checksum field is located in docs$md5Checksum (JSON) or docs:md5Checksum (XML) in the document list entry for the file.
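A minimal sketch of the comparison step, assuming the checksum has already been read from the metadata response. ChecksumTracker is a hypothetical helper that keeps the last checksum in memory; a real app would persist it.

```ruby
# Remembers the last md5 checksum seen for each file ID.
class ChecksumTracker
  def initialize
    @seen = {}
  end

  # Records the checksum and returns true when it differs from the one
  # recorded on the previous poll (a file seen for the first time counts
  # as changed).
  def changed?(file_id, md5)
    previous = @seen[file_id]
    @seen[file_id] = md5
    previous != md5
  end
end
```

Only when changed? returns true does the file content itself need to be downloaded.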

Generating a QR code in rails

I want to generate QR codes in Ruby on Rails, running in the background of my Rails website. I saw http://code.google.com/p/qrcode-rails/ but cannot work out how to get it to work for me. Basically, in RoR I want to:
Pass a generator a string, my unique code, which is a 20-digit number (e.g. 32032928889998887776), and have an image generated with the name 'code'_qr.jpg and saved in a resource folder, to be attached to an email that my program will send out.
Does anyone know how I would do this?
And while I'm asking (it's not so important that I get this answer now): how would I implement QR code reading from a webcam, to get that code back? Thanks.

If you just need to write the data from the URL to a file, you can open up a stream, read from the file, and simply write the data to disk -- just remember to use the same extension (.jpg in this instance.)
Note that you could also simply send the link in the email (or post it as an inline image in the email.) If you really, really want to write it to disk and send it as attachment in your production system, the first-class solution for Ruby image processing is ruby-vips or ImageMagick.
Finally, since it's a disk operation, you're going to want to do it outside the normal web request cycle -- you're probably best off farming the operation out with delayed_job, or at the very least triggering the process with an AJAX request. Both of these give you the advantage that you can present a progress bar for the operation.
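A small sketch of the "write the data from the URL to a file" step. save_qr_image and the directory are hypothetical, and the function takes an already-open stream so it does not depend on where the image comes from.

```ruby
require "fileutils"

# Copies an open stream to "<code>_qr.jpg" inside dir and returns the path.
# The file is opened in binary mode so the image bytes are written unchanged.
def save_qr_image(source, code, dir)
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "#{code}_qr.jpg")
  File.open(path, "wb") { |file| IO.copy_stream(source, file) }
  path
end

# Possible usage with open-uri (image_url is a placeholder):
#   require "open-uri"
#   URI.open(image_url) { |io| save_qr_image(io, "32032928889998887776", "tmp/qr") }
```

The returned path can then be handed to the mailer as the attachment.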

Ruby Rss parser and event trigger

I'm using the RSS library so I can parse Atom and RSS in Ruby and Rails and store the results in a model.
I've looked at the standard RSS library, but is there a library that will auto-detect that there is a new RSS feed so I can update my database?
What are the best practices for triggering an instruction that stores the new RSS feed?
Should I use threads to handle that problem? Is it going to be slow?
Thank you for your help.

OK, here's the deal.
If you want a really fast feed parser, go for Feedzirra. It does not work on Windows. http://github.com/pauldix/feedzirra
Autodiscovery?
- There's truffle-hog if you don't want to do GET redirects. http://github.com/pauldix/truffle-hog
- There's feedbag if you want to do GET redirects to find feeds from given URLs. This is slower, though. http://github.com/damog/feedbag
Feedzirra is the best bet if you want to poll for new entries for your feed. But if you want a non-polling solution, I would suggest going through the PubSubHubbub spec. While parsing your feeds, check whether they are PubSubHubbub enabled by looking for the link tag. If it points to pubsubhubbub.appspot.com or any other PubSubHubbub-enabled hub, subscribe to the feed by sending a subscription request to the hub. You can then define an endpoint in your app which will receive updated-entry pings for your feed subscription from the hub. Just read the raw POST data and store it in your database. Stats are that 95% of the Blogger blogs are PubSubHubbub enabled. That is a lot of data in your hands already. :)
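A rough sketch of the hub check and subscription request described above, using REXML and Net::HTTP. hub_url, subscription_params, and subscribe are hypothetical helper names. The three form fields are the core ones from the PubSubHubbub spec; older hub versions also expect a hub.verify field, so check the version your hub implements.

```ruby
require "rexml/document"
require "net/http"
require "uri"

# Returns the href of the first <link rel="hub"> element in the feed,
# or nil when the feed does not advertise a hub.
def hub_url(feed_xml)
  doc = REXML::Document.new(feed_xml)
  doc.each_recursive do |element|
    if element.name == "link" && element.attributes["rel"] == "hub"
      return element.attributes["href"]
    end
  end
  nil
end

# Form fields for a subscription request.
def subscription_params(topic_url, callback_url)
  {
    "hub.mode" => "subscribe",
    "hub.topic" => topic_url,      # the feed to watch
    "hub.callback" => callback_url # the endpoint in your app
  }
end

# POSTs the subscription request to the hub as a form.
def subscribe(hub, topic_url, callback_url)
  Net::HTTP.post_form(URI(hub), subscription_params(topic_url, callback_url))
end
```

The hub will call the callback URL to verify the subscription before it starts sending updates, so that endpoint has to exist before subscribe is called.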
If you are polling for changes, check the Last-Modified or ETag response header rather than parsing the entire feed again. This saves you from wasting resources. Feedzirra takes care of this for you.
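For anyone not using Feedzirra, a sketch of the same check with plain Net::HTTP: send the stored validators back as If-None-Match and If-Modified-Since, and treat a 304 response as "nothing new". conditional_request and fetch_if_changed are hypothetical names.

```ruby
require "net/http"
require "uri"

# Builds a GET request carrying the validators saved from the last fetch.
def conditional_request(uri, etag: nil, last_modified: nil)
  request = Net::HTTP::Get.new(uri)
  request["If-None-Match"] = etag if etag
  request["If-Modified-Since"] = last_modified if last_modified
  request
end

# Returns nil when the server answers 304 Not Modified. Otherwise returns
# the body together with the new validators to store for the next poll.
def fetch_if_changed(url, etag: nil, last_modified: nil)
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(conditional_request(uri, etag: etag, last_modified: last_modified))
  end
  return nil if response.is_a?(Net::HTTPNotModified)

  { body: response.body, etag: response["ETag"], last_modified: response["Last-Modified"] }
end
```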

I am not sure what you mean by "auto-detect" a new feed?
Are you looking for code that can discover when someone creates a new feed on a site? Or, do you mean discover when an existing feed has a new article?
The first is tough because your code needs to know what site to look at, so it needs some sort of auto-discovery of sites with new feeds. Searching Google for "new rss feeds" doesn't return anything that looks useful, at least not on the first page. If you, or your users, know of a new site, then you can have an interface to add new sites to search. Then you grab the page at that URL, look for the RSS/Atom auto-discovery links, and go from there. Auto-discovery links can open a can of worms because of duplicate content being served using different protocols (RDF, RSS and Atom), so you have to determine which to use, or because multiple feeds with alternate content are listed.
If you mean you want to discover when an existing feed has new articles, then you have to keep track of the last time your code looked at the feed, and the last article that was seen, then retrieve the feed and see if any articles were not in your list of previously seen articles. Your code needs to be sensitive to the time-to-live information in a lot of feeds too. Hitting the feed every fifteen minutes when they update once a week is bad form. Most aggregation code can do those things already but you might need to configure a database and tell the code how to find it.
Generally, for this sort of task I set up a crontab entry on a production Linux or Unix system and fire off the job periodically, looking in the database for feeds whose last-run-time plus the stored time-to-live value is in the past.
Does that help any?
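The "last-run-time plus time-to-live is in the past" check can be sketched like this. Feed and due_feeds are hypothetical names, with a plain Struct standing in for the database rows; the TTL is stored in minutes, as in the RSS ttl element.

```ruby
# Stand-in for a database row describing one feed.
Feed = Struct.new(:url, :last_run_at, :ttl_minutes)

# Returns the feeds that are due for another fetch: those never fetched,
# and those whose last run plus time-to-live is not in the future.
def due_feeds(feeds, now = Time.now)
  feeds.select do |feed|
    feed.last_run_at.nil? || feed.last_run_at + feed.ttl_minutes * 60 <= now
  end
end
```

The cron job would call due_feeds, fetch each result, and update last_run_at afterwards.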

A very easy solution is to use dynamic attribute-based finders.
When you are filling your model with RSS feed data, instead of Model.create(...) use Model.find_or_create_by_column(value, :other_column => other_value). (From Rails 4 on, dynamic finders of this form are deprecated in favor of Model.find_or_create_by(column: value).)
You can use a date or the RSS message title as the unique value (whatever you want).
I think this is pretty easy. You can set up a cron task to fill your model once per hour, for example. Only new items will be added.
There is no way to get an "event" when an RSS feed is updated without downloading the whole feed again.
