So I'm trying to scrape JSON that is embedded in a website's source and use it on my own site.
Here's an example site:
view-source:http://www.viagogo.co.uk/Theatre-Tickets/Musicals/The-Lion-King/The-Lion-King-London-Tickets/E-1545516
If you look partway down, there is a var eventListings.
I would like to get all the data that is assigned to that var.
So far all I have is this:
url = "http://www.viagogo.co.uk/Theatre-Tickets/Musicals/The-Lion-King/The-Lion-King-London-Tickets/E-1545516"
doc = open(url).read
Any ideas how I can get this?
Thanks
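The code you have so far will (basically) function using open-uri from the Ruby standard library. As with any standard library module, require 'open-uri' at the top of the file in which you use it. On Ruby 3.0 and later, call URI.open(url) rather than the bare open(url), which no longer accepts URLs.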
OpenURI sees its job as giving you the contents of the resource. If you are comfortable using tools to search the raw text for the particular contents you are looking for, that may be enough. There are a few gems, though, that assume you are likely to get back HTML and provide special support for finding HTML elements and inspecting their contents. This post uses mechanize, which in turn is built on top of nokogiri. It is likely to be easier to write working code with this library, but when deciding whether to use it, be aware that installing nokogiri may be difficult in your staging or production environment.
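If searching the raw text is enough, a minimal sketch might look like the following. It assumes the page assigns a JSON array to the variable in the form var eventListings = [...]; and that the right-hand side is valid JSON. The sample HTML and its field names are made up for illustration; in real use, html would come from URI.open(url).read.

```ruby
require 'json'

# Hypothetical stand-in for the page source; in practice: html = URI.open(url).read
html = <<~HTML
  <script>
    var eventListings = [{"Id": 1, "Price": "45.00"}, {"Id": 2, "Price": "60.00"}];
    var other = 1;
  </script>
HTML

# Capture from the "[" after "var eventListings =" up to the first "]" that is
# followed by ";". Assumes no "];" sequence occurs inside a JSON string value.
def extract_event_listings(html)
  match = html.match(/var\s+eventListings\s*=\s*(\[.*?\])\s*;/m)
  return nil unless match

  JSON.parse(match[1])
end

listings = extract_event_listings(html)
listings.each { |listing| puts "#{listing['Id']}: #{listing['Price']}" }
```

If the embedded value turns out to be a JavaScript literal that is not strict JSON (unquoted keys, single quotes), JSON.parse will raise and a different approach would be needed.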
Related
I have spent a couple of hours searching the web but couldn't find any articles, walkthroughs, or comparisons touching on erb integration with webpacker. I found one question, but unfortunately the author hadn't read the docs attentively and the answer was right there, so there was no additional info.
I have seen plenty of articles about vue and react, but nobody says a word about erb. It's quite clear why you would use react, vue, or something similar; it is not clear with erb.
The topic is quite broad and I expect a little hate towards me, so I'll ask two related questions (but if you have more to say about it, that's appreciated).
As I understand it, it's vanilla (plain) JS (maybe with a flavour of jQuery) caring only about the DOM and styling, with all the preprocessing done by Rails. If so, why not just continue using sprockets?
And what are the reasons to choose it instead of some react/vue/else framework?
You may use both: a JS framework (React, Vue, ...) and some erb files. I find it useful to set up my constants and other configuration variables within a .js.erb file that is generated by my Rails app when building the JS app.
Things I like to put in these erb files:
schemas of my api, generated by my serializers
constants, like enum
values to be used in forms
To generalize, you can put anything owned by the backend that will not change at run time.
This saves you a couple of API calls to retrieve this data. However, I tend to stop doing this, as your JS app and Rails become tightly coupled and you can't use the sources of your JS app outside the Rails app.
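As a rough illustration of the idea, the sketch below renders an ERB template into JavaScript source using only the standard library. The constant names, the data, and the template contents are hypothetical; in a Rails app the values would typically come from models or serializers, and the template would live in a .js.erb file.

```ruby
require 'erb'
require 'json'

# Hypothetical backend-owned data that does not change at run time.
STATUSES = { draft: 0, published: 1 }.freeze
COUNTRY_OPTIONS = %w[FR UK US].freeze

# Stand-in for the contents of a .js.erb file.
TEMPLATE = <<~JS
  export const STATUSES = <%= statuses.to_json %>;
  export const COUNTRY_OPTIONS = <%= countries.to_json %>;
JS

# Render the template with the backend values exposed as local variables.
js_source = ERB.new(TEMPLATE).result_with_hash(
  statuses: STATUSES,
  countries: COUNTRY_OPTIONS
)

puts js_source
```

Serializing with to_json keeps the generated file valid JavaScript regardless of quoting in the values.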
I'm building a simple app with Rails using Markdown for storing content. My question is how to build internal [[wiki]] style links, either by pre-processing before they get to Markdown or with some Markdown derivative? I realize I could probably preprocess using regex, but I'm guessing there are others with ready-built solutions.
For example I know Instiki uses both markdown and [[wiki|Wiki]] links and I've looked but couldn't figure out how they're handling it.
Any tips?
If you are using the redcarpet gem, you can either use a preprocessor or modify the generated HTML output.
Have a look at How to extend Redcarpet to support a media library. This article shows how to convert image references to custom HTML and also how to replace boilerplate identifiers with the actual content.
I guess both approaches could be adapted for your specific problem:
The renderer approach directly manipulates the generated HTML code from the markdown code. (This is more elegant as you are not messing with Markdown code)
The preprocess approach manipulates the code by using regular expressions (as you already mentioned) (This is more flexible, but also a little bit messy)
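The preprocess approach can be sketched in plain Ruby as follows. This assumes the link syntax is [[Target]] or [[Target|Label]], that pages live under a /wiki/ path, and that slugs are lowercased with hyphens; all three are assumptions for illustration, not something Redcarpet or Instiki prescribes.

```ruby
# Matches [[Target]] and [[Target|Label]].
WIKI_LINK = /\[\[([^\]|]+)(?:\|([^\]]+))?\]\]/

# Rewrite wiki-style links into ordinary Markdown links before the text
# is handed to the Markdown renderer.
def preprocess_wiki_links(markdown)
  markdown.gsub(WIKI_LINK) do
    # Read both captures first, since the gsub calls below reset the last match.
    target = Regexp.last_match(1).strip
    label  = (Regexp.last_match(2) || target).strip
    slug   = target.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-|-\z/, '')
    "[#{label}](/wiki/#{slug})"
  end
end

text = "See [[Home Page]] and [[install guide|the guide]]."
puts preprocess_wiki_links(text)
```

Note that a plain regex pass like this also rewrites [[...]] inside code spans and code blocks, which is part of why the approach is described as a little messy.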
I'm testing it and Nokogiri does not seem to respect the robots.txt file. Is there some way to make it respect it? It seems like a common question, but I could not find any answer online.
Nokogiri parses the HTML or webpage that you give it. It does not know anything about the robots.txt file for the domain where the page you happen to have requested resides.
I presume that you want to ignore in-site links that are in robots.txt?
Since you've tagged this Rails, I'll assume you use Ruby. In that case you can use the Mechanize library which has the facility to use the robots.txt file.
There is also the original Perl version and other language ports if you prefer those.
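To illustrate the kind of check a robots-aware client performs before fetching a page, here is a rough standard-library sketch. It is not Mechanize's implementation and is deliberately simplified: it only honours Disallow lines in groups that apply to User-agent: *, uses plain prefix matching, and ignores Allow, wildcards, and Crawl-delay. The method names and the sample robots.txt are made up for illustration.

```ruby
# Collect the Disallow prefixes from groups that apply to "User-agent: *".
def disallowed_paths(robots_txt)
  applies = false
  prev_was_agent = false
  paths = []

  robots_txt.each_line do |line|
    # Drop comments, then split "Field: value".
    field, value = line.sub(/#.*/, '').split(':', 2).map(&:strip)
    next if value.nil? # blank line or line without a colon

    if field.casecmp?('user-agent')
      applies = false unless prev_was_agent # a new group starts here
      applies ||= (value == '*')
      prev_was_agent = true
    else
      paths << value if field.casecmp?('disallow') && applies && !value.empty?
      prev_was_agent = false
    end
  end

  paths
end

# True when no collected Disallow prefix matches the start of the path.
def allowed?(path, robots_txt)
  disallowed_paths(robots_txt).none? { |prefix| path.start_with?(prefix) }
end

sample = <<~ROBOTS
  User-agent: *
  Disallow: /private/  # keep crawlers out
  Disallow: /tmp/

  User-agent: BadBot
  Disallow: /
ROBOTS

puts allowed?('/index.html', sample)
puts allowed?('/private/report.html', sample)
```

A scraper would run a check like this on each URL before passing the fetched page to Nokogiri; a library that already implements the full rules is the safer choice for real crawling.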
In a Rails app, given an external URL, I need to make a local copy of web pages not created by my app, much like "save as" from a browser. I looked into system("wget -r -l 1 http://google.com"). It might work, but it copies too much for the pages I tried (about 10x too much). I need to follow the references to the resources that make the page display properly, but I don't want to follow all the a hrefs to other pages. Is there any package out there?
This wget command usually works for me, so maybe it will for you:
wget -nd -pHEKk "http://www.google.com/"
But once you get all the files, you'll have to parse them for references to the base url and replace that with ./, which shouldn't be too hard (I don't do Ruby, so I'm not helping with that).
You could also use something like Nokogiri or Hpricot. An example with Nokogiri:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open('http://www.google.com/'))
This will give you an actual Ruby object that can be 'queried' using the associated methods.
I was just wondering if anyone knew of any good libraries for parsing .doc files (and similar formats, like .odt) to extract text, yet also keep formatting information where possible for display on a website.
Capability of doing similarly for PDFs would be a bonus, but I'm not looking as much for that.
This is for a Rails project, if that helps at all.
Thanks in advance!
Apache's POI is a very popular way to access Word and Excel documents. There's a Ruby POI binding that might be worth investigating, but it looks like you'll have to build it yourself. And the API doesn't seem very Ruby-like since it's virtually a direct port from the Java code. And it seems to only have been tested against Ruby 1.8.2.