Mechanize: How to follow or "click" meta refreshes in Rails

I'm having a bit of trouble with Mechanize.
When I submit a form with Mechanize, I end up on a page that contains only a meta refresh and no links.
My question is: how do I follow the meta refresh?
I have tried allowing meta refreshes, but then I get a socket error.
Sample code:
require 'mechanize'
agent = Mechanize.new # older Mechanize releases used WWW::Mechanize.new
agent.get("http://euroads.dk")
form = agent.page.forms.first
form.username = "username"
form.password = "password"
form.submit
page = agent.get("http://www.euroads.dk/system/index.php?showpage=login")
agent.page.body
The response:
<html>
<head>
<META HTTP-EQUIV="Refresh" CONTENT="0;URL=index.php?showpage=m_frontpage">
</head>
</html>
Then I try:
redirect_url = page.parser.at('META[HTTP-EQUIV=\"Refresh\"]')["0;URL=index.php?showpage=m_frontpage\"][/url=(.+)/, 1]
But I get:
NoMethodError: Undefined method '[]' for nil:NilClass

Internally, Mechanize uses Nokogiri to parse the HTML into a DOM. You can get at the Nokogiri document (page.parser) and use either XPath or CSS selectors to dig around in a returned page.
This is how to get the redirect URL with Nokogiri only:
require 'nokogiri'
html = <<EOT
<html>
<head>
<meta http-equiv="refresh" content="2;url=http://www.example.com/">
</head>
<body>
foo
</body>
</html>
EOT
doc = Nokogiri::HTML(html)
redirect_url = doc.at('meta[http-equiv="refresh"]')['content'][/url=(.+)/, 1]
redirect_url # => "http://www.example.com/"
doc.at('meta[http-equiv="refresh"]')['content'][/url=(.+)/, 1] breaks down to: find the first occurrence (at) of the CSS selector for a <meta> tag whose http-equiv attribute is refresh, take that tag's content attribute, and return the string following url=.
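The page in the question writes CONTENT="0;URL=...", with URL in upper case, and /url=(.+)/ is case-sensitive, so it would return nil there even once the tag is found. A quick check in plain Ruby (no Nokogiri needed), adding the i flag to make the match case-insensitive:

```ruby
# The content attribute exactly as it appears on the page in the question.
content = '0;URL=index.php?showpage=m_frontpage'

p content[/url=(.+)/, 1]  # => nil, because the pattern is case-sensitive
p content[/url=(.+)/i, 1] # => "index.php?showpage=m_frontpage"
```

The selector's attribute value is probably compared case-sensitively as well (Refresh in the question versus refresh here), so it is worth matching the casing the page actually uses.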
This is some Mechanize code for a typical use. Because you gave no sample code to base mine on, you'll have to work from this:
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://www.examples.com/')
redirect_url = page.parser.at('meta[http-equiv="refresh"]')['content'][/url=(.+)/, 1]
page = agent.get(redirect_url)
EDIT: at('META[HTTP-EQUIV=\"Refresh\"]')
Your code has the above at(). Notice that you are escaping the double-quotes inside a single-quoted string. That results in a backslash followed by a double-quote in the string which is NOT what my sample uses, and is my first guess for why you're getting the error you are. Nokogiri can't find the tag because there is no <meta http-equiv=\"Refresh\"...>.
EDIT: Mechanize has a built-in way to handle meta-refresh, by setting:
agent.follow_meta_refresh = true
It also has a method that parses a meta tag's content attribute and returns the delay and URL. From the docs:
parse(content, uri)
Parses the delay and url from the content attribute of a meta tag. Parse requires the uri of the current page to infer a url when no url is specified. If a block is given, the parsed delay and url will be passed to it for further processing.
Returns nil if the delay and url cannot be parsed.
# <meta http-equiv="refresh" content="5;url=http://example.com/" />
uri = URI.parse('http://current.com/')
Meta.parse("5;url=http://example.com/", uri) # => ['5', 'http://example.com/']
Meta.parse("5;url=", uri) # => ['5', 'http://current.com/']
Meta.parse("5", uri) # => ['5', 'http://current.com/']
Meta.parse("invalid content", uri) # => nil
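If you want the same behaviour without Mechanize, here is a standard-library-only sketch that mirrors the documented examples above. parse_meta_refresh is a hypothetical name, not part of Mechanize. Like the docs, it returns the delay as a string, falls back to the current URI when no url is given, and returns nil for unparseable content; it also resolves relative URLs with URI.join, which matters for the relative index.php?showpage=m_frontpage target in the question.

```ruby
require 'uri'

# Hypothetical stand-in for the documented Meta.parse behaviour.
# Returns [delay, url] or nil; an absent or empty url falls back to `uri`,
# and relative urls are resolved against `uri`. Matching is case-insensitive.
def parse_meta_refresh(content, uri)
  match = content.match(/\A\s*(\d+)\s*(?:;\s*url\s*=\s*['"]?([^'"\s]*)['"]?)?\s*\z/i)
  return nil unless match

  delay = match[1]
  target = match[2].to_s.empty? ? uri : URI.join(uri.to_s, match[2])
  [delay, target.to_s]
end

uri = URI.parse('http://current.com/')
p parse_meta_refresh('5;url=http://example.com/', uri) # => ["5", "http://example.com/"]
p parse_meta_refresh('5;url=', uri)                    # => ["5", "http://current.com/"]
p parse_meta_refresh('5', uri)                         # => ["5", "http://current.com/"]
p parse_meta_refresh('invalid content', uri)           # => nil
p parse_meta_refresh('0;URL=index.php?showpage=m_frontpage',
                     URI.parse('http://www.euroads.dk/system/index.php?showpage=login'))
# => ["0", "http://www.euroads.dk/system/index.php?showpage=m_frontpage"]
```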

Mechanize treats meta refresh elements just like links without text. Thus, your code can be as simple as this:
page = agent.get("http://www.euroads.dk/system/index.php?showpage=login")
page.meta_refresh.first.click
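Whether you set follow_meta_refresh or call meta_refresh.first.click yourself, following a meta refresh boils down to: fetch a page, look for the tag, resolve its target against the current URL, and fetch again, with a hop limit so a page that refreshes to itself cannot loop forever. Below is a conceptual, standard-library-only sketch of that loop. It is not Mechanize's implementation: follow_meta_refreshes and the in-memory pages hash are hypothetical, and the regex is a simplification (it assumes http-equiv comes before content), so a real HTML parser such as Nokogiri is more robust.

```ruby
require 'uri'

# Simplified pattern: captures the url= target of a meta refresh tag.
# Assumes http-equiv appears before content inside the tag.
META_REFRESH = /<meta[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*content\s*=\s*["']\s*\d+\s*;\s*url\s*=\s*([^"'\s>]+)/i

# `fetch` is any callable that takes a URL string and returns the body string.
def follow_meta_refreshes(start_url, fetch, max_hops: 5)
  url = start_url
  body = fetch.call(url)
  max_hops.times do
    match = body.match(META_REFRESH)
    break unless match

    url = URI.join(url, match[1]).to_s # resolve relative targets against the current page
    body = fetch.call(url)
  end
  [url, body]
end

# Hypothetical in-memory pages standing in for real HTTP responses.
pages = {
  'http://www.euroads.dk/system/index.php?showpage=login' =>
    '<html><head><META HTTP-EQUIV="Refresh" CONTENT="0;URL=index.php?showpage=m_frontpage"></head></html>',
  'http://www.euroads.dk/system/index.php?showpage=m_frontpage' =>
    '<html><body>front page</body></html>'
}
final_url, body = follow_meta_refreshes('http://www.euroads.dk/system/index.php?showpage=login',
                                        pages.method(:fetch))
puts final_url # => http://www.euroads.dk/system/index.php?showpage=m_frontpage
```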

Related

How to get content value out of meta tag in ruby on rails?

I have this list of meta tags in my view's HTML (after the page loads). The tags are generated dynamically:
@meta = "\n <meta content=\content1\">\n <meta content=\content2\">\n <meta content='content2\">\n ....... <meta content=\"2019/01/10 09:59:59 +0900\" name=\"r_end\">\n \n"
I want to fetch the value 2019/01/10 09:59:59 +0900 from the content attribute, i.e. from <meta content=\"2019/01/10 09:59:59 +0900\" name=\"r_end\">. Is there a way to get the content value from the meta tag?
Given an @meta variable containing some HTML snippet as a string:
@meta = <<-HTML
<meta name="foo" content="content1">
<meta name="bar" content="content2">
<meta content="2019/01/10 09:59:59 +0900" name="r_end">
HTML
You can use Nokogiri to parse it:
require 'nokogiri'
doc = Nokogiri::HTML::DocumentFragment.parse(@meta)
doc.at_css('meta[name="r_end"]')['content']
#=> "2019/01/10 09:59:59 +0900"
at_css returns the first element matching the given CSS selector and [] returns the value for the given attribute.
How about using a simple regular expression with String#scan to capture the value?
This works only if the meta tag's name doesn't change and its content attribute comes immediately before name="r_end".
@meta = "\n <meta content=\content1\">\n <meta content=\content2\">\n <meta content='content2\">\n ....... <meta content=\"2019/01/10 09:59:59 +0900\" name=\"r_end\">\n \n"
@meta.scan(/content=\"(.*)\" name=\"r_end\"/)
#=> [["2019/01/10 09:59:59 +0900"]]
Explanation:
The code above captures the content value of the meta tag with name="r_end".
If other HTML elements might also have name="r_end", you may need to add another identifier to the regex.
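If you want to stay with plain Ruby but avoid depending on attribute order, one option is to gather each meta tag's attributes into a Hash first. This is a sketch rather than a full HTML parser: meta_attributes is a hypothetical helper, it assumes quoted attribute values, and Array#to_h with a block needs Ruby 2.6 or later. For arbitrary HTML, the Nokogiri approach above is more robust.

```ruby
# Hypothetical helper: returns one Hash of attributes per <meta ...> tag,
# so lookups no longer depend on the order in which attributes appear.
# Assumes attribute values are quoted with " or '.
def meta_attributes(html)
  html.scan(/<meta\b([^>]*)>/i).flatten.map do |attrs|
    attrs.scan(/([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/).to_h do |name, dq, sq|
      [name.downcase, dq || sq]
    end
  end
end

html = <<~HTML
  <meta name="foo" content="content1">
  <meta name="bar" content="content2">
  <meta content="2019/01/10 09:59:59 +0900" name="r_end">
HTML

r_end = meta_attributes(html).find { |attrs| attrs['name'] == 'r_end' }
puts r_end['content'] # => 2019/01/10 09:59:59 +0900
```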

Rails not rendering public/index.html file; blank page in browser

I have a problem with my Rails + React app when I deploy it to Heroku. The React client is inside a client/ directory of the Rails app. Due to using react-router, the Rails server needs to know to render the index.html from the React build. When I deploy the client on Heroku, a script copies the content from client/build/. to the Rails app's public/ dir.
Now here is the problem: when my route detects a path like example.com/about it tries to render public/index.html. Here is the method:
def fallback_index_html
render file: "public/index.html"
end
However, the contents of this file are not sent to the browser; I get a blank page. I added puts "hit fallback_index_html" to the method and confirmed that it is being hit. I also opened the file and printed each line with puts to confirm that it contains the expected HTML (this is what appeared in the logs and what SHOULD be sent to the browser):
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width,initial-scale=1,shrink-to-fit=no">
<meta name="theme-color" content="#000000">
<link rel="manifest" href="/manifest.json">
<link rel="shortcut icon" href="/favicon.ico">
<title>Simple Bubble</title>
<link href="/static/css/main.65027555.css" rel="stylesheet">
</head>
<body><noscript>You need to enable JavaScript to run this app.</noscript>
<div id="root"></div>
<script type="text/javascript" src="/static/js/main.21a8553c.js"></script>
</body>
</html>
The most recent fix I tried was going into config/environments/production.rb and changing config.public_file_server.enabled to true. This did not help.
I'm using Rails API, so my ApplicationController inherits from ActionController::API instead of ActionController::Base.
The Rails API docs say:
The default API Controller stack includes all renderers, which means you can use render :json and brothers freely in your controllers. Keep in mind that templates are not going to be rendered, so you need to ensure your controller is calling either render or redirect_to in all actions, otherwise it will return 204 No Content.
Thus a controller inheriting from ActionController::API cannot render HTML templates out of the box. The following let me render the HTML without including everything from ActionController::Base.
class ApplicationController < ActionController::API
  include ActionController::MimeResponds
  def fallback_index_html
    respond_to do |format|
      format.html { render body: Rails.root.join('public/index.html').read }
    end
  end
end
The reason I am including ActionController::MimeResponds is to have access to the respond_to method.
My Rails application now renders index.html from my public directory when a subdirectory is hit and my React client / react-router takes over from there.

Why is Rails using the default debug_exception_response_format in test mode?

Given that I've set the following in config/environments/test.rb (I know I don't need to, but I just want to be certain):
config.debug_exception_response_format = :api
Why are exceptions triggered by my Cucumber features coming back as HTML?
When the admin attempts to create a new vendor # features/step_definitions/vendor_steps.rb:9
743: unexpected token at '<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Action Controller: Exception caught</title>
<style>
body {
background-color: #FAFAFA;
color: #333;
margin: 0px;
}
Shouldn't I be getting JSON back in this case or am I misunderstanding something?
EDIT:
Per some of the comments below, I've verified that the Content-Type is application/json and that config.debug_exception_response_format is configured correctly in development and staging environments. Unfortunately, I'm still seeing this issue.
From within the affected controller:
(byebug) request.headers["Content-Type"]
"application/json"
Setting
config.debug_exception_response_format = :api
means, according to the Rails docs:
"To render debugging information preserving the response format, use the value :api."
Given that this is a feature spec for an admin, the request is very likely an HTML request, so the response will also be HTML.
Edit
Even if your request format is explicitly set to JSON, you still need to explicitly define the response format as JSON. Without an explicit response format, defined with something like the following, the response would still be HTML regardless of the request type.
respond_to do |format|
format.html ...
format.json ...
end
or
render json: {hello: 'world'}
You need to set this in config/environments/development.rb as well:
config.debug_exception_response_format = :api
Even if your application runs in API mode, if it was created as a default Rails application, exceptions in development will still be rendered in the default HTML format.
EDIT:
Can you try adding the following test helpers?
# Rails 5+ integration tests expect request parameters via the params: keyword
def api_get(action, params = {}, version = "1")
  get "/api/v#{version}/#{action}", params: params
  JSON.parse(response.body) rescue {}
end
def api_post(action, params = {}, version = "1")
  post "/api/v#{version}/#{action}", params: params
  JSON.parse(response.body) rescue {}
end
def api_delete(action, params = {}, version = "1")
  delete "/api/v#{version}/#{action}", params: params
  JSON.parse(response.body) rescue {}
end
def api_put(action, params = {}, version = "1")
  put "/api/v#{version}/#{action}", params: params
  JSON.parse(response.body) rescue {}
end
Did you take the byebug breakpoint out to ensure that it doesn't have anything to do with that gem?

How to know sent URL with Grails REST Client Builder Plugin RestBuilder.get()?

I'm developing a Grails plugin that simplifies using the Amazon PAAPI (Product Advertising API) in Grails apps.
The goal of this plugin is to provide a convenient TagLib for performing Amazon PAAPI operations, as shown below.
<paapi:img idType="ISBN" itemId="4048668161" relationshipType="AuthorityTitle" size="medium" alt="alttext" />
This renders as:
<img src="http://mediumimageurl.jpg" alt="alttext" />
The TagLib needs a way to connect to Amazon PAAPI; I chose the Grails REST Client Builder plugin for that.
I wrote the following service method:
def itemLookup(
String condition,
String idType,
String itemId,
String merchantId,
String offerPage,
String relatedItemsPage,
String relationshipType,
String reviewPage,
String reviewSort,
String searchIndex,
String tagPage,
String tagsPerPage,
String tagSort,
String variationPage,
ResponseGroup responseGroup) {
def associateId = grailsApplication.config.grails.plugin.foo.amazonpaapi.associateId
def paapiAccessKey = grailsApplication.config.grails.plugin.foo.amazonpaapi.paapiAccessKey
def paapiSecretAccessKey = grailsApplication.config.grails.plugin.foo.amazonpaapi.paapiSecretAccessKey
def rest = new RestBuilder()
def resp = rest.get(
"http://ecs.amazonaws.com/onca/xml",
[
Service:'AWSECommerceService',
AWSAccessKeyId:paapiAccessKey,
AssociateTag:associateId,
Operation:'ItemLookup',
Condition:condition,
IdType:idType,
ItemId:itemId,
MerchantId:merchantId,
OfferPage:offerPage,
RelatedItemsPage:relatedItemsPage,
RelationshipType:relationshipType,
ReviewPage:reviewPage,
ReviewSort:reviewSort,
SearchIndex:searchIndex,
TagPage:tagPage,
TagsPerPage:tagsPerPage,
TagSort:tagSort,
VariationPage:variationPage,
ResponseGroup:responseGroup.getLabel()
]) {
accept "application/xml"
}
return resp.xml
}
The call fails.
I got the following result in resp.text:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> <html> <head> <title>400 Bad Request</title> </head> <body> <div align=center> <img src="http://g-images.amazon.com/images/G/01/icons/amazon-logo.gif" width=140 height=30 alt="Amazon.com" border=0><br> </div> <h1>Bad Request</h1> <p>Parameter Operation is missing</p> </body> </html>
The problem is that I don't know why this happens.
I want to know which URL RestBuilder.get() sends to Amazon PAAPI, but I couldn't find a way to see it.
Is there any way to find this out?
Self-solved.
In the end, I could not find a way to see the REST request URL.
However, the error itself is solved. I found the error message in Amazon's response:
Parameter Operation is missing
I resolved the error like this:
def resp = rest.get('http://ecs.amazonaws.com/onca/xml?Operation={Operation}') {
    urlVariables Operation: 'ItemLookup'
}
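The error seems to have been returned because the essential Operation parameter was missing. Most likely, the Map passed as the second argument to RestBuilder.get() is treated as URL template variables that fill {placeholders} in the URL, not as query parameters, so with no placeholders in the original URL none of those values were sent.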
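To make the difference between URI template variables and query parameters concrete, here is a small sketch in Ruby (the language of the other examples on this page). It does not call RestBuilder; expand_template is a hypothetical helper that only mimics {name}-style template expansion, in which variables without a matching placeholder are dropped.

```ruby
require 'uri'

# Hypothetical helper mimicking URI-template expansion: only {name}
# placeholders present in the template are filled in; any other
# variables are silently ignored.
def expand_template(template, vars)
  template.gsub(/\{(\w+)\}/) do
    URI.encode_www_form_component(vars.fetch(Regexp.last_match(1).to_sym).to_s)
  end
end

base = 'http://ecs.amazonaws.com/onca/xml'
vars = { Service: 'AWSECommerceService', Operation: 'ItemLookup', ItemId: '4048668161' }

# No placeholders, so nothing is added to the URL.
puts expand_template(base, vars)
# => http://ecs.amazonaws.com/onca/xml

# With a placeholder, only that variable reaches the URL.
puts expand_template("#{base}?Operation={Operation}", vars)
# => http://ecs.amazonaws.com/onca/xml?Operation=ItemLookup

# Sending every variable requires encoding them as a query string.
puts "#{base}?#{URI.encode_www_form(vars)}"
# => http://ecs.amazonaws.com/onca/xml?Service=AWSECommerceService&Operation=ItemLookup&ItemId=4048668161
```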

Nokogiri parsing for metawords

I know this question has been asked before, but I am not able to get a parsed result. I am trying to parse meta keywords using Nokogiri; can anyone point out my mistake?
keyword = []
meta_data = doc.xpath('//meta[@name="Keywords"]/@content') # parsing for keywords
meta_data.each do |meta|
keyword << meta.value
end
key_str=keyword.join(",")
I tried running this in irb as well, but keyword returns nil.
This is how I used it in irb:
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::HTML("www.google.com")
I have already tried alternatives from other Stack Overflow posts, such as "Nokogiri html parsing question", but to no avail; they still return nil. I guess I am doing something wrong somewhere.
www.google.com does not have any meta keywords in the source. View Source on the page to see for yourself. So even if everything else went perfectly, you'd still get no results there.
The result of doc = Nokogiri::HTML("www.google.com") is
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>www.google.com</p></body></html>
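If you want to fetch the contents of a URL, use something like this (on Ruby 3.0 and later, call URI.open, because open-uri no longer lets plain Kernel#open fetch URLs):
require 'open-uri'
require 'nokogiri'
doc = Nokogiri::HTML(URI.open('http://www.google.com'))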
If you get a valid HTML page, and use the proper casing on keywords to match the source, it works fine. Here's an example from my IRB session, fetching a page from one of the apps on my site that happens to use name="keywords" instead of name="Keywords":
irb(main):001:0> require 'open-uri'
#=> true
irb(main):002:0> require 'nokogiri'
#=> true
irb(main):003:0> url = "http://pentagonalrobin.phrogz.net/choose"
#=> "http://pentagonalrobin.phrogz.net/choose"
irb(main):004:0> doc = Nokogiri::HTML( open(url) ); nil # don't show doc here
#=> nil
irb(main):005:0> doc.xpath('//meta[@name="keywords"]/@content').map(&:value)
#=> ["team schedule free round-robin league"]
