Scraping rake task seemingly suffering from unwanted caching


I'm stumped!
I have a rake task which is cron'd to run every minute.
It logs in and finds the JSON that I'm interested in, but it can take up to 30 runs of the task before any change in the JSON is noticed by the rake task. In the meantime I've missed several changes to certain JSON objects.
It seems like there's some caching going on. I've tried to turn off Mechanize caching as shown below, but I'm not sure what else I can try now.
Any pointers?
Thanks in advance.
agent = Mechanize.new # {|a| a.log = Logger.new(STDERR) }
agent.history.clear
agent.max_history = 0
agent.user_agent_alias = 'Mac Safari'
page = agent.get 'http://website.com'
form = page.forms.first
form.email = 'me@home.com'
form.password = 'mypassword'
page = agent.submit form
page = agent.get 'http://website.com/password_protected_page'
jsonDirty = page.search '//script[@type="application/json"]'
Response from server:
{"server"=>"nginx", "date"=>"Thu, 13 Sep 2012 14:16:43 GMT", "content-type"=>"text/html; charset=utf-8", "connection"=>"close", "vary"=>"Cookie", "content-language"=>"plfplen", "set-cookie"=>"csrftoken=pVDg2SJ4KHqONz2OiEkNK7IbKlnJSQQf; expires=Thu, 12-Sep-2013 14:16:43 GMT; Max-Age=31449600; Path=/, affiliate=; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/, one-click-join=; expires=Thu,01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/", "expires"=>"Thu, 01 Jan 1970 00:00:01 GMT", "cache-control"=>"no-cache", "content-encoding"=>"gzip", "transfer-encoding"=>"chunked"}

You could try appending a changing query parameter to the URL, so that each request looks different to any cache in between. For example:
page = agent.get "http://website.com/password_protected_page?random=#{Time.now.to_i}"
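A minimal sketch of the same idea for URLs that may already carry a query string. The helper name `cache_busted` and the parameter name `_ts` are made up for illustration; only the standard `uri` library is used.

```ruby
require 'uri'

# Returns +url+ with an extra, time-based query parameter appended,
# preserving any query string that is already present.
def cache_busted(url, now: Time.now)
  uri = URI.parse(url)
  pairs = URI.decode_www_form(uri.query || '')
  pairs << ['_ts', now.to_i.to_s]
  uri.query = URI.encode_www_form(pairs)
  uri.to_s
end

puts cache_busted('http://website.com/password_protected_page')
```

The result can be passed straight to `agent.get` in place of the literal URL.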

Related

I am trying to integrate Xero API to ruby on rails however am unable to access the Token Set

I am integrating Xero API into my ruby on rails application, however when trying to get a Token Set I am getting the below error:
*** XeroRuby::ApiError Exception: Error message: the server returns an error
HTTP status code: 400
Response headers: {"content-type"=>"application/json; charset=UTF-8", "server"=>"nginx", "xero-origin-id"=>"IdentityServer.Web", "xero-causation-id"=>"63c9c7b03d7f435aa5dd801d96e8c152", "xero-message-id"=>"9e6951801da040ac94f61ab3392e3feb", "xero-activity-id"=>"adbfa8002310478c9d223f71e47b5b17", "xero-correlation-id"=>"6069553f3b9843f992e54f59f7bde8c0", "content-length"=>"26", "expires"=>"Wed, 01 Feb 2023 12:40:41 GMT", "cache-control"=>"max-age=0, no-cache, no-store", "pragma"=>"no-cache", "date"=>"Wed, 01 Feb 2023 12:40:41 GMT", "connection"=>"close", "set-cookie"=>"Device=961bc311701f4e20a0a2a7c0b6900dad; expires=Tue, 01 Feb 2028 12:40:41 GMT; path=/; samesite=none; httponly, _abck=CF3C26F9436304BF37101493FFAD83AF~-1~YAAQscNQaDKxnNKFAQAAPFz+DAnlKYqiZGnjCEbMvoyet1jSR8zH92SopvoLwB4qij7m04HY3vz38HatmtYuYAgN43HShtEj4miB94A9kiGQvrTNgMw9fNcpXV5sZ7JVNARjYdFjRYo0hU/n+qpWeEFH5OgBb8gzYVcOP5KqhPLgOd2ctiJrhmWiEmaeNZVbKj/spi60wt24oTv4jeWSplHq+i1LIzvPWLsVSU8RGKddmx+w7QnmtuWgbogouQljdvXS2Hrp9jDQsQXbvC9cWLy7A4AINQy7DLKP53mRgbqhdl7rG4Zyy8Bkv8nuxJvboM1MmdmorDngUVMNKkxpfdrWfJB5dv1Dbs3BOxJS2s9lRN56ugyI~-1~-1~-1; Domain=.xero.com; Path=/; Expires=Thu, 01 Feb 2024 12:40:41 GMT; Max-Age=31536000; Secure, bm_sz=D5B91089FE0B5C15AAF78A78C3DC4631~YAAQscNQaDOxnNKFAQAAPFz+DBKO2MjN+su/jV34lpo0F8Da/HIe1gG6gWfP7mzR6F7LwAPpRmm2lbXSjxw8/92CaTTcdsebzypKwiiowvOYOOI5/2TdwwcrU2bLSe9jN9YgUIS5izdAcysuz8S4pjx5OnNVe1HvhmUOeX8P/njVXeF7sQbFwmoAz3HyAO2AbJK0FGybHT8Spbfujl91GJ8+8YUf8voUQObj8r7o3K3GbWCycMG0lp6yupNoF7qfkPEuIl2vzMNCF0m2ZLH9a+akzpzc14KqjSwuz3k+++NK~3551286~3225656; Domain=.xero.com; Path=/; Expires=Wed, 01 Feb 2023 16:40:41 GMT; Max-Age=14400"}
Response body: {"error":"invalid_client"}
Please see the command below, which I used to try to receive the Token Set:
@token_set = @xero_client.get_token_set_from_callback(params[:code])
class HomeController < ApplicationController
  def index
    require 'xero-ruby'
    require 'httparty'
    creds = {
      client_id: '...',
      client_secret: '...',
      redirect_uri: 'http://localhost:3000/login',
      scopes: 'accounting.attachments',
      state: "Optional value to pass through auth flow"
    }
    config = { timeout: 30, debugging: true }
    @xero_client ||= XeroRuby::ApiClient.new(credentials: creds, config: config)
    @authorization_url = @xero_client.authorization_url
  end
end
Login controller:
class LoginController < ApplicationController
  def index
    require 'xero-ruby'
    creds = {
      client_id: '...',
      client_secret: '...',
      redirect_uri: 'http://localhost:3000/login',
      scopes: 'accounting.attachments',
      state: "Optional value to pass through auth flow"
    }
    config = { timeout: 30, debugging: true }
    @xero_client ||= XeroRuby::ApiClient.new(credentials: creds, config: config)
    byebug
    @token_set = @xero_client.get_token_set_from_callback(params[:code])
  end
end

Please can you check your client secret as the error linked to the Xero correlation id is "Client secret validation failed for client"
It may be best to generate a new one from developer.xero.com and then replace this into your code.
We also have a sample app you can use if this makes it easier https://github.com/XeroAPI/Xero-ruby-oauth2-app
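Since both controllers in the question build the same credentials hash by hand, one hedged way to make a regenerated secret easier to swap in is to read it from the environment in a single place. The helper name `xero_creds` and the variable names `XERO_CLIENT_ID` and `XERO_CLIENT_SECRET` are assumptions for illustration and are not part of xero-ruby.

```ruby
# Builds the credentials hash once, from an environment-like hash,
# and fails fast when a value is missing or blank.
def xero_creds(env = ENV)
  id     = env['XERO_CLIENT_ID'].to_s.strip
  secret = env['XERO_CLIENT_SECRET'].to_s.strip
  raise ArgumentError, 'XERO_CLIENT_ID is missing' if id.empty?
  raise ArgumentError, 'XERO_CLIENT_SECRET is missing' if secret.empty?

  {
    client_id: id,
    client_secret: secret,
    redirect_uri: 'http://localhost:3000/login',
    scopes: 'accounting.attachments',
    state: 'Optional value to pass through auth flow'
  }
end
```

Both controllers could then pass `xero_creds` as the `credentials:` argument shown in the question, so the two copies of the secret cannot drift apart.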

Does ruby strip headers from response?

I am fetching html content directly from my blog as:
response = Net::HTTP.get_response(uri)
respond_to do |format|
  format.html { render :text => response.body }
end
At the blog engine (WordPress) I am adding the header Access-Control-Allow-Origin: *, but I noticed that it's not passed along in the response.
However, if I use postman to get the page or view the page into browser directly, I can see that the header is there.
EDIT
I can see other headers passed, ex:
cache-control: no-cache, must-revalidate, max-age=0
content-type: text/html; charset=UTF-8
date: Tue, 24 Jul 2018 06:37:57 GMT
expires: Wed, 11 Jan 1984 05:00:00 GMT
Any idea?

response.body returns only the body, not the headers. You can convert the response to a hash and check the headers as below:
> url = "https://stackoverflow.com/questions/51492025/does-ruby-strip-headers-from-response"
> uri = URI.parse(url)
> response = Net::HTTP.get_response(uri)
#=> #<Net::HTTPOK 200 OK readbody=true>
> response.to_hash
#=> {"cache-control"=>["private"], "content-type"=>["text/html; charset=utf-8"], "last-modified"=>["Tue, 24 Jul 2018 07:04:00 GMT"], "x-frame-options"=>["SAMEORIGIN"], "x-request-guid"=>["22a4b6b6-3039-46e2-b4de-c8af7cad6659"], "strict-transport-security"=>["max-age=15552000"], "content-security-policy"=>["upgrade-insecure-requests"], "accept-ranges"=>["bytes", "bytes"], "age"=>["0", "0"], "content-length"=>["31575"], "date"=>["Tue, 24 Jul 2018 07:04:46 GMT"], "via"=>["1.1 varnish"], "connection"=>["keep-alive"], "x-served-by"=>["cache-bom18221-BOM"], "x-cache"=>["MISS"], "x-cache-hits"=>["0"], "x-timer"=>["S1532415886.990199,VS0,VE280"], "vary"=>["Accept-Encoding,Fastly-SSL"], "x-dns-prefetch-control"=>["off"], "set-cookie"=>["prov=a7dfe911-76a1-f1c1-093b-3fc8fe79af65; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly"]}
You can access a specific header by passing its name:
> response['Cache-Control']
#=> "private"
For more details, read: https://ruby-doc.org/stdlib-2.5.1/libdoc/net/http/rdoc/Net/HTTP.html

If you want to pass through headers that are being served from the host that you're fetching from, you first need to stash the response from your blog in a different variable name. Let's call it blog_response (this is because response is a preexisting special method name in a Rails controller instance).
blog_response = Net::HTTP.get_response(uri)
Then you need to grab the header you care about from the blog_response like this:
header_name, header_value = blog_response.each_header.find do |name, value|
  name =~ /pattern-matching-a-header-name-i-care-about/i # case-insensitive regex matching recommended for HTTP headers
end
Then you need to set them in your controller before you render the response, e.g.:
response.headers[header_name] = header_value
respond_to do |format|
  format.html { render :text => blog_response.body }
end
This example is obviously only for one header, but you can copy multiple headers by just iterating through, matching and setting them in your response like so:
blog_response.each_header do |name, value|
  if name =~ /pattern-matching-header-names-i-care-about|some-other-pattern-i-care-about/i # case-insensitive regex matching recommended for HTTP headers
    response.headers[name] = value
  end
end
If you want to pass all headers through just do this:
blog_response.each_header do |name, value|
  response.headers[name] = value
end
respond_to do |format|
  format.html { render :text => blog_response.body }
end
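A variation on the same approach, sketched with an explicit allow-list instead of a regex, which avoids forwarding hop-by-hop headers such as connection or transfer-encoding. The constant `FORWARDED_HEADERS` and the helper `forward_headers` are hypothetical names, and a plain Hash stands in for `response.headers` so the example runs outside Rails.

```ruby
require 'net/http'

# Header names (lowercase, as Net::HTTP reports them) that we choose to forward.
FORWARDED_HEADERS = %w[access-control-allow-origin cache-control].freeze

# Copies allow-listed headers from a Net::HTTPResponse into +target+,
# which stands in for `response.headers` in a Rails controller.
def forward_headers(upstream, target, allowed = FORWARDED_HEADERS)
  upstream.each_header do |name, value|
    target[name] = value if allowed.include?(name)
  end
  target
end

# Offline demonstration with a hand-built response object.
blog_response = Net::HTTPOK.new('1.1', '200', 'OK')
blog_response['Access-Control-Allow-Origin'] = '*'
blog_response['Transfer-Encoding'] = 'chunked'

p forward_headers(blog_response, {})
```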

Net::HTTPResponse (that is your response) mixes in Net::HTTPHeader. Thus, you can get an individual header as response['Access-Control-Allow-Origin'], iterate over them with response.each_header, or even get them all as a hash using response.to_hash.
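The three access styles mentioned above can be tried without a network call by building a response object by hand. This is only a sketch: the header values are invented, and a real response would come from `Net::HTTP.get_response(uri)`.

```ruby
require 'net/http'

res = Net::HTTPOK.new('1.1', '200', 'OK')
res['Access-Control-Allow-Origin'] = '*'
res['Content-Type'] = 'text/html; charset=UTF-8'

# 1. A single header; the lookup is case-insensitive.
origin = res['access-control-allow-origin']

# 2. Iterate over name/value pairs (names come back lowercase).
names = []
res.each_header { |name, _value| names << name }

# 3. Everything as a hash of name => array of values.
all = res.to_hash

puts origin
puts names.sort.join(', ')
puts all['content-type'].first
```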

Rails: Parsing HTTParty

I've read other answers on this topic, such as:
Parsing HTTParty response
HTTParty parsing JSON in Rails
However, I still can't figure out how to parse a response I'm receiving.
response = HTTParty.get(url, query: params) returns:
=> #<HTTParty::Response:0x89435d0 parsed_response="http://foo.com", @response=#<Net::HTTPOK 200 OK readbody=true>, @headers={"cache-control"=>["no-cache", "no-store"], "date"=>["Tue, 21 Feb 2017 23:10:47 GMT"], "expires"=>["Thu, 01 Jan 1970 00:00:00 GMT"], "p3p"=>["CP=\"ALL IND DSP COR CUR ADM TAIo PSDo OUR COM INT NAV PUR STA UNI\""], "pragma"=>["no-cache"], "server"=>["Apache-Coyote/1.1"], "set-cookie"=>["bar.Agent.p=c1921b97d1f8a0918621c48bd32ded2b; Domain=.bar.com; Expires=Fri, 19-Feb-2027 23:10:47 GMT; Path=/"], "content-length"=>["366"], "connection"=>["Close"]}>
I need the URL that appears after parsed_response. The other answers seemed to break down hashes that appear after parsed_response, but I'm just looking for the url that appears after parsed_response (and it only appears there in the response).
I tried:
puts response which returns the entire response above.
puts response.parsed_response which returns:
http://foo.com
=> nil

This usually works for me
response = HTTParty.get(url, options)
puts response.body
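Note that the `=> nil` in the console output in the question is the return value of `puts`, not of `parsed_response`, so the URL is already available as a plain string. A small sketch with a stand-in object: the `FakeResponse` struct is hypothetical and only mimics the two readers, so the example runs without the httparty gem.

```ruby
# Hypothetical stand-in exposing the same two readers as the response in the question.
FakeResponse = Struct.new(:body, :parsed_response)

response = FakeResponse.new('http://foo.com', 'http://foo.com')

printed = puts(response.parsed_response) # prints the URL; puts itself returns nil
url     = response.parsed_response       # the value to keep: a String

puts url.class
```

Assigning the value to a variable, rather than printing it, is what makes it usable in later code.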

Post request to eventbrite API via uri in rails

I'd like to send a post request via URI to the eventbrite API in order to receive a user access key. Documented here: https://www.eventbrite.com/developer/v3/reference/authentication/
You must then exchange this access code for an OAuth token. Send a
POST request to:
https://www.eventbrite.com/oauth/token This POST must contain the
following urlencoded data, along with a Content-type:
application/x-www-form-urlencoded header:
code=THE_USERS_AUTH_CODE&client_secret=YOUR_CLIENT_SECRET&client_id=YOUR_API_KEY&grant_type=authorization_code
I tried to translate that into Rails by sending a POST request via URI. The response is expected to contain the user access key:
require "uri"
require "net/http"
params = {'code' => current_user.eventbrite_key, 'client_secret' => 'XXXX', 'client_id' => 'XXXX', 'grant_type' => 'authorization_code' }
response = Net::HTTP.post_form(URI.parse('https://www.eventbrite.com/oauth/token'), params)
This isn't working (HTTP 400 Bad Request).
From what I've read, the default content type is already "application/x-www-form-urlencoded", so I should not have to set it in my request. The 'code' should be correct, as I fetch it just beforehand via their callback URL. The other credentials should also be correct.
The response is the following:
<Net::HTTPBadRequest:0x007ff8fcc9e4b8>
"{\"server\":[\"nginx\"],\"date\":[\"Wed, 08 Jul 2015 14:48:19
GMT\"],\"content-type\":[\"application/json\"],\"transfer-encoding\":[\"chunked\"],\"connection\":[\"keep-alive\"],\"x-xss-protection\":[\"1;
mode=block\"],\"x-content-type-options\":[\"nosniff\"],\"x-ua-compatible\":[\"IE=edge\"],\"p3p\":[\"CP=\\"NOI
ADM DEV PSAi COM NAV OUR OTRo STP IND
DEM\\"\"],\"x-frame-options\":[\"SAMEORIGIN\"],\"set-cookie\":[\"mgrefby=;
Domain=.eventbrite.com; expires=Thu, 07-Jul-2016 14:48:19 GMT;
httponly; Max-Age=31536000;
Path=/\",\"G=v%3D2%26i%3D54a93968-6bc1-486a-b401-fedab0b33dc4%26a%3D5f9%26s%3D56f73cc9c4519dc0d05f6518a092e66c6c83516c;
Domain=.eventbrite.com; expires=Thu, 07-Jul-2016 14:48:19 GMT;
httponly; Path=/\",\"ebEventToTrack=; expires=Thu, 01-Jan-1970
00:00:00 GMT; Max-Age=0;
Path=/\",\"SS=AE3DLHS8ESSoJhhYoWTny-enqBu_PN4d5A;
Domain=.eventbrite.com; httponly; Path=/;
secure\",\"eblang=lo%3Den_US%26la%3Den-us; Domain=.eventbrite.com;
expires=Thu, 07-Jul-2016 14:48:19 GMT; httponly; Path=/\",\"AN=;
expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0;
Path=/\",\"mgref=typeins; Domain=.eventbrite.com; expires=Thu,
07-Jul-2016 14:48:19 GMT; httponly; Max-Age=31536000;
Path=/\",\"SP=AGQgbblORi0c9X3owNbUIuFSZeUwSlY9HoUdpypGreork-Gf0GI6rzrLrcQDGWvu49mxHIQW9iBqa6JR-1k0eGvBhwnNpaON_Aak96kQ1yu90CaN7P2lnvfddxfskEniVHppbf0rp8YL5PA4vLYzRiaWdSohVy73j8H6HlCakht1OfKyxvwG-FeyR5rwPFEJw0iGB71Azw3oyFOTJcGJcYMWdSSVgS3F6pEbV5QI4ps5WlNMW0C9uL0;
Domain=.eventbrite.com; httponly; Path=/\",\"SERVERID=djc11;
path=/\"]}"
URI.parse('https://www.eventbrite.com/oauth/token') without params returns:
{"scheme":"https","user":null,"password":null,"host":"www.eventbrite.com","port":443,"path":"/oauth/token","query":null,"opaque":null,"fragment":null,"parser":{"regexp":{"SCHEME":"(?-mix:\A[A-Za-z][A-Za-z0-9+\-.]\z)","USERINFO":"(?-mix:\A(?:%\h\h|[!$\u0026-.0-;=A-Z_a-z~])\z)","HOST":"(?-mix:\A(?:(?\u003cIP-literal\u003e\[(?:(?\u003cIPv6address\u003e(?:\h{1,4}:){6}(?\u003cls32\u003e\h{1,4}:\h{1,4}|(?\u003cIPv4address\u003e(?\u003cdec-octet\u003e[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]|\d)\.\g\u003cdec-octet\u003e\.\g\u003cdec-octet\u003e\.\g\u003cdec-octet\u003e))|::(?:\h{1,4}:){5}\g\u003cls32\u003e|\h{,4}::(?:\h{1,4}:){4}\g\u003cls32\u003e|(?:(?:\h{1,4}:)?\h{1,4})?::(?:\h{1,4}:){3}\g\u003cls32\u003e|(?:(?:\h{1,4}:){,2}\h{1,4})?::(?:\h{1,4}:){2}\g\u003cls32\u003e|(?:(?:\h{1,4}:){,3}\h{1,4})?::\h{1,4}:\g\u003cls32\u003e|(?:(?:\h{1,4}:){,4}\h{1,4})?::\g\u003cls32\u003e|(?:(?:\h{1,4}:){,5}\h{1,4})?::\h{1,4}|(?:(?:\h{1,4}:){,6}\h{1,4})?::)|(?\u003cIPvFuture\u003ev\h+\.[!$\u0026-.0-;=A-Z_a-z~]+))\])|\g\u003cIPv4address\u003e|(?\u003creg-name\u003e(?:%\h\h|[!$\u0026-.0-9;=A-Z_a-z~])))\z)","ABS_PATH":"(?-mix:\A\/(?:%\h\h|[!$\u0026-.0-;=#-Z_a-z~])(?:\/(?:%\h\h|[!$\u0026-.0-;=#-Z_a-z~]))\z)","REL_PATH":"(?-mix:\A(?:%\h\h|[!$\u0026-.0-;=#-Z_a-z~])+(?:\/(?:%\h\h|[!$\u0026-.0-;=#-Z_a-z~]))\z)","QUERY":"(?-mix:\A(?:%\h\h|[!$\u0026-.0-;=#-Z_a-z~\/?])\z)","FRAGMENT":"(?-mix:\A(?:%\h\h|[!$\u0026-.0-;=#-Z_a-z~\/?])\z)","OPAQUE":"(?-mix:\A(?:[^\/].)?\z)","PORT":"(?-mix:\A[\x09\x0a\x0c\x0d
]\d*[\x09\x0a\x0c\x0d ]*\z)"}}}
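One way to narrow down a 400 like this is to build the request explicitly, so the encoded body and Content-Type can be inspected before sending, and then to print the response body, since the dump above shows only the headers. The following is a sketch using only Net::HTTP from the standard library; the helper names `build_token_request` and `exchange_code` are made up for illustration, and what the error body would contain is not something the question shows.

```ruby
require 'uri'
require 'net/http'

TOKEN_URI = URI.parse('https://www.eventbrite.com/oauth/token')

# Builds the POST without sending it, so the encoded form can be inspected.
def build_token_request(code:, client_id:, client_secret:)
  req = Net::HTTP::Post.new(TOKEN_URI)
  req.set_form_data(
    'code' => code,
    'client_secret' => client_secret,
    'client_id' => client_id,
    'grant_type' => 'authorization_code'
  )
  req
end

# Sends the request over TLS and returns the Net::HTTPResponse.
# Not called here, so the example runs without network access.
def exchange_code(req)
  Net::HTTP.start(TOKEN_URI.host, TOKEN_URI.port, use_ssl: true) do |http|
    http.request(req)
  end
end

req = build_token_request(code: 'CODE', client_id: 'ID', client_secret: 'SECRET')
puts req['Content-Type']
puts req.body
```

With real values, `response = exchange_code(req)` followed by `puts response.code` and `puts response.body` would show the status and the server's own error message.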

Page isn't reloaded after clicking back button in browser

When clicking the back button in my browser, the url is changed but the page content is stale/cached.
I've tried using:
def set_no_cache
  response.headers["Cache-Control"] = "no-cache, no-store, max-age=0, must-revalidate"
  response.headers["Pragma"] = "no-cache"
  response.headers["Expires"] = "Fri, 01 Jan 1990 00:00:00 GMT"
end
but it doesn't seem to help.
Would be glad for any thoughts/suggestions on how to get this to work properly...
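The method above only has an effect if it runs before the response is rendered. In a Rails controller that is typically arranged with `before_action :set_no_cache`, which is an assumption here since the question does not show how the method is called. The headers themselves can be kept in one place, as in this plain-Ruby sketch where `apply_no_cache` is a hypothetical helper and a Hash stands in for `response.headers`:

```ruby
NO_CACHE_HEADERS = {
  'Cache-Control' => 'no-cache, no-store, max-age=0, must-revalidate',
  'Pragma'        => 'no-cache',
  'Expires'       => 'Fri, 01 Jan 1990 00:00:00 GMT'
}.freeze

# Writes the no-cache headers into +headers+ (any Hash-like object).
def apply_no_cache(headers)
  NO_CACHE_HEADERS.each { |name, value| headers[name] = value }
  headers
end

headers = apply_no_cache({})
puts headers['Cache-Control']
```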
