I need to fetch the content of a website so I can parse it. The site has no API, so I have to go through the front end. The browser prompts me to log in to the website, but I do not know how to do this in Ruby.
This works for websites that do not require authentication. I can NOT turn the authentication off.
require 'open-uri'
file = URI.open('https://website/')  # on Ruby 3.0+, plain open() no longer accepts URLs
contents = file.read
There are plenty of examples here: mechanize example
If you are looking at HTTP authentication, there is a similar post here: basic-and-form-authentication-with-mechanize-ruby
Use Mechanize to make life easier.
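For the plain HTTP authentication case (where the browser itself pops up a login prompt), a minimal sketch using only Ruby's standard library rather than Mechanize might look like the following. The URL and credentials are placeholders, and both helper names are made up for illustration.

```ruby
require 'net/http'
require 'uri'

# Build a GET request that carries HTTP Basic credentials.
# (Hypothetical helper name, not part of any library.)
def build_basic_auth_request(url, user, password)
  request = Net::HTTP::Get.new(URI.parse(url))
  request.basic_auth(user, password)
  request
end

# Send the request and return the response body as a string.
def fetch_with_basic_auth(url, user, password)
  uri = URI.parse(url)
  request = build_basic_auth_request(url, user, password)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request).body
  end
end

# Example call (placeholders, not real credentials):
# contents = fetch_with_basic_auth('https://website/', 'user', 'secret')
```

If the site uses an HTML login form instead of an HTTP prompt, this is not enough; that is the case where Mechanize's form handling helps.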
Related
I am preparing to work on a project where I need to display a dashboard from an online application. Unfortunately, using an API is currently not possible. The dashboard can be embedded in an iframe. However, when it is displayed, it prompts the user viewing the dashboard to log in to an account.
I have one paid account for this service. Are there any Rails gems that can log in to the service before the iframe is processed?
Or would a proxy within my Rails app be a better route to go?
Any pointers are appreciated!
Neither a Rails gem nor a proxy within your Rails app will work, because both have the same limitation.
They are both running on the back-end, server side.
The authentication you need is client side.
Unless you mean proxying the ENTIRE thing: the auth request and all subsequent requests and user interactions with this dashboard. That should work, but see the caveats below.
The way authentication works (pretty much universally) is: once you log in to any system, it stores a cookie in your browser, and the browser then sends that cookie with every subsequent request.
If you authenticate on the back end, that cookie will be sent to your Rails code and will die there, and the user's browser will never know about it.
Also, it is not possible to do the auth server side, capture the cookie, and then have the user browse the site directly with their own browser, for two reasons:
Sometimes auth cookies use information about the browser or HTTP client to encrypt the cookie, so sending the same cookie from a different client won't work.
You cannot tell a browser to send a cookie to a domain other than your own.
So your options are, off the top of my head right now:
If there is a login page that accepts form submissions from other domains, you could try to simulate a form submission directly to that site's "after login" page (the page the user gets directed to once they fill in the login form). However, any modern web framework has XSRF (Cross-Site Request Forgery) protection and will disallow this approach for security reasons.
See if the auth this site uses has any kind of OAuth, Single Sign-On (SSO), or similar authentication integration that you can use. (This is similar to an API, so you may have already explored this option.)
Proxy all requests to this site through your server. You will have to rewrite the entire HTML so that all images, stylesheets, and other assets are also routed through the proxy, or else rewrite the URLs in the HTML so they are no longer relative. You might hit various walls if the site wasn't designed for this use case: the site using relative URLs for assets that you aren't proxying, the site referencing absolute URLs that cause cross-domain errors, and so on. Note that it's really hard to rewrite every last asset reference; it's not only the HTML you have to worry about, since JavaScript and CSS can contain URLs too.
You could write a bookmarklet or a browser extension that logs the user into the site.
Have everyone install LastPass.
Have everyone install the Tampermonkey browser extension (or its equivalent for other browsers), and write a small user script that automatically runs custom JavaScript to log the user in on that site.
Scrape that site for the info you need and serve it on your own site.
OK I'm out of ideas. :)
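To give a feel for the URL rewriting that the proxy option involves, here is a rough sketch that rewrites `href` and `src` attributes so they point back through a proxy route. It uses a regular expression instead of a real HTML parser, and the function name, the `/proxy?url=` route, and the example host are all assumptions for illustration.

```ruby
require 'uri'

# Rewrite href/src attributes so every referenced resource is fetched via our proxy.
# base_url:     the original URL of the page being proxied
# proxy_prefix: hypothetical route in our own app that forwards the request
def rewrite_asset_urls(html, base_url, proxy_prefix)
  html.gsub(/\b(href|src)=(["'])(.*?)\2/i) do
    original = Regexp.last_match(0)
    attr     = Regexp.last_match(1)
    quote    = Regexp.last_match(2)
    target   = Regexp.last_match(3)
    begin
      # Resolve relative references against the page URL, then wrap in the proxy route.
      absolute = URI.join(base_url, target).to_s
      "#{attr}=#{quote}#{proxy_prefix}#{URI.encode_www_form_component(absolute)}#{quote}"
    rescue URI::Error
      original # leave anything we cannot parse untouched
    end
  end
end

puts rewrite_asset_urls('<img src="/a.png">', 'https://dash.example.com/app/', '/proxy?url=')
```

This only touches HTML attributes. URLs built in JavaScript or referenced from CSS pass through unchanged, which is exactly the wall the answer warns about.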
I am helping to create a Rails app that uses Ember as a front-end MVC. The app hosts user content that is accessed via subdomains, and on those subdomains users can upload custom JS and CSS. What I'm wondering is whether an authentication token for the root domain, stored in Ember, will be safe from the custom JS that people could upload and run on their subdomains.
Provided the following:
You don't use cookies on *.domain.com, or don't use cookies at all.
They can't run (or really display it unescaped in any way) the JS/CSS on your main site.
The ember app with your token doesn't run on their sub-domain (obviously).
They can't put HTML in a file with a different extension or even Content-Type on your subdomain (or you aren't using cookies). They could direct a user's web browser there and it'd display the HTML. Be wary of phishing though (looks like it's your secure content). I can't imagine you could prevent this easily other than not using cookies -- without 100% ensuring properly formatted JS/CSS which would present all kinds of problems.
You can limit cookies to domain.com and www.domain.com, but I don't recommend it (it is prone to mistakes). If you don't, somebody can make a GET request through CSS or, for example, an image tag (not to mention JavaScript), and the browser will send the authenticated cookies to your server. Remember that unescaped input in their app can leave holes too.
If your token is stored in Ember and they have access to custom JS where the app is running, of course it will leave your token vulnerable. If you run your Ember app only on www.domain.com, avoid cookies, and store the token only locally/in JS, you might be okay.
If they just put HTML code in a file with another extension and direct people there it'll be interpreted as HTML.
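If you do go the route of limiting cookies, the relevant knob is the cookie's Domain attribute: a cookie set without one is host-only, so the browser sends it back only to the exact host that set it, while `Domain=domain.com` makes it visible to every subdomain. A minimal sketch of the difference, with a made-up helper name and placeholder values:

```ruby
# Build a Set-Cookie header value.
# Leaving out Domain keeps the cookie host-only; adding it shares the cookie
# with all subdomains, including the ones running user-uploaded JS.
def session_cookie_header(name, value, share_with_subdomains: false, domain: 'domain.com')
  parts = ["#{name}=#{value}", 'Path=/', 'Secure', 'HttpOnly']
  parts << "Domain=#{domain}" if share_with_subdomains
  parts.join('; ')
end

puts session_cookie_header('sid', 'abc')
puts session_cookie_header('sid', 'abc', share_with_subdomains: true)
```

The first form is the safer default here, but as noted above it is easy to get wrong, so avoiding cookies on the user-content subdomains altogether remains the simpler rule.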
I'm using the YouTube API v3 to automatically upload videos to my own channel. However, the script still needs manual intervention during OAuth 2.0 authorization. How can I make it completely automatic?
1) Access the API using username and password
2) Or find a way to create permanent OAuth2.0 authentication
P.S.: I use this script to upload:
https://developers.google.com/youtube/v3/guides/uploading_a_video
The only thing I can think of is web scraping. Basically, programmatically open the web page and get its HTML. Then find the authorization code, and store it as a string. I don't know if your scripting language of choice can do it, but Python has Beautiful Soup (links at the bottom). The problem, of course, is accessing the contents of a page like that which is pretty clearly designed to be reached by a logged in user from a web browser. I've never done that, but there's some concept of a "login handshake" where you post the data to the server that's needed as you access the page. I've a few links at the bottom.
Anyway, to give you a better idea of what I mean in pseudo-code (for those who may be confused), it'd be something like:
webURL = "http://any-url.net";
webPageObject = openPage(webURL);
pageHTML = webPageObject.getHTML();
theHTMLTag = searchForTagById(pageHTML, "<p id='oAuthMessage'>");
//And from there, figure out where the string containing the code is.
//Probably just by getting a substring from the end of the text in the <p>
//backward until you reach the length of the oAuth code.
You'll have to look at the page source to know which tags to look for specifically, but this can all just be done programmatically/automatically, as you wanted.
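In Ruby, the extraction step of that pseudo-code could be sketched roughly as follows. The `oAuthMessage` id is carried over from the pseudo-code and is only a guess at the real markup, the sample HTML and code value are invented, and a regular expression stands in for a proper HTML parser.

```ruby
# Return the inner text of the first element with the given id, or nil.
# (Hypothetical helper; a regex is a shortcut and will not cope with nested tags.)
def extract_text_by_id(html, id)
  pattern = /<(\w+)[^>]*\bid=["']#{Regexp.escape(id)}["'][^>]*>(.*?)<\/\1>/m
  match = html.match(pattern)
  match && match[2].strip
end

# Invented sample page; the real page source will look different.
page_html = "<div><p id='oAuthMessage'>Your code is 4/abc123</p></div>"

message = extract_text_by_id(page_html, 'oAuthMessage')
code = message.split.last # assumes the code is the last word of the message
puts code
```

Fetching the page while logged in is the harder part and is not covered by this sketch.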
Links:
Login handshake - Scraping from a website that requires a login?
Beautiful Soup - http://www.crummy.com/software/BeautifulSoup/
google.gov/webScraping - https://www.google.com/search?ie=UTF-8&oe=utf-8&q=how+to+web+scrape+logged+in+page
You can use Google OAuth 2.0 for devices to get a fully automatic token renewal process.
So all you need now is:
Request a device code and a confirmation code
Enter the confirmation code to confirm that your application has access to the specific account
Generate a new ACCESS_TOKEN, or renew an existing one, for your device code
Upload the video using your device code and a valid ACCESS_TOKEN
Here is the documentation for it.
And here are some examples.
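The token part of those steps can be sketched in Ruby roughly as below. The endpoint URLs and the `grant_type` values reflect Google's device-flow documentation and RFC 8628 as best understood here, so check them against the current docs before relying on them. The helper names are hypothetical, and the client ID, client secret, and scope are values you would supply from your own Google API project.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Assumed endpoints; verify against Google's current documentation.
DEVICE_CODE_URL = 'https://oauth2.googleapis.com/device/code'
TOKEN_URL       = 'https://oauth2.googleapis.com/token'

# Step 1: ask for a device code and a user (confirmation) code.
def device_code_params(client_id, scope)
  { 'client_id' => client_id, 'scope' => scope }
end

# Step 3a: exchange the confirmed device code for an access token.
def device_token_params(client_id, client_secret, device_code)
  { 'client_id' => client_id, 'client_secret' => client_secret,
    'device_code' => device_code,
    'grant_type' => 'urn:ietf:params:oauth:grant-type:device_code' }
end

# Step 3b: renew an expired access token without any user interaction.
def refresh_token_params(client_id, client_secret, refresh_token)
  { 'client_id' => client_id, 'client_secret' => client_secret,
    'refresh_token' => refresh_token, 'grant_type' => 'refresh_token' }
end

# POST the form and parse the JSON reply.
def post_form(url, params)
  JSON.parse(Net::HTTP.post_form(URI.parse(url), params).body)
end

# Example (not executed here):
# reply = post_form(TOKEN_URL, refresh_token_params(id, secret, stored_refresh_token))
# access_token = reply['access_token']
```

Only the one-time confirmation in step 2 needs a human; after that, the stored refresh token lets the script renew access on its own.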
I am trying to figure out a way to post a tweet to Twitter using just JavaScript, for a personal project of mine. It won't be publicly accessible, so I'm not worried about security, which is normally something you have to worry a lot about with OAuth through JS.
Essentially I want to: redirect to the Twitter OAuth login -> accept -> redirect back to my own page -> tweet/do something.
I have been looking forever, and I can't seem to find a way to do this (I can't find any good JavaScript OAuth libraries either). I wouldn't have imagined it would be this difficult.
Any recommendations/solutions?
As far as I understand, a custom origin server with CloudFront only works if CloudFront is able to access files from my website URL:
e.g. www.domain.com/hello.html
However, my website requires a login in order to view hello.html. How can I keep the login mechanism and still cache my real hello.html page in CloudFront using a custom origin server?
I am using Ruby on Rails btw, but this is applicable to other stacks as well.
I'm pretty sure this is not possible. As you said, CloudFront has to be able to access the file to serve and cache it. I have never seen an option to tell CloudFront to use a password to access the file.
An idea: maybe you can check in your Rails app, before you require the user to enter a password, if the request comes from CloudFront (I'm sure there are some headers indicating that) and, if so, bypass the login requirement?
Edit:
It says in the docs:
Do not configure your origin server to request client authentication.
One thing I'm pretty sure is set, though, is the User-Agent. Check for user_agent =~ /cloudfront/i and bypass authentication?
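A minimal sketch of that check as a plain Ruby predicate (the method name is made up; in a Rails controller the header value would typically come from `request.user_agent`):

```ruby
# True when the User-Agent header looks like it came from CloudFront.
# to_s guards against a missing header (nil).
def cloudfront_request?(user_agent)
  user_agent.to_s.match?(/cloudfront/i)
end

puts cloudfront_request?('Amazon CloudFront')
puts cloudfront_request?('Mozilla/5.0')
```

Keep in mind that any client can send this header, so anything served on the strength of this check is effectively public.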