Nutch and HTTP POST authentication?

I'm stuck at the point where I need to crawl websites that have a form post.
Nutch does not support this.
How do I get around this so I can crawl these websites using Nutch? Is there a better solution?

Create a file with one entry per site: a regex for the URLs requiring auth, the URL to submit the form to, and the form data.
Write your own HTTP protocol plugin by modifying the standard protocol-httpclient plugin. If the URL being requested requires auth and no auth has been made yet, go to the form and submit it.
That is the simplest solution. The problem is that there is no single simple solution for a large number of websites: cookies expire, some logins rely on JavaScript, and so on. Search through Nutch's JIRA; there have been many discussions about this.
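The two steps above can be sketched outside of Nutch to show the decision logic only. This is a minimal illustration in Python, not a Nutch plugin: the tab-separated file layout, the `FormAuthRule` and `AuthPlanner` names, and the example URLs are all assumptions made for the sketch.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class FormAuthRule:
    url_pattern: re.Pattern  # regex for URLs that require auth
    login_url: str           # URL the login form is submitted to
    form_data: dict          # form fields to POST


def parse_rules(text):
    """Parse lines of the (hypothetical) form: regex<TAB>login URL<TAB>key=value&key=value"""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, login_url, raw_form = line.split("\t")
        form = dict(pair.split("=", 1) for pair in raw_form.split("&"))
        rules.append(FormAuthRule(re.compile(pattern), login_url, form))
    return rules


class AuthPlanner:
    """Decides whether a login form must be submitted before fetching a URL."""

    def __init__(self, rules):
        self.rules = rules
        self.logged_in = set()  # login URLs that were already submitted successfully

    def login_needed(self, url):
        """Return (login_url, encoded_body) if a form POST must happen first, else None."""
        for rule in self.rules:
            if rule.url_pattern.search(url):
                if rule.login_url in self.logged_in:
                    return None
                return rule.login_url, urlencode(rule.form_data)
        return None

    def mark_logged_in(self, login_url):
        """Call this after the form POST succeeded and the session cookie is stored."""
        self.logged_in.add(login_url)
```

A real plugin would perform the POST with its HTTP client, keep the returned cookies, and only then call something like `mark_logged_in`. Cookie expiry and JavaScript-driven logins are not handled here.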

Here is the answer that you guys are looking for:
http://lifelongprogrammer.blogspot.com/2014/02/part1-using-apache-http-client-to-do-http-post-form-authentication.html
and
https://issues.apache.org/jira/browse/NUTCH-827
These two links have complete sample code. If you follow each step correctly, you will be able to achieve form-based authentication in Nutch.

Related

Azure AD not redirecting to the original request url rather goes back to Root

I have an ASP.NET MVC 5 application using Azure AD authentication. Whenever I enter a URL, it sends me for authentication (if not done already) using a URL of this sort:
https://login.microsoftonline.com/[tenantID]/oauth2/v2.0/authorize?
I have three questions:
Can I say it is using OAuth2?
If someone enters the URL of a page, e.g. https://mydomain/Category/View/1, it goes for auth (which is all right), but after successful authentication it should redirect me to the originally requested URL. Currently it takes me to the root URL, https://mydomain. Subsequent requests work fine once authenticated.
Currently the auth happens every 1 hour, I believe. How can I extend it to every 4 hours?
Will be helpful to see your suggestions.
Cheers.
Regarding point 2, do you want this flow: visit a URL -> get sent to the login page -> get redirected back to that URL? If so, I think this document can help you.
In my opinion, for a demo app or a simple test, you can just add every possible URL to the redirect configuration form. For a formal app, or one that needs to be easy to maintain, I use the idea in the above document. I think the central idea is to create one specific place that controls URL redirecting, including the decision of where to send the user, so that only this one URL needs to be added to the redirect configuration in the Azure portal. If you want a sample, this document may help you.
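The "one specific place that controls redirecting" idea can be sketched roughly as follows. This is a language-neutral illustration in Python rather than ASP.NET code; the `session` dict, the `return_to` key and the function names are hypothetical. The originally requested path is remembered before the user is sent to the identity provider, and after login a single handler decides whether it is safe to go back there.

```python
from urllib.parse import urlsplit


def safe_return_path(requested, default="/"):
    """Return `requested` only if it is a local path on this site, else `default`."""
    if not requested:
        return default
    parts = urlsplit(requested)
    # Reject absolute URLs (https://other.example/...) and scheme-relative ones (//other.example)
    if parts.scheme or parts.netloc:
        return default
    # Require a single leading slash and no backslashes, which some browsers treat like "/"
    if not requested.startswith("/") or requested.startswith("//") or "\\" in requested:
        return default
    return requested


def begin_login(session, requested_path):
    """Remember where the user wanted to go before sending them to the login page."""
    session["return_to"] = requested_path


def finish_login(session):
    """Called by the one registered redirect URL; picks the final destination."""
    return safe_return_path(session.pop("return_to", None))
```

With this shape only the callback that calls `finish_login` needs to be registered as a redirect URL, and a request for `/Category/View/1` ends up back on `/Category/View/1` instead of the root.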
On point 3, you could search for keywords like 'azure ad authentication set token lifetime policy'. I found several PowerShell scripts but I haven't tested them. If you haven't got a result, I will do some testing on it next Monday.

Hiding parameters (sensitive information) from URL of an MVC 5 application

I am working on ASP.NET MVC 5. When I click a link (placed on another website), I navigate to the UserDetails.cshtml page. Basically, that third-party site passes the UserName and Password to my site, and I use them to authorize the user and display further user info.
It works, but the URL looks like this:
localhost:8080//Admin/UserDetails/UserName/PWD.
I don't want to show the UserName and Password in the URL, i.e. the URL should look something like:
localhost:8080//Admin/UserDetails/
One possible solution could be rewrite the URL in IIS (http://www.hanselman.com/blog/ASPNETMVCAndTheNewIIS7RewriteModule.aspx)
But I believe there is an easier way to handle this by using the routing mechanism of MVC.
Please help me to figure out the same.
EDIT :
As many of you are confused about why I am not doing a form POST here, let me reframe my question. I have no control over the third-party application, so I can't ask them to do a form POST to my MVC application. Also, the third-party application is an Oracle reporting application (OBI), so doing a POST from that application might not be feasible either...
Let me reverse engineer your requirements from your question:
I want to have an URI that when invoked will give access to a secured section of my website. This URI must be clicked by visitors of a third-party site, whom I give that URI to. I want to hide the credentials from the URI.
You cannot do this, the requirements are conflicting. You cannot hand out URIs that will authenticate anyone who fires a request to that URI.
You could do something with a token (like http://your-site/auth/$token), but then still, anyone with access to that URI can use it to authenticate themselves, or simply put it up on their own website.
If you have data you want to expose to a third-party site, let that site perform an HTTP request (with tokens, usernames, headers or whatever you want to use to authenticate) in the background to your site, and display the response in their site. Then the visitor won't see that traffic, can't share the URI and all will be secure.
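To make the token idea concrete, here is a hedged sketch of a signed, short-lived token. It narrows the window in which a leaked URI is useful but, as noted above, anyone holding the URI before it expires can still use it. The `SECRET` value, the `user:expires:signature` format and the function names are assumptions for illustration, not part of any framework.

```python
import hashlib
import hmac
import time

SECRET = b"replace-with-a-real-secret"  # hypothetical shared secret, kept server-side


def _sign(payload):
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def make_token(user, ttl_seconds=60, now=None):
    """Build 'user:expires:signature' that is valid for ttl_seconds."""
    current = time.time() if now is None else now
    payload = f"{user}:{int(current + ttl_seconds)}"
    return f"{payload}:{_sign(payload)}"


def verify_token(token, now=None):
    """Return the user name if the token is authentic and unexpired, else None."""
    try:
        user, expires, signature = token.rsplit(":", 2)
        expires_at = int(expires)
    except ValueError:
        return None  # wrong number of parts or non-numeric expiry
    expected = _sign(f"{user}:{expires}")
    # Constant-time comparison to avoid leaking how much of the signature matched
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    current = time.time() if now is None else now
    if current > expires_at:
        return None
    return user
```

The server-to-server request described above remains the safer design, because the visitor never sees a credential at all.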
No. No. NO. Like seriously, NO. Any sensitive information should be sent via a post body over a secure connection (HTTPS). You can't "hide" information in a GET request, because it's all part of the URI, or the location of a particular resource. If you remove a portion, it's an entirely different location.
UPDATE
I find it extremely hard to believe that any third-party application that needs to authenticate via HTTP, and isn't designed by a chimp with a typewriter, wouldn't support a secure method to do so, especially if it's an Oracle application. I'm not familiar with this particular app but, and no offense meant here, I would more easily believe that you've missed something in the documentation or simply haven't found the right way to do it yet than believe you have to send clear-text credentials over GET.
Regardless, as I said previously, there's no way to hide information in a GET request. All data in a GET is part of the URL, and therefore is plainly visible in the browser location bar or whatever. Unfortunately, I have no advice for you other than to look closer at the documentation, even reach out to Oracle if you have to. Whether by post or something like OAuth, there almost has to be another way.

How should I secure my SPA and Web.API?

I have to implement a web site (MVC4/Single Page Application + knockout + Web.API) and I've been reading tons of articles and forums but I still can't figure out about some points in security/authentication and the way to go forward when securing the login page and the Web.API.
The site will run totally under SSL. Once the user logs on the first time, he/she will get an email with a link to confirm the register process. Password and a “salt” value will be stored encrypted in database, with no possibility to get password decrypted back. The API will be used just for this application.
I have some questions that I need answered before going any further:
Which method will be the best for my application in terms of security: Basic/ SimpleMembership? Any other possibilities?
The object Principal/IPrincipal is to be used just with Basic Authentication?
As far as I know, SimpleMembership uses cookies; doesn't that break the RESTful paradigm? So if I build a REST Web.API, shouldn't I avoid using SimpleMembership?
I was checking ThinkTecture.IdentityModel, with tokens. Is this a type of authentication like Basic, or Forms, or Auth, or it's something that can be added to the other authentication types?
Thank you.
Most likely this question will be closed as too localized. Even then, I will put in a few pointers. This is not an answer, but the comments section would be too small for this.
What method and how you authenticate is totally up to your subsystem. There is no one way that will work the best for everyone. A SPA is no different than any other application. You still will be giving access to certain resources based on authentication. That could be APIs, with a custom Authorization attribute, could be a header value, token based, who knows! Whatever you think is best.
I suggest you read more on this to understand how this works.
Using cookies in no way breaks REST. You will find tons of articles on this specific item. Cookies are passed with your request, just like any other information that the server needs in order to give you data. If sending cookies breaks REST, then sending parameters to your API should break REST too!
Now, a very common approach (and by no means the one and only approach) is a token-based system for the SPA. There are many reasons, but the easiest to explain is that your services (Web API or whatever) could be hosted separately, with your client working as a CORS client. In that case, you authenticate in whatever form you choose, create a secure token and send it back to the client, and every resource that needs an authenticated user is checked against the token. The token is sent in a header with every request. No token would result in a simple 401 (Unauthorized), and an invalid token could result in a 403 (Forbidden).
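A rough sketch of that token check, written in Python instead of Web API code so it stays self-contained. `TokenStore`, the in-memory dict and the `Bearer` header scheme are assumptions for illustration; the 401/403 mapping simply follows the paragraph above, and some APIs return 401 for an invalid token as well.

```python
import secrets


class TokenStore:
    """In-memory token registry; a real service would persist and expire tokens."""

    def __init__(self):
        self._tokens = {}  # token -> user name

    def issue(self, user):
        """Call after the user has authenticated by whatever means you chose."""
        token = secrets.token_urlsafe(32)
        self._tokens[token] = user
        return token

    def authorize(self, headers):
        """Return (status_code, user) for a request's headers."""
        value = headers.get("Authorization")
        if not value:
            return 401, None  # no token at all
        scheme, _, token = value.partition(" ")
        user = self._tokens.get(token) if scheme == "Bearer" else None
        if user is None:
            return 403, None  # a token was sent but it is not valid
        return 200, user
```

On the client side, the SPA would store the issued token and attach it to every request, then route to the login screen whenever it sees a 401 or 403.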
No one says an SPA needs to be all static HTML, with data binding, it could as well be your MVC site returning partials being loaded (something I have done in the past). As far as working with just HTML and JS (Durandal specifically), there are ways to secure even the client app. Ultimately, lock down the data from the server and route the client to the login screen the moment you receive a 401/403.
If your concern is more in the terms of XSS or request forging, there are ways to prevent that even with just HTML and JS (though not as easy as dropping anti-forgery token with MVC).
My two cents.
If you do "direct" authentication - meaning you can validate the passwords directly - you can use Basic Authentication.
I wrote about it here:
http://leastprivilege.com/2013/04/22/web-api-security-basic-authentication-with-thinktecture-identitymodel-authenticationhandler/
In addition you can consider using session tokens to get rid of the password on the client:
http://leastprivilege.com/2012/06/19/session-token-support-for-asp-net-web-api/

Good example of application that uses rails and backbone.js that handles authentication through backbone

Does anybody know of a good example I could look at as to how to go about implementing authentication through backbone with rails?
I haven't been able to find anything..
You have several possibilities. First you can log in normally, with plain html. That login would guide you to your backbone.js application.
Another possibility is to have a login form within your backbone.js app that takes advantage of backbone.js's ":authentication_token". When your backbone.js app sends the login info, it will get a token back. From then on you are able to make ajax calls and receive responses with that token.
EDIT: see this post for an example of working with the token: http://www.hyperionreactor.net/blog/token-based-authentication-rails-3-and-rails-2
What are you looking for in Rails integration with the client side, other than ajax queries and authentication/authorization?
Use the demo app to see how backbone.js is working: http://documentcloud.github.com/backbone/examples/todos/index.html
Found one example app. Trying to figure out what is what atm myself. But might be worth looking: https://github.com/diaspora/diaspora

Is it secure to POST Credit Card data from View to Controller?

Need to submit some CC data from the View to the Controller where it will be processed, can I just POST it or is there some common way of securing the data in transit?
Post the data using SSL.
Here's a good resource on setting up SSL with IIS and ASP.NET.
Posting with SSL like Rex M mentioned is definitely the first step. You should probably make the page where they are typing their credit card number SSL as well. This will give your users the green URL of comfort.
You should also include protection against CSRF attacks. Use the anti-forgery token.
Also, you should use the PRG (Post, Redirect, Get) pattern to make sure that the credit card numbers aren't submitted twice. After the post, don't just render a different view, send a redirect so their browser does a GET against another URL - probably your confirmation page.
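The anti-forgery check and the PRG step can be sketched together. This is a framework-free Python illustration, not ASP.NET MVC code: the `session` and `form` dicts, the `charge` callback, the field names and the confirmation path are all hypothetical.

```python
import hmac
import secrets


def new_form_token(session):
    """Generate a per-form anti-forgery token; embed the result in a hidden form field."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def handle_payment_post(session, form, charge):
    """Validate the token, process the card once, then redirect instead of rendering."""
    expected = session.pop("csrf_token", "")  # single use: a replayed POST will fail
    supplied = form.get("csrf_token", "")
    if not expected or not hmac.compare_digest(expected.encode(), supplied.encode()):
        return 403, {}
    charge(form["card_number"])
    # Post/Redirect/Get: answer the POST with a redirect so a refresh re-issues a GET
    return 303, {"Location": "/payment/confirmation"}
```

Because the browser lands on the confirmation page via a GET, refreshing that page does not resubmit the card number. All of this assumes the form page and the POST are both served over SSL, as described above.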
You'll run into a few ASP.NET MVC specific things:
If you have some http pages and some https pages, how will you code the links to the https pages from the http pages? You can hard code them, but you'll have to hard code the domain and protocol. You can't just use <%= Html.ActionLink(... see this SO question for more details.
You'll want to make sure you can't hit your controllers when you are not using SSL. This will help you catch any errors, and ensure that no one uses http instead of https. See the [RequireSsl] attribute in the futures assembly. Here's a blog post about it from Adam Salvo.
I haven't read about the implementation of ASP.NET MVC. However, I believe that you have mixed up the terminology.
The MVC pattern would be evaluated on the server end. [So there is little need to do security checks between the components (unless they are exposed outside the program).]
I believe that many people get the impression that you are talking about HTTP POSTs after a form submission (as opposed to HTTP GETs).

Resources