Delphi code for sanitizing a URL entered by the user - delphi

I need a user to be able to enter a URL, and would like to make sure it is as wholesome as possible. Things like checking that there is http:// at the front, no double-dots, perhaps valid TLD, trailing slash (I have to add the final page).
I figure this is such a common requirement that it must exist already. Suggestions?
[edit:] To be clear, this is a run-time requirement in a Windows Service. The aim is to get the best from the URL read from the configuration, rather than validate what the user typed in. In essence, if I can adjust the URL and make it work, then that is what I'd like to do. The download will be a specific file, so if it all goes wrong it won't get the wrong thing from another server by mistake.

How about using the PathIsURL function in the Windows API?
Update:
This is already wrapped in the Delphi RTL in the ShLwApi unit.

Have you had a look at "What is the best regular expression to check if a string is a valid URL"? It is not Delphi specific, but might get you started.

Perhaps some of the suggestions here might help.
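For the "adjust it and make it work" approach described in the question, here is a minimal sketch of the cleanup logic. It is written in Python rather than Delphi purely to show the steps, and the helper name `tidy_base_url` and its defaults are assumptions, not an existing library routine. It adds a missing scheme, collapses `..` path segments, and guarantees a trailing slash so the final page can be appended.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def tidy_base_url(raw: str, default_scheme: str = "http") -> str:
    """Best-effort cleanup of a configured base URL (hypothetical helper)."""
    text = raw.strip()
    # Add a scheme when it was left off, e.g. "example.com/files".
    if "://" not in text:
        text = f"{default_scheme}://{text}"
    parts = urlsplit(text)
    # Give up rather than guess when the result is still not an http(s) URL.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"cannot repair URL: {raw!r}")
    # Collapse "." and ".." segments so the path cannot climb above the root.
    path = posixpath.normpath("/" + parts.path.lstrip("/"))
    # Ensure a trailing slash so a file name can be appended directly.
    if not path.endswith("/"):
        path += "/"
    # Query string and fragment are dropped: only the base folder is wanted.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

The same steps should translate to Delphi string handling; checking the TLD is left out here because any list of valid TLDs goes stale quickly.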

Related

How to know what are all the possible parameters for a query string for a site?

I want to find out ALL the possible parameters for an existing website URL. Assume the site takes its parameters from the query string (and not from MVC-style routes, for example), something like:
http://www.foobar.com/p1&itemsPerPage=50&size=500
Let's say there are other parameters which I don't know exist, and I don't see them in the url at the moment. For example, parameters like max, day and OtherExoticVariable. Again, I don't know their names but want to know ALL of their names. Is there some way of requesting the server to respond with all possible url parameters?
I would prefer a method with Javascript that I could run quickly through a browser but could also do asp.net c# if necessary.
Thanks a lot!
Ray.
It is the script/app running on the server that decides what parameters are valid. Unless the app provides such a query mechanism you can't do it. The server has no idea what is valid and what isn't.
Not guaranteed to get you ALL query strings, but it is often helpful to Google
"foobar.com/p1& * ".
You will be able to see all the public occurrences of query strings for the foobar.com website.
(As the accepted answer says, there is no general method to access query strings unless the website provides an API.)
I do not think this is possible. Each Web application designer can decide on the parameters individually, and you only know them if you see them being used.
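Since the only parameters you can learn are the ones you see in use, the practical version of this is to gather URLs you have observed (from search results, site links, logs) and collect the names. A minimal sketch, assuming the URLs carry a normal `?name=value` query string; the function name and the sample URLs are made up for illustration:

```python
from urllib.parse import parse_qsl, urlsplit


def observed_parameter_names(urls):
    """Return the sorted set of query parameter names seen in the given URLs."""
    names = set()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values keeps names that appear with an empty value.
        for name, _value in parse_qsl(query, keep_blank_values=True):
            names.add(name)
    return sorted(names)


seen = observed_parameter_names([
    "http://www.foobar.com/p1?itemsPerPage=50&size=500",
    "http://www.foobar.com/p1?size=10&max=3",
])
```

This only reports what has been observed; it cannot reveal parameters the site accepts but never exposes.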

Any short URL service that you can POST variables on?

I work for a small SMS marketing company, where we're sending out text messages that each contain a unique code for the user (as a variable). My url is rather long, and I want to attach a unique variable to each one.
For example, the full URL might be:
http://www.mybigwebsiteurlishuge.com/more/more/?code={variable}
but I want it to be something like:
http://bit.ly/2398h?code={variable}
Anybody know any services that can do this? Otherwise I need to purchase small domain name just for this.
Thanks so much!
Most shortening services, including bit.ly, have APIs that you can use to shorten your URLs. You will have to use their API to get the shortened URL.
I kept on looking, and still couldn't find anything suitable, so I got a new 3-character domain name, and also made a redirecting script that changes the miniaturized variable names to the full ones. This works just as well really.
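The redirecting script mentioned above could look roughly like this. It is only a sketch of the idea, not the script that was actually written: the short parameter name `c`, the mapping table, and the short domain in the usage line are all assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Long target taken from the question; the short names are hypothetical.
LONG_BASE = "http://www.mybigwebsiteurlishuge.com/more/more/"
SHORT_TO_FULL = {"c": "code"}


def expand(short_url: str) -> str:
    """Build the long URL that a request for short_url should redirect to."""
    pairs = parse_qsl(urlsplit(short_url).query, keep_blank_values=True)
    # Swap each miniaturized name for its full name; unknown names pass through.
    full_pairs = [(SHORT_TO_FULL.get(name, name), value) for name, value in pairs]
    query = urlencode(full_pairs)
    return f"{LONG_BASE}?{query}" if query else LONG_BASE


target = expand("http://xy.example/?c=AB12")
```

A real script would answer the request with an HTTP redirect to `target`.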

Is there a way to disable email engines from automatically hyperlinking a URL?

One of my clients wants to stop a URL from being shown as a hyperlink. It has to be recognized as plain text. This is what I have tried:
ur<!comments>l
I have also tried to remove the <a></a> tag, as well as remove "http://" of the URL, none of them worked in Outlook. Outlook still recognized it as a hyperlink.
Anybody have any workaround here?
There is a zero-width non-breaking space (U+FEFF, written &#65279; in HTML) that I like to use.
I place it in strategic places so that the URL does not get recognized as a URL, like so: http://wwwdomain.com.
This strategy has worked for me across platforms and rendering clients. Its advantages are twofold: 1) it prevents the client from auto-rendering text as a link, and 2) unlike other "non-breaking" zero-width space ascii codes (ie ), it wraps the entire URL if your URL happens to need it (instead of just the parts after the zero-width space).
Try it out.
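If the email body is generated by a program, the insertion can be automated. A minimal sketch, assuming the zero-width no-break space U+FEFF written as the HTML entity `&#65279;` and assuming that "strategic places" means right after each dot; both choices are guesses to adjust after testing in the target clients:

```python
import html


def break_autolink_html(url: str, marker: str = "&#65279;") -> str:
    """Return HTML text for url with an invisible marker after every dot."""
    # Escape each piece so characters like & or < stay literal in the email.
    pieces = [html.escape(piece) for piece in url.split(".")]
    return ("." + marker).join(pieces)


snippet = break_autolink_html("www.domain.com")
```

Whether a given mail client still leaves such text unlinked is something to verify per client.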
Credit belongs to my coworker, actually. Seems to work in all clients that we tested.
www.websitename.<img src="" width="0" height="0">com
An empty image tag with 0 width and 0 height. Insert it between the dot and the following text (in this case "com").
After we tried several things, he somehow suffered from a moment of inspiration/brilliance.
No visible spacing between the characters. Not sure what will happen if you copy/paste the string into a browser directly, though. It served my purpose of not allowing email clients to automatically make it a hyperlink, though.
This one worked for me. It is a combination of Scott's answer and David K. Hess's comment.
Break your url using <span>. However, you need to break it in a way that they are not matched as url when the mail client scans it.
eg: http<span>://</span><span>google.</span>com
You can turn off auto-hyperlinking in general. Here is a tutorial for Outlook 2007:
Turn automatic hyperlinking on or off
I have a similar issue with words like "chequed.com" and "interviewing.com" that are creating a hyperlink in my messages when I do not want it to.
The first step I took was to edit the HTML link tags.. but there weren't any.
After that, I went to the text in the email and added a very small space by using a font of 8pt (I'm using an ESP, otherwise I would have gone with 1px).
This may help if you're having the same issue.
My solution for this is
http://...
I contacted Gmail's support and spoke with a department manager for Apple Care. This is expected behavior and cannot be prevented. These hacks no longer work, and if implemented could result in your IP being listed as a phishing operation. You're dancing around security issues here. I would suggest revising your content strategy.
The only thing you can do currently is wrap all email addresses in mailto links and phone numbers in tel links. There are no other options available as of 2017.
I had success with janusoo's solution for years until for some reason it began to introduce line breaks on some clients. I found that I could proceed with ​
www.websitename.​com
You might try using CSS to re-flow the text.
<p>www.example.<span style="float:left">http://</span>com/</p>
If the part with "http://" still gets marked as a URL, try breaking things up in different places.
One other trick would be to replace the periods with some other Unicode character that LOOKS like a period but actually isn't. For example, "⠄" (U+2804) is a Braille single-dot.
Alas (!) I don't have any Microsoft applications I can test this with, but good luck with it. :)
If you use . to replace your '.' in your hyperlinks you'll solve Outlook 2007 Hyperlinking the URL.

Deleting an Azure Blob in MVC 3

I'm trying to delete blobs in an mvc 3 application that uses azure storage.
I'm trying to pass the Uri of the blob which will be deleted to the controller, however an error is thrown:
A potentially dangerous Request.Path value was detected from the client (:)
I think this is from the https: part of the Uri and I need to parse it out, however I'm not sure how to do that. I'm wondering how to fix this error.
Is there a more graceful way to delete a blob from storage?
You must properly URL encode your urls. Here's an example of a badly encoded url:
http://foo.com/controller/action?param=http://bar.com
Here's how it should look:
http://foo.com/controller/action?param=http%3A%2F%2Fbar.com
Or maybe you are having an url of the form:
http://foo.com/controller/action/https://bar.com
which is even worse. If you want to use special characters in the Path portion of a URL you might find the following blog post useful.
If you want insecure content to get through, you can add [ValidateInput(false)] to your action. However, this switches off a check that is there for your security, so only do this if you are sure your code is secure. See the first answer in "A potentially dangerous Request.Form value was detected from the client".
I was able to fix it and I want to summarize the solution, since it requires bits from the other two answers and mostly from the Scott Hanselman blog post.
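To make the encoding step concrete, here is a small sketch of percent-encoding an inner URL before placing it in a query string. It uses Python's standard `urllib.parse.quote` only to illustrate the transformation (in ASP.NET you would use the framework's own URL-encoding helper); the function name `action_url` is made up.

```python
from urllib.parse import quote


def action_url(base: str, inner_url: str) -> str:
    """Append inner_url to base as a fully percent-encoded 'param' value."""
    # safe="" makes quote() encode ":" and "/" too, which it would
    # otherwise leave alone in the case of "/".
    return f"{base}?param={quote(inner_url, safe='')}"


encoded = action_url("http://foo.com/controller/action", "http://bar.com")
```

The result matches the well-formed example above: the colon becomes `%3A` and each slash becomes `%2F`.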
You need to do a few things to make this work:
Put the [ValidateInput(false)] on your action method.
Make sure your Url is properly encoded (an example is given in the above post) which is done when you use the blobVariableName.Uri.AbsoluteUri as the string to pass from your view to your controller, so you shouldn't have to do anything there.
Make sure your query string looks like
http://site/controller/action?blobid=http%3A%2F%2F... and NOT http://site/controller/action/http%3A%2F%2F... the latter won't work!
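The query-string form can be checked with a quick round trip. This is a language-neutral sketch in Python, not the MVC code itself: the parameter name `blobid` comes from the example above, while the blob URI and the helper names are invented for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def delete_link(site_action: str, blob_uri: str) -> str:
    """Build the link the view emits, with the blob URI as a query value."""
    return f"{site_action}?{urlencode({'blobid': blob_uri})}"


def blob_uri_from_request(url: str) -> str:
    """Recover the original blob URI on the receiving side."""
    return parse_qs(urlsplit(url).query)["blobid"][0]


uri = "https://acct.blob.example/container/file.txt"  # hypothetical blob URI
link = delete_link("http://site/controller/action", uri)
```

Because the URI travels in the query string, the path stays `/controller/action` and never contains the colon that triggers the Request.Path error.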
On a side note, since I started, our functional requirements changed and now we're storing information about each blob in the database, which allows me to pass parameters other than the blob's uri, which seems like a much safer way to play it.
A great deal of the community appears to be in agreement that it is a bad idea to pass uri's and to open up your application as to allow you to do so.

Why would I put ?src= in a link?

I feel dumb for not knowing this, but I see a lot of links in web pages and instead of this:
<a href="http://foo.com/">
...they use this:
<a href="http://foo.com/?src=bar.com">
Now I understand that the ?src= is telling something that this referral is coming from bar.com, but I don't understand why this needs to be called out explicitly. Can anyone shed some light on it for me? Is this something I need to include in my program generated links?
EDIT: Ok, sorry, I'm not being clear enough. I understand the GET syntax with a question mark and parameters separated by ampersands. I'm wondering what's this special src parameter? Why would one site link to another and tack an src parameter on the end even though there's no indication that the destination site uses this normally.
For example, on this page hover your mouse over the screenshot. The link URL is http://moms4mom.com/?src=stackexchangesites
But moms4mom.com is our site. Passing the src parameter does nothing, so why include it?
There are a few reasons that the src is being used explicitly. But in general, it is easier and more reliable to trust a query string to determine the referer[sic] than it is to trust the Referer header, since the latter is often broken, deliberately or not. On the other hand, browsers almost never break the query string in a url, since this, unlike referers, is pretty important for pages to function. Besides, a referer is often sent without any deliberate action on the part of the site doing the referring, which some users dislike.
The reason (I do it) is that popular analytics tools sometimes make it easier to filter on query strings than referrers.
There is no standard to the src parameter. Each site has its own and it's usually up to the site that gets the link to define how it wants to read it (as usually it's that site that's going to pay for the click).
The second is a dynamic link: a URL that server-side code (in ASP or PHP, for example) interprets as an instruction to do something, as in those Google URLs. I have never used this site (foo.com), though, so I don't know much about this particular parameter.
Depending on how the site processes its URL, you may or may not need to include the ?... information.
This is passed to the website, and the server can process it just like form input. Some sites require this - and build their navigation off a single page, using nothing but the "extra" stuff passed afterwards. If you're generating a link to a site like that, it will be required.
In other cases, this is just used to pass extra, unrequired info (such as advertising, tracking info, etc)... In those cases, you can leave it off.
Unfortunately, there's no way to know without trying whether you can remove the "extra" bits from the URL.
After reading some of your comments - I'll also say:
There is nothing special about the "src" field in a query string. The server is free to use it any way it wishes. Unless you know specific info about the server, you cannot assume it can be left out.
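If you do want to try a link without the extra parameter, you can strip it programmatically and test the result by hand. A small sketch with a hypothetical helper name, using only Python's standard `urllib.parse`:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def without_param(url: str, name: str) -> str:
    """Return url with every occurrence of the query parameter name removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    # Rebuild the URL; an empty query drops the "?" entirely.
    return urlunsplit(parts._replace(query=urlencode(kept)))


plain = without_param("http://foo.com/?src=bar.com", "src")
```

Whether the stripped link still behaves the same is up to the destination server, so it has to be checked case by case.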
The part after the ? is the query string. Different sites use it for different things, and it is usually used for passing information to the server side code for that URL, but can also be used in javascript.
For more info see Query String
