When I was coding my meta tag and trying to figure out what other companies have implemented, I noticed some of them have @@ instead of @. Does this make any difference?
<meta name="twitter:creator" content="@@https://twitter.com/company">
<meta name="twitter:site" content="@@company">
I always implement with only one @ sign.
I was wondering, could this actually have something to do with SEO strategy?
Update: "@ is sufficient. @@ at your own risk." - Twitter Engineer
After a cursory browsing of the Twitter Card documentation, I see only examples of a single at-sign (@) preceding content. In my experience, no other official convention or pattern exists, or is encouraged by the documentation.
One likely explanation for the redundancy could be confusion in the template. Suppose the following exists in your source:
<meta name="twitter:creator" content="@<?= $username; ?>">
If the $username variable already begins with an at-sign, the resulting output will contain two. Twitter may have no issue with this, depending on how they search the value for usernames. If they look for nothing more than an at-sign, followed by a valid username, @@jonathansampson is valid.
Searching GitHub also didn't yield examples of developers explicitly and unequivocally desiring to use @@, but instead a smattering of resources showing the above pattern: an at-sign followed by a variable (which could also contain its own at-sign).
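If the doubled at-sign does come from a template prepending one to a value that already has one, normalizing the value before rendering avoids it. Here is a minimal sketch in Ruby rather than PHP; the helper names twitter_handle and twitter_creator_tag are hypothetical, not part of any Twitter library.

```ruby
require "cgi"

# Hypothetical helper: return the username with exactly one leading at-sign,
# whether or not the stored value already starts with one (or several).
def twitter_handle(username)
  "@" + username.to_s.strip.sub(/\A@+/, "")
end

# Hypothetical helper: render the meta tag with the normalized handle.
def twitter_creator_tag(username)
  '<meta name="twitter:creator" content="' + CGI.escapeHTML(twitter_handle(username)) + '">'
end

puts twitter_creator_tag("jonathansampson")
puts twitter_creator_tag("@jonathansampson")
```

Both calls print the same tag with a single at-sign, so it no longer matters how the value was stored.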
I have the following ruby code in a model:
if privacy_policy_link.match(/#{domain}($|\/$)/)
errors.add(:privacy_policy_link, 'Link to a specific privacy policy page on your site instead of your homepage.')
end
This worked until a user tried to save a privacy policy link that looked like this:
https://example.com/about-me/privacy-policy-for-example-com/
The goal is that I don't want them linking to their base homepage (example.com or www.example.com etc) for this privacy policy link (for some random, complicated reasons I won't go into). The link provided above should pass the matcher (meaning that because they are linking to a separate page on their site, it shouldn't be considered a match and they shouldn't see any errors when saving the form), but because they reference their base domain in the second half of the url, it comes up as a match.
I cannot, for the life of me, figure out the correct regex on rubular to get this url to pass the matching algorithm. And of course I cannot just ask the user to rename their privacy policy link to remove the "com" from the end, even though this: https://example.com/about-me/privacy-policy-for-example would pass. :)
I would be incredibly grateful for any assistance that could help me understand how to solve this problem!
Rubular link: http://rubular.com/r/G5OmYfzi6t
Your issue is that the . character matches any character, so the . in example.com matched the - in example-com at the end of the path.
If you anchor the pattern to the beginning of the line, it will match correctly without having to escape the . in the domain.
if privacy_policy_link.match(%r{^(https?://|)(www\.)?#{domain}($|/$)})
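A stricter variant, as a sketch only: escape the domain with Regexp.escape so its dots are literal, and use \A and \z so the pattern has to cover the whole string rather than a single line. The helper name homepage_link? is made up for illustration; domain is assumed to be a bare host such as example.com.

```ruby
# Hypothetical helper: true when the link is just the site's homepage,
# with or without scheme, "www." and trailing slash.
def homepage_link?(link, domain)
  pattern = %r{\A(?:https?://)?(?:www\.)?#{Regexp.escape(domain)}/?\z}i
  !!(link.to_s =~ pattern)
end

puts homepage_link?("https://example.com/", "example.com")
puts homepage_link?("https://example.com/about-me/privacy-policy-for-example-com/", "example.com")
```

In the model, the validation would then add the error only when homepage_link?(privacy_policy_link, domain) is true.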
Why doesn't Spring Security provide any XSS filter to clean form input values?
According to this ticket, such an XSS filter is a low priority:
https://jira.spring.io/browse/SEC-2167?jql=text%20~%20%22xss%22
(although the ticket only talks about the URL query string; sanitizing POST params would also be required)
In my opinion it would be really useful if Spring provided such a filter instead of everyone building their own. This is a recurring problem.
XSS is best handled at the output stage by encoding. That is, store everything in your database as is (and yes, storing <script> is fine), but when you output it, encode it correctly for the context it is output in. For HTML this would be &lt;script&gt;; if your output context were plain text, you would output it as is, <script> (assuming the same character set encoding is used). Side note: use parameterised queries or equivalent when storing to your database to avoid SQL injection; the text stored should still exactly match what was entered.
Microsoft attempts to block inputs that look like XSS via their request validation feature in ASP.NET. However, this isn't very effective and flaws are found quite often. Similar approaches from other frameworks are doomed to fail.
The reason that this is much better is that it makes things much more simple. Imagine if StackOverflow didn't allow HTML or script tags - the site would not be functional as a place for people to post code snippets.
You can use input validation as second line of defence. For example, if you are asking the user to enter their car registration you would only want to allow alphanumerics and space to be entered. However, for more complex fields it is often difficult to restrict input to a safe set as output context is unknown at this stage.
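As a rough illustration of that second line of defence, here is a whitelist check for the car registration example. The constant and method names are hypothetical, and the 10-character limit is an assumption added for the sketch, not something stated above.

```ruby
# Assumed format: 1 to 10 characters, letters, digits and spaces only.
REGISTRATION_FORMAT = /\A[A-Za-z0-9 ]{1,10}\z/

# Hypothetical validator: accept the value only if every character is whitelisted.
def valid_registration?(value)
  value.to_s.match?(REGISTRATION_FORMAT)
end

puts valid_registration?("AB12 CDE")
puts valid_registration?('" onmouseover="alert(1)')
```

This narrows what can be stored, but the value still needs encoding when it is output.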
Say your language filtered < and > characters. However you were outputting user input into the following context.
<img src="foo.jpg" alt="USER-INPUT" />
An XSS attack is possible by entering " onmouseover="alert('xss') because it would be rendered as
<img src="foo.jpg" alt="" onmouseover="alert('xss')" />
Similar problems would ensue if you were outputting to JavaScript server-side. This is why it should be up to the developer to select the correct encoding type when using user controlled values.
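The img example can be reproduced in a few lines of Ruby. This is only a sketch: both method names are made up, and CGI.escapeHTML from the standard library stands in for whatever encoder your framework provides for HTML attribute values.

```ruby
require "cgi"

# Naive approach from the example: strip angle brackets only.
def img_with_filtered_alt(user_input)
  '<img src="foo.jpg" alt="' + user_input.delete("<>") + '" />'
end

# Context-aware approach: HTML-encode the value for the attribute it lands in.
def img_with_encoded_alt(user_input)
  '<img src="foo.jpg" alt="' + CGI.escapeHTML(user_input) + '" />'
end

attack = %q(" onmouseover="alert('xss'))
puts img_with_filtered_alt(attack) # the attribute is closed early and the handler is injected
puts img_with_encoded_alt(attack)  # the quotes are encoded, so the value stays inside alt
```

The filtered version still lets the attack through because it contains no angle brackets at all; only encoding the quotes keeps the input inside the attribute.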
I'm designing a permalink system and I just noticed that Twitter and Hipmunk both prefix their permalinks with #!. I was wondering why this is, and if the exclamation point in particular is there for a reason. Wouldn't #/ work just as well, since they're no doubt using a framework that lets them redirect queries to certain templates with a regex URL parser?
http://www.hipmunk.com/#!BOS.SEA,Dec15.Jan02
http://twitter.com/#!/dozba
My only guess is it's because browsers use # to link to an anchor element. Is this why the exclamation point is appended?
This is done to make an "AJAX" page crawlable [by google] for indexing -- It does not affect the other well-defined semantics of the fragment identifier at all!
See Making AJAX Applications Crawlable: Getting Started
Briefly, the solution works as follows: the crawler finds a pretty AJAX URL (that is, a URL containing a #! hash fragment). It then requests the content for this URL from your server in a slightly modified form. Your web server returns the content in the form of an HTML snapshot, which is then processed by the crawler. The search results will show the original URL.
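To make the "slightly modified form" concrete: in Google's AJAX crawling scheme (which Google has since deprecated), the crawler moves everything after #! into a query parameter named _escaped_fragment_. That parameter name comes from Google's specification, not from the quoted summary, and the helper name crawler_url below is hypothetical. A minimal sketch of the rewrite:

```ruby
# Hypothetical helper: map a "pretty" #! URL to the URL a crawler would request.
def crawler_url(pretty_url)
  base, fragment = pretty_url.split("#!", 2)
  return pretty_url if fragment.nil? # no hashbang: nothing to rewrite

  # Percent-encode characters that would be ambiguous inside a query string.
  escaped = fragment.gsub(/[%#&+\s]/) { |ch| format("%%%02X", ch.ord) }
  separator = base.include?("?") ? "&" : "?"
  "#{base}#{separator}_escaped_fragment_=#{escaped}"
end

puts crawler_url("http://twitter.com/#!/dozba")
puts crawler_url("http://www.hipmunk.com/#!BOS.SEA,Dec15.Jan02")
```

The server sees the fragment this way because browsers and crawlers never send the part after # in an HTTP request.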
I am sure other search-engines are also following this lead/protocol.
Happy coding.
Also, it is actually perfectly valid, at least per HTML5, to have an element with an ID of "!foo", so the reasoning in the post is invalid. See the article "The id attribute just got more classy":
HTML5 gets rid of the additional restrictions on the id attribute. The only requirements left — apart from being unique in the document — are that the value must contain at least one character (can’t be empty), and that it can’t contain any space characters.
My guess is that both pages use this in their JavaScript to distinguish between # (a link to an anchor) and their custom #!, which loads some additional content using Ajax.
In that case pretty much everything else would work after the # sign.
Can I use ActionView::Helpers::SanitizeHelper#sanitize on user-entered text that I plan on showing to other users? E.g., will it properly handle all cases described on this site?
Also, the documentation mentions:
Please note that sanitizing user-provided text does not guarantee that the resulting markup is valid (conforming to a document type) or even well-formed. The output may still contain e.g. unescaped ’<’, ’>’, ’&’ characters and confuse browsers.
What's the best way to handle this? Pass the sanitized text through Hpricot before displaying?
Ryan Grove's Sanitize goes a lot farther than Rails 3 sanitize. It ensures the output HTML is well-formed and has three built-in whitelists:
Sanitize::Config::RESTRICTED
Allows only very simple inline formatting markup. No links, images, or block elements.
Sanitize::Config::BASIC
Allows a variety of markup including formatting tags, links, and lists. Images and tables are not allowed, links are limited to FTP, HTTP, HTTPS, and mailto protocols, and a rel="nofollow" attribute is added to all links to mitigate SEO spam.
Sanitize::Config::RELAXED
Allows an even wider variety of markup than BASIC, including images and tables. Links are still limited to FTP, HTTP, HTTPS, and mailto protocols, while images are limited to HTTP and HTTPS. In this mode, rel="nofollow" is not added to links.
Sanitize is certainly better than the "h" helper. Instead of escaping everything, it actually allows the html tags that you specify. And yes, it does prevent cross-site scripting because it removes javascript from the mix entirely.
In short, both will get the job done. Use "h" when you don't expect anything other than plaintext, and use sanitize when you want to allow some html, or you believe people may try to enter it. Even if you disallow all tags with sanitize, it'll "pretty up" the output by removing them instead of escaping them as "h" does.
As for incomplete tags: You could run a validation on the model that passes html-containing fields through hpricot, but I think this is overkill in most applications.
The best course of action depends on two things:
Your rails version (2.x or 3.x)
Whether your users are supposed to enter any html at all on the input or not.
As a general rule, I don't allow my users to input html - instead I let them input textile.
On rails 3.x:
Output is html-escaped by default in views. You don't have to do anything, unless you want your users to be able to send some html. In that case, keep reading.
This railscast deals with XSS attacks on rails 3.
On rails 2.x:
If you don't allow any html from your users, just protect your output with the h method, like this:
<%= h post.text %>
If you want your users to send some html: you can use rails' sanitize method or HTML::StathamSanitizer
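For the no-html case, you can see what h does without Rails: it is an alias of ERB::Util.html_escape in Ruby's standard library. A small sketch, where the method name escaped_post_text is made up:

```ruby
require "erb"

# Hypothetical helper: escape post text the way <%= h post.text %> would.
def escaped_post_text(text)
  ERB::Util.html_escape(text)
end

puts escaped_post_text("<b>hi</b> & <script>alert(1)</script>")
# prints: &lt;b&gt;hi&lt;/b&gt; &amp; &lt;script&gt;alert(1)&lt;/script&gt;
```

Every tag is shown to the reader as literal text, which is why this only suits input that is meant to be plain text.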
What should I use:
/findby/name/{first}_{last}
/findby/name/{first}-{last}
/findby/name/{first};{last}
/findby/name/first/{first}/last/{last}
etc.
The URI represents a Person resource with 1 name, but I need to logically separate the first from the last to identify each. I kind of like the last example because I can do:
/findby/name/first/{first}
/findby/name/last/{last}
/findby/name/first/{first}/last/{last}
You could always just accept spaces :-) (escaped in the URL as %20)
But my preference is to just use dashes (-); they look nicer in the URL. If you need to be able to query on first or last name separately, though, the last example is better, as you noted.
Why not use + for space?
I am at a loss: dashes, minuses, underscores, %20... why not just use +? This is how spaces are normally encoded in query parameters. Yes, you can use %20 too, but why? It looks ugly.
I'd do
/personNamed/Joe+Blow
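One caveat worth checking before relying on +: the plus-for-space convention comes from form encoding of query strings, and whether a server also decodes + as a space inside a path segment depends on the framework. A small sketch of the two encodings using Ruby's standard library; the wrapper names form_encode and percent_encode are made up.

```ruby
require "uri"
require "erb"

# Form encoding (application/x-www-form-urlencoded): space becomes "+".
def form_encode(value)
  URI.encode_www_form_component(value)
end

# Plain percent-encoding: space becomes "%20".
def percent_encode(value)
  ERB::Util.url_encode(value)
end

puts form_encode("Joe Blow")    # Joe+Blow
puts percent_encode("Joe Blow") # Joe%20Blow
```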
I like using "_" because it is the most similar character to space that keeps the URL readable.
However, the URLs you provided don't seem really RESTful. A URL should represent a resource, but in your case it represents a search query. So I would do something like this:
/people/{first}_{last}
/people/{first}_{last}_(2) - in case there are duplicate names
In this case you have to store the slug ({first}_{last}, {first}_{last}_(2)) for each user record. Another option is to prepend the ID, so you don't have to bother with slugs:
/people/{id}-{first}_{last}
And for search you can use non-RESTful URLs:
/people/search?last={last}&first={first}
These would display a list of search results, while the URLs above would display the page for a particular person.
I don't think there is any use of making the search URLs RESTful, users will most likely want to share links to a certain person's page and not search result pages. As for the search engines, avoid having the same content for multiple URLs, and you should even deny indexing of your search result pages in robots.txt
For searching:
/people/search?first={first}&last={last}
/people/search?first=george&last=washington
For resource paths:
/people/{id}-{first}-{last}
/people/35-george-washington
If you are using Ruby on Rails v3 in standard configuration, here's how you can do it.
# set up the /people/{param} piece
# config/routes.rb
My::Application.routes.draw do
  resources :people
end
# set up that {param} should be {id}-{first}-{last}
# app/models/person.rb
class Person < ActiveRecord::Base
  def to_param
    "#{id}-#{to_slug(first_name)}-#{to_slug(last_name)}"
  end

  private

  # to_slug was left undefined in the snippet; ActiveSupport's
  # String#parameterize is assumed here as one way to build it
  def to_slug(value)
    value.to_s.parameterize
  end
end
Note that your suggestion, /findby/name/first/{first}/last/{last}, is not restful. It does not name resources and it does not name them succinctly.
Whatever you choose, consider two constraints first:
As you'll never know how well the developer, or the device the client runs on, handles URL encoding, I always try to limit myself to the table of safe characters, as found in the excellent rant (Please) Stop Using Unsafe Characters in URLs
Also, consider the client consuming the API. Can the whole structure be easily represented and accessed in the client-side programming language? What special characters would this requirement leave us with? For example, a $ is fine in JavaScript variable names and thus directly accessible in the parsed result, but a PHP client will still have to use a more complex (and potentially more confusing) notation, $userResult->{'$mostVisited'}->someProperty... that's shooting yourself in the foot! So for those two (and a couple of other programming environments) underscore seems the only valid option.
Otherwise I mostly agree with @yfeldblum's response: I'd distinguish between a search endpoint and the actual unique resource lookup. It feels more REST to me, but more importantly, the two have a significant cost difference on your API server. This way you can more easily tell them apart and, for example, charge more for the search endpoint or rate limit it, should you ever need to.
To be pragmatic, as opposed to a "RESTafarian": the mentioned approach /people/35-george-washington could (and should imho) basically respond to just the id, so if you want a named, urlsafe-for-dummies link, list the reference as /people/35_george_washington. Other ideas could be /people/35/#GeorgeWashington (so breaking tons of RFCs) or /people/35_GeorgeWashington; the API wouldn't care.
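A sketch of "respond to just the id": take only the leading digits of the path segment and ignore whatever name follows. The helper name id_from_param is hypothetical.

```ruby
# Hypothetical helper: extract the numeric id from the start of a path segment.
# Returns nil when the segment does not begin with digits.
def id_from_param(param)
  digits = param.to_s[/\A\d+/]
  digits && digits.to_i
end

puts id_from_param("35-george-washington")
puts id_from_param("35_george_washington")
puts id_from_param("35_GeorgeWashington")
```

All three variants resolve to person 35, so the readable part of the link can change without breaking existing links.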