Using "&" in a slug - url

I have a slug field in the database which has been created from a name containing &:
name: Hansel & Gretel
slug: hansel-&-gretel
Doctrine removes characters that are not URL-friendly from the name. Well, & is definitely URL-friendly.
Now, when I generate a link to the fairytale with Symfony's link_to() I get:
http://myfairytalesite.ft/tale/hansel-%26amp%3B-gretel
So before the & gets URL-encoded, it gets changed to an HTML entity.
When I navigate to this URL I get a 404 error because the slug is only URL-decoded, so the route (sfDoctrineRoute) tries to find an object with the slug hansel-&amp;-gretel, which obviously does not exist in the database.
My question: what can I do about it?
The name is imported from an external source so I can't change it manually.
I thought of adding html_entity_decode to the action where I use the slug, but it feels like attaching a wing to a plane with duct tape...
I even tried digging into Symfony's internals to see what I could change there, and I went so deep I was afraid I was going to wake up the Balrog, but I couldn't find anything interesting. :/
Has anyone had a similar problem? Could it be a Symfony or Doctrine bug?

To avoid this issue, I replace any & symbols, as well as &amp; entities, with "and" when I save my slugs.
E.g.
hansel-and-gretel
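A minimal sketch of that replacement, written in Ruby only for illustration (the question itself is about Symfony and Doctrine). The helper name slugify is hypothetical, and the rule that every other run of non-alphanumeric characters becomes a dash is an assumption, not Doctrine's actual behaviour:

```ruby
require 'cgi'

# Hypothetical helper: build a slug in which "&" (raw, or as the HTML
# entity "&amp;") becomes the word "and".
def slugify(name)
  text = CGI.unescapeHTML(name)      # "&amp;" -> "&"
  text = text.gsub('&', ' and ')     # "&" -> " and "
  text.downcase
      .gsub(/[^a-z0-9]+/, '-')       # any other run of characters -> "-"
      .gsub(/\A-+|-+\z/, '')         # trim leading/trailing dashes
end

puts slugify('Hansel & Gretel')      # hansel-and-gretel
puts slugify('Hansel &amp; Gretel')  # hansel-and-gretel
```

Because the entity is decoded before the replacement, both spellings of the name end up with the same slug, so the link helper has nothing left to escape.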

Related

Ruby On Rails - Use "Format" As A URL GET Parameter?

I have a search page where I update the URL params on the page as filters are added or removed by the user. This allows me to deep link into the page (i.e. going to /search?location=new+york&time=afternoon will set the location and time filters).
I also have a filter named format. I noticed that passing in ?format=whatevervalue to the URL and then reloading the page with that param causes Rails to return a Completed 406 Not Acceptable error. It seems that format is a reserved Rails URL parameter.
Is there any way to unreserve this parameter name for a particular endpoint?
In the context of a URL in Ruby on Rails, there are at least four reserved parameter names: controller, action, id, and format.
You cannot use these keys for anything other than their intended purpose.
If you try to, you will override the value that Rails sets internally. In your example, setting ?format=whatevervalue overrides the default format (html), so your application tries to find and render a whatevervalue template instead of the HTML template. This will obviously not work.
Fun fact: Instead of using the default Rails path format like /users/123/edit you could use query parameters instead like this: /?controller=users&id=123&action=edit&format=html.
My suggestion is: Do not try to fight Rails conventions. Whenever you try to work around basic Rails conventions it will hurt you later on, because it makes updates more difficult, common gems might break, and unexpected side effects will happen. Just use another name for that parameter.
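As a rough illustration of "use another name", here is a plain-Ruby sketch of building and re-reading the deep link. The replacement name file_format and the filter values are made up for the example; only the /search path comes from the question:

```ruby
require 'uri'

# Hypothetical filter state; "file_format" is an assumed replacement
# name for the filter that used to be called "format".
filters = { location: 'new york', time: 'afternoon', file_format: 'pdf' }

# Build the query string for the deep link.
query = URI.encode_www_form(filters)
url = "/search?#{query}"
puts url  # /search?location=new+york&time=afternoon&file_format=pdf

# Parse it back, as the page would when restoring its filters.
restored = URI.decode_www_form(query).to_h
puts restored['file_format']  # pdf
```

Since file_format is not a key Rails reserves, it travels through the query string like any other filter.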

Regex URL validation in Ruby

I have the following ruby code in a model:
if privacy_policy_link.match(/#{domain}($|\/$)/)
  errors.add(:privacy_policy_link, 'Link to a specific privacy policy page on your site instead of your homepage.')
end
This worked until a user tried to save a privacy policy link that looked like this:
https://example.com/about-me/privacy-policy-for-example-com/
The goal is that I don't want them linking to their base homepage (example.com or www.example.com etc.) for this privacy policy link (for some random, complicated reasons I won't go into). The link provided above should pass the matcher (meaning that because they are linking to a separate page on their site, it shouldn't be considered a match and they shouldn't see any errors when saving the form), but because they reference their base domain in the second half of the URL, it comes up as a match.
I cannot, for the life of me, figure out the correct regex on Rubular to get this URL to pass the matching algorithm. And of course I cannot just ask the user to rename their privacy policy link to remove the "com" from the end, because this: https://example.com/about-me/privacy-policy-for-example would pass. :)
I would be incredibly grateful for any assistance that could help me understand how to solve this problem!
Rubular link: http://rubular.com/r/G5OmYfzi6t
Your issue is that an unescaped . in a regex matches any character, so it matched the - in example-com.
If you anchor the pattern to the beginning of the line, it will match correctly without having to escape the . in the domain.
if privacy_policy_link.match(%r{^(http[s]?://|)(www.)?#{domain}($|/$)})
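A self-contained variation on the line above, as a sketch rather than the answer's exact code. The helper name homepage_link? is hypothetical, and two things differ from the answer by choice: the domain goes through Regexp.escape so its dots are literal, and \A and \z anchor the whole string instead of a single line:

```ruby
# Hypothetical helper: true when the link points at the bare homepage of
# `domain` (with or without scheme, "www." and a trailing slash).
def homepage_link?(link, domain)
  pattern = %r{\A(https?://)?(www\.)?#{Regexp.escape(domain)}/?\z}i
  link.match?(pattern)
end

puts homepage_link?('https://www.example.com/', 'example.com')
# true
puts homepage_link?('https://example.com/about-me/privacy-policy-for-example-com/', 'example.com')
# false
```

In the model, the validation error would then be added only when homepage_link? returns true.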

acts_as_taggable_on url friendly tag names

Is there a way to make the tag name on acts_as_taggable_on to be URL friendly?
For example, at the moment I have 'tags/foo' and 'tags/bar' working great. However when I add spaces to the name such as 'rabbits foot' the url is 'tags/rabbits%20foot'. I'd like to replace that %20 with a dash.
Thanks in advance!
UPDATE
I just noticed that Stack Overflow actually uses a very similar or identical way of doing tags to what I have in mind.
Take a look at this
https://github.com/arturaz/acts_as_taggable_on_steroids
It has a way to change multiple words into a slug (e.g. rabbits foot into rabbits-foot). There are also manual, non-plugin ways to do this (it looks somewhat outdated).
Decided to kind of cheat and use the id+tag.name just to get past this hump for now. Will revisit when I have more time.
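For reference, the id+tag.name workaround can be sketched in plain Ruby as below. The Tag struct and both helper names are hypothetical stand-ins, not part of acts_as_taggable_on:

```ruby
# Hypothetical stand-in for a tag record; only id and name are assumed.
Tag = Struct.new(:id, :name)

# Build a URL segment such as "7-rabbits-foot" from the id and the name.
def tag_param(tag)
  slug = tag.name.downcase.strip.gsub(/\s+/, '-')
  "#{tag.id}-#{slug}"
end

# Recover the id: String#to_i reads the leading digits and stops at "-".
def tag_id_from(param)
  param.to_i
end

tag = Tag.new(7, 'rabbits foot')
param = tag_param(tag)
puts "tags/#{param}"     # tags/7-rabbits-foot
puts tag_id_from(param)  # 7
```

The lookup still happens by id, so the readable part of the URL never has to be converted back into the original tag name.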

dynamic seo title for news articles

I have a news section where the pages resolve to urls like
newsArticle.php?id=210
What I would like to do is use the title from the database to create SEO-friendly titles like
newsArticle/joe-goes-to-town
Any ideas how I can achieve this?
Thanks,
R.
I suggest you actually include the ID in the URL, before the title part, and ignore the title itself when routing. So your URL might become
/news/210/joe-goes-to-town
That's exactly what Stack Overflow does, and it works well. It means that the title can change without links breaking.
Obviously the exact details will depend on what platform you're using - you haven't specified - but the basic steps will be:
When generating a link, take the article title and convert it into something URL-friendly; you probably want to remove all punctuation, and you should consider accented characters etc. Bear in mind that the title won't need to be unique, because you've got the ID as well
When handling a request to anything starting with /news, take the next part of the path, parse it as an integer and load the appropriate article.
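The two steps above can be sketched as follows. The sketch is in Ruby purely for illustration, since the question does not name a platform. All helper names are hypothetical, and the slug rules (strip accents, turn punctuation into dashes) are one reasonable choice among several:

```ruby
# Hypothetical helpers; the /news/<id>/<slug> layout follows the example
# URL above, everything else is assumed for illustration.

# Turn a title into a URL-friendly slug: strip accents, drop punctuation.
def slug_for(title)
  title.unicode_normalize(:nfd)   # split accented letters into letter + mark
       .gsub(/\p{Mn}/, '')        # remove the combining marks
       .downcase
       .gsub(/[^a-z0-9]+/, '-')   # punctuation and spaces -> "-"
       .gsub(/\A-+|-+\z/, '')     # trim leading/trailing dashes
end

def news_path(id, title)
  "/news/#{id}/#{slug_for(title)}"
end

# Routing only looks at the id; the slug part is ignored.
def article_id_from(path)
  match = path.match(%r{\A/news/(\d+)(?:/|\z)})
  match && match[1].to_i
end

puts news_path(210, 'Joe Goes to Town!')             # /news/210/joe-goes-to-town
puts article_id_from('/news/210/joe-goes-to-town')   # 210
puts article_id_from('/news/210/an-older-title')     # 210
```

The last call shows why a renamed article does not break old links: only the integer is used to load the article.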
Assuming you are using PHP and can alter your source code (this is quite mandatory to get the article's title), I'd do the following:
First, you'll need to have a function (or maybe a method in an object-oriented architecture) to generate the URLs for you in your code. You'd supply the function with the article object or the article ID and it returns the friendly URL with the ID and the friendly title.
Basically function url(Article $article) => URL.
You will also need some URL rewriting rules to remove the PHP script from the URL. For Apache, refer to the mod_rewrite documentation for details (RewriteEngine, RewriteRule, RewriteCond).

How to avoid conflict when not using ID in URLs

I often see (rewritten) URLs without an ID in them, as on some Wordpress installations. What is the best way to achieve this?
Example: site.com/product/some-product-name/
Maybe keep an array of page names and IDs in a cache, to avoid a DB query on every page request?
How can conflicts be avoided, and what other issues come with using URLs without IDs?
Using an ID presents the same conundrum, really--you're just checking for a different value in your database. The "some-product-name" part of your URL above is also something unique. Some people call them slugs (Wordpress, also permalinks). So instead of querying the database for a row that has the particular ID, you're querying the database for a row that has a particular slug. You don't need to know the ID to retrieve the record.
As long as product names are unique it shouldn't be an issue. It won't take any longer (at least not significantly) to look up a product by unique name than by numeric ID, as long as the column is indexed.
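When two products would end up with the same slug, one common way out is to add a numeric suffix at save time. A minimal sketch, assuming an in-memory Set in place of the unique, indexed slug column; the helper name unique_slug is hypothetical:

```ruby
require 'set'

# Hypothetical helper: `taken` stands in for the unique slug column.
def unique_slug(base, taken)
  candidate = base
  suffix = 2
  # Append -2, -3, ... until the slug is free.
  while taken.include?(candidate)
    candidate = "#{base}-#{suffix}"
    suffix += 1
  end
  taken << candidate
  candidate
end

taken = Set.new
puts unique_slug('some-product-name', taken)  # some-product-name
puts unique_slug('some-product-name', taken)  # some-product-name-2
puts unique_slug('some-product-name', taken)  # some-product-name-3
```

In a real application the membership check would be a query against the indexed column, with a unique constraint as the final safeguard against two requests racing for the same slug.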
Wordpress has a field in the wp_posts table for the slug. When you create the post, it creates a slug from the post title (if that's how you have it configured), replacing spaces with dashes (or I think you can set it to underscores). It also takes out apostrophes, commas, and the like. I believe it also limits the overall length of the slug.
So, in short, it isn't dynamically decoding the URL into the post's title--there's a field in the table that matches the URL version of the post name directly.
As you may or may not know, the URLs are being re-written with Apache's mod_rewrite module. As mentioned here, Wordpress is, in the background, assigning a slug after sanitizing the title or post name.
But, to answer your question, what you're describing is Wordpress' "Pretty Permalinks" feature, and you can learn more about it in the Wordpress codex. Newer versions of Wordpress do the rewriting internally (no .htaccess editing, wp_rewrite instead), which is why you'll see the same ruleset for any permalink structure.
Though, if you do some digging you can find the old rewrite rules. For example:
RewriteRule ^([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/?$ /index.php?year=$1&monthnum=$2&day=$3 [QSA,L]
This rule takes a URL like /2008/01/01/ and directs it to /index.php?year=2008&monthnum=01&day=01 (and loads a date category).
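To make the capture groups concrete, here is the same pattern applied in Ruby. This is only an illustration of what the rule matches, not how Apache or Wordpress implement it. The names DATE_RULE and rewrite are hypothetical, and the QSA flag (appending the original query string) is left out:

```ruby
# The same pattern as the RewriteRule above. Ruby's \A and \z are used in
# place of ^ and $ so that the whole string must match.
DATE_RULE = %r{\A([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/?\z}

def rewrite(path)
  # Assumption: the rule lives in .htaccess, where the leading "/" is not
  # part of the string being matched, so it is stripped first.
  match = DATE_RULE.match(path.sub(%r{\A/}, ''))
  return path unless match  # no match: leave the path untouched

  year, month, day = match.captures
  "/index.php?year=#{year}&monthnum=#{month}&day=#{day}"
end

puts rewrite('/2008/01/01/')                 # /index.php?year=2008&monthnum=01&day=01
puts rewrite('/product/some-product-name/')  # /product/some-product-name/
```

A slug URL falls straight through this rule, which is consistent with the point that the slug is resolved by a database lookup rather than by pattern matching.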
But, as mentioned, a page like product-name exists only because Wordpress already sanitized the post title and stored it as a field in the database.

Resources