Auth-code with A-Za-z0-9 to use in an URL parameter - ruby-on-rails

As part of a web application I need an auth-code to pass as a URL parameter.
I am currently using (in Rails) :
Digest::SHA1.hexdigest((object_id + rand(255)).to_s)
Which provides long strings like :
http://myapp.com/objects/1?auth_code=833fe7bdc789dff996f5de46075dcb409b4bb4f0
However it is too long and I think I might be able to "compress" this chain using more legal characters in an URL like the whole uppercase and lowercase alphabet in addition to numbers.
Do you have a code snipplet which does just that ?

your_auth_code = Digest::SHA1.hexdigest((object_id + rand(255)).to_s)
your_shortened_code = your_auth_code.to_i(16).to_s(36)
Converts your auth_code from base 16 (hexadecimal) to base 36 which uses [0-9a-z]
Personally I'd just cut the code in two if you feel it's too long.

Courtesy of a coworker of mine:
CHARS = [*'a'..'z'] + [*'A'..'Z'] + [*0..9]
def create_token
self.token = (0..9).map { CHARS[rand(CHARS.size)] }*''
end
There's also one that uses a bunch ascii characters from range 32+, but it isn't suitable for your use case (urls) due to illegal characters, but you might want to use for password salts, etc. This one courtesy of James Buck:
Array.new(32) { 32 + rand(95) }.pack("C*")
With those two snippets you can probably customize it for your needs.

What gpaul is getting at is that hash functions are still hash functions even if they're truncated, there's just a higher chance of collision though with only 10 bits it's still quite a low rate of collision. If you look at bit.ly for instance their hashes are completely miniscule but as you noted they're using base-32 instead of base-16, it doesn't really matter that much.
What's important is for you to ask what's at risk if people collide, because even with full SHA1 there's still the chance (cryptographically impossible). If there's really not a huge danger I think you could go down to 5-10 characters.
But the question still remains of why it matters. In your emails presumably you're sending a link which people just click on correct? There may be a better option entirely if you can tell us why the url is too long.

That is correct : my app has Users which click a link on an email containing an auth code.
When the user clicks the link, he ends up on the webapp but he is not redirected. The auth code will stay in the URL bar.
Each one of my users has an auth code. What's at stake if collision occur is that two users cannot be distinguished between each other.
Thanks to your very valuable input I was able to figure out what to type in google to get info on that topic : "base 62".
So I found the base62 gem : http://github.com/jtzemp/base62
And now, my formula is :
Digest::SHA1.hexdigest((object_id + rand(255)).to_s).to_i(16).base62_encode.slice(0..10)
which gives me an auth_code like : Fw1eDr701PY
Its a good compromise. If my app conquers the world, I can still add a DB lookup to avoid duplicates but for now I will stick to it.

Related

What language is this Salesforce code that I need to wrap?

I'm working on a Salesforce coding issue. Let me preface this by saying I'm not a developer or Salesforce expert.
What language is this?
Data Type FormulaThis formula references multiple objects
IF (Fulfillment_Submission_Form_URL__c <> "" && CONTAINS(Fulfillment_Submission_Form_URL__c, "qualtrics"),
Fulfillment_Submission_Form_URL__c &
(IF (CONTAINS(Fulfillment_Submission_Form_URL__c,"?SID="), "&", "?")) &
(IF (CONTAINS(TEXT(Type__c), "Site Visit"),
"ContactId="&Statement_of_Work__r.Contractor_Contact__c&
"&CoachType="&SUBSTITUTE(Statement_of_Work__r.Work_Type__r.Name," ","%20")&
"&CoachName="&SUBSTITUTE(Statement_of_Work__r.Contractor_Name__c," ","%20")&
"&InitPartId="&Initiative_Participation__r.Id&
"&InstitutionName="&substitute(substitute(SUBSTITUTE(Institution_Name__c," ","%20"),")",""),"(","")&
"&AccountId="&Initiative_Participation__r.Participating_Institution__r.Id&
"&TodaysDate="&TEXT(TODAY())&
"&SOWLineItemId="&Id&
"&LeaderCollege="&Initiative_Participation__r.ATD_Leader_College_Status__c&
"&SVRCompleted="&TEXT(Count_of_Site_Visit_Fulfillments__c)&
"&SVRRequired="&TEXT(Number_of_Work_Units_Allocated__c),
IF (CONTAINS(TEXT(Type__c), "Feedback"),
"InitPartId="&Initiative_Participation__r.Id&
"&SOWLineItemId="&Id&
"&ReportYear="&Statement_of_Work__r.SOW_Year__c&
"&UserId="&Contractor_User_Id__c&
"&InstitutionName="&substitute(substitute(SUBSTITUTE(Institution_Name__c," ","%20"),")",""),"(",""),
"")
))
,"")
Essentially it's pulling a link from another product we've integrated it with. We then take the basic link and reformat it to add parameters.
The problem is when it pulls in some parameters (ex: CoachName) the Coach entered their name in strange formats like: John (Coach) Doe.
So when the script outputs a URL that includes parameters it breaks at the &CoachName=John%20(Coach)% portion of the URL. Any easy way to work around this by modifying the script? Unfortunately we DO need that (Coach) identifier because the system we push to grabs that as well.
It's formula syntax, I'd compare it to Excel-like formulas. There's self-paced training if you don't want to read documentation. And as it's not exactly code-related you may have more luck on dedicated site, https://salesforce.stackexchange.com/. More admins lurk there.
So you do want that "(Coach)" to go through but it breaks the link? Looks like ( is a special character. It's not technically wrong to have unescaped parentheses, if it breaks that other site you might want to contact them and get their act together. RFC doesn't force us to encode them but looks like you'll have to to solve it at least in the short term: https://webmasters.stackexchange.com/questions/78110/is-it-bad-to-use-parentheses-in-a-url
Instead of poor man's encoding (SUBSTITUTE(Statement_of_Work__r.Contractor_Name__c," ","%20") try using proper URLENCODE(Statement_of_Work__r.Contractor_Name__c).
Or there's bit more "pro" function called URLFOR but the documentation doesn't make it very clear how powerful the 3rd parameter is with the braces [key1 = value1, key2 = value2] syntax. Basically just pass the parameters and let SF worry about encoding special characters etc.
Read my answer https://salesforce.stackexchange.com/a/46445/799 and there are some examples on the net like https://support.docusign.com/s/articles/DFS-URL-buttons-for-Lightning-basic-setup-limitations?language=en_US&rsc_301

Any problems with using a period in URLs to delimiter data?

I have some easy to read URLs for finding data that belongs to a collection of record IDs that are using a comma as a delimiter.
Example:
http://www.example.com/find:1%2C2%2C3%2C4%2C5
I want to know if I change the delimiter from a comma to a period. Since periods are not a special character in a URL. That means it won't have to be encoded.
Example:
http://www.example.com/find:1.2.3.4.5
Are there any browsers (Firefox, Chrome, IE, etc) that will have a problem with that URL?
There are some related questions here on SO, but none that specific say it's a good or bad practice.
To me, that looks like a resource with an odd query string format.
If I understand correctly this would be equal to something like:
http://www.example.com/find?id=1&id=2&id=3&id=4&id=5
Since your filter is acting like a multi-select (IDs instead of search fields), that would be my guess at a standard equivalent.
Browsers should not have any issues with it, as long as the application's route mechanism handles it properly. And as long as you are not building that query-like thing with an HTML form (in which case you would need JS or some rewrites, ew!).
May I ask why not use a more standard URL and querystring? Perhaps something that includes element class (/reports/search?name=...), just to know what is being queried by find. Just curious, I knows sometimes standards don't apply.

ColdFusion - What's the best URL naming convention to use?

I am using ColdFusion 9.
I am creating a brand new site that uses three templates. The first template is the home page, where users are prompted to select a brand or a specific model. The second template is where the user can view all of the models of the selected brand. The third template shows all of the specific information on a specific model.
A long time ago... I would make the URLs like this:
.com/Index.cfm // home page
.com/Brands.cfm?BrandID=123 // specific brand page
.com/Models.cfm?ModelID=123 // specific model page
Now, for SEO purposes and for easy reading, I might want my URLs to look like this:
.com/? // home page
.com/?Brand=Worthington
.com/?Model=Worthington&Model=TX193A
Or, I might want my URLs to look like this:
.com/? // home
.com/?Worthington // specific brand
.com/?Worthington/TX193A // specific model
My question is, are there really any SEO benefits or easy reading or security benefits to either naming convention?
Is there a best URL naming convention to use?
Is there a real benefit to having a URL like this?
http://stackoverflow.com/questions/7113295/sql-should-i-use-a-junction-table-or-not
Use URLs that make sense for your users. If you use sensible URLs which humans understand, it'll work with search engines too.
i.e. Don't do SEO, do HO. Human Optimisation. Optimise your pages for the users of your page and in doing so you'll make Google (and others) happy.
Do NOT stuff keywords into URLs unless it helps the people your site is for.
To decide what your URL should look like, you need to understand what the parts of a URL are for.
So, given this URL: http://domain.com/whatever/you/like/here?q=search_terms#page-frament.
It breaks down like this:
http
what protocol is used to deliver the page
:
divides protocol from rest of url
//domain.com
indicates what server to load
/whatever/you/like/here
Between the domain and the ? should indicate which page to load.
?
divides query string from rest of url
q=search_terms
Between the ? and the # can be used for a dynamic search query or setting.
#
divides page fragment from rest of the url
page-frament
Between the # and the end of line indicates which part of the page to focus on.
If your system setup lets you, a system like this is probably the most human friendly:
domain.com
domain.com/Worthington
domain.com/Worthington/TX193A
However, sometimes a unique ID is needed to ensure there is no ambiguity (with SO, there might be multiple questions with the same title, thus why ID is included, whilst the question is included because it's easier for humans that way).
Since all models must belong to a brand, you don't need both ID numbers though, so you can use something like this:
domain.com
domain.com/123/Worthington
domain.com/456/Worthington/TX193A
(where 123 is the brand number, and 456 is the model number)
You only need extra things (like /questions/ or /index.cfm or /brand.cfm or whatever) if you are unable to disambiguate different pages without them.
Remember: this part of the URL identifies the page - it needs to be possible to identify a single page with a single URL - to put it another way, every page should have a unique URL, and every unique URL should be a different page. (Excluding the query string and page fragment parts.)
Again, using the SO example - there are more than just questions here, there are users and tags and so on too. so they couldn't just do stackoverflow.com/7275745/question-title because it's not clearly distinct from stackoverflow.com/651924/evik-james - which they solve by inserting /questions and /users into each of those to make it obvious what each one is.
Ultimately, the best URL system to use depends on what pages your site has and who the people using your site are - you need to consider these and come up with a suitable solution. Simpler URLs are better, but too much simplicity may cause confusion.
Hopefully this all makes sense?
Here is an answer based on what I know about SEO and what we have implemented:
The first thing that get searched and considered is your domain name, and thus picking something related to your domain name is very important
URL with query string has lower priority than the one that doesn't. The reason is that query string is associated with dynamic content that could change over time. The search engine might also deprioritize those with query string fearing that it might be used for SPAM and diluting the result of SEO itself
As for using the URL such as
http://stackoverflow.com/questions/7113295/sql-should-i-use-a-junction-table-or-not
As the search engine looks at both the domain and the path, having the question in the path will help the Search Engine and elevate the question as a more relevant page when someone typing part of the question in the search engine.
I am not an SEO expert, but the company I work for has a dedicated dept to managing the SEO of our site. They much prefer the params to be in the URI, rather than in the query string, and I'm sure they prefer this for a reason (not simply to make the web team's job slightly trickier... all though there could be an element of that ;-)
That said, the bulk of what they concern themselves with is the content within and composition of the page. The domain name and URL are insignificant compared to having good, relevant content in a well defined structure.

How do SO URLs self correct themselves if they are mistyped?

If an extra character (like a period, comma or a bracket or even alphabets) gets accidentally added to URL on the stackoverflow.com domain, a 404 error page is not thrown. Instead, URLs self correct themselves & the user is led to the relevant webpage.
For instance, the extra 4 letters I added to the end of a valid SO URL to demonstrate this would be automatically removed when you access the below URL -
https://stackoverflow.com/questions/194812/list-of-freely-available-programming-booksasdf
I guess this has something to do with ASP.NET MVC Routing. How is this feature implemented?
Well, this is quite simple to explain I guess, even without knowing the code behind it:
The text is just candy for search engines and people reading the URL:
This URL will work as well, with the complete text removed!
The only part really important is the question ID that's also embedded in the "path".
This is because EVERYTHING after http://stackoverflow.com/questions/194812 is ignored. It is just there to make the link, if posted somewhere, if more speaking.
Internally the URL is mapped to a handler, e.g., by a rewrite, that transforms into something like: http://stackoverflow.com/questions.php?id=194812 (just an example, don't know the correct internal URL)
This also makes the URL search engine friendly, besides being more readable to humans.

What to use for space in REST URI?

What should I use:
/findby/name/{first}_{last}
/findby/name/{first}-{last}
/findby/name/{first};{last}
/findby/name/first/{first}/last/{last}
etc.
The URI represents a Person resource with 1 name, but I need to logically separate the first from the last to identify each. I kind of like the last example because I can do:
/findby/name/first/{first}
/findby/name/last/{last}
/findby/name/first/{first}/last/{last}
You could always just accept spaces :-) (querystring escaped as %20)
But my preference is to just use dashes (-) ... looks nicer in the URL. unless you have a need to be able to essentially query in which case the last example is better as you noted
Why not use + for space?
I am at a loss: dashes, minuses, underscores, %20... why not just use +? This is how spaces are normally encoded in query parameters. Yes, you can use %20 too but why, looks ugly.
I'd do
/personNamed/Joe+Blow
I like using "_" because it is the most similar character to space that keeps the URL readable.
However, the URLs you provided don't seem really RESTful. A URL should represent a resource, but in your case it represents a search query. So I would do something like this:
/people/{first}_{last}
/people/{first}_{last}_(2) - in case there are duplicate names
It this case you have to store the slug ({first}_{last}, {first}_{last}_(2)) for each user record. Another option to prepend the ID, so you don't have to bother with slugs:
/people/{id}-{first}_{last}
And for search you can use non-RESTful URLs:
/people/search?last={last}&first={first}
These would display a list of search results while the URLs above the page for a particular person.
I don't think there is any use of making the search URLs RESTful, users will most likely want to share links to a certain person's page and not search result pages. As for the search engines, avoid having the same content for multiple URLs, and you should even deny indexing of your search result pages in robots.txt
For searching:
/people/search?first={first}&last={last}
/people/search?first=george&last=washington
For resource paths:
/people/{id}-{first}-{last}
/people/35-george-washington
If you are using Ruby on Rails v3 in standard configuration, here's how you can do it.
# set up the /people/{param} piece
# config/routes.rb
My::Application.routes.draw do
resources :people
end
# set up that {param} should be {id}-{first}-{last}
# app/models/person.rb
class Person < ActiveRecord::Base
def to_param
"#{id}-#{to_slug(first_name)}-#{to_slug(last_name)}"
end
end
Note that your suggestion, /findby/name/first/{first}/last/{last}, is not restful. It does not name resources and it does not name them succinctly.
The most sophisticated choice should always and first of all consider two constraints:
As you'll never know how skilled the developer or the device being implemented on is regarding handling of urlencoding, i will always try to limit myself to the table of safe characters, as found in the excellent rant (Please) Stop Using Unsafe Characters in URLs
Also - we want to consider the client consuming the API. Can we have the whole structure easily represented and accessible in the client side programming language? What special characters would this requirement leave us with? I.e. a $ will be fine in javascript variable names and thus directly accessible in the parsed result, but a PHP client will still have to use a more complex (and potentially more confusing) notation $userResult->{'$mostVisited'}->someProperty... that a shot in your own foot! So for those two (and a couple of other programming environments) underscore seems the only valid option.
Otherwise i mostly agree with #yfeldblum`s response - i'd distinct between a search endpoint vs. the actual unique resource lookup. Feels more REST to me, but more importantly, the two have a significant cost difference on your api server - this way you can easier distinct and i.e. charge a higher costs or rate limit the search endpoint - should you ever need it.
To be Pragmatic, as opposed to a "RESTafarian" the mentioned approach /people/35-george-washington could (and should imho) basically respond to just the id, so if you want a named, urlsafe-for-dummies-link, list the reference as /people/35_george_washington. Other ideas could be /people/35/#GeorgeWashington (so breaking tons of RFCs) or /people/35_GeorgeWashington - the API wouldn't care.

Resources