Naming convention for the combination of a URL's path & query

I'm struggling with a naming convention for a specific part of a URL.
For example: https://stackoverflow.com/questions/ask?abcd
Conventionally, this is what follows:
path = /questions/ask
query string = abcd
However, I'm storing the following in a database and am struggling with what to call it (it's not really a URI, nor just a path):
/questions/ask?abcd
Has anyone run into this before?
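For illustration, here is how Ruby's standard `uri` library splits the example URL. This is only a sketch of one library's vocabulary, not an authoritative naming: Ruby happens to expose the path-plus-query combination as `request_uri`. The helper name `path_and_query` is made up for this example.

```ruby
require "uri"

# Hypothetical helper: returns the path, the query string, and the
# combination of the two for an http(s) URL.
def path_and_query(url)
  uri = URI.parse(url)
  {
    path: uri.path,               # "/questions/ask"
    query: uri.query,             # "abcd"
    combined: uri.request_uri     # path + "?" + query
  }
end
```

Calling `path_and_query("https://stackoverflow.com/questions/ask?abcd")[:combined]` gives the string that is being stored in the database.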

Related

How to create URL for Base58 encoded IDs

I am planning to use Base58 encoding for IDs in URLs in a new project.
A URL might look like example.com/sTPTi.
I am concerned about the possibility that a generated URL might coincide with a statically routed section like example.com/contacts.
I have an idea: I could insert a slash after the first symbol of each generated URL, so my example would become example.com/s/TPTi. I could then avoid all the risk by not using any single-symbol static routes. Is that a sane solution?
I am aware that I could prefix the generated URLs with something static, like the common example.com/i/sTPTi, but my solution is shorter by 2 symbols. I am also aware that the generated URLs could simply be prefixed with a symbol that I then avoid in static URLs, such as a 9 (example.com/9sTPTi), but that somehow seems a bit dumb.
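The slash-after-first-symbol idea can be sketched in Ruby as follows. This is a minimal sketch under assumptions not stated in the question: the Bitcoin-style Base58 alphabet, integer IDs, and left-padding single-symbol codes with the zero digit `1` so the part after the slash is never empty. The names `base58_encode`, `short_path` and `generated_path?` are hypothetical.

```ruby
# Assumed alphabet: Bitcoin-style Base58 (no 0, O, I or l).
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz".freeze

# Encodes a non-negative integer ID as a Base58 string.
def base58_encode(id)
  return BASE58_ALPHABET[0] if id.zero?

  out = +""
  while id > 0
    id, remainder = id.divmod(58)
    out.prepend(BASE58_ALPHABET[remainder])
  end
  out
end

# Builds the path with a slash after the first symbol, e.g. "/s/TPTi".
# Codes shorter than two symbols are padded with the zero digit.
def short_path(id)
  code = base58_encode(id).rjust(2, BASE58_ALPHABET[0])
  "/#{code[0]}/#{code[1..-1]}"
end

# True only for paths shaped like a generated one: a single Base58 symbol,
# a slash, then one or more Base58 symbols.
def generated_path?(path)
  path.match?(%r{\A/[1-9A-HJ-NP-Za-km-z]/[1-9A-HJ-NP-Za-km-z]+\z})
end
```

With this shape, a static route such as /contacts can never match `generated_path?`, as long as no static route has a single-symbol first segment.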

Problems getting data from XML using Nokogiri and Rails

I'm trying to get information from an XML file with Nokogiri. I can load the file using
f = File.open("/my/path/file.xml")
cac=Nokogiri::XML(f)
And what I get is a fancy Nokogiri document. My row tags are defined like
<z:row ...info..../>
and when inspected they look like
<Nokogiri::XML::Element:0x217e7b8 name="z:row" attributes=[#<Nokogiri::XML::Attr:0x217e754 name="ID_Poblacio" value="3">
and I cannot retrieve the rows using either:
s=cac.at_xpath("/*/z:row") or
s=cac.at_xpath("//z:row") or
s=cac.at_xpath("//row") or
s=cac.at_xpath("z:row")...
I'm probably being foolish, but I cannot figure out what the issue is.
Has anyone faced this problem?
Thanks in advance.
P.S. I tried to paste my cac file directly from bash, but something weird happened with the formatting, so I removed it from the question. If anyone can explain how to do it, I would appreciate it.
Your XML element name contains a colon, but it is not in a namespace (otherwise the prefix and uri would show up in the dump of the node). Using element names with colons without using namespaces is valid, but can cause problems (like this case) so generally should be avoided. Your best solution, if possible, would be to either rename the elements in your xml to avoid the : character, or to properly use namespaces in your documents.
If you can’t do that, then you’ll need to be able to select such element names using XPath. A colon in the element name part of an XPath node test is always taken to indicate a namespace. This means you can’t directly specify a name with a colon that isn’t in a namespace. A way around this is to select all nodes and use an XPath function in a predicate to refine the selection to only those nodes you’re after. You can use a colon in an argument to name() and it won’t be interpreted as a namespace separator:
s=cac.at_xpath("//*[name()='z:row']")

How can you get the canonical URL for a web page (Rails)?

I need to store a distinct URL for an external webpage
I need to put the URL into the database. I don't want to store the same page twice so
I need to strip all fluff off the URL.
# if I have
url_1 = "http://scientificamerican.com/royal-baby/?utm_campaign=promo"
# and
url_2 = "http://scientificamerican.com/royal-baby/?utm_source=email"
# then they should map to:
url_canonical = "http://scientificamerican.com/royal-baby/"
...it's not as simple as just stripping query parameters though
In order to get a single canonical URL regardless of what was on it I tried stripping the query string. The problem is that there are still CMSs which use the query string.
e.g.
url_1 = "https://www.scientificamerican.com/article.cfm?id=obama-budget"
# strip the query string and it becomes
url_1 = "https://www.scientificamerican.com/article.cfm"
# which is obviously the same for all articles :(
Is there any Rails tool for getting a page's canonical URL?
This is obviously a problem that a number of people have had to solve, not least the search engines. How do you reduce the URL down such that all that remains is the data for the page?
You can't. There is no way to know which query parameters are necessary to identify the page. There are obviously many parameters you can knowingly remove (e.g. utm_campaign), but not all.
Your best bet would be to load the HTML for the page and look for the canonical link element. If that exists, then you've got your canonical URL.
http://en.wikipedia.org/wiki/Canonical_link_element
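As a fallback for pages without a canonical link element, removing only the parameters you know to be fluff could look roughly like this. It is a sketch using Ruby's standard `uri` library; the `TRACKING_PARAMS` list and the `strip_tracking_params` name are assumptions for illustration, and the list would need extending for real traffic.

```ruby
require "uri"

# Hypothetical list of parameters assumed safe to drop.
TRACKING_PARAMS = %w[utm_source utm_medium utm_campaign utm_term utm_content].freeze

# Returns the URL with known tracking parameters removed; every other
# query parameter (such as a CMS article id) is left untouched.
def strip_tracking_params(url)
  uri = URI.parse(url)
  return url if uri.query.nil?

  kept = URI.decode_www_form(uri.query).reject { |key, _| TRACKING_PARAMS.include?(key) }
  uri.query = kept.empty? ? nil : URI.encode_www_form(kept)
  uri.to_s
end
```

Both royal-baby URLs from the question reduce to the same string, while article.cfm?id=obama-budget keeps its id parameter.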

How to check if URLs match, within a huge database of online products?

The problem seems simple at first, but it is not. I'm using Mongo and Node.js.
Problem: I have a URL and need to match it against all the URLs I have in my database. Remember, there is no rule that the URL I'm on always has "category" in front, or anything like that. And please don't take "cases" into consideration.
I have no clue about the names of the parameters, or anything else.
Let's assume the URL is something like example.com/category/product_name.html?session_id=2423412fd
In the database I only have example.com/product_name.html
The URL is something like example.com/index.php?productid=6&category=3&utm_campaign=google&utm_source=click
In the database I only have example.com/index.php?productid=6
The URL is something like example.com/product_name.html
In the database I only have example.com/category/subcategory/product.html
I think I've made my point. What I'm looking for is a solution that matches URLs in all such cases (there are more than these). It can be an external service, a class, or something more complex.
But I need it to work, and to work very fast, because it runs on every page refresh.
Thank you!
I would use PHP's parse_url function to split the URL into its components: http://php.net/manual/en/function.parse-url.php
Then take the parts of the path that you want to match from the URL and query your database URLs looking for matches.
To follow on from Anagio's answer, the URL
example.com/index.php?productid=6&category=3&utm_campaign=google&utm_source=click
could be saved as a Mongo object like:
{
  url: "example.com/index.php?productid=6&category=3&utm_campaign=google&utm_source=click",
  indexes: [
    "example.com",
    "index.php",
    "productid=6",
    "category=3",
    "utm_campaign=google",
    "utm_source=click"
  ]
}
You could then split up any new URL using the same algorithm, do a map/reduce on the indexes field for scoring, and take the highest score as the best "fuzzy match".
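The split-and-score idea can be sketched in plain Ruby rather than Node.js and Mongo. Here the score is simply the number of shared tokens, which stands in for the map/reduce step; the names `url_tokens` and `best_match` are hypothetical, and scheme-less URLs like those in the question are assumed.

```ruby
require "uri"

# Splits a URL into host, path segments and query pairs, mirroring the
# "indexes" array above. A scheme is added when missing so URI can parse it.
def url_tokens(url)
  uri = URI.parse(url.include?("://") ? url : "http://#{url}")
  path_tokens = uri.path.split("/").reject(&:empty?)
  query_tokens = uri.query ? uri.query.split("&") : []
  [uri.host, *path_tokens, *query_tokens]
end

# Returns the stored URL sharing the most tokens with the given URL.
# In-memory stand-in for scoring against the indexes field in the database.
def best_match(url, stored_urls)
  wanted = url_tokens(url)
  stored_urls.max_by { |candidate| (url_tokens(candidate) & wanted).size }
end
```

Exact token overlap handles the first two cases in the question, but not the third (product_name.html versus category/subcategory/product.html), which would need some looser comparison of individual tokens.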

Symfony use_javascript() routing issue

I am using Symfony 1.4.11 with use_helper('Url').
When using link_to('new', 'course/course/type/new'),
the URL it shows is ../backend_dev/backend_dev/Course/course/type/new
instead of
../backend_dev/Course/course/type/new.
The same issue exists for form_tag.
Edit
The above issue was solved by setting no_script_name: true in the config and clearing the cache.
But image_tag(), use_stylesheet() and use_javascript() still give the wrong path. For example:
use_javascript('jquery-1.6.1.min.js') ==> ../web/backend_dev/js/jquery-1.6.1.min.js
instead of
use_javascript('jquery-1.6.1.min.js') ==> ../web/js/jquery-1.6.1.min.js
Any help appreciated.
Hard to say without your full routing.yml, but the one thing I see is that your internal_uri should be expressed as an absolute URL with a query string, like:
link_to('new','/Course/course?type=new');
Note the forward slash at the beginning. Also, the module name should be the real module name, not the routed one, so if the module is /apps/backend/modules/Course then the module in the internal URI should be Course, not course. The same goes for the action name.
If the route is named then you should use one of the following:
link_to('new','@routename?type=new');
OR
link_to('new','routename', array('type'=>'new'));

Resources