Question mark in URL parameter is treated as a separator and not received in the view? - url

I am trying to pass a URL parameter named title that contains a question mark (?). I am unable to get the complete parameter: the question mark is not showing up in my view.

That is because the question mark separates the path from the query string [wiki]. You need to encode it with percent-encoding [wiki], so the question mark is replaced with %3F, for example:
somehost.com/foo/bar%3Fqux/
Here the part after foo/ will be seen as bar?qux.
That being said, one often wants to avoid this, since it results in "ugly" URLs. A slug [Django-doc] is therefore commonly used instead: it removes such characters and retains only characters that read nicely.
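As an illustration, Python's standard library can produce the percent-encoded form directly; this is a generic sketch, not tied to any particular framework (note that quote's safe parameter defaults to "/", so pass safe="" if slashes should be encoded as well):

```python
from urllib.parse import quote, unquote

# Percent-encode a path segment containing a question mark, so the "?"
# is not interpreted as the start of the query string.
segment = quote("bar?qux", safe="")   # -> 'bar%3Fqux'

# Decoding restores the original value.
original = unquote(segment)           # -> 'bar?qux'
```

On the server side, the framework decodes the segment back to bar?qux before it reaches your view.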

Related

How to equate a value in encoded URL?

We have multiple URLs that we want to compare:
https://securepubads.g.doubleclick.net/gampad/ads?ssp%3D0
As you can see in the URL below, ssp%3D0 can be present anywhere in it. I want a wildcard pattern that can match (or replace) ssp%3D0.
https://securepubads.g.doubleclick.net/gampad/ads?pvsid%3D326202298883096&correlator%3D3063969260360993&output%3Dldjh&gdfp_req%3D0&vrg%3D2022030600&ptt%3D08&impl%3Dfifs&iu_parts%3D3302333,profpromo,medscpnewsdesktop&enc_prev_ius%3D/0/0/2,/0/0/2,/0/0/2,/0/0/2,/0/0/2,/0/0/2,/0/0/2&prev_iu_szs%3D320x90|828x90|980x90|980x290,320x90|300x290|300x600|300x390,320x90|0x2,0x2,320x90|2x9,320x90|2x9,320x90|828x90|828x90&fluid%3Dheight,height,height,0,height,height,height&ifi%3D0&adks%3D838338968,2333289989,3296303628,0699999960,3982038306,3982003000,883988630&sfv%3D0-0-38&ecs%3D20220323&fsapi%3Dfalse&prev_scp%3Dpos%3D003&ad_slot%3Dads-pos-003&mnetPageID%3D9&mnetCC%3DIN&mnetCV%3D0&mnetUGD%3D3&mnetCID%3D8CU9I96G3&hb_abt%3Dhb&mnetDNB%3D0|pos%3D022&ad_slot%3Dads-pos-022&mnetPageID%3D3&mnetCC%3DIN&mnetCV%3D0&mnetUGD%3D3&mnetCID%3D8CU9I96G3&hb_abt%3Dhb&mnetDNB%3D0|pos%3D900&ad_slot%3Dads-pos-900|pos%3D009&ad_slot%3Dads-pos-009|pos%3D622&ad_slot%3Dads-pos-622|pos%3D822&ad_slot%3Dads-pos-822|pos%3D030&ad_slot%3Dads-pos-030&mnetPageID%3D0&mnetCC%3DIN&mnetCV%3D0&mnetUGD%3D3&mnetCID%3D8CU9I96G3&hb_abt%3Dhb&mnetDNB%3D0&eri%3D0&cust_params%3Dlif%3D0&val%3D0&vit%3D&pbs%3D&st%3D0&tar%3D0&actid%3D0&occ%3D0&sa%3D0&tc%3D0&ct%3D0&pb%3D0&usp%3D0&pf%3D0&masid%3D0&gd%3D0&pbp%3D&ssp%3D08&ac%3D0&art%3D0&as%3D0&cg%3D0&ssp%3D0&scg%3D0&ck%3D0&pub%3D0&pc%3Dhp&auth%3D0&spon%3D08&env%3D0&envp%3Dprod&ep%3D0&
Sorry for the somewhat confusing question.
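One way to match ssp%3D0 wherever it appears, without also matching a longer value such as ssp%3D08, is a regular expression with a lookahead on the parameter delimiter; a minimal Python sketch (the sample URLs are shortened for illustration):

```python
import re

# Match "ssp%3D0" only when it is a complete parameter, i.e. followed
# by "&" (the next parameter) or the end of the string.
pattern = re.compile(r"ssp%3D0(?=&|$)")

url = "https://securepubads.g.doubleclick.net/gampad/ads?pvsid%3D1&ssp%3D0&scg%3D0"
found = bool(pattern.search(url))                        # matches "ssp%3D0"
near_miss = bool(pattern.search("ads?ssp%3D08&x%3D1"))   # does NOT match "ssp%3D08"
```

The same pattern works with pattern.sub() if the goal is replacement rather than matching.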

How to get http tag text by id using lua

There is a webpage parser which takes a page that contains several tags in a certain structure, where divs are badly nested. I need to extract a certain div element, copy it and all its content, and write it to a new HTML file.
Since I am new to Lua, I may need basic clarification for things that might seem simple.
Thanks,
The ease of extraction of data is going to largely depend on the page itself. If the page uses the exact same tag information throughout its entirety, it'll be much more difficult to extract than it would if it has named tags.
If you're able to find a version of the page that returns JSON format, then you're that much better off. Here's a snippet of code from something I wrote to grab definitions from a webpage that did not offer JSON:
local actualword, definition = string.match(wayup,"<html.-<td class='word'>%c(.-)%c</td>.-<div class=\"definition\">(.-)</div>")
Essentially, this code searched down the page until it found the class "word", and took the word after it (%c is the pattern for control characters). It continued on to "definition" and captured that, as well.
As you can see, it's a bit convoluted, but I had the luck of having specifically named tags for what I wanted.
This is edited to fit your comment. As a side note that I should have mentioned before: if you're familiar with regular expressions, you can apply the same model to capture what you need. In this case, it captures the string in its totality:
local data = string.match(page, "(<div id=\"aa\"><div>.-</div>.-</div>)")
It's rarely the fault of the language, but rather of the webpage itself, that makes it hard to mine data from it. Since webpages can literally have hundreds of lines of code, it's hard to pinpoint exactly what you want without coming across garbage information. That's why I prefer a simplified result such as JSON, since Lua has JSON modules that can encode/decode it, letting you get at your precise information.

#! as opposed to just # in a permalink

I'm designing a permalink system and I just noticed that Twitter and Hipmunk both prefix their permalinks with #!. I was wondering why this is, and if the exclamation point in particular is there for a reason. Wouldn't #/ work just as well, since they're no doubt using a framework that lets them redirect queries to certain templates with a regex URL parser?
http://www.hipmunk.com/#!BOS.SEA,Dec15.Jan02
http://twitter.com/#!/dozba
My only guess is it's because browsers use # to link to an anchor element. Is this why the exclamation point is appended?
This is done to make an "AJAX" page crawlable [by Google] for indexing. It does not affect the other well-defined semantics of the fragment identifier at all!
See Making AJAX Applications Crawlable: Getting Started
Briefly, the solution works as follows: the crawler finds a pretty AJAX URL (that is, a URL containing a #! hash fragment). It then requests the content for this URL from your server in a slightly modified form. Your web server returns the content in the form of an HTML snapshot, which is then processed by the crawler. The search results will show the original URL.
I am sure other search engines are also following this lead/protocol.
Happy coding.
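Under that scheme, the "slightly modified form" means the crawler rewrites the #! URL into an _escaped_fragment_ query parameter before requesting it. A rough Python sketch of the transformation, assuming the URL has no existing query string:

```python
from urllib.parse import quote

def to_crawlable(url):
    """Rewrite a #! ("hashbang") URL into the _escaped_fragment_ form
    a crawler would request. Assumes the URL has no query string."""
    base, sep, fragment = url.partition("#!")
    if not sep:
        return url  # no hashbang fragment to rewrite
    # Special characters in the fragment are percent-encoded.
    return base + "?_escaped_fragment_=" + quote(fragment, safe="/")

crawler_url = to_crawlable("http://twitter.com/#!/dozba")
# crawler_url == "http://twitter.com/?_escaped_fragment_=/dozba"
```

The server is expected to answer that _escaped_fragment_ request with a static HTML snapshot of the Ajax-rendered page.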
Also, it is actually perfectly valid, at least per HTML5, to have an element with an ID of "!foo", so the reasoning in the post is invalid. See the article "The id attribute just got more classy":
HTML5 gets rid of the additional restrictions on the id attribute. The only requirements left — apart from being unique in the document — are that the value must contain at least one character (can’t be empty), and that it can’t contain any space characters.
My guess is that both pages use this in their JavaScript to differentiate between # (a link to an anchor) and their custom #!, which loads some additional content using Ajax.
In that case pretty much anything else would also work after the # sign.

How do SO URLs self correct themselves if they are mistyped?

If an extra character (like a period, comma, bracket, or even letters) accidentally gets added to a URL on the stackoverflow.com domain, a 404 error page is not thrown. Instead, the URL corrects itself and the user is taken to the relevant webpage.
For instance, the four extra letters I added to the end of a valid SO URL below are automatically removed when you access it:
https://stackoverflow.com/questions/194812/list-of-freely-available-programming-booksasdf
I guess this has something to do with ASP.NET MVC Routing. How is this feature implemented?
Well, this is quite simple to explain I guess, even without knowing the code behind it:
The text is just candy for search engines and people reading the URL:
This URL will work as well, with the complete text removed!
The only part really important is the question ID that's also embedded in the "path".
This is because EVERYTHING after http://stackoverflow.com/questions/194812 is ignored. It is just there to make the link more descriptive if posted somewhere.
Internally the URL is mapped to a handler, e.g. by a rewrite, that transforms it into something like http://stackoverflow.com/questions.php?id=194812 (just an example; I don't know the correct internal URL).
This also makes the URL search engine friendly, besides being more readable to humans.
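A minimal sketch of that routing idea in Python (the pattern and helper are illustrative assumptions, not Stack Overflow's actual code): only the numeric ID is parsed from the path, so anything appended after it has no effect.

```python
import re

# Hypothetical route pattern: only the numeric ID after /questions/
# matters; the title slug, typos, or extra characters are ignored.
QUESTION_PATH = re.compile(r"^/questions/(\d+)")

def question_id(path):
    match = QUESTION_PATH.match(path)
    return int(match.group(1)) if match else None

# Both of these resolve to the same question:
a = question_id("/questions/194812/list-of-freely-available-programming-books")
b = question_id("/questions/194812/list-of-freely-available-programming-booksasdf")
```

After resolving the ID, the site can issue a redirect to the canonical URL, which is why the mistyped suffix appears to be "corrected".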

dynamic seo title for news articles

I have a news section where the pages resolve to urls like
newsArticle.php?id=210
What I would like to do is use the title from the database to create seo friendly titles like
newsArticle/joe-goes-to-town
Any ideas how I can achieve this?
Thanks,
R.
I suggest you actually include the ID in the URL, before the title part, and ignore the title itself when routing. So your URL might become
/news/210/joe-goes-to-town
That's exactly what Stack Overflow does, and it works well. It means that the title can change without links breaking.
Obviously the exact details will depend on what platform you're using - you haven't specified - but the basic steps will be:
When generating a link, take the article title and convert it into something URL-friendly; you probably want to remove all punctuation, and you should consider accented characters etc. Bear in mind that the title won't need to be unique, because you've got the ID as well
When handling a request to anything starting with /news, take the next part of the path, parse it as an integer and load the appropriate article.
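The two steps above can be sketched in Python (function names, the slug rules, and the /news path are illustrative assumptions, not a particular framework's API):

```python
import re

def slugify(title):
    """Turn an article title into a URL-friendly slug: lowercase,
    with runs of non-alphanumeric characters collapsed to hyphens.
    (Accented characters would need extra handling, e.g. transliteration.)"""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def article_url(article_id, title):
    """Generate a link: the ID comes first, the slug is decorative."""
    return "/news/%d/%s" % (article_id, slugify(title))

def parse_article_id(path):
    """When routing, only the numeric ID is used; the slug is ignored."""
    match = re.match(r"^/news/(\d+)", path)
    return int(match.group(1)) if match else None

url = article_url(210, "Joe goes to town!")   # "/news/210/joe-goes-to-town"
```

Because parse_article_id never looks at the slug, links keep working even after the title changes.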
Assuming you are using PHP and can alter your source code (this is necessary in order to get the article's title), I'd do the following:
First, you'll need to have a function (or maybe a method in an object-oriented architecture) to generate the URLs for you in your code. You'd supply the function with the article object or the article ID and it returns the friendly URL with the ID and the friendly title.
Basically function url(Article $article) => URL.
You will also need some URL rewriting rules to remove the PHP script from the URL. For Apache, refer to the mod_rewrite documentation for details (RewriteEngine, RewriteRule, RewriteCond).
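For instance, a mod_rewrite rule along these lines would map the friendly URL back onto the PHP script (the file name and URL layout are assumptions for illustration):

```apache
RewriteEngine On
# Map /newsArticle/210/joe-goes-to-town back onto the real script,
# passing only the numeric ID along; the slug part is optional and ignored.
RewriteRule ^newsArticle/([0-9]+)(/.*)?$ newsArticle.php?id=$1 [L,QSA]
```

The QSA flag preserves any existing query-string parameters, and L stops further rule processing once the rewrite matches.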

Resources