wget and special characters - character-encoding

I am using wget locally to take a static snapshot of a small web app. When I do, the resulting html files come back with strange characters in place of quotation marks and apostrophes.
What can I do to avoid this behavior?
Thanks.

I would suggest trying with:
--restrict-file-names=nocontrol
Source: http://www.win.tue.nl/~aeb/linux/misc/wget.html

Sounds like you need to specify --remote-encoding, perhaps --remote-encoding=utf-8.

I had this same problem, but then I found out that my browser was showing the web page with the wrong encoding. For example, in Firefox I just needed to change View -> Character Encoding -> Unicode.

I had this issue too. It turned out the page I was downloading was gzipped.
You can check this using the -S option in wget.
You will find a
Content-Encoding: gzip
line. In that case I use zcat to read the file.
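If you would rather handle this in a script than with zcat, here is a minimal sketch in Python. It assumes the saved file is either plain HTML or a single gzip stream; the function name read_maybe_gzipped is just something I made up for illustration.

```python
import gzip

# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def read_maybe_gzipped(data: bytes) -> bytes:
    """Return the decompressed bytes if data looks gzipped, else data unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data
```

You would read the file wget saved in binary mode, pass the bytes through this function, and write the result back out.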

It seems that wget can't guess the encoding, so you need this in the HTML response of your web app:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

I had this very same problem (a wget mirror with special characters and quotation marks shown as the Unicode "unknown char", ?) when browsing the mirror.
The problem turned out to be related to the different encodings used by the two servers, rather than to wget. The original server was an old Windows+IIS installation configured to serve HTML pages with ISO-8859 encoding, while the mirror was a Linux+Apache server configured to serve UTF-8 pages.
The solution was to configure Apache to serve ISO-8859 pages, by adding the directive AddDefaultCharset ISO-8859-1 to the right virtual host.
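An alternative to changing the Apache configuration would be to re-encode the mirrored files themselves. The sketch below shows both the failure and that fix in Python. It assumes the pages are really Windows-1252 (cp1252), which is where the curly quotes and apostrophes live on old Windows servers; that is my assumption, not something stated above, and reencode_to_utf8 is a made-up helper name.

```python
def reencode_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from the legacy encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# "It’s" as an old Windows server would send it: the apostrophe is the single byte 0x92.
original = "It\u2019s".encode("cp1252")

# Reading those bytes as UTF-8 turns the apostrophe into U+FFFD, the "unknown char" symbol.
misread = original.decode("utf-8", errors="replace")

# After re-encoding, the same bytes decode cleanly as UTF-8.
fixed = reencode_to_utf8(original).decode("utf-8")
```

If you go this route, any charset declared in a meta tag inside the pages would also need to be changed to UTF-8.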

Related

Generate markdown docs with rustdoc?

Is there any way to generate a single markdown file in doc/ from the /// comments?
Multiple markdown files (doc/main.md, doc/foo.md, etc) would be nice too.
I'm new to rust, and while the generated HTML documentation is nice, I mostly live on the command line and really don't want to be switching between my terminal and a web browser just to read the docs. That breaks the flow and takes me out of the zone. Also, md is easily converted to man pages, or to TeX for printed or PDF docs.
(I'm used to suspending vim with Ctrl-Z or using another terminal tab, and running man or perldoc or pydoc etc. Text-mode browsers like lynx or links are not good options for me - navigation is clumsy, the output is ugly on my 200+ column terminal windows if I forget to use the -width option, and neither supports JavaScript.)
cargo-readme might work for you. You run cargo readme -i foo.rs > FOO.md and it populates FOO.md with the contents of the doc comments from foo.rs. Found it via reddit.
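If you only need the raw comment text, a naive line filter may be enough. The sketch below is in Python and is not how cargo-readme works internally; it simply assumes doc comments are written as /// or //! line comments (block doc comments are ignored), and extract_doc_comments is a hypothetical name.

```python
def extract_doc_comments(rust_source: str) -> str:
    """Collect the text of /// and //! line comments, one output line per comment line."""
    lines = []
    for line in rust_source.splitlines():
        stripped = line.lstrip()
        # Four or more slashes is an ordinary comment in Rust, not a doc comment.
        if stripped.startswith("////"):
            continue
        for prefix in ("///", "//!"):
            if stripped.startswith(prefix):
                body = stripped[len(prefix):]
                # Drop the single space that conventionally follows the marker.
                lines.append(body[1:] if body.startswith(" ") else body)
                break
    return "\n".join(lines) + "\n"
```

Since doc comments are already markdown, the result can be written straight to a .md file.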

Chinese and Japanese character encoding issues when exporting HTML to PDF

I run a web-based timeline maker that lets users create timelines in HTML/JavaScript and then export them to PDF files for printing when they're done.
I have had several users report issues with exporting their timelines to PDFs when the timelines contain certain Unicode characters. Here, for example, is a screenshot showing the web page and the PDF file that is generated:
I've been trying to wrap my head around why some Unicode character blocks like Block Elements and Georgian will export but Chinese and Japanese will not. Also, the export works correctly when I perform it on my local computer, but results in the above output when exporting on Heroku.
Does anyone know what might be causing this?
For completeness, the backend is in Ruby on Rails and it uses the PDFKit gem to convert the HTML page to a PDF and the site is hosted on Heroku.
It sounds like it might be an issue with the fonts on the server. The webpage version of the timeline renders correctly because you obviously have the correct font on the client machine that is running the browser. The PDF on the other hand is generated on the server, and thus has to use a font available to it there.
If that's the case, then using a font that both exists on the server and supports the correct CJK characters should fix this issue.
Having personally experienced this with Rails and Heroku, I can tell you the reason is either (A) fonts on your system not matching the fonts on Heroku, or (B) PDFKit having trouble loading custom fonts linked through CSS, or some combination of both.
Most likely, you are referencing fonts on your local system (which contain glyphs for special characters) that don't match the fonts on Heroku. Run fc-list in Heroku's bash to get a list of their installed fonts, and substitute your font(s) for one that has the needed extended charset. However, now you will have to ensure that this font is also installed on your local machine. (Even worse, you could use different fonts for dev and production.)
You can also try uploading fonts to Heroku, and linking them from there. However, I've found this method to be unreliable when spanning across multiple systems or dev/staging/production environments, because each and every system has to have the required fonts installed. And even then, PDFkit makes you jump through hoops to get CSS fonts to work (for example, because of subtle variation in interpretation of font names by different operating systems).
The best solution I've found is to encode and embed fonts directly into CSS. Base-64 encode a font, and add it to the stylesheet:
@font-face {
font-family: 'OpenSans';
src: url(data:font/truetype;charset=utf-8;base64,AAEAAAATAQA...);
}
Now you have a bulletproof stylesheet that's portable and self-compatible with every system.
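Generating that rule by hand gets tedious, so here is a rough sketch of how the encoding step could be scripted in Python. The function name font_face_rule and the added format('truetype') hint are my own choices, and the font is assumed to be a TrueType file you have already read into memory.

```python
import base64


def font_face_rule(font_bytes: bytes, family: str) -> str:
    """Build an @font-face rule with the font embedded as a base64 data URI."""
    encoded = base64.b64encode(font_bytes).decode("ascii")
    return (
        "@font-face {\n"
        f"  font-family: '{family}';\n"
        f"  src: url(data:font/truetype;charset=utf-8;base64,{encoded}) format('truetype');\n"
        "}\n"
    )
```

You would call it with the bytes of the .ttf file (opened in binary mode) and paste or write the returned text into the stylesheet.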
If you use Docker and are having the same issue, try installing Japanese fonts in the container: apt-get install fonts-takao-mincho
If it works, add it to your Dockerfile:
RUN apt update && apt install -y \
    # japanese fonts
    fonts-takao-mincho

Force MP3 link to download instead of stream but keep HTML5 audio

So apparently the way to force an MP3 to download instead of play in the browser is to set the MIME type as file and/or set the Content-Disposition response header in the .htaccess.
What is the difference between these two methods and is it better to use one or the other, or both?
Also, will doing either of these break HTML5's handling of the <audio> tag when using an MP3 file as the source?
1. Use headers correctly
This is a very widespread problem and unfortunately even the PHP manual is plagued with errors. Developers usually say “this works for me” and they copy stuff they don’t fully understand.
First of all, I notice the use of headers like Content-Description and Content-Transfer-Encoding. There is no such thing in HTTP. Don’t believe me? Have a look at RFC2616, they specifically state “HTTP, unlike MIME, does not use Content-Transfer-Encoding, and does use Transfer-Encoding and Content-Encoding”. You may add those headers if you want, but they do absolutely nothing. Sadly, this wrong example is present even in the PHP manual.
Second, regarding the MIME-type, I often see things like Content-Type: application/force-download. There’s no such thing, and Content-Type: application/octet-stream (RFC1521) would work just as well (or maybe application/x-msdownload if it’s an exe/dll). If you’re thinking about Internet Explorer, it’s even better to specify it clearly rather than force it to “sniff” the content. See MIME Type Detection in Internet Explorer for details.
Even worse, I see these kinds of statements:
header("Content-Type: application/force-download");
header("Content-Type: application/octet-stream");
header("Content-Type: application/download");
The author must have been really frustrated and added three Content-Type headers. The only problem is, as specified in the header() manual entry, “The optional replace parameter indicates whether the header should replace a previous similar header, or add a second header of the same type. By default it will replace”. So unless you specify header("Content-Type: some-value", FALSE), the new Content-Type header will replace the old one.
2. Forcing download and Internet Explorer bugs
What would it be like to not have to worry about old versions of Internet Explorer? A better world, that’s for sure.
To force a file to download, the correct way is:
header("Content-Disposition: attachment; filename=\"$file_name\"");
Note: the quotes in the filename are required in case the file may contain spaces.
The code above will fail in IE6 unless the following are added:
header("Pragma: public");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
Now, the use of Cache-Control is wrong in this case, especially with both values set to zero, according to Microsoft, but it works in IE6, and IE7 and later ignore it, so no harm done.
If you still get strange results when downloading (especially in IE), make sure that the PHP output compression is disabled, as well as any server compression (sometimes the server inadvertently applies compression on the output produced by the PHP script).
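The quoting rule for the Content-Disposition filename applies whatever language the server is written in. As an illustration only, here is the same header value built in Python, with backslashes and double quotes escaped so the quoted filename stays well-formed. The function name content_disposition is hypothetical, and the sketch assumes an ASCII filename.

```python
def content_disposition(filename: str) -> str:
    """Return a Content-Disposition value that forces a download of filename."""
    # Escape backslashes first, then double quotes, so the quoted string stays valid.
    escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
    return f'attachment; filename="{escaped}"'
```

For example, content_disposition("my song.mp3") gives attachment; filename="my song.mp3", which keeps the space inside the quotes.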
Look at this,
.mp3 audio/mpeg3
.mp3 audio/x-mpeg-3
.mp3 video/mpeg
.mp3 video/x-mpeg
See this link for more info.
Using Content-Disposition: attachment... forces a download box to appear instead of having to right click -> save target as.

How to make Unrealizable characters in URL?

I'm just wondering how to make a URL ignore some of the characters in it, like this:
http://example.com?test=test
"test=test"
The page loads normally, but test=test has no effect because it isn't recognized by the web page.
How can I do the same at the start of the URL?
EDIT: this assumes you are running apache as your webserver
You need to change your httpd.conf and add a wildcard default server name.
see http://www.webmasterworld.com/forum92/3844.htm

How Can I Automatically Execute A Link In Internet Explorer

I am trying to create an application to print documents over the web. I have created my document, and made a web page with a meta refresh tag, along the lines of this:
<meta http-equiv="refresh" content="3;http://example.com/download.epl2" />
I specify that the document has a content-type of application/x-epl2, and I have associated .epl2 files on my computer with a program that silently sends them to the printer.
I have put the website into my trusted sites zone.
Currently Internet Explorer pops up the "Open, Save, Cancel" dialog box with no option to automatically open the file.
Is there a setting in IE6/7/8 that I can use to have IE just open the file without prompting?
EDIT
The actual content of the file will differ based on the job, but essentially it is text that follows the Eltron Programming Language.
EDIT
I have accomplished this in both Chrome and Firefox by choosing "Automatically Open Files Of This Type From Now On"
EDIT
The machines this program will be used on will effectively be kiosks that are limited to only accessing my website from their web browsers, so I'm not worried about rogue websites sending documents to my printers.
EDIT
I am using PHP to generate the documents and HTML on the server side, though I expect the solution to be language agnostic.
I would expect that not to be possible, because then you could stumble onto a site that automatically loads and prints a 5000 page document or something, which would not be good.
If you always had a secret desire to develop a custom URL protocol (I know I do), this might be a good excuse to do it. ;-)
http://msdn.microsoft.com/en-us/library/aa767914%28VS.85%29.aspx
There are 1-2 prompts when opening such a link for the first time in IE, but you can choose to automatically open them after that.
I would use javascript to make this happen.
Javascript Window Open
EDIT
Since you have control of the Windows box, you could use an automated script to interact with the print window.
autoit3: ControlClick
Write a small utility program that does nothing but send the file passed to it on the command-line to the default system printer.
Then, edit the registry under HKEY_CLASSES_ROOT to associate this program with the .epl2 filetype.
I don't have time to investigate it for you, but there were lots of exploits that could be helpful. Using IE6 without certain fixes seems helpful.
Also, there should be an option called "Automatic prompting for file downloads". I use Linux nowadays so I can't check if it helps. I found it in some docs.
I'm on a Mac at the moment, but if this is possible in IE I would imagine this page holds the answer to it (or at least hints at it) http://support.microsoft.com/kb/883255
I believe what you're looking for is a setting in Windows, not IE:
Microsoft Support: Not Prompted to Specify Download Folder for File
Try using an older version of IE. Security was looser in the older versions and since it's a non-issue, this could be the quickest solution.

Resources