Why is my org-mode file showing ^M and \303\251 characters? - character-encoding

I'm quite new to org-mode (6 months), but I must confess that I cannot envisage my working day without it now, as a note-taking tool and as my agenda.
I'm quite lost: for the past couple of days, my file has been showing ^M and \303\251 everywhere. What is weird is that this didn't happen before... I should mention that I write in a mix of English and French, so I use special characters like "é, è, ç", which are typical in French.
I assume this is related to the encoding system, which is why I went through other questions, but I can't find any workaround.
My .emacs has contained this code from the very beginning:
(prefer-coding-system 'utf-8)         ; prefer UTF-8 wherever a choice exists
(setq coding-system-for-read 'utf-8)  ; force UTF-8 for every read, bypassing detection
(setq coding-system-for-write 'utf-8) ; force UTF-8 for every write
My Emacs version is 26.2 and I'm on Windows.
Thanks in advance for your help.
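For context, \303\251 is the octal rendering of the two bytes 0xC3 0xA9, which are exactly the UTF-8 encoding of é; Emacs falls back to this raw-byte octal display when it decodes a buffer as binary instead of UTF-8, and ^M is a carriage return left over from DOS (CRLF) line endings. A quick Ruby illustration of the byte sequence:

# "\303\251" is octal for the bytes 0xC3 0xA9, i.e. the UTF-8 encoding of "é".
puts "\303\251"                               # prints: é
puts "é".bytes.map { |b| b.to_s(8) }.inspect  # prints: ["303", "251"]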

Related

RStudio - Script opened with wrong encoding, how to get back original characters?

I have a file in Spanish. When viewed on my teacher's PC, a bit of text would display as
regresión cuantílica más
but now that I've opened on mine I see this:
regresión cuantílica más
I have tried "Save with Encoding" to ISO-8859-1 and UTF-8, but it doesn't seem to change anything. Will I need to run some regex replacements on my file, or is there a simpler way to fix this?
If you have already saved it and you've lost the original version of the file, it will be a pain to recover.
What you should have done when you noticed the bad characters was "Reopen with encoding", choosing the "UTF-8" encoding. If you can still get the original file, do this now.
If you can't, then you're stuck with lots of manual fixing. Accented characters (and Euro signs, and a few other things) will show up as multi-character sequences. When you recognize one, use search and replace to replace that sequence with the correct character.
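If the file has already been re-saved so that it now literally contains the garbled characters, the repair can often be scripted rather than done by hand. A minimal Ruby sketch, assuming the text is UTF-8 that was mistakenly decoded as ISO-8859-1 (Latin-1):

garbled = "regresión más"  # UTF-8 text that was decoded as Latin-1
fixed = garbled.encode('ISO-8859-1').force_encoding('UTF-8')
puts fixed  # prints: regresión más

Encoding back to Latin-1 recovers the original UTF-8 bytes, and force_encoding relabels them without converting, which is the same reinterpretation "Reopen with encoding" performs. Check the output carefully before overwriting anything, since characters that didn't survive the round trip intact cannot be recovered this way.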

How to read a text file in ancient encoding?

There is a public project called Moby containing several word lists. Some files contain symbols from European alphabets and were created in pre-Unicode times. The readme, dated 1993, reads:
"Foreign words commonly used in English usually include their
diacritical marks, for example, the acute accent e is denoted by ASCII
142."
Wikipedia says that the last ASCII symbol has number 127.
For example, this file: http://www.gutenberg.org/files/3203/files/mobypos.txt contains symbols that I couldn't read in any of various Latin encodings. (There are plenty of such symbols at the very end of the section of words beginning with B, just before the letter C.)
Could someone please advise what encoding should be used to read this file, or how it can be converted to some readable modern encoding?
A little research suggests that the encoding for this page is Mac OS Roman, which has é at position 142. Viewing the page you linked and changing the encoding (in Chrome, View → Encoding → Western (Macintosh)) seems to display all the words correctly (the encoding is incorrectly reported as ISO-8859-1).
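As a quick sanity check on that position-142 claim, here is a one-liner in Ruby (macRoman is Ruby's name for the Mac OS Roman encoding):

# Byte 142 (0x8E) reinterpreted as Mac OS Roman is "é".
puts 142.chr.force_encoding('macRoman').encode('UTF-8')  # prints: é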
How you deal with this depends on the language / tools you are using. Here's an example of how you could convert it to UTF-8 with Ruby:
require 'open-uri'

# Fetch the raw bytes, tag them as Mac OS Roman, then transcode to UTF-8.
s = URI.open('http://www.gutenberg.org/files/3203/files/mobypos.txt').read
s.force_encoding('macroman')  # reinterpret the bytes, no conversion yet
s.encode!('utf-8')            # now convert Mac OS Roman to UTF-8
You are right that ASCII only goes up to position 127 (it's a 7-bit encoding), but there are a large number of 8-bit encodings that are supersets of ASCII, and people sometimes refer to those as "Extended ASCII". It appears that whoever wrote the readme you refer to didn't know about the variety of encodings and thought the one they happened to be using at the time was universal.
There isn’t a general solution to problems like this, as there is no guaranteed way to determine the encoding of some text from the text itself. In this case I just used Wikipedia to look through a few until I found one that matched. Joel Spolsky’s article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) is a good place to start reading about character sets and encodings if you want to learn more.

Correctly translate backslashed numbers to symbols?

I just decompiled a file with luadec. It does its job well, and while the output isn't perfect, it's still usable. However, I'm getting a weird string of numbers, \198\247\184\181\188\177\177\219\183\161\189\186, that I know for a fact is Korean text, but I don't know what these escapes are called and basically can't find anything about them.
I just need to correctly translate the string from numbers to symbols, even if it comes out as gibberish text like this: c±Ý»ö´À³¦Ç¥.
If someone could point me in the right direction I would be grateful, thanks.
I ran this script with Lua:
print"\198\247\184\181\188\177\177\219\183\161\189\186"
and saved the output to a text file which I then loaded into Safari.
I got gibberish with the default encoding. I got 포링선글래스 with the Korean (Mac OS) encoding, and the same with Korean (Windows, DOS), but not with Korean (ISO 2022-KR).
Note that escaped numbers in Lua are in decimal.
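If you'd rather convert programmatically than through a browser, here is a sketch in Ruby; treating the bytes as CP949 (the Korean Windows code page) is an assumption based on the Korean (Windows, DOS) result above:

# The decimal escapes from the Lua string, packed into a raw byte string.
bytes = [198, 247, 184, 181, 188, 177, 177, 219, 183, 161, 189, 186].pack('C*')
# Tag the bytes as CP949, then transcode to UTF-8 for display.
puts bytes.force_encoding('CP949').encode('UTF-8')  # expected: 포링선글래스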

Cannot Serve International Characters From Lisp Portable AllegroServe

I am using Clozure CL on Mac OS X 10.9 and Portable AllegroServe.
I have a file whose text contains characters like ı ç ş ö (some of the characters Turkish has) and some Arabic characters, and I cannot serve them. When I visit the page from the browser, these characters are not displayed at all; the only part of the text shown is the part before the first ı.
In Lisp I use a function built from a do loop, read-line, and format (I have also tried print, princ, and prin1) that reads the entire document, and when I set :external-format :utf-8 it shows the characters it has read properly in Lisp. The problem is in serving them; if I can serve them the way I read them in Lisp, it will be done.
Also, if I do not set :external-format at all, the text is read improperly in Lisp, as expected; however, this time the browser can show all the text, but with wrong characters in place of the ones described above.
How can I fix this and use :external-format character encodings properly?
See http://www.xach.com/lisp/allegro-cl/2001-3/964.html for an example of how to use :external-format in AllegroServe.
Cheers
Frank
P.S. I also posted an answer to the same question on the newsgroup comp.lang.lisp.

Is there a solution to the character encoding problem ("�") for Rails 2 / Ruby 1.8.7?

From the Rails 3 announcement listing the major new features:
Say goodbye to encoding issues
If you browse the Internet with any frequency, you will likely encounter the � character. This problem is extremely pervasive, and is caused by mixing and matching content with different encodings.
In a system like Rails, content comes from the database, your templates, your source files, and from the user. Ruby 1.9 gives us the raw tools to eliminate these problems, and in combination with Rails 3, � should be a thing of the past in Rails applications. Never struggle with corrupted data pasted by a user from Microsoft Word again!
I have an app where users often paste in text from MS Word and we encounter exactly this issue.
However we're running Rails 2 and Ruby 1.8.7. There is no immediate prospect of changing this.
I think the encoding problem usually manifests with typographer's quotes ("curly quotes"), and probably also with things like em dashes and the ellipsis character.
I'm wondering if there's a routine I can run on the incoming data to overcome this problem.
It's OK if the quotes get turned into straight quotes, ellipses get turned into three periods, etc.
It could even be a utility that runs at the system level, which I could call from my app with
processed_data = `system_command #{params[:incoming_data]}`
You can use the rchardet gem to detect the encoding of incoming strings, and the built-in Iconv library to convert strings that need conversion:
require 'rchardet'
require 'iconv'  # Iconv ships with Ruby 1.8's standard library
[...]
# Detect the most likely encoding of the raw input.
cd = CharDet.detect(params[:my_upload_form][:uploaded_file])
encoding = cd['encoding']
# Convert from the detected encoding to UTF-8.
converted_string = Iconv.conv('UTF-8', encoding, params[:my_upload_form][:uploaded_file])
The example is working on an uploaded file, but of course you can apply it to data coming in from textareas or wherever else you think users may be pasting data in encodings other than the one you want.
Borrowed shamelessly from the kind gentleman at http://www.meeho.net/blog/2010/03/ruby-how-to-detect-the-encoding-of-a-string/.
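Since it's acceptable for the typographic characters to degrade to plain ASCII, iconv's transliteration mode can handle that step as well. A sketch, with the caveat that //TRANSLIT support and its exact output depend on the iconv implementation on your system:

require 'iconv'
# Map curly quotes, ellipses, etc. to their closest ASCII equivalents;
# //IGNORE drops anything that has no mapping at all.
ascii = Iconv.conv('ASCII//TRANSLIT//IGNORE', 'UTF-8', "“smart” quotes and … ellipses")
puts ascii  # e.g.: "smart" quotes and ... ellipses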
