Creating a POST with url elisp package in emacs: utf-8 problem - ruby-on-rails

I'm currently creating a REST client for making blog posts, much in the spirit of pastie.el. The main objective is to write Textile in Emacs and post it to a Rails application that will create the entry. It works fine until I type anything in Spanish or Japanese; then I get a 500 error. pastie.el has the same problem, by the way.
Here is the code:
(require 'url)
(defun create-post ()
  (interactive)
  (let ((url-request-method "POST")
        (url-request-extra-headers '(("Content-Type" . "application/xml")))
        (url-request-data (concat "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                                  "<post>"
                                  "<title>"
                                  "Not working with spanish nor japanese"
                                  "</title>"
                                  "<content>"
                                  ;; "日本語" ;; not working
                                  ;; "ñ" ;; not working either
                                  "h1. Textile title\n\n"
                                  "*Textile bold*"
                                  "</content>"
                                  "</post>"))) ; end of let varlist
    (url-retrieve "http://127.0.0.1:3000/posts.xml"
                  ;; CALLBACK
                  (lambda (status)
                    (switch-to-buffer (current-buffer))))))
The only fix I can think of right now is to make Emacs encode the non-ASCII characters as numeric character references, so that 'ñ' becomes '&#241;' (which works, by the way).
What could be a workaround for this problem?
EDIT: '*' is not equivalent to '&ast;'. What I meant was that if I encoded the text in Emacs using, for example, 'sgml-char', the whole post would be entity-encoded, as in &ast;Textile bold&ast;, leaving RedCloth unable to convert it into HTML. Sorry, it was badly explained.
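The workaround described above (references only for the characters that need them) can be sketched as follows. This is a minimal illustration in Ruby rather than Emacs Lisp, chosen only because the receiving application is Rails; the method name encode_non_ascii is hypothetical and not part of any library.

```ruby
# Replace only non-ASCII characters with decimal numeric character references.
# ASCII characters such as '*' are left untouched, so Textile markup survives.
def encode_non_ascii(text)
  text.each_char.map { |c| c.ascii_only? ? c : "&#" + c.ord.to_s + ";" }.join
end

puts encode_non_ascii("*Textile bold* \u00F1")
```

Leaving ASCII alone is the point of the sketch: the Textile markers reach RedCloth unchanged, while characters such as 'ñ' travel as references.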

A guess: does it work if you set url-request-data to
(encode-coding-string (concat "<?xml etc...") 'utf-8)
instead?
Nothing really tells url which coding system you use, so I guess you have to encode your data yourself. This should also give a correct Content-Length header, as that just comes from (length url-request-data), which counts characters rather than bytes and so gives the wrong result for an unencoded string containing non-ASCII characters.
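The gap between character count and byte count is easy to see outside Emacs. A small Ruby sketch, assuming UTF-8 strings; the helper name char_and_byte_counts is made up for this illustration:

```ruby
# Return [character count, byte count] for a string.
# A Content-Length header has to carry the byte count.
def char_and_byte_counts(str)
  [str.length, str.bytesize]
end

chars, bytes = char_and_byte_counts("\u65E5\u672C\u8A9E \u00F1") # "日本語 ñ"
puts "characters: #{chars}, bytes: #{bytes}"
```

For pure ASCII the two numbers agree, which is why the problem only shows up once Spanish or Japanese text is involved.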

Thanks to @legoscia I now know that I have to encode the data myself. I'll post the function here for future reference:
(require 'url)
(defun create-post ()
  (interactive)
  (let ((url-request-method "POST")
        (url-request-extra-headers '(("Content-Type" . "application/xml; charset=utf-8")))
        (url-request-data
         (encode-coding-string (concat "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                                       "<post>"
                                       "<title>"
                                       "Not working with spanish nor japanese"
                                       "</title>"
                                       "<content>"
                                       "日本語\n\n" ;; working!!!
                                       "ñ\n\n" ;; working !!!
                                       "h1. Textile title\n\n"
                                       "*Textile bold*"
                                       "</content>"
                                       "</post>")
                               'utf-8))) ; end of let varlist
    (url-retrieve "http://127.0.0.1:3000/posts.xml"
                  ;; CALLBACK
                  (lambda (status)
                    (switch-to-buffer (current-buffer))))))

Related

Detecting encoding before opening a file

I got a file with an unknown character encoding. Running file -bi test.trace returns text/plain; charset=us-ascii but using
(with-open-file (stream "/home/*/test.trace" :external-format :us-ascii)
(code-to-work-with-file))
gives me an exception:
:ASCII stream decoding error on
#<SB-SYS:FD-STREAM for "file /home/*/test.trace" {10208D2723}>:
the octet sequence #(194) cannot be decoded. [Condition of type SB-INT:STREAM-DECODING-ERROR]
How can I detect the encoding of a file before opening it?
I can open the file with Emacs, less and nano just fine, so it seems to be a misdetection of the encoding, or a difference between what file and SBCL think an encoding should look like.
I currently avoid this problem by forcing every file to UTF-8 with vim +set nobomb | set fenc=utf8| x file-path. But even after this, file still reports us-ascii. Additionally, this is not a valid permanent solution, but rather a dirty hack to make it work.
As pointed out on Programmers Stack Exchange:
Files generally indicate their encoding with a file header. There are
many examples here. However, even reading the header you can never be
sure what encoding a file is really using.
I looked for trace files on my system and found this one, but it does not contain anything unusual:
2016-06-22 13:10:07 ☆ |ruby-2.2.3#laguna| Antonios-MacBook-Pro in ~/learn/lisp/stackoverflow/scripts
○ → file -I resources/hello.trace
resources/hello.trace: text/plain; charset=us-ascii
2016-06-22 13:11:50 ☆ |ruby-2.2.3#laguna| Antonios-MacBook-Pro in ~/learn/lisp/stackoverflow/scripts
○ → cat resources/hello.trace
println! { "Hello, World!" }
print! { concat ! ( "Hello, World!" , "\n" ) }
So with this code I can read it:
CL-USER> (with-open-file (in "/Users/toni/learn/lisp/stackoverflow/scripts/resources/hello.trace" :external-format :us-ascii)
(when in
(loop for line = (read-line in nil)
while line do (format t "~a~%" line))))
println! { "Hello, World!" }
print! { concat ! ( "Hello, World!" , "\n" ) }
NIL
The same would work in Chinese or whatever it was.
We can see which character the octet from the error message stands for like this:
CL-USER> (format nil "~{~C~}" (mapcar #'code-char '(194)))
"Â"
The same goes for any other strange character, so it seems the file may contain accented characters. I added one to the file:
println! { "Hello, World!" }
print! { concat ! ( "Hello, World!" , "\n" ) }
Â
patatopita
and I get the same error:
:ASCII stream decoding error on
#<SB-SYS:FD-STREAM for "file /Users/toni/learn/lisp/stackoverflow/scripts/resources/hello.trace" {1003994043}>:
the octet sequence #(195) cannot be decoded.
[Condition of type SB-INT:STREAM-DECODING-ERROR]
At this point you can work with conditions and restarts to replace the character. I'm no specialist in this kind of code, but these restarts are offered:
Restarts:
0: [ATTEMPT-RESYNC] Attempt to resync the stream at a character boundary and continue.
1: [FORCE-END-OF-FILE] Force an end of file.
2: [INPUT-REPLACEMENT] Use string as replacement input, attempt to resync at a character boundary and continue.
3: [*ABORT] Return to SLIME's top level.
4: [ABORT] abort thread (#<THREAD "repl-thread" RUNNING {10050E0003}>)
Try INPUT-REPLACEMENT; if that does not help, try a European encoding such as latin-1 (ISO-8859-1):
CL-USER> (with-open-file (in "/Users/toni/learn/lisp/stackoverflow/scripts/resources/hello.trace" :external-format :latin-1)
(when in
(loop for line = (read-line in nil)
while line do (format t "~a~%" line))))
println! { "Hello, World!" }
print! { concat ! ( "Hello, World!" , "\n" ) }
¬
patatopita
NIL
And it should work. Good luck.
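The octets 194 and 195 in the error messages are typical UTF-8 lead bytes, so one cheap check before choosing an external format is to ask whether the raw bytes form valid UTF-8. A rough sketch of that idea in Ruby, not SBCL; the method name guess_encoding and the latin-1 fallback are assumptions made for illustration, and a guess like this can never be certain:

```ruby
# Guess an encoding for raw bytes: plain ASCII first, then UTF-8,
# and ISO-8859-1 as a last resort (every byte sequence decodes as latin-1).
def guess_encoding(bytes)
  data = bytes.dup.force_encoding(Encoding::UTF_8)
  return Encoding::US_ASCII if data.ascii_only?
  return Encoding::UTF_8 if data.valid_encoding?
  Encoding::ISO_8859_1
end

puts guess_encoding("println! \xC3\x82".b)
```

In practice the bytes would come from something like File.binread(path), and the result would decide which external format to open the file with.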

String Stream in Prolog?

I have to work with some SWI-Prolog code that opens a new stream (which creates a file on the file system) and pours some data in. The generated file is read somewhere else later on in the code.
I would like to replace the file stream with a string stream in Prolog so that no files are created and then read everything that was put in the stream as one big string.
Does SWI-Prolog have string streams? If so, how could I use them to accomplish this task? I would really appreciate it if you could provide a small snippet. Thank you!
SWI-Prolog implements memory files (library(memfile)). Here is a snippet from some old code of mine that does both write and read:
%% html2text(+Html, -Text) is det.
%
% convert from html to text
%
html2text(Html, Text) :-
    html_clean(Html, HtmlDescription),
    new_memory_file(Handle),
    open_memory_file(Handle, write, S),
    format(S, '<html><head><title>html2text</title></head><body>~s</body></html>', [HtmlDescription]),
    close(S),
    open_memory_file(Handle, read, R, [free_on_close(true)]),
    load_html_file(stream(R), [Xml]),
    close(R),
    xpath(Xml, body(normalize_space), Text).
Another option is using with_output_to/2 combined with current_output/1:
write_your_output_to_stream(Stream) :-
    format(Stream, 'example output\n', []),
    format(Stream, 'another line', []).

str_out(Codes) :-
    with_output_to(codes(Codes), (
        current_output(Stream),
        write_your_output_to_stream(Stream)
    )).
Usage example:
?- portray_text(true), str_out(C).
C = "example output
another line"
Of course, you can choose between redirecting output to atom, string, list of codes (as per example above), etc., just use the corresponding parameter to with_output_to/2:
with_output_to(atom(Atom), ... )
with_output_to(string(String), ... )
with_output_to(codes(Codes), ... )
with_output_to(chars(Chars), ... )
See with_output_to/2 documentation:
http://www.swi-prolog.org/pldoc/man?predicate=with_output_to/2
Later on, you could use open_string/2, open_codes_stream/2 and similar predicates to open a string or a list of codes as an input stream and read the data back.

What are the effects of removing \hypersetup in org-mode?

I generate PDFs in org-mode with my own preamble, but the generated PDF or .tex file always contains the information produced by:
(format "\\hypersetup{\n pdfkeywords={%s},\n pdfsubject={%s},\n pdfcreator={%s}}\n"
(org-export-latex-fontify-headline keywords)
(org-export-latex-fontify-headline description)
(concat "Emacs Org-mode version " org-version))
This code is located in ~/.emacs.d/org-7.8.11/lisp/org-latex.el
I removed it to prevent the useless information from appearing on the first page of the PDF file.
However, is it OK to delete this code without losing any functionality?
What effects will this change have?
Thank you for your help.
OK, I figured it out.
Add this line to your preamble:
\usepackage{hyperref}
That solves everything.
The same file you mention (lisp/org-latex.el) suggests
org-latex-hyperref-template...
Set it to the empty string to ignore the command completely.
So the following should work
(setq org-latex-hyperref-template "")
Alas, the source code I had downloaded and was reading is not the same version as my installed Org, so this didn't work. apropos may be an easy fix, so
M-x apropos RET hyperref RET
leads me to the variable org-latex-with-hyperref, so now I try
(setq org-latex-with-hyperref nil)
This worked for me.

Rails 2.3.2/Ruby 1.8.6 Encoding Question - ActionController returning UTF-8?

I have a pretty simple Rails question regarding encoding that I can't find an answer to.
Environment:
Rails 2.3.2/Ruby1.8.6
I am not setting any encoding options within the Rails environment currently, have left everything to defaults.
If I read a String from disk from a text file - and send it via Rails render :text functionality using Apache/Phusion, what encoding should the client expect?
Thank you for any answers,
Since about Rails 1.2, Rails sets Ruby 1.8's $KCODE magic variable to "UTF8". It includes ActiveSupport::CoreExtensions::String::Multibyte to patch around issues with otherwise ambiguous per-character/per-byte operators. Your text file should be UTF-8, Ruby will pass it through and your application layout should specify a META tag declaring the document's charset to be UTF-8 too:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Then it should all 'just work', but there are some gotchas described below.
If you're on a Mac, running "script/console" in Terminal.app and then pasting unusual character sequences directly into the terminal from e.g. the Character Viewer is a good way to play around and demonstrate this to your own satisfaction, since the whole OS works in UTF-8. I don't know what the equivalent would be for Windows or an arbitrary Linux distribution.
For example, "⇒" - RIGHTWARDS DOUBLE ARROW - is Unicode 21D2, UTF8 0xE2 (226), 0x87 (135), 0x92 (146). If I paste that into Terminal and ask for the byte values I get the expected result:
>> $KCODE
=> "UTF8"
>> "⇒"
=> "\342\207\222"
>> puts "⇒"
⇒
...but...
>> "⇒"[0]
=> 226
>> "⇒"[1]
=> 135
>> "⇒"[2]
=> 146
>> "⇒"[3]
=> nil
Note how you're still getting byte access with "[]". See the documentation on the Multibyte extensions in the Rails API (for Rails 2.2, e.g. at http://railsapi.com/) if you want to do string operations, otherwise things like "foo.reverse" will do the wrong thing; "foo.mb_chars.reverse" gets it right by using the "mb_chars" proxy.
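For comparison, here is a small sketch of the same experiment on a newer interpreter. It assumes Ruby 1.9 or later, where a String carries its own encoding, which is not the Ruby 1.8.6 environment described in the question; the helper name describe is made up for illustration.

```ruby
# Assumes Ruby 1.9+; on Ruby 1.8 the string is a plain byte sequence.
ARROW = "\u21D2" # RIGHTWARDS DOUBLE ARROW

def describe(str)
  {
    bytes: str.bytes,           # raw UTF-8 bytes
    codepoints: str.codepoints, # Unicode code points
    chars: str.length,          # character count, not byte count
    reversed: str.reverse       # reverses character by character
  }
end

puts describe("a#{ARROW}b").inspect
```

Here byte access is explicit (bytes), while length and reverse operate on characters, which is roughly the behaviour the mb_chars proxy provides on Ruby 1.8.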

Can I use a regular expression to extract the domain from a URL?

Suppose I want to turn this :
http://en.wikipedia.org/wiki/Anarchy
into this :
en.wikipedia.org
or even better, this :
wikipedia.org
Is this even possible in regex?
Why use a regex when Ruby has a library for it? The URI library:
ruby-1.9.1-p378 > require 'uri'
=> true
ruby-1.9.1-p378 > uri = URI.parse("http://en.wikipedia.org/wiki/Anarchy")
=> #<URI::HTTP:0x000001010a2270 URL:http://en.wikipedia.org/wiki/Anarchy>
ruby-1.9.1-p378 > uri.host
=> "en.wikipedia.org"
ruby-1.9.1-p378 > uri.host.split('.')
=> ["en", "wikipedia", "org"]
Splitting the host is one way to separate the domains, but I'm not aware of a reliable way to get the base domain -- you can't just count, in the event of a URL like "http://somedomain.otherdomain.school.ac.uk" vs "www.google.com".
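One way around the counting problem is to check the tail of the host against a list of known multi-label suffixes. The sketch below is only an illustration: the constant MULTI_LABEL_SUFFIXES is a hypothetical, deliberately tiny list, and base_domain is a made-up helper name. A real implementation would more likely consult a maintained list such as the Public Suffix List.

```ruby
require 'uri'

# Hypothetical, incomplete list of suffixes that span two labels.
MULTI_LABEL_SUFFIXES = %w[co.uk ac.uk org.uk com.au co.jp].freeze

# Return the registrable-looking part of the host, or nil if the URL has no host.
def base_domain(url)
  host = URI.parse(url).host
  return nil if host.nil?

  labels = host.downcase.split('.')
  # Keep three labels when the last two form a known multi-label suffix.
  keep = MULTI_LABEL_SUFFIXES.include?(labels.last(2).join('.')) ? 3 : 2
  labels.last(keep).join('.')
end

puts base_domain("http://en.wikipedia.org/wiki/Anarchy")
puts base_domain("http://somedomain.otherdomain.school.ac.uk")
```

Note that a string without a scheme, such as "www.google.com", has no host as far as URI.parse is concerned, so the helper returns nil for it.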
/http:\/\/([^\/]*).*/ will produce en.wikipedia.org from the string you provided.
/http:\/\/.{0,3}\.([^\/]*).*/ will produce wikipedia.org.
yes
Now I know you haven't asked for how, and you haven't specified a language, but I'll answer anyway... (note, this works for all language subsites, not just en.wikipedia...)
perl:
$url =~ s,http://[a-z]{2}\.(wikipedia\.org)/.*,$1,;
ruby:
url = url.sub(/http:\/\/[a-z]{2}\.(wikipedia\.org)\/.*/, '\1')
php:
$url = preg_replace('|http://[a-z]{2}\.(wikipedia\.org)/.*|', '$1', $url);
Of course, for this particular example, you don't even need a regex, just this will do:
url = 'wikipedia.org'
but I jest...
you probably want to handle any URL and pull out the domain part, and it should also work for domains in different countries, eg: foo.co.uk.
In which case, I'd use Mark Rushakoff's solution to get the hostname and then a regex to pull out the domain:
domain = host.sub(/^.*\.([^.]+\.[^.]+(\.[a-z]{2})?)$/, '\1')
Hope this helps
Also, if you want to learn more, I have a regex tute online: http://tech.bluesmoon.info/2006/04/beginning-regular-expressions.html
Sure all you would have to do is search on http://(.*)/wiki/Anarchy
In Perl (Sorry I don't know Ruby, but I expect it's similar)
$string_to_search =~ s/http:////(.)//. should give you wikipedia.org
to get rid of the en, you can simply search on http:////en(.)//......
That should do it.
Update: In case you're not familiar with Regex, I would recommend picking up a Regex book, this one really rocks and I like it: REGEX BOOK,Mastering Regular Expressions, I saw it on half.com the other day for 14.99 used, but to clarify what i suggested above, is to look for the string http://en, then for anything until you find a / this is all captured in $1 (in perl, not sure if it's the same in ruby), a simple print $1 will print the string.
Update: #2 sorry the star in the regex is not showing up for some reason, so where you see the . in the () and after the // just imagine a *, oh and I forgot for the en part add a /. at the end that way you don't end up with .wikipedia.org
