pybabel text extraction from mako templates - pylons

By default pybabel extracts ${_("mystr")} strings just fine from my Mako templates, but when I try to use ${pgettext("myctx", "mystr")} for contextual translations, it doesn't seem to find and extract them.
My babel config is pretty basic:
[mako: templates/**.mako]
encoding = utf-8
Does anyone know how to get pybabel to extract pgettext translations from mako templates?

With Flask I got it working by importing pgettext into the Mako template namespace:
kw['imports'] = ['from flask.ext.babel import gettext as _, pgettext']
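Outside of Flask, the usual fix is that older Babel releases do not have pgettext in their default keyword list (recent ones map it as pgettext:1c,2 out of the box), so you have to declare the keyword explicitly when extracting. A sketch, assuming the standard pybabel CLI, where the 1c,2 spec marks argument 1 as the message context and argument 2 as the message itself:

```
pybabel extract -F babel.cfg -k "pgettext:1c,2" -o messages.pot .
```

The same keyword spec can go under [extract_messages] in setup.cfg if you extract through setuptools.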


Tcl Tk show all available links

Is there a function in Tcl/Tk to list all URLs available on a page? I want to start programming a web crawler with some features.
For example:
the user types this:
"www.testsite.com"
and he will get that:
"www.testsite.com/dir1/"
"www.testsite.com/dir2/"
etc.
Or is it better to program it in another language, like Python?
It's pretty easy to do with the http and tDOM packages. You just need to know a bit of XPath…
package require http
package require tdom
set tok [http::geturl http://example.com/index.html]
set html [http::data $tok]
http::cleanup $tok
set doc [dom parse -html $html]
foreach anchor [$doc selectNodes {//a[@href]}] {
    puts [$anchor getAttribute href]
}

How to read .docx file using F#

How can I read a .docx file using F#? If I use
System.IO.File.ReadAllText("D:/test.docx")
it returns garbage output with beep sounds.
Here is an F# snippet that may give you a jump-start. It successfully extracts all the text content of a Word 2010-created .docx file as a string of concatenated lines:
open System
open System.IO
open System.IO.Packaging
open System.Xml
let getDocxContent (path: string) =
    use package = Package.Open(path, FileMode.Open)
    let stream = package.GetPart(new Uri("/word/document.xml", UriKind.Relative)).GetStream()
    stream.Seek(0L, SeekOrigin.Begin) |> ignore
    let xmlDoc = new XmlDocument()
    xmlDoc.Load(stream)
    xmlDoc.DocumentElement.InnerText

printfn "%s" (getDocxContent @"..\..\test.docx")
To make it work, don't forget to reference WindowsBase.dll in your VS project.
.docx files follow the Open Packaging Conventions specification. At the lowest level, they are ZIP files. To read one programmatically, see the examples here:
A New Standard For Packaging Your Data
Packages and Parts
Using F#, it's the same story: you'll have to use the classes in the System.IO.Packaging namespace.
System.IO.File.ReadAllText has type string -> string.
Because a .docx file is a binary (ZIP) file, some of its bytes read as text will happen to be the bell character, which is what causes the beeps. Rather than ReadAllText, look into Word automation, the Packaging API, or the OpenXML APIs.
Try using the OpenXML SDK from Microsoft.
Also on the linked page is the Microsoft tool that you can use to decompile Office 2007 files. The decompiled code can be quite lengthy even for simple documents, though, so be warned. There is a big learning curve associated with the OpenXML SDK; I'm finding it quite difficult to use.

Opening .doc files in Ruby

Can I open a .doc file and get that file's contents using Ruby?
If you only need the plain text content, you might want to have a look at Yomu. It's a gem which acts as a wrapper for Apache Tika, and it supports a variety of document formats, including the following:
Microsoft Office OLE 2 and Office Open XML Formats (.doc, .docx, .xls, .xlsx, .ppt, .pptx)
OpenOffice.org OpenDocument Formats (.odt, .ods, .odp)
Apple iWorks Formats
Rich Text Format (.rtf)
Portable Document Format (.pdf)
The gem docx is very simple to use
require 'docx'
puts Docx::Document.open('test.docx')
or
d = Docx::Document.open('test.docx')
d.each_paragraph do |p|
puts p
end
You can find it at https://github.com/chrahunt/docx and install it with gem install docx.
docx, however, doesn't support .doc files (the binary format used by Word 2003 and earlier); for those you can use WIN32OLE like this:
require 'win32ole'
begin
  word = WIN32OLE.connect('Word.Application')
  doc = word.ActiveDocument
rescue
  word = WIN32OLE.new('word.application')
  path_open = 'C:\Users\...\test.doc' # yes: backslashes on Windows
  doc = word.Documents.Open(path_open)
end
word.visible = true
doc.Sentences.each { |x| puts x.text }
Yes and No
In Ruby you can do something like:
thedoc = `externalProgram some_file`
And so what you need is a good externalProgram.
You could look at the software library wv or the (apparently not recently updated) program antiword. I imagine there are others. OpenOffice can read doc files and export text files, so driving OO via the CLI will probably also work.
If you're on Windows, this will work: http://www.ruby-doc.org/stdlib/libdoc/win32ole/rdoc/classes/WIN32OLE.html
I recently dealt with this in a project and found that I wanted a lighter-weight library to get the text from .doc, .docx and .pdf files. DocRipper uses a combination of Antiword, grep and Poppler/pdftotext command-line tools to grab the text contents from files and return them as a utf-8 string.
dr = DocRipper::TextRipper.new('/path/to/file')
dr.text
=> "Document's text"

Iconv.conv in Rails application to convert from unicode to ASCII//translit

We wanted to convert a Unicode string in the Slovak language into plain ASCII (without accents/carons), that is: č->c, š->s, á->a, é->e, etc.
We tried:
cstr = Iconv.conv('us-ascii//translit', 'utf-8', a_unicode_string)
It was working on one system (Mac) and not on the other (Ubuntu), where it was giving '?' for accented characters after conversion.
Problem: iconv was using the LANG/LC_ALL environment variables. I do not know why, when the encodings are known, but well... You had to set the locale variables to something ending in .utf8, for example sk_SK.utf8 or en_GB.utf8.
The next step was to try to set ENV['LANG'] and ENV['LC_ALL'] in config/application.rb. This was ignored by Iconv in Ruby.
Another try was to use the global system setting in /etc/default/locale. This worked on the command line, but not for the Rails application. Reason: Apache has its own environment. Therefore the final solution was to add the LANG/LC_ALL variables to /etc/apache2/envvars:
export LC_ALL="en_GB.utf8"
export LANG="en_GB.utf8"
export LANGUAGE="en_GB.utf8"
Restarted Apache and it worked.
This is more of a little how-to than a question. However, if someone has a better solution I would like to know about it.
You can try the unaccent approach instead.
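On Rubies where Iconv is no longer available (it was removed from the standard library in 2.0), similar accent stripping can be done with Unicode normalization alone, with no locale dependency. A minimal sketch, assuming the characters decompose into a base letter plus combining marks:

```ruby
# Decompose accented characters (NFKD), then strip the combining marks,
# leaving only the base ASCII letters: č->c, š->s, á->a, é->e
def to_plain_ascii(str)
  str.unicode_normalize(:nfkd).gsub(/\p{Mn}/, '')
end

puts to_plain_ascii('čaša médov')  # => "casa medov"
```

Note this only covers decomposable characters; letters such as ł or ß do not decompose to an ASCII base and need a manual mapping, which is the part iconv's //translit tables used to provide.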

convert jruby 1.8 string to windows encoding?

I want to export some data from my jruby on rails webapp to excel, so I create a csv string and send it as a download to the client using
send_data(text, :filename => "file.csv", :type => "text/csv; charset=CP1252", :encoding => "CP1252")
The file seems to be in UTF-8, which Excel cannot read correctly. I googled the problem and found that iconv can convert encodings. I tried to do that with:
ic = Iconv.new('CP1252', 'UTF-8')
text = ic.iconv(text)
but when I send the converted text it does not make any difference. It is still UTF-8 and Excel cannot read the special characters. There are several solutions using iconv out there, so this seems to work for others. When I convert the file manually with iconv on the Linux shell, it works.
What am I doing wrong? Is there a better way?
I'm using:
- jruby 1.3.1 (ruby 1.8.6p287) (2009-06-15 2fd6c3d) (Java HotSpot(TM) Client VM 1.6.0_19) [i386-java]
- Debian Lenny
- Glassfish app server
- Iceweasel 3.0.6
Edit:
Do I have to include some gem to use iconv?
Solution:
S.Mark pointed out this solution:
You have to use UTF-16LE encoding to make Excel understand it, like this:
text = Iconv.conv('UTF-16LE', 'UTF-8', text)
(Iconv.conv returns the converted string; Iconv.iconv would return an array of strings.)
Thanks, S.Mark for that answer.
In my experience, Excel cannot handle UTF-8 CSV files properly. Try UTF-16 instead.
Note: Excel's Text Import Wizard appears to work with UTF-8 too.
Edit: a search on Stack Overflow gave me this page; please take a look at that.
According to that, adding a BOM (Byte Order Mark) signature to the CSV will pop up Excel's Text Import Wizard, so you could use it as a workaround.
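On modern Rubies (where Iconv is gone), the re-encoding plus BOM can be done with String#encode instead; a sketch, assuming a recent desktop Excel on Windows:

```ruby
csv = "name;city\nJürgen;Köln\n"

# Re-encode the CSV body as UTF-16LE and prepend a byte order mark,
# so Excel detects the encoding instead of assuming the ANSI code page
bom  = "\xFF\xFE".force_encoding('UTF-16LE')
data = bom + csv.encode('UTF-16LE')

# data is what you would hand to send_data(..., :type => 'text/csv')
```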
Do you get the same result with the following?
cp1252= Iconv.conv("CP1252", "UTF8", text)
