jsPDF doesn't show certain letters properly (ū) - character-encoding

I am trying to create a PDF file for export using the jsPDF library. On one of the lines I am trying to write a word that contains the letter 'ū':
doc.text('Hūla', 20, 30);
However, when I do so, the exported file doesn't contain this letter. Instead, the text comes out as
'H k l a', with spaces in between and a k instead of the ū.
What can I do in order to have this printed properly?

The solution was to use a font that contains this glyph. I had to try several fonts before I found one that renders ū (the text is not in one specific language).
The font that worked was Amiri, which also supports Arabic script.
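Why a different font is needed, and why the output looks like "H k l a", can be illustrated with a small Python sketch. It rests on a hypothesis that the thread does not confirm: when a string contains a character outside the single-byte range, jsPDF writes it as UTF-16BE (two bytes per character), while a standard font without that glyph reads one byte per glyph. cp1252 is used here as an approximate stand-in for the WinAnsi encoding of the standard PDF fonts; fits_single_byte is a hypothetical helper name. ū is U+016B, and its low byte 0x6B happens to be "k".

```python
def fits_single_byte(s, codec="cp1252"):
    """Return True if every character of s has a slot in the code page."""
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

text = "Hūla"
print(fits_single_byte("Hula"))  # True
print(fits_single_byte(text))    # False: ū (U+016B) is not in cp1252

# Hypothesis: the string is written as UTF-16BE (two bytes per character)...
utf16 = text.encode("utf-16-be")
print(utf16)  # b'\x00H\x01k\x00l\x00a'

# ...and a single-byte font reads each byte as its own glyph.
# The 0x00/0x01 bytes have no visible glyph, so they are shown as blanks here.
garbled = "".join(c if c.isprintable() else " " for c in utf16.decode("latin-1"))
print(repr(garbled))  # ' H k l a'
```

Under this reading, the fix in the answer makes sense: a font such as Amiri actually has a glyph for U+016B, so no byte-level fallback is involved.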


How do I import words from a text file?

words_to_guess = (what code should I put here to import the words from a text file?)
My text file is named words.txt, so I tried:
words_to_guess =import words.txt
You can read the contents of a .txt file into a variable using this code (open() on its own returns a file object, not the text, so call .read() on it):
with open("words.txt") as f:
    words_to_guess = f.read()
Just make sure Python can find "words.txt". A relative path like this is resolved against the current working directory, which is usually, but not always, the folder containing the .py file (or the .exe, if you're compiling it).
I would also point out that, based on the screenshot you provided, it looks like you're trying to pick a random word from the .txt file. After the code above, I would recommend adding this line:
words_to_guess = words_to_guess.split()
This splits the contents of words_to_guess into a list of words that you can index or iterate over. You can then call:
import random
word = random.choice(words_to_guess)
This stores a random element of the list in the word variable, giving you a random word from the .txt file.
Note that split() with no arguments splits on any whitespace (spaces, tabs and line breaks), so a multi-word entry like "Halloween Pumpkin" or "American Flag" becomes separate elements: ["Halloween", "Pumpkin"] or ["American", "Flag"]. If you put one entry per line, use words_to_guess.splitlines() instead to keep such entries together.
That's all!
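Putting the pieces together, here is a small self-contained sketch. load_words is a hypothetical helper name, and the file is assumed to be UTF-8 text. The demo writes a throwaway words.txt to a temporary folder so that the example runs anywhere, and shows the difference between splitting on whitespace and splitting on lines.

```python
import random
import tempfile
from pathlib import Path

def load_words(path, one_per_line=False):
    """Read a word list from a UTF-8 text file.

    With one_per_line=True, each non-blank line is one entry, so
    "Halloween Pumpkin" stays together; otherwise split on any whitespace.
    """
    text = Path(path).read_text(encoding="utf-8")
    if one_per_line:
        return [line.strip() for line in text.splitlines() if line.strip()]
    return text.split()

# Demo: a throwaway words.txt in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "words.txt"
    sample.write_text("apple\nHalloween Pumpkin\n\nAmerican Flag\n", encoding="utf-8")
    by_space = load_words(sample)
    by_line = load_words(sample, one_per_line=True)

print(by_space)  # ['apple', 'Halloween', 'Pumpkin', 'American', 'Flag']
print(by_line)   # ['apple', 'Halloween Pumpkin', 'American Flag']
word = random.choice(by_line)
print(word)
```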

How to tokenize/parse/search&replace a document by font AND font style in LibreOffice Writer?

I need to update a bilingual dictionary written in Writer by first parsing all entries into their parts, e.g.:
main word (font 1, bold)
foreign equivalent transliterated (font 1, italic)
foreign equivalent (font 2, bold)
part of speech (font 1, italic)
Each line of the document is the main word followed by the parts listed above, each separated by a space or punctuation.
I need to automate walking through the whole file, line by line, and placing a delimiter between the parts, ignoring spaces and punctuation, so that I can mass-import the result into a Calc file. In other words, "each part" is a sequence of characters (ignoring spaces and punctuation) that share the same font AND font style.
I have tried the standard Search&Replace feature and the AltSearch extension, but neither can do this. The main problem is that I cannot write a search query that says:
Find: consecutive characters with the same font AND font_style, ignore spaces and punctuation
Replace: term found above + "delimiter"
Any suggestions on how I could write a script for this, or whether an existing tool can solve the problem?
Thanks!
Pseudocode for the desired effect:
var $delimiter = "|"
Go to beginning of document
While not end of document do:
    var $currLine = get next line from doc
    var $currChar = first character of $currLine which is not space or punctuation
    var $font = $currChar.font
    var $font_style = $currChar.font_style   (e.g. bold, italic, normal)
    While not end of $currLine do:
        $currChar = next character which is not space or punctuation
        if ($currChar.font != $font || $currChar.font_style != $font_style) {   // font or style has changed
            insert $delimiter before $currChar
            $font = $currChar.font
            $font_style = $currChar.font_style
        }
    end While
end While
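The inner loop of the pseudocode can be sketched in Python, assuming each line has already been read into a list of (character, font, font_style) tuples. That input format is hypothetical; getting the data out of Writer is what the answer below addresses. Punctuation is detected here with Unicode categories starting with "P", which is one possible reading of "punctuation".

```python
import unicodedata

def is_separator(ch):
    """True for whitespace and Unicode punctuation (categories P*)."""
    return ch.isspace() or unicodedata.category(ch).startswith("P")

def delimit(chars, delimiter="|"):
    """chars: list of (character, font, font_style) tuples for one line.

    Returns the line text with `delimiter` inserted before each
    non-separator character whose font or style differs from the
    previous non-separator character.
    """
    out = []
    current = None  # (font, font_style) of the last non-separator character
    for ch, font, style in chars:
        if not is_separator(ch):
            if current is not None and (font, style) != current:
                out.append(delimiter)
            current = (font, style)
        out.append(ch)
    return "".join(out)

# Hypothetical line: "house" in bold, ", " unformatted, "spiti" in italic
line = (
    [(c, "Font1", "bold") for c in "house"]
    + [(c, "Font1", "normal") for c in ", "]
    + [(c, "Font1", "italic") for c in "spiti"]
)
print(delimit(line))  # house, |spiti
```

As in the pseudocode, the delimiter lands directly before the first character of the new part, so any separator before it stays with the previous part.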
Here are tips for each of the things your pseudocode does.
First, the easiest way to move line by line is with the TextViewCursor, although it is slow. Notice the XLineCursor section of its documentation. For the while loop, oVC.goDown(1, False) returns False when the end of the document is reached (oVC is our variable for the TextViewCursor).
Get each character by calling oVC.goRight(0, False) to collapse the selection, followed by oVC.goRight(1, True) to select the next character. The selected text is then returned by oVC.getString(). To skip spaces and punctuation, you could use Python's str.isalnum() or the re module.
To determine the font of the character, call oVC.getPropertyValue(attr). Checking the values of 'CharAutoStyleName' and 'CharStyleName' may be enough to detect any change in formatting.
Alternatively, compare a list of specific properties such as 'CharFontFamily', 'CharFontFamilyAsian', 'CharFontFamilyComplex', 'CharFontPitch', 'CharFontPitchAsian', etc. For the cases in the question, 'CharFontName', 'CharWeight' and 'CharPosture' hold the font name, the boldness and the italics. Character properties are described at https://wiki.openoffice.org/wiki/Documentation/DevGuide/Text/Formatting.
To insert the delimiter into the text: oVC.getText().insertString(oVC, "|", False).
This Python code on GitHub shows how to do most of these things, although you'll need to read through it to find the relevant parts.
Alternatively, instead of using the LibreOffice API, unzip the .odt file and parse content.xml with a script.
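A minimal standard-library sketch of the content.xml route follows. Assumptions: direct formatting is stored as automatic text styles referenced by text:span elements (which is how Writer usually saves it); only style:font-name is read (Asian and complex-script fonts live in style:font-name-asian and style:font-name-complex); paragraph styles and nested spans are not resolved; and spaces, tabs, commas and semicolons at the edges of a part are trimmed. All function names are hypothetical, and the demo builds a tiny made-up .odt in memory.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# ODF namespaces used in content.xml
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}
DEFAULT = (None, "normal", "normal")  # paragraph formatting, not resolved here

def q(prefix, name):
    """Build an ElementTree '{namespace}name' key."""
    return "{%s}%s" % (NS[prefix], name)

def text_styles(root):
    """Map automatic style names to (font, weight, posture)."""
    styles = {}
    for st in root.iterfind("office:automatic-styles/style:style", NS):
        props = st.find("style:text-properties", NS)
        if props is not None:
            styles[st.get(q("style", "name"))] = (
                props.get(q("style", "font-name")),
                props.get(q("fo", "font-weight"), "normal"),
                props.get(q("fo", "font-style"), "normal"),
            )
    return styles

def runs(par, styles):
    """Yield (text, format_key) pieces of one paragraph in document order."""
    if par.text:
        yield par.text, DEFAULT
    for child in par:
        if child.tag == q("text", "span"):
            key = styles.get(child.get(q("text", "style-name")), DEFAULT)
            yield "".join(child.itertext()), key
        elif child.tag == q("text", "s"):
            yield " ", DEFAULT  # <text:s/> stands for extra spaces
        else:
            yield "".join(child.itertext()), DEFAULT
        if child.tail:
            yield child.tail, DEFAULT

def split_parts(pieces):
    """Group pieces into parts; spaces/punctuation never start a new part."""
    parts, current = [], None
    for text, key in pieces:
        separator_only = not any(ch.isalnum() for ch in text)
        if separator_only or key == current:
            if parts:
                parts[-1] += text
            continue
        parts.append(text)
        current = key
    return [p.strip(" \t,;") for p in parts]

def odt_to_rows(odt, delimiter="|"):
    """Yield one delimited row per paragraph of an .odt (path or file object)."""
    with zipfile.ZipFile(odt) as z:
        root = ET.fromstring(z.read("content.xml"))
    styles = text_styles(root)
    for par in root.iter(q("text", "p")):
        parts = split_parts(runs(par, styles))
        if parts:
            yield delimiter.join(parts)

# Hypothetical entry: bold (Font1), italic (Font1), bold (Font2), italic (Font1)
CONTENT = f"""<office:document-content xmlns:office="{NS['office']}"
 xmlns:style="{NS['style']}" xmlns:text="{NS['text']}" xmlns:fo="{NS['fo']}">
<office:automatic-styles>
 <style:style style:name="T1" style:family="text"><style:text-properties style:font-name="Font1" fo:font-weight="bold"/></style:style>
 <style:style style:name="T2" style:family="text"><style:text-properties style:font-name="Font1" fo:font-style="italic"/></style:style>
 <style:style style:name="T3" style:family="text"><style:text-properties style:font-name="Font2" fo:font-weight="bold"/></style:style>
</office:automatic-styles>
<office:body><office:text>
<text:p><text:span text:style-name="T1">house</text:span>, <text:span text:style-name="T2">spiti</text:span> <text:span text:style-name="T3">σπίτι</text:span> <text:span text:style-name="T2">n.</text:span></text:p>
</office:text></office:body>
</office:document-content>"""

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("content.xml", CONTENT)
rows = list(odt_to_rows(buf))
print(rows)  # ['house|spiti|σπίτι|n.']
```

For a real file, pass its path to odt_to_rows and write the rows to a .txt or .csv that Calc can import with "|" as the separator.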

Copy a table from an IPython notebook into Word?

I want to copy a table from an IPython notebook into a Word document. I'm using Word for Mac 2011. The table is standard pandas output and looks like this:
If I use Apple+C to copy the table, and then paste it into a Word doc, I get this:
Surely there must be an easier way?
Creating a table with the same number of rows/columns in Word and then trying to paste the cells there doesn't work either.
I guess I could screenshot the table, but I'd like to include the raw data in the document if possible.
The problem in this case (from the Word perspective) is not the table layout - it's the paragraph layout. Each paragraph has a substantial indent on right and left, and more space before/after than you would normally want.
I don't think any of Word's Paste options (e.g. Paste Special) will help, unless you paste as unformatted text, select the text, convert it to a table, and proceed from there.
But even a simple Word VBA macro such as this one will leave you with something more manageable: select a table you have pasted in, then run the macro. A little more work on the code would probably get you most of the formatting you want, most of the time.
Sub fixupSelectedTable()
    ' Reset the paragraph formatting of every cell in the first selected table
    With Selection.Tables(1).Range.ParagraphFormat
        .LeftIndent = 0
        .RightIndent = 0
        .SpaceBefore = 0
        .SpaceAfter = 0
        .LineSpacingRule = wdLineSpaceSingle
    End With
End Sub
If you are more familiar with AppleScript, the equivalent looks something like this:
-- you may need to fix up the application name
-- (I use this to ensure that the script uses the open Word 2011 document
-- and does not try to start Word for Mac 15 (2016))
tell application "/Applications/Microsoft Office 2011/Microsoft Word.app"
    tell the paragraph format of the text object of table 1 of the text object of the selection
        set paragraph format left indent to 0
        set paragraph format right indent to 0
        set space before to 0
        set space after to 0
        set line spacing rule to line space single
    end tell
end tell
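For the unformatted-text route (paste as plain text, then convert to a table), tab-separated text is the easiest input for Word's Convert Text to Table command with tabs as the separator. The following is a standard-library sketch that builds such text from hypothetical header and row data. With pandas, df.to_csv(sep="\t") or df.to_clipboard() should give equivalent tab-separated text directly.

```python
import csv
import io

def to_tab_separated(header, rows):
    """Return one line per row, with cells separated by tabs."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical data standing in for the notebook's pandas table
header = ["row", "a", "b"]
rows = [[0, 1.5, "x"], [1, 2.25, "y"]]
tsv = to_tab_separated(header, rows)
print(tsv, end="")
```

Pasting this text into Word and converting it keeps the raw values, which the question wanted instead of a screenshot. The macro above can then tidy the spacing.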

printfn not producing expected results for international (non-Latin) characters

I have the following program:
[<EntryPoint>]
let main argv =
    let txt = "إتصالات"
    printfn "Text is: %s" txt
    0 // return an integer exit code
txt is set to a string of Arabic characters. When I run the program, the console shows a row of question marks instead of the characters, although the Visual Studio 2012 debugger displays the correct characters for the txt variable.
What am I doing wrong and how does one properly display international characters?
According to the question "How to write unicode chars to console?", you need to set the OutputEncoding property on the console, like this:
[<EntryPoint>]
let main argv =
    System.Console.OutputEncoding <- System.Text.Encoding.Unicode
    let txt = "إتصالات"
    printfn "Text is: %s" txt
    0 // return an integer exit code
The answer to that question is worth reading in full, because it also explains why you need to change the console font for this to work, and how to do so.
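The question marks themselves come from the encoding step rather than from the font: when the console's output encoding is a legacy OEM code page, characters that code page lacks are typically replaced with "?". Python's codecs can illustrate the same effect. Code page 437 is used here as an assumed stand-in; the actual code page depends on the system's locale.

```python
txt = "إتصالات"

# Code page 437 (a common OEM console code page) has no Arabic letters, so
# every character is replaced by "?" when encoding with errors="replace".
oem = txt.encode("cp437", errors="replace")
print(oem == b"?" * len(txt))  # True

# A Unicode encoding keeps every character and round-trips losslessly.
utf8 = txt.encode("utf-8")
print(utf8.decode("utf-8") == txt)  # True
```

This is why setting OutputEncoding to a Unicode encoding fixes the question marks, while a font with Arabic glyphs is still needed to actually see the letters.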
Here are some additional links with more information:
Necessary criteria for fonts to be available in a command window (this is for Windows 2000 and may not entirely apply to Windows 8, but it should give you a good idea of what to look for in a font).
Windows Console and TrueType Fonts shows how to add new fonts to the console.
"Anyone who says the console can't do Unicode isn't as smart as they think they are" has some background information about writing Unicode text to the console.
Update: Since the Arabic text in the example renders fine here on Stack Overflow, I looked at the CSS to see which fonts are used to render preformatted text. Using that list and the Windows Character Map tool (Start -> All Programs -> Accessories -> System Tools -> Character Map), I found that the Courier New font (which ships with Windows) supports Arabic characters. If you use the registry hack from the "Windows Console and TrueType Fonts" link above, you should be able to add Courier New as a console font.

Parse a Word Document By Font?

I'm currently trying to write a script that runs through a Word document and writes all the lines that are in a certain font to a text file.
So if I had the document:
"This is the first line of the document.
This is the second line of the document.
This is the third line of the document."
And say the normal line is in Times New Roman, the bold line is in Arial, and the italic line is in Sans Serif.
Then, ideally, I could parse the document for all lines in Arial and the text file output would have the line:
This is the second line of the document.
Any idea how to do this from a script? I was thinking about first converting the document to XML, but I don't think that is possible within a script.
You'll want to use the Find object and its Font property.
So, something like this:
Public Sub FindTest()
    Dim r As Range
    Set r = ActiveDocument.Content
    With r.Find
        .ClearFormatting
        .Text = ""                  ' match any text that has the formatting below
        .Style = "SomeStyleName"    ' placeholder; or e.g. .Font.Name = "Arial"
        Do While .Execute(Forward:=True, Format:=True) = True
            ' r now covers the range that was found
            Debug.Print r.Text
        Loop
    End With
End Sub
Note that where I've specified Style, you could instead specify font formatting via the Find.Font object, or various other formatting options. Just browse the Find object to see what's available.
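The asker's XML idea is also workable without Word for .docx files, which are zip archives containing word/document.xml. Here is a minimal standard-library sketch, assuming the font is applied as direct run formatting (w:rFonts w:ascii). Fonts inherited from paragraph styles or the theme are not resolved, and a binary .doc file would first need to be saved as .docx. The function names are hypothetical, and the demo document is built in memory from the question's three lines.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS

def run_font(run):
    """Font set directly on a <w:r> run, or None if inherited from a style."""
    fonts = run.find(f"{W}rPr/{W}rFonts")
    return None if fonts is None else fonts.get(f"{W}ascii")

def lines_in_font(docx, font_name):
    """Yield the text of each paragraph whose text runs all use font_name."""
    with zipfile.ZipFile(docx) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    for par in root.iter(f"{W}p"):
        runs = [r for r in par.iter(f"{W}r") if r.find(f"{W}t") is not None]
        if runs and all(run_font(r) == font_name for r in runs):
            yield "".join(t.text or "" for r in runs for t in r.iter(f"{W}t"))

def para(text, font):
    """Build one single-run paragraph for the hypothetical demo document."""
    return (f'<w:p><w:r><w:rPr><w:rFonts w:ascii="{font}"/></w:rPr>'
            f"<w:t>{text}</w:t></w:r></w:p>")

DOC = (f'<w:document xmlns:w="{W_NS}"><w:body>'
       + para("This is the first line of the document.", "Times New Roman")
       + para("This is the second line of the document.", "Arial")
       + para("This is the third line of the document.", "Sans Serif")
       + "</w:body></w:document>")

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/document.xml", DOC)

matches = list(lines_in_font(buf, "Arial"))
print(matches)  # ['This is the second line of the document.']
```

To produce the text file from the question, pass the path of a real .docx instead of buf and write each yielded line to a file opened with encoding="utf-8".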
