jsPDF: How to render Ä Ü Ö?

I am using jsPDF to create a PDF file from a webpage. I'd like to add text, but if I use ä ö ü it is not rendered correctly.
I tried to add Unicode characters to my PDF following the documentation, but the part of the docs covering Unicode doesn't include any examples. It would be great if you could provide one.
This is my code:
function doDocData(doc) {
    var number_of_pages = doc.internal.getNumberOfPages();
    var pdf_pages = doc.internal.pages;
    doc.setFontStyle('normal'); // optional
    for (var i = 1; i < pdf_pages.length; i++) {
        doc.setPage(i);
        var strl = 'I like to add Ä Ü Ö';
        doc.setFontSize(10); // optional
        doc.text(strl, 20, doc.internal.pageSize.height - 10); // key is the internal pageSize
        var str = 'Seite ' + i + ' von ' + number_of_pages;
        doc.setFontSize(10); // optional
        doc.text(str, doc.internal.pageSize.width - 80, doc.internal.pageSize.height - 10); // key is the internal pageSize
    }
}

For UTF-8 encoded characters like Ä, Ö, Ü you need to include a font that supports them (as stated in the documentation: Use of Unicode Characters / UTF-8), since jsPDF by default ships only with ASCII fonts.
A font like this (ARIAL utf-8; ATTENTION: opens a download of arialuni.ttf) can be converted with jsPDF's own font converter.
You can find a code example of how to include the converted font here: setFont in jsPDF
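Put together, registering the converted font might look like the following sketch (this assumes the jsPDF 2.x custom-font API; the base64 string is the output of the font converter, and the file/font names here are placeholders):

// Output of the jsPDF font converter: the TTF file as a base64 string.
var base64Font = '...'; // placeholder

var doc = new jsPDF();
// Register the font file in jsPDF's virtual file system, then add it under a name.
doc.addFileToVFS('arialuni.ttf', base64Font);
doc.addFont('arialuni.ttf', 'ArialUnicode', 'normal');
doc.setFont('ArialUnicode');
doc.setFontSize(10);
doc.text('I like to add Ä Ü Ö', 20, 20); // umlauts now render correctly
doc.save('umlauts.pdf');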

Related

TCPDF - superscript without HTML

I'm using TCPDF to create PDF documents and need to render a superscript character without using HTML in the MultiCell option. No HTML because I need to align the words vertically at the bottom of the cell, which doesn't work when the cell has HTML enabled.
Any ideas?
[Edit]
Following Jakuje's hint, I'm using this code to convert the Unicode characters:
$unicodeTable = array(
    '<sup>1</sup>' => 'U+00B9',
    '<sup>2</sup>' => 'U+00B2',
    '<sup>3</sup>' => 'U+00B3',
    '<sup>4</sup>' => 'U+2074',
    '<sup>5</sup>' => 'U+2075'
);

function replace_unicode_escape_sequence($match) {
    return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
}

function unicode_chr($chr) {
    $x = explode("+", $chr);
    $str = "\u" . end($x);
    return preg_replace_callback('/\\\\u([0-9a-f]{4})/i', 'replace_unicode_escape_sequence', $str);
}

foreach ($unicodeTable as $uKey => $uValue) {
    $text = str_replace($uKey, unicode_chr($uValue), $text);
}
This works in pure PHP/HTML, but when I use this code with TCPDF, all I get is the literal Unicode escape (e.g. \u00B9).
You can use a UTF-8 superscript if it is some "common" letter, such as
x² or xⁿ
I found the following works with TCPDF:
json_decode('"\u00B3"') // for PHP 5.x
"\u{00B2}" // for PHP 7.x
Based on this Stack Overflow answer: Unicode character in PHP string
Tested with TCPDF 6.2.13 and PHP 7.1.4.
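Applied to the original replacement table, this might look like the following sketch (an assumption, not tested against the asker's setup: it presumes PHP 7, a TCPDF instance $pdf, a Unicode-capable font such as dejavusans, and placeholder cell dimensions):

// Map the <sup> markup to real superscript characters via PHP 7 \u{...} escapes.
$unicodeTable = array(
    '<sup>1</sup>' => "\u{00B9}",
    '<sup>2</sup>' => "\u{00B2}",
    '<sup>3</sup>' => "\u{00B3}",
    '<sup>4</sup>' => "\u{2074}",
    '<sup>5</sup>' => "\u{2075}",
);
$text = str_replace(array_keys($unicodeTable), array_values($unicodeTable), $text);

// Plain-text MultiCell (no HTML), using a font that contains the glyphs.
$pdf->SetFont('dejavusans', '', 10);
$pdf->MultiCell(40, 10, $text, 0, 'L');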

Downloaded CSV files contain ??????? in place of some Arabic letters

Here is my code for initiating the download:
System.Web.HttpContext.Current.Response.ClearContent();
System.Web.HttpContext.Current.Response.Charset = "utf-8";
System.Web.HttpContext.Current.Response.HeaderEncoding = UnicodeEncoding.UTF8;
System.Web.HttpContext.Current.Response.ContentEncoding = UnicodeEncoding.UTF8;
//System.Web.HttpContext.Current.Response.ContentEncoding = System.Text.Encoding.UTF8;
System.Web.HttpContext.Current.Response.AddHeader("Content-Type", "text/csv");
System.Web.HttpContext.Current.Response.AddHeader("Cache-Control", "max-age=0");
System.Web.HttpContext.Current.Response.AddHeader("Accept-Ranges", "none");
HttpContext.Current.Response.AddHeader("Content-Disposition", "attachment" + "; filename=Data.csv");
System.Web.HttpContext.Current.Response.BinaryWrite(buffer);
System.Web.HttpContext.Current.ApplicationInstance.CompleteRequest();
System.Web.HttpContext.Current.Response.Flush();
System.Web.HttpContext.Current.Response.Close();
args.AbortPipeline();
The output CSV file contains something like this in place of some Arabic letters:
Data
99999;Arabic: ?????????????? ?????????????? al-abj;jkj;biluruya#mswork.ru;
I don't know why the UTF-8 encoding is not working here, or whether my download code is wrong. Could someone please explain the issue?
Thanks

ANTLR best way to include meta-data in lexing/parsing (custom objects, kind of annotation)

I plan to include text metadata (like bold, font-size, etc.) in the parsing process to achieve better recognition.
For instance, I have a given structure where a word on its own line (word\r\n) that is bold and sized 24px is the title of some article. In order to get better recognition results, I want to take the metadata into account as well as the characters. In terms of ANTLR I'm not sure how this could best be done. I'd like to do one of the following:
1. Wrap each character of the original text in a custom object with fields for the metadata and pass that to ANTLR.
2. Preprocess the text and insert annotations for the metadata at specific places, to be considered by the grammar.
I would really like to take option 1, but I'm not sure which parts of ANTLR I need to subclass. Do I have to start at the ANTLRInputStream object, in order to get a proper stream for a subclassed lexer, which produces custom tokens for a subclassed parser, and so on? Is there a more elegant way, especially for querying the tokens while parsing with actions in a {} block?
If anyone has hints and/or experience with this, that would be great!
EDIT:
Here is a more specific, simple example: I have a file which includes the encoding of the metadata, which I parse beforehand. The actual text, including newlines, looks like the following:
entryOne
Here is some content one.
entryTwo
Here is some content two.
Here the titles entryOne and entryTwo originally have a font size of 24px and the content a font size of 12px (exemplary values). Char by char, I create a new instance of a custom object encapsulating the character as a String together with its font size.
I initialize a respective object for each character, with a field for the font size; e.g. for the first letter of entryOne:
MyChar aTitelChar = new MyChar("e", 24);
For the content, like the second line Here is some content one., I create instances of MyChar like:
MyChar aContentChar = new MyChar("H", 12);
All characters of the text are wrapped in instances of the MyChar class below and added to a List<MyChar> in order to produce a new input for ANTLR.
Below is the Java class for the characters:
public class MyChar {
    private int fontSizePx;
    private String text;

    public MyChar(String text, int fontSizePx) {
        this.text = text;
        this.fontSizePx = fontSizePx;
    }

    public int getFontSizePx() {
        return fontSizePx;
    }

    public String getText() {
        return text;
    }
}
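For illustration, building that list for the first title line might look like this (a minimal sketch; the 24px value follows the example above, and java.util.List / java.util.ArrayList imports are assumed):

// Wrap every character of a line, tagging each with the line's font size.
List<MyChar> metadata = new ArrayList<MyChar>();
for (char c : "entryOne\r\n".toCharArray()) {
    metadata.add(new MyChar(String.valueOf(c), 24)); // title characters are 24px
}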
I want my grammar to match the above two entries (or more, formatted this way), each of which consists of a title and a content that is terminated by a full stop. The grammar could look like this:
rule    : entry+ NEWLINE
        ;

entry   : title content
        ;

title   : letters NEWLINE
        ;

content : (letters)+ '.' NEWLINE
        ;

letters : LETTERS
        ;

LETTERS : ('a'..'z' | 'A'..'Z')+
        ;

WS      : (' ' | '\t' | '\f')+ {$channel = HIDDEN;};

NEWLINE : '\r'? '\n';
Now, for instance, what I want to do is find out whether something really is the title of an entry by checking the font size of all letters making up the title token, before the title rule returns. If the input conforms to the grammar but is actually some kind of mistake (say, the metadata-encoded file starts with something that matches the title rule but is actually content), the author of the grammar could sort that out, knowing that the original font size for titles is 24, and check this: if one of the letter tokens does not have font size 24, throw an exception / don't return / do something appropriate.
The thing I'm pondering is where to plug in the List<MyChar> to provide this functionality (querying the metadata while parsing with ANTLR). I'm experimenting with ANTLR's classes, but as I'm new to ANTLR, I thought some of the experienced users could point me in the right direction: where would be a good insertion point for custom objects? Should I start by implementing CharStream and overriding some methods? Is there perhaps something ANTLR already provides that I haven't found yet?
Here's one way to accomplish what I think you're going for, using the parser to manage matching input to metadata. Note that I made whitespace significant because it's part of the content and can't be skipped. I also made periods part of content to simplify the example, rather than using them as a marker.
SysEx.g
grammar SysEx;

@header {
    import java.util.List;
}

@parser::members {
    private List<MyChar> metadata;
    private int curpos;

    private boolean isTitleInput(String input) {
        return isFontSizeInput(input, 24);
    }

    private boolean isContentInput(String input) {
        return isFontSizeInput(input, 12);
    }

    private boolean isFontSizeInput(String input, int fontSize) {
        List<MyChar> sublist = metadata.subList(curpos, curpos + input.length());
        System.out.println(String.format("Testing metadata for input=\%s, font-size=\%d", input, fontSize));
        int start = curpos;
        //move our metadata pointer forward.
        skipInput(input);
        for (int i = 0, count = input.length(); i < count; ++i) {
            MyChar chardata = sublist.get(i);
            char c = input.charAt(i);
            if (chardata.getText().charAt(0) != c) {
                //This character doesn't match the metadata (ERROR!)
                System.out.println(String.format("Content mismatch at metadata position \%d: metadata=(\%s,\%d); input=\%c", start + i, chardata.getText(), chardata.getFontSizePx(), c));
                return false;
            } else if (chardata.getFontSizePx() != fontSize) {
                //The font is wrong.
                System.out.println(String.format("Format mismatch at metadata position \%d: metadata=(\%s,\%d); input=\%c", start + i, chardata.getText(), chardata.getFontSizePx(), c));
                return false;
            }
        }
        //All characters check out.
        return true;
    }

    private void skipInput(String str) {
        curpos += str.length();
        System.out.println("\t\tMoving metadata pointer ahead by " + str.length() + " to " + curpos);
    }
}

rule[List<MyChar> metadata]
@init {
    this.metadata = metadata;
}
    : entry+ EOF
    ;

entry
    : title content
      {System.out.println("Finished reading entry.");}
    ;

title
    : line {isTitleInput($line.text)}? newline
      {System.out.println("Finished reading title " + $line.text);}
    ;

content
    : line {isContentInput($line.text)}? newline
      {System.out.println("Finished reading content " + $line.text);}
    ;

newline
    : (NEWLINE {skipInput($NEWLINE.text);})+
    ;

line returns [String text]
@init {
    StringBuilder builder = new StringBuilder();
}
@after {
    $text = builder.toString();
}
    : (ANY {builder.append($ANY.text);})+
    ;

NEWLINE : '\r'? '\n';
ANY     : .; //whitespace can't be skipped because it's content.
A title is a line that matches the title metadata (size 24 font) followed by one or more newline characters.
A content is a line that matches the content metadata (size 12 font) followed by one or more newline characters. As mentioned above, I removed the check for a period for simplification.
A line is a sequence of characters that does not include newline characters.
A validating semantic predicate (the {...}? after line) is used to validate that the line matches the metadata.
Here is the code I used to test the grammar (minus imports, for brevity):
SysExGrammar.java
public class SysExGrammar {
    public static void main(String[] args) throws Exception {
        //Create some metadata that matches our input.
        List<MyChar> matchingMetadata = new ArrayList<MyChar>();
        appendMetadata(matchingMetadata, "entryOne\r\n", 24);
        appendMetadata(matchingMetadata, "Here is some content one.\r\n", 12);
        appendMetadata(matchingMetadata, "entryTwo\r\n", 24);
        appendMetadata(matchingMetadata, "Here is some content two.\r\n", 12);
        parseInput(matchingMetadata);
        System.out.println("Finished example #1");

        //Create some metadata that doesn't match our input (negative test).
        List<MyChar> mismatchingMetadata = new ArrayList<MyChar>();
        appendMetadata(mismatchingMetadata, "entryOne\r\n", 24);
        appendMetadata(mismatchingMetadata, "Here is some content one.\r\n", 12);
        appendMetadata(mismatchingMetadata, "entryTwo\r\n", 12); //content font size!
        appendMetadata(mismatchingMetadata, "Here is some content two.\r\n", 12);
        parseInput(mismatchingMetadata);
        System.out.println("Finished example #2");
    }

    private static void parseInput(List<MyChar> metadata) throws Exception {
        //Test setup
        InputStream resource = SysExGrammar.class.getResourceAsStream("SysExTest.txt");
        CharStream input = new ANTLRInputStream(resource);
        resource.close();
        SysExLexer lexer = new SysExLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        SysExParser parser = new SysExParser(tokens);
        parser.rule(metadata);
        System.out.println("Parsing encountered " + parser.getNumberOfSyntaxErrors() + " syntax errors");
    }

    private static void appendMetadata(List<MyChar> metadata, String string, int fontSize) {
        for (int i = 0, count = string.length(); i < count; ++i) {
            metadata.add(new MyChar(string.charAt(i) + "", fontSize));
        }
    }
}
SysExTest.txt (note this uses Windows newlines, \r\n):
entryOne
Here is some content one.
entryTwo
Here is some content two.
Test output (trimmed; the second example has deliberately-mismatched metadata):
Parsing encountered 0 syntax errors
Finished example #1
Parsing encountered 2 syntax errors
Finished example #2
This solution requires that each MyChar corresponds to a character in the input (including newline characters, although you can remove that limitation if you like -- I would remove it if I didn't already have this answer written up ;) ).
As you can see, it's possible to tie the metadata to the parser and everything works as expected. I hope this helps.

What is correct OAuth percent encoding?

I am working on implementing an OAuth API and am discovering there are a few things I am having trouble validating; I would love it if anyone could provide clarification. Warning: I will probably ramble, so I will try to mark my questions in bold.
According to the OAuth 1.0 spec, https://www.rfc-editor.org/rfc/rfc5849, I am led to believe that the way OAuth params are percent-encoded for signatures is different than when on the wire.
Section 3.6 https://www.rfc-editor.org/rfc/rfc5849#section-3.6
"It is used only in the construction of the signature base string and the "Authorization" header field."
RFC3986
https://www.rfc-editor.org/rfc/rfc3986
This appears to be the percent-encoding scheme used in normal requests. However, I did not see it give any sort of 'this maps to that' table, so I am assuming that if a character is in the reserved list, its hexadecimal equivalent should be used.
Is the only difference that a ' ' (space) is %20 when encoded for the signature? The OAuth spec makes reference to this, but I honestly can't find where that is defined in the other specs. It would be awesome if someone could point me to where that is mentioned and how I may have misunderstood it.
Should other whitespace characters be %20? Where in the spec is that mentioned?
Is the conventional UrlEncode fine for form body and query params?
Finally, I have some example output that I am looking to validate. I tried to show the difference between the OAuth-signature-encoded character and the URL-encoded character. Once again, the only differences appear to be the handling of ' ', '*' and '~':
Char     OAuth    URL
*        %2A      *
~        ~        %7E
%        %25      %25
!        %21      %21
:        %3A      %3A
/        %2F      %2F
=        %3D      %3D
&        %26      %26
+        %2B      %2B
(space)  %20      +
,        %2C      %2C
@        %40      %40
\r\n     %0D%0A   %0D%0A
\n       %0A      %0A
\r       %0D      %0D
"        %22      %22
?        %3F      %3F
(        %28      %28
)        %29      %29
|        %7C      %7C
[        %5B      %5B
]        %5D      %5D
Although this is an old post, I would like to state my understanding all the same.
With regard to percent-encoding as specified in RFC 3986, section 2.1, the understanding is that all characters other than the unreserved characters are to be escaped.
This means that, other than
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
the rest of the characters are to be encoded.
A sample implementation in Java is provided here; look for the percentEncode method that accepts a String as an argument:
public static String percentEncode(String s)
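A minimal sketch of such a method (an assumption about what the linked code does, not a copy of it) could be:

public static String percentEncode(String s) {
    StringBuilder sb = new StringBuilder();
    // Work on the UTF-8 bytes so multi-byte characters are encoded correctly.
    for (byte b : s.getBytes(java.nio.charset.StandardCharsets.UTF_8)) {
        char c = (char) (b & 0xFF);
        boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            sb.append(c);
        } else {
            // Escape everything outside the RFC 3986 unreserved set.
            sb.append('%').append(String.format("%02X", b & 0xFF));
        }
    }
    return sb.toString();
}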
Additional code samples in other languages can be found here.
For JavaScript:
/**
 * encodeURIComponent(str) unescaped / reserved characters:
 *
 *   alphabetic, digit and -_.~!*'()
 *
 * OAuth unescaped / reserved characters:
 *
 *   alphabetic, digit and -_.~
 */

// Encode with !*'()
this.oAuthEncode = function (value) {
    value = encodeURIComponent(value);
    value = value.replace(/!/g, '%21');  // !
    value = value.replace(/\*/g, '%2A'); // *
    value = value.replace(/'/g, '%27');  // '
    value = value.replace(/\)/g, '%29'); // )
    value = value.replace(/\(/g, '%28'); // (
    return value;
};

// Decode with !*'()
this.oAuthDecode = function (value) {
    value = decodeURIComponent(value);
    value = value.replace(/%21/g, '!');  // !
    value = value.replace(/%2A/g, '*');  // *
    value = value.replace(/%27/g, '\''); // '
    value = value.replace(/%29/g, ')');  // )
    value = value.replace(/%28/g, '(');  // (
    return value;
};
Maybe this part of the Twitter developer docs might help you: https://developer.twitter.com/en/docs/basics/authentication/guides/percent-encoding-parameters.html

What standard produced hex-encoded characters with an extra "25" at the front?

I'm trying to integrate with ybp.com, a vendor of proprietary software for managing book ordering workflows in large libraries. It keeps feeding me URLs that contain characters encoded with an extra "25" in them. Like this book title:
VOLATILE KNOWING%253a PARENTS%252c TEACHERS%252c AND THE CENSORED STORY OF ACCOUNTABILITY IN AMERICA%2527S PUBLIC SCHOOLS.
The encoded characters in this sample are as follows:
%253a = %3A = a colon
%252c = %2C = a comma
%2527 = %27 = an apostrophe (non-curly)
I need to convert these encodings to a format my internal apps can recognize, and the extra 25 is throwing things off kilter. The final two digits of the hex-encoded characters appear to be identical to standard URL encodings, so a brute-force method would be to replace "%25" with "%". But I'm leery of doing that because it would be sure to haunt me later when an actual %25 shows up for some reason.
So, what standard is this? Is there an official algorithm for converting values like this to other encodings?
%25 is actually a % character. My guess is that the external website is accidentally URL-encoding its output twice.
If that's the case, it is safe to replace %25 with % (or just URL-decode twice).
The ASCII code 37 (25 in hexadecimal) is %, so the URL encoding of % is %25.
It looks like your data got URL-encoded twice: , -> %2C -> %252C
Substituting every %25 with % should not generate any problems, as an actual % in the original data would itself have been double-encoded to %2525 (and a literal %25 to %252525).
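In JavaScript, for example, decoding twice recovers the original text (a small sketch using part of the book title above):

var encoded = 'AMERICA%2527S PUBLIC SCHOOLS';
var once = decodeURIComponent(encoded);  // "AMERICA%27S PUBLIC SCHOOLS"
var twice = decodeURIComponent(once);    // "AMERICA'S PUBLIC SCHOOLS"
console.log(twice);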
Alternatively, scan the string yourself: create a counter that advances over the two characters following each '%', and when you hit the %25 ("modulus") case, go back, write a '%' at the previous output position, and proceed with decoding again. Something like this:
char *str, *newstr; // Fill up with some memory before proceeding below..
....
int k = 0, j = 0;
short modulus = 0;
char first = 0, second = 0;
short proceed = 0;

for (k = 0, j = 0; k < some_size; j++, k++) {
    if (str[k] == '%') {
        ++k; first = str[k];
        ++k; second = str[k];
        proceed = 1;
    } else if (modulus == 1) {
        modulus = 0;
        --j; first = str[k];
        ++k; second = str[k];
        newstr[j] = '%';
        proceed = 1;
    } else {
        proceed = 0; // Do not do decoding..
    }
    if (proceed == 1) {
        if (first == '2' && second == '5') {
            newstr[j] = '%';
            modulus = 1;
            ......