How to open Excel file written with incorrect character encoding in VBA - character-encoding

I read an Excel 2003 file with a text editor to see some markup language.
When I open the file in Excel it displays incorrect characters. On inspection of the file I see that the encoding is Windows 1252 or some such. If I manually replace this with UTF-8, my file opens fine. Ok, so far so good, I can correct the thing manually.
Now the trick is that this file is generated automatically, and I need to process it automatically (no human interaction) with limited tools on my desktop (no Perl or other scripting language).
Is there any simple way to open this XL file in VBA with the correct encoding (and ignore the encoding specified in the file)?
Note, Workbook.ReloadAs does not function for me, it bails out on error (and requires manual action as the file is already open).
Or is the only way to correct the file to go through some hoops? Either: text in, check line for encoding string, replace if required, write each line to new file...; or export to csv, then import from csv again with specific encoding, save as xls?
Any hints appreciated.
EDIT:
ADODB did not work for me (XL says user defined type, not defined).
I solved my problem with a workaround:
name2 = Replace(name, ".xls", ".txt")
Set wb = Workbooks.Open(name, True, True) ' open read-only
Set ws = wb.Worksheets(1)
ws.SaveAs FileName:=name2, FileFormat:=xlCSV
wb.Close False ' close workbook without saving changes
Set wb = Nothing ' free memory
Workbooks.OpenText FileName:=name2, _
Origin:=65001, _
DataType:=xlDelimited, _
Comma:=True

Well, I think you can do it from another workbook. Add a reference to Microsoft ActiveX Data Objects, then add this sub:
Sub Encode(ByVal sPath$, Optional SetChar$ = "UTF-8")
Dim stream As ADODB.stream
Set stream = New ADODB.stream
With stream
.Open
.LoadFromFile sPath ' Loads a File
.Charset = SetChar ' sets stream encoding (UTF-8)
.SaveToFile sPath, adSaveCreateOverWrite
.Close
End With
Set stream = Nothing
Workbooks.Open sPath
End Sub
Then call this sub with the path to the file that has the wrong encoding.
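The other route mentioned in the question (read the text, find the encoding string, replace it, write the file back) can be sketched as follows. This is a minimal illustration in Python, not something the asker's desktop necessarily has available; it assumes the file starts with an XML-style declaration that names the encoding, and the helper name `fix_declared_encoding` is hypothetical.

```python
import re

# Matches the encoding attribute of a leading XML declaration,
# optionally preceded by a UTF-8 byte order mark.
_DECLARATION = re.compile(
    rb'^((?:\xef\xbb\xbf)?\s*<\?xml[^>]*?encoding=)(["\'])[^"\']*\2',
    re.IGNORECASE,
)


def fix_declared_encoding(data: bytes, encoding: str = "UTF-8") -> bytes:
    """Rewrite only the declared encoding; the payload bytes stay untouched."""
    return _DECLARATION.sub(
        lambda m: m.group(1) + m.group(2) + encoding.encode("ascii") + m.group(2),
        data,
        count=1,
    )


# Sample: content is really UTF-8, but the declaration claims windows-1252.
sample = (
    '<?xml version="1.0" encoding="windows-1252"?>\n'
    "<Workbook><Data>Zürich</Data></Workbook>"
).encode("utf-8")

fixed = fix_declared_encoding(sample)
print(fixed.decode("utf-8").splitlines()[0])
```

Because only the declaration is rewritten, this mirrors the manual fix described in the question (replacing the encoding name by hand) rather than transcoding the content.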

Related

How to encode a STRING variable into a given code page

I've got a string variable containing a text that I need to encode and write to a file, in UTF-16LE code page.
Currently the following code generates a UTF-8 file and I don't see any option in the statement OPEN DATASET to generate the file in UTF-16LE.
REPORT zmyprogram.
DATA(filename) = `/tmp/myfile`.
OPEN DATASET filename IN TEXT MODE ENCODING DEFAULT FOR OUTPUT.
TRANSFER 'HELLO WORLD' TO filename.
CLOSE DATASET filename.
I guess one solution is to first encode the string in memory, then write the encoded bytes to the file.
Generally speaking, how to encode a string of characters into a given code page, in memory?
In the first part, I explain how to encode a string of characters into a given code page (all is done in memory), and in the second part, I explain specifically how to write files to the application server in a given code page.
General way (all in memory)
If a string of characters (type STRING) has to be encoded, the result has to be stored in a string of bytes, which corresponds to the built-in data type XSTRING.
There are several possibilities which depend on the ABAP version:
Since 7.53, use the class CL_ABAP_CONV_CODEPAGE:
DATA(xstring) = cl_abap_conv_codepage=>create_out( codepage = `UTF-16LE` )->convert( source = `ABCDE` ).
Since 7.02, use the class CL_ABAP_CODEPAGE:
DATA xstring TYPE xstring.
xstring = cl_abap_codepage=>convert_to( source = `ABCDE` codepage = `UTF-16LE` ).
Before 7.02, use the class CL_ABAP_CONV_OUT_CE (documentation provided with the class):
First, instantiate the conversion object, use a SAP code page number instead of the ISO name (list of values shown hereafter):
DATA: conv TYPE REF TO CL_ABAP_CONV_OUT_CE, xstring TYPE xstring.
conv = CL_ABAP_CONV_OUT_CE=>CREATE( encoding = '4103' ). "4103 = utf-16le
Then encode the string and retrieve the bytes encoded:
conv->RESET( ).
conv->WRITE( data = `ABCDE` ).
xstring = conv->GET_BUFFER( ).
Alternatively, instead of using RESET, WRITE and GET_BUFFER, you can use the method CONVERT, which was added in 6.40 and backported:
conv->CONVERT( EXPORTING data = `ABCDE` IMPORTING buffer = xstring ).
With the class CL_ABAP_CONV_OUT_CE, you need to use the number of the SAP Code Page, not the ISO name. Here are the most common SAP code pages and their equivalent ISO names:
1100: ISO-8859-1
1101: US-ASCII
1160: Windows-1252 ("ANSI")
1401: ISO-8859-2
4102: UTF-16BE
4103: UTF-16LE
4104: UTF-32BE
4105: UTF-32LE
4110: UTF-8
Etc. (the possible values are defined in the table TCP00A, in lines with column CPATTRKIND = 'H').
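For readers who want to check the resulting bytes outside an SAP system, the code page list above can be mirrored in a few lines of Python. This is only a sketch: the dictionary `SAP_TO_PYTHON_CODEC` and the function `encode_for_sap_code_page` are hypothetical names, and the mapping covers only the code pages listed here.

```python
# SAP code page number -> Python codec name (only the entries listed above).
SAP_TO_PYTHON_CODEC = {
    "1100": "iso8859-1",
    "1101": "ascii",
    "1160": "cp1252",
    "1401": "iso8859-2",
    "4102": "utf-16-be",
    "4103": "utf-16-le",
    "4104": "utf-32-be",
    "4105": "utf-32-le",
    "4110": "utf-8",
}


def encode_for_sap_code_page(text: str, sap_code_page: str) -> bytes:
    """Encode text in memory, like the ABAP conversion classes do."""
    return text.encode(SAP_TO_PYTHON_CODEC[sap_code_page])


encoded = encode_for_sap_code_page("ABCDE", "4103")
print(encoded.hex(" "))  # 41 00 42 00 43 00 44 00 45 00
```

The byte string plays the role of the XSTRING result: each character becomes two bytes, low byte first, and no byte order mark is added.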
 
Writing a file on the application server in a given code page
In ABAP, OPEN DATASET can specify the target code page directly. Most code pages are supported, including UTF-8, but not the other UTF encodings (code pages 41xx); those can only be written with the solution explained in 2.3 below (by first encoding in memory).
2.1) IN TEXT MODE ENCODING ...
Possible ENCODING values:
UTF-8: in this mode, it's possible to add the Byte Order Mark if needed, via the option WITH BYTE-ORDER MARK.
DEFAULT: will be UTF-8 in a SAP "Unicode" system (that you can check via the menu System > Status > Unicode System Yes/No), NON-UNICODE otherwise.
NON-UNICODE: will depend on the current ABAP linguistic environment; for language English, it's the character encoding iso-8859-1, for language Polish, it's the character encoding iso-8859-2, etc. (the equivalences are shown in table TCP0C.)
Example in ABAP version 7.52 to write to UTF-8 with the byte order mark:
REPORT zmyprogram.
DATA(filename) = `/tmp/dataset_utf_8`.
OPEN DATASET filename IN TEXT MODE ENCODING UTF-8 WITH BYTE-ORDER MARK FOR OUTPUT.
TRY.
TRANSFER `Witaj świecie` TO filename.
CATCH cx_sy_conversion_codepage INTO DATA(lx).
" Character not supported in language code page
ENDTRY.
CLOSE DATASET filename.
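To see what the byte order mark option changes in the output, here is a small Python sketch (an illustration outside ABAP, using Python's `utf-8-sig` codec as a stand-in for WITH BYTE-ORDER MARK):

```python
text = "Witaj świecie"

plain = text.encode("utf-8")         # UTF-8 without a byte order mark
with_bom = text.encode("utf-8-sig")  # same bytes, prefixed with the BOM

print(with_bom[:3].hex(" "))   # ef bb bf
print(with_bom[3:] == plain)   # True
```

The only difference is the three-byte prefix; the encoded text after it is identical.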
Example in ABAP version 7.52 to write to iso-8859-2 (Polish language here):
REPORT zmyprogram.
SET LOCALE LANGUAGE 'L'. " Polish
DATA(filename) = `/tmp/dataset_nonunicode_pl`.
OPEN DATASET filename IN TEXT MODE ENCODING NON-UNICODE FOR OUTPUT.
TRY.
TRANSFER `Witaj świecie` TO filename.
CATCH cx_sy_conversion_codepage INTO DATA(lx).
" Character not supported in language code page
ENDTRY.
CLOSE DATASET filename.
2.2) IN LEGACY TEXT MODE CODE PAGE ...
Use any code page number except code pages 41xx (i.e. UTF-8 and other UTF; see workaround in 2.3 below).
Example in ABAP version 7.52 to write to iso-8859-2 (code page 1401) :
REPORT zmyprogram.
DATA(filename) = `/tmp/dataset_iso_8859_2`.
OPEN DATASET filename IN LEGACY TEXT MODE CODE PAGE '1401' FOR OUTPUT. " iso-8859-2
TRY.
TRANSFER `Witaj świecie` TO filename.
CATCH cx_sy_conversion_codepage INTO DATA(lx).
" Character not supported in language code page
ENDTRY.
CLOSE DATASET filename.
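The TRY ... CATCH blocks in these examples guard against characters that the target code page cannot represent. The same situation can be reproduced in Python as a rough analogy (Python raises `UnicodeEncodeError` where ABAP raises `cx_sy_conversion_codepage`):

```python
text = "Witaj świecie"

# ISO-8859-2 (SAP code page 1401) contains the Polish letter.
latin2 = text.encode("iso8859-2")
print(latin2)  # b'Witaj \xb6wiecie'

# ISO-8859-1 (SAP code page 1100) does not, so the conversion fails.
failed_char = None
try:
    text.encode("iso8859-1")
except UnicodeEncodeError as exc:
    failed_char = exc.object[exc.start:exc.end]
    print("not representable:", failed_char)
```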
2.3) UTF = general way + IN BINARY MODE
Example in ABAP version 7.52:
REPORT zmyprogram.
TRY.
DATA(xstring) = cl_abap_codepage=>convert_to( source = `Witaj świecie` codepage = `UTF-16LE` ).
CATCH cx_sy_conversion_codepage INTO DATA(lx).
" Character not supported in language code page
BREAK-POINT.
ENDTRY.
DATA(filename) = `/tmp/dataset_utf_16le`.
OPEN DATASET filename IN BINARY MODE FOR OUTPUT.
TRANSFER xstring TO filename.
CLOSE DATASET filename.
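The two steps of 2.3 (encode in memory, then write the bytes in binary mode) look like this in a Python sketch. The helper `write_encoded` is a hypothetical name, and a temporary directory stands in for the application server path:

```python
import os
import tempfile


def write_encoded(path: str, text: str, codec: str) -> int:
    """Encode text in memory, then write the raw bytes without text-mode conversion."""
    data = text.encode(codec)
    with open(path, "wb") as handle:
        handle.write(data)
    return len(data)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset_utf_16le")
    written = write_encoded(path, "Witaj świecie", "utf-16-le")
    with open(path, "rb") as handle:
        raw = handle.read()

print(written)                  # 26
print(raw.decode("utf-16-le"))  # Witaj świecie
```

Writing in binary mode matters because a text-mode writer would apply its own encoding on top of the already encoded bytes.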

PHPEXCEL weird characters on form inputs

I need some help with the PHPExcel library. Everything works: I'm successfully exporting my SQL query to an Excel5 file, which I need to give to a transport company so it can automatically collect information about packages. Unfortunately the generated Excel file has an extra ASCII character between each letter of the cell text, and when the file is imported these characters have to be deleted manually.
If I open the file in Excel, everything is fine and I see: COMPANY NAME. If I open it with Notepad++, I see the cell values this way: C(NUL)O(NUL)M(NUL)P(NUL)A(NUL)N(NUL)Y N(NUL)A(NUL)M(NUL)E
If I open the file with Excel again and save it, then reopen it with Notepad++, I see COMPANY NAME.
So I do not understand why, every time I create an Excel file using PHPExcel, every letter of every word is followed by (NUL).
So how do I prevent the generated Excel file from including (NUL) between every letter?
The Excel files generated by the PHPExcel samples also contain (NUL), and if you open and save them, the (NUL) is gone.
Any help would be appreciated, thanks.
what is the (nul) ??? 0x00??? char(0)???
ok, here is the example:
error_reporting(E_ALL);
ini_set('display_errors', TRUE);
ini_set('display_startup_errors', TRUE);
date_default_timezone_set('Europe/London');
if (PHP_SAPI == 'cli')
die('Disponibile solo su browser');
require_once dirname(__FILE__) . '/Classes/PHPExcel.php';
$objPHPExcel = new PHPExcel();
$objPHPExcel->getProperties()->setCreator("Solidus")
->setLastModifiedBy("Solidus")
->setTitle("Import web")
->setSubject("Import File")
->setDescription("n.a")
->setKeywords("n.a")
->setCategory("n.a");
$objPHPExcel->setActiveSheetIndex(0)
->setCellValueExplicit("A1", "COMPANY")
->setCellValue('A2', 'SAMSUNG');
$objPHPExcel->getActiveSheet()->setTitle('DDT');
$objPHPExcel->setActiveSheetIndex(0);
header('Content-Type: application/vnd.ms-excel');
header('Content-Disposition: attachment;filename="TEST.xls"');
header('Cache-Control: max-age=0');
header('Cache-Control: max-age=1');
header('Cache-Control: private',false);
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel5');
ob_end_clean();
$objWriter->save('php://output');
As you can see from this little example, this script creates an Excel5 file with 2 cells, A1 = COMPANY, A2 = SAMSUNG.
When I send this file to the transport company, they import the file into their system, but as you can see from the picture, there is a weird character between each letter.
I noticed that every time I open the generated Excel5 file with Notepad++ I get:
S(nul)A(nul)M(nul)S(nul)U(nul)N(nul)G
If I save the file with Excel and then open it again with Notepad++ I get:
SAMSUNG
and this file is OK for the transport company.
So my question is: how do I keep the generated file from containing this (nul) character between each letter?
some help?
I found the solution myself; I'll explain it in case anyone else has this problem:
There is no way to change how PHPExcel encodes the Excel file,
so I figured the problem was in reading the file. I ran some simulations and reproduced the problem: every time I read the file and put the result into inputs, I get weird characters:
C�O�M�P�A�N�Y�
If I set the output encoding as follows:
$excel->setOutputEncoding('UTF-8');
the file loads fine, so the problem was not creating the Excel file, but reading the Excel file.
If I print the variable with ECHO I get: "COMPANY",
if I put the variable on an input as its value I get: "C�O�M�P�A�N�Y�"
Setting the output encoding solves the problem, but I would like to know why the result differs when I put the variable on an input as its value. Thanks.
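The (NUL) pattern is what UTF-16LE text looks like when it is viewed one byte at a time: each ASCII letter is stored as the letter byte followed by a zero byte. The sketch below illustrates this in Python. It assumes, as the symptoms suggest, that the strings in the file are stored as UTF-16LE and that the reader returned them unconverted until the output encoding was set.

```python
raw = "COMPANY".encode("utf-16-le")
print(raw)  # b'C\x00O\x00M\x00P\x00A\x00N\x00Y\x00'

# Treating every byte as one character reproduces what Notepad++ shows.
as_single_byte = raw.decode("latin-1")
print(as_single_byte.replace("\x00", "(NUL)"))

# Decoding with the right codec recovers the text.
print(raw.decode("utf-16-le"))  # COMPANY
```

A plausible explanation for the echo/input difference is browser behavior: HTML parsers typically drop NUL characters in page text but replace them with U+FFFD (�) inside attribute values, so the same string looks clean when echoed and garbled in an input's value.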

Japanese language translation issue in CSV File - ASP.NET MVC

I am facing an issue while exporting Japanese text in CSV format: junk characters are exported instead of the original Japanese text. I am using the .NET MVC FileStreamResult to export records to a CSV file with UTF-8 encoding (I have also tried some other encodings, with no luck). I debugged my code: I can convert the string to a memory stream and back, and I can see the original Japanese text being exported. Once the export completes, I open the CSV file but only see junk characters instead of the expected text. If I open the CSV file in Notepad, I can see the expected Japanese text (opening the CSV file in Notepad is NOT my requirement; I mention Notepad only to verify that the Japanese text is there). It would be really helpful if someone could help me find the root cause of this issue and provide a resolution.
Ex. 東京都品川区大崎 gets written as æ±äº¬éƒ½å“å·åŒºå¤§å´Ž
Note: the Japanese text displays properly if I open the sample .csv file in LibreOffice Calc or gedit (the Linux default). The issue only occurs when opening the CSV file in MS Office.
Please find the below attached code -
Controller/Action to execute while clicking on export to Csv button
================================================================================
[HttpPost]
[ValidateInput(false)]
public FileStreamResult SaveCustomerInfo()
{
return ExportToCsv();
}
================================================================================
private static FileStreamResult ExportToCsv()
{
var exportedData = new StringBuilder();
exportedData
.AppendLine("実行日,口座番号,支店番号,アカウント名,支店名,の/受益秩序,ステートメント日,入力日,お問い合わせ番号, ,Date Range")
.Append(
"CS0001,Demo FName,Demo LName,8/20/2015,\"Demo User Address\",City,Country,08830,0123456789,15813,Absolute from 8/20/2015 to 8/22/2015");
var stream = PrintingHelper.StringToMemoryStream(Encoding.UTF8, exportedData.ToString());
var fileStreamResult = new FileStreamResult(stream, "text/csv")
{
FileDownloadName =
new StringBuilder("TestExportedFileInCsv")
.Append(".csv").ToString()
};
return fileStreamResult;
}
It sounds as though you haven't installed the language pack for MS Office on the machine where you are trying to open the CSV.
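Independently of the language pack, the junk text quoted in the question is exactly what the UTF-8 bytes of the Japanese string look like when they are decoded as Windows-1252. The Python sketch below reproduces it. It also shows a mitigation that is often used for Excel, which is to prefix the file with a UTF-8 byte order mark; whether this resolves the asker's case is an assumption, and the behavior of the poster's `PrintingHelper.StringToMemoryStream` helper is unknown.

```python
original = "東京都品川区大崎"
utf8_bytes = original.encode("utf-8")

# Decode the UTF-8 bytes with the wrong codec; bytes that are undefined
# in Windows-1252 are dropped, which matches the example in the question.
mojibake = utf8_bytes.decode("cp1252", errors="ignore")
print(mojibake)

# With a byte order mark in front, a BOM-aware reader can detect UTF-8.
with_bom = original.encode("utf-8-sig")
print(with_bom[:3].hex(" "))          # ef bb bf
print(with_bom.decode("utf-8-sig"))   # 東京都品川区大崎
```

In the C# code, the equivalent idea would be to write the bytes returned by `Encoding.UTF8.GetPreamble()` to the stream before the CSV text.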

TCPDF generated PDF to automatically display in Acrobat Reader

I am able to use TCPDF and generate a PDF in the browser using JQuery/JavaScript:
window.open("", "pdfWindow", "scrollbars=yes, resizable=yes, top=500, left=500, width=400, height=400");
$("#" + formID).attr('action','tcpdf/example/genReport.pdf').attr('target','pdfWindow');
In genReport.pdf, I am using $pdf->Output('genReport.pdf', 'I');
When genReport.pdf is generated, it appears in a new tab with the standard browser settings. I wanted to know if there is a way to have the generated PDF automatically display in Acrobat Reader?
Any help will be greatly appreciated.
According to the documentation of the Output() function, the second parameter can be one of these:
I: send the file inline to the browser (default). The plug-in is used if available. The name given by name is used when one selects the "Save as" option on the link generating the PDF.
D: send to the browser and force a file download with the name given by name.
F: save to a local server file with the name given by name.
S: return the document as a string (name is ignored).
FI: equivalent to F + I option
FD: equivalent to F + D option
E: return the document as base64 mime multi-part email attachment (RFC 2045)
So I'd suggest using $pdf->Output('genReport.pdf', 'D'); this will open the download dialog and the user can choose to either open or download the file.
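At the HTTP level, the difference between the I and D destinations comes down to the Content-Disposition response header. The Python sketch below only illustrates that header; the function `content_disposition` is a hypothetical helper, not part of TCPDF. Which application finally opens the file (for example Acrobat Reader) is decided by the user's browser and operating system settings, not by the server.

```python
def content_disposition(dest: str, name: str) -> str:
    """Build the header value for the inline ('I') or download ('D') destination."""
    kinds = {"I": "inline", "D": "attachment"}
    return f'{kinds[dest]}; filename="{name}"'


print(content_disposition("I", "genReport.pdf"))  # inline; filename="genReport.pdf"
print(content_disposition("D", "genReport.pdf"))  # attachment; filename="genReport.pdf"
```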

saving data with TextEdit

I want to use TextEdit to save data. What I have so far:
tell application "TextEdit"
open /Users/UserName/Desktop/save.rtf
end tell
This gives me
"Expected “given”, “in”, “of”, expression, “with”, “without”, other parameter name, etc. but found unknown token."
and highlights the . in .rtf. I tried removing the .rtf,
but when I compile it, it turns into
(open) / Users / username / desktop / (save)
This code gives "The variable Users is not defined."
Also, if possible, can I have TextEdit run in the background without opening a window?
Put quotes around the path and use POSIX file to get a file object for the path:
tell application "TextEdit"
open POSIX file "/Users/UserName/Desktop/save.rtf"
end tell
You can modify the text of a document by changing the text property:
tell application "TextEdit"
set text of document 1 to text of document 1 & "aa"
end tell
It removes all styles in rich text documents. It also inserts the text as 12-point Helvetica in plain text documents, regardless of the default font.
Creating a new rtf file:
tell application "TextEdit"
make new document at beginning with properties {text:"aa"}
close document 1 saving in POSIX file "/tmp/a.rtf"
end tell
Or create the file from a shell with textutil:
printf %s\\n aa | textutil -inputencoding UTF-8 -convert rtf -stdin -output a.rtf
