Will PHPSpreadsheet convert xls to xlsx and preserve formatting? - phpspreadsheet

I am on the fact-finding part of our project specs and have not found a definitive answer re whether PHPSpreadsheet can convert an .xls to .xlsx and preserve formatting, such as table borders.
From this question, I see that there are separate imports for read/write and for file format type. This example demonstrates use of the different modules for read/write file formats:
<?php
require 'vendor\autoload.php';
use \PhpOffice\PhpSpreadsheet\Reader\Xls;
use \PhpOffice\PhpSpreadsheet\Writer\Xlsx;
$xls_file = "Example.xls";
$reader = new Xls();
$spreadsheet = $reader->load($xls_file);
$loadedSheetNames = $spreadsheet->getSheetNames();
$writer = new Xlsx($spreadsheet);
foreach($loadedSheetNames as $sheetIndex => $loadedSheetName) {
$writer->setSheetIndex($sheetIndex);
$writer->save($loadedSheetName.'.xlsx');
}
However, I have not seen if the resultant export preserves formatting, specifically border lines. At the moment, I am unable to write this myself.

I haven't tested PhpSpreadsheet (due to Composer requirement), but FWIW PhpExcel (the predecessor to PhpSpreadsheet) does the job quite handily and yes, it does preserve most formatting.
Here is a sample conversion script using PhpExcel:
<?php
$xls_to_convert = 'test.xls';
error_reporting(E_ALL);
ini_set('display_errors', TRUE);
ini_set('display_startup_errors', TRUE);
define('EOL',(PHP_SAPI == 'cli') ? PHP_EOL : '<br />');
date_default_timezone_set('America/Vancouver');
require_once dirname(__FILE__) . '/PHPExcel/Classes/PHPExcel/IOFactory.php';
$objPHPExcel = PHPExcel_IOFactory::load(dirname(__FILE__) . '/' . $xls_to_convert);
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel2007');
$objWriter->save(str_replace('.xls', '.xlsx', $xls_to_convert));
A cursory comparison of my test xls spreadsheet against the xlsx result shows:
font sizes, font colors, bold/italic, borders, centering, cell background/shading/colors, drop-downs with values, are preserved
Buttons are lost
Shapes (e.g. lines, arrows, textboxes) are lost
Charts are lost

Related

TCPDF Embedded Base64 Encoded Images in HTML String

I have read several posts on this subject but didn't want to piggy-back on any of them with additional questions.
Specifically this post: TCPDF and insert an image base64 encoded
I am generating a PDF from within a custom theme in Wordpress. I'm using TCPDF 6.2.3 (latest stable release, I believe).
I am building this PDF from the same HTML I am using to display on the page. If I embed the full base64 encoded string, it works correctly in the browser, but the image is missing from the PDF.
If I use the "#" method described in the linked post, I get a broken image in the browser (expectedly) but still nothing in the PDF.
All the rest of my HTML markup is rendering in the PDF, images are just not showing.
Is there some other setting or option I need to set in order to get the images to appear in the PDF, and/or can you spot anything I'm doing wrong here? No errors, the images are just not visible in the PDF.
This is how I set the image up:
$imageLocation = $img_root.$imgsrc;
$ext = end(explode(".", $imageLocation));
$image = base64_encode(file_get_contents($imageLocation));
//$response .= "<img src='data:image/$ext;base64,$image'>"; //works in browser but not in PDF
$response .= "<img src='#$image' class='socf_image'>"; //does not work in browser or PDF
And here is the method to create the PDF:
function createPDF($response)
{
// Include the main TCPDF library (search for installation path).
require_once('tcpdf_6_3_2/tcpdf/tcpdf.php');
// create new PDF document
$pdf = new TCPDF(PDF_PAGE_ORIENTATION, PDF_UNIT, PDF_PAGE_FORMAT, true, 'UTF-8', false);
// set document information
$pdf->SetCreator(PDF_CREATOR);
$pdf->SetAuthor('test');
$pdf->SetTitle('test');
$pdf->SetSubject('test');
$pdf->SetKeywords('test');
// set default header data
$pdf->SetHeaderData(PDF_HEADER_LOGO, PDF_HEADER_LOGO_WIDTH, PDF_HEADER_TITLE.' 001', PDF_HEADER_STRING, array(0,64,255), array(0,64,128));
$pdf->setFooterData(array(0,64,0), array(0,64,128));
// set header and footer fonts
$pdf->setHeaderFont(Array(PDF_FONT_NAME_MAIN, '', PDF_FONT_SIZE_MAIN));
$pdf->setFooterFont(Array(PDF_FONT_NAME_DATA, '', PDF_FONT_SIZE_DATA));
// set default monospaced font
$pdf->SetDefaultMonospacedFont(PDF_FONT_MONOSPACED);
// set margins
$pdf->SetMargins(PDF_MARGIN_LEFT, PDF_MARGIN_TOP, PDF_MARGIN_RIGHT);
$pdf->SetHeaderMargin(PDF_MARGIN_HEADER);
$pdf->SetFooterMargin(PDF_MARGIN_FOOTER);
// set auto page breaks
$pdf->SetAutoPageBreak(TRUE, PDF_MARGIN_BOTTOM);
// set image scale factor
$pdf->setImageScale(PDF_IMAGE_SCALE_RATIO);
// set default font subsetting mode
$pdf->setFontSubsetting(true);
// Set font
$pdf->SetFont('helvetica', '', 14, '', true);
// Add a page
$pdf->AddPage();
$html = $response;
$pdf->writeHTML($response, true, false, true, false, '');
return $pdf;
}
Well, fortunately, I was able to figure it out on my own. Perhaps this isn't the best forum for seeking help with this library? If anyone can suggest a better place to get help, I'd appreciate the direction.
Ultimately, the issue was two-fold:
The "#" notation is required for the PDf while the approach is what works for displaying the HTML in browser. So a string replace before creating the PDF solves that.
This is the tricky part. The HTML needs to use double-quotes around the properties, not single quotes. My code was using double quotes for the PHP strings, so the HTML properties were surrounded with single quotes and that was the issue. Swapping the two quote types was the last piece of the puzzle to get the images to appear in the PDF.
Hopefully this will help someone else who is pulling their hair out trying to blindly find their way through this library like me.

PHPEXCEL weird characters on form inputs

I need some help with PHPEXCEL library, everything works great, I'm successfully extracting my SQL query to excel5 file, I need to give this file to transport company in order to auto collect informations about packages, unfotunately the generated excel file has some ascii characters between each letter of the cell text, and when the excel file is imported you need to manually delete these charaters.
If I open the excel file, everything is fine I see: COMPANY NAME, If I open the excel file with notepad++, I see the cell values this way: C(NUL)O(NUL)M(NUL)P(NUL)A(NUL)N(NUL)Y N(NUL)A(NUL)M(NUL)E
If I open again the file with excel and save, then reopen with notepad++ I see COMPANY NAME.
So I do not understan why every time I create an excel file using PHPEXCEL my every letter of all words are filled with (nul) every letter.
So how do I prevent the generated excel file to include (nul) between every word????
Also if you open the original excel file generated from PHPExcel samples are also filled with (nul) and if you open and save it, the (nul) is gone.
Any help would be appreciated, thanks.
what is the (nul) ??? 0x00??? char(0)???
ok, here is the example:
error_reporting(E_ALL);
ini_set('display_errors', TRUE);
ini_set('display_startup_errors', TRUE);
date_default_timezone_set('Europe/London');
if (PHP_SAPI == 'cli')
die('Disponibile solo su browser');
require_once dirname(__FILE__) . '/Classes/PHPExcel.php';
$objPHPExcel = new PHPExcel();
$objPHPExcel->getProperties()->setCreator("Solidus")
->setLastModifiedBy("Solidus")
->setTitle("Import web")
->setSubject("Import File")
->setDescription("n.a")
->setKeywords("n.a")
->setCategory("n.a");
$objPHPExcel->setActiveSheetIndex(0)
->setCellValueExplicit("A1", "COMPANY")
->setCellValue('A2', 'SAMSUNG');
$objPHPExcel->getActiveSheet()->setTitle('DDT');
$objPHPExcel->setActiveSheetIndex(0);
header('Content-Type: application/vnd.ms-excel');
header('Content-Disposition: attachment;filename="TEST.xls"');
header('Cache-Control: max-age=0');
header('Cache-Control: max-age=1');
header('Cache-Control: private',false);
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel5');
ob_end_clean();
$objWriter->save('php://output');
As you can see from this little example, this scripts creates a file excel5 with 2 cells, A1 = COMPANY, A2 = SAMSUNG
when I send this file to the transport company, they import the file into their system, but as you can see from the picture, there is an weird character between each letter.
so I noticed every time I open the generated Excel5 with notepad++ file I get:
S(nul)A(nul)M(nul)S(nul)U(nul)N(nul)G
If I save the save with excel and then open it again with notepad++ I get:
SAMSUNG
and this file is ok for the transport company
so my question is, how should I avoid the file generated to contain thi '(nul) charachter between each letter????
some help?
weird characters
SAMSUNG
I found the soluion by myself, I explain just in case anyone has also this problem:
there is not way to change the way the excelfile is encoded by PHPEXCEL
so I figured out the problem was reading the file, I did some simulations and reproduce the problem, every time a read the file and put the result into inputs a get weird characters:
C�O�M�P�A�N�Y�
If I set the output enconding enconding as follows:
$excel->setOutputEncoding('UTF-8');
the file loads fine, so the problem was not creating the excel file, but reading the excel file.
If I print the variable with ECHO I get: "COMPANY",
if I put the variable on input as value I get: "C�O�M�P�A�N�Y�"
setting the output solves the problem, but I would like to know why the difference when I put the variable on input as value, thanks

How to display Chinese characters with Three.js

I am using the OBJLoader to load a large 3D model (described in a .obj file) and I want to display the models name on its surfaces. Though it seems that Three.js can only display English characters. My question is how can I display Chinese characters in Three.js?
there are a couple of ways to display text (https://github.com/mrdoob/three.js/wiki/Text-in-Three.js), but since exporting a chinese font might be more difficult, it might be easier to draw the chinese characters to a canvas texture and use the texture as a material in the scene.
It seem that you can use Facetype.js to get a Chinese font library, for example"YaHei_Regular.typeface.json" , then you can show Chinese characters.
var fontLoader = new THREE.FontLoader();
fontLoader.load("YaHei_Regular.typeface.json", (font)=> {
this.font = font;
});
Another option is using msdf-bmfont-xml. This example uses Microsoft YaHei.
charset.txt —
你好,世界
You may need to install dependencies first. Then:
npm install msdf-bmfont-xml -g
msdf-bmfont -f json yahei.ttf -i charset.txt --pot --square
and finally, use three-bmfont-text to render the text.
Three.js doesn't display Chinese regularly because it doesn't support the charset. You have to load it dynamically.
It seems there're two methods to load: 1, new THREE.TTFLoader().load('*.ttf') , it loads a ttf file that support Chinese charset. But I failed.
2, new THREE.FontLoader().load('*.json') it loads a json file transformed by
the ttf on http://gero3.github.io/facetype.js/ website.
But firstly you have to find a complete ttf file. I tried 方正兰亭超细黑简体 and 方正赵佶瘦金书 which both work, you can google and download ttf file. I found some ttf can't be identified by three.js completely. You perhaps see some Chinese char display normally but others still display '?'.
The final code snippets as following:
const three_font = new THREE.FontLoader();
three_font.load('*.json', function (font_font) {
font=font_font
})
// finally add text with font
const geometry = new THREE.TextGeometry(
{
font: font,
size, height: h, curveSegments: 4, bevelThickness: 2, bevelSize: 2, bevelEnabled: true
});
geometry.computeBoundingSphere();
geometry.computeVertexNormals();
const mesh = new THREE.Mesh(geometry, pool.textMaterial);
mesh.position.set(x * deviation, y, z * deviation);
mesh.rotation.set(rx, ry, rz);
scene.add(mesh);

tika returning incorrect line of text for pdf with lots of tables

I am using tika to extract text from a pdf file that has lot of tables.
java -jar tika-app-0.9.jar -t https://s3.amazonaws.com/centraldoc/alg1.pdf
It is returning some invalid text and sometimes it is trimming white space between 2 words; for example it returns
"qu inakli fmyathematical ideas to the real world" instead of "Link mathematical ideas to the real world".
Is there a way to minimize this kind of error? or is there another library that I can use? Does it make sense to use OCR to process these kind of pdf.
Try to control order when using PDFBox parser: PDFTextStripper has a flag that controls the order of lines in the document. By default (in PDFBox) it's set to false for performance reasons (no order preserved), but Tika changed its behavior between releases switching this flag on and off.
More details exactly on this problem in my blog Extracting text from PDF files with Apache Tika 0.9 (and PDFBox under the hood).
To get text from PDF to display in the right order, I had to set the SortByPosition flag to true... (tika-app-1.19.jar)
BodyContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
ParseContext context = new ParseContext();
PDFParser pdfParser = new PDFParser();
PDFParserConfig config = pdfParser.getPDFParserConfig();
config.setSortByPosition(true); // needed for text in correct order
pdfParser.setPDFParserConfig(config);
pdfParser.parse(is, handler, metadata, context);

How to write a simple .txt content processor in XNA?

I don't really understand how Content importer/processor works in XNA.
I need to read a text file (Content/levels/level1.txt) of the form:
x x
x x
x x
where x's are just integers, into an int[,] array.
Any tips on writting a SIMPLE .txt importer??? By searching google/msdn I only found .x/.fbx file importer examples. And they seem too complicated.
Do you actually need to process the text file? If not, then you can probably skip most of the content pipeline.
Something like:
string filename = "Content/TextFiles/sometext.txt";
string path = Path.Combine(StorageContainer.TitleLocation, filename);
string lineOfText;
StreamReader sr = new StreamReader(path);
while ((lineOfText = sr.ReadLine()) != null)
{
// do something
}
Also, be sure to set the "Build Action" to "None" and the "Copy to Output Directory" to "Copy if newer" on the text files you've added. This tells the content pipeline not to compile the text file but rather copy it to the output directory for use as is.
I got this (more or less) from the RacingGame sample provided by Microsoft. It foregoes much of the content pipeline and simply loads and processes text files (XML) for much of its level data.
XNA 4.0 uses
System.IO.Stream stream = TitleContainer.OpenStream("tilename.txt");
See http://msdn.microsoft.com/en-us/library/bb199094.aspx and also http://blogs.msdn.com/b/shawnhar/archive/2010/12/09/reading-files-in-xna-game-studio-4-0.aspx
There doesn't seem to be a lot of info out there, but this blog post does indicate how you can load .txt files through code using XNA.
Hopefully this can help you get the file into memory, from there it should be straightforward to parse it in any way you like.
XNA 3.0 - Reading Text Files on the Xbox
http://www.ziggyware.com/readarticle.php?article_id=69 is probably a good place to start. It covers creating a basic content processor.

Resources