SPSS converting Excel (.xlsx) files into .sav files with a very large size - how to reduce it?

I am really stuck with an issue in SPSS Version 26. I have an Excel file with one event per row, approximately 1,700,000 cases and 13 variables. When I import that data and save it as a .sav file, the generated file is about 11 GB, which makes running any command on it very, very, very slow.
I have already worked with bigger databases (more cases and more variables, with more string variables than this one) and I have never had this issue before.
Is there any way to identify what is causing the file to be so big, and to reduce it somehow?
I have tried saving the file as SPSS Statistics Compressed, but it didn't help much.
Thanks in advance.
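A common cause worth checking here (my suggestion, not from the original thread): Excel imports often define string variables far wider than the data needs, and a .sav file reserves the full declared width for every case. A sketch in SPSS syntax of how you might diagnose and fix that; the output filename is a placeholder:
DISPLAY DICTIONARY.
ALTER TYPE ALL (A = AMIN).
SAVE OUTFILE='C:\data\events.zsav' /ZCOMPRESSED.
DISPLAY DICTIONARY lists each variable's declared width, ALTER TYPE with AMIN shrinks every string variable to the minimum width its values actually need, and ZCOMPRESSED (available since SPSS 21) writes a .zsav file that is usually much smaller than the standard compressed format.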

Related

Ghostscript PDF/A conversion problem on Ubuntu 18.10 and Docker

I am using Ghostscript to convert PDF 1.3 to PDF/A-1b using this command:
gs -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -sColorConversionStrategy=sRGB -sDEVICE=pdfwrite -sOutputFile=output.pdf PDFA_def.ps input.pdf
The PDFA_def.ps is customized to use the sRGB ICC profile. Apart from that change, it is the standard def file that ships with GS 9.26.
Now comes the tricky part:
1- Running this conversion locally on Ubuntu 18.10 with GS 9.26 works fine and I get a valid PDF/A.
2- Running the same command in a Docker container (Ubuntu 18.10, GS 9.26) creates a PDF/A as well, which is also considered valid.
However, in the first scenario I can process the file using mustang (https://github.com/ZUGFeRD/mustangproject) to create a valid electronic invoice. In the second scenario (Docker container) this fails, since mustang does not consider the file to be a valid PDF.
Checking both PDF files, I would have expected them to be identical, since I am running the same conversion on the same input. However, they are not: the PDF created in the Docker container is 10 bytes smaller and shows some different meta information in the file itself.
I suspect there must be some "hidden dependencies" that make GS act differently on my host system compared to the Docker container, but that feels entirely wrong, and I am running out of means to debug further.
Does anyone know whether GS has some more dependencies that might cause the same command to produce different results?
The answer is 'maybe'. It depends, for starters, on how Ghostscript was built.
I assume you are using a package, and not building from source yourself. In that case there are a number of dependencies, including FreeType, LibJPEG, JBIG2dec, LibTIFF, JPEG-XR, LCMS2, LibPNG, OpenJPEG, Expat, zlib, and potentially IJS, CUPS and X Windows, depending on which devices were built in.
Any or all of these could be system shared libraries instead of being built from the specific versions shipped by Artifex. They could also be different versions on the two systems.
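One quick way to check the shared-library angle (my addition, not part of the original answer): list what the gs binary is dynamically linked against in both environments and compare, e.g.
ldd "$(which gs)"
Differing versions of, say, lcms2 or zlib between host and container would show up in that output. A statically linked build shows few or no entries, in which case this test is inconclusive.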
That said, I think it's 'unlikely' that this is your problem. However, without seeing the PDF files I can't tell you why there is a difference. Differences in the metadata are to be expected, since that includes a date/time stamp.
I'd really need to see examples of the original and the two output PDF files to be able to comment further.
[Edit]
Looking at the files, they have been produced compressed (unsurprisingly), which can obviously lead to differences in size if there are small differences in the input streams. So the first task was to decompress the files.
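To reproduce that step yourself, one option (my tooling choice; the original answer doesn't say what was used) is qpdf, which can rewrite a PDF with uncompressed streams and object streams disabled so the two files can be diffed as text:
qpdf --qdf --object-streams=disable input.pdf decompressed.pdf
Run it on both PDFs and diff the results; tiny differences such as timestamps then become easy to spot.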
That done, I see there are, essentially, no differences between them. One of the operating systems is using a time zone 7 hours behind UTC, the other is on UTC, so where one of the systems is time stamping with (e.g.)
2019-04-26T19:49:01Z
The other is using
2019-04-26T12:51:35-07:00
So instead of Z (for UTC) you get -07:00; since -07:00 is six characters to Z's one, each of the two date stamps grows by 5 bytes, which is where the extra 10 bytes are coming from. Other than that, the Unique IDs are (as you would imagine) different, the Length values are different for the streams containing dates, and the startxref is different because the streams are different lengths.
Both files claim to be compatible with PDF/A-1b. In short, I can see no real differences between them. Since you are using a tool to further process the files, I'd suggest you try taking the PDF file from your working system and processing it on the non-working system, and vice versa; it seems to me that the problem may lie in the later processing rather than in the PDF file itself. Perhaps you have different versions of that tool on the two systems.
For what it may be worth, Ghostscript can be induced to create a ZUGFeRD file directly; see this bug report and this commit to the repository.

What is the file size limit? (NodeMCU, ESPlorer)

I recently tried to host a little web interface from my ESP8266, but something kept failing until I realized that a bigger file (around 10 KB) was corrupt. Well, not really corrupt, but simply incomplete: no matter how I changed it, the file was always cut off after a certain number of characters.
My compiled NodeMCU firmware is about 649 KB in size, so there should easily be enough space. My board has at least 4 MB of flash (32m), so that should be plenty to store my Lua, HTML and CSS files!
I used ESPlorer to upload the files, by the way.
So what exactly is the limit here?
Is it a memory issue? A flash storage issue? An issue related to Esplorer?
Is it somehow possible to get bigger files onto my board?
Edit:
I should mention that uploading the init.lua file always worked, even when it was around 10 KB. Maybe the upload mechanism is different for the init.lua file?
Alright, here's the long form of my comment above. My best guess is (was) that this is an issue with ESPlorer. Whenever I look at its source code, I'm actually surprised how well it usually works.
At https://frightanic.com/iot/tools-ides-nodemcu/ I compiled a list of tools and IDEs for NodeMCU. I suggest you pick a different uploader and try again. NodeMCU-Tool, for example, is solid, and it's definitely a lot better maintained than ESPlorer.
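For what it's worth, a minimal NodeMCU-Tool session looks roughly like this (the serial port is an assumption; adjust it to your system):
npm install -g nodemcu-tool
nodemcu-tool upload index.html --port=/dev/ttyUSB0
If the same 10 KB file arrives intact this way, that points at ESPlorer's upload mechanism rather than at the flash file system.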

What are the SPSS *.bin files?

SPSS 19 keeps creating new files with a .bin extension in my user\AppData\Local\Temp folder. Many of them are about 8 GB in size, and they have started to crash my system. What are they? Can I delete them? How can I prevent them from being created?
Thanks,
Marty
I think these are cache files that SPSS creates during several procedures, notably during SORT. As long as SPSS is not running, it should be safe to delete any that are left behind.

COBOL - How to read and export text from .dat files

I've got a customer who wants to migrate from an old Fujitsu COBOL based system to our system. That said, he wants his old data (products, manufacturers, etc.) to be kept in the new system. I don't have the COBOL source files; I have .DAT files, .RDD files and .FDD files.
Apparently the .DAT files are in INDEXED organization; sample file output below:
FDD output:http://textuploader.com/kxdv
RDD output:http://textuploader.com/kxdw
I can't simply read the .DAT file in Notepad. I've tried SiberDataViewer without success, and besides, you have to pay to export the data.
If there's a way, can I write a program to export all these files to CSV, DBF or Postgres format? If you are still reading, thank you.
I do not know Fujitsu COBOL, but as I see it there are a few ways you might be able to get at the data:
0) Have your customer (or someone with a compatible Fujitsu COBOL compiler) write a COBOL program to read the INDEXED file and output a SEQUENTIAL file (see the sketch after this list).
1) Find a Fujitsu COBOL utility to do the same.
2) Find a product that can read the INDEXED file and export it into something you can use. I'm thinking of products like Cyberquery or Crystal Reports, etc. Or, after I saw that the FDD/RDD files were produced by Siber Systems, a quick search helped me find their "Cobol DataViewer" product; use that to output it to a "more common and usable format" ;-)
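To give an idea of what option 0 involves, here is a minimal sketch of such a program. The file names, the key and the two-field record layout are placeholders; the real layout has to come from the customer's copybooks or from the .RDD/.FDD definitions:
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EXPORTDAT.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *> Indexed input file; key name and size are assumptions.
           SELECT PROD-FILE ASSIGN TO "PRODUCTS.DAT"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS SEQUENTIAL
               RECORD KEY IS PROD-KEY.
      *> Plain text output, one record per line.
           SELECT CSV-FILE ASSIGN TO "PRODUCTS.CSV"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  PROD-FILE.
       01  PROD-REC.
           05  PROD-KEY   PIC X(10).
           05  PROD-NAME  PIC X(40).
       FD  CSV-FILE.
       01  CSV-REC        PIC X(80).
       WORKING-STORAGE SECTION.
       01  EOF-FLAG       PIC X VALUE "N".
       PROCEDURE DIVISION.
       MAIN-PARA.
           OPEN INPUT PROD-FILE
           OPEN OUTPUT CSV-FILE
           PERFORM UNTIL EOF-FLAG = "Y"
               READ PROD-FILE
                   AT END
                       MOVE "Y" TO EOF-FLAG
                   NOT AT END
      *> Join the fields with a semicolon so embedded
      *> commas in the data cannot break the output.
                       MOVE SPACES TO CSV-REC
                       STRING PROD-KEY ";" PROD-NAME
                           DELIMITED BY SIZE INTO CSV-REC
                       WRITE CSV-REC
               END-READ
           END-PERFORM
           CLOSE PROD-FILE CSV-FILE
           STOP RUN.
The result is a semicolon-separated text file that CSV importers or Postgres COPY can load directly.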
I could convert it using the Siber DataViewer, but its full version is paid.

Decompile help file and extract context mappings?

I have an old help file project, but the original project was lost in a hard drive crash. The original was created using HelpScribble; I have since decompiled the compiled CHM file and recreated the help file in WinCHM. However, to my knowledge, there is no way to identify the mappings that direct an application to certain Context IDs.
What I'm wondering is whether there's a way to read the compiled CHM file and extract the Context ID of each topic in the help file. I would hate to have to iterate through individual numbers from 0 to 5,000, which is the range I've seen in the original software source. This is a large system, and it has a correspondingly large help file covering every possible scenario in the software.
You can use the chmls tool from the FreePascal project. Invoke it like this:
chmls extractalias MyHelpFile.chm
The output is a pair of files, MyHelpFile.ali and MyHelpFile.h, containing the IDs and targets of your aliases.
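The .h file follows the usual HTML Help map-header convention, so its entries (topic names and numbers made up here for illustration) look something like:
#define IDH_OVERVIEW 1000
#define IDH_SETTINGS 1010
The .ali file then pairs each of those symbolic IDs with the topic file it resolves to, which gives you the Context ID mappings without iterating through numbers by hand.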
