SAS Encoding Label

I have a dataset x. When I run the following code, I get this error:
libname raw 'D:\Apellis\data\raw' outrep='WINDOWS_64';
data x;
set raw.x;
run;
ERROR: Some character data was lost during transcoding in the dataset WORK.X. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding.
If I add the following inencoding option, the error disappears from the log. However, would this result in some rows being omitted?
libname raw 'D:\Apellis\data\raw' outrep='WINDOWS_64' inencoding=any;
data x;
set raw.x;
run;
Additionally, I found that the raw.x dataset has a variable containing the character ≥, which I think could be causing the encoding problem. Do labels containing such characters also lead to the encoding error described above? How do I get rid of this?
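As far as I can tell, inencoding=any makes SAS read the data without transcoding rather than omit rows, but the ≥ character then survives into the copy. One defensive approach (a sketch only; the variable name flag is an assumption, not from the original data) is to substitute an ASCII equivalent before copying:
libname raw 'D:\Apellis\data\raw' inencoding=any;
data x;
    set raw.x;
    /* tranwrd(source, target, replacement) swaps the non-representable
       character for an ASCII form; flag is a hypothetical variable name */
    flag = tranwrd(flag, '≥', '>=');
run;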

Related

SPSS save decimal number to ASCII

I am trying to save a numeric variable with decimal places (f8.6) from an SPSS file into a fixed-width ASCII file. The goal is to write it into certain columns of the ASCII file (21 to 30).
WRITE OUTFILE='C:\misc\ascii.dat'
ENCODING='UTF8'
TABLE /1
variable 21-30.
exe.
writes to the correct positions, but not with decimals.
variable 21-30 (f)
does the same thing.
variable (f8.6)
saves with decimals, but on positions 1 to 10.
variable 21-30 (f8.6)
results in an error, because apparently you cannot specify both columns and format.
I know two workarounds, but both involve additional data editing, which I'd rather not do:
Convert the variable to a string and save it as a string, but I am not sure about the implications (encoding, decimal places, or whatever else I am not even considering).
Add an empty string variable with a length of 20 before my variable.
But is there a straightforward way of doing this, without workarounds ?
You can add the 20 spaces in the command itself, like this:
WRITE OUTFILE='C:\misc\ascii.dat'
ENCODING='UTF8'
TABLE / '                    ' YourNumVar (f8.6).
exe.
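If I read the WRITE syntax correctly, each string literal in the variable list is written verbatim at the current position, so a literal of exactly 20 blanks pushes YourNumVar to column 21 (it then occupies columns 21-28, since f8.6 is eight characters wide).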

Number notation "SK"

I use an ODBC table handler to read data from Excel and CSV files into an AMPL model. But the thing I encountered probably doesn't have much to do with the precise programs and programming language I use.
Among the data are two specific types of strings: three-digit alphabetic and six-digit alphanumeric.
When the three-digit alphabetic type includes a NAN string, AMPL throws an error. As I found out, the reason is that it understands NAN as "NaN" (not a number). It cannot use this as an index.
The six-digit alphanumeric type sometimes includes strings like 3E1234. This seems to be a problem because AMPL (or the handler) understands this as a number in scientific notation: it reads 3*10^1234, which is handled as infinity. So when there is one 3E1234 entry and one 3E1235 entry, it sees them both as infinity.
I understand these two. And although they are annoying, I can work with them. Now I have encountered that a string SK1234 is parsed as the number 1234. I have learned a bit of programming in college, but I have no idea why this happens. Is the prefix SK anything special?
EDIT: Here is an example that reproduces the error:
The model file:
set INDEX;
param value;
The "run" file:
table Table1 IN "tableproxy" "odbc" "DSN=NDE" "Test.csv": INDEX <- [Index], value ~ Value;
read table Table1;
NDE is a user DSN that uses the Microsoft Text Driver in the appropriate folder.
And the CSV file:
Index,Value
SK1202,1
SK1445,2
SK0124,3
SK7896,4
SK1,5
AB1234,6
After running all this code, I type display INDEX and get
set INDEX := 1202 1445 124 7896 1 Missing;
So the field Index is treated as a numeric field with the first five entries converted to a number. The last entry cannot be converted so it is treated as Missing.
The DSN is configured to infer each column's type from the first 25 rows. For some reason it interprets the SK... entries as numbers and therefore reads the whole column as numeric.
For the Text ODBC driver to detect column type correctly, values should be quoted:
Index,Value
'SK1202',1
'SK1445',2
'SK0124',3
'SK7896',4
'SK1',5
'AB1234',6
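Alternatively (not part of the original answer, but a documented feature of the Microsoft Text Driver), a schema.ini file placed in the same folder as Test.csv can declare the column types explicitly, so no type guessing happens at all:
[Test.csv]
ColNameHeader=True
Format=CSVDelimited
Col1=Index Char Width 10
Col2=Value Integer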

weird erlang behavior, dealing with large bitstrings

When converting a large base64 image (~45 KB) to a bitstring, it raises an exception:
exception error: no function clause matching
base64:decode("j9/",
[255,128,0,65,41,25,37,24,0,4,4,0,0,4,0,4,0,
3,255,108,1,12,0,32,24,24,28|...]) (base64.erl, line 254)
in function base64:decode/1 (base64.erl, line 118)
I really want to understand why it behaves like that (maybe a maximum bitstring size?).
Thanks for your time
It seems like your base64 data is truncated. Base64 works by taking groups of 4 characters and converting them to groups of 3 bytes. If the bitstring is not a multiple of 3 bytes, the base64 text should be padded with one or two = signs, so that it's still made up of groups of 4 characters, but your base64 text ends with a group of only 3 characters.
Can you verify if the image is properly encoded with base64 by trying to decode it outside Erlang?
See this post for how to do that from the command line:
https://askubuntu.com/questions/178521/how-can-i-decode-a-base64-string-from-the-command-line
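If the text was merely cut off at the end, a minimal sketch that pads the string back to a multiple of four characters before decoding could look like this (an assumption: it only repairs the padding; the bytes that were truncated are still gone):
pad_and_decode(B64) when is_list(B64) ->
    %% base64 text comes in groups of 4 characters; a remainder of 1 is
    %% never valid, while 2 or 3 can be completed with '=' padding
    case length(B64) rem 4 of
        0 -> base64:decode(B64);
        1 -> erlang:error(badly_truncated_base64);
        N -> base64:decode(B64 ++ lists:duplicate(4 - N, $=))
    end.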

how to keep leading zeros in Rails yml fixtures?

I am trying to use the "food_descriptions" fixture in a "minitest" test in Rails 4 beta1:
butter:
  NDB_No: "01001"
  FdGrp_Cd: "0100"
  Long_Desc: "Butter, salted"
The test I have is this:
it "must work" do
  food_descriptions(:butter).NDB_No.must_equal "01001"
end
However, when I run the test I get this error: Expected: "01001" Actual: 1001
I don't understand why that number is not recognized as a string. I've read that YAML treats values that start with 0 as octal, so adding the quotes should be enough to treat it as a string, but that is not working. I have also tried the pipe "|" sign, but it doesn't work either.
Any idea why?
Quick Answer (TL;DR)
YAML 1.2 leading zeros can be preserved by quoting a scalar value.
Context
YAML 1.2
Scalar value with leading zeros
Problem
Scenario: a developer wishes to specify a scalar value with a leading zero in a YAML 1.2 file.
When parsed, the leading zero gets omitted or truncated.
Solution
Quote a scalar value in YAML to have it parsed as a string.
Leading zeros are preserved for non-numeric values.
Pitfalls
Data-type casting in a database or programming-language context may convert a string scalar to a numeric scalar, dropping the leading zeros again.
It turns out the problem was not what I thought it was (YAML). The fixtures were being pushed to the DB, and the tests were actually retrieving the entry from the database (I thought the fixtures were just in memory). The database column type for that value was integer, not string, so the leading zeros were being removed. My real problem was that I wanted that column to be the table's primary key, of type string, and I didn't realize that the migration I created didn't change the column's type to string in the test database.
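For anyone hitting the same thing, a migration along these lines (a sketch; the table and column names come from the fixture above, the rest is an assumption) changes the column so the string survives the round trip through the test database:
class ChangeNdbNoToString < ActiveRecord::Migration
  def up
    # store the code as text so "01001" is no longer cast to 1001
    change_column :food_descriptions, :NDB_No, :string
  end

  def down
    change_column :food_descriptions, :NDB_No, :integer
  end
end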

Convert WideString to String

How can I remove the empty widestring characters from a string?
e.g.:
'H e l l o' should convert to 'Hello'
I read a text blob field from a DB and want to store it in another table. When I read it, I get the extra spaces; when I store it, the extra spaces remain, and the value cannot be read correctly by the next query.
DXE, Firebird 2.5
UPDATE:
I use the IBQuery -> DataSetProvider -> ClientDataSet chain without design-time created fields.
It seems that the IBQuery retrieves the data in that wrong format.
Current Write Code:
blobStream := TStringStream.Create;
...
blobStream.WriteString(blobText); //blobText = 'H e l l o';
ibsql.ParamByName('ABLOBCOL').LoadFromStream(blobStream);
...
ibsql.ExecQuery;
...
'H e l l o' is stored in the FB database, but it should be 'Hello'. Since it seems to be a bug in IBQuery, I need a way to convert that string.
First of all, I'm not going to attempt to describe how to remove every other character from a string. Whilst that might appear to solve your problem it merely papers thinly over the gaping cracks. What you have here is a classic text encoding mismatch. The real solution to your problem will involve fixing the mismatch.
I suspect the problem arises in code that you have not shown. As I understand your question now, you have a string variable blobText that contains incorrectly encoded text. But the code in the question takes blobText as input, and so the damage has already been done by the time we reach the code in the question. The key to solving this is the code that puts duff data into blobText.
You need to find the code which assigns to blobText and sort out the encoding problem there. It looks like you have taken UTF-16 encoded text and interpreted it as if it had an 8 bit encoding. I could speculate as to how that would happen, but it would be better for you to look at the actual code, the code that assigns to blobText. If you cannot work it out, please do post an update to the question.
I'm pretty confident that there are no bugs in the database libraries and that this is just an encoding mismatch.
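For illustration, here is a sketch of what reading the blob with the right encoding could look like, assuming the blob really holds UTF-16 text and that Query is the TIBQuery (a hypothetical name) that fetches the row:
var
  Raw: TBytesStream;
  blobText: string;
begin
  Raw := TBytesStream.Create;
  try
    // pull the raw blob bytes without any character-set conversion
    (Query.FieldByName('ABLOBCOL') as TBlobField).SaveToStream(Raw);
    // decode the bytes as UTF-16 LE; reading them through an 8-bit
    // encoding is what produces the interleaved 'spaces'
    blobText := TEncoding.Unicode.GetString(Raw.Bytes, 0, Raw.Size);
  finally
    Raw.Free;
  end;
end;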
