How to insert special characters into a database? - character-encoding

I have characters like ó, ö and so on. When I insert this data into the table, it comes back garbled, like ó and ö. I am using PHP with MySQL. Is there any solution for this?

Please specify what DB system you're using -- each has its preferred way to specify character encoding. Then, set the character encoding of your DB to the same you're using for sending it the strings -- the latter may depend on your language and library, so if you want detailed help you'd better specify those too.
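For example, assuming MySQL (which the asker mentions) and a hypothetical database named mydb, a minimal sketch of aligning both sides:
-- Storage side: make the database default to UTF-8.
ALTER DATABASE mydb CHARACTER SET utf8 COLLATE utf8_general_ci;
-- Connection side: run right after connecting, before any queries.
SET NAMES utf8;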

Depends on your database engine. For MySQL, the solution is usually to set your table's default character set, like so:
CREATE TABLE `foo` (
  `bar` varchar(20) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
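If the table already exists, MySQL can also convert it in place; a hedged sketch:
-- Rewrites both the column definitions and the stored bytes to utf8.
ALTER TABLE `foo` CONVERT TO CHARACTER SET utf8;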

Run this query to check MySQL's various character encoding settings:
show variables like '%character%'
I would say you've probably got one or more of those set to latin1. Also, try executing these queries prior to inserting/fetching data:
SET NAMES utf8
SET CHARACTER SET utf8

Related

Loading data into a Hive table with multiple charsets

I am facing an issue where I have multiple files with different charsets: say one file has Chinese characters and another has French characters. How can I load them into a single Hive table? I searched online and found this:
ALTER TABLE mytable SET SERDEPROPERTIES ('serialization.encoding'='SJIS');
With this I can handle the charset for one of the files, either Chinese or French. Is there a way to handle both charsets at once?
[UPDATE]
I am using RegexSerDe for a fixed-width file, and the encoding scheme in use is ISO 8859-1. It seems RegexSerDe does not take this encoding into account and splits the characters assuming the default UTF-8 encoding. Is there a way to make RegexSerDe take the encoding scheme into account?
I am not sure if this is possible (I think it isn't, based on https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/AbstractEncodingAwareSerDe.java). A workaround could be to create two tables with different encodings and create a view on top of them.
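As a sketch of that workaround (table names, file locations, and the exact encodings are assumptions for illustration):
-- One external table per file encoding; LazySimpleSerDe honours
-- the 'serialization.encoding' property.
CREATE EXTERNAL TABLE mytable_sjis (line STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding' = 'SJIS')
LOCATION '/data/mytable/sjis';

CREATE EXTERNAL TABLE mytable_latin1 (line STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding' = 'ISO-8859-1')
LOCATION '/data/mytable/latin1';

-- The view unifies both tables; each underlying table is decoded
-- with its own declared encoding.
CREATE VIEW mytable AS
SELECT * FROM mytable_sjis
UNION ALL
SELECT * FROM mytable_latin1;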

Does Fast Report 4 (Delphi 7) support UTF8 using frxUserDataSet?

I've done my homework, and specifically:
1) Read the whole FastReport 4 manual. It mentions neither UTF8 nor Unicode support
2) Looked for an answer here on SO
3) Googled around
If I set a Text field and fill it with Thai characters, they are perfectly printed, so FastReport CAN handle Unicode characters, at least it can print them.
If I try to "pass" a value using the callbacks provided by the frxUserDataSet, then what I see is some garbled not-unicode text. In particular, if I pass e.g. a string made with the same 10 Thai characters, I see the same "set" of 3 or 4 garbled characters repeated ten times, so I am sure the data is passed correctly, but then FastReport has probably no way to know that they should be handled as Unicode.
The callback requires the data passed back to be of "variant" type, so I guess it's totally useless to cast them to any type, because variant will accept any of them.
I forgot to mention that I get the strings from a MySql DB where the data is stored as UTF8, and I do not even copy the data into a local variable: what I get from the DB is put straight into the variant.
Is there a way to force FastReport to print the data received as Unicode?
Thank you
Yes, FR4 with Delphi7 supports UTF8 using frxUserDataSet.
Just for future reference:
1) You MUST set your DB (MySql in my case) to use UTF8
2) You MUST set the character set of the component you use to access the DB to utf8 ("DAC for MySql" in my case; the property is called ConnectionCharacterSet)
3) In all the frxUserDataSet callbacks, before setting the "value" variable, you MUST CONVERT whatever you have using the Utf8Decode Delphi system routine, like this:
value := Utf8Decode(fReports.q1.FieldValueByFieldName('yourDBfield'));
where fReports is the form name, and q1 the component used to access the DB.
I keep reading that using D7 and Unicode is almost impossible but, as long as you use XP and up, it's only harder, from what I am seeing. Unfortunately, I must use XP and D7 and cannot upgrade. But, as said, I am quickly getting used to solving these problems so, in the future, I hope to be able to give back some help in the same way everybody here has always helped me :)

Can I directly use accented syllables in the database?

I am building a Rails application which needs a list of countries and cities. Each city should contain at least 100,000 people. I have found the data on Wikipedia, but I need a clarification: city names contain some special letters.
Durrës - ë
Vicente López - ó
São Paulo - ã
I have googled and found that these are accented syllables.
My questions are:
Can I directly insert these values into the database?
Can I search the database without any problem?
Thank you.
If you set your database to store values as UTF-8, you should be able to store a wide range of such values without problems.
When it comes to sorting and comparing, the important thing is which collation you ask the database to use. In a nutshell, a collation is a set of rules saying how strings are compared: for example, how é sorts relative to e, whether ß is equal to ss, and so on.
When using full-text search (Solr, Sphinx, etc.), you should ensure that your stop words, choice of stemmer, and so on are Unicode-aware.
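As an illustration in MySQL (the table name and collation choice are assumptions, not part of the question):
-- UTF-8 storage with an accent- and case-insensitive collation.
CREATE TABLE cities (
  name VARCHAR(100)
) CHARACTER SET utf8 COLLATE utf8_unicode_ci;

-- Under utf8_unicode_ci, accented and unaccented forms compare equal,
-- so this matches a stored value of 'São Paulo'.
SELECT name FROM cities WHERE name = 'Sao Paulo';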

Reading and sorting a variable-length CSV file

We are using an OpenVMS system, and I believe it runs the COBOL from HP.
We have a data file with a lot of records (500 MB or more) of variable length. The records are comma-delimited. I would like to parse each record and extract the corresponding fields for processing. After that, I might want to sort by some particular fields. Is this possible with COBOL?
I've seen sorting with fixed-length records only.
Variable length is no problem. I'm not sure exactly how this is done in VMS COBOL, but the IBMese for this is:
FILE SECTION.
FD  THE-FILE RECORD IS VARYING DEPENDING ON REC-LENGTH.
01  THE-RECORD PICTURE X(5000).
WORKING-STORAGE SECTION.
01  REC-LENGTH PICTURE 9(5) COMPUTATIONAL.
When you read the file, REC-LENGTH will contain the record length; when you write a record, it will write a record of length REC-LENGTH.
To handle the delimited record files you will probably need to use the "UNSTRING" verb to convert into a fixed format. This is pretty verbose (but then this is COBOL).
UNSTRING record DELIMITED BY ","
INTO field1, field2, field3, field4, field5 etc....
END-UNSTRING
Once the record is in fixed format you can use the SORT as normal.
The COBOL SORT verb will do what you need.
If the SD file contains variable-length records, all of the KEY data-items must be contained within the first n character positions of the record, where n equals the minimum record size specified for the file. In other words, they have to be in the fixed part.
However, you can get around this easily by using an input procedure. This will let you create a virtual file that has its keys in the right place. In your input procedure, you will reformat your variable-length, comma-delimited record into one that has its keys at the front, then RELEASE it to the sort.
If my memory is correct, VMS has a SORT/MERGE utility that you could use after you have processed the file into a fixed file format (variable may also be possible). Typically a standalone SORT utility performs better than an in-line COBOL SORT, and it can be a better design if the sort criteria change in the future.
No need to write a solution in COBOL, at least not to sort the file. The UNIX sort utility should do it just fine; just call sort -t ',' -n with maybe a couple of other options.

Why is my query returning the wrong string type?

According to the official Firebird documentation, columns containing Unicode strings (what SQL Server calls NVARCHAR) should be declared as VARCHAR(x) CHARACTER SET UNICODE_FSS. So I did that, but when I query the table with DBExpress, the result I get back is a TStringField, which is AnsiString only, not the TWideStringField I was expecting.
How do I get DBX to give me a Unicode string result from a Unicode string column?
With Firebird, your only option is to set the whole database connection to a Unicode char set, for example to utf8.
That way, all the VarChar columns will come back as fields of type TWideStringField. The fields will always be TWideStringField regardless of the particular char set declared when the column was created.
[The original answer showed screenshots of this connection setting, taken from an example project created while teaching Delphi.] You have to set this property before creating any persistent fields, if that's your case.
It looks like the driver does not support the UNICODE_FSS charset, since my first action was to create a new project, set the property, and then create some fields. IMHO it's better to declare the whole database as utf8 or another charset supported by the driver in the CREATE DATABASE statement, and then match the database charset in Delphi to avoid string conversions.
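A hedged Firebird sketch of that approach (the file path and table are hypothetical):
-- Declare the charset once, at creation time; columns created without
-- an explicit CHARACTER SET then default to UTF8.
CREATE DATABASE 'localhost:/data/mydb.fdb'
  DEFAULT CHARACTER SET UTF8;

CREATE TABLE customers (
  name VARCHAR(50)
);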
