MySQL table charset issues when moving servers

I am trying to move phpBB's database from an old server to a new one.
The old server runs MySQL 5.0.45 with phpMyAdmin 2.8.0.2.
The new one runs MySQL 5.5.17 with phpMyAdmin 3.4.3.2.
The old server's default charset is UTF-8, but phpBB's database is entirely in latin1 with latin1_swedish_ci collation. As it is a Polish forum, it contains Polish accented characters, and although they display correctly on the forum, phpMyAdmin shows them as:
ć is displayed as æ
ś as ¶
ż as ¿
ł as ³
and so on...
I have two dumps of the database: one from the phpMyAdmin available on the server, and a second one made by the server's admin using mysqldump. My guess is that the dump is in UTF-8, but in a way that prevents me from importing it into the new database while keeping the Polish accents. E.g. the UTF-8 encoding of the letter ć is C4 87, while in both dumps the hex value for that letter is C3 A6 (which is UTF-8 for æ).
So, how do I go about it? What do I do to import the dump (or export it correctly, if that's the problem) so that it preserves the Polish accents?
Maybe I should convert the dump somehow? I tried using iconv with no success, but I have next to no experience with it.
Oh, and by the way, the forum's HTML charset is iso-8859-2, which is correct for displaying Polish accents.

If you have access to your server, you can change the default encoding for your new server in the my.cnf file (note that MySQL 5.5 no longer accepts the old default-character-set option under [mysqld]; use character-set-server instead):
[mysqld]
character-set-server=latin1
collation-server=latin1_german1_ci
And if you use the command line to perform the import, you can use this command:
mysql -h host -u username -ppassword --default-character-set=utf8 database < file.sql
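Since the stored bytes appear to have been transcoded on export (latin1 bytes reinterpreted and re-encoded as UTF-8), one common recovery path is to re-dump without any conversion and import the raw bytes the same way. A rough sketch, assuming shell access to both servers (user and database names are placeholders):
# On the old server: dump the raw stored bytes without transcoding
mysqldump -u user -p --default-character-set=latin1 --skip-set-charset phpbb > phpbb-raw.sql
# On the new server: import, again telling MySQL not to convert
mysql -u user -p --default-character-set=latin1 phpbb < phpbb-raw.sql
If you then want genuine UTF-8 columns, convert the tables on the new server afterwards rather than editing the dump file by hand.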

Related

INSERT INTO ... in MariaDB in Ubuntu under Windows WSL2 results in corrupted data in some columns

I am migrating a MariaDB database into a Linux docker container.
I am using mariadb:latest in Ubuntu 20 LTS via Windows 10 WSL2 via VSCode Remote WSL.
I have copied the sql dump into the container and imported it into the InnoDB database which has DEFAULT CHARACTER SET utf8. It does not report any errors:
> source /test.sql
That file does this (actual data truncated for this post):
USE `mydb`;
DROP TABLE IF EXISTS `opsitemtest`;
CREATE TABLE `opsitemtest` (
`opId` int(11) NOT NULL AUTO_INCREMENT,
`opKey` varchar(50) DEFAULT NULL,
`opName` varchar(200) DEFAULT NULL,
`opDetails` longtext,
PRIMARY KEY (`opId`),
KEY `token` (`opKey`)
) ENGINE=InnoDB AUTO_INCREMENT=4784 DEFAULT CHARSET=latin1;
insert into `opsitemtest`(`opId`,`opKey`,`opName`,`opDetails`) values
(4773,'8vlte0755dj','VTools addin for MSAccess','<p>There is a super helpful ...'),
(4774,'8vttlcr2fTA','BAS OLD QB','<ol>\n<li><a href=\"https://www.anz.com/inetbank/bankmain.asp\" ...'),
(4783,'9c7id5rmxGK','STP - Single Touch Payrol','<h1>Gather data</h1>\n<ol style=\"list-style-type: decimal;\"> ...');
If I source a subset of 12 records of the table in question all the columns are correctly populated.
If I source the full set of data for the same table (4700 rows), where everything else is the same, many of the opDetails long text fields show a length in SQLyog but no visible data. If I run a SELECT on that column there are no errors, but some of the opDetails fields are "empty" (meaning: you can't see any data), and when I serialize that field, the opDetails column of some records (not all) contains
"opDetails" : "\u0000\u0000\u0000\u0000\u0000\u0000\",
( and many more \u0000 ).
The opDetails field contains HTML fragments. I am guessing it is something to do with that content and possibly the CHARSET, although that doesn't explain why the error shows up only when there are a large number of rows imported. The same row imported via a set of 12 rows works correctly.
The same test of the full set of data on a Windows box with MariaDB running on that host (ie no Ubuntu or WSL etc) all works perfectly.
I tried setting the table charset to utf8 to match the database default but that had no effect. I assume it is some kind of Windows WSL issue but I am running the source command on the container all within the Ubuntu host.
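For reference, a table charset change like that is typically done with something along these lines (a sketch; not necessarily the exact statement used):
ALTER TABLE opsitemtest CONVERT TO CHARACTER SET utf8;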
The MariaDB data folder is mapped using a volume, again all inside the Ubuntu container:
volumes:
- ../flowt-docker-volumes/mariadb-data:/var/lib/mysql
Can anyone offer any suggestions while I go through and try manually removing content until it works? I am really in the dark here.
EDIT: I just ran the same import process on a Mac to a MariaDB container on the OSX host to check whether it was actually related to Windows WSL etc and the OSX database has the same issue. So maybe it is a MariaDB docker issue?
EDIT 2: It looks like it has nothing to do with the actual content of opDetails. For a given row that is showing the symptoms, whether or not the data gets imported correctly seems to depend on how many rows I am importing! For a small number of rows, all is well. For a large number there is missing data, but always the same rows and opDetails field. I will try importing in small chunks but overall the table isn't THAT big!
EDIT 3: I tried a docker-compose without a volume and imported the data directly into the MariaDB container. Same problem. I was wondering whether it was a file system incompatibility or some kind of speed issue. Yes, grasping at straws!
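One direct way to count the affected rows, given that the corrupted values are padded with NUL bytes (a hedged sketch; in a MySQL/MariaDB string literal, '\0' is a NUL byte):
SELECT COUNT(*) FROM opsitemtest WHERE opDetails LIKE '%\0%';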
Thanks,
Murray
OK. I got it working. :-)
One piece of info I neglected to mention, and it might not be relevant anyway, is that I was importing an SQL dump from 10.1.48-MariaDB-0ubuntu0.18.04.1 because I was migrating a legacy app.
So, with my docker-compose:
Version           Result
mysql:latest      data imported correctly
mariadb:latest    failed as per this issue
mariadb:10.7.4    failed as per this issue
mariadb:10.7      failed as per this issue
mariadb:10.6      data imported correctly
mariadb:10.5      data imported correctly
mariadb:10.2      data imported correctly
Important: remember to completely remove the external volume mount folder content between tests!
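For reference, pinning the working version in the compose file is just a matter of the image tag (a minimal sketch; the service name is a placeholder, the volume mapping is the one shown above):
services:
  mariadb:
    image: mariadb:10.6
    volumes:
      - ../flowt-docker-volumes/mariadb-data:/var/lib/mysql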
So, now I am not sure whether the issue was some kind of sql incompatibility that I need to be aware of, or whether it is a bug that was introduced between v10.6 and 10.7. Therefore I have not logged a bug report. If others with more expertise think this is a bug, I am happy to make a report.
For now I am happy to use 10.6 so I can progress the migration- the deadline is looming!
So, this is sort of "solved".
Thanks for all your help. If I discover anything further I will post back here.
Murray

Is there a way to configure the filename for a Neo4j Desktop database dump file to exclude timestamp?

I'm a first-time user of Neo4j, following a training course to install it and learn the basics.
I've installed Neo4j Desktop on a Windows machine and can see that it comes with a demo DB called "Movie DBMS". I'm trying to follow steps to dump the database, by stopping the database, clicking on "..." and then "Dump".
The dump errors with the following error in the log file:
[2022-01-31 12:54:36.022] [error] Selecting JVM - Version:11.0.8+10-LTS, Name:OpenJDK 64-Bit Server VM, Vendor:Azul Systems, Inc.
java.nio.file.InvalidPathException: Illegal char <:> at index 128: C:\Users\<me>\.Neo4jDesktop\relate-data\projects\<my project name>\movie-dbms-neo4j-31-Jan-2022-12:54:31.dump
It would appear that the automatically generated dump file name includes a timestamp containing colons (hh:mm:ss), which are illegal characters in Windows file names. How can I configure the file name to either exclude the timestamp or avoid using ":"?
Thanks.
I had no responses, but I've figured it out myself.
The answer was to use the command line to dump the database manually. That way I can specify my own "--to=" filename, which doesn't include a ":".
Details in this section of the manual: https://neo4j.com/docs/operations-manual/current/backup-restore/offline-backup/#offline-backup
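For example (a sketch based on the Neo4j 4.x tooling; the database name and target path are placeholders, and the exact syntax depends on your Neo4j version):
bin\neo4j-admin dump --database=neo4j --to=C:\dumps\movie-dbms.dump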

Loading Unicode or UTF-8 characters into file datastore in Oracle Data Integrator (ODI)

How do I load Arabic characters from an Oracle table into a flat file in ODI 10g?
I used "LKM SQL to File Append" to load data into a flat file, but I believe it creates the file in ANSI encoding. This causes all special characters to appear as question marks ("?") in the flat file. This is the only LKM module I found in the tool that loads a table into a file.
I also tried writing UTF-8 & Unicode in the "Format" field under the "Columns" tab of the file datastore model, but this didn't work.
Is there any way I can create the flat file with Unicode/UTF-8 encoding using Oracle Data Integrator?
Did you try creating a new File Data Server and setting its Encoding setting to 'UTF8'? Try these instructions: Working with Files, ODI
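In the Topology, the encoding is passed as a property on the file driver's JDBC URL, along these lines (hedged; this matches later ODI documentation for the built-in file driver, so check whether the 10g driver accepts the same property):
JDBC driver: com.sunopsis.jdbc.driver.file.FileDriver
JDBC URL:    jdbc:snps:dbfile?ENCODING=UTF8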

Foreign character issue with CSV import to Heroku Postgres DB

I have a rails app where my users can manually set up products via a web form. This works fine and accepts foreign characters well, words like 'Svölk' for example.
I now have a need to bulk import products and am using FasterCSV to do so. Generally this works without issue, but when the CSV contains foreign characters it stalls at that point.
Am I correct to believe the file needs to be UTF-8 in the first instance?
Also, I'm running Ruby 1.8.7, so is Iconv my only solution for converting the file? This could be an issue, as the format of the original file won't be known.
Have others encountered this issue and if so, how did you overcome it?
You have two alternatives:
Use the ensure_encoding gem to find the actual encoding of the strings.
Use Ruby to determine the file encoding (note that this requires Ruby 1.9+; on 1.8.7 strings do not carry encoding information):
File.open(source_file).read.encoding
I prefer the first approach, as it tries to detect the encoding of the strings and convert them to your desired encoding (UTF-8); you can then set that encoding in the FasterCSV options.
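On Ruby 1.8.7 the simplest route may be to normalize the file to UTF-8 before parsing, e.g. with the iconv command-line tool (assuming, for illustration, that the source turns out to be ISO-8859-1; you still have to know or guess the real source encoding):
iconv -f ISO-8859-1 -t UTF-8 products.csv > products-utf8.csv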

Incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string) on Heroku

I have a Rails application where I use regex-based rules to categorize transactions. In my seeds.rb, I create some categories and rules, then import transactions from a CSV file (also utf8-encoded) and allow them to be categorized. This process works fine on my development machine, but when I run it on Heroku, I get:
incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string)
I am running the Cedar Stack, Rails 2.3.15. I have put
# encoding: utf-8
at the top of all my source files, and I've set the encoding to UTF-8 in my app config, so I'm not sure what else could be causing this problem. I'm wondering if it has something to do with the Heroku configuration.
The issue could be caused by invisible characters that your local operating system ignores (so the file is still read with the proper encoding locally), whereas on Heroku they break the magic encoding comment at the top of the file, and you end up with both ASCII-8BIT and UTF-8.
Since the file that is having issues contains the regex, it's probably your model class instead of seeds.rb.
There are many ways to view invisible characters in your file. In vi, just set the option :set list
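Outside the editor, a quick check is to print the detected encoding and make non-printing characters visible (a sketch; the model file path is a placeholder, and cat -A is the GNU coreutils option):
file --mime-encoding app/models/rule.rb
cat -A app/models/rule.rb | head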
