Rails mysql2 migration breaks utf8 strings - ruby-on-rails

I'm migrating old mysql database to new one so I wrote a script that connects to old database, reads it and creates new entities in new database. Here it is:
old_db = Mysql2::Client.new(host: options[:db_host],
username: options[:db_user],
password: options[:db_password],
database: options[:db_name],
encoding: 'utf8')
old_categories = old_db.query('select id, title from catalog__category order by lvl asc')
old_categories.each do |old_c|
c = Catalog::Category.new
c.name = old_c["title"]
c.save!
end
However after migration categories names appeared in really bad shape.
Both databases encoded in utf8. Client and server sets utf8 encoding
mysql> show variables like "%character%";
+--------------------------+------------------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/Cellar/mysql/5.5.27/share/mysql/charsets/ |
PHP project uses old database and shows all strings correctly,
Rails project uses new database and shows correctly everything, but imported categories strings.
Does anyone knows where is the problem and how to fix it?
Thank you.

The root problem with bad encoding in database itself (it was changed from latin1 to utf8 on the fly some time ago). As a result strings was encoded twice in dump.
mysqldump --default-character-set=latin1 --skip-set-charset -u <user> -p > b.sql
This command help to generate correct dump

Related

Unable to use TimescaleDB in Rails test environment

I'm stuck using TimescaleDB in Rails - everything works fine in development, but in my test suite I cannot insert any data.
What I tried
A) Use SQL schema dump
This causes the original error message I saw. It does create parts of the schema for TimescaleDB but not all of it. I have a hypertable but it's not working properly
B) Use Ruby schema dump
This lets me insert into my table but it's not a hypertable at all - the ruby syntax looses everything related to TimescaleDB and hypertables.
C) Migrate test database directly
I tried avoiding the schema.structure dump and load with the following:
$ rails db:drop
Dropped database 'my_app_development'
Dropped database 'my_app_test'
$ RAILS_ENV=test rails db:create
Created database 'my_app_test'
$ RAILS_ENV=test rails db:migrate
== 20200517164444 EnableTimescaledbExtension: migrating =======================
-- enable_extension("timescaledb")
WARNING:
WELCOME TO
_____ _ _ ____________
|_ _(_) | | | _ \ ___ \
| | _ _ __ ___ ___ ___ ___ __ _| | ___| | | | |_/ /
| | | | _ ` _ \ / _ \/ __|/ __/ _` | |/ _ \ | | | ___ \
| | | | | | | | | __/\__ \ (_| (_| | | __/ |/ /| |_/ /
|_| |_|_| |_| |_|\___||___/\___\__,_|_|\___|___/ \____/
Running version 1.7.0
For more information on TimescaleDB, please visit the following links:
1. Getting started: https://docs.timescale.com/getting-started
2. API reference documentation: https://docs.timescale.com/api
3. How TimescaleDB is designed: https://docs.timescale.com/introduction/architecture
Note: TimescaleDB collects anonymous reports to better understand and assist our users.
For more information and how to disable, please see our docs https://docs.timescaledb.com/using-timescaledb/telemetry.
-> 0.2315s
== 20200517164444 EnableTimescaledbExtension: migrated (0.2316s) ==============
== 20200517165027 CreateAccounts: migrating ===================================
-- create_table(:accounts)
-> 0.0095s
== 20200517165027 CreateAccounts: migrated (0.0095s) ==========================
== 20200517165103 CreateMetrics: migrating ====================================
-- create_table(:metrics)
-> 0.0116s
== 20200517165103 CreateMetrics: migrated (0.0117s) ===========================
== 20200517170842 CreateEvents: migrating =====================================
-- create_table(:events)
-> 0.0072s
-- remove_column(:events, :id)
-> 0.0020s
-- execute("SELECT create_hypertable('events', 'time');\n")
-> 0.0047s
== 20200517170842 CreateEvents: migrated (0.0142s) ============================
pg_dump: warning: there are circular foreign-key constraints on this table:
pg_dump: hypertable
pg_dump: You might not be able to restore the dump without using --disable-triggers or temporarily dropping the constraints.
pg_dump: Consider using a full dump instead of a --data-only dump to avoid this problem.
pg_dump: warning: there are circular foreign-key constraints on this table:
pg_dump: chunk
pg_dump: You might not be able to restore the dump without using --disable-triggers or temporarily dropping the constraints.
pg_dump: Consider using a full dump instead of a --data-only dump to avoid this problem.
But when running the test suite it is the same as attempt A.
Running the tests after actually prints this message a few times which makes me think that Rails auto-magically uses the structure.sql again to recreate the test DB:
psql:/home/axel/src/my_app/db/structure.sql:16: WARNING:
WELCOME TO
_____ _ _ ____________
|_ _(_) | | | _ \ ___ \
| | _ _ __ ___ ___ ___ ___ __ _| | ___| | | | |_/ /
| | | | _ ` _ \ / _ \/ __|/ __/ _` | |/ _ \ | | | ___ \
| | | | | | | | | __/\__ \ (_| (_| | | __/ |/ /| |_/ /
|_| |_|_| |_| |_|\___||___/\___\__,_|_|\___|___/ \____/
Running version 1.7.0
For more information on TimescaleDB, please visit the following links:
1. Getting started: https://docs.timescale.com/getting-started
2. API reference documentation: https://docs.timescale.com/api
3. How TimescaleDB is designed: https://docs.timescale.com/introduction/architecture
Note: TimescaleDB collects anonymous reports to better understand and assist our users.
For more information and how to disable, please see our docs https://docs.timescaledb.com/using-timescaledb/telemetry.
Error message
$ rails test
Running via Spring preloader in process 107937
Run options: --seed 29840
# Running:
E
Error:
Api::EventsControllerTest#test_POST_event_data_-_new_metric:
DRb::DRbRemoteError: PG::FeatureNotSupported: ERROR: invalid INSERT on the root table of hypertable "events"
HINT: Make sure the TimescaleDB extension has been preloaded.
(ActiveRecord::StatementInvalid)
app/controllers/api/events_controller.rb:5:in `create'
test/controllers/api/events_controller_test.rb:9:in `block in <class:EventsControllerTest>'
rails test test/controllers/api/events_controller_test.rb:8
Finished in 0.215286s, 4.6450 runs/s, 0.0000 assertions/s.
1 runs, 0 assertions, 0 failures, 1 errors, 0 skips
I have the feeling it's related to how Rails creates the test database using the schema.rb (for default config.active_record.schema_format = :ruby) or structure.sql (for config.active_record.schema_format = :sql.
I already tried both, the Ruby and SQL setting of the structure and neither works - development DB gets migrated correctly but test DB is not set up correctly.
In the two databases below (development and test) we can see the only difference is that the test DB is missing: Child tables: _timescaledb_internal._hyper_1_1_chunk
Development DB
$ psql -d my_app_development
psql (12.2)
Type "help" for help.
my_app_development=# SHOW shared_preload_libraries;
shared_preload_libraries
--------------------------
timescaledb
(1 row)
my_app_development=# insert into events (metric_id, time, value) VALUES (1, NOW(), 22);
INSERT 0 1
my_app_development=# \d+ events
Table "public.events"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------+-----------------------------+-----------+----------+---------+---------+--------------+-------------
metric_id | bigint | | | | plain | |
time | timestamp without time zone | | not null | | plain | |
value | numeric | | | | main | |
Indexes:
"events_time_idx" btree ("time" DESC)
Triggers:
ts_insert_blocker BEFORE INSERT ON events FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Child tables: _timescaledb_internal._hyper_1_1_chunk
Access method: heap
Test DB
$ psql -d my_app_test
psql (12.2)
Type "help" for help.
my_app_test=# SHOW shared_preload_libraries;
shared_preload_libraries
--------------------------
timescaledb
(1 row)
my_app_test=# insert into events (metric_id, time, value) VALUES (1, NOW(), 22);
ERROR: invalid INSERT on the root table of hypertable "events"
HINT: Make sure the TimescaleDB extension has been preloaded.
my_app_test=# \d+ events
Table "public.events"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------+-----------------------------+-----------+----------+---------+---------+--------------+-------------
metric_id | bigint | | | | plain | |
time | timestamp without time zone | | not null | | plain | |
value | numeric | | | | main | |
Indexes:
"events_time_idx" btree ("time" DESC)
Triggers:
ts_insert_blocker BEFORE INSERT ON events FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Access method: heap
ActiveRecord with SQL schema
CREATE EXTENSION IF NOT EXISTS timescaledb WITH SCHEMA public;
SET default_tablespace = '';
SET default_table_access_method = heap;
CREATE TABLE public.events (
metric_id bigint,
"time" timestamp without time zone NOT NULL,
value numeric
);
CREATE INDEX events_time_idx ON public.events USING btree ("time" DESC);
CREATE TRIGGER ts_insert_blocker BEFORE INSERT ON public.events FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker();
ActiveRecord with Ruby schema
ActiveRecord::Schema.define(version: 2020_05_17_170842) do
# These are extensions that must be enabled in order to support this database
enable_extension "plpgsql"
enable_extension "timescaledb"
create_table "events", id: false, force: :cascade do |t|
t.bigint "metric_id"
t.datetime "time", null: false
t.decimal "value"
t.index ["time"], name: "events_time_idx", order: :desc
end
end
Note: this looses the ts_insert_blocker trigger and lets me insert into the events table but it is not a hypertable anymore:
my_app_test=# \d+ events
Table "public.events"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------+-----------------------------+-----------+----------+---------+---------+--------------+-------------
metric_id | bigint | | | | plain | |
time | timestamp without time zone | | not null | | plain | |
value | numeric | | | | main | |
Indexes:
"events_time_idx" btree ("time" DESC)
Access method: heap
Additional information
Related question: Running an RSpec test suite against a TimescaleDB database with Rails 4.2 - The suggestions did not work for me and there is no accepted answer.
Version information:
Rails 6.0.3
Postgres 12.2
TimescaleDB 1.7.0
Edit 1
I added the following to my test/test_helper.rb similar to the workaround mentioned by #cstabru
def execute_create_hypertable(sql)
ActiveRecord::Base.connection.execute(sql)
rescue ActiveRecord::StatementInvalid => e
raise e unless e.message.include? 'is already a hypertable'
end
execute_create_hypertable <<~SQL
SELECT create_hypertable('events', 'time');
SQL
But maybe we can use something like SELECT create_hypertable('hypertable_name', 'time_field', if_not_exists => TRUE in an initializer instead of creating hypertables in DB migrations?
I ran into this as well, no matter which way i recreate the db schema (sql or ruby formats) the hyper table is not recreated as the timescale internal schema data is not exported.
Noting that when I restore using the sql format, it copies across the ts_insert_blocker trigger which indeed break inserts on the table with this error (I believe is due to the trigger function not being available)
PG::FeatureNotSupported: ERROR: invalid INSERT on the root table of hypertable "hypertable_name"
HINT: Make sure the TimescaleDB extension has been preloaded.
To fix the underlying issue (either sql or ruby formats) we can recreate the hypertable (and removing the trigger) manually via the following
DROP TRIGGER IF EXISTS ts_insert_blocker ON events;
DROP TRIGGER
SELECT create_hypertable('hypertable_name', 'time_field', if_not_exists => TRUE);
....
(1 row)
Now manually check for the hypertable existence since https://github.com/timescale/timescaledb/pull/862
SELECT * FROM timescaledb_information.hypertable;
I've added these DDL commands to my spec_helper.rb to ensure the test db uses an actual hypertable. I want to ensure the test db schema mirrors my production / staging setups.
config.before(:suite) do
# ensure the hypertable_name hypertable is setup correctly
ActiveRecord::Base.connection.execute(
"DROP TRIGGER IF EXISTS ts_insert_blocker ON hypertable_name;"
)
ActiveRecord::Base.connection.execute(
"SELECT create_hypertable('hypertable_name', 'time_field', if_not_exists => TRUE);"
)
has_hypertables_sql = "SELECT * FROM timescaledb_information.hypertable WHERE table_name = 'hypertable_name';"
if ActiveRecord::Base.connection.execute(has_hypertables_sql).to_a.empty?
raise "TimescaleDB missing hypertable on 'hypertable_name' table"
end
end
If folks find this useful I can look at extracting to a gem to help with schema restores for rails environments, https://github.com/timescale/timescaledb/issues/1916
In case somebody uses the DDL commands of cstabru's answer in spec_helper.rb and got the error PG::UndefinedTable: ERROR: Relation »timescaledb_information.hypertable« does not exist
From timescaledb version 2.0 on you have to use plural timescaledb_information.hypertables and the column name has changed too, so now you have to use hypertable_name instead of table_name.
config.before(:suite) do
# ensure the hypertable_name hypertable is setup correctly
ActiveRecord::Base.connection.execute(
"DROP TRIGGER IF EXISTS ts_insert_blocker ON hypertable_name;"
)
ActiveRecord::Base.connection.execute(
"SELECT create_hypertable('hypertable_name', 'time_field', if_not_exists => TRUE);"
)
has_hypertables_sql = "SELECT * FROM timescaledb_information.hypertables WHERE hypertable_name = 'hypertable_name';"
if ActiveRecord::Base.connection.execute(has_hypertables_sql).to_a.empty?
raise "TimescaleDB missing hypertable on 'hypertable_name' table"
end
end

Ruby on Rails to Postgres Binary encoding

I have a Ruby on Rails + Postgresql app that saves file images into db, in some postgres installations data is saved erroneously in what it looks like a different bytea format, so the image in some postgres installation fails to return file correctly
WORKING SAVED DATA
\xFF\xD8\xFF\xE1\x00\x\xFF\xEE\x00\x0EAdobe\x00d\xC0\x00\x00\x00\x01\xFF\xDB\x00\x84\x00\x06\x04\x04\x04\x05\x04\x06\x05\x05\x06\t\x06\x05\x06\t\v\b\x06\x06\b\v\f\n\n\v\n\n\f\x10\f\f\f\f\f\f\x10\f\x0E\x0F\x10\x0F\x0E\f\x13\x13\x14\x14\x13\x13\x1C\e\e\e\x1C\x1F\x1F\x1F\x1F\x1F\x1F\x1F\x1F\x1F\x1F\x01\a\a\a\r\f\r\x18\x10\x10\x18\x1A\x15\x11\x15
NON WORKING SAVED DATA
x89504e470d0a1a0a0000000d494844520000014a0000003d0806000000cb7920c80000001974455874536f6674776172650041646f626520496d616765526561647971c9653c000047eb4944415478daec5d07781455d79e99edbdef269bb62940208426bd4a5110503e51c40e28d8051145fcec62ff4041c08628624169a28234419ad2a477d2934db2bdf736ffb95bc226a46c4240f1cf7d9e81ecececade7bef73de79e7b062749126b4b6da92db5a5b6d470
any suggestions??
RoR
On RoR i do this to save the file
data = (!file.nil?) ? file.read.force_encoding("UTF-8") : nil
Brand.create!({name: brand[:name], value: brand[:value], file: file, file_properties: get_file_properties(file_url), default: brand[:default], data: data})
and to get the the file image
image_tag(o.get_base64_image, style: 'max-width:100%')
With the model method
def get_base64_image
return self.data != nil ? 'data:image/png;base64,' + Base64.encode64(self.data) : ""
end
Server encoding for both servers is UTF-8
Postgres
potgresql.conf
bytea_output = 'escape'
Working Server
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
--------------------+------------+----------+-------------+-------------+---------------------------
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
Non-working server
List of databases
Name | Owner | Encoding | Collation | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
postgres | postgres | UTF8 | es_MX.UTF-8 | es_MX.UTF-8 |
The issue is related to the output of BYTEA column in postgresql, i solved like this:
ALTER ROLE postgres SET bytea_output TO 'escape';

Configuring postgresql in rails

I am working on a project with a friend. I cloned the application from bitbucket. Everything was fine except postgresql (v9.3.7) . It keeps giving me the following message.
psql: FATAL: password authentication failed for user "ubuntu"
FATAL: password authentication failed for user "ubuntu"
I have created a superuser as well as all the databases. The designated users and the list of all the databases is given below.
ubuntu=# \du
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------+-----------
divjot | Superuser | {}
postgres | Superuser, Create role, Create DB, Replication | {}
ubuntu | Superuser, Create role, Create DB | {}
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------------+----------+-----------+---------+-------+-----------------------
app_development | ubuntu | SQL_ASCII | C | C |
app_production | ubuntu | SQL_ASCII | C | C |
app_test | ubuntu | SQL_ASCII | C | C |
postgres | postgres | SQL_ASCII | C | C |
template0 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
ubuntu | ubuntu | SQL_ASCII | C | C |
I have always struggled with postgresql configuration in rails. I follow the documentation completely, however, every time I try to clone application or move the code, I always run into problems. I am not sure why I am getting this error. Any help regarding this would be greatly appreciated. Thanks!!
Disclaimer: I'm not an expert on pgsql. But I've successfully set up pg/rails in several versions and environments. Here's what I suggest.
Find the pg_hba.conf file. Since you are apparently using Ubuntu, try
cd /
find . -name pg_hba.conf -print 2> /dev/null
This will search your whole disk, which will take a while. But eventually it will provide the path. If it produces more than one, you'll have to resolve which one is correct.
If you know where PG is installed, cd there instead of root.
Now
Verify the auth method for user ubuntu is password or maybe md5. Here is the relevant docs page. If you're interested only in local development, you can change password to trust. But if this is for production, that's a bad idea.
While logged into the pg command line, run
ALTER USER ubuntu PASSWORD 'newpassword';
to ensure the password is what you think it is.
You should post database.yaml or ENV['DATABASE_URL'] settings. In general, database.yaml needs to match precisely what pg expects.
For example:
development:
adapter: postgresql
encoding: unicode
database: app_development
pool: 5
username: ubuntu
password: <your password>
allow_concurrency: true
Caveat: Don't commit production passwords to your repos or even dev passwords if you don't totally control the repo.

Removal of Role PostgreSQL Failed - cache lookup failed for database

This is my first time using PostgreSQL for production.
I made a database blog_production with username blog_production and generated password from gemfile capistrano-postgresql. Once it is generated, I tried to delete database blog_production with this command from terminal:
$ sudo -u postgres dropdb blog_production
After that I tried to delete user blog_production with this command:
$ sudo -u postgres droprole blog_production
And it returned dropuser: removal of role "blog_production" failed: ERROR: cache lookup failed for database 16417
1.) Why is this happening?
2.) I also tried to delete from psql using DELETE FROM pg_roles WHERE rolname='blog_production' but it returned the same error (cache lookup failed)
3.) How do I solve this problem?
Thank you.
Additional Information
PostgreSQL Version
PostgreSQL 9.1.15 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
(1 row)
select * from pg_shdepend where dbid = 16417;
dbid | classid | objid | objsubid | refclassid | refobjid | deptype
-------+---------+-------+----------+------------+----------+---------
16417 | 1259 | 16419 | 0 | 1260 | 16418 | o
16417 | 1259 | 16426 | 0 | 1260 | 16418 | o
16417 | 1259 | 16428 | 0 | 1260 | 16418 | o
(3 rows)
select * from pg_database where oid = 16417;
datname | datdba | encoding | datcollate | datctype | datistemplate | datallowconn | datconnlimit | datlastsysoid | datfrozenxid | dattablespace | datacl
---------+--------+----------+------------+----------+---------------+--------------+--------------+---------------+--------------+---------------+--------
(0 rows)
select * from pg_authid where rolname = 'blog_production'
rolname | rolsuper | rolinherit | rolcreaterole | rolcreatedb | rolcatupdate | rolcanlogin | rolreplication | rolconnlimit | rolpassword | rolvaliduntil
-----------------+----------+------------+---------------+-------------+--------------+-------------+----------------+--------------+-------------------------------------+---------------
blog_production | f | t | f | f | f | t | f | -1 | md5d4d2f8789ab11ba2bd019bab8be627e6 |
(1 row)
Somehow the DROP database; didn't drop the shared dependencies correctly. PostgreSQL still thinks that your user owns three tables in the database you dropped.
Needless to say this should not happen; it's almost certainly a bug, though I don't know how we'd even begin tracking it down unless you know exactly what commands you ran etc to get to this point, right from creating the DB.
If the PostgreSQL install's data isn't very big and if you can share the contents, can I get you to stop the database server and make a tarball of the whole database directory, then send it to me? I'd like to see if I can tell what happened to get you to this point.
Send a dropbox link to craig#2ndquadrant.com . Just:
sudo service postgresql stop
sudo tar cpjf ~abrahamks/abrahamks-postgres.tar.gz \
/var/lib/postgresql/9.1/main \
/etc/postgresql/9.1/main \
/var/log/postgresql/postgresql-9.1-main-*.
/usr/lib/postgresql/9.1
sudo chown abrahamks ~abrahamks/abrahamks-postgres.tar.gz
and upload abrahamks-postgres.tar.gz from your home folder.
Replace abrahamks with your username on your system. You might need to adjust the paths above if I'm misremembering where the PostgreSQL data lives on Debian-derived systems.
Note that this contains all your databases not just the one that was an issue, and it also contains your PostgreSQL user accounts.
(If you're going to send me a copy, do so before continuing):
Anyway, since the database is dropped, it is safe to manually remove the dependencies that should've been removed by DROP DATABASE:
DELETE FROM pg_shdepend WHERE dbid = 16417
It should then be possible to DROP USER blog_production;

Collation so ordering doesn't work properly with postresql

I am developing a rails app which has its content in Turkish. I am using Postgresql 9.2.2 as my database backend. Everything works fine (no weird character issues etc. ) except for proper ordering.
For example, when I try to list some items that are ordered by city they are in, I expect something like "Adana, Bursa, İstanbul, Giresun, Zonguldak ..".
Instead, I always get Turkish specific characters at the end/beginning of the list. (i.e."Adana, Bursa, Giresun, Zonguldak, İstanbul")
I have initialized my postgres db with command: initdb /usr/local/var/postgres -E utf8 --locale=tr_TR
when I \l in psql console I get the expected.
Name Owner Encoding Collate Ctype
----------------+-------------+----------+---------+-------+
app_development | app | UTF8 | tr_TR | tr_TR |
app_production | app | UTF8 | tr_TR | tr_TR |
app_test | app | UTF8 | tr_TR | tr_TR |
postgres | monkegjinni | UTF8 | tr_TR | tr_TR |
I also tried to create databases manually with LC_CTYPE="tr_TR.UTF-8" and LC_COLLATE="tr_TR.UTF-8", and again no progress.
Some information about my development environment:
Running Mountain Lion 10.8.2 with Macbook Pro 7.1
psql --version : 9.2.2
rails --version : 3.2.11
$ locale :
LANG="tr_TR.UTF-8"
LC_COLLATE="tr_TR.UTF-8"
LC_CTYPE="tr_TR.UTF-8"
LC_MESSAGES="tr_TR.UTF-8"
LC_MONETARY="tr_TR.UTF-8"
LC_NUMERIC="tr_TR.UTF-8"
LC_TIME="tr_TR.UTF-8"
LC_ALL=
How can I resolve this issue?
UTF-8 locales on OS X are broken. You can try to use a non-UTF-8 locale (tr_TR.ISO8859-9), but it appears that it is also broken in this respect. So it's not going to work.

Resources