PostgreSQL: removal of role failed - "cache lookup failed for database" - ruby-on-rails

This is my first time using PostgreSQL for production.
I made a database blog_production with username blog_production and a password generated by the capistrano-postgresql gem. Once it was generated, I tried to delete the database blog_production with this command from the terminal:
$ sudo -u postgres dropdb blog_production
After that I tried to delete user blog_production with this command:
$ sudo -u postgres dropuser blog_production
And it returned:
dropuser: removal of role "blog_production" failed: ERROR: cache lookup failed for database 16417
1.) Why is this happening?
2.) I also tried to delete it from psql using DELETE FROM pg_roles WHERE rolname='blog_production', but it returned the same error (cache lookup failed)
3.) How do I solve this problem?
Thank you.
Additional Information
PostgreSQL Version
PostgreSQL 9.1.15 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
(1 row)
select * from pg_shdepend where dbid = 16417;
dbid | classid | objid | objsubid | refclassid | refobjid | deptype
-------+---------+-------+----------+------------+----------+---------
16417 | 1259 | 16419 | 0 | 1260 | 16418 | o
16417 | 1259 | 16426 | 0 | 1260 | 16418 | o
16417 | 1259 | 16428 | 0 | 1260 | 16418 | o
(3 rows)
select * from pg_database where oid = 16417;
datname | datdba | encoding | datcollate | datctype | datistemplate | datallowconn | datconnlimit | datlastsysoid | datfrozenxid | dattablespace | datacl
---------+--------+----------+------------+----------+---------------+--------------+--------------+---------------+--------------+---------------+--------
(0 rows)
select * from pg_authid where rolname = 'blog_production';
rolname | rolsuper | rolinherit | rolcreaterole | rolcreatedb | rolcatupdate | rolcanlogin | rolreplication | rolconnlimit | rolpassword | rolvaliduntil
-----------------+----------+------------+---------------+-------------+--------------+-------------+----------------+--------------+-------------------------------------+---------------
blog_production | f | t | f | f | f | t | f | -1 | md5d4d2f8789ab11ba2bd019bab8be627e6 |
(1 row)

Somehow the DROP DATABASE didn't drop the shared dependencies correctly. PostgreSQL still thinks that your user owns three tables in the database you dropped.
Needless to say this should not happen; it's almost certainly a bug, though I don't know how we'd even begin tracking it down unless you know exactly what commands you ran etc to get to this point, right from creating the DB.
If the PostgreSQL install's data isn't very big and if you can share the contents, can I get you to stop the database server and make a tarball of the whole database directory, then send it to me? I'd like to see if I can tell what happened to get you to this point.
Send a Dropbox link to craig@2ndquadrant.com. Just:
sudo service postgresql stop
sudo tar cpjf ~abrahamks/abrahamks-postgres.tar.gz \
  /var/lib/postgresql/9.1/main \
  /etc/postgresql/9.1/main \
  /var/log/postgresql/postgresql-9.1-main-* \
  /usr/lib/postgresql/9.1
sudo chown abrahamks ~abrahamks/abrahamks-postgres.tar.gz
and upload abrahamks-postgres.tar.gz from your home folder.
Replace abrahamks with your username on your system. You might need to adjust the paths above if I'm misremembering where the PostgreSQL data lives on Debian-derived systems.
Note that this contains all your databases not just the one that was an issue, and it also contains your PostgreSQL user accounts.
(If you're going to send me a copy, do so before continuing):
Anyway, since the database is dropped, it is safe to manually remove the dependencies that should've been removed by DROP DATABASE:
DELETE FROM pg_shdepend WHERE dbid = 16417;
It should then be possible to DROP USER blog_production;
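For example, a minimal cleanup session in psql as a superuser might look like this (a sketch; 16417 is the orphaned database's OID from the error message above):
-- Remove the dangling shared-dependency rows left behind for the dropped database:
DELETE FROM pg_shdepend WHERE dbid = 16417;
-- Verify nothing remains; this should return 0 rows:
SELECT * FROM pg_shdepend WHERE dbid = 16417;
-- The role can now be dropped normally:
DROP USER blog_production;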

Related

Neo4J: "Database in use: false" after loading dump + migrate

I'm running Neo4j 5.3.0 Community Edition and tried to load the dump from https://github.com/neo4j-graph-examples/twitch. I managed to load the dump and migrate the database.
My question: how do I use the "twitch" database?
cosh@osmingestor:~$ sudo neo4j-admin database info
Database name: neo4j
Database in use: true
Last committed transaction id:-1
Store needs recovery: true
Database name: system
Database in use: true
Last committed transaction id:-1
Store needs recovery: true
Database name: twitch
Database in use: false
Store format version: record-aligned-1.1
Store format introduced in: 5.0.0
Last committed transaction id:62742
Store needs recovery: false
show databases doesn't show it for my user "neo4j":
neo4j@neo4j> show databases;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| name | type | aliases | access | address | role | writer | requestedStatus | currentStatus | statusMessage | default | home | constituents |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "neo4j" | "standard" | [] | "read-write" | "localhost:7687" | "primary" | TRUE | "online" | "online" | "" | TRUE | TRUE | [] |
| "system" | "system" | [] | "read-write" | "localhost:7687" | "primary" | TRUE | "online" | "online" | "" | FALSE | FALSE | [] |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I tried to grant access to the database by executing:
neo4j@neo4j> grant all database PRIVILEGES ON DATABASES * TO neo4j;
Unsupported administration command: grant all database PRIVILEGES ON DATABASES * TO neo4j
Adding "neo4j" as the default admin did not help as well (was working though):
neo4j-admin dbms set-default-admin neo4j

Unable to use TimescaleDB in Rails test environment

I'm stuck using TimescaleDB in Rails - everything works fine in development, but in my test suite I cannot insert any data.
What I tried
A) Use SQL schema dump
This causes the original error message I saw. It does create parts of the TimescaleDB schema, but not all of it: I get a hypertable, but it's not working properly.
B) Use Ruby schema dump
This lets me insert into my table, but it's not a hypertable at all - the Ruby syntax loses everything related to TimescaleDB and hypertables.
C) Migrate test database directly
I tried avoiding the schema/structure dump and load with the following:
$ rails db:drop
Dropped database 'my_app_development'
Dropped database 'my_app_test'
$ RAILS_ENV=test rails db:create
Created database 'my_app_test'
$ RAILS_ENV=test rails db:migrate
== 20200517164444 EnableTimescaledbExtension: migrating =======================
-- enable_extension("timescaledb")
WARNING:
WELCOME TO
_____ _ _ ____________
|_ _(_) | | | _ \ ___ \
| | _ _ __ ___ ___ ___ ___ __ _| | ___| | | | |_/ /
| | | | _ ` _ \ / _ \/ __|/ __/ _` | |/ _ \ | | | ___ \
| | | | | | | | | __/\__ \ (_| (_| | | __/ |/ /| |_/ /
|_| |_|_| |_| |_|\___||___/\___\__,_|_|\___|___/ \____/
Running version 1.7.0
For more information on TimescaleDB, please visit the following links:
1. Getting started: https://docs.timescale.com/getting-started
2. API reference documentation: https://docs.timescale.com/api
3. How TimescaleDB is designed: https://docs.timescale.com/introduction/architecture
Note: TimescaleDB collects anonymous reports to better understand and assist our users.
For more information and how to disable, please see our docs https://docs.timescaledb.com/using-timescaledb/telemetry.
-> 0.2315s
== 20200517164444 EnableTimescaledbExtension: migrated (0.2316s) ==============
== 20200517165027 CreateAccounts: migrating ===================================
-- create_table(:accounts)
-> 0.0095s
== 20200517165027 CreateAccounts: migrated (0.0095s) ==========================
== 20200517165103 CreateMetrics: migrating ====================================
-- create_table(:metrics)
-> 0.0116s
== 20200517165103 CreateMetrics: migrated (0.0117s) ===========================
== 20200517170842 CreateEvents: migrating =====================================
-- create_table(:events)
-> 0.0072s
-- remove_column(:events, :id)
-> 0.0020s
-- execute("SELECT create_hypertable('events', 'time');\n")
-> 0.0047s
== 20200517170842 CreateEvents: migrated (0.0142s) ============================
pg_dump: warning: there are circular foreign-key constraints on this table:
pg_dump: hypertable
pg_dump: You might not be able to restore the dump without using --disable-triggers or temporarily dropping the constraints.
pg_dump: Consider using a full dump instead of a --data-only dump to avoid this problem.
pg_dump: warning: there are circular foreign-key constraints on this table:
pg_dump: chunk
pg_dump: You might not be able to restore the dump without using --disable-triggers or temporarily dropping the constraints.
pg_dump: Consider using a full dump instead of a --data-only dump to avoid this problem.
But when running the test suite it is the same as attempt A.
Running the tests afterwards actually prints this message a few times, which makes me think that Rails auto-magically uses structure.sql again to recreate the test DB:
psql:/home/axel/src/my_app/db/structure.sql:16: WARNING:
(the same TimescaleDB welcome banner and telemetry notice as in the migration output above)
Error message
$ rails test
Running via Spring preloader in process 107937
Run options: --seed 29840
# Running:
E
Error:
Api::EventsControllerTest#test_POST_event_data_-_new_metric:
DRb::DRbRemoteError: PG::FeatureNotSupported: ERROR: invalid INSERT on the root table of hypertable "events"
HINT: Make sure the TimescaleDB extension has been preloaded.
(ActiveRecord::StatementInvalid)
app/controllers/api/events_controller.rb:5:in `create'
test/controllers/api/events_controller_test.rb:9:in `block in <class:EventsControllerTest>'
rails test test/controllers/api/events_controller_test.rb:8
Finished in 0.215286s, 4.6450 runs/s, 0.0000 assertions/s.
1 runs, 0 assertions, 0 failures, 1 errors, 0 skips
I have the feeling it's related to how Rails creates the test database, using either schema.rb (for the default config.active_record.schema_format = :ruby) or structure.sql (for config.active_record.schema_format = :sql).
I already tried both the Ruby and the SQL schema format, and neither works - the development DB gets migrated correctly, but the test DB is not set up correctly.
In the two databases below (development and test) we can see that the only difference is that the test DB is missing the line: Child tables: _timescaledb_internal._hyper_1_1_chunk
Development DB
$ psql -d my_app_development
psql (12.2)
Type "help" for help.
my_app_development=# SHOW shared_preload_libraries;
shared_preload_libraries
--------------------------
timescaledb
(1 row)
my_app_development=# insert into events (metric_id, time, value) VALUES (1, NOW(), 22);
INSERT 0 1
my_app_development=# \d+ events
Table "public.events"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------+-----------------------------+-----------+----------+---------+---------+--------------+-------------
metric_id | bigint | | | | plain | |
time | timestamp without time zone | | not null | | plain | |
value | numeric | | | | main | |
Indexes:
"events_time_idx" btree ("time" DESC)
Triggers:
ts_insert_blocker BEFORE INSERT ON events FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Child tables: _timescaledb_internal._hyper_1_1_chunk
Access method: heap
Test DB
$ psql -d my_app_test
psql (12.2)
Type "help" for help.
my_app_test=# SHOW shared_preload_libraries;
shared_preload_libraries
--------------------------
timescaledb
(1 row)
my_app_test=# insert into events (metric_id, time, value) VALUES (1, NOW(), 22);
ERROR: invalid INSERT on the root table of hypertable "events"
HINT: Make sure the TimescaleDB extension has been preloaded.
my_app_test=# \d+ events
Table "public.events"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------+-----------------------------+-----------+----------+---------+---------+--------------+-------------
metric_id | bigint | | | | plain | |
time | timestamp without time zone | | not null | | plain | |
value | numeric | | | | main | |
Indexes:
"events_time_idx" btree ("time" DESC)
Triggers:
ts_insert_blocker BEFORE INSERT ON events FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Access method: heap
ActiveRecord with SQL schema
CREATE EXTENSION IF NOT EXISTS timescaledb WITH SCHEMA public;
SET default_tablespace = '';
SET default_table_access_method = heap;
CREATE TABLE public.events (
    metric_id bigint,
    "time" timestamp without time zone NOT NULL,
    value numeric
);
CREATE INDEX events_time_idx ON public.events USING btree ("time" DESC);
CREATE TRIGGER ts_insert_blocker BEFORE INSERT ON public.events FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker();
ActiveRecord with Ruby schema
ActiveRecord::Schema.define(version: 2020_05_17_170842) do
  # These are extensions that must be enabled in order to support this database
  enable_extension "plpgsql"
  enable_extension "timescaledb"

  create_table "events", id: false, force: :cascade do |t|
    t.bigint "metric_id"
    t.datetime "time", null: false
    t.decimal "value"
    t.index ["time"], name: "events_time_idx", order: :desc
  end
end
Note: this loses the ts_insert_blocker trigger and lets me insert into the events table, but it is not a hypertable anymore:
my_app_test=# \d+ events
Table "public.events"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------+-----------------------------+-----------+----------+---------+---------+--------------+-------------
metric_id | bigint | | | | plain | |
time | timestamp without time zone | | not null | | plain | |
value | numeric | | | | main | |
Indexes:
"events_time_idx" btree ("time" DESC)
Access method: heap
Additional information
Related question: Running an RSpec test suite against a TimescaleDB database with Rails 4.2 - The suggestions did not work for me and there is no accepted answer.
Version information:
Rails 6.0.3
Postgres 12.2
TimescaleDB 1.7.0
Edit 1
I added the following to my test/test_helper.rb, similar to the workaround mentioned by @cstabru:
def execute_create_hypertable(sql)
  ActiveRecord::Base.connection.execute(sql)
rescue ActiveRecord::StatementInvalid => e
  raise e unless e.message.include? 'is already a hypertable'
end

execute_create_hypertable <<~SQL
  SELECT create_hypertable('events', 'time');
SQL
But maybe we can use something like SELECT create_hypertable('hypertable_name', 'time_field', if_not_exists => TRUE) in an initializer instead of creating hypertables in DB migrations?
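For what it's worth, a minimal sketch of such an initializer might look like this (the file name is hypothetical, 'events' and 'time' stand in for your hypertable and time column, and you may need to guard against the table not existing yet, e.g. during db:create):
# config/initializers/timescaledb.rb (hypothetical)
Rails.application.config.after_initialize do
  # Idempotent: if_not_exists => TRUE makes this a no-op when 'events'
  # is already a hypertable.
  ActiveRecord::Base.connection.execute(<<~SQL)
    SELECT create_hypertable('events', 'time', if_not_exists => TRUE);
  SQL
end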
I ran into this as well. No matter which way I recreate the DB schema (SQL or Ruby format), the hypertable is not recreated, as the TimescaleDB internal schema data is not exported.
Note that when I restore using the SQL format, it copies across the ts_insert_blocker trigger, which indeed breaks inserts on the table with this error (I believe due to the trigger function not being available):
PG::FeatureNotSupported: ERROR: invalid INSERT on the root table of hypertable "hypertable_name"
HINT: Make sure the TimescaleDB extension has been preloaded.
To fix the underlying issue (with either the SQL or Ruby format) we can recreate the hypertable (and remove the trigger) manually via the following:
DROP TRIGGER IF EXISTS ts_insert_blocker ON events;
DROP TRIGGER
SELECT create_hypertable('hypertable_name', 'time_field', if_not_exists => TRUE);
....
(1 row)
Now manually check for the hypertable's existence, since https://github.com/timescale/timescaledb/pull/862:
SELECT * FROM timescaledb_information.hypertable;
I've added these DDL commands to my spec_helper.rb to ensure the test db uses an actual hypertable. I want to ensure the test db schema mirrors my production / staging setups.
config.before(:suite) do
  # ensure the hypertable_name hypertable is set up correctly
  ActiveRecord::Base.connection.execute(
    "DROP TRIGGER IF EXISTS ts_insert_blocker ON hypertable_name;"
  )
  ActiveRecord::Base.connection.execute(
    "SELECT create_hypertable('hypertable_name', 'time_field', if_not_exists => TRUE);"
  )

  has_hypertables_sql = "SELECT * FROM timescaledb_information.hypertable WHERE table_name = 'hypertable_name';"
  if ActiveRecord::Base.connection.execute(has_hypertables_sql).to_a.empty?
    raise "TimescaleDB missing hypertable on 'hypertable_name' table"
  end
end
If folks find this useful I can look at extracting it into a gem to help with schema restores for Rails environments: https://github.com/timescale/timescaledb/issues/1916
In case somebody uses the DDL commands from cstabru's answer in spec_helper.rb and gets the error PG::UndefinedTable: ERROR: Relation "timescaledb_information.hypertable" does not exist:
From TimescaleDB version 2.0 on, you have to use the plural timescaledb_information.hypertables, and the column name has changed too, so you now have to use hypertable_name instead of table_name.
config.before(:suite) do
  # ensure the hypertable_name hypertable is set up correctly
  ActiveRecord::Base.connection.execute(
    "DROP TRIGGER IF EXISTS ts_insert_blocker ON hypertable_name;"
  )
  ActiveRecord::Base.connection.execute(
    "SELECT create_hypertable('hypertable_name', 'time_field', if_not_exists => TRUE);"
  )

  has_hypertables_sql = "SELECT * FROM timescaledb_information.hypertables WHERE hypertable_name = 'hypertable_name';"
  if ActiveRecord::Base.connection.execute(has_hypertables_sql).to_a.empty?
    raise "TimescaleDB missing hypertable on 'hypertable_name' table"
  end
end

Configuring PostgreSQL in Rails

I am working on a project with a friend. I cloned the application from Bitbucket. Everything was fine except PostgreSQL (v9.3.7). It keeps giving me the following message:
psql: FATAL: password authentication failed for user "ubuntu"
FATAL: password authentication failed for user "ubuntu"
I have created a superuser as well as all the databases. The designated users and the list of all the databases are given below.
ubuntu=# \du
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------+-----------
divjot | Superuser | {}
postgres | Superuser, Create role, Create DB, Replication | {}
ubuntu | Superuser, Create role, Create DB | {}
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------------+----------+-----------+---------+-------+-----------------------
app_development | ubuntu | SQL_ASCII | C | C |
app_production | ubuntu | SQL_ASCII | C | C |
app_test | ubuntu | SQL_ASCII | C | C |
postgres | postgres | SQL_ASCII | C | C |
template0 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
ubuntu | ubuntu | SQL_ASCII | C | C |
I have always struggled with PostgreSQL configuration in Rails. I follow the documentation completely; however, every time I try to clone an application or move the code, I run into problems. I am not sure why I am getting this error. Any help regarding this would be greatly appreciated. Thanks!!
Disclaimer: I'm not an expert on pgsql. But I've successfully set up pg/rails in several versions and environments. Here's what I suggest.
Find the pg_hba.conf file. Since you are apparently using Ubuntu, try
cd /
find . -name pg_hba.conf -print 2> /dev/null
This will search your whole disk, which will take a while. But eventually it will provide the path. If it produces more than one, you'll have to resolve which one is correct.
If you know where PG is installed, cd there instead of root.
Now
Verify the auth method for user ubuntu is password or maybe md5. Here is the relevant docs page. If you're interested only in local development, you can change password to trust. But if this is for production, that's a bad idea.
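For reference, password-based entries in pg_hba.conf look roughly like this (illustrative values, not your exact file):
# TYPE  DATABASE  USER    ADDRESS        METHOD
local   all       ubuntu                 md5
host    all       ubuntu  127.0.0.1/32   md5
After editing pg_hba.conf, reload PostgreSQL (e.g. sudo service postgresql reload) for the change to take effect.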
While logged into the pg command line, run
ALTER USER ubuntu PASSWORD 'newpassword';
to ensure the password is what you think it is.
You should post your database.yml or ENV['DATABASE_URL'] settings. In general, database.yml needs to match precisely what PG expects.
For example:
development:
  adapter: postgresql
  encoding: unicode
  database: app_development
  pool: 5
  username: ubuntu
  password: <your password>
  allow_concurrency: true
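As a quick sanity check, you can try the same credentials outside Rails (a sketch; adjust host and database to match your database.yml):
psql -h localhost -U ubuntu -d app_development
If this fails with the same FATAL error, the problem is in pg_hba.conf or the stored password, not in Rails.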
Caveat: Don't commit production passwords to your repos or even dev passwords if you don't totally control the repo.

Cypher load CSV eager and long action duration

I'm loading a file with 85K lines (19 MB).
The server has 2 cores and 14GB RAM, running CentOS 7.1 and Oracle JDK 8,
and the load can take 5-10 minutes with the following server config:
dbms.pagecache.memory=8g
cypher_parser_version=2.0
wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096
disk mounted in /etc/fstab:
UUID=fc21456b-afab-4ff0-9ead-fdb31c14151a /mnt/neodata
ext4 defaults,noatime,barrier=0 1 2
added this to /etc/security/limits.conf:
* soft memlock unlimited
* hard memlock unlimited
* soft nofile 40000
* hard nofile 40000
added this to /etc/pam.d/su
session required pam_limits.so
added this to /etc/sysctl.conf:
vm.dirty_background_ratio = 50
vm.dirty_ratio = 80
disabled journal by running:
sudo e2fsck /dev/sdc1
sudo tune2fs /dev/sdc1
sudo tune2fs -o journal_data_writeback /dev/sdc1
sudo tune2fs -O ^has_journal /dev/sdc1
sudo e2fsck -f /dev/sdc1
sudo dumpe2fs /dev/sdc1
Besides that, when running a profiler, I get lots of "Eagers", and I really can't understand why:
PROFILE LOAD CSV WITH HEADERS FROM 'file:///home/csv10.csv' AS line
FIELDTERMINATOR '|'
WITH line limit 0
MERGE (session :Session { wz_session:line.wz_session })
MERGE (page :Page { page_key:line.domain+line.page })
ON CREATE SET page.name=line.page, page.domain=line.domain,
page.protocol=line.protocol,page.file=line.file
Compiler CYPHER 2.3
Planner RULE
Runtime INTERPRETED
+---------------+------+---------+---------------------+--------------------------------------------------------+
| Operator | Rows | DB Hits | Identifiers | Other |
+---------------+------+---------+---------------------+--------------------------------------------------------+
| +EmptyResult | 0 | 0 | | |
| | +------+---------+---------------------+--------------------------------------------------------+
| +UpdateGraph | 9 | 9 | line, page, session | MergeNode; Add(line.domain,line.page); :Page(page_key) |
| | +------+---------+---------------------+--------------------------------------------------------+
| +Eager | 9 | 0 | line, session | |
| | +------+---------+---------------------+--------------------------------------------------------+
| +UpdateGraph | 9 | 9 | line, session | MergeNode; line.wz_session; :Session(wz_session) |
| | +------+---------+---------------------+--------------------------------------------------------+
| +ColumnFilter | 9 | 0 | line | keep columns line |
| | +------+---------+---------------------+--------------------------------------------------------+
| +Filter | 9 | 0 | anon[181], line | anon[181] |
| | +------+---------+---------------------+--------------------------------------------------------+
| +Extract | 9 | 0 | anon[181], line | anon[181] |
| | +------+---------+---------------------+--------------------------------------------------------+
| +LoadCSV | 9 | 0 | line | |
+---------------+------+---------+---------------------+--------------------------------------------------------+
All the labels and properties have indices / constraints.
Thanks for the help,
Lior
Hey Lior,
we tried to explain the Eager loading before. Mark's original blog post is here: http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/
And Rik tried to explain it in easier terms:
http://blog.bruggen.com/2015/07/loading-belgian-corporate-registry-into_20.html
Trying to understand the "Eager Operation"
I had read about this before, but did not really understand it until Andres explained it to me again: in all normal operations, Cypher loads data lazily. See for example this page in the manual - it basically just loads as little as possible into memory when doing an operation. This laziness is usually a really good thing. But it can get you into a lot of trouble as well - as Michael explained it to me:
"Cypher tries to honor the contract that the different operations
within a statement are not affecting each other. Otherwise you might
end up with non-deterministic behavior or endless loops. Imagine a
statement like this:
MATCH (n:Foo) WHERE n.value > 100 CREATE (m:Foo {value: n.value + 100});
If the two statements would not be
isolated, then each node the CREATE generates would cause the MATCH to
match again etc. an endless loop. That's why in such cases, Cypher
eagerly runs all MATCH statements to exhaustion so that all the
intermediate results are accumulated and kept (in memory).
Usually
with most operations that's not an issue as we mostly match only a few
hundred thousand elements max.
With data imports using LOAD CSV,
however, this operation will pull in ALL the rows of the CSV (which
might be millions), execute all operations eagerly (which might be
millions of creates/merges/matches) and also keeps the intermediate
results in memory to feed the next operations in line.
This also
disables PERIODIC COMMIT effectively because when we get to the end of
the statement execution all create operations will already have
happened and the gigantic tx-state has accumulated."
So that's what's going on in my LOAD CSV queries. MATCH/MERGE/CREATE caused an eager pipe to be added to the execution plan, and it effectively disables the batching of my operations "using periodic commit". Apparently quite a few users run into this issue even with seemingly simple LOAD CSV statements. Very often you can avoid it, but sometimes you can't.
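The usual workaround, described in Mark's post above, is to split the statement into one LOAD CSV pass per MERGE, so each pass touches only a single pattern and the planner no longer needs an Eager step (and PERIODIC COMMIT batching works again). A sketch against the query above, assuming the same file and headers:
// Pass 1: merge the sessions only
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///home/csv10.csv' AS line FIELDTERMINATOR '|'
MERGE (session:Session { wz_session: line.wz_session });

// Pass 2: merge the pages only
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///home/csv10.csv' AS line FIELDTERMINATOR '|'
MERGE (page:Page { page_key: line.domain + line.page })
ON CREATE SET page.name = line.page, page.domain = line.domain,
              page.protocol = line.protocol, page.file = line.file;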

Rails mysql2 migration breaks utf8 strings

I'm migrating an old MySQL database to a new one, so I wrote a script that connects to the old database, reads it, and creates new records in the new database. Here it is:
old_db = Mysql2::Client.new(host: options[:db_host],
                            username: options[:db_user],
                            password: options[:db_password],
                            database: options[:db_name],
                            encoding: 'utf8')

old_categories = old_db.query('select id, title from catalog__category order by lvl asc')
old_categories.each do |old_c|
  c = Catalog::Category.new
  c.name = old_c["title"]
  c.save!
end
However, after the migration the category names appeared in really bad shape.
Both databases are encoded in utf8, and both client and server set the utf8 encoding:
mysql> show variables like "%character%";
+--------------------------+------------------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/Cellar/mysql/5.5.27/share/mysql/charsets/ |
The PHP project uses the old database and shows all strings correctly; the Rails project uses the new database and shows everything correctly except the imported category strings.
Does anyone know where the problem is and how to fix it?
Thank you.
The root problem was bad encoding in the database itself (it was changed from latin1 to utf8 on the fly some time ago). As a result, strings were encoded twice in the dump. This command helped to generate a correct dump:
mysqldump --default-character-set=latin1 --skip-set-charset -u <user> -p > b.sql
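For context, the full round-trip might look like this (a sketch; database and user names are placeholders). The key idea is to dump the double-encoded bytes as latin1 without emitting a SET NAMES header, then load them back over a utf8 connection so they are interpreted exactly once:
# Dump the double-encoded data as latin1, with no SET NAMES in the dump:
mysqldump --default-character-set=latin1 --skip-set-charset -u olduser -p old_db > b.sql
# Restore it as utf8 so the bytes land in the new database correctly:
mysql --default-character-set=utf8 -u newuser -p new_db < b.sql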
