Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Docker-Compose] [Breaking] Postgres 9.6 is EOL (11th Nov 2021) - Migrate to 14 Stable #16947

Merged
merged 4 commits into from Nov 18, 2021
Merged

Conversation

shleeable
Copy link
Contributor

@shleeable shleeable commented Nov 6, 2021

Upgrade path using docker-compose

For safety, Old PostgreSQL 9.6 data is kept in ./postgres/, and new PostgreSQL 14 data is stored in ./postgres14/.

Rollback: This PR will allow for a recovery path for those having problems migrating to PostgreSQL 14. They can quickly roll back the docker-compose.yml to PostgreSQL 9.6. - While also working with a backup file to cleanly migrate on a fresh directory so no changes are made to the original DB in case of emergency. (postgres14 can be deleted to restart the import process if required)

Please keep the existing postgres folder incase you need to roll back - Delete it after the upgrade is completed.

Part 1 - Backup DB from PostgreSQL 9.6

# Shutdown app containers (keeping db online)
$ docker-compose stop web streaming sidekiq

# Create backup of mastodon db to local file
$ docker-compose exec db pg_dumpall -U postgres > 9.6.backup

# Shutdown db container
$ docker-compose down --remove-orphans

Part 2 - Import DB to PostgreSQL 14.x

# Checkout for Mastodon 3.5.x
$ git fetch && git checkout v3.5.x

# Boot the new DB - PostgreSQL 14.0
$ docker-compose up -d db

# Import the backup
$ cat 9.6.backup | docker-compose exec -T db psql -U postgres

Part 3 - Confirm import of DB has no serious errors

$ docker-compose logs db and confirm there are not any errors in the import stage. Below is a list of known errors, and their resolution.

Error 1

db_1 | ERROR: invalid input syntax for type <something>

These errors are bad and need to be investigated.

Running vacuumdb has apparently resolved these issues. Please read the optional command as an alternative to step 1.

Error 2

db_1 | LOG: checkpoints are occurring too frequently (20 seconds apart)
db_1 | HINT: Consider increasing the configuration parameter "max_wal_size".

These errors can be ignored.

Error 3

db_1 | ERROR: canceling autovacuum task
db_1 | CONTEXT: while scanning block 307315 of relation "public.statuses"
db_1 | automatic vacuum of table "mastodon_production.public.statuses"

These errors can be ignored. https://pganalyze.com/docs/log-insights/autovacuum/A60

Part 4 - Upgrade Mastodon

  • Follow standard Upgrade notes.

Alternative Step 1 - Vacuum DB before backup

We recommend completing the vacuum on your 9.6 DB before dumping the backup file.

# Shutdown app containers (keeping db online)
$ docker-compose stop web streaming sidekiq

# Vacuum your db (untested command)
## Note: If you believe your 9.6 DB might be corrupted or experience errors with the importing stage, 
$ docker-compose exec db pg_dumpall -U postgres > 9.6.novacuum.backup
$ docker-compose exec db vacuumdb  -U postgres --all --full --analyze --verbose 
$ docker-compose exec db pg_dumpall -U postgres > 9.6.backup

# Shutdown db container
$ docker-compose down --remove-orphans

UPDATE:

We're become aware of a rare issue with the postgres backup dump causing segfaults when restoring. If you experience this issue... Please try this or similar

docker exec db bash -c 'pg_dumpall -U postgres > 9.6.backup'
docker cp db:/9.6.backup 9.6.backup
docker exec db bash -c 'rm 9.6.backup'

Ref: https://stackoverflow.com/questions/63934856/why-is-pg-restore-segfaulting-in-docker

@mashirozx
Copy link
Contributor

Some error logs here, hope them help:

,
PostgreSQL Database directory appears to contain a database; Skipping initialization,
,
2021-11-08 03:10:09.353 UTC [1] LOG:  starting PostgreSQL 14.0 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20210424) 10.3.1 20210424, 64-bit,
2021-11-08 03:10:09.353 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432,
2021-11-08 03:10:09.353 UTC [1] LOG:  listening on IPv6 address "::", port 5432,
2021-11-08 03:10:09.359 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432",
2021-11-08 03:10:09.369 UTC [21] LOG:  database system was shut down at 2021-11-08 03:09:59 UTC,
2021-11-08 03:10:09.377 UTC [1] LOG:  database system is ready to accept connections,
2021-11-08 03:10:57.632 UTC [43] ERROR:  role "postgres" already exists,
2021-11-08 03:10:57.632 UTC [43] STATEMENT:  CREATE ROLE postgres;,
2021-11-08 03:11:52.879 UTC [22] LOG:  checkpoints are occurring too frequently (24 seconds apart),
2021-11-08 03:11:52.879 UTC [22] HINT:  Consider increasing the configuration parameter "max_wal_size".,
2021-11-08 03:12:04.023 UTC [45] ERROR:  invalid input syntax for type boolean: "",
2021-11-08 03:12:04.023 UTC [45] CONTEXT:  COPY mentions, line 880367, column silent: "",
2021-11-08 03:12:04.023 UTC [45] STATEMENT:  COPY public.mentions (id, status_id, created_at, updated_at, account_id, silent) FROM stdin;,
2021-11-08 03:12:12.970 UTC [45] ERROR:  invalid input syntax for type bigint: "c565353",
2021-11-08 03:12:12.970 UTC [45] CONTEXT:  COPY preview_cards, line 534192, column id: "c565353",
2021-11-08 03:12:12.970 UTC [45] STATEMENT:  COPY public.preview_cards (id, url, title, description, image_file_name, image_content_type, image_file_size, image_updated_at, type, html, author_name, author_url, provider_name, provider_url, width, height, created_at, updated_at, embed_url, image_storage_schema_version, blurhash) FROM stdin;,
2021-11-08 03:12:17.027 UTC [22] LOG:  checkpoints are occurring too frequently (25 seconds apart),
2021-11-08 03:12:17.027 UTC [22] HINT:  Consider increasing the configuration parameter "max_wal_size".,
2021-11-08 03:12:19.727 UTC [45] ERROR:  invalid input syntax for type timestamp: "",
2021-11-08 03:12:19.727 UTC [45] CONTEXT:  COPY status_stats, line 974194, column updated_at: "",
2021-11-08 03:12:19.727 UTC [45] STATEMENT:  COPY public.status_stats (id, status_id, replies_count, reblogs_count, favourites_count, created_at, updated_at) FROM stdin;,
2021-11-08 03:12:33.652 UTC [22] LOG:  checkpoints are occurring too frequently (16 seconds apart),
2021-11-08 03:12:33.652 UTC [22] HINT:  Consider increasing the configuration parameter "max_wal_size".,
2021-11-08 03:12:51.660 UTC [22] LOG:  checkpoints are occurring too frequently (18 seconds apart),
2021-11-08 03:12:51.660 UTC [22] HINT:  Consider increasing the configuration parameter "max_wal_size".,
2021-11-08 03:13:11.506 UTC [22] LOG:  checkpoints are occurring too frequently (20 seconds apart),
2021-11-08 03:13:11.506 UTC [22] HINT:  Consider increasing the configuration parameter "max_wal_size".,
2021-11-08 03:13:27.982 UTC [22] LOG:  checkpoints are occurring too frequently (16 seconds apart),
2021-11-08 03:13:27.982 UTC [22] HINT:  Consider increasing the configuration parameter "max_wal_size".,
2021-11-08 03:13:49.128 UTC [74] ERROR:  canceling autovacuum task,
2021-11-08 03:13:49.128 UTC [74] CONTEXT:  while scanning block 62212 of relation "public.media_attachments",
	automatic vacuum of table "postgres.public.media_attachments",
2021-11-08 03:13:53.847 UTC [74] ERROR:  canceling autovacuum task,
2021-11-08 03:13:53.847 UTC [74] CONTEXT:  while scanning block 7912 of relation "public.mentions",
	automatic vacuum of table "postgres.public.mentions",
2021-11-08 03:13:55.768 UTC [74] ERROR:  canceling autovacuum task,
2021-11-08 03:13:55.768 UTC [74] CONTEXT:  while scanning block 2208 of relation "public.pghero_space_stats",
	automatic vacuum of table "postgres.public.pghero_space_stats",
2021-11-08 03:13:57.035 UTC [74] ERROR:  canceling autovacuum task,
2021-11-08 03:13:57.035 UTC [74] CONTEXT:  while scanning block 2117 of relation "public.preview_cards",
	automatic vacuum of table "postgres.public.preview_cards",
2021-11-08 03:14:00.123 UTC [94] ERROR:  canceling autovacuum task,
2021-11-08 03:14:00.123 UTC [94] CONTEXT:  while scanning block 39950 of relation "public.statuses",
	automatic vacuum of table "postgres.public.statuses",
2021-11-08 03:15:33.147 UTC [22] LOG:  checkpoints are occurring too frequently (24 seconds apart),
2021-11-08 03:15:33.147 UTC [22] HINT:  Consider increasing the configuration parameter "max_wal_size".,
2021-11-08 03:15:37.676 UTC [125] ERROR:  canceling autovacuum task,
2021-11-08 03:15:37.676 UTC [125] CONTEXT:  while scanning block 213998 of relation "public.statuses",
	automatic vacuum of table "postgres.public.statuses"

@mashirozx
Copy link
Contributor

mashirozx commented Nov 8, 2021

And maybe pg_dumpall can be better?

docker-compose exec -e PGPASSWORD=password db pg_dumpall -U postgres > dump.sql
psql -U postgres -f dump.sql postgres

@shleeable
Copy link
Contributor Author

And maybe pg_dumpall can be better?

docker-compose exec -e PGPASSWORD=password db pg_dumpall -U postgres > dump.sql
psql -U postgres -f dump.sql postgres

Yeah. I think dumpall is the better way.

@shleeable
Copy link
Contributor Author

@shleeable
Copy link
Contributor Author

shleeable commented Nov 8, 2021

@mashirozx

2021-11-08 03:11:52.879 UTC [22] LOG: checkpoints are occurring too frequently (24 seconds apart),
2021-11-08 03:11:52.879 UTC [22] HINT: Consider increasing the configuration parameter "max_wal_size".,

This is safe to ignore during the import.

2021-11-08 03:12:04.023 UTC [45] ERROR: invalid input syntax for type boolean: "",
2021-11-08 03:12:04.023 UTC [45] CONTEXT: COPY mentions, line 880367, column silent: "",
2021-11-08 03:12:04.023 UTC [45] STATEMENT: COPY public.mentions (id, status_id, created_at, updated_at, account_id, silent) FROM stdin;,

This is bad. please try again using my steps?

2021-11-08 03:12:12.970 UTC [45] ERROR: invalid input syntax for type bigint: "c565353",
2021-11-08 03:12:12.970 UTC [45] CONTEXT: COPY preview_cards, line 534192, column id: "c565353",
2021-11-08 03:12:12.970 UTC [45] STATEMENT: COPY public.preview_cards (id, url, title, description, image_file_name, image_content_type, image_file_size, image_updated_at, type, html, author_name, author_url, provider_name, provider_url, width, height, created_at, updated_at, embed_url, image_storage_schema_version, blurhash) FROM stdin;,

This is bad. please try again using my steps?

2021-11-08 03:12:19.727 UTC [45] ERROR: invalid input syntax for type timestamp: "",
2021-11-08 03:12:19.727 UTC [45] CONTEXT: COPY status_stats, line 974194, column updated_at: "",
2021-11-08 03:12:19.727 UTC [45] STATEMENT: COPY public.status_stats (id, status_id, replies_count, reblogs_count, favourites_count, created_at, updated_at) FROM stdin;,

This is bad. please try again using my steps?

2021-11-08 03:13:49.128 UTC [74] ERROR: canceling autovacuum task,
2021-11-08 03:13:49.128 UTC [74] CONTEXT: while scanning block 62212 of relation "public.media_attachments",
automatic vacuum of table "postgres.public.media_attachments",

This is safe to ignore during the import.
https://pganalyze.com/docs/log-insights/autovacuum/A60

@shleeable shleeable marked this pull request as ready for review November 8, 2021 13:12
@mashirozx
Copy link
Contributor

mashirozx commented Nov 9, 2021

Thanks, I'll try it later and give you feedback.

PS. I'm upgradeing from pg 12 to pg 14.


Just found there is a pg_upgrade, maybe we can have a try.

@shleeable
Copy link
Contributor Author

shleeable commented Nov 9, 2021

Thanks, I'll try it later and give you feedback.

PS. I'm upgradeing from pg 12 to pg 14.

Just found there is a pg_upgrade, maybe we can have a try.

Thanks for your assistance. The steps as described as confirmed as migrating successfully on my instance.

  • The pg_upgrade and pg_restore are tools build for migration but I don't think are required.

I'm happy to test and confirm using pg_upgrade if others have problems - but I'm currently confident in the current process.

@shleeable shleeable changed the title [Breaking] Postgres 9.6 is EOL (11th Nov 2021) - Migrate to 14 Stable [Docker-Compose] [Breaking] Postgres 9.6 is EOL (11th Nov 2021) - Migrate to 14 Stable Nov 14, 2021
@shleeable
Copy link
Contributor Author

shleeable commented Nov 18, 2021

aus.social is running Postgres 14 using those steps above... the migration took about 10-15 minutes to import the 11GB backup.

image

I'm happy with this process.

@Gargron Gargron merged commit ac8ad78 into mastodon:main Nov 18, 2021
@shleeable shleeable deleted the patch-4 branch November 19, 2021 01:07
@OccultWarlock
Copy link

Can these errors be ignored?

DETAIL: Key (account_id)=(128124) is not present in table "accounts".
setval

  2

(1 row)

ERROR: insert or update on table "login_activities" violates foreign key constraint "fk_rails_e4b6396b41"
DETAIL: Key (user_id)=(146) is not present in table "users".
setval

 50

(1 row)

ERROR: insert or update on table "markers" violates foreign key constraint "fk_rails_a7009bc2b6"
DETAIL: Key (user_id)=(57) is not present in table "users".
setval

 88

(1 row)

@OccultWarlock
Copy link

None of the toots or user accounts transferred over after merging the two databases

@mashirozx
Copy link
Contributor

I did the upgrade again and succeed without error this time. (Postgres 12 -> 14)

This time I did vacuum operation before dumping data:

docker-compose exec -e PGPASSWORD=password db vacuumdb -d postgres -U postgres -fzv

@shleeable
Copy link
Contributor Author

shleeable commented Nov 22, 2021

@mashirozx Great to hear (TBH I have no idea why vacuum should fix your problem but I'll put a comment in it guide that a vacuum might be required).

@OccultWarlock
Copy link

Using the command docker-compose exec db vacuumdb --all --full --analyze --verbose comes back with the following error vacuumdb: could not connect to database template1: FATAL: role "root" does not exist

@OccultWarlock
Copy link

Finally got it working properly! Thanks for all the help!

mimikun added a commit to mimikun/mastodon that referenced this pull request Nov 26, 2021
Gargron pushed a commit that referenced this pull request Dec 6, 2021
* Update .dockerignore

* Update .gitignore
hinaloe added a commit to hinaloe/mastodon that referenced this pull request Feb 2, 2022
…1) - Migrate to 14 Stable (mastodon#16947)"

temporary rollback pg version

This reverts commit ac8ad78.
jesseplusplus pushed a commit to jesseplusplus/decodon that referenced this pull request Feb 10, 2022
…rate to 14 Stable (mastodon#16947)

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml
jesseplusplus pushed a commit to jesseplusplus/decodon that referenced this pull request Feb 10, 2022
@Strubbl
Copy link

Strubbl commented Mar 31, 2022

Somehow the import of the old database does not work. I see no new line in docker-compose log

$ cat 9.6.backup | docker-compose exec -T db psql -U postgres
$ docker-compose exec db pg_dumpall -U postgres > 14.backup
$ ls -l 14.backup 9.6.backup 
-rw-r--r-- 1 strubbl strubbl 1.5K Mar 31 21:21 14.backup
-rw-r--r-- 1 strubbl strubbl  22M Mar 31 21:21 9.6.backup

I am missing my data in the postgres db. But why? (My workaround is to get a bash in that container and do the import command there. But it's so strange that the docker-compose command does not work)

Additionally i also get the following problem:

Using the command docker-compose exec db vacuumdb --all --full --analyze --verbose comes back with the following error vacuumdb: could not connect to database template1: FATAL: role "root" does not exist

What is the solution to this?

@ghost
Copy link

ghost commented Apr 8, 2022

Using the command docker-compose exec db vacuumdb --all --full --analyze --verbose comes back with the following error vacuumdb: could not connect to database template1: FATAL: role "root" does not exist

What is the solution to this?

You just need to specify what user vacummdb needs to run under by adding a -U postgres

@shleeable
Copy link
Contributor Author

I'll add that as a note on the top.

@wolfteeth
Copy link

The docker-compose psql import did not for me. With the -T option, literally nothing happens, it returns instantly. Without it, the process goes for a while, makes a lot of noise on screen, and eventually hangs. In this case the docker-compose logs show some syntax errors even after running vacuumdb.

What did work was running a bash prompt inside the container, as another user suggested. In case anyone else is struggling with the import after switching over to v3.5.x, this is what I did:

  1. Add the location of the backup as a volume under the db section in docker-compose.yml, for example:
    volumes:
      - ./postgres14:/var/lib/postgresql/data
      - .:/tmpvol # this mounts your mastodon directory as /tmpvol inside the container
  1. docker-compose up -d db
  2. docker-compose exec db bash
  3. In the bash shell: psql -U postgres -f /tmpvol/9.6.backup.

This is the most obnoxious and time-consuming that a Mastodon update has been for me in a while--would be nice if this migration could have been automated somehow, and maybe tested by more people. As usual what works for one person doesn't work for everyone. I was wondering if maybe it had to do with different docker-compose versions on different OS versions--my server is on Ubuntu 20.04.

@shleeable
Copy link
Contributor Author

Sorry for issues you've faced. I'm glad you got there in the end....

There are multiple ways to do this migration and I just wanted to offer one that I believed was pretty close to standard... as you said, I cant test all environments.

That said, I've added a second method closer to your solution to the bottom of the migration notes.

@wolfteeth
Copy link

Sorry for the terse tone last night, I was frustrated after a few hours of struggling. I do appreciate the effort you put in to document your own process. The whole reason I've stuck with the docker deployment for the life of my instance is because it seemed like less of a headache (avoiding breakage from the OS updating packages in particular). Obviously nothing can be done about it now, but I wonder if there's a way to improve these rare but painful upgrades in the future. I'd guess in general most of us docker deployments are the less experienced sysadmins.

@shleeable
Copy link
Contributor Author

Obviously nothing can be done about it now, but I wonder if there's a way to improve these rare but painful upgrades in the future. I'd guess in general most of us docker deployments are the less experienced sysadmins.

Luckily the Postgres 14 End-of-life is November 12, 2026.... so we have a while to hopefully get the next migration plans sorted out :)

@lightweight
Copy link

lightweight commented May 9, 2022

For what it's worth, I recently did an upgrade from 3.3.0 -> 3.5.1 including the PostgreSQL upgrade from 9.6 -> 14. I used the vacuum db approach described above. All went seemingly without errors, but I ended up with subtle errors on the data import (pre-migration) into pg14. The end result was that I lost all the posts prior to the upgrade (user accounts, etc. seemed fine). I've described the problem in more detail here: #18303

I still need to work out if it's possible for me to merge the missing posts (toots) back into the dataset somehow. Sigh. Anyone ever tried that or have any insight into whether it might work (and how)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants