Move data between PostgreSQL instances on GCP
A while ago I had to migrate data from one database to another on GCP. You can try using the export/import functionality described here.
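That approach boils down to exporting a database to a Cloud Storage bucket and importing the file into the other instance. A rough sketch with gcloud could look like this (the instance, bucket and database names below are made-up placeholders, not values from this post):
gcloud sql export sql source-instance gs://my-bucket/dump.sql \
--database=my_db
gcloud sql import sql destination-instance gs://my-bucket/dump.sql \
--database=my_db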
If you have to move data between different clouds, or you want to copy data to a database on localhost, you may want to use pg_dump and psql.
Running databases with Docker locally
If you want to test this on localhost, you can use Docker to run PostgreSQL. Let’s run two databases: one called db_source, where we will have our data, and another called db_destination, where we want to replicate what is in the source. While I’m using databases running locally, you can connect to a database running in a cloud. I like to play with my environment first, it makes me comfortable with what I’m doing, before I try it on a cloud instance and something breaks.
Run the source database.
docker run --rm \
--name db_source \
-e POSTGRES_USER=username \
-e POSTGRES_PASSWORD=pass \
-e POSTGRES_DB=db_source \
-d -p 5433:5432 \
postgres:latest
Run the destination database.
docker run --rm \
--name db_destination \
-e POSTGRES_USER=username2 \
-e POSTGRES_PASSWORD=pass2 \
-e POSTGRES_DB=db_destination \
-d -p 5434:5432 \
postgres:latest
When you run docker ps, you should see information about both databases.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
da36c2674992 postgres:latest "docker-entrypoint.s…" 4 seconds ago Up 2 seconds 0.0.0.0:5434->5432/tcp db_destination
ae5760af4329 postgres:latest "docker-entrypoint.s…" About a minute ago Up About a minute 0.0.0.0:5433->5432/tcp db_source
Create test data
Let’s create some data in db_source.
psql \
-h 0.0.0.0 \
-p 5433 \
-U username \
-d db_source \
-c "CREATE TABLE example(value INT);
INSERT INTO example(value)
VALUES (1),(2),(3),(4);"
You will be asked for a password; for db_source we set the password to pass
with the POSTGRES_PASSWORD environment variable.
You could pass (do you see what I did here?) it as an environment variable to psql, but it’s not recommended, as it would stay in your command history. For this demo we already have the password in the history, because we set it when creating the Docker container, so you may carry on with this approach. I will show you a faster and safer way later.
It would look like this.
PGPASSWORD=pass \
psql \
-h 0.0.0.0 \
-p 5433 \
-U username \
-d db_source \
-c "SELECT * FROM example;"
If you want to stay connected to the database to write more queries, you can run psql without the -c option.
psql \
-h 0.0.0.0 \
-p 5433 \
-U username \
-d db_source
Once you connect, run \dt to see your table.
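It should print something like this:
         List of relations
 Schema |  Name   | Type  |  Owner
--------+---------+-------+----------
 public | example | table | username
(1 row)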
Generate table backup
Now we would like to copy the data from the source to the destination. Let’s first see what pg_dump will generate. It will ask you for a password again.
pg_dump \
-h 0.0.0.0 \
-p 5433 \
-U username \
-t example \
db_source
You can see, as part of the output, the table and the data we inserted.
CREATE TABLE public.example (
value integer
);
ALTER TABLE public.example OWNER TO username;
--
-- Data for Name: example; Type: TABLE DATA; Schema: public; Owner: username
--
COPY public.example (value) FROM stdin;
1
2
3
4
\.
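One thing to keep in mind: the dump also contains an ALTER TABLE public.example OWNER TO username; statement. Our destination database uses a different user (username2), so that statement will fail with an error during the restore, while the table itself will still be created and owned by the restoring user. If you don’t want ownership statements in the dump at all, pg_dump has a --no-owner flag:
pg_dump \
-h 0.0.0.0 \
-p 5433 \
-U username \
-t example \
--no-owner \
db_source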
No more typing passwords
Typing the password every time is annoying, so let’s create a file with passwords for our databases. It’s called .pgpass (it lives in the home directory, so ~/.pgpass) and its format is host:port:dbname:username:password. For our databases it will look like this.
0.0.0.0:5433:db_source:username:pass
0.0.0.0:5434:db_destination:username2:pass2
For this file to be used, you have to change its permissions to 0600.
chmod 0600 ~/.pgpass
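To check that the file is picked up, connect again; psql should not ask for a password anymore. You can also add -w, which makes psql fail instead of prompting when no password can be found:
psql \
-h 0.0.0.0 \
-p 5433 \
-U username \
-d db_source \
-w \
-c "SELECT 1;"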
Restore in destination database
Now let’s get back to our main quest, this time without typing passwords all the time. You can redirect the pg_dump output to a file.
pg_dump \
-h 0.0.0.0 \
-p 5433 \
-U username \
-t example db_source \
> migration.sql
Now we have a file that we can import into our destination database.
psql \
-h 0.0.0.0 \
-p 5434 \
-U username2 \
-d db_destination \
-f migration.sql
Now you can query your database to see if the table was migrated correctly.
psql \
-h 0.0.0.0 \
-p 5434 \
-U username2 \
-d db_destination \
-c "SELECT * FROM example;"
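If everything went fine, you should see the same four rows we inserted into db_source:
 value
-------
     1
     2
     3
     4
(4 rows)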
That’s all good, but instead of creating a file you can pipe the pg_dump output straight to psql. Let’s first delete the existing table.
psql \
-h 0.0.0.0 \
-p 5434 \
-U username2 \
-d db_destination \
-c "DROP TABLE example;"
Now we can run the migration again.
pg_dump \
-U username \
-h 0.0.0.0 \
-p 5433 \
-t example \
db_source |\
psql \
-h 0.0.0.0 \
-p 5434 \
-U username2 \
-d db_destination
Check if you have the data in the destination database, the same query as before will do.
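psql \
-h 0.0.0.0 \
-p 5434 \
-U username2 \
-d db_destination \
-c "SELECT * FROM example;"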
When creating an index after the data doesn’t work
I had to run this for a table that was around 36 GB in size and had 220M rows. The generated dump first creates the table, then inserts all the data, and only after that creates constraints and indices. You can see it by creating a new table with a constraint.
psql \
-h 0.0.0.0 \
-p 5433 \
-U username \
-d db_source \
-c "CREATE TABLE example2(value INT UNIQUE);
INSERT INTO example2(value)
VALUES (1),(2),(3),(4);"
Run pg_dump for the new table.
pg_dump \
-U username \
-h 0.0.0.0 \
-p 5433 \
-t example2 \
db_source
You will see output containing:
CREATE TABLE public.example2 (
value integer
);
ALTER TABLE public.example2 OWNER TO username;
--
-- Data for Name: example2; Type: TABLE DATA; Schema: public; Owner: username
--
COPY public.example2 (value) FROM stdin;
1
2
3
4
\.
--
-- Name: example2 example2_value_key; Type: CONSTRAINT; Schema: public; Owner: username
--
ALTER TABLE ONLY public.example2
ADD CONSTRAINT example2_value_key UNIQUE (value);
It’s more efficient to create a table, insert the data without any checks, and create the indices once we have all the rows. It should be faster than validating every row while inserting the data. Because of the size of the table and the limits on my PostgreSQL instance, I was getting an exception when creating the index. My workaround was to migrate the schema first and then the data. To do that, you have to use --schema-only and --data-only, see the example below.
pg_dump \
-U username \
-h 0.0.0.0 \
-p 5433 \
-t example2 \
db_source \
--schema-only |\
psql \
-h 0.0.0.0 \
-p 5434 \
-U username2 \
-d db_destination
pg_dump \
-U username \
-h 0.0.0.0 \
-p 5433 \
-t example2 \
db_source \
--data-only |\
psql \
-h 0.0.0.0 \
-p 5434 \
-U username2 \
-d db_destination
You can check the output of both dumps by removing the psql part from the command, like we did at the beginning when we displayed the output in the console.
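For example, this prints only the schema part of the dump to the console:
pg_dump \
-U username \
-h 0.0.0.0 \
-p 5433 \
-t example2 \
db_source \
--schema-only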
I hope you found this useful, good luck with your data migration!