{"id":2799,"date":"2021-04-08T11:48:44","date_gmt":"2021-04-08T11:48:44","guid":{"rendered":"https:\/\/ntspl.co.in\/blog\/?p=2799"},"modified":"2023-01-31T10:41:55","modified_gmt":"2023-01-31T10:41:55","slug":"preparing-postgresql-for-migration-with-database-migration-service","status":"publish","type":"post","link":"https:\/\/www.ntspl.co.in\/blog\/preparing-postgresql-for-migration-with-database-migration-service\/","title":{"rendered":"Preparing PostgreSQL for migration with Database Migration Service"},"content":{"rendered":"<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p><a href=\"https:\/\/cloud.google.com\/blog\/products\/databases\/database-migration-service-now-available-for-cloud-sql-and-more\">Last November<\/a>, we made relational database migration easier for MySQL users with our public preview of Database Migration Service (DMS). Today, we\u2019ve officially made the product generally available, and bring the same easy-to-use migration functionality to PostgreSQL.<\/p>\n<p>The thing I\u2019ve appreciated the most about diving deep with DMS has been that it just works. Once you get your source instance and database(s) prepared, and establish the connectivity between source and destination, doing the migration is all handled. When it&#8217;s all finished, cutting over to using your Cloud SQL instance as your application&#8217;s primary database is as simple as clicking a button in the DMS UI.<\/p>\n<p>Not to downplay the difficulty in database prep, or connectivity. I wrote a thorough <a href=\"https:\/\/cloud.google.com\/blog\/topics\/developers-practitioners\/database-migration-service-connectivity-technical-introspective\">blog post<\/a> walking through the various connectivity options for DMS in great detail. Network topology can be incredibly complicated, and connecting two machines securely through the internet while serving an application with thousands or millions of users is anything but simple.<\/p>\n<p>Today, I want to take a moment with you to cover preparing your source PostgreSQL instance and database(s) for migration using DMS and some gotchas I found so you don\u2019t have to.<\/p>\n<\/div>\n<\/div>\n<div class=\"block-image_full_width\">\n<div class=\"article-module h-c-page\">\n<div class=\"h-c-grid\">\n<figure class=\"article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 \"><img decoding=\"async\" src=\"https:\/\/storage.googleapis.com\/gweb-cloudblog-publish\/images\/docUI.max-1000x1000.png\" alt=\"docui\" \/><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p>I\u2019ll start by saying, the <a href=\"https:\/\/cloud.google.com\/database-migration\/docs\/postgres\/configure-source-database\">documentation<\/a> and in-product UI guidance are both excellent for DMS. If you\u2019re familiar with setting up replication infrastructure for PostgreSQL, you\u2019re probably good to jump in, and check in on the documentation if needed. Having said that, it\u2019s documentation, so here I\u2019ll try to add a bit so it\u2019s all in one place to get everything you need to prepare your source PostgreSQL instance and database(s).<\/p>\n<p>Step one, be sure your source instance version is supported. Current list of supported versions can be found on the documentation page I linked above.<\/p>\n<p>Next up is a schema piece: DMS doesn\u2019t support migrating tables that don\u2019t have a primary key. Starting a migration against a database that has tables without primary keys will still succeed, but it won\u2019t bring over the data from a table that\u2019s lacking a primary key, but the table will still be created. So if you want to bring the data over from a table that doesn\u2019t have a primary key, you have a few options:<\/p>\n<ol>\n<li>You\u2019ll have to add a primary key prior to starting the migration.<\/li>\n<li>You\u2019ll need to bring the data over yourself after the initial migration. Keeping in mind of course that if you bring the data over yourself, even if you maintain the connection, DMS won\u2019t replicate data for that table moving forward.<\/li>\n<li>You can export the table from the source instance and import it into the new instance.<\/li>\n<li>Finally, you can create a table with the same schema as the one you have that doesn\u2019t have the primary key, give it a primary key (should use a sequence generator to autogenerate the key) and copy the source data into it. Then do the migration. DMS as part of doing the migration will create the non-PK table, it just doesn\u2019t copy the data over. Then you can copy the data from the migrated primary key table, and finally delete the primary key table once you\u2019ve verified the data. It sounds complicated, but it ensures you\u2019re getting the same data at the point of migration as the rest of your data as long as you\u2019ve got any new rows inserted into the non-primary key table also going into the primary key copy. If you\u2019re worried about the data in that table changing during the migration, you can copy the data over right before promoting the destination instance to minimize that window.<\/li>\n<\/ol>\n<p>DMS relies on pglogical for the migration work. This means that the pglogical extension has to be installed on each of the databases you want to migrate. Instructions for installing pglogical on your instance and database(s) can be found <a href=\"https:\/\/cloud.google.com\/database-migration\/docs\/postgres\/configure-source-database#pglogical\">here<\/a>. If you\u2019re running on Linux, the repo\u2019s installation page is helpful. To be sure I took one for the team, I decided to see how bad it might be to migrate a PostgreSQL database installed with Homebrew from MacOS to Cloud SQL. Turns out, shockingly not too bad! Installing pglogical from source:<\/p>\n<p>1) Clone GitHub repo<\/p>\n<p>2) Run <code>make<\/code><\/p>\n<p>2a) Get compilation error because <code>postgres.h<\/code> not found<\/p>\n<p>3) Find where Homebrew installed Postgres, find include folder, add all include folders to <code>C_INCLUDE_PATH<\/code><\/p>\n<p>4) Run <code>make<\/code>\u00a0again, built!<\/p>\n<p>5) Run <code>sudo make install<\/code>\u00a0because pglogical documentation said I might need it (side note: don\u2019t pre-optimize!)<\/p>\n<p>5a) Fails without any good messages<\/p>\n<p>6) Run <code>make install<\/code><\/p>\n<p>7) Great success! Can\u2019t quite test success yet, since now the instance and database(s) have to be configured to use pglogical and replication.<\/p>\n<p>The next piece is pretty straightforward if you\u2019ve done replication in PostgreSQL before. There are some configuration variables on the instance you need to set in order for the replication to succeed. There are two main ways to change these values. You can either adjust them while the instance is running with the <code>ALTER SYSTEM SET &lt;variable&gt; TO &lt;value&gt;;<\/code>\u00a0calls, or you can change them in the configuration file, <code>posgresql.conf<\/code>. Either way, you\u2019ll need to restart the instance for the changes to take effect.<\/p>\n<p>If you want to change it in the configuration file, but don\u2019t know where it lives, generally it lives in the data directory for the database. If you only have the credentials to log in to your database but don\u2019t know where it lives, you can run <code>SHOW data_directory<\/code>\u00a0once connected to the database and it\u2019ll give you the location of the data directory.<\/p>\n<p>The variables you need to set are:<\/p>\n<p><code>wal_level = logical<\/code>\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 # Needs to be set to logical<\/p>\n<p><code>max_replication_slots = n<\/code>\u00a0 \u00a0 # Number varies, see <a href=\"https:\/\/cloud.google.com\/database-migration\/docs\/postgres\/configure-source-database#pglogical\">here<\/a> for details<\/p>\n<p><code>max_wal_senders = n<\/code>\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0# Should be max_replication_slots plust number of actively connected replicas.<\/p>\n<p><code>max_worker_processes = n<\/code>\u00a0 \u00a0 \u00a0 \u00a0# Should be set to how many databases that are being replicated<\/p>\n<p>shared_preload_libraries = <code>pglogical<\/code><\/p>\n<p>Note that the <code>shared_preload_libraries<\/code> variable is a comma delimited list. You need to be careful when you set it to check first to see if there are other libraries that are being preloaded to include them, otherwise you could unload required libraries by your setup and cause issues with the database.<\/p>\n<p>Once you\u2019ve restarted you can verify the changes by connecting and running <code>SHOW &lt;variable&gt;<\/code>\u00a0e.g. <code>SHOW wal_level<\/code>\u00a0should show <code>logical<\/code>.<\/p>\n<\/div>\n<\/div>\n<div class=\"block-image_full_width\">\n<div class=\"article-module h-c-page\">\n<div class=\"h-c-grid\">\n<figure class=\"article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 \"><img decoding=\"async\" src=\"https:\/\/storage.googleapis.com\/gweb-cloudblog-publish\/images\/Screen_Shot_2021-03-24_at_3.44.56_PM.max-1000x1000.png\" alt=\"postres\" \/><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"block-paragraph\">\n<div class=\"rich-text\">\n<p>Quick example time:<\/p>\n<p>Note that these numbers are for the DMS load only. If you already have these values set for other reasons, you need to take that into account. For example, if you\u2019ve got\u00a0 <code>max_worker_processes<\/code> set to 8 to handle higher parallel querying, then you may want to add more on top\u00a0 to accommodate the replication to avoid impacting performance.<\/p>\n<p>Case 1: You\u2019re just doing a migration and immediately promoting the Cloud SQL instance. There aren\u2019t any other replicas setup on the source, and you only have a single database you\u2019re migrating over. Then you\u2019d want to set the values to:<\/p>\n<p><code># Technically we only need 1 for Cloud SQL subscriber and the default is<\/code><\/p>\n<p><code># set to 10, so you could just leave it alone. This is just illustrating<\/code><\/p>\n<p><code># that you could set it lower without any issues<\/code><\/p>\n<p><b>max_replication_slots = 3\u00a0<\/b><\/p>\n<p><code># Equal to max_replication_slots + 1 because we\u2019ll only have one\u00a0<\/code><\/p>\n<p><code># replica connected to the source instance<\/code><\/p>\n<p><b>max_wal_senders = 4<\/b><\/p>\n<p><code># Technically we only need 1 here because we\u2019re only bringing over<\/code><\/p>\n<p><code># one database, but always a good practice to have one as a buffer<\/code><\/p>\n<p><code># just in case there\u2019s an issue so it doesn\u2019t rely on<\/code><\/p>\n<p><code># only the one processor.<\/code><\/p>\n<p><b>max_worker_processes = 2<\/b><\/p>\n<p>Case 2: You have a setup where your on prem local instance is already set up with 5 replication slots to handle other replication you have in place, and there are 4 databases you want to migrate to the Cloud, you would want to set the variables up like:<\/p>\n<p><code># 5 for existing subscribers + 4 for each of source databases since pglogical<\/code><\/p>\n<p><code># requires 1 slot for each database<\/code><\/p>\n<p><b>max_replication_slots = 9<\/b><\/p>\n<p><code># Equal to max_replication_slots + 6 because say we have 5 existing replicas,<\/code><\/p>\n<p><code># and we\u2019ll be adding one more replica for DMS doing the migration<\/code><\/p>\n<p><b>max_wal_senders = 15<\/b><\/p>\n<p><code># 4 databases we\u2019re migrating, plus and extra as a buffer just in case<\/code><\/p>\n<p><b>max_worker_processes = 5<\/b><\/p>\n<p>Once you have your variables all set, if you changed them in the config file, now\u2019s the time you need to restart your PostgreSQL instance.<\/p>\n<p>You can verify it worked by logging into the instance and running <code>CREATE EXTENSION pglogical<\/code>\u00a0on one of the databases you\u2019re planning on replicating over. As long as it works, you\u2019ll need to connect to every database you want to be replicating and run that command on each one. And while you\u2019re there on each database, you need to grant the user that you specified in the <code>Define a source<\/code>\u00a0step creating the migration certain privileges. These grants need to happen on each database you\u2019re replicating as well as the postgres database:<\/p>\n<p># on all schemas (aside from the information schema and schemas starting with &#8220;pg_&#8221;) on each database to migrate, including pglogical<\/p>\n<p><code>GRANT USAGE on SCHEMA &lt;SCHEMA&gt; to &lt;USER&gt;<\/code><\/p>\n<p># on all databases to get replication information from source databases.<\/p>\n<p><code>GRANT SELECT on ALL TABLES in SCHEMA pglogical to &lt;USER&gt;<\/code><\/p>\n<p># on all schemas (aside from the information schema and schemas starting with &#8220;pg_&#8221;) on each database to migrate, including pglogical<\/p>\n<p><code>GRANT SELECT on ALL TABLES in SCHEMA &lt;SCHEMA&gt; to &lt;USER&gt;<\/code><\/p>\n<p># on all schemas (aside from the information schema and schemas starting with &#8220;pg_&#8221;) on each database to migrate, including pglogical<\/p>\n<p><code>GRANT SELECT on ALL SEQUENCES in SCHEMA &lt;SCHEMA&gt; to &lt;USER&gt;<\/code><\/p>\n<p># We\u2019re not handling it in this blog post, but if you happen to be trying to replicate<\/p>\n<p># from RDS, it would be <code>GRANT rds_replication TO USER<\/code>.<\/p>\n<p><code>ALTER USER USER WITH REPLICATION<\/code><\/p>\n<p>If your source database is earlier than version 9.6, there\u2019s an extra step to follow because before that, PostgreSQL didn\u2019t have replication delay monitoring by default. This is needed because DMS uses this to be able to watch if replication lag becomes too high. I\u2019m not going to cover it in detail here since all versions before 9.6 are currently end of life, but if you need to do this, there\u2019s information on what you need to do <a href=\"https:\/\/cloud.google.com\/database-migration\/docs\/postgres\/configure-source-database#replication-delay\">here<\/a>.<\/p>\n<p>Congratulations! Your PostgreSQL instance and database(s) are fully configured and ready for DMS! Another nicety of DMS, is when you\u2019re all configured and ready to go, there\u2019s a connectivity\/configuration test in the UI that will tell you if everything is configured correctly or not before you hit the final \u201cdo it\u201d button.<\/p>\n<p>Remember I mentioned that I cover a lot of the nitty gritty details around connectivity between your source database and the Cloud SQL instance in the blog post I linked at the top of this post. It covers MySQL there, so I\u2019ll add a pitfall I ran into with PostgreSQL here before I leave you.<\/p>\n<p>Be sure to remember if you haven\u2019t already, to enable your database to listen and accept connections from non-localhost locations. Two pieces to this, one, you need to change the <code>listen_address<\/code>\u00a0variable in your <code>postgresql.conf<\/code> file. It defaults to localhost, which might work depending on how you\u2019re managing connection to the database from your application, but won\u2019t work for the migration. You also need to modify the <code>pg_hba.conf<\/code> file to grant your user for the migration access to your local database from the Cloud. If you don\u2019t do either of these, DMS is really good about giving you clear error messages from the PostgreSQL instance telling you that you messed up. Ask me how I know.<\/p>\n<p>And there we have it. Everything in one place to get you ready to go bringing your PostgreSQL database into Cloud SQL. If you have any questions, suggestions or complaints, please reach out to me on <a href=\"https:\/\/twitter.com\/GabeWeiss_\" target=\"_blank\" rel=\"noopener noreferrer\">Twitter<\/a>, my DMs are open! Thanks for reading.<\/p>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Last November, we made relational database migration easier for MySQL users with our public preview of Database Migration Service (DMS). Today, we\u2019ve officially made the product generally available, and bring the same easy-to-use migration functionality to PostgreSQL. The thing I\u2019ve appreciated the most about diving deep with DMS has been that it just works. Once [&hellip;]<\/p>\n","protected":false},"author":53,"featured_media":2916,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[474],"tags":[476,475],"class_list":["post-2799","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-database","tag-database-migration","tag-postgresql"],"acf":{"custom_meta_title":"Preparing PostgreSQL for migration with Database Migration Service.","meta_description":"Everything you need in one place to prepare your PostgreSQL instance and databases for migration using Google Cloud\u2019s Database Migration Service.","meta_keyword":"postgresql database migration, database migration service, google cloud's database migration service","other_meta_tag":"<meta property=og:locale content=\"en-IN\" \/>\r\n<meta property=og:type content=\"article\" \/>\r\n<meta property=og:title content=\"Preparing PostgreSQL for migration with Database Migration Service\"\/>\r\n<meta property=og:description content=\"Everything you need in one place to prepare your PostgreSQL instance and databases for migration using Google Cloud\u2019s Database Migration Service.\"\/>\r\n<meta property=\"og:image\" content=\"https:\/\/ntspl.co.in\/blog\/wp-content\/uploads\/2021\/04\/preparing-postgresql-for-migration-with-database-migration-service.jpg\"\/>\r\n<meta property=og:url content=\"https:\/\/www.ntspl.co.in\/blog\/preparing-postgresql-for-migration-with-database-migration-service\"\/>\r\n<meta property=og:site_name content=NTSPL \/>\r\n<meta name=\"twitter:site\" content=\"@NTSPL\">\r\n<meta name=twitter:card content=\"summary\" \/>\r\n<meta name=twitter:description content=\"Everything you need in one place to prepare your PostgreSQL instance and databases for migration using Google Cloud\u2019s Database Migration Service.\"\/>\r\n<meta name=twitter:title content=\"Preparing PostgreSQL for migration with Database Migration Service\"\/>"},"_links":{"self":[{"href":"https:\/\/www.ntspl.co.in\/blog\/wp-json\/wp\/v2\/posts\/2799"}],"collection":[{"href":"https:\/\/www.ntspl.co.in\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ntspl.co.in\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ntspl.co.in\/blog\/wp-json\/wp\/v2\/users\/53"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ntspl.co.in\/blog\/wp-json\/wp\/v2\/comments?post=2799"}],"version-history":[{"count":5,"href":"https:\/\/www.ntspl.co.in\/blog\/wp-json\/wp\/v2\/posts\/2799\/revisions"}],"predecessor-version":[{"id":6478,"href":"https:\/\/www.ntspl.co.in\/blog\/wp-json\/wp\/v2\/posts\/2799\/revisions\/6478"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.ntspl.co.in\/blog\/wp-json\/wp\/v2\/media\/2916"}],"wp:attachment":[{"href":"https:\/\/www.ntspl.co.in\/blog\/wp-json\/wp\/v2\/media?parent=2799"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ntspl.co.in\/blog\/wp-json\/wp\/v2\/categories?post=2799"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ntspl.co.in\/blog\/wp-json\/wp\/v2\/tags?post=2799"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}