{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "e071e17e-7680-4441-9ea8-9a79d9dfea2b", "metadata": {}, "outputs": [], "source": [ "%load_ext sql\n", "import json\n", "import os\n", "import socket\n", "import cx_Oracle\n", "if os.name == 'nt':\n", " cx_Oracle.init_oracle_client(\"C:\\\\app\\\\instantclient_21_11\")" ] }, { "cell_type": "code", "execution_count": 2, "id": "65b611a9-b7d4-4967-8928-61b3984bfb99", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "" ] }, { "cell_type": "markdown", "id": "b0d7631c-7226-4a76-81b8-c94b94090e9a", "metadata": {}, "source": [ "# Database migration with the Liberatii Data Platform\n", "\n", "The Liberatii Data Platform virtualizes a PostgreSQL database to enable Oracle applications to be migrated to PostgreSQL *without modification*. This notebook contains a tutorial and demonstration of the deployment and use of the Liberatii Data Platform to migrate and test an example Oracle database.\n", "\n", "## Architecture\n", "\n", "The Liberatii Data Platform uses the Liberatii Gateway to virtualize Azure Databases. This allows\n", "applications built for Oracle to use these databases with **no code changes** and\n", "**no new tests** required.\n", "\n", "\"Diagram\n", "\n", "### Change-Data-Capture and Parallel Test Execution\n", "\n", "Production workloads can be tested against up-to-date Azure replicas of the Oracle Database\n", "virtualised through Liberatii Gateway using the Change-Data-Capture (CDC) and Parallel Test Execution\n", "features of the Liberatii Data Platform.\n", "\n", "\"Diagram\n", "\n", "This technique provides full confidence for a successful migration **without additional tests**\n", "before any applications are re-targeted to the Liberatii Gateway.\n", "\n", "## Demonstration\n", "\n", "This notebook uses the\n", "[Oracle H.R. 
schema](https://docs.oracle.com/cd/E11882_01/server.112/e40540/tablecls.htm#CNCPT111)\n", "to demonstrate the migration of the data and execution of existing Oracle PL/SQL code via\n", "the Liberatii Gateway on a PostgreSQL database.\n", "\n", "The H.R. schema provides a basis for queries that use several Oracle-specific features that\n", "are virtualized through the Liberatii Gateway:\n", "\n", "\"Diagram\n", "\n", "## Migrating an application\n", "\n", "This demonstration will migrate the schema and corresponding SQL with the Liberatii Data Platform through the following steps:\n", "\n", "1. **Deployment**
\n", " The deployment of an Azure Managed Application to provide the Liberatii Data Platform\n", " \n", "2. **Migration**
\n", " The migration of the schema and data from the Oracle database to PostgreSQL via the Liberatii Gateway\n", " \n", "3. **Synchronisation**
\n", " Setup of a Change Data Capture pipeline to synchronize the Oracle and PostgreSQL databases\n", " \n", "4. **Replay testing**
\n", " Testing of an Oracle Workload Replay for **performance** and **correctness** against both the Oracle and PostgreSQL databases to verify proper\n", " operation\n", " \n", "5. **Switch over**
\n", " Replacing the database driver and retargeting the application to use the Liberatii Gateway\n", "\n", "# Set up\n", "\n", "The following cell contains the connection information for the Oracle (Source) and PostgreSQL (Target) databases. It is initially set up to migrate the Oracle demo H.R. schema and expects the given users to exist with the required permissions on the specified databases." ] }, { "cell_type": "code", "execution_count": 3, "id": "8bc3f737-fdd4-4278-baa3-0f8a15396ef0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Connection settings:\n", "\n", " PostgreSQL: \n", " postgresql://HR:hr@postgres:5432/pdborcl\n", " Oracle: \n", " oracle://HR:hr@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=oracle)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=pdborcl)))\n", " Liberatii Data Platform:\n", " http://192.168.1.4:3000/api\n", " LGW:\n", " postgresql://HR:hr@pgtranslator:15432/pdborcl\n", "\n" ] } ], "source": [ "## Hostnames of the databases\n", "PG_HOST='demo-g3rqzbqudztzk-pgsql.postgres.database.azure.com'\n", "ORACLE_HOST='192.168.1.6'\n", "LGW_HOST='192.168.1.4'\n", "\n", "## Connection data\n", "DB='pdborcl'\n", "USER='HR'\n", "PSWD='hr'\n", "ORACLE_PORT=1521\n", "PG_PORT=5432\n", "LGW_PORT=5432\n", "PLATFORM=LGW_HOST\n", "\n", "## Connection strings (derived from the above connection data)\n", "ORACLE_CONN_STR=f'(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST={ORACLE_HOST})(PORT={ORACLE_PORT}))(CONNECT_DATA=(SERVICE_NAME={DB})))'\n", "PG=f'postgresql://{USER}:{PSWD}@{PG_HOST}:{PG_PORT}/{DB}'\n", "ORACLE=f'oracle://{USER}:{PSWD}@{ORACLE_CONN_STR}'\n", "LGW=f'postgresql://{USER}:{PSWD}@{LGW_HOST}:{LGW_PORT}/{DB}'\n", "API_PREFIX=f\"http://{PLATFORM}:3000\"\n", "\n", "print(f\"\"\"\n", "Connection settings:\n", "\n", " PostgreSQL: \n", " {PG}\n", " Oracle: \n", " {ORACLE}\n", " Liberatii Data Platform:\n", " {API_PREFIX}/api\n", " LGW:\n", " {LGW}\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": 4, "id": 
"a6c806c1-caca-48d7-9ce8-5c5be5db1cfe", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 rows affected.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
banner
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
" ], "text/plain": [ "[('Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production',)]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql {ORACLE}\n", "-- Find version information for the Oracle database\n", "select banner FROM v$version" ] }, { "cell_type": "code", "execution_count": 5, "id": "0fcba405-657c-4509-b96c-7d277300dcac", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 rows affected.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
version
PostgreSQL 14.9 (Debian 14.9-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
" ], "text/plain": [ "[('PostgreSQL 14.9 (Debian 14.9-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit',)]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql {PG}\n", "-- Find version information for the PostgreSQL database\n", "select version()" ] }, { "cell_type": "markdown", "id": "dc1b56d4-58c8-4d36-b1c7-1da0dd674c83", "metadata": {}, "source": [ "On the LGW connection we run an Oracle SQL query, but it returns the PostgreSQL version:" ] }, { "cell_type": "markdown", "id": "bcbc8b7d-5d6f-4649-91e6-a43e8194190c", "metadata": {}, "source": [ "## Test\n", "\n", "With the Liberatii Gateway deployed (or otherwise available) we can now test the connection.\n", "\n", "The following command uses Oracle SQL which will be translated by the virtualisation layer to access the PostgreSQL database." ] }, { "cell_type": "code", "execution_count": 9, "id": "10e8aea3-2ca4-486c-94bc-6ec783948923", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 rows affected.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
BANNER
PostgreSQL 14.9 (Debian 14.9-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
" ], "text/plain": [ "[('PostgreSQL 14.9 (Debian 14.9-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit',)]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql {LGW}\n", "-- Find version information for the PostgreSQL database via Liberatii Gateway\n", "select banner FROM v$version" ] }, { "cell_type": "markdown", "id": "ece0383c-6300-4406-b7ef-36124fdd8201", "metadata": {}, "source": [ "# Migration\n", "\n", "The following section shows the migration of the schema and data from the Oracle database to PostgreSQL. This uses the Liberatii Data Platform to perform the following steps:\n", "\n", "1. Configuration, connection setup and initialisation\n", "2. Schema Migration\n", "3. Data Migration\n", "4. Testing and verification\n", "\n", "All of these steps **can be performed in parallel** across multiple threads and connections to cope with large volumes of data.\n", "\n", "## Configuration\n", "\n", "The following cells will construct the required connection information.\n", "
Each connection (Oracle, PostgreSQL and the Liberatii Gateway) will be created in turn.\n", "\n", "### Oracle" ] }, { "cell_type": "code", "execution_count": 10, "id": "9ae56db7-8eac-4ace-a5e5-d4d4a577043e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"type\":\"Oracle\",\"connectionString\":\"(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=oracle)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=pdborcl)))\",\"user\":\"HR\",\"password\":\"hr\",\"id\":1}" ] } ], "source": [ "!curl -s {API_PREFIX}/connection -H \"Content-Type: application/json\" \\\n", " -d \"{{\\\"type\\\":\\\"Oracle\\\",\\\"connectionString\\\":\\\"{ORACLE_CONN_STR}\\\",\\\"user\\\":\\\"{USER}\\\",\\\"password\\\":\\\"{PSWD}\\\",\\\"id\\\":1}}\"" ] }, { "cell_type": "markdown", "id": "8db62a11-5165-4f8b-a217-10b7bcee4fa9", "metadata": {}, "source": [ "### PostgreSQL" ] }, { "cell_type": "code", "execution_count": 11, "id": "18de9543-edc4-45ae-9569-8232022b910f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"type\":\"PostgreSQL\",\"host\":\"postgres\",\"port\":5432,\"database\":\"pdborcl\",\"user\":\"HR\",\"password\":\"hr\",\"id\":2}" ] } ], "source": [ "!curl -s {API_PREFIX}/connection -H \"Content-Type: application/json\" \\\n", " -d \"{{\\\"type\\\":\\\"PostgreSQL\\\",\\\"host\\\":\\\"{PG_HOST}\\\",\\\"port\\\":5432,\\\"database\\\":\\\"{DB}\\\",\\\"user\\\":\\\"{USER}\\\",\\\"password\\\":\\\"{PSWD}\\\",\\\"id\\\":2}}\"" ] }, { "cell_type": "markdown", "id": "a6145dd8-2462-4877-b964-2ce9e05d3e20", "metadata": {}, "source": [ "### Liberatii Gateway:" ] }, { "cell_type": "code", "execution_count": 12, "id": "f7cea7be-cc42-456d-80fd-efd7cbda1048", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"type\":\"LGW\",\"host\":\"pgtranslator\",\"port\":15432,\"database\":\"pdborcl\",\"user\":\"HR\",\"password\":\"hr\",\"id\":3}" ] } ], "source": [ "!curl -s {API_PREFIX}/connection -H \"Content-Type: 
application/json\" \\\n", " -d \"{{\\\"type\\\":\\\"LGW\\\",\\\"host\\\":\\\"{LGW_HOST}\\\",\\\"port\\\":{LGW_PORT},\\\"database\\\":\\\"{DB}\\\",\\\"user\\\":\\\"{USER}\\\",\\\"password\\\":\\\"{PSWD}\\\",\\\"id\\\":3}}\"" ] }, { "cell_type": "markdown", "id": "c9264ec2-a546-4359-ab2e-c8e63cf5aaad", "metadata": {}, "source": [ "### Configuration Parameters\n", "\n", "The following configuration parameters are used for this demonstration. Please check the reference document to get all the available parameters.\n", "\n", "* `dataOnePass`
\n", " Don't use staging files or tables; transfer all the data directly from the source to the target\n", "* `user`
\n", " List of schemas to transfer from the source to the target\n", "* `verbose`
\n", " Log verbosity\n", "* `eraseOnInit`
\n", " Delete everything from the schema in the init stage; this is useful if the migration is restarted frequently." ] }, { "cell_type": "code", "execution_count": 13, "id": "7647127c-810b-4c72-beb3-371f20d137f8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"message\": \"Config has been set successfully\",\n", " \"config\": {\n", " \"dataOnePass\": true,\n", " \"verbose\": 4,\n", " \"useCopy\": true,\n", " \"dataIterations\": -1,\n", " \"useWrapper\": true,\n", " \"useNative\": false,\n", " \"useUnlogged\": false,\n", " \"stat\": true,\n", " \"statDB\": false,\n", " \"rowsBuf\": 1000,\n", " \"lightCheck\": false,\n", " \"dataChunkSize\": -1,\n", " \"bigTablesFirst\": true,\n", " \"debReverseOrder\": false,\n", " \"cli\": true,\n", " \"rmStagingFiles\": true,\n", " \"parTables\": true,\n", " \"hashType\": \"murmur\",\n", " \"simulateUnsupportedTypes\": true,\n", " \"blobStreams\": true,\n", " \"clobStreams\": false,\n", " \"blobImmed\": false,\n", " \"clobImmed\": false,\n", " \"dumpCopy\": false,\n", " \"dryRun\": false,\n", " \"noBlobs\": false,\n", " \"idCol\": \"rowid\",\n", " \"idColByTable\": {},\n", " \"activeSqlFiles\": [],\n", " \"numTries\": 10,\n", " \"maxWait\": 60000,\n", " \"ignoreTrim\": true,\n", " \"removeLastFetchFirst\": false,\n", " \"checkMetaData\": true,\n", " \"users\": [\n", " \"HR\"\n", " ],\n", " \"linkedServers\": [],\n", " \"stages\": {},\n", " \"workloadsFiles\": \"\",\n", " \"eraseOnInit\": true,\n", " \"debezium\": \"http://kafka-connect:8083/\",\n", " \"replayDir\": \"/tmp/replay_dir\"\n", " }\n", "}\n" ] } ], "source": [ "result = !curl -s {API_PREFIX}/config -H \"Content-Type: application/json\" \\\n", " -d \"{{\\\"dataOnePass\\\": true, \\\"users\\\":[\\\"{USER}\\\"], \\\"verbose\\\":4, \\\"eraseOnInit\\\":true}}\"\n", "print(json.dumps(json.loads(result[0]), indent=True))" ] }, { "cell_type": "markdown", "id": "f0adc847-95b5-46c6-a4a5-413bea131c29", "metadata": {}, "source": [ "### Initialisation\n", "\n", "This is an initialisation operation: the framework runs simple assessments and stores the results in the target database. All operations are **asynchronous** and multiple can be run in **parallel**; this call constructs a new background operation with the specified id that can be used to query its progress." ] }, { "cell_type": "code", "execution_count": 14, "id": "ab14a2c2-f341-4080-b9b7-16615e0364b9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"message\": \"Config has been set successfully.\",\n", " \"operation\": {\n", " \"id\": 1,\n", " \"messages\": [],\n", " \"progress\": 0,\n", " \"status\": \"Running\",\n", " \"title\": \"\",\n", " \"promise\": {}\n", " }\n", "}\n" ] } ], "source": [ "result = !curl -s {API_PREFIX}/operation -H 'Content-Type: application/json' -d '{{ \"id\": 1, \"oracle\": 1, \"lgw\": 3, \"stage\": \"init\" }}'\n", "print(json.dumps(json.loads(result[0]), indent=True))" ] }, { "cell_type": "markdown", "id": "3025db03-016f-4e6a-8c5d-7d977b0f0812", "metadata": {}, "source": [ "Using this operation we can get the current state of a currently running operation:" ] }, { "cell_type": "code", "execution_count": 15, "id": "0ef3df1f-d5bd-4a9a-a52a-ef95846ed4b1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"id\": 1,\n", " \"messages\": [],\n", " \"progress\": 0,\n", " \"status\": \"Running\",\n", " \"title\": \"\",\n", " \"promise\": {}\n", "}\n" ] } ], "source": [ "result = !curl -s {API_PREFIX}/operation/1?count=10\\&pager=10 -H 'Content-Type: application/json'\n", "print(json.dumps(json.loads(result[0]), indent=True))" ] }, { "cell_type": "markdown", "id": "945d87db-1057-4e19-b1b0-ea7e97591f34", "metadata": {}, "source": [ "The next request is the same as the previous one, except that it waits until the operation ends."
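, "\n", "\n", "As a sketch, the same wait can also be performed from Python instead of `curl`. The helper below is hypothetical; it assumes nothing beyond the `status` field and the `Running` value visible in the operation responses above:\n", "\n", "```python\n", "import time\n", "\n", "def wait_for_operation(fetch, poll_seconds=2):\n", "    # fetch() returns the operation as a dict, e.g. the parsed JSON from\n", "    # GET {API_PREFIX}/operation/1 in the cell above.\n", "    while True:\n", "        op = fetch()\n", "        if op['status'] != 'Running':\n", "            return op\n", "        time.sleep(poll_seconds)\n", "```"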
] }, { "cell_type": "code", "execution_count": 16, "id": "6f4d48ec-b5e8-4f72-837e-08506e7e5956", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"id\": 1,\n", " \"messages\": [],\n", " \"progress\": 0,\n", " \"status\": \"Succeeded\",\n", " \"title\": \"\",\n", " \"promise\": {}\n", "}\n" ] } ], "source": [ "result = !curl -s -X POST {API_PREFIX}/operation/1/wait?pager=10\\&count=10 -H 'Content-Type: application/json'\n", "print(json.dumps(json.loads(result[0]), indent=True))" ] }, { "cell_type": "markdown", "id": "ba6de177-165e-41e9-99db-4ce2f51dac99", "metadata": {}, "source": [ "## Schema migration\n", "\n", "After the `init` stage is complete we have all the necessary information for migrating the database schema.\n", "\n", "For this demonstration the migration metadata is stored in the PostgreSQL database, allowing us to run various queries on it. For example, we can query the number of objects of each type by running the following query:\n" ] }, { "cell_type": "code", "execution_count": 17, "id": "041a9fee-5648-4049-a679-5ea1820153f9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5 rows affected.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
counttypestageerror
3SEQUENCEINone
11INDEXINone
1VIEWINone
2PROCEDUREINone
7TABLEINone
" ], "text/plain": [ "[(3, 'SEQUENCE', 'I', None),\n", " (11, 'INDEX', 'I', None),\n", " (1, 'VIEW', 'I', None),\n", " (2, 'PROCEDURE', 'I', None),\n", " (7, 'TABLE', 'I', None)]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql {PG}\n", "select count(*), type, stage, error from dbt.migration_objects group by type, stage, error" ] }, { "cell_type": "markdown", "id": "18a3df02-6ef0-41f5-b495-1d6af688e51a", "metadata": {}, "source": [ "If there were any errors during the execution of any stage, they will be shown in the `error` column. The errors can be fixed manually by modifying the `ddl1` and `ddl2` columns of `dbt.migration_objects` or by adding some runtime objects:" ] }, { "cell_type": "code", "execution_count": 18, "id": "1e815497-5e25-4e62-9bfd-a113ec66302e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 rows affected.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ddl1ddl2

CREATE TABLE "HR"."COUNTRIES"
(\t"COUNTRY_ID" CHAR(2) CONSTRAINT "COUNTRY_ID_NN" NOT NULL ENABLE,
\t"COUNTRY_NAME" VARCHAR2(40),
\t"REGION_ID" NUMBER,
\t CONSTRAINT "COUNTRY_C_ID_PK" PRIMARY KEY ("COUNTRY_ID") ENABLE
) ORGANIZATION INDEX NOCOMPRESS

ALTER TABLE "HR"."COUNTRIES" ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
ALTER TABLE "HR"."COUNTRIES" ADD CONSTRAINT "COUNTR_REG_FK" FOREIGN KEY ("REGION_ID")
\t REFERENCES "HR"."REGIONS" ("REGION_ID") ENABLE;
" ], "text/plain": [ "[('\\n CREATE TABLE \"HR\".\"COUNTRIES\" \\n (\\t\"COUNTRY_ID\" CHAR(2) CONSTRAINT \"COUNTRY_ID_NN\" NOT NULL ENABLE, \\n\\t\"COUNTRY_NAME\" VARCHAR2(40), \\n\\t\"REGION_ID\" NUMBER, \\n\\t CONSTRAINT \"COUNTRY_C_ID_PK\" PRIMARY KEY (\"COUNTRY_ID\") ENABLE\\n ) ORGANIZATION INDEX NOCOMPRESS ', '\\nALTER TABLE \"HR\".\"COUNTRIES\" ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;\\nALTER TABLE \"HR\".\"COUNTRIES\" ADD CONSTRAINT \"COUNTR_REG_FK\" FOREIGN KEY (\"REGION_ID\")\\n\\t REFERENCES \"HR\".\"REGIONS\" (\"REGION_ID\") ENABLE;')]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql {PG}\n", "select ddl1, ddl2 from dbt.migration_objects where type = 'TABLE' limit 1" ] }, { "cell_type": "markdown", "id": "4cf6f038-905f-4969-ac9f-254edaea2f28", "metadata": {}, "source": [ "When errors are fixed we don't need to run the whole migration from the beginning. It's enough to reset only the problematic objects' `stage` field and restart the stage. Only those objects' migration will be re-run.\n", "\n", "The schema migration doesn't migrate constraints, indexes and triggers; they are migrated in the `constraints` stage. This is done to make the data migration run faster. The Liberatii Data Platform can also execute **multiple parallel instances** of each operation to significantly improve transfer speeds. It is even possible to perform **parallel migrations for single tables** by splitting them.\n", "\n", "The schema migration is started with the same parameters as the `init` stage, but with `schema` as the stage name."
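, "\n", "\n", "Each stage (`init`, `schema`, `data`, `constraints`, `check`) is started by the same kind of POST. As a sketch, the request sent by the `curl` commands could be built from Python; this hypothetical helper assumes only the `/operation` endpoint and the JSON body already shown in this notebook:\n", "\n", "```python\n", "import json\n", "import urllib.request\n", "\n", "# Stage names in the order this notebook runs them.\n", "STAGES = ['init', 'schema', 'data', 'constraints', 'check']\n", "\n", "def stage_request(api_prefix, stage, op_id=1, oracle=1, lgw=3):\n", "    # Build the same POST the curl commands send; pass the result to\n", "    # urllib.request.urlopen() to actually start the stage.\n", "    body = json.dumps({'id': op_id, 'oracle': oracle, 'lgw': lgw, 'stage': stage}).encode()\n", "    return urllib.request.Request(api_prefix + '/operation', data=body,\n", "                                  headers={'Content-Type': 'application/json'})\n", "```"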
] }, { "cell_type": "code", "execution_count": 19, "id": "c92e2f34-a23b-4c1c-93cd-9a708366cbfb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"message\":\"Config has been set successfully.\",\"operation\":{\"id\":1,\"messages\":[],\"progress\":0,\"status\":\"Running\",\"title\":\"\",\"promise\":{}}}" ] } ], "source": [ "!curl -s {API_PREFIX}/operation -H 'Content-Type: application/json' -d '{{ \"id\": 1, \"oracle\": 1, \"lgw\": 3, \"stage\": \"schema\" }}'" ] }, { "cell_type": "markdown", "id": "43a44b40-d95c-4f0b-af77-926730ab1e41", "metadata": {}, "source": [ "Again, waiting until this task is finished:\n" ] }, { "cell_type": "code", "execution_count": 20, "id": "a198b763-aa92-47a5-8384-162897801b86", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"id\":1,\"messages\":[\"2 config: {\"],\"progress\":0,\"status\":\"Succeeded\",\"title\":\"\",\"promise\":{}}" ] } ], "source": [ "!curl -s -X POST {API_PREFIX}/operation/1/wait?count=1 -H 'Content-Type: application/json'" ] }, { "cell_type": "markdown", "id": "d0b8359c-4bc0-48d9-9f51-008c48a07924", "metadata": {}, "source": [ "In the `init` stage the framework also queries the sizes of each table. Using them we can get the most optimal strategy for copying the data.\n", "\n", "## Data\n", "\n", "So let's see the sizes:\n" ] }, { "cell_type": "code", "execution_count": 21, "id": "ef0a0cd1-a601-401f-86a5-2a0038a00afa", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "7 rows affected.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namepg_size_prettystageerror
REGIONS64 kBDNone
EMPLOYEES64 kBDNone
LOCATIONS64 kBDNone
JOBS64 kBDNone
DEPARTMENTS64 kBDNone
JOB_HISTORY64 kBDNone
COUNTRIES0 bytesDNone
" ], "text/plain": [ "[('REGIONS', '64 kB', 'D', None),\n", " ('EMPLOYEES', '64 kB', 'D', None),\n", " ('LOCATIONS', '64 kB', 'D', None),\n", " ('JOBS', '64 kB', 'D', None),\n", " ('DEPARTMENTS', '64 kB', 'D', None),\n", " ('JOB_HISTORY', '64 kB', 'D', None),\n", " ('COUNTRIES', '0 bytes', 'D', None)]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql {PG}\n", "select name, pg_size_pretty(data_size), stage, error from dbt.migration_objects where type = 'TABLE' order by data_size desc" ] }, { "cell_type": "markdown", "id": "e0907048-07ed-430e-aa50-ee30ab89cee2", "metadata": {}, "source": [ "\n", "By analysing the sizes we can apply different strategies to different tables. But since the sizes are quite small for this schema, we just use the default strategy everywhere.\n", "\n", "Now we start the data migration operation, as with the operations before, but with `data` as the `stage` field value:\n" ] }, { "cell_type": "code", "execution_count": 22, "id": "4e71cbe4-9d69-4330-9c49-ffcab378cb8c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"message\":\"Config has been set successfully.\",\"operation\":{\"id\":1,\"messages\":[],\"progress\":0,\"status\":\"Running\",\"title\":\"\",\"promise\":{}}}" ] } ], "source": [ "!curl -s {API_PREFIX}/operation -H 'Content-Type: application/json' -d '{{ \"id\": 1, \"oracle\": 1, \"lgw\": 3, \"stage\": \"data\" }}'" ] }, { "cell_type": "markdown", "id": "a83b3dff-8b12-462e-a1cb-0862c16d3fd5", "metadata": {}, "source": [ "Waiting for the operation to complete:" ] }, { "cell_type": "code", "execution_count": 23, "id": "25fde8c1-a533-4a49-bd22-c286ce7be411", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"id\":1,\"messages\":[\"2 config: {\"],\"progress\":0,\"status\":\"Succeeded\",\"title\":\"\",\"promise\":{}}" ] } ], "source": [ "!curl -s -X POST {API_PREFIX}/operation/1/wait?count=1 -H 
'Content-Type: application/json'" ] }, { "cell_type": "markdown", "id": "9828b68b-e42e-45f5-a429-f882d3e33d31", "metadata": {}, "source": [ "## Constraints\n", "\n", "This stage adds constraints, indexes and triggers to the data we've moved in the previous step:\n" ] }, { "cell_type": "code", "execution_count": 24, "id": "11f6f1fc-a060-4d66-b2d3-e8621f8fcd85", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"message\":\"Config has been set successfully.\",\"operation\":{\"id\":1,\"messages\":[],\"progress\":0,\"status\":\"Running\",\"title\":\"\",\"promise\":{}}}" ] } ], "source": [ "!curl -s {API_PREFIX}/operation -H 'Content-Type: application/json' -d '{{ \"id\": 1, \"oracle\": 1, \"lgw\": 3, \"stage\": \"constraints\" }}'" ] }, { "cell_type": "markdown", "id": "c1281826-1560-470d-826c-b918befee826", "metadata": {}, "source": [ "Waiting for the operation to complete:" ] }, { "cell_type": "code", "execution_count": 25, "id": "5806fd48-f7f8-4104-a7d7-0d56ead0b5fb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"id\":1,\"messages\":[\"2 config: {\"],\"progress\":0,\"status\":\"Succeeded\",\"title\":\"\",\"promise\":{}}" ] } ], "source": [ "!curl -s -X POST {API_PREFIX}/operation/1/wait?count=1 -H 'Content-Type: application/json'" ] }, { "cell_type": "markdown", "id": "fc0cb080-7092-4579-ab6b-bcff09ebe941", "metadata": {}, "source": [ "## Verification\n", "\n", "With the data in place we can verify the migration across both databases to ensure they are the same.\n", "
This is done by comparing hash sums run across data in each column:" ] }, { "cell_type": "code", "execution_count": 26, "id": "682d5c2b-316a-44c4-929d-5a7db7e2e0a3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"message\":\"Config has been set successfully.\",\"operation\":{\"id\":1,\"messages\":[],\"progress\":0,\"status\":\"Running\",\"title\":\"\",\"promise\":{}}}" ] } ], "source": [ "!curl -s {API_PREFIX}/operation -H 'Content-Type: application/json' -d '{{ \"id\": 1, \"oracle\": 1, \"lgw\": 3, \"stage\": \"check\" }}'" ] }, { "cell_type": "code", "execution_count": 27, "id": "f822fe8f-d519-46d3-81ac-41fb99848a3b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"id\":1,\"messages\":[\"2 config: {\"],\"progress\":0,\"status\":\"Succeeded\",\"title\":\"\",\"promise\":{}}" ] } ], "source": [ "!curl -s -X POST {API_PREFIX}/operation/1/wait?count=1 -H 'Content-Type: application/json'" ] }, { "cell_type": "code", "execution_count": 28, "id": "1093d6aa-997a-43c3-8920-f7259c8f1591", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5 rows affected.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
stagetype
DSEQUENCE
DINDEX
DVIEW
KTABLE
DPROCEDURE
" ], "text/plain": [ "[('D', 'SEQUENCE'),\n", " ('D', 'INDEX'),\n", " ('D', 'VIEW'),\n", " ('K', 'TABLE'),\n", " ('D', 'PROCEDURE')]" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql {PG}\n", "\n", "select stage, type from dbt.migration_objects group by stage, type" ] }, { "cell_type": "markdown", "id": "b29f3c0c-3faa-45c3-8106-59f2b7bbb478", "metadata": {}, "source": [ "That's it: the whole database is now migrated. Please see the guide document for information about the different migration options.\n", "\n", "## Running queries\n", "\n", "Let's now run a few queries in both databases to see that the results are identical. There is a `checksql` stage to do this automatically, but for demonstration purposes we do this manually.\n", "\n", "First we check that the databases are indeed Oracle and PostgreSQL:" ] }, { "cell_type": "code", "execution_count": 29, "id": "770f2d18-0594-4572-ae0e-b14d26ba695b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 rows affected.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
banner
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
" ], "text/plain": [ "[('Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production',)]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql {ORACLE}\n", "select banner from v$version" ] }, { "cell_type": "code", "execution_count": 30, "id": "f5553f78-56b0-4524-a280-2e89ccb2c3bf", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 rows affected.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
BANNER
PostgreSQL 14.9 (Debian 14.9-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
" ], "text/plain": [ "[('PostgreSQL 14.9 (Debian 14.9-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit',)]" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql {LGW}\n", "select banner from v$version" ] }, { "cell_type": "markdown", "id": "dd24613a-7916-4664-a207-5674900ce101", "metadata": {}, "source": [ "The output value on PostgreSQL can be configured to match the Oracle output exactly if the application uses this value for business logic decisions.\n", "\n", "Now let's run the same query in both databases. This query contains a number of Oracle-specific features:\n", "\n", "* Concatenation using the double-pipe (||) operator\n", "* The `NVL` and `DECODE` functions\n", "* The join-plus operator and implicit joins" ] }, { "cell_type": "code", "execution_count": 31, "id": "00efd189-dcde-4112-8626-5e88ab0161ee", "metadata": {}, "outputs": [], "source": [ "QUERY=\"\"\"\n", "SELECT first_name || ' ' || last_name full_name,\n", " salary +\n", " NVL (commission_pct, 0) sal_com,\n", " DECODE (NVL(e.department_id, -3),\n", " 60, d.department_name,\n", " 50, d.department_name,\n", " 30, d.department_name,\n", " -3, 'UNKNOWN',\n", " 'OTHER') dep,\n", " TO_CHAR(e.hire_date) hire_date\n", "FROM employees e, departments d\n", "WHERE d.department_id(+) = e.department_id\n", " AND e.last_name like 'G%'\n", "ORDER BY first_name || ' ' || last_name\n", "\"\"\"" ] }, { "cell_type": "markdown", "id": "827768b9-03cd-45e8-b510-fca878aeb5ec", "metadata": {}, "source": [ "First running it on Oracle:" ] }, { "cell_type": "code", "execution_count": 32, "id": "d41b3ece-2541-4288-8f74-3b3abc6faac2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 rows affected.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
full_namesal_comdephire_date
Danielle Greene9500.15OTHER19-MAR-07
Douglas Grant2600Shipping13-JAN-08
Girard Geoni2800Shipping03-FEB-08
Ki Gee2400Shipping12-DEC-07
Kimberely Grant7000.15UNKNOWN24-MAY-07
Nancy Greenberg12008OTHER17-AUG-02
Timothy Gates2900Shipping11-JUL-06
William Gietz8300OTHER07-JUN-02
" ], "text/plain": [ "[('Danielle Greene', Decimal('9500.15'), 'OTHER', '19-MAR-07'),\n", " ('Douglas Grant', 2600, 'Shipping', '13-JAN-08'),\n", " ('Girard Geoni', 2800, 'Shipping', '03-FEB-08'),\n", " ('Ki Gee', 2400, 'Shipping', '12-DEC-07'),\n", " ('Kimberely Grant', Decimal('7000.15'), 'UNKNOWN', '24-MAY-07'),\n", " ('Nancy Greenberg', 12008, 'OTHER', '17-AUG-02'),\n", " ('Timothy Gates', 2900, 'Shipping', '11-JUL-06'),\n", " ('William Gietz', 8300, 'OTHER', '07-JUN-02')]" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql {ORACLE} \n", "\n", "{QUERY}" ] }, { "cell_type": "markdown", "id": "bd9241d3-4a92-4fef-b737-1d921cc651b1", "metadata": {}, "source": [ "And now running absolutely the same query but on PostgreSQL via Liberatii Gateway:" ] }, { "cell_type": "code", "execution_count": 33, "id": "580387f1-0f6f-47cc-b508-a508379c2a77", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "8 rows affected.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FULL_NAMESAL_COMDEPHIRE_DATE
Danielle Greene9500.15OTHER19-MAR-07
Douglas Grant2600.00Shipping13-JAN-08
Girard Geoni2800.00Shipping03-FEB-08
Ki Gee2400.00Shipping12-DEC-07
Kimberely Grant7000.15UNKNOWN24-MAY-07
Nancy Greenberg12008.00OTHER17-AUG-02
Timothy Gates2900.00Shipping11-JUL-06
William Gietz8300.00OTHER07-JUN-02
" ], "text/plain": [ "[('Danielle Greene', Decimal('9500.15'), 'OTHER', '19-MAR-07'),\n", " ('Douglas Grant', Decimal('2600.00'), 'Shipping', '13-JAN-08'),\n", " ('Girard Geoni', Decimal('2800.00'), 'Shipping', '03-FEB-08'),\n", " ('Ki Gee', Decimal('2400.00'), 'Shipping', '12-DEC-07'),\n", " ('Kimberely Grant', Decimal('7000.15'), 'UNKNOWN', '24-MAY-07'),\n", " ('Nancy Greenberg', Decimal('12008.00'), 'OTHER', '17-AUG-02'),\n", " ('Timothy Gates', Decimal('2900.00'), 'Shipping', '11-JUL-06'),\n", " ('William Gietz', Decimal('8300.00'), 'OTHER', '07-JUN-02')]" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql {LGW}\n", "\n", "{QUERY}" ] }, { "cell_type": "markdown", "id": "6e7632bf-92a1-466f-bdc2-016b66cdaa8b", "metadata": {}, "source": [ "The results are identical. The next steps are:\n", "\n", "* Synchronise the databases using CDC so we avoid downtimes on switchover\n", "* Replay workloads on a sandbox Oracle database and PostgreSQL via Liberatii Gateway, for testing correctness and performance" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 5 }