{"id":19873,"date":"2025-01-07T15:07:18","date_gmt":"2025-01-07T09:37:18","guid":{"rendered":"https:\/\/opstree.com\/blog\/?p=19873"},"modified":"2025-07-23T11:25:31","modified_gmt":"2025-07-23T05:55:31","slug":"a-step-into-the-world-of-data-mastery-optimizing-redshift-for-seamless-migration","status":"publish","type":"post","link":"https:\/\/opstree.com\/blog\/2025\/01\/07\/a-step-into-the-world-of-data-mastery-optimizing-redshift-for-seamless-migration\/","title":{"rendered":"How to Optimize Amazon Redshift for Faster and Seamless Data Migration"},"content":{"rendered":"<p>When it comes to handling massive datasets, choosing the right approach can make or break your system\u2019s performance. In this blog, I\u2019ll take you through the first half of my Proof of Concept (PoC) journey\u2014preparing data in Amazon Redshift for migration to Google BigQuery. From setting up Redshift to crafting an efficient data ingestion pipeline, this was a hands-on experience that taught me a lot about Redshift\u2019s power (and quirks). Let\u2019s dive into the details, and I promise it won\u2019t be boring!<\/p>\n<p><!--more--><\/p>\n<h2>Step 1: Connecting to the Mighty Redshift<\/h2>\n<p>Imagine having a tool capable of handling terabytes of data with lightning speed\u2014welcome to <a href=\"https:\/\/opstree.com\/blog\/2025\/05\/06\/technical-case-study-amazon-redshift-and-athena-as-data-warehousing-solutions\/\"><strong>Amazon Redshift<\/strong><\/a>! My first task was to set up a Redshift serverless instance and connect to it via psql.<\/p>\n<p>Here\u2019s how I cracked open the doors to Redshift:<\/p>\n<pre>psql -h &lt;your-redshift-hostname&gt; -p &lt;port-number&gt; -U &lt;your-username&gt; -d &lt;your-database-name&gt;<\/pre>\n<p>Once I was in, I felt like stepping into a data powerhouse. The next step? 
Building a table that\u2019s smart, efficient, and ready for action.<\/p>\n<p><strong>[ Also Read Part 2: <a href=\"https:\/\/opstree.com\/blog\/2025\/01\/14\/part-2-automating-data-migration-with-apache-airflow\/\">Automating Data Migration Using Apache Airflow: A Step-by-Step Guide<\/a> ]<\/strong><\/p>\n<h2>Step 2: Crafting a Supercharged Table<\/h2>\n<p>Tables are the backbone of any database, and in Redshift, designing them smartly can save you a world of pain. I created a users table with a structure optimized for Redshift\u2019s columnar architecture and distribution capabilities. Note that ENCODE is a column-level attribute in Redshift, so the compression encoding is declared per column rather than after the table attributes:<\/p>\n<pre>CREATE TABLE users (\r\n    user_id INT PRIMARY KEY,\r\n    first_name VARCHAR(50) ENCODE BYTEDICT,\r\n    last_name VARCHAR(50) ENCODE BYTEDICT,\r\n    email VARCHAR(100),\r\n    created_at TIMESTAMP,\r\n    last_login TIMESTAMP\r\n)\r\nDISTSTYLE KEY\r\nDISTKEY(user_id)\r\nSORTKEY(user_id);<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-19874\" src=\"https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114509-300x42.png\" alt=\"\" width=\"664\" height=\"93\" srcset=\"https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114509-300x42.png 300w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114509-1024x144.png 1024w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114509-768x108.png 768w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114509-1536x216.png 1536w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114509-1200x169.png 1200w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114509.png 1751w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px\" \/><\/p>\n<h4><span data-contrast=\"auto\">Here\u2019s what makes this setup special:<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/h4>\n<ul>\n<li><b><span data-contrast=\"auto\">DISTSTYLE KEY &amp; DISTKEY (user_id):<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\"><br \/>\n<\/span>The <b><span data-contrast=\"auto\">DISTSTYLE KEY<\/span><\/b><span data-contrast=\"auto\"> setting, combined with <\/span><b><span data-contrast=\"auto\">DISTKEY(user_id)<\/span><\/b><span data-contrast=\"auto\">, determines which compute node each row lands on. When you use <\/span><b><span data-contrast=\"auto\">DISTSTYLE KEY<\/span><\/b><span data-contrast=\"auto\">, Redshift distributes the data based on a hash of the specified column\u2014in this case, user_id. This minimizes the data shuffling that can happen during query execution, which is especially important for large datasets. By choosing user_id as the distribution key, Redshift can better handle queries that filter or join on this column without needing to move data across nodes. (Keep in mind that a hashed key guarantees co-location, not perfectly even placement\u2014a skewed key can still leave some nodes doing more work than others.)<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559685&quot;:720,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/li>\n<li><b><span data-contrast=\"auto\">SORTKEY (user_id):<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\"><br \/>\n<\/span>Sorting the data by user_id helps Redshift quickly locate the records it needs when executing queries that filter by user_id. 
Think of it like arranging a set of books by author\u2019s last name\u2014if you know the author, you can jump straight to the right section. With the <b><span data-contrast=\"auto\">SORTKEY<\/span><\/b><span data-contrast=\"auto\"> on user_id, Redshift doesn\u2019t have to scan the entire table. It can skip irrelevant data, making retrieval faster.<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559685&quot;:720,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/li>\n<li><b><span data-contrast=\"auto\">Column Compression (ENCODE BYTEDICT):<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\"><br \/>\n<\/span>Redshift&#8217;s columnar storage and compression capabilities are key to making large datasets manageable. I used the <b><span data-contrast=\"auto\">BYTEDICT<\/span><\/b><span data-contrast=\"auto\"> encoding to compress columns, which stores repeating values as a dictionary. This technique reduces storage space and speeds up query performance because fewer bytes need to be read from disk. 
It\u2019s a space-saver that also enhances I\/O efficiency.<\/span><\/li>\n<\/ul>\n<p><strong>[ Are you looking: <a href=\"https:\/\/opstree.com\/blog\/2023\/06\/20\/database-migration-service-in-aws\/\">AWS\u00a0Database Migration Services<\/a>]\u00a0<\/strong><\/p>\n<h2 aria-level=\"3\"><b><span data-contrast=\"auto\">Step 3: Inserting Test Data\u2014The Fun Begins!<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:281,&quot;335559739&quot;:281}\">\u00a0<\/span><\/h2>\n<p><span data-contrast=\"auto\">Why test with boring, clean data when you can simulate the real-world mess? I created 200 rows of diverse data with:<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/p>\n<ol>\n<li><b><span data-contrast=\"auto\">Skewed User IDs (150\u2013160):<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\"><br \/>\n<\/span>Because real-world data isn\u2019t always evenly distributed. 
This tested how well Redshift handles imbalances.<span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559685&quot;:720,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/li>\n<li><b><span data-contrast=\"auto\">NULL Values:<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\"><br \/>\n<\/span>Every 10th row had NULL values for created_at and last_login\u2014because life is messy, and so is data.<span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559685&quot;:720,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/li>\n<li><b><span data-contrast=\"auto\">Randomized Timestamps:<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\"><br \/>\n<\/span>For added realism, I introduced randomized timestamps. 
Data should feel alive, right?<span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559685&quot;:720,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/li>\n<\/ol>\n<p><span data-contrast=\"auto\">Here\u2019s the Python script I wrote to bring this to life:<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559731&quot;:720,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/p>\n<p><a href=\"https:\/\/github.com\/ramneek2109\/DataMigration\/blob\/main\/test_data.py\" target=\"_blank\" rel=\"noopener\"><span data-contrast=\"none\">https:\/\/github.com\/ramneek2109\/DataMigration\/blob\/main\/test_data.py<\/span><\/a><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559731&quot;:720,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559731&quot;:720,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><span data-contrast=\"auto\">The result? 
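If you\u2019d rather not open the repo, here is a minimal, illustrative sketch of such a generator\u2014the function name, value ranges, and the 25% skew probability are my assumptions, not the actual script:

```python
import random
from datetime import datetime, timedelta

def make_test_rows(n=200, skew_range=(150, 160), null_every=10, seed=42):
    """Generate PoC-style rows: skewed user_ids, periodic NULLs, random timestamps."""
    rng = random.Random(seed)
    rows = []
    for i in range(1, n + 1):
        # Squeeze roughly a quarter of the rows into the 150-160 id band to
        # simulate hot keys. Redshift doesn't enforce PRIMARY KEY, so the
        # resulting duplicate ids load without error.
        user_id = rng.randint(*skew_range) if rng.random() < 0.25 else i
        if i % null_every == 0:
            # Every 10th row gets NULL timestamps, because real data is messy.
            created_at = last_login = None
        else:
            created_at = datetime(2024, 1, 1) + timedelta(minutes=rng.randint(0, 525600))
            last_login = created_at + timedelta(hours=rng.randint(1, 720))
        rows.append((user_id, "First%d" % i, "Last%d" % i,
                     "user%d@example.com" % i, created_at, last_login))
    return rows

rows = make_test_rows()
```

From here the rows can be written out as CSV (or multi-row INSERTs) and loaded into the users table.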
A realistic dataset that was ready to be challenged!<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559731&quot;:720,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-19875\" src=\"https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114839-300x48.png\" alt=\"\" width=\"625\" height=\"100\" srcset=\"https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114839-300x48.png 300w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114839-1024x164.png 1024w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114839-768x123.png 768w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114839-1536x246.png 1536w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114839-1200x192.png 1200w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114839.png 1717w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px\" \/><\/p>\n<h2 aria-level=\"3\"><b><span data-contrast=\"auto\">Step 4: Loading Data Like a Pro with the COPY Command<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:281,&quot;335559739&quot;:281}\">\u00a0<\/span><\/h2>\n<p><span data-contrast=\"auto\">When it comes to moving large datasets, Redshift\u2019s <\/span><b><span data-contrast=\"auto\">COPY command<\/span><\/b><span data-contrast=\"auto\"> is a game-changer. It\u2019s fast, efficient, and designed for scale. 
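Because COPY is ordinary SQL, the whole staged, conflict-skipping load described in Steps 4.1 and 4.2 below can also be parameterized from a script. A minimal Python sketch\u2014the helper name and placeholder arguments are mine, not part of the PoC:

```python
def build_staged_load(bucket_path, file_name, iam_role):
    """Return the three statements of the staged, conflict-skipping load:
    bulk COPY into staging, anti-join INSERT, then cleanup."""
    copy_stmt = (
        "COPY staging_users FROM 's3://%s/%s' " % (bucket_path, file_name)
        + "CREDENTIALS 'aws_iam_role=%s' " % iam_role
        + "DELIMITER ',' GZIP IGNOREHEADER 1;"
    )
    insert_stmt = (
        "INSERT INTO users (user_id, first_name, last_name, email, created_at, last_login) "
        "SELECT s.* FROM staging_users s "
        "LEFT JOIN users u ON s.user_id = u.user_id "
        "WHERE u.user_id IS NULL;"
    )
    return [copy_stmt, insert_stmt, "DROP TABLE staging_users;"]
```

Each returned statement can then be executed in order over any Redshift connection (psql, psycopg2, or the Redshift Data API).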
For this PoC, I loaded a compressed CSV file stored in <a href=\"https:\/\/opstree.com\/blog\/2024\/11\/05\/amazon-s3-security-essentials-protect-your-data-with-these-key-practices\/\">Amazon S3<\/a>.<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">But here\u2019s the twist: 10 records in the file conflicted with existing user_id values in the users table. Redshift doesn\u2019t enforce PRIMARY KEY constraints, so a direct load wouldn\u2019t fail\u2014it would silently create duplicate user_ids. So, I got creative.<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/p>\n<h4 aria-level=\"4\"><b><i><span data-contrast=\"auto\">Step 4.1: Load Data into a Staging Table<\/span><\/i><\/b><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:319,&quot;335559739&quot;:319}\">\u00a0<\/span><\/h4>\n<p><span data-contrast=\"auto\">First, I used a staging table to load all data safely without disrupting the existing table:<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/p>\n<pre>CREATE TEMP TABLE staging_users (\r\n    user_id INT,\r\n    first_name VARCHAR(100),\r\n    last_name VARCHAR(100),\r\n    email VARCHAR(100),\r\n    created_at TIMESTAMP,\r\n    last_login TIMESTAMP\r\n);\r\n\r\nCOPY staging_users\r\nFROM 's3:\/\/&lt;your-s3-bucket-path&gt;\/&lt;file-name&gt;.csv.gz'\r\nCREDENTIALS 'aws_iam_role=&lt;your-iam-role&gt;'\r\nDELIMITER ',' GZIP IGNOREHEADER 1;<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-19876\" src=\"https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114953-300x83.png\" alt=\"\" width=\"687\" height=\"190\" srcset=\"https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114953-300x83.png 300w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114953-1024x284.png 1024w, 
https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114953-768x213.png 768w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114953-1200x333.png 1200w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-114953.png 1325w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-19877\" src=\"https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-115020-300x84.png\" alt=\"\" width=\"700\" height=\"196\" srcset=\"https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-115020-300x84.png 300w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-115020-1024x288.png 1024w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-115020-768x216.png 768w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-115020-1200x338.png 1200w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-115020.png 1329w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px\" \/><\/p>\n<h4 aria-level=\"4\"><b><i><span data-contrast=\"auto\">Step 4.2: Insert Non-Conflicting Records<\/span><\/i><\/b><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:319,&quot;335559739&quot;:319}\">\u00a0<\/span><\/h4>\n<p><span data-contrast=\"auto\">Next, I inserted only the non-conflicting rows into the users table:<\/span><span 
data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/p>\n<pre><span data-contrast=\"auto\">INSERT INTO users (user_id, first_name, last_name, email, created_at, last_login) SELECT s.* FROM staging_users s LEFT JOIN users u ON s.user_id = u.user_id WHERE u.user_id IS NULL;<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-19878\" src=\"https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-115057-300x83.png\" alt=\"\" width=\"636\" height=\"176\" srcset=\"https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-115057-300x83.png 300w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-115057-1024x283.png 1024w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-115057-768x212.png 768w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-115057-1200x331.png 1200w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-07-115057.png 1325w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px\" \/><\/p>\n<p><span class=\"TextRun SCXW85637737 BCX8\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW85637737 BCX8\">With this approach, I ensured that only unique <\/span><span class=\"NormalTextRun SpellingErrorV2Themed SCXW85637737 BCX8\">user_id<\/span><span class=\"NormalTextRun SCXW85637737 BCX8\"> records made it into the <\/span><span class=\"NormalTextRun ContextualSpellingAndGrammarErrorV2Themed SCXW85637737 
BCX8\">users<\/span><span class=\"NormalTextRun SCXW85637737 BCX8\"> table.<\/span><\/span><span class=\"EOP SCXW85637737 BCX8\" data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The rest? Skipped like a pro.<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Finally, I cleaned up the staging table:<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/p>\n<pre><span data-contrast=\"auto\">DROP TABLE staging_users;<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559731&quot;:0,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/pre>\n<h2 aria-level=\"3\"><b><span data-contrast=\"auto\">What I Learned (and Why It Matters)<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:281,&quot;335559739&quot;:281}\">\u00a0<\/span><\/h2>\n<p><span data-contrast=\"auto\">This part of the PoC taught me some valuable lessons:<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/p>\n<ol>\n<li><b><span data-contrast=\"auto\">Optimization is Everything:<\/span><\/b><span 
data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\"><br \/>\n<\/span>Redshift\u2019s performance relies heavily on how you design your tables. Choosing the right distribution and sort keys is key to unlocking its full potential.<span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559685&quot;:720,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/li>\n<li><b><span data-contrast=\"auto\">COPY Command is Your Best Friend:<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\"><br \/>\n<\/span>It\u2019s fast, efficient, and handles large data like a champ. Pair it with staging tables for ultimate control.<span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559685&quot;:720,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/li>\n<li><b><span data-contrast=\"auto\">Test Like It\u2019s Real:<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\"><br \/>\n<\/span>Simulating real-world scenarios with skewed data, NULL values, and conflicts helped me prepare for edge cases during the migration.<span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559685&quot;:720,&quot;335559738&quot;:240,&quot;335559739&quot;:240}\">\u00a0<\/span><\/li>\n<\/ol>\n<p><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><span data-contrast=\"auto\">This was just the first 
step in my journey. In the next part of this blog series, I\u2019ll delve into migrating this data from Redshift to BigQuery using the power of <a href=\"https:\/\/opstree.com\/blog\/2024\/08\/13\/building-and-managing-production-ready-apache-airflow\/\"><strong>Apache Airflow<\/strong><\/a>. It only gets more exciting from here\u2014stay tuned!<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>When it comes to handling massive datasets, choosing the right approach can make or break your system\u2019s performance. In this blog, I\u2019ll take you through the first half of my Proof of Concept (PoC) journey\u2014preparing data in Amazon Redshift for migration to Google BigQuery. From setting up Redshift to crafting an efficient data ingestion pipeline, &hellip; <a href=\"https:\/\/opstree.com\/blog\/2025\/01\/07\/a-step-into-the-world-of-data-mastery-optimizing-redshift-for-seamless-migration\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;How to Optimize Amazon Redshift for Faster and Seamless Data 
Migration&#8221;<\/span><\/a><\/p>\n","protected":false},"author":244582684,"featured_media":19882,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_coblocks_attr":"","_coblocks_dimensions":"","_coblocks_responsive_height":"","_coblocks_accordion_ie_support":"","jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[28070474],"tags":[768739372,768739439,768739438,4996032],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/01\/Blog-Image-Template-4.png","jetpack_likes_enabled":false,"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pfDBOm-5ax","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts\/19873"}],"collection":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/users\/244582684"}],"replies":[{"embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/comments?post=19873"}],"version-history":[{"count":4,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts\/19873\/revisions"}],"predecessor-version":[{"id":29433,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts\/19873\/revisions\/29433"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/media\/19882"}],"wp:attachment":[{"href":"https:\/\/o
pstree.com\/blog\/wp-json\/wp\/v2\/media?parent=19873"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/categories?post=19873"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/tags?post=19873"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}