{"id":31066,"date":"2026-04-07T14:09:09","date_gmt":"2026-04-07T08:39:09","guid":{"rendered":"https:\/\/opstree.com\/blog\/?p=31066"},"modified":"2026-04-07T14:09:09","modified_gmt":"2026-04-07T08:39:09","slug":"git-history-rewrite-at-scale-removing-100mb-files-safely","status":"publish","type":"post","link":"https:\/\/opstree.com\/blog\/2026\/04\/07\/git-history-rewrite-at-scale-removing-100mb-files-safely\/","title":{"rendered":"Git History Rewrite at Scale: Removing 100MB+ Files Safely"},"content":{"rendered":"<div style=\"background: #f8fafc; padding: 18px; border: 1px solid #e2e8f0; border-radius: 6px; font-family: Inter, Arial, sans-serif; margin: 20px 0;\">\n<h2 style=\"margin-top: 0; font-size: 18px;\">Table of Contents<\/h2>\n<ol style=\"margin: 0; padding-left: 18px; line-height: 1.7; font-size: 14px;\">\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#introduction\">Introduction<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#problem-large-files-git\">The Problem with Large Files in Git<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#requirements\">Requirements<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#architecture-overview\">Architecture Overview<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#implementation-strategy\">Implementation Strategy<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#verification-method\">Verification Method<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#results\">Results<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#lessons-learned\">Lessons Learned<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#best-practices-production\">Best Practices for Production<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#conclusion\">Conclusion<\/a><\/li>\n<\/ol>\n<\/div>\n<h2 
id=\"introduction\">Introduction<\/h2>\n<p>Large files inside <a href=\"https:\/\/opstree.com\/blog\/2022\/06\/21\/what-is-a-bare-git-repository\/\" target=\"_blank\" rel=\"noopener\">Git repositories<\/a> are a silent problem. They increase clone times, inflate repository size, and in platforms like Bitbucket Cloud, can completely block pushes once files exceed 100MB.<\/p>\n<p>During a migration exercise, we encountered multiple repositories containing large binary files embedded directly in Git history. Some were intentionally added during testing; others were legacy artifacts. Regardless of origin, the impact was the same: repository growth, push failures, and migration risk.<\/p>\n<p>We needed a scalable, production-safe solution to:<\/p>\n<ul>\n<li>Identify files larger than 100MB<\/li>\n<li>Preserve those files safely<\/li>\n<li>Remove them from Git history<\/li>\n<li>Maintain traceability<\/li>\n<li>Avoid Git LFS<\/li>\n<li>Process multiple repositories in batch<\/li>\n<\/ul>\n<p>This article explains the approach, implementation, and verification process.<\/p>\n<h2 id=\"problem-large-files-git\">The Problem with Large Files in Git<\/h2>\n<p>Git is optimized for source code, not large binaries. When a file larger than 100MB is committed:<\/p>\n<ul>\n<li>It becomes part of Git object history.<\/li>\n<li>Even if later deleted, the blob remains in history.<\/li>\n<li>Every clone downloads that blob.<\/li>\n<li>Bitbucket Cloud blocks pushes containing files \u2265100MB.<\/li>\n<li>Repository size increases permanently unless history is rewritten.<\/li>\n<\/ul>\n<p>Deleting the file in a new commit is not enough. The blob must be removed from the entire commit graph.<\/p>\n<h2 id=\"requirements\">Requirements<\/h2>\n<p>We defined clear technical requirements:<\/p>\n<ol>\n<li>Scan multiple repositories under a parent directory.<\/li>\n<li>Detect files larger than 100MB in:\n<ul>\n<li>Working directory<\/li>\n<li>Full Git history<\/li>\n<\/ul>\n<\/li>\n<li>Generate a detailed CSV audit report.<\/li>\n<li>Back up repositories before modification.<\/li>\n<li>Archive large blobs to S3 before removal.<\/li>\n<li>Rewrite Git history safely.<\/li>\n<li>Force push cleaned repositories.<\/li>\n<li>Verify that no large blobs remain.<\/li>\n<\/ol>\n<h2 id=\"architecture-overview\">Architecture Overview<\/h2>\n<p>The cleanup workflow followed this structure:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-31067 aligncenter\" src=\"https:\/\/opstree.com\/blog\/wp-content\/uploads\/2026\/04\/Gemini_Generated_Image_j2y0tsj2y0tsj2y0.png\" alt=\"\" width=\"1408\" height=\"768\" srcset=\"https:\/\/opstree.com\/blog\/wp-content\/uploads\/2026\/04\/Gemini_Generated_Image_j2y0tsj2y0tsj2y0.png 1408w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2026\/04\/Gemini_Generated_Image_j2y0tsj2y0tsj2y0-300x164.png 300w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2026\/04\/Gemini_Generated_Image_j2y0tsj2y0tsj2y0-1024x559.png 1024w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2026\/04\/Gemini_Generated_Image_j2y0tsj2y0tsj2y0-768x419.png 768w, https:\/\/opstree.com\/blog\/wp-content\/uploads\/2026\/04\/Gemini_Generated_Image_j2y0tsj2y0tsj2y0-1200x655.png 1200w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/p>\n<h2 id=\"implementation-strategy\">Implementation 
Strategy<\/h2>\n<h3>1. Repository Discovery<\/h3>\n<p>All repositories were discovered under a specified parent directory by locating <code>.git<\/code> folders. This allowed batch processing without hardcoding repository names.<\/p>\n<h3>2. Scanning Git History<\/h3>\n<p>To detect large blobs in history, we relied on Git\u2019s object database:<\/p>\n<pre style=\"background: #0f172a; color: #e5e7eb; padding: 16px; border-radius: 8px; font-size: 13px; line-height: 1.6; overflow-x: auto; max-width: 100%;\">git rev-list --objects --all \\\r\n| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'\r\n<\/pre>\n<p>We filtered for blobs larger than 100MB (104857600 bytes). This approach ensures that even deleted historical files are detected.<\/p>\n<h3>3. CSV Report Generation<\/h3>\n<p>For traceability, a consolidated CSV report was generated containing:<\/p>\n<ul>\n<li>Repository name<\/li>\n<li>File path<\/li>\n<li>File size (bytes + human-readable)<\/li>\n<li>Blob hash<\/li>\n<li>Commit hash<\/li>\n<li>Target S3 path<\/li>\n<\/ul>\n<p>This report served as:<\/p>\n<ul>\n<li>A dry-run validation artifact<\/li>\n<li>An audit record<\/li>\n<li>A mapping between Git history and <a href=\"https:\/\/opstree.com\/blog\/2026\/02\/03\/serverless-log-analytics-aws\/\" target=\"_blank\" rel=\"noopener\">S3 storage<\/a><\/li>\n<\/ul>\n<h3>4. 
S3 Archival Before Deletion<\/h3>\n<p>Before rewriting history, large blobs were extracted using:<\/p>\n<pre style=\"background: #0f172a; color: #e5e7eb; padding: 16px; border-radius: 8px; font-size: 13px; line-height: 1.6; overflow-x: auto; max-width: 100%;\">git cat-file -p &lt;blob-hash&gt;<\/pre>\n<p>They were uploaded to S3 using a structured path:<\/p>\n<pre style=\"background: #0f172a; color: #e5e7eb; padding: 16px; border-radius: 8px; font-size: 13px; line-height: 1.6; overflow-x: auto; max-width: 100%;\">s3:\/\/&lt;bucket&gt;\/&lt;repo-name&gt;\/&lt;commit-hash&gt;\/&lt;file-name&gt;<\/pre>\n<p>This ensured:<\/p>\n<ul>\n<li>No data loss<\/li>\n<li>Full traceability<\/li>\n<li>Commit-level mapping<\/li>\n<li>Easy retrieval if required<\/li>\n<\/ul>\n<h3>5. Safe History Rewrite<\/h3>\n<p>We used <code>git-filter-repo<\/code>, which is the modern, recommended alternative to <code>git filter-branch<\/code>.<\/p>\n<p>For each repository:<\/p>\n<ul>\n<li>Paths to remove were collected from the CSV report.<\/li>\n<li>A full backup tarball was created before execution.<\/li>\n<li>A confirmation prompt prevented accidental execution.<\/li>\n<li><code>git-filter-repo --invert-paths<\/code>, combined with a path selector such as <code>--path<\/code> or <code>--paths-from-file<\/code>, was used to remove those paths from all commits.<\/li>\n<\/ul>\n<p>This resulted in a new, clean commit graph with large blobs removed.<\/p>\n<h3>6. 
Force Push and Remote Restoration<\/h3>\n<p>Since history was rewritten:<\/p>\n<ul>\n<li>All commit hashes changed.<\/li>\n<li>A force push was required.<\/li>\n<li>Team members were instructed to re-clone repositories.<\/li>\n<\/ul>\n<p>Because <code>git-filter-repo<\/code> removes the <code>origin<\/code> remote by default after a rewrite, remote URLs were preserved or restored automatically to ensure push continuity.<\/p>\n<h2 id=\"verification-method\">Verification Method<\/h2>\n<p>Cleanup was verified using a direct scan of Git objects:<\/p>\n<pre style=\"background: #0f172a; color: #e5e7eb; padding: 16px; border-radius: 8px; font-size: 13px; line-height: 1.6; overflow-x: auto; max-width: 100%;\">git rev-list --objects --all \\\r\n| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize)' \\\r\n| awk '$1 == \"blob\" &amp;&amp; $3 &gt;= 104857600'<\/pre>\n<p>If the command returned no output, it confirmed:<\/p>\n<ul>\n<li>No blob \u2265100MB remained<\/li>\n<li>History rewrite was successful<\/li>\n<li>Repository was safe to push and clone<\/li>\n<\/ul>\n<p>This verification step is critical and should never be skipped.<\/p>\n<h2 id=\"results\">Results<\/h2>\n<ul>\n<li>10 repositories processed<\/li>\n<li>~3GB of large blobs identified<\/li>\n<li>All large files archived to S3<\/li>\n<li>Git history rewritten safely<\/li>\n<li>Force push completed<\/li>\n<li>No remaining blobs \u2265100MB in any 
repository<\/li>\n<li>Repositories ready for clean migration<\/li>\n<\/ul>\n<h2 id=\"lessons-learned\">Lessons Learned<\/h2>\n<ol>\n<li>Deleting files in a commit does not remove them from history.<\/li>\n<li>Always run a dry-run before destructive operations.<\/li>\n<li>Always create a full backup before rewriting history.<\/li>\n<li>Always verify using Git\u2019s object database.<\/li>\n<li>Separate source code storage from binary artifact storage.<\/li>\n<li>Avoid large binary commits unless using Git LFS intentionally.<\/li>\n<\/ol>\n<h2 id=\"best-practices-production\">Best Practices for Production<\/h2>\n<ul>\n<li>Keep repositories focused on source code.<\/li>\n<li>Use external storage (S3, artifact repositories) for large binaries.<\/li>\n<li>Automate detection of large files in CI pipelines.<\/li>\n<li>Add pre-commit or pre-receive hooks to block oversized files.<\/li>\n<li>Regularly audit repository object sizes.<\/li>\n<\/ul>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>Rewriting Git history at scale is a high-impact, high-risk operation if not handled properly. 
However, with a structured approach, proper backups, archival strategy, and verification, it becomes a controlled and repeatable process.<\/p>\n<p>By combining Git object analysis, S3 archival, and <code>git-filter-repo<\/code>, we successfully removed large files from multiple repositories without data loss and without relying on Git LFS.<\/p>\n<p>This approach provides a scalable blueprint for teams facing similar migration or repository health challenges.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Table of Contents Introduction The Problem with Large Files in Git Requirements Architecture Overview 
Implementation Strategy Verification Method Results Lessons Learned Best Practices for Production Conclusion Introduction Large files inside Git repositories are a silent problem. They increase clone times, inflate repository size, and in platforms like Bitbucket Cloud, can completely block pushes once files &hellip; <a href=\"https:\/\/opstree.com\/blog\/2026\/04\/07\/git-history-rewrite-at-scale-removing-100mb-files-safely\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Git History Rewrite at Scale: Removing 100MB+ Files Safely&#8221;<\/span><\/a><\/p>\n","protected":false},"author":244582727,"featured_media":31068,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_coblocks_attr":"","_coblocks_dimensions":"","_coblocks_responsive_height":"","_coblocks_accordion_ie_support":"","jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[209196],"tags":[43041988,768739295,699830190,28644656,343865],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/opstree.com\/blog\/wp-content\/uploads\/2026\/04\/Untitled-design-15.png","jetpack_likes_enabled":false,"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pfDBOm-854","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts\/31066"}],"collection":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":t
rue,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/users\/244582727"}],"replies":[{"embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/comments?post=31066"}],"version-history":[{"count":6,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts\/31066\/revisions"}],"predecessor-version":[{"id":31074,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts\/31066\/revisions\/31074"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/media\/31068"}],"wp:attachment":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/media?parent=31066"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/categories?post=31066"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/tags?post=31066"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}