Cleanup Binaries in Git History

April 8, 2022
Development
Git

Tracking binary objects in Git without LFS is considered bad etiquette, because it significantly degrades the performance of the repository for all participants.

Git Large File Storage (LFS) replaces large files with text pointers inside Git and stores the actual file contents separately, which prevents significant growth of the history. LFS was created by GitHub and is an open source project. There are other systems for handling large binary data with Git, such as git-annex, but LFS has gained the most traction.

To use LFS you should typically start with a new repository. Adding LFS to an existing repository involves rewriting history if you’ve already tracked binary files.

$ git lfs install

Updated git hooks.
Git LFS initialized.

This makes changes in the repository's .git folder and adds some configuration to your ~/.gitconfig. Your ~/.gitconfig persists across all projects (it is your global Git configuration), but the LFS hooks have to be created in each project to “install” LFS properly for that repository. To peer in a little more, compare your .git/hooks before and after running git lfs install.
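One way to make that comparison is to list the hook scripts that mention git-lfs. The sketch below is only an illustration: the function name lfs_hooks is made up for this post, and it assumes the installed hooks contain the text "git lfs" or "git-lfs", which is how they look on the versions I have seen.

```shell
# Hypothetical helper: list hook scripts in a repository that mention git-lfs.
# Usage: lfs_hooks [path-to-repo]   (defaults to the current directory)
lfs_hooks() {
    hooks_dir="${1:-.}/.git/hooks"
    # grep -l prints only the names of files that contain a match.
    # Assumption: LFS hooks invoke "git lfs ..." or test for "git-lfs".
    grep -l -E 'git[ -]lfs' "$hooks_dir"/* 2>/dev/null
}
```

Run lfs_hooks before and after git lfs install. On recent versions you would expect pre-push, post-checkout, post-commit and post-merge to show up afterwards.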

Cleanup Binaries in History

I have a repository in which I found some .m4v files tracked in Git history rather than in LFS. Using git lfs migrate, I can examine the impact of these files on the repository:

$ git lfs migrate info --everything --include="*.m4v"

migrate: Sorting commits: ..., done.
migrate: Examining commits: 100% (4/4), done.
*.m4v	289 MB	9/9 files	100%
  • info is a “dry run”; it reports sizes and rewrites nothing
  • --everything means to examine all local and remote refs, not just the current branch
  • --include="" takes a comma-separated list of patterns
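If you do not yet know which extensions are the problem, plain Git can list the largest blobs in history before you reach for git lfs migrate. This is a rough sketch rather than part of the LFS tooling: the function name largest_blobs is hypothetical, and it only looks at objects reachable from existing refs.

```shell
# Hypothetical helper: print the N largest blobs reachable from any ref,
# one per line as "<size-in-bytes> <path>". Run it inside a repository.
largest_blobs() {
    count="${1:-10}"
    # rev-list emits "<sha> <path>"; cat-file reads the sha and passes the
    # path through as %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob"' |      # drop commits and trees
        cut -d' ' -f2- |          # keep "<size> <path>"
        sort -n -r -k1,1 |        # largest first
        head -n "$count"
}
```

Calling largest_blobs 20 prints the twenty biggest blobs, which makes it easier to decide what to pass to --include.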

Now do the migration with the import command. Make sure your working copy is committed first, because this rewrites history and discards changes that are staged but not stashed:

$ git lfs migrate import --everything --include="*.m4v"

migrate: Sorting commits: ..., done.
migrate: Rewriting commits: 100% (4/4), done.
  main	214094c9155721c0eba1d4416ddd467b43905f28 -> 87b8c386d9e6d0cda2b5a6796a29fabea7baf451
migrate: Updating refs: ..., done.
migrate: checkout: ..., done.
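After the rewrite, the blobs stored in Git for those paths should be small text pointers rather than video data. A quick way to spot-check one is to look at its first line, which in the LFS pointer format is a fixed version string. The helper name is_lfs_pointer is an assumption for this sketch, not something LFS ships.

```shell
# Hypothetical helper: succeed if stdin looks like a Git LFS pointer file.
# Pointer files start with a fixed version line, followed by oid and size.
is_lfs_pointer() {
    # -x matches the whole line, -F treats the pattern as a literal string.
    head -n 1 | grep -q -x -F 'version https://git-lfs.github.com/spec/v1'
}
```

For example, git show HEAD:clip.m4v | is_lfs_pointer && echo "pointer" checks the committed blob for a placeholder path clip.m4v. Running git lfs ls-files is another way to see which files LFS now manages.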

Since we’ve rewritten history, we will likely need to force push (--force). Many forge systems protect the default branch out of the box, so you may need to toggle that protection off temporarily to make your fix.
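A slightly safer variant of the force push is --force-with-lease, which refuses to overwrite the remote branch if someone else pushed to it since your last fetch. The wrapper below is a minimal sketch: the function name push_rewritten is made up, and origin and main are assumed defaults that you should swap for your own remote and branch.

```shell
# Hypothetical helper: push a rewritten branch, refusing if the remote
# branch moved since it was last fetched.
push_rewritten() {
    remote="${1:-origin}"   # assumed default remote name
    branch="${2:-main}"     # assumed default branch name
    git push --force-with-lease "$remote" "$branch"
}
```

Call push_rewritten origin main once per rewritten branch. Anyone with an old clone will then need to re-clone or reset onto the new history.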

Bonus, a chungus .gitattributes file

Linux and Windows differ in case sensitivity, so these patterns use bracket globs to match each extension in any letter case:

# document
*.[oO][dD][tT] filter=lfs diff=lfs merge=lfs -text
*.[oO][dD][pP] filter=lfs diff=lfs merge=lfs -text
*.[oO][dD][sS] filter=lfs diff=lfs merge=lfs -text
*.[pP][pP][tT] filter=lfs diff=lfs merge=lfs -text
*.[pP][pP][tT][xX] filter=lfs diff=lfs merge=lfs -text
*.[dD][oO][cC] filter=lfs diff=lfs merge=lfs -text
*.[dD][oO][cC][xX] filter=lfs diff=lfs merge=lfs -text
*.[xX][lL][sS] filter=lfs diff=lfs merge=lfs -text
*.[xX][lL][sS][xX] filter=lfs diff=lfs merge=lfs -text
*.[pP][dD][fF] filter=lfs diff=lfs merge=lfs -text
# image
*.[jJ][pP][gG] filter=lfs diff=lfs merge=lfs -text
*.[jJ][pP][eE][gG] filter=lfs diff=lfs merge=lfs -text
*.[pP][nN][gG] filter=lfs diff=lfs merge=lfs -text
*.[tT][gG][aA] filter=lfs diff=lfs merge=lfs -text
*.[pP][aA][aA] filter=lfs diff=lfs merge=lfs -text
*.[gG][iI][fF] filter=lfs diff=lfs merge=lfs -text
*.[wW][eE][bB][pP] filter=lfs diff=lfs merge=lfs -text
*.[tT][iI][fF] filter=lfs diff=lfs merge=lfs -text
*.[tT][iI][fF][fF] filter=lfs diff=lfs merge=lfs -text
*.[rR][eE][fF] filter=lfs diff=lfs merge=lfs -text
*.[dD][nN][gG] filter=lfs diff=lfs merge=lfs -text
*.[xX][mM][pP] filter=lfs diff=lfs merge=lfs -text
*.[aA][rR][wW] filter=lfs diff=lfs merge=lfs -text
# video
*.[mM][pP]4 filter=lfs diff=lfs merge=lfs -text
*.[mM][oO][vV] filter=lfs diff=lfs merge=lfs -text
*.[wW][mM][vV] filter=lfs diff=lfs merge=lfs -text
*.[oO][gG][gG] filter=lfs diff=lfs merge=lfs -text
*.[wW][eE][bB][mM] filter=lfs diff=lfs merge=lfs -text
*.[mM][pP][gG] filter=lfs diff=lfs merge=lfs -text
*.[mM][kK][vV] filter=lfs diff=lfs merge=lfs -text
*.[aA][vV][iI] filter=lfs diff=lfs merge=lfs -text
*.[fF][lL][vV] filter=lfs diff=lfs merge=lfs -text
*.[qQ][tT][fF][fF] filter=lfs diff=lfs merge=lfs -text
*.[mM]4[vV] filter=lfs diff=lfs merge=lfs -text
# audio
*.[mM][pP]3 filter=lfs diff=lfs merge=lfs -text
# 2d
*.[pP][sS][dD] filter=lfs diff=lfs merge=lfs -text
*.[iI][nN][dD][dD] filter=lfs diff=lfs merge=lfs -text
*.[iI][nN][dD][lL] filter=lfs diff=lfs merge=lfs -text
*.[iI][nN][dD][tT] filter=lfs diff=lfs merge=lfs -text
*.[iI][nN][dD][bB] filter=lfs diff=lfs merge=lfs -text
*.[dD][rR][aA][wW][iI][oO] filter=lfs diff=lfs merge=lfs -text
# 3d
*.[bB][lL][eE][nN][dD] filter=lfs diff=lfs merge=lfs -text
*.[oO][bB][jJ] filter=lfs diff=lfs merge=lfs -text
*.[sS][pP][pP] filter=lfs diff=lfs merge=lfs -text
*.[pP]3[dD] filter=lfs diff=lfs merge=lfs -text
*.[fF][bB][xX] filter=lfs diff=lfs merge=lfs -text
*.[sS][tT][eE][pP] filter=lfs diff=lfs merge=lfs -text
# archive
*.[zZ][iI][pP] filter=lfs diff=lfs merge=lfs -text
*.7[zZ] filter=lfs diff=lfs merge=lfs -text
*.[tT][aA][xX] filter=lfs diff=lfs merge=lfs -text
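
Typing those bracket patterns by hand gets tedious, so here is a small sketch that generates one line per extension. The function name lfs_rule is hypothetical. It wraps each letter in a two-character bracket expression and leaves digits and other characters alone.

```shell
# Hypothetical helper: print a case-insensitive LFS rule for one extension.
# Usage: lfs_rule mp4
lfs_rule() {
    ext=$1
    pattern='*.'
    while [ -n "$ext" ]; do
        rest=${ext#?}          # everything after the first character
        ch=${ext%"$rest"}      # the first character itself
        ext=$rest
        case $ch in
            [A-Za-z])
                lower=$(printf '%s' "$ch" | tr 'A-Z' 'a-z')
                upper=$(printf '%s' "$ch" | tr 'a-z' 'A-Z')
                pattern="$pattern[$lower$upper]"
                ;;
            *)
                # Digits and other characters have no case; copy them as-is.
                pattern="$pattern$ch"
                ;;
        esac
    done
    printf '%s filter=lfs diff=lfs merge=lfs -text\n' "$pattern"
}
```

Something like for e in mp4 mov m4v; do lfs_rule "$e"; done >> .gitattributes appends the rules, and git check-attr filter -- SomeFile.M4V confirms that a given path picks up the lfs filter.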