Cleanup Binaries in Git History
April 8, 2022
It is considered bad etiquette to track binary objects without LFS. Every version of every binary stays in the history, which significantly degrades the performance of the repository for all participants.
Git Large File Storage (LFS) replaces large files with small text pointers inside Git and stores the actual file contents outside the normal object history, typically on an LFS server, which prevents significant growth of the history. LFS was created by GitHub and is an open source project. There are other systems for handling large binary data with Git, such as git-annex, but LFS has gained the most traction.
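To make the "text pointer" idea concrete, here is a minimal sketch that prints what an LFS pointer for a file looks like: a version line, the SHA-256 of the content, and the size in bytes. The helper name lfs_pointer and the temp file are my own inventions for illustration; in a real repository git lfs writes these pointers for you.

```shell
#!/bin/sh
# Hypothetical helper (not part of git lfs): print an LFS-style pointer for a file.
lfs_pointer() {
    file=$1
    # Use whichever SHA-256 tool is available on this system.
    if command -v sha256sum >/dev/null 2>&1; then
        oid=$(sha256sum < "$file" | cut -d ' ' -f 1)
    else
        oid=$(shasum -a 256 < "$file" | cut -d ' ' -f 1)
    fi
    # Some wc implementations pad the count with spaces, so strip them.
    size=$(wc -c < "$file" | tr -d ' ')
    printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}

# Demo on a throwaway six-byte file.
demo=$(mktemp)
printf 'hello\n' > "$demo"
lfs_pointer "$demo"
rm -f "$demo"
```

The pointer is a few lines of text no matter how large the real file is, which is why the Git history stays small.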
To use LFS you should typically start with a new repository. Adding LFS to an existing repository involves rewriting history if you’ve already tracked binary files.
$ git lfs install
Updated git hooks.
Git LFS initialized.
This makes changes in the repository's .git folder and adds some configuration to your ~/.gitconfig. Your ~/.gitconfig naturally persists across all projects (user-level Git configuration), but the LFS hooks have to be created in each repository to “install” LFS properly there. To peer in a little more, compare the contents of .git/hooks before and after running git lfs install.
Cleanup Binaries in History
I have a repository where I found some .m4v files tracked in Git history rather than in LFS. Using git lfs migrate, I can examine the impact of these files on the repository:
$ git lfs migrate info --everything --include="*.m4v"
migrate: Sorting commits: ..., done.
migrate: Examining commits: 100% (4/4), done.
*.m4v 289 MB 9/9 files 100%
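info is a “dry run”: it reports sizes without changing anything
--everything means to examine all local and remote refs, not just the current branch
--include="" is a comma-separated list of patterns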
Now do the migration with the import command. Make sure your work is committed or stashed first, because this rewrites history and discards changes that are staged but not stashed:
$ git lfs migrate import --everything --include="*.m4v"
migrate: Sorting commits: ..., done.
migrate: Rewriting commits: 100% (4/4), done.
main 214094c9155721c0eba1d4416ddd467b43905f28 -> 87b8c386d9e6d0cda2b5a6796a29fabea7baf451
migrate: Updating refs: ..., done.
migrate: checkout: ..., done.
Since we’ve rewritten history, we’re likely going to need to --force push. Many forge systems protect the default branch against force pushes, so you may need to toggle that protection off temporarily to make your fix.
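Here is a rough sketch of that force push, run against a throwaway local "remote" so it is safe to try. Everything in it is made up for illustration: the sandbox directory, the file name, and the use of commit --amend as a stand-in for the rewrite that git lfs migrate import performs. I've also swapped plain --force for --force-with-lease, which only overwrites the remote branch if it still points where you last saw it.

```shell
#!/bin/sh
set -e

# Throwaway sandbox: a bare "remote" and a working repository (hypothetical paths).
sandbox=$(mktemp -d)
git init --quiet --bare "$sandbox/remote.git"
git init --quiet "$sandbox/work"
cd "$sandbox/work"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"
git remote add origin "$sandbox/remote.git"

# Publish an initial commit.
echo one > file.txt
git add file.txt
git commit --quiet -m "first"
git push --quiet origin main

# Rewrite history locally; this stands in for `git lfs migrate import`.
git commit --quiet --amend -m "first, rewritten"

# A plain `git push` would be rejected as non-fast-forward at this point.
# --force-with-lease replaces the remote branch only if nobody moved it meanwhile.
git push --quiet --force-with-lease origin main

local_head=$(git rev-parse main)
remote_head=$(git --git-dir="$sandbox/remote.git" rev-parse main)
echo "local:  $local_head"
echo "remote: $remote_head"
```

After the push both hashes should match. On a real forge the same push fails while branch protection is still on, so that toggle comes first.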
Bonus, a chungus .gitattributes file
Linux and Windows differ in filename case sensitivity, so this file uses character-class globs to match each extension regardless of letter case:
# document
*.[oO][dD][tT] filter=lfs diff=lfs merge=lfs -text
*.[oO][dD][pP] filter=lfs diff=lfs merge=lfs -text
*.[oO][dD][sS] filter=lfs diff=lfs merge=lfs -text
*.[pP][pP][tT] filter=lfs diff=lfs merge=lfs -text
*.[pP][pP][tT][xX] filter=lfs diff=lfs merge=lfs -text
*.[dD][oO][cC] filter=lfs diff=lfs merge=lfs -text
*.[dD][oO][cC][xX] filter=lfs diff=lfs merge=lfs -text
*.[xX][lL][sS] filter=lfs diff=lfs merge=lfs -text
*.[xX][lL][sS][xX] filter=lfs diff=lfs merge=lfs -text
*.[pP][dD][fF] filter=lfs diff=lfs merge=lfs -text
# image
*.[jJ][pP][gG] filter=lfs diff=lfs merge=lfs -text
*.[jJ][pP][eE][gG] filter=lfs diff=lfs merge=lfs -text
*.[pP][nN][gG] filter=lfs diff=lfs merge=lfs -text
*.[tT][gG][aA] filter=lfs diff=lfs merge=lfs -text
*.[pP][aA][aA] filter=lfs diff=lfs merge=lfs -text
*.[gG][iI][fF] filter=lfs diff=lfs merge=lfs -text
*.[wW][eE][bB][pP] filter=lfs diff=lfs merge=lfs -text
*.[tT][iI][fF] filter=lfs diff=lfs merge=lfs -text
*.[tT][iI][fF][fF] filter=lfs diff=lfs merge=lfs -text
*.[rR][eE][fF] filter=lfs diff=lfs merge=lfs -text
*.[dD][nN][gG] filter=lfs diff=lfs merge=lfs -text
*.[xX][mM][pP] filter=lfs diff=lfs merge=lfs -text
*.[aA][rR][wW] filter=lfs diff=lfs merge=lfs -text
# video
*.[mM][pP]4 filter=lfs diff=lfs merge=lfs -text
*.[mM][oO][vV] filter=lfs diff=lfs merge=lfs -text
*.[wW][mM][vV] filter=lfs diff=lfs merge=lfs -text
*.[oO][gG][gG] filter=lfs diff=lfs merge=lfs -text
*.[wW][eE][bB][mM] filter=lfs diff=lfs merge=lfs -text
*.[mM][pP][gG] filter=lfs diff=lfs merge=lfs -text
*.[mM][kK][vV] filter=lfs diff=lfs merge=lfs -text
*.[aA][vV][iI] filter=lfs diff=lfs merge=lfs -text
*.[fF][lL][vV] filter=lfs diff=lfs merge=lfs -text
*.[qQ][tT][fF][fF] filter=lfs diff=lfs merge=lfs -text
*.[mM]4[vV] filter=lfs diff=lfs merge=lfs -text
# audio
*.[mM][pP]3 filter=lfs diff=lfs merge=lfs -text
# 2d
*.[pP][sS][dD] filter=lfs diff=lfs merge=lfs -text
*.[iI][nN][dD][dD] filter=lfs diff=lfs merge=lfs -text
*.[iI][nN][dD][lL] filter=lfs diff=lfs merge=lfs -text
*.[iI][nN][dD][tT] filter=lfs diff=lfs merge=lfs -text
*.[iI][nN][dD][bB] filter=lfs diff=lfs merge=lfs -text
*.[dD][rR][aA][wW][iI][oO] filter=lfs diff=lfs merge=lfs -text
# 3d
*.[bB][lL][eE][nN][dD] filter=lfs diff=lfs merge=lfs -text
*.[oO][bB][jJ] filter=lfs diff=lfs merge=lfs -text
*.[sS][pP][pP] filter=lfs diff=lfs merge=lfs -text
*.[pP]3[dD] filter=lfs diff=lfs merge=lfs -text
*.[fF][bB][xX] filter=lfs diff=lfs merge=lfs -text
*.[sS][tT][eE][pP] filter=lfs diff=lfs merge=lfs -text
# archive
*.[zZ][iI][pP] filter=lfs diff=lfs merge=lfs -text
*.7[zZ] filter=lfs diff=lfs merge=lfs -text
*.[tT][aA][xX] filter=lfs diff=lfs merge=lfs -text
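Typing those bracket patterns by hand gets tedious, so here is a small sketch that generates one rule per extension and then asks Git whether the rule matches both lower and upper case names. The function name lfs_rule, the temp repository, and the clip file names are all hypothetical; git check-attr only inspects paths, so the files do not need to exist.

```shell
#!/bin/sh
# Hypothetical helper: print a case-insensitive LFS rule for one file extension.
lfs_rule() {
    ext=$1
    pattern='*.'
    while [ -n "$ext" ]; do
        rest=${ext#?}          # everything after the first character
        ch=${ext%"$rest"}      # the first character on its own
        lower=$(printf '%s' "$ch" | tr '[:upper:]' '[:lower:]')
        upper=$(printf '%s' "$ch" | tr '[:lower:]' '[:upper:]')
        if [ "$lower" = "$upper" ]; then
            pattern="${pattern}${ch}"                  # digits stay as they are
        else
            pattern="${pattern}[${lower}${upper}]"     # letters become [xX]
        fi
        ext=$rest
    done
    printf '%s filter=lfs diff=lfs merge=lfs -text\n' "$pattern"
}

lfs_rule m4v   # prints: *.[mM]4[vV] filter=lfs diff=lfs merge=lfs -text
lfs_rule 7z    # prints: *.7[zZ] filter=lfs diff=lfs merge=lfs -text

# Check the rule in a throwaway repository.
repo=$(mktemp -d)
git init --quiet "$repo"
lfs_rule m4v > "$repo/.gitattributes"
git -C "$repo" check-attr filter -- clip.m4v Clip.M4V
```

The last command should report filter: lfs for both spellings, which is the point of the bracket globs.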