AWS CodeCommit is no longer available to new customers. Existing customers of AWS CodeCommit can continue to use the service as normal.
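Migrate a repository incrementally
When migrating to AWS CodeCommit, consider pushing your repository in increments or chunks to reduce the chance that an intermittent network issue or degraded network performance causes the entire push to fail. By using incremental pushes with a script like the one included here, you can restart the migration and push only those commits that did not succeed on the earlier attempt.
The procedures in this topic show you how to create and run a script that migrates your repository in increments and repushes only those increments that did not succeed, until the migration is complete.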
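Step 0: Determine whether to migrate incrementally
Several factors determine the overall size of your repository and whether you should migrate it incrementally. The most obvious is the overall size of the artifacts in the repository. The accumulated history of the repository also contributes to its size: a repository with years of history and many branches can be very large even though the individual assets are not. A number of strategies can make migrating such repositories simpler and more efficient. For example, you can use a shallow clone strategy when cloning a repository with a long development history, or you can turn off delta compression for large binary files. You can research these options in the Git documentation, or you can set up and configure incremental pushes to migrate your repository by using the sample script included in this topic, incremental-repo-migration.py.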
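You might want to configure incremental pushes if one or more of the following conditions is true:
- The repository you want to migrate has more than five years of history.
- Your internet connection is subject to intermittent outages, dropped packets, slow response, or other interruptions in service.
- The overall size of the repository is larger than 2 GB and you intend to migrate the entire repository.
- The repository contains large artifacts or binaries that do not compress well, such as large image files with more than five tracked versions.
- You have previously attempted a migration to CodeCommit and received an "Internal Service Error" message.
Even if none of the above conditions is true, you can still choose to push incrementally.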
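The checklist above can be expressed as a small helper. This is only a sketch: the function name and its parameters are hypothetical, and you would have to measure the inputs (history age, repository size, and so on) yourself. The thresholds are the ones listed in this topic.

```python
def incremental_push_reasons(
    history_years,
    size_gb,
    unreliable_network=False,
    poorly_compressed_binaries=False,
    previous_internal_service_error=False,
):
    """Return the checklist conditions from this topic that a repository meets.

    All parameter names are illustrative; nothing here inspects a real repository.
    """
    reasons = []
    if history_years > 5:
        reasons.append("more than five years of history")
    if unreliable_network:
        reasons.append("unreliable internet connection")
    if size_gb > 2:
        reasons.append("larger than 2 GB")
    if poorly_compressed_binaries:
        reasons.append("large artifacts or binaries that do not compress well")
    if previous_internal_service_error:
        reasons.append("earlier migration returned an Internal Service Error")
    return reasons


# Example: a small repository with seven years of history meets one condition.
print(incremental_push_reasons(history_years=7, size_gb=1.2))
```

An empty list does not mean incremental pushing is a bad idea. As noted above, you can still choose to push incrementally when no condition applies.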
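Step 1: Install prerequisites and add the CodeCommit repository as a remote
You can create your own custom script, which will have its own prerequisites. If you use the sample included in this topic, you must:
- Install its prerequisites.
- Clone the repository to your local computer.
- Add the CodeCommit repository as a remote for the repository you want to migrate.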
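Set up to run incremental-repo-migration.py
- On your local computer, install Python 2.6 or later. For more information and the latest versions, see the Python website.
- On the same computer, install GitPython, a Python library used to interact with Git repositories. For more information, see the GitPython documentation.
- Use the git clone --mirror command to clone the repository you want to migrate to your local computer. From the terminal (Linux, macOS, or Unix) or the command prompt (Windows), run git clone --mirror to create a local repo for the repository, and include the directory where you want to create the local repo. For example, to clone a Git repository named MyMigrationRepo with a URL of https://example.com/my-repo/ to a directory named my-repo:
  git clone --mirror https://example.com/my-repo/MyMigrationRepo.git my-repo
  You should see output similar to the following, which indicates the repository has been cloned into a bare local repo named my-repo:
  Cloning into bare repository 'my-repo'...
  remote: Counting objects: 20, done.
  remote: Compressing objects: 100% (17/17), done.
  remote: Total 20 (delta 5), reused 15 (delta 3)
  Unpacking objects: 100% (20/20), done.
  Checking connectivity... done.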
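- Change directories to the local repo for the repository you just cloned (for example, my-repo). From that directory, use the git remote add DefaultRemoteName RemoteRepositoryURL command to add the CodeCommit repository as a remote repository for the local repo.
  Note
  When pushing large repositories, consider using SSH instead of HTTPS. When you push a large change, a large number of changes, or a large repository, long-running HTTPS connections are often terminated prematurely because of networking issues or firewall settings. For more information about setting up CodeCommit for SSH, see the instructions for SSH connections on Linux, macOS, or Unix or for SSH connections on Windows.
  For example, use the following command to add the SSH endpoint of a CodeCommit repository named MyDestinationRepo as a remote named codecommit:
  git remote add codecommit ssh://git-codecommit.us-east-2.amazonaws.com/v1/repos/MyDestinationRepo
  Tip
  Because this is a clone, the default remote name (origin) is already in use. You must use another remote name. Although the example uses codecommit, you can use any name you want. Use the git remote show command to review the list of remotes set for your local repo.
- Use the git remote -v command to display the fetch and push settings for your local repo and confirm they are set correctly. For example:
  codecommit ssh://git-codecommit.us-east-2.amazonaws.com/v1/repos/MyDestinationRepo (fetch)
  codecommit ssh://git-codecommit.us-east-2.amazonaws.com/v1/repos/MyDestinationRepo (push)
  Tip
  If you still see fetch and push entries for a different remote repository (for example, entries for origin), use the git remote set-url --delete command to remove them.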
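If you prefer to check the remote configuration programmatically, the following sketch parses the text printed by git remote -v. It assumes you have already captured that output as a string (for example, with the subprocess module). The function names parse_remote_verbose and unexpected_remotes are hypothetical and are not part of the sample script.

```python
def parse_remote_verbose(output):
    """Parse `git remote -v` text into {remote_name: {"fetch": url, "push": url}}."""
    remotes = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, url, kind = parts
        # kind looks like "(fetch)" or "(push)"; drop the parentheses
        remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes


def unexpected_remotes(remotes, expected_name="codecommit"):
    """Return the names of remotes other than the one you intend to push to."""
    return sorted(name for name in remotes if name != expected_name)


sample_output = (
    "codecommit ssh://git-codecommit.us-east-2.amazonaws.com/v1/repos/MyDestinationRepo (fetch)\n"
    "codecommit ssh://git-codecommit.us-east-2.amazonaws.com/v1/repos/MyDestinationRepo (push)\n"
)
remotes = parse_remote_verbose(sample_output)
print(sorted(remotes["codecommit"]))  # the kinds of entries found for the remote
print(unexpected_remotes(remotes))    # any leftover remotes, such as origin
```

A non-empty result from unexpected_remotes suggests there are still entries for another remote that you might want to remove, as described in the tip above.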
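Step 2: Create the script to use for migrating incrementally
These steps assume that you are using the incremental-repo-migration.py sample script.
- Open a text editor and paste the contents of the sample script into an empty document.
- Save the document in a documents directory (not the working directory of your local repo) and name it incremental-repo-migration.py. Make sure the directory you choose is one configured in your local environment or path variables, so that you can run the Python script from a command line or terminal.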
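Step 3: Run the script and migrate incrementally to CodeCommit
Now that you have created your incremental-repo-migration.py script, you can use it to incrementally migrate a local repo to a CodeCommit repository. By default, the script pushes commits in batches of 1,000 commits and uses the Git settings of the directory from which it is run as the settings for the local repo and the remote repository. If necessary, you can use the options included in incremental-repo-migration.py to configure other settings.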
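- From the terminal or command prompt, change directories to the local repo you want to migrate.
- From that directory, run the following command:
  python incremental-repo-migration.py
- The script runs and shows progress at the terminal or command prompt. Some large repositories are slow to show progress. The script stops if a single push fails three times. You can then rerun the script, and it starts from the batch that failed. You can rerun the script until all pushes succeed and the migration is complete.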
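The stop-after-three-failures behavior described in the last step can be sketched in isolation. This is a simplified illustration, not the sample script's code: push_once is a hypothetical callable standing in for one push of one batch, and the broad except clause stands in for the GitCommandError that the sample script catches.

```python
def push_with_retries(push_once, retry_limit=3):
    """Call push_once() until it returns True or retry_limit attempts are used.

    Returns the number of attempts used on success.
    Raises RuntimeError when every attempt fails.
    """
    for attempt in range(1, retry_limit + 1):
        try:
            if push_once():
                return attempt
        except Exception as err:  # stand-in for a Git or network failure
            print("attempt %d failed: %s" % (attempt, err))
    raise RuntimeError("push failed after %d attempts" % retry_limit)


# Example: a push that succeeds on its third attempt.
attempts_seen = []


def flaky_push():
    attempts_seen.append(1)
    return len(attempts_seen) >= 3


print(push_with_retries(flaky_push))
```

Raising an error instead of retrying forever matches the behavior described above: the run stops, and a later run can pick up at the batch that failed.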
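Tip
You can run incremental-repo-migration.py from any directory as long as you use the -l and -r options to specify the local and remote settings to use. For example, to use the script from any directory to migrate a local repo located at /tmp/my-repo to a remote nicknamed codecommit:
python incremental-repo-migration.py -l "/tmp/my-repo" -r "codecommit"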
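You might also want to use the -b option to change the default batch size used when pushing incrementally. For example, if you regularly push a repository with very large binary files that change often, and you work from a location with restricted network bandwidth, you might want to use the -b option to change the batch size to 500 instead of 1,000:
python incremental-repo-migration.py -b 500
This pushes the local repo incrementally in batches of 500 commits. If you decide to change the batch size again while migrating the repository (for example, to decrease it after an unsuccessful attempt), remember to use the -c option to remove the batch tags before you reset the batch size with -b:
python incremental-repo-migration.py -c
python incremental-repo-migration.py -b 250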
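To see why a different batch size produces different batch tags, consider the following simplified sketch. It assumes a strictly linear history given as a list of commit IDs, oldest first. The function name batch_boundaries is hypothetical. The sample script is more involved: it also walks merge commits and allows a 5 percent tolerance on the batch size, both of which this sketch leaves out.

```python
def batch_boundaries(commit_ids, batch_size=1000):
    """Return the last commit of each batch for a linear, oldest-first history."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # every batch_size-th commit closes a batch
    boundaries = [
        commit_ids[index]
        for index in range(batch_size - 1, len(commit_ids), batch_size)
    ]
    # the newest commit always closes the final, possibly shorter, batch
    if commit_ids and (not boundaries or boundaries[-1] != commit_ids[-1]):
        boundaries.append(commit_ids[-1])
    return boundaries


history = ["c%d" % number for number in range(1, 11)]  # c1 (oldest) .. c10
print(batch_boundaries(history, batch_size=4))
print(batch_boundaries(history, batch_size=5))
```

Because the boundary commits move when the batch size changes, tags created for one batch size no longer line up with the batches computed for another.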
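Important
Do not use the -c option if you want to rerun the script after a failure. The -c option removes the tags used to batch the commits. Use the -c option only if you want to change the batch size and start again, or if you decide you no longer want to use the script.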
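The role of the batch tags in resuming a migration can be sketched as follows. This mirrors, in simplified form, the check in the sample script's tag_commits method: a batch whose tag already exists on the remote is skipped, and a tag that points at a different commit means the batches have changed. The function name plan_resume and its inputs are hypothetical.

```python
MIGRATION_TAG_PREFIX = "codecommit_migration_"  # same prefix as the sample script


def plan_resume(batch_commits, remote_tags):
    """Work out which batch tags still need to be pushed.

    batch_commits: commit IDs that close each batch, in batch order.
    remote_tags: dict of tag name -> commit ID already present on the remote.
    Returns a list of (tag_name, commit_id) pairs that are still pending.
    Raises ValueError if a remote tag points at a different commit.
    """
    pending = []
    for number, commit_id in enumerate(batch_commits, start=1):
        tag_name = MIGRATION_TAG_PREFIX + str(number)
        if tag_name not in remote_tags:
            pending.append((tag_name, commit_id))
        elif remote_tags[tag_name] != commit_id:
            raise ValueError(
                "remote migration tags do not match; clean up the tags and start again"
            )
    return pending


# Example: the first batch was pushed on an earlier run, so two batches remain.
already_pushed = {MIGRATION_TAG_PREFIX + "1": "aaa111"}
print(plan_resume(["aaa111", "bbb222", "ccc333"], already_pushed))
```

This is why removing the tags with -c after a failure discards the record of which batches were already pushed.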
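Appendix: Sample script incremental-repo-migration.py
For your convenience, we have developed a sample Python script, incremental-repo-migration.py, for pushing a repository incrementally. This script is an open source code sample and is provided as-is.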
# Copyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Amazon Software License (the "License").
# You may not use this file except in compliance with the License. A copy of the License is located at
# http://aws.amazon.com/asl/
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied. See the License for
# the specific language governing permissions and limitations under the License.

#!/usr/bin/env python

import os
import sys
from optparse import OptionParser
from git import Repo, TagReference, RemoteProgress, GitCommandError


class PushProgressPrinter(RemoteProgress):
    def update(self, op_code, cur_count, max_count=None, message=""):
        op_id = op_code & self.OP_MASK
        stage_id = op_code & self.STAGE_MASK
        if op_id == self.WRITING and stage_id == self.BEGIN:
            print("\tObjects: %d" % max_count)


class RepositoryMigration:
    MAX_COMMITS_TOLERANCE_PERCENT = 0.05
    PUSH_RETRY_LIMIT = 3
    MIGRATION_TAG_PREFIX = "codecommit_migration_"

    def migrate_repository_in_parts(
        self, repo_dir, remote_name, commit_batch_size, clean
    ):
        self.next_tag_number = 0
        self.migration_tags = []
        self.walked_commits = set()
        self.local_repo = Repo(repo_dir)
        self.remote_name = remote_name
        self.max_commits_per_push = commit_batch_size
        self.max_commits_tolerance = (
            self.max_commits_per_push * self.MAX_COMMITS_TOLERANCE_PERCENT
        )

        try:
            self.remote_repo = self.local_repo.remote(remote_name)
            self.get_remote_migration_tags()
        except (ValueError, GitCommandError):
            print(
                "Could not contact the remote repository. The most common reasons for this error are that the name of the remote repository is incorrect, or that you do not have permissions to interact with that remote repository."
            )
            sys.exit(1)

        if clean:
            self.clean_up(clean_up_remote=True)
            return

        self.clean_up()

        print("Analyzing repository")
        head_commit = self.local_repo.head.commit
        sys.setrecursionlimit(max(sys.getrecursionlimit(), head_commit.count()))

        # tag commits on default branch
        leftover_commits = self.migrate_commit(head_commit)
        self.tag_commits([commit for (commit, commit_count) in leftover_commits])

        # tag commits on each branch
        for branch in self.local_repo.heads:
            leftover_commits = self.migrate_commit(branch.commit)
            self.tag_commits([commit for (commit, commit_count) in leftover_commits])

        # push the tags
        self.push_migration_tags()

        # push all branch references
        for branch in self.local_repo.heads:
            print("Pushing branch %s" % branch.name)
            self.do_push_with_retries(ref=branch.name)

        # push all tags
        print("Pushing tags")
        self.do_push_with_retries(push_tags=True)

        self.get_remote_migration_tags()
        self.clean_up(clean_up_remote=True)

        print("Migration to CodeCommit was successful")

    def migrate_commit(self, commit):
        if commit in self.walked_commits:
            return []

        pending_ancestor_pushes = []
        commit_count = 1

        if len(commit.parents) > 1:
            # This is a merge commit
            # Ensure that all parents are pushed first
            for parent_commit in commit.parents:
                pending_ancestor_pushes.extend(self.migrate_commit(parent_commit))
        elif len(commit.parents) == 1:
            # Split linear history into individual pushes
            next_ancestor, commits_to_next_ancestor = self.find_next_ancestor_for_push(
                commit.parents[0]
            )
            commit_count += commits_to_next_ancestor
            pending_ancestor_pushes.extend(self.migrate_commit(next_ancestor))

        self.walked_commits.add(commit)

        return self.stage_push(commit, commit_count, pending_ancestor_pushes)

    def find_next_ancestor_for_push(self, commit):
        commit_count = 0

        # Traverse linear history until we reach our commit limit, a merge commit, or an initial commit
        while (
            len(commit.parents) == 1
            and commit_count < self.max_commits_per_push
            and commit not in self.walked_commits
        ):
            commit_count += 1
            self.walked_commits.add(commit)
            commit = commit.parents[0]

        return commit, commit_count

    def stage_push(self, commit, commit_count, pending_ancestor_pushes):
        # Determine whether we can roll up pending ancestor pushes into this push
        combined_commit_count = commit_count + sum(
            ancestor_commit_count
            for (ancestor, ancestor_commit_count) in pending_ancestor_pushes
        )

        if combined_commit_count < self.max_commits_per_push:
            # don't push anything, roll up all pending ancestor pushes into this pending push
            return [(commit, combined_commit_count)]

        if combined_commit_count <= (
            self.max_commits_per_push + self.max_commits_tolerance
        ):
            # roll up everything into this commit and push
            self.tag_commits([commit])
            return []

        if commit_count >= self.max_commits_per_push:
            # need to push each pending ancestor and this commit
            self.tag_commits(
                [
                    ancestor
                    for (ancestor, ancestor_commit_count) in pending_ancestor_pushes
                ]
            )
            self.tag_commits([commit])
            return []

        # push each pending ancestor, but roll up this commit
        self.tag_commits(
            [ancestor for (ancestor, ancestor_commit_count) in pending_ancestor_pushes]
        )
        return [(commit, commit_count)]

    def tag_commits(self, commits):
        for commit in commits:
            self.next_tag_number += 1
            tag_name = self.MIGRATION_TAG_PREFIX + str(self.next_tag_number)

            if tag_name not in self.remote_migration_tags:
                tag = self.local_repo.create_tag(tag_name, ref=commit)
                self.migration_tags.append(tag)
            elif self.remote_migration_tags[tag_name] != str(commit):
                print(
                    "Migration tags on the remote do not match the local tags. Most likely your batch size has changed since the last time you ran this script. Please run this script with the --clean option, and try again."
                )
                sys.exit(1)

    def push_migration_tags(self):
        print("Will attempt to push %d tags" % len(self.migration_tags))
        self.migration_tags.sort(
            key=lambda tag: int(tag.name.replace(self.MIGRATION_TAG_PREFIX, ""))
        )
        for tag in self.migration_tags:
            print(
                "Pushing tag %s (out of %d tags), commit %s"
                % (tag.name, self.next_tag_number, str(tag.commit))
            )
            self.do_push_with_retries(ref=tag.name)

    def do_push_with_retries(self, ref=None, push_tags=False):
        for i in range(0, self.PUSH_RETRY_LIMIT):
            if i == 0:
                progress_printer = PushProgressPrinter()
            else:
                progress_printer = None

            try:
                if push_tags:
                    infos = self.remote_repo.push(tags=True, progress=progress_printer)
                elif ref is not None:
                    infos = self.remote_repo.push(
                        refspec=ref, progress=progress_printer
                    )
                else:
                    infos = self.remote_repo.push(progress=progress_printer)

                success = True
                if len(infos) == 0:
                    success = False
                else:
                    for info in infos:
                        if (
                            info.flags & info.UP_TO_DATE
                            or info.flags & info.NEW_TAG
                            or info.flags & info.NEW_HEAD
                        ):
                            continue
                        success = False
                        print(info.summary)

                if success:
                    return
            except GitCommandError as err:
                print(err)

        if push_tags:
            print("Pushing all tags failed after %d attempts" % (self.PUSH_RETRY_LIMIT))
        elif ref is not None:
            print("Pushing %s failed after %d attempts" % (ref, self.PUSH_RETRY_LIMIT))
            print(
                "For more information about the cause of this error, run the following command from the local repo: 'git push %s %s'"
                % (self.remote_name, ref)
            )
        else:
            print(
                "Pushing all branches failed after %d attempts"
                % (self.PUSH_RETRY_LIMIT)
            )
        sys.exit(1)

    def get_remote_migration_tags(self):
        remote_tags_output = self.local_repo.git.ls_remote(
            self.remote_name, tags=True
        ).split("\n")
        self.remote_migration_tags = dict(
            (tag.split()[1].replace("refs/tags/", ""), tag.split()[0])
            for tag in remote_tags_output
            if self.MIGRATION_TAG_PREFIX in tag
        )

    def clean_up(self, clean_up_remote=False):
        tags = [
            tag
            for tag in self.local_repo.tags
            if tag.name.startswith(self.MIGRATION_TAG_PREFIX)
        ]

        # delete the local tags
        TagReference.delete(self.local_repo, *tags)

        # delete the remote tags
        if clean_up_remote:
            tags_to_delete = [":" + tag_name for tag_name in self.remote_migration_tags]
            self.remote_repo.push(refspec=tags_to_delete)


parser = OptionParser()
parser.add_option(
    "-l",
    "--local",
    action="store",
    dest="localrepo",
    default=os.getcwd(),
    help="The path to the local repo. If this option is not specified, the script will attempt to use current directory by default. If it is not a local git repo, the script will fail.",
)
parser.add_option(
    "-r",
    "--remote",
    action="store",
    dest="remoterepo",
    default="codecommit",
    help="The name of the remote repository to be used as the push or migration destination. The remote must already be set in the local repo ('git remote add ...'). If this option is not specified, the script will use 'codecommit' by default.",
)
parser.add_option(
    "-b",
    "--batch",
    action="store",
    dest="batchsize",
    default="1000",
    help="Specifies the commit batch size for pushes. If not explicitly set, the default is 1,000 commits.",
)
parser.add_option(
    "-c",
    "--clean",
    action="store_true",
    dest="clean",
    default=False,
    help="Remove the temporary tags created by migration from both the local repo and the remote repository. This option will not do any migration work, just cleanup. Cleanup is done automatically at the end of a successful migration, but not after a failure so that when you re-run the script, the tags from the prior run can be used to identify commit batches that were not pushed successfully.",
)

(options, args) = parser.parse_args()
migration = RepositoryMigration()
migration.migrate_repository_in_parts(
    options.localrepo, options.remoterepo, int(options.batchsize), options.clean
)