GitHub cascade down main branch changes without overriding client customisation

Time:01-09

I have a Laravel project that is stored on GitHub. Its main branch is called core.

I have a client who needs this project with some modifications, so I cloned this repo, set it up for the client, and then made changes for him. The changes include some extra DB fields, different invoice formats, and some logic changes.

We often have bug fixes and changes on the core repository. How do I bring down these changes to the client's project without overriding the customization we have done for the client?

I looked at branching and merging, but this is not going to be merged: the changes I have made for the client are not temporary, they're permanent. We are going to have multiple versions of the project running simultaneously, but when we make updates in the core, we want them to cascade down. What structure should I be using?

I tried git pull, but it overwrites what I have done for the client.

Answer:

This is not—or should not be, at least—a question about Git, but rather a question about how to structure your own system to support configurable clients. But since you asked about Git instead, let's look at what Git does with git merge. (Also, questions about how to structure software are generally "too big" and unfocused for StackOverflow; consider one of the sister sites such as SoftwareEngineering.)

Remember that git pull literally means:

  1. run git fetch, then
  2. run a second Git command, by default git merge.

The first step, git fetch, obtains new commits from some other Git repository: in this case, bug fixes and changes in some repository you control or use. The goal of the second step is to make use of the new commits obtained in the first step (you may choose git rebase instead of git merge, but the goal is the same either way). We do this by combining work.

The git merge command is the primary way to combine the work done in multiple different strings-of-commits. The git rebase command instead does repeated cherry-picking of existing commits with the goal of "improving" each commit along the way; each cherry-pick is a special form of merge, so in the end you must still understand merging. Hence the correct introduction at this point is the merge. If you aren't already intimately familiar with what a Git commit is and does for you, you should go read up on that now.

Now, given that we have some sets of commits that form a graph, we can very easily get into a situation like this one:

          I--J   <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2

That is, we're "on" branch branch1, using commit J. Someone else has provided us commits K-L which we're now finding using the name branch2 (or perhaps origin/branch2, but I've drawn this as branch2 for simplicity).

While each commit has a full snapshot of every file, the two branches obviously converge at commit H. That is, as we work backwards in time from our currently-latest commit J, we step to commit I, then H, then G, and so on. Meanwhile, if we work backwards in time from their latest commit L, we step to commit K, then H, then G, and so on. This means that commit H is shared—it's on both branches, along with all of its ancestors—and we can easily choose it as the best such shared commit, since it is by definition the latest shared commit (all other shared commits are necessarily earlier than H).

Git's merge operation will find commit H—which Git calls the merge base—automatically, using the commit graph that is formed by the existence of the commits. All we have to do is tell Git:

  • look at our current commit J;
  • look at commit L;
  • find the merge base, and begin the merge process.

We do this by running git merge branch2 or git merge hash-of-L. Anything that allows Git to locate commit L suffices here: we do not have to use a branch name. We usually do use a branch name because that's easiest for us, but all Git needs is a way to find L: it works out the rest on its own.
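To make the "best shared commit" rule concrete, here is a minimal Python sketch, not Git's actual implementation, that finds the merge base for the branch1/branch2 drawing above. The PARENTS table is a hypothetical stand-in for the commit graph, using the single-letter names from the drawing, with G treated as a root for brevity. A candidate wins if both histories share it and it is not an ancestor of some other shared commit.

```python
# Hypothetical commit graph from the drawing: commit -> list of parent commits.
# G is treated as a root here; the real history would continue further back.
PARENTS: dict[str, list[str]] = {
    "G": [], "H": ["G"],
    "I": ["H"], "J": ["I"],   # branch1
    "K": ["H"], "L": ["K"],   # branch2
}


def ancestors(commit: str, parents: dict[str, list[str]]) -> set[str]:
    """The commit itself plus every commit reachable by following parent links."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_base(ours: str, theirs: str, parents: dict[str, list[str]]) -> str:
    """Return the single best shared commit, or raise if there isn't exactly one."""
    shared = ancestors(ours, parents) & ancestors(theirs, parents)
    # Keep only shared commits that are not an ancestor of another shared commit.
    best = [c for c in shared
            if not any(c != other and c in ancestors(other, parents) for other in shared)]
    if len(best) != 1:
        raise ValueError(f"expected one best merge base, found {sorted(best)}")
    return best[0]


print(merge_base("J", "L", PARENTS))  # H
```

Only the graph is consulted: the file contents of the snapshots play no part in choosing H. For real commits, git merge-base answers the same question.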

How Git performs a merge operation

Having found the merge base commit H, Git is now ready to do the merge operation. This consists of:

  • comparing every file in H to every file in J, to see what we changed in each file, if anything;
  • comparing every file in H to every file in L, to see what they changed in each file, if anything;
  • and while Git is at it, figuring out if we renamed, added, or deleted any entire files, and the same for them. Usually all three commits have the same set of files so that this extra wrinkle does not cause any heartburn, and for this particular answer, we'll just ignore the possibility.

For many files in many merges, nobody changed anything. This makes merging the file trivial: any of the three versions in the three commits will do, since all three are identical. (Git's automatic file de-duplication comes in very handy here: Git knows instantly whether the files are duplicates across two or three of the commits.)

For other files in many merges, either we changed something or they changed something, but not both: if we changed a file, they didn't touch that file, and vice versa. Again, this makes merging that file trivial: Git just needs to take whichever version changed. However, we can view this as a special case of the third and most complicated case.
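The whole-file cases amount to a short decision rule. The following Python sketch applies it to three snapshots modeled as plain dicts from path to file content; resolve_files and the sample paths are hypothetical, and, as in the text, it assumes all three commits contain the same set of files. Anything left in the second return value falls through to the third, line-by-line case.

```python
def resolve_files(base: dict[str, str], ours: dict[str, str],
                  theirs: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Whole-file three-way resolution; assumes all snapshots have the same paths."""
    resolved: dict[str, str] = {}
    needs_line_merge: list[str] = []
    for path, b in base.items():
        o, t = ours[path], theirs[path]
        if o == t:        # nobody changed it, or both sides made the identical change
            resolved[path] = o
        elif b == o:      # only they changed it: take theirs
            resolved[path] = t
        elif b == t:      # only we changed it: keep ours
            resolved[path] = o
        else:             # both changed it differently: needs the line-by-line case
            needs_line_merge.append(path)
    return resolved, needs_line_merge


base = {"README": "v1", "a.txt": "a", "b.txt": "b", "c.txt": "c"}
ours = {"README": "v1", "a.txt": "a (ours)", "b.txt": "b", "c.txt": "c (ours)"}
theirs = {"README": "v1", "a.txt": "a", "b.txt": "b (theirs)", "c.txt": "c (theirs)"}

resolved, pending = resolve_files(base, ours, theirs)
print(pending)  # ['c.txt']
```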

Last, we have the third and most complicated case: both we and they made changes to the same file. What Git does for this case is straightforward and not clever at all: Git simply combines the changes. If we deleted line 3 and they didn't do anything to line 3, Git will delete line 3. If they added a line between lines 10 and 11 and we didn't, Git will take their added line. Git repeats this process for every modification—line by line, because Git's internal git diff works on a line-by-line basis.1 As long as the changes we make to some line(s) do not touch or overlap the changes they make to other line(s), Git is able to do this line-by-line work by itself, and does so.
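Here is a minimal Python sketch of that line-by-line combining, under loud assumptions: Python's difflib.SequenceMatcher stands in for Git's own diff algorithm (so hunk boundaries may differ from Git's on real files), and combine, hunks, and MergeConflict are hypothetical names. Each side's changes are found as hunks against the base; if no two hunks overlap or touch, they are all applied to the base, and otherwise the sketch raises an exception rather than writing conflict markers.

```python
import difflib

Hunk = tuple[int, int, tuple[str, ...]]  # (base_start, base_end, replacement lines)


class MergeConflict(Exception):
    """Raised when our change and their change overlap or touch."""


def hunks(base: list[str], other: list[str]) -> list[Hunk]:
    """Line-level changes that turn base into other, as base ranges plus new lines."""
    matcher = difflib.SequenceMatcher(a=base, b=other, autojunk=False)
    return [(i1, i2, tuple(other[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


def combine(base: list[str], ours: list[str], theirs: list[str]) -> list[str]:
    # A set drops a hunk that both sides made identically; sort by base position.
    changes = sorted(set(hunks(base, ours)) | set(hunks(base, theirs)))
    for (_, end1, _), (start2, _, _) in zip(changes, changes[1:]):
        if start2 <= end1:  # overlapping, or merely adjacent ("touching")
            raise MergeConflict(f"conflict near base line {start2 + 1}")
    merged: list[str] = []
    pos = 0
    for start, end, replacement in changes:
        merged.extend(base[pos:start])  # untouched base lines before this change
        merged.extend(replacement)      # the changed lines, from whichever side
        pos = end
    merged.extend(base[pos:])
    return merged


base = ["a", "b", "c", "d", "e"]
ours = ["a", "B", "c", "d", "e"]         # we changed line 2
theirs = ["a", "b", "c", "d", "e", "f"]  # they added a line at the end
print(combine(base, ours, theirs))  # ['a', 'B', 'c', 'd', 'e', 'f']
```

With a one-line base ["the red ball"], ours ["the blue ball"], and theirs ["the red cube"], both hunks cover the same base line, so combine raises MergeConflict.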

If Git is able to resolve all files on its own, Git will normally go on to make a new merge commit on its own as well. A merge commit is the same as any other commit: it has a unique hash ID and contains a snapshot (a copy of every file, in the form it should have when extracted later) and some metadata. The only thing special about a merge commit is that its metadata lists two parent commit hash IDs instead of one.2 We can draw that here as:

          I--J
         /    \
...--G--H      M   <-- branch1 (HEAD)
         \    /
          K--L   <-- branch2

Note that, as usual, the current branch, branch1, now points to the new commit, and the new commit M points back to the commit you were on a moment ago, commit J. What's different about M is that it has a second parent, L, indicating that this commit joins two histories: one that results from starting at J and working backwards, and one that results from starting at L and working backwards.
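A merge commit differs from an ordinary commit only in its parent list. This hypothetical Python model (the Commit class and history function are illustrative, not Git's object format) builds the drawing above and shows that walking back from M, following every parent, reaches both J's history and L's history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    name: str
    parents: tuple["Commit", ...] = ()  # one parent normally, two for a merge


def history(tip: Commit) -> set[str]:
    """Names of every commit reachable from tip by following all parent links."""
    seen: set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit.name not in seen:
            seen.add(commit.name)
            stack.extend(commit.parents)
    return seen


G = Commit("G")
H = Commit("H", (G,))
I = Commit("I", (H,))
J = Commit("J", (I,))
K = Commit("K", (H,))
L = Commit("L", (K,))
M = Commit("M", (J, L))  # first parent: the commit we were on; second: theirs

branch1 = M              # the current branch name now points at M
print(sorted(history(branch1)))  # ['G', 'H', 'I', 'J', 'K', 'L', 'M']
```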


1Note that this means Git is unable to combine our changes and their changes to a binary file. If you have a conflict in a binary file, Git will refuse to help you out here.

2Technically, this is two or more, with "more" producing what Git calls an octopus merge. Octopus merges don't do anything you cannot do with ordinary merges. (In fact, except for joining up multiple branches, they do less than you can do with an ordinary merge, which is their ultimate value proposition: if you see an octopus merge in a Git history, you know that, despite the many inputs, the merge itself was simple—or as simple as it could be based on the number of inputs.)


Merge conflicts

Sometimes we and they make different changes to the same line. For instance, a line might read:

the red ball

in some file in the merge base commit H. We change this to:

the blue ball

but they change it to:

the red cube

Git has no idea how to combine these two changes. If the result should be "the blue cube", you will have to make that change yourself. Git also declares a conflict if our change and their change merely "touch", that is, affect adjacent lines, even though in some cases this might not be necessary. This rule is based on years of experience with merge algorithms: it seems to produce the result most humans find most pleasing, or at least it has done so historically.

In any case, Git will now take the combined changes—plus any conflicts—and apply the combined changes to the file from the merge base. That way, Git keeps our changes and adds theirs, or, depending on your point of view, keeps their changes and adds ours. (The result is the same either way.) If Git encountered nothing it declared as a merge conflict, it goes on to arrange for the combined-changes file to go into the next commit. Otherwise, Git leaves behind a mess:

  • Git's index (see this answer for more about the index AKA staging area) will contain all three versions of the file, from the merge base, the HEAD or --ours commit, and the other or --theirs commit;
  • the working tree copy of the file will contain Git's best effort at combining the changes, including conflict markers.
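For a one-line file like the "red ball" example, that mess can be modeled as below. This is a hypothetical Python model, not Git's real index format: the index is a dict keyed by (path, stage), using the stage numbers Git itself reports for unmerged entries (1 for the merge base, 2 for --ours, 3 for --theirs, visible with git ls-files -u), and the working-tree text uses Git's default conflict-marker layout. The path ball.txt and the label branch2 are illustrative.

```python
def conflicted_state(path: str, base: str, ours: str, theirs: str,
                     theirs_label: str) -> tuple[dict[tuple[str, int], str], str]:
    """Model the index and working-tree copy for a file whose only line conflicts.

    Assumes each version is a single line that ends with a newline.
    """
    index = {
        (path, 1): base,    # stage 1: merge base version
        (path, 2): ours,    # stage 2: HEAD / --ours version
        (path, 3): theirs,  # stage 3: other / --theirs version
    }
    # Default "merge" conflict style: ours, a separator, then theirs.
    working_tree = ("<<<<<<< HEAD\n" + ours
                    + "=======\n" + theirs
                    + ">>>>>>> " + theirs_label + "\n")
    return index, working_tree


index, text = conflicted_state("ball.txt", "the red ball\n", "the blue ball\n",
                               "the red cube\n", "branch2")
print(text, end="")
```

Setting merge.conflictStyle to diff3 makes Git also write the merge base version into the working-tree copy, between a ||||||| marker and the ======= separator.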

Your job is to come up with the correct combined file (you can do this in any way that pleases you) and then to adjust Git's index to hold the correct copy of the file. Git doesn't need the working tree copy at all, but git add tells Git: make the index version of the file match the working tree version, with the side effect of deleting from the index the extra versions that prevent committing. Most people therefore find it easiest to fix up the working tree copy and then run git add, or to use git mergetool, which is a command that:

  1. runs some tool of your choice to do the "fix up working tree file" step, then
  2. runs git add for you.

Note that this is really all that git mergetool does, so git mergetool does not add a lot of value. However, the process of extracting all three input files—merge base, --ours, and --theirs—is a bit tedious, and before git mergetool runs your choice of merge tool (vimdiff, kdiff3, Beyond Compare, or whatever else you may like), it automates this part of the job.
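The effect of git add on a conflicted path can be sketched with the same kind of hypothetical (path, stage) dict: the resolved working-tree copy goes in at stage 0, the normal merged slot, and the stage 1 to 3 entries that block a commit are removed. git_add and can_commit are illustrative names, not Git APIs.

```python
Index = dict[tuple[str, int], str]  # (path, stage) -> file content


def git_add(index: Index, path: str, working_copy: str) -> None:
    """Model 'git add path' mid-merge: drop stages 1-3, store the result at stage 0."""
    for stage in (1, 2, 3):
        index.pop((path, stage), None)
    index[(path, 0)] = working_copy


def can_commit(index: Index) -> bool:
    """The merge can be committed only when no path has a nonzero stage left."""
    return all(stage == 0 for (_path, stage) in index)


index: Index = {
    ("ball.txt", 1): "the red ball\n",
    ("ball.txt", 2): "the blue ball\n",
    ("ball.txt", 3): "the red cube\n",
}
print(can_commit(index))  # False

# We decide the right answer is "the blue cube", edit the file, then add it.
git_add(index, "ball.txt", "the blue cube\n")
print(can_commit(index))  # True
```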

Once we've resolved all conflicts, we tell Git to finish the merge, by running either git merge --continue or git commit. Git then goes on to make merge commit M as usual.

Conclusion for real merges

In any case, we now have a complete overview of what git merge does for the case where we start with:

          I--J   <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2

Git will use HEAD to locate commit J, the argument to git merge to locate commit L, and the commit graph to locate the merge base commit H. Git will then diff the snapshot in H against those in J and L as needed to find changes, combine the changes, and apply the combined changes to the files from the merge base H. If the combining goes smoothly, Git will make the resulting merge commit M on its own. If not, Git will stop in the middle of the merge, force us to finish the merge—we cannot proceed without either finishing or aborting the merge, due to the messed-up state of Git's index3—and when we do finish the merge we get the same merge commit M, this time with human intervention.


3This is a real problem. There is no proper way to deliver a partial merge to someone else, making collaborative merging difficult. Fortunately, with most smaller merges, one person can do the job.


A special case

As a special case, consider what happens if we're in the following situation:

...--G--H   <-- branch1 (HEAD)
         \
          I--J   <-- branch2

Suppose we now run git merge branch2. If Git were to follow the usual rules for merges, it would:

  • locate commits H and J as ours and theirs;
  • locate the common merge base, which is commit H again;
  • diff H vs H to see what we changed;
  • diff H vs J to see what they changed;
  • combine these changes; and
  • make a new merge commit.

The result would look like this:

...--G--H------M   <-- branch1 (HEAD)
         \    /
          I--J   <-- branch2

where I've used the letter M again to stand for "merge". But what's in the snapshot in commit M? We had Git diff commit H against commit H to see what we changed, and by definition, if we compare H against itself, nothing changed. So Git combines our "nothing" with whatever they did (presumably, something), and the resulting files necessarily exactly match all the files in commit J.

One might legitimately wonder: why bother? And in fact, by default, git merge does not bother. It detects that the merge base, commit H, is our own current commit, so there is nothing of ours to carry forward. Instead of merging, Git does a fast-forward operation (which git merge chooses, rather cheekily, to call a "fast-forward merge" even though nothing is merged): it just drags the branch name branch1 "forward", like this:

...--G--H
         \
          I--J   <-- branch1 (HEAD), branch2

There's no need for the kink in the drawing any more, so we can draw this graph like this now:

...--G--H--I--J   <-- branch1 (HEAD), branch2

The fast-forward non-merge "merge" is now complete, by virtue of simply making both branch names point to commit J.

Sometimes there is nothing to do

Suppose that we have a graph that looks like this:

...--G--H   <-- branch2
         \
          I--J   <-- branch1 (HEAD)

That is, this is like the fast-forward case, except we're on the later commit J, rather than the earlier commit H. If we run git merge branch2 now, Git will say "already up to date" and quit. There's literally nothing to do: commit H is already part of our history at commit J.

These two special cases work by finding the merge base as usual: if the merge base is one of the two end-point commits, we have a special case. The special case is either "nothing to do" (the merge base is the other commit) or "fast-forward" (the merge base is not the other commit, but is our commit). So Git will always find the merge base.
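That classification can be sketched in Python as follows. merge_kind and the PARENTS table are hypothetical, with the table matching the fast-forward drawing plus one extra commit K forked from H so that a true merge also appears. Saying "the merge base is their commit" is the same as saying their commit is an ancestor of (or equal to) ours, so the sketch tests ancestry directly.

```python
# Hypothetical graph: ...--G--H--I--J, plus K forked off H.
PARENTS: dict[str, list[str]] = {
    "G": [], "H": ["G"], "I": ["H"], "J": ["I"], "K": ["H"],
}


def ancestors(commit: str, parents: dict[str, list[str]]) -> set[str]:
    """The commit itself plus every commit reachable through parent links."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_kind(ours: str, theirs: str, parents: dict[str, list[str]]) -> str:
    if theirs in ancestors(ours, parents):   # merge base is their commit
        return "already up to date"
    if ours in ancestors(theirs, parents):   # merge base is our commit
        return "fast-forward"
    return "true merge"                      # merge base is some third commit


print(merge_kind("H", "J", PARENTS))  # fast-forward
print(merge_kind("J", "H", PARENTS))  # already up to date
print(merge_kind("J", "K", PARENTS))  # true merge
```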

The last special case

There's a final special case, which with any luck you will never encounter. Suppose we have a graph like this one:

...--o--A---M1--o--L   <-- branch1 (HEAD)
         \ /
          X
         / \
...--o--B---M2--o--R   <-- branch2

where o represents a commit (or any number of commits) that aren't interesting, and the two M commits are two merges whose input commits are A and B (plus some merge base, not shown here, that Git found automatically).

If we run git merge branch2 now to combine work in L and R, commits A and B are both "equally good" commits as merge base candidates. Both commits are on both branches and neither one is "further away from the end" (and if we use the usual lowest common ancestor algorithm, both commit hash IDs will come out of it, in an undetermined order).
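A small Python sketch shows why there is no single answer here. The PARENTS table is a hypothetical simplification of the drawing: the uninteresting o commits are left out and A and B are treated as roots. best_common_ancestors (an illustrative name) keeps every shared commit that is not an ancestor of another shared commit, which in this graph leaves two. For real commits, git merge-base --all prints all such candidates.

```python
# Simplified criss-cross graph: M1 and M2 each merge A and B, in opposite order.
PARENTS: dict[str, list[str]] = {
    "A": [], "B": [],
    "M1": ["A", "B"], "M2": ["B", "A"],
    "L": ["M1"], "R": ["M2"],
}


def ancestors(commit: str, parents: dict[str, list[str]]) -> set[str]:
    """The commit itself plus every commit reachable through parent links."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def best_common_ancestors(a: str, b: str, parents: dict[str, list[str]]) -> set[str]:
    """Shared commits that are not ancestors of any other shared commit."""
    shared = ancestors(a, parents) & ancestors(b, parents)
    return {c for c in shared
            if not any(c != other and c in ancestors(other, parents) for other in shared)}


print(sorted(best_common_ancestors("L", "R", PARENTS)))  # ['A', 'B']
```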

Git has multiple ways to deal with this. The default strategy before Git 2.34, merge-recursive, is to merge the merge bases A and B to produce a temporary commit, then use that temporary commit as the merge base to merge L and R. In Git 2.34, a new default algorithm, merge-ort, tries to do the same thing as merge-recursive, but without as much craziness and wasted effort.4 I have not yet studied the new algorithm myself and hence will not attempt to explain it here.


4"Normal", non-recursive merges occur mostly in Git's index, with the working tree files occasionally used for scratch storage. The pre-2.34 merge-recursive code performs each inner merge using the same merge-recursive code and literally makes a commit—or at least a tree object—from the result, adding to the index files containing conflict markers if needed. This gives the "outer" merge a proper merge base but means that the merge conflicts propagate forward, and it means that sites like GitHub cannot use this code. The new merge-ort code does the entire merge in memory and in Git's index, without using scratch files, and—as I understand things—handles the recursion directly as well, enabling several new features and having as a goal the ability to use this code with hosting sites like GitHub.
