Our GitHub Practices

GitHub hosts our machine learning codebase and is the primary place where our machine learning engineers collaborate. Our processes, particularly continuous integration and delivery, are tightly integrated with it. How we interact with GitHub is, for the most part, well defined, and this article describes those conventions.

The Experiment Lifecycle and GitHub

Writing an Experiment Epic

Our experiment epic template covers most of what needs to be said on this topic. Here we highlight only a few key points.

  • The phases should exactly follow the steps of the machine learning lifecycle.
    • Since the experimentation phase may not result in any code being contributed, it is simply closed with a comment on the issue noting the general direction of the experiment.
    • Beginning with the documentation step, each step has tangible outcomes that should result in pull requests.
  • The success criteria should be pulled straight from the service specification that was worked on in collaboration with product.
  • Time boxing the experimentation step is encouraged, to avoid rabbit holes.
    • As discussed in a previous article, the time allotted for the first post-benchmark version should be shorter than the rest.

Making Epic Issues

Each phase of the epic is then broken out into its own issue. The content should be copied verbatim from the corresponding phase of the epic, and the checklist items should be checked off as they are completed. The title should be formatted like so:

[Service] [MajorVersion MinorVersion] Step name

For example:

[Housing] [V1 M1] Training Module
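As a minimal sketch, the title format above can be captured in a small helper. The function name format_issue_title and its parameters are hypothetical and not part of any existing tooling; the "V" and "M" prefixes are inferred from the example title.

```python
def format_issue_title(service: str, major: int, minor: int, step: str) -> str:
    """Build a title of the form "[Service] [V<major> M<minor>] Step name".

    Hypothetical helper: the "V"/"M" prefixes are inferred from the
    "[Housing] [V1 M1] Training Module" example, not from a written spec.
    """
    if major < 0 or minor < 0:
        raise ValueError("version numbers must be non-negative")
    return f"[{service.strip()}] [V{major} M{minor}] {step.strip()}"


print(format_issue_title("Housing", 1, 1, "Training Module"))
# prints: [Housing] [V1 M1] Training Module
```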

All issues should carry the "Platform & Productivity" GitHub label, for visibility.

Dividing Ops and Modeling Work

Ops work -- changes to Paystone packages, infrastructure code, the CLI, documentation, or auxiliary services -- should be tagged with the "MLOps" GitHub label to keep it separate from modeling work.

Ops epics have no template currently, and they of course do not follow the same phases as machine learning experiments. Given their much broader classification, a unified definition of phases is unlikely to ever exist.

Conventions

Branch Naming

Branches should be named after the issues they are opened to work on. For example, a branch for issue #999 would be named issue-999.

This naming convention encourages engineers to work on exactly one issue at a time, and to avoid ever beginning work without opening a related issue.
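The convention is simple enough to check mechanically. The sketch below, with hypothetical function names, builds a branch name from an issue number and recovers the issue number from a branch name; it assumes the name is exactly "issue-" followed by digits, with no suffix.

```python
import re
from typing import Optional

# Assumption: a conforming branch name is exactly "issue-<digits>".
BRANCH_PATTERN = re.compile(r"issue-(\d+)")


def branch_name_for_issue(issue_number: int) -> str:
    """Return the branch name for a given issue number."""
    if issue_number <= 0:
        raise ValueError("issue numbers are positive integers")
    return f"issue-{issue_number}"


def issue_number_from_branch(branch: str) -> Optional[int]:
    """Return the issue number encoded in a branch name, or None if the
    name does not follow the convention."""
    match = BRANCH_PATTERN.fullmatch(branch)
    return int(match.group(1)) if match else None


print(branch_name_for_issue(999))             # prints: issue-999
print(issue_number_from_branch("issue-999"))  # prints: 999
print(issue_number_from_branch("hotfix"))     # prints: None
```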

Pull Request Naming and Body

Pull requests should be named after the issue that they close. For example, a pull request opened to merge a branch named issue-999 into master would be called Issue 999.

The first line of the body of any pull request should be "Closes #[issue number]".

The rest of the body should be a point-form explanation of the changes made in the pull request, and any notes for the reviewer. There is no template for this.
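Taken together, the title and body rules can be sketched as follows. The function names are hypothetical, and the "- " bullet marker for the point-form notes is an assumption, since there is no template for that part of the body.

```python
import re
from typing import List


def pull_request_title(branch: str) -> str:
    """Derive the pull request title from a branch named "issue-<number>"."""
    match = re.fullmatch(r"issue-(\d+)", branch)
    if match is None:
        raise ValueError(f"branch {branch!r} does not follow the issue-<number> convention")
    return f"Issue {match.group(1)}"


def pull_request_body(issue_number: int, changes: List[str]) -> str:
    """Build a body whose first line is "Closes #<number>", followed by a
    blank line and one bullet per change or reviewer note."""
    lines = [f"Closes #{issue_number}", ""]
    lines.extend(f"- {change}" for change in changes)  # bullet style is an assumption
    return "\n".join(lines)


print(pull_request_title("issue-999"))
# prints: Issue 999
print(pull_request_body(999, ["Add training module", "Note: tests use a reduced dataset"]))
```

Starting the body with a closing keyword and issue reference lets GitHub link the pull request to the issue and close the issue on merge.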

Use of Draft PRs

Pull requests should be opened in draft mode by default, and only marked "ready for review" once all of the CI checks have passed. This keeps reviewers from spending time on code that turns out not to pass all tests.

Draft PRs should be opened only once all of the checklist items of the associated issue, if there are any, are completed.
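The checklist rule can be checked against the text of an issue. The sketch below assumes the checklist uses GitHub's Markdown task-list syntax ("- [ ]" for open items, "- [x]" for completed ones); the function name is hypothetical. An issue with no checklist items counts as complete, matching the "if there are any" wording above.

```python
import re

# Matches bullet-style task-list items such as "- [ ] todo" or "* [x] done".
TASK_ITEM = re.compile(r"^[ \t]*[-*+][ \t]+\[([ xX])\][ \t]+", re.MULTILINE)


def checklist_complete(issue_body: str) -> bool:
    """Return True if every task-list item in the issue body is checked.

    An issue body without any task-list items is treated as complete.
    """
    marks = TASK_ITEM.findall(issue_body)
    return all(mark.lower() == "x" for mark in marks)


body = """Phase: Training Module
- [x] Implement training loop
- [ ] Add unit tests
"""
print(checklist_complete(body))  # prints: False
```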