
How to use GitHub Copilot's new Content Exclusions feature to keep sensitive information out of AI-generated code

How to use GitHub's new Content Exclusions feature to make sure that GitHub Copilot doesn't index files in your repos that contain secrets or other confidential or sensitive information.

Author: Megan Bruce / 4 mins read / Mar 11, 2024
AI security

GitHub Copilot, GitHub’s AI coding assistant, has become a critical tool for developers to code more efficiently. But it’s important to put in place some key security practices to keep AI-generated code and your business safe. 

One best practice is to vet the external dependencies that AI-generated code recommends, to make sure they’re safe and actively maintained. You can use GitHub’s built-in code scanning tool or a supported third-party scanning tool like Trivy to check for vulnerabilities. You can also check trustypkg.dev for additional supply chain risk signals, like activity levels and proof of origin.

Another best practice is to enable secret scanning. GitHub recently turned on push protection by default for all users, so that pushes are checked for supported secrets and those secrets aren't accidentally leaked to public repos. However, push protection only blocks secrets that match GitHub's most identifiable user-alerted patterns, so even with it enabled, there's still a risk that secrets could be leaked. This risk becomes even greater when you're relying on AI-generated code, which could reference indexed files that contain secrets.

Fortunately, GitHub recently rolled out a new feature that allows you to specify which files you don't want Copilot to index. This new feature, currently in public beta, is called Content Exclusions.

Overview: What are Content Exclusions?

Because GitHub Copilot works by indexing all of the files in your organization’s repos to inform code suggestions, it may index files that contain confidential or sensitive information. The Content Exclusions feature allows you to exclude files in your repos, so that GitHub Copilot doesn’t index them or reference them in its code suggestions.

Content Exclusions works by allowing you to specify paths to excluded content in the settings for your repo or organization. Per GitHub, when you specify these paths, it means that:

  • The content of those files won’t be indexed and referenced in AI-generated code completion suggestions

  • Code completion won’t be available in those specified files

Configuring Content Exclusions at the repo and org levels

At the individual repo level, you can specify which files Copilot should exclude by navigating to Settings > Code & automation > Copilot and entering the paths to exclude in that repo.

(Image of example repo-level settings not shown. Image credit: GitHub)
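
As a rough sketch of what the repo-level paths setting might contain: this assumes it takes a plain list of path patterns in the same style as the example later in this post, without a repository key, since the repo is implied. The file and directory names here are hypothetical, and GitHub's documentation remains the authority on the exact syntax.

```yaml
# Hypothetical repo-level entries; all file and directory names are illustrative.
# Ignore `.env` files at any depth in this repository.
- "**/.env"
# Ignore files called `secrets.json` anywhere in this repository.
- "secrets.json"
# Ignore files called `temp.rb` in or below the `/src` directory.
- "/src/**/temp.rb"
# Ignore any files in the `/scripts` directory.
- "/scripts/*"
```

Keeping the list short and specific makes it easier to review which files lose code completion as a side effect of being excluded.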

At the organization level, you can exclude files in any repo that can be accessed using one of these syntaxes:

http[s]://host.xz[:port]/path/to/repo.git/
git://host.xz[:port]/path/to/repo.git/
[user@]host.xz:path/to/repo.git/
ssh://[user@]host.xz[:port]/path/to/repo.git/

You can set this up at the org level by navigating to Settings > Copilot > Content exclusion, and entering the details in the “Repositories and paths to exclude” box.

Example of content exclusions at the org level:

# Ignore all `.env` files at any path, in any repository.
# This setting applies to all repositories, not just to those on GitHub.com.
# This could also have been written on a single line as:
#
# "*": ["**/.env"]
"*":
  - "**/.env"

# In the `primer/react` repository on GitHub:
https://github.com/primer/react.git:
  # Ignore files called `secrets.json` anywhere in this repository.
  - "secrets.json"
  # Ignore files called `temp.rb` in or below the `/src` directory.
  - "/src/**/temp.rb"

# In the `copilot` repository of any GitHub organization:
git@github.com:*/copilot:
  # Ignore any files in or below the `/__tests__` directory.
  - "/__tests__/**"
  # Ignore any files in the `/scripts` directory.
  - "/scripts/*"

Content Exclusions access and limitations

As mentioned above, Content Exclusions is still in public beta. Right now, it’s available only to GitHub Copilot Business and Enterprise subscribers, and it applies only to users who hold a seat under one of those licenses. This means that if users can access those files but don’t have one of those seats, the content exclusions won’t apply to them, and references to those files may show up in their code suggestions.

At Stacklok, we’ve been excited about the productivity promise of GitHub Copilot, while being wary of the security risks that it (and other AI coding assistants) could introduce. Content Exclusions represents a strong step forward for GitHub in protecting its users, particularly enterprise users who need assurance that Copilot won’t expose sensitive information or secrets that could put them at risk of supply chain attacks and breaches.

Check out GitHub’s documentation for more info and details on how to use this feature. 

To keep your Copilot-enabled repos safe, you can also use Minder, Stacklok’s open source software supply chain security platform. Minder can apply and continuously enforce GitHub Advanced Security settings in those repos, like secret scanning, code scanning, and Dependabot configuration.

Learn more about Minder here. 

Megan Bruce

Director of Product Marketing - Stacklok