A report on secret leakage in source code claims that the problem is worse than ever, with 12.8 million new secrets leaked in GitHub public repositories in 2023, a 28 percent increase on the previous year.
The report is from GitGuardian, a company which specialises in detecting secrets in source code, and is based on scans of activity on public GitHub repositories, including source code, issues, gists and comments.
Putting this in context, GitHub added 300 million public repositories in 2023, representing 22 percent growth. GitGuardian sends email alerts when exposed secrets are found, but despite this, 90 percent of secrets “remain active for at least five days,” according to the report. Sometimes commits are erased or repositories made private in response, but as the report authors note, this is a poor substitute for revoking a token or changing a password.
The Python package repository PyPI was also investigated, and more than 11,000 unique secrets were found exposed in packages.
Leaked secrets in 2023 include upwards of 1 million Google API secrets, 250,000 Google Cloud secrets, and 140,000 AWS secrets. The problem continues even though it is well known that hard-coding secrets is poor coding practice.
Analysing leaks by industry, the researchers reckon that 65.9 percent of affected repositories are from the IT sector, and 14.0 percent from education.
The surging interest in AI corresponds with a huge increase in the number of leaked API keys, led by OpenAI API keys – up by 1,212 times – and Hugging Face user access tokens.
What is GitHub doing to address the issue? An official post last month confirmed the extent of the problem. “In just the first eight weeks of 2024, GitHub has detected over 1 million leaked secrets on public repositories. That’s more than a dozen accidental leaks every minute,” the company said.
GitHub's main response is a feature called push protection, which blocks pushes whose commits contain secrets. In August 2023, this was opt-in for all GitHub cloud users. It is now being enabled by default for all pushes to public repositories. The service covers only supported secret types. Push protection is also part of GitHub Advanced Security, a paid service.
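To illustrate the general idea of pattern-based blocking, here is a minimal Python sketch. It is not GitHub's implementation: the function names `find_secrets` and `allow_push` are hypothetical, and the two regular expressions reflect widely published key prefixes rather than GitHub's actual detection rules. It also shows why such a service can only catch secret formats it knows about.

```python
import re

# Illustrative patterns only: these follow widely published key prefixes and
# are an assumption for this sketch, not GitHub's real detection rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_-]{35}"),
}


def find_secrets(text):
    """Return (line_number, pattern_name) pairs for every matching line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


def allow_push(text):
    """Hypothetical gate: allow the push only if no known pattern matches."""
    return not find_secrets(text)
```

A secret in a format that is not in the pattern table passes straight through, which is the limitation implied by "supported" secrets.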
It seems obvious, though, that developers struggle with secret management. Detecting secrets before code is committed is useful, but it does not change coding practice so that secrets are kept out of code in the first place. Credential management is a complex problem, and unless the tooling is both effective and usable, coders will continue to be tempted to hard-code credentials, perhaps with the best intention of fixing the problem later, an intention that is easily forgotten under pressure to deliver.
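One common alternative to hard-coding is to read the credential from the environment at run time. The following Python sketch assumes a hypothetical variable name, `EXAMPLE_API_KEY`, and a hypothetical helper, `get_api_key`. It is one simple approach rather than a complete credential management solution.

```python
import os


def get_api_key(name="EXAMPLE_API_KEY"):
    """Read a credential from the environment instead of the source file.

    The variable name is a placeholder chosen for this sketch.
    """
    value = os.environ.get(name)
    if not value:
        # Fail loudly rather than fall back to a hard-coded default.
        raise RuntimeError(f"{name} is not set")
    return value
```

Failing when the variable is missing, rather than supplying a default in code, removes the temptation to leave a working key in the repository "just for now".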