Machine learning and data science projects, along with increasingly active developer communities outside the US, were among the big drivers behind GitHub’s growth over the last year.
The repo platform’s latest State of the Octoverse report puts the number of developers using the platform at more than 40 million, ten million of whom joined in the last year alone. It claims that the number of users creating their first repository was up by 44 percent in 2019 compared with the previous year.
The statistics also highlight the global distribution of GitHub users, with almost 80 percent of developers accessing the site now based outside the United States, and some contributions even being uploaded from Antarctica. Outside of the US, open source use (as measured by clones and forks) was strongest in China, with India, Germany and the UK some way behind. Developers in China are said to have forked and cloned 48 percent more projects this year than last year.
In terms of where GitHub is expanding fastest, Nigeria, Iran and Kenya showed the highest growth in open source projects created in public repositories, while Hong Kong, Singapore and Japan were the territories with the greatest increase in contributors.
One interesting set of statistics shows the popularity of programming languages with repository contributors over the past five years. JavaScript has consistently been top dog, while Python and C# are currently on the rise and Java and C++ are declining in popularity. The number of developers using Ruby, meanwhile, seems to have plummeted since 2015.
According to GitHub, Python’s growth is partly driven by a rapidly expanding community of data science professionals, thanks to the many core data science packages powered by this language.
In addition, repositories with topics like “deep learning”, “natural language processing” and “machine learning” have all become more popular recently. Another telltale sign is that the use of Jupyter Notebooks has grown by more than 100 percent year-on-year for the last three years.
Similarly, the volume of contributions to TensorFlow illustrates the growth in data science. Over the last year, almost ten thousand developers have contributed to TensorFlow, more than two thousand of whom made code commits, while 25,000 have contributed to TensorFlow dependencies.
According to GitHub, projects are also becoming more connected to, and dependent on, other projects. On average, each public and private repository on GitHub now depends on more than 200 packages, while the most depended-upon packages supported more than 3.6 million other repositories this year. Whether this growing inter-dependency is a cause for concern is another matter.