LF AI & Data gets new incubator projects to simplify distributed learning


Linux Foundation subsidiary LF AI & Data Foundation has welcomed projects Substra and TonY into its incubator, increasing the number of projects working towards graduation to 21.

Substra, a framework for distributed orchestration of machine learning tasks, already has some experience as a foundation project: its creators at medical research company Owkin have fostered the tool in a dedicated foundation since 2017. Eric Boniface, General Manager of the Substra Foundation, sees the rehoming as a logical next step and “the perfect next chapter for the Substra project, its community, and many more privacy-preserving federated learning use cases to come”.

Substra is designed to be used by data scientists to build federated learning strategies that use several remote datasets. At the same time, data owners are promised the chance to share their data for training or evaluation purposes in a way that prevents it from being viewed or downloaded, which is useful when developing models for privacy-sensitive use cases.
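The article does not show Substra’s own interface, so the following is only a generic sketch of federated averaging (FedAvg), one common federated learning strategy, written under stated assumptions: a toy linear model, two hypothetical data owners, and helper names (`local_update`, `federated_average`) invented for illustration. It shows the principle described above. Each site trains on its own data and hands back only model parameters, which a coordinator averages, weighted by sample count.

```python
"""Generic federated averaging (FedAvg) sketch; hypothetical helpers, not Substra's API."""
from typing import List, Tuple

# Each hypothetical site holds private (x, y) pairs that never leave it.
Dataset = List[Tuple[float, float]]


def local_update(w: float, b: float, data: Dataset,
                 lr: float = 0.01, epochs: int = 5) -> Tuple[float, float]:
    """Run a few epochs of gradient descent for y ~ w*x + b on one site's data.

    Only the updated parameters are returned; the raw data stays local.
    """
    n = len(data)
    for _ in range(epochs):
        # Mean-squared-error gradients computed on this site's data only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(sites: List[Dataset], rounds: int = 500) -> Tuple[float, float]:
    """Coordinator loop: broadcast the global model, collect local updates,
    and average them weighted by each site's number of samples."""
    w, b = 0.0, 0.0
    total = sum(len(d) for d in sites)
    for _ in range(rounds):
        updates = [local_update(w, b, d) for d in sites]
        w = sum(len(d) * uw for d, (uw, _) in zip(sites, updates)) / total
        b = sum(len(d) * ub for d, (_, ub) in zip(sites, updates)) / total
    return w, b


if __name__ == "__main__":
    # Two hypothetical data owners whose points follow y = 2x + 1.
    site_a = [(float(x), 2.0 * x + 1.0) for x in range(0, 5)]
    site_b = [(float(x), 2.0 * x + 1.0) for x in range(5, 10)]
    w, b = federated_average([site_a, site_b])
    print(f"w={w:.2f}, b={b:.2f}")
```

Running it should recover parameters close to `w = 2` and `b = 1` even though neither site ever sees the other’s points. A real deployment would add the access controls and traceability a framework like Substra is meant to provide; this sketch only covers the aggregation idea.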

TonY also targets distributed learning, bringing distributed training to the big data platform Apache Hadoop and thus speeding up training when especially large data sets are involved. The deep learning project was initially developed at LinkedIn and released as open source in 2018. Although the project hasn’t reached its first major release yet, it has since been integrated with Google Cloud and adopted by Chinese online video platform iQIYI.

iQIYI’s development team has in turn helped TonY implement support for Horovod, a distributed deep learning training framework and fellow LF AI & Data project. Fitted with a Horovod driver, TonY now allows running Horovod on YARN, which means users no longer have to set up a Kubernetes cluster for training purposes.

The LF AI & Data Foundation was created in 2020 through a merger of the LF AI Foundation with big data non-profit ODPi. Its aim is “to raise, budget and spend funds in support of various artificial intelligence, machine learning, and data-related open source projects”. Apart from Horovod, the machine learning interoperability standard ONNX is probably the best-known project fostered by the initiative. Its most recent graduate is the vector database Milvus, which reached the top foundation tier in June.

While promotion to graduate level comes with eligibility for financial support, the incubation stage mainly offers help with fostering adoption of, and contributions to, projects aligned with the foundation’s mission. Incubating projects are regularly evaluated by the foundation’s technical advisory council (TAC) and are expected to be ready for the next stage within 18 months at the latest. If a project doesn’t seem ready after that time, the TAC must decide whether to adjust its support measures or move the project to the so-called Emeritus stage, which signals its end of life.