Scikit-learn celebrates the big 1.0 with keyword arguments and online one-class SVMs

After a good 14 years of development, the team behind scikit-learn has released version 1.0 of the Python machine learning library, signaling the open-source project's stability and cleaning up the API to make it more straightforward to use.

For its first major release, the maintainers' focus was mainly on stabilisation, along with some enhancements meant to help in more complex scenarios. When preprocessing data, for instance, users can now generate spline-based features with the new SplineTransformer, a more flexible (and often more numerically stable) alternative to the pure polynomial features that were previously the only option.
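A minimal sketch of what that looks like, assuming scikit-learn 1.0 and NumPy are installed (the knot count and degree are illustrative, not recommendations):

```python
import numpy as np
from sklearn.preprocessing import SplineTransformer

# A single numeric feature with ten observations.
X = np.arange(10).reshape(-1, 1)

# Expand the feature into a B-spline basis instead of raw polynomial powers.
spline = SplineTransformer(n_knots=3, degree=3)
X_spline = spline.fit_transform(X)
print(X_spline.shape)  # one column per spline basis function
```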

The library also gained an online one-class SVM implementation based on stochastic gradient descent, whose fitting cost scales linearly with the number of training samples, making it useful when that number is large. Data scientists interested in predicting intervals instead of single points should meanwhile take a look at the newly added quantile regressor.
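A minimal sketch of both additions; the data here is synthetic and the hyperparameters (nu, quantile, alpha) are illustrative only:

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor, SGDOneClassSVM

rng = np.random.RandomState(42)
X = rng.randn(500, 2)

# Online one-class SVM: fitted with stochastic gradient descent,
# so training cost grows linearly with the number of samples.
ocsvm = SGDOneClassSVM(nu=0.1, random_state=42).fit(X)
labels = ocsvm.predict(X)  # +1 for inliers, -1 for outliers

# Quantile regression: predict the 90th percentile instead of the mean.
y = 3 * X[:, 0] + rng.randn(500)
qreg = QuantileRegressor(quantile=0.9, alpha=0.0).fit(X, y)
upper = qreg.predict(X)
```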

Another new addition means all estimators now set a feature_names_in_ attribute holding the column names when fitted on a pandas DataFrame. If the feature names seen by non-fit methods such as predict are inconsistent with those seen during fit, scikit-learn will raise a warning.
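For example (the column names and values here are made up for illustration):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({"age": [25, 32, 47, 51], "income": [40, 60, 80, 100]})
y = [0.0, 0.5, 1.0, 1.5]

reg = LinearRegression().fit(df, y)
print(reg.feature_names_in_)  # ['age' 'income']

# Passing columns in a different order than during fit triggers a warning:
reg.predict(df[["income", "age"]])
```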

To make metrics easier to visualise, scikit-learn includes a plotting API that now sports additional class methods from_estimator and from_predictions on metrics.ConfusionMatrixDisplay, metrics.PrecisionRecallDisplay, and metrics.DetCurveDisplay (inspection.PartialDependenceDisplay gains from_estimator only) for creating plots based on estimators and predictions.
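A minimal sketch using the confusion matrix display (matplotlib is required for plotting; the dataset is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)

# Build the plot directly from a fitted estimator...
ConfusionMatrixDisplay.from_estimator(clf, X_test, y_test)

# ...or from precomputed predictions, e.g. when the model lives elsewhere.
ConfusionMatrixDisplay.from_predictions(y_test, clf.predict(X_test))
```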

Since the number of parameters available for many functions made scikit-learn code tricky to read (and hard to write without consulting the documentation) at times, its developers decided to deprecate positional arguments in version 0.23. Starting with v1.0, the library raises a TypeError if constructor and function parameters aren't passed by name. The release also saw the histogram-based gradient boosting models graduate from experimental status, meaning they can now be imported and used like regular models.
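A minimal sketch of both changes (the parameter values shown are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.svm import SVC

X, y = make_classification(random_state=0)

# Histogram-based gradient boosting no longer needs the experimental
# import (`from sklearn.experimental import enable_hist_gradient_boosting`).
clf = HistGradientBoostingClassifier(max_iter=100, random_state=0)
clf.fit(X, y)

# Most parameters are now keyword-only:
SVC(C=1.0, kernel="rbf")   # fine: passed by name
# SVC(1.0, "rbf")          # raises TypeError in 1.0
```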

Developers wary of the breaking changes major releases often bring can breathe easy, as v1.0 promises to be comparatively straightforward to upgrade to. However, changes in manifold.TSNE, manifold.Isomap, and the splitting criterion of tree.DecisionTreeClassifier and tree.DecisionTreeRegressor may lead to slightly different models than before, and the switch to keyword-only arguments might make code modifications necessary.

Long-term planners should also check out the deprecations in v1.0 so they have enough time to prepare for their removal in version 1.2. The scikit-learn team for instance worked hard to unify the handling of squared and absolute errors across criterion and loss parameters, making squared_error and absolute_error the canonical names and deprecating the old ones such as mse and mae.
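For example, the tree-based regressors now take the new criterion name; the old mse spelling still works in 1.0 but emits a deprecation warning:

```python
from sklearn.tree import DecisionTreeRegressor

# "squared_error" replaces "mse"; "absolute_error" replaces "mae".
reg = DecisionTreeRegressor(criterion="squared_error")
reg.fit([[0], [1], [2]], [0.0, 1.0, 4.0])
```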

Other deprecations include np.matrix inputs, get_feature_names in the transformer API, the cluster.Birch attributes fit_ and partial_fit_, grid_scores_ in feature_selection.RFECV, the normalize parameter of linear_model.LinearRegression, as well as utils._testing.assert_warns and utils._testing.assert_warns_message. Details are available via the project's changelog.

Scikit-learn is licensed under the BSD-3-Clause License and can be found on GitHub. The project started out as a Google Summer of Code project in 2007 and has been driven by a group of INRIA scientists since 2010, which makes it stand out among other popular machine learning projects that are mainly developed by large corporations.