Pandas hits 1.0, opening up to new engines and dtype experiments

By Julia Schmidt

January 30, 2020

After almost 12 years of work, the team behind the data analysis and manipulation library pandas can finally celebrate its first major release.

Pandas comes in the form of a Python package and aims to be “the fundamental high-level building block for doing practical, real world data analysis” in that same language. It provides flexible data structures that help in dealing with relational and labeled data, which has made the library quite popular amongst machine learning practitioners.

Version 1.0 splashes out on new features, such as DataFrame.to_markdown(), which converts data frames into markdown tables. It also adds an engine keyword to rolling.apply and expanding.apply, so that users can choose Numba instead of Cython to execute the applied function.
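As a rough illustration of the engine keyword, the following sketch applies the same function to a rolling window with the default engine and, where available, with Numba. The function value_range and the sample data are made up for this example, and the fallback branch assumes pandas raises an ImportError when the optional Numba package is missing.

```python
import pandas as pd


def value_range(window):
    # With raw=True, each window arrives as a plain NumPy array.
    return window.max() - window.min()


s = pd.Series([1.0, 4.0, 2.0, 8.0, 5.0])

# Default engine: the Python function is called once per window.
default_result = s.rolling(3).apply(value_range, raw=True)

try:
    # engine="numba" requires the optional Numba package and raw=True.
    numba_result = s.rolling(3).apply(value_range, raw=True, engine="numba")
except ImportError:
    # Assumption: without Numba installed, fall back to the default result.
    numba_result = default_result

print(default_result.tolist())
print(numba_result.tolist())
```

Both calls should produce the same values; only the way the function is executed differs.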

The Numba engine is meant to speed up operations on larger data sets, though only after the function has been run once with that engine. The first call incurs some compilation overhead, but since the compiled function is then cached, subsequent calls return results quickly. Another addition to rolling operations is the pandas.api.indexers.BaseIndexer class. Analysts can subclass it to define how the start and end indices of a window are created, if a custom approach is needed.
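To give an idea of what a custom window could look like, here is a minimal sketch of a BaseIndexer subclass that looks forward instead of backward. The class name ForwardWindowIndexer is hypothetical. The get_window_bounds signature shown here follows recent pandas releases, which include a step parameter. As far as we can tell, that parameter was added after 1.0, so on 1.0 itself it would have to be dropped.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer


class ForwardWindowIndexer(BaseIndexer):
    """Hypothetical indexer: each window spans the current row and the
    following window_size - 1 rows."""

    def get_window_bounds(self, num_values=0, min_periods=None,
                          center=None, closed=None, step=None):
        # One (start, end) pair per row; end is exclusive and clipped
        # to the length of the data.
        start = np.arange(num_values, dtype=np.int64)
        end = np.minimum(start + self.window_size, num_values).astype(np.int64)
        return start, end


s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
indexer = ForwardWindowIndexer(window_size=2)

# min_periods=1 lets the shortened last window still yield a value.
forward_sum = s.rolling(indexer, min_periods=1).sum()
print(forward_sum.tolist())
```

The indexer is passed to rolling() in place of an integer window size, and the aggregation then runs over whatever bounds the subclass returns.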

Those interested in where the library is heading can take a look at the experimental features that made it into the 1.0 release. Among other things, it includes a pd.NA singleton, which can be used as an indicator for missing data across types (as opposed to np.nan, None and pd.NaT, which are each tied to particular kinds of data such as float, object-dtype or datetime-like values).
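A short sketch of how pd.NA behaves, assuming the nullable "Int64" and "boolean" extension dtypes that use it as their missing-value marker. The variable names are ours.

```python
import pandas as pd

# The same missing-value marker shows up in integer and boolean data.
ints = pd.Series([1, None, 3], dtype="Int64")
bools = pd.Series([True, None, False], dtype="boolean")

print(ints[1] is pd.NA)   # missing integer is pd.NA, no cast to float
print(bools[1] is pd.NA)  # missing boolean is pd.NA as well

# Comparisons with pd.NA propagate the unknown value.
comparison = pd.NA == 1

# Logical operations only resolve when the outcome does not depend
# on the missing operand.
known_true = pd.NA | True
known_false = pd.NA & False
still_unknown = pd.NA & True
```

Since the feature is flagged as experimental, the behaviour may still change in later releases.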

On top of that, there’s an experimental StringDtype, extending string data to tackle some issues with object-dtype NumPy arrays. Once the details are figured out, the string extension type will prevent the accidental mixing of strings and non-strings in such arrays, help select just text for certain operations and clarify contents during reading. New methods like DataFrame.convert_dtypes() and Series.convert_dtypes() are meant to encourage the use of the new dtypes.
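The following sketch shows one way to opt into the new dtypes, with invented sample data. It first converts an object-dtype column to the string extension type explicitly, then lets convert_dtypes() pick extension dtypes for a whole data frame.

```python
import pandas as pd

# Explicit opt-in: object-dtype text converted to the string extension type.
names = pd.Series(["ada", "grace", None], dtype=object)
names_str = names.astype("string")
upper = names_str.str.upper()  # the missing entry stays missing

# Automatic opt-in: convert_dtypes() picks extension dtypes per column.
df = pd.DataFrame({
    "count": [1.0, 2.0, None],                           # float64 only because of the gap
    "label": pd.Series(["x", None, "z"], dtype=object),
})
converted = df.convert_dtypes()
print(converted.dtypes)
```

In this example the float column holding whole numbers should come out as a nullable integer column, while the text column becomes a string column.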

Devs using older pandas versions are advised to upgrade to pandas 0.25 first and check that their code runs without warnings before making the leap to 1.0, since the team has removed a lot of deprecated features.
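One possible way to make such warnings hard to miss is to escalate them to errors while exercising existing code, for instance in a test run. The sketch below uses only Python's standard warnings module. Both run_strictly and legacy_step are hypothetical names, with legacy_step standing in for code that calls a deprecated API.

```python
import warnings


def run_strictly(func, *args, **kwargs):
    """Call func with deprecation-style warnings turned into exceptions."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", FutureWarning)
        warnings.simplefilter("error", DeprecationWarning)
        return func(*args, **kwargs)


def legacy_step():
    # Hypothetical stand-in for code relying on a deprecated feature.
    warnings.warn("this feature is deprecated", FutureWarning)
    return "done"


try:
    run_strictly(legacy_step)
    flagged = False
except FutureWarning:
    # The warning surfaced as an exception, so the call site needs attention.
    flagged = True

print(flagged)
```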

Starting with the current release, pandas also switches to a variant of semantic versioning for its releases. This largely means that API-breaking changes will only be part of major releases (2.0.0, 3.0.0, …), experimental features aside. Meanwhile, deprecations will be introduced in minor releases (1.1.0, 1.2.0, …) and enforced in major ones.
