AWS open-sourced its PartiQL query language to provide a way to query data across formats and services.
The project started as a reaction to the way companies' data is now spread across “relational databases, non-relational data stores, and data lakes”. Since most data stores come with their own query language, transforming data and moving it to another platform can get quite complicated and may require changes to applications and queries.
With the amount of tabular, nested, and semi-structured data found across Amazon’s retail business and AWS services, the company needed a way to solve those issues and started work on PartiQL. The language’s design goals included SQL compatibility to keep existing SQL queries intact, first-class support for nested data, optional schema and query stability, minimal extensions over SQL, format independence, and data store independence.
The released result separates a query’s syntax and semantics from the data source and format, so that users can query data no matter how or where it is stored. A first reference implementation written in Kotlin, JetBrains’ language for the JVM, along with a specification document and a PartiQL tutorial, is available now under the Apache 2 license.
The implementation is compatible with SQL-92 and includes additions to deal with schemaless hierarchical data. It comes with an embeddable reference interpreter, a test framework, and tests, and should allow developers to analyse PartiQL queries and use them in their own applications.
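To give a rough idea of what "additions to deal with schemaless hierarchical data" can mean in practice, here is a minimal sketch in plain Python. It does not run PartiQL. The commented query is written in the style of the PartiQL tutorial, and the Python comprehension mimics how a nested collection can be joined with its parent row. The sample data and the function name `security_projects` are made up for illustration.

```python
# Hypothetical sample data: employee records, each with a nested list of projects.
employees = [
    {"name": "Bob Smith", "projects": [{"name": "AWS Redshift Spectrum querying"},
                                        {"name": "AWS Redshift security"}]},
    {"name": "Susan Smith", "projects": []},
    {"name": "Jane Smith", "projects": [{"name": "AWS Aurora security"}]},
]

# A query in the style of the PartiQL tutorial (shown for comparison, not executed):
#   SELECT e.name AS employeeName, p.name AS projectName
#   FROM employees AS e, e.projects AS p
#   WHERE p.name LIKE '%security%'


def security_projects(rows):
    """Plain-Python analogue of the query above: iterate over each employee
    and over that employee's nested projects, keeping matching pairs."""
    return [
        {"employeeName": e["name"], "projectName": p["name"]}
        for e in rows                  # FROM employees AS e
        for p in e["projects"]         # , e.projects AS p (unnest the nested list)
        if "security" in p["name"]     # WHERE p.name LIKE '%security%'
    ]


result = security_projects(employees)
for row in result:
    print(row)
```

The point of the comparison is that the nested `projects` list is addressed directly in the `FROM` clause, so no separate flattening step or schema definition is needed before querying.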
At this stage, PartiQL is still a preview and shouldn’t be used in production (although it is already used in AWS projects, as the introductory post points out). Although the language is mostly stable, the project repo implies that the interpreter API will see a lot of changes in the next few months, so a bit of patience is advised unless you want to get involved in the project yourself.
In the past couple of days AWS also announced improvements to the Amazon SageMaker Python SDK. That is not strictly data storage related, but you need lots of data for machine learning, so there you go. The kit was finally fitted with an integration for the version control system Git, which means users can access training scripts stored in Git repositories directly. Previously, scripts to train models had to be downloaded beforehand.
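As a hedged sketch of what the Git integration looks like from the user's side: to my understanding of the SDK documentation, estimators accept a `git_config` dictionary with a `repo` key and optional `branch` and `commit` keys. The helper `build_git_config`, the repository URL, and the commented-out estimator call below are hypothetical and only illustrate the shape of that dictionary. Nothing here imports or calls the SageMaker SDK.

```python
from typing import Dict, Optional


def build_git_config(repo: str,
                     branch: Optional[str] = None,
                     commit: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: assemble a dictionary in the shape the SageMaker
    Python SDK is understood to expect for its `git_config` argument."""
    if not repo:
        raise ValueError("repo is required")
    config = {"repo": repo}
    if branch:
        config["branch"] = branch   # optional: branch to check out
    if commit:
        config["commit"] = commit   # optional: pin to a specific commit
    return config


# Placeholder repository URL, not a real project.
git_config = build_git_config(
    "https://github.com/example-org/example-training-scripts.git",
    branch="main",
)
print(git_config)

# Assumed usage (not executed here, requires the SDK and AWS credentials):
#   estimator = SomeFrameworkEstimator(entry_point="train.py",
#                                      source_dir="src",
#                                      git_config=git_config, ...)
```

Pinning a `commit` in addition to the branch would make a training run reproducible even if the branch moves on later.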