Interview: AWS has introduced Mountpoint, an open source client for Linux that connects to S3 (Simple Storage Service) using file APIs, enabling applications to traverse S3 objects as if they were files in a local file system. It is a specialized client aimed at data analytics, and not designed for general-purpose use. According to advance information from AWS, “with Mountpoint, file operations map to GET and PUT operations against S3, allowing scaleable file-based applications to burst to terabits per second of aggregate throughput, without any code changes.”
There are several limitations. The preview is not production-ready, and buckets are currently mounted read-only. Write support will be added before general availability, but it will be limited to sequential writes to new objects.
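To illustrate what "without any code changes" means in practice, here is a minimal sketch of ordinary file-based code that could run unchanged against a mounted bucket. It uses only standard file APIs and knows nothing about S3; the function name `total_bytes` and the idea of passing the mount directory on the command line are assumptions for illustration, not anything documented by AWS.

```python
import os
import sys


def total_bytes(root):
    """Walk a directory tree and count the bytes read from every file in it.

    Nothing here is S3-specific: if `root` were a directory where a bucket
    is mounted, the same reads would be served by the file system client.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Read sequentially in 1 MiB chunks, front to back.
            with open(path, "rb") as handle:
                while chunk := handle.read(1024 * 1024):
                    total += len(chunk)
    return total


if __name__ == "__main__":
    # The directory is supplied by the caller; any mount path is hypothetical.
    if len(sys.argv) == 2:
        print(total_bytes(sys.argv[1]))
```

The reads are deliberately sequential and read-only, which matches the access pattern the preview supports.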
An AWS paper on Mountpoint explained that Mountpoint deliberately does not offer a “full featured file system or POSIX compatibility.” The reason is that file systems have “a surprising number of features that don’t overlap with object storage,” including operations like the ability to mutate the contents of files in place, and OS-managed permissions. Mountpoint was therefore designed to optimize performance and avoid any operations that could not be implemented efficiently with S3 APIs. The ideal use case is “data lake applications doing scale-out analytics on large datasets,” the paper said.
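The mismatch between file semantics and object storage can be sketched with a toy model. The classes below are hypothetical and are not Mountpoint's implementation: they assume an in-memory store offering only whole-object PUT and ranged GET, and a writer that accepts sequential writes to new keys and rejects anything that would mutate existing data in place.

```python
class ToyObjectStore:
    """In-memory stand-in for a bucket: whole-object PUT, ranged GET."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        # An object is stored or replaced as a whole; there is no partial update.
        self._objects[key] = bytes(data)

    def get(self, key, start=0, end=None):
        # A ranged read returns a slice of the stored object.
        return self._objects[key][start:end]

    def exists(self, key):
        return key in self._objects


class SequentialWriter:
    """Accepts writes strictly in order, then uploads once on close."""

    def __init__(self, store, key):
        if store.exists(key):
            raise PermissionError(f"{key}: existing objects cannot be modified")
        self._store = store
        self._key = key
        self._buffer = bytearray()
        self._closed = False

    def write_at(self, offset, data):
        if self._closed:
            raise ValueError("writer is closed")
        # Only a write that lands exactly at the current end is accepted.
        if offset != len(self._buffer):
            raise OSError("only sequential writes are supported")
        self._buffer += data
        return len(data)

    def close(self):
        if not self._closed:
            self._store.put(self._key, self._buffer)
            self._closed = True
```

Buffering the whole object in memory is a simplification for the sketch; the point is only that appending in order maps cleanly onto a single upload, while an in-place edit does not.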
That said, Mountpoint is open source, built using the Rust programming language, and the paper acknowledged that early customers are interested in “helping to evolve it to provide richer functionality.”
Mountpoint respects S3 permissions and access policy, and therefore needs AWS credentials. One possibility is to attach an IAM (Identity and Access Management) role to an EC2 (Elastic Compute Cloud) instance, in which case credentials can be applied automatically.
Why has AWS created its own file system client, when many third-party clients that do this already exist? Examples include S3FS-FUSE, which supports Linux, macOS and FreeBSD; the commercial ObjectiveFS system; and Rclone, which also runs on Windows.
“Customers are looking for better performance, better stability, and AWS support for those connectors,” Kevin Miller, VP and GM for S3, told DevClass. “We took a look at all the connectors and decided it would be best to build a connector from the ground up. We are building it on top of the AWS Common Runtime, which is the library that underpins our SDKs … we are also writing in the Rust programming language which gets us benefits of type checking and other built-in quality features without sacrificing native code performance.” Miller added that Mountpoint benefits from “automated reasoning … for validating correctness of things like S3 strong consistency.”
Apparently AWS is so pleased with how Mountpoint is turning out that the code will be “a visible window into what we have seen over the last 17 years as best practices for engineering software that meets the customers’ reliability bar at scale,” Miller told us.
Alongside Mountpoint, AWS has introduced six other new features for S3, to mark the 17th anniversary of S3’s general availability – it was launched on 14th March (“Pi day”) 2006. The new features are:
- You can now use an S3 Object Lambda Access Point alias as an origin for a CloudFront CDN deployment.
- S3 Multi-Region Access Points can now support replication that spans multiple AWS accounts.
- Private DNS options for VPC (virtual private cloud) endpoints improve routing from on-premises networks to S3 endpoints.
- S3 replication is now supported on AWS Outposts, local on-premises installations of AWS services.
- Amazon OpenSearch Service has introduced security analytics, including support for S3 access logs. This “delivers a rules engine with more than 2200 rules to detect threats from popular log types including Windows, AWS CloudTrail, S3 Access Logs, Active Directory, LDAP, Windows and Linux systems logs,” AWS told DevClass.
- AWS Data Exchange for Amazon S3, which extends AWS's marketplace of third-party data files to data held in providers' S3 buckets, is now generally available.
AWS told us that S3 today stores more than 280 trillion objects and receives over 100 million requests per second on average. AWS still supports the original S3 API but has added many features since its first launch.
Security has been an issue at times because of misconfigured buckets, but this is being addressed, Miller told us. “This year we are changing the defaults for new buckets. So buckets will now have our ‘block public access’ feature turned on by default.” In addition, all new objects are encrypted by default.
What does AWS think about the S3 API becoming something of an industry standard? For example, OpenStack emulates the S3 REST API on top of its object storage.
“It’s a big testament to the value and the utility of the API that others replicate it,” Miller told us, but added that there are “things like the enhanced checksum support and other things we’ve added over the years … when someone says it’s compatible, it’s compatible on some dimensions but doesn’t provide things that we see as essential today.”
Would AWS ever consider making the S3 API or part of it an official standard? “If that was really relevant or critical for customers we would, but there’s plenty of other things that we’re focused on,” said Miller.