Michael Tsai – Blog – The Future of Swift Serialization and Deserialization APIs


Kevin Perry:

It’s clear from community adoption and feedback that Codable has had a lot of success in the years since it was added to Swift 4, but that it doesn’t satisfy some important needs. One of the foremost of those needs is performance more in line with programming environments that compete with Swift. As such, the main goal for this effort is to unlock higher levels of performance during both serialization and deserialization without sacrificing the ease of use that Codable provides.

[…]

Even with all of its strengths, the existing API’s design has some unavoidable performance penalties. For instance, its use of existentials implies additional runtime and memory costs as existential values are boxed, unboxed, retained, released, and dynamic dispatch is performed.
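
To make the existential cost concrete, here is a minimal sketch contrasting an existential parameter with a generic one. `IntSink` and `SummingSink` are hypothetical names invented for illustration; they only stand in for the role `Encoder` plays and are not part of Codable or of the proposed design.

```swift
// Hypothetical protocol standing in for the role Encoder plays in Codable.
protocol IntSink {
    mutating func write(_ value: Int)
}

struct SummingSink: IntSink {
    var total = 0
    mutating func write(_ value: Int) { total += value }
}

// Existential form: the concrete type is erased, so the value is held in an
// existential container and each call is dispatched dynamically.
func writeAllExistential(_ values: [Int], to sink: inout any IntSink) {
    for value in values { sink.write(value) }
}

// Generic form: the concrete type is known at the call site, so the
// compiler is able to specialize this function for SummingSink.
func writeAllGeneric<S: IntSink>(_ values: [Int], to sink: inout S) {
    for value in values { sink.write(value) }
}

var boxed: any IntSink = SummingSink()
writeAllExistential([1, 2, 3], to: &boxed)

var concrete = SummingSink()
writeAllGeneric([1, 2, 3], to: &concrete)
```

Both calls compute the same result; the difference is only in how the call is dispatched and whether the compiler can specialize it.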

Also, because a client can decode dictionary values in arbitrary orders, a KeyedDecodingContainer is effectively required to proactively parse the payload into some kind of intermediate representation, necessitating allocations for internal temporary dictionaries, and String values.
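
A small sketch of the arbitrary-order problem, using the existing Codable API. The `Point` type is a made-up example; the point is that nothing stops `init(from:)` from asking for keys in a different order than they appear in the payload, so the container has to be able to answer any key at any time.

```swift
import Foundation

struct Point: Decodable {
    let x: Int
    let y: Int

    enum CodingKeys: String, CodingKey { case x, y }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Deliberately ask for "y" first, although "x" comes first in the payload.
        y = try container.decode(Int.self, forKey: .y)
        x = try container.decode(Int.self, forKey: .x)
    }
}

let pointJSON = Data(#"{"x": 1, "y": 2}"#.utf8)
let point = try? JSONDecoder().decode(Point.self, from: pointJSON)
```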

[…]

In Swift, when a client needs to do more than just alter the default CodingKey representations, developers are often faced with a large cliff where they’re forced to manually replicate the whole Codable implementation just to do so.
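
For reference, this is the cheap side of that cliff with today's Codable: renaming a key only needs a `CodingKeys` enum, and the synthesized `init(from:)` and `encode(to:)` stay in place. `Article` is a hypothetical type used only for illustration. Anything beyond this kind of renaming currently means writing the conformance by hand.

```swift
import Foundation

struct Article: Codable, Equatable {
    var title: String
    var publishedAt: String

    // Renaming a key keeps the compiler-synthesized conformance intact.
    enum CodingKeys: String, CodingKey {
        case title
        case publishedAt = "published_at"
    }
}

let articleJSON = Data(#"{"title": "Hello", "published_at": "2024-01-01"}"#.utf8)
let article = try? JSONDecoder().decode(Article.self, from: articleJSON)
```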

[…]

In this new design I aim to leverage Swift’s macro features to meet or exceed Serde’s level of support for customization of synthesized conformances. Moving code synthesis from the compiler to a macro will enable us to use attribute-like macros as targeted customization mechanisms, which was not something we could easily accomplish with the compiler-based Codable synthesis.

[…]

There is no encode(_: Date) function present in the Encoder interface, which means PropertyListEncoder has to attempt to dynamically cast every some Encodable type it receives to Date in order to handle these natively. This helps keep the Encodable type format-agnostic, but it has a negative impact on performance, even if you never actually encode any Dates.
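
The shape of that dynamic-cast problem can be sketched as follows. This is not `PropertyListEncoder`'s actual implementation; `PlistValue` and `box(_:)` are hypothetical names, and the sketch only assumes that a generic entry point sees "some `Encodable`" and has to probe for natively supported types.

```swift
import Foundation

// Hypothetical intermediate value for a property-list-like format.
enum PlistValue: Equatable {
    case date(Date)
    case data(Data)
    case other(String)
}

// Hypothetical helper: every value pays for the casts below,
// even when the payload never contains a Date or Data.
func box<T: Encodable>(_ value: T) -> PlistValue {
    if let date = value as? Date { return .date(date) }
    if let data = value as? Data { return .data(data) }
    // A real encoder would fall back to value.encode(to:) here;
    // this sketch just records the type name.
    return .other(String(describing: T.self))
}
```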

I believe that fully and formally embracing format-specialization where appropriate is the best solution to this problem. Specifically, we should encourage each serialization format that has native support for data types that aren’t represented in the format-agnostic interface to produce its own protocol variant that includes explicit support for these types, e.g. JSONCodable or PropertyListCodable.

Dave DeLong (Mastodon):

One of the big flaws of Codable is that it was built on the wrong abstraction. 99.9% of the time, developers who are interested in serializing a struct to data and back are doing so to a single, well-known format. However, the Codable API was built so that the abstraction point is the encoder itself, under the assumption that you would want to serialize a type to multiple formats. This is not the case.

That design flaw has been the #1 source of Codable’s woes. It makes properly implementing custom coders almost impossible; no one implements superEncoder properly, since most people don’t deal with inheritance of reference types, and some formats are fundamentally incompatible with the Encoder/Decoder APIs. (XML and CSV are two that spring to mind off the top of my head)

[…]

IMO we should be encouraging packages that provide format-specific coders (JSONCodable, PlistCodable, CSVCodable, XMLCodable, etc) so that each encoder and decoder can provide format-specific functionality. Then we should provide a system level API to ask types to encode into an opaque format (ie “please turn yourself into a Data and back again”).

[…]

Foundation should provide an updated replacement for NSCoding and leave the type-specific encoders to type-specific packages to implement.

Kevin Perry:

JSONCodable, PlistCodable, etc. should have full freedom to craft their interface around each format’s individual needs and specialities.

At one stage, the “format specialized” protocols were the entirety of the design. However, while looking at adoption scenarios, I realized that this design presented a problem with “currency” types that are owned by frameworks/libraries, but used by application-level serializable types.

[…]

Hence the introduction of the format-agnostic protocols in parallel with the format-specialized ones. Range and CGRect can, in similar fashion to Codable, describe their serializable members abstractly, allowing a specific encoder/decoder to interpret those instructions. The difference from Codable being that we avoid all the OTHER downsides of Codable the OP describes.

Dave DeLong:

That’s why I’m suggesting that we split the API to support the cases separately. We have one API that can be very general and support the whole “A type can be serialized to an opaque format” use-case, and then packages to support particular formats and all of their respective idiosyncrasies. I think we’d be repeating past mistakes to try and make those two use cases be the same API again.

Lincoln Wu:

I think there’s one common use case which is not covered by the current Codable design: heterogeneous/dynamic decoding/encoding.

Many times during development, I have wanted to decode part of a JSON payload into an intermediate representation, and later decode that further into a specific type.
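
One common workaround with today's Codable is a hand-written, two-stage `init(from:)`: read a discriminator first, then decode the payload as whatever type it names. The sketch below assumes a made-up payload layout with `kind` and `payload` keys; `Shape`, `Circle`, and `Square` are hypothetical types.

```swift
import Foundation

struct Circle: Decodable, Equatable { let radius: Double }
struct Square: Decodable, Equatable { let side: Double }

enum Shape: Decodable, Equatable {
    case circle(Circle)
    case square(Square)

    private enum CodingKeys: String, CodingKey { case kind, payload }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Stage 1: read the discriminator.
        let kind = try container.decode(String.self, forKey: .kind)
        // Stage 2: decode the payload as the type the discriminator names.
        switch kind {
        case "circle":
            self = .circle(try container.decode(Circle.self, forKey: .payload))
        case "square":
            self = .square(try container.decode(Square.self, forKey: .payload))
        default:
            throw DecodingError.dataCorruptedError(
                forKey: .kind, in: container,
                debugDescription: "Unknown shape kind: \(kind)")
        }
    }
}

let shapesJSON = Data(#"""
[{"kind": "circle", "payload": {"radius": 1.5}},
 {"kind": "square", "payload": {"side": 2}}]
"""#.utf8)
let shapes = try? JSONDecoder().decode([Shape].self, from: shapesJSON)
```

The set of concrete types has to be known up front and spelled out in the `switch`, which is part of what makes this pattern tedious.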

Matt Gallagher:

The problem with Codable – and what I think you’re getting at when you suggest we need JSONCodable/PlistCodable – is there’s no sane custom implementation of init(from:) and encode(to:) without being archive-specific. These functions are generally a mashup of two different ideas:

  1. migration and versioning
  2. archive-specific choices like which fields to include and what order

But moreover, while you might make archive-specific choices, you don’t always have archive-specific knowledge.

[…]

We have no lookahead. We can’t peek to see if the next char is a double-quote, a digit or a bracket. Without overloading the Decoder to emit lookahead metadata as decodable types, you simply need to try each possibility, in turn, incurring the overhead and disruption of thrown errors.
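
A minimal sketch of the "try each possibility" pattern with the current API. `StringOrInt` is a hypothetical type for a field that may arrive as either a JSON number or a JSON string; the failed first attempt is a thrown and discarded error.

```swift
import Foundation

enum StringOrInt: Decodable, Equatable {
    case string(String)
    case int(Int)

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        // There is no way to ask what the next token is, so attempt Int first...
        if let number = try? container.decode(Int.self) {
            self = .int(number)
            return
        }
        // ...and fall back to String if that attempt threw.
        self = .string(try container.decode(String.self))
    }
}

let mixedJSON = Data(#"[1, "two"]"#.utf8)
let mixed = try? JSONDecoder().decode([StringOrInt].self, from: mixedJSON)
```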

Kevin Perry:

This design does not include support for encoding and decoding cyclical object graphs. Relatedly, there’s still no intention to include encoding of runtime type information in serialization formats for any purpose: all concrete types must be specified by the client doing the encoding or decoding.

Nick Lockwood:

I was really disappointed to see this, because these are probably my two major pain points with Codable.

If we are going to the trouble of making a brand new, backwards-incompatible replacement for Codable then it should try to correct all the major deficiencies of the existing design, not just performance.

NSCoding (for all its faults) supports both heterogeneous data and cyclical references. If this new system doesn’t support those then we are saying from the outset that it still isn’t going to be capable of dealing with a lot of real-world use-cases.

[…]

Also (related) some kind of built-in support for schema updates and migrations (similar to CoreData/SwiftData) would be a great feature, as this is another pain point in Codable.

Even just a way to specify a default value for new non-optional properties would reduce a lot of the need for adding manual decoder implementations to apps in post-1.0 releases.
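
To illustrate the cost with today's Codable: giving one new non-optional property a default means replacing the whole synthesized initializer. `Settings` and its fallback of `12` are hypothetical; only the `fontSize` line differs from what synthesis would have produced.

```swift
import Foundation

struct Settings: Decodable {
    var username: String
    var fontSize: Int   // added in a later release; absent from older archives

    private enum CodingKeys: String, CodingKey { case username, fontSize }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        username = try container.decode(String.self, forKey: .username)
        // The only line that differs from the synthesized implementation.
        fontSize = try container.decodeIfPresent(Int.self, forKey: .fontSize) ?? 12
    }
}

let oldArchive = Data(#"{"username": "alice"}"#.utf8)
let settings = try? JSONDecoder().decode(Settings.self, from: oldArchive)
```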

Helge Heß:

NSCoder/NSArchiver was actually pretty good for what it was intended for, archiving object graphs. How can I do that today? SwiftData? 🙈

Nick Lockwood:

Another issue I’ve run into with Codable is that a given object may have more than one serialized representation in a given application.

Zev Eisenberg:

I’d like to put in a request to please consider error handling. A common source of grief for beginners is difficulty in reading the error messages thrown by Codable. Some information is missing, and it’s formatted such that you really have to do some digging to understand it.
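
As a rough sketch of the digging currently required, the helper below flattens a `DecodingError` into a single readable line. `describe(_:)` and `User` are hypothetical names; the `DecodingError` cases and `Context` properties are the existing standard library API.

```swift
import Foundation

struct User: Decodable {
    let name: String
    let age: Int
}

// Hypothetical helper that turns a DecodingError into a one-line message.
func describe(_ error: DecodingError) -> String {
    func path(_ context: DecodingError.Context) -> String {
        context.codingPath.map { $0.stringValue }.joined(separator: ".")
    }
    switch error {
    case .keyNotFound(let key, let context):
        return "missing key '\(key.stringValue)' under '\(path(context))'"
    case .typeMismatch(let type, let context):
        return "expected \(type) at '\(path(context))'"
    case .valueNotFound(let type, let context):
        return "null where \(type) was expected at '\(path(context))'"
    case .dataCorrupted(let context):
        return "corrupted data at '\(path(context))': \(context.debugDescription)"
    @unknown default:
        return "unknown decoding error"
    }
}

let badUserJSON = Data(#"{"name": "A", "age": "ten"}"#.utf8)
do {
    _ = try JSONDecoder().decode(User.self, from: badUserJSON)
} catch let error as DecodingError {
    print(describe(error))
} catch {
    print("Unexpected error: \(error)")
}
```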

Helge Heß:

It seems the new macro based approach will solve some major performance problems 👍 But it doesn’t seem to address what makes serialisation actually hard: different formats, mappings, versioning and preservation. It still seems to be bonusware, w/ “now it does the demos fast”, not something addressing actual serialisation issues.
Think Protobuf, that does.
