Fast and well tested serialization library on top of dataclasses


When using dataclasses, you often need to dump and load objects based on the schema you have. Mashumaro not only lets you save and load data in different formats, but also does it very quickly.

Key features

  • 🚀 One of the fastest libraries
  • ☝️ Mature and time-tested
  • 👶 Easy to use out of the box
  • ⚙️ Highly customizable
  • 🎉 Built-in support for JSON, YAML, MessagePack, TOML
  • 📦 Built-in support for almost all Python types including typing-extensions
  • 📝 JSON Schema generation

Installation

Use pip to install:

$ pip install mashumaro

The current version of mashumaro supports Python versions 3.7 - 3.11. The latest version of mashumaro that can be installed on Python 3.6 is 3.1.1.

Changelog

This project follows the principles of Semantic Versioning. The changelog is available on the GitHub Releases page.

Supported serialization formats

This library adds methods for dumping to and loading from the following formats: plain dict, JSON, YAML, MessagePack and TOML.

Plain dict can be useful when you need to pass a dict object to a third-party library, such as a client for MongoDB.

You can find the documentation for the specific serialization format below.

Supported data types

There is support for generic types from the standard typing module:

for standard generic types on PEP 585 compatible Python (3.9+):

for special primitives from the typing module:

for standard interpreter types from types module:

for enumerations based on classes from the standard enum module:

for common built-in types:

for built-in datetime oriented types (see more details):

for pathlike types:

for other less popular built-in types:

for backported types from typing-extensions:

for arbitrary types:

Usage example

from enum import Enum
from typing import List
from dataclasses import dataclass
from mashumaro.mixins.json import DataClassJSONMixin

class Currency(Enum):
    USD = "USD"
    EUR = "EUR"

@dataclass
class CurrencyPosition(DataClassJSONMixin):
    currency: Currency
    balance: float

@dataclass
class StockPosition(DataClassJSONMixin):
    ticker: str
    name: str
    balance: int

@dataclass
class Portfolio(DataClassJSONMixin):
    currencies: List[CurrencyPosition]
    stocks: List[StockPosition]

my_portfolio = Portfolio(
    currencies=[
        CurrencyPosition(Currency.USD, 238.67),
        CurrencyPosition(Currency.EUR, 361.84),
    ],
    stocks=[
        StockPosition("AAPL", "Apple", 10),
        StockPosition("AMZN", "Amazon", 10),
    ]
)

json_string = my_portfolio.to_json()
Portfolio.from_json(json_string)  # same as my_portfolio

How does it work?

This library works by taking the schema of the data and generating a specific parser and builder for exactly that schema, taking into account the specifics of the serialization format. This is much faster than inspecting field types at runtime on every parsing or building call.

These specific parsers and builders are exposed as the corresponding from_* and to_* methods. They are compiled at import time (or at runtime in some cases) and are set as attributes on your dataclasses.
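The idea of generating a schema-specific builder once can be illustrated with a small standard-library sketch. This is not mashumaro's actual code generator: the add_to_dict decorator is a hypothetical helper that handles only flat fields and is meant just to show the technique.

```python
from dataclasses import dataclass, fields


def add_to_dict(cls):
    """Generate a to_dict method specialised for the fields of cls."""
    # Build the source of a function that reads each field by name,
    # so no field inspection is needed when the method is called later.
    items = ", ".join(f"{f.name!r}: self.{f.name}" for f in fields(cls))
    source = f"def to_dict(self):\n    return {{{items}}}\n"
    namespace = {}
    exec(source, namespace)  # compiled once, when the class is created
    cls.to_dict = namespace["to_dict"]
    return cls


@add_to_dict
@dataclass
class Point:
    x: int
    y: int


print(Point(1, 2).to_dict())
```

The generated function body is a plain dict literal, which is why calling it avoids the per-call cost of walking the field types.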

Benchmark

  • macOS 13.0.1 Ventura
  • Apple M1
  • 8GB RAM
  • Python 3.11.0

Load and dump sample data 100 times in 5 runs. The following figures show the best overall time in each case.

Library | From dict: time | From dict: slowdown factor | To dict: time | To dict: slowdown factor
mashumaro | 0.14724 | 1x | 0.10128 | 1x
cattrs | 0.18906 | 1.28x | 0.14072 | 1.39x
pydantic | 1.02666 | 6.97x | 0.81932 | 8.09x
marshmallow | 1.38348 | 9.4x | 0.45695 | 4.51x
dataclasses | - | - | 0.68057 | 6.72x
dacite | 2.37315 | 16.12x | - | -
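The slowdown factors appear to be each library's best time divided by mashumaro's best time for the same operation. Under that assumption, the "From dict" column can be reproduced from the figures above:

```python
# Best "from dict" times copied from the benchmark figures above.
from_dict_times = {
    "mashumaro": 0.14724,
    "cattrs": 0.18906,
    "pydantic": 1.02666,
    "marshmallow": 1.38348,
    "dacite": 2.37315,
}

# Assumption: slowdown factor = library time / mashumaro time.
baseline = from_dict_times["mashumaro"]
slowdown = {
    name: round(time / baseline, 2) for name, time in from_dict_times.items()
}
print(slowdown)
```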

To run the benchmark in your environment:

git clone git@github.com:Fatal1ty/mashumaro.git
cd mashumaro
python3 -m venv env && source env/bin/activate
pip install -e .
pip install -r requirements-dev.txt
python benchmark/run.py

Serialization mixins

mashumaro provides mixins for each serialization format.

DataClassDictMixin can be imported in two ways:

from mashumaro import DataClassDictMixin
from mashumaro.mixins.dict import DataClassDictMixin

The core mixin that adds serialization functionality to a dataclass. This mixin is a base class for all other serialization format mixins. It adds methods from_dict and to_dict.

DataClassJSONMixin can be imported as:

from mashumaro.mixins.json import DataClassJSONMixin

This mixin adds JSON serialization functionality to a dataclass. It adds methods from_json and to_json.

DataClassORJSONMixin can be imported as:

from mashumaro.mixins.orjson import DataClassORJSONMixin

This mixin adds JSON serialization functionality to a dataclass using the third-party orjson library. It adds methods from_json, to_jsonb and to_json.

In order to use this mixin, the orjson package must be installed. You can install it manually or using an extra option for mashumaro:

pip install mashumaro[orjson]

Using this mixin the following data types will be handled by orjson library by default:

DataClassMessagePackMixin can be imported as:

from mashumaro.mixins.msgpack import DataClassMessagePackMixin

This mixin adds MessagePack serialization functionality to a dataclass. It adds methods from_msgpack and to_msgpack.

In order to use this mixin, the msgpack package must be installed. You can install it manually or using an extra option for mashumaro:

pip install mashumaro[msgpack]

Using this mixin the following data types will be handled by msgpack library by default:

DataClassYAMLMixin can be imported as:

from mashumaro.mixins.yaml import DataClassYAMLMixin

This mixin adds YAML serialization functionality to a dataclass. It adds methods from_yaml and to_yaml.

In order to use this mixin, the pyyaml package must be installed. You can install it manually or using an extra option for mashumaro:

pip install mashumaro[yaml]

DataClassTOMLMixin can be imported as:

from mashumaro.mixins.toml import DataClassTOMLMixin

This mixin adds TOML serialization functionality to a dataclass. It adds methods from_toml and to_toml.

In order to use this mixin, the tomli and tomli-w packages must be installed. In Python 3.11+, tomli is included as the tomllib standard library module and can be used by this mixin. You can install the missing packages manually or using an extra option for mashumaro:

pip install mashumaro[toml]

Using this mixin the following data types will be handled by tomli/ tomli-w library by default:

Fields with value None will be omitted on serialization because TOML doesn't support null values.

Customization

Customization options of mashumaro are extensive and will most likely cover your needs. When it comes to non-standard data types and non-standard serialization support, you can do the following:

  • Turn an existing regular or generic class into a serializable one by inheriting the SerializableType class
  • Write different serialization strategies for an existing regular or generic type that is not under your control using SerializationStrategy class
  • Define serialization / deserialization methods:
    • for a specific dataclass field by using field options
    • for a specific data type used in the dataclass by using Config class
  • Alter input and output data with serialization / deserialization hooks
  • Separate serialization scheme from a dataclass in a reusable manner using dialects
  • Choose from predefined serialization engines for the specific data types, e.g. datetime and NamedTuple

SerializableType interface

If you have a custom class or hierarchy of classes whose instances you want to serialize with mashumaro, the first option is to implement SerializableType interface.

User-defined types

Let's look at this not very practical example:

from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.types import SerializableType

class Airport(SerializableType):
    def __init__(self, code, city):
        self.code, self.city = code, city

    def _serialize(self):
        return [self.code, self.city]

    @classmethod
    def _deserialize(cls, value):
        return cls(*value)

    def __eq__(self, other):
        # compare as tuples; without parentheses this returns a truthy tuple
        return (self.code, self.city) == (other.code, other.city)

@dataclass
class Flight(DataClassDictMixin):
    origin: Airport
    destination: Airport

JFK = Airport("JFK", "New York City")
LAX = Airport("LAX", "Los Angeles")

input_data = {
    "origin": ["JFK", "New York City"],
    "destination": ["LAX", "Los Angeles"]
}
my_flight = Flight.from_dict(input_data)
assert my_flight == Flight(JFK, LAX)
assert my_flight.to_dict() == input_data

You can see how Airport instances are seamlessly created from lists of two strings and serialized into them.

By default the _deserialize method gets the raw input data without any prior transformations. This should be enough in many cases, especially when you need to perform non-standard transformations yourself, but let's extend our example:

class Itinerary(SerializableType):
    def __init__(self, flights):
        self.flights = flights

    def _serialize(self):
        return self.flights

    @classmethod
    def _deserialize(cls, flights):
        return cls(flights)

@dataclass
class TravelPlan(DataClassDictMixin):
    budget: float
    itinerary: Itinerary

input_data = {
    "budget": 10_000,
    "itinerary": [
        {
            "origin": ["JFK", "New York City"],
            "destination": ["LAX", "Los Angeles"]
        },
        {
            "origin": ["LAX", "Los Angeles"],
            "destination": ["SFO", "San Francisco"]
        }
    ]
}

If we pass the flight list as is into Itinerary._deserialize, our itinerary will contain something we may not expect: list[dict] instead of list[Flight]. The solution is quite simple. Instead of converting each dict to a Flight yourself, just use annotations:

class Itinerary(SerializableType, use_annotations=True):
    def __init__(self, flights):
        self.flights = flights

    def _serialize(self) -> list[Flight]:
        return self.flights

    @classmethod
    def _deserialize(cls, flights: list[Flight]):
        return cls(flights)

my_plan = TravelPlan.from_dict(input_data)
assert isinstance(my_plan.itinerary.flights[0], Flight)
assert isinstance(my_plan.itinerary.flights[1], Flight)
assert my_plan.to_dict() == input_data

Here we add annotations to the only argument of _deserialize method and to the return value of _serialize method as well. The latter is needed for correct serialization.

You must pass use_annotations=True explicitly when defining the class, because using annotations implicitly might break compatibility with old code that wasn't aware of this feature. It will be enabled by default in a future major release.

User-defined generic types

The great thing to note about using annotations in SerializableType is that they work seamlessly with generic and variadic generic types. Let's see how this can be useful:

from datetime import date
from typing import TypeVar
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.types import SerializableType

KT = TypeVar("KT")
VT = TypeVar("VT")

class DictWrapper(dict[KT, VT], SerializableType, use_annotations=True):
    def _serialize(self) -> dict[KT, VT]:
        return dict(self)

    @classmethod
    def _deserialize(cls, value: dict[KT, VT]) -> 'DictWrapper[KT, VT]':
        return cls(value)

@dataclass
class DataClass(DataClassDictMixin):
    x: DictWrapper[date, str]
    y: DictWrapper[str, date]

input_data = {
    "x": {"2022-12-07": "2022-12-07"},
    "y": {"2022-12-07": "2022-12-07"}
}
obj = DataClass.from_dict(input_data)
assert obj == DataClass(
    x=DictWrapper({date(2022, 12, 7): "2022-12-07"}),
    y=DictWrapper({"2022-12-07": date(2022, 12, 7)})
)
assert obj.to_dict() == input_data

You can see that a formatted date is deserialized to a date object before being passed to DictWrapper._deserialize, as a key or a value according to the generic parameters.

If you have generic dataclass types, you can use SerializableType for them as well, but it's not necessary since they're supported out of the box.

SerializationStrategy

If you want to add support for a third-party type that is not under your control, you can write the serialization and deserialization logic inside a SerializationStrategy class. The strategy is reusable, so it is well suited to cases where that third-party type is widely used. SerializationStrategy is also a good fit when you need strategies that differ only slightly from each other, because you can pass the differentiator to the __init__ method.

Third-party types

To demonstrate how SerializationStrategy works let's write a simple strategy for datetime serialization in different formats. In this example we will use the same strategy class for two dataclass fields, but a string representing the date and time will be different.

from dataclasses import dataclass, field
from datetime import datetime
from mashumaro import DataClassDictMixin, field_options
from mashumaro.types import SerializationStrategy

class FormattedDateTime(SerializationStrategy):
    def __init__(self, fmt):
        self.fmt = fmt

    def serialize(self, value: datetime) -> str:
        return value.strftime(self.fmt)

    def deserialize(self, value: str) -> datetime:
        return datetime.strptime(value, self.fmt)

@dataclass
class DateTimeFormats(DataClassDictMixin):
    short: datetime = field(
        metadata=field_options(
            serialization_strategy=FormattedDateTime("%d%m%Y%H%M%S")
        )
    )
    verbose: datetime = field(
        metadata=field_options(
            serialization_strategy=FormattedDateTime("%A %B %d, %Y, %H:%M:%S")
        )
    )

formats = DateTimeFormats(
    short=datetime(2019, 1, 1, 12),
    verbose=datetime(2019, 1, 1, 12),
)
dictionary = formats.to_dict()
# {'short': '01012019120000', 'verbose': 'Tuesday January 01, 2019, 12:00:00'}
assert DateTimeFormats.from_dict(dictionary) == formats

Similarly to SerializableType, SerializationStrategy could also take advantage of annotations:

from dataclasses import dataclass
from datetime import datetime
from mashumaro import DataClassDictMixin
from mashumaro.types import SerializationStrategy

class TsSerializationStrategy(SerializationStrategy, use_annotations=True):
    def serialize(self, value: datetime) -> float:
        return value.timestamp()

    def deserialize(self, value: float) -> datetime:
        # value will be converted to float before being passed to this method
        return datetime.fromtimestamp(value)

@dataclass
class Example(DataClassDictMixin):
    dt: datetime

    class Config:
        serialization_strategy = {
            datetime: TsSerializationStrategy(),
        }

example = Example.from_dict({"dt": "1672531200"})
print(example)
# Example(dt=datetime.datetime(2023, 1, 1, 3, 0))  # local time; this output is for UTC+3
print(example.to_dict())
# {'dt': 1672531200.0}

Here the passed string value "1672531200" will be converted to float before being passed to deserialize method thanks to the float annotation.

As with SerializableType, the value of use_annotations will be True by default in a future major release.

Third-party generic types

To create a generic version of a serialization strategy you need to follow these steps:

  • Inherit the Generic[...] type with the number of parameters matching the number of parameters of the target generic type
  • Write generic annotations for serialize method's return type and for deserialize method's argument type
  • Use the origin type of the target generic type in the serialization_strategy config section (typing.get_origin might be helpful)

There is no need to add use_annotations=True here because it's enabled implicitly for generic serialization strategies.

For example, there is a third-party multidict package that has a generic MultiDict type. A generic serialization strategy for it might look like this:

from dataclasses import dataclass
from datetime import date
from pprint import pprint
from typing import Generic, List, Tuple, TypeVar
from mashumaro import DataClassDictMixin
from mashumaro.types import SerializationStrategy

from multidict import MultiDict

T = TypeVar("T")

class MultiDictSerializationStrategy(SerializationStrategy, Generic[T]):
    def serialize(self, value: MultiDict[T]) -> List[Tuple[str, T]]:
        return [(k, v) for k, v in value.items()]

    def deserialize(self, value: List[Tuple[str, T]]) -> MultiDict[T]:
        return MultiDict(value)


@dataclass
class Example(DataClassDictMixin):
    floats: MultiDict[float]
    date_lists: MultiDict[List[date]]

    class Config:
        serialization_strategy = {
            MultiDict: MultiDictSerializationStrategy()
        }

example = Example(
    floats=MultiDict([("x", 1.1), ("x", 2.2)]),
    date_lists=MultiDict(
        [("x", [date(2023, 1, 1), date(2023, 1, 2)]),
         ("x", [date(2023, 2, 1), date(2023, 2, 2)])]
    ),
)
pprint(example.to_dict())
# {'date_lists': [['x', ['2023-01-01', '2023-01-02']],
#                 ['x', ['2023-02-01', '2023-02-02']]],
#  'floats': [['x', 1.1], ['x', 2.2]]}
assert Example.from_dict(example.to_dict()) == example

Field options

In some cases creating a new class just for one little thing could be excessive. Moreover, you may need to deal with third-party classes that you are not allowed to change. You can use the dataclasses.field function as a default field value to configure some serialization aspects through its metadata parameter. The next section describes all the options supported in the metadata mapping.

serialize option

This option allows you to change the serialization method. When using this option, the serialization behaviour depends on what type of value the option has. It could be either Callable[[Any], Any] or str.

A value of type Callable[[Any], Any] is a generic way to specify any callable object like a function, a class method, a class instance method, an instance of a callable class or even a lambda function to be called for serialization.

A value of type str sets a specific engine for serialization. Keep in mind that the available engines depend on the data type this option is used with. Currently the following serialization engines are available:

Applicable data types | Supported engines | Description
NamedTuple, namedtuple | as_list, as_dict | How to pack named tuples. By default as_list engine is used that means your named tuple class instance will be packed into a list of its values. You can pack it into a dictionary using as_dict engine.
Any | omit | Skip the field during serialization

In addition, you can pass a field value as is without changes using pass_through.

Example:

from datetime import datetime
from dataclasses import dataclass, field
from typing import NamedTuple
from mashumaro import DataClassDictMixin

class MyNamedTuple(NamedTuple):
    x: int
    y: float

@dataclass
class A(DataClassDictMixin):
    dt: datetime = field(
        metadata={
            "serialize": lambda v: v.strftime('%Y-%m-%d %H:%M:%S')
        }
    )
    t: MyNamedTuple = field(metadata={"serialize": "as_dict"})

deserialize option

This option allows you to change the deserialization method. When using this option, the deserialization behaviour depends on what type of value the option has. It could be either Callable[[Any], Any] or str.

A value of type Callable[[Any], Any] is a generic way to specify any callable object like a function, a class method, a class instance method, an instance of a callable class or even a lambda function to be called for deserialization.

A value of type str sets a specific engine for deserialization. Keep in mind that the available engines depend on the data type this option is used with. Currently the following deserialization engines are available:

Applicable data types | Supported engines | Description
datetime, date, time | ciso8601, pendulum | How to parse datetime string. By default native fromisoformat of corresponding class will be used for datetime, date and time fields. It's the fastest way in most cases, but you can choose an alternative.
NamedTuple, namedtuple | as_list, as_dict | How to unpack named tuples. By default as_list engine is used that means your named tuple class instance will be created from a list of its values. You can unpack it from a dictionary using as_dict engine.

In addition, you can pass a field value as is without changes using pass_through.

Example:

from datetime import datetime
from dataclasses import dataclass, field
from typing import List, NamedTuple
from mashumaro import DataClassDictMixin
import ciso8601
import dateutil.parser  # the parser submodule must be imported explicitly

class MyNamedTuple(NamedTuple):
    x: int
    y: float

@dataclass
class A(DataClassDictMixin):
    x: datetime = field(
        metadata={"deserialize": "pendulum"}
    )

@dataclass
class B(DataClassDictMixin):
    x: datetime = field(
        metadata={"deserialize": ciso8601.parse_datetime_as_naive}
    )

@dataclass
class C(DataClassDictMixin):
    dt: List[datetime] = field(
        metadata={
            "deserialize": lambda l: list(map(dateutil.parser.isoparse, l))
        }
    )

@dataclass
class D(DataClassDictMixin):
    x: MyNamedTuple = field(metadata={"deserialize": "as_dict"})

serialization_strategy option

This option is useful when you want to change the serialization logic for a dataclass field depending on some defined parameters using a reusable serialization scheme. You can find an example in the SerializationStrategy chapter. In addition, you can pass a field value as is without changes using pass_through.

alias option

In some cases it's better to have different names for a field in your class and in its serialized view. For example, a third-party legacy API you are working with might use camel case, while you stick to snake case in your code base. Or you may want to load data with keys that are invalid identifiers in Python. This problem is easily solved by using aliases:

from dataclasses import dataclass, field
from mashumaro import DataClassDictMixin, field_options

@dataclass
class DataClass(DataClassDictMixin):
    a: int = field(metadata=field_options(alias="FieldA"))
    b: int = field(metadata=field_options(alias="#invalid"))

x = DataClass.from_dict({"FieldA": 1, "#invalid": 2})  # DataClass(a=1, b=2)
x.to_dict()  # {"a": 1, "b": 2}  # no aliases on serialization by default

If you want to write all the field aliases in one place, use the aliases config option.

If you want to serialize all the fields by aliases, you have two options: the serialize_by_alias config option, or the by_alias keyword argument of the to_* methods.

It's hard to imagine when it might be necessary to serialize only specific fields by alias, but such functionality could easily be added to the library. Open an issue if you need it.

If you don't want to remember the names of the options you can use field_options helper function:

from dataclasses import dataclass, field
from mashumaro import DataClassDictMixin, field_options

@dataclass
class A(DataClassDictMixin):
    x: int = field(
        metadata=field_options(
            serialize=str,
            deserialize=int,
            # ... other options
        )
    )

More options are on the way. If you know which option would be useful for many, please don't hesitate to create an issue or pull request.

Config options

If inheritance means something to you, you'll fall in love with the Config class. You can register serialize and deserialize methods, define code generation options and other things in just one place, or configure different classes in different ways if you need flexibility. Inheritance always comes first.

There is a base class BaseConfig that you can inherit for the sake of convenience, but it's not mandatory.

In the following example you can see how the debug flag is changed from class to class: ModelA will have debug mode enabled but ModelB will not.

from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig

class BaseModel(DataClassDictMixin):
    class Config(BaseConfig):
        debug = True

@dataclass
class ModelA(BaseModel):
    a: int

@dataclass
class ModelB(BaseModel):
    b: int

    class Config(BaseConfig):
        debug = False

The next section describes all the options supported in the config.

debug config option

If you enable the debug option the generated code for your data class will be printed.

code_generation_options config option

Some users may need functionality that comes at an extra cost, such as CPU time spent executing additional instructions. Since not everyone needs it, such functionality is enabled by adding a constant to the code_generation_options list, so the fastest basic behavior of the library always remains the default. The following table gives a brief overview of the available constants.

Constant | Description
TO_DICT_ADD_OMIT_NONE_FLAG | Adds omit_none keyword-only argument to to_* methods.
TO_DICT_ADD_BY_ALIAS_FLAG | Adds by_alias keyword-only argument to to_* methods.
ADD_DIALECT_SUPPORT | Adds dialect keyword-only argument to from_* and to_* methods.

serialization_strategy config option

You can register custom SerializationStrategy, serialize and deserialize methods for specific types just in one place. It could be configured using a dictionary with types as keys. The value could be either a SerializationStrategy instance or a dictionary with serialize and deserialize values with the same meaning as in the field options.

from dataclasses import dataclass
from datetime import datetime, date
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
from mashumaro.types import SerializationStrategy

class FormattedDateTime(SerializationStrategy):
    def __init__(self, fmt):
        self.fmt = fmt

    def serialize(self, value: datetime) -> str:
        return value.strftime(self.fmt)

    def deserialize(self, value: str) -> datetime:
        return datetime.strptime(value, self.fmt)

@dataclass
class DataClass(DataClassDictMixin):

    x: datetime
    y: date

    class Config(BaseConfig):
        serialization_strategy = {
            datetime: FormattedDateTime("%Y"),
            date: {
                # you can use specific str values for datetime here as well
                "deserialize": "pendulum",
                "serialize": date.isoformat,
            },
        }

instance = DataClass.from_dict({"x": "2021", "y": "2021"})
# DataClass(x=datetime.datetime(2021, 1, 1, 0, 0), y=Date(2021, 1, 1))
dictionary = instance.to_dict()
# {'x': '2021', 'y': '2021-01-01'}

aliases config option

Sometimes it's better to write the field aliases in one place. You can mix aliases here with aliases in the field options, but the latter will always take precedence.

from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig

@dataclass
class DataClass(DataClassDictMixin):
    a: int
    b: int

    class Config(BaseConfig):
        aliases = {
            "a": "FieldA",
            "b": "FieldB",
        }

DataClass.from_dict({"FieldA": 1, "FieldB": 2})  # DataClass(a=1, b=2)

serialize_by_alias config option

All the fields with aliases will be serialized by them by default when this option is enabled. You can combine this config option with the by_alias keyword argument.

from dataclasses import dataclass, field
from mashumaro import DataClassDictMixin, field_options
from mashumaro.config import BaseConfig

@dataclass
class DataClass(DataClassDictMixin):
    field_a: int = field(metadata=field_options(alias="FieldA"))

    class Config(BaseConfig):
        serialize_by_alias = True

DataClass(field_a=1).to_dict()  # {'FieldA': 1}

omit_none config option

All the fields with None values will be skipped during serialization by default when this option is enabled. You can combine this config option with the omit_none keyword argument.

from dataclasses import dataclass, field
from typing import Optional
from mashumaro import DataClassDictMixin, field_options
from mashumaro.config import BaseConfig

@dataclass
class DataClass(DataClassDictMixin):
    x: Optional[int] = None

    class Config(BaseConfig):
        omit_none = True

DataClass().to_dict()  # {}

namedtuple_as_dict config option

Dataclasses are a great way to declare and use data models. But it's not the only way. Python has a typed version of namedtuple called NamedTuple which looks similar to dataclasses:

from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int

The same with a dataclass will look like this:

from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

At first glance, you can use both options. But imagine that you need to create a bunch of instances of the Point class. Because of how dataclasses work, they will consume more memory than named tuples. In such a case named tuples could be more appropriate.
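A rough way to see the difference is to compare shallow object sizes, as in this standard-library sketch. It assumes CPython and a regular dataclass without slots. The class names are hypothetical and the exact numbers vary between Python versions.

```python
import sys
from dataclasses import dataclass
from typing import NamedTuple


class PointTuple(NamedTuple):
    x: int
    y: int


@dataclass
class PointData:
    x: int
    y: int


nt = PointTuple(1, 2)
dc = PointData(1, 2)

# A named tuple stores its values inline, while a regular dataclass
# instance also carries a per-instance attribute dictionary.
nt_size = sys.getsizeof(nt)
dc_size = sys.getsizeof(dc) + sys.getsizeof(vars(dc))
print(nt_size, dc_size)
```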

By default, all named tuples are packed into lists. But with namedtuple_as_dict option you have a drop-in replacement for dataclasses:

from dataclasses import dataclass
from typing import List, NamedTuple
from mashumaro import DataClassDictMixin

class Point(NamedTuple):
    x: int
    y: int

@dataclass
class DataClass(DataClassDictMixin):
    points: List[Point]

    class Config:
        namedtuple_as_dict = True

obj = DataClass.from_dict({"points": [{"x": 0, "y": 0}, {"x": 1, "y": 1}]})
print(obj.to_dict())  # {"points": [{"x": 0, "y": 0}, {"x": 1, "y": 1}]}

If you want to serialize only certain named tuple fields as dictionaries, you can use the corresponding serialization and deserialization engines.

allow_postponed_evaluation config option

PEP 563 solved the problem of forward references by postponing the evaluation of annotations, so you can write the following code:

from __future__ import annotations
from dataclasses import dataclass
from mashumaro import DataClassDictMixin

@dataclass
class A(DataClassDictMixin):
    x: B

@dataclass
class B(DataClassDictMixin):
    y: int

obj = A.from_dict({'x': {'y': 1}})

You don't need to write anything special here, forward references work out of the box. If a field of a dataclass has a forward reference in the type annotations, building of the from_* and to_* methods of this dataclass will be postponed until they are first called. However, if for some reason you don't want the evaluation to be postponed, you can disable it using the allow_postponed_evaluation option:

from __future__ import annotations
from dataclasses import dataclass
from mashumaro import DataClassDictMixin

@dataclass
class A(DataClassDictMixin):
    x: B

    class Config:
        allow_postponed_evaluation = False

# UnresolvedTypeReferenceError: Class A has unresolved type reference B
# in some of its fields

@dataclass
class B(DataClassDictMixin):
    y: int

In this case you will get UnresolvedTypeReferenceError regardless of whether class B is declared below or not.

dialect config option

This option is described below in the Dialects section.

orjson_options config option

This option changes default options for orjson.dumps encoder which is used in DataClassORJSONMixin. For example, you can tell orjson to handle non-str dict keys as the built-in json.dumps encoder does. See orjson documentation to read more about these options.

import orjson
from dataclasses import dataclass
from typing import Dict
from mashumaro.config import BaseConfig
from mashumaro.mixins.orjson import DataClassORJSONMixin

@dataclass
class MyClass(DataClassORJSONMixin):
    x: Dict[int, int]

    class Config(BaseConfig):
        orjson_options = orjson.OPT_NON_STR_KEYS

assert MyClass({1: 2}).to_json() == '{"1":2}'  # to_json returns a str

Passing field values as is

Sometimes you need to pass a field value as is, without any changes during serialization / deserialization. There is a predefined pass_through object that can be used as the serialization_strategy option or as the serialize / deserialize options:

from dataclasses import dataclass, field
from mashumaro import DataClassDictMixin, pass_through

class MyClass:
    def __init__(self, some_value):
        self.some_value = some_value

@dataclass
class A1(DataClassDictMixin):
    x: MyClass = field(
        metadata={
            "serialize": pass_through,
            "deserialize": pass_through,
        }
    )

@dataclass
class A2(DataClassDictMixin):
    x: MyClass = field(
        metadata={
            "serialization_strategy": pass_through,
        }
    )

@dataclass
class A3(DataClassDictMixin):
    x: MyClass

    class Config:
        serialization_strategy = {
            MyClass: pass_through,
        }

@dataclass
class A4(DataClassDictMixin):
    x: MyClass

    class Config:
        serialization_strategy = {
            MyClass: {
                "serialize": pass_through,
                "deserialize": pass_through,
            }
        }

my_class_instance = MyClass(42)

assert A1.from_dict({'x': my_class_instance}).x == my_class_instance
assert A2.from_dict({'x': my_class_instance}).x == my_class_instance
assert A3.from_dict({'x': my_class_instance}).x == my_class_instance
assert A4.from_dict({'x': my_class_instance}).x == my_class_instance

a1_dict = A1(my_class_instance).to_dict()
a2_dict = A2(my_class_instance).to_dict()
a3_dict = A3(my_class_instance).to_dict()
a4_dict = A4(my_class_instance).to_dict()

assert a1_dict == a2_dict == a3_dict == a4_dict == {"x": my_class_instance}

Dialects

Sometimes you need different serialization and deserialization methods depending on the data source where entities of the dataclass are stored, or on the API the entities are sent to or received from. There is a special Dialect type that may contain all the differences from the default serialization and deserialization methods. You can create different dialects and use each of them for the same dataclass depending on the situation.

Suppose we have the following dataclass with a field of type date:

@dataclass
class Entity(DataClassDictMixin):
    dt: date

By default, a field of date type serializes to a string in ISO 8601 format, so the serialized entity will look like {'dt': '2021-12-31'}. But what if we have, for example, two sensitive legacy Ethiopian and Japanese APIs that use two different formats for dates — dd/mm/yyyy and yyyy年mm月dd日? Instead of creating two similar dataclasses we can have one dataclass and two dialects:

from dataclasses import dataclass
from datetime import date, datetime
from mashumaro import DataClassDictMixin
from mashumaro.config import ADD_DIALECT_SUPPORT
from mashumaro.dialect import Dialect
from mashumaro.types import SerializationStrategy

class DateTimeSerializationStrategy(SerializationStrategy):
    def __init__(self, fmt: str):
        self.fmt = fmt

    def serialize(self, value: date) -> str:
        return value.strftime(self.fmt)

    def deserialize(self, value: str) -> date:
        return datetime.strptime(value, self.fmt).date()

class EthiopianDialect(Dialect):
    serialization_strategy = {
        date: DateTimeSerializationStrategy("%d/%m/%Y")
    }

class JapaneseDialect(Dialect):
    serialization_strategy = {
        date: DateTimeSerializationStrategy("%Y年%m月%d日")
    }

@dataclass
class Entity(DataClassDictMixin):
    dt: date

    class Config:
        code_generation_options = [ADD_DIALECT_SUPPORT]

entity = Entity(date(2021, 12, 31))
entity.to_dict(dialect=EthiopianDialect)  # {'dt': '31/12/2021'}
entity.to_dict(dialect=JapaneseDialect)   # {'dt': '2021年12月31日'}
Entity.from_dict({'dt': '2021年12月31日'}, dialect=JapaneseDialect)

serialization_strategy dialect option

This dialect option has the same meaning as the similar config option but for the dialect scope. You can register custom SerializationStrategy, serialize and deserialize methods for the specific types.

omit_none dialect option

This dialect option has the same meaning as the similar config option but for the dialect scope.

Changing the default dialect

You can change the default serialization and deserialization methods for a dataclass not only with the serialization_strategy config option but also with the dialect config option. If you have multiple dataclasses without a common parent class, a default dialect can help you reduce the amount of code:

@dataclass
class Entity(DataClassDictMixin):
    dt: date

    class Config:
        dialect = JapaneseDialect

entity = Entity(date(2021, 12, 31))
entity.to_dict()  # {'dt': '2021年12月31日'}
assert Entity.from_dict({'dt': '2021年12月31日'}) == entity

Code generation options

Add omit_none keyword argument

If you want to have control over whether to skip None values on serialization you can add omit_none parameter to to_* methods using the code_generation_options list. The default value of omit_none parameter depends on whether the omit_none config option or omit_none dialect option is enabled.

from dataclasses import dataclass
from typing import Optional
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig, TO_DICT_ADD_OMIT_NONE_FLAG

@dataclass
class Inner(DataClassDictMixin):
    x: Optional[int] = None
    # "x" won't be omitted since there is no TO_DICT_ADD_OMIT_NONE_FLAG here

@dataclass
class Model(DataClassDictMixin):
    x: Inner
    a: Optional[int] = None
    b: Optional[str] = None  # will be omitted

    class Config(BaseConfig):
        code_generation_options = [TO_DICT_ADD_OMIT_NONE_FLAG]

Model(x=Inner(), a=1).to_dict(omit_none=True)  # {'x': {'x': None}, 'a': 1}

Add by_alias keyword argument

If you want to have control over whether to serialize fields by their aliases you can add by_alias parameter to to_* methods using the code_generation_options list. The default value of by_alias parameter depends on whether the serialize_by_alias config option is enabled.

from dataclasses import dataclass, field
from mashumaro import DataClassDictMixin, field_options
from mashumaro.config import BaseConfig, TO_DICT_ADD_BY_ALIAS_FLAG

@dataclass
class DataClass(DataClassDictMixin):
    field_a: int = field(metadata=field_options(alias="FieldA"))

    class Config(BaseConfig):
        code_generation_options = [TO_DICT_ADD_BY_ALIAS_FLAG]

DataClass(field_a=1).to_dict()  # {'field_a': 1}
DataClass(field_a=1).to_dict(by_alias=True)  # {'FieldA': 1}

Add dialect keyword argument

Support for dialects is disabled by default for performance reasons. You can enable it using the ADD_DIALECT_SUPPORT constant:

from dataclasses import dataclass
from datetime import date
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig, ADD_DIALECT_SUPPORT

@dataclass
class Entity(DataClassDictMixin):
    dt: date

    class Config(BaseConfig):
        code_generation_options = [ADD_DIALECT_SUPPORT]

Generic dataclasses

Along with user-defined generic types implementing SerializableType interface, generic and variadic generic dataclasses can also be used. There are two applicable scenarios for them.

Generic dataclass inheritance

If you have a generic dataclass and want to serialize and deserialize its instances depending on the concrete types, you can use inheritance for that:

from dataclasses import dataclass
from datetime import date
from typing import Generic, Mapping, Tuple, TypeVar, TypeVarTuple
from mashumaro import DataClassDictMixin

KT = TypeVar("KT")
VT = TypeVar("VT", date, str)
Ts = TypeVarTuple("Ts")

@dataclass
class GenericDataClass(Generic[KT, VT, *Ts]):
    x: Mapping[KT, VT]
    y: Tuple[*Ts, KT]

@dataclass
class ConcreteDataClass(
    GenericDataClass[str, date, *Tuple[float, ...]],
    DataClassDictMixin,
):
    pass

ConcreteDataClass.from_dict({"x": {"a": "2021-01-01"}, "y": [1, 2, "a"]})
# ConcreteDataClass(x={'a': datetime.date(2021, 1, 1)}, y=(1.0, 2.0, 'a'))

You can override a TypeVar field with a concrete type or another TypeVar. Partial specification of concrete types is also allowed. If a generic dataclass is inherited without overriding its types, the types of its fields remain untouched.

Generic dataclass in a field type

Another approach is to specify concrete types in the field type hints. This can help to have different versions of the same generic dataclass:

from dataclasses import dataclass
from datetime import date
from typing import Generic, TypeVar
from mashumaro import DataClassDictMixin

T = TypeVar('T')

@dataclass
class GenericDataClass(Generic[T], DataClassDictMixin):
    x: T

@dataclass
class DataClass(DataClassDictMixin):
    date: GenericDataClass[date]
    str: GenericDataClass[str]

instance = DataClass(
    date=GenericDataClass(x=date(2021, 1, 1)),
    str=GenericDataClass(x='2021-01-01'),
)
dictionary = {'date': {'x': '2021-01-01'}, 'str': {'x': '2021-01-01'}}
assert DataClass.from_dict(dictionary) == instance

GenericSerializableType interface

There is a generic alternative to SerializableType called GenericSerializableType. It makes it possible to decide yourself how to serialize and deserialize input data depending on the types provided:

from dataclasses import dataclass
from datetime import date
from typing import Dict, TypeVar
from mashumaro import DataClassDictMixin
from mashumaro.types import GenericSerializableType

KT = TypeVar("KT")
VT = TypeVar("VT")

class DictWrapper(Dict[KT, VT], GenericSerializableType):
    __packers__ = {date: lambda x: x.isoformat(), str: str}
    __unpackers__ = {date: date.fromisoformat, str: str}

    def _serialize(self, types) -> Dict[KT, VT]:
        k_type, v_type = types
        k_conv = self.__packers__[k_type]
        v_conv = self.__packers__[v_type]
        return {k_conv(k): v_conv(v) for k, v in self.items()}

    @classmethod
    def _deserialize(cls, value, types) -> "DictWrapper[KT, VT]":
        k_type, v_type = types
        k_conv = cls.__unpackers__[k_type]
        v_conv = cls.__unpackers__[v_type]
        return cls({k_conv(k): v_conv(v) for k, v in value.items()})

@dataclass
class DataClass(DataClassDictMixin):
    x: DictWrapper[date, str]
    y: DictWrapper[str, date]

input_data = {
    "x": {"2022-12-07": "2022-12-07"},
    "y": {"2022-12-07": "2022-12-07"},
}
obj = DataClass.from_dict(input_data)
assert obj == DataClass(
    x=DictWrapper({date(2022, 12, 7): "2022-12-07"}),
    y=DictWrapper({"2022-12-07": date(2022, 12, 7)}),
)
assert obj.to_dict() == input_data

As you can see, the code turns out to be massive compared to the alternative but in rare cases such flexibility can be useful. You should think twice about whether it's really worth using it.

Serialization hooks

In some cases you need to prepare input / output data or do some extraordinary actions at different stages of the deserialization / serialization lifecycle. You can do this with different types of hooks.

Before deserialization

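To modify the dictionary before it is passed to deserialization, use the __pre_deserialize__ class method:

from dataclasses import dataclass
from typing import Any, Dict
from mashumaro.mixins.json import DataClassJSONMixin

@dataclass
class DataClass(DataClassJSONMixin):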
    abc: int

    @classmethod
    def __pre_deserialize__(cls, d: Dict[Any, Any]) -> Dict[Any, Any]:
        return {k.lower(): v for k, v in d.items()}

print(DataClass.from_dict({"ABC": 123}))    # DataClass(abc=123)
print(DataClass.from_json('{"ABC": 123}'))  # DataClass(abc=123)

After deserialization

To modify the dataclass instance created by deserialization, use the __post_deserialize__ class method:

from dataclasses import dataclass
from mashumaro.mixins.json import DataClassJSONMixin

@dataclass
class DataClass(DataClassJSONMixin):
    abc: int

    @classmethod
    def __post_deserialize__(cls, obj: 'DataClass') -> 'DataClass':
        obj.abc = 456
        return obj

print(DataClass.from_dict({"abc": 123}))    # DataClass(abc=456)
print(DataClass.from_json('{"abc": 123}'))  # DataClass(abc=456)

Before serialization

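To do something before serialization, use the __pre_serialize__ method:

from dataclasses import dataclass
from typing import ClassVar
from mashumaro.mixins.json import DataClassJSONMixin

@dataclass
class DataClass(DataClassJSONMixin):
    abc: int
    counter: ClassVar[int] = 0

    def __pre_serialize__(self) -> 'DataClass':
        self.counter += 1
        return self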

obj = DataClass(abc=123)
obj.to_dict()
obj.to_json()
print(obj.counter)  # 2

After serialization

To modify the dictionary created by serialization, use the __post_serialize__ method:

from dataclasses import dataclass
from typing import Any, Dict
from mashumaro.mixins.json import DataClassJSONMixin

@dataclass
class DataClass(DataClassJSONMixin):
    user: str
    password: str

    def __post_serialize__(self, d: Dict[Any, Any]) -> Dict[Any, Any]:
        d.pop('password')
        return d

obj = DataClass(user="name", password="secret")
print(obj.to_dict())  # {"user": "name"}
print(obj.to_json())  # '{"user": "name"}'

JSON Schema

You can build JSON Schema not only for dataclasses but also for any other supported data types. There is support for the following standards: Draft 2020-12 and OpenAPI Specification 3.1.0.

Building JSON Schema

For simple one-time cases, it's recommended to start with the configurable build_json_schema function. It returns a JSONSchema object that can be serialized to JSON or to a dict:

from dataclasses import dataclass
from typing import List
from uuid import UUID

from mashumaro.jsonschema import build_json_schema


@dataclass
class User:
    id: UUID
    name: str


print(build_json_schema(List[User]).to_json())

Result:

{
    "type": "array",
    "items": {
        "type": "object",
        "title": "User",
        "properties": {
            "id": {
                "type": "string",
                "format": "uuid"
            },
            "name": {
                "type": "string"
            }
        },
        "additionalProperties": false,
        "required": [
            "id",
            "name"
        ]
    }
}

Additional validation keywords (see below) can be added using annotations:

from typing import Annotated, List
from mashumaro.jsonschema import build_json_schema
from mashumaro.jsonschema.annotations import Maximum, MaxItems

print(
    build_json_schema(
        Annotated[
            List[Annotated[int, Maximum(42)]],
            MaxItems(4)
        ]
    ).to_json()
)

Result:

{
    "type": "array",
    "items": {
        "type": "integer",
        "maximum": 42
    },
    "maxItems": 4
}

The $schema keyword can be added by setting with_dialect_uri to True:

print(build_json_schema(str, with_dialect_uri=True).to_json())

Result:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "string"
}

By default, the Draft 2020-12 dialect is used, but you can change it to another one by setting the dialect parameter:

from mashumaro.jsonschema import OPEN_API_3_1

print(
    build_json_schema(
        str, dialect=OPEN_API_3_1, with_dialect_uri=True
    ).to_json()
)

Result:

{
    "$schema": "https://spec.openapis.org/oas/3.1/dialect/base",
    "type": "string"
}

Dataclass JSON Schemas may or may not be placed in the definitions section, depending on the all_refs parameter, whose default value comes from the dialect used (False for Draft 2020-12, True for OpenAPI Specification 3.1.0):

print(build_json_schema(List[User], all_refs=True).to_json())

Result:

{
    "type": "array",
    "$defs": {
        "User": {
            "type": "object",
            "title": "User",
            "properties": {
                "id": {
                    "type": "string",
                    "format": "uuid"
                },
                "name": {
                    "type": "string"
                }
            },
            "additionalProperties": false,
            "required": [
                "id",
                "name"
            ]
        }
    },
    "items": {
        "$ref": "#/$defs/User"
    }
}

The definitions section can be omitted from the final document by setting with_definitions parameter to False:

print(
    build_json_schema(
        List[User], dialect=OPEN_API_3_1, with_definitions=False
    ).to_json()
)

Result:

{
    "type": "array",
    "items": {
        "$ref": "#/components/schemas/User"
    }
}

Reference prefix can be changed by using ref_prefix parameter:

print(
    build_json_schema(
        List[User],
        all_refs=True,
        with_definitions=False,
        ref_prefix="#/components/responses",
    ).to_json()
)

Result:

{
    "type": "array",
    "items": {
        "$ref": "#/components/responses/User"
    }
}

The omitted definitions can be found later in the Context object that you could have created and passed to the function, but it may be easier to use JSONSchemaBuilder for that. For example, you might find it handy to build an OpenAPI Specification step by step, passing your models to the builder and getting all the registered definitions later. This builder has reasonable defaults but can be customized if necessary.

from dataclasses import dataclass
from typing import List
from uuid import UUID

from mashumaro.jsonschema import JSONSchemaBuilder, OPEN_API_3_1

builder = JSONSchemaBuilder(OPEN_API_3_1)

@dataclass
class User:
    id: UUID
    name: str

@dataclass
class Device:
    id: UUID
    model: str

print(builder.build(List[User]).to_json())
print(builder.build(List[Device]).to_json())
print(builder.get_definitions().to_json())

Result:

{
    "type": "array",
    "items": {
        "$ref": "#/components/schemas/User"
    }
}
{
    "type": "array",
    "items": {
        "$ref": "#/components/schemas/Device"
    }
}
{
    "User": {
        "type": "object",
        "title": "User",
        "properties": {
            "id": {
                "type": "string",
                "format": "uuid"
            },
            "name": {
                "type": "string"
            }
        },
        "additionalProperties": false,
        "required": [
            "id",
            "name"
        ]
    },
    "Device": {
        "type": "object",
        "title": "Device",
        "properties": {
            "id": {
                "type": "string",
                "format": "uuid"
            },
            "model": {
                "type": "string"
            }
        },
        "additionalProperties": false,
        "required": [
            "id",
            "model"
        ]
    }
}

JSON Schema constraints

Apart from the required keywords, which are added automatically for certain data types, you're free to use additional validation keywords. They are represented by the corresponding classes in mashumaro.jsonschema.annotations:

Number constraints:

String constraints:

Array constraints:

Object constraints:

Extending JSON Schema

Using a Config class it is possible to override some parts of the schema. Currently, it works for dataclass fields via "properties" key:

from dataclasses import dataclass
from mashumaro.jsonschema import build_json_schema

@dataclass
class FooBar:
    foo: str
    bar: int

    class Config:
        json_schema = {
            "properties": {
                "foo": {
                    "type": "string",
                    "description": "bar"
                }
            }
        }

print(build_json_schema(FooBar).to_json())

Result:

{
    "type": "object",
    "title": "FooBar",
    "properties": {
        "foo": {
            "type": "string",
            "description": "bar"
        },
        "bar": {
            "type": "integer"
        }
    },
    "additionalProperties": false,
    "required": [
        "foo",
        "bar"
    ]
}

JSON Schema and custom serialization methods

Mashumaro provides different ways to override default serialization methods for dataclass fields or specific data types. In order for these overrides to be reflected in the schema, you need to make sure that the methods have annotations of the return value type.

from dataclasses import dataclass, field
from mashumaro.config import BaseConfig
from mashumaro.jsonschema import build_json_schema

def str_as_list(s: str) -> list[str]:
    return list(s)

def int_as_str(i: int) -> str:
    return str(i)

@dataclass
class FooBar:
    foo: str = field(metadata={"serialize": str_as_list})
    bar: int

    class Config(BaseConfig):
        serialization_strategy = {
            int: {
                "serialize": int_as_str
            }
        }

print(build_json_schema(FooBar).to_json())

Result:

{
    "type": "object",
    "title": "FooBar",
    "properties": {
        "foo": {
            "type": "array",
            "items": {
                "type": "string"
            }
        },
        "bar": {
            "type": "string"
        }
    },
    "additionalProperties": false,
    "required": [
        "foo",
        "bar"
    ]
}
