Fixed #30190 -- Added JSONL serializer.
aliva authored Jun 16, 2020
1 parent ea3beb4 commit e296376
Showing 6 changed files with 396 additions and 1 deletion.
1 change: 1 addition & 0 deletions AUTHORS
@@ -52,6 +52,7 @@ answer newbie questions, and generally made Django that much better:
Alex Robbins <[email protected]>
Alexey Boriskin <[email protected]>
Alexey Tsivunin <[email protected]>
Ali Vakilzade <[email protected]>
Aljosa Mohorovic <[email protected]>
Amit Chakradeo <https://amit.chakradeo.net/>
Amit Ramon <[email protected]>
1 change: 1 addition & 0 deletions django/core/serializers/__init__.py
@@ -28,6 +28,7 @@
    "python": "django.core.serializers.python",
    "json": "django.core.serializers.json",
    "yaml": "django.core.serializers.pyyaml",
    "jsonl": "django.core.serializers.jsonl",
}

_serializers = {}
57 changes: 57 additions & 0 deletions django/core/serializers/jsonl.py
@@ -0,0 +1,57 @@
"""
Serialize data to/from JSON Lines
"""

import json

from django.core.serializers.base import DeserializationError
from django.core.serializers.json import DjangoJSONEncoder
from django.core.serializers.python import (
    Deserializer as PythonDeserializer, Serializer as PythonSerializer,
)


class Serializer(PythonSerializer):
    """Convert a queryset to JSON Lines."""
    internal_use_only = False

    def _init_options(self):
        self._current = None
        self.json_kwargs = self.options.copy()
        self.json_kwargs.pop('stream', None)
        self.json_kwargs.pop('fields', None)
        self.json_kwargs.pop('indent', None)
        self.json_kwargs['separators'] = (',', ': ')
        self.json_kwargs.setdefault('cls', DjangoJSONEncoder)
        self.json_kwargs.setdefault('ensure_ascii', False)

    def start_serialization(self):
        self._init_options()

    def end_object(self, obj):
        # self._current has the field data
        json.dump(self.get_dump_object(obj), self.stream, **self.json_kwargs)
        self.stream.write("\n")
        self._current = None

    def getvalue(self):
        # Grandparent super
        return super(PythonSerializer, self).getvalue()


def Deserializer(stream_or_string, **options):
    """Deserialize a stream or string of JSON data."""
    if isinstance(stream_or_string, bytes):
        stream_or_string = stream_or_string.decode()
    if isinstance(stream_or_string, (bytes, str)):
        stream_or_string = stream_or_string.split("\n")

    for line in stream_or_string:
        if not line.strip():
            continue
        try:
            yield list(PythonDeserializer([json.loads(line), ], **options))[0]
        except (GeneratorExit, DeserializationError):
            raise
        except Exception as exc:
            raise DeserializationError() from exc
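
The input handling in ``Deserializer`` above can be tried without a Django
project. The following is only a minimal standard-library sketch of that
control flow, not the committed code: ``iter_json_lines`` and
``LineDecodeError`` are hypothetical names, and the sketch stops at JSON
decoding, whereas the real function also passes each decoded object through
``PythonDeserializer`` to build model instances.

```python
import io
import json


class LineDecodeError(Exception):
    """Hypothetical stand-in for Django's DeserializationError."""


def iter_json_lines(stream_or_string):
    """Yield one decoded JSON value per non-blank line."""
    # Bytes are decoded first, then strings are split into lines.
    if isinstance(stream_or_string, bytes):
        stream_or_string = stream_or_string.decode()
    if isinstance(stream_or_string, str):
        stream_or_string = stream_or_string.split("\n")

    # Anything else is assumed to be an iterable of lines, such as an open file.
    for line in stream_or_string:
        if not line.strip():
            continue  # blank lines are skipped
        try:
            yield json.loads(line)
        except ValueError as exc:
            # json.JSONDecodeError is a ValueError subclass.
            raise LineDecodeError(line) from exc


payload = b'{"pk": 1, "fields": {}}\n\n{"pk": 2, "fields": {}}\n'
from_bytes = list(iter_json_lines(payload))
from_stream = list(iter_json_lines(io.StringIO(payload.decode())))
```

Because the function is a generator, a file-like input is consumed one line
at a time, while bytes and strings are split up front.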
5 changes: 4 additions & 1 deletion docs/releases/3.2.txt
@@ -215,7 +215,10 @@ Security
Serialization
~~~~~~~~~~~~~

* ...
* The new :ref:`JSONL <serialization-formats-jsonl>` serializer allows using
  the JSON Lines format with :djadmin:`dumpdata` and :djadmin:`loaddata`. This
  can be useful for populating large databases because data is loaded line by
  line into memory, rather than being loaded all at once.

Signals
~~~~~~~
21 changes: 21 additions & 0 deletions docs/topics/serialization.txt
@@ -160,11 +160,14 @@ Identifier Information

``json``    Serializes to and from JSON_.

``jsonl``   Serializes to and from JSONL_.

``yaml``    Serializes to YAML (YAML Ain't a Markup Language). This
            serializer is only available if PyYAML_ is installed.
==========  ==============================================================

.. _json: https://json.org/
.. _jsonl: http://jsonlines.org/
.. _PyYAML: https://pyyaml.org/

XML
@@ -307,6 +310,24 @@ The JSON serializer uses ``DjangoJSONEncoder`` for encoding. A subclass of

.. _ecma-262: https://www.ecma-international.org/ecma-262/5.1/#sec-15.9.1.15

.. _serialization-formats-jsonl:

JSONL
-----

.. versionadded:: 3.2

*JSONL* stands for *JSON Lines*. With this format, objects are separated by new
lines, and each line contains a valid JSON object. JSONL serialized data look
like this::

    { "pk": "4b678b301dfd8a4e0dad910de3ae245b", "model": "sessions.session", "fields": { ... }}
    { "pk": "88bea72c02274f3c9bf1cb2bb8cee4fc", "model": "sessions.session", "fields": { ... }}
    { "pk": "9cf0e26691b64147a67e2a9f06ad7a53", "model": "sessions.session", "fields": { ... }}

JSONL can be useful for populating large databases, since the data can be
processed line by line, rather than being loaded into memory all at once.
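
As a rough illustration of the format, the sketch below writes one JSON object
per line using only the standard library. It borrows the ``separators`` and
``ensure_ascii`` values that the serializer in this commit passes to
``json.dump``. The records and the ``app.city`` model label are made up for the
example, and the real serializer also defaults to ``DjangoJSONEncoder``, which
is left out here.

```python
import io
import json

# Hypothetical records shaped like serialized model instances.
records = [
    {"pk": 1, "model": "app.city", "fields": {"name": "Zürich"}},
    {"pk": 2, "model": "app.city", "fields": {"name": "Tehran"}},
]

stream = io.StringIO()
for record in records:
    # One object per line; non-ASCII characters are written unescaped.
    json.dump(record, stream, separators=(",", ": "), ensure_ascii=False)
    stream.write("\n")

lines = stream.getvalue().splitlines()

# Each line is a complete JSON document and can be decoded on its own.
decoded = [json.loads(line) for line in lines]
```

Since every line decodes independently, a consumer never needs the whole file
in memory to process a single record.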

YAML
----
