This article is participating in Python Theme Month. See the link to the event for more details

Rest cannot be enjoyed by lazy people~

Preface

The more you use JSON, the more likely you are to hit a bottleneck in JSON encoding or decoding. Python's built-in json library is good, but several faster JSON libraries are available. Which one to choose depends on the situation, and there is no standard rule for deciding which is the best or the fastest, because different projects have different needs: some prioritize security and some prioritize speed. This article introduces one of the faster options, orjson.

orjson overview

orjson is a fast Python JSON library. It natively serializes dataclass, datetime, numpy, and UUID instances, and it benchmarks faster than both the standard json library and rapidjson. Unlike json and rapidjson, it serializes to bytes rather than str, and serialization does not escape Unicode to ASCII. orjson does not provide dump/load functions for writing to or reading from file-like objects. See the official documentation for more information on using orjson.

json vs. rapidjson vs. orjson speed comparison

Below is the benchmark code for json, rapidjson, and orjson, followed by the results.

import json
import orjson
import rapidjson
import time

m = {
    "timestamp": 1556283673.1523004,
    "task_uuid": "0ed1a1c3-050c-4fb9-9426-a7e72d0acfc7",
    "task_level": [1, 2, 1],
    "action_status": "started",
    "action_type": "main",
    "key": "value",
    "another_key": 123,
    "and_another": ["a", "b"],
}


def benchmark(module_name, dumps):
    start = time.time()
    for i in range(100000):
        dumps(m)
    print(module_name, time.time() - start)


benchmark('json', json.dumps)
benchmark('rapidjson', rapidjson.dumps)
benchmark("orjson", lambda s: str(orjson.dumps(s), "utf-8"))  # orjson can only output bytes

orjson is the fastest, even with the extra step of decoding its bytes output into a str.

json 1.1019978523254395
rapidjson 0.25800156593322754
orjson 0.0859987735748291
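The timings above come from a single pass measured with time.time(), so they can vary noticeably between runs. As a hedged sketch of a more repeatable measurement, the snippet below uses the standard library's timeit.repeat and reports the best of several runs. It only times json.dumps so that it runs without third-party packages; the `number` and `repeat` values are arbitrary choices, and rapidjson.dumps or an orjson wrapper could be passed to the same function.

```python
import json
import timeit

m = {
    "timestamp": 1556283673.1523004,
    "task_uuid": "0ed1a1c3-050c-4fb9-9426-a7e72d0acfc7",
    "task_level": [1, 2, 1],
    "action_status": "started",
    "action_type": "main",
    "key": "value",
    "another_key": 123,
    "and_another": ["a", "b"],
}


def benchmark(module_name, dumps, number=20_000, repeat=3):
    """Time `number` calls of dumps(m), `repeat` times, and keep the best run."""
    timings = timeit.repeat(lambda: dumps(m), number=number, repeat=repeat)
    best = min(timings)  # the minimum is the run least disturbed by other load
    print(f"{module_name}: {best:.4f}s for {number} calls")
    return best


best_json = benchmark("json", json.dumps)
```

Taking the minimum rather than the mean is a common convention for micro-benchmarks, since slower runs usually reflect interference from the rest of the system rather than the code under test.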

Basic use of orjson

Installing orjson

The installation command is very simple: pip install orjson

Note that installing with pip on Linux requires pip 19.3 or later, so update pip before installing orjson if needed.

Basic usage of orjson

Serialization – dumps

The biggest difference from Python's standard json library is that orjson.dumps returns bytes while json.dumps returns str. The sort_keys argument is replaced by option=orjson.OPT_SORT_KEYS, and indent is replaced by option=orjson.OPT_INDENT_2; no other indent levels are supported.

def dumps(
    __obj: Any,
    default: Optional[Callable[[Any], Any]] = ...,
    option: Optional[int] = ...,
) -> bytes: ...

# serialization
import orjson

m = {
    "timestamp": 1556283673.1523004,
    "task_uuid": "0ed1a1c3-050c-4fb9-9426-a7e72d0acfc7",
    "task_level": [1, 2, 1],
    "action_status": "started",
    "action_type": "哈哈",
    "key": "value",
    "another_key": 123,
    "and_another": ["a", "b"],
}


res = orjson.dumps(m)
print(res)  
# b'{"timestamp":1556283673.1523004,"task_uuid":"0ed1a1c3-050c-4fb9-9426-a7e72d0acfc7","task_level":[1,2,1],"action_status":"started","action_type":"\xe5\x93\x88\xe5\x93\x88","key":"value","another_key":123,"and_another":["a","b"]}'

Deserialization – loads

The loads function deserializes JSON data, such as the bytes produced by dumps, into Python objects.

print(orjson.loads(res))

Serialization and deserialization of the float, int, and str types

float

orjson does not lose precision when serializing and deserializing double-precision floating-point numbers, and neither do json and rapidjson. orjson.dumps does not emit NaN, Infinity, or -Infinity, which are not valid JSON, and serializes them as null instead, whereas json and rapidjson output them as-is.

>>> import orjson, ujson, rapidjson, json
>>> orjson.dumps([float("NaN"), float("Infinity"), float("-Infinity")])
b'[null,null,null]'
>>> rapidjson.dumps([float("NaN"), float("Infinity"), float("-Infinity")])
'[NaN,Infinity,-Infinity]'
>>> json.dumps([float("NaN"), float("Infinity"), float("-Infinity")])
'[NaN, Infinity, -Infinity]'
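If the standard json library must produce the same null output for non-finite floats, one possible approach is to clean the data before serializing. The sketch below uses only the standard library; `replace_non_finite` is a hypothetical helper written for this example, not part of any of the libraries discussed.

```python
import json
import math


def replace_non_finite(value):
    """Return a copy of value with NaN and +/-Infinity replaced by None."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {k: replace_non_finite(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [replace_non_finite(v) for v in value]
    return value


data = [float("NaN"), float("Infinity"), float("-Infinity"), 1.5]

# With allow_nan=False, json refuses non-finite floats instead of emitting them.
try:
    json.dumps(data, allow_nan=False)
    strict_error = None
except ValueError as exc:
    strict_error = exc

cleaned = json.dumps(replace_non_finite(data), allow_nan=False)
print(cleaned)  # [null, null, null, 1.5]
```

Passing allow_nan=False is a useful guard on its own: it turns silently invalid JSON into an explicit ValueError.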

int

By default, orjson serializes and deserializes 64-bit integers, from the minimum signed value (-9223372036854775807) to the maximum unsigned value (18446744073709551615). Some implementations, such as web browsers, support only 53-bit integers. For those cases, passing option=orjson.OPT_STRICT_INTEGER makes dumps raise a JSONEncodeError for values outside the 53-bit range.

>>> import orjson
>>> orjson.dumps(9007199254740992)
b'9007199254740992'
>>> orjson.dumps(9007199254740992, option=orjson.OPT_STRICT_INTEGER)
JSONEncodeError: Integer exceeds 53-bit range
>>> orjson.dumps(-9007199254740992, option=orjson.OPT_STRICT_INTEGER)
JSONEncodeError: Integer exceeds 53-bit range
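The 53-bit limit comes from double-precision floats, which is how JavaScript stores numbers: beyond 2**53 - 1, neighbouring integers can no longer be told apart. The standard-library sketch below illustrates this and shows a pre-check that could be run before sending data to such a consumer. The helper name `fits_in_53_bits` is hypothetical, and its bounds are an assumption chosen to match the values rejected in the example above.

```python
MAX_SAFE_INTEGER = 2**53 - 1  # 9007199254740991


def fits_in_53_bits(n: int) -> bool:
    """Check whether n survives a round trip through a double-precision float."""
    return -MAX_SAFE_INTEGER <= n <= MAX_SAFE_INTEGER


# Above the limit, two different integers map to the same float.
collision = float(2**53) == float(2**53 + 1)

print(fits_in_53_bits(9007199254740991))  # True
print(fits_in_53_bits(9007199254740992))  # False
print(collision)  # True
```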

str

orjson is strict about UTF-8 conformance, more so than Python's standard json library, which will serialize and deserialize UTF-16 surrogates even though they are not valid UTF-8. orjson.JSONEncodeError is raised if orjson.dumps is given a str that is not valid UTF-8, and orjson.JSONDecodeError is raised if orjson.loads receives invalid UTF-8.

Unlike Python's standard json library, orjson and rapidjson consistently raise an exception for input that does not conform to the rules.

For robustness, you can decode the bytes as UTF-8 with the "replace" error handler before deserializing, so that invalid sequences are substituted instead of causing an exception.

>>> import orjson
>>> orjson.loads(b'"\xed\xa0\x80"')
JSONDecodeError: str is not valid UTF-8: surrogates not allowed
>>> orjson.loads(b'"\xed\xa0\x80"'.decode("utf-8", "replace"))
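The difference in strictness can be observed with the standard library alone. This sketch shows that json.dumps accepts a lone surrogate that cannot be encoded as UTF-8, and that decoding invalid bytes with the "replace" handler substitutes U+FFFD so that parsing can proceed. It uses json.loads for the final step only to stay dependency-free.

```python
import json

lone_surrogate = "\ud800"

# The standard library escapes the surrogate rather than rejecting it.
escaped = json.dumps(lone_surrogate)

# The same string cannot be encoded as UTF-8 at all.
try:
    lone_surrogate.encode("utf-8")
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc

# Invalid UTF-8 bytes: each undecodable byte becomes U+FFFD with "replace".
raw = b'"\xed\xa0\x80"'
repaired = raw.decode("utf-8", "replace")
value = json.loads(repaired)

print(escaped)
print(repr(value))
```

The replacement is lossy: the original bytes cannot be recovered from the U+FFFD characters, so this is best treated as a fallback for tolerating bad input rather than a default.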

Conclusion

This article was first published on the WeChat public account Program Yuan Xiaozhuang and is also published on Nuggets.

Please credit the source when reposting. If you are passing by, please give it a like before you go (╹▽╹)