This article describes how to use Google Protocol Buffers (hereafter abbreviated as PB) in Python. It covers the following topics:
- Why use PB?
- Install Google PB
- Define a .proto file
- Compile the .proto file
- Examine the generated py file
- Serialization and deserialization
- More complex messages
- Dynamic compilation
Why use PB?
Protocol Buffers (PB) is a structured data interchange format developed by Google, and it is the standard write format of the Tencent Cloud log service. Before writing log data, you therefore need to serialize the raw log data into a PB byte stream and send it to the server through the API. However, handling the PB format directly in every client-side program is cumbersome, so a PB conversion layer is needed between the clients and the log service.
PB also has advantages of its own, chiefly simplicity and speed. For concrete test results, see Google's serialization benchmark analysis.
Install Google PB
To use PB in Python, you need to install the PB compiler, protoc, which compiles your .proto files.
Download the latest Protobuf release package and install it. At the time of writing, the current version is 3.5.1:
wget https://github.com/google/protobuf/releases/download/v3.5.1/protobuf-all-3.5.1.tar.gz
tar xvfz protobuf-all-3.5.1.tar.gz
cd protobuf-3.5.1/
./configure --prefix=/usr
make
make check
make install
If every check step passes, the build succeeded.
Next, install the Python module for Protobuf:
cd ./python
python setup.py build
python setup.py test
python setup.py install
Verify the protoc command
root@ubuntu:~# protoc --version
libprotoc 3.5.1
Protobuf installs to /usr/local by default, and /usr/local/lib is not in the default LD_LIBRARY_PATH on Ubuntu. If the install prefix is not set to /usr when running configure on Ubuntu, the following error appears:
protoc: error while loading shared libraries: libprotoc.so.8: cannot open shared object file: No such file or directory
You can resolve the problem by running the ldconfig command. This shared-library error is also mentioned in the README of the installation package. Alternatively, reinstall with the prefix set to /usr.
Verify that Python modules are installed correctly
import google.protobuf
In the Python interpreter, if the preceding import does not fail, the installation is normal.
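The same check can be scripted, for example as part of a deployment check. This is a minimal sketch that assumes only that the installed package exposes google.protobuf.__version__; the helper name protobuf_version is made up for illustration.

```python
def protobuf_version():
    """Return the installed protobuf runtime version, or None if it is missing."""
    try:
        import google.protobuf
    except ImportError:
        # Covers both a missing "google" namespace and a missing protobuf package.
        return None
    return google.protobuf.__version__


version = protobuf_version()
if version is None:
    print("google.protobuf is not installed")
else:
    print("google.protobuf version:", version)
```

Comparing this value with the output of protoc --version is a quick way to spot a mismatch between the compiler and the Python runtime.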
Define a .proto file
First we need to write a .proto file that defines the structured data our program will process. In Protobuf terminology, a piece of structured data is called a Message. A .proto file looks very similar to a data definition in Java or C++. The sample file cls.Log.proto is as follows:
syntax = "proto2";
package cls;
message Log
{
optional uint64 time = 1; // UNIX Time Format
required string topic_id = 2;
required string content = 3;
}
The .proto file declares a package near the top, which helps prevent naming conflicts between projects. In Python, packages are normally determined by the directory structure, so the package declared in this .proto file has no effect on the generated Python code. The official advice is nevertheless to keep the declaration, mainly to prevent name collisions in the Protocol Buffers namespace. Here the package name is cls, and the file defines a message named Log with three fields:
Field name | Type | Location | Required | Description |
---|---|---|---|---|
time | uint64 | body | no | Log time. If not specified, the time at which the server receives the request is used |
topic_id | string | body | yes | ID of the log topic to report to |
content | string | body | yes | Log content |
A good habit is to name .proto files carefully, for example with the rule packageName.MessageName.proto.
Compile the .proto file
Compile the file directly with protoc, specifying the source directory and the destination directory:
SRC_DIR=/tmp/src_dir
DST_DIR=/tmp/dst_dir
protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/cls.Log.proto
Use the --python_out option to generate Python classes, or the --cpp_out option to generate C++ classes.
Examine the generated py file
The file directories generated in the target folder are as follows:
root@ubuntu:/tmp/dst_dir# tree
.
└── cls
    └── Log_pb2.py
1 directory, 1 file
The Log_pb2.py file contains the following (do not edit it):
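protoc derives the output path from the .proto file name, which is why cls.Log.proto ends up as Log_pb2.py inside a cls directory (the generated code also records the module name as cls.Log_pb2). Here is a small sketch of that mapping as I understand protoc's Python generator; the helper name is hypothetical, and the rule is worth confirming against your protoc version.

```python
def generated_python_path(proto_file):
    """Map a .proto path (relative to the -I directory) to the .py path protoc writes."""
    suffix = ".proto"
    stem = proto_file[:-len(suffix)] if proto_file.endswith(suffix) else proto_file
    # Dashes are not valid in module names; slashes become package dots.
    module = stem.replace("-", "_").replace("/", ".") + "_pb2"
    # Every dot in the module name then becomes a directory separator.
    return module.replace(".", "/") + ".py"


print(generated_python_path("cls.Log.proto"))  # cls/Log_pb2.py
```

A practical consequence is that a dot in a .proto file name creates a subdirectory, so the import path of the generated module depends on how the file is named.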
# Generated by the protocol buffer compiler. DO NOT EDIT!
# source: cls.Log.proto
import sys
_b=sys.version_info[0] <3 and (lambda x:x) or (lambda x:x.encode('latin1'))
from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from google.protobuf import reflection as _reflection
from google.protobuf import symbol_database as _symbol_database
from google.protobuf import descriptor_pb2
# @@protoc_insertion_point(imports)
_sym_db = _symbol_database.Default()
DESCRIPTOR = _descriptor.FileDescriptor(
  name='cls.Log.proto',
  package='cls',
  syntax='proto2',
  serialized_pb=_b('\n\rcls.Log.proto\x12\x03\x63ls\"6\n\x03Log\x12\x0c\n\x04time\x18\x01 \x01(\x04\x12\x10\n\x08topic_id\x18\x02 \x02(\t\x12\x0f\n\x07\x63ontent\x18\x03 \x02(\t')
)
_LOG = _descriptor.Descriptor(
  name='Log',
  full_name='cls.Log',
  filename=None,
  file=DESCRIPTOR,
  containing_type=None,
  fields=[
    _descriptor.FieldDescriptor(
      name='time', full_name='cls.Log.time', index=0,
      number=1, type=4, cpp_type=4, label=1,
      has_default_value=False, default_value=0,
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      options=None, file=DESCRIPTOR),
    _descriptor.FieldDescriptor(
      name='topic_id', full_name='cls.Log.topic_id', index=1,
      number=2, type=9, cpp_type=9, label=2,
      has_default_value=False, default_value=_b("").decode('utf-8'),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      options=None, file=DESCRIPTOR),
    _descriptor.FieldDescriptor(
      name='content', full_name='cls.Log.content', index=2,
      number=3, type=9, cpp_type=9, label=2,
      has_default_value=False, default_value=_b("").decode('utf-8'),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      options=None, file=DESCRIPTOR),
  ],
  extensions=[
  ],
  nested_types=[],
  enum_types=[
  ],
  options=None,
  is_extendable=False,
  syntax='proto2',
  extension_ranges=[],
  oneofs=[
  ],
  serialized_start=22,
  serialized_end=76,
)
DESCRIPTOR.message_types_by_name['Log'] = _LOG
_sym_db.RegisterFileDescriptor(DESCRIPTOR)
Log = _reflection.GeneratedProtocolMessageType('Log', (_message.Message,), dict(
  DESCRIPTOR = _LOG,
  __module__ = 'cls.Log_pb2'
  # @@protoc_insertion_point(class_scope:cls.Log)
  ))
_sym_db.RegisterMessage(Log)
# @@protoc_insertion_point(module_scope)
A detailed walkthrough of the py file generated by PB is set aside for now. Please refer to the references listed at the end of this article.
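The numeric type, cpp_type, and label values in the generated descriptors are codes defined in protobuf's descriptor.proto. The sketch below is a small lookup for the codes that occur in this file, used to print the fields back in .proto notation. The tables cover only those values, and the helper name describe is made up for illustration.

```python
# Codes from FieldDescriptorProto in descriptor.proto (only those used by cls.Log).
FIELD_TYPES = {4: "uint64", 9: "string"}
FIELD_LABELS = {1: "optional", 2: "required", 3: "repeated"}

# (name, number, type, label) copied from the generated _LOG descriptor.
LOG_FIELDS = [
    ("time", 1, 4, 1),
    ("topic_id", 2, 9, 2),
    ("content", 3, 9, 2),
]


def describe(field):
    """Render one descriptor tuple as a .proto field declaration."""
    name, number, type_code, label_code = field
    return "%s %s %s = %d;" % (
        FIELD_LABELS[label_code], FIELD_TYPES[type_code], name, number)


for field in LOG_FIELDS:
    print(describe(field))
```

The printed lines match the three field declarations in cls.Log.proto, which shows that the descriptor is just a machine-readable copy of the message definition.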
Serialization and deserialization
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Created on 1/30/18 4:23 PM
@author: Chen Liang
@function: pb test
"""
import json
# Generated by protoc; run this script from the directory that contains Log_pb2.py.
import Log_pb2
def serialize_to_string(msg_obj):
    # Serialize the message into its binary wire format.
    ret_str = msg_obj.SerializeToString()
    return ret_str
def parse_from_string(s):
    # Rebuild a Log message from its serialized bytes.
    log = Log_pb2.Log()
    log.ParseFromString(s)
    return log
if __name__ == '__main__':
    # serialize_to_string
    content_dict = {"live_id": "1239182389648923", "identify": "zxc_unique"}
    tencent_log = Log_pb2.Log()
    tencent_log.time = 1510109254
    tencent_log.topic_id = "John Doe"
    tencent_log.content = json.dumps(content_dict)
    ret_s = serialize_to_string(tencent_log)
    print(type(ret_s))
    print(ret_s)
    # parse_from_string
    log_obj = parse_from_string(ret_s)
    print(log_obj)
The key operations are writing and reading the fields of the Message object, the serialization function SerializeToString, and the deserialization function ParseFromString.
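To make the serialized bytes less opaque, the following sketch hand-encodes a Log message using the protobuf wire format (varints for integers, length-delimited encoding for strings). It is not how the protobuf library is implemented, it handles only the three fields of cls.Log, and the helper names are made up for illustration. For the same field values, the result should match what SerializeToString returns.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number, wire_type):
    """A field key is (field_number << 3) | wire_type, itself a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_string(field_number, text):
    """Wire type 2: tag, then byte length, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_tag(field_number, 2) + encode_varint(len(data)) + data


def encode_log(time, topic_id, content):
    """Encode cls.Log fields in field-number order."""
    return (
        encode_tag(1, 0) + encode_varint(time)  # wire type 0: varint
        + encode_string(2, topic_id)
        + encode_string(3, content)
    )


print(encode_log(1, "t", "c"))  # b'\x08\x01\x12\x01t\x1a\x01c'
```

The tag bytes 0x08, 0x12, and 0x1a correspond to fields 1, 2, and 3, which is why field numbers, not field names, are what must stay stable between writers and readers.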
More complex messages
So far we have only shown a simple example of uploading a log. In practice, you often need to define more complex messages. By “complex” we mean not just more fields or more field types, but more complex data structures:
- Nested Message
- Import Message
Each is described below.
Nested Message
Nesting is a powerful concept: once messages can be nested, they become far more expressive. A concrete example of a nested Message follows:
message Person {
  required string name = 1;
  required int32 id = 2; // Unique ID number for this person.
  optional string email = 3;
  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }
  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }
  repeated PhoneNumber phone = 4;
}
In the Person message, the nested message PhoneNumber is defined and then used as the type of the repeated phone field. This makes it possible to define more complex data structures.
Import Message
In a .proto file, you can also use the import keyword to bring in messages defined in other .proto files. Such a message can be called an Import Message or a Dependency Message. A concrete example follows:
import "common/header.proto"; // import takes a quoted .proto path; this path is illustrative
message youMsg {
  required common.info_header header = 1;
  required string youPrivateData = 2;
}
Here, info_header is defined in the imported .proto file, under the common package.
The main purpose of Import Message is to provide a convenient code management mechanism, similar to header files in C. You can define common messages in one package, then import that package in another .proto file and use the Message definitions in it.
Protocol Buffers supports nested and imported messages well, which makes defining complex data structures straightforward.
Dynamic compilation
Typically, people using Protobuf write a .proto file and then use the Protobuf compiler to generate source files for the target language. The generated code is then built and shipped together with the application.
In some cases, however, the .proto files cannot be known in advance, and unknown .proto files must be handled dynamically. A generic message-forwarding middleware, for example, cannot predict which messages it will need to process. This requires compiling the .proto file dynamically and using the Message defined in it.
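In Python, one lightweight way to approximate this is to run protoc at run time and import whatever module it generates. The sketch below assumes that protoc is on PATH and that the google.protobuf package is installed. The function names are hypothetical, and this is a simple alternative to descriptor-based dynamic loading, not a reproduction of it.

```python
import importlib.util
import shutil
import subprocess
import tempfile
from pathlib import Path


def load_module_from_path(path):
    """Import a Python source file by path and return the module object."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def compile_and_load(proto_file, src_dir):
    """Compile proto_file with protoc and import the generated *_pb2 modules."""
    protoc = shutil.which("protoc")
    if protoc is None:
        raise RuntimeError("protoc was not found on PATH")
    out_dir = tempfile.mkdtemp(prefix="pb_dynamic_")
    subprocess.run(
        [protoc, "-I=" + str(src_dir), "--python_out=" + out_dir, str(proto_file)],
        check=True,  # raise CalledProcessError if compilation fails
    )
    generated = sorted(Path(out_dir).rglob("*_pb2.py"))
    return [load_module_from_path(p) for p in generated]


# Example, assuming the file from the compile step exists:
# modules = compile_and_load("/tmp/src_dir/cls.Log.proto", "/tmp/src_dir")
# log = modules[0].Log()
```

This approach is only a sketch: .proto files that import other files would also need their generated modules to be importable, and loading the same definition twice into one process may conflict in protobuf's default descriptor pool.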
For details, see the article “The Usage and Mechanism of Google Protocol Buffer”.
References:
- developers.google.com/protocol-bu…
- developers.google.com/protocol-bu…
- hzy3774.iteye.com/blog/232342…
- github.com/google/prot…
- github.com/google/prot…
- blog.csdn.net/losophy/art…
- www.ibm.com/developerwo…
- github.com/google/prot…
- github.com/google/prot…
- Python Google Protocol Buffer: developers.google.com/protocol-bu…