This article describes how to use Google Protocol Buffers (hereafter abbreviated as PB) in Python, covering the following topics:

  • Why use PB?
  • Install Google PB
  • Define a .proto file
  • Compile the .proto file
  • Examine the generated .py file
  • Serialization and deserialization
  • More complex messages
  • Dynamic compilation

Why use PB?

Protocol Buffers (PB) is a structured data exchange format developed by Google, and it is the standard write format of the Tencent Cloud Log Service. Before writing log data, you must therefore serialize the raw log data into a PB data stream and send it to the server through the API. Because manipulating the PB format directly in every client program is inconvenient, a PB conversion layer is usually added between the client and the log service.

Of course, the PB format also has its own advantages: it is simple and fast. For concrete test results, see Google's serialization benchmark analysis.

Install Google PB

To use PB in Python, you need to install the PB compiler, protoc, to compile your .proto files.

Download the latest Protobuf release package and install it. The version used here is 3.5.1.

wget https://github.com/google/protobuf/releases/download/v3.5.1/protobuf-all-3.5.1.tar.gz
tar xvfz protobuf-all-3.5.1.tar.gz
cd protobuf-3.5.1/
./configure --prefix=/usr
make
make check
make install

If all the check steps pass, the compilation succeeds.

Continue to install the Python module for Protobuf

cd ./python 
python setup.py build 
python setup.py test 
python setup.py install

Verify the protoc command

root@ubuntu:~# protoc --version
libprotoc 3.5.1

Protobuf's default installation prefix is /usr/local, and /usr/local/lib is not in Ubuntu's default LD_LIBRARY_PATH. If you did not specify --prefix=/usr at the configure step, the following error will appear:

protoc: error while loading shared libraries: libprotoc.so.8: cannot open shared object file: No such file or directory

You can resolve this with the ldconfig command; this "cannot find shared libraries" error is mentioned in the README of the installation package. Alternatively, you can reinstall Protobuf with --prefix=/usr.
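A minimal sketch of the two usual fixes, assuming the default /usr/local prefix (adjust the paths if your install differs):

```shell
# Permanent fix (needs root): register /usr/local/lib with the dynamic
# linker, then refresh the cache:
#   echo /usr/local/lib > /etc/ld.so.conf.d/protobuf.conf && ldconfig
# Session-only fix: point LD_LIBRARY_PATH at the library directory:
export LD_LIBRARY_PATH=/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
```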

Verify that Python modules are installed correctly

import google.protobuf

In the Python interpreter, if the import above does not fail, the module is installed correctly.

Define a .proto file

First we need to write a .proto file that defines the structured data our program needs to handle. In Protobuf terminology, structured data is called a Message. A .proto file is very similar to a data definition in Java or C++. The sample file cls.Log.proto is as follows:

syntax = "proto2";
package cls;
message Log
{
    optional uint64 time = 1; // UNIX Time Format
    required string topic_id = 2;
    required string content = 3;
}

The .proto file begins with a package declaration, which helps prevent naming conflicts between projects. In Python, packages are normally determined by directory structure, so the package defined in this .proto file has no effect on the generated Python code; the official advice is still to keep the declaration, mainly to prevent name collisions in the PB namespace. The package name is cls, and the file defines a message Log with three members, whose meanings are as follows:

Field      Type    Location  Required  Meaning
time       uint64  body      no        Log time; if unset, the time the server receives the request is used
topic_id   string  body      yes       ID of the topic the log is reported to
content    string  body      yes       Log content

It is a good habit to name your .proto files carefully, for example following the pattern packageName.MessageName.proto.

Compile the .proto file

Compile directly with the protoc compiler, specifying the source file path and the target file path:

SRC_DIR=/tmp/src_dir
DST_DIR=/tmp/dst_dir
protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/cls.Log.proto

Use the --python_out option to generate Python classes; use the --cpp_out option to generate C++ classes.

Examine the generated .py file

The file directories generated in the target folder are as follows:

root@ubuntu:/tmp/dst_dir# tree .
.
└── cls
    └── Log_pb2.py

1 directory, 1 file

The generated Log_pb2.py file contains the following (do not edit it):

# Generated by the protocol buffer compiler. DO NOT EDIT!
# source: cls.Log.proto

import sys
_b=sys.version_info[0] <3 and (lambda x:x) or (lambda x:x.encode('latin1'))
from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from google.protobuf import reflection as _reflection
from google.protobuf import symbol_database as _symbol_database
from google.protobuf import descriptor_pb2
# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()




DESCRIPTOR = _descriptor.FileDescriptor(
  name='cls.Log.proto',
  package='cls',
  syntax='proto2',
  serialized_pb=_b('\n\rcls.Log.proto\x12\x03\x63ls\"6\n\x03Log\x12\x0c\n\x04time\x18\x01 \x01(\x04\x12\x10\n\x08topic_id\x18\x02 \x02(\t\x12\x0f\n\x07\x63ontent\x18\x03 \x02(\t')
)




_LOG = _descriptor.Descriptor(
  name='Log',
  full_name='cls.Log',
  filename=None,
  file=DESCRIPTOR,
  containing_type=None,
  fields=[
    _descriptor.FieldDescriptor(
      name='time', full_name='cls.Log.time', index=0,
      number=1, type=4, cpp_type=4, label=1,
      has_default_value=False, default_value=0,
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      options=None, file=DESCRIPTOR),
    _descriptor.FieldDescriptor(
      name='topic_id', full_name='cls.Log.topic_id', index=1,
      number=2, type=9, cpp_type=9, label=2,
      has_default_value=False, default_value=_b("").decode('utf-8'),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      options=None, file=DESCRIPTOR),
    _descriptor.FieldDescriptor(
      name='content', full_name='cls.Log.content', index=2,
      number=3, type=9, cpp_type=9, label=2,
      has_default_value=False, default_value=_b("").decode('utf-8'),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      options=None, file=DESCRIPTOR),
  ],
  extensions=[
  ],
  nested_types=[],
  enum_types=[
  ],
  options=None,
  is_extendable=False,
  syntax='proto2',
  extension_ranges=[],
  oneofs=[
  ],
  serialized_start=22,
  serialized_end=76,
)

DESCRIPTOR.message_types_by_name['Log'] = _LOG
_sym_db.RegisterFileDescriptor(DESCRIPTOR)

Log = _reflection.GeneratedProtocolMessageType('Log', (_message.Message,), dict(
  DESCRIPTOR = _LOG,
  __module__ = 'cls.Log_pb2'
  # @@protoc_insertion_point(class_scope:cls.Log)
  ))
_sym_db.RegisterMessage(Log)


# @@protoc_insertion_point(module_scope)


A walkthrough of the generated Python source is set aside for now; see the references for more information.

Serialization and deserialization

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
Created on 1/30/18 4:23 PM
@author: Chen Liang
@function: pb test
"""

import sys

reload(sys)  # Python 2 only; these two lines are unnecessary on Python 3
sys.setdefaultencoding('utf-8')
import Log_pb2
import json


def serialize_to_string(msg_obj):
    ret_str = msg_obj.SerializeToString()
    return ret_str


def parse_from_string(s):
    log = Log_pb2.Log()
    log.ParseFromString(s)
    return log

if __name__ == '__main__':
    # serialize_to_string
    content_dict = {"live_id": "1239182389648923", "identify": "zxc_unique"}
    tencent_log = Log_pb2.Log()
    tencent_log.time = 1510109254
    tencent_log.topic_id = "John Doe"
    tencent_log.content = json.dumps(content_dict)
    ret_s = serialize_to_string(tencent_log)
    print(type(ret_s))
    print(ret_s)

    # parse_from_string
    log_obj = parse_from_string(ret_s)
    print(log_obj)


The key operations are writing and reading the fields of the Message object, plus the serialization function SerializeToString and the deserialization function ParseFromString.
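To demystify what SerializeToString produces, here is a stdlib-only sketch that encodes the cls.Log message above by hand using the proto2 wire format (each field is prefixed with a tag byte, (field_number << 3) | wire_type, and varints are little-endian 7-bit groups). This is an illustration of the encoding, not a replacement for the generated class:

```python
def encode_varint(n):
    """Encode a non-negative int as a protobuf varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_field(field_number, wire_type, payload):
    """Prefix a payload with its tag: (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type) + payload

def encode_log(time=None, topic_id=b"", content=b""):
    """Serialize cls.Log by hand: time = field 1 (varint),
    topic_id = field 2 and content = field 3 (length-delimited)."""
    buf = b""
    if time is not None:
        buf += encode_field(1, 0, encode_varint(time))  # wire type 0: varint
    buf += encode_field(2, 2, encode_varint(len(topic_id)) + topic_id)
    buf += encode_field(3, 2, encode_varint(len(content)) + content)
    return buf

raw = encode_log(time=1510109254, topic_id=b"John Doe", content=b"{}")
print(raw)
```

For the fields used in the script above, this produces byte-for-byte the same stream as tencent_log.SerializeToString().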

More complex messages

So far, we have only given a simple log-upload example. In practice, you often need to define more complex messages. By "complex" we mean not just more fields or more field types, but more complex data structures:

  • The Message nested
  • Import Message

Each is introduced below.

The Message nested

Nesting is a powerful concept; once messages can nest, their expressiveness grows considerably. A concrete example of a nested Message follows:

message Person { 
 required string name = 1; 
 required int32 id = 2;        // Unique ID number for this person. 
 optional string email = 3; 
 
 enum PhoneType { 
   MOBILE = 0; 
   HOME = 1; 
   WORK = 2; 
 } 
 
 message PhoneNumber { 
   required string number = 1; 
   optional PhoneType type = 2 [default = HOME]; 
 } 
 repeated PhoneNumber phone = 4; 
}

In Message Person, the nested Message PhoneNumber is defined and used to define the phone field in the Person Message. This makes it possible to define more complex data structures.
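On the wire, nesting is implemented very simply: the nested message is serialized first, and the result is embedded as an ordinary length-delimited field of the outer message. A stdlib-only sketch of that idea, using the field numbers from the Person example (all values here are < 128, so every varint is a single byte):

```python
# PhoneNumber.number (field 1, length-delimited): tag (1<<3)|2 = 0x0A,
# then the string length, then the bytes of "555".
number = b"\x0a" + b"\x03" + b"555"
# PhoneNumber.type (field 2, varint): tag (2<<3)|0 = 0x10, HOME = 1.
ptype = b"\x10\x01"
phone_payload = number + ptype

# Person.phone (field 4, length-delimited): tag (4<<3)|2 = 0x22,
# then the length of the nested payload, then the payload itself.
person_phone_field = bytes([0x22, len(phone_payload)]) + phone_payload
print(person_phone_field.hex())
```

Because a nested message is just bytes behind a length prefix, a decoder that does not know the schema can still skip over it cleanly.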

Import Message

Within one .proto file, you can also use the import keyword to bring in Messages defined in other .proto files; these can be called imported Messages or dependency Messages. A concrete example follows:

import "common.header.proto";

message youMsg {
  required common.info_header header = 1;
  required string youPrivateData = 2;
}

Here info_header is a Message defined in the common.header package.

The main purpose of Import Message is to provide a convenient code management mechanism, similar to header files in C. You can define some common messages in a package, then import that package in another.proto file and use the Message definitions in it.

The Google Protocol Buffer does a great job of supporting nesting and importing messages, making defining complex data structures a breeze.

Dynamic compilation

Typically, a Protobuf user writes a .proto file and then uses the Protobuf compiler to generate the source code files for the target language, which are compiled together with the application.

However, in some cases you cannot know the .proto file in advance and need to handle unknown .proto files dynamically. A generic message-forwarding middleware, for example, cannot predict which messages it will need to process. This requires compiling the .proto file dynamically and using the Messages defined in it.
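As a sketch of why such generic handling is possible at all: the wire format is regular enough that a forwarder can split any serialized message into tagged fields without the .proto definition. A minimal stdlib-only parser illustrating the idea (varint and length-delimited wire types only; this is not a full implementation):

```python
def read_varint(buf, i):
    """Read a varint starting at buf[i]; return (value, next_index)."""
    value, shift = 0, 0
    while True:
        byte = buf[i]
        i += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i
        shift += 7

def decode_fields(buf):
    """Split a serialized message into (field_number, wire_type, value)
    triples with no knowledge of the .proto schema."""
    i, fields = 0, []
    while i < len(buf):
        tag, i = read_varint(buf, i)
        field_number, wire_type = tag >> 3, tag & 7
        if wire_type == 0:            # varint
            value, i = read_varint(buf, i)
        elif wire_type == 2:          # length-delimited: string, bytes, nested message
            length, i = read_varint(buf, i)
            value, i = buf[i:i + length], i + length
        else:
            raise NotImplementedError("wire type %d" % wire_type)
        fields.append((field_number, wire_type, value))
    return fields

# A cls.Log message serialized with time=1, topic_id="a", content="b":
print(decode_fields(b"\x08\x01\x12\x01a\x1a\x01b"))
```

To recover typed field names rather than raw (number, wire type, value) triples, you still need the schema, which is what the dynamic-compilation techniques below provide.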

For details, see The Usage and mechanism of Google Protocol Buffer


Reference:

  1. Developers.google.com/protocol-bu…
  2. Developers.google.com/protocol-bu…
  3. Hzy3774.iteye.com/blog/232342…
  4. Github.com/google/prot…
  5. Github.com/google/prot…
  6. Blog.csdn.net/losophy/art…
  7. www.ibm.com/developerwo…
  8. Github.com/google/prot…
  9. Github.com/google/prot…
  10. Python Google Protocol Buffer: developers.google.com/protocol-bu…