This article is part of Python Theme Month.

The 2020 openEuler College Developers Competition recently came to an end. Our team took on the project “Developing a Python client for openLooKeng”, reached the final, won second prize, and came away with a new understanding of the open source community. We were invited to introduce pyOpenLooKeng and share our competition experience.

What is openLooKeng?

openLooKeng, formerly known as Hetu, is a distributed, low-latency, reliable data engine. It provides a unified SQL interface and features such as high performance, cross-data-center/cloud queries, and data source extension, making interactive analysis of big data easier.

openLooKeng builds on Presto, Facebook’s open source distributed SQL engine, to provide interactive query analysis. On top of it, openLooKeng implements many additional features, such as dynamic filtering, bitmap indexing, multiple caches, and cross-DC connectors. These give openLooKeng better performance, scalability, and usability, and put the concept of “SQL on Everything” into practice.

How did we build pyOpenLooKeng?

This is our project's repository: gitee.com/openeuler20…

Project analysis

openLooKeng components communicate over REST: the CLI and the JDBC driver talk to the Coordinator this way, and so do the Coordinator and the Workers. The Python client can use REST as well: it wraps a statement into a REST request in the form openLooKeng expects and sends it to a Coordinator for execution. The statement API consists of the following requests:

1. Submit a query: POST /v1/statement

2. Poll for query information until the query is complete: GET /v1/statement/{queryId}

3. Cancel a query: DELETE /v1/statement/{queryId}
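The three requests above can be sketched as a small submit-and-poll loop. This is a minimal illustration, not pyOpenLooKeng's actual code: it assumes a requests.Session-like object, and it assumes the Presto-style X-Presto-User header and the JSON fields nextUri, data and error that openLooKeng inherits from Presto. The function names run_statement and cancel_statement are hypothetical.

```python
from typing import Any, Dict, List, Optional


def run_statement(session: Any, base_url: str, sql: str, user: str) -> List[list]:
    """Submit `sql` and follow nextUri links until the query is complete.

    `session` is any object with requests.Session-style post/get methods.
    The header name and JSON fields follow the Presto protocol (assumption).
    """
    headers = {"X-Presto-User": user}
    # Step 1: POST /v1/statement with the raw SQL text as the request body.
    resp = session.post(f"{base_url}/v1/statement",
                        data=sql.encode("utf-8"), headers=headers)
    resp.raise_for_status()
    payload: Dict[str, Any] = resp.json()

    rows: List[list] = []
    while True:
        if "error" in payload:
            raise RuntimeError(payload["error"].get("message", "query failed"))
        # Each response may carry a batch of result rows.
        rows.extend(payload.get("data", []))
        next_uri: Optional[str] = payload.get("nextUri")
        if next_uri is None:  # no nextUri means the query has finished
            return rows
        # Step 2: GET the server-supplied nextUri (a URL under /v1/statement/).
        resp = session.get(next_uri, headers=headers)
        resp.raise_for_status()
        payload = resp.json()


def cancel_statement(session: Any, next_uri: str, user: str) -> None:
    # Step 3: send DELETE to the query's current URI to cancel it
    # (Presto clients target the latest nextUri; assumed to apply here).
    session.delete(next_uri, headers={"X-Presto-User": user})
```

Against a real server you would pass requests.Session(); the loop design means results are streamed back in batches, and the client never needs to know the query ID in advance because the server hands out every follow-up URL.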

In addition, the client must support openLooKeng's encryption and authentication features: HTTPS encryption plus Basic and Kerberos authentication. As a module of openLooKeng, the Python client must also comply with openLooKeng's development specifications, be fully tested, be fully functional, and be able to access openLooKeng services.
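Of the two authentication schemes, HTTP Basic is simple enough to show directly: the client sends user:password, Base64-encoded, in an Authorization header (RFC 7617). The helper below is a hypothetical illustration of what ends up on the wire; in practice a library such as requests builds this header for you, and Kerberos (SPNEGO) needs a separate library entirely.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header defined by HTTP Basic auth (RFC 7617)."""
    # Credentials are joined as "user:password", UTF-8 encoded, then Base64 encoded.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Because Base64 is an encoding, not encryption, Basic credentials are only protected in transit when they travel over HTTPS, which is why the two features go together.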

Architecture

  • Auth: creates the request session, authenticated with HTTP Basic or Kerberos;

  • Err: contains the errors and warnings that may be raised during operation;

  • Common: base classes for shared DB-API logic;

  • Connections: manages openLooKeng connections, for example retrieving cursors and setting encryption and authentication information;

  • Cursor: represents a database cursor, used to manage the context of fetch operations.
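PEP 249 prescribes a standard exception hierarchy, which an error module like Err would typically mirror. The sketch below reproduces that hierarchy as the specification defines it; we have not checked pyOpenLooKeng's exact class names, so treat it as illustrative.

```python
class Warning(Exception):  # noqa: A001 - name mandated by PEP 249
    """Important warnings, such as data truncation while inserting."""


class Error(Exception):
    """Base class of all other DB-API error exceptions."""


class InterfaceError(Error):
    """Errors in the client interface rather than in the database."""


class DatabaseError(Error):
    """Errors related to the database itself."""


class DataError(DatabaseError):
    """Problems with processed data, e.g. a numeric value out of range."""


class OperationalError(DatabaseError):
    """Operational failures, e.g. an unexpected disconnect."""


class IntegrityError(DatabaseError):
    """Relational integrity violations."""


class InternalError(DatabaseError):
    """Internal database errors, e.g. a cursor that is no longer valid."""


class ProgrammingError(DatabaseError):
    """Programming errors, e.g. table not found or SQL syntax errors."""


class NotSupportedError(DatabaseError):
    """A method or database API that the backend does not support."""
```

Because every error derives from Error, application code can catch one base class regardless of which DB-API driver it runs on, which is part of what makes porting between databases cheap.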

Features

pyOpenLooKeng follows the PEP 249 (DB-API v2.0) database API specification. PEP 249 defines a consistent access interface across databases, so an application can be ported from, say, MySQL to openLooKeng with only a few code changes.

As with PyMySQL, you call cur.execute() to submit a query, use the fetch family of methods (fetchone(), fetchmany(), fetchall()) to retrieve the result set, and call cancel() to cancel the query.
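To show how the fetch family walks through a result set, here is a toy cursor over rows that have already been retrieved (in a real client they would come from the REST calls described earlier). It is a hypothetical sketch, not pyOpenLooKeng's Cursor: runner is an assumed callable that executes SQL and returns all rows, and since cancel() is an extension beyond PEP 249 itself, it is only stubbed here.

```python
from typing import Callable, List, Optional


class Cursor:
    """Toy PEP 249-style cursor; `runner` executes SQL and returns all rows."""

    def __init__(self, runner: Callable[[str], List[tuple]]):
        self._runner = runner
        self._rows: List[tuple] = []
        self._pos = 0
        self.arraysize = 1  # default batch size for fetchmany(), per PEP 249

    def execute(self, operation: str) -> None:
        # Run the statement and reset the read position.
        self._rows = list(self._runner(operation))
        self._pos = 0

    def fetchone(self) -> Optional[tuple]:
        if self._pos >= len(self._rows):
            return None  # PEP 249: None when no more data is available
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def fetchmany(self, size: Optional[int] = None) -> List[tuple]:
        size = self.arraysize if size is None else size
        batch = self._rows[self._pos:self._pos + size]
        self._pos += len(batch)
        return batch

    def fetchall(self) -> List[tuple]:
        batch = self._rows[self._pos:]
        self._pos = len(self._rows)
        return batch

    def cancel(self) -> None:
        # Hypothetical stub: a real client would send DELETE for the running query.
        self._rows, self._pos = [], 0
```

Each fetch call continues from where the previous one stopped, so mixing fetchone(), fetchmany() and fetchall() on one result set never returns a row twice.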

pyOpenLooKeng also provides functionality beyond the DB-API: through the Connection you can access cluster information, worker node information, query data, and stage data.
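These extras are plain REST GET requests against the coordinator. The sketch below assumes the Presto-derived endpoints /v1/cluster, /v1/node and /v1/query and a requests.Session-like object; the INFO_PATHS mapping and get_info function are hypothetical, so check openLooKeng's REST documentation before relying on these paths.

```python
from typing import Any

# Hypothetical mapping from helper names to coordinator REST paths
# (paths assumed from the Presto REST API that openLooKeng inherits).
INFO_PATHS = {
    "cluster": "/v1/cluster",  # cluster-wide statistics
    "nodes": "/v1/node",       # worker node information
    "queries": "/v1/query",    # summary information about queries
}


def get_info(session: Any, base_url: str, kind: str, user: str) -> Any:
    """GET one of the informational endpoints and return the decoded JSON."""
    resp = session.get(base_url + INFO_PATHS[kind],
                       headers={"X-Presto-User": user})
    resp.raise_for_status()
    return resp.json()
```

Keeping the paths in one table means a broken endpoint, like the stage interface mentioned below, can be fixed or disabled in a single place.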

Along the way, we discovered that openLooKeng's v1/stage REST interface was unusable, and we reported it back to the community as an issue: gitee.com/openlookeng…

pyOpenLooKeng also supports HTTPS encryption as well as Basic and Kerberos authentication.

A typical session looks like this:

```python
from pyopenLooKeng import connections, auth

# HTTP Basic authentication; Kerberos authentication can also be used
myAuth = auth.BasicAuthentication(username='user', password='password')

# Connect to the coordinator over HTTPS
conn = connections.connect(host='host', port=8080, catalog='system', schema='runtime',
                           protocol="https", https_verify="trust", auth=myAuth)

# Extra, non-DB-API helpers: cluster and query information
print(conn.cluster())
print(conn.query())
# ...

# Standard PEP 249 usage: execute, then fetch
cur = conn.cursor()
cur.execute("SHOW TABLES")
res = cur.fetchall()
print(res)
# (('nodes',), ('queries',), ('tasks',), ('transactions',))
```

For HTTPS we implemented two verification modes: ignoring certificate validation, or validating against the CA bundle shipped with the third-party Python library certifi. In the second mode, you can add openLooKeng's certificate to the file returned by certifi.where(), which makes pyOpenLooKeng both more secure and more convenient to use.
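With the requests library, both modes map onto the verify argument, which accepts either False (skip certificate validation) or a path to a CA bundle file. The helper below is a hypothetical sketch of that mapping; its boolean flag stands in for whatever option pyOpenLooKeng actually exposes (such as the https_verify argument in the example above), and certifi is imported only when its bundle path is actually needed.

```python
from typing import Optional, Union


def resolve_verify(check_cert: bool, cafile: Optional[str] = None) -> Union[bool, str]:
    """Return a value suitable for the `verify` argument of requests calls."""
    if not check_cert:
        return False  # ignore certificate validation (insecure; testing only)
    if cafile is not None:
        return cafile  # explicit CA bundle that contains openLooKeng's certificate
    # Third-party dependency, imported lazily so the other paths need no extra packages.
    import certifi

    # certifi.where() returns the path of certifi's bundled CA file.
    return certifi.where()
```

A call would then look like requests.get(url, verify=resolve_verify(True)). Passing a dedicated cafile is worth considering as an alternative to editing certifi's own bundle, because that file is replaced whenever certifi is upgraded.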

Our competition experience

This was our first time taking part in an open source community competition, and we learned a lot along the way.

  1. The open source community is very helpful. At the beginning, our installation got stuck waiting for the cluster to start. After several attempts we still could not solve it on our own, but with the community's help we finally succeeded. The second problem was the bug in the stage interface. At first I thought my environment was misconfigured, but after discussing it with the community we determined that the bug was in the system code itself. Under the guidance of Xu Dezhi from the community, I submitted an issue to openLooKeng.
  2. Making good use of Gitee tools such as issues, biweekly reports, and the wiki makes things much easier for users and helps developers manage their time.
  3. Learn from existing open source software. We were initially at a loss when faced with PEP-249; later we studied PyMySQL's implementation and got a lot of help from it.
  4. Open source is not just about publishing your code. It also means writing good documentation, following community norms, testing thoroughly, and, most importantly, maintaining the project over the long term.

In the future, our team will keep improving this project and contribute more to openLooKeng's development. We hope openLooKeng keeps getting better and better!