- How to write your own Python Documentation Generator
- Cristian Medina
- The Nuggets translation Project
- Translator: Wang Zijian
- Proofreader: Zheaoli, Zhiwei Yu
When I first started learning Python, one of my favorite things was sitting in front of the interpreter and using the built-in help function to examine classes and methods and decide what to type next. This function takes an object, examines its members, and prints a description that reads like a manual page, so you can see how to use the object.
One of the great things about having help in the standard library is that its output comes directly from the code, which indirectly encourages a coding style suited to lazy people like me who don't want to spend extra time maintaining documentation, especially if you already choose straightforward names for your variables and functions. That style means adding docstrings to your functions and classes and marking private and protected members with an underscore prefix.

    Help on class list in module builtins:

    class list(object)
     |  list() -> new empty list
     |  list(iterable) -> new list initialized from iterable's items
     |
     |  Methods defined here:
     |
     |  __add__(self, value, /)
     |      Return self+value.
     ...
     |  __iter__(self, /)
     |      Implement iter(self).
     ...
     |  append(...)
     |      L.append(object) -> None -- append object to end
     |
     |  extend(...)
     |      L.extend(iterable) -> None -- extend list by appending elements from the iterable
     |
     |  index(...)
     |      L.index(value, [start, [stop]]) -> integer -- return first index of value.
     |      Raises ValueError if the value is not present.
     ...
     |  pop(...)
     |      L.pop([index]) -> item -- remove and return item at index (default last).
     |      Raises IndexError if list is empty or index is out of range.
     |
     |  remove(...)
     |      L.remove(value) -> None -- remove first occurrence of value.
     |      Raises ValueError if the value is not present.
     ...
     |  ----------------------------------------------------------------------
     |  Data and other attributes defined here:
     |
     |  __hash__ = None

Running help(list) in the Python interpreter prints the output above.
In fact, the help function uses the pydoc module to generate its output. pydoc can also be run from the command line to generate text or HTML documentation for any importable module.
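As a rough illustration of that relationship, the text that help prints can also be obtained as a string through pydoc. This is only a minimal sketch: it uses pydoc.render_doc with the plaintext renderer, and the variable names are just for the example.

```python
import pydoc

# Render the documentation for the built-in list type as plain text.
# pydoc.plaintext avoids the overstrike characters used for bold in terminals.
text = pydoc.render_doc(list, renderer=pydoc.plaintext)

# Show only the first few lines of the rendered page
for line in text.splitlines()[:5]:
    print(line)
```

From a shell, `python -m pydoc list` prints the same page, and `python -m pydoc -w <module>` writes an HTML file for a module.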
Not long ago, I needed to write some more detailed, formal design documents, and as a Markdown fan I decided to see whether mkdocs could give me something to work with. This module makes it easy to convert Markdown text into nicely styled web pages and lets you iterate on them before publishing. It comes with a readthedocs template and a simple command-line interface for pushing content to GitHub Pages.
After completing some initial design and requirements documents, I wanted to add details about my own interfaces and the existing interfaces of other modules. Since I had already defined most of the methods, I wanted to generate the reference pages automatically from the source files, and I wanted them in Markdown so that I could later render them to HTML with mkdocs along with my other files.
However, mkdocs has no default way to generate Markdown from source files, although there are plug-ins that do. After a while spent searching and researching, I became frustrated with the projects and plug-ins I found online: many were outdated, unmaintained, or simply didn't output what I wanted. So I decided to write my own. It was a fun way to learn more about the inspect module, which I had already used a little when building a debugger (see the article Hacking Together a Simple Graphical Python Debugger).
“The inspect module provides several useful functions to help get information about live objects…” – Python documentation
Check it out!
inspect is a standard library module that not only lets you inspect low-level Python frame and code objects, but also provides a number of functions for examining modules and classes to help you find the things that might be of interest. As mentioned above, pydoc uses it to generate help documents.
As you browse through the online documentation, you'll notice that it offers many related functions. The most important ones for this exercise are getmembers(), getdoc() and signature(), along with the many is... functions that serve as filters for getmembers(). With these, we can easily iterate over functions, tell generators apart from coroutines, and recurse into any class and its internals as needed.
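To make these functions concrete, here is a small sketch run against a throwaway class. Greeter, counter and fetch are hypothetical names invented for this illustration; only the inspect calls come from the standard library.

```python
import inspect

class Greeter:
    """Builds greetings for a given name."""

    def greet(self, name, punctuation="!"):
        """Return a greeting for name."""
        return "Hello, " + name + punctuation

    def _secret(self):
        return None

def counter():
    yield 1

async def fetch():
    return 1

# getmembers returns (name, object) pairs sorted by name;
# the isfunction predicate keeps only plain functions
members = inspect.getmembers(Greeter, inspect.isfunction)
names = [name for name, _ in members]

# getdoc returns the cleaned-up docstring, signature the call signature
doc = inspect.getdoc(Greeter.greet)
sig = str(inspect.signature(Greeter.greet))

# Other predicates tell generator functions and coroutine functions apart
is_gen = inspect.isgeneratorfunction(counter)
is_coro = inspect.iscoroutinefunction(fetch)

print(names)
print("greet" + sig)
print(doc)
print(is_gen, is_coro)
```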
If we're going to inspect an object, any object, the first thing we need is a mechanism for importing it into our namespace. Why talk about imports at all? Depending on what you want to do, there are many things to consider, such as virtual environments, custom code, standard modules, and name clashes. It gets messy quickly, and one wrong move leaves you with a confusing problem to untangle.
There are a few options, the best being to reuse safeimport() from pydoc, which takes care of the special cases for us and raises an ErrorDuringImport exception when something goes wrong. If our environment is more controlled, however, we can simply run __import__(modulename).
Another thing to keep in mind is the path your code executes from. You may need sys.path.append() to add a directory so that the modules you need can be found. I ran the generator from the command line in a directory on the path of the module being examined, so I added the current directory to sys.path, which solved the typical import path problems.
With that in mind, our import function looks like this:

    import os
    import sys
    from pydoc import ErrorDuringImport, safeimport

    def generatedocs(module):
        try:
            # Make modules in the current directory importable
            sys.path.append(os.getcwd())
            # Attempt import
            mod = safeimport(module)
            if mod is None:
                print("Module not found")
                return None
            # Module imported correctly, so generate the Markdown
            # (getmarkdown is defined further below)
            return getmarkdown(mod)
        except ErrorDuringImport as e:
            print("Error while trying to import " + module)

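To see the two failure modes of safeimport() in isolation, here is a small self-contained sketch. The try_import wrapper and the made-up module name are assumptions for this example only: safeimport() returns None when no module with the given name exists, and raises ErrorDuringImport when the module is found but fails while being executed.

```python
from pydoc import ErrorDuringImport, safeimport

def try_import(name):
    """Return the module called name, or None if it cannot be loaded."""
    try:
        # safeimport returns None when no module with that name is found
        return safeimport(name)
    except ErrorDuringImport as err:
        # The module was found, but running its code raised an exception
        print("Error while trying to import " + name + ": " + str(err))
        return None

found = try_import("json")
missing = try_import("no_such_module_for_this_example")
print(found.__name__, missing)
```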
Determine the output
At this point, you need a mental picture of how you want to organize the generated Markdown. Do you want a shallow reference that doesn't recurse into custom classes? Which methods should be documented? Should built-ins be included? What about _ and __ methods (that is, non-public and magic methods)? How should function signatures be presented? Do we want to pull in comments?
My choices were as follows:
- Each run generates one .md file, containing the information gathered by recursing into any child classes of the object being inspected
- Only the custom code I created is documented, not the modules it imports
- Each section of the output must be marked with a Markdown second-level heading (##)
- Every heading must contain the full path of the item being described (module.class.subclass.method)
- The complete function signature is included as pre-formatted text
- Each heading provides an anchor for quick links into the documentation (and within it)
- Any function whose name starts with _ or __ is not documented
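A couple of these choices can be sketched as small helpers. Note that is_public and make_heading are hypothetical names, and the anchor scheme (one anchor per dotted path) is an assumption made for illustration; the anchor ids actually produced depend on the Markdown renderer in use.

```python
def is_public(name):
    """Return False for names starting with _ or __ (non-public and magic members)."""
    return not name.startswith("_")

def make_heading(parents, name, level=2):
    """Build a heading whose parent path segments link to their own anchors."""
    links = []
    for i, part in enumerate(parents):
        # Assumed anchor scheme: the dotted path up to and including this part
        anchor = ".".join(parents[: i + 1])
        links.append("[" + part + "](#" + anchor + ")")
    # Escape underscores so Markdown does not read __init__ as bold text
    escaped = name.replace("_", "\\_")
    return "#" * level + " " + ".".join(links + [escaped])

print(make_heading(["sofi"], "addclass"))
print(make_heading(["sofi", "app"], "Sofi"))
print([n for n in ["addclass", "_private", "__init__"] if is_public(n)])
```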
Once the object is imported, we can begin inspecting it simply by calling getmembers(object, filter) repeatedly, where the filter is one of the is... functions. You can use not only isclass and isfunction, but also others such as ismethod, isgenerator and iscoroutine. It all depends on whether you want to write something generic that handles every special case, or something smaller and more specific to your code. Since I had nothing special to worry about, I stuck to the first two and split the work three ways, creating the output format for modules, classes, and functions.

    import inspect

    # Placeholder headers: substitute whatever heading format you settled on
    module_header = "# Package documentation\n"
    class_header = "\n## "
    function_header = "\n### "

    def getmarkdown(module):
        output = [module_header]
        output.extend(getfunctions(module))
        output.append("***\n")
        output.extend(getclasses(module))
        return "".join(output)

    def getclasses(item):
        output = list()
        for cl in inspect.getmembers(item, inspect.isclass):
            # Consider anything that starts with _ private
            # and don't document it
            if cl[0] != "__class__" and not cl[0].startswith("_"):
                output.append(class_header)
                output.append(cl[0] + "\n")
                # Get the docstring (getdoc returns None when there is none)
                output.append(inspect.getdoc(cl[1]) or "")
                # Get the functions
                output.extend(getfunctions(cl[1]))
                # Recurse into any subclasses
                output.extend(getclasses(cl[1]))
        return output

    def getfunctions(item):
        output = list()
        for func in inspect.getmembers(item, inspect.isfunction):
            output.append(function_header)
            output.append(func[0])
            # Get the signature, wrapped in a fenced code block
            output.append("\n```python\n")
            output.append(func[0])
            output.append(str(inspect.signature(func[1])))
            output.append("\n```\n")
            # Get the docstring
            output.append(inspect.getdoc(func[1]) or "")
        return output

When formatting a large body of text assembled from many variables, I like to collect the pieces in a list or tuple and combine them with "".join(), which is generally faster than building the string with .format or % interpolation. However, the string formatting introduced in Python 3.6 (f-strings) is faster still and more readable.
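For comparison, here is the same fragment of output built both ways. This is a minimal sketch with made-up values, and it makes no timing claims; it only shows that the two styles produce identical text.

```python
name = "addclass"
signature = "(self, selector, cl)"

# Collect the pieces in a list and join them once at the end
parts = ["### ", name, "\n", "`", name, signature, "`", "\n"]
joined = "".join(parts)

# The same text written as an f-string (Python 3.6+)
formatted = f"### {name}\n`{name}{signature}`\n"

print(joined == formatted)
```

The list approach pays off when the pieces are gathered across several functions and loops; for a single line of output, the f-string is easier to read.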
As you can see, getmembers() returns tuples holding the name of each object first and the actual object second, which we can use to recurse through the entire object structure.
For each item retrieved, we can use getdoc() or getcomments() to get its docstring and comments. For functions, we can use signature() to get a Signature object that describes their positional and keyword arguments, default values, and annotations, which gives us the flexibility to generate descriptive, well-styled text that helps users understand the intent of our code.
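As a sketch of what a Signature object exposes, the example below walks the parameters of a made-up function. connect is hypothetical and exists only to show positional, keyword-only, defaulted and annotated parameters.

```python
import inspect

def connect(host, port=8080, *, timeout: float = 1.5, **options) -> bool:
    """Hypothetical function used only to demonstrate signature()."""
    return True

sig = inspect.signature(connect)

rows = []
for param in sig.parameters.values():
    # Parameter.empty marks a missing default or annotation
    default = None if param.default is param.empty else param.default
    annotation = None if param.annotation is param.empty else param.annotation.__name__
    rows.append((param.name, param.kind.name, default, annotation))

# The string form is what ends up in the generated documentation
print("connect" + str(sig))
for row in rows:
    print(row)
```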
Plan ahead, just in case
Keep in mind that the code above is only meant to give you a sense of what the result will look like. Consider the following issues before you get there:
- As written above, getfunctions and getclasses show every function and class imported into the module, including built-ins and external packages, so you need to filter further inside the for loop. In the end I used the __file__ attribute of the module in which the item being inspected is defined. In other words, an item is included only if the module that defines it lives under the path being examined, which can be checked with os.path.commonprefix().
- File paths, import hierarchies, and naming also cause some problems. For example, when you import moduleX into a package through __init__.py, you can reach its functions as package.moduleX.function, but the full name reported through moduleX.__name__ is package.moduleX.moduleX.function. Keep this in mind whenever you iterate over items.
- You will also pick up classes imported from builtins, but built-in modules don't have a __file__ attribute, so check for that when you add filters.
- Because the output is Markdown and we are simply pulling in docstrings, you can use Markdown syntax in your docstrings and it will display nicely on the page. However, this also means you need to escape your docstrings correctly so that they don't break the HTML generation.
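The filtering described in the first and third points can be sketched roughly as follows. defined_under is a hypothetical helper, and the json package is used only because it is always available; a real generator would pass the directory of the package being documented.

```python
import inspect
import json
import os

def defined_under(obj, root_dir):
    """Return True if obj comes from a module whose file lives under root_dir."""
    module = inspect.getmodule(obj)
    path = getattr(module, "__file__", None)
    if path is None:
        # Built-in modules have no __file__, so they are never included
        return False
    root = os.path.abspath(root_dir)
    # commonprefix compares character by character, not path segment by segment
    return os.path.commonprefix([os.path.abspath(path), root]) == root

package_dir = os.path.dirname(json.__file__)
print(defined_under(json.JSONDecoder, package_dir))
print(defined_under(len, package_dir))
print(defined_under(inspect.getdoc, package_dir))
```

Because os.path.commonprefix() works on characters, a sibling directory whose name merely starts with the same text would also match; os.path.commonpath() is a stricter alternative if that matters for your layout.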
I ran the generator on the sofi package (the sofi.app module, to be exact), and here is what the resulting Markdown file looks like:
# sofi
### [sofi](#sofi).\_\_init\_\_
```python
__init__(self)
```
### [sofi](#sofi).addclass
```python
addclass(self, selector, cl)
```
Add the given class from all elements matching this selector.
Here is a sample of the content generated by mkdocs with the readthedocs theme (without function comments):
As I'm sure you already know, documentation generated automatically with these mechanisms provides complete, accurate, and up-to-date information about your modules. It is easy to maintain because it is written while you code rather than as an afterthought, so instead of writing documents to match a module after the fact, the documents are produced from the module itself. I highly recommend that everyone give it a try.
Before I close, I want to point out that mkdocs is not the only documentation package out there. There are other well-known and widely used systems, such as Sphinx and Doxygen, both of which can do what we have covered today. However, as always, I took on this exercise in order to gain a deeper understanding of Python and the tools that come with it.