Preface

Test data is needed throughout software development and testing. Typical scenarios include:

After a back-end developer creates a new table, test data has to be constructed in the database so that the interfaces can return data for the front end to use.
Database performance testing, which requires a large volume of generated test data.

How to write a test data generation script

Step 1: Define the structure of the data, including each field's name, type, and value range

If the data has dependencies, first decide what the final data should look like, then work backwards from it to derive the form of the data it depends on. For example, suppose you need a user tag called life insurance customer value. This tag depends on three other tags: the life insurance policy amount, the time of the most recent life insurance purchase or renewal, and the number of life insurance purchases or renewals within the past two years. The policy amount tag in turn relies on a series of event attributes. Starting from the single tag life insurance customer value, you can therefore derive many data structures by working backwards.
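The backward derivation can be made explicit by recording which tag depends on which, then sorting so that every dependency is generated first. This is a minimal sketch, assuming Python 3.9+ for `graphlib`; the tag identifiers and the `event_attributes` node are hypothetical names modeled on the example above, not a real schema.

```python
from graphlib import TopologicalSorter

# Hypothetical tag names: each key depends on the tags in its set.
DEPENDENCIES = {
    "life_insurance_customer_value": {
        "policy_amount",
        "last_purchase_or_renewal_time",
        "purchase_or_renewal_count_2y",
    },
    "policy_amount": {"event_attributes"},
}


def generation_order(dependencies):
    """Return the tags ordered so that dependencies come before dependents."""
    return list(TopologicalSorter(dependencies).static_order())


order = generation_order(DEPENDENCIES)
print(order)
```

The final tag always comes last in the result, so generating data in this order guarantees that every input already exists when a derived tag is computed.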

Step 2: Design how the data is distributed across the value ranges

The quality of the data affects the end result, so its distribution should be realistic. The right distribution depends on the business. The best option is to mirror the distribution of real business data. Other options are a distribution estimated by business staff from experience, the distribution of a similar business found online, or a distribution the developer simply makes up.
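One way to keep a designed distribution honest is to store it as data and compare the generated shares against it. The sketch below uses `random.choices` from the standard library rather than Faker, and the bucket bounds are illustrative assumptions; only the weights follow the age row used later in the article.

```python
import random
from collections import Counter

# Age buckets as ((low, high), weight in percent); the bounds are illustrative.
AGE_BUCKETS = [((0, 18), 2), ((19, 26), 8), ((27, 35), 20),
               ((36, 45), 34), ((46, 55), 21), ((56, 99), 15)]


def sample_age(rng):
    """Pick a bucket by weight, then draw uniformly inside it."""
    buckets = [bucket for bucket, _ in AGE_BUCKETS]
    weights = [weight for _, weight in AGE_BUCKETS]
    low, high = rng.choices(buckets, weights=weights, k=1)[0]
    return rng.randint(low, high)


def bucket_shares(ages):
    """Return the observed share of each bucket, for comparison with the design."""
    counts = Counter()
    for age in ages:
        for (low, high), _ in AGE_BUCKETS:
            if low <= age <= high:
                counts[(low, high)] += 1
                break
    return {bucket: counts[bucket] / len(ages) for bucket, _ in AGE_BUCKETS}


rng = random.Random(0)
ages = [sample_age(rng) for _ in range(20000)]
for bucket, share in bucket_shares(ages).items():
    print(bucket, f"{share:.3f}")
```

With enough samples the printed shares should sit close to the designed weights; a large gap usually points to a mistake in the bucket table.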

Step 3: Specify how the data is generated, including the number of records, their order, and the storage format
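These three decisions (how many records, in what order, stored how) can be sketched as a lazy generator plus a writer. The record fields here are placeholders, and JSON Lines is used instead of the single JSON array the article writes later, so that records can be streamed one at a time.

```python
import json
import os
import random
import tempfile


def generate_records(count, seed=0):
    """Yield `count` placeholder records lazily, in ascending id order."""
    rng = random.Random(seed)
    for record_id in range(1, count + 1):
        yield {"id": record_id, "age": rng.randint(0, 99)}


def write_json_lines(records, path):
    """Write one JSON object per line and return the number of lines written."""
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written


out_path = os.path.join(tempfile.mkdtemp(), "records.jsonl")
written = write_json_lines(generate_records(1000), out_path)
print(written, "records written to", out_path)
```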

Step 4: Confirm the data import method
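The import method depends on the target database. As one possibility, the sketch below bulk-loads records into SQLite with the standard library's `sqlite3` module. The table name `users` and its columns are hypothetical, and an in-memory database stands in for the real one.

```python
import sqlite3


def import_users(conn, users):
    """Create the table if needed and bulk-insert the rows in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(user_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO users (user_id, name, age) VALUES (:user_id, :name, :age)",
            users,
        )
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


users = [
    {"user_id": 1, "name": "Alice", "age": 38},
    {"user_id": 2, "name": "Bob", "age": 27},
]
conn = sqlite3.connect(":memory:")
row_count = import_users(conn, users)
print(row_count)
```

Inserting in a single transaction with `executemany` is usually much faster than committing row by row, which matters once the record count reaches the tens of thousands.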

The relevant framework

After evaluating several Python libraries, I chose joke2k/faker. Its documentation describes a way to generate data according to weights, which is exactly what I needed.

Documentation: faker.readthedocs.io/en/stable/p…

Basic usage

Install the dependency

pip install faker

Localization: create a Chinese Faker

fake = Faker('zh_CN')

Simple API calls

from faker import Faker

fake = Faker()

# Each call returns a new random value; the outputs below are samples.
fake.name()
# 'Lucy Cechtelar'

fake.address()
# '426 Jordy Lodge
#  Cartwrightshire, SC 88120-6700'

fake.text()
# 'Sint velit eveniet. Rerum atque repellat voluptatem quia rerum. Numquam excepturi
#  beatae sint laudantium consequatur. Magni occaecati itaque sint et sit tempore. Nesciunt…

For more information on using Faker, see the documentation.

Concrete example

Generate the following user attribute data and save it as a JSON file

User attributes

Property name	Display name	Type	Range	Distribution (%)
name	User name	STRING		random
age	Age	NUMBER	0-18, 18-26, 26-35, 36-45, 45-55, 55 and above	(2,8,20,34,21,15)
sex	Gender	NUMBER	Male, female	(50.03, 49.97)
city	City of residence	STRING	Beijing, Shanghai, Guangzhou, Shenzhen, Shenyang, Jinan, Tianjin, Xi'an, Hohhot, Wenzhou, Huangshan	(40,30,20,10)
province	Province	STRING	Beijing, Shanghai, Guangzhou, Liaoning, Shandong, Tianjin, Shaanxi, Inner Mongolia, Jiangsu, Anhui	(40,30,20,10)
annual_income	Annual income	NUMBER	0-6W, 6-15W, 15-30W, 30-80W, 80W and above (W = 10,000)	(15,45,33,5,2)
married	Marital status	STRING	Unmarried, married, divorced	(20,70,10)
occupation	Occupation	STRING	White-collar worker, teacher, worker, civil servant, sales	(45,10,20,10,15)
work_state	Employment status	STRING	On the job, retired, freelance	(45,35,20)
family_size	Family size	NUMBER	1-6, other	(5,15,18,22,22,15,5)
children_size	Number of children	NUMBER	0-3, other	(33,30,20,12,5)
have_car	Owns a car	BOOL		(20,80)
vip_level	Membership level	STRING	0-5 (ordinary member to diamond member)	(40,30,15,10,5)
membership_points	Membership points	NUMBER	0, 1-1000, 1001-2000, 2000-5000, 5000 or more	(20,30,30,15,5)
is_valid	Whether currently valid	BOOL		(30,70)
education	Education level	STRING	High school or below, bachelor, master, doctor	(35,45,15,5)
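A table like this is easy to get subtly wrong, so it can help to check that each row's weights add up to 100 before writing any generator code. This is a minimal sketch with the weights copied from the Distribution column of the table; the helper name `find_bad_totals` is made up for illustration.

```python
# Weights in percent, copied from the Distribution column of the table.
DISTRIBUTIONS = {
    "age": (2, 8, 20, 34, 21, 15),
    "sex": (50.03, 49.97),
    "city": (40, 30, 20, 10),
    "province": (40, 30, 20, 10),
    "annual_income": (15, 45, 33, 5, 2),
    "married": (20, 70, 10),
    "occupation": (45, 10, 20, 10, 15),
    "work_state": (45, 35, 20),
    "family_size": (5, 15, 18, 22, 22, 15, 5),
    "children_size": (33, 30, 20, 12, 5),
    "have_car": (20, 80),
    "vip_level": (40, 30, 15, 10, 5),
    "membership_points": (20, 30, 30, 15, 5),
    "is_valid": (30, 70),
    "education": (35, 45, 15, 5),
}


def find_bad_totals(distributions, expected=100.0, tolerance=1e-6):
    """Return {name: total} for every row whose weights do not sum to `expected`."""
    return {
        name: sum(weights)
        for name, weights in distributions.items()
        if abs(sum(weights) - expected) > tolerance
    }


bad_totals = find_bad_totals(DISTRIBUTIONS)
print(bad_totals)
```

As written, the family_size row sums to 102, so this check flags it. Weighted selection normally still works with such weights because they are treated as relative, but the resulting shares will differ slightly from the percentages in the table.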

The relevant code

user_faker.py

import json
from collections import OrderedDict
from datetime import datetime, date
from typing import Optional

from pydantic import BaseModel

from faker_config import fake
from snowflake import id_worker  # local module (not shown here) that issues unique IDs


class User(BaseModel):
    user_id: int
    first_id: Optional[int] = None
    second_id: Optional[int] = None
    time: Optional[datetime] = None
    name: str
    age: int
    sex: str
    city: str
    province: str
    annual_income: int
    married: str
    occupation: str
    work_state: str
    family_size: int
    children_size: int
    have_car: int
    vip_level: str
    membership_points: str
    is_valid: int
    education: str
    create_time: Optional[datetime]
    create_date: Optional[date]


def generate_user():
    time = fake.past_datetime(start_date='-2y')
    user = {
        "user_id": id_worker.get_id(),
        "first_id": None,
        "second_id": None,
        "time": time,
        "name": fake.name(),
        "age": user_faker.age(),
        "sex": user_faker.sex(),
        "annual_income": user_faker.annual_income(),
        "married": user_faker.married(),
        "occupation": user_faker.occupation(),
        "work_state": user_faker.work_state(),
        "family_size": user_faker.family_size(),
        "children_size": user_faker.children_size(),
        "have_car": user_faker.have_car(),
        # The model declares these two fields as str, so convert explicitly.
        "vip_level": str(user_faker.vip_level()),
        "membership_points": str(user_faker.membership_points()),
        "is_valid": user_faker.is_valid(),
        "education": user_faker.education(),
        "create_time": time,
        "create_date": time.date(),
    }
    # province_and_city() returns a JSON string holding both keys.
    user.update(json.loads(user_faker.province_and_city()))
    user = User(**user)
    return user


class UserFaker:
    # Each method builds an OrderedDict of value -> weight and lets Faker pick by weight.
    # Entries marked "reconstructed" were lost in the published listing and are
    # filled in from the table above; the marked bounds and mappings are assumptions.

    def age(self):
        elements = OrderedDict([
            (fake.random_int(min=0, max=18), 0.02),
            (fake.random_int(min=19, max=26), 0.08),
            (fake.random_int(min=27, max=35), 0.2),
            (fake.random_int(min=36, max=45), 0.34),
            (fake.random_int(min=46, max=54), 0.21),  # reconstructed: bounds assumed
            (fake.random_int(min=55, max=99), 0.15),
        ])
        return fake.random_element(elements=elements)

    def province_and_city(self):
        elements = OrderedDict([
            ('{"province": "Beijing", "city": "Beijing"}', 0.4),
            ('{"province": "Liaoning Province", "city": "Shenyang"}', 0.3),
            ('{"province": "Shaanxi Province", "city": "Xi\'an"}', 0.2),
            # reconstructed: the city was missing; Huangshan is taken from the table
            ('{"province": "Anhui Province", "city": "Huangshan"}', 0.1),
        ])
        return fake.random_element(elements=elements)

    def annual_income(self):
        elements = OrderedDict([
            (fake.random_int(min=0, max=6), 0.15),
            (fake.random_int(min=7, max=15), 0.45),
            (fake.random_int(min=16, max=30), 0.33),
            (fake.random_int(min=31, max=80), 0.05),  # weight restored from the table
            (fake.random_int(min=81, max=200), 0.02),  # reconstructed: upper bound assumed
        ])
        return fake.random_element(elements=elements)

    def married(self):
        elements = OrderedDict([
            ('unmarried', 0.2),
            ('married', 0.7),
            ('divorced', 0.1),
        ])
        return fake.random_element(elements=elements)

    def sex(self):
        elements = OrderedDict([('male', 0.52), ('female', 0.48)])
        return fake.random_element(elements=elements)

    def occupation(self):
        elements = OrderedDict([
            ('white-collar worker', 0.45),
            ('teacher', 0.1),
            ('worker', 0.2),
            ('civil servant', 0.1),
            ('sales', 0.15),
        ])
        return fake.random_element(elements=elements)

    def work_state(self):
        elements = OrderedDict([
            ('on the job', 0.45),
            ('retired', 0.35),
            ('freelance', 0.20),
        ])
        return fake.random_element(elements=elements)

    def family_size(self):
        elements = OrderedDict([
            (1, 0.05), (2, 0.15), (3, 0.18), (4, 0.22), (5, 0.22), (6, 0.15),
            (fake.random_int(min=7, max=10), 0.05),  # weight truncated in the source; taken from the table
        ])
        return fake.random_element(elements=elements)

    def children_size(self):
        elements = OrderedDict([
            (1, 0.33), (2, 0.35), (3, 0.20), (4, 0.07),
            (5, 0.05),  # reconstructed: the remaining weight
        ])
        return fake.random_element(elements=elements)

    def have_car(self):
        # reconstructed: weights from the table; the mapping to 1/0 is assumed
        elements = OrderedDict([(1, 0.2), (0, 0.8)])
        return fake.random_element(elements=elements)

    def vip_level(self):
        elements = OrderedDict([
            (1, 0.40), (2, 0.30), (3, 0.15), (4, 0.10),
            (5, 0.05),  # weight truncated in the source; taken from the table
        ])
        return fake.random_element(elements=elements)

    def membership_points(self):
        elements = OrderedDict([
            (fake.random_int(min=0, max=0), 0.2),
            (fake.random_int(min=1, max=1000), 0.3),
            (fake.random_int(min=1001, max=2000), 0.3),
            (fake.random_int(min=2001, max=5000), 0.15),
            (fake.random_int(min=5001, max=10000), 0.05),  # reconstructed: upper bound assumed
        ])
        return fake.random_element(elements=elements)

    def is_valid(self):
        # reconstructed: weights from the table; the mapping to 1/0 is assumed
        elements = OrderedDict([(1, 0.3), (0, 0.7)])
        return fake.random_element(elements=elements)

    def education(self):
        elements = OrderedDict([
            ('high school or below', 0.35),
            ('bachelor', 0.45),
            ('master', 0.15),
            ('doctor', 0.05),
        ])
        return fake.random_element(elements=elements)


user_faker = UserFaker()
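The listing imports `id_worker` from a local `snowflake` module that the article does not show. To run the example without it, a stand-in along these lines could be saved as `snowflake.py`. This is a hypothetical sketch that only assumes `get_id()` must return unique integers; it is not a real Snowflake ID implementation and not the author's module.

```python
import itertools
import threading
import time


class SimpleIdWorker:
    """Hypothetical stand-in for the `id_worker` object the listing imports."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counter = itertools.count()
        # Put the start time (in milliseconds) in the high bits so that ids
        # from separate runs are unlikely to collide.
        self._base = int(time.time() * 1000) << 22

    def get_id(self):
        """Return an integer that is unique and increasing within this process."""
        with self._lock:
            return self._base + next(self._counter)


id_worker = SimpleIdWorker()
ids = [id_worker.get_id() for _ in range(5)]
print(ids)
```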

faker_config.py

from faker import Faker

fake = Faker('zh_CN')

main.py

import datetime
import json

from user_faker import generate_user, User


class DateEncoder(json.JSONEncoder):
    def default(self, obj):
        # Check datetime before date, because datetime is a subclass of date.
        if isinstance(obj, datetime.datetime):
            return obj.strftime("%Y-%m-%d %H:%M:%S")
        if isinstance(obj, User):
            return obj.dict()
        if isinstance(obj, datetime.date):
            return obj.strftime("%Y-%m-%d")
        return json.JSONEncoder.default(self, obj)


def generate_data(row):
    print(f"Generating data ========>{row} items", datetime.datetime.now())
    users = []
    for _ in range(row):
        user = generate_user()
        users.append(user)
    with open('./user.json', 'w', encoding='utf-8') as fObj:
        json.dump(users, fObj, ensure_ascii=False, cls=DateEncoder)
    print(f"Generating test data ========>{row} items, completed", datetime.datetime.now())


if __name__ == '__main__':
    # Number of records to generate
    row = 10000
    generate_data(row)
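The same date handling can also be written without subclassing, by passing a function through the `default` parameter of `json.dumps`. A minimal standard-library sketch; the function name `encode_extra` and the sample record are made up for illustration.

```python
import datetime
import json


def encode_extra(obj):
    """Serialize the types the json module does not handle by default."""
    # datetime must be tested first: every datetime is also a date.
    if isinstance(obj, datetime.datetime):
        return obj.strftime("%Y-%m-%d %H:%M:%S")
    if isinstance(obj, datetime.date):
        return obj.strftime("%Y-%m-%d")
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


record = {
    "name": "LuBing",
    "create_time": datetime.datetime(2019, 12, 31, 12, 27, 59),
    "create_date": datetime.date(2019, 12, 31),
}
encoded = json.dumps(record, default=encode_extra, ensure_ascii=False)
print(encoded)
```

If the two `isinstance` checks were swapped, the `date` branch would also catch `datetime` values and drop their time component.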

The effect

Generating data ========>10000 items 2021-07-23 11:13:44.739249
Generating test data ========>10000 items, completed 2021-07-23 11:13:48.923505

user.json

[
  {
    "user_id": 1418409069177348096,
    "first_id": null,
    "second_id": null,
    "time": "2019-12-31 12:27:59",
    "name": "LuBing",
    "age": 38,
    "sex": "female",
    "city": "Shenyang",
    "province": "Liaoning Province",
    "annual_income": 2,
    "married": "married",
    "occupation": "white-collar worker",
    "work_state": "on the job",
    "family_size": 4,
    "children_size": 2,
    "have_car": 0,
    "vip_level": "2",
    "membership_points": "275",
    "is_valid": 1,
    "education": "high school or below",
    "create_time": "2019-12-31 12:27:59",
    "create_date": "2019-12-31"
  }
]

Commonly used APIs

bothify(text='## ??', letters='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'): a string mixing random digits and letters

Number signs ('#') are replaced with a random digit (0 to 9). Question marks ('?') are replaced with a random character from letters.

For example:
for _ in range(5):
    fake.bothify(letters='ABCDE')

lexify(text='????', letters='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'): random letters

Generate a string with each question mark ('?') in text replaced with a random character from letters.

numerify(text='###'): random numbers

Number signs ('#') are replaced with a random digit (0 to 9). Percent signs ('%') are replaced with a random non-zero digit (1 to 9). Exclamation marks ('!') are replaced with a random digit or an empty string. At symbols ('@') are replaced with a random non-zero digit or an empty string.

>>> Faker.seed(0)
>>> for _ in range(5):
...     fake.numerify(text='Intel Core i%-%%##K vs AMD Ryzen % %%##X')

random_digit(): a random digit

Generate a random digit (0 to 9).

random_choices(elements=('a', 'b', 'c'), length=None)

Returns a list of elements sampled with replacement; length is the number of elements to return.

random_elements(elements=('a', 'b', 'c'), length=None, unique=False, use_weighting=None)

fake.random_elements(
    elements=OrderedDict([
        ("variable_1", 0.5),  # Generates "variable_1" 50% of the time
        ("variable_2", 0.2),  # Generates "variable_2" 20% of the time
        ("variable_3", 0.2),  # Generates "variable_3" 20% of the time
        ("variable_4", 0.1),  # Generates "variable_4" 10% of the time
    ]),
    unique=False
)

random_int(min=0, max=9999, step=1): a random integer
random_letter(): a random letter
date_between_dates(date_start=None, date_end=None): a random date between two given dates
past_datetime(start_date='-30d', tzinfo=None): a random datetime in the past


