Preface
Test data is often needed during software development and testing. Typical scenarios include:
- After back-end developers create a new table, they need to populate the database with test data and expose interface data for the front end to use.
- A database performance test needs a large volume of generated data to put the database under load.
How to write a test data generation script
Step 1: Define the structure of the data, including name, type, value range, etc.
If the data has dependencies, decide on the final form of the data first, then work backwards to the form of the data it depends on. For example, suppose you need a user tag, "life insurance customer value". This tag depends on three labels: the life insurance policy amount, the time of the most recent life insurance purchase or renewal, and the number of life insurance purchases or renewals within 2 years. The policy amount label in turn depends on a series of event attributes. Starting from the "life insurance customer value" tag, you can derive many of the required data structures in reverse.
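The backward derivation can be sketched roughly as follows. The field names, thresholds, and scoring rule are all hypothetical; the only point is that the final tag is computed from three intermediate labels, so a generator has to produce those labels first.

```python
from dataclasses import dataclass


@dataclass
class PolicyLabels:
    """Intermediate labels the final tag depends on (names are hypothetical)."""
    policy_amount: float           # total insured amount
    days_since_last_purchase: int  # recency of the last purchase or renewal
    purchases_in_2_years: int      # purchase/renewal count within 2 years


def customer_value(labels: PolicyLabels) -> str:
    """Toy scoring rule (an assumption): one point per threshold met."""
    score = 0
    score += labels.policy_amount >= 500_000
    score += labels.days_since_last_purchase <= 365
    score += labels.purchases_in_2_years >= 2
    return ("low", "medium", "high", "high")[score]


# Working backwards: to end up with a "high" tag, generate labels like these.
labels = PolicyLabels(policy_amount=800_000,
                      days_since_last_purchase=90,
                      purchases_in_2_years=3)
tag = customer_value(labels)
print(tag)
```

Designing the final tag first makes it clear which intermediate fields must exist and what ranges they need to cover.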
Step 2: Design how the data is distributed across the different ranges
The quality of the data affects the end result, so its distribution should be reasonable. The right distribution depends on the business. The best option is to mirror the distribution of the real business data. Other options include estimates from experienced business staff, distributions published online for similar businesses, and the developers' own guesses.
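One way to check that generated data follows a designed distribution is to sample a large batch and compare the observed shares with the targets. The sketch below uses only the standard library; the three categories and their percentages mirror the marital-status row of the attribute table later in this article, and the seed and sample size are arbitrary choices.

```python
import random
from collections import Counter

# Target distribution for a categorical field, in percent.
target = {"unmarried": 20, "married": 70, "divorced": 10}

rng = random.Random(42)  # fixed seed so the run is repeatable
sample = rng.choices(list(target), weights=list(target.values()), k=100_000)

# Compare the empirical share of each value with its target share.
counts = Counter(sample)
observed = {value: 100 * counts[value] / len(sample) for value in target}
for value, share in target.items():
    print(f"{value}: target {share}%, observed {observed[value]:.2f}%")
```

With a sample this large the observed shares should land within a fraction of a percentage point of the targets; a larger gap usually points to a mistake in the weights.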
Step 3: Specify how the data is generated, including the number of records, their order, and the storage format
Step 4: Confirm how the data will be imported
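Steps 3 and 4 can be sketched together: fix the row count and order in a generator, then write the rows in a format the target system can load. The example below is a minimal standard-library sketch that assumes CSV as the storage format and uses hypothetical column names; the concrete example later in this article stores JSON instead.

```python
import csv
import io
import random


def generate_rows(count, seed=0):
    """Yield `count` rows in a fixed order with sequential ids."""
    rng = random.Random(seed)
    for row_id in range(1, count + 1):
        yield {"id": row_id, "age": rng.randint(0, 99)}


def write_csv(rows, stream):
    """Write rows as CSV with a header line and return the number written."""
    writer = csv.DictWriter(stream, fieldnames=["id", "age"])
    writer.writeheader()
    written = 0
    for row in rows:
        writer.writerow(row)
        written += 1
    return written


buffer = io.StringIO()  # stands in for a real file opened with newline=""
written = write_csv(generate_rows(1000), buffer)
lines = buffer.getvalue().splitlines()
print(written, lines[0])
```

Because the rows come from a generator, the same code can write millions of rows without holding them all in memory.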
The framework
After evaluating several Python frameworks, I chose joke2k/faker. Its documentation describes a way to generate data according to weights, which meets my needs.
Documentation: faker.readthedocs.io/en/stable/p…
Basic usage
- Installation
pip install faker
- Localization: a Chinese-locale Faker
fake = Faker('zh_CN')
- Simple API calls
from faker import Faker
fake = Faker()
fake.name()
# 'Lucy Cechtelar'
fake.address()
# '426 Jordy Lodge
#  Cartwrightshire, SC 88120-6700'
fake.text()
# 'Sint velit eveniet. Rerum atque repellat voluptatem quia rerum. Numquam excepturi
#  beatae sint laudantium consequatur. Magni occaecati itaque sint et sit tempore. Nesciunt …
For more information on using Faker, see the documentation.
Concrete example
Generate the following user attribute data and save it as a JSON file
User attributes
Attribute name | Display name | Type | Range | Distribution (%) |
---|---|---|---|---|
name | User name | STRING | Random | |
age | Age | NUMBER | 0-18, 18-26, 26-35, 36-45, 45-55, 55 and above | (2,8,20,34,21,15) |
sex | Gender | NUMBER | Male, female | (50.03, 49.97) |
city | City of residence | STRING | Beijing/Shanghai/Guangzhou/Shenzhen, Shenyang, Jinan, Tianjin, Xi'an, Hohhot, Wenzhou, Huangshan | (40,30,20,10) |
province | Province | STRING | Beijing/Shanghai/Guangzhou, Liaoning, Shandong, Tianjin, Shaanxi, Inner Mongolia, Jiangsu, Anhui | (40,30,20,10) |
annual_income | Annual income | NUMBER | 0-6W, 6-15W, 15-30W, 30-80W, 80W and above (W = 10,000 yuan) | (15,45,33,5,2) |
married | Marital status | STRING | Unmarried, married, divorced | (20,70,10) |
occupation | Occupation | STRING | White-collar worker, teacher, worker, civil servant, sales | (45,10,20,10,15) |
work_state | Employment status | STRING | Employed, retired, freelance | (45,35,20) |
family_size | Family size | NUMBER | 1-6, other | (5,15,18,22,22,15,5) |
children_size | Number of children | NUMBER | 0-3, other | (33,30,20,12,5) |
have_car | Owns a car | BOOL | | (20,80) |
vip_level | Membership level | STRING | 0-5, from ordinary member to diamond member | (40,30,15,10,5) |
membership_points | Membership points | NUMBER | 0, 1-1000, 1001-2000, 2000-5000, 5000 or more | (20,30,30,15,5) |
is_valid | Whether valid | BOOL | | (30,70) |
education | Education level | STRING | High school or below, bachelor's, master's, doctorate | (35,45,15,5) |
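For numeric attributes such as age, one way to honor the per-range percentages is a two-stage draw: pick a range by weight, then pick a value uniformly inside it. The sketch below uses only the standard library rather than Faker. The percentages come from the age row of the table; the exact bucket edges and the upper bound of 99 for "55 and above" are assumptions.

```python
import random

# ((low, high), percent) per age bucket. Edges and the 99 cap are assumed.
AGE_BUCKETS = [((0, 18), 2), ((19, 26), 8), ((27, 35), 20),
               ((36, 45), 34), ((46, 55), 21), ((56, 99), 15)]


def sample_age(rng):
    """Pick a bucket by weight, then a uniform integer inside that bucket."""
    ranges = [bounds for bounds, _ in AGE_BUCKETS]
    weights = [weight for _, weight in AGE_BUCKETS]
    low, high = rng.choices(ranges, weights=weights, k=1)[0]
    return rng.randint(low, high)


rng = random.Random(7)  # fixed seed so the run is repeatable
ages = [sample_age(rng) for _ in range(50_000)]
share_36_45 = sum(36 <= age <= 45 for age in ages) / len(ages)
print(f"share aged 36-45: {share_36_45:.3f}")
```

Non-overlapping bucket edges matter here: if two buckets shared a boundary value, that value would be drawn slightly more often than the table intends.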
The relevant code
user_faker.py
import json
from collections import OrderedDict
from datetime import datetime, date
from typing import Optional

from pydantic import BaseModel

from faker_config import fake
from snowflake import id_worker  # snowflake ID generator; its source is not shown in this article


class User(BaseModel):
    user_id: int
    first_id: Optional[int] = None
    second_id: Optional[int] = None
    time: Optional[datetime] = None
    name: str
    age: int
    sex: str
    city: str
    province: str
    annual_income: int
    married: str
    occupation: str
    work_state: str
    family_size: int
    children_size: int
    have_car: int
    vip_level: str
    membership_points: str
    is_valid: int
    education: str
    create_time: Optional[datetime] = None
    create_date: Optional[date] = None


def generate_user():
    time = fake.past_datetime(start_date='-2y')
    user = {
        "user_id": id_worker.get_id(),
        "first_id": None,
        "second_id": None,
        "time": time,
        "name": fake.name(),
        "age": user_faker.age(),
        "sex": user_faker.sex(),
        "annual_income": user_faker.annual_income(),
        "married": user_faker.married(),
        "occupation": user_faker.occupation(),
        "work_state": user_faker.work_state(),
        "family_size": user_faker.family_size(),
        "children_size": user_faker.children_size(),
        "have_car": user_faker.have_car(),
        # the model declares these two fields as str, so convert explicitly
        "vip_level": str(user_faker.vip_level()),
        "membership_points": str(user_faker.membership_points()),
        "is_valid": user_faker.is_valid(),
        "education": user_faker.education(),
        "create_time": time,
        "create_date": time.date(),
    }
    # province and city are drawn together so that they stay consistent
    user.update(json.loads(user_faker.province_and_city()))
    return User(**user)


class UserFaker:
    # Each method builds an OrderedDict of (value, weight) pairs and lets
    # fake.random_element pick one value according to the weights. For numeric
    # ranges, one candidate per range is drawn first, then one is picked.

    def age(self):
        elements = OrderedDict([
            (fake.random_int(min=0, max=18), 0.02),
            (fake.random_int(min=19, max=26), 0.08),
            (fake.random_int(min=27, max=35), 0.2),
            (fake.random_int(min=36, max=45), 0.34),
            (fake.random_int(min=46, max=54), 0.21),  # range garbled in the source; reconstructed
            (fake.random_int(min=55, max=99), 0.15),
        ])
        return fake.random_element(elements=elements)

    def province_and_city(self):
        # JSON strings are used because OrderedDict keys must be hashable
        elements = OrderedDict([
            ('{"province": "Beijing", "city": "Beijing"}', 0.4),
            ('{"province": "Liaoning Province", "city": "Shenyang"}', 0.3),
            ('{"province": "Shaanxi Province", "city": "Xi\'an"}', 0.2),
            # city missing in the source; Huangshan assumed from the attribute table
            ('{"province": "Anhui Province", "city": "Huangshan"}', 0.1),
        ])
        return fake.random_element(elements=elements)

    def annual_income(self):
        elements = OrderedDict([
            (fake.random_int(min=0, max=6), 0.15),
            (fake.random_int(min=7, max=15), 0.45),
            (fake.random_int(min=16, max=30), 0.33),
            (fake.random_int(min=31, max=80), 0.05),
            # last range garbled in the source; 200 is a placeholder upper bound
            (fake.random_int(min=81, max=200), 0.02),
        ])
        return fake.random_element(elements=elements)

    def married(self):
        elements = OrderedDict([
            ('unmarried', 0.2),
            ('married', 0.7),
            ('divorced', 0.1),
        ])
        return fake.random_element(elements=elements)

    def sex(self):
        # note: the attribute table lists 50.03 / 49.97
        elements = OrderedDict([('male', 0.52), ('female', 0.48)])
        return fake.random_element(elements=elements)

    def occupation(self):
        elements = OrderedDict([
            ('white-collar worker', 0.45),
            ('teacher', 0.1),
            ('worker', 0.2),
            ('civil servant', 0.1),
            ('sales', 0.15),
        ])
        return fake.random_element(elements=elements)

    def work_state(self):
        elements = OrderedDict([
            ('employed', 0.45),
            ('retired', 0.35),
            ('freelance', 0.20),
        ])
        return fake.random_element(elements=elements)

    def family_size(self):
        elements = OrderedDict([
            (1, 0.05), (2, 0.15), (3, 0.18), (4, 0.22), (5, 0.22), (6, 0.15),
            # weight garbled in the source; 0.05 taken from the attribute table
            (fake.random_int(min=7, max=10), 0.05),
        ])
        return fake.random_element(elements=elements)

    def children_size(self):
        # last weight garbled in the source; 0.05 taken from the attribute table
        elements = OrderedDict([(1, 0.33), (2, 0.35), (3, 0.20), (4, 0.07), (5, 0.05)])
        return fake.random_element(elements=elements)

    def have_car(self):
        # body missing in the source; weights from the attribute table,
        # and the mapping 1 = yes / 0 = no is an assumption
        elements = OrderedDict([(1, 0.2), (0, 0.8)])
        return fake.random_element(elements=elements)

    def vip_level(self):
        # last weight garbled in the source; 0.05 taken from the attribute table
        elements = OrderedDict([(1, 0.40), (2, 0.30), (3, 0.15), (4, 0.10), (5, 0.05)])
        return fake.random_element(elements=elements)

    def membership_points(self):
        elements = OrderedDict([
            (fake.random_int(min=0, max=0), 0.2),
            (fake.random_int(min=1, max=1000), 0.3),
            (fake.random_int(min=1001, max=2000), 0.3),
            (fake.random_int(min=2001, max=5000), 0.15),
            # last range garbled in the source; 10000 is a placeholder upper bound
            (fake.random_int(min=5001, max=10000), 0.05),
        ])
        return fake.random_element(elements=elements)

    def is_valid(self):
        # body missing in the source; weights from the attribute table,
        # and the mapping 1 = yes / 0 = no is an assumption
        elements = OrderedDict([(1, 0.3), (0, 0.7)])
        return fake.random_element(elements=elements)

    def education(self):
        elements = OrderedDict([
            ('high school or below', 0.35),
            ('bachelor', 0.45),
            ('master', 0.15),
            ('doctorate', 0.05),
        ])
        return fake.random_element(elements=elements)


user_faker = UserFaker()
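The province_and_city method encodes each province/city pair as a JSON string so that the two fields are always drawn together. As a hedged alternative outside Faker, tuples are hashable and can serve as keys directly, which avoids the JSON round trip. The sketch below uses only the standard library; the pairs follow the article's weights, and the Anhui city is an assumption because it is missing in the source.

```python
import random

# (province, city) pairs with weights. "Huangshan" is assumed, not from the source code.
LOCATIONS = {
    ("Beijing", "Beijing"): 0.4,
    ("Liaoning", "Shenyang"): 0.3,
    ("Shaanxi", "Xi'an"): 0.2,
    ("Anhui", "Huangshan"): 0.1,
}


def sample_location(rng):
    """Draw one pair by weight and return it as two consistent fields."""
    pairs = list(LOCATIONS)
    weights = list(LOCATIONS.values())
    province, city = rng.choices(pairs, weights=weights, k=1)[0]
    return {"province": province, "city": city}


rng = random.Random(1)  # fixed seed so the run is repeatable
locations = [sample_location(rng) for _ in range(1000)]
print(locations[0])
```

Sampling the two fields independently would produce impossible combinations such as Shenyang in Anhui, which is why they are drawn as a unit.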
faker_config.py
from faker import Faker
fake = Faker('zh_CN')
main.py
import datetime
import json

from user_faker import generate_user, User


class DateEncoder(json.JSONEncoder):
    def default(self, obj):
        # datetime is checked before date because datetime is a subclass of date
        if isinstance(obj, datetime.datetime):
            return obj.strftime("%Y-%m-%d %H:%M:%S")
        if isinstance(obj, User):
            return obj.dict()
        if isinstance(obj, datetime.date):
            return obj.strftime("%Y-%m-%d")
        return json.JSONEncoder.default(self, obj)


def generate_data(row):
    print(f"Generating data ========>{row} items", datetime.datetime.now())
    users = []
    for i in range(row):
        user = generate_user()
        users.append(user)
    with open('./user.json', 'w', encoding='utf-8') as fObj:
        json.dump(users, fObj, ensure_ascii=False, cls=DateEncoder)
    print(f"Generating test data ========>{row} items, completed", datetime.datetime.now())


if __name__ == '__main__':
    # number of rows to generate
    row = 10000
    generate_data(row)
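The encoder in main.py depends on the author's User model, so here is a reduced, standard-library-only sketch of the same idea that can be run on its own. It shows why the datetime check has to come before the date check. The sample record is made up for illustration.

```python
import datetime
import json


class DateEncoder(json.JSONEncoder):
    """Serialize datetime and date values, which json cannot encode natively."""

    def default(self, obj):
        # datetime is a subclass of date, so it must be tested first;
        # otherwise the time part would be silently dropped.
        if isinstance(obj, datetime.datetime):
            return obj.strftime("%Y-%m-%d %H:%M:%S")
        if isinstance(obj, datetime.date):
            return obj.strftime("%Y-%m-%d")
        return super().default(obj)


record = {
    "create_time": datetime.datetime(2019, 12, 31, 12, 27, 59),
    "create_date": datetime.date(2019, 12, 31),
}
encoded = json.dumps(record, cls=DateEncoder)
print(encoded)
```

Any type the encoder does not recognize still falls through to the base class, which raises TypeError, so unexpected values are not written out silently.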
Result
Generating data ========>10000 items 2021-07-23 11:13:44.739249
Generating test data ========>10000 items, completed 2021-07-23 11:13:48.923505
user.json
[{
  "user_id": 1418409069177348096,
  "first_id": null,
  "second_id": null,
  "time": "2019-12-31 12:27:59",
  "name": "LuBing",
  "age": 38,
  "sex": "female",
  "city": "Shenyang",
  "province": "Liaoning Province",
  "annual_income": 2,
  "married": "married",
  "occupation": "white-collar worker",
  "work_state": "employed",
  "family_size": 4,
  "children_size": 2,
  "have_car": 0,
  "vip_level": "2",
  "membership_points": "275",
  "is_valid": 1,
  "education": "high school or below",
  "create_time": "2019-12-31 12:27:59",
  "create_date": "2019-12-31"
}]
Commonly used APIs
- bothify(text='## ??', letters='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'): a string of random digits and letters
Number signs ('#') are replaced with a random digit (0 to 9). Question marks ('?') are replaced with a random character from letters.
Example:
for _ in range(5):
    print(fake.bothify(letters='ABCDE'))
- lexify(text='????', letters='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'): random letters
Generate a string with each question mark ('?') in text replaced with a random character from letters.
- numerify(text='###'): random digits
Number signs ('#') are replaced with a random digit (0 to 9). Percent signs ('%') are replaced with a random non-zero digit (1 to 9). Exclamation marks ('!') are replaced with a random digit or an empty string. At symbols ('@') are replaced with a random non-zero digit or an empty string.
>>> Faker.seed(0)
>>> for _ in range(5):
...     fake.numerify(text='Intel Core i%-%%##K vs AMD Ryzen % %%##X')
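To make the four placeholder rules concrete, here is a small standard-library re-implementation of them. It is a sketch of the documented behavior, not Faker's own code, and the function name numerify_sketch is made up for this example.

```python
import random
import re


def numerify_sketch(text, rng):
    """Apply the four numerify placeholder rules to `text` (not Faker's code)."""
    replacements = {
        "#": lambda: str(rng.randint(0, 9)),                    # any digit
        "%": lambda: str(rng.randint(1, 9)),                    # non-zero digit
        "!": lambda: rng.choice(["", str(rng.randint(0, 9))]),  # digit or empty
        "@": lambda: rng.choice(["", str(rng.randint(1, 9))]),  # non-zero digit or empty
    }
    # Characters that are not placeholders are copied through unchanged.
    return "".join(replacements[ch]() if ch in replacements else ch
                   for ch in text)


rng = random.Random(0)  # fixed seed so the run is repeatable
result = numerify_sketch("Intel Core i%-%%##K", rng)
matches = re.fullmatch(r"Intel Core i[1-9]-[1-9]{2}[0-9]{2}K", result) is not None
print(result, matches)
```

The '%' placeholder is useful wherever a leading zero would look wrong, such as the first digit of a model number.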
- random_digit(): a random digit
Generate a random digit (0 to 9).
- random_choices(elements=('a', 'b', 'c'), length=None): random choices from elements
length is the number of elements to return.
- random_elements(elements=('a', 'b', 'c'), length=None, unique=False, use_weighting=None): random elements, optionally weighted
fake.random_elements(
    elements=OrderedDict([
        ("variable_1", 0.5),  # Generates "variable_1" 50% of the time
        ("variable_2", 0.2),  # Generates "variable_2" 20% of the time
        ("variable_3", 0.2),  # Generates "variable_3" 20% of the time
        ("variable_4", 0.1),  # Generates "variable_4" 10% of the time
    ]),
    unique=False,
)
- random_int(min=0, max=9999, step=1): a random integer
- random_letter(): a random letter
- date_between_dates(date_start=None, date_end=None): a random date between two dates
- past_datetime(start_date='-30d', tzinfo=None): a random datetime in the past