While writing business-logic code, I had the misfortune of encountering the following table structure (the main logic has already been pulled out):
```python
class Category(Model):
    __tablename__ = 'category'

    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(length=255))


class Product(Model):
    __tablename__ = 'product'

    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(length=255))
    category_id = Column(Integer)  # note: no ForeignKey declared on this column
```
The business requirement is to return a sorted list of results:

```json
[{"id": 1, "name": "category 1", "product_count": 1}, ...]
```
This is a one-to-many model. The usual naive approach goes something like this:
```python
data = []
categorys = Category.query.all()
for category in categorys:
    product_count = len(Product.query.filter(Product.category_id == category.id).all())
    data.append({
        'id': category.id,
        'name': category.name,
        'product_count': product_count
    })
```
The `len(Product.query.filter(Product.category_id == category.id).all())` call fetches every matching row just to count them; it can be replaced with `count()`:

```python
product_count = Product.query.filter(Product.category_id == category.id).count()
```
However, according to this article: [Why is SQLAlchemy count() much slower than the raw query?](https://stackoverflow.com/que…), it seems that writing it like this gives better performance:
```python
from sqlalchemy import func

session.query(func.count(Product.id)).filter(Product.category_id == category.id).scalar()
```
However, anyone with even a little more experience will notice that the `product_count` query sits inside `for category in categorys:`. If there are thousands of categorys, thousands of `session.query()` calls will be issued, and each database request costs a network round trip, so the total request time is long. Some databases also handle connection pooling poorly, and setting up and tearing down connections is a large overhead in itself, so the fewer queries the better. Placing a query inside a for loop like the one above is obviously a bad idea. That led to the following version:
```python
result = db.session.query(Product, Category) \
    .filter(Product.category_id == Category.id) \
    .order_by(Category.id).all()
id_list = []
data = []
for product, category in result:
    if category and product:
        if category.id not in id_list:
            id_list.append(category.id)
            data.append({
                'id': category.id,
                'name': category.name,
                'product_count': 0
            })
        idx = id_list.index(category.id)
        data[idx]['product_count'] += 1
```
This is ugly, and it also fails to take advantage of SQLAlchemy's count function. So it changed to:
```python
product_count = func.count(Product.id).label('count')
results = session.query(Category, product_count) \
    .join(Product, Product.category_id == Category.id) \
    .group_by(Category).all()
data = [
    {
        'id': category.id,
        'name': category.name,
        'product_count': product_count
    } for category, product_count in results]
```
But there is another problem here: if a Category is added first and has no Product in it yet, that Category will not appear in the data at all, so the join must be changed to an outer join. That is:
```python
results = session.query(Category, product_count) \
    .outerjoin(Product, Product.category_id == Category.id) \
    .group_by(Category).all()
```
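The difference is easy to verify. Below is a minimal, runnable sketch (in-memory SQLite, hypothetical model and data names, assuming SQLAlchemy 1.4+) where "category 2" has no products: the inner join drops it, while the outer join keeps it with a count of 0.

```python
from sqlalchemy import Column, Integer, String, create_engine, func
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Category(Base):
    __tablename__ = 'category'
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(255))

class Product(Base):
    __tablename__ = 'product'
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(255))
    category_id = Column(Integer)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([
    Category(id=1, name='category 1'),
    Category(id=2, name='category 2'),   # has no products
    Product(name='p1', category_id=1),
])
session.commit()

product_count = func.count(Product.id).label('count')

# Inner join: category 2 has no matching product rows, so it disappears.
inner = session.query(Category, product_count) \
    .join(Product, Product.category_id == Category.id) \
    .group_by(Category.id).all()

# Outer join: category 2 survives; count(Product.id) skips the NULLs,
# so its count is 0.
outer = session.query(Category, product_count) \
    .outerjoin(Product, Product.category_id == Category.id) \
    .group_by(Category.id).all()

print(len(inner))  # inner join drops the empty category
print(len(outer))  # outer join keeps it
```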
The requirements change again!! Now Product is to be soft-deleted: an is_deleted attribute marks whether a Product has been removed. So instead of simply counting Product.id, the count must include only products whose is_deleted is false. After some research, it turns out this can be achieved with func.nullif:
```python
product_count = func.count(func.nullif(Product.is_deleted.is_(False), False)).label('count')
results = session.query(Category, product_count) \
    .outerjoin(Product, Product.category_id == Category.id) \
    .group_by(Category).all()
```
```python
data = [
    {
        'id': category.id,
        'name': category.name,
        'product_count': product_count
    } for category, product_count in results]
```
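To check the nullif trick end to end, here is a runnable sketch (in-memory SQLite, hypothetical data, assuming SQLAlchemy 1.4+). `nullif(expr, False)` turns a false `expr` into NULL, and `count()` skips NULLs, so soft-deleted products drop out of the count while the outer join still keeps every category.

```python
from sqlalchemy import Boolean, Column, Integer, String, create_engine, func
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Category(Base):
    __tablename__ = 'category'
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(255))

class Product(Base):
    __tablename__ = 'product'
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(255))
    category_id = Column(Integer)
    is_deleted = Column(Boolean, default=False)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([
    Category(id=1, name='category 1'),
    Product(name='live', category_id=1, is_deleted=False),
    Product(name='gone', category_id=1, is_deleted=True),   # soft-deleted
])
session.commit()

# For a live product, is_deleted.is_(False) is true -> nullif keeps it -> counted.
# For a soft-deleted product, the expression is false -> nullif yields NULL -> skipped.
product_count = func.count(func.nullif(Product.is_deleted.is_(False), False)).label('count')
results = session.query(Category, product_count) \
    .outerjoin(Product, Product.category_id == Category.id) \
    .group_by(Category.id).all()

data = [
    {
        'id': category.id,
        'name': category.name,
        'product_count': product_count
    } for category, product_count in results]
print(data)  # category 1 counts only the live product
```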
As you can see, there is still a lot to think about when using an ORM.