Introduction
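FuzzyWuzzy is a third-party Python library for fuzzy string matching. It uses Levenshtein distance to score how different two strings are. Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other.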
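To make the underlying metric concrete, here is a minimal pure-Python sketch of the classic Levenshtein distance. It uses dynamic programming and keeps only one previous row. It only illustrates the metric: the function name `levenshtein` is hypothetical and is not part of FuzzyWuzzy's API. FuzzyWuzzy itself reports 0-100 similarity ratios rather than raw distances.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b.

    Insertions, deletions and substitutions each cost 1 (classic
    dynamic-programming formulation, keeping only one previous row).
    """
    # prev[j] = distance between the current prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```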
GitHub project address
Requirements and installation
- Python 2.7 or higher
- difflib
- python-Levenshtein (optional): provides a 4-10x speedup in string matching, though it may produce different matching results in certain cases
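difflib is part of the standard library and provides the pure-Python matching that is used when python-Levenshtein is absent. The following is a rough sketch of how a plain 0-100 score can be derived from `difflib.SequenceMatcher`. It is an assumption about the general approach, not a copy of FuzzyWuzzy's code, and `simple_ratio` is a hypothetical name.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Similarity score from 0 to 100 using only the standard library.

    SequenceMatcher.ratio() returns 2*M/T, where M is the number of matched
    characters and T is the combined length of both strings.
    """
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


print(simple_ratio("this is a medusa blog", "this is a blog!"))  # 78
```

Here M = 14 ("this is a " plus "blog") and T = 36, so the score is round(100 * 28 / 36) = 78.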
Install with pip:
python -m pip install fuzzywuzzy
Or install it together with python-Levenshtein for the speedup:
python -m pip install fuzzywuzzy[speedup]
Without python-Levenshtein, importing fuzzywuzzy emits this warning:
UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
To remove the warning, install python-Levenshtein:
python -m pip install python-Levenshtein
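The warning reflects an import-time fallback: use the C extension when it can be imported, otherwise fall back to difflib. Below is a minimal sketch of that general pattern. It is illustrative rather than FuzzyWuzzy's actual source, and `_ratio`, `BACKEND` and `score` are hypothetical names. It assumes the installed python-Levenshtein package exposes a module-level `Levenshtein.ratio(s1, s2)` function.

```python
import difflib

try:
    # Fast path: C implementation from the optional python-Levenshtein package
    from Levenshtein import ratio as _ratio
    BACKEND = "python-Levenshtein"
except ImportError:
    # Slow path: pure-Python difflib, which is what the UserWarning refers to
    def _ratio(s1: str, s2: str) -> float:
        return difflib.SequenceMatcher(None, s1, s2).ratio()
    BACKEND = "difflib"


def score(s1: str, s2: str) -> int:
    """0-100 similarity using whichever backend could be imported."""
    return int(round(100 * _ratio(s1, s2)))


print(BACKEND)
print(score("this is a medusa blog", "this is a blog!"))  # 78 with either backend
```

Both backends agree on this pair. In general, however, they can disagree slightly, which is the "different matching results" caveat noted in the requirements.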
Testing dependencies:
- pycodestyle
- hypothesis
- pytest
Basic usage
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from fuzzywuzzy import fuzz, process
# Simple ratio
print(fuzz.ratio("this is a medusa blog", "this is a blog!"))  # 78
# Partial ratio: compare the shorter string with the best-matching substring of the longer one
print(fuzz.partial_ratio("this is a test", "this is a test!"))  # 100
print(fuzz.partial_ratio("It is never too old to learn", "It is never too old to learn!"))  # 100
print(fuzz.partial_ratio("no cross, no crown.", "No cross, no crown."))  # 95: partial_ratio is case-sensitive
# Token sort ratio: ignore word order
print(fuzz.ratio("Medusa Sorcerer Blog", "Blog Medusa Sorcerer"))  # 75
print(fuzz.token_sort_ratio("Medusa Sorcerer Blog", "Blog Medusa Sorcerer"))  # 100
# Token set ratio: ignore duplicated words and match on the shared subset
print(fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 84
print(fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100
# process.extract returns a list of (choice, score) tuples, best match first
choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(process.extract("new york jets", choices, limit=2))  # [('New York Jets', 100), ('New York Giants', 79)]
print(process.extract("new york jets", choices, limit=4))  # [('New York Jets', 100), ('New York Giants', 79), ('Atlanta Falcons', 29), ('Dallas Cowboys', 22)]
print(process.extract("new york jets", choices))  # default limit is 5, so all four are returned: [('New York Jets', 100), ('New York Giants', 79), ('Atlanta Falcons', 29), ('Dallas Cowboys', 22)]
# process.extractOne returns only the best match as a (choice, score) tuple
print(process.extractOne("cowboys", choices))  # ('Dallas Cowboys', 90)
# Pass scorer= to choose a different scoring function, e.g. when matching file paths
songs = ["System of a down", "fly", "I am"]
print(process.extractOne("System of a down - Hypnotize - Heroin", songs))  # ('System of a down', 90)
print(process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio))  # ('System of a down', 65)
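To show roughly what the partial and token-based scorers do, here is a simplified pure-Python sketch built on difflib. It approximates the ideas behind partial_ratio, token_sort_ratio and token_set_ratio; it is not FuzzyWuzzy's source. The helper names (`partial`, `token_sort`, `token_set`) are hypothetical. The preprocessing is assumed to be lowercasing plus replacing non-word characters with spaces. For the example inputs used above, it happens to give the same scores.

```python
import re
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> int:
    """Plain 0-100 ratio based on difflib."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def _tokens(s: str) -> list:
    """Assumed preprocessing: lowercase and split on non-word characters."""
    return re.sub(r"\W+", " ", s).lower().split()


def partial(s1: str, s2: str) -> int:
    """Best ratio of the shorter string against equal-length windows of the longer."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    best = 0.0
    for i, j, _ in SequenceMatcher(None, shorter, longer).get_matching_blocks():
        # Align a window of len(shorter) in `longer` with this matching block
        start = max(0, j - i)
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return int(round(100 * best))


def token_sort(s1: str, s2: str) -> int:
    """Sort the words first, so word order no longer matters."""
    return _ratio(" ".join(sorted(_tokens(s1))), " ".join(sorted(_tokens(s2))))


def token_set(s1: str, s2: str) -> int:
    """Compare the shared words with each side's full (deduplicated) word set."""
    t1, t2 = set(_tokens(s1)), set(_tokens(s2))
    common = " ".join(sorted(t1 & t2))
    side1 = (common + " " + " ".join(sorted(t1 - t2))).strip()
    side2 = (common + " " + " ".join(sorted(t2 - t1))).strip()
    return max(_ratio(common, side1), _ratio(common, side2), _ratio(side1, side2))


print(partial("no cross, no crown.", "No cross, no crown."))      # 95
print(token_sort("Medusa Sorcerer Blog", "Blog Medusa Sorcerer"))  # 100
print(token_sort("fuzzy was a bear", "fuzzy fuzzy was a bear"))    # 84
print(token_set("fuzzy was a bear", "fuzzy fuzzy was a bear"))     # 100
```

The set-based version scores 100 for the "fuzzy" pair because deduplication makes both word sets identical. The sort-based version still sees the extra "fuzzy" and scores 84.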
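Conceptually, process.extract and process.extractOne score every choice against the query and keep the best results. The following is a minimal sketch of that idea, not the library's implementation. The names `extract`, `extract_one` and `simple_scorer` are hypothetical. The scorer here is assumed to be a lowercase difflib ratio rather than FuzzyWuzzy's default scorer, so its scores can differ from those shown above (for example, 67 rather than 90 for "cowboys").

```python
import heapq
from difflib import SequenceMatcher


def simple_scorer(query: str, choice: str) -> int:
    """Assumed scorer: plain difflib ratio (0-100)."""
    return int(round(100 * SequenceMatcher(None, query, choice).ratio()))


def extract(query, choices, scorer=simple_scorer, processor=str.lower, limit=5):
    """Score every choice and return the best `limit` (choice, score) pairs."""
    q = processor(query)
    scored = ((choice, scorer(q, processor(choice))) for choice in choices)
    # nlargest behaves like sorted(..., reverse=True)[:limit], so ties keep input order
    return heapq.nlargest(limit, scored, key=lambda pair: pair[1])


def extract_one(query, choices, scorer=simple_scorer, processor=str.lower):
    """Return only the best (choice, score) pair, or None if there are no choices."""
    best = extract(query, choices, scorer=scorer, processor=processor, limit=1)
    return best[0] if best else None


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract("new york jets", choices, limit=2))  # [('New York Jets', 100), ('New York Giants', 79)]
print(extract_one("cowboys", choices))             # ('Dallas Cowboys', 67)
```

Making the scorer a parameter mirrors the scorer= argument in the songs example, so a token-based scorer could be swapped in without changing the ranking logic.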
Known ports
- Java: Xpresso’s FuzzyWuzzy Implementation
- Java: FuzzyWuzzy (Java Port)
- Rust: FuzzyRusty (Rust Port)
- JavaScript: fuzzball.js (JavaScript port)
- C++: Tmplt/fuzzywuzzy
- C#: fuzzysharp
- Go: Go-Fuzzywuzz (Go port)
- Free Pascal: FuzzyWuzzy.pas (Free Pascal port)
- Kotlin multiplatform: FuzzyWuzzy-Kotlin
- R: fuzzywuzzyR (R port)