Introduction

FuzzyWuzzy is a third-party Python library for fuzzy string matching. It measures how different two strings are using the Levenshtein distance.
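The Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. Here is a minimal pure-Python sketch of it, for illustration only. The function name `levenshtein` is hypothetical and not part of FuzzyWuzzy's API; FuzzyWuzzy itself reports similarity scores from 0 to 100 rather than raw edit counts.

```python
def levenshtein(a: str, b: str) -> int:
    """Hypothetical helper (not FuzzyWuzzy API): classic dynamic-programming
    edit distance counting insertions, deletions and substitutions."""
    # Keep only the previous row of the DP table, so memory is O(len(b)).
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)  # 0 if chars match
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("no cross, no crown.", "No cross, no crown."))  # 1
```

Only one row of the table is kept because each cell depends solely on the row above and the cell to its left.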

GitHub project address

Installation

  • Python 2.7+
  • difflib (Python standard library)
  • python-Levenshtein (optional): speeds up string matching by 4-10x, but may produce different matching results in certain cases
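difflib ships with Python, and FuzzyWuzzy falls back to its `SequenceMatcher` when python-Levenshtein is not installed. A minimal sketch of that fallback score, assuming it is 100 × `SequenceMatcher.ratio()` rounded to an integer (the name `simple_ratio` is hypothetical). Under that assumption it reproduces two of the `fuzz.ratio` values in the usage examples later in this post.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Hypothetical helper: 0-100 similarity built on difflib.

    Assumption: this mirrors FuzzyWuzzy's pure-Python fuzz.ratio,
    i.e. round(100 * SequenceMatcher.ratio()).
    """
    if not s1 or not s2:
        return 0  # treat empty input as "no similarity"
    matcher = SequenceMatcher(None, s1, s2)
    # ratio() = 2 * (matched characters) / (len(s1) + len(s2))
    return int(round(100 * matcher.ratio()))


print(simple_ratio("this is a medusa blog", "this is a blog!"))  # 78
print(simple_ratio("Medusa Sorcerer Blog", "Blog Medusa Sorcerer"))  # 75
```

In the first pair, "this is a " and "blog" match (14 characters), so the score is 2 × 14 / 36 ≈ 0.78.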

Install it with pip:

python -m pip install fuzzywuzzy

Or, to install it together with the optional python-Levenshtein library:

python -m pip install fuzzywuzzy[speedup]

Without python-Levenshtein, importing fuzzywuzzy prints this warning:

UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning

Installing python-Levenshtein removes the warning:

python -m pip install python-Levenshtein
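To check which backend you will get before importing fuzzywuzzy, you can test whether the `Levenshtein` module (the import name of the python-Levenshtein package) is importable. A small sketch, assuming the warning above appears exactly when that module is missing; the function name `has_levenshtein_speedup` is hypothetical.

```python
import importlib.util


def has_levenshtein_speedup() -> bool:
    """Hypothetical helper: True if python-Levenshtein can be imported.

    The PyPI package "python-Levenshtein" is imported as "Levenshtein".
    """
    return importlib.util.find_spec("Levenshtein") is not None


if has_levenshtein_speedup():
    print("python-Levenshtein found: fast matching, no warning expected")
else:
    print("python-Levenshtein missing: difflib fallback and UserWarning expected")
```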

Tools used for testing

  • pycodestyle
  • hypothesis
  • pytest

Basic usage

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from fuzzywuzzy import fuzz, process


# Simple ratio: overall similarity of the two strings
print(fuzz.ratio("this is a medusa blog", "this is a blog!"))  # 78

# Partial ratio: best match of the shorter string against substrings of the longer one
print(fuzz.partial_ratio("this is a test", "this is a test!"))  # 100
print(fuzz.partial_ratio("It is never too old to learn", "It is never too old to learn!"))  # 100
print(fuzz.partial_ratio("no cross, no crown.", "No cross, no crown."))  # 95

# Token sort ratio: ignore word order
print(fuzz.ratio("Medusa Sorcerer Blog", "Blog Medusa Sorcerer"))  # 75
print(fuzz.token_sort_ratio("Medusa Sorcerer Blog", "Blog Medusa Sorcerer"))  # 100

# Token set ratio: ignore repeated words and compare the sets of words
print(fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 84
print(fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100

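# process.extract returns a list of (choice, score) tuples, best match first: [("string", score), ...]
# limit caps the number of results (default 5)
choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]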
print(process.extract("new york jets", choices, limit=2))  # [('New York Jets', 100), ('New York Giants', 79)]
print(process.extract("new york jets", choices, limit=4))  # [('New York Jets', 100), ('New York Giants', 79), ('Atlanta Falcons', 29), ('Dallas Cowboys', 22)]
print(process.extract("new york jets", choices))  # [('New York Jets', 100), ('New York Giants', 79), ('Atlanta Falcons', 29), ('Dallas Cowboys', 22)]

# process.extractOne returns only the best match, as a (choice, score) tuple
print(process.extractOne("cowboys", choices))  # ('Dallas Cowboys', 90)

# extractOne accepts extra parameters such as scorer to change how matches are scored,
# which helps in cases like matching file paths
songs = ["System of a down", "fly", "I am"]
print(process.extractOne("System of a down - Hypnotize - Heroin", songs))  # ('System of a down', 90)
print(process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio))  # ('System of a down', 65)
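To see roughly what the fuzz scorers above do, here is a simplified sketch built on difflib alone. The helpers `_ratio`, `_full_process`, `partial_ratio`, `token_sort_ratio` and `token_set_ratio` are hypothetical re-implementations, not FuzzyWuzzy's code; they skip details such as the library's ASCII forcing, so scores on other inputs may differ. On the example strings used above they reproduce the commented values.

```python
import re
from difflib import SequenceMatcher


def _ratio(s1: str, s2: str) -> int:
    # Hypothetical: round(100 * SequenceMatcher.ratio()); empty input scores 0.
    if not s1 or not s2:
        return 0
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def _full_process(s: str) -> str:
    # Hypothetical: non-word characters become spaces, then lowercase and trim.
    return re.sub(r"\W", " ", s).lower().strip()


def partial_ratio(s1: str, s2: str) -> int:
    # Align the shorter string with each matching block of the longer one,
    # score that same-length window, and keep the best window score.
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    if not shorter:
        return 0
    best = 0.0
    blocks = SequenceMatcher(None, shorter, longer).get_matching_blocks()
    for a_start, b_start, _size in blocks:
        start = max(b_start - a_start, 0)
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return int(round(100 * best))


def token_sort_ratio(s1: str, s2: str) -> int:
    # Sort the words first, so word order no longer matters.
    sorted1 = " ".join(sorted(_full_process(s1).split()))
    sorted2 = " ".join(sorted(_full_process(s2).split()))
    return _ratio(sorted1, sorted2)


def token_set_ratio(s1: str, s2: str) -> int:
    # Work with word sets (duplicates collapse), then compare the shared
    # words against "shared + extra words" from each side.
    t1 = set(_full_process(s1).split())
    t2 = set(_full_process(s2).split())
    common = " ".join(sorted(t1 & t2))
    combined1 = (common + " " + " ".join(sorted(t1 - t2))).strip()
    combined2 = (common + " " + " ".join(sorted(t2 - t1))).strip()
    return max(_ratio(common, combined1),
               _ratio(common, combined2),
               _ratio(combined1, combined2))


print(partial_ratio("no cross, no crown.", "No cross, no crown."))  # 95
print(token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 84
print(token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100
```

The token set version scores 100 for the "fuzzy bear" pair because, once duplicates collapse, both strings have the same word set.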

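The process helpers boil down to two steps: score every choice against the query, then keep the best ones. A minimal sketch of that idea follows. `extract`, `extract_one` and `score` are hypothetical names, and the stand-in scorer is a plain lowercased ratio rather than FuzzyWuzzy's default scorer, so some scores differ from the comments above (67 instead of 90 for "cowboys").

```python
from difflib import SequenceMatcher
from heapq import nlargest


def score(query: str, choice: str) -> int:
    # Hypothetical stand-in scorer: lowercase both sides, then
    # round(100 * SequenceMatcher.ratio()).
    a, b = query.lower().strip(), choice.lower().strip()
    if not a or not b:
        return 0
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def extract(query, choices, scorer=score, limit=5):
    """Hypothetical: up to `limit` (choice, score) tuples, best score first."""
    scored = ((choice, scorer(query, choice)) for choice in choices)
    return nlargest(limit, scored, key=lambda pair: pair[1])


def extract_one(query, choices, scorer=score):
    """Hypothetical: the single best (choice, score) tuple, or None."""
    best = extract(query, choices, scorer=scorer, limit=1)
    return best[0] if best else None


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract("new york jets", choices, limit=2))  # [('New York Jets', 100), ('New York Giants', 79)]
print(extract_one("cowboys", choices))  # ('Dallas Cowboys', 67)
```

Passing a different function as `scorer` changes the ranking without touching the selection logic, the same idea as the `scorer=` argument in the example above.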
Known ports

  • Java: Xpresso’s FuzzyWuzzy Implementation
  • Java: FuzzyWuzzy (Java port)
  • Rust: FuzzyRusty (Rust port)
  • JavaScript: fuzzball.js (JavaScript port)
  • C++: Tmplt/fuzzywuzzy
  • C#: FuzzySharp
  • Go: Go-Fuzzywuzz (Go port)
  • Free Pascal: FuzzyWuzzy.pas (Free Pascal port)
  • Kotlin multiplatform: FuzzyWuzzy-Kotlin
  • R: fuzzywuzzyR (R port)