String fuzzy matching library - FuzzyWuzzy

Introduction

FuzzyWuzzy is a third-party library for fuzzy string matching. It scores how similar two given strings are, based on Levenshtein distance.
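To make the idea of Levenshtein distance concrete, here is a minimal sketch of the classic dynamic-programming computation. It is an illustration only and not FuzzyWuzzy's own code: the function name levenshtein is hypothetical, and the library reports a 0-100 similarity score rather than a raw edit distance.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string costs i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

The smaller the distance relative to the string lengths, the more similar the two strings are, which is the intuition behind the scores shown later in this post.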

GitHub project address

Environment and installation

Python 2.7 or higher
difflib
python-Levenshtein (optional): provides a 4-10x speedup in string matching, but may produce different matching results under certain conditions

Install with pip:

python -m pip install fuzzywuzzy

Or, to install the python-Levenshtein library at the same time:

python -m pip install fuzzywuzzy[speedup]

If python-Levenshtein is not installed, importing the library prints this warning at run time:

UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning

Install the library to remove the warning:

python -m pip install python-Levenshtein
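One way to check whether the optional speedup is present in the current environment is sketched below, using only the standard library. It assumes that python-Levenshtein installs under the import name Levenshtein. The helper name has_speedup is hypothetical and not part of FuzzyWuzzy.

```python
import importlib.util


def has_speedup():
    """Return True if a module named Levenshtein can be imported.

    Assumption: python-Levenshtein is importable as 'Levenshtein'.
    """
    return importlib.util.find_spec("Levenshtein") is not None


if has_speedup():
    print("python-Levenshtein found")
else:
    print("python-Levenshtein not found; the pure-Python matcher will be used")
```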

Supported testing tools

pycodestyle
hypothesis
pytest

Basic usage

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from fuzzywuzzy import fuzz, process


# Simple ratio
print(fuzz.ratio("this is a medusa blog", "this is a blog!"))  # 78

# Partial ratio (non-exact match)
print(fuzz.partial_ratio("this is a test", "this is a test!"))  # 100
print(fuzz.partial_ratio("It is never too old to learn", "It is never too old to learn!"))  # 100
print(fuzz.partial_ratio("no cross, no crown.", "No cross, no crown."))  # 95

# Token sort ratio: ignores word order
print(fuzz.ratio("Medusa Sorcerer Blog", "Blog Medusa Sorcerer"))  # 75
print(fuzz.token_sort_ratio("Medusa Sorcerer Blog", "Blog Medusa Sorcerer"))  # 100

# Token set ratio: ignores duplicated words
print(fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 84
print(fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100

# process.extract returns a list of (string, score) tuples: [("string", score), ...]
choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(process.extract("new york jets", choices, limit=2))  # [('New York Jets', 100), ('New York Giants', 79)]
print(process.extract("new york jets", choices, limit=4))  # [('New York Jets', 100), ('New York Giants', 79), ('Atlanta Falcons', 29), ('Dallas Cowboys', 22)]
print(process.extract("new york jets", choices))  # [('New York Jets', 100), ('New York Giants', 79), ('Atlanta Falcons', 29), ('Dallas Cowboys', 22)]

# process.extractOne returns only the best match, as a (string, score) tuple
print(process.extractOne("cowboys", choices))  # ('Dallas Cowboys', 90)

# process.extractOne accepts an optional scorer that changes how matches are scored, e.g. when matching file paths
songs = ["System of a down", "fly", "I am"]
print(process.extractOne("System of a down - Hypnotize - Heroin", songs))  # ('System of a down', 90)
print(process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio))  # ('System of a down', 65)
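To show why sorting tokens makes the score independent of word order, here is a rough standard-library sketch built on difflib.SequenceMatcher. It only approximates the idea behind fuzz.ratio, fuzz.token_sort_ratio, and process.extract. The names simple_ratio, sorted_tokens, token_sort_sketch, and rank_choices are hypothetical, and the real library also preprocesses its input (for example, punctuation handling) and uses a different default scorer in process.extract, so its scores can differ from this sketch.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Similarity of two strings as an integer from 0 to 100."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def sorted_tokens(text):
    """Lowercase the text, split it on whitespace, and sort the tokens."""
    return " ".join(sorted(text.lower().split()))


def token_sort_sketch(a, b):
    """Compare two strings after sorting their tokens, so word order is ignored."""
    return simple_ratio(sorted_tokens(a), sorted_tokens(b))


def rank_choices(query, choices, limit=5):
    """Score every choice against the query and return the best (choice, score) pairs."""
    scored = [(choice, token_sort_sketch(query, choice)) for choice in choices]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


teams = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(simple_ratio("Medusa Sorcerer Blog", "Blog Medusa Sorcerer"))
print(token_sort_sketch("Medusa Sorcerer Blog", "Blog Medusa Sorcerer"))
print(rank_choices("new york jets", teams, limit=2))
```

Once both strings are reduced to the same sorted token sequence, the plain ratio sees identical input, which is why the reordered blog title scores as a perfect match.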

Known ports

Java: Xpresso’s FuzzyWuzzy Implementation
Java: FuzzyWuzzy (Java Port)
Rust: FuzzyRusty (Rust Port)
JavaScript: fuzzball.js (JavaScript port)
C++: Tmplt/fuzzywuzzy
C#: fuzzysharp
Go: go-fuzzywuzzy (Go port)
Free Pascal: FuzzyWuzzy.pas (Free Pascal port)
Kotlin multiplatform: FuzzyWuzzy-Kotlin
R: fuzzywuzzyR (R port)
