Using SequenceMatcher for String and Sequence Similarity in Python
- Author: hwahyeon
difflib.SequenceMatcher is part of Python's standard library and lets you calculate the similarity between two strings. Its ratio() method returns a score from 0.0 (nothing in common) to 1.0 (identical). Since difflib ships with Python, no additional installation is required.
Examples
1. Comparing identical strings
from difflib import SequenceMatcher
print(SequenceMatcher(None, 'hello', 'hello').ratio()) # 1.0
→ The similarity between two identical strings is 1.0.
2. Comparing partially similar strings
from difflib import SequenceMatcher
print(SequenceMatcher(None, 'hello', 'hallo').ratio()) # 0.8
→ These strings differ in only one character: four of the five characters in each string match, so the similarity score is 0.8.
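To see where the 0.8 comes from, the score can be reproduced by hand. The difflib documentation defines ratio() as 2.0 * M / T, where M is the number of matched elements and T is the combined length of both sequences. The sketch below is only an illustration; the variable names (matched, total, manual_ratio) are made up for this example.

```python
from difflib import SequenceMatcher

a, b = 'hello', 'hallo'
matcher = SequenceMatcher(None, a, b)

# get_matching_blocks() returns Match(a, b, size) triples; the last one is a
# zero-length sentinel, so it adds nothing to the total.
blocks = matcher.get_matching_blocks()
matched = sum(block.size for block in blocks)  # M: number of matched characters
total = len(a) + len(b)                        # T: combined length of both strings

manual_ratio = 2.0 * matched / total
print(matched, total, manual_ratio)     # 4 10 0.8
print(manual_ratio == matcher.ratio())  # True
```

Here "h" and "llo" are the matched pieces, which gives M = 4 and T = 10.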
3. Comparing completely different strings
from difflib import SequenceMatcher
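print(SequenceMatcher(None, 'hello', 'world').ratio()) # 0.2
→ These words look completely different, but SequenceMatcher still pairs up one shared character ("l"), so the score is 2 * 1 / 10 = 0.2. A score of 0.0 occurs only when the two strings have no characters in common.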
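When a score is not what you expect, it can help to look at which pieces were actually matched. The helper below, matched_parts, is a hypothetical name introduced for this sketch, and 'abc' and 'xyz' are made-up inputs that share no characters at all.

```python
from difflib import SequenceMatcher

def matched_parts(a, b):
    """Return the substrings of `a` that SequenceMatcher paired with `b`."""
    matcher = SequenceMatcher(None, a, b)
    # Skip the zero-length sentinel block that always ends the list.
    return [a[m.a:m.a + m.size] for m in matcher.get_matching_blocks() if m.size]

print(matched_parts('abc', 'xyz'))                  # []
print(SequenceMatcher(None, 'abc', 'xyz').ratio())  # 0.0
```

An empty list means nothing was matched, which is the only case where ratio() returns 0.0.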
4. Comparing other sequence types
SequenceMatcher can also compare other sequences such as lists or tuples, as long as their elements are hashable.
Example: Lists
from difflib import SequenceMatcher
print(SequenceMatcher(None, [1, 2, 3, 4], [1, 3, 4, 5]).ratio()) # 0.75
→ Three elements (1, 3, and 4) match in order, so the similarity between these lists is 2 * 3 / 8 = 0.75.
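Because list elements are compared as whole units, one possible use is to compare sentences word by word instead of character by character. The sentences below are made-up inputs for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical example sentences, split into lists of words.
words_a = 'the quick brown fox'.split()
words_b = 'the quick red fox'.split()

# Each word is one element, so "brown" vs "red" counts as a single mismatch.
word_ratio = SequenceMatcher(None, words_a, words_b).ratio()
print(word_ratio)  # 0.75
```

Three of the four words match ("the", "quick", "fox"), so the score is 2 * 3 / 8 = 0.75. Comparing the raw strings would instead score the sentences character by character.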
Example: Tuples
from difflib import SequenceMatcher
print(SequenceMatcher(None, ('a', 'b', 'c', 'd'), ('a', 'x', 'y', 'd')).ratio()) # 0.5
→ The similarity score between these tuples is 0.5, as two elements ("a" and "d") match.
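To see which elements matched and which did not, get_opcodes() describes how to turn the first sequence into the second. The following is a minimal sketch using the same tuples; the variable names are chosen only for this example.

```python
from difflib import SequenceMatcher

a = ('a', 'b', 'c', 'd')
b = ('a', 'x', 'y', 'd')

# Each opcode is (tag, i1, i2, j1, j2): a[i1:i2] relates to b[j1:j2] by `tag`.
opcodes = SequenceMatcher(None, a, b).get_opcodes()
for tag, i1, i2, j1, j2 in opcodes:
    print(tag, a[i1:i2], b[j1:j2])
# equal ('a',) ('a',)
# replace ('b', 'c') ('x', 'y')
# equal ('d',) ('d',)
```

The two "equal" entries cover "a" and "d", which are the two matches behind the 0.5 score.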