Using SequenceMatcher for String and Sequence Similarity in Python

Authors
  • hwahyeon

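difflib.SequenceMatcher is a class in Python’s standard library that compares two sequences, most often strings, and reports how similar they are. Its ratio() method returns a score between 0.0 (nothing in common) and 1.0 (identical). The first argument is an optional "junk" filter; passing None, as in the examples below, means no elements are ignored. Because difflib ships with Python, no additional installation is required.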
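To make the score less of a black box, here is a minimal sketch of how ratio() can be reproduced by hand. The difflib documentation describes the result as 2.0*M/T, where M is the number of matched elements and T is the total number of elements in both sequences. The helper name manual_ratio is hypothetical and exists only for this illustration.

```python
from difflib import SequenceMatcher


def manual_ratio(a, b):
    """Recompute ratio() as 2*M/T from the matching blocks (illustrative helper)."""
    matcher = SequenceMatcher(None, a, b)  # None: no elements are treated as junk
    # Each block is (start in a, start in b, size); the final block is a size-0 sentinel.
    matches = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)
    # Two empty sequences are considered identical.
    return 2.0 * matches / total if total else 1.0


print(manual_ratio("hello", "hallo"))                   # 0.8
print(SequenceMatcher(None, "hello", "hallo").ratio())  # 0.8
```

In practice you would call ratio() directly; the helper only shows where the number comes from.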

Examples

1. Comparing identical strings

from difflib import SequenceMatcher
print(SequenceMatcher(None, 'hello', 'hello').ratio())  # 1.0

→ The similarity between two identical strings is 1.0.

2. Comparing partially similar strings

from difflib import SequenceMatcher
print(SequenceMatcher(None, 'hello', 'hallo').ratio())  # 0.8

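→ The strings differ in a single character. ratio() returns 2 × (matched elements) / (total elements in both sequences); four of the five characters match ("h" and "llo"), so the score is 2 × 4 / 10 = 0.8.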
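If you want to see where the two strings differ rather than just the score, get_opcodes() lists the edit operations that turn the first string into the second. The following is a small sketch using the same pair of strings; the variable names are chosen only for this example.

```python
from difflib import SequenceMatcher

a, b = "hello", "hallo"
matcher = SequenceMatcher(None, a, b)

# Each opcode is (tag, i1, i2, j1, j2): a[i1:i2] corresponds to b[j1:j2].
opcodes = matcher.get_opcodes()
for tag, i1, i2, j1, j2 in opcodes:
    print(tag, repr(a[i1:i2]), repr(b[j1:j2]))

# equal 'h' 'h'
# replace 'e' 'a'
# equal 'llo' 'llo'
```

The two "equal" spans cover four characters in total, which matches the four matched characters behind the 0.8 score.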

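3. Comparing dissimilar strings

from difflib import SequenceMatcher
print(SequenceMatcher(None, 'hello', 'world').ratio())  # 0.2

→ The score is low but not 0.0. The strings share one matched "l", giving 2 × 1 / 10 = 0.2 (the shared "o" cannot also be matched, because it falls on opposite sides of the "l" in the two strings). A score of 0.0 occurs only when the sequences have no elements in common, as with 'abc' and 'xyz'.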

4. Comparing other sequence types

SequenceMatcher can also compare other sequences such as lists or tuples, as long as their elements are hashable.

Example: Lists

from difflib import SequenceMatcher
print(SequenceMatcher(None, [1, 2, 3, 4], [1, 3, 4, 5]).ratio())  # 0.75

→ The similarity between these lists is 0.75, as three elements (1, 3, and 4) match in order: 2 × 3 / 8 = 0.75.
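Because list elements can be any hashable value, one common use is comparing text word by word instead of character by character. The sketch below splits two made-up sentences into word lists; the sentences and variable names are illustrative only.

```python
from difflib import SequenceMatcher

words_a = "the quick brown fox".split()
words_b = "the quick red fox".split()

# Each word is treated as one element, so "brown" vs "red" counts as a single mismatch.
word_ratio = SequenceMatcher(None, words_a, words_b).ratio()
print(word_ratio)  # 0.75
```

Three of the four words match ("the", "quick", "fox"), so the score is 2 × 3 / 8 = 0.75.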

Example: Tuples

from difflib import SequenceMatcher
print(SequenceMatcher(None, ('a', 'b', 'c', 'd'), ('a', 'x', 'y', 'd')).ratio())  # 0.5

→ The similarity score between these tuples is 0.5, as two elements ("a" and "d") match.
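To check which elements were actually matched, you can inspect get_matching_blocks(). This is a minimal sketch for the tuples above; the list name matched is just for this example.

```python
from difflib import SequenceMatcher

a = ('a', 'b', 'c', 'd')
b = ('a', 'x', 'y', 'd')
matcher = SequenceMatcher(None, a, b)

# Each block has fields a (start in a), b (start in b), and size.
# The final block always has size 0 and is skipped here.
matched = [a[m.a:m.a + m.size] for m in matcher.get_matching_blocks() if m.size]
print(matched)  # [('a',), ('d',)]
```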