
Analiticcl - efficient fuzzy string matching for spelling & post-OCR correction


Analiticcl is an approximate string matching (fuzzy-matching) system that can be used for spelling correction or text normalisation, such as post-OCR or post-HTR correction. Texts are checked against a validated or corpus-derived lexicon (with or without frequency information), and the system returns spelling variants for the words it finds.

The distinguishing feature of the system is its use of anagram hashing to drastically reduce the search space and make quick lookups possible even over larger edit distances. The underlying idea is largely derived from prior work on TICCL (Reynaert 2010; Reynaert 2004).
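To make the idea of anagram hashing concrete, here is a minimal Python sketch. It assumes a prime-product scheme over lowercase a-z; the names `CHAR_TO_PRIME`, `anagram_value`, `build_index` and `deletion_candidates` are hypothetical and are not Analiticcl's API, and the actual implementation may differ in its details. Every word is reduced to a number that is identical for all of its anagrams, so the lexicon can be indexed by that number, and removing a character becomes a single division.

```python
from collections import defaultdict

# Illustrative assumption: one distinct prime per lowercase letter a-z.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
CHAR_TO_PRIME = {chr(ord("a") + i): p for i, p in enumerate(PRIMES)}


def anagram_value(word):
    """Product of the primes of all characters; character order is irrelevant."""
    value = 1
    for ch in word:
        value *= CHAR_TO_PRIME[ch]
    return value


def build_index(lexicon):
    """Group lexicon words by anagram value."""
    index = defaultdict(list)
    for word in lexicon:
        index[anagram_value(word)].append(word)
    return index


def deletion_candidates(word, index):
    """Lexicon words whose characters equal those of `word` minus one character."""
    value = anagram_value(word)
    found = set()
    for ch in set(word):
        # Dividing by a character's prime removes one occurrence of it.
        found.update(index.get(value // CHAR_TO_PRIME[ch], []))
    return found


index = build_index(["silent", "enlist", "house", "list"])
same_letters = sorted(index[anagram_value("listen")])
shorter = deletion_candidates("lists", index)
```

A lookup like this only narrows the search to words with a compatible set of characters. The candidates would still need to be checked and ranked with an order-sensitive measure such as edit distance, since anagrams of a word are not necessarily close to it.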

This work was done at the KNAW Humanities Cluster in 2022 in the scope of the Golden Agents project.
