
STRING SIMILARITY MODS
Optimal Transport-based Alignment of Learned Character Representations for String Similarity

Derek Tam, Nicholas Monath, Ari Kobren, Aaron Traylor, Rajarshi Das, and Andrew McCallum

Anthology ID: P19-1592
Volume: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month: July
Year: 2019
Address: Florence, Italy
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 5907–5917
DOI: 10.18653/v1/P19-1592
Bibkey: tam-etal-2019-optimal

Cite (ACL): Derek Tam, Nicholas Monath, Ari Kobren, Aaron Traylor, Rajarshi Das, and Andrew McCallum. 2019. Optimal Transport-based Alignment of Learned Character Representations for String Similarity. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5907–5917, Florence, Italy. Association for Computational Linguistics.

Cite (Informal): Optimal Transport-based Alignment of Learned Character Representations for String Similarity (Tam et al., ACL 2019)

Abstract

String similarity models are vital for record linkage, entity resolution, and search. In this work, we present STANCE, a learned model for computing the similarity of two strings. Our approach encodes the characters of each string, aligns the encodings using Sinkhorn Iteration (alignment is posed as an instance of optimal transport) and scores the alignment with a convolutional neural network. We evaluate STANCE's ability to detect whether two strings can refer to the same entity, a task we term alias detection. We construct five new alias detection datasets (and make them publicly available). We show that STANCE (or one of its variants) outperforms both state-of-the-art and classic, parameter-free similarity models on four of the five datasets. We also demonstrate STANCE's ability to improve downstream tasks by applying it to an instance of cross-document coreference and show that it leads to a 2.8 point improvement in B³ F1 over the previous state-of-the-art approach.
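
The alignment step named in the abstract, Sinkhorn Iteration for optimal transport, can be illustrated with a small sketch. This is not the STANCE implementation: the function names `char_cost` and `sinkhorn_plan`, the 0/1 character-mismatch cost (a stand-in for distances between learned character encodings), the uniform marginals, and the values of `reg` and `n_iters` are all assumptions made for illustration. The convolutional scoring step is not shown.

```python
import math


def char_cost(x, y):
    """Hypothetical stand-in cost: 0 if two characters match, 1 otherwise.

    STANCE compares learned character encodings; a 0/1 mismatch cost is
    used here only to keep the example self-contained.
    """
    return [[0.0 if cx == cy else 1.0 for cy in y] for cx in x]


def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropy-regularized optimal transport with uniform marginals.

    cost[i][j] is the cost of matching item i of the first string to
    item j of the second. Returns a transport plan of the same shape
    whose columns sum to 1/m and whose rows sum to approximately 1/n.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # mass on each character of the first string
    b = [1.0 / m] * m  # mass on each character of the second string
    # Gibbs kernel: low cost gives a large kernel entry.
    K = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    plan = sinkhorn_plan(char_cost("ab", "ba"))
    for row in plan:
        print([round(p, 4) for p in row])
```

For the toy pair "ab" and "ba", nearly all of the transported mass lands on the matching characters (the off-diagonal entries of the plan), which is the kind of soft alignment a downstream scorer could consume. A smaller `reg` gives a sharper plan, at the cost of numerical stability.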
