Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams

Authors

  • Esteban Rodríguez-Betancourt, Universidad de Costa Rica, Costa Rica
  • Edgar Casasola-Murillo, Universidad de Costa Rica, Costa Rica

Keywords:

Databases, Indexes, Natural Language Processing, Word Embeddings

Abstract

With the growing use of vector embeddings in areas like natural language processing and recommendation systems, effective storage and retrieval methods are increasingly important. However, deploying specialized databases for vector indexing can be challenging due to resource limitations or operational constraints. This paper introduces a novel approach that utilizes existing trigram indexes within SQL databases to efficiently manage vector embeddings. By adapting traditional relational databases to handle high-dimensional data, organizations can use their existing infrastructure without the need to invest in new database systems. This method reduces the management complexity and costs associated with maintaining separate systems for vector data. We outline the process of converting vector embeddings for trigram indexing and evaluate the performance and recall through empirical analysis. This paper aims to offer a practical solution for researchers and practitioners seeking to integrate advanced vector-based queries into their current database systems, thereby enhancing the functionality and accessibility of vector embeddings in mainstream applications.

Published

2024-09-19

Section

ASAID - Argentine Symposium on Artificial Intelligence and Data Science

How to Cite

Rodríguez-Betancourt, E., & Casasola-Murillo, E. (2024). Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams. JAIIO, Jornadas Argentinas De Informática, 10(1), 150-157. https://revistas.unlp.edu.ar/JAIIO/article/view/17913