Re: distance measures in vector similarity search

From: Eric Lease Morgan <00000107b9c961ae-dmarc-request_at_nyob>
Date: Tue, 20 May 2025 11:41:26 -0400
To: CODE4LIB_at_LISTS.CLIR.ORG

On May 16, 2025, at 9:23 AM, Eric Lease Morgan <emorgan_at_nd.edu> wrote:

> What distance measure do you suggest I use when implementing vector similarity search?
> 
> I have piles o' sentences. Almost more than I can count, literally. I have successfully looped through subsets of these sentences, vectorized them (think "indexed"), and stored the result in a Postgres database through the use of an extension called pgvector...
> 
> [1] https://github.com/pgvector/pgvector
> [2] https://medium.com/advanced-deep-learning/understanding-vector-similarity-b9c10f7506de

I have finished my investigations into vectorizing sentences, saving the results to a Postgres database, and querying the results. But alas, the suite of scripts linked below [1], while very functional, is incomplete and poorly documented, because I have since learned how to do all of the same things and more with SQLite and an SQLite module called sqlite_vec. [2, 3]

That said, if your computing stack requires Postgres, then the linked zip file [1] may speed up your investigations.
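
For anyone new to the idea, the overall workflow (store vectors, then rank them by a distance measure) can be sketched with nothing but Python's standard library. This is an illustration only. It is not the code in the zip file, and it does not use pgvector's or sqlite_vec's API. The three-dimensional vectors are made up, real sentence embeddings have hundreds of dimensions, and the cosine_distance function, the table, and the column names are all hypothetical. Cosine distance is used here simply as one common choice of measure:

```python
import json
import math
import sqlite3


def cosine_distance(a_json, b_json):
    """Return 1 - cosine similarity for two JSON-encoded vectors."""
    a = json.loads(a_json)
    b = json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def build_index(sentences):
    """Store (sentence, vector) pairs in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical helper: expose the Python function to SQL.
    conn.create_function("cosine_distance", 2, cosine_distance)
    conn.execute(
        "CREATE TABLE sentences "
        "(id INTEGER PRIMARY KEY, sentence TEXT, embedding TEXT)"
    )
    conn.executemany(
        "INSERT INTO sentences (sentence, embedding) VALUES (?, ?)",
        [(s, json.dumps(v)) for s, v in sentences],
    )
    return conn


def search(conn, query_vector, k=2):
    """Return the k sentences nearest to query_vector (brute-force scan)."""
    rows = conn.execute(
        "SELECT sentence FROM sentences "
        "ORDER BY cosine_distance(embedding, ?) LIMIT ?",
        (json.dumps(query_vector), k),
    )
    return [row[0] for row in rows]


# Made-up toy vectors standing in for real sentence embeddings.
TOY = [
    ("Cats purr.", [0.9, 0.1, 0.0]),
    ("Dogs bark.", [0.8, 0.3, 0.1]),
    ("Stocks fell.", [0.0, 0.2, 0.9]),
]

conn = build_index(TOY)
results = search(conn, [1.0, 0.0, 0.0], k=2)
print(results)
```

The sketch scans every row on each query, which is fine for a toy but is exactly the work that extensions such as pgvector and sqlite_vec are designed to do for you at scale.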

[1] temporarily available suite of Python scripts - https://distantreader.org/tmp/vectors2postgres.zip
[2] sqlite_vec home - https://github.com/asg017/sqlite-vec
[3] sqlite_vec documentation - https://alexgarcia.xyz/sqlite-vec/installation.html

--
Eric Morgan
University of Notre Dame