Re: Are there datasets to evaluate the quality of a document embedding

From: Tom Cramer <tcramer_at_nyob>
Date: Mon, 9 Dec 2024 18:10:17 +0000
To: CODE4LIB_at_LISTS.CLIR.ORG

Hi Jannis,

You might want to try asking this question of ai4lam.org (https://sites.google.com/view/ai4lam) too:

Google Group: https://groups.google.com/forum/#!forum/ai4lam

Slack: https://join.slack.com/t/ai4lam/shared_invite/zt-1omthldn8-9vrGySjIRdija1nKQm0ltA

Data for training and testing models is a frequent topic there.

Best,

- Tom

From: Code for Libraries <CODE4LIB_at_LISTS.CLIR.ORG> on behalf of Ohms, Jannis <j.ohms_at_TU-BRAUNSCHWEIG.DE>
Date: Friday, December 6, 2024 at 4:59 AM
To: CODE4LIB_at_LISTS.CLIR.ORG <CODE4LIB_at_LISTS.CLIR.ORG>
Subject: [CODE4LIB] Are there datasets to evaluate the quality of a document embedding

Dear all,

I am currently developing a RAG (https://en.wikipedia.org/wiki/Retrieval-augmented_generation) application.

Are there datasets to evaluate or test the retrieval quality of my embedding model?

Thanks

Jannis Ohms
Technische Universität Braunschweig
Universitätsbibliothek | University Library
Abt.: IT und Forschungsnahe Services | Dep.: IT and Research Support Services

Universitätsplatz 1, R212
38106 Braunschweig
Germany

Phone: +49 531 391 5027

j.ohms_at_tu-braunschweig.de
www.tu-braunschweig.de/ub/