Hi Jannis,

You might want to try asking this question of ai4lam.org <https://sites.google.com/view/ai4lam> too:

Google Group: https://groups.google.com/forum/#!forum/ai4lam
Slack: https://join.slack.com/t/ai4lam/shared_invite/zt-1omthldn8-9vrGySjIRdija1nKQm0ltA

Data for training and testing models is a frequent topic there.

Best,
- Tom

From: Code for Libraries <CODE4LIB_at_LISTS.CLIR.ORG> on behalf of Ohms, Jannis <j.ohms_at_TU-BRAUNSCHWEIG.DE>
Date: Friday, December 6, 2024 at 4:59 AM
To: CODE4LIB_at_LISTS.CLIR.ORG <CODE4LIB_at_LISTS.CLIR.ORG>
Subject: [CODE4LIB] Are there datasets to evaluate the quality of a document embedding

Dear all,

I am currently developing a RAG (https://en.wikipedia.org/wiki/Retrieval-augmented_generation) application. Are there datasets to evaluate or test the retrieval quality of my embedding model?

Thanks,
Jannis Ohms

Jannis Ohms
Technische Universität Braunschweig
Universitätsbibliothek | University Library
Abt.: IT und Forschungsnahe Services | Dep.: IT and Research Support Services
Universitätsplatz 1, R212
38106 Braunschweig
Germany
Phone: +49 531 391 5027
j.ohms_at_tu-braunschweig.de
www.tu-braunschweig.de/ub/ <http://www.tu-braunschweig.de/ub/>

Received on Mon Dec 09 2024 - 13:08:57 EST