Using AutoGluon-RAG to generate embeddings

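from agrag.agrag import AutoGluonRAG

agrag = AutoGluonRAG(
    data_dir="path/to/data",
    preset_quality="medium_quality",  # or path to config file
)
# Initialize only the modules needed for embedding generation.
agrag.initialize_data_module()
agrag.initialize_embedding_module()

# Process the raw documents, then embed the resulting chunks.
processed_data = agrag.process_data()
embeddings = agrag.generate_embeddings(processed_data=processed_data)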

Here, instead of calling initialize_rag_pipeline to initialize the entire pipeline, we initialize only the data and embedding modules, which is all that embedding generation requires. generate_embeddings returns a pandas DataFrame with the following columns: "doc_id", "chunk_id", "text", "embedding", "all_embeddings_hidden_dim".
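As a quick sanity check on the returned DataFrame, something like the sketch below may be useful. It runs on a small stand-in DataFrame with made-up values rather than on real AutoGluon-RAG output. The helper name check_embedding_frame is hypothetical, and the values placed in "all_embeddings_hidden_dim" are placeholders, since the contents of that column are not described here.

```python
import pandas as pd

# Column names as listed in the text above.
EXPECTED_COLUMNS = ["doc_id", "chunk_id", "text", "embedding", "all_embeddings_hidden_dim"]


def check_embedding_frame(df: pd.DataFrame) -> dict:
    """Verify the documented columns exist and count chunks per document.

    Hypothetical helper for illustration; not part of AutoGluon-RAG.
    """
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df.groupby("doc_id")["chunk_id"].count().to_dict()


# Stand-in for the DataFrame returned by generate_embeddings (values are made up).
toy = pd.DataFrame({
    "doc_id": [0, 0, 1],
    "chunk_id": [0, 1, 0],
    "text": ["first chunk", "second chunk", "another document"],
    "embedding": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    "all_embeddings_hidden_dim": [2, 2, 2],  # placeholder values only
})

chunks_per_doc = check_embedding_frame(toy)
print(chunks_per_doc)
```

With real output, the same helper could be called on the embeddings variable from the snippet above.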

You can extract the embeddings themselves as a NumPy array:

import numpy as np

embeddings_list = embeddings["embedding"].tolist()
embeddings_array = np.array(embeddings_list)
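If the cells of the "embedding" column are not all flat vectors of the same shape, np.array may produce an object array or an unexpected extra dimension. The sketch below shows one defensive way to build a 2-D matrix. It assumes each cell holds a fixed-length vector, possibly wrapped in an extra leading dimension; the helper name embeddings_to_matrix and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def embeddings_to_matrix(df: pd.DataFrame, column: str = "embedding") -> np.ndarray:
    """Stack per-chunk embeddings into an (n_chunks, hidden_dim) float32 array.

    Hypothetical helper: each cell is flattened first, so a (1, dim) cell and a
    (dim,) cell both end up as one row of length dim.
    """
    rows = [np.asarray(value, dtype=np.float32).reshape(-1) for value in df[column]]
    return np.stack(rows)  # raises ValueError if the row lengths differ


# Stand-in data with made-up values: three chunks, hidden dimension 4.
toy = pd.DataFrame({
    "embedding": [
        [0.1, 0.2, 0.3, 0.4],
        [0.5, 0.6, 0.7, 0.8],
        [0.9, 1.0, 1.1, 1.2],
    ]
})

matrix = embeddings_to_matrix(toy)
print(matrix.shape, matrix.dtype)
```

np.stack fails loudly when rows have different lengths, which tends to surface malformed rows earlier than a silent object array would.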