Using Tools with LangChain for RAG

Learn how to leverage LangChain tools to integrate Wikipedia and arXiv for AI research and information retrieval.

Setting Up Research Tools

We can use LangChain's built-in tools to query Wikipedia and arXiv for research purposes.

from langchain_community.tools import ArxivQueryRun, WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper, ArxivAPIWrapper
                    
# Initialize Wikipedia tool
api_wrapper_wiki = WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=250)
wiki = WikipediaQueryRun(api_wrapper=api_wrapper_wiki)
                    
# Initialize Arxiv tool
api_wrapper_arxiv = ArxivAPIWrapper(top_k_results=1, doc_content_chars_max=250)
arxiv = ArxivQueryRun(api_wrapper=api_wrapper_arxiv)
print(arxiv.name)
                    
# List of tools
tools = [wiki, arxiv]
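Every LangChain tool exposes the same basic surface: a name, a description the LLM reads when deciding which tool to call, and a method that runs the query. As a rough illustration (a hypothetical stand-in, not the real LangChain classes), the interface looks like this:

```python
# Illustrative stand-in for the tool interface (hypothetical, not LangChain's
# actual implementation): a name, a description for the LLM, and a run() method.
from dataclasses import dataclass
from typing import Callable

@dataclass
class SimpleTool:
    name: str          # identifier the agent uses to select the tool
    description: str   # text the LLM reads to decide when to call it
    func: Callable[[str], str]

    def run(self, query: str) -> str:
        return self.func(query)

# Toy stand-in for the Wikipedia tool above
wiki_stub = SimpleTool(
    name="wikipedia",
    description="Look up encyclopedia articles",
    func=lambda q: f"Summary for '{q}' (truncated to 250 chars)",
)

print(wiki_stub.name)               # wikipedia
print(wiki_stub.run("LangChain"))
```

This is why `print(arxiv.name)` works above: the real tools carry the same attributes, and the agent uses the descriptions to route queries.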

Custom RAG-Based Search Tool

We can create a custom Retrieval-Augmented Generation (RAG) tool that searches our own indexed documents, giving the agent more targeted results than a general web lookup.

from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
                    
# Load documents from LangSmith documentation
loader = WebBaseLoader("https://docs.smith.langchain.com/")
docs = loader.load()
                    
# Split text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(docs)
                    
# Create vector database
vectordb = FAISS.from_documents(documents, OpenAIEmbeddings())
retriever = vectordb.as_retriever()
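To see what the splitter, embeddings, and vector store are doing under the hood, here is a toy sketch of the same pipeline using bag-of-words counts and cosine similarity in place of OpenAI embeddings and FAISS (an illustration only, not the real libraries):

```python
# Toy sketch of split -> embed -> index -> retrieve, with bag-of-words counts
# standing in for OpenAI embeddings and a sorted list standing in for FAISS.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # crude "embedding": word-count vector
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def split(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    # fixed-width splitter standing in for RecursiveCharacterTextSplitter
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

docs = [
    "LangSmith helps you trace and evaluate LLM applications",
    "FAISS stores dense vectors for fast similarity search",
]
index = [(chunk, embed(chunk)) for d in docs for chunk in split(d)]

def retrieve(query: str, k: int = 1) -> list[str]:
    qv = embed(query)
    ranked = sorted(index, key=lambda p: cosine(qv, p[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

print(retrieve("trace LLM applications"))
```

The real pipeline works the same way, just with dense learned embeddings and an approximate-nearest-neighbor index, which is why chunk size and overlap matter: each chunk is embedded and retrieved independently.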

Creating a Retriever Tool

from langchain.tools.retriever import create_retriever_tool
retriever_tool = create_retriever_tool(
    retriever,
    "langsmith-search",
    "Search for any information about LangSmith",
)
tools.append(retriever_tool)

Running AI Models with Agents

We can connect these tools to a LangChain agent backed by an LLM from OpenAI or Groq.

from langchain_groq import ChatGroq
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
import os
                    
load_dotenv()
groq_api_key = os.getenv("GROQ_API_KEY")
openai_api_key = os.getenv("OPENAI_API_KEY")
                    
# Initialize the LLM (use ChatGroq with groq_api_key to run a Groq-hosted model instead)
llm = ChatOpenAI(openai_api_key=openai_api_key, model="gpt-3.5-turbo")
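A missing or empty API key surfaces later as an opaque authentication error. A small helper (our own addition, not part of LangChain) can fail fast with a clear message instead:

```python
# Helper to fail fast when an expected environment variable is missing,
# instead of hitting an opaque auth error deep inside an API call.
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Set {name} in your .env file before running the agent")
    return value
```

Usage: replace `os.getenv("OPENAI_API_KEY")` with `require_env("OPENAI_API_KEY")` so the script stops immediately if the `.env` file is incomplete.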

Setting Up Agents

from langchain import hub
from langchain.agents import create_openai_tools_agent, AgentExecutor
                    
# Pull a prebuilt agent prompt from the LangChain Hub
prompt = hub.pull("hwchase17/openai-functions-agent")
                    
# Create an agent
agent = create_openai_tools_agent(llm, tools, prompt)
                    
# Wrap the agent in an executor that runs the tool-calling loop
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
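Conceptually, the executor runs a loop: the LLM reads the query and the tool descriptions, either calls a tool or answers directly, and any tool output (the "observation") is fed back to produce the final answer. A hypothetical sketch of that loop, with a trivial keyword router standing in for the LLM (this is an illustration, not LangChain's internals):

```python
# Sketch of the AgentExecutor loop: choose a tool, call it, use the observation.
# A keyword router stands in for the LLM's tool-selection step.
def choose_tool(query: str) -> str:
    if "paper" in query.lower():
        return "arxiv"
    if "wikipedia" in query.lower():
        return "wikipedia"
    return "final_answer"

def run_agent(query: str) -> str:
    # stub tools standing in for the real wiki/arxiv tools above
    tools = {
        "wikipedia": lambda q: f"[wikipedia summary for {q!r}]",
        "arxiv": lambda q: f"[arxiv abstract for {q!r}]",
    }
    choice = choose_tool(query)
    if choice == "final_answer":
        return f"Answer (no tool needed): {query}"
    observation = tools[choice](query)
    return f"Answer based on {choice}: {observation}"

print(run_agent("What's the paper 1706.03762 about?"))
```

With `verbose=True`, the real executor prints each of these steps: the chosen tool, its input, and the observation returned.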

Running Queries

We can now send queries to the agent and let it pick the right tool for each one.

agent_executor.invoke({"input": "Tell me about Artificial Intelligence"})
agent_executor.invoke({"input": "What is machine learning?"})
agent_executor.invoke({"input": "What's the paper 1706.03762 about?"})

Conclusion

By integrating Wikipedia, arXiv, and LangChain's vector search, we give the agent access to accurate, context-aware information. This approach is useful for research assistants, academic tools, and AI-driven Q&A systems.