No LLM can hold all the information in the world, so we need a way to retrieve information from a data store and then generate the output. This is where RAG (Retrieval-Augmented Generation) comes into the picture.
I've developed a project, AI Speed, which uses OpenAI's text-embedding-3-small to retrieve information from a Pinecone vector store and Groq-powered Llama 3 to generate the output. This makes it context-aware and super fast, as Groq-powered llama-3-8b is the fastest option available right now (~1,250 output tokens per second).
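For anyone curious how these pieces fit together, here's a minimal sketch of the retrieve-then-generate flow using the official `openai`, `@pinecone-database/pinecone`, and `groq-sdk` clients. The index name, prompt, and topK value are illustrative assumptions, not AI Speed's actual configuration.

```ts
// Hedged sketch of the RAG flow: embed the query, retrieve from Pinecone,
// generate with Groq-hosted Llama 3. Index name "ai-speed" and the prompt
// are placeholders, not the project's real values.
import OpenAI from "openai";
import { Pinecone } from "@pinecone-database/pinecone";
import Groq from "groq-sdk";

const openai = new OpenAI();     // reads OPENAI_API_KEY from env
const pinecone = new Pinecone(); // reads PINECONE_API_KEY from env
const groq = new Groq();         // reads GROQ_API_KEY from env

async function answer(query: string): Promise<string> {
  // 1. Embed the user query with text-embedding-3-small.
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: query,
  });

  // 2. Retrieve the most similar chunks from the Pinecone index.
  const index = pinecone.index("ai-speed"); // placeholder index name
  const results = await index.query({
    vector: data[0].embedding,
    topK: 5,
    includeMetadata: true,
  });
  const context = results.matches
    .map((m) => String(m.metadata?.text ?? ""))
    .join("\n---\n");

  // 3. Generate with Groq-hosted llama-3-8b, grounded in the retrieved context.
  const completion = await groq.chat.completions.create({
    model: "llama3-8b-8192",
    messages: [
      { role: "system", content: `Answer using only this context:\n${context}` },
      { role: "user", content: query },
    ],
  });
  return completion.choices[0].message.content ?? "";
}
```

The retrieved chunks are passed as grounding context so the model answers from the indexed docs rather than from its own memory, which is what keeps the output context-aware.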
Trained on roughly 5M tokens of publicly available data, it generates fairly accurate, context-aware output. It can also produce code snippets and other structured output usable in real-world applications.
What's next? Expanding the training set and making it more context-aware and faster, plus adding checks to ensure the output is correct and not hallucinated.
Live here: ai-speed.vercel.app
Available here: pushkarydv/ai-speed. The first release tweet is attached below.
Made a RAG-based app trained on publicly available data for NextJs, ReactJS, Tailwind CSS, shadcn, Aceternity UI, and more frontend technologies. In a lot of cases it's better than GPT-4, since it uses retrieved data as its base and offers speeds of ~1,250 tokens/s via llama-3-8b.