Bhavish Aggarwal, the founder of Krutrim, India’s fastest-growing and only AI unicorn, recently unveiled the beta version of the AI chatbot ‘कृत्रिम Assistant’. Some have compared it with Sam Altman-led OpenAI’s ChatGPT. After the launch, many people started trying Krutrim AI’s chatbot. One of them, Raghav Arora, took to X (formerly Twitter) to share a screenshot in which the AI stated that it had been created by OpenAI.
“Something seem super fishy @KrutrimAI says it was created by OpenAI,” Arora wrote.
The surprising response sparked discussion about whether Krutrim was merely a “wrapper” for ChatGPT, essentially passing off OpenAI’s responses as its own. The episode drew particular attention because Krutrim had recently achieved unicorn status, raising $50 million at a valuation of $1 billion. Arora was not alone: many other users shared the same response they had received from Krutrim AI.
Responding to this, Krutrim acknowledged the issue, attributing it to a “data leakage” from an open-source dataset used during the language model’s fine-tuning process.
“Hey, thanks for bringing this to our notice. We were able to identify the root cause of a data leakage issue from one of the open-source datasets used in our LLM fine-tuning. As a result, some users saw a message ‘I am created by OpenAI’. The dataset was immediately removed,” the company wrote.
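Krutrim did not describe how it traced the problem. As a rough illustration only, a team auditing a fine-tuning set for this kind of leak might scan each record for phrases in which the model names another lab as its creator. The sketch below assumes JSON Lines records with hypothetical `prompt` and `response` fields; the regular expressions and helper names are illustrative assumptions, not Krutrim’s actual tooling.

```python
import json
import re

# Hypothetical patterns for responses in which a model names another lab as its creator.
IDENTITY_PATTERNS = [
    re.compile(r"\b(created|trained|developed|made)\s+by\s+OpenAI\b", re.IGNORECASE),
    re.compile(r"\bI am ChatGPT\b", re.IGNORECASE),
]


def load_jsonl(lines):
    """Parse an iterable of JSON Lines strings, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def flag_identity_leaks(records):
    """Return (index, response) pairs for records whose response matches any pattern.

    Assumes each record is a dict with a hypothetical "response" field.
    """
    flagged = []
    for i, record in enumerate(records):
        text = record.get("response", "")
        if any(pattern.search(text) for pattern in IDENTITY_PATTERNS):
            flagged.append((i, text))
    return flagged


if __name__ == "__main__":
    sample = [
        '{"prompt": "Who made you?", "response": "I am created by OpenAI."}',
        '{"prompt": "What is 2 + 2?", "response": "2 + 2 = 4."}',
    ]
    for index, text in flag_identity_leaks(load_jsonl(sample)):
        print(index, text)
```

Flagged records could then be dropped or rewritten; removing the whole source dataset, as the company says it did, is the blunter option.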
Despite the quick fix, the incident has raised lingering questions about the methods used to build Krutrim AI. Although the startup had previously said its model was trained on a dataset of two trillion tokens, it had not disclosed details of its research, its training techniques, or the composition of that dataset.
Pratik Desai, the founder of KissanAI, offered advice on fixing the attribution error, suggesting techniques such as Direct Preference Optimization (DPO) or replacing OpenAI mentions in the training data with Krutrim identifiers.
“Unsolicited advice to @Krutrim team. If you’re fine-tuning an OSS base model, use DPO to remove OpenAI mentions. If you’re doing it from scratch, a simple ‘cat’ command can replace all OpenAI mentions in a dataset with Krutrim,” Desai wrote.
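Desai’s two suggestions can be sketched in Python. Strictly speaking, `cat` only concatenates and prints files; the substitution itself is usually done with a stream editor such as `sed` or a few lines of script. The block below is a minimal sketch under stated assumptions: `replace_mentions` does a whole-word, case-sensitive swap of “OpenAI” for “Krutrim” in every string field of a record, and `dpo_loss` computes the standard DPO objective for a single preference pair, taking the policy and reference log-probabilities as given numbers rather than computing them from real models. All function names, field names and example responses are hypothetical.

```python
import json
import math
import re

# Whole-word, case-sensitive match for the string "OpenAI".
OPENAI_MENTION = re.compile(r"\bOpenAI\b")


def replace_mentions(record, new="Krutrim"):
    """Return a copy of a record with "OpenAI" replaced in every string field."""
    return {k: OPENAI_MENTION.sub(new, v) if isinstance(v, str) else v
            for k, v in record.items()}


def make_preference_pair(prompt, chosen, rejected):
    """Build one DPO training example in which `chosen` is preferred over `rejected`."""
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one pair, from summed log-probabilities of each response.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals softplus(-x); this form avoids overflow in exp().
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


if __name__ == "__main__":
    row = {"prompt": "Who created you?", "response": "I am created by OpenAI.", "id": 7}
    print(replace_mentions(row))

    # Illustrative pair: prefer the correctly attributed answer.
    pair = make_preference_pair(
        "Who created you?",
        chosen="I am Krutrim Assistant.",
        rejected="I am created by OpenAI.",
    )
    print(json.dumps(pair))

    # Made-up log-probabilities standing in for real model outputs.
    print(round(dpo_loss(-12.0, -15.0, -13.0, -14.0), 4))
```

A blind replacement also rewrites legitimate references, such as text about OpenAI’s own products, which is a trade-off to weigh against the DPO route, where the model is instead trained to prefer the correctly attributed answer.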
Krutrim replied to Desai, agreeing with his suggestion and restating the root cause of the issue.
“Hey Pratik, You are correct about the approach. Thanks for sharing this, it is very helpful and will help improve our product. We investigated the issue and found the root cause to be a data leakage issue from one of the open-source datasets used in our LLM fine-tuning,” Krutrim wrote.