Amidst an escalating AI rivalry, e-commerce behemoth Amazon is intensifying its push into artificial intelligence. But how much customer data is being used to build its own large language model (LLM) has recently come under scrutiny.

At a live event last week, the company said that generative AI technology would be coming to Alexa and Amazon’s smart home devices. Since then, an interview on Bloomberg TV has stirred concerns around what data the company is using to train Alexa’s AI. An NBC News report on Tuesday suggested that the use of user conversations as training data was a new practice or revelation, a characterization an Amazon spokesperson disputed to Decrypt.

The controversy was sparked after Amazon’s Senior Vice President of Devices and Services, David Limp, was featured in a Q&A segment on Bloomberg TV.

“Users would be volunteering their voice data and conversations for Amazon’s LLM training purposes” after agreeing to use a more “customized” version of Alexa, Limp reportedly said.


But an Amazon spokesperson told Decrypt that the NBC report took Limp’s comments out of context.

“[Amazon] has always believed that training Alexa with real-world requests is essential to delivering an experience to customers that's accurate and personalized and constantly getting better,” the spokesperson said. “But in tandem, we give customers control over whether their Alexa voice recordings are used to improve the service, and we always honor our customer preferences when we train our models.”

Amazon said an upcoming AI-enhanced Alexa update will introduce more natural voice interactions, including real-time news and smart home functions. Once live, U.S. Echo users can test the new AI features by saying, "Alexa, let’s chat," with the telltale blue ring light showing when Alexa is listening.

In 2019, Amazon was in the hot seat as reports surfaced that human contractors were listening to recordings of Alexa user voice commands. The company said at the time that this was done to improve "speech recognition and natural language understanding systems," and explained how users could opt out, as well as review and delete recordings stored by the system.


Since the launch of OpenAI’s ChatGPT in November 2022, technology firms have been in an arms race to develop the most advanced and least expensive generative AI tools for mass markets. On Monday, Amazon announced an investment of up to $4 billion in OpenAI rival Anthropic, the company behind Claude AI.

Large language models (LLMs), which underpin generative AI systems, are trained on large datasets collected from various sources, including the internet. Last week, Amazon’s 2023 Devices & Services event showcased the new AI features coming to Alexa. In the presentation, Amazon said it wants to make Alexa more personalized to each customer and is turning to its internal models to train its AI.

AI developers are now grappling with increased scrutiny concerning training practices, privacy protocols, and data protection for their rapidly expanding user base. Last week, several high-profile authors joined a class action lawsuit by the Authors Guild against OpenAI, alleging that the company infringed their copyrights by feeding their work into the chatbot’s dataset.

Amazon’s spokesperson reiterated the company’s commitment to user privacy.

“We designed the Alexa experience to protect customers’ privacy and put them in control over their experience,” the Amazon spokesperson added in an email. “Customers can still access the same robust set of tools and privacy controls that put them in control of their Alexa experience today, including the ability to review and delete their voice recordings.”
