Meta’s AI Assistant Learns from Public Instagram and Facebook Posts

Digital Shop 99 September 30, 2023

Meta Platforms has revealed that it used public Facebook and Instagram posts to train its new Meta AI virtual assistant. However, it made sure to exclude private posts shared only with family and friends to respect consumers’ privacy. The company’s President of Global Affairs, Nick Clegg, stated that private chats on its messaging services were also not used as training data, and precautions were taken to filter private details from public datasets used for training.

Clegg emphasized that Meta tried to exclude datasets containing a heavy concentration of personal information, and that the majority of the training data was publicly available. For instance, LinkedIn’s content was deliberately omitted due to privacy concerns. This approach comes amid criticism of tech companies like Meta, OpenAI, and Alphabet’s Google for scraping information from the internet without permission to train their AI models. These models consume vast amounts of data to summarize information and generate imagery.

Meta AI, unveiled by CEO Mark Zuckerberg at the Connect conference, is the most significant product among Meta’s consumer-facing AI tools. It is built on a custom model based on the Llama 2 language model, along with a new model called Emu for image generation. The assistant will be capable of generating text, audio, and imagery, and will have access to real-time information through a partnership with Microsoft’s Bing search engine.

Clegg said that safety restrictions were imposed on Meta AI to prevent the creation of photo-realistic images of public figures. As for copyrighted materials, he anticipated litigation over whether the fair use doctrine applies to creative content. Clegg stated that Meta believes its use of such content falls under fair use but expects the matter to be resolved through legal proceedings.

When it comes to image-generation tools, some companies reproduce iconic characters with permission, while others pay for materials or consciously exclude them from training data. For instance, OpenAI signed a deal with Shutterstock to use its image, video, and music libraries for training purposes. A Meta spokesperson indicated that the new terms of service prohibit users from generating content that violates privacy and intellectual property rights.

In conclusion, Meta has used public content from Facebook and Instagram to train its AI assistant while excluding private posts and chats. The company is making efforts to address concerns about privacy and copyright infringement, but disputes over fair use and the reproduction of copyrighted material may still arise.

[Unique Perspective] The utilization of public data for AI training highlights the delicate balance between advancing technology and safeguarding privacy. As AI continues to evolve, it is crucial for companies like Meta to prioritize consumer privacy and offer transparency regarding data usage. Striking a balance between innovation and respecting personal boundaries will be vital for the widespread acceptance of AI-powered assistants.