Facebook has built an AI system called Rosetta to identify text and content in pictures.
Many of the pictures shared on Facebook and Instagram daily contain text. The text might be overlaid on an image in a meme, or inlaid in a photo of a storefront, street sign, or restaurant menu. Taking into account the sheer volume of photos shared each day on Facebook and Instagram, the number of languages supported on the global platform, and the variations of the text, the problem of understanding text in images is quite different from the problems solved by traditional optical character recognition (OCR) systems, which recognize the characters but don’t understand the context of the associated image.
To address these specific needs, Facebook built and deployed a large-scale machine learning system named Rosetta. It extracts text from more than a billion public Facebook and Instagram images and video frames (in a wide variety of languages), daily and in real time, and inputs it into a text recognition model that has been trained on classifiers to understand the context of the text and the image together.
Rosetta works in two steps: detection and recognition. "In the first step, we detect rectangular regions that potentially contain text. In the second step, we perform text recognition, where, for each of the detected regions, we use a convolutional neural network (CNN) to recognize and transcribe the word in the region," Facebook's official blog said.
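To make the two-step structure concrete, here is a minimal Python sketch of a detect-then-recognize pipeline. It is not Facebook's implementation: the "image" is a toy grid of characters, and the names `Region`, `detect_regions`, `recognize_word`, and `extract_text` are hypothetical stand-ins chosen for illustration. In a real system the detector and the recognizer would both be trained models, with a CNN doing the transcription.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Toy "image": a grid of characters, where a space means "no ink".
# This is an assumption made so the sketch runs without any ML library.
Image = Sequence[str]


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle, one row high, covering columns [left, right)."""
    row: int
    left: int
    right: int


def detect_regions(image: Image) -> List[Region]:
    """Step 1 (stand-in): propose rectangles that may contain text.

    A real detector would be a trained model; here every run of
    non-space characters in a row counts as one candidate region.
    """
    regions: List[Region] = []
    for row, line in enumerate(image):
        start = None
        # The trailing space is a sentinel that closes a run at end of line.
        for col, ch in enumerate(line + " "):
            if ch != " " and start is None:
                start = col
            elif ch == " " and start is not None:
                regions.append(Region(row, start, col))
                start = None
    return regions


def recognize_word(image: Image, region: Region) -> str:
    """Step 2 (stand-in): transcribe the word inside one region.

    A real recognizer would run a CNN on the cropped pixels; in this
    toy setting the crop already is the transcription.
    """
    return image[region.row][region.left:region.right]


def extract_text(
    image: Image,
    detector: Callable[[Image], List[Region]] = detect_regions,
    recognizer: Callable[[Image, Region], str] = recognize_word,
) -> List[Tuple[Region, str]]:
    """Run detection first, then recognition on each detected region."""
    return [(region, recognizer(image, region)) for region in detector(image)]


if __name__ == "__main__":
    meme = ["  ONE DOES  ", "            ", " NOT SIMPLY "]
    for region, word in extract_text(meme):
        print(region, word)
```

Passing the detector and recognizer in as arguments keeps the two steps independent, so either stand-in could be swapped for a real model without touching the other.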
Facebook still faces several challenges in running this system: supporting the many languages used across its global user base, understanding how the text relates to the image it appears in, and judging whether the content is disrespectful or vulgar or simply meant as entertainment.
If Rosetta works as intended, it could remove a large share of offensive content from the platform. But a single glitch in the system could undo that benefit. Down the line, Facebook could also leverage this data to spot trending content for ad purposes.