Why Transformers offer more than meets the eye
What do OpenAI’s language-generating GPT-3 and DeepMind’s protein shape-predicting AlphaFold have in common? Besides achieving leading results in their respective fields, both are built atop the Transformer, an AI architecture that has gained considerable attention within the last several years. Dating back to 2017, the Transformer has become the architecture of choice for natural language tasks such as summarizing documents and translating between languages, and it has also shown an aptitude for work beyond language, like analyzing biological sequences.
The Transformer has clear immediate business applications. OpenAI’s GPT-3 is currently used in more than 300 apps by tens of thousands of developers, producing 4.5 billion words per day. DeepMind is applying its AlphaFold technology to identify cures for rare, neglected diseases. And more sophisticated applications are on the horizon, as demonstrated by research showing that the Transformer can be tuned to play games like chess and even applied to image processing.
What are Transformers?
The Transformer’s architecture is made up of two core components: an encoder and a decoder. The encoder contains layers that process input data, like text and images, iteratively, layer by layer. Each encoder layer generates encodings that capture which parts of the inputs are relevant to one another, then passes these encodings to the next layer until the final encoder layer is reached.
The decoder’s layers do the same thing, but to the encoder’s output. They take the encodings and use their incorporated contextual information to generate an output sequence of data — whether text, a predicted protein structure, or an image.
Each encoder and decoder layer makes use of an attention mechanism, which sets the Transformer apart from other architectures. For every input, attention weighs the relevance of every other input and draws on them to generate the output. Each decoder layer has an additional attention mechanism that draws information from the outputs of previous decoder layers, before the decoder layer finally draws information from the encodings to produce an output.
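The core of that attention computation can be sketched in a few lines. The following is a minimal NumPy illustration of scaled dot-product attention, the form used inside the Transformer; the toy inputs and dimensions are hypothetical, chosen only to show the shape of the calculation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """For every position, weigh the relevance of every other position,
    then output a weighted blend of the value vectors."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise relevance scores
    # Softmax over each row turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output mixes all inputs, weighted by relevance

# Toy example: 3 input positions, each a 4-dimensional vector
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(out.shape)  # (3, 4)
```

Because every position attends to every other position in one matrix operation rather than step by step, the whole computation can run in parallel, which is the property the next section's training-speed discussion relies on.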
Transformers typically undergo semi-supervised learning: unsupervised pretraining followed by supervised fine-tuning. Sitting between supervised and unsupervised learning, semi-supervised learning accepts data that is only partially labeled, or where the majority of the data lacks labels. In this case, Transformers are first exposed to “unknown” data for which no previously defined labels exist and must teach themselves to classify the data, processing the unlabeled data to learn from its inherent structure. During the fine-tuning process, Transformers train on labeled datasets so that they learn to accomplish particular tasks, like answering questions, analyzing sentiment, and paraphrasing documents.
It’s a form of transfer learning, or storing knowledge gained while solving one problem and applying it to a different — but related — problem. The pretraining step helps the model to learn general features that can be reused on the target task, boosting its accuracy.
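The fine-tuning half of that recipe can be sketched with a toy example. Here a frozen random matrix stands in for an encoder's pretrained weights (in a real pipeline they would come from unsupervised pretraining), and only a small task head is trained on a little labeled data; all names, data, and numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a pretrained encoder: in practice these weights would come
# from unsupervised pretraining on large unlabeled corpora (random here).
W_pretrained = rng.standard_normal((8, 4))

def encode(x):
    # Frozen pretrained features; only the task head below gets trained
    return np.tanh(x @ W_pretrained)

# Tiny labeled dataset standing in for a downstream task (e.g., sentiment 0/1)
X = rng.standard_normal((32, 8))
y = (X[:, 0] > 0).astype(float)  # synthetic labels, purely illustrative

def log_loss(w):
    p = 1 / (1 + np.exp(-(encode(X) @ w)))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

# Fine-tuning: gradient descent on a small logistic-regression task head
w_head = np.zeros(4)
loss_before = log_loss(w_head)
for _ in range(500):
    feats = encode(X)
    p = 1 / (1 + np.exp(-(feats @ w_head)))
    w_head -= 0.5 * feats.T @ (p - y) / len(y)  # logistic-loss gradient step
loss_after = log_loss(w_head)
```

The point of the sketch is the division of labor: the expensive general-purpose representation is learned once without labels, and the cheap labeled step only adapts a small head on top of it.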
Attention has the added benefit of boosting model training speed. Because Transformers don’t process their inputs sequentially the way recurrent networks do, they can be more easily parallelized, and larger and larger models can be trained with significant, but not unattainable, increases in compute. Running on 16 of Google’s special-built TPUv3 processors, AlphaFold took a few weeks to train, while OpenAI’s music-generating Jukebox took over a month across hundreds of Nvidia V100 graphics cards.
The business value of Transformers
As those applications suggest, the Transformer isn’t confined to the realm of theory; it’s been widely deployed in the real world. One startup, Viable, is using the Transformer-powered GPT-3 to analyze customer feedback, identifying themes and sentiment from surveys, help desk tickets, live chat logs, reviews, and more. Algolia, another startup, is using it to improve its web search products.
More exciting use cases lie beyond the language domain. In January, OpenAI took the wraps off of DALL-E, a text-to-image engine that’s essentially a visual idea generator. Given a text prompt, it generates images that most closely match the prompt, filling in the blanks when the prompt implies the image must contain a detail that isn’t explicitly stated.
OpenAI predicts that DALL-E could someday augment — or even replace — 3D rendering engines. For example, architects could use the tool to visualize buildings, while graphic artists could apply it to software and video game design. In another point in DALL-E’s favor, the Transformer-driven tool can combine disparate ideas to synthesize objects, some of which are unlikely to exist in the real world — like a hybrid of a snail and a harp.
“DALL-E shows creativity, producing useful conceptual images for product, fashion, and interior design,” Gary Grossman, global lead at Edelman’s AI center of excellence, wrote in a recent blog post. “DALL-E could support creative brainstorming … either with thought starters or, one day, producing final conceptual images. Time will tell whether this will replace people performing these tasks or simply be another tool to boost efficiency and creativity.”
The future holds Transformer-based models that can go one step further, synthesizing not just pictures but videos from whole cloth. These types of systems have been detailed in academic literature. Other, related applications soon — or already — possible include generating realistic voices, recognizing speech, parsing medical records, predicting stock prices, and creating computer code.
Indeed, Transformers have immense potential in the enterprise, which is one of the reasons the global AI market is anticipated to be worth $266.92 billion by 2027. Transformer-powered apps could enable workers to spend their time on less menial, more meaningful work, bolstering productivity. The McKinsey Global Institute predicts technologies like the Transformer will result in a 1.2% increase in gross domestic product (GDP) growth for the next 10 years and help capture an additional 20% to 25% in net economic benefits ($13 trillion globally) in the next 12 years.
Businesses that ignore the potential of Transformers do so at their own peril.