December 9 • 2 min read
Optical Generative AI models
ChatGPT has made headlines worldwide and has become a general-purpose technology with its ability to write essays, emails, and computer code based on a few prompts from a user.
Generative AI, the broad category that includes generative pre-trained transformers like ChatGPT, is currently at the “peak of inflated expectations”. Fully realizing the technology’s benefits will take some time; however, the time to act is now, or you risk becoming a laggard.
Deep neural networks (DNNs) like the one behind ChatGPT are based on machine learning models that require vast amounts of energy and are usually confined to large data centers. That is motivating the development of new computing paradigms.
Using light rather than electrons to run DNN computations has the potential to break through the current bottlenecks.
Computations using optics, for example, can potentially use far less energy than those based on electronics. However, current optical neural networks (ONNs) face significant challenges. For one, they use a great deal of power because they are inefficient at converting incoming data, which arrives as electrical signals, into light. Further, the components involved are bulky and take up significant space. And while ONNs are quite good at linear calculations like adding, they are not great at nonlinear calculations like multiplication and “if” statements.
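To make the linear versus nonlinear distinction concrete, here is a minimal Python sketch of a single neural-network layer. It is only an illustration, not the researchers' method: the function names, the numbers, and the choice of ReLU as the nonlinearity are all assumptions made for this example. The weighted sum is the linear part, and the activation that follows is literally an “if” statement applied to each value.

```python
def linear_step(weights, inputs):
    """Linear part of a layer: one weighted sum per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def relu(values):
    """Nonlinear part (ReLU, assumed here for illustration):
    an 'if' statement applied to every value."""
    return [v if v > 0 else 0.0 for v in values]


def layer(weights, inputs):
    """One illustrative layer: linear step, then nonlinear step."""
    return relu(linear_step(weights, inputs))


# Hypothetical numbers, chosen only to show the two steps.
weights = [[1.0, -2.0], [0.5, 0.5]]
inputs = [3.0, 2.0]

pre_activation = linear_step(weights, inputs)  # weighted sums only
output = layer(weights, inputs)                # negatives clipped to 0.0

print(pre_activation, output)
```

A deep network stacks many such layers, so hardware that handles only the first step efficiently still needs some other way to perform the second.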
In this work, a team led by the Massachusetts Institute of Technology introduces a compact architecture that, for the first time, solves all of these challenges.
It’s an architecture based on state-of-the-art arrays of vertical-cavity surface-emitting lasers (VCSELs), a relatively new technology used in applications including lidar remote sensing and laser printing.
This architecture could lead to machine learning models several orders of magnitude more powerful than the one behind ChatGPT. The system they developed could also use several orders of magnitude less energy than the state-of-the-art supercomputers behind the machine-learning models of today. The VCSELs reported in the Nature Photonics paper were developed by the Reitzenstein group at Technische Universität Berlin.
I will keep an eye on the evolution of their work.