As exciting as AI tools like ChatGPT, Google’s Bard and Microsoft’s numerous Copilots may be, they all currently face the same restriction: you have to be connected to the Internet to use them. For most people and in most situations that isn’t a big problem, but imagine how great it would be if you could use them on your computers and phones even with a poor connection or none at all.
Running these tools on your devices would not only increase the number of situations where you could take advantage of them, but it could also bring a number of other important but not necessarily obvious benefits.
First, it turns out that the computing power – and required electrical power – to run these generative AI tools is currently massive. That means the companies that offer these services are spending a lot of money to enable them and, eventually, those costs could be passed on to the users and businesses that use them.
Second, there are security and privacy benefits to not running everything in the cloud. In many of the early versions of these generative AI tools, whatever you type into them is tracked and fed into the large language models (LLMs) powering these services. It’s part of what’s called the model training process. The companies behind them also use this information to better personalize what these tools generate for you.
In fact, some of the more advanced generative AI tools are likely going to evolve into something akin to digital personal assistants that can help plan and organize tasks and meetings for you. Unlike first-generation tools like Cortana and Siri, however, these AI-powered tools will be able to do so with more context and knowledge about you (if you let them, of course).
Just as a real-world personal assistant needs to know a lot about a boss’s schedule and work, so too does a digital assistant need to know about your work and schedule to be as effective as possible. As more of the work powering these AI models shifts onto devices, however, less of this information needs to be transferred to the cloud, thereby offering a more private solution.
The way to solve both the power and privacy issues with generative AI is to leverage a concept called distributed computing, where you essentially split and distribute the computing “work” across the cloud and devices.
When it comes to power, if some of the computations that used to happen only in the cloud can be done on devices, then it’s cheaper for companies to run these services in the cloud. On the privacy side, if your data, schedule, etc. can remain on your device, and the services that know how to use that information for a customized personal assistant experience also run on your device, then little to none of your information will go to the cloud.
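For readers who like to see the idea in code, here is a minimal sketch in Python of how an app might decide where a request runs. It is an illustration only, not how Microsoft, Qualcomm or anyone else actually does it. The `Request` fields, the `DEVICE_COMPUTE_BUDGET` threshold and the `choose_target` function are all hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """A hypothetical generative AI request (illustrative fields only)."""
    prompt: str
    uses_personal_data: bool  # e.g. touches your calendar or contacts
    estimated_compute: int    # rough cost in arbitrary, made-up units


# Assumed limit on what the local device can handle comfortably.
DEVICE_COMPUTE_BUDGET = 10


def choose_target(request: Request, online: bool) -> str:
    """Return "device" or "cloud" for a request.

    Privacy comes first: anything using personal data stays local.
    Without a connection, the device is the only option.
    Otherwise, small jobs run locally and heavy jobs go to the cloud.
    """
    if request.uses_personal_data:
        return "device"
    if not online:
        return "device"
    if request.estimated_compute <= DEVICE_COMPUTE_BUDGET:
        return "device"
    return "cloud"


if __name__ == "__main__":
    examples = [
        Request("Plan my week around my meetings", True, 50),
        Request("Draft a long report on market trends", False, 50),
        Request("Fix the typos in this sentence", False, 2),
    ]
    for req in examples:
        print(req.prompt, "->", choose_target(req, online=True))
```

In this sketch, only the heavy request that involves no personal data is sent to the cloud. That covers both benefits at once: less work for cloud servers, and personal information that never leaves the device.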
Recently, a number of companies have been talking about this idea of distributed computing for generative AI. For example, at Microsoft’s recent Build developer conference, the company discussed what it’s calling Hybrid AI. Think of it as the next generation of generative AI tools. Microsoft’s version is called Hybrid Loop, and it leverages a software development platform called ONNX Runtime that developers can use to take advantage of local device computing resources as well as Azure’s cloud computing. In other words, it offers a set of tools for software developers to do distributed computing.
Chipmaker Qualcomm, whose chips and modems are found in most smartphones sold in the US, has also been talking about the hybrid AI concept and its other benefits. The company has created a set of software services called the Qualcomm AI Stack that makes it easier to run generative AI tools on smartphones. In fact, the company has shown Stable Diffusion running on phones using its chips.
Speaking of semiconductors, as great as the concept of hybrid AI and distributed computing may sound, the only way to make it possible is to supercharge the capabilities of our devices. To run the foundation AI models that power generative AI apps and services, those devices will need more specialized hardware, so we’re going to see a whole new range of AI accelerator chips coming into PCs and smartphones over the next year or so.
OS companies like Microsoft and Google need to develop more support for these chips, too. At the Build event, Microsoft pointed out that some of its underlying work for Hybrid AI will be able to leverage the CPU, GPU, NPU (neural processing unit), and potentially other specialized AI accelerators found on modern PCs. That means having newer processors from Intel, AMD and Qualcomm, as well as GPUs from Nvidia and AMD, is going to start to be more important than ever.
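To give a rough idea of what choosing among those chips can look like for a developer, here is a small Python sketch. ONNX Runtime refers to each hardware backend as an "execution provider," and the names below are real provider names, but the preference order (NPU first, then GPU, then CPU) and the `order_providers` helper are assumptions made for this example, not Microsoft’s Hybrid AI implementation.

```python
# Assumed preference order for illustration: NPU, then GPU, then CPU.
PREFERRED = [
    "QNNExecutionProvider",   # Qualcomm NPU
    "DmlExecutionProvider",   # DirectML, GPU acceleration on Windows
    "CUDAExecutionProvider",  # Nvidia GPU
    "CPUExecutionProvider",   # general-purpose fallback
]


def order_providers(available, preferred=PREFERRED):
    """Return the available providers sorted by our preference order.

    `available` is a list of provider names. In a real app it would
    typically come from onnxruntime.get_available_providers().
    The CPU provider is always kept as the last resort.
    """
    chosen = [name for name in preferred if name in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


if __name__ == "__main__":
    # A hypothetical PC with an Nvidia GPU but no NPU.
    print(order_providers(["CPUExecutionProvider", "CUDAExecutionProvider"]))
```

A list like the one returned here could then be passed as the `providers` argument when creating an ONNX Runtime `InferenceSession`, so that the model runs on the most capable chip the machine actually has.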
Many of the big chip companies have made announcements in this area. AMD announced the Ryzen 7040, which integrates a dedicated AI accelerator. Similarly, Intel’s next-generation CPU line, codenamed Meteor Lake, is rumored to be its first to include a dedicated AI accelerator. Both of these chips are expected later this year.
Qualcomm’s Arm-based 8cx processors for PCs also include dedicated AI acceleration, and a new version is expected later this year as well. Qualcomm has also demonstrated that its newer Snapdragon 8 Gen 2 processors for premium phones – found in Android phones from Samsung and Motorola – have the ability to run generative AI models and applications directly on the phone.
To be clear, at present, the vast majority of generative AI software and services still run in the cloud. The computing requirements of tools like ChatGPT can only be met with huge numbers of cloud-based servers. Over time, however, we’re going to see new types of smaller AI models and clever ways of shifting the computing workloads AI demands onto our devices. When we do, even more mind-blowing AI-powered capabilities will start to become available.
The world of generative AI is causing massive disruptions across the entire tech world, and its implications go far deeper than they first appear. While it can be a bit overwhelming, it’s important to remember that we’re embarking on one of the most exciting new eras of computing, across PCs, mobile, and all other devices, in quite some time. Hang on and enjoy the ride.
Bob O’Donnell is the founder and chief analyst of TECHnalysis Research, LLC, a technology consulting firm that provides strategic consulting and market research services to the technology industry and professional financial community. You can follow him on Twitter @bobodtech.