Memory Chips That Compute Will Accelerate AI

Samsung could double performance of neural nets with processing-in-memory

Samsung added AI compute cores to DRAM memory dies to speed up machine learning.

John von Neumann’s original computer architecture, where logic and memory are separate domains, has had a good run. But some companies are betting that it’s time for a change.

In recent years, the shift toward more parallel processing and a massive increase in the size of neural networks have meant that processors need to access more data from memory more quickly. And yet “the performance gap between DRAM and processor is wider than ever,” says Joungho Kim, an expert in 3D memory chips at Korea Advanced Institute of Science and Technology, in Daejeon, and an IEEE Fellow. The von Neumann architecture has become the von Neumann bottleneck.

What if, instead, at least some of the processing happened in the memory? Less data would have to move between chips, and you’d save energy, too. It’s not a new idea. But its moment may finally have arrived. Last year, Samsung, the world’s largest maker of dynamic random-access memory (DRAM), started rolling out processing-in-memory (PIM) tech. Its first PIM offering, unveiled in February 2021, integrated AI-focused compute cores inside its Aquabolt-XL high-bandwidth memory. HBM is the kind of specialized DRAM that surrounds some top AI accelerator chips. The new memory is designed to act as a “drop-in replacement” for ordinary HBM chips, said Nam Sung Kim, an IEEE Fellow, who was then senior vice president of Samsung’s memory business unit.

Last August, Samsung revealed results from tests in a partner’s system. When used with the Xilinx Virtex UltraScale+ (Alveo) AI accelerator, the PIM tech delivered a nearly 2.5-fold performance gain and a 62 percent cut in energy consumption for a speech-recognition neural net. Samsung has been providing samples of the technology integrated into the current generation of high-bandwidth DRAM, HBM2. It’s also developing PIM for the next generation, HBM3, and for the low-power DRAM used in mobile devices. It expects to complete the standard for the latter with JEDEC in the first half of 2022.

There are plenty of ways to add computational smarts to memory chips. Samsung chose a design that’s fast and simple. HBM consists of a stack of DRAM chips linked vertically by interconnects called through-silicon vias (TSVs). The stack of memory chips sits atop a logic chip that acts as the interface to the processor.

Here is where the major memory players stand on processing-in-memory:

Micron, the third-largest DRAM maker, says it does not have a processing-in-memory product. However, in 2019 it acquired the AI-tech startup Fwdnxt, with the goal of developing “innovation that brings memory and computing closer together.”

One Israeli startup has developed memory with integrated processing cores designed to accelerate queries in data analytics.

Engineers at Rambus, the DRAM-interface-tech company, did an exploratory design for processing-in-memory DRAM focused on reducing the power consumption of high-bandwidth memory (HBM).

Furthest along, Samsung, the world’s largest DRAM maker, is offering the Aquabolt-XL with integrated AI computing cores. It has also developed an AI accelerator for memory modules, and it’s working to standardize AI-accelerated DRAM.

Engineers at SK hynix, the second-largest DRAM maker, and Purdue University unveiled results for Newton, an AI-accelerating HBM DRAM, in 2019, but the company decided not to commercialize it, pursuing PIM for standard DRAM instead.

The highest data bandwidth in the stack lies within each chip, followed by the TSVs, and finally the connections to the processor. So Samsung chose to put the processing on the DRAM chips to take advantage of the high bandwidth there. The compute units are designed to do the most common neural-network calculation, called multiply and accumulate, and little else. Other designs have put the AI logic on the interface chip or used more complex processing cores.
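To make “multiply and accumulate” concrete, here is a minimal Python sketch of the operation the in-memory compute units are built around. It is purely illustrative: the function names and shapes are assumptions, not Samsung’s implementation, which performs this arithmetic in hardware inside the DRAM banks.

```python
# Illustrative only: the multiply-and-accumulate (MAC) operation that
# PIM compute units perform next to the memory banks. Names and shapes
# are hypothetical.

def multiply_accumulate(weights, activations):
    """Return the dot product of a weight row and an activation vector."""
    acc = 0.0
    for w, x in zip(weights, activations):
        acc += w * x  # one multiply-accumulate per element
    return acc

def dense_layer(weight_matrix, activations):
    """A neural-network layer is just many of these dot products."""
    return [multiply_accumulate(row, activations) for row in weight_matrix]

if __name__ == "__main__":
    W = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 x 3 weight matrix
    x = [1.0, 2.0, 3.0]                     # activation vector
    print(dense_layer(W, x))                # approximately [1.4, 3.2]
```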

Samsung’s two largest competitors, SK hynix and Micron Technology, aren’t quite ready to take the plunge on PIM for HBM, though they’ve each made moves toward other types of processing-in-memory.

Icheon, South Korea–based SK hynix, the No. 2 DRAM supplier, is exploring PIM from several angles, says Il Park, vice president and head of memory-solution product development. For now it is pursuing PIM in standard DRAM chips rather than HBM, which might be simpler for customers to adopt, says Park.

HBM PIM is more of a mid- to long-term possibility for SK hynix. At the moment, customers are already dealing with enough issues as they try to move HBM DRAM physically closer to processors. “Many experts in this domain do not want to add more, and quite significant, complexity on top of the already busy situation involving HBM,” says Park.

That said, SK hynix researchers worked with Purdue University computer scientists on a comprehensive design of an HBM-PIM product called Newton in 2019. Like Samsung’s Aquabolt-XL, it places multiply-and-accumulate units in the memory banks to take advantage of the high bandwidth within the dies themselves.

Meanwhile, Rambus, based in San Jose, Calif., was motivated to explore PIM because of power-consumption issues, says Rambus fellow and distinguished inventor Steven Woo. The company designs the interfaces between processors and memory, and two-thirds of the power consumed by a system-on-chip (SoC) and its HBM memory goes to transporting data horizontally between the two chips. Transporting data vertically within the HBM uses much less energy because the distances are so much shorter. “You might be going 10 to 15 millimeters horizontally to get data back to an SoC,” says Woo. “But vertically you’re talking on the order of a couple hundred microns.”
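A rough way to see why the horizontal hop is so costly: to first order, the energy to move a bit scales with the wire’s capacitance, which scales with its length. The Python sketch below is only a back-of-envelope comparison using the distances Woo quotes; the exact energy ratio depends on drivers, signaling, and process details.

```python
# Back-of-envelope comparison of data-movement distance, illustrative only.
# Energy per bit scales roughly with wire capacitance, which scales with
# wire length; actual figures depend on the interface design.

horizontal_mm = 12.5  # midpoint of the quoted 10-15 mm from HBM stack to SoC
vertical_mm = 0.2     # "a couple hundred microns" through the TSV stack

ratio = horizontal_mm / vertical_mm
print(f"Horizontal path is roughly {ratio:.0f}x longer than the vertical one,")
print("so, to first order, moving a bit off-stack costs on the order of")
print(f"{ratio:.0f}x more energy than moving it within the stack.")
```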

Rambus’s experimental PIM design adds an extra layer of silicon at the top of the HBM stack to do AI computation. To avoid the potential bandwidth bottleneck of the HBM’s central through-silicon vias, the design adds TSVs to connect the memory banks with the AI layer. Having a dedicated AI layer in each memory chip could allow memory makers to customize memories for different applications, argues Woo.

How quickly PIM is adopted will depend on how desperate the makers of AI accelerators are for the memory-bandwidth relief it provides. “Samsung has put a stake in the ground,” says Bob O'Donnell, chief analyst at Technalysis Research. “It remains to be seen whether [PIM] becomes a commercial success.”

This article appears in the January 2022 print issue as "AI Computing Comes to Memory Chips." It was corrected on 30 December to give the correct date for SK hynix's HBM PIM design.

This highly optimized jumping robot launches a staggering 30 meters into the air

Over the last decade or so, we’ve seen an enormous variety of jumping robots. With a few exceptions, these robots look to biology to inspire their design and functionality. This makes sense, because the natural world is full of jumping animals that are absolutely incredible, and matching their capabilities with robots seems like a reasonable thing to aspire to. Inspired by creatures such as ants, frogs, birds, and galagos, robots have tried (and occasionally succeeded, in some specific ways) to mimic their motions.

The few exceptions to this bio-inspired approach have included robots that leverage things like compressed gas and even explosives to jump in ways that animals cannot. The performance of these robots is very impressive, at least partially because their jumping techniques don’t get all wrapped up in biological models that tend to be influenced by nonjumping things, like versatility.

For a group of roboticists from the University of California, Santa Barbara, and Disney Research, this led to a simple question: If you were to build a robot that focused exclusively on jumping as high as possible, how high could it jump? And in a paper published today in Nature, they answer that question with a robot that can jump 33 meters high, which reaches right about eyeball level on the Statue of Liberty.

These videos are unfortunately not all that great, but here’s a decent one of the jumping robot (which the researchers creatively refer to as “our jumper”) launching itself, landing, self-righting, and then launching again.

And here’s a slow-motion close-up of the jump itself.

The jumper is 30 centimeters tall and weighs 30 grams, which is relatively heavy for a robot like this. It’s made almost entirely of carbon fiber bows that act as springs, along with rubber bands that store energy in tension. The center bit of the robot includes a motor, some batteries, and a latching mechanism attached to a string that connects the top of the robot to the bottom. To prepare for a jump, the robot starts spinning its motor, which over the course of 2 minutes winds up the string, squishing the robot down and gradually storing up a kind of ridiculous amount of energy. Once the string is almost completely wound up, one additional tug from the motor trips the latching mechanism, which lets go of the string and releases all of the energy in approximately 9 milliseconds, over which time the robot accelerates from zero to 28 meters per second. All-in, the robot has a specific energy of over 1,000 joules per kilogram, which is enough to propel it about an order of magnitude higher than even the best biological jumpers, and easily triples the height of any other jumping robot in existence.
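For a rough sanity check on those numbers, the short Python sketch below applies basic kinematics to the quoted 28-meter-per-second launch speed. It ignores air drag and the spring’s own mass, so it is not the authors’ model, just arithmetic on the figures above.

```python
# Rough sanity check on the quoted launch figures, ignoring air drag and
# the mass of the spring itself. Not the authors' model; just kinematics.
g = 9.81            # m/s^2
mass_kg = 0.030     # 30-gram robot
launch_speed = 28.0 # m/s, reached in roughly 9 ms (quoted above)

kinetic_energy = 0.5 * mass_kg * launch_speed**2  # energy at takeoff
vacuum_height = launch_speed**2 / (2 * g)         # apex with no drag
avg_acceleration = launch_speed / 0.009           # average over the release

print(f"Kinetic energy at launch: {kinetic_energy:.1f} J")
print(f"Drag-free apex: {vacuum_height:.0f} m (drag brings the real jump nearer 30 m)")
print(f"Average acceleration during release: {avg_acceleration:.0f} m/s^2")
```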

The reason this robot can jump as high as it does is that it relies on a clever bit of engineering that you won’t find anywhere (well, almost anywhere) in biology: a rotary motor. With a rotary motor and some gears attached to a spring, you can use a relatively low amount of power over a relatively long period of time to store lots and lots of energy as the motor spins. Animals don’t have access to rotary motors, so while they do have access to springs (tendons), the amount that those springs can be charged up for jumping is limited by how much you can do with the single power stroke that you get from a muscle. The upshot here is that the best biological jumpers, like the galago, simply have the biggest jumping muscles relative to their body mass. This is fine, but it’s a pretty significant limitation to how high animals can possibly jump.
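The advantage is easy to quantify: stored energy is roughly power multiplied by winding time, minus losses. The sketch below uses an assumed motor power and gearbox efficiency purely to illustrate the arithmetic; these are not measured specs of the robot.

```python
# Illustrative only: why a rotary motor lets a tiny actuator store a lot
# of energy. Stored energy ~ power x winding time, minus losses. The motor
# power and efficiency here are assumed round numbers, not measured specs.

motor_power_w = 0.3    # assumed: a fraction of a watt from a tiny motor
winding_time_s = 120.0 # about 2 minutes of winding, as described above
efficiency = 0.8       # assumed losses in the gearbox and string

stored_energy_j = motor_power_w * winding_time_s * efficiency
print(f"Stored energy: {stored_energy_j:.0f} J")  # roughly 29 J

# A muscle, by contrast, gets one power stroke: whatever work it can do in
# a single contraction is all the energy the tendon ever receives.
```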

While many other robots (stretching back at least a decade) have combined rotary motors and springs for jumping, the key insight behind this Nature paper is that the best way to engineer an optimal jumping robot is to completely invert the biology: Instead of getting bigger jumps through bigger motors, you minimize the motor while using as many tricks as possible to go all in on the spring. The researchers modeled the ratio of muscle to tendon for biological jumpers and found that the best performance comes from a muscle that’s about 30 times the mass of the tendon. But for an engineered jumper, this paper shows that you actually want to invert that mass ratio, and this jumping robot has a spring that’s 1.2 times the mass of the motor. “We were too tied to the animal model,” coauthor Morgan Pope of Disney Research told IEEE Spectrum. “So we’ve been jumping a few meters high when we should be jumping tens of meters high.”

A series of high-speed images showing the robot releasing the tension in its springs and jumping

“Seeing our robot jump for the first time was magical,” first author Elliot Hawkes from UC Santa Barbara told us. “We started with a design much more like a pogo stick before coming to a bow design, then to the hybrid spring design with the rubber bands and bows together. Countless hours went into troubleshooting all kinds of challenging mechanical problems, from gearbox teeth shearing off to hinges breaking to carbon-fiber springs exploding. Every new iteration was just as exciting—the most recent one that jumps over 30 meters just blows your mind when you see it take off in person. It’s so much energy in such a small device!”

Getting the robot to jump even higher (since Statue of Liberty eyeball-height obviously just isn’t good enough) will likely involve using a spring that’s even springier to maximize the amount of energy that the robot can store without increasing its mass. “We have pushed the energy storage pretty far with our hybrid tension-compression spring,” Hawkes says. “But I believe there could be spring designs that could push this even further. We’re at around 2,000 joules per kilogram right now.”

It’s tempting to fixate on the bonkers jump height of this robot and wonder why we don’t toss all those other bio-inspired robots out the window, but it’s important to understand that this thing is very much a unitasker in a way that animals (and the robots built with animals in mind) are not. “We have made an incredibly specialized device that does one thing very well,” says Hawkes. “It jumps very high once in a while. Biological jumpers do many other things way better, and are way more robust.”

With that in mind, it’s true that even the current version of this jumping robot can self-right, jump repetitively, and carry a small payload, like a camera. The researchers suggest that this combination of mobility and efficiency might make it ideal for exploring space, where jumping can get you a lot farther. On the moon, for example, this robot would be able to cover half a kilometer per jump, thanks to lower gravity and no atmospheric drag. “The application we are currently most excited about is space exploration,” Hawkes tells us. “The moon is a truly ideal location for jumping, which opens up new possibilities for exploration because it could overcome challenging terrain. For instance, the robot could hop onto the side of an inaccessible cliff or leap into the bottom of a crater, take samples, and return to a wheeled rover.” Hawkes says that he and his team are currently working with NASA to develop this system with the goal of launching to the moon within the next five years.

The novel technique can help researchers see if AIs reason as hoped and are trustworthy

Charles Q. Choi is a science reporter who contributes regularly to IEEE Spectrum. He has written for Scientific American, The New York Times, Wired, and Science, among others.

MIT researchers developed a method that helps a user understand a machine-learning model’s reasoning, and how that reasoning compares to that of a human.

The way in which artificial intelligence reaches insights and makes decisions is often mysterious, raising concerns about how trustworthy machine learning can be. Now, in a new study, researchers have revealed a new method for comparing how well the reasoning of AI software matches that of a human in order to rapidly analyze its behavior.

As machine learning increasingly finds real-world applications, it becomes critical to understand how it reaches its conclusions and whether it does so correctly. For example, an AI program may appear to have accurately predicted that a skin lesion was cancerous, but it may have done so by focusing on an unrelated blot in the background of a clinical image.

“Machine-learning models are infamously challenging to understand,” says Angie Boggust, a computer science researcher at MIT and lead author of a new study concerning AI’s trustworthiness. “Knowing a model’s decision is easy, but knowing why that model made that decision is hard.”

A common strategy to make sense of AI reasoning examines the features of the data that the program focused on—say, an image or a sentence—in order to make its decision. However, such so-called saliency methods often yield insights on just one decision at a time, and each must be manually inspected. AI software is often trained using millions of instances of data, making it nearly impossible for a person to analyze enough decisions to identify patterns of correct or incorrect behavior.

Now scientists at MIT and IBM Research have created a way to collect and inspect the explanations an AI gives for its decisions, thus allowing a quick analysis of its behavior. The new technique, named Shared Interest, compares saliency analyses of an AI’s decisions with human-annotated databases.

For example, an image-recognition program might classify a picture as that of a dog, and saliency methods might show that the program highlighted the pixels of the dog’s head and body to make its decision. The Shared Interest approach might, by contrast, compare the results of these saliency methods with databases of images where people annotated which parts of pictures were those of dogs.

Based on these comparisons, the Shared Interest method then calls for computing how much an AI’s decision-making aligned with human reasoning, classifying it as one of eight patterns. On one end of the spectrum, the AI may prove completely human-aligned, with the program making the correct prediction and highlighting the same features in the data as humans did. On the other end, the AI is completely distracted, with the AI making an incorrect prediction and highlighting none of the features that humans did.

The other patterns into which AI decision-making might fall highlight the ways in which a machine-learning model correctly or incorrectly interprets details in the data. For example, Shared Interest might find that an AI correctly recognizes a tractor in an image based solely on a fragment of it—say, its tire—instead of identifying the whole vehicle, as a human might, or find that an AI might recognize a snowmobile helmet in an image only if a snowmobile was also in the picture.
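As a concrete illustration of this kind of comparison, the Python sketch below scores the overlap between a saliency mask and a human annotation and buckets the result into a few coarse cases. The overlap metric, threshold, and labels are illustrative simplifications; the published Shared Interest method defines its eight patterns more carefully.

```python
# Simplified sketch of a Shared Interest-style comparison: how much does an
# AI's saliency map overlap with a human-annotated region? The threshold,
# labels, and single-score formulation here are illustrative only.

def iou(saliency: set, human: set) -> float:
    """Intersection-over-union of two sets of pixel indices."""
    if not saliency and not human:
        return 1.0
    return len(saliency & human) / len(saliency | human)

def alignment_pattern(saliency, human, prediction_correct, threshold=0.5):
    overlap = iou(saliency, human)
    if prediction_correct and overlap >= threshold:
        return "human-aligned: right answer, right evidence"
    if prediction_correct:
        return "right answer, but for features humans did not mark"
    if overlap >= threshold:
        return "wrong answer despite looking at the human-marked features"
    return "distracted: wrong answer, wrong evidence"

# Toy example: pixels 0-99 are the dog, the model highlighted pixels 40-129.
human_mask = set(range(0, 100))
saliency_mask = set(range(40, 130))
print(alignment_pattern(saliency_mask, human_mask, prediction_correct=True))
```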

In experiments, Shared Interest helped reveal how AI programs worked and whether they were reliable or not. For example, Shared Interest helped a dermatologist quickly see examples of a program’s correct and incorrect predictions of cancer diagnosis from photos of skin lesions. Ultimately, the dermatologist decided he could not trust the program because it made too many predictions based on unrelated details rather than actual lesions.

In another experiment, a machine-learning researcher used Shared Interest to test a saliency method he was applying to the BeerAdvocate data set, helping him analyze thousands of correct and incorrect decisions in a fraction of the time needed with traditional manual methods. Shared Interest helped show that the saliency method generally behaved as hoped but also revealed previously unknown pitfalls, such as overvaluing certain words in reviews in ways that led to incorrect predictions.

“Providing human users with tools to interrogate and understand their machine-learning models is crucial to ensuring machine-learning models can be safely deployed in the real world,” Boggust says.

The researchers caution that Shared Interest performs only as well as the saliency methods it employs. Each saliency method possesses its own limitations, which Shared Interest inherits, Boggust notes.

In the future, the scientists would like to apply Shared Interest to more kinds of data, such as the tabular data used in medical records. Another potential area of research could be automating the estimation of uncertainty in AI results, Boggust adds.

The scientists have made the source code for Shared Interest and live demos of it publicly available. They will detail their findings on 3 May at the ACM CHI Conference on Human Factors in Computing Systems.
