Global Scientific Information and Computing Center (GSIC)
"Tokyo Tech's supercomputer TSUBAME[1] achieves world's highest power efficiency," reads a June 21, 2017 newspaper headline.
Professor Satoshi Matsuoka smiles.
The Tokyo Tech Global Scientific Information and Computing Center (GSIC) professor is the lead developer of TSUBAME3.0, the latest version of the Institute's supercomputer, which ranked No.1 on the Green500 list.[2] Introduced in 2007, Green500 was designed to encourage improvements in supercomputer power efficiency, with rankings from 1 to 500 published twice each year. Another supercomputer, AIST AI Cloud (AAIC), developed by the National Institute of Advanced Industrial Science and Technology (AIST), was third in these rankings. The performance of these two machines demonstrates the highly advanced power efficiency of Japanese supercomputers. Matsuoka was also involved in designing AAIC.
TSUBAME, which gets its name from the Tokyo Tech seal, is an extremely high-efficiency large-scale cluster-type supercomputer. It currently provides supercomputing services to both academic and industrial users. Tokyo Tech and many other universities, research institutes, and private companies have been achieving outstanding results with this leading-edge machine.
Computing nodes × 540
Configuration (per node)
Intel Xeon E5-2680 v4 2.4 GHz × 2 CPUs
14 cores / 28 threads per CPU
NVIDIA Tesla P100 for NVLink-Optimized Servers × 4
Intel Omni-Path HFI 100 Gbps × 4
Matsuoka started research on cluster supercomputers in 1996. On the basis of his findings, he started full-scale development of TSUBAME in 2004. Since the first production startup in March 2006, the TSUBAME series has been recognized as demonstrating the leading edge of supercomputing technology. The second incarnation of the machine, TSUBAME2.0, went into production in November 2010, and was followed by the TSUBAME2.5 upgrade in September 2013. The brand new TSUBAME3.0 was installed and began production in August 2017.
"This was the third time that TSUBAME ranked No.1 on the Green500 list. TSUBAME-KFC,[3] which was developed as an experimental prototype for TSUBAME3.0, was ranked No.1 in November 2013 and June 2014. This, however, is the first time it was rated No.1 in the world as a petaflops-scale production supercomputer, which is very meaningful for us," says Matsuoka.
The Green500 list assesses power efficiency based on processing speed[4] per 1 W of power consumption. This figure does not include the power required to cool the supercomputer, which is a significant addition to the base machine power in actual use, sometimes almost equaling the machine power itself. Cooling TSUBAME3.0, however, accounts for only 3 percent of its total power consumption. This is roughly one tenth of that required by other supercomputers, proving that Matsuoka's creation is truly a practical supercomputer offering extremely high power efficiency.
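To make the metric concrete, the Green500 figure can be worked through numerically: benchmark speed divided by power consumption. The sketch below uses hypothetical numbers for illustration, not TSUBAME3.0's published results.

```python
# A minimal sketch of a Green500-style power-efficiency calculation:
# benchmark speed divided by machine power consumption.
# All figures below are hypothetical, for illustration only.

def gflops_per_watt(speed_pflops, power_kw):
    """Power efficiency in gigaflops per watt."""
    gflops = speed_pflops * 1e6      # 1 petaflops = 1,000,000 gigaflops
    watts = power_kw * 1e3           # 1 kW = 1,000 W
    return gflops / watts

# Hypothetical petaflops-scale machine: 1.0 petaflops at 100 kW.
print(gflops_per_watt(1.0, 100.0))          # 10.0 GFLOPS/W

# Cooling is excluded from the Green500 figure but matters in practice:
# if cooling adds only a few percent to the machine power, effective
# efficiency barely drops; if it nearly doubles the power draw, as on
# some systems, effective efficiency is roughly halved.
print(gflops_per_watt(1.0, 100.0 * 1.03))   # about 9.7 GFLOPS/W
print(gflops_per_watt(1.0, 100.0 * 2.0))    # 5.0 GFLOPS/W
```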
TSUBAME also boasts world-class processing speed. TSUBAME2.0 was ranked No.4 in the world and No.1 in Japan on the Top500 list in November 2010, when it began service. In addition, Assistant Professor Takashi Shimokawabe and Associate Professor Akira Nukada at GSIC used TSUBAME2.0 to create the world's first detailed simulation of alloy crystallization. These remarkable achievements were recognized with an ACM Gordon Bell Prize, the most prestigious award in the field of supercomputing, presented jointly to Matsuoka, Shimokawabe, Shimokawabe's supervisor and joint researcher Professor Takayuki Aoki, and Nukada.
TSUBAME has contributed to high-performance computing (HPC) with remarkable processing speed, power efficiency, and low cost. The newest version, TSUBAME3.0, is also expected to achieve top performance in Japan in simulation science, as well as in new workloads such as AI and big data processing.
Matsuoka explains. "Theory and experimentation have traditionally played significant roles in scientific development. Recently, however, computational science represented by computer simulation has come to play a significant role as the third wave in science, and moreover, data science has rapidly attracted attention as the fourth wave of science. TSUBAME3.0 realizes the high-performance application of data science ahead of other supercomputers, and this has attracted attention around the globe."
What enables Matsuoka to stay at the forefront of world-class supercomputer development?
"The reason I incorporated AI and big data functions into TSUBAME3.0 was not because people started paying attention to them. When the potential for a rapid explosion in information was first considered more than ten years ago, we predicted that larger scale data processing would be required and started basic research on how to make it happen on a supercomputer. TSUBAME3.0 is the result of this longstanding research. Rather than responding to change, we anticipated and started research well in advance of it," says Matsuoka.
The professor always has his eyes set 10 to 20 years into the future, something that has not changed since he started work on the first TSUBAME. "I always seek to set rather than follow the trend when I conduct research and development."
Matsuoka and his team were the first to deploy graphics processing units (GPUs) on a large-scale supercomputer. TSUBAME became known throughout the world for a hybrid architecture that integrates scalar operation by CPU and vector operation by GPU. This first large-scale application of GPUs enabled astonishingly high processing speed and power efficiency.
Since then, the development of supercomputers using GPUs has become a global standard. Supercomputers used at the Oak Ridge National Laboratory in the United States also use GPUs, and Matsuoka is recognized as the pioneer of general-purpose computing on graphics processing units (GPGPU).
Matsuoka explains GPU application. "There are two ways to improve the performance of supercomputers. One is to increase CPU processing speed at the cost of increased power consumption, and the other is to increase the number of low-power CPUs. Although the former method was often employed in the past, power efficiency has become more important due to limits on the power a facility can supply to a machine. Therefore, I initiated basic research on power efficiency, and implemented the technology in the first TSUBAME. However, conventional low-power CPUs reached their limits very quickly, especially as machine size increased. As such, I started basic research on using non-conventional processors such as GPUs, which offer extreme parallelism and high power efficiency, and examined whether they are amenable to general-purpose high-performance computing on a supercomputer."
The professor has always focused on commercially available processors rather than custom-made CPUs. GPUs fit that bill, but they were not initially meant for general-purpose computing. After extensive research, Matsuoka found that GPUs could achieve five to six times the power efficiency of CPUs in general high-performance computing workloads, in terms of both computation and memory, provided the programming was done well.
Although programming was not possible with the first GPUs, programmable GPUs were developed in response to the need for complex computer graphics. Matsuoka did not miss this. He thought that the realization of extreme parallel computing utilizing a large number of GPUs would be useful for HPC and could achieve significant power efficiency. He started basic research in 2003.
Looking back on that period, Matsuoka comments, "Hard-core supercomputer experts at the time called me crazy! Also, GPU manufacturers themselves told me that GPUs were designed and developed for graphics workstations, not for supercomputers. However, for someone who had made computers by himself since middle school, and who had thoroughly learned about the principles of computer science at university, the application of GPUs made perfect sense."
This is one of Matsuoka's strengths. He anticipates functions that might be required and starts basic research on them. He also emphasizes the importance of shifting from basic to applied research promptly, anticipating potential applications, and realizing practical use.
In fact, Matsuoka made the prototype TSUBAME available at an early stage and allowed researchers from a wide range of fields to use it, enabling the professor to identify various issues during actual operation. This approach advanced his research and development. "This must be the reason I have maintained my position at the leading edge of supercomputer development all these years."
Matsuoka is planning to make future generations of TSUBAME more powerful. However, he also thinks about the future beyond this. "Computers have greatly improved in accordance with Moore's law.[5] This has brought qualitative improvements such as those that allow us to watch high-definition movies via the internet on our smartphones. The innovations in information technology achieved by Moore's law are immeasurable. However, Moore's law is nearing its limit. In about 10 years, we will enter the post-Moore era. Now is the time to consider the next method of achieving continued scalability in computing performance. Otherwise, opportunities for disruptive innovations in IT will be mostly lost."
Matsuoka mentions quantum computers as one of the methods. "Quantum computers, however, are not a comprehensive solution. We cannot replace most of the existing general-purpose computers with quantum computers. Therefore, I am focusing on other computing methods too, such as neuromorphic computers and near-memory computing."
Computer simulation has traditionally been based on reductionism: segmenting a phenomenon into smaller elements and observing the interactions between those elements to understand the phenomenon as a whole. In this framework, improving the accuracy of a simulation amounts to increasing its resolution. In the post-Moore era, however, we can no longer expect further increases in resolution. Instead of applying reductionism, AI and neuromorphic computing concepts attempt to forecast the future by observing and grasping phenomena in their entirety.
Matsuoka continues, "The idea is to make deductions from data like the human brain does without using simulation. This, it is argued, reduces the amount of computations significantly and increases the processing speed greatly. AI and big data play important roles in the deduction from data, as seen by deep neural networks of today. Processing efficiency of AI using neuromorphic principles could further improve the efficiency significantly. Therefore, AI, big data, and neuromorphic computing will be closely associated in the post-Moore era. One of the reasons that I focused on AI and big data processing in the development of TSUBAME3.0 was to prepare for this era."
To conclude, Matsuoka gives young researchers and students the following message: "Power efficiency, increasing the processing speed through the utilization of GPUs, and application of AI and big data processing were all realized after carrying out basic studies for more than 10 years. It takes a tremendous amount of time to see concrete outcomes from such research. Looking back on my experience as a researcher, I can say that the most important issue to consider is what you hope to achieve from the research, and then work steadily toward that goal so that you arrive ahead of others. This is true in all fields of research."
[1] Abbreviation of Tokyo-tech Supercomputer UBiquitously Accessible Mass-storage Environment.
[2] While the Top500 project ranks and details the 500 most powerful supercomputer systems in the world every six months based on benchmark speed performance, the Green500 project does so based on power efficiency (speed performance / power consumption).
[3] Like other supercomputers in the TSUBAME series, TSUBAME-KFC also included GPUs to demonstrate power efficiency performance. It applied an oil-immersion cooling system and ranked No.1 in the world on the Green500 list in November 2013 and June 2014.
[4] Flops (FLOPS, floating-point operations per second) is a primary performance index that indicates how many floating-point calculations are performed in one second by a given computer. Giga, tera, and peta are prefix multipliers indicating 10 to the ninth (billion), twelfth (trillion), and fifteenth (quadrillion) powers respectively. One petaflops, therefore, is one quadrillion (thousand trillion) calculations per second.
[5] This law states that the number of transistors in dense integrated circuits doubles approximately every 18 to 24 months. It was first put forward by Gordon Moore, co-founder of Intel Corporation, in 1965, based on his observations of the industry.
The Special Topics component of the Tokyo Tech Website shines a spotlight on recent developments in research and education, achievements of its community members, and special events and news from the Institute.
Published: November 2017