Tokyo Tech News

Tokyo Tech's TSUBAME 3.0 and AIST's AAIC ranked 1st and 3rd on the Green500 List


Published: June 27, 2017

At the award ceremony (Fourth from left to right: Professor Satoshi Matsuoka, Specially Appointed Associate Professor Akira Nukada)

  • Tokyo Tech's next-generation supercomputer TSUBAME 3.0 ranks 1st on the Green500 list (a ranking of the most energy-efficient supercomputers).
  • AIST's AI Cloud, AAIC, ranks 3rd in the Green500 list, and 1st among air-cooled systems.
  • These achievements were made possible through collaboration between Tokyo Tech and AIST via the Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL).

The supercomputers at the National University Corporation, Tokyo Institute of Technology (Tokyo Tech) and the National Research and Development Corporation, National Institute of Advanced Industrial Science and Technology (AIST) have been ranked 1st and 3rd, respectively, on the Green500 List [1], which ranks supercomputers worldwide in order of their energy efficiency. The rankings were announced on June 19 (German time) at the international conference ISC HIGH PERFORMANCE 2017 (ISC 2017) in Frankfurt, Germany. This achievement was made possible through the AIST/Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL), which was established on February 20 this year and is headed by Director Satoshi Matsuoka.

The TSUBAME 3.0 supercomputer of the Global Scientific Information and Computing Center (GSIC) at Tokyo Tech will commence operation in August 2017; it can achieve 14.110 GFLOPS [2] per watt. It has been ranked 1st on the Green500 List of June 2017, making it Japan's first supercomputer to top the list.

The AIST AI Cloud (AAIC) supercomputer of the Artificial Intelligence Research Center (AIRC) at AIST commenced operation in April 2017. AAIC has achieved 12.681 GFLOPS per watt and is ranked 3rd on the Green500 List.

Tokyo Tech and AIST have a long history of collaboration in the fields of high-performance computing, energy-efficient computing, and big data. This collaboration led to the establishment of the RWBC-OIL. The current achievements are attributed to this collaboration toward building energy-efficient, high-performance computing platforms.

TSUBAME 3.0 (conceptual drawing)

AIST AI Cloud (AAIC)

TSUBAME 3.0 is the successor to the TSUBAME 2.0/2.5 supercomputer, which has been operating as "Everyone's supercomputer" to support research and development for academic, industrial, and government institutions in Japan since 2010. SGI (now Hewlett Packard Enterprise), NVIDIA, and their subcontractors, along with Tokyo Tech, are carrying out the design, development, and preparation of TSUBAME 3.0. TSUBAME 3.0 has the following features:

  • It has a computing power of 47.2 PFLOPS when using 16-bit floating point (half precision [3]), which is effective for artificial intelligence and big data applications. This is made possible by installing 2,160 NVIDIA P100 GPUs [4] in Hewlett Packard Enterprise's SGI ICE XA energy-efficient supercomputer.
  • Both the computer itself and the cooling system are among the most energy-efficient designs in the world. Its power usage effectiveness (PUE [5]), an indicator of cooling efficiency, is 1.033. This means that only about 3.3% additional power is spent on cooling and other overhead on top of the power used for computing.
  • The Intel Omni-Path architecture provides high multi-casting performance and superior power efficiency that have contributed to the outstanding results.

The research and development of this system was supported by the MEXT-funded projects "Ultra-green Technology for Supercomputer and Cloud Computing Infrastructure" and "Promotion of Research Towards Energy Optimization of Supercomputer and Cloud Computing Infrastructure for the Realization of a Smart Community." Owing to the contributions made by these projects, the testbed system TSUBAME-KFC [6] topped the Green500 list for two consecutive years, 2013 and 2014. TSUBAME 3.0 has inherited the high-temperature liquid cooling technology developed for TSUBAME-KFC.

The AAIC is a shared computing platform for AI and big data. It was introduced as a part of the supplementary budget of 2015 for "Infrastructure Project to Accelerate Research and Development in AI and IoT" in order to accelerate the research, development, and demonstration of AI and IoT technologies by various businesses. AIRC conducted the design and development of AAIC at AIST, adopting the solutions provided by NEC and NVIDIA through competitive bidding. AAIC commenced operation in April 2017, and it has the following features:

  • It has 400 of NVIDIA's latest P100 GPUs and a computing power of 8.6 PFLOPS at 16-bit half precision. It is one of the largest shared computing platforms in Japan for R&D on artificial intelligence.
  • The total power consumption can be maintained below 150 kW by constantly monitoring the power usage of the entire system, including storage and network. This allows the AAIC to be housed in a standard server room, without the need for specialized cooling systems or sacrificing energy efficiency.

The design of AAIC is supported by the technology developed through RWBC-OIL and the collaboration between Tokyo Tech and AIST.

This achievement is the result of a long and diverse collaboration between the two institutions in research on the energy efficiency of large-scale computing. The work at Tokyo Tech was also supported by basic research projects such as the JST-CREST "Extreme Big Data - Convergence of Big Data and HPC for Yottabyte Processing" and "ULP-HPC: Ultra Low-Power, High Performance Computing via Modeling and Optimization of Next Generation HPC Technologies," and by several years of collaboration with NVIDIA through the CUDA Center of Excellence, leading to the development of an energy-efficient HPC system with state-of-the-art GPU technology. The work at AIST was supported by the NEDO "Research and Development of Green Network System Technology" for the application of server operation technology based on power monitoring. With the establishment of RWBC-OIL, the sharing of information on GPU-based computational platforms between the institutions increased, leading to the development of production systems with the highest energy efficiency in the world.

These findings will be used to construct the AI Bridging Cloud Infrastructure (ABCI) [7] to be installed at AIST in 2017. Through RWBC-OIL, the two institutions will continue to work on better system integration for big data applications and big data analytics by utilizing TSUBAME 3.0 and AAIC. Challenges encountered during the operation of these machines will be used to advance hardware design. Driven by the integration of technologies between the two institutions, research at RWBC-OIL will help develop the infrastructure for real-world big data applications and energy-efficient solutions for big data processing, including AI.

1 Green500 List

In contrast to the TOP500 List, which is published every 6 months and ranks the 500 fastest supercomputers by FLOPS, the Green500 List ranks supercomputers by FLOPS per watt.
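
As a rough illustration of the metric, the sketch below computes GFLOPS per watt from a sustained performance figure and a power draw, and orders systems by that value. The function name and the sample machine are hypothetical; only the two efficiency figures (14.110 and 12.681 GFLOPS per watt) come from this article.

```python
def gflops_per_watt(sustained_flops: float, power_watts: float) -> float:
    """Energy efficiency: sustained FLOPS divided by power in watts, in GFLOPS/W."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return sustained_flops / power_watts / 1e9

# Efficiency figures quoted in this article (GFLOPS per watt).
reported = {"AAIC": 12.681, "TSUBAME 3.0": 14.110}

# Order the systems by efficiency, highest first.
ranking = sorted(reported, key=reported.get, reverse=True)

# Hypothetical machine (not from the article): 1 PFLOPS sustained at 100 kW.
example = gflops_per_watt(1e15, 100e3)
```

Because the ranking key is a ratio, a smaller machine can outrank a much faster one as long as it delivers more operations for each watt it draws.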

2 Mega Flops, Giga Flops, Tera Flops, and Peta Flops

Flops (FLOPS, or Floating-point Operations Per Second) is a primary performance index that indicates how many floating-point calculations a given computer performs in one second. Mega, giga, tera, and peta are prefixes indicating multipliers of 10 raised to the 6th (million), 9th (billion), 12th (trillion), and 15th (quadrillion) powers, respectively.
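
The prefixes can be made concrete with a small conversion helper. This is only a sketch: the names `PREFIX_EXPONENT` and `convert` are hypothetical, and the per-GPU figure is a naive average of the quoted aggregate, which may include contributions from components other than the GPUs.

```python
# Powers of ten for the prefixes defined in this note.
PREFIX_EXPONENT = {"mega": 6, "giga": 9, "tera": 12, "peta": 15}

def convert(value: float, from_prefix: str, to_prefix: str) -> float:
    """Re-express a FLOPS figure given with one prefix using another prefix."""
    shift = PREFIX_EXPONENT[from_prefix] - PREFIX_EXPONENT[to_prefix]
    return value * 10.0 ** shift

# 47.2 PFLOPS (TSUBAME 3.0, half precision) expressed in TFLOPS.
tsubame_tflops = convert(47.2, "peta", "tera")

# 8.6 PFLOPS spread over AAIC's 400 GPUs: a naive per-GPU average in TFLOPS.
aaic_tflops_per_gpu = convert(8.6, "peta", "tera") / 400
```

Under these assumptions, 47.2 PFLOPS is about 47,200 TFLOPS, and the AAIC figure works out to roughly 21.5 TFLOPS per GPU.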

3 Half precision

A floating-point number format that uses 2 bytes (16 bits) and provides about 3.3 significant decimal digits of precision. The latest GPUs perform half-precision calculations much faster than double-precision (8 bytes, 16 digits) or single-precision (4 bytes, 7 digits) calculations. Half-precision calculations are heavily used in machine learning and AI applications.
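
The figure of 3.3 digits can be checked with a short sketch, assuming the IEEE 754 binary16 layout (10 stored fraction bits plus one implicit leading bit), which is the format Python's `struct` module uses for its `e` code. The helper name `to_half` is hypothetical.

```python
import math
import struct

def to_half(x: float) -> float:
    """Round x to the nearest half-precision value and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# 10 stored fraction bits plus 1 implicit bit give an 11-bit significand.
significand_bits = 11
decimal_digits = significand_bits * math.log10(2)  # about 3.3

# Pi keeps only its first few digits after rounding to half precision.
pi_half = to_half(math.pi)
pi_error = abs(math.pi - pi_half)
```

Rounding pi this way leaves an error of about 0.001, which is consistent with roughly three reliable decimal digits.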

4 GPU (Graphics Processing Unit)

GPUs were originally designed to accelerate graphics applications, but their capability for general-purpose computing has increased, and they have recently been used as general-purpose processors in high-performance computing.

5 PUE (Power Usage Effectiveness)

PUE is an indicator of the cooling efficiency of data centers and supercomputers. It is the ratio of the total power consumption of the system to the power consumption of the computing equipment. The closer the number is to 1.0, the higher the cooling efficiency.
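
The ratio can be written out as a minimal sketch. The function names and the sample facility are hypothetical; the value 1.033 is the PUE quoted for TSUBAME 3.0 in this article.

```python
def pue(total_power_kw: float, computing_power_kw: float) -> float:
    """Power usage effectiveness: total power divided by computing-equipment power."""
    if computing_power_kw <= 0:
        raise ValueError("computing_power_kw must be positive")
    return total_power_kw / computing_power_kw

def overhead_fraction(pue_value: float) -> float:
    """Cooling and other overhead power, as a fraction of computing power."""
    return pue_value - 1.0

# PUE quoted for TSUBAME 3.0: about 3.3% overhead on top of computing power.
tsubame_overhead = overhead_fraction(1.033)

# Hypothetical facility (not from the article): 1,033 kW total for 1,000 kW of computing.
example_pue = pue(1033.0, 1000.0)
```

A PUE of exactly 1.0 would mean that no power at all is spent outside the computing equipment, so the overhead fraction is simply the amount by which the PUE exceeds 1.0.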

6 TSUBAME-KFC

A test system similar to TSUBAME for experimenting with energy efficiency. It employs oil-immersed cooling. It was ranked 1st in the Green500 List in November 2013 and June 2014.

7 AI Bridging Cloud Infrastructure (ABCI)

ABCI is a cloud system that is planned to be installed at AIST by the end of this year. Its energy efficiency will be similar to that of TSUBAME 3.0, and it is expected to have one of the highest AI processing powers while remaining the most energy-efficient system in the world.

Further information

Global Scientific Information and Computing Center
Tokyo Institute of Technology

Email kib.som@jim.titech.ac.jp
Tel +81-3-5734-2087

Contact

Public Relations Section,
Tokyo Institute of Technology

Email media@jim.titech.ac.jp
Tel +81-3-5734-2975
