History, Strategy, and Governance

 

M&IC’s primary goal is to allow all areas of the Lab access to the cutting-edge systems available to LLNL’s high performance computing (HPC) community.

History and Strategy

M&IC was founded in 1996 by Michel McCoy, then serving as Livermore Computing (LC) and M&IC project leader. Prior to its establishment, HPC resources were not consolidated. Each directorate would buy small- to mid-size servers for its own employees or try to get weapons’ accounts on larger machines for research, which was time-consuming and confusing. This approach was inefficient on an institutional level: researchers could run only the problems that these servers could handle, and several directorates might purchase and run duplicate servers. In addition, scientists encountered obstacles when communicating with collaborators outside of their program, because the two groups had different understandings of programming models, algorithms, and other software tools. This disconnect could have been harmful to LLNL should an all-hands emergency occur.

McCoy then adopted the motto, “No LLNL scientist left behind.” With the above dilemmas in mind, and with the strong encouragement and support of acting Deputy Director for Science and Technology (S&T) Bill Lokke and Director Emeritus Bruce Tarter, he worked with senior laboratory management, in particular then Leader of the Council for National Security George Miller, to invest in centrally managed resources that would be available to all areas of the Lab. Today, M&IC continues to make these invaluable systems available, working closely with the Advanced Simulation and Computing (ASC) Program to allocate computer cycles to collaborators on- and off-site.

The multiprogrammatic aspect of M&IC sells access privileges to LLNL programs, strategic partners, and collaborators, while the institutional aspect provides free computer cycles to Laboratory Directed Research and Development (LDRD), Grand Challenge, and other special projects. McCoy credits Eugene Brooks for proposing this model, which was key to the lab’s approval to move ahead with M&IC. Each year, fractions of computers such as Sierra, Vulcan, and Catalyst are set aside for use in the Grand Challenge program, and a call for proposals is sent out. The most promising proposals win an allocation on one of the supercomputers. Former M&IC Program Director Brian Carnes said, “When a project is granted significant HPC resources to fully develop its S&T, that’s when it starts having value. The Grand Challenge Program allows that value to develop.”

M&IC Deputy Director Greg Tomaschke has managed allocations on M&IC platforms since the program’s beginning. He said, “It has been rewarding to see how Grand Challenge and LDRD projects have been able to use M&IC resources to achieve impressive results, such as those described in the S&TR article about the 10th anniversary of the Grand Challenge program. I recall senior lab managers applauding the program from very early on and recognizing that these efforts are vital to the lab.”

The table below lists the M&IC machines, the years they were active, and their maximum performance in GF/s.

 

| Machine Name | Years Active | Maximum GF/s |
| --- | --- | --- |
| T3D | FY97 | 37 |
| Compass | FY97 - FY01 | 70 |
| TC98 | FY99 - FY02 | 176 |
| Sun | FY98 - FY02 | 24 |
| ASCI Blue | FY98 - FY02 | 99 |
| TC2K | FY00 - FY04 | 683 |
| LX | FY01 - FY02 | 101 |
| ASCI Frost | FY01 - FY02 | 326 |
| GPS | FY02 - FY06 | 277 |
| MCR | FY02 - FY06 | 11,078 |
| Qbert | FY03 - FY04 | 12 |
| iLX | FY03 - FY06 | 678 |
| Thunder | FY04 - FY08 | 22,938 |
| Snowbert | FY05 - FY08 | 57 |
| Zeus | CY07 - CY09 | 11,059 |
| Atlas | CY07 - CY11 | 44,237 |
| Yana | CY07 - CY10 | 3,034 |
| uBGL | CY08 - CY12 | 230,000 |
| Hera | CY08 - CY13 | 121,650 |
| Ebert | FY09 | 900 |
| Hive (256 GB*4) | FY09 | 563 |
| Edge | CY10 - CY14 | 29,030 |
| Rzzeus | CY10 - CY15 | 22,118 |
| Ansel | CY10 - CY15 | 43,546 |
| Sierra | CY10 - CY15 | 261,274 |
| Aztec | CY11 - CY15 | 12,902 |
| Rzmerl | CY12 - CY16 | 51,251 |
| Cab | CY12 - CY16 | 410,010 |
| Vulcan | CY13 - CY16 | 5,033,200 |
| Borax | CY16 | 58,100 |
| Ray | CY16 | 896,400 |
| Quartz | CY16 | 3,251,400 |
| Lassen | CY18 / CY19 | 12,377,800 / 13,029,263 |
| Corona | CY19 | 41,984 |
| Dane | CY23 | 10,723,000 |
| Tuolumne | CY24 | TBD |

Governance

An important aspect of governing M&IC is the Institutional Computing Executive Group (ICEG). The ICEG members are well-known LLNL scientists qualified to identify computing deficiencies and request infrastructure improvements. They review M&IC progress and recommend technical directions, and they act as chief representatives for their areas. M&IC management reports to the deputy director for S&T, who provides guidance relative to the institution’s overall S&T goals.

Current M&IC Program Director Becky Springmeyer said, “ICEG provides the customer-based feedback that LC uses to tailor the M&IC environment for maximum impact. We’ve fielded increasingly powerful platforms and infrastructure since the early days of MCR and Thunder, and we’ve also fielded machines specifically tuned to M&IC users’ needs for large memory, accelerators in vis [visualization] clusters, and serial clusters. What LC always provides in addition to the machines is a highly skilled staff running a 24x7x365 HPC center known for depth of expertise and excellent customer service.”

Springmeyer continued, “Within the next two years, we’ll work to ensure that our mission and science programs are effectively leveraging not one new machine but three new machines: the new Quartz Linux cluster (CTS-1), a 54-node early access AT machine equipped for data analytics, and uSierra. In its third decade, M&IC will deliver to LLNL scientists and engineers unprecedented computing power and breadth of architectures as well as depth of computational science expertise.”