M&IC’s primary goal is to allow all areas of the Lab access to the cutting-edge systems available to LLNL’s high performance computing (HPC) community.
History and Strategy
M&IC was founded in 1996 by Michel McCoy, then serving as Livermore Computing (LC) and M&IC project leader. Before its establishment, HPC resources were not consolidated: each directorate bought small- to mid-size servers for its own employees or tried to obtain weapons-program accounts on larger machines for research, a process that was time-consuming and confusing. This approach was inefficient at the institutional level; researchers could run only the problems those servers could handle, and several directorates might purchase and operate duplicate servers. In addition, scientists encountered obstacles when communicating with collaborators outside their program because each group had a different understanding of programming models, algorithms, and other software tools, a disconnect that could have harmed LLNL in an all-hands emergency.
McCoy adopted the motto, “No LLNL scientist left behind,” and with these dilemmas in mind, and with the strong encouragement and support of acting Deputy Director for Science and Technology (S&T) Bill Lokke and Director Emeritus Bruce Tarter, he worked with senior Laboratory management, in particular George Miller, then leader of the Council for National Security, to invest in centrally managed resources that would be available to all areas of the Lab. Today, M&IC continues to make these invaluable systems available, working closely with the Advanced Simulation and Computing (ASC) Program to allocate computer cycles to collaborators on- and off-site.
Through its multiprogrammatic aspect, M&IC sells access privileges to LLNL programs, strategic partners, and collaborators, while its institutional aspect provides free computer cycles to Laboratory Directed Research and Development (LDRD), Grand Challenge, and other special projects. McCoy credits Eugene Brooks with proposing this model, which was key to winning the Laboratory’s approval to move ahead with M&IC. Each year, fractions of computers such as Sierra, Vulcan, and Catalyst are set aside for the Grand Challenge program, and a call for proposals is sent out. The most promising proposals win an allocation on one of the supercomputers. Former M&IC Program Director Brian Carnes said, “When a project is granted significant HPC resources to fully develop its S&T, that’s when it starts having value. The Grand Challenge Program allows that value to develop.”
M&IC Deputy Director Greg Tomaschke has managed allocations on M&IC platforms since the program’s beginning. He said, “It has been rewarding to see how Grand Challenge and LDRD projects have been able to use M&IC resources to achieve impressive results, such as those described in the S&TR article about the 10th anniversary of the Grand Challenge program. I recall senior lab managers applauding the program from very early on and recognizing that these efforts are vital to the lab.”
Below is a table of the M&IC machines, the years they were active, and their maximum GF/s.
| Machine Name | Years Active | Maximum GF/s |
|---|---|---|
| T3D | FY97 | 37 |
| Compass | FY97 - FY01 | 70 |
| TC98 | FY99 - FY02 | 176 |
| Sun | FY98 - FY02 | 24 |
| ASCI Blue | FY98 - FY02 | 99 |
| TC2K | FY00 - FY04 | 683 |
| LX | FY01 - FY02 | 101 |
| ASCI Frost | FY01 - FY02 | 326 |
| GPS | FY02 - FY06 | 277 |
| MCR | FY02 - FY06 | 11,078 |
| Qbert | FY03 - FY04 | 12 |
| iLX | FY03 - FY06 | 678 |
| Thunder | FY04 - FY08 | 22,938 |
| Snowbert | FY05 - FY08 | 57 |
| Zeus | CY07 - CY09 | 11,059 |
| Atlas | CY07 - CY11 | 44,237 |
| Yana | CY07 - CY10 | 3,034 |
| uBGL | CY08 - CY12 | 230,000 |
| Hera | CY08 - CY13 | 121,650 |
| Ebert | FY09 | 900 |
| Hive (256 GB*4) | FY09 | 563 |
| Edge | CY10 - CY14 | 29,030 |
| Rzzeus | CY10 - CY15 | 22,118 |
| Ansel | CY10 - CY15 | 43,546 |
| Sierra | CY10 - CY15 | 261,274 |
| Aztec | CY11 - CY15 | 12,902 |
| Rzmerl | CY12 - CY16 | 51,251 |
| Cab | CY12 - CY16 | 410,010 |
| Vulcan | CY13 - CY16 | 5,033,200 |
| Borax | CY16 | 58,100 |
| Ray | CY16 | 896,400 |
| Quartz | CY16 | 3,251,400 |
| Lassen | CY18 / CY19 | 12,377,800 / 13,029,263 |
| Corona | CY19 | 41,984 |
| Dane | CY23 | 10,723,000 |
| Tuolumne | CY24 | TBD |
Governance
An important aspect of governing M&IC is the Institutional Computing Executive Group (ICEG). The ICEG members are well-known LLNL scientists qualified to identify computing deficiencies and request infrastructure improvements. They review M&IC progress and recommend technical directions, and they act as chief representatives for their areas. M&IC management reports to the deputy director for S&T, who provides guidance relative to the institution’s overall S&T goals.
Current M&IC Program Director Becky Springmeyer said, “ICEG provides the customer-based feedback that LC uses to tailor the M&IC environment for maximum impact. We’ve fielded increasingly powerful platforms and infrastructure since the early days of MCR and Thunder, and we’ve also fielded machines specifically tuned to M&IC users’ needs for large memory, accelerators in vis [visualization] clusters, and serial clusters. What LC always provides in addition to the machines is a highly skilled staff running a 24x7x365 HPC center known for depth of expertise and excellent customer service.”
Springmeyer continued, “Within the next two years, we’ll work to ensure that our mission and science programs are effectively leveraging not one new machine but three new machines: the new Quartz Linux cluster (CTS-1), a 54-node early access AT machine equipped for data analytics, and uSierra. In its third decade, M&IC will deliver to LLNL scientists and engineers unprecedented computing power and breadth of architectures as well as depth of computational science expertise.”