High Performance Computing in Clouds: Moving HPC Applications to a Scalable and Cost-Effective Environment

Borin, Edson; Drummond, Lúcia Maria A.; Gaudiot, Jean-Luc

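  • Publisher: Springer
  • Publication date: 2024-07-06
  • Price: $7,030
  • VIP price: $6,679 (5% off)
  • Language: English
  • Pages: 334
  • Binding: Quality Paper - also called trade paper
  • ISBN: 3031297717
  • ISBN-13: 9783031297717
  • Related categories: JVM languages
  • Overseas special-order book (requires separate checkout)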

Product description

This book offers a thorough explanation of the path needed to use cloud computing technologies to run High-Performance Computing (HPC) applications. Besides presenting the motivation for moving HPC applications to the cloud, it covers both essential and advanced issues on this topic, such as deploying HPC applications and infrastructures, designing cloud-friendly HPC applications, and optimizing a provisioned cloud infrastructure to run this family of applications. It also describes best practices for keeping HPC applications running in the cloud by employing fault-tolerance techniques and avoiding resource wastage.

To give practical meaning to the topics covered, the book presents case studies in which HPC applications used in relevant scientific areas, such as bioinformatics and the oil and gas industry, were moved to the cloud. It also discusses how to train deep learning models in the cloud, elucidating the key components and aspects necessary to train these models via the different types of services offered by cloud providers.

Despite the vast bibliography on cloud computing and HPC, to the best of our knowledge, no existing manuscript has comprehensively covered these topics and discussed the steps, methods, and strategies to execute HPC applications in clouds. Therefore, we believe this title is useful for IT professionals, students, and researchers interested in cutting-edge technologies, concepts, and insights focusing on the use of cloud technologies to run HPC applications.

About the authors

Edson Borin: Prof. Edson Borin is an associate professor at the Institute of Computing at the University of Campinas (Unicamp) and has been working there since 2010. Prior to joining Unicamp, he was a researcher at Intel Labs in California, where he developed dynamic compilation techniques to improve next-generation HW/SW co-designed microprocessors. He also used the microcode compression algorithms he had developed in his PhD thesis to enhance the manufacturing process of Intel microprocessors, earning four divisional recognition awards. At Unicamp, Prof. Borin applies his expertise in modern computer architecture and compilers to optimize the performance and cost of scientific and engineering computing. He leads the Discovery laboratory, which is supported by government agencies such as Fapesp, CNPq and Capes, international technology companies like Intel, AMD, Samsung, Motorola, and Cadence/Tensilica, and major Brazilian corporations such as Petrobras. Several of his research works have been particularly geared towards optimizing the execution of seismic-processing and deep-learning applications on cloud infrastructure. In addition to his research contributions, Prof. Borin has authored eight patents, a technical book on assembly programming, and over 100 papers in international conferences and journals. He has supervised over 22 doctoral and master's students, many of whom have received recognition for their exceptional theses, dissertations, and papers.

Lúcia Maria A. Drummond: Prof. Lúcia Drummond obtained her D.Sc. in Systems Engineering and Computer Science from the Federal University of Rio de Janeiro, Brazil, in 1994, where she was part of the group that developed the first Brazilian parallel computer. She has been in the Department of Computer Science of the Fluminense Federal University (UFF) since 1989, where she is now a Full Professor. She currently teaches in the undergraduate and graduate programs, advising a number of master's and doctoral students. She is a Level 1 Researcher at CNPq (a Brazilian research agency), with more than 100 publications in journals and proceedings of national and international conferences. Her research interests are parallel and distributed computing, including theory and applications. She has been invited to give talks at Université Paris-Sud, École de Mines, Université d'Avignon et des Pays du Vaucluse, and Université Sorbonne in France, where she has also co-advised Ph.D. students.

Jean-Luc Gaudiot: Prof. Jean-Luc Gaudiot received the Diplôme d'Ingénieur from the École Supérieure d'Ingénieurs en Electronique et Electrotechnique, Paris, France in 1976 and the M.S. and Ph.D. degrees in Computer Science from the University of California, Los Angeles in 1977 and 1982, respectively. He is currently Distinguished Professor in the Electrical Engineering and Computer Science Department at the University of California, Irvine, where he was department Chair from 2003 to 2009. Prior to joining UCI in January 2002, he had been a Professor of Electrical Engineering at the University of Southern California since 1982, where he served as Director of the Computer Engineering Division for three years. He has also designed distributed microprocessor systems at Teledyne Controls, Santa Monica, California (1979-1980) and performed research in innovative architectures at the TRW Technology Research Center, El Segundo, California (1980-1982). He frequently acts as a consultant to companies that design high-performance computer architectures and has served as an expert witness in patent infringement and product liability cases. His research interests include programmability of parallel systems, hardware computer security, and design of Autonomous Driving Systems. He has published nearly 300 journal and conference papers. His research has been sponsored by NSF, DoE, and DARPA, as well as a number of industrial organizations. From 2006 to 2009, he was the first Editor-in-Chief of the IEEE Computer Architecture Letters, a new publication of the IEEE Computer Society, which he helped found to facilitate the short, fast-turnaround publication of fundamental ideas in the Computer Architecture domain. From 1999 to 2002, he was the Editor-in-Chief of the IEEE Transactions on Computers. In June 2001, he was elected chair of the IEEE Technical Committee on Computer Architecture and re-elected in June 2003 for a second two-year term. In 2009, he was elected to the Board of Governors of the IEEE Computer Society for a three-year term. He was the Chair of the IEEE Computer Society Publications Board Transactions Operations Committee (2010-2011), the Chair of the IEEE Computer Society Publications Board Magazines Operations Committee in 2012, the IEEE Computer Society Vice President, Educational Activities Board in 2013, and the IEEE Computer Society Vice President, Publications Board in 2014-2015. He served as the 2017 IEEE Computer Society President. Dr. Gaudiot is a member of AAAS, ACM, and IEEE. He has also chaired the IFIP Working Group 10.3 (Concurrent Systems). He was co-General Chairman of the 1992 International Symposium on Computer Architecture, Program Committee Chairman of the 1993 IFIP Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, the 1993 IEEE Symposium on Parallel and Distributed Processing (Systems Track), the 1995 Parallel Architectures and Compilation Techniques Conference (PACT '95), the High Performance Computer Architecture conference in 1999 (HPCA-5), and the 2005 International Parallel and Distributed Processing Symposium. In 1999, he became a Fellow of the IEEE, "For Contributions to the Programmability and Reliability of Dataflow Architectures." He was elevated to the rank of AAAS Fellow in 2007, "For Distinguished Contributions to the Design and Analysis of Highly Efficient Multiprocessor and Memory System Architectures."

Alba Melo: Prof. Alba Cristina Magalhaes Alves de Melo obtained her PhD degree in Computer Science from the Institut National Polytechnique de Grenoble (INPG), France, in 1996. In 2008, she did a postdoc at the University of Ottawa, Canada; in 2011, she was invited as Guest Scientist at Université Paris-Sud, France; and in 2013 she did a sabbatical at the Universitat Politècnica de Catalunya, Spain. Since 1997, she has worked at the Department of Computer Science at the University of Brasilia (UnB), Brazil, where she is now a Full Professor. She is also a CNPq Research Fellow level 1D in Brazil. She was the Coordinator of the Graduate Program in Informatics at UnB for several years (2000-2002, 2004-2006, 2008, 2010, 2014) and she coordinated international collaboration projects with the Universitat Politècnica de Catalunya, Spain (2012, 2014-2016) and with the University of Ottawa, Canada (2012-2015). In 2016, she received the Brazilian Capes Award for "Advisor of the Best PhD Thesis in Computer Science". Her research interests are High Performance Computing, Bioinformatics and Cloud Computing. She has advised 2 postdocs, 4 PhD theses and 22 MSc dissertations. Currently, she advises 4 PhD students and 2 MSc students. She is a Senior Member of the IEEE and a Member of the Brazilian Computer Society. She has given invited talks at Universität Karlsruhe, Germany, Université Paris-Sud, France, Universitat Politècnica de Catalunya, Spain, University of Ottawa, Canada, and Universidad de Chile, Chile. She currently has 91 papers listed at DBLP.

Maicon Melo Alves: Dr. Maicon Melo Alves obtained his D.Sc. degree in Computer Science from the Fluminense Federal University (UFF), Brazil, in 2018, and received his M.Sc. degree in Computer Science from Rio de Janeiro Federal University (UFRJ), Brazil, in 2012. He received the best paper award (2015) and an honorable mention for his D.Sc. thesis (2019) at WSCAD, the foremost Brazilian conference in the high-performance computing area. He has over 25 years of experience in IT infrastructure and, since 2006, has worked as a systems analyst at Petrobras, the Brazilian state oil and gas company, working with high performance computing systems used to execute geoscience applications. In 2021, he joined the executive committee of the Regional Commission for High Performance Computing of the State of Rio de Janeiro and completed the MBA in Data Science at the Pontifícia Universidade Católica of Rio de Janeiro (PUC-RIO). He has published two books, as well as papers in international journals and proceedings of national conferences. His research interests include high performance computing, parallel and distributed computing, cloud computing and artificial intelligence.

Philippe Olivier Alexandre Navaux: Prof. Philippe Olivier Alexandre Navaux is a retired professor of the Informatics Institute at the Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil, where he had worked since 1971. He graduated in Electronic Engineering from UFRGS, Brazil, in 1970, received a master's degree in applied physics from UFRGS in 1973, and obtained a PhD in Computer Science from the Grenoble National Institute of Technology (INPG), Grenoble, France, in 1979. He has taught graduate and undergraduate courses on Computer Architecture and High Performance Computing. He is the leader of the GPPD, the Parallel and Distributed Processing Group, with projects financed by the government agencies Finep, CNPq, and Capes, and international cooperation with groups from France, Germany, Spain and the USA, with funding from the EU, CNPq and CAPES. Besides cooperation projects with the academic sector, he has conducted several research projects with private companies: Petrobras, Microsoft, Intel, HP, DELL, Altus and Itautec. He has supervised more than 100 master's and PhD students and has published nearly 400 papers in journals and conferences. He is a member of the SBC (Brazilian Computer Society), the SBPC (Brazilian Society for Scientific Progress), the ACM (Association for Computing Machinery), and the IEEE (Institute of Electrical and Electronics Engineers). He has been a consultant to various national and international funding organizations, including DoE (USA), ANR (FR), FINEP, CNPq, CAPES, FAPESP, FAPERGS, FAPEMIG, FACEPE and others. He was a member of the Superior Council of FAPERGS (a Brazilian agency for supporting research) and of the CTC, the Scientific and Technical Council of the LNCC/MCT. He was coordinator of the Computing Area Committee of Capes/MEC (Higher Education Personnel Training Coordination / Ministry of Education).
