Big Data Infrastructure Technologies for Data Analytics: Scaling Data Science Applications for Continuous Growth

Demchenko, Yuri, Cuadrado-Gallego, Juan J., Chertov, Oleg

  • Publisher: Springer
  • Publication date: 2024-10-26
  • List price: $4,050
  • VIP price: $3,848 (5% off)
  • Language: English
  • Pages: 544
  • Binding: Hardcover - also called cloth, retail trade, or trade
  • ISBN: 3031693655
  • ISBN-13: 9783031693656
  • Related categories: Big Data, Data Science
  • Overseas special-order title (must be checked out separately)

Product description

This book provides a comprehensive overview and introduction to Big Data Infrastructure technologies, existing cloud-based platforms, and tools for Big Data processing and data analytics, combining both a conceptual approach in architecture design and a practical approach in technology selection and project implementation.

Readers will learn the core functionality of the major Big Data Infrastructure components and how they integrate to form a coherent solution with business benefits. Particular attention is given to understanding and using the Apache Hadoop ecosystem, the major Big Data platform, and its main functional components: MapReduce, HBase, Hive, Pig, Spark, and streaming analytics. The book also covers enterprise and research data management and governance, and explains modern approaches to cloud and Big Data security and compliance.

The book covers two knowledge areas defined in the EDISON Data Science Framework (EDSF): Data Science Engineering, and Data Management and Governance. It can be used as a textbook for university courses, or as a basis for practitioners' further self-study, practical use of Big Data technologies, and competent evaluation and implementation of practical projects in their organizations.

About the authors

Dr. Yuri Demchenko is a Senior Researcher and lecturer at the Complex Cyber Infrastructure Research Group of the University of Amsterdam. He graduated from the National Technical University of Ukraine "Kyiv Polytechnic Institute", where he also received his PhD degree. His main research areas include Data Science and Data Management, Big Data Infrastructure and Technologies for Data Analytics, DevSecOps, and general security architectures. He has been involved in many European projects, such as EGEE, GEANT4, FAIRsFAIR, and SLICES-DS. His current work focuses on building the European SLICES Research Infrastructure for experimentation on emerging digital technologies in the SLICES-PP project, and on developing foundations for improving energy efficiency and reducing the environmental impact of future digital research infrastructures (RIs) in the GreenDIGIT project. He actively researches the architectural and design aspects of research data management infrastructure for experimental research reproducibility and automation.

J. Cuadrado-Gallego, PhD is an Associate Professor in the Department of Computer Science at the University of Alcalá, Madrid, Spain, in the area of Computer Science and Artificial Intelligence. He has been a Visiting Associate Professor in the Department of Computer Science and Software Engineering of Concordia University in Montreal, Canada, and in the Department of Software and IT Engineering of the École de Technologie Supérieure in Montreal, Canada. He has also been a Visiting Professor at the National Polytechnic Institute in Mexico City, Mexico. Juan J. Cuadrado-Gallego holds an MRes, MSc, and BSc in Physics from the Complutense University of Madrid, Spain, and a PhD in Computer Science from the Carlos III University of Madrid. In 2010, he obtained the Outstanding Research Pathway certification from the National Agency for Evaluation and Prospective of the Ministry of Science and Innovation, within the I3 Program. Dr. Cuadrado-Gallego has carried out research stays at the University of Amsterdam, The Netherlands; the Otto-von-Guericke-University, Magdeburg, Germany; the University of Reading, UK; and the Università Roma Tre in Rome, Italy.

Prof. Dr. Oleg Chertov is the Head of the Applied Mathematics Department at the National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" and the author of the textbook "Calculus for Programmers" (2017). He received his Master's degree in Applied Mathematics (1987) and a PhD degree in Engineering Sciences (1991) from the same university. He is a Habil. Dr. (Doctor in Engineering Sciences, 2014) from the Institute of Mathematical Machines and Systems Problems of the National Academy of Sciences of Ukraine. He has been a university project coordinator in several Horizon 2020 and NATO Science for Peace & Security projects, and a consultant to the World Bank and the United Nations Population Fund on Big Data projects. His interests include Official Statistics, Data Mining & Machine Learning, and Information Security (Group Anonymity).

Dr. Marharyta Aleksandrova is an Applied Scientist at Amazon Luxembourg. She received her master's degree from the National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", and a double PhD from the same university and the University of Lorraine, France. After completing her PhD, she was a postdoc at the University of Luxembourg, where she worked on multiple research projects and started a new research direction in her host group. At Amazon, she works on various projects that contribute to smooth transportation execution. Her research interests and experience include recommender systems, application of ML to security, causal ML, prediction with accuracy guarantees, and optimization. In her current role, she has also gained exposure to industrial-scale problems and coding standards.
