Data Science and Predictive Analytics: Biomedical and Health Applications using R

Ivo D. Dinov

Product Description

Over the past decade, Big Data have become ubiquitous in all economic sectors, scientific disciplines, and human activities. They have led to striking technological advances, affecting all human experiences. Our ability to manage, understand, interrogate, and interpret such extremely large, multisource, heterogeneous, incomplete, multiscale, and incongruent data has not kept pace with the rapid increase in the volume, complexity, and proliferation of digital information. There are three reasons for this shortfall. First, the volume of data is increasing much faster than our computational processing power (Kryder’s law > Moore’s law). Second, traditional disciplinary boundaries inhibit expeditious progress. Third, our education and training activities have fallen behind the accelerating pace of scientific, information, and communication advances. There are very few rigorous instructional resources, interactive learning materials, and dynamic training environments that support active data science learning. The textbook balances the mathematical foundations with dexterous demonstrations and examples of data, tools, modules, and workflows that serve as pillars for the urgently needed bridge to close the gap between the supply of and demand for predictive analytic skills.

Exposing the enormous opportunities presented by the tsunami of Big Data, this textbook aims to identify specific knowledge gaps, educational barriers, and workforce readiness deficiencies. Specifically, it focuses on the development of a transdisciplinary curriculum integrating modern computational methods, advanced data science techniques, innovative biomedical applications, and impactful health analytics.

The content of this graduate-level textbook fills a substantial gap in integrating modern engineering concepts, computational algorithms, mathematical optimization, statistical computing, and biomedical inference. Big Data analytic techniques and predictive scientific methods demand broad transdisciplinary knowledge, appeal to an extremely wide spectrum of readers and learners, and provide substantial opportunities for engagement throughout academia, industry, and regulatory and funding agencies.

The two examples below demonstrate the pressing need for the scientific knowledge, computational abilities, interdisciplinary expertise, and modern technologies necessary to achieve desired outcomes (improving human health and optimizing future return on investment). These outcomes can only be achieved by appropriately trained teams of researchers who can develop robust decision support systems using modern techniques and effective end-to-end protocols, like the ones described in this textbook.

• A geriatric neurologist is examining a patient complaining of gait imbalance and posture instability. To determine if the patient may suffer from Parkinson’s disease, the physician acquires clinical, cognitive, phenotypic, imaging, and genetics data (Big Data). Most clinics and healthcare centers are not equipped with skilled data analytic teams that can wrangle, harmonize, and interpret such complex datasets. A learner who completes a course of study using this textbook will have the competency and ability to manage the data, generate a protocol for deriving biomarkers, and provide an actionable decision support system. The results of this protocol will help the physician understand the entire patient dataset and assist in making a holistic, evidence-based, data-driven clinical diagnosis.

• To improve the return on investment for its shareholders, a healthcare manufacturer needs to forecast the demand for its product subject to environmental, demographic, economic, and bio-social sentiment data (Big Data). The organization’s data-analytics team is tasked with developing a protocol that identifies, aggregates, harmonizes, models, and analyzes these heterogeneous data elements to generate a trend forecast. This system needs to provide an automated, adaptive, scalable, and reliable prediction of the optimal investment, e.g., R&D allocation, that maximizes the company’s bottom line. A reader who completes a course of study using this textbook will be able to ingest the observed structured and unstructured data, mathematically represent the data as a computable object, and apply appropriate model-based and model-free prediction techniques. The results of these techniques may be used to forecast the expected relation between the company’s investment, product supply, and the general demand for healthcare (providers and patients), and to estimate the return on initial investments.
 

 
