Practical Synthetic Data Generation: Balancing Privacy and the Broad Availability of Data
Emam, Khaled El, Mosquera, Lucy, Hoptroff, Richard
- Publisher: O'Reilly
- Publication date: 2020-06-23
- List price: $2,350
- Sale price: 12% off, $2,068 (limited-time offer through 2025-03-31)
- Language: English
- Pages: 167
- Binding: Quality Paper - also called trade paper
- ISBN: 1492072745
- ISBN-13: 9781492072744
- Related categories: Artificial Intelligence, Big Data, Machine Learning
Product Description
Building and testing machine learning models requires access to large and diverse data. But where can you find usable datasets without running into privacy issues? This practical book introduces techniques for generating synthetic data (fake data generated from real data) so you can perform secondary analysis to do research, understand customer behaviors, develop new products, or generate new revenue.
Data scientists will learn how synthetic data generation provides a way to make such data broadly available for secondary purposes while addressing many privacy concerns. Analysts will learn the principles and steps for generating synthetic data from real datasets. And business leaders will see how synthetic data can help accelerate time to a product or solution.
This book describes:
- Steps for generating synthetic data using multivariate normal distributions
- Methods for distribution fitting covering different goodness-of-fit metrics
- How to replicate the simple structure of original data
- An approach for modeling data structure to consider complex relationships
- Multiple approaches and metrics you can use to assess data utility
- How analysis performed on real data can be replicated with synthetic data
- Privacy implications of synthetic data and methods to assess identity disclosure
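As a rough illustration of the first and fifth items in the list above, the following minimal sketch fits a multivariate normal distribution to a small "real" dataset, samples synthetic records from it, and compares correlation matrices as a crude utility check. It is not taken from the book: the toy dataset (age and blood pressure), the function names, and the choice of utility metric are all assumptions made here for illustration.

```python
import numpy as np


def fit_mvn(real):
    """Estimate the mean vector and covariance matrix (rows are records)."""
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)
    return mean, cov


def sample_synthetic(mean, cov, n, seed=0):
    """Draw n synthetic records from the fitted multivariate normal."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n)


def max_correlation_gap(real, synth):
    """Largest absolute difference between the two correlation matrices."""
    real_corr = np.corrcoef(real, rowvar=False)
    synth_corr = np.corrcoef(synth, rowvar=False)
    return float(np.max(np.abs(real_corr - synth_corr)))


# Hypothetical "real" data: two correlated numeric columns (invented values).
rng = np.random.default_rng(42)
age = rng.normal(50, 10, size=5000)
blood_pressure = 80 + 0.8 * age + rng.normal(0, 5, size=5000)
real = np.column_stack([age, blood_pressure])

mean, cov = fit_mvn(real)
synth = sample_synthetic(mean, cov, n=5000)
gap = max_correlation_gap(real, synth)

print("real means:     ", real.mean(axis=0))
print("synthetic means:", synth.mean(axis=0))
print("max correlation gap:", gap)
```

A multivariate normal only reproduces means, variances, and linear correlations, so a sketch like this would miss skewed marginals, categorical columns, and nonlinear relationships. It also says nothing about disclosure risk, which needs a separate assessment.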
About the Authors
Dr. Khaled El Emam is a senior scientist at the Children's Hospital of Eastern Ontario (CHEO) Research Institute and Director of the multidisciplinary Electronic Health Information Laboratory, where he conducts academic research on synthetic data generation methods and re-identification risk measurement. He is also a Professor in the Faculty of Medicine (Pediatrics) at the University of Ottawa.
He is the founder, CEO, and President of Privacy Analytics. Khaled has been performing data analysis since the early 1990s, building statistical and machine learning models for prediction and evaluation. Since 2004 he has been developing technologies to facilitate the sharing of data for secondary analysis, from basic research on algorithms to the development of applied solutions that have been deployed globally. These technologies address problems in anonymization and pseudonymization, synthetic data, secure computation, and data watermarking. He has (co-)written multiple books on various privacy and software engineering topics. In 2003 and 2004, he was ranked as the top systems and software engineering scholar worldwide by the Journal of Systems and Software based on his research on measurement and quality evaluation and improvement. Previously, Khaled was a Senior Research Officer at the National Research Council of Canada. He also served as the head of the Quantitative Methods Group at the Fraunhofer Institute in Kaiserslautern, Germany. He held the Canada Research Chair in Electronic Health Information at the University of Ottawa from 2005 to 2015, and has a PhD from the Department of Electrical and Electronics Engineering, King's College, at the University of London, England.
Lucy Mosquera has a bachelor's degree in Biology and Mathematics from Queen's University and is currently a graduate student in the Department of Statistics at the University of British Columbia. During her time at Queen's, Lucy provided data management support on a dozen clinical trials and observational studies run through Kingston General Hospital's Clinical Evaluation Research Unit. Lucy has also worked on clinical trial data sharing methods based on homomorphic encryption and secret sharing protocols. At Replica Analytics, Lucy is responsible for developing statistical and machine learning models for data generation, for integrating subject-area expertise in clinical trial data into synthetic data generation methods, and for the statistical assessment of the company's synthetic data generation.
Dr. Richard Hoptroff is a long-term technology inventor, investor, and entrepreneur. He was awarded a PhD in Physics by King's College London for his work in optical computing and artificial intelligence. In 1992, together with Ravensbeck, he founded Right Information Systems, a neural network forecasting software company that was sold in 1997 to Cognos Inc (part of IBM). He then worked as a postdoc at the Research Laboratory for Archaeology and the History of Art at Oxford University, and in 2001 he created Flexipanel Ltd, a company supplying Bluetooth modules to the electronics industry.
In 2010, he founded Hoptroff London with the aim of developing smart, hyper-accurate watch movements and creating a new watch brand. In 2013, he established a new commercial category when he brought to market the first commercial atomic timepiece and atomic wristwatch.
Hoptroff has since applied his expertise in timing technology and software to develop a hyper-accurate synchronised timestamping solution for the financial services sector, based on a unique combination of grandmaster atomic clock engineering and proprietary software.