Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work (Paperback)

Harlan Harris, Sean Murphy, Marck Vaisman

Product Description

There has been intense excitement in recent years around activities labeled "data science," "big data," and "analytics." However, the lack of clarity around these terms and, particularly, around the skill sets and capabilities of their practitioners has led to inefficient communication between "data scientists" and the organizations requiring their services. This lack of clarity has frequently led to missed opportunities. To address this issue, we surveyed several hundred practitioners via the Web to explore the varieties of skills, experiences, and viewpoints in the emerging data science community.

We used dimensionality reduction techniques to divide potential data scientists into five categories based on their self-ranked skill sets (Statistics, Math/Operations Research, Business, Programming, and Machine Learning/Big Data), and four categories based on their self-identification (Data Researchers, Data Businesspeople, Data Engineers, and Data Creatives). Further examining the respondents based on their division into these categories provided additional insights into the types of professional activities, educational backgrounds, and even scale of data used by different types of data scientists.
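As a rough illustration of what grouping respondents by dimensionality reduction can look like, here is a minimal Python sketch. The description does not say which technique the authors used, so principal component analysis (computed with NumPy's SVD) stands in here as an assumption. The ratings matrix is made up for the example, and the function names `reduce_skills` and `assign_groups` are hypothetical.

```python
import numpy as np


def reduce_skills(ratings, n_components):
    """Project a respondents-by-skills matrix onto its top principal components."""
    ratings = np.asarray(ratings, dtype=float)
    # Center each skill column so components describe variation around the mean.
    centered = ratings - ratings.mean(axis=0)
    # Rows of vt are orthonormal directions in skill space, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T
    return scores, components


def assign_groups(scores):
    """Label each respondent with the component on which they score most strongly."""
    return np.argmax(np.abs(scores), axis=1)


# Made-up ratings (NOT survey data): 6 hypothetical respondents x 4 hypothetical skills.
ratings = np.array([
    [5, 4, 1, 1],
    [4, 5, 2, 1],
    [5, 5, 1, 2],
    [1, 2, 5, 4],
    [2, 1, 4, 5],
    [1, 1, 5, 5],
])

scores, components = reduce_skills(ratings, n_components=2)
groups = assign_groups(scores)
```

In this toy data, the sign of each respondent's score on the first component separates the three respondents who rate the first two skills highly from the three who rate the last two highly. A real analysis would need more care in choosing the technique and the number of categories.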

In this report, we combine our results with insights and data from others to provide a better understanding of the diversity of practitioners, and to argue for the value of clearer communication around roles, teams, and careers.
