Principal Component Analysis and Randomness Test for Big Data Analysis: Practical Applications of RMT-Based Technique

Tanaka-Yamawaki, Mieko, Ikura, Yumihiko

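  • Publisher: Springer
  • Publication date: 2024-05-25
  • List price: $4,300
  • VIP price: $4,085 (5% off)
  • Language: English
  • Pages: 152
  • Binding: Quality Paper - also called trade paper
  • ISBN: 9811939691
  • ISBN-13: 9789811939693
  • Related categories: Big Data, Data Science
  • Overseas special-order title (must be checked out separately)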

Product description

This book presents a novel approach to analyzing large rectangular numerical data sets (so-called big data). The essence of this approach is to grasp the "meaning" of the data instantly, without getting into the details of individual data points. Unlike conventional approaches to principal component analysis, randomness tests, and visualization methods, the authors' approach offers universality and simplicity of data analysis, regardless of data type, structure, or field of science.

First, the mathematical preparation is described. The RMT-PCA and the RMT-test use the cross-correlation matrix of time series, $C = XX^T$, where $X$ is a rectangular matrix of $N$ rows and $L$ columns and $X^T$ is the transpose of $X$. Because $C$ is symmetric, namely $C = C^T$, it can be converted to a diagonal matrix of eigenvalues by a similarity transformation $SCS^{-1} = SCS^T$ using an orthogonal matrix $S$. When $N$ is sufficiently large, the histogram of the eigenvalue distribution can be compared to the theoretical formula derived in the context of random matrix theory (RMT).
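As a rough illustration of the "theoretical formula" mentioned above, the sketch below evaluates the Marchenko-Pastur density, which is the standard RMT result for the eigenvalues of a correlation matrix built from independent random series. It is an assumption that this is the formula the book uses, and it further assumes each row of X is normalized to zero mean and unit variance, that C is scaled by 1/L, and that Q = L/N is at least 1. The function names are hypothetical.

```python
import math


def mp_edges(q):
    """Lower and upper spectrum edges for Q = L/N >= 1 (unit-variance rows assumed)."""
    root = 1.0 / math.sqrt(q)
    return (1.0 - root) ** 2, (1.0 + root) ** 2


def mp_density(lam, q):
    """Marchenko-Pastur density at eigenvalue lam; zero outside the edges."""
    lo, hi = mp_edges(q)
    if lam <= lo or lam >= hi:
        return 0.0
    return q / (2.0 * math.pi) * math.sqrt((hi - lam) * (lam - lo)) / lam


def integrate_density(q, steps=20000):
    """Midpoint-rule integral of the density over its support (should be close to 1)."""
    lo, hi = mp_edges(q)
    h = (hi - lo) / steps
    return sum(mp_density(lo + (i + 0.5) * h, q) for i in range(steps)) * h
```

An eigenvalue histogram of C can be overlaid on `mp_density`; eigenvalues that fall well above the upper edge from `mp_edges` are the usual candidates for non-random structure.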

Then the RMT-PCA is applied to high-frequency stock prices in the Japanese and American markets. The approach proves effective in extracting "trendy" business sectors of the financial market over a prescribed time scale. In this case, X consists of N stock price time series of length L, and the correlation matrix C is an N by N square matrix whose element at the i-th row and j-th column is the inner product of the length-L price time series of the i-th stock and the j-th stock.
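The construction of C described above can be sketched as follows. The description only says that each element is an inner product of two price series; converting prices to log returns, normalizing each series to zero mean and unit variance, and dividing by L are assumptions made here so that the diagonal of C equals 1. All function names and the toy prices are hypothetical.

```python
import math


def log_returns(prices):
    """Turn L + 1 prices into L log returns (an assumed preprocessing step)."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def normalize(series):
    """Shift to zero mean and scale to unit variance."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [(x - mean) / std for x in series]


def correlation_matrix(rows):
    """C[i][j] = inner product of row i and row j, divided by the length L."""
    length = len(rows[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / length for rj in rows]
            for ri in rows]


# Toy data: stock_b is exactly half of stock_a, so their returns coincide.
stock_a = [100.0, 101.0, 103.0, 102.0, 105.0]
stock_b = [p / 2.0 for p in stock_a]
stock_c = [20.0, 19.5, 19.8, 20.4, 20.1]
rows = [normalize(log_returns(p)) for p in (stock_a, stock_b, stock_c)]
C = correlation_matrix(rows)
```

With N stocks the result is an N by N symmetric matrix, so only the upper triangle strictly needs to be computed; the full double loop is kept here for readability.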

Next, the RMT-test is applied to measure randomness of various random number generators, including algorithmically generated random numbers and physically generated random numbers.
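One way to turn this idea into a number is sketched below. It is not necessarily the book's exact procedure. The sequence under test is cut into N rows of length L, each row is normalized, and the second moment of the eigenvalue distribution, tr(C^2)/N, is compared with the value 1 + N/L that the Marchenko-Pastur law predicts for random data. The choice of the second moment, the function names, and the sizes N = 20 and L = 200 are all assumptions for illustration.

```python
import math
import random


def normalize(series):
    """Shift to zero mean and scale to unit variance."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [(x - mean) / std for x in series]


def second_moment(sequence, n_rows, length):
    """tr(C^2) / N for C = X X^T / L; equals the mean squared eigenvalue."""
    rows = [normalize(sequence[i * length:(i + 1) * length])
            for i in range(n_rows)]
    total = 0.0
    for ri in rows:
        for rj in rows:
            c_ij = sum(a * b for a, b in zip(ri, rj)) / length
            total += c_ij * c_ij  # C is symmetric, so tr(C^2) = sum of C_ij^2
    return total / n_rows


def deviation_from_rmt(sequence, n_rows, length):
    """Relative gap between the measured second moment and 1 + N/L."""
    expected = 1.0 + n_rows / length
    return abs(second_moment(sequence, n_rows, length) / expected - 1.0)


N_ROWS, LENGTH = 20, 200
rng = random.Random(0)  # fixed seed so the sketch is reproducible
random_seq = [rng.random() for _ in range(N_ROWS * LENGTH)]
periodic_seq = [k % 10 for k in range(N_ROWS * LENGTH)]  # clearly non-random
```

A small deviation for `random_seq` and a large one for `periodic_seq` is the behavior expected of such a test; higher moments could be compared in the same way.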

The book concludes by demonstrating two applications of the RMT-test: (1) a comparison of hash functions, and (2) stock prediction by means of randomness, including a new index of off-randomness related to market decline.


About the authors

Mieko Tanaka-Yamawaki, former professor, Tottori University
Yumihiko Ikura, Meiji University
