Product Description
Take a deep dive into Apache Spark and the big data ecosystem. You will gain an understanding of the next generation of distributed systems, Apache Spark's architecture and abstractions, and the Spark ecosystem, including Spark SQL, GraphX, and MLlib. Beginning Spark provides a practical guide to using Apache Spark for real-world data processing. The author discusses and illustrates how different Spark concepts are brought together to solve complex problems with a dataflow system.
With the rise in popularity of distributed systems like Hadoop, more and more people are working in big data processing. A growing number of companies want to build dataflow systems that can churn through huge amounts of data to gain insights for their business. Because Hadoop was a first-generation, open source distributed system, there is a need for a next-generation distributed system to take data processing to the next level. Apache Spark is the next step in that direction. Spark brings great flexibility and a compositional system to the big data world, revolutionizing the field itself.