Rating: Not rated
Tags: Computers, Data Processing, Databases, Data Mining, Systems Architecture, Distributed Systems & Computing, Programming, Algorithms, Data Modeling & Design, Lang:en
Publisher: "O'Reilly Media, Inc."
Added: November 26, 2020
Modified: November 5, 2021
Summary
In the second edition of this practical book, four
Cloudera data scientists present a set of self-contained
patterns for performing large-scale data analysis with Spark.
The authors bring Spark, statistical methods, and real-world
data sets together to teach you how to approach analytics
problems by example. Updated for Spark 2.1, this edition acts
as an introduction to these techniques and other best
practices in Spark programming.

You’ll start with an
introduction to Spark and its ecosystem, and then dive into
patterns that apply common techniques—including
classification, clustering, collaborative filtering, and
anomaly detection—to fields such as genomics, security,
and finance.

If you have an entry-level understanding of
machine learning and statistics, and you program in Java,
Python, or Scala, you’ll find the book’s patterns
useful for working on your own data applications.

With this
book, you will:

- Familiarize yourself with the Spark programming model
- Become comfortable within the Spark ecosystem
- Learn general approaches in data science
- Examine complete implementations that analyze large public data sets
- Discover which machine learning tools make sense for particular problems
- Acquire code that can be adapted to many uses