Rating: Not rated
Tags: Computers, Data Processing, Databases, Data Mining, Programming Languages, Java, Python, Computer Engineering, Data Modeling & Design, Lang:en
Publisher: O'Reilly Media, Inc.
Added: November 19, 2020
Modified: November 5, 2021
Summary
Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals.

You’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine-learning library.

- Get a gentle overview of big data and Spark
- Learn about DataFrames, SQL, and Datasets (Spark’s core APIs) through worked examples
- Dive into Spark’s low-level APIs, RDDs, and the execution of SQL and DataFrames
- Understand how Spark runs on a cluster
- Debug, monitor, and tune Spark clusters and applications
- Learn the power of Structured Streaming, Spark’s stream-processing engine
- Learn how you can apply MLlib to a variety of problems, including classification or recommendation
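
As a taste of the structured APIs the summary refers to, here is a minimal PySpark sketch; it is not taken from the book, and the application name, data, and column names are illustrative assumptions only.

    # Minimal sketch, not from the book: a small DataFrame transformation
    # using Spark's structured API. Assumes the pyspark package is installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("spark-guide-sketch").getOrCreate()

    # In-memory example data; the column names are illustrative only.
    df = spark.createDataFrame(
        [("alice", 34), ("bob", 29), ("carol", 41)],
        ["name", "age"],
    )

    # A typical structured-API pipeline: filter rows, then aggregate.
    df.filter(F.col("age") > 30).agg(F.avg("age").alias("avg_age")).show()

    spark.stop()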