Rating: Not rated
Tags: Computers, Programming Languages, Java, Databases, General, Data Mining, System Administration, Storage & Retrieval, Web, Search Engines, Desktop Applications, Lang:en
Publisher: Manning Publications
Added: July 26, 2020
Modified: November 5, 2021
Summary
Tika in Action is a hands-on guide to content
mining with Apache Tika. The book's many examples and case
studies offer real-world experience from domains ranging from
search engines to digital asset management and scientific
data processing.
About the Technology
Tika is an Apache toolkit
that has built into it everything you and your app need to
know about file formats. Using Tika, your applications can
discover and extract content from digital documents in almost
any format, including exotic ones.
About this Book
Tika in
Action is the ultimate guide to content mining using Apache
Tika. You'll learn how to pull usable information from
otherwise inaccessible sources, including internet media and
file archives. This example-rich book teaches you to build
and extend applications based on real-world experience with
search engines, digital asset management, and scientific data
processing. In addition to architectural overviews, you'll
find detailed chapters on features like metadata extraction,
automatic language detection, and custom parser
development.
Purchase of the print book comes with an offer
of a free PDF, ePub, and Kindle eBook from Manning. Also
available is all code from the book.
What's Inside
  Crack MS Word, PDF, HTML, and ZIP
  Integrate with search engines, CMS, and other data sources
  Learn through experimentation
  Many examples
This book requires no previous knowledge of Tika or
text mining techniques. It assumes a working knowledge of
Java.
==========================================
Table of Contents
PART 1 GETTING STARTED
  The case for the digital Babel fish
  Getting started with Tika
  The information landscape
PART 2 TIKA IN DETAIL
  Document type detection
  Content extraction
  Understanding metadata
  Language detection
  What's in a file?
PART 3 INTEGRATION AND ADVANCED USE
  The big picture
  Tika and the Lucene search stack
  Extending Tika
PART 4 CASE STUDIES
  Powering NASA science data systems
  Content management with Apache Jackrabbit
  Curating cancer research data with Tika
  The classic search engine example