Elasticsearch Indexing
About This Book
- Improve users’ search experience with the correct configuration
- Deliver relevant search results – fast!
- Save time and system resources by creating stable clusters
Who This Book Is For
If you understand the importance of a great search experience, this book will show you exactly how to build one with Elasticsearch, one of the world’s leading search servers.
What You Will Learn
- Learn how Elasticsearch efficiently stores data – and find out how it can reduce costs
- Control document metadata with the correct mapping strategies and by configuring indices
- Use Elasticsearch analysis and analyzers to incorporate greater intelligence and organization across your documents and data
- Find out how an Elasticsearch cluster works – and learn the best way to configure it
- Perform high-speed indexing with low system resource cost
- Improve query relevance with appropriate mapping, the suggest API, and other Elasticsearch features
In Detail
Starting with an overview of the way Elasticsearch stores data, you’ll extend your knowledge to tackle indexing and mapping, and learn how to configure Elasticsearch to meet your users’ needs. You’ll then find out how to use analysis and analyzers to organize your data more intelligently and pull up better search results, so that search queries are met with relevant results. You’ll explore the anatomy of an Elasticsearch cluster and learn how to set up configurations that give you optimum availability as well as scalability. Next, you’ll find real-world solutions to help you improve indexing performance, as well as tips and guidance on safety so that you can back up and restore data. By the end, you will be confident that you can deliver an improved search experience, exactly what modern users demand and expect.
Style and approach
This is a comprehensive guide to performing efficient indexing and providing relevant search results using mapping, analyzers, and other Elasticsearch features.
Elasticsearch Indexing - Akdoğan Hüseyin
Table of Contents
Elasticsearch Indexing
Credits
About the Author
About the Reviewer
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Introduction to Efficient Indexing
Getting started
Understanding the document storage strategy
The _source field
The difference between the storable and searchable field
Analysis
Summary
2. What is an Elasticsearch Index
Nature of the Elasticsearch index
Indices
Mapping
Types
Document
Denormalization
Inverted index
Summary
3. Basic Concepts of Mapping
Basic concepts and definitions
Metadata fields
_source
_all
_timestamp
_ttl
Types
Object type
Root object type
Attachment type
The relationship between mapping and relevant search results
Understanding schemaless
Summary
4. Analysis and Analyzers
Introducing analysis
Process of analysis
Built-in analyzers
Building blocks of Analyzer
Character filters
HTML Strip Char filter
Pattern Replace Char filter
Tokenizer
Token filters
What's text normalization?
ICU analysis plugin
ASCII Folding Token filter
An Analyzer Pipeline
Specifying the analyzer for a field in the mapping
Creating a custom analyzer
Summary
5. Anatomy of an Elasticsearch Cluster
Basic concepts
Node
Non-data nodes
Dedicated master nodes
Client nodes
Tribe node
Shards
Replicas
Explaining the architecture of distribution
Correctly configuring the cluster
Choosing the right number of shards and replicas
Summary
6. Improving Indexing Performance
Configuration
Memory configuration
The ES_HEAP_SIZE environment variable
Avoiding swapping
Mlockall property
Garbage collector
The structure of JVM memory
What is the problem?
Monitoring garbage collection
VisualVM
Different strategies among garbage collectors
Process of deallocating memory
Types of garbage collector
Serial garbage collector
Parallel garbage collector
Concurrent Mark Sweep garbage collector
G1 garbage collector
Tuning the garbage collection
File descriptors
Increasing FD limit on Unix systems
Optimization of mapping definition
Norms
Feature index_options of string type
Exclude unnecessary fields
Extension of the automatic index refresh time
Segments and merging policies
Choosing the right merge policy
Tiered policy
log_byte_size policy
log_doc policy
The optimize API
Store module
Store types
Simple filesystem store
New IO filesystem store
MMap filesystem store
Hybrid filesystem store
Throttling I/O operations
Throttling type
Bulk API
Bulk sizing
Notes
Summary
7. Snapshot and Restore
Snapshot repository
Repository types
Shared filesystem repository
URL repository
Cloud repository
HDFS filesystem repository
Snapshot
Restore
Overriding index settings during restore
How does the snapshot process work?
Summary
8. Improving the User Search Experience
Correction of users' spelling mistakes
Suggesters
Using the _suggest REST endpoint
Suggest object inclusion in the query
Term suggester
Configuring the term suggester
Common suggest options
Other and additional term suggester options
The phrase suggester
Configuring the phrase suggester
The completion suggester
Mapping the configuration for the completion suggester
Indexing on completion field
Get suggestions
Improving the relevancy of search results
Boosting the query
Bool query
Synonyms
Be careful about the _all field
Summary
Index
Elasticsearch Indexing
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, nor its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: December 2015
Production reference: 1171215
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78398-702-3
www.packtpub.com
Credits
Author
Hüseyin Akdoğan
Reviewer
John M. Petrone
Commissioning Editor
Kartikey Pandey
Acquisition Editor
Shaon Basu
Content Development Editor
Anish Dhurat
Technical Editor
Pranjali Mistry
Copy Editor
Neha Vyas
Project Coordinator
Bijal Patel
Proofreader
Safis Editing
Indexer
Mariammal Chettiyar
Graphics
Disha Haria
Production Coordinator
Nilesh Mohite
Cover Work
Nilesh Mohite
About the Author
Hüseyin Akdoğan began his software adventure with the GW-BASIC programming language. He moved on to QuickBasic and then Visual Basic, and developed many applications until 2000, after which he stepped into the world of the Web with PHP. After this, he came across Java! In addition to counseling and training activities since 2005, he has developed enterprise applications with Java EE technologies. His areas of expertise are JavaServer Faces, the Spring Framework, and big data technologies such as NoSQL and Elasticsearch. He is also working to specialize in other big data technologies. Hüseyin writes articles on Java and big data technologies and works as a technical reviewer of big data books. He was a reviewer of one of the bestselling books, Mastering Elasticsearch – Second Edition.
About the Reviewer
John M. Petrone is a veteran technology leader and innovator who has over 20 years of experience in leading software development and technical operations at organizations ranging in size and scope from early-stage start-ups to public companies and large system integrators. He's passionate about the strategic application of leading-edge technologies to solve real-world problems.
John is currently the CTO of LaunchPad Central, a SaaS platform company offering end-to-end solutions that help organizations innovate more efficiently and accelerate time to market for new products. He runs the engineering and product groups, where he heads the ongoing design, development, and operation of their SaaS products that enable high-throughput innovation at scale.
Previously, John was the first CTO of Zignal Labs, a leader in delivering data-driven insights from real-time media monitoring and big data analytics. He recruited the original engineering team and designed, architected, and led the building of a real-time analytics platform. This platform ingests tens of millions of news stories, blog entries, and social media posts every day.
Prior to Zignal, John served as the SVP and CTO of Autobytel Inc. (ABTL) from 2003 to 2008 and again from 2010 to 2012. At Autobytel, the award-winning pioneer of online car buying and automotive marketing services, he led all technology activities and initiatives, including new product development, technical operations, and integration of acquired technologies. He was selected as one of the Premier 100 IT Leaders of 2006 by Computerworld Magazine.
John was also EVP and CTO of Preview Travel, Inc. from 1995 to 1999, where he built the team and platform and led them through a successful IPO in November 1997. Prior to Preview, he held senior technology positions at Oracle, Lotus Consulting, Price Waterhouse, and Andersen Consulting. John attended the University of Maryland, where he received a BS degree in aerospace engineering. He is also a graduate of the Executive Education Program at the UCLA Anderson School of Management.
www.PacktPub.com
Support files, eBooks, discount offers, and more
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Free access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Preface
The world that we live in is hungry for speed, efficiency, and accuracy. We want results quickly, without compromising accuracy. This is exactly why I have written this book. I have distilled my years of experience into this book to give you an insight into how to use Elasticsearch more efficiently in today's big data world. This book is targeted at experienced developers who have used Elasticsearch before and want to extend their knowledge about how to perform Elasticsearch indexing effectively. While reading this book, you'll explore different topics, all of which connect to efficient indexing and relevant search results in Elasticsearch. We will focus on understanding the document storage strategy and the analysis process in Elasticsearch. This book will help you understand what is going on behind the scenes when you send a document for indexing or make a query. In addition, this book will clarify the meaning of schemaless by asking whether the claim that Elasticsearch follows a schema-free model is always true. After this, you will learn about the analysis process and analyzers. More importantly, this book will elaborate on the relationship between data analysis and relevant search results. By the end of this book, I believe you will be in a position to master and unleash this beast of a technology.
What this book covers
Chapter 1, Introduction to Efficient Indexing, will introduce you to the document storage strategy and the basic concepts related to the analysis process.
Chapter 2, What is an Elasticsearch Index, describes the concept of an Elasticsearch index, how the inverted index mechanism works, why you should use data denormalization, and what its benefits are. In addition, it explains dynamic mapping and index flexibility.
Chapter 3, Basic Concepts of Mapping, describes the basic concepts and definitions of mapping. It answers the question of how mapping relates to relevant search results, explains the meaning of schemaless, and covers metadata fields and data types.
Chapter 4, Analysis and Analyzers, describes analyzers and the analysis process of Elasticsearch: what tokenizers, character filters, and token filters are, how to configure a custom analyzer, and what text normalization is. This chapter also describes the relationship between data analysis and relevant search results.
Chapter 5, Anatomy of an Elasticsearch Cluster, covers techniques to choose the right number of shards and replicas and describes a node, the shard concept, replicas, and how shard allocation works. It also explains the architecture of data distribution.
Chapter 6, Improving Indexing Performance, covers how to configure memory, how the JVM garbage collector works, why garbage collection is so important for performance, and how to start tuning the garbage collector. It also describes how to control the amount of I/O that Elasticsearch uses for segment merging and the store module.
Chapter 7, Snapshot and Restore, covers the Elasticsearch snapshot and restore module, how to define a snapshot repository, different repository types, the process of snapshot and restore, and how to configure them. It also describes how the snapshot process works.
Chapter 8, Improving the User Search Experience, introduces Elasticsearch suggesters, which allow us to correct spelling mistakes and build efficient autocomplete mechanisms. It also covers how to improve query relevance by using different Elasticsearch functionalities such as boosting and synonyms.
What you need for this book
You need a stable version of Elasticsearch. The code in this book is compatible with Elasticsearch version 1 and later. You can run the code in this book using cURL (that is, Client URL Library) on Linux and Mac OS X. If you prefer a user-friendly query interface, you can use the Sense plugin running on Chrome (https://chrome.google.com/webstore/detail/sense-beta/lhjgkmllcaadmopgmanpapmpjgmfcfig?hl=en).
Who this book is for
If you understand the importance of a great search experience, this book will show you exactly how to build one with Elasticsearch, one of the world's leading search servers.
Conventions
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: We can include other contexts through the use of the include directive.
A block of code is set as follows:
curl -XPOST localhost:9200/company/employee -d '{
  "firstname": "Joe Jeffers",
  "lastname": "Hoffman",
  "age": 30
}'
{"_index":"company","_type":"employee","_id":"AU7GIEQeR7spPlxvqlud","_version":1,"created":true}
Any command-line input or output is written as follows:
curl -XGET 'localhost:9200/_cat/health?pretty'
1448644024 19:07:04 elasticsearch yellow 1 1 6 6 0 0 6
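The _cat/health line above is easier to read once you know which value sits in which column. Adding ?v to a _cat request prints a header row. The minimal POSIX shell sketch below assumes the column order of the 1.x _cat/health output (epoch, timestamp, cluster, status, node.total, node.data, shards, pri, relo, init, unassign). Later versions append extra columns, so check the ?v header on your own version before relying on positions. The variable names are hypothetical.

```shell
# Sample output copied from above. Against a live node you could capture it with:
#   health_line=$(curl -s 'localhost:9200/_cat/health')
health_line='1448644024 19:07:04 elasticsearch yellow 1 1 6 6 0 0 6'

# Split on whitespace into positional parameters $1..$11 (POSIX sh, no bash needed).
# Column order assumed from the 1.x _cat/health header row.
set -- $health_line
cluster=$3
status=$4
node_total=$5
shards=$7
pri=$8
unassign=${11}

echo "cluster=$cluster status=$status nodes=$node_total shards=$shards primaries=$pri unassigned=$unassign"
```

Read this way, the sample describes a one-node cluster named elasticsearch with 6 active primary shards and 6 unassigned shards. Elasticsearch never allocates a replica on the same node as its primary, so on a single node the replicas stay unassigned, which is the usual reason such a cluster reports yellow.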