- Lucene
High-performance text search engine library
Written entirely in Java
Cross-platform and versatile
Open source
High-performance Indexing
More than 150 GiB/hour
Minimal RAM requirements: 1MiB
Fast incremental indexing
Index size is 20%-30% of indexed text
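The indexing points above rest on Lucene's inverted index, which maps each term to the documents containing it. A minimal Python sketch of the idea, assuming a naive lowercase tokenizer; the class and method names are hypothetical, and this is not Lucene's API or its on-disk format (segments, term dictionaries, compressed postings):

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory inverted index: term -> {doc_id: term frequency}.

    Hypothetical illustration only, not Lucene's real data structures.
    """

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: tf}
        self.doc_lengths = {}              # doc_id -> number of tokens

    @staticmethod
    def tokenize(text):
        # Simplistic analyzer: lowercase, then split on non-alphanumerics.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        # Incremental indexing: a new document only touches the postings
        # of its own terms; nothing already indexed is rebuilt.
        tokens = self.tokenize(text)
        self.doc_lengths[doc_id] = len(tokens)
        for tok in tokens:
            self.postings[tok][doc_id] = self.postings[tok].get(doc_id, 0) + 1

    def lookup(self, term):
        # Ids of documents containing the term (.get avoids creating empty entries).
        return sorted(self.postings.get(term.lower(), {}))


idx = InvertedIndex()
idx.add(1, "Lucene is a search library")
idx.add(2, "Elasticsearch is built on Lucene")
print(idx.lookup("lucene"))   # [1, 2]
print(idx.lookup("library"))  # [1]
```

The index stores each distinct term once plus small per-document counts, which is why it can be much smaller than the raw text it covers.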
Powerful Search Algos
Ranked searching
Multifaceted queries support
Fielded search & Any-field Sorting
Multi-index search
Concurrent update & search
Memory efficient & typo-tolerant
Pluggable Ranking Models
Configurable Storage Engine
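Ranked searching with pluggable ranking models can be pictured as a scoring function passed into the search loop. The Python sketch below assumes Okapi BM25 (Lucene's default similarity in recent versions) with the common defaults k1=1.2 and b=0.75, plus plain TF-IDF as an alternative model; all function names are hypothetical and this is not Lucene's Similarity API:

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    # Okapi BM25 term weight: saturating tf, normalized by document length.
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))


def tf_idf(tf, df, n_docs, doc_len, avg_len):
    # Classic TF-IDF weight, an alternative "pluggable" ranking model.
    return tf * math.log(n_docs / df)


def search(docs, query, scorer=bm25):
    """docs: {doc_id: text}. Returns matching doc ids, best score first."""
    counts = {d: Counter(tokenize(t)) for d, t in docs.items()}
    lengths = {d: sum(c.values()) for d, c in counts.items()}
    avg_len = sum(lengths.values()) / len(lengths)
    scores = Counter()
    for term in tokenize(query):
        df = sum(1 for c in counts.values() if term in c)
        if df == 0:
            continue  # term not in any document
        for d, c in counts.items():
            if term in c:
                scores[d] += scorer(c[term], df, len(docs), lengths[d], avg_len)
    # Sort by descending score, breaking ties by doc id.
    return [d for d, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


docs = {
    1: "Lucene is a search library",
    2: "Elasticsearch is built on Lucene",
    3: "Kibana visualizes data",
}
print(search(docs, "lucene search"))                 # [1, 2]
print(search(docs, "lucene search", scorer=tf_idf))  # [1, 2]
```

Swapping `scorer` changes the ranking model without touching the retrieval loop, which is the essence of pluggable ranking.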
Reference
https://www.youtube.com/watch?v=BvgGgkN3clI
- Elasticsearch
It is a JSON-based distributed search server built on top of Lucene
Schemaless - Field types are detected automatically at indexing time
Exposes a REST API
Horizontally scalable and pluggable
Near real time data availability
Supports queues and thread pools
High availability via shard replication across nodes
JSON-based query DSL (instead of Lucene query syntax)
Middle layer of the ELK stack (Elasticsearch, Logstash, Kibana)
Logstash - Ingestion
Kibana - Visualization
Used for full-featured search in web applications
Fuzzy search
https://www.youtube.com/watch?v=BvgGgkN3clI
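The JSON query DSL and fuzzy search can be illustrated together. The sketch below builds a `match` query body with `"fuzziness": "AUTO"` (the kind of body one would send to `POST /<index>/_search`), then approximates what AUTO allows: 0 edits for terms of 1-2 characters, 1 edit for 3-5, 2 edits for longer terms. The helper names are hypothetical, and plain Levenshtein distance is a simplification, since Elasticsearch by default also counts an adjacent transposition as a single edit:

```python
import json


def match_query(field, text, fuzziness="AUTO"):
    # Query DSL "match" query with fuzzy matching enabled.
    return {"query": {"match": {field: {"query": text, "fuzziness": fuzziness}}}}


def auto_fuzziness(term):
    # Default AUTO rule, based on the length of the query term.
    n = len(term)
    return 0 if n <= 2 else 1 if n <= 5 else 2


def levenshtein(a, b):
    # Dynamic-programming edit distance (insert, delete, substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_match(query_term, indexed_term):
    return levenshtein(query_term, indexed_term) <= auto_fuzziness(query_term)


body = match_query("title", "elastik search")
print(json.dumps(body))
print(fuzzy_match("elastik", "elastic"))  # True: 1 edit, 7-char term allows 2
print(fuzzy_match("ab", "ac"))            # False: 2-char terms must match exactly
```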
Elasticsearch Lingo Explained
Field - Named key in a document, think of a column name in a SQL table
Term - Value for a field
Document - Individual record, a collection of fields
Index - The "schemaless" list for the collection of documents
Primary shard - Independent Lucene index, the only shard accepting writes to its documents
Replica shard - Duplicate shard for faster retrieval and high availability of the data
Data node - Holds data shards and performs CRUD operations, search and aggregations
Master node - Only node that can modify the cluster, index & shard configurations
Ingest node - Node that applies ingest pipeline for document enrichment before indexing
Coordinating node - Routes client requests to the shards holding the data and merges their results
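How documents land on primary shards, and how replicas are kept apart from their primaries, can be sketched roughly. Elasticsearch routes a document with shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document id. In the sketch below `zlib.crc32` stands in for the Murmur3 hash Elasticsearch actually uses (so the numbers will not match a real cluster), and the round-robin allocator is a hypothetical simplification of Elasticsearch's allocation logic:

```python
import zlib


def route(doc_id, num_primary_shards):
    # ASSUMPTION: crc32 replaces Elasticsearch's Murmur3 routing hash.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def allocate(num_primary_shards, num_replicas, nodes):
    # Hypothetical round-robin layout: every copy of a shard goes to a
    # different node, so a replica never shares a node with its primary.
    if num_replicas + 1 > len(nodes):
        # A real cluster would leave extra replicas unassigned (yellow health).
        raise ValueError("not enough nodes to separate every copy of a shard")
    layout = {}
    for shard in range(num_primary_shards):
        copies = [nodes[(shard + k) % len(nodes)] for k in range(num_replicas + 1)]
        layout[shard] = {"primary": copies[0], "replicas": copies[1:]}
    return layout


layout = allocate(3, 1, ["node-a", "node-b", "node-c"])
shard = route("doc-42", 3)
# Writes go to layout[shard]["primary"]; reads may be served by any copy.
print(shard, layout[shard])
```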
How is Elasticsearch so outrageously fast?
Duplicates data in multiple n-gram indices, trading disk space for speed
Inverted indices work like hashmaps from term to documents, with O(1) lookup assuming a good distribution
Keeps as much data as possible in memory
Multi-tiered caching
Request level (excludes queries on date ranges and queries with a preset "size")
Data level: Frequently hit Lucene indices
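The n-gram and caching points can be sketched in a few lines. The Python below indexes every 2- to 3-character gram of each term, so a partial-match query becomes a set of direct lookups instead of a scan; extra disk (here, memory) is spent to save query time. The gram sizes, document set, and function names are hypothetical, and `functools.lru_cache` is only a stand-in for a request-level cache: it does not model Elasticsearch's cache invalidation on refresh:

```python
from collections import defaultdict
from functools import lru_cache

MIN_GRAM, MAX_GRAM = 2, 3  # hypothetical gram sizes


def ngrams(term, lo=MIN_GRAM, hi=MAX_GRAM):
    # All character n-grams of length lo..hi: the extra data written at index time.
    return {term[i:i + n] for n in range(lo, hi + 1) for i in range(len(term) - n + 1)}


DOCS = {1: "elasticsearch", 2: "logstash", 3: "kibana"}

# Index time: every gram of every term points back to its documents.
GRAM_INDEX = defaultdict(set)
for doc_id, term in DOCS.items():
    for g in ngrams(term):
        GRAM_INDEX[g].add(doc_id)


@lru_cache(maxsize=128)  # stand-in for a request-level cache
def partial_match(fragment):
    # Candidates are documents containing every gram of the fragment.
    grams = ngrams(fragment)
    if not grams:
        return ()  # fragment shorter than MIN_GRAM
    hits = set.intersection(*(GRAM_INDEX.get(g, set()) for g in grams))
    return tuple(sorted(hits))


print(partial_match("sta"))  # (2,)
print(partial_match("as"))   # (1, 2)
print(partial_match("sta"))  # served from the cache
print(partial_match.cache_info().hits)  # 1
```

For fragments longer than MAX_GRAM the gram intersection only yields candidates, which a real engine would still verify.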