Wednesday, 14 July 2021

Techs

  • Lucene 
High-performance text search engine library

Written entirely in Java
Cross-platform and versatile
Open source

High-performance Indexing

More than 150 GiB/hour
Minimal RAM requirements: 1MiB
Fast incremental indexing
Index size is 20%-30% of indexed text

Powerful Search Algos

Ranked searching
Many query types (phrase, wildcard, proximity, range) & faceting
Fielded search & Any-field Sorting
Multi-index search
Concurrent update & search
Fast, memory-efficient & typo-tolerant suggesters
Pluggable Ranking Models
Configurable Storage Engine
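Ranked searching needs a scoring function, and BM25 has been Lucene's default ranking model since Lucene 6. Below is a minimal textbook BM25 sketch, assuming a naive whitespace tokenizer and Lucene's default parameters k1 = 1.2 and b = 0.75. Lucene's `BM25Similarity` differs in details (for example, it stores document lengths in a lossy encoding), so its exact scores will not match these.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Naive analyzer stand-in: lowercase and split on whitespace.
    return text.lower().split()

def bm25_scores(docs: list[str], query: str, k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with classic BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "lucene is a search library",
    "elasticsearch builds on lucene for distributed search",
    "kibana visualizes data",
]
scores = bm25_scores(docs, "lucene search")
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

Document 0 outranks document 1 even though both contain each query term once: the length normalization controlled by `b` favours the shorter document.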

  • Reference
    https://www.youtube.com/watch?v=BvgGgkN3clI

    • Elasticsearch
    A JSON-based, distributed search server built on top of Lucene
    Schemaless - field types are inferred automatically at indexing time (dynamic mapping)
    Exposes a REST API
    Horizontally scalable and pluggable
    Near real-time data availability
    Uses thread pools with request queues
    High availability via shard replication across nodes
    JSON-based query DSL (instead of Lucene query syntax)
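The REST API and JSON query DSL can be illustrated with a search request built from the Python standard library alone. The cluster URL `http://localhost:9200`, the index `articles` and the fields `title`/`published` are assumptions for illustration; the body uses the standard `bool`, `match` and `range` queries. The request is only constructed here, not sent.

```python
import json
import urllib.request

def build_search_request(base_url: str, index: str, text: str, since: str) -> urllib.request.Request:
    """Build (but do not send) a POST /<index>/_search request with a JSON DSL body."""
    body = {
        "query": {
            "bool": {
                # "must" clauses are scored (full-text relevance).
                "must": [{"match": {"title": {"query": text, "fuzziness": "AUTO"}}}],
                # "filter" clauses only include or exclude documents, without scoring.
                "filter": [{"range": {"published": {"gte": since}}}],
            }
        },
        "size": 10,
    }
    return urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("http://localhost:9200", "articles", "lucene serch", "2021-01-01")
print(req.get_method(), req.full_url)  # POST http://localhost:9200/articles/_search
print(req.data.decode("utf-8"))
# Against a running cluster, urllib.request.urlopen(req) would send it.
```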

    Middle layer of the ELK stack:
    Logstash - Ingestion
    Kibana - Visualization

    Used for full-text search in web applications
    Fuzzy search
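Fuzzy search matches terms within a small edit distance. With `"fuzziness": "AUTO"`, Elasticsearch allows 0 edits for terms of up to 2 characters, 1 edit for 3-5 characters and 2 edits for longer terms, and by default counts an adjacent transposition as one edit. The sketch below reproduces those rules with a brute-force distance function (the optimal string alignment variant); the helper names are hypothetical. Lucene itself compiles the query term into a Levenshtein automaton and walks the term dictionary with it, rather than comparing against every term.

```python
def osa_distance(a: str, b: str) -> int:
    """Edits needed to turn a into b: insert, delete, substitute, or swap adjacent chars."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Max edits that fuzziness "AUTO" (i.e. AUTO:3,6) allows for a query term."""
    if len(term) < low:
        return 0
    if len(term) < high:
        return 1
    return 2

def fuzzy_match(query_term: str, indexed_term: str) -> bool:
    return osa_distance(query_term, indexed_term) <= auto_fuzziness(query_term)

print(fuzzy_match("serch", "search"))   # True: one insertion
print(fuzzy_match("lucnee", "lucene"))  # True: one transposition
print(fuzzy_match("ab", "ac"))          # False: 2-char terms must match exactly
```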


    Elasticsearch Lingo Explained
    Field - Named key in a document, think column name in a SQL database

    Term - Value for a field (analyzed text produces one term per token)

    Document - Individual record, a collection of fields

    Index - The "schemaless" collection of documents

    Primary shard - Independent Lucene index, the only shard that accepts writes for its documents
    Replica shard - Copy of a primary shard for faster retrieval and high availability of the data

    Data node - Holds data shards and performs CRUD operations, search and aggregations
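    Master node - Only node that can modify the cluster state (cluster, index & shard configuration)
    Ingest node - Node that applies ingest pipelines to enrich documents before indexing
    Coordinating node - Routes requests to the nodes holding the relevant shards and merges their results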
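Documents are spread over primary shards by routing: by default the routing value is the document `_id`, and the target shard is essentially `hash(_routing) % number_of_primary_shards` (newer versions add a routing-shards factor on top). The sketch below mimics that with `zlib.crc32` as a stand-in for Elasticsearch's Murmur3 hash, plus a hypothetical round-robin `allocate` helper that respects the rule that a replica never sits on the same node as its primary. The real allocator also weighs disk usage and shard balance.

```python
import zlib

def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick the primary shard for a document (routing value = doc ID).

    Stand-in hash: zlib.crc32, not the Murmur3 hash Elasticsearch uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

def allocate(num_primary_shards: int, num_replicas: int,
             nodes: list[str]) -> dict[str, list[str]]:
    """Round-robin placement; copies of one shard always land on different nodes.

    Elasticsearch would leave surplus replicas unassigned; this sketch raises instead.
    """
    if num_replicas + 1 > len(nodes):
        raise ValueError("need at least one node per copy of each shard")
    layout: dict[str, list[str]] = {node: [] for node in nodes}
    for shard in range(num_primary_shards):
        for copy in range(num_replicas + 1):
            # Offsetting by `copy` puts the primary and each replica on distinct nodes.
            node = nodes[(shard + copy) % len(nodes)]
            layout[node].append(f"P{shard}" if copy == 0 else f"R{shard}")
    return layout

print(shard_for("doc-1", 3))  # some shard in 0..2, always the same one for "doc-1"
print(allocate(3, 1, ["node-a", "node-b", "node-c"]))
# {'node-a': ['P0', 'R2'], 'node-b': ['R0', 'P1'], 'node-c': ['R1', 'P2']}
```

Because the shard number depends on the primary shard count, that count cannot simply be changed later without reindexing or splitting the index.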

    How is Elasticsearch so outrageously fast?
    Duplicates data in multiple n-gram indices, trading disk space for speed
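N-gram (`ngram`) and edge n-gram (`edge_ngram`) tokenizers are opt-in per field in the index mapping, typically for search-as-you-type. The plain-Python sketch below (not the Elasticsearch API; helper names are hypothetical) shows the trade-off: indexing every prefix turns a prefix query into a single exact lookup, at the cost of many more index terms.

```python
from collections import defaultdict

def edge_ngrams(token: str, min_gram: int, max_gram: int) -> list[str]:
    """Prefixes of length min_gram..max_gram, like an edge_ngram tokenizer."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

def build_prefix_index(docs: list[str], min_gram: int = 2,
                       max_gram: int = 10) -> dict[str, set[int]]:
    """Index every prefix of every token, so prefix search becomes an exact lookup."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            for gram in edge_ngrams(token, min_gram, max_gram):
                index[gram].add(doc_id)
    return index

def prefix_search(index: dict[str, set[int]], prefix: str) -> list[int]:
    # Prefixes shorter than min_gram or longer than max_gram were never indexed.
    return sorted(index.get(prefix.lower(), set()))

docs = ["Lucene library", "Logstash ingestion", "Kibana dashboards"]
index = build_prefix_index(docs)
print(edge_ngrams("lucene", 2, 10))  # ['lu', 'luc', 'luce', 'lucen', 'lucene']
print(prefix_search(index, "Luc"))   # [0]
print(len(index))                    # 40 index terms for just 6 words
```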

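    Inverted indices behave like hashmaps from each term to the documents containing it: a term lookup is roughly O(1) (assuming good distribution) instead of a scan over every document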
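A minimal inverted index, assuming lowercase whitespace tokenization. Term lookup here is a Python `dict` (a hashmap, as in the note); Lucene itself keeps terms in a sorted, FST-indexed term dictionary. Either way, a multi-term query then mostly merges sorted posting lists, as `intersect` does for an AND query.

```python
from collections import defaultdict

def build_inverted_index(docs: list[str]) -> dict[str, list[int]]:
    """Map each term to the sorted IDs of the documents containing it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def intersect(a: list[int], b: list[int]) -> list[int]:
    """Merge-intersect two sorted posting lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def search_all(index: dict[str, list[int]], query: str) -> list[int]:
    """AND query: documents containing every query term."""
    lists = [index.get(term, []) for term in query.lower().split()]
    if not lists:
        return []
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result

docs = ["fast search engine", "search with lucene", "lucene is a fast search library"]
index = build_inverted_index(docs)
print(index["search"])                   # [0, 1, 2]
print(search_all(index, "fast search"))  # [0, 2]
print(search_all(index, "lucene fast"))  # [2]
```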

    It keeps as much as possible in memory

    Multi-tiered Caching

    Request level: shard request cache (by default excludes requests with "size" > 0 and date-range queries using "now")
    Data level: Frequently hit Lucene indices
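The request-level cache rules can be approximated as follows. `RequestCache`, `is_cacheable` and `fake_execute` are hypothetical names for illustration; the sketch only models the default "size must be 0, no `now`" rule and keying on the full JSON request body, and it ignores invalidation when the shard refreshes.

```python
import json

def uses_now(node) -> bool:
    """True if any string value in the request starts with "now" (date math)."""
    if isinstance(node, dict):
        return any(uses_now(v) for v in node.values())
    if isinstance(node, list):
        return any(uses_now(v) for v in node)
    return isinstance(node, str) and node.startswith("now")

def is_cacheable(body: dict) -> bool:
    # Search requests default to size 10, so only an explicit "size": 0 qualifies.
    return body.get("size", 10) == 0 and not uses_now(body)

class RequestCache:
    """Toy per-shard request cache keyed on the serialized request body."""

    def __init__(self) -> None:
        self._entries: dict[str, dict] = {}
        self.hits = 0

    def search(self, body: dict, execute) -> dict:
        if not is_cacheable(body):
            return execute(body)
        # Key on the JSON text as written: reordering keys gives a different key.
        key = json.dumps(body)
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        result = execute(body)
        self._entries[key] = result
        return result

calls = []

def fake_execute(body: dict) -> dict:
    # Hypothetical stand-in for actually running the query on a shard.
    calls.append(body)
    return {"hits": {"total": 42}}

cache = RequestCache()
aggs = {"size": 0, "aggs": {"by_tag": {"terms": {"field": "tag"}}}}
recent = {"size": 0, "query": {"range": {"ts": {"gte": "now-1h"}}}}
for body in (aggs, aggs, recent, recent):
    cache.search(body, fake_execute)
print(len(calls), cache.hits)  # 3 1
```

The aggregation request runs once and is then served from the cache, while the `now-1h` request runs every time because its result depends on the current time.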
