Python Basics: Techs

Lucene

High-performance text search engine library

Written entirely in Java
Cross-platform and versatile
Open source

High-performance Indexing

More than 150 GiB/hour

Minimal RAM requirements: 1MiB

Fast incremental indexing

Index size is 20%-30% of indexed text
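
As a rough worked example of the figures above, the sketch below estimates indexing time and on-disk index size for a corpus. The function name and the 600 GiB corpus are made up for illustration; the throughput and size ratio are simply the numbers quoted in these notes, not measurements. Because the quoted throughput is a lower bound ("more than"), the computed time is an upper bound.

```python
GIB = 1024 ** 3  # bytes in one GiB


def estimate_indexing(text_bytes, throughput_gib_per_hour=150.0, size_ratio=(0.20, 0.30)):
    """Return (hours, min_index_bytes, max_index_bytes) for a corpus of text_bytes.

    Defaults are the figures quoted in the notes, used here as plain assumptions.
    """
    hours = text_bytes / (throughput_gib_per_hour * GIB)
    low, high = size_ratio
    return hours, text_bytes * low, text_bytes * high


# Hypothetical 600 GiB corpus of raw text.
hours, low, high = estimate_indexing(600 * GIB)
print(f"at most {hours:.1f} h, index between {low / GIB:.0f} and {high / GIB:.0f} GiB")
```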

Powerful Search Algorithms

Ranked searching

Multifaceted queries support

Fielded search & Any-field Sorting

Multi-index search

Concurrent update & search

Memory efficient & typo-tolerant

Pluggable Ranking Models

Configurable Storage Engine
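
To make "ranked searching" concrete, here is a minimal sketch of scoring documents against a query. It uses a simple TF-IDF formula chosen for illustration; it is not Lucene's actual ranking model, and the `tokenize` and `rank` helpers and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def rank(query, docs):
    """Return doc ids sorted by descending TF-IDF score; non-matching docs are dropped."""
    tokenized = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for term in tokenize(query):
        # Document frequency: how many documents contain the term.
        df = sum(1 for counts in tokenized.values() if term in counts)
        if df == 0:
            continue
        idf = math.log(1 + n_docs / df)  # rarer terms weigh more
        for doc_id, counts in tokenized.items():
            if term in counts:
                scores[doc_id] = scores.get(doc_id, 0.0) + counts[term] * idf
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "a": "lucene is a search library",
    "b": "elasticsearch is built on lucene lucene",
    "c": "kibana draws charts",
}
print(rank("lucene search", docs))
```

Document "a" ranks above "b" here because it matches both query terms, and the rarer term "search" carries a higher weight than the second occurrence of "lucene" in "b".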

Reference

https://www.youtube.com/watch?v=BvgGgkN3clI

Elasticsearch

A JSON-based, distributed search server built on top of Lucene

Schemaless - Field types are inferred automatically at indexing time

Exposes a REST API

Horizontally scalable and pluggable

Near real-time data availability

Supports request queues and thread pools

High availability via shard replication across nodes

JSON-based query DSL (instead of Lucene query syntax)

Middle layer of ELK

Logstash - Ingestion

Kibana - Visualization

Used for full-featured search in web applications

Fuzzy search

https://www.youtube.com/watch?v=BvgGgkN3clI
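
The JSON-based query DSL and fuzzy search mentioned above can be sketched as a request body built in Python. The helper name `build_fuzzy_match` and the field name `title` are hypothetical; the `match` query with a `fuzziness` option is part of Elasticsearch's query DSL, and a body like this would typically be sent to an index's `_search` endpoint over the REST API. Nothing is sent here; the sketch only builds and prints the JSON.

```python
import json


def build_fuzzy_match(field, text, size=10):
    """Build a search request body for a typo-tolerant match query."""
    return {
        "size": size,  # maximum number of hits to return
        "query": {
            "match": {
                # "AUTO" lets the allowed edit distance depend on term length.
                field: {"query": text, "fuzziness": "AUTO"}
            }
        },
    }


# "lucen" is a deliberate typo for "lucene"; "title" is a made-up field.
body = build_fuzzy_match("title", "lucen")
print(json.dumps(body, indent=2))
```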

Elasticsearch Lingo Explained

Field - Named key in a document, think column name in a SQL database

Term - Value for a field

Document - Individual record, a collection of fields

Index - The "schemaless" list for the collection of documents

Primary shard - Independent Lucene index, only shard accepting writes to its documents

Replica shard - Duplicate shard for faster retrieval and high-availability of the data

Data node - Holds data shards and performs CRUD operations, search and aggregations

Master node - Only node that can modify the cluster, index & shard configurations

Ingest node - Node that applies ingest pipeline for document enrichment before indexing

Coordinating
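
The primary and replica shard entries above can be illustrated with a toy model of where a document lands and where its copies live. This is a simplified sketch, not Elasticsearch's implementation: CRC32 stands in for the real routing hash, the round-robin placement is an assumption made for illustration, and all function and node names are hypothetical.

```python
import zlib


def pick_primary_shard(doc_id, num_primary_shards):
    """Map a document id to a primary shard number via hash modulo shard count."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def place_shards(num_primary_shards, num_replicas, nodes):
    """Assign each primary and its replicas to distinct nodes, round-robin."""
    if num_replicas + 1 > len(nodes):
        raise ValueError("need at least one node per shard copy")
    layout = {}
    for shard in range(num_primary_shards):
        # Consecutive nodes, so no two copies of one shard share a node.
        copies = [nodes[(shard + i) % len(nodes)] for i in range(num_replicas + 1)]
        layout[shard] = {"primary": copies[0], "replicas": copies[1:]}
    return layout


nodes = ["node-1", "node-2", "node-3"]  # hypothetical data nodes
layout = place_shards(num_primary_shards=3, num_replicas=1, nodes=nodes)
shard = pick_primary_shard("user-42", 3)
print(shard, layout[shard])
```

Because the shard number depends on the number of primary shards, changing that number would send existing ids to different shards, which is one way to see why the primary shard count matters when an index is created.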

How is Elasticsearch so outrageously fast

Duplicates data in multiple n-gram indices, trading disk space for speed

Inverted indices map each term to the documents containing it and behave like hashmaps: lookup is O(1) on average, assuming good hash distribution

It keeps as much as possible in-memory

Multi-tiered Caching

Request level: (excludes queries on date ranges and with preset "size")

Data level: Frequently hit Lucene indices
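
The n-gram and inverted-index points above can be sketched together: every prefix of every term is stored as its own key, so a prefix search becomes a single dictionary lookup at the cost of a much larger index. This is a toy model built on a Python dict, not how Lucene stores its term dictionary; the class and helper names are hypothetical.

```python
from collections import defaultdict


def edge_ngrams(term, min_len=2):
    """Return all prefixes of term that are at least min_len characters long."""
    return [term[:i] for i in range(min_len, len(term) + 1)]


class NgramIndex:
    """Inverted index from edge n-gram to the set of document ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            # Each term is stored many times, once per prefix: disk for speed.
            for gram in edge_ngrams(term):
                self.postings[gram].add(doc_id)

    def search(self, prefix):
        # One hash lookup, independent of how many documents are indexed.
        return sorted(self.postings.get(prefix.lower(), set()))


index = NgramIndex()
index.add(1, "Elasticsearch cluster")
index.add(2, "elastic band")
print(index.search("elas"))
print(len(index.postings), "keys stored for 4 terms")
```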

Wednesday, 14 July 2021