Redis -- Deconstructing the Core of Distributed Caching Using a Question-and-Answer Approach (Part 1)

EEva·February 27, 2026·7 min read

1. General Questions

What is Redis, and why use it?
Redis is an open-source, in-memory, Key-Value NoSQL database. It is renowned for its extremely fast read and write speeds and is commonly used as a cache, database, or message broker.
The main reasons to use Redis are:
* Absorbs massive request volume: Shields the backend database from being overwhelmed by high traffic.
* Multiple data structures: Supports String, Hash, List, Set, ZSet, etc., enabling the processing of complex business logic directly in memory.
* Atomicity: All operations are atomic, making it naturally suitable for concurrent scenarios like counters and distributed locks.
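The counter and lock bullets above can be sketched with a tiny in-memory model of SETNX/INCR semantics. The `MiniRedis` class is hypothetical and purely illustrative; real Redis gets its atomicity from executing commands one at a time on a single thread.

```python
# Illustrative in-memory model of Redis's SETNX / INCR semantics.
# (MiniRedis is a made-up name; real Redis is atomic because its
# single-threaded command loop never interleaves two commands.)
class MiniRedis:
    def __init__(self):
        self.store = {}

    def setnx(self, key, value):
        """Set key only if it does not already exist; True on success."""
        if key in self.store:
            return False
        self.store[key] = value
        return True

    def incr(self, key):
        """Increment an integer counter, creating it at 0 if missing."""
        self.store[key] = int(self.store.get(key, 0)) + 1
        return self.store[key]

r = MiniRedis()
assert r.setnx("lock:order:42", "worker-1")      # first caller wins the lock
assert not r.setnx("lock:order:42", "worker-2")  # second caller is rejected
r.incr("page:views")                             # counter use case
```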

What are the common use cases for Redis?
* Cache: Stores hot data (e.g., product info, user info) to significantly reduce database pressure and improve response speed.
* Distributed Locks: Uses atomic operations like SETNX to solve resource competition issues in distributed systems.
* Counters / Rate Limiting: Tracks the number of likes or views, or uses counters to limit API access frequency.
* Leaderboards: Uses the automatic sorting feature of ZSet to implement real-time scoring or popularity rankings.
* Session Management: Centrally stores user login status in a distributed cluster to achieve session sharing across multiple servers.
* Message Queues: Uses List or Stream structures to implement simple asynchronous task processing and decoupling.

Why is Redis so fast?
* In-Memory Operations: Data lives entirely in memory, eliminating the seek and read/write overhead of disk I/O (memory access is orders of magnitude faster than disk).
* Single-Threaded Model: Command execution happens on a single thread, avoiding the context switching and lock contention of multi-threaded designs and making each command naturally atomic. (Since Redis 6.0, network I/O can optionally use extra threads, but commands still run on one thread.)
* I/O Multiplexing: Uses a non-blocking event loop built on epoll (or kqueue/select, depending on the platform), allowing a single thread to handle tens of thousands of concurrent connections.
* Efficient Data Structures: Internal structures are heavily optimized in both algorithms and memory layout (e.g., SDS for strings, skiplists, listpacks).
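The multiplexing point can be demonstrated with Python's standard `selectors` module, which wraps the same epoll/kqueue readiness-notification mechanism. The exchange below is illustrative only, not Redis's actual protocol code:

```python
import selectors
import socket

# One thread, several connections: the selector reports which sockets are
# readable, so the thread never blocks waiting on any single client -- the
# same idea behind Redis's epoll-based event loop.
sel = selectors.DefaultSelector()
server_a, client_a = socket.socketpair()
server_b, client_b = socket.socketpair()
for s in (server_a, server_b):
    s.setblocking(False)
    sel.register(s, selectors.EVENT_READ)

client_b.send(b"PING")                  # only one of the two clients speaks

for key, _ in sel.select(timeout=1):    # returns only the ready socket(s)
    data = key.fileobj.recv(1024)       # guaranteed not to block
    key.fileobj.send(b"+PONG" if data == b"PING" else b"-ERR")

reply = client_b.recv(1024)             # b"+PONG"
sel.close()
```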

2. Data Types and Data Structures

What are the data types in Redis?
* Five Basic Data Types
* String: The most basic type, binary-safe, maximum size of 512MB. Use cases: caching, counters, distributed locks, CAPTCHA codes.
* Hash: A collection of key-value pairs (e.g., user info user:101 {name: "Tom", age: 18}). Use cases: storing objects, shopping carts.
* List: A simple list of strings, sorted by insertion order. Use cases: message queues, latest feeds, timelines.
* Set: An unordered and non-repeating collection of strings. Use cases: tags, mutual friends, deduplication for lucky draws.
* ZSet (Sorted Set): An ordered collection where each element is associated with a double-type score, sorted by score. Use cases: leaderboards, trending searches, delay queues.
* Three Advanced Data Types
* Bitmap: Based on String, uses bitwise operations to record 0/1 states, highly space-efficient. Use cases: user check-ins, active status tracking.
* HyperLogLog: A probabilistic data structure for cardinality estimation. Uses very little memory (at most about 12KB per key) even under massive data volumes, at the cost of a standard error of about 0.81%. Use cases: unique-visitor (UV) counting at the scale of hundreds of millions.
* GeoSpatial (GEO): Stores longitude/latitude pairs and supports distance calculations and radius queries between points. Use cases: finding people nearby, calculating ride-hailing distances.
* New Generation Data Type
* Stream: Introduced in Redis 5.0, mainly used to implement persistent message queues (similar to Kafka), solving the issue of message loss when using List as a queue.
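As a quick illustration of ZSet semantics, here is a toy leaderboard in plain Python. It is a simplified sketch of ZADD/ZREVRANGE behaviour: real Redis keeps a skiplist plus a hash table so both lookups are fast, and breaks score ties lexicographically, which this sketch ignores.

```python
# Toy sorted-set leaderboard (simplified ZADD / ZREVRANGE semantics).
def zadd(zset, member, score):
    zset[member] = score             # upsert the member's score

def zrevrange(zset, start, stop):
    """Members ordered by descending score; inclusive slice like Redis."""
    ranked = sorted(zset, key=lambda m: zset[m], reverse=True)
    return ranked[start:stop + 1]

board = {}
zadd(board, "alice", 1500)
zadd(board, "bob", 2300)
zadd(board, "carol", 900)
top_two = zrevrange(board, 0, 1)     # ["bob", "alice"]
```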

What are the common commands for Redis data types?
* String: SET, GET, INCR, DECR, SETNX, MSET
* Hash: HSET, HGET, HGETALL, HDEL
* List: LPUSH, RPUSH, LPOP, RPOP, LRANGE
* Set: SADD, SREM, SMEMBERS, SISMEMBER, SINTER
* ZSet: ZADD, ZRANGE, ZREVRANGE, ZSCORE, ZRANK
* Generic: DEL, EXPIRE, TTL, EXISTS, TYPE

3. The Redis Object Mechanism (redisObject)

Every key and value in Redis is wrapped in a redisObject header, defined (simplified) as:

typedef struct redisObject {
    unsigned type:4;       // 1. Type (external: one of the mapped data types)
    unsigned encoding:4;   // 2. Encoding (internal, underlying encoding)
    unsigned lru:24;       // 3. LRU/LFU information (used for eviction)
    int refcount;          // 4. Reference count (used for memory reclamation)
    void *ptr;             // 5. Pointer to the actual underlying data structure
} robj;

The design reasons for this object mechanism include:
* Decoupling: Commands (like LLEN) only need to target the List type, without worrying whether the underlying implementation is a ZipList or a LinkedList.
* Extreme Memory Optimization: Compact storage is used for small amounts of data (trading time for space), while efficient indexes are used for large amounts of data (trading space for time).
* Smart Maintenance: Comes with built-in reference counting and access recording to handle memory reclamation and cache eviction automatically.
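The "compact storage for small data" idea can be sketched as a hash that starts in a flat, listpack-style encoding and converts to a real hash table once it grows past a threshold. The `MiniHash` class and the threshold value below are made up for illustration; in real Redis the threshold is the `hash-max-listpack-entries` config (formerly `hash-max-ziplist-entries`).

```python
# Sketch of Redis's encoding switch for hashes: small hashes are kept as a
# flat sequence of pairs (compact, O(n) lookup) and converted to a hash
# table once they exceed a threshold. Threshold value is illustrative.
MAX_COMPACT_ENTRIES = 4

class MiniHash:
    def __init__(self):
        self.encoding = "listpack"   # compact: list of (field, value) pairs
        self.data = []

    def hset(self, field, value):
        if self.encoding == "listpack":
            for i, (f, _) in enumerate(self.data):
                if f == field:
                    self.data[i] = (field, value)   # update in place
                    return
            self.data.append((field, value))
            if len(self.data) > MAX_COMPACT_ENTRIES:
                self.data = dict(self.data)         # one-way conversion
                self.encoding = "hashtable"
        else:
            self.data[field] = value

    def hget(self, field):
        if self.encoding == "listpack":
            return next((v for f, v in self.data if f == field), None)
        return self.data.get(field)
```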

What are the underlying data structures of Redis data types?
Common types and their underlying structures (note: listpack replaced ziplist starting in Redis 7.0):
* String: SDS (Simple Dynamic String); small integer values use the int encoding
* List: quicklist (a doubly linked list whose nodes are ziplists/listpacks)
* Hash: ziplist/listpack (small) or hashtable (large)
* Set: intset (small, all-integer) or hashtable; listpack is also used for small sets since Redis 7.2
* ZSet: ziplist/listpack (small) or skiplist + hashtable (large)

Why was SDS designed?
Redis does not use C-language strings (char*) directly; instead, it encapsulates its own SDS. Native C strings (ending with \0) cannot meet Redis's requirements for high performance and safety. The design advantages of SDS are:
* O(1) Length Retrieval: Internally records the len attribute, allowing length retrieval in constant time.
* Prevents Buffer Overflow: Checks if space is sufficient before appending; if not, it automatically expands.
* Reduces Memory Reallocation: Employs spatial pre-allocation and lazy space release strategies.
* Binary Safe: Does not rely on \0 to determine the end, making it capable of storing binary data like images and audio.
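A minimal sketch of those SDS ideas, assuming a simplified doubling strategy (real SDS doubles only up to 1 MB of data, then grows in 1 MB steps; the `SDS` class here is illustrative, not Redis's C code):

```python
# Sketch of SDS's key ideas: O(1) length, capacity tracking, overflow
# checks, pre-allocation on append, and binary safety (no reliance on a
# trailing '\0' terminator).
class SDS:
    def __init__(self, data=b""):
        self.buf = bytearray(data)   # may legally contain b"\x00"
        self.len = len(data)         # O(1) STRLEN, no strlen() scan
        self.alloc = len(data)       # allocated capacity

    def append(self, data):
        needed = self.len + len(data)
        if needed > self.alloc:          # space check before writing
            self.alloc = needed * 2      # pre-allocate to amortise future
                                         # appends (real SDS reallocs here)
        self.buf[self.len:needed] = data # Python's bytearray grows for us
        self.len = needed

s = SDS(b"hello")
s.append(b"\x00world")               # binary-safe: embedded NUL is fine
```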

What is the maximum capacity a String type value can store?
512 MB

Why was Stream designed?
Before Stream was introduced, the existing queue options in Redis had obvious pain points:
* List type: Although persistable, it does not support multiple consumer groups, and implementing an acknowledgment (ACK) mechanism on top of it is complex.
* Pub/Sub: Not persistent; its "fire and forget" delivery means messages are lost whenever consumers are offline.
* Design goal of Stream: To provide a reliable message queue model that supports persistence, multiple consumer groups, and message acknowledgment.

In what scenarios is Stream used?
* Asynchronous Task Processing: Task flows where messages must not be lost.
* Multi-Consumer Scenarios: The same data stream needs to be consumed simultaneously by different business systems (e.g., settlement system, notification system).
* High-Performance Log Collection: Leveraging its append-only characteristics to record massive transaction streams.

Did the message ID design consider the issue of time backtracking?
Yes. The default format for a Stream ID is <millisecondsTime>-<sequenceNumber>.
* Defense Mechanism: Redis records the server's maximum ID timestamp.
* Handling Logic: If the system time backtracks (making the generated timestamp smaller than the previous ID), Redis will forcefully use the timestamp of the last ID and simply increment its sequence number, thereby guaranteeing the monotonic progression of the ID.
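That handling logic can be sketched as a small pure function; the `(ms, seq)` tuple below is just an illustration of the `<millisecondsTime>-<sequenceNumber>` format:

```python
# Sketch of how Redis keeps auto-generated Stream IDs monotonic even if
# the clock goes backwards: an ID is (ms, seq); if "now" is not newer than
# the last ID's milliseconds, reuse that timestamp and bump the sequence.
def next_stream_id(last_id, now_ms):
    last_ms, last_seq = last_id
    if now_ms > last_ms:
        return (now_ms, 0)
    return (last_ms, last_seq + 1)   # clock stalled or went backwards

ids = []
last = (1000, 0)
for now in (1005, 1005, 990):        # normal tick, same ms, clock backtrack
    last = next_stream_id(last, now)
    ids.append(last)
# ids == [(1005, 0), (1005, 1), (1005, 2)] -- strictly increasing throughout
```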

4. Persistence and Memory

What are the persistence mechanisms of Redis? What are their pros and cons? How are they generally used?

* RDB (Snapshotting): Periodically saves a point-in-time snapshot of memory as a compact binary file. Advantages: fast recovery, small files, low runtime overhead. Disadvantages: data written after the last snapshot is lost on a crash, and generating a snapshot of a large dataset takes time.
* AOF (Append Only File): Logs every write command in an append-only file. Advantages: better durability (typically at most about one second of data lost with the everysec fsync policy), human-readable log. Disadvantages: larger files, slower recovery, possible I/O pressure under extremely high write load.
* General Usage: Hybrid persistence (RDB + AOF, available since Redis 4.0). Use RDB for full backups and AOF for incremental recording, balancing safety and recovery speed.

What are the deletion strategies for expired keys in Redis?
Redis uses a combination of "Lazy Deletion + Periodic Deletion".
* Lazy Deletion: Checks if the key is expired only when it is accessed, and deletes it if so. (Saves CPU, costs memory)
* Periodic Deletion: Randomly samples a batch of keys at set intervals to check for and delete expired keys. (A compromise solution)
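A minimal sketch of the two strategies combined. The `ExpiringStore` class is hypothetical; the injectable `clock` just makes the behaviour deterministic (real Redis uses the server's cached time):

```python
import random
import time

# Sketch of lazy + periodic expiry, the combination Redis uses.
class ExpiringStore:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}
        self.expires = {}                  # key -> absolute deadline

    def set(self, key, value, ttl=None):
        self.data[key] = value
        if ttl is not None:
            self.expires[key] = self.clock() + ttl

    def get(self, key):
        # Lazy deletion: expiry is only checked when the key is touched.
        deadline = self.expires.get(key)
        if deadline is not None and self.clock() >= deadline:
            self.data.pop(key, None)
            self.expires.pop(key, None)
            return None
        return self.data.get(key)

    def periodic_sweep(self, sample_size=20):
        # Periodic deletion: sample some volatile keys, purge expired ones.
        keys = random.sample(list(self.expires),
                             min(sample_size, len(self.expires)))
        now = self.clock()
        for k in keys:
            if now >= self.expires[k]:
                self.data.pop(k, None)
                self.expires.pop(k, None)
```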

What are the Redis memory eviction algorithms?
When memory hits the maxmemory limit, one of the following policies is applied (LRU, LFU, and random each come in an allkeys- and a volatile- variant; the volatile- variants only consider keys with an expiration set):
* LRU (Least Recently Used): Evicts the data that has not been accessed for the longest time (approximated by sampling).
* LFU (Least Frequently Used): Evicts the data with the lowest access frequency (available since Redis 4.0).
* Random: Evicts keys at random.
* volatile-ttl: Prioritizes evicting keys that are closest to expiring.
* noeviction: Evicts nothing; write commands that need more memory return an error (the default configuration).
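An exact-LRU eviction sketch using Python's `OrderedDict`. Note the simplification: real Redis approximates LRU by sampling a few keys (controlled by `maxmemory-samples`) rather than maintaining a full recency list, trading precision for memory and speed.

```python
from collections import OrderedDict

# Exact LRU eviction sketch (capacity counted in keys for simplicity;
# real Redis enforces a byte budget, maxmemory).
class LRUCache:
    def __init__(self, maxkeys):
        self.maxkeys = maxkeys
        self.data = OrderedDict()    # iteration order == recency order

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)   # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.maxkeys:
            evicted, _ = self.data.popitem(last=False)  # drop the LRU key
            return evicted
        return None
```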

What happens when Redis runs out of memory?
* If an eviction policy is set (e.g., allkeys-lru), Redis automatically removes old data according to that policy to free up space.
* If the policy is noeviction (the default), Redis rejects write commands that would consume more memory (returning an OOM error), while read requests continue to work normally.

How to optimize Redis memory?
* Control Key Length: Use shorter names.
* Avoid Big Keys: Split overly large Hashes or Lists.
* Use Efficient Encoding: Keep collections small enough to stay in their compact encodings (ziplist/listpack) instead of the larger hashtable/skiplist forms.
* Set Expiration Times: Ensure cold data can be automatically released.
* Enable Memory Defragmentation: Configure activedefrag yes.

How do you set an expiration time for a Redis key, or keep it valid forever?
* Set Expiration: EXPIRE key seconds or PEXPIRE key milliseconds (or atomically at write time with SET key value EX seconds).
* Valid Forever: Keys have no expiration by default. To remove an existing expiration time, use PERSIST key.

What is the purpose of Pipelines in Redis?
* Function: Packages multiple commands and sends them to the server all at once to reduce Network RTT (Round Trip Time).
* Effect: Vastly improves the performance of batch operations. Transforms the process from "send one, receive one" to "send a batch, receive a batch."
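The RTT saving can be illustrated with a toy client that counts round trips to a fake in-process server (all names here are hypothetical; this is not the redis-py API):

```python
# Toy model of pipelining: latency is dominated by network round trips,
# so batching N commands into one exchange replaces N RTTs with 1.
class PipelineClient:
    def __init__(self):
        self.round_trips = 0
        self.buffer = []
        self.store = {}              # stands in for the server's data

    def _exchange(self, commands):
        self.round_trips += 1        # one request/response on the wire
        replies = []
        for op, key, *rest in commands:
            if op == "SET":
                self.store[key] = rest[0]
                replies.append("OK")
            elif op == "GET":
                replies.append(self.store.get(key))
        return replies

    def execute(self, *command):
        return self._exchange([command])[0]   # "send one, receive one"

    def queue(self, *command):
        self.buffer.append(command)           # buffered, nothing sent yet

    def flush(self):
        replies = self._exchange(self.buffer) # "send a batch, receive a batch"
        self.buffer = []
        return replies
```

Three unpipelined commands cost three round trips; queuing them and flushing once costs a single round trip, which is exactly where the batch speedup comes from.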