These terms measure a system's ability to recover from failure and remain operational.
Recovery Point Objective (RPO): The maximum acceptable amount of data (measured in time) that can be lost after a disruption.
Example: An RPO of 1 hour means you must be able to recover data up to a state that is no older than one hour before the failure occurred. This dictates backup and replication frequency.
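As a quick sketch of how an RPO translates into operations (the 1-hour figure from the example above is kept; the function name and timestamps are invented), a monitoring job might flag when the most recent backup no longer satisfies the objective:

```python
from datetime import datetime, timedelta

RPO = timedelta(hours=1)  # maximum tolerable data loss, per the example above

def rpo_satisfied(last_backup_at: datetime, now: datetime) -> bool:
    # Anything written after the last backup would be lost if a failure hit now,
    # so the age of the newest backup must stay within the RPO.
    return now - last_backup_at <= RPO

print(rpo_satisfied(datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 1, 12, 45)))  # True
print(rpo_satisfied(datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 1, 13, 30)))  # False
```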
Reliability: The probability that a system or component will perform its required functions under stated conditions for a specified period of time.
In Practice: Often measured by MTBF (Mean Time Between Failures) or uptime percentages (e.g., "four nines" is 99.99% availability).
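A small worked example shows how these numbers relate (the MTBF and MTTR figures are made up for illustration):

```python
# Availability from MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair).
mtbf_hours = 5000.0   # average time between failures (illustrative)
mttr_hours = 0.4      # average time to restore service (illustrative)

availability = mtbf_hours / (mtbf_hours + mttr_hours)
print(f"Availability: {availability:.5%}")   # ~99.992%

# Downtime budget implied by "four nines" over a year:
minutes_per_year = 365 * 24 * 60
print(f"99.99% allows ~{minutes_per_year * 0.0001:.1f} minutes of downtime per year")  # ~52.6 min
```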
Replication: The process of copying data across multiple servers or data stores to ensure consistency and improve availability and fault tolerance.
Types: Replication can be synchronous (the write is acknowledged only after all replicas have applied it) or asynchronous (the write is acknowledged immediately and replicas are updated afterward).
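A toy sketch of the difference, using in-memory dicts as stand-ins for replica servers (this is not a real replication protocol, just the shape of the trade-off):

```python
import threading

class ReplicatedStore:
    """Toy sketch contrasting synchronous and asynchronous replication."""

    def __init__(self, replicas):
        self.primary = {}
        self.replicas = replicas  # dicts standing in for remote replica servers

    def write_sync(self, key, value):
        # Synchronous: acknowledge only after every replica has applied the write.
        self.primary[key] = value
        for replica in self.replicas:
            replica[key] = value  # stands in for a blocking network call
        return "ack"

    def write_async(self, key, value):
        # Asynchronous: acknowledge immediately and let replicas catch up later,
        # so a failure in between can lose the not-yet-replicated write.
        self.primary[key] = value
        def replicate():
            for replica in self.replicas:
                replica[key] = value
        threading.Thread(target=replicate, daemon=True).start()
        return "ack"
```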
These concepts focus on handling growth, performance, and distributing data loads.
Scalability: A measure of a system's ability to handle an increasing workload or growing amount of data.
Types: Vertical Scaling (adding more resources, like CPU or RAM, to a single server) and Horizontal Scaling (adding more servers to the resource pool).
Partitioning: The act of dividing a single logical database or index into distinct, independent parts (partitions).
Goal: To manage large volumes of data by spreading the load across multiple physical machines.
Sharding: A specific type of horizontal partitioning in which data is distributed across independent databases (shards) based on a sharding key (e.g., user ID or geographic region).
Benefit: It allows the system to scale beyond the capacity limits of a single database server.
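A minimal hash-based sharding sketch, assuming a user ID is the sharding key and a fixed shard count (both are illustrative choices):

```python
import hashlib

NUM_SHARDS = 4  # illustrative; chosen to match capacity planning in practice

def shard_for(user_id: str) -> int:
    # Hash the sharding key and map it to one of the independent databases.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("user-1042"))  # every lookup for this user routes to the same shard
```

Hash-based routing spreads load evenly; range-based routing (e.g., by geographic region) keeps related rows together but can create hot shards.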
These concepts guide how systems are designed, structured, and assessed.
CAP Theorem: A fundamental principle in distributed computing stating that it is impossible for a distributed data store to simultaneously provide more than two of the following three guarantees:
Consistency (C): Every read receives the most recent write or an error.
Availability (A): Every request receives a non-error response, without guaranteeing it is the latest write.
Partition Tolerance (P): The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
Architectural Choice: Because network partitions cannot be ruled out, P is effectively mandatory, so most systems must choose between C and A when a partition occurs.
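A toy sketch of that choice (the class and method names are invented): a CP-style read refuses to answer while the node is cut off from its peers, whereas an AP-style read answers with possibly stale data:

```python
class Node:
    """Toy node illustrating the C-vs-A trade-off during a network partition."""

    def __init__(self):
        self.value = None
        self.partitioned = False  # True when this node cannot reach the other replicas

    def read_cp(self):
        # CP choice: return an error rather than risk serving stale data.
        if self.partitioned:
            raise RuntimeError("unavailable: cannot confirm latest write")
        return self.value

    def read_ap(self):
        # AP choice: always answer, even though the value may not reflect the latest write.
        return self.value
```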
Abstraction: The process of hiding complex implementation details and exposing only the essential information to the user or caller.
In Architecture: Defining clean interfaces (APIs) for services so consuming systems don't need to know the underlying technology or complexity.
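For instance, a hypothetical payment interface (all names here are invented) lets consumers depend on the contract rather than on any concrete provider or protocol:

```python
from abc import ABC, abstractmethod

class PaymentGateway(ABC):
    """Hypothetical interface: consumers depend only on this contract."""

    @abstractmethod
    def charge(self, amount_cents: int, customer_id: str) -> str:
        """Charge a customer and return a transaction ID."""

class MockGateway(PaymentGateway):
    def charge(self, amount_cents: int, customer_id: str) -> str:
        # Provider-specific detail (HTTP calls, auth, retries) stays hidden here.
        return f"txn-{customer_id}-{amount_cents}"

def checkout(gateway: PaymentGateway) -> str:
    # The caller never needs to know which implementation it is using.
    return gateway.charge(4999, "cust-42")

print(checkout(MockGateway()))  # txn-cust-42-4999
```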
Complexity: A measure of the difficulty of understanding, maintaining, and testing a piece of code.
Metrics: Often assessed using quantitative measures like Cyclomatic Complexity, which counts the number of linearly independent paths through a program's source code.
Architectural Impact: High complexity in core modules leads to higher risk and maintenance costs.
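As an illustration of how the count works (the function itself is invented), the routine below has a cyclomatic complexity of 4: one base path plus three decision points:

```python
def shipping_cost(weight_kg: float, express: bool, international: bool) -> float:
    # Cyclomatic complexity = 4: the two `if`s and the ternary each add a path.
    cost = 5.0
    if weight_kg > 10:          # decision 1
        cost += 7.5
    if international:           # decision 2
        cost *= 2
    return cost * 1.5 if express else cost   # decision 3
```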