In modern enterprise digital transformation, providing simultaneous online access, a smooth experience, and stable response times for 300 to 500 employees has become one of the core challenges in system development. Although the user scale may seem modest, enterprise internal systems often combine complex business logic, high-density operations, strict permission models, and heavy I/O traffic, making their concurrency pressure comparable to that of a medium-sized internet platform.
The key to concurrency performance is not competing on hardware, but getting the architecture right: asynchronous processing, decoupling, cache-first design, horizontal scaling, and observability. This article analyzes the problem from six dimensions: backend, database, frontend, message queue, load balancing, and observability.
I. Backend Concurrency Processing: The Inevitable Evolution from WSGI to ASGI
1. Concurrency Limitations of WSGI Model
Traditional Python Web frameworks (such as Flask, Django) rely on WSGI (synchronous blocking model). The problems include:
- Each request occupies one thread/process
- Massive I/O (database, external APIs, disk) causes blocking
- 500 concurrent users lead to process explosion and huge context switching overhead
- Peak periods are prone to system avalanche
The WSGI model copes poorly with I/O-intensive scenarios. Each request occupies a thread or process, and once the request involves external I/O (database, storage, third-party interfaces), that thread blocks. When concurrency scales to hundreds of users, the process count grows, context-switching overhead rises, and overall throughput actually decreases.
2. ASGI: The Standard Solution for Enterprise-Level High Concurrency
ASGI is built on an event loop plus coroutines, with these characteristics:
- A single process can handle thousands of connections
- Coroutines automatically yield control while waiting on I/O
- CPU time slices are used efficiently
- Natural support for WebSockets, SSE, background tasks, and other real-time workloads
Adopting an ASGI framework (such as FastAPI) changes this fundamentally. The event loop and coroutine mechanism let a request yield control while it waits on I/O, so a single process can serve a large number of connections simultaneously. For the common scenarios in enterprise internal systems (form submission, queries, batch business processing, and so on), this concurrency model is a much better fit.
In real workloads the system is still bounded by database I/O, but the difference in request-scheduling capability remains clearly visible.
3. Relationship Between Coroutines and GIL
The GIL prevents Python threads from executing bytecode in parallel across CPU cores, but the main bottleneck of enterprise systems is I/O rather than CPU.
With asyncio:
- Coroutines suspend while waiting on I/O
- Thread blocking is avoided
- A single core can interleave a large number of in-flight requests
Therefore, the GIL mainly constrains CPU-intensive tasks, while internal systems are usually dominated by database and network I/O. As long as an asynchronous framework and asynchronous drivers are used, blocking is largely avoided.
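The scheduling claim above can be demonstrated with plain asyncio: one hundred simulated 100 ms I/O waits complete together in roughly the time of one, because each coroutine yields control while it sleeps.

```python
import asyncio
import time

async def io_task(i: int) -> int:
    await asyncio.sleep(0.1)  # simulated network/DB wait
    return i

async def main() -> float:
    start = time.perf_counter()
    # 100 coroutines wait concurrently on a single core.
    await asyncio.gather(*(io_task(i) for i in range(100)))
    return time.perf_counter() - start

elapsed = asyncio.run(main())
# Total time is close to 0.1 s, not 100 * 0.1 s = 10 s.
```

The same hundred waits run sequentially in blocking code would take about ten seconds.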
II. Database High-Concurrency Governance: Connection Pools, Async Drivers, and Query Optimization
The database is usually the first bottleneck of internal systems.
Using SQLAlchemy's connection pool at the application layer reduces the overhead of repeatedly establishing connections, but as backend service instances multiply, their per-instance pools add up and can easily exceed the database's maximum connection count.
Therefore, large-scale concurrency usually requires adding PgBouncer in front of the database to provide unified reuse and throttling at the connection layer. Through transaction-level pooling, PgBouncer can support a large number of logical connections with a small number of physical connections, avoiding excessive database pressure.
Beyond pooling, query efficiency matters just as much. A slow query holds a connection for a long time and eventually exhausts the pool. Proper indexes, sane SQL structure, avoiding N+1 queries, and using an async driver such as asyncpg are all key to overall concurrency capacity.
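The pool-stacking arithmetic can be made concrete; all numbers below are illustrative assumptions, not recommendations:

```python
# Why per-instance pools overwhelm PostgreSQL without PgBouncer.
instances = 8                      # backend replicas
pool_size, max_overflow = 10, 5    # per-instance SQLAlchemy pool settings
pg_max_connections = 100           # a typical PostgreSQL default

# Worst case: every instance opens its full pool plus overflow.
peak_direct = instances * (pool_size + max_overflow)
assert peak_direct == 120          # already over the database limit

# With PgBouncer in transaction-pooling mode, the same logical load
# can be served by a small, fixed physical pool behind the bouncer.
pgbouncer_pool = 20
assert pgbouncer_pool < pg_max_connections < peak_direct
```

The point is that the per-instance pool budget must be planned against the fleet size, not a single instance.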
III. Redis: The Triple Role of Cache, Rate Limiting, and Sessions
Redis undertakes three core tasks in high-concurrency architecture.
1. Hot Data Cache (Cache-Aside)
Caching frequently accessed data (permission trees, organizational structures, configuration dictionaries, menu data) in Redis can reduce database read pressure by more than 80%.
Recommended practices:
- TTL plus a random offset (to prevent a cache avalanche)
- The Cache-Aside pattern
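A minimal Cache-Aside sketch with TTL jitter; the key naming, TTL values, and the DB helper are assumptions, and `r` can be any client exposing Redis-style `get`/`set` (e.g. redis-py):

```python
import json
import random

TTL_BASE, TTL_JITTER = 300, 60  # seconds; jitter spreads out expirations

def load_permissions_from_db(user_id: int) -> dict:
    # Hypothetical database helper standing in for the real query.
    return {"user_id": user_id, "roles": ["staff"]}

def get_permissions(r, user_id: int) -> dict:
    """Cache-Aside: try Redis first, fall back to the DB, then backfill."""
    key = f"perm:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    data = load_permissions_from_db(user_id)
    # The random offset prevents many keys from expiring at the same
    # instant -- the cache avalanche mentioned above.
    r.set(key, json.dumps(data), ex=TTL_BASE + random.randint(0, TTL_JITTER))
    return data
```

With redis-py, `r = redis.Redis(...)` slots straight into this function.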
2. Concurrency Rate Limiting
Rate limiting can be implemented on top of Redis INCR as a fixed window, a sliding window, a token bucket, or a leaky bucket.
It protects against: runaway scripts, bursts of operations hammering the backend, and internal stress tests crashing the system.
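A fixed-window limiter over INCR can be sketched as follows; the key layout and per-user budget are assumptions, and `r` is any Redis-style client with `incr` and `expire`:

```python
WINDOW_SECONDS = 60
MAX_REQUESTS = 120  # illustrative per-user budget per window

def allow_request(r, user_id: str, window_id: int) -> bool:
    """Fixed-window limiter: one counter key per user per time window."""
    key = f"rl:{user_id}:{window_id}"
    count = r.incr(key)  # INCR is atomic in Redis, so no race on the count
    if count == 1:
        # First hit in this window: let the key expire on its own.
        r.expire(key, WINDOW_SECONDS)
    return count <= MAX_REQUESTS
```

In practice `window_id` would be derived from the clock, e.g. `int(time.time()) // WINDOW_SECONDS`; sliding windows and token buckets refine the same idea.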
3. Session Management and Permission Caching
Comparison with JWT:
| Item | JWT | Redis Session |
|------|-----|---------------|
| State | Stateless | Stateful |
| Revocation | Difficult | Easy (delete key) |
| Concurrency | Excellent | Excellent |
| Storage | Client-side | Redis |
| Security | Vulnerable to XSS if stored in the browser | Server-side, easier to control |
In enterprise scenarios, Redis session storage is easier to govern than JWT, especially when a user must be logged out immediately: simply delete the session key in Redis. For enterprise internal systems, Redis sessions plus permission caching are the recommended combination; session lookups are low-latency and do not slow down authentication.
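The revocation advantage from the table fits in a few lines; the key prefix, TTL, and payload shape are assumptions, and `r` is any Redis-style client:

```python
import json
import secrets

SESSION_TTL = 8 * 3600  # illustrative: one working day

def create_session(r, user_id: int, perms: list) -> str:
    token = secrets.token_urlsafe(32)
    payload = json.dumps({"uid": user_id, "perms": perms})
    r.set(f"sess:{token}", payload, ex=SESSION_TTL)
    return token

def load_session(r, token: str):
    raw = r.get(f"sess:{token}")
    return json.loads(raw) if raw is not None else None

def revoke_session(r, token: str) -> None:
    # The advantage over JWT: forced logout is a single key deletion.
    r.delete(f"sess:{token}")
```

Permission data cached in the same payload saves a database round trip on every authenticated request.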
IV. React Frontend: Enterprise-Level High-Frequency Interaction and Big Data Rendering Optimization
Frontend pressure in enterprise internal systems mainly comes from massive data rendering and high-frequency operations, such as:
- Large amounts of real-time data refresh
- Massive list rendering (approval lists, order lists, etc.)
- Data race conditions caused by multi-user collaboration
- Expensive reconciliation (diff) work driven by complex permission controls
Although React's rendering scheduler is already capable, large tables and lists will bog down the browser's main thread if left unoptimized.
1. List Virtualization
Using react-window or react-virtualized to render only the viewport area can significantly reduce DOM node count. This is crucial for pages handling large amounts of business data.
2. State Management
Redux Toolkit and RTK Query are more practical in enterprise applications, as they can automatically handle request deduplication and cache invalidation control, reducing unnecessary requests to the backend.
3. User Interaction Optimization
Debouncing, throttling, and request-race handling (always applying only the latest response) reduce the real request volume reaching the backend and improve the user experience.
V. Time-Consuming Tasks and Async Queues: Extracting Time from Request Chains
Internal systems routinely contain time-consuming tasks: large batch exports, AI processing, bulk synchronization with external interfaces, and so on. Executed synchronously inside an HTTP request, they tie up backend workers for long stretches and degrade response times for every user.
The standard approach is to hand these tasks to Celery. Tasks are queued and processed by background workers, so the system is never stalled by a single user's heavy operation.
Advantages:
- HTTP layer is not blocked
- Peak tasks are automatically queued
- Background Workers can be horizontally scaled
- System won't be stuck by large tasks
VI. Nginx: Load Balancing and Optimization at the Traffic Entry Point
As the entry point, Nginx mainly handles three things:
1. Load Balancing
- least_conn is more suitable for internal systems with large request time differences
- ip_hash is suitable for WebSocket persistent connection scenarios
2. Connection Count and System Parameters
The maximum file descriptor limits of the operating system and Nginx determine how many concurrent connections the system can handle. In high-concurrency scenarios, these parameters must be adjusted according to peak expectations.
3. SSL and HTTP/2
Unified SSL offloading at the Nginx layer can reduce backend burden; enabling HTTP/2's multiplexing can speed up React static resource loading, especially in environments with poor network conditions.
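The three concerns above can be combined in one entry-point configuration; this is a sketch only, with upstream addresses, ports, and certificate paths as placeholders:

```nginx
upstream app_backend {
    least_conn;                # favor the instance with the fewest active requests
    server 10.0.0.11:8000;
    server 10.0.0.12:8000;
}

server {
    listen 443 ssl http2;      # TLS terminated here; HTTP/2 multiplexing for static assets
    ssl_certificate     /etc/nginx/certs/app.crt;
    ssl_certificate_key /etc/nginx/certs/app.key;

    location / {
        proxy_pass http://app_backend;   # plain HTTP to the ASGI servers behind Nginx
    }
}
```

For WebSocket-heavy deployments, swapping `least_conn` for `ip_hash` keeps persistent connections pinned to one instance, as noted above.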
VII. Observability: The "Self-Healing Capability" of Enterprise-Level Systems
The ability to find bottlenecks and recover promptly is more important than single-point performance.
Common practices include:
- Using Prometheus to collect metrics (RPS, latency, connection pool usage, queue length, etc.)
- Using Grafana for visualization
- Using distributed tracing (such as Jaeger) to locate specific time-consuming segments in requests
- Setting up liveness and readiness probes to ensure load balancers only distribute traffic to healthy instances
In scenarios with multi-person collaboration and frequent deployments, these monitoring capabilities are crucial.
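With the official `prometheus_client` library, exposing such metrics from Python takes only a few lines; the metric names and the handler here are illustrative:

```python
from prometheus_client import Counter, Histogram, generate_latest

REQUESTS = Counter("app_requests_total", "Total HTTP requests", ["path"])
LATENCY = Histogram("app_request_seconds", "Request latency in seconds")

def handle_request(path: str) -> None:
    REQUESTS.labels(path=path).inc()
    with LATENCY.time():        # records elapsed time into the histogram
        pass                    # real handler work would run here

handle_request("/orders")
# generate_latest() produces the text a /metrics endpoint would serve
# for Prometheus to scrape.
exposition = generate_latest().decode()
```

Grafana then visualizes these series, and alerts on latency or queue-length metrics surface bottlenecks before users report them.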
High Concurrency Is Not About "Stacking Hardware", But Reducing Waiting, Decreasing Blocking, and Reasonable Traffic Distribution
A stable internal system supporting 300 to 500 concurrent users relies not on expensive servers, but on reasonable architecture in all components:
- ASGI + FastAPI provide asynchronous scheduling capabilities
- PgBouncer + async drivers jointly improve database concurrency
- Redis provides caching, rate limiting, session and permission acceleration
- React + virtualization + RTK Query improve frontend rendering efficiency
- Celery removes time-consuming tasks from request chains
- Nginx handles entry-point distribution and protocol processing
- A complete monitoring system keeps the system in a controllable state under high load
When these components work together, the system can not only handle high concurrency but is also easier to scale, optimize, and maintain long-term.
Reference: https://gemini.google.com/share/36973feb7c42