team7katas / sysopsquad
- Monday, 31 May 2021, 00:28:51
The Sysops Squad Architectural Kata
Pavel, Suheyl, Nikita, Hassan
Everything in software architecture is a trade-off.
First Law of Software Architecture
Welcome to the Sysops Squad Architectural Kata run by O'Reilly in April - May 2021.
This page is architectural documentation for the solution proposal from Team Seven.
Penultimate Electronics is a large electronics giant that has numerous retail stores throughout the country. When customers buy computers, TVs, stereos, and other electronic equipment, they can choose to purchase a support plan. Customer-facing technology experts (the "Sysops Squad") will then come to the customer's residence (or work office) to fix problems with the electronic device.
The current trouble ticket system is a large monolithic application that was developed many years ago. Customers are complaining that consultants are never showing up due to lost tickets, and oftentimes the wrong consultant shows up to fix something they know nothing about. Customers and call-center staff have been complaining that the system is not always available for web-based or call-based problem ticket entry. Change is difficult and risky in this large monolith - whenever a change is made, it takes too long and something else usually breaks. Due to reliability issues, the monolithic system frequently "freezes up" or crashes - they think it's mostly due to a spike in usage and the number of customers using the system. If something isn't done soon, Penultimate Electronics will be forced to abandon this very lucrative business line and fire all of the experts.
The desired solution from the functional perspective is shown in the following marketecture diagram.
Business Drivers
What business drivers can we learn from the situation:
Business Goals
The company establishes the following business goal to help the situation:
The company suffers from a poorly performing customer support system that threatens to end the business line. They want to develop a new robust and highly performing system that will allow them to stay in business and enable future growth.
This section describes key stakeholders of the system and their architectural concerns.
SH-1: Administrator (security)
SH-2: Customer (availability, performance, scalability, robustness)
SH-3: Expert (availability, performance)
SH-4: Manager (reportability)
SH-5: Helpdesk (availability, performance)
SH-6: Development team (extensibility)
UC-1: User maintenance:
UC-2: Customer registration:
UC-3: Ticket workflow:
UC-4: Survey submission:
UC-5: Knowledge base maintenance:
UC-6: Reporting:
UC-7: Billing:
UC-8: Notification:
UC-9: Ticket search:
QA-1: scalability (UC-3)
QA-2: availability (UC-2, UC-3, UC-4)
QA-3: performance (UC-2, UC-3, UC-6)
QA-4: robustness (UC-3)
QA-5: security (UC-2, UC-7)
QA-6: extensibility (all use cases, SH-6)
This section describes the architecture of the current ticket system.
Please note that all views are documented in the C4 model style, although only the System Context, Container, and dynamic views are presented. Most diagrams use an informal notation. Every diagram comes with a key explaining the meaning of each shape.
The current ticket system demonstrates very poor availability, maintainability, deployability, and performance. Our goal is to design a new system that solves the aforementioned problems.
The following container diagram depicts the current ticket system:
This section describes the target software architecture.
Please note that all views are documented in the C4 model style, although only the System Context, Container, and dynamic views are presented. Most diagrams use an informal notation. Every diagram comes with a key explaining the meaning of each shape.
The following diagram maps the required architecture characteristics to the key use cases, based on the discovered requirements:
The system context diagram below depicts the key users of the system and its external dependencies:
The container diagram that follows shows the high-level shape of the software architecture and how responsibilities are distributed across containers. It also shows the major technology choices and how the containers communicate with one another.
The architecture is built around four main domains that were discovered during the problem analysis:
The architectural style used here as the basis is Service-based architecture (see ADR-1 for details).
This section explains some key use cases to demonstrate how the corresponding workflows pass through the containers.
The following sequence diagram highlights some key requests that the customer performs during registration in the system. One step worth noting is the registration of a credit card. The customer database stores only minimal credit card data, enough for customers to identify which cards they have already registered. The full credit card details are encrypted and securely passed to the billing system (see ADR-4).
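To make the split of card data concrete, here is a minimal sketch of what the customer database might keep. The record shape, the choice of the last four digits as the "minimal data", and the `billing_token` field are assumptions made for illustration; they are not taken from ADR-4. Encryption and the hand-off to the billing system are out of scope here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredCardReference:
    """Hypothetical minimal card record kept in the customer database."""
    customer_id: str
    last_four: str       # assumed to be enough for the customer to recognise a card
    billing_token: str   # assumed opaque reference issued by the billing system


def to_card_reference(customer_id: str, card_number: str, billing_token: str) -> StoredCardReference:
    """Drop everything except the last four digits before storing."""
    digits = "".join(ch for ch in card_number if ch.isdigit())
    if len(digits) < 12:
        raise ValueError("card number looks too short")
    return StoredCardReference(customer_id, digits[-4:], billing_token)


def display_label(card: StoredCardReference) -> str:
    """Label shown to the customer in the list of registered cards."""
    return f"Card ending in {card.last_four}"
```

With this shape, a compromise of the customer database would expose only the last four digits and an opaque token, never the full card number.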
The following diagram illustrates the process of ticket registration by the customer.
An important point is that the request succeeds once the ticket is saved in the customer database and the corresponding event is fired for the ticket processing area. This way, the customer can see the new ticket immediately after a page refresh and does not have to wait for any further actions on the ticket.
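The save-then-publish order described here can be sketched as follows. The in-memory dictionary and `queue.Queue` are stand-ins for the customer database and the message broker, and the `TicketCreated` event name and the field names are hypothetical.

```python
import queue
import uuid
from dataclasses import dataclass


@dataclass
class Ticket:
    ticket_id: str
    customer_id: str
    description: str
    status: str = "NEW"


customer_db = {}                # stand-in for the customer database
ticket_events = queue.Queue()   # stand-in for the message broker


def register_ticket(customer_id: str, description: str) -> Ticket:
    ticket = Ticket(str(uuid.uuid4()), customer_id, description)
    # 1. Persist first, so the ticket is visible on the next page refresh.
    customer_db[ticket.ticket_id] = ticket
    # 2. Fire the event for the ticket processing area.
    ticket_events.put({"type": "TicketCreated", "ticket_id": ticket.ticket_id})
    # 3. Return immediately; assignment happens later and asynchronously.
    return ticket
```

The caller never waits for an expert to be assigned, so the response time of the request does not depend on the ticket processing area.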
The diagram below explains how the system processes a new ticket and assigns an expert to it.
Since Ticket Process is a job that runs periodically, tickets that cannot be assigned at a given moment are never lost; they will be processed the next time the job runs.
Also, notice that an assignment is a separate entity. This way we can store a history of assignments.
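A minimal sketch of one run of such a periodic job is shown below. Matching experts to tickets by a single skill string is an assumption made for illustration, as are all names; the real matching rules are not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """A separate record per assignment, so the history is preserved."""
    ticket_id: str
    expert_id: str


def run_assignment_job(pending, free_experts, history):
    """One periodic run. Returns the tickets that stay pending for the next run.

    pending:      list of (ticket_id, required_skill) tuples
    free_experts: dict mapping expert_id -> skill; matched experts are removed
    history:      list that receives a new Assignment for every match
    """
    still_pending = []
    for ticket_id, skill in pending:
        expert_id = next((e for e, s in free_experts.items() if s == skill), None)
        if expert_id is None:
            # No suitable expert right now: keep the ticket for the next run.
            still_pending.append((ticket_id, skill))
            continue
        del free_experts[expert_id]
        history.append(Assignment(ticket_id, expert_id))
    return still_pending
```

Because unassigned tickets are returned rather than dropped, a later run picks them up once a suitable expert becomes free, and each match appends a new `Assignment` instead of overwriting a field on the ticket.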
This diagram continues the ticket workflow and shows how the Ticket Assigned event is processed by the Sysops Expert user.
The expert's operation succeeds as soon as the ticket status is saved in the database. In case of acceptance, the corresponding event is fired to the customer area.
This diagram demonstrates how the customer is notified when the Sysops Expert accepts the ticket.
It is important to note that the ticket is saved in the customer database before the notification event is sent, so that the customer sees the current ticket status when the notification arrives.
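The ordering constraint can be sketched as a small event handler. The event fields, the `ACCEPTED` status value, and the `notify` callback are hypothetical; the dictionary stands in for the customer database.

```python
def handle_ticket_accepted(event, ticket_status, notify):
    """Process a hypothetical 'ticket accepted' event in the customer area.

    ticket_status: dict mapping ticket_id -> status (stand-in for the customer database)
    notify:        callable(customer_id, ticket_id) that sends the notification
    """
    ticket_id = event["ticket_id"]
    # 1. Update the customer database first.
    ticket_status[ticket_id] = "ACCEPTED"
    # 2. Only then notify, so a customer who follows the notification
    #    already sees the new status on the Customer Portal.
    notify(event["customer_id"], ticket_id)
```

If the two steps were swapped, a customer opening the portal straight from the notification could still see the previous status.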
This diagram explains the process when the Sysops Expert solves the problem and marks the ticket as completed.
This diagram illustrates how the customer receives a notification about the ticket resolution and a link to the survey form.
First, the ticket status has to be updated in the customer database, so that upon receiving any notification the customer will see the current ticket status on the Customer Portal.
Finally, the last step in the ticket resolution flow is survey submission by the customer.
From the customer's perspective, this is a fire-and-forget event, so the operation succeeds as soon as the "Submit" button is clicked.
The Analytics API can perform some preliminary processing of the survey if necessary, or simply store it in the database for reporting.
The diagram illustrates the monthly billing workflow.
The deployment diagram illustrates how the system containers are mapped to the infrastructure:
Note that the colors have no special meaning; they only distinguish elements from one another.
The deployment strategy here is cloud-agnostic: you can use any cloud provider of your choice or stay entirely on-prem. The exception is the billing components, which we recommend keeping on-prem for security reasons.
The solution proposed in the Target Architecture section is the final ambition that solves most of the problems and risks, but it can require significant development effort because of the required database split. We can therefore divide the whole work into two phases:
Here is the transitional architecture proposal, which solves the critical problems but leaves some risks (analysis follows). Note that we still use asynchronous messaging for ticket processing here to enable independent scalability and availability for different parts of the system. In this case, messages can contain much less information because the receiver can read all the details from the database.
Since we have a single monolithic database, we can save some effort on additional messaging and replication.
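As an illustration of how small the messages can be when every service reads from the same database, consider the sketch below. The message type, the field names, and the sample ticket are made up for the example.

```python
import json

# Stand-in for the single shared database of the transitional architecture.
shared_db = {
    "t-42": {"customer_id": "c-7", "description": "Stereo has no sound", "status": "NEW"},
}


def build_message(ticket_id: str) -> str:
    """The message carries only the ticket identifier, not the ticket itself."""
    return json.dumps({"type": "TicketCreated", "ticket_id": ticket_id})


def handle_message(raw: str, db: dict) -> dict:
    """The receiver looks up all the details it needs in the shared database."""
    message = json.loads(raw)
    return db[message["ticket_id"]]
```

The trade-off is that every consumer depends on the shared database, which is exactly the coupling that the database split in the target architecture removes.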
These are the possible high risks of the transitional architecture.
Because the database is monolithic, it can become a performance bottleneck. The same concern applies to the single API Gateway: if it is not scaled properly, it may also become a bottleneck.
A single API Gateway may introduce a single point of failure for the whole system (see ADR-12).
There is a risk that admin staff can get access to the customer credit card data. We certainly want to prevent that by extracting billing into a separate architectural quantum (see ADR-4) and isolating it in a separate network zone with strict access permissions.
The same concern applies to the customer services: we don't want to allow an attacker to get access to the rest of the system. A significant security improvement would be to migrate customer services and data into a separate quantum and isolate it in a separate network zone (see ADR-5).
Additional concerns regarding the API Gateway:
Why is more important than how.
Second Law of Software Architecture