
ITDSD - 4. Quantitative Analysis of Distributed Software

15 Jun 2019 · CPOL · 26 min read
Introduction to Distributed System Design - 4. Quantitative Analysis of Distributed Software

Introduction

This is the fifth article in the series on distributed architecture. It introduces how to calculate the success rate of a single request and the success rate of repeated requests in a distributed system, in order to evaluate overall system stability. Performance and redundancy are evaluated from the software architecture, potential performance bottlenecks are identified through system analysis, and the results provide data support for distributed system design.

You can click on the links below to find the first four articles:

A Network With Shared Data Must Be an Asynchronous Network

When multiple tasks attempt to use the same data, they usually have to queue up and use it in turn, because using the data simultaneously can corrupt it: each task modifies part of the data separately, leaving the whole inconsistent. The property that multiple tasks cannot operate simultaneously on a single piece of data is called data atomicity. The data then acts as a consistency protocol among the tasks, playing an irreplaceable role in how the tasks relate to one another. Taking turns on the data lets tasks exchange information through the data's address. The asynchronous and synchronous models describe the execution state of tasks while data is in use: for example, one task is using the data while another waits for it. The waiting task is said to execute synchronously.


Figure 1.

In a distributed environment, the client and the server trigger each other's tasks through messages. A request sent by a client task triggers a task on the server. Usually, the client blocks and waits for the server to execute the request and return. Blocking ensures that the client's state does not change until the server returns, which allows the server to continue working with that client's state after returning. We say the client executes the request synchronously, while the server executes it asynchronously: after processing the request, the server does not block to wait for further instructions from that client, which lets it process other clients' requests in turn. Waiting per connection versus waiting per request is the essential difference between synchrony and asynchrony.


Figure 2.

To prove that a network composed of single-threaded servers with shared data must be asynchronous, first assume the opposite: after establishing a connection, both client and server operate synchronously, and the server's data is shared with other clients. That is, after either party processes a message, it blocks and waits for the other party to send the next one. Call this the fully synchronized mode. In fully synchronized mode, as long as one client holds its connection, the server cannot process any other client's request. If the client holds the connection for an unbounded time, the server is occupied for an unbounded time and cannot handle any other client's requests during it. This is shown in Figure 3.


Figure 3.

Then the server cannot share its data with other clients for an unbounded amount of time, which contradicts the assumption that the data is shared. So we get Theorem 1.

Theorem 1. A Network Composed of Single-Threaded Servers Sharing Data Must Be an Asynchronous Network.
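The argument behind Theorem 1 can be sketched as a toy queueing model (illustrative only; the function name and hold times are mine, not the article's): with a single-threaded server in fully synchronized mode, every client queued behind a connection held for an unbounded time itself waits for an unbounded time.

```python
# Toy model of fully synchronized mode (not real networking): a
# single-threaded server serves one held connection at a time, so each
# queued client waits for the sum of all earlier clients' hold times.

def fully_synchronized_wait(hold_times):
    """Return how long each queued client waits before being served."""
    waits, elapsed = [], 0.0
    for hold in hold_times:
        waits.append(elapsed)  # this client waits out every earlier hold
        elapsed += hold
    return waits

# Client 0 holds its link "forever" (approximated by a huge hold time);
# clients 1 and 2 cannot reach the shared data until it lets go.
print(fully_synchronized_wait([1e9, 0.1, 0.1]))
```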

An asynchronous network with shared data is characterized by consistency based on a delay time t. Consistency with delay time t means that the information obtained by any receiver is the information as it stood at the sender a time t earlier; the sender does not guarantee that the information has not changed in the meantime. The delay time t is a corollary of the fact that server hardware has finite capacity: transmitting any information across it therefore takes a non-zero amount of time. So we get Inference 1.

Inference 1. Information Synchronization Between Any Two Points Has a Delay Time t.

Inference 1 shows that the data obtained at any point is, strictly speaking, stale. For example, consider a game score a user queries from a server: by the time the user sees the data, the server may already hold a different score. To guarantee that the user sees consistent data throughout its use, the server would have to be locked, putting the connection into synchronous mode; but Theorem 1 shows that a network in synchronous mode cannot share data. Inference 1 also shows that the transmission of information is directional. This transmission triggers the execution of tasks, and the resulting information flow has a direction. Following this direction, the basis for quantitative analysis of distributed networks emerges, because the hardware and software along the direction of task execution constitute the distributed system.

Probability of Serialization and Concurrency

When a client sends a request to a distributed system, the system executes the request across a series of servers. Mark these servers 1 to n, where n is an arbitrary integer, and assume each server has some probability of error, including hardware downtime, software crashes, and so on. If the probability that server i works correctly is xi, then by the multiplication rule the probability that serial execution across multiple servers succeeds is x1 * x2 * x3 * ... * xn. This is Formula 1. For example, suppose a request must pass through a gateway server, a web server, a memory database, and a hard disk database before returning to the client, and the availability of these four servers is 99%, 90%, 96%, and 96% respectively. The probability of a successful request is 0.99 * 0.9 * 0.96 * 0.96 = 0.82.

Although the probability that a single request fails is high, repeating the request changes the picture. For independent repeated attempts, the probability that at least one succeeds is 1 - ((1-x1) * (1-x2) * (1-x3) * ... * (1-xn)). This is Formula 2. The probability that at least one of two repeated requests succeeds is 1 - (1-0.82) * (1-0.82) = 0.97, so repeating the request raises the success probability by about 15 percentage points. The rising success rate of repeated requests offsets the falling success rate of serial execution. This resembles the mathematical concept of idempotency, but by Theorem 1 a distributed system is dynamic, so the two requests are independent in both result and process.

A system may be a complex one consisting of multiple serial and concurrent requests. For example, consider a system of three servers s1, s2, and s3, whose probabilities of successfully serving a request are 90%, 95%, and 99% respectively, running a software system with three requests r1, r2, and r3. R1 calls servers s1 and s3, written r1(s1, s3); the other two requests are r2(s1, s2, s3) and r3(s2, s3). Their success probabilities are r1 (0.9 * 0.99) = 0.89, r2 (0.9 * 0.95 * 0.99) = 0.85, and r3 (0.95 * 0.99) = 0.94 respectively, and the probability that the system responds to at least one kind of request is 1 - ((1-0.89) * (1-0.85) * (1-0.94)) = 99.9%. That is, the probability that all services of the system stop responding is 0.1%. This is shown in Figure 4.


Figure 4.
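Formulas 1 and 2 and the Figure 4 example can be checked with a few lines of code (the function names are mine; the probabilities are the ones used above):

```python
def serial_success(probs):
    """Formula 1: a request succeeds only if every server on its path does."""
    p = 1.0
    for x in probs:
        p *= x
    return p

def any_success(probs):
    """Formula 2: at least one of several independent attempts succeeds."""
    q = 1.0
    for x in probs:
        q *= 1.0 - x
    return 1.0 - q

s1, s2, s3 = 0.90, 0.95, 0.99
r1 = serial_success([s1, s3])        # about 0.89
r2 = serial_success([s1, s2, s3])    # about 0.85
r3 = serial_success([s2, s3])        # about 0.94

# Probability that the system answers at least one kind of request:
print(round(any_success([r1, r2, r3]), 3))  # about 0.999
```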

Characteristics of Memory Database and Hard Disk Database

Although repeated requests increase the probability of a system response, the overall stability of the system does not depend on response probability alone, because an unusable database causes data loss, and lost data can make the system lose its service capability entirely. Suppose a system has only a memory database and no hard disk database, and the memory database runs normally with probability 96%. Once the memory database fails and its data is lost, the system completely loses its ability to run, so the upper limit of the system's overall stability is 96%. The memory database is like the shortest stave in a barrel: it determines the low point of the system's overall stability.

Because a hard disk database can ensure that data survives for a certain period after a service restart, the hard disk database sets the upper limit of system stability. As long as the disks are not damaged badly enough to become unusable and lose data, even a full system restart will not make the system unusable. A hard disk failure-rate report [1] from the data-backup vendor Backblaze puts the failure rate of its most heavily used drives at 2.22%. By mirroring data across two disks (RAID 1), the data loss rate drops to about 0.05%, and more replicas reduce the probability of data loss further. Using a hard disk database therefore greatly improves the stability of the system.
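As a quick check of the mirroring figure (assuming independent disk failures):

```python
# Backblaze's reported 2.22% failure rate for the most-used drive model;
# mirroring on two disks (RAID 1) loses data only if BOTH disks fail,
# assuming independent failures.
single_disk = 0.0222
mirrored_loss = single_disk ** 2
print(f"{mirrored_loss:.2%}")  # about 0.05%
```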

Although the memory database is far less stable than the hard disk database, it runs much faster: random reads can be a thousand times faster, and sequential reads and writes are still about ten times faster. In order of stability, from highest to lowest: hard disk database, memory database, service container. In order of response speed to a user's request, from fastest to slowest: service container, memory database, hard disk database. If the service container uses neither a memory database nor a hard disk database, it processes requests fastest, and because each response is faster, it supports the most concurrency. Of course, stability here is relative. Even the least stable component, the service container, suffers largely from human causes: containers under active development are updated frequently, and any update can cause downtime. A properly managed service container can stay up more than 95% of the time. Considering that any piece of hardware has a 1% to 5% chance of failure, software stability is very close to hardware stability. The memory database therefore mainly serves as a temporary data backup, preserving data in use while a service container is being replaced.

Ambiguity

When a piece of data exists in multiple places in a distributed system, such as user account information stored simultaneously in a memory database and a hard disk database, we call this data ambiguity. From Inference 1 we know this ambiguity carries the consistency of delay time t. Ambiguity in distributed systems can lead to irreconcilable data errors. Suppose a system accepts user data in the service container, returns success to the user, and then propagates the data in turn to the memory database and the hard disk database. This is shown in Figure 5.


Figure 5.

Assume that during propagation the write to the hard disk database fails, after success has already been returned to the user. When the user goes offline, the copy in the memory database is deleted, and when the user comes back online, the data is gone. This is ambiguity between what was returned to the user and what was actually stored.

Another possibility is that the write to the hard disk database succeeds while the write to the memory database fails. The user's queries then always return wrong data, which leads the user to repeat operations and further enlarges the impact of the error.

During transmission from the service container to the memory database and the hard disk database, the delay time t means that an error in this window can leave some devices written successfully and others not, producing ambiguity. Because an asynchronous system may be carrying out other asynchronous operations while the write is in flight, deleting the erroneous operations is extremely difficult. This kind of ambiguity error arises from trying to reverse the propagation direction of delay-time-t consistency: the original direction is from service container to memory database to hard disk database, but we try to recover service container errors using data from the memory or hard disk database. The delay time t and the distribution of availability inevitably create a risk of data errors. Higher performance comes with higher risk, and higher stability with lower risk; the risk can only be reduced, never eliminated completely.

Distributing the system further reduces its risk, for example by splitting it into multiple subsystems that process different requests separately.


Figure 6.

This gives us a new way to deal with system risk. Formula 2 shows that a distributed system greatly reduces the probability that the whole system becomes unusable. When the need for higher performance brings higher system risk, distribution greatly reduces the risk to the whole: the number of users affected by a failure drops to 1/n, where n is the number of partitions. For example, suppose a website is designed as a service container, a memory database, and a hard disk database. One day a promotion brings an unexpected flood of users, and during the peak half hour hard disk IO writes become blocked. The website favors performance in its design and ignores hard disk database write failures. Within that half hour, no operation of already-logged-in users is affected, except that new users cannot log in. If the congestion lasted more than an hour, the memory database would be cleared and the users' operations during that period lost. The congestion affects two parts of a ten-part distributed system, so at most two-tenths of the users are exposed to loss.

Computer systems internally also propagate data according to delay-time-t consistency: data flows from CPU cache to memory and then to the hard disk, and this process likewise carries a probability of inconsistency, which is merely very low because the times involved are very short.


Figure 7.

Another common source of ambiguity is the backup server created for a primary server. When the primary server's data has not yet been propagated to the backup when the primary goes down, data is lost, and bringing the backup online causes old data to overwrite new data. How serious this is depends on two factors: how many synchronizations are initiated and how long each takes. Assume that downtime inevitably strikes during some synchronization; the more synchronizations there are and the longer each takes, the greater the impact. On the Internet, the number of synchronizations per unit time can be approximated by the number of active users per unit time. The effect of an interrupted synchronization is expressed by Formula 3.

Users affected = users online per unit time × (synchronization time / unit time)

For example, if 3,000 users are active per second and synchronization takes 1 millisecond, the number of users whose operations fall inside an interrupted synchronization window is 3000 × (1/1000) = 3. Formula 3 shows that the fewer users are online and the shorter the synchronization time, the fewer users are affected by wrong data.
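One reading of Formula 3, with the numbers above (the function name and the per-second unit are my interpretation):

```python
def users_affected(online_per_unit, sync_time, unit_time=1.0):
    """Formula 3: users whose operations fall inside an interrupted
    synchronization window of length sync_time."""
    return online_per_unit * (sync_time / unit_time)

# 3,000 active users per second, a 1 ms synchronization window:
print(users_affected(3000, 0.001))  # about 3 users
```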

When a server is prone to downtime, we may try to use multiple backups to eliminate the impact. This defies common sense, because backing up between servers takes much longer than backing up between disks: it has to pass through the network and several layers of software. More backups therefore mean more synchronization time, and Formula 3 shows that more time consumed means more users affected by errors. Increasing the number of backups can thus increase ambiguity errors instead of reducing them. So backups should use hardware solutions wherever possible, which greatly reduces the damage attributable to synchronization time.

Propagation of AP-Tree Structure

The process of transmitting data from the place where it was changed to other hardware is called the AP method or the AP relationship. Keeping data on only one piece of hardware would block other hardware whenever that hardware is in use. Propagation means the data read elsewhere is not up to date or absolutely consistent, only consistent up to the delay time t. Data propagated under an AP relationship therefore cannot be used in operations with atomic relationships; that is, data read from an AP network cannot be written back into the AP network, because the delay time t would let older data overwrite newer data. This is shown in Figure 8.


Figure 8.

Server backups can likewise let old data overwrite new data. When the primary server goes offline and the standby comes online, the old data on the standby is combined with new requests, which can corrupt the data. For example, a merchandise primary server records 0 items in stock while the backup records 1; when the backup comes online, users can buy items that do not exist.

What the AP relationship buys us is access that will not block under almost any circumstances, and it can be extended by tree-shaped propagation. This extension brings almost unlimited traffic and hardware capacity, which makes it a very important design method in the distributed domain, used in content distribution, DNS, ZooKeeper, and other important distributed tools. Its propagation forms vary: gossip, routing protocols, and so on. We can also use blocking to shield data from the effects of the delay time t, or use a competitive algorithm such as Bitcoin's to eliminate them.

In a tree-divergent network such as ZooKeeper, there are two operations: read and write. Formula 2 directly gives the probability that a read succeeds. A write proceeds in stages: the probability of writing to the origin is the probability that the origin is available, and when the origin diffuses to the next layer, the probability that diffusion continues is obtained by applying Formula 2 across the nodes of the current layer. That is, if diffusion to any node in the layer succeeds, the diffusion continues. Once a write begins to spread through the AP relationship, the more nodes it has already reached, the higher the probability that it completes everywhere, and the success rate of reads is remarkably high. This is shown in Figure 9.


Figure 9.
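The layer-by-layer write argument can be sketched as follows (the node availability and layer widths are assumed values, not from the article; Formula 2 is applied once per layer):

```python
def diffusion_success(node_avail, layer_widths):
    """Probability a write lands at the origin and keeps spreading
    through every layer of the tree (Formula 2 applied per layer)."""
    p = node_avail  # the origin itself must accept the write
    for width in layer_widths:
        # diffusion continues if ANY node in this layer accepts it
        p *= 1.0 - (1.0 - node_avail) ** width
    return p

# 95%-available nodes, layers widening 2 -> 4 -> 8: the wider the layer,
# the closer its factor is to 1, so later layers barely hurt the total.
print(round(diffusion_success(0.95, [2, 4, 8]), 6))
```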

Protocol and Data

Theorem 1 shows that distributed networks are built around shared data in asynchronous networks. When an external request arrives, it finds the corresponding data set and triggers the tasks that operate on it. When a data set has multiple backups in a distributed network, one copy must be selected for writing, and the write then diffuses to the other backups. The copy selected for writing must be unique in the network; otherwise data ambiguity arises. Once the copy is selected, all operations on the data set are fixed to it. This selected point becomes the intersection of hardware and software for processing the given data set, and thus the protocol for operations on that data set; note that this is not limited to the data itself. When there are multiple data backups in a distributed system, this selected protocol must be unique. In other words, a distributed system is built around data protocols, and the entity of the protocol is the hardware that stores the data. Data that is not part of a protocol is merely backup or read-only data, and writes to backup or read-only data are invalid.

The data already exists the moment a computer in a distributed cluster produces a result set from computation. We may wait until it is stored in a memory database or a hard disk database before reporting success to the client, but whether the report is returned does not affect the stability of the data; even a hard disk can lose data, only with relatively low probability. Data is generated at an origin and then diffused to other hardware. The origin is the point where tasks and data are linked. The protocol is therefore not only a protocol between tasks, but also a protocol between tasks and data; for example, it might be named "user information" for the data and the registration task.

Since it is a protocol, it is directional: a task is processed up to the point where it publishes its written data, and the written data is then diffused through AP relationships in the distributed system.


Figure 10.

Reading and writing data are two operations, but to a task they are its input and its output. Written data is therefore the key basis for judging atomic relationships between tasks. In an asynchronous network any task may execute at any time, so any data written jointly by two tasks constitutes a write conflict; that is, the two tasks form an atomic relationship.

Atomic relations in asynchronous networks are an interesting concept. For example, my wife and I both like steak, and either of us may cook it at any time, so use of the frying pan has to be queued. Otherwise I may end up holding the frying pan without beef while my wife holds beef without a frying pan, and nobody gets to make steak. So whoever wants steak should first check the calendar to see whether the frying pan is free. In fact, I have not cooked steak in a year; only my wife uses the pan. This example shows that potential conflicts matter in proportion to their frequency: when the frequency is low, the probability of conflict is low. Now suppose this frying pan is the only one in our community and every resident must use it to make steak. That fits the definition of atomic relations in asynchronous networks better: I have an atomic relationship over the frying pan with all of my neighbors.

Performance Analysis in Distributed System Design

In a small system with one service container and one hard disk database, completing a request, from the service container to the database, must pass through (a) the service software, (b) the server container, (c) the network link, (d) the database software, and (e) the hard disk. The probability of a successful request is (a) * (b) * (c) * (d) * (e) = F. If the data is stored directly on the hard disk, the request passes only through (a) the service software, (b) the server container, and (e) the hard disk, and the probability of success is (a) * (b) * (e) = G. Because 0 < (c) * (d) < 1, F < G: the per-request success rate of a service container plus a separate hard disk database is lower than that of a single service container writing directly to disk. But the system can retry indefinitely to raise the request success rate, so overall stability depends on the stability of the hard disk, and since both F and G rely on hard disks, their overall stability is the same. Things change when multiple service containers share a central database. This is shown in Figure 11.


Figure 11.
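The F-versus-G comparison holds for any per-hop success rates strictly between 0 and 1; the concrete values below are assumptions of mine, since the article leaves (a) through (e) symbolic:

```python
# Assumed per-hop success probabilities (illustrative only):
a, b, c, d, e = 0.99, 0.99, 0.999, 0.98, 0.98

F = a * b * c * d * e  # via network link (c) and database software (d)
G = a * b * e          # writing straight to the hard disk

# Any extra hop with success < 1 shrinks the product, so F < G always.
assert F < G
print(round(F, 4), round(G, 4))
```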

Recall the community-shared frying pan. If one service container is using the database, other service containers requesting it must wait, and the parallel system degenerates into a serial one governed by Amdahl's law: the parallelized part of the system has been sped up, while the serialized part is blocked in the database. If serialization degrades the database's processing, which in turn affects data distribution and the parallel processing built on it, the overall processing capacity of the system declines. Assuming each service container is slowed by the same factor x because of serialization, n dependent links are slowed by x * x * ... * x = x^n; that is, serialization in the database can cause an exponential decline in overall processing power. The more problems arise, the greater the performance demand becomes, and the worse the problem gets. The serialized part has never been well solved; for example, one task locking Table 1 while requesting Table 2 and another locking Table 2 while requesting Table 1 forms a deadlock. This is shown in Figure 12.


Figure 12.
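For reference, the standard form of Amdahl's law that the paragraph appeals to: with a parallelizable fraction p and n units of hardware, speedup is bounded by the serial remainder.

```python
def amdahl_speedup(p, n):
    """Standard Amdahl's law: p = parallelizable fraction, n = hardware units."""
    return 1.0 / ((1.0 - p) + p / n)

# With the article's 80/20 split, even unlimited hardware cannot exceed
# a 5x speedup, because the serial 20% never shrinks:
print(amdahl_speedup(0.8, 10))     # 10 servers
print(amdahl_speedup(0.8, 10**9))  # approaches 1 / 0.2 = 5
```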

In Internet software systems, roughly 80% of the data is inactive; it sits on the hard disk and needs no computation. The other 20% is the data currently being computed. The hard disk database's role is to manage the 80% and provide multi-dimensional big data analysis. When data is activated by user activity, it must be moved into a service container for processing. If the hard disk array holds 288T, the system needs around 56T of running memory; no single server can provide that much, so the system needs distributed servers.

Centralizing and stripping data sets helps distribute the parallel parts and optimize the performance of distributed systems, for example by reducing the computing load on the hard disk database; but it does not help the serialized part. Its essence is the AP of software, that is, the availability and protocol of data. As Figure 11 shows, improving data availability improves distributed performance only up to a point; beyond that point, the serialized part is the limit, because the serial parts of the system are concentrated in the database. Even two serializations that never interact must both be done in the database, because the database cannot know how the software uses data and establishes atomic relationships. For the database, serialization is possible between any tables, and to prepare for that infinity of possibilities it can only keep all tables together. This centralization prevents the database's functions from being decomposed, because a serialized part must run in a single thread to guarantee atomicity. Amdahl's law confirms the picture: the parallel part of the software can use as much computing power as is provided, so its performance is linear in the amount of hardware, while a serial part can only run in a single service container and cannot gain performance from added hardware. The performance of the serial part is therefore a fixed value in the system. In the central-database example, since the entire serial part of the system executes in the database, the ratio of the serial part's computing demand to its server's computing capacity is the performance ratio of the whole distributed system.
For example, if the central database's CPU utilization is currently 50%, then, because all serial operations run there, 50% of the whole distributed system's effective computing power has been used.

Suppose we have two central databases handling serialized parts, one at 100% CPU usage and the other at 50%. Is our overall usage 75%? Obviously not: a server at 100% is near collapse, with a large backlog of computing demand it cannot handle. Usage below 100% can be taken to mean demand is met; once any server exceeds 100%, the system is beyond its normal operating range.
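The point generalizes: capacity must be judged per server, because a mean hides saturation. A minimal sketch (the function name is mine):

```python
def saturated(cpu_usages):
    """A system is saturated if ANY serial server is pinned at 100%."""
    return any(u >= 1.0 for u in cpu_usages)

usages = [1.0, 0.5]
print(sum(usages) / len(usages))  # the mean says 75% "used"...
print(saturated(usages))          # ...but the system is already saturated
```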

It can be seen that in a distributed system, performance is determined by the serial parts, that is, the parts with atomic relationships: the tasks identified by the RP method that read and write data, or only write it. The serial part is not one indivisible whole; it can be divided according to the different data being written. Consider two Amdahl systems running on the same set of hardware.

If the two serial parts are placed on the same hardware, overall performance is lower; if they are split across two different pieces of hardware, overall performance improves. A complex software system may contain many Amdahl systems. Finding these Amdahl systems and giving each its own hardware can greatly raise the processing ceiling of a distributed system. The RP method exists to find these hidden Amdahl systems and thereby fundamentally raise the upper limit of a distributed system's processing capacity.

Once we have found all the hidden Amdahl systems in a software system, we obtain the upper limit of the distributed system's carrying capacity by multiplying each Amdahl system's throughput by the hardware assigned to it. For example, suppose we have two Amdahl systems, payment and purchase, each taking about 0.001 seconds to process a single user request. Putting the two Amdahl systems on two servers gives an upper limit of 2,000 combined payments and purchases per second.
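The closing arithmetic, spelled out (the 0.001-second figure is the article's; the variable names are mine):

```python
per_request = 0.001  # seconds of serial work per request
servers = 2          # one server per Amdahl system (payment, purchase)

cap_per_server = 1 / per_request      # about 1,000 requests/second each
total_cap = servers * cap_per_server  # combined upper limit
print(int(total_cap))                 # 2,000 payments + purchases per second
```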

Conclusion

A software system is composed of a parallel part and a serial part, and the serial part itself consists of several serial parts. By Amdahl's law, the performance of the parallel part is proportional to the amount of hardware, while the performance of a serial part is proportional to the performance of the single piece of hardware it runs on. Because the serial part usually accounts for only about 20% of a software system, distributed work usually only addresses the 80% that is parallel. Moreover, up to 80% of software systems never face high-intensity concurrent access, so only about 4% of distributed work involves high-intensity access to serial parts, and only a very small number of top companies pay attention to that 4%. The problems serialization can bring are therefore often overlooked, or simply written off as system bugs. This does not mean that ordinary companies and systems do not need distributed design; the problems are merely low-probability, but once they occur they are a fatal blow to a company's technical team. Technicians should be concerned about every possible system defect.

In the next chapter, I will use an online shopping mall website to illustrate how to create a distributed system, and how to balance sharding, caching, and the hard disk.

Reference

  1. https://www.backblaze.com/blog/backblaze-hard-drive-stats-q1-2019/

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
China
Distributed Technology Professional
