Tuesday, March 20, 2007

CS lecture talk on web servers to databases to storage systems

From web servers to databases to storage systems: A methodological
approach to system design
Dr. Bianca Schroeder
Dept. of Computer Science, Carnegie Mellon

Modern computer systems are complex, and designing systems with good
performance and high reliability is a major challenge. In this talk,
I will show how a measurement-driven "methodological" approach to
system design can create better systems. The approach combines
real-world measurements with techniques from statistical workload and
failure analysis, user behavior characterization, analytical
modeling, performance evaluation, and scheduling and queueing theory
to gain insights into system behavior that lead to better design
ideas. Specific applications we consider in this talk include:
(*) How to schedule connections in a web server to combat transient overload;
(*) How to provide QoS for database transactions;
(*) How to exploit client behavior patterns to maximize system performance;
(*) How to improve reliability of large-scale clusters and storage systems.

Bianca Schroeder is currently a postdoctoral researcher in the
Computer Science Department at Carnegie Mellon University working
with Garth Gibson. She received her doctorate from the Computer
Science Department at Carnegie Mellon University under the direction
of Mor Harchol-Balter in 2005. She is a two-time winner of the IBM
PhD fellowship and her work has won two best paper awards. Her recent
work on system reliability has been featured in articles at a number
of news sites, including Computerworld, Slashdot, StorageMojo and
eWEEK. Bianca's research focuses on the design and implementation of
computer systems. The methods she is using in her work are inspired
by a broad array of disciplines, including performance modeling and
analysis, workload and fault characterization, machine learning, and
scheduling and queueing theory. Her work spans a number of different
areas in computer systems, including high-performance computing
systems, web servers, computer networks, database systems and storage
systems.

Right now, she is talking about system design for web sites where the design goal is low response times. A commercial web site consists of a web server, a database system and a storage system. Let's look at the web server first. Queueing theory can be used to schedule web requests. How do you improve the performance of static requests? The standard method is timesharing (FAIR), where tasks share the processor equally so that every task gets the same opportunity. Her approach is to instead use an optimal policy, Shortest-Remaining-Processing-Time (SRPT), which minimizes the mean response time and the number of jobs in the system. The challenge is how to implement this in a real web server. Her implementation in Apache/Linux showed roughly a 10x improvement in mean response time over the FAIR strategy. Web workloads are also heavy-tailed: a small fraction of large requests accounts for most of the load, so letting the many small requests go first barely delays the large ones, and in practice even the large requests do about as well under SRPT as under FAIR.
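To make the difference concrete for myself, here is a toy simulation (my own sketch, not her Apache implementation) that runs the same heavy-tailed job trace under a fair timesharing policy (approximated as processor sharing) and under SRPT, and compares mean response times. All the numbers and distribution parameters are made up.

```python
# Toy comparison of FAIR (processor sharing) vs. SRPT on one job trace.
# This is only an illustration of the idea from the talk, not her system.
import random

def simulate(jobs, policy, dt=0.01):
    """jobs: list of (arrival_time, size). Returns mean response time."""
    arrival = {i: a for i, (a, _) in enumerate(jobs)}
    size = {i: s for i, (_, s) in enumerate(jobs)}
    remaining = {}                      # job id -> remaining work
    finish = {}
    pending = sorted(arrival, key=arrival.get)
    t = 0.0
    while len(finish) < len(jobs):
        # admit jobs that have arrived by time t
        while pending and arrival[pending[0]] <= t:
            j = pending.pop(0)
            remaining[j] = size[j]
        if remaining:
            if policy == "SRPT":
                # give all capacity to the job with least remaining work
                j = min(remaining, key=remaining.get)
                remaining[j] -= dt
            else:
                # FAIR: split the slice equally (leftover capacity in a
                # slice is ignored; good enough for a sketch)
                share = dt / len(remaining)
                for j in remaining:
                    remaining[j] -= share
            for j in [j for j, r in remaining.items() if r <= 0]:
                finish[j] = t + dt
                del remaining[j]
        t += dt
    return sum(finish[i] - arrival[i] for i in finish) / len(jobs)

random.seed(0)
# heavy-tailed (Pareto) sizes, arrivals spaced so the load is below 1
jobs = [(i * 4.0, random.paretovariate(1.5)) for i in range(200)]
print("FAIR mean response time:", simulate(jobs, "FAIR"))
print("SRPT mean response time:", simulate(jobs, "SRPT"))
```

Even in this crude discrete-time version, SRPT should come out ahead on mean response time, because the many small requests no longer sit behind the few big ones.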

For databases, the question is which resource to schedule. Most of the time a transaction spends in the database is wait time, which means waiting to acquire locks. Most databases implement two-phase locking (2PL) for concurrency control and consistency, so the question translates to: how do you schedule locks? You can consider both preemptive and non-preemptive policies. The key idea is to preempt only lock holders with a long remaining time, but this requires bookkeeping and trying to predict how much longer a transaction will run. Her results show improvements over existing work in this area.
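Here is a hypothetical sketch of the "preempt only long lock holders" idea as I understood it. The names, the threshold and the prediction scheme are all mine, not the actual policy from the talk.

```python
# Hypothetical sketch: a high-priority waiter preempts (aborts) the current
# lock holder only when the holder's *predicted* remaining time is long;
# otherwise it just waits, since aborting a nearly-finished holder wastes
# its work.  Not the real policy from the talk.
from dataclasses import dataclass, field
import time

@dataclass
class Txn:
    txn_id: int
    predicted_length: float = 1.0      # e.g. learned from past runs of this txn type
    started_at: float = field(default_factory=time.monotonic)

    def predicted_remaining(self) -> float:
        elapsed = time.monotonic() - self.started_at
        return max(self.predicted_length - elapsed, 0.0)

PREEMPT_THRESHOLD = 0.5  # seconds of predicted remaining work (made-up value)

def should_preempt(holder: Txn, waiter_is_high_priority: bool) -> bool:
    """Abort the lock holder only if the waiter is high priority and the
    holder still has a long way to go."""
    return (waiter_is_high_priority
            and holder.predicted_remaining() > PREEMPT_THRESHOLD)

# A holder predicted to run 2s that has only just started gets preempted;
# one that is almost done does not.
fresh = Txn(txn_id=1, predicted_length=2.0)
print(should_preempt(fresh, waiter_is_high_priority=True))        # True
almost_done = Txn(txn_id=2, predicted_length=2.0)
almost_done.started_at -= 1.9                                     # pretend 1.9s elapsed
print(should_preempt(almost_done, waiter_is_high_priority=True))  # False
```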

For the storage system, it must be reliable. What makes reliability research hard is that failures are not exposed; there is little failure data available, because manufacturers and vendors don't want to release it. She collected failure data from about 30 systems. The real data does not match the theoretical assumption that times between failures follow an exponential distribution (remembering my Software Reliability course I took at Waterloo!). In summary, many common assumptions about disk failures are not realistic: failure rates are higher than vendor specs, times between failures are not exponentially distributed, and failures are not independent.
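As a toy version of that kind of analysis, here is a sketch that fits an exponential and a Weibull distribution (one common alternative with a non-constant hazard rate) to a synthetic set of times between failures and compares the fits. The real work of course used the field data she collected, not synthetic numbers.

```python
# Sketch: compare an exponential fit against a Weibull fit on synthetic
# times-between-failures.  All parameters here are made up for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# synthetic times between failures (hours); shape < 1 mimics "bursty" failures
tbf = stats.weibull_min.rvs(c=0.7, scale=500, size=1000, random_state=rng)

# exponential fit (location fixed at 0)
loc_e, scale_e = stats.expon.fit(tbf, floc=0)
ll_expon = np.sum(stats.expon.logpdf(tbf, loc_e, scale_e))

# Weibull fit (location fixed at 0)
c_w, loc_w, scale_w = stats.weibull_min.fit(tbf, floc=0)
ll_weib = np.sum(stats.weibull_min.logpdf(tbf, c_w, loc_w, scale_w))

print(f"exponential log-likelihood: {ll_expon:.1f}")
print(f"Weibull     log-likelihood: {ll_weib:.1f} (shape c = {c_w:.2f})")
# A Weibull shape well below 1 and a clearly higher log-likelihood are the
# kind of evidence that the memoryless (exponential) assumption does not hold.
```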

In conclusion: request scheduling can improve response times for all requests, transaction scheduling can improve the response time of "big spenders" without hurting others, and many common assumptions about failures are not realistic. It would be interesting to see why existing failure models do not match what is actually observed in practice. I also remember reading an article, as part of a grad database research course, about the modelling of disk drives, and one of the things I criticized was how accurate the model really is. Modelling is always a tricky thing in my opinion, especially when you're trying to model something like failures, which can be random and spontaneous and are affected by many factors.

