Characteristics of production database workloads and the TPC benchmarks

Headnote

There has been very little empirical analysis of any real production database workloads. Although the Transaction Processing Performance Council benchmarks C (TPC-C(TM)) and D (TPC-D(TM)) have become the standard benchmarks for on-line transaction processing and decision support systems, respectively, there has not been any major effort to systematically analyze their workload characteristics, especially in relation to those of real production database workloads. In this paper, we examine the characteristics of the production database workloads of ten of the world's largest corporations, and we also compare them to TPC-C and TPC-D. We find that the production workloads exhibit a wide range of behavior. In general, the two TPC benchmarks complement one another in reflecting the characteristics of the production workloads, but some aspects of real workloads are still not represented by either of the benchmarks. Specifically, our analysis suggests that the TPC benchmarks tend to exercise the following aspects of the system differently than the production workloads: concurrency control mechanism, workload-adaptive techniques, scheduling and resource allocation policies, and I/O optimizations for temporary and index files. We also re-examine Amdahl's rule of thumb for a typical data processing system and discover that both the TPC benchmarks and the production workloads generate on the order of 0.5 to 1.0 bit of logical I/O per instruction, surprisingly close to the much earlier figure.

The Transaction Processing Performance Council (TPC) benchmarks C (TPC-C**)1 and D (TPC-D**)2 have emerged as the de facto standard benchmarks for on-line transaction processing (OLTP) systems and decision support systems (DSS), respectively. By establishing objectives that are easily measurable and repeatable, such standard benchmarks define a transparent playing field and focus attention on what the benchmarks consider to be important. However, the real utility of the benchmarks is determined by whether they represent the workloads of interest. To effectively make use of a benchmark, therefore, we have to carefully evaluate its characteristics against those of the target workloads to understand how closely they correspond. Although the TPC-C and TPC-D benchmarks have become widely accepted and, as a result, are heavily used for both systems design and marketing, there has not been any major effort to empirically determine their workload characteristics, let alone to establish how representative their characteristics are of real workloads.

In fact,...
