This article contends that parallel production sub-systems may be cost effective and that, in some cases, they are the only technically feasible way to improve output quality. Operating in either a stand-by redundancy or an ongoing capacity, parallel production sub-systems are likely to be most cost effective when placed in upstream operations, where the leverage effect on output quality is considerably higher. Typical research in parallel systems assumes a known distribution to estimate the output reliability of the parallel configuration. Explains how this study instead used a simulated production environment to develop a regression model for assigning parallel system components by monitoring their actual past performance, and was therefore distribution free. Applying variables monitored during a previous production run in which quality is measured in a binary manner, the model was used to determine optimal pairs of parallel sub-systems. Claims that this matching model was about 2.5 times more accurate than Markov analysis in predicting the output quality of a given pair of parallel systems. The inclusion of an additional variable in the regression resulted in the model explaining about 75 per cent of the output variability of the parallel configurations; the model could thus potentially predict quality in lieu of direct inspection.
Roger Nibler: Lingnan College, Hong Kong
Introduction
The literature of total quality management has consistently focused on the benefit of eliminating defects during the early stages of the production process, whether that process is product or service oriented. Management gurus such as Crosby (1984), Deming (1986), Feigenbaum (1991), Ishikawa (1985) and Juran (1992) have stressed the importance of defect prevention and correcting mistakes as far upstream as possible in the production process as one of the most important ways of making significant reductions in production costs. In the service industry, the Ritz-Carlton Company estimates that "What costs you a dollar to fix today will cost $10 to fix tomorrow, and $100 to fix down-stream" (Pratlow, 1993).
Most production systems are characterized by sub-systems whose output contains errors or defects. These defects can be the result of technological limitations, design flaws, human errors and defective output from preceding sub-systems. In an effort to minimize defects, especially in upstream sub-systems where defect prevention is generally more cost effective, many firms are using parallel sub-systems (Dale, 1994; Taguchi, 1989; Weiss and Gershon, 1993). Although there are numerous examples of parallel systems, three common ones are critical control systems in aeroplanes, auxiliary power supply systems in hospitals and spare tyres carried in motor vehicles, in which the backup system operates in the event of a failure in the primary system. Another example is an electrical utility company using dual metering systems operating in a parallel mode. If one metering system fails, even momentarily, the other metering system automatically takes over, thus preventing the loss each month of several thousand dollars of billable electricity watt-hours (Electrical World, 1992). In a manufacturing process, two or more machines may operate in a parallel mode in an effort either to increase output or to improve production reliability, where one machine serves in a redundant or backup capacity under conditions in which the other machine, either intentionally or unintentionally, ceases to operate (Taguchi, 1989; Weiss and Gershon, 1993). Such parallel systems are examples of utilizing a backup system in the event that one system is not operating. In these situations the two systems are related to each other only through the similarity of their processes and by a switching system to transfer process control from the one to the other. Accordingly, their interaction is only momentary.
In other situations, components of parallel systems operate in active mode on an ongoing basis. An example of this is the traditional method of data entry in which one operator keys the data from a document, while another operator, serving in a verifier capacity, keys the same document. Discrepancies between the two operators are then resolved by a third operator (Rhodes, 1987). Another example of an active parallel system is a bus monitor configuration used in critical information processing situations. The bus monitor receives data from three processors, detects and records potential errors, and re-configures the data routes to the monitored buses so that all memories receive correct data, even in the event of an error being present. Three error recording chips operate in parallel, two of these chips operating pairwise in a "compare-for-equal" function. Error recording differences are resolved by the third chip which is designed to agree with only one of the chips based on the nature of the error. The circuit causing the error is then momentarily bypassed (Kanopoulos, 1988). Other central processing unit (CPU) systems use a second watchdog processor that takes over the processing if an error occurs in the first processor (Bodnar, 1993).
A third example is two optical character recognition (OCR) devices configured in parallel for reading handwritten documents. This system operates in a way similar to the two-operator data entry method by comparing for differences between the two OCRs (Ribaric and Pavesic, 1988). Numerous additional examples exist in which a given operator may perform an assembly operation while a different person (or, perhaps, machine) inspects the output. Although one person performs an operation, inherent in this process is also the inspection which must be done by that person during the operation. When a second person performs the inspection (and, perhaps, an operation as well), a parallel system is in effect, and the operation is basically a compare-for-equal process in which discrepancies are resolved either by one of the operators or by one or more inspectors.
Parallel systems can be used to model reliability and, under certain circumstances, quality. When used in redundancy mode, the parallel system output is an indicator of reliability. The parallel system can be used also to predict and/or measure quality under situations of a binary or go-no-go determination of quality. As a quality model, the examples of data entry, bus monitor and OCR device are illustrative because only a true or false state of nature can exist under these circumstances - the output either is or is not acceptable in those types of situation. Many manufacturing situations, however, are characterized by variable quality determination. In such situations the variable designation of the output quality is the most critical aspect of the quality measurement. In those situations, the model described in this paper would have limited application. This model has the greatest potential application to those reliability and quality situations in which a binary comparison plays a critical role in the evaluation of the output of the system. To this extent, the term "reliability" is used in the context of those situations which pertain specifically to reliability or binary quality measurements as typified by those described above.
To illustrate the effect of a parallel system, Figure 1 shows a system that combines a series configuration (A with B1 and B2) and a parallel configuration (B1 with B2). It should be noted that reliability is equal to 1 minus the error rate. In this example the error rates for the sub-systems are 0.5 for A, 0.3 for B1 and 0.4 for B2.
If B2 were not present, the reliability of the system would be 0.35 [(1 - 0.5) (1 - 0.3) = 0.35]. The inclusion of B2 increases the system reliability to 0.44 [(1 - 0.5)(1 - (0.4)(0.3)) = 0.44]. This simplified example is based on assumptions characteristic of many parallel system models. They include:
Markov processes in which each sub-system in series is independent of the previous sub-system. Implicit in this process is that no information which could affect the performance of the following sub-system is shared between any two sub-systems in series.
The parallel sub-systems are independent of each other such that a common cause does not impair both systems under the same set of conditions.
The occurrence of errors in the sub-systems has a known distribution, which is either tractable or can be modelled by Monte Carlo or a related technique.
The sub-systems are in a steady state: they are not transient.
The interface between the parallel sub-systems, which switches from the sub-system in defective mode to the backup system (or resolves the output discrepancy), has a known reliability (Kozlov and Ushakov, 1970; Smith, 1972).
These assumptions in many cases do not reflect what actually happens in a production situation, and because of this many parallel system models may not accurately portray their anticipated effects in improving total system reliability (Fishman, 1990; Petrovic, 1991). Accordingly, predicting and modelling the reliability of many systems/sub-systems are subject to wide margins of error and to questions of credibility; therefore, mathematically-based models cannot be applied to the reliability of many typical production systems with the same degree of assurance as they can in physics and engineering (Dale, 1994; Norbert, 1985).
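The Figure 1 arithmetic can be reproduced with a short sketch. It assumes exactly the idealized conditions listed above (independent sub-systems and a perfectly reliable switch); the function names are illustrative and do not come from the study.

```python
def series_reliability(reliabilities):
    """Reliability of independent sub-systems in series: product of the parts."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result


def parallel_reliability(error_rates):
    """Reliability of independent sub-systems in parallel:
    1 minus the probability that every sub-system errs at once."""
    joint_error = 1.0
    for e in error_rates:
        joint_error *= e
    return 1.0 - joint_error


# Error rates from the Figure 1 example: A = 0.5, B1 = 0.3, B2 = 0.4.
err_a, err_b1, err_b2 = 0.5, 0.3, 0.4

# A in series with B1 only.
without_b2 = series_reliability([1 - err_a, 1 - err_b1])

# A in series with the parallel pair (B1, B2).
with_b2 = series_reliability([1 - err_a, parallel_reliability([err_b1, err_b2])])

print(round(without_b2, 2))  # 0.35
print(round(with_b2, 2))     # 0.44
```

A switch with less than perfect reliability would enter this calculation as one more factor in the series product.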
To avoid some of the limitations of the Markov process and other mathematical models, there has been a recent interest in the literature pertaining to modelling in which ongoing information on the performance of sub-systems is communicated to downstream sub-systems (Guha and Aggarwal, 1989; Petrovic, 1991). However, an area which seems to be dealt with only lightly in the literature is the development of models to match optimally the sub-systems used in parallel configurations. This matching process could be crucial to the effectiveness of the parallel configurations. For example, if two systems operating in parallel each have a reliability of 0.90 (or an error rate of 0.10), then the theoretical reliability of the parallel configuration would be 0.99 [1 - (1 - 0.9)(1 - 0.9) = 0.99]. This prediction, however, assumes that there is a 0.01 [(1 - 0.9)(1 - 0.9) = 0.01] probability that the two systems would be vulnerable to the same types of condition causing the error or malfunction. If these two systems were virtual clones, then there would be no advantage gained in their parallel configuration, because each would make the same error under a given condition. However, if these two systems had different underlying characteristics such that their 0.10 error rates occurred under different sets of conditions, then the configuration could result in defect-free output.
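The contrast between cloned and complementary sub-systems can be made concrete with a small sketch. The 100 operating conditions and the sets of conditions on which each system errs are hypothetical, chosen only to give each system a 0.10 error rate, and a perfect comparator is assumed.

```python
def parallel_error_rate(errors_a, errors_b, n_conditions):
    """Fraction of conditions on which BOTH sub-systems err.
    Assumes a perfect comparator, so the pair fails only on shared errors."""
    return len(errors_a & errors_b) / n_conditions


n = 100  # hypothetical number of operating conditions

system = set(range(0, 10))       # errs on conditions 0-9 (error rate 0.10)
clone = set(range(0, 10))        # errs on exactly the same conditions
complement = set(range(10, 20))  # errs on a disjoint set of conditions

independent_prediction = 0.1 * 0.1  # what the independence assumption predicts
clone_rate = parallel_error_rate(system, clone, n)
complement_rate = parallel_error_rate(system, complement, n)

print(independent_prediction)  # about 0.01
print(clone_rate)              # 0.1, no gain over a single system
print(complement_rate)         # 0.0, defect-free output
```

The same individual error rates therefore bracket the independence prediction from both sides, which is why the choice of partner matters.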
Regression monitor, match and predict model
A model is presented here for optimizing system performance containing sub-systems in parallel configuration. This model is used to match these sub-systems optimally and is based on the following assumptions:
The parallel sub-systems have either different underlying reliability distributions or different characteristics affecting their reliability.
A switch or interface exists to divert the operation from one sub-system to the other(s) in the event of failure or to resolve the output discrepancy between the two sub-systems in parallel configuration. This switch should have a known reliability.
During processing, information regarding the performance of each sub-system in the parallel configuration can be monitored.
The parallel systems, although possibly in a transient state, are sufficiently stable to be modelled adequately.
The parallel systems can be assigned to each other in various combinations: they are interchangeable.
Basically, the model for optimizing the configuration of parallel systems matches the strengths and weaknesses of each potential sub-system and then continually monitors the performance of the configuration to determine whether a different arrangement would be more effective. Additionally, this model could be used to predict the output quality (reliability) of the parallel system by monitoring the variables generated during the operational process rather than by direct inspection which, in some situations, may be quite costly.
Model development
The regression monitor and match model was developed and tested using 32 data entry operators. Data entry was selected for the study because of its similarity to many production situations in which a critical portion of the manufacturing process utilizes human hand-to-eye co-ordination. Moreover, the manual data entry process is quite mechanistic and therefore can also be considered to simulate the performance of machines.
Each operator keyed the same document while data pertaining to the keying process were monitored by specially designed routines attached to the input software. The data entry document, which was in hard copy form, was also on disk, thus making the recording of the nature and type of error a rather straightforward process. The performance of each operator was compared to all other operators to yield 496 possible pairwise comparisons.
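The figure of 496 comparisons is the number of unordered pairs that can be drawn from 32 operators. As a quick check, using integer labels as stand-ins for the operators:

```python
from itertools import combinations
from math import comb

operators = range(32)  # stand-in labels for the 32 operators

# Every unordered pair of distinct operators.
pairs = list(combinations(operators, 2))

print(len(pairs))   # 496
print(comb(32, 2))  # 496, i.e. 32 * 31 / 2
```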
The operators were recent graduates from a university in Hong Kong and had over one year of working experience. Most of their jobs involved about 20-30 per cent data entry and word-processing activities; thus they were well down the learning curve regarding this type of activity.
The operators first keyed a manuscript which required about one hour for each operator to enter. They returned six hours later to enter a second document, which also required about one hour to enter and which provided the means to validate the model. During the data entry process the row, column, correct and incorrect entries were recorded in a special database. These four data fields provided the means to collect the independent variables for the study, which are described below:
Errors in first and second databases. Given that the operators were experienced typists, it was expected that the percentage errors committed in the first database entered would be similar to those of the second database. Furthermore, it was expected that the nature and pattern of errors committed in each of these two databases would also be stable for each operator, because experienced typists tend to have stable error patterns from one sitting to the next.
Mutual character errors. The mutual character error for a particular key is the number of errors that both operators in a given pair commit on that key, taken as the smaller of the two operators' error counts. For example, assume that the @ key is the correct key to enter on 20 occurrences throughout the document. If operator A makes seven errors on that key while operator B makes three errors, then the mutual character error would be "three" for that operator pair on the @ key. It should be noted that other combinations, such as using the largest number of errors (seven in this case), the sum of the errors and other mathematical manipulations, did not factor significantly in the development of the model. The value of the mutual character errors variable is the sum of the lowest error values for a given operator pair over all keys.
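The mutual character errors variable can be sketched as follows. The per-key error counts are hypothetical apart from the @ figures used in the example above, and the dictionary layout is an assumption about how the monitored data might be held.

```python
def mutual_character_errors(errors_a, errors_b):
    """Sum, over all keys, of the smaller of the two operators' error counts.
    A key on which only one operator erred contributes nothing."""
    shared_keys = errors_a.keys() & errors_b.keys()
    return sum(min(errors_a[key], errors_b[key]) for key in shared_keys)


# Error counts per correct key (hypothetical, except "@": 7 versus 3).
operator_a = {"@": 7, "e": 2, "m": 1}
operator_b = {"@": 3, "e": 4}

print(mutual_character_errors(operator_a, operator_b))  # 5 = min(7, 3) + min(2, 4)
```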
Different keys. The different keys variable is incremented by "one" each time a given operator pair strike different keys in a given row and column position, regardless of whether one or both operators were incorrect. The different keys accumulators were gathered in both the first and the second database. This variable was collected in the second database to serve as an independent variable in predicting the reliability of the parallel system configuration.
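A minimal sketch of the different keys accumulator is given below. It assumes that each operator's entries are stored as rows of equal length aligned by row and column; the sample strings are invented for illustration.

```python
def different_keys(entries_a, entries_b):
    """Count the row/column positions at which the two operators struck
    different keys. The source document is not consulted, so it does not
    matter which operator (if either) was correct."""
    count = 0
    for row_a, row_b in zip(entries_a, entries_b):
        for key_a, key_b in zip(row_a, row_b):
            if key_a != key_b:
                count += 1
    return count


# Hypothetical keyed rows for one operator pair.
entries_a = ["total", "42@hk"]
entries_b = ["totel", "42#hk"]

print(different_keys(entries_a, entries_b))  # 2
```

Because the count needs no reference copy of the document, it can be gathered while the parallel system is operating.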
Analysis
Correlation analysis was conducted on each of the above independent variables. These correlations are shown in Table I.
Regression analysis was performed on those independent variables which could be used to select optimal operator pairs, and thus could consist of only those variables collected while the operators were entering the first database. The results of the regression are given in Table II.
This regression model explained about two-thirds of the variance and, in this situation and perhaps others similar to it, could be useful in optimally matching parallel sub-systems. In regression analysis it should be noted that whenever high correlations occur among independent variables, collinearity is a potential problem. However, the covariance value of 47.40 coupled with the R² of 0.67 indicates that this does not appear to be a significant problem (Lardaro, 1993).
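For readers unfamiliar with the mechanics, a least-squares fit and its R² can be sketched with NumPy. The data below are made up for illustration and are not the study's measurements; the choice of two predictors is likewise only an assumption about the form of such a model.

```python
import numpy as np


def fit_ols(features, target):
    """Ordinary least squares with an intercept.
    Returns (coefficients, r_squared); coefficients[0] is the intercept."""
    X = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((target - target.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot


# Made-up illustrative data, NOT the study's measurements.
# Columns: mutual character errors, different keys (first database).
features = np.array(
    [[3, 40], [5, 55], [1, 30], [8, 70], [2, 35], [6, 60]], dtype=float
)
# Errors remaining for each operator pair in the second database (invented).
pair_errors = np.array([4, 7, 2, 11, 3, 8], dtype=float)

coef, r2 = fit_ols(features, pair_errors)
print(coef)  # intercept followed by one slope per predictor
print(r2)    # share of variance explained by the fit
```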
The rather strong relationships shown here can possibly be explained by the fact that the operators were quite consistent during both data entry sessions. This seems reasonable given their experience in data entry and word-processing activities. To examine this possibility further, a matched-pairs t-test was conducted on the difference in operator errors between the first and second databases (M = 4.66 per cent, SD = 2.32 per cent and M = 4.65 per cent, SD = 3.00 per cent). The test resulted in a t-value of 0.226 with DF = 494, yielding a p-value of 0.823, which indicates no significant difference between the two sessions.
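The statistic behind a matched-pairs t-test can be sketched with the standard library alone. The error percentages below are invented for five units and are not the study's data; the sketch shows only the standard formula, not the exact procedure used in the study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """t statistic and degrees of freedom for the mean paired difference."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Made-up error percentages, NOT the study's data.
first_db = [4.1, 5.0, 3.2, 6.3, 4.8]
second_db = [4.3, 4.7, 3.5, 6.0, 5.1]

t_value, dof = paired_t_statistic(first_db, second_db)
print(t_value, dof)
```

A t-value close to zero, as here, means the average difference between sessions is small relative to its sampling variability.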
As an additional measure to validate the model, the study was repeated using 28 different operators who entered the same first database as had the previous group, but entered a different second database. This group was similar in composition to the first group of operators and the follow-up study was conducted in a procedure identical to that of the first study. Interestingly, the R² for the second group was 0.84, which is higher than that for the first group.
Additionally, the correlations among the independent variables were all slightly higher than those of the first study, thus perhaps indicating a more stable performance among the operators between first and second databases. Although no explanation can be given for this higher result other than that due to sampling variation, it does indicate that the findings of the first study were not spurious.
Given the high R² values in this study, it is apparent that heuristic monitoring and matching of the operating characteristics of parallel systems could be an improvement over methods which assume or depend on a particular distribution of errors in the parallel system components, or which, even less optimally, rely on a blind or random assignment of parallel system components.
The model was also tested as a predictor of parallel system reliability (quality). Given sufficient validity, such a model could have application in situations where sub-system output quality is critical, but where inspection is physically difficult or excessively costly to perform on an ongoing basis. A predictive model could indicate whether the output of the parallel system should receive a complete inspection, or perhaps be discarded. This model could not only use the same type of independent variables used to match parallel system components, but also utilize data gathered during the actual operation of the parallel system. In this study, the different keys variable generated by each unique operator pair in the second database factored significantly into the regression equation as part of the predictive model for determining the number of errors in the second database. The result of this regression is shown in Table III.
The predictive model explained nearly three-fourths of the variability in the errors of the operator pairs. In the replicated study using the second set of operators, the R² of the predictive model was 0.88.
To extend the analysis further and perhaps illustrate the effects of the model more clearly, the regression was applied to determine the optimal pairwise combinations of 16 of the 32 operators. The theoretical error rate was calculated by multiplying the error rates of the operator pair in each row according to the Markov process. These error rates were obtained from their performance in the first database. These theoretical rates were then compared against the actual error rates that occurred in the second database. The result of this analysis is presented in Table IV.
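One simple way to turn predicted pair errors into a set of non-overlapping pairs is a greedy pass over the candidates in ascending order. This is only an illustrative heuristic; the selection procedure actually used in the study is not described, and the predicted values below are hypothetical.

```python
def greedy_pairs(predicted_errors, n_pairs):
    """Choose disjoint pairs in ascending order of predicted errors.
    A greedy heuristic: it does not guarantee the globally best assignment."""
    chosen, used = [], set()
    ranked = sorted(predicted_errors.items(), key=lambda item: item[1])
    for (a, b), _error in ranked:
        if a in used or b in used:
            continue  # each operator may appear in only one pair
        chosen.append((a, b))
        used.update((a, b))
        if len(chosen) == n_pairs:
            break
    return chosen


# Hypothetical predicted errors for every pairing of four operators.
predicted = {
    ("A", "B"): 0.9, ("A", "C"): 0.2, ("A", "D"): 0.5,
    ("B", "C"): 0.7, ("B", "D"): 0.3, ("C", "D"): 0.8,
}

print(greedy_pairs(predicted, 2))  # [('A', 'C'), ('B', 'D')]
```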
From Table IV it should be noted that in all cases the optimal operator combinations resulted in better performances than would have been predicted by multiplication of their error rates. This indicates that these operator combinations did not tend to make the same types of error, thus leading to the enhancement of their reliability over what might have taken place if a random distribution were assumed. An index to measure the extent of what can be termed a synergistic effect is the ratio of the theoretical per cent errors divided by the actual per cent errors. A synergy index greater than "one" indicates that the operator pair exceeded their theoretical expectations based on Markov analysis. In this study, the combined synergy index for the eight operator pairs was 2.49. It should be noted that this figure was based on extracting 16 operators from a pool of 32. It is expected that if the operator pool were larger, the synergy index for the top eight pairs of operators would likely be higher.
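The synergy index for a single pair can be sketched as below. Error rates are expressed as fractions rather than percentages, and the figures are hypothetical rather than taken from Table IV.

```python
def synergy_index(error_rate_a, error_rate_b, actual_pair_error_rate):
    """Theoretical pair error rate (product of the individual rates)
    divided by the error rate the pair actually achieved."""
    theoretical = error_rate_a * error_rate_b
    if actual_pair_error_rate == 0:
        return float("inf")  # defect-free pair: the index is unbounded
    return theoretical / actual_pair_error_rate


# Hypothetical pair: each operator errs on 5% of entries in the first
# database; together they leave 0.1% of entries wrong in the second.
index = synergy_index(0.05, 0.05, 0.001)
print(round(index, 2))  # 2.5
```

An index above one means the pair left fewer errors than the product of their individual error rates would predict.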
Discussion
Rather than relying on random assignments of parallel system components or assigning parallel system components based strictly on their individual reliability, consideration should be given to the fact that the components in the parallel configuration may have unique error patterns under a given set of conditions. The model developed here does not rely on the stringent assumptions of many mathematical models pertaining to parallel systems, but utilizes an approach which monitors data from previous operations in determining optimal parallel configurations. This model could be used also to predict the reliability of the output, either as a preliminary step to determine whether inspection of the output is needed, or as a go-no-go basis for moving the output to the next production stage.
Given that detecting errors early in the production process has a high leverage effect regarding the costs involved in correcting errors in downstream operations, the use of active parallel systems could be a cost effective way to improve the quality of certain types of system. In other situations, parallel systems may be the only alternative to improve reliability in the face of current technology limitations. It should be noted, however, that the reliability of the switch or interface used to determine which parallel sub-system is in error will play an important role in the effectiveness of a parallel configuration. Thus, it is important that each of the components of the parallel system has as low an error rate as possible and that the switch has an even lower error rate as well as a rapid response time.
Although the model was developed using active parallel systems, it may be applicable also to redundant systems, especially in matching the components of redundant systems. While the results from this study are encouraging, it should be noted that it is a preliminary effort and that the identification of additional variables as well as refinement of the model are unquestionably needed.
References
1. Bodnar, G.H. (1993), "Data security and contingency planning", Internal Auditing, Winter, pp. 74-80.
2. Crosby, P.B. (1984), Quality without Tears: The Art of Hassle-Free Management, McGraw-Hill, New York, NY.
3. Dale, B.G. (Ed.) (1994), Managing Quality, Prentice-Hall International (UK) Limited, Hemel Hempstead.
4. Deming, W.E. (1986), Out of the Crisis, Massachusetts Institute of Technology, Cambridge, MA.
5. Feigenbaum, A.V. (1991), Total Quality Control, 3rd ed., McGraw-Hill, New York, NY.
6. Fishman, G.S. (1990), "How errors in component reliability affect system reliability", Operations Research, Vol. 38 No. 4, pp. 728-32.
7. Guha, S. and Aggarwal, K.K. (1989), "Extension of minimum effort method for non-series parallel systems", International Journal of Quality & Reliability Management, Vol. 6 No. 1, pp. 19-26.
8. Ishikawa, K. (1985), What Is Total Quality Control? The Japanese Way, Prentice-Hall, Englewood Cliffs, NJ.
9. Juran, J.M. (1992), Juran on Quality by Design: The New Steps for Planning Quality into Goods and Services, The Free Press, New York, NY.
10. Kanopoulos, N. (1988), "Design of a bus-monitor for real-time applications", Microprocessing and Microprogramming, Vol. 24, pp. 717-22.
11. Kozlov, B. and Ushakov, I.A. (1970), Reliability Handbook, Holt, Rinehart and Winston, New York, NY.
12. Lardaro, L. (1993), Applied Econometrics, HarperCollins, New York, NY, pp. 444-54.
13. Norbert, L.E. (1985), Quality, Reliability and Process Improvement, 3rd ed., Industrial Press Inc., New York, NY.
14. Petrovic, D. (1991), "Decision support for improving systems reliability by redundancy", European Journal of Operational Research, Vol. 55 No. 3, pp. 57-67.
15. Pratlow, C.G. (1993), "How Ritz-Carlton applies TQM", The Cornell HRA Quarterly, pp. 16-24.
16. "Redundant metering operates reliably" (1992), Electrical World, June, pp. 78-82.
17. Rhodes, W.L. Jr (1987), "Data input enters a new era", Infosystems, September, pp. 194-7.
18. Ribaric, S. and Pavesic, N. (1988), "Parallel character recognition system: theory, simulation and synthesis", Microprocessing and Microprogramming, Vol. 22 No. 5, pp. 333-46.
19. Smith, D.J. (1972), Reliability Engineering, Harper & Row, New York, NY.
20. Taguchi, G., Elsayed, A.E. and Hsiang, T. (1989), Quality Engineering in Production Systems, McGraw-Hill, New York, NY, pp. 135-8.
21. Weiss, H.J. and Gershon, M.E. (1993), Production and Operations Management, 2nd ed., Allyn and Bacon, Boston, MA.
Caption: Figure 1. Reliability of complex system in series and parallel; Table I. Correlations of independent and dependent variables (n = 496); Table II. Regression analysis for modelling parallel system selection; Table III. Regression analysis for predicting parallel system output; Table IV. Optimal operator combinations of 50 per cent of the operators
Copyright MCB UP Limited (MCB) 1997
