ABSTRACT
This fifth paper in a series on sampling addresses a new error: the tenth specific and distinct error that may affect samples, called the data error. The ten sampling errors are categorized into four major categories, and the terminology describing them has been modified for clarification, for better understanding, and to reflect common usage. The four categories of sampling errors and their respective specific errors are: 1. material errors, 2. process errors, 3. sample errors, and 4. laboratory errors. Data error is a new and distinct error to be identified among sampling errors. This error was previously included in other errors originally identified by Gy; however, its sources are separate and distinct from those of the other errors in sampling. Corrective and preventive actions to prevent data errors are likewise unique and separate from those addressing other sampling errors.
This paper is the fifth in a series in this journal discussing various aspects of sampling. The first paper in the series, "An Introduction to General Sampling Principles: Reducing Bias and Variation in Bulk Sampling," appeared in the Journal of GXP Compliance (JGXP), Volume 12, Number 4, Summer 2008. The second, "Error and Variation in Bulk Material Sampling," appeared in JGXP, Volume 12, Number 5, Autumn 2008. The third, "Q&A On Sampling Bulk Materials," appeared in JGXP, Volume 13, Number 1, Winter 2009. The fourth, "A New Way of Looking at Sampling Errors and Data Variation," appeared in JGXP, Volume 13, Number 3, Summer 2009. Dr. Smith is also the author of a book on sampling (1).
KEY POINTS
The following key points are addressed in this paper:
* The recognition of sampling errors was originally reported by Gy. Pitard identified additional sampling errors.
* Data error is a newly identified error heretofore assumed in the errors of Gy and Pitard. Data errors are specific and unique and caused by different sources than the other sampling errors.
* Sampling errors may be categorized into material errors, process errors, sample errors, and laboratory errors. Ten specific sampling errors have been identified.
* Material errors include composition error, distribution error, and the nugget effect.
* Process errors include non-periodic process errors and periodic process errors.
* Sample errors include sample definition, sample collection, and sample handling.
* Laboratory errors include analytical error and data error.
* Data error comprises errors in data recording, data transfer, calculations, and data treatment software.
* Data recording errors are mistakes by sampling or laboratory personnel in recording sample weights, analytical results, and other manually recorded data.
* Data transfer errors are mistakes in the transfer of original data by human intervention or by electronic means.
* Calculation errors are mistakes in calculations by humans or by electronic systems.
* Data treatment software errors refer to an incorrect choice of data treatment program or of options within a software program.
* Recommendations for policy and procedures to prevent data errors are provided.
* Data error is an important consideration when evaluating sources of variation and errors associated with sampling and data.
INTRODUCTION
The recognition of sampling errors was originally reported by Gy (2). He introduced and explained seven errors: the fundamental error, grouping and segregation error, delimitation error, extraction error, preparation error, long-range non-periodic process error, and long-range periodic process error. Two additional sampling errors have been reported by Pitard (3): the nugget effect and analytical error. This paper addresses a new error, the tenth specific and distinct error that may affect samples. We call this the data error.
Categorization Of Sampling Errors
The ten sampling errors are categorized into four major categories. Terminology describing these ten errors has been modified for clarification and better understanding and to reflect common usage. The four categories of sampling errors and their respective specific errors are the following:
* Material errors. These include the composition heterogeneity (the fundamental error); the distribution heterogeneity (grouping and segregation error); and the nugget effect. Material errors may impact process errors, sampling errors, and laboratory errors.
* Process errors. These include non-periodic variation and periodic variation.
* Sample errors. These include sample definition (delimitation), sample collection (extraction), and sample handling (preparation).
* Laboratory errors. These include analytical error and data error. The sample errors also apply within the laboratory.
These error categories and their relationships are depicted in the Table.
Material Errors - Composition, Distribution, And Nugget Effect
An aspect of data variation that is quite important is the variation in the material itself. It underlies all the other contributors to data variation. The nature of the material, how it "behaves" in various situations such as in blenders and how it is "presented" to us for sampling, must be well understood to minimize sampling variation. Material variation has three main components: Composition heterogeneity, distribution heterogeneity, and the nugget effect. Material errors are inherent in all phases of sampling, and may exacerbate process errors, sampling errors, and laboratory errors.
Error #1. Composition Heterogeneity. The composition heterogeneity, also called the constitution heterogeneity or the fundamental error, relates to the nature or composition of the material. This error is inherent in the material and is unavoidable. The material we sample varies in its composition or constitution. For example, a mixture of powders used to manufacture a pharmaceutical tablet contains the active drug, binders, fillers, and other formulation components. These components have different particle sizes, particle densities, and particle shapes. This type of material variation is called the composition heterogeneity. Because the particles are not perfectly identical and thus the material not perfectly uniform, our samples will not be a microcosm of the lot. Even with "perfect" sampling, there is no such thing as a "representative" or "accurate" sample. Samples are a subset of the lot, and they are representative and accurate only to within the degree of precision we are willing to tolerate. Consequently, just the act of sampling generates an error, and so the sample is not perfectly representative of the lot.
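To make this point concrete, the following minimal Python sketch simulates repeated "perfect" random sampling from a hypothetical two-component blend. All quantities (a 5% active fraction, 500 particles per sample, 20 samples) are illustrative assumptions, not data from this series.

```python
import random
import statistics

# Hypothetical two-component blend: 5% active-drug particles, 95% excipient.
# All numbers are illustrative assumptions, not data from this article.
ACTIVE_FRACTION = 0.05   # true proportion of active particles in the lot
SAMPLE_SIZE = 500        # particles captured in one thief sample

random.seed(42)

def sample_assay() -> float:
    """Draw one random sample and return its apparent active fraction."""
    active = sum(random.random() < ACTIVE_FRACTION for _ in range(SAMPLE_SIZE))
    return active / SAMPLE_SIZE

assays = [sample_assay() for _ in range(20)]
print(f"mean assay: {statistics.mean(assays):.4f}")
print(f"sample-to-sample SD: {statistics.stdev(assays):.4f}")
# Even with "perfect" random sampling, assays scatter around 0.05; this
# irreducible scatter is the composition (fundamental) error.
```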
Error #2. Distribution Heterogeneity. The distribution heterogeneity addresses how various particles are distributed both in the lot and at the particular place where we take a sample. Differences between individual particles determine how they are distributed in the lot as well as "locally." They may segregate during transport or when transferred from one container to another, for instance. This variation in particle location is the distribution heterogeneity. Variation in data arises not only because of the composition heterogeneity but also because we sample bulk material as "clusters" of particles. We do not sample particles individually. Further, the material is never perfectly blended: it always has some degree of settling or segregation. Even if a near perfect blend was achieved, we lose that uniformity when we transfer the product from blender to drum to hopper. Consequently, we incur the distribution heterogeneity, which generates the grouping and segregation error.
Error #3. Nugget Effect. The nugget effect is the presence of certain isolated and rare particles, or contiguous groups or clumps of particles, which may have a major effect on sample results. It is an error caused by non-uniformity in material composition and is one cause of data values that we might consider outliers. This error is exemplified in the mining industry when prospecting for gold. An area of land may be sampled to determine the presence of gold. The area to be sampled may have a low level of relatively uniform gold concentration and may also have localized high concentrations of gold "nuggets." If not properly sampled, it is these high concentrations of nuggets that cause test results to be erroneous. Results may be skewed high or skewed low depending on the sample withdrawn. The nugget effect may occur in sampling pharmaceutical or nutritional materials such as natural products. Nuggets may be individual particles, clumped materials due to moisture or electrostatic charge, or other localized regions of high ingredient concentration in the material to be sampled.
Process Errors
There are two sources of process variation: non-periodic (error #4) and periodic (error #5). Non-periodic process variation is non-random and results from process changes showing data shifts and trends. We might know there was a process upset or that material from a new supplier was used; these are examples of non-periodic process variation. Periodic process variation is also non-random and results from cyclic behavior. Temperature and humidity as related to storage conditions in the summer (in an inadequately environmentally controlled warehouse) are an example of cyclic behavior occurring over 24-hour periods. An optimal sampling strategy, including such things as sampling frequency and stratification plans, is beyond the scope of this article. Gy describes graphical and analytical techniques for examining both non-periodic and periodic process behavior (2).
Sample Errors
The sample category includes errors from sample definition (error #6), sample collection (error #7), and sample handling (error #8). The principle of correct sampling must be followed to reduce these three errors. The following two points are critical:
* Every equal-sized portion of the lot must have the same chance of being in the sample
* The sample integrity must be preserved both during and after sampling.
In simple random sampling, these criteria are easily met. Units in the population are chosen individually (one at a time), with equal probability (every unit has the same chance), and completely at random (using a well-defined random process). This is not possible with small particles or powders, so random sampling in the strict sense cannot be achieved. The best we can do is approximate this "perfect" sampling by following the principle of correct sampling. In the following discussion, we combine sample definition and sample collection as "sample selection."
Sample Selection. The most misunderstood cause of data variation is incorrect sample selection, which includes sample definition and sample collection. Sample definition is determining what subset of the lot material will be in the sample. Sample collection is physically obtaining the material identified to be in the sample. Material on conveyor belts, for example, must be taken fully across the "stream," and not just from one side. Otherwise, material on the other side has no chance of being part of any sample. Segregation across the stream, such as by particle size, will result in biased results. While we may identify correctly the subset to be collected, the mechanical sampler may not go all the way across the belt, or it might slow down as it collects more material. The result is similar to a grab sample, where more material is collected from one side than the other. Sometimes segregation occurs vertically in drums. Taking only a small scoop from the top, rather than using a thief, is selective sampling, not correct sampling.
Sample Handling. Sample handling is generally considered part of the broad category of sampling and addresses the preservation and integrity of the sample, both during and after sampling. Sample handling speaks to physical and/or chemical changes that alter the sample's material composition and the characteristic of interest, which in turn changes our measurement. Sample handling is often overlooked as a source of sampling problems. For example, a sample may be correctly taken on the production floor, packaged, and sent to the laboratory for analysis. When the lab receives the sample, however, the sample has significantly changed. The particle size distribution may be altered due to abrasion, or other changes may take place because the proper temperature has not been maintained.
Laboratory Errors
The lab category includes errors from all sample preparation activities and analytical testing as well as from data errors. Sample preparation errors include sample definition, sample collection, and sample handling. These considerations when taking the sample similarly apply in the lab when the laboratory analysts prepare the actual material to be tested. For example, a 100-gram sample of powder mixture is submitted for testing in the lab. The laboratory analyst only needs 5 grams for sample preparation and testing. The definition, collection, and handling of the 5-gram sample introduce variation into the final test data. After the 5-gram sample is prepared using procedures such as grinding, extraction, dilution, and so on, the sample is ready for instrumental analysis. If the material properties are highly prone to sampling problems such as material segregation due to different particle size, the sample preparation errors may be significant. Actual testing with an analytical instrument also contributes an analytical error.
Error #9. Analytical Error. The category of lab error includes sample preparation in the lab, which usually consists of several steps. Analytical error also involves the error due to the analytical instrument. This error may be determined by performing an instrument precision study in which the same sample is re-injected multiple times into a high-performance liquid chromatography (HPLC) system and the mean and standard deviation are calculated. The analytical error may also include human performance variation in the laboratory procedure, including such things as incomplete dissolution, splattering during heating, and reading the meniscus improperly. Use of multiple standards will also contribute variation. Analytical instruments include pumps, timers, transfer mechanisms, and other mechanical equipment - all of which contribute variation to the final test result.
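The instrument precision study described above can be summarized in a few lines of Python. The replicate peak areas below are hypothetical values chosen purely for illustration.

```python
import statistics

# Hypothetical peak areas from re-injecting the same preparation six times
# into an HPLC system (values are illustrative, not real data).
replicate_injections = [101.2, 100.8, 101.5, 100.9, 101.1, 101.0]

mean_area = statistics.mean(replicate_injections)
sd_area = statistics.stdev(replicate_injections)   # n - 1 denominator
rsd_pct = 100 * sd_area / mean_area                # relative SD, %

print(f"mean = {mean_area:.2f}, SD = {sd_area:.3f}, %RSD = {rsd_pct:.2f}")
# Because the same preparation is injected each time, this SD estimates
# instrument (injection-to-injection) precision only, not method precision.
```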
Error #10. Data Error. In previous discussions, we have not talked about data errors because they are not sampling errors per se. Nevertheless, they affect our summary statistics, data analysis, and final decisions. We include these errors under the Lab category, because that is where they are more likely to occur. They can, however, arise anywhere in the process or in sampling. They consist of all mistakes and inaccuracies that occur in manual data recording, data transfer, and data treatment, whether done manually or by computer.
ERROR #10. DATA ERROR- A NEW SAMPLING ERROR
The new sampling error to be discussed is generally termed "data error." This error has been grouped under the Gy "Preparation Error." However, we view the data error as distinctly different in its sources. Also, the data error is often caused by computerized systems. The following examples of data errors are discussed:
* Data recording. This refers to mistakes caused by sampling personnel or laboratory personnel in recording sample weights, analytical results, and other manual recording of data.
* Data transfer (i.e., rounding, manual, and electronic errors). This refers to errors or mistakes in the transfer of original data. Transfer may occur by human intervention or by electronic means.
* Calculations (i.e., rounding, manual, and electronic errors). This refers to calculation errors by humans or by electronic systems. For example, an incorrect formula may be added to an Excel spreadsheet. Automatic rounding of data by computer systems exemplifies this type of data error.
* Data treatment software. This refers to an incorrect choice of data treatment program or options within a software program.
DATA ERROR CAUSED BY DATA RECORDING
Accurate and correct data recording is fundamental to the sampling process. Data recording errors are due to inadvertent mistakes by personnel in recording sample weights, transposing numbers, forgetting numbers, and so on as might happen in the manual recording of data. Data recording must be accomplished in a highly disciplined manner and not done haphazardly. Original data are the basis for decisions and must be considered to be of the highest value. In addition to contributing errors to samples, original data are often requested for review in regulatory audits.
Personnel who record data are often not regularly exposed to data treatment and analysis and may not be aware of the implications of haphazard data recording. Original data may be recorded by manufacturing operators; by scientists, engineers, and technicians in research and development (R&D), technical support, or other research areas; or by contract workers performing services. These individuals may not record all data at the time data are available. They may record data on paper towels, paper bags, plastic containers, or other available materials. Data may be recorded using felt-tip pens, pencils, or other non-permanent media. Entries may be erased or covered with whiteout or other correction media. All of these practices increase the potential for errors in data recording. Additionally, they would be judged very negatively by regulatory auditors.
Data Recording Policy And Procedures
It is recommended that the following be implemented in organization policy and procedures to minimize data recording errors. Training on the following is also highly recommended:
* Data should be recorded at the time they become available. Recorded data should be signed and dated by the individual recording the data.
* The accuracy of recorded data should be verified by a second individual or by a printout from equipment or analytical instrumentation.
* A policy requiring that original data records be retained, even if made on laboratory paper towels, paper bags, etc., needs to be in place.
* Data should also be recorded in indelible ink. Data recording using felt-tip pens should not be allowed because the ink in these pens may fade over time.
* Data should not be recorded in pencil or other medium that may be changed or erased.
* Data should be recorded exactly and not rounded at the time of recording. If necessary, rounding can be accomplished during data treatment.
* If data are mistakenly recorded, the error should be "lined out" with a single line, and the notation should be signed and dated by the individual who made the correction.
* Whiteout should not be allowed in areas where data are recorded.
DATA ERROR CAUSED BY DATA TRANSFER
Data error caused by data transfer refers to errors or mistakes in the transfer of original data. Transfer may occur by human intervention or by electronic means. Accurate and correct data transfer is fundamental to the sampling process. Data integrity may be compromised inadvertently by personnel who are simply transferring data or original records, compiling lists, entering data into computer systems, and so on. Data may also be transferred electronically between different computer software programs. What could possibly go wrong with such a mundane task?
Data transfer must be accomplished in a highly disciplined manner and not done haphazardly. Data transfer may be difficult for experienced and well trained personnel, especially when large amounts of data are being transferred. Such tasks are even more difficult for personnel who may not usually perform such tasks. Administrative personnel including secretaries, filing clerks, and other non-scientific personnel may be asked to transfer data - "because it is so easy; what could possibly go wrong?" Again, depending on the amount of data to be transferred, manual data transfer may be extremely tedious and prone to errors. Aside from adding variation to sample data and potentially impacting decisions made from these data, data transfer errors would be very negatively judged by regulatory auditors.
Rounding - Manual And Electronic
Rounding errors may occur as part of data transfer. Original data may be available with excessive significant figures. Personnel transferring data may simultaneously round these data to more manageable significant figures. This practice increases the uncertainty in the data and may add bias. Electronic rounding may also occur. This happens when original data are transferred to a second software program that can accept only a limited number of digits. Digits may be dropped instead of being rounded appropriately. For example, 2.000 may be transferred as 2, and 2.999 may be transferred as 2. If multiple calculations are performed in the same program with multiple data manipulations, effects on the final result may be significant. Data may be transferred properly into an electronic spreadsheet, but the cell formatting functions in the spreadsheet may be used to round values, truncate values, or convert values to percentages prior to final analysis of the data. Spreadsheets may display data in one form (e.g., rounded), but then perform analysis on another form (e.g., unrounded) of the value. These practices can also contribute to uncertainty or bias in the data.
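The truncation and display pitfalls just described can be seen in a short Python sketch. The value 2.999 is taken from the example above; everything else is illustrative.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

value = 2.999

# Truncation drops digits outright: 2.999 becomes 2, a bias of -0.999.
truncated = math.trunc(value)

# Proper rounding to the same number of digits gives 3.
rounded = int(Decimal(str(value)).quantize(Decimal("1"),
                                           rounding=ROUND_HALF_UP))
print(truncated, rounded)        # -> 2 3

# Display formatting is not the same as the stored value: the string below
# shows 3.0, but the underlying number used in later arithmetic is 2.999.
print(f"{value:.1f}", value)     # -> 3.0 2.999
```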
Data Transfer Policy And Procedures
The following are recommended to be implemented in organization policy and procedures to minimize data transfer errors. These points should be added to organizational training programs on good documentation practices:
* Rounding rules should be clearly stated in policy.
* Manually transferred data should be verified by a second individual, and the record of this verification should be signed and dated.
* Electronic data transfer should be manually verified, and the record of this verification should be signed and dated.
* Electronic data transfer should maintain a sufficient number of significant figures.
* If a large quantity of data is transferred by electronic means, a reasonable sample may be verified by a second individual, as illustrated in the sketch following this list. The sampling of this verification should be documented, and the record of this verification should be signed and dated.
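The following minimal Python sketch illustrates the sampled verification recommended in the last bullet, comparing a random sample of transferred records against the originals. The record identifiers and values are hypothetical.

```python
import random

# Hypothetical records: originals and their electronically transferred copies.
original = {"S001": 99.8, "S002": 100.2, "S003": 101.0, "S004": 98.7}
transferred = {"S001": 99.8, "S002": 100.2, "S003": 110.0, "S004": 98.7}

random.seed(1)
check_ids = random.sample(sorted(original), k=2)  # verify a sample of records

for sid in check_ids:
    match = original[sid] == transferred[sid]
    print(f"{sid}: original={original[sid]} transferred={transferred[sid]} "
          f"{'OK' if match else 'MISMATCH - investigate'}")
# A mismatch (S003 here, if drawn) should trigger 100% verification
# of the entire transfer.
```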
DATA ERROR CAUSED BY CALCULATIONS
Data error caused by calculations refers to calculation errors by humans or by electronic systems. Humans make calculation errors, especially when calculation methods are not clearly specified. Computer systems may also make calculation errors, particularly when they are impacted by human intervention, such as when an incorrect calculation formula is added to an Excel spreadsheet.
Manual And Electronic Calculations
Manual calculations performed by humans are prone to mistakes. Even when all input numbers are correct, mistakes can occur. Calculators may be operated incorrectly, equations may be misread, and so on. People unfamiliar with algebraic notation may be asked to perform calculations because the calculations are presumably so easy. Electronic calculations should be much less likely to produce errors. However, when impacted by humans, these methods may produce errors. For example, equations may be deliberately changed without adequately verifying the success of the change. Equations may be inadvertently changed, such as when formulas are moved around an Excel spreadsheet. Spreadsheet functions may interpret cell values (the arguments of the functions) in ways that are not intended (e.g., some functions may treat an empty cell as a numerical zero), as demonstrated in the sketch below.
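A short Python sketch shows how treating an empty cell as a numerical zero distorts a simple mean; the assay values are hypothetical.

```python
# Hypothetical assay results; None marks an empty cell (no result recorded).
results = [99.5, 100.1, None, 100.4]

# Treating the empty cell as zero, as some spreadsheet functions do,
# silently drags the mean down:
as_zero = [x if x is not None else 0.0 for x in results]
wrong_mean = sum(as_zero) / len(as_zero)        # 75.0

# Excluding the missing value gives the intended answer:
present = [x for x in results if x is not None]
right_mean = sum(present) / len(present)        # 100.0

print(f"empty treated as 0: {wrong_mean:.1f}; excluded: {right_mean:.1f}")
```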
Data Calculation Policy And Procedures
The following are recommended to be implemented in organization policy and procedures to minimize calculation errors. The following points should be added to organizational training programs on good documentation practices:
* Data calculations should be exactly specified in procedures.
* There should be supporting documentation for calculation methods.
* Calculations done by computer should be validated.
* If changes are made to a computer program or to associated programs, calculations in presumably unaffected programs should be verified.
* If data input into computer systems is required, data input should be verified by a second individual, and the record of this verification should be signed and dated.
* If other selections (i.e., choices of program options) are required in computer operations, these should be specified in procedures.
* Manual calculations performed by individuals should be verified by a second individual, and the record of this verification should be signed and dated.
DATA ERROR CAUSED BY DATA TREATMENT SOFTWARE
Data error may be caused by an incorrect choice of data treatment program. Often decisions are not based on raw data themselves, but on statistics that are calculated from the raw data. Even if the integrity of the raw data is properly preserved during recording and transfer, an inadequate computing system or use of an inappropriate statistical procedure during analysis may lead to an incorrectly calculated or inappropriate decision statistic. For example, complex calculations such as regression may be performed using computer software that does not preserve an adequate number of digits in its computations. This can lead to computer round-off error and incorrect estimates of statistics such as slopes or intercepts. An example of an inappropriate decision statistic in MS Excel is the use of STDEVP() to calculate the standard deviation of a small sample rather than the appropriate function STDEV(). More subtle mistakes such as the use of statistical tests on data that violate the assumptions of normality, variance homogeneity, or independence may also occur. It is common to use an inappropriate decision statistic, such as the use of the correlation coefficient (which measures the strength of a linear relationship) as a measure of linearity. The opportunities for mistakes in the choice of proper statistical procedure are many, and the erroneous results can be devastating. The presence of such mistakes can go unnoticed, and avoiding them requires proper statistical training and careful peer review, verification, and validation of analysis approaches.
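The STDEVP()/STDEV() distinction corresponds to dividing by n versus n - 1. A minimal Python sketch, using hypothetical assay values, shows the difference on a small sample.

```python
import statistics

# Hypothetical small sample of assay values.
data = [99.1, 100.4, 100.9, 99.6, 101.0]

# statistics.pstdev divides by n (like Excel STDEVP): appropriate only
# when the data constitute the entire population.
population_sd = statistics.pstdev(data)

# statistics.stdev divides by n - 1 (like Excel STDEV): the usual
# estimator when the data are a sample from a larger lot.
sample_sd = statistics.stdev(data)

print(f"STDEVP-style: {population_sd:.4f}, STDEV-style: {sample_sd:.4f}")
# The n divisor understates the SD of a small sample; with n = 5 the
# difference is roughly 10%.
```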
Data Treatment Software Policy And Procedures
The following are recommended to be implemented in organization policy and procedures to minimize data errors due to software errors. The following points should be added to organizational training programs on good documentation practices:
* Selection of computer software for use in calculations should be clearly specified in procedures.
* Selections of computer systems, analysis packages, decision statistics, and statistical procedures should be reviewed by someone with adequate statistical training and experience.
* Calculations done by computer should be validated.
* If other selections are required in computer operations, these should be specified in procedures.
WHEN DO DATA ERRORS OCCUR?
The categorization of data errors described herein indicates that errors may occur at any time during the sampling process. The following examples demonstrate this.
Data Errors On The Manufacturing Floor
Data errors often occur on the manufacturing floor. These errors are often unexpected and easily overlooked. The following are examples of actual occurrences.
Sample "Perfection." Process validation was being conducted on a blending process in manufacture of a tablet dosage form. The validation protocol required that the manufacturing operator withdraw 12 samples from the mixer using a thief. Each sample was required to be between 0.2 and 0.3 grams. When all samples were tested by the laboratory to determine the active drug content in the blend, test results failed and test data showed unexpectedly high variability. Further examination of the sampling data showed that all samples submitted by the operator were between 0.248 and 0.252 grams - a highly unlikely occurrence. Further discussion with the operator indicated that he made sure that all samples were in the middle of the target range by removing excess powder by pouring powder from the sample container, removing powder with a spatula, adding powder back into the sample, and so on - whatever was needed to submit a perfect sample (in his mind) to the lab. These manipulations caused powder segregation, loss of active ingrethent, and ultimately data variation. A subsequent trial in which sample weights were not manipulated resulted in successful validation data.
Data Recording. The manufacturing operator removed samples from the drying oven after completion of a drying process. Sampling was recorded on a validation sampling page that included a sample number, sample weight, and oven shelf number. His sample recording page had a number of cross-outs and entry mistakes. The operator expected to be criticized for his sloppy sampling page, so he rewrote the page. In rewriting the page, several sample weights were incorrectly rewritten so that weights did not correspond with the sampling locations. The laboratory results included several high values, which should have correlated with high sample weights. However, because the sampling records had been incorrectly rewritten, this correlation was initially not possible. Further discussion with the operator and retrieval of the original sampling page allowed the correct correlation to be made.
Data Errors In The Lab
Data errors often occur in the lab, especially when there are many manual operations and multiple data transfers in a single analytical process.
Laboratory Efficient Data Sheets. A laboratory routinely optimized sample test performance through use of formatted data sheets for HPLC analyses. This sheet required all samples to be tested in a specified order related to the concentration of the unknown. Samples from a process validation performance qualification were submitted for analysis. Laboratory personnel followed their internal procedures regarding use of data sheets. However, this procedure destroyed the continuity of the process validation samples as sampled during the process. Because new laboratory analytical standards were freshly prepared each day, distinct differences in data were observed depending on the assay day and associated standard. Better communication between validation personnel submitting samples and laboratory personnel explaining the importance of maintaining sample continuity would have prevented this problem.
Instrument Recording. Instruments can sometimes record erroneous data. For example, an analytical instrument took recordings every second. The numbers were all below one and were recorded to four decimal places. The analyst sent the data printout to the statistician. When re-entering the numbers one at a time into a program for data analysis, the statistician noticed that several strings of numbers were exactly the same. When this abnormality was pointed out, the chemist realized that the instrument had gotten "stuck" several times. The data were, therefore, not valid and the experiment had to be repeated. If the numbers had been transferred electronically directly to the statistical software, they may not have received the scrutiny necessary to find the errors.
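A simple automated check could have caught the stuck instrument. The following Python sketch flags implausibly long runs of identical consecutive readings; the readings and the run-length threshold are hypothetical.

```python
from itertools import groupby

# Hypothetical one-per-second readings; the repeated 0.4821s suggest the
# instrument was "stuck" for several readings.
readings = [0.4712, 0.4821, 0.4821, 0.4821, 0.4821, 0.4755, 0.4698]

MAX_RUN = 2   # longest run of identical values considered plausible

for value, group in groupby(readings):
    run = len(list(group))
    if run > MAX_RUN:
        print(f"suspect: value {value} repeated {run} times in a row")
```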
Data Errors During Data Treatment
Data treatment may occur on the manufacturing floor, in the laboratory, and in R&D. Testing in any of these areas may involve multiple data treatments, including data transfer, calculations, summary tables, and the writing of a final report. When unusual results are noted, data treatment errors are seldom initially considered as the cause.
Massive Data Transfer. Process validation of a tablet compressing process was conducted. Testing was conducted according to US Food and Drug Administration guidelines (4) in which 140 individual tablet samples were required to be tested for each lot in the validation protocol. A total of seven lots were tested to validate the manufacturing process for a multiple dosage strength product. Nearly 1000 test results were thus supplied to technical personnel from the lab. These test results were then transferred to Excel spreadsheets for calculations and graphing. Test results were simultaneously transferred to Microsoft Word documents for preparation of validation reports. When reports, calculations, and original laboratory data were reviewed for consistency, many errors and inconsistencies were found. These errors were technically insignificant, but were very embarrassing to the organization when observed by government auditors. Errors were due to the great number of data transfers, insufficient time to verify transfers, tight timelines, and inadequately trained personnel performing data transfers. A total of seven data transfers were conducted in the above example from sample origination through writing of the final report.
Statistical Analysis. The inclusion of spurious data can skew results or invalidate underlying statistical assumptions, leading to mistaken conclusions and decisions. Reliance on an assumption that the data are consistent with a normal distribution, when in fact the data are highly skewed or contain outliers, can result in statistical tests that show misleading significant differences between sample means. Confidence intervals may be confused with tolerance intervals. For example, asserting with 95% confidence that 99% of the tablets tested have the required potency requires a tolerance interval. A confidence interval, by contrast, provides a measure of uncertainty around an estimate of a population parameter, such as the mean potency.
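The following Python sketch contrasts the two intervals for a hypothetical set of tablet potencies, assuming SciPy is available. The tolerance factor uses Howe's approximation, one common choice rather than the only one.

```python
from math import sqrt
import statistics
from scipy import stats

# Hypothetical potencies (% label claim) from n tablets.
potencies = [99.2, 100.5, 98.8, 101.1, 99.7, 100.2, 99.9, 100.8, 99.4, 100.1]
n = len(potencies)
mean = statistics.mean(potencies)
sd = statistics.stdev(potencies)

# 95% confidence interval for the MEAN potency (says nothing about
# individual tablets):
t = stats.t.ppf(0.975, n - 1)
ci = (mean - t * sd / sqrt(n), mean + t * sd / sqrt(n))

# 95%/99% two-sided tolerance interval (Howe's approximation): a range
# claimed, with 95% confidence, to cover 99% of individual tablets.
z = stats.norm.ppf((1 + 0.99) / 2)
chi2 = stats.chi2.ppf(0.05, n - 1)        # lower 5% chi-square quantile
k = z * sqrt((n - 1) * (1 + 1 / n) / chi2)
ti = (mean - k * sd, mean + k * sd)

print(f"95% CI for mean:        ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95%/99% tolerance int.: ({ti[0]:.2f}, {ti[1]:.2f})")
# The tolerance interval is much wider: confusing the two overstates
# what the data support about individual units.
```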
SUMMARY
Data error is a new and distinct error to be identified among sampling errors. This error was previously included in other errors originally identified by Gy (2). However, the sources of this error are separate and distinct from other errors in sampling. Corrective and preventive actions to prevent data errors are unique and separate from those addressing other sampling errors.
Data errors comprise the following:
* Data recording errors, which include errors by sampling personnel or laboratory personnel in recording weights, analytical results, and other data, usually manually.
* Data transfer errors, which include errors or mistakes in the transfer of original data. Transfer may occur by human intervention or by electronic means.
* Calculation errors, which include calculation errors by humans or by electronic systems. Automatic rounding of data by computer systems exemplifies this type of data error.
* Data treatment software errors, which include an incorrect choice of data treatment program or of options within a software program.
Accurate and correct data recording is fundamental to the sampling process. Data recording must be accomplished in a highly disciplined manner and not done haphazardly. Original data are the basis for decisions and must be considered to be of the highest value. Data transfer is equally important in maintaining the integrity of original data. Rounding errors may occur as part of data transfer, and transferring large amounts of data can be extremely tedious and highly prone to unintentional errors and mistakes. Humans make calculation errors, especially when calculation methods are not clearly specified. Computer systems may also make calculation errors, particularly when they are impacted by human intervention. Computer software may also cause data errors when it is inadequate to the analysis task (e.g., fails to preserve sufficient digits in calculations) or when it is used inappropriately (e.g., incorrect spreadsheet functions, violation of statistical assumptions, or use of an inappropriate decision statistic). Specific recommendations for policy and procedures to prevent these data errors are provided.
Data error is an important consideration when evaluating sources of variation and errors associated with sampling and data. Data error may be part of process error, sample error, and laboratory error, and may impact the quality of data associated with these potential sampling errors. Data error should be a separate and distinct consideration when conducting investigations and reviews.
REFERENCES
1. Smith, Patricia L., A Primer for Sampling Solids, Liquids, and Gases Based on the Theory of Pierre Gy, Philadelphia: The Society for Industrial and Applied Mathematics, 2001.
2. Gy, Pierre M., Sampling of Heterogeneous and Dynamic Material Systems: Theories of Heterogeneity, Sampling and Homogenizing, Amsterdam: Elsevier, 1992.
3. Pitard, Francis F., "The In Situ Nugget Effect: A major component of the random term of a variogram," presented at the Third World Conference on Sampling and Blending, October 23-25, 2007, Porto Alegre, Brazil.
4. FDA, Guidance for Industry. Powder Blends and Finished Dosage Units - Stratified In-Process Dosage Unit Sampling and Assessment. Draft Guidance, October 2003.
ABOUT THE AUTHORS
Patricia L. Smith, Ph.D., is a statistician and process improvement specialist with experience in academics, industry, and consulting. Dr. Smith provides training and consulting in the sampling of solids, liquids, and gases for industry and government, as well as in the Six Sigma methodology. Her book, A Primer for Sampling Solids, Liquids, and Gases, is a practical guide for those in the field as well as an introduction for the theoretician. Dr. Smith may be reached by e-mail at [email protected].
David LeBlond, Ph.D., is Principal Research Statistician at Abbott, Abbott Park, IL. Dr. LeBlond obtained an MS in statistics from Colorado State University and a Ph.D. in biochemistry from Michigan State University. He has 30 years of experience in the pharmaceutical and medical diagnostics field. He serves on the statistical expert team in PhRMA. Dr. LeBlond may be reached by e-mail at [email protected].
