Statistical sampling
In statistics, quality assurance, and survey methodology, sampling is the selection of a subset (a statistical sample) of individuals from within a statistical population to estimate characteristics of the whole population.
Because the population is usually too large to examine directly, the aim is for the sample to represent the population as closely as feasible. Two advantages of sampling are lower cost and faster data collection than measuring the entire population.
Each observation measures one or more properties (such as weight, location, colour) of observable bodies distinguished as independent objects or individuals. In survey sampling, weights can be applied to the data to adjust for the assumptions and approximations of the sample design.
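As a minimal sketch of how such weighting can work, the example below uses inverse-probability weights: each observation is weighted by the reciprocal of its chance of being selected, so units from under-sampled groups count for more. This is only one common weighting approach. The function name `weighted_mean` and the data are hypothetical, chosen for illustration.

```python
def weighted_mean(values, inclusion_probs):
    """Estimate a population mean using inverse-probability (design) weights."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have the same length")
    # Each unit stands in for 1 / p units of the population.
    weights = [1.0 / p for p in inclusion_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical data: the first two units were sampled at a 50% rate,
# the third at a 10% rate, so the third represents more of the population.
values = [10.0, 12.0, 20.0]
inclusion_probs = [0.5, 0.5, 0.1]

estimate = weighted_mean(values, inclusion_probs)
unweighted = sum(values) / len(values)
print(f"weighted estimate = {estimate:.2f}, unweighted mean = {unweighted:.2f}")
```

The weighted estimate is pulled towards the third value because that unit was selected with a lower probability and therefore represents a larger share of the population.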
Results from probability theory and statistical theory are employed to guide the practice. In business, in opinion and perception surveys, and in research, sampling is widely used for gathering information about a population. Acceptance sampling is used to determine whether a production lot of material meets the governing specifications.
A sampling or survey project is formulated in as much detail as possible, covering purpose, timeframe, roles and responsibilities, resources, mechanisms for monitoring, review and analysis, and reporting. The project formulation must address:
Population - the expected quantity, volume or number of items in the process, e.g. the number of products manufactured in a production shift
Sample - the number of items selected based on a sampling plan, expected to represent the population
Sampling frame - the list or source from which every item in the population can be identified and selected for the sample
Sampling methods - probability-based methods, which involve random selection and give every item a known chance of being selected, and other (non-probability) methods, where that chance is indeterminate or even non-existent
Sample size determination – decide on the sampling table to be followed, select the parameters, locate the sample size in the appropriate cell of the table
Sample size tables – several useful tables are available for attribute / variable data sampling
Sampling and data collection - carry out observations and measurements on the selected sample
Sampling errors and biases - sampling error is the random variation in the results that arises because the elements in the sample are selected at random; bias arises when the true selection probabilities differ from those assumed in calculating the results
Non-sampling errors:
- Over-coverage: inclusion of data from outside of the population
- Under-coverage: the sampling frame does not include some elements of the population
- Measurement error: e.g. when respondents misunderstand a question, or find it difficult to answer
- Processing error: mistakes in data coding
- Non-response or participation bias: failure to obtain complete data from all selected individuals
Review of sampling process and collected data – for ensuring that the sampling has been done according to plan, fulfilling objectives of the study
Sampling data processing and analysis – to process the data into useful information
Reporting – the outcomes for decision making by management / leadership
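The selection and estimation steps in the list above can be sketched as follows, assuming simple random sampling without replacement, which is only one of the probability-based methods. The standard error quantifies the sampling error of the estimated mean and includes the finite population correction. The function names and the frame contents are hypothetical; for brevity the frame holds each item's measurement directly rather than an item identifier.

```python
import math
import random
import statistics


def simple_random_sample(frame, n, seed=None):
    """Draw n items from the sampling frame without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(frame, n)


def mean_with_standard_error(sample, population_size):
    """Return the sample mean and its standard error (finite population corrected)."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    # The correction shrinks the error as the sample approaches a full census.
    fpc = math.sqrt((population_size - n) / population_size)
    return mean, fpc * s / math.sqrt(n)


# Hypothetical frame: measured weights (in grams) of 500 items from one shift.
frame = [50 + (i % 7) for i in range(500)]

sample = simple_random_sample(frame, n=40, seed=1)
mean, std_error = mean_with_standard_error(sample, population_size=len(frame))
print(f"sample mean = {mean:.2f}, standard error = {std_error:.3f}")
```

Passing the whole frame as the sample gives a standard error of zero, reflecting that a complete census has no sampling error, although non-sampling errors can still be present.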
Acceptance sampling - statistical sampling to determine whether to accept or reject a production lot of material. Often, a producer supplies the consumer with a number of items, and the decision to accept or reject them is made by counting the defective items in a sample from the lot. The lot is accepted if the number of defective items does not exceed the acceptance number; otherwise the lot is rejected.
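The decision rule of a single sampling plan can be sketched as below, assuming the usual convention that a lot is accepted when the number of defective items in the sample is at most the acceptance number. The probability of acceptance is computed under a binomial model, which assumes the lot is large relative to the sample. The plan parameters (sample size 50, acceptance number 2) are hypothetical and are not taken from ISO 2859 or any other published table.

```python
from math import comb


def accept_lot(defectives_in_sample, acceptance_number):
    """Accept the lot if the defective count does not exceed the acceptance number."""
    return defectives_in_sample <= acceptance_number


def probability_of_acceptance(n, c, p):
    """P(accept) for sample size n, acceptance number c and lot defect rate p.

    Binomial model: the chance of finding at most c defectives among n items.
    """
    return sum(comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(c + 1))


# Hypothetical plan: inspect 50 items, accept with at most 2 defectives.
n, c = 50, 2
print(accept_lot(defectives_in_sample=1, acceptance_number=c))
for p in (0.01, 0.05, 0.10):
    print(f"defect rate {p:.2f}: P(accept) = {probability_of_acceptance(n, c, p):.3f}")
```

Evaluating the probability of acceptance over a range of defect rates traces the plan's operating characteristic curve, which shows how sharply the plan distinguishes good lots from bad ones.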
Various acceptance sampling plans are available (ISO 2859, IS 2500, etc.) for single and multiple sampling applications. Multiple sampling plans use more than two samples to reach a decision. A shorter examination period and smaller sample sizes are features of this type of plan. The samples are taken at random, yet the sampling procedure remains reliable.
Statistical sampling plans cover both attribute (qualitative) and variable (quantitative) plans.
Benefits of statistical sampling
- Cost effectiveness - studying the whole population is an extensive undertaking that would require a lot of resources. A representative sample is usually less expensive.
- Time effectiveness - collecting a sample and then researching, handling, tabulating and analyzing the results is easier and takes less time
- A 100% study (examining every item) is usually not as effective as believed, because fatigue and other factors lower the reliability and accuracy of the data
- Very effective for destructive testing, where only a very small sample can be selected for testing because of cost and other concerns
- A basis in statistical probability, which increases the overall assurance and strength of sampling-based decisions