- How Stratified Random Sampling Works
- Simple Random vs. Stratified Random Sample: What's the Difference?
- Stratified Sampling
How Stratified Random Sampling Works
There is a need for better estimators of population size in places that have undergone rapid growth and where collection of census data is difficult. We explored simulated estimates of urban population based on survey data from Bo, Sierra Leone, using two approaches: (1) stratified sampling from across 20 neighborhoods and (2) stratified single-stage cluster sampling of only four randomly sampled neighborhoods.
The stratification variables evaluated were (a) occupants per individual residence, (b) occupants per neighborhood, and (c) residential structures per neighborhood. For method (1), stratification variable (a) yielded the most accurate re-estimate of the current total population.
Stratification variable (c), which can be estimated from aerial photography and zoning type verification, and variable (b), which could be ascertained by surveying a limited number of households, increased the accuracy of method (2). Small household-level surveys with appropriate sampling methods can yield reasonably accurate estimations of urban populations.
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication. Data Availability: All relevant tabular data are within the paper and its Supporting Information files.
There is no past, present or future Intellectual Property associated with the work described in the paper, and none of the authors have any financial interests or conflicts in the outcome of the study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist. This conversion is not always necessary, because some epidemiological parameters can now be estimated from incidence counts alone, including the interval between successive cases and the reproductive number R0, which is the average number of secondary cases attributable to a primary case [1, 2].
If these parameters are insufficient to evaluate the models, it may be necessary to calculate the total population N. The 5 brief examples that follow illustrate both the necessity of doing so, and some of the difficulties that may be encountered. In resource-limited environments, it may be possible to use both aerial imagery and limited residential survey data to estimate the population of a region of interest, as shown in the first two examples.
Using ground truth data for the measured population of 20 sections in Bo City, Sierra Leone, we compared the uncertainty of estimating the population using survey data for either (1) occupants per residence or (2) rooftop area per resident. The latter variable was computed by manually digitizing the rooftop areas of residential structures in 5 sections of Bo, and calculating the ratio of rooftop area per occupant for each residence [3].
The ability to rapidly estimate the population of both temporary and unplanned settlements is critical for planning resource allocation for refugee and internally displaced populations as well as for places undergoing rapid unplanned urbanization, since in these settings there is usually not a stable residential population. Checchi et al. As shown in the next 2 examples, if salient population data are available either directly or by interpolation, derived rates of infection, immunity, or morbidity may be calculated.
Glasser et al. They applied a SEIR model parameterized by demographic parameters for the United States, including the total population stratified by age. The age-specific death rates attributable to pneumonia and influenza were estimated, as were the death rates from all other remaining causes. Gomez-Elipe et al. In demographically diverse environments, different methods may be required to estimate the population at different locations, as shown in our final example. In addition to enumerated city population data, city footprints can be established by analyzing nighttime satellite images, but this approach may fail to capture small informal settlements in Africa and rural Asia [8, page 9].
Accordingly, several corrections are applied for poorly illuminated settlements [8, page 9], and point estimates are provided for settlement populations exceeding 1, In a previous study [3], a Finite Population Bootstrap (FPB) [10, page 92] was used to compare the relative uncertainty of two population estimators: an occupancy-based estimator and a rooftop area-based estimator.
For the region of interest, the former was estimated as the product of (1) the average number of persons per residential structure multiplied by (2) the total number of residential structures; and the latter was calculated as (1) the average number of persons per rooftop area i. Both the occupancy-based and rooftop area-based population estimators were evaluated by simulating simple random sampling without replacement (SRSWOR). The analysis in this current paper will evaluate the use of stratified sampling for population estimation, and will demonstrate the reduction in the uncertainty of the population estimate achievable relative to SRSWOR.
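The occupancy-based estimator and its evaluation under SRSWOR can be illustrated with a short simulation. This is only a sketch in Python (the study itself used R): the function names and the synthetic occupancy values are assumptions made for illustration, not the survey data or the authors' code.

```python
import random
import statistics

def occupancy_estimate(sampled_occupancies, total_structures):
    """Mean persons per sampled structure times the known number of structures."""
    return statistics.mean(sampled_occupancies) * total_structures

def simulate_srswor(occupancies, sample_size, trials, seed=0):
    """Repeat simple random sampling without replacement and collect the estimates."""
    rng = random.Random(seed)
    total_structures = len(occupancies)
    estimates = []
    for _ in range(trials):
        sample = rng.sample(occupancies, sample_size)  # draws without replacement
        estimates.append(occupancy_estimate(sample, total_structures))
    return estimates

# Hypothetical sampling frame: persons per structure for 500 structures.
frame_rng = random.Random(42)
frame = [frame_rng.randint(1, 30) for _ in range(500)]

estimates = simulate_srswor(frame, sample_size=50, trials=200)
print("true total:", sum(frame))
print("mean of estimates:", round(statistics.mean(estimates)))
print("spread of estimates (SD):", round(statistics.stdev(estimates)))
```

The standard deviation of the estimates across trials is one simple way to express the uncertainty that stratification is meant to reduce.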
The relative advantages and restrictions of both methods will be discussed. The city of Bo itself is approximately These sections vary in size from 0. For 20 of the 68 sections, residential survey data are also available [3] (see Table 1). The ground truth survey data for these 20 sections will provide the basis for simulated sampling using different stratification protocols, and for quantifying the reduction in the uncertainty of the population estimate achievable. The first approach, optimal stratification by persons per structure, requires that the number of persons per structure be already known for all residential structures, possibly from a prior survey or census data.
The objective is to exploit this prior data to design an improved stratification protocol for re-estimating the population, and to demonstrate a significant reduction in the uncertainty of the population estimate relative to random sampling.
Single-stage cluster sampling is useful if the number of sections that can actually be sampled is restricted, perhaps because of cost or schedule limitations.
In our examples, the simulated cluster sampling will be restricted to 4 of the 20 available sections. We will investigate the reduction in uncertainty that can be achieved by using a stratified cluster sampling protocol, rather than random selection, to select the 4 sections on each simulation trial.
Each section will be completely sampled. Note that the choice of population estimators is independent of the stratified sampling protocol selected for simulated data collection. A stratified Horvitz-Thompson [12] population estimator will be evaluated for all examples. We have also extended our original FPB model to support stratified sampling [10], and partial results from the latter will be contrasted with estimates obtained using the stratified Horvitz-Thompson estimator.
We will use a single dataset developed previously in [3] (see Table 1). This dataset contains individual records for each of 1, residential structures surveyed. Each record includes the number of persons in the structure, a variable that we will utilize in this paper.
The survey methodology and data collection methods used to construct the dataset analyzed in this manuscript were all developed previously. The original articles [3, 11] should be consulted for a complete discussion.
The current article complements and extends these prior studies, but does not supplant them. The utility of these methods for the 5 initial examples, which were presented to establish the importance of estimating the population of a region of interest, will depend upon the availability of partial survey data for occupancy, the existence of adequate estimates of the total number of residential structures, and the presence of stable patterns of residential occupation.
Neither method is likely to be useful for improved estimation or re-estimation of the population of a highly transient population living in temporary shelters as described by Checchi et al. The simulations described in this investigation were written in the programming language R [13]. Supporting functions from multiple R libraries were used, including [14-16]. Additional custom code was written and tested by the first author.
In all of the examples presented here, the true optimal boundaries were found through exhaustive search. Given the relatively small size of the dataset (1, records), all possible combinations of strata boundaries were tested to determine which set minimized the uncertainty of the population estimate as a function of sample size [17] page Naval Research Laboratory.
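An exhaustive boundary search of this kind can be sketched as follows. The sketch is in Python rather than the R used in the study, and it makes two assumptions that the text does not confirm: proportional allocation of the sample to strata, and the textbook variance of a stratified estimate of a total under simple random sampling within strata. The function names are hypothetical.

```python
from itertools import combinations
from statistics import variance

def stratified_total_variance(values, cuts, n):
    """Variance of the stratified estimate of the total for sample size n.

    `cuts` holds the inclusive upper bound of every stratum except the last.
    Assumes proportional allocation: n_h = n * N_h / N.
    """
    N = len(values)
    bounds = list(cuts) + [float("inf")]
    lower = float("-inf")
    total = 0.0
    for upper in bounds:
        stratum = [v for v in values if lower < v <= upper]
        lower = upper
        N_h = len(stratum)
        if N_h < 2:
            return float("inf")  # reject boundaries that leave a stratum too small
        n_h = n * N_h / N
        total += N_h ** 2 * (1 - n_h / N_h) * variance(stratum) / n_h
    return total

def best_boundaries(values, n_strata, n):
    """Try every combination of candidate cut points and keep the best one."""
    candidates = sorted(set(values))[:-1]  # the largest value cannot be a cut
    return min(
        combinations(candidates, n_strata - 1),
        key=lambda cuts: stratified_total_variance(values, cuts, n),
    )

# Hypothetical persons-per-structure values with three obvious groups.
occupancies = [1, 1, 2, 2, 10, 10, 11, 11, 50, 50, 51, 51]
print(best_boundaries(occupancies, n_strata=3, n=6))
```

Because the number of distinct occupancy values is small, the number of boundary combinations stays manageable, which is what makes exhaustive search practical for a dataset of this size.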
Written informed consent was obtained from each household representative who participated in the survey. Survey data were obtained as part of a broader study to determine not only population demographics but health metrics and health care utilization trends. Structures in Bo City were divided into two categories. Fig 1 in [3] shows the 20 sections in which the surveys were conducted.
The surveyors received several days of training, including instruction on geographic data collection using hand-held GPS units, interviewing techniques, and research ethics—including an emphasis on confidentiality.
During the interviews, one adult of either sex served as the representative of each household. Each residential record lists the number of persons reported living within the same residential structure, and the number of separate households.
No attempt was made to differentiate between persons based on gender, age, or household affiliation. Institutional review boards (IRBs) at all three institutions approved the data collection methodology. Our sampling frame is a list of 1, residential structures encompassing 20 of the 68 sections in Bo City. For each residential structure, there is a unique single record listing the number of persons and households; because these records can be randomly selected, this database will provide the basis for simulated sampling of residential structures.
A cluster is defined as a logical collection of primary sampling units (PSUs) [21, page 24]; in this study, a cluster and a Bo City section will be treated as synonymous in the context of single-stage cluster sampling. The flowchart in Fig 1 summarizes the algorithms and simulations that will be developed in the text. The objective of this study is to investigate alternative approaches for stratified sampling of the residential structures in a resource-limited environment, and to determine the relative reduction in the uncertainty of the estimate of the total population, if any, that results.
In all cases, it is assumed that at least the number of residential structures in each section is known. This flowchart may be referenced as the two major protocols are developed and simulated in detail. This figure summarizes all of the optimization and control protocols for stratified sampling developed in this study.
See text for a summary of each major protocol and its corresponding steps through the flow chart. The light brown parallelogram is the starting point for all protocols, the yellow diamonds are decision boxes, and the light green squares denote the process end states.
As with any stratified sampling scheme, the PSUs (Primary Sampling Units), here the 1, individual residential structures (see Table 1), must first be divided into mutually exclusive and exhaustive strata [21] page After the stratification boundaries have been determined, simulated sampling can be executed.
Based on pilot studies, we determined that 4 levels of stratification would be sufficient for proof of concept. The stratification and estimation algorithms will be summarized later. The survey variable Y and the stratification variable X are the same: specifically, the number of persons per residential structure. For this reason, it was not necessary to model the relationship between Y, the measured survey variable (persons per residential structure), and X, the stratification variable [17].
On each simulation trial, a subset of the PSUs was randomly selected from each stratum as a function of (1) the total sample size and (2) the allocation algorithm selected. This step created a stratified sample of the PSUs. A stratified Horvitz-Thompson estimator was then used to re-estimate the total population of the 20 pooled sections [12, 17, 21].
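One simulation trial of this kind can be sketched as below. The sketch assumes simple random sampling without replacement within each stratum, so that every unit in stratum h has inclusion probability n_h / N_h; the function names, the strata, and the allocation are hypothetical and are not taken from the study's R code.

```python
import random

def stratified_sample(strata, allocation, rng):
    """Draw allocation[h] units without replacement from each stratum h."""
    return {h: rng.sample(units, allocation[h]) for h, units in strata.items()}

def horvitz_thompson_total(stratum_sizes, samples):
    """Weight each sampled value by the inverse of its inclusion probability."""
    total = 0.0
    for h, sample in samples.items():
        inclusion_prob = len(sample) / stratum_sizes[h]
        total += sum(y / inclusion_prob for y in sample)
    return total

# Hypothetical strata of persons per structure (low to high occupancy).
strata = {
    "low": [1, 2, 2, 3, 3, 4],
    "medium": [6, 7, 8, 9],
    "high": [15, 18, 22, 30],
}
stratum_sizes = {h: len(units) for h, units in strata.items()}
allocation = {"low": 3, "medium": 2, "high": 2}  # assumed allocation

rng = random.Random(1)
samples = stratified_sample(strata, allocation, rng)
print("estimated total:", horvitz_thompson_total(stratum_sizes, samples))
print("true total:", sum(sum(units) for units in strata.values()))
```

Repeating the trial many times and recording the spread of the estimates gives the uncertainty for a given sample size and allocation rule.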
When schedule or resources restrict the survey to a subset of sections within the region of interest, single-stage cluster sampling can be applied. If there is no restriction on the number of sections to be sampled, all sections can be sampled without replacement for a given sample size. Assume that the number of residential structures per section is known, but not the number of persons per section. The 20 sections will first be partitioned into the desired number of mutually-exclusive strata, using the section sizes i.
Each residence in a section will be assigned to the same stratum. For each trial of the stratified single-stage clustering protocol, one section will be selected from each stratum, and all of the residences in the selected sections will be completely sampled.
For the control case, the same number of sections will be selected, but the stratification boundaries will be ignored. In effect, in the control case, all sections will be assigned to a single stratum. Single-stage cluster sampling may also be executed without stratification, but in the simulations that follow, the uncertainty of the population estimate will be roughly doubled for the unstratified case.
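The stratified protocol and its control case can be compared with a small simulation. This is a sketch under stated assumptions: sections are grouped into strata of equal count after sorting by number of structures (the study's actual boundaries may differ), each selected section's population is scaled up by the number of sections in its stratum, and all section data are synthetic. Function names are hypothetical.

```python
import random
import statistics

def strata_by_size(structure_counts, n_strata):
    """Sort sections by structure count and split them into n_strata groups."""
    ordered = sorted(structure_counts, key=structure_counts.get)
    size = len(ordered) // n_strata
    groups = [ordered[i * size:(i + 1) * size] for i in range(n_strata - 1)]
    groups.append(ordered[(n_strata - 1) * size:])  # last group takes the remainder
    return groups

def stratified_cluster_estimate(strata, section_population, rng):
    """Select one section per stratum, count it completely, and scale up."""
    return sum(len(group) * section_population[rng.choice(group)] for group in strata)

def unstratified_cluster_estimate(sections, section_population, n_clusters, rng):
    """Control case: select n_clusters sections while ignoring the strata."""
    chosen = rng.sample(sections, n_clusters)
    return len(sections) / n_clusters * sum(section_population[s] for s in chosen)

# Synthetic data: 20 sections with known structure counts and unknown populations.
rng = random.Random(7)
structure_counts = {f"section_{i:02d}": rng.randint(20, 200) for i in range(20)}
section_population = {s: c * rng.randint(6, 10) for s, c in structure_counts.items()}

strata = strata_by_size(structure_counts, n_strata=4)
sections = list(structure_counts)
stratified = [stratified_cluster_estimate(strata, section_population, rng) for _ in range(2000)]
control = [unstratified_cluster_estimate(sections, section_population, 4, rng) for _ in range(2000)]

print("true total:", sum(section_population.values()))
print("stratified SD:", round(statistics.stdev(stratified)))
print("control SD:", round(statistics.stdev(control)))
```

Only the structure counts are used to form the strata, which matches the assumption that the number of persons per section is unknown before sampling.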
The cluster sampling protocol is appropriate when financial or schedule constraints impose limits on the number of sections to be sampled.
The advantages of stratified cluster sampling are: No auxiliary data is required other than a count of residential structures in each of the 20 sections under consideration.
Simple Random vs. Stratified Random Sample: What's the Difference?
The precision and cost of a stratified design are influenced by the way that sample elements are allocated to strata. One approach is proportionate stratification. With proportionate stratification, the sample size of each stratum is proportionate to the population size of the stratum. Strata sample sizes are determined by the following equation: n_h = (N_h / N) × n, where n_h is the sample size for stratum h, N_h is the population size of stratum h, N is the total population size, and n is the total sample size. Another approach is disproportionate stratification, which can be a better choice e. To take advantage of disproportionate stratification, researchers need to answer such questions as:
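Proportionate allocation can be sketched in a few lines: each stratum receives the total sample size multiplied by its share of the population. The rounding step (largest remainder, so the allocations sum to the requested total) is an assumption added for illustration, as are the function name and the example strata.

```python
def proportionate_allocation(stratum_sizes, sample_size):
    """Allocate sample_size across strata in proportion to stratum population."""
    population = sum(stratum_sizes.values())
    exact = {h: sample_size * N_h / population for h, N_h in stratum_sizes.items()}
    allocation = {h: int(share) for h, share in exact.items()}  # round down first
    leftover = sample_size - sum(allocation.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda h: exact[h] - allocation[h], reverse=True)
    for h in by_remainder[:leftover]:
        allocation[h] += 1
    return allocation

# Hypothetical population of 1,000 split into three strata.
sizes = {"stratum_1": 500, "stratum_2": 300, "stratum_3": 200}
print(proportionate_allocation(sizes, 100))
```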
The five forces model of analysis was developed by Michael Porter to analyze the competitive environment in which a product or company works. The threat of entry: competitors can enter from any industry, channel, function, form or marketing activity. How best can the company take care of the threat of new entrants?
Finding size of subgroups
The groups or strata are organized based on the shared characteristics or attributes of the members in the group. The process of classifying the population into groups is called stratification. Stratified random sampling is also known as quota random sampling and proportional random sampling. Stratified random sampling has numerous applications and benefits, such as studying population demographics and life expectancy. Stratified random sampling divides a population into subgroups. Random samples are taken from each of the groups or strata in proportion to their share of the population. The members in each stratum (the singular of strata) have similar attributes and characteristics.
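Drawing such a sample can be sketched as follows: group the population by a shared attribute, then take the same fraction at random from every group. The function name, the record layout, and the 10% sampling fraction are hypothetical choices made for this illustration.

```python
import random
from collections import defaultdict

def stratified_random_sample(records, key, fraction, seed=None):
    """Group records into strata by key(record), then sample each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)   # same proportion from every stratum
        sample.extend(rng.sample(members, k))  # random draw without replacement
    return sample

# Hypothetical population: 60 members of group "A" and 40 of group "B".
population = [("A", i) for i in range(60)] + [("B", i) for i in range(40)]
sample = stratified_random_sample(population, key=lambda r: r[0], fraction=0.1, seed=1)
print(len(sample), sum(1 for group, _ in sample if group == "A"))
```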
Stratified random sampling is a type of probability sampling in which a research organization divides the entire population into multiple non-overlapping, homogeneous groups (strata) and randomly chooses the final members from the various strata for research, which reduces cost and improves efficiency. Members in each of these groups should be distinct, so that every member of every group gets an equal opportunity to be selected using simple probability.
The findings from a study of young single mothers at a university can be generalised to the population of:
Question 1. A sampling frame is: (a) A summary of the various stages involved in designing a survey.
Question 2. A simple random sample is one in which: (a) From a random starting point, every nth unit from the sampling frame is selected.
Question 3. It is helpful to use a multi-stage cluster sample when: (a) The population is widely dispersed geographically.