1. Respiratory syncytial virus (RSV) is a common viral infection in young children. You can get RSV multiple times, although subsequent cases are usually less severe. With an approved RSV vaccine, we expect the number of cases to decline. With the decline in cases, we will assume that the number of times contracting RSV in children under the age of 5 follows a Poisson distribution with parameter λ = 2.6 episodes per year.
1. What is the probability of a child under 5 getting RSV 6 or more times in one year?
4.9%
• What is the probability of a child under 5 getting RSV exactly 3 times in one year?
21.76%
• What is the probability of a child under 5 getting RSV less than 3 times in one year?
51.8%
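The three Poisson answers can be checked with a short Python sketch that uses only the standard library. It sums the probability mass function P(X = k) = e^(−λ) λ^k / k! directly. The helper names `poisson_pmf` and `poisson_cdf` are illustrative and do not come from the course materials.

```python
import math

LAM = 2.6  # mean RSV episodes per year, from the problem statement


def poisson_pmf(k: int, lam: float = LAM) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_cdf(k: int, lam: float = LAM) -> float:
    """P(X <= k), obtained by summing the pmf from 0 to k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


p_six_or_more = 1 - poisson_cdf(5)   # P(X >= 6) = 1 - P(X <= 5)
p_exactly_three = poisson_pmf(3)     # P(X = 3)
p_less_than_three = poisson_cdf(2)   # P(X < 3) = P(X <= 2)

print(f"P(X >= 6) = {p_six_or_more:.4f}")
print(f"P(X = 3)  = {p_exactly_three:.4f}")
print(f"P(X < 3)  = {p_less_than_three:.4f}")
```

Note that "6 or more" becomes the complement of P(X ≤ 5), and "less than 3" stops the sum at k = 2.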
• As part of an outbreak investigation, an epidemiologist reviews information on 45 individuals who attended a church picnic and ate lunchmeat tainted with salmonella. Of these 45 individuals, 32 developed gastrointestinal illness. In general, 62% of individuals who ingest salmonella develop gastrointestinal illness. Assume a binomial distribution for the number of individuals who become ill from eating tainted lunchmeat.
1. What is the probability that 32 individuals out of 45 who ate the lunchmeat would develop gastrointestinal illness?
P(X = 32) = C(45, 32) × 0.62^32 × 0.38^(45 − 32) = 73,006,209,045 × 0.62^32 × 0.38^13 ≈ 0.0572
5.72%
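The binomial coefficient is easy to get wrong by hand. Python's `math.comb` (Python 3.8+) computes it exactly, so a sketch like the following can confirm both C(45, 32) and the resulting probability.

```python
import math

n, k, p = 45, 32, 0.62  # people who ate the lunchmeat, number ill, P(illness)

coef = math.comb(n, k)  # C(45, 32), equal to C(45, 13)
prob = coef * p**k * (1 - p) ** (n - k)

print(f"C({n}, {k}) = {coef:,}")
print(f"P(X = {k}) = {prob:.4f}")
```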
• What is the probability that 32 or more individuals out of 45 who ate the lunchmeat would develop gastrointestinal illness?
P(X ≥ 32) = 1 − P(X ≤ 31) = 1 − CDF('BINOMIAL', 31, 0.62, 45)
P(X ≥ 32) ≈ 0.1337
= 13.37%
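Because X is discrete, P(X ≥ 32) has to subtract P(X ≤ 31). Subtracting P(X ≤ 32) gives P(X ≥ 33) instead. SAS's CDF('BINOMIAL', m, p, n) returns P(X ≤ m), so the same distinction applies to its first numeric argument. This sketch computes both tails so the off-by-one difference is visible. The helper names are illustrative.

```python
import math

n, p = 45, 0.62


def binom_pmf(k: int) -> float:
    """P(X = k) for X ~ Binomial(45, 0.62)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int) -> float:
    """P(X <= k)."""
    return sum(binom_pmf(i) for i in range(k + 1))


p_ge_32 = 1 - binom_cdf(31)  # P(X >= 32): the quantity asked for
p_gt_32 = 1 - binom_cdf(32)  # P(X >= 33): what subtracting P(X <= 32) gives

print(f"P(X >= 32) = {p_ge_32:.4f}")
print(f"P(X >= 33) = {p_gt_32:.4f}")
```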

• How many individuals out of the 45 who ate the lunchmeat would we expect to develop gastrointestinal illness?
Mean = n × p = 45 × 0.62 = 27.9
We would expect 27.9, or about 28, of the 45 individuals to develop gastrointestinal illness.
• Provide a point estimate and 95% confidence interval based on the binomial distribution for our sample of 45 individuals with 32 developing gastrointestinal illness. Does this confidence interval include the population proportion? Hint: You can calculate the CI either by hand or with SAS.
Point estimate = 32/45 ≈ 0.711 or 71.11%
SE = √( (0.711 × (1 − 0.711)) / 45 ) = √(0.2054 / 45) ≈ 0.0676

CI = 0.711 ± 1.96 × 0.0676 = 0.711 ± 0.132; 0.711 − 0.132 ≈ 0.58, 0.711 + 0.132 ≈ 0.84
(0.58, 0.84)
The CI is approximately 0.58 to 0.84, so we are 95% confident that the true population proportion lies in this range. The interval includes the population proportion of 0.62.
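The hand calculation above is the normal-approximation (Wald) interval, p̂ ± z √(p̂(1 − p̂)/n). A minimal sketch of that formula, assuming z comes from the standard normal via `statistics.NormalDist`, is:

```python
import math
from statistics import NormalDist

x, n = 32, 45
p0 = 0.62                        # population proportion given in the problem
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval

p_hat = x / n
se = math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - z * se, p_hat + z * se

print(f"p_hat = {p_hat:.4f}, SE = {se:.4f}")
print(f"95% CI = ({lower:.3f}, {upper:.3f}); contains {p0}: {lower <= p0 <= upper}")
```

The standard error is about 0.068, so the interval runs from about 0.58 to 0.84 and covers 0.62.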

• Based on population-level data for adult men aged 60-79 years in the United States, systolic blood pressure (SBP) has a normal distribution with a mean of 133 mm Hg and a standard deviation of 13 mm Hg.
1. Calculate the 95th percentile of SBP in this population.
X = μ + z(0.95) × σ with z(0.95) = 1.645: X − 133 = 1.645 × 13 = 21.385; X = 133 + 21.385 = 154.385
≈ 154.4 mm Hg
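The percentile can be checked with `statistics.NormalDist.inv_cdf`. It uses the exact quantile z ≈ 1.6449 rather than the rounded table value 1.645, so it returns about 154.38 mm Hg. Both values round to 154.4 mm Hg.

```python
from statistics import NormalDist

sbp = NormalDist(mu=133, sigma=13)  # SBP for men aged 60-79, in mm Hg
p95 = sbp.inv_cdf(0.95)             # value with 95% of the distribution below it
print(f"95th percentile = {p95:.1f} mm Hg")
```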
• Normal SBP is 120 mm Hg or less. What proportion of the adult men aged 60-79 years in the US population has a normal SBP?
Z = (120 − 133) / 13 = −13 / 13 = −1
P(Z ≤ −1) = 0.1587
15.87% of the adult men aged 60-79 years in the United States have a normal SBP (120 mm Hg or less).
• What proportion of the adult men in the US population aged 60-79 are estimated to have stage one hypertension, defined as having an SBP between 130 and 139 mm Hg?
Z₁ = (130 − 133) / 13 ≈ −0.2308; Z₂ = (139 − 133) / 13 ≈ 0.4615
P(−0.2308 ≤ Z ≤ 0.4615) = Φ(0.4615) − Φ(−0.2308) ≈ 0.6778 − 0.4087 ≈ 0.269
26.9%
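Both SBP proportions are differences of the normal CDF. This sketch evaluates them with `statistics.NormalDist.cdf`, treating SBP as continuous so that the endpoints 130 and 139 need no continuity correction.

```python
from statistics import NormalDist

sbp = NormalDist(mu=133, sigma=13)

p_normal = sbp.cdf(120)                 # P(SBP <= 120)
p_stage1 = sbp.cdf(139) - sbp.cdf(130)  # P(130 <= SBP <= 139)

print(f"Normal SBP:     {p_normal:.4f}")
print(f"Stage 1 (SBP):  {p_stage1:.4f}")
```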
• Using data from a national, representative survey of the US population, we estimate the mean SBP for adult men aged 60-79 to be 128 mm Hg from a sample size of 100. Provide a point estimate and 95% confidence interval for this sample using the population standard deviation. Does this confidence interval include the population mean? Hint: calculate this by hand – Week 5 PPT
Point estimate = 128 mm Hg
Standard error = σ/√n = 13/√100 = 1.3 mm Hg
Lower limit = 128 − (1.96 × 1.3) = 128 − 2.548 ≈ 125.45 mm Hg

Upper limit = 128 + (1.96 × 1.3) = 128 + 2.548 ≈ 130.55 mm Hg

95% confidence interval for the mean SBP of adult men aged 60-79, based on the sample (n = 100) and the population standard deviation of 13 mm Hg:

(125.45, 130.55) mm Hg

No. The 95% confidence interval does not include the population mean of 133 mm Hg.
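With σ known, the interval is x̄ ± z σ/√n. This sketch reproduces it and checks whether the population mean of 133 mm Hg falls inside.

```python
import math
from statistics import NormalDist

xbar, sigma, n = 128, 13, 100  # sample mean, population SD, sample size
mu_pop = 133                   # population mean from the earlier SBP questions
z = NormalDist().inv_cdf(0.975)

se = sigma / math.sqrt(n)      # 13 / 10 = 1.3 mm Hg
lower, upper = xbar - z * se, xbar + z * se

print(f"95% CI = ({lower:.2f}, {upper:.2f}); contains {mu_pop}: {lower <= mu_pop <= upper}")
```

The upper limit of about 130.55 mm Hg lies below 133, so the population mean lies outside the interval.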

• We want to estimate the incidence rate of stomach cancer in South Dakota in adults. We take data from a random sample of 51,000 South Dakota adults over the age of 18 without stomach cancer and follow them all for six years (assume all individuals were followed for all six years). In that time, we find that 8 individuals develop stomach cancer.
1. Based on this sample, provide a point estimate and 95% confidence interval for the incidence rate of stomach cancer in South Dakota adults per 100,000 person-years
Number of new cases = 8 individuals during the six years.

Total person-years of observation in the sample = 51,000 adults × 6 years = 306,000 person-years. Point estimate = 8 cases / 306,000 person-years ≈ 0.0000261 cases per person-year

SE (normal approximation to the Poisson count) = √8 / 306,000 ≈ 0.00000924 cases per person-year; 95% CI = 0.00002614 ± 1.96 × 0.00000924 = 0.00002614 ± 0.00001812 ≈ (0.00000802, 0.00004426), then multiply by 100,000

Point estimate: the incidence rate of stomach cancer in South Dakota adults is 0.0000261 cases per person-year
= 2.61 cases of stomach cancer per 100,000 person-years

95% CI = (0.80, 4.43) cases per 100,000 person-years. We are 95% confident that the true incidence rate falls within this interval.
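The interval above treats the count of 8 as approximately normal with SE = √8. With this few events, an exact Poisson (Garwood) interval is often preferred. The sketch below computes both. The exact limits come from bisection on the Poisson CDF rather than from chi-square tables. `poisson_cdf` and `bisect` are illustrative helpers, not library functions.

```python
import math
from statistics import NormalDist

cases = 8
person_years = 51_000 * 6  # every adult followed for all six years
PER = 100_000              # express rates per 100,000 person-years
z = NormalDist().inv_cdf(0.975)

rate = cases / person_years * PER

# Normal approximation: the case count is treated as Poisson, so SE(count) = sqrt(count)
se = math.sqrt(cases) / person_years * PER
normal_ci = (rate - z * se, rate + z * se)


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu)."""
    return sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k + 1))


def bisect(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Exact limits on the expected count:
#   lower L solves P(X >= cases | L) = 0.025; upper U solves P(X <= cases | U) = 0.025
L = bisect(lambda mu: (1 - poisson_cdf(cases - 1, mu)) - 0.025, 1e-9, cases)
U = bisect(lambda mu: poisson_cdf(cases, mu) - 0.025, cases, 10 * cases)
exact_ci = (L / person_years * PER, U / person_years * PER)

print(f"Rate = {rate:.2f} per 100,000 person-years")
print(f"Normal approx. 95% CI = ({normal_ci[0]:.2f}, {normal_ci[1]:.2f})")
print(f"Exact Poisson 95% CI  = ({exact_ci[0]:.2f}, {exact_ci[1]:.2f})")
```

The exact interval, about (1.13, 5.15) per 100,000 person-years, is asymmetric around 2.61. This asymmetry is expected for a count as small as 8.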

• Data for the following questions are based on a youth survey but modified (values changed, variables added) to fit the needs of this class. The data examine BMI, time spent sitting, asthma, and overall health in children aged 4-14 years. The data are provided in Data Set Youth_data.xlsx (data dictionary in Youth_Data Dictionary.doc).
1. Explore the following variables: age, sex, race, general health rating, rural/urban status, height, BMI, sitting time after school, sitting time on weekends, poverty level, and asthma. Determine if there are any unusual values or any missing data that need to be recoded. Describe the process that you used to explore the data and what, if anything, you found and how you dealt with it. Use the one-way frequencies and the data dictionary to determine the type (continuous vs. categorical) of each variable. List how you categorized each variable. (Make sure you recode the variables before you move on to the next steps.)
• Provide tables of descriptive statistics for age, sex, race, general health rating, rural/urban status, height, BMI, sitting time after school, sitting time on weekends, poverty level, and asthma. Include the sample size, mean, standard deviation, median, minimum, and maximum values for continuous data and frequency and percentage for categorical data. Include a count of any missing data for any of the variables. (You should have one table for continuous variables and one table for categorical variables.)

Table 1: Descriptive Statistics of Continuous Variables of Youth Activity

Table 2: Descriptive Statistics of Categorical Variables in Youth Activity

• Construct and present a graph (e.g., histogram, boxplot) to examine the distribution of BMI and discuss what you see in terms of the mean, median, shape, and outliers.
• Provide a point estimate and 95% confidence interval for the mean of minutes spent sitting on a typical day after school.
Sample size = 690; mean = 108.5 minutes; standard deviation = 64.74 minutes; SE = 64.74/√690 ≈ 2.465; 95% CI = 108.5 ± 1.96 × 2.465
Point estimate: youth spend a mean of 108.5 minutes sitting on a typical day after school; 95% confidence interval (103.67, 113.33) minutes
• Provide a point estimate and 95% confidence interval for the mean of minutes spent sitting on a typical weekend day.
Sample size = 690; mean = 193.79 minutes; standard deviation = 38.5 minutes; SE = 38.5/√690 ≈ 1.466; 95% CI = 193.79 ± 1.96 × 1.466
Point estimate: youth spend a mean of 193.8 minutes sitting on a typical weekend day; 95% confidence interval (190.92, 196.66) minutes
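Both sitting-time intervals apply the same formula to summary statistics: mean ± z × SD/√n. A small helper makes that explicit. `mean_ci` is a hypothetical name, and the means, SDs, and n are the values reported above.

```python
import math
from statistics import NormalDist


def mean_ci(mean: float, sd: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Normal-based CI for a mean from summary statistics: mean ± z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


after_school = mean_ci(108.5, 64.74, 690)
weekend = mean_ci(193.79, 38.5, 690)
print(f"After school: ({after_school[0]:.2f}, {after_school[1]:.2f})")
print(f"Weekend day:  ({weekend[0]:.2f}, {weekend[1]:.2f})")
```

A t critical value (about 1.963 for 689 degrees of freedom) would widen each interval by less than 0.01 minutes. At this sample size, z and t give practically the same result.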
• Provide a point estimate and 95% confidence interval for the proportion of youth with an asthma diagnosis. Give both the normal approximation and exact confidence intervals. How do the normal approximation and exact confidence intervals compare?
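Sample size = 690; mean of the raw asthma code = 1.896; standard deviation = 0.306
The value 1.896 is the mean of the 1/2 asthma code, not a proportion. If the data dictionary codes asthma as 1 = yes and 2 = no, the proportion of youth with an asthma diagnosis is 2 − 1.896 = 0.104 (10.4%). The normal-approximation 95% CI for the code, (1.87, 1.92), then maps to (2 − 1.92, 2 − 1.87) = (0.08, 0.13) for the proportion.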
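The question asks for both a normal-approximation and an exact interval. The sketch below computes the Wald interval and the exact Clopper-Pearson interval. It inverts the two binomial tail probabilities by bisection. ASSUMPTION: asthma is coded 1 = yes and 2 = no, with no missing values. Under that assumption, the count of 72 is back-calculated from the reported mean of 1.896. It should be replaced with the actual frequency from the recoded data. The helpers `binom_cdf` and `bisect` are illustrative.

```python
import math
from statistics import NormalDist

n = 690
mean_code = 1.896               # reported mean of the raw asthma variable
# ASSUMPTION: 1 = yes, 2 = no, no missing values, so mean = 2 - proportion
x = round((2 - mean_code) * n)  # back-calculated number with asthma
p_hat = x / n
z = NormalDist().inv_cdf(0.975)

# Normal-approximation (Wald) interval
se = math.sqrt(p_hat * (1 - p_hat) / n)
wald = (p_hat - z * se, p_hat + z * se)


def binom_cdf(k: int, trials: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(trials, p); each pmf term is computed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(trials + 1) - math.lgamma(i + 1) - math.lgamma(trials - i + 1)
                   + i * math.log(p) + (trials - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total


def bisect(f, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Exact (Clopper-Pearson): lower p solves P(X >= x) = 0.025, upper p solves P(X <= x) = 0.025
exact = (
    bisect(lambda p: (1 - binom_cdf(x - 1, n, p)) - 0.025, 1e-9, p_hat),
    bisect(lambda p: binom_cdf(x, n, p) - 0.025, p_hat, 1 - 1e-9),
)

print(f"p_hat = {p_hat:.4f} ({x}/{n})")
print(f"Wald 95% CI  = ({wald[0]:.4f}, {wald[1]:.4f})")
print(f"Exact 95% CI = ({exact[0]:.4f}, {exact[1]:.4f})")
```

With roughly 72 cases among 690 children, the two intervals should be close. The exact interval is typically slightly wider and is not symmetric about p̂.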