1.

For each of the following questions you only need to write down the final answer with a brief justification for why you claim your answer is true. You don’t need to derive anything.

(a) (5 points) Let . What is the sampling distribution of ?

SOLUTION

By the definition of the Binomial Distribution

(b) (5 points) Let . What is the sampling distribution of ?

SOLUTION

By the Algebra of Poisson Random Variables

(c) (5 points) Let . What is the sampling distribution of ?

SOLUTION

By the Algebra of Normal Random Variables

(d) (5 points) Let . What is the sampling distribution of ?

Hint: Refer to Probability, Statistics & Data, Theorem 5.4, or Week-4 slides.

SOLUTION

By Theorem 5.4:

has a distribution with degrees of freedom

Therefore:

(e) (10 points) Let and . Let . What is the relationship between and such that ? If , , and , draw an illustration of these two quantities in the same plot.

SOLUTION

By the process of Standardization:

as such the relationship between and is:

When , , and , can be found as:

Below is the plot illustration of the two distributions where

  • Orange →
  • Blue →
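The standardization step can be checked numerically. This is a minimal sketch that assumes the question compares a Normal variable with mean `mu` and standard deviation `sigma` against a standard Normal; the values of `mu`, `sigma`, and `x` are hypothetical placeholders, not the ones from the problem.

```python
from statistics import NormalDist

# Hypothetical values, chosen only for illustration.
mu, sigma, x = 10.0, 2.0, 13.0

# Standardization: subtract the mean and divide by the standard deviation.
z = (x - mu) / sigma

# The two probabilities should agree if the relationship is right.
p_x = NormalDist(mu, sigma).cdf(x)   # P(X <= x) under N(mu, sigma^2)
p_z = NormalDist().cdf(z)            # P(Z <= z) under N(0, 1)

print(f"z = {z}, P(X <= x) = {p_x:.4f}, P(Z <= z) = {p_z:.4f}")
```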


2.

Bob is a budding social media influencer who is hoping to make it big in the TikTok influencer space. Suppose you work at TikTok, and you know that each of Bob’s TikToks goes viral with probability .

(a) (5 points) For a TikTok posted by Bob, let be the outcome where if viral and if not. What is the distribution of ?

SOLUTION

(b) (5 points) Bob conducts an experiment with TikToks. In words, what does measure?

SOLUTION

(c) (5 points) Write the sampling distribution of and its PMF .

SOLUTION

(d) (5 points) Write Bob’s best guess for in terms of .

SOLUTION

(e) (5 points) In words, describe the events for and for .

SOLUTION

The proportion of Bob’s TikToks that went viral in his experiment is greater than some threshold

The number of viral TikToks observed in the experiment is greater than some value

(f) (5 points) If , what is the relationship between and ?

SOLUTION

From the previous subproblems, we can find the event

Given the condition, we can compare that

For these two inequalities to describe the same event, the following relationship must hold:

(g) (5 points) Write the mathematical expression for event : Bob’s estimate is greater than 20%.

SOLUTION

(h) (5 points) Using parts (e) and (f), write the final expression for the probability that in terms of a Binomial probability.

SOLUTION


3.

Bob recognizes that his answer for Question 2 is annoying to compute since it involves sums of Binomial probabilities. So, he remembers that there was some way of using the central limit theorem to get the final answer.

(a) (5 points) What is and for in Question 2?

SOLUTION

(b) (5 points) Using HW-3 properties, what is and ?

SOLUTION

(c) (5 points) Using the CLT, let . What is the approximate distribution of ?

SOLUTION

Since follows the CLT’s equation where:

has an approximate distribution of

(d) (5 points) For , what is the relationship between and such that ?

SOLUTION

From the previous subproblems, we can find the event

Given the condition , we can compare that

For these two inequalities to describe the same event, the following relationship must hold:

(e) (5 points) Express the probability of event from Question 2(g) using the standard normal CDF .

SOLUTION

From 2g:

With the calculated :
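To see how close the CLT answer is to the exact Binomial sum from Question 2(h), the two can be compared directly. This is a minimal sketch; the number of TikToks `n` and the viral probability `p` are hypothetical placeholders, and the threshold is written as a count `k` so that `k / n` equals 20%.

```python
from math import comb, sqrt
from statistics import NormalDist

# Hypothetical values, chosen only for illustration.
n, p = 100, 0.25
k = 20            # count threshold, so that c = k / n = 0.20

# Exact: P(S_n > k) as a sum of Binomial probabilities.
exact = sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1, n + 1))

# CLT: standardize the threshold c and use the standard normal CDF.
c = k / n
z = (c - p) / sqrt(p * (1 - p) / n)
approx = 1 - NormalDist().cdf(z)

print(f"exact = {exact:.4f}, CLT approximation = {approx:.4f}")
```

No continuity correction is applied here, so the two numbers are expected to differ slightly.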


4.

A survey was conducted on n = 200 participants from the United States asking: “Who did you vote for in the 2024 elections?”. A summary of the responses is below:

Response      Respondents
Red Party     85
Blue Party    115

(a) (5 points) Let be a placeholder for whether a participant votes Blue. What is the distribution for ?

SOLUTION

Since represents the response of an individual participant (a single trial):

(b) (5 points) What is the population parameter of interest? Interpret its meaning.

SOLUTION

The population parameter of interest is : the true proportion of all voters in the US who vote Blue

(c) (5 points) What is the best guess based on the data? Is this a statistic or a parameter?

SOLUTION

Since this is a value calculated from sample data, this is a statistic

(d) (5 points) Write the sampling distribution for .

SOLUTION

(e) (5 points) Construct a 95% confidence interval for .

SOLUTION

Given a interval, the value for and are:

Finding Standard Error:

Now having all the information needed, find the Confidence Interval:
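As a check on the arithmetic in this part, here is a minimal sketch that assumes the usual normal-approximation interval for a proportion. The counts come from the table above; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 200       # survey size from the problem statement
blue = 115    # respondents who voted Blue

p_hat = blue / n                        # sample proportion
se = sqrt(p_hat * (1 - p_hat) / n)      # estimated standard error
z = NormalDist().inv_cdf(0.975)         # upper 2.5% quantile of N(0, 1)

lower, upper = p_hat - z * se, p_hat + z * se

print(f"p_hat = {p_hat:.3f}, SE = {se:.4f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```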

(f) (5 points) Provide a brief interpretation of the interval from part (e).

SOLUTION

We are 95% confident that the true proportion of US voters who voted for the Blue Party lies between and

(g) (5 points) A critic says this interval is only valid if is Normal or approximately Normal. Is this true for this data? Explain.

SOLUTION

The critic is right that this interval is valid only when is approximately Normal, since its construction relies on the sampling distribution being bell-shaped.

The condition holds for this data: the counts for both the Blue Party (115) and the Red Party (85) are large enough for the Normal approximation to be reasonable.

(h) (5 points) A headline says “The Blue party will win the 2024 elections!”. Does this align with your findings? Rephrase it better.

SOLUTION

This headline overstates the finding: the lower bound of the interval is , which still leaves a reasonable chance that the election ends in a tie or a Red Party victory.

A better way to phrase this finding is:

“The Blue party is currently favored to win, but the results remain close”


5.

A survey was conducted on participants from UCSD asking: “Do you think Artificial Intelligence (AI) is going to replace our jobs?”. In addition to the responses to this question, the participants were also asked what their broad major was. The breakdown of the responses by major is below:

Major \ Response         Yes   No
Science & Engineering    80    40
Arts & Humanities        50    30

We are interested in constructing a confidence interval for the difference in opinions about AI for Science & Engineering majors vis-à-vis Arts & Humanities majors. Let be the response of a randomly chosen student with a Science & Engineering major, and let be the response of a randomly chosen Arts & Humanities major with population parameters and respectively.

(a) (5 points) What is an appropriate distribution for and ?

SOLUTION

(b) (5 points) Let . Interpret this in context.

SOLUTION

: the true proportion of Science & Engineering students answering “Yes”

: the true proportion of Arts & Humanities students answering “Yes”

As such, the expression represents the true difference between the two groups of majors in the proportion of students who believe that AI is going to replace our jobs

(c) (5 points) Write the expression for and compute its value.

SOLUTION

(d) (5 points) Compute , the standard error for the estimator.

SOLUTION

(e) (5 points) Write the expression for a confidence interval for in terms of .

SOLUTION

(f) (5 points) Compute for using a web applet (include screenshot).

SOLUTION

When

Based on the web app, the value for is given

(g) (5 points) Write the final 90% confidence interval for .

SOLUTION

(h) (5 points) Is there sufficient evidence of a difference in perceptions between the groups? Why/why not?

SOLUTION

There is not sufficient evidence of a difference in perceptions between the two groups: the confidence interval for the difference includes 0, so no difference at all remains a plausible value.
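The computations in parts (c) through (g) can be checked with a short script. This is a minimal sketch that assumes the difference is taken as Science & Engineering minus Arts & Humanities and uses the unpooled standard error; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

yes_x, n_x = 80, 120   # Science & Engineering: 80 "Yes" out of 80 + 40
yes_y, n_y = 50, 80    # Arts & Humanities: 50 "Yes" out of 50 + 30

p_x, p_y = yes_x / n_x, yes_y / n_y
diff = p_x - p_y                                   # point estimate of the difference

# Unpooled standard error of the difference in sample proportions.
se = sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)

z = NormalDist().inv_cdf(0.95)                     # 90% interval: upper 5% quantile
lower, upper = diff - z * se, diff + z * se
contains_zero = lower <= 0 <= upper

print(f"diff = {diff:.4f}, SE = {se:.4f}, z = {z:.3f}")
print(f"90% CI: ({lower:.3f}, {upper:.3f}), contains 0: {contains_zero}")
```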


6.

You work at a reputable polling agency covering the elections, and have recently conducted a survey with participants. You collect their responses and compute . Based on your analysis, you find that the margin of error at the confidence level is . You report these results to your boss, who says that the margin of error is too high, and that the margin of error should be less than at the confidence level. Assuming your estimate of remains the same as your original survey, what is the minimum number of participants you need in your new survey to satisfy your boss’s requirements? Provide screenshots justifying the use of any quantiles you use in this problem.

SOLUTION

With a confidence, the

Find the current

With this information, we can find the given a confidence interval to find :

when at confidence
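The sample-size step can be sketched as a small helper. This assumes the usual normal-approximation margin of error for a proportion; the function name and the example inputs (`p_hat=0.5`, a 3% margin, 95% confidence) are hypothetical and are not the values from the problem.

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(p_hat, target_moe, conf_level):
    """Smallest n with z * sqrt(p_hat * (1 - p_hat) / n) <= target_moe."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    # Solve the margin-of-error formula for n, then round up to a whole person.
    return ceil(z**2 * p_hat * (1 - p_hat) / target_moe**2)

# Hypothetical inputs, chosen only for illustration.
n_needed = min_sample_size(p_hat=0.5, target_moe=0.03, conf_level=0.95)
print(n_needed)
```

If the unrounded value happens to be a whole number, add one more participant to make the margin strictly smaller than the target.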


7.

An confidence interval for a population mean, , is given as . This confidence interval is based on a simple random sample of observations. Calculate the sample mean and standard deviation which reproduce this confidence interval. Assume that all conditions necessary for inference are satisfied, and use Student’s t-distribution wherever needed. Provide a screenshot justifying the use of any quantiles you use in this problem.

SOLUTION

Finding

Finding

at confidence

For a t-distribution, the Margin of Error is:
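Working backwards from the interval can be sketched as follows. The helper name and the example inputs are hypothetical, and the t quantile is passed in as `t_star` because it is read off a table or applet rather than computed here.

```python
from math import sqrt

def recover_mean_and_sd(lower, upper, n, t_star):
    """Invert x_bar +/- t_star * s / sqrt(n) to get (x_bar, s)."""
    x_bar = (lower + upper) / 2        # the interval is centered at the sample mean
    moe = (upper - lower) / 2          # half-width is the margin of error
    s = moe * sqrt(n) / t_star         # solve moe = t_star * s / sqrt(n) for s
    return x_bar, s

# Hypothetical inputs: a 95% interval from n = 36 observations,
# with t_star = 2.030 taken as the table value for 35 degrees of freedom.
x_bar, s = recover_mean_and_sd(lower=18.985, upper=21.015, n=36, t_star=2.030)
print(f"x_bar = {x_bar:.3f}, s = {s:.3f}")
```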


8.

The table below summarizes an experiment to answer this question. A total of 20 participants were recruited for the experiments and randomly assigned to two groups of size and . In the first group (), the participants were required to not use any electronic devices for at least an hour before they go to sleep. In the second group (), the participants were asked to doomscroll their favorite social-media platform before falling asleep. A wearable health device collected their sleep quality scores, and the sleep quality data is summarized in the table below:

Group   n    Sample mean   Sample SD
X       12   70.0          5.0
Y       8    60.0          10.0

(a) (5 points) Let . Interpret this in context.

SOLUTION

represents the true mean sleep quality score of group X

represents the true mean sleep quality score of group Y

As such, the expression represents the true difference in mean sleep quality scores between group and

(b) (5 points) What does 70.0 represent? Is it a parameter or statistic?

SOLUTION

The value 70.0 represents the sample mean sleep quality score of group X

Since this value is computed from the observed sample, it is a statistic

(c) (5 points) Write the expression for and compute it.

SOLUTION

(d) (5 points) Write the expression for the confidence interval for in terms of , , and . Compute the degrees of freedom .

SOLUTION

Confidence Interval:

Degree of Freedom:

(e) (5 points) If , what is ? Include a screenshot.

SOLUTION

Based on this, when

(f) (5 points) Compute the 95% confidence interval for .

SOLUTION

(g) (5 points) Interpret the interval. Is there sufficient evidence that doomscrolling impacts sleep scores?

SOLUTION

With 95% confidence, the true difference in mean sleep quality scores between those who avoid electronics and those who doomscroll falls between and points

There is sufficient evidence that doomscrolling impacts sleep scores, since the confidence interval does not contain 0; this suggests that sleep quality is significantly better when electronics are avoided before sleep.
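The pieces of this question can be checked with a short script. This is a minimal sketch using the summary table; it prints both the conservative degrees of freedom, min(n1, n2) - 1, and the Welch approximation, since either convention may be intended. The quantile `t_star = 2.365` is an assumed table value for 7 degrees of freedom at 95%.

```python
from math import sqrt

n_x, mean_x, sd_x = 12, 70.0, 5.0    # group X: no electronics before sleep
n_y, mean_y, sd_y = 8, 60.0, 10.0    # group Y: doomscrolling before sleep

diff = mean_x - mean_y                            # point estimate
v_x, v_y = sd_x**2 / n_x, sd_y**2 / n_y           # per-group variance of the mean
se = sqrt(v_x + v_y)                              # standard error of the difference

df_conservative = min(n_x, n_y) - 1
df_welch = (v_x + v_y)**2 / (v_x**2 / (n_x - 1) + v_y**2 / (n_y - 1))

t_star = 2.365   # assumed table value for 7 degrees of freedom, 95% interval
lower, upper = diff - t_star * se, diff + t_star * se

print(f"diff = {diff}, SE = {se:.4f}")
print(f"df (conservative) = {df_conservative}, df (Welch) = {df_welch:.2f}")
print(f"95% CI with df = 7: ({lower:.2f}, {upper:.2f})")
```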