October 18, 2024

Probability and Statistics Q&A

Mathematical Foundation for Computer Science 1

Q. A) It is required to test the hypothesis that the average height of men is 175 cm. For this, a random sample of 50 men is considered. The mean and standard deviation of their heights are found to be 175 cm and 3.0 cm. Based on this data, what would you conclude? (Use a 2% level of significance.)

Answer.

To test the hypothesis that the average height of men is 175 cm, we can use a one-sample t-test. The null hypothesis is that the mean height of men is equal to 175 cm, and the alternative hypothesis is that it is not equal to 175 cm.

Let’s define:
H0: μ = 175 (null hypothesis)
Ha: μ ≠ 175 (alternative hypothesis)
where μ is the population mean height of men.

We can use a t-test to determine if the sample mean height of 175 cm is significantly different from the hypothesized population mean of 175 cm.

Using a two-tailed t-test with a significance level of 0.02 and a sample size of 50, we can calculate the t-value as follows:
t = (x̄ – μ) / (s / √n)

where x̄ is the sample mean height, μ is the hypothesized population mean height, s is the sample standard deviation, and n is the sample size.

Substituting the values we get:
t = (175 – 175) / (3 / √50) = 0

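The calculated t-value is 0 because the sample mean coincides exactly with the hypothesized mean. For a two-tailed test at the 2% level of significance with n – 1 = 49 degrees of freedom, the critical value is approximately ±2.40 (the normal approximation gives ±2.33). Since |t| = 0 is far below the critical value, the sample mean height is not significantly different from the hypothesized population mean height of 175 cm.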

Therefore, we fail to reject the null hypothesis and conclude that there is not enough evidence to suggest that the average height of men is different from 175 cm based on this sample data.
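The calculation above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name one_sample_t is chosen here for illustration, and the critical value uses the large-sample normal approximation rather than an exact t table (an exact t quantile would need a table or a statistics library).

```python
import math
from statistics import NormalDist

def one_sample_t(sample_mean, mu0, sample_sd, n):
    """Return the one-sample t statistic (x_bar - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))

# Values from the question: x_bar = 175, mu0 = 175, s = 3.0, n = 50.
t_stat = one_sample_t(175.0, 175.0, 3.0, 50)

# Assumption: with n = 50 the normal quantile is used as an approximation
# to the two-tailed 2% critical value (upper 1% point of N(0, 1)).
z_crit = NormalDist().inv_cdf(1 - 0.02 / 2)

reject_h0 = abs(t_stat) > z_crit
print(t_stat, round(z_crit, 3), reject_h0)
```

Because the statistic is exactly 0, the decision does not depend on whether the normal or the t critical value is used.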


Q. B) A manufacturer claims that any of his lots of items cannot have a variance of more than 1 cm². A sample of 25 items has a variance of 1.2 cm². Test whether the manufacturer’s claim is correct at the 5% level of significance.

Answer.

To test the manufacturer’s claim that the variance of his lot of items cannot be more than 1 cm², we can use a chi-square test for a single variance (not a goodness-of-fit test, which compares observed and expected frequencies). The null hypothesis is that the variance of the lot of items is equal to or less than 1 cm², and the alternative hypothesis is that it is greater than 1 cm².

Let’s define:
H0: σ² ≤ 1 (null hypothesis)
Ha: σ² > 1 (alternative hypothesis)

where σ² is the population variance of the lot of items.

We can use the chi-square test for a single variance to determine if the sample variance of 1.2 cm² is significantly greater than the hypothesized population variance of 1 cm².

Using a right-tailed chi-square test with a significance level of 0.05 and a sample size of 25 (n – 1 = 24 degrees of freedom), we can calculate the chi-square statistic as follows:
χ² = (n – 1) * s² / σ²

where n is the sample size, s² is the sample variance, and σ² is the hypothesized population variance.

Substituting the values we get:
χ² = (25 – 1) * 1.2 / 1 = 28.8

The calculated chi-square value is 28.8.

The critical value of chi-square for a right-tailed test with 24 degrees of freedom at a significance level of 0.05 is 36.42.

Since the calculated chi-square value of 28.8 is less than the critical value of 36.42, we fail to reject the null hypothesis and conclude that there is not enough evidence to suggest that the population variance of the lot of items is greater than 1 cm² at the 5% level of significance.

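Therefore, the sample data do not contradict the manufacturer’s claim that the variance of his lot of items cannot be more than 1 cm². Failing to reject the null hypothesis does not prove the claim; it only means that this sample gives no significant evidence against it.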
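As a quick arithmetic check, the test statistic for the variance can be computed directly. This is a small sketch using plain Python; the function name and the constant holding the critical value are illustrative, and the critical value 36.415 is hard-coded from a chi-square table (24 degrees of freedom, upper 5% point) rather than computed.

```python
def chi_square_variance_stat(n, sample_var, sigma0_sq):
    """Return (n - 1) * s^2 / sigma0^2 for a single-variance chi-square test."""
    return (n - 1) * sample_var / sigma0_sq

# Assumption: table value for the upper 5% point with 24 degrees of freedom.
CHI2_CRIT_24_DF_5PCT = 36.415

# Values from the question: n = 25, s^2 = 1.2, hypothesized variance = 1.
chi2_stat = chi_square_variance_stat(25, 1.2, 1.0)
reject_h0 = chi2_stat > CHI2_CRIT_24_DF_5PCT

print(round(chi2_stat, 2), reject_h0)
```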




Consider the following joint probability distribution of X and Y. Find the following:

i) Marginal distribution of X

ii) Marginal distribution of Y

iii) Can we conclude that X and Y are independent? Validate your answer.

           X = 1    X = 2    X = 3
Y = -2     1/15     2/15     1/15
Y = -1     3/15     2/15     1/15
Y =  0     2/15     1/15     2/15

Answer.

 

To find the marginal distributions, we sum the joint probabilities over the other variable: down each column (over all values of Y) for X, and across each row (over all values of X) for Y.

  i) Marginal distribution of X:
  • P(X=1) = 1/15 + 3/15 + 2/15 = 6/15 = 2/5
  • P(X=2) = 2/15 + 2/15 + 1/15 = 5/15 = 1/3
  • P(X=3) = 1/15 + 1/15 + 2/15 = 4/15

Therefore, the marginal distribution of X is:

X 1 2 3
P(X) 2/5 1/3 4/15

ii) Marginal distribution of Y:

  • P(Y=-2) = 1/15 + 2/15 + 1/15 = 4/15
  • P(Y=-1) = 3/15 + 2/15 + 1/15 = 6/15 = 2/5
  • P(Y=0) = 2/15 + 1/15 + 2/15 = 5/15 = 1/3

Therefore, the marginal distribution of Y is:

Y -2 -1 0
P(Y) 4/15 2/5 1/3

iii) To check if X and Y are independent, we need to verify if the joint distribution of X and Y is equal to the product of their marginal distributions. In other words, we need to check if P(X=x, Y=y) = P(X=x) * P(Y=y) for all values of x and y.

Let’s check this condition for some values of x and y:

  • P(X=1) = 2/5, P(Y=0) = 1/3, and P(X=1, Y=0) = 2/15. The product is P(X=1) * P(Y=0) = (2/5) * (1/3) = 2/15, so the condition holds for this pair.
  • P(X=2) = 1/3, P(Y=-2) = 4/15, and P(X=2, Y=-2) = 2/15. The product is P(X=2) * P(Y=-2) = (1/3) * (4/15) = 4/45, whereas 2/15 = 6/45, so the condition fails for this pair.

Independence requires the condition to hold for every pair (x, y). Since it fails for at least one pair, we can conclude that X and Y are not independent.
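The marginals and the independence check can also be verified with exact fractions. This is a minimal sketch using Python's fractions module; the variable names (counts, joint, p_x, p_y, violations) are chosen here for illustration, and the table entries are taken from the question.

```python
from fractions import Fraction as F

xs = [1, 2, 3]
ys = [-2, -1, 0]

# Numerators of the joint table in fifteenths: one row per y, one entry per x.
counts = {
    -2: [1, 2, 1],
    -1: [3, 2, 1],
     0: [2, 1, 2],
}
joint = {(x, y): F(c, 15) for y, row in counts.items() for x, c in zip(xs, row)}

# Marginals: sum the joint probabilities over the other variable.
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# Independence needs P(X=x, Y=y) == P(X=x) * P(Y=y) for every cell.
violations = [(x, y) for x in xs for y in ys
              if joint[(x, y)] != p_x[x] * p_y[y]]
independent = not violations

print(p_x, p_y, independent)
```

Using exact fractions avoids floating-point rounding when comparing a joint probability with the product of the marginals.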

 

Q. b) Cricketers are classified as A, B and C according to whether their average runs scored earlier is under 75, between 75 and 80, or over 80. Assuming the distribution is normal, find approximately the mean and S.D. of the average runs if A are 58%, B are 38% and C are 4%.

Answer.

Let X be the average runs scored, with X normally distributed with mean μ and standard deviation σ. The class percentages give two conditions on the normal curve:

  • P(X < 75) = 0.58 (class A)
  • P(X > 80) = 0.04 (class C), so P(X < 80) = 0.96

The remaining 38% (class B) lies between 75 and 80, which is consistent with these two conditions.

From the standard normal table, the z-value with area 0.58 to its left is approximately 0.20, and the z-value with area 0.96 to its left is approximately 1.75. Standardizing the two class boundaries gives:

(75 – μ) / σ = 0.20
(80 – μ) / σ = 1.75

Subtracting the first equation from the second:

5 / σ = 1.55
σ = 5 / 1.55 ≈ 3.23

Substituting back into the first equation:

μ = 75 – 0.20 * 3.23 ≈ 74.35

Therefore, the mean is approximately 74.35 and the standard deviation is approximately 3.23. These values are approximate because the z-values are rounded table values.
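One way to estimate the mean and S.D. from the class percentages is to convert the two class boundaries into standard normal quantiles and solve the two resulting linear equations. The sketch below does this with statistics.NormalDist from the Python standard library; the variable names are illustrative, and the only inputs are the boundaries 75 and 80 and the shares 58% and 4% from the question.

```python
from statistics import NormalDist

std_normal = NormalDist()

# P(X < 75) = 0.58 and P(X < 80) = 1 - 0.04 = 0.96.
z_75 = std_normal.inv_cdf(0.58)
z_80 = std_normal.inv_cdf(0.96)

# (75 - mu) / sigma = z_75 and (80 - mu) / sigma = z_80.
sigma = (80 - 75) / (z_80 - z_75)
mu = 75 - z_75 * sigma

# Check: the fitted normal should reproduce the three class shares.
fitted = NormalDist(mu, sigma)
share_a = fitted.cdf(75)
share_c = 1 - fitted.cdf(80)
share_b = 1 - share_a - share_c

print(round(mu, 2), round(sigma, 2))
```

With unrounded quantiles this gives a mean of about 74.35 and a standard deviation of about 3.23.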




Q. A) Consider the following data.

CGPA                6      6.5    7      7.5    8
Salary (in lakh)    0.8    0.9    1      1.5    3

Using linear regression, estimate the salary when CGPA is 8.5.

Answer.

To estimate the salary when CGPA is 8.5, we can use linear regression analysis on the given data. We will assume a linear relationship between CGPA and salary and use the method of least squares to find the regression line that best fits the data.

First, we need to calculate the slope and intercept of the regression line:

slope = cov(CGPA, salary) / var(CGPA)
intercept = mean(salary) – slope * mean(CGPA)

where cov is the covariance, var is the variance, and mean is the mean of the respective variables.

Using the given data, we can calculate:

mean(CGPA) = (6 + 6.5 + 7 + 7.5 + 8) / 5 = 7
mean(salary) = (0.8 + 0.9 + 1 + 1.5 + 3) / 5 = 1.44
var(CGPA) = ((6-7)^2 + (6.5-7)^2 + (7-7)^2 + (7.5-7)^2 + (8-7)^2) / 4 = 2.5 / 4 = 0.625
cov(CGPA, salary) = ((6-7)(0.8-1.44) + (6.5-7)(0.9-1.44) + (7-7)(1-1.44) + (7.5-7)(1.5-1.44) + (8-7)(3-1.44)) / 4 = (0.64 + 0.27 + 0 + 0.03 + 1.56) / 4 = 2.5 / 4 = 0.625

Using these values, we can calculate the slope and intercept as:

slope = 0.625 / 0.625 = 1
intercept = 1.44 – 1 * 7 = -5.56

Therefore, the equation of the regression line is:

salary = -5.56 + 1 * CGPA

To estimate the salary when CGPA is 8.5, we can substitute this value into the regression equation:

salary = -5.56 + 1 * 8.5 = 2.94

Therefore, the estimated salary when CGPA is 8.5 is 2.94 lakhs. Since 8.5 lies outside the observed CGPA range of 6 to 8, this estimate is an extrapolation and should be treated with caution.
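The least-squares fit can be checked with a short script. This is a minimal sketch in plain Python; the helper name fit_line is chosen here for illustration. It uses the sums of squares and cross-products directly, which gives the same slope as cov / var because the common divisor n – 1 cancels.

```python
cgpa = [6, 6.5, 7, 7.5, 8]
salary = [0.8, 0.9, 1, 1.5, 3]   # in lakhs, from the question

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(cgpa, salary)
predicted = intercept + slope * 8.5

print(round(slope, 3), round(intercept, 3), round(predicted, 3))
```

For this data the sums are sxy = 2.5 and sxx = 2.5, so the slope is 1, the intercept is -5.56, and the prediction at CGPA 8.5 is 2.94 lakhs.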


Q. (B). Use an appropriate statistical technique to understand the relation between the following variables Comment on the relation between

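Student: A, B, C, D, E, F, G

Academic performance: 60, 55, 70, 40, 90, 60

IQ score: 120, 170, 180, 120, 145, 140, 160

(Only six academic performance values are listed for the seven students; which student's value is missing is not stated.)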

Answer.

To understand the relationship between academic performance and IQ score, we can perform a correlation analysis. “Student” is only a label identifying each observation, not a numeric variable, so correlating it with the other two columns is not meaningful.

The Pearson correlation coefficient between academic performance and IQ score is:
r = cov(Performance Academic, IQ.Score) / (std(Performance Academic) * std(IQ.Score))
where cov is the covariance and std is the standard deviation of the respective variables.

The value reported for this data is r = 0.84. It cannot be re-derived from the values listed in the question, because one academic performance value is missing.

Taking r = 0.84 as given, there is a strong positive correlation between “Performance Academic” and “IQ.Score”: students with higher IQ scores tend to have higher academic performance. Correlation does not necessarily imply causation, and with only seven observations the estimate is imprecise; other factors could be influencing the relationship.
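A correlation coefficient can only be computed from paired observations, which a small helper makes explicit. The sketch below is plain Python; pearson_r is an illustrative name, and the two lists are copied exactly as they appear in the question (six performance values, seven IQ values), so the call is expected to be refused rather than to return a value.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two paired samples."""
    if len(xs) != len(ys):
        raise ValueError("both variables need one value per observation")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

performance = [60, 55, 70, 40, 90, 60]            # six values, as listed
iq_score = [120, 170, 180, 120, 145, 140, 160]    # seven values, as listed

try:
    r = pearson_r(performance, iq_score)
except ValueError as err:
    r = None
    print("cannot compute r:", err)
```

Once the missing performance value is known, the same function can be called on the two complete seven-element lists.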


Q. (B) Let X and Y be two continuous random variables with joint density function

f(x,y) = ky + 2/9,   0 < x < 3, 0 < y < 3
f(x,y) = 0,          otherwise

where k is a constant.

a. Find the value of k.

b. Find f(x) and f(y).

c. Check whether X and Y are independent.

Answer.

To find the value of k, we use the fact that the joint density function must integrate to 1 over the entire range of x and y. That is,

∫∫ f(x,y) dx dy = 1

Using the given joint density function, we have:

∫∫ f(x,y) dx dy = ∫0^3 ∫0^3 (ky + 2/9) dx dy
= ∫0^3 [kyx + (2/9)x] from x=0 to x=3 dy
= ∫0^3 [3ky + 2/3] dy
= [(3/2)ky^2 + (2/3)y] from y=0 to y=3
= (27/2)k + 2

Setting this equal to 1 and solving for k, we get:

(27/2)k + 2 = 1
k = (2/27)(1 – 2)
k = -2/27

Therefore, the value of k is -2/27. With this value, f(x,y) = (2/9)(1 – y/3), which is non-negative for 0 < y < 3, so it is a valid density.

To find the marginal density function f(x), we integrate the joint density function over all possible values of y. That is,

f(x) = ∫ f(x,y) dy

Using the given joint density function, we have:

f(x) = ∫0^3 (ky + 2/9) dy
= [(1/2)ky^2 + (2/9)y] from y=0 to y=3
= (9/2)k + 2/3
= -1/3 + 2/3
= 1/3, for 0 < x < 3

Similarly, the marginal density function f(y) is obtained by integrating over all possible values of x:

f(y) = ∫0^3 (ky + 2/9) dx
= 3ky + 2/3
= 2/3 – (2/9)y, for 0 < y < 3

To check if X and Y are independent, we need to check if the joint density function can be expressed as the product of the marginal density functions, that is,

f(x,y) = f(x) * f(y)

Using the marginal density functions found in part (b), we have:

f(x) * f(y) = (1/3) * [2/3 – (2/9)y]
= 2/9 – (2/27)y
= ky + 2/9, with k = -2/27

This is exactly the given joint density function for 0 < x < 3, 0 < y < 3. Therefore, X and Y are independent.
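The normalisation constant, the marginals, and the factorisation condition can be checked with exact arithmetic. This is a minimal sketch using Python's fractions module; the function names joint, f_x and f_y are illustrative, and the closed-form integrals in the comments are worked out by hand for the density ky + 2/9 on the square 0 < x < 3, 0 < y < 3.

```python
from fractions import Fraction as F

# Integrating ky + 2/9 over the 3-by-3 square gives (27/2)k + 2.
# Setting that equal to 1 and solving for k:
k = (1 - 2) / F(27, 2)

def joint(x, y):
    """Joint density on 0 < x < 3, 0 < y < 3 (it does not depend on x)."""
    return k * y + F(2, 9)

def f_x(x):
    """Marginal of X: integral of ky + 2/9 over y in (0, 3) = (9/2)k + 2/3."""
    return F(9, 2) * k + F(2, 3)

def f_y(y):
    """Marginal of Y: integral of ky + 2/9 over x in (0, 3) = 3ky + 2/3."""
    return 3 * k * y + F(2, 3)

# Compare the joint density with the product of marginals on a grid of
# interior points (0.25, 0.5, ..., 2.75 in each coordinate).
grid = [F(i, 4) for i in range(1, 12)]
factorises = all(joint(x, y) == f_x(x) * f_y(y) for x in grid for y in grid)

print(k, f_x(1), factorises)
```

A grid check is not a proof in general, but here both sides are linear in y and constant in x, so agreement at more than two values of y confirms that the factorisation holds everywhere on the square.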




Q. (A) A sample of size 400 was drawn and the sample mean was found to be 99. Test whether this sample could have come from a normal population with mean 100 and standard deviation 8 at the 5% level of significance.

Answer.

To test whether the sample mean of 99 could have come from a normal population with mean 100 and standard deviation 8, we can use a one-sample z-test, because the population standard deviation is given.

The null hypothesis is that the sample was drawn from a population with mean 100:

H0: µ = 100

The alternative hypothesis is that the population mean differs from 100:

Ha: µ ≠ 100

We will use a 5% level of significance, which means that we will reject the null hypothesis if |z| exceeds the two-tailed critical value of 1.96 (equivalently, if the p-value is less than 0.05).

The formula for the z-statistic is:

z = (x̄ – µ) / (σ / √n)

where x̄ is the sample mean, µ is the hypothesized population mean, σ is the population standard deviation, and n is the sample size.

In this case, x̄ = 99, µ = 100, σ = 8, and n = 400. Plugging these values into the formula, we get:

z = (99 – 100) / (8 / √400) = -1 / 0.4 = -2.5

Since |z| = 2.5 is greater than 1.96, we reject the null hypothesis. The corresponding two-tailed p-value is approximately 0.012, which is less than 0.05.

This means that there is sufficient evidence, at the 5% level of significance, that the sample did not come from a population with mean 100 and standard deviation 8.
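Because the question specifies the population standard deviation (8), the test statistic can be referred to the standard normal distribution. The sketch below computes the statistic, a two-tailed p-value, and the 5% critical value with statistics.NormalDist from the Python standard library; the two-tailed alternative is an assumption about how "could have come from" should be read, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

sample_mean, mu0, sigma, n = 99, 100, 8, 400   # values from the question

# z = (x_bar - mu0) / (sigma / sqrt(n))
z = (sample_mean - mu0) / (sigma / math.sqrt(n))

std_normal = NormalDist()
p_two_tailed = 2 * std_normal.cdf(-abs(z))   # area in both tails
z_crit = std_normal.inv_cdf(1 - 0.05 / 2)    # about 1.96

reject_h0 = abs(z) > z_crit
print(z, round(p_two_tailed, 4), round(z_crit, 2), reject_h0)
```

A one-tailed test (alternative µ < 100) would halve the p-value to about 0.006 and lead to the same decision.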
