pystats Tutorial
Welcome to the pystats package documentation! The package offers a suite of functions to simulate, analyze, and interpret normally distributed data, with functionality similar to R’s rnorm, pnorm, dnorm, and qnorm functions.
We will illustrate the four functions in pystats with a real-life example featuring a high school teacher named Mr. Gittu, who would like to streamline the process of predicting and evaluating test scores. Our package can help him with this!
Mr. Gittu’s Motivation
Mr. Gittu is an experienced computer science teacher at Hogwarts School of Data Science. He will be administering a standardized test, and he knows that the test scores of the class he teaches usually follow a normal distribution with:
A class average of 70%
A standard deviation of 10%
He plans to use pystats both to anticipate the results before the test and to evaluate them afterwards.
1) Simulating a Test Score Distribution with rnorm
Before administering the test, Mr. Gittu would like to simulate 40 scores so he can understand what results to expect from his class. Using rnorm, Mr. Gittu generates a sample of 40 test scores with a mean of 70 and a standard deviation of 10.
from pystats.rnorm import rnorm
# Simulating test scores with n=40, mean=70, and sd=10
simulated_scores = rnorm(n=40, mean=70, sd=10)
print(simulated_scores.round(3))
[59.779 64.988 56.855 78.97 64.681 62.019 56.88 65.127 76.27 73.835
75.903 72.452 61.298 63.411 63.457 66.483 70.928 72.076 76.059 68.037
71.949 80.152 77.864 65.557 62.214 73.207 70.391 63.205 76.003 79.577
93.796 75.118 54.877 68.877 68.325 89.035 56.1 63.115 68.124 82.298]
The rnorm function helped Mr. Gittu generate a simulated dataset. This gives him a realistic picture of the scores he might see on the next test he administers to his class of 40 students.
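Because rnorm draws values at random, the scores above will generally differ from run to run. As an independent cross-check that does not rely on pystats, the minimal sketch below uses Python’s standard-library `statistics.NormalDist` (not part of pystats) to draw a reproducible sample of 40 scores from the same distribution and summarize it. The seed value 2024 is an arbitrary choice for illustration.

```python
from statistics import NormalDist, mean, stdev

# Same distribution Mr. Gittu assumes: mean 70, standard deviation 10.
score_dist = NormalDist(mu=70, sigma=10)

# Draw 40 scores; the (arbitrary) seed makes the draw reproducible.
sample = score_dist.samples(40, seed=2024)

# Summarize the simulated class; the mean and sd should land near 70 and 10.
print(f"sample mean: {mean(sample):.2f}")
print(f"sample sd:   {stdev(sample):.2f}")
print(f"lowest: {min(sample):.1f}, highest: {max(sample):.1f}")
```

With only 40 students, the sample mean and standard deviation will wobble around 70 and 10 rather than match them exactly, which is part of what a simulation like this helps Mr. Gittu anticipate.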
2) Getting Expected Proportions with pnorm
Mr. Gittu wants all of his students to pass. Unfortunately, there is always a risk of students failing, even when he teaches to the best of his ability. He wants to assess the risk that a randomly selected student, drawn from the normal distribution defined above, fails (i.e., scores below 50%). Using pnorm, he can get the expected proportion of students scoring below 50% on his test.
from pystats.pnorm import pnorm
# Predicting failure proportion with q=50, mean=70, and sd=10
failure_prop = pnorm(q=50, mean=70, sd=10)
print(round(failure_prop, 3))
0.023
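A lower-tail normal probability is the standard normal CDF evaluated at the z-score z = (q - mean) / sd; here z = (50 - 70) / 10 = -2. The minimal sketch below computes it directly from that formula using `math.erf`. This is one standard way to evaluate the normal CDF, not necessarily how pystats implements pnorm, and the helper name `normal_cdf` is hypothetical.

```python
import math

def normal_cdf(q, mean=0.0, sd=1.0):
    """Hypothetical helper: P(X <= q) for X ~ Normal(mean, sd)."""
    z = (q - mean) / sd  # standardize the score
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A score of 50 is 2 standard deviations below the mean of 70.
failure_prop = normal_cdf(50, mean=70, sd=10)
print(round(failure_prop, 3))  # 0.023, matching pnorm above
```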
Mr. Gittu also wants the best for his students, so he also wants to know the expected proportion of students earning an A, i.e., scoring over 80%.
# Proportion of students expected to get an A with
# q=80, mean=70, sd=10, and lower_tail=False
A_prop = pnorm(q=80, mean=70, sd=10, lower_tail=False)
print(round(A_prop, 3))
0.159
Now Mr. Gittu knows what to expect from his best and worst performing students.
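The upper-tail proportion returned with lower_tail=False is the complement of the lower-tail one: P(X > 80) = 1 - P(X ≤ 80). Multiplying each proportion by the class size turns it into an expected head count. The minimal sketch below checks both results with the standard library’s `statistics.NormalDist` rather than pystats; the class size of 40 is taken from the rnorm example.

```python
from statistics import NormalDist

score_dist = NormalDist(mu=70, sigma=10)
class_size = 40  # size of Mr. Gittu's class from the rnorm example

# Lower tail: P(score < 50); upper tail: P(score > 80) = 1 - P(score <= 80).
failure_prop = score_dist.cdf(50)
A_prop = 1 - score_dist.cdf(80)

print(round(failure_prop, 3), round(A_prop, 3))  # 0.023 0.159
# Expected number of students in each group (averages, not whole students).
print(round(failure_prop * class_size, 1), round(A_prop * class_size, 1))  # 0.9 6.3
```

In other words, in a class of 40 Mr. Gittu can expect roughly one failing student and about six A grades on a typical test.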
3) Calculating Quantiles with qnorm
Mr. Gittu also wants to know which exam score corresponds to the 90th percentile. Using qnorm, he can easily find the score students need to be in the top 10% of the class. With the parameter lower_tail=False, he can also identify the score that 90% of the class will score above.
from pystats.qnorm import qnorm
# Predicting upper percentile with p=0.9, mean=70, and sd=10
quantile1 = qnorm(p=0.9, mean=70, sd=10)
print(round(quantile1, 3))
82.816
To be in the top 10% of the class, Mr. Gittu’s students will need to score at least 82.816% on the test.
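qnorm answers the inverse of the pnorm question: given a proportion p, it returns the score q with P(X ≤ q) = p. The minimal sketch below reproduces the 90th-percentile cutoff with the standard library’s `statistics.NormalDist.inv_cdf` (not pystats) and confirms the round trip back through the CDF.

```python
from statistics import NormalDist

score_dist = NormalDist(mu=70, sigma=10)

# Inverse CDF: the score below which 90% of the distribution falls.
cutoff = score_dist.inv_cdf(0.9)
print(round(cutoff, 3))  # 82.816

# Round trip: plugging the cutoff back into the CDF recovers p = 0.9.
print(round(score_dist.cdf(cutoff), 6))  # 0.9
```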
# Predicting lower percentile with p=0.9, mean=70, sd=10, and lower_tail=False
quantile2 = qnorm(p=0.9, mean=70, sd=10, lower_tail=False)
print(round(quantile2, 3))
57.184
Conversely, 90% of Mr. Gittu’s students should score above 57.184% on the test.
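Because the normal curve is symmetric about its mean, the two cutoffs sit the same distance from 70: 82.816 - 70 = 12.816 and 70 - 57.184 = 12.816. Equivalently, the upper-tail quantile for p = 0.9 is the ordinary quantile for 1 - p = 0.1. A quick check with the standard library’s `statistics.NormalDist` (not pystats):

```python
from statistics import NormalDist

score_dist = NormalDist(mu=70, sigma=10)

upper_cutoff = score_dist.inv_cdf(0.9)  # the top 10% start here
lower_cutoff = score_dist.inv_cdf(0.1)  # 90% of scores lie above this

print(round(lower_cutoff, 3))  # 57.184
# Both cutoffs are equally far from the mean (about 1.28 sd each).
print(round(upper_cutoff - 70, 3), round(70 - lower_cutoff, 3))  # 12.816 12.816
```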
4) Calculating Probability Densities to Identify Outliers with dnorm
After the test, Mr. Gittu finds out that the actual mean was 68% with a standard deviation of 11%. Not bad! He uses dnorm to gauge how much of an outlier certain scores are. One of his students went above and beyond and scored 100%, and he wants to know how unusual that score is so he can congratulate the student.
from pystats.dnorm import dnorm
# Computing the probability density at x=100 with mean=68 and sd=11
result = dnorm(x=100, mean=68, sd=11)
print(result.round(3))
     x    PDF
0  100  0.001
This output shows the value of the normal probability density function (PDF) at a score of 100%, given a mean of 68% and a standard deviation of 11%. The PDF value at x=100 is approximately 0.001. Note that a density is not the probability of scoring exactly 100%; it measures how concentrated the distribution is around that score. A score of 100% lies about 2.9 standard deviations above the mean ((100 - 68) / 11 ≈ 2.91), far out in the right tail of the curve, which is why its probability density is so low. Impressive!
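The density follows the normal PDF formula f(x) = exp(-z²/2) / (sd·√(2π)), with z = (x - mean) / sd. To put a number on how unusual the score is, the upper-tail probability P(X ≥ 100) is often easier to interpret than the density; it is the same kind of quantity pnorm returns with lower_tail=False. The minimal sketch below computes the z-score, the density, and the tail probability with the standard library. The helper name `normal_pdf` is hypothetical, and this is not necessarily how pystats computes dnorm.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mean=0.0, sd=1.0):
    """Hypothetical helper: normal probability density at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

mean, sd = 68, 11  # actual class results after the test
z = (100 - mean) / sd
density = normal_pdf(100, mean, sd)
tail_prob = 1 - NormalDist(mean, sd).cdf(100)  # P(score >= 100)

print(round(z, 2))  # 2.91
print(round(density, 3))  # 0.001, matching dnorm above
print(round(tail_prob, 4))
```

The tail probability comes out well under 1%, so under the observed distribution only a small fraction of students would be expected to reach a perfect score.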
Final Remarks
The pystats team hopes you find these examples helpful. If the example with Mr. Gittu’s test scores didn’t answer all your questions, we suggest looking through the function documentation.