{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# pystats Tutorial\n", "\n", "Welcome to the `pystats` package documentation! This package offers a suite of functions to simulate, analyze, and interpret data that follows a normal distribution. It provides functionality similar to R's `rnorm`, `pnorm`, `dnorm`, and `qnorm` functions. \n", "\n", "We will illustrate the usage of the four functions in `pystats` with a worked example. Our example features a high school teacher named Mr. Gittu, who would like to streamline the process of predicting and evaluating test scores. \n", "\n", "## Mr. Gittu's Motivation\n", "Mr. Gittu is an experienced computer science teacher at Hogwarts School of Data Science. He will be administering a standardized test, and he knows that the test scores for the class he teaches usually follow a normal distribution with:\n", "- A class average of 70% \n", "- A standard deviation of 10%\n", "\n", "He wants to use the `pystats` package to simulate scores before the test and to evaluate the results afterwards. \n", "\n", "### 1) Simulating the Test Score Distribution with `rnorm`\n", "Before administering the test, Mr. Gittu would like to simulate 40 scores so he can understand what results to expect for his class in the next test. Using `rnorm`, Mr. Gittu generates a sample of 40 test scores with a mean of 70 and a standard deviation of 10. 
" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "vscode": { "languageId": "plaintext" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[59.779 64.988 56.855 78.97 64.681 62.019 56.88 65.127 76.27 73.835\n", " 75.903 72.452 61.298 63.411 63.457 66.483 70.928 72.076 76.059 68.037\n", " 71.949 80.152 77.864 65.557 62.214 73.207 70.391 63.205 76.003 79.577\n", " 93.796 75.118 54.877 68.877 68.325 89.035 56.1 63.115 68.124 82.298]\n" ] } ], "source": [ "from pystats.rnorm import rnorm\n", "# Simulating test scores with n=40, mean=70, and sd=10\n", "simulated_scores = rnorm(n=40, mean=70, sd=10)\n", "\n", "print(simulated_scores.round(3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `rnorm` function helped Mr. Gittu generate a simulated dataset. This gives him a realistic picture of the scores he might see on the next test he gives his class of 40 students. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2) Getting Expected Proportions with `pnorm`\n", "\n", "Mr. Gittu wants all of his students to pass. Unfortunately, there is always a risk of students failing his tests, even when he teaches as well as he can. He wants to assess the risk that a randomly selected student fails (i.e., scores below 50%), assuming scores follow the normal distribution defined above. Using `pnorm`, he can get the expected proportion of students scoring below 50% on his test." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "vscode": { "languageId": "plaintext" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.023\n" ] } ], "source": [ "from pystats.pnorm import pnorm\n", "# Expected proportion of failing scores with q=50, mean=70, and sd=10\n", "failure_prop = pnorm(q=50, mean=70, sd=10)\n", "print(round(failure_prop, 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mr. 
Gittu also wants the best for his students, so he wants to see the expected proportion of students getting an A on his test, that is, scoring over 80%." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "vscode": { "languageId": "plaintext" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.159\n" ] } ], "source": [ "# Proportion of students expected to get an A with \n", "# q=80, mean=70, sd=10, and lower_tail=False\n", "A_prop = pnorm(q=80, mean=70, sd=10, lower_tail=False)\n", "print(round(A_prop, 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now Mr. Gittu knows what proportion of students to expect at the top and at the bottom of the class." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3) Calculating Quantiles with `qnorm`\n", "\n", "Mr. Gittu also wants to know what exam score corresponds to the 90th percentile. Using `qnorm`, he can easily find out what test score students need to be in the top 10% of the class. By setting `lower_tail=False`, Mr. Gittu can also identify the test score that 90% of the class is expected to score above." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "vscode": { "languageId": "plaintext" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "82.816\n" ] } ], "source": [ "from pystats.qnorm import qnorm\n", "\n", "# Score at the 90th percentile with p=0.9, mean=70, and sd=10\n", "quantile1 = qnorm(p=0.9, mean=70, sd=10)\n", "print(round(quantile1, 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To be in the top 10% of the class, Mr. Gittu's students will need to score at least 82.816% on the test."
 ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "vscode": { "languageId": "plaintext" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "57.184\n" ] } ], "source": [ "# Score that 90% of students exceed, with p=0.9, mean=70, sd=10, and lower_tail=False\n", "quantile2 = qnorm(p=0.9, mean=70, sd=10, lower_tail=False)\n", "print(round(quantile2, 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Conversely, 90% of Mr. Gittu's students should score above 57.184% on the test." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4) Calculating Probability Densities and Identifying Outliers with `dnorm` \n", "\n", "After the test, Mr. Gittu finds out that the actual mean was 68% with a standard deviation of 11%, not bad! He uses `dnorm` to gauge how unusual certain scores are. One of his students went above and beyond and scored 100%! He wants to know how unusual a score of 100% is so he can congratulate his student." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "vscode": { "languageId": "plaintext" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " x PDF\n", "0 100 0.001\n" ] } ], "source": [ "from pystats.dnorm import dnorm\n", "# Evaluating the probability density at x=100 with mean=68 and sd=11\n", "result = dnorm(x=100, mean=68, sd=11)\n", "print(result.round(3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This output shows the probability density function (PDF) value of a normal distribution for a score of 100%, given a mean of 68% and a standard deviation of 11%. The PDF value at x=100 is approximately 0.001, compared with about 0.036 at the mean of 68. Note that this is a density, not the probability of scoring exactly 100%. The low value confirms that 100% lies far out in the right tail of the normal distribution curve. Impressive!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Final Remarks\n", "\n", "The `pystats` team hopes you find these examples helpful. If the example with Mr. 
Gittu's test scores didn't answer all your questions, we suggest looking through the function documentation." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.10" } }, "nbformat": 4, "nbformat_minor": 4 }