{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# pystats Tutorial\n",
    "\n",
    "Welcome to the `pystats` package documentation! This package offers a suite of functions to simulate, analyze, and interpret normally distributed data. It provides functionality similar to R's `rnorm`, `pnorm`, `dnorm`, and `qnorm` functions.\n",
    "\n",
    "We will illustrate how to use the four functions in `pystats` with a real-life example featuring a high school teacher named Mr. Gittu, who would like to streamline the process of predicting and evaluating test scores. Our package can help him with this!\n",
    "\n",
    "## Mr. Gittu's Motivation\n",
    "Mr. Gittu is an experienced computer science teacher at Hogwarts School of Data Science. He will be administering a standardized test, and he knows that the test scores for his class usually follow a normal distribution with:\n",
    "- A class average (mean) of 70%\n",
    "- A standard deviation of 10%\n",
    "\n",
    "He wants to use `pystats` to predict scores before the test and to evaluate them afterwards.\n",
    "\n",
    "### 1) Simulating a Distribution of Test Scores with `rnorm`\n",
    "Before administering the test, Mr. Gittu would like to simulate 40 scores to get a sense of what results to expect from his class. Using `rnorm`, he generates a sample of 40 test scores with a mean of 70 and a standard deviation of 10."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[59.779 64.988 56.855 78.97  64.681 62.019 56.88  65.127 76.27  73.835\n",
      " 75.903 72.452 61.298 63.411 63.457 66.483 70.928 72.076 76.059 68.037\n",
      " 71.949 80.152 77.864 65.557 62.214 73.207 70.391 63.205 76.003 79.577\n",
      " 93.796 75.118 54.877 68.877 68.325 89.035 56.1   63.115 68.124 82.298]\n"
     ]
    }
   ],
   "source": [
    "from pystats.rnorm import rnorm\n",
    "# Simulating test scores with n=40, mean=70, and sd=10\n",
    "simulated_scores = rnorm(n=40, mean=70, sd=10)\n",
    "\n",
    "print(simulated_scores.round(3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `rnorm` function helped Mr. Gittu generate a simulated dataset, which gives him a realistic picture of the scores his class of 40 students might achieve on the next test. Because the values are drawn at random, running the cell again will generally produce a different set of scores."
   ]
  },
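  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sanity check, any simulated sample of this size should have a mean and standard deviation close to, but not exactly, 70 and 10. The sketch below does not use `pystats`: it draws a comparable sample with Python's standard-library `statistics.NormalDist`, assuming `rnorm` samples from the same normal distribution as R's `rnorm`. The `seed` value is an arbitrary choice that only makes this illustrative draw reproducible."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from statistics import NormalDist, mean, stdev\n",
    "\n",
    "# Reference distribution for Mr. Gittu's class (standard library, not pystats)\n",
    "class_dist = NormalDist(mu=70, sigma=10)\n",
    "\n",
    "# Draw 40 reproducible scores; seed=2024 is an arbitrary illustrative choice\n",
    "reference_scores = class_dist.samples(40, seed=2024)\n",
    "\n",
    "# With only 40 draws, these land near 70 and 10 rather than exactly on them\n",
    "print(round(mean(reference_scores), 3), round(stdev(reference_scores), 3))"
   ]
  },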
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2) Getting Expected Proportions with `pnorm`\n",
    "\n",
    "Mr. Gittu wants all of his students to pass. Unfortunately, there is always a risk of students failing, even when he teaches to the best of his ability. He wants to assess the risk of failure (i.e., scoring below 50%) for a randomly selected student, assuming scores follow the normal distribution defined above. Using `pnorm`, he can compute the expected proportion of students scoring below 50% on his test."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.023\n"
     ]
    }
   ],
   "source": [
    "from pystats.pnorm import pnorm\n",
    "# Predicting failure proportion with q=50, mean=70, and sd=10\n",
    "failure_prop = pnorm(q=50, mean=70, sd=10)\n",
    "print(round(failure_prop, 3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
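    "About 2.3% of students are expected to fail. Mr. Gittu also wants the best for his students, so he would like to know the expected proportion who earn an A, i.e., score above 80%. Setting `lower_tail=False` makes `pnorm` return this upper-tail proportion instead of the lower tail."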
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.159\n"
     ]
    }
   ],
   "source": [
    "# Proportion of students expected to get an A (score above 80) with\n",
    "# q=80, mean=70, sd=10, and lower_tail=False\n",
    "A_prop = pnorm(q=80, mean=70, sd=10, lower_tail=False)\n",
    "print(round(A_prop, 3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "About 15.9% of students are expected to earn an A. Now Mr. Gittu knows what to expect from his best- and worst-performing students."
   ]
  },
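  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both proportions can be cross-checked against the normal cumulative distribution function (CDF). The sketch below uses the standard-library `statistics.NormalDist` rather than `pystats`, and assumes that `pnorm` returns the lower-tail CDF by default and the upper tail (one minus the CDF) when `lower_tail=False`, as R's `pnorm` does."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from statistics import NormalDist\n",
    "\n",
    "# Same class distribution as above (standard library, not pystats)\n",
    "class_dist = NormalDist(mu=70, sigma=10)\n",
    "\n",
    "# Lower tail: P(score < 50), the expected failure proportion\n",
    "fail_ref = class_dist.cdf(50)\n",
    "\n",
    "# Upper tail: P(score > 80), the expected proportion of A grades\n",
    "a_ref = 1 - class_dist.cdf(80)\n",
    "\n",
    "print(round(fail_ref, 3), round(a_ref, 3))  # 0.023 0.159"
   ]
  },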
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3) Calculating Quantiles with `qnorm`\n",
    "\n",
    "Mr. Gittu also wants to know which exam score corresponds to the 90th percentile. Using `qnorm`, he can easily find the score students need to be in the top 10% of the class. With the parameter `lower_tail=False`, he can also identify the score that 90% of the class is expected to exceed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "82.816\n"
     ]
    }
   ],
   "source": [
    "from pystats.qnorm import qnorm\n",
    "\n",
    "# 90th percentile with p=0.9, mean=70, and sd=10\n",
    "quantile1 = qnorm(p=0.9, mean=70, sd=10)\n",
    "print(round(quantile1, 3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To be in the top 10% of the class, Mr. Gittu's students will need to score at least 82.816% on the test."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "57.184\n"
     ]
    }
   ],
   "source": [
    "# 10th percentile (the score 90% of students exceed) with p=0.9, mean=70, sd=10, and lower_tail=False\n",
    "quantile2 = qnorm(p=0.9, mean=70, sd=10, lower_tail=False)\n",
    "print(round(quantile2, 3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conversely, 90% of Mr. Gittu's students should score above 57.184% on the test."
   ]
  },
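  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `qnorm` inverts the CDF, the two cutoffs can be checked with the standard library's `NormalDist.inv_cdf`. This is an independent illustration outside `pystats`, assuming that `lower_tail=False` with `p=0.9` corresponds to the quantile at `1 - p`. By symmetry around the mean of 70, the two cutoffs should sum to 140."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from statistics import NormalDist\n",
    "\n",
    "# Same class distribution as above (standard library, not pystats)\n",
    "class_dist = NormalDist(mu=70, sigma=10)\n",
    "\n",
    "# 90th percentile: the score 90% of students fall below\n",
    "top_10_cutoff = class_dist.inv_cdf(0.9)\n",
    "\n",
    "# Upper-tail p=0.9 is the 10th percentile: 90% of students score above it\n",
    "bottom_cutoff = class_dist.inv_cdf(1 - 0.9)\n",
    "\n",
    "print(round(top_10_cutoff, 3), round(bottom_cutoff, 3))  # 82.816 57.184"
   ]
  },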
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4) Evaluating Densities and Identifying Outliers with `dnorm`\n",
    "\n",
    "After the test, Mr. Gittu finds out that the actual class mean was 68% with a standard deviation of 11%. Not bad! He uses `dnorm` to gauge how unusual particular scores are. One of his students went above and beyond and scored 100%, and Mr. Gittu wants to know just how exceptional that score is so he can congratulate the student."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "     x    PDF\n",
      "0  100  0.001\n"
     ]
    }
   ],
   "source": [
    "from pystats.dnorm import dnorm\n",
    "# Evaluating the probability density at x=100 with mean=68 and sd=11\n",
    "result = dnorm(x=100, mean=68, sd=11)\n",
    "print(result.round(3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This output shows the value of the probability density function (PDF) at a score of 100% for a normal distribution with a mean of 68% and a standard deviation of 11%. The density is roughly 0.0005, which rounds to 0.001 at three decimal places, compared with a peak density of about 0.036 at the mean. A score of 100% sits about 2.9 standard deviations above the mean, far in the right tail of the curve, which is why its density is so low. Impressive!"
   ]
  },
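  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Keep in mind that a density value is not itself a probability: for a continuous distribution, the probability of scoring exactly 100 is zero. To express how unusual the score is as a proportion, one option is the upper-tail probability of scoring 100 or more. The sketch below computes it, along with the density and the z-score, using the standard-library `statistics.NormalDist` instead of `pystats`. With `pystats`, the tail probability should correspond to `pnorm(q=100, mean=68, sd=11, lower_tail=False)`, assuming `pnorm` follows R's conventions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from statistics import NormalDist\n",
    "\n",
    "# Actual test results: mean 68, standard deviation 11 (standard library, not pystats)\n",
    "actual_dist = NormalDist(mu=68, sigma=11)\n",
    "\n",
    "# Density at 100: small, but a density rather than a probability\n",
    "density_at_100 = actual_dist.pdf(100)\n",
    "\n",
    "# Number of standard deviations a perfect score sits above the mean\n",
    "z_score = actual_dist.zscore(100)\n",
    "\n",
    "# Upper-tail probability: share of the modeled distribution at or above 100\n",
    "tail_prob = 1 - actual_dist.cdf(100)\n",
    "\n",
    "print(round(density_at_100, 4), round(z_score, 2), round(tail_prob, 4))"
   ]
  },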
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Final Remarks\n",
    "\n",
    "The `pystats` team hopes you find these examples helpful. If the example with Mr. Gittu's test scores didn't answer all your questions, we suggest looking through the function documentation."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}