{ "cells": [ { "cell_type": "markdown", "id": "968d35ea", "metadata": {}, "source": [ "# Logistic Regression\n", "\n", "**Logistic Regression** is a statistical method for modeling the probability of a binary outcome. It is a generalized linear model (GLM) that predicts the probability that a given instance belongs to a particular category. Despite its name, logistic regression is used for classification, not for predicting continuous values.\n", "\n", "Key Points:\n", "1. **Sigmoid Function**: At its core, logistic regression uses the logistic (or sigmoid) function to map the output of a linear combination of the features to a value between 0 and 1, which can then be interpreted as a probability.\n", "\n", "2. **Applications**: Logistic regression is widely used in fields like medicine (e.g., predicting whether a patient has a disease or not), finance (e.g., predicting loan default), and marketing (e.g., predicting customer churn).\n", "\n", "3. **Assumptions**: Logistic regression assumes a linear relationship between the independent variables and the log odds of the outcome, little or no multicollinearity among the predictors, and a binary outcome variable.\n", "\n", "4. **Extensions**: For outcomes with more than two categories, extensions such as multinomial and ordinal logistic regression are used. scikit-learn's `LogisticRegression` handles multiclass targets directly, which is how it is applied to the ten-class digits dataset in this notebook.\n", "\n", "Logistic regression provides a simple yet powerful way to estimate the effect of multiple predictors on a binary outcome, and it is a foundational algorithm in machine learning and statistics."
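, "\n", "\n", "As a worked form of the sigmoid function described in the key points, the model can be written as follows. The symbols $x$ (feature vector), $w$ (weights), and $b$ (intercept) are notation assumed here for illustration; they do not appear elsewhere in this notebook.\n", "\n", "$$z = w^\\top x + b, \\qquad P(y = 1 \\mid x) = \\sigma(z) = \\frac{1}{1 + e^{-z}}$$\n", "\n", "Solving for $z$ gives the log odds, which is the quantity assumed to be linear in the features:\n", "\n", "$$\\log \\frac{P(y = 1 \\mid x)}{1 - P(y = 1 \\mid x)} = w^\\top x + b$$"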
] }, { "cell_type": "code", "execution_count": 1, "id": "41eacf21", "metadata": {}, "outputs": [], "source": [ "# Import necessary libraries/modules\n", "\n", "# Loader for the digits dataset\n", "from sklearn.datasets import load_digits\n", "\n", "# Utility for splitting data into training and testing sets\n", "from sklearn.model_selection import train_test_split\n", "\n", "# Import the logistic regression model\n", "from sklearn.linear_model import LogisticRegression\n", "\n", "# Import matplotlib for data visualization\n", "import matplotlib.pyplot as plt\n", "\n", "# Import seaborn for enhanced data visualization\n", "import seaborn as sns\n", "\n", "# Import metrics for evaluating the model\n", "from sklearn import metrics\n", "\n", "# Import numpy for numerical operations\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "id": "21db97e1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Image Data Shape (1797, 64)\n", "Label Data Shape (1797,)\n" ] } ], "source": [ "# Load the digits dataset and store it in the 'digits' variable\n", "digits = load_digits()\n", "\n", "# Print the shape of the image data and label data\n", "print(\"Image Data Shape\", digits.data.shape)\n", "print(\"Label Data Shape\", digits.target.shape)\n", "\n", "# Hold out 25% of the samples for testing; random_state fixes the shuffle for reproducibility\n", "x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.25, random_state=42)" ] }, { "cell_type": "code", "execution_count": 3, "id": "6d168f01", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
LogisticRegression(max_iter=10000)
LogisticRegression(max_iter=200)