Logistic Regression Model Application in Research

 

Logistic Regression Model for Bank Loan Default Prediction

A logit model (or logistic regression model) is a statistical method used to predict the probability of a binary outcome (like yes/no, success/failure) based on one or more independent variables. It transforms a linear combination of predictors into probabilities between 0 and 1 using the logistic function.
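As a minimal sketch of that transformation, the logistic function and its inverse (the logit) can be written in a few lines of Python. The function names here are chosen only for illustration.

```python
import math

def logistic(z: float) -> float:
    """Map a linear predictor z (the log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Inverse of the logistic function: probability -> log-odds."""
    return math.log(p / (1.0 - p))

# Negative log-odds give probabilities below 0.5, positive ones above 0.5.
for z in (-4.0, 0.0, 4.0):
    print(f"z = {z:+.1f} -> p = {logistic(z):.4f}")
```

Whatever value the linear combination of predictors takes, the result stays strictly between 0 and 1, which is what makes it usable as a probability.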

Example

Imagine a bank wants to predict whether a loan applicant will default:

  • Dependent variable: Default (1 = yes, 0 = no)
  • Independent variables: Income, credit score, age
  • The logit model estimates how these factors influence the probability of default.

Let’s build a logit model example for predicting bank loan default using the three independent variables we mentioned: Income, Credit Score, and Age.

Example: Bank Default Prediction

Model Setup

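  • Dependent variable (Y): Default (1 = default, 0 = no default)
  • Independent variables (X):
    • \(X_1\): Income (in $1000s per month)
    • \(X_2\): Credit Score (on the 300–850 scale)
    • \(X_3\): Age (in years)

The calculations that follow use illustrative coefficients: intercept 2.0, Income −0.3, Credit Score −0.01, and Age 0.05.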

Example Calculation

For a borrower with:

  • Income = 5 ($5000/month)
  • Credit Score = 650
  • Age = 40

The logit is:

\[ \text{logit}(p) = 2.0 + (-0.3)(5) + (-0.01)(650) + (0.05)(40) = 2.0 - 1.5 - 6.5 + 2.0 = -4.0 \]

Convert to probability:

\[ p = \frac{1}{1 + e^{-(-4.0)}} = \frac{1}{1 + e^{4.0}} \approx 0.018 \]

Probability of default ≈ 1.8%
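The same arithmetic can be checked with a short Python sketch. The coefficients are the illustrative ones used in the calculation above, not values estimated from real data, and the function name is an assumption made for this example.

```python
import math

# Illustrative coefficients from the worked example (assumed, not estimated).
INTERCEPT = 2.0
COEF = {"income": -0.3, "credit_score": -0.01, "age": 0.05}

def default_probability(income: float, credit_score: float, age: float) -> float:
    """Return P(default) for one borrower under the illustrative model."""
    z = (INTERCEPT
         + COEF["income"] * income
         + COEF["credit_score"] * credit_score
         + COEF["age"] * age)
    return 1.0 / (1.0 + math.exp(-z))

# Borrower from the example: income 5 ($5000/month), credit score 650, age 40.
p = default_probability(income=5, credit_score=650, age=40)
print(f"Probability of default: {p:.3f}")  # 0.018
```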

Interpretation

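  • Income effect: Holding the other variables constant, each extra $1000/month multiplies the odds of default by e^(−0.3) ≈ 0.74, a reduction of about 26%.
  • Credit score effect: Each one-point increase reduces the odds by about 1% (e^(−0.01) ≈ 0.99).
  • Age effect: Each additional year of age increases the odds by about 5% (e^(0.05) ≈ 1.05).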

So, this borrower has a low risk of default given their decent income and moderate credit score.
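Each coefficient acts on the log-odds, so the effect on the odds themselves comes from exponentiating it. For small coefficients the percentage change is close to 100 × β, but the exact figure is exp(β) − 1. A minimal sketch, again using the illustrative coefficients from the example:

```python
import math

# Illustrative coefficients from the worked example (assumed, not estimated).
COEF = {"income": -0.3, "credit_score": -0.01, "age": 0.05}

def pct_change_in_odds(beta: float) -> float:
    """Percentage change in the odds for a one-unit increase in the predictor."""
    return (math.exp(beta) - 1.0) * 100.0

for name, beta in COEF.items():
    print(f"{name:<13} odds ratio = {math.exp(beta):.4f}, "
          f"change in odds = {pct_change_in_odds(beta):+.1f}%")
```

Note that these are changes in odds, not in probability. The change in probability for the same one-unit increase depends on where the borrower starts on the logistic curve.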

To see how the model separates applicants, the table below applies the same coefficients to a small simulated dataset of 10 borrowers and shows the default probability assigned to each.

Simulated Bank Default Probabilities

Borrower ID   Income ($1000s)   Credit Score   Age   Probability of Default
1             5.00              608            41    0.0285
2             9.61              643            63    0.0153
3             7.86              791            44    0.0023
4             6.79              713            68    0.0226
5             3.25              685            46    0.0286
6             3.25              491            61    0.3026
7             2.46              576            47    0.1045
8             8.93              460            35    0.0285
9             6.81              759            34    0.0026
10            7.66              613            66    0.0420
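The probabilities in the table can be reproduced with a short Python sketch that applies the illustrative coefficients to each simulated borrower. Because the incomes shown are rounded to two decimals, the last digit of a recomputed probability may occasionally differ from the table.

```python
import math

# Illustrative coefficients from the worked example (assumed, not estimated).
INTERCEPT, B_INCOME, B_CREDIT, B_AGE = 2.0, -0.3, -0.01, 0.05

# (borrower ID, income in $1000s/month, credit score, age), copied from the table.
BORROWERS = [
    (1, 5.00, 608, 41),
    (2, 9.61, 643, 63),
    (3, 7.86, 791, 44),
    (4, 6.79, 713, 68),
    (5, 3.25, 685, 46),
    (6, 3.25, 491, 61),
    (7, 2.46, 576, 47),
    (8, 8.93, 460, 35),
    (9, 6.81, 759, 34),
    (10, 7.66, 613, 66),
]

def default_probability(income: float, credit_score: float, age: float) -> float:
    """Return P(default) for one borrower under the illustrative model."""
    z = INTERCEPT + B_INCOME * income + B_CREDIT * credit_score + B_AGE * age
    return 1.0 / (1.0 + math.exp(-z))

probabilities = {
    bid: default_probability(income, score, age)
    for bid, income, score, age in BORROWERS
}

for bid, p in probabilities.items():
    print(f"Borrower {bid:>2}: {p:.4f}")
```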

Insights

  • Borrower 6 has the highest risk (about 30%) because of low income, a very low credit score, and a relatively high age.
  • Borrowers 3 and 9 have very low risk (<0.3%) thanks to strong credit scores and decent income.
  • Age effect: Older borrowers (like Borrower 10 at age 66) show slightly higher probabilities compared to younger ones with similar profiles.
  • Income effect: Higher income generally reduces default risk (Borrower 2 with $9.61k/month has only 1.5% risk).
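  • Credit score effect: Across the range of values in this sample, credit score is the strongest driver. Its per-point coefficient is small, but scores span hundreds of points, so low scores sharply increase default probability even when income is moderate.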
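One rough way to compare the drivers is to multiply each coefficient by the spread of that variable in the sample, which gives the largest swing in log-odds the variable can produce among these ten borrowers. This is a back-of-the-envelope sketch using the illustrative coefficients and the table values, not a formal measure of variable importance.

```python
# Illustrative coefficients from the worked example (assumed, not estimated).
COEF = {"income": -0.3, "credit_score": -0.01, "age": 0.05}

# Values copied from the simulated table, one list per predictor.
SAMPLE = {
    "income": [5.00, 9.61, 7.86, 6.79, 3.25, 3.25, 2.46, 8.93, 6.81, 7.66],
    "credit_score": [608, 643, 791, 713, 685, 491, 576, 460, 759, 613],
    "age": [41, 63, 44, 68, 46, 61, 47, 35, 34, 66],
}

def logit_swing(values, beta: float) -> float:
    """Largest change in log-odds this predictor produces across the sample."""
    return abs(beta) * (max(values) - min(values))

swings = {name: logit_swing(SAMPLE[name], COEF[name]) for name in COEF}

for name, swing in sorted(swings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<13} max swing in log-odds = {swing:.2f}")
```

In this sample, credit score produces the largest swing, followed by income and then age.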

Step‑by‑Step in SPSS GUI

  1. Open your dataset
    • Make sure your dependent variable (Default) is coded as 0 = No Default, 1 = Default.
    • Independent variables: Income, CreditScore, Age.
  2. Navigate to Logistic Regression
    • Click on Analyze in the top menu.
    • Go to Regression → Binary Logistic…
  3. Select the Dependent Variable
    • In the dialog box, move Default into the Dependent field.
  4. Select the Independent Variables
    • Move Income, CreditScore, and Age into the Covariates field.
  5. Choose Method
    • By default, SPSS uses Enter (all predictors entered at once).
    • You can also choose Forward or Backward stepwise methods if you want SPSS to select variables automatically.
  6. Set Options (Optional)
    • Click Options… to request classification plots, the Hosmer-Lemeshow goodness-of-fit test, confidence intervals for Exp(B), etc.
    • Click Save… if you want SPSS to generate predicted probabilities or group membership for each case.
  7. Run the Analysis
    • Click OK to run the logistic regression.
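For readers without SPSS, the kind of estimation performed by the Enter method (all predictors fitted at once by maximum likelihood) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not SPSS's actual implementation: the data are simulated from the illustrative coefficients used earlier, the predictor ranges and sample size are arbitrary choices, and the helper names (`sigmoid`, `solve`, `fit_logit`) are hypothetical.

```python
import math
import random

def sigmoid(z: float) -> float:
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp()
    return 1.0 / (1.0 + math.exp(-z))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_logit(rows, y, iterations=25):
    """Newton-Raphson maximum-likelihood fit; each row starts with 1.0 (intercept)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, x)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Simulated data (assumption): 1000 borrowers drawn from the illustrative model.
random.seed(42)
TRUE_BETA = [2.0, -0.3, -0.01, 0.05]  # intercept, Income, CreditScore, Age
rows, y = [], []
for _ in range(1000):
    x = [1.0, random.uniform(2, 10), random.randint(300, 850), random.uniform(21, 70)]
    p_true = sigmoid(sum(b * v for b, v in zip(TRUE_BETA, x)))
    rows.append(x)
    y.append(1 if random.random() < p_true else 0)

beta_hat = fit_logit(rows, y)
for name, b in zip(["Intercept", "Income", "CreditScore", "Age"], beta_hat):
    print(f"{name:<12} B = {b:+.4f}   Exp(B) = {math.exp(b):.4f}")
```

The estimates should land near the coefficients used to simulate the data, with some sampling error. With real loan data the coefficients are unknown in advance and the fitted values are what the bank would interpret.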

SPSS Output

  • Omnibus Tests of Model Coefficients → tests whether the predictors, taken together, improve fit over the intercept-only model.
  • Model Summary → −2 Log Likelihood, Cox & Snell R², Nagelkerke R².
  • Classification Table → how often the model correctly predicts default vs. no default at the chosen cutoff.
  • Variables in the Equation → coefficients (B), standard errors, Wald tests, significance levels, odds ratios (Exp(B)).
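Several of these output quantities follow from simple formulas, which the sketch below reproduces in Python. The standard error and the −2 Log Likelihood values passed in at the bottom are hypothetical numbers chosen only to exercise the functions. They are not results from any real analysis, and the function names are assumptions made for this example.

```python
import math

def variables_in_equation(b: float, se: float, z_crit: float = 1.96) -> dict:
    """Wald test, odds ratio, and 95% CI for one coefficient B with standard error se."""
    wald = (b / se) ** 2
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    sig = math.erfc(math.sqrt(wald / 2.0))
    return {
        "B": b,
        "S.E.": se,
        "Wald": wald,
        "Sig.": sig,
        "Exp(B)": math.exp(b),
        "CI lower": math.exp(b - z_crit * se),
        "CI upper": math.exp(b + z_crit * se),
    }

def pseudo_r2(neg2ll_null: float, neg2ll_model: float, n: int):
    """Cox & Snell and Nagelkerke R² from the -2 Log Likelihood of two models."""
    cox_snell = 1.0 - math.exp(-(neg2ll_null - neg2ll_model) / n)
    max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)
    return cox_snell, cox_snell / max_cox_snell

# Hypothetical inputs: B = -0.3 with S.E. = 0.1; -2LL of 120 (null) and 90 (model), n = 100.
row = variables_in_equation(b=-0.3, se=0.1)
for key, value in row.items():
    print(f"{key:<9} {value:.4f}")

cs, nk = pseudo_r2(neg2ll_null=120.0, neg2ll_model=90.0, n=100)
print(f"Cox & Snell R² = {cs:.4f}, Nagelkerke R² = {nk:.4f}")
```

A confidence interval for Exp(B) that excludes 1 corresponds to a coefficient that is statistically distinguishable from zero at the matching significance level.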

Suggestion:

Any bank can apply this model to evaluate loan applicants by defining a set of independent variables, entering the relevant data into SPSS, and estimating the coefficients. The results can then be reviewed to support informed decisions on whether to approve a loan for a particular applicant.
