Why this project
This originated as a teaching-and-baselining exercise: before using advanced methods, we should understand how robust classic classifiers behave on a messy, real‑world dataset. The goal was to quantify strengths/weaknesses across bias–variance regimes, get trustworthy reference metrics (Accuracy, AUC), and build intuition about which weather variables actually matter for short‑term temperature changes.
- Audience: students and practitioners — from basic stats to ML engineers.
- Design principles: consistent splits, transparent pre‑processing, clear metrics, and reproducibility.
- Outcome: Random Forest emerged as the most reliable baseline (highest AUC), with wind direction and afternoon temperature/humidity key signals alongside sunshine/evaporation.
Data & preprocessing
- Target: WarmerTomorrow (1 if tomorrow’s max temp exceeds today’s; else 0).
- Sampling: 10 random locations from 49; 5,000 daily rows sampled; missing values removed; class balance ≈ 1055 : 953 (warm vs not), i.e., reasonably balanced.
- Predictors: Sunshine, Evaporation, MinTemp/MaxTemp/Temp3pm, Humidity3pm, wind-direction features, etc. Date/location retained to test importance but excluded from some models.
- Split: 70% train / 30% test, reproducible seed.
Potential leakage check: ensure today’s max temperature doesn’t trivially encode tomorrow by construction; keep lag definitions clean.
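One way to run this check (a sketch on a toy frame; dplyr and the Location/Day/MaxTemp column names are assumed to match the real data):

```r
library(dplyr)

# Toy frame standing in for WAUS (column names assumed from the dataset)
toy <- data.frame(
  Location       = c(1, 1, 1, 2, 2),
  Day            = c(1, 2, 3, 1, 2),
  MaxTemp        = c(20, 22, 21, 30, 29),
  WarmerTomorrow = c(1, 0, NA, 0, NA)
)

# Recompute the target from the lag definition, within each location
check <- toy %>%
  arrange(Location, Day) %>%
  group_by(Location) %>%
  mutate(NextMax    = lead(MaxTemp),
         Recomputed = as.integer(NextMax > MaxTemp)) %>%
  ungroup()

# Agreement should be 1 on rows where tomorrow's max is observed;
# anything much lower suggests the label was built differently.
agreement <- mean(check$Recomputed == check$WarmerTomorrow, na.rm = TRUE)
```

If the agreement rate is essentially 1, the label matches the stated lag definition and today's MaxTemp is a legitimate predictor rather than a leak.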
Models compared
- Decision Tree (tree): interpretable baseline; pruned via cross-validation.
- Naive Bayes (e1071): strong with conditionally independent evidence; surprisingly competitive.
- Bagging (adabag): reduces variance by aggregating bootstrapped trees.
- Boosting (adabag + rpart): focuses on hard cases; good AUC on this data.
- Random Forest (randomForest): best overall balance of accuracy/AUC and variable importance stability.
- Neural Net (neuralnet): tiny MLP on a compact subset of features.
Results (test set)
| Classifier | Accuracy | AUC | Notes |
|---|---|---|---|
| Decision Tree | 0.612 | 0.665 | Improves to 0.625 after pruning (CV). |
| Naive Bayes | 0.648 | 0.685 | Simple, fast, decent baseline. |
| Bagging | 0.625 | 0.703 | Variance reduction; solid AUC. |
| Boosting | 0.658 | 0.729 | Best single‑tree ensemble on AUC aside from RF. |
| Random Forest | 0.668 | 0.744 | Top performer overall. |
| Neural Net (small) | ≈0.660 | — | Comparable accuracy; RF still ahead on AUC. |
Metrics reproduced from the original analysis; minor variation is expected if you re‑sample or change pre‑processing.
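To gauge how much of that variation is plain sampling noise, a binomial confidence interval on the test accuracy is a quick check (a sketch; the test-set size of ~600 rows is an assumption, 30% of ~2,000 complete cases):

```r
# 95% CI for the Random Forest's 0.668 test accuracy (n assumed ~600)
n <- 600
correct <- round(0.668 * n)
ci <- binom.test(correct, n)$conf.int
round(ci, 3)  # interval spans several points either side of 0.668
```

Gaps between models smaller than this interval width should not be over-interpreted on a single split.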
ROC curves & AUC
We compute ROC curves with ROCR from model posterior probabilities/scores. On this dataset the bagging and boosting curves overlap substantially; Random Forest dominates the upper‑left area, translating to the best AUC.
Feature importance
Across tree‑based models, the consistently influential variables were:
- Sunshine, Temp3pm, Humidity3pm, Evaporation, MaxTemp, MinTemp
- Wind directions: WindDir9am, WindDir3pm, WindGustDir
Date/location fields contributed little and can be dropped in production to avoid spurious associations.
My classifier (parsimonious RF)
Based on importance, I trained a compact Random Forest using only: Sunshine, Evaporation, Temp3pm, Humidity3pm, MinTemp. This matched the broader RF baseline (≈66–67% accuracy) while remaining simpler to explain and faster to score.
- Why RF here? Stable across re‑samples, handles non‑linear interactions, and gives robust importance estimates.
- Tuning: consider an mtry grid search via caret::train, and calibrate the classification threshold at the ROC Youden point.
Full R code (reproducible)
Drop this into an R script. It mirrors the original analysis and adds optional tuning + calibration. Requires packages: tree, e1071, ROCR, randomForest, adabag, rpart, neuralnet, caret, pROC.
rm(list = ls())
library(tree)
library(e1071)
library(ROCR)
library(randomForest)
library(adabag)
library(rpart)
library(neuralnet)
library(caret)
library(pROC)
set.seed(31179762)
WAUS <- read.csv("WarmerTomorrow2022.csv")
L <- sample(1:49, 10, replace = FALSE)   # pick 10 of the 49 locations
WAUS <- WAUS[WAUS$Location %in% L, ]
WAUS <- WAUS[sample(nrow(WAUS), 5000, replace = FALSE), ]
WAUS <- WAUS[complete.cases(WAUS), ]
# target as factor
WAUS$WarmerTomorrow <- factor(WAUS$WarmerTomorrow)
# split 70/30
set.seed(31179762)
idx <- sample(1:nrow(WAUS), 0.7*nrow(WAUS))
WAUS.train <- WAUS[idx,]
WAUS.test <- WAUS[-idx,]
# Decision Tree
WAUS.dt.fit <- tree(WarmerTomorrow ~ ., data = WAUS.train)
WAUS.dt.pred <- predict(WAUS.dt.fit, WAUS.test, type = "class")
WAUS.dt.vec <- predict(WAUS.dt.fit, WAUS.test, type = "vector")
cm.dt <- caret::confusionMatrix(WAUS.dt.pred, WAUS.test$WarmerTomorrow, positive = "1")
roc.dt <- ROCR::performance(ROCR::prediction(WAUS.dt.vec[,2], WAUS.test$WarmerTomorrow), "tpr","fpr")
# Naive Bayes
WAUS.nb.fit <- naiveBayes(WarmerTomorrow ~ ., data = WAUS.train)
WAUS.nb.pred <- predict(WAUS.nb.fit, WAUS.test, type = "class")
WAUS.nb.vec <- predict(WAUS.nb.fit, WAUS.test, type = "raw")
cm.nb <- caret::confusionMatrix(WAUS.nb.pred, WAUS.test$WarmerTomorrow, positive = "1")
roc.nb <- ROCR::performance(ROCR::prediction(WAUS.nb.vec[,2], WAUS.test$WarmerTomorrow), "tpr","fpr")
# Bagging
WAUS.bag.fit <- adabag::bagging(WarmerTomorrow ~ ., data = WAUS.train)
WAUS.bag.pred <- predict(WAUS.bag.fit, WAUS.test)
cm.bag <- WAUS.bag.pred$confusion
roc.bag <- ROCR::performance(ROCR::prediction(WAUS.bag.pred$prob[,2], WAUS.test$WarmerTomorrow), "tpr","fpr")
# Boosting
WAUS.boost.fit <- adabag::boosting(WarmerTomorrow ~ ., data = WAUS.train)
WAUS.boost.pred <- predict(WAUS.boost.fit, newdata = WAUS.test)
cm.boost <- WAUS.boost.pred$confusion
roc.boost <- ROCR::performance(ROCR::prediction(WAUS.boost.pred$prob[,2], WAUS.test$WarmerTomorrow), "tpr","fpr")
# Random Forest
WAUS.rf.fit <- randomForest(WarmerTomorrow ~ ., data = WAUS.train)
WAUS.rf.pred <- predict(WAUS.rf.fit, WAUS.test)
WAUS.rf.prob <- predict(WAUS.rf.fit, WAUS.test, type = "prob")
cm.rf <- caret::confusionMatrix(factor(WAUS.rf.pred), WAUS.test$WarmerTomorrow, positive = "1")
roc.rf <- ROCR::performance(ROCR::prediction(WAUS.rf.prob[,2], WAUS.test$WarmerTomorrow), "tpr","fpr")
# AUC helper
auc_of <- function(pred){ as.numeric(ROCR::performance(pred, "auc")@y.values) }
auc.dt <- auc_of(ROCR::prediction(WAUS.dt.vec[,2], WAUS.test$WarmerTomorrow))
auc.nb <- auc_of(ROCR::prediction(WAUS.nb.vec[,2], WAUS.test$WarmerTomorrow))
auc.bag <- auc_of(ROCR::prediction(WAUS.bag.pred$prob[,2], WAUS.test$WarmerTomorrow))
auc.boost <- auc_of(ROCR::prediction(WAUS.boost.pred$prob[,2], WAUS.test$WarmerTomorrow))
auc.rf <- auc_of(ROCR::prediction(WAUS.rf.prob[,2], WAUS.test$WarmerTomorrow))
# Plot ROC
plot(roc.dt, col = "red"); abline(0,1)
plot(roc.nb, add=TRUE, col = "orange")
plot(roc.bag, add=TRUE, col = "green")
plot(roc.boost, add=TRUE, col = "blue")
plot(roc.rf, add=TRUE, col = "purple")
legend("bottomright", legend=c("Decision Tree","Naive Bayes","Bagging","Boosting","Random Forest"), fill=c("red","orange","green","blue","purple"))
# Best tree via cross‑validation pruning
WAUS.dt.cv <- cv.tree(WAUS.dt.fit, FUN = prune.misclass)
WAUS.dt.pruned <- prune.misclass(WAUS.dt.fit, best = 4)  # size chosen from the CV curve
WAUS.dt.pr.pred <- predict(WAUS.dt.pruned, WAUS.test, type = "class")
cm.dt.pr <- caret::confusionMatrix(factor(WAUS.dt.pr.pred), WAUS.test$WarmerTomorrow, positive = "1")
# My compact RF
WAUS.my.fit <- randomForest(WarmerTomorrow ~ Sunshine + Evaporation + MinTemp + Temp3pm + Humidity3pm, data=WAUS.train)
WAUS.my.pred <- predict(WAUS.my.fit, WAUS.test)
cm.my <- table(Predicted_Class = WAUS.my.pred, Actual_Class = WAUS.test$WarmerTomorrow)
# Optional: caret tuning for RF (mtry grid)
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 2, classProbs = TRUE, summaryFunction = twoClassSummary)
WAUS.train$WarmerTomorrow2 <- factor(ifelse(WAUS.train$WarmerTomorrow == "1", "yes", "no"), levels = c("no", "yes"))
set.seed(31179762)
rf.tuned <- train(WarmerTomorrow2 ~ Sunshine + Evaporation + MinTemp + Temp3pm + Humidity3pm,
data = WAUS.train,
method = "rf",
metric = "ROC",
trControl = ctrl,
tuneGrid = data.frame(mtry = c(2,3,4)))
print(rf.tuned)
# Threshold calibration using Youden J (pROC)
rf_probs <- predict(WAUS.my.fit, WAUS.test, type = "prob")[,2]
roc_obj <- pROC::roc(WAUS.test$WarmerTomorrow, rf_probs)
best_coords <- pROC::coords(roc_obj, "best", ret = c("threshold","sensitivity","specificity"), best.method = "youden")
best_coords
# Variable importance
importance(WAUS.rf.fit)
varImpPlot(WAUS.rf.fit)
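To act on the Youden threshold from the pROC::coords step above, classify positive when the predicted probability clears it rather than the default 0.5 (a self-contained sketch with toy scores; in the script, the inputs would be rf_probs and the threshold returned by coords):

```r
set.seed(1)
probs  <- runif(20)                                           # stand-in for rf_probs
actual <- factor(as.integer(probs > 0.5), levels = c(0, 1))   # toy labels
thr    <- 0.45                                                # stand-in for the Youden threshold

# Classify positive when the score clears the calibrated threshold
pred <- factor(as.integer(probs > thr), levels = c(0, 1))
table(Predicted = pred, Actual = actual)
```

Lowering the threshold below 0.5 trades specificity for sensitivity, which the Youden point balances on this ROC curve.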