Chapter 8: Baseline Experiments¶

Purpose: Train baseline models to understand data predictability and establish performance benchmarks.

What you'll learn:

  • How to prepare data for ML with proper train/test splitting
  • How to handle class imbalance with class weights
  • How to evaluate models with appropriate metrics (not just accuracy!)
  • How to interpret feature importance

Outputs:

  • Baseline model performance (AUC, Precision, Recall, F1)
  • Feature importance rankings
  • ROC and Precision-Recall curves
  • Performance benchmarks for comparison

Evaluation Metrics for Imbalanced Data¶

| Metric | What It Measures | When to Use |
| --- | --- | --- |
| AUC-ROC | Ranking quality across all thresholds | General model comparison |
| Precision | "Of predicted churners, how many are correct?" | When false positives are costly |
| Recall | "Of actual churners, how many did we catch?" | When missing churners is costly |
| F1-Score | Harmonic mean of precision and recall | When both matter equally |
| PR-AUC | Area under the Precision-Recall curve | Better suited to imbalanced data |
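All of these metrics are available directly in scikit-learn. A minimal sketch on toy labels and scores (the arrays below are illustrative, not drawn from this dataset):

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score, f1_score, precision_score,
    recall_score, roc_auc_score,
)

# Toy imbalanced labels (1 = retained, 0 = churned) and model scores
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])
y_score = np.array([0.2, 0.4, 0.7, 0.3, 0.6, 0.8, 0.9, 0.75, 0.65, 0.85])
y_pred = (y_score >= 0.5).astype(int)  # hard predictions at the default threshold

auc_roc = roc_auc_score(y_true, y_score)           # ranking quality across thresholds
pr_auc = average_precision_score(y_true, y_score)  # area under the PR curve
precision = precision_score(y_true, y_pred)        # of predicted positives, how many correct
recall = recall_score(y_true, y_pred)              # of actual positives, how many caught
f1 = f1_score(y_true, y_pred)                      # harmonic mean of precision and recall
```

Note that AUC-ROC and PR-AUC consume the raw scores (they sweep over thresholds), while precision, recall, and F1 consume hard predictions at one fixed threshold.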

8.1 Setup¶

In [1]:

from customer_retention.analysis.notebook_progress import track_and_export_previous

track_and_export_previous("08_baseline_experiments.ipynb")

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    average_precision_score,
    classification_report,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

from customer_retention.analysis.auto_explorer import ExplorationFindings
from customer_retention.analysis.visualization import ChartBuilder, display_figure, display_table
from customer_retention.core.config.column_config import NON_FEATURE_COLUMN_TYPES, ColumnType
from customer_retention.core.config.experiments import (
    FINDINGS_DIR,
)
In [2]:
from pathlib import Path

from customer_retention.analysis.auto_explorer import load_notebook_findings, resolve_target_column

FINDINGS_PATH, _namespace, dataset_name = load_notebook_findings(
    "08_baseline_experiments.ipynb", prefer_aggregated=True
)
print(f"Using: {FINDINGS_PATH}")

findings = ExplorationFindings.load(FINDINGS_PATH)
target = resolve_target_column(_namespace, findings)

# Load data - prefer aggregated entity-level data for modeling
from customer_retention.analysis.auto_explorer.active_dataset_store import load_active_dataset
from customer_retention.core.config.column_config import DatasetGranularity
from customer_retention.stages.temporal import TEMPORAL_METADATA_COLS

if "_aggregated" in FINDINGS_PATH:
    source_path = Path(findings.source_path)
    if not source_path.is_absolute():
        source_path = Path("..") / source_path
    if source_path.is_dir():
        from customer_retention.integrations.adapters.factory import get_delta
        df = get_delta(force_local=True).read(str(source_path))
    elif source_path.is_file():
        df = pd.read_parquet(source_path)
    else:
        df = load_active_dataset(_namespace, dataset_name)
    data_source = f"aggregated:{source_path.name}"
elif dataset_name is None and _namespace:
    from customer_retention.integrations.adapters.factory import get_delta
    df = get_delta(force_local=True).read(str(_namespace.silver_merged_path))
    data_source = "silver_merged"
else:
    df = load_active_dataset(_namespace, dataset_name)
    data_source = dataset_name

charts = ChartBuilder()

print(f"\nLoaded {len(df):,} rows from: {data_source}")
Using: /Users/Vital/python/CustomerRetention/experiments/runs/retail-e7471284/datasets/customer_retention_retail/findings/customer_retention_retail_aggregated_findings.yaml
Loaded 30,770 rows from: aggregated:customer_retention_retail_aggregated

8.2 Prepare Data for Modeling¶

📖 Feature Source:

Features used in this notebook come from the ExplorationFindings generated in earlier notebooks:

  • Column types are auto-detected in notebook 01 (Data Discovery)
  • Target column is identified from the findings
  • Identifier columns are excluded to prevent data leakage
  • Text columns are excluded (require specialized NLP processing)

📖 Best Practices:

  1. Stratified Split: Maintains class ratios in train/test sets
  2. Scale After Split: Fit scaler on train only (prevents data leakage)
  3. Handle Missing: Impute or drop before scaling

📖 Transformations Applied:

  • Categorical variables → Label Encoded
  • Missing values → Median (numeric) or Mode (categorical)
  • Features → StandardScaler (fit on train only)
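The three transformations above can be sketched end-to-end on a toy frame (the column names and values are hypothetical). The key leakage-prevention detail is that both the imputation median and the scaler statistics come from the training split only:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy frame: a numeric column with a gap, a categorical column, a binary target
toy = pd.DataFrame({
    "spend": [10.0, 12.0, np.nan, 15.0, 11.0, 14.0, 13.0, 9.0],
    "plan": ["a", "b", "a", "b", "a", "a", "b", "b"],
    "churn": [0, 1, 0, 1, 0, 0, 1, 1],
})

# 1. Stratified split keeps the churn ratio identical in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    toy[["spend", "plan"]], toy["churn"],
    test_size=0.25, stratify=toy["churn"], random_state=0,
)
X_tr = X_tr.copy()
X_te = X_te.copy()

# 2. Label-encode categoricals; impute numerics with the TRAIN median only
le = LabelEncoder()
X_tr["plan"] = le.fit_transform(X_tr["plan"])
X_te["plan"] = le.transform(X_te["plan"])
train_median = X_tr["spend"].median()
X_tr["spend"] = X_tr["spend"].fillna(train_median)
X_te["spend"] = X_te["spend"].fillna(train_median)

# 3. Fit the scaler on train only, then apply the same transform to both splits
scaler = StandardScaler().fit(X_tr)
X_tr_s = scaler.transform(X_tr)
X_te_s = scaler.transform(X_te)
```

Fitting the scaler (or imputer) on the full dataset would let test-set statistics leak into training, inflating the benchmark.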
In [3]:
if not target:
    raise ValueError("No target column set. Please define one in exploration notebooks.")

y = df[target]

feature_cols = [
    name for name, col in findings.columns.items()
    if col.inferred_type not in NON_FEATURE_COLUMN_TYPES
    and name not in TEMPORAL_METADATA_COLS
]

print("=" * 70)
print("FEATURE SELECTION FROM FINDINGS")
print("=" * 70)
print(f"\n  Target Column: {target}")
print(f"  Features Selected: {len(feature_cols)}")

type_counts = {}
for name in feature_cols:
    col_type = findings.columns[name].inferred_type.value
    type_counts[col_type] = type_counts.get(col_type, 0) + 1

print("\n  Features by Type:")
for col_type, count in sorted(type_counts.items()):
    print(f"   {col_type}: {count}")

excluded = [name for name, col in findings.columns.items()
            if col.inferred_type in NON_FEATURE_COLUMN_TYPES]
if excluded:
    print(f"\n  Excluded Columns ({len(excluded)}): {', '.join(excluded[:10])}{'...' if len(excluded) > 10 else ''}")
======================================================================
FEATURE SELECTION FROM FINDINGS
======================================================================

  Target Column: retained
  Features Selected: 394

  Features by Type:
   binary: 47
   categorical_nominal: 1
   numeric_continuous: 79
   numeric_discrete: 267

  Excluded Columns (12): custid, retained, esent_middle, eopenrate_middle, eclickrate_middle, avgorder_middle, ordfreq_middle, paperless_middle, refill_middle, doorstep_middle...
In [4]:
# Check feature availability and remove problematic features
from customer_retention.stages.features.feature_selector import FeatureSelector

print("=" * 70)
print("FEATURE AVAILABILITY CHECK")
print("=" * 70)

unavailable_features = []
if findings.has_availability_issues:
    selector = FeatureSelector(target_column=target)
    availability_recs = selector.get_availability_recommendations(findings.feature_availability)
    unavailable_features = [rec.column for rec in availability_recs]

    print(f"\n⚠️  {len(availability_recs)} feature(s) have availability issues:\n")
    for rec in availability_recs:
        print(f"   • {rec.column} ({rec.issue_type}, {rec.coverage_pct:.0f}% coverage)")

    print("\n📋 Alternative approaches (for investigation):")
    print("   • segment_by_cohort: Train separate models per availability period")
    print("   • add_indicator: Create availability flags and impute missing")
    print("   • filter_window: Restrict data to feature's available period")

    original_count = len(feature_cols)
    feature_cols = [f for f in feature_cols if f not in unavailable_features]

    print(f"\n🗑️  Removed {original_count - len(feature_cols)} unavailable features")
    print(f"📊 Features remaining: {len(feature_cols)}")
else:
    print("\n✅ All features have full temporal coverage.")
======================================================================
FEATURE AVAILABILITY CHECK
======================================================================

✅ All features have full temporal coverage.
In [5]:
from customer_retention.analysis.auto_explorer.project_context import ProjectContext
from customer_retention.core.config.column_config import select_model_ready_columns
from customer_retention.stages.modeling import DataSplitter, SplitStrategy

_project_ctx = ProjectContext.load(_namespace.project_context_path) if _namespace and _namespace.project_context_path.exists() else None
_use_temporal = _project_ctx.intent.temporal_split if _project_ctx and _project_ctx.intent else False

X = select_model_ready_columns(df[feature_cols].copy())
feature_cols = X.columns.tolist()

_nan_target = y.isna().sum()
if _nan_target:
    _valid = y.notna()
    X, y, df = X.loc[_valid], y.loc[_valid], df.loc[_valid]
    print(f"Dropped {_nan_target} rows with missing target")

for col in X.select_dtypes(include=['object']).columns:
    le = LabelEncoder()
    X[col] = le.fit_transform(X[col].astype(str))

for col in X.columns:
    if X[col].isnull().any():
        _median = X[col].median()
        X[col] = X[col].fillna(_median if pd.notna(_median) else 0)

if _use_temporal and "as_of_date" in df.columns:
    _purge_gap = _project_ctx.intent.purge_gap_days if _project_ctx and _project_ctx.intent else 104
    _exclude = [c for c in ["as_of_date", "entity_id"] if c in X.columns]
    _split_df = pd.concat([X, y], axis=1)
    _split_df["as_of_date"] = df.loc[X.index, "as_of_date"].values
    if "entity_id" in df.columns:
        _split_df["entity_id"] = df.loc[X.index, "entity_id"].values
    _split_df = _split_df.sort_values("as_of_date").reset_index(drop=True)
    splitter = DataSplitter(
        target_column=target,
        strategy=SplitStrategy.TEMPORAL,
        temporal_column="as_of_date",
        test_size=0.2,
        purge_gap_days=_purge_gap,
        exclude_columns=_exclude,
    )
    _split_result = splitter.split(_split_df)
    X_train, X_test = _split_result.X_train, _split_result.X_test
    y_train, y_test = _split_result.y_train, _split_result.y_test
    _train_entities = _split_df.loc[X_train.index, "entity_id"] if "entity_id" in _split_df.columns else None
    _train_dates = _split_df.loc[X_train.index, "as_of_date"] if "as_of_date" in _split_df.columns else None
    _split_method = "temporal (purge gap)"
    print(f"Purge gap: {_purge_gap} days")
    print(f"Cutoff date: {_split_result.split_info.get('cutoff_date', 'N/A')}")
    print(f"Rows purged: {_split_result.split_info.get('purge_gap_rows', 0)}")
else:
    _split_df = pd.concat([X, y], axis=1)
    splitter = DataSplitter(
        target_column=target,
        strategy=SplitStrategy.RANDOM_STRATIFIED,
        test_size=0.2,
        random_state=42,
    )
    _split_result = splitter.split(_split_df)
    X_train, X_test = _split_result.X_train, _split_result.X_test
    y_train, y_test = _split_result.y_train, _split_result.y_test
    _train_entities = None
    _train_dates = None
    _split_method = "stratified random"

X_train = X_train.fillna(0)
X_test = X_test.fillna(0)

_zero_var = X_train.columns[X_train.std() == 0].tolist()
if _zero_var:
    X_train = X_train.drop(columns=_zero_var)
    X_test = X_test.drop(columns=_zero_var)
    print(f"Dropped {len(_zero_var)} zero-variance columns")

scaler = StandardScaler()
X_train_scaled = pd.DataFrame(scaler.fit_transform(X_train), columns=X_train.columns)
X_test_scaled = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns)
X_train_scaled = X_train_scaled.fillna(0)
X_test_scaled = X_test_scaled.fillna(0)

print(f"\nSplit method: {_split_method}")
print(f"Train size: {len(X_train):,} ({len(X_train)/len(X)*100:.0f}%)")
print(f"Test size: {len(X_test):,} ({len(X_test)/len(X)*100:.0f}%)")
print("\nTrain class distribution:")
print(f"  Retained (1): {(y_train == 1).sum():,} ({(y_train == 1).sum()/len(y_train)*100:.1f}%)")
print(f"  Churned (0): {(y_train == 0).sum():,} ({(y_train == 0).sum()/len(y_train)*100:.1f}%)")
Dropped 1 rows with missing target
Dropped 212 zero-variance columns

Split method: stratified random
Train size: 24,615 (80%)
Test size: 6,154 (20%)

Train class distribution:
  Retained (1): 19,562 (79.5%)
  Churned (0): 5,053 (20.5%)

8.3 Baseline Models (with Class Weights)¶

📖 Using Class Weights:

  • class_weight='balanced' automatically adjusts weights inversely proportional to class frequencies
  • This helps models pay more attention to the minority class (churned customers)
  • Without weights, models may just predict "retained" for everyone
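Under the hood, `class_weight='balanced'` weights each class as `n_samples / (n_classes * count(class))`, so the minority class contributes as much total loss as the majority. A small sketch, using class counts similar to this split's roughly 4:1 ratio:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# ~79.5% retained (1), ~20.5% churned (0), echoing this notebook's split
y = np.array([1] * 795 + [0] * 205)

n_samples, n_classes = len(y), 2
counts = np.bincount(y)                     # [205, 795]
weights = n_samples / (n_classes * counts)  # the 'balanced' formula, per class

# Cross-check against scikit-learn's own helper
sk_weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y
)
```

The minority (churned) class ends up with roughly a 4x larger weight, and each class's total weight sums to `n_samples / n_classes`.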
In [6]:
import warnings

import numpy as np

from customer_retention.stages.modeling import CrossValidator, CVStrategy

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000, random_state=42, class_weight='balanced'),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1, class_weight='balanced'),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, random_state=42)
}

_is_binary = y.nunique() == 2
_avg = "binary" if _is_binary else "weighted"
_cv_scoring = "roc_auc" if _is_binary else "f1_weighted"

def _safe_auc(y_true, y_score, model_classes=None):
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            if model_classes is None:
                return roc_auc_score(y_true, y_score)
            return roc_auc_score(y_true, y_score, multi_class='ovr', labels=model_classes)
    except ValueError:
        return float('nan')

results = []
model_predictions = {}

for name, model in models.items():
    print(f"Training {name}...")

    _use_scaled = "Logistic" in name
    _X_fit, _X_eval = (X_train_scaled, X_test_scaled) if _use_scaled else (X_train, X_test)

    model.fit(_X_fit, y_train)
    y_pred = model.predict(_X_eval)
    y_pred_proba = model.predict_proba(_X_eval)

    if _is_binary:
        y_score = y_pred_proba[:, 1]
        auc = _safe_auc(y_test, y_score)
        pr_auc = average_precision_score(y_test, y_score)
    else:
        y_score = y_pred_proba
        auc = _safe_auc(y_test, y_score, model.classes_)
        pr_auc = float('nan')

    f1 = f1_score(y_test, y_pred, average=_avg, zero_division=0)
    precision = precision_score(y_test, y_pred, average=_avg, zero_division=0)
    recall = recall_score(y_test, y_pred, average=_avg, zero_division=0)

    if _use_temporal and _train_entities is not None:
        _cv = CrossValidator(strategy=CVStrategy.TEMPORAL_ENTITY, n_splits=5, scoring=_cv_scoring, purge_gap_days=_purge_gap)
        _cv_result = _cv.run(model, _X_fit, y_train, groups=_train_entities, temporal_values=_train_dates)
        cv_scores = _cv_result.cv_scores
    else:
        cv_scores = cross_val_score(model, _X_fit, y_train, cv=5, scoring=_cv_scoring)

    results.append({
        "Model": name, "Test AUC": auc, "PR-AUC": pr_auc,
        "F1-Score": f1, "Precision": precision, "Recall": recall,
        "CV Score Mean": cv_scores.mean(), "CV Score Std": cv_scores.std()
    })

    model_predictions[name] = {
        'y_pred': y_pred, 'y_pred_proba': y_score, 'model': model
    }

results_df = pd.DataFrame(results).round(4)

_cv_method = "temporal entity (GroupKFold + purge)" if (_use_temporal and _train_entities is not None) else "stratified 5-fold"
_class_type = "binary" if _is_binary else f"multiclass ({y.nunique()} classes)"
_cv_metric = "AUC" if _is_binary else "F1-weighted"
print(f"\nCV method: {_cv_method}")
print(f"CV metric: {_cv_metric}")
print(f"Classification type: {_class_type}")
print("\n" + "=" * 80)
print("MODEL COMPARISON")
print("=" * 80)
display_table(results_df)
Training Logistic Regression...
Training Random Forest...
Training Gradient Boosting...
CV method: stratified 5-fold
CV metric: AUC
Classification type: binary

================================================================================
MODEL COMPARISON
================================================================================
| Model | Test AUC | PR-AUC | F1-Score | Precision | Recall | CV Score Mean | CV Score Std |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Logistic Regression | 0.9685 | 0.9886 | 0.9436 | 0.9785 | 0.9111 | 0.9696 | 0.0027 |
| Random Forest | 0.9818 | 0.9923 | 0.9782 | 0.9663 | 0.9904 | 0.9824 | 0.0027 |
| Gradient Boosting | 0.9825 | 0.9937 | 0.9805 | 0.9704 | 0.9908 | 0.9849 | 0.0024 |

8.4 Feature Importance (Random Forest)¶

In [7]:
rf_model = models["Random Forest"]
importance_df = pd.DataFrame({
    "Feature": X_train.columns.tolist(),
    "Importance": rf_model.feature_importances_
}).sort_values("Importance", ascending=False)

top_n = 15
top_features = importance_df.head(top_n)

fig = charts.bar_chart(
    top_features["Feature"].tolist(),
    top_features["Importance"].tolist(),
    title=f"Top {top_n} Feature Importances"
)
display_figure(fig)
[Figure: bar chart of the top 15 feature importances from the Random Forest]

8.5 Classification Report (Best Model)¶

In [8]:
best_model = models["Gradient Boosting"]
y_pred = best_model.predict(X_test)

print("Classification Report (Gradient Boosting):")
print(classification_report(y_test, y_pred))
Classification Report (Gradient Boosting):
              precision    recall  f1-score   support

         0.0       0.96      0.88      0.92      1263
         1.0       0.97      0.99      0.98      4891

    accuracy                           0.97      6154
   macro avg       0.97      0.94      0.95      6154
weighted avg       0.97      0.97      0.97      6154

8.6 Model Comparison Grid¶

This visualization shows all models side-by-side with:

  • Row 1: Confusion matrices (counts and percentages)
  • Row 2: ROC curves with AUC scores
  • Row 3: Precision-Recall curves with PR-AUC scores

📖 How to Read:

  • Confusion Matrix: Diagonal = correct predictions. Off-diagonal = errors.
  • ROC Curve: Higher curve = better. AUC > 0.8 is good, > 0.9 is excellent.
  • PR Curve: Higher curve = better at finding positives without false alarms.
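The two main ingredients of the grid can be sketched on toy predictions (the arrays below are illustrative): the confusion-matrix cells, and the ROC curve points that scikit-learn produces:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.6, 0.2, 0.4, 0.8, 0.9, 0.7, 0.95, 0.55])
y_pred = (y_score >= 0.5).astype(int)

# Confusion matrix: rows = actual class, columns = predicted class.
# Diagonal cells are correct predictions; off-diagonal cells are errors.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

# ROC curve: true-positive rate vs false-positive rate at every threshold;
# AUC summarizes the whole curve in one number.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
```

Reading `cm` row by row recovers exactly the counts shown in the grid's top row; the `(fpr, tpr)` pairs are the points traced by each ROC panel.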
In [9]:
grid_results = {
    name: {"y_pred": data["y_pred"], "y_pred_proba": data["y_pred_proba"]}
    for name, data in model_predictions.items()
}

if _is_binary:
    fig = charts.model_comparison_grid(
        grid_results, y_test,
        class_labels=["Churned (0)", "Retained (1)"],
        title="Model Comparison: Confusion Matrix | ROC Curve | Precision-Recall"
    )
    display_figure(fig)
else:
    import numpy as np
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots
    from sklearn.metrics import confusion_matrix

    model_names = list(grid_results.keys())
    n_models = len(model_names)
    fig = make_subplots(rows=1, cols=n_models, subplot_titles=[f"{n[:20]}" for n in model_names])
    for i, name in enumerate(model_names):
        cm = confusion_matrix(y_test, grid_results[name]["y_pred"])
        fig.add_trace(go.Heatmap(
            z=cm, x=list(range(cm.shape[1])), y=list(range(cm.shape[0])),
            text=cm.astype(str), texttemplate="%{text}", showscale=False,
            colorscale="Blues",
        ), row=1, col=i + 1)
    fig.update_layout(title="Model Comparison: Confusion Matrices (multiclass)", height=400, width=350 * n_models + 50)
    display_figure(fig)

print("\n" + "=" * 80)
print("METRICS SUMMARY")
print("=" * 80)
_metrics_cols = ["Model", "Test AUC", "F1-Score", "Precision", "Recall"]
if _is_binary:
    _metrics_cols.insert(2, "PR-AUC")
display_table(results_df[_metrics_cols])
[Figure: model comparison grid — confusion matrices, ROC curves, and Precision-Recall curves for all three models]
================================================================================
METRICS SUMMARY
================================================================================
| Model | Test AUC | PR-AUC | F1-Score | Precision | Recall |
| --- | --- | --- | --- | --- | --- |
| Logistic Regression | 0.9685 | 0.9886 | 0.9436 | 0.9785 | 0.9111 |
| Random Forest | 0.9818 | 0.9923 | 0.9782 | 0.9663 | 0.9904 |
| Gradient Boosting | 0.9825 | 0.9937 | 0.9805 | 0.9704 | 0.9908 |

8.6.1 Individual Model Analysis¶

The grid above shows all models together. Below is a detailed, per-model analysis.

In [10]:
print("=" * 70)
print("CLASSIFICATION REPORTS BY MODEL")
print("=" * 70)

_target_names = ["Churned", "Retained"] if _is_binary else None

for name, data in model_predictions.items():
    print(f"\n{'='*40}")
    print(f"  {name}")
    print('='*40)
    print(classification_report(y_test, data['y_pred'], target_names=_target_names, zero_division=0))
======================================================================
CLASSIFICATION REPORTS BY MODEL
======================================================================

========================================
  Logistic Regression
========================================
              precision    recall  f1-score   support

     Churned       0.73      0.92      0.81      1263
    Retained       0.98      0.91      0.94      4891

    accuracy                           0.91      6154
   macro avg       0.85      0.92      0.88      6154
weighted avg       0.93      0.91      0.92      6154


========================================
  Random Forest
========================================
              precision    recall  f1-score   support

     Churned       0.96      0.87      0.91      1263
    Retained       0.97      0.99      0.98      4891

    accuracy                           0.96      6154
   macro avg       0.96      0.93      0.94      6154
weighted avg       0.96      0.96      0.96      6154


========================================
  Gradient Boosting
========================================
              precision    recall  f1-score   support

     Churned       0.96      0.88      0.92      1263
    Retained       0.97      0.99      0.98      4891

    accuracy                           0.97      6154
   macro avg       0.97      0.94      0.95      6154
weighted avg       0.97      0.97      0.97      6154

8.6.2 Precision-Recall Curves¶

📖 Why PR Curves for Imbalanced Data:

  • ROC curves can look optimistic for imbalanced data
  • PR curves focus on the minority class (churners)
  • Better at showing how well we detect actual churners

📖 How to Read:

  • Baseline (dashed line) = proportion of positives in the data
  • Higher curve = better at finding churners without too many false alarms
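This baseline behaviour is easy to demonstrate: for an uninformative classifier, PR-AUC collapses to the positive rate, while ROC-AUC would hover near 0.5 regardless of imbalance. A sketch on synthetic labels with ~20% positives, mirroring the churn rate in this dataset:

```python
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.2).astype(int)  # ~20% positives
random_scores = rng.random(10_000)               # uninformative "model"

baseline = y_true.mean()  # the dashed-line baseline = proportion of positives
pr_auc_random = average_precision_score(y_true, random_scores)
# pr_auc_random lands near `baseline` (~0.2), not near 0.5 -- which is why
# PR curves give a more honest picture than ROC curves on imbalanced data.
```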

8.7 Key Takeaways¶

📖 Interpreting Results:

In [11]:
_primary_metric = "Test AUC" if results_df["Test AUC"].notna().any() else "F1-Score"
best_model = results_df.loc[results_df[_primary_metric].idxmax()]

print("=" * 70)
print("KEY TAKEAWAYS")
print("=" * 70)

print(f"\n  BEST MODEL (by {_primary_metric}): {best_model['Model']}")
if results_df["Test AUC"].notna().any():
    print(f"   Test AUC: {best_model['Test AUC']:.4f}")
if _is_binary:
    print(f"   PR-AUC: {best_model['PR-AUC']:.4f}")
print(f"   F1-Score: {best_model['F1-Score']:.4f}")

print("\n  TOP 3 IMPORTANT FEATURES:")
for i, feat in enumerate(importance_df.head(3)['Feature'].tolist(), 1):
    imp = importance_df[importance_df['Feature'] == feat]['Importance'].values[0]
    print(f"   {i}. {feat} ({imp:.3f})")

_best_score = best_model[_primary_metric]
print("\n  MODEL PERFORMANCE ASSESSMENT:")
if _best_score > 0.90:
    print("   Excellent predictive signal - likely production-ready with tuning")
elif _best_score > 0.80:
    print("   Strong predictive signal - good baseline for improvement")
elif _best_score > 0.70:
    print("   Moderate signal - consider more feature engineering")
else:
    print("   Weak signal - may need more data or different features")

print("\n  NEXT STEPS:")
print("   1. Feature engineering with derived features (notebook 05)")
print("   2. Hyperparameter tuning (GridSearchCV)")
print("   3. Threshold optimization for business metrics")
print("   4. A/B testing in production")
======================================================================
KEY TAKEAWAYS
======================================================================

  BEST MODEL (by Test AUC): Gradient Boosting
   Test AUC: 0.9825
   PR-AUC: 0.9937
   F1-Score: 0.9805

  TOP 3 IMPORTANT FEATURES:
   1. esent_mean_all_time (0.098)
   2. esent_vs_cohort_mean (0.092)
   3. lag0_esent_sum (0.086)

  MODEL PERFORMANCE ASSESSMENT:
   Excellent predictive signal - likely production-ready with tuning

  NEXT STEPS:
   1. Feature engineering with derived features (notebook 05)
   2. Hyperparameter tuning (GridSearchCV)
   3. Threshold optimization for business metrics
   4. A/B testing in production
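One of the listed next steps, threshold optimization, can be sketched with `precision_recall_curve`. The example below picks the F1-maximizing threshold on synthetic labels and scores (both hypothetical); in the notebook itself you would pass `y_test` and the best model's predicted probabilities instead:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Synthetic stand-ins for y_test and model probabilities
rng = np.random.default_rng(42)
y_true = (rng.random(2_000) < 0.2).astype(int)
scores = np.clip(y_true * 0.4 + rng.random(2_000) * 0.6, 0, 1)  # weakly informative

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# F1 at every candidate threshold (guard the denominator against 0/0)
f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)

# precision/recall have one more entry than thresholds (the final (1, 0)
# point has no threshold), so drop the last point before taking the argmax
best_idx = int(np.argmax(f1[:-1]))
best_threshold = float(thresholds[best_idx])
```

In practice the objective would be a business metric (e.g. expected intervention cost vs retained revenue) rather than F1, but the mechanics of sweeping thresholds are the same.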

Summary: What We Learned¶

In this notebook, we trained baseline models and established performance benchmarks:

  1. Data Preparation - Proper train/test split with stratification and scaling
  2. Class Imbalance Handling - Used balanced class weights
  3. Model Comparison - Compared Logistic Regression, Random Forest, and Gradient Boosting
  4. Multiple Metrics - Evaluated with AUC, PR-AUC, F1, Precision, Recall
  5. Feature Importance - Identified the most predictive features

Key Results for This Dataset¶

| Metric | Value | Interpretation |
| --- | --- | --- |
| Best AUC | ~0.98 | Excellent discrimination |
| Top Feature | esent | Email engagement is critical |
| Imbalance | ~4:1 | Moderate, handled with class weights |

Next Steps¶

Continue to 09_business_alignment.ipynb to:

  • Align model performance with business objectives
  • Define intervention strategies by risk level
  • Calculate expected ROI from the model
  • Set deployment requirements
In [12]:
_best_score_val = results_df[_primary_metric].max()

print("Key Takeaways:")
print("="*50)
print(f"Best baseline {_primary_metric}: {_best_score_val:.4f}")
print(f"Top 3 important features: {', '.join(importance_df.head(3)['Feature'].tolist())}")

if _best_score_val > 0.85:
    print("\nStrong predictive signal detected. Data is well-suited for modeling.")
elif _best_score_val > 0.70:
    print("\nModerate predictive signal. Consider feature engineering for improvement.")
else:
    print("\nWeak predictive signal. May need more features or data.")
Key Takeaways:
==================================================
Best baseline Test AUC: 0.9825
Top 3 important features: esent_mean_all_time, esent_vs_cohort_mean, lag0_esent_sum

Strong predictive signal detected. Data is well-suited for modeling.

Next Steps¶

Continue to 09_business_alignment.ipynb to align with business objectives.

Save Reminder: Save this notebook (Ctrl+S / Cmd+S) before running the next one. The next notebook will automatically export this notebook's HTML documentation from the saved file.