Chapter 8: Baseline Experiments¶
Purpose: Train baseline models to understand data predictability and establish performance benchmarks.
What you'll learn:
- How to prepare data for ML with proper train/test splitting
- How to handle class imbalance with class weights
- How to evaluate models with appropriate metrics (not just accuracy!)
- How to interpret feature importance
Outputs:
- Baseline model performance (AUC, Precision, Recall, F1)
- Feature importance rankings
- ROC and Precision-Recall curves
- Performance benchmarks for comparison
Evaluation Metrics for Imbalanced Data¶
| Metric | What It Measures | When to Use |
|---|---|---|
| AUC-ROC | Ranking quality across thresholds | General model comparison |
| Precision | "Of predicted churned, how many are correct?" | When false positives are costly |
| Recall | "Of actual churned, how many did we catch?" | When missing churners is costly |
| F1-Score | Balance of precision and recall | When both matter equally |
| PR-AUC | Area under the Precision-Recall curve | Better for imbalanced data |
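For concreteness, here is a minimal sketch of how each metric in the table is computed with scikit-learn. The arrays are toy values, not from this dataset:

```python
# Toy example (not this dataset): 20% positives, scored by a classifier.
import numpy as np
from sklearn.metrics import (
    average_precision_score, f1_score, precision_score,
    recall_score, roc_auc_score,
)

y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_score = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.4, 0.1, 0.7, 0.6, 0.9])
y_pred = (y_score >= 0.5).astype(int)  # default 0.5 threshold

print(f"AUC-ROC:   {roc_auc_score(y_true, y_score):.3f}")           # ranking quality across thresholds
print(f"Precision: {precision_score(y_true, y_pred):.3f}")          # correct among predicted positives
print(f"Recall:    {recall_score(y_true, y_pred):.3f}")             # actual positives caught
print(f"F1:        {f1_score(y_true, y_pred):.3f}")                 # harmonic mean of precision and recall
print(f"PR-AUC:    {average_precision_score(y_true, y_score):.3f}") # area under the PR curve
```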
8.1 Setup¶
Show/Hide Code
from customer_retention.analysis.notebook_progress import track_and_export_previous
track_and_export_previous("08_baseline_experiments.ipynb")
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
average_precision_score,
classification_report,
f1_score,
precision_score,
recall_score,
roc_auc_score,
)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from customer_retention.analysis.auto_explorer import ExplorationFindings
from customer_retention.analysis.visualization import ChartBuilder, display_figure, display_table
from customer_retention.core.config.column_config import NON_FEATURE_COLUMN_TYPES, ColumnType
from customer_retention.core.config.experiments import (
FINDINGS_DIR,
)
Show/Hide Code
from pathlib import Path
from customer_retention.analysis.auto_explorer import load_notebook_findings, resolve_target_column
FINDINGS_PATH, _namespace, dataset_name = load_notebook_findings(
"08_baseline_experiments.ipynb", prefer_aggregated=True
)
print(f"Using: {FINDINGS_PATH}")
findings = ExplorationFindings.load(FINDINGS_PATH)
target = resolve_target_column(_namespace, findings)
# Load data - prefer aggregated entity-level data for modeling
from customer_retention.analysis.auto_explorer.active_dataset_store import load_active_dataset
from customer_retention.core.config.column_config import DatasetGranularity
from customer_retention.stages.temporal import TEMPORAL_METADATA_COLS
if "_aggregated" in FINDINGS_PATH:
source_path = Path(findings.source_path)
if not source_path.is_absolute():
source_path = Path("..") / source_path
if source_path.is_dir():
from customer_retention.integrations.adapters.factory import get_delta
df = get_delta(force_local=True).read(str(source_path))
elif source_path.is_file():
df = pd.read_parquet(source_path)
else:
df = load_active_dataset(_namespace, dataset_name)
data_source = f"aggregated:{source_path.name}"
elif dataset_name is None and _namespace:
from customer_retention.integrations.adapters.factory import get_delta
df = get_delta(force_local=True).read(str(_namespace.silver_merged_path))
data_source = "silver_merged"
else:
df = load_active_dataset(_namespace, dataset_name)
data_source = dataset_name
charts = ChartBuilder()
print(f"\nLoaded {len(df):,} rows from: {data_source}")
Using: /Users/Vital/python/CustomerRetention/experiments/runs/retail-e7471284/datasets/customer_retention_retail/findings/customer_retention_retail_aggregated_findings.yaml
Loaded 30,770 rows from: aggregated:customer_retention_retail_aggregated
8.2 Prepare Data for Modeling¶
📖 Feature Source:
Features used in this notebook come from the ExplorationFindings generated in earlier notebooks:
- Column types are auto-detected in notebook 01 (Data Discovery)
- Target column is identified from the findings
- Identifier columns are excluded to prevent data leakage
- Text columns are excluded (require specialized NLP processing)
📖 Best Practices:
- Stratified Split: Maintains class ratios in train/test sets
- Scale After Split: Fit scaler on train only (prevents data leakage)
- Handle Missing: Impute or drop before scaling
📖 Transformations Applied:
- Categorical variables → Label Encoded
- Missing values → Median (numeric) or Mode (categorical)
- Features → StandardScaler (fit on train only)
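A condensed sketch of that order of operations, assuming `X` and `y` are the prepared feature matrix and target (the actual cells below drive this from the findings):

```python
# Sketch: split first, then fit imputation and scaling on the training
# fold only, so no test-set statistics leak into training.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42  # stratified: preserves class ratios
)
train_medians = X_tr.median(numeric_only=True)  # learn imputation values from train...
X_tr = X_tr.fillna(train_medians)
X_te = X_te.fillna(train_medians)               # ...and reuse them on test

scaler = StandardScaler().fit(X_tr)             # fit scaler on train only
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)
```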
Show/Hide Code
if not target:
raise ValueError("No target column set. Please define one in exploration notebooks.")
y = df[target]
feature_cols = [
name for name, col in findings.columns.items()
if col.inferred_type not in NON_FEATURE_COLUMN_TYPES
and name not in TEMPORAL_METADATA_COLS
]
print("=" * 70)
print("FEATURE SELECTION FROM FINDINGS")
print("=" * 70)
print(f"\n Target Column: {target}")
print(f" Features Selected: {len(feature_cols)}")
type_counts = {}
for name in feature_cols:
col_type = findings.columns[name].inferred_type.value
type_counts[col_type] = type_counts.get(col_type, 0) + 1
print("\n Features by Type:")
for col_type, count in sorted(type_counts.items()):
print(f" {col_type}: {count}")
excluded = [name for name, col in findings.columns.items()
if col.inferred_type in NON_FEATURE_COLUMN_TYPES]
if excluded:
print(f"\n Excluded Columns ({len(excluded)}): {', '.join(excluded[:10])}{'...' if len(excluded) > 10 else ''}")
======================================================================
FEATURE SELECTION FROM FINDINGS
======================================================================

 Target Column: retained
 Features Selected: 394

 Features by Type:
    binary: 47
    categorical_nominal: 1
    numeric_continuous: 79
    numeric_discrete: 267

 Excluded Columns (12): custid, retained, esent_middle, eopenrate_middle, eclickrate_middle, avgorder_middle, ordfreq_middle, paperless_middle, refill_middle, doorstep_middle...
Show/Hide Code
# Check feature availability and remove problematic features
from customer_retention.stages.features.feature_selector import FeatureSelector
print("=" * 70)
print("FEATURE AVAILABILITY CHECK")
print("=" * 70)
unavailable_features = []
if findings.has_availability_issues:
selector = FeatureSelector(target_column=target)
availability_recs = selector.get_availability_recommendations(findings.feature_availability)
unavailable_features = [rec.column for rec in availability_recs]
print(f"\n⚠️ {len(availability_recs)} feature(s) have availability issues:\n")
for rec in availability_recs:
print(f" • {rec.column} ({rec.issue_type}, {rec.coverage_pct:.0f}% coverage)")
print("\n📋 Alternative approaches (for investigation):")
print(" • segment_by_cohort: Train separate models per availability period")
print(" • add_indicator: Create availability flags and impute missing")
print(" • filter_window: Restrict data to feature's available period")
original_count = len(feature_cols)
feature_cols = [f for f in feature_cols if f not in unavailable_features]
print(f"\n🗑️ Removed {original_count - len(feature_cols)} unavailable features")
print(f"📊 Features remaining: {len(feature_cols)}")
else:
print("\n✅ All features have full temporal coverage.")
======================================================================
FEATURE AVAILABILITY CHECK
======================================================================

✅ All features have full temporal coverage.
Show/Hide Code
from customer_retention.analysis.auto_explorer.project_context import ProjectContext
from customer_retention.core.config.column_config import select_model_ready_columns
from customer_retention.stages.modeling import DataSplitter, SplitStrategy
_project_ctx = ProjectContext.load(_namespace.project_context_path) if _namespace and _namespace.project_context_path.exists() else None
_use_temporal = _project_ctx.intent.temporal_split if _project_ctx and _project_ctx.intent else False
X = select_model_ready_columns(df[feature_cols].copy())
feature_cols = X.columns.tolist()
_nan_target = y.isna().sum()
if _nan_target:
_valid = y.notna()
X, y, df = X.loc[_valid], y.loc[_valid], df.loc[_valid]
print(f"Dropped {_nan_target} rows with missing target")
for col in X.select_dtypes(include=['object']).columns:
le = LabelEncoder()
X[col] = le.fit_transform(X[col].astype(str))
for col in X.columns:
if X[col].isnull().any():
_median = X[col].median()
X[col] = X[col].fillna(_median if pd.notna(_median) else 0)
if _use_temporal and "as_of_date" in df.columns:
_purge_gap = _project_ctx.intent.purge_gap_days if _project_ctx and _project_ctx.intent else 104
_exclude = [c for c in ["as_of_date", "entity_id"] if c in X.columns]
_split_df = pd.concat([X, y], axis=1)
_split_df["as_of_date"] = df.loc[X.index, "as_of_date"].values
if "entity_id" in df.columns:
_split_df["entity_id"] = df.loc[X.index, "entity_id"].values
_split_df = _split_df.sort_values("as_of_date").reset_index(drop=True)
splitter = DataSplitter(
target_column=target,
strategy=SplitStrategy.TEMPORAL,
temporal_column="as_of_date",
test_size=0.2,
purge_gap_days=_purge_gap,
exclude_columns=_exclude,
)
_split_result = splitter.split(_split_df)
X_train, X_test = _split_result.X_train, _split_result.X_test
y_train, y_test = _split_result.y_train, _split_result.y_test
_train_entities = _split_df.loc[X_train.index, "entity_id"] if "entity_id" in _split_df.columns else None
_train_dates = _split_df.loc[X_train.index, "as_of_date"] if "as_of_date" in _split_df.columns else None
_split_method = "temporal (purge gap)"
print(f"Purge gap: {_purge_gap} days")
print(f"Cutoff date: {_split_result.split_info.get('cutoff_date', 'N/A')}")
print(f"Rows purged: {_split_result.split_info.get('purge_gap_rows', 0)}")
else:
_split_df = pd.concat([X, y], axis=1)
splitter = DataSplitter(
target_column=target,
strategy=SplitStrategy.RANDOM_STRATIFIED,
test_size=0.2,
random_state=42,
)
_split_result = splitter.split(_split_df)
X_train, X_test = _split_result.X_train, _split_result.X_test
y_train, y_test = _split_result.y_train, _split_result.y_test
_train_entities = None
_train_dates = None
_split_method = "stratified random"
X_train = X_train.fillna(0)
X_test = X_test.fillna(0)
_zero_var = X_train.columns[X_train.std() == 0].tolist()
if _zero_var:
X_train = X_train.drop(columns=_zero_var)
X_test = X_test.drop(columns=_zero_var)
print(f"Dropped {len(_zero_var)} zero-variance columns")
scaler = StandardScaler()
X_train_scaled = pd.DataFrame(scaler.fit_transform(X_train), columns=X_train.columns)
X_test_scaled = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns)
X_train_scaled = X_train_scaled.fillna(0)
X_test_scaled = X_test_scaled.fillna(0)
print(f"\nSplit method: {_split_method}")
print(f"Train size: {len(X_train):,} ({len(X_train)/len(X)*100:.0f}%)")
print(f"Test size: {len(X_test):,} ({len(X_test)/len(X)*100:.0f}%)")
print("\nTrain class distribution:")
print(f" Retained (1): {(y_train == 1).sum():,} ({(y_train == 1).sum()/len(y_train)*100:.1f}%)")
print(f" Churned (0): {(y_train == 0).sum():,} ({(y_train == 0).sum()/len(y_train)*100:.1f}%)")
Dropped 1 rows with missing target
Dropped 212 zero-variance columns

Split method: stratified random
Train size: 24,615 (80%)
Test size: 6,154 (20%)

Train class distribution:
  Retained (1): 19,562 (79.5%)
  Churned (0): 5,053 (20.5%)
8.3 Baseline Models (with Class Weights)¶
📖 Using Class Weights:
- `class_weight='balanced'` automatically adjusts weights inversely proportional to class frequencies
- This helps models pay more attention to the minority class (churned customers)
- Without weights, models may just predict "retained" for everyone
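Under the hood, `'balanced'` assigns each class the weight n_samples / (n_classes × class count). A quick sketch with an illustrative 4:1 imbalance like this dataset's:

```python
# class_weight='balanced' derives weight_c = n_samples / (n_classes * count_c),
# so the minority class is upweighted in the loss. The 4:1 split here is
# illustrative, not this dataset's exact counts.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_demo = np.array([1] * 800 + [0] * 200)
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y_demo)
print(dict(zip([0, 1], weights)))  # {0: 2.5, 1: 0.625} -> minority class weighted ~4x
```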
Show/Hide Code
import warnings
import numpy as np
from customer_retention.stages.modeling import CrossValidator, CVStrategy
models = {
"Logistic Regression": LogisticRegression(max_iter=1000, random_state=42, class_weight='balanced'),
"Random Forest": RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1, class_weight='balanced'),
"Gradient Boosting": GradientBoostingClassifier(n_estimators=100, random_state=42)
}
_is_binary = y.nunique() == 2
_avg = "binary" if _is_binary else "weighted"
_cv_scoring = "roc_auc" if _is_binary else "f1_weighted"
def _safe_auc(y_true, y_score, model_classes=None):
try:
with warnings.catch_warnings():
warnings.simplefilter("ignore")
if model_classes is None:
return roc_auc_score(y_true, y_score)
return roc_auc_score(y_true, y_score, multi_class='ovr', labels=model_classes)
except ValueError:
return float('nan')
results = []
model_predictions = {}
for name, model in models.items():
print(f"Training {name}...")
_use_scaled = "Logistic" in name
_X_fit, _X_eval = (X_train_scaled, X_test_scaled) if _use_scaled else (X_train, X_test)
model.fit(_X_fit, y_train)
y_pred = model.predict(_X_eval)
y_pred_proba = model.predict_proba(_X_eval)
if _is_binary:
y_score = y_pred_proba[:, 1]
auc = _safe_auc(y_test, y_score)
pr_auc = average_precision_score(y_test, y_score)
else:
y_score = y_pred_proba
auc = _safe_auc(y_test, y_score, model.classes_)
pr_auc = float('nan')
f1 = f1_score(y_test, y_pred, average=_avg, zero_division=0)
precision = precision_score(y_test, y_pred, average=_avg, zero_division=0)
recall = recall_score(y_test, y_pred, average=_avg, zero_division=0)
if _use_temporal and _train_entities is not None:
_cv = CrossValidator(strategy=CVStrategy.TEMPORAL_ENTITY, n_splits=5, scoring=_cv_scoring, purge_gap_days=_purge_gap)
_cv_result = _cv.run(model, _X_fit, y_train, groups=_train_entities, temporal_values=_train_dates)
cv_scores = _cv_result.cv_scores
else:
cv_scores = cross_val_score(model, _X_fit, y_train, cv=5, scoring=_cv_scoring)
results.append({
"Model": name, "Test AUC": auc, "PR-AUC": pr_auc,
"F1-Score": f1, "Precision": precision, "Recall": recall,
"CV Score Mean": cv_scores.mean(), "CV Score Std": cv_scores.std()
})
model_predictions[name] = {
'y_pred': y_pred, 'y_pred_proba': y_score, 'model': model
}
results_df = pd.DataFrame(results).round(4)
_cv_method = "temporal entity (GroupKFold + purge)" if (_use_temporal and _train_entities is not None) else "stratified 5-fold"
_class_type = "binary" if _is_binary else f"multiclass ({y.nunique()} classes)"
_cv_metric = "AUC" if _is_binary else "F1-weighted"
print(f"\nCV method: {_cv_method}")
print(f"CV metric: {_cv_metric}")
print(f"Classification type: {_class_type}")
print("\n" + "=" * 80)
print("MODEL COMPARISON")
print("=" * 80)
display_table(results_df)
Training Logistic Regression...
Training Random Forest...
Training Gradient Boosting...
CV method: stratified 5-fold
CV metric: AUC
Classification type: binary

================================================================================
MODEL COMPARISON
================================================================================
| Model | Test AUC | PR-AUC | F1-Score | Precision | Recall | CV Score Mean | CV Score Std |
|---|---|---|---|---|---|---|---|
| Logistic Regression | 0.9685 | 0.9886 | 0.9436 | 0.9785 | 0.9111 | 0.9696 | 0.0027 |
| Random Forest | 0.9818 | 0.9923 | 0.9782 | 0.9663 | 0.9904 | 0.9824 | 0.0027 |
| Gradient Boosting | 0.9825 | 0.9937 | 0.9805 | 0.9704 | 0.9908 | 0.9849 | 0.0024 |
8.4 Feature Importance (Random Forest)¶
Show/Hide Code
rf_model = models["Random Forest"]
importance_df = pd.DataFrame({
"Feature": X_train.columns.tolist(),
"Importance": rf_model.feature_importances_
}).sort_values("Importance", ascending=False)
top_n = 15
top_features = importance_df.head(top_n)
fig = charts.bar_chart(
top_features["Feature"].tolist(),
top_features["Importance"].tolist(),
title=f"Top {top_n} Feature Importances"
)
display_figure(fig)
8.5 Classification Report (Best Model)¶
Show/Hide Code
best_model = models["Gradient Boosting"]
y_pred = best_model.predict(X_test)
print("Classification Report (Gradient Boosting):")
print(classification_report(y_test, y_pred))
Classification Report (Gradient Boosting):
precision recall f1-score support
0.0 0.96 0.88 0.92 1263
1.0 0.97 0.99 0.98 4891
accuracy 0.97 6154
macro avg 0.97 0.94 0.95 6154
weighted avg 0.97 0.97 0.97 6154
8.6 Model Comparison Grid¶
This visualization shows all models side-by-side with:
- Row 1: Confusion matrices (counts and percentages)
- Row 2: ROC curves with AUC scores
- Row 3: Precision-Recall curves with PR-AUC scores
📖 How to Read:
- Confusion Matrix: Diagonal = correct predictions. Off-diagonal = errors.
- ROC Curve: Higher curve = better. AUC > 0.8 is good, > 0.9 is excellent.
- PR Curve: Higher curve = better at finding positives without false alarms.
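As a reference for row 1, the counts and row-normalized percentages can be reproduced directly from the stored predictions (a sketch, binary case):

```python
# Sketch: confusion matrix counts and row-normalized percentages for one
# model, matching what row 1 of the grid displays.
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, model_predictions["Gradient Boosting"]["y_pred"])
cm_pct = cm / cm.sum(axis=1, keepdims=True) * 100  # % of each true class
print(cm)               # diagonal = correct predictions
print(cm_pct.round(1))  # diagonal = per-class recall, in percent
```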
Show/Hide Code
grid_results = {
name: {"y_pred": data["y_pred"], "y_pred_proba": data["y_pred_proba"]}
for name, data in model_predictions.items()
}
if _is_binary:
fig = charts.model_comparison_grid(
grid_results, y_test,
class_labels=["Churned (0)", "Retained (1)"],
title="Model Comparison: Confusion Matrix | ROC Curve | Precision-Recall"
)
display_figure(fig)
else:
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from sklearn.metrics import confusion_matrix
model_names = list(grid_results.keys())
n_models = len(model_names)
fig = make_subplots(rows=1, cols=n_models, subplot_titles=[f"{n[:20]}" for n in model_names])
for i, name in enumerate(model_names):
cm = confusion_matrix(y_test, grid_results[name]["y_pred"])
fig.add_trace(go.Heatmap(
z=cm, x=list(range(cm.shape[1])), y=list(range(cm.shape[0])),
text=cm.astype(str), texttemplate="%{text}", showscale=False,
colorscale="Blues",
), row=1, col=i + 1)
fig.update_layout(title="Model Comparison: Confusion Matrices (multiclass)", height=400, width=350 * n_models + 50)
display_figure(fig)
print("\n" + "=" * 80)
print("METRICS SUMMARY")
print("=" * 80)
_metrics_cols = ["Model", "Test AUC", "F1-Score", "Precision", "Recall"]
if _is_binary:
_metrics_cols.insert(2, "PR-AUC")
display_table(results_df[_metrics_cols])
================================================================================
METRICS SUMMARY
================================================================================
| Model | Test AUC | PR-AUC | F1-Score | Precision | Recall |
|---|---|---|---|---|---|
| Logistic Regression | 0.9685 | 0.9886 | 0.9436 | 0.9785 | 0.9111 |
| Random Forest | 0.9818 | 0.9923 | 0.9782 | 0.9663 | 0.9904 |
| Gradient Boosting | 0.9825 | 0.9937 | 0.9805 | 0.9704 | 0.9908 |
8.6.1 Individual Model Analysis¶
The grid above shows all models together. The classification reports below break performance down per model.
Show/Hide Code
print("=" * 70)
print("CLASSIFICATION REPORTS BY MODEL")
print("=" * 70)
_target_names = ["Churned", "Retained"] if _is_binary else None
for name, data in model_predictions.items():
print(f"\n{'='*40}")
print(f" {name}")
print('='*40)
print(classification_report(y_test, data['y_pred'], target_names=_target_names, zero_division=0))
======================================================================
CLASSIFICATION REPORTS BY MODEL
======================================================================
========================================
Logistic Regression
========================================
precision recall f1-score support
Churned 0.73 0.92 0.81 1263
Retained 0.98 0.91 0.94 4891
accuracy 0.91 6154
macro avg 0.85 0.92 0.88 6154
weighted avg 0.93 0.91 0.92 6154
========================================
Random Forest
========================================
precision recall f1-score support
Churned 0.96 0.87 0.91 1263
Retained 0.97 0.99 0.98 4891
accuracy 0.96 6154
macro avg 0.96 0.93 0.94 6154
weighted avg 0.96 0.96 0.96 6154
========================================
Gradient Boosting
========================================
precision recall f1-score support
Churned 0.96 0.88 0.92 1263
Retained 0.97 0.99 0.98 4891
accuracy 0.97 6154
macro avg 0.97 0.94 0.95 6154
weighted avg 0.97 0.97 0.97 6154
8.6.2 Precision-Recall Curves¶
📖 Why PR Curves for Imbalanced Data:
- ROC curves can look optimistic for imbalanced data
- PR curves focus on the minority class (churners)
- Better at showing how well we detect actual churners
📖 How to Read:
- Baseline (dashed line) = proportion of positives in the data
- Higher curve = better at finding churners without too many false alarms
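A minimal sketch of the baseline and the curve computation (binary case, reusing the predictions stored above):

```python
# The dashed PR baseline is the positive-class prevalence: a no-skill
# classifier achieves this precision at every recall level.
from sklearn.metrics import precision_recall_curve

baseline = (y_test == 1).mean()  # proportion of positives in the test set
print(f"No-skill PR baseline: {baseline:.3f}")

for name, data in model_predictions.items():
    prec, rec, _ = precision_recall_curve(y_test, data["y_pred_proba"])
    # rec is decreasing, so the last entry with rec >= 0.5 is ~50% recall
    print(f"{name}: precision at ~50% recall = {prec[rec >= 0.5][-1]:.3f}")
```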
8.7 Key Takeaways¶
📖 Interpreting Results:
Show/Hide Code
_primary_metric = "Test AUC" if results_df["Test AUC"].notna().any() else "F1-Score"
best_model = results_df.loc[results_df[_primary_metric].idxmax()]
print("=" * 70)
print("KEY TAKEAWAYS")
print("=" * 70)
print(f"\n BEST MODEL (by {_primary_metric}): {best_model['Model']}")
if results_df["Test AUC"].notna().any():
print(f" Test AUC: {best_model['Test AUC']:.4f}")
if _is_binary:
print(f" PR-AUC: {best_model['PR-AUC']:.4f}")
print(f" F1-Score: {best_model['F1-Score']:.4f}")
print("\n TOP 3 IMPORTANT FEATURES:")
for i, feat in enumerate(importance_df.head(3)['Feature'].tolist(), 1):
imp = importance_df[importance_df['Feature'] == feat]['Importance'].values[0]
print(f" {i}. {feat} ({imp:.3f})")
_best_score = best_model[_primary_metric]
print("\n MODEL PERFORMANCE ASSESSMENT:")
if _best_score > 0.90:
print(" Excellent predictive signal - likely production-ready with tuning")
elif _best_score > 0.80:
print(" Strong predictive signal - good baseline for improvement")
elif _best_score > 0.70:
print(" Moderate signal - consider more feature engineering")
else:
print(" Weak signal - may need more data or different features")
print("\n NEXT STEPS:")
print(" 1. Feature engineering with derived features (notebook 05)")
print(" 2. Hyperparameter tuning (GridSearchCV)")
print(" 3. Threshold optimization for business metrics")
print(" 4. A/B testing in production")
======================================================================
KEY TAKEAWAYS
======================================================================

 BEST MODEL (by Test AUC): Gradient Boosting
 Test AUC: 0.9825
 PR-AUC: 0.9937
 F1-Score: 0.9805

 TOP 3 IMPORTANT FEATURES:
   1. esent_mean_all_time (0.098)
   2. esent_vs_cohort_mean (0.092)
   3. lag0_esent_sum (0.086)

 MODEL PERFORMANCE ASSESSMENT:
   Excellent predictive signal - likely production-ready with tuning

 NEXT STEPS:
   1. Feature engineering with derived features (notebook 05)
   2. Hyperparameter tuning (GridSearchCV)
   3. Threshold optimization for business metrics
   4. A/B testing in production
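As a pointer for step 2, a minimal GridSearchCV sketch; the parameter grid here is illustrative, not a tuned recommendation:

```python
# Illustrative hyperparameter search for the best baseline model. The grid
# values are assumptions chosen to show the pattern, not tuned choices.
# Note: with a temporal split, swap cv=5 for a time-aware splitter.
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 300],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid, scoring="roc_auc", cv=5, n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_, f"CV AUC: {search.best_score_:.4f}")
```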
Summary: What We Learned¶
In this notebook, we trained baseline models and established performance benchmarks:
- Data Preparation - Proper train/test split with stratification and scaling
- Class Imbalance Handling - Used balanced class weights
- Model Comparison - Compared Logistic Regression, Random Forest, and Gradient Boosting
- Multiple Metrics - Evaluated with AUC, PR-AUC, F1, Precision, Recall
- Feature Importance - Identified the most predictive features
Key Results for This Dataset¶
| Metric | Value | Interpretation |
|---|---|---|
| Best AUC | ~0.98 | Excellent discrimination |
| Top Feature | esent_mean_all_time | Email engagement is critical |
| Imbalance | ~4:1 | Moderate, handled with class weights |
Next Steps¶
Continue to 09_business_alignment.ipynb to:
- Align model performance with business objectives
- Define intervention strategies by risk level
- Calculate expected ROI from the model
- Set deployment requirements
Show/Hide Code
_best_score_val = results_df[_primary_metric].max()
print("Key Takeaways:")
print("="*50)
print(f"Best baseline {_primary_metric}: {_best_score_val:.4f}")
print(f"Top 3 important features: {', '.join(importance_df.head(3)['Feature'].tolist())}")
if _best_score_val > 0.85:
print("\nStrong predictive signal detected. Data is well-suited for modeling.")
elif _best_score_val > 0.70:
print("\nModerate predictive signal. Consider feature engineering for improvement.")
else:
print("\nWeak predictive signal. May need more features or data.")
Key Takeaways:
==================================================
Best baseline Test AUC: 0.9825
Top 3 important features: esent_mean_all_time, esent_vs_cohort_mean, lag0_esent_sum

Strong predictive signal detected. Data is well-suited for modeling.
Save Reminder: Save this notebook (Ctrl+S / Cmd+S) before running the next one. The next notebook will automatically export this notebook's HTML documentation from the saved file.