Chapter 5: Relationship Analysis
Purpose: Explore correlations between features and their relationships with the target to identify predictive signals.
What you'll learn:
- How to interpret correlation matrices and identify multicollinearity
- How to visualize feature distributions by target class
- How to identify which features have the strongest relationship with retention
- How to analyze categorical features for predictive power
Outputs:
- Correlation heatmap with multicollinearity detection
- Feature distributions by retention status (box plots)
- Retention rates by categorical features
- Feature-target correlation rankings
Understanding Feature Relationships
| Analysis | What It Tells You | Action |
|---|---|---|
| High Correlation (\|r\| ≥ 0.7) | Features carry redundant information | Consider removing one |
| Target Correlation | Feature's predictive power | Prioritize high-correlation features |
| Class Separation | How different retained vs churned look | Good separation = good predictor |
| Categorical Rates | Retention varies by category | Use for segmentation and encoding |
5.1 Setup
from customer_retention.analysis.notebook_progress import track_and_export_previous
track_and_export_previous("05_relationship_analysis.ipynb")
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import yaml
from plotly.subplots import make_subplots
from customer_retention.analysis.auto_explorer import ExplorationFindings, ExplorationManager, RecommendationRegistry
from customer_retention.analysis.visualization import ChartBuilder, display_figure
from customer_retention.core.config.column_config import ColumnType
from customer_retention.core.config.experiments import FINDINGS_DIR
from customer_retention.core.utils.leakage import detect_leaking_features
from customer_retention.stages.profiling import RecommendationCategory, RelationshipRecommender
from pathlib import Path

from customer_retention.analysis.auto_explorer import load_notebook_findings

FINDINGS_PATH, _namespace, dataset_name = load_notebook_findings("05_relationship_analysis.ipynb")
print(f"Using: {FINDINGS_PATH}")
RECOMMENDATIONS_PATH = FINDINGS_PATH.replace("_findings.yaml", "_recommendations.yaml")
findings = ExplorationFindings.load(FINDINGS_PATH)

# Load data - handle aggregated vs standard paths
from customer_retention.analysis.auto_explorer.active_dataset_store import load_active_dataset
from customer_retention.stages.temporal import TEMPORAL_METADATA_COLS

if "_aggregated" in FINDINGS_PATH:
    source_path = Path(findings.source_path)
    # Resolve relative paths against the experiments or findings directory
    if not source_path.is_absolute():
        if str(source_path).startswith("experiments"):
            source_path = Path("..") / source_path
        else:
            source_path = FINDINGS_DIR / source_path.name
    if source_path.is_dir():
        from customer_retention.integrations.adapters.factory import get_delta
        df = get_delta(force_local=True).read(str(source_path))
    elif source_path.is_file():
        df = pd.read_parquet(source_path)
    else:
        df = load_active_dataset(_namespace, dataset_name)
    data_source = f"aggregated:{source_path.name}"
else:
    df = load_active_dataset(_namespace, dataset_name)
    data_source = dataset_name

# Keep only findings for columns that are actually present in the data
_df_cols = set(df.columns)
findings.columns = {k: v for k, v in findings.columns.items() if k in _df_cols}
if findings.target_column and findings.target_column not in _df_cols:
    findings.target_column = None

charts = ChartBuilder()

if Path(RECOMMENDATIONS_PATH).exists():
    with open(RECOMMENDATIONS_PATH, "r") as f:
        registry = RecommendationRegistry.from_dict(yaml.safe_load(f))
    print(f"Loaded existing recommendations: {len(registry.all_recommendations)} total")
else:
    registry = RecommendationRegistry()
    registry.init_bronze(findings.source_path)
    _entity_col = (findings.time_series_metadata.entity_column
                   if findings.time_series_metadata else None)
    registry.init_silver(_entity_col or "entity_id")
    registry.init_gold(findings.target_column or "target")
    print("Initialized new recommendation registry")

print(f"\nLoaded {len(df):,} rows from: {data_source}")
Using: /Users/Vital/python/CustomerRetention/experiments/runs/email-6301db6c/datasets/customer_emails/findings/customer_emails_aggregated_findings.yaml
Loaded existing recommendations: 164 total

Loaded 4,998 rows from: aggregated:customer_emails_aggregated
5.1b Leakage Exclusion Gate
Features that leak target information are automatically detected and removed before relationship analysis.
Add column names to EXCLUDE_LEAKING_FEATURES to manually exclude additional features you suspect of leakage.
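The internals of `detect_leaking_features` are not shown in this notebook. As a rough illustration of what a leakage screen can look like, the sketch below flags numeric features whose absolute correlation with a binary target meets a cutoff. The helper name `flag_suspect_leakers`, the 0.95 cutoff, and the toy columns are assumptions made for this example, not the library's implementation.

```python
import numpy as np
import pandas as pd


def flag_suspect_leakers(df, feature_cols, target_col, threshold=0.95):
    """Return features whose |Pearson r| with the target is >= threshold.

    Hypothetical helper: a near-perfect correlation with the target is a
    common symptom of leakage, but it is a heuristic, not proof.
    """
    suspects = []
    for col in feature_cols:
        # Constant columns have an undefined correlation; skip them.
        if df[col].nunique(dropna=True) < 2:
            continue
        r = df[col].corr(df[target_col])
        if pd.notna(r) and abs(r) >= threshold:
            suspects.append(col)
    return suspects


rng = np.random.default_rng(0)
target = rng.integers(0, 2, size=200)
toy = pd.DataFrame({
    "target": target,
    "honest_feature": rng.normal(size=200),
    # Derived directly from the target, so it leaks.
    "derived_from_target": target * 10 + rng.normal(scale=0.01, size=200),
})
suspects = flag_suspect_leakers(toy, ["honest_feature", "derived_from_target"], "target")
print(suspects)
```

Anything a screen like this flags is a candidate for `EXCLUDE_LEAKING_FEATURES`; whether it is a true leak still needs a domain check.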
EXCLUDE_LEAKING_FEATURES = []

_check_cols = [
    name for name, col in findings.columns.items()
    if col.inferred_type in [ColumnType.NUMERIC_CONTINUOUS, ColumnType.NUMERIC_DISCRETE, ColumnType.BINARY]
    and name != findings.target_column
    and name not in TEMPORAL_METADATA_COLS
]
_auto_leakers = detect_leaking_features(df, _check_cols, findings.target_column)
_all_excluded = sorted(set(_auto_leakers) | set(EXCLUDE_LEAKING_FEATURES))

if _all_excluded:
    # Drop excluded features from both the findings and the data
    for _col in _all_excluded:
        findings.columns.pop(_col, None)
    df = df.drop(columns=[c for c in _all_excluded if c in df.columns])
    findings.excluded_leaking_features = _all_excluded

    # Report how each feature ended up on the exclusion list
    _auto_only = [c for c in _auto_leakers if c not in EXCLUDE_LEAKING_FEATURES]
    _manual_only = [c for c in EXCLUDE_LEAKING_FEATURES if c not in _auto_leakers]
    print(f"Excluded {len(_all_excluded)} leaking feature(s):")
    if _auto_only:
        print(f" Auto-detected: {', '.join(_auto_only)}")
    if _manual_only:
        print(f" Manual: {', '.join(_manual_only)}")
    _both = [c for c in _all_excluded if c in _auto_leakers and c in EXCLUDE_LEAKING_FEATURES]
    if _both:
        print(f" Both: {', '.join(_both)}")
else:
    print("No leaking features detected.")
No leaking features detected.
5.2 Numeric Correlation Matrix
How to Read the Heatmap:
- Red (+1): Perfect positive correlation - features move together
- Blue (-1): Perfect negative correlation - features move in opposite directions
- White (0): No linear relationship
⚠️ Multicollinearity Warning:
- Pairs with |r| ≥ 0.7 may cause issues in linear models
- Consider removing one feature from highly correlated pairs
- Tree-based models are more robust to multicollinearity
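To make the heatmap reading concrete, here is a minimal sketch on synthetic data: a column that is a pure rescaling of another has r = 1, while independent noise sits near 0. The helper name `high_corr_pairs` and the toy columns are illustrative assumptions; the function lists each pair once by masking the strict upper triangle of the matrix.

```python
import numpy as np
import pandas as pd


def high_corr_pairs(df, threshold=0.7):
    """List feature pairs with |r| >= threshold, one row per unordered pair."""
    corr = df.corr()
    # Keep only the strict upper triangle so each pair appears once
    # and the diagonal (always 1.0) is ignored.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().reset_index()
    pairs.columns = ["feature_a", "feature_b", "r"]
    # NaN entries (masked cells, if kept by stack) fail this comparison and drop out.
    return pairs[pairs["r"].abs() >= threshold].reset_index(drop=True)


rng = np.random.default_rng(1)
emails_sent = rng.poisson(5, size=300)
toy = pd.DataFrame({
    "emails_sent": emails_sent,
    # A pure rescaling of emails_sent: fully redundant.
    "emails_sent_x2": emails_sent * 2,
    # Independent noise: r near 0.
    "noise": rng.normal(size=300),
})
pairs = high_corr_pairs(toy)
print(pairs)
```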
numeric_cols = [
    name for name, col in findings.columns.items()
    if col.inferred_type in [ColumnType.NUMERIC_CONTINUOUS, ColumnType.NUMERIC_DISCRETE, ColumnType.TARGET]
    and name not in TEMPORAL_METADATA_COLS
]

if len(numeric_cols) >= 2:
    corr_matrix = df[numeric_cols].corr()
    fig = charts.heatmap(
        corr_matrix.values,
        x_labels=numeric_cols,
        y_labels=numeric_cols,
        title="Numeric Correlation Matrix"
    )
    display_figure(fig)
else:
    print("Not enough numeric columns for correlation analysis.")
5.3 High Correlation Pairs
high_corr_threshold = 0.7
high_corr_pairs = []

if len(numeric_cols) >= 2:
    corr_matrix = df[numeric_cols].corr()
    # Walk the upper triangle so each pair is reported once
    for i in range(len(numeric_cols)):
        for j in range(i + 1, len(numeric_cols)):
            corr_val = corr_matrix.iloc[i, j]
            if abs(corr_val) >= high_corr_threshold:
                high_corr_pairs.append({
                    "Column 1": numeric_cols[i],
                    "Column 2": numeric_cols[j],
                    "Correlation": f"{corr_val:.3f}"
                })

if high_corr_pairs:
    print(f"High Correlation Pairs (|r| >= {high_corr_threshold}):")
    display(pd.DataFrame(high_corr_pairs))
    print("\nConsider removing one of each pair to reduce multicollinearity.")
else:
    print("No high correlation pairs detected.")
High Correlation Pairs (|r| >= 0.7):
| | Column 1 | Column 2 | Correlation |
|---|---|---|---|
| 0 | event_count_180d | event_count_365d | 0.821 |
| 1 | event_count_180d | opened_count_180d | 1.000 |
| 2 | event_count_180d | clicked_count_180d | 1.000 |
| 3 | event_count_180d | send_hour_sum_180d | 0.975 |
| 4 | event_count_180d | send_hour_count_180d | 1.000 |
| ... | ... | ... | ... |
| 440 | bounced_vs_cohort_mean | bounced_cohort_zscore | 1.000 |
| 441 | bounced_vs_cohort_pct | bounced_cohort_zscore | 1.000 |
| 442 | time_to_open_hours_vs_cohort_mean | time_to_open_hours_vs_cohort_pct | 1.000 |
| 443 | time_to_open_hours_vs_cohort_mean | time_to_open_hours_cohort_zscore | 1.000 |
| 444 | time_to_open_hours_vs_cohort_pct | time_to_open_hours_cohort_zscore | 1.000 |
445 rows × 3 columns
Consider removing one of each pair to reduce multicollinearity.
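One possible way to act on a long list of redundant pairs is greedy pruning: visit features from strongest to weakest relationship with the target and keep a feature only if it is not highly correlated with one already kept. The sketch below assumes this strategy and uses a hypothetical helper `prune_correlated` on synthetic data; it is not part of `customer_retention`, and other tie-breaking rules (interpretability, data quality) may suit a given project better.

```python
import numpy as np
import pandas as pd


def prune_correlated(df, feature_cols, target_col, threshold=0.7):
    """Greedily drop one feature from each pair with |r| >= threshold.

    Features are visited from strongest to weakest |r| with the target,
    so the member of a redundant pair that survives is the one more
    strongly related to the target.
    """
    target_strength = df[feature_cols].corrwith(df[target_col]).abs()
    ordered = target_strength.sort_values(ascending=False).index.tolist()
    corr = df[feature_cols].corr().abs()

    kept = []
    for col in ordered:
        # Keep the feature only if it is not redundant with one already kept.
        if all(corr.loc[col, k] < threshold for k in kept):
            kept.append(col)
    return kept


rng = np.random.default_rng(2)
n = 500
signal = rng.normal(size=n)
toy = pd.DataFrame({
    "target": (signal + rng.normal(scale=0.5, size=n) > 0).astype(int),
    "count_180d": signal,
    # Near-duplicate of count_180d with a little extra noise.
    "count_365d": signal + rng.normal(scale=0.3, size=n),
    "unrelated": rng.normal(size=n),
})
kept = prune_correlated(toy, ["count_180d", "count_365d", "unrelated"], "target")
print(kept)
```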
5.4 Feature Distributions by Retention Status
How to Interpret Box Plots:
- Box = Middle 50% of data (IQR)
- Line inside box = Median
- Whiskers = Extend to the most extreme points within 1.5 × IQR of the box edges
- Points outside = Outliers
⚠️ What Makes a Good Predictor:
- Clear separation between retained (green) and churned (red) boxes
- Different medians = Feature values differ between classes
- Minimal overlap = Easier to distinguish classes
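The box plot elements listed above can be computed directly. This is a minimal sketch using NumPy's default linear-interpolation percentiles; the helper name `box_plot_stats` is hypothetical, and a plotting library may use a slightly different quartile method, so values can differ a little from what a chart shows.

```python
import numpy as np


def box_plot_stats(values):
    """Compute the pieces of a box plot using the 1.5 x IQR whisker rule."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = x[(x >= lower_fence) & (x <= upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers end at the most extreme points still inside the fences.
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": x[(x < lower_fence) | (x > upper_fence)].tolist(),
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(stats)
```

In this example the value 50 lies above the upper fence (7.75 + 1.5 × 4.5 = 14.5), so it is drawn as an outlier and the upper whisker stops at 9.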
# Feature Distributions by Retention Status
if findings.target_column and findings.target_column in df.columns:
    target = findings.target_column
    feature_cols = [
        name for name, col in findings.columns.items()
        if col.inferred_type in [ColumnType.NUMERIC_CONTINUOUS, ColumnType.NUMERIC_DISCRETE]
        and name != target
        and name not in TEMPORAL_METADATA_COLS
    ]

    if feature_cols:
        print("=" * 80)
        print(f"FEATURE DISTRIBUTIONS BY TARGET: {target}")
        print("=" * 80)

        # Calculate summary statistics by target
        summary_by_target = []
        for col in feature_cols:
            for target_val, label in [(0, "Churned"), (1, "Retained")]:
                subset = df[df[target] == target_val][col].dropna()
                if len(subset) > 0:
                    summary_by_target.append({
                        "Feature": col,
                        "Group": label,
                        "Count": len(subset),
                        "Mean": subset.mean(),
                        "Median": subset.median(),
                        "Std": subset.std()
                    })

        if summary_by_target:
            summary_df = pd.DataFrame(summary_by_target)
            # Display summary table
            print("\nSummary Statistics by Retention Status:")
            display_summary = summary_df.pivot(index="Feature", columns="Group", values=["Mean", "Median"])
            display_summary.columns = [f"{stat} ({group})" for stat, group in display_summary.columns]
            display(display_summary.round(3))

        # Calculate effect size (Cohen's d) for each feature
        print("\nFeature Importance Indicators (Effect Size - Cohen's d):")
        print("-" * 70)
        effect_sizes = []
        for col in feature_cols:
            churned = df[df[target] == 0][col].dropna()
            retained = df[df[target] == 1][col].dropna()
            if len(churned) > 0 and len(retained) > 0:
                # Cohen's d: mean difference divided by the pooled standard deviation
                pooled_std = np.sqrt(((len(churned)-1)*churned.std()**2 + (len(retained)-1)*retained.std()**2) /
                                     (len(churned) + len(retained) - 2))
                if pooled_std > 0:
                    d = (retained.mean() - churned.mean()) / pooled_std
                else:
                    d = 0

                # Interpret effect size
                abs_d = abs(d)
                if abs_d >= 0.8:
                    interpretation = "Large effect"
                    emoji = "🔴"
                elif abs_d >= 0.5:
                    interpretation = "Medium effect"
                    emoji = "🟡"
                elif abs_d >= 0.2:
                    interpretation = "Small effect"
                    emoji = "🟢"
                else:
                    interpretation = "Negligible"
                    emoji = "⚪"

                effect_sizes.append({
                    "feature": col,
                    "cohens_d": d,
                    "abs_d": abs_d,
                    "interpretation": interpretation
                })
                direction = "↑ Higher in retained" if d > 0 else "↓ Lower in retained"
                print(f" {emoji} {col}: d={d:+.3f} ({interpretation}) {direction}")

        # Sort by effect size for identifying important features
        if effect_sizes:
            effect_df = pd.DataFrame(effect_sizes).sort_values("abs_d", ascending=False)
            important_features = effect_df[effect_df["abs_d"] >= 0.2]["feature"].tolist()
            if important_features:
                print(f"\nFeatures with notable effect (|d| ≥ 0.2): {', '.join(important_features)}")
        else:
            print(" No effect sizes could be calculated (insufficient data in one or both groups)")
    else:
        print("No numeric feature columns found for distribution analysis.")
else:
    print("Target column not available.")
================================================================================
FEATURE DISTRIBUTIONS BY TARGET: unsubscribed
================================================================================

Summary Statistics by Retention Status:
| Feature | Mean (Churned) | Mean (Retained) | Median (Churned) | Median (Retained) |
|---|---|---|---|---|
| active_span_days | 2943.246 | 1446.595 | 3002.000 | 1395.000 |
| bounced_acceleration | 0.001 | 0.002 | 0.000 | 0.000 |
| bounced_beginning | 0.151 | 0.091 | 0.000 | 0.000 |
| bounced_cohort_zscore | -0.006 | 0.007 | -0.154 | -0.154 |
| bounced_count_180d | 1.093 | 0.073 | 1.000 | 0.000 |
| ... | ... | ... | ... | ... |
| time_to_open_hours_trend_ratio | 3.052 | 1.409 | 0.714 | 0.000 |
| time_to_open_hours_velocity | 0.010 | -0.005 | 0.000 | 0.000 |
| time_to_open_hours_velocity_pct | 0.311 | -0.659 | -1.000 | -1.000 |
| time_to_open_hours_vs_cohort_mean | 0.426 | -0.530 | -0.689 | -0.689 |
| time_to_open_hours_vs_cohort_pct | 1.619 | 0.230 | 0.000 | 0.000 |
179 rows × 4 columns
Feature Importance Indicators (Effect Size - Cohen's d):
----------------------------------------------------------------------
🔴 event_count_180d: d=-1.170 (Large effect) ↓ Lower in retained
🔴 event_count_365d: d=-1.524 (Large effect) ↓ Lower in retained
🔴 event_count_all_time: d=-0.893 (Large effect) ↓ Lower in retained
🟡 opened_sum_180d: d=-0.663 (Medium effect) ↓ Lower in retained
🟡 opened_mean_180d: d=-0.579 (Medium effect) ↓ Lower in retained
🔴 opened_count_180d: d=-1.170 (Large effect) ↓ Lower in retained
🟢 clicked_sum_180d: d=-0.385 (Small effect) ↓ Lower in retained
🟢 clicked_mean_180d: d=-0.297 (Small effect) ↓ Lower in retained
🔴 clicked_count_180d: d=-1.170 (Large effect) ↓ Lower in retained
🔴 send_hour_sum_180d: d=-1.136 (Large effect) ↓ Lower in retained
⚪ send_hour_mean_180d: d=+0.026 (Negligible) ↑ Higher in retained
⚪ send_hour_max_180d: d=-0.070 (Negligible) ↓ Lower in retained
🔴 send_hour_count_180d: d=-1.170 (Large effect) ↓ Lower in retained
⚪ bounced_sum_180d: d=-0.173 (Negligible) ↓ Lower in retained
⚪ bounced_mean_180d: d=+0.012 (Negligible) ↑ Higher in retained
🔴 bounced_count_180d: d=-1.170 (Large effect) ↓ Lower in retained
🟢 time_to_open_hours_sum_180d: d=-0.487 (Small effect) ↓ Lower in retained
⚪ time_to_open_hours_mean_180d: d=-0.055 (Negligible) ↓ Lower in retained
⚪ time_to_open_hours_max_180d: d=+0.025 (Negligible) ↑ Higher in retained
🟡 time_to_open_hours_count_180d: d=-0.663 (Medium effect) ↓ Lower in retained
🔴 opened_sum_365d: d=-0.957 (Large effect) ↓ Lower in retained
🟡 opened_mean_365d: d=-0.705 (Medium effect) ↓ Lower in retained
🔴 opened_count_365d: d=-1.524 (Large effect) ↓ Lower in retained
🟡 clicked_sum_365d: d=-0.546 (Medium effect) ↓ Lower in retained
🟢 clicked_mean_365d: d=-0.361 (Small effect) ↓ Lower in retained
🔴 clicked_count_365d: d=-1.524 (Large effect) ↓ Lower in retained
🔴 send_hour_sum_365d: d=-1.481 (Large effect) ↓ Lower in retained
⚪ send_hour_mean_365d: d=-0.000 (Negligible) ↓ Lower in retained
⚪ send_hour_max_365d: d=-0.122 (Negligible) ↓ Lower in retained
🔴 send_hour_count_365d: d=-1.524 (Large effect) ↓ Lower in retained
🟢 bounced_sum_365d: d=-0.261 (Small effect) ↓ Lower in retained
⚪ bounced_mean_365d: d=-0.134 (Negligible) ↓ Lower in retained
🔴 bounced_count_365d: d=-1.524 (Large effect) ↓ Lower in retained
🟡 time_to_open_hours_sum_365d: d=-0.687 (Medium effect) ↓ Lower in retained
⚪ time_to_open_hours_mean_365d: d=+0.051 (Negligible) ↑ Higher in retained
⚪ time_to_open_hours_max_365d: d=+0.044 (Negligible) ↑ Higher in retained
🔴 time_to_open_hours_count_365d: d=-0.957 (Large effect) ↓ Lower in retained
🔴 opened_sum_all_time: d=-1.008 (Large effect) ↓ Lower in retained
🔴 opened_mean_all_time: d=-0.839 (Large effect) ↓ Lower in retained
🔴 opened_count_all_time: d=-0.893 (Large effect) ↓ Lower in retained
🟡 clicked_sum_all_time: d=-0.692 (Medium effect) ↓ Lower in retained
🟢 clicked_mean_all_time: d=-0.467 (Small effect) ↓ Lower in retained
🔴 clicked_count_all_time: d=-0.893 (Large effect) ↓ Lower in retained
🔴 send_hour_sum_all_time: d=-0.886 (Large effect) ↓ Lower in retained
⚪ send_hour_mean_all_time: d=-0.013 (Negligible) ↓ Lower in retained
🟡 send_hour_max_all_time: d=-0.594 (Medium effect) ↓ Lower in retained
🔴 send_hour_count_all_time: d=-0.893 (Large effect) ↓ Lower in retained
🟢 bounced_sum_all_time: d=-0.338 (Small effect) ↓ Lower in retained
⚪ bounced_mean_all_time: d=-0.049 (Negligible) ↓ Lower in retained
🔴 bounced_count_all_time: d=-0.893 (Large effect) ↓ Lower in retained
🔴 time_to_open_hours_sum_all_time: d=-0.823 (Large effect) ↓ Lower in retained
⚪ time_to_open_hours_mean_all_time: d=-0.023 (Negligible) ↓ Lower in retained
🟢 time_to_open_hours_max_all_time: d=-0.412 (Small effect) ↓ Lower in retained
🔴 time_to_open_hours_count_all_time: d=-1.008 (Large effect) ↓ Lower in retained
🔴 days_since_last_event_x: d=+2.403 (Large effect) ↑ Higher in retained
⚪ days_since_first_event_x: d=+0.134 (Negligible) ↑ Higher in retained
⚪ dow_sin: d=-0.105 (Negligible) ↓ Lower in retained
🟢 dow_cos: d=+0.350 (Small effect) ↑ Higher in retained
⚪ bounced_momentum_180_365: d=+0.052 (Negligible) ↑ Higher in retained
⚪ clicked_momentum_180_365: d=+0.009 (Negligible) ↑ Higher in retained
🟡 lag0_opened_sum: d=-0.632 (Medium effect) ↓ Lower in retained
🟡 lag0_opened_mean: d=-0.729 (Medium effect) ↓ Lower in retained
🟢 lag0_opened_count: d=+0.210 (Small effect) ↑ Higher in retained
🟢 lag0_clicked_sum: d=-0.314 (Small effect) ↓ Lower in retained
🟢 lag0_clicked_mean: d=-0.361 (Small effect) ↓ Lower in retained
🟢 lag0_clicked_count: d=+0.210 (Small effect) ↑ Higher in retained
⚪ lag0_send_hour_sum: d=+0.189 (Negligible) ↑ Higher in retained
⚪ lag0_send_hour_mean: d=-0.001 (Negligible) ↓ Lower in retained
🟢 lag0_send_hour_count: d=+0.210 (Small effect) ↑ Higher in retained
⚪ lag0_send_hour_max: d=+0.062 (Negligible) ↑ Higher in retained
⚪ lag0_bounced_sum: d=+0.013 (Negligible) ↑ Higher in retained
⚪ lag0_bounced_mean: d=-0.008 (Negligible) ↓ Lower in retained
🟢 lag0_bounced_count: d=+0.210 (Small effect) ↑ Higher in retained
🟢 lag0_time_to_open_hours_sum: d=-0.418 (Small effect) ↓ Lower in retained
⚪ lag0_time_to_open_hours_mean: d=+0.106 (Negligible) ↑ Higher in retained
🟡 lag0_time_to_open_hours_count: d=-0.632 (Medium effect) ↓ Lower in retained
⚪ lag0_time_to_open_hours_max: d=+0.135 (Negligible) ↑ Higher in retained
🟢 lag1_opened_sum: d=-0.390 (Small effect) ↓ Lower in retained
🟡 lag1_opened_mean: d=-0.534 (Medium effect) ↓ Lower in retained
⚪ lag1_opened_count: d=+0.146 (Negligible) ↑ Higher in retained
🟢 lag1_clicked_sum: d=-0.282 (Small effect) ↓ Lower in retained
🟢 lag1_clicked_mean: d=-0.337 (Small effect) ↓ Lower in retained
⚪ lag1_clicked_count: d=+0.146 (Negligible) ↑ Higher in retained
🟢 lag1_send_hour_sum: d=+0.321 (Small effect) ↑ Higher in retained
⚪ lag1_send_hour_mean: d=-0.013 (Negligible) ↓ Lower in retained
⚪ lag1_send_hour_count: d=+0.146 (Negligible) ↑ Higher in retained
⚪ lag1_send_hour_max: d=+0.084 (Negligible) ↑ Higher in retained
⚪ lag1_bounced_mean: d=-0.045 (Negligible) ↓ Lower in retained
⚪ lag1_bounced_count: d=+0.146 (Negligible) ↑ Higher in retained
🟢 lag1_time_to_open_hours_sum: d=-0.291 (Small effect) ↓ Lower in retained
⚪ lag1_time_to_open_hours_mean: d=+0.057 (Negligible) ↑ Higher in retained
⚪ lag1_time_to_open_hours_count: d=-0.125 (Negligible) ↓ Lower in retained
⚪ lag1_time_to_open_hours_max: d=+0.107 (Negligible) ↑ Higher in retained
🟢 lag2_opened_sum: d=-0.359 (Small effect) ↓ Lower in retained
🟢 lag2_opened_mean: d=-0.498 (Small effect) ↓ Lower in retained
⚪ lag2_opened_count: d=+0.144 (Negligible) ↑ Higher in retained
⚪ lag2_clicked_sum: d=-0.162 (Negligible) ↓ Lower in retained
🟢 lag2_clicked_mean: d=-0.225 (Small effect) ↓ Lower in retained
⚪ lag2_clicked_count: d=+0.144 (Negligible) ↑ Higher in retained
🟢 lag2_send_hour_sum: d=+0.228 (Small effect) ↑ Higher in retained
⚪ lag2_send_hour_mean: d=-0.104 (Negligible) ↓ Lower in retained
⚪ lag2_send_hour_count: d=+0.144 (Negligible) ↑ Higher in retained
⚪ lag2_send_hour_max: d=-0.049 (Negligible) ↓ Lower in retained
⚪ lag2_bounced_mean: d=+0.009 (Negligible) ↑ Higher in retained
⚪ lag2_bounced_count: d=+0.144 (Negligible) ↑ Higher in retained
🟢 lag2_time_to_open_hours_sum: d=-0.302 (Small effect) ↓ Lower in retained
⚪ lag2_time_to_open_hours_mean: d=-0.098 (Negligible) ↓ Lower in retained
⚪ lag2_time_to_open_hours_count: d=-0.102 (Negligible) ↓ Lower in retained
⚪ lag2_time_to_open_hours_max: d=-0.055 (Negligible) ↓ Lower in retained
⚪ lag3_opened_sum: d=-0.177 (Negligible) ↓ Lower in retained
🟢 lag3_opened_mean: d=-0.284 (Small effect) ↓ Lower in retained
⚪ lag3_opened_count: d=+0.145 (Negligible) ↑ Higher in retained
⚪ lag3_clicked_mean: d=-0.199 (Negligible) ↓ Lower in retained
⚪ lag3_clicked_count: d=+0.145 (Negligible) ↑ Higher in retained
🟢 lag3_send_hour_sum: d=+0.264 (Small effect) ↑ Higher in retained
⚪ lag3_send_hour_mean: d=+0.025 (Negligible) ↑ Higher in retained
⚪ lag3_send_hour_count: d=+0.145 (Negligible) ↑ Higher in retained
⚪ lag3_send_hour_max: d=+0.090 (Negligible) ↑ Higher in retained
⚪ lag3_bounced_mean: d=-0.070 (Negligible) ↓ Lower in retained
⚪ lag3_bounced_count: d=+0.145 (Negligible) ↑ Higher in retained
⚪ lag3_time_to_open_hours_sum: d=-0.076 (Negligible) ↓ Lower in retained
⚪ lag3_time_to_open_hours_mean: d=+0.189 (Negligible) ↑ Higher in retained
⚪ lag3_time_to_open_hours_count: d=-0.030 (Negligible) ↓ Lower in retained
🟢 lag3_time_to_open_hours_max: d=+0.220 (Small effect) ↑ Higher in retained
⚪ opened_velocity: d=-0.169 (Negligible) ↓ Lower in retained
🟢 opened_velocity_pct: d=-0.264 (Small effect) ↓ Lower in retained
⚪ clicked_velocity: d=-0.007 (Negligible) ↓ Lower in retained
⚪ send_hour_velocity: d=+0.086 (Negligible) ↑ Higher in retained
⚪ send_hour_velocity_pct: d=+0.050 (Negligible) ↑ Higher in retained
⚪ bounced_velocity: d=+0.059 (Negligible) ↑ Higher in retained
⚪ time_to_open_hours_velocity: d=-0.124 (Negligible) ↓ Lower in retained
⚪ time_to_open_hours_velocity_pct: d=-0.167 (Negligible) ↓ Lower in retained
⚪ opened_acceleration: d=-0.020 (Negligible) ↓ Lower in retained
🟢 opened_momentum: d=-0.266 (Small effect) ↓ Lower in retained
🟢 clicked_acceleration: d=+0.266 (Small effect) ↑ Higher in retained
⚪ send_hour_acceleration: d=+0.052 (Negligible) ↑ Higher in retained
⚪ send_hour_momentum: d=+0.118 (Negligible) ↑ Higher in retained
⚪ bounced_acceleration: d=+0.075 (Negligible) ↑ Higher in retained
⚪ time_to_open_hours_acceleration: d=-0.127 (Negligible) ↓ Lower in retained
⚪ time_to_open_hours_momentum: d=-0.167 (Negligible) ↓ Lower in retained
🟡 opened_beginning: d=-0.555 (Medium effect) ↓ Lower in retained
🔴 opened_end: d=-1.007 (Large effect) ↓ Lower in retained
🟡 opened_trend_ratio: d=-0.701 (Medium effect) ↓ Lower in retained
🟢 clicked_beginning: d=-0.338 (Small effect) ↓ Lower in retained
🟡 clicked_end: d=-0.623 (Medium effect) ↓ Lower in retained
🟢 clicked_trend_ratio: d=-0.453 (Small effect) ↓ Lower in retained
🟡 send_hour_beginning: d=-0.697 (Medium effect) ↓ Lower in retained
🟡 send_hour_end: d=-0.735 (Medium effect) ↓ Lower in retained
⚪ send_hour_trend_ratio: d=+0.058 (Negligible) ↑ Higher in retained
⚪ bounced_beginning: d=-0.167 (Negligible) ↓ Lower in retained
🟢 bounced_end: d=-0.202 (Small effect) ↓ Lower in retained
⚪ bounced_trend_ratio: d=+0.011 (Negligible) ↑ Higher in retained
🟢 time_to_open_hours_beginning: d=-0.443 (Small effect) ↓ Lower in retained
🟡 time_to_open_hours_end: d=-0.753 (Medium effect) ↓ Lower in retained
⚪ time_to_open_hours_trend_ratio: d=-0.175 (Negligible) ↓ Lower in retained
⚪ days_since_last_event_y: d=+0.000 (Negligible) ↓ Lower in retained
🔴 days_since_first_event_y: d=-2.365 (Large effect) ↓ Lower in retained
🔴 active_span_days: d=-2.365 (Large effect) ↓ Lower in retained
⚪ recency_ratio: d=+0.000 (Negligible) ↓ Lower in retained
⚪ event_frequency: d=+0.172 (Negligible) ↑ Higher in retained
🟢 inter_event_gap_mean: d=-0.381 (Small effect) ↓ Lower in retained
🟡 inter_event_gap_std: d=-0.549 (Medium effect) ↓ Lower in retained
🔴 inter_event_gap_max: d=-0.929 (Large effect) ↓ Lower in retained
🟢 regularity_score: d=+0.476 (Small effect) ↑ Higher in retained
🟡 opened_vs_cohort_mean: d=-0.632 (Medium effect) ↓ Lower in retained
🟡 opened_vs_cohort_pct: d=-0.632 (Medium effect) ↓ Lower in retained
🟡 opened_cohort_zscore: d=-0.632 (Medium effect) ↓ Lower in retained
🟢 clicked_vs_cohort_mean: d=-0.314 (Small effect) ↓ Lower in retained
🟢 clicked_vs_cohort_pct: d=-0.314 (Small effect) ↓ Lower in retained
🟢 clicked_cohort_zscore: d=-0.314 (Small effect) ↓ Lower in retained
⚪ send_hour_vs_cohort_mean: d=+0.189 (Negligible) ↑ Higher in retained
⚪ send_hour_vs_cohort_pct: d=+0.189 (Negligible) ↑ Higher in retained
⚪ send_hour_cohort_zscore: d=+0.189 (Negligible) ↑ Higher in retained
⚪ bounced_vs_cohort_mean: d=+0.013 (Negligible) ↑ Higher in retained
⚪ bounced_vs_cohort_pct: d=+0.013 (Negligible) ↑ Higher in retained
⚪ bounced_cohort_zscore: d=+0.013 (Negligible) ↑ Higher in retained
🟢 time_to_open_hours_vs_cohort_mean: d=-0.418 (Small effect) ↓ Lower in retained
🟢 time_to_open_hours_vs_cohort_pct: d=-0.418 (Small effect) ↓ Lower in retained
🟢 time_to_open_hours_cohort_zscore: d=-0.418 (Small effect) ↓ Lower in retained

Features with notable effect (|d| ≥ 0.2): days_since_last_event_x, days_since_first_event_y, active_span_days, send_hour_count_365d, event_count_365d, bounced_count_365d, clicked_count_365d, opened_count_365d, send_hour_sum_365d, send_hour_count_180d, bounced_count_180d, event_count_180d, clicked_count_180d, opened_count_180d, send_hour_sum_180d, opened_sum_all_time, time_to_open_hours_count_all_time, opened_end, opened_sum_365d, time_to_open_hours_count_365d, inter_event_gap_max, send_hour_count_all_time, clicked_count_all_time, bounced_count_all_time, event_count_all_time, opened_count_all_time, send_hour_sum_all_time, opened_mean_all_time, time_to_open_hours_sum_all_time, time_to_open_hours_end, send_hour_end, lag0_opened_mean, opened_mean_365d, opened_trend_ratio, send_hour_beginning, clicked_sum_all_time, time_to_open_hours_sum_365d, opened_sum_180d, time_to_open_hours_count_180d, opened_cohort_zscore, lag0_opened_sum, lag0_time_to_open_hours_count, opened_vs_cohort_pct, opened_vs_cohort_mean, clicked_end, send_hour_max_all_time, opened_mean_180d, opened_beginning, inter_event_gap_std, clicked_sum_365d, lag1_opened_mean, lag2_opened_mean, time_to_open_hours_sum_180d, regularity_score, clicked_mean_all_time, clicked_trend_ratio, time_to_open_hours_beginning, time_to_open_hours_vs_cohort_pct, time_to_open_hours_cohort_zscore, time_to_open_hours_vs_cohort_mean, lag0_time_to_open_hours_sum, time_to_open_hours_max_all_time, lag1_opened_sum, clicked_sum_180d, inter_event_gap_mean, clicked_mean_365d, lag0_clicked_mean, lag2_opened_sum, dow_cos, bounced_sum_all_time, clicked_beginning, lag1_clicked_mean, lag1_send_hour_sum, clicked_vs_cohort_pct, lag0_clicked_sum, clicked_cohort_zscore, clicked_vs_cohort_mean, lag2_time_to_open_hours_sum, clicked_mean_180d, lag1_time_to_open_hours_sum, lag3_opened_mean, lag1_clicked_sum, clicked_acceleration, opened_momentum, lag3_send_hour_sum, opened_velocity_pct, bounced_sum_365d, lag2_send_hour_sum, lag2_clicked_mean, lag3_time_to_open_hours_max, lag0_opened_count, lag0_send_hour_count, lag0_bounced_count, lag0_clicked_count, bounced_end
Interpreting Effect Sizes (Cohen's d)
| Effect Size | Interpretation | What It Means for Modeling |
|---|---|---|
| \|d\| ≥ 0.8 | Large | Strong discriminator - prioritize this feature |
| \|d\| = 0.5-0.8 | Medium | Useful predictor - include in model |
| \|d\| = 0.2-0.5 | Small | Weak but may help in combination with others |
| \|d\| < 0.2 | Negligible | Limited predictive value alone |
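As a worked example of the thresholds in the table above, the sketch below computes Cohen's d with the pooled standard deviation for two small made-up groups. The function names `cohens_d` and `interpret` and the sample values are illustrative assumptions.

```python
import numpy as np


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled standard deviation.

    Positive values mean group_b has the higher mean.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n_a, n_b = len(a), len(b)
    # ddof=1 gives the sample variance, matching pandas' default .std().
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    if pooled_var == 0:
        return 0.0
    return (b.mean() - a.mean()) / np.sqrt(pooled_var)


def interpret(d):
    """Map |d| onto the conventional labels from the table above."""
    size = abs(d)
    if size >= 0.8:
        return "Large"
    if size >= 0.5:
        return "Medium"
    if size >= 0.2:
        return "Small"
    return "Negligible"


churned = [2, 4, 4, 6]
retained = [5, 7, 7, 9]
d = cohens_d(churned, retained)
print(f"d={d:+.3f} ({interpret(d)})")
```

Here both groups have the same spread (sample variance 8/3) and their means differ by 3, so d = 3 / sqrt(8/3) ≈ 1.84, a large effect.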
🎯 Actionable Insights:
- Features with large effects are your strongest individual predictors - make sure they are included in your model
- Direction matters: "Higher in retained" means customers with high values tend to stay; use this for threshold-based business rules
- Features with small or negligible effects may still be useful in combination or as interaction terms
⚠️ Cautions:
- Cohen's d is easiest to interpret when distributions are roughly normal - check skewness in notebook 03
- Large effects could be due to confounding variables - validate with domain knowledge
- Correlation ≠ causation: high engagement may not cause retention
- The "Retained"/"Churned" labels assume the target is encoded as 1 = retained, 0 = churned; the target in this run is `unsubscribed`, so confirm the encoding before acting on the direction of an effect
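For skewed features, one option is to pair the skewness check with a rank-based measure that does not depend on means. The sketch below is an assumption-laden illustration rather than part of the notebook's pipeline: it uses pandas' `Series.skew()` and a hypothetical helper `probability_of_superiority`, which estimates how often a randomly chosen value from one group exceeds a randomly chosen value from the other.

```python
import pandas as pd


def probability_of_superiority(group_a, group_b):
    """P(random b > random a) + 0.5 * P(tie), computed from ranks.

    0.5 means no separation; values near 0 or 1 mean strong separation.
    """
    a = pd.Series(group_a, dtype=float)
    b = pd.Series(group_b, dtype=float)
    ranks = pd.concat([a, b], ignore_index=True).rank(method="average")
    rank_sum_b = ranks.iloc[len(a):].sum()
    # Mann-Whitney U statistic for group b.
    u_b = rank_sum_b - len(b) * (len(b) + 1) / 2
    return u_b / (len(a) * len(b))


churned = [1, 2, 3, 100]  # one extreme value makes this group heavily skewed
retained = [4, 5, 6, 7]
skew_churned = pd.Series(churned).skew()
p_sup = probability_of_superiority(churned, retained)
print(f"skew(churned) = {skew_churned:.2f}")
print(f"P(retained > churned) = {p_sup:.2f}")
```

In this toy case the churned mean (26.5) is above the retained mean (5.5) because of a single extreme value, yet the retained value is larger in 12 of the 16 possible pairs. A mean-based effect size and a rank-based one can point in opposite directions when a feature is heavily skewed.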
Box Plot Visualization
How to Read the Box Plots Below:
- Well-separated boxes (little/no overlap) → Feature clearly distinguishes retained vs churned
- Different medians (center lines at different heights) → Groups have different typical values
- Many outliers in one group → May indicate subpopulations worth investigating
# Box Plots: Visual comparison of distributions
if findings.target_column and findings.target_column in df.columns:
    target = findings.target_column
    feature_cols = [
        name for name, col in findings.columns.items()
        if col.inferred_type in [ColumnType.NUMERIC_CONTINUOUS, ColumnType.NUMERIC_DISCRETE]
        and name != target
        and name not in TEMPORAL_METADATA_COLS
    ]

    if feature_cols:
        # Create box plots - one subplot per feature for better control
        n_features = min(len(feature_cols), 6)
        fig = make_subplots(
            rows=1, cols=n_features,
            subplot_titles=feature_cols[:n_features],
            horizontal_spacing=0.05
        )

        # (target value, label, box fill, outlier fill, line color)
        groups = [
            (1, "Retained", "rgba(46, 204, 113, 0.7)", "rgba(46, 204, 113, 0.5)", "#1e8449"),  # Green
            (0, "Churned", "rgba(231, 76, 60, 0.7)", "rgba(231, 76, 60, 0.5)", "#922b21"),  # Red
        ]

        for i, col in enumerate(feature_cols[:n_features]):
            for target_val, label, fill, outlier_fill, line_color in groups:
                fig.add_trace(
                    go.Box(
                        y=df[df[target] == target_val][col].dropna(),
                        name=label,
                        fillcolor=fill,
                        line=dict(color=line_color, width=2),
                        marker=dict(
                            color=outlier_fill,
                            size=5,
                            line=dict(color=line_color, width=1)
                        ),
                        boxpoints='outliers',
                        width=0.35,
                        showlegend=(i == 0),  # One legend entry per group
                        legendgroup=label.lower(),
                        offsetgroup=label.lower()
                    ),
                    row=1, col=i + 1
                )

        fig.update_layout(
            height=450,
            title_text="Feature Distributions: Retained (Green) vs Churned (Red)",
            template='plotly_white',
            showlegend=True,
            legend=dict(orientation="h", yanchor="bottom", y=1.05, xanchor="center", x=0.5),
            boxmode='group',
            boxgap=0.3,
            boxgroupgap=0.1
        )
        # Center the boxes by removing x-axis tick labels (title is above each subplot)
        fig.update_xaxes(showticklabels=False)
        display_figure(fig)

        # Print mean comparison
        print("\nMEAN COMPARISON BY RETENTION STATUS:")
        print("-" * 70)
        for col in feature_cols[:n_features]:
            retained_mean = df[df[target] == 1][col].mean()
            churned_mean = df[df[target] == 0][col].mean()
            diff_pct = ((retained_mean - churned_mean) / churned_mean * 100) if churned_mean != 0 else 0
            print(f" {col}:")
            print(f" Retained: {retained_mean:.2f} | Churned: {churned_mean:.2f} | Diff: {diff_pct:+.1f}%")
MEAN COMPARISON BY RETENTION STATUS:
----------------------------------------------------------------------
event_count_180d:
Retained: 0.07 | Churned: 1.09 | Diff: -93.3%
event_count_365d:
Retained: 0.20 | Churned: 2.21 | Diff: -90.9%
event_count_all_time:
Retained: 12.43 | Churned: 19.89 | Diff: -37.5%
opened_sum_180d:
Retained: 0.00 | Churned: 0.27 | Diff: -98.5%
opened_mean_180d:
Retained: 0.03 | Churned: 0.24 | Diff: -86.1%
opened_count_180d:
Retained: 0.07 | Churned: 1.09 | Diff: -93.3%
5.5 Feature-Target Correlations
Features ranked by absolute correlation with the target variable.
Interpretation:
- Positive correlation: Higher values = more likely retained
- Negative correlation: Higher values = more likely churned
- |r| > 0.3: Moderately predictive
- |r| > 0.5: Strongly predictive
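The ranking and the two thresholds above can be sketched in a few lines. With a 0/1 target, Pearson's r is the point-biserial correlation. The helper name `rank_by_target_correlation` and the synthetic columns are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def rank_by_target_correlation(df, feature_cols, target_col):
    """Pearson r of each feature with the target, sorted by |r| descending."""
    r = df[feature_cols].corrwith(df[target_col])
    order = r.abs().sort_values(ascending=False).index
    ranked = r.reindex(order).rename("r").to_frame()
    abs_r = ranked["r"].abs()
    # Thresholds follow the interpretation list above.
    ranked["strength"] = np.select(
        [abs_r > 0.5, abs_r > 0.3], ["strong", "moderate"], default="weak"
    )
    return ranked


rng = np.random.default_rng(3)
target = rng.integers(0, 2, size=400)
toy = pd.DataFrame({
    "target": target,
    # Shifted upward when target == 1, so it correlates with the target.
    "engagement": target * 2 + rng.normal(size=400),
    "noise": rng.normal(size=400),
})
ranked = rank_by_target_correlation(toy, ["noise", "engagement"], "target")
print(ranked)
```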
if findings.target_column and findings.target_column in df.columns:
    target = findings.target_column
    feature_cols = [
        name for name, col in findings.columns.items()
        if col.inferred_type in [ColumnType.NUMERIC_CONTINUOUS, ColumnType.NUMERIC_DISCRETE]
        and name != target
        and name not in TEMPORAL_METADATA_COLS
    ]
    if feature_cols:
        correlations = []
        for col in feature_cols:
            # Pearson correlation between the feature and the target
            corr = df[[col, target]].corr().iloc[0, 1]
            correlations.append({"Feature": col, "Correlation": corr})
        # Rank by absolute correlation, strongest first
        corr_df = pd.DataFrame(correlations).sort_values("Correlation", key=abs, ascending=False)
        fig = charts.bar_chart(
            corr_df["Feature"].tolist(),
            corr_df["Correlation"].tolist(),
            title=f"Feature Correlations with {target}"
        )
        display_figure(fig)
else:
    print("Target column not available for correlation analysis.")
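The ranking step can also be sketched with plain pandas, outside the framework. This is a minimal illustration under stated assumptions: the toy data, the column names, and the helper `rank_by_target_correlation` are hypothetical and not part of `customer_retention`. The strength labels follow the |r| > 0.3 and |r| > 0.5 thresholds from the interpretation notes above.

```python
import pandas as pd

# Hypothetical toy data: 1 = retained, 0 = churned (column names are illustrative).
df = pd.DataFrame({
    "retained": [1, 1, 1, 0, 0, 0, 1, 0],
    "orders_90d": [5, 6, 4, 1, 0, 2, 7, 1],
    "support_tickets": [0, 1, 0, 3, 4, 2, 0, 5],
    "account_age_days": [100, 400, 250, 300, 120, 380, 210, 260],
})


def rank_by_target_correlation(frame: pd.DataFrame, target: str) -> pd.DataFrame:
    """Rank numeric features by absolute Pearson correlation with the target."""
    features = frame.drop(columns=[target])
    ranked = features.corrwith(frame[target]).to_frame("correlation")
    ranked["abs_correlation"] = ranked["correlation"].abs()
    # Bucket |r| using the chapter's thresholds: >0.3 moderate, >0.5 strong.
    ranked["strength"] = pd.cut(
        ranked["abs_correlation"],
        bins=[-0.01, 0.3, 0.5, 1.01],
        labels=["weak", "moderate", "strong"],
    )
    return ranked.sort_values("abs_correlation", ascending=False)


ranking = rank_by_target_correlation(df, "retained")
print(ranking)
```

With a binary target, the Pearson coefficient computed here is the point-biserial correlation, so it only reflects linear separation between the two classes.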
5.6 Categorical Feature Analysis
Retention rates by category help identify which segments are at higher risk.
What to Look For:
- Categories with low retention rates = high-risk segments for intervention
- Large variation across categories = strong predictive feature
- Small categories with extreme rates may be unreliable (small sample size)
Metrics Explained:
- Retention Rate: % of customers in category who were retained
- Lift: Category retention rate divided by the overall retention rate (>1 = better than average, <1 = worse)
- Cramér's V: Strength of association (0-1 scale, like correlation for categorical)
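The two metrics above can be reproduced by hand to check intuition. The sketch below is an assumption-laden illustration: `category_lift` and `cramers_v` are hypothetical helpers, the data is synthetic, and the framework's `CategoricalTargetAnalyzer` may compute these differently (for example with a bias correction or a minimum sample filter).

```python
import numpy as np
import pandas as pd


def category_lift(frame: pd.DataFrame, feature: str, target: str) -> pd.DataFrame:
    """Retention rate and lift (category rate / overall rate) per category."""
    overall = frame[target].mean()
    stats = frame.groupby(feature)[target].agg(count="size", retention_rate="mean")
    stats["lift"] = stats["retention_rate"] / overall
    return stats


def cramers_v(frame: pd.DataFrame, feature: str, target: str) -> float:
    """Uncorrected Cramér's V from the chi-squared statistic of the contingency table."""
    observed = pd.crosstab(frame[feature], frame[target]).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence: row total * column total / n
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    min_dim = min(observed.shape) - 1
    return float(np.sqrt(chi2 / (n * min_dim)))


# Synthetic example: plan A retains 80% of 50 customers, plan B retains 20% of 50.
df = pd.DataFrame({
    "plan": ["A"] * 50 + ["B"] * 50,
    "retained": [1] * 40 + [0] * 10 + [1] * 10 + [0] * 40,
})

lift_table = category_lift(df, "plan", "retained")
v = cramers_v(df, "plan", "retained")
print(lift_table)
print(f"Cramér's V: {v:.3f}")
```

Here the overall retention rate is 50%, so plan A has a lift of 0.8 / 0.5 and plan B a lift of 0.2 / 0.5, which is the same ratio reported in the Lift column of the framework output.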
Show/Hide Code
from customer_retention.stages.profiling import CategoricalTargetAnalyzer
if findings.target_column:
    target = findings.target_column
    overall_retention = df[target].mean()
    categorical_cols = [
        name for name, col in findings.columns.items()
        if col.inferred_type in [ColumnType.CATEGORICAL_NOMINAL, ColumnType.CATEGORICAL_ORDINAL]
        and name not in TEMPORAL_METADATA_COLS
    ]
    print("=" * 80)
    print("CATEGORICAL FEATURE ANALYSIS")
    print("=" * 80)
    print(f"Overall retention rate: {overall_retention:.1%}")
    if categorical_cols:
        # Use framework analyzer for summary
        cat_analyzer = CategoricalTargetAnalyzer(min_samples_per_category=10)
        summary_df = cat_analyzer.analyze_multiple(df, categorical_cols, target)
        print("\nCategorical Feature Strength (Cramér's V):")
        print("-" * 60)
        for _, row in summary_df.iterrows():
            if row["cramers_v"] >= 0.3:
                strength = "Strong"
                emoji = "🔴"
            elif row["cramers_v"] >= 0.1:
                strength = "Moderate"
                emoji = "🟡"
            else:
                strength = "Weak"
                emoji = "🟢"
            # Significance stars from the test p-value
            sig = "***" if row["p_value"] < 0.001 else "**" if row["p_value"] < 0.01 else "*" if row["p_value"] < 0.05 else ""
            print(f" {emoji} {row['feature']}: V={row['cramers_v']:.3f} ({strength}) {sig}")
        # Detailed analysis for each categorical feature
        for col_name in categorical_cols[:5]:
            result = cat_analyzer.analyze(df, col_name, target)
            print(f"\n{'='*60}")
            print(col_name.upper())
            print("="*60)
            # Display stats table
            if len(result.category_stats) > 0:
                display_stats = result.category_stats[['category', 'total_count', 'retention_rate', 'lift', 'pct_of_total']].copy()
                display_stats['retention_rate'] = display_stats['retention_rate'].apply(lambda x: f"{x:.1%}")
                display_stats['lift'] = display_stats['lift'].apply(lambda x: f"{x:.2f}x")
                display_stats['pct_of_total'] = display_stats['pct_of_total'].apply(lambda x: f"{x:.1%}")
                display_stats.columns = [col_name, 'Count', 'Retention Rate', 'Lift', '% of Data']
                display(display_stats)
                # Stacked bar chart
                cat_stats = result.category_stats
                categories = cat_stats['category'].tolist()
                retained_counts = cat_stats['retained_count'].tolist()
                churned_counts = cat_stats['churned_count'].tolist()
                fig = go.Figure()
                fig.add_trace(go.Bar(
                    name='Retained',
                    x=categories,
                    y=retained_counts,
                    marker_color='rgba(46, 204, 113, 0.8)',
                    text=[f"{r/(r+c)*100:.0f}%" for r, c in zip(retained_counts, churned_counts)],
                    textposition='inside',
                    textfont=dict(color='white', size=12)
                ))
                fig.add_trace(go.Bar(
                    name='Churned',
                    x=categories,
                    y=churned_counts,
                    marker_color='rgba(231, 76, 60, 0.8)',
                    text=[f"{c/(r+c)*100:.0f}%" for r, c in zip(retained_counts, churned_counts)],
                    textposition='inside',
                    textfont=dict(color='white', size=12)
                ))
                fig.update_layout(
                    barmode='stack',
                    title=f"Retention by {col_name}",
                    xaxis_title=col_name,
                    yaxis_title="Count",
                    template='plotly_white',
                    height=350,
                    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="center", x=0.5)
                )
                display_figure(fig)
                # Flag high-risk categories from framework result
                if result.high_risk_categories:
                    print("\n ⚠️ High-risk categories (lift < 0.9x):")
                    for cat in result.high_risk_categories:
                        cat_row = cat_stats[cat_stats['category'] == cat].iloc[0]
                        print(f" • {cat}: {cat_row['retention_rate']:.1%} retention ({cat_row['lift']:.2f}x lift)")
    else:
        print("\n ℹ️ No categorical columns detected.")
else:
    print("No target column available for categorical analysis.")
================================================================================
CATEGORICAL FEATURE ANALYSIS
================================================================================
Overall retention rate: 44.6%
Categorical Feature Strength (Cramér's V):
------------------------------------------------------------
🔴 lifecycle_quadrant: V=0.728 (Strong) ***
🔴 recency_bucket: V=0.619 (Strong) ***
============================================================
LIFECYCLE_QUADRANT
============================================================
| | lifecycle_quadrant | Count | Retention Rate | Lift | % of Data |
|---|---|---|---|---|---|
| 0 | Intense & Brief | 1679 | 82.7% | 1.86x | 33.6% |
| 1 | One-shot | 816 | 76.7% | 1.72x | 16.3% |
| 2 | Steady & Loyal | 820 | 10.4% | 0.23x | 16.4% |
| 3 | Occasional & Loyal | 1683 | 7.6% | 0.17x | 33.7% |
⚠️ High-risk categories (lift < 0.9x):
• Steady & Loyal: 10.4% retention (0.23x lift)
• Occasional & Loyal: 7.6% retention (0.17x lift)
============================================================
RECENCY_BUCKET
============================================================
| | recency_bucket | Count | Retention Rate | Lift | % of Data |
|---|---|---|---|---|---|
| 0 | >180d | 3084 | 68.8% | 1.54x | 61.7% |
| 1 | 91-180d | 702 | 7.0% | 0.16x | 14.0% |
| 2 | 31-90d | 725 | 5.9% | 0.13x | 14.5% |
| 3 | 0-7d | 123 | 4.9% | 0.11x | 2.5% |
| 4 | 8-30d | 364 | 2.2% | 0.05x | 7.3% |
⚠️ High-risk categories (lift < 0.9x):
• 91-180d: 7.0% retention (0.16x lift)
• 31-90d: 5.9% retention (0.13x lift)
• 0-7d: 4.9% retention (0.11x lift)
• 8-30d: 2.2% retention (0.05x lift)
5.7 Scatter Plot Matrix (Sample)
Visual exploration of pairwise relationships between numeric features.
How to Read the Scatter Matrix:
- Diagonal: Distribution of each feature (histogram or density)
- Off-diagonal: Scatter plot showing relationship between two features
- Each row/column represents one feature
What to Look For:
| Pattern | What It Means | Action |
|---|---|---|
| Linear trend (diagonal line of points) | Strong correlation | Check if redundant; may cause multicollinearity |
| Curved pattern | Non-linear relationship | Consider polynomial features or transformations |
| Clusters/groups | Natural segments in data | May benefit from segment-aware modeling |
| Fan shape (spreading out) | Heteroscedasticity | May need log transform or robust methods |
| Random scatter | No visible relationship | Features are likely uncorrelated |
⚠️ Cautions:
- Sample shown (max 1000 points) for performance - patterns may differ in full data
- Look for the same patterns in correlation matrix (section 4.2) to confirm
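A curved pattern in a scatter plot can be cross-checked numerically by comparing Pearson correlation (linear association) with Spearman correlation (any monotonic trend). The sketch below uses synthetic data and illustrative column names, and computes Spearman as Pearson on ranks. A Spearman value noticeably above the Pearson value hints at a monotonic but non-linear relationship that a transformation might straighten.

```python
import numpy as np
import pandas as pd

# Synthetic data: one linear and one exponential relationship with x.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 5.0, size=500)
df = pd.DataFrame({
    "x": x,
    "linear": 2.0 * x + rng.normal(0.0, 0.5, size=500),
    "curved": np.exp(x) + rng.normal(0.0, 0.5, size=500),
})

# Pearson measures linear association only.
pearson_x = df.corr()["x"].drop("x")
# Spearman is Pearson computed on ranks, so it captures any monotonic trend.
spearman_x = df.rank().corr()["x"].drop("x")

# A large positive gap suggests a curved (monotonic, non-linear) relationship.
gap = spearman_x - pearson_x
print(pd.DataFrame({"pearson": pearson_x, "spearman": spearman_x, "gap": gap}).round(3))
```

The gap is only a screening heuristic. It does not detect non-monotonic shapes such as a U-curve, where both coefficients can be close to zero.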
Show/Hide Code
# Slicing already handles lists with fewer than four columns
top_numeric = numeric_cols[:4]
if len(top_numeric) >= 2:
    fig = charts.scatter_matrix(
        df[top_numeric].sample(min(1000, len(df))),
        title="Scatter Plot Matrix (Sample)"
    )
    display_figure(fig)
Interpreting the Scatter Matrix Above
🎯 Key Questions to Answer:
Are any features redundant?
- Look for tight linear patterns → high correlation → consider dropping one
- Cross-reference with high correlation pairs in section 4.3
Are there natural customer segments?
- Distinct clusters suggest different customer types
- Links to segment-aware outlier analysis in notebook 03
Do relationships suggest feature engineering?
- Curved patterns → polynomial or interaction terms may help
- Ratios between correlated features may be more predictive
Are distributions suitable for linear models?
- Fan shapes or heavy skew → consider transformations
- Outlier clusters → verify with segment analysis
💡 Pro Tip: Hover over points in the interactive plot to see exact values. Look for outliers that appear across multiple scatter plots - these may be influential observations worth investigating.
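One way to act on the redundancy question is a greedy pass over highly correlated pairs that keeps, from each pair, the feature with the stronger target correlation. This is a sketch under assumptions: `drop_redundant_features` is a hypothetical helper, the data is synthetic, and the 0.7 threshold is taken from the chapter's overview table rather than from the framework's internals.

```python
import pandas as pd


def drop_redundant_features(frame: pd.DataFrame, target: str, threshold: float = 0.7):
    """Greedily drop one feature from each pair whose |r| exceeds the threshold.

    From each flagged pair, the feature with the weaker absolute correlation
    to the target is dropped.
    """
    features = [c for c in frame.columns if c != target]
    pair_corr = frame[features].corr().abs()
    target_corr = frame[features].corrwith(frame[target]).abs()
    dropped = set()
    for i, first in enumerate(features):
        for second in features[i + 1:]:
            if first in dropped or second in dropped:
                continue  # one side of this pair is already gone
            if pair_corr.loc[first, second] > threshold:
                weaker = first if target_corr[first] < target_corr[second] else second
                dropped.add(weaker)
    kept = [c for c in features if c not in dropped]
    return kept, sorted(dropped)


# Synthetic data: opened_count is a near-duplicate of event_count.
df = pd.DataFrame({
    "retained": [1, 1, 1, 0, 0, 0, 1, 0],
    "event_count": [5, 6, 4, 1, 0, 2, 7, 1],
    "opened_count": [5, 6, 4, 1, 0, 2, 7, 2],
    "tenure_days": [100, 400, 250, 300, 120, 380, 210, 260],
})

kept, dropped = drop_redundant_features(df, "retained")
print("kept:", kept)
print("dropped:", dropped)
```

The result of a greedy pass depends on column order, so it is best treated as a starting point and reviewed against business meaning before any feature is removed.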
5.8 Datetime Feature Analysis
Temporal patterns can reveal important retention signals - when customers joined, their last activity, and seasonal patterns.
What to Look For:
- Cohort effects: Do customers who joined in certain periods have different retention?
- Recency patterns: How does time since last activity relate to retention?
- Seasonal trends: Are there monthly or quarterly patterns?
Common Temporal Features:
| Feature Type | Example | Typical Insight |
|---|---|---|
| Tenure | Days since signup | Longer tenure often = higher retention |
| Recency | Days since last order | Recent activity = engaged customer |
| Cohort | Signup month/year | Economic conditions affect cohorts |
| Day of Week | Signup day | Weekend vs weekday patterns |
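The feature types in the table above can be derived from raw datetime columns with pandas accessors. The sketch below assumes hypothetical `signup_date` and `last_order_date` columns and a fixed reference date. In practice the reference would typically be the dataset's snapshot date rather than the current time, so that features stay reproducible and do not leak future information.

```python
import pandas as pd

# Hypothetical columns; names and dates are illustrative only.
df = pd.DataFrame({
    "signup_date": pd.to_datetime(["2023-01-15", "2023-06-03", "2024-02-20"]),
    "last_order_date": pd.to_datetime(["2024-05-01", "2023-07-10", "2024-06-25"]),
})

# Assumed snapshot date of the dataset (not the wall-clock time).
reference_date = pd.Timestamp("2024-07-01")

features = pd.DataFrame({
    # Tenure: days since signup
    "tenure_days": (reference_date - df["signup_date"]).dt.days,
    # Recency: days since last order
    "recency_days": (reference_date - df["last_order_date"]).dt.days,
    # Cohort: signup month as a YYYY-MM label
    "signup_cohort": df["signup_date"].dt.to_period("M").astype(str),
    # Day of week: name plus a weekend flag (Saturday=5, Sunday=6)
    "signup_day_name": df["signup_date"].dt.day_name(),
    "signup_is_weekend": df["signup_date"].dt.dayofweek >= 5,
})
print(features)
```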
Show/Hide Code
from customer_retention.stages.profiling import TemporalTargetAnalyzer
datetime_cols = [
    name for name, col in findings.columns.items()
    if col.inferred_type == ColumnType.DATETIME
]
print("=" * 80)
print("DATETIME FEATURE ANALYSIS")
print("=" * 80)
print(f"Detected datetime columns: {datetime_cols}")
if datetime_cols and findings.target_column:
    target = findings.target_column
    overall_retention = df[target].mean()
    # Use framework analyzer
    temporal_analyzer = TemporalTargetAnalyzer(min_samples_per_period=10)
    for col_name in datetime_cols[:3]:
        result = temporal_analyzer.analyze(df, col_name, target)
        print(f"\n{'='*60}")
        print(f"📅 {col_name.upper()}")
        print("="*60)
        if result.n_valid_dates == 0:
            print(" No valid dates found")
            continue
        print(f" Date range: {result.min_date} to {result.max_date}")
        print(f" Valid dates: {result.n_valid_dates:,}")
        # 1. Retention by Year (from framework result)
        if len(result.yearly_stats) > 1:
            print(f"\n Retention by Year: Trend is {result.yearly_trend}")
            year_stats = result.yearly_stats
            fig = make_subplots(rows=1, cols=2, subplot_titles=["Retention Rate by Year", "Customer Count by Year"],
                                column_widths=[0.6, 0.4])
            fig.add_trace(
                go.Scatter(
                    x=year_stats['period'].astype(str),
                    y=year_stats['retention_rate'],
                    mode='lines+markers',
                    name='Retention Rate',
                    line=dict(color='#3498db', width=3),
                    marker=dict(size=10)
                ),
                row=1, col=1
            )
            fig.add_hline(y=overall_retention, line_dash="dash", line_color="gray",
                          annotation_text=f"Overall: {overall_retention:.1%}", row=1, col=1)
            fig.add_trace(
                go.Bar(
                    x=year_stats['period'].astype(str),
                    y=year_stats['count'],
                    name='Count',
                    marker_color='rgba(52, 152, 219, 0.6)'
                ),
                row=1, col=2
            )
            fig.update_layout(height=350, template='plotly_white', showlegend=False)
            fig.update_yaxes(tickformat='.0%', row=1, col=1)
            display_figure(fig)
        # 2. Retention by Month (from framework result)
        if len(result.monthly_stats) > 1:
            print("\n Retention by Month (Seasonality):")
            month_stats = result.monthly_stats
            # Green for months at or above the overall rate, red for months below it
            colors = ['rgba(46, 204, 113, 0.7)' if r >= overall_retention else 'rgba(231, 76, 60, 0.7)'
                      for r in month_stats['retention_rate']]
            fig = go.Figure()
            fig.add_trace(go.Bar(
                x=month_stats['month_name'],
                y=month_stats['retention_rate'],
                marker_color=colors,
                text=[f"{r:.0%}" for r in month_stats['retention_rate']],
                textposition='outside'
            ))
            fig.add_hline(y=overall_retention, line_dash="dash", line_color="gray",
                          annotation_text=f"Overall: {overall_retention:.1%}")
            fig.update_layout(
                title=f"Monthly Retention Pattern ({col_name})",
                xaxis_title="Month",
                yaxis_title="Retention Rate",
                template='plotly_white',
                height=350,
                yaxis_tickformat='.0%'
            )
            display_figure(fig)
            # Seasonal insights from framework
            if result.seasonal_spread > 0.05:
                print(f" Seasonal spread: {result.seasonal_spread:.1%}")
                print(f" Best month: {result.best_month}")
                print(f" Worst month: {result.worst_month}")
        # 3. Retention by Day of Week (from framework result)
        if len(result.dow_stats) > 1:
            print("\n Retention by Day of Week:")
            dow_stats = result.dow_stats
            colors = ['rgba(46, 204, 113, 0.7)' if r >= overall_retention else 'rgba(231, 76, 60, 0.7)'
                      for r in dow_stats['retention_rate']]
            fig = go.Figure()
            fig.add_trace(go.Bar(
                x=dow_stats['day_name'],
                y=dow_stats['retention_rate'],
                marker_color=colors,
                text=[f"{r:.0%}" for r in dow_stats['retention_rate']],
                textposition='outside'
            ))
            fig.add_hline(y=overall_retention, line_dash="dash", line_color="gray")
            fig.update_layout(
                title=f"Day of Week Pattern ({col_name})",
                xaxis_title="Day of Week",
                yaxis_title="Retention Rate",
                template='plotly_white',
                height=300,
                yaxis_tickformat='.0%'
            )
            display_figure(fig)
else:
    if not datetime_cols:
        print("\n ℹ️ No datetime columns detected in this dataset.")
        print(" Consider adding date parsing in notebook 01 if dates exist as strings.")
    else:
        print("\n ℹ️ No target column available for retention analysis.")
================================================================================
DATETIME FEATURE ANALYSIS
================================================================================
Detected datetime columns: []
ℹ️ No datetime columns detected in this dataset.
Consider adding date parsing in notebook 01 if dates exist as strings.
5.9 Actionable Recommendations Summary
This section consolidates all relationship analysis findings into actionable recommendations organized by their impact on the modeling pipeline.
Recommendation Categories:
| Category | Purpose | Impact |
|---|---|---|
| Feature Selection | Which features to keep/drop | Reduces noise, improves interpretability |
| Feature Engineering | New features to create | Captures interactions, improves accuracy |
| Stratification | Train/test split strategy | Ensures fair evaluation, prevents leakage |
| Model Selection | Which algorithms to try | Matches model to data characteristics |
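As an illustration of the Stratification row, a stratified train/test split can be sketched with pandas alone by sampling the same fraction from each target class. The helper `stratified_split`, the 20% test fraction, and the toy data are assumptions for this example and do not describe how the framework itself splits data.

```python
import pandas as pd


def stratified_split(frame: pd.DataFrame, target: str, test_frac: float = 0.2, seed: int = 42):
    """Split into train/test, sampling the same fraction from each target class.

    Assumes the frame has a unique index, since test rows are removed by index.
    """
    test = frame.groupby(target).sample(frac=test_frac, random_state=seed)
    train = frame.drop(index=test.index)
    return train, test


# Toy data with a 40% retention rate.
df = pd.DataFrame({
    "retained": [1] * 40 + [0] * 60,
    "event_count": range(100),
})

train, test = stratified_split(df, "retained")
print(f"train retention: {train['retained'].mean():.1%} ({len(train)} rows)")
print(f"test retention:  {test['retained'].mean():.1%} ({len(test)} rows)")
```

Both splits keep the overall class balance, which makes evaluation metrics comparable. Stratifying on the target alone does not address temporal leakage, which would need a time-based split instead.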
Show/Hide Code
# Generate comprehensive actionable recommendations
recommender = RelationshipRecommender()
# Gather columns by type
numeric_features = [
    name for name, col in findings.columns.items()
    if col.inferred_type in [ColumnType.NUMERIC_CONTINUOUS, ColumnType.NUMERIC_DISCRETE]
    and name != findings.target_column
    and name not in TEMPORAL_METADATA_COLS
]
categorical_features = [
    name for name, col in findings.columns.items()
    if col.inferred_type in [ColumnType.CATEGORICAL_NOMINAL, ColumnType.CATEGORICAL_ORDINAL]
    and name not in TEMPORAL_METADATA_COLS
]
# Run comprehensive analysis
analysis_summary = recommender.analyze(
    df,
    numeric_cols=numeric_features,
    categorical_cols=categorical_features,
    target_col=findings.target_column,
)
print("=" * 80)
print("ACTIONABLE RECOMMENDATIONS FROM RELATIONSHIP ANALYSIS")
print("=" * 80)
# Group recommendations by category
grouped_recs = analysis_summary.recommendations_by_category
high_priority = analysis_summary.high_priority_actions
if high_priority:
    print(f"\n🔴 HIGH PRIORITY ACTIONS ({len(high_priority)}):")
    print("-" * 60)
    for rec in high_priority:
        print(f"\n {rec.title}")
        print(f" {rec.description}")
        print(f" → Action: {rec.action}")
        if rec.affected_features:
            print(f" → Features: {', '.join(rec.affected_features[:5])}")
# Persist recommendations to registry
for pair in analysis_summary.multicollinear_pairs:
    registry.add_gold_drop_multicollinear(
        column=pair["feature1"], correlated_with=pair["feature2"],
        correlation=pair["correlation"],
        rationale=f"High correlation ({pair['correlation']:.2f}) - consider dropping one",
        source_notebook="05_relationship_analysis"
    )
for predictor in analysis_summary.strong_predictors:
    registry.add_gold_prioritize_feature(
        column=predictor["feature"], effect_size=predictor["effect_size"],
        correlation=predictor["correlation"],
        rationale=f"Strong predictor with effect size {predictor['effect_size']:.2f}",
        source_notebook="05_relationship_analysis"
    )
for weak_col in analysis_summary.weak_predictors[:10]:
    registry.add_gold_drop_weak(
        column=weak_col, effect_size=0.0, correlation=0.0,
        rationale="Negligible predictive power",
        source_notebook="05_relationship_analysis"
    )
# Persist ratio feature recommendations
for rec in grouped_recs.get(RecommendationCategory.FEATURE_ENGINEERING, []):
    if "ratio" in rec.title.lower() and len(rec.affected_features) >= 2:
        registry.add_silver_ratio(
            column=f"{rec.affected_features[0]}_to_{rec.affected_features[1]}_ratio",
            numerator=rec.affected_features[0], denominator=rec.affected_features[1],
            rationale=rec.description, source_notebook="05_relationship_analysis"
        )
    elif "interaction" in rec.title.lower() and len(rec.affected_features) >= 2:
        # Pairwise interactions among (at most) the first four affected features
        for i, f1 in enumerate(rec.affected_features[:3]):
            for f2 in rec.affected_features[i+1:4]:
                registry.add_silver_interaction(
                    column=f"{f1}_x_{f2}", features=[f1, f2],
                    rationale=rec.description, source_notebook="05_relationship_analysis"
                )
# Store for findings metadata
findings.metadata["relationship_analysis"] = {
    "n_recommendations": len(analysis_summary.recommendations),
    "n_high_priority": len(high_priority),
    "strong_predictors": [p["feature"] for p in analysis_summary.strong_predictors],
    "weak_predictors": analysis_summary.weak_predictors[:5],
    "multicollinear_pairs": [(p["feature1"], p["feature2"]) for p in analysis_summary.multicollinear_pairs],
}
print(f"\n✅ Persisted {len(analysis_summary.multicollinear_pairs)} multicollinearity recommendations")
print(f"✅ Persisted {len(analysis_summary.strong_predictors)} strong predictor recommendations")
print(f"✅ Persisted {min(len(analysis_summary.weak_predictors), 10)} weak predictor recommendations")
================================================================================
ACTIONABLE RECOMMENDATIONS FROM RELATIONSHIP ANALYSIS
================================================================================
🔴 HIGH PRIORITY ACTIONS (241):
------------------------------------------------------------
Remove multicollinear feature
event_count_180d and opened_count_180d are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: event_count_180d, opened_count_180d
Remove multicollinear feature
event_count_180d and clicked_count_180d are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: event_count_180d, clicked_count_180d
Remove multicollinear feature
event_count_180d and send_hour_sum_180d are highly correlated (r=0.98)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: event_count_180d, send_hour_sum_180d
Remove multicollinear feature
event_count_180d and send_hour_count_180d are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: event_count_180d, send_hour_count_180d
Remove multicollinear feature
event_count_180d and bounced_count_180d are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: event_count_180d, bounced_count_180d
Remove multicollinear feature
event_count_365d and opened_count_365d are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: event_count_365d, opened_count_365d
Remove multicollinear feature
event_count_365d and clicked_count_365d are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: event_count_365d, clicked_count_365d
Remove multicollinear feature
event_count_365d and send_hour_sum_365d are highly correlated (r=0.98)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: event_count_365d, send_hour_sum_365d
Remove multicollinear feature
event_count_365d and send_hour_count_365d are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: event_count_365d, send_hour_count_365d
Remove multicollinear feature
event_count_365d and bounced_count_365d are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: event_count_365d, bounced_count_365d
Remove multicollinear feature
event_count_all_time and opened_count_all_time are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: event_count_all_time, opened_count_all_time
Remove multicollinear feature
event_count_all_time and clicked_count_all_time are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: event_count_all_time, clicked_count_all_time
Remove multicollinear feature
event_count_all_time and send_hour_sum_all_time are highly correlated (r=0.99)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: event_count_all_time, send_hour_sum_all_time
Remove multicollinear feature
event_count_all_time and send_hour_count_all_time are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: event_count_all_time, send_hour_count_all_time
Remove multicollinear feature
event_count_all_time and bounced_count_all_time are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: event_count_all_time, bounced_count_all_time
Remove multicollinear feature
opened_sum_180d and time_to_open_hours_count_180d are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: opened_sum_180d, time_to_open_hours_count_180d
Remove multicollinear feature
opened_mean_180d and lag0_opened_mean are highly correlated (r=0.90)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: opened_mean_180d, lag0_opened_mean
Remove multicollinear feature
opened_count_180d and clicked_count_180d are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: opened_count_180d, clicked_count_180d
Remove multicollinear feature
opened_count_180d and send_hour_sum_180d are highly correlated (r=0.98)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: opened_count_180d, send_hour_sum_180d
Remove multicollinear feature
opened_count_180d and send_hour_count_180d are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: opened_count_180d, send_hour_count_180d
Remove multicollinear feature
opened_count_180d and bounced_count_180d are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: opened_count_180d, bounced_count_180d
Remove multicollinear feature
clicked_sum_180d and clicked_mean_180d are highly correlated (r=0.88)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: clicked_sum_180d, clicked_mean_180d
Remove multicollinear feature
clicked_mean_180d and lag0_clicked_mean are highly correlated (r=0.88)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: clicked_mean_180d, lag0_clicked_mean
Remove multicollinear feature
clicked_count_180d and send_hour_sum_180d are highly correlated (r=0.98)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: clicked_count_180d, send_hour_sum_180d
Remove multicollinear feature
clicked_count_180d and send_hour_count_180d are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: clicked_count_180d, send_hour_count_180d
Remove multicollinear feature
clicked_count_180d and bounced_count_180d are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: clicked_count_180d, bounced_count_180d
Remove multicollinear feature
send_hour_sum_180d and send_hour_count_180d are highly correlated (r=0.98)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: send_hour_sum_180d, send_hour_count_180d
Remove multicollinear feature
send_hour_sum_180d and bounced_count_180d are highly correlated (r=0.98)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: send_hour_sum_180d, bounced_count_180d
Remove multicollinear feature
send_hour_mean_180d and send_hour_max_180d are highly correlated (r=0.88)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: send_hour_mean_180d, send_hour_max_180d
Remove multicollinear feature
send_hour_mean_180d and lag0_send_hour_mean are highly correlated (r=0.89)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: send_hour_mean_180d, lag0_send_hour_mean
Remove multicollinear feature
send_hour_mean_180d and lag0_send_hour_max are highly correlated (r=0.86)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: send_hour_mean_180d, lag0_send_hour_max
Remove multicollinear feature
send_hour_count_180d and bounced_count_180d are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: send_hour_count_180d, bounced_count_180d
Remove multicollinear feature
bounced_sum_180d and bounced_mean_180d are highly correlated (r=0.89)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: bounced_sum_180d, bounced_mean_180d
Remove multicollinear feature
bounced_mean_180d and lag0_bounced_mean are highly correlated (r=0.88)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: bounced_mean_180d, lag0_bounced_mean
Remove multicollinear feature
time_to_open_hours_sum_180d and time_to_open_hours_mean_180d are highly correlated (r=0.89)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_sum_180d, time_to_open_hours_mean_180d
Remove multicollinear feature
time_to_open_hours_sum_180d and time_to_open_hours_max_180d are highly correlated (r=0.96)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_sum_180d, time_to_open_hours_max_180d
Remove multicollinear feature
time_to_open_hours_mean_180d and time_to_open_hours_max_180d are highly correlated (r=0.97)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_mean_180d, time_to_open_hours_max_180d
Remove multicollinear feature
time_to_open_hours_mean_180d and time_to_open_hours_mean_365d are highly correlated (r=0.94)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_mean_180d, time_to_open_hours_mean_365d
Remove multicollinear feature
time_to_open_hours_mean_180d and time_to_open_hours_max_365d are highly correlated (r=0.88)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_mean_180d, time_to_open_hours_max_365d
Remove multicollinear feature
time_to_open_hours_mean_180d and lag0_time_to_open_hours_mean are highly correlated (r=0.98)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_mean_180d, lag0_time_to_open_hours_mean
Remove multicollinear feature
time_to_open_hours_mean_180d and lag0_time_to_open_hours_max are highly correlated (r=0.96)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_mean_180d, lag0_time_to_open_hours_max
Remove multicollinear feature
time_to_open_hours_mean_180d and lag2_time_to_open_hours_mean are highly correlated (r=0.94)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_mean_180d, lag2_time_to_open_hours_mean
Remove multicollinear feature
time_to_open_hours_mean_180d and lag2_time_to_open_hours_max are highly correlated (r=0.94)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_mean_180d, lag2_time_to_open_hours_max
Remove multicollinear feature
time_to_open_hours_mean_180d and lag3_time_to_open_hours_mean are highly correlated (r=0.86)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_mean_180d, lag3_time_to_open_hours_mean
Remove multicollinear feature
time_to_open_hours_mean_180d and lag3_time_to_open_hours_max are highly correlated (r=0.87)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_mean_180d, lag3_time_to_open_hours_max
Remove multicollinear feature
time_to_open_hours_max_180d and time_to_open_hours_mean_365d are highly correlated (r=0.91)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_max_180d, time_to_open_hours_mean_365d
Remove multicollinear feature
time_to_open_hours_max_180d and time_to_open_hours_max_365d are highly correlated (r=0.91)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_max_180d, time_to_open_hours_max_365d
Remove multicollinear feature
time_to_open_hours_max_180d and lag0_time_to_open_hours_mean are highly correlated (r=0.95)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_max_180d, lag0_time_to_open_hours_mean
Remove multicollinear feature
time_to_open_hours_max_180d and lag0_time_to_open_hours_max are highly correlated (r=0.96)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_max_180d, lag0_time_to_open_hours_max
Remove multicollinear feature
time_to_open_hours_max_180d and lag2_time_to_open_hours_mean are highly correlated (r=0.90)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_max_180d, lag2_time_to_open_hours_mean
π Remove multicollinear feature
time_to_open_hours_max_180d and lag2_time_to_open_hours_max are highly correlated (r=0.90)
β Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
β Features: time_to_open_hours_max_180d, lag2_time_to_open_hours_max
Remove multicollinear feature
opened_sum_365d and time_to_open_hours_count_365d are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: opened_sum_365d, time_to_open_hours_count_365d
Remove multicollinear feature
opened_count_365d and clicked_count_365d are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: opened_count_365d, clicked_count_365d
Remove multicollinear feature
opened_count_365d and send_hour_sum_365d are highly correlated (r=0.98)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: opened_count_365d, send_hour_sum_365d
Remove multicollinear feature
opened_count_365d and send_hour_count_365d are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: opened_count_365d, send_hour_count_365d
Remove multicollinear feature
opened_count_365d and bounced_count_365d are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: opened_count_365d, bounced_count_365d
Remove multicollinear feature
clicked_count_365d and send_hour_sum_365d are highly correlated (r=0.98)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: clicked_count_365d, send_hour_sum_365d
Remove multicollinear feature
clicked_count_365d and send_hour_count_365d are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: clicked_count_365d, send_hour_count_365d
Remove multicollinear feature
clicked_count_365d and bounced_count_365d are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: clicked_count_365d, bounced_count_365d
Remove multicollinear feature
send_hour_sum_365d and send_hour_count_365d are highly correlated (r=0.98)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: send_hour_sum_365d, send_hour_count_365d
Remove multicollinear feature
send_hour_sum_365d and bounced_count_365d are highly correlated (r=0.98)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: send_hour_sum_365d, bounced_count_365d
Remove multicollinear feature
send_hour_count_365d and bounced_count_365d are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: send_hour_count_365d, bounced_count_365d
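The pairwise recommendations for the 365-day window overlap heavily: opened_count_365d, clicked_count_365d, send_hour_count_365d and bounced_count_365d are all flagged against each other at r=1.00, so they behave as one redundant group rather than as separate problems. A minimal sketch of collapsing flagged pairs into groups with a union-find follows. The function name `group_correlated_pairs` is hypothetical and is not part of the `customer_retention` package; the pairs are copied by hand from the output above.

```python
from collections import defaultdict


def group_correlated_pairs(pairs):
    """Merge flagged (feature_a, feature_b) pairs into redundancy groups.

    Hypothetical helper: treats every flagged pair as an edge and returns
    the connected components, each as a sorted list of feature names.
    """
    parent = {}

    def find(name):
        # Register unseen names as their own root, then walk up to the root.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for feature_a, feature_b in pairs:
        root_a, root_b = find(feature_a), find(feature_b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for name in list(parent):
        groups[find(name)].add(name)
    return sorted((sorted(members) for members in groups.values()),
                  key=lambda members: members[0])


# Pairs flagged for the 365-day window in the output above.
pairs = [
    ("opened_sum_365d", "time_to_open_hours_count_365d"),
    ("opened_count_365d", "clicked_count_365d"),
    ("opened_count_365d", "send_hour_sum_365d"),
    ("opened_count_365d", "send_hour_count_365d"),
    ("opened_count_365d", "bounced_count_365d"),
    ("clicked_count_365d", "send_hour_sum_365d"),
    ("clicked_count_365d", "send_hour_count_365d"),
    ("clicked_count_365d", "bounced_count_365d"),
    ("send_hour_sum_365d", "send_hour_count_365d"),
    ("send_hour_sum_365d", "bounced_count_365d"),
    ("send_hour_count_365d", "bounced_count_365d"),
]
groups = group_correlated_pairs(pairs)
for members in groups:
    print(len(members), members)
```

Grouping is transitive, so two features can land in the same group without being flagged against each other directly. Treat each group as a candidate set to review, not as an automatic drop list.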
Remove multicollinear feature
time_to_open_hours_sum_365d and time_to_open_hours_max_365d are highly correlated (r=0.94)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: time_to_open_hours_sum_365d, time_to_open_hours_max_365d
Remove multicollinear feature
time_to_open_hours_mean_365d and time_to_open_hours_max_365d are highly correlated (r=0.95)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: time_to_open_hours_mean_365d, time_to_open_hours_max_365d
Remove multicollinear feature
time_to_open_hours_mean_365d and lag0_time_to_open_hours_mean are highly correlated (r=0.94)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: time_to_open_hours_mean_365d, lag0_time_to_open_hours_mean
Remove multicollinear feature
time_to_open_hours_mean_365d and lag0_time_to_open_hours_max are highly correlated (r=0.93)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: time_to_open_hours_mean_365d, lag0_time_to_open_hours_max
Remove multicollinear feature
time_to_open_hours_mean_365d and lag1_time_to_open_hours_mean are highly correlated (r=0.85)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: time_to_open_hours_mean_365d, lag1_time_to_open_hours_mean
Remove multicollinear feature
time_to_open_hours_max_365d and lag0_time_to_open_hours_mean are highly correlated (r=0.89)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: time_to_open_hours_max_365d, lag0_time_to_open_hours_mean
Remove multicollinear feature
time_to_open_hours_max_365d and lag0_time_to_open_hours_max are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: time_to_open_hours_max_365d, lag0_time_to_open_hours_max
Remove multicollinear feature
opened_sum_all_time and time_to_open_hours_sum_all_time are highly correlated (r=0.86)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: opened_sum_all_time, time_to_open_hours_sum_all_time
Remove multicollinear feature
opened_sum_all_time and time_to_open_hours_count_all_time are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: opened_sum_all_time, time_to_open_hours_count_all_time
Remove multicollinear feature
opened_count_all_time and clicked_count_all_time are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: opened_count_all_time, clicked_count_all_time
Remove multicollinear feature
opened_count_all_time and send_hour_sum_all_time are highly correlated (r=0.99)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: opened_count_all_time, send_hour_sum_all_time
Remove multicollinear feature
opened_count_all_time and send_hour_count_all_time are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: opened_count_all_time, send_hour_count_all_time
Remove multicollinear feature
opened_count_all_time and bounced_count_all_time are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: opened_count_all_time, bounced_count_all_time
Remove multicollinear feature
clicked_count_all_time and send_hour_sum_all_time are highly correlated (r=0.99)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: clicked_count_all_time, send_hour_sum_all_time
Remove multicollinear feature
clicked_count_all_time and send_hour_count_all_time are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: clicked_count_all_time, send_hour_count_all_time
Remove multicollinear feature
clicked_count_all_time and bounced_count_all_time are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: clicked_count_all_time, bounced_count_all_time
Remove multicollinear feature
send_hour_sum_all_time and send_hour_count_all_time are highly correlated (r=0.99)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: send_hour_sum_all_time, send_hour_count_all_time
Remove multicollinear feature
send_hour_sum_all_time and bounced_count_all_time are highly correlated (r=0.99)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: send_hour_sum_all_time, bounced_count_all_time
Remove multicollinear feature
send_hour_sum_all_time and send_hour_beginning are highly correlated (r=0.85)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: send_hour_sum_all_time, send_hour_beginning
Remove multicollinear feature
send_hour_sum_all_time and send_hour_end are highly correlated (r=0.85)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: send_hour_sum_all_time, send_hour_end
Remove multicollinear feature
send_hour_count_all_time and bounced_count_all_time are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: send_hour_count_all_time, bounced_count_all_time
Remove multicollinear feature
time_to_open_hours_sum_all_time and time_to_open_hours_count_all_time are highly correlated (r=0.86)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: time_to_open_hours_sum_all_time, time_to_open_hours_count_all_time
Remove multicollinear feature
days_since_last_event_x and days_since_first_event_y are highly correlated (r=-0.99)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: days_since_last_event_x, days_since_first_event_y
Remove multicollinear feature
days_since_last_event_x and active_span_days are highly correlated (r=-0.99)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: days_since_last_event_x, active_span_days
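Each recommendation advises keeping the feature with the higher target correlation. Because r can be negative (r=-0.99 for the pair just above), the comparison is best made on absolute values. The sketch below shows one way to apply that rule with pandas. The helper `pick_feature_to_drop`, the `retained` target column and the six rows of data are all made up for illustration and do not come from the notebook's dataset.

```python
import pandas as pd


def pick_feature_to_drop(df, feature_a, feature_b, target):
    """Return (drop, keep) for a flagged pair.

    Hypothetical helper: keeps the feature whose Pearson correlation with
    the target has the larger magnitude. Ties keep feature_a.
    """
    corr_a = abs(df[feature_a].corr(df[target]))
    corr_b = abs(df[feature_b].corr(df[target]))
    if corr_a >= corr_b:
        return feature_b, feature_a
    return feature_a, feature_b


# Made-up illustrative rows; the two features move in opposite directions.
demo = pd.DataFrame({
    "active_span_days": [10, 50, 120, 200, 300, 350],
    "days_since_last_event_x": [340, 330, 230, 190, 50, 40],
    "retained": [0, 0, 0, 1, 1, 1],
})

drop, keep = pick_feature_to_drop(
    demo, "days_since_last_event_x", "active_span_days", "retained"
)
print(f"drop={drop}, keep={keep}")
```

Target correlation is only one of the two criteria the recommendation names. When the correlations are close, business meaning is usually the better tie-breaker.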
Remove multicollinear feature
lag0_opened_sum and lag0_opened_mean are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_opened_sum, lag0_opened_mean
Remove multicollinear feature
lag0_opened_sum and lag0_time_to_open_hours_count are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_opened_sum, lag0_time_to_open_hours_count
Remove multicollinear feature
lag0_opened_sum and opened_velocity_pct are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_opened_sum, opened_velocity_pct
Remove multicollinear feature
lag0_opened_sum and opened_vs_cohort_mean are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_opened_sum, opened_vs_cohort_mean
Remove multicollinear feature
lag0_opened_sum and opened_vs_cohort_pct are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_opened_sum, opened_vs_cohort_pct
Remove multicollinear feature
lag0_opened_sum and opened_cohort_zscore are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_opened_sum, opened_cohort_zscore
Remove multicollinear feature
lag0_opened_mean and lag0_time_to_open_hours_count are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_opened_mean, lag0_time_to_open_hours_count
Remove multicollinear feature
lag0_opened_mean and opened_vs_cohort_mean are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_opened_mean, opened_vs_cohort_mean
Remove multicollinear feature
lag0_opened_mean and opened_vs_cohort_pct are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_opened_mean, opened_vs_cohort_pct
Remove multicollinear feature
lag0_opened_mean and opened_cohort_zscore are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_opened_mean, opened_cohort_zscore
Remove multicollinear feature
lag0_opened_count and lag0_clicked_count are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_opened_count, lag0_clicked_count
Remove multicollinear feature
lag0_opened_count and lag0_send_hour_sum are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_opened_count, lag0_send_hour_sum
Remove multicollinear feature
lag0_opened_count and lag0_send_hour_count are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_opened_count, lag0_send_hour_count
Remove multicollinear feature
lag0_opened_count and lag0_bounced_count are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_opened_count, lag0_bounced_count
Remove multicollinear feature
lag0_opened_count and send_hour_vs_cohort_mean are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_opened_count, send_hour_vs_cohort_mean
Remove multicollinear feature
lag0_opened_count and send_hour_vs_cohort_pct are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_opened_count, send_hour_vs_cohort_pct
Remove multicollinear feature
lag0_opened_count and send_hour_cohort_zscore are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_opened_count, send_hour_cohort_zscore
Remove multicollinear feature
lag0_clicked_sum and lag0_clicked_mean are highly correlated (r=0.92)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_clicked_sum, lag0_clicked_mean
Remove multicollinear feature
lag0_clicked_sum and clicked_vs_cohort_mean are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_clicked_sum, clicked_vs_cohort_mean
Remove multicollinear feature
lag0_clicked_sum and clicked_vs_cohort_pct are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_clicked_sum, clicked_vs_cohort_pct
Remove multicollinear feature
lag0_clicked_sum and clicked_cohort_zscore are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_clicked_sum, clicked_cohort_zscore
Remove multicollinear feature
lag0_clicked_mean and clicked_vs_cohort_mean are highly correlated (r=0.92)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_clicked_mean, clicked_vs_cohort_mean
Remove multicollinear feature
lag0_clicked_mean and clicked_vs_cohort_pct are highly correlated (r=0.92)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_clicked_mean, clicked_vs_cohort_pct
Remove multicollinear feature
lag0_clicked_mean and clicked_cohort_zscore are highly correlated (r=0.92)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_clicked_mean, clicked_cohort_zscore
Remove multicollinear feature
lag0_clicked_count and lag0_send_hour_sum are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_clicked_count, lag0_send_hour_sum
Remove multicollinear feature
lag0_clicked_count and lag0_send_hour_count are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_clicked_count, lag0_send_hour_count
Remove multicollinear feature
lag0_clicked_count and lag0_bounced_count are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_clicked_count, lag0_bounced_count
Remove multicollinear feature
lag0_clicked_count and send_hour_vs_cohort_mean are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_clicked_count, send_hour_vs_cohort_mean
Remove multicollinear feature
lag0_clicked_count and send_hour_vs_cohort_pct are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_clicked_count, send_hour_vs_cohort_pct
Remove multicollinear feature
lag0_clicked_count and send_hour_cohort_zscore are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_clicked_count, send_hour_cohort_zscore
Remove multicollinear feature
lag0_send_hour_sum and lag0_send_hour_count are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_send_hour_sum, lag0_send_hour_count
Remove multicollinear feature
lag0_send_hour_sum and lag0_bounced_count are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_send_hour_sum, lag0_bounced_count
Remove multicollinear feature
lag0_send_hour_sum and send_hour_vs_cohort_mean are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_send_hour_sum, send_hour_vs_cohort_mean
Remove multicollinear feature
lag0_send_hour_sum and send_hour_vs_cohort_pct are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_send_hour_sum, send_hour_vs_cohort_pct
Remove multicollinear feature
lag0_send_hour_sum and send_hour_cohort_zscore are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_send_hour_sum, send_hour_cohort_zscore
Remove multicollinear feature
lag0_send_hour_mean and lag0_send_hour_max are highly correlated (r=0.95)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_send_hour_mean, lag0_send_hour_max
Remove multicollinear feature
lag0_send_hour_count and lag0_bounced_count are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_send_hour_count, lag0_bounced_count
Remove multicollinear feature
lag0_send_hour_count and send_hour_vs_cohort_mean are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_send_hour_count, send_hour_vs_cohort_mean
Remove multicollinear feature
lag0_send_hour_count and send_hour_vs_cohort_pct are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_send_hour_count, send_hour_vs_cohort_pct
Remove multicollinear feature
lag0_send_hour_count and send_hour_cohort_zscore are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_send_hour_count, send_hour_cohort_zscore
Remove multicollinear feature
lag0_bounced_sum and lag0_bounced_mean are highly correlated (r=0.94)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_bounced_sum, lag0_bounced_mean
Remove multicollinear feature
lag0_bounced_sum and bounced_vs_cohort_mean are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_bounced_sum, bounced_vs_cohort_mean
Remove multicollinear feature
lag0_bounced_sum and bounced_vs_cohort_pct are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_bounced_sum, bounced_vs_cohort_pct
Remove multicollinear feature
lag0_bounced_sum and bounced_cohort_zscore are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_bounced_sum, bounced_cohort_zscore
Remove multicollinear feature
lag0_bounced_mean and bounced_vs_cohort_mean are highly correlated (r=0.94)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_bounced_mean, bounced_vs_cohort_mean
Remove multicollinear feature
lag0_bounced_mean and bounced_vs_cohort_pct are highly correlated (r=0.94)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_bounced_mean, bounced_vs_cohort_pct
Remove multicollinear feature
lag0_bounced_mean and bounced_cohort_zscore are highly correlated (r=0.94)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_bounced_mean, bounced_cohort_zscore
Remove multicollinear feature
lag0_bounced_count and send_hour_vs_cohort_mean are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_bounced_count, send_hour_vs_cohort_mean
Remove multicollinear feature
lag0_bounced_count and send_hour_vs_cohort_pct are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_bounced_count, send_hour_vs_cohort_pct
Remove multicollinear feature
lag0_bounced_count and send_hour_cohort_zscore are highly correlated (r=0.91)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_bounced_count, send_hour_cohort_zscore
Remove multicollinear feature
lag0_time_to_open_hours_sum and lag0_time_to_open_hours_mean are highly correlated (r=0.97)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_sum, lag0_time_to_open_hours_mean
Remove multicollinear feature
lag0_time_to_open_hours_sum and lag0_time_to_open_hours_max are highly correlated (r=0.99)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_sum, lag0_time_to_open_hours_max
Remove multicollinear feature
lag0_time_to_open_hours_sum and time_to_open_hours_momentum are highly correlated (r=0.89)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_sum, time_to_open_hours_momentum
Remove multicollinear feature
lag0_time_to_open_hours_sum and time_to_open_hours_vs_cohort_mean are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_sum, time_to_open_hours_vs_cohort_mean
Remove multicollinear feature
lag0_time_to_open_hours_sum and time_to_open_hours_vs_cohort_pct are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_sum, time_to_open_hours_vs_cohort_pct
Remove multicollinear feature
lag0_time_to_open_hours_sum and time_to_open_hours_cohort_zscore are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_sum, time_to_open_hours_cohort_zscore
Remove multicollinear feature
lag0_time_to_open_hours_mean and lag0_time_to_open_hours_max are highly correlated (r=0.99)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_mean, lag0_time_to_open_hours_max
Remove multicollinear feature
lag0_time_to_open_hours_mean and time_to_open_hours_velocity are highly correlated (r=0.86)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_mean, time_to_open_hours_velocity
Remove multicollinear feature
lag0_time_to_open_hours_mean and time_to_open_hours_vs_cohort_mean are highly correlated (r=0.97)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_mean, time_to_open_hours_vs_cohort_mean
Remove multicollinear feature
lag0_time_to_open_hours_mean and time_to_open_hours_vs_cohort_pct are highly correlated (r=0.97)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_mean, time_to_open_hours_vs_cohort_pct
Remove multicollinear feature
lag0_time_to_open_hours_mean and time_to_open_hours_cohort_zscore are highly correlated (r=0.97)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_mean, time_to_open_hours_cohort_zscore
Remove multicollinear feature
lag0_time_to_open_hours_count and opened_velocity_pct are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_count, opened_velocity_pct
Remove multicollinear feature
lag0_time_to_open_hours_count and opened_vs_cohort_mean are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_count, opened_vs_cohort_mean
Remove multicollinear feature
lag0_time_to_open_hours_count and opened_vs_cohort_pct are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_count, opened_vs_cohort_pct
Remove multicollinear feature
lag0_time_to_open_hours_count and opened_cohort_zscore are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_count, opened_cohort_zscore
Remove multicollinear feature
lag0_time_to_open_hours_max and time_to_open_hours_velocity are highly correlated (r=0.90)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_max, time_to_open_hours_velocity
Remove multicollinear feature
lag0_time_to_open_hours_max and time_to_open_hours_momentum are highly correlated (r=0.89)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_max, time_to_open_hours_momentum
Remove multicollinear feature
lag0_time_to_open_hours_max and time_to_open_hours_vs_cohort_mean are highly correlated (r=0.99)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_max, time_to_open_hours_vs_cohort_mean
Remove multicollinear feature
lag0_time_to_open_hours_max and time_to_open_hours_vs_cohort_pct are highly correlated (r=0.99)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_max, time_to_open_hours_vs_cohort_pct
Remove multicollinear feature
lag0_time_to_open_hours_max and time_to_open_hours_cohort_zscore are highly correlated (r=0.99)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag0_time_to_open_hours_max, time_to_open_hours_cohort_zscore
Remove multicollinear feature
lag1_opened_sum and lag1_opened_mean are highly correlated (r=0.92)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag1_opened_sum, lag1_opened_mean
Remove multicollinear feature
lag1_opened_sum and lag1_time_to_open_hours_count are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag1_opened_sum, lag1_time_to_open_hours_count
Remove multicollinear feature
lag1_opened_mean and lag1_time_to_open_hours_count are highly correlated (r=0.92)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag1_opened_mean, lag1_time_to_open_hours_count
Remove multicollinear feature
lag1_opened_count and lag1_clicked_count are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag1_opened_count, lag1_clicked_count
Remove multicollinear feature
lag1_opened_count and lag1_send_hour_sum are highly correlated (r=0.89)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag1_opened_count, lag1_send_hour_sum
Remove multicollinear feature
lag1_opened_count and lag1_send_hour_count are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag1_opened_count, lag1_send_hour_count
Remove multicollinear feature
lag1_opened_count and lag1_bounced_count are highly correlated (r=1.00)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag1_opened_count, lag1_bounced_count
Remove multicollinear feature
lag1_clicked_sum and lag1_clicked_mean are highly correlated (r=0.95)
Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
Features: lag1_clicked_sum, lag1_clicked_mean
Remove multicollinear feature
lag1_clicked_count and lag1_send_hour_sum are highly correlated (r=0.89)
β Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
β Features: lag1_clicked_count, lag1_send_hour_sum
π Remove multicollinear feature
lag1_clicked_count and lag1_send_hour_count are highly correlated (r=1.00)
β Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
β Features: lag1_clicked_count, lag1_send_hour_count
π Remove multicollinear feature
lag1_clicked_count and lag1_bounced_count are highly correlated (r=1.00)
β Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
β Features: lag1_clicked_count, lag1_bounced_count
π Remove multicollinear feature
lag1_send_hour_sum and lag1_send_hour_count are highly correlated (r=0.89)
β Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
β Features: lag1_send_hour_sum, lag1_send_hour_count
π Remove multicollinear feature
lag1_send_hour_sum and lag1_bounced_count are highly correlated (r=0.89)
β Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
β Features: lag1_send_hour_sum, lag1_bounced_count
π Remove multicollinear feature
lag1_send_hour_mean and lag1_send_hour_max are highly correlated (r=0.96)
β Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
β Features: lag1_send_hour_mean, lag1_send_hour_max
π Remove multicollinear feature
lag1_send_hour_count and lag1_bounced_count are highly correlated (r=1.00)
β Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
β Features: lag1_send_hour_count, lag1_bounced_count
π Remove multicollinear feature
lag1_time_to_open_hours_sum and lag1_time_to_open_hours_mean are highly correlated (r=0.98)
β Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
β Features: lag1_time_to_open_hours_sum, lag1_time_to_open_hours_mean
π Remove multicollinear feature
lag1_time_to_open_hours_sum and lag1_time_to_open_hours_max are highly correlated (r=0.99)
β Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
β Features: lag1_time_to_open_hours_sum, lag1_time_to_open_hours_max
π Remove multicollinear feature
lag1_time_to_open_hours_mean and lag1_time_to_open_hours_max are highly correlated (r=1.00)
β Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
β Features: lag1_time_to_open_hours_mean, lag1_time_to_open_hours_max
Remove multicollinear feature
lag2_opened_sum and lag2_opened_mean are highly correlated (r=0.91)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag2_opened_sum, lag2_opened_mean
Remove multicollinear feature
lag2_opened_sum and lag2_time_to_open_hours_count are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag2_opened_sum, lag2_time_to_open_hours_count
Remove multicollinear feature
lag2_opened_mean and lag2_time_to_open_hours_count are highly correlated (r=0.91)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag2_opened_mean, lag2_time_to_open_hours_count
Remove multicollinear feature
lag2_opened_count and lag2_clicked_count are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag2_opened_count, lag2_clicked_count
Remove multicollinear feature
lag2_opened_count and lag2_send_hour_sum are highly correlated (r=0.88)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag2_opened_count, lag2_send_hour_sum
Remove multicollinear feature
lag2_opened_count and lag2_send_hour_count are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag2_opened_count, lag2_send_hour_count
Remove multicollinear feature
lag2_opened_count and lag2_bounced_count are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag2_opened_count, lag2_bounced_count
Remove multicollinear feature
lag2_clicked_sum and lag2_clicked_mean are highly correlated (r=0.92)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag2_clicked_sum, lag2_clicked_mean
Remove multicollinear feature
lag2_clicked_count and lag2_send_hour_sum are highly correlated (r=0.88)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag2_clicked_count, lag2_send_hour_sum
Remove multicollinear feature
lag2_clicked_count and lag2_send_hour_count are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag2_clicked_count, lag2_send_hour_count
Remove multicollinear feature
lag2_clicked_count and lag2_bounced_count are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag2_clicked_count, lag2_bounced_count
Remove multicollinear feature
lag2_send_hour_sum and lag2_send_hour_count are highly correlated (r=0.88)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag2_send_hour_sum, lag2_send_hour_count
Remove multicollinear feature
lag2_send_hour_sum and lag2_bounced_count are highly correlated (r=0.88)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag2_send_hour_sum, lag2_bounced_count
Remove multicollinear feature
lag2_send_hour_mean and lag2_send_hour_max are highly correlated (r=0.96)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag2_send_hour_mean, lag2_send_hour_max
Remove multicollinear feature
lag2_send_hour_count and lag2_bounced_count are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag2_send_hour_count, lag2_bounced_count
Remove multicollinear feature
lag2_time_to_open_hours_sum and lag2_time_to_open_hours_mean are highly correlated (r=0.94)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag2_time_to_open_hours_sum, lag2_time_to_open_hours_mean
Remove multicollinear feature
lag2_time_to_open_hours_sum and lag2_time_to_open_hours_max are highly correlated (r=0.99)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag2_time_to_open_hours_sum, lag2_time_to_open_hours_max
Remove multicollinear feature
lag2_time_to_open_hours_mean and lag2_time_to_open_hours_max are highly correlated (r=0.98)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag2_time_to_open_hours_mean, lag2_time_to_open_hours_max
Remove multicollinear feature
lag3_opened_sum and lag3_opened_mean are highly correlated (r=0.94)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag3_opened_sum, lag3_opened_mean
Remove multicollinear feature
lag3_opened_sum and lag3_time_to_open_hours_count are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag3_opened_sum, lag3_time_to_open_hours_count
Remove multicollinear feature
lag3_opened_mean and lag3_time_to_open_hours_count are highly correlated (r=0.94)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag3_opened_mean, lag3_time_to_open_hours_count
Remove multicollinear feature
lag3_opened_count and lag3_clicked_count are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag3_opened_count, lag3_clicked_count
Remove multicollinear feature
lag3_opened_count and lag3_send_hour_sum are highly correlated (r=0.87)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag3_opened_count, lag3_send_hour_sum
Remove multicollinear feature
lag3_opened_count and lag3_send_hour_count are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag3_opened_count, lag3_send_hour_count
Remove multicollinear feature
lag3_opened_count and lag3_bounced_count are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag3_opened_count, lag3_bounced_count
Remove multicollinear feature
lag3_clicked_count and lag3_send_hour_sum are highly correlated (r=0.87)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag3_clicked_count, lag3_send_hour_sum
Remove multicollinear feature
lag3_clicked_count and lag3_send_hour_count are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag3_clicked_count, lag3_send_hour_count
Remove multicollinear feature
lag3_clicked_count and lag3_bounced_count are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag3_clicked_count, lag3_bounced_count
Remove multicollinear feature
lag3_send_hour_sum and lag3_send_hour_count are highly correlated (r=0.87)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag3_send_hour_sum, lag3_send_hour_count
Remove multicollinear feature
lag3_send_hour_sum and lag3_bounced_count are highly correlated (r=0.87)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag3_send_hour_sum, lag3_bounced_count
Remove multicollinear feature
lag3_send_hour_mean and lag3_send_hour_max are highly correlated (r=0.96)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag3_send_hour_mean, lag3_send_hour_max
Remove multicollinear feature
lag3_send_hour_count and lag3_bounced_count are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag3_send_hour_count, lag3_bounced_count
Remove multicollinear feature
lag3_time_to_open_hours_sum and lag3_time_to_open_hours_mean are highly correlated (r=0.99)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag3_time_to_open_hours_sum, lag3_time_to_open_hours_mean
Remove multicollinear feature
lag3_time_to_open_hours_sum and lag3_time_to_open_hours_max are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag3_time_to_open_hours_sum, lag3_time_to_open_hours_max
Remove multicollinear feature
lag3_time_to_open_hours_mean and lag3_time_to_open_hours_max are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: lag3_time_to_open_hours_mean, lag3_time_to_open_hours_max
Remove multicollinear feature
opened_velocity and opened_velocity_pct are highly correlated (r=0.92)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: opened_velocity, opened_velocity_pct
Remove multicollinear feature
opened_velocity_pct and opened_vs_cohort_mean are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: opened_velocity_pct, opened_vs_cohort_mean
Remove multicollinear feature
opened_velocity_pct and opened_vs_cohort_pct are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: opened_velocity_pct, opened_vs_cohort_pct
Remove multicollinear feature
opened_velocity_pct and opened_cohort_zscore are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: opened_velocity_pct, opened_cohort_zscore
Remove multicollinear feature
send_hour_velocity and send_hour_velocity_pct are highly correlated (r=0.89)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: send_hour_velocity, send_hour_velocity_pct
Remove multicollinear feature
bounced_velocity and bounced_acceleration are highly correlated (r=0.87)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: bounced_velocity, bounced_acceleration
Remove multicollinear feature
time_to_open_hours_momentum and time_to_open_hours_vs_cohort_mean are highly correlated (r=0.89)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_momentum, time_to_open_hours_vs_cohort_mean
Remove multicollinear feature
time_to_open_hours_momentum and time_to_open_hours_vs_cohort_pct are highly correlated (r=0.89)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_momentum, time_to_open_hours_vs_cohort_pct
Remove multicollinear feature
time_to_open_hours_momentum and time_to_open_hours_cohort_zscore are highly correlated (r=0.89)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_momentum, time_to_open_hours_cohort_zscore
Remove multicollinear feature
clicked_end and clicked_trend_ratio are highly correlated (r=0.92)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: clicked_end, clicked_trend_ratio
Remove multicollinear feature
bounced_end and bounced_trend_ratio are highly correlated (r=0.99)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: bounced_end, bounced_trend_ratio
Remove multicollinear feature
days_since_first_event_y and active_span_days are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: days_since_first_event_y, active_span_days
Remove multicollinear feature
inter_event_gap_std and inter_event_gap_max are highly correlated (r=0.89)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: inter_event_gap_std, inter_event_gap_max
Remove multicollinear feature
opened_vs_cohort_mean and opened_vs_cohort_pct are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: opened_vs_cohort_mean, opened_vs_cohort_pct
Remove multicollinear feature
opened_vs_cohort_mean and opened_cohort_zscore are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: opened_vs_cohort_mean, opened_cohort_zscore
Remove multicollinear feature
opened_vs_cohort_pct and opened_cohort_zscore are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: opened_vs_cohort_pct, opened_cohort_zscore
Remove multicollinear feature
clicked_vs_cohort_mean and clicked_vs_cohort_pct are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: clicked_vs_cohort_mean, clicked_vs_cohort_pct
Remove multicollinear feature
clicked_vs_cohort_mean and clicked_cohort_zscore are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: clicked_vs_cohort_mean, clicked_cohort_zscore
Remove multicollinear feature
clicked_vs_cohort_pct and clicked_cohort_zscore are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: clicked_vs_cohort_pct, clicked_cohort_zscore
Remove multicollinear feature
send_hour_vs_cohort_mean and send_hour_vs_cohort_pct are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: send_hour_vs_cohort_mean, send_hour_vs_cohort_pct
Remove multicollinear feature
send_hour_vs_cohort_mean and send_hour_cohort_zscore are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: send_hour_vs_cohort_mean, send_hour_cohort_zscore
Remove multicollinear feature
send_hour_vs_cohort_pct and send_hour_cohort_zscore are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: send_hour_vs_cohort_pct, send_hour_cohort_zscore
Remove multicollinear feature
bounced_vs_cohort_mean and bounced_vs_cohort_pct are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: bounced_vs_cohort_mean, bounced_vs_cohort_pct
Remove multicollinear feature
bounced_vs_cohort_mean and bounced_cohort_zscore are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: bounced_vs_cohort_mean, bounced_cohort_zscore
Remove multicollinear feature
bounced_vs_cohort_pct and bounced_cohort_zscore are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: bounced_vs_cohort_pct, bounced_cohort_zscore
Remove multicollinear feature
time_to_open_hours_vs_cohort_mean and time_to_open_hours_vs_cohort_pct are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_vs_cohort_mean, time_to_open_hours_vs_cohort_pct
Remove multicollinear feature
time_to_open_hours_vs_cohort_mean and time_to_open_hours_cohort_zscore are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_vs_cohort_mean, time_to_open_hours_cohort_zscore
Remove multicollinear feature
time_to_open_hours_vs_cohort_pct and time_to_open_hours_cohort_zscore are highly correlated (r=1.00)
→ Action: Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
→ Features: time_to_open_hours_vs_cohort_pct, time_to_open_hours_cohort_zscore
Prioritize strong predictors
Top predictive features: days_since_last_event_x, days_since_first_event_y, active_span_days
→ Action: Ensure these features are included in your model and check for data quality issues.
→ Features: days_since_last_event_x, days_since_first_event_y, active_span_days
Stratify by lifecycle_quadrant
Significant variation in retention rates across lifecycle_quadrant categories (spread: 75.1%)
→ Action: Use stratified sampling by lifecycle_quadrant in train/test split to ensure all segments are represented.
→ Features: lifecycle_quadrant
Stratify by recency_bucket
Significant variation in retention rates across recency_bucket categories (spread: 66.6%)
→ Action: Use stratified sampling by recency_bucket in train/test split to ensure all segments are represented.
→ Features: recency_bucket
Monitor high-risk segments
Segments with below-average retention: Steady & Loyal, Occasional & Loyal, 0-7d
→ Action: Target these segments for intervention campaigns and ensure adequate representation in training data.
→ Features: lifecycle_quadrant, lifecycle_quadrant, recency_bucket
✅ Persisted 442 multicollinearity recommendations
✅ Persisted 51 strong predictor recommendations
✅ Persisted 10 weak predictor recommendations
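The stratification recommendations above (for `lifecycle_quadrant` and `recency_bucket`) can be illustrated with a small, standard-library-only sketch. The function name `stratified_split`, the `customer_id` field, and the row counts are hypothetical and chosen only for illustration; the segment names come from the output above. This is not the notebook's own splitting code.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, test_fraction=0.2, seed=42):
    """Split rows so every category of `key` is represented in both train and test."""
    rng = random.Random(seed)

    # Group rows by their category value
    by_category = defaultdict(list)
    for row in rows:
        by_category[row[key]].append(row)

    train, test = [], []
    for category_rows in by_category.values():
        shuffled = category_rows[:]
        rng.shuffle(shuffled)
        # Categories with a single row stay in train; otherwise hold out at least one row
        n_test = max(1, round(len(shuffled) * test_fraction)) if len(shuffled) > 1 else 0
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Hypothetical rows: an imbalanced mix of two lifecycle segments
rows = (
    [{"customer_id": i, "lifecycle_quadrant": "Steady & Loyal"} for i in range(50)]
    + [{"customer_id": 50 + i, "lifecycle_quadrant": "Occasional & Loyal"} for i in range(10)]
)
train, test = stratified_split(rows, key="lifecycle_quadrant")
print(len(train), len(test))
```

With a pandas DataFrame, scikit-learn's `train_test_split` with its `stratify` argument is the more usual route; the sketch only shows what stratification does to the per-segment proportions.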
5.9.1 Feature Selection Recommendations
What these recommendations tell you:
- Which features to prioritize (strong predictors)
- Which features to consider dropping (weak predictors, redundant features)
- Which feature pairs cause multicollinearity issues
Decision Guide:
| Finding | Linear Models | Tree-Based Models |
|---|---|---|
| Strong predictors | Include; expect large coefficients when inputs are standardized | Include; likely to appear in early splits |
| Weak predictors | Consider dropping | May still help through interactions |
| Multicollinear pairs | Drop one feature from each pair | Can keep both; predictions are unaffected, but importance is shared across the pair |
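For linear models, the "drop one feature from each pair" rule can be applied mechanically. The sketch below is one possible greedy approach, not the library's implementation: the function name `prune_multicollinear` and the tie-breaking rule (keep the first feature) are assumptions. The pair dictionaries use the `feature1`, `feature2`, and `correlation` keys that the notebook's multicollinearity summary uses, and the sample values are taken from this notebook's output.

```python
def prune_multicollinear(pairs, target_corr):
    """Greedily pick one feature to drop from each highly correlated pair.

    pairs: list of dicts with 'feature1', 'feature2', 'correlation'
    target_corr: dict mapping feature name -> correlation with the target
    Returns the set of feature names to drop.
    """
    dropped = set()
    # Resolve the most redundant pairs first
    for pair in sorted(pairs, key=lambda p: abs(p["correlation"]), reverse=True):
        f1, f2 = pair["feature1"], pair["feature2"]
        if f1 in dropped or f2 in dropped:
            continue  # one side is already gone, so the pair is resolved
        # Keep the feature with the stronger target relationship (ties keep f1)
        s1 = abs(target_corr.get(f1, 0.0))
        s2 = abs(target_corr.get(f2, 0.0))
        dropped.add(f2 if s1 >= s2 else f1)
    return dropped


pairs = [
    {"feature1": "event_count_180d", "feature2": "event_count_365d", "correlation": 0.82},
    {"feature1": "event_count_180d", "feature2": "opened_count_180d", "correlation": 1.00},
    {"feature1": "event_count_180d", "feature2": "send_hour_sum_180d", "correlation": 0.98},
]
target_corr = {
    "event_count_180d": -0.503,
    "event_count_365d": -0.604,
    "opened_count_180d": -0.503,
    "send_hour_sum_180d": -0.492,
}
to_drop = prune_multicollinear(pairs, target_corr)
print(sorted(to_drop))
```

In this example only `event_count_365d` survives, since it has the strongest target correlation of the four. Business meaning and missing-value rates are not modeled here and would still need a manual check.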

# Feature Selection Recommendations
selection_recs = grouped_recs.get(RecommendationCategory.FEATURE_SELECTION, [])
print("=" * 70)
print("FEATURE SELECTION")
print("=" * 70)

# Strong predictors summary
if analysis_summary.strong_predictors:
    print("\n✅ STRONG PREDICTORS (prioritize these):")
    strong_df = pd.DataFrame(analysis_summary.strong_predictors)
    # Format as signed strings for display
    strong_df["effect_size"] = strong_df["effect_size"].apply(lambda x: f"{x:+.3f}")
    strong_df["correlation"] = strong_df["correlation"].apply(lambda x: f"{x:+.3f}")
    # Sort by absolute effect size; regex=False treats "+" as a literal character
    strong_df = strong_df.sort_values(
        "effect_size",
        key=lambda x: x.str.replace("+", "", regex=False).astype(float).abs(),
        ascending=False,
    )
    display(strong_df)
    print("\n 💡 These features show strong discrimination between retained/churned customers.")
    print(" → Ensure they're included in your model")
    print(" → Check for data quality issues that could inflate their importance")

# Weak predictors summary (first five only)
if analysis_summary.weak_predictors:
    print(f"\n⚪ WEAK PREDICTORS (consider dropping): {', '.join(analysis_summary.weak_predictors[:5])}")
    print(" → Low individual predictive power, but may help in combination")

# Multicollinearity summary
if analysis_summary.multicollinear_pairs:
    print("\n⚠️ MULTICOLLINEAR PAIRS (drop one from each pair for linear models):")
    for pair in analysis_summary.multicollinear_pairs:
        print(f" • {pair['feature1']} ↔ {pair['feature2']}: r = {pair['correlation']:.2f}")
    print("\n 💡 For each pair, keep the feature with:")
    print(" - Stronger business meaning")
    print(" - Higher target correlation")
    print(" - Fewer missing values")

# Display all feature selection recommendations
if selection_recs:
    print("\n" + "-" * 70)
    print("DETAILED RECOMMENDATIONS:")
    for rec in selection_recs:
        priority_icon = "🔴" if rec.priority == "high" else "🟡" if rec.priority == "medium" else "🟢"
        print(f"\n{priority_icon} {rec.title}")
        print(f" {rec.description}")
        print(f" → {rec.action}")

======================================================================
FEATURE SELECTION
======================================================================

✅ STRONG PREDICTORS (prioritize these):
| | feature | correlation | effect_size |
|---|---|---|---|
| 32 | days_since_last_event_x | +0.767 | +2.403 |
| 45 | active_span_days | -0.762 | -2.365 |
| 44 | days_since_first_event_y | -0.762 | -2.365 |
| 18 | bounced_count_365d | -0.604 | -1.524 |
| 1 | event_count_365d | -0.604 | -1.524 |
| 17 | send_hour_count_365d | -0.604 | -1.524 |
| 15 | clicked_count_365d | -0.604 | -1.524 |
| 13 | opened_count_365d | -0.604 | -1.524 |
| 16 | send_hour_sum_365d | -0.593 | -1.481 |
| 0 | event_count_180d | -0.503 | -1.170 |
| 8 | send_hour_count_180d | -0.503 | -1.170 |
| 6 | clicked_count_180d | -0.503 | -1.170 |
| 5 | opened_count_180d | -0.503 | -1.170 |
| 9 | bounced_count_180d | -0.503 | -1.170 |
| 7 | send_hour_sum_180d | -0.492 | -1.136 |
| 31 | time_to_open_hours_count_all_time | -0.448 | -1.008 |
| 21 | opened_sum_all_time | -0.448 | -1.008 |
| 38 | opened_end | -0.447 | -1.007 |
| 20 | time_to_open_hours_count_365d | -0.430 | -0.957 |
| 11 | opened_sum_365d | -0.430 | -0.957 |
| 47 | inter_event_gap_max | -0.419 | -0.929 |
| 29 | bounced_count_all_time | -0.406 | -0.893 |
| 28 | send_hour_count_all_time | -0.406 | -0.893 |
| 25 | clicked_count_all_time | -0.406 | -0.893 |
| 2 | event_count_all_time | -0.406 | -0.893 |
| 23 | opened_count_all_time | -0.406 | -0.893 |
| 26 | send_hour_sum_all_time | -0.403 | -0.886 |
| 22 | opened_mean_all_time | -0.385 | -0.839 |
| 30 | time_to_open_hours_sum_all_time | -0.379 | -0.823 |
| 43 | time_to_open_hours_end | -0.350 | -0.753 |
| 42 | send_hour_end | -0.343 | -0.735 |
| 34 | lag0_opened_mean | -0.341 | -0.729 |
| 12 | opened_mean_365d | -0.186 | -0.705 |
| 39 | opened_trend_ratio | -0.317 | -0.701 |
| 41 | send_hour_beginning | -0.327 | -0.697 |
| 24 | clicked_sum_all_time | -0.325 | -0.692 |
| 19 | time_to_open_hours_sum_365d | -0.323 | -0.687 |
| 3 | opened_sum_180d | -0.313 | -0.663 |
| 10 | time_to_open_hours_count_180d | -0.313 | -0.663 |
| 48 | opened_vs_cohort_mean | -0.300 | -0.632 |
| 49 | opened_vs_cohort_pct | -0.300 | -0.632 |
| 50 | opened_cohort_zscore | -0.300 | -0.632 |
| 35 | lag0_time_to_open_hours_count | -0.300 | -0.632 |
| 33 | lag0_opened_sum | -0.300 | -0.632 |
| 40 | clicked_end | -0.295 | -0.623 |
| 27 | send_hour_max_all_time | -0.283 | -0.594 |
| 4 | opened_mean_180d | -0.131 | -0.579 |
| 37 | opened_beginning | -0.265 | -0.555 |
| 46 | inter_event_gap_std | -0.263 | -0.549 |
| 14 | clicked_sum_365d | -0.262 | -0.546 |
| 36 | lag1_opened_mean | -0.258 | -0.534 |
💡 These features show strong discrimination between retained/churned customers.
→ Ensure they're included in your model
→ Check for data quality issues that could inflate their importance
⚪ WEAK PREDICTORS (consider dropping): send_hour_mean_180d, send_hour_max_180d, bounced_sum_180d, bounced_mean_180d, time_to_open_hours_mean_180d
→ Low individual predictive power, but may help in combination
⚠️ MULTICOLLINEAR PAIRS (drop one from each pair for linear models):
• event_count_180d ↔ event_count_365d: r = 0.82
• event_count_180d ↔ opened_count_180d: r = 1.00
• event_count_180d ↔ clicked_count_180d: r = 1.00
• event_count_180d ↔ send_hour_sum_180d: r = 0.98
• event_count_180d ↔ send_hour_count_180d: r = 1.00
• event_count_180d ↔ bounced_count_180d: r = 1.00
• event_count_180d ↔ opened_count_365d: r = 0.82
• event_count_180d ↔ clicked_count_365d: r = 0.82
• event_count_180d ↔ send_hour_sum_365d: r = 0.80
• event_count_180d ↔ send_hour_count_365d: r = 0.82
• event_count_180d ↔ bounced_count_365d: r = 0.82
• event_count_365d ↔ opened_count_180d: r = 0.82
• event_count_365d ↔ clicked_count_180d: r = 0.82
• event_count_365d ↔ send_hour_sum_180d: r = 0.80
• event_count_365d ↔ send_hour_count_180d: r = 0.82
• event_count_365d ↔ bounced_count_180d: r = 0.82
• event_count_365d ↔ opened_count_365d: r = 1.00
• event_count_365d ↔ clicked_count_365d: r = 1.00
• event_count_365d ↔ send_hour_sum_365d: r = 0.98
• event_count_365d ↔ send_hour_count_365d: r = 1.00
• event_count_365d ↔ bounced_count_365d: r = 1.00
• event_count_all_time ↔ opened_sum_all_time: r = 0.80
• event_count_all_time ↔ opened_count_all_time: r = 1.00
• event_count_all_time ↔ clicked_count_all_time: r = 1.00
• event_count_all_time ↔ send_hour_sum_all_time: r = 0.99
• event_count_all_time ↔ send_hour_count_all_time: r = 1.00
• event_count_all_time ↔ bounced_count_all_time: r = 1.00
• event_count_all_time ↔ time_to_open_hours_count_all_time: r = 0.80
• event_count_all_time ↔ send_hour_beginning: r = 0.84
• event_count_all_time ↔ send_hour_end: r = 0.84
• opened_sum_180d ↔ opened_mean_180d: r = 0.82
• opened_sum_180d ↔ time_to_open_hours_sum_180d: r = 0.74
• opened_sum_180d ↔ time_to_open_hours_count_180d: r = 1.00
• opened_sum_180d ↔ opened_sum_365d: r = 0.75
• opened_sum_180d ↔ time_to_open_hours_count_365d: r = 0.75
• opened_mean_180d ↔ time_to_open_hours_count_180d: r = 0.82
• opened_mean_180d ↔ opened_mean_365d: r = 0.79
• opened_mean_180d ↔ lag0_opened_sum: r = 0.84
• opened_mean_180d ↔ lag0_opened_mean: r = 0.90
• opened_mean_180d ↔ lag0_time_to_open_hours_count: r = 0.84
• opened_mean_180d ↔ opened_vs_cohort_mean: r = 0.84
• opened_mean_180d ↔ opened_vs_cohort_pct: r = 0.84
• opened_mean_180d ↔ opened_cohort_zscore: r = 0.84
• opened_count_180d ↔ clicked_count_180d: r = 1.00
• opened_count_180d ↔ send_hour_sum_180d: r = 0.98
• opened_count_180d ↔ send_hour_count_180d: r = 1.00
• opened_count_180d ↔ bounced_count_180d: r = 1.00
• opened_count_180d ↔ opened_count_365d: r = 0.82
• opened_count_180d ↔ clicked_count_365d: r = 0.82
• opened_count_180d ↔ send_hour_sum_365d: r = 0.80
• opened_count_180d ↔ send_hour_count_365d: r = 0.82
• opened_count_180d ↔ bounced_count_365d: r = 0.82
• clicked_sum_180d ↔ clicked_mean_180d: r = 0.88
• clicked_sum_180d ↔ clicked_sum_365d: r = 0.71
• clicked_mean_180d ↔ clicked_mean_365d: r = 0.77
• clicked_mean_180d ↔ lag0_clicked_sum: r = 0.84
• clicked_mean_180d ↔ lag0_clicked_mean: r = 0.88
• clicked_mean_180d ↔ clicked_vs_cohort_mean: r = 0.84
• clicked_mean_180d ↔ clicked_vs_cohort_pct: r = 0.84
• clicked_mean_180d ↔ clicked_cohort_zscore: r = 0.84
• clicked_count_180d ↔ send_hour_sum_180d: r = 0.98
• clicked_count_180d ↔ send_hour_count_180d: r = 1.00
• clicked_count_180d ↔ bounced_count_180d: r = 1.00
• clicked_count_180d ↔ opened_count_365d: r = 0.82
• clicked_count_180d ↔ clicked_count_365d: r = 0.82
• clicked_count_180d ↔ send_hour_sum_365d: r = 0.80
• clicked_count_180d ↔ send_hour_count_365d: r = 0.82
• clicked_count_180d ↔ bounced_count_365d: r = 0.82
• send_hour_sum_180d ↔ send_hour_count_180d: r = 0.98
• send_hour_sum_180d ↔ bounced_count_180d: r = 0.98
• send_hour_sum_180d ↔ opened_count_365d: r = 0.80
• send_hour_sum_180d ↔ clicked_count_365d: r = 0.80
• send_hour_sum_180d ↔ send_hour_sum_365d: r = 0.81
• send_hour_sum_180d ↔ send_hour_count_365d: r = 0.80
• send_hour_sum_180d ↔ bounced_count_365d: r = 0.80
• send_hour_mean_180d ↔ send_hour_max_180d: r = 0.88
• send_hour_mean_180d ↔ send_hour_mean_365d: r = 0.81
• send_hour_mean_180d ↔ lag0_send_hour_mean: r = 0.89
• send_hour_mean_180d ↔ lag0_send_hour_max: r = 0.86
• send_hour_max_180d ↔ send_hour_mean_365d: r = 0.73
• send_hour_max_180d ↔ send_hour_max_365d: r = 0.75
• send_hour_max_180d ↔ lag0_send_hour_mean: r = 0.79
• send_hour_max_180d ↔ lag0_send_hour_max: r = 0.81
• send_hour_count_180d ↔ bounced_count_180d: r = 1.00
• send_hour_count_180d ↔ opened_count_365d: r = 0.82
• send_hour_count_180d ↔ clicked_count_365d: r = 0.82
• send_hour_count_180d ↔ send_hour_sum_365d: r = 0.80
• send_hour_count_180d ↔ send_hour_count_365d: r = 0.82
• send_hour_count_180d ↔ bounced_count_365d: r = 0.82
• bounced_sum_180d ↔ bounced_mean_180d: r = 0.89
• bounced_sum_180d ↔ bounced_sum_365d: r = 0.70
• bounced_mean_180d ↔ bounced_mean_365d: r = 0.80
• bounced_mean_180d ↔ lag0_bounced_sum: r = 0.85
• bounced_mean_180d ↔ lag0_bounced_mean: r = 0.88
• bounced_mean_180d ↔ bounced_vs_cohort_mean: r = 0.85
• bounced_mean_180d ↔ bounced_vs_cohort_pct: r = 0.85
• bounced_mean_180d ↔ bounced_cohort_zscore: r = 0.85
• bounced_count_180d ↔ opened_count_365d: r = 0.82
• bounced_count_180d ↔ clicked_count_365d: r = 0.82
• bounced_count_180d ↔ send_hour_sum_365d: r = 0.80
• bounced_count_180d ↔ send_hour_count_365d: r = 0.82
• bounced_count_180d ↔ bounced_count_365d: r = 0.82
• time_to_open_hours_sum_180d ↔ time_to_open_hours_mean_180d: r = 0.89
• time_to_open_hours_sum_180d ↔ time_to_open_hours_max_180d: r = 0.96
• time_to_open_hours_sum_180d ↔ time_to_open_hours_count_180d: r = 0.74
• time_to_open_hours_sum_180d ↔ time_to_open_hours_sum_365d: r = 0.71
• time_to_open_hours_mean_180d ↔ time_to_open_hours_max_180d: r = 0.97
• time_to_open_hours_mean_180d ↔ time_to_open_hours_sum_365d: r = 0.73
• time_to_open_hours_mean_180d ↔ time_to_open_hours_mean_365d: r = 0.94
• time_to_open_hours_mean_180d ↔ time_to_open_hours_max_365d: r = 0.88
• time_to_open_hours_mean_180d ↔ lag0_time_to_open_hours_sum: r = 0.70
• time_to_open_hours_mean_180d ↔ lag0_time_to_open_hours_mean: r = 0.98
• time_to_open_hours_mean_180d ↔ lag0_time_to_open_hours_max: r = 0.96
• time_to_open_hours_mean_180d ↔ lag1_time_to_open_hours_mean: r = 0.73
• time_to_open_hours_mean_180d ↔ lag1_time_to_open_hours_max: r = 0.73
• time_to_open_hours_mean_180d ↔ lag2_time_to_open_hours_mean: r = 0.94
• time_to_open_hours_mean_180d ↔ lag2_time_to_open_hours_max: r = 0.94
• time_to_open_hours_mean_180d ↔ lag3_time_to_open_hours_mean: r = 0.86
• time_to_open_hours_mean_180d ↔ lag3_time_to_open_hours_max: r = 0.87
• time_to_open_hours_mean_180d ↔ time_to_open_hours_vs_cohort_mean: r = 0.70
• time_to_open_hours_mean_180d ↔ time_to_open_hours_vs_cohort_pct: r = 0.70
• time_to_open_hours_mean_180d ↔ time_to_open_hours_cohort_zscore: r = 0.70
• time_to_open_hours_max_180d ↔ time_to_open_hours_sum_365d: r = 0.81
• time_to_open_hours_max_180d ↔ time_to_open_hours_mean_365d: r = 0.91
• time_to_open_hours_max_180d ↔ time_to_open_hours_max_365d: r = 0.91
• time_to_open_hours_max_180d ↔ lag0_time_to_open_hours_sum: r = 0.74
• time_to_open_hours_max_180d ↔ lag0_time_to_open_hours_mean: r = 0.95
• time_to_open_hours_max_180d ↔ lag0_time_to_open_hours_max: r = 0.96
• time_to_open_hours_max_180d ↔ lag2_time_to_open_hours_mean: r = 0.90
• time_to_open_hours_max_180d ↔ lag2_time_to_open_hours_max: r = 0.90
• time_to_open_hours_max_180d ↔ lag3_time_to_open_hours_mean: r = 0.76
• time_to_open_hours_max_180d ↔ lag3_time_to_open_hours_max: r = 0.77
• time_to_open_hours_max_180d ↔ time_to_open_hours_momentum: r = 0.72
• time_to_open_hours_max_180d ↔ time_to_open_hours_vs_cohort_mean: r = 0.74
• time_to_open_hours_max_180d ↔ time_to_open_hours_vs_cohort_pct: r = 0.74
• time_to_open_hours_max_180d ↔ time_to_open_hours_cohort_zscore: r = 0.74
• time_to_open_hours_count_180d ↔ opened_sum_365d: r = 0.75
• time_to_open_hours_count_180d ↔ time_to_open_hours_count_365d: r = 0.75
• opened_sum_365d ↔ opened_mean_365d: r = 0.76
• opened_sum_365d ↔ time_to_open_hours_sum_365d: r = 0.75
• opened_sum_365d ↔ time_to_open_hours_count_365d: r = 1.00
• opened_mean_365d ↔ time_to_open_hours_count_365d: r = 0.76
• opened_mean_365d ↔ lag0_opened_sum: r = 0.72
• opened_mean_365d ↔ lag0_opened_mean: r = 0.76
• opened_mean_365d ↔ lag0_time_to_open_hours_count: r = 0.72
• opened_mean_365d ↔ opened_vs_cohort_mean: r = 0.72
• opened_mean_365d ↔ opened_vs_cohort_pct: r = 0.72
• opened_mean_365d ↔ opened_cohort_zscore: r = 0.72
• opened_count_365d ↔ clicked_count_365d: r = 1.00
• opened_count_365d ↔ send_hour_sum_365d: r = 0.98
• opened_count_365d ↔ send_hour_count_365d: r = 1.00
• opened_count_365d ↔ bounced_count_365d: r = 1.00
• clicked_sum_365d ↔ clicked_mean_365d: r = 0.84
• clicked_mean_365d ↔ lag0_clicked_sum: r = 0.71
• clicked_mean_365d ↔ lag0_clicked_mean: r = 0.74
• clicked_mean_365d ↔ clicked_vs_cohort_mean: r = 0.71
• clicked_mean_365d ↔ clicked_vs_cohort_pct: r = 0.71
• clicked_mean_365d ↔ clicked_cohort_zscore: r = 0.71
• clicked_count_365d ↔ send_hour_sum_365d: r = 0.98
• clicked_count_365d ↔ send_hour_count_365d: r = 1.00
• clicked_count_365d ↔ bounced_count_365d: r = 1.00
• send_hour_sum_365d ↔ send_hour_count_365d: r = 0.98
• send_hour_sum_365d ↔ bounced_count_365d: r = 0.98
• send_hour_mean_365d ↔ send_hour_max_365d: r = 0.81
• send_hour_mean_365d ↔ lag0_send_hour_mean: r = 0.78
• send_hour_mean_365d ↔ lag0_send_hour_max: r = 0.75
• send_hour_count_365d ↔ bounced_count_365d: r = 1.00
• bounced_sum_365d ↔ bounced_mean_365d: r = 0.84
• bounced_mean_365d ↔ lag0_bounced_sum: r = 0.74
• bounced_mean_365d ↔ lag0_bounced_mean: r = 0.77
• bounced_mean_365d ↔ bounced_vs_cohort_mean: r = 0.74
• bounced_mean_365d ↔ bounced_vs_cohort_pct: r = 0.74
• bounced_mean_365d ↔ bounced_cohort_zscore: r = 0.74
• time_to_open_hours_sum_365d ↔ time_to_open_hours_mean_365d: r = 0.83
• time_to_open_hours_sum_365d ↔ time_to_open_hours_max_365d: r = 0.94
• time_to_open_hours_sum_365d ↔ time_to_open_hours_count_365d: r = 0.75
• time_to_open_hours_mean_365d ↔ time_to_open_hours_max_365d: r = 0.95
• time_to_open_hours_mean_365d ↔ lag0_time_to_open_hours_mean: r = 0.94
• time_to_open_hours_mean_365d ↔ lag0_time_to_open_hours_max: r = 0.93
• time_to_open_hours_mean_365d ↔ lag1_time_to_open_hours_mean: r = 0.85
• time_to_open_hours_mean_365d ↔ lag1_time_to_open_hours_max: r = 0.85
• time_to_open_hours_mean_365d ↔ lag2_time_to_open_hours_mean: r = 0.81
• time_to_open_hours_mean_365d ↔ lag2_time_to_open_hours_max: r = 0.81
• time_to_open_hours_mean_365d ↔ lag3_time_to_open_hours_mean: r = 0.81
• time_to_open_hours_mean_365d ↔ lag3_time_to_open_hours_max: r = 0.81
• time_to_open_hours_max_365d ↔ lag0_time_to_open_hours_mean: r = 0.89
• time_to_open_hours_max_365d ↔ lag0_time_to_open_hours_max: r = 0.91
• time_to_open_hours_max_365d ↔ lag1_time_to_open_hours_mean: r = 0.72
• time_to_open_hours_max_365d ↔ lag1_time_to_open_hours_max: r = 0.72
• time_to_open_hours_max_365d ↔ lag2_time_to_open_hours_mean: r = 0.71
• time_to_open_hours_max_365d ↔ lag2_time_to_open_hours_max: r = 0.71
• time_to_open_hours_max_365d ↔ lag3_time_to_open_hours_mean: r = 0.77
• time_to_open_hours_max_365d ↔ lag3_time_to_open_hours_max: r = 0.77
• opened_sum_all_time ↔ opened_count_all_time: r = 0.80
• opened_sum_all_time ↔ clicked_sum_all_time: r = 0.74
• opened_sum_all_time ↔ clicked_count_all_time: r = 0.80
• opened_sum_all_time ↔ send_hour_sum_all_time: r = 0.79
• opened_sum_all_time ↔ send_hour_count_all_time: r = 0.80
• opened_sum_all_time ↔ bounced_count_all_time: r = 0.80
• opened_sum_all_time ↔ time_to_open_hours_sum_all_time: r = 0.86
• opened_sum_all_time ↔ time_to_open_hours_count_all_time: r = 1.00
• opened_sum_all_time ↔ opened_beginning: r = 0.76
• opened_sum_all_time ↔ opened_end: r = 0.77
• opened_count_all_time ↔ clicked_count_all_time: r = 1.00
• opened_count_all_time ↔ send_hour_sum_all_time: r = 0.99
• opened_count_all_time ↔ send_hour_count_all_time: r = 1.00
• opened_count_all_time ↔ bounced_count_all_time: r = 1.00
• opened_count_all_time ↔ time_to_open_hours_count_all_time: r = 0.80
• opened_count_all_time ↔ send_hour_beginning: r = 0.84
• opened_count_all_time ↔ send_hour_end: r = 0.84
• clicked_sum_all_time ↔ clicked_mean_all_time: r = 0.78
• clicked_sum_all_time ↔ time_to_open_hours_count_all_time: r = 0.74
• clicked_count_all_time ↔ send_hour_sum_all_time: r = 0.99
• clicked_count_all_time ↔ send_hour_count_all_time: r = 1.00
• clicked_count_all_time ↔ bounced_count_all_time: r = 1.00
• clicked_count_all_time ↔ time_to_open_hours_count_all_time: r = 0.80
• clicked_count_all_time ↔ send_hour_beginning: r = 0.84
• clicked_count_all_time ↔ send_hour_end: r = 0.84
• send_hour_sum_all_time ↔ send_hour_count_all_time: r = 0.99
• send_hour_sum_all_time ↔ bounced_count_all_time: r = 0.99
• send_hour_sum_all_time ↔ time_to_open_hours_count_all_time: r = 0.79
• send_hour_sum_all_time ↔ send_hour_beginning: r = 0.85
• send_hour_sum_all_time ↔ send_hour_end: r = 0.85
• send_hour_count_all_time ↔ bounced_count_all_time: r = 1.00
• send_hour_count_all_time ↔ time_to_open_hours_count_all_time: r = 0.80
• send_hour_count_all_time ↔ send_hour_beginning: r = 0.84
• send_hour_count_all_time ↔ send_hour_end: r = 0.84
• bounced_sum_all_time ↔ bounced_mean_all_time: r = 0.71
• bounced_count_all_time ↔ time_to_open_hours_count_all_time: r = 0.80
• bounced_count_all_time ↔ send_hour_beginning: r = 0.84
• bounced_count_all_time ↔ send_hour_end: r = 0.84
• time_to_open_hours_sum_all_time ↔ time_to_open_hours_max_all_time: r = 0.75
• time_to_open_hours_sum_all_time ↔ time_to_open_hours_count_all_time: r = 0.86
• time_to_open_hours_sum_all_time ↔ time_to_open_hours_beginning: r = 0.71
• time_to_open_hours_sum_all_time ↔ time_to_open_hours_end: r = 0.71
• time_to_open_hours_mean_all_time ↔ time_to_open_hours_max_all_time: r = 0.74
• time_to_open_hours_count_all_time ↔ opened_beginning: r = 0.76
• time_to_open_hours_count_all_time ↔ opened_end: r = 0.77
• days_since_last_event_x ↔ days_since_first_event_y: r = -0.99
• days_since_last_event_x ↔ active_span_days: r = -0.99
• lag0_opened_sum ↔ lag0_opened_mean: r = 0.91
• lag0_opened_sum ↔ lag0_time_to_open_hours_count: r = 1.00
• lag0_opened_sum ↔ opened_velocity: r = 0.70
• lag0_opened_sum ↔ opened_velocity_pct: r = 1.00
• lag0_opened_sum ↔ opened_momentum: r = 0.77
• lag0_opened_sum ↔ opened_vs_cohort_mean: r = 1.00
• lag0_opened_sum ↔ opened_vs_cohort_pct: r = 1.00
• lag0_opened_sum ↔ opened_cohort_zscore: r = 1.00
• lag0_opened_mean ↔ lag0_time_to_open_hours_count: r = 0.91
• lag0_opened_mean ↔ opened_velocity_pct: r = 0.84
• lag0_opened_mean ↔ opened_vs_cohort_mean: r = 0.91
• lag0_opened_mean ↔ opened_vs_cohort_pct: r = 0.91
• lag0_opened_mean ↔ opened_cohort_zscore: r = 0.91
• lag0_opened_count ↔ lag0_clicked_count: r = 1.00
• lag0_opened_count ↔ lag0_send_hour_sum: r = 0.91
• lag0_opened_count ↔ lag0_send_hour_count: r = 1.00
• lag0_opened_count ↔ lag0_bounced_count: r = 1.00
• lag0_opened_count ↔ send_hour_vs_cohort_mean: r = 0.91
• lag0_opened_count ↔ send_hour_vs_cohort_pct: r = 0.91
• lag0_opened_count ↔ send_hour_cohort_zscore: r = 0.91
• lag0_clicked_sum ↔ lag0_clicked_mean: r = 0.92
• lag0_clicked_sum ↔ clicked_vs_cohort_mean: r = 1.00
• lag0_clicked_sum ↔ clicked_vs_cohort_pct: r = 1.00
• lag0_clicked_sum ↔ clicked_cohort_zscore: r = 1.00
• lag0_clicked_mean ↔ clicked_vs_cohort_mean: r = 0.92
• lag0_clicked_mean ↔ clicked_vs_cohort_pct: r = 0.92
• lag0_clicked_mean ↔ clicked_cohort_zscore: r = 0.92
• lag0_clicked_count ↔ lag0_send_hour_sum: r = 0.91
• lag0_clicked_count ↔ lag0_send_hour_count: r = 1.00
• lag0_clicked_count ↔ lag0_bounced_count: r = 1.00
• lag0_clicked_count ↔ send_hour_vs_cohort_mean: r = 0.91
• lag0_clicked_count ↔ send_hour_vs_cohort_pct: r = 0.91
• lag0_clicked_count ↔ send_hour_cohort_zscore: r = 0.91
• lag0_send_hour_sum ↔ lag0_send_hour_count: r = 0.91
• lag0_send_hour_sum ↔ lag0_bounced_count: r = 0.91
• lag0_send_hour_sum ↔ send_hour_velocity: r = 0.77
• lag0_send_hour_sum ↔ send_hour_vs_cohort_mean: r = 1.00
• lag0_send_hour_sum ↔ send_hour_vs_cohort_pct: r = 1.00
• lag0_send_hour_sum ↔ send_hour_cohort_zscore: r = 1.00
• lag0_send_hour_mean ↔ lag0_send_hour_max: r = 0.95
• lag0_send_hour_count ↔ lag0_bounced_count: r = 1.00
• lag0_send_hour_count ↔ send_hour_vs_cohort_mean: r = 0.91
• lag0_send_hour_count ↔ send_hour_vs_cohort_pct: r = 0.91
• lag0_send_hour_count ↔ send_hour_cohort_zscore: r = 0.91
• lag0_bounced_sum ↔ lag0_bounced_mean: r = 0.94
• lag0_bounced_sum ↔ bounced_velocity: r = 0.79
• lag0_bounced_sum ↔ bounced_vs_cohort_mean: r = 1.00
• lag0_bounced_sum ↔ bounced_vs_cohort_pct: r = 1.00
• lag0_bounced_sum ↔ bounced_cohort_zscore: r = 1.00
• lag0_bounced_mean ↔ bounced_velocity: r = 0.72
• lag0_bounced_mean ↔ bounced_vs_cohort_mean: r = 0.94
• lag0_bounced_mean ↔ bounced_vs_cohort_pct: r = 0.94
• lag0_bounced_mean ↔ bounced_cohort_zscore: r = 0.94
• lag0_bounced_count ↔ send_hour_vs_cohort_mean: r = 0.91
• lag0_bounced_count ↔ send_hour_vs_cohort_pct: r = 0.91
• lag0_bounced_count ↔ send_hour_cohort_zscore: r = 0.91
• lag0_time_to_open_hours_sum ↔ lag0_time_to_open_hours_mean: r = 0.97
• lag0_time_to_open_hours_sum ↔ lag0_time_to_open_hours_max: r = 0.99
• lag0_time_to_open_hours_sum ↔ opened_velocity_pct: r = 0.70
• lag0_time_to_open_hours_sum ↔ time_to_open_hours_velocity: r = 0.77
• lag0_time_to_open_hours_sum ↔ time_to_open_hours_momentum: r = 0.89
• lag0_time_to_open_hours_sum ↔ time_to_open_hours_vs_cohort_mean: r = 1.00
• lag0_time_to_open_hours_sum ↔ time_to_open_hours_vs_cohort_pct: r = 1.00
• lag0_time_to_open_hours_sum ↔ time_to_open_hours_cohort_zscore: r = 1.00
• lag0_time_to_open_hours_mean ↔ lag0_time_to_open_hours_max: r = 0.99
• lag0_time_to_open_hours_mean ↔ time_to_open_hours_velocity: r = 0.86
• lag0_time_to_open_hours_mean ↔ time_to_open_hours_momentum: r = 0.83
• lag0_time_to_open_hours_mean ↔ time_to_open_hours_vs_cohort_mean: r = 0.97
• lag0_time_to_open_hours_mean ↔ time_to_open_hours_vs_cohort_pct: r = 0.97
• lag0_time_to_open_hours_mean ↔ time_to_open_hours_cohort_zscore: r = 0.97
• lag0_time_to_open_hours_count ↔ opened_velocity: r = 0.70
• lag0_time_to_open_hours_count ↔ opened_velocity_pct: r = 1.00
• lag0_time_to_open_hours_count ↔ opened_momentum: r = 0.77
• lag0_time_to_open_hours_count ↔ opened_vs_cohort_mean: r = 1.00
• lag0_time_to_open_hours_count ↔ opened_vs_cohort_pct: r = 1.00
• lag0_time_to_open_hours_count ↔ opened_cohort_zscore: r = 1.00
• lag0_time_to_open_hours_max ↔ time_to_open_hours_velocity: r = 0.90
• lag0_time_to_open_hours_max ↔ time_to_open_hours_momentum: r = 0.89
• lag0_time_to_open_hours_max ↔ time_to_open_hours_vs_cohort_mean: r = 0.99
• lag0_time_to_open_hours_max ↔ time_to_open_hours_vs_cohort_pct: r = 0.99
• lag0_time_to_open_hours_max ↔ time_to_open_hours_cohort_zscore: r = 0.99
• lag1_opened_sum ↔ lag1_opened_mean: r = 0.92
• lag1_opened_sum ↔ lag1_time_to_open_hours_sum: r = 0.72
• lag1_opened_sum ↔ lag1_time_to_open_hours_count: r = 1.00
• lag1_opened_sum ↔ opened_acceleration: r = -0.79
• lag1_opened_mean ↔ lag1_time_to_open_hours_count: r = 0.92
• lag1_opened_count ↔ lag1_clicked_count: r = 1.00
• lag1_opened_count ↔ lag1_send_hour_sum: r = 0.89
• lag1_opened_count ↔ lag1_send_hour_count: r = 1.00
• lag1_opened_count ↔ lag1_bounced_count: r = 1.00
• lag1_clicked_sum ↔ lag1_clicked_mean: r = 0.95
• lag1_clicked_sum ↔ clicked_velocity: r = -0.72
• lag1_clicked_sum ↔ clicked_acceleration: r = -0.84
• lag1_clicked_mean ↔ clicked_acceleration: r = -0.76
• lag1_clicked_count ↔ lag1_send_hour_sum: r = 0.89
• lag1_clicked_count ↔ lag1_send_hour_count: r = 1.00
• lag1_clicked_count ↔ lag1_bounced_count: r = 1.00
• lag1_send_hour_sum ↔ lag1_send_hour_count: r = 0.89
• lag1_send_hour_sum ↔ lag1_bounced_count: r = 0.89
• lag1_send_hour_mean ↔ lag1_send_hour_max: r = 0.96
• lag1_send_hour_count ↔ lag1_bounced_count: r = 1.00
• lag1_time_to_open_hours_sum ↔ lag1_time_to_open_hours_mean: r = 0.98
• lag1_time_to_open_hours_sum ↔ lag1_time_to_open_hours_count: r = 0.72
• lag1_time_to_open_hours_sum ↔ lag1_time_to_open_hours_max: r = 0.99
• lag1_time_to_open_hours_sum ↔ time_to_open_hours_acceleration: r = -0.78
• lag1_time_to_open_hours_mean ↔ lag1_time_to_open_hours_max: r = 1.00
• lag1_time_to_open_hours_mean ↔ time_to_open_hours_acceleration: r = -0.78
• lag1_time_to_open_hours_count ↔ opened_acceleration: r = -0.79
• lag1_time_to_open_hours_max ↔ time_to_open_hours_acceleration: r = -0.81
• lag2_opened_sum ↔ lag2_opened_mean: r = 0.91
• lag2_opened_sum ↔ lag2_time_to_open_hours_sum: r = 0.74
• lag2_opened_sum ↔ lag2_time_to_open_hours_count: r = 1.00
• lag2_opened_mean ↔ lag2_time_to_open_hours_count: r = 0.91
• lag2_opened_count ↔ lag2_clicked_count: r = 1.00
• lag2_opened_count ↔ lag2_send_hour_sum: r = 0.88
• lag2_opened_count ↔ lag2_send_hour_count: r = 1.00
• lag2_opened_count ↔ lag2_bounced_count: r = 1.00
• lag2_clicked_sum ↔ lag2_clicked_mean: r = 0.92
• lag2_clicked_count ↔ lag2_send_hour_sum: r = 0.88
• lag2_clicked_count ↔ lag2_send_hour_count: r = 1.00
• lag2_clicked_count ↔ lag2_bounced_count: r = 1.00
• lag2_send_hour_sum ↔ lag2_send_hour_count: r = 0.88
• lag2_send_hour_sum ↔ lag2_bounced_count: r = 0.88
• lag2_send_hour_mean ↔ lag2_send_hour_max: r = 0.96
• lag2_send_hour_count ↔ lag2_bounced_count: r = 1.00
• lag2_time_to_open_hours_sum ↔ lag2_time_to_open_hours_mean: r = 0.94
• lag2_time_to_open_hours_sum ↔ lag2_time_to_open_hours_count: r = 0.74
• lag2_time_to_open_hours_sum ↔ lag2_time_to_open_hours_max: r = 0.99
• lag2_time_to_open_hours_mean ↔ lag2_time_to_open_hours_max: r = 0.98
• lag3_opened_sum ↔ lag3_opened_mean: r = 0.94
• lag3_opened_sum ↔ lag3_time_to_open_hours_count: r = 1.00
• lag3_opened_mean ↔ lag3_time_to_open_hours_count: r = 0.94
• lag3_opened_count ↔ lag3_clicked_count: r = 1.00
• lag3_opened_count ↔ lag3_send_hour_sum: r = 0.87
• lag3_opened_count ↔ lag3_send_hour_count: r = 1.00
• lag3_opened_count ↔ lag3_bounced_count: r = 1.00
• lag3_clicked_count ↔ lag3_send_hour_sum: r = 0.87
• lag3_clicked_count ↔ lag3_send_hour_count: r = 1.00
• lag3_clicked_count ↔ lag3_bounced_count: r = 1.00
• lag3_send_hour_sum ↔ lag3_send_hour_count: r = 0.87
• lag3_send_hour_sum ↔ lag3_bounced_count: r = 0.87
• lag3_send_hour_mean ↔ lag3_send_hour_max: r = 0.96
• lag3_send_hour_count ↔ lag3_bounced_count: r = 1.00
• lag3_time_to_open_hours_sum ↔ lag3_time_to_open_hours_mean: r = 0.99
• lag3_time_to_open_hours_sum ↔ lag3_time_to_open_hours_max: r = 1.00
• lag3_time_to_open_hours_mean ↔ lag3_time_to_open_hours_max: r = 1.00
• opened_velocity ↔ opened_velocity_pct: r = 0.92
• opened_velocity ↔ opened_acceleration: r = 0.84
• opened_velocity ↔ opened_vs_cohort_mean: r = 0.70
• opened_velocity ↔ opened_vs_cohort_pct: r = 0.70
• opened_velocity ↔ opened_cohort_zscore: r = 0.70
• opened_velocity_pct ↔ opened_vs_cohort_mean: r = 1.00
• opened_velocity_pct ↔ opened_vs_cohort_pct: r = 1.00
• opened_velocity_pct ↔ opened_cohort_zscore: r = 1.00
• opened_velocity_pct ↔ time_to_open_hours_vs_cohort_mean: r = 0.70
• opened_velocity_pct ↔ time_to_open_hours_vs_cohort_pct: r = 0.70
• opened_velocity_pct ↔ time_to_open_hours_cohort_zscore: r = 0.70
• clicked_velocity ↔ clicked_acceleration: r = 0.85
• send_hour_velocity ↔ send_hour_velocity_pct: r = 0.89
• send_hour_velocity ↔ send_hour_acceleration: r = 0.79
• send_hour_velocity ↔ send_hour_vs_cohort_mean: r = 0.77
• send_hour_velocity ↔ send_hour_vs_cohort_pct: r = 0.77
• send_hour_velocity ↔ send_hour_cohort_zscore: r = 0.77
• send_hour_velocity_pct ↔ send_hour_acceleration: r = 0.76
• bounced_velocity ↔ bounced_acceleration: r = 0.87
• bounced_velocity ↔ bounced_vs_cohort_mean: r = 0.79
• bounced_velocity ↔ bounced_vs_cohort_pct: r = 0.79
• bounced_velocity ↔ bounced_cohort_zscore: r = 0.79
• time_to_open_hours_velocity ↔ time_to_open_hours_acceleration: r = 0.85
• time_to_open_hours_velocity ↔ time_to_open_hours_momentum: r = 0.72
• time_to_open_hours_velocity ↔ time_to_open_hours_vs_cohort_mean: r = 0.77
• time_to_open_hours_velocity ↔ time_to_open_hours_vs_cohort_pct: r = 0.77
• time_to_open_hours_velocity ↔ time_to_open_hours_cohort_zscore: r = 0.77
• opened_acceleration ↔ time_to_open_hours_acceleration: r = 0.71
• opened_momentum ↔ opened_vs_cohort_mean: r = 0.77
• opened_momentum ↔ opened_vs_cohort_pct: r = 0.77
• opened_momentum ↔ opened_cohort_zscore: r = 0.77
• time_to_open_hours_momentum ↔ time_to_open_hours_vs_cohort_mean: r = 0.89
• time_to_open_hours_momentum ↔ time_to_open_hours_vs_cohort_pct: r = 0.89
• time_to_open_hours_momentum ↔ time_to_open_hours_cohort_zscore: r = 0.89
• opened_beginning ↔ time_to_open_hours_beginning: r = 0.77
• opened_end ↔ opened_trend_ratio: r = 0.73
• opened_end ↔ time_to_open_hours_end: r = 0.79
• clicked_end ↔ clicked_trend_ratio: r = 0.92
• bounced_end ↔ bounced_trend_ratio: r = 0.99
• days_since_first_event_y ↔ active_span_days: r = 1.00
• inter_event_gap_std ↔ inter_event_gap_max: r = 0.89
• opened_vs_cohort_mean ↔ opened_vs_cohort_pct: r = 1.00
• opened_vs_cohort_mean ↔ opened_cohort_zscore: r = 1.00
• opened_vs_cohort_pct ↔ opened_cohort_zscore: r = 1.00
• clicked_vs_cohort_mean ↔ clicked_vs_cohort_pct: r = 1.00
• clicked_vs_cohort_mean ↔ clicked_cohort_zscore: r = 1.00
• clicked_vs_cohort_pct ↔ clicked_cohort_zscore: r = 1.00
• send_hour_vs_cohort_mean ↔ send_hour_vs_cohort_pct: r = 1.00
• send_hour_vs_cohort_mean ↔ send_hour_cohort_zscore: r = 1.00
• send_hour_vs_cohort_pct ↔ send_hour_cohort_zscore: r = 1.00
• bounced_vs_cohort_mean ↔ bounced_vs_cohort_pct: r = 1.00
• bounced_vs_cohort_mean ↔ bounced_cohort_zscore: r = 1.00
• bounced_vs_cohort_pct ↔ bounced_cohort_zscore: r = 1.00
• time_to_open_hours_vs_cohort_mean ↔ time_to_open_hours_vs_cohort_pct: r = 1.00
• time_to_open_hours_vs_cohort_mean ↔ time_to_open_hours_cohort_zscore: r = 1.00
• time_to_open_hours_vs_cohort_pct ↔ time_to_open_hours_cohort_zscore: r = 1.00
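A pair list like the one above can be reproduced from any numeric feature table. The following is a minimal sketch, not the notebook's own implementation: the function name `find_correlated_pairs` and the demo columns are hypothetical, and the 0.7 cutoff is taken from the "High Correlation (r > 0.7)" rule stated at the start of the chapter. It compares absolute values, so strongly negative pairs (such as r = -0.99) are reported too.

```python
import numpy as np
import pandas as pd


def find_correlated_pairs(df: pd.DataFrame, threshold: float = 0.7) -> pd.DataFrame:
    """Return each feature pair whose absolute Pearson r exceeds `threshold`.

    Hypothetical helper for illustration; not part of customer_retention.
    """
    corr = df.select_dtypes(include="number").corr()
    cols = list(corr.columns)
    rows = []
    # Walk the strict upper triangle so every pair is visited exactly once
    # and self-correlations are skipped.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iat[i, j]
            # NaN (e.g. from a constant column) fails this comparison and is skipped.
            if abs(r) > threshold:
                rows.append((cols[i], cols[j], float(r)))
    pairs = pd.DataFrame(rows, columns=["feature_a", "feature_b", "r"])
    # Strongest relationships first, regardless of sign.
    order = pairs["r"].abs().sort_values(ascending=False).index
    return pairs.loc[order].reset_index(drop=True)


# Tiny demo frame (made-up data): b and d are linear functions of a, c is not.
a = np.arange(10.0)
demo = pd.DataFrame({
    "a": a,
    "b": 2 * a + 1,
    "c": [1.0, -1.0] * 5,
    "d": -a,
})
demo_pairs = find_correlated_pairs(demo)
print(demo_pairs)
```

With several hundred engineered features the pair count grows quickly, as the output above shows, so it usually helps to sort by |r| and resolve the near-duplicate (|r| close to 1) pairs first.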
💡 For each pair, keep the feature with:
- Stronger business meaning
- Higher target correlation
- Fewer missing values
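Two of the three criteria above can be checked mechanically. The sketch below is one possible way to apply them, under stated assumptions: `choose_feature_to_drop` and `select_drops` are hypothetical helpers (not part of `customer_retention`), the target is assumed to be a numeric column, and "stronger business meaning" is left to manual review because it cannot be computed from the data.

```python
import pandas as pd


def choose_feature_to_drop(df: pd.DataFrame, target: str, feature_a: str, feature_b: str) -> str:
    """Return the weaker feature of a correlated pair (hypothetical helper).

    Keeps the feature with the higher absolute target correlation and,
    on a tie, the one with fewer missing values.
    """
    def score(col: str) -> tuple:
        target_corr = df[col].corr(df[target])
        # A constant column yields NaN; treat it as having no signal.
        target_corr = 0.0 if pd.isna(target_corr) else abs(target_corr)
        missing_share = df[col].isna().mean()
        return (target_corr, -missing_share)

    # The lower-scoring feature is the one to drop; exact ties drop feature_b.
    return feature_a if score(feature_a) < score(feature_b) else feature_b


def select_drops(df: pd.DataFrame, target: str, pairs) -> set:
    """Greedily resolve (feature_a, feature_b) pairs into a set of drops."""
    dropped = set()
    for a, b in pairs:
        if a in dropped or b in dropped:
            continue  # an earlier drop already breaks this pair
        dropped.add(choose_feature_to_drop(df, target, a, b))
    return dropped


# Made-up data: x1 tracks the target more closely than x2.
demo = pd.DataFrame({
    "y": [0, 0, 0, 0, 1, 1, 1, 1],
    "x1": [1, 2, 1, 2, 8, 9, 8, 9],
    "x2": [1, 2, 3, 4, 5, 6, 7, 8],
    "x4": [8, 7, 6, 5, 4, 3, 2, 1],
})
to_drop = select_drops(demo, "y", [("x1", "x2"), ("x2", "x4")])
print(to_drop)
```

The greedy pass is order-dependent: processing the pairs in a different order can produce a different drop set, so treat the result as a starting point for review rather than a final feature list.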
----------------------------------------------------------------------
DETAILED RECOMMENDATIONS:
🟡 Remove multicollinear feature
event_count_180d and event_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
event_count_180d and opened_count_180d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
event_count_180d and clicked_count_180d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
event_count_180d and send_hour_sum_180d are highly correlated (r=0.98)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
event_count_180d and send_hour_count_180d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
event_count_180d and bounced_count_180d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
event_count_180d and opened_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
event_count_180d and clicked_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
event_count_180d and send_hour_sum_365d are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
event_count_180d and send_hour_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
event_count_180d and bounced_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
event_count_365d and opened_count_180d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
event_count_365d and clicked_count_180d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
event_count_365d and send_hour_sum_180d are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
event_count_365d and send_hour_count_180d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
event_count_365d and bounced_count_180d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
event_count_365d and opened_count_365d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
event_count_365d and clicked_count_365d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
event_count_365d and send_hour_sum_365d are highly correlated (r=0.98)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
event_count_365d and send_hour_count_365d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
event_count_365d and bounced_count_365d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
π‘ Remove multicollinear feature
event_count_all_time and opened_sum_all_time are highly correlated (r=0.80)
β Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
π΄ Remove multicollinear feature
event_count_all_time and opened_count_all_time are highly correlated (r=1.00)
β Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
π΄ Remove multicollinear feature
event_count_all_time and clicked_count_all_time are highly correlated (r=1.00)
β Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
π΄ Remove multicollinear feature
event_count_all_time and send_hour_sum_all_time are highly correlated (r=0.99)
β Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
π΄ Remove multicollinear feature
event_count_all_time and send_hour_count_all_time are highly correlated (r=1.00)
β Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
π΄ Remove multicollinear feature
event_count_all_time and bounced_count_all_time are highly correlated (r=1.00)
β Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
π‘ Remove multicollinear feature
event_count_all_time and time_to_open_hours_count_all_time are highly correlated (r=0.80)
β Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
π‘ Remove multicollinear feature
event_count_all_time and send_hour_beginning are highly correlated (r=0.84)
β Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
π‘ Remove multicollinear feature
event_count_all_time and send_hour_end are highly correlated (r=0.84)
β Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
π‘ Remove multicollinear feature
opened_sum_180d and opened_mean_180d are highly correlated (r=0.82)
β Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
π‘ Remove multicollinear feature
opened_sum_180d and time_to_open_hours_sum_180d are highly correlated (r=0.74)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_sum_180d and time_to_open_hours_count_180d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_sum_180d and opened_sum_365d are highly correlated (r=0.75)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_sum_180d and time_to_open_hours_count_365d are highly correlated (r=0.75)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_mean_180d and time_to_open_hours_count_180d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_mean_180d and opened_mean_365d are highly correlated (r=0.79)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_mean_180d and lag0_opened_sum are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_mean_180d and lag0_opened_mean are highly correlated (r=0.90)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_mean_180d and lag0_time_to_open_hours_count are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_mean_180d and opened_vs_cohort_mean are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_mean_180d and opened_vs_cohort_pct are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_mean_180d and opened_cohort_zscore are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_count_180d and clicked_count_180d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_count_180d and send_hour_sum_180d are highly correlated (r=0.98)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_count_180d and send_hour_count_180d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_count_180d and bounced_count_180d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_count_180d and opened_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_count_180d and clicked_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_count_180d and send_hour_sum_365d are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_count_180d and send_hour_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_count_180d and bounced_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
clicked_sum_180d and clicked_mean_180d are highly correlated (r=0.88)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_sum_180d and clicked_sum_365d are highly correlated (r=0.71)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_mean_180d and clicked_mean_365d are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_mean_180d and lag0_clicked_sum are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
clicked_mean_180d and lag0_clicked_mean are highly correlated (r=0.88)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_mean_180d and clicked_vs_cohort_mean are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_mean_180d and clicked_vs_cohort_pct are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_mean_180d and clicked_cohort_zscore are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
clicked_count_180d and send_hour_sum_180d are highly correlated (r=0.98)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
clicked_count_180d and send_hour_count_180d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
clicked_count_180d and bounced_count_180d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_count_180d and opened_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_count_180d and clicked_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_count_180d and send_hour_sum_365d are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_count_180d and send_hour_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_count_180d and bounced_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
send_hour_sum_180d and send_hour_count_180d are highly correlated (r=0.98)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
send_hour_sum_180d and bounced_count_180d are highly correlated (r=0.98)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_sum_180d and opened_count_365d are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_sum_180d and clicked_count_365d are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_sum_180d and send_hour_sum_365d are highly correlated (r=0.81)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_sum_180d and send_hour_count_365d are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_sum_180d and bounced_count_365d are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
send_hour_mean_180d and send_hour_max_180d are highly correlated (r=0.88)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_mean_180d and send_hour_mean_365d are highly correlated (r=0.81)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
send_hour_mean_180d and lag0_send_hour_mean are highly correlated (r=0.89)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
send_hour_mean_180d and lag0_send_hour_max are highly correlated (r=0.86)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_max_180d and send_hour_mean_365d are highly correlated (r=0.73)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_max_180d and send_hour_max_365d are highly correlated (r=0.75)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_max_180d and lag0_send_hour_mean are highly correlated (r=0.79)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_max_180d and lag0_send_hour_max are highly correlated (r=0.81)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
send_hour_count_180d and bounced_count_180d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_count_180d and opened_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_count_180d and clicked_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_count_180d and send_hour_sum_365d are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_count_180d and send_hour_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_count_180d and bounced_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
bounced_sum_180d and bounced_mean_180d are highly correlated (r=0.89)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_sum_180d and bounced_sum_365d are highly correlated (r=0.70)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_mean_180d and bounced_mean_365d are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_mean_180d and lag0_bounced_sum are highly correlated (r=0.85)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
bounced_mean_180d and lag0_bounced_mean are highly correlated (r=0.88)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_mean_180d and bounced_vs_cohort_mean are highly correlated (r=0.85)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_mean_180d and bounced_vs_cohort_pct are highly correlated (r=0.85)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_mean_180d and bounced_cohort_zscore are highly correlated (r=0.85)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_count_180d and opened_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_count_180d and clicked_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_count_180d and send_hour_sum_365d are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_count_180d and send_hour_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_count_180d and bounced_count_365d are highly correlated (r=0.82)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_sum_180d and time_to_open_hours_mean_180d are highly correlated (r=0.89)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_sum_180d and time_to_open_hours_max_180d are highly correlated (r=0.96)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_sum_180d and time_to_open_hours_count_180d are highly correlated (r=0.74)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_sum_180d and time_to_open_hours_sum_365d are highly correlated (r=0.71)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_mean_180d and time_to_open_hours_max_180d are highly correlated (r=0.97)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_mean_180d and time_to_open_hours_sum_365d are highly correlated (r=0.73)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_mean_180d and time_to_open_hours_mean_365d are highly correlated (r=0.94)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_mean_180d and time_to_open_hours_max_365d are highly correlated (r=0.88)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_mean_180d and lag0_time_to_open_hours_sum are highly correlated (r=0.70)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_mean_180d and lag0_time_to_open_hours_mean are highly correlated (r=0.98)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_mean_180d and lag0_time_to_open_hours_max are highly correlated (r=0.96)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_mean_180d and lag1_time_to_open_hours_mean are highly correlated (r=0.73)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_mean_180d and lag1_time_to_open_hours_max are highly correlated (r=0.73)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_mean_180d and lag2_time_to_open_hours_mean are highly correlated (r=0.94)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_mean_180d and lag2_time_to_open_hours_max are highly correlated (r=0.94)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_mean_180d and lag3_time_to_open_hours_mean are highly correlated (r=0.86)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_mean_180d and lag3_time_to_open_hours_max are highly correlated (r=0.87)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_mean_180d and time_to_open_hours_vs_cohort_mean are highly correlated (r=0.70)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_mean_180d and time_to_open_hours_vs_cohort_pct are highly correlated (r=0.70)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_mean_180d and time_to_open_hours_cohort_zscore are highly correlated (r=0.70)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_max_180d and time_to_open_hours_sum_365d are highly correlated (r=0.81)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_max_180d and time_to_open_hours_mean_365d are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_max_180d and time_to_open_hours_max_365d are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_max_180d and lag0_time_to_open_hours_sum are highly correlated (r=0.74)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_max_180d and lag0_time_to_open_hours_mean are highly correlated (r=0.95)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_max_180d and lag0_time_to_open_hours_max are highly correlated (r=0.96)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_max_180d and lag2_time_to_open_hours_mean are highly correlated (r=0.90)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_max_180d and lag2_time_to_open_hours_max are highly correlated (r=0.90)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_max_180d and lag3_time_to_open_hours_mean are highly correlated (r=0.76)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_max_180d and lag3_time_to_open_hours_max are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_max_180d and time_to_open_hours_momentum are highly correlated (r=0.72)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_max_180d and time_to_open_hours_vs_cohort_mean are highly correlated (r=0.74)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_max_180d and time_to_open_hours_vs_cohort_pct are highly correlated (r=0.74)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_max_180d and time_to_open_hours_cohort_zscore are highly correlated (r=0.74)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_count_180d and opened_sum_365d are highly correlated (r=0.75)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_count_180d and time_to_open_hours_count_365d are highly correlated (r=0.75)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_sum_365d and opened_mean_365d are highly correlated (r=0.76)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_sum_365d and time_to_open_hours_sum_365d are highly correlated (r=0.75)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_sum_365d and time_to_open_hours_count_365d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_mean_365d and time_to_open_hours_count_365d are highly correlated (r=0.76)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_mean_365d and lag0_opened_sum are highly correlated (r=0.72)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_mean_365d and lag0_opened_mean are highly correlated (r=0.76)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_mean_365d and lag0_time_to_open_hours_count are highly correlated (r=0.72)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_mean_365d and opened_vs_cohort_mean are highly correlated (r=0.72)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_mean_365d and opened_vs_cohort_pct are highly correlated (r=0.72)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_mean_365d and opened_cohort_zscore are highly correlated (r=0.72)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_count_365d and clicked_count_365d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_count_365d and send_hour_sum_365d are highly correlated (r=0.98)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_count_365d and send_hour_count_365d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_count_365d and bounced_count_365d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_sum_365d and clicked_mean_365d are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_mean_365d and lag0_clicked_sum are highly correlated (r=0.71)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_mean_365d and lag0_clicked_mean are highly correlated (r=0.74)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_mean_365d and clicked_vs_cohort_mean are highly correlated (r=0.71)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_mean_365d and clicked_vs_cohort_pct are highly correlated (r=0.71)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_mean_365d and clicked_cohort_zscore are highly correlated (r=0.71)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
clicked_count_365d and send_hour_sum_365d are highly correlated (r=0.98)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
clicked_count_365d and send_hour_count_365d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
clicked_count_365d and bounced_count_365d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
send_hour_sum_365d and send_hour_count_365d are highly correlated (r=0.98)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
send_hour_sum_365d and bounced_count_365d are highly correlated (r=0.98)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_mean_365d and send_hour_max_365d are highly correlated (r=0.81)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_mean_365d and lag0_send_hour_mean are highly correlated (r=0.78)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_mean_365d and lag0_send_hour_max are highly correlated (r=0.75)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
send_hour_count_365d and bounced_count_365d are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_sum_365d and bounced_mean_365d are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_mean_365d and lag0_bounced_sum are highly correlated (r=0.74)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_mean_365d and lag0_bounced_mean are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_mean_365d and bounced_vs_cohort_mean are highly correlated (r=0.74)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_mean_365d and bounced_vs_cohort_pct are highly correlated (r=0.74)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_mean_365d and bounced_cohort_zscore are highly correlated (r=0.74)
β Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_sum_365d and time_to_open_hours_mean_365d are highly correlated (r=0.83)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_sum_365d and time_to_open_hours_max_365d are highly correlated (r=0.94)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_sum_365d and time_to_open_hours_count_365d are highly correlated (r=0.75)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_mean_365d and time_to_open_hours_max_365d are highly correlated (r=0.95)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_mean_365d and lag0_time_to_open_hours_mean are highly correlated (r=0.94)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_mean_365d and lag0_time_to_open_hours_max are highly correlated (r=0.93)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_mean_365d and lag1_time_to_open_hours_mean are highly correlated (r=0.85)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_mean_365d and lag1_time_to_open_hours_max are highly correlated (r=0.85)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_mean_365d and lag2_time_to_open_hours_mean are highly correlated (r=0.81)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_mean_365d and lag2_time_to_open_hours_max are highly correlated (r=0.81)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_mean_365d and lag3_time_to_open_hours_mean are highly correlated (r=0.81)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_mean_365d and lag3_time_to_open_hours_max are highly correlated (r=0.81)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_max_365d and lag0_time_to_open_hours_mean are highly correlated (r=0.89)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_max_365d and lag0_time_to_open_hours_max are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_max_365d and lag1_time_to_open_hours_mean are highly correlated (r=0.72)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_max_365d and lag1_time_to_open_hours_max are highly correlated (r=0.72)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_max_365d and lag2_time_to_open_hours_mean are highly correlated (r=0.71)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_max_365d and lag2_time_to_open_hours_max are highly correlated (r=0.71)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_max_365d and lag3_time_to_open_hours_mean are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_max_365d and lag3_time_to_open_hours_max are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_sum_all_time and opened_count_all_time are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_sum_all_time and clicked_sum_all_time are highly correlated (r=0.74)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_sum_all_time and clicked_count_all_time are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_sum_all_time and send_hour_sum_all_time are highly correlated (r=0.79)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_sum_all_time and send_hour_count_all_time are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_sum_all_time and bounced_count_all_time are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_sum_all_time and time_to_open_hours_sum_all_time are highly correlated (r=0.86)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_sum_all_time and time_to_open_hours_count_all_time are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_sum_all_time and opened_beginning are highly correlated (r=0.76)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_sum_all_time and opened_end are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_count_all_time and clicked_count_all_time are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_count_all_time and send_hour_sum_all_time are highly correlated (r=0.99)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_count_all_time and send_hour_count_all_time are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_count_all_time and bounced_count_all_time are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_count_all_time and time_to_open_hours_count_all_time are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_count_all_time and send_hour_beginning are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_count_all_time and send_hour_end are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_sum_all_time and clicked_mean_all_time are highly correlated (r=0.78)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_sum_all_time and time_to_open_hours_count_all_time are highly correlated (r=0.74)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
clicked_count_all_time and send_hour_sum_all_time are highly correlated (r=0.99)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
clicked_count_all_time and send_hour_count_all_time are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
clicked_count_all_time and bounced_count_all_time are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_count_all_time and time_to_open_hours_count_all_time are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_count_all_time and send_hour_beginning are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_count_all_time and send_hour_end are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
send_hour_sum_all_time and send_hour_count_all_time are highly correlated (r=0.99)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
send_hour_sum_all_time and bounced_count_all_time are highly correlated (r=0.99)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_sum_all_time and time_to_open_hours_count_all_time are highly correlated (r=0.79)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
send_hour_sum_all_time and send_hour_beginning are highly correlated (r=0.85)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
send_hour_sum_all_time and send_hour_end are highly correlated (r=0.85)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
send_hour_count_all_time and bounced_count_all_time are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_count_all_time and time_to_open_hours_count_all_time are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_count_all_time and send_hour_beginning are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_count_all_time and send_hour_end are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_sum_all_time and bounced_mean_all_time are highly correlated (r=0.71)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_count_all_time and time_to_open_hours_count_all_time are highly correlated (r=0.80)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_count_all_time and send_hour_beginning are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_count_all_time and send_hour_end are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_sum_all_time and time_to_open_hours_max_all_time are highly correlated (r=0.75)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_sum_all_time and time_to_open_hours_count_all_time are highly correlated (r=0.86)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_sum_all_time and time_to_open_hours_beginning are highly correlated (r=0.71)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_sum_all_time and time_to_open_hours_end are highly correlated (r=0.71)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_mean_all_time and time_to_open_hours_max_all_time are highly correlated (r=0.74)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_count_all_time and opened_beginning are highly correlated (r=0.76)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_count_all_time and opened_end are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
days_since_last_event_x and days_since_first_event_y are highly correlated (r=-0.99)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
days_since_last_event_x and active_span_days are highly correlated (r=-0.99)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_opened_sum and lag0_opened_mean are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_opened_sum and lag0_time_to_open_hours_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag0_opened_sum and opened_velocity are highly correlated (r=0.70)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_opened_sum and opened_velocity_pct are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag0_opened_sum and opened_momentum are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_opened_sum and opened_vs_cohort_mean are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_opened_sum and opened_vs_cohort_pct are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_opened_sum and opened_cohort_zscore are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_opened_mean and lag0_time_to_open_hours_count are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag0_opened_mean and opened_velocity_pct are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_opened_mean and opened_vs_cohort_mean are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_opened_mean and opened_vs_cohort_pct are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_opened_mean and opened_cohort_zscore are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_opened_count and lag0_clicked_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_opened_count and lag0_send_hour_sum are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_opened_count and lag0_send_hour_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_opened_count and lag0_bounced_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_opened_count and send_hour_vs_cohort_mean are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_opened_count and send_hour_vs_cohort_pct are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_opened_count and send_hour_cohort_zscore are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_clicked_sum and lag0_clicked_mean are highly correlated (r=0.92)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_clicked_sum and clicked_vs_cohort_mean are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_clicked_sum and clicked_vs_cohort_pct are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_clicked_sum and clicked_cohort_zscore are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_clicked_mean and clicked_vs_cohort_mean are highly correlated (r=0.92)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_clicked_mean and clicked_vs_cohort_pct are highly correlated (r=0.92)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_clicked_mean and clicked_cohort_zscore are highly correlated (r=0.92)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_clicked_count and lag0_send_hour_sum are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_clicked_count and lag0_send_hour_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_clicked_count and lag0_bounced_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_clicked_count and send_hour_vs_cohort_mean are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_clicked_count and send_hour_vs_cohort_pct are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_clicked_count and send_hour_cohort_zscore are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_send_hour_sum and lag0_send_hour_count are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_send_hour_sum and lag0_bounced_count are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag0_send_hour_sum and send_hour_velocity are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_send_hour_sum and send_hour_vs_cohort_mean are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_send_hour_sum and send_hour_vs_cohort_pct are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_send_hour_sum and send_hour_cohort_zscore are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_send_hour_mean and lag0_send_hour_max are highly correlated (r=0.95)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_send_hour_count and lag0_bounced_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_send_hour_count and send_hour_vs_cohort_mean are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_send_hour_count and send_hour_vs_cohort_pct are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_send_hour_count and send_hour_cohort_zscore are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_bounced_sum and lag0_bounced_mean are highly correlated (r=0.94)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag0_bounced_sum and bounced_velocity are highly correlated (r=0.79)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_bounced_sum and bounced_vs_cohort_mean are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_bounced_sum and bounced_vs_cohort_pct are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_bounced_sum and bounced_cohort_zscore are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag0_bounced_mean and bounced_velocity are highly correlated (r=0.72)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_bounced_mean and bounced_vs_cohort_mean are highly correlated (r=0.94)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_bounced_mean and bounced_vs_cohort_pct are highly correlated (r=0.94)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_bounced_mean and bounced_cohort_zscore are highly correlated (r=0.94)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_bounced_count and send_hour_vs_cohort_mean are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_bounced_count and send_hour_vs_cohort_pct are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_bounced_count and send_hour_cohort_zscore are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_time_to_open_hours_sum and lag0_time_to_open_hours_mean are highly correlated (r=0.97)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_time_to_open_hours_sum and lag0_time_to_open_hours_max are highly correlated (r=0.99)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag0_time_to_open_hours_sum and opened_velocity_pct are highly correlated (r=0.70)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag0_time_to_open_hours_sum and time_to_open_hours_velocity are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_time_to_open_hours_sum and time_to_open_hours_momentum are highly correlated (r=0.89)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_time_to_open_hours_sum and time_to_open_hours_vs_cohort_mean are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_time_to_open_hours_sum and time_to_open_hours_vs_cohort_pct are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_time_to_open_hours_sum and time_to_open_hours_cohort_zscore are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_time_to_open_hours_mean and lag0_time_to_open_hours_max are highly correlated (r=0.99)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_time_to_open_hours_mean and time_to_open_hours_velocity are highly correlated (r=0.86)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag0_time_to_open_hours_mean and time_to_open_hours_momentum are highly correlated (r=0.83)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_time_to_open_hours_mean and time_to_open_hours_vs_cohort_mean are highly correlated (r=0.97)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_time_to_open_hours_mean and time_to_open_hours_vs_cohort_pct are highly correlated (r=0.97)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_time_to_open_hours_mean and time_to_open_hours_cohort_zscore are highly correlated (r=0.97)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag0_time_to_open_hours_count and opened_velocity are highly correlated (r=0.70)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_time_to_open_hours_count and opened_velocity_pct are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag0_time_to_open_hours_count and opened_momentum are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_time_to_open_hours_count and opened_vs_cohort_mean are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_time_to_open_hours_count and opened_vs_cohort_pct are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_time_to_open_hours_count and opened_cohort_zscore are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_time_to_open_hours_max and time_to_open_hours_velocity are highly correlated (r=0.90)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_time_to_open_hours_max and time_to_open_hours_momentum are highly correlated (r=0.89)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag0_time_to_open_hours_max and time_to_open_hours_vs_cohort_mean are highly correlated (r=0.99)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
π΄ Remove multicollinear feature
lag0_time_to_open_hours_max and time_to_open_hours_vs_cohort_pct are highly correlated (r=0.99)
β Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
π΄ Remove multicollinear feature
lag0_time_to_open_hours_max and time_to_open_hours_cohort_zscore are highly correlated (r=0.99)
β Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
π΄ Remove multicollinear feature
lag1_opened_sum and lag1_opened_mean are highly correlated (r=0.92)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag1_opened_sum and lag1_time_to_open_hours_sum are highly correlated (r=0.72)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag1_opened_sum and lag1_time_to_open_hours_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag1_opened_sum and opened_acceleration are highly correlated (r=-0.79)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag1_opened_mean and lag1_time_to_open_hours_count are highly correlated (r=0.92)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag1_opened_count and lag1_clicked_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag1_opened_count and lag1_send_hour_sum are highly correlated (r=0.89)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag1_opened_count and lag1_send_hour_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag1_opened_count and lag1_bounced_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag1_clicked_sum and lag1_clicked_mean are highly correlated (r=0.95)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag1_clicked_sum and clicked_velocity are highly correlated (r=-0.72)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag1_clicked_sum and clicked_acceleration are highly correlated (r=-0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag1_clicked_mean and clicked_acceleration are highly correlated (r=-0.76)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag1_clicked_count and lag1_send_hour_sum are highly correlated (r=0.89)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag1_clicked_count and lag1_send_hour_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag1_clicked_count and lag1_bounced_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag1_send_hour_sum and lag1_send_hour_count are highly correlated (r=0.89)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag1_send_hour_sum and lag1_bounced_count are highly correlated (r=0.89)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag1_send_hour_mean and lag1_send_hour_max are highly correlated (r=0.96)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag1_send_hour_count and lag1_bounced_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag1_time_to_open_hours_sum and lag1_time_to_open_hours_mean are highly correlated (r=0.98)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag1_time_to_open_hours_sum and lag1_time_to_open_hours_count are highly correlated (r=0.72)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag1_time_to_open_hours_sum and lag1_time_to_open_hours_max are highly correlated (r=0.99)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag1_time_to_open_hours_sum and time_to_open_hours_acceleration are highly correlated (r=-0.78)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag1_time_to_open_hours_mean and lag1_time_to_open_hours_max are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag1_time_to_open_hours_mean and time_to_open_hours_acceleration are highly correlated (r=-0.78)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag1_time_to_open_hours_count and opened_acceleration are highly correlated (r=-0.79)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag1_time_to_open_hours_max and time_to_open_hours_acceleration are highly correlated (r=-0.81)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag2_opened_sum and lag2_opened_mean are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag2_opened_sum and lag2_time_to_open_hours_sum are highly correlated (r=0.74)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag2_opened_sum and lag2_time_to_open_hours_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag2_opened_mean and lag2_time_to_open_hours_count are highly correlated (r=0.91)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag2_opened_count and lag2_clicked_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag2_opened_count and lag2_send_hour_sum are highly correlated (r=0.88)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag2_opened_count and lag2_send_hour_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag2_opened_count and lag2_bounced_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag2_clicked_sum and lag2_clicked_mean are highly correlated (r=0.92)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag2_clicked_count and lag2_send_hour_sum are highly correlated (r=0.88)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag2_clicked_count and lag2_send_hour_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag2_clicked_count and lag2_bounced_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag2_send_hour_sum and lag2_send_hour_count are highly correlated (r=0.88)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag2_send_hour_sum and lag2_bounced_count are highly correlated (r=0.88)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag2_send_hour_mean and lag2_send_hour_max are highly correlated (r=0.96)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag2_send_hour_count and lag2_bounced_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag2_time_to_open_hours_sum and lag2_time_to_open_hours_mean are highly correlated (r=0.94)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
lag2_time_to_open_hours_sum and lag2_time_to_open_hours_count are highly correlated (r=0.74)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag2_time_to_open_hours_sum and lag2_time_to_open_hours_max are highly correlated (r=0.99)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag2_time_to_open_hours_mean and lag2_time_to_open_hours_max are highly correlated (r=0.98)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag3_opened_sum and lag3_opened_mean are highly correlated (r=0.94)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag3_opened_sum and lag3_time_to_open_hours_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag3_opened_mean and lag3_time_to_open_hours_count are highly correlated (r=0.94)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag3_opened_count and lag3_clicked_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag3_opened_count and lag3_send_hour_sum are highly correlated (r=0.87)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag3_opened_count and lag3_send_hour_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag3_opened_count and lag3_bounced_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag3_clicked_count and lag3_send_hour_sum are highly correlated (r=0.87)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag3_clicked_count and lag3_send_hour_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag3_clicked_count and lag3_bounced_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag3_send_hour_sum and lag3_send_hour_count are highly correlated (r=0.87)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag3_send_hour_sum and lag3_bounced_count are highly correlated (r=0.87)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag3_send_hour_mean and lag3_send_hour_max are highly correlated (r=0.96)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag3_send_hour_count and lag3_bounced_count are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag3_time_to_open_hours_sum and lag3_time_to_open_hours_mean are highly correlated (r=0.99)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag3_time_to_open_hours_sum and lag3_time_to_open_hours_max are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
lag3_time_to_open_hours_mean and lag3_time_to_open_hours_max are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_velocity and opened_velocity_pct are highly correlated (r=0.92)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_velocity and opened_acceleration are highly correlated (r=0.84)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_velocity and opened_vs_cohort_mean are highly correlated (r=0.70)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_velocity and opened_vs_cohort_pct are highly correlated (r=0.70)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_velocity and opened_cohort_zscore are highly correlated (r=0.70)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_velocity_pct and opened_vs_cohort_mean are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_velocity_pct and opened_vs_cohort_pct are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_velocity_pct and opened_cohort_zscore are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_velocity_pct and time_to_open_hours_vs_cohort_mean are highly correlated (r=0.70)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_velocity_pct and time_to_open_hours_vs_cohort_pct are highly correlated (r=0.70)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_velocity_pct and time_to_open_hours_cohort_zscore are highly correlated (r=0.70)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
clicked_velocity and clicked_acceleration are highly correlated (r=0.85)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
send_hour_velocity and send_hour_velocity_pct are highly correlated (r=0.89)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_velocity and send_hour_acceleration are highly correlated (r=0.79)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_velocity and send_hour_vs_cohort_mean are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_velocity and send_hour_vs_cohort_pct are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_velocity and send_hour_cohort_zscore are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
send_hour_velocity_pct and send_hour_acceleration are highly correlated (r=0.76)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
bounced_velocity and bounced_acceleration are highly correlated (r=0.87)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_velocity and bounced_vs_cohort_mean are highly correlated (r=0.79)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_velocity and bounced_vs_cohort_pct are highly correlated (r=0.79)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
bounced_velocity and bounced_cohort_zscore are highly correlated (r=0.79)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_velocity and time_to_open_hours_acceleration are highly correlated (r=0.85)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_velocity and time_to_open_hours_momentum are highly correlated (r=0.72)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_velocity and time_to_open_hours_vs_cohort_mean are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_velocity and time_to_open_hours_vs_cohort_pct are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
time_to_open_hours_velocity and time_to_open_hours_cohort_zscore are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_acceleration and time_to_open_hours_acceleration are highly correlated (r=0.71)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_momentum and opened_vs_cohort_mean are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_momentum and opened_vs_cohort_pct are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_momentum and opened_cohort_zscore are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_momentum and time_to_open_hours_vs_cohort_mean are highly correlated (r=0.89)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_momentum and time_to_open_hours_vs_cohort_pct are highly correlated (r=0.89)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_momentum and time_to_open_hours_cohort_zscore are highly correlated (r=0.89)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_beginning and time_to_open_hours_beginning are highly correlated (r=0.77)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_end and opened_trend_ratio are highly correlated (r=0.73)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🟡 Remove multicollinear feature
opened_end and time_to_open_hours_end are highly correlated (r=0.79)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
clicked_end and clicked_trend_ratio are highly correlated (r=0.92)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
bounced_end and bounced_trend_ratio are highly correlated (r=0.99)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
days_since_first_event_y and active_span_days are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
inter_event_gap_std and inter_event_gap_max are highly correlated (r=0.89)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_vs_cohort_mean and opened_vs_cohort_pct are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_vs_cohort_mean and opened_cohort_zscore are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
opened_vs_cohort_pct and opened_cohort_zscore are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
clicked_vs_cohort_mean and clicked_vs_cohort_pct are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
clicked_vs_cohort_mean and clicked_cohort_zscore are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
clicked_vs_cohort_pct and clicked_cohort_zscore are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
send_hour_vs_cohort_mean and send_hour_vs_cohort_pct are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
send_hour_vs_cohort_mean and send_hour_cohort_zscore are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
send_hour_vs_cohort_pct and send_hour_cohort_zscore are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
bounced_vs_cohort_mean and bounced_vs_cohort_pct are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
bounced_vs_cohort_mean and bounced_cohort_zscore are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
bounced_vs_cohort_pct and bounced_cohort_zscore are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_vs_cohort_mean and time_to_open_hours_vs_cohort_pct are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_vs_cohort_mean and time_to_open_hours_cohort_zscore are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Remove multicollinear feature
time_to_open_hours_vs_cohort_pct and time_to_open_hours_cohort_zscore are highly correlated (r=1.00)
→ Consider dropping one of these features. Keep the one with stronger business meaning or higher target correlation.
🔴 Prioritize strong predictors
Top predictive features: days_since_last_event_x, days_since_first_event_y, active_span_days
→ Ensure these features are included in your model and check for data quality issues.
🟢 Consider removing weak predictors
Features with low predictive power: send_hour_mean_180d, send_hour_max_180d, bounced_sum_180d, bounced_mean_180d, time_to_open_hours_mean_180d
→ These features may add noise. Consider removing or combining with other features.
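The pairwise recommendations above each say to keep the feature with the higher target correlation. One way to apply that rule across the whole list at once is a greedy pass, sketched below. This is an illustrative sketch, not the method the `customer_retention` package uses: `prune_correlated` and the demo column names are hypothetical, and the 0.7 cut-off is taken from the table at the start of this chapter.

```python
import numpy as np
import pandas as pd


def prune_correlated(df, target, threshold=0.7):
    """Greedily keep one feature from each highly correlated group.

    Features are visited from strongest to weakest absolute target
    correlation; a feature is dropped if it correlates at or above
    `threshold` (in absolute value) with a feature already kept.
    """
    features = [c for c in df.columns if c != target]
    corr = df[features].corr().abs()
    target_corr = df[features].corrwith(df[target]).abs().fillna(0.0)

    kept, dropped = [], []
    for col in target_corr.sort_values(ascending=False).index:
        if any(corr.loc[col, k] >= threshold for k in kept):
            dropped.append(col)
        else:
            kept.append(col)
    return kept, sorted(dropped)


# Hypothetical demo data: two near-duplicate features and one independent one.
rng = np.random.default_rng(0)
n = 500
base = rng.normal(size=n)
demo = pd.DataFrame({
    "opened_sum": base,
    "opened_mean": base * 0.5 + rng.normal(scale=0.01, size=n),
    "send_hour": rng.normal(size=n),
})
demo["target"] = (base + rng.normal(scale=0.5, size=n) > 0).astype(int)

kept, dropped = prune_correlated(demo, target="target")
print("kept:", kept)
print("dropped:", dropped)
```

Target correlation is only one of the two criteria named in the recommendations. Where business meaning matters more, review the dropped list by hand rather than applying it blindly.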
5.9.2 Stratification Recommendations
What these recommendations tell you:
- How to split your data for training and testing
- Which segments require special attention in sampling
- High-risk segments that need adequate representation
⚠️ Why This Matters:
- Random splits can under-represent rare segments
- Small high-risk segments may end up missing from the test set entirely
- Model evaluation can be biased when the train and test sets do not reflect the same segment mix
Implementation:
from sklearn.model_selection import train_test_split
# Stratified split by target: keeps the class balance equal in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
# Multi-column stratification (for categorical segments):
# build one combined key, then pass it as the `stratify` argument
df['stratify_col'] = df['target'].astype(str) + '_' + df['segment'].astype(str)
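A combined key like `stratify_col` can create very small strata, and `train_test_split` raises an error when any stratum has fewer than two rows. The sketch below shows one possible way to handle this, by folding rare combinations into the most common combination of the same target class before splitting. The helper `stratified_split`, its `min_count` parameter, and the demo data are assumptions made for illustration; they are not part of the notebook or of scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def stratified_split(df, target="target", segment="segment",
                     test_size=0.2, min_count=2, random_state=42):
    """Split on a target+segment key, merging strata that are too small."""
    t = df[target].astype(str)
    key = t + "_" + df[segment].astype(str)

    # Flag rows whose combined key occurs fewer than `min_count` times.
    rare = key.map(key.value_counts()) < min_count

    # For each target class, find its most common non-rare key and
    # reassign the rare rows of that class to it.
    # (If a class has no non-rare key at all, this sketch does not handle it.)
    fallback = key[~rare].groupby(t[~rare]).agg(lambda s: s.value_counts().idxmax())
    key = key.where(~rare, t.map(fallback))

    return train_test_split(
        df, test_size=test_size, stratify=key, random_state=random_state
    )


# Hypothetical demo: two common segments plus a single-row segment "C".
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "segment": rng.choice(["A", "B"], size=200),
    "target": rng.integers(0, 2, size=200),
})
demo.loc[0, "segment"] = "C"

train_df, test_df = stratified_split(demo)
print(len(train_df), len(test_df))
print(train_df["target"].mean(), test_df["target"].mean())
```

Merging rare strata trades exact segment balance for a split that runs. If a rare segment is itself of interest, collecting more data for it or evaluating it separately may be a better choice.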
# Stratification Recommendations
strat_recs = grouped_recs.get(RecommendationCategory.STRATIFICATION, [])
print("=" * 70)
print("STRATIFICATION (Train/Test Split Strategy)")
print("=" * 70)
# High-risk segments
if analysis_summary.high_risk_segments:
    print("\n🎯 HIGH-RISK SEGMENTS (ensure representation in training data):")
    risk_df = pd.DataFrame(analysis_summary.high_risk_segments)
    risk_df["retention_rate"] = risk_df["retention_rate"].apply(lambda x: f"{x:.1%}")
    risk_df["lift"] = risk_df["lift"].apply(lambda x: f"{x:.2f}x")
    display(risk_df[["feature", "segment", "count", "retention_rate", "lift"]])
    print("\n 💡 These segments have below-average retention.")
    print(" → Ensure they're adequately represented in both train and test sets")
    print(" → Consider oversampling or class weights in modeling")
# Display all stratification recommendations
if strat_recs:
    print("\n" + "-" * 70)
    print("STRATIFICATION RECOMMENDATIONS:")
    for rec in strat_recs:
        priority_icon = "🔴" if rec.priority == "high" else "🟡" if rec.priority == "medium" else "🟢"
        print(f"\n{priority_icon} {rec.title}")
        print(f" {rec.description}")
        print(f" → {rec.action}")
else:
    print("\n✅ No special stratification requirements detected.")
    print(" Standard stratified split by target variable is sufficient.")
======================================================================
STRATIFICATION (Train/Test Split Strategy)
======================================================================
🎯 HIGH-RISK SEGMENTS (ensure representation in training data):
| | feature | segment | count | retention_rate | lift |
|---|---|---|---|---|---|
| 0 | lifecycle_quadrant | Occasional & Loyal | 1683 | 7.6% | 0.17x |
| 1 | lifecycle_quadrant | Steady & Loyal | 820 | 10.4% | 0.23x |
| 2 | recency_bucket | 0-7d | 123 | 4.9% | 0.11x |
| 3 | recency_bucket | 31-90d | 725 | 5.9% | 0.13x |
| 4 | recency_bucket | 8-30d | 364 | 2.2% | 0.05x |
| 5 | recency_bucket | 91-180d | 702 | 7.0% | 0.16x |
💡 These segments have below-average retention.
→ Ensure they're adequately represented in both train and test sets
→ Consider oversampling or class weights in modeling
----------------------------------------------------------------------
STRATIFICATION RECOMMENDATIONS:
🔴 Stratify by lifecycle_quadrant
Significant variation in retention rates across lifecycle_quadrant categories (spread: 75.1%)
→ Use stratified sampling by lifecycle_quadrant in train/test split to ensure all segments are represented.
🔴 Stratify by recency_bucket
Significant variation in retention rates across recency_bucket categories (spread: 66.6%)
→ Use stratified sampling by recency_bucket in train/test split to ensure all segments are represented.
🔴 Monitor high-risk segments
Segments with below-average retention: Steady & Loyal, Occasional & Loyal, 0-7d
→ Target these segments for intervention campaigns and ensure adequate representation in training data.
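The lift column can be read as the segment's retention rate divided by the overall retention rate, which is consistent with the values in the table (each row implies an overall rate of roughly 44-45%). The sketch below assumes that definition; the helper `segment_lift`, the toy data, and the `181d+` bucket label are hypothetical.

```python
import pandas as pd


def segment_lift(df, feature, target):
    """Retention rate and lift per category of `feature`.

    Lift is assumed to be segment retention rate / overall retention rate.
    """
    overall = df[target].mean()
    out = (
        df.groupby(feature)[target]
        .agg(count="size", retention_rate="mean")
        .reset_index()
    )
    out["lift"] = out["retention_rate"] / overall
    return out.sort_values("lift")


# Toy data: overall retention is 4/8 = 0.5
demo = pd.DataFrame({
    "recency_bucket": ["0-7d"] * 4 + ["181d+"] * 4,
    "retained": [0, 0, 0, 1, 1, 1, 1, 0],
})

lift_table = segment_lift(demo, "recency_bucket", "retained")
print(lift_table)
```

A lift below 1.0 marks a segment that retains worse than average, which is how the high-risk segments above were flagged.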
5.9.3 Model Selection Recommendations
What these recommendations tell you:
- Which model types are well-suited for your data characteristics
- Linear vs non-linear based on relationship patterns
- Ensemble considerations based on feature interactions
Model Selection Guide Based on Data Characteristics:
| Data Characteristic | Recommended Models | Reason |
|---|---|---|
| Strong linear relationships | Logistic Regression, Linear SVM | Interpretable, fast, less overfit risk |
| Non-linear patterns | Random Forest, XGBoost, LightGBM | Capture complex interactions |
| High multicollinearity | Tree-based models | Robust to correlated features |
| Many categorical features | CatBoost, LightGBM | Native categorical handling |
| Imbalanced classes | Any with class_weight='balanced' | Adjust for minority class |
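The guide can be put into practice by cross-validating a linear baseline against a tree ensemble on the same folds. This is a minimal sketch on synthetic, imbalanced data from `make_classification`; the model settings and the ROC AUC metric are illustrative choices rather than the notebook's own configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data (about 80% / 20%)
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)

models = {
    # Linear baseline: scaled inputs, class weights for the minority class
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    ),
    # Tree ensemble: no scaling needed, tolerant of correlated features
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    ),
}

# Same stratified folds for both models so the scores are comparable
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: mean ROC AUC = {score:.3f}")
```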
Show/Hide Code
# Model Selection Recommendations
model_recs = grouped_recs.get(RecommendationCategory.MODEL_SELECTION, [])
print("=" * 70)
print("MODEL SELECTION")
print("=" * 70)

if model_recs:
    for rec in model_recs:
        priority_icon = "🔴" if rec.priority == "high" else "🟡" if rec.priority == "medium" else "🟢"
        print(f"\n{priority_icon} {rec.title}")
        print(f" {rec.description}")
        print(f" → {rec.action}")

# Summary recommendations based on data characteristics
print("\n" + "-" * 70)
print("RECOMMENDED MODELING APPROACH:")
has_multicollinearity = len(analysis_summary.multicollinear_pairs) > 0
has_strong_linear = any(abs(p.get("effect_size", 0)) >= 0.5 for p in analysis_summary.strong_predictors)
has_categoricals = len(categorical_features) > 0

if has_strong_linear and not has_multicollinearity:
    print("\n✅ RECOMMENDED: Start with Logistic Regression")
    print(" • Strong linear relationships detected")
    print(" • Interpretable coefficients for business insights")
    print(" • Fast training and inference")
    print(" • Then compare with tree-based ensemble for potential improvement")
elif has_multicollinearity:
    print("\n✅ RECOMMENDED: Start with Random Forest or XGBoost")
    print(" • Multicollinearity present - tree models handle it naturally")
    print(" • Can keep all features without VIF analysis")
    print(" • Use feature importance to understand contributions")
else:
    print("\n✅ RECOMMENDED: Compare Linear and Tree-Based Models")
    print(" • No clear linear dominance - test both approaches")
    print(" • Logistic Regression for interpretability baseline")
    print(" • Random Forest/XGBoost for potential accuracy gain")

if has_categoricals:
    print("\n💡 CATEGORICAL HANDLING:")
    print(" • For tree models: Consider CatBoost or LightGBM with native categorical support")
    print(" • For linear models: Use target encoding for high-cardinality features")
======================================================================
MODEL SELECTION
======================================================================
🟡 Consider tree-based models for multicollinearity
Found 442 highly correlated feature pairs
→ Tree-based models (Random Forest, XGBoost) are robust to multicollinearity. For linear models, remove redundant features first.
🟡 Linear models may perform well
Strong linear relationships detected (avg effect size: 0.99)
→ Start with Logistic Regression as baseline. Clear feature-target relationships suggest interpretable models may work well.
🟡 Categorical features are predictive
Strong categorical associations: lifecycle_quadrant, recency_bucket
→ Use target encoding for tree-based models or one-hot encoding for linear models. Consider CatBoost for native categorical handling.
----------------------------------------------------------------------
RECOMMENDED MODELING APPROACH:
✅ RECOMMENDED: Start with Random Forest or XGBoost
• Multicollinearity present - tree models handle it naturally
• Can keep all features without VIF analysis
• Use feature importance to understand contributions
💡 CATEGORICAL HANDLING:
• For tree models: Consider CatBoost or LightGBM with native categorical support
• For linear models: Use target encoding for high-cardinality features
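For the linear-model path, one-hot encoding of a low-cardinality categorical can be wired into a scikit-learn pipeline so the encoding is learned only from training data. The sketch below uses toy data; the column names `event_count` and `retained` and the three bucket labels are placeholders, not the notebook's actual schema.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data: 30 rows, 3 categories cycling in order
buckets = ["0-7d", "8-30d", "31-90d"]
data = pd.DataFrame({
    "recency_bucket": [buckets[i % 3] for i in range(30)],
    "event_count": list(range(30)),
    "retained": [1 if i % 3 == 0 else 0 for i in range(30)],
})
feature_cols = ["recency_bucket", "event_count"]

preprocess = ColumnTransformer([
    # One-hot encode categoricals; unseen categories at predict time become all zeros
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["recency_bucket"]),
    # Scale numeric inputs so coefficients are comparable
    ("num", StandardScaler(), ["event_count"]),
])
model = Pipeline([
    ("pre", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

model.fit(data[feature_cols], data["retained"])
proba = model.predict_proba(data[feature_cols])[:, 1]
n_columns = model.named_steps["pre"].transform(data[feature_cols]).shape[1]
print(f"Encoded feature columns: {n_columns}")
```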
5.9.4 Feature Engineering Recommendations
What these recommendations tell you:
- Interaction features to create based on correlation patterns
- Ratio features that may capture relative relationships
- Polynomial features for non-linear patterns
Common Feature Engineering Patterns:
| Pattern Found | Feature to Create | Example |
|---|---|---|
| Moderate correlation | Ratio feature | feature_a / feature_b |
| Both features predictive | Interaction term | feature_a * feature_b |
| Curved scatter pattern | Polynomial | feature_a ** 2 |
| Related semantics | Difference | total_orders - returned_orders |
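Each pattern in the table maps to a single pandas expression. A minimal sketch, assuming hypothetical columns `total_orders`, `returned_orders`, and `sessions`:

```python
import numpy as np
import pandas as pd

orders = pd.DataFrame({
    "total_orders": [10, 4, 0, 25],
    "returned_orders": [1, 0, 0, 5],
    "sessions": [20, 8, 3, 50],
})

eng = orders.copy()
# Ratio feature: replace a zero denominator with NaN to avoid inf values
eng["orders_per_session"] = orders["total_orders"] / orders["sessions"].replace(0, np.nan)
# Interaction term: product of two features
eng["orders_x_sessions"] = orders["total_orders"] * orders["sessions"]
# Polynomial feature: squared term for a curved relationship
eng["sessions_sq"] = orders["sessions"] ** 2
# Difference feature: semantically related counts
eng["net_orders"] = orders["total_orders"] - orders["returned_orders"]

print(eng)
```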
Show/Hide Code
# Feature Engineering Recommendations
eng_recs = grouped_recs.get(RecommendationCategory.FEATURE_ENGINEERING, [])
print("=" * 70)
print("FEATURE ENGINEERING")
print("=" * 70)

if eng_recs:
    for rec in eng_recs:
        priority_icon = "🔴" if rec.priority == "high" else "🟡" if rec.priority == "medium" else "🟢"
        print(f"\n{priority_icon} {rec.title}")
        print(f" {rec.description}")
        print(f" → {rec.action}")
        if rec.affected_features:
            print(f" → Features: {', '.join(rec.affected_features[:5])}")
else:
    print("\n✅ No specific feature engineering recommendations based on correlation patterns.")
    print(" Consider domain-specific features based on business knowledge.")

# Additional suggestions based on strong predictors
if analysis_summary.strong_predictors:
    print("\n" + "-" * 70)
    print("POTENTIAL INTERACTION FEATURES:")
    strong_features = [p["feature"] for p in analysis_summary.strong_predictors[:5]]
    if len(strong_features) >= 2:
        print("\n Based on strong predictors, consider interactions between:")
        for i, f1 in enumerate(strong_features[:3]):
            for f2 in strong_features[i+1:4]:
                print(f" • {f1} × {f2}")
        print("\n 💡 Tree-based models discover interactions automatically.")
        print(" → For linear models, create explicit interaction columns.")
======================================================================
FEATURE ENGINEERING
======================================================================
🟢 Consider ratio features
Moderately correlated pairs may benefit from ratio features: event_count_180d/event_count_all_time, event_count_180d/opened_sum_180d, event_count_180d/clicked_sum_180d
→ Create ratio features (e.g., feature_a / feature_b) to capture relative relationships.
→ Features: event_count_180d, event_count_180d, event_count_180d, event_count_all_time, opened_sum_180d
🟢 Test feature interactions
Interaction terms may capture non-linear relationships
→ Use PolynomialFeatures(interaction_only=True) or tree-based models which automatically discover interactions.
→ Features: event_count_180d, event_count_365d, event_count_all_time, opened_sum_180d
----------------------------------------------------------------------
POTENTIAL INTERACTION FEATURES:
Based on strong predictors, consider interactions between:
• event_count_180d × event_count_365d
• event_count_180d × event_count_all_time
• event_count_180d × opened_sum_180d
• event_count_365d × event_count_all_time
• event_count_365d × opened_sum_180d
• event_count_all_time × opened_sum_180d
💡 Tree-based models discover interactions automatically.
→ For linear models, create explicit interaction columns.
5.9.5 Recommendations Summary Table
Show/Hide Code
# Create summary table of all recommendations
all_recs_data = []
for rec in analysis_summary.recommendations:
    all_recs_data.append({
        "Category": rec.category.value.replace("_", " ").title(),
        "Priority": rec.priority.upper(),
        "Recommendation": rec.title,
        "Action": rec.action[:80] + "..." if len(rec.action) > 80 else rec.action
    })

if all_recs_data:
    recs_df = pd.DataFrame(all_recs_data)
    # Sort by priority
    priority_order = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}
    recs_df["_sort"] = recs_df["Priority"].map(priority_order)
    recs_df = recs_df.sort_values("_sort").drop("_sort", axis=1)
    print("=" * 80)
    print("ALL RECOMMENDATIONS SUMMARY")
    print("=" * 80)
    print(f"\nTotal: {len(recs_df)} recommendations")
    print(f" 🔴 High priority: {len(recs_df[recs_df['Priority'] == 'HIGH'])}")
    print(f" 🟡 Medium priority: {len(recs_df[recs_df['Priority'] == 'MEDIUM'])}")
    print(f" 🟢 Low priority: {len(recs_df[recs_df['Priority'] == 'LOW'])}")
    display(recs_df)

# Save updated findings and recommendations registry
findings.save(FINDINGS_PATH)
registry.save(RECOMMENDATIONS_PATH)
print(f"\n✅ Findings updated with relationship analysis: {FINDINGS_PATH}")
print(f"✅ Recommendations registry saved: {RECOMMENDATIONS_PATH}")
print(f" Total recommendations in registry: {len(registry.all_recommendations)}")

if _namespace:
    from customer_retention.analysis.auto_explorer.project_context import ProjectContext
    _namespace.merged_dir.mkdir(parents=True, exist_ok=True)
    _all_findings = _namespace.discover_all_findings(prefer_aggregated=True)
    _mgr = ExplorationManager(_namespace.merged_dir, findings_paths=_all_findings)
    _scaffold = []
    if _namespace.project_context_path.exists():
        _ctx = ProjectContext.load(_namespace.project_context_path)
        _scaffold = _ctx.merge_scaffold
    _multi = _mgr.create_multi_dataset_findings(merge_scaffold=_scaffold)
    _multi.save(str(_namespace.multi_dataset_findings_path))
    # Collect each dataset's recommendation registry and merge them
    _per_dataset_recs = []
    for _ds_name in _namespace.list_datasets():
        _ds_dir = _namespace.dataset_findings_dir(_ds_name)
        if _ds_dir.is_dir():
            for _rp in sorted(_ds_dir.glob("*_recommendations.yaml")):
                _per_dataset_recs.append(RecommendationRegistry.load(str(_rp)))
    if _per_dataset_recs:
        _merged = RecommendationRegistry.merge(_per_dataset_recs)
        _merged.save(str(_namespace.merged_recommendations_path))
    print(f"\n✅ Merged findings/recommendations saved to {_namespace.merged_dir}")
================================================================================
ALL RECOMMENDATIONS SUMMARY
================================================================================
Total: 452 recommendations
🔴 High priority: 241
🟡 Medium priority: 208
🟢 Low priority: 3
| | Category | Priority | Recommendation | Action |
|---|---|---|---|---|
| 357 | Feature Selection | HIGH | Remove multicollinear feature | Consider dropping one of these features. Keep ... |
| 179 | Feature Selection | HIGH | Remove multicollinear feature | Consider dropping one of these features. Keep ... |
| 336 | Feature Selection | HIGH | Remove multicollinear feature | Consider dropping one of these features. Keep ... |
| 335 | Feature Selection | HIGH | Remove multicollinear feature | Consider dropping one of these features. Keep ... |
| 334 | Feature Selection | HIGH | Remove multicollinear feature | Consider dropping one of these features. Keep ... |
| ... | ... | ... | ... | ... |
| 157 | Feature Selection | MEDIUM | Remove multicollinear feature | Consider dropping one of these features. Keep ... |
| 170 | Feature Selection | MEDIUM | Remove multicollinear feature | Consider dropping one of these features. Keep ... |
| 443 | Feature Selection | LOW | Consider removing weak predictors | These features may add noise. Consider removin... |
| 450 | Feature Engineering | LOW | Consider ratio features | Create ratio features (e.g., feature_a / featu... |
| 451 | Feature Engineering | LOW | Test feature interactions | Use PolynomialFeatures(interaction_only=True) ... |
452 rows × 4 columns
✅ Findings updated with relationship analysis: /Users/Vital/python/CustomerRetention/experiments/runs/email-6301db6c/datasets/customer_emails/findings/customer_emails_aggregated_findings.yaml
✅ Recommendations registry saved: /Users/Vital/python/CustomerRetention/experiments/runs/email-6301db6c/datasets/customer_emails/findings/customer_emails_aggregated_recommendations.yaml
Total recommendations in registry: 674
✅ Merged findings/recommendations saved to /Users/Vital/python/CustomerRetention/experiments/runs/email-6301db6c/merged
Summary: What We Learned
In this notebook, we analyzed feature relationships and generated actionable recommendations for modeling.
Analysis Performed
Numeric Features:
- Correlation Matrix - Identified multicollinearity issues between feature pairs
- Effect Sizes (Cohen's d) - Quantified how well features discriminate retained vs churned
- Box Plots - Visualized distribution differences between classes
- Feature-Target Correlations - Ranked features by predictive power
Categorical Features:
- Cramér's V - Measured association strength for categorical variables
- Retention by Category - Identified high-risk segments
- Lift Analysis - Found categories performing above/below average
Datetime Features:
- Cohort Analysis - Retention trends by signup year
- Seasonality - Monthly patterns in retention
Actionable Recommendations Generated
| Category | What It Tells You | Impact on Pipeline |
|---|---|---|
| Feature Selection | Which features to prioritize/drop | Reduces noise, improves interpretability |
| Stratification | How to split train/test | Ensures fair evaluation |
| Model Selection | Which algorithms to try first | Matches model to data |
| Feature Engineering | Interactions to create | Captures non-linear patterns |
Key Metrics Reference
| Data Type | Effect Measure | Strong Signal |
|---|---|---|
| Numeric | Cohen's d | \|d\| ≥ 0.8 |
| Numeric | Correlation | \|r\| ≥ 0.5 |
| Categorical | Cramér's V | V ≥ 0.3 |
| Categorical | Lift | < 0.9x or > 1.1x |
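For reference, the two effect measures in the table can be computed with the textbook formulas below. This is a sketch rather than the notebook's own implementation: Cohen's d uses the pooled standard deviation, and Cramér's V is the uncorrected version (no small-sample bias correction). The function names are illustrative.

```python
import numpy as np
import pandas as pd


def cohens_d(a, b):
    """Standardized mean difference between two groups (pooled SD)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def cramers_v(x, y):
    """Association between two categorical variables, from 0 (none) to 1 (perfect)."""
    table = pd.crosstab(np.asarray(x), np.asarray(y)).to_numpy()
    n = table.sum()
    # Expected counts under independence: row total * column total / n
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return np.sqrt(chi2 / (n * (min(table.shape) - 1)))


# Group means 4 and 3, both variances 4 -> d = 1 / 2
d_demo = cohens_d([2, 4, 6], [1, 3, 5])
# Category fully determines the outcome
v_perfect = cramers_v(["a", "a", "b", "b"], [0, 0, 1, 1])
# Category carries no information about the outcome
v_none = cramers_v(["a", "a", "b", "b"], [0, 1, 0, 1])
print(d_demo, v_perfect, v_none)
```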
Recommended Actions Checklist
Based on the analysis above, here are the key actions to take:
- Feature Selection: Review strong/weak predictors and multicollinear pairs
- Stratification: Use stratified sampling with identified high-risk segments
- Model Selection: Start with recommended model type based on data characteristics
- Feature Engineering: Create interaction features between strong predictors
Next Steps
Continue to 05_feature_opportunities.ipynb to:
- Generate derived features (tenure, recency, engagement scores)
- Identify interaction features based on relationships found here
- Create business-relevant composite scores
- Review automated feature recommendations
Save Reminder: Save this notebook (Ctrl+S / Cmd+S) before running the next one. The next notebook will automatically export this notebook's HTML documentation from the saved file.