Chapter 4: Column Deep Dive
Purpose: Analyze each column in detail with distribution analysis, value validation, and transformation recommendations.
What you'll learn:
- How to validate value ranges for different column types
- How to interpret distribution shapes (skewness, kurtosis)
- When and why to apply transformations (log, sqrt, capping)
- How to detect zero-inflation and handle it
Outputs:
- Value range validation results
- Per-column distribution visualizations with statistics
- Skewness/kurtosis analysis with transformation recommendations
- Zero-inflation detection
- Type confirmation/override capability
- Updated exploration findings
4.1 Load Previous Findings
from customer_retention.analysis.notebook_progress import track_and_export_previous
track_and_export_previous("04_column_deep_dive.ipynb")
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from scipy import stats
from customer_retention.analysis.auto_explorer import ExplorationFindings, RecommendationRegistry
from customer_retention.analysis.visualization import ChartBuilder, console, display_figure, display_table
from customer_retention.core.config.column_config import ColumnType
from customer_retention.core.config.experiments import (
    FINDINGS_DIR,
)
from customer_retention.stages.profiling import (
    CategoricalDistributionAnalyzer,
    DistributionAnalyzer,
    TemporalAnalyzer,
    TemporalGranularity,
    TransformationType,
)
from customer_retention.stages.validation import DataValidator, RuleGenerator
from customer_retention.analysis.auto_explorer import load_notebook_findings
FINDINGS_PATH, _namespace, dataset_name = load_notebook_findings("04_column_deep_dive.ipynb")
print(f"Using: {FINDINGS_PATH}")
findings = ExplorationFindings.load(FINDINGS_PATH)
print(f"\nLoaded findings for {findings.column_count} columns from {findings.source_path}")
# Warn if this is event-level data (should run 01d first)
if findings.is_time_series and "_aggregated" not in FINDINGS_PATH:
    ts_meta = findings.time_series_metadata
    print("\n\u26a0\ufe0f WARNING: This appears to be EVENT-LEVEL data")
    print(f" Entity: {ts_meta.entity_column}, Time: {ts_meta.time_column}")
    print(" Recommendation: Run 01d_event_aggregation.ipynb first to create entity-level data")
Using: /Users/Vital/python/CustomerRetention/experiments/runs/retail-e7471284/datasets/customer_retention_retail/findings/customer_retention_retail_aggregated_findings.yaml
Loaded findings for 407 columns from /Users/Vital/python/CustomerRetention/experiments/runs/retail-e7471284/data/bronze/customer_retention_retail_aggregated
4.2 Load Source Data
# Load data - handle aggregated data (parquet or Delta Lake)
from pathlib import Path
from customer_retention.analysis.auto_explorer.active_dataset_store import load_active_dataset
from customer_retention.stages.temporal import TEMPORAL_METADATA_COLS
# For aggregated data, load directly from the source path
if "_aggregated" in FINDINGS_PATH:
    source_path = Path(findings.source_path)
    if not source_path.is_absolute():
        source_path = Path("..") / source_path
    if source_path.is_dir():
        from customer_retention.integrations.adapters.factory import get_delta
        df = get_delta(force_local=True).read(str(source_path))
    elif source_path.is_file():
        df = pd.read_parquet(source_path)
    else:
        df = load_active_dataset(_namespace, dataset_name)
    data_source = f"aggregated:{source_path.name}"
else:
    # Standard loading for event-level or entity-level data
    df = load_active_dataset(_namespace, dataset_name)
    data_source = dataset_name
print(f"Loaded data from: {data_source}")
print(f"Shape: {df.shape}")
charts = ChartBuilder()
# Initialize recommendation registry for this exploration
registry = RecommendationRegistry()
registry.init_bronze(findings.source_path)
# Find target column for Gold layer initialization
target_col = next((name for name, col in findings.columns.items() if col.inferred_type == ColumnType.TARGET), None)
if target_col:
    registry.init_gold(target_col)
# Find entity column for Silver layer initialization
entity_col = next((name for name, col in findings.columns.items() if col.inferred_type == ColumnType.IDENTIFIER), None)
if entity_col:
    registry.init_silver(entity_col)
print(f"Initialized recommendation registry (Bronze: {findings.source_path})")
Loaded data from: aggregated:customer_retention_retail_aggregated
Shape: (30770, 407)
Initialized recommendation registry (Bronze: /Users/Vital/python/CustomerRetention/experiments/runs/retail-e7471284/data/bronze/customer_retention_retail_aggregated)
4.3 Value Range Validation
📖 Interpretation Guide:
- Percentage fields (rates): Should be 0-100 or 0-1 depending on format
- Binary fields: Should only contain 0 and 1
- Count fields: Should be non-negative integers
- Amount fields: Should be non-negative (unless refunds are possible)
What to Watch For:
- Rates > 100% suggest measurement or data entry errors
- Negative values in fields that should be positive
- Binary fields with values other than 0/1
Actions:
- Cap rates at 100 when they exceed it, or investigate the cause first
- Flag records with impossible negative values
- Convert binary fields to proper 0/1 encoding
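The checks described above can be sketched in plain pandas. This is an illustrative sketch only, not the framework's `DataValidator`: the helper name `count_range_violations` and the toy data are hypothetical, and the rule names simply mirror the `rule_type` strings used in the notebook code for this section.

```python
import numpy as np
import pandas as pd


def count_range_violations(series: pd.Series, rule_type: str) -> int:
    """Count non-null values that violate a simple range rule (hypothetical helper)."""
    values = series.dropna()
    if rule_type == "binary":
        invalid = ~values.isin([0, 1])
    elif rule_type == "non_negative":
        invalid = values < 0
    elif rule_type == "percentage":  # 0-100 format
        invalid = (values < 0) | (values > 100)
    elif rule_type == "rate":  # 0-1 format
        invalid = (values < 0) | (values > 1)
    else:
        raise ValueError(f"Unknown rule type: {rule_type}")
    return int(invalid.sum())


# Toy data, made up for illustration
toy = pd.DataFrame({
    "open_rate_pct": [0.0, 55.0, 116.7, np.nan],
    "is_paperless": [0, 1, 2, 1],
    "order_count": [3, 0, -1, 7],
})

violations = {
    "open_rate_pct": count_range_violations(toy["open_rate_pct"], "percentage"),
    "is_paperless": count_range_violations(toy["is_paperless"], "binary"),
    "order_count": count_range_violations(toy["order_count"], "non_negative"),
}
print(violations)

# One possible action for rates: cap values into [0, 100]
toy["open_rate_pct_capped"] = toy["open_rate_pct"].clip(lower=0, upper=100)
```

Missing values are dropped before checking, so they are never counted as violations. Capping is only one option; a rate above 100 can also signal an upstream aggregation problem that is worth investigating before any value is changed.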
validator = DataValidator()
range_rules = RuleGenerator.from_findings(findings)
console.start_section()
console.header("Value Range Validation")
if range_rules:
    range_results = validator.validate_value_ranges(df, range_rules)
    issues_found = []
    for r in range_results:
        detail = f"{r.invalid_values} invalid" if r.invalid_values > 0 else None
        console.check(f"{r.column_name} ({r.rule_type})", r.invalid_values == 0, detail)
        if r.invalid_values > 0:
            issues_found.append(r)
    all_invalid = sum(r.invalid_values for r in range_results)
    if all_invalid == 0:
        console.success("All value ranges valid")
    else:
        console.error(f"Found {all_invalid:,} values outside expected ranges")
        console.info("Examples of invalid values:")
        for r in issues_found[:3]:
            col = r.column_name
            if col in df.columns:
                if r.rule_type == 'binary':
                    invalid_mask = ~df[col].isin([0, 1, np.nan])
                    condition = "value not in [0, 1]"
                elif r.rule_type == 'non_negative':
                    invalid_mask = df[col] < 0
                    condition = "value < 0"
                elif r.rule_type == 'percentage':
                    invalid_mask = (df[col] < 0) | (df[col] > 100)
                    condition = "value < 0 or value > 100"
                elif r.rule_type == 'rate':
                    invalid_mask = (df[col] < 0) | (df[col] > 1)
                    condition = "value < 0 or value > 1"
                else:
                    continue
                invalid_values = df.loc[invalid_mask, col].dropna()
                if len(invalid_values) > 0:
                    examples = invalid_values.head(5).tolist()
                    console.metric(f" {col}", f"{examples}")
                    # Add filtering recommendation
                    registry.add_bronze_filtering(
                        column=col, condition=condition, action="cap",
                        rationale=f"{r.invalid_values} values violate {r.rule_type} constraint",
                        source_notebook="04_column_deep_dive"
                    )
    console.info("Rules auto-generated from detected column types")
else:
    range_results = []
    console.info("No validation rules generated - no binary/numeric columns detected")
console.end_section()
VALUE RANGE VALIDATION
[X] eopenrate_sum_all_time (percentage) — 1 invalid
[OK] eopenrate_mean_all_time (percentage)
[OK] eopenrate_max_all_time (percentage)
[OK] eopenrate_count_all_time (percentage)
[OK] eclickrate_sum_all_time (percentage)
[OK] eclickrate_mean_all_time (percentage)
[OK] eclickrate_max_all_time (percentage)
[OK] eclickrate_count_all_time (percentage)
[OK] paperless_mean_all_time (binary)
[OK] paperless_max_all_time (binary)
[OK] refill_sum_all_time (binary)
[OK] refill_mean_all_time (binary)
[OK] refill_max_all_time (binary)
[OK] doorstep_sum_all_time (binary)
[OK] doorstep_mean_all_time (binary)
[OK] doorstep_max_all_time (binary)
[OK] created_is_weekend_sum_all_time (binary)
[OK] created_is_weekend_mean_all_time (binary)
[OK] created_is_weekend_max_all_time (binary)
[OK] firstorder_is_weekend_sum_all_time (binary)
[OK] firstorder_is_weekend_max_all_time (binary)
[OK] is_missing_firstorder_sum_all_time (binary)
[OK] is_missing_firstorder_mean_all_time (binary)
[OK] is_missing_firstorder_max_all_time (binary)
[OK] lastorder_is_weekend_sum_all_time (binary)
[OK] lastorder_is_weekend_mean_all_time (binary)
[OK] lastorder_is_weekend_max_all_time (binary)
[OK] lag0_eopenrate_sum (percentage)
[OK] lag0_eopenrate_mean (percentage)
[OK] lag0_eopenrate_count (percentage)
[OK] lag0_eopenrate_max (percentage)
[OK] lag0_eclickrate_sum (percentage)
[OK] lag0_eclickrate_mean (percentage)
[OK] lag0_eclickrate_count (percentage)
[OK] lag0_eclickrate_max (percentage)
[OK] lag0_paperless_sum (binary)
[OK] lag0_paperless_mean (binary)
[OK] lag0_paperless_max (binary)
[OK] lag0_refill_sum (binary)
[OK] lag0_refill_mean (binary)
[OK] lag0_refill_max (binary)
[OK] lag0_doorstep_sum (binary)
[OK] lag0_doorstep_mean (binary)
[OK] lag0_doorstep_max (binary)
[OK] lag0_created_delta_hours_count (binary)
[OK] lag0_created_hour_count (binary)
[OK] lag1_eopenrate_count (percentage)
[OK] lag1_eclickrate_count (percentage)
[OK] lag2_eopenrate_count (percentage)
[OK] lag2_eclickrate_count (percentage)
[OK] lag3_eopenrate_count (percentage)
[OK] lag3_eclickrate_count (percentage)
[OK] esent_trend_ratio (percentage)
[OK] eopenrate_beginning (percentage)
[OK] eopenrate_end (percentage)
[OK] eopenrate_trend_ratio (percentage)
[OK] eclickrate_beginning (percentage)
[X] eclickrate_end (binary) — 1 invalid
[OK] eclickrate_trend_ratio (percentage)
[X] avgorder_trend_ratio (binary) — 1 invalid
[X] ordfreq_beginning (binary) — 1 invalid
[X] ordfreq_end (binary) — 1 invalid
[OK] ordfreq_trend_ratio (percentage)
[OK] paperless_beginning (binary)
[OK] paperless_end (binary)
[OK] paperless_trend_ratio (percentage)
[X] days_since_first_event_y (binary) — 5 invalid
[X] active_span_days (binary) — 5 invalid
[OK] recency_ratio (percentage)
[OK] esent_vs_cohort_pct (percentage)
[X] eopenrate_vs_cohort_mean (percentage) — 19763 invalid
[OK] eopenrate_vs_cohort_pct (percentage)
[X] eopenrate_cohort_zscore (percentage) — 19763 invalid
[X] eclickrate_vs_cohort_mean (percentage) — 21763 invalid
[OK] eclickrate_vs_cohort_pct (percentage)
[X] eclickrate_cohort_zscore (percentage) — 21763 invalid
[OK] avgorder_vs_cohort_pct (percentage)
[OK] ordfreq_vs_cohort_pct (percentage)
[X] paperless_vs_cohort_mean (binary) — 30769 invalid
[X] paperless_vs_cohort_pct (binary) — 19973 invalid
[X] paperless_cohort_zscore (binary) — 30769 invalid
[X] refill_vs_cohort_mean (binary) — 30769 invalid
[X] refill_vs_cohort_pct (binary) — 2924 invalid
[X] refill_cohort_zscore (binary) — 30769 invalid
[X] doorstep_vs_cohort_mean (binary) — 30769 invalid
[X] doorstep_vs_cohort_pct (binary) — 1198 invalid
[X] doorstep_cohort_zscore (binary) — 30769 invalid
[OK] created_delta_hours_vs_cohort_pct (percentage)
[X] Found 291,776 values outside expected ranges
(i) Examples of invalid values:
eopenrate_sum_all_time: [116.66666667]
eclickrate_end: [13.63636364]
avgorder_trend_ratio: [1.00990099009901]
(i) Rules auto-generated from detected column types
4.4 Numeric Columns Analysis
📖 How to Interpret These Charts:
- Red dashed line = Mean (sensitive to outliers)
- Green solid line = Median (robust to outliers)
- Large gap between mean and median = Skewed distribution
- Long right tail = Positive skew (common in count/amount data)
📖 Understanding Distribution Metrics
| Metric | Interpretation | Action |
|---|---|---|
| Skewness | Measures asymmetry | \|skew\| > 1: Consider log transform |
| Kurtosis | Measures tail heaviness (reported as excess kurtosis, so a normal distribution scores 0) | kurt > 10: Cap outliers before transform |
| Zero % | Percentage of zeros | > 40%: Use zero-inflation handling |
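The three metrics in the table can be computed directly with SciPy and pandas. The sketch below is independent of the framework's `DistributionAnalyzer`; the helper name `shape_metrics` and the synthetic data are assumptions made for illustration. It uses `scipy.stats.kurtosis` with its default Fisher definition, which returns excess kurtosis.

```python
import numpy as np
import pandas as pd
from scipy import stats


def shape_metrics(series: pd.Series) -> dict:
    """Summarize asymmetry, tail weight, and share of zeros (hypothetical helper)."""
    values = series.dropna().to_numpy(dtype=float)
    return {
        "mean": float(values.mean()),
        "median": float(np.median(values)),
        "skewness": float(stats.skew(values)),
        # Fisher definition: a normal distribution scores 0
        "excess_kurtosis": float(stats.kurtosis(values)),
        "zero_pct": float((values == 0).mean() * 100),
    }


# Synthetic right-skewed sample (log-normal), made up for illustration
rng = np.random.default_rng(0)
right_skewed = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=5_000))

m = shape_metrics(right_skewed)
print(m)
```

For a right-skewed sample like this one, the mean sits above the median, which is the same gap the red and green lines show in the histograms.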
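📖 Transformation Decision Tree (a simplified guide; in the output below the analyzer also recommends sqrt_transform and yeo_johnson, and flags zero-inflation for columns with as few as 32.5% zeros):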
- If zeros > 40% → Create binary indicator + log(non-zeros)
- If |skewness| > 1 AND kurtosis > 10 → Cap then log
- If |skewness| > 1 → Log transform
- If kurtosis > 10 → Cap outliers only
- Otherwise → Standard scaling is sufficient
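The decision tree above can be written as a small function. This is a literal sketch of the five rules as listed, not the logic of the framework's `recommend_transformation`. The function name is hypothetical, and of the returned labels only `zero_inflation_handling` and `cap_then_log` appear in the notebook output; the others are placeholders. The example inputs are the skewness, kurtosis, and zero percentage printed for four columns in this section.

```python
def choose_transformation(skewness: float, kurtosis: float, zero_pct: float) -> str:
    """Apply the decision tree rules in order; the first matching rule wins."""
    if zero_pct > 40:
        return "zero_inflation_handling"  # binary indicator + log(non-zeros)
    if abs(skewness) > 1 and kurtosis > 10:
        return "cap_then_log"
    if abs(skewness) > 1:
        return "log_transform"  # placeholder label
    if kurtosis > 10:
        return "cap_outliers"  # placeholder label
    return "standard_scaling"  # placeholder label


# (skewness, kurtosis, zero %) as reported for these columns in the output
examples = {
    "eclickrate_sum_all_time": (3.89, 22.91, 50.2),
    "avgorder_sum_all_time": (11.70, 548.62, 0.0),
    "eopenrate_sum_all_time": (1.17, 0.21, 24.8),
    "esent_sum_all_time": (-0.05, 3.67, 11.0),
}

decisions = {name: choose_transformation(*vals) for name, vals in examples.items()}
for name, decision in decisions.items():
    print(f"{name}: {decision}")
```

Rule order matters: the zero-inflation check comes first, so a column that is both zero-heavy and skewed is never sent straight to a log transform. The analyzer's printed recommendation for `eopenrate_sum_all_time` is `sqrt_transform`, so its rules are evidently finer-grained than this list.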
# Use framework's DistributionAnalyzer for comprehensive analysis
analyzer = DistributionAnalyzer()
numeric_cols = [
    name for name, col in findings.columns.items()
    if col.inferred_type in [ColumnType.NUMERIC_CONTINUOUS, ColumnType.NUMERIC_DISCRETE]
    and name not in TEMPORAL_METADATA_COLS
]
# Analyze all numeric columns using the framework
analyses = analyzer.analyze_dataframe(df, numeric_cols)
recommendations = {col: analyzer.recommend_transformation(analysis)
                   for col, analysis in analyses.items()}
for col_name in numeric_cols:
    col_info = findings.columns[col_name]
    analysis = analyses.get(col_name)
    rec = recommendations.get(col_name)
    print(f"\n{'='*70}")
    print(f"Column: {col_name}")
    print(f"Type: {col_info.inferred_type.value} (Confidence: {col_info.confidence:.0%})")
    print("-" * 70)
    if analysis:
        print("\U0001f4ca Distribution Statistics:")
        print(f" Mean: {analysis.mean:.3f} | Median: {analysis.median:.3f} | Std: {analysis.std:.3f}")
        print(f" Range: [{analysis.min_value:.3f}, {analysis.max_value:.3f}]")
        if analysis.percentiles:
            print(f" Percentiles: 1%={analysis.percentiles.get('p1', 0):.3f}, 25%={analysis.q1:.3f}, 75%={analysis.q3:.3f}, 99%={analysis.percentiles.get('p99', 0):.3f}")
        print("\n\U0001f4c8 Shape Analysis:")
        skew_label = '(Right-skewed)' if analysis.skewness > 0.5 else '(Left-skewed)' if analysis.skewness < -0.5 else '(Symmetric)'
        print(f" Skewness: {analysis.skewness:.2f} {skew_label}")
        kurt_label = '(Heavy tails/outliers)' if analysis.kurtosis > 3 else '(Light tails)'
        print(f" Kurtosis: {analysis.kurtosis:.2f} {kurt_label}")
        print(f" Zeros: {analysis.zero_count:,} ({analysis.zero_percentage:.1f}%)")
        print(f" Outliers (IQR): {analysis.outlier_count_iqr:,} ({analysis.outlier_percentage:.1f}%)")
    if rec:
        print(f"\n\U0001f527 Recommended Transformation: {rec.recommended_transform.value}")
        print(f" Reason: {rec.reason}")
        print(f" Priority: {rec.priority}")
        if rec.warnings:
            for warn in rec.warnings:
                print(f" \u26a0\ufe0f {warn}")
    # Create enhanced histogram with Plotly
    data = df[col_name].dropna()
    fig = go.Figure()
    fig.add_trace(go.Histogram(x=data, nbinsx=50, name='Distribution',
                               marker_color='steelblue', opacity=0.7))
    # Calculate mean and median
    mean_val = data.mean()
    median_val = data.median()
    # Position labels on opposite sides (left/right) to avoid overlap
    # The larger value gets right-justified, smaller gets left-justified
    mean_position = "top right" if mean_val >= median_val else "top left"
    median_position = "top left" if mean_val >= median_val else "top right"
    # Add mean line
    fig.add_vline(
        x=mean_val,
        line_dash="dash",
        line_color="red",
        annotation_text=f"Mean: {mean_val:.2f}",
        annotation_position=mean_position,
        annotation_font_color="red",
        annotation_bgcolor="rgba(255,255,255,0.8)"
    )
    # Add median line
    fig.add_vline(
        x=median_val,
        line_dash="solid",
        line_color="green",
        annotation_text=f"Median: {median_val:.2f}",
        annotation_position=median_position,
        annotation_font_color="green",
        annotation_bgcolor="rgba(255,255,255,0.8)"
    )
    # Add 99th percentile marker when more than 5% of values are IQR outliers
    # (percentiles may be missing, so look up p99 defensively)
    p99 = analysis.percentiles.get('p99') if analysis and analysis.percentiles else None
    if analysis and analysis.outlier_percentage > 5 and p99 is not None:
        fig.add_vline(x=p99, line_dash="dot", line_color="orange",
                      annotation_text=f"99th: {p99:.2f}",
                      annotation_position="top right",
                      annotation_font_color="orange",
                      annotation_bgcolor="rgba(255,255,255,0.8)")
    transform_label = rec.recommended_transform.value if rec else "none"
    # Only show skew/kurtosis in the subtitle when an analysis exists for this column
    shape_label = f"Skew: {analysis.skewness:.2f} | Kurt: {analysis.kurtosis:.2f} | " if analysis else ""
    fig.update_layout(
        title=f"Distribution: {col_name}<br><sub>{shape_label}Strategy: {transform_label}</sub>",
        xaxis_title=col_name,
        yaxis_title="Count",
        template='plotly_white',
        height=400
    )
    display_figure(fig)
====================================================================== Column: event_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.014 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 47.72 (Right-skewed) Kurtosis: 5125.28 (Heavy tails/outliers) Zeros: 1 (0.0%) Outliers (IQR): 6 (0.0%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (47.72) with non-positive values Priority: high
====================================================================== Column: esent_sum_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 28.144 | Median: 32.000 | Std: 16.754 Range: [0.000, 291.000] Percentiles: 1%=0.000, 25%=16.000, 75%=42.000, 99%=56.000 📈 Shape Analysis: Skewness: -0.05 (Symmetric) Kurtosis: 3.67 (Heavy tails/outliers) Zeros: 3,400 (11.0%) Outliers (IQR): 12 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: -0.05) Priority: low
====================================================================== Column: esent_mean_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 28.141 | Median: 32.000 | Std: 16.750 Range: [0.000, 291.000] Percentiles: 1%=0.000, 25%=16.000, 75%=42.000, 99%=56.000 📈 Shape Analysis: Skewness: -0.05 (Symmetric) Kurtosis: 3.67 (Heavy tails/outliers) Zeros: 3,399 (11.0%) Outliers (IQR): 12 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: -0.05) Priority: low
====================================================================== Column: esent_max_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 28.142 | Median: 32.000 | Std: 16.750 Range: [0.000, 291.000] Percentiles: 1%=0.000, 25%=16.000, 75%=42.000, 99%=56.000 📈 Shape Analysis: Skewness: -0.05 (Symmetric) Kurtosis: 3.67 (Heavy tails/outliers) Zeros: 3,399 (11.0%) Outliers (IQR): 12 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: -0.05) Priority: low
====================================================================== Column: esent_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.014 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 47.72 (Right-skewed) Kurtosis: 5125.28 (Heavy tails/outliers) Zeros: 1 (0.0%) Outliers (IQR): 6 (0.0%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (47.72) with non-positive values Priority: high
====================================================================== Column: eopenrate_sum_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 25.554 | Median: 13.208 | Std: 29.556 Range: [0.000, 116.667] Percentiles: 1%=0.000, 25%=2.083, 75%=40.000, 99%=100.000 📈 Shape Analysis: Skewness: 1.17 (Right-skewed) Kurtosis: 0.21 (Light tails) Zeros: 7,618 (24.8%) Outliers (IQR): 1,181 (3.8%) 🔧 Recommended Transformation: sqrt_transform Reason: Moderate skewness (1.17) Priority: medium
====================================================================== Column: eopenrate_mean_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 25.551 | Median: 13.208 | Std: 29.551 Range: [0.000, 100.000] Percentiles: 1%=0.000, 25%=2.083, 75%=40.000, 99%=100.000 📈 Shape Analysis: Skewness: 1.17 (Right-skewed) Kurtosis: 0.21 (Light tails) Zeros: 7,617 (24.8%) Outliers (IQR): 1,180 (3.8%) 🔧 Recommended Transformation: sqrt_transform Reason: Moderate skewness (1.17) Priority: medium
====================================================================== Column: eopenrate_max_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 25.554 | Median: 13.208 | Std: 29.554 Range: [0.000, 100.000] Percentiles: 1%=0.000, 25%=2.083, 75%=40.000, 99%=100.000 📈 Shape Analysis: Skewness: 1.17 (Right-skewed) Kurtosis: 0.21 (Light tails) Zeros: 7,617 (24.8%) Outliers (IQR): 1,181 (3.8%) 🔧 Recommended Transformation: sqrt_transform Reason: Moderate skewness (1.17) Priority: medium
====================================================================== Column: eopenrate_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.014 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 47.72 (Right-skewed) Kurtosis: 5125.28 (Heavy tails/outliers) Zeros: 1 (0.0%) Outliers (IQR): 6 (0.0%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (47.72) with non-positive values Priority: high
====================================================================== Column: eclickrate_sum_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 5.671 | Median: 0.000 | Std: 10.552 Range: [0.000, 100.000] Percentiles: 1%=0.000, 25%=0.000, 75%=7.143, 99%=50.000 📈 Shape Analysis: Skewness: 3.89 (Right-skewed) Kurtosis: 22.91 (Heavy tails/outliers) Zeros: 15,452 (50.2%) Outliers (IQR): 2,787 (9.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (50.2%) combined with high skewness (3.89) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: eclickrate_mean_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 5.671 | Median: 0.000 | Std: 10.552 Range: [0.000, 100.000] Percentiles: 1%=0.000, 25%=0.000, 75%=7.143, 99%=50.000 📈 Shape Analysis: Skewness: 3.89 (Right-skewed) Kurtosis: 22.91 (Heavy tails/outliers) Zeros: 15,451 (50.2%) Outliers (IQR): 2,787 (9.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (50.2%) combined with high skewness (3.89) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: eclickrate_max_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 5.672 | Median: 0.000 | Std: 10.552 Range: [0.000, 100.000] Percentiles: 1%=0.000, 25%=0.000, 75%=7.143, 99%=50.000 📈 Shape Analysis: Skewness: 3.89 (Right-skewed) Kurtosis: 22.91 (Heavy tails/outliers) Zeros: 15,451 (50.2%) Outliers (IQR): 2,787 (9.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (50.2%) combined with high skewness (3.89) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: eclickrate_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.014 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 47.72 (Right-skewed) Kurtosis: 5125.28 (Heavy tails/outliers) Zeros: 1 (0.0%) Outliers (IQR): 6 (0.0%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (47.72) with non-positive values Priority: high
====================================================================== Column: avgorder_sum_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 61.851 | Median: 50.960 | Std: 40.950 Range: [0.000, 2600.140] Percentiles: 1%=13.081, 25%=40.020, 75%=74.270, 99%=200.324 📈 Shape Analysis: Skewness: 11.70 (Right-skewed) Kurtosis: 548.62 (Heavy tails/outliers) Zeros: 9 (0.0%) Outliers (IQR): 1,741 (5.7%) 🔧 Recommended Transformation: cap_then_log Reason: High skewness (11.70) with significant outliers (5.7%) Priority: high
====================================================================== Column: avgorder_mean_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 61.845 | Median: 50.960 | Std: 40.946 Range: [0.000, 2600.140] Percentiles: 1%=13.138, 25%=40.020, 75%=74.260, 99%=200.326 📈 Shape Analysis: Skewness: 11.71 (Right-skewed) Kurtosis: 548.84 (Heavy tails/outliers) Zeros: 8 (0.0%) Outliers (IQR): 1,742 (5.7%) 🔧 Recommended Transformation: cap_then_log Reason: High skewness (11.71) with significant outliers (5.7%) Priority: high
====================================================================== Column: avgorder_max_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 61.845 | Median: 50.960 | Std: 40.946 Range: [0.000, 2600.140] Percentiles: 1%=13.138, 25%=40.020, 75%=74.260, 99%=200.326 📈 Shape Analysis: Skewness: 11.71 (Right-skewed) Kurtosis: 548.84 (Heavy tails/outliers) Zeros: 8 (0.0%) Outliers (IQR): 1,742 (5.7%) 🔧 Recommended Transformation: cap_then_log Reason: High skewness (11.71) with significant outliers (5.7%) Priority: high
====================================================================== Column: avgorder_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.014 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 47.72 (Right-skewed) Kurtosis: 5125.28 (Heavy tails/outliers) Zeros: 1 (0.0%) Outliers (IQR): 6 (0.0%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (47.72) with non-positive values Priority: high
====================================================================== Column: ordfreq_sum_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.038 | Median: 0.000 | Std: 0.104 Range: [0.000, 3.250] Percentiles: 1%=0.000, 25%=0.000, 75%=0.041, 99%=0.333 📈 Shape Analysis: Skewness: 10.47 (Right-skewed) Kurtosis: 179.54 (Heavy tails/outliers) Zeros: 18,991 (61.7%) Outliers (IQR): 3,727 (12.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (61.7%) combined with high skewness (10.47) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: ordfreq_mean_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.038 | Median: 0.000 | Std: 0.104 Range: [0.000, 3.250] Percentiles: 1%=0.000, 25%=0.000, 75%=0.041, 99%=0.333 📈 Shape Analysis: Skewness: 10.47 (Right-skewed) Kurtosis: 179.54 (Heavy tails/outliers) Zeros: 18,990 (61.7%) Outliers (IQR): 3,727 (12.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (61.7%) combined with high skewness (10.47) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: ordfreq_max_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.038 | Median: 0.000 | Std: 0.104 Range: [0.000, 3.250] Percentiles: 1%=0.000, 25%=0.000, 75%=0.041, 99%=0.333 📈 Shape Analysis: Skewness: 10.47 (Right-skewed) Kurtosis: 179.54 (Heavy tails/outliers) Zeros: 18,990 (61.7%) Outliers (IQR): 3,727 (12.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (61.7%) combined with high skewness (10.47) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: ordfreq_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.014 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 47.72 (Right-skewed) Kurtosis: 5125.28 (Heavy tails/outliers) Zeros: 1 (0.0%) Outliers (IQR): 6 (0.0%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (47.72) with non-positive values Priority: high
====================================================================== Column: paperless_sum_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.649 | Median: 1.000 | Std: 0.477 Range: [0.000, 2.000] Percentiles: 1%=0.000, 25%=0.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -0.62 (Left-skewed) Kurtosis: -1.61 (Light tails) Zeros: 10,797 (35.1%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (35.1%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: paperless_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.014 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 47.72 (Right-skewed) Kurtosis: 5125.28 (Heavy tails/outliers) Zeros: 1 (0.0%) Outliers (IQR): 6 (0.0%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (47.72) with non-positive values Priority: high
====================================================================== Column: refill_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.014 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 47.72 (Right-skewed) Kurtosis: 5125.28 (Heavy tails/outliers) Zeros: 1 (0.0%) Outliers (IQR): 6 (0.0%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (47.72) with non-positive values Priority: high
====================================================================== Column: doorstep_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.014 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 47.72 (Right-skewed) Kurtosis: 5125.28 (Heavy tails/outliers) Zeros: 1 (0.0%) Outliers (IQR): 6 (0.0%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (47.72) with non-positive values Priority: high
====================================================================== Column: created_delta_hours_sum_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -3162.416 | Median: -384.000 | Std: 6446.832 Range: [-47952.000, 0.000] Percentiles: 1%=-29712.000, 25%=-2544.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: -2.88 (Left-skewed) Kurtosis: 8.80 (Heavy tails/outliers) Zeros: 10,242 (33.3%) Outliers (IQR): 4,646 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (33.3%) combined with high skewness (-2.88) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: created_delta_hours_mean_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -3199.012 | Median: -384.000 | Std: 6474.993 Range: [-47952.000, 0.000] Percentiles: 1%=-29712.000, 25%=-2616.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: -2.86 (Left-skewed) Kurtosis: 8.66 (Heavy tails/outliers) Zeros: 9,890 (32.5%) Outliers (IQR): 4,587 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (32.5%) combined with high skewness (-2.86) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: created_delta_hours_max_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -3199.012 | Median: -384.000 | Std: 6474.993 Range: [-47952.000, 0.000] Percentiles: 1%=-29712.000, 25%=-2616.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: -2.86 (Left-skewed) Kurtosis: 8.66 (Heavy tails/outliers) Zeros: 9,890 (32.5%) Outliers (IQR): 4,587 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (32.5%) combined with high skewness (-2.86) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
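The created_delta_hours_* columns are entirely non-positive, so the suggested "log transform of non-zero values" cannot be applied to the raw values directly. A minimal sketch of one way to follow the recommendation, assuming a sign-preserving log1p of the magnitude is acceptable for the downstream model; the helper name `split_zero_inflated` and the sample values are illustrative and not part of the `customer_retention` package:

```python
import numpy as np
import pandas as pd


def split_zero_inflated(values: pd.Series) -> pd.DataFrame:
    """Return a zero indicator plus a sign-preserving log1p of the magnitude."""
    is_zero = (values == 0).astype(int)
    # log1p(|x|) compresses the long tail; multiplying by the sign keeps
    # negative deltas negative and maps 0 to exactly 0.
    signed_log = np.sign(values) * np.log1p(values.abs())
    return pd.DataFrame({"is_zero": is_zero, "signed_log1p": signed_log})


# Illustrative values in the range reported for created_delta_hours_sum_all_time.
sample = pd.Series([-47952.0, -2544.0, -384.0, 0.0, 0.0])
features = split_zero_inflated(sample)
print(features)
```

Keeping the indicator as its own feature lets a model treat "no delta at all" separately from "small delta".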
====================================================================== Column: created_delta_hours_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.989 | Median: 1.000 | Std: 0.107 Range: [0.000, 2.000] Percentiles: 1%=0.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -9.12 (Left-skewed) Kurtosis: 82.22 (Heavy tails/outliers) Zeros: 352 (1.1%) Outliers (IQR): 353 (1.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-9.12) with non-positive values Priority: high
====================================================================== Column: created_hour_sum_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,770 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: created_hour_mean_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,439 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: created_hour_max_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,439 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
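The created_hour_* columns above have a range of [0, 0] and a standard deviation of 0, so the "zero-inflation" label describes a constant column rather than a mixture distribution. A small sketch of how such columns could be listed before any transformation is considered; `constant_columns` is a hypothetical helper and the toy frame uses made-up values under the real column names:

```python
import pandas as pd


def constant_columns(df: pd.DataFrame) -> list[str]:
    """List columns that hold at most one distinct non-null value."""
    return [col for col in df.columns if df[col].nunique(dropna=True) <= 1]


# Toy frame: the first column mimics a 100%-zero column, the second varies.
toy = pd.DataFrame(
    {
        "created_hour_sum_all_time": [0, 0, 0, 0],
        "created_dow_sum_all_time": [0, 3, 6, 1],
    }
)
print(constant_columns(toy))
```

A constant column carries no signal for a model, so dropping it is often a reasonable alternative to zero-inflation handling.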
====================================================================== Column: created_hour_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.989 | Median: 1.000 | Std: 0.103 Range: [0.000, 2.000] Percentiles: 1%=0.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -9.41 (Left-skewed) Kurtosis: 87.73 (Heavy tails/outliers) Zeros: 331 (1.1%) Outliers (IQR): 332 (1.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-9.41) with non-positive values Priority: high
====================================================================== Column: created_dow_sum_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 2.746 | Median: 3.000 | Std: 1.984 Range: [0.000, 6.000] Percentiles: 1%=0.000, 25%=1.000, 75%=4.000, 99%=6.000 📈 Shape Analysis: Skewness: 0.17 (Symmetric) Kurtosis: -1.21 (Light tails) Zeros: 5,065 (16.5%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.17) Priority: low
====================================================================== Column: created_dow_mean_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 2.776 | Median: 3.000 | Std: 1.974 Range: [0.000, 6.000] Percentiles: 1%=0.000, 25%=1.000, 75%=4.000, 99%=6.000 📈 Shape Analysis: Skewness: 0.16 (Symmetric) Kurtosis: -1.20 (Light tails) Zeros: 4,734 (15.6%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.16) Priority: low
====================================================================== Column: created_dow_max_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 2.776 | Median: 3.000 | Std: 1.974 Range: [0.000, 6.000] Percentiles: 1%=0.000, 25%=1.000, 75%=4.000, 99%=6.000 📈 Shape Analysis: Skewness: 0.16 (Symmetric) Kurtosis: -1.20 (Light tails) Zeros: 4,734 (15.6%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.16) Priority: low
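The created_dow_* columns are described as "approximately normal" on the basis of low skewness, yet their kurtosis of about -1.2 is close to what a flat distribution over seven weekdays would give. The sketch below computes both statistics for an idealized, perfectly uniform day-of-week column (synthetic data, using SciPy's default biased estimators) to illustrate that near-zero skewness alone does not imply a bell shape:

```python
import numpy as np
from scipy import stats

# Idealized day-of-week column: every weekday 0..6 appears equally often.
dow = np.tile(np.arange(7), 1000)

skewness = stats.skew(dow)             # symmetric, so this is 0
excess_kurtosis = stats.kurtosis(dow)  # Fisher definition: normal would be 0

print(f"skewness:        {skewness:.2f}")
print(f"excess kurtosis: {excess_kurtosis:.2f}")
```

"No transformation needed" still looks like a sensible outcome for these columns; the point is only that the stated reason could be read as "symmetric and bounded" rather than "normal".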
====================================================================== Column: created_dow_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.989 | Median: 1.000 | Std: 0.103 Range: [0.000, 2.000] Percentiles: 1%=0.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -9.41 (Left-skewed) Kurtosis: 87.73 (Heavy tails/outliers) Zeros: 331 (1.1%) Outliers (IQR): 332 (1.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-9.41) with non-positive values Priority: high
====================================================================== Column: created_is_weekend_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.989 | Median: 1.000 | Std: 0.103 Range: [0.000, 2.000] Percentiles: 1%=0.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -9.41 (Left-skewed) Kurtosis: 87.73 (Heavy tails/outliers) Zeros: 331 (1.1%) Outliers (IQR): 332 (1.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-9.41) with non-positive values Priority: high
====================================================================== Column: firstorder_delta_hours_sum_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 966.851 | Median: 24.000 | Std: 3395.851 Range: [-120672.000, 47952.000] Percentiles: 1%=-792.000, 25%=0.000, 75%=528.000, 99%=16447.440 📈 Shape Analysis: Skewness: -3.62 (Left-skewed) Kurtosis: 288.44 (Heavy tails/outliers) Zeros: 13,566 (44.1%) Outliers (IQR): 4,646 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (44.1%) combined with high skewness (-3.62) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: firstorder_delta_hours_mean_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 966.991 | Median: 24.000 | Std: 3395.747 Range: [-120672.000, 47952.000] Percentiles: 1%=-792.000, 25%=0.000, 75%=528.000, 99%=16450.560 📈 Shape Analysis: Skewness: -3.62 (Left-skewed) Kurtosis: 288.59 (Heavy tails/outliers) Zeros: 13,553 (44.1%) Outliers (IQR): 4,646 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (44.1%) combined with high skewness (-3.62) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: firstorder_delta_hours_max_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 966.991 | Median: 24.000 | Std: 3395.747 Range: [-120672.000, 47952.000] Percentiles: 1%=-792.000, 25%=0.000, 75%=528.000, 99%=16450.560 📈 Shape Analysis: Skewness: -3.62 (Left-skewed) Kurtosis: 288.59 (Heavy tails/outliers) Zeros: 13,553 (44.1%) Outliers (IQR): 4,646 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (44.1%) combined with high skewness (-3.62) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: firstorder_delta_hours_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.024 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -18.35 (Left-skewed) Kurtosis: 1706.33 (Heavy tails/outliers) Zeros: 13 (0.0%) Outliers (IQR): 18 (0.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-18.35) with non-positive values Priority: high
====================================================================== Column: firstorder_hour_sum_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,770 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: firstorder_hour_mean_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,757 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: firstorder_hour_max_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,757 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: firstorder_hour_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.024 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -18.35 (Left-skewed) Kurtosis: 1706.33 (Heavy tails/outliers) Zeros: 13 (0.0%) Outliers (IQR): 18 (0.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-18.35) with non-positive values Priority: high
====================================================================== Column: firstorder_dow_sum_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 2.687 | Median: 2.000 | Std: 2.023 Range: [0.000, 7.000] Percentiles: 1%=0.000, 25%=1.000, 75%=4.000, 99%=6.000 📈 Shape Analysis: Skewness: 0.26 (Symmetric) Kurtosis: -1.17 (Light tails) Zeros: 5,557 (18.1%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.26) Priority: low
====================================================================== Column: firstorder_dow_mean_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 2.688 | Median: 2.000 | Std: 2.023 Range: [0.000, 6.000] Percentiles: 1%=0.000, 25%=1.000, 75%=4.000, 99%=6.000 📈 Shape Analysis: Skewness: 0.26 (Symmetric) Kurtosis: -1.16 (Light tails) Zeros: 5,544 (18.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.26) Priority: low
====================================================================== Column: firstorder_dow_max_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 2.688 | Median: 2.000 | Std: 2.023 Range: [0.000, 6.000] Percentiles: 1%=0.000, 25%=1.000, 75%=4.000, 99%=6.000 📈 Shape Analysis: Skewness: 0.26 (Symmetric) Kurtosis: -1.17 (Light tails) Zeros: 5,544 (18.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.26) Priority: low
====================================================================== Column: firstorder_dow_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.024 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -18.35 (Left-skewed) Kurtosis: 1706.33 (Heavy tails/outliers) Zeros: 13 (0.0%) Outliers (IQR): 18 (0.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-18.35) with non-positive values Priority: high
====================================================================== Column: firstorder_is_weekend_mean_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.230 | Median: 0.000 | Std: 0.421 Range: [0.000, 1.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=1.000 📈 Shape Analysis: Skewness: 1.28 (Right-skewed) Kurtosis: -0.36 (Light tails) Zeros: 23,687 (77.0%) Outliers (IQR): 7,082 (23.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (77.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
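For firstorder_is_weekend_mean_all_time, the zero count (23,687) and the outlier count (7,082) sum to the full row count, which is what an IQR rule produces on a mostly-zero binary flag. A minimal sketch, assuming the conventional 1.5 × IQR fences (the source does not state which rule the analyzer uses) and a synthetic 77/23 split:

```python
import numpy as np


def iqr_outlier_mask(x: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


# Synthetic binary flag: 77% zeros, 23% ones.
flags = np.array([0] * 77 + [1] * 23)
mask = iqr_outlier_mask(flags)

# Q1 and Q3 are both 0, so the fences collapse to [0, 0]
# and every 1 is reported as an outlier.
print(f"flagged: {mask.sum()} of {flags.size}")
```

Under that assumption, the 23.0% outlier rate reflects the share of weekend first orders rather than anomalous rows.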
====================================================================== Column: firstorder_is_weekend_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.014 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 47.72 (Right-skewed) Kurtosis: 5125.28 (Heavy tails/outliers) Zeros: 1 (0.0%) Outliers (IQR): 6 (0.0%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (47.72) with non-positive values Priority: high
====================================================================== Column: days_since_created_sum_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 131.767 | Median: 16.000 | Std: 268.618 Range: [0.000, 1998.000] Percentiles: 1%=0.000, 25%=0.000, 75%=106.000, 99%=1238.000 📈 Shape Analysis: Skewness: 2.88 (Right-skewed) Kurtosis: 8.80 (Heavy tails/outliers) Zeros: 10,242 (33.3%) Outliers (IQR): 4,646 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (33.3%) combined with high skewness (2.88) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: days_since_created_mean_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 133.292 | Median: 16.000 | Std: 269.791 Range: [0.000, 1998.000] Percentiles: 1%=0.000, 25%=0.000, 75%=109.000, 99%=1238.000 📈 Shape Analysis: Skewness: 2.86 (Right-skewed) Kurtosis: 8.66 (Heavy tails/outliers) Zeros: 9,890 (32.5%) Outliers (IQR): 4,587 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (32.5%) combined with high skewness (2.86) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: days_since_created_max_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 133.292 | Median: 16.000 | Std: 269.791 Range: [0.000, 1998.000] Percentiles: 1%=0.000, 25%=0.000, 75%=109.000, 99%=1238.000 📈 Shape Analysis: Skewness: 2.86 (Right-skewed) Kurtosis: 8.66 (Heavy tails/outliers) Zeros: 9,890 (32.5%) Outliers (IQR): 4,587 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (32.5%) combined with high skewness (2.86) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: days_since_created_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.989 | Median: 1.000 | Std: 0.107 Range: [0.000, 2.000] Percentiles: 1%=0.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -9.12 (Left-skewed) Kurtosis: 82.22 (Heavy tails/outliers) Zeros: 352 (1.1%) Outliers (IQR): 353 (1.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-9.12) with non-positive values Priority: high
====================================================================== Column: days_until_created_sum_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -131.767 | Median: -16.000 | Std: 268.618 Range: [-1998.000, 0.000] Percentiles: 1%=-1238.000, 25%=-106.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: -2.88 (Left-skewed) Kurtosis: 8.80 (Heavy tails/outliers) Zeros: 10,242 (33.3%) Outliers (IQR): 4,646 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (33.3%) combined with high skewness (-2.88) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: days_until_created_mean_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -133.292 | Median: -16.000 | Std: 269.791 Range: [-1998.000, 0.000] Percentiles: 1%=-1238.000, 25%=-109.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: -2.86 (Left-skewed) Kurtosis: 8.66 (Heavy tails/outliers) Zeros: 9,890 (32.5%) Outliers (IQR): 4,587 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (32.5%) combined with high skewness (-2.86) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: days_until_created_max_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -133.292 | Median: -16.000 | Std: 269.791 Range: [-1998.000, -0.000] Percentiles: 1%=-1238.000, 25%=-109.000, 75%=-0.000, 99%=-0.000 📈 Shape Analysis: Skewness: -2.86 (Left-skewed) Kurtosis: 8.66 (Heavy tails/outliers) Zeros: 9,890 (32.5%) Outliers (IQR): 4,587 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (32.5%) combined with high skewness (-2.86) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
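The days_until_created_* statistics mirror the days_since_created_* statistics with the sign flipped, and the created_delta_hours_* ranges appear to be the same values multiplied by -24 (for example, -47952 = -24 × 1998). If that reading is right, these column families carry the same information. The sketch below shows one way such linear copies could be detected; `is_linear_copy` is a hypothetical helper and the arrays are illustrative values taken from the reported percentiles:

```python
import numpy as np


def is_linear_copy(a: np.ndarray, b: np.ndarray, tol: float = 1e-9) -> bool:
    """True when b is a (possibly negated or rescaled) linear copy of a."""
    r = np.corrcoef(a, b)[0, 1]
    return bool(abs(abs(r) - 1.0) < tol)


days_since = np.array([0.0, 0.0, 16.0, 106.0, 1998.0])
days_until = -days_since          # assumed relationship: sign flip
delta_hours = -24.0 * days_since  # assumed relationship: hours, sign flipped

print(is_linear_copy(days_since, days_until))
print(is_linear_copy(days_since, delta_hours))
```

Keeping one representative per family would avoid applying the same zero-inflation handling three times to equivalent features.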
====================================================================== Column: days_until_created_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.989 | Median: 1.000 | Std: 0.107 Range: [0.000, 2.000] Percentiles: 1%=0.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -9.12 (Left-skewed) Kurtosis: 82.22 (Heavy tails/outliers) Zeros: 352 (1.1%) Outliers (IQR): 353 (1.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-9.12) with non-positive values Priority: high
====================================================================== Column: log1p_days_since_created_sum_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 2.697 | Median: 2.833 | Std: 2.387 Range: [0.000, 7.600] Percentiles: 1%=0.000, 25%=0.000, 75%=4.673, 99%=7.122 📈 Shape Analysis: Skewness: 0.22 (Symmetric) Kurtosis: -1.33 (Light tails) Zeros: 10,242 (33.3%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (33.3%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: log1p_days_since_created_mean_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 2.728 | Median: 2.833 | Std: 2.383 Range: [0.000, 7.600] Percentiles: 1%=0.000, 25%=0.000, 75%=4.700, 99%=7.122 📈 Shape Analysis: Skewness: 0.21 (Symmetric) Kurtosis: -1.33 (Light tails) Zeros: 9,890 (32.5%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (32.5%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: log1p_days_since_created_max_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 2.728 | Median: 2.833 | Std: 2.383 Range: [0.000, 7.600] Percentiles: 1%=0.000, 25%=0.000, 75%=4.700, 99%=7.122 📈 Shape Analysis: Skewness: 0.21 (Symmetric) Kurtosis: -1.33 (Light tails) Zeros: 9,890 (32.5%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (32.5%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
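The log1p_days_since_created_* statistics can be cross-checked against the raw days_since_created_* columns: applying `numpy.log1p` to the reported raw median, 75th percentile, 99th percentile, and maximum reproduces the values shown above. The sketch uses only those reported summary numbers, not the underlying data:

```python
import numpy as np

# Reported raw statistics for days_since_created_mean_all_time:
# a zero, the median, 75th percentile, 99th percentile, and maximum.
raw = np.array([0.0, 16.0, 109.0, 1238.0, 1998.0])
logged = np.log1p(raw)  # log(1 + x), defined at x = 0

for r, l in zip(raw, logged):
    print(f"log1p({r:>6.0f}) = {l:.3f}")
```

Because log1p(0) = 0, the transform removes the skew (2.86 down to 0.21) but leaves the share of zeros unchanged, which is consistent with the remaining zero-inflation recommendation on these columns.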
====================================================================== Column: log1p_days_since_created_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.989 | Median: 1.000 | Std: 0.107 Range: [0.000, 2.000] Percentiles: 1%=0.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -9.12 (Left-skewed) Kurtosis: 82.22 (Heavy tails/outliers) Zeros: 352 (1.1%) Outliers (IQR): 353 (1.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-9.12) with non-positive values Priority: high
====================================================================== Column: is_missing_created_sum_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,770 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: is_missing_created_mean_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: is_missing_created_max_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: is_missing_created_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.014 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 47.72 (Right-skewed) Kurtosis: 5125.28 (Heavy tails/outliers) Zeros: 1 (0.0%) Outliers (IQR): 6 (0.0%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (47.72) with non-positive values Priority: high
====================================================================== Column: is_future_created_sum_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,770 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: is_future_created_mean_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,439 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: is_future_created_max_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,439 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: is_future_created_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.989 | Median: 1.000 | Std: 0.103 Range: [0.000, 2.000] Percentiles: 1%=0.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -9.41 (Left-skewed) Kurtosis: 87.73 (Heavy tails/outliers) Zeros: 331 (1.1%) Outliers (IQR): 332 (1.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-9.41) with non-positive values Priority: high
====================================================================== Column: days_since_firstorder_sum_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 90.489 | Median: 0.000 | Std: 223.082 Range: [0.000, 1985.000] Percentiles: 1%=0.000, 25%=0.000, 75%=46.000, 99%=1059.000 📈 Shape Analysis: Skewness: 3.28 (Right-skewed) Kurtosis: 11.40 (Heavy tails/outliers) Zeros: 18,999 (61.7%) Outliers (IQR): 5,209 (16.9%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (61.7%) combined with high skewness (3.28) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: days_since_firstorder_mean_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 90.560 | Median: 0.000 | Std: 223.155 Range: [0.000, 1985.000] Percentiles: 1%=0.000, 25%=0.000, 75%=46.000, 99%=1059.000 📈 Shape Analysis: Skewness: 3.28 (Right-skewed) Kurtosis: 11.39 (Heavy tails/outliers) Zeros: 18,975 (61.7%) Outliers (IQR): 5,209 (16.9%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (61.7%) combined with high skewness (3.28) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: days_since_firstorder_max_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 90.560 | Median: 0.000 | Std: 223.155 Range: [0.000, 1985.000] Percentiles: 1%=0.000, 25%=0.000, 75%=46.000, 99%=1059.000 📈 Shape Analysis: Skewness: 3.28 (Right-skewed) Kurtosis: 11.39 (Heavy tails/outliers) Zeros: 18,975 (61.7%) Outliers (IQR): 5,209 (16.9%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (61.7%) combined with high skewness (3.28) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: days_since_firstorder_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.999 | Median: 1.000 | Std: 0.028 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -32.23 (Left-skewed) Kurtosis: 1226.31 (Heavy tails/outliers) Zeros: 24 (0.1%) Outliers (IQR): 25 (0.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-32.23) with non-positive values Priority: high
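Yeo-Johnson is recommended here because, unlike a plain log, it is defined for zero and negative values. The sketch below applies the transform for a fixed `lmbda`; in practice the parameter is usually estimated from the data, and the function name and sample values are assumptions made for illustration. Note also that this column is 1 in almost every row (Std 0.028), so a power transform is unlikely to change its shape much.

```python
import numpy as np


def yeo_johnson(values, lmbda):
    """Apply the Yeo-Johnson power transform with a fixed lambda."""
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    pos = values >= 0

    # Branch for non-negative values.
    if abs(lmbda) > 1e-12:
        out[pos] = ((values[pos] + 1.0) ** lmbda - 1.0) / lmbda
    else:
        out[pos] = np.log1p(values[pos])

    # Branch for negative values.
    if abs(lmbda - 2.0) > 1e-12:
        out[~pos] = -(((-values[~pos] + 1.0) ** (2.0 - lmbda)) - 1.0) / (2.0 - lmbda)
    else:
        out[~pos] = -np.log1p(-values[~pos])
    return out


sample = np.array([-2.0, 0.0, 1.0, 2.0])
unchanged = yeo_johnson(sample, 1.0)      # lambda = 1 leaves the data as-is
logged = yeo_johnson(sample[1:], 0.0)     # lambda = 0 is log1p for values >= 0
```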
====================================================================== Column: days_until_firstorder_sum_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -90.489 | Median: 0.000 | Std: 223.082 Range: [-1985.000, 0.000] Percentiles: 1%=-1059.000, 25%=-46.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: -3.28 (Left-skewed) Kurtosis: 11.40 (Heavy tails/outliers) Zeros: 18,999 (61.7%) Outliers (IQR): 5,209 (16.9%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (61.7%) combined with high skewness (-3.28) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: days_until_firstorder_mean_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -90.560 | Median: 0.000 | Std: 223.155 Range: [-1985.000, 0.000] Percentiles: 1%=-1059.000, 25%=-46.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: -3.28 (Left-skewed) Kurtosis: 11.39 (Heavy tails/outliers) Zeros: 18,975 (61.7%) Outliers (IQR): 5,209 (16.9%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (61.7%) combined with high skewness (-3.28) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: days_until_firstorder_max_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -90.560 | Median: 0.000 | Std: 223.155 Range: [-1985.000, -0.000] Percentiles: 1%=-1059.000, 25%=-46.000, 75%=-0.000, 99%=-0.000 📈 Shape Analysis: Skewness: -3.28 (Left-skewed) Kurtosis: 11.39 (Heavy tails/outliers) Zeros: 18,975 (61.7%) Outliers (IQR): 5,209 (16.9%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (61.7%) combined with high skewness (-3.28) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
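The `days_until_firstorder_*` statistics mirror the `days_since_firstorder_*` statistics with the sign flipped (mean -90.489 against 90.489, range [-1985, 0] against [0, 1985], identical zero counts). That pattern suggests one set is simply the negation of the other, in which case it adds no information, and the suggested log transform would only be defined after flipping the sign back. A small check along these lines could confirm it; the function name and toy arrays are hypothetical.

```python
import numpy as np


def is_negated_copy(a, b):
    """True when b equals -a element by element (NaNs must line up)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return a.shape == b.shape and bool(np.allclose(a, -b, equal_nan=True))


# Toy stand-ins for a days_since column and its days_until counterpart.
days_since = np.array([0.0, 46.0, np.nan, 1985.0])
days_until = np.array([0.0, -46.0, np.nan, -1985.0])
mirrored = is_negated_copy(days_since, days_until)
```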
====================================================================== Column: days_until_firstorder_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.999 | Median: 1.000 | Std: 0.028 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -32.23 (Left-skewed) Kurtosis: 1226.31 (Heavy tails/outliers) Zeros: 24 (0.1%) Outliers (IQR): 25 (0.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-32.23) with non-positive values Priority: high
====================================================================== Column: log1p_days_since_firstorder_sum_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.742 | Median: 0.000 | Std: 2.392 Range: [0.000, 7.594] Percentiles: 1%=0.000, 25%=0.000, 75%=3.850, 99%=6.966 📈 Shape Analysis: Skewness: 0.89 (Right-skewed) Kurtosis: -0.81 (Light tails) Zeros: 18,999 (61.7%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (61.7%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: log1p_days_since_firstorder_mean_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.744 | Median: 0.000 | Std: 2.393 Range: [0.000, 7.594] Percentiles: 1%=0.000, 25%=0.000, 75%=3.850, 99%=6.966 📈 Shape Analysis: Skewness: 0.89 (Right-skewed) Kurtosis: -0.81 (Light tails) Zeros: 18,975 (61.7%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (61.7%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: log1p_days_since_firstorder_max_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.744 | Median: 0.000 | Std: 2.393 Range: [0.000, 7.594] Percentiles: 1%=0.000, 25%=0.000, 75%=3.850, 99%=6.966 📈 Shape Analysis: Skewness: 0.89 (Right-skewed) Kurtosis: -0.81 (Light tails) Zeros: 18,975 (61.7%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (61.7%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
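The `log1p_` columns above appear to be `log1p` applied to the raw `days_since_firstorder_*` values: the reported 75%, 99% and maximum line up with `log1p` of the raw 46, 1059 and 1985. Because `log1p(0) = 0`, the zero count is unchanged (18,999), which is why zero-inflation is still flagged after the transform. The arithmetic can be checked directly:

```python
import math

# Raw statistics reported for days_since_firstorder_sum_all_time.
raw = {"75%": 46.0, "99%": 1059.0, "max": 1985.0}

# log1p is monotonic, so it maps these order statistics onto the logged column.
logged = {name: round(math.log1p(value), 3) for name, value in raw.items()}
# logged == {"75%": 3.85, "99%": 6.966, "max": 7.594}

zero_stays_zero = math.log1p(0.0) == 0.0
```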
====================================================================== Column: log1p_days_since_firstorder_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.999 | Median: 1.000 | Std: 0.028 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -32.23 (Left-skewed) Kurtosis: 1226.31 (Heavy tails/outliers) Zeros: 24 (0.1%) Outliers (IQR): 25 (0.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-32.23) with non-positive values Priority: high
====================================================================== Column: is_missing_firstorder_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.014 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 47.72 (Right-skewed) Kurtosis: 5125.28 (Heavy tails/outliers) Zeros: 1 (0.0%) Outliers (IQR): 6 (0.0%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (47.72) with non-positive values Priority: high
====================================================================== Column: is_future_firstorder_sum_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,770 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: is_future_firstorder_mean_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,767 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: is_future_firstorder_max_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,767 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
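The three `is_future_firstorder` sum, mean and max columns are zero in every row (Std 0.000, Range [0.000, 0.000]). A column with no variance cannot gain any from a transformation, so one option is to exclude it rather than apply zero-inflation handling. The sketch below flags such columns; the function name and the dictionary of toy columns are assumptions for illustration.

```python
import numpy as np


def constant_columns(columns):
    """Return names of columns whose non-missing values are all identical.

    `columns` is a mapping of column name -> 1-D array-like.
    """
    flagged = []
    for name, values in columns.items():
        arr = np.asarray(values, dtype=float)
        arr = arr[~np.isnan(arr)]          # ignore missing values
        if arr.size == 0 or np.all(arr == arr[0]):
            flagged.append(name)
    return flagged


toy = {
    "is_future_sum": [0, 0, 0, 0],         # constant, like the columns above
    "days_since_sum": [0, 46, 0, 1985],    # varies
    "all_missing": [np.nan, np.nan],       # nothing to learn from
}
to_drop = constant_columns(toy)
```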
====================================================================== Column: is_future_firstorder_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.011 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -43.84 (Left-skewed) Kurtosis: 7690.25 (Heavy tails/outliers) Zeros: 3 (0.0%) Outliers (IQR): 4 (0.0%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-43.84) with non-positive values Priority: high
====================================================================== Column: lastorder_delta_hours_sum_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 3116.892 | Median: 384.000 | Std: 6813.770 Range: [-120672.000, 47952.000] Percentiles: 1%=-120.000, 25%=0.000, 75%=2544.000, 99%=29712.000 📈 Shape Analysis: Skewness: 0.66 (Right-skewed) Kurtosis: 37.73 (Heavy tails/outliers) Zeros: 9,908 (32.2%) Outliers (IQR): 4,657 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (32.2%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lastorder_delta_hours_mean_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 3118.716 | Median: 384.000 | Std: 6815.346 Range: [-120672.000, 47952.000] Percentiles: 1%=-120.000, 25%=0.000, 75%=2568.000, 99%=29712.000 📈 Shape Analysis: Skewness: 0.66 (Right-skewed) Kurtosis: 37.71 (Heavy tails/outliers) Zeros: 9,890 (32.2%) Outliers (IQR): 4,643 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (32.2%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lastorder_delta_hours_max_all_time Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 3118.716 | Median: 384.000 | Std: 6815.346 Range: [-120672.000, 47952.000] Percentiles: 1%=-120.000, 25%=0.000, 75%=2568.000, 99%=29712.000 📈 Shape Analysis: Skewness: 0.66 (Right-skewed) Kurtosis: 37.71 (Heavy tails/outliers) Zeros: 9,890 (32.2%) Outliers (IQR): 4,643 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (32.2%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lastorder_delta_hours_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.999 | Median: 1.000 | Std: 0.025 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -35.97 (Left-skewed) Kurtosis: 1615.14 (Heavy tails/outliers) Zeros: 18 (0.1%) Outliers (IQR): 19 (0.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-35.97) with non-positive values Priority: high
====================================================================== Column: lastorder_hour_sum_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,770 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lastorder_hour_mean_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,752 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lastorder_hour_max_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,752 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lastorder_hour_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.999 | Median: 1.000 | Std: 0.025 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -35.97 (Left-skewed) Kurtosis: 1615.14 (Heavy tails/outliers) Zeros: 18 (0.1%) Outliers (IQR): 19 (0.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-35.97) with non-positive values Priority: high
====================================================================== Column: lastorder_dow_sum_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 2.709 | Median: 2.000 | Std: 2.064 Range: [0.000, 6.000] Percentiles: 1%=0.000, 25%=1.000, 75%=4.000, 99%=6.000 📈 Shape Analysis: Skewness: 0.28 (Symmetric) Kurtosis: -1.20 (Light tails) Zeros: 5,615 (18.2%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.28) Priority: low
====================================================================== Column: lastorder_dow_mean_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 2.711 | Median: 2.000 | Std: 2.063 Range: [0.000, 6.000] Percentiles: 1%=0.000, 25%=1.000, 75%=4.000, 99%=6.000 📈 Shape Analysis: Skewness: 0.28 (Symmetric) Kurtosis: -1.20 (Light tails) Zeros: 5,597 (18.2%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.28) Priority: low
====================================================================== Column: lastorder_dow_max_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 2.711 | Median: 2.000 | Std: 2.063 Range: [0.000, 6.000] Percentiles: 1%=0.000, 25%=1.000, 75%=4.000, 99%=6.000 📈 Shape Analysis: Skewness: 0.28 (Symmetric) Kurtosis: -1.20 (Light tails) Zeros: 5,597 (18.2%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.28) Priority: low
====================================================================== Column: lastorder_dow_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.999 | Median: 1.000 | Std: 0.025 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: -35.97 (Left-skewed) Kurtosis: 1615.14 (Heavy tails/outliers) Zeros: 18 (0.1%) Outliers (IQR): 19 (0.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-35.97) with non-positive values Priority: high
====================================================================== Column: lastorder_is_weekend_count_all_time Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.014 Range: [0.000, 2.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 47.72 (Right-skewed) Kurtosis: 5125.28 (Heavy tails/outliers) Zeros: 1 (0.0%) Outliers (IQR): 6 (0.0%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (47.72) with non-positive values Priority: high
====================================================================== Column: days_since_first_event_x Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1557.130 | Median: 1578.000 | Std: 722.528 Range: [0.000, 3501.000] Percentiles: 1%=21.000, 25%=1493.000, 75%=1962.000, 99%=3016.640 📈 Shape Analysis: Skewness: -0.54 (Left-skewed) Kurtosis: 0.17 (Light tails) Zeros: 15 (0.0%) Outliers (IQR): 6,472 (21.0%) 🔧 Recommended Transformation: cap_outliers Reason: Significant outliers (21.0%) despite low skewness Priority: medium ⚠️ Consider investigating outlier causes before capping
====================================================================== Column: dow_sin Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.076 | Median: 0.000 | Std: 0.694 Range: [-0.975, 0.975] Percentiles: 1%=-0.975, 25%=-0.434, 75%=0.782, 99%=0.975 📈 Shape Analysis: Skewness: -0.17 (Symmetric) Kurtosis: -1.44 (Light tails) Zeros: 4,808 (15.6%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: -0.17) Priority: low
====================================================================== Column: dow_cos Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.032 | Median: -0.223 | Std: 0.715 Range: [-1.000, 1.000] Percentiles: 1%=-0.901, 25%=-0.901, 75%=0.623, 99%=1.000 📈 Shape Analysis: Skewness: -0.08 (Symmetric) Kurtosis: -1.52 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: -0.08) Priority: low
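Most of the values reported for `dow_sin` and `dow_cos` (for example 0.975, 0.782 and -0.434 for the sine, 0.623, -0.223 and -0.901 for the cosine) are consistent with a sine/cosine encoding of day-of-week with period 7. Assuming that is how the columns were built, the encoding can be sketched as below; the function name is hypothetical. One caveat: a period-7 encoding of integer days never produces a cosine below about -0.901, so the reported minimum of -1.000 for `dow_cos` may be worth a closer look.

```python
import numpy as np


def encode_day_of_week(dow):
    """Map day-of-week integers (0..6) onto the unit circle."""
    angle = 2.0 * np.pi * np.asarray(dow, dtype=float) / 7.0
    return np.sin(angle), np.cos(angle)


dow_sin, dow_cos = encode_day_of_week(np.arange(7))
```

The two columns only carry the cyclical information jointly, so they are usually kept or dropped as a pair.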
====================================================================== Column: cohort_year Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 2013.168 | Median: 2013.000 | Std: 1.952 Range: [2008.000, 2018.000] Percentiles: 1%=2009.000, 25%=2012.000, 75%=2013.000, 99%=2017.000 📈 Shape Analysis: Skewness: 0.59 (Right-skewed) Kurtosis: 0.20 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 7,722 (25.1%) 🔧 Recommended Transformation: cap_outliers Reason: Significant outliers (25.1%) despite low skewness Priority: medium ⚠️ Consider investigating outlier causes before capping
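For `cohort_year` the quartiles are 2012 and 2013, so the interquartile range is a single year. Under the conventional 1.5 × IQR rule (an assumption about how the outlier count above is computed) the fences fall at 2010.5 and 2014.5, which marks every cohort outside 2011 to 2014 as an outlier. The sketch below reproduces the fences and shows what capping would do; the function names are hypothetical.

```python
import numpy as np


def iqr_fences(q1, q3, k=1.5):
    """Lower and upper fences from the quartiles (k=1.5 is the usual rule)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cap_to_fences(values, lower, upper):
    """Clip values into [lower, upper]."""
    return np.clip(np.asarray(values, dtype=float), lower, upper)


lower, upper = iqr_fences(2012.0, 2013.0)
capped = cap_to_fences([2008, 2012, 2013, 2018], lower, upper)
```

Capping a year to 2010.5 has no natural interpretation, which supports the printed advice to investigate the outliers before capping.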
====================================================================== Column: lag0_esent_sum Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 28.140 | Median: 32.000 | Std: 16.750 Range: [0.000, 291.000] Percentiles: 1%=0.000, 25%=16.000, 75%=42.000, 99%=56.000 📈 Shape Analysis: Skewness: -0.05 (Symmetric) Kurtosis: 3.67 (Heavy tails/outliers) Zeros: 3,400 (11.1%) Outliers (IQR): 12 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: -0.05) Priority: low
====================================================================== Column: lag0_esent_mean Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 28.140 | Median: 32.000 | Std: 16.750 Range: [0.000, 291.000] Percentiles: 1%=0.000, 25%=16.000, 75%=42.000, 99%=56.000 📈 Shape Analysis: Skewness: -0.05 (Symmetric) Kurtosis: 3.67 (Heavy tails/outliers) Zeros: 3,400 (11.1%) Outliers (IQR): 12 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: -0.05) Priority: low
====================================================================== Column: lag0_esent_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.000 Range: [1.000, 1.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag0_esent_max Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 28.140 | Median: 32.000 | Std: 16.750 Range: [0.000, 291.000] Percentiles: 1%=0.000, 25%=16.000, 75%=42.000, 99%=56.000 📈 Shape Analysis: Skewness: -0.05 (Symmetric) Kurtosis: 3.67 (Heavy tails/outliers) Zeros: 3,400 (11.1%) Outliers (IQR): 12 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: -0.05) Priority: low
====================================================================== Column: lag0_eopenrate_sum Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 25.549 | Median: 13.158 | Std: 29.551 Range: [0.000, 100.000] Percentiles: 1%=0.000, 25%=2.083, 75%=40.000, 99%=100.000 📈 Shape Analysis: Skewness: 1.17 (Right-skewed) Kurtosis: 0.21 (Light tails) Zeros: 7,618 (24.8%) Outliers (IQR): 1,180 (3.8%) 🔧 Recommended Transformation: sqrt_transform Reason: Moderate skewness (1.17) Priority: medium
====================================================================== Column: lag0_eopenrate_mean Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 25.549 | Median: 13.158 | Std: 29.551 Range: [0.000, 100.000] Percentiles: 1%=0.000, 25%=2.083, 75%=40.000, 99%=100.000 📈 Shape Analysis: Skewness: 1.17 (Right-skewed) Kurtosis: 0.21 (Light tails) Zeros: 7,618 (24.8%) Outliers (IQR): 1,180 (3.8%) 🔧 Recommended Transformation: sqrt_transform Reason: Moderate skewness (1.17) Priority: medium
====================================================================== Column: lag0_eopenrate_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.000 Range: [1.000, 1.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag0_eopenrate_max Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 25.549 | Median: 13.158 | Std: 29.551 Range: [0.000, 100.000] Percentiles: 1%=0.000, 25%=2.083, 75%=40.000, 99%=100.000 📈 Shape Analysis: Skewness: 1.17 (Right-skewed) Kurtosis: 0.21 (Light tails) Zeros: 7,618 (24.8%) Outliers (IQR): 1,180 (3.8%) 🔧 Recommended Transformation: sqrt_transform Reason: Moderate skewness (1.17) Priority: medium
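A square-root transform is a milder correction than a log and is defined at zero, which suits a moderately right-skewed rate bounded at [0, 100] such as the open rate above. The sketch below applies it to a toy sample built from the reported percentiles and compares a simple moment-based skewness before and after; the sample and the helper name are assumptions, not the analyzer's own calculation.

```python
import numpy as np


def skewness(values):
    """Moment-based sample skewness (population form, no bias correction)."""
    arr = np.asarray(values, dtype=float)
    centered = arr - arr.mean()
    return float(np.mean(centered**3) / np.std(arr) ** 3)


# Toy sample: two zeros plus the 25%, median, 75% and maximum reported above.
open_rate = np.array([0.0, 0.0, 2.083, 13.158, 40.0, 100.0])
open_rate_sqrt = np.sqrt(open_rate)

skew_before = skewness(open_rate)
skew_after = skewness(open_rate_sqrt)
```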
====================================================================== Column: lag0_eclickrate_sum Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 5.671 | Median: 0.000 | Std: 10.552 Range: [0.000, 100.000] Percentiles: 1%=0.000, 25%=0.000, 75%=7.143, 99%=50.000 📈 Shape Analysis: Skewness: 3.89 (Right-skewed) Kurtosis: 22.91 (Heavy tails/outliers) Zeros: 15,453 (50.2%) Outliers (IQR): 2,787 (9.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (50.2%) combined with high skewness (3.89) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: lag0_eclickrate_mean Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 5.671 | Median: 0.000 | Std: 10.552 Range: [0.000, 100.000] Percentiles: 1%=0.000, 25%=0.000, 75%=7.143, 99%=50.000 📈 Shape Analysis: Skewness: 3.89 (Right-skewed) Kurtosis: 22.91 (Heavy tails/outliers) Zeros: 15,453 (50.2%) Outliers (IQR): 2,787 (9.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (50.2%) combined with high skewness (3.89) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: lag0_eclickrate_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.000 Range: [1.000, 1.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag0_eclickrate_max Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 5.671 | Median: 0.000 | Std: 10.552 Range: [0.000, 100.000] Percentiles: 1%=0.000, 25%=0.000, 75%=7.143, 99%=50.000 📈 Shape Analysis: Skewness: 3.89 (Right-skewed) Kurtosis: 22.91 (Heavy tails/outliers) Zeros: 15,453 (50.2%) Outliers (IQR): 2,787 (9.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (50.2%) combined with high skewness (3.89) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: lag0_avgorder_sum Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 61.845 | Median: 50.960 | Std: 40.946 Range: [0.000, 2600.140] Percentiles: 1%=13.138, 25%=40.020, 75%=74.260, 99%=200.326 📈 Shape Analysis: Skewness: 11.71 (Right-skewed) Kurtosis: 548.84 (Heavy tails/outliers) Zeros: 8 (0.0%) Outliers (IQR): 1,742 (5.7%) 🔧 Recommended Transformation: cap_then_log Reason: High skewness (11.71) with significant outliers (5.7%) Priority: high
====================================================================== Column: lag0_avgorder_mean Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 61.845 | Median: 50.960 | Std: 40.946 Range: [0.000, 2600.140] Percentiles: 1%=13.138, 25%=40.020, 75%=74.260, 99%=200.326 📈 Shape Analysis: Skewness: 11.71 (Right-skewed) Kurtosis: 548.84 (Heavy tails/outliers) Zeros: 8 (0.0%) Outliers (IQR): 1,742 (5.7%) 🔧 Recommended Transformation: cap_then_log Reason: High skewness (11.71) with significant outliers (5.7%) Priority: high
====================================================================== Column: lag0_avgorder_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.000 Range: [1.000, 1.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag0_avgorder_max Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 61.845 | Median: 50.960 | Std: 40.946 Range: [0.000, 2600.140] Percentiles: 1%=13.138, 25%=40.020, 75%=74.260, 99%=200.326 📈 Shape Analysis: Skewness: 11.71 (Right-skewed) Kurtosis: 548.84 (Heavy tails/outliers) Zeros: 8 (0.0%) Outliers (IQR): 1,742 (5.7%) 🔧 Recommended Transformation: cap_then_log Reason: High skewness (11.71) with significant outliers (5.7%) Priority: high
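`cap_then_log` combines two steps: cap the extreme values first, then apply a log so the remaining right tail is compressed. A minimal sketch follows. The cap is passed in explicitly and the example uses the reported 99th percentile (200.326) purely as an assumption; the threshold the pipeline actually applies is not shown in this output.

```python
import numpy as np


def cap_then_log(values, upper):
    """Cap values at `upper`, then apply log1p (assumes non-negative inputs)."""
    capped = np.minimum(np.asarray(values, dtype=float), upper)
    return np.log1p(capped)


# Toy values: minimum, median, 75%, 99% and maximum reported above.
avg_order = np.array([0.0, 50.96, 74.26, 200.326, 2600.14])
transformed = cap_then_log(avg_order, upper=200.326)
```

Capping before the log keeps a single extreme order value (2600.14 here) from stretching the transformed scale.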
====================================================================== Column: lag0_ordfreq_sum Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.038 | Median: 0.000 | Std: 0.104 Range: [0.000, 3.250] Percentiles: 1%=0.000, 25%=0.000, 75%=0.041, 99%=0.333 📈 Shape Analysis: Skewness: 10.47 (Right-skewed) Kurtosis: 179.54 (Heavy tails/outliers) Zeros: 18,991 (61.7%) Outliers (IQR): 3,727 (12.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (61.7%) combined with high skewness (10.47) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: lag0_ordfreq_mean Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.038 | Median: 0.000 | Std: 0.104 Range: [0.000, 3.250] Percentiles: 1%=0.000, 25%=0.000, 75%=0.041, 99%=0.333 📈 Shape Analysis: Skewness: 10.47 (Right-skewed) Kurtosis: 179.54 (Heavy tails/outliers) Zeros: 18,991 (61.7%) Outliers (IQR): 3,727 (12.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (61.7%) combined with high skewness (10.47) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: lag0_ordfreq_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.000 Range: [1.000, 1.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag0_ordfreq_max Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.038 | Median: 0.000 | Std: 0.104 Range: [0.000, 3.250] Percentiles: 1%=0.000, 25%=0.000, 75%=0.041, 99%=0.333 📈 Shape Analysis: Skewness: 10.47 (Right-skewed) Kurtosis: 179.54 (Heavy tails/outliers) Zeros: 18,991 (61.7%) Outliers (IQR): 3,727 (12.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (61.7%) combined with high skewness (10.47) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: lag0_paperless_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.000 Range: [1.000, 1.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag0_refill_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.000 Range: [1.000, 1.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag0_doorstep_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.000 Range: [1.000, 1.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
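The `lag0_*_count` columns above have a standard deviation of 0.000 and a range of [1.000, 1.000], so they are constant. Skewness and kurtosis are standardized by the variance and are not defined for a constant column, which means the reported 0.00 and the "approximately normal" reason should be read as placeholders rather than as evidence of normality. A small check like the one below can list such columns before shape analysis; `constant_columns` is a hypothetical helper and the frame is toy data.

```python
import pandas as pd


def constant_columns(df: pd.DataFrame) -> list[str]:
    """Columns with at most one distinct non-missing value (hypothetical helper)."""
    return [col for col in df.columns if df[col].nunique(dropna=True) <= 1]


# Toy frame reusing two column names from the output; values are illustrative only.
toy = pd.DataFrame(
    {
        "lag0_ordfreq_count": [1, 1, 1, 1],
        "lag0_ordfreq_sum": [0.0, 0.0, 0.041, 0.333],
    }
)
flagged = constant_columns(toy)
```

Constant columns carry no information for a model, so they are typical candidates for exclusion rather than transformation.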
====================================================================== Column: lag0_created_delta_hours_sum Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -3162.519 | Median: -384.000 | Std: 6446.912 Range: [-47952.000, 0.000] Percentiles: 1%=-29712.000, 25%=-2544.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: -2.88 (Left-skewed) Kurtosis: 8.80 (Heavy tails/outliers) Zeros: 10,241 (33.3%) Outliers (IQR): 4,646 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (33.3%) combined with high skewness (-2.88) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: lag0_created_delta_hours_mean Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -3199.012 | Median: -384.000 | Std: 6474.993 Range: [-47952.000, 0.000] Percentiles: 1%=-29712.000, 25%=-2616.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: -2.86 (Left-skewed) Kurtosis: 8.66 (Heavy tails/outliers) Zeros: 9,890 (32.5%) Outliers (IQR): 4,587 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (32.5%) combined with high skewness (-2.86) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
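For readers who want to recompute the shape statistics, the sketch below uses `scipy.stats.skew` and `scipy.stats.kurtosis`. It assumes excess (Fisher) kurtosis, where a normal distribution scores 0; the output does not say which convention the notebook uses. The ±0.5 cut-off for the shape label is likewise an assumption made for illustration, and `shape_summary` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def shape_summary(values, skew_cutoff: float = 0.5) -> dict:
    """Skewness, excess kurtosis, and a coarse shape label (cut-off is an assumption)."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]
    skewness = float(stats.skew(values))
    kurtosis = float(stats.kurtosis(values))  # Fisher definition: normal == 0
    if skewness > skew_cutoff:
        label = "Right-skewed"
    elif skewness < -skew_cutoff:
        label = "Left-skewed"
    else:
        label = "Symmetric"
    return {"skewness": skewness, "kurtosis": kurtosis, "label": label}


# Toy sample with a long left tail and a pile-up at zero; not real data.
summary = shape_summary([-100.0, -10.0, -1.0, 0.0, 0.0, 0.0, 0.0])
```

A negative skewness, as reported for the `lag0_created_delta_hours_*` columns, means the long tail lies on the negative side while most values sit near the upper bound (here, 0).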
====================================================================== Column: lag0_created_delta_hours_max Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -3199.012 | Median: -384.000 | Std: 6474.993 Range: [-47952.000, 0.000] Percentiles: 1%=-29712.000, 25%=-2616.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: -2.86 (Left-skewed) Kurtosis: 8.66 (Heavy tails/outliers) Zeros: 9,890 (32.5%) Outliers (IQR): 4,587 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (32.5%) combined with high skewness (-2.86) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
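One caveat for the `lag0_created_delta_hours_*` columns: their range is [-47952.000, 0.000], so a log transform cannot be applied to the raw values because the logarithm of a negative number is undefined. A common workaround is to reflect the column (negate it), apply `log1p`, and negate again so the original ordering is preserved. The sketch below illustrates that idea; it is not taken from the notebook's transformation code, and `reflected_log1p` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def reflected_log1p(values) -> pd.Series:
    """Signed log1p for a non-positive column: -log1p(-x) (hypothetical helper)."""
    values = pd.Series(values, dtype=float)
    if (values.dropna() > 0).any():
        raise ValueError("sketch assumes all values are <= 0")
    # Negate to get magnitudes >= 0, compress with log1p, then restore the sign
    return -np.log1p(-values)


# Toy values taken from the reported percentiles and range; not a real sample.
toy = [-47952.0, -2544.0, -384.0, 0.0]
compressed = reflected_log1p(toy)
```

The transform is monotone, so ranks are unchanged, while the extreme value of -47952 is pulled in to about -10.8. A zero indicator can still be added alongside it, as suggested in the warning above.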
====================================================================== Column: lag0_created_hour_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag0_created_hour_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,435 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag0_created_hour_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,435 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag1_esent_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_esent_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_esent_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag1_esent_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
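The `lag1_esent_*` blocks above show a pattern worth checking by hand: the `_count` column reports 30,769 zeros (100.0%), while `_sum`, `_mean`, and `_max` report "Zeros: 0 (0.0%)" and print no percentiles. That is consistent with the aggregates containing no non-missing values at all (no events fell in the lag window), although the output alone does not confirm it. Under that assumption, the sketch below separates the degenerate cases so they are not treated as zero-inflated or normal distributions; `degenerate_kind` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def degenerate_kind(values: pd.Series) -> str:
    """Classify a column as all_missing, all_zero, constant, or varying."""
    non_missing = values.dropna()
    if non_missing.empty:
        return "all_missing"
    if non_missing.nunique() == 1:
        return "all_zero" if non_missing.iloc[0] == 0 else "constant"
    return "varying"


# Toy columns mirroring the three patterns seen in the output; not real data.
kinds = {
    "lag1_esent_sum": degenerate_kind(pd.Series([np.nan, np.nan, np.nan])),
    "lag1_esent_count": degenerate_kind(pd.Series([0, 0, 0])),
    "lag0_ordfreq_count": degenerate_kind(pd.Series([1, 1, 1])),
    "lag0_ordfreq_sum": degenerate_kind(pd.Series([0.0, 0.041, 0.333])),
}
```

A column that is 100% zero or entirely missing has no mixture to model, so a `zero_inflation_handling` or `none` recommendation on it is better read as "nothing to transform; consider dropping or revisiting the lag window".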
====================================================================== Column: lag1_eopenrate_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_eopenrate_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_eopenrate_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag1_eopenrate_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_eclickrate_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_eclickrate_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_eclickrate_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag1_eclickrate_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_avgorder_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_avgorder_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_avgorder_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag1_avgorder_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_ordfreq_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_ordfreq_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_ordfreq_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag1_ordfreq_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_paperless_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_paperless_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_paperless_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag1_paperless_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_refill_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_refill_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_refill_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag1_refill_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_doorstep_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_doorstep_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_doorstep_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag1_doorstep_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_created_delta_hours_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_created_delta_hours_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_created_delta_hours_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag1_created_delta_hours_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_created_hour_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_created_hour_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag1_created_hour_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag1_created_hour_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_esent_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_esent_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_esent_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag2_esent_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_eopenrate_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_eopenrate_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_eopenrate_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag2_eopenrate_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_eclickrate_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_eclickrate_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_eclickrate_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag2_eclickrate_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_avgorder_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_avgorder_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_avgorder_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag2_avgorder_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_ordfreq_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_ordfreq_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_ordfreq_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag2_ordfreq_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_paperless_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_paperless_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_paperless_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag2_paperless_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_refill_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_refill_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_refill_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag2_refill_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_doorstep_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_doorstep_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_doorstep_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag2_doorstep_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_created_delta_hours_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_created_delta_hours_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_created_delta_hours_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag2_created_delta_hours_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_created_hour_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_created_hour_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag2_created_hour_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag2_created_hour_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_esent_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_esent_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_esent_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag3_esent_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_eopenrate_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_eopenrate_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_eopenrate_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag3_eopenrate_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_eclickrate_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_eclickrate_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_eclickrate_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag3_eclickrate_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_avgorder_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_avgorder_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_avgorder_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag3_avgorder_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_ordfreq_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_ordfreq_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_ordfreq_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag3_ordfreq_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_paperless_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_paperless_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_paperless_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag3_paperless_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_refill_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_refill_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_refill_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag3_refill_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_doorstep_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_doorstep_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_doorstep_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag3_doorstep_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_created_delta_hours_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_created_delta_hours_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_created_delta_hours_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag3_created_delta_hours_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_created_hour_sum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_created_hour_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: lag3_created_hour_count Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: lag3_created_hour_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: esent_velocity Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: esent_velocity_pct Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: eopenrate_velocity Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: eopenrate_velocity_pct Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: eclickrate_velocity Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: eclickrate_velocity_pct Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: avgorder_velocity Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: avgorder_velocity_pct Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: ordfreq_velocity Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: ordfreq_velocity_pct Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: paperless_velocity Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: paperless_velocity_pct Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: refill_velocity Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: refill_velocity_pct Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: doorstep_velocity Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: doorstep_velocity_pct Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: created_delta_hours_velocity Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: created_delta_hours_velocity_pct Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: created_hour_velocity Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: created_hour_velocity_pct Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: esent_acceleration Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: esent_momentum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: eopenrate_acceleration Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: eopenrate_momentum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: eclickrate_acceleration Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: eclickrate_momentum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: avgorder_acceleration Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: avgorder_momentum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: ordfreq_acceleration Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: ordfreq_momentum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: paperless_acceleration Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: paperless_momentum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: refill_acceleration Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: refill_momentum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: doorstep_acceleration Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: doorstep_momentum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: created_delta_hours_acceleration Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: created_delta_hours_momentum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: created_hour_acceleration Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: created_hour_momentum Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
======================================================================
Column: esent_beginning
Type: numeric_discrete (Confidence: 70%)
----------------------------------------------------------------------
📊 Distribution Statistics:
   Mean: 27.400 | Median: 30.000 | Std: 13.903
   Range: [11.000, 45.000]
   Percentiles: 1%=11.200, 25%=16.000, 75%=35.000, 99%=44.600
📈 Shape Analysis:
   Skewness: 0.00 (Symmetric)
   Kurtosis: -1.69 (Light tails)
   Zeros: 0 (0.0%)
   Outliers (IQR): 0 (0.0%)
🔧 Recommended Transformation: none
   Reason: Distribution is approximately normal (skewness: 0.00)
   Priority: low
======================================================================
Column: esent_end
Type: numeric_discrete (Confidence: 70%)
----------------------------------------------------------------------
📊 Distribution Statistics:
   Mean: 20.400 | Median: 22.000 | Std: 17.286
   Range: [0.000, 40.000]
   Percentiles: 1%=0.240, 25%=6.000, 75%=34.000, 99%=39.760
📈 Shape Analysis:
   Skewness: -0.12 (Symmetric)
   Kurtosis: -2.43 (Light tails)
   Zeros: 1 (20.0%)
   Outliers (IQR): 0 (0.0%)
🔧 Recommended Transformation: none
   Reason: Distribution is approximately normal (skewness: -0.12)
   Priority: low
======================================================================
Column: esent_trend_ratio
Type: numeric_discrete (Confidence: 70%)
----------------------------------------------------------------------
📊 Distribution Statistics:
   Mean: 0.695 | Median: 0.756 | Std: 0.590
   Range: [0.000, 1.375]
   Percentiles: 1%=0.008, 25%=0.200, 75%=1.143, 99%=1.366
📈 Shape Analysis:
   Skewness: -0.10 (Symmetric)
   Kurtosis: -2.37 (Light tails)
   Zeros: 1 (20.0%)
   Outliers (IQR): 0 (0.0%)
🔧 Recommended Transformation: none
   Reason: Distribution is approximately normal (skewness: -0.10)
   Priority: low
======================================================================
Column: eopenrate_beginning
Type: numeric_discrete (Confidence: 70%)
----------------------------------------------------------------------
📊 Distribution Statistics:
   Mean: 36.169 | Median: 27.273 | Std: 37.564
   Range: [0.000, 100.000]
   Percentiles: 1%=1.000, 25%=25.000, 75%=28.571, 99%=97.143
📈 Shape Analysis:
   Skewness: 1.65 (Right-skewed)
   Kurtosis: 3.51 (Heavy tails/outliers)
   Zeros: 1 (20.0%)
   Outliers (IQR): 2 (40.0%)
🔧 Recommended Transformation: sqrt_transform
   Reason: Moderate skewness (1.65)
   Priority: medium
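Note: the `Outliers (IQR)` count above can be checked by hand. The sketch below assumes the conventional 1.5 × IQR fences, which may differ from the rule the analyzer applies. The five input values are not taken from the dataset; they are inferred from the printed range, median, and quartiles (five observations at 0%, 25%, 50%, 75%, and 100%), so treat them as a reconstruction. The helper name `iqr_outliers` is hypothetical.

```python
import numpy as np


def iqr_outliers(values, k: float = 1.5) -> np.ndarray:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    arr = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(arr, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return arr[(arr < lower) | (arr > upper)]


# Reconstructed from the printed statistics (an assumption, not raw data).
eopenrate_beginning = [0.0, 25.0, 27.273, 28.571, 100.0]

flagged = iqr_outliers(eopenrate_beginning)
share = len(flagged) / len(eopenrate_beginning)
print(f"Outliers (IQR): {len(flagged)} ({share:.1%})")
```

With only five observations and a very narrow interquartile range, both extremes are flagged, so a 40% outlier share here says more about the sample size than about the tails.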
====================================================================== Column: eopenrate_end Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 14.833 | Median: 7.500 | Std: 20.820 Range: [0.000, 50.000] Percentiles: 1%=0.000, 25%=0.000, 75%=16.667, 99%=48.667 📈 Shape Analysis: Skewness: 1.69 (Right-skewed) Kurtosis: 2.84 (Light tails) Zeros: 2 (40.0%) Outliers (IQR): 1 (20.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (40.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: eopenrate_trend_ratio Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.607 | Median: 0.215 | Std: 0.935 Range: [0.000, 2.000] Percentiles: 1%=0.005, 25%=0.125, 75%=0.697, 99%=1.948 📈 Shape Analysis: Skewness: 1.92 (Right-skewed) Kurtosis: 3.74 (Heavy tails/outliers) Zeros: 1 (25.0%) Outliers (IQR): 1 (25.0%) 🔧 Recommended Transformation: sqrt_transform Reason: Moderate skewness (1.92) Priority: medium
====================================================================== Column: eclickrate_beginning Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 2.476 | Median: 0.000 | Std: 3.407 Range: [0.000, 6.667] Percentiles: 1%=0.000, 25%=0.000, 75%=5.714, 99%=6.629 📈 Shape Analysis: Skewness: 0.65 (Right-skewed) Kurtosis: -3.07 (Light tails) Zeros: 3 (60.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (60.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: eclickrate_trend_ratio Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: nan (Symmetric) Kurtosis: nan (Light tails) Zeros: 2 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: avgorder_beginning Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 48.006 | Median: 51.480 | Std: 13.420 Range: [29.000, 61.610] Percentiles: 1%=29.441, 25%=40.020, 75%=57.920, 99%=61.462 📈 Shape Analysis: Skewness: -0.66 (Left-skewed) Kurtosis: -1.12 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: -0.66) Priority: low
====================================================================== Column: avgorder_end Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 48.128 | Median: 51.480 | Std: 13.577 Range: [29.000, 62.220] Percentiles: 1%=29.441, 25%=40.020, 75%=57.920, 99%=62.048 📈 Shape Analysis: Skewness: -0.63 (Left-skewed) Kurtosis: -1.12 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: -0.63) Priority: low
====================================================================== Column: ordfreq_trend_ratio Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: nan Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: nan (Symmetric) Kurtosis: nan (Light tails) Zeros: 1 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: paperless_trend_ratio Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.000 Range: [1.000, 1.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: nan (Symmetric) Kurtosis: nan (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: nan) Priority: low
====================================================================== Column: refill_beginning Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 5 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: refill_end Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 5 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: refill_trend_ratio Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: doorstep_beginning Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 5 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: doorstep_end Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 5 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: doorstep_trend_ratio Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: created_delta_hours_beginning Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 5 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: created_delta_hours_end Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 5 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: created_delta_hours_trend_ratio Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: created_hour_beginning Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 5 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: created_hour_end Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 5 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: created_hour_trend_ratio Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: days_since_last_event Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: recency_ratio Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: event_frequency Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.041 | Median: 0.041 | Std: 0.000 Range: [0.041, 0.041] Percentiles: 1%=0.041, 25%=0.041, 75%=0.041, 99%=0.041 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: inter_event_gap_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1462.000 | Median: 1462.000 | Std: 0.000 Range: [1462.000, 1462.000] Percentiles: 1%=1462.000, 25%=1462.000, 75%=1462.000, 99%=1462.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: inter_event_gap_std Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 5 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: inter_event_gap_max Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1462.000 | Median: 1462.000 | Std: 0.000 Range: [1462.000, 1462.000] Percentiles: 1%=1462.000, 25%=1462.000, 75%=1462.000, 99%=1462.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: regularity_score Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.000 | Std: 0.000 Range: [1.000, 1.000] Percentiles: 1%=1.000, 25%=1.000, 75%=1.000, 99%=1.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
====================================================================== Column: esent_vs_cohort_mean Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 3.860 | Std: 16.750 Range: [-28.140, 262.860] Percentiles: 1%=-28.140, 25%=-12.140, 75%=13.860, 99%=27.860 📈 Shape Analysis: Skewness: -0.05 (Symmetric) Kurtosis: 3.67 (Heavy tails/outliers) Zeros: 0 (0.0%) Outliers (IQR): 12 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: -0.05) Priority: low
====================================================================== Column: esent_vs_cohort_pct Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 1.137 | Std: 0.595 Range: [0.000, 10.341] Percentiles: 1%=0.000, 25%=0.569, 75%=1.493, 99%=1.990 📈 Shape Analysis: Skewness: -0.05 (Symmetric) Kurtosis: 3.67 (Heavy tails/outliers) Zeros: 3,400 (11.1%) Outliers (IQR): 12 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: -0.05) Priority: low
====================================================================== Column: esent_cohort_zscore Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.230 | Std: 1.000 Range: [-1.680, 15.693] Percentiles: 1%=-1.680, 25%=-0.725, 75%=0.827, 99%=1.663 📈 Shape Analysis: Skewness: -0.05 (Symmetric) Kurtosis: 3.67 (Heavy tails/outliers) Zeros: 0 (0.0%) Outliers (IQR): 12 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: -0.05) Priority: low
====================================================================== Column: eopenrate_vs_cohort_mean Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -0.000 | Median: -12.391 | Std: 29.551 Range: [-25.549, 74.451] Percentiles: 1%=-25.549, 25%=-23.466, 75%=14.451, 99%=74.451 📈 Shape Analysis: Skewness: 1.17 (Right-skewed) Kurtosis: 0.21 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 1,180 (3.8%) 🔧 Recommended Transformation: yeo_johnson Reason: Moderate skewness (1.17) with negative values Priority: medium
====================================================================== Column: eopenrate_vs_cohort_pct Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 0.515 | Std: 1.157 Range: [0.000, 3.914] Percentiles: 1%=0.000, 25%=0.082, 75%=1.566, 99%=3.914 📈 Shape Analysis: Skewness: 1.17 (Right-skewed) Kurtosis: 0.21 (Light tails) Zeros: 7,618 (24.8%) Outliers (IQR): 1,180 (3.8%) 🔧 Recommended Transformation: sqrt_transform Reason: Moderate skewness (1.17) Priority: medium
====================================================================== Column: eopenrate_cohort_zscore Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -0.000 | Median: -0.419 | Std: 1.000 Range: [-0.865, 2.519] Percentiles: 1%=-0.865, 25%=-0.794, 75%=0.489, 99%=2.519 📈 Shape Analysis: Skewness: 1.17 (Right-skewed) Kurtosis: 0.21 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 1,180 (3.8%) 🔧 Recommended Transformation: yeo_johnson Reason: Moderate skewness (1.17) with negative values Priority: medium
====================================================================== Column: eclickrate_vs_cohort_mean Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -0.000 | Median: -5.671 | Std: 10.552 Range: [-5.671, 94.329] Percentiles: 1%=-5.671, 25%=-5.671, 75%=1.472, 99%=44.329 📈 Shape Analysis: Skewness: 3.89 (Right-skewed) Kurtosis: 22.91 (Heavy tails/outliers) Zeros: 0 (0.0%) Outliers (IQR): 2,787 (9.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (3.89) with negative values present Priority: high ⚠️ Yeo-Johnson handles negative values unlike log/sqrt
====================================================================== Column: eclickrate_vs_cohort_pct Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 0.000 | Std: 1.861 Range: [0.000, 17.633] Percentiles: 1%=0.000, 25%=0.000, 75%=1.259, 99%=8.816 📈 Shape Analysis: Skewness: 3.89 (Right-skewed) Kurtosis: 22.91 (Heavy tails/outliers) Zeros: 15,453 (50.2%) Outliers (IQR): 2,787 (9.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (50.2%) combined with high skewness (3.89) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: eclickrate_cohort_zscore Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -0.000 | Median: -0.537 | Std: 1.000 Range: [-0.537, 8.940] Percentiles: 1%=-0.537, 25%=-0.537, 75%=0.139, 99%=4.201 📈 Shape Analysis: Skewness: 3.89 (Right-skewed) Kurtosis: 22.91 (Heavy tails/outliers) Zeros: 0 (0.0%) Outliers (IQR): 2,787 (9.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (3.89) with negative values present Priority: high ⚠️ Yeo-Johnson handles negative values unlike log/sqrt
====================================================================== Column: avgorder_vs_cohort_mean Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -0.000 | Median: -10.885 | Std: 40.946 Range: [-61.845, 2538.295] Percentiles: 1%=-48.707, 25%=-21.825, 75%=12.415, 99%=138.481 📈 Shape Analysis: Skewness: 11.71 (Right-skewed) Kurtosis: 548.84 (Heavy tails/outliers) Zeros: 0 (0.0%) Outliers (IQR): 1,742 (5.7%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (11.71) with negative values present Priority: high ⚠️ Yeo-Johnson handles negative values unlike log/sqrt
====================================================================== Column: avgorder_vs_cohort_pct Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 0.824 | Std: 0.662 Range: [0.000, 42.043] Percentiles: 1%=0.212, 25%=0.647, 75%=1.201, 99%=3.239 📈 Shape Analysis: Skewness: 11.71 (Right-skewed) Kurtosis: 548.84 (Heavy tails/outliers) Zeros: 8 (0.0%) Outliers (IQR): 1,742 (5.7%) 🔧 Recommended Transformation: cap_then_log Reason: High skewness (11.71) with significant outliers (5.7%) Priority: high
====================================================================== Column: avgorder_cohort_zscore Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -0.000 | Median: -0.266 | Std: 1.000 Range: [-1.510, 61.991] Percentiles: 1%=-1.190, 25%=-0.533, 75%=0.303, 99%=3.382 📈 Shape Analysis: Skewness: 11.71 (Right-skewed) Kurtosis: 548.84 (Heavy tails/outliers) Zeros: 0 (0.0%) Outliers (IQR): 1,742 (5.7%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (11.71) with negative values present Priority: high ⚠️ Yeo-Johnson handles negative values unlike log/sqrt
====================================================================== Column: ordfreq_vs_cohort_mean Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -0.000 | Median: -0.038 | Std: 0.104 Range: [-0.038, 3.212] Percentiles: 1%=-0.038, 25%=-0.038, 75%=0.003, 99%=0.296 📈 Shape Analysis: Skewness: 10.47 (Right-skewed) Kurtosis: 179.54 (Heavy tails/outliers) Zeros: 0 (0.0%) Outliers (IQR): 3,727 (12.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (10.47) with negative values present Priority: high ⚠️ Yeo-Johnson handles negative values unlike log/sqrt
====================================================================== Column: ordfreq_vs_cohort_pct Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 0.000 | Std: 2.754 Range: [0.000, 86.070] Percentiles: 1%=0.000, 25%=0.000, 75%=1.081, 99%=8.828 📈 Shape Analysis: Skewness: 10.47 (Right-skewed) Kurtosis: 179.54 (Heavy tails/outliers) Zeros: 18,991 (61.7%) Outliers (IQR): 3,727 (12.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (61.7%) combined with high skewness (10.47) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: ordfreq_cohort_zscore Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: -0.000 | Median: -0.363 | Std: 1.000 Range: [-0.363, 30.893] Percentiles: 1%=-0.363, 25%=-0.363, 75%=0.029, 99%=2.843 📈 Shape Analysis: Skewness: 10.47 (Right-skewed) Kurtosis: 179.54 (Heavy tails/outliers) Zeros: 0 (0.0%) Outliers (IQR): 3,727 (12.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (10.47) with negative values present Priority: high ⚠️ Yeo-Johnson handles negative values unlike log/sqrt
====================================================================== Column: created_delta_hours_vs_cohort_mean Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 2778.519 | Std: 6446.912 Range: [-44789.481, 3162.519] Percentiles: 1%=-26549.481, 25%=618.519, 75%=3162.519, 99%=3162.519 📈 Shape Analysis: Skewness: -2.88 (Left-skewed) Kurtosis: 8.80 (Heavy tails/outliers) Zeros: 0 (0.0%) Outliers (IQR): 4,646 (15.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-2.88) with negative values present Priority: high ⚠️ Yeo-Johnson handles negative values unlike log/sqrt
====================================================================== Column: created_delta_hours_vs_cohort_pct Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 1.000 | Median: 0.121 | Std: 2.039 Range: [-0.000, 15.163] Percentiles: 1%=-0.000, 25%=0.000, 75%=0.804, 99%=9.395 📈 Shape Analysis: Skewness: 2.88 (Right-skewed) Kurtosis: 8.80 (Heavy tails/outliers) Zeros: 10,241 (33.3%) Outliers (IQR): 4,646 (15.1%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Zero-inflation (33.3%) combined with high skewness (2.88) Priority: high ⚠️ Consider creating a binary indicator for zeros plus log transform of non-zero values
====================================================================== Column: created_delta_hours_cohort_zscore Type: numeric_continuous (Confidence: 90%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.431 | Std: 1.000 Range: [-6.947, 0.491] Percentiles: 1%=-4.118, 25%=0.096, 75%=0.491, 99%=0.491 📈 Shape Analysis: Skewness: -2.88 (Left-skewed) Kurtosis: 8.80 (Heavy tails/outliers) Zeros: 0 (0.0%) Outliers (IQR): 4,646 (15.1%) 🔧 Recommended Transformation: yeo_johnson Reason: High skewness (-2.88) with negative values present Priority: high ⚠️ Yeo-Johnson handles negative values unlike log/sqrt
====================================================================== Column: created_hour_vs_cohort_mean Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] Percentiles: 1%=0.000, 25%=0.000, 75%=0.000, 99%=0.000 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 30,769 (100.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: zero_inflation_handling Reason: Significant zero-inflation (100.0%) Priority: medium ⚠️ Many zero values may indicate a mixture distribution
====================================================================== Column: created_hour_vs_cohort_pct Type: numeric_discrete (Confidence: 70%) ---------------------------------------------------------------------- 📊 Distribution Statistics: Mean: 0.000 | Median: 0.000 | Std: 0.000 Range: [0.000, 0.000] 📈 Shape Analysis: Skewness: 0.00 (Symmetric) Kurtosis: 0.00 (Light tails) Zeros: 0 (0.0%) Outliers (IQR): 0 (0.0%) 🔧 Recommended Transformation: none Reason: Distribution is approximately normal (skewness: 0.00) Priority: low
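Several columns above are flagged with `zero_inflation_handling`, with the hint to create a binary indicator for zeros plus a log transform of the non-zero values. The following is a minimal sketch of that idea, not the pipeline's own implementation. The helper name `split_zero_inflated` and the sample values are hypothetical, and `np.log1p` is used here in place of a plain log so that zeros map to 0 without special handling. It assumes the column is non-negative.

```python
import numpy as np
import pandas as pd


def split_zero_inflated(series: pd.Series, name: str) -> pd.DataFrame:
    """Hypothetical helper: zero indicator plus log1p of the values.

    Assumes non-negative input. log1p(0) == 0, so zero rows stay at 0
    in the transformed column and are identified by the indicator.
    """
    values = series.astype(float)
    is_zero = values.eq(0)
    return pd.DataFrame(
        {
            f"{name}_is_zero": is_zero.astype(int),
            f"{name}_log1p": np.log1p(values),
        }
    )


# Illustrative values only, not the real column
demo = pd.Series([0.0, 0.0, 1.259, 8.816, 17.633])
features = split_zero_inflated(demo, "eclickrate_vs_cohort_pct")
print(features)
```

Keeping the indicator as its own feature lets a model treat "no activity at all" separately from "low activity", which is the mixture the warning refers to.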
Show/Hide Code
# Numerical Feature Statistics Table
if numeric_cols:
    stats_data = []
    for col_name in numeric_cols:
        series = df[col_name].dropna()
        if len(series) > 0:
            stats_data.append({
                "feature": col_name,
                "count": len(series),
                "mean": series.mean(),
                "std": series.std(),
                "min": series.min(),
                "25%": series.quantile(0.25),
                "50%": series.quantile(0.50),
                "75%": series.quantile(0.75),
                "95%": series.quantile(0.95),
                "99%": series.quantile(0.99),
                "max": series.max(),
                "skewness": stats.skew(series),
                "kurtosis": stats.kurtosis(series)
            })
    stats_df = pd.DataFrame(stats_data)
    # Format for display
    display_stats = stats_df.copy()
    for col in ["mean", "std", "min", "25%", "50%", "75%", "95%", "99%", "max"]:
        display_stats[col] = display_stats[col].apply(lambda x: f"{x:.3f}")
    display_stats["skewness"] = display_stats["skewness"].apply(lambda x: f"{x:.3f}")
    display_stats["kurtosis"] = display_stats["kurtosis"].apply(lambda x: f"{x:.3f}")
    print("=" * 80)
    print("NUMERICAL FEATURE STATISTICS")
    print("=" * 80)
    display(display_stats)
================================================================================ NUMERICAL FEATURE STATISTICS ================================================================================
| | feature | count | mean | std | min | 25% | 50% | 75% | 95% | 99% | max | skewness | kurtosis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | event_count_all_time | 30770 | 1.000 | 0.014 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 2.000 | 47.720 | 5124.445 |
| 1 | esent_sum_all_time | 30770 | 28.144 | 16.754 | 0.000 | 16.000 | 32.000 | 42.000 | 48.000 | 56.000 | 291.000 | -0.052 | 3.669 |
| 2 | esent_mean_all_time | 30769 | 28.141 | 16.750 | 0.000 | 16.000 | 32.000 | 42.000 | 48.000 | 56.000 | 291.000 | -0.053 | 3.672 |
| 3 | esent_max_all_time | 30769 | 28.142 | 16.750 | 0.000 | 16.000 | 32.000 | 42.000 | 48.000 | 56.000 | 291.000 | -0.053 | 3.672 |
| 4 | esent_count_all_time | 30770 | 1.000 | 0.014 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 2.000 | 47.720 | 5124.445 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 206 | ordfreq_cohort_zscore | 30769 | -0.000 | 1.000 | -0.363 | -0.363 | -0.363 | 0.029 | 1.295 | 2.843 | 30.893 | 10.465 | 179.508 |
| 207 | created_delta_hours_vs_cohort_mean | 30769 | 0.000 | 6446.912 | -44789.481 | 618.519 | 2778.519 | 3162.519 | 3162.519 | 3162.519 | 3162.519 | -2.877 | 8.797 |
| 208 | created_delta_hours_vs_cohort_pct | 30769 | 1.000 | 2.039 | -0.000 | 0.000 | 0.121 | 0.804 | 5.962 | 9.395 | 15.163 | 2.877 | 8.797 |
| 209 | created_delta_hours_cohort_zscore | 30769 | 0.000 | 1.000 | -6.947 | 0.096 | 0.431 | 0.491 | 0.491 | 0.491 | 0.491 | -2.877 | 8.797 |
| 210 | created_hour_vs_cohort_mean | 30769 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | nan | nan |
211 rows × 13 columns
4.5 Distribution Summary & Transformation Plan¶
This table summarizes all numeric columns with their recommended transformations.
Show/Hide Code
# Build transformation summary table
summary_data = []
for col_name in numeric_cols:
    analysis = analyses.get(col_name)
    rec = recommendations.get(col_name)
    if analysis and rec:
        summary_data.append({
            "Column": col_name,
            "Skewness": f"{analysis.skewness:.2f}",
            "Kurtosis": f"{analysis.kurtosis:.2f}",
            "Zeros %": f"{analysis.zero_percentage:.1f}%",
            "Outliers %": f"{analysis.outlier_percentage:.1f}%",
            "Transform": rec.recommended_transform.value,
            "Priority": rec.priority
        })
        # Add Gold transformation recommendation if not "none"
        if rec.recommended_transform != TransformationType.NONE and registry.gold:
            registry.add_gold_transformation(
                column=col_name,
                transform=rec.recommended_transform.value,
                parameters=rec.parameters,
                rationale=rec.reason,
                source_notebook="04_column_deep_dive"
            )
if summary_data:
    summary_df = pd.DataFrame(summary_data)
    display_table(summary_df)
    # Show how many transformation recommendations were added
    transform_count = sum(1 for r in recommendations.values() if r and r.recommended_transform != TransformationType.NONE)
    if transform_count > 0 and registry.gold:
        print(f"\n✅ Added {transform_count} transformation recommendations to Gold layer")
else:
    console.info("No numeric columns to summarize")
| Column | Skewness | Kurtosis | Zeros % | Outliers % | Transform | Priority |
|---|---|---|---|---|---|---|
| event_count_all_time | 47.72 | 5125.28 | 0.0% | 0.0% | yeo_johnson | high |
| esent_sum_all_time | -0.05 | 3.67 | 11.0% | 0.0% | none | low |
| esent_mean_all_time | -0.05 | 3.67 | 11.0% | 0.0% | none | low |
| esent_max_all_time | -0.05 | 3.67 | 11.0% | 0.0% | none | low |
| esent_count_all_time | 47.72 | 5125.28 | 0.0% | 0.0% | yeo_johnson | high |
| eopenrate_sum_all_time | 1.17 | 0.21 | 24.8% | 3.8% | sqrt_transform | medium |
| eopenrate_mean_all_time | 1.17 | 0.21 | 24.8% | 3.8% | sqrt_transform | medium |
| eopenrate_max_all_time | 1.17 | 0.21 | 24.8% | 3.8% | sqrt_transform | medium |
| eopenrate_count_all_time | 47.72 | 5125.28 | 0.0% | 0.0% | yeo_johnson | high |
| eclickrate_sum_all_time | 3.89 | 22.91 | 50.2% | 9.1% | zero_inflation_handling | high |
| eclickrate_mean_all_time | 3.89 | 22.91 | 50.2% | 9.1% | zero_inflation_handling | high |
| eclickrate_max_all_time | 3.89 | 22.91 | 50.2% | 9.1% | zero_inflation_handling | high |
| eclickrate_count_all_time | 47.72 | 5125.28 | 0.0% | 0.0% | yeo_johnson | high |
| avgorder_sum_all_time | 11.70 | 548.62 | 0.0% | 5.7% | cap_then_log | high |
| avgorder_mean_all_time | 11.71 | 548.84 | 0.0% | 5.7% | cap_then_log | high |
| avgorder_max_all_time | 11.71 | 548.84 | 0.0% | 5.7% | cap_then_log | high |
| avgorder_count_all_time | 47.72 | 5125.28 | 0.0% | 0.0% | yeo_johnson | high |
| ordfreq_sum_all_time | 10.47 | 179.54 | 61.7% | 12.1% | zero_inflation_handling | high |
| ordfreq_mean_all_time | 10.47 | 179.54 | 61.7% | 12.1% | zero_inflation_handling | high |
| ordfreq_max_all_time | 10.47 | 179.54 | 61.7% | 12.1% | zero_inflation_handling | high |
| ordfreq_count_all_time | 47.72 | 5125.28 | 0.0% | 0.0% | yeo_johnson | high |
| paperless_sum_all_time | -0.62 | -1.61 | 35.1% | 0.0% | zero_inflation_handling | medium |
| paperless_count_all_time | 47.72 | 5125.28 | 0.0% | 0.0% | yeo_johnson | high |
| refill_count_all_time | 47.72 | 5125.28 | 0.0% | 0.0% | yeo_johnson | high |
| doorstep_count_all_time | 47.72 | 5125.28 | 0.0% | 0.0% | yeo_johnson | high |
| created_delta_hours_sum_all_time | -2.88 | 8.80 | 33.3% | 15.1% | zero_inflation_handling | high |
| created_delta_hours_mean_all_time | -2.86 | 8.66 | 32.5% | 15.1% | zero_inflation_handling | high |
| created_delta_hours_max_all_time | -2.86 | 8.66 | 32.5% | 15.1% | zero_inflation_handling | high |
| created_delta_hours_count_all_time | -9.12 | 82.22 | 1.1% | 1.1% | yeo_johnson | high |
| created_hour_sum_all_time | 0.00 | 0.00 | 100.0% | 0.0% | zero_inflation_handling | medium |
| created_hour_mean_all_time | 0.00 | 0.00 | 100.0% | 0.0% | zero_inflation_handling | medium |
| created_hour_max_all_time | 0.00 | 0.00 | 100.0% | 0.0% | zero_inflation_handling | medium |
| created_hour_count_all_time | -9.41 | 87.73 | 1.1% | 1.1% | yeo_johnson | high |
| created_dow_sum_all_time | 0.17 | -1.21 | 16.5% | 0.0% | none | low |
| created_dow_mean_all_time | 0.16 | -1.20 | 15.6% | 0.0% | none | low |
| created_dow_max_all_time | 0.16 | -1.20 | 15.6% | 0.0% | none | low |
| created_dow_count_all_time | -9.41 | 87.73 | 1.1% | 1.1% | yeo_johnson | high |
| created_is_weekend_count_all_time | -9.41 | 87.73 | 1.1% | 1.1% | yeo_johnson | high |
| firstorder_delta_hours_sum_all_time | -3.62 | 288.44 | 44.1% | 15.1% | zero_inflation_handling | high |
| firstorder_delta_hours_mean_all_time | -3.62 | 288.59 | 44.1% | 15.1% | zero_inflation_handling | high |
| firstorder_delta_hours_max_all_time | -3.62 | 288.59 | 44.1% | 15.1% | zero_inflation_handling | high |
| firstorder_delta_hours_count_all_time | -18.35 | 1706.33 | 0.0% | 0.1% | yeo_johnson | high |
| firstorder_hour_sum_all_time | 0.00 | 0.00 | 100.0% | 0.0% | zero_inflation_handling | medium |
| firstorder_hour_mean_all_time | 0.00 | 0.00 | 100.0% | 0.0% | zero_inflation_handling | medium |
| firstorder_hour_max_all_time | 0.00 | 0.00 | 100.0% | 0.0% | zero_inflation_handling | medium |
| firstorder_hour_count_all_time | -18.35 | 1706.33 | 0.0% | 0.1% | yeo_johnson | high |
| firstorder_dow_sum_all_time | 0.26 | -1.17 | 18.1% | 0.0% | none | low |
| firstorder_dow_mean_all_time | 0.26 | -1.16 | 18.0% | 0.0% | none | low |
| firstorder_dow_max_all_time | 0.26 | -1.17 | 18.0% | 0.0% | none | low |
| firstorder_dow_count_all_time | -18.35 | 1706.33 | 0.0% | 0.1% | yeo_johnson | high |
✅ Added 173 transformation recommendations to Gold layer
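The transform names in the table (cap_then_log, zero_inflation_handling) are framework labels whose implementation is not shown in this notebook. As a rough illustration of what such transforms usually involve, here is a minimal sketch using only pandas and NumPy. The function names, the 99th-percentile cap, and the synthetic data are assumptions made for this example and are not taken from the customer_retention package.

```python
import numpy as np
import pandas as pd


def cap_then_log(values: pd.Series, upper_quantile: float = 0.99) -> pd.Series:
    """Clip values above the given quantile, then apply log1p (assumes values >= 0)."""
    cap = values.quantile(upper_quantile)
    return np.log1p(values.clip(upper=cap))


def zero_inflation_features(values: pd.Series) -> pd.DataFrame:
    """Split a zero-heavy column into an is-zero flag plus log1p of the magnitude."""
    return pd.DataFrame({
        "is_zero": (values == 0).astype(int),
        "log_value": np.log1p(values.clip(lower=0)),
    })


# Synthetic right-skewed, zero-heavy data (illustrative only)
rng = np.random.default_rng(0)
raw = pd.Series(rng.lognormal(mean=3.0, sigma=1.0, size=5_000))
raw[rng.random(5_000) < 0.5] = 0.0

capped_logged = cap_then_log(raw)
features = zero_inflation_features(raw)
print(f"skew before: {raw.skew():.2f}, after cap+log: {capped_logged.skew():.2f}")
```

Capping before the log keeps a handful of extreme values from stretching the transformed scale, while the separate is_zero flag lets a model treat "no activity" differently from "small activity".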
4.6 Categorical Columns Analysis¶
📖 Distribution Metrics (Analogues to Numeric Skewness/Kurtosis):
| Metric | Interpretation | Action |
|---|---|---|
| Imbalance Ratio | Largest / Smallest category count | > 10: Consider grouping rare categories |
| Entropy | Diversity measure (0 = one category, higher = more uniform) | Low entropy: May need stratified sampling |
| Top-3 Concentration | % of data in top 3 categories | > 90%: Rare categories may cause issues |
| Rare Category % | Categories with < 1% of data | High %: Group into "Other" category |
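To make the metrics in the table concrete, the sketch below computes them from a plain pandas Series. It is an approximation written for this explanation, not the CategoricalDistributionAnalyzer implementation: the function name, the returned keys, and the choice of base-2 entropy are assumptions (base 2 is at least consistent with the cohort_quarter output in this section, where 4.12 / log2(41) is about 77%).

```python
import numpy as np
import pandas as pd


def categorical_metrics(values: pd.Series, rare_threshold: float = 0.01) -> dict:
    """Compute simple distribution metrics for a categorical column."""
    counts = values.value_counts()          # sorted from most to least frequent
    shares = counts / counts.sum()
    entropy = float(-(shares * np.log2(shares)).sum())
    max_entropy = np.log2(len(counts)) if len(counts) > 1 else 1.0
    return {
        "category_count": int(len(counts)),
        "imbalance_ratio": float(counts.max() / counts.min()),
        "entropy": entropy,
        "normalized_entropy": float(entropy / max_entropy),
        "top3_concentration": float(shares.head(3).sum() * 100),
        "rare_category_count": int((shares < rare_threshold).sum()),
    }


# Toy column: one dominant category and one rare category (illustrative only)
sample = pd.Series(["a"] * 140 + ["b"] * 40 + ["c"] * 19 + ["d"] * 1)
metrics = categorical_metrics(sample)
for name, value in metrics.items():
    print(f"{name}: {value}")
```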
📖 Encoding Recommendations:
- Low cardinality (≤5) → One-hot encoding
- Medium cardinality (6-20) → One-hot or Target encoding
- High cardinality (>20) → Target encoding or Frequency encoding
- Cyclical (days, months) → Sin/Cos encoding
⚠️ Common Issues:
- Rare categories can cause overfitting with one-hot encoding
- High cardinality + one-hot = feature explosion
- Imbalanced categories may need special handling in train/test splits
Show/Hide Code
# Use framework's CategoricalDistributionAnalyzer
cat_analyzer = CategoricalDistributionAnalyzer()
categorical_cols = [
    name for name, col in findings.columns.items()
    if col.inferred_type in [ColumnType.CATEGORICAL_NOMINAL, ColumnType.CATEGORICAL_ORDINAL, ColumnType.CATEGORICAL_CYCLICAL]
    and col.inferred_type != ColumnType.TEXT  # TEXT columns processed separately in 02a
    and name not in TEMPORAL_METADATA_COLS
]
# Analyze all categorical columns
cat_analyses = cat_analyzer.analyze_dataframe(df, categorical_cols)
# Get encoding recommendations
cyclical_cols = [name for name, col in findings.columns.items()
                 if col.inferred_type == ColumnType.CATEGORICAL_CYCLICAL]
cat_recommendations = cat_analyzer.get_all_recommendations(df, categorical_cols, cyclical_columns=cyclical_cols)
for col_name in categorical_cols:
    col_info = findings.columns[col_name]
    analysis = cat_analyses.get(col_name)
    rec = next((r for r in cat_recommendations if r.column_name == col_name), None)
    print(f"\n{'='*70}")
    print(f"Column: {col_name}")
    print(f"Type: {col_info.inferred_type.value} (Confidence: {col_info.confidence:.0%})")
    print("-" * 70)
    if analysis:
        print("\n📊 Distribution Metrics:")
        print(f"   Categories: {analysis.category_count}")
        print(f"   Imbalance Ratio: {analysis.imbalance_ratio:.1f}x (largest/smallest)")
        print(f"   Entropy: {analysis.entropy:.2f} ({analysis.normalized_entropy*100:.0f}% of max)")
        print(f"   Top-1 Concentration: {analysis.top1_concentration:.1f}%")
        print(f"   Top-3 Concentration: {analysis.top3_concentration:.1f}%")
        print(f"   Rare Categories (<1%): {analysis.rare_category_count}")
        # Interpretation
        print("\n📈 Interpretation:")
        if analysis.has_low_diversity:
            print("   ⚠️ LOW DIVERSITY: Distribution dominated by few categories")
        elif analysis.normalized_entropy > 0.9:
            print("   ✓ HIGH DIVERSITY: Categories are relatively balanced")
        else:
            print("   ✓ MODERATE DIVERSITY: Some category dominance but acceptable")
        if analysis.imbalance_ratio > 100:
            print("   🔴 SEVERE IMBALANCE: Rarest category has very few samples")
        elif analysis.is_imbalanced:
            print("   🟡 MODERATE IMBALANCE: Consider grouping rare categories")
    # Recommendations
    if rec:
        print("\n🔧 Recommendations:")
        print(f"   Encoding: {rec.encoding_type.value}")
        print(f"   Reason: {rec.reason}")
        print(f"   Priority: {rec.priority}")
        if rec.preprocessing_steps:
            print("   Preprocessing:")
            for step in rec.preprocessing_steps:
                print(f"   • {step}")
        if rec.warnings:
            for warn in rec.warnings:
                print(f"   ⚠️ {warn}")
    # Visualization
    value_counts = df[col_name].value_counts()
    subtitle = f"Entropy: {analysis.normalized_entropy*100:.0f}% | Imbalance: {analysis.imbalance_ratio:.1f}x | Rare: {analysis.rare_category_count}" if analysis else ""
    fig = charts.bar_chart(
        value_counts.head(10).index.tolist(),
        value_counts.head(10).values.tolist(),
        title=f"Top Categories: {col_name}<br><sub>{subtitle}</sub>"
    )
    display_figure(fig)
# Summary table and add recommendations to registry
if cat_analyses:
    print("\n" + "=" * 70)
    print("CATEGORICAL COLUMNS SUMMARY")
    print("=" * 70)
    summary_data = []
    for col_name, analysis in cat_analyses.items():
        rec = next((r for r in cat_recommendations if r.column_name == col_name), None)
        summary_data.append({
            "Column": col_name,
            "Categories": analysis.category_count,
            "Imbalance": f"{analysis.imbalance_ratio:.1f}x",
            "Entropy": f"{analysis.normalized_entropy*100:.0f}%",
            "Top-3 Conc.": f"{analysis.top3_concentration:.1f}%",
            "Rare (<1%)": analysis.rare_category_count,
            "Encoding": rec.encoding_type.value if rec else "N/A"
        })
        # Add encoding recommendation to Gold layer
        if rec and registry.gold:
            registry.add_gold_encoding(
                column=col_name,
                method=rec.encoding_type.value,
                rationale=rec.reason,
                source_notebook="04_column_deep_dive"
            )
    display_table(pd.DataFrame(summary_data))
    if registry.gold:
        print(f"\n✅ Added {len(cat_recommendations)} encoding recommendations to Gold layer")
======================================================================
Column: cohort_quarter
Type: categorical_nominal (Confidence: 70%)
----------------------------------------------------------------------
📊 Distribution Metrics:
Categories: 41
Imbalance Ratio: 7221.0x (largest/smallest)
Entropy: 4.12 (77% of max)
Top-1 Concentration: 23.5%
Top-3 Concentration: 46.0%
Rare Categories (<1%): 22
📈 Interpretation:
✓ MODERATE DIVERSITY: Some category dominance but acceptable
🔴 SEVERE IMBALANCE: Rarest category has very few samples
🔧 Recommendations:
Encoding: target
Reason: High cardinality (41 categories) - target or frequency encoding
Priority: high
Preprocessing:
• Group 22 rare categories into 'Other'
⚠️ Use stratified sampling to preserve rare category representation
⚠️ High cardinality may require regularization with target encoding
======================================================================
CATEGORICAL COLUMNS SUMMARY
======================================================================
| Column | Categories | Imbalance | Entropy | Top-3 Conc. | Rare (<1%) | Encoding |
|---|---|---|---|---|---|---|
| cohort_quarter | 41 | 7221.0x | 77% | 46.0% | 22 | target |
✅ Added 1 encoding recommendations to Gold layer
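The recommendation for cohort_quarter combines two steps: grouping rare categories into 'Other' and target encoding with regularization. The following is a minimal sketch of how those two steps might look in pandas. The helper names, the 1% share cutoff, and the smoothing weight of 20 are assumptions for illustration and do not reflect how the framework applies the recommendation.

```python
import pandas as pd


def group_rare(values: pd.Series, min_share: float = 0.01, other: str = "Other") -> pd.Series:
    """Replace categories whose share is below min_share with a single label."""
    shares = values.value_counts(normalize=True)
    rare = shares[shares < min_share].index
    return values.where(~values.isin(rare), other)


def smoothed_target_encode(categories: pd.Series, target: pd.Series, smoothing: float = 20.0) -> pd.Series:
    """Encode each category as its target mean, shrunk toward the global mean."""
    prior = target.mean()
    stats = target.groupby(categories).agg(["mean", "count"])
    encoded = (stats["count"] * stats["mean"] + smoothing * prior) / (stats["count"] + smoothing)
    return categories.map(encoded)


# Toy data: two common cohorts and two cohorts with a single record each
cohorts = pd.Series(["q1"] * 150 + ["q2"] * 48 + ["q3"] + ["q4"])
retained = pd.Series([1] * 120 + [0] * 30 + [1] * 24 + [0] * 24 + [1, 0])

grouped = group_rare(cohorts)
encoded = smoothed_target_encode(grouped, retained)
print(encoded.groupby(grouped).first())
```

In practice the category statistics should be computed on the training split only and then mapped onto validation and test data; fitting them on the full dataset leaks the target into the feature.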
4.7 Datetime Columns Analysis¶
📖 Unlike numeric transformations, datetime analysis recommends NEW FEATURES to create:
| Recommendation Type | Purpose | Examples |
|---|---|---|
| Feature Engineering | Create predictive features from dates | days_since_signup, tenure_years, month_sin_cos |
| Modeling Strategy | How to structure train/test | Time-based splits when trends detected |
| Data Quality | Issues to address before modeling | Placeholder dates (1/1/1900) to filter |
📖 Feature Engineering Strategies:
- Recency: days_since_X - How recent was the event? (useful for predicting behavior)
- Tenure: tenure_years - How long has the customer been active? (maturity/loyalty)
- Duration: days_between_A_and_B - Time between events (e.g., signup to first purchase)
- Cyclical: month_sin, month_cos - Preserves that December is near January
- Categorical: is_weekend, is_quarter_end - Behavioral indicators
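The strategies above map onto a few lines of pandas each. The sketch below shows one way to derive them; the column names created and firstorder, the toy dates, and the fixed reference date are assumptions chosen for illustration, and the framework's own code hints may differ.

```python
import numpy as np
import pandas as pd

# Toy event dates (illustrative only)
events = pd.DataFrame({
    "created": pd.to_datetime(["2023-01-16", "2023-12-02", "2024-06-30"]),
    "firstorder": pd.to_datetime(["2023-01-20", "2023-12-02", "2024-07-10"]),
})
reference_date = pd.Timestamp("2025-01-01")  # assumed snapshot date

# Recency and tenure
events["days_since_created"] = (reference_date - events["created"]).dt.days
events["tenure_years"] = events["days_since_created"] / 365.25

# Duration between two events
events["days_created_to_firstorder"] = (events["firstorder"] - events["created"]).dt.days

# Cyclical month encoding: month 12 lands next to month 1 on the unit circle
month = events["created"].dt.month
events["month_sin"] = np.sin(2 * np.pi * (month - 1) / 12)
events["month_cos"] = np.cos(2 * np.pi * (month - 1) / 12)

# Categorical indicators
events["is_weekend"] = (events["created"].dt.dayofweek >= 5).astype(int)
events["is_quarter_end"] = events["created"].dt.is_quarter_end.astype(int)

print(events.drop(columns=["created", "firstorder"]))
```

Using a fixed reference date rather than the current time keeps recency and tenure features reproducible between runs.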
Show/Hide Code
from customer_retention.stages.profiling.temporal_analyzer import TemporalRecommendationType
datetime_cols = [
    name for name, col in findings.columns.items()
    if col.inferred_type == ColumnType.DATETIME
    and name not in TEMPORAL_METADATA_COLS
]
temporal_analyzer = TemporalAnalyzer()
# Store all datetime recommendations grouped by type
feature_engineering_recs = []
modeling_strategy_recs = []
data_quality_recs = []
datetime_summaries = []
for col_name in datetime_cols:
    col_info = findings.columns[col_name]
    print(f"\n{'='*70}")
    print(f"Column: {col_name}")
    print(f"Type: {col_info.inferred_type.value} (Confidence: {col_info.confidence:.0%})")
    print(f"{'='*70}")
    date_series = pd.to_datetime(df[col_name], errors='coerce', format='mixed')
    valid_dates = date_series.dropna()
    print(f"\n📅 Date Range: {valid_dates.min()} to {valid_dates.max()}")
    print(f"   Nulls: {date_series.isna().sum():,} ({date_series.isna().mean()*100:.1f}%)")
    # Basic temporal analysis
    analysis = temporal_analyzer.analyze(date_series)
    print(f"   Auto-detected granularity: {analysis.granularity.value}")
    print(f"   Span: {analysis.span_days:,} days ({analysis.span_days/365:.1f} years)")
    # Growth analysis
    growth = temporal_analyzer.calculate_growth_rate(date_series)
    if growth.get("has_data"):
        print("\n📈 Growth Analysis:")
        print(f"   Trend: {growth['trend_direction'].upper()}")
        print(f"   Overall growth: {growth['overall_growth_pct']:+.1f}%")
        print(f"   Avg monthly growth: {growth['avg_monthly_growth']:+.1f}%")
    # Seasonality analysis
    seasonality = temporal_analyzer.analyze_seasonality(date_series)
    if seasonality.has_seasonality:
        print("\n🔄 Seasonality Detected:")
        print(f"   Peak months: {', '.join(seasonality.peak_periods[:3])}")
        print(f"   Trough months: {', '.join(seasonality.trough_periods[:3])}")
        print(f"   Seasonal strength: {seasonality.seasonal_strength:.2f}")
    # Get recommendations using framework
    # Named temporal_recs so the numeric `recommendations` dict from section 4.5 is not overwritten
    other_dates = [c for c in datetime_cols if c != col_name]
    temporal_recs = temporal_analyzer.recommend_features(date_series, col_name, other_date_columns=other_dates)
    # Group by recommendation type
    col_feature_recs = [r for r in temporal_recs if r.recommendation_type == TemporalRecommendationType.FEATURE_ENGINEERING]
    col_modeling_recs = [r for r in temporal_recs if r.recommendation_type == TemporalRecommendationType.MODELING_STRATEGY]
    col_quality_recs = [r for r in temporal_recs if r.recommendation_type == TemporalRecommendationType.DATA_QUALITY]
    feature_engineering_recs.extend(col_feature_recs)
    modeling_strategy_recs.extend(col_modeling_recs)
    data_quality_recs.extend(col_quality_recs)
    # Display recommendations grouped by type
    if col_feature_recs:
        print("\n🛠️ FEATURES TO CREATE:")
        for rec in col_feature_recs:
            priority_icon = "🔴" if rec.priority == "high" else "🟡" if rec.priority == "medium" else "✓"
            print(f"   {priority_icon} {rec.feature_name} ({rec.category})")
            print(f"      Why: {rec.reason}")
            if rec.code_hint:
                print(f"      Code: {rec.code_hint}")
    if col_modeling_recs:
        print("\n⚙️ MODELING CONSIDERATIONS:")
        for rec in col_modeling_recs:
            priority_icon = "🔴" if rec.priority == "high" else "🟡" if rec.priority == "medium" else "✓"
            print(f"   {priority_icon} {rec.feature_name}")
            print(f"      Why: {rec.reason}")
    if col_quality_recs:
        print("\n⚠️ DATA QUALITY ISSUES:")
        for rec in col_quality_recs:
            priority_icon = "🔴" if rec.priority == "high" else "🟡" if rec.priority == "medium" else "✓"
            print(f"   {priority_icon} {rec.feature_name}")
            print(f"      Why: {rec.reason}")
            if rec.code_hint:
                print(f"      Code: {rec.code_hint}")
    # Standard extractions always available
    print("\n   Standard extractions available: year, month, day, day_of_week, quarter")
    # Store summary
    datetime_summaries.append({
        "Column": col_name,
        "Span (days)": analysis.span_days,
        "Seasonality": "Yes" if seasonality.has_seasonality else "No",
        "Trend": growth.get('trend_direction', 'N/A').capitalize() if growth.get("has_data") else "N/A",
        "Features to Create": len(col_feature_recs),
        "Modeling Notes": len(col_modeling_recs),
        "Quality Issues": len(col_quality_recs)
    })
    # === VISUALIZATIONS ===
    if growth.get("has_data"):
        fig = charts.growth_summary_indicators(growth, title=f"Growth Summary: {col_name}")
        display_figure(fig)
    chart_type = "line" if analysis.granularity in [TemporalGranularity.DAY, TemporalGranularity.WEEK] else "bar"
    fig = charts.temporal_distribution(analysis, title=f"Records Over Time: {col_name}", chart_type=chart_type)
    display_figure(fig)
    fig = charts.temporal_trend(analysis, title=f"Trend Analysis: {col_name}")
    display_figure(fig)
    yoy_data = temporal_analyzer.year_over_year_comparison(date_series)
    if len(yoy_data) > 1:
        fig = charts.year_over_year_lines(yoy_data, title=f"Year-over-Year: {col_name}")
        display_figure(fig)
        fig = charts.year_month_heatmap(yoy_data, title=f"Records Heatmap: {col_name}")
        display_figure(fig)
    if growth.get("has_data"):
        fig = charts.cumulative_growth_chart(growth["cumulative"], title=f"Cumulative Records: {col_name}")
        display_figure(fig)
    fig = charts.temporal_heatmap(date_series, title=f"Day of Week Distribution: {col_name}")
    display_figure(fig)
# === DATETIME SUMMARY ===
if datetime_summaries:
    print("\n" + "=" * 70)
    print("DATETIME COLUMNS SUMMARY")
    print("=" * 70)
    display_table(pd.DataFrame(datetime_summaries))
    # Summary by recommendation type
    print("\n📋 ALL RECOMMENDATIONS BY TYPE:")
    if feature_engineering_recs:
        print(f"\n🛠️ FEATURES TO CREATE ({len(feature_engineering_recs)}):")
        for i, rec in enumerate(feature_engineering_recs, 1):
            priority_icon = "🔴" if rec.priority == "high" else "🟡" if rec.priority == "medium" else "✓"
            print(f"   {i}. {priority_icon} {rec.feature_name}")
    if modeling_strategy_recs:
        print(f"\n⚙️ MODELING CONSIDERATIONS ({len(modeling_strategy_recs)}):")
        for i, rec in enumerate(modeling_strategy_recs, 1):
            priority_icon = "🔴" if rec.priority == "high" else "🟡" if rec.priority == "medium" else "✓"
            print(f"   {i}. {priority_icon} {rec.feature_name}: {rec.reason}")
    if data_quality_recs:
        print(f"\n⚠️ DATA QUALITY TO ADDRESS ({len(data_quality_recs)}):")
        for i, rec in enumerate(data_quality_recs, 1):
            priority_icon = "🔴" if rec.priority == "high" else "🟡" if rec.priority == "medium" else "✓"
            print(f"   {i}. {priority_icon} {rec.feature_name}: {rec.reason}")
    # Add recommendations to registry
    added_derived = 0
    added_modeling = 0
    # Add feature engineering recommendations to Silver layer (derived columns)
    if registry.silver:
        for rec in feature_engineering_recs:
            registry.add_silver_derived(
                column=rec.feature_name,
                expression=rec.code_hint or "",
                feature_type=rec.category,
                rationale=rec.reason,
                source_notebook="04_column_deep_dive"
            )
            added_derived += 1
    # Add modeling strategy recommendations to Bronze layer
    seen_strategies = set()
    for rec in modeling_strategy_recs:
        if rec.feature_name not in seen_strategies:
            registry.add_bronze_modeling_strategy(
                strategy=rec.feature_name,
                column=datetime_cols[0] if datetime_cols else "",
                parameters={"category": rec.category},
                rationale=rec.reason,
                source_notebook="04_column_deep_dive"
            )
            seen_strategies.add(rec.feature_name)
            added_modeling += 1
    print(f"\n✅ Added {added_derived} derived column recommendations to Silver layer")
    print(f"✅ Added {added_modeling} modeling strategy recommendations to Bronze layer")
4.8 Type Override (Optional)¶
If any column types were incorrectly inferred, you can override them here.
Common overrides:
- Binary columns detected as numeric → ColumnType.BINARY
- IDs detected as numeric → ColumnType.IDENTIFIER
- Ordinal categories detected as nominal → ColumnType.CATEGORICAL_ORDINAL
Show/Hide Code
# === TYPE OVERRIDES ===
# Uncomment and modify to override any incorrectly inferred types
TYPE_OVERRIDES = {
    # "column_name": ColumnType.NEW_TYPE,
    # Examples:
    # "is_active": ColumnType.BINARY,
    # "user_id": ColumnType.IDENTIFIER,
    # "satisfaction_level": ColumnType.CATEGORICAL_ORDINAL,
}
if TYPE_OVERRIDES:
    print("Applying type overrides:")
    for col_name, new_type in TYPE_OVERRIDES.items():
        if col_name in findings.columns:
            old_type = findings.columns[col_name].inferred_type.value
            findings.columns[col_name].inferred_type = new_type
            findings.columns[col_name].confidence = 1.0
            findings.columns[col_name].evidence.append("Manually overridden")
            print(f"  {col_name}: {old_type} → {new_type.value}")
else:
    print("No type overrides configured.")
    print("To override a type, add entries to TYPE_OVERRIDES dictionary above.")
No type overrides configured.
To override a type, add entries to TYPE_OVERRIDES dictionary above.
4.9 Data Segmentation Analysis¶
Purpose: Determine if the dataset contains natural subgroups that might benefit from separate models.
📖 Why This Matters:
- Some datasets have distinct customer segments with very different behaviors
- A single model might struggle to capture patterns that vary significantly across segments
- Segmented models can improve accuracy but add maintenance complexity
Recommendations:
- single_model - Data is homogeneous; one model for all records
- consider_segmentation - Some variation exists; evaluate if complexity is worth it
- strong_segmentation - Distinct segments with different target rates; separate models likely beneficial
Important: This is exploratory guidance only. The final decision depends on business context, model complexity tolerance, and available resources.
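One way to judge whether segments have "different target rates" is to ask how much of the target's variance is explained by segment membership. The sketch below computes that share (eta squared) with pandas. This is only one plausible reading of a target variance ratio: the SegmentAnalyzer's actual formula is not shown in this notebook, and the function name and toy data here are assumptions.

```python
import pandas as pd


def target_variance_ratio(segments: pd.Series, target: pd.Series) -> float:
    """Share of target variance explained by segment membership (eta squared)."""
    overall_mean = target.mean()
    total_ss = ((target - overall_mean) ** 2).sum()
    if total_ss == 0:
        return 0.0  # constant target: nothing to explain
    group = target.groupby(segments).agg(["mean", "count"])
    between_ss = (group["count"] * (group["mean"] - overall_mean) ** 2).sum()
    return float(between_ss / total_ss)


# Toy data: two segments of four records each (illustrative only)
segments = pd.Series(["A"] * 4 + ["B"] * 4)
separated = pd.Series([1, 1, 1, 1, 0, 0, 0, 0])   # target rate differs fully by segment
mixed = pd.Series([1, 0, 1, 0, 1, 0, 1, 0])       # same target rate in both segments

ratio_separated = target_variance_ratio(segments, separated)
ratio_mixed = target_variance_ratio(segments, mixed)
print(f"separated: {ratio_separated:.2f}, mixed: {ratio_mixed:.2f}")
```

A value near 0 suggests the segments behave alike with respect to the target, which would point toward a single model; a value well above 0 suggests segment-specific behavior worth investigating.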
Show/Hide Code
from customer_retention.stages.profiling import SegmentAnalyzer
# Initialize segment analyzer
segment_analyzer = SegmentAnalyzer()
# Find target column if detected
target_col = None
for col_name, col_info in findings.columns.items():
    if col_info.inferred_type == ColumnType.TARGET:
        target_col = col_name
        break
# Run segmentation analysis using numeric features
print("="*70)
print("DATA SEGMENTATION ANALYSIS")
print("="*70)
segmentation = segment_analyzer.analyze(
    df,
    target_col=target_col,
    feature_cols=numeric_cols if numeric_cols else None,
    max_segments=5
)
print("\n🎯 Analysis Results:")
print(f"   Method: {segmentation.method.value}")
print(f"   Detected Segments: {segmentation.n_segments}")
print(f"   Cluster Quality Score: {segmentation.quality_score:.2f}")
if segmentation.target_variance_ratio is not None:
    print(f"   Target Variance Ratio: {segmentation.target_variance_ratio:.2f}")
print("\n📊 Segment Profiles:")
for profile in segmentation.profiles:
    target_info = f" | Target Rate: {profile.target_rate*100:.1f}%" if profile.target_rate is not None else ""
    print(f"   Segment {profile.segment_id}: {profile.size:,} records ({profile.size_pct:.1f}%){target_info}")
# Display recommendation card
fig = charts.segment_recommendation_card(segmentation)
display_figure(fig)
# Display segment overview
fig = charts.segment_overview(segmentation, title="Segment Overview")
display_figure(fig)
# Display feature comparison if we have features
if segmentation.n_segments > 1 and any(p.defining_features for p in segmentation.profiles):
    fig = charts.segment_feature_comparison(segmentation, title="Feature Comparison Across Segments")
    display_figure(fig)
print("\n📝 Rationale:")
for reason in segmentation.rationale:
    print(f"   • {reason}")
======================================================================
DATA SEGMENTATION ANALYSIS
======================================================================
🎯 Analysis Results:
Method: kmeans
Detected Segments: 1
Cluster Quality Score: 0.00
Target Variance Ratio: 0.00
📊 Segment Profiles:
Segment 0: 30,770 records (100.0%) | Target Rate: 79.5%
📝 Rationale:
• Insufficient data for meaningful segmentation
4.10 Save Updated Findings¶
Show/Hide Code
# Save updated findings back to the same file
findings.save(FINDINGS_PATH)
print(f"Updated findings saved to: {FINDINGS_PATH}")
# Save recommendations registry
recommendations_path = FINDINGS_PATH.replace("_findings.yaml", "_recommendations.yaml")
registry.save(recommendations_path)
print(f"Recommendations saved to: {recommendations_path}")
# Summary of recommendations
all_recs = registry.all_recommendations
print("\n📋 Recommendations Summary:")
print(f" Bronze layer: {len(registry.get_by_layer('bronze'))} recommendations")
print(f" Silver layer: {len(registry.get_by_layer('silver'))} recommendations")
print(f" Gold layer: {len(registry.get_by_layer('gold'))} recommendations")
print(f" Total: {len(all_recs)} recommendations")
Updated findings saved to: /Users/Vital/python/CustomerRetention/experiments/runs/retail-e7471284/datasets/customer_retention_retail/findings/customer_retention_retail_aggregated_findings.yaml
Recommendations saved to: /Users/Vital/python/CustomerRetention/experiments/runs/retail-e7471284/datasets/customer_retention_retail/findings/customer_retention_retail_aggregated_recommendations.yaml
📋 Recommendations Summary:
Bronze layer: 3 recommendations
Silver layer: 0 recommendations
Gold layer: 174 recommendations
Total: 177 recommendations
Summary: What We Learned¶
In this notebook, we performed a deep dive analysis that included:
- Value Range Validation - Validated rates, binary fields, and non-negative constraints
- Numeric Distribution Analysis - Calculated skewness, kurtosis, and percentiles with transformation recommendations
- Categorical Distribution Analysis - Calculated imbalance ratio, entropy, and concentration with encoding recommendations
- Datetime Analysis - Analyzed seasonality, trends, and patterns with feature engineering recommendations
- Data Segmentation - Evaluated if natural subgroups exist that might benefit from separate models
Key Metrics Reference¶
Numeric Columns:
| Metric | Threshold | Action |
|---|---|---|
| Skewness | \|skew\| > 1 | Log transform |
| Kurtosis | > 10 | Cap outliers first |
| Zero % | > 40% | Zero-inflation handling |
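Read as rules, the numeric thresholds above can be written as a small function. This is a simplified sketch of the reference table only: the function name and action labels are hypothetical, and the framework's real decision logic is not shown here (it also produces recommendations such as yeo_johnson and sqrt_transform that this table does not cover).

```python
def suggest_actions(skewness: float, kurtosis: float, zero_pct: float) -> list[str]:
    """Map the reference-table thresholds to a list of suggested actions."""
    actions = []
    if zero_pct > 40:
        actions.append("zero_inflation_handling")
    if kurtosis > 10:
        actions.append("cap_outliers")
    if abs(skewness) > 1:
        actions.append("log_transform")
    return actions or ["none"]


# Values taken from the transformation summary table in section 4.5
avgorder = suggest_actions(skewness=11.70, kurtosis=548.62, zero_pct=0.0)
esent = suggest_actions(skewness=-0.05, kurtosis=3.67, zero_pct=11.0)
ordfreq = suggest_actions(skewness=10.47, kurtosis=179.54, zero_pct=61.7)
print(avgorder, esent, ordfreq)
```

For avgorder_sum_all_time the rules give capping followed by a log transform, which lines up with the cap_then_log recommendation reported for that column.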
Categorical Columns:
| Metric | Threshold | Action |
|---|---|---|
| Imbalance Ratio | > 10x | Group rare categories |
| Entropy | < 50% | Stratified sampling |
| Rare Categories | > 0 | Group into "Other" |
Datetime Columns:
| Finding | Action |
|---|---|
| Seasonality | Add cyclical month encoding |
| Strong trend | Time-based train/test split |
| Multiple dates | Calculate duration features |
| Placeholder dates | Filter or flag |
Transformation & Encoding Summary¶
Review the summary tables above for:
- Numeric: Which columns need log transforms, capping, or zero-inflation handling
- Categorical: Which encoding to use and whether to group rare categories
- Datetime: Which temporal features to engineer based on detected patterns
Next Steps¶
Continue to 02_source_integrity.ipynb to:
- Analyze duplicate records and value conflicts
- Deep dive into missing value patterns
- Analyze outliers with IQR method
- Check data consistency
- Get cleaning recommendations
Or jump to 05_feature_opportunities.ipynb if you want to see derived feature recommendations.
Save Reminder: Save this notebook (Ctrl+S / Cmd+S) before running the next one. The next notebook will automatically export this notebook's HTML documentation from the saved file.