freshdata

Automated DataFrame cleaning for pandas — explainable, safe, and production-ready.

freshdata turns a messy CSV, Excel, or SQL export into analysis- and ML-ready data in a single call — and tells you exactly what it changed and why.

import pandas as pd
import freshdata as fd

df = pd.read_csv("export.csv")
cleaned, report = fd.clean(df, return_report=True)
print(report.summary())

It is not a fillna wrapper. A rule-based decision engine profiles every column — missing ratio, dtype, skewness, cardinality, inferred role — and chooses the right action per column, logging a rationale, a risk level, and a confidence score for each decision.
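To make the idea of per-column profiling concrete, here is a minimal sketch in plain pandas. It is not freshdata's engine or API: `profile_column`, `decide`, the action names, and the skewness threshold of 1.0 are all hypothetical, chosen only to show how profile signals can drive a logged decision.

```python
import pandas as pd


def profile_column(s: pd.Series) -> dict:
    """Collect a few per-column signals (hypothetical helper, not freshdata's API)."""
    numeric = pd.api.types.is_numeric_dtype(s) and not pd.api.types.is_bool_dtype(s)
    return {
        "missing_ratio": float(s.isna().mean()),
        "dtype": str(s.dtype),
        "skewness": float(s.skew()) if numeric else None,
        "cardinality": int(s.nunique(dropna=True)),
    }


def decide(profile: dict) -> dict:
    """Toy rule set; the threshold and action names are illustrative assumptions."""
    if profile["missing_ratio"] == 0:
        return {"action": "keep", "rationale": "no missing values", "risk": "low"}
    if profile["skewness"] is None:
        return {"action": "fill_mode", "rationale": "non-numeric column", "risk": "medium"}
    if abs(profile["skewness"]) > 1.0:
        return {"action": "fill_median", "rationale": "skewed numeric column", "risk": "low"}
    return {"action": "fill_mean", "rationale": "roughly symmetric numeric column", "risk": "low"}


df = pd.DataFrame(
    {
        "income": [30.0, 32.0, 31.0, 900.0, None],
        "city": ["Oslo", None, "Oslo", "Rome", "Rome"],
        "score": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

# One profile and one decision per column, keyed by column name.
profiles = {name: profile_column(df[name]) for name in df.columns}
decisions = {name: decide(p) for name, p in profiles.items()}

for name, decision in decisions.items():
    print(name, decision["action"], "-", decision["rationale"])
```

Keeping the profile and the decision as separate steps means each rationale can be traced back to the exact signals that produced it.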

Why freshdata

  • Automated DataFrame cleaning in one call: missing values, outliers, duplicates, dtype repair, and column-name normalization.
  • Explainable — every decision is logged; if a NaN survives, the report says why.
  • Safe — never imputes an identifier, modifies a target/label column, force-fills free text, or removes outliers blindly.
  • AI-ready preprocessing — leakage-aware, typed output for scikit-learn, XGBoost, and any ML pipeline.
  • pandas-first, Polars-optional, fully typed, 1,200+ tests, 93% coverage gate (CI-enforced).
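The "explainable" and "safe" points above can be illustrated with a small guard written in plain pandas. This is a sketch under stated assumptions, not freshdata's implementation: `safe_fill` is a hypothetical helper, and it takes the identifier and target column names explicitly, whereas freshdata is described as inferring column roles itself.

```python
import pandas as pd


def safe_fill(
    df: pd.DataFrame, id_cols: set[str], target: str
) -> tuple[pd.DataFrame, dict[str, str]]:
    """Median-fill numeric columns, skip protected ones, and log each outcome.

    Hypothetical helper for illustration only; not part of freshdata.
    """
    out = df.copy()  # never mutate the caller's frame
    log: dict[str, str] = {}
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to decide for complete columns
        if col in id_cols:
            log[col] = "skipped: identifier column"
        elif col == target:
            log[col] = "skipped: target column"
        elif not pd.api.types.is_numeric_dtype(out[col]):
            log[col] = "skipped: non-numeric column"
        else:
            out[col] = out[col].fillna(out[col].median())
            log[col] = "filled with median"
    return out, log


raw = pd.DataFrame(
    {
        "customer_id": [1.0, None, 3.0],
        "age": [20.0, None, 40.0],
        "notes": ["ok", None, "late"],
        "churned": [0.0, 1.0, None],
    }
)

filled, log = safe_fill(raw, id_cols={"customer_id"}, target="churned")

# Every column that had a NaN gets a log entry, including those left untouched.
for col, outcome in log.items():
    print(f"{col}: {outcome}")
```

Any NaN that survives in `filled` has a matching entry in `log`, which is the property the "Explainable" bullet describes.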

Install

pip install freshdata-cleaner

See Installation for optional extras (ml, enterprise, all).

Next steps