freshdata
Automated DataFrame cleaning for pandas — explainable, safe, and production-ready.
freshdata turns a messy CSV, Excel, or SQL export into analysis- and ML-ready
data in a single call — and tells you exactly what it changed and why.
```python
import pandas as pd
import freshdata as fd

df = pd.read_csv("export.csv")
cleaned, report = fd.clean(df, return_report=True)
print(report.summary())
```
It is not a fillna wrapper. A rule-based decision engine profiles every
column — missing ratio, dtype, skewness, cardinality, inferred role — and chooses
the right action per column, logging a rationale, a risk level, and a confidence
score for each decision.
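To make the idea of a per-column rule engine concrete, here is a minimal sketch in plain pandas. It is not freshdata's implementation: the helper names `profile_column` and `decide`, the thresholds, and the action labels are all hypothetical, chosen only to illustrate how a profile can drive a logged decision. The confidence score is left out because the text above does not say how it is computed.

```python
import pandas as pd


def profile_column(s: pd.Series) -> dict:
    """Collect a few per-column statistics (a subset of those named above)."""
    non_null = s.dropna()
    numeric = pd.api.types.is_numeric_dtype(s) and not pd.api.types.is_bool_dtype(s)
    return {
        "missing_ratio": float(s.isna().mean()),
        "dtype": str(s.dtype),
        # Skewness is only meaningful for numeric columns with enough values.
        "skewness": float(non_null.skew()) if numeric and len(non_null) > 2 else None,
        "cardinality": int(non_null.nunique()),
        "is_numeric": numeric,
    }


def decide(profile: dict) -> dict:
    """Map a profile to an action. Rules and thresholds are illustrative assumptions."""
    if profile["missing_ratio"] == 0:
        return {"action": "keep", "rationale": "no missing values", "risk": "low"}
    if profile["missing_ratio"] > 0.6:
        return {"action": "flag", "rationale": "mostly missing", "risk": "high"}
    if profile["is_numeric"]:
        skew = profile["skewness"]
        if skew is not None and abs(skew) > 1.0:
            return {"action": "impute_median", "rationale": "skewed numeric column", "risk": "medium"}
        return {"action": "impute_mean", "rationale": "roughly symmetric numeric column", "risk": "low"}
    return {"action": "impute_mode", "rationale": "categorical column", "risk": "medium"}


df = pd.DataFrame(
    {
        "income": [30.0, 32.0, 31.0, None, 500.0],  # one extreme value, one gap
        "city": ["Oslo", None, "Oslo", "Bergen", "Oslo"],
        "age": [25, 31, 40, 52, 38],  # complete column
    }
)

decisions = {name: decide(profile_column(df[name])) for name in df.columns}
for name, decision in decisions.items():
    print(name, decision)
```

Keeping the profile and the decision as separate steps means each choice can be logged next to the statistics that triggered it, which is the property the report relies on.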
Why freshdata
- Automated DataFrame cleaning in one call: missing values, outliers, duplicates, dtype repair, and column-name normalization.
- Explainable — every decision is logged; if a NaN survives, the report says why.
- Safe — never imputes an identifier, modifies a target/label column, force-fills free text, or removes outliers blindly.
- AI-ready preprocessing — leakage-aware, typed output for scikit-learn, XGBoost, and any ML pipeline.
- pandas-first, Polars-optional, fully typed, 1,200+ tests, 93% coverage gate (CI-enforced).
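The "Explainable" and "Safe" guarantees in the list above can be pictured with a small guardrail sketch in plain pandas. This is an assumption-laden illustration, not freshdata's code: `safe_impute`, its `target` and `id_columns` parameters, and the log wording are hypothetical. The sketch fills numeric gaps with the median, leaves identifier, target, and text columns untouched, and records a reason for every column that still contains missing values.

```python
import pandas as pd


def safe_impute(
    df: pd.DataFrame, target: str, id_columns: list[str]
) -> tuple[pd.DataFrame, list[str]]:
    """Fill numeric gaps with the median, skipping protected columns.

    Hypothetical helper: returns a cleaned copy plus one log line per
    column that had missing values, including the ones left unfilled.
    """
    out = df.copy()
    log: list[str] = []
    for name in out.columns:
        col = out[name]
        if not col.isna().any():
            continue  # nothing to decide for complete columns
        if name == target:
            log.append(f"{name}: skipped (target column is never modified)")
        elif name in id_columns:
            log.append(f"{name}: skipped (identifiers are never imputed)")
        elif pd.api.types.is_numeric_dtype(col):
            out[name] = col.fillna(col.median())
            log.append(f"{name}: filled with median")
        else:
            log.append(f"{name}: skipped (text is not force-filled)")
    return out, log


raw = pd.DataFrame(
    {
        "customer_id": [101, None, 103, 104],
        "spend": [10.0, None, 30.0, 20.0],
        "notes": ["ok", None, "late", "ok"],
        "churned": [0, 1, None, 0],
    }
)

filled, log = safe_impute(raw, target="churned", id_columns=["customer_id"])
print("\n".join(log))
```

Only `spend` is filled here; the other three gaps survive, and each has a log line explaining why.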
Install
See Installation for optional extras (ml, enterprise, all).
Next steps
- Quickstart — clean your first DataFrame.
- Cleaning engine — how decisions are made.
- Data profiling — inspect before you clean.
- API reference — every function and class.
- Examples — runnable end-to-end recipes.
- FAQ — common questions answered.