freshdata

Automated DataFrame cleaning for pandas — explainable, safe, and production-ready.

freshdata turns a messy CSV, Excel, or SQL export into analysis- and ML-ready data in a single call — and tells you exactly what it changed and why.

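import pandas as pd
import freshdata as fd

# Load the raw export as an ordinary pandas DataFrame.
df = pd.read_csv("export.csv")
# Clean in one call; return_report=True also returns the decision report.
cleaned, report = fd.clean(df, return_report=True)
print(report.summary())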

It is not a fillna wrapper. A rule-based decision engine profiles every column — missing ratio, dtype, skewness, cardinality, inferred role — and chooses the right action per column, logging a rationale, a risk level, and a confidence score for each decision.
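To make the profiling step more concrete, here is a minimal sketch in plain pandas of the kind of per-column statistics listed above (missing ratio, dtype, skewness, cardinality). It is not freshdata's implementation: the helper name `profile_column`, the returned keys, and the sample data are all hypothetical and chosen only for illustration.

```python
import pandas as pd


def profile_column(series: pd.Series) -> dict:
    """Hypothetical helper: collect simple per-column statistics with plain pandas."""
    is_numeric = pd.api.types.is_numeric_dtype(series)
    return {
        "dtype": str(series.dtype),
        # Fraction of entries that are missing (NaN/None).
        "missing_ratio": float(series.isna().mean()),
        # Number of distinct non-missing values.
        "cardinality": int(series.nunique(dropna=True)),
        # Skewness is only computed for numeric columns.
        "skewness": float(series.skew()) if is_numeric else None,
    }


# Illustrative data, not taken from any real export.
df = pd.DataFrame(
    {
        "age": [23, 35, None, 41, 29],
        "city": ["Oslo", "Oslo", "Rome", None, "Lima"],
    }
)
profiles = {name: profile_column(df[name]) for name in df.columns}
```

A real engine would go on to infer a role for each column (identifier, target, feature, and so on) from statistics like these; that step is left out of the sketch.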

Why freshdata

Automated DataFrame cleaning in one call: missing values, outliers, duplicates, dtype repair, and column-name normalization.
Explainable — every decision is logged; if a NaN survives, the report says why.
Safe — never imputes an identifier, modifies a target/label column, force-fills free text, or removes outliers blindly.
AI-ready preprocessing — leakage-aware, typed output for scikit-learn, XGBoost, and any ML pipeline.
pandas-first, Polars-optional, fully typed, 1,200+ tests, 93% coverage gate (CI-enforced).
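The "explainable" and "safe" points above can be pictured as guard rules that run before any imputation rule and always return a rationale. The sketch below shows that general rule-based pattern under stated assumptions; it is not freshdata's engine. The `Decision` class, the `choose_action` function, the role and action names, and the skewness threshold of 1.0 are all hypothetical, and the confidence score is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str     # what to do with the column
    rationale: str  # human-readable reason, suitable for a report
    risk: str       # coarse risk label


# Hypothetical role names for columns that must never be filled.
PROTECTED_ROLES = {"identifier", "target", "free_text"}


def choose_action(
    role: str, is_numeric: bool, missing_ratio: float, skewness: float = 0.0
) -> Decision:
    # Guard rules come first, so protected columns are never imputed.
    if role in PROTECTED_ROLES:
        return Decision("leave", f"{role} columns are never imputed", "none")
    if missing_ratio == 0.0:
        return Decision("leave", "no missing values", "none")
    if is_numeric:
        # Assumed threshold: treat |skewness| > 1 as strongly skewed.
        if abs(skewness) > 1.0:
            return Decision("impute_median", "skewed numeric column", "low")
        return Decision("impute_mean", "roughly symmetric numeric column", "low")
    return Decision("impute_mode", "categorical column", "medium")


decisions = {
    "customer_id": choose_action("identifier", True, 0.3),
    "income": choose_action("feature", True, 0.1, skewness=2.5),
    "segment": choose_action("feature", False, 0.05),
}
```

Because every branch returns a rationale, a column that keeps its missing values (here `customer_id`) still carries an explanation of why it was left alone.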

Install

pip install freshdata-cleaner

See Installation for optional extras (ml, enterprise, all).

Next steps

Quickstart — clean your first DataFrame.
Cleaning engine — how decisions are made.
Data profiling — inspect before you clean.
API reference — every function and class.
Examples — runnable end-to-end recipes.
FAQ — common questions answered.