Quickstart

Clean a DataFrame

import pandas as pd
import freshdata as fd

df = pd.read_csv("messy_export.csv")

cleaned = fd.clean(df)   # sensible, explainable defaults

fd.clean returns a new, cleaned DataFrame and leaves your input untouched unless you pass preserve_original=False.
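One way to confirm the no-mutation behavior in your own test suite is to snapshot the input before cleaning and compare it afterwards. The sketch below uses only pandas. `toy_clean` is a hypothetical stand-in so the example runs without freshdata installed; in practice you would call fd.clean in its place.

```python
import pandas as pd


def toy_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for fd.clean: works on a copy, never on the input."""
    out = df.copy()
    return out.drop_duplicates().reset_index(drop=True)


df = pd.DataFrame({"id": [1, 1, 2], "age": [30.0, 30.0, None]})
snapshot = df.copy(deep=True)  # independent copy taken before cleaning

cleaned = toy_clean(df)

# Raises AssertionError if cleaning changed the input in any way.
pd.testing.assert_frame_equal(df, snapshot)
print(len(df), len(cleaned))
```

The same snapshot-and-compare pattern works for any cleaning function that promises to return a new object.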

Get the audit trail

cleaned, report = fd.clean(df, return_report=True)
print(report.summary())

freshdata clean report
  rows:    525 -> 500 (-25)
  columns: 7 -> 6 (-1)
  missing: 421 -> 0 cell(s)
  time:    0.017s
  actions (7):
    - [drop_duplicates] dropped 25 duplicate row(s) (4.8% of rows, keep='first')
    - [missing] 'age': filled 12 missing value(s) with median (39.6846)
    - [outliers] 'amount': flagged 15 outlier(s) in new column 'amount_outlier'
  review (1):
    ? column 'mostly_gone' preserved at 60.0% missing in balanced mode
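To make the three actions in the sample report concrete, here is roughly what they correspond to in plain pandas. This is a minimal sketch on made-up data, not freshdata's implementation. In particular, the report does not say how outliers are detected, so the 1.5 * IQR rule used here is an assumption.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "age": [25.0, 25.0, None, 40.0, 31.0, 58.0],
        "amount": [10.0, 10.0, 12.0, 11.0, 500.0, 9.0],
    }
)

# [drop_duplicates] keep the first copy of each fully identical row
deduped = df.drop_duplicates(keep="first").reset_index(drop=True)

# [missing] fill gaps in 'age' with the column median
age_median = deduped["age"].median()
deduped["age"] = deduped["age"].fillna(age_median)

# [outliers] flag instead of removing; 1.5 * IQR is an assumed detection rule
q1, q3 = deduped["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
too_low = deduped["amount"] < q1 - 1.5 * iqr
too_high = deduped["amount"] > q3 + 1.5 * iqr
deduped["amount_outlier"] = too_low | too_high

print(deduped)
```

Flagging keeps every row and records the decision in a separate boolean column, so a downstream step can still decide whether to drop, cap, or keep the flagged values.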

The report is also machine-readable:

report.to_frame()   # one row per decision, as a DataFrame
report.to_dict()    # JSON-friendly for logging / dashboards
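A dictionary form is convenient for structured logging. The sketch below serializes one report per log line using only the standard library. The `payload` dictionary is a hypothetical stand-in for the result of report.to_dict(); its keys are invented for illustration, and the real structure may differ.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cleaning")

# Hypothetical stand-in for report.to_dict(); the real keys may differ.
payload = {
    "rows_before": 525,
    "rows_after": 500,
    "actions": ["drop_duplicates", "missing", "outliers"],
}

# sort_keys gives a stable field order, which keeps log lines easy to diff.
line = json.dumps(payload, sort_keys=True)
log.info("clean_report %s", line)
```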

Preview before cleaning

# Read-only data-quality report
print(fd.profile(df))

# The exact plan clean() would run
print(fd.suggest_plan(df).summary())

# Compare strategies side by side
print(fd.compare_plans(df))

Protect important columns

cleaned, report = fd.clean(
    df,
    target_column="churn",        # never modified (prevents leakage)
    id_columns=("customer_id",),  # never imputed
    preserve_columns=("notes",),  # never dropped
    return_report=True,
)
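The reasoning behind protecting a target or ID column can be shown in plain pandas: imputation is restricted to columns outside a protected set, so missing labels stay missing instead of being filled with a statistic. This illustrates the idea only and is not freshdata's implementation; the `protected` set and sample data are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "customer_id": [101, 102, 103],
        "age": [30.0, None, 50.0],
        "churn": [1.0, None, 0.0],
    }
)

protected = {"customer_id", "churn"}  # hypothetical protected set

# Impute only numeric columns that are not protected.
numeric = df.select_dtypes(include="number").columns
fillable = [col for col in numeric if col not in protected]

filled = df.copy()
filled[fillable] = filled[fillable].fillna(filled[fillable].median())

print(filled)
```

Here 'age' is filled with its median, while the missing 'churn' label is left alone, which avoids training on a fabricated target value.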

Reuse a configured pipeline

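import logging
from pathlib import Path

log = logging.getLogger(__name__)
paths = sorted(Path("exports").glob("*.csv"))  # hypothetical directory; adjust to your data
cleaner = fd.Cleaner(target_column="churn", strategy="balanced")
for path in paths:
    out = cleaner.clean(pd.read_csv(path))
    log.info(cleaner.report_.summary())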

Choose a strategy

strategy	behavior
`"balanced"` (default)	accuracy-first; preserves high-missing columns, flags outliers
`"aggressive"`	maximal scrubbing: KNN imputation, column drops, winsorization
`"conservative"`	representation repair only (names, sentinels, dtypes, dupes)

fd.clean(df, strategy="aggressive")
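For readers unfamiliar with winsorization, which the aggressive strategy lists, the sketch below shows the general technique in plain pandas: extreme values are capped at chosen percentiles instead of being dropped. The 5th and 95th percentile limits are an assumption for illustration; the limits freshdata uses are not stated here.

```python
import pandas as pd

amount = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])

# Assumed limits: cap values below the 5th and above the 95th percentile.
lower, upper = amount.quantile([0.05, 0.95])
winsorized = amount.clip(lower=lower, upper=upper)

print(winsorized.tolist())
```

Unlike outlier flagging, this changes the stored values, which is why it belongs to the most invasive strategy.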

Next: learn exactly how those decisions are made in the cleaning engine guide.