Data profiling¶
fd.profile(df) inspects a DataFrame without changing anything. Because it
runs the same inference code as clean, its suggestions are a faithful preview
of what cleaning would do — ideal for exploratory data analysis (EDA) and
data-quality checks.
freshdata profile — 5 rows x 6 columns, 1.5 KB
missing cells: 6 (20.0%) duplicate rows: 1
column dtype missing issues
First Name object 20% 20.0% missing; 1 value(s) with surrounding whitespace; 1 sentinel value(s) meaning missing
AGE object - 1 sentinel value(s) meaning missing; would convert to Int64
Joined Date object - 1 sentinel value(s) meaning missing; would convert to datetime64[ns]
Active object - would convert to bool
Salary($) object - 1 sentinel value(s) meaning missing; would convert to float64
empty object 100% 100.0% missing; constant column
Attach a dry-run plan¶
profile = fd.profile(df, include_plan=True)
print(profile.plan.summary()) # primary model chosen per column
Programmatic access¶
profile = fd.profile(df)
profile.to_dict() # JSON-friendly summary
for col in profile.columns:
print(col.name, col.dtype, col.missing_pct, col.issues)
Inferring column roles¶
Explaining a clean¶
explanation = fd.explain_clean(df, strategy="balanced")
print(explanation.summary())
print(explanation.roles)
See the API reference for Profile and ColumnProfile
fields.
Compare with ydata-profiling¶
fd.profile(df) gives a repair-focused preview: it tells you what cleaning
actions would likely change your DataFrame, while leaving the data untouched.
ydata_profiling.ProfileReport (optional dependency) is useful for a fuller
EDA-style report with charts and correlation views.
import pandas as pd
import freshdata as fd
sample = pd.DataFrame(
{
"customer_id": [1, 2, None, 4, 5],
"revenue": [120.0, -999.0, 340.0, None, 560.0],
"signup_date": ["2026-01-05", "2026/03/12", "invalid", "2026-06-01", None],
"churned": ["yes", "no", "yes", None, "no"],
}
)
print("Freshdata profile preview:")
fresh_profile = fd.profile(sample)
print(fresh_profile)
print("Freshdata plan summary:")
print(fresh_profile.plan.summary())
# Optional: ydata-profiling for richer exploratory context.
# Install only if your project already uses it.
# pip install ydata-profiling
from ydata_profiling import ProfileReport
profile = ProfileReport(sample, title="raw-data quick check", minimal=True)
profile.to_file("quickstart_profile.html")
When to use which¶
- Use
fd.profilewhen you need a lightweight, dry-run signal for whatfreshdatawould clean. - Use
ydata_profilingwhen you need deeper EDA artifacts (distribution and correlation views) before deciding a cleaning strategy. - You can run both on the same raw frame: profile first for a repair plan, then use a full profiling report for additional manual exploration.