Every Sunday, I take 15 minutes to look back at the week — not just what I built, but how I thought. This habit has quietly become one of the most valuable things I do for my career.
This week was one of those weeks where the biggest wins came from doing less, not more.
1. Simpler Pipelines Beat Clever Ones (Almost Always)
I inherited an Airflow DAG this week that had 14 tasks, custom sensors, dynamic task mapping, and enough conditional logic to make your head spin. It was impressive — but it was also breaking constantly and nobody could debug it in under an hour.
We replaced it with a dbt model + a single cron job. Result: 80% less code, same output, and any junior engineer on the team can now understand and maintain it.
The lesson? Complexity is not sophistication. If a pipeline needs a presentation to explain it, it’s already too complicated.
2. Query Execution Plans Are Underrated
I started spending 30 minutes each morning reviewing EXPLAIN ANALYZE output on our slowest queries. Within three days, I found two silent killers: a full table scan on a 200M-row table and a nested loop join picking the wrong strategy due to stale statistics.
```sql
EXPLAIN ANALYZE
SELECT *
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.created_at > NOW() - INTERVAL '7 days';
```
Takeaway: Reading execution plans feels slow. Not reading them is slower.
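If you want to build the same habit without a Postgres instance handy, SQLite's `EXPLAIN QUERY PLAN` (available from Python's standard library) is a lightweight way to practice reading plans. A minimal sketch, using an assumed `orders` schema, showing how adding an index turns a full table scan into an index search:

```python
import sqlite3

# Toy schema for illustration only -- the real table and columns will differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, created_at TEXT)"
)

def plan(sql: str) -> str:
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); keep the detail text.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE created_at > '2024-01-01'"

before = plan(query)  # no index on created_at -> full table scan ("SCAN orders")
conn.execute("CREATE INDEX idx_orders_created_at ON orders (created_at)")
after = plan(query)   # planner now uses the index ("SEARCH ... USING INDEX ...")

print(before)
print(after)
```

The details of the output differ from Postgres's `EXPLAIN ANALYZE`, but the reading skill transfers: look for scans on large tables and check whether the planner picked the access path you expected.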
3. The Power of Saying No to Data Sources
A stakeholder came to me with a “quick” request: connect 3 new data sources. Old me would’ve said yes. This week’s me asked: What decision will this data enable? Who will use it? How often? The answers were vague. The request got deprioritized.
Every new data source is a long-term maintenance commitment. Be selective. A lean data platform that reliably serves 10 use cases is worth more than a sprawling one that partially serves 50.
4. Documentation Debt Is Real (And Painful)
I came back to a Python utility script I wrote 6 weeks ago. No comments. No README. No docstrings. I spent 45 minutes reverse-engineering what I had written.
```python
import pandas as pd

def normalize_event_timestamps(df: pd.DataFrame, tz: str = "UTC") -> pd.DataFrame:
    """
    Convert all timestamp columns to a unified timezone.

    Args:
        df: Input DataFrame with raw event data
        tz: Target timezone string (default: 'UTC')

    Returns:
        DataFrame with normalized timestamp columns
    """
    # implementation here
```
A docstring + type hints. Takes 2 minutes. Saves 45 minutes later.
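For completeness, here is one sketch of what that body might look like. This is an assumption on my part, not the original script: it treats naive timestamps as UTC before converting, which may or may not match your source data.

```python
import pandas as pd

def normalize_event_timestamps(df: pd.DataFrame, tz: str = "UTC") -> pd.DataFrame:
    """Convert all timestamp columns to a unified timezone (illustrative sketch)."""
    out = df.copy()
    # Select both tz-naive and tz-aware datetime columns.
    for col in out.select_dtypes(include=["datetime64", "datetimetz"]).columns:
        s = out[col]
        # Assumption: naive timestamps are already in UTC.
        if s.dt.tz is None:
            s = s.dt.tz_localize("UTC")
        out[col] = s.dt.tz_convert(tz)
    return out

events = pd.DataFrame({"event_time": pd.to_datetime(["2024-01-01 12:00:00"])})
normalized = normalize_event_timestamps(events, tz="US/Eastern")
# 12:00 UTC becomes 07:00 in US/Eastern (UTC-5 in January)
print(normalized["event_time"].iloc[0].hour)
```

Even with a sketch this small, the docstring and type hints tell the next reader (including future you) what goes in and what comes out.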
5. The Mindset Shift That Changed My Week
Stop asking “how do I build this?” Start asking “should I build this at all?”
Most data problems are not engineering problems. They’re clarity problems. The best data engineers push back — not to be difficult, but to make sure the work they do actually matters.
Wrapping Up
If you’re a data engineer, spend 15 minutes every Sunday asking: What worked and why? What didn’t work and what would I do differently? What’s one thing I’ll carry into next week?
Small habit. Big compounding returns. See you next Sunday 👋
— Pushpjeet Cholkar, Data Engineer