Let’s not romanticize the grind. You know the one—googling pandas syntax for the hundredth time, half a dozen tabs open with titles like “Best IDE for data science in 2025,” and secretly wondering if you’re still doing it all wrong. If you’ve ever felt the low-key panic of being one package away from feeling like a “real” data professional, you’re in good company.
I’ve been there. Still am sometimes. Here’s the truth: you don’t need everything, but you do need enough. Enough to stop breaking your flow every time you want to plot something, enough to feel dangerous when you open a Jupyter notebook, enough to not lose sleep over whether your local PostgreSQL is still running.
So if you’re building your setup—whether you’re in week one of your first bootcamp or already debugging spaghetti pipelines at 2am—here’s the definitive gear list. Not everything under the sun, just the stuff that actually makes life better.
Python & Conda: Your New Best Frenemies
Let’s start with the elephant in the room. Python is the language you’ll hear about in 98% of data job postings. Conda is what you’ll eventually thank for saving you from dependency hell.
Install Anaconda, or Miniconda if you want to keep it lean. I used to avoid Conda because I thought I was being “hardcore” managing packages with pip. That phase ended quickly after breaking my environment trying to get scikit-learn and xgboost to coexist peacefully.
Pro tip? Use separate environments for different projects. Call them envs, venvs, whatever you like—but isolate them like they’re contagious.
Quick tip:
Create a new environment like so:
conda create -n my_project_env python=3.11
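Then activate it whenever you sit down to work on that project:
conda activate my_project_env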
JupyterLab: The Sandbox You’ll Never Outgrow
Jupyter Notebooks are where most data folks cut their teeth. And JupyterLab? It’s the glow-up version. Tabs, file browser, terminals—all in one interface.
I still remember the “aha” moment when I discovered %matplotlib inline. (Yes, I’m that old.) Jupyter’s magic commands, visual flexibility, and markdown mix-ins are like having a whiteboard, notebook, and command line all rolled into one.
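If magics are new to you, here are a couple I reach for constantly (nothing exotic, just the built-ins):
%matplotlib inline          # render plots right in the notebook
%timeit sum(range(10_000))  # quick micro-benchmark of a single expression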
Install with:
conda install -c conda-forge jupyterlab
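Then launch it from the root of your project:
jupyter lab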
Bonus points: Install extensions like jupyterlab-vim if you want to feel like a terminal god.
VS Code: Not Just for Coders Anymore
Let’s talk editors. I know plenty of people swear by PyCharm, and hey, go wild. But for 80% of us who want an editor that’s fast, customizable, and plays well with Python, notebooks, Docker, Git—you name it—VS Code is where it’s at.
What won me over? That damn “Run Cell” button in notebooks. That and the extensions. You can spin up a container, browse a remote repo, and debug a broken SQL query all without leaving your editor.
Must-have extensions:
- Python (duh)
- Jupyter
- GitLens
- Docker (if you’re that deep yet)
Git: Yes, You Need It (Even for Side Projects)
If your project lives only on your laptop, it might as well not exist. Harsh? Maybe. But real.
Git is how you version, collaborate, and (more often than we’d like) recover from catastrophic mistakes. Install Git early. Don’t wait until a team forces you to.
Get comfy with the basics: add, commit, push, pull, clone. Learn what a branch is. Learn what not to commit (looking at you, data.csv).
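The day-to-day loop ends up looking something like this (the file and branch names are just examples):
git add notebooks/eda.ipynb
git commit -m "Add first-pass EDA notebook"
git push origin main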
I still have muscle memory for git log --oneline --graph. It’s weirdly satisfying.
Quick tip:
Always .gitignore your virtual environments and large raw data files.
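A starter .gitignore for a data project might look something like this (adjust the paths to your own layout):
# virtual environments
venv/
.venv/
env/
# data and notebook cruft
data/raw/
*.csv
.ipynb_checkpoints/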
Docker: Because “It Worked on My Machine” Isn’t a Valid Excuse
Docker feels scary at first. Like, “wait, I’m virtualizing an OS to run a Jupyter notebook?” But once it clicks, you’ll never go back.
Containers let you package up your environment—Python version, dependencies, weird config files—and run it anywhere. Which means fewer surprises when you deploy or collaborate.
Start simple. Build a container for your notebook and a requirements.txt. Then escalate to docker-compose when you’re ready to feel cool.
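A bare-bones Dockerfile for that notebook-plus-requirements.txt setup might look roughly like this (the base image and port are just sensible defaults, not gospel):
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt jupyterlab
COPY . .
EXPOSE 8888
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--no-browser", "--allow-root"]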
There’s something very adult about running a containerized PostgreSQL alongside your data pipeline. It’s the data equivalent of owning matching Tupperware.
PostgreSQL & pgAdmin: Know Your Way Around a Real Database
At some point, you’ll hit the limits of CSVs. When you do, having a local PostgreSQL install is a game changer.
I still remember connecting to my first remote database and feeling like I had just hacked the Pentagon. All I did was SELECT some rows, but man—it felt good.
Pair it with pgAdmin for a GUI that won’t make your eyes bleed. Or use psql if you’re living your command-line fantasy.
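If you went the Docker route from the last section, spinning up a throwaway local instance is a one-liner (the password and port here are placeholders):
docker run -d --name local-pg -e POSTGRES_PASSWORD=secret -p 5432:5432 postgres:16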
If you’re into synthetic data or testing, tools like pgbench or Faker can help populate your tables for practice.
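The Faker side takes about three lines, as a minimal sketch (the columns are made up for illustration; load the rows however you like):
from faker import Faker

fake = Faker()
rows = [(fake.name(), fake.email(), fake.date_this_decade()) for _ in range(1_000)]
# insert `rows` into a practice table with psycopg2, SQLAlchemy, or a COPY statement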
Pandas-Profiling & ydata-profiling: EDA in One Line
Exploratory Data Analysis is often a chaotic mix of .head(), .describe(), and “wait, why is this column 90% null?”
That’s where ydata-profiling (formerly pandas-profiling) comes in. One line of code gives you an interactive HTML report with distributions, missing value maps, correlations—the works.
It’s not perfect. It chokes on really large datasets. But for a quick gut-check before diving in, it’s solid. Think of it like a second brain for your EDA.
import pandas as pd
from ydata_profiling import ProfileReport

df = pd.read_csv("your_data.csv")  # placeholder: point this at any DataFrame you like
profile = ProfileReport(df, title="My Report")
profile.to_notebook_iframe()
DVC or MLflow: If You’re Getting Serious About Modeling
Here’s the twist: most models don’t fail because of bad math—they fail because nobody remembers how they were trained.
If you’re training ML models, you’ll need to version not just your code but your data and hyperparameters too. That’s where tools like DVC or MLflow shine.
They take some getting used to. But once you’ve used MLflow to log a dozen experiments and compare them in a UI, going back to “model_v6_final_final2.pkl” feels barbaric.
You don’t need this on day one. But when you hit the modeling stage, install it. Thank me later.
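To give you a flavor of MLflow’s tracking API, here’s a minimal sketch (the parameter names and numbers are invented for the example):
import mlflow

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("rmse", 0.42)
    # mlflow.sklearn.log_model(model, "model")  # once you have a fitted model to hand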
Tableau Public or Power BI: Show, Don’t Just Tell
Yes, dashboards. I used to roll my eyes too. Then I saw how fast Tableau can make a good story pop out of a dataset.
You don’t need to become a full-time dashboard monkey. But knowing how to turn a dataset into a compelling visual in under 10 minutes? That’s a power move—especially when execs are in the room.
If Tableau feels like too much, try Datawrapper or Plotly. Even matplotlib or seaborn in the right hands can tell a killer story.
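Even a handful of seaborn lines can carry a story; here’s a rough sketch using one of its sample datasets:
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")  # sample dataset that seaborn can fetch for you
sns.barplot(data=tips, x="day", y="total_bill")
plt.title("Average bill by day")
plt.tight_layout()
plt.show()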
Bonus Picks (a.k.a. Nerd Candy)
- nbdev: Turn your Jupyter notebooks into production-grade Python modules.
- DuckDB: SQL meets OLAP, but local. In-process querying of CSVs and Parquet files for fast prototyping (quick sketch after this list).
- Great Expectations: For data validation that feels almost fun. Almost.
- Rich / Textual: Make your CLI apps prettier than they have any right to be.
- Airflow: For when you’ve got data pipelines and you want them to run on schedule without babysitting.
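And that DuckDB one-liner feeling, as a quick sketch (the Parquet filename is a placeholder):
import duckdb

# query a local file directly, no server and no import step
duckdb.sql("SELECT count(*) AS n_rows FROM 'events.parquet'").show()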
Final Thought: Your Stack Will Grow With You
Look—don’t panic if you don’t have all this set up today. Nobody does. It’s an evolving toolkit. You’ll add things when you hit a wall or get curious. That’s the whole point.
But what matters is starting with intention. Pick tools that help you think faster, code cleaner, and stay in the zone. Learn enough to know what’s slowing you down. And maybe—just maybe—let go of the idea that installing yet another fancy library is going to “fix” your workflow overnight.
You’re the tool. The rest is just accessories.
tl;dr
- Install Conda and use isolated environments.
- Live in JupyterLab and VS Code.
- Use Git from day one, even for solo projects.
- Don’t fear Docker—it’s your future-proofing friend.
- Get comfy with SQL and PostgreSQL.
- Use profiling tools to speed up EDA.
- Version your models if you care about reproducibility.
- Make visuals that don’t suck.
You don’t need everything. But you do need enough to stop fighting your tools and start building with them.