Let me guess: you’ve been dabbling in Python, running some print("Hello, world!"), maybe poking around in pandas, but you’re not sure what to do next. Tutorials are fine, but they’re starting to feel like textbook problems. You want something real. Something fun. Something that doesn’t involve yet another CSV full of fake sales data.
So here’s an idea: analyze your own Spotify listening habits using Python.
Yeah, I’m serious. Your music taste—those long Taylor Swift sessions, your Sunday jazz mornings, that one week you deep-dived into Mongolian throat singing—that’s all data. And with a little Python, you can turn it into charts, trends, maybe even a mild identity crisis.
This isn’t just another beginner project. It’s a personal one. And that’s what makes it stick.
The Data’s Already Yours—You Just Have to Ask for It
Let’s start at the very beginning: getting the actual Spotify data.
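Spotify lets you request your listening data through their Privacy Settings. The catch? It doesn’t show up instantly. You request it, and they email you a download link for a ZIP file once it’s ready. That can take anywhere from a day to several days, and longer if you ask for the extended streaming history.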
The ZIP includes all sorts of goodies:
- Your streaming history (timestamps + tracks)
- Account data, search queries, in-app interactions
- Even… ads you’ve heard. Yeah. That’s in there.
But for our purposes, you’ll mostly care about the file called something like Streaming_History_Audio_*.json (or StreamingHistory_music_*.json, depending on which export you requested).
Do this now: Request your Spotify data. It takes 2 minutes, and the project’s way more fun with your own tracks.
Set Up: Keep It Simple (Seriously)
When that ZIP file arrives, unzip it and move the JSON files into a folder. Then, fire up Google Colab—no installations, no local setup nightmares.
Upload one or more of your streaming history JSON files to the notebook session. If you don’t know how: Files tab → upload.
Here’s the opening move:
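import json

import pandas as pd

# Load one streaming history file (adjust the name to match your upload)
with open('Streaming_History_Audio_2023_1.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Each JSON record becomes one row
df = pd.DataFrame(data)
df.head()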
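If all went well, you’ll see rows with endTime, artistName, trackName, and msPlayed. (The extended streaming history export uses different names for the same information, such as ts and ms_played, so check which ones you have before running the code below.)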
Boom. Your taste in music is now a DataFrame.
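If you uploaded several files, or your export uses the longer field names, a small loader can stitch everything together. This is a minimal sketch, not an official recipe: load_history is a hypothetical helper, and the column mapping reflects my understanding of the extended export, so check it against the keys in your own files.

```python
import json
from pathlib import Path

import pandas as pd

# Assumed mapping from the extended-history field names to the short names
# used in this post. Verify it against the keys in your own files.
EXTENDED_TO_SHORT = {
    "ts": "endTime",
    "master_metadata_album_artist_name": "artistName",
    "master_metadata_track_name": "trackName",
    "ms_played": "msPlayed",
}


def load_history(folder=".", pattern="Streaming*.json"):
    """Read every matching JSON file in `folder` into one DataFrame."""
    frames = []
    for path in sorted(Path(folder).glob(pattern)):
        with open(path, "r", encoding="utf-8") as f:
            frame = pd.DataFrame(json.load(f))
        # Renaming is a no-op when the short names are already present.
        frames.append(frame.rename(columns=EXTENDED_TO_SHORT))
    if not frames:
        raise FileNotFoundError(f"No files matching {pattern!r} in {folder!r}")
    return pd.concat(frames, ignore_index=True)


# Usage in Colab, where uploads land in the working directory:
# df = load_history(".")
```

The default pattern also picks up any video history files, so narrow it if you only want audio.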
Wait, What Am I Even Looking At?
Let’s break it down.
Each row is a song you listened to. The columns tell you:
- endTime: when the song finished playing
- artistName: who performed it
- trackName: the name of the song
- msPlayed: how long you listened (in milliseconds)
This means you can figure out:
- How often you play certain artists
- Which songs you actually finish
- What time of day you listen most
- How your taste shifts across months
This isn’t a project about just “reading a JSON.” It’s about seeing your daily life—your moods, rituals, guilty pleasures—through data.
Let’s Answer Some Questions
Now we’re cooking. The best part of this project? You set the questions. Here are a few juicy ones to start with:
1. What artists do I listen to the most?
top_artists = df['artistName'].value_counts().head(10)
top_artists.plot(kind='barh', title='Top 10 Most Played Artists')
This gives you the count of how often each artist appears. Want to be fancy? Sum the msPlayed instead to see total listening time.
df.groupby('artistName')['msPlayed'].sum().sort_values(ascending=False).head(10)
That will tell you who truly dominates your attention span.
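Raw millisecond totals are hard to read, so here’s a minimal sketch that puts play counts and listening hours side by side. The sample DataFrame is made up purely so the snippet runs on its own; swap in your real df.

```python
import pandas as pd

# Made-up sample rows; replace with your real DataFrame.
df = pd.DataFrame({
    "artistName": ["Artist A", "Artist B", "Artist A", "Artist C"],
    "msPlayed": [3_600_000, 1_800_000, 1_800_000, 900_000],
})

MS_PER_HOUR = 1000 * 60 * 60

# Count plays and sum listening time per artist in one pass.
summary = df.groupby("artistName")["msPlayed"].agg(plays="count", total_ms="sum")
summary["hours"] = summary["total_ms"] / MS_PER_HOUR

# Rank by time rather than by number of plays.
top_by_hours = summary.sort_values("hours", ascending=False).head(10)
print(top_by_hours[["plays", "hours"]])
```

An artist with few plays but many hours is probably someone you listen to all the way through.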
2. Which songs do I actually finish?
The columns above don’t include each track’s length, so use a rough proxy: count a play as “finished” if it lasted at least 90% of a typical ~3-minute song (180,000 ms × 0.9 = 162,000 ms). So:
df['finished'] = df['msPlayed'] >= 180000 * 0.9
completion_rate = df['finished'].mean()
print(f"You finish songs {completion_rate:.2%} of the time.")
It’s weirdly satisfying (or slightly humiliating) to see how often you skip halfway.
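A fixed 3-minute cutoff is unfair to short songs and generous to long ones. One alternative, sketched below, is to treat the longest play you ever logged for a track as a stand-in for its length. This is a heuristic of mine, not something the export provides, and the sample rows are made up.

```python
import pandas as pd

# Made-up sample rows; replace with your real DataFrame.
df = pd.DataFrame({
    "artistName": ["Artist A", "Artist A", "Artist A", "Artist B", "Artist B"],
    "trackName": ["Long Song", "Long Song", "Long Song", "Short Song", "Short Song"],
    "msPlayed": [400_000, 390_000, 100_000, 120_000, 30_000],
})

# Longest play seen for each track, used as an estimate of its real length.
est_length = df.groupby(["artistName", "trackName"])["msPlayed"].transform("max")

# A play counts as finished if it reached 90% of that estimate.
df["finished_est"] = df["msPlayed"] >= 0.9 * est_length
print(f"Estimated completion rate: {df['finished_est'].mean():.0%}")
```

The estimate is too low for tracks you never played to the end, and a track played only once always counts as finished, so it works best on songs you’ve played several times.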
3. Do I listen more on weekdays or weekends?
First, convert the endTime into an actual datetime:
df['endTime'] = pd.to_datetime(df['endTime'])
df['day_of_week'] = df['endTime'].dt.day_name()
Then count plays per day:
df['day_of_week'].value_counts().reindex([
    'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'
]).plot(kind='bar', title='Listening by Day of Week')
Weekends or weekdays—where’s your audio comfort zone?
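One wrinkle: there are five weekdays and only two weekend days, so raw totals tilt toward weekdays. A fairer comparison, sketched here with made-up rows, is the average listening time per calendar day in each group.

```python
import pandas as pd

# Made-up sample rows; replace with your real DataFrame.
df = pd.DataFrame({
    "endTime": pd.to_datetime([
        "2023-05-01 08:00",  # Monday
        "2023-05-02 09:00",  # Tuesday
        "2023-05-06 10:00",  # Saturday
        "2023-05-06 21:00",  # Saturday
        "2023-05-07 11:00",  # Sunday
    ]),
    "msPlayed": [600_000, 600_000, 1_200_000, 600_000, 600_000],
})

# dayofweek runs from 0 (Monday) to 6 (Sunday).
is_weekend = df["endTime"].dt.dayofweek >= 5
df["day_type"] = is_weekend.map({True: "weekend", False: "weekday"})
df["date"] = df["endTime"].dt.date

# Total minutes per calendar day, then the average day within each group.
daily_minutes = df.groupby(["day_type", "date"])["msPlayed"].sum() / 60_000
avg_minutes = daily_minutes.groupby(level="day_type").mean()
print(avg_minutes)
```

Days with no listening at all don’t appear in the data, so these averages only cover days you actually pressed play.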
4. What time of day am I most musical?
df['hour'] = df['endTime'].dt.hour
df['hour'].value_counts().sort_index().plot(kind='bar', title='Listening by Hour')
You’ll quickly learn if you’re a late-night playlist scroller or a morning podcast loyalist.
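Before you trust the hours, check the time zone. As far as I know the timestamps in the export are in UTC, so your 11 p.m. session may show up at some other hour. Here’s a sketch of converting to local time, where LOCAL_TZ is a placeholder you should replace with your own zone.

```python
import pandas as pd

# Made-up sample rows; replace with your real DataFrame.
df = pd.DataFrame({"endTime": ["2023-07-01 23:30", "2023-07-02 06:15"]})

LOCAL_TZ = "America/New_York"  # placeholder: use your own time zone name

# Parse the timestamps as UTC (an assumption about the export), then convert.
end_utc = pd.to_datetime(df["endTime"], utc=True)
df["endTime_local"] = end_utc.dt.tz_convert(LOCAL_TZ)
df["hour_local"] = df["endTime_local"].dt.hour
print(df[["endTime", "hour_local"]])
```

The same local timestamps are the better source for the day-of-week breakdown too, since a late-night play can land on a different day in UTC.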
5. What month had the biggest music binge?
df['month'] = df['endTime'].dt.to_period('M')
monthly = df.groupby('month')['msPlayed'].sum() / (1000 * 60 * 60) # convert ms to hours
monthly.plot(kind='line', title='Total Listening Hours by Month')
This is where life shows up. That one month you broke up? You probably doubled your listening time.
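Monthly totals show how much you listened, but not what. One way to watch your taste shift is to pull the most-played artist for each month. A minimal sketch with made-up rows:

```python
import pandas as pd

# Made-up sample rows; replace with your real DataFrame.
df = pd.DataFrame({
    "endTime": pd.to_datetime([
        "2023-01-05 10:00", "2023-01-20 18:30", "2023-01-25 09:15",
        "2023-02-03 22:00", "2023-02-10 07:45",
    ]),
    "artistName": ["Artist A", "Artist B", "Artist A", "Artist B", "Artist B"],
    "msPlayed": [200_000, 300_000, 200_000, 100_000, 100_000],
})
df["month"] = df["endTime"].dt.to_period("M")

# Total listening time per (month, artist) pair.
per_artist = df.groupby(["month", "artistName"], as_index=False)["msPlayed"].sum()

# Sort so each month's biggest artist comes first, then keep that first row.
top_per_month = (
    per_artist.sort_values(["month", "msPlayed"], ascending=[True, False])
              .drop_duplicates("month")
              .set_index("month")
)
print(top_per_month)
```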
Customize Your Vibes (Optional, but Fun)
Want to go a level deeper? Match your listening with external data:
- Pull your mood journal (if you track one) and see what songs align with sadness or joy.
- Combine it with weather data. Did rain increase lo-fi beats?
- Overlay with calendar events. Did your playlist shift after a job change?
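All three ideas come down to the same move: roll your plays up to one row per day, then merge with another table keyed by date. Here’s a sketch using a hypothetical weather table. The rain_mm column and every value in it are invented for illustration, so bring your own data source.

```python
import pandas as pd

# Made-up listening rows; replace with your real DataFrame.
plays = pd.DataFrame({
    "endTime": pd.to_datetime(["2023-03-01 09:00", "2023-03-01 20:00", "2023-03-02 10:00"]),
    "msPlayed": [1_800_000, 1_800_000, 900_000],
})

# Hypothetical external data: one row per day with rainfall in millimetres.
weather = pd.DataFrame({
    "date": pd.to_datetime(["2023-03-01", "2023-03-02"]),
    "rain_mm": [12.0, 0.0],
})

# Roll plays up to one row per calendar day.
daily = (
    plays.assign(date=plays["endTime"].dt.normalize())
         .groupby("date", as_index=False)["msPlayed"].sum()
)
daily["minutes"] = daily["msPlayed"] / 60_000

# A left merge keeps every listening day, even if the other table has gaps.
merged = daily.merge(weather, on="date", how="left")
merged["rainy"] = merged["rain_mm"] > 0
print(merged.groupby("rainy")["minutes"].mean())
```

Swap the weather table for a mood journal or a calendar export and the merge stays the same.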
But Wait—Can’t I Use Spotify’s API?
Yes, you can. Spotify has a slick API that gives you real-time data, audio features, and more. But there’s a catch: it won’t hand over your full history. The recently-played endpoint only returns your last 50 tracks, so for anything older you need the export you downloaded above.
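That said, once you’re comfy, playing with the API is a great next project. Depending on which endpoints your app can access (check the current API docs, since access to some of them has changed over time), you can pull song tempo, key, energy, even danceability. You’ll basically become your own music recommender system.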
But for now, stick with your good ol’ JSON files. They’re surprisingly rich.
Let Me Be Honest—You’ll Go Down Rabbit Holes
I started analyzing my own Spotify data just to “learn pandas better.” Three hours later I was plotting emotional arcs through Kanye albums and wondering why I listened to more disco in February.
This is what makes it so effective for learning: you care. These aren’t made-up sales leads or test scores. It’s your life, your routines, your late-night impulse plays.
Python becomes more than code—it becomes a mirror.
Final Touch: Save and Share It
Don’t just run this once and forget it. Turn it into something:
- Save the notebook to GitHub with a README explaining your findings.
- Make a mini-report and send it to a friend. (“Here’s why I’m legally required to see Beyoncé live.”)
- Post your favorite chart on social media. Be that person.
The act of packaging your work forces clarity. It’s what separates learners from builders.
Do this: Add one chart or takeaway you didn’t expect, and share it. Even if just in a DM to yourself.
tl;dr (if you’re the scroll-to-the-bottom type)
- Request your Spotify data from your account settings.
- Load the JSON into a Colab notebook using pandas.
- Ask questions that actually interest you (top artists, listening times, etc.).
- Use basic Python and visualization to explore trends.
- Laugh at your past obsession with sad acoustic songs.
- Share your analysis, even if it’s scrappy—it makes the learning stick.
This isn’t just about Python. It’s about realizing you can take the most ordinary data—your Spotify history—and turn it into something personal, surprising, even a little bit addictive.
You don’t need to be an expert. You don’t need to “wait until you’ve learned more.”
You just need to open a notebook and hit play.
Ready to analyze your soundtrack?