add spotify export script + README
karlicoss committed Oct 25, 2020
1 parent 008f39f commit 0251d89
Showing 3 changed files with 185 additions and 0 deletions.
73 changes: 73 additions & 0 deletions README.org
@@ -0,0 +1,73 @@
#+begin_src python :dir src :results drawer :exports results
import spotifyexport.export as E; return E.make_parser().prog
#+end_src

#+RESULTS:
:results:
Export your personal Spotify data: playlists, saved tracks/albums/shows, etc. as JSON.
:end:

* Setting up
1. The easiest way is =pip3 install --user git+https://github.com/karlicoss/spotifyexport=.

   Alternatively, use =git clone --recursive=, or =git pull && git submodule update --init=. After that, you can run =pip3 install --editable .= from the repository root.
2. To use the API, you need to create a new app on [[https://developer.spotify.com/dashboard/applications][Spotify for Developers]].

   For =redirect_uri=: you can pick pretty much anything, e.g. =https://github.com=. After that you'll get =client_id= and =client_secret=.

3. On the first script run, you'll be prompted to approve the script's access.

   Once approved, the token is saved to =cache_path= (it will be created if it doesn't exist). After that you won't need to enter the password again as long as you pass the same =cache_path=.


* Exporting

#+begin_src python :dir src :results drawer :exports results
import spotifyexport.export as E; return E.make_parser().epilog
#+end_src

#+RESULTS:
:results:

Usage:

*Recommended*: create =secrets.py= keeping your api parameters, e.g.:


: client_id = "CLIENT_ID"
: client_secret = "CLIENT_SECRET"
: redirect_uri = "REDIRECT_URI"
: cache_path = "CACHE_PATH"


After that, use:

: python3 -m spotifyexport.export --secrets /path/to/secrets.py

That way you type less and have control over where you keep your plaintext secrets.

*Alternatively*, you can pass parameters directly, e.g.

: python3 -m spotifyexport.export --client_id <client_id> --client_secret <client_secret> --redirect_uri <redirect_uri> --cache_path <cache_path>

However, this is verbose and prone to leaking your keys/tokens/passwords in shell history.



I *highly* recommend checking exported files at least once just to make sure they contain everything you expect from your export. If not, please feel free to ask or raise an issue!

:end:
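
To make the role of the secrets file concrete, here is a minimal sketch that reads the four parameters from a Python file using the standard library's =runpy=. It only illustrates the general idea and is not necessarily how =--secrets= is implemented; =load_secrets= and =PARAMS= are hypothetical names introduced for this example.

```python
import runpy
import tempfile
from pathlib import Path
from typing import Dict

# Parameter names as listed in this README; PARAMS itself is a hypothetical constant.
PARAMS = ('client_id', 'client_secret', 'redirect_uri', 'cache_path')


def load_secrets(path: str) -> Dict[str, str]:
    """Execute a secrets file and pick out the known parameters (hypothetical helper)."""
    namespace = runpy.run_path(path)  # runs the file and returns its globals as a dict
    missing = [p for p in PARAMS if p not in namespace]
    if missing:
        raise KeyError(f'missing parameters in {path}: {missing}')
    return {p: namespace[p] for p in PARAMS}


# Demo with a throwaway secrets file containing placeholder values.
with tempfile.TemporaryDirectory() as td:
    secrets_file = Path(td) / 'secrets.py'
    secrets_file.write_text(
        'client_id = "CLIENT_ID"\n'
        'client_secret = "CLIENT_SECRET"\n'
        'redirect_uri = "REDIRECT_URI"\n'
        'cache_path = "CACHE_PATH"\n'
    )
    demo_params = load_secrets(str(secrets_file))
    print(demo_params['redirect_uri'])
```

Since the file is executed as Python, it should be kept somewhere only you can read and write.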

* API limitations

- you might want to do a [[https://www.spotify.com/uk/privacy/#privacy-center-control-section][GDPR export]] in addition, just in case

- [[https://developer.spotify.com/documentation/web-api/reference/player/get-recently-played]["Recently played"]] API endpoint *only returns the 50 most recent tracks*, which makes it kind of useless unless you export the data every hour or so.

  If you care about them, it might be a good idea to connect Spotify to Last.fm.

  The GDPR export has more tracks, but also seems incomplete (e.g. my data is missing the first few years).
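
If you do run the export on a schedule, the overlapping =recently_played= lists have to be combined afterwards. The sketch below is one possible way to do that, assuming each item carries the =played_at= timestamp and a =track= object with an =id=, as in Spotify's play history objects, and that all timestamps share one ISO 8601 format. =merge_recently_played= is a hypothetical helper and not part of this package.

```python
from typing import Any, Dict, Iterable, List, Tuple

Json = Dict[str, Any]  # loose alias, mirroring the Json name used in export.py


def merge_recently_played(exports: Iterable[Json]) -> List[Json]:
    """Combine the recently_played lists of several exports, dropping repeated plays."""
    seen: Dict[Tuple[str, str], Json] = {}
    for export in exports:
        for item in export.get('recently_played', []):
            # a play is identified by when it happened and which track it was
            key = (item['played_at'], item['track']['id'])
            seen.setdefault(key, item)  # keep the first copy of each play
    # timestamps in one ISO 8601 format sort chronologically as plain strings
    return sorted(seen.values(), key=lambda i: i['played_at'])


# Two made-up exports that overlap in one play.
first = {'recently_played': [
    {'played_at': '2020-10-25T10:00:00.000Z', 'track': {'id': 'a', 'name': 'Track A'}},
    {'played_at': '2020-10-25T10:05:00.000Z', 'track': {'id': 'b', 'name': 'Track B'}},
]}
second = {'recently_played': [
    {'played_at': '2020-10-25T10:05:00.000Z', 'track': {'id': 'b', 'name': 'Track B'}},
    {'played_at': '2020-10-25T11:00:00.000Z', 'track': {'id': 'a', 'name': 'Track A'}},
]}
merged = merge_recently_played([first, second])
print([m['track']['name'] for m in merged])
```

Any plays that fall between two exports more than 50 tracks apart are still lost, so this only helps if the export runs often enough.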

* Using the data

** TODO need to implement the data access bit
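
Until that exists, the exported JSON can be read directly. The following is a minimal sketch, assuming the top-level =saved_tracks= key produced by =export_json= and the usual shape of Spotify's saved track objects (a =track= with a =name= and a list of =artists=). =saved_track_names= and the sample data are made up for illustration.

```python
import json
from typing import Any, Dict, Iterator, Tuple


def saved_track_names(export: Dict[str, Any]) -> Iterator[Tuple[str, str]]:
    """Yield (artists, track name) pairs from an export (hypothetical helper)."""
    for item in export['saved_tracks']:
        track = item['track']
        artists = ', '.join(a['name'] for a in track['artists'])
        yield artists, track['name']


# A tiny made-up export; a real one would be loaded with json.load from the exported file.
sample_export = json.loads('''
{
  "saved_tracks": [
    {"added_at": "2020-10-25T12:00:00Z",
     "track": {"name": "Song One", "artists": [{"name": "Artist A"}, {"name": "Artist B"}]}},
    {"added_at": "2020-10-24T09:30:00Z",
     "track": {"name": "Song Two", "artists": [{"name": "Artist C"}]}}
  ]
}
''')

names = list(saved_track_names(sample_export))
for artists, name in names:
    print(f'{artists}: {name}')
```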
1 change: 1 addition & 0 deletions setup.py
@@ -12,6 +12,7 @@ def main():
        package_data={pkg: ['py.typed']},

        install_requires=[
            'spotipy', # API
        ],
        extras_require={
            'testing': ['pytest'],
111 changes: 111 additions & 0 deletions src/spotifyexport/export.py
@@ -0,0 +1,111 @@
#!/usr/bin/env python3
import argparse
import json
from functools import lru_cache
import sys
from typing import List

import spotipy # type: ignore[import]

from .exporthelpers import logging_helper
from .exporthelpers.export_helper import Json


logger = logging_helper.logger('spotifyexport')


def _cleanup(j: Json) -> Json:
    '''
    Clean up irrelevant (hopefully?) stuff from the data.
    '''
    # NOTE: for now not used.. maybe make it an optional cmdline flag?
    artists = j['track']['album']['artists'] + j['track']['artists']
    for k in ('external_urls', ):
        for a in artists:
            del a[k]
    for k in ('available_markets', 'images', 'external_urls', 'href', 'uri', 'release_date_precision'):
        del j['track']['album'][k]
    for k in ('available_markets', 'preview_url', 'external_ids', 'external_urls', 'href', 'uri'):
        del j['track'][k]
    return j


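def as_list(api_method) -> List[Json]:
    # Page through a paginated endpoint, 50 items at a time, until everything is collected.
    results: List[Json] = []
    while True:
        offset = len(results)
        cres = api_method(limit=50, offset=offset)
        chunk = cres['items']
        total = cres['total']
        results.extend(chunk)
        logger.debug('%s: collected: %d/%d', api_method, len(results), total)
        # Check after extending, so no extra request is made once all items are in;
        # an empty page also stops the loop, so a wrong 'total' can't make it spin forever.
        if len(results) >= total or len(chunk) == 0:
            break
    return results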


class Exporter:
    SCOPE = 'playlist-read-private,user-library-read,user-read-recently-played'

    def __init__(self, **kwargs) -> None:
        kw = {
            'scope' : self.SCOPE,
            'open_browser': False,
        }
        kw.update(kwargs)
        auth = spotipy.oauth2.SpotifyOAuth(**kw)
        self.api = spotipy.Spotify(auth_manager=auth)

    def export_json(self) -> Json:
        playlists = as_list(self.api.current_user_playlists)
        for p in playlists:
            pid = p['id']
            p['tracks'] = as_list(lambda *args, **kwargs: self.api.playlist_items(*args, playlist_id=pid, **kwargs))
        # todo cleanup stuff??
        return dict(
            saved_tracks=as_list(self.api.current_user_saved_tracks),
            saved_albums=as_list(self.api.current_user_saved_albums),
            saved_shows =as_list(self.api.current_user_saved_shows),
            # NOTE: seems that only supports the most recent 50
            # https://developer.spotify.com/documentation/web-api/reference/player/get-recently-played
            recently_played=self.api.current_user_recently_played(limit=50)['items'],
            playlists =playlists,
        )


def get_json(**params):
    return Exporter(**params).export_json()


def main() -> None:
    p = make_parser()
    args = p.parse_args()

    params = args.params
    dumper = args.dumper
    j = get_json(**params)
    js = json.dumps(j, ensure_ascii=False, indent=1)
    dumper(js)


def make_parser():
    from .exporthelpers.export_helper import setup_parser, Parser
    p = Parser('Export your personal Spotify data: playlists, saved tracks/albums/shows, etc. as JSON.')
    setup_parser(
        parser=p,
        params=[
            'client_id' ,
            'client_secret',
            'redirect_uri' ,
            'cache_path' ,
        ]
    )
    return p


if __name__ == "__main__":
    main()


# todo https://stackoverflow.com/a/30557896 in case of too many requests
