November 15th, 2015
Hi again, some more coding today. My original attempt at this script broke some time ago because themoviedb's page HTML changed. Lesson learned: use an API instead of scraping.
So I rewrote the script, improving it and adding some cool new features. Check out the code/readme on GitHub.
main.py [options]

Options:
  -h, --help            show this help message and exit
  -a ACTOR, --actor=ACTOR
                        filter on actor (not yet implemented)
  -c CATEGORY, --category=CATEGORY
                        category [now_playing, upcoming, top_rated, popular]
  -d DIRECTOR, --director=DIRECTOR
                        filter on director (not yet implemented)
  -g GENRES, --genres=GENRES
                        filter on genres (not yet implemented)
  -l LISTING, --listing=LISTING
                        create email from themoviedb list URL
  -m, --mailres         mail the html to recipients
  -n NUMRES, --numres=NUMRES
                        number of results
  -p, --printres        print the html
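For reference, the option listing above could come from an OptionParser setup roughly like the one below. This is a minimal sketch, not the actual main.py: the `dest` names, the integer type for `--numres`, the `parse_args` helper and the category check are my assumptions here.

```python
from optparse import OptionParser

# Categories as listed in the help text above.
CATEGORIES = ["now_playing", "upcoming", "top_rated", "popular"]


def build_parser():
    """Build a parser mirroring the switches in the help output."""
    parser = OptionParser(usage="%prog [options]")
    parser.add_option("-a", "--actor", dest="actor",
                      help="filter on actor (not yet implemented)")
    parser.add_option("-c", "--category", dest="category",
                      help="category [%s]" % ", ".join(CATEGORIES))
    parser.add_option("-d", "--director", dest="director",
                      help="filter on director (not yet implemented)")
    parser.add_option("-g", "--genres", dest="genres",
                      help="filter on genres (not yet implemented)")
    parser.add_option("-l", "--listing", dest="listing",
                      help="create email from themoviedb list URL")
    parser.add_option("-m", "--mailres", dest="mailres",
                      action="store_true", default=False,
                      help="mail the html to recipients")
    # Assumption: the number of results is parsed as an integer.
    parser.add_option("-n", "--numres", dest="numres", type="int",
                      help="number of results")
    parser.add_option("-p", "--printres", dest="printres",
                      action="store_true", default=False,
                      help="print the html")
    return parser


def parse_args(argv):
    """Parse argv (without the program name) and return the options."""
    parser = build_parser()
    options, _positional = parser.parse_args(argv)
    # Assumption: unknown categories are rejected up front.
    if options.category is not None and options.category not in CATEGORIES:
        parser.error("unknown category: %s" % options.category)
    return options
```

Calling `parse_args(["-c", "upcoming", "-n", "5", "-p"])` would give an options object with `category`, `numres` and `printres` set, and `mailres` left at its default.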
To use it yourself:
- create a flat text file called 'key' with your themoviedb API key;
- create a flat text file called 'recipients' with one or more emails (each one on a new line);
- use a cron job to call the script and send one or more notification emails. I use the weekly.py script for this (note that I also added a customized banner; if you don't want it, modify generate_header in output.py).
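To give an idea of how the two files from the steps above can be used, here is a small sketch that reads them and sends the HTML with all recipients in BCC. The function names, the default file paths and the SMTP host are assumptions for illustration, not the code from the repo.

```python
import smtplib
from email.mime.text import MIMEText


def read_key(path="key"):
    """Return the API key stored in a flat text file."""
    with open(path) as f:
        return f.read().strip()


def read_recipients(path="recipients"):
    """Return one email address per non-empty line."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def build_message(html, subject, sender):
    """Build an HTML mail without any recipient addresses in the headers."""
    msg = MIMEText(html, "html")
    msg["Subject"] = subject
    msg["From"] = sender
    # Only the sender is visible; the real recipients are passed
    # separately to the SMTP server, which is what makes them BCC.
    msg["To"] = sender
    return msg


def send_bcc(msg, sender, recipients, host="localhost"):
    """Send msg to recipients via the SMTP envelope only (hypothetical host)."""
    with smtplib.SMTP(host) as server:
        server.sendmail(sender, recipients, msg.as_string())
```

Because the addresses never end up in the message headers, recipients cannot see each other, and since both files live outside the code they are easy to keep out of version control.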
Some things I picked up:
Apart from the end result, a rich HTML email with new movie details, it's nice to see what you can learn from this kind of project:
- for any data mining, use the API if one is available; relying on HTML is trickier. Themoviedb has a great API, which I already used for my sharemovi.es site. For this exercise I used the friendly tmdbsimple Python wrapper.
- instead of a text file or DB, I use the Python shelve module for caching. I store movie IDs and movie details (info and credits = cast) as key/value pairs, so data for each movie ID is downloaded once, then re-used for printing, mailing, etc. Storing objects is also a pretty flexible solution: you can load them from the shelve and call methods on them. See cache.py for the implementation. I used shelve recently for the Safari new books project too (see here).
- use several classes (files) to decouple the design (cache, mail, output generation, movie objects, movie list handling). The code was easy to extend, for example to work with customized (themoviedb) lists. I only had to write some code to crawl the list page (tmdbsimple does not support lists); the subsequent steps of shelving, printing and emailing worked out of the box once given a list of movie IDs.
- optparse / OptionParser makes CLI switch handling very easy and compact. I use it all the time.
- I learned how to send email from Python using BCC, and how to keep email addresses and the API key out of the code by putting them in files that .gitignore excludes from version control. Pretty important.
- probably a couple more things... leave a comment if you'd like to know more.
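The shelve caching mentioned in the list above boils down to "look up the movie ID, download only on a miss". Here is a minimal sketch of that idea. It is not the actual cache.py: the `MovieCache` class and the `fetch` callable are hypothetical names; in the real script the fetch step would presumably wrap a tmdbsimple call.

```python
import shelve


class MovieCache:
    """Cache movie details on disk, keyed by movie ID."""

    def __init__(self, path, fetch):
        self.path = path    # file name (prefix) of the shelve database
        self.fetch = fetch  # hypothetical callable: movie_id -> details

    def get(self, movie_id):
        """Return cached details, downloading them only the first time."""
        key = str(movie_id)  # shelve keys must be strings
        with shelve.open(self.path) as db:
            if key not in db:
                # Cache miss: fetch once and store the pickled result.
                db[key] = self.fetch(movie_id)
            return db[key]
```

Since shelve pickles whatever you store, the cached value can be a plain dict or a full movie object with its own methods, as long as its class is importable when the shelve is read back.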
II. of a customized list (hacker movies)
You can now add movies from the newsletter to your sharemovi.es watchlist when you are logged in with Facebook: see the small link on the "Released" line, alongside the IMDB link.