To learn more Python, I am making up new scripting exercises these days, and I am also rewriting some Perl scripts I wrote last year. Today: my daily Spanish TV movie email script, discussed here, rewritten and improved in Python.
What is this script about?
This script queries http://www.sincroguia.tv/todas-las-peliculas.html for the movies airing on Spanish TV today. For each title, it crawls the corresponding movie page for additional details. Everything is formatted into a report that I get emailed every day via a cron job on my web host.
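The listing step can also be sketched with only the standard library's `html.parser`. The sketch makes two assumptions. First, as the script's use of `link.previous_sibling` suggests, each movie link is directly preceded by a text node that holds the airing time. Second, the sample markup below is hypothetical and is not copied from sincroguia.tv. It is written for Python 3.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect (time, url, title) for every <a> whose href contains /peliculas/.

    Assumes the airing time is the text right before each link, mirroring
    link.previous_sibling in the original script (hypothetical markup).
    """

    def __init__(self):
        super().__init__()
        self.last_text = ""
        self.movies = []

    def handle_data(self, data):
        # remember the most recent non-blank text; it precedes the next <a>
        if data.strip():
            self.last_text = data.strip()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        url = attrs.get("href") or ""
        if "/peliculas/" not in url:  # skip non-movie links
            return
        self.movies.append((self.last_text[:5], url, attrs.get("title", "")))


sample = (
    '<p>20:00 <a href="/peliculas/poltergeist.html" title="Poltergeist">L63</a>'
    '<br>22:10 <a href="/peliculas/toy_story_3.html" title="Toy Story 3">A3</a>'
    '<br><a href="/contacto.html">Contacto</a></p>'
)
parser = ListingParser()
parser.feed(sample)
for movie in parser.movies:
    print(movie)
```

The real page may nest its markup differently. In that case, BeautifulSoup (as the script uses it) copes with messy HTML better than this hand-rolled parser.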
What is better since last time?
- The old version showed every title in verbose mode, so quickly checking what was on required a lot of scrolling. The script now prints a summary first (example at the end of this post).
- Related: the script now detects the day of the week. On weekdays I only want to know what is on from 8 pm onwards; on weekends I want the movie guide for the whole day (from 9 am).
- The movie pages are parsed more thoroughly, providing more movie info (kudos to sincroguia.tv, the movie info is actually quite good).
- Spanish movie titles have their English counterpart title on the details page, so I pushed this vital piece of information to the top summary. This way I can quickly see which movie it actually is!
- Structured the code as a class (OOP): this is a new trend in my coding lately, and I feel the code gets much cleaner and potentially more reusable. The class is a black box: somebody could just plug it in and call the methods they are interested in. In this case there are only two, but I think it makes the point:
```python
t = TvCine()
t.print_movie_titles()
t.print_movie_details()
```
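The day detection boils down to a small pure function. This sketch is written for Python 3 and takes the date as a parameter so it can be checked for any day. That parameter is an illustrative choice; the original reads `datetime.datetime.today()`. Like the script, it relies on `weekday()` returning 0 for Monday through 6 for Sunday. The function names `start_hour` and `in_window` are hypothetical.

```python
import datetime


def start_hour(day):
    """Return the first hour of interest: 9 on weekends, 20 on weekdays.

    weekday() returns 0 for Monday ... 5 for Saturday, 6 for Sunday.
    """
    return 9 if day.weekday() in (5, 6) else 20


def in_window(time_str, day):
    """True if an 'HH:MM' airing time is at or after start_hour(day)."""
    return int(time_str[:2]) >= start_hour(day)


wednesday = datetime.date(2013, 1, 2)   # the date shown in the sample output
saturday = datetime.date(2013, 1, 5)
print(start_hour(wednesday), start_hour(saturday))                   # 20 9
print(in_window("22:10", wednesday), in_window("10:30", wednesday))  # True False
```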
I still think the methods should be shorter, and over time I want to introduce TDD to make it all more robust, but you have to start somewhere. I think this version is much more readable than the Perl variant (opinions and suggestions are of course welcome in the comments). Btw, "Why Python?" is an interesting read if you are also considering Python after or alongside Perl.
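A first step toward TDD could be to pull a small piece, such as one summary row, out into a pure function and cover it with a `unittest` test case. The sketch below does this for Python 3. The name `summary_line` is hypothetical. The format imitates the `"%-8s"` channel padding in `print_movie_titles`, but with single spaces around the bars.

```python
import unittest


def summary_line(time, channel, title):
    """Format one summary row: time | channel padded to 8 chars | title."""
    return "%s | %-8s | %s" % (time, channel, title)


class SummaryLineTest(unittest.TestCase):
    def test_pads_short_channel_names(self):
        self.assertEqual(summary_line("22:10", "A3", "Toy Story 3"),
                         "22:10 | A3       | Toy Story 3")

    def test_keeps_long_channel_names_intact(self):
        # %-8s pads to 8 characters but never truncates
        self.assertEqual(summary_line("20:25", "Paramount", "Seabiscuit"),
                         "20:25 | Paramount | Seabiscuit")


if __name__ == "__main__":
    # exit=False keeps the interpreter running after the tests finish
    unittest.main(argv=["summary_line_test"], exit=False)
```

Writing the test first and then extracting the function from `print_movie_titles` is one way to shorten the methods without changing the output.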
The script
Without further ado, here is the script (also on GitHub):
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author: Bob Belderbos / written: Dec 2012
# Purpose: get movies aired on Spanish tv to use in 24-hour cronjob
#
import pprint, urllib, re, sys, datetime
from bs4 import BeautifulSoup as Soup


class TvCine(object):

    def __init__(self):
        """ Setup variables, define hour range of which I want to know the movie airing of """
        # if weekday (0-4 - 0 being Monday) show movies from 20-24h, weekend I want to see all movies aired:
        self.weekday = datetime.datetime.today().weekday()
        if self.weekday in [5, 6]:  # 5 = Sat, 6 = Sun
            self.START_TIME = 9
        else:
            self.START_TIME = 20
        # always end at midnight (tomorrow a new day, so a new output from cron)
        self.END_TIME = 00
        self.moviePage = "http://www.sincroguia.tv/todas-las-peliculas.html"
        self.movies = self.parse_movies()
        # pprint.pprint(self.movies); sys.exit()

    def parse_movies(self):
        """ Import the movie URL """
        soup = Soup(self.read_url(self.moviePage))
        movies = []
        for link in soup.find_all("a"):
            time = link.previous_sibling
            try:
                channel = re.sub(r".* - ", "", str(link.contents[0].encode(encoding='UTF-8', errors='strict')))
            except:
                channel = "not_found"
            url = link.get('href')
            title = link.get('title')
            # skip anchors without an href and links that are not movie pages
            if not url or "/peliculas/" not in url:
                continue
            if int(time[:2]) < self.START_TIME:
                continue
            # midnight check (as int, like END_TIME); hour 00 is already skipped by the START_TIME test above
            if int(time[:2]) == self.END_TIME:
                break
            (longTitle, verboseInfo) = self.get_movie_verbose_info(title, url)
            movies.append({
                'time': time[0:6],
                'channel': channel,
                'title': longTitle.encode(encoding='UTF-8', errors='strict'),
                'url': url.encode(encoding='UTF-8', errors='strict'),
                'info': verboseInfo.encode(encoding='UTF-8', errors='strict'),
            })
        return movies

    def get_movie_verbose_info(self, title, url):
        """ Read the movie page in and return the translated title if available and all movie info """
        html = self.read_url(url)
        # try to get the relevant html section of the movie page, if nothing found too bad, move on
        soup = self.filter_relevant_bits(html)
        titleInfo = ficha = contentficha = ""
        lineNum = 0
        if soup:
            for line in soup.li.stripped_strings:
                ficha += line + "\n"
            for line in soup.find_all('li')[1].stripped_strings:
                lineNum += 1
                if lineNum < 3:
                    titleInfo += line + " "  # first two lines: Spanish + English title
                contentficha += line + "\n"
        else:
            titleInfo = title  # fall back to the listing title so the summary is not blank
            ficha = "Not able to obtain movie info for %s" % title
        return (titleInfo, ficha + "\n" + contentficha)

    def read_url(self, url):
        """ Read and return the content of a url """
        f = urllib.urlopen(url)
        html = f.read()
        f.close()
        return html

    def filter_relevant_bits(self, html):
        """ Get the html part that matters from the movie page """
        a = html.split('class="ficha">')
        try:
            movieInfo = a[1].split('<a href="javascript:;" onclick="remote')
        except IndexError:
            return False
        soup = Soup(movieInfo[0])
        return soup

    def print_movie_titles(self):
        """ Print all the movie titles to be aired on Spanish TV today """
        print "I. Movies Spanish TV Today %s:00-%s:00\n" % (self.START_TIME, self.END_TIME)
        for m in self.movies:
            print m['time'], " | ", "%-8s" % m['channel'], " | ", m['title']
        print "\n\n"

    def print_movie_details(self):
        """ Print verbose details for each movie """
        print "II. Details for each movie ... \n"
        for m in self.movies:
            print "+" * 80
            print m['time'], " | ", "%-8s" % m['channel'], " | ", m['title']
            print "+" * 80
            print "URL: \n" + m['url']
            print "\nDetails: \n" + m['info']
            print "\n\n"


### instant
t = TvCine()
t.print_movie_titles()
t.print_movie_details()
```
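For anyone porting the script to Python 3, here is a sketch of `read_url` and the string-splitting part of `filter_relevant_bits`. It uses `urllib.request.urlopen` as a context manager, which guarantees that the response gets closed. The two HTML markers come from the script. Decoding as UTF-8 is an assumption about the site's encoding. Two design choices differ from the original: BeautifulSoup parsing is left out so the sketch needs only the standard library, and the hypothetical `extract_ficha` returns `None` instead of `False` when the section is missing.

```python
import urllib.request


def read_url(url, encoding="utf-8"):
    """Fetch a URL and return its body as text; the with-block closes the response."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode(encoding, errors="replace")


def extract_ficha(html):
    """Return the HTML between the 'ficha' marker and the remote-control link,
    or None when the movie page has no such section (same markers as the script)."""
    parts = html.split('class="ficha">', 1)
    if len(parts) < 2:
        return None
    return parts[1].split('<a href="javascript:;" onclick="remote', 1)[0]


page = ('<div class="ficha"><ul><li>Director: Tobe Hooper</li></ul>'
        '<a href="javascript:;" onclick="remote()">mando</a></div>')
print(extract_ficha(page))              # <ul><li>Director: Tobe Hooper</li></ul>
print(extract_ficha("<p>no info</p>"))  # None
```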
How does the output look?
I can send you a copy, or you can download the script and run it yourself. Here is a snippet (the Spanish accents actually display fine in my terminal and email; I am not sure why they get messed up here):
```
$ python cine_tv.py
I. Movies Spanish TV Today 20:00-0:00

20:00 | L63 | Cine: Poltergeist (Fenómenos extraños) Cine - Terror
20:25 | PARAM | El guerrero americano II: la confrontación (American Ninja 2: The Confrontation)
22:00 | PARAM | Seabiscuit, más allá de la leyenda Cine - Drama
22:00 | L63 | Cine: Poltergeist II (Poltergeist II: The Other Side)
22:00 | T5 | Cine: El equipo A (The A-Team)
22:00 | La2 | El cine de La 2: Oliver Twist Cine - Drama
22:10 | A3 | El peliculón: Toy Story 3 Cine - Animación
22:25 | La6 | Cine: Estado de sitio (The Siege)
22:30 | La1 | Cine: Ocean's Eleven (Ocean's Eleven)
22:30 | Nova | Cine: Homicidio en primer grado (Murder in the First)


II. Details for each movie ...

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
20:00 | L63 | Cine: Poltergeist (Fenómenos extraños) Cine - Terror
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
URL:
http://www.sincroguia.tv/peliculas/poltergeist_fenos_extra_18570664.html

Details:
Director: Tobe Hooper
Intérpretes: JoBeth Williams, Oliver Robins, Heather O'Rourke, Beatrice Straight, Craig T. Nelson
Guión: Steven Spielberg, Michael Grais, Mark Victor
Música: Jerry Goldsmith
Director de Fotografía: Matthew F. Leonetti
Producción: Steven Spielberg, Frank Marshall
Productora: Metro-Goldwyn-Mayer (MGM), SLM Production Group
Idioma Original: Inglés
Nacionalidad: Estados Unidos
Año: 1982
Duración: 114 minutos
Edad: Todos los Públicos

Cine: Poltergeist (Fenómenos extraños)
Cine - Terror
laSexta3
Miércoles 02 de Enero de 2013
Inicio: 20:00 / Fin: 22:00
Terror
Calificación Artística:
Calificación Comercial:
Una familia estadounidense padece fenómenos paranormales en su casa. Al principio, los espíritus se manifiestan moviendo muebles y demás objetos del hogar. Pero pronto se vuelven agresivos y secuestran a la hija pequeña de la familia. Cuando todas las explicaciones científicas y racionales han fracasado, los padres contratan a una espiritista que intentará limpiar la casa y recuperar a la niña.
Producida por Steven Spielberg (quien se rumorea que también dirigió parte del filme), "Poltergeist" fue una de las películas de terror más exitosas de la década de 1980. Recuperó para el cine a Tobe Hooper ("La matanza de Texas"), un nombre legendario del género. La cinta está basada en un capítulo de la serie "La dimensión desconocida" titulado "Niña perdida". "Poltergeist (Fenómenos extraños)" dio pie a dos secuelas y hasta a una serie de televisión.


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
20:25 | PARAM | El guerrero americano II: la confrontación (American Ninja 2: The Confrontation)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
URL:
http://www.sincroguia.tv/peliculas/el_guerrero_americano_ii_la_confrontaci_18511018.html

Details:
..
..
etc. etc.
..
..
```
In closing
I hope this inspires you to come up with coding exercises of your own and share them. Feel free to ping me for ideas and suggestions.
I wish you all a Happy New Year!