Daily movie digest Spanish TV / Part II – rewrite in Python

To learn more Python I have been making up new scripting exercises and rewriting some Perl scripts I wrote last year. Today's candidate is my daily Spanish TV movie email script, discussed in an earlier post, now rewritten and improved in Python.

What is this script about?

This script queries http://www.sincroguia.tv/todas-las-peliculas.html for movies that will be aired on Spanish TV today. For each title it crawls the corresponding URL for additional details. The result is formatted into a digest that is emailed to me every day via a cron job on my web host.
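
To make the first step concrete: the listing page is essentially a long list of links, and only the ones pointing at a film page matter. Here is a rough sketch of that filtering step using only the standard library's `html.parser` (the script itself uses BeautifulSoup). The sample HTML and the class name `MovieLinkParser` are invented for illustration, and the sketch assumes Python 3 and film URLs containing `/peliculas/`:

```python
from html.parser import HTMLParser


class MovieLinkParser(HTMLParser):
    """Collect (href, title) pairs for links that point at a film page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        # Assumption: film detail pages have "/peliculas/" in their URL
        if "/peliculas/" in href:
            self.links.append((href, attributes.get("title")))


# Invented sample markup, only loosely modelled on a listing page
SAMPLE_HTML = """
<p>20:00 <a href="/peliculas/poltergeist.html" title="Poltergeist">Poltergeist - L63</a></p>
<p><a href="/series/otra-cosa.html" title="Otra cosa">Otra cosa - T5</a></p>
"""

parser = MovieLinkParser()
parser.feed(SAMPLE_HTML)
print(parser.links)
```

Only the first link survives the filter; the second one is skipped because its URL does not point at a film page.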

What has improved since last time?

  • Previously all titles were shown in verbose mode, so quickly seeing what was on required a lot of scrolling. The script now prints a summary first (example at the end of this post).
  • Related: the script now detects the day of the week. On weekdays I only want to know what is on from 8pm onwards; at weekends I want the movie guide for the whole day.
  • The movie URLs get parsed more thoroughly, providing more movie info (kudos to sincroguia.tv, the movie info is actually quite good).
  • Spanish movie titles have their English equivalent on the details page, so I pushed this vital piece of information up into the summary. This way I can quickly see which movie it actually is!
  • Structure code in OOP: this is a new trend in my coding lately and I feel the code gets much cleaner and potentially more reusable. The class is a black box: somebody could just plug it in and call the methods he/she is interested in. In this case there are only two, but I think it makes the point:

    t = TvCine()
    t.print_movie_titles()
    t.print_movie_details()

    I still think the methods should be shorter, and over time I want to introduce TDD to make it all more robust, but you have to start somewhere. I think this version is much more readable than the Perl variant (opinions and suggestions are welcome in the comments, of course). By the way, "Why Python?" is an interesting read if you are also considering Python after or alongside Perl.
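
The weekday/weekend rule from the list above is small enough to pull out into a pure function and test on its own, which is one way to start on the TDD goal. This is a minimal sketch, assuming the same hours as the script (9 at weekends, 20 on weekdays). The names `start_hour_for` and `StartHourTest` are made up for illustration and are not part of the script:

```python
import datetime
import unittest

WEEKEND_START_HOUR = 9    # weekend: show (nearly) the whole day's guide
WEEKDAY_START_HOUR = 20   # weekday: only the evening films


def start_hour_for(day):
    """Return the first hour of interest for the given date."""
    # date.weekday(): 0 = Monday ... 5 = Saturday, 6 = Sunday
    if day.weekday() in (5, 6):
        return WEEKEND_START_HOUR
    return WEEKDAY_START_HOUR


class StartHourTest(unittest.TestCase):

    def test_weekday_starts_in_the_evening(self):
        # 2 January 2013 was a Wednesday
        self.assertEqual(start_hour_for(datetime.date(2013, 1, 2)), 20)

    def test_weekend_starts_in_the_morning(self):
        # 5 January 2013 was a Saturday
        self.assertEqual(start_hour_for(datetime.date(2013, 1, 5)), 9)
```

Saved as a module, the tests can be run with `python -m unittest` followed by the module name. Passing the date in, rather than calling `datetime.datetime.today()` inside the function, is what makes the rule testable on any day of the week.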

The script

Without further ado, here it is (also available on GitHub):

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author: Bob Belderbos / written: Dec 2012
# Purpose: get movies aired on Spanish tv to use in 24-hour cronjob
import pprint, urllib, re, sys, datetime
from bs4 import BeautifulSoup as Soup

class TvCine(object):

  def __init__(self):
    """ Setup variables, define hour range of which I want to know the movie airing of """
    # if weekday (0-4 - 0 being Monday) show movies from 20-24h, weekend I want to see all movies aired: 
    self.weekday = datetime.datetime.today().weekday() 
    if self.weekday in [5,6]: # 5 = Sat, 6 = Sun
      self.START_TIME = 9
    else:
      self.START_TIME = 20
    # always end at midnight (tomorrow a new day, so a new output from cron)
    self.END_TIME = 00
    self.moviePage = "http://www.sincroguia.tv/todas-las-peliculas.html" 
    self.movies = self.parse_movies()
    # pprint.pprint(self.movies); sys.exit()

  def parse_movies(self):
    """ Import the movie URL """
    soup = Soup(self.read_url(self.moviePage))
    movies = []
    for link in soup.find_all("a"):
      time = link.previous_sibling
      try:
        # link text looks like "<title> - <channel>"; keep only the channel
        channel = re.sub(r".* - ", "", str(link.contents[0].encode(encoding='UTF-8',errors='strict')))
      except Exception:
        channel = "not_found" 
      url = link.get('href')
      title = link.get('title')
      if not "/peliculas/" in url: continue
      if int(time[:2]) < self.START_TIME: continue
      if time[:2] == self.END_TIME: break
      (longTitle, verboseInfo) = self.get_movie_verbose_info(title, url)
      movies.append({ 'time': time[0:6], 
                      'channel': channel,
                      'title': longTitle.encode(encoding='UTF-8',errors='strict'),
                      'url': url.encode(encoding='UTF-8',errors='strict'), 
                      'info': verboseInfo.encode(encoding='UTF-8',errors='strict'),
                    })
    return movies

  def get_movie_verbose_info(self, title, url):
    """ Read the movie page in and return the translated title if available and all movie info """
    html = self.read_url(url)
    # try to get the relevant html section of the movie page, if nothing found too bad, move on
    soup = self.filter_relevant_bits(html)
    titleInfo = ficha = contentficha = ""
    lineNum = 0
    if soup: 
      for line in soup.li.stripped_strings: 
        ficha += line + "\n"
      for line in soup.find_all('li')[1].stripped_strings:
        lineNum += 1
        # the first two lines of the second list item hold the (translated) title
        if lineNum<3: titleInfo += line + " "
        contentficha += line + "\n"
    else:
      ficha = "Not able to obtain movie info for %s" % title
    return (titleInfo, ficha+"\n"+contentficha)

  def read_url(self, url):
    """ Read and return the content of a url """
    f = urllib.urlopen(url) 
    html = f.read()
    return html

  def filter_relevant_bits(self, html):
    """ Get the html part that matters from the movie page """
    a = html.split('class="ficha">')
    try:
      movieInfo = a[1].split('<a href="javascript:;" onclick="remote')
    except IndexError:
      return False 
    soup = Soup(movieInfo[0]) 
    return soup

  def print_movie_titles(self): 
    """ Print all the movie titles to be aired on Spanish TV today """
    print "I. Movies Spanish TV Today %s:00-%s:00\n" % (self.START_TIME, self.END_TIME)
    for m in self.movies:
      print m['time'], " | ", "%-8s" % m['channel'], " | ", m['title']
    print "\n\n"

  def print_movie_details(self):
    """ Print verbose details for each movie """
    print "II. Details for each movie ... \n" 
    for m in self.movies:
      print "+" * 80
      print m['time'], " | ", "%-8s" % m['channel'], " | ", m['title']
      print "+" * 80
      print "URL: \n" + m['url']
      print "\nDetails: \n" + m['info']
      print "\n\n"

### instantiate the class and print both sections
t = TvCine()
t.print_movie_titles()
t.print_movie_details()
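
The `filter_relevant_bits` method leans on a simple trick: cut the page at a known start marker, then cut the remainder at a known end marker, and keep what is in between. Here is a stripped-down sketch of the same idea on an invented HTML snippet. The function name `between_markers` and the end marker used here are illustrative only:

```python
def between_markers(text, start, end):
    """Return the part of text after start and before end, or None if start is missing."""
    parts = text.split(start)
    if len(parts) < 2:
        return None  # start marker not on the page
    # everything up to the first end marker; the whole tail if end is absent
    return parts[1].split(end)[0]


# Invented sample page, not real markup from the site
page = '<div class="ficha"><li>Tobe Hooper</li></div><div class="footer">ads</div>'

print(between_markers(page, 'class="ficha">', '</div>'))
print(between_markers(page, 'class="missing">', '</div>'))
```

The first call prints the list item between the markers and the second prints `None`. Whether a miss is signalled with `None` or `False` is a matter of taste; either way the caller has to check the result before handing it to the HTML parser.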

How does the output look?

I can send you a sample, or you can download the script and run it yourself. This is a snippet (the Spanish accents actually work in my terminal and email, not sure why they get messed up here):

$ python cine_tv.py 

I. Movies Spanish TV Today 20:00-0:00

20:00   |  L63       |  Cine: Poltergeist (Fenómenos extraños) Cine - Terror 
20:25   |  PARAM     |  El guerrero americano II: la confrontación (American Ninja 2: The Confrontation) 
22:00   |  PARAM     |  Seabiscuit, más allá de la leyenda Cine - Drama 
22:00   |  L63       |  Cine: Poltergeist II (Poltergeist II: The Other Side) 
22:00   |  T5        |  Cine: El equipo A (The A-Team) 
22:00   |  La2       |  El cine de La 2: Oliver Twist Cine - Drama 
22:10   |  A3        |  El peliculón: Toy Story 3 Cine - Animación 
22:25   |  La6       |  Cine: Estado de sitio (The Siege) 
22:30   |  La1       |  Cine: Ocean's Eleven (Ocean's Eleven) 
22:30   |  Nova      |  Cine: Homicidio en primer grado (Murder in the First) 

II. Details for each movie ... 

20:00   |  L63       |  Cine: Poltergeist (Fenómenos extraños) Cine - Terror 

Tobe Hooper
JoBeth Williams, Oliver Robins, Heather O'Rourke, Beatrice Straight, Craig T. Nelson
Steven Spielberg, Michael Grais, Mark Victor
Jerry Goldsmith
Director de Fotografía:
Matthew F. Leonetti
Steven Spielberg, Frank Marshall
Metro-Goldwyn-Mayer (MGM), SLM Production Group
Idioma Original:
Estados Unidos
114          minutos
Todos los Públicos

Cine: Poltergeist (Fenómenos extraños)
Cine - Terror
Miércoles 02 de Enero de 2013
        20:00        / Fin:
Calificación Artística:
Calificación Comercial:
Una familia estadounidense padece fenómenos paranormales en su casa. Al principio, los espíritus se manifiestan moviendo muebles y demás objetos del hogar. Pero pronto se vuelven agresivos y secuestran a la hija pequeña de la familia. Cuando todas las explicaciones científicas y racionales han fracasado, los padres contratan a una espiritista que intentará limpiar la casa y recuperar a la niña.
Producida por Steven Spielberg (quien se rumorea que también dirigió parte del filme), "Poltergeist" fue una de las películas de terror más exitosas de la década de 1980. Recuperó para el cine a Tobe Hooper ("La matanza de Texas"), un nombre legendario del género. La cinta está basada en un capítulo de la serie "La dimensión desconocida" titulado "Niña perdida". "Poltergeist (Fenómenos extraños)" dio pie a dos secuelas y hasta a una serie de televisión.

20:25   |  PARAM     |  El guerrero americano II: la confrontación (American Ninja 2: The Confrontation) 

etc. etc.
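
The aligned columns in the summary come from left-justifying the channel name to a fixed width (`%-8s` in the script). Below is a small sketch of the same idea as a reusable helper. `format_row` is a hypothetical name, and the separator spacing is simplified compared with the real output above:

```python
def format_row(movie):
    """Format one summary line: time, channel padded to 8 characters, title."""
    return "%-5s | %-8s | %s" % (movie["time"], movie["channel"], movie["title"])


# Two rows taken from the sample output above
rows = [
    {"time": "22:00", "channel": "T5", "title": "Cine: El equipo A (The A-Team)"},
    {"time": "22:30", "channel": "Nova", "title": "Cine: Homicidio en primer grado (Murder in the First)"},
]
for row in rows:
    print(format_row(row))
```

Because both channel names are padded to the same width, the second separator lands in the same column on every line, whatever the length of the channel name (up to 8 characters).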

In closing

I hope this inspires you to come up with coding exercises of your own to share. Feel free to ping me for ideas and suggestions.

I wish you all a Happy New Year

© 2009-2015 - Bob Belderbos - All rights reserved.
