A while back, when I was developing MoviePlay (a movie database), I wanted to let users fetch a movie’s details from IMDb (the Internet Movie Database) instead of entering them manually. IMDb does not provide a way for developers to use its data, so I wrote the code below to screen scrape (web scrape) the details from the movie’s page. It gets the title, director, genre, release year, and tag line of the movie, as well as the thumbnail.

Update: The code used here doesn’t work anymore since IMDb’s last design update. The values were obtained by web scraping the pages, which depended heavily on the design of the site remaining relatively the same. Sorry for the inconvenience.



The first group of fields in the code below is used for scraping. _SiteTitle contains the URL without the movie’s number, so concatenating it with _IMDBNo gives the full URL of the movie’s page. _FileContent holds the raw HTML of the page, and htmlCodes is an HTMLCodes object used to remove the HTML tags. The constructor initializes _IMDBNo with the number passed by the user, downloads the HTML page into _FileContent, and initializes htmlCodes.
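Since the original listing is not shown here, the following is only a rough Python sketch of the fields and constructor described above, not the original MoviePlay code. The class name `MoviePage`, the helper `download_page`, the injectable `fetch` parameter, and the URL prefix are all assumptions made for illustration; the htmlCodes helper is left out because the HTMLCodes class is not shown.

```python
from typing import Callable
from urllib.request import urlopen


def download_page(url: str) -> str:
    """Fetch a page and return its HTML as text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class MoviePage:
    """Hypothetical stand-in for the scraper class described in the post."""

    # Assumed URL prefix; the real value of _SiteTitle is not given in the post.
    SITE_TITLE = "https://www.imdb.com/title/"

    def __init__(self, imdb_no: str, fetch: Callable[[str], str] = download_page):
        # Counterpart of _IMDBNo: the movie's number as passed by the user.
        self.imdb_no = imdb_no
        # Prefix + number gives the full URL of the movie's page.
        self.url = self.SITE_TITLE + imdb_no
        # Counterpart of _FileContent: the raw HTML of the page.
        self.file_content = fetch(self.url)
```

Passing a different `fetch` function (for example, one that returns a saved HTML file) makes it possible to try the class without touching the network.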

Scraping is done in the GetInfo method. I used regular expressions to get the data from the raw HTML. I got the movie’s title from the HTML title tag, and for the rest I searched the body for the fields’ descriptions. For example, to get the release date of the movie, I searched for the Date .*.
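To illustrate the idea behind GetInfo, here is a minimal Python sketch of regex-based extraction. It is not the original method: the function names, the labels, and the sample markup (`<h5>` labels inside `<div>` blocks) are hypothetical, and a real page would need patterns matched to its actual layout.

```python
import html
import re
from typing import Dict, Optional

# Hypothetical markup, invented for this example only.
SAMPLE_PAGE = """<html><head><title>Example Movie (1999)</title></head>
<body>
<div class="info"><h5>Director:</h5> <a href="/name/nm0000001/">Jane Doe</a></div>
<div class="info"><h5>Release Date:</h5> 1 January 1999 (USA)</div>
</body></html>"""


def strip_tags(fragment: str) -> str:
    """Remove HTML tags, decode entities, and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", fragment)
    return " ".join(html.unescape(text).split())


def get_title(page: str) -> Optional[str]:
    """Return the text inside the page's <title> tag, if present."""
    match = re.search(r"<title[^>]*>(.*?)</title>", page, re.IGNORECASE | re.DOTALL)
    return strip_tags(match.group(1)) if match else None


def get_field(page: str, label: str) -> Optional[str]:
    """Return the text between a label's closing </h5> and the next </div>."""
    pattern = re.escape(label) + r"\s*</h5>(.*?)</div>"
    match = re.search(pattern, page, re.IGNORECASE | re.DOTALL)
    return strip_tags(match.group(1)) if match else None


def get_info(page: str) -> Dict[str, Optional[str]]:
    """Collect the fields this sketch knows how to find."""
    return {
        "title": get_title(page),
        "director": get_field(page, "Director:"),
        "release_date": get_field(page, "Release Date:"),
    }
```

Calling `get_info(SAMPLE_PAGE)` returns a dictionary with the three fields; a field whose label is not found comes back as `None` rather than raising, which is about the best a scraper can do when the page layout changes.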

The GetPhoto method gets the thumbnail of the movie, which is in the top-left corner in IMDb’s current website theme. To get the URL of the image, I searched for an img tag that contains the movie’s title. This gets me the thumbnail image because IMDb always sets its alt attribute to the name of the movie.
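The thumbnail lookup can be sketched in Python along the same lines. This is an illustration of the approach rather than the original GetPhoto: the function name `get_photo_url` and the sample markup are hypothetical, and the sketch assumes attribute values are double-quoted.

```python
import html
import re
from typing import Optional

# Hypothetical markup, invented for this example only.
SAMPLE_HTML = (
    '<img src="/images/logo.png" alt="Site logo">'
    '<a href="#"><img src="/images/example-movie.jpg" '
    'alt="Example Movie (1999)" height="140"></a>'
)


def get_photo_url(page: str, title: str) -> Optional[str]:
    """Return the src of the first <img> whose alt text contains the title."""
    for tag in re.findall(r"<img\b[^>]*>", page, re.IGNORECASE):
        alt = re.search(r'\salt\s*=\s*"([^"]*)"', tag, re.IGNORECASE)
        src = re.search(r'\ssrc\s*=\s*"([^"]*)"', tag, re.IGNORECASE)
        if not (alt and src):
            continue
        # Decode entities so a title such as "Tom & Jerry" matches "Tom &amp; Jerry".
        if title.lower() in html.unescape(alt.group(1)).lower():
            return src.group(1)
    return None
```

For the sample above, `get_photo_url(SAMPLE_HTML, "Example Movie")` picks the second image, because the first image's alt text does not mention the title.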


    Thanks, but it doesn’t work anymore since IMDb updated their site :C

  3. @Robin, yes, it doesn’t work anymore. I used screen scraping to get the details, which depends a lot on the page’s design staying the same. I’ll check if they provide an API or not.

    Is there an updated version in the meantime?

    This is not a valid or accepted way to get IMDb data since IMDb terms say this: “Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site.”

    How to search by name instead of title?