Please help! Writing this in Python

Data Mining the Internet Movie Database

Websites like the Internet Movie Database (www.imdb.com) maintain extensive information about movies and actors. If you search for a movie on the website, a web page showing information about the movie is displayed. It also shows all the actors in the movie. If you click on the link for an actor, you are taken to an actor’s page, where you can find information about him or her, including the movies the actor has appeared in. This assignment should give you some insight into the working of such websites. Here is what we’d like to do with the data:

(a) Given two movie titles, each representing the set of actors in that movie:

i. Find all the actors in those movies: i.e., A union B (A | B).

ii. Find the common actors in the two movies: i.e., A intersection B (A & B).

iii. Find the actors who are in either of the movies but not both: symmetric difference (entered as A - B; in Python set notation, A ^ B).

(b) Given an actor’s name, find all the actors with whom he or she has acted.
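As a minimal sketch of the operations in (a) and (b), assuming each cast is already held in a Python set: the two casts below are copied from the sample data in this exercise, and the helper name `co_actors` is hypothetical. In Python's set notation `|` is union, `&` is intersection, and `^` is symmetric difference.

```python
# Casts of two movies, taken from the sample data in this exercise.
oceans_eleven = {"Brad Pitt", "Julia Roberts", "George Clooney", "Matt Damon"}
the_departed = {"Leonardo Di Caprio", "Matt Damon"}

all_actors = oceans_eleven | the_departed      # (a) i.   union
common_actors = oceans_eleven & the_departed   # (a) ii.  intersection
in_one_only = oceans_eleven ^ the_departed     # (a) iii. symmetric difference


def co_actors(actor, casts):
    """Return everyone who shares at least one cast with `actor` (hypothetical helper)."""
    together = set()
    for cast in casts:
        if actor in cast:
            together |= cast
    together.discard(actor)  # an actor is not his or her own co-actor
    return together


print(sorted(common_actors))
print(sorted(co_actors("Matt Damon", [oceans_eleven, the_departed])))
```

Keeping each cast as a set means all three operations in (a) are single expressions, and (b) reduces to merging every cast that contains the given actor.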

The data are available as a huge, compressed text file (at www.imdb.com/interfaces) that lists each actor followed by his or her movies and the year the movies were made.

Here is a small sample (also available on the text website) that you can work with for this exercise:

Brad Pitt, Meet Joe Black (1998), Oceans Eleven (2001), Se7en (1995), Mr & Mrs Smith (2005)
Tom Hanks, Sleepless in Seattle (1993), Catch Me If You Can (2002), You’ve got mail (1998)
Meg Ryan, You’ve got mail (1998), Sleepless in Seattle (1993), When Harry Met Sally (1989)
Anthony Hopkins, Hannibal (2001), The Edge (1997), Meet Joe Black (1998), Proof (2005)
Alec Baldwin, The Edge (1997), Pearl Harbor (2001)
Angelina Jolie, Bone Collector (1999), Lara Croft Tomb Raider (2001), Mr & Mrs Smith (2005)
Denzel Washington, Bone Collector (1999), American Gangster (2007)
Julia Roberts, Pretty Woman (1990), Oceans Eleven (2001), Runaway Bride (1999)
Gwyneth Paltrow, Shakespeare in Love (1998), Bounce (2000), Proof (2005)
Russell Crowe, Gladiator (2000), Cinderella Man (2005), American Gangster (2007)
Leonardo Di Caprio, Titanic (1997), The Departed (2006), Catch Me If You Can (2002)
Tom Cruise, Mission Impossible (1996), Jerry Maguire (1996), A Few Good Men (1992)
George Clooney, Oceans Eleven (2001), Intolerable Cruelty (2003)
Matt Damon, Good Will Hunting (1997), The Departed (2006), Oceans Eleven (2001)
Ben Affleck, Bounce (2000), Good Will Hunting (1997), Pearl Harbor (2001)
Morgan Freeman, Bone Collector (1999), Se7en (1995), Million Dollar Baby (2004)
Julianne Moore, Assassins (1995), Hannibal (2001)
Salma Hayek, Desperado (1995), Wild Wild West (1999)
Will Smith, Wild Wild West (1999), Hitch (2005), Men in Black (1997)
Renee Zellweger, Me-Myself & Irene (2000), Jerry Maguire (1996), Cinderella Man (2005)
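One way to load data in this format is sketched below. It assumes each record sits on its own line with fields separated by a comma and a space; the names `SAMPLE` and `load_movies` are illustrative and not part of the assignment.

```python
import csv
import io

# Three records from the sample, one actor per line (assumed layout).
SAMPLE = """\
Brad Pitt, Meet Joe Black (1998), Oceans Eleven (2001), Se7en (1995), Mr & Mrs Smith (2005)
Anthony Hopkins, Hannibal (2001), The Edge (1997), Meet Joe Black (1998), Proof (2005)
Angelina Jolie, Bone Collector (1999), Lara Croft Tomb Raider (2001), Mr & Mrs Smith (2005)
"""


def load_movies(lines):
    """Map each movie title to the set of actors listed for it."""
    movies = {}
    # skipinitialspace drops the blank that follows each comma.
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # ignore empty lines
        actor, titles = row[0].strip(), row[1:]
        for title in titles:
            movies.setdefault(title.strip(), set()).add(actor)
    return movies


movies = load_movies(io.StringIO(SAMPLE))
print(sorted(movies["Meet Joe Black (1998)"]))
```

Indexing by movie title rather than by actor makes the cast of any movie a direct dictionary lookup, which is what the set operations in part (a) need.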

Repeatedly prompt the user until some sentinel is entered. If two movies are entered, they should be separated by the appropriate operator (|, &, or -) to indicate the set operation to be performed (union, intersection, or symmetric difference, respectively). If an actor is entered, find all the actors with whom he or she has appeared in a movie.
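The prompt loop could be sketched as follows. The sentinel `exit`, the prompt text, and the names `parse_query`, `run`, and `demo` are assumptions made for illustration. Because titles such as "Mr & Mrs Smith (2005)" contain an operator character themselves, this sketch tries every operator position until both sides are known titles.

```python
import re


def parse_query(text, movies):
    """Return (left, operator, right) if text is 'movie OP movie', else None."""
    for match in re.finditer(r"[&|-]", text):
        left = text[:match.start()].strip()
        right = text[match.end():].strip()
        if left in movies and right in movies:
            return left, match.group(), right
    return None


def run(movies, read=input, sentinel="exit"):
    """Prompt until the sentinel is entered; return the list of results."""
    results = []
    while True:
        text = read("query> ").strip()
        if text == sentinel:
            return results
        query = parse_query(text, movies)
        if query is None:
            results.append(None)  # not a movie query; could be an actor name
            continue
        left, op, right = query
        a, b = movies[left], movies[right]
        results.append(a | b if op == "|" else a & b if op == "&" else a ^ b)


# Tiny hand-made table so the sketch runs on its own.
demo = {
    "Mr & Mrs Smith (2005)": {"Brad Pitt", "Angelina Jolie"},
    "Se7en (1995)": {"Brad Pitt", "Morgan Freeman"},
}
answers = iter(["Mr & Mrs Smith (2005) & Se7en (1995)", "exit"])
print(run(demo, read=lambda prompt: next(answers)))
```

Passing the reader in as a parameter keeps the loop testable without a keyboard; in normal use `run(movies)` falls back to the built-in `input`.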

Solution

import csv
import re
import sys

# Path to the data file: one actor per line, followed by his or her movies.
DATA_FILE = '/home/naresh/Desktop/file.txt'


def load_movies(path):
    """Return a dict mapping each movie title to the set of its actors."""
    movies = {}
    with open(path, 'r', newline='') as csvfile:
        # skipinitialspace drops the blank that follows each comma.
        for row in csv.reader(csvfile, skipinitialspace=True):
            if not row:
                continue
            actor = row[0].strip()
            for title in row[1:]:
                movies.setdefault(title.strip(), set()).add(actor)
    return movies


def parse_query(inp, movies):
    """Split 'movie1 <op> movie2' into (movie1, op, movie2), or return None.

    Titles such as 'Mr & Mrs Smith (2005)' contain operator characters, so
    every operator position is tried until both sides are known titles.
    """
    for match in re.finditer(r'[&|-]', inp):
        left = inp[:match.start()].strip()
        right = inp[match.end():].strip()
        if left in movies and right in movies:
            return left, match.group(), right
    return None


def generate_set(first, op, second, movies):
    """Print the result of the chosen set operation on the two casts."""
    c, d = movies[first], movies[second]
    if op == '|':
        print('All the actors in those movies:')
        print(c.union(d))
    elif op == '&':
        print('The common actors in the two movies:')
        print(c.intersection(d))
    else:
        print('The actors who are in either of the movies but not both:')
        print(c.symmetric_difference(d))


def co_actors(actor, movies):
    """Return everyone who shares at least one movie with the given actor."""
    result = set()
    for cast in movies.values():
        if actor in cast:
            result |= cast
    result.discard(actor)
    return result


def main():
    movies = load_movies(DATA_FILE)
    actors = set().union(*movies.values())
    while True:
        print('Enter 2 movies with either of these (&,|,-) operators, '
              'an actor name, or type exit to stop the program:')
        inp = input().strip()
        if inp == 'exit':
            sys.exit()
        query = parse_query(inp, movies)
        if query:
            first, op, second = query
            generate_set(first, op, second, movies)
        elif inp in actors:
            # Each co-actor is printed once, even if several movies are shared.
            for name in sorted(co_actors(inp, movies)):
                print(name)
        else:
            print('entered value is not found')


if __name__ == '__main__':
    main()
