This week, you'll apply your knowledge of data collections to sentiment analysis, a common technique applied to movie, product, and business reviews, as well as social media posts.

For this assignment, you'll determine whether movie reviews are positive or negative.

Modify the wordfreq.py program described in the textbook (available at http://mcsp.wartburg.edu/zelle/python/ppics2/code/chapter11/) to evaluate whether a particular review is positive, negative, or neutral.

Your modified program should:

exclude 'stop' words from your word counts, using the list below:

a, an, and, as, at, be, but, etc, for, in, it, its, is, of, or, so, such, the, this, to, with

print the remaining top 25 words, along with their frequency,

print the top 5 positive and top 5 negative words, along with their frequency,

calculate and display a sentiment score for the review, where the score is incremented (+1) for each positive word in the review and decremented (-1) for each negative word.
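
The scoring rule in the last requirement can be sketched as follows. This is a minimal illustration, not the assignment's reference solution: the names `tokenize`, `score_review`, and `label`, and the tiny `POSITIVE` and `NEGATIVE` sets, are hypothetical stand-ins for the full positive and negative word lists.

```python
import string

# Stop words taken from the assignment's list.
STOP_WORDS = {"a", "an", "and", "as", "at", "be", "but", "etc", "for", "in", "it",
              "its", "is", "of", "or", "so", "such", "the", "this", "to", "with"}

# Hypothetical sample lexicons; a real run would load the full word lists.
POSITIVE = {"good", "great", "love", "wonderful"}
NEGATIVE = {"bad", "boring", "awful", "hate"}

def tokenize(text):
    """Lowercase the text, replace punctuation with spaces, and split into words."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return text.lower().translate(table).split()

def score_review(text, positive=POSITIVE, negative=NEGATIVE):
    """Add 1 for each positive word and subtract 1 for each negative word."""
    score = 0
    for word in tokenize(text):
        if word in STOP_WORDS:
            continue
        if word in positive:
            score += 1
        elif word in negative:
            score -= 1
    return score

def label(score):
    """Map a numeric score to 'positive', 'negative', or 'neutral'."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `label(score_review("A great film, but the ending was boring."))` gives `"neutral"`, because one positive word and one negative word cancel out.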

Solution

# wordfreq.py

def byFreq(pair):
    return pair[1]

# list of stop words
stopWords = ["a", "an", "and", "as", "at", "be", "but", "etc", "for", "in", "it",
             "its", "is", "of", "or", "so", "such", "the", "this", "to", "with"]

# Store the frequency of each positive and negative word
posReviewCount = {}
negReviewCount = {}

# Returns True if inputWord is a stop word.
# (Storing the stop words in a set, or as dictionary keys, would make this lookup faster.)
def isStopWord(inputWord):
    return inputWord in stopWords

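def countWordFrequency(filename, frequencyDict):
    # Skip the first 35 lines of the file (presumably a header in the word-list files)
    with open(filename, 'r', encoding='utf-8', errors='ignore') as infile:
        textLines = infile.read().splitlines()[35:]
    for line in textLines:
        line = line.lower()
        # Replace punctuation with spaces before splitting into words
        for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
            line = line.replace(ch, ' ')
        words = line.split()
        # Count each word once per occurrence, skipping stop words
        for word in words:
            if isStopWord(word):
                continue
            frequencyDict[word] = frequencyDict.get(word, 0) + 1
    return frequencyDict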


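# Prints the top n words in the dictionary, ordered by frequency
def printTopWords(frequencyDict, n):
    items = list(frequencyDict.items())
    items.sort()                          # alphabetical order, used to break ties
    items.sort(key=byFreq, reverse=True)  # most frequent first
    # Never index past the end if the dictionary has fewer than n words
    for i in range(min(n, len(items))):
        word, count = items[i]
        print("{0:<15}{1:>5}".format(word, count))
    print()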

def calculateReviewScore(inputReview, posReviewCount, negReviewCount):
    inputReview = inputReview.lower()
    # Replace punctuation with spaces before splitting into words
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        inputReview = inputReview.replace(ch, ' ')
    inputWords = inputReview.split()
    score = 0
    for word in inputWords:
        if isStopWord(word):
            continue
        # +1 for each positive word, -1 for each negative word
        if word in posReviewCount:
            score = score + 1
        elif word in negReviewCount:
            score = score - 1
    return score

def main():
    print("This program analyzes word frequency in a file")
    print("and prints a report on the n most frequent words.\n")

    countWordFrequency("negative-words.txt", negReviewCount)
    countWordFrequency("positive-words.txt", posReviewCount)

    print("Top 5 positive words are : ")
    printTopWords(posReviewCount, 5)

    print("Top 5 negative words are : ")
    printTopWords(negReviewCount, 5)

    print("\nEnter a review (enter an empty line to finish) : ")
    # Read the review line by line until the user enters an empty line
    multiLines = []
    while True:
        line = input()
        if line:
            multiLines.append(line)
        else:
            break
    inputReview = '\n'.join(multiLines)

    reviewScore = calculateReviewScore(inputReview, posReviewCount, negReviewCount)
    print("Sentiment score for the review is", reviewScore)

if __name__ == '__main__':
    main()
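
The listing above does not print the 25 most frequent remaining words of the review itself. One possible sketch of that step is shown below, assuming the review is available as a single string. The names `top_words` and `print_report` are hypothetical and do not appear in the textbook program, and `collections.Counter` is swapped in here for the manual dictionary counting used above.

```python
from collections import Counter

# Stop words from the assignment's list, stored in a set for fast lookup.
STOP_WORDS = frozenset(["a", "an", "and", "as", "at", "be", "but", "etc", "for",
                        "in", "it", "its", "is", "of", "or", "so", "such", "the",
                        "this", "to", "with"])

def top_words(text, n=25):
    """Return up to n (word, count) pairs, most frequent first, without stop words."""
    text = text.lower()
    # Replace punctuation with spaces, as the listing above does.
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        text = text.replace(ch, " ")
    words = [w for w in text.split() if w not in STOP_WORDS]
    return Counter(words).most_common(n)

def print_report(pairs):
    """Print each word and its count in two aligned columns."""
    for word, count in pairs:
        print("{0:<15}{1:>5}".format(word, count))
```

Calling `print_report(top_words(inputReview))` after the review has been read would then list its most frequent non-stop words.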
