Code is written in Python.
The file can be downloaded here: https://relate.cs.illinois.edu/course/cs101sp17/f/media/batting.csv
Calculating baseball statistics in a file (5 points)

The Lahman Baseball Database is a comprehensive database of Major League Baseball statistics. The journalist Sean Lahman provides all of this data freely to the public. We will make use of some of his data in this assignment. If you would like to learn more about the database, you can visit his website.

We provide you with a CSV file named batting.csv. It contains the annual batting performance data for all Major League Baseball players dating back to the year 1871. The first row in the file is a header indicating what data is stored in each column of the file. For example, column 12 is labeled "HR" and contains the number of home runs the player hit that year. Each of the next 99,846 lines contains a comma-separated list of the data for that player and year. For example, the fifth line in the file indicates that a player with the ID allisdo01 hit 2 home runs in 1871.

You should download batting.csv and place it in the same directory as your Python code. Your job will be to write a Python program that finds the player ID of the player with the highest total career RBIs of all time. But be careful: your program should work with any similarly formatted CSV file.

Data Input: Opening the file

First, you will need to read in the data file. You can do this by opening the file using the open function and iterating through each line (remember that you can use either read or readlines; I strongly urge the latter of these). To parse the data contained in each line, you will need to use the split method. We are interested in two columns: playerID and RBI. (Your program should skip the header in the file and completely ignore any lines where the RBI column does not contain a digit.) You should create an accumulator dictionary called career_rbis that maps each player ID string to an integer representing the total number of RBIs for that player. As you iterate through the file, you should update the career_rbis dictionary.
Duplicate entries should be ignored at this point.

Data Processing: Finding the most RBIs

After reading in the data and generating the career_rbis dictionary, you should next iterate through the dictionary to find the player with the most career RBIs (thus summing duplicate entries). You will need two accumulator variables to track both the most RBIs you've seen so far AND the player having that many RBIs. Store an integer representing the highest number of RBIs in a variable named max_rbis and the corresponding player ID string in a variable named max_player.

DO NOT try to write your code here. Debugging it here will be very difficult. You should write and test your code on your own computer.

Using files and directories

When you submit code here, you should use open with no directory path. On your own machine things may behave differently. Briefly, the best thing to do is to figure out where Python is running and move your file batting.csv there. Otherwise you can try to find out where your file is located and refer to it directly using the code shared in lecture. You can also use the smaller batting_test.csv to test your code.

Your submission should include the following variables defined correctly: career_rbis, max_rbis, max_player

Solution
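The two steps in the problem statement (accumulate RBIs per player, then scan the dictionary for the maximum) can be sketched in plain Python as follows. This is a minimal sketch, not an official solution: the helper names total_career_rbis and find_max_player and the small sample data are made up for illustration, and the column positions are looked up from the header row instead of being hard-coded.

```python
def total_career_rbis(lines):
    """Sum RBIs per player from CSV lines; the first line must be the header."""
    header = lines[0].strip().split(",")
    id_col = header.index("playerID")
    rbi_col = header.index("RBI")

    career_rbis = {}
    for line in lines[1:]:  # skip the header row
        fields = line.strip().split(",")
        if len(fields) <= max(id_col, rbi_col):
            continue  # blank or truncated line
        rbi = fields[rbi_col]
        if not rbi.isdigit():
            continue  # ignore lines where the RBI column is not a number
        player = fields[id_col]
        career_rbis[player] = career_rbis.get(player, 0) + int(rbi)
    return career_rbis


def find_max_player(career_rbis):
    """Return (player ID, total) for the largest total, using two accumulators."""
    max_rbis = 0
    max_player = None
    for player, rbis in career_rbis.items():
        if max_player is None or rbis > max_rbis:
            max_rbis = rbis
            max_player = player
    return max_player, max_rbis


# Made-up sample in the same layout as batting.csv (hypothetical players).
sample_lines = [
    "playerID,yearID,stint,teamID,lgID,G,AB,R,H,2B,3B,HR,RBI\n",
    "aaa01,1871,1,XX1,NA,10,40,5,12,1,0,0,7\n",
    "bbb01,1871,1,XX1,NA,12,50,9,15,2,1,1,11\n",
    "aaa01,1872,1,XX1,NA,20,80,10,25,3,0,2,9\n",
    "ccc01,1872,1,XX2,NA,3,9,0,1,0,0,0,\n",  # empty RBI field: skipped
]

career_rbis = total_career_rbis(sample_lines)
max_player, max_rbis = find_max_player(career_rbis)
print(max_player, max_rbis)
```

To run this against the real data, replace sample_lines with the list returned by readlines() on open("batting.csv"), assuming batting.csv sits in the directory where Python is running.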
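import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Load the batting data; the assignment asks for the file name with no directory path.
batting = pd.read_csv('batting.csv')
# The original filter condition is missing; keeping rows that have a player ID is an assumption.
batting = batting[batting['playerID'].notna()]
# Keep only the columns of interest ('lgID' is the league ID column).
batting = batting[['playerID', 'yearID', 'stint', 'teamID', 'lgID', 'G', 'AB', 'R', 'H', '2B', '3B', 'HR', 'RBI', 'SB', 'CS', 'BB', 'SO']]
# Index by player and RBI count so that a (playerID, RBI) pair selects a season.
batting = batting.set_index(['playerID', 'RBI']).sort_index()
# Games played by birdsda01 in the season with 24 RBIs (RBI is numeric, not the string '24').
print(batting.loc[('birdsda01', 24), 'G'])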
def plot_spending_rbi(batting, playerID):
    # All rows for one player; after xs() the remaining index level is RBI.
    batting_playerID = batting.xs(playerID)
    fig, ax = plt.subplots()
    for rbi, row in batting_playerID.iterrows():
        if pd.isna(rbi):
            continue  # no RBI recorded for this row
        if row['teamID'] == 'BOS':
            # Highlight and label the seasons played for team 'BOS'.
            ax.scatter(rbi, row['G'], color="#FF6666", s=200)
            ax.annotate(str(row['yearID']), (rbi, row['G']),
                        bbox=dict(boxstyle="round", color="#FF6666"),
                        xytext=(-30, 30), textcoords='offset points',
                        arrowprops=dict(arrowstyle="->", connectionstyle="angle,angleA=0,angleB=90,rad=10"))
        else:
            ax.scatter(rbi, row['G'], color="grey", s=200)
    # Show RBI ticks as whole numbers.
    ax.xaxis.set_major_formatter(FuncFormatter(lambda x, pos: '%d' % x))
    ax.tick_params(axis='x', labelsize=15)
    ax.tick_params(axis='y', labelsize=15)
    ax.set_xlabel('RBI', fontsize=20)
    ax.set_ylabel('Number of games', fontsize=20)
    ax.set_title('rbi - games: ' + str(playerID), fontsize=25, fontweight='bold')
    plt.show()

# The player ID must be passed as a string.
plot_spending_rbi(batting, 'birdsda01')
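# Regression models for runs scored (R). BA, OBP and SLG are not columns in the
# batting file, so they are derived here. OBP is a simplified version (an
# assumption) that ignores HBP and SF, which are not among the selected columns.
import statsmodels.formula.api as sm
hitters = batting[batting['AB'] > 0].copy()
hitters['BA'] = hitters['H'] / hitters['AB']
hitters['OBP'] = (hitters['H'] + hitters['BB']) / (hitters['AB'] + hitters['BB'])
hitters['SLG'] = (hitters['H'] + hitters['2B'] + 2 * hitters['3B'] + 3 * hitters['HR']) / hitters['AB']
# First model
runs_reg_model1 = sm.ols("R ~ OBP + SLG + BA", hitters)
runs_reg1 = runs_reg_model1.fit()
# Second model
runs_reg_model2 = sm.ols("R ~ OBP + SLG", hitters)
runs_reg2 = runs_reg_model2.fit()
print(runs_reg1.summary())
print(runs_reg2.summary())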
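For comparison, the three variables the assignment asks for (career_rbis, max_rbis, max_player) can also be produced with pandas by grouping on playerID. This is only a sketch under stated assumptions: the in-memory sample_csv and its player IDs are made up, and the assignment itself asks for open, readlines and split rather than pandas.

```python
import io

import pandas as pd

# Made-up sample with the two columns of interest (hypothetical players).
sample_csv = io.StringIO(
    "playerID,yearID,RBI\n"
    "aaa01,1871,7\n"
    "bbb01,1871,11\n"
    "aaa01,1872,9\n"
    "ccc01,1872,\n"  # missing RBI value
)

batting_sample = pd.read_csv(sample_csv)
# Ignore rows where RBI is missing, as the assignment requires.
batting_sample = batting_sample.dropna(subset=["RBI"])

# Sum RBIs over all rows for each player.
career_totals = batting_sample.groupby("playerID")["RBI"].sum()

career_rbis = {player: int(total) for player, total in career_totals.items()}
max_player = career_totals.idxmax()
max_rbis = int(career_totals.max())
print(max_player, max_rbis)
```

Players whose only rows have a missing RBI value never enter career_rbis here, which matches the rule of ignoring those lines entirely.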

