This what i have so far loading libraries import numpy as np

This what i have so far

#loading libraries

import numpy as np

import csv

with open(\'energy.csv\', \'rb\') as csvfile:

lines = csv.reader(csvfile)

for row in lines:

print \', \'.join(row)

import csv

with open(\'energy.csv\', \'rb\') as csvfile:

lines = csv.reader(csvfile)

for row in lines:

print \', \'.join(row)

# define column names

names = [\'Countries\', \'Years\', \'Number of Countries\', \'Mean\', \'Small Production\',

\'Average Production\', \'Large Production\']

# create design matrix X and target vector y

X = np.array(df.ix[:, 0:4]) # end index is exclusive

y = np.array(df[\'Countries\']) # another way of indexing a pandas df

Assignment:

The file energy.csv contains the values of energy generated for each country in Terawatt Hours for different years(fromOECD Factbook 2011: Economic, Environmental and Social Statistics).

The data contains missing values (denoted by..) and also aggregate rows for the EU, OECD, and World countries.

Requirements

You are to create a program in Python that performs the following:

1.Loads the energy.csv file (assume it\'s in the current directory) and create a DataFrame object from it.

2.Replaces the missing values with the mean energy production for that country.

3.Removes the data for the aggregate values (EU27, OECD, World)

4.Adds a new column called \'Continent\' and fills it in with the continent that corresponds to each country.

5.Creates a DataFrame that contains the continent name as the index and the following columns:

a.\'num_countries\'=numberof countries for this continent

b.\'mean\'=mean energy production of the countries in this continent

c.\'small_production \' =1 if continent mean is less than the mean production of all countries minus one standard deviation; 0 otherwise

d.\'avg_production\' = 1 if continent mean is greater than the mean production of all countries minus one standard deviation , but less than the mean production of all countries plus one standard deviation; 0 otherwise

e.\'large_production\' = 1 if continent mean is greater than the mean production of all countries plus one standard deviation; 0 otherwise

6.Display the new DataFrame to the screen

1. The name of your source code file should be DataPrep.py. All your code should be withina single file.

2.You need to use the pandas DataFrame object for storing data.

3.Your code should follow good coding practices, including good useof whitespace and use of both inline and block comments.

4.You need to use meaningful identifier names that conform to standard naming conventions.

5.At the top of each file, you need to put in a block comment with the following information: your name,

date, course name, semester, and assignment name.

Data File: energy.csv

Continent Map?

u\'Australia\':\'Australia\',
u\'Austria\':\'Europe\',
u\'Belgium\':\'Europe\',
u\'Canada\':\'North America\',
u\'Chile\':\'South America\',
u\'CzechRepublic\':\'Europe\',
u\'Denmark\':\'Europe\',
u\'Estonia\':\'Europe\',
u\'Finland\':\'Europe\',
u\'France\':\'Europe\',
u\'Germany\':\'Europe\',
u\'Greece\':\'Europe\',
u\'Hungary\':\'Europe\',
u\'Iceland\':\'Europe\',
u\'Ireland\':\'Europe\',
u\'Israel\':\'Asia\',
u\'Italy\':\'Europe\',
u\'Japan\':\'Asia\',
u\'Korea\':\'Asia\',
u\'Luxembourg\':\'Europe\',
u\'Mexico\':\'North America\',
u\'Netherlands\':\'Europe\',
u\'NewZealand\':\'Oceania\',
u\'Norway\':\'Europe\',
u\'Poland\':\'Europe\',
u\'Portugal\':\'Europe\',
u\'SlovakRepublic\':\'Europe\',
u\'Slovenia\':\'Europe\',
u\'Spain\':\'Europe\',
u\'Sweden\':\'Europe\',
u\'Switzerland\':\'Europe\',
u\'Turkey\':\'Asia\',
u\'UnitedKingdom\':\'Europe\',
u\'UnitedStates\':\'North America\',
u\'Brazil\':\'South America\',
u\'China\':\'Asia\',
u\'India\':\'Asia\',
u\'Indonesia\':\'Asia\',
u\'RussianFederation\':\'Europe\',
u\'SouthAfrica\':\'Africa\'

1971 1990 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Australia 53 154.3 203.6 209.9 224.3 227.4 226.3 236.3 245.2 247 250.8 257.1 260.9 256.2
Austria 28.2 49.3 59.7 59.9 60.9 60.4 57.7 61.5 63.6 61.7 62.2 64.1 65.6 67
Belgium 33.2 70.3 83.4 82.8 78.6 80.9 83.6 84.4 85.7 84.3 87.5 83.6 89.8 95.1
Canada 221.8 482 578.9 605.6 589.8 601.2 589.5 599.9 626 615.9 642 640.9 603.1 598
Chile 8.5 18.4 38.4 40.1 42.5 43.7 46.8 51.2 52.5 55.3 58.5 59.7 60.7 62.5
CzechRepublic 36.4 62.3 64.2 72.9 74.2 76 82.8 83.8 81.9 83.7 87.8 83.2 81.7 85.3
Denmark 18.6 26 38.9 36.1 37.7 39.3 46.2 40.4 36.2 45.6 39.3 36.6 36.4 38.6
Estonia .. 17.4 8.3 8.5 8.5 8.6 10.2 10.3 10.2 9.7 12.2 10.6 8.8 13
Finland 21.7 54.4 69.5 70 74.5 74.9 84.2 85.8 70.6 82.3 81.2 77.4 72.1 80.4
France 155.8 417.2 521.3 536.1 545.7 553.9 561.8 569.1 571.5 569.3 564.4 569.5 537.4 567.6
Germany 327.2 547.7 552.5 572.3 581.9 582 601.5 608.5 613.4 629.4 629.5 631.2 586.4 614.1
Greece 11.6 34.8 49.4 53.4 53.1 53.9 57.9 58.8 59.4 60.2 62.7 62.9 61.1 60.8
Hungary 15 28.4 37.8 35.2 36.4 36.2 34.1 33.7 35.8 35.9 40 40 35.9 37.4
Iceland 1.6 4.5 7.2 7.7 8 8.4 8.5 8.6 8.7 9.9 12 16.5 16.8 17.1
Ireland 6.3 14.2 21.8 23.7 24.6 24.8 24.9 25.2 25.6 27.1 27.9 29.9 27.9 28.3
Israel 7.6 20.9 39.2 42.7 44 45.5 47 47.3 48.6 50.6 53.8 57 55 57.2
Italy 123.9 213.1 259.3 269.9 271.9 277.5 286.3 295.8 296.8 307.7 308.2 313.5 288.3 295
Japan 382.9 835.5 1028.1 1049 1030.3 1049 1038.4 1068.3 1089.9 1094.8 1125.5 1075.5 1041 1071.3
Korea 10.5 105.4 235.6 288.5 309.1 329.8 343.2 366.6 387.9 402.3 425.9 443.9 451.7 478
Luxembourg 1.3 0.6 0.4 0.4 0.9 2.8 2.8 3.4 3.3 3.5 3.2 2.7 3.2 3.2
Mexico 31 115.8 190 204.2 211.9 215.9 213.7 232.6 243.8 249.5 257.2 261.9 261 268.4
Netherlands 44.9 71.9 86.7 89.6 93.7 95.9 96.8 102.4 100.2 98.4 105.2 107.6 113.5 114.7
NewZealand 15.5 32.3 37.8 39.2 39.9 40.7 40.8 42.5 43 43.6 43.8 43.9 43.5 44.8
Norway 63.5 121.6 122.3 139.6 119.2 130.3 106.8 110.2 137.2 121.2 136.1 141.2 132 124.1
Poland 69.5 134.4 140 143.2 143.7 142.5 150 152.6 155.4 160.8 158.8 154.7 151.1 157
Portugal 7.9 28.4 42.9 43.4 46.2 45.7 46.5 44.8 46.2 48.6 46.9 45.5 49.5 52.7
SlovakRepublic 10.9 25.5 28.1 30.8 31.9 32.2 31 30.5 31.4 31.3 27.9 28.8 25.9 27.3
Slovenia .. 12.4 13.3 13.6 14.5 14.6 13.8 15.3 15.1 15.1 15 16.4 16.4 16.2
Spain 61.6 151.2 205.9 222.2 233.2 241.6 257.9 277.2 288.9 295.5 301.8 311.1 291 295.3
Sweden 66.5 146 154.8 145.2 161.6 146.7 135.4 151.7 158.4 143.3 148.8 149.9 136.6 152.8
Switzerland 31.2 55 68.7 66.1 71.1 65.5 65.4 63.9 57.8 62.1 66.4 67 66.7 66.6
Turkey 9.8 57.5 116.4 124.9 122.7 129.4 140.6 150.7 162 176.3 191.6 198.4 194.8 211.2
UnitedKingdom 255.8 317.8 365.3 374.4 382.4 384.6 395.5 391.3 395.4 393.4 393 384.6 372 378.1
UnitedStates 1703.4 3202.8 3873.6 4025.9 3838.8 4026.4 4054.6 4148.1 4268.9 4275 4323.9 4343 4165.4 4337.1
EU27total .. 2567.8 2914.3 2996.7 3077.5 3099 3187.5 3254.2 3274.5 3318.9 3333.4 3339.4 3178.3 ..
OECDtotal 3836.9 7629.3 9343.3 9726.9 9607.5 9888 9982.6 10252.7 10516.6 10590.3 10790.9 10809.8 10403.1 10772.2
Brazil 51.6 222.8 334.7 349.2 328.2 346 365.3 387.9 403.4 419.9 445.8 463.4 466.5 ..
China 138.4 621.2 1239.8 1356.2 1472.4 1641.4 1908.5 2201 2499.7 2864.3 3276.3 3458.8 3695.9 ..
India 66.4 289.4 536.6 561.2 579.9 597.3 634 666.6 698.2 753.2 813.9 843.3 899.4 ..
Indonesia 1.8 32.7 85.8 93.4 101.4 108.3 114.1 121.3 127.8 132.7 140.9 148.4 155.5 ..
RussianFederation .. 1082.2 845.3 876.5 889.3 889.3 914.3 929.9 951.2 993.9 1013.4 1038.4 990 ..
SouthAfrica 54.6 165.4 200.4 207.8 208.2 215.7 231.2 240.9 242.1 250.9 260.5 255.5 246.8 ..
World 5245 11819.1 14708.1 15403.4 15511.9 16114.5 16701.2 17490.9 18256.4 18960.6 19801.7 20164 20052.8 ..

Solution

#loading libraries
import pandas as pd

df = pd.read_csv(\"energy.csv\",na_values=\"..\")
df.fillna(df.mean(),inplace=True)
df.drop(df.index[[34,35,42]],inplace=True)

This what i have so far #loading libraries import numpy as np import csv with open(\'energy.csv\', \'rb\') as csvfile: lines = csv.reader(csvfile) for row in li
This what i have so far #loading libraries import numpy as np import csv with open(\'energy.csv\', \'rb\') as csvfile: lines = csv.reader(csvfile) for row in li
This what i have so far #loading libraries import numpy as np import csv with open(\'energy.csv\', \'rb\') as csvfile: lines = csv.reader(csvfile) for row in li
This what i have so far #loading libraries import numpy as np import csv with open(\'energy.csv\', \'rb\') as csvfile: lines = csv.reader(csvfile) for row in li

Get Help Now

Submit a Take Down Notice

Tutor
Tutor: Dr Jack
Most rated tutor on our site