pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
We can analyze data in pandas with:
Series
DataFrames
Series:
A Series is a one-dimensional (1-D) labeled array defined in pandas that can store data of any type.
Code #1: Creating Series
# Program to create a Series
import pandas as pd  # Import the pandas library
# Create a Series with Data and Index
a = pd.Series(Data, index=Index)
Here, Data can be:
A scalar value, such as an integer or a string
A Python dictionary of key-value pairs
A NumPy ndarray
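As a minimal sketch of the first and last cases in the list above (the variable names are illustrative and not part of the original examples), a single scalar is repeated for every index label, while a one-dimensional NumPy array is stored element by element:

```python
import numpy as np
import pandas as pd

# A scalar is repeated for every label in the index
s_scalar = pd.Series(5, index=['a', 'b', 'c'])
print(s_scalar)

# A 1-D NumPy ndarray supplies the values; the index defaults to 0, 1, 2, ...
s_array = pd.Series(np.array([2, 3, 4]))
print(s_array)
```

Note that a scalar needs an explicit index so that pandas knows how many elements to create.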
Code #2: When Data contains scalar values
# Program to create a Series with scalar values
import pandas as pd
Data = [1, 3, 4, 5, 6, 2, 9]  # Numeric data
# Creating a Series with default index values
s = pd.Series(Data)
print(s)
# Predefined index values
Index = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
# Creating a Series with predefined index values
si = pd.Series(Data, Index)
print(si)
OUTPUT:
Code #3: When Data contains Dictionary
# Program to create a Series from a dictionary
import pandas as pd
dictionary = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
# Creating a Series from the dictionary; the keys become the index
sd = pd.Series(dictionary)
print(sd)
OUTPUT:
Code #4: When Data contains Ndarray
# Program to create a Series from 2D data
import pandas as pd
Data = [[2, 3, 4], [5, 6, 7]]  # 2D data as a nested list
# Creating the Series; each inner list becomes one element
snd = pd.Series(Data)
print(snd)
OUTPUT:
# We import Pandas as pd into Python
import pandas as pd
# We create a Pandas Series that stores a grocery list
groceries = pd.Series(data = [30, 6, 'Yes', 'No'], index = ['eggs', 'apples', 'milk', 'bread'])
# We display the Groceries Pandas Series
print(groceries)
eggs 30
apples 6
milk Yes
bread No
dtype: object
# We print some information about Groceries
print('Groceries has shape:', groceries.shape)
print('Groceries has dimension:', groceries.ndim)
print('Groceries has a total of', groceries.size, 'elements')
Groceries has shape: (4,)
Groceries has dimension: 1
Groceries has a total of 4 elements
# We print the index and data of Groceries
print('The data in Groceries is:', groceries.values)
print('The index of Groceries is:', groceries.index)
The data in Groceries is: [30 6 'Yes' 'No']
The index of Groceries is: Index(['eggs', 'apples', 'milk', 'bread'], dtype='object')
# We check whether bananas is a food item (an index) in Groceries
x = 'bananas' in groceries
# We check whether bread is a food item (an index) in Groceries
y = 'bread' in groceries
# We print the results
print('Is bananas an index label in Groceries:', x)
print('Is bread an index label in Groceries:', y)
Is bananas an index label in Groceries: False
Is bread an index label in Groceries: True
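The `in` operator on a Series tests the index labels only, not the stored values. The following is a small sketch of that distinction. Using `tolist()` and `isin()` to test the values is one possible approach chosen here for illustration, not something the original example does:

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# 'Yes' is a stored value, not an index label, so this check looks in the wrong place
label_check = 'Yes' in groceries
print('Is Yes an index label:', label_check)

# To test the stored values instead, convert to a list or use isin()
value_check = 'Yes' in groceries.tolist()
print('Is Yes a stored value:', value_check)
print('Any element equal to Yes:', groceries.isin(['Yes']).any())
```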
# We access elements in Groceries using index labels:
# We use a single index label
print('How many eggs do we need to buy:', groceries['eggs'])
print()
# We can access multiple index labels
print('Do we need milk and bread:\n', groceries[['milk', 'bread']])
print()
How many eggs do we need to buy: 30
Do we need milk and bread:
milk Yes
bread No
dtype: object
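Besides square brackets, pandas offers two explicit accessors: `.loc` for index labels and `.iloc` for integer positions. The sketch below reuses the same grocery list to show both. It is an illustrative addition rather than part of the original walkthrough:

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# .loc selects by index label
print('eggs by label:', groceries.loc['eggs'])

# .iloc selects by 0-based integer position
print('first element by position:', groceries.iloc[0])

# Both accept a list to select several elements at once
last_two = groceries.iloc[[2, 3]]
print(last_two)
```

Using `.loc` and `.iloc` makes it clear whether a label or a position is intended, which matters when the index labels are themselves integers.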
We can also delete items from a Pandas Series by using the .drop() method. Series.drop(label) returns a copy of the Series with the given label removed. It works out of place, meaning that it doesn't change the original Series. Let's see how this works:
# We display the original grocery list
print('Original Grocery List:\n', groceries)
# We remove apples from our grocery list. The drop function removes elements out of place
print()
print('We remove apples (out of place):\n', groceries.drop('apples'))
# When we remove elements out of place the original Series remains intact. To see this
# we display our grocery list again
print()
print('Grocery List after removing apples out of place:\n', groceries)
Original Grocery List:
eggs 30
apples 6
milk Yes
bread No
dtype: object
We remove apples (out of place):
eggs 30
milk Yes
bread No
dtype: object
Grocery List after removing apples out of place:
eggs 30
apples 6
milk Yes
bread No
dtype: object
We can delete items from a Pandas Series in place by setting the keyword inplace to True in the .drop() method. Let's see an example:
# We display the original grocery list
print('Original Grocery List:\n', groceries)
# We remove apples from our grocery list in place by setting the inplace keyword to True
groceries.drop('apples', inplace = True)
# When we remove elements in place the original Series is modified. To see this
# we display our grocery list again
print()
print('Grocery List after removing apples in place:\n', groceries)
Original Grocery List:
eggs 30
apples 6
milk Yes
bread No
dtype: object
Grocery List after removing apples in place:
eggs 30
milk Yes
bread No
dtype: object
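The two styles of deletion can also be compared side by side. This is a minimal sketch under the assumption that reassigning the result of `.drop()` is an acceptable alternative to `inplace = True`; the variable name `without_apples` is made up for illustration:

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# Out of place: drop() returns a new Series and leaves the original untouched
without_apples = groceries.drop('apples')
print('apples still in original:', 'apples' in groceries)
print('apples in the new Series:', 'apples' in without_apples)

# Reassigning the result has the same visible effect as inplace=True
groceries = groceries.drop('apples')
print('Remaining labels:', list(groceries.index))
```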
Arithmetic Operations on Pandas Series
Just like with NumPy ndarrays, we can perform element-wise arithmetic operations on Pandas Series. In this lesson we will look at arithmetic operations between Pandas Series and single numbers. Let's create a new Pandas Series that will hold a grocery list of just fruits.
# We create a Pandas Series that stores a grocery list of just fruits
fruits = pd.Series(data = [10, 6, 3], index = ['apples', 'oranges', 'bananas'])
# We display the fruits Pandas Series
fruits
apples 10
oranges 6
bananas 3
dtype: int64
We can now modify the data in fruits by performing basic arithmetic operations. Let's see some examples:
# We print fruits for reference
print('Original grocery list of fruits:\n ', fruits)
# We perform basic element-wise operations using arithmetic symbols
print()
print('fruits + 2:\n', fruits + 2) # We add 2 to each item in fruits
print()
print('fruits - 2:\n', fruits - 2) # We subtract 2 from each item in fruits
print()
print('fruits * 2:\n', fruits * 2) # We multiply each item in fruits by 2
print()
print('fruits / 2:\n', fruits / 2) # We divide each item in fruits by 2
print()
Original grocery list of fruits:
apples 10
oranges 6
bananas 3
dtype: int64
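The element-wise results can also be checked against plain Python lists instead of reading printed output. The sketch below does this for the same four operations; the result variable names are illustrative. Note that division produces floating-point values even though the inputs are integers:

```python
import pandas as pd

fruits = pd.Series(data=[10, 6, 3], index=['apples', 'oranges', 'bananas'])

# Each operation is applied to every element and returns a new Series
added = fruits + 2
subtracted = fruits - 2
doubled = fruits * 2
halved = fruits / 2  # true division, so the result holds floats

print('fruits + 2:', added.tolist())
print('fruits - 2:', subtracted.tolist())
print('fruits * 2:', doubled.tolist())
print('fruits / 2:', halved.tolist())
```

None of these operations changes `fruits` itself; each one returns a new Series with the same index labels.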
You can also apply mathematical functions from NumPy, such as sqrt(x), to all elements of a Pandas Series.
# We import NumPy as np to be able to use the mathematical functions
import numpy as np
# We print fruits for reference
print('Original grocery list of fruits:\n', fruits)
# We apply different mathematical functions to all elements of fruits
print()
print('EXP(X) = \n', np.exp(fruits))
print()
print('SQRT(X) =\n', np.sqrt(fruits))
print()
print('POW(X,2) =\n',np.power(fruits,2)) # We raise all elements of fruits to the power of 2
Original grocery list of fruits:
apples 10
oranges 6
bananas 3
dtype: int64
EXP(X) =
apples 22026.465795
oranges 403.428793
bananas 20.085537
dtype: float64
SQRT(X) =
apples 3.162278
oranges 2.449490
bananas 1.732051
dtype: float64
POW(X,2) =
apples 100
oranges 36
bananas 9
dtype: int64
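As the output above suggests, applying a NumPy function to a Series gives back a Series with the original index labels. A short sketch confirming this follows. The use of the `Series.pow` method as an alternative to `np.power` is an addition for illustration:

```python
import numpy as np
import pandas as pd

fruits = pd.Series(data=[10, 6, 3], index=['apples', 'oranges', 'bananas'])

# The NumPy function returns a pandas Series, not a bare ndarray
roots = np.sqrt(fruits)
print(type(roots).__name__)
print(list(roots.index))

# Series also offers some of these operations as methods
squares = fruits.pow(2)
print(squares.tolist())
```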
DataFrames:
A DataFrame is a two-dimensional (2-D) data structure defined in pandas that consists of rows and columns.
Code #1: Creation of DataFrame
# Program to create a DataFrame
import pandas as pd  # Import the pandas library
a = pd.DataFrame(Data)  # Create a DataFrame with Data
Here, Data can be:
One or more dictionaries
One or more Series
A 2D NumPy ndarray
Code #2: When Data is Dictionaries
# Program to create a DataFrame with two dictionaries
import pandas as pd
dict1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}  # Define Dictionary 1
dict2 = {'a': 5, 'b': 6, 'c': 7, 'd': 8, 'e': 9}  # Define Dictionary 2
Data = {'first': dict1, 'second': dict2}  # Define Data with dict1 and dict2
df = pd.DataFrame(Data)  # Create DataFrame
print(df)
OUTPUT:
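As a quick check of how pandas combines the two dictionaries (a minimal sketch that repeats the data from Code #2), the outer keys become column names and the inner keys become row labels. Because dict1 has no 'e' key, the corresponding cell in the 'first' column is filled with NaN:

```python
import pandas as pd

dict1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
dict2 = {'a': 5, 'b': 6, 'c': 7, 'd': 8, 'e': 9}
df = pd.DataFrame({'first': dict1, 'second': dict2})

# Outer keys become columns; inner keys become the row index
print(df.shape)
print(list(df.columns))

# 'e' exists only in dict2, so 'first' has a missing value in that row
print(df['first'].isna().sum())
print(df.loc['e'])
```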
Code #3: When Data is Series
# Program to create a DataFrame from three Series
import pandas as pd
s1 = pd.Series([1, 3, 4, 5, 6, 2, 9])  # Define series 1
s2 = pd.Series([1.1, 3.5, 4.7, 5.8, 2.9, 9.3])  # Define series 2
s3 = pd.Series(['a', 'b', 'c', 'd', 'e'])  # Define series 3
Data = {'first': s1, 'second': s2, 'third': s3}  # Define Data
dfseries = pd.DataFrame(Data)  # Create DataFrame
print(dfseries)
OUTPUT:
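The three Series in Code #3 have different lengths (7, 6, and 5 elements). pandas aligns them on their index, so the longest Series determines the number of rows and the shorter ones are padded with NaN. The sketch below repeats the same data and counts the missing values per column:

```python
import pandas as pd

s1 = pd.Series([1, 3, 4, 5, 6, 2, 9])
s2 = pd.Series([1.1, 3.5, 4.7, 5.8, 2.9, 9.3])
s3 = pd.Series(['a', 'b', 'c', 'd', 'e'])
dfseries = pd.DataFrame({'first': s1, 'second': s2, 'third': s3})

# The longest Series sets the number of rows
print(dfseries.shape)

# Count the missing values (NaN) in each column
missing = dfseries.isna().sum()
print(missing)
```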
Code #4: When Data is 2D-numpy ndarray:
Note: One constraint has to be maintained while creating a DataFrame from 2D arrays in this way: all of the arrays must have the same number of rows, because every column of a DataFrame must have the same length.
# Program to create a DataFrame from 2D arrays
import pandas as pd  # Import the pandas library
d1 = [[2, 3, 4], [5, 6, 7]]  # Define 2D array 1 (as a nested list)
d2 = [[2, 4, 8], [1, 3, 9]]  # Define 2D array 2 (as a nested list)
Data = {'first': d1, 'second': d2}  # Define Data with d1 and d2
df2d = pd.DataFrame(Data)  # Create DataFrame; each cell holds one inner list
print(df2d)
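The example above stores nested lists inside a dictionary, so each cell of the resulting DataFrame holds a whole inner list. As a contrasting sketch, a true 2D NumPy ndarray can be passed to `pd.DataFrame` directly, in which case each array element becomes its own cell. The column names `x`, `y`, and `z` are made up for illustration:

```python
import numpy as np
import pandas as pd

arr = np.array([[2, 3, 4],
                [5, 6, 7]])

# Each row of the array becomes a row of the DataFrame,
# and each array column becomes a DataFrame column
df_arr = pd.DataFrame(arr, columns=['x', 'y', 'z'])
print(df_arr)
print(df_arr.shape)
```

If `columns` is omitted, pandas labels the columns 0, 1, 2 by default.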