Pandas

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

We can analyze data in pandas with:

  1. Series

  2. DataFrames

Series:

A Series is a one-dimensional (1-D) labeled array defined in pandas that can store data of any type.

Code #1: Creating Series

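# Program to create a Series
import pandas as pd  # Import the pandas library

# Example values and labels (placeholders; any data and labels of equal length work)
Data = [10, 20, 30]
Index = ['x', 'y', 'z']

# Create a Series from the data and the index
a = pd.Series(Data, index=Index)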

Here, Data can be:

  1. A scalar value, such as an integer or a string

  2. A Python dictionary of key-value pairs

  3. An ndarray (a NumPy array)
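
As a minimal sketch of the first option (the variable name constant and the labels are chosen here only for illustration), a single scalar passed together with an index is repeated once for every label:

```python
import pandas as pd

# A single scalar is repeated once per index label
# (the labels 'a', 'b', 'c' are arbitrary example values)
constant = pd.Series(7, index=['a', 'b', 'c'])
print(constant)
```

Without an index, a scalar produces a Series with a single element.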

Code #2: When Data contains scalar values

# Program to create a Series from a list of scalar values
Data = [1, 3, 4, 5, 6, 2, 9]  # Numeric data

# Create a Series with the default integer index (0, 1, 2, ...)
s = pd.Series(Data)
print(s)

# Predefined index labels
Index = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# Create a Series with the predefined index labels
si = pd.Series(Data, index=Index)
print(si)

OUTPUT:

0    1
1    3
2    4
3    5
4    6
5    2
6    9
dtype: int64
a    1
b    3
c    4
d    5
e    6
f    2
g    9
dtype: int64

Code #3: When Data contains Dictionary

# Program to create a Series from a dictionary
dictionary = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

# The dictionary keys become the index labels and the values become the data
sd = pd.Series(dictionary)
print(sd)

OUTPUT:

a    1
b    2
c    3
d    4
e    5
dtype: int64

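Code #4: When Data contains an ndarray

# Program to create a Series from array-like data
# Note: Data here is a nested Python list, not a NumPy ndarray. A Series is 1-D,
# so each inner list is stored as a single element (dtype: object)
Data = [[2, 3, 4], [5, 6, 7]]  # Defining a 2-D nested list

# Creating a Series with one inner list per element
snd = pd.Series(Data)
print(snd)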

OUTPUT:

0    [2, 3, 4]
1    [5, 6, 7]
dtype: object
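
The nested list in Code #4 is stored as two list objects, not as a 2-D array. As a sketch of what happens with an actual NumPy ndarray (the names arr, s_arr and raised are illustrative only): a 1-D array becomes a Series directly, while a 2-D array is rejected because a Series is one-dimensional.

```python
import numpy as np
import pandas as pd

# A 1-D ndarray maps directly onto a Series
arr = np.array([2, 3, 4, 5, 6, 7])
s_arr = pd.Series(arr)
print(s_arr)

# A 2-D ndarray cannot be stored in a (1-D) Series
raised = False
try:
    pd.Series(np.array([[2, 3, 4], [5, 6, 7]]))
except Exception as err:
    raised = True
    print('Could not create Series:', err)
```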

# We import Pandas as pd into Python
import pandas as pd

# We create a Pandas Series that stores a grocery list
groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# We display the Groceries Pandas Series
print(groceries)

eggs       30
apples      6
milk      Yes
bread      No
dtype: object

# We print some information about Groceries
print('Groceries has shape:', groceries.shape)
print('Groceries has dimension:', groceries.ndim)
print('Groceries has a total of', groceries.size, 'elements')

Groceries has shape: (4,)

Groceries has dimension: 1

Groceries has a total of 4 elements

# We print the index and data of Groceries
print('The data in Groceries is:', groceries.values)
print('The index of Groceries is:', groceries.index)

The data in Groceries is: [30 6 'Yes' 'No']

The index of Groceries is: Index(['eggs', 'apples', 'milk', 'bread'], dtype='object')

# We check whether bananas is a food item (an index) in Groceries
x = 'bananas' in groceries

# We check whether bread is a food item (an index) in Groceries
y = 'bread' in groceries

# We print the results
print('Is bananas an index label in Groceries:', x)
print('Is bread an index label in Groceries:', y)

Is bananas an index label in Groceries: False

Is bread an index label in Groceries: True
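
Note that the in operator on a Series tests the index labels, not the stored values. As a small sketch of how the values could be checked instead (one possible approach using Series.isin and the .values array):

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# `in` looks at the index labels only
label_hit = 'Yes' in groceries

# To search the data itself, test against the values
value_hit = 'Yes' in groceries.values
any_yes = groceries.isin(['Yes']).any()

print('Is Yes an index label:', label_hit)
print('Is Yes a stored value:', value_hit)
print('Does any element equal Yes:', any_yes)
```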

# We access elements in Groceries using index labels:

# We use a single index label
print('How many eggs do we need to buy:', groceries['eggs'])
print()

# We can access multiple index labels by passing a list of labels
print('Do we need milk and bread:\n', groceries[['milk', 'bread']]) 
print()

How many eggs do we need to buy: 30

Do we need milk and bread:
 milk     Yes
bread     No
dtype: object
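
Square brackets with labels work as shown above. To make the intent explicit, pandas also provides the .loc attribute for access by label and .iloc for access by integer position. A short sketch on the same grocery list:

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# .loc selects by index label
print(groceries.loc['eggs'])
print(groceries.loc[['eggs', 'apples']])

# .iloc selects by integer position (0 is the first element, -1 the last)
print(groceries.iloc[0])
print(groceries.iloc[[0, 1]])
print(groceries.iloc[-1])
```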

We can also delete items from a Pandas Series by using the .drop() method. The Series.drop(label) method removes the given label from the Series. Note that Series.drop(label) works out of place: it returns a new Series without that label and leaves the original Series unchanged. Let's see how this works:

# We display the original grocery list
print('Original Grocery List:\n', groceries)

# We remove apples from our grocery list. The drop function removes elements out of place
print()
print('We remove apples (out of place):\n', groceries.drop('apples'))

# When we remove elements out of place the original Series remains intact. To see this
# we display our grocery list again
print()
print('Grocery List after removing apples out of place:\n', groceries)

Original Grocery List:
 eggs       30
apples      6
milk      Yes
bread      No
dtype: object

We remove apples (out of place):
 eggs      30
milk     Yes
bread     No
dtype: object

Grocery List after removing apples out of place:
 eggs       30
apples      6
milk      Yes
bread      No
dtype: object

We can delete items from a Pandas Series in place by setting the keyword inplace to True in the .drop() method. Let's see an example:

# We display the original grocery list
print('Original Grocery List:\n', groceries)

# We remove apples from our grocery list in place by setting the inplace keyword to True
groceries.drop('apples', inplace = True)

# When we remove elements in place the original Series is modified. To see this
# we display our grocery list again
print()
print('Grocery List after removing apples in place:\n', groceries)

Original Grocery List:
 eggs       30
apples      6
milk      Yes
bread      No
dtype: object

Grocery List after removing apples in place:
 eggs      30
milk     Yes
bread     No
dtype: object
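
The .drop() method also accepts a list of labels, and its errors parameter controls what happens when a label is missing. A brief sketch (the names remaining and unchanged are illustrative; the grocery list is rebuilt here so the example is self-contained):

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# Drop several labels at once (out of place: groceries is untouched)
remaining = groceries.drop(['milk', 'bread'])
print(remaining)

# Dropping a label that does not exist raises a KeyError by default;
# errors='ignore' skips missing labels instead
unchanged = groceries.drop('bananas', errors='ignore')
print(unchanged)
```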

Arithmetic Operations on Pandas Series

Just like with NumPy ndarrays, we can perform element-wise arithmetic operations on Pandas Series. In this lesson we will look at arithmetic operations between Pandas Series and single numbers. Let's create a new Pandas Series that will hold a grocery list of just fruits.

# We create a Pandas Series that stores a grocery list of just fruits
fruits = pd.Series(data=[10, 6, 3], index=['apples', 'oranges', 'bananas'])

# We display the fruits Pandas Series
fruits

apples     10
oranges     6
bananas     3
dtype: int64

We can now modify the data in fruits by performing basic arithmetic operations. Let's see some examples:

# We print fruits for reference
print('Original grocery list of fruits:\n', fruits)

# We perform basic element-wise operations using arithmetic symbols
print()
print('fruits + 2:\n', fruits + 2) # We add 2 to each item in fruits
print()
print('fruits - 2:\n', fruits - 2) # We subtract 2 from each item in fruits
print()
print('fruits * 2:\n', fruits * 2) # We multiply each item in fruits by 2
print()
print('fruits / 2:\n', fruits / 2) # We divide each item in fruits by 2
print()

Original grocery list of fruits:
 apples     10
oranges     6
bananas     3
dtype: int64

fruits + 2:
 apples     12
oranges     8
bananas     5
dtype: int64

fruits - 2:
 apples     8
oranges    4
bananas    1
dtype: int64

fruits * 2:
 apples     20
oranges    12
bananas     6
dtype: int64

fruits / 2:
 apples     5.0
oranges    3.0
bananas    1.5
dtype: float64
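
The same operators can be applied to only some of the elements by selecting them first. A small sketch, combining the label and position access shown earlier with the arithmetic above:

```python
import pandas as pd

fruits = pd.Series(data=[10, 6, 3], index=['apples', 'oranges', 'bananas'])

# Arithmetic on a single element selected by label
bananas_plus_2 = fruits['bananas'] + 2
print('bananas + 2:', bananas_plus_2)

# Arithmetic on a single element selected by position
apples_minus_2 = fruits.iloc[0] - 2
print('apples - 2:', apples_minus_2)

# Arithmetic on several elements selected by a list of labels
doubled = fruits[['apples', 'oranges']] * 2
halved = fruits.loc[['apples', 'oranges']] / 2
print(doubled)
print(halved)
```

None of these expressions changes fruits itself; each one returns a new value or a new Series.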

You can also apply mathematical functions from NumPy, such as sqrt(x), to all elements of a Pandas Series.

# We import NumPy as np to be able to use the mathematical functions
import numpy as np

# We print fruits for reference
print('Original grocery list of fruits:\n', fruits)

# We apply different mathematical functions to all elements of fruits
print()
print('EXP(X) = \n', np.exp(fruits))
print() 
print('SQRT(X) =\n', np.sqrt(fruits))
print()
print('POW(X,2) =\n', np.power(fruits, 2)) # We raise all elements of fruits to the power of 2

Original grocery list of fruits:
 apples     10
oranges     6
bananas     3
dtype: int64

EXP(X) =
 apples     22026.465795
oranges      403.428793
bananas       20.085537
dtype: float64

SQRT(X) =
 apples     3.162278
oranges    2.449490
bananas    1.732051
dtype: float64

POW(X,2) =
 apples     100
oranges     36
bananas      9
dtype: int64

DataFrames:

A DataFrame is a two-dimensional (2-D) labeled data structure defined in pandas that consists of rows and columns.

Code #1: Creation of DataFrame

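# Program to create a DataFrame
import pandas as pd  # Import the pandas library

# Example data (placeholder; the options for Data are listed below)
Data = {'x': [1, 2, 3], 'y': [4, 5, 6]}

a = pd.DataFrame(Data)  # Create a DataFrame from Data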

Here, Data can be:

  1. One or more dictionaries

  2. One or more Series

  3. A 2-D NumPy ndarray

Code #2: When Data is a dictionary of dictionaries

# Program to create a DataFrame from two dictionaries
dict1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}          # Define dictionary 1
dict2 = {'a': 5, 'b': 6, 'c': 7, 'd': 8, 'e': 9}  # Define dictionary 2
Data = {'first': dict1, 'second': dict2}  # Outer keys become the column labels
df = pd.DataFrame(Data)  # Inner keys become the row labels
print(df)

OUTPUT:

   first  second
a    1.0       5
b    2.0       6
c    3.0       7
d    4.0       8
e    NaN       9
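
Because dict1 has no key 'e', pandas fills that cell with NaN (not a number), its marker for a missing value. A short sketch of how the missing entries could be located and counted (the name missing is illustrative):

```python
import pandas as pd

dict1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
dict2 = {'a': 5, 'b': 6, 'c': 7, 'd': 8, 'e': 9}
df = pd.DataFrame({'first': dict1, 'second': dict2})

# Mark the cells that hold no value
print(df.isna())

# Count the missing values in each column
missing = df.isna().sum()
print(missing)

# 'first' is stored as floats because NaN is a floating-point value
print(df.dtypes)
```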

Code #3: When Data is a dictionary of Series

# Program to create a DataFrame from three Series
import pandas as pd

s1 = pd.Series([1, 3, 4, 5, 6, 2, 9])           # Define series 1 (7 elements)
s2 = pd.Series([1.1, 3.5, 4.7, 5.8, 2.9, 9.3])  # Define series 2 (6 elements)
s3 = pd.Series(['a', 'b', 'c', 'd', 'e'])       # Define series 3 (5 elements)

Data = {'first': s1, 'second': s2, 'third': s3}  # Define Data
dfseries = pd.DataFrame(Data)  # Create DataFrame; shorter Series are padded with NaN
print(dfseries)

OUTPUT:

   first  second third
0      1     1.1     a
1      3     3.5     b
2      4     4.7     c
3      5     5.8     d
4      6     2.9     e
5      2     9.3   NaN
6      9     NaN   NaN
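
The three Series above share the default integer index, so they line up by position. When Series carry their own labels, pandas aligns them by label instead. A sketch under assumed data (the shoppers bob and alice, their items, and the name cart are hypothetical examples, not part of the original program):

```python
import pandas as pd

# Two Series with partly overlapping index labels (hypothetical data)
bob = pd.Series([245, 25, 55], index=['bike', 'pants', 'watch'])
alice = pd.Series([40, 110, 500, 45],
                  index=['book', 'glasses', 'bike', 'pants'])

# Rows are the union of all labels; missing combinations become NaN
cart = pd.DataFrame({'bob': bob, 'alice': alice})
print(cart)
print('Shape:', cart.shape)
```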

Code #4: When Data is a 2-D NumPy ndarray:

Note: One constraint must hold when creating a DataFrame from several 2-D arrays in a dictionary: every array must have the same number of rows.

# Program to create a DataFrame from 2-D arrays (written here as nested lists)
import pandas as pd  # Import the pandas library

d1 = [[2, 3, 4], [5, 6, 7]]  # Define 2-D array 1
d2 = [[2, 4, 8], [1, 3, 9]]  # Define 2-D array 2
Data = {'first': d1, 'second': d2}  # Each inner list becomes one cell of its column
df2d = pd.DataFrame(Data)  # Create DataFrame (2 rows, 2 columns)
print(df2d)

OUTPUT:

       first     second
0  [2, 3, 4]  [2, 4, 8]
1  [5, 6, 7]  [1, 3, 9]
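
The program above passes nested lists inside a dictionary, so each cell holds a whole list. As a sketch of the more direct case, a real 2-D NumPy ndarray can be passed to pd.DataFrame on its own, and each row of the array becomes a row of the DataFrame. The names arr2d and df_arr and the row and column labels are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# A 2-D ndarray with 2 rows and 3 columns
arr2d = np.array([[2, 3, 4], [5, 6, 7]])

# Each array row becomes a DataFrame row; the labels are optional
df_arr = pd.DataFrame(arr2d,
                      index=['row1', 'row2'],
                      columns=['x', 'y', 'z'])
print(df_arr)
print('Shape:', df_arr.shape)
```

If index and columns are omitted, pandas numbers the rows and columns from 0.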
