Pandas
pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
We can analyze data in pandas with:
Series
DataFrames
Series:
A Series is a one-dimensional (1-D) labeled array defined in pandas that can store data of any type.
Code #1: Creating Series
# Program to create a Series
import pandas as pd # Import the pandas library
# Create a Series from Data and Index (both are placeholders for your own values)
a = pd.Series(Data, index = Index)
Here, Data can be:
A scalar value, such as an integer or a string, or a list of such values
A Python dictionary of key-value pairs
An ndarray
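As a rough illustration of the kinds of input listed above, the sketch below builds one Series from each. The variable names and values are made up for this example. Note that a single scalar is repeated once for every index label.

```python
import numpy as np
import pandas as pd

# A scalar is repeated for every label in the index
scalar_series = pd.Series(7, index=['a', 'b', 'c'])

# Dictionary keys become the index labels, values become the data
dict_series = pd.Series({'x': 1, 'y': 2})

# A 1-D NumPy ndarray gets the default integer index 0, 1, 2, ...
array_series = pd.Series(np.array([1.5, 2.5, 3.5]))

print(scalar_series)
print(dict_series)
print(array_series)
```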
Code #2: When Data contains scalar values
# Program to Create series with scalar values
Data =[1, 3, 4, 5, 6, 2, 9] # Numeric data
# Creating series with default index values
s = pd.Series(Data)
print(s)
# predefined index values
Index =['a', 'b', 'c', 'd', 'e', 'f', 'g']
# Creating series with predefined index values
si = pd.Series(Data, Index)
print(si)
OUTPUT:
0    1
1    3
2    4
3    5
4    6
5    2
6    9
dtype: int64
a    1
b    3
c    4
d    5
e    6
f    2
g    9
dtype: int64
Code #3: When Data contains Dictionary
# Program to Create Dictionary series
dictionary ={'a':1, 'b':2, 'c':3, 'd':4, 'e':5}
# Creating series of Dictionary type
sd = pd.Series(dictionary)
print(sd)
OUTPUT:
a    1
b    2
c    3
d    4
e    5
dtype: int64
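Code #4: When Data contains Ndarray
# Program to create a Series from nested (2-D style) data
Data =[[2, 3, 4], [5, 6, 7]] # A nested Python list used here in place of a 2-D array
# Creating the series: each inner list is stored as a single element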
snd = pd.Series(Data)
print(snd)
OUTPUT:
0    [2, 3, 4]
1    [5, 6, 7]
dtype: object
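The example above passes a nested Python list, so each inner list ends up as one element. With a real 2-D NumPy ndarray the behavior is different. The sketch below, using a made-up array named grid, assumes we want one Series element per number and therefore flattens the array first.

```python
import numpy as np
import pandas as pd

grid = np.array([[2, 3, 4], [5, 6, 7]])  # a real 2-D ndarray, shape (2, 3)

# pd.Series expects 1-D data, so passing the 2-D array directly is rejected
try:
    pd.Series(grid)
    accepted_2d = True
except Exception:  # the exact exception type may differ between pandas versions
    accepted_2d = False

# Flattening first gives one Series element per number
flat = pd.Series(grid.ravel())

print('2-D array accepted directly:', accepted_2d)
print(flat)
```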

# We import Pandas as pd into Python
import pandas as pd
# We create a Pandas Series that stores a grocery list
groceries = pd.Series(data = [30, 6, 'Yes','No'], index = ['eggs', 'apples', 'milk', 'bread'])
# We display the Groceries Pandas Series
print(groceries)
eggs       30
apples      6
milk      Yes
bread      No
dtype: object
# We print some information about Groceries
print('Groceries has shape:', groceries.shape)
print('Groceries has dimension:', groceries.ndim)
print('Groceries has a total of', groceries.size, 'elements')
Groceries has shape: (4,)
Groceries has dimension: 1
Groceries has a total of 4 elements
# We print the index and data of Groceries
print('The data in Groceries is:', groceries.values)
print('The index of Groceries is:', groceries.index)
The data in Groceries is: [30 6 'Yes' 'No']
The index of Groceries is: Index(['eggs', 'apples', 'milk', 'bread'], dtype='object')
# We check whether bananas is a food item (an index) in Groceries
x = 'bananas' in groceries
# We check whether bread is a food item (an index) in Groceries
y = 'bread' in groceries
# We print the results
print('Is bananas an index label in Groceries:', x)
print('Is bread an index label in Groceries:', y)
Is bananas an index label in Groceries: False
Is bread an index label in Groceries: True
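The in operator used above only looks at the index labels, not at the stored values. As a small sketch of the difference, the code below checks for a label and for a value. Using isin() for the value check is one possible approach, not the only one.

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# 'in' tests the index labels
label_present = 'eggs' in groceries

# 'Yes' is a stored value, not a label, so 'in' does not find it
value_as_label = 'Yes' in groceries

# isin() compares against the stored values instead
value_present = groceries.isin(['Yes']).any()

print(label_present, value_as_label, value_present)
```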
# We access elements in Groceries using index labels:
# We use a single index label
print('How many eggs do we need to buy:', groceries['eggs'])
print()
# we can access multiple index labels
print('Do we need milk and bread:\n', groceries[['milk', 'bread']])
print()
How many eggs do we need to buy: 30

Do we need milk and bread:
 milk     Yes
bread     No
dtype: object
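Besides square brackets, pandas also offers the .loc and .iloc accessors, which make it explicit whether we mean an index label or a numerical position. The following sketch is an addition to the examples above and reuses the same grocery data.

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# .loc selects by index label
eggs_by_label = groceries.loc['eggs']
by_labels = groceries.loc[['milk', 'bread']]

# .iloc selects by integer position (0 is the first element)
first_item = groceries.iloc[0]
by_positions = groceries.iloc[[2, 3]]

print(eggs_by_label, first_item)
print(by_labels)
```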
We can also delete items from a Pandas Series by using the .drop() method. The Series.drop(label) method removes the given label from the given Series. We should note that Series.drop(label) drops elements out of place: it returns a new Series and doesn't change the original Series. Let's see how this works:
# We display the original grocery list
print('Original Grocery List:\n', groceries)
# We remove apples from our grocery list. The drop function removes elements out of place
print()
print('We remove apples (out of place):\n', groceries.drop('apples'))
# When we remove elements out of place the original Series remains intact. To see this
# we display our grocery list again
print()
print('Grocery List after removing apples out of place:\n', groceries)
Original Grocery List:
 eggs       30
apples      6
milk      Yes
bread      No
dtype: object

We remove apples (out of place):
 eggs      30
milk     Yes
bread     No
dtype: object

Grocery List after removing apples out of place:
 eggs       30
apples      6
milk      Yes
bread      No
dtype: object
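Because .drop() returns a new Series, we usually store its result in a variable. The sketch below shows this, and also passes a list of labels to drop several items at once. The variable names are chosen only for this example.

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# Keep the returned Series; the original is left untouched
without_apples = groceries.drop('apples')

# A list of labels drops several items in one call
without_two = groceries.drop(['eggs', 'milk'])

print(without_apples)
print(without_two)
print('Original still has', len(groceries), 'items')
```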
We can delete items from a Pandas Series in place by setting the keyword inplace to True in the .drop() method. Let's see an example:
# We display the original grocery list
print('Original Grocery List:\n', groceries)
# We remove apples from our grocery list in place by setting the inplace keyword to True
groceries.drop('apples', inplace = True)
# When we remove elements in place the original Series is modified. To see this
# we display our grocery list again
print()
print('Grocery List after removing apples in place:\n', groceries)
Original Grocery List:
 eggs       30
apples      6
milk      Yes
bread      No
dtype: object

Grocery List after removing apples in place:
 eggs      30
milk     Yes
bread     No
dtype: object
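One detail worth knowing is that .drop() with inplace set to True returns None, so its result should not be assigned back to the variable. The sketch below shows this, along with an equivalent alternative that reassigns the out-of-place result. The names in_place and reassigned are made up for this example.

```python
import pandas as pd

data = [30, 6, 'Yes', 'No']
labels = ['eggs', 'apples', 'milk', 'bread']

# In place: the Series itself changes and the call returns None
in_place = pd.Series(data=data, index=labels)
result = in_place.drop('apples', inplace=True)

# Alternative: drop out of place and reassign the returned Series
reassigned = pd.Series(data=data, index=labels)
reassigned = reassigned.drop('apples')

print('Return value with inplace=True:', result)
print(in_place)
```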
Arithmetic Operations on Pandas Series
Just like with NumPy ndarrays, we can perform element-wise arithmetic operations on Pandas Series. In this lesson we will look at arithmetic operations between Pandas Series and single numbers. Let's create a new Pandas Series that will hold a grocery list of just fruits.
# We create a Pandas Series that stores a grocery list of just fruits
fruits= pd.Series(data = [10, 6, 3,], index = ['apples', 'oranges', 'bananas'])
# We display the fruits Pandas Series
fruits
apples     10
oranges     6
bananas     3
dtype: int64
We can now modify the data in fruits by performing basic arithmetic operations. Let's see some examples:
# We print fruits for reference
print('Original grocery list of fruits:\n ', fruits)
# We perform basic element-wise operations using arithmetic symbols
print()
print('fruits + 2:\n', fruits + 2) # We add 2 to each item in fruits
print()
print('fruits - 2:\n', fruits - 2) # We subtract 2 from each item in fruits
print()
print('fruits * 2:\n', fruits * 2) # We multiply each item in fruits by 2
print()
print('fruits / 2:\n', fruits / 2) # We divide each item in fruits by 2
print()
Original grocery list of fruits:
  apples     10
oranges     6
bananas     3
dtype: int64

fruits + 2:
 apples     12
oranges     8
bananas     5
dtype: int64

fruits - 2:
 apples     8
oranges    4
bananas    1
dtype: int64

fruits * 2:
 apples     20
oranges    12
bananas     6
dtype: int64

fruits / 2:
 apples     5.0
oranges    3.0
bananas    1.5
dtype: float64
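Two things in the output above are easy to miss: each operation returns a new Series and leaves fruits unchanged, and division turns the integers into floats. The sketch below checks both, and uses Python's floor-division operator // as one way to stay with whole numbers.

```python
import pandas as pd

fruits = pd.Series(data=[10, 6, 3], index=['apples', 'oranges', 'bananas'])

# True division always produces floating-point results
halves = fruits / 2

# Floor division rounds down and keeps an integer dtype
floor_halves = fruits // 2

print(halves)
print(floor_halves)
print(fruits)  # the original Series still holds 10, 6, 3
```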
You can also apply mathematical functions from NumPy, such as sqrt(x), to all elements of a Pandas Series.
# We import NumPy as np to be able to use the mathematical functions
import numpy as np
# We print fruits for reference
print('Original grocery list of fruits:\n', fruits)
# We apply different mathematical functions to all elements of fruits
print()
print('EXP(X) = \n', np.exp(fruits))
print()
print('SQRT(X) =\n', np.sqrt(fruits))
print()
print('POW(X,2) =\n',np.power(fruits,2)) # We raise all elements of fruits to the power of 2
Original grocery list of fruits:
 apples     10
oranges     6
bananas     3
dtype: int64

EXP(X) = 
 apples     22026.465795
oranges      403.428793
bananas       20.085537
dtype: float64

SQRT(X) =
 apples     3.162278
oranges    2.449490
bananas    1.732051
dtype: float64

POW(X,2) =
 apples     100
oranges     36
bananas      9
dtype: int64
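As the output above suggests, applying a NumPy function to a Series gives back another Series with the same index labels, so we can keep looking results up by label. A minimal sketch, with the variable name roots chosen only for this example:

```python
import numpy as np
import pandas as pd

fruits = pd.Series(data=[10, 6, 3], index=['apples', 'oranges', 'bananas'])

# The result is a Series, not a bare ndarray, and it keeps the labels
roots = np.sqrt(fruits)

print(type(roots).__name__)
print('Square root of the apples count:', roots['apples'])
```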
DataFrames:
A DataFrame is a two-dimensional (2-D) labeled data structure defined in pandas that consists of rows and columns.
Code #1: Creation of DataFrame
# Program to Create DataFrame
import pandas as pd # Import Library
a = pd.DataFrame(Data) # Create DataFrame with Data
Here, Data can be:
One or more dictionaries
One or more Series
2D-numpy Ndarray
Code #2: When Data is Dictionaries
# Program to Create Data Frame with two dictionaries
dict1 ={'a':1, 'b':2, 'c':3, 'd':4} # Define Dictionary 1
dict2 ={'a':5, 'b':6, 'c':7, 'd':8, 'e':9} # Define Dictionary 2
Data = {'first':dict1, 'second':dict2} # Define Data with dict1 and dict2
df = pd.DataFrame(Data) # Create DataFrame
print(df)
OUTPUT:
   first  second
a    1.0       5
b    2.0       6
c    3.0       7
d    4.0       8
e    NaN       9
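In the example above the two dictionaries do not have the same keys: 'e' appears only in dict2. pandas lines the dictionaries up by key and fills the gap with NaN. The sketch below rebuilds the same DataFrame and uses isna() to locate the missing entry.

```python
import pandas as pd

dict1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
dict2 = {'a': 5, 'b': 6, 'c': 7, 'd': 8, 'e': 9}

df = pd.DataFrame({'first': dict1, 'second': dict2})

# True wherever a value is missing (NaN)
missing = df.isna()

print(df.shape)
print(missing)
```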

Code #3: When Data is Series
# Program to create Dataframe of three series
import pandas as pd
s1 = pd.Series([1, 3, 4, 5, 6, 2, 9]) # Define series 1
s2 = pd.Series([1.1, 3.5, 4.7, 5.8, 2.9, 9.3]) # Define series 2
s3 = pd.Series(['a', 'b', 'c', 'd', 'e']) # Define series 3
Data ={'first':s1, 'second':s2, 'third':s3} # Define Data
dfseries = pd.DataFrame(Data) # Create DataFrame
print(dfseries)
OUTPUT:
   first  second third
0      1     1.1     a
1      3     3.5     b
2      4     4.7     c
3      5     5.8     d
4      6     2.9     e
5      2     9.3   NaN
6      9     NaN   NaN
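The three Series above have different lengths (7, 6 and 5 elements). pandas aligns them on their index and pads the shorter columns with NaN. As a quick check, the sketch below counts the non-missing values per column with count().

```python
import pandas as pd

s1 = pd.Series([1, 3, 4, 5, 6, 2, 9])
s2 = pd.Series([1.1, 3.5, 4.7, 5.8, 2.9, 9.3])
s3 = pd.Series(['a', 'b', 'c', 'd', 'e'])

dfseries = pd.DataFrame({'first': s1, 'second': s2, 'third': s3})

# Number of non-missing values in each column
non_missing = dfseries.count()

print(dfseries.shape)
print(non_missing)
```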

Code #4: When Data is 2D-numpy ndarray:
Note: One constraint has to be maintained while creating a DataFrame this way: every column must supply the same number of rows, so the nested lists below must have the same outer length.
# Program to create DataFrame from 2D array
import pandas as pd # Import Library
d1 =[[2, 3, 4], [5, 6, 7]] # Define 2d array 1 (a nested Python list)
d2 =[[2, 4, 8], [1, 3, 9]] # Define 2d array 2 (a nested Python list)
Data ={'first': d1, 'second': d2} # Each inner list becomes a single cell of its column
df2d = pd.DataFrame(Data) # Create DataFrame
print(df2d)
OUTPUT:
       first     second
0  [2, 3, 4]  [2, 4, 8]
1  [5, 6, 7]  [1, 3, 9]
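The example above wraps nested lists in a dictionary, so every cell holds a whole list. If we instead pass a real 2-D NumPy ndarray directly, each number gets its own cell. The sketch below assumes made-up row and column labels (row1, row2, x, y, z) purely for illustration.

```python
import numpy as np
import pandas as pd

array2d = np.array([[2, 3, 4], [5, 6, 7]])  # shape (2, 3)

# Rows of the array become rows of the DataFrame, one number per cell
df_from_array = pd.DataFrame(array2d,
                             index=['row1', 'row2'],
                             columns=['x', 'y', 'z'])

print(df_from_array)
print(df_from_array.shape)
```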
