How to read excel (xls, xlsx) files in python using pandas

Excel file is one of the most popular files in the world because it is easy and very instinctive and user-friendly which makes it ideal for manipulating large datasets even with less technical skill. And everyone who uses a computer at one time or the other has come across and worked with excel spreadsheets. This post will explain how to use Pandas to work with spreadsheets using pandas.read_excel and pandas.ExcelFile.

At the end of this post, you will have the knowledge of:

  • How to get all of the sheet-name from an excel file
  • How to read an excel file as a Pandas DataFrame in Python (pandas.read_excel and pandas.ExcelFile)
  • How to get column’s names of an excel spreadsheet
  • How to find the total number of rows and columns of an excel spreadsheet

But before we start, you need to provide an excel file sample below:

Excel File Sample (Sheet name: MyComm, Sheet1 & Sheet2)

*My excel file name “Sample001.xlsx” and have 3 sheets (MyComm, Sheet1 & Sheet2). You can create the same excel file or you can do it with your own excel file.

First: Read Excel file with pandas.read_excel

For the purpose of easy to use, we’ll wrap the core method of pandas.read_excel to the class below:

import pandas as pd

class CReadExcel: 
    #excel=io: str, bytes, ExcelFile, xlrd.Book, path object, or file-like object
    #sheet_name: str, int, list, or None, default 0
    #header: int, list of int, default 0
    def __init__ (self, excel, sheet_name=0, header=0): 
        self.dataframe = pd.read_excel(excel, sheet_name=sheet_name, header=header)
        self.rows = self.dataframe.shape[0]
        self.cols = self.dataframe.shape[1]
        self.columnNames = self.dataframe.columns

Below is the explanation of the code:

  • Call ‘pandas.read_excel(excel_file_name, sheet_name, header)’ to create excel file object from the file specified in excel, sheet_name (case sensitive) and header parameters above and assign it to self.dataframe.
  • For the total number of rows, we’ll get from self.dataframe.shape[0] and for columns we’ll get from self.dataframe.shape[1] that we assign to self.rows and self.cols respectively.
  • For the spreadsheet column’s names, we can get from self.dataframe.columns and assign to self.columnNames.

Let’s test it:

import pandas as pd

class CReadExcel: 
    #excel=io: str, bytes, ExcelFile, xlrd.Book, path object, or file-like object
    #sheet_name: str, int, list, or None, default 0
    #header: int, list of int, default 0
    def __init__ (self, excel, sheet_name=0, header=0): 
        self.dataframe = pd.read_excel(excel, sheet_name=sheet_name, header=header)
        self.rows = self.dataframe.shape[0]
        self.cols = self.dataframe.shape[1]
        self.columnNames = self.dataframe.columns

file_path = r"Samples\Sample001.xlsx"

mycomm = CReadExcel(file_path, 'MyComm')
print("DataFrame: \r\n", mycomm.dataframe)
print("Rows: ", mycomm.rows)
print("Cols: ", mycomm.cols)
print("Column Names: ", mycomm.columnNames)

The output:

Second: Read Excel file with pandas.ExcelFile

We’ll wrap the core method ‘pandas.ExcelFile’ with two classes, CExcelFile and CSheet:

import pandas as pd

class CSheet: 
    def __init__(self, dataframe):
        self.dataframe = dataframe
        self.rows = self.dataframe.shape[0]
        self.cols = self.dataframe.shape[1]
        self.columnNames = self.dataframe.columns
        
class CExcelFile: 
    #excel_file_path=io: string, path object 
    def __init__(self, excel_file_path): 
        self.excel = pd.ExcelFile(excel_file_path)
        
    def sheetNames(self): 
        return self.excel.sheet_names
    
    def sheet(self, sheet_name): 
        return CSheet(self.excel.parse(sheet_name))

Below is the explanation of the code:

  • Call ‘pandas.ExcelFile (excel_file_path)’ to create excel file object from the file specified in excel and assign it to self.dataframe.
  • For the total number of rows, we’ll get from self.dataframe.shape[0] and for columns we’ll get from self.dataframe.shape[1] that we assign to self.rows and self.cols respectively.
  • For the spreadsheet column’s names, we can get from self.dataframe.columns and assign to self.columnNames.
  • At this class CExcelFile, we can get all of the spreadsheets name by return it from self.excel.sheet_names
  • CExcelFile is also return CSheet class to save all sheet properties like number of rows, cols, columnNames and DataFrame

Let’s test it:

import pandas as pd

class CSheet: 
    def __init__(self, dataframe):
        self.dataframe = dataframe
        self.rows = self.dataframe.shape[0]
        self.cols = self.dataframe.shape[1]
        self.columnNames = self.dataframe.columns
        
class CExcelFile: 
    #excel_file_path=io: string, path object 
    def __init__(self, excel_file_path): 
        self.excel = pd.ExcelFile(excel_file_path)
        
    def sheetNames(self): 
        return self.excel.sheet_names
    
    def sheet(self, sheet_name): 
        return CSheet(self.excel.parse(sheet_name))

    
file_path = r"Samples\Sample001.xlsx"

excel = CExcelFile(file_path)
print('Sheet Names: ', excel.sheetNames())

mycomm = excel.sheet('MyComm') #('Sheet1')
print('DataFrame:\r\n', mycomm.dataframe)
print('Rows: ', mycomm.rows)
print('Cols: ', mycomm.cols)
print('Column Names: ', mycomm.columnNames)

The output:

Conclusion

In this tutorial you just saw how to import an Excel file into Python using Pandas (pandas.read_excel and pandas.ExcelFile), get all of the sheet-name, get the column’s name of an excel spreadsheet, and find the total number of rows and columns of an excel spreadsheet. You may also check the Pandas Documentation to find out more about the different options that you can use.

How to implement CReadExcel or (CExcelFile and CSheet) class, you can see on these posts:

Leave a comment