In a world where nearly all employees rely on spreadsheets, teams across industries are increasingly turning to automation to streamline repetitive workflows and empower staff to focus on more strategic tasks.
In this article, we delve into 7 key elements of Python Excel automation, covering everything from creating and modifying Excel files to templating, formatting cells, embedding charts, and extracting vital information. Whether you're a newcomer to Python-driven Excel automation or seeking to refine existing processes, this guide offers invaluable insights and strategies. Let's dive in.
How to choose a Python Excel library
Choosing the right Python library to automate Excel processes largely depends on your specific project needs. Let’s look at four of the most popular libraries along with their pros and cons:
Pandas
Primarily a data manipulation and analysis library, the Pandas library offers support for importing and analyzing Excel data via its read_excel function. It provides a straightforward API and excels in handling large datasets across various file formats.
XlsxWriter
This library is specifically designed for creating new .xlsx files using Python. It offers extensive formatting options and features like conditional formatting, charts, and images.
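As a rough illustration of the conditional formatting feature mentioned above, here is a minimal sketch. The output path, sample scores, colors, and threshold are assumptions made up for this example.

```python
import os
import tempfile

import xlsxwriter

# Hypothetical output path and sample scores, for illustration only.
cf_path = os.path.join(tempfile.mkdtemp(), "conditional_formatting.xlsx")
scores = [12, 45, 30, 8, 51]

workbook = xlsxwriter.Workbook(cf_path)
worksheet = workbook.add_worksheet()
worksheet.write_column("A1", scores)

# Light red fill with dark red text for highlighted cells.
highlight = workbook.add_format({"bg_color": "#FFC7CE", "font_color": "#9C0006"})

# Excel applies the format to every cell in A1:A5 whose value is >= 30.
worksheet.conditional_format("A1:A5", {
    "type": "cell",
    "criteria": ">=",
    "value": 30,
    "format": highlight,
})

# The file is written to disk when the workbook is closed.
workbook.close()
```

The rule is stored in the file and evaluated by Excel, so the highlighting updates if the cell values are edited later.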
Openpyxl
Unlike XlsxWriter, Openpyxl can both read and write Excel files. It provides comprehensive features for working with Excel files, but it is slower than XlsxWriter at writing operations.
XlsxTemplate
This library is a bit different from the others we’ve looked at. Instead of using Python code to generate an Excel output file, you create a template in Excel that contains placeholders that will be filled with data during the automation process.
7 key elements of Python Excel automation
Now that we’ve reviewed some of the most popular Python libraries for Excel automation, let’s dive into the top 7 Excel automation tasks you can perform using Python:
Create Excel spreadsheets
Creating Excel documents with Python libraries is foundational to many automation projects. There are two main methods to create Excel documents with Python:
Method 1: Generating Excel documents from scratch with Python
Libraries like XlsxWriter allow you to create new Excel spreadsheets from scratch with Python code. This method works well if you want to customize each new Excel sheet without relying on templates, and if you generally prefer to work in Python code.
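A minimal sketch of the from-scratch approach with XlsxWriter might look like the following. The file name, sheet name, sample rows, and format choices are assumptions invented for illustration.

```python
import os
import tempfile

import xlsxwriter

# Hypothetical output location and sample rows, used only for illustration.
output_path = os.path.join(tempfile.mkdtemp(), "sales_report.xlsx")
rows = [("Widget", 4, 2.50), ("Gadget", 10, 1.25)]

workbook = xlsxwriter.Workbook(output_path)
worksheet = workbook.add_worksheet("Report")

# Reusable formats: bold headers and a currency style.
header_format = workbook.add_format({"bold": True})
money_format = workbook.add_format({"num_format": "$#,##0.00"})

# Header row (row and column indexes are zero-based in XlsxWriter).
worksheet.write_row(0, 0, ["Item", "Quantity", "Unit price", "Total"], header_format)

for row_index, (item, quantity, price) in enumerate(rows, start=1):
    excel_row = row_index + 1  # formulas use one-based Excel row numbers
    worksheet.write(row_index, 0, item)
    worksheet.write(row_index, 1, quantity)
    worksheet.write(row_index, 2, price, money_format)
    worksheet.write_formula(row_index, 3, f"=B{excel_row}*C{excel_row}", money_format)

# The file is only written to disk when the workbook is closed.
workbook.close()
```

Everything about the sheet, from layout to number formats, lives in the script, which is the main trade-off of this method compared with templates.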
Method 2: Generating Excel documents from templates
On the other hand, libraries like xlsx-template are built around templating (which we’ll get to in a later section of this article). Instead of building out automation using Python, you manage most of the work using a single Excel file template that contains a series of tags.
Modify an Excel file
Python isn’t just for generating new files; it can also be used to modify existing files. Here are some of the key operations you can perform on an existing Excel sheet:
- Edit existing data in cells, rows, and columns
- Add new data into your worksheets
- Delete data from rows, columns, and cells
- Manipulate functions and write Excel formulas
- Change cell formatting and styles
PRO TIP: Understanding how content is managed in Excel is crucial to understanding how Excel documents are represented in Python. For instance, when using Openpyxl:
- The Workbook object represents the Excel file.
- Each Workbook can contain multiple sheets that are represented by the Worksheet object.
- Every Worksheet object consists of rows and columns, which Openpyxl exposes as iterables of Cell objects (for example through ws.rows, ws.columns, or ws.iter_rows()).
- Individual cells can be accessed and modified, so you can read or change the data each one contains.
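The operations listed above can be sketched with Openpyxl as follows. To keep the example self-contained it builds a small workbook in memory; the sheet name and sample values are made up, and with a real file you would start from load_workbook instead.

```python
from openpyxl import Workbook

# Build a small in-memory workbook so the sketch runs on its own;
# for a real file you would call openpyxl.load_workbook("your_file.xlsx").
wb = Workbook()
ws = wb.active
ws.title = "Inventory"

# Add new data: append() writes each list as the next empty row.
ws.append(["Item", "Quantity", "Unit price"])
ws.append(["Widget", 4, 2.5])
ws.append(["Gadget", 10, 1.25])
ws.append(["Obsolete part", 0, 9.99])

# Edit existing data by cell address or by row/column index.
ws["B2"] = 6
ws.cell(row=3, column=3, value=1.5)

# Delete data: remove row 4, shifting any rows below it up.
ws.delete_rows(4)

# Write a formula as a string starting with "="; Excel evaluates it on open.
ws["A4"] = "Total quantity"
ws["B4"] = "=SUM(B2:B3)"

# Read values back, one tuple per row.
for row in ws.iter_rows(min_row=1, max_row=ws.max_row, values_only=True):
    print(row)
```

Calling wb.save with a file name of your choice would persist these changes. Note that Openpyxl stores the formula text and does not calculate its result.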
Embed and combine documents
Python's capabilities extend beyond simple spreadsheet manipulation, allowing for advanced integration and modification of Excel docs. For example, with Python libraries you can embed other file types, like PDFs, directly into the spreadsheet.
Using Python, you can also merge two Excel files into a consolidated spreadsheet or split a large file into several smaller ones. From there you could analyze the data of the merged file or perform other computations.
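As one possible way to merge files, the sketch below stacks the first sheet of several workbooks into a single sheet using Openpyxl. The helper names write_sample and merge_workbooks, the file names, and the sample data are all hypothetical, and the approach assumes every file shares the same header row.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def write_sample(path, rows):
    """Create a small workbook with a header row (illustrative data only)."""
    wb = Workbook()
    ws = wb.active
    ws.append(["Region", "Sales"])
    for row in rows:
        ws.append(row)
    wb.save(path)


def merge_workbooks(paths, output_path):
    """Stack the first sheet of each file into one sheet, keeping one header."""
    merged = Workbook()
    target = merged.active
    for index, path in enumerate(paths):
        source = load_workbook(path).active
        # Skip the header row on every file after the first.
        first_row = 1 if index == 0 else 2
        for row in source.iter_rows(min_row=first_row, values_only=True):
            target.append(row)
    merged.save(output_path)
    return output_path


folder = tempfile.mkdtemp()
north = os.path.join(folder, "north.xlsx")
south = os.path.join(folder, "south.xlsx")
write_sample(north, [["North", 120], ["North", 80]])
write_sample(south, [["South", 95]])

merged_path = merge_workbooks([north, south], os.path.join(folder, "merged.xlsx"))
merged_rows = list(load_workbook(merged_path).active.iter_rows(values_only=True))
print(merged_rows)
```

Copying values this way drops cell formatting and formulas from the source files, which is usually acceptable when the goal is to analyze the combined data.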
Templating with Excel files
Templating with Python offers a way to produce highly customizable, dynamic Excel spreadsheets. Templating languages like Jinja2, when used with Python Excel libraries, can alter the content, formatting styles, and even structure of an Excel file based on input data or business rules.
A standout option for templating is xlsxTemplate. It treats Excel as the powerful layout engine it is, letting you combine Excel's formula, layout, and formatting features with the control logic offered by templating engines. Let’s look at an example.
Example 1: Use a template to add text
First, create an Excel file (template.xlsx) with placeholder values enclosed in double curly brackets, such as {{ placeholder }}.
Then, use a Python script to replace these placeholders with actual data:
import re
from openpyxl import load_workbook

# Load Excel template
wb = load_workbook('template.xlsx')
sheet = wb.active

# Define the actual data
data = {'placeholder': 'Hello, Python Excel Automation!'}

# Replace cell value placeholders with actual data
for row in sheet.iter_rows():
    for cell in row:
        if isinstance(cell.value, str):
            for key, value in data.items():
                # Match {{key}} with or without spaces inside the braces
                pattern = r'\{\{\s*' + re.escape(key) + r'\s*\}\}'
                cell.value = re.sub(pattern, lambda match: value, cell.value)

wb.save('output.xlsx')
In the example above, the Python script reads the Excel file, scans each cell for placeholders, and replaces any it finds with the actual data.
Format Excel cells
Python libraries offer various options to format the cells within your Excel spreadsheet. This allows you to apply styles and changes that enhance the presentation of your document and make it more understandable and appealing. Here are some options for cell formatting you can accomplish with libraries like Openpyxl:
Example 1: Modify font styles
Openpyxl has tools to modify the font properties like name, size, color, bold, italic, underline, and more. You can highlight important information or make your text aesthetically pleasing.
from openpyxl import Workbook
from openpyxl.styles import Font

wb = Workbook()
ws = wb.active

cell = ws['A1']
cell.value = "Bold and Italic Font"
cell.font = Font(bold=True, italic=True, color="FF0000")
Example 2: Manage number formats
Manage the way numbers are displayed in cells. You can set format codes to control number precision, insert dollar signs, represent percentages, etc.
# ws is the worksheet used in Example 1
cell = ws['A5']
cell.value = 0.25
# Displayed in Excel as 25.00%
cell.number_format = '0.00%'
Below is a screenshot which incorporates all styling methods mentioned above.
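Beyond fonts and number formats, Openpyxl also exposes fills, alignment, and borders through the same openpyxl.styles module. The sketch below is an additional illustration rather than part of the screenshot; the cell address, colors, and column width are arbitrary choices.

```python
from openpyxl import Workbook
from openpyxl.styles import Alignment, Border, PatternFill, Side

wb = Workbook()
ws = wb.active

cell = ws["B2"]
cell.value = "Styled header"

# Solid yellow background (hex RGB without the leading "#").
cell.fill = PatternFill(fill_type="solid", start_color="FFFF00", end_color="FFFF00")

# Center the text horizontally and vertically, wrapping long content.
cell.alignment = Alignment(horizontal="center", vertical="center", wrap_text=True)

# Thin black border on all four sides.
thin = Side(style="thin", color="000000")
cell.border = Border(left=thin, right=thin, top=thin, bottom=thin)

# Widen the column so the header is readable.
ws.column_dimensions["B"].width = 20
```

Style objects are assigned per cell, so applying the same look to a range means looping over its cells.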
Embed charts
Much like with Python Word automation, Python libraries offer a way to embed charts directly into Excel files. The process looks different depending on the library you work with. There are two main options we’ll focus on here. XlsxWriter offers more advanced features and customization options, making it suitable for complex charting requirements. On the other hand, Openpyxl provides a simpler interface for basic charting needs and is more intuitive for beginners.
Now let’s look at example code snippets for each of these two libraries:
Example 1: Insert charts with XlsxWriter
import xlsxwriter
# Create a new Excel workbook
workbook = xlsxwriter.Workbook('charts_with_XlsxWriter.xlsx')
worksheet = workbook.add_worksheet()
# Write some data to add to the chart
data = [10, 40, 50, 20, 10]
worksheet.write_column('A1', data)
# Create a chart object
chart = workbook.add_chart({'type': 'line'})
# Configure the series for the chart
chart.add_series({'values': '=Sheet1!$A$1:$A$5'})
# Insert the chart into the worksheet
worksheet.insert_chart('C1', chart)
# Close the workbook
workbook.close()
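To give a sense of the customization options XlsxWriter offers, here is a hedged sketch that adds a series name, category labels, a title, axis labels, and a legend position. The sheet name, labels, and output path are assumptions chosen for this example.

```python
import os
import tempfile

import xlsxwriter

# Hypothetical output path and sample data, for illustration only.
chart_path = os.path.join(tempfile.mkdtemp(), "customized_chart.xlsx")
months = ["Jan", "Feb", "Mar", "Apr", "May"]
sales = [10, 40, 50, 20, 10]

workbook = xlsxwriter.Workbook(chart_path)
worksheet = workbook.add_worksheet("Data")
worksheet.write_column("A1", months)
worksheet.write_column("B1", sales)

chart = workbook.add_chart({"type": "column"})

# Name the series and point it at the category labels and values.
chart.add_series({
    "name": "Monthly sales",
    "categories": "=Data!$A$1:$A$5",
    "values": "=Data!$B$1:$B$5",
})

# Titles, axis labels, and legend placement.
chart.set_title({"name": "Sales by month"})
chart.set_x_axis({"name": "Month"})
chart.set_y_axis({"name": "Units sold"})
chart.set_legend({"position": "bottom"})

worksheet.insert_chart("D2", chart)
workbook.close()
```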
Example 2: Insert charts with Openpyxl
from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference
# Create a new workbook
wb = Workbook()
ws = wb.active
# Add data to the worksheet
data = [10, 40, 50, 20, 10]
for i, value in enumerate(data, start=1):
    ws.cell(row=i, column=1, value=value)
# Create a chart
chart = LineChart()
chart.add_data(Reference(ws, min_col=1, min_row=1, max_col=1, max_row=len(data)))
# Add the chart to the worksheet
ws.add_chart(chart, "C1")
# Save the workbook
wb.save("charts_with_openpyxl.xlsx")
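The Openpyxl chart can be labeled in a similar way. The following sketch assumes a header row and a column of month names, both invented for this example, and writes to a temporary path.

```python
import os
import tempfile

from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference

wb = Workbook()
ws = wb.active

# Header row plus illustrative data.
ws.append(["Month", "Sales"])
for row in [("Jan", 10), ("Feb", 40), ("Mar", 50), ("Apr", 20), ("May", 10)]:
    ws.append(row)

chart = LineChart()
chart.title = "Sales by month"
chart.x_axis.title = "Month"
chart.y_axis.title = "Units sold"

# Include the header cell so it becomes the series name.
values = Reference(ws, min_col=2, min_row=1, max_row=6)
chart.add_data(values, titles_from_data=True)

# Use column A (without its header) as the category labels.
categories = Reference(ws, min_col=1, min_row=2, max_row=6)
chart.set_categories(categories)

ws.add_chart(chart, "D2")

# Hypothetical output path, for illustration only.
labeled_chart_path = os.path.join(tempfile.mkdtemp(), "labeled_chart.xlsx")
wb.save(labeled_chart_path)
```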
PRO TIP: In addition to XlsxWriter and Openpyxl, there are other libraries available for adding charts to Excel in Python. While these may not be as widely used or feature-rich as the two main contenders, they offer alternative options for specific use cases:
- Pandas: Although Pandas itself does not directly add charts to Excel, you can save Matplotlib or Seaborn plots as images and then insert them into the workbook through the XlsxWriter or Openpyxl engine that Pandas uses to write Excel files.
- Xlwings: While its primary focus is on integrating Python with Excel for automation and data analysis, it also provides functionality for adding charts to Excel workbooks. Xlwings may suit users who prefer working directly within Excel for chart creation and manipulation.
- Plotly: While it is primarily used for creating interactive charts in web applications or Jupyter notebooks, Plotly can also be used to generate static images of charts, which can then be inserted into a new Excel file using XlsxWriter or Openpyxl.
Extract information
Python libraries such as Openpyxl and Pandas provide robust capabilities for traversing the rows and columns of a sheet, or pivot tables, so you can export the Excel data you need and process it further. This is especially useful when working with bulky datasets or documents filled with numerical data.
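As a minimal sketch of row traversal with Openpyxl, the example below reads each row as a tuple of values and aggregates one column. The names, departments, and salaries are made-up sample data, built in memory so the snippet runs without an input file.

```python
from openpyxl import Workbook

# Illustrative in-memory data; with a real file, use
# openpyxl.load_workbook("your_file.xlsx", data_only=True) instead.
wb = Workbook()
ws = wb.active
ws.append(["Name", "Department", "Salary"])
ws.append(["Ana", "Finance", 5200])
ws.append(["Ben", "IT", 4800])
ws.append(["Cara", "Finance", 6100])

# Read the header row once, then turn each remaining row into a dict.
rows = ws.iter_rows(values_only=True)
header = next(rows)
records = [dict(zip(header, row)) for row in rows]

# Extract only what is needed, e.g. total salary per department.
totals = {}
for record in records:
    totals[record["Department"]] = totals.get(record["Department"], 0) + record["Salary"]

print(totals)
```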
Example 1: Read and extract data using Pandas
In this example, we use the Pandas read_excel function to open the specified Excel document, load its data into a DataFrame, and print it. By default, only the first sheet is read; if the document contains multiple sheets, specify the sheet name or index with the sheet_name parameter of read_excel.
import pandas as pd

def read_excel_data(filename):
    # Reads the first sheet into a DataFrame by default
    data = pd.read_excel(filename)
    return data

print(read_excel_data('test.xlsx'))
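For workbooks with several sheets, a sketch along these lines may help. It first writes a two-sheet file to a temporary folder so it can run on its own; the sheet names and values are invented, and it assumes an Excel engine such as Openpyxl is installed alongside Pandas.

```python
import os
import tempfile

import pandas as pd

# Build a two-sheet workbook to read back (sheet names are illustrative).
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"item": ["A", "B"], "qty": [1, 2]}).to_excel(
        writer, sheet_name="Orders", index=False
    )
    pd.DataFrame({"item": ["A"], "qty": [5]}).to_excel(
        writer, sheet_name="Returns", index=False
    )

# Read one sheet by name, limited to the columns of interest.
orders = pd.read_excel(path, sheet_name="Orders", usecols=["item", "qty"])

# sheet_name=None returns a dict mapping each sheet name to a DataFrame.
all_sheets = pd.read_excel(path, sheet_name=None)

print(orders)
print(list(all_sheets))
```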
PRO TIP: Developers and data analysts are not just limited to extracting and printing the data. Once you have access to it, you could perform a plethora of tasks like:
- Clean, filter and process the Excel data
- Use Pandas or other Python modules for data analysis
- Leverage Matplotlib for data visualization
- Apply scikit-learn (sklearn) for machine learning tasks
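The first two items in the list above could be sketched with Pandas as follows. The DataFrame here is a stand-in for data returned by read_excel, and the column names, values, and threshold are assumptions.

```python
import pandas as pd

# Stand-in for a DataFrame returned by pd.read_excel (values are made up).
df = pd.DataFrame({
    "region": ["North", "South", None, "North"],
    "sales": [120.0, 95.0, 40.0, None],
})

# Clean: drop rows with missing values.
clean = df.dropna()

# Filter: keep rows above a threshold.
large = clean[clean["sales"] > 100]

# Process: aggregate sales per region.
per_region = clean.groupby("region")["sales"].sum()

print(per_region)
```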
Excel automation with SoftKraft
If you’re looking for a development team to bring your document processing vision to life, we’d love to help. We offer Python development outsourcing services that simplify the implementation process, enabling you to achieve business results without the hassle. Our team will guide you in selecting the right Python library, planning development, and building an end-to-end solution that perfectly aligns with your business requirements.
Conclusion
Python's ability to seamlessly interface with Microsoft Excel opens up a vast array of possibilities in terms of automating document creation, manipulation, and enhancement. With the strategies outlined in this article, you can take your Python Excel automation to the next level and start building out automation that improves efficiency and streamlines operations.