Images: Flatart |
Excel is one of the most common productivity tools used in any business or personal computing environment. It provides an easy to use interface that allows quick and easy analysis of data.
It’s no wonder then, that there is a genuine use case for data generated in Python to be converted into some kind of format, for example CSV, that can be imported into Excel for further analysis.
Of course you could work with the data directly in Python, using a tool such as Pandas, but often this can overcomplicate the data analysis goals that can easily be achieved using Excel.
When streamlining or automating these workflows, it can be useful to be able to interact directly with an Excel workbook from within Python. Conversely, extracting data from Excel to be used in calculations or data analysis in Python, is also possible.
I have often used Python to create automation of workflows, such as exporting Python data to Excel, reading data from multiple Excel data sources in order to consolidate information, or reformatting unstructured Excel data sources to a more usable layout to perform further analysis. (to learn how to use Python for automation, check out this book: Automate the Boring Stuff with Python [commission paid link])
With Python’s easy to use syntax, many Excel tasks that would normally require complicated macros and Visual Basic for Applications (VBA) code, can be achieved with much less effort using an appropriate Python script.
There are a few Python packages available that can interact with Excel, but my favourite is the openpyxl library.
Before we can get started with some examples, make sure that Python is installed and then install openpyxl with the following command: pip install openpyxl
Create a new file (for example, openpyxl_example.py
), and import the openpyxl
module on the first line:
import openpyxl
First, we create a new workbook instance and select the active sheet:
# Create a new workbook instance and select the active sheet wb = openpyxl.Workbook() sheet = wb.active
For our test data, we will create a Python dictionary of superheroes and their respective real names:
# dict of superhero data superheroes = { 'Captain America':'Steve Rogers', 'Iron Man':'Tony Stark', 'Spiderman':'Peter Parker', 'Hulk':'Bruce Banner', 'Superman':'Clark Kent', 'Batman':'Bruce Wayne', 'Wonder Woman':'Princess Diana'}
The data will be written in tabulated format to the active Excel workbook object contained in the variable sheet
, but first we write the table headings to individual cells, using the cell reference, and set the font style to bold:
# Write 'Superhero' heading to cell A1 and set font style to bold sheet['A1'].value = 'Superhero' sheet['A1'].font = openpyxl.styles.Font(bold=True) # Write 'Real Name' heading to cell A1 and set font style to bold sheet['B1'].value = 'Real Name' sheet['B1'].font = openpyxl.styles.Font(bold=True)
Next, we use a loop to iterate through the dictionary (starting at row 2) and write each key and value pair to the respective fields in a row in the workbook, ordered alphabetically by superhero name:
# Starting at row 2, write data to sheet (alphabetically by key) row = 2 for key in sorted(superheroes.keys()): sheet.cell(column=1, row=row, value=key) sheet.cell(column=2, row=row, value=superheroes[key]) row += 1
Notice how we now use the column and row numbering based on the absolute cell reference location (so ‘A2’ is in row 2, column 1) to identify the cells that are to be populated.
Finally, we save the workbook data to a file:
# Save the file myFile = 'heroes.xlsx' wb.save(filename = myFile)
If you open the newly created file ‘heroes.xlsx’, the workbook should contain the data and headers that we have written to Excel.
Data can also be read back from an existing workbook.
To open the workbook and create new workbook and sheet objects to read in the data, we use the following code:
# open active sheet in the workbook and create new workbook and sheet objects wb1 = openpyxl.load_workbook(myFile) sheet1 = wb1.active
To test the concept, we read back the third row of the workbook:
# Read back data from the worksheet by cell reference print ('{}tt{}'.format(sheet1['A3'].value, sheet1['B3'].value)) print
The output should be as follows:
Captain America Steve Rogers
We can also iterate through all the populated rows in the workbook and read in the data. First we create an iterator over sheet1
and place it in the variable sheet_rows
. We then use a loop to read and print the data for each row, taking care to create fixed length spacing for columns to make the output easier to read:
# Read back all the rows in the worksheet sheet_rows = sheet1.iter_rows() # (warning - iter_rows starts counting at 0) for row in sheet_rows: print('{}{}{}'.format(row[0].value, ' '*(20-len(row[0].value)), row[1].value))
The output of the above should be as follows:
Superhero Real Name Batman Bruce Wayne Captain America Steve Rogers Hulk Bruce Banner Iron Man Tony Stark Spiderman Peter Parker Superman Clark Kent Wonder Woman Princess Diana
The above code only covers a few straightforward examples, which barely touch the surface of what can be achieved with openpyxl
, but it is clearly a very powerful package that can be used in creative ways to interact with Excel.
The completed example code can be downloaded from my Github page.