Pyexcel
Release 0.7.0
Author C.W.
Source code http://github.com/pyexcel/pyexcel.git
Issues http://github.com/pyexcel/pyexcel/issues
License New BSD License
Released 0.7.0
Generated Jul 16, 2023
CHAPTER 1
Introduction
pyexcel provides one application programming interface to read, manipulate and write data in various excel formats. This library makes information processing involving excel files an enjoyable task. The data in excel files can be turned into an array or a dict with minimal code, and vice versa. This library focuses on data processing using excel files as storage media, hence fonts, colors and charts were not and will not be considered.
The idea originated from a common usability problem: when an excel file driven web application is delivered to non-developer users (i.e. team assistants, human resource administrators etc.), the fact is that not everyone knows (or cares) about the differences between the various excel formats: csv, xls and xlsx are all the same to them. Instead of training those users about file formats, this library helps web developers handle most of the excel file formats by providing a common programming interface. To add a specific excel file format to your application, all you need to do is to install an extra pyexcel plugin. Hence there are no code changes to your application and no more issues with excel file formats.
Looking at the community, this library and its associated ones try to become a small and easy to install alternative to
Pandas.
CHAPTER 2
If your company has embedded pyexcel and its components into a revenue generating product, please support me on
github, patreon or bounty source to maintain the project and develop it further.
If you are an individual, you are welcome to support me too, for however long you feel like. As my backer, you will receive early access to pyexcel related content.
Your issues will get prioritized if you become my patron as a pyexcel pro user.
With your financial support, I will be able to invest a little bit more time in coding, documentation and writing
interesting posts.
2.1 Installation
Name Age
Adam 28
Beatrice 29
Ceri 30
Dean 26
you can easily save it into an excel file using the following code:
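A minimal sketch of that code, using the records keyword that appears later in this documentation (the destination file name is illustrative):

>>> import pyexcel as p
>>> records = [
...     {"Name": "Adam", "Age": 28},
...     {"Name": "Beatrice", "Age": 29},
...     {"Name": "Ceri", "Age": 30},
...     {"Name": "Dean", "Age": 26}
... ]
>>> p.save_as(records=records, dest_file_name="your_file.xls")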
If you are dealing with big data, please consider these usages:
Since 2020, all pyexcel-io plugins have dropped support for Python versions lower than 3.6. If you want to use any of those Python versions, please use pyexcel-io and its plugins at versions lower than 0.6.0.
Except for csv files, xls, xlsx and ods files are zipped folders containing a lot of xml files. The dedicated readers for excel files can stream read.
In order to manage the list of installed plugins, you need to use pip to add or remove a plugin. When you use virtualenv, you can have different plugins per virtual environment. In the situation where you have multiple plugins that do the same thing in your environment, you need to tell pyexcel which plugin to use per function call. For example, if both pyexcel-ods and pyexcel-odsr are installed and you want get_array to use pyexcel-odsr, you need to call get_array(..., library='pyexcel-odsr').
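For instance, a minimal sketch of such a call (the file name is illustrative; it assumes both pyexcel-ods and pyexcel-odsr are installed):

>>> import pyexcel as p
>>> data = p.get_array(file_name="your_file.ods", library="pyexcel-odsr")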
2.4 Usage
2.5 Design
2.5.1 Introduction
This section introduces Excel data models, their representing data structures and provides an overview of the formatting, transformation and manipulation supported by pyexcel.
When dealing with excel files, pyexcel pays attention to three primary objects: cell, sheet and book.
A book contains one or more sheets, and a sheet consists of a sheet name and a two dimensional array of cells. Although a sheet can contain charts and a cell can have a formula and styling properties, this library ignores them and only pays attention to the data in the cell and its data type. So, in the context of this library, the definitions of those three concepts are:
Data source
A data source is a storage format of structured data. The most popular data source is an excel file. Libre Office/Microsoft Excel can easily be used to generate an excel file of your desired format. Besides a physical file, this library recognizes three additional types of source:
1. Excel files in computer memory. For example: when a file is uploaded to a Python server for information processing. If it is relatively small, it can be stored in memory.
2. Database tables. For example: a client would like to have a snapshot of some database table in an excel file and asks for it to be sent to him.
3. Python structures. For example: a developer may have scraped a site and stored the data in a Python array or dictionary. He may want to save this information as a file.
Reading from - and writing to - a data source is modelled as parsers and renderers in pyexcel. Excel data sources and database sources support both read and write. Other data sources may support only read, or only write.
Here is a list of data sources:
Data format
This library and its plugins support most of the frequently used excel file formats.
Data transformation
Often a developer would like to have excel data imported into a Python data structure. This library supports the conversions from the previous three data sources to the following list of data structures, and vice versa.
Data manipulation
The main operations on a cell involve cell access, formatting and cleansing. The main operations on a sheet involve group access to a row or a column, data filtering and data transformation. The main operation on a book is obtaining access to individual sheets.
Data transcoding
For various reasons the data in one format needs to be transcoded into another. This library provides a transcoding
tunnel for data transcoding between supported file formats.
Data visualization
This library provides one application programming interface to read data from one of the following data sources:
• physical file
• memory file
• SQLAlchemy table
• Django Model
• Python data structures: dictionary, records and array
and to transform them into one of the following data structures:
• two dimensional array
• a dictionary of one dimensional arrays
• a list of dictionaries
• a dictionary of two dimensional arrays
• a Sheet
• a Book
Python data can be handled well using lists, dictionaries and various mixtures of both. This library provides four module level functions to help you obtain excel data in these data structures. Please refer to “A list of module level functions”; the first three functions operate on any one sheet from an excel book and the fourth one returns all data in all sheets of an excel book.
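For example, a minimal sketch of the four functions in use (the file name is illustrative; the functions themselves are documented in the signature section later in this document):

import pyexcel as p

array = p.get_array(file_name="example.xls")          # a two dimensional array
adict = p.get_dict(file_name="example.xls")           # a dictionary of one dimensional arrays
records = p.get_records(file_name="example.xls")      # a list of dictionaries
book_dict = p.get_book_dict(file_name="example.xls")  # a dictionary of two dimensional arrays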
See also:
• get_an_array_from_an_excel_sheet
However, you will need to call free_resources() to make sure file handles are closed.
In cases where the excel data needs custom manipulations, a pyexcel user has a few choices: one is to use Sheet and Book; the other is to look for more sophisticated alternatives:
• Pandas, for numerical analysis
• Do-it-yourself
Functions Returns
get_sheet() Sheet
get_book() Book
For all six functions, you can pass on the same command parameters while the return value is what the function says.
This library provides one application programming interface to transform them into one of the data structures:
• two dimensional array
• a (ordered) dictionary of one dimensional arrays
• a list of dictionaries
• a dictionary of two dimensional arrays
• a Sheet
• a Book
and write to one of the following data sources:
• physical file
• memory file
• SQLAlchemy table
• Django Model
2.5. Design 15
pyexcel, Release 0.7.0
Functions        Description
save_as()        Works well with single sheet file
isave_as()       Works well with big data files
save_book_as()   Works with multiple sheet file and big data files
isave_book_as()  Works with multiple sheet file and big data files
If you would only use these two functions to do format transcoding, you may enjoy a speed boost by using isave_as() and isave_book_as(), because they use the yield keyword and minimize the memory footprint. However, you will need to call free_resources() to make sure file handles are closed. save_as() and save_book_as() read all data into memory and will make all rows the same width.
See also:
• How to save an python array as an excel file
• How to save a dictionary of two dimensional array as an excel file
• How to save an python array as a csv file with special delimiter
Data transportation/transcoding
This library is capable of transporting your data between any of the following data sources:
• physical file
• memory file
• SQLAlchemy table
• Django Model
• Python data structures: dictionary, records and array
See also:
• How to import an excel sheet to a database using SQLAlchemy
• How to open an xls file and save it as xlsx
• How to open an xls file and save it as csv
2.5.3 Architecture
pyexcel uses loosely coupled plugins to fulfil the promise of accessing various file formats. lml is the plugin management library that provides the specialized support for the loose coupling.
The components of pyexcel are designed as building blocks. For your project, you can cherry-pick the file format support without affecting the core functionality of pyexcel. Each plugin will bring in additional dependencies. For example, if you choose pyexcel-xls, xlrd and xlwt will be brought in as second level dependencies.
Looking at the following architectural diagram, pyexcel hosts plugin interfaces for data source, data renderer and data parser. pyexcel-pygal, pyexcel-matplotlib, and pyexcel-handsontable extend pyexcel using the data renderer interface. The pyexcel-io package takes away the responsibility of interfacing with excel libraries, for example: xlrd, openpyxl, ezodf.
As in A list of file formats supported by external plugins, there are overlapping capabilities in reading and writing xlsx and ods files. Because the third party libraries behave differently even though they may read and write data in the same file format, you as the pyexcel user are left to pick whichever suits your task best.
Dotted arrow means the package or module is loaded later.
This section shows you how to get data from your excel files and how to export data to excel files in one line
Get a dictionary
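You can obtain such a dictionary of sheets with get_book_dict(); a minimal sketch, assuming a multi-sheet excel file exists (the file name is illustrative):

>>> import pyexcel as p
>>> from collections import OrderedDict
>>> book_dict = p.get_book_dict(file_name="violins.xls")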
And check:
>>> isinstance(book_dict, OrderedDict)
True
>>> import json
>>> for key, item in book_dict.items():
... print(json.dumps({key: item}))
{"Most Expensive Violins": [["Name", "Estimated Value", "Location"], ["Messiah
˓→Stradivarious", "$ 20,000,000", "Ashmolean Museum in Oxford, England"], [
˓→"Vieuxtemps Guarneri", "$ 16,000,000", "On loan to Anne Akiko Meyers"], ["Lady Blunt
˓→"]]}
Write data
Export an array
>>> p.get_sheet(file_name="example.xls")
pyexcel_sheet1:
+----------------------------+----------------------------+----------------------------+------------------+
| G                          | D                          | A                          | E                |
+----------------------------+----------------------------+----------------------------+------------------+
>>> p.save_as(array=data,
... dest_file_name="example.csv",
... dest_delimiter=':')
>>> records = [
... {"year": 1903, "country": "Germany", "speed": "206.7km/h"},
... {"year": 1964, "country": "Japan", "speed": "210km/h"},
... {"year": 2008, "country": "China", "speed": "350km/h"}
... ]
>>> p.save_as(records=records, dest_file_name='high_speed_rail.xls')
>>> henley_on_thames_facts = {
... "area": "5.58 square meters",
... "population": "11,619",
... "civial parish": "Henley-on-Thames",
... "latitude": "51.536",
... "longitude": "-0.898"
... }
>>> p.save_as(adict=henley_on_thames_facts, dest_file_name='henley.xlsx')
>>> ccs_insights = {
... "year": ["2017", "2018", "2019", "2020", "2021"],
... "smart phones": [1.53, 1.64, 1.74, 1.82, 1.90],
... "feature phones": [0.46, 0.38, 0.30, 0.23, 0.17]
... }
>>> p.save_as(adict=ccs_insights, dest_file_name='ccs.csv')
>>> p.save_book_as(
... bookdict=a_dictionary_of_two_dimensional_arrays,
... dest_file_name="book.xls"
... )
If you want to preserve the order of sheets in your dictionary, you have to pass on an ordered dictionary to the function
itself. For example:
Please notice that “Sheet 2” is the first item in the book_dict, meaning the order of sheets is preserved.
Transcoding
Note: Please note that pyexcel-cli can perform file transcoding at the command line. There is no need to open your editor, save a script and then run it with python.
The following code does a simple file format transcoding from xls to csv:
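A minimal sketch of such a call (the file names are illustrative):

import pyexcel as p

# read the xls file and write the same data back out as csv
p.save_as(file_name="birth.xls", dest_file_name="birth.csv")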
Note: Please note that a csv (comma separated values) file is a pure text file. Formulas, charts, images and formatting in an xls file will disappear no matter which transcoding tool you use. Hence, pyexcel is a quick alternative for this transcoding job.
Merge all excel files in a directory into a book where each file becomes a sheet
The following code will merge every excel file into one file, say “output.xls”:
from pyexcel.cookbook import merge_all_to_a_book
import glob
merge_all_to_a_book(glob.glob("your_csv_directory\*.csv"), "output.xls")
You can mix and match with other excel formats: xls, xlsm and ods. For example, if you are sure you have only xls,
xlsm, xlsx, ods and csv files in your_excel_file_directory, you can do the following:
from pyexcel.cookbook import merge_all_to_a_book
import glob
merge_all_to_a_book(glob.glob("your_excel_file_directory\*.*"), "output.xls")
Suppose you have many sheets in a workbook and you would like to separate each into a single sheet excel file. You can easily do this:
>>> from pyexcel.cookbook import split_a_book
>>> split_a_book("megabook.xls", "output.xls")
>>> import glob
>>> outputfiles = glob.glob("*_output.xls")
for the output file, you can specify any of the supported formats
Suppose you just want to extract one sheet from the many sheets that exist in a workbook and you would like to save it as a single sheet excel file. You can easily do this:
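A minimal sketch, assuming pyexcel.cookbook.extract_a_sheet_from_a_book takes the source file, the sheet name and the output file in that order (the names are illustrative):

>>> from pyexcel.cookbook import extract_a_sheet_from_a_book
>>> extract_a_sheet_from_a_book("megabook.xls", "Sheet 3", "output.xls")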
for the output file, you can specify any of the supported formats
When you are dealing with BIG excel files, you will want pyexcel to use constant memory.
This section shows you how to get data from your BIG excel files and how to export data to excel files in two lines at
most, without eating all your computer memory.
Please do not forget the second line to close the opened file handle:
>>> p.free_resources()
>>> p.free_resources()
Export an array
But the following line is not required because the data sources are not files:
>>> # p.free_resources()
>>> p.get_sheet(file_name="example.xls")
pyexcel_sheet1:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
| 7 | 8 | 9 |
+---+---+---+
>>> p.isave_as(array=data,
... dest_file_name="example.csv",
... dest_delimiter=':')
>>> records = [
... {"year": 1903, "country": "Germany", "speed": "206.7km/h"},
... {"year": 1964, "country": "Japan", "speed": "210km/h"},
... {"year": 2008, "country": "China", "speed": "350km/h"}
... ]
>>> p.isave_as(records=records, dest_file_name='high_speed_rail.xls')
>>> henley_on_thames_facts = {
... "area": "5.58 square meters",
... "population": "11,619",
... "civial parish": "Henley-on-Thames",
... "latitude": "51.536",
... "longitude": "-0.898"
... }
>>> p.isave_as(adict=henley_on_thames_facts, dest_file_name='henley.xlsx')
>>> ccs_insights = {
... "year": ["2017", "2018", "2019", "2020", "2021"],
... "smart phones": [1.53, 1.64, 1.74, 1.82, 1.90],
... "feature phones": [0.46, 0.38, 0.30, 0.23, 0.17]
... }
>>> p.isave_as(adict=ccs_insights, dest_file_name='ccs.csv')
>>> p.free_resources()
>>> a_dictionary_of_two_dimensional_arrays = {
...     'Sheet 1':
...         [
...             [1.0, 2.0, 3.0],
...             [4.0, 5.0, 6.0],
...             [7.0, 8.0, 9.0]
...         ],
...     'Sheet 2':
...         [
...             ['X', 'Y', 'Z'],
...             [1.0, 2.0, 3.0],
...             [4.0, 5.0, 6.0]
...         ],
...     'Sheet 3':
...         [
...             ['O', 'P', 'Q'],
...             [3.0, 2.0, 1.0],
...             [4.0, 3.0, 2.0]
...         ]
... }
>>> p.isave_book_as(
... bookdict=a_dictionary_of_two_dimensional_arrays,
... dest_file_name="book.xls"
... )
If you want to preserve the order of sheets in your dictionary, you have to pass on an ordered dictionary to the function
itself. For example:
Please notice that “Sheet 2” is the first item in the book_dict, meaning the order of sheets is preserved.
Note: Please note that the following file transcoding could be done with zero lines of code: install pyexcel-cli and you can do the transcoding in one command. There is no need to open your editor, save a script and then run it with python.
The following code does a simple file format transcoding from xls to csv:
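A minimal sketch of the big data variant, using isave_as() and free_resources() as described above (the file names are illustrative):

import pyexcel as p

# stream the rows from the xls file into a csv file with minimal memory use
p.isave_as(file_name="birth.xls", dest_file_name="birth.csv")
p.free_resources()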
Note: Please note that a csv (comma separated values) file is a pure text file. Formulas, charts, images and formatting in an xls file will disappear no matter which transcoding tool you use. Hence, pyexcel is a quick alternative for this transcoding job.
The following libraries are written to facilitate the daily import and export of excel data.
framework plugin/middleware/extension
Flask Flask-Excel
Django django-excel
Pyramid pyramid-excel
You can find a real world example in examples/memoryfile/ directory: pyexcel_server.py. Here is the example snippet
1  def upload():
2      if request.method == 'POST' and 'excel' in request.files:
3          # handle file upload
4          filename = request.files['excel'].filename
5          extension = filename.split(".")[-1]
6          # Obtain the file extension and content
7          # pass a tuple instead of a file name
8          content = request.files['excel'].read()
9          if sys.version_info[0] > 2:
10             # in order to support python 3
11             # have to decode bytes to str
12             content = content.decode('utf-8')
13         sheet = pe.get_sheet(file_type=extension, file_content=content)
14         # then use it as usual
15         sheet.name_columns_by_row(0)
16         # respond with a json
17         return jsonify({"result": sheet.dict})
18     return render_template('upload.html')
request.files['excel'] in line 4 holds the file object. Line 5 finds out the file extension. Line 13 obtains a sheet instance. Line 15 uses the first row as the data header. Line 17 sends the json representation of the excel file back to the client browser.
1  data = [
2      [...],
3      ...
4  ]
5
6  @app.route('/download')
7  def download():
8      sheet = pe.Sheet(data)
9      output = make_response(sheet.csv)
10     output.headers["Content-Disposition"] = "attachment; filename=export.csv"
11     output.headers["Content-type"] = "text/csv"
12     return output
There exist a few data renderers for pyexcel data. This chapter will walk you through them.
With pyexcel-text, you can get pyexcel data in newline delimited json, normal json and other formats.
sphinxcontrib-excel helps you present your excel data in various formats inside your sphinx documentation.
pyexcel-pygal helps you with all charting options and gives you charts in svg format.
pyexcel-echarts draws 2D, 3D and geo charts from pyexcel data and has awesome animations too, but it is under development.
pyexcel-matplotlib helps you with scientific charts and is under development.
2.6.5 Sheet
The sheet api here is much less powerful than a pandas DataFrame when the array is of significant size. To be honest, pandas DataFrame is much more powerful and provides rich data manipulation apis. When would you consider the sheet api here? If your data manipulation steps are basic and your data volume is not high, you can use it.
Random access
sheet[row, column]
or:
sheet['A1']
The former syntax is handy when you know the row and column numbers. The latter syntax is introduced to help you
convert the excel column header such as “AX” to integer numbers.
Suppose you have the following data; you can get the value 5 via sheet[2, 2].
Here is the example code showing how you can randomly access a cell:
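A minimal sketch, with the data reconstructed from the row and column output shown below (the first row holds column headers and the first column holds row names):

>>> import pyexcel
>>> data = [
...     ["", "X", "Y", "Z"],
...     ["a", 1, 2, 3],
...     ["b", 4, 5, 6],
...     ["c", 7, 8, 9]
... ]
>>> sheet = pyexcel.Sheet(data)
>>> sheet[2, 2]
5
>>> sheet['C3']
5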
Note: In order to set a value to a cell, please use sheet[row_index, column_index] = new_value
or sheet[‘A1’] = new_value
>>> sheet.row[1]
['a', 1, 2, 3]
>>> sheet.column[2]
['Y', 2, 5, 8]
Use custom names instead of index
Alternatively, it is possible to use the first row to refer to each column:
>>> sheet.name_columns_by_row(0)
>>> print(sheet[1, "Y"])
5
>>> sheet[1, "Y"] = 100
>>> print(sheet[1, "Y"])
100
You have noticed that the row index has changed. It is because the first row is taken as the column names; hence all rows after the first row are shifted. Now accessing the columns changes too:
>>> sheet.column['Y']
[2, 100, 8]
>>> sheet.column['Y'][1]
100
>>> sheet.name_rows_by_column(0)
>>> sheet.row["b"][1]
100
For the same reason, the row index has been reduced by 1. Since we have named columns and rows, it is possible to
access the same cell like this:
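A minimal sketch of that named access, using the row and column names set above:

>>> sheet["b", "Y"]
100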
Maybe you want to get only the data without the column headers. You can call rows() instead:
>>> list(sheet.rows())
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
attributes
Attributes:
>>> import pyexcel
>>> content = "1,2,3\n3,4,5"
>>> sheet = pyexcel.get_sheet(file_type="csv", file_content=content)
>>> sheet.tsv
'1\t2\t3\r\n3\t4\t5\r\n'
>>> print(sheet.simple)
csv:
- - -
1 2 3
3 4 5
- - -
What's more, you could as well set a value to an attribute, for example:
>>> import pyexcel
>>> content = "1,2,3\n3,4,5"
>>> sheet = pyexcel.Sheet()
>>> sheet.csv = content
>>> sheet.array
[[1, 2, 3], [3, 4, 5]]
You can get direct access to the underlying stream object. In some situations, it is desired:
>>> stream = sheet.stream.tsv
The returned stream object has tsv formatted content for reading.
What you could further do is to set a memory stream of any supported file format to a sheet. For example:
>>> another_sheet = pyexcel.Sheet()
>>> another_sheet.xls = sheet.xls
>>> another_sheet.content
Yet, it is possible to assign an absolute url of an online excel file to an instance of pyexcel.Sheet.
custom attributes
You can pass on source specific parameters to getter and setter functions.
Data manipulation
The data in a sheet is represented by Sheet, which maintains the data as a list of lists. You can regard Sheet as a two dimensional array with additional iterators. Random access to an individual column or row is exposed by Column and Row.
Column manipulation
And you want to update Column 2 with these data: [11, 12, 13]
>>> sheet
pyexcel sheet:
+----------+----------+
| Column 1 | Column 3 |
+==========+==========+
| 1 | 7 |
+----------+----------+
| 2 | 8 |
+----------+----------+
| 3 | 9 |
+----------+----------+
Continue from the previous example. Suppose you want to add two more columns to the data file:
Column 4 Column 5
10 13
11 14
12 15
>>> extra_data = [
... ["Column 4", "Column 5"],
... [10, 13],
... [11, 14],
... [12, 15]
... ]
>>> sheet2 = pyexcel.Sheet(extra_data)
>>> sheet3 = sheet.column + sheet2
>>> sheet3.column["Column 4"]
[10, 11, 12]
>>> sheet3.column["Column 5"]
[13, 14, 15]
Please note that the column addition statement above will not update the original sheet instance, as pyexcel users demanded:
>>> sheet
pyexcel sheet:
+----------+----------+
| Column 1 | Column 3 |
+==========+==========+
| 1 | 7 |
+----------+----------+
| 2 | 8 |
+----------+----------+
| 3 | 9 |
+----------+----------+
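To update the original sheet in place, the in-place operator can be used instead; a minimal sketch (it assumes the += operator on sheet.column, which the five column output below reflects):

>>> sheet.column += sheet2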
>>> sheet
pyexcel sheet:
+----------+----------+----------+----------+
| Column 1 | Column 3 | Column 4 | Column 5 |
+==========+==========+==========+==========+
| 1 | 7 | 10 | 13 |
+----------+----------+----------+----------+
| 2 | 8 | 11 | 14 |
+----------+----------+----------+----------+
| 3 | 9 | 12 | 15 |
+----------+----------+----------+----------+
>>> data = [
... ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
... [1,2,3,4,5,6,7,9],
... ]
>>> sheet = pyexcel.Sheet(data, name_columns_by_row=0)
>>> sheet
pyexcel sheet:
+---+---+---+---+---+---+---+---+
| a | b | c | d | e | f | g | h |
+===+===+===+===+===+===+===+===+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 9 |
+---+---+---+---+---+---+---+---+
And you want to remove the columns named 'a', 'c', 'e' and 'h'. This is how you do it:
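A minimal sketch that deletes the named columns one at a time via del on sheet.column:

>>> for name in ['a', 'c', 'e', 'h']:
...     del sheet.column[name]
>>> sheet.colnames
['b', 'd', 'f', 'g']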
Row manipulation
Formatting
The previous section assumed the data is in the format that you want. In reality, you often have to manipulate the data types a bit to suit your needs. Hence, formatters come into the scene. Use format() to apply a formatter immediately.
Note: int, float and datetime values are automatically detected in csv files since pyexcel version 0.2.2.
As you can see, the userid column is of int type. Next, let's convert the column to string format:
Sometimes, the data in a spreadsheet may have unwanted strings in all or some cells. Let's take an example. Suppose we have a spreadsheet that contains all strings, but it has random spaces before and after the text values. Some fields have weird characters, such as “ ”:
>>> data = [
... [" Version", " Comments", " Author "],
... [" v0.0.1 ", " Release versions"," Eda"],
... [" v0.0.2 ", "Useful updates ", " Freud"]
... ]
>>> sheet = pyexcel.Sheet(data)
>>> sheet.content
+-----------------+------------------------------+----------------------+
| Version | Comments | Author |
+-----------------+------------------------------+----------------------+
| v0.0.1 | Release versions | Eda |
+-----------------+------------------------------+----------------------+
| v0.0.2 | Useful updates | Freud |
+-----------------+------------------------------+----------------------+
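A plausible definition of the cleansing function used below, assuming the goal is to strip the surrounding spaces and the odd non-breaking space character:

>>> def cleanse_func(v):
...     v = v.replace("\u00a0", "")  # drop the odd non-breaking space character
...     return v.strip()             # strip leading and trailing spaces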
>>> sheet.map(cleanse_func)
>>> sheet.content
+---------+------------------+--------+
| Version | Comments | Author |
+---------+------------------+--------+
| v0.0.1 | Release versions | Eda |
+---------+------------------+--------+
| v0.0.2 | Useful updates | Freud |
+---------+------------------+--------+
Data filtering
You may want to filter odd rows and print them in an array of dictionaries:
>>> sheet.filter(column_indices=[1])
>>> sheet.content
+----------+----------+
| Column 1 | Column 3 |
+==========+==========+
| 4 | 6 |
+----------+----------+
>>> sheet.save_as("example_series_filter.xls")
Column 1 Column 3
2 8
Suppose you have the following data in a sheet and you want to remove those rows with blanks:
You can use pyexcel.filters.RowValueFilter, which examines each row and returns True if the row should be filtered out. So, let's define a filter function:
2.6.6 Book
book[sheet_index][row, column]
or:
book["sheet_name"][row, column]
Tip: With pyexcel, you can regard a single sheet as a two dimensional array and a multi-sheet excel book as an ordered dictionary of two dimensional arrays.
>>> content = {
...     'Sheet 1':
...         [
...             [1.0, 2.0, 3.0],
...             [4.0, 5.0, 6.0],
...             [7.0, 8.0, 9.0]
...         ],
...     'Sheet 2':
...         [
...             ['X', 'Y', 'Z'],
...             [1.0, 2.0, 3.0],
...             [4.0, 5.0, 6.0]
...         ],
...     'Sheet 3':
...         [
...             ['O', 'P', 'Q'],
...             [3.0, 2.0, 1.0],
...             [4.0, 3.0, 2.0]
...         ]
... }
Get content
>>> book_dict = {
... 'Sheet 2':
... [
... ['X', 'Y', 'Z'],
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0]
... ],
... 'Sheet 3':
... [
... ['O', 'P', 'Q'],
... [3.0, 2.0, 1.0],
... [4.0, 3.0, 2.0]
... ],
... 'Sheet 1':
... [
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0],
... [7.0, 8.0, 9.0]
... ]
... }
>>> book = pyexcel.get_book(bookdict=book_dict)
>>> book
You can get direct access to the underlying stream object. In some situations, it is desired.
The returned stream object has the content formatted in plain format for further reading.
Set content
Suppose you have two excel books, each with three sheets. You can merge them and get a new book:
You also can merge individual sheets:
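A minimal sketch of both operations (the file names are illustrative; it assumes the + operator on Book and Sheet instances returns a new, merged book):

>>> import pyexcel
>>> book1 = pyexcel.get_book(file_name="book1.xls")
>>> book2 = pyexcel.get_book(file_name="book2.xls")
>>> merged_book = book1 + book2            # merge two books into a new book
>>> sheet1 = pyexcel.get_sheet(file_name="file1.csv")
>>> sheet2 = pyexcel.get_sheet(file_name="file2.csv")
>>> another_merged_book = sheet1 + sheet2  # merge two individual sheets into a book
>>> merged_book.save_as("merged.xls")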
Suppose you want to merge many csv files row by row into a new sheet.
import pyexcel

book = pyexcel.get_book(file_name="yourfile.xls")
for sheet in book:
    # do your processing with sheet
    # do filtering?
    pass
book.save_as("output.xls")
What would happen if I save a multi sheet book into a “csv” file?
Well, you will get one csv file per sheet. Suppose you have this code:
>>> content = {
... 'Sheet 1':
... [
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0],
... [7.0, 8.0, 9.0]
... ],
... 'Sheet 2':
... [
... ['X', 'Y', 'Z'],
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0]
... ],
... 'Sheet 3':
... [
... ['O', 'P', 'Q'],
... [3.0, 2.0, 1.0],
... [4.0, 3.0, 2.0]
... ]
... }
>>> book = pyexcel.Book(content)
>>> book.save_as("myfile.csv")
and their content is the value of the dictionary at the corresponding key
Alternatively, you could use save_book_as() function
After I have saved my multiple sheet book in csv format, how do I get them back?
First of all, you can read them back individually as csv files using the pyexcel.get_sheet() method. Secondly, pyexcel can do the magic to load all of them back into a book. You will just need to provide the common name before the separator “__”:
Note: You can find the complete code of this example in examples folder on github
Before going ahead, let’s import the needed components and initialize sql engine and table base:
>>> import os
>>> import pyexcel as p
>>> from sqlalchemy import create_engine
>>> from sqlalchemy.ext.declarative import declarative_base
>>> from sqlalchemy import Column , Integer, String, Float, Date
>>> from sqlalchemy.orm import sessionmaker
>>> engine = create_engine("sqlite:///birth.db")
>>> Base = declarative_base()
>>> Session = sessionmaker(bind=engine)
>>> Base.metadata.create_all(engine)
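The model definition and the import call are sketched below as a continuation of the snippet above; the BirthRegister model and the file birth.xls are illustrative, and dest_session/dest_table are assumed to be the save_as() keywords for writing into a database table:

>>> class BirthRegister(Base):
...     __tablename__ = 'birth'
...     id = Column(Integer, primary_key=True)
...     name = Column(String)
...     weight = Column(Float)
...     birth = Column(Date)
>>> Base.metadata.create_all(engine)
>>> session = Session()
>>> p.save_as(file_name="birth.xls",
...           dest_session=session,
...           dest_table=BirthRegister)
>>> session.commit()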
Done. It is that simple. Let's verify what has been imported to make sure.
Warning: pyexcel DOES NOT consider Fonts, Styles, Formulas and Charts at all. When you load a styled excel file and update it, you will definitely lose all those styles.
Meanwhile, a tab separated file can be read as csv too. You can specify a delimiter parameter.
>>> with open('tab_example.csv', 'w') as f:
... unused = f.write('I\tam\ttab\tseparated\tcsv\n') # for passing doctest
... unused = f.write('You\tneed\tdelimiter\tparameter\n') # unused is added
>>> sheet = p.get_sheet(file_name="tab_example.csv", delimiter='\t')
>>> sheet
tab_example.csv:
+-----+------+-----------+-----------+-----+
| I | am | tab | separated | csv |
+-----+------+-----------+-----------+-----+
| You | need | delimiter | parameter | |
+-----+------+-----------+-----------+-----+
Suppose you want to update the last row of the example file as:
[‘N/A’, ‘N/A’, ‘N/A’]
Here is the sample code:
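A plausible sketch, assuming the example file has three columns and that a row can be assigned through sheet.row (the file name is illustrative):

>>> import pyexcel as p
>>> sheet = p.get_sheet(file_name="example.xls")
>>> sheet.row[sheet.number_of_rows() - 1] = ['N/A', 'N/A', 'N/A']
>>> sheet.save_as("example.xls")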
How about the same alternative solution to the previous row based example? Well, you'd better have the following kind of data:
And then you want to update “Row 3” with, for example:
Excel files in memory can be manipulated directly without saving them to physical disk, and vice versa. This is useful when handling excel file uploads and downloads. For example:
Since version 0.3.0, each supported file type became an attribute of the Sheet and Book classes. What it means is that:
1. Read the content in memory
2. Set the content in memory
For example, after you have your Sheet and Book instance, you could access its content in a supported file type by using dot notation. The code in the previous section could be rewritten as:
You can find a real world example in examples/memoryfile/ directory: pyexcel_server.py. Here is the example snippet
1  def upload():
2      if request.method == 'POST' and 'excel' in request.files:
3          # handle file upload
4          filename = request.files['excel'].filename
5          extension = filename.split(".")[-1]
6          # Obtain the file extension and content
7          # pass a tuple instead of a file name
8          content = request.files['excel'].read()
9          if sys.version_info[0] > 2:
10             # in order to support python 3
11             # have to decode bytes to str
12             content = content.decode('utf-8')
13         sheet = pe.get_sheet(file_type=extension, file_content=content)
14         # then use it as usual
15         sheet.name_columns_by_row(0)
16         # respond with a json
17         return jsonify({"result": sheet.dict})
18     return render_template('upload.html')
request.files['excel'] in line 4 holds the file object. Line 5 finds out the file extension. Line 13 obtains a sheet instance. Line 15 uses the first row as the data header. Line 17 sends the json representation of the excel file back to the client browser.
1  data = [
2      [...],
3      ...
4  ]
5
6  @app.route('/download')
7  def download():
8      sheet = pe.Sheet(data)
9      output = make_response(sheet.csv)
10     output.headers["Content-Disposition"] = "attachment; filename=export.csv"
11     output.headers["Content-type"] = "text/csv"
12     return output
Relevant packages
Readily made plugins have been made on top of this example. Here is a list of them:
framework plugin/middleware/extension
Flask Flask-Excel
Django django-excel
Pyramid pyramid-excel
>>> pyexcel.get_sheet(file_name="example.xls")
pyexcel_sheet1:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
| 7 | 8 | 9 |
+---+---+---+
If you want to preserve the order of sheets in your dictionary, you have to pass on an ordered dictionary to the function
itself. For example:
Please notice that “Sheet 2” is the first item in the book_dict, meaning the order of sheets is preserved.
Note: You can find the complete code of this example in examples folder on github
Before going ahead, let’s import the needed components and initialize sql engine and table base:
>>> Base.metadata.create_all(engine)
Done. It is that simple. Let's verify what has been imported to make sure.
Note: Please note that a csv (comma separated values) file is a pure text file. Formulas, charts, images and formatting in an xls file will disappear no matter which transcoding tool you use. Hence, pyexcel is a quick alternative for this transcoding job.
Warning: Formula, charts, images and formatting in xls file will disappear as pyexcel does not support Formula,
charts, images and formatting.
How to open an xls multiple sheet excel book and save it as csv
Well, you write similar code as before, but you will need to use the save_book_as() function.
Since version 0.3.0, the data source became an attribute of the pyexcel native classes. All supported data formats are a dot notation away.
For sheet
Get content
What's more, you could as well set a value to an attribute, for example:
You can get direct access to the underlying stream object. In some situations, it is desired.
The returned stream object has tsv formatted content for reading.
Set content
What you could further do is to set a memory stream of any supported file format to a sheet. For example:
Yet, it is possible to assign an absolute url of an online excel file to an instance of pyexcel.Sheet.
>>> another_sheet.content
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
| 7 | 8 | 9 |
+---+---+---+
For book
Get content
>>> book_dict = {
... 'Sheet 2':
... [
... ['X', 'Y', 'Z'],
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0]
... ],
... 'Sheet 3':
... [
... ['O', 'P', 'Q'],
... [3.0, 2.0, 1.0],
... [4.0, 3.0, 2.0]
... ],
... 'Sheet 1':
... [
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0],
... [7.0, 8.0, 9.0]
... ]
... }
>>> book = pyexcel.get_book(bookdict=book_dict)
>>> book
You can get direct access to the underlying stream object. In some situations, it is desired.
The returned stream object has the content formatted in plain format for further reading.
Set content
>>> another_book
You can pass on source specific parameters to getter and setter functions.
When you are dealing with a huge amount of data, e.g. 64GB, obviously you would not like to fill up your memory with that data. What you may want to do is: read data from the Nth line, take M records and stop. And you only want to use your memory for the M records, not for the beginning part nor for the tail part.
Hence the partial read feature was developed to read partial data into memory for processing.
You can paginate by row, by column or by both, hence you dictate what portion of the data to read back. But remember that only the row limit features help you save memory. Let's say you use this feature to read data from the Nth column, take M columns and skip the rest: you are not going to reduce your memory footprint.
pyexcel-xls (xlrd), pyexcel-xlsx (openpyxl), pyexcel-ods (odfpy) and pyexcel-ods3 (pyexcel-ezodf) will read all data into memory. Because xls, xlsx and ods files are effectively zipped folders, all four will unzip the folder and read the content in xml format in full, so as to make sense of all details.
Hence, while the partial data is being returned, the memory consumption won't differ from reading the whole data back. Only after the partial data has been returned does the memory consumption curve drop off the cliff. So the pagination code here only limits the data returned to your program.
With that said, pyexcel-xlsxr, pyexcel-odsr and pyexcel-htmlr DO read partial data into memory. Those three are implemented in such a way that they consume the xml (html) only when needed. When they have read the designated portion of the data, they stop, even if they are half way through.
In addition, pyexcel's csv readers can read partial data into memory too.
Let’s assume the following file is a huge csv file:
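A sketch that creates such a file, with the values reconstructed from the paginated and transcoded outputs shown below:

>>> import pyexcel as pe
>>> data = [
...     [1, 21, 31],
...     [2, 22, 32],
...     [3, 23, 33],
...     [4, 24, 34],
...     [5, 25, 35],
...     [6, 26, 36]
... ]
>>> pe.save_as(array=data, dest_file_name="your_file.csv")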
>>> pe.get_sheet(file_name="your_file.csv",
... start_row=2, row_limit=3,
... start_column=1, column_limit=2)
your_file.csv:
+----+----+
| 23 | 33 |
+----+----+
| 24 | 34 |
+----+----+
| 25 | 35 |
+----+----+
If you are transcoding a big data set, the conventional formatting method would not help unless plenty of free RAM is available on demand. However, there is a way to minimize the memory footprint of pyexcel while the formatting is performed.
Let's continue from the previous example. Suppose we want to transcode “your_file.csv” to “your_file.xlsx” but increase each element by 1.
What we can do is to define a row renderer function as the following:
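A plausible definition of that row renderer: it receives one row at a time and yields the adjusted values, matching the incremented output below:

>>> def increment_by_one(row):
...     for element in row:
...         yield element + 1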
>>> pe.isave_as(file_name="your_file.csv",
... row_renderer=increment_by_one,
... dest_file_name="your_file.xlsx")
>>> pe.get_sheet(file_name="your_file.xlsx")
your_file.csv:
+---+----+----+
| 2 | 22 | 32 |
+---+----+----+
| 3 | 23 | 33 |
+---+----+----+
| 4 | 24 | 34 |
+---+----+----+
| 5 | 25 | 35 |
+---+----+----+
| 6 | 26 | 36 |
+---+----+----+
| 7 | 27 | 37 |
+---+----+----+
Here is the way to read the csv file and iterate through each row:
Often people want to use csv.DictReader to read it because it has a header. Here is how you do it with pyexcel:
Line 2 removes the header from the actual content. The removed header can be used to access its columns using the name itself, for example:
>>> sheet.column['Age']
[10, 11]
The top left corner of a sheet is (0, 0), meaning both the row index and the column index start from 0. To randomly access a cell of a Sheet instance, two syntaxes are available:
sheet[row, column]
This syntax helps you iterate the data by row and by column. If you use excel positions, the syntax below helps you get the cell instantly without converting the alphabetic column index to an integer:
sheet['A1']
Please note that with excel positions, top left corner is ‘A1’.
For example, suppose you have the following data sheet; here is the example code showing how you can randomly access a cell:
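A minimal sketch, with the data reconstructed from the row and column output shown further below (the first row holds column headers and the first column holds row names):

>>> import pyexcel
>>> data = [
...     ["", "X", "Y", "Z"],
...     ["a", 1, 2, 3],
...     ["b", 4, 5, 6],
...     ["c", 7, 8, 9]
... ]
>>> sheet = pyexcel.Sheet(data)
>>> sheet[2, 2]
5
>>> sheet['C3']
5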
Note: In order to set a value to a cell, please use sheet[row_index, column_index] = new_value
Continue with previous excel file, you can access row and column separately:
>>> sheet.row[1]
['a', 1, 2, 3]
>>> sheet.column[2]
['Y', 2, 5, 8]
>>> sheet.name_columns_by_row(0)
>>> print(sheet[1, "Y"])
5
>>> sheet[1, "Y"] = 100
>>> print(sheet[1, "Y"])
100
You have noticed that the row index has changed. It is because the first row is taken as the column names; hence all rows after the first row are shifted. Now accessing the columns changes too:
>>> sheet.column['Y']
[2, 100, 8]
>>> sheet.column['Y'][1]
100
>>> sheet.name_rows_by_column(0)
>>> sheet.row["b"][1]
100
For the same reason, the row index has been reduced by 1. Since we have named columns and rows, it is possible to
access the same cell like this:
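A minimal sketch of that named access, using the row and column names set above:

>>> sheet["b", "Y"]
100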
Note: When you have named your rows and columns, in order to set a value to a cell, please use sheet[row_name,
column_name] = new_value
For a multiple sheet file, you can regard it as a three dimensional array if you use Book. So, you access each cell via this syntax:
book[sheet_index][row, column]
or:
book["sheet_name"][row, column]
Tip: With pyexcel, you can regard a single sheet reader as a two dimensional array and a multi-sheet excel book reader as an ordered dictionary of two dimensional arrays.
>>> # "example.xls","example.xlsx","example.xlsm"
>>> sheet = pyexcel.get_sheet(file_name="example_series.xls", name_columns_by_row=0)
>>> sheet.to_dict()
OrderedDict([('Column 1', [1, 4, 7]), ('Column 2', [2, 5, 8]), ('Column 3', [3, 6, 9])])
>>> # "example.csv","example.xlsx","example.xlsm"
>>> sheet = pyexcel.get_sheet(file_name="example.xls", name_columns_by_row=0)
>>> records = sheet.to_records()
>>> for record in records:
... keys = sorted(record.keys())
... print("{")
... for key in keys:
... print("'%s':%d" % (key, record[key]))
... print("}")
{
'X':1
'Y':2
'Z':3
}
{
'X':4
'Y':5
'Z':6
}
{
'X':7
'Y':8
'Z':9
}
1 2 3
4 5 6
7 8 9
Suppose you have previous data as a dictionary and you want to save it as multiple sheet excel file:
>>> content = {
... 'Sheet 1':
... [
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0],
... [7.0, 8.0, 9.0]
... ],
... 'Sheet 2':
... [
... ['X', 'Y', 'Z'],
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0]
... ],
... 'Sheet 3':
... [
... ['O', 'P', 'Q'],
... [3.0, 2.0, 1.0],
... [4.0, 3.0, 2.0]
... ]
... }
>>> book = pyexcel.get_book(bookdict=content)
>>> book.save_as("output.xls")
Suppose you have the following data in any of the supported excel formats again:
>>> sheet = pyexcel.get_sheet(file_name="example_series.xls", name_columns_by_row=0)
Maybe you want to get only the data without the column headers. You can call rows() instead:
>>> list(sheet.rows())
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
You can get data from the bottom to the top one by calling rrows() instead:
>>> list(sheet.rrows())
[[7, 8, 9], [4, 5, 6], [1, 2, 3]]
You might want the data arranged vertically. You can call columns() instead:
>>> list(sheet.columns())
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
You can get columns in reverse sequence as well by calling rcolumns() instead:
>>> list(sheet.rcolumns())
[[3, 6, 9], [2, 5, 8], [1, 4, 7]]
Do you want to flatten the data? You can get the content in a one dimensional array. If you are interested in playing with one dimensional enumeration, you can check out these functions: enumerate(), reverse(), vertical(), and rvertical():
>>> list(sheet.enumerate())
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(sheet.reverse())
[9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> list(sheet.vertical())
[1, 4, 7, 2, 5, 8, 3, 6, 9]
>>> list(sheet.rvertical())
[9, 6, 3, 8, 5, 2, 7, 4, 1]
The data in a sheet is represented by Sheet, which maintains the data as a list of lists. You can regard Sheet as a two dimensional array with additional iterators. Random access to an individual column or row is exposed by Column and Row.
Column manipulation
And you want to update Column 2 with these data: [11, 12, 13]
>>> sheet.column["Column 2"] = [11, 12, 13]
>>> sheet.column[1]
[11, 12, 13]
>>> sheet
pyexcel sheet:
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+==========+==========+==========+
| 1 | 11 | 7 |
+----------+----------+----------+
| 2 | 12 | 8 |
+----------+----------+----------+
| 3 | 13 | 9 |
+----------+----------+----------+
Continue from the previous example. Suppose you want to add two more columns to the data file:
Column 4 Column 5
10 13
11 14
12 15
And you want to remove the columns named 'a', 'c', 'e' and 'h'. This is how you do it:
>>> sheet
pyexcel sheet:
+----------+----------+----------+
| 1 | 2 | 3 |
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+----------+----------+----------+
| 4 | 5 | 6 |
+----------+----------+----------+
>>> sheet.name_columns_by_row(1)
>>> sheet
pyexcel sheet:
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+==========+==========+==========+
| 1 | 2 | 3 |
+----------+----------+----------+
| 4 | 5 | 6 |
+----------+----------+----------+
Row manipulation
>>> sheet
pyexcel sheet:
+---+---+---+-------+
| a | b | c | Row 1 |
+---+---+---+-------+
| e | f | g | Row 2 |
+---+---+---+-------+
| 1 | 2 | 3 | Row 3 |
+---+---+---+-------+
>>> sheet.name_rows_by_column(3)
>>> sheet
pyexcel sheet:
+-------+---+---+---+
| Row 1 | a | b | c |
+-------+---+---+---+
| Row 2 | e | f | g |
+-------+---+---+---+
| Row 3 | 1 | 2 | 3 |
+-------+---+---+---+
You may want to filter odd rows and print them in an array of dictionaries:
>>> sheet.filter(column_indices=[1])
>>> sheet.content
+----------+----------+
| Column 1 | Column 3 |
+==========+==========+
| 4 | 6 |
+----------+----------+
>>> sheet.save_as("example_series_filter.xls")
Column 1 Column 3
2 8
Suppose you have the following data in a sheet and you want to remove those rows with blanks:
You can use pyexcel.filters.RowValueFilter, which examines each row and returns True if the row should be filtered out. So, let's define a filter function:
The previous section assumed the data is in the format that you want. In reality, you often have to manipulate the data types a bit to suit your needs. Hence, formatters come into the scene. Use format() to apply a formatter immediately.
Note: int, float and datetime values are automatically detected in csv files since pyexcel version 0.2.2.
As you can see, the userid column is of int type. Next, let's convert the column to string format:
Sometimes, the data in a spreadsheet may have unwanted strings in all or some cells. Let's take an example. Suppose we have a spreadsheet that contains all strings, but it has random spaces before and after the text values. Some fields have weird characters, such as “ ”:
>>> data = [
... [" Version", " Comments", " Author "],
... [" v0.0.1 ", " Release versions"," Eda"],
... [" v0.0.2 ", "Useful updates ", " Freud"]
... ]
>>> sheet = pyexcel.Sheet(data)
>>> sheet.content
+-----------------+------------------------------+----------------------+
| Version | Comments | Author |
+-----------------+------------------------------+----------------------+
| v0.0.1          | Release versions             | Eda                  |
+-----------------+------------------------------+----------------------+
| v0.0.2          | Useful updates               | Freud                |
+-----------------+------------------------------+----------------------+
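A plausible definition of the cleansing function used below, assuming the goal is to strip the surrounding spaces and the odd non-breaking space character:

>>> def cleanse_func(v):
...     v = v.replace("\u00a0", "")  # drop the odd non-breaking space character
...     return v.strip()             # strip leading and trailing spaces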
>>> sheet.map(cleanse_func)
>>> sheet.content
+---------+------------------+--------+
| Version | Comments | Author |
+---------+------------------+--------+
| v0.0.1 | Release versions | Eda |
+---------+------------------+--------+
| v0.0.2 | Useful updates | Freud |
+---------+------------------+--------+
Suppose you have two excel books, each with three sheets. You can merge them and get a new book:
You also can merge individual sheets:
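A minimal sketch of both operations (the file names are illustrative; it assumes the + operator on Book and Sheet instances returns a new, merged book):

>>> import pyexcel
>>> book1 = pyexcel.get_book(file_name="book1.xls")
>>> book2 = pyexcel.get_book(file_name="book2.xls")
>>> merged_book = book1 + book2            # merge two books into a new book
>>> sheet1 = pyexcel.get_sheet(file_name="file1.csv")
>>> sheet2 = pyexcel.get_sheet(file_name="file2.csv")
>>> another_merged_book = sheet1 + sheet2  # merge two individual sheets into a book
>>> merged_book.save_as("merged.xls")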
Suppose you want to merge many csv files row by row into a new sheet.
import pyexcel

book = pyexcel.get_book(file_name="yourfile.xls")
for sheet in book:
    # do your processing with sheet
    # do filtering?
    pass
book.save_as("output.xls")
What would happen if I save a multi sheet book into a “csv” file?
Well, you will get one csv file per sheet. Suppose you have this code:
>>> content = {
... 'Sheet 1':
... [
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0],
... [7.0, 8.0, 9.0]
... ],
... 'Sheet 2':
... [
... ['X', 'Y', 'Z'],
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0]
... ],
... 'Sheet 3':
... [
... ['O', 'P', 'Q'],
... [3.0, 2.0, 1.0],
... [4.0, 3.0, 2.0]
... ]
... }
>>> book = pyexcel.Book(content)
>>> book.save_as("myfile.csv")
and their content is the value of the dictionary at the corresponding key
Alternatively, you could use save_book_as() function
After I have saved my multiple sheet book in csv format, how do I get them back?
First of all, you can read them back individually as csv files using the pyexcel.get_sheet() method. Secondly, pyexcel can do the magic to load all of them back into a book. You will just need to provide the common name before the separator “__”:
2.8.1 Recipes
Warning: The pyexcel DOES NOT consider Fonts, Styles and Charts at all. In the resulting excel files, fonts,
styles and charts will not be transferred.
These recipes give one-stop utility functions for known use cases. Similar functionality can be achieved using other application interfaces.
And you want to update Column 2 with these data: [11, 12, 13]
Here is the code:
Row 1 1 2 3
Row 2 4 5 6
Row 3 7 8 9
And you want to update the second row with these data: [7, 4, 1]
Here is the code:
example.xls
Column 4 Column 5
10 12
11 13
The following code will merge the two into one file, say “output.xls”:
example.xls
>>> data = [
... ["Column 1", "Column 2", "Column 3", "Column 4", "Column 5"],
... [1, 4, 7, 10, 13],
... [2, 5, 8, 11, 14],
... [3, 6, 9, 12, 15]
... ]
>>> s = pyexcel.Sheet(data)
>>> s.save_as("example.csv")
>>> data = [
... ["Column 6", "Column 7", "Column 8", "Column 9", "Column 10"],
... [16, 17, 18, 19, 20]
... ]
>>> s = pyexcel.Sheet(data)
>>> s.save_as("example.xls")
And you want to filter out columns 2 and 4 from example.ods, filter out columns 6 and 7, and merge them:
Merge two files into a book where each file becomes a sheet
example.xls
Column 4 Column 5
10 12
11 13
>>> data = [
... ["Column 1", "Column 2", "Column 3"],
... [1, 2, 3],
... [4, 5, 6],
... [7, 8, 9]
... ]
>>> s = pyexcel.Sheet(data)
>>> s.save_as("example.csv")
>>> data = [
... ["Column 4", "Column 5"],
... [10, 12],
... [11, 13]
... ]
>>> s = pyexcel.Sheet(data)
>>> s.save_as("example.xls")
The following code will merge the two into one file, say “output.xls”:
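A minimal sketch, reusing merge_all_to_a_book from the cookbook (shown earlier in this documentation) with an explicit list of the two files:

from pyexcel.cookbook import merge_all_to_a_book

# each of the two files becomes one sheet in output.xls
merge_all_to_a_book(["example.csv", "example.xls"], "output.xls")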
Column 4 Column 5
10 12
11 13
Merge all excel files in a directory into a book where each file becomes a sheet
The following code will merge every excel file into one file, say “output.xls”:
merge_all_to_a_book(glob.glob("your_csv_directory\*.csv"), "output.xls")
You can mix and match with other excel formats: xls, xlsm and ods. For example, if you are sure you have only xls,
xlsm, xlsx, ods and csv files in your_excel_file_directory, you can do the following:
merge_all_to_a_book(glob.glob("your_excel_file_directory\*.*"), "output.xls")
Suppose you have many sheets in a workbook and you would like to separate each into a single sheet excel file. You can easily do this:
for the output file, you can specify any of the supported formats
Suppose you just want to extract one sheet from the many sheets that exist in a workbook and you would like to save it as a single sheet excel file. You can easily do this:
for the output file, you can specify any of the supported formats
list
dict
>>> a_dictionary_of_key_value_pair = {
... "IE": 0.2,
... "Firefox": 0.3
... }
>>> sheet = p.get_sheet(adict=a_dictionary_of_key_value_pair)
>>> sheet
pyexcel_sheet1:
+---------+-----+
| Firefox | IE |
+---------+-----+
| 0.3 | 0.2 |
+---------+-----+
>>> a_dictionary_of_one_dimensional_arrays = {
... "Column 1": [1, 2, 3, 4],
... "Column 2": [5, 6, 7, 8],
... "Column 3": [9, 10, 11, 12],
... }
>>> sheet = p.get_sheet(adict=a_dictionary_of_one_dimensional_arrays)
>>> sheet
pyexcel_sheet1:
records
>>> a_list_of_dictionaries = [
... {
... "Name": 'Adam',
... "Age": 28
... },
... {
... "Name": 'Beatrice',
... "Age": 29
... },
... {
... "Name": 'Ceri',
... "Age": 30
... },
... {
... "Name": 'Dean',
... "Age": 26
... }
... ]
>>> sheet = p.get_sheet(records=a_list_of_dictionaries)
>>> sheet
pyexcel_sheet1:
+-----+----------+
| Age | Name |
+-----+----------+
| 28 | Adam |
+-----+----------+
| 29 | Beatrice |
+-----+----------+
| 30 | Ceri |
+-----+----------+
| 26 | Dean |
+-----+----------+
book dict
>>> a_dictionary_of_two_dimensional_arrays = {
... 'Sheet 1':
... [
For sheet
Get content
>>> another_sheet.content
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
| 7 | 8 | 9 |
+---+---+---+
For book
>>> another_book
Sheet 1:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
| 7 | 8 | 9 |
+---+---+---+
Sheet 2:
+---+---+---+
| X | Y | Z |
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
Sheet 3:
+---+---+---+
| O | P | Q |
+---+---+---+
| 3 | 2 | 1 |
+---+---+---+
| 4 | 3 | 2 |
+---+---+---+
Here is a real case from Stack Overflow. Due to the author's ignorance, the user would rather have the code in Matlab than Python. Hence, I am sharing my pyexcel solution here.
Problem definition
Pyexcel solution
If you could insert an id field to act as the primary key, it can be mapped using sqlalchemy’s ORM:
$ sqlite3 /tmp/stack2.db
sqlite> CREATE TABLE ALLPROTEINS (
...> ID INT,
...> Protein_ID CHAR(20),
...> PROTEIN_KEY INT,
...> VALUE_OF_KEY INT
...> );
Here is the short script to get data inserted into the database:
>>> import pyexcel as p
>>> from itertools import product
>>> # data insertion code starts here
>>> sheet = p.get_sheet(file_name="csv-to-mysql-in-matlab-code.csv", delimiter='\t')
>>> sheet.name_columns_by_row(0)
>>> sheet.name_rows_by_column(0)
>>> print(sheet)
csv-to-mysql-in-matlab-code.csv:
+------+--------+--------+--------+---------+
| | 123442 | 234335 | 234336 | 3549867 |
+======+========+========+========+=========+
| a001 | 6 | 0 | 0 | 8 |
+------+--------+--------+--------+---------+
| b001 | 4 | 2 | 0 | 0 |
+------+--------+--------+--------+---------+
| c003 | 0 | 0 | 0 | 5 |
+------+--------+--------+--------+---------+
>>> results = []
>>> for protein_id, protein_key in product(sheet.rownames, sheet.colnames):
... results.append([protein_id, protein_key, sheet[str(protein_id), protein_key]])
$ sqlite3 /tmp/stack2.db
sqlite> select * from allproteins
...> ;
|a001|123442|6
|a001|234335|0
|a001|234336|0
|a001|3549867|8
|b001|123442|4
|b001|234335|2
|b001|234336|0
|b001|3549867|0
|c003|123442|0
|c003|234335|0
|c003|234336|0
|c003|3549867|5
Signature functions
pyexcel.get_array
pyexcel.get_array(**keywords)
Obtain an array from an excel source
It accepts the same parameters as get_sheet() but return an array instead.
Not all parameters are needed. Here is a table
source                      parameters
loading from file           file_name, sheet_name, keywords
loading from string         file_content, file_type, sheet_name, keywords
loading from stream         file_stream, file_type, sheet_name, keywords
loading from sql            session, table
loading from sql in django  model
loading from query sets     any query sets (sqlalchemy or django)
loading from dictionary     adict, with_keys
loading from records        records
loading from array          array
loading from an url         url
Parameters
file_name : a file with supported file extension
file_content : the file content
file_stream : the file stream
encoding: csv specific. Specify the file encoding of the csv file. For example: encoding='latin1'. Especially, encoding='utf-8-sig' would add a utf-8 bom header if used in a renderer, or would parse a csv with a utf-8 bom header if used in a parser.
escapechar : A one-character string used by the writer to escape the delimiter if quoting is set to
QUOTE_NONE and the quotechar if doublequote is False.
quotechar : A one-character string used to quote fields containing special characters, such as the delimiter or
quotechar, or which contain new-line characters. It defaults to ‘”’
quoting : Controls when quotes should be generated by the writer and recognised by the reader. It can take on
any of the QUOTE_* constants (see section Module Contents) and defaults to QUOTE_MINIMAL.
skipinitialspace : When True, whitespace immediately following the delimiter is ignored. The default is False.
pep_0515_off : When True in python version 3.6, PEP-0515 is turned off. The default is False.
Parameters related to xls file format: Please note the following parameters apply to pyexcel-xls. more details
can be found in xlrd.open_workbook()
logfile: An open file to which messages and diagnostics are written.
verbosity: Increases the volume of trace material written to the logfile.
use_mmap: Whether to use the mmap module is determined heuristically. Use this arg to override the result.
Current heuristic: mmap is used if it exists.
encoding_override: Used to overcome missing or bad codepage information in older-version files.
formatting_info: The default is False, which saves memory.
When True, formatting information will be read from the spreadsheet file. This provides all cells, including
empty and blank cells. Formatting information is available for each cell.
ragged_rows: The default of False means all rows are padded out with empty cells so that all rows have the
same size as found in ncols.
True means that there are no empty cells at the ends of rows. This can result in substantial memory savings
if rows are of widely varying sizes. See also the row_len() method.
pyexcel.get_dict
pyexcel.get_dict(name_columns_by_row=0, **keywords)
Obtain a dictionary from an excel source
It accepts the same parameters as get_sheet() but return a dictionary instead.
Specifically: name_columns_by_row : specify a row to be the dictionary keys. It defaults to 0, the first row.
If you would rather use column index 0 instead, you should do:
get_dict(name_columns_by_row=-1, name_rows_by_column=0)
>>> pe.get_sheet(file_name="your_file.csv",
... start_row=2, row_limit=3,
... start_column=1, column_limit=2)
your_file.csv:
+----+----+
| 23 | 33 |
+----+----+
| 24 | 34 |
+----+----+
| 25 | 35 |
+----+----+
source                      parameters
loading from file           file_name, sheet_name, keywords
loading from string         file_content, file_type, sheet_name, keywords
loading from stream         file_stream, file_type, sheet_name, keywords
loading from sql            session, table
loading from sql in django  model
loading from query sets     any query sets (sqlalchemy or django)
loading from dictionary     adict, with_keys
loading from records        records
loading from array          array
loading from an url         url
Parameters
pyexcel.get_records
pyexcel.get_records(name_columns_by_row=0, **keywords)
Obtain a list of records from an excel source
It accepts the same parameters as get_sheet() but return a list of dictionary(records) instead.
Specifically: name_columns_by_row : specify a row to be the dictionary keys. It defaults to 0, the first row.
If you would rather use column index 0 instead, you should do:
get_records(name_columns_by_row=-1, name_rows_by_column=0)
>>> pe.get_sheet(file_name="your_file.csv",
... start_row=2, row_limit=3,
... start_column=1, column_limit=2)
your_file.csv:
+----+----+
| 23 | 33 |
+----+----+
| 24 | 34 |
+----+----+
| 25 | 35 |
+----+----+
>>> pe.isave_as(file_name="your_file.csv",
... row_renderer=increment_by_one,
... dest_file_name="your_file.xlsx")
>>> pe.get_sheet(file_name="your_file.xlsx")
your_file.csv:
+---+----+----+
| 2 | 22 | 32 |
+---+----+----+
| 3 | 23 | 33 |
+---+----+----+
| 4 | 24 | 34 |
+---+----+----+
| 5 | 25 | 35 |
+---+----+----+
| 6 | 26 | 36 |
+---+----+----+
| 7 | 27 | 37 |
+---+----+----+
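A minimal sketch of get_records(), again assuming an illustrative "your_file.csv" whose first row is a header such as Name,Age:

import pyexcel as pe

records = pe.get_records(file_name="your_file.csv")
for record in records:
    # each record is a dictionary keyed by the header row
    print(record)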
source parameters
loading from file file_name, sheet_name, keywords
loading from string file_content, file_type, sheet_name, keywords
loading from stream file_stream, file_type, sheet_name, keywords
loading from sql session, table
loading from sql in django model
loading from query sets any query sets(sqlalchemy or django)
loading from dictionary adict, with_keys
loading from records records
loading from array array
loading from an url url
Parameters
file_name : a file with supported file extension
file_content : the file content
file_stream : the file stream
file_type : the file type in file_content or file_stream
session : database session
table : database table
model: a django model
adict: a dictionary of one dimensional arrays
url : a download http url for your excel file
with_keys : load with previous dictionary’s keys, default is True
records : a list of dictionaries that have the same keys
array : a two dimensional array, a list of lists
sheet_name : sheet name. if sheet_name is not given, the default sheet at index 0 is loaded
start_row: int, defaults to 0. It allows you to skip rows at the beginning.
row_limit: int, defaults to -1, meaning till the end of the whole sheet. It allows you to skip the trailing rows.
start_column: int, defaults to 0. It allows you to skip columns on your left hand side.
column_limit: int, defaults to -1, meaning till the end of the columns. It allows you to skip the trailing columns.
skip_row_func: It allows you to write your own row skipping function (see the sketch after this list).
The protocol is to return pyexcel_io.constants.SKIP_DATA to skip the data, pyexcel_io.constants.TAKE_DATA to read the data, or pyexcel_io.constants.STOP_ITERATION to exit the reading procedure.
skip_column_func: It allows you to write your own column skipping function.
The protocol is the same as for skip_row_func.
skip_empty_rows: bool, defaults to False. Set it to True if trailing empty rows are useless; note that this affects the number of rows.
row_renderer: You could write a custom row renderer, which is applied to each row as the data is being read.
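The sketch below illustrates the skipping protocol described above. It is assumption-laden: the callback is assumed here to receive the row index together with the configured start and limit (verify the exact signature against your installed pyexcel-io version), and "your_file.csv" is an illustrative file name.

import pyexcel as pe
from pyexcel_io import constants

def keep_even_rows(row_index, start, limit):
    # hypothetical filter: stop after row 100, keep the header row and even rows
    if row_index > 100:
        return constants.STOP_ITERATION
    if row_index == 0 or row_index % 2 == 0:
        return constants.TAKE_DATA
    return constants.SKIP_DATA

sheet = pe.get_sheet(file_name="your_file.csv", skip_row_func=keep_even_rows)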
pyexcel.get_book_dict
pyexcel.get_book_dict(**keywords)
Obtain a dictionary of two dimensional arrays
It accepts the same parameters as get_book() but returns a dictionary instead.
Here is a table of parameters:
source parameters
loading from file file_name, keywords
loading from string file_content, file_type, keywords
loading from stream file_stream, file_type, keywords
loading from sql session, tables
loading from django models models
loading from dictionary bookdict
loading from an url url
Where the dictionary should have text as keys and two dimensional array as values.
Parameters
file_name : a file with supported file extension
file_content : the file content
file_stream : the file stream
file_type : the file type in file_content or file_stream
session : database session
tables : a list of database table
models : a list of django models
bookdict : a dictionary of two dimensional arrays
url : a download http url for your excel file
sheets: a list of mixed sheet names and sheet indices to be read. This is done to keep Pandas compatibility.
With this parameter, more than one sheet can be read and you have the control to read the sheets of your
interest instead of all available sheets.
auto_detect_float : defaults to True
auto_detect_int : defaults to True
auto_detect_datetime : defaults to True
ignore_infinity : defaults to True
library : choose a specific pyexcel-io plugin for reading
source_library : choose a specific data source plugin for reading
parser_library : choose a pyexcel parser plugin for reading
skip_hidden_sheets: defaults to True. Set it to False to read hidden sheets
Parameters related to csv file format
for csv, fmtparams are accepted
delimiter : field separator
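A minimal sketch of get_book_dict(), assuming an illustrative multi-sheet workbook "your_workbook.xlsx" and a suitable xlsx reader plugin (e.g. pyexcel-xlsx) installed:

import pyexcel as pe

book_dict = pe.get_book_dict(file_name="your_workbook.xlsx")
for sheet_name, rows in book_dict.items():
    # each value is a two dimensional array (a list of lists)
    print(sheet_name, len(rows))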
pyexcel.get_book
pyexcel.get_book(**keywords)
Get an instance of Book from an excel source
Here is a table of parameters:
source parameters
loading from file file_name, keywords
loading from string file_content, file_type, keywords
loading from stream file_stream, file_type, keywords
loading from sql session, tables
loading from django models models
loading from dictionary bookdict
loading from an url url
Where the dictionary should have text as keys and two dimensional array as values.
Parameters
file_name : a file with supported file extension
file_content : the file content
file_stream : the file stream
file_type : the file type in file_content or file_stream
session : database session
tables : a list of database table
models : a list of django models
bookdict : a dictionary of two dimensional arrays
url : a download http url for your excel file
sheets: a list of mixed sheet names and sheet indices to be read. This is done to keep Pandas compatibility.
With this parameter, more than one sheet can be read and you have the control to read the sheets of your
interest instead of all available sheets.
auto_detect_float : defaults to True
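A minimal sketch of get_book(), here built from an in-memory bookdict so that no plugin or physical file is required:

import pyexcel as pe

content = {
    "Sheet 1": [[1, 2], [3, 4]],
    "Sheet 2": [["a", "b"], ["c", "d"]],
}
book = pe.get_book(bookdict=content)
for sheet in book:
    print(sheet.name, sheet.number_of_rows())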
pyexcel.get_sheet
pyexcel.get_sheet(**keywords)
Get an instance of Sheet from an excel source
Examples on start_row, start_column
Let’s assume the following file is a huge csv file:
>>> pe.get_sheet(file_name="your_file.csv",
... start_row=2, row_limit=3,
... start_column=1, column_limit=2)
your_file.csv:
+----+----+
| 23 | 33 |
+----+----+
| 24 | 34 |
+----+----+
| 25 | 35 |
+----+----+
>>> pe.isave_as(file_name="your_file.csv",
... row_renderer=increment_by_one,
... dest_file_name="your_file.xlsx")
>>> pe.get_sheet(file_name="your_file.xlsx")
your_file.csv:
+---+----+----+
| 2 | 22 | 32 |
+---+----+----+
| 3 | 23 | 33 |
+---+----+----+
| 4 | 24 | 34 |
+---+----+----+
| 5 | 25 | 35 |
+---+----+----+
| 6 | 26 | 36 |
+---+----+----+
| 7 | 27 | 37 |
+---+----+----+
source parameters
loading from file file_name, sheet_name, keywords
loading from string file_content, file_type, sheet_name, keywords
loading from stream file_stream, file_type, sheet_name, keywords
loading from sql session, table
loading from sql in django model
loading from query sets any query sets(sqlalchemy or django)
loading from dictionary adict, with_keys
loading from records records
loading from array array
loading from an url url
Parameters
file_name : a file with supported file extension
file_content : the file content
file_stream : the file stream
file_type : the file type in file_content or file_stream
session : database session
escapechar : A one-character string used by the writer to escape the delimiter if quoting is set to
QUOTE_NONE and the quotechar if doublequote is False.
quotechar : A one-character string used to quote fields containing special characters, such as the delimiter or
quotechar, or which contain new-line characters. It defaults to '"'
quoting : Controls when quotes should be generated by the writer and recognised by the reader. It can take on
any of the QUOTE_* constants (see section Module Contents) and defaults to QUOTE_MINIMAL.
skipinitialspace : When True, whitespace immediately following the delimiter is ignored. The default is False.
pep_0515_off : When True in python version 3.6, PEP-0515 handling is turned off. The default is False
Parameters related to xls file format: Please note the following parameters apply to pyexcel-xls. More details
can be found in xlrd.open_workbook()
logfile: An open file to which messages and diagnostics are written.
verbosity: Increases the volume of trace material written to the logfile.
use_mmap: Whether to use the mmap module is determined heuristically. Use this arg to override the result.
Current heuristic: mmap is used if it exists.
encoding_override: Used to overcome missing or bad codepage information in older-version files.
formatting_info: The default is False, which saves memory.
When True, formatting information will be read from the spreadsheet file. This provides all cells, including
empty and blank cells. Formatting information is available for each cell.
ragged_rows: The default of False means all rows are padded out with empty cells so that all rows have the
same size as found in ncols.
True means that there are no empty cells at the ends of rows. This can result in substantial memory savings
if rows are of widely varying sizes. See also the row_len() method.
pyexcel.iget_book
pyexcel.iget_book(**keywords)
Get an instance of BookStream from an excel source
First use case is to get all sheet names without extracting the sheets into memory.
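A minimal sketch of that first use case. It assumes the returned BookStream exposes sheet_names() in the same way pyexcel.Book does (treat that call as an assumption and verify it against your installed version); the file name is illustrative.

import pyexcel as pe

book_stream = pe.iget_book(file_name="your_workbook.xlsx")
print(book_stream.sheet_names())   # sheet names only; sheet data is not loaded
pe.free_resources()                # release any file handle left open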
Here is a table of parameters:
source parameters
loading from file file_name, keywords
loading from string file_content, file_type, keywords
loading from stream file_stream, file_type, keywords
loading from sql session, tables
loading from django models models
loading from dictionary bookdict
loading from an url url
Where the dictionary should have text as keys and two dimensional array as values.
Parameters
file_name : a file with supported file extension
file_content : the file content
pyexcel.iget_array
pyexcel.iget_array(**keywords)
Obtain a generator of a two dimensional array from an excel source
It is similar to pyexcel.get_array() but it has a smaller memory footprint.
Not all parameters are needed. Here is a table
source parameters
loading from file file_name, sheet_name, keywords
loading from string file_content, file_type, sheet_name, keywords
loading from stream file_stream, file_type, sheet_name, keywords
loading from sql session, table
loading from sql in django model
loading from query sets any query sets(sqlalchemy or django)
loading from dictionary adict, with_keys
loading from records records
loading from array array
loading from an url url
Parameters
file_name : a file with supported file extension
file_content : the file content
file_stream : the file stream
file_type : the file type in file_content or file_stream
session : database session
table : database table
model: a django model
adict: a dictionary of one dimensional arrays
url : a download http url for your excel file
with_keys : load with previous dictionary’s keys, default is True
records : a list of dictionaries that have the same keys
array : a two dimensional array, a list of lists
sheet_name : sheet name. if sheet_name is not given, the default sheet at index 0 is loaded
start_row: int, defaults to 0. It allows you to skip rows at the beginning.
row_limit: int, defaults to -1, meaning till the end of the whole sheet. It allows you to skip the trailing rows.
start_column: int, defaults to 0. It allows you to skip columns on your left hand side.
column_limit: int, defaults to -1, meaning till the end of the columns. It allows you to skip the trailing columns.
skip_row_func: It allows you to write your own row skipping function.
The protocol is to return pyexcel_io.constants.SKIP_DATA to skip the data, pyexcel_io.constants.TAKE_DATA to read the data, or pyexcel_io.constants.STOP_ITERATION to exit the reading procedure.
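A minimal sketch of streaming rows with iget_array(); "your_file.csv" is an illustrative file name, and free_resources() releases any file handle the generator keeps open:

import pyexcel as pe

rows = pe.iget_array(file_name="your_file.csv")
for row in rows:
    print(row)          # handle one row at a time instead of loading them all
pe.free_resources()     # release file handles left open by iget_* functions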
pyexcel.iget_records
pyexcel.iget_records(custom_headers=None, **keywords)
Obtain a generator of a list of records from an excel source
It is similar to pyexcel.get_records() but it has a smaller memory footprint. It requires the headers to be in
the first row and the data matrix to be of equal row length. It should consume less memory and should work
well with large files.
Examples on start_row, start_column
Let’s assume the following file is a huge csv file:
>>> pe.get_sheet(file_name="your_file.csv",
... start_row=2, row_limit=3,
... start_column=1, column_limit=2)
your_file.csv:
+----+----+
| 23 | 33 |
+----+----+
| 24 | 34 |
+----+----+
| 25 | 35 |
+----+----+
>>> pe.isave_as(file_name="your_file.csv",
... row_renderer=increment_by_one,
... dest_file_name="your_file.xlsx")
>>> pe.get_sheet(file_name="your_file.xlsx")
your_file.csv:
+---+----+----+
| 2 | 22 | 32 |
+---+----+----+
| 3 | 23 | 33 |
+---+----+----+
| 4 | 24 | 34 |
+---+----+----+
| 5 | 25 | 35 |
+---+----+----+
| 6 | 26 | 36 |
+---+----+----+
| 7 | 27 | 37 |
+---+----+----+
source parameters
loading from file file_name, sheet_name, keywords
loading from string file_content, file_type, sheet_name, keywords
loading from stream file_stream, file_type, sheet_name, keywords
loading from sql session, table
loading from sql in django model
loading from query sets any query sets(sqlalchemy or django)
loading from dictionary adict, with_keys
loading from records records
loading from array array
loading from an url url
Parameters
file_name : a file with supported file extension
file_content : the file content
file_stream : the file stream
file_type : the file type in file_content or file_stream
session : database session
table : database table
model: a django model
adict: a dictionary of one dimensional arrays
url : a download http url for your excel file
with_keys : load with previous dictionary’s keys, default is True
records : a list of dictionaries that have the same keys
array : a two dimensional array, a list of lists
sheet_name : sheet name. if sheet_name is not given, the default sheet at index 0 is loaded
start_row: int, defaults to 0. It allows you to skip rows at the beginning.
row_limit: int, defaults to -1, meaning till the end of the whole sheet. It allows you to skip the trailing rows.
start_column: int, defaults to 0. It allows you to skip columns on your left hand side.
column_limit: int, defaults to -1, meaning till the end of the columns. It allows you to skip the trailing columns.
skip_row_func: It allows you to write your own row skipping function.
The protocol is to return pyexcel_io.constants.SKIP_DATA to skip the data, pyexcel_io.constants.TAKE_DATA to read the data, or pyexcel_io.constants.STOP_ITERATION to exit the reading procedure.
skip_column_func: It allows you to write your own column skipping function.
The protocol is the same as for skip_row_func.
skip_empty_rows: bool, defaults to False. Set it to True if trailing empty rows are useless; note that this affects the number of rows.
row_renderer: You could write a custom row renderer, which is applied to each row as the data is being read.
auto_detect_float : defaults to True
auto_detect_int : defaults to True
auto_detect_datetime : defaults to True
ignore_infinity : defaults to True
library : choose a specific pyexcel-io plugin for reading
source_library : choose a specific data source plugin for reading
parser_library : choose a pyexcel parser plugin for reading
skip_hidden_sheets: defaults to True. Set it to False to read hidden sheets
Parameters related to csv file format
for csv, fmtparams are accepted
delimiter : field separator
lineterminator : line terminator
encoding: csv specific. Specify the file encoding of the csv file. For example: encoding='latin1'. In particular,
encoding='utf-8-sig' would add a UTF-8 BOM header when used in the renderer, or would parse a csv file with a
UTF-8 BOM header when used in the parser.
escapechar : A one-character string used by the writer to escape the delimiter if quoting is set to
QUOTE_NONE and the quotechar if doublequote is False.
quotechar : A one-character string used to quote fields containing special characters, such as the delimiter or
quotechar, or which contain new-line characters. It defaults to '"'
quoting : Controls when quotes should be generated by the writer and recognised by the reader. It can take on
any of the QUOTE_* constants (see section Module Contents) and defaults to QUOTE_MINIMAL.
skipinitialspace : When True, whitespace immediately following the delimiter is ignored. The default is False.
pep_0515_off : When True in python version 3.6, PEP-0515 handling is turned off. The default is False
Parameters related to xls file format: Please note the following parameters apply to pyexcel-xls. More details
can be found in xlrd.open_workbook()
logfile: An open file to which messages and diagnostics are written.
verbosity: Increases the volume of trace material written to the logfile.
use_mmap: Whether to use the mmap module is determined heuristically. Use this arg to override the result.
Current heuristic: mmap is used if it exists.
encoding_override: Used to overcome missing or bad codepage information in older-version files.
formatting_info: The default is False, which saves memory.
When True, formatting information will be read from the spreadsheet file. This provides all cells, including
empty and blank cells. Formatting information is available for each cell.
ragged_rows: The default of False means all rows are padded out with empty cells so that all rows have the
same size as found in ncols.
True means that there are no empty cells at the ends of rows. This can result in substantial memory savings
if rows are of widely varying sizes. See also the row_len() method.
When you use this function to work on physical files, it will leave its file handle open. When you
finish the operation on its data, you need to call pyexcel.free_resources() to close the file handle(s).
For csv and csvz file formats, file handles will be left open. For xls and ods file formats, the file is read into memory
in full and closed afterwards. For xlsx, file handles will be left open in python 2.7 - 3.5 by pyexcel-xlsx (openpyxl). In
other words, pyexcel-xls, pyexcel-ods and pyexcel-ods3 won't leak file handles.
pyexcel.free_resources
pyexcel.free_resources()
Close file handles opened by signature functions that start with 'i'
For csv and csvz file formats, file handles will be left open. For xls and ods file formats, the file is read into memory
in full and closed afterwards. For xlsx, file handles will be left open in python 2.7 - 3.5 by pyexcel-xlsx (openpyxl). In
other words, pyexcel-xls, pyexcel-ods and pyexcel-ods3 won't leak file handles.
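A minimal sketch of the pairing between an 'i' function and free_resources(). The file name and the "Name" column are illustrative; iget_records() expects the header to be in the first row.

import pyexcel as pe

records = pe.iget_records(file_name="your_file.csv")
try:
    for record in records:
        print(record["Name"])   # "Name" is an assumed header in the csv file
finally:
    pe.free_resources()         # always close the file handle that was left open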
pyexcel.save_as
pyexcel.save_as(**keywords)
Save a sheet from a data source to another one
It accepts two sets of keywords. Why two sets? One set is for the source, the other set is for the destination. In order to
distinguish the two sets, the source set is exactly the same as the one for pyexcel.get_sheet(); the destination
set is exactly the same as the one for pyexcel.Sheet.save_as but requires a 'dest' prefix.
>>> pe.get_sheet(file_name="your_file.csv",
... start_row=2, row_limit=3,
... start_column=1, column_limit=2)
your_file.csv:
+----+----+
| 23 | 33 |
+----+----+
| 24 | 34 |
+----+----+
| 25 | 35 |
+----+----+
>>> pe.isave_as(file_name="your_file.csv",
... row_renderer=increment_by_one,
... dest_file_name="your_file.xlsx")
>>> pe.get_sheet(file_name="your_file.xlsx")
your_file.csv:
+---+----+----+
| 2 | 22 | 32 |
+---+----+----+
| 3 | 23 | 33 |
+---+----+----+
| 4 | 24 | 34 |
+---+----+----+
| 5 | 25 | 35 |
+---+----+----+
| 6 | 26 | 36 |
+---+----+----+
| 7 | 27 | 37 |
+---+----+----+
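A minimal sketch of the two keyword sets: the source keywords describe where the data comes from (an in-memory array here), while the dest_-prefixed keywords describe where it goes. The output file name is illustrative, and writing xlsx assumes a plugin such as pyexcel-xlsx is installed.

import pyexcel as pe

pe.save_as(
    array=[["Name", "Age"], ["Adam", 28], ["Beatrice", 29]],
    dest_file_name="out.xlsx",
)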
source parameters
loading from file file_name, sheet_name, keywords
loading from string file_content, file_type, sheet_name, keywords
loading from stream file_stream, file_type, sheet_name, keywords
loading from sql session, table
loading from sql in django model
loading from query sets any query sets(sqlalchemy or django)
loading from dictionary adict, with_keys
loading from records records
loading from array array
loading from an url url
Parameters
file_name : a file with supported file extension
file_content : the file content
file_stream : the file stream
file_type : the file type in file_content or file_stream
session : database session
table : database table
model: a django model
adict: a dictionary of one dimensional arrays
url : a download http url for your excel file
with_keys : load with previous dictionary’s keys, default is True
records : a list of dictionaries that have the same keys
array : a two dimensional array, a list of lists
sheet_name : sheet name. if sheet_name is not given, the default sheet at index 0 is loaded
start_row: int, defaults to 0. It allows you to skip rows at the beginning.
row_limit: int, defaults to -1, meaning till the end of the whole sheet. It allows you to skip the trailing rows.
start_column: int, defaults to 0. It allows you to skip columns on your left hand side.
column_limit: int, defaults to -1, meaning till the end of the columns. It allows you to skip the trailing columns.
skip_row_func: It allows you to write your own row skipping function.
The protocol is to return pyexcel_io.constants.SKIP_DATA to skip the data, pyexcel_io.constants.TAKE_DATA to read the data, or pyexcel_io.constants.STOP_ITERATION to exit the reading procedure.
skip_column_func: It allows you to write your own column skipping function.
The protocol is the same as for skip_row_func.
skip_empty_rows: bool, defaults to False. Set it to True if trailing empty rows are useless; note that this affects the number of rows.
row_renderer: You could write a custom row renderer, which is applied to each row as the data is being read.
auto_detect_float : defaults to True
auto_detect_int : defaults to True
auto_detect_datetime : defaults to True
ignore_infinity : defaults to True
library : choose a specific pyexcel-io plugin for reading
source_library : choose a specific data source plugin for reading
parser_library : choose a pyexcel parser plugin for reading
skip_hidden_sheets: defaults to True. Set it to False to read hidden sheets
Parameters related to csv file format
for csv, fmtparams are accepted
delimiter : field separator
lineterminator : line terminator
encoding: csv specific. Specify the file encoding of the csv file. For example: encoding='latin1'. In particular,
encoding='utf-8-sig' would add a UTF-8 BOM header when used in the renderer, or would parse a csv file with a
UTF-8 BOM header when used in the parser.
escapechar : A one-character string used by the writer to escape the delimiter if quoting is set to
QUOTE_NONE and the quotechar if doublequote is False.
quotechar : A one-character string used to quote fields containing special characters, such as the delimiter or
quotechar, or which contain new-line characters. It defaults to '"'
quoting : Controls when quotes should be generated by the writer and recognised by the reader. It can take on
any of the QUOTE_* constants (see section Module Contents) and defaults to QUOTE_MINIMAL.
skipinitialspace : When True, whitespace immediately following the delimiter is ignored. The default is False.
pep_0515_off : When True in python version 3.6, PEP-0515 handling is turned off. The default is False
Parameters related to xls file format: Please note the following parameters apply to pyexcel-xls. More details
can be found in xlrd.open_workbook()
logfile: An open file to which messages and diagnostics are written.
verbosity: Increases the volume of trace material written to the logfile.
use_mmap: Whether to use the mmap module is determined heuristically. Use this arg to override the result.
Current heuristic: mmap is used if it exists.
encoding_override: Used to overcome missing or bad codepage information in older-version files.
formatting_info: The default is False, which saves memory.
When True, formatting information will be read from the spreadsheet file. This provides all cells, including
empty and blank cells. Formatting information is available for each cell.
ragged_rows: The default of False means all rows are padded out with empty cells so that all rows have the
same size as found in ncols.
True means that there are no empty cells at the ends of rows. This can result in substantial memory savings
if rows are of widely varying sizes. See also the row_len() method.
dest_file_name: another file name.
pyexcel.isave_as
pyexcel.isave_as(**keywords)
Save a sheet from a data source to another one with less memory
It is similar to pyexcel.save_as() except that it does not accept parameters for pyexcel.Sheet, and
it reads as it writes.
It accepts two sets of keywords. Why two sets? One set is for the source, the other set is for the destination. In order to
distinguish the two sets, the source set is exactly the same as the one for pyexcel.get_sheet(); the destination
set is exactly the same as the one for pyexcel.Sheet.save_as but requires a 'dest' prefix.
>>> pe.get_sheet(file_name="your_file.csv",
... start_row=2, row_limit=3,
... start_column=1, column_limit=2)
your_file.csv:
+----+----+
| 23 | 33 |
+----+----+
| 24 | 34 |
+----+----+
| 25 | 35 |
+----+----+
Let's continue from the previous example. Suppose we want to transcode "your_file.csv" to "your_file.xlsx" but
increase each element by 1.
What we can do is to define a row renderer function such as the following:
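A minimal sketch of such a row renderer, consistent with the call below (each cell increased by one):

>>> def increment_by_one(row):
...     # applied to each row while the data is streamed to the destination
...     return [element + 1 for element in row]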
>>> pe.isave_as(file_name="your_file.csv",
... row_renderer=increment_by_one,
... dest_file_name="your_file.xlsx")
>>> pe.get_sheet(file_name="your_file.xlsx")
your_file.csv:
+---+----+----+
| 2 | 22 | 32 |
+---+----+----+
| 3 | 23 | 33 |
+---+----+----+
| 4 | 24 | 34 |
+---+----+----+
| 5 | 25 | 35 |
+---+----+----+
| 6 | 26 | 36 |
+---+----+----+
| 7 | 27 | 37 |
+---+----+----+
source parameters
loading from file file_name, sheet_name, keywords
loading from string file_content, file_type, sheet_name, keywords
loading from stream file_stream, file_type, sheet_name, keywords
loading from sql session, table
loading from sql in django model
loading from query sets any query sets(sqlalchemy or django)
loading from dictionary adict, with_keys
loading from records records
loading from array array
loading from an url url
Parameters
file_name : a file with supported file extension
file_content : the file content
file_stream : the file stream
encoding: csv specific. Specify the file encoding of the csv file. For example: encoding='latin1'. In particular,
encoding='utf-8-sig' would add a UTF-8 BOM header when used in the renderer, or would parse a csv file with a
UTF-8 BOM header when used in the parser.
escapechar : A one-character string used by the writer to escape the delimiter if quoting is set to
QUOTE_NONE and the quotechar if doublequote is False.
quotechar : A one-character string used to quote fields containing special characters, such as the delimiter or
quotechar, or which contain new-line characters. It defaults to '"'
quoting : Controls when quotes should be generated by the writer and recognised by the reader. It can take on
any of the QUOTE_* constants (see section Module Contents) and defaults to QUOTE_MINIMAL.
skipinitialspace : When True, whitespace immediately following the delimiter is ignored. The default is False.
pep_0515_off : When True in python version 3.6, PEP-0515 handling is turned off. The default is False
Parameters related to xls file format: Please note the following parameters apply to pyexcel-xls. More details
can be found in xlrd.open_workbook()
logfile: An open file to which messages and diagnostics are written.
verbosity: Increases the volume of trace material written to the logfile.
use_mmap: Whether to use the mmap module is determined heuristically. Use this arg to override the result.
Current heuristic: mmap is used if it exists.
encoding_override: Used to overcome missing or bad codepage information in older-version files.
formatting_info: The default is False, which saves memory.
When True, formatting information will be read from the spreadsheet file. This provides all cells, including
empty and blank cells. Formatting information is available for each cell.
ragged_rows: The default of False means all rows are padded out with empty cells so that all rows have the
same size as found in ncols.
True means that there are no empty cells at the ends of rows. This can result in substantial memory savings
if rows are of widely varying sizes. See also the row_len() method.
dest_file_name: another file name.
dest_file_type: this is needed if you want to save to memory
dest_session: the target database session
dest_table: the target destination table
dest_model: the target django model
dest_mapdict: a mapping dictionary see pyexcel.Sheet.save_to_memory()
dest_initializer: a custom initializer function for table or model
dest_mapdict: nominate headers
dest_batch_size: object creation batch size. it is Django specific
dest_library: choose a specific pyexcel-io plugin for writing
dest_source_library: choose a specific data source plugin for writing
dest_renderer_library: choose a pyexcel parser plugin for writing
if csv file is destination format, python csv fmtparams are accepted
for example: dest_lineterminator will replace the default line terminator ('\r\n') with the one you specified
In addition, this function uses pyexcel.Sheet to render the data, which could incur a performance penalty. In
exchange, parameters for pyexcel.Sheet can be passed on, e.g. name_columns_by_row.
When you use this function to work on physical files, it will leave its file handle open. When you
finish the operation on its data, you need to call pyexcel.free_resources() to close the file handle(s).
For csv and csvz file formats, file handles will be left open. For xls and ods file formats, the file is read into memory
in full and closed afterwards. For xlsx, file handles will be left open in python 2.7 - 3.5 by pyexcel-xlsx (openpyxl). In
other words, pyexcel-xls, pyexcel-ods and pyexcel-ods3 won't leak file handles.
pyexcel.save_book_as
pyexcel.save_book_as(**keywords)
Save a book from a data source to another one
Here is a table of parameters:
source parameters
loading from file file_name, keywords
loading from string file_content, file_type, keywords
loading from stream file_stream, file_type, keywords
loading from sql session, tables
loading from django models models
loading from dictionary bookdict
loading from an url url
Where the dictionary should have text as keys and two dimensional array as values.
Parameters
file_name : a file with supported file extension
file_content : the file content
file_stream : the file stream
file_type : the file type in file_content or file_stream
session : database session
tables : a list of database table
models : a list of django models
bookdict : a dictionary of two dimensional arrays
url : a download http url for your excel file
sheets: a list of mixed sheet names and sheet indices to be read. This is done to keep Pandas compatibility.
With this parameter, more than one sheet can be read and you have the control to read the sheets of your
interest instead of all available sheets.
auto_detect_float : defaults to True
auto_detect_int : defaults to True
auto_detect_datetime : defaults to True
ignore_infinity : defaults to True
library : choose a specific pyexcel-io plugin for reading
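A minimal sketch of save_book_as(), writing an in-memory bookdict out as a multi-sheet workbook. The output file name is illustrative and assumes an xlsx writer plugin (e.g. pyexcel-xlsx) is installed.

import pyexcel as pe

content = {
    "Sheet 1": [[1, 2], [3, 4]],
    "Sheet 2": [[5, 6], [7, 8]],
}
pe.save_book_as(bookdict=content, dest_file_name="book.xlsx")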
pyexcel.isave_book_as
pyexcel.isave_book_as(**keywords)
Save a book from a data source to another one
It is similar to pyexcel.save_book_as() but it reads as it writes. This function provides some speedup,
but the output data is not made uniform.
source parameters
loading from file file_name, keywords
loading from string file_content, file_type, keywords
loading from stream file_stream, file_type, keywords
loading from sql session, tables
loading from django models models
loading from dictionary bookdict
loading from an url url
Where the dictionary should have text as keys and two dimensional array as values.
Parameters
file_name : a file with supported file extension
file_content : the file content
file_stream : the file stream
file_type : the file type in file_content or file_stream
session : database session
tables : a list of database table
models : a list of django models
bookdict : a dictionary of two dimensional arrays
url : a download http url for your excel file
sheets: a list of mixed sheet names and sheet indices to be read. This is done to keep Pandas compatibility.
With this parameter, more than one sheet can be read and you have the control to read the sheets of your
interest instead of all available sheets.
auto_detect_float : defaults to True
auto_detect_int : defaults to True
auto_detect_datetime : defaults to True
ignore_infinity : defaults to True
library : choose a specific pyexcel-io plugin for reading
source_library : choose a specific data source plugin for reading
parser_library : choose a pyexcel parser plugin for reading
skip_hidden_sheets: defaults to True. Set it to False to read hidden sheets
Parameters related to csv file format
for csv, fmtparams are accepted
delimiter : field separator
lineterminator : line terminator
encoding: csv specific. Specify the file encoding of the csv file. For example: encoding='latin1'. In particular,
encoding='utf-8-sig' would add a UTF-8 BOM header when used in the renderer, or would parse a csv file with a
UTF-8 BOM header when used in the parser.
escapechar : A one-character string used by the writer to escape the delimiter if quoting is set to
QUOTE_NONE and the quotechar if doublequote is False.
quotechar : A one-character string used to quote fields containing special characters, such as the delimiter or
quotechar, or which contain new-line characters. It defaults to '"'
quoting : Controls when quotes should be generated by the writer and recognised by the reader. It can take on
any of the QUOTE_* constants (see section Module Contents) and defaults to QUOTE_MINIMAL.
skipinitialspace : When True, whitespace immediately following the delimiter is ignored. The default is False.
pep_0515_off : When True in python version 3.6, PEP-0515 handling is turned off. The default is False
dest_file_name: another file name.
dest_file_type: this is needed if you want to save to memory
dest_session : the target database session
dest_tables : the list of target destination tables
dest_models : the list of target destination django models
dest_mapdicts : a list of mapping dictionaries
dest_initializers : table initialization functions
dest_mapdicts : to nominate a model or table fields. Optional
dest_batch_size : batch creation size. Optional
Where the dictionary should have text as keys and two dimensional array as values.
When you use this function to work on physical files, it will leave its file handle open. When you
finish the operation on its data, you need to call pyexcel.free_resources() to close the file handle(s).
For csv and csvz file formats, file handles will be left open. For xls and ods file formats, the file is read into memory
in full and closed afterwards. For xlsx, file handles will be left open in python 2.7 - 3.5 by pyexcel-xlsx (openpyxl). In
other words, pyexcel-xls, pyexcel-ods and pyexcel-ods3 won't leak file handles.
These flags can be passed to all signature functions:
auto_detect_int
Automatically convert float values to integers if the float number has no decimal part (e.g. 1.00). By default, the
detection is on. Setting it to False will turn off this behavior.
It has no effect on pyexcel-xlsx because it does that by default.
auto_detect_float
Automatically convert text to float values if possible. This applies only to pyexcel-io, where csv, tsv, csvz and tsvz formats
are supported. By default, the detection is on. Setting it to False will turn off this behavior.
auto_detect_datetime
Automatically convert text to python datetime if possible. This applies only to pyexcel-io, where csv, tsv, csvz and tsvz
formats are supported. By default, the detection is on. Setting it to False will turn off this behavior.
library
Name a pyexcel plugin to handle a file format. In a situation where multiple plugins are pip-installed, pyexcel cannot tell
which plugin should handle the file format. For example, both pyexcel-xlsx and pyexcel-xls read the xlsx
format. Since version 0.2.2, you can pass library="pyexcel-xls" to handle xlsx in a specific function call.
It is better to uninstall the unwanted pyexcel plugin using pip if two plugins for the same file type are not absolutely
necessary.
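A minimal sketch of pinning a plugin for a single call; it assumes both pyexcel-xls and pyexcel-xlsx happen to be installed, and the file name is illustrative.

import pyexcel as pe

# force this call to use pyexcel-xlsx rather than any other installed reader
sheet = pe.get_sheet(file_name="your_file.xlsx", library="pyexcel-xlsx")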
Cookbook
pyexcel.merge_csv_to_a_book
pyexcel.merge_csv_to_a_book(filelist, outfilename=’merged.xls’)
merge a list of csv files into an excel book
Parameters
• filelist (list) – a list of accessible file path
• outfilename (str) – save the sheet as
pyexcel.merge_all_to_a_book
pyexcel.merge_all_to_a_book(filelist, outfilename=’merged.xls’)
merge a list of excel files into an excel book
Parameters
• filelist (list) – a list of accessible file path
• outfilename (str) – save the sheet as
pyexcel.split_a_book
pyexcel.split_a_book(file_name, outfilename=None)
Split a file into separate sheets
Parameters
• file_name (str) – an accessible file name
pyexcel.extract_a_sheet_from_a_book
Book
Here’s the entity relationship between Book, Sheet, Row and Column
Constructor
Book([sheets, filename, path]) Read an excel book that has one or more sheets
pyexcel.Book
Methods
Attributes
Attribute
pyexcel.Book.number_of_sheets
Book.number_of_sheets()
Return the number of sheets
pyexcel.Book.sheet_names
Book.sheet_names()
Return all sheet names
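A minimal sketch of these two attributes on a book built from an in-memory bookdict:

import pyexcel as pe

book = pe.get_book(bookdict={"Sheet 1": [[1, 2]], "Sheet 2": [[3, 4]]})
print(book.number_of_sheets())   # 2
print(book.sheet_names())        # ['Sheet 1', 'Sheet 2']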
Conversions
pyexcel.Book.bookdict
Book.bookdict
Get/Set data in/from bookdict format
You could obtain content in bookdict format by dot notation:
Book.bookdict
Book.bookdict = the_io_stream_in_bookdict_format
Book.get_bookdict(**keywords)
Book.set_bookdict(the_io_stream_in_bookdict_format, **keywords)
pyexcel.Book.url
Book.url
Set data in url format
You could set content in url format by dot notation:
Book.url
Book.set_url(the_io_stream_in_url_format, **keywords)
pyexcel.Book.csv
Book.csv
Get/Set data in/from csv format
You could obtain content in csv format by dot notation:
Book.csv
Book.csv = the_io_stream_in_csv_format
Book.get_csv(**keywords)
Book.set_csv(the_io_stream_in_csv_format, **keywords)
pyexcel.Book.tsv
Book.tsv
Get/Set data in/from tsv format
You could obtain content in tsv format by dot notation:
Book.tsv
Book.tsv = the_io_stream_in_tsv_format
Book.get_tsv(**keywords)
Book.set_tsv(the_io_stream_in_tsv_format, **keywords)
pyexcel.Book.csvz
Book.csvz
Get/Set data in/from csvz format
You could obtain content in csvz format by dot notation:
Book.csvz
Book.csvz = the_io_stream_in_csvz_format
Book.get_csvz(**keywords)
Book.set_csvz(the_io_stream_in_csvz_format, **keywords)
pyexcel.Book.tsvz
Book.tsvz
Get/Set data in/from tsvz format
You could obtain content in tsvz format by dot notation:
Book.tsvz
Book.tsvz = the_io_stream_in_tsvz_format
Book.get_tsvz(**keywords)
Book.set_tsvz(the_io_stream_in_tsvz_format, **keywords)
pyexcel.Book.xls
Book.xls
Get/Set data in/from xls format
You could obtain content in xls format by dot notation:
Book.xls
Book.xls = the_io_stream_in_xls_format
Book.get_xls(**keywords)
Book.set_xls(the_io_stream_in_xls_format, **keywords)
pyexcel.Book.xlsm
Book.xlsm
Get/Set data in/from xlsm format
You could obtain content in xlsm format by dot notation:
Book.xlsm
Book.xlsm = the_io_stream_in_xlsm_format
Book.get_xlsm(**keywords)
Book.set_xlsm(the_io_stream_in_xlsm_format, **keywords)
pyexcel.Book.xlsx
Book.xlsx
Get/Set data in/from xlsx format
You could obtain content in xlsx format by dot notation:
Book.xlsx
Book.xlsx = the_io_stream_in_xlsx_format
Book.get_xlsx(**keywords)
Book.set_xlsx(the_io_stream_in_xlsx_format, **keywords)
pyexcel.Book.ods
Book.ods
Get/Set data in/from ods format
You could obtain content in ods format by dot notation:
Book.ods
Book.ods = the_io_stream_in_ods_format
Book.get_ods(**keywords)
Book.set_ods(the_io_stream_in_ods_format, **keywords)
pyexcel.Book.stream
Book.stream
Return a stream in which the content is properly encoded
Example:
Where b.stream.xls.getvalue() is equivalent to b.xls. In some situations b.stream.xls is preferred over b.xls.
Sheet examples:
Where s.stream.xls.getvalue() is equivalent to s.xls. In some situations s.stream.xls is preferred over s.xls.
It is similar to save_to_memory().
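A minimal sketch of the stream attribute, assuming a plugin capable of writing xls (e.g. pyexcel-xls) is installed; "out.xls" is an illustrative file name.

import pyexcel as pe

book = pe.get_book(bookdict={"Sheet 1": [[1, 2], [3, 4]]})
xls_stream = book.stream.xls              # an in-memory, file-like object
with open("out.xls", "wb") as target:
    target.write(xls_stream.getvalue())   # same bytes as book.xls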
Save changes
pyexcel.Book.save_as
Book.save_as(filename, **keywords)
Save the content to a new file
Keywords may vary depending on your file type, because the associated file type employs a different library.
PARAMETERS
filename: a file path
library: choose a specific pyexcel-io plugin for writing
renderer_library: choose a pyexcel parser plugin for writing
Parameters related to csv file format
for csv, fmtparams are accepted
delimiter : field separator
lineterminator : line terminator
encoding: csv specific. Specify the file encoding of the csv file. For example: encoding='latin1'. In particular,
encoding='utf-8-sig' would add a UTF-8 BOM header when used in the renderer, or would parse a csv file with a
UTF-8 BOM header when used in the parser.
pyexcel.Book.save_to_memory
pyexcel.Book.save_to_database
pyexcel.Book.save_to_django_models
• initializers – a list of initialization functions for your tables; the sequence should
match tables
• mapdicts – custom map dictionaries for your data columns; the sequence should match
tables
Optional parameters:
• batch_size – django bulk_create batch size
• bulk_save – whether to use bulk_create or to use a single save per record
Sheet
Constructor
Sheet([sheet, name, name_columns_by_row, . . . ]) Two dimensional data container for filtering, formatting
and iteration
pyexcel.Sheet
Methods
Attributes
Attributes
pyexcel.Sheet.content
Sheet.content
Plain representation without headers
pyexcel.Sheet.number_of_rows
Sheet.number_of_rows()
The number of rows
pyexcel.Sheet.number_of_columns
Sheet.number_of_columns()
The number of columns
pyexcel.Sheet.row_range
Sheet.row_range()
Utility function to get row range
pyexcel.Sheet.column_range
Sheet.column_range()
Utility function to get column range
Cell access
pyexcel.Sheet.cell_value
pyexcel.Sheet.__getitem__
Sheet.__getitem__(aset)
By default, this class recognizes indices from top to bottom and from left to right
Row access
pyexcel.Sheet.row_at
Sheet.row_at(index)
Gets the data at the specified row
pyexcel.Sheet.set_row_at
Sheet.set_row_at(row_index, data_array)
Update a row data range
pyexcel.Sheet.delete_rows
Sheet.delete_rows(row_indices)
Delete one or more rows
Parameters row_indices (list) – a list of row indices
pyexcel.Sheet.extend_rows
Sheet.extend_rows(rows)
Take an ordereddict to extend named rows
Parameters rows (ordereddict/list) – a list of rows.
Column access
pyexcel.Sheet.column_at
Sheet.column_at(index)
Gets the data at the specified column
pyexcel.Sheet.set_column_at
Sheet.set_column_at(column_index, data_array, starting=0)
Update a column data range. The diagram below shows the effect of set_column_at(2, ['N', 'N'], 1):
+--> column_index = 2
|
A B C
1 3 N <- starting = 1
2 4 N
This function will not set elements outside the current table range
Parameters
• column_index (int) – which column to be modified
• data_array (list) – one dimensional array
• starting (int) – from which index, the update happens
Raises IndexError – if column_index exceeds column range or starting exceeds row range
pyexcel.Sheet.delete_columns
Sheet.delete_columns(column_indices)
Delete one or more columns
Parameters column_indices (list) – a list of column indices
pyexcel.Sheet.extend_columns
Sheet.extend_columns(columns)
Take ordereddict to extend named columns
Parameters columns (ordereddict/list) – a list of columns
Data series
pyexcel.Sheet.name_columns_by_row
Sheet.name_columns_by_row(row_index)
Use the elements of a specified row to represent individual columns
The specified row will be deleted from the data.
Parameters row_index – the index of the row that has the column names
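A minimal sketch of naming columns by the first row and then reading a column back by name:

import pyexcel as pe

sheet = pe.Sheet([["Name", "Age"], ["Adam", 28], ["Beatrice", 29]])
sheet.name_columns_by_row(0)      # the header row is removed from the data
print(sheet.colnames)             # ['Name', 'Age']
print(sheet.column["Age"])        # [28, 29]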
pyexcel.Sheet.rownames
Sheet.rownames
Return row names if any
pyexcel.Sheet.named_column_at
Sheet.named_column_at(name)
Get a column by its name
pyexcel.Sheet.set_named_column_at
Sheet.set_named_column_at(name, column_array)
Take the first row as column names
Given name to identify the column index, set the column to the given array except the column name.
pyexcel.Sheet.delete_named_column_at
Sheet.delete_named_column_at(name)
Works only after you have named columns by a row
Given a name to identify the column index, delete that column from the data.
Parameters name (str) – a column name
pyexcel.Sheet.name_rows_by_column
Sheet.name_rows_by_column(column_index)
Use the elements of a specified column to represent individual rows
The specified column will be deleted from the data.
Parameters column_index – the index of the column that has the row names
pyexcel.Sheet.colnames
Sheet.colnames
Return column names if any
pyexcel.Sheet.named_row_at
Sheet.named_row_at(name)
Get a row by its name
pyexcel.Sheet.set_named_row_at
Sheet.set_named_row_at(name, row_array)
Take the first column as row names
Given name to identify the row index, set the row to the given array except the row name.
pyexcel.Sheet.delete_named_row_at
Sheet.delete_named_row_at(name)
Works only after you have named rows by a column
Given a name to identify the row index, delete that row from the data.
Conversion
pyexcel.Sheet.array
Sheet.array
Get/Set data in/from array format
You could obtain content in array format by dot notation:
Sheet.array
Sheet.array = the_io_stream_in_array_format
Sheet.get_array(**keywords)
Sheet.set_array(the_io_stream_in_array_format, **keywords)
pyexcel.Sheet.records
Sheet.records
Get/Set data in/from records format
You could obtain content in records format by dot notation:
Sheet.records
Sheet.records = the_io_stream_in_records_format
Sheet.get_records(**keywords)
Sheet.set_records(the_io_stream_in_records_format, **keywords)
pyexcel.Sheet.dict
Sheet.dict
Get/Set data in/from dict format
You could obtain content in dict format by dot notation:
Sheet.dict
Sheet.dict = the_io_stream_in_dict_format
Sheet.get_dict(**keywords)
Sheet.set_dict(the_io_stream_in_dict_format, **keywords)
pyexcel.Sheet.url
Sheet.url
Set data in url format
You could set content in url format by dot notation:
Sheet.url
Sheet.set_url(the_io_stream_in_url_format, **keywords)
pyexcel.Sheet.csv
Sheet.csv
Get/Set data in/from csv format
You could obtain content in csv format by dot notation:
Sheet.csv
Sheet.csv = the_io_stream_in_csv_format
Sheet.get_csv(**keywords)
Sheet.set_csv(the_io_stream_in_csv_format, **keywords)
pyexcel.Sheet.tsv
Sheet.tsv
Get/Set data in/from tsv format
You could obtain content in tsv format by dot notation:
Sheet.tsv
Sheet.tsv = the_io_stream_in_tsv_format
Sheet.get_tsv(**keywords)
Sheet.set_tsv(the_io_stream_in_tsv_format, **keywords)
pyexcel.Sheet.csvz
Sheet.csvz
Get/Set data in/from csvz format
You could obtain content in csvz format by dot notation:
Sheet.csvz
Sheet.csvz = the_io_stream_in_csvz_format
Sheet.get_csvz(**keywords)
Sheet.set_csvz(the_io_stream_in_csvz_format, **keywords)
pyexcel.Sheet.tsvz
Sheet.tsvz
Get/Set data in/from tsvz format
You could obtain content in tsvz format by dot notation:
Sheet.tsvz
Sheet.tsvz = the_io_stream_in_tsvz_format
Sheet.get_tsvz(**keywords)
Sheet.set_tsvz(the_io_stream_in_tsvz_format, **keywords)
pyexcel.Sheet.xls
Sheet.xls
Get/Set data in/from xls format
You could obtain content in xls format by dot notation:
Sheet.xls
Sheet.xls = the_io_stream_in_xls_format
Sheet.get_xls(**keywords)
Sheet.set_xls(the_io_stream_in_xls_format, **keywords)
pyexcel.Sheet.xlsm
Sheet.xlsm
Get/Set data in/from xlsm format
You could obtain content in xlsm format by dot notation:
Sheet.xlsm
Sheet.xlsm = the_io_stream_in_xlsm_format
Sheet.get_xlsm(**keywords)
Sheet.set_xlsm(the_io_stream_in_xlsm_format, **keywords)
pyexcel.Sheet.xlsx
Sheet.xlsx
Get/Set data in/from xlsx format
You could obtain content in xlsx format by dot notation:
Sheet.xlsx
Sheet.xlsx = the_io_stream_in_xlsx_format
Sheet.get_xlsx(**keywords)
Sheet.set_xlsx(the_io_stream_in_xlsx_format, **keywords)
pyexcel.Sheet.ods
Sheet.ods
Get/Set data in/from ods format
You could obtain content in ods format by dot notation:
Sheet.ods
Sheet.ods = the_io_stream_in_ods_format
Sheet.get_ods(**keywords)
Sheet.set_ods(the_io_stream_in_ods_format, **keywords)
pyexcel.Sheet.stream
Sheet.stream
Return a stream in which the content is properly encoded
Example:
Where b.stream.xls.getvalue() is equivalent to b.xls. In some situations b.stream.xls is preferred over b.xls.
Sheet examples:
Where s.stream.xls.getvalue() is equivalent to s.xls. In some situations s.stream.xls is preferred over s.xls.
It is similar to save_to_memory().
Formatting
pyexcel.Sheet.format
Sheet.format(formatter)
Apply a formatting action for the whole sheet
Example:
Filtering
pyexcel.Sheet.filter
Sheet.filter(column_indices=None, row_indices=None)
Apply the filter with immediate effect
Transformation
pyexcel.Sheet.project
Sheet.project(new_ordered_columns, exclusion=False)
Rearrange the sheet.
Variables
• new_ordered_columns – new columns
• exclusion – whether to exclude the named columns; defaults to False
Example:
pyexcel.Sheet.transpose
Sheet.transpose()
Rotate the data table by 90 degrees
Reference transpose()
pyexcel.Sheet.map
Sheet.map(custom_function)
Execute a function across all cells of the sheet
Example:
>>> import pyexcel as pe
>>> # Given a dictionary as the following
>>> data = {
... "1": [1, 2, 3, 4, 5, 6, 7, 8],
... "3": [1.25, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8],
... "5": [2, 3, 4, 5, 6, 7, 8, 9],
... "7": [1, '',]
... }
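A standalone minimal sketch of Sheet.map(), independent of the dictionary shown above; the helper function is illustrative, not part of the original example.

import pyexcel as pe

sheet = pe.Sheet([[1, 2, 3], [4, 5, 6]])

def increment(value):
    # only touch numeric cells; leave anything else untouched
    if isinstance(value, (int, float)):
        return value + 1
    return value

sheet.map(increment)
print(sheet.array)   # [[2, 3, 4], [5, 6, 7]]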
pyexcel.Sheet.region
Sheet.region(topleft_corner, bottomright_corner)
Get a rectangle shaped data out
Parameters
• topleft_corner (slice) – the top left corner of the rectangle
• bottomright_corner (slice) – the bottom right corner of the rectangle
pyexcel.Sheet.cut
Sheet.cut(topleft_corner, bottomright_corner)
Get a rectangle shaped data out and clear them in position
Parameters
• topleft_corner (slice) – the top left corner of the rectangle
• bottomright_corner (slice) – the bottom right corner of the rectangle
pyexcel.Sheet.paste
Save changes
pyexcel.Sheet.save_as
Sheet.save_as(filename, **keywords)
Save the content to a named file
Keywords may vary depending on your file type, because the associated file type employs a different library.
PARAMETERS
filename: a file path
library: choose a specific pyexcel-io plugin for writing
renderer_library: choose a pyexcel parser plugin for writing
Parameters related to csv file format
for csv, fmtparams are accepted
delimiter : field separator
lineterminator : line terminator
encoding: csv specific. Specify the file encoding of the csv file. For example: encoding='latin1'. In particular,
encoding='utf-8-sig' would add a UTF-8 BOM header when used in the renderer, or would parse a csv file with a
UTF-8 BOM header when used in the parser.
escapechar : A one-character string used by the writer to escape the delimiter if quoting is set to
QUOTE_NONE and the quotechar if doublequote is False.
quotechar : A one-character string used to quote fields containing special characters, such as the delimiter or
quotechar, or which contain new-line characters. It defaults to '"'
quoting : Controls when quotes should be generated by the writer and recognised by the reader. It can take on
any of the QUOTE_* constants (see section Module Contents) and defaults to QUOTE_MINIMAL.
skipinitialspace : When True, whitespace immediately following the delimiter is ignored. The default is False.
pep_0515_off : When True in python version 3.6, PEP-0515 handling is turned off. The default is False
pyexcel.Sheet.save_to_memory
pyexcel.Sheet.save_to_database
pyexcel.Sheet.save_to_django_model
pyexcel.internal.sheets.Matrix
class pyexcel.internal.sheets.Matrix(array)
The internal representation of a sheet's data. Each element can be of any python type
__init__(array)
Constructor
The reason a deep copy was not made here is that the data sheet could be huge. It could be costly to
copy every cell to a new memory area.
Parameters array (list) – a list of arrays
Methods
__init__(array) Constructor
cell_value(row, column[, new_value]) Random access to table cells
clone()
column_at(index) Gets the data at the specified column
column_range() Utility function to get column range
columns() Returns a left to right column iterator
contains(predicate) Has something in the table
cut(topleft_corner, bottomright_corner) Get a rectangle shaped data out and clear them in
position
delete_columns(column_indices) Delete columns by specified list of indices
delete_rows(row_indices) Deletes specified row indices
enumerate() Iterate cell by cell from top to bottom and from left
to right
Attributes
pyexcel.internal.generators.SheetStream
If you would like to do custom rendering for each row of the two dimensional data, you would need to pass a
row formatting/rendering function to the parameter "row_renderer" of pyexcel's signature functions.
__init__(name, payload)
Initialize self. See help(type(self)) for accurate signature.
Methods
Attributes
pyexcel.internal.generators.BookStream
Methods
Row representation
pyexcel.internal.sheets.Row
class pyexcel.internal.sheets.Row(matrix)
Represent row of a matrix
The column manipulation above can be performed on rows similarly. This section will not repeat the same examples
but shows some advanced usages.
__init__(matrix)
Initialize self. See help(type(self)) for accurate signature.
Methods
Column representation
pyexcel.internal.sheets.Column
class pyexcel.internal.sheets.Column(matrix)
Represent columns of a matrix
__init__(matrix)
Initialize self. See help(type(self)) for accurate signature.
Methods
Note: As to rnd_requirements.txt, usually it is created when a dependent library has not been released. Once the
dependency is released and installed, the version of the dependency pinned in requirements.txt becomes valid.
Although nose and doctest are both used in code testing, it is advisable that unit tests are put in tests. doctest is
incorporated only to make sure the code examples in the documentation remain valid across different development releases.
On Linux/Unix systems, please launch your tests like this:
$ make
On Windows systems, please run:
> test.bat
Please run:
$ make format
so as to beautify your code; otherwise your build may fail the unit tests.
When developing source plugins, it becomes necessary to have log trace available. It helps find out what goes wrong
quickly.
The basic step would be to set up logging before pyexcel import statement.
import logging
import logging.config
logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
level=logging.DEBUG)
import pyexcel
And if you would use a complex configuration, you can use the following code.
import logging
import logging.config
logging.config.fileConfig('log.conf')
import pyexcel
[loggers]
keys=root, sources, renderers
[handlers]
keys=consoleHandler
[formatters]
keys=custom
[logger_root]
[logger_sources]
level=DEBUG
handlers=consoleHandler
qualname=pyexcel.sources.factory
propagate=0
[logger_renderers]
level=DEBUG
handlers=consoleHandler
qualname=pyexcel.renderers.factory
propagate=0
[handler_consoleHandler]
class=StreamHandler
level=DEBUG
formatter=custom
args=(sys.stdout,)
[formatter_custom]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s
datefmt=
Disable logging
In unit testing and in the django framework, you will find `lml` logging output even if you have not configured it.
To disable it, use:
import logging
logging.getLogger('lml.plugin').propagate = False
With pyexcel v0.5.0, the way to package it has been changed because it uses lml for all plugins.
And you need to do the same for pyexcel-io plugins too.
There are three types of plugins for pyexcel: data parser, data renderer and data source.
Tutorial
setupmobans pyexcel-commons
export YEHUA_FILE=$YOUR_WORK_DIRECTORY/pyexcel-commons/yehua/yehua.yml
Step 1
$ yehua
Yehua will walk you through creating a pyexcel package.
Press ^C to quit at any time.
Step 2
$ cd pyexcel-pdfr/
$ ln -s ../pyexcel-commons/ commons
$ ln -s ../setupmobans/ setupmobans
$ moban
Templating README.rst to README.rst
Templating setup.py to setup.py
Templating requirements.txt to requirements.txt
Templating NEW_BSD_LICENSE.jj2 to LICENSE
Templating MANIFEST.in.jj2 to MANIFEST.in
Templating tests/requirements.txt to tests/requirements.txt
Templating test.script.jj2 to test.sh
Templating test.script.jj2 to test.bat
Templating travis.yml.jj2 to .travis.yml
Templating gitignore.jj2 to .gitignore
Templating docs/source/conf.py.jj2 to docs/source/conf.py
Step 3 - Coding
import pyexcel.ext.ods
import pyexcel.ext.ods3
import pyexcel.ext.text
import pyexcel.ext.xls
import pyexcel.ext.xlsx
sheet_a and sheet_b will no longer have access to the data of sheet. book will no longer have access to the data of
sheet_a and sheet_b.
Under Hyrum's Law, this enhancement in 0.6.0 may cause breakage for code that relied on the previous behavior.
get_{{file_type}}_stream functions from pyexcel.Sheet and pyexcel.Book were introduced since 0.4.3 but
were removed since 0.4.4. Please be advised to use save_to_memory functions, Sheet.io.{{file_type}} or
Book.io.{{file_type}}.
Filtering and formatting behavior of pyexcel.Sheet are simplified. Soft filter and soft formatter are removed.
Extra classes such as iterator, formatter, filter are removed.
Most of formatting tasks could be achieved using format() and map(). and Filtering with filter(). Formatting
and filtering on row and/or column can be found with row() and column()
sheet.filter(pe.OddRowFilter())
They are no longer needed. As long as you have pip-installed them, they will be auto-loaded. However, if you do not
want some of the plugins, please use pip to uninstall them.
What if you keep your code as it is? No harm, but a few warnings are shown:
Deprecated usage since v0.2.2! Explicit import is no longer required. pyexcel.ext.ods is auto imported.
As in Issue 20, pyexcel-xls was used for xls and pyexcel-xlsx had to be used for xlsx, and both must co-exist due to the project's requirements. That workaround fails once auto-import is enabled in v0.2.2. Hence, users of pyexcel in this situation should pass the 'library' parameter to the signature functions to instruct pyexcel which named library to use for each call, as in the sketch below.
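For example, a hedged sketch assuming both pyexcel-xls and pyexcel-xlsx are installed; the file names are illustrative.
import pyexcel as pe

# force the xls plugin for the xls file and the xlsx plugin for the xlsx file
xls_rows = pe.get_array(file_name="report.xls", library="pyexcel-xls")
xlsx_rows = pe.get_array(file_name="report.xlsx", library="pyexcel-xlsx")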
pyexcel.get_io was passed on from pyexcel-io, but it is no longer exposed. Please use pyexcel_io.manager.RWManager.get_io if you really have to.
You are most likely to reach for pyexcel.get_io when calling pyexcel.Sheet.save_to_memory() or pyexcel.Book.save_to_memory(), where you need to pass in an io stream. With the latest code, however, you can simply pass in None.
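For example, a minimal sketch assuming a recent release; pyexcel allocates the stream internally when None is passed, and the sample data is illustrative.
import pyexcel

sheet = pyexcel.Sheet([[1, 2], [3, 4]])
stream = sheet.save_to_memory("csv", None)  # no need for pyexcel.get_io any more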
w = pyexcel.Writer("afile.csv")
data=[['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 1.1, 1]]
w.write_array(data)
w.close()
>>> data=[['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 1.1, 1]]
>>> pyexcel.save_as(array=data, dest_file_name="afile.csv")
content = {
"X": [1,2,3,4,5],
"Y": [6,7,8,9,10],
"Z": [11,12,13,14,15],
}
w = pyexcel.Writer("afile.csv")
w.write_dict(content)
w.close()
>>> content = {
... "X": [1,2,3,4,5],
... "Y": [6,7,8,9,10],
... "Z": [11,12,13,14,15],
... }
>>> pyexcel.save_as(adict=content, dest_file_name="afile.csv")
data = [
[1, 2, 3],
[4, 5, 6]
]
from io import StringIO
io = StringIO()
w = pyexcel.Writer(("csv",io))
w.write_rows(data)
w.close()
>>> data = [
... [1, 2, 3],
... [4, 5, 6]
... ]
>>> io = pyexcel.save_as(dest_file_type='csv', array=data)
>>> for line in io.readlines():
... print(line.rstrip())
1,2,3
4,5,6
import pyexcel
content = {
"Sheet1": [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
"Sheet2": [[4, 4, 4, 4], [5, 5, 5, 5], [6, 6, 6, 6]],
"Sheet3": [[u'X', u'Y', u'Z'], [1, 4, 7], [2, 5, 8], [3, 6, 9]]
}
w = pyexcel.BookWriter("afile.csv")
w.write_book_from_dict(content)
w.close()
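The current equivalent is save_book_as with a bookdict. This is a hedged sketch assuming a multi-sheet capable plugin such as pyexcel-xls is installed; the output file name is illustrative.
>>> pyexcel.save_book_as(bookdict=content, dest_file_name="afile.xls")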
0.7.0 - 12.2.2022
Fixed
1. #250: RecursionError raised on deepcopy of a sheet
Updated
1. #255: pyexcel.get_array documentation page seems to be a copy of pyexcel.get_sheet
Removed
1. #249: drop the support for dummy import statements pyexcel.ext.*
0.6.7 - 12.09.2021
Updated
1. #243: fix small typo.
2. add chardet as explicit dependency
0.6.6 - 14.11.2020
Updated
1. #233: dynamically resize the table matrix on set_value. sheet[‘AA1’] = ‘test’ will work in this release.
0.6.5 - 8.10.2020
Updated
1. update queryset source to work with pyexcel-io 0.6.0
0.6.4 - 18.08.2020
Updated
1. #219: book created from dict no longer discards order.
0.6.3 - 01.08.2020
fixed
1. #214: remove leading and trailing whitespace for column names
removed
1. python 2 compatibility has been permanently removed.
0.6.2 - 8.06.2020
fixed
1. #109: Control the column order when writing the data output
0.6.1 - 02.05.2020
fixed
1. #203: texttable was dropped as a compulsory dependency in 0.6.0. End users may notice this when a sheet/table is printed in a shell; otherwise, new users of pyexcel won't see it. As of the release date, no issues had been created.
0.6.0 - 21.04.2020
updated
1. #199: += operates in place; = + returns a new instance
2. #195: documentation update; contributions, however small, are welcome
removed
1. Dropped test support for python versions lower than 3.6. v0.6.0 should work with python 2.7 but this is not guaranteed. Please upgrade to python 3.6+.
0.5.15 - 07.07.2019
updated
1. #185: fix a bug with http data source. The real fix lies in pyexcel-io v0.5.19. this release just put the version
requirement in.
0.5.14 - 12.06.2019
updated
1. #182: support dest_force_file_type on save_as and save_book_as
0.5.13 - 12.03.2019
updated
1. #176: get_sheet {IndexError}list index out of range // XLSX can’t be opened
0.5.12 - 25.02.2019
updated
1. #174: include examples in tarball
0.5.11 - 22.02.2019
updated
1. #169: remove pyexcel-handsontable in test
2. add tests, and docs folder in distribution
0.5.10 - 3.12.2018
updated
1. #157: Please use scan_plugins_regex, which lml 0.7 complains about
2. updated dependency on pyexcel-io to 0.5.11
0.5.9.1 - 30.08.2018
updated
1. to require pyexcel-io 0.5.9.1 and use lml at least version 0.0.2
0.5.9 - 30.08.2018
added
1. support __len__. len(book) returns the number of sheets and len(sheet) returns the number of rows
2. #144: memory-efficient way to read sheet names.
3. #148: force_file_type is introduced. When reading a file on disk, this parameter allows you to choose a reader, i.e. the csv reader for a text file, or the xlsx reader for an xlsx file that has a .blob file suffix.
4. finally, pyexcel gained pyexcel.__version__
updated
1. Sheet.to_records() returns a generator now, saving memory
2. #115, Fix set membership test to run faster in python2
3. #140, Direct writes to cells yield weird results
0.5.8 - 26.03.2018
added
1. #125, sort book sheets
updated
1. #126, dest_sheet_name in save_as will set the sheet name in the output
2. #115, Fix set membership test to run faster in python2
0.5.7 - 11.01.2018
added
1. pyexcel-io#46, expose bulk_save to developer.
0.5.6 - 23.10.2017
removed
1. #105, remove gease from setup_requires, introduced by 0.5.5.
2. removed testing against python 2.6
3. #103, include LICENSE file in MANIFEST.in, meaning LICENSE file will appear in the released tar ball.
0.5.5 - 20.10.2017
removed
1. #105, remove gease from setup_requires, introduced by 0.5.5.
2. removed testing against python 2.6
3. #103, include LICENSE file in MANIFEST.in, meaning LICENSE file will appear in the released tar ball.
0.5.4 - 27.09.2017
fixed
1. #100, Sheet.to_dict() gets out of range error because there is only one row.
updated
1. Updated the baseline of pyexcel-io to 0.5.1.
0.5.3 - 01-08-2017
added
1. #95, respect the order of records in iget_records, isave_as and save_as.
2. #97, new feature to allow intuitive initialization of pyexcel.Book.
0.5.2 - 26-07-2017
Updated
1. embedded the enabler for pyexcel-htmlr. The http source does not support text/html as mime type.
0.5.1 - 12.06.2017
Updated
1. support saving SheetStream and BookStream to database targets. This is needed for pyexcel-webio and its
downstream projects.
0.5.0 - 19.06.2017
Added
1. Sheet.top() and Sheet.top_left() for data browsing
2. add html as the default rich display in Jupyter notebook when pyexcel-text and pyexcel-chart are installed
3. add svg as the default rich display in Jupyter notebook when pyexcel-chart and one of its implementation plugins (pyexcel-pygal, etc.) are installed
4. new dictionary source supported: a dictionary of key value pair could be read into a sheet.
5. added dynamic external plugin loading, meaning that if a pyexcel plugin is installed, it will be loaded implicitly. This change also removes unnecessary info logs for those who do not use pyexcel-text and pyexcel-gal
6. save_book_as before 0.5.0 becomes isave_book_as, and save_book_as in 0.5.0 converts BookStream to Book before saving.
7. #83, the file closing mechanism is enforced. free_resource is added and should be called when iget_array, iget_records, isave_as and/or isave_book_as are used.
Updated
1. the array is passed to pyexcel.Sheet by reference, which means your array data will be modified.
Removed
1. pyexcel.Writer and pyexcel.BookWriter were removed
2. pyexcel.load_book_from_sql and pyexcel.load_from_sql were removed
3. pyexcel.deprecated.load_from_query_sets, pyexcel.deprecated.load_book_from_django_models and pyexcel.deprecated.load_from_django_model were removed
4. Removed plugin loading code and lml is used instead
0.4.5 - 17.03.2017
Updated
1. #80: remove pyexcel-chart import from v0.4.x
0.4.4 - 06.02.2017
Updated
1. #68: regression save_to_memory() should have returned a stream instance which has been reset to zero if
possible. The exception is sys.stdout, which cannot be reset.
2. #74: Not able to handle decimal.Decimal
Removed
1. remove get_{{file_type}}_stream functions from pyexcel.Sheet and pyexcel.Book introduced since 0.4.3.
0.4.3 - 26.01.2017
Added
1. a ‘.stream’ attribute is attached to ~pyexcel.Sheet and ~pyexcel.Book to give direct access to the underlying stream corresponding to the file type attributes, such as sheet.xls. It helps provide a custom stream to the external world; for example, Sheet.stream.csv gives a text stream that contains csv formatted data, and Book.stream.xls returns xls format data in a byte stream.
Updated
1. Better error reporting when unknown parameters or unsupported file types are given to the signature functions.
0.4.2 - 17.01.2017
Updated
1. Raise an exception if the incoming sheet does not have column names. In other words, only a sheet with column names can be saved to a database; a sheet with row names cannot. The alternative is to transpose the sheet, call name_columns_by_row and then save.
2. fix iget_records when non-uniform content is given, e.g. [["x", "y"], [1, 2], [3]]: some records would become non-uniform, e.g. key 'y' would be missing from the second record.
3. skip_empty_rows is applicable when saving a python data structure to another data source. For example, if your array contains a row consisting of empty strings, such as ['', '', '' . . . ''], please specify skip_empty_rows=False in order to preserve it. This becomes subtle when you try to save a python dictionary, where empty rows are not easy to spot.
4. #69: better documentation for save_book_as.
0.4.1 - 23.12.2016
Updated
1. #68: regression save_to_memory() should have returned a stream instance.
0.4.0 - 22.12.2016
Added
1. Flask-Excel#19 allow sheet_name parameter
2. pyexcel-xls#11 case-insensitive for file_type. xls and XLS are treated in the same way
Updated
1. #66: export_columns is ignored
2. Update dependency on pyexcel-io v0.3.0
0.3.3 - 07.11.2016
Updated
1. #63: cannot display an empty sheet (hence a book with an empty sheet) as texttable
0.3.2 - 02.11.2016
Updated
1. #62: optional module import errors become visible.
0.3.0 - 28.10.2016
Added:
1. file type setters for Sheet and Book, and its documentation
2. iget_records returns a generator of records and should have better memory performance, especially when dealing with large csv files.
3. iget_array returns a generator over a two dimensional array and should have better memory performance, especially when dealing with large csv files.
4. Enable pagination support, and custom row renderer via pyexcel-io v0.2.3
Updated
1. Take isave_as out of save_as. Hence there are now two functions for saving a sheet
2. #60: encode ‘utf-8’ if the console is of ascii encoding.
3. #59: custom row renderer
4. #56: set cell value does not work
5. pyexcel.transpose becomes pyexcel.sheets.transpose
6. iterator functions of pyexcel.Sheet were converted to generator functions
• pyexcel.Sheet.enumerate()
• pyexcel.Sheet.reverse()
• pyexcel.Sheet.vertical()
• pyexcel.Sheet.rvertical()
• pyexcel.Sheet.rows()
• pyexcel.Sheet.rrows()
• pyexcel.Sheet.columns()
• pyexcel.Sheet.rcolumns()
• pyexcel.Sheet.named_rows()
• pyexcel.Sheet.named_columns()
7. ~pyexcel.Sheet.save_to_memory and ~pyexcel.Book.save_to_memory return the actual content. They no longer return an io object, hence you cannot call getvalue() on them.
Removed:
1. content and out_file as function parameters to the signature functions are no longer supported.
2. SourceFactory and RendererFactory are removed
3. The following methods are removed
• pyexcel.to_array
• pyexcel.to_dict
• pyexcel.utils.to_one_dimensional_array
• pyexcel.dict_to_array
• pyexcel.from_records
• pyexcel.to_records
4. pyexcel.Sheet.filter has been re-implemented and all filters were removed:
• pyexcel.filters.ColumnIndexFilter
• pyexcel.filters.ColumnFilter
• pyexcel.filters.RowFilter
• pyexcel.filters.EvenColumnFilter
• pyexcel.filters.OddColumnFilter
• pyexcel.filters.EvenRowFilter
• pyexcel.filters.OddRowFilter
• pyexcel.filters.RowIndexFilter
• pyexcel.filters.SingleColumnFilter
• pyexcel.filters.RowValueFilter
• pyexcel.filters.NamedRowValueFilter
• pyexcel.filters.ColumnValueFilter
• pyexcel.filters.NamedColumnValueFilter
• pyexcel.filters.SingleRowFilter
5. the following functions have been removed
• add_formatter
• remove_formatter
• clear_formatters
• freeze_formatters
• add_filter
• remove_filter
• clear_filters
• freeze_filters
6. Sheet formatting has been re-implemented and all formatters were removed:
• pyexcel.formatters.SheetFormatter
0.2.5 - 31.08.2016
Updated:
1. #58: texttable should have been made as compulsory requirement
0.2.4 - 14.07.2016
Updated:
1. For python 2, writing to sys.stdout via pyexcel-cli raises IOError.
0.2.3 - 11.07.2016
Updated:
1. For python 3, do not seek 0 when saving to memory if sys.stdout is passed on. Hence, adding support for
sys.stdin and sys.stdout.
0.2.2 - 01.06.2016
Updated:
1. Explicit imports, no longer needed
2. Depends on latest setuptools 18.0.1
3. NotImplementedError will be raised if parameters to core functions are not supported, e.g.
get_sheet(cannot_find_me_option=”will be thrown out as NotImplementedError”)
0.2.1 - 23.04.2016
Added:
1. add pyexcel-text file types as attributes of pyexcel.Sheet and pyexcel.Book, related to #31
2. auto import pyexcel-text if it is pip installed
Updated:
1. code refactoring done for easy addition of sources.
2. bug fix #29, Even if the format is a string it is displayed as a float
3. pyexcel-text is no longer a plugin to pyexcel-io but to pyexcel.sources, see pyexcel-text#22
Removed:
1. pyexcel.presentation is removed. No longer the internal decorate @outsource is used. related to #31
0.2.0 - 17.01.2016
Updated
1. adopt pyexcel-io's yield keyword support to return a generator as content
2. pyexcel.save_as and pyexcel.save_book_as get performance improvements
0.1.7 - 03.07.2015
Added
1. Support pyramid-excel which does the database commit on its own.
0.1.6 - 13.06.2015
Added
1. get excel data from a http url
0.0.13 - 07.02.2015
Added
1. Support django
2. texttable as default renderer
0.0.12 - 25.01.2015
Added
1. Added sqlalchemy support
0.0.10 - 15.12.2014
Added
1. added csvz and tsvz format
0.0.4 - 12.10.2014
Updated
1. Support python 3
0.0.1 - 14.09.2014
Features:
1. read and write csv, ods, xls, xlsx and xlsm files (which are referred to later as excel files)
2. various iterators for the reader
3. row and column filters for the reader
Index

B
Book (class in pyexcel), 131
bookdict (pyexcel.Book attribute), 134
BookStream (class in pyexcel.internal.generators), 164

C
cell_value() (pyexcel.Sheet method), 145
colnames (pyexcel.Sheet attribute), 149
Column (class in pyexcel.internal.sheets), 165
column_at() (pyexcel.Sheet method), 147
column_range() (pyexcel.Sheet method), 145
content (pyexcel.Sheet attribute), 145
csv (pyexcel.Book attribute), 135
csv (pyexcel.Sheet attribute), 151
csvz (pyexcel.Book attribute), 135
csvz (pyexcel.Sheet attribute), 152
cut() (pyexcel.Sheet method), 157

D
delete_columns() (pyexcel.Sheet method), 147

G
get_array() (in module pyexcel), 90
get_book() (in module pyexcel), 102
get_book_dict() (in module pyexcel), 101
get_dict() (in module pyexcel), 92
get_records() (in module pyexcel), 96
get_sheet() (in module pyexcel), 103

I
iget_array() (in module pyexcel), 109
iget_book() (in module pyexcel), 107
iget_records() (in module pyexcel), 111
isave_as() (in module pyexcel), 120
isave_book_as() (in module pyexcel), 126

M
map() (pyexcel.Sheet method), 156
Matrix (class in pyexcel.internal.sheets), 160
merge_all_to_a_book() (in module pyexcel), 129
merge_csv_to_a_book() (in module pyexcel), 129

N
name_columns_by_row() (pyexcel.Sheet method), 148
name_rows_by_column() (pyexcel.Sheet method), 149
named_column_at() (pyexcel.Sheet method), 148
named_row_at() (pyexcel.Sheet method), 149
number_of_columns() (pyexcel.Sheet method), 145
number_of_rows() (pyexcel.Sheet method), 145
number_of_sheets() (pyexcel.Book method), 134

O
ods (pyexcel.Book attribute), 137
ods (pyexcel.Sheet attribute), 153

P
paste() (pyexcel.Sheet method), 157
project() (pyexcel.Sheet method), 155

R
records (pyexcel.Sheet attribute), 150
region() (pyexcel.Sheet method), 157
Row (class in pyexcel.internal.sheets), 164
row_at() (pyexcel.Sheet method), 146
row_range() (pyexcel.Sheet method), 145
rownames (pyexcel.Sheet attribute), 148

S
save_as() (in module pyexcel), 115
save_as() (pyexcel.Book method), 138
save_as() (pyexcel.Sheet method), 158
save_book_as() (in module pyexcel), 125
save_to_database() (pyexcel.Book method), 139
save_to_database() (pyexcel.Sheet method), 159
save_to_django_model() (pyexcel.Sheet method), 160
save_to_django_models() (pyexcel.Book method), 139
save_to_memory() (pyexcel.Book method), 139
save_to_memory() (pyexcel.Sheet method), 159
set_column_at() (pyexcel.Sheet method), 147
set_named_column_at() (pyexcel.Sheet method), 148
set_named_row_at() (pyexcel.Sheet method), 149
set_row_at() (pyexcel.Sheet method), 146
Sheet (class in pyexcel), 140
sheet_names() (pyexcel.Book method), 134
SheetStream (class in pyexcel.internal.generators), 163
split_a_book() (in module pyexcel), 129
stream (pyexcel.Book attribute), 137
stream (pyexcel.Sheet attribute), 154

T
transpose() (pyexcel.Sheet method), 156
tsv (pyexcel.Book attribute), 135
tsv (pyexcel.Sheet attribute), 151
tsvz (pyexcel.Book attribute), 136
tsvz (pyexcel.Sheet attribute), 152

U
url (pyexcel.Book attribute), 134
url (pyexcel.Sheet attribute), 151

X
xls (pyexcel.Book attribute), 136
xls (pyexcel.Sheet attribute), 152
xlsm (pyexcel.Book attribute), 136
xlsm (pyexcel.Sheet attribute), 153
xlsx (pyexcel.Book attribute), 137
xlsx (pyexcel.Sheet attribute), 153