pyfixwidth reads fixed-width text files and converts each record into Python
values. It can be used as a command-line tool that writes delimited output or
as a small parsing library inside your own code.
The package has no runtime dependencies and is designed to stay lightweight.

    pip install pyfixwidth

The repository includes a small example layout and sample data files:

    python -m fixwidth example/data.layout example/data1.txt example/data2.txt

This writes tab-separated output to standard output:

    employee_id  job_title        salary   hire_date
    100001       CEO              15000.0  1995-08-23
    100002       Programmer       8500.0   2002-11-10
    100003       Data Scientist   10000.0  2005-07-01
    100004       Sales Rep        5000.0   1999-06-01
    100005       Customer Servic  4800.0   2001-12-17

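Because the output is tab-separated, it can be read back with Python's standard `csv` module. The following is a minimal sketch, assuming the command's output has been captured as text; the in-memory string `captured` is a stand-in for a pipe or file and is not part of pyfixwidth.

```python
import csv
import io

# Stand-in for captured command output; fields are separated by tab characters.
captured = (
    "employee_id\tjob_title\tsalary\thire_date\n"
    "100001\tCEO\t15000.0\t1995-08-23\n"
    "100003\tData Scientist\t10000.0\t2005-07-01\n"
)

# delimiter="\t" keeps values that contain spaces, such as job titles, intact.
reader = csv.DictReader(io.StringIO(captured), delimiter="\t")
rows = list(reader)

for row in rows:
    # Every value arrives as a string; convert explicitly where needed.
    print(row["job_title"], float(row["salary"]))
```

Splitting on tabs rather than whitespace matters here because field values such as `Data Scientist` contain spaces.
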
If you install the package, the same command is also available as:

    pyfixwidth example/data.layout example/data1.txt example/data2.txt

A layout file is tab-delimited and describes how each source field should be read. The first line is a title, then each later line contains:
- field width
- converter name
- field name
Example:

    employees
    # records on workers and their salaries
    6 int employee_id
    15 str job_title
    8 float salary
    # negative values denote fields to skip when reading data
    -3 str blank
    10 date hire_date

Rules:
- Comments begin with `#` and must occupy their own line.
- Negative widths skip bytes in the input and do not appear in parsed rows.
- Blank field content becomes `None` before type conversion.
- A layout can be loaded from disk with `read_file_format()` or supplied directly as a sequence of `(width, datatype, name)` tuples.

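The rules above can be illustrated with a short, library-independent sketch of how a layout drives the slicing of one record. `slice_record` is a hypothetical helper written for this example, not a pyfixwidth function, and stripping surrounding whitespace before the blank check is an assumption.

```python
def slice_record(line, layout, encoding="ascii"):
    """Split one fixed-width record (bytes) into a dict of raw strings."""
    row = {}
    offset = 0
    for width, _datatype, name in layout:
        if width < 0:
            # Negative width: skip that many bytes and emit no field.
            offset += -width
            continue
        text = line[offset:offset + width].decode(encoding).strip()
        # Blank content becomes None before any type conversion would run.
        row[name] = text or None
        offset += width
    return row


layout = [
    (6, "int", "employee_id"),
    (15, "str", "job_title"),
    (8, "float", "salary"),
    (-3, "str", "blank"),
    (10, "date", "hire_date"),
]

# Build a 42-byte record that matches the field widths above.
line = (
    "100001".ljust(6) + "CEO".ljust(15) + "15000.0".rjust(8)
    + " " * 3 + "1995-08-23"
).encode("ascii")

record = slice_record(line, layout)
print(record)
```

The sketch stops at raw strings; in the real package each value would then be passed to the converter named in the layout.
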

| Type | Meaning | Accepted values |
|---|---|---|
| `str` | text | any decoded string |
| `int` | integer | values accepted by `int()` |
| `float` | floating point number | values accepted by `float()` |
| `bool` | boolean | Python truthiness via `bool()` |
| `yesno` | yes/no boolean | `Y`, `N`, `Yes`, `No` and lowercase variants |
| `date` | date | `1995-08-23`, `19950823`, `23aug1995`, `1995-8-23`, `122599` |
| `datetime` | date with time | `1995-08-23 14:30:00.000` and similar ISO-like values |
| `julian` | Julian date | `YYYYDDD`, with optional separators removed before parsing |
| `time` | time | `14:30:00`, `14.30.00`, `143000`, `09:00`, `0900` |

`date` and `datetime` formats are inferred with regular expressions, so if you
have unusual source formats you may want to register a custom converter.
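
To make the idea of regex-based inference concrete, here is a minimal sketch covering the `date` examples from the table. It is not pyfixwidth's implementation: `PATTERNS`, `MONTHS` and `infer_date` are hypothetical names, reading `122599` as MMDDYY is an assumption, and so is the two-digit-year pivot at 69.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}

# Patterns are tried in order; each must match the whole value.
PATTERNS = [
    re.compile(r"(?P<y>\d{4})-(?P<m>\d{1,2})-(?P<d>\d{1,2})"),      # 1995-08-23, 1995-8-23
    re.compile(r"(?P<y>\d{4})(?P<m>\d{2})(?P<d>\d{2})"),            # 19950823
    re.compile(r"(?P<d>\d{1,2})(?P<mon>[A-Za-z]{3})(?P<y>\d{4})"),  # 23aug1995
    re.compile(r"(?P<m>\d{2})(?P<d>\d{2})(?P<yy>\d{2})"),           # 122599 (assumed MMDDYY)
]


def infer_date(text):
    """Return a datetime.date for the first pattern that matches text."""
    text = text.strip()
    for pattern in PATTERNS:
        match = pattern.fullmatch(text)
        if match is None:
            continue
        parts = match.groupdict()
        if "mon" in parts:
            month = MONTHS.get(parts["mon"].lower())
            if month is None:
                continue  # three letters that are not a month abbreviation
        else:
            month = int(parts["m"])
        if "yy" in parts:
            # Assumed pivot: 69-99 map to 19xx, 00-68 map to 20xx.
            yy = int(parts["yy"])
            year = 1900 + yy if yy >= 69 else 2000 + yy
        else:
            year = int(parts["y"])
        return date(year, month, int(parts["d"]))
    raise ValueError("unrecognized date format: {!r}".format(text))


print(infer_date("23aug1995"))  # 1995-08-23
```

A source format that none of the patterns cover raises `ValueError` in this sketch, which is the kind of case where a custom converter is the better fit.
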
For most code, these are the main entry points:
- `read_file_format(path)` loads a layout file and returns `(title, spec)`.
- `parse_file(path, spec=...)` yields `OrderedDict` rows from a file on disk.
- `parse_lines(lines, spec=...)` parses an iterable of binary lines.
- `DictReader(fileobj, fieldinfo=...)` provides a `csv.DictReader`-like iterator for binary file objects.
- `register_type(name)` lets you add custom converters.

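For example, to parse a file with a layout loaded from disk:

    from fixwidth import read_file_format, parse_file

    # Load the layout, then parse one data file with it.
    title, layout = read_file_format('example/data.layout')
    print(title)

    # type_errors='ignore' replaces values that fail conversion with None.
    rows = parse_file('example/data1.txt', spec=layout, type_errors='ignore')
    for row in rows:
        print('Salary for {} is {}'.format(row['employee_id'], row['salary']))

`DictReader` expects a binary file object: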

    import fixwidth

    with open('example/data1.txt', 'rb') as fh:
        reader = fixwidth.DictReader(
            fh,
            fieldinfo='example/data.layout',
            skip_blank_lines=True,
        )
        first_row = next(reader)
        print(first_row['job_title'])

You can also pass the layout directly:

    import fixwidth

    # Each tuple is (width, datatype, name); a negative width skips bytes.
    layout = [
        (6, 'int', 'employee_id'),
        (15, 'str', 'job_title'),
        (8, 'float', 'salary'),
        (-3, 'str', 'blank'),
        (10, 'date', 'hire_date'),
    ]

    with open('example/data1.txt', 'rb') as fh:
        reader = fixwidth.DictReader(fh, layout)
        print(next(reader))

Converters live in `fixwidth.converters`. To register a new one, decorate a
function that accepts a decoded string and returns the converted value.

    from fixwidth.converters import register_type

    @register_type('uppercase')
    def convert_uppercase(value):
        return value.strip().upper()

After registration, the new type name can be used in layouts just like the built-in types.
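
For readers unfamiliar with the pattern, the sketch below shows how a decorator-based, name-keyed converter registry of this kind can work. It illustrates the general technique only and is not pyfixwidth's internals: `CONVERTERS`, `register_converter`, `convert` and the `cents` type are all hypothetical.

```python
# Hypothetical registry mapping a type name to its converter function.
CONVERTERS = {}


def register_converter(name):
    """Return a decorator that stores the function under the given type name."""
    def decorator(func):
        CONVERTERS[name] = func
        return func  # the function stays usable under its own name
    return decorator


@register_converter("cents")
def convert_cents(value):
    """Interpret a digit string as an amount in cents, so '001250' gives 12.5."""
    return int(value.strip()) / 100


def convert(datatype, text):
    """Look up the converter by name, as a layout's type column would."""
    return CONVERTERS[datatype](text)


print(convert("cents", " 001250 "))  # 12.5
```

Keying converters by name is what lets a plain-text layout refer to a Python function without importing it.
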
- Open files in binary mode when using `DictReader`.
- `parse_file()` defaults to `encoding='ascii'`.
- `parse_lines()` defaults to `encoding='utf-8'`.
- Use `type_errors='ignore'` to replace invalid values with `None` and keep parsing.
- `skip_blank_lines=True` ignores lines that are empty after removing trailing newlines. Lines that contain only spaces still produce a row of `None` values.

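
The last two notes describe behaviour that is easy to misread, so here is a small sketch of the semantics as stated above. `convert_field` and `is_empty_line` are hypothetical helpers, not pyfixwidth functions, and treating every value other than `'ignore'` as "re-raise" is an assumption made for the sketch.

```python
def convert_field(text, converter, type_errors="strict"):
    """Convert one field, optionally turning conversion failures into None."""
    if text is None:
        return None  # blank fields are never passed to the converter
    try:
        return converter(text)
    except ValueError:
        if type_errors == "ignore":
            return None  # invalid value is dropped, parsing continues
        raise


def is_empty_line(line):
    """True only if nothing remains after removing trailing newlines."""
    return line.rstrip(b"\r\n") == b""


print(convert_field("abc", int, type_errors="ignore"))  # None
print(is_empty_line(b"\n"))       # True
print(is_empty_line(b"   \n"))    # False
```

A line of spaces is not empty in this sense, which is why it still yields a row whose fields are all `None`.
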
Additional documentation lives in docs/index.md: