What is the way to convert a PDF document to CSV format using Python?


There are several ways to convert a PDF document to CSV format using Python. Some of them are shown here:

 


METHOD 1:

  1. If the PDF is scanned or its text is not machine-readable, OCR it first with the open-source Tesseract engine (e.g. through pytesseract).
  2. Read the PDF content using the PyPDF2 or pdfminer libraries.
  3. Clean up the extracted text with BeautifulSoup if necessary.
  4. Load the data into a pandas DataFrame.
  5. Export the data to CSV using pandas.
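
As a rough illustration, steps 2, 4 and 5 above could be sketched as follows. This is only a sketch under assumptions: it presumes a recent PyPDF2 release (one that provides PdfReader), a PDF that already has a text layer (so no OCR step), and columns that come out of the extraction separated by two or more spaces. The function names extract_text, text_to_rows and rows_to_csv are made up for this example, and the standard csv module stands in for pandas to keep the dependencies small.

```python
import csv
import re


def extract_text(pdf_path):
    """Return the text of every page in the PDF, joined by newlines."""
    # Imported lazily so the helpers below also work without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can return an empty result for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def text_to_rows(text):
    """Turn extracted text into rows of cells.

    Assumption: each non-empty line is one row, and cells are separated
    by runs of two or more whitespace characters.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"\s{2,}", line))
    return rows


def rows_to_csv(rows, csv_path):
    """Write the rows to a CSV file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

Chaining the three calls, for instance rows_to_csv(text_to_rows(extract_text('input.pdf')), 'output.csv'), would cover the whole path from PDF to CSV. Real PDFs often need a smarter splitting rule than the two-space assumption used here.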

METHOD 2:

You can use the PDFTables API to convert PDF to CSV using Python. The tool uses an algorithm to 'see' tables and hence outputs data from PDFs accurately (see "PDF to Excel API: How it Works" on the PDFTables site).

METHOD 3:

Step 1

In your terminal/command line, install the PDFTables Python library with:

pip install git+https://github.com/pdftables/python-pdftables-api.git

If git is not recognized, install Git first, then run the above command again.

Or, if you'd prefer to install it manually, download the python-pdftables-api repository, then install it from inside the downloaded folder with:

python setup.py install

 

Step 2

Create a new Python script then add the following code:

import pdftables_api

# Create a client with your PDFTables API key
c = pdftables_api.Client('my-api-key')
# Convert to Excel; replace c.xlsx with c.csv to convert to CSV
c.xlsx('input.pdf', 'output')

Now, you’ll need to make the following changes to the script:

  • Replace my-api-key with your PDFTables API key, which you can get from the PDFTables site.
  • Replace input.pdf with the PDF you would like to convert.
  • Replace output with the name you'd like to give the converted document.
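
If you would rather not edit the script for every file, the same substitutions can be supplied at call time. The sketch below is one possible variant, not part of the PDFTables instructions: it derives the output name from the PDF name and uses c.csv so the result is a CSV file. The helper names csv_name_for and convert_pdf are hypothetical.

```python
import os


def csv_name_for(pdf_path):
    """Return the PDF path with its extension swapped for .csv."""
    base, _ext = os.path.splitext(pdf_path)
    return base + ".csv"


def convert_pdf(pdf_path, api_key):
    """Convert one PDF to CSV through PDFTables and return the output path."""
    # Imported lazily: only needed when a conversion actually runs.
    import pdftables_api

    out_path = csv_name_for(pdf_path)
    pdftables_api.Client(api_key).csv(pdf_path, out_path)
    return out_path
```

For example, convert_pdf('input.pdf', os.environ['PDFTABLES_API_KEY']) would read the key from an environment variable instead of the script; the variable name PDFTABLES_API_KEY is an assumption made for this example.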

Now, save your finished script as convert-pdf.py in the same directory as the PDF document you want to convert.


 

Step 3

Open your command line/terminal and change your directory (e.g. cd C:/Users/Bob) to the folder you saved your convert-pdf.py script and PDF in, then run the following command:

python convert-pdf.py

To find your converted spreadsheet, navigate to the folder in your file explorer and hey presto, you’ve converted a PDF to Excel or CSV with Python!
