How can I extract text from a PDF with Python?

Time:01-14

I'm looking to extract some text from a PDF. I'm using this code:

import PyPDF2
Doc = open('document.pdf','rb') 
pdfreader = PyPDF2.PdfFileReader(Doc)
pageObj = pdfreader.getPage(0)
pageObj.extractText()

With this code, pageObj.extractText() returns ''. I don't know why this happens, because the PDF I open does contain text. The document has just 1 page.

Does anyone know what is happening, or whether there is another way to get information from a PDF?

CodePudding user response:

You can try pdfplumber.

Instead of printing the text, you can write it to a text file.

import pdfplumber
with pdfplumber.open(r'D:\document.pdf') as pdf:
    first_page = pdf.pages[0]
    print(first_page.extract_text())
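
To write the text to a file instead of printing it, something like the following might work. This is only a rough sketch: the helper names join_page_texts and pdf_to_text_file are made up for illustration and are not part of pdfplumber. It assumes pdfplumber is installed and that you want all pages, not just the first.

```python
from pathlib import Path


def join_page_texts(page_texts):
    """Join per-page text, treating pages with no extractable text as empty."""
    # Hypothetical helper: `text or ""` covers both None and '' results.
    return "\n".join(text or "" for text in page_texts)


def pdf_to_text_file(pdf_path, out_path):
    """Extract text from every page of pdf_path and write it to out_path."""
    # Imported here so the helper above works without the third-party package.
    import pdfplumber  # pip install pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        # Collect the text of each page in order.
        page_texts = [page.extract_text() for page in pdf.pages]

    Path(out_path).write_text(join_page_texts(page_texts), encoding="utf-8")
```

Calling pdf_to_text_file(r'D:\document.pdf', 'document.txt') should then produce a text file next to your script. If the file still comes out empty, the PDF may contain only scanned images, in which case there is no text layer for either library to extract.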