API Reference
Main Interface
- class pdfscraper.document.Document(pages: List[pdfscraper.page.Page], doc: Any, orientation: pdfscraper.layout.utils.Orientation)
- Parameters
pages (List[Page]) –
doc (Any) –
orientation (Orientation) –
- classmethod from_pdfminer(path, orientation=Orientation(vertical_orientation=VerticalOrientation(bottom_is_zero=False), horizontal_orientation=HorizontalOrientation(left_is_zero=True)))
- Parameters
orientation (Orientation) –
- Return type
- classmethod from_pymupdf(path, orientation=Orientation(vertical_orientation=VerticalOrientation(bottom_is_zero=False), horizontal_orientation=HorizontalOrientation(left_is_zero=True)))
- Parameters
orientation (Orientation) –
- Return type
- orientation: Orientation
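The default `orientation` argument of both factory methods selects a top-left origin (`bottom_is_zero=False`, `left_is_zero=True`). pdfminer reports coordinates with the origin at the bottom-left corner of the page, whereas PyMuPDF uses the top-left corner, so presumably one backend's y-axis has to be flipped to reach a common convention. The sketch below shows that kind of flip. `Bbox` here is a minimal stand-in and `to_orientation` is a hypothetical helper; neither is pdfscraper's own implementation.

```python
from dataclasses import dataclass


# Hypothetical stand-in for pdfscraper.layout.utils.Bbox; the real class may differ.
@dataclass(frozen=True)
class Bbox:
    x0: float
    y0: float
    x1: float
    y1: float


def to_orientation(
    bbox: Bbox,
    page_width: float,
    page_height: float,
    src_bottom_is_zero: bool,
    src_left_is_zero: bool,
    dst_bottom_is_zero: bool = False,  # documented default: top edge is zero
    dst_left_is_zero: bool = True,     # documented default: left edge is zero
) -> Bbox:
    """Re-express a bounding box in another origin convention (hypothetical helper)."""
    x0, y0, x1, y1 = bbox.x0, bbox.y0, bbox.x1, bbox.y1
    if src_bottom_is_zero != dst_bottom_is_zero:
        # Mirror across the horizontal midline; the lower and upper edges swap roles.
        y0, y1 = page_height - y1, page_height - y0
    if src_left_is_zero != dst_left_is_zero:
        # Mirror across the vertical midline; the left and right edges swap roles.
        x0, x1 = page_width - x1, page_width - x0
    return Bbox(x0, y0, x1, y1)


# A box reported by pdfminer (bottom-left origin) on a US Letter page (612 x 792 pt).
box = to_orientation(Bbox(72, 700, 200, 720), 612, 792,
                     src_bottom_is_zero=True, src_left_is_zero=True)
print(box)  # Bbox(x0=72, y0=72, x1=200, y1=92)
```

Because each flip is its own inverse, converting the result back with the source and destination flags swapped recovers the original box.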
Modules
- class pdfscraper.layout.image.Image(bbox, source_width, source_height, colorspace_name, bpc, xref, name, source, raw_object=None, parent_object=None, colorspace_n=None)
An image created from a pdfminer or PyMuPDF object.
- Parameters
bbox (Bbox) –
source_width (Optional[int]) –
source_height (Optional[int]) –
colorspace_name (Optional[str]) –
bpc (Optional[int]) –
xref (Optional[int]) –
name (Optional[str]) –
source (typing_extensions.Literal[pdfminer, mupdf]) –
raw_object (Any) –
parent_object (Any) –
colorspace_n (Optional[int]) –
- classmethod from_pdfminer(image, page_orientation)
Create an image from a pdfminer object.
- Parameters
image (pdfminer.layout.LTImage) – pdfminer LTImage object.
page_orientation (PageOrientation) – page orientation data.
- Returns
- Return type
- classmethod from_pymupdf(image, doc, page_orientation)
- Parameters
image (MuPDFImage) –
doc (fitz.fitz.Document) –
page_orientation (PageOrientation) –
- Return type
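The `source_width`, `source_height`, `bpc` (bits per component) and `colorspace_n` fields are enough to estimate the size of an image's decoded sample data. The PDF specification requires every row of samples to start on a byte boundary, so each row is padded up to a whole byte. The helper below is hypothetical (not part of pdfscraper) and assumes `colorspace_n` is the number of colour components per sample; it describes the decompressed samples, not the length of a compressed (e.g. DCT or Flate) stream.

```python
from typing import Optional


def expected_sample_bytes(
    width: Optional[int],
    height: Optional[int],
    bpc: Optional[int],
    colorspace_n: Optional[int],
) -> Optional[int]:
    """Byte size of an image's decoded sample data (hypothetical helper).

    Arguments mirror Image.source_width, source_height, bpc and colorspace_n,
    all of which are Optional in the reference above.
    """
    if width is None or height is None or bpc is None or colorspace_n is None:
        return None  # metadata missing; size cannot be determined
    bits_per_row = width * colorspace_n * bpc
    # PDF pads every row of samples to a whole byte.
    bytes_per_row = (bits_per_row + 7) // 8
    return bytes_per_row * height


print(expected_sample_bytes(100, 50, 8, 3))  # 15000 (8-bit RGB)
print(expected_sample_bytes(10, 2, 1, 1))    # 4 (1-bit, each 10-bit row padded to 2 bytes)
```

Returning `None` rather than raising keeps the helper usable on images where one of the backends did not supply a field.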
- class pdfscraper.layout.annotations.Annotation(content: str, author: str, mod_date: str, creation_date: str, bbox: pdfscraper.layout.utils.Bbox)
- Parameters
content (str) –
author (str) –
mod_date (str) –
creation_date (str) –
bbox (Bbox) –
- classmethod from_pdfminer_annot(annot, page_orientation)
- Parameters
annot (PDFMinerAnnotation) –
page_orientation (PageOrientation) –
- classmethod from_pymupdf_annot(annot, page_orientation)
- Parameters
annot (PyMuPDFAnnotation) –
page_orientation (PageOrientation) –
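`mod_date` and `creation_date` are typed as plain `str`. If they hold raw PDF date strings (the `D:YYYYMMDDHHmmSSOHH'mm'` format from the PDF specification, where every field after the year is optional), they can be turned into `datetime` objects with a helper like the hypothetical `parse_pdf_date` below. Whether pdfscraper passes the raw string through unchanged from both backends is an assumption; check the values you actually receive.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# D:YYYY[MM[DD[HH[mm[SS[O[HH['mm']]]]]]]], where O is Z, + or -.
_PDF_DATE = re.compile(
    r"^(?:D:)?(?P<Y>\d{4})(?P<m>\d{2})?(?P<d>\d{2})?"
    r"(?P<H>\d{2})?(?P<M>\d{2})?(?P<S>\d{2})?"
    r"(?:(?P<tz>[Zz+\-])(?P<tzh>\d{2})?'?(?P<tzm>\d{2})?'?)?$"
)


def parse_pdf_date(value: str) -> Optional[datetime]:
    """Parse a PDF date string; return None if it is not one (hypothetical helper)."""
    match = _PDF_DATE.match(value.strip())
    if match is None:
        return None
    g = match.groupdict()
    tz = None  # no offset given: return a naive datetime
    if g["tz"]:
        if g["tz"] in "Zz":
            tz = timezone.utc
        else:
            offset = timedelta(hours=int(g["tzh"] or 0), minutes=int(g["tzm"] or 0))
            tz = timezone(offset if g["tz"] == "+" else -offset)
    try:
        # Missing month/day default to 1, missing time fields to 0.
        return datetime(
            int(g["Y"]), int(g["m"] or 1), int(g["d"] or 1),
            int(g["H"] or 0), int(g["M"] or 0), int(g["S"] or 0), tzinfo=tz,
        )
    except ValueError:
        return None  # e.g. month 13 or day 32


print(parse_pdf_date("D:20230115103000+01'00'"))  # 2023-01-15 10:30:00+01:00
```

Invalid strings yield `None` instead of an exception, so the helper can be applied to every annotation without special-casing empty or malformed dates.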