API Reference

Main Interface

class pdfscraper.document.Document(pages: List[pdfscraper.page.Page], doc: Any, orientation: pdfscraper.layout.utils.Orientation)
Parameters
  • pages (List[Page]) –

  • doc (Any) –

  • orientation (Orientation) –

create_sections()
doc: Any
classmethod from_pdfminer(path, orientation=Orientation(vertical_orientation=VerticalOrientation(bottom_is_zero=False), horizontal_orientation=HorizontalOrientation(left_is_zero=True)))
Parameters

orientation (Orientation) –

Return type

Document

classmethod from_pymupdf(path, orientation=Orientation(vertical_orientation=VerticalOrientation(bottom_is_zero=False), horizontal_orientation=HorizontalOrientation(left_is_zero=True)))
Parameters

orientation (Orientation) –

Return type

Document
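
Both constructors default to bottom_is_zero=False and left_is_zero=True, which reads as a top-left origin. pdfminer itself reports coordinates from the bottom-left corner of the page, while PyMuPDF uses the top-left corner, so some vertical conversion is presumably involved. This reference does not document how Orientation is applied; the helper below (flip_vertical, a hypothetical name that is not part of pdfscraper) is only a sketch of the usual arithmetic for switching a vertical span between the two conventions.

```python
from typing import Tuple


def flip_vertical(y0: float, y1: float, page_height: float) -> Tuple[float, float]:
    """Convert a vertical span between bottom-origin and top-origin coordinates.

    Hypothetical helper for illustration; not part of pdfscraper.
    The same formula works in both directions.
    """
    # The old upper edge becomes the new lower edge, and vice versa.
    return page_height - y1, page_height - y0


# A span from y=100 to y=200 on a page 800 units tall.
top_origin_span = flip_vertical(100.0, 200.0, 800.0)
```

Applying the function twice returns the original span, which is a quick sanity check when mixing coordinates from the two backends.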

orientation: Orientation
pages: List[Page]
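
A minimal sketch of how the two constructors above might be called. load_document and its backend argument are hypothetical names introduced for illustration; only Document.from_pymupdf, Document.from_pdfminer, and the pages attribute come from this reference. The import is deferred so the helper can be defined even where pdfscraper is not installed.

```python
from typing import Any


def load_document(path: str, backend: str = "pymupdf") -> Any:
    """Open a PDF with one of the two Document constructors listed above.

    Hypothetical wrapper; `backend` is an illustrative argument, not a
    pdfscraper parameter.
    """
    if backend not in ("pymupdf", "pdfminer"):
        raise ValueError(f"unknown backend: {backend!r}")

    # Imported lazily: pdfscraper is only needed when a file is actually opened.
    from pdfscraper.document import Document

    if backend == "pymupdf":
        return Document.from_pymupdf(path)
    return Document.from_pdfminer(path)
```

With pdfscraper installed, load_document("report.pdf").pages would give the list of Page objects, where report.pdf stands for any local file.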

Modules

class pdfscraper.layout.image.Image(bbox, source_width, source_height, colorspace_name, bpc, xref, name, source, raw_object=None, parent_object=None, colorspace_n=None)

An image created from a pdfminer or pymupdf object.

Parameters
  • bbox (Bbox) –

  • source_width (Optional[int]) –

  • source_height (Optional[int]) –

  • colorspace_name (Optional[str]) –

  • bpc (Optional[int]) –

  • xref (Optional[int]) –

  • name (Optional[str]) –

  • source (typing_extensions.Literal[pdfminer, mupdf]) –

  • raw_object (Any) –

  • parent_object (Any) –

  • colorspace_n (Optional[int]) –

class Config
arbitrary_types_allowed = True
bbox: Bbox
bpc: Optional[int]
colorspace_n: Optional[int] = None
colorspace_name: Optional[str]
classmethod from_pdfminer(image, page_orientation)

Create an image from a pdfminer object.

Parameters
  • image (pdfminer.layout.LTImage) – pdfminer LTImage object.

  • page_orientation (PageOrientation) – page orientation data.

Return type

Image

classmethod from_pymupdf(image, doc, page_orientation)
Return type

Image

property height
move(delta)
name: Optional[str]
parent_object: Any = None
raw_object: Any = None
save(path)
Parameters

path (str) –

source: typing_extensions.Literal[pdfminer, mupdf]
source_height: Optional[int]
source_width: Optional[int]
property width
property x0
property x1
xref: Optional[int]
property y0
property y1
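
The geometry members of Image (x0, y0, x1, y1, width, height, move) are listed above without descriptions. One plausible reading is that they are derived from bbox. The stand-in class below, BoxSketch, is hypothetical and is not pdfscraper's Bbox or Image; it only illustrates that reading, and its move takes two offsets, whereas the real move(delta) signature is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxSketch:
    """Hypothetical stand-in for a bounding box; not a pdfscraper class."""

    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        # Horizontal extent of the box.
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        # Vertical extent of the box.
        return self.y1 - self.y0

    def move(self, dx: float, dy: float) -> "BoxSketch":
        # Return a shifted copy; the size is unchanged by translation.
        return BoxSketch(self.x0 + dx, self.y0 + dy, self.x1 + dx, self.y1 + dy)


box = BoxSketch(10.0, 20.0, 110.0, 70.0)
shifted = box.move(5.0, -5.0)
```

Returning a new object from move keeps the sketch immutable; whether the library mutates in place instead is not stated in this reference.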

class pdfscraper.layout.annotations.Annotation(content: str, author: str, mod_date: str, creation_date: str, bbox: pdfscraper.layout.utils.Bbox)
Parameters
  • content (str) –

  • author (str) –

  • mod_date (str) –

  • creation_date (str) –

  • bbox (Bbox) –

author: str
bbox: Bbox
content: str
creation_date: str
classmethod from_pdfminer_annot(annot, page_orientation)
classmethod from_pymupdf_annot(annot, page_orientation)
mod_date: str
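
Because mod_date and creation_date are typed as str, callers may want to turn them into datetime objects. The sketch below assumes the strings follow the PDF date convention (D:YYYYMMDDHHmmSS followed by an optional timezone offset); this reference does not confirm the format, and parse_pdf_date is a hypothetical helper that is not part of pdfscraper. The timezone offset is ignored for simplicity.

```python
from datetime import datetime
from typing import Optional


def parse_pdf_date(value: str) -> Optional[datetime]:
    """Parse a PDF-style date string into a naive datetime.

    Hypothetical helper; assumes the 'D:YYYYMMDDHHmmSS...' layout and
    returns None when the string does not match it.
    """
    # Drop the optional 'D:' prefix used by PDF date strings.
    text = value[2:] if value.startswith("D:") else value
    digits = text[:14]
    if len(digits) < 14 or not digits.isdigit():
        return None
    try:
        return datetime.strptime(digits, "%Y%m%d%H%M%S")
    except ValueError:
        # Fourteen digits that do not form a valid calendar date or time.
        return None


parsed = parse_pdf_date("D:20210315104500+01'00'")
```

Returning None instead of raising keeps the helper usable in a loop over annotations whose date fields may be empty or malformed.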