pdfscraper.page
Module Contents
Classes
- class pdfscraper.page.Page(words, drawings, images, raw_object, blocks)
- Parameters
words (List[pdfscraper.layout.text.Word]) –
drawings (List[pdfscraper.layout.drawing.Shape]) –
images (List[pdfscraper.layout.image.Image]) –
raw_object (Union[fitz.fitz.Page, pdfminer.layout.LTPage]) –
blocks (List[pdfscraper.layout.text.Block]) –
- property sorted: List[pdfscraper.layout.text.TextLine]
- Return type
List[pdfscraper.layout.text.TextLine]
- property sorted_lines: Optional[SortedTextlines]
- Return type
Optional[SortedTextlines]
- select(condition)
Find content matching condition.
- Parameters
condition (Callable) –
- Return type
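To illustrate how a condition callable could be used with select, here is a minimal sketch. It does not import pdfscraper: the Word class and the free function select below are hypothetical stand-ins, and the real method may filter other object types (drawings, images) and return a different container.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Word:
    """Hypothetical stand-in for pdfscraper.layout.text.Word."""
    text: str
    x0: float
    y0: float


def select(words: List[Word], condition: Callable[[Word], bool]) -> List[Word]:
    """Keep only the objects for which condition returns a truthy value."""
    return [w for w in words if condition(w)]


words = [Word("Invoice", 10, 20), Word("Total", 10, 700), Word("42.00", 80, 700)]

# The condition is any callable that takes one object and returns a bool.
bottom_row = select(words, lambda w: w.y0 > 600)
print([w.text for w in bottom_row])
```

Passing a callable rather than a fixed set of filter arguments keeps the selection logic in the caller's hands: position, text content, or any other attribute can be tested in one place.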
- __add__(other, other_position_delta=None)
Create a new page by combining the objects of this page and another page. To concatenate the two pages vertically or horizontally, shift all objects of the other page by the specified delta.
- Parameters
other – another Page
other_position_delta (pdfscraper.layout.utils.Bbox) –
- Returns
a new Page
- Return type
Page
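The effect of other_position_delta can be sketched with stand-in classes. Everything below is an assumption made for illustration: Bbox here is a hypothetical four-coordinate box, SketchPage tracks only bounding boxes, and the delta is applied by adding it coordinate by coordinate. The real Page may apply the delta differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Bbox:
    """Hypothetical stand-in for pdfscraper.layout.utils.Bbox."""
    x0: float
    y0: float
    x1: float
    y1: float

    def shifted(self, delta: "Bbox") -> "Bbox":
        # Assumption: the delta is added to each coordinate.
        return Bbox(self.x0 + delta.x0, self.y0 + delta.y0,
                    self.x1 + delta.x1, self.y1 + delta.y1)


@dataclass
class SketchPage:
    """Hypothetical stand-in for Page that stores only bounding boxes."""
    boxes: List[Bbox]

    def combine(self, other: "SketchPage",
                delta: Optional[Bbox] = None) -> "SketchPage":
        # Move the other page's objects first, then merge into a new page.
        moved = other.boxes if delta is None else [b.shifted(delta) for b in other.boxes]
        return SketchPage(self.boxes + moved)


top = SketchPage([Bbox(10, 10, 100, 30)])
bottom = SketchPage([Bbox(10, 10, 100, 30)])

# Stack vertically: push the second page down by an assumed page height of 800.
stacked = top.combine(bottom, Bbox(0, 800, 0, 800))
print(stacked.boxes)
```

Note that Python's binary `+` operator passes only one argument, so `page_a + page_b` can never supply a delta; to shift the other page, the method has to be called explicitly, as in `page_a.__add__(page_b, delta)`.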
- classmethod from_pymupdf(page, orientation=None)
- Parameters
page (fitz.fitz.Page) –
orientation (pdfscraper.layout.utils.Orientation) –
- Return type
- classmethod from_pdfminer(page, orientation=None)
- Parameters
page (pdfminer.layout.LTPage) –
orientation (pdfscraper.layout.utils.Orientation) –
- Return type
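A possible way to obtain the backend page objects that the two constructors expect is sketched below. The helper names first_page_from_pymupdf and first_page_from_pdfminer are hypothetical, the imports are deferred so the functions can be defined without either backend installed, and orientation is left at its default because its accepted values are not documented here.

```python
def first_page_from_pymupdf(pdf_path):
    """Build a Page from the first page of a PDF using PyMuPDF (hypothetical helper)."""
    import fitz  # PyMuPDF
    from pdfscraper.page import Page

    doc = fitz.open(pdf_path)
    # Return the document too, so the caller keeps it open while
    # the page's raw_object is still in use.
    return doc, Page.from_pymupdf(doc[0])


def first_page_from_pdfminer(pdf_path):
    """Build a Page from the first page of a PDF using pdfminer (hypothetical helper)."""
    from pdfminer.high_level import extract_pages
    from pdfscraper.page import Page

    # extract_pages yields one LTPage per page of the document.
    lt_page = next(iter(extract_pages(pdf_path)))
    return Page.from_pdfminer(lt_page)
```

Both helpers assume that pdfscraper and the corresponding backend are installed and that pdf_path points to a readable PDF with at least one page.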
- class pdfscraper.page.PageSection(words, drawings, images, raw_object, blocks)
Bases: pdfscraper.page.Page
- Parameters
words (List[pdfscraper.layout.text.Word]) –
drawings (List[pdfscraper.layout.drawing.Shape]) –
images (List[pdfscraper.layout.image.Image]) –
raw_object (Union[fitz.fitz.Page, pdfminer.layout.LTPage]) –
blocks (List[pdfscraper.layout.text.Block]) –