pdfscraper.layout.image

Module Contents

Classes

Image

An image created from a pdfminer or PyMuPDF object.

MuPDFImage

Typed dictionary describing a single image record obtained from a PyMuPDF page.

Functions

get_image(layout_object)

attr_as(obj, field, value)

get_images_from_pymupdf_page(page)

Attributes

ImageSource

pdfscraper.layout.image.ImageSource
pdfscraper.layout.image.get_image(layout_object)
Return type

Optional[pdfminer.layout.LTImage]

pdfscraper.layout.image.attr_as(obj, field, value)
Parameters

field (str) –

Return type

Iterator[None]
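
The Iterator[None] return type suggests that attr_as is a generator-based context manager. The sketch below shows one way such a helper could work, assuming it temporarily sets obj.<field> to value and restores the previous state on exit. That behaviour is not confirmed by this page, and the name attr_as_sketch is hypothetical.

```python
from contextlib import contextmanager
from types import SimpleNamespace
from typing import Any, Iterator

_MISSING = object()  # sentinel: the attribute did not exist before


@contextmanager
def attr_as_sketch(obj: Any, field: str, value: Any) -> Iterator[None]:
    """Hypothetical stand-in: temporarily set obj.<field> to value."""
    previous = getattr(obj, field, _MISSING)
    setattr(obj, field, value)
    try:
        yield
    finally:
        # Restore the original state even if the body raised.
        if previous is _MISSING:
            delattr(obj, field)
        else:
            setattr(obj, field, previous)


# Usage with a throwaway object (not a real pdfminer or PyMuPDF type).
stream = SimpleNamespace(filter="DCTDecode")
with attr_as_sketch(stream, "filter", "FlateDecode"):
    inside = stream.filter
after = stream.filter
```

Restoring in a finally clause keeps the object consistent when the wrapped code fails partway through.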

class pdfscraper.layout.image.Image

Bases: pdfscraper.layout.utils.Rectangular

An image created from a pdfminer or PyMuPDF object.

Parameters
  • bbox (Bbox) –

  • source_width (Optional[int]) –

  • source_height (Optional[int]) –

  • colorspace_name (Optional[str]) –

  • bpc (Optional[int]) –

  • xref (Optional[int]) –

  • name (Optional[str]) –

  • source (typing_extensions.Literal["pdfminer", "mupdf"]) –

  • raw_object (Any) –

  • parent_object (Any) –

  • colorspace_n (Optional[int]) –

class Config
arbitrary_types_allowed = True
bbox: pdfscraper.layout.utils.Bbox
source_width: Optional[int]
source_height: Optional[int]
colorspace_name: Optional[str]
bpc: Optional[int]
xref: Optional[int]
name: Optional[str]
source: ImageSource
raw_object: Any
parent_object: Any
colorspace_n: Optional[int]
__hash__()

Return hash(self).

__contains__(other)
_save_pdfminer(path)
Parameters

path (str) –

_save_pymupdf(path)
Parameters

path (str) –

save(path)
Parameters

path (str) –
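
Because the class exposes save alongside _save_pdfminer and _save_pymupdf, and carries a source field, it is plausible that save dispatches on source. The following is only a sketch of that pattern under that assumption. ImageSketch is a hypothetical stand-in, and its private methods record the call instead of writing a file.

```python
from dataclasses import dataclass, field
from typing import List, Literal


@dataclass
class ImageSketch:
    """Hypothetical stand-in for Image; not the library's implementation."""

    source: Literal["pdfminer", "mupdf"]
    calls: List[str] = field(default_factory=list)

    def _save_pdfminer(self, path: str) -> None:
        # Placeholder: a real backend would write the image bytes to path.
        self.calls.append(f"pdfminer:{path}")

    def _save_pymupdf(self, path: str) -> None:
        # Placeholder: a real backend would write the image bytes to path.
        self.calls.append(f"mupdf:{path}")

    def save(self, path: str) -> None:
        # Assumed behaviour: pick the backend that produced the image.
        if self.source == "pdfminer":
            self._save_pdfminer(path)
        elif self.source == "mupdf":
            self._save_pymupdf(path)
        else:
            raise ValueError(f"unknown image source: {self.source!r}")


img = ImageSketch(source="mupdf")
img.save("out.png")
```

Keeping the backend-specific logic in private methods lets callers use a single save(path) entry point regardless of which library produced the image.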

classmethod from_pdfminer(image, page_orientation)

Create an image from a pdfminer object.

Parameters
  • image (pdfminer.layout.LTImage) – pdfminer LTImage object.

  • page_orientation (PageOrientation) – page orientation data.

Return type

Image

classmethod from_pymupdf(image, doc, page_orientation)
Return type

Image

class pdfscraper.layout.image.MuPDFImage

Bases: TypedDict

Typed dictionary describing a single image record obtained from a PyMuPDF page.

xref: int
mask_xref: int
source_width: int
source_height: int
bpc: int
colorspace_name: str
name: str
decode_filter: str
bbox: Tuple
pdfscraper.layout.image.get_images_from_pymupdf_page(page)
Return type

Iterable[MuPDFImage]
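
Since get_images_from_pymupdf_page yields plain typed dictionaries, callers can process the records without touching PyMuPDF objects. The sketch below mirrors the documented MuPDFImage fields in a local TypedDict and picks the record with the most source pixels. MuPDFImageSketch, largest_image and the sample records are hypothetical, and the four-float shape of bbox is an assumption (the page only says Tuple).

```python
from typing import Iterable, List, Optional, Tuple, TypedDict


class MuPDFImageSketch(TypedDict):
    """Local mirror of the documented MuPDFImage fields (illustrative only)."""

    xref: int
    mask_xref: int
    source_width: int
    source_height: int
    bpc: int
    colorspace_name: str
    name: str
    decode_filter: str
    bbox: Tuple[float, float, float, float]  # assumed shape: x0, y0, x1, y1


def largest_image(
    images: Iterable[MuPDFImageSketch],
) -> Optional[MuPDFImageSketch]:
    """Return the record with the most source pixels, or None if empty."""
    return max(
        images,
        key=lambda im: im["source_width"] * im["source_height"],
        default=None,
    )


# Made-up sample records standing in for the function's output.
records: List[MuPDFImageSketch] = [
    {"xref": 7, "mask_xref": 0, "source_width": 100, "source_height": 50,
     "bpc": 8, "colorspace_name": "DeviceRGB", "name": "Im1",
     "decode_filter": "DCTDecode", "bbox": (0.0, 0.0, 100.0, 50.0)},
    {"xref": 9, "mask_xref": 0, "source_width": 640, "source_height": 480,
     "bpc": 8, "colorspace_name": "DeviceGray", "name": "Im2",
     "decode_filter": "FlateDecode", "bbox": (10.0, 10.0, 330.0, 250.0)},
]
biggest = largest_image(records)
```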