pdfscraper.layout.utils
Module Contents
Classes
Direction of a Y-axis. Bottom→Top or Top→Bottom.
Direction of an X-axis. Left→Right or Right→Left.
Directions of X and Y axes.
Directions of X/Y axes together with page dimensions.
A rectangular bounding box.
An object with a rectangular bounding box.
Generic enumeration.
Functions
Creates a bbox, taking into account the axis directions of a given page.
Group words into vertically adjacent lines.
Get the middle point of a group of words.
Get the middle point of a word.
Yield items from any nested iterable.
Attributes
- class pdfscraper.layout.utils.VerticalOrientation[source]
Direction of a Y-axis. Bottom→Top or Top→Bottom.
- class pdfscraper.layout.utils.HorizontalOrientation[source]
Direction of an X-axis. Left→Right or Right→Left.
- class pdfscraper.layout.utils.PageOrientation[source]
Directions of X/Y axes together with page dimensions.
- class pdfscraper.layout.utils.Bbox[source]
Bases: NamedTuple
A rectangular bounding box.
- isclose(other, tolerance)[source]
Check if two bboxes are close to each other.
- Parameters
other (Bbox)
tolerance (float)
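The reference does not say how `tolerance` is applied, so the following is only a minimal sketch of one plausible reading: two boxes are close when every coordinate differs by at most `tolerance` (an absolute tolerance). The field names `x0`, `y0`, `x1`, `y1` are taken from the coordinate order mentioned for `create_bbox_backend`; the real `Bbox` may define its fields and its comparison differently.

```python
import math
from typing import NamedTuple


class Bbox(NamedTuple):
    # Field names are assumed for illustration; the real class may differ.
    x0: float
    y0: float
    x1: float
    y1: float

    def isclose(self, other: "Bbox", tolerance: float) -> bool:
        # Assumed semantics: each coordinate pair differs by at most
        # `tolerance` in absolute terms (no relative tolerance).
        return all(
            math.isclose(a, b, rel_tol=0.0, abs_tol=tolerance)
            for a, b in zip(self, other)
        )


a = Bbox(10.0, 20.0, 110.0, 40.0)
b = Bbox(10.4, 19.7, 110.2, 40.1)

close_loose = a.isclose(b, tolerance=0.5)   # largest delta is about 0.4
close_strict = a.isclose(b, tolerance=0.1)  # the x0 delta exceeds 0.1
```

Under this reading, a loose tolerance treats slightly jittered boxes from two extraction passes as the same box, while a strict one keeps them apart.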
- class pdfscraper.layout.utils.Backend[source]
Bases: enum.Enum
Generic enumeration.
Derive from this class to define new enumerations.
- pdfscraper.layout.utils.DEFAULT_BACKEND_PAGE_ORIENTATIONS :Dict[Literal[Backend, Backend], Orientation][source]
- pdfscraper.layout.utils.create_bbox_backend(backend, coords, page_orientation)[source]
Creates a bbox, taking into account the axis directions of a given page.
- Parameters
backend (Backend) – backend type
coords – 4-item sequence of x0,y0,x1,y1 coordinates
page_orientation (PageOrientation) – page size together with X/Y axes directions.
- Returns
a bounding box
- Return type
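To illustrate what "taking axis direction into account" can mean, here is a hedged sketch that normalizes a box to a top-to-bottom Y-axis. It is not the library's implementation: the function name `to_top_down`, the enum member names, and the explicit `page_height` argument are all hypothetical, and only the vertical axis is handled.

```python
from enum import Enum
from typing import NamedTuple, Sequence


class VerticalOrientation(Enum):
    # Member names are hypothetical; the real enum may use others.
    BOTTOM_TO_TOP = "bottom_to_top"
    TOP_TO_BOTTOM = "top_to_bottom"


class Bbox(NamedTuple):
    # Assumed field names, following the x0,y0,x1,y1 coordinate order.
    x0: float
    y0: float
    x1: float
    y1: float


def to_top_down(
    coords: Sequence[float],
    vertical: VerticalOrientation,
    page_height: float,
) -> Bbox:
    """Return a bbox whose Y values are measured from the top of the page."""
    x0, y0, x1, y1 = coords
    if vertical is VerticalOrientation.BOTTOM_TO_TOP:
        # Mirror both Y values around the page and swap them so that
        # y0 stays the smaller (upper) edge.
        y0, y1 = page_height - y1, page_height - y0
    return Bbox(x0, y0, x1, y1)


# A box near the top of a 792-point-high page, given in bottom-up coordinates.
flipped = to_top_down((10, 700, 110, 720), VerticalOrientation.BOTTOM_TO_TOP, 792)
unchanged = to_top_down((10, 72, 110, 92), VerticalOrientation.TOP_TO_BOTTOM, 792)
```

Both calls describe the same region, so they produce the same box once the axis direction is normalized.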
- pdfscraper.layout.utils.group_objs(words, gap=5, decimals=1, axis='y')[source]
Group words into vertically adjacent lines.
First, create a dictionary with rounded y-coordinates as keys, and lists of words as values. Then merge together lists whose coordinate delta is <= gap.
- Parameters
words (List) – list of Words
gap (float) – vertical delta between lines to be merged.
decimals (int) – rounding precision.
axis (str) – horizontal (x) or vertical (y) grouping
- Returns
vertically grouped lines; within each line, the words are sorted horizontally.
- Return type
List[List]
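The two-step procedure described for `group_objs` can be sketched roughly as follows. This is an illustration under assumptions, not the library's code: the `Word` record with `x` and `y` fields and the function name `group_into_lines` are hypothetical, the `axis` parameter is omitted (only vertical grouping is shown), and the gap is assumed to be measured between consecutive rounded y-coordinates.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Word(NamedTuple):
    # Hypothetical word record; the real objects may expose a bbox instead.
    text: str
    x: float
    y: float


def group_into_lines(
    words: List[Word], gap: float = 5, decimals: int = 1
) -> List[List[Word]]:
    # Step 1: bucket the words by their rounded y-coordinate.
    buckets: Dict[float, List[Word]] = defaultdict(list)
    for word in words:
        buckets[round(word.y, decimals)].append(word)

    # Step 2: walk the buckets in y order and merge a bucket into the
    # current line when its distance to the previous bucket is <= gap.
    lines: List[List[Word]] = []
    previous_y: Optional[float] = None
    for y in sorted(buckets):
        if previous_y is not None and y - previous_y <= gap:
            lines[-1].extend(buckets[y])
        else:
            lines.append(list(buckets[y]))
        previous_y = y

    # Step 3: sort every line horizontally.
    return [sorted(line, key=lambda w: w.x) for line in lines]


words = [
    Word("world", 60, 100.2),
    Word("Hello", 10, 100.0),
    Word("Next", 10, 120.0),
    Word("line", 50, 121.5),
]
texts = [[w.text for w in line] for line in group_into_lines(words)]
```

With the default `gap` of 5, the words at y = 100.0 and 100.2 end up on one line and those at 120.0 and 121.5 on another. Because each bucket is compared with the one before it, a long run of closely spaced buckets chains into a single line; comparing against the first bucket of the line instead would be an equally plausible design.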