docetl/docs/api-reference/python.md

208 lines
5.4 KiB
Markdown

# Python API
## Operations
::: docetl.schemas.MapOp
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
::: docetl.schemas.ResolveOp
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
::: docetl.schemas.ReduceOp
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
::: docetl.schemas.ParallelMapOp
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
::: docetl.schemas.FilterOp
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
::: docetl.schemas.EquijoinOp
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
::: docetl.schemas.SplitOp
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
::: docetl.schemas.GatherOp
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
::: docetl.schemas.UnnestOp
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
::: docetl.schemas.SampleOp
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
::: docetl.schemas.ClusterOp
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
::: docetl.schemas.CodeMapOp
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
::: docetl.schemas.CodeReduceOp
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
::: docetl.schemas.CodeFilterOp
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
::: docetl.schemas.ExtractOp
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
### Callable support for code ops
Code operations (`code_map`, `code_reduce`, `code_filter`) accept either a string containing Python code that defines a `transform` function, or a regular Python function. When you pass a function, it does not need to be named `transform`; DocETL binds it internally.
Example:
```python
from docetl.api import CodeMapOp
def my_map(doc: dict) -> dict:
return {"double": doc["x"] * 2}
code_map = CodeMapOp(name="double_x", type="code_map", code=my_map)
```
- Map: `fn(doc: dict) -> dict`
- Filter: `fn(doc: dict) -> bool`
- Reduce: `fn(group: list[dict]) -> dict`
See also: [Code Operators](../operators/code.md), [Extract Operator](../operators/extract.md)
## Dataset and Pipeline
::: docetl.schemas.Dataset
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
::: docetl.schemas.ParsingTool
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
::: docetl.schemas.PipelineStep
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
::: docetl.schemas.PipelineOutput
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true
::: docetl.api.Pipeline
options:
show_root_heading: true
heading_level: 3
show_if_no_docstring: false
docstring_options:
ignore_init_summary: false
trim_doctest_flags: true