874 Commits

Author SHA1 Message Date
Shreya Shankar 54faaed230 fix: add code ops and extract to python api 2025-11-24 14:18:25 -06:00
Shreya Shankar a184a3c1e9
fix: add code ops and extract to python api (#462)
* fix: add code ops and extract to python api
2025-11-24 14:10:07 -06:00
Shreya Shankar 7cca6f57b5
Graceful jinja template handling with user confirmation (#452)
* feat: Add user confirmation for non-Jinja prompts

This commit introduces a confirmation step for prompts that do not contain Jinja2 syntax. It also modifies strict_render to automatically append document context when Jinja syntax is absent.

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor: Move DOCETL_CONSOLE import to function scope

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor: Move has_jinja_syntax to docetl.utils

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-11-22 14:02:19 -08:00
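
The behavior described in #452 might look roughly like the sketch below. Only the name `has_jinja_syntax` comes from the commit message; its body, the helper `with_document_context`, and the exact text that gets appended are assumptions made for illustration, not the project's actual code.

```python
import re

# Assumption: "Jinja syntax" means any {{ ... }}, {% ... %}, or {# ... #} pair.
_JINJA_PATTERN = re.compile(r"\{\{.*?\}\}|\{%.*?%\}|\{#.*?#\}", re.DOTALL)


def has_jinja_syntax(prompt: str) -> bool:
    """Return True if the prompt contains at least one Jinja delimiter pair."""
    return bool(_JINJA_PATTERN.search(prompt))


def with_document_context(prompt: str) -> str:
    """Hypothetical helper: append the document when no Jinja syntax is present."""
    if has_jinja_syntax(prompt):
        return prompt
    # Assumption: the document is exposed to the template as `input`.
    return prompt + "\n\nDocument: {{ input }}"
```

A prompt such as "Summarize this email" would gain the trailing document reference, while a prompt that already contains `{{ input.text }}` would pass through unchanged.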
Shreya Shankar 9ebfc1bd58
feat: add ability to sort chronologically for epstein emails (#458) 2025-11-16 09:08:10 -08:00
Shreya Shankar 067e671650
feat: add ability to sort chronologically for epstein emails (#457)
* feat: add ability to sort chronologically for epstein emails
2025-11-16 08:48:56 -08:00
Shreya Shankar 56de207152
Add new showcase example (#456) 2025-11-14 17:51:49 -08:00
Shreya Shankar 7b2fc49b93
Add new showcase example (#455) 2025-11-14 16:29:06 -08:00
Shreya Shankar c3ef5684d5
Add fallback models documentation to user guide (#454)
* Add fallback models documentation and configuration

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor: Remove operation-specific models from fallback docs

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Docs: Add content warning errors to fallback model triggers

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Remove outdated best practices from fallback models documentation

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-11-14 13:32:31 -08:00
Shreya Shankar ea0013d7fd
Implement LiteLLM fallback models for reliability (#453)
* feat: Add LiteLLM Router for fallback models

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Add model to litellm_params for LiteLLM Router

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor: Prioritize operation model in LiteLLM Router fallbacks

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor: Cache LiteLLM Routers in APIWrapper

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor: Separate completion and embedding routers

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Add fallback models example configuration

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* feat: Add fallback models to LiteLLM Router

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor Router fallbacks to use list of dicts

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Remove example fallback config file

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Feat: Add version constraint for paddlepaddle

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-11-14 13:17:47 -08:00
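
The steps listed in #453 (operation model first, `model` inside `litellm_params`, fallbacks as a list of dicts) can be sketched in plain Python as follows. The function name `build_router_config` and the model names in the usage note are hypothetical, and the dictionary shapes are assumed to resemble what a LiteLLM Router accepts; check the LiteLLM documentation before relying on them.

```python
from typing import Any


def build_router_config(
    operation_model: str, fallback_models: list[str]
) -> dict[str, Any]:
    """Hypothetical helper: build router-style config with the operation model first."""
    # Keep the operation model at the front and drop duplicates, preserving order.
    ordered = list(dict.fromkeys([operation_model, *fallback_models]))

    # One entry per model; the model name is repeated inside litellm_params.
    model_list = [
        {"model_name": name, "litellm_params": {"model": name}}
        for name in ordered
    ]

    # Fallbacks as a list of dicts: primary model -> ordered list of backups.
    fallbacks = [{operation_model: ordered[1:]}] if len(ordered) > 1 else []

    return {"model_list": model_list, "fallbacks": fallbacks}
```

For example, `build_router_config("primary-model", ["backup-a", "backup-b"])` yields a `fallbacks` value of `[{"primary-model": ["backup-a", "backup-b"]}]`.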
John Damask 222ea2d706
Added libgl1 and libglib2.0-0 library install to Dockerfile as required by OpenCV (#451)
Co-authored-by: John Damask <johndamask@Johns-MacBook-Pro-2.local>
2025-11-10 09:02:19 -08:00
John Damask fb9a3770ca
Fixes Issue #446 - Split arguments are loaded correctly from yaml (#447)
Co-authored-by: John Damask <johndamask@Johns-MacBook-Pro-2.local>
2025-11-09 15:27:30 -08:00
John Damask 360c7087bf
Closes Issue #448. Added OR conditional for loading arguments from yaml code_reduce operations (#449)
Co-authored-by: John Damask <johndamask@Johns-MacBook-Pro-2.local>
2025-11-09 15:18:58 -08:00
Shreya Shankar 5b9180726b
Fix: Ensure toolInvocations is an array before accessing its properties (#445)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-11-07 01:52:43 -08:00
Shreya Shankar 0a44699f1e
Refactor: Clean up incomplete tool invocations in scraper (#444)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-11-07 01:49:01 -08:00
Shreya Shankar 30a1e22ec5
Agentic web scraper with interactive data viewer (#442)
* feat: Add web scraper UI and API integration

Implement a web scraper with a UI for user interaction and an API for data collection.

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* feat: Integrate Modal for code execution and data scraping

This commit integrates Modal for executing Python code in a sandbox environment. It also adds the Tavily search tool and updates the system prompt to guide the AI in data scraping workflows.

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* feat: Implement agent loop control and GPT-5 default

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor scraper UI to show form conditionally

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* add web scraper code

* finish web scraper tool

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-11-07 01:29:30 -08:00
Shreya Shankar 1c17491088
Add alert for manual pipeline authoring (#443)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-11-06 21:47:47 -08:00
Shreya Shankar b2ddedee2d
Set natural language as default entry point (#438)
* feat: Auto-open NL pipeline dialog and update model

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Checkpoint before follow-up message

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* feat: Update default models to gpt-5-nano

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor NaturalLanguagePipelineDialog for improved UX and clarity

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor: Update UI elements and remove unused ScrollArea

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Fix: Correctly close else block in chat API

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Fix: Enable generate button when currentFile exists

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Fix: Allow pipeline generation without uploaded file

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor: Improve pipeline application logic and error handling

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor dataset parsing and loading logic

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-11-06 21:46:21 -08:00
Shreya Shankar 2514b65a85
Fix pydantic-core build error with python 3.14 (#441)
* ci: Add PYO3_USE_ABI3_FORWARD_COMPATIBILITY for docs build

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Update docs workflow to use Python 3.13

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-11-06 20:57:06 -08:00
Shreya Shankar 532ad50546
Update llms-full.txt with operations (#440)
* Refactor operator documentation and organization

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor operation descriptions and categorization for clarity

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor documentation and improve LLM operator descriptions

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Update default LLM to gpt-5-nano and other models

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-11-06 20:49:38 -08:00
Shreya Shankar 69a1f9ad3e
Build interactive pipeline visualization editor (#437)
* feat: Add pipeline visualization builder component

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* feat: Add visual editor for pipeline configuration

This commit introduces a visual editor for configuring pipelines, allowing users to manipulate blocks, properties, and stacks through a user-friendly interface. The JSON editor remains available as an alternative.

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Add visualization builder

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-10-22 16:04:35 -07:00
Shreya Shankar e53bb73dbc
Refactor: Use gleaning_model instead of model in APIWrapper (#436)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-10-09 12:45:15 -07:00
Jonathan Hilgart 5209ff9958
allow parallel map operations to be loaded in the UI (#434) 2025-10-07 14:45:24 -07:00
Shreya Shankar b1146e462f
feat: Add copy button to toast descriptions (#433)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-09-23 17:21:45 -07:00
Shreya Shankar 3ae3ada69c
fix a gleaning bug (#432)
* fix: resolve blocking

* fix bug in gleaning model
2025-09-23 16:01:38 -07:00
Shreya Shankar 50e45cf7db
Refactor: Use num_tokens instead of token_count (#430)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-09-15 15:32:33 -07:00
Shreya Shankar d1dab86466
Fix cosine similarity blocking in resolve operation (#428)
* fix: resolve blocking
2025-09-14 20:24:58 -07:00
Shreya Shankar 0f6d9514e3
Add type checks to validation functions (#427)
* feat: Add output type validation to map and reduce operations

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refine API error message for validation failures

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Test map type validation for integer answers

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* fix test for validating schemas

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-09-12 16:18:21 -07:00
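
As a rough illustration of the output type validation that #427 describes, the sketch below checks an operation's output against a schema of type-name strings (the `{"name": "str", "age": "int"}` style used elsewhere in this log). The function name, the supported type names, and the error format are assumptions, not the project's implementation.

```python
# Assumption: schemas map field names to simple type-name strings.
_TYPE_MAP = {
    "str": str,
    "string": str,
    "int": int,
    "integer": int,
    "float": (int, float),
    "bool": bool,
    "boolean": bool,
}


def validate_output_types(output: dict, schema: dict[str, str]) -> list[str]:
    """Hypothetical validator: return a list of error strings (empty if valid)."""
    errors = []
    for key, type_name in schema.items():
        if key not in output:
            errors.append(f"missing key: {key}")
            continue
        expected = _TYPE_MAP.get(type_name)
        if expected is None:
            # Unknown or complex type names are skipped in this sketch.
            continue
        value = output[key]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if isinstance(value, bool) and expected is not bool:
            errors.append(f"{key}: expected {type_name}, got bool")
        elif not isinstance(value, expected):
            errors.append(f"{key}: expected {type_name}, got {type(value).__name__}")
    return errors
```

An integer answer such as `{"answer": 3}` passes against `{"answer": "int"}`, whereas the string `"3"` produces one error.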
Anish Athalye 74cd851301
Fix mismatch between prompt and output schema (#425)
The output schema has just three fields, but the formatting instructions
and the example in the prompt include extra fields. As these extra
fields are not used downstream in the pipeline, this patch removes them
from the prompt.
2025-09-07 18:03:21 +01:00
Shreya Shankar 54bc90721b
Upgrade dependencies to latest versions (#423) 2025-09-04 03:06:41 +01:00
Shreya Shankar 8878b6823b
Clarify environment variable usage for frontend and backend (#418)
* Add .env example files and clarify environment configuration

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Update .env.example files with LLM provider flexibility notes

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Update .env.example files with expanded LLM provider documentation

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-08-24 16:21:04 -07:00
Shreya Shankar 0afc7ad739
Implement gleaning model temp fix (#414)
* Checkpoint before follow-up message

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Remove temperature for GPT-5 models in gleaning process

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-08-17 15:58:20 -07:00
Shreya Shankar 66d32f342c
Import all operations in init file (#413)
* Add new operations and update __all__ in operations module

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Remove __all__ list from operations module

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-08-16 15:29:55 -07:00
Shreya Shankar 6a6ea597de
update topk to also get the rank (#411) 2025-08-14 13:38:49 -07:00
Shreya Shankar 4cfd371744
feat: add topk implementation (#410) 2025-08-13 13:27:54 -07:00
Shreya Shankar 6dafd46fba
Update Pandas API to Use New Output Parameter Format (#409)
* chore: update pandas api and docs

* chore: update pd accessors
2025-08-13 10:41:49 -07:00
Shreya Shankar 79542259cb
Refactor sample operation for multiple stratify keys (#408)
* Enhance sample operation with multi-key stratification and per-group sampling

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor SampleOperation with improved validation and sampling methods

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor sample operation with improved stratification and simplified config

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-08-13 10:16:55 -07:00
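
Multi-key stratification with per-group sampling, as named in #408, could be sketched like this. The function name, its parameters, and the fixed seed are hypothetical; the actual `SampleOperation` configuration may differ.

```python
import random
from collections import defaultdict


def stratified_sample(rows, stratify_keys, samples_per_group, seed=0):
    """Hypothetical sketch: sample up to N rows from each combination of key values."""
    rng = random.Random(seed)

    # Group rows by the tuple of values for all stratify keys.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[key] for key in stratify_keys)].append(row)

    sampled = []
    # Iterate groups in a stable order so results are reproducible for a given seed.
    for _, members in sorted(groups.items(), key=lambda item: repr(item[0])):
        count = min(samples_per_group, len(members))
        sampled.extend(rng.sample(members, count))
    return sampled
```

Groups smaller than `samples_per_group` contribute all of their rows, so no group is dropped from the sample.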
Shreya Shankar 6819b9f469
Fix LLM calibration kwargs to respect user settings and model defaults (#406)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-08-12 14:26:22 -07:00
Shreya Shankar d06eca44c9
fix bug in showcase (#404) 2025-08-11 11:02:56 -07:00
Shreya Shankar 00c94fcce1
website: add new showcase example (#403) 2025-08-09 22:24:38 -07:00
Shreya Shankar 42f6a1e32c chore: bump up version 2025-08-09 13:33:00 -07:00
Shreya Shankar 50720e2243 chore: update CI 2025-08-09 13:27:12 -07:00
Shreya Shankar 5e6b278ff7 chore: update version 2025-08-09 13:21:44 -07:00
Shreya Shankar f6ead4ea6b
Switch from poetry to uv (#402)
* Switch from Poetry to uv for dependency management and packaging

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Update README with uv installation and dependency management instructions

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Optimize Docker build: improve dependency installation and caching

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* verify that uv works on my local installation

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-08-09 13:20:32 -07:00
Shreya Shankar 84c4807009
chore: switching to cloudbank for blob storage (#400)
* add seo

* switching to cloudbank for blob storage
2025-07-31 14:28:00 -07:00
Shreya Shankar 6bf318e755
add SEO (#399)
* add seo
2025-07-30 12:56:04 -07:00
Shreya Shankar 2344abd379
add seo (#398) 2025-07-30 12:49:27 -07:00
Shreya Shankar fc62d9fbd9
website: improving SEO (#397)
* add seo
2025-07-30 12:43:22 -07:00
Shreya Shankar 8a70fb65a2
website: add SEO for showcase (#396)
* add new demo

* add seo
2025-07-30 12:19:27 -07:00
Shreya Shankar b4916de130
add new demo (#395) 2025-07-30 11:57:16 -07:00
Shreya Shankar 0f0253be16
Add structured output mode support to pandas API (#394)
* Add structured output mode support to pandas API

This commit addresses issue #393 by adding the ability to specify structured
output mode in the pandas semantic accessor.

Key changes:
- Enhanced map() method to accept 'output' parameter with schema and mode
- Added support for 'structured_output' mode alongside existing 'tools' mode
- Maintained backward compatibility with existing 'output_schema' parameter
- Added comprehensive parameter validation and error handling

New API format:
```python
df.semantic.map(
    prompt="Extract data: {{input.text}}",
    output={
        "schema": {"name": "str", "age": "int"},
        "mode": "structured_output"  # Optional, defaults to "tools"
    }
)
```

Tests added:
- test_semantic_map_structured_output: Tests new structured output functionality
- test_semantic_map_invalid_output_mode: Tests output mode validation
- test_semantic_map_structured_output_vs_tools: Compares both output modes
- test_semantic_map_backward_compatibility: Ensures old API still works
- test_semantic_map_parameter_validation: Comprehensive parameter validation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Update pandas documentation for structured output mode

- Updated pandas/index.md to document new output parameter format
- Added section explaining output modes (tools vs structured_output)
- Updated examples throughout pandas/operations.md and pandas/examples.md
- Maintained backward compatibility examples
- Added guidance on when to use structured output mode

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-07-22 08:54:44 -07:00
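
The parameter handling described in #394 (a new `output` dict with `schema` and `mode`, a default mode of "tools", and continued support for `output_schema`) might be approximated by the sketch below. The function name `normalize_output_config` is hypothetical, and rejecting calls that pass both parameters is an assumption rather than documented behavior.

```python
VALID_MODES = {"tools", "structured_output"}


def normalize_output_config(output=None, output_schema=None):
    """Hypothetical helper: merge the new `output` and legacy `output_schema` forms."""
    # Assumption: passing both forms at once is treated as an error.
    if output is not None and output_schema is not None:
        raise ValueError("Pass either 'output' or 'output_schema', not both")

    if output is None:
        if output_schema is None:
            raise ValueError("An output schema is required")
        # Legacy form: a bare schema, which implies the default mode.
        return {"schema": output_schema, "mode": "tools"}

    if "schema" not in output:
        raise ValueError("'output' must contain a 'schema' key")

    mode = output.get("mode", "tools")  # the mode is optional and defaults to "tools"
    if mode not in VALID_MODES:
        raise ValueError(f"Invalid output mode: {mode!r}")
    return {"schema": output["schema"], "mode": mode}
```

Under these assumptions, both `normalize_output_config(output_schema={"name": "str"})` and `normalize_output_config(output={"schema": {"name": "str"}})` resolve to the same configuration with mode "tools".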