docetl2

Commit Graph

Author	SHA1	Message	Date
Shreya Shankar	54faaed230	fix: add code ops and extract to python api	2025-11-24 14:18:25 -06:00
Shreya Shankar	a184a3c1e9	fix: add code ops and extract to python api (#462 ) * fix: add code ops and extract to python api * fix: add code ops and extract to python api	2025-11-24 14:10:07 -06:00
Shreya Shankar	7cca6f57b5	Graceful jinja template handling with user confirmation (#452 ) * feat: Add user confirmation for non-Jinja prompts This commit introduces a confirmation step for prompts that do not contain Jinja2 syntax. It also modifies strict_render to automatically append document context when Jinja syntax is absent. Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Refactor: Move DOCETL_CONSOLE import to function scope Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Refactor: Move has_jinja_syntax to docetl.utils Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-11-22 14:02:19 -08:00
Shreya Shankar	9ebfc1bd58	feat: add ability to sort chronologically for epstein emails (#458 )	2025-11-16 09:08:10 -08:00
Shreya Shankar	067e671650	feat: add ability to sort chronologically for epstein emails (#457 ) * feat: add ability to sort chronologically for epstein emails * feat: add ability to sort chronologically for epstein emails	2025-11-16 08:48:56 -08:00
Shreya Shankar	56de207152	Add new showcase example (#456 )	2025-11-14 17:51:49 -08:00
Shreya Shankar	7b2fc49b93	Add new showcase example (#455 )	2025-11-14 16:29:06 -08:00
Shreya Shankar	c3ef5684d5	Add fallback models documentation to user guide (#454 ) * Add fallback models documentation and configuration Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Refactor: Remove operation-specific models from fallback docs Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Docs: Add content warning errors to fallback model triggers Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Remove outdated best practices from fallback models documentation Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-11-14 13:32:31 -08:00
Shreya Shankar	ea0013d7fd	Implement LiteLLM fallback models for reliability (#453 ) * feat: Add LiteLLM Router for fallback models Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Add model to litellm_params for LiteLLM Router Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Refactor: Prioritize operation model in LiteLLM Router fallbacks Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Refactor: Cache LiteLLM Routers in APIWrapper Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Refactor: Separate completion and embedding routers Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Add fallback models example configuration Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * feat: Add fallback models to LiteLLM Router Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Refactor Router fallbacks to use list of dicts Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Remove example fallback config file Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Feat: Add version constraint for paddlepaddle Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-11-14 13:17:47 -08:00
John Damask	222ea2d706	Added libgl1 and libglib2.0-0 library install to Dockerfile as required by OpenCV (#451 ) Co-authored-by: John Damask <johndamask@Johns-MacBook-Pro-2.local>	2025-11-10 09:02:19 -08:00
John Damask	fb9a3770ca	Fixes Issue #446 - Split arguments are loaded correctly from yaml (#447 ) Co-authored-by: John Damask <johndamask@Johns-MacBook-Pro-2.local>	2025-11-09 15:27:30 -08:00
John Damask	360c7087bf	Closes Issue #448 . Added OR conditional for loading arguments from yaml code_reduce operations (#449 ) Co-authored-by: John Damask <johndamask@Johns-MacBook-Pro-2.local>	2025-11-09 15:18:58 -08:00
Shreya Shankar	5b9180726b	Fix: Ensure toolInvocations is an array before accessing its properties (#445 ) Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-11-07 01:52:43 -08:00
Shreya Shankar	0a44699f1e	Refactor: Clean up incomplete tool invocations in scraper (#444 ) Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-11-07 01:49:01 -08:00
Shreya Shankar	30a1e22ec5	Agentic web scraper with interactive data viewer (#442 ) * feat: Add web scraper UI and API integration Implement a web scraper with a UI for user interaction and an API for data collection. Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * feat: Integrate Modal for code execution and data scraping This commit integrates Modal for executing Python code in a sandbox environment. It also adds the Tavily search tool and updates the system prompt to guide the AI in data scraping workflows. Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * feat: Implement agent loop control and GPT-5 default Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Refactor scraper UI to show form conditionally Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * add web scraper code * finish web scraper tool * finish web scraper tool --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-11-07 01:29:30 -08:00
Shreya Shankar	1c17491088	Add alert for manual pipeline authoring (#443 ) Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-11-06 21:47:47 -08:00
Shreya Shankar	b2ddedee2d	Set natural language as default entry point (#438 ) * feat: Auto-open NL pipeline dialog and update model Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Checkpoint before follow-up message Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * feat: Update default models to gpt-5-nano Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Refactor NaturalLanguagePipelineDialog for improved UX and clarity Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Refactor: Update UI elements and remove unused ScrollArea Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Fix: Correctly close else block in chat API Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Fix: Enable generate button when currentFile exists Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Fix: Allow pipeline generation without uploaded file Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Refactor: Improve pipeline application logic and error handling Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Refactor dataset parsing and loading logic Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-11-06 21:46:21 -08:00
Shreya Shankar	2514b65a85	Fix pydantic-core build error with python 3.14 (#441 ) * ci: Add PYO3_USE_ABI3_FORWARD_COMPATIBILITY for docs build Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Update docs workflow to use Python 3.13 Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-11-06 20:57:06 -08:00
Shreya Shankar	532ad50546	Update llms-full.txt with operations (#440 ) * Refactor operator documentation and organization Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Refactor operation descriptions and categorization for clarity Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Refactor documentation and improve LLM operator descriptions Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Update default LLM to gpt-5-nano and other models Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-11-06 20:49:38 -08:00
Shreya Shankar	69a1f9ad3e	Build interactive pipeline visualization editor (#437 ) * feat: Add pipeline visualization builder component Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * feat: Add visual editor for pipeline configuration This commit introduces a visual editor for configuring pipelines, allowing users to manipulate blocks, properties, and stacks through a user-friendly interface. The JSON editor remains available as an alternative. Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Add visualization builder --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-10-22 16:04:35 -07:00
Shreya Shankar	e53bb73dbc	Refactor: Use gleaning_model instead of model in APIWrapper (#436 ) Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-10-09 12:45:15 -07:00
Jonathan Hilgart	5209ff9958	allow parallel map operations to be loaded in the UI (#434 )	2025-10-07 14:45:24 -07:00
Shreya Shankar	b1146e462f	feat: Add copy button to toast descriptions (#433 ) Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-09-23 17:21:45 -07:00
Shreya Shankar	3ae3ada69c	fix a gleaning bug (#432 ) * fix: resolve blocking * fix bug in gleaning model * fix bug in gleaning model	2025-09-23 16:01:38 -07:00
Shreya Shankar	50e45cf7db	Refactor: Use num_tokens instead of token_count (#430 ) Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-09-15 15:32:33 -07:00
Shreya Shankar	d1dab86466	Fix cosine similarity blocking in resolve operation (#428 ) * fix: resolve blocking * fix: resolve blocking	2025-09-14 20:24:58 -07:00
Shreya Shankar	0f6d9514e3	Add type checks to validation functions (#427 ) * feat: Add output type validation to map and reduce operations Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Refine API error message for validation failures Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Test map type validation for integer answers Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * fix test for validating schemas --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-09-12 16:18:21 -07:00
Anish Athalye	74cd851301	Fix mismatch between prompt and output schema (#425 ) The output schema has just three fields, but the formatting instructions and the example in the prompt include extra fields. As these extra fields are not used downstream in the pipeline, this patch removes them from the prompt.	2025-09-07 18:03:21 +01:00
Shreya Shankar	54bc90721b	Upgrade dependencies to latest versions (#423 )	2025-09-04 03:06:41 +01:00
Shreya Shankar	8878b6823b	Clarify environment variable usage for frontend and backend (#418 ) * Add .env example files and clarify environment configuration Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Update .env.example files with LLM provider flexibility notes Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Update .env.example files with expanded LLM provider documentation Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-08-24 16:21:04 -07:00
Shreya Shankar	0afc7ad739	Implement gleaning model temp fix (#414 ) * Checkpoint before follow-up message Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Remove temperature for GPT-5 models in gleaning process Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-08-17 15:58:20 -07:00
Shreya Shankar	66d32f342c	Import all operations in init file (#413 ) * Add new operations and update __all__ in operations module Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Remove __all__ list from operations module Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-08-16 15:29:55 -07:00
Shreya Shankar	6a6ea597de	update topk to also get the rank (#411 )	2025-08-14 13:38:49 -07:00
Shreya Shankar	4cfd371744	feat: add topk implementation (#410 )	2025-08-13 13:27:54 -07:00
Shreya Shankar	6dafd46fba	Update Pandas API to Use New Output Parameter Format (#409 ) * chore: update pandas api and docs * chore: update pd accessors	2025-08-13 10:41:49 -07:00
Shreya Shankar	79542259cb	Refactor sample operation for multiple stratify keys (#408 ) * Enhance sample operation with multi-key stratification and per-group sampling Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Refactor SampleOperation with improved validation and sampling methods Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Refactor sample operation with improved stratification and simplified config Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-08-13 10:16:55 -07:00
Shreya Shankar	6819b9f469	Fix LLM calibration kwargs to respect user settings and model defaults (#406 ) Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-08-12 14:26:22 -07:00
Shreya Shankar	d06eca44c9	fix bug in showcase (#404 )	2025-08-11 11:02:56 -07:00
Shreya Shankar	00c94fcce1	website: add new showcase example (#403 )	2025-08-09 22:24:38 -07:00
Shreya Shankar	42f6a1e32c	chore: bump up version	2025-08-09 13:33:00 -07:00
Shreya Shankar	50720e2243	chore: update CI	2025-08-09 13:27:12 -07:00
Shreya Shankar	5e6b278ff7	chore: update version	2025-08-09 13:21:44 -07:00
Shreya Shankar	f6ead4ea6b	Switch from poetry to uv (#402 ) * Switch from Poetry to uv for dependency management and packaging Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Update README with uv installation and dependency management instructions Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * Optimize Docker build: improve dependency installation and caching Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com> * verify that uv works on my local installation --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-08-09 13:20:32 -07:00
Shreya Shankar	84c4807009	chore: switching to cloudbank for blob storage (#400 ) * add seo * add seo * switching to cloudbank for blob storage	2025-07-31 14:28:00 -07:00
Shreya Shankar	6bf318e755	add SEO (#399 ) * add seo * add seo	2025-07-30 12:56:04 -07:00
Shreya Shankar	2344abd379	add seo (#398 )	2025-07-30 12:49:27 -07:00
Shreya Shankar	fc62d9fbd9	website: improving SEO (#397 ) * add seo * add seo	2025-07-30 12:43:22 -07:00
Shreya Shankar	8a70fb65a2	website: add SEO for showcase (#396 ) * add new demo * add seo * add seo	2025-07-30 12:19:27 -07:00
Shreya Shankar	b4916de130	add new demo (#395 )	2025-07-30 11:57:16 -07:00
Shreya Shankar	0f0253be16	Add structured output mode support to pandas API (#394 ) * Add structured output mode support to pandas API This commit addresses issue #393 by adding the ability to specify structured output mode in the pandas semantic accessor. Key changes: - Enhanced map() method to accept 'output' parameter with schema and mode - Added support for 'structured_output' mode alongside existing 'tools' mode - Maintained backward compatibility with existing 'output_schema' parameter - Added comprehensive parameter validation and error handling New API format: ```python df.semantic.map( prompt="Extract data: {{input.text}}", output={ "schema": {"name": "str", "age": "int"}, "mode": "structured_output" # Optional, defaults to "tools" } ) ``` Tests added: - test_semantic_map_structured_output: Tests new structured output functionality - test_semantic_map_invalid_output_mode: Tests output mode validation - test_semantic_map_structured_output_vs_tools: Compares both output modes - test_semantic_map_backward_compatibility: Ensures old API still works - test_semantic_map_parameter_validation: Comprehensive parameter validation 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Update pandas documentation for structured output mode - Updated pandas/index.md to document new output parameter format - Added section explaining output modes (tools vs structured_output) - Updated examples throughout pandas/operations.md and pandas/examples.md - Maintained backward compatibility examples - Added guidance on when to use structured output mode Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-07-22 08:54:44 -07:00
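The `output` parameter described in commit 0f0253be16 (a schema plus an optional mode that defaults to "tools", with the older `output_schema` argument still accepted) can be sketched as a small normalization step. This is a minimal illustration, not DocETL's actual implementation: the helper name `normalize_output`, its error messages, and the choice to reject both arguments passed together are assumptions made for the example.

```python
from typing import Any, Dict, Optional

# The two modes named in the commit message.
VALID_MODES = {"tools", "structured_output"}


def normalize_output(
    output: Optional[Dict[str, Any]] = None,
    output_schema: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: fold new-style and legacy arguments into one dict."""
    # Assumption: passing both forms at once is treated as an error.
    if output is not None and output_schema is not None:
        raise ValueError("Pass either 'output' or 'output_schema', not both")
    if output is None:
        if output_schema is None:
            raise ValueError("One of 'output' or 'output_schema' is required")
        # Legacy form: wrap the bare schema so both paths look the same.
        output = {"schema": output_schema}
    if "schema" not in output:
        raise ValueError("'output' must contain a 'schema' key")
    # The mode is optional and defaults to "tools", as the commit states.
    mode = output.get("mode", "tools")
    if mode not in VALID_MODES:
        raise ValueError(f"Invalid output mode: {mode!r}")
    return {"schema": dict(output["schema"]), "mode": mode}


# New-style call, mirroring the example in the commit message.
new_style = normalize_output(
    output={"schema": {"name": "str", "age": "int"}, "mode": "structured_output"}
)
# Legacy call: only a schema is given, so the mode falls back to "tools".
legacy = normalize_output(output_schema={"name": "str"})
```

Normalizing both call styles into one dict early keeps the backward-compatible path and the new path identical from that point on, which is one plausible way to support the old `output_schema` argument without duplicating validation.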


874 Commits
