Commit Graph

157 Commits

Author SHA1 Message Date
Shreya Shankar 54faaed230 fix: add code ops and extract to python api 2025-11-24 14:18:25 -06:00
Shreya Shankar a184a3c1e9
fix: add code ops and extract to python api (#462)
* fix: add code ops and extract to python api

* fix: add code ops and extract to python api
2025-11-24 14:10:07 -06:00
Shreya Shankar c3ef5684d5
Add fallback models documentation to user guide (#454)
* Add fallback models documentation and configuration

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor: Remove operation-specific models from fallback docs

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Docs: Add content warning errors to fallback model triggers

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Remove outdated best practices from fallback models documentation

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-11-14 13:32:31 -08:00
Shreya Shankar ea0013d7fd
Implement LiteLLM fallback models for reliability (#453)
* feat: Add LiteLLM Router for fallback models

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Add model to litellm_params for LiteLLM Router

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor: Prioritize operation model in LiteLLM Router fallbacks

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor: Cache LiteLLM Routers in APIWrapper

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor: Separate completion and embedding routers

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Add fallback models example configuration

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* feat: Add fallback models to LiteLLM Router

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor Router fallbacks to use list of dicts

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Remove example fallback config file

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Feat: Add version constraint for paddlepaddle

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-11-14 13:17:47 -08:00
Shreya Shankar 50e45cf7db
Refactor: Use num_tokens instead of token_count (#430)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-09-15 15:32:33 -07:00
Shreya Shankar d1dab86466
Fix cosine similarity blocking in resolve operation (#428)
* fix: resolve blocking

* fix: resolve blocking
2025-09-14 20:24:58 -07:00
Shreya Shankar 4cfd371744
feat: add topk implementation (#410) 2025-08-13 13:27:54 -07:00
Shreya Shankar 6dafd46fba
Update Pandas API to Use New Output Parameter Format (#409)
* chore: update pandas api and docs

* chore: update pd accessors
2025-08-13 10:41:49 -07:00
Shreya Shankar 79542259cb
Refactor sample operation for multiple stratify keys (#408)
* Enhance sample operation with multi-key stratification and per-group sampling

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor SampleOperation with improved validation and sampling methods

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor sample operation with improved stratification and simplified config

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-08-13 10:16:55 -07:00
Shreya Shankar f6ead4ea6b
Switch from poetry to uv (#402)
* Switch from Poetry to uv for dependency management and packaging

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Update README with uv installation and dependency management instructions

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Optimize Docker build: improve dependency installation and caching

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* verify that uv works on my local installation

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-08-09 13:20:32 -07:00
Shreya Shankar 84c4807009
chore: switching to cloudbank for blob storage (#400)
* add seo

* add seo

* switching to cloudbank for blob storage
2025-07-31 14:28:00 -07:00
Shreya Shankar 0f0253be16
Add structured output mode support to pandas API (#394)
* Add structured output mode support to pandas API

This commit addresses issue #393 by adding the ability to specify structured
output mode in the pandas semantic accessor.

Key changes:
- Enhanced map() method to accept 'output' parameter with schema and mode
- Added support for 'structured_output' mode alongside existing 'tools' mode
- Maintained backward compatibility with existing 'output_schema' parameter
- Added comprehensive parameter validation and error handling

New API format:
```python
df.semantic.map(
    prompt="Extract data: {{input.text}}",
    output={
        "schema": {"name": "str", "age": "int"},
        "mode": "structured_output"  # Optional, defaults to "tools"
    }
)
```

Tests added:
- test_semantic_map_structured_output: Tests new structured output functionality
- test_semantic_map_invalid_output_mode: Tests output mode validation
- test_semantic_map_structured_output_vs_tools: Compares both output modes
- test_semantic_map_backward_compatibility: Ensures old API still works
- test_semantic_map_parameter_validation: Comprehensive parameter validation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Update pandas documentation for structured output mode

- Updated pandas/index.md to document new output parameter format
- Added section explaining output modes (tools vs structured_output)
- Updated examples throughout pandas/operations.md and pandas/examples.md
- Maintained backward compatibility examples
- Added guidance on when to use structured output mode

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-07-22 08:54:44 -07:00
Sid Jha 19a8286978
Improve syntax_check by leveraging pydantic validation (#392)
* Work on other operators

* Fix

* Add ValidationInfo

* Bug fix

* docs: fix errors in equijoin documentation

---------

Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
2025-07-20 18:54:59 -07:00
Shreya Shankar 9028efc8d4
Update cluster documentation to use inputs iteration in summary prompt (#386)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-07-14 18:12:22 -07:00
Shreya Shankar d995c534b5
feat: add global bypass cache (#383) 2025-07-08 16:53:40 -07:00
Shreya Shankar 9b836ee228
Add operators to pandas API (#379) 2025-07-04 11:59:24 -07:00
Shreya Shankar 1e4709a112
Refactor api.py for structured output (#378)
* Refactor API wrapper with modular design for LLM calls and output handling

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor APIWrapper: Simplify LLM call logic and improve modularity

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor output mode handling in APIWrapper with flexible configuration

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Add comprehensive tests for DocETL output modes with synthetic data

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor output modes tests with improved pytest structure and DSLRunner

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Fix runtime errors

* Add nested JSON parsing for string values in API response

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Handle nested JSON parsing by extracting matching key values

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Simplify JSON parsing logic in API utility functions

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Add to tests

* Add documentation for DocETL output modes and configuration options

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Add docs

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-07-04 10:58:15 -07:00
Shreya Shankar 9a72a6b729
chore: bump up fastapi and python multipart (#376)
* merge

* chore: bump up fastapi and python multipart

* chore: bump up fastapi and python multipart
2025-07-02 07:37:06 -07:00
Shreya Shankar b8d2beb602
feat: adding conditional gleaning (#375)
* fix: improve caching and don't raise error for bad gather configs

* fix: improve caching and don't raise error for bad gather configs

* feat: adding conditional gleaning
2025-07-01 22:40:38 -07:00
Shreya Shankar 7071ade539
fix: improve caching and don't raise error for bad gather configs (#373)
* merge

* fix: improve caching and don't raise error for bad gather configs

* fix: improve caching and don't raise error for bad gather configs

* fix: improve caching and don't raise error for bad gather configs
2025-06-30 23:19:17 -07:00
Shreya Shankar ea60f38afd
docs: improve gleaning description (#371)
* docs: improve gleaning description

* docs: improve gleaning description

* docs: improve gleaning description

* docs: fix spacing in gleaning docs
2025-06-26 21:25:26 -07:00
Shreya Shankar d156351c69
docs: improve gleaning description (#370)
* docs: improve gleaning description

* docs: improve gleaning description

* docs: improve gleaning description
2025-06-26 21:19:51 -07:00
Shreya Shankar 631ea0c34f
feat: Add calibration support to map operations for improved consistency (#365)
* chore: run pre-commit

* feat: add calibration to map ops

* feat: add calibration to map ops

* fix: comment out flaky test
2025-06-14 23:04:13 -07:00
Shreya Shankar 8a7b4e1566
feat: add extract operator (#361)
* feat: add extract operator

* feat: add extract operator

* feat: add extract operator

* feat: add extract operator
2025-05-13 17:43:46 -07:00
Shreya Shankar bb5bdff9d1
feat: adding `api_base` to yaml (#359)
* feat: adding api base to yaml

* feat: adding api base to yaml

* docs: add api base docs
2025-05-11 18:40:29 -07:00
Shreya Shankar 1a41df6cb5
fix: add system prompt and other config vars to python api (#356) 2025-05-03 18:05:22 -07:00
Shreya Shankar 3781edebfa
docs: update for rank op (#344)
* update documentation for rank op

* update documentation for rank op
2025-04-21 18:24:37 -07:00
Shreya Shankar 9ad27afa12
update documentation for rank op (#343) 2025-04-21 18:22:21 -07:00
Shreya Shankar 0b2d5b0324
feat: add rank operation card in docwrangler (#341)
* feat: add rank operation card in docwrangler

* feat: add rank operation card in docwrangler

* feat: add rank operation card in docwrangler
2025-04-21 16:02:19 -07:00
Shreya Shankar c960d6fcc0
feat: add rank operation (#340)
* feat: add order by operator

* adding a bunch of order functions

* refactor: move around tests for rank operator

* add anthropic hh test for rank

* refactor: move around tests for rank operator

* refactor: move around tests for rank operator
2025-04-21 13:48:41 -07:00
Shreya Shankar d20c3982e4
feat: add TPM rate limits & pipeline settings to DocWrangler (#338)
* add TPM rate limit

* feat: add rate limiting in docwrangler; tpm

* feat: add rate limiting in docwrangler; tpm

* feat: add rate limiting in docwrangler; tpm
2025-04-19 22:38:48 -07:00
Shreya Shankar 1c4e181201
docs: add python api quickstart (#337)
* docs: add python api quickstart

* docs: add python api quickstart
2025-04-19 09:49:50 -07:00
Shreya Shankar e7fd306d99
Add python docs (#334) 2025-04-17 12:23:24 -07:00
shabie 4793c89e92
feat: add flag to stream map operation outputs to disk (#323)
* feat: add flag to stream map operation outputs to disk

* flush partial results default False

* comment out print statement in test basic map

* rewrite test to not use batched config

---------

Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
2025-03-19 19:17:56 -06:00
Shreya Shankar fa5460b4c0
feat: add n parameter to output (#320)
* feat: add n parameter to output

* feat: add n parameter to output

* feat: add n parameter to output

* feat: add n parameter to output
2025-03-13 23:25:03 -07:00
Shreya Shankar 080e7f75ec
feat: refactoring map optimizer (#311) 2025-02-18 12:54:45 -08:00
Shreya Shankar f4abe55191
feat: support pdf upload and add tutorial (#309) 2025-02-09 21:19:17 -08:00
Shreya Shankar 2a259a0d93
feat: add pandas df accessor (#287)
* feat: add pandas df accessor

* feat: add pandas df accessor

* feat: add pandas df accessor
2025-01-24 16:54:49 -08:00
Shreya Shankar 05c4357c59
docs: add verbose param (#283) 2025-01-22 12:23:46 +01:00
Shreya Shankar eb995fdc77
feat: add llmstxt (#267)
* feat: adding llms.txt

* feat: adding llms.txt

* feat: adding llms.txt

* feat: adding llms.txt
2025-01-07 21:43:17 -08:00
Shreya Shankar 8b3e1ce640
fix: bypass vercel serverless functions when uploading datasets (#266) 2025-01-06 20:52:37 -08:00
Shreya Shankar f38fbb8960
docs: update playground docs (#259)
* rebrand to docwrangler

* refactor: rebranding to docwrangler

* refactor: rebranding to docwrangler

* refactor: edit vercel.json

* fix: map optimizer should work

* docs: update playground docs
2025-01-02 00:33:05 -08:00
Shreya Shankar 662d6a2c5a
feat: add azure openai for the FE assistants (#256)
* chore: ui nits

* chore: ui nits

* feat: add tutorial for the supreme court hearings

* fix: output csv writing should work even if not all documents have all the keys

* feat: add azure openai for prompt improvement and chat, with logging

* Add system prompts to documentation

* fix: add setSystemPrompt to restore pipeline context

* docs: add vercel json
2025-01-01 11:41:43 -08:00
Shreya Shankar af70998e6f
chore: add param to skip LLM calls when they fail (#255)
* feat: add skip_on_error param to map operations

* chore: add more helpful logging

* chore: prettier printing

* better logging when skipping on error

* better logging when skipping on error
2024-12-29 23:53:37 -06:00
Rohit Rawat 0e077aa740
added enum support (#254)
* added enum support

* tests: add test for enum type output

* docs: update docs to support enum type schemas

---------

Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
2024-12-26 17:50:23 -06:00
Shreya Shankar f1a12d2700
WIP: separate frontend from backend so we can host the frontend (#242)
* fix: prompt engineer agent for map decomposition

* feat: host application

* chore: update azure

* chore: update azure

* chore: update azure

* fix: adding default ops in init

* chore: refactoring fastapi models

* feat: allowing frontend to use a separate server for backend

* feat: allowing frontend to use a separate server for backend

* feat: allowing frontend to use a separate server for backend

* feat: allowing frontend to use a separate server for backend

* feat: allowing frontend to use a separate server for backend

* feat: allowing frontend to use a separate server for backend
2024-12-22 17:42:40 -06:00
Shreya Shankar bd40799e01
refactor recursive optimization for map operations (#225)
* feat: add column view dialog

* feat: add column view dialog

* increase docker time

* feat: add feedback indicator in the row view

* feat: add prompt engineering flow for notes

* feat: add prompt engineering flow for notes

* feat: support resolve prompts in the prompt editor

* fix: fix build errors

* docs: update playground

* docs: update playground

* docs: update playground

* docs: update playground

* Update reduce folding instruction

* fix: make histogram calculation and rendering less blocking

* tests: make docker CI more robust

* docs: edit readme to be formatted better

* Edit system prompt instruction

* Edit system prompt instruction

* chore: refactor recursive optimization for map ops

* feat: small performance optimization to prompt improvement
2024-12-03 17:04:25 -08:00
Sushruth Booma 813946141f
Fix cache naming (#220) 2024-12-02 11:43:00 -06:00
Shreya Shankar 8f1a9dbfdf
edit system prompt in prompt improvement (#223)
* feat: add column view dialog

* feat: add column view dialog

* increase docker time

* feat: add feedback indicator in the row view

* feat: add prompt engineering flow for notes

* feat: add prompt engineering flow for notes

* feat: support resolve prompts in the prompt editor

* fix: fix build errors

* docs: update playground

* docs: update playground

* docs: update playground

* docs: update playground

* Update reduce folding instruction

* fix: make histogram calculation and rendering less blocking

* tests: make docker CI more robust

* docs: edit readme to be formatted better

* Edit system prompt instruction

* Edit system prompt instruction
2024-12-02 08:47:43 -06:00
Shreya Shankar 67a7f646a8
feat: add column view dialog (#214)
* feat: add column view dialog

* feat: add column view dialog

* increase docker time

* feat: add feedback indicator in the row view

* feat: add prompt engineering flow for notes

* feat: add prompt engineering flow for notes

* feat: support resolve prompts in the prompt editor

* fix: fix build errors

* docs: update playground

* docs: update playground

* docs: update playground

* docs: update playground
2024-11-29 00:48:15 -06:00