Shreya Shankar
54faaed230
fix: add code ops and extract to python api
2025-11-24 14:18:25 -06:00
Shreya Shankar
a184a3c1e9
fix: add code ops and extract to python api ( #462 )
...
* fix: add code ops and extract to python api
* fix: add code ops and extract to python api
2025-11-24 14:10:07 -06:00
Shreya Shankar
c3ef5684d5
Add fallback models documentation to user guide ( #454 )
...
* Add fallback models documentation and configuration
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Refactor: Remove operation-specific models from fallback docs
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Docs: Add content warning errors to fallback model triggers
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Remove outdated best practices from fallback models documentation
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-11-14 13:32:31 -08:00
Shreya Shankar
ea0013d7fd
Implement LiteLLM fallback models for reliability ( #453 )
...
* feat: Add LiteLLM Router for fallback models
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Add model to litellm_params for LiteLLM Router
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Refactor: Prioritize operation model in LiteLLM Router fallbacks
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Refactor: Cache LiteLLM Routers in APIWrapper
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Refactor: Separate completion and embedding routers
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Add fallback models example configuration
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* feat: Add fallback models to LiteLLM Router
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Refactor Router fallbacks to use list of dicts
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Remove example fallback config file
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Feat: Add version constraint for paddlepaddle
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-11-14 13:17:47 -08:00
Shreya Shankar
50e45cf7db
Refactor: Use num_tokens instead of token_count ( #430 )
...
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-09-15 15:32:33 -07:00
Shreya Shankar
d1dab86466
Fix cosine similarity blocking in resolve operation ( #428 )
...
* fix: resolve blocking
* fix: resolve blocking
2025-09-14 20:24:58 -07:00
Shreya Shankar
4cfd371744
feat: add topk implementation ( #410 )
2025-08-13 13:27:54 -07:00
Shreya Shankar
6dafd46fba
Update Pandas API to Use New Output Parameter Format ( #409 )
...
* chore: update pandas api and docs
* chore: update pd accessors
2025-08-13 10:41:49 -07:00
Shreya Shankar
79542259cb
Refactor sample operation for multiple stratify keys ( #408 )
...
* Enhance sample operation with multi-key stratification and per-group sampling
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Refactor SampleOperation with improved validation and sampling methods
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Refactor sample operation with improved stratification and simplified config
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-08-13 10:16:55 -07:00
Shreya Shankar
f6ead4ea6b
Switch from poetry to uv ( #402 )
...
* Switch from Poetry to uv for dependency management and packaging
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Update README with uv installation and dependency management instructions
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Optimize Docker build: improve dependency installation and caching
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* verify that uv works on my local installation
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-08-09 13:20:32 -07:00
Shreya Shankar
84c4807009
chore: switching to cloudbank for blob storage ( #400 )
...
* add seo
* add seo
* switching to cloudbank for blob storage
2025-07-31 14:28:00 -07:00
Shreya Shankar
0f0253be16
Add structured output mode support to pandas API ( #394 )
...
* Add structured output mode support to pandas API
This commit addresses issue #393 by adding the ability to specify structured
output mode in the pandas semantic accessor.
Key changes:
- Enhanced map() method to accept 'output' parameter with schema and mode
- Added support for 'structured_output' mode alongside existing 'tools' mode
- Maintained backward compatibility with existing 'output_schema' parameter
- Added comprehensive parameter validation and error handling
New API format:
```python
df.semantic.map(
    prompt="Extract data: {{input.text}}",
    output={
        "schema": {"name": "str", "age": "int"},
        "mode": "structured_output"  # Optional, defaults to "tools"
    }
)
```
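Illustrative sketch (not the code from this commit): one way the new 'output' parameter and the legacy 'output_schema' parameter could be reconciled and validated. The helper name `normalize_output`, the error messages, and the choice to reject both parameters at once are assumptions made for illustration; only the two mode names and the "tools" default come from the description above.
```python
from typing import Any, Dict, Optional

# Mode names taken from the commit description; everything else is hypothetical.
VALID_MODES = {"tools", "structured_output"}


def normalize_output(
    output: Optional[Dict[str, Any]] = None,
    output_schema: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: return a dict with 'schema' and 'mode' keys."""
    # Assumption: passing both forms is treated as an error.
    if output is not None and output_schema is not None:
        raise ValueError("Pass either 'output' or 'output_schema', not both")

    # Legacy form: a bare schema, which implies the default "tools" mode.
    if output is None:
        if output_schema is None:
            raise ValueError("One of 'output' or 'output_schema' is required")
        return {"schema": output_schema, "mode": "tools"}

    # New form: 'schema' is required, 'mode' is optional.
    if "schema" not in output:
        raise ValueError("'output' must contain a 'schema' key")
    mode = output.get("mode", "tools")
    if mode not in VALID_MODES:
        raise ValueError(f"Invalid output mode: {mode!r}")
    return {"schema": output["schema"], "mode": mode}
```
With this sketch, the legacy call form and the new form without a "mode" key both resolve to "tools", and an unknown mode raises ValueError.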
Tests added:
- test_semantic_map_structured_output: Tests new structured output functionality
- test_semantic_map_invalid_output_mode: Tests output mode validation
- test_semantic_map_structured_output_vs_tools: Compares both output modes
- test_semantic_map_backward_compatibility: Ensures old API still works
- test_semantic_map_parameter_validation: Comprehensive parameter validation
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Update pandas documentation for structured output mode
- Updated pandas/index.md to document new output parameter format
- Added section explaining output modes (tools vs structured_output)
- Updated examples throughout pandas/operations.md and pandas/examples.md
- Maintained backward compatibility examples
- Added guidance on when to use structured output mode
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
2025-07-22 08:54:44 -07:00
Sid Jha
19a8286978
Improve syntax_check by leveraging pydantic validation ( #392 )
...
* Work on other operators
* Fix
* Add ValidationInfo
* Bug fix
* docs: fix errors in equijoin documentation
---------
Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
2025-07-20 18:54:59 -07:00
Shreya Shankar
9028efc8d4
Update cluster documentation to use inputs iteration in summary prompt ( #386 )
...
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-07-14 18:12:22 -07:00
Shreya Shankar
d995c534b5
feat: add global bypass cache ( #383 )
2025-07-08 16:53:40 -07:00
Shreya Shankar
9b836ee228
Add operators to pandas API ( #379 )
2025-07-04 11:59:24 -07:00
Shreya Shankar
1e4709a112
Refactor api.py for structured output ( #378 )
...
* Refactor API wrapper with modular design for LLM calls and output handling
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Refactor APIWrapper: Simplify LLM call logic and improve modularity
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Refactor output mode handling in APIWrapper with flexible configuration
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Add comprehensive tests for DocETL output modes with synthetic data
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Refactor output modes tests with improved pytest structure and DSLRunner
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Fix runtime errors
* Add nested JSON parsing for string values in API response
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Handle nested JSON parsing by extracting matching key values
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Simplify JSON parsing logic in API utility functions
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Add to tests
* Add documentation for DocETL output modes and configuration options
Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>
* Add docs
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-07-04 10:58:15 -07:00
Shreya Shankar
9a72a6b729
chore: bump up fastapi and python multipart ( #376 )
...
* merge
* chore: bump up fastapi and python multipart
* chore: bump up fastapi and python multipart
2025-07-02 07:37:06 -07:00
Shreya Shankar
b8d2beb602
feat: adding conditional gleaning ( #375 )
...
* fix: improve caching and don't raise error for bad gather configs
* fix: improve caching and don't raise error for bad gather configs
* feat: adding conditional gleaning
2025-07-01 22:40:38 -07:00
Shreya Shankar
7071ade539
fix: improve caching and don't raise error for bad gather configs ( #373 )
...
* merge
* fix: improve caching and don't raise error for bad gather configs
* fix: improve caching and don't raise error for bad gather configs
* fix: improve caching and don't raise error for bad gather configs
2025-06-30 23:19:17 -07:00
Shreya Shankar
ea60f38afd
docs: improve gleaning description ( #371 )
...
* docs: improve gleaning description
* docs: improve gleaning description
* docs: improve gleaning description
* docs: fix spacing in gleaning docs
2025-06-26 21:25:26 -07:00
Shreya Shankar
d156351c69
docs: improve gleaning description ( #370 )
...
* docs: improve gleaning description
* docs: improve gleaning description
* docs: improve gleaning description
2025-06-26 21:19:51 -07:00
Shreya Shankar
631ea0c34f
feat: Add calibration support to map operations for improved consistency ( #365 )
...
* chore: run pre-commit
* feat: add calibration to map ops
* feat: add calibration to map ops
* fix: comment out flaky test
2025-06-14 23:04:13 -07:00
Shreya Shankar
8a7b4e1566
feat: add extract operator ( #361 )
...
* feat: add extract operator
* feat: add extract operator
* feat: add extract operator
* feat: add extract operator
2025-05-13 17:43:46 -07:00
Shreya Shankar
bb5bdff9d1
feat: adding `api_base` to yaml ( #359 )
...
* feat: adding api base to yaml
* feat: adding api base to yaml
* docs: add api base docs
2025-05-11 18:40:29 -07:00
Shreya Shankar
1a41df6cb5
fix: add system prompt and other config vars to python api ( #356 )
2025-05-03 18:05:22 -07:00
Shreya Shankar
3781edebfa
docs: update for rank op ( #344 )
...
* update documentation for rank op
* update documentation for rank op
2025-04-21 18:24:37 -07:00
Shreya Shankar
9ad27afa12
update documentation for rank op ( #343 )
2025-04-21 18:22:21 -07:00
Shreya Shankar
0b2d5b0324
feat: add rank operation card in docwrangler ( #341 )
...
* feat: add rank operation card in docwrangler
* feat: add rank operation card in docwrangler
* feat: add rank operation card in docwrangler
2025-04-21 16:02:19 -07:00
Shreya Shankar
c960d6fcc0
feat: add rank operation ( #340 )
...
* feat: add order by operator
* adding a bunch of order functions
* refactor: move around tests for rank operator
* add anthropic hh test for rank
* refactor: move around tests for rank operator
* refactor: move around tests for rank operator
2025-04-21 13:48:41 -07:00
Shreya Shankar
d20c3982e4
feat: add TPM rate limits & pipeline settings to DocWrangler ( #338 )
...
* add TPM rate limit
* feat: add rate limiting in docwrangler; tpm
* feat: add rate limiting in docwrangler; tpm
* feat: add rate limiting in docwrangler; tpm
2025-04-19 22:38:48 -07:00
Shreya Shankar
1c4e181201
docs: add python api quickstart ( #337 )
...
* docs: add python api quickstart
* docs: add python api quickstart
2025-04-19 09:49:50 -07:00
Shreya Shankar
e7fd306d99
Add python docs ( #334 )
2025-04-17 12:23:24 -07:00
shabie
4793c89e92
feat: add flag to stream map operation outputs to disk ( #323 )
...
* feat: add flag to stream map operation outputs to disk
* flush partial results default False
* comment out print statement in test basic map
* rewrite test to not use batched config
---------
Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
2025-03-19 19:17:56 -06:00
Shreya Shankar
fa5460b4c0
feat: add n parameter to output ( #320 )
...
* feat: add n parameter to output
* feat: add n parameter to output
* feat: add n parameter to output
* feat: add n parameter to output
2025-03-13 23:25:03 -07:00
Shreya Shankar
080e7f75ec
feat: refactoring map optimizer ( #311 )
2025-02-18 12:54:45 -08:00
Shreya Shankar
f4abe55191
feat: support pdf upload and add tutorial ( #309 )
2025-02-09 21:19:17 -08:00
Shreya Shankar
2a259a0d93
feat: add pandas df accessor ( #287 )
...
* feat: add pandas df accessor
* feat: add pandas df accessor
* feat: add pandas df accessor
2025-01-24 16:54:49 -08:00
Shreya Shankar
05c4357c59
docs: add verbose param ( #283 )
2025-01-22 12:23:46 +01:00
Shreya Shankar
eb995fdc77
feat: add llmstxt ( #267 )
...
* feat: adding llms.txt
* feat: adding llms.txt
* feat: adding llms.txt
* feat: adding llms.txt
2025-01-07 21:43:17 -08:00
Shreya Shankar
8b3e1ce640
fix: bypass vercel serverless functions when uploading datasets ( #266 )
2025-01-06 20:52:37 -08:00
Shreya Shankar
f38fbb8960
docs: update playground docs ( #259 )
...
* rebrand to docwrangler
* refactor: rebranding to docwrangler
* refactor: rebranding to docwrangler
* refactor: edit vercel.json
* fix: map optimizer should work
* docs: update playground docs
2025-01-02 00:33:05 -08:00
Shreya Shankar
662d6a2c5a
feat: add azure openai for the FE assistants ( #256 )
...
* chore: ui nits
* chore: ui nits
* feat: add tutorial for the supreme court hearings
* fix: output csv writing should work even if not all documents have all the keys
* feat: add azure openai for prompt improvement and chat, with logging
* Add system prompts to documentation
* fix: add setSystemPrompt to restore pipeline context
* docs: add vercel json
2025-01-01 11:41:43 -08:00
Shreya Shankar
af70998e6f
chore: add param to skip LLM calls when they fail ( #255 )
...
* feat: add skip_on_error param to map operations
* chore: add more helpful logging
* chore: prettier printing
* better logging when skipping on error
* better logging when skipping on error
2024-12-29 23:53:37 -06:00
Rohit Rawat
0e077aa740
added enum support ( #254 )
...
* added enum support
* tests: add test for enum type output
* docs: update docs to support enum type schemas
---------
Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
2024-12-26 17:50:23 -06:00
Shreya Shankar
f1a12d2700
WIP: separate frontend from backend so we can host the frontend ( #242 )
...
* fix: prompt engineer agent for map decomposition
* feat: host application
* chore: update azure
* chore: update azure
* chore: update azure
* fix: adding default ops in init
* chore: refactoring fastapi models
* feat: allowing frontend to use a separate server for backend
* feat: allowing frontend to use a separate server for backend
* feat: allowing frontend to use a separate server for backend
* feat: allowing frontend to use a separate server for backend
* feat: allowing frontend to use a separate server for backend
* feat: allowing frontend to use a separate server for backend
2024-12-22 17:42:40 -06:00
Shreya Shankar
bd40799e01
refactor recursive optimization for map operations ( #225 )
...
* feat: add column view dialog
* feat: add column view dialog
* increase docker time
* feat: add feedback indicator in the row view
* feat: add prompt engineering flow for notes
* feat: add prompt engineering flow for notes
* feat: support resolve prompts in the prompt editor
* fix: fix build errors
* docs: update playground
* docs: update playground
* docs: update playground
* docs: update playground
* Update reduce folding instruction
* fix: make histogram calculation and rendering less blocking
* tests: make docker CI more robust
* docs: edit readme to be formatted better
* Edit system prompt instruction
* Edit system prompt instruction
* chore: refactor recursive optimization for map ops
* feat: small performance optimization to prompt improvement
2024-12-03 17:04:25 -08:00
Sushruth Booma
813946141f
Fix cache naming ( #220 )
2024-12-02 11:43:00 -06:00
Shreya Shankar
8f1a9dbfdf
edit system prompt in prompt improvement ( #223 )
...
* feat: add column view dialog
* feat: add column view dialog
* increase docker time
* feat: add feedback indicator in the row view
* feat: add prompt engineering flow for notes
* feat: add prompt engineering flow for notes
* feat: support resolve prompts in the prompt editor
* fix: fix build errors
* docs: update playground
* docs: update playground
* docs: update playground
* docs: update playground
* Update reduce folding instruction
* fix: make histogram calculation and rendering less blocking
* tests: make docker CI more robust
* docs: edit readme to be formatted better
* Edit system prompt instruction
* Edit system prompt instruction
2024-12-02 08:47:43 -06:00
Shreya Shankar
67a7f646a8
feat: add column view dialog ( #214 )
...
* feat: add column view dialog
* feat: add column view dialog
* increase docker time
* feat: add feedback indicator in the row view
* feat: add prompt engineering flow for notes
* feat: add prompt engineering flow for notes
* feat: support resolve prompts in the prompt editor
* fix: fix build errors
* docs: update playground
* docs: update playground
* docs: update playground
* docs: update playground
2024-11-29 00:48:15 -06:00