41 Commits

Author SHA1 Message Date
Shreya Shankar 0f6d9514e3
Add type checks to validation functions (#427)
* feat: Add output type validation to map and reduce operations

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refine API error message for validation failures

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Test map type validation for integer answers

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* fix test for validating schemas

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-09-12 16:18:21 -07:00
Shreya Shankar 4cfd371744
feat: add topk implementation (#410) 2025-08-13 13:27:54 -07:00
Shreya Shankar 79542259cb
Refactor sample operation for multiple stratify keys (#408)
* Enhance sample operation with multi-key stratification and per-group sampling

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor SampleOperation with improved validation and sampling methods

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor sample operation with improved stratification and simplified config

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-08-13 10:16:55 -07:00
Shreya Shankar f6ead4ea6b
Switch from poetry to uv (#402)
* Switch from Poetry to uv for dependency management and packaging

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Update README with uv installation and dependency management instructions

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Optimize Docker build: improve dependency installation and caching

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* verify that uv works on my local installation

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-08-09 13:20:32 -07:00
Sid Jha c89e269d67
Remove old typing imports (#389)
* Remove old typing imports

* Add future annotation

* Add back in import

* Use Iterator instead of Iterable

* fix: small edits to fix broken tests

---------

Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
2025-07-20 12:01:17 -07:00
Shreya Shankar 4cee4d7817
Clean up and reorganize pytest tests (#377)
* Replace api_wrapper with runner in test fixtures and configurations

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

* Refactor test fixtures and reorganize configuration in test files

Co-authored-by: ss.shankar505 <ss.shankar505@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-07-03 22:15:33 -07:00
Shreya Shankar b8d2beb602
feat: adding conditional gleaning (#375)
* fix: improve caching and don't raise error for bad gather configs

* fix: improve caching and don't raise error for bad gather configs

* feat: adding conditional gleaning
2025-07-01 22:40:38 -07:00
Shreya Shankar 631ea0c34f
feat: Add calibration support to map operations for improved consistency (#365)
* chore: run pre-commit

* feat: add calibration to map ops

* feat: add calibration to map ops

* fix: comment out flaky test
2025-06-14 23:04:13 -07:00
shabie 4793c89e92
feat: add flag to stream map operation outputs to disk (#323)
* feat: add flag to stream map operation outputs to disk

* flush partial results default False

* comment out print statement in test basic map

* rewrite test to not use batched config

---------

Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
2025-03-19 19:17:56 -06:00
Shreya Shankar 05c4357c59
docs: add verbose param (#283) 2025-01-22 12:23:46 +01:00
Shreya Shankar 57d1cd78a1
refactor: DSLRunner now uses a pull-based execution model (#273)
* partial commit

* refactor: dslrunner is now a pull based execution model

* refactor: dslrunner is now a pull based execution model

* refactor: optimizer is now using the new pull based execution model

* refactor: optimizer is now using the new pull based execution model

* refactor: optimizer is now using the new pull based execution model

* remove builder file

* remove builder file and make tests pass

* fix tests
2025-01-10 12:45:04 -08:00
Rohit Rawat 0e077aa740
added enum support (#254)
* added enum support

* tests: add test for enum type output

* docs: update docs to support enum type schemas

---------

Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
2024-12-26 17:50:23 -06:00
Shreya Shankar aa5c2a5c93 test: add new optimizer test 2024-11-13 22:11:37 -08:00
Shreya Shankar 7242da81f4 fix: allow user to pass in litellm completion kwargs 2024-11-11 14:04:40 -08:00
Shreya Shankar 2b9aea83dc fix: allow reduce_key types to be lists 2024-11-11 13:14:19 -08:00
Shreya Shankar 33c24365b6 move tests into basic dir 2024-10-31 17:39:20 -07:00
Shreya Shankar 604ac257fe feat: adding batching for map and filter calls 2024-10-29 19:14:30 -07:00
Shreya Shankar 50936a8901 fix test: comment out timeout test 2024-10-27 19:25:34 -07:00
Shreya Shankar f6e3cd0473 mark as flaky test 2024-10-22 12:47:31 -07:00
Shreya Shankar a6f72510cd refactor: remove optimizer from configwrapper 2024-10-19 13:25:00 -07:00
Shreya Shankar 7e0deb354e fix: tests check for cost 2024-10-13 22:02:43 -04:00
Shreya Shankar eecbd41e44 refactor: address redhog feedback 2024-10-13 17:43:48 -04:00
Shreya Shankar 658b59e689 refactor: address redhog feedback 2024-10-13 17:37:32 -04:00
Shreya Shankar 64b5345043 fix: change gleaning prompt to validation_prompt 2024-10-13 17:17:45 -04:00
Shreya Shankar 307456b98c feat: add reduce operation lineage 2024-10-13 17:17:45 -04:00
Shreya Shankar 38a073ff75 refactor: combine sampling and outlier operators 2024-10-12 17:51:27 -04:00
Shreya Shankar 042bdf2e05 update test 2024-10-12 12:35:28 -07:00
Shreya Shankar c158ae11e8 refactor: move validation and gleaning into call llm 2024-10-11 17:17:32 -07:00
Shreya Shankar 70604cb08b
Merge staging to main (after adding cluster operator) (#88)
* Parsers can now return any number of fields, and can access the whole item

* nit: change gpt-4o to gpt-4o-mini in tests

* feat: add verbose parameter for gleaning

* feat: add verbose parameter for gleaning

* fix: tokenizers should be wrapped in try catch

* fix: resort to eval if ast eval does not work

* docs: update docs to reflect new custom parsing API

Co-authored-by: redhog <redhog@users.noreply.github.com>

* Clustering (#84)

* nit: change gpt-4o to gpt-4o-mini in tests

* feat: add verbose parameter for gleaning

* feat: add verbose parameter for gleaning

* fix: tokenizers should be wrapped in try catch

* fix: resort to eval if ast eval does not work

* Merge staging to main (after parsers refactor) (#82)

* Parsers can now return any number of fields, and can access the whole item

* nit: change gpt-4o to gpt-4o-mini in tests

* feat: add verbose parameter for gleaning

* feat: add verbose parameter for gleaning

* fix: tokenizers should be wrapped in try catch

* fix: resort to eval if ast eval does not work

* docs: update docs to reflect new custom parsing API

---------

Co-authored-by: Egil <egil.moller@freecode.no>

* Added new clustering operation

* Reverse path

* Added docs for cluster operator

* Bugfix for docs formatting

* docs: add sample parameter (#87)

* Added new clustering operation

* Reverse path

* Added docs for cluster operator

* Bugfix for docs formatting

* add tests and link to doc

---------

Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>
Co-authored-by: Egil <egil.moller@freecode.no>

* fix: fixing params in test

---------

Co-authored-by: Egil <egil.moller@freecode.no>
Co-authored-by: redhog <redhog@users.noreply.github.com>
Co-authored-by: Egil Möller <redhog@redhog.org>
2024-10-08 23:51:02 -07:00
Shreya Shankar 2e6997d646
Merge staging to main (after parsers refactor) (#82)
* Parsers can now return any number of fields, and can access the whole item

* nit: change gpt-4o to gpt-4o-mini in tests

* feat: add verbose parameter for gleaning

* feat: add verbose parameter for gleaning

* fix: tokenizers should be wrapped in try catch

* fix: resort to eval if ast eval does not work

* docs: update docs to reflect new custom parsing API

---------

Co-authored-by: Egil <egil.moller@freecode.no>
2024-10-07 21:33:32 -07:00
Shreya Shankar 69b491eebb refactor: switch changes back to the runner object 2024-10-07 09:30:24 -07:00
Shreya Shankar 27de5f2615 refactor: improving tests and consistency in the rate limits refactor 2024-10-06 22:55:06 -07:00
Shreya Shankar 74fad079af fix: enable gleaning llm calls to work 2024-10-05 09:20:59 -07:00
Shreya Shankar 5883fb354f fix: enable gleaning llm calls to work 2024-10-05 09:00:25 -07:00
Shreya Shankar d12bf3a004 docs: improving documentation for pipeline api 2024-10-04 09:07:38 -07:00
Shreya Shankar 4454949429 feat: support for gemini 2024-10-03 11:44:02 -07:00
Shreya Shankar 9846f01324 docs: update documentation for custom parsers 2024-09-30 22:14:39 -07:00
Shreya Shankar 674c64bf0f test: reduce the character minimum for parsing test 2024-09-30 21:57:03 -07:00
Shreya Shankar c3416753f0 feat: add custom dataset parsers 2024-09-30 18:32:04 -07:00
Shreya Shankar e8694356b2 chore: refactor schemas.py to also include ops from api.py 2024-09-30 15:57:47 -07:00
Shreya Shankar 3e98bcfe9d rebase with main 2024-09-30 15:14:49 -07:00