Commit Graph

500 Commits

Author SHA1 Message Date
Tianbao Xie 30138c5db1
VLC fix (#224)
* Enhance SetupController with improved logging and error handling during setup and file upload processes. Update instance type to t3.xlarge and AMI ID for AWS configuration. Add download progress logging and exception handling for better debugging.

* Enhance VLC status evaluation by adding multiple paths for file and URL information extraction, improving robustness against varying VLC XML structures. Implement detailed logging for better debugging and error handling in case of mismatches or missing data. Update example JSON for VLC evaluation to use a valid HLS stream URL.

* Improve audio comparison robustness in VLC evaluator by adding error handling for audio file loading and extraction. Implement detailed logging for empty or corrupt files, and normalize DTW distance calculation for more accurate similarity scoring. Remove deprecated audio fingerprint comparison function.

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-29 20:18:44 +08:00
Tianbao Xie 0cc93543a8
Environment is_used flag; OS domain fix (#219)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* More delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

* Enhance DesktopEnv to track environment usage for optimized snapshot management. Introduce is_environment_used flag to determine if a snapshot revert is necessary based on provider type. Update setup and step methods to mark environment usage appropriately. Add new execute_with_verification method in SetupController for command execution with result verification, improving reliability. Change AWS instance type to m5.large for better performance and update AMI ID for compatibility. Update file opening logic in main.py to handle both file paths and application commands more effectively.

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-28 00:45:53 +08:00
MillanK 48ac57697a
VSCode fix (#222) 2025-06-24 17:08:09 +08:00
Tianbao Xie 4e11eafd1d
Robust Evaluation, Blocking File Open, Grader Sensitivity, and LibreOffice Writer Fixes (#217)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* More delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-16 21:37:19 +08:00
yuanmengqi 630f92fd7c fix: correct URL encoding in JSON examples for invoice paths 2025-06-09 08:06:27 +00:00
yuanmengqi 3e541bb393 Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-08 04:01:35 +00:00
yuanmengqi 8853671220 fix: enhance instruction clarity and adjust timing in automation script for LibreOffice Impress example 2025-06-07 21:17:00 +00:00
yuanmengqi 9fa768d24d refactor: update URLs in multiple JSON files to ensure proper encoding of special characters 2025-06-07 17:26:45 +00:00
yuanmengqi f48d80002f Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-07 13:22:53 +00:00
yuanmengqi 8d0ff7c99c refactor: update VLC command configurations to suppress audio and video title display across multiple JSON examples 2025-06-07 09:02:49 +00:00
yuanmengqi a146c1e0b7 edit prompt 2025-06-07 05:21:04 +00:00
yuanmengqi 4ea24ddfd3 add proxy 2025-06-06 09:41:22 +00:00
Timothyxxx fb7bafb885 feat: Add proxy configuration to all 369 evaluation examples - 55 with proxy, 314 without 2025-06-05 18:46:53 +08:00
Timothyxxx 34748567a5 feat: Migrate OSWorld files to HuggingFace cache with comprehensive documentation
- Add detailed README for file cache repository
- Implement migration script with retry logic and browser simulation
- Support automatic file type detection and deduplication
- Ensure reliable hosting for OSWorld evaluation files
2025-05-28 04:29:37 +08:00
Danyang Zhang 7bf99cb823
Update 15c3b339-88f7-4a86-ab16-e71c58dcb01e.json 2025-05-06 16:29:35 +08:00
Danyang Zhang e4097783bb
Update dfac9ee8-9bc4-4cdc-b465-4a4bfcd2f397.json 2025-05-06 16:28:52 +08:00
Thomas Kuntz af993b3a3d fix: Broken profile path in 3 Thunderbird tasks 2025-05-04 14:03:06 +02:00
Xubin Ren 1d10514125
Fix Search Engine Detection Discrepancy in Chrome Evaluation (#172)
* Update bb5e4c0d-f964-439c-97b6-bdb9747de3f4.json

* Update __init__.py

* Update general.py
2025-04-10 17:24:50 +08:00
Parth A. Patel bbfeecb475
fix: af2d657a-e6b3-4c6a-9f67-9e3ed015974c task config has typo (#169)
Typo in the "examine_alignment" option results in false negatives
2025-04-06 02:20:51 +08:00
Timothyxxx d373817edb Modify VLC launch command and fullscreen detection
- Add VLC_VERBOSE=-1 to suppress verbose logging in VLC launch commands across multiple example files
- Update is_vlc_fullscreen function to handle cases where screen size or window size is None
- Improve robustness of VLC-related metrics and example configurations
2025-03-06 22:11:42 +08:00
Timothyxxx 13127de01e Fix id 2025-03-03 18:26:32 +08:00
Timothyxxx 2f0f3f31aa Fix Duplicate ids; Remove unused JSON files across multiple applications 2025-02-10 15:49:54 +08:00
MillanK 983283a86a
patch: minor bug fixes for evaluator and task configurations, documentation update (#121)
* fix: /cursor_position api return format fix

* chore: update README.md to remove deprecated command

* fix: add base score for evaluators and minor bug fixes

* fix: add base score for setup configurations

---------

Co-authored-by: Jiaqi Deng <jiaqideng@Jiaqis-MacBook-Pro.local>
2025-01-18 22:25:18 +08:00
YangJL2003 3148973ce9
Update c1fa57f3-c3db-4596-8f09-020701085416.json 2025-01-14 22:56:32 +08:00
Timothyxxx 63e69cab08 Fix one instruction error in chrome 6766f2b8-8a72-417f-a9e5-56fcaa735837 2024-12-09 12:35:02 +08:00
Jiaqi DENG e0d0041520 chore: modify windows evaluations samples 2024-09-21 23:46:39 +08:00
Timothyxxx 098549d621 Fix one answer 2024-08-15 22:35:57 +08:00
Timothyxxx 794b3ab469 Fix broken links 2024-08-15 01:29:47 +08:00
Timothyxxx 7b38e21b36 Re-org the files in multi_apps subset; fix broken links 2024-08-08 00:17:26 +08:00
tsuky_chen b4bbe4a3b6
Update 0a211154-fda0-48d0-9274-eaac4ce5486d.json 2024-07-05 00:50:14 +08:00
Timothyxxx 25e808cc91 Fix known errors found from feedback (DBUS problems, pulseaudio start, one VLC example with an error, typos) 2024-05-18 04:49:29 +08:00
David Chang bc47886c8a
Merge branch 'main' of github.com:ztjhz/DesktopEnv 2024-05-16 10:59:07 +08:00
David Chang 74e400783a
ver May16th
updated a task config of thunderbird
2024-05-16 10:58:31 +08:00
Timothyxxx 09ffcc8542 Fix errors found in the examples (some broken links caused by Google Drive; dbus conflict) 2024-05-15 03:05:58 +08:00
tsuky_chen 02a6e4779b fix 2024-04-05 02:01:39 +08:00
tsuky_chen 773fec2b3e add windows exp 2024-04-05 01:37:43 +08:00
tsuky_chen 31ed626bdc Merge branch 'main' of https://github.com/xlang-ai/DesktopEnv 2024-03-26 17:10:04 +08:00
tsuky_chen 9bd37f0579 add windows example 2024-03-26 17:05:55 +08:00
Timothyxxx 635b6717b3 Fix a key error in multiapps 2024-03-25 17:55:28 +08:00
Timothyxxx 92760b29e1 Merge remote-tracking branch 'origin/main' 2024-03-21 22:05:40 +08:00
Timothyxxx 3ce7636abd Fix one multi_app example; remove some broken examples; Support downsampling 2024-03-21 22:05:16 +08:00
tsuky_chen ca03baacf5 fix conflict 2024-03-21 16:01:31 +08:00
tsuky_chen 3d2ff5d64e fix when checking 2024-03-21 15:57:05 +08:00
tsuky_chen 169a0a15ad add libreoffice examples for windows 2024-03-21 15:49:54 +08:00
David Chang 402fcf01d0
ver Mar21stv2
fixed error
2024-03-21 15:30:59 +08:00
David Chang dac44b2c4f
ver Mar21st
Windows multi_app tasks
2024-03-21 15:03:21 +08:00
David Chang aa3411f7e4
Merge branch 'zdy' 2024-03-20 22:25:45 +08:00
David Chang 15e01e7ccc
ver Mar20thv2
fixed bugs in server/main.py (_create_pywinauto_node and get_screen_size)
finished migration of a few task configs to Windows
fixed bug in python.py
2024-03-20 22:22:57 +08:00
tsuky_chen d2d4a54a3f
Update c6bf789c-ba3a-4209-971d-b63abf0ab733.json 2024-03-20 20:45:45 +08:00
tsuky_chen 966339dee0
Update 70745df8-f2f5-42bd-8074-fbc10334fcc5.json 2024-03-20 20:38:27 +08:00
tsuky_chen 21e3ce5cba
Update 70745df8-f2f5-42bd-8074-fbc10334fcc5.json 2024-03-20 20:17:18 +08:00
tsuky_chen 2746bcfe24
Update c6bf789c-ba3a-4209-971d-b63abf0ab733.json 2024-03-20 20:15:04 +08:00
David Chang 6149061621
ver Mar20th
fixed a bug in _create_pywinauto_node
2024-03-20 14:25:09 +08:00
David Chang 25dae64fa6
ver Mar19th
partial windows task configs
2024-03-19 22:31:11 +08:00
BlankCheng f5da5e940b Merge main 2024-03-18 22:21:01 +08:00
BlankCheng 4671455b56 Fix eval func 2024-03-18 22:16:04 +08:00
rhythmcao 1c9c5fd2ad fix missing-file problem in multi_apps/51f5801c-18b3-4f25-b0c3-02f85507a078.json: who deleted it on Google Drive??? 2024-03-18 20:51:53 +08:00
Timothyxxx eeae1442cd Add execute timeout to server; Fix error examples 2024-03-18 20:42:57 +08:00
David Chang 4067572af7
Merge branch 'main' of github.com:ztjhz/DesktopEnv 2024-03-17 23:04:12 +08:00
David Chang 1c732ea5d2
Merge branch 'zdy' 2024-03-17 23:03:35 +08:00
David Chang 9bafe09372
ver Mar17th
fixed an error in task config
2024-03-17 23:01:50 +08:00
rhythmcao 7feeab8f6b add missing file 2024-03-17 01:42:43 +08:00
tsuky_chen 65823fcfab Merge branch 'main' of https://github.com/xlang-ai/DesktopEnv 2024-03-16 19:23:55 +08:00
tsuky_chen ddb7131891 fix multi apps instruction 2024-03-16 19:22:20 +08:00
Jason Lee 1789a28657 Merge branch 'main' of github.com:xlang-ai/DesktopEnv 2024-03-15 16:52:33 +08:00
Jason Lee 815c7ab67c filter unfinished examples and add timer to ensure upper limit of each example 2024-03-15 16:52:17 +08:00
David Chang 2b9772174e
ver Mar15th
fixed bugs about infeasible task evaluation
2024-03-15 12:25:41 +08:00
Timothyxxx f9ccaa5773 Move sheetcopilot examples into libreoffice calc folder 2024-03-14 12:57:15 +08:00
Jason Lee cee3b93009 update all ids in experiment_screenshot.py 2024-03-13 21:06:55 +08:00
Jason Lee 670e20a248 update examples 2024-03-13 20:31:52 +08:00
Timothyxxx a7663c534a Update 2024-03-13 16:51:38 +08:00
BlankCheng 4b15595146 Update fix 2024-03-12 00:17:46 +08:00
tsuky_chen 5a4ba28735 fix multi apps 2024-03-11 14:44:11 +08:00
Timothyxxx 3f21519b78 Merge remote-tracking branch 'origin/main' 2024-03-10 23:52:39 +08:00
Timothyxxx b3d27f6387 Fix bugs in multiple examples 2024-03-10 23:52:29 +08:00
tsuky_chen 88b90dd328 Merge branch 'main' of https://github.com/xlang-ai/DesktopEnv 2024-03-10 23:50:23 +08:00
tsuky_chen 59348e3134 fix multi apps 2024-03-10 23:49:28 +08:00
David Chang cb4be00f80
Merge branch 'zdy' 2024-03-10 18:04:43 +08:00
David Chang f08fa4912c
ver Mar10th
changed AT element filtering
2024-03-10 18:03:02 +08:00
Timothyxxx 3f19cc5117 Fix bugs in chrome example 2024-03-10 17:06:39 +08:00
Timothyxxx 4645682b9e Merge remote-tracking branch 'origin/main' 2024-03-10 15:16:40 +08:00
Timothyxxx 8fd27f155e Fix bugs in multiple apps example 0e53 2024-03-10 15:16:28 +08:00
Jason Lee 812be97a41 Merge branch 'main' of github.com:xlang-ai/DesktopEnv 2024-03-10 14:50:17 +08:00
Jason Lee 775cef744f xiaochuan corrected his bugs in multiapp examples; you can try it again now 2024-03-10 14:48:56 +08:00
Timothyxxx a12dfacbd7 Add more time for waiting impress in 47f7c0ce-a5fb-4100-a5e6-65cd0e7429e5 2024-03-10 12:10:05 +08:00
Timothyxxx e481afcf5c Fix multiple examples 2024-03-09 23:01:22 +08:00
Timothyxxx b0854e519c Minor fix on instruction of 81c425f5-78f3-4771-afd6-3d2973825947 2024-03-09 21:20:44 +08:00
Timothyxxx 447c886b0a Fix multiple apps 5990457f-2adb-467b-a4af-5c857c92d762 2024-03-09 20:54:52 +08:00
tsuky_chen aae848196b merge 2024-03-09 18:53:27 +08:00
tsuky_chen f4ec36bdfb fix multi apps 2024-03-09 18:48:17 +08:00
Jason Lee 2291af394f update google drive file link in json 2024-03-09 18:06:48 +08:00
Tianbao Xie f01153cadd
Merge branch 'main' into xiaochuanli/addChromeExtensions 2024-03-08 20:45:49 +08:00
Tianbao Xie 4b841c199a
Merge pull request #12 from xlang-ai/zhoujun/multi-app
Update multi-app examples
2024-03-08 20:41:14 +08:00
Timothyxxx 2b119d59b4 Merge remote-tracking branch 'origin/main' 2024-03-08 20:39:20 +08:00
Timothyxxx 6f0fe4f482 Fix a bug in multiple apps example 2024-03-08 20:39:05 +08:00
tsuky_chen 3761de4a05 Merge branch 'main' of https://github.com/xlang-ai/DesktopEnv 2024-03-08 20:37:40 +08:00
tsuky_chen 4070b41fbd fix multi apps 2024-03-08 20:36:34 +08:00
rhythmcao 365c7798f1 Merge branch 'main' of https://github.com/xlang-ai/DesktopEnv 2024-03-08 19:26:04 +08:00
rhythmcao 8df2233730 add multi-turn examples (in total, add 12 examples by ruisheng.cao 2024-03-08) 2024-03-08 19:25:51 +08:00
Jason Lee 62fd8feebb xiaochuan's multiapp examples 2024-03-08 19:24:15 +08:00