Commit Graph

58 Commits

Author SHA1 Message Date
Yuan Mengqi 38a30734a6
Improve code logic for password & resolution (#252)
* fix chrome

* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example

* fix tasks

* fix chrome finished

* fix

* clean chrome_fix code

* clean chrome_fix code

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix multiapps

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix some multi_apps tasks

* fix some multi_apps tasks

* fix password&resolution

* fix password&resolution

* Improve code logic for password & resolution

* edit

* Merge branch 'main' into fix_chrome

* fix chrome tasks

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-13 21:04:07 +08:00
Yuan Mengqi 27319ce1e3
fix password&resolution (#251)
* fix chrome

* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example

* fix tasks

* fix chrome finished

* fix

* clean chrome_fix code

* clean chrome_fix code

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix multiapps

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix some multi_apps tasks

* fix some multi_apps tasks

* fix password&resolution

* fix password&resolution

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-13 00:25:37 +08:00
Zilong Zhou 595a704aff
fix: fix proxy setup (#227)
* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example
2025-07-02 01:36:32 +08:00
Tianbao Xie 30138c5db1
VLC fix (#224)
* Enhance SetupController with improved logging and error handling during setup and file upload processes. Update instance type to t3.xlarge and AMI ID for AWS configuration. Add download progress logging and exception handling for better debugging.

* Enhance VLC status evaluation by adding multiple paths for file and URL information extraction, improving robustness against varying VLC XML structures. Implement detailed logging for better debugging and error handling in case of mismatches or missing data. Update example JSON for VLC evaluation to use a valid HLS stream URL.

* Improve audio comparison robustness in VLC evaluator by adding error handling for audio file loading and extraction. Implement detailed logging for empty or corrupt files, and normalize DTW distance calculation for more accurate similarity scoring. Remove deprecated audio fingerprint comparison function.

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-29 20:18:44 +08:00
Tianbao Xie 0cc93543a8
Environment is_used flag; OS domain fix (#219)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* More delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

* Enhance DesktopEnv to track environment usage for optimized snapshot management. Introduce is_environment_used flag to determine if a snapshot revert is necessary based on provider type. Update setup and step methods to mark environment usage appropriately. Add new execute_with_verification method in SetupController for command execution with result verification, improving reliability. Change AWS instance type to m5.large for better performance and update AMI ID for compatibility. Update file opening logic in main.py to handle both file paths and application commands more effectively.

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-28 00:45:53 +08:00
Tianbao Xie 4e11eafd1d
Robust Evaluation, Blocking File Open, Grader Sensitivity, and LibreOffice Writer Fixes (#217)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* More delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-16 21:37:19 +08:00
yuanmengqi 2bae228803 merge upstream 2025-06-10 13:23:03 +00:00
yuanmengqi 7315aec6e6 clean code 2025-06-10 04:06:54 +00:00
adlsdztony bfae51d74d fix: enhance setup method with retry logic and return status 2025-06-09 16:07:13 +00:00
adlsdztony 493abdeeab feat&refactor: add proxy setup functionality and update .gitignore for proxy config file 2025-06-07 11:24:49 +00:00
adlsdztony 71e9a1ead8 fix&refactor: improve error handling in download process and enhance start_emulator method signature 2025-06-06 09:08:14 +00:00
adlsdztony 0ca0085b18 fix: improve connection logging in SetupController 2025-06-05 11:04:33 +08:00
adlsdztony d8ae209162 fix&refactor: improve connection retry logic and remove unnecessary wait time for AWS instance readiness 2025-05-28 13:05:32 +08:00
adlsdztony 431a762421 feat&fix: add logging for setup function calls and include snapshot name in AWS provider configuration 2025-05-26 20:37:20 +08:00
Tianbao Xie 20442244fa
[Feature] Initialize and Implement Aguvis Evaluation on OSWorld (#98)
* Initialize Aguvis eval on OSWorld

* Debug

* Debug

* v1, internal version

* Add experiments script

* Fix minor bugs

* Update new endpoint

* Update ip

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Fix model name

* Fix docker close issues; update prompting

* Fix missed

* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'

* Fix server and chromium ports in setup

* Revert and add missed dependency

* Add VLC port for docker

* Update

* Clean

---------

Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
2024-11-11 12:36:16 +08:00
Pierre Carrier b35dc40ff4
SetupController: no server_port for chrome (#96) 2024-11-07 00:33:03 +08:00
HappySix 6419d707bc
Support Docker VM manager and provider (#75)
* Add docker provider framework

* Update VM download link

* Add stop container

* Update docker manager & provider

* Update

* Update

* Update provider
2024-09-28 21:10:40 +08:00
Timothyxxx df231889c9 Fix minor bug 2024-08-04 11:35:44 +08:00
Jason Lee fcdaf7ce0b
Update setup.py for update_browse_history function 2024-07-04 09:37:13 -05:00
Timothyxxx 97b567a287 Update README and ROADMAP; Fix typos; optimize the code for llm calling in agent.py 2024-04-26 13:32:41 +08:00
Timothyxxx 9c75df5dce Clean code; Refactor environment to pass screenshot content instead of path 2024-04-13 23:34:01 +08:00
rhythmcao da0dafc32c add multi-apps 5 examples by ruisheng 2024-03-06 2024-03-06 21:20:26 +08:00
David Chang c39926fc57
Merge branch 'main' into zdy 2024-02-15 22:27:10 +08:00
Timothyxxx fdb5655c89 Update chrome examples 2024-02-08 13:49:29 +08:00
David Chang c46fcbfcbe
ver Feb2ndv3
working on human eval for multi_apps
2024-02-02 09:30:10 +08:00
David Chang 5ee9621e0d
ver Feb2nd
human evaluation as non-expert on chrome tasks
2024-02-02 05:13:12 +08:00
Timothyxxx d65b6994d3 Fix minor bugs of multiple apps examples 2024-01-31 19:40:41 +08:00
tsuky_chen 932b73c67d load libreoffice writer eval -batch 2 2024-01-26 02:15:42 +08:00
tsuky_chen 3e7cfa8699 load libreoffice writer eval -batch 2 2024-01-26 02:07:26 +08:00
rhythmcao 5ac80dc309 update examples 2024-01-26 00:53:35 +08:00
rhythmcao 5a5309c0fd add multi-app example, fix googledrive functions 2024-01-25 20:30:54 +08:00
Timothyxxx b9ae4174b1 Fix OS examples annotated by Yitao 2024-01-25 19:57:32 +08:00
rhythmcao f194fb8d75 add multi_apps; update chrome utilities 2024-01-25 13:53:19 +08:00
David Chang ffc4c32bac
ver Jan17th
updated the existing task configs
2024-01-17 17:27:08 +08:00
David Chang fc289a3427
Merge branch 'main' into zdy 2024-01-15 12:12:05 +08:00
David Chang 59fdd9f1a2
ver Jan14th
setup method for Thunderbird composing tasks
2024-01-14 23:16:54 +08:00
Timothyxxx d52b692ee5 Finish loading the vscode examples v1; Improve on the infra: Add accessibility tree into the observation; Add activate window function, etc 2024-01-14 18:30:49 +08:00
Timothyxxx 2228f346a9 Fix minor bugs caused by merging in setupcontroller; Initialize vscode example loading 2024-01-14 00:51:26 +08:00
Timothyxxx 186df65683 Merge remote-tracking branch 'origin/main'
# Conflicts:
#	desktop_env/controllers/setup.py
#	desktop_env/evaluators/metrics/utils.py
2024-01-12 17:30:15 +08:00
Timothyxxx 5a93a32958 Update on Chrome examples; Refactor on logic of controlling 2024-01-12 17:24:47 +08:00
David Chang 27eaf2f5d5
ver Jan11th
finally set up a simple task, or one that should be simple
2024-01-11 20:03:33 +08:00
David Chang cebae4b183
Merge branch 'main' into zdy 2024-01-10 22:16:25 +08:00
David Chang 1515b05666
ver Jan10thv2
a new example config for Thunderbird
fixed several bugs
2024-01-10 21:58:29 +08:00
Timothyxxx abcafce750 VLC updates, and some infra bugs fix 2024-01-09 23:14:06 +08:00
Timothyxxx fa84b20ea5 VLC updates, and some infra bugs fix 2024-01-09 09:30:11 +08:00
David Chang 26b7d9010d
Merge branch 'zdy' 2024-01-05 15:55:41 +08:00
David Chang eeb8a120d6
ver Jan5th
debugged
2024-01-05 15:20:47 +08:00
David Chang 5fedf5b891
ver Jan4th
updated interfaces for thunderbird evaluation, not tested
2024-01-04 22:41:57 +08:00
Timothyxxx ab71ebb2ba Initialize VLC getters and metrics, fix some bugs in infra logic, needs to be refactored later on 2024-01-04 17:05:17 +08:00
David Chang 15a63074bc
Merge branch 'zdy' 2023-12-25 21:05:44 +08:00