1117 Commits

Author SHA1 Message Date
zdy023 42361ff9ba
ver Jul7th
pip-installing directly from PyPI fails mysteriously during postconfig
execution, possibly owing to the proxy configuration in the VM; adjusted the
strategy to download the wheel on the host and pip-install it locally
on the VM in thunderbird/d38192b0-17dc-4e1d-99c3-786d0117de77
2025-07-07 20:53:29 +08:00
zdy023 690f6ed6e7
ver Jul4th
fixed check_accessibility_tree function, updated the namespace
definitions according to the values defined in server/main.py
2025-07-04 23:20:51 +08:00
zdy023 bfa796dc45
Merge branch 'main' into thbd_eval_fix 2025-07-04 23:14:00 +08:00
XXZ c8a6a22aad
Fix VLC task design (#238)
* fix: fix multiapp tasks

* fix: update instructions for VLC evaluation examples

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-04 20:39:48 +08:00
Shenzhennan 1b40a458de
Impress eval fix (#226)
* fix compare_pptx

* Fix impress-4ed5abd0-8b5d-47bd-839f-cacfa15ca37a eval script: fix temporarily by ignoring the contaminated  To fix completely, the compare source file needs to be updated

* fix impress domain

* fix a53 by changing gold

* fix impress a53

* fix impress b8d origin file

* add table font color check

* fix left pane check

---------

Co-authored-by: chenjix <3107760494@qq.com>
Co-authored-by: moonshot <moonshot@moonshotznshenMacBook-Pro.local>
Co-authored-by: Shen Zhennan <shenzhennan@moonshot.cn>
2025-07-04 13:32:02 +08:00
Zilong Zhou 587f929567
fix: proxy setup (#234) 2025-07-04 13:31:51 +08:00
Zilong Zhou 1308a80029
Update 5990457f-2adb-467b-a4af-5c857c92d762.json (#235) 2025-07-04 13:31:18 +08:00
yuanmengqi 3cd79c9830 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-03 16:57:49 +00:00
yuanmengqi a651b04e49 Update AWS AMI ID, enhance directory creation logic in file upload, modify osworld service configuration, and refine JSON evaluation examples for improved clarity and functionality. 2025-07-03 16:57:41 +00:00
Danyang Zhang adc9ad88c2
Thunderbird eval fix (#233)
* ver Jul2nd

updated the task that requires setting up a new email account

* ver Jul3rd

fixed several tasks
2025-07-03 21:55:55 +08:00
XXZ ac24ccce99
fix: fix multiapp tasks (#229)
Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-03 21:53:58 +08:00
yuanmengqi 7b2120c843 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-03 13:50:35 +00:00
yuanmengqi cb4bed20a0 Refactor compare_python_pure_text function for improved normalization and error handling. Update JSON example to clarify instruction for extracting Python code from Colab, changing output file names for consistency. 2025-07-03 13:50:21 +00:00
Yuan Mengqi b2fb8b4222
fix chrome tasks (#230)
* fix chrome

* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example

* fix tasks

* fix chrome finished

* fix

* clean chrome_fix code

* clean chrome_fix code

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-03 21:32:41 +08:00
zdy023 3cf80eaab8
ver Jul3rd
fixed several tasks
2025-07-03 20:55:30 +08:00
ChenYXxxx bdaf37e0e5
fix_os&gimp (#220)
* Update ec4e3f68-9ea4-4c18-a5c9-69f89d1178b3.json

* Update c288e301-e626-4b98-a1ab-159dcb162af5.json

* Update 3ce045a0-877b-42aa-8d2c-b4a863336ab8.json

* Update b3d4a89c-53f2-4d6b-8b6a-541fb5d205fa.json

* Update 2e6f678f-472d-4c55-99cc-8e7c5c402a71.json

Please batch process all images on the desktop by increasing their brightness to 50, instead of adjusting them individually.

* Update 5ca86c6f-f317-49d8-b6a7-b527541caae8.json

* Update a746add2-cab0-4740-ac36-c3769d9bfb46.json

* Update a746add2-cab0-4740-ac36-c3769d9bfb46.json

* Update 62f7fd55-0687-4a43-b6e1-3eda16fc6252.json

* Update d52d6308-ec58-42b7-a2c9-de80e4837b2b.json

* Update d16c99dc-2a1e-46f2-b350-d97c86c85c15.json

* Update d16c99dc-2a1e-46f2-b350-d97c86c85c15.json

* Update 58d3eeeb-e9d0-499f-962e-fd0db2a744d8.json
2025-07-03 16:59:05 +08:00
Tianbao Xie bba367b8bc
fix: fix multiapps tasks (#231)
* Update JSON example for multi_apps: change snapshot name and specify presenter in instructions for clarity.

* Enhance PDF image comparison in chrome.py by adding existence checks for input files and improving image extraction logic. Introduce image hashing for similarity scoring with a configurable threshold. Update docs.py to support fuzzy matching in DOCX file comparisons, allowing for similarity scoring based on text content. Modify example JSON to enable fuzzy matching option.

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-07-03 16:58:43 +08:00
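
The fuzzy matching for DOCX text comparison mentioned in the commit above could look roughly like the following. This is only a minimal sketch using the standard library's `difflib`; the function name `fuzzy_text_score`, the whitespace normalization, and the default threshold of 0.9 are assumptions for illustration, not the code in docs.py.

```python
from difflib import SequenceMatcher


def fuzzy_text_score(expected: str, actual: str, threshold: float = 0.9) -> float:
    """Return the similarity ratio if it reaches the threshold, else 0.0.

    Hypothetical helper: name, normalization and threshold are assumptions.
    """
    # Collapse runs of whitespace so layout differences do not lower the score.
    a = " ".join(expected.split())
    b = " ".join(actual.split())
    ratio = SequenceMatcher(None, a, b).ratio()
    return ratio if ratio >= threshold else 0.0


if __name__ == "__main__":
    print(fuzzy_text_score("Quarterly  report", "Quarterly report"))
```

Returning the ratio itself rather than a plain pass/fail keeps partial credit possible, which is one way a similarity score might be used by an evaluator.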
Tianbao Xie e9c657b714
fix: Libreoffice writer fix (#232)
* Refactor LibreOffice Writer example JSON to support multiple expected and result files for line spacing comparison, enhancing evaluation flexibility. Updated function calls and added additional expected file paths.

* Update source link in LibreOffice Writer example JSON to a more relevant help page for inserting tables, improving instructional clarity.

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-07-03 16:58:08 +08:00
zdy023 c4b47886d9
ver Jul2nd
updated the task that requires setting up a new email account
2025-07-02 20:46:04 +08:00
Zilong Zhou 4d9528f208
feat&fix: add proxy support in get_info_from_website function (#228) 2025-07-02 18:13:15 +08:00
Zilong Zhou 595a704aff
fix: fix proxy setup (#227)
* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example
2025-07-02 01:36:32 +08:00
Danyang Zhang d4273d992e
Calc eval fix (#225)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scaffolds

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-30 18:23:09 +08:00
Tianbao Xie 30138c5db1
VLC fix (#224)
* Enhance SetupController with improved logging and error handling during setup and file upload processes. Update instance type to t3.xlarge and AMI ID for AWS configuration. Add download progress logging and exception handling for better debugging.

* Enhance VLC status evaluation by adding multiple paths for file and URL information extraction, improving robustness against varying VLC XML structures. Implement detailed logging for better debugging and error handling in case of mismatches or missing data. Update example JSON for VLC evaluation to use a valid HLS stream URL.

* Improve audio comparison robustness in VLC evaluator by adding error handling for audio file loading and extraction. Implement detailed logging for empty or corrupt files, and normalize DTW distance calculation for more accurate similarity scoring. Remove deprecated audio fingerprint comparison function.

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-29 20:18:44 +08:00
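
As an illustration of the normalized DTW distance mentioned in the commit above, here is a small pure-Python sketch. It assumes one-dimensional feature sequences and normalizes by the combined sequence length; both choices, and the function names, are assumptions and may differ from what the VLC evaluator actually does with audio features.

```python
def dtw_distance(x, y):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        # Mirrors the idea of rejecting empty or corrupt inputs early.
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def normalized_dtw_similarity(x, y):
    """Map the length-normalized DTW distance into (0, 1]; 1.0 means identical."""
    dist = dtw_distance(x, y) / (len(x) + len(y))
    return 1.0 / (1.0 + dist)


if __name__ == "__main__":
    print(normalized_dtw_similarity([1, 2, 3], [1, 2, 2, 3]))
```

Dividing by the sequence lengths keeps long recordings from being penalized simply for having more frames to align.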
Tianbao Xie 0cc93543a8
Environment is_used flag; OS domain fix (#219)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* More delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

* Enhance DesktopEnv to track environment usage for optimized snapshot management. Introduce is_environment_used flag to determine if a snapshot revert is necessary based on provider type. Update setup and step methods to mark environment usage appropriately. Add new execute_with_verification method in SetupController for command execution with result verification, improving reliability. Change AWS instance type to m5.large for better performance and update AMI ID for compatibility. Update file opening logic in main.py to handle both file paths and application commands more effectively.

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-28 00:45:53 +08:00
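
The usage-tracking idea described in the commit above can be sketched as follows. Only the `is_environment_used` flag name comes from the commit message; the class name `EnvSketch`, the `REVERT_ONLY_IF_USED` set, and the injected `revert_fn` callback are hypothetical stand-ins, so treat this as an outline of the logic rather than the DesktopEnv implementation.

```python
# Hypothetical: providers for which a snapshot revert is skipped when the
# environment has not been touched since the last reset.
REVERT_ONLY_IF_USED = {"aws"}


class EnvSketch:
    def __init__(self, provider_name, revert_fn):
        self.provider_name = provider_name
        self._revert = revert_fn  # stand-in for the real snapshot revert
        self.is_environment_used = False

    def step(self, action):
        # Any action (or setup command) dirties the environment.
        self.is_environment_used = True

    def reset(self):
        """Revert only when needed; return True if a revert happened."""
        needs_revert = (
            self.is_environment_used
            or self.provider_name not in REVERT_ONLY_IF_USED
        )
        if needs_revert:
            self._revert()
            self.is_environment_used = False
        return needs_revert


if __name__ == "__main__":
    env = EnvSketch("aws", lambda: print("reverting snapshot"))
    env.reset()      # clean environment, no revert
    env.step("click")
    env.reset()      # used environment, revert happens
```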
MillanK 48ac57697a
VSCode fix (#222) 2025-06-24 17:08:09 +08:00
Zilong Zhou 634e1c3d6f
Reduce the startup time of the software on AWS from one minute to five seconds. (#221)
* feat: use SSD with high throughput

* fix&refactor: update AMI ID and change EBS volume type to gp3 with adjusted IOPS and throughput
2025-06-24 15:35:38 +08:00
Zilong Zhou 3d8f1779a2
feat: use SSD with high throughput (#218) 2025-06-17 18:39:42 +08:00
Tianbao Xie 4e11eafd1d
Robust Evaluation, Blocking File Open, Grader Sensitivity, and LibreOffice Writer Fixes (#217)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* More delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-16 21:37:19 +08:00
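
For the time-suffix change mentioned in the commit above, a file name with hours, minutes, and seconds could be built like this. The helper name `with_time_suffix` and the exact format string are assumptions; the commit only states that the suffix now includes the time of day.

```python
from datetime import datetime
from pathlib import Path


def with_time_suffix(path, now=None):
    """Insert a date-and-time stamp between the file stem and its extension."""
    now = now or datetime.now()
    p = Path(path)
    # %H%M%S adds hours, minutes and seconds, so files fetched on the same
    # day no longer collide.
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return str(p.with_name(f"{p.stem}_{stamp}{p.suffix}"))


if __name__ == "__main__":
    print(with_time_suffix("report.docx"))
```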
Kaixin Li 347238e17e
Get VM IP again when getting screenshot fails (#215)
In rare cases, the IP of the VM changes after it launches. We can get the IP every time we retry to ensure the correct connection.
2025-06-16 02:40:40 +08:00
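
The approach described in the commit above, looking up the VM's IP again on every retry, can be sketched like this. The names `fetch_with_ip_refresh`, `get_ip`, and `fetch` are hypothetical placeholders for the provider lookup and the screenshot request; the retry count and delay are likewise assumptions.

```python
import time


def fetch_with_ip_refresh(get_ip, fetch, max_retries=3, delay=0.0):
    """Call fetch(ip), re-resolving the IP before every attempt."""
    last_error = None
    for _ in range(max_retries):
        ip = get_ip()  # ask again each time: the VM's IP may have changed
        try:
            return fetch(ip)
        except ConnectionError as exc:
            last_error = exc
            time.sleep(delay)
    raise RuntimeError(f"failed after {max_retries} attempts") from last_error


if __name__ == "__main__":
    ips = iter(["10.0.0.1", "10.0.0.2"])

    def fake_fetch(ip):
        if ip == "10.0.0.1":
            raise ConnectionError("stale address")
        return f"screenshot from {ip}"

    print(fetch_with_ip_refresh(lambda: next(ips), fake_fetch))
```

Caching the IP once outside the loop would keep retrying a stale address, which is the failure mode the commit describes.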
Yuan Mengqi 40354322e8
fix pub eval readme typo (#214)
* update clean code

* fix pub eval readme typo
2025-06-10 22:57:16 +08:00
Yuan Mengqi 362499330e
update clean code (#213) 2025-06-10 22:18:03 +08:00
Yuan Mengqi 4ce05b89ae
Merge pull request #212 from yuanmengqi/aws_clean
AWS OSWorld Provider Enhancement, Proxy Integration, New Agent Operator Implementation
2025-06-10 21:44:18 +08:00
yuanmengqi 8a1fc5c385 edit pub eval readme 2025-06-10 13:37:26 +00:00
yuanmengqi b8d229cdb3 edit pub eval readme 2025-06-10 13:36:48 +00:00
yuanmengqi fbe88799cf edit pub eval readme 2025-06-10 13:36:03 +00:00
yuanmengqi 3b5e4f3b15 edit pub eval readme 2025-06-10 13:34:42 +00:00
yuanmengqi 2d5439d062 edit pub eval readme 2025-06-10 13:32:24 +00:00
yuanmengqi 2d3347ca3e edit pub eval readme 2025-06-10 13:28:54 +00:00
yuanmengqi 1b09d63cb2 edit pub eval readme 2025-06-10 13:27:53 +00:00
yuanmengqi 2bae228803 merge upstream 2025-06-10 13:23:03 +00:00
yuanmengqi 7315aec6e6 clean code 2025-06-10 04:06:54 +00:00
yuanmengqi caf487b7cc Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-10 02:36:46 +00:00
yuanmengqi 3da32fe5cf update operator prompt 2025-06-10 02:35:53 +00:00
yuanmengqi caaa4e5baa fix: update AMI ID for us-east-1 region in AWS manager 2025-06-10 02:32:24 +00:00
yuanmengqi 02387f2cee feat: update DesktopEnv to support VMware provider and add proxy configuration
- Changed default provider name from "aws" to "vmware".
- Introduced `enable_proxy` parameter to control proxy support.
- Enhanced retry logic in the `reset` method to use a constant for maximum retries.
- Updated proxy handling to respect the new `enable_proxy` setting.
2025-06-09 16:35:13 +00:00
adlsdztony 168a2694f2 Merge branch 'feat/aws-provider-support' of https://github.com/xlang-ai/OSWorld into feat/aws-provider-support 2025-06-09 16:07:48 +00:00
adlsdztony bfae51d74d fix: enhance setup method with retry logic and return status 2025-06-09 16:07:13 +00:00
yuanmengqi 692486f8e7 add GDrive guideline 2025-06-09 14:59:47 +00:00
yuanmengqi 630f92fd7c fix: correct URL encoding in JSON examples for invoice paths 2025-06-09 08:06:27 +00:00
yuanmengqi b41339c5e5 Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-09 04:27:07 +00:00