Compare commits


236 Commits

Author SHA1 Message Date
蘑菇先生 5ef8bdfa35
EvoCUA Update (2026.01.05) (#412)
* evocua init

* setup max_token

* evocua update

---------

Co-authored-by: xuetaofeng <xuetaofeng@meituan.com>
Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
2026-01-05 16:14:53 +08:00
Bowen Yang 439e178a2e
fix(os_symphony_evaluation) (#410)
* fix(os_symphony)

* Update desktop_env_os_symphony.py

* fix(os_symphony_desktop)

* fix(os_symphony_start)

* Add docstring to run_multienv_os_symphony.py

Added documentation header for the evaluation script.
2026-01-04 15:56:51 +08:00
Bowen Yang 951e1928c8
fix(desktop_os_symphony):support aws (#406)
* fix(os_symphony)

* Update desktop_env_os_symphony.py
2026-01-01 11:27:34 +08:00
Bowen Yang 02a35be067
fix(os_symphony) (#405) 2025-12-30 22:43:47 +08:00
Bowen Yang 662826f57e
fix(os_symphony):prompt (#402)
* add_os_symphony

* fix(os_symphony)

* fix(os_symphony):prompt

---------

Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
2025-12-29 20:45:36 +08:00
xuetf 410ec63a89
Add EvoCUA Support (#401)
* evocua init

* setup max_token

---------

Co-authored-by: xuetaofeng <xuetaofeng@meituan.com>
Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
2025-12-23 20:46:23 +08:00
Bowen Yang 031696e83c
fix os_symphony (#400)
* add_os_symphony

* fix(os_symphony)

---------

Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
2025-12-23 20:45:30 +08:00
Bowen Yang f593f35b1c
add_os_symphony (#399) 2025-12-23 14:30:44 +08:00
Ubuntu ac31778ee3 Update: requirements.txt for seed agent 2025-12-15 11:47:56 +00:00
Ubuntu 60caa52fc4 Update: requirements.txt for seed agent 2025-12-15 11:47:40 +00:00
Ubuntu 41477a9c40 Update: seed agent 2025-12-15 11:45:57 +00:00
Ubuntu 78433ecfcf Add agent: seed agent 2025-12-12 05:35:20 +00:00
Meshal Nayim 9540454b0a
Fix demo agent (PromptAgent) reset(): add vm_ip and kwargs for compatibility with lib_run_single.py (#388) 2025-12-09 15:59:25 +08:00
MillanK cbc3b590ff
Task fix batch (#383)
* update 873cafdd-a581-47f6-8b33-b9696ddb7b05 task eval

* c1fa57f3-c3db-4596-8f09-020701085416 fix, add tolerance to url matching

* 8df7e444-8e06-4f93-8a1a-c5c974269d82 add a clearer instruction about the filename for compress

* add address string normalization for 6f4073b8-d8ea-4ade-8a18-c5d1d5d5aa9a

---------

Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
2025-11-19 17:24:25 +08:00
Qichen Fu 903ed36715
Add Claude Sonnet 4.5 support and improve action handling (#362)
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-14 13:54:32 +08:00
Subash Shibu 3167339e45
Add hosted GBOX agent for OSWorld evaluation (#376) 2025-11-13 13:13:31 +08:00
Pengxiang-Li 00b6468eb7
feat/dart_gui (#371) 2025-11-07 21:50:01 +08:00
yiqilin 6d43dbc532
Update GIMP evaluation examples to replace local file paths with cloud file URLs for consistency and accessibility. (#372) 2025-11-07 21:49:49 +08:00
Timothyxxx 8365edc975 Add new section in README for OSWorld-MCP project 2025-10-30 06:06:48 +00:00
Daphne Barretto 21c2b7629b
Add consistent scores validation (#368)
* Add consistent scores validation

* revert osworld_run_maestro.py changes
2025-10-29 01:44:48 +08:00
Timothyxxx 3bf54c92a9 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-10-23 14:28:14 +08:00
Timothyxxx a484f2e484 Update setup.py for version bump and dependency adjustments
- Bump version from 1.0.0 to 1.0.1
- Update numpy dependency to allow versions >=1.26 and <3
- Adjust pandas dependency to allow versions >=2.2 and <2.3
- Add new __init__.py file in the docker provider directory
2025-10-23 14:27:52 +08:00
Atharva Gundawar 9f97535ef9
osworld agent wrapper for trained v7 (#360) 2025-10-18 02:29:15 +08:00
ludunjie.ldj afd29115da support aliyun eval of qwen3vl 2025-10-16 16:20:54 +08:00
Dunjie Lu 55372c4432
Fix API base URLs for OpenAI and DashScope
Updated the base URLs for OpenAI and DashScope API calls.
2025-10-14 12:57:00 +08:00
Dunjie Lu d25464c203
Djlu/qwen3vl dash (#356)
* support dashscope sdk to call qwen3-vl-plus

* support dashscope sdk to call qwen3-vl-plus

---------

Co-authored-by: Timothyxxx <Timothyxxx@users.noreply.github.com>
2025-10-13 16:31:06 +08:00
Xinyuan Wang f9e9273b3b
OpenCUA-72B (#354)
* use aws pub ip

* os task fix: set the default dim screen time to be 300s

* OpenCUA-72B

* update password

* update

* update

* update opencua72b agent

* change provider ip

---------

Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
2025-10-13 10:39:33 +08:00
Yan98 ddb8372a6c
init public release (#350) 2025-10-06 22:16:31 +08:00
eun2ce 5eff00a9e3
Fix #347: Fix NameError in open_file timeout message (#351)
- Fix undefined 'timeout' variable in error message
- Use defined TIMEOUT constant instead of undefined timeout variable
- Prevents NameError when LibreOffice crashes during file opening
2025-10-06 22:14:15 +08:00
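
The bug pattern this fixes is a classic Python scoping slip; a minimal sketch of it (the launcher helper is a stub invented here for illustration, not the evaluator's actual code):

```python
TIMEOUT = 60  # module-level constant, as described in the commit

def launch_libreoffice(path: str, timeout_s: int) -> None:
    """Stub standing in for the real launcher, which can time out."""
    raise TimeoutError

def open_file(path: str) -> None:
    try:
        launch_libreoffice(path, timeout_s=TIMEOUT)
    except TimeoutError:
        # The pre-fix message referenced an undefined local `timeout`, so a
        # LibreOffice crash raised NameError instead of this intended error;
        # using the module-level TIMEOUT constant fixes it.
        raise RuntimeError(f"Opening {path} timed out after {TIMEOUT}s")
```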
Timothyxxx ff6285cfbb Add safe browsing feature to Chrome evaluator
- Implemented `get_enable_safe_browsing` function to retrieve safe browsing settings based on the operating system.
- Updated the `__init__.py` to include the new function.
- Modified JSON examples to reflect the change from enabling enhanced safety browsing to enabling safe browsing.
- Added necessary commands in the JSON examples for setting up preferences for safe browsing.
2025-10-05 04:56:08 +00:00
Danyang Zhang afd5952e44
ver Oct3rd (#349)
updated a series of instructions to ask the agent not to do any
unnecessary actions.
2025-10-04 00:13:29 +08:00
Timothyxxx 1572068035 Refactor evaluator functions in JSON examples to use URL pattern matching. Update expected URL formats to regex patterns for better validation in chrome evaluation examples. 2025-10-01 19:20:06 +00:00
Timothyxxx 9be518435c Update GIMP evaluation examples to replace local file paths with cloud file URLs for consistency and accessibility. 2025-10-01 09:54:52 +00:00
Timothyxxx bfb467da18 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-10-01 06:56:43 +00:00
Timothyxxx 4c685bed99 Update run_maestro.py to run in headless mode with a single environment and specify result directory. Adjust default TTL for AWS instances from 60 to 180 minutes in config.py. Enhance AWSProvider to handle missing security groups, subnet IDs, and instance types with fallbacks, and improve termination logic to skip already terminated instances while logging relevant information. 2025-10-01 06:56:33 +00:00
eun2ce 5eb5417188
fix #210: add a11y_tree support to UITARSAgent (#346) 2025-09-26 18:25:28 +08:00
Yanxiao Zhao 6827949418
fix _update_browse_history_setup (#345) 2025-09-25 13:22:40 +08:00
Yanxiao Zhao a4f8fe2f00
Add autoglm-os-9b-v (#344)
* update for autoglm-v

* Update run_autoglm.py

---------

Co-authored-by: hanyullai <hanyullai@outlook.com>
2025-09-24 19:43:28 +08:00
alexandruilie7 f59cf00cae
Add ui agent (#343)
* add uipath agent

* readme update
2025-09-24 19:42:46 +08:00
Long Chen 088e68798c
update aworldguiAgent code (#342) 2025-09-23 16:50:29 +08:00
Timothyxxx 584c7a9875 Enhance AWSProvider instance handling with fallback mechanisms for security groups, subnet IDs, and instance types. Implement checks to skip termination of instances already in 'shutting-down' or 'terminated' states, and handle potential termination errors gracefully. 2025-09-18 07:16:10 +00:00
molanhand 7213eca069
support mano agent (#338)
Co-authored-by: Fei Hu <molanhand@users.noreply.github.com>
2025-09-16 18:10:29 +08:00
ZhangZuhao dc7e46e7aa
Refactor platform detection for VM image download (#337)
Sometimes the platform detection for VM image download is wrong
2025-09-15 21:00:15 +08:00
Dunjie Lu b012301609
support qwen3vl agent (#336)
Co-authored-by: root <ludunjie1219@github.com>
2025-09-15 16:04:29 +08:00
Hiroid a668670349
fix(maestro): Fixed the debug logging level (#334)
Co-authored-by: Liangxuan Guo <guoliangxuan@deepmatrix.com.cn>
2025-09-11 01:03:59 +08:00
Hiroid 3a4b67304f
Add multiple new modules and tools to enhance the functionality and extensibility of the Maestro project (#333)
* Added a **pyproject.toml** file to define project metadata and dependencies.
* Added **run_maestro.py** and **osworld_run_maestro.py** to provide the main execution logic.
* Introduced multiple new modules, including **Evaluator**, **Controller**, **Manager**, and **Sub-Worker**, supporting task planning, state management, and data analysis.
* Added a **tools module** containing utility functions and tool configurations to improve code reusability.
* Updated the **README** and documentation with usage examples and module descriptions.

These changes lay the foundation for expanding the Maestro project’s functionality and improving the user experience.

Co-authored-by: Hiroid <guoliangxuan@deepmatrix.com>
2025-09-08 16:07:21 +09:00
Timothyxxx 029885e78c Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-09-05 15:36:39 +00:00
Timothyxxx 640f3fcd96 Update default path_to_vm argument to None in quickstart.py for improved flexibility 2025-09-05 15:36:31 +00:00
Timothyxxx 756923beea Update instruction wording in LibreOffice Impress example to clarify text color change requirements. Address https://github.com/xlang-ai/OSWorld/issues/324 2025-09-01 23:29:47 +08:00
Timothyxxx 0c681b91e0 Fix README update 2025-09-01 15:15:50 +00:00
aneeshprasad1 8513e8c89e
Add quickstart script and update README (#325)
Co-authored-by: Aneesh Prasad <aneeshprasad@Aneeshs-MacBook-Pro.local>
2025-09-01 23:14:24 +08:00
Howie 756e006af6
add support for mobile agent v3 (#328)
* add support for mobile agent v3

* add mobile_agent

* add support for mobile agent v3
2025-08-31 22:58:41 +08:00
hanyullai 54a14cbc07
fix multienv bug (#327) 2025-08-30 11:10:53 +08:00
Howie 3344abd641
Add support for GUI-Owl agent (#318)
* add run_multienv_owl.py

* add owl_agent.py
2025-08-27 18:03:39 +08:00
Timothyxxx ef2f35de22 Add resource group ID support for Aliyun VM allocation
- Introduced ALIYUN_RESOURCE_GROUP_ID environment variable to manage resource group assignments during VM allocation.
- Updated the _allocate_vm function to include resource group ID in the request if specified.
- Modified VNC URL logging to use public IP when available, enhancing clarity in access information.
- Maintained existing code logic while improving functionality for resource management and logging.
2025-08-26 13:28:23 +08:00
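
A minimal sketch of the env-driven wiring this describes, factored out of `_allocate_vm` for readability; the helper name and the `resource_group_id` attribute on the Aliyun request object are assumptions, not the repository's exact code:

```python
import os

def _apply_resource_group(request) -> None:
    """Attach a resource group to a CreateInstance-style request when configured."""
    resource_group_id = os.environ.get("ALIYUN_RESOURCE_GROUP_ID")
    if resource_group_id:
        # Only included in the request when the variable is set.
        request.resource_group_id = resource_group_id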
Timothyxxx 4c773f6f7c Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-08-22 23:29:21 +08:00
Timothyxxx ebda4d8b3f Add Aliyun SDK dependencies and implement TTL configuration for ECS instances
- Added new dependencies for Aliyun ECS SDK in requirements.txt and setup.py to support instance management features.
- Introduced a new config module to handle TTL settings for ECS instances, allowing for auto-termination based on environment variables.
- Updated the manager to utilize TTL settings, including scheduling instance termination with proper error handling and logging.
- Maintained existing code logic while enhancing functionality for improved instance lifecycle management.
2025-08-22 23:28:58 +08:00
Timothyxxx 15d9ddb612 update coact: add autogen/cache 2025-08-21 19:03:35 +00:00
Timothyxxx b14f1c7345 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-08-21 09:38:37 +00:00
Timothyxxx ead564c92b Update dependencies and refactor DesktopEnv initialization
- Removed specific versioning for the 'requests' library in requirements.txt and setup.py to allow for more flexible updates.
- Refactored the DesktopEnv class to streamline the emulator initialization process, enhancing error handling and logging during startup.
- Improved retry logic for file uploads in SetupController, ensuring robust handling of network issues and providing clearer error messages.
- Maintained existing code logic while enhancing clarity and reliability in the DesktopEnv and SetupController classes.
2025-08-21 09:38:28 +00:00
Timothyxxx b3e1c0344d Update OpenCV dependency to headless version in requirements and setup files
- Replaced 'opencv-python' with 'opencv-python-headless' in both requirements.txt and setup.py to reduce unnecessary GUI dependencies.
- Added a new .gitkeep file in the logs directory to ensure it is tracked in version control.
- Maintained existing code logic while improving dependency management.
2025-08-20 01:26:24 +08:00
Timothyxxx 492c910e94 Refactor AWS scheduler role handling in scheduler_utils.py
- Improved error handling and logging for role resolution and creation.
- Added checks to ensure the trust policy allows for AWS EventBridge Scheduler to assume the role.
- Implemented retry logic for scheduling EC2 termination to handle IAM eventual consistency.
- Maintained existing code logic while enhancing robustness and clarity in role management.
2025-08-18 17:57:31 +00:00
Timothyxxx 3a96fd5046 Add TTL configuration for AWS instance management
- Introduced a new config module to manage TTL settings for EC2 instances, allowing for auto-termination based on environment variables.
- Updated the AWSProvider and manager to utilize the new TTL settings, including scheduling instance termination via EventBridge Scheduler.
- Added utility functions for resolving the scheduler role ARN and creating termination schedules, ensuring robust error handling and logging.
- Maintained existing code logic while integrating new features for improved instance lifecycle management.
2025-08-18 17:30:49 +00:00
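
A sketch of how such a TTL schedule can be created with EventBridge Scheduler's universal EC2 target; the function name and the surrounding role-ARN handling are assumptions, not the repository's actual implementation:

```python
import datetime
import json
import boto3

def schedule_ec2_termination(instance_id: str, ttl_minutes: int, role_arn: str) -> None:
    """Create a one-off schedule that terminates the instance after the TTL."""
    fire_at = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(minutes=ttl_minutes)
    boto3.client("scheduler").create_schedule(
        Name=f"terminate-{instance_id}",
        # at() expressions take a UTC timestamp without a timezone suffix.
        ScheduleExpression=f"at({fire_at.strftime('%Y-%m-%dT%H:%M:%S')})",
        FlexibleTimeWindow={"Mode": "OFF"},
        Target={
            # Universal target: Scheduler calls ec2:TerminateInstances directly.
            "Arn": "arn:aws:scheduler:::aws-sdk:ec2:terminateInstances",
            "RoleArn": role_arn,
            "Input": json.dumps({"InstanceIds": [instance_id]}),
        },
    )
```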
Adam Yanxiao Zhao 75f00fea62
Fix AutoGLM-OS custom env reset func (#312)
* Add AWS config for autoglm-os agent script

* update default password

* fix autoglm-os reset
2025-08-18 18:12:09 +08:00
Timothyxxx a5dc64c943 Update Aliyun guidelines to include SSH and VNC password setup script 2025-08-18 07:24:39 +00:00
Adam Yanxiao Zhao deff1fe385
Add AWS config for autoglm-os agent script (#311)
* Add AWS config for autoglm-os agent script

* update default password
2025-08-17 22:54:23 +08:00
Adam Yanxiao Zhao 2664eba23b
fix_run_autoglm (#310) 2025-08-17 18:32:46 +08:00
Adam Yanxiao Zhao aa05f6cc26
Add AutoGLM-OS agent (#309)
* autoglm-os initialize

* clean code

* chore: use proxy for download setup

* feat(autoglm-os): add parameter to toggle images

* fix: use temporary directory for files pulled from the vm to prevent potential collision when running multiple instances of the same task in parallel

* update

* add client_password

* update multienv

* fix

* fix prompt

* fix prompt

* fix prompt

* fix sys prompt

* feat: use proxy in file evaluator

* fix client_password

* fix note_prompt

* fix autoglm agent cmd type

* fix

* revert: fix: use temporary directory for files pulled from the vm to prevent potential collision when running multiple instances of the same task in parallel

reverts commit bab5473eea

* feat(autoglm): setup tools

* fix(autoglm): remove second time of get a11y tree

* add osworld server restart

* Revert "add osworld server restart"

This reverts commit 7bd9d84122.

* fix _launch_setup

* fix autoglm agent tools & xml tree

* fix desktop_env

* fix bug for tool name capitalization

* fix: always use proxy for setup download

* add fail after exceeding max turns

* fix(autoglm): avoid adding image to message when screenshot is empty

* fix maximize_window

* fix maximize_window

* fix maximize_window

* fix import browsertools module bug

* fix task proxy config bug

* restore setup

* refactor desktop env

* restore image in provider

* restore file.py

* refactor desktop_env

* quick fix

* refactor desktop_env.step

* fix our env reset

* add max turns constraint

* clean run script

* clean lib_run_single.py

---------

Co-authored-by: hanyullai <hanyullai@outlook.com>
Co-authored-by: JingBh <jingbohao@yeah.net>
2025-08-17 12:08:40 +08:00
SaiLong Li c833d03a4b
feat: Update EIP charge type to 'PayByTraffic' for Volcengine. (#308)
Co-authored-by: lisailong <lisailong.ze@bytedance.com>
2025-08-15 20:17:52 +08:00
SaiLong Li cc6eddb466
feat: Add Volcengine provider support for desktop environment. (#307)
Co-authored-by: lisailong <lisailong.ze@bytedance.com>
2025-08-15 18:53:13 +08:00
Timothyxxx 6ecbcf006b chore: add ag2 dependency to requirements and setup files for CoACT-1 support
- Included ag2 version 0.9.7 in requirements.txt and setup.py to ensure proper package installation.
- Maintained existing code logic while enhancing dependency management.
2025-08-13 09:25:49 +00:00
Timothyxxx 50388cfe61 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-08-13 09:04:17 +00:00
Timothyxxx 7fb5860da0 feat: enhance run_coact.py and related agents with improved task handling and configuration
- Updated TASK_DESCRIPTION in run_coact.py to clarify task-solving steps and requirements.
- Modified configuration parameters for provider name and client password for better security and flexibility.
- Enhanced OrchestratorUserProxyAgent to include user instruction in the auto-reply and improved screenshot handling.
- Adjusted coding_agent.py to ensure proper verification of results before saving changes.
- Improved CUA agent prompts to maintain application state and handle user instructions more effectively.
- Ensured existing code logic remains unchanged while enhancing functionality and usability.
2025-08-13 09:04:09 +00:00
Quyu Kong 893b059e55
feat: Add Aliyun provider support for desktop environment (#304)
* Adding support for aliyun as a provider

* feat: enhance Aliyun provider support

- Added Aliyun as a new provider in the desktop environment.
- Updated the environment configuration guidelines for Aliyun, including prerequisites and environment variables.
- Implemented instance allocation and management functions for Aliyun ECS, including signal handling for graceful termination.
- Improved logging and error handling during instance creation and status checks.
- Adjusted the provider's methods to utilize the new instance management functions.
2025-08-12 14:31:08 +08:00
Timothyxxx d2ae0f697d feat: enhance AnthropicAgent with start_coordinate handling and modifier key support
- Added support for an optional start_coordinate parameter to facilitate drag actions from a specified starting point.
- Implemented validation for start_coordinate to ensure it is a tuple of two integers.
- Enhanced click actions to handle modifier keys, allowing for more complex interactions.
- Ensured existing code logic remains unchanged while improving functionality and usability.
2025-08-12 05:34:18 +00:00
Timothyxxx 7418f5cf2f chore: add traceback import for enhanced error handling
- Introduced the traceback module to improve error reporting and debugging capabilities.
- Ensured that existing code logic remains unchanged while preparing for future enhancements.
2025-08-12 05:15:54 +00:00
Timothyxxx 9e4d717cde fix: update AMI mappings in AWS manager
- Changed the AMI ID for the ap-east-1 region to a new value for better compatibility.
- Added comments to clarify the usage of AMIs for CoACT-1 and the need for manual transfer from us-east-1.
- Ensured existing logic remains unchanged while improving documentation for future reference.
2025-08-11 12:19:18 +00:00
Timothyxxx e2d1887662 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-08-10 14:40:19 +00:00
Timothyxxx bd6efcfc4d fix: enhance screenshot retrieval in PythonController
- Added a static method to validate image responses for PNG and JPEG formats using magic bytes.
- Improved error handling in the get_screenshot method to log invalid payloads and retry attempts.
- Updated the requests call to include a timeout for better reliability.
2025-08-10 14:40:18 +00:00
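
The magic-byte check described here is small; a sketch of what such a validator might look like on the controller (method name assumed, class name from the commit subject):

```python
class PythonController:
    """Sketch of the controller's screenshot validation only."""

    @staticmethod
    def _is_valid_image(payload: bytes) -> bool:
        # Accept only payloads that start with the PNG signature or the
        # JPEG start-of-image marker.
        return payload.startswith(b"\x89PNG\r\n\x1a\n") or payload.startswith(b"\xff\xd8\xff")
```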
Timothyxxx bc1db8d623 chore: update setup.py for version 1.0.0 release
- Bumped version to 1.0.0.
- Updated Python requirement to >=3.10.
- Upgraded dependencies: numpy, Pillow, pandas, torch, and added new dependencies including pygame, backoff, openai, dashscope, google-generativeai, wandb, gdown, tiktoken, groq, docker, loguru, dotenv, tldextract, and anthropic.
- Ensured existing logic remains intact while enhancing package capabilities.
2025-08-05 22:19:42 +08:00
Danyang Zhang 7364a720a6
Calc eval fix (#297)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scaffolds

* ver Jul18th

added two try-excepts to handle possible formula parsing and calculation
failures

* ver Jul19th

added support for cellIs and some other new types of conditional
formatting for calc evaluation

* ver Aug4th

updated some instructions

* ver Aug4thv2

fixed a typo

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-08-04 12:39:35 +08:00
yuanmengqi 84f407afdd feat: enhance run_coact.py with logging and configuration options
- Added logging configuration to capture runtime logs in both file and console with adjustable log levels.
- Introduced new command-line arguments for provider name, region, and client password to improve flexibility and security.
- Updated process_task function to accommodate new parameters, ensuring compatibility with existing logic.
- Modified prompt templates in coding_agent.py and cua_agent.py to use the client password placeholder for enhanced security.
2025-07-31 05:47:58 +00:00
yuanmengqi a5b51e8010 refactor: update command in JSON example to use placeholder for client password
- Replaced the hardcoded password in the command with a placeholder `{CLIENT_PASSWORD}` for improved security and flexibility.
- Ensured that the overall structure of the JSON remains unchanged while enhancing the example's usability.
2025-07-31 05:20:04 +00:00
yuanmengqi 5e24d72da6 fix: correct IP address return logic in AWSProvider
- Reverted the return value in the AWSProvider class to use private IP address instead of public IP address.
- Ensured that the logic remains intact while addressing the specific requirement for VNC access.
2025-07-31 05:14:00 +00:00
yuanmengqi b081c328bf Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-31 04:16:42 +00:00
yuanmengqi acd75476d8 docs: add acknowledgements section in README.md
- Included a new section to acknowledge institutions and students who contributed feedback and participated in fixes.
- Enhanced recognition of collaborative efforts in the project while maintaining the existing structure of the README.
2025-07-31 04:16:35 +00:00
Yuan Mengqi 239dd37d2e
clean claude run code (#293)
* add uitars agent code

* improve claude

* improve claude

* improve claude

* improve claude

* improve claude

* add nogdrive json

* merge claude code

* clean code claude run

* clean code claude run

* clean code claude run
2025-07-31 12:09:08 +08:00
Linxin Song b968155757
CoACT initialize (#292) 2025-07-31 10:35:20 +08:00
Xinyuan Wang 862d704b8c
Wxy/opencua (#290)
* OpenCUA Agent code base

* update url

* debug, modify url input

* debug opencua

* show result

* debug agent history overlap

* modify opencua agent; add comment lines

* update parallel; clean code; use sleep 3s

* ui-tars-0717

* update detail

* add system password to system prompt

* add running command
2025-07-31 08:53:49 +08:00
Xinyuan Wang 3d32556085
Uitars/dev (#291)
* use aws pub ip

* os task fix: set the default dim screen time to be 300s

* add all the uitars agents:
1. run_multienv_uitars.py: Qwen2VL-based UITARS models
2. run_multienv_uitars15_v1.py: UITARS1.5-7B
3. run_multienv_uitars15_v2.py: SeedVL1.5 thinking/non-thinking

---------

Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
2025-07-31 08:52:27 +08:00
yuanmengqi dd488c7294 feat: enhance image comparison functionality in gimp.py
- Added resizing logic to handle images of different sizes before comparison, ensuring consistent evaluation.
- Implemented mode conversion to ensure both images are in the same format for accurate comparison.
- Enhanced structure check by MSE to support conversion of numpy arrays to PIL Images, improving compatibility.
- Maintained existing logic while improving robustness and accuracy of image comparison methods.
2025-07-30 06:07:49 +00:00
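
A rough sketch of the alignment steps this describes (numpy-to-PIL coercion, resizing, mode conversion) ahead of an MSE-style comparison; the function name is illustrative, not gimp.py's actual API:

```python
import numpy as np
from PIL import Image

def aligned_mse(a, b) -> float:
    """Mean squared error after coercing inputs to comparable PIL images."""
    if isinstance(a, np.ndarray):
        a = Image.fromarray(a)
    if isinstance(b, np.ndarray):
        b = Image.fromarray(b)
    if a.size != b.size:    # resize so different-sized images can be compared
        b = b.resize(a.size)
    if a.mode != b.mode:    # ensure both images share a pixel format
        b = b.convert(a.mode)
    x = np.asarray(a, dtype=np.float64)
    y = np.asarray(b, dtype=np.float64)
    return float(((x - y) ** 2).mean())
```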
MillanK0817 4ae9d41da4 feat: update jedi agent with support for o3 as planner 2025-07-30 14:06:37 +08:00
yuanmengqi 99fa3b7cb9 docs: refine proxy configuration note in README.md for clarity
- Updated the proxy configuration section to specify that some tasks may require proxy settings to function properly, depending on website defenses.
- Enhanced user guidance by clarifying the importance of proper proxy configuration for task execution.
- Maintained existing content while improving clarity and user understanding of configuration requirements.
2025-07-29 09:59:31 +00:00
yuanmengqi c3469835f2 docs: update README.md with important configuration requirements for tasks
- Added a section detailing essential configuration requirements for Google Account Tasks and proxy settings.
- Highlighted the impact of missing configurations on task execution and evaluation scores.
- Maintained existing content while enhancing user guidance and clarity in setup instructions.
2025-07-29 09:57:04 +00:00
yuanmengqi 00804f8118 feat: update provider and action space in DesktopEnv class
- Changed the default provider name from "aws" to "vmware" to reflect new environment requirements.
- Updated the action space from "computer_13" to "pyautogui" for improved interaction capabilities.
- Maintained existing class structure and logic while implementing these updates for better functionality.
2025-07-29 06:48:41 +00:00
yuanmengqi af64f4ef49 docs: update README.md with font download link and VSCode trust settings
- Replaced the font download link for LibreOffice with a new source.
- Added instructions for configuring VSCode to disable workspace trust prompts, enhancing user experience.
- Maintained existing content while improving clarity and providing additional setup guidance.
2025-07-28 15:13:37 +00:00
yuanmengqi 70cf3e6982 docs: enhance AWS section in README.md for clarity and efficiency
- Updated the AWS support section to emphasize the benefits of using cloud services for parallel evaluation, including potential time reductions.
- Improved clarity in the username and password information for virtual machines, ensuring security measures are highlighted.
- Maintained existing content while enhancing the overall readability and user guidance in the documentation.
2025-07-28 15:12:18 +00:00
yuanmengqi 0eb3a3d6d7 docs: update README.md with new evaluation sections and guidelines
- Added a new section for Local Evaluation, clarifying the import process for `run_multienv.py`.
- Introduced a Public Evaluation section detailing the process for verifying results on the leaderboard and requirements for sharing agent implementations.
- Included links to the Public Evaluation Guideline for user reference.
- Maintained existing content while enhancing clarity and providing additional resources for users.
2025-07-28 08:39:09 +00:00
yuanmengqi 0dc78937d0 docs: update README.md with enhanced OSWorld-Verified details and AWS support
- Expanded the OSWorld-Verified update entry to include new model results and a comparison with previous benchmarks.
- Added a new section on AWS support, detailing the benefits of using cloud services for parallel evaluation and providing links to setup guides.
- Corrected the baseline agent command example to reflect the updated model name and added a new example for parallel execution.
- Clarified the username and password information for virtual machines, emphasizing security measures for cloud services.
- Maintained existing content while enhancing clarity and providing additional resources for users.
2025-07-28 08:22:25 +00:00
yuanmengqi a37fe86925 feat: enhance logging and signal handling in run_multienv_claude.py
- Refactored logging configuration to support dynamic log levels via command-line arguments, allowing for better control over log verbosity.
- Introduced a new signal handler for graceful shutdown of environments and processes, improving robustness during termination.
- Added functionality to save command-line arguments to a JSON file for better traceability of execution parameters.
- Maintained existing logic while enhancing the overall structure and error handling capabilities of the script.
2025-07-28 07:43:13 +00:00
yuanmengqi 78651040e7 docs: update README.md with new OSWorld-Verified announcement and minor text corrections
- Added a new update entry for the introduction of **OSWorld-Verified** highlighting major updates and community fixes.
- Corrected the spelling of "VirtualBox" in the environment refactor entry.
- Enhanced clarity in the Docker section title for better readability.
2025-07-28 07:19:39 +00:00
yuanmengqi 0f00788c4d feat: add run_multienv_o3.py script for multi-environment evaluation
- Introduced a new script `run_multienv_o3.py` to facilitate end-to-end evaluation across multiple environments.
- Implemented command-line argument parsing for various configurations including environment settings, logging levels, and AWS parameters.
- Integrated signal handling for graceful shutdown of environments and processes.
- Enhanced logging capabilities for better traceability during execution.
- Maintained existing logic from previous scripts while introducing new functionalities for improved evaluation processes.
2025-07-27 16:47:24 +00:00
yuanmengqi 1342bfe5ce delete: remove show_result_opencua.py file and its associated functions 2025-07-27 16:37:40 +00:00
yuanmengqi 5fa490adf4 fix: update Flask port configuration to support environment variable
- Modified the Flask application to allow the port to be set via the `FLASK_PORT` environment variable, defaulting to 8080 if not specified.
- Ensured existing application logic remains unchanged while enhancing configurability for deployment environments.
2025-07-27 16:14:07 +00:00
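
The change amounts to a one-line environment lookup at startup; a minimal sketch, with the host binding an assumption:

```python
import os
from flask import Flask

app = Flask(__name__)

if __name__ == "__main__":
    # FLASK_PORT overrides the listening port; fall back to 8080 when unset.
    port = int(os.environ.get("FLASK_PORT", "8080"))
    app.run(host="0.0.0.0", port=port)
```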
yuanmengqi 523d553e88 feat: add client password argument to multiple agents and scripts
- Introduced `--client_password` argument in `run_multienv_aguvis.py`, `run_multienv_claude.py`, and `run_multienv_gta1.py` for enhanced security and flexibility.
- Updated agent classes (`PromptAgent`, `AguvisAgent`, `GTA1Agent`) to accept and utilize `client_password` for improved configuration.
- Modified evaluation guidelines to reflect the new client password requirement.
- Ensured existing logic remains intact while enhancing functionality for better user experience.
2025-07-27 16:11:23 +00:00
yuanmengqi 122b16742b fix: improve EPUB processing by checking for file existence before reading
- Added checks for the presence of "toc.ncx" and "content.opf" in the EPUB file before attempting to process them.
- Introduced debug logging to notify when these files are not found, enhancing error handling and traceability.
- Maintained existing logic while improving robustness of the EPUB processing function.
2025-07-26 20:42:18 +00:00
yuanmengqi b25854edba feat: introduce DummyAgent class for enhanced coordinate handling
- Added DummyAgent class to facilitate coordinate generation and action assignment.
- Updated GTA1Agent to utilize DummyAgent for improved planning and execution.
- Increased max_steps and N_SEQ parameters for better performance.
- Enhanced logging for planning and execution processes.
- Maintained existing logic while integrating new functionality.
2025-07-26 08:26:23 +00:00
yuanmengqi d49ca9cc2d fix: enhance handling of '<' characters in pyautogui commands
- Refactor _fix_pyautogui_less_than_bug to improve handling of press('<') and typewrite calls.
- Introduce Unicode escape decoding for typewrite content to ensure proper '<' character processing.
- Maintain existing logic while enhancing functionality for better compatibility.
2025-07-26 07:59:37 +00:00
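
One possible shape of the workaround, shown purely as a sketch: the real `_fix_pyautogui_less_than_bug` may differ, and the shift+comma substitution assumes a US keyboard layout:

```python
import re

def _fix_pyautogui_less_than_bug(cmd: str) -> str:
    """Rewrite generated pyautogui code so '<' is typed correctly."""
    # press('<') misbehaves on some layouts; emit shift+comma instead.
    cmd = cmd.replace("pyautogui.press('<')", "pyautogui.hotkey('shift', ',')")
    cmd = cmd.replace('pyautogui.press("<")', 'pyautogui.hotkey("shift", ",")')

    def _decode(match: re.Match) -> str:
        # Decode escape sequences in the payload so an escaped '<' survives.
        body = match.group(1).encode().decode("unicode_escape")
        return f"pyautogui.typewrite({body!r})"

    return re.sub(r"pyautogui\.typewrite\('([^']*)'\)", _decode, cmd)
```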
yuanmengqi 123f51ea4a Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-26 07:28:39 +00:00
yuanmengqi 73caf53880 delete: remove img_utils.py and update imports in jedi_3b_agent.py and jedi_7b_agent.py to use qwen_vl_utils 2025-07-26 07:28:31 +00:00
张逸群 2ed0436c21
fix: update DockerVMManager method signatures for interface compatibility (#287)
- Fix delete_vm() method to accept region and **kwargs parameters
- Fix occupy_vm() method to accept pid, region and **kwargs parameters
- Ensures consistency with base VMManager interface and other providers
- Resolves runtime argument mismatch errors when calling these methods

This maintains backward compatibility while fixing the interface contract.
2025-07-26 01:18:00 +08:00
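
In sketch form, the signature alignment looks like this; the parameter lists are inferred from the message, not copied from the repo:

```python
class VMManager:
    """Stub standing in for the real base interface."""

class DockerVMManager(VMManager):
    def delete_vm(self, vm_path, region=None, **kwargs):
        # `region` and **kwargs are accepted purely for interface
        # compatibility with other providers; Docker ignores them.
        ...

    def occupy_vm(self, vm_path, pid, region=None, **kwargs):
        # Same idea: `pid` and `region` match the base VMManager contract.
        ...
```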
yuanmengqi 40fdc6266f chore: update default AWS instance type from t3.xlarge to t3.medium 2025-07-25 15:56:42 +00:00
yuanmengqi 39e5baf5ae fix: remove unnecessary sleep and observation retrieval in run_single_example function 2025-07-25 15:51:20 +00:00
yuanmengqi f5595df71c delete: remove gat1_agent.py file 2025-07-25 07:11:55 +00:00
Zilong Zhou b8b9e9b166
feat: add proxy handling logic and clean up imports (#285) 2025-07-24 16:27:56 +08:00
Zilong Zhou cbe650d0bb
refactor&delete: simplify AWS VM allocation and remove proxy support (#284) 2025-07-24 16:27:18 +08:00
Jiaqi 23b81993fa os task fix: set the default dim screen time to be 300s 2025-07-24 08:13:02 +00:00
ChenYXxxx 873f8a0359
Update 10a730d5-d414-4b40-b479-684bed1ae522.json
change "the ight" to "the night"
2025-07-24 15:44:52 +08:00
Jiarui Yao 4fd8b5be0a
support docker without kvm (#282) 2025-07-24 12:31:43 +08:00
张逸群 bf78b6d05e
Add OPENAI_BASE_URL support for custom OpenAI-compatible endpoints (#283)
Enables GPT models to use custom API endpoints through OPENAI_BASE_URL environment variable. This addresses the limitation where only Azure OpenAI supported custom endpoints while standard GPT models were hardcoded to api.openai.com.

- Add intelligent URL handling to avoid duplicate /v1 paths
- Maintain backward compatibility with default OpenAI API
- Update README with configuration instructions
- Non-breaking change preserving existing functionality

Fixes API integration issues for users with custom OpenAI-compatible services.
2025-07-24 12:31:08 +08:00
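
A sketch of the "intelligent URL handling" this describes, under the assumption that it reduces to normalizing a trailing `/v1` segment (helper name is illustrative):

```python
import os

def resolve_openai_base_url(default: str = "https://api.openai.com/v1") -> str:
    """Honor OPENAI_BASE_URL while avoiding duplicate /v1 path segments."""
    base = os.environ.get("OPENAI_BASE_URL")
    if not base:
        return default  # backward compatible: default OpenAI API
    base = base.rstrip("/")
    return base if base.endswith("/v1") else base + "/v1"
```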
yuanmengqi 0b0f8413df Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-23 17:19:54 +00:00
yuanmengqi 0a2929137b Simplify the logic for Docker provider 2025-07-23 17:19:47 +00:00
Yan98 2f3a6c48f6
Fix Typos (#275)
* init

* init

* fix typo
2025-07-24 00:06:04 +08:00
yuanmengqi fd7381210e Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-23 16:05:42 +00:00
yuanmengqi 5d219e7a5b Clean code 2025-07-23 16:05:39 +00:00
Yuan Mengqi d128edbbc1
add nogdrive json (#281)
* add uitars agent code

* improve claude

* improve claude

* improve claude

* improve claude

* improve claude

* add nogdrive json
2025-07-23 19:12:42 +08:00
张逸群 4d6e0fd031
Add --provider_name parameter to run.py and fix Docker provider initialization (#277)
- Add command-line argument --provider_name to support flexible provider selection
- Default provider remains vmware for backward compatibility
- Fix Docker provider controller initialization issue with delayed setup
- Add safety checks for controller existence in error handling

This enables users to specify different virtualization providers directly
from the command line and resolves Docker container lifecycle issues.
2025-07-23 04:09:36 +08:00
yuanmengqi 73de48af75 Update Public Evaluation Guidelines and README to require Python 3.10 and enhance installation instructions. Added troubleshooting tips for environment issues and clarified access key creation process in AWS for better security practices. 2025-07-22 19:57:55 +00:00
yuanmengqi 82c3cdd590 feat: refactor run_multienv_qwen25vl.py and qwen25vl_agent.py for improved logging and task management
- Introduced signal handling for graceful shutdown of environments and processes.
- Enhanced logging configuration to support dynamic log levels and structured output.
- Updated argument parsing to include new parameters for model selection and task execution.
- Refactored task distribution logic to streamline environment task management.
- Improved error handling during task execution and environment cleanup.
- Adjusted Qwen25VLAgent initialization to support new model and thought prefix options.
- Reduced max tries for LLM calls to optimize performance.
2025-07-22 19:46:42 +00:00
张逸群 4a5d48000f
feat: add HuggingFace mirror support for VM providers (#278)
Add support for HuggingFace mirror (hf-mirror.com) to improve download
speeds in regions where huggingface.co access is slow.

- Support HF_ENDPOINT environment variable detection
- Automatically switch to hf-mirror.com when HF_ENDPOINT is set
- Apply to Docker, VMware, and VirtualBox providers
- Maintain backward compatibility with default huggingface.co URLs

Users can now set HF_ENDPOINT=https://hf-mirror.com to use the mirror.
2025-07-23 03:40:35 +08:00
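
The mirror switch reduces to rewriting the download host; a minimal sketch, with the URL assembly illustrative rather than the providers' exact code:

```python
import os

def hf_download_url(repo_path: str) -> str:
    """Build a download URL, honoring HF_ENDPOINT (e.g. https://hf-mirror.com)."""
    endpoint = os.environ.get("HF_ENDPOINT", "https://huggingface.co").rstrip("/")
    return f"{endpoint}/{repo_path.lstrip('/')}"
```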
Yuan Mengqi 0a37cccd53
update claude (#280)
* add uitars agent code

* improve claude

* improve claude

* improve claude

* improve claude

* improve claude
2025-07-23 03:35:49 +08:00
Dunjie Lu 53fb96298a
support_qwen25vl (#276)
Co-authored-by: root <ludunjie1219@github.com>
2025-07-22 16:33:03 +08:00
yuanmengqi 921321c5df Update Public Evaluation Guidelines to clarify proxy settings. Added information on automatic proxy wrapping for proxy-sensitive tasks and retained the recommendation for users to disable the proxy if not needed. Ensured existing content structure remains intact. 2025-07-22 05:59:57 +00:00
yuanmengqi 2727696835 Enhance Public Evaluation Guidelines by adding new images for AWS setup and monitoring instructions. Included additional contact information for leaderboard updates and error reporting. Ensured clarity and usability for users while preserving existing content structure. 2025-07-22 05:53:33 +00:00
yuanmengqi 05e25ba1b7 Enhance Public Evaluation Guidelines with detailed AWS setup instructions and security configurations. Added new sections for host and client machine setup, including recommended instance types, storage considerations, and security group rules. Updated existing content for clarity and added a new image for Google Drive authentication. Ensure all changes maintain original logic while improving usability for users with varying AWS experience. 2025-07-22 05:35:58 +00:00
yuanmengqi feaebbc2ec Update AWS guidance 2025-07-20 16:42:14 +00:00
yuanmengqi 46c9407879 Clean elder version of opencua experiment runner 2025-07-20 07:57:27 +00:00
yuanmengqi 91bc6bb6ce Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-20 07:55:57 +00:00
yuanmengqi 88d5639a2a Compatible with agents that cannot use runtime log 2025-07-20 07:55:53 +00:00
Xinyuan Wang e10dd9267c
Wxy/opencua (#274)
* OpenCUA Agent code base

* update url

* debug, modify url input

* debug opencua

* show result

* debug agent history overlap

* modify opencua agent; add comment lines

* update parallel; clean code; use sleep 3s

* ui-tars-0717
2025-07-20 15:52:23 +08:00
Danyang Zhang bec7129fff
Calc eval fix (#273)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scaffolds

* ver Jul18th

added two try-excepts to handle possible formula parsing and calculation
failures

* ver Jul19th

added supports for cellIs and some other new types of conditional
formatting for calc evaluation

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-07-19 17:15:40 +08:00
yuanmengqi c6c62c52d7 feat: add X11 image handling and enhanced OCR processing
- Introduced a new function `read_x11_image` to read and convert X11 (XWD) format images to PIL Image, supporting both 24-bit and 32-bit formats.
- Enhanced the `compare_image_text` function to include checks for X11 image formats, with multiple conversion attempts using PIL, a custom reader, and netpbm tools.
- Improved error handling and logging for OCR processing, providing detailed feedback on conversion attempts and potential issues with X11 images.
- Maintained existing logic while expanding functionality for better image processing reliability.
2025-07-18 19:26:29 +00:00
yuanmengqi d6f2190a9f fix: refine instruction in OS evaluation example to clarify restrictions on logging out or shutting down the machine 2025-07-18 18:51:01 +00:00
yuanmengqi d52f3b1fca feat: add safe image opening function with retry mechanism
- Introduced a new function `safe_open_image_with_retry` to handle image file opening with retries for truncated or corrupted files.
- Enhanced error handling and logging for image processing in `check_palette_and_structure_sim`.
- Updated the logic to safely open both source and target images, ensuring robust evaluation without altering existing functionality.

These changes improve the reliability of image handling in the GIMP evaluator while maintaining the original code logic.
2025-07-18 18:36:09 +00:00
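
A sketch of what `safe_open_image_with_retry` plausibly does per the message; the retry count and delay are assumptions:

```python
import time
from PIL import Image, ImageFile

def safe_open_image_with_retry(path: str, retries: int = 3, delay: float = 1.0):
    """Open an image, tolerating truncated files and retrying corrupted reads."""
    ImageFile.LOAD_TRUNCATED_IMAGES = True  # let PIL accept truncated data
    for attempt in range(retries):
        try:
            img = Image.open(path)
            img.load()  # force a full decode so corruption surfaces here
            return img
        except OSError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)
```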
yuanmengqi 4fa59ebba2 feat: enhance URL comparison logic and Chrome debugging configuration
- Added a new function to ensure URLs have a scheme, defaulting to 'http://' if missing.
- Integrated tldextract to normalize URLs by extracting domain parts and handling 'www' subdomains.
- Updated the compare_urls function to include logging for better traceability during URL comparisons.
- Added tldextract to requirements.txt to support the new functionality.
- Updated the AWS manager with a new AMI ID for the specified resolution.
- Modified Chrome desktop launcher to include --remote-debugging-port=1337 for GUI debugging support.

These changes improve the robustness of URL handling and enable consistent Chrome debugging capabilities without altering existing logic.
2025-07-18 17:55:45 +00:00
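
In sketch form, the scheme-defaulting and tldextract-based normalization might look like this (function names are illustrative):

```python
from urllib.parse import urlparse
import tldextract

def ensure_scheme(url: str) -> str:
    """Default to http:// when the URL carries no scheme."""
    return url if urlparse(url).scheme else "http://" + url

def normalized_host(url: str) -> str:
    """Extract domain parts and drop a bare 'www' subdomain before comparing."""
    ext = tldextract.extract(ensure_scheme(url))
    subdomain = "" if ext.subdomain == "www" else ext.subdomain
    return ".".join(part for part in (subdomain, ext.domain, ext.suffix) if part)
```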
yuanmengqi 1ade6fe439 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-18 14:17:37 +00:00
yuanmengqi 44bd66fc9a Increase timeout for page load stability in Chrome evaluator
- Updated the timeout for the page load state from 10 seconds to 60 seconds to ensure better stability during page processing.
- Removed redundant retry mechanisms from the active tab checks to streamline the code while maintaining existing functionality.
- Enhanced logging to provide clearer insights into the page loading process.

These changes aim to improve the reliability of the Chrome evaluator without altering the core logic.
2025-07-18 14:16:16 +00:00
Danyang Zhang 53ffc05042
Calc eval fix (#272)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scaffolds

* ver Jul18th

added two try-excepts to handle possible formula parsing and calculation
failures

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-07-18 21:28:48 +08:00
Zilong Zhou 7fb1cee575
fix: img path error (#271)
* feat&style: add task status configuration and clear cache functionality; enhance UI styles

* feat&refactor: enhance current configuration API and improve cache clearing logic

* refactor&style: simplify task status update logic and improve page refresh mechanism

* refactor&feat: streamline default configuration retrieval and enhance cache initialization logic

* feat&refactor: add caching to default configuration retrieval and streamline task status logic

* feat&style: add collapsible section for additional model parameters and enhance styling for config items

* refactor&style: remove floating action button and clean up related styles

* fix: update video and screenshot sources to include action space, observation type, and model name parameters
2025-07-18 19:52:03 +08:00
shenzhennan 1378c745e1 Merge branch 'main' of https://github.com/xlang-ai/OSWorld 2025-07-18 07:15:18 +00:00
shenzhennan c7017a476d fix impress instruction 0a211154 2025-07-18 07:14:35 +00:00
yuanmengqi fcaefe7bb4 Enhance Chrome evaluator with improved error handling and retry mechanisms
- Added robust error handling for page processing, including checks for closed pages and HTTP status codes.
- Implemented retry logic for page loads and active tab checks to improve reliability.
- Enhanced logging throughout the process to capture detailed information about failures and successes.
- Preserved existing logic while ensuring better maintainability and robustness in the Chrome evaluator functions.
2025-07-18 07:13:13 +00:00
yuanmengqi 0fb625e4fd Update instruction in OS evaluation example to include a restriction against shutting down the machine. 2025-07-18 05:28:43 +00:00
yuanmengqi b9a646c11d Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-17 18:19:19 +00:00
yuanmengqi d1ddd3eacd feat: enhance VM wallpaper retrieval and image similarity checks
- Added logging to the VM wallpaper retrieval function to capture errors and warnings related to content retrieval and file creation.
- Implemented checks for None, empty, and invalid content types to ensure robustness in wallpaper handling.
- Enhanced the SSIM structure check function with size validation and improved error handling for image processing.
- Added logging for image size discrepancies and exceptions during SSIM computation to aid in debugging.

These changes improve error handling and logging, ensuring better maintainability and reliability of the evaluators.
2025-07-17 18:19:09 +00:00
Zilong Zhou 66694c663d
Feat/monitor cache (#267)
* feat&style: add task status configuration and clear cache functionality; enhance UI styles

* feat&refactor: enhance current configuration API and improve cache clearing logic

* refactor&style: simplify task status update logic and improve page refresh mechanism

* refactor&feat: streamline default configuration retrieval and enhance cache initialization logic

* feat&refactor: add caching to default configuration retrieval and streamline task status logic

* feat&style: add collapsible section for additional model parameters and enhance styling for config items

* refactor&style: remove floating action button and clean up related styles
2025-07-18 01:58:20 +08:00
yuanmengqi e70cf0bd93 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-17 11:15:53 +00:00
yuanmengqi 9d04624e41 feat: enhance Chrome evaluator with improved retry logic and logging
- Implemented retry mechanism for connecting to Chrome instances, allowing up to two attempts before failure.
- Increased timeout settings for page navigation and loading to enhance reliability.
- Added detailed logging for connection attempts, page loading status, and error handling to improve debugging and user experience.
- Ensured existing logic is preserved while enhancing error handling and operational robustness.

These changes improve the overall reliability and maintainability of the Chrome evaluator functions.
2025-07-17 11:15:47 +00:00
yuanmengqi 2c51950e73 feat: enhance evaluator configuration for Chrome with post-execution commands
- Added postconfig commands to multiple JSON files for Chrome evaluation examples.
- Included commands to terminate existing Chrome processes, launch Chrome with remote debugging, and introduce sleep intervals for timing.
- Updated logging messages in the AWS manager to improve clarity and user experience.

These changes enhance the automation and usability of the evaluation examples while preserving existing logic.
2025-07-17 10:50:10 +00:00
Yuan Mengqi 5ca516ac7a
add uitars agent code (#265) 2025-07-17 18:17:13 +08:00
Xinyuan Wang 24fbad9015
Merge pull request #264 from yuanmengqi/main
Improve the parallel logic
2025-07-17 12:28:48 +08:00
yuanmengqi fe40011b5d Improve the parallel logic 2025-07-17 04:21:42 +00:00
yuanmengqi 6788c58aa3 Improve the parallel logic 2025-07-17 04:20:59 +00:00
yuanmengqi bb8b0b2582 Improve the parallel logic 2025-07-17 04:19:44 +00:00
yuanmengqi 150234307e Merge branch 'fix_chrome' 2025-07-17 04:14:47 +00:00
yuanmengqi 9eeabfc52d Improve the parallel logic 2025-07-17 04:14:20 +00:00
yuanmengqi 2a48500691 refactor: improve code readability and error handling in table.py
- Reformatted import statements for better organization.
- Enhanced error handling in file reading functions to provide clearer logging.
- Introduced a new `_safe_read_file` function to handle multiple encoding attempts when reading files.
- Updated the `compare_csv` function to utilize the new file reading method, improving robustness.
- Ensured consistent return values across functions, replacing `0.` with `0.0` for clarity.

These changes maintain existing logic while improving code maintainability and readability.
2025-07-16 18:11:05 +00:00
yuanmengqi a9c1c6135a Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-16 17:38:39 +00:00
yuanmengqi 0939226020 feat: enhance evaluator configuration with post-execution commands for Chrome
- Added a series of postconfig commands to the evaluator section in the JSON file.
- Commands include executing a refresh in Chrome, managing Chrome processes, launching Chrome with remote debugging, and opening specific settings tabs.
- Introduced sleep intervals to ensure proper execution timing between commands.

This update improves the automation capabilities of the evaluation examples while maintaining existing logic.
2025-07-16 17:37:37 +00:00
Zilong Zhou dc164d5269
feat&fix: update configuration management to save model arguments and enhance UI display for model args (#262) 2025-07-16 21:46:35 +08:00
yuanmengqi e433f35c1f feat: standardize configuration fields across all evaluation examples
- Add `fixed_ip` field to all 369 JSON files in examples directory
  - Set to `true` for 8 files listed in google_chrome.json multi_apps
  - Set to `false` for remaining 361 files
- Add `possibility_of_env_change` field to 363 JSON files missing this field
  - Set to "low" for newly added fields
  - Preserve existing values (4 medium, 2 high) for 6 files that already had this field

This ensures consistent configuration schema across all evaluation examples
while maintaining backward compatibility with existing settings.
2025-07-16 13:45:34 +00:00
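
A migration of this shape can be scripted in a few lines; a sketch assuming the examples live under `evaluation_examples/examples` and that the eight fixed-IP task IDs (not listed in the message) are known:

```python
import json
from pathlib import Path

FIXED_IP_IDS = set()  # the 8 task IDs from google_chrome.json multi_apps go here

for path in Path("evaluation_examples/examples").rglob("*.json"):
    data = json.loads(path.read_text(encoding="utf-8"))
    data["fixed_ip"] = path.stem in FIXED_IP_IDS
    # Only add the field when absent, preserving the 6 pre-existing values.
    data.setdefault("possibility_of_env_change", "low")
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
```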
yuanmengqi b9df320f31 Enhance check_python_file_by_test_suite function with robust error handling and logging. Added validation for file existence, module loading, and function execution. Improved resource cleanup and working directory management to ensure stability and reliability during test execution. 2025-07-16 11:44:46 +00:00
Xinyuan Wang 0f2655249c
Wxy/opencua (#260)
* OpenCUA Agent code base

* update url

* debug, modify url input

* debug opencua

* show result

* debug agent history overlap

* modify opencua agent; add comment lines
2025-07-16 17:53:12 +08:00
yuanmengqi 5e5058c1f2 fix: standardize provider interface parameters across all implementations
- Add screen_size parameter to get_vm_path() for all providers (with default 1920x1080)
- Add os_type parameter to start_emulator() for Azure and VirtualBox providers
- Add region parameter to stop_emulator() for VMware, Docker, and VirtualBox providers
- Use *args, **kwargs for better extensibility and parameter consistency
- Add documentation comments explaining ignored parameters for interface consistency
- Prevents TypeError exceptions when AWS-specific parameters are passed to other providers

This ensures all providers can handle the same parameter sets while maintaining
backward compatibility and avoiding interface fragmentation.
2025-07-15 21:38:34 +00:00
yuanmengqi cb070307ee merge code 2025-07-15 14:57:14 +00:00
yuanmengqi 175b4b46c2 Merge remote-tracking branch 'upstream/main' into fix_chrome 2025-07-15 14:50:48 +00:00
yuanmengqi 7912880d16 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-15 07:24:38 +00:00
yuanmengqi 451bbf5fc2 Update multi_apps JSON examples: refined instructions for image processing in GIMP, replaced an open command with a launch command for VLC, and corrected assignment modification instruction in LibreOffice Calc example. 2025-07-15 07:24:33 +00:00
Yuan Mengqi af47ed8fb1
fix infeasible&chrome tasks (#258)
* fix chrome

* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example

* fix tasks

* fix chrome finished

* fix

* clean chrome_fix code

* clean chrome_fix code

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix multiapps

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix some multi_apps tasks

* fix some multi_apps tasks

* fix password&resolution

* fix password&resolution

* Improve code logic for password & resolution

* edit

* Merge branch 'main' into fix_chrome

* fix chrome tasks

* Merge branch 'fix_chrome'

* fix infeasible&chrome tasks

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-15 13:02:42 +08:00
yuanmengqi 68a9f647f4 fix: address https://github.com/xlang-ai/OSWorld/issues/257 by implementing a fix for the PyAutoGUI '<' character bug in command execution. Introduced a new function to handle typewrite and press calls, ensuring correct behavior when using '<' in commands. Updated command execution logic to apply this fix before executing user commands. 2025-07-15 04:17:34 +00:00
yuanmengqi 8e18b7839a fix infeasible&chrome tasks 2025-07-15 02:19:27 +00:00
yuanmengqi 1bf0730dce Merge remote-tracking branch 'upstream/main' 2025-07-15 02:14:41 +00:00
yuanmengqi 756ef96850 Merge branch 'fix_chrome' 2025-07-15 02:13:58 +00:00
yuanmengqi 08b4cf2c2f fix infeasible&chrome tasks 2025-07-15 02:09:40 +00:00
ChenYXxxx 698483390a
"Could you turn my image into CYMK mode?" add "within GIMP" 2025-07-14 23:53:20 +08:00
ChenYXxxx 9242becd87
"Please batch process all images on the desktop by increasing their brightness to 50, instead of adjusting them individually." add "within GIMP" 2025-07-14 23:52:41 +08:00
ChenYXxxx 56b2fe9cc4
Update d16c99dc-2a1e-46f2-b350-d97c86c85c15.json 2025-07-14 23:23:11 +08:00
ChenYXxxx b481c794c5
Update 72f83cdc-bf76-4531-9a1b-eb893a13f8aa.json 2025-07-14 23:22:50 +08:00
ChenYXxxx 7f973a391c
Update f723c744-e62c-4ae6-98d1-750d3cd7d79d.json 2025-07-14 23:22:14 +08:00
shenzhennan 7f96cc0633 Merge branch 'main' of https://github.com/xlang-ai/OSWorld 2025-07-14 12:35:00 +00:00
shenzhennan 53983db9cb fix impress eval: extend sleep time to ensure save 2025-07-14 12:34:43 +00:00
Xinyuan Wang db83b9cb2c
Wxy/opencua (#256)
* OpenCUA Agent code base

* update url

* debug, modify url input
2025-07-14 20:26:39 +08:00
Danyang Zhang 2339db20ca
ver Jul7th (#255)
pip-installing directly from PyPI fails mysteriously in postconfig
execution, possibly owing to proxy configuration in the VM; adjusted the
strategy by downloading the wheel on the host and pip-installing it locally
on the VM in thunderbird/d38192b0-17dc-4e1d-99c3-786d0117de77
2025-07-14 20:26:29 +08:00
shenzhennan 60e26d2d0d fix impress compare to use gold file 2025-07-14 11:35:06 +00:00
yuanmengqi 90c4e894a4 Merge remote-tracking branch 'upstream/main' into fix_chrome 2025-07-14 07:14:19 +00:00
yuanmengqi 5d90faa548 run operator 2025-07-14 07:13:17 +00:00
Zilong Zhou 74b7c189af
Feat/monitor (#254)
* feat: add claude support

* feat: add script for end-to-end evaluation with logging and task distribution

* feat&fix: add tool result handling and update model default in evaluation script

* chore: remove run_test_env.py script

* feat&fix: implement action parsing for tool calls and update default action space

* fix: update text formatting in action parsing and replace logger import

* feat&fix: implement action parsing for tool calls and add screen size handling

* feat: add setup instructions for Anthropic API integration

* feat: add notice about image size limitations for Anthropic API

* Delete test_env/logger.py

* Delete test_env/utils.py

* fix: update logger usage to use global logger and improve error handling

* feat&fix: add configuration management API endpoints and update UI for configuration selection

* feat&fix: update environment configuration, enhance task statistics, and improve UI responsiveness

* feat&fix: add configuration toggle button in UI and improve task loading performance

* feat&fix: add accuracy percentage display to score and style updates for UI
2025-07-14 13:43:41 +08:00
yuanmengqi 0651495d88 fix: Enhance error handling and logging across multiple evaluators
- Added logging for file retrieval and error handling in file.py, improving robustness during file operations.
- Implemented checks for file existence and parsing errors in general.py, enhancing reliability in JSON/YAML processing.
- Improved table comparison logic in table.py with detailed error logging for sheet loading and cell value reading.
- Enhanced metrics evaluation in slides.py with additional checks for paragraph and run counts, ensuring thorough comparison.
- Updated utils.py to include file existence checks and detailed error logging during cell value reading.
2025-07-14 05:43:17 +00:00
yuanmengqi b8b026f817 Merge remote-tracking branch 'upstream/main' into fix_chrome 2025-07-13 13:16:48 +00:00
yuanmengqi 7c807d4f3e Merge remote-tracking branch 'upstream/main' 2025-07-13 13:12:26 +00:00
Zilong Zhou 349f2fd9fe
Feat/claude cua support (#253)
* feat: add claude support

* feat: add script for end-to-end evaluation with logging and task distribution

* feat&fix: add tool result handling and update model default in evaluation script

* chore: remove run_test_env.py script

* feat&fix: implement action parsing for tool calls and update default action space

* fix: update text formatting in action parsing and replace logger import

* feat&fix: implement action parsing for tool calls and add screen size handling

* feat: add setup instructions for Anthropic API integration

* feat: add notice about image size limitations for Anthropic API

* Delete test_env/logger.py

* Delete test_env/utils.py
2025-07-13 21:10:49 +08:00
Yuan Mengqi 38a30734a6
Improve code logic for password & resolution (#252)
* fix chrome

* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example

* fix tasks

* fix chrome finished

* fix

* clean chrome_fix code

* clean chrome_fix code

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix multiapps

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix some multi_apps tasks

* fix some multi_apps tasks

* fix password&resolution

* fix password&resolution

* Improve code logic for password & resolution

* edit

* Merge branch 'main' into fix_chrome

* fix chrome tasks

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-13 21:04:07 +08:00
yuanmengqi 9f806d425d fix chrome final 2025-07-13 12:46:00 +00:00
yuanmengqi 7279469d23 fix chrome tasks 2025-07-13 12:41:27 +00:00
yuanmengqi 572a94b6df Merge branch 'main' into fix_chrome 2025-07-13 10:16:08 +00:00
yuanmengqi a16b54c175 edit 2025-07-13 10:14:41 +00:00
yuanmengqi 94ea30cb45 Merge remote-tracking branch 'upstream/main' 2025-07-13 07:05:54 +00:00
yuanmengqi d3bf4823cb Improve code logic for password & resolution 2025-07-13 07:01:28 +00:00
yuanmengqi a070ddda7e Improve code logic for password & resolution 2025-07-13 06:59:45 +00:00
yuanmengqi 97ed6f99b0 Final review multi_apps fix the rest part 2025-07-12 20:28:55 +00:00
yuanmengqi dbecf46057 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-12 16:35:02 +00:00
yuanmengqi 877e75a013 Final review multi_apps fix Xinzhuang part 2025-07-12 16:34:55 +00:00
Yuan Mengqi 27319ce1e3
fix password&resolution (#251)
* fix chrome

* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example

* fix tasks

* fix chrome finished

* fix

* clean chrome_fix code

* clean chrome_fix code

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix multiapps

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix some multi_apps tasks

* fix some multi_apps tasks

* fix password&resolution

* fix password&resolution

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-13 00:25:37 +08:00
yuanmengqi 3b698aa3c0 Merge branch 'fix_chrome' 2025-07-12 15:13:44 +00:00
yuanmengqi 08bbf77511 fix password&resolution 2025-07-12 15:11:42 +00:00
yuanmengqi fb0c301e14 Merge branch 'fix_chrome' 2025-07-11 12:17:42 +00:00
yuanmengqi 37c56533f0 Merge remote-tracking branch 'upstream/main' into fix_chrome 2025-07-11 12:16:23 +00:00
yuanmengqi fe3bb2fd92 fix password&resolution 2025-07-11 12:15:03 +00:00
yuanmengqi bc1248f73e Merge branch 'fix_chrome' 2025-07-08 10:49:26 +00:00
yuanmengqi 349c31fa55 Merge remote-tracking branch 'upstream/main' into fix_chrome 2025-07-08 10:45:57 +00:00
yuanmengqi 5778078596 fix some multi_apps tasks 2025-07-08 10:35:47 +00:00
yuanmengqi 8d670df32d fix some multi_apps tasks 2025-07-08 10:31:10 +00:00
yuanmengqi 67b68abaec fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774 2025-07-04 07:22:36 +00:00
yuanmengqi 8aa686a2a9 fix multiapps 2025-07-04 07:18:15 +00:00
yuanmengqi 66e669b50b fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774 2025-07-04 07:14:54 +00:00
yuanmengqi 5b9be93acd clean chrome_fix code 2025-07-02 11:26:19 +00:00
yuanmengqi 3fd9fa94e6 clean chrome_fix code 2025-07-02 11:24:45 +00:00
yuanmengqi 3d7be9f216 Merge remote-tracking branch 'upstream/main' 2025-07-02 11:04:52 +00:00
yuanmengqi 1a5c4b0e10 fix 2025-07-02 10:24:29 +00:00
yuanmengqi ca24d308bd fix chrome finished 2025-07-02 09:22:42 +00:00
yuanmengqi 144a87fd9a Merge remote-tracking branch 'upstream/fix/aws-proxy' 2025-07-01 16:07:41 +00:00
yuanmengqi 2e3a4a5ba9 fix tasks 2025-07-01 15:57:14 +00:00
adlsdztony 2f9ae1f3ad feat&fix: add proxy support in setup and remove hardcoded proxy from example 2025-07-01 13:47:16 +00:00
adlsdztony 64f47d1a32 fix: fix proxy setup 2025-07-01 13:20:26 +00:00
yuanmengqi b48c69a2fb Merge remote-tracking branch 'upstream/main' 2025-06-30 08:20:45 +00:00
yuanmengqi ea51f5264a fix chrome 2025-06-30 08:07:24 +00:00
926 changed files with 143224 additions and 4615 deletions

16
.gitignore vendored

@ -197,4 +197,18 @@ vmware_vm_data
.vscode
dataimpulse_proxy_config.json
## reference and draft and debug
reference/
draft/
manual_examine.py
run_human_examine.sh
quick_start.py
result_multi_apps_pengxiang_transformers12
evaluation_examples/settings/proxy/dataimpulse.json
# Local test configurations (not for public repo)
evaluation_examples/spiderman.json
evaluation_examples/test_50_random_proportional.json
evaluation_examples/test_chrome.json


@ -1,30 +1,35 @@
# Public Evaluation Platform User Guide
We have built an AWS-based platform for large-scale parallel evaluation of OSWorld tasks.
The system follows a Host-Client architecture:
- **Host Instance**: The central controller that stores code, configurations, and manages task execution.
- **Client Instances**: Worker nodes automatically launched to perform tasks in parallel.
All instances use a preconfigured AMI to ensure a consistent environment.
The architecture consists of a host machine (where you git clone and set up the OSWorld host environment) that controls multiple virtual machines for testing and potential training purposes.
Each virtual machine serves as an OSWorld client environment using pre-configured AMI images and runs `osworld.service` to execute actions and commands from the host machine.
To prevent security breaches, proper security groups and subnets must be configured for both the host and virtual machines.
## 1. Platform Deployment & Connection
Below, we assume you have no prior AWS configuration experience.
You may freely skip or replace any graphical operations with API calls if you know how to do so.
### 1.1 Launch the Host Instance
Please create an instance in the AWS EC2 graphical interface to build the Host Machine.
Create an EC2 instance in the AWS Console with the following settings:
Our recommended instance type settings are as follows: if you want to run fewer than 5 VM environments in parallel (`--num_envs` < 5), `t3.medium` will be sufficient; if you want to run fewer than 15 VM environments in parallel (`--num_envs` < 15), `t3.large` will be sufficient; however, if you want to use more than 15 VM environments in parallel, it's better to choose a machine with more vCPUs and memory, such as `c4.8xlarge`.
| Configuration Item | Value |
| -------------------------- | ------------------------------------------------------------ |
| AMI ID | `ami-0e49e0a70044dde43` |
| Instance Type | - `t3.medium` (Recommended for ≤5 parallel tasks)<br />- ` t3.large ` (Recommended for ≤15 parallel tasks)<br /><br /> - These numbers are based on using VSCode over SSH. You can save resources by running via CLI—`t3.large` supports up to 20 tasks that way.<br /> - For higher parallelism, use a more powerful instance. |
| VPC | `vpc-0f207282fe145bcda` |
| Subnet | `subnet-0a4b0c5b8f6066712` |
| Firewall (security groups) | `sg-05f8e79c10a7768e4` |
| Storage | 50GB<br /> - Consider increasing if storing multiple results to avoid crashes. |
For the AMI, we recommend using Ubuntu Server 24.04 LTS (HVM), SSD Volume Type, though other options will also work since the host machine doesn't have strict requirements.
For storage space, please consider the number of experiments you plan to run.
We recommend at least 50GB or more.
For security group configuration, please configure according to your specific requirements.
We provide a monitor service that runs on port 8080 by default.
You need to open this port to use this functionality.
Set the VPC as default, and we will return to it later to configure the virtual machines with the same setting.
Once launched, you will receive an instance ID like `i-xxxxxx`.
### 1.2 Connect to the Host Instance
@ -66,7 +71,7 @@ Once launched, you will receive an instance ID like `i-xxxxxx`.
ssh -i <your_key_path> ubuntu@<your_public_dns>
```
* VSCode Remote SSH configuration:
* VSCode/Cursor Remote SSH configuration:
```
Host host_example
@ -75,6 +80,94 @@ Once launched, you will receive an instance ID like `i-xxxxxx`.
IdentityFile <your_key_path>
```
#### Step 3: Set up the host machine
After you connect to the host machine, clone the latest OSWorld and set up the environment.
Please ensure that the version of Python is >= 3.10.
```
# Clone the OSWorld repository
git clone https://github.com/xlang-ai/OSWorld
# Change directory into the cloned repository
cd OSWorld
# Optional: Create a Conda environment for OSWorld
# conda create -n osworld python=3.10
# conda activate osworld
# Install required dependencies
pip install -r requirements.txt
```
When installing the requirements you may run into general environment issues, but they are solvable.
You may need `apt install` to install and configure some system-level dependencies.
Such issues can usually be fixed quickly with the help of AI tools like Claude Code.
Then it is almost done for the host machine part!
### 1.3 Set up the virtual machine
We need to programmatically scale virtual machines.
Therefore, we will use the graphical interface to configure and obtain the necessary environment variables, then set these environment variables on the host machine so that the OSWorld code can read them to automatically scale and run experiments.
For the client machine environments, we have already prepared pre-configured client machine environments in different regions, stored in https://github.com/xlang-ai/OSWorld/blob/main/desktop_env/providers/aws/manager.py:
```
IMAGE_ID_MAP = {
"us-east-1": {
(1920, 1080): "ami-0d23263edb96951d8"
},
"ap-east-1": {
(1920, 1080): "ami-0c092a5b8be4116f5"
}
}
# Tell us if you need more; we can migrate images from one region to another.
```
Therefore, you don't need to configure the virtual machine environments and related variables.
If you need to add additional functionality, you can configure it based on these images.
If you want to reconfigure from scratch, please refer to the files and instructions under https://github.com/xlang-ai/OSWorld/tree/main/desktop_env/server.
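For clarity, here is a minimal sketch of how such a lookup resolves an AMI from a region and screen size (the `resolve_ami` helper is illustrative, not part of the repository):
```python
# Mirror of the IMAGE_ID_MAP structure shown above.
IMAGE_ID_MAP = {
    "us-east-1": {(1920, 1080): "ami-0d23263edb96951d8"},
    "ap-east-1": {(1920, 1080): "ami-0c092a5b8be4116f5"},
}

def resolve_ami(region: str, screen_size: tuple = (1920, 1080)) -> str:
    """Return the prebuilt client AMI for a region/resolution pair."""
    try:
        return IMAGE_ID_MAP[region][screen_size]
    except KeyError:
        raise ValueError(f"No prebuilt AMI for {region} at {screen_size}")

print(resolve_ami("us-east-1"))  # ami-0d23263edb96951d8
```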
#### Step 1: Security Group for OSWorld Virtual Machines
OSWorld requires certain ports to be open, such as port 5000 for backend connections to OSWorld services, port 5910 for VNC visualization, port 9222 for Chrome control, etc.
The `AWS_SECURITY_GROUP_ID` variable represents the security group configuration for virtual machines serving as OSWorld environments.
Please complete the configuration and set this environment variable to the ID of the configured security group.
**⚠️ Important**: Please strictly follow the port settings below to prevent OSWorld tasks from failing due to connection issues:
##### Inbound Rules (8 rules required)
| Type | Protocol | Port Range | Source | Description |
|------|----------|------------|--------|-------------|
| SSH | TCP | 22 | 0.0.0.0/0 | SSH access |
| HTTP | TCP | 80 | 172.31.0.0/16 | HTTP traffic |
| Custom TCP | TCP | 5000 | 172.31.0.0/16 | OSWorld backend service |
| Custom TCP | TCP | 5910 | 0.0.0.0/0 | NoVNC visualization port |
| Custom TCP | TCP | 8006 | 172.31.0.0/16 | VNC service port |
| Custom TCP | TCP | 8080 | 172.31.0.0/16 | VLC service port |
| Custom TCP | TCP | 8081 | 172.31.0.0/16 | Additional service port |
| Custom TCP | TCP | 9222 | 172.31.0.0/16 | Chrome control port |
Once finished, record the security group ID; you will need to set it as the environment variable `AWS_SECURITY_GROUP_ID` on the host machine before starting the client code.
##### Outbound Rules (1 rule required)
| Type | Protocol | Port Range | Destination | Description |
|------|----------|------------|-------------|-------------|
| All traffic | All | All | 0.0.0.0/0 | Allow all outbound traffic |
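If you prefer to script this step instead of clicking through the console, a boto3 sketch along these lines should work (the group name is illustrative; the ports and CIDRs mirror the tables above):
```python
import boto3

ec2 = boto3.client("ec2")  # region/credentials come from your environment

sg_id = ec2.create_security_group(
    GroupName="osworld-clients",          # illustrative name
    Description="OSWorld client VM ports",
    VpcId="vpc-xxxx",                     # your VPC ID from Step 2
)["GroupId"]

# (port, cidr) pairs mirroring the inbound rules table above (all TCP).
rules = [
    (22, "0.0.0.0/0"), (80, "172.31.0.0/16"), (5000, "172.31.0.0/16"),
    (5910, "0.0.0.0/0"), (8006, "172.31.0.0/16"), (8080, "172.31.0.0/16"),
    (8081, "172.31.0.0/16"), (9222, "172.31.0.0/16"),
]
for port, cidr in rules:
    ec2.authorize_security_group_ingress(
        GroupId=sg_id,
        IpPermissions=[{"IpProtocol": "tcp", "FromPort": port,
                        "ToPort": port, "IpRanges": [{"CidrIp": cidr}]}],
    )
print("AWS_SECURITY_GROUP_ID =", sg_id)
```
A newly created security group already allows all outbound traffic by default, which matches the single outbound rule above.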
#### Step 2: Record VPC Configuration for Client Machines from Host Machine
To isolate the entire evaluation stack, we run both the host machine and all client virtual machines inside a dedicated VPC.
The setup is straightforward:
1. Launch the host instance from the EC2 console and note the **VPC ID** and **Subnet ID** shown in its network settings.
2. Record the **Subnet ID** as you will need to set it as the environment variable `AWS_SUBNET_ID` on the host machine before starting the client code.
<p align="center">
<img src="./assets/pubeval_subnet.png" alt="pubeval_subnet" style="width: 80%;" />
</p>
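If you would rather read these values programmatically than from the console, here is a small boto3 sketch (replace `i-xxxxxx` with your host instance ID):
```python
import boto3

ec2 = boto3.client("ec2")
inst = ec2.describe_instances(InstanceIds=["i-xxxxxx"])["Reservations"][0]["Instances"][0]
print("VPC ID        =", inst["VpcId"])
print("AWS_SUBNET_ID =", inst["SubnetId"])
```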
### 1.3 Get AWS Access Keys & Secret Access Key
Click on **Security Credentials** from the drop-down menu under your account in the top-right corner.
@ -89,16 +182,40 @@ In the **Access keys** section, click **"Create access key"** to generate your o
<img src="./assets/pubeval5.png" alt="pubeval5" style="width: 100%;" />
</p>
If this method doesn't work, please go to **IAM** → **Users** → select your username → **Security credentials** tab → **Create access key**.
Alternatively, you can create access keys through IAM for better security practices:
1. Navigate to **IAM** in the AWS Console
2. Click on **Users** in the left sidebar
3. Select your username or create a new IAM user
4. Go to the **Security credentials** tab
5. Click **Create access key**
6. Choose the appropriate use case (e.g., "Command Line Interface (CLI)")
7. Download or copy the Access Key ID and Secret Access Key
**Note**: For production environments, it's recommended to use IAM roles instead of access keys when possible, or create dedicated IAM users with minimal required permissions rather than using root account credentials.
Similarly, you will later need to set these as environment variables on the host machine.
## 2. Environment Setup
### 2.1 Google Drive Integration
Great! Now back to the **host machine**, we can start running experiments!
All the following operations are performed on the host machine environment, under the OSWorld path.
### 2.1 Google Drive Integration (Optional)
Follow the instructions in [ACCOUNT_GUIDELINE](./ACCOUNT_GUIDELINE.md), specifically the section "Generating `credentials.json` for Public Eval". This part is necessary if using public evaluation.
You can skip this step at the debugging stage, since only 8 tasks involve Google Drive, and their policy makes the setup increasingly cumbersome.
<p align="center">
<img src="./assets/pubeval_gdrive_auth.jpg" alt="pubeval_gdrive_auth" style="width:80%;" />
</p>
### 2.2 Proxy Setup
- Register at [DataImpulse](https://dataimpulse.com/).
- Purchase a US residential IP package (approximately $1 per 1GB).
- Configure your credentials in `OSWorld/evaluation_examples/settings/proxy/dataimpulse.json`:
```json
@ -117,18 +234,35 @@ Follow the instructions in [ACCOUNT_GUIDELINE](./ACCOUNT_GUIDELINE.md), specific
]
```
We have set `proxy` to `true` in the config JSON files for proxy-sensitive tasks. OSWorld automatically wraps these tasks with a proxy when DesktopEnv's `enable_proxy=True`; other tasks are not affected.
We recommend using a proxy.
If you don't need it at all, please set `enable_proxy=False` in the experiment's `.py` file:
```
env = DesktopEnv(
...
enable_proxy=False,
...
)
```
(We have not documented the DesktopEnv interface in much detail; please read through the code to understand it.)
Note that disabling the proxy will cause some tasks under the Chrome domain to fail.
### 2.3 Set Environment Variables
```bash
export OPENAI_API_KEY_CUA="your_api_key"
export AWS_ACCESS_KEY_ID="your_access_key"
export AWS_SECRET_ACCESS_KEY="your_security_access_key"
export AWS_REGION="your_aws_region" # e.g. us-east-1
export AWS_SUBNET_ID="subnet-0a4b0c5b8f6066712"
export AWS_SECURITY_GROUP_ID="sg-08a53433e9b4abde6"
# export OPENAI_API_KEY_CUA="your_openai_api_key"   # if you use the OpenAI API
# export ANTHROPIC_API_KEY="your_anthropic_api_key" # if you use the Anthropic API
# export DASHSCOPE_API_KEY="your_dashscope_api_key" # if you use the DashScope API from Alibaba Qwen
# export DOUBAO_API_KEY="your_doubao_api_key"       # if you use the Doubao Seed API from ByteDance (UI-TARS)
# export DOUBAO_API_URL="your_doubao_api_url"
export AWS_ACCESS_KEY_ID="your_access_key"          # key we mentioned before
export AWS_SECRET_ACCESS_KEY="your_security_access_key" # key we mentioned before
export AWS_REGION="your_aws_region"                 # e.g. us-east-1; if unset, defaults to us-east-1
export AWS_SECURITY_GROUP_ID="sg-xxxx"              # the security group we mentioned before
export AWS_SUBNET_ID="subnet-xxxx"                  # the subnet we mentioned before
```
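As a quick sanity check that the keys above are actually picked up, you can run this small snippet (not part of the OSWorld codebase) before launching anything:
```python
import boto3

# Raises an error if AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are missing or invalid.
print(boto3.client("sts").get_caller_identity()["Arn"])
```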
## 3. Running Evaluations
Use the `run_multienv_xxx.py` scripts to launch tasks in parallel.
@ -136,15 +270,31 @@ Use the `run_multienv_xxx.py` scripts to launch tasks in parallel.
Example (with the OpenAI CUA agent):
```bash
# --client_password: the password you configured for the client machines
# Run OpenAI CUA
python run_multienv_openaicua.py \
--headless \
--observation_type screenshot \
--model computer-use-preview \
--result_dir ./results_all \
--result_dir ./results_operator \
--test_all_meta_path evaluation_examples/test_all.json \
--region us-east-1 \
--max_steps 150 \
--num_envs 5
--max_steps 50 \
--num_envs 5 \
--client_password osworld-public-evaluation
# Run Anthropic via AWS Bedrock; modify the agent if you want to use the Anthropic API endpoint directly
python run_multienv_claude.py \
--headless \
--observation_type screenshot \
--action_space claude_computer_use \
--model claude-4-sonnet-20250514 \
--result_dir ./results_claude \
--test_all_meta_path evaluation_examples/test_all.json \
--max_steps 50 \
--num_envs 5 \
--provider_name aws \
--client_password osworld-public-evaluation
```
Key Parameters:
@ -155,6 +305,8 @@ Key Parameters:
- `--test_all_meta_path`: Path to the test set metadata
- `--region`: AWS region
Usually the entry scripts are named `run_multienv_xxx.py` under the main folder, and the agent implementations live under the `mm_agents` folder.
Add parameters according to your needs.
## 4. Viewing Results
@ -170,6 +322,23 @@ Then, open your Host's **public IP** on port `8080` in a browser. (e.g. `http://<
For more, see: [MONITOR_README](./monitor/README.md)
<p align="center">
<img src="./assets/pubeval_monitor1.jpg" alt="pubeval_monitor" style="width:80%;" />
</p>
<p align="center">
<img src="./assets/pubeval_monitor2.jpg" alt="pubeval_monitor" style="width:80%;" />
</p>
### 4.2 VNC Remote Desktop Access
We pre-install VNC on every virtual machine so you can watch it while experiments are running.
You can access it via VNC at `http://<client-public-ip>:5910/vnc.html`.
The default password in our AMI is `osworld-public-evaluation`, set to prevent attacks.
## 5. Contact the team to update leaderboard and fix errors (optional)
If you want your results to be displayed on the leaderboard, please send a message to the OSWorld leaderboard maintainers (tianbaoxiexxx@gmail.com, yuanmengqi732@gmail.com) and open a pull request. We can update the results in the self-reported section.
If you want your results to be verified and displayed in the verified leaderboard section, we need you to schedule a meeting with us to run your agent code on our side to obtain results and have us report them. Alternatively, if you are from a trusted institution, you can share your monitor and trajectories with us.
If you discover new errors or the environment has undergone some changes, please contact us via GitHub issues or email.

147
README.md

@ -33,9 +33,10 @@
## 📢 Updates
- 2025-07-28: Introducing **OSWorld-Verified**! We have made major updates, fixed several issues reported by the community, added more AWS support (parallelization can reduce evaluation time to within 1 hour!), and made the benchmark signals more effective. Check out more in the [report](https://xlang.ai/blog/osworld-verified). We have run new model results on the latest version and updated them on the [official website](https://os-world.github.io/). Please compare your OSWorld results with the new benchmark results when running the latest version.
- 2025-05-01: If you need pre-downloaded files for init state setup, we downloaded for you [here](https://drive.google.com/file/d/1XlEy49otYDyBlA3O9NbR0BpPfr2TXgaD/view?usp=drive_link).
- 2024-10-22: We supported Docker🐳 for hosting virtual machines on virtualized platforms. Check below for detailed instructions!
- 2024-06-15: We refactor the code of environment part to decompose VMware Integration, and start to support other platforms such as VitualBox, AWS, Azure, etc. Hold tight!
- 2024-06-15: We refactor the code of environment part to decompose VMware Integration, and start to support other platforms such as VirtualBox, AWS, Azure, etc. Hold tight!
- 2024-04-11: We released our [paper](https://arxiv.org/abs/2404.07972), [environment and benchmark](https://github.com/xlang-ai/OSWorld), and [project page](https://os-world.github.io/). Check it out!
## 💾 Installation
@ -43,7 +44,7 @@
Suppose you are operating on a system that has not been virtualized (e.g. your desktop, laptop, bare metal machine), meaning you are not utilizing a virtualized environment like AWS, Azure, or k8s.
If this is the case, proceed with the instructions below. However, if you are on a virtualized platform, please refer to the [Docker](https://github.com/xlang-ai/OSWorld?tab=readme-ov-file#docker-server-with-kvm-support-for-the-better) section.
1. First, clone this repository and `cd` into it. Then, install the dependencies listed in `requirements.txt`. It is recommended that you use the latest version of Conda to manage the environment, but you can also choose to manually install the dependencies. Please ensure that the version of Python is >= 3.9.
1. First, clone this repository and `cd` into it. Then, install the dependencies listed in `requirements.txt`. It is recommended that you use the latest version of Conda to manage the environment, but you can also choose to manually install the dependencies. Please ensure that the version of Python is >= 3.10.
```bash
# Clone the OSWorld repository
git clone https://github.com/xlang-ai/OSWorld
@ -52,7 +53,7 @@ git clone https://github.com/xlang-ai/OSWorld
cd OSWorld
# Optional: Create a Conda environment for OSWorld
# conda create -n osworld python=3.9
# conda create -n osworld python=3.10
# conda activate osworld
# Install required dependencies
@ -64,7 +65,7 @@ Alternatively, you can install the environment without any benchmark tasks:
pip install desktop-env
```
2. Install [VMware Workstation Pro](https://www.vmware.com/products/workstation-pro/workstation-pro-evaluation.html) (for systems with Apple Chips, you should install [VMware Fusion](https://support.broadcom.com/group/ecx/productdownloads?subfamily=VMware+Fusion)) and configure the `vmrun` command. The installation process can refer to [How to install VMware Worksation Pro](desktop_env/providers/vmware/INSTALL_VMWARE.md). Verify the successful installation by running the following:
2. Install [VMware Workstation Pro](https://www.vmware.com/products/workstation-pro/workstation-pro-evaluation.html) (for systems with Apple Chips, you should install [VMware Fusion](https://support.broadcom.com/group/ecx/productdownloads?subfamily=VMware+Fusion)) and configure the `vmrun` command. The installation process can refer to [How to install VMware Workstation Pro](desktop_env/providers/vmware/INSTALL_VMWARE.md). Verify the successful installation by running the following:
```bash
vmrun -T ws list
```
@ -73,7 +74,7 @@ If the installation along with the environment variable set is successful, you w
All set! Our setup script will automatically download the necessary virtual machines and configure the environment for you.
### Docker (Server (with KVM Support for the better))
### Docker (Server with KVM Support for Better Performance)
If you are running on a non-bare metal server, or prefer not to use VMware and VirtualBox platforms, we recommend using our Docker support.
#### Prerequisite: Check if your machine supports KVM
@ -93,6 +94,11 @@ Add the following arguments when initializing `DesktopEnv`:
- `os_type`: `Ubuntu` or `Windows`, depending on the OS of the VM
> **Note**: If the experiment is interrupted abnormally (e.g., by interrupting signals), there may be residual docker containers which could affect system performance over time. Please run `docker stop $(docker ps -q) && docker rm $(docker ps -a -q)` to clean up.
### AWS
Using cloud services for parallel evaluation can significantly accelerate evaluation efficiency (can reduce evaluation time to within 1 hour through parallelization!) and can even be used as infrastructure for training.
We provide comprehensive AWS support with a Host-Client architecture that enables large-scale parallel evaluation of OSWorld tasks.
For detailed setup instructions, see [Public Evaluation Guideline](https://github.com/xlang-ai/OSWorld/blob/main/PUBLIC_EVALUATION_GUIDELINE.md) and [AWS Configuration Guide](https://github.com/xlang-ai/OSWorld/blob/main/desktop_env/providers/aws/AWS_GUIDELINE.md).
### Others
We are working on supporting more 👷. Please hold tight!
@ -100,91 +106,99 @@ We are working on supporting more 👷. Please hold tight!
## 🚀 Quick Start
Run the following minimal example to interact with the environment:
```python
from desktop_env.desktop_env import DesktopEnv

example = {
    "id": "94d95f96-9699-4208-98ba-3c3119edf9c2",
    "instruction": "I want to install Spotify on my current system. Could you please help me?",
    "config": [
        {
            "type": "execute",
            "parameters": {
                "command": [
                    "python",
                    "-c",
                    "import pyautogui; import time; pyautogui.click(960, 540); time.sleep(0.5);"
                ]
            }
        }
    ],
    "evaluator": {
        "func": "check_include_exclude",
        "result": {
            "type": "vm_command_line",
            "command": "which spotify"
        },
        "expected": {
            "type": "rule",
            "rules": {
                "include": ["spotify"],
                "exclude": ["not found"]
            }
        }
    }
}

env = DesktopEnv(action_space="pyautogui")
obs = env.reset(task_config=example)
obs, reward, done, info = env.step("pyautogui.rightClick()")
```
```bash
# Basic usage with default settings
python quickstart.py

# Customize provider and VM path
python quickstart.py --provider_name vmware --path_to_vm "path/to/your/vm.vmx"
```
You will see all the logs of the system running normally, including the successful creation of the environment, completion of setup, and successful execution of actions. In the end, you will observe a successful right-click on the screen, which means you are ready to go.
## 🧪 Experiments
### Agent Baselines
> **⚠️ Important Configuration Requirements:**
>
> * **Google Account Tasks**: Some tasks require Google account access and OAuth2.0 configuration. Please refer to [Google Account Guideline](ACCOUNT_GUIDELINE.md) for detailed setup instructions.
> * **Proxy Configuration**: Some tasks may require proxy settings to function properly (this depends on the strength of website defenses against your network location). Please refer to your system's proxy configuration documentation.
> * **Impact of Missing Configuration**: If these configurations are not properly set up, the corresponding tasks will fail to execute correctly, leading to lower evaluation scores.
If you wish to run the baseline agent used in our paper, you can execute the following command as an example under the GPT-4o pure-screenshot setting:
Set the **OPENAI_API_KEY** environment variable to your API key:
```bash
export OPENAI_API_KEY='changeme'
```
Optionally, set **OPENAI_BASE_URL** to use a custom OpenAI-compatible API endpoint
```bash
export OPENAI_BASE_URL='http://your-custom-endpoint.com/v1' # Optional: defaults to https://api.openai.com
```
Single-threaded execution (deprecated, using `vmware` provider as example)
```bash
python run.py \
--provider_name vmware \
--path_to_vm Ubuntu/Ubuntu.vmx \
--headless \
--observation_type screenshot \
--model gpt-4o \
--sleep_after_execution 3 \
--max_steps 15 \
--result_dir ./results \
--client_password password
```
Parallel execution (example showing switching provider to `docker`)
```bash
python run_multienv.py \
--provider_name docker \
--headless \
--observation_type screenshot \
--model gpt-4o \
--sleep_after_execution 3 \
--max_steps 15 \
--num_envs 10 \
--client_password password
```
The results, which include screenshots, actions, and video recordings of the agent's task completion, will be saved in the `./results` (or other `result_dir` you specified) directory in this case.
You can then run the following command to obtain the result:
```bash
python show_result.py
```
## Evaluation
### Local Evaluation
Please start by reading through the [agent interface](https://github.com/xlang-ai/OSWorld/blob/main/mm_agents/README.md) and the [environment interface](https://github.com/xlang-ai/OSWorld/blob/main/desktop_env/README.md).
Correctly implement the agent interface and import your customized version in the `run.py` or `run_multienv.py` file.
Afterward, you can execute a command similar to the one in the previous section to run the benchmark on your agent.
### Public Evaluation
If you want your results to be verified and displayed on the verified leaderboard, you need to schedule a meeting with us (current maintainer: tianbaoxiexxx@gmail.com, yuanmengqi732@gmail.com) to run your agent code on our side and have us report the results.
You need to upload and allow us to disclose your agent implementation under the OSWorld framework (you may choose not to expose your model API to the public), along with a report that allows the public to understand what's happening behind the scenes.
Alternatively, if you are from a trusted institution, you can share your monitoring data and trajectories with us.
Please carefully follow the [Public Evaluation Guideline](https://github.com/xlang-ai/OSWorld/blob/main/PUBLIC_EVALUATION_GUIDELINE.md) to get results.
## ❓ FAQ
### What is the username and password for the virtual machines?
The username and password for the virtual machines are as follows (for provider `vmware`, `virtualbox` and `docker`): we set the account credentials for Ubuntu as `user` / `password`.
For cloud service providers like `aws`, to prevent attacks due to weak passwords, we default to `osworld-public-evaluation`.
If you make further modifications, remember to set the client_password variable and pass it to DesktopEnv and Agent (if supported) when running experiments.
Some features like setting up proxy require the environment to have the client VM password to obtain sudo privileges, and for some OSWorld tasks, the agent needs the password to obtain sudo privileges to complete them.
### How to setup the account and credentials for Google and Google Drive?
See [Account Guideline](ACCOUNT_GUIDELINE.md).
### How can I configure a proxy for the VM (if I'm behind the GFW, or I don't want some of my tasks to be identified as a bot and scored lower)?
See [Proxy Guideline](PROXY_GUIDELINE.md).
### What are the running times and costs under different settings?
| Setting | Expected Time* | Budget Cost (Full Test Set/Small Test Set) |
| ------------------------------ | -------------- | ------------------------------------------ |
| GPT-4V (screenshot) | 10h | $100 ($10) |
| Gemini-ProV (screenshot) | 15h | $0 ($0) |
| Claude-3 Opus (screenshot) | 15h | $150 ($15) |
| GPT-4V (a11y tree, SoM, etc.) | 30h | $500 ($50) |
\*No environment parallelism. Calculated in April 2024.
If you want to set it up yourself, please refer to [Proxy Guideline](PROXY_GUIDELINE.md).
We also provide a pre-configured solution based on dataimpulse, please refer to [proxy-setup section in PUBLIC_EVALUATION_GUIDELINE](https://github.com/xlang-ai/OSWorld/blob/main/PUBLIC_EVALUATION_GUIDELINE.md#22-proxy-setup).
### Open Source Contributors
@ -207,3 +221,14 @@ If you find this environment useful, please consider citing our work:
primaryClass={cs.AI}
}
```
## Acknowledgement for OSWorld-Verified
Special thanks to the following institutions that provided feedback and participated in the fixes (as well as institutions that provided feedback during the process): [MoonShot AI, a.k.a. Kimi](https://www.moonshot.ai/), [Human Data](https://www.hud.so/), [OpenAI](https://openai.com/), [ByteDance Seed TARS](https://seed-tars.com/), [Anthropic](https://www.anthropic.com/), [Simular](https://www.simular.ai/), [HKU Data Intelligence Lab](https://sites.google.com/view/chaoh).
Special thanks to the following students who participated in the specific fixes: [Mengqi Yuan](https://yuanmengqi.github.io/), [Danyang Zhang](https://zdy023.github.io/), [Xinzhuang Xiong](https://thisisxxz.com/), [Zhennan Shen](https://scholar.google.com/citations?user=JPwg5MwAAAAJ&hl=en), [Zilong Zhou](https://github.com/adlsdztony), Yanxu Chen, [Jiaqi Deng](https://millank0817.github.io/), [Tianbao Xie](https://tianbaoxie.com/), Junda Chen, [Jixuan Chen](https://chenjix.github.io/), [Haoyuan Wu](https://www.linkedin.com/in/haoyuan-wu-240878291/).
Special thanks to the following students who participated in running the re-evaluation: [Mengqi Yuan](https://yuanmengqi.github.io/), [Zilong Zhou](https://github.com/adlsdztony), [Xinyuan Wang](https://xinyuanwangcs.github.io/), [Bowen Wang](https://bowenbryanwang.github.io/).
## You might also be interested
- **OSWorld-MCP**: Benchmarking MCP Tool Invocation in Computer-Use Agents. [Website](https://osworld-mcp.github.io/)

Binary files added (previews not shown): one image (168 KiB), assets/pubeval_monitor1.jpg (1015 KiB), assets/pubeval_monitor2.jpg (737 KiB), assets/pubeval_subnet.png (456 KiB).


@ -3,6 +3,7 @@ import logging
import random
from typing import Any, Dict, Optional
import time
import traceback
import requests
from desktop_env.actions import KEYBOARD_KEYS
@ -20,17 +21,41 @@ class PythonController:
self.retry_times = 3
self.retry_interval = 5
@staticmethod
def _is_valid_image_response(content_type: str, data: Optional[bytes]) -> bool:
"""Quick validation for PNG/JPEG payload using magic bytes; Content-Type is advisory.
Returns True only when bytes look like a real PNG or JPEG.
"""
if not isinstance(data, (bytes, bytearray)) or not data:
return False
# PNG magic
if len(data) >= 8 and data[:8] == b"\x89PNG\r\n\x1a\n":
return True
# JPEG magic
if len(data) >= 3 and data[:3] == b"\xff\xd8\xff":
return True
# If server explicitly marks as image, accept as a weak fallback (some environments strip magic)
if content_type and ("image/png" in content_type or "image/jpeg" in content_type or "image/jpg" in content_type):
return True
return False
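# Illustrative behavior (editor's sketch, not part of this change):
#   PythonController._is_valid_image_response("", b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)  # True: PNG magic
#   PythonController._is_valid_image_response("", b"\xff\xd8\xff\xe0rest")             # True: JPEG magic
#   PythonController._is_valid_image_response("image/png", b"")                        # False: empty payload
#   PythonController._is_valid_image_response("text/html", b"<html>oops</html>")       # False: not an image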
def get_screenshot(self) -> Optional[bytes]:
"""
Gets a screenshot (including the cursor) from the server. Returns None when no screenshot could be retrieved or an unexpected error occurred.
"""
for _ in range(self.retry_times):
for attempt_idx in range(self.retry_times):
try:
response = requests.get(self.http_server + "/screenshot")
response = requests.get(self.http_server + "/screenshot", timeout=10)
if response.status_code == 200:
logger.info("Got screenshot successfully")
return response.content
content_type = response.headers.get("Content-Type", "")
content = response.content
if self._is_valid_image_response(content_type, content):
logger.info("Got screenshot successfully")
return content
else:
logger.error("Invalid screenshot payload (attempt %d/%d).", attempt_idx + 1, self.retry_times)
logger.info("Retrying to get screenshot.")
else:
logger.error("Failed to get screenshot. Status code: %d", response.status_code)
logger.info("Retrying to get screenshot.")
@ -136,13 +161,94 @@ class PythonController:
logger.error("Failed to execute command.")
return None
def run_python_script(self, script: str) -> Optional[Dict[str, Any]]:
"""
Executes a python script on the server.
"""
payload = json.dumps({"code": script})
for _ in range(self.retry_times):
try:
response = requests.post(self.http_server + "/run_python", headers={'Content-Type': 'application/json'},
data=payload, timeout=90)
if response.status_code == 200:
return response.json()
else:
return {"status": "error", "message": "Failed to execute command.", "output": None, "error": response.json()["error"]}
except requests.exceptions.ReadTimeout:
break
except Exception:
logger.error("An error occurred while trying to execute the command: %s", traceback.format_exc())
logger.info("Retrying to execute command.")
time.sleep(self.retry_interval)
logger.error("Failed to execute command.")
return {"status": "error", "message": "Failed to execute command.", "output": "", "error": "Retry limit reached."}
def run_bash_script(self, script: str, timeout: int = 30, working_dir: Optional[str] = None) -> Optional[Dict[str, Any]]:
"""
Executes a bash script on the server.
:param script: The bash script content (can be multi-line)
:param timeout: Execution timeout in seconds (default: 30)
:param working_dir: Working directory for script execution (optional)
:return: Dictionary with status, output, error, and returncode, or None if failed
"""
payload = json.dumps({
"script": script,
"timeout": timeout,
"working_dir": working_dir
})
for _ in range(self.retry_times):
try:
response = requests.post(
self.http_server + "/run_bash_script",
headers={'Content-Type': 'application/json'},
data=payload,
timeout=timeout + 100 # Add buffer to HTTP timeout
)
if response.status_code == 200:
result = response.json()
logger.info("Bash script executed successfully with return code: %d", result.get("returncode", -1))
return result
else:
logger.error("Failed to execute bash script. Status code: %d, response: %s",
response.status_code, response.text)
logger.info("Retrying to execute bash script.")
except requests.exceptions.ReadTimeout:
logger.error("Bash script execution timed out")
return {
"status": "error",
"output": "",
"error": f"Script execution timed out after {timeout} seconds",
"returncode": -1
}
except Exception as e:
logger.error("An error occurred while trying to execute the bash script: %s", e)
logger.info("Retrying to execute bash script.")
time.sleep(self.retry_interval)
logger.error("Failed to execute bash script after %d retries.", self.retry_times)
return {
"status": "error",
"output": "",
"error": f"Failed to execute bash script after {self.retry_times} retries",
"returncode": -1
}
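# Usage sketch (editor's note; the IP and port are hypothetical):
#   controller = PythonController("10.0.0.5", 5000)
#   result = controller.run_bash_script("echo hello && uname -r", timeout=10)
#   if result and result.get("returncode") == 0:
#       print(result["output"])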
def execute_action(self, action):
"""
Executes an action on the server computer.
"""
# Handle string actions
if action in ['WAIT', 'FAIL', 'DONE']:
return
# Handle dictionary actions
if isinstance(action, dict) and action.get('action_type') in ['WAIT', 'FAIL', 'DONE']:
return
action_type = action["action_type"]
parameters = action["parameters"] if "parameters" in action else {param: action[param] for param in action if param != 'action_type'}


@ -27,7 +27,7 @@ import dotenv
# Load environment variables from .env file
dotenv.load_dotenv()
CLIENT_PASSWORD = os.getenv("CLIENT_PASSWORD", "osworld-public-evaluation") # Default password for sudo operations
PROXY_CONFIG_FILE = os.getenv("PROXY_CONFIG_FILE", "evaluation_examples/settings/proxy/dataimpulse.json") # Default proxy config file
logger = logging.getLogger("desktopenv.setup")
@ -39,7 +39,7 @@ init_proxy_pool(PROXY_CONFIG_FILE) # initialize the global proxy pool
MAX_RETRIES = 20
class SetupController:
def __init__(self, vm_ip: str, server_port: int = 5000, chromium_port: int = 9222, vlc_port: int = 8080, cache_dir: str = "cache"):
def __init__(self, vm_ip: str, server_port: int = 5000, chromium_port: int = 9222, vlc_port: int = 8080, cache_dir: str = "cache", client_password: str = "", screen_width: int = 1920, screen_height: int = 1080):
self.vm_ip: str = vm_ip
self.server_port: int = server_port
self.chromium_port: int = chromium_port
@ -48,6 +48,9 @@ class SetupController:
self.http_server_setup_root: str = f"http://{vm_ip}:{server_port}/setup"
self.cache_dir: str = cache_dir
self.use_proxy: bool = False
self.client_password: str = client_password
self.screen_width: int = screen_width
self.screen_height: int = screen_height
def reset_cache_dir(self, cache_dir: str):
self.cache_dir = cache_dir
@ -196,26 +199,62 @@ class SetupController:
path: str = f["path"]
if not os.path.exists(local_path):
logger.error(f"Setup Upload - Invalid local path ({local_path}).")
return
raise Exception(f"Setup Upload - Invalid local path ({local_path}).")
form = MultipartEncoder({
"file_path": path,
"file_data": (os.path.basename(path), open(local_path, "rb"))
})
headers = {"Content-Type": form.content_type}
logger.debug(form.content_type)
# send request to server to upload file
file_size = None
try:
logger.debug("REQUEST ADDRESS: %s", self.http_server + "/setup" + "/upload")
response = requests.post(self.http_server + "/setup" + "/upload", headers=headers, data=form)
if response.status_code == 200:
logger.info("Command executed successfully: %s", response.text)
else:
logger.error("Failed to upload file. Status code: %s", response.text)
except requests.exceptions.RequestException as e:
logger.error("An error occurred while trying to send the request: %s", e)
file_size = os.path.getsize(local_path)
except Exception:
pass
max_retries = 3
last_error: Optional[Exception] = None
for attempt in range(max_retries):
try:
logger.info(
f"Uploading {os.path.basename(local_path)}{f' ({file_size} bytes)' if file_size is not None else ''} "
f"to VM at {path} (attempt {attempt + 1}/{max_retries})"
)
logger.debug("REQUEST ADDRESS: %s", self.http_server + "/setup" + "/upload")
# Open the file inside each attempt to ensure fresh stream position
with open(local_path, "rb") as fp:
form = MultipartEncoder({
"file_path": path,
"file_data": (os.path.basename(path), fp)
})
headers = {"Content-Type": form.content_type}
logger.debug(form.content_type)
# Explicit connect/read timeout to avoid hanging forever
response = requests.post(
self.http_server + "/setup" + "/upload",
headers=headers,
data=form,
timeout=(10, 600)
)
if response.status_code == 200:
logger.info(f"File uploaded successfully: {path}")
logger.debug("Upload response: %s", response.text)
last_error = None
break
else:
msg = f"Failed to upload file {path}. Status code: {response.status_code}, Response: {response.text}"
logger.error(msg)
last_error = requests.RequestException(msg)
except requests.exceptions.RequestException as e:
last_error = e
logger.error(f"Upload attempt {attempt + 1} failed for {path}: {e}")
# Exponential backoff between retries
if attempt < max_retries - 1:
time.sleep(2 ** attempt)
if last_error is not None:
raise last_error
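# Backoff illustration (editor's note): with max_retries = 3, a failed first
# attempt sleeps 2**0 = 1 s, a failed second attempt sleeps 2**1 = 2 s, and a
# third failure re-raises `last_error` to the caller.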
def _change_wallpaper_setup(self, path: str):
if not path:
@ -298,6 +337,31 @@ class SetupController:
terminates: bool = False
nb_failings = 0
def replace_screen_env_in_command(command):
password = self.client_password
width = self.screen_width
height = self.screen_height
width_half = str(width // 2)
height_half = str(height // 2)
new_command_list = []
new_command = ""
if isinstance(command, str):
new_command = command.replace("{CLIENT_PASSWORD}", password)
new_command = new_command.replace("{SCREEN_WIDTH_HALF}", width_half)
new_command = new_command.replace("{SCREEN_HEIGHT_HALF}", height_half)
new_command = new_command.replace("{SCREEN_WIDTH}", str(width))
new_command = new_command.replace("{SCREEN_HEIGHT}", str(height))
return new_command
else:
for item in command:
item = item.replace("{CLIENT_PASSWORD}", password)
item = item.replace("{SCREEN_WIDTH_HALF}", width_half)
item = item.replace("{SCREEN_HEIGHT_HALF}", height_half)
item = item.replace("{SCREEN_WIDTH}", str(width))
item = item.replace("{SCREEN_HEIGHT}", str(height))
new_command_list.append(item)
return new_command_list
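# Worked example (editor's note): with client_password="password" on a 1920x1080
# screen,
#   replace_screen_env_in_command("echo {CLIENT_PASSWORD} | sudo -S xrandr -s {SCREEN_WIDTH}x{SCREEN_HEIGHT}")
# returns "echo password | sudo -S xrandr -s 1920x1080"; list commands are
# substituted element by element, with {SCREEN_WIDTH_HALF}/{SCREEN_HEIGHT_HALF}
# expanding to 960/540.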
command = replace_screen_env_in_command(command)
payload = json.dumps({"command": command, "shell": shell})
headers = {"Content-Type": "application/json"}
@ -445,7 +509,7 @@ class SetupController:
except requests.exceptions.RequestException as e:
logger.error("An error occurred while trying to send the request: %s", e)
def _proxy_setup(self, client_password: str = CLIENT_PASSWORD):
def _proxy_setup(self, client_password: str = ""):
"""Setup system-wide proxy configuration using proxy pool
Args:
@ -749,106 +813,108 @@ class SetupController:
def _update_browse_history_setup(self, **config):
cache_path = os.path.join(self.cache_dir, "history_new.sqlite")
db_url = "https://drive.usercontent.google.com/u/0/uc?id=1Lv74QkJYDWVX0RIgg0Co-DUcoYpVL0oX&export=download" # google drive
db_url = "https://huggingface.co/datasets/xlangai/ubuntu_osworld_file_cache/resolve/main/chrome/44ee5668-ecd5-4366-a6ce-c1c9b8d4e938/history_empty.sqlite?download=true"
if not os.path.exists(cache_path):
max_retries = 3
downloaded = False
e = None
for i in range(max_retries):
try:
response = requests.get(db_url, stream=True)
response.raise_for_status()
max_retries = 3
downloaded = False
e = None
for i in range(max_retries):
try:
response = requests.get(db_url, stream=True)
response.raise_for_status()
with open(cache_path, 'wb') as f:
for chunk in response.iter_content(chunk_size=8192):
if chunk:
f.write(chunk)
logger.info("File downloaded successfully")
downloaded = True
break
with open(cache_path, 'wb') as f:
for chunk in response.iter_content(chunk_size=8192):
if chunk:
f.write(chunk)
logger.info("File downloaded successfully")
downloaded = True
break
except requests.RequestException as e:
logger.error(
f"Failed to download {db_url} caused by {e}. Retrying... ({max_retries - i - 1} attempts left)")
if not downloaded:
raise requests.RequestException(f"Failed to download {db_url}. No retries left. Error: {e}")
except requests.RequestException as e:
logger.error(
f"Failed to download {db_url} caused by {e}. Retrying... ({max_retries - i - 1} attempts left)")
if not downloaded:
raise requests.RequestException(f"Failed to download {db_url}. No retries left. Error: {e}")
else:
logger.info("File already exists in cache directory")
# copy a new history file in the tmp folder
db_path = cache_path
with tempfile.TemporaryDirectory() as tmp_dir:
db_path = os.path.join(tmp_dir, "history_empty.sqlite")
shutil.copy(cache_path, db_path)
history = config['history']
history = config['history']
for history_item in history:
url = history_item['url']
title = history_item['title']
visit_time = datetime.now() - timedelta(seconds=history_item['visit_time_from_now_in_seconds'])
for history_item in history:
url = history_item['url']
title = history_item['title']
visit_time = datetime.now() - timedelta(seconds=history_item['visit_time_from_now_in_seconds'])
# Chrome use ms from 1601-01-01 as timestamp
epoch_start = datetime(1601, 1, 1)
chrome_timestamp = int((visit_time - epoch_start).total_seconds() * 1000000)
# Chrome use ms from 1601-01-01 as timestamp
epoch_start = datetime(1601, 1, 1)
chrome_timestamp = int((visit_time - epoch_start).total_seconds() * 1000000)
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
cursor.execute('''
INSERT INTO urls (url, title, visit_count, typed_count, last_visit_time, hidden)
VALUES (?, ?, ?, ?, ?, ?)
''', (url, title, 1, 0, chrome_timestamp, 0))
cursor.execute('''
INSERT INTO urls (url, title, visit_count, typed_count, last_visit_time, hidden)
VALUES (?, ?, ?, ?, ?, ?)
''', (url, title, 1, 0, chrome_timestamp, 0))
url_id = cursor.lastrowid
url_id = cursor.lastrowid
cursor.execute('''
INSERT INTO visits (url, visit_time, from_visit, transition, segment_id, visit_duration)
VALUES (?, ?, ?, ?, ?, ?)
''', (url_id, chrome_timestamp, 0, 805306368, 0, 0))
cursor.execute('''
INSERT INTO visits (url, visit_time, from_visit, transition, segment_id, visit_duration)
VALUES (?, ?, ?, ?, ?, ?)
''', (url_id, chrome_timestamp, 0, 805306368, 0, 0))
conn.commit()
conn.close()
conn.commit()
conn.close()
logger.info('Fake browsing history added successfully.')
logger.info('Fake browsing history added successfully.')
controller = PythonController(self.vm_ip, self.server_port)
controller = PythonController(self.vm_ip, self.server_port)
# get the path of the history file according to the platform
os_type = controller.get_vm_platform()
# get the path of the history file according to the platform
os_type = controller.get_vm_platform()
if os_type == 'Windows':
chrome_history_path = controller.execute_python_command(
"""import os; print(os.path.join(os.getenv('USERPROFILE'), "AppData", "Local", "Google", "Chrome", "User Data", "Default", "History"))""")[
'output'].strip()
elif os_type == 'Darwin':
chrome_history_path = controller.execute_python_command(
"""import os; print(os.path.join(os.getenv('HOME'), "Library", "Application Support", "Google", "Chrome", "Default", "History"))""")[
'output'].strip()
elif os_type == 'Linux':
if "arm" in platform.machine():
if os_type == 'Windows':
chrome_history_path = controller.execute_python_command(
"import os; print(os.path.join(os.getenv('HOME'), 'snap', 'chromium', 'common', 'chromium', 'Default', 'History'))")[
"""import os; print(os.path.join(os.getenv('USERPROFILE'), "AppData", "Local", "Google", "Chrome", "User Data", "Default", "History"))""")[
'output'].strip()
else:
elif os_type == 'Darwin':
chrome_history_path = controller.execute_python_command(
"import os; print(os.path.join(os.getenv('HOME'), '.config', 'google-chrome', 'Default', 'History'))")[
"""import os; print(os.path.join(os.getenv('HOME'), "Library", "Application Support", "Google", "Chrome", "Default", "History"))""")[
'output'].strip()
else:
raise Exception('Unsupported operating system')
form = MultipartEncoder({
"file_path": chrome_history_path,
"file_data": (os.path.basename(chrome_history_path), open(db_path, "rb"))
})
headers = {"Content-Type": form.content_type}
logger.debug(form.content_type)
# send request to server to upload file
try:
logger.debug("REQUEST ADDRESS: %s", self.http_server + "/setup" + "/upload")
response = requests.post(self.http_server + "/setup" + "/upload", headers=headers, data=form)
if response.status_code == 200:
logger.info("Command executed successfully: %s", response.text)
elif os_type == 'Linux':
if "arm" in platform.machine():
chrome_history_path = controller.execute_python_command(
"import os; print(os.path.join(os.getenv('HOME'), 'snap', 'chromium', 'common', 'chromium', 'Default', 'History'))")[
'output'].strip()
else:
chrome_history_path = controller.execute_python_command(
"import os; print(os.path.join(os.getenv('HOME'), '.config', 'google-chrome', 'Default', 'History'))")[
'output'].strip()
else:
logger.error("Failed to upload file. Status code: %s", response.text)
except requests.exceptions.RequestException as e:
logger.error("An error occurred while trying to send the request: %s", e)
raise Exception('Unsupported operating system')
self._execute_setup(["sudo chown -R user:user /home/user/.config/google-chrome/Default/History"], shell=True)
form = MultipartEncoder({
"file_path": chrome_history_path,
"file_data": (os.path.basename(chrome_history_path), open(db_path, "rb"))
})
headers = {"Content-Type": form.content_type}
logger.debug(form.content_type)
# send request to server to upload file
try:
logger.debug("REQUEST ADDRESS: %s", self.http_server + "/setup" + "/upload")
response = requests.post(self.http_server + "/setup" + "/upload", headers=headers, data=form)
if response.status_code == 200:
logger.info("Command executed successfully: %s", response.text)
else:
logger.error("Failed to upload file. Status code: %s", response.text)
except requests.exceptions.RequestException as e:
logger.error("An error occurred while trying to send the request: %s", e)
self._execute_setup(["sudo chown -R user:user /home/user/.config/google-chrome/Default/History"], shell=True)


@ -3,6 +3,7 @@ from __future__ import annotations
import logging
import os
import time
import re
from typing import Callable, Any, Optional, Tuple
from typing import List, Dict, Union
@ -19,6 +20,77 @@ Metric = Callable[[Any, Any], float]
Getter = Callable[[gym.Env, Dict[str, Any]], Any]
MAX_RETRIES = 5 # Maximum retries for environment setup
def _fix_pyautogui_less_than_bug(command: str) -> str:
"""
Fix PyAutoGUI '<' character bug by converting it to hotkey("shift", ',') calls.
This fixes the known PyAutoGUI issue where typing '<' produces '>' instead.
References:
- https://github.com/asweigart/pyautogui/issues/198
- https://github.com/xlang-ai/OSWorld/issues/257
Args:
command (str): The original pyautogui command
Returns:
str: The fixed command with '<' characters handled properly
"""
# Pattern to match press('<') or press('\u003c') calls
press_pattern = r'pyautogui\.press\(["\'](?:<|\\u003c)["\']\)'
# Handle press('<') calls
def replace_press_less_than(match):
return 'pyautogui.hotkey("shift", ",")'
# First handle press('<') calls
command = re.sub(press_pattern, replace_press_less_than, command)
# Pattern to match typewrite calls with quoted strings
typewrite_pattern = r'pyautogui\.typewrite\((["\'])(.*?)\1\)'
# Then handle typewrite calls
def process_typewrite_match(match):
quote_char = match.group(1)
content = match.group(2)
# Preprocess: Try to decode Unicode escapes like \u003c to actual '<'
# This handles cases where '<' is represented as escaped Unicode
try:
# Attempt to decode unicode escapes
decoded_content = content.encode('utf-8').decode('unicode_escape')
content = decoded_content
except UnicodeDecodeError:
# If decoding fails, proceed with original content to avoid breaking existing logic
pass  # Graceful degradation: fall back to the original content if decoding fails
# Check if content contains '<'
if '<' not in content:
return match.group(0)
# Split by '<' and rebuild
parts = content.split('<')
result_parts = []
for i, part in enumerate(parts):
if i == 0:
# First part
if part:
result_parts.append(f"pyautogui.typewrite({quote_char}{part}{quote_char})")
else:
# Add hotkey for '<' and then typewrite for the rest
result_parts.append('pyautogui.hotkey("shift", ",")')
if part:
result_parts.append(f"pyautogui.typewrite({quote_char}{part}{quote_char})")
return '; '.join(result_parts)
command = re.sub(typewrite_pattern, process_typewrite_match, command)
return command
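As a quick sanity check of the helper above (a sketch; the inputs are illustrative, not taken from this diff), the rewrite replaces each '<' with a shift-comma hotkey, which produces '<' on a US keyboard layout:
# Hypothetical inputs and the rewrites _fix_pyautogui_less_than_bug produces:
print(_fix_pyautogui_less_than_bug("pyautogui.press('<')"))
# -> pyautogui.hotkey("shift", ",")
print(_fix_pyautogui_less_than_bug("pyautogui.typewrite('a<b')"))
# -> pyautogui.typewrite('a'); pyautogui.hotkey("shift", ","); pyautogui.typewrite('b')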
class DesktopEnv(gym.Env):
"""
@@ -30,14 +102,15 @@ class DesktopEnv(gym.Env):
region: str = None,
path_to_vm: str = None,
snapshot_name: str = "init_state",
action_space: str = "computer_13",
action_space: str = "pyautogui",
cache_dir: str = "cache",
screen_size: Tuple[int] = (1920, 1080),
screen_size: Tuple[int] = (int(os.environ.get("SCREEN_WIDTH", 1920)), int(os.environ.get("SCREEN_HEIGHT", 1080))),
headless: bool = False,
require_a11y_tree: bool = True,
require_terminal: bool = False,
os_type: str = "Ubuntu",
enable_proxy: bool = False,
client_password: str = "",
):
"""
Args:
@@ -59,6 +132,16 @@
self.region = region
self.provider_name = provider_name
self.enable_proxy = enable_proxy # Store proxy enablement setting
if client_password == "":
if self.provider_name == "aws":
self.client_password = "osworld-public-evaluation"
else:
self.client_password = "password"
else:
self.client_password = client_password
self.screen_width = screen_size[0]
self.screen_height = screen_size[1]
# Default
self.server_port = 5000
@@ -75,7 +158,7 @@
# Track whether environment has been used (step/setup) to optimize snapshot revert
# docker, aws, gcp, azure are always unused as the emulator starts from a clean state
# vmware, virtualbox are always used as the emulator starts from a dirty state
if self.provider_name in {"docker", "aws", "gcp", "azure"}:
if self.provider_name in {"docker", "aws", "gcp", "azure", "aliyun", "volcengine"}:
self.is_environment_used = False
elif self.provider_name in {"vmware", "virtualbox"}:
self.is_environment_used = True
@@ -87,56 +170,54 @@
self.path_to_vm = os.path.abspath(os.path.expandvars(os.path.expanduser(path_to_vm))) \
if provider_name in {"vmware", "virtualbox"} else path_to_vm
else:
self.path_to_vm = self.manager.get_vm_path(os_type=self.os_type, region=region, screen_size=(self.screen_width, self.screen_height))
self.snapshot_name = snapshot_name
self.cache_dir_base: str = cache_dir
# todo: add the logic to get the screen size from the VM
self.headless = headless
self.require_a11y_tree = require_a11y_tree
self.require_terminal = require_terminal
self.path_to_vm = self.manager.get_vm_path(os_type=self.os_type, region=region)
try:
self.snapshot_name = snapshot_name
self.cache_dir_base: str = cache_dir
# todo: add the logic to get the screen size from the VM
self.headless = headless
self.require_a11y_tree = require_a11y_tree
self.require_terminal = require_terminal
# Initialize emulator and controller
logger.info("Initializing...")
self._start_emulator()
# Initialize emulator and controller
if provider_name != "docker": # Check if this is applicable to other VM providers
logger.info("Initializing...")
self._start_emulator()
# mode: human or machine
self.instruction = None
assert action_space in ["computer_13", "pyautogui", "claude_computer_use", "autoglm_computer_use"]
self.action_space = action_space # todo: refactor it to the ActType
# mode: human or machine
self.instruction = None
assert action_space in ["computer_13", "pyautogui"]
self.action_space = action_space # todo: refactor it to the ActType
# episodic state, such as counters, will be updated or reset
# when calling self.reset()
self._traj_no: int = -1
self._step_no: int = 0
self.action_history: List[Dict[str, any]] = []
# episodic state, such as counters, will be updated or reset
# when calling self.reset()
self._traj_no: int = -1
self._step_no: int = 0
self.action_history: List[Dict[str, any]] = []
except Exception as e:
logger.error(f"Failed to initialize DesktopEnv: {e}")
# If initialization fails, we should clean up the VM
try:
self.close()
self.manager.delete_vm(self.path_to_vm, self.region)
logger.info(f"Cleaned up VM {self.path_to_vm}.")
except Exception as cleanup_error:
logger.error(f"Failed to clean up VM {self.path_to_vm}: {cleanup_error}")
raise
def _start_emulator(self):
# Power on the virtual machine
self.provider.start_emulator(self.path_to_vm, self.headless, self.os_type)
try:
# Power on the virtual machine
self.provider.start_emulator(self.path_to_vm, self.headless, self.os_type)
# Get the ip from the virtual machine, and setup the controller
vm_ip_ports = self.provider.get_ip_address(self.path_to_vm).split(':')
self.vm_ip = vm_ip_ports[0]
if len(vm_ip_ports) > 1:
self.server_port = int(vm_ip_ports[1])
self.chromium_port = int(vm_ip_ports[2])
self.vnc_port = int(vm_ip_ports[3])
self.vlc_port = int(vm_ip_ports[4])
self.controller = PythonController(vm_ip=self.vm_ip, server_port=self.server_port)
self.setup_controller = SetupController(vm_ip=self.vm_ip, server_port=self.server_port, chromium_port=self.chromium_port, vlc_port=self.vlc_port, cache_dir=self.cache_dir_base)
# Get the ip from the virtual machine, and setup the controller
vm_ip_ports = self.provider.get_ip_address(self.path_to_vm).split(':')
self.vm_ip = vm_ip_ports[0]
# Get the ports from the virtual machine (for Docker provider only)
if len(vm_ip_ports) > 1:
self.server_port = int(vm_ip_ports[1])
self.chromium_port = int(vm_ip_ports[2])
self.vnc_port = int(vm_ip_ports[3])
self.vlc_port = int(vm_ip_ports[4])
self.controller = PythonController(vm_ip=self.vm_ip, server_port=self.server_port)
self.setup_controller = SetupController(vm_ip=self.vm_ip, server_port=self.server_port, chromium_port=self.chromium_port, vlc_port=self.vlc_port, cache_dir=self.cache_dir_base, client_password=self.client_password, screen_width=self.screen_width, screen_height=self.screen_height)
except Exception as e:
try:
self.provider.stop_emulator(self.path_to_vm)
except Exception as stop_err:
logger.warning(f"Cleanup after interrupt failed: {stop_err}")
raise
def _revert_to_snapshot(self):
# Revert to a certain snapshot of the virtual machine, and refresh the path and IP of the VM
@@ -169,7 +250,10 @@
self.action_history.clear()
for attempt in range(MAX_RETRIES):
# Check and handle proxy requirement changes BEFORE starting emulator
# Only revert to snapshot if environment has been used (step/setup)
# This optimization is especially important for cloud providers like AWS
# where unnecessary snapshot operations are costly and time-consuming
if task_config is not None:
# Only consider task proxy requirement if proxy is enabled at system level
task_use_proxy = task_config.get("proxy", False) and self.enable_proxy
@@ -179,10 +263,7 @@
if task_use_proxy != self.current_use_proxy:
# kept because get_info_from_website depends on this
self.current_use_proxy = task_use_proxy
# Only revert to snapshot if environment has been used (step/setup)
# This optimization is especially important for cloud providers like AWS
# where unnecessary snapshot operations are costly and time-consuming
if self.is_environment_used:
logger.info("Environment has been used, reverting to snapshot {}...".format(self.snapshot_name))
self._revert_to_snapshot()
@@ -197,7 +278,7 @@
if task_config is not None:
if task_config.get("proxy", False) and self.enable_proxy:
# If using proxy and proxy is enabled, set up the proxy configuration
self.setup_controller._proxy_setup()
self.setup_controller._proxy_setup(self.client_password)
self._set_task_info(task_config)
self.setup_controller.reset_cache_dir(self.cache_dir)
logger.info("Setting up environment...")
@@ -307,27 +388,34 @@
reward = 0 # todo: Define reward calculation for each example
done = False # todo: Define episode termination condition for each example
info = {}
logger.info(f"Step {self._step_no} in trajectory {self._traj_no} with action: {action}")
# handle the special actions
# use .get to avoid a KeyError for dict actions without 'action_type'
if action in ['WAIT', 'FAIL', 'DONE'] or (type(action) == dict and action.get('action_type') in ['WAIT', 'FAIL', 'DONE']):
if action == 'WAIT':
if action == 'WAIT' or (type(action) == dict and action.get('action_type') == 'WAIT'):
time.sleep(pause)
elif action == 'FAIL':
elif action == 'FAIL' or (type(action) == dict and action.get('action_type') == 'FAIL'):
done = True
info = {"fail": True}
elif action == 'DONE':
elif action == 'DONE' or (type(action) == dict and action.get('action_type') == 'DONE'):
done = True
info = {"done": True}
if self.action_space == "computer_13":
# the set of all possible actions defined in the action representation
self.controller.execute_action(action)
elif self.action_space == "pyautogui":
if action in ['WAIT', 'FAIL', 'DONE']:
elif self.action_space == "pyautogui" or self.action_space == "claude_computer_use":
if action in ['WAIT', 'FAIL', 'DONE'] or (type(action) == dict and action.get('action_type') in ['WAIT', 'FAIL', 'DONE']):
self.controller.execute_action(action)
else:
# the set of all possible python commands inside `pyautogui`
self.controller.execute_python_command(action)
if type(action) == str:
# Fix PyAutoGUI '<' character bug before execution
fixed_command = _fix_pyautogui_less_than_bug(action)
self.controller.execute_python_command(fixed_command)
elif type(action) == dict:
# Fix PyAutoGUI '<' character bug before execution
fixed_command = _fix_pyautogui_less_than_bug(action['command'])
self.controller.execute_python_command(fixed_command)
time.sleep(pause)
observation = self._get_obs()
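For reference, step() in the "pyautogui" and "claude_computer_use" action spaces now accepts both command strings and dict-style actions; a minimal sketch of the accepted payloads (values are illustrative):
# Plain pyautogui command string, executed via execute_python_command
env.step("pyautogui.click(100, 200)")
# Special action as a dict; dict actions carry an 'action_type' key
env.step({"action_type": "WAIT"})
# Regular dict action: the 'command' field holds the pyautogui code to run
env.step({"action_type": "click", "command": "pyautogui.click(100, 200)"})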
@@ -340,19 +428,22 @@
"""
postconfig = self.evaluator.get("postconfig", [])
self.setup_controller.setup(postconfig)
self.setup_controller.setup(postconfig, self.enable_proxy)
# Mark environment as used if there were postconfig setup operations
if postconfig:
self.is_environment_used = True
if self.evaluator['func'] == "infeasible":
if len(self.action_history) > 0 and self.action_history[-1] == "FAIL":
return 1
else:
return 0
if len(self.action_history) > 0:
last_action = self.action_history[-1]
if last_action == "FAIL" or (type(last_action) == dict and last_action.get('action_type') == 'FAIL'):
return 1
return 0
else:
if len(self.action_history) > 0 and self.action_history[-1] == "FAIL":
return 0
if len(self.action_history) > 0:
last_action = self.action_history[-1]
if last_action == "FAIL" or (type(last_action) == dict and last_action.get('action_type') == 'FAIL'):
return 0
if type(self.metric) == list:
# Multiple metrics to evaluate whether the task is successfully completed


@@ -0,0 +1,499 @@
from __future__ import annotations
import logging
import os
import time
import re
from typing import Callable, Any, Optional, Tuple
from typing import List, Dict, Union
import gymnasium as gym
from desktop_env.controllers.python import PythonController
from desktop_env.controllers.setup import SetupController
from desktop_env.evaluators import metrics, getters
from desktop_env.providers import create_vm_manager_and_provider
logger = logging.getLogger("desktopenv.env")
Metric = Callable[[Any, Any], float]
Getter = Callable[[gym.Env, Dict[str, Any]], Any]
MAX_RETRIES = 5 # Maximum retries for environment setup
def _fix_pyautogui_less_than_bug(command: str) -> str:
"""
Fix PyAutoGUI '<' character bug by converting it to hotkey("shift", ',') calls.
This fixes the known PyAutoGUI issue where typing '<' produces '>' instead.
References:
- https://github.com/asweigart/pyautogui/issues/198
- https://github.com/xlang-ai/OSWorld/issues/257
Args:
command (str): The original pyautogui command
Returns:
str: The fixed command with '<' characters handled properly
"""
# Pattern to match press('<') or press('\u003c') calls
press_pattern = r'pyautogui\.press\(["\'](?:<|\\u003c)["\']\)'
# Handle press('<') calls
def replace_press_less_than(match):
return 'pyautogui.hotkey("shift", ",")'
# First handle press('<') calls
command = re.sub(press_pattern, replace_press_less_than, command)
# Pattern to match typewrite calls with quoted strings
typewrite_pattern = r'pyautogui\.typewrite\((["\'])(.*?)\1\)'
# Then handle typewrite calls
def process_typewrite_match(match):
quote_char = match.group(1)
content = match.group(2)
# Preprocess: Try to decode Unicode escapes like \u003c to actual '<'
# This handles cases where '<' is represented as escaped Unicode
try:
# Attempt to decode unicode escapes
decoded_content = content.encode('utf-8').decode('unicode_escape')
content = decoded_content
except UnicodeDecodeError:
# If decoding fails, proceed with original content to avoid breaking existing logic
pass  # Graceful degradation: fall back to the original content if decoding fails
# Check if content contains '<'
if '<' not in content:
return match.group(0)
# Split by '<' and rebuild
parts = content.split('<')
result_parts = []
for i, part in enumerate(parts):
if i == 0:
# First part
if part:
result_parts.append(f"pyautogui.typewrite({quote_char}{part}{quote_char})")
else:
# Add hotkey for '<' and then typewrite for the rest
result_parts.append('pyautogui.hotkey("shift", ",")')
if part:
result_parts.append(f"pyautogui.typewrite({quote_char}{part}{quote_char})")
return '; '.join(result_parts)
command = re.sub(typewrite_pattern, process_typewrite_match, command)
return command
class DesktopEnv(gym.Env):
"""
DesktopEnv with OpenAI Gym interface. It provides a desktop environment for setting up and evaluating desktop automation tasks.
"""
def __init__(
self,
provider_name: str = "vmware",
region: str = None,
path_to_vm: str = None,
snapshot_name: str = "init_state",
action_space: str = "pyautogui",
cache_dir: str = "cache",
screen_size: Tuple[int] = (int(os.environ.get("SCREEN_WIDTH", 1920)), int(os.environ.get("SCREEN_HEIGHT", 1080))),
headless: bool = False,
require_a11y_tree: bool = True,
require_terminal: bool = False,
os_type: str = "Ubuntu",
enable_proxy: bool = False,
client_password: str = "",
):
"""
Args:
provider_name (str): virtualization provider name, defaults to "vmware"
region (str): the region in which to allocate machines; applies to cloud providers, defaults to "us-east-1"
path_to_vm (str): path to the .vmx file
action_space (str): "computer_13" | "pyautogui" | "claude_computer_use" | "autoglm_computer_use"
snapshot_name (str): snapshot name to revert to, defaults to "init_state"
cache_dir (str): cache directory for task-related files such as
reference files used in evaluation
screen_size (Tuple[int]): screen size of the VM
headless (bool): whether to run the VM in headless mode
require_a11y_tree (bool): whether to require the accessibility tree
require_terminal (bool): whether to require terminal output
os_type (str): operating system type, defaults to "Ubuntu"
enable_proxy (bool): whether to enable proxy support, defaults to False
client_password (str): VM user password; when empty, defaults to "osworld-public-evaluation" on AWS and "password" otherwise
"""
# Initialize VM manager and virtualization provider
self.region = region
self.provider_name = provider_name
self.enable_proxy = enable_proxy # Store proxy enablement setting
if client_password == "":
if self.provider_name == "aws":
self.client_password = "osworld-public-evaluation"
else:
self.client_password = "password"
else:
self.client_password = client_password
self.screen_width = screen_size[0]
self.screen_height = screen_size[1]
# Default
self.server_port = 5000
self.chromium_port = 9222
self.vnc_port = 8006
self.vlc_port = 8080
# Initialize with default (no proxy) provider
self.current_use_proxy = False
self.manager, self.provider = None, None
self.os_type = os_type
self.path_to_vm = path_to_vm
# Track whether environment has been used (step/setup) to optimize snapshot revert
# docker, aws, gcp, azure, aliyun, volcengine are always unused as the emulator starts from a clean state
# vmware, virtualbox are always used as the emulator starts from a dirty state
if self.provider_name in {"docker", "aws", "gcp", "azure", "aliyun", "volcengine"}:
self.is_environment_used = False
elif self.provider_name in {"vmware", "virtualbox"}:
self.is_environment_used = True
else:
raise ValueError(f"Invalid provider name: {self.provider_name}")
self.snapshot_name = snapshot_name
self.cache_dir_base: str = cache_dir
self.headless = headless
self.require_a11y_tree = require_a11y_tree
self.require_terminal = require_terminal
# mode: human or machine
self.instruction = None
assert action_space in ["computer_13", "pyautogui", "claude_computer_use", "autoglm_computer_use"]
self.action_space = action_space # todo: refactor it to the ActType
# episodic state, such as counters, will be updated or reset
# when calling self.reset()
self._traj_no: int = -1
self._step_no: int = 0
self.action_history: List[Dict[str, any]] = []
def start(self):
# Initialize emulator and controller
if not self.manager and not self.provider:
logger.info("Initializing...")
self.manager, self.provider = create_vm_manager_and_provider(self.provider_name, self.region, use_proxy=False)
if self.path_to_vm:
self.path_to_vm = os.path.abspath(os.path.expandvars(os.path.expanduser(self.path_to_vm))) \
if self.provider_name in {"vmware", "virtualbox"} else self.path_to_vm
else:
self.path_to_vm = self.manager.get_vm_path(os_type=self.os_type, region=self.region, screen_size=(self.screen_width, self.screen_height))
self._start_emulator()
def _start_emulator(self):
try:
# Power on the virtual machine
self.provider.start_emulator(self.path_to_vm, self.headless, self.os_type)
# Get the ip from the virtual machine, and setup the controller
vm_ip_ports = self.provider.get_ip_address(self.path_to_vm).split(':')
self.vm_ip = vm_ip_ports[0]
# Get the ports from the virtual machine (for Docker provider only)
if len(vm_ip_ports) > 1:
self.server_port = int(vm_ip_ports[1])
self.chromium_port = int(vm_ip_ports[2])
self.vnc_port = int(vm_ip_ports[3])
self.vlc_port = int(vm_ip_ports[4])
self.controller = PythonController(vm_ip=self.vm_ip, server_port=self.server_port)
self.setup_controller = SetupController(vm_ip=self.vm_ip, server_port=self.server_port, chromium_port=self.chromium_port, vlc_port=self.vlc_port, cache_dir=self.cache_dir_base, client_password=self.client_password, screen_width=self.screen_width, screen_height=self.screen_height)
except Exception as e:
try:
self.provider.stop_emulator(self.path_to_vm)
except Exception as stop_err:
logger.warning(f"Cleanup after interrupt failed: {stop_err}")
raise
def _revert_to_snapshot(self):
# Revert to a certain snapshot of the virtual machine, and refresh the path and IP of the VM,
# since these may change when the environment is backed by cloud services
path_to_vm = self.provider.revert_to_snapshot(self.path_to_vm, self.snapshot_name)
if path_to_vm and not path_to_vm == self.path_to_vm:
# path_to_vm has to be a new path
self.manager.delete_vm(self.path_to_vm, self.region)
self.manager.add_vm(path_to_vm, self.region)
self.manager.occupy_vm(path_to_vm, os.getpid(), self.region)
self.path_to_vm = path_to_vm
def _save_state(self, snapshot_name=None):
# Save the current virtual machine state to a certain snapshot name
self.provider.save_state(self.path_to_vm, snapshot_name)
def close(self):
# Close (release) the virtual machine
self.provider.stop_emulator(self.path_to_vm)
def reset(self, task_config: Optional[Dict[str, Any]] = None, seed=None, options=None) -> Dict[str, Any]:
# Reset to certain task in OSWorld
logger.info("Resetting environment...")
logger.info("Switching task...")
logger.info("Setting counters...")
self._traj_no += 1
self._step_no = 0
self.action_history.clear()
for attempt in range(MAX_RETRIES):
# Only revert to snapshot if environment has been used (step/setup)
# This optimization is especially important for cloud providers like AWS
# where unnecessary snapshot operations are costly and time-consuming
if task_config is not None:
# Only consider task proxy requirement if proxy is enabled at system level
task_use_proxy = task_config.get("proxy", False) and self.enable_proxy
if not self.enable_proxy and task_config.get("proxy", False):
logger.info("Task requires proxy but proxy is disabled at system level, ignoring proxy requirement.")
if task_use_proxy != self.current_use_proxy:
# kept because get_info_from_website depends on this
self.current_use_proxy = task_use_proxy
if self.is_environment_used:
logger.info("Environment has been used, reverting to snapshot {}...".format(self.snapshot_name))
self._revert_to_snapshot()
logger.info("Starting emulator...")
self._start_emulator()
logger.info("Emulator started.")
# Reset the usage flag after reverting
self.is_environment_used = False
else:
logger.info("Environment is clean, skipping snapshot revert (provider: {}).".format(self.provider_name))
if task_config is not None:
if task_config.get("proxy", False) and self.enable_proxy:
# If using proxy and proxy is enabled, set up the proxy configuration
self.setup_controller._proxy_setup(self.client_password)
self._set_task_info(task_config)
self.setup_controller.reset_cache_dir(self.cache_dir)
logger.info("Setting up environment...")
success = self.setup_controller.setup(self.config, task_config.get("proxy", False) and self.enable_proxy)
if success:
# Mark environment as used when setup is successfully executed
if self.config: # Only mark as used if there were actual setup operations
self.is_environment_used = True
break
else:
logger.error(
"Environment setup failed, retrying (%d/%d)...",
attempt + 1,
MAX_RETRIES,
)
time.sleep(5)
else:
break
logger.info("Environment setup complete.")
observation = self._get_obs()
return observation
def _get_obs(self):
# We provide screenshot, accessibility_tree (optional), terminal (optional), and instruction.
# can be customized and scaled
return {
"screenshot": self.controller.get_screenshot(),
"accessibility_tree": self.controller.get_accessibility_tree() if self.require_a11y_tree else None,
"terminal": self.controller.get_terminal_output() if self.require_terminal else None,
"instruction": self.instruction
}
@property
def vm_platform(self):
return self.controller.get_vm_platform()
@property
def vm_screen_size(self):
return self.controller.get_vm_screen_size()
def _set_task_info(self, task_config: Dict[str, Any]):
"""Set task info (proxy logic is handled in reset method)"""
self.task_id: str = task_config["id"]
self.cache_dir: str = os.path.join(self.cache_dir_base, self.task_id)
os.makedirs(self.cache_dir, exist_ok=True)
self.instruction = task_config["instruction"]
self.config = task_config["config"] if "config" in task_config else []
self._set_evaluator_info(task_config)
def _set_evaluator_info(self, task_config: Dict[str, Any]):
"""Set evaluator information from task config"""
if "evaluator" not in task_config:
return
# evaluator dict
# func -> metric function string, or list of metric function strings
# conj -> conjunction of multiple metrics if func is a list with length > 1, "and"/"or"
# result -> result getter config, or list of result getter configs
# expected (optional) -> expected getter config, or list of expected getter configs
# options (optional) -> metric options, or list of metric options
# if func is a list of strings, then result, expected (if present), and options (if present) must also be lists of the same length
# even if one of the metrics does not need an expected or options field, it must still appear in its list as None
self.evaluator = task_config["evaluator"]
self.metric: Metric = [getattr(metrics, func) for func in self.evaluator["func"]] \
if isinstance(self.evaluator["func"], list) \
else getattr(metrics, self.evaluator["func"])
self.metric_conj: str = self.evaluator.get("conj", "and") # take conjunction of multiple metrics
if "result" in self.evaluator and len(self.evaluator["result"]) > 0:
self.result_getter: Getter = [getattr(getters, "get_{:}".format(res["type"])) for res in
self.evaluator["result"]] \
if isinstance(self.evaluator["result"], list) \
else getattr(getters, "get_{:}".format(self.evaluator["result"]["type"]))
else:
self.result_getter = [None] * len(self.metric) \
if isinstance(self.metric, list) \
else None
if "expected" in self.evaluator and len(self.evaluator["expected"]) > 0:
self.expected_getter: Getter = [getattr(getters, "get_{:}".format(exp["type"])) if exp else None for exp in
self.evaluator["expected"]] \
if isinstance(self.evaluator["expected"], list) \
else getattr(getters, "get_{:}".format(self.evaluator["expected"]["type"]))
else:
self.expected_getter = [None] * len(self.metric) \
if isinstance(self.metric, list) \
else None
self.metric_options: Union[List[Dict[str, Any]], Dict[str, Any]] = [opt if opt else {} for opt in
self.evaluator["options"]] \
if isinstance(self.evaluator.get("options", {}), list) \
else self.evaluator["options"] \
if "options" in self.evaluator \
else [{}] * len(self.metric) \
if isinstance(self.metric, list) \
else {}
assert (not isinstance(self.evaluator["func"], list)
or (len(self.metric) == len(self.result_getter) == len(self.expected_getter) == len(
self.metric_options)))
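A hypothetical evaluator block that follows the layout described in the comment above (the getter type and field names are illustrative, not taken from this diff):
# Sketch only: "func" names a metric function, "result" configures a getter
evaluator = {
    "func": "check_json",
    "conj": "and",
    "result": {"type": "vm_file", "path": "/home/user/settings.json", "dest": "settings.json"},
    "options": {},
}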
def step(self, action, pause=2):
self._step_no += 1
self.action_history.append(action)
# Mark environment as used when step is called
self.is_environment_used = True
reward = 0 # todo: Define reward calculation for each example
done = False # todo: Define episode termination condition for each example
info = {}
logger.info(f"Step {self._step_no} in trajectory {self._traj_no} with action: {action}")
# handle the special actions
# use .get to avoid a KeyError for dict actions without 'action_type'
if action in ['WAIT', 'FAIL', 'DONE'] or (type(action) == dict and action.get('action_type') in ['WAIT', 'FAIL', 'DONE']):
if action == 'WAIT' or (type(action) == dict and action.get('action_type') == 'WAIT'):
time.sleep(pause)
elif action == 'FAIL' or (type(action) == dict and action.get('action_type') == 'FAIL'):
done = True
info = {"fail": True}
elif action == 'DONE' or (type(action) == dict and action.get('action_type') == 'DONE'):
done = True
info = {"done": True}
if self.action_space == "computer_13":
# the set of all possible actions defined in the action representation
self.controller.execute_action(action)
elif self.action_space == "pyautogui" or self.action_space == "claude_computer_use":
if action in ['WAIT', 'FAIL', 'DONE'] or (type(action) == dict and action.get('action_type') in ['WAIT', 'FAIL', 'DONE']):
self.controller.execute_action(action)
else:
# the set of all possible python commands inside `pyautogui`
if type(action) == str:
# Fix PyAutoGUI '<' character bug before execution
fixed_command = _fix_pyautogui_less_than_bug(action)
self.controller.execute_python_command(fixed_command)
elif type(action) == dict:
# Fix PyAutoGUI '<' character bug before execution
fixed_command = _fix_pyautogui_less_than_bug(action['command'])
self.controller.execute_python_command(fixed_command)
time.sleep(pause)
observation = self._get_obs()
return observation, reward, done, info
def evaluate(self):
"""
Evaluate whether the task is successfully completed.
"""
postconfig = self.evaluator.get("postconfig", [])
self.setup_controller.setup(postconfig, self.enable_proxy)
# Mark environment as used if there were postconfig setup operations
if postconfig:
self.is_environment_used = True
if self.evaluator['func'] == "infeasible":
if len(self.action_history) > 0:
last_action = self.action_history[-1]
if last_action == "FAIL" or (type(last_action) == dict and last_action.get('action_type') == 'FAIL'):
return 1
return 0
else:
if len(self.action_history) > 0:
last_action = self.action_history[-1]
if last_action == "FAIL" or (type(last_action) == dict and last_action.get('action_type') == 'FAIL'):
return 0
if type(self.metric) == list:
# Multiple metrics to evaluate whether the task is successfully completed
results = []
assert len(self.metric) == len(self.result_getter), "The number of metrics and result getters must be the same"
if "expected" in self.evaluator:
assert len(self.metric) == len(self.expected_getter), "The number of metrics and expected getters must be the same"
for idx, metric in enumerate(self.metric):
try:
config = self.evaluator["result"][idx]
result_state = self.result_getter[idx](self, config)
except FileNotFoundError:
logger.error("File not found!")
if self.metric_conj == 'and':
return 0
if "expected" in self.evaluator and self.expected_getter and self.evaluator["expected"]:
expected_state = self.expected_getter[idx](self, self.evaluator["expected"][idx])
metric: int = metric(result_state, expected_state, **self.metric_options[idx])
else:
metric: int = metric(result_state, **self.metric_options[idx])
if self.metric_conj == 'and' and float(metric) == 0.0:
return 0
elif self.metric_conj == 'or' and float(metric) == 1.0:
return 1
else:
results.append(metric)
return sum(results) / len(results) if self.metric_conj == 'and' else max(results)
else:
# Single metric to evaluate whether the task is successfully completed
try:
result_state = self.result_getter(self, self.evaluator["result"])
except FileNotFoundError:
logger.error("File not found!")
return 0
if "expected" in self.evaluator and self.expected_getter and self.evaluator["expected"]:
expected_state = self.expected_getter(self, self.evaluator["expected"])
metric: float = self.metric(result_state, expected_state, **self.metric_options)
else:
metric: float = self.metric(result_state, **self.metric_options)
return metric
def render(self, mode='rgb_array'):
if mode == 'rgb_array':
return self.controller.get_screenshot()
else:
raise ValueError('Unsupported render mode: {}'.format(mode))
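Putting the refactor together, a minimal lifecycle sketch (the VM path and task config are hypothetical; it assumes a locally configured provider):
# start() now provisions the manager/provider before the first reset()
env = DesktopEnv(provider_name="vmware", path_to_vm="/vms/ubuntu.vmx", action_space="pyautogui")
env.start()
task_config = {
    "id": "demo-task",
    "instruction": "Open a terminal.",
    "config": [],
    "evaluator": {"func": "infeasible"},
}
obs = env.reset(task_config=task_config)
obs, reward, done, info = env.step("pyautogui.hotkey('ctrl', 'alt', 't')")
score = env.evaluate()
env.close()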


@@ -16,6 +16,7 @@ from .chrome import (
get_active_tab_info,
get_enable_do_not_track,
get_enable_enhanced_safety_browsing,
get_enable_safe_browsing,
get_new_startup_page,
get_find_unpacked_extension_path,
get_data_delete_automacally,

File diff suppressed because it is too large.


@@ -1,10 +1,13 @@
import os
import logging
from typing import Dict, List, Set
from typing import Optional, Any, Union
from datetime import datetime
import requests
import pandas as pd
logger = logging.getLogger("desktopenv.getter.file")
def get_content_from_vm_file(env, config: Dict[str, Any]) -> Any:
"""
@@ -101,16 +104,42 @@ def get_vm_file(env, config: Dict[str, Any]) -> Union[Optional[str], List[Option
for i, (p, d) in enumerate(zip(paths, dests)):
_path = os.path.join(env.cache_dir, d)
file = env.controller.get_file(p)
if file is None:
try:
# Try to get file from VM
file = env.controller.get_file(p)
if file is None:
logger.warning(f"Failed to get file from VM: {p}")
if i in gives:
cache_paths.append(None)
continue
if i in gives:
cache_paths.append(_path)
# Write file with robust error handling
try:
# Ensure cache directory exists
os.makedirs(env.cache_dir, exist_ok=True)
with open(_path, "wb") as f:
f.write(file)
logger.info(f"Successfully saved file: {_path} ({len(file)} bytes)")
except IOError as e:
logger.error(f"IO error writing file {_path}: {e}")
if i in gives:
cache_paths[-1] = None # Replace the path we just added with None
except Exception as e:
logger.error(f"Unexpected error writing file {_path}: {e}")
if i in gives:
cache_paths[-1] = None
except Exception as e:
logger.error(f"Error processing file {p}: {e}")
if i in gives:
cache_paths.append(None)
continue
if i in gives:
cache_paths.append(_path)
with open(_path, "wb") as f:
f.write(file)
return cache_paths[0] if len(cache_paths)==1 else cache_paths
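A sketch of how this getter is typically driven; the config field names ("path", "dest", "gives") are inferred from the loop variables above and should be treated as hypothetical:
# "path" lists VM-side files, "dest" their cache-relative names,
# and "gives" the indices whose local paths are returned
config = {"path": ["/home/user/report.pdf"], "dest": ["report.pdf"], "gives": [0]}
local_path = get_vm_file(env, config)  # None entries now mark failed downloads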


@@ -1,6 +1,9 @@
import os
import logging
from typing import Union
logger = logging.getLogger("desktopenv.getters.info")
def get_vm_screen_size(env, config: dict) -> dict:
return env.controller.get_vm_screen_size()
@@ -14,6 +17,29 @@ def get_vm_wallpaper(env, config: dict) -> Union[str, bytes]:
_path = os.path.join(env.cache_dir, config["dest"])
content = env.controller.get_vm_wallpaper()
# Check if content is None or empty
if content is None:
logger.error("Failed to get VM wallpaper: controller returned None")
# Create an empty file to prevent downstream errors
with open(_path, "wb") as f:
f.write(b"")
return _path
if not isinstance(content, bytes):
logger.error(f"Invalid wallpaper content type: {type(content)}, expected bytes")
# Create an empty file to prevent downstream errors
with open(_path, "wb") as f:
f.write(b"")
return _path
if len(content) == 0:
logger.warning("VM wallpaper content is empty")
# Create an empty file to prevent downstream errors
with open(_path, "wb") as f:
f.write(b"")
return _path
with open(_path, "wb") as f:
f.write(content)


@@ -3,6 +3,7 @@ import os
import re
import shutil
import io
import time
from itertools import product
from typing import Any, Dict, List, Union
@@ -29,8 +30,8 @@ def is_expected_active_tab(active_tab_info: Dict[str, str], rule: Dict[str, Any]
actual_url = active_tab_info.get('url', None)
else:
actual_url = active_tab_info
print("expected_url: {}".format(expected_url))
print("actual_url: {}".format(actual_url))
logger.info("expected_url: {}".format(expected_url))
logger.info("actual_url: {}".format(actual_url))
return 1 if compare_urls(expected_url, actual_url) else 0
else:
logger.error(f"Unknown type: {match_type}")
@@ -74,25 +75,36 @@ def is_expected_url_pattern_match(result, rules) -> float:
if not result:
return 0.
if type(result) == dict:
result_url = result["url"]
print("result url: {}".format(result_url))
else:
# Extract URL from result parameter - result can be either a string URL or a dict with 'url' field
if isinstance(result, str):
result_url = result
logger.info("result url: {}".format(result_url))
elif isinstance(result, dict) and 'url' in result:
result_url = result['url']
logger.info("result url: {}".format(result_url))
else:
logger.error(f"Invalid result format: {type(result)}, expected string URL or dict with 'url' field")
return 0.
logger.info(f"Result URL to match: {result_url}")
# expect_regex = re.compile(rules["expected"])
patterns = rules["expected"]
print("expected_regex: {}".format(patterns))
logger.info("expected_regex: {}".format(patterns))
for pattern in patterns:
match = re.search(pattern, result_url)
print(match)
logger.info("match: {}".format(match))
if not match:
return 0.
return 1.
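Since every regex in rules["expected"] must match for a score of 1.0, a small illustrative call (values are made up):
score = is_expected_url_pattern_match(
    {"url": "https://example.com/search?q=osworld"},
    {"expected": [r"example\.com", r"q=osworld"]},
)  # -> 1.0; any non-matching pattern yields 0.0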
def is_expected_installed_extensions(installed_extensions, expected) -> float:
print("installed_extensions: ")
print(installed_extensions)
if not installed_extensions:
return 0.
logger.info("installed_extensions: ")
logger.info(installed_extensions)
expected_extensions = expected["expected"]
# whether the expected extensions are installed
@@ -109,6 +121,8 @@ def is_expected_tabs(open_tabs: List[Dict[str, str]], rule: Dict[str, Any]) -> f
"""
Checks if the expected tabs are open in Chrome.
"""
if not open_tabs:
return 0.
match_type = rule['type']
@@ -146,8 +160,10 @@ def is_expected_bookmarks(bookmarks: List[str], rule: Dict[str, Any]) -> float:
bookmark['type'] == 'folder' and bookmark['name'] == 'Liked Authors'), None)
if liked_authors_folder:
# Check if it contains the specified URLs
logger.info("'Liked Authors' folder exists")
liked_authors_urls = [bookmark['url'] for bookmark in liked_authors_folder['children'] if
bookmark['type'] == 'url']
logger.info("Here is the 'Liked Authors' folder's urls: {}".format(liked_authors_urls))
urls = rule['urls']
@@ -168,6 +184,9 @@ def is_expected_bookmarks(bookmarks: List[str], rule: Dict[str, Any]) -> float:
def is_expected_search_query(active_tab_info: Dict[str, str], rules: Dict[str, Any]) -> float:
if not active_tab_info:
return 0.
expected = rules['expect']
pattern = expected['pattern']
matched = re.search(pattern, active_tab_info['url'])
@@ -210,6 +229,7 @@ import imagehash
from pathlib import Path
import typing
import time
def compare_pdf_images(pdf1_path: str, pdf2_path: str, **kwargs) -> float:


@@ -3,6 +3,10 @@ import os
import re
import xml.etree.ElementTree as ET
import zipfile
import tempfile
import subprocess
import struct
import numpy as np
from io import BytesIO
from typing import List, Dict, Any
@@ -21,6 +25,77 @@ from skimage.color import rgb2lab
logger = logging.getLogger("desktopenv.metric.docs")
def read_x11_image(filepath):
"""
Pure Python X11 (XWD) format reader that converts to PIL Image.
No external dependencies required.
Args:
filepath: Path to the X11/XWD format image file
Returns:
PIL.Image: Converted image in RGB format
Raises:
ValueError: If the format is not supported
IOError: If file cannot be read
"""
with open(filepath, 'rb') as f:
# Read X11 header
header_data = f.read(100)
# Parse header (big endian format)
header_size = struct.unpack('>I', header_data[0:4])[0]
version = struct.unpack('>I', header_data[4:8])[0]
pixmap_format = struct.unpack('>I', header_data[8:12])[0]
pixmap_depth = struct.unpack('>I', header_data[12:16])[0]
pixmap_width = struct.unpack('>I', header_data[16:20])[0]
pixmap_height = struct.unpack('>I', header_data[20:24])[0]
logger.debug(f"X11 image info: {pixmap_width}x{pixmap_height}, depth={pixmap_depth}")
# Skip to the end of header
f.seek(header_size)
# Read pixel data based on depth
if pixmap_depth == 32:
# 32-bit RGBA format
bytes_per_pixel = 4
total_pixels = pixmap_width * pixmap_height
pixel_data = f.read(total_pixels * bytes_per_pixel)
# Convert to numpy array
pixels = np.frombuffer(pixel_data, dtype=np.uint8)
# Reshape to image dimensions
pixels = pixels.reshape((pixmap_height, pixmap_width, bytes_per_pixel))
# X11 format is typically BGRA, convert to RGB
# Swap B and R channels, ignore alpha
rgb_pixels = pixels[:, :, [2, 1, 0]] # BGR -> RGB
# Create PIL image
image = Image.fromarray(rgb_pixels, 'RGB')
return image
elif pixmap_depth == 24:
# 24-bit RGB format
bytes_per_pixel = 3
total_pixels = pixmap_width * pixmap_height
pixel_data = f.read(total_pixels * bytes_per_pixel)
# Convert to numpy array and reshape
pixels = np.frombuffer(pixel_data, dtype=np.uint8)
pixels = pixels.reshape((pixmap_height, pixmap_width, bytes_per_pixel))
# Create PIL image (assuming RGB order)
image = Image.fromarray(pixels, 'RGB')
return image
else:
raise ValueError(f'Unsupported X11 pixel depth: {pixmap_depth}. Only 24-bit and 32-bit formats are supported.')
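A hypothetical round-trip through the reader, converting an XWD screen dump to PNG (the paths are illustrative):
img = read_x11_image("/tmp/screen.xwd")  # raises ValueError for depths other than 24/32-bit
img.save("/tmp/screen.png", "PNG")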
def find_default_font(config_file_path, rules):
"""Find the default font in LibreOffice Writer."""
default_font = None
@@ -294,29 +369,136 @@ def compare_docx_images(docx_file1, docx_file2):
def compare_image_text(image_path, rule):
if not image_path:
return 0
reader = easyocr.Reader(['en'])
result = reader.readtext(image_path)
extracted_text = ' '.join([entry[1] for entry in result])
# Log OCR results
logger.info(f"OCR extracted texts: {[entry[1] for entry in result]}")
logger.info(f"Combined extracted text: {extracted_text}")
# Check if the image file exists
if not os.path.exists(image_path):
logger.error(f"Image file not found: {image_path}")
return 0
if rule['type'] == 'text':
target_text = rule['text']
match_found = target_text in extracted_text
# Check image format and convert if necessary
temp_image_path = None
actual_image_path = image_path
try:
# First, try to identify the file format
result = subprocess.run(['file', image_path], capture_output=True, text=True)
file_info = result.stdout.lower()
# Log matching results
logger.info(f"Target text: '{target_text}'")
logger.info(f"Match found: {match_found}")
if match_found:
logger.info("✅ Text matching successful!")
else:
logger.info("❌ Text matching failed!")
# If it's an X11 screen dump, we need to convert it
if 'x-window screen dump' in file_info or 'xwd' in file_info:
logger.info(f"Detected X11 screen dump format in {image_path}, attempting conversion...")
# Create a temporary file for the converted image
temp_fd, temp_image_path = tempfile.mkstemp(suffix='.png')
os.close(temp_fd)
# Try to convert using PIL with xwd support or other methods
try:
# First try with PIL directly (sometimes it can handle xwd)
img = Image.open(image_path)
img.save(temp_image_path, 'PNG')
actual_image_path = temp_image_path
logger.info(f"Successfully converted X11 image using PIL")
except Exception as pil_error:
logger.warning(f"PIL conversion failed: {pil_error}")
# Try our custom X11 reader (pure Python solution)
try:
logger.info("Attempting conversion using custom X11 reader...")
x11_image = read_x11_image(image_path)
x11_image.save(temp_image_path, 'PNG')
actual_image_path = temp_image_path
logger.info(f"✅ Successfully converted X11 image using custom reader")
except Exception as custom_error:
logger.warning(f"Custom X11 conversion failed: {custom_error}")
# Try with netpbm tools if available (fallback)
try:
result = subprocess.run(['which', 'xwdtopnm'], capture_output=True)
if result.returncode == 0:
# Use netpbm tools chain: xwdtopnm -> pnmtopng
with open(temp_image_path, 'wb') as f:
    result = subprocess.run(['xwdtopnm', image_path],
                            stdout=subprocess.PIPE, check=True)
    subprocess.run(['pnmtopng'], input=result.stdout,
                   stdout=f, check=True)
actual_image_path = temp_image_path
logger.info("Successfully converted X11 image using netpbm tools")
else:
raise Exception("netpbm tools not available")
except Exception as netpbm_error:
logger.warning(f"netpbm conversion failed: {netpbm_error}")
# All conversions failed
logger.error(
f"❌ All X11 conversion methods failed.\n"
f"Attempted: PIL → Custom Python reader → netpbm tools\n"
f"💡 The image might be corrupted or in an unsupported X11 variant"
)
# If all conversions fail, try to use the original file anyway
# Sometimes easyocr might handle it better than PIL
if temp_image_path:
os.unlink(temp_image_path)
temp_image_path = None
actual_image_path = image_path
logger.info(f"Will attempt OCR on original file format (likely to fail)")
return 1 if match_found else 0
else:
raise ValueError("Unsupported rule type")
# Now attempt OCR with error handling
try:
reader = easyocr.Reader(['en'])
result = reader.readtext(actual_image_path)
extracted_text = ' '.join([entry[1] for entry in result])
# Log OCR results
logger.info(f"OCR extracted texts: {[entry[1] for entry in result]}")
logger.info(f"Combined extracted text: {extracted_text}")
if rule['type'] == 'text':
target_text = rule['text']
match_found = target_text in extracted_text
# Log matching results
logger.info(f"Target text: '{target_text}'")
logger.info(f"Match found: {match_found}")
if match_found:
logger.info("✅ Text matching successful!")
else:
logger.info("❌ Text matching failed!")
return 1 if match_found else 0
else:
raise ValueError("Unsupported rule type")
except Exception as ocr_error:
logger.error(f"OCR processing failed for {actual_image_path}: {ocr_error}")
# Check if this is specifically an X11 format issue
if 'x-window screen dump' in file_info or 'xwd' in file_info:
logger.error(
f"🚨 OCR failed on X11 screen dump after all conversion attempts.\n"
f"This might indicate:\n"
f" 1. The X11 file is corrupted or in an unsupported variant\n"
f" 2. Missing dependencies (numpy, PIL)\n"
f" 3. Insufficient memory for large images"
)
return 0
except Exception as e:
logger.error(f"Error processing image {image_path}: {e}")
return 0
finally:
# Clean up temporary file if created
if temp_image_path and os.path.exists(temp_image_path):
try:
os.unlink(temp_image_path)
except:
pass
def compare_line_spacing(docx_file1, docx_file2):

View File

@@ -298,34 +298,84 @@ def check_json(result: str, rules: Dict[str, List[Dict[str, Union[List[str], str
"""
if result is None:
logger.warning("Result file path is None, returning 0.0")
return 0.
# Check if file exists
if not os.path.exists(result):
logger.warning(f"Result file does not exist: {result}, returning 0.0")
return 0.
try:
with open(result, 'r', encoding='utf-8') as f:
if is_yaml:
try:
# Use SafeLoader instead of Loader for better security and error handling
result_data: Dict[str, Any] = yaml.safe_load(f)
if result_data is None:
logger.warning(f"YAML file {result} is empty or contains only null values, returning 0.0")
return 0.
except yaml.YAMLError as e:
logger.error(f"YAML parsing error in file {result}: {e}")
logger.error(f"File content might be corrupted or have invalid YAML syntax")
return 0.
except Exception as e:
logger.error(f"Unexpected error parsing YAML file {result}: {e}")
return 0.
else:
try:
result_data: Dict[str, Any] = json.load(f)
except json.JSONDecodeError as e:
logger.error(f"JSON parsing error in file {result}: {e}")
return 0.
except Exception as e:
logger.error(f"Unexpected error parsing JSON file {result}: {e}")
return 0.
except IOError as e:
logger.error(f"IO error reading file {result}: {e}")
return 0.
except Exception as e:
logger.error(f"Unexpected error reading file {result}: {e}")
return 0.
with open(result) as f:
if is_yaml:
result: Dict[str, Any] = yaml.load(f, Loader=yaml.Loader)
else:
result: Dict[str, Any] = json.load(f)
expect_rules = rules.get("expect", {})
unexpect_rules = rules.get("unexpect", {})
metric = True
for r in expect_rules:
value = result
for k in r["key"]:
try:
value = value[k]
except KeyError:
return 0.
metric = metric and _match_value_to_rule(value, r)
value = result_data
try:
for k in r["key"]:
try:
value = value[k]
except KeyError:
logger.debug(f"Key '{k}' not found in result data, returning 0.0")
return 0.
except TypeError:
logger.debug(f"Cannot access key '{k}' - value is not a dictionary, returning 0.0")
return 0.
metric = metric and _match_value_to_rule(value, r)
except Exception as e:
logger.error(f"Error processing expect rule {r}: {e}")
return 0.
for r in unexpect_rules:
value = result
for k in r["key"]:
try:
value = value[k]
except KeyError:
value = None
break
metric = metric and not _match_value_to_rule(value, r)
value = result_data
try:
for k in r["key"]:
try:
value = value[k]
except KeyError:
value = None
break
except TypeError:
value = None
break
metric = metric and not _match_value_to_rule(value, r)
except Exception as e:
logger.error(f"Error processing unexpect rule {r}: {e}")
return 0.
return float(metric)
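An illustrative invocation (the rule fields beyond "key" are consumed by _match_value_to_rule, which is not shown in this diff, so "type" and "rule" here are assumptions, as is treating is_yaml as an ordinary keyword flag):
score = check_json(
    "/home/user/.config/Code/User/settings.json",
    {"expect": [{"key": ["editor", "fontSize"], "type": "match", "rule": "14"}],
     "unexpect": []},
    is_yaml=False,
)  # -> 1.0 only if every expect rule matches and no unexpect rule does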


@@ -17,6 +17,16 @@ def compare_image_list(pred_img_path_list: Union[str, List[str]],
return 0.0
pred_img = Image.open(pred_img_path)
gold_img = Image.open(gold_img_path)
# Check if images have different sizes and resize if necessary
if pred_img.size != gold_img.size:
logging.debug(f"Images have different sizes: {pred_img.size} vs {gold_img.size}, resizing predicted image to match gold image")
pred_img = pred_img.resize(gold_img.size, Image.Resampling.LANCZOS)
# Ensure both images are in the same mode for comparison
if pred_img.mode != gold_img.mode:
pred_img = pred_img.convert(gold_img.mode)
diff = ImageChops.difference(pred_img, gold_img)
if diff.getbbox():
return 0.0
@@ -190,6 +200,27 @@ def calculate_image_sharpness(image_path):
def structure_check_by_mse(img1, img2, threshold=0.03):
"""Check if two images are approximately the same by MSE"""
# Ensure both images are PIL Image objects
if not hasattr(img1, 'size') or not hasattr(img2, 'size'):
# Convert numpy arrays to PIL Images if needed
if hasattr(img1, 'shape'):
img1 = Image.fromarray(img1)
if hasattr(img2, 'shape'):
img2 = Image.fromarray(img2)
# Check if images have different sizes and resize if necessary
if img1.size != img2.size:
logging.debug(f"Images have different sizes: {img1.size} vs {img2.size}, resizing first image to match second")
img1 = img1.resize(img2.size, Image.Resampling.LANCZOS)
# Ensure both images are in RGB mode for consistent comparison
if img1.mode != 'RGB':
img1 = img1.convert('RGB')
if img2.mode != 'RGB':
img2 = img2.convert('RGB')
# Now calculate MSE with properly sized images
mse = np.mean(
(np.array(img1, dtype=np.float32) / 255
- np.array(img2, dtype=np.float32) / 255) ** 2)
@@ -200,7 +231,55 @@ def structure_check_by_mse(img1, img2, threshold=0.03):
def structure_check_by_ssim(img1, img2, threshold=0.9):
"""Check if two images are approximately the same by SSIM"""
similarity = ssim(np.array(img1), np.array(img2), multichannel=True, channel_axis=-1)
min_size = 7
if img1.width < min_size or img1.height < min_size or \
img2.width < min_size or img2.height < min_size:
logging.warning(f"image too small for ssim: {img1.size} vs {img2.size}")
return False
if img1.mode != 'RGB':
img1 = img1.convert('RGB')
if img2.mode != 'RGB':
img2 = img2.convert('RGB')
# Now both images are in RGB mode, so they should have the same number of channels (3)
# But we still need to check the size (though the caller should have checked)
if img1.size != img2.size:
# If the sizes are different, we cannot compare, return False
logging.debug(f"Images have different sizes: {img1.size} vs {img2.size}")
return False
array1 = np.array(img1)
array2 = np.array(img2)
# They should have the same shape now, but double check
if array1.shape != array2.shape:
logging.debug(f"Images have different shapes after conversion: {array1.shape} vs {array2.shape}")
return False
# Determine the window size for SSIM
min_dim = min(array1.shape[0], array1.shape[1])
if min_dim < 7:
# If the smallest dimension is less than 7, set win_size to the next smaller odd number
win_size = min_dim if min_dim % 2 == 1 else min_dim - 1
if win_size < 1:
logging.debug("Image too small for SSIM computation (min dimension < 1)")
return False
else:
win_size = 7 # default
try:
# For newer versions of skimage, we use channel_axis, for older versions, multichannel
# We try to use the newer way first, then fall back to the old way
try:
# Newer versions (channel_axis is available)
similarity = ssim(array1, array2, win_size=win_size, channel_axis=2)
except TypeError:
# Older versions use multichannel
similarity = ssim(array1, array2, win_size=win_size, multichannel=True)
except Exception as e:
logging.error(f"SSIM computation failed: {e}")
return False
logging.debug("SSIM: %s", similarity)
return similarity >= threshold
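For instance (hypothetical files), the guarded version degrades to False instead of raising on tiny or mismatched inputs:
from PIL import Image
same = structure_check_by_ssim(Image.open("before.png"), Image.open("after.png"), threshold=0.9)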
@@ -345,13 +424,20 @@ def check_structure_sim(src_path, tgt_path):
if src_path is None or tgt_path is None:
return 0.
img_src = Image.open(src_path)
img_tgt = Image.open(tgt_path)
structure_same = structure_check_by_ssim(img_src, img_tgt)
if structure_same:
return 1.
else:
return 0.
try:
img_src = Image.open(src_path)
img_tgt = Image.open(tgt_path)
if img_src.size != img_tgt.size:
logging.debug(f"size different: src_path: {src_path}, tgt_path: {tgt_path}")
return 0.0
structure_same = structure_check_by_ssim(img_src, img_tgt)
return 1.0 if structure_same else 0.0
except Exception as e:
logging.error(f"check_structure_sim error: {str(e)}")
return 0.0
def check_structure_sim_resized(src_path, tgt_path):
@@ -396,7 +482,10 @@ def check_structure_sim_resized(src_path, tgt_path):
# Check if the structure is similar
structure_same = structure_check_by_ssim(img_src_resized, img_tgt)
return structure_same
if structure_same:
return 1.
else:
return 0.
def check_contrast_increase_and_structure_sim(src_path, tgt_path):
@@ -509,27 +598,119 @@ def check_image_size(src_path, rule):
return 0.
def safe_open_image_with_retry(file_path, max_retries=3, retry_delay=0.5):
"""
Safely open an image file with retry mechanism for handling truncated files
"""
import os
import time
import logging
logger = logging.getLogger(__name__)
if not file_path or not os.path.exists(file_path):
logger.error(f"File does not exist: {file_path}")
return None
for attempt in range(max_retries):
try:
# Check file size first
file_size = os.path.getsize(file_path)
if file_size == 0:
logger.warning(f"File is empty: {file_path}")
if attempt < max_retries - 1:
time.sleep(retry_delay)
continue
return None
logger.info(f"Opening image: {file_path} (size: {file_size} bytes, attempt: {attempt + 1})")
# Try to open with PIL
image = Image.open(file_path)
# Verify image can be loaded (trigger actual parsing)
image.load()
logger.info(f"Successfully opened image: {image.format} {image.mode} {image.size}")
return image
except (OSError, IOError) as e:
if "truncated" in str(e).lower() or "cannot identify" in str(e).lower():
logger.warning(f"Attempt {attempt + 1}: Image file appears truncated or corrupted: {e}")
if attempt < max_retries - 1:
logger.info(f"Retrying in {retry_delay} seconds...")
time.sleep(retry_delay)
continue
else:
logger.error(f"IO error opening image: {e}")
break
except Exception as e:
logger.error(f"Unexpected error opening image: {e}")
break
logger.error(f"Failed to open image after {max_retries} attempts: {file_path}")
return None
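A small usage sketch (the path and timings are illustrative); the retry loop targets files that an earlier step may still be flushing to disk:
img = safe_open_image_with_retry("/tmp/gimp_export.png", max_retries=5, retry_delay=1.0)
if img is None:
    score = 0.0  # treat an unreadable file as an evaluation failure rather than crashing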
def check_palette_and_structure_sim(src_path, tgt_path):
"""
Check if the src image is palette-based and the structure of the two images are similar
Enhanced with robust error handling for file format issues and truncated files
gimp:06ca5602-62ca-47f6-ad4f-da151cde54cc
"""
import logging
logger = logging.getLogger(__name__)
logger.info(f"Evaluating palette and structure similarity: src={src_path}, tgt={tgt_path}")
if src_path is None or tgt_path is None:
logger.warning("Source or target path is None")
return 0.
# Check if the source image is palette-based
source_image = Image.open(src_path)
palette_based = source_image.mode == 'P'
# Check structure
target_image = Image.open(tgt_path)
source_image = source_image.convert('RGB')
structure_same = structure_check_by_ssim(source_image, target_image)
if palette_based and structure_same:
return 1.
else:
# Safely open source image with retry mechanism
source_image = safe_open_image_with_retry(src_path)
if source_image is None:
logger.error("Failed to open source image")
return 0.
try:
# Check if the source image is palette-based
palette_based = source_image.mode == 'P'
logger.info(f"Source image mode: {source_image.mode}, palette-based: {palette_based}")
# Safely open target image
target_image = safe_open_image_with_retry(tgt_path)
if target_image is None:
logger.error("Failed to open target image")
source_image.close()
return 0.
try:
# Convert source image to RGB for comparison
source_rgb = source_image.convert('RGB')
logger.info(f"Source converted to RGB: {source_rgb.mode} {source_rgb.size}")
# Check structure
structure_same = structure_check_by_ssim(source_rgb, target_image)
logger.info(f"Structure similarity check: {structure_same}")
# Evaluation logic
if palette_based and structure_same:
result = 1.0
else:
result = 0.0
logger.info(f"Evaluation result: {result} (palette_based={palette_based}, structure_same={structure_same})")
return result
finally:
target_image.close()
except Exception as e:
logger.error(f"Error during evaluation: {e}")
return 0.
finally:
source_image.close()
def check_textbox_on_leftside(src_path):
"""


@@ -23,22 +23,34 @@ def process_epub(filename: str) -> List[str]:
try:
with zipfile.ZipFile(filename, "r") as z_f:
with z_f.open("toc.ncx") as in_f \
, open(os.path.join(base_dir, "toc.ncx"), "w") as out_f:
contents: str = in_f.read().decode()
contents = contents.splitlines()
for l in contents:
if "navPoint" not in l:
out_f.write(l + "\n")
file_list.append(os.path.join(base_dir, "toc.ncx"))
with z_f.open("content.opf") as in_f \
, open(os.path.join(base_dir, "content.opf"), "w") as out_f:
contents: str = in_f.read().decode()
contents = contents.splitlines()
for l in contents:
if "dc:identifier" not in l:
out_f.write(l + "\n")
file_list.append(os.path.join(base_dir, "content.opf"))
# Get list of all files in the zip archive
zip_file_list = z_f.namelist()
# Process toc.ncx if it exists
if "toc.ncx" in zip_file_list:
with z_f.open("toc.ncx") as in_f \
, open(os.path.join(base_dir, "toc.ncx"), "w") as out_f:
contents: str = in_f.read().decode()
contents = contents.splitlines()
for l in contents:
if "navPoint" not in l:
out_f.write(l + "\n")
file_list.append(os.path.join(base_dir, "toc.ncx"))
else:
logger.debug("toc.ncx not found in epub file: %s", filename)
# Process content.opf if it exists
if "content.opf" in zip_file_list:
with z_f.open("content.opf") as in_f \
, open(os.path.join(base_dir, "content.opf"), "w") as out_f:
contents: str = in_f.read().decode()
contents = contents.splitlines()
for l in contents:
if "dc:identifier" not in l:
out_f.write(l + "\n")
file_list.append(os.path.join(base_dir, "content.opf"))
else:
logger.debug("content.opf not found in epub file: %s", filename)
for f_n in z_f.namelist():
if f_n.endswith(".html"):
with z_f.open(f_n) as in_f \


@@ -73,6 +73,9 @@ def check_image_stretch_and_center(modified_ppt, original_ppt):
original_slide_images = [shape for shape in original_slide.shapes if shape.shape_type == 13]
modified_slide_images = [shape for shape in modified_slide.shapes if shape.shape_type == 13]
if not original_slide_images:
return 0.
the_image = original_slide_images[0]
the_modified_image = None
@@ -395,12 +398,38 @@ def compare_pptx_files(file1_path, file2_path, **options):
table2 = shape2.table
if enable_debug:
debug_logger.debug(f" Shape {shape_idx} - Comparing TABLE with {len(table1.rows)} rows and {len(table1.columns)} columns")
debug_logger.debug(f" Shape {shape_idx} - Table2 has {len(table2.rows)} rows and {len(table2.columns)} columns")
# Check if tables have the same dimensions
if len(table1.rows) != len(table2.rows) or len(table1.columns) != len(table2.columns):
if enable_debug:
debug_logger.debug(f" MISMATCH: Slide {slide_idx}, Shape {shape_idx} (TABLE) - Table dimensions differ:")
debug_logger.debug(f" Table1: {len(table1.rows)} rows x {len(table1.columns)} columns")
debug_logger.debug(f" Table2: {len(table2.rows)} rows x {len(table2.columns)} columns")
return 0
for row_idx in range(len(table1.rows)):
for col_idx in range(len(table1.columns)):
cell1 = table1.cell(row_idx, col_idx)
cell2 = table2.cell(row_idx, col_idx)
# Check if cells have the same number of paragraphs
if len(cell1.text_frame.paragraphs) != len(cell2.text_frame.paragraphs):
if enable_debug:
debug_logger.debug(f" MISMATCH: Slide {slide_idx}, Shape {shape_idx} (TABLE) - Cell [{row_idx},{col_idx}] - Different number of paragraphs:")
debug_logger.debug(f" Cell1 paragraphs: {len(cell1.text_frame.paragraphs)}")
debug_logger.debug(f" Cell2 paragraphs: {len(cell2.text_frame.paragraphs)}")
return 0
for para_idx, (para1, para2) in enumerate(zip(cell1.text_frame.paragraphs, cell2.text_frame.paragraphs)):
# Check if paragraphs have the same number of runs
if len(para1.runs) != len(para2.runs):
if enable_debug:
debug_logger.debug(f" MISMATCH: Slide {slide_idx}, Shape {shape_idx} (TABLE) - Cell [{row_idx},{col_idx}], Para {para_idx} - Different number of runs:")
debug_logger.debug(f" Para1 runs: {len(para1.runs)}")
debug_logger.debug(f" Para2 runs: {len(para2.runs)}")
return 0
for run_idx, (run1, run2) in enumerate(zip(para1.runs, para2.runs)):
# Check font color
if hasattr(run1.font.color, "rgb") and hasattr(run2.font.color, "rgb"):
@ -451,6 +480,14 @@ def compare_pptx_files(file1_path, file2_path, **options):
if shape1.text.strip() != shape2.text.strip() and examine_text:
return 0
# check if the number of paragraphs are the same
if len(shape1.text_frame.paragraphs) != len(shape2.text_frame.paragraphs):
if enable_debug:
debug_logger.debug(f" MISMATCH: Slide {slide_idx}, Shape {shape_idx} - Different number of paragraphs:")
debug_logger.debug(f" Shape1 paragraphs: {len(shape1.text_frame.paragraphs)}")
debug_logger.debug(f" Shape2 paragraphs: {len(shape2.text_frame.paragraphs)}")
return 0
# check if the paragraphs are the same
para_idx = 0
for para1, para2 in zip(shape1.text_frame.paragraphs, shape2.text_frame.paragraphs):
@ -487,6 +524,14 @@ def compare_pptx_files(file1_path, file2_path, **options):
if para1.level != para2.level and examine_indent:
return 0
# check if the number of runs are the same
if len(para1.runs) != len(para2.runs):
if enable_debug:
debug_logger.debug(f" MISMATCH: Slide {slide_idx}, Shape {shape_idx}, Para {para_idx} - Different number of runs:")
debug_logger.debug(f" Para1 runs: {len(para1.runs)}")
debug_logger.debug(f" Para2 runs: {len(para2.runs)}")
return 0
for run1, run2 in zip(para1.runs, para2.runs):
# check if the font properties are the same
@ -634,6 +679,12 @@ def compare_pptx_files(file1_path, file2_path, **options):
debug_logger.debug(f" MISMATCH: Text differs - '{tshape1.text.strip()}' vs '{tshape2.text.strip()}'")
return 0
# Check if text shapes have the same number of paragraphs
if len(tshape1.text_frame.paragraphs) != len(tshape2.text_frame.paragraphs):
if enable_debug:
debug_logger.debug(f" MISMATCH: Different number of paragraphs - {len(tshape1.text_frame.paragraphs)} vs {len(tshape2.text_frame.paragraphs)}")
return 0
# Compare alignment of each paragraph
for para_idx, (para1, para2) in enumerate(zip(tshape1.text_frame.paragraphs, tshape2.text_frame.paragraphs)):
from pptx.enum.text import PP_ALIGN

View File

@ -2,6 +2,9 @@ import functools
import itertools
import logging
import os.path
import re
import unicodedata
# import operator
from numbers import Number
from typing import Any, Union, cast, Callable, Iterable
@ -17,9 +20,19 @@ from openpyxl.worksheet.datavalidation import DataValidation
from openpyxl.worksheet.worksheet import Worksheet
from rapidfuzz import fuzz
from desktop_env.evaluators.metrics.utils import _match_value_to_rule, _read_cell_style, read_cell_value
from desktop_env.evaluators.metrics.utils import load_charts, load_sparklines, load_rows_or_cols, load_xlsx_styles \
, load_filters, load_pivot_tables
from desktop_env.evaluators.metrics.utils import (
_match_value_to_rule,
_read_cell_style,
read_cell_value,
)
from desktop_env.evaluators.metrics.utils import (
load_charts,
load_sparklines,
load_rows_or_cols,
load_xlsx_styles,
load_filters,
load_pivot_tables,
)
# from openpyxl.utils import coordinate_to_tuple
@ -28,16 +41,26 @@ logger = logging.getLogger("desktopenv.metric.table")
BOOK = Union[pd.ExcelFile, Workbook, str]
def _parse_sheet_idx(sheet_idx: Union[int, str]
, result: BOOK, expected: BOOK
, result_sheet_names: List[str]
, expected_sheet_names: List[str]
) -> Tuple[BOOK, str]:
# function _parse_sheet_idx {{{ #
def _parse_sheet_idx(
sheet_idx: Union[int, str],
result: BOOK,
expected: BOOK,
result_sheet_names: List[str],
expected_sheet_names: List[str],
) -> Tuple[BOOK, str]:
# function _parse_sheet_idx {{{ #
if isinstance(sheet_idx, int):
try:
index: str = result_sheet_names[sheet_idx]
except:
if not result_sheet_names or sheet_idx >= len(result_sheet_names):
logger.error(
f"Sheet index {sheet_idx} out of range. Available sheets: {result_sheet_names}"
)
index = ""
else:
index: str = result_sheet_names[sheet_idx]
logger.debug(f"Sheet index {sheet_idx} resolved to sheet: {index}")
except Exception as e:
logger.error(f"Error resolving sheet index {sheet_idx}: {e}")
index = ""
book: BOOK = result
elif sheet_idx.startswith("RI"):
@ -62,27 +85,31 @@ def _parse_sheet_idx(sheet_idx: Union[int, str]
logger.error("Unrecognized sheet index")
raise ValueError("Unrecognized sheet index")
return book, index
# }}} function _parse_sheet_idx #
# }}} function _parse_sheet_idx #
SHEET = Union[pd.DataFrame, Worksheet, List[str]]
def _load_sheet(book: BOOK, index: str) -> SHEET:
# function _load_sheet {{{ #
# function _load_sheet {{{ #
try:
if isinstance(book, str):
book: str = cast(str, book)
csv_name: str = "{:}-{:}.csv".format(os.path.splitext(book)[0], index)
with open(csv_name) as f:
csv_lines: List[str] = list(itertools.dropwhile(lambda l: len(l) == 0
, map(lambda l: l.strip()
, reversed(f.read().splitlines())
)
)
)
return csv_lines
try:
all_lines: List[str] = _safe_read_file(csv_name)
csv_lines: List[str] = list(
itertools.dropwhile(
lambda l: len(l) == 0,
map(lambda l: l.strip(), reversed(all_lines)),
)
)
return csv_lines
except (FileNotFoundError, IOError) as e:
logger.error(f"Failed to read CSV file {csv_name}: {e}")
return None
if isinstance(book, pd.ExcelFile):
return pd.read_excel(book, index)
if isinstance(book, Workbook):
@ -93,11 +120,124 @@ def _load_sheet(book: BOOK, index: str) -> SHEET:
raise e
except:
return None
# }}} function _load_sheet #
# }}} function _load_sheet #
def _safe_read_file(file_path: str) -> List[str]:
"""
Safely read a file with multiple encoding attempts.
Args:
file_path: Path to the file to read
Returns:
List of lines from the file
Raises:
FileNotFoundError: If file doesn't exist
IOError: If file cannot be read with any encoding
"""
# Common encodings to try in order of preference
encodings = [
"utf-8", # Most common modern encoding
"utf-8-sig", # UTF-8 with BOM
"latin-1", # ISO-8859-1, works with any byte sequence
"windows-1252", # Common Windows encoding
"gbk", # Chinese encoding
"cp1251", # Cyrillic encoding
"iso-8859-1", # Alternative latin-1
]
last_error = None
for encoding in encodings:
try:
with open(file_path, "r", encoding=encoding) as f:
lines = f.read().splitlines()
logger.debug(
f"Successfully read file {file_path} with encoding {encoding}"
)
return lines
except UnicodeDecodeError as e:
last_error = e
logger.debug(f"Failed to read {file_path} with encoding {encoding}: {e}")
continue
except (FileNotFoundError, IOError) as e:
# These are non-encoding related errors, re-raise immediately
raise e
# If all encodings fail, try with error handling as last resort
try:
with open(file_path, "r", encoding="utf-8", errors="replace") as f:
lines = f.read().splitlines()
logger.warning(f"Read file {file_path} with UTF-8 and error replacement")
return lines
except Exception as e:
logger.error(
f"Failed to read file {file_path} with any encoding. Last error: {last_error}"
)
raise IOError(
f"Cannot read file {file_path} with any supported encoding"
) from last_error
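As a quick sketch of the fallback chain above (the file name and contents here are hypothetical):

```python
# Write a file in a non-UTF-8 encoding, then read it back through _safe_read_file.
# utf-8 and utf-8-sig fail on the 0xE9 byte; latin-1 then decodes it successfully.
with open("legacy.csv", "w", encoding="windows-1252") as f:
    f.write("Montréal;100\n")

lines = _safe_read_file("legacy.csv")
print(lines)  # ['Montréal;100']
```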
def compare_csv(result: str, expected: Union[str, List[str]], **options) -> float:
"""
Compare CSV files. If expected is a list, returns 1.0 if result matches any of the expected files.
Args:
result: Path to result CSV file
expected: Path to expected CSV file or list of paths to expected CSV files
options: Additional options (strict, ignore_case)
Returns:
1.0 if result matches expected (or any file in expected list), 0.0 otherwise
"""
if result is None:
return 0.0
try:
result_lines: List[str] = _safe_read_file(result)
except (FileNotFoundError, IOError) as e:
logger.error(f"Failed to read result file {result}: {e}")
return 0.0
# Convert expected to list if it's a single string (for backward compatibility)
if isinstance(expected, str):
expected_files = [expected]
else:
expected_files = expected
# Try to match against each expected file
for expected_file in expected_files:
try:
expected_lines: List[str] = _safe_read_file(expected_file)
# Process lines based on options
current_result_lines = result_lines
current_expected_lines = expected_lines
if not options.get("strict", True):
current_result_lines = map(str.strip, current_result_lines)
current_expected_lines = map(str.strip, current_expected_lines)
if options.get("ignore_case", False):
current_result_lines = map(str.lower, current_result_lines)
current_expected_lines = map(str.lower, current_expected_lines)
# Check if this expected file matches
if list(current_result_lines) == list(current_expected_lines):
return 1.0
except (FileNotFoundError, IOError):
# If this expected file doesn't exist, continue to next one
continue
# No match found
return 0.0
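A minimal usage sketch of the list-valued `expected` behavior described above (the file names are hypothetical):

```python
# Returns 1.0 if "out.csv" matches either gold variant after stripping
# surrounding whitespace (strict=False) and lowercasing (ignore_case=True).
score = compare_csv(
    "out.csv",
    ["gold_variant_a.csv", "gold_variant_b.csv"],
    strict=False,
    ignore_case=True,
)
```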
def compare_table(result: str, expected: str = None, **options) -> float:
# function compare_table {{{ #
# function compare_table {{{ #
"""
Args:
result (str): path to result xlsx
@ -114,13 +254,24 @@ def compare_table(result: str, expected: str = None, **options) -> float:
"""
if result is None:
return 0.
logger.error("Result file path is None")
return 0.0
# Check if result file exists
if not os.path.exists(result):
logger.error(f"Result file not found: {result}")
return 0.0
try:
logger.info(f"Loading result file: {result}")
xlworkbookr: Workbook = openpyxl.load_workbook(filename=result)
pdworkbookr = pd.ExcelFile(result)
except:
return 0.
logger.info(
f"Successfully loaded result file with sheets: {pdworkbookr.sheet_names}"
)
except Exception as e:
logger.error(f"Failed to load result file {result}: {e}")
return 0.0
worksheetr_names: List[str] = pdworkbookr.sheet_names
if expected is not None:
@ -132,32 +283,42 @@ def compare_table(result: str, expected: str = None, **options) -> float:
pdworkbooke = None
worksheete_names: List[str] = None
parse_idx: Callable[[Union[str, int], BOOK, BOOK], Tuple[BOOK, str]] = \
parse_idx: Callable[[Union[str, int], BOOK, BOOK], Tuple[BOOK, str]] = (
functools.partial(
_parse_sheet_idx,
result_sheet_names=worksheetr_names,
expected_sheet_names=worksheete_names
expected_sheet_names=worksheete_names,
)
)
passes = True
for r in options["rules"]:
if r["type"] == "sheet_name":
# Compare Sheet Names {{{ #
# Compare Sheet Names {{{ #
metric: bool = worksheetr_names == worksheete_names
logger.debug("Assertion: %s.sheet_names == %s.sheet_names - %s", result, expected, metric)
# }}} Compare Sheet Names #
logger.debug(
"Assertion: %s.sheet_names == %s.sheet_names - %s",
result,
expected,
metric,
)
# }}} Compare Sheet Names #
elif r["type"] == "sheet_data":
# Compare Sheet Data by Internal Value {{{ #
# Compare Sheet Data by Internal Value {{{ #
# sheet_idx0: 0 == "RI0" == "RNSheet1" | "EI0" == "ENSheet1"
# sheet_idx1: as sheet_idx0
# precision: int as number of decimal digits, default to 4
error_limit: int = r.get("precision", 4)
sheet1: pd.DataFrame = _load_sheet(*parse_idx(r["sheet_idx0"], pdworkbookr, pdworkbooke))
sheet1: pd.DataFrame = _load_sheet(
*parse_idx(r["sheet_idx0"], pdworkbookr, pdworkbooke)
)
if sheet1 is None:
return 0.
sheet2: pd.DataFrame = _load_sheet(*parse_idx(r["sheet_idx1"], pdworkbookr, pdworkbooke))
return 0.0
sheet2: pd.DataFrame = _load_sheet(
*parse_idx(r["sheet_idx1"], pdworkbookr, pdworkbooke)
)
if sheet2 is None:
return 0.0
sheet1 = sheet1.round(error_limit)
sheet2 = sheet2.round(error_limit)
@ -168,28 +329,36 @@ def compare_table(result: str, expected: str = None, **options) -> float:
logger.debug("Sheet1 =v= Sheet2: \n%s", str(sheet1 == sheet2))
except:
logger.debug("Sheet1 =/v= Sheet2")
logger.debug("Assertion: %s =v= %s - %s", r["sheet_idx0"], r["sheet_idx1"], metric)
# }}} Compare Sheet Data by Internal Value #
logger.debug(
"Assertion: %s =v= %s - %s", r["sheet_idx0"], r["sheet_idx1"], metric
)
# }}} Compare Sheet Data by Internal Value #
elif r["type"] == "sheet_print":
# Compare Sheet Data by Printed Value {{{ #
# Compare Sheet Data by Printed Value {{{ #
# sheet_idx0: 0 == "RI0" == "RNSheet1" | "EI0" == "ENSheet1"
# sheet_idx1: as sheet_idx0
# ignore_case: optional, defaults to False
sheet1: List[str] = _load_sheet(*parse_idx(r["sheet_idx0"], result, expected))
sheet1: List[str] = _load_sheet(
*parse_idx(r["sheet_idx0"], result, expected)
)
if sheet1 is None:
return 0.
sheet2: List[str] = _load_sheet(*parse_idx(r["sheet_idx1"], result, expected))
return 0.0
sheet2: List[str] = _load_sheet(
*parse_idx(r["sheet_idx1"], result, expected)
)
if sheet2 is None:
return 0.0
if r.get("ignore_case", False):
sheet1 = [l.lower() for l in sheet1]
sheet2 = [l.lower() for l in sheet2]
metric: bool = sheet1 == sheet2
logger.debug("Assertion: %s =p= %s - %s", r["sheet_idx0"], r["sheet_idx1"], metric)
# }}} Compare Sheet Data by Printed Value #
logger.debug(
"Assertion: %s =p= %s - %s", r["sheet_idx0"], r["sheet_idx1"], metric
)
# }}} Compare Sheet Data by Printed Value #
elif r["type"] == "sheet_fuzzy":
# Fuzzy Match for Ranges {{{ #
# Fuzzy Match for Ranges {{{ #
# sheet_idx0: 0 == "RI0" == "RNSheet1" | "EI0" == "ENSheet1"
# sheet_idx1: as sheet_idx0
# rules: list of dict, each dict is like
@ -209,7 +378,9 @@ def compare_table(result: str, expected: str = None, **options) -> float:
for rl in r["rules"]:
for rng in MultiCellRange(rl["range"]):
for cdn in rng.cells:
coordinate: str = "{:}{:d}".format(get_column_letter(cdn[1]), cdn[0])
coordinate: str = "{:}{:d}".format(
get_column_letter(cdn[1]), cdn[0]
)
value1: str = str(read_cell_value(*sheet1, coordinate))
value2: str = str(read_cell_value(*sheet2, coordinate))
logger.debug("%s: %s vs %s", cdn, value1, value2)
@ -225,8 +396,12 @@ def compare_table(result: str, expected: str = None, **options) -> float:
value2 = value2.rstrip(rl["trim_trailings"])
if "ignore_chars" in rl:
ignore_chars: Set[str] = set(rl["ignore_chars"])
value1 = "".join(filter(lambda ch: ch not in ignore_chars, value1))
value2 = "".join(filter(lambda ch: ch not in ignore_chars, value2))
value1 = "".join(
filter(lambda ch: ch not in ignore_chars, value1)
)
value2 = "".join(
filter(lambda ch: ch not in ignore_chars, value2)
)
if rl.get("ignore_case", False):
value1 = value1.lower()
value2 = value2.lower()
@ -236,91 +411,141 @@ def compare_table(result: str, expected: str = None, **options) -> float:
elif rl["type"] == "included_by":
metric: bool = value1 in value2
elif rl["type"] == "fuzzy_match":
metric: bool = fuzz.ratio(value1, value2) >= rl.get("threshold", 85.)
metric: bool = fuzz.ratio(value1, value2) >= rl.get(
"threshold", 85.0
)
elif rl["type"] == "exact_match":
metric: bool = value1 == value2
total_metric = total_metric and metric
metric: bool = total_metric
logger.debug("Assertion: %s =~= %s - %s", r["sheet_idx0"], r["sheet_idx1"], metric)
# }}} Fuzzy Match for Ranges #
logger.debug(
"Assertion: %s =~= %s - %s", r["sheet_idx0"], r["sheet_idx1"], metric
)
# }}} Fuzzy Match for Ranges #
elif r["type"] == "sparkline":
# Compare Sparklines {{{ #
# Compare Sparklines {{{ #
# sheet_idx0: 0 == "RI0" == "RNSheet1" | "EI0" == "ENSheet1"
# sheet_idx1: as sheet_idx0
sparkline1: Dict[str, str] = load_sparklines(*parse_idx(r["sheet_idx0"], result, expected))
sparkline2: Dict[str, str] = load_sparklines(*parse_idx(r["sheet_idx1"], result, expected))
sparkline1: Dict[str, str] = load_sparklines(
*parse_idx(r["sheet_idx0"], result, expected)
)
sparkline2: Dict[str, str] = load_sparklines(
*parse_idx(r["sheet_idx1"], result, expected)
)
metric: bool = sparkline1 == sparkline2
logger.debug("Assertion: %s.sp == %.sp - %s", r["sheet_idx0"], r["sheet_idx1"], metric)
# }}} Compare Sparklines #
logger.debug(
"Assertion: %s.sp == %.sp - %s",
r["sheet_idx0"],
r["sheet_idx1"],
metric,
)
# }}} Compare Sparklines #
elif r["type"] == "chart":
# Compare Charts {{{ #
# Compare Charts {{{ #
# sheet_idx0: 0 == "RI0" == "RNSheet1" | "EI0" == "ENSheet1"
# sheet_idx1: as sheet_idx0
# chart_props: list of str, see utils.load_charts
charts1: Dict[str, Any] = load_charts(*parse_idx(r["sheet_idx0"], xlworkbookr, xlworkbooke), **r)
charts2: Dict[str, Any] = load_charts(*parse_idx(r["sheet_idx1"], xlworkbookr, xlworkbooke), **r)
charts1: Dict[str, Any] = load_charts(
*parse_idx(r["sheet_idx0"], xlworkbookr, xlworkbooke), **r
)
charts2: Dict[str, Any] = load_charts(
*parse_idx(r["sheet_idx1"], xlworkbookr, xlworkbooke), **r
)
metric: bool = charts1 == charts2
logger.debug("Assertion: %s[chart] == %s[chart] - %s", r["sheet_idx0"], r["sheet_idx1"], metric)
# }}} Compare Charts #
logger.debug(
"Assertion: %s[chart] == %s[chart] - %s",
r["sheet_idx0"],
r["sheet_idx1"],
metric,
)
# }}} Compare Charts #
elif r["type"] == "style":
# Compare Style (Also Conditional Formatting) {{{ #
# Compare Style (Also Conditional Formatting) {{{ #
# sheet_idx0: 0 == "RI0" == "RNSheet1" | "EI0" == "ENSheet1"
# sheet_idx1: as sheet_idx0
# props: list of str indicating concerned styles, see utils._read_cell_style
sheet_idx1: Tuple[BOOK, str] = parse_idx(r["sheet_idx0"], xlworkbookr, xlworkbooke)
sheet_idx1: Tuple[BOOK, str] = parse_idx(
r["sheet_idx0"], xlworkbookr, xlworkbooke
)
book_name1: str = parse_idx(r["sheet_idx0"], result, expected)[0]
styles1: Dict[str, List[Any]] = load_xlsx_styles(*sheet_idx1, book_name1, **r)
styles1: Dict[str, List[Any]] = load_xlsx_styles(
*sheet_idx1, book_name1, **r
)
sheet_idx2: Tuple[BOOK, str] = parse_idx(r["sheet_idx1"], xlworkbookr, xlworkbooke)
sheet_idx2: Tuple[BOOK, str] = parse_idx(
r["sheet_idx1"], xlworkbookr, xlworkbooke
)
book_name2: str = parse_idx(r["sheet_idx1"], result, expected)[0]
styles2: Dict[str, List[Any]] = load_xlsx_styles(*sheet_idx2, book_name2, **r)
styles2: Dict[str, List[Any]] = load_xlsx_styles(
*sheet_idx2, book_name2, **r
)
# number_formats1: List[str] = [c.number_format.lower() for col in sheet1.iter_cols() for c in col if c.value is not None and c.data_type=="n"]
# number_formats2: List[str] = [c.number_format.lower() for col in sheet2.iter_cols() for c in col if c.value is not None and c.data_type=="n"]
metric: bool = styles1 == styles2
logger.debug("Assertion: %s.style == %s.style - %s", r["sheet_idx0"], r["sheet_idx1"], metric)
# }}} Compare Style (Also Conditional Formatting) #
logger.debug(
"Assertion: %s.style == %s.style - %s",
r["sheet_idx0"],
r["sheet_idx1"],
metric,
)
# }}} Compare Style (Also Conditional Formatting) #
elif r["type"] == "freeze":
# Compare Freezing {{{ #
# Compare Freezing {{{ #
# sheet_idx0: 0 == "RI0" == "RNSheet1" | "EI0" == "ENSheet1"
# sheet_idx1: as sheet_idx0
sheet1: Worksheet = _load_sheet(*parse_idx(r["sheet_idx0"], xlworkbookr, xlworkbooke))
sheet1: Worksheet = _load_sheet(
*parse_idx(r["sheet_idx0"], xlworkbookr, xlworkbooke)
)
if sheet1 is None:
return 0.
sheet2: Worksheet = _load_sheet(*parse_idx(r["sheet_idx1"], xlworkbookr, xlworkbooke))
return 0.0
sheet2: Worksheet = _load_sheet(
*parse_idx(r["sheet_idx1"], xlworkbookr, xlworkbooke)
)
if sheet2 is None:
return 0.0
metric: bool = sheet1.freeze_panes == sheet2.freeze_panes
logger.debug("Assertion: %s.freeze(%s) == %s.freeze(%s) - %s"
, r["sheet_idx0"], sheet1.freeze_panes
, r["sheet_idx1"], sheet2.freeze_panes
, metric
)
# }}} Compare Freezing #
logger.debug(
"Assertion: %s.freeze(%s) == %s.freeze(%s) - %s",
r["sheet_idx0"],
sheet1.freeze_panes,
r["sheet_idx1"],
sheet2.freeze_panes,
metric,
)
# }}} Compare Freezing #
elif r["type"] == "zoom":
# Check Zooming {{{ #
# Check Zooming {{{ #
# sheet_idx: 0 == "RI0" == "RNSheet1" | "EI0" == "ENSheet1"
# method: str
# ref: value
sheet: Worksheet = _load_sheet(*parse_idx(r["sheet_idx"], xlworkbookr, xlworkbooke))
sheet: Worksheet = _load_sheet(
*parse_idx(r["sheet_idx"], xlworkbookr, xlworkbooke)
)
if sheet is None:
return 0.
zoom_scale: Number = sheet.sheet_view.zoomScale or 100.
return 0.0
zoom_scale: Number = sheet.sheet_view.zoomScale or 100.0
metric: bool = _match_value_to_rule(zoom_scale, r)
logger.debug("Assertion: %s.zoom(%.1f) %s %.1f - %s", r["sheet_idx"], zoom_scale, r["method"], r["ref"],
metric)
# }}} Check Zooming #
logger.debug(
"Assertion: %s.zoom(%.1f) %s %.1f - %s",
r["sheet_idx"],
zoom_scale,
r["method"],
r["ref"],
metric,
)
# }}} Check Zooming #
elif r["type"] == "data_validation":
# Check Data Validation {{{ #
# Check Data Validation {{{ #
# sheet_idx: 0 == "RI0" == "RNSheet1" | "EI0" == "ENSheet1"
# dv_props: list of dict like {attribute: {"method": str, "ref": anything}}
# available attributes:
@ -340,146 +565,197 @@ def compare_table(result: str, expected: str = None, **options) -> float:
# * promptTitle
# * imeMode
sheet: Worksheet = _load_sheet(*parse_idx(r["sheet_idx"], xlworkbookr, xlworkbooke))
sheet: Worksheet = _load_sheet(
*parse_idx(r["sheet_idx"], xlworkbookr, xlworkbooke)
)
if sheet is None:
return 0.
data_validators: List[DataValidation] = sheet.data_validations.dataValidation
return 0.0
data_validators: List[DataValidation] = (
sheet.data_validations.dataValidation
)
total_metric = len(data_validators) >= len(r["dv_props"])
for dat_vldt in data_validators:
metric = False
for prpt in r["dv_props"]:
metric = metric or all(_match_value_to_rule(getattr(dat_vldt, attrbt)
, mr
) \
for attrbt, mr in prpt.items()
)
metric = metric or all(
_match_value_to_rule(getattr(dat_vldt, attrbt), mr)
for attrbt, mr in prpt.items()
)
if metric:
break
total_metric = total_metric and metric
if not total_metric:
break
logger.debug("Assertion: %s.data_validation - %s", r["sheet_idx"], total_metric)
logger.debug(
"Assertion: %s.data_validation - %s", r["sheet_idx"], total_metric
)
metric: bool = total_metric
# }}} Check Data Validation #
# }}} Check Data Validation #
elif r["type"] == "row_props":
# Check Row Properties {{{ #
# Check Row Properties {{{ #
# sheet_idx0: 0 == "RI0" == "RNSheet1" | "EI0" == "ENSheet1"
# sheet_idx1: as sheet_idx0
# props: list of str, see utils.load_rows_or_cols
rows1: Dict[str, Any] = load_rows_or_cols(*parse_idx(r["sheet_idx0"], xlworkbookr, xlworkbooke)
, obj="row"
, **r
)
rows2: Dict[str, Any] = load_rows_or_cols(*parse_idx(r["sheet_idx1"], xlworkbookr, xlworkbooke)
, obj="row"
, **r
)
rows1: Dict[str, Any] = load_rows_or_cols(
*parse_idx(r["sheet_idx0"], xlworkbookr, xlworkbooke), obj="row", **r
)
rows2: Dict[str, Any] = load_rows_or_cols(
*parse_idx(r["sheet_idx1"], xlworkbookr, xlworkbooke), obj="row", **r
)
logger.debug("Rows1: %s", repr(rows1))
logger.debug("Rows2: %s", repr(rows2))
metric: bool = rows1 == rows2
logger.debug("Assertion: %s[rows] == %s[rows] - %s", r["sheet_idx0"], r["sheet_idx1"], metric)
# }}} Check Row Properties #
logger.debug(
"Assertion: %s[rows] == %s[rows] - %s",
r["sheet_idx0"],
r["sheet_idx1"],
metric,
)
# }}} Check Row Properties #
elif r["type"] == "col_props":
# Check Column Properties {{{ #
# Check Column Properties {{{ #
# sheet_idx0: 0 == "RI0" == "RNSheet1" | "EI0" == "ENSheet1"
# sheet_idx1: as sheet_idx0
# props: list of str, see utils.load_rows_or_cols
cols1: Dict[str, Any] = load_rows_or_cols(*parse_idx(r["sheet_idx0"], xlworkbookr, xlworkbooke)
, obj="column"
, **r
)
cols2: Dict[str, Any] = load_rows_or_cols(*parse_idx(r["sheet_idx1"], xlworkbookr, xlworkbooke)
, obj="column"
, **r
)
cols1: Dict[str, Any] = load_rows_or_cols(
*parse_idx(r["sheet_idx0"], xlworkbookr, xlworkbooke), obj="column", **r
)
cols2: Dict[str, Any] = load_rows_or_cols(
*parse_idx(r["sheet_idx1"], xlworkbookr, xlworkbooke), obj="column", **r
)
metric: bool = cols1 == cols2
logger.debug("Assertion: %s[cols] == %s[cols] - %s", r["sheet_idx0"], r["sheet_idx1"], metric)
# }}} Check Column Properties #
logger.debug(
"Assertion: %s[cols] == %s[cols] - %s",
r["sheet_idx0"],
r["sheet_idx1"],
metric,
)
# }}} Check Column Properties #
elif r["type"] == "filter":
# Compare Filters {{{ #
# Compare Filters {{{ #
# sheet_idx0: 0 == "RI0" == "RNSheet1" | "EI0" == "ENSheet1"
# sheet_idx1: as sheet_idx0
filters1: Dict[str, Any] = load_filters(*parse_idx(r["sheet_idx0"], xlworkbookr, xlworkbooke), **r)
filters2: Dict[str, Any] = load_filters(*parse_idx(r["sheet_idx1"], xlworkbookr, xlworkbooke), **r)
filters1: Dict[str, Any] = load_filters(
*parse_idx(r["sheet_idx0"], xlworkbookr, xlworkbooke), **r
)
filters2: Dict[str, Any] = load_filters(
*parse_idx(r["sheet_idx1"], xlworkbookr, xlworkbooke), **r
)
metric: bool = filters1 == filters2
logger.debug("Assertion: %s[filter] == %s[filter] - %s", r["sheet_idx0"], r["sheet_idx1"], metric)
# }}} Compare Filters #
logger.debug(
"Assertion: %s[filter] == %s[filter] - %s",
r["sheet_idx0"],
r["sheet_idx1"],
metric,
)
# }}} Compare Filters #
elif r["type"] == "pivot_table":
# Compare Pivot Tables {{{ #
# Compare Pivot Tables {{{ #
# sheet_idx0: 0 == "RI0" == "RNSheet1" | "EI0" == "ENSheet1"
# sheet_idx1: as sheet_idx0
# pivot_props: list of str, see utils.load_pivot_tables
pivots1: Dict[str, Any] = load_pivot_tables(*parse_idx(r["sheet_idx0"], xlworkbookr, xlworkbooke), **r)
pivots2: Dict[str, Any] = load_pivot_tables(*parse_idx(r["sheet_idx1"], xlworkbookr, xlworkbooke), **r)
pivots1: Dict[str, Any] = load_pivot_tables(
*parse_idx(r["sheet_idx0"], xlworkbookr, xlworkbooke), **r
)
pivots2: Dict[str, Any] = load_pivot_tables(
*parse_idx(r["sheet_idx1"], xlworkbookr, xlworkbooke), **r
)
metric: bool = pivots1 == pivots2
logger.debug("Assertion: %s[pivot]==%s[pivot] - %s", r["sheet_idx0"], r["sheet_idx1"], metric)
# }}} Compare Pivot Tables #
logger.debug(
"Assertion: %s[pivot]==%s[pivot] - %s",
r["sheet_idx0"],
r["sheet_idx1"],
metric,
)
# }}} Compare Pivot Tables #
elif r["type"] == "check_cell":
# Check Cell Properties {{{ #
# Check Cell Properties {{{ #
# sheet_idx: 0 == "RI0" == "RNSheet1" | "EI0" == "ENSheet1"
# coordinate: str, "E3"
# props: dict like {attribute: {"method": str, "ref": anything}}
# supported attributes: value & those supported by utils._read_cell_style
sheet: Worksheet = _load_sheet(*parse_idx(r["sheet_idx"], xlworkbookr, xlworkbooke))
if sheet is None:
return 0.
# data_frame: pd.DataFrame = _load_sheet(*parse_idx(r["sheet_idx"], pdworkbookr, pdworkbooke))
cell: Cell = sheet[r["coordinate"]]
metric: bool = True
for prpt, rule in r["props"].items():
if prpt == "value":
val = read_cell_value(*parse_idx(r["sheet_idx"], result, expected), r["coordinate"])
else:
val = _read_cell_style(prpt, cell)
try:
sheet: Worksheet = _load_sheet(
*parse_idx(r["sheet_idx"], xlworkbookr, xlworkbooke)
)
if sheet is None:
logger.error(
f"Failed to load sheet for sheet_idx: {r['sheet_idx']}"
)
return 0.0
# data_frame: pd.DataFrame = _load_sheet(*parse_idx(r["sheet_idx"], pdworkbookr, pdworkbooke))
cell: Cell = sheet[r["coordinate"]]
metric: bool = True
for prpt, rule in r["props"].items():
if prpt == "value":
try:
parsed_result = parse_idx(r["sheet_idx"], result, expected)
logger.debug(f"parse_idx result: {parsed_result}")
val = read_cell_value(*parsed_result, r["coordinate"])
logger.debug(f"Cell {r['coordinate']} value: {val}")
except Exception as e:
logger.error(
f"Failed to read cell value at {r['coordinate']}: {e}"
)
val = None
else:
try:
val = _read_cell_style(prpt, cell)
except Exception as e:
logger.error(
f"Failed to read cell style {prpt} at {r['coordinate']}: {e}"
)
val = None
metric = metric and _match_value_to_rule(val, rule)
metric = metric and _match_value_to_rule(val, rule)
except Exception as e:
logger.error(f"Error in check_cell processing: {e}")
return 0.0
logger.debug("Assertion: %s[%s] :%s - %s"
, r["sheet_idx"], r["coordinate"]
, repr(r["props"]), metric
)
# }}} Check Cell Properties #
logger.debug(
"Assertion: %s[%s] :%s - %s",
r["sheet_idx"],
r["coordinate"],
repr(r["props"]),
metric,
)
# }}} Check Cell Properties #
else:
raise NotImplementedError("Unimplemented sheet check: {:}".format(r["type"]))
raise NotImplementedError(
"Unimplemented sheet check: {:}".format(r["type"])
)
passes = passes and metric
if not passes:
break
return float(passes)
# }}} function compare_table #
# }}} function compare_table #
def compare_csv(result: str, expected: str, **options) -> float:
if result is None:
return 0.
with open(result) as f:
result_lines: Iterable[str] = f.read().splitlines()
with open(expected) as f:
expected_lines: Iterable[str] = f.read().splitlines()
if not options.get("strict", True):
result_lines = map(str.strip, result_lines)
expected_lines = map(str.strip, expected_lines)
if options.get("ignore_case", False):
result_lines = map(str.lower, result_lines)
expected_lines = map(str.lower, expected_lines)
metric: bool = list(result_lines) == list(expected_lines)
return float(metric)
def _normalize_city_string(value: Any) -> str:
"""Lowercase, strip punctuation, and remove accents for tolerant matching."""
if value is None:
return ""
if not isinstance(value, str):
value = str(value)
normalized = unicodedata.normalize("NFKD", value)
normalized = "".join(ch for ch in normalized if not unicodedata.combining(ch))
normalized = re.sub(r"[^a-z0-9]+", " ", normalized.lower())
return normalized.strip()
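For illustration, the normalization above collapses accents, case, and punctuation (these example values are assumptions, not taken from any task):

```python
assert _normalize_city_string("São Paulo") == "sao paulo"
assert _normalize_city_string(" Montréal,  QC ") == "montreal qc"
assert _normalize_city_string(None) == ""
```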
def compare_conference_city_in_order(actual_city_list_path, expected_city):
@ -490,28 +766,35 @@ def compare_conference_city_in_order(actual_city_list_path, expected_city):
for row in sheet["C2:C22"]:
for cell in row:
actual_city_list.append(cell.value)
# expected_city is the city that we want to compare with the actual city list
# the comparison must match in order, index by index
try:
for i in range(len(actual_city_list)):
if isinstance(expected_city_list[i], str):
if expected_city_list[i] not in actual_city_list[i]:
logger.debug(f"Expected city {expected_city_list[i]}; Actual city {actual_city_list[i]}")
print(f"Expected city {expected_city_list[i]}; Actual city {actual_city_list[i]}")
return 0.
elif isinstance(expected_city_list[i], List):
if not any(possible_str in actual_city_list[i] for possible_str in expected_city_list[i]):
logger.debug(f"Expected city {expected_city_list[i]}; Actual city {actual_city_list[i]}")
print(f"Expected city {expected_city_list[i]}; Actual city {actual_city_list[i]}")
return 0.
for i, actual_city in enumerate(actual_city_list):
actual_normalized = _normalize_city_string(actual_city)
expected_entry = expected_city_list[i]
if isinstance(expected_entry, str):
expected_candidates = [expected_entry]
elif isinstance(expected_entry, List):
expected_candidates = expected_entry
else:
raise TypeError("Expected city should be a string or a list of strings")
except:
return 0.
matched = False
for candidate in expected_candidates:
normalized_candidate = _normalize_city_string(candidate)
if normalized_candidate and normalized_candidate in actual_normalized:
matched = True
break
return 1.
if not matched:
logger.debug(
f"Expected city {expected_entry}; Actual city {actual_city}"
)
print(f"Expected city {expected_entry}; Actual city {actual_city}")
return 0.0
except Exception as exc:
logger.error(f"Error comparing conference cities: {exc}")
return 0.0
return 1.0

View File

@ -4,12 +4,13 @@ import functools
import itertools
import logging
import operator
import os
import re
import zipfile
#import pandas as pd
from typing import Any, TypeVar, Union, Iterable, Optional, Callable
from typing import Dict, List, Set, Match, Tuple, Pattern
from urllib.parse import urlparse, urlunparse
from urllib.parse import urlparse, urlunparse, ParseResult
import formulas
import lxml.cssselect
@ -28,15 +29,17 @@ from openpyxl.worksheet.cell_range import MultiCellRange, CellRange
from openpyxl.worksheet.dimensions import DimensionHolder
from openpyxl.worksheet.filters import AutoFilter, SortState
from openpyxl.worksheet.worksheet import Worksheet
import tldextract
V = TypeVar("Value")
logger = logging.getLogger("desktopenv.metrics.utils")
_xlsx_namespaces = [("oo", "http://schemas.openxmlformats.org/spreadsheetml/2006/main")
, ("x14", "http://schemas.microsoft.com/office/spreadsheetml/2009/9/main")
, ("xm", "http://schemas.microsoft.com/office/excel/2006/main")
]
_xlsx_namespaces = [
("oo", "http://schemas.openxmlformats.org/spreadsheetml/2006/main"),
("x14", "http://schemas.microsoft.com/office/spreadsheetml/2009/9/main"),
("xm", "http://schemas.microsoft.com/office/excel/2006/main")
]
_xlsx_ns_mapping = dict(_xlsx_namespaces)
_xlsx_ns_imapping = dict(map(lambda itm: (itm[1], itm[0]), _xlsx_namespaces))
_xlsx_ns_imapping["http://schemas.openxmlformats.org/spreadsheetml/2006/main"] = None
@ -282,6 +285,13 @@ _shared_str_value_selector = lxml.cssselect.CSSSelector("oo|t", namespaces=_xlsx
def read_cell_value(xlsx_file: str, sheet_name: str, coordinate: str) -> Any:
# read_cell_value {{{ #
logger.debug(f"Reading cell value from {xlsx_file}, sheet: {sheet_name}, coordinate: {coordinate}")
# Check if file exists
if not os.path.exists(xlsx_file):
logger.error(f"Excel file not found: {xlsx_file}")
return None
try:
with zipfile.ZipFile(xlsx_file, "r") as z_f:
try:
@ -308,9 +318,17 @@ def read_cell_value(xlsx_file: str, sheet_name: str, coordinate: str) -> Any:
, namespaces=_xlsx_ns_mapping
)(sheet)
if len(cells) == 0:
logger.debug(f"Cell {coordinate} not found in sheet {sheet_name}")
return None
cell: _Element = cells[0]
except zipfile.BadZipFile:
except zipfile.BadZipFile as e:
logger.error(f"Bad zip file {xlsx_file}: {e}")
return None
except KeyError as e:
logger.error(f"Sheet {sheet_name} not found in {xlsx_file}: {e}")
return None
except Exception as e:
logger.error(f"Error reading {xlsx_file}: {e}")
return None
cell: Dict[str, str] = xmltodict.parse(lxml.etree.tostring(cell, encoding="unicode")
@ -328,6 +346,8 @@ def read_cell_value(xlsx_file: str, sheet_name: str, coordinate: str) -> Any:
return cell["c"]["v"]
if cell["c"]["@t"] == "inlineStr":
return cell["c"]["is"]["t"]
if cell["c"]["@t"] == "e":
return cell["c"]["v"]
except (KeyError, ValueError):
return None
# }}} read_cell_value #
@ -391,6 +411,43 @@ def _read_cell_style(style_name: str, cell: Union[Cell, MergedCell], diff_style:
else:
raise NotImplementedError("Unsupported Style: {:}".format(style_name))
def _process_xlsx_cf_operator(operator: str, value: Any, ref: List[Any]) -> bool:
# function _process_xlsx_cf_operator {{{ #
# "containsText", "lessThanOrEqual", "notBetween", "lessThan", "notContains", "beginsWith", "equal", "greaterThanOrEqual", "between", "endsWith", "notEqual", "greaterThan"
try:
if operator=="lessThanOrEqual":
result: bool = value<=ref[0]
elif operator=="lessThan":
result: bool = value<ref[0]
elif operator=="equal":
result: bool = value==ref[0]
elif operator=="greaterThanOrEqual":
result: bool = value>=ref[0]
elif operator=="notEqual":
result: bool = value!=ref[0]
elif operator=="greaterThan":
result: bool = value>ref[0]
elif operator=="between":
small_one: float
large_one: float
small_one, large_one = min(ref), max(ref)
result: bool = value>=small_one and value<=large_one
elif operator=="notBetween":
small_one: float
large_one: float
small_one, large_one = min(ref), max(ref)
result: bool = value<small_one or value>large_one
else:
#raise NotImplementedError("Not Implemented CondFormat Operator: {:}".format(operator))
logger.exception("Not Implemented CondFormat Operator: {:}".format(operator))
result = False
return result
except TypeError:
logger.exception("Unmatched type of %s and %s. Auto to False", repr(value), repr(ref))
return False
except IndexError:
logger.exception("ref array doesn't have enough elements. Auto to False: %s", repr(ref))
return False
# }}} function _process_xlsx_cf_operator #
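A few illustrative calls for the operator semantics above (the values are assumptions):

```python
assert _process_xlsx_cf_operator("between", 5, [1, 10]) is True
assert _process_xlsx_cf_operator("notBetween", 5, [1, 10]) is False
# A type mismatch is caught and coerced to False rather than raised:
assert _process_xlsx_cf_operator("greaterThan", "a", [3]) is False
```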
_absolute_range_pattern: Pattern[str] = re.compile(r"""\$(?P<col1>[A-Z]{1,3})\$(?P<row1>\d+) # coord1
(?::
@ -441,12 +498,23 @@ def load_xlsx_styles(xlsx_file: Workbook, sheet_name: str, book_name: str, **opt
for fmt in conditional_formattings:
for r in fmt.rules:
active_cells: List[Cell] = []
if r.type == "expression":
condition: Callable[[str], bool] = formula_parser.ast("=" + r.formula[0])[1].compile()
logger.debug("Expression condition: %s", r.formula[0])
# Process CF Formulae {{{ #
formulae: List[Callable[[Any], Any]] = []
argument_lists: List[List[Any]] = []
has_error = False
for fml in r.formula:
try:
formula_func: Callable[[Any], Any] =\
formula_parser.ast("=" + fml)[1].compile()
logger.debug("CondFormat rule formula: %s", fml)
except:
logger.exception("Formula parsing error: %s. Skipping.", repr(fml))
has_error = True
break
arguments: List[Any] = []
absolute_range_match: List[Tuple[str, str, str, str]] = _absolute_range_pattern.findall(r.formula[0])
absolute_range_match: List[Tuple[str, str, str, str]] = _absolute_range_pattern.findall(fml)
for m in absolute_range_match:
logger.debug("Absolute ranges: %s", repr(m))
if m[2] is None and m[3] is None:
@ -462,25 +530,65 @@ def load_xlsx_styles(xlsx_file: Workbook, sheet_name: str, book_name: str, **opt
)
logger.debug("Absolute range arguments: %s", repr(arguments))
nb_contiguous_nothings = 0
for rge in fmt.cells:
for c in rge.cells:
cell: Cell = worksheet.cell(row=c[0], column=c[1])
cell_value = read_cell_value(book_name, sheet_name
, coordinate="{:}{:d}".format(get_column_letter(c[1])
, c[0]
)
)
if cell_value is None:
nb_contiguous_nothings += 1
if nb_contiguous_nothings>50:
break
continue
elif condition(cell_value, *arguments):
logger.debug("Active Cell %s(%s) for %s", repr(cell), str(cell_value), r.formula[0])
active_cells.append(cell)
formulae.append(formula_func)
argument_lists.append(arguments)
if has_error:
continue
# }}} Process CF Formulae #
# Process Condition According to Type {{{ #
if r.type in { "expression"
, "containsText", "notContainsText"
, "endsWith", "beginsWith"
, "containsErrors", "notContainsErrors"
}:
condition: Callable[[Any], bool] = formulae[0]
arguments: List[Any] = argument_lists[0]
is_active: Callable[[Any], bool] = lambda v: condition(v, *arguments)
elif r.type == "cellIs":
operator: str = r.operator
try:
references: List[Any] = [fml() for fml in formulae]
except:
logger.exception("Error occurs while calculating reference values for cellIs condition formatting.")
continue
is_active: Callable[[Any], bool] =\
lambda v: _process_xlsx_cf_operator(operator, v, references)
else:
raise NotImplementedError("Not Implemented Condition Type: {:}".format(r.type))
#raise NotImplementedError("Not Implemented Condition Type: {:}".format(r.type))
# e.g., type=top10 (rank=number, percent=bool, bottom=bool)
# type=aboveAverage (equalAverage=bool, aboveAverage=bool)
# type=duplicateValues / type=uniqueValues
logger.exception("Not Implemented Condition Type: {:}".format(r.type))
# }}} Process Condition According to Type #
# Test Each Cell {{{ #
nb_contiguous_nothings = 0
for rge in fmt.cells:
for c in rge.cells:
cell: Cell = worksheet.cell(row=c[0], column=c[1])
cell_value = read_cell_value(book_name, sheet_name
, coordinate="{:}{:d}".format(get_column_letter(c[1])
, c[0]
)
)
if cell_value is None:
nb_contiguous_nothings += 1
if nb_contiguous_nothings>50:
break
continue
else:
try:
satisfies_condition: bool = is_active(cell_value)
except:
logger.exception("Error in formula calculation with cell value %d", repr(cell_value))
satisfies_condition = False
if satisfies_condition:
logger.debug("Active Cell %s(%s) for %s", repr(cell), repr(cell_value), r.formula[0])
active_cells.append(cell)
# }}} Test Each Cell #
for c in active_cells:
style_dict[c.coordinate] = [_read_cell_style(st, c, r.dxf) for st in concerned_styles]
@ -672,29 +780,59 @@ def are_lists_equal(list1, list2, comparison_func):
return True
def compare_urls(url1, url2):
def compare_urls(url1, url2, full=True):
if url1 is None or url2 is None:
return url1 == url2
logger.info(f"compare_urls. url1: {url1}; url2: {url2}")
def parse_with_default_scheme(url):
"""
Ensure the URL has a scheme. If not, prepend 'http://'
so it parses as host + path instead of just a path.
"""
# Regex to check if URL has scheme like 'http://', 'https://', etc.
if not re.match(r'^[a-zA-Z][a-zA-Z0-9+\-.]*://', url):
url = f"http://{url}"
return urlparse(url)
def normalize_url(url):
# Parse the URL
parsed_url = urlparse(url)
# Parse the URL; if no scheme is present, assume 'http'
parsed_url = parse_with_default_scheme(url)
scheme = parsed_url.scheme.lower()
# If no scheme is present, assume 'http'
scheme = parsed_url.scheme if parsed_url.scheme else 'http'
# Extract the domain parts using tldextract
extracted = tldextract.extract(parsed_url.netloc.lower())
# e.g., extracted = TLDExtractResult(subdomain='www', domain='airbnb', suffix='com.sg')
# Drop 'www' if it's the only subdomain
subdomain = extracted.subdomain
if subdomain == 'www':
subdomain = ''
# Lowercase the scheme and netloc, remove 'www.', and handle trailing slash
normalized_netloc = parsed_url.netloc.lower().replace("www.", "")
# Instead of using the suffix (e.g., 'com', 'com.sg'), ignore it completely
# so that both 'airbnb.com' and 'airbnb.com.sg' become just 'airbnb' or 'www.airbnb'
if subdomain:
normalized_netloc = f"{subdomain}.{extracted.domain}"
else:
normalized_netloc = extracted.domain
# Handle trailing slash in the path
normalized_path = parsed_url.path if parsed_url.path != '/' else ''
# Reassemble the URL with normalized components
normalized_parsed_url = parsed_url._replace(scheme=scheme.lower(), netloc=normalized_netloc,
path=normalized_path)
normalized_url = urlunparse(normalized_parsed_url)
# Reassemble the URL with the normalized components
normalized_parsed_url = ParseResult(
scheme=scheme.lower(),
netloc=normalized_netloc,
path=normalized_path,
params=parsed_url.params if full else '',  # keep params only when full=True
query=parsed_url.query if full else '',  # keep the query string only when full=True
fragment=parsed_url.fragment if full else '',  # keep the fragment only when full=True
)
return urlunparse(normalized_parsed_url)
return normalized_url
# Normalize both URLs for comparison
logger.info(f"After normalization. url1: {normalize_url(url1)}; url2: {normalize_url(url2)}")
# Normalize both URLs
norm_url1 = normalize_url(url1)
norm_url2 = normalize_url(url2)
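The tail of this hunk is truncated; assuming it ends by comparing the two normalized strings, the intent can be illustrated as follows (the example URLs are assumptions):

```python
# "www." and the public suffix are dropped, and a missing scheme defaults to
# "http", so both sides normalize to the same URL and should compare equal.
compare_urls("http://www.airbnb.com.sg/rooms", "airbnb.com/rooms")  # -> truthy
```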

View File

@ -208,27 +208,151 @@ def is_extension_installed(actual: str, rules: Dict, **options):
def check_python_file_by_test_suite(actual_files, test_file, **options) -> float:
"""Check the python file by running the test suite in the given test file."""
"""Check the python file by running the test suite in the given test file.
This function is now more robust and handles various error conditions:
- File existence validation
- Module loading errors
- Function execution errors
- Proper resource cleanup
- Working directory management
"""
import os
import uuid
import logging
from pathlib import Path
logger = logging.getLogger(__name__)
test_function_name = options.get('test_function_name', 'test')
# Create a unique module name, it can be arbitrary but must be unique in the current runtime environment
module_name = 'dynamic_module'
# Load the module from the given file path
spec = importlib.util.spec_from_file_location(module_name, test_file)
module = importlib.util.module_from_spec(spec)
sys.modules[module_name] = module # Add the loaded module to sys.modules
spec.loader.exec_module(module) # Execute the module to make its content available
# Retrieve the function by name from the loaded module and execute it
test_function = getattr(module, test_function_name)
try:
if test_function():
return 1.0
else:
return 0.0
except Exception as e:
# Validate inputs
if not test_file:
logger.error("test_file is None or empty")
return 0.0
# Convert to absolute path and check existence
test_file_path = Path(test_file).resolve()
if not test_file_path.exists():
logger.error(f"Test file does not exist: {test_file_path}")
return 0.0
if not test_file_path.is_file():
logger.error(f"Test file path is not a file: {test_file_path}")
return 0.0
# Create unique module name to avoid conflicts
module_name = f'dynamic_test_module_{uuid.uuid4().hex[:8]}'
# Store original working directory and sys.path
original_cwd = os.getcwd()
original_sys_path = sys.path.copy()
try:
# Change to the directory containing the test file
test_dir = test_file_path.parent
os.chdir(test_dir)
logger.debug(f"Changed working directory to: {test_dir}")
# Add test directory to Python path if not already present
if str(test_dir) not in sys.path:
sys.path.insert(0, str(test_dir))
logger.debug(f"Added {test_dir} to sys.path")
# Try to load the module
try:
spec = importlib.util.spec_from_file_location(module_name, test_file_path)
if spec is None:
logger.error(f"Could not create module spec for {test_file_path}")
return 0.0
if spec.loader is None:
logger.error(f"Module spec has no loader for {test_file_path}")
return 0.0
module = importlib.util.module_from_spec(spec)
if module is None:
logger.error(f"Could not create module from spec for {test_file_path}")
return 0.0
# Add to sys.modules temporarily
sys.modules[module_name] = module
# Execute the module
spec.loader.exec_module(module)
logger.debug(f"Successfully loaded test module: {module_name}")
except SyntaxError as e:
logger.error(f"Syntax error in test file: {e}")
return 0.0
except ImportError as e:
logger.error(f"Import error loading test file: {e}")
return 0.0
except Exception as e:
logger.error(f"Error loading test module: {e}")
return 0.0
# Try to get the test function
try:
if not hasattr(module, test_function_name):
logger.error(f"Test function '{test_function_name}' not found in {test_file_path}")
return 0.0
test_function = getattr(module, test_function_name)
if not callable(test_function):
logger.error(f"'{test_function_name}' is not callable in {test_file_path}")
return 0.0
logger.debug(f"Found test function: {test_function_name}")
except Exception as e:
logger.error(f"Error getting test function: {e}")
return 0.0
# Execute the test function
try:
result = test_function()
logger.debug(f"Test function returned: {result} (type: {type(result)})")
# Handle different return types
if isinstance(result, bool):
return 1.0 if result else 0.0
elif isinstance(result, (int, float)):
# Normalize to 0.0-1.0 range
normalized = max(0.0, min(1.0, float(result)))
if normalized != result:
logger.warning(f"Test result {result} normalized to {normalized}")
return normalized
else:
# For any other type, treat as True if truthy
bool_result = bool(result)
logger.warning(f"Test returned non-boolean/numeric value {result}, treating as {bool_result}")
return 1.0 if bool_result else 0.0
except Exception as e:
logger.error(f"Error executing test function: {e}")
return 0.0
except Exception as e:
logger.error(f"Unexpected error in check_python_file_by_test_suite: {e}")
return 0.0
finally:
# Cleanup: remove the module from sys.modules
if module_name in sys.modules:
del sys.modules[module_name]
logger.debug(f"Cleaned up module: {module_name}")
# Restore original working directory
try:
os.chdir(original_cwd)
logger.debug(f"Restored working directory to: {original_cwd}")
except Exception as e:
logger.warning(f"Could not restore working directory: {e}")
# Restore original sys.path
sys.path[:] = original_sys_path
logger.debug("Restored sys.path")
def check_python_file_by_gold_file(actual_files, gold_file: str, **options) -> float:

View File

@ -31,5 +31,13 @@ def create_vm_manager_and_provider(provider_name: str, region: str, use_proxy: b
from desktop_env.providers.docker.manager import DockerVMManager
from desktop_env.providers.docker.provider import DockerProvider
return DockerVMManager(), DockerProvider(region)
elif provider_name == "aliyun":
from desktop_env.providers.aliyun.manager import AliyunVMManager
from desktop_env.providers.aliyun.provider import AliyunProvider
return AliyunVMManager(), AliyunProvider()
elif provider_name == "volcengine":
from desktop_env.providers.volcengine.manager import VolcengineVMManager
from desktop_env.providers.volcengine.provider import VolcengineProvider
return VolcengineVMManager(), VolcengineProvider()
else:
raise NotImplementedError(f"{provider_name} not implemented!")
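A hedged usage sketch for the new providers (the `use_proxy` parameter is truncated in this hunk and presumably has a default, so it is omitted; the region value is illustrative, and for "aliyun" the provider takes no region argument, reading `ALIYUN_REGION` per the guide below):

```python
# Requires the ALIYUN_* environment variables described in the guide below.
manager, provider = create_vm_manager_and_provider("aliyun", region="eu-central-1")
```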

View File

@ -0,0 +1,80 @@
# Aliyun ECS Provider Configuration Guide
This guide explains how to configure and use the Aliyun ECS provider for OSWorld desktop environments.
## Configuration Process
1. **Aliyun Account**: You need an active Aliyun Cloud account. This script uses pay-as-you-go billing by default, so ensure your account balance is above 100.
2. **Access Keys**: Create an AccessKey ID and AccessKey Secret in the Aliyun RAM Access Control console and grant them ECS control permissions
3. **VPC Setup**: Create a VPC, VSwitch, and Security Group in your target region
4. **Custom Images**: Create OSWorld custom images
5. It is recommended to manually complete the ECS creation process once to record all required environment variable information.
## Environment Variables
Set the following environment variables in your `.env` file:
```bash
# Aliyun Access Credentials
ALIYUN_ACCESS_KEY_ID=your_access_key_id
ALIYUN_ACCESS_KEY_SECRET=your_access_key_secret
# ECS Configuration Information
ALIYUN_REGION=eu-central-1
ALIYUN_IMAGE_ID=your_image_id
ALIYUN_INSTANCE_TYPE=ecs.e-c1m2.large
ALIYUN_VSWITCH_ID=vsw-xxxxxxxxx
ALIYUN_SECURITY_GROUP_ID=sg-xxxxxxxxx
```
## Required Aliyun Resources
### 1. VPC and VSwitch
- Create a VPC in your target region
- Create a VSwitch within the VPC
- Ensure the VSwitch has internet access for VNC connectivity
### 2. Security Group
**⚠️ Important**: Please strictly follow the port settings below to prevent OSWorld tasks from failing due to connection issues:
#### Inbound Rules (8 rules required)
| Type | Protocol | Port Range | Source | Description |
|------|----------|------------|--------|-------------|
| SSH | TCP | 22 | 0.0.0.0/0 | SSH access |
| HTTP | TCP | 80 | 172.31.0.0/16 | HTTP traffic |
| Custom TCP | TCP | 5000 | 172.31.0.0/16 | OSWorld backend service |
| Custom TCP | TCP | 5910 | 0.0.0.0/0 | NoVNC visualization port |
| Custom TCP | TCP | 8006 | 172.31.0.0/16 | VNC service port |
| Custom TCP | TCP | 8080 | 172.31.0.0/16 | VLC service port |
| Custom TCP | TCP | 8081 | 172.31.0.0/16 | Additional service port |
| Custom TCP | TCP | 9222 | 172.31.0.0/16 | Chrome control port |
#### Outbound Rules (1 rule required)
| Type | Protocol | Port Range | Destination | Description |
|------|----------|------------|-------------|-------------|
| All traffic | All | All | 0.0.0.0/0 | Allow all outbound traffic |
### 3. Custom Images
You need to create a custom OSWorld image for Aliyun ECS. Please follow the instructions in the "Creating Custom ECS Images for OSWorld" section.
## Creating Custom ECS Images for OSWorld
This section provides guidance on how to create the custom ECS images required for OSWorld desktop environments. The process involves setting up a base instance with desktop environment and VNC server, then creating a custom image from it.
### Step-by-Step Image Creation Process
#### Step 1: Upload existing qcow2 image to Aliyun
- Download the provided qcow2 image from the link in `desktop_env/providers/docker/manager.py`: https://huggingface.co/datasets/xlangai/ubuntu_osworld/resolve/main/Ubuntu.qcow2.zip
- Unzip the downloaded file and upload it to Aliyun Object Storage Service (OSS). Make sure the OSS bucket is in the same region where you plan to launch the ECS instance.
- In your ECS dashboard, go to "Images" and you will see the "Import Image" button. Click it and follow the instructions to import the qcow2 image from OSS.
- After the import is complete, you will see the imported image in the "Images" list.
#### Step 2: Create a new image
Note that the image you created in Step 1 will have a different resolution than the one you want to use for OSWorld (1920x1080). We need to customize the image to use the correct resolution and set up noVNC.
- Go to `Instances` tab and create a new instance with the imported image.
- Connect to the running instance via VNC.
- After connecting to the instance, please open the terminal and download this configuration script: `https://gist.githubusercontent.com/qykong/bea58ff98f20057d3a69921276dd4553/raw/cd1a91a0840c4192d793f43cfb90553370343b08/config.sh`.
- If you also want SSH and VNC passwords to be set up, use this script instead: `https://huggingface.co/datasets/xlangai/ubuntu_osworld/resolve/main/aliyun_config.sh?download=true`.
- Run the script and reboot your instance.
- After rebooting, the instance will have the correct resolution and noVNC setup. You can connect to the instance via "http://<your_instance_public_ip>:5910/vnc.html" (make sure your security group allows port 5910).
- Save the running instance as a new image. The new image will be used as the OSWorld image.

View File

@ -0,0 +1,82 @@
# Aliyun ECS Provider Configuration Guide
This guide explains how to configure and use Aliyun ECS for OSWorld desktop environments.
## Configuration Process
1. **Aliyun Account**: You need a valid Aliyun account. This script launches ECS instances with pay-as-you-go billing by default, so keep your account balance above 100.
2. **Access Keys**: Create an AccessKey ID and AccessKey Secret in the Aliyun RAM Access Control console and grant them ECS control permissions
3. **VPC Setup**: Create a VPC, VSwitch, and security group in the target region
4. **Custom Images**: Create OSWorld custom images.
5. It is recommended to manually complete the ECS creation process once and record all required environment variable information.
## Environment Variables
Set the following environment variables in your `.env` file:
```bash
# Aliyun access credentials
ALIYUN_ACCESS_KEY_ID=your_access_key_id
ALIYUN_ACCESS_KEY_SECRET=your_access_key_secret
# ECS configuration
ALIYUN_REGION=eu-central-1
ALIYUN_IMAGE_ID=your_image_id
ALIYUN_INSTANCE_TYPE=ecs.e-c1m2.large
ALIYUN_VSWITCH_ID=vsw-xxxxxxxxx
ALIYUN_SECURITY_GROUP_ID=sg-xxxxxxxxx
```
## Required Aliyun Resources
### 1. VPC and VSwitch
- Create a VPC in the target region
- Create a VSwitch within the VPC
- Ensure the VSwitch has internet access to support VNC connectivity
### 2. Security Group
**⚠️ Important**: Please strictly follow the port settings below to prevent OSWorld tasks from failing due to connection issues:
#### Inbound Rules (8 rules required)
| Type | Protocol | Port Range | Source | Description |
|------|----------|------------|--------|-------------|
| SSH | TCP | 22 | 0.0.0.0/0 | SSH access |
| HTTP | TCP | 80 | 172.31.0.0/16 | HTTP traffic |
| Custom TCP | TCP | 5000 | 172.31.0.0/16 | OSWorld backend service |
| Custom TCP | TCP | 5910 | 0.0.0.0/0 | NoVNC visualization port |
| Custom TCP | TCP | 8006 | 172.31.0.0/16 | VNC service port |
| Custom TCP | TCP | 8080 | 172.31.0.0/16 | VLC service port |
| Custom TCP | TCP | 8081 | 172.31.0.0/16 | Additional service port |
| Custom TCP | TCP | 9222 | 172.31.0.0/16 | Chrome control port |
#### Outbound Rules (1 rule required)
| Type | Protocol | Port Range | Destination | Description |
|------|----------|------------|-------------|-------------|
| All traffic | All | All | 0.0.0.0/0 | Allow all outbound traffic |
### 3. Custom Images
You need to create a custom OSWorld image for Aliyun ECS. Please follow the instructions in the "Creating Custom ECS Images for OSWorld" section.
## Creating Custom ECS Images for OSWorld
This section explains how to create the custom ECS images required for OSWorld desktop environments. The process involves setting up a base instance with a desktop environment and VNC server, then creating a custom image from it.
### Step-by-Step Image Creation Process
#### Step 1: Upload the existing qcow2 image to Aliyun
- Download the provided qcow2 image from the link in `desktop_env/providers/docker/manager.py`: https://huggingface.co/datasets/xlangai/ubuntu_osworld/resolve/main/Ubuntu.qcow2.zip
- Unzip the downloaded file and upload it to Aliyun Object Storage Service (OSS). Make sure the OSS bucket is in the same region where you plan to launch the ECS instance.
- In your ECS console, go to the "Images" page, where you will see the "Import Image" button. Click it and follow the instructions to import the qcow2 image from OSS.
- After the import is complete, you will see the imported image in the "Images" list.
#### Step 2: Create a new image
Note that the image created in Step 1 has a different resolution than the one you want to use for OSWorld (1920x1080). We need to customize the image to use the correct resolution and set up noVNC.
- Go to the "Instances" tab and create a new instance from the imported image.
- Connect to the running instance via VNC.
- After connecting to the instance, open the terminal and download this configuration script: `https://gist.githubusercontent.com/qykong/bea58ff98f20057d3a69921276dd4553/raw/cd1a91a0840c4192d793f43cfb90553370343b08/config.sh`.
- If you also want SSH and VNC passwords to be set up, use this script instead: `https://huggingface.co/datasets/xlangai/ubuntu_osworld/resolve/main/aliyun_config.sh?download=true`
- Run the script and reboot your instance.
- After rebooting, the instance will have the correct resolution and noVNC setup. You can connect to the instance via "http://<your_instance_public_ip>:5910/vnc.html" (make sure your security group allows port 5910).
- Save the running instance as a new image. This new image will be used as the OSWorld image.

View File

@ -0,0 +1,31 @@
import os
# Default TTL minutes for instance auto-release (Aliyun-side)
# Can be overridden via environment variable DEFAULT_TTL_MINUTES
# ATTENTION: ECS requires TTL to be at least 30 minutes (if TTL > 0)
MIN_TTL_MINUTES: int = 30
_ttl_env_str = os.getenv("DEFAULT_TTL_MINUTES", "60")
try:
_ttl_env_val = int(_ttl_env_str)
except Exception:
_ttl_env_val = 60
# If TTL is positive but less than Aliyun minimum, clamp to 30 minutes
if _ttl_env_val > 0 and _ttl_env_val < MIN_TTL_MINUTES:
DEFAULT_TTL_MINUTES: int = MIN_TTL_MINUTES
else:
DEFAULT_TTL_MINUTES: int = _ttl_env_val
# Master switch for TTL feature
ENABLE_TTL: bool = os.getenv("ENABLE_TTL", "true").lower() == "true"
def compute_ttl_seconds(ttl_minutes: int) -> int:
try:
return max(0, int(ttl_minutes) * 60)
except Exception:
return 0
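For instance, the clamping above behaves like this (a sketch; it assumes the module is imported after the environment variable is set and that the package import has no other side effects):

```python
import os

os.environ["DEFAULT_TTL_MINUTES"] = "10"  # below the 30-minute ECS minimum

from desktop_env.providers.aliyun import config

print(config.DEFAULT_TTL_MINUTES)  # 30, clamped up to MIN_TTL_MINUTES
print(config.compute_ttl_seconds(config.DEFAULT_TTL_MINUTES))  # 1800
```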

View File

@ -0,0 +1,325 @@
import os
import logging
import dotenv
import time
import signal
import requests
from datetime import datetime, timedelta, timezone
from alibabacloud_ecs20140526.client import Client as ECSClient
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_ecs20140526 import models as ecs_models
from alibabacloud_tea_util.client import Client as UtilClient
from desktop_env.providers.base import VMManager
from desktop_env.providers.aliyun.config import ENABLE_TTL, DEFAULT_TTL_MINUTES
dotenv.load_dotenv()
for env_name in [
"ALIYUN_REGION",
"ALIYUN_VSWITCH_ID",
"ALIYUN_SECURITY_GROUP_ID",
"ALIYUN_IMAGE_ID",
"ALIYUN_ACCESS_KEY_ID",
"ALIYUN_ACCESS_KEY_SECRET",
"ALIYUN_INSTANCE_TYPE",
]:
if not os.getenv(env_name):
raise EnvironmentError(f"{env_name} must be set in the environment variables.")
logger = logging.getLogger("desktopenv.providers.aliyun.AliyunVMManager")
logger.setLevel(logging.INFO)
ALIYUN_INSTANCE_TYPE = os.getenv("ALIYUN_INSTANCE_TYPE")
ALIYUN_ACCESS_KEY_ID = os.getenv("ALIYUN_ACCESS_KEY_ID")
ALIYUN_ACCESS_KEY_SECRET = os.getenv("ALIYUN_ACCESS_KEY_SECRET")
ALIYUN_REGION = os.getenv("ALIYUN_REGION")
ALIYUN_IMAGE_ID = os.getenv("ALIYUN_IMAGE_ID")
ALIYUN_SECURITY_GROUP_ID = os.getenv("ALIYUN_SECURITY_GROUP_ID")
ALIYUN_VSWITCH_ID = os.getenv("ALIYUN_VSWITCH_ID")
ALIYUN_RESOURCE_GROUP_ID = os.getenv("ALIYUN_RESOURCE_GROUP_ID")
WAIT_DELAY = 20
MAX_ATTEMPTS = 15
def _allocate_vm(screen_size=(1920, 1080)):
"""
Allocate a new Aliyun ECS instance
"""
assert screen_size == (1920, 1080), "Only 1920x1080 screen size is supported"
config = open_api_models.Config(
access_key_id=ALIYUN_ACCESS_KEY_ID,
access_key_secret=ALIYUN_ACCESS_KEY_SECRET,
region_id=ALIYUN_REGION,
)
client = ECSClient(config)
instance_id = None
original_sigint_handler = signal.getsignal(signal.SIGINT)
original_sigterm_handler = signal.getsignal(signal.SIGTERM)
def signal_handler(sig, frame):
if instance_id:
signal_name = "SIGINT" if sig == signal.SIGINT else "SIGTERM"
logger.warning(
f"Received {signal_name} signal, terminating instance {instance_id}..."
)
try:
delete_request = ecs_models.DeleteInstancesRequest(
region_id=ALIYUN_REGION,
instance_ids=UtilClient.to_jsonstring([instance_id]),
force=True,
)
client.delete_instances(delete_request)
logger.info(
f"Successfully terminated instance {instance_id} after {signal_name}."
)
except Exception as cleanup_error:
logger.error(
f"Failed to terminate instance {instance_id} after {signal_name}: {str(cleanup_error)}"
)
# Restore original signal handlers
signal.signal(signal.SIGINT, original_sigint_handler)
signal.signal(signal.SIGTERM, original_sigterm_handler)
# Raise appropriate exception based on signal type
if sig == signal.SIGINT:
raise KeyboardInterrupt
else:
# For SIGTERM, exit gracefully
import sys
sys.exit(0)
try:
# Set up signal handlers for both SIGINT and SIGTERM
signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)
logger.info(
f"Creating new ECS instance in region {ALIYUN_REGION} with image {ALIYUN_IMAGE_ID}"
)
# TTL configuration
ttl_enabled = ENABLE_TTL
ttl_minutes = DEFAULT_TTL_MINUTES
ttl_seconds = max(0, int(ttl_minutes) * 60)
# Aliyun constraints: at least 30 minutes in the future, ISO8601 UTC, seconds must be 00
now_utc = datetime.now(timezone.utc)
min_eta = now_utc + timedelta(minutes=30)
raw_eta = now_utc + timedelta(seconds=ttl_seconds)
effective_eta = raw_eta if raw_eta > min_eta else min_eta
# round up to the next full minute, zero seconds
effective_eta = (effective_eta + timedelta(seconds=59)).replace(second=0, microsecond=0)
auto_release_str = effective_eta.strftime('%Y-%m-%dT%H:%M:%SZ')
logger.info(
f"TTL config: enabled={ttl_enabled}, minutes={ttl_minutes}, seconds={ttl_seconds}, ETA(UTC)={auto_release_str}"
)
# Create instance request (attempt with auto_release_time first when TTL enabled)
def _build_request(with_ttl: bool) -> ecs_models.RunInstancesRequest:
kwargs = dict(
region_id=ALIYUN_REGION,
image_id=ALIYUN_IMAGE_ID,
instance_type=ALIYUN_INSTANCE_TYPE,
security_group_id=ALIYUN_SECURITY_GROUP_ID,
v_switch_id=ALIYUN_VSWITCH_ID,
instance_name=f"OSWorld-Desktop-{int(time.time())}",
description="OSWorld Desktop Environment Instance",
internet_max_bandwidth_out=10,
internet_charge_type="PayByTraffic",
instance_charge_type="PostPaid",
system_disk=ecs_models.RunInstancesRequestSystemDisk(
size="50",
category="cloud_essd",
),
deletion_protection=False,
)
if ALIYUN_RESOURCE_GROUP_ID:
kwargs["resource_group_id"] = ALIYUN_RESOURCE_GROUP_ID
if with_ttl and ttl_enabled and ttl_seconds > 0:
kwargs["auto_release_time"] = auto_release_str
return ecs_models.RunInstancesRequest(**kwargs)
try:
request = _build_request(with_ttl=True)
response = client.run_instances(request)
except Exception as create_err:
# Retry without auto_release_time if creation-time TTL is rejected
logger.warning(
f"RunInstances with auto_release_time failed: {create_err}. Retrying without TTL field..."
)
request = _build_request(with_ttl=False)
response = client.run_instances(request)
instance_ids = response.body.instance_id_sets.instance_id_set
if not instance_ids:
raise RuntimeError(
"Failed to create ECS instance - no instance ID returned"
)
instance_id = instance_ids[0]
logger.info(f"ECS instance {instance_id} created successfully")
# Wait for the instance to be running
logger.info(f"Waiting for instance {instance_id} to be running...")
_wait_for_instance_running(client, instance_id)
logger.info(f"Instance {instance_id} is now running and ready")
except KeyboardInterrupt:
logger.warning("VM allocation interrupted by user (SIGINT).")
if instance_id:
logger.info(f"Terminating instance {instance_id} due to interruption.")
try:
delete_request = ecs_models.DeleteInstancesRequest(
region_id=ALIYUN_REGION,
instance_ids=UtilClient.to_jsonstring([instance_id]),
force=True,
)
client.delete_instances(delete_request)
except Exception as cleanup_error:
logger.error(
f"Failed to cleanup instance {instance_id}: {str(cleanup_error)}"
)
raise
except Exception as e:
logger.error(f"Failed to allocate ECS instance: {str(e)}")
if instance_id:
logger.info(f"Terminating instance {instance_id} due to an error.")
try:
delete_request = ecs_models.DeleteInstancesRequest(
region_id=ALIYUN_REGION,
instance_ids=UtilClient.to_jsonstring([instance_id]),
force=True,
)
client.delete_instances(delete_request)
except Exception as cleanup_error:
logger.error(
f"Failed to cleanup instance {instance_id}: {str(cleanup_error)}"
)
raise
finally:
# Restore original signal handlers
signal.signal(signal.SIGINT, original_sigint_handler)
signal.signal(signal.SIGTERM, original_sigterm_handler)
return instance_id
def _wait_for_instance_running(
client: ECSClient, instance_id: str, max_attempts: int = MAX_ATTEMPTS
):
"""Wait for instance to reach Running state"""
for _ in range(max_attempts):
try:
req = ecs_models.DescribeInstancesRequest(
region_id=ALIYUN_REGION,
instance_ids=UtilClient.to_jsonstring([instance_id]),
)
response = client.describe_instances(req)
if response.body.instances.instance:
instance = response.body.instances.instance[0]
status = instance.status
logger.info(f"Instance {instance_id} status: {status}")
if status == "Running":
return
elif status in ["Stopped", "Stopping"]:
start_req = ecs_models.StartInstanceRequest(instance_id=instance_id)
client.start_instance(start_req)
logger.info(f"Started instance {instance_id}")
time.sleep(WAIT_DELAY)
except Exception as e:
logger.warning(f"Error checking instance status: {e}")
time.sleep(WAIT_DELAY)
raise TimeoutError(
f"Instance {instance_id} did not reach Running state within {max_attempts * WAIT_DELAY} seconds"
)
def _wait_until_server_ready(public_ip: str):
"""Wait until the server is ready"""
for _ in range(MAX_ATTEMPTS):
try:
logger.info(f"Checking server status on {public_ip}...")
response = requests.get(f"http://{public_ip}:5000/", timeout=2)
if response.status_code == 404:
logger.info(f"Server {public_ip} is ready")
return
except Exception:
pass
# Sleep between attempts even on non-404 responses, so unexpected status codes
# don't burn through all attempts in a tight retry loop
time.sleep(WAIT_DELAY)
raise TimeoutError(
f"Server {public_ip} did not respond within {MAX_ATTEMPTS * WAIT_DELAY} seconds"
)
class AliyunVMManager(VMManager):
"""
Aliyun ECS VM Manager for managing virtual machines on Aliyun Cloud.
Aliyun ECS does not need to maintain a registry of VMs, as it can dynamically allocate and deallocate VMs.
"""
def __init__(self, **kwargs):
self.initialize_registry()
def initialize_registry(self, **kwargs):
pass
def add_vm(self, vm_path, lock_needed=True, **kwargs):
pass
def _add_vm(self, vm_path):
pass
def delete_vm(self, vm_path, lock_needed=True, **kwargs):
pass
def _delete_vm(self, vm_path):
pass
def occupy_vm(self, vm_path, pid, lock_needed=True, **kwargs):
pass
def _occupy_vm(self, vm_path, pid):
pass
def check_and_clean(self, lock_needed=True, **kwargs):
pass
def _check_and_clean(self):
pass
def list_free_vms(self, lock_needed=True, **kwargs):
pass
def _list_free_vms(self):
pass
def get_vm_path(self, screen_size=(1920, 1080), **kwargs):
"""Get a VM path (instance ID) for use"""
logger.info(
f"Allocating new ECS instance in region {ALIYUN_REGION} with screen size {screen_size}"
)
try:
instance_id = _allocate_vm(screen_size)
logger.info(f"Successfully allocated instance {instance_id}")
return instance_id
except Exception as e:
logger.error(f"Failed to allocate instance: {str(e)}")
raise
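A minimal usage sketch of the manager; note that every call allocates a fresh ECS instance (and therefore incurs charges), and it assumes all the `ALIYUN_*` variables validated at import time are set:
```python
# Minimal sketch: allocate a 1920x1080 OSWorld ECS instance via the manager.
from desktop_env.providers.aliyun.manager import AliyunVMManager

manager = AliyunVMManager()
instance_id = manager.get_vm_path(screen_size=(1920, 1080))
print(f"Allocated instance: {instance_id}")
```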

View File

@ -0,0 +1,224 @@
import os
import logging
from datetime import datetime
from alibabacloud_ecs20140526.client import Client as ECSClient
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_ecs20140526 import models as ecs_models
from alibabacloud_tea_util.client import Client as UtilClient
from desktop_env.providers.base import Provider
from desktop_env.providers.aliyun.manager import (
_allocate_vm,
_wait_for_instance_running,
_wait_until_server_ready,
)
logger = logging.getLogger("desktopenv.providers.aliyun.AliyunProvider")
logger.setLevel(logging.INFO)
class AliyunProvider(Provider):
def __init__(self, **kwargs):
super().__init__(**kwargs)
self.region = os.getenv("ALIYUN_REGION", "eu-central-1")
self.client = self._create_client()
# Whether to use private IP instead of public IP. Default: enabled.
# Priority: explicit kwarg > env var ALIYUN_USE_PRIVATE_IP > default True
env_use_private = os.getenv("ALIYUN_USE_PRIVATE_IP", "1").lower() in {"1", "true", "yes", "on"}
kw_flag = kwargs.get("use_private_ip", None)
self.use_private_ip = env_use_private if kw_flag is None else bool(kw_flag)
def _create_client(self) -> ECSClient:
config = open_api_models.Config(
access_key_id=os.getenv("ALIYUN_ACCESS_KEY_ID"),
access_key_secret=os.getenv("ALIYUN_ACCESS_KEY_SECRET"),
region_id=self.region,
)
return ECSClient(config)
def start_emulator(self, path_to_vm: str, headless: bool, *args, **kwargs):
logger.info("Starting Aliyun ECS instance...")
try:
# Check the current state of the instance
response = self._describe_instance(path_to_vm)
if not response.body.instances.instance:
logger.error(f"Instance {path_to_vm} not found")
return
instance = response.body.instances.instance[0]
state = instance.status
logger.info(f"Instance {path_to_vm} current state: {state}")
if state == "Running":
# If the instance is already running, skip starting it
logger.info(
f"Instance {path_to_vm} is already running. Skipping start."
)
return
if state == "Stopped":
# Start the instance if it's currently stopped
req = ecs_models.StartInstanceRequest(instance_id=path_to_vm)
self.client.start_instance(req)
logger.info(f"Instance {path_to_vm} is starting...")
# Wait until the instance reaches 'Running' state
_wait_for_instance_running(self.client, path_to_vm)
logger.info(f"Instance {path_to_vm} is now running.")
else:
# For all other states (Pending, Starting, etc.), log a warning
logger.warning(
f"Instance {path_to_vm} is in state '{state}' and cannot be started."
)
except Exception as e:
logger.error(
f"Failed to start the Aliyun ECS instance {path_to_vm}: {str(e)}"
)
raise
def get_ip_address(self, path_to_vm: str) -> str:
logger.info("Getting Aliyun ECS instance IP address...")
try:
response = self._describe_instance(path_to_vm)
if not response.body.instances.instance:
logger.error(f"Instance {path_to_vm} not found")
return ""
instance = response.body.instances.instance[0]
# Get private and public IP addresses
private_ip = ""
public_ip = ""
if hasattr(instance, "vpc_attributes") and instance.vpc_attributes:
private_ip = (
instance.vpc_attributes.private_ip_address.ip_address[0]
if instance.vpc_attributes.private_ip_address.ip_address
else ""
)
if hasattr(instance, "public_ip_address") and instance.public_ip_address:
public_ip = (
instance.public_ip_address.ip_address[0]
if instance.public_ip_address.ip_address
else ""
)
if hasattr(instance, "eip_address") and instance.eip_address:
public_ip = instance.eip_address.ip_address or public_ip
# Select which IP to use based on configuration
ip_to_use = private_ip if (self.use_private_ip and private_ip) else public_ip
if not ip_to_use:
logger.warning("No usable IP address available (private/public both missing)")
return ""
_wait_until_server_ready(ip_to_use)
if public_ip:
vnc_url = f"http://{public_ip}:5910/vnc.html"
logger.info(f"🖥️ VNC Web Access URL: {vnc_url}")
logger.info("=" * 80)
logger.info(f"📡 Public IP: {public_ip}")
logger.info(f"🏠 Private IP: {private_ip}")
logger.info(f"🔧 Using IP: {'Private' if ip_to_use == private_ip else 'Public'} -> {ip_to_use}")
logger.info("=" * 80)
print(f"\n🌐 VNC Web Access URL: {vnc_url}")
print(
"📍 Please open the above address in the browser "
"for remote desktop access\n"
)
return ip_to_use
except Exception as e:
logger.error(
f"Failed to retrieve IP address for the instance {path_to_vm}: {str(e)}"
)
raise
def save_state(self, path_to_vm: str, snapshot_name: str):
logger.info("Saving Aliyun ECS instance state...")
try:
req = ecs_models.CreateImageRequest(
region_id=self.region,
instance_id=path_to_vm,
image_name=snapshot_name,
description=f"Snapshot created at {datetime.now().isoformat()}",
)
response = self.client.create_image(req)
image_id = response.body.image_id
logger.info(
f"Image {image_id} created successfully from instance {path_to_vm}."
)
return image_id
except Exception as e:
logger.error(
f"Failed to create image from the instance {path_to_vm}: {str(e)}"
)
raise
def revert_to_snapshot(self, path_to_vm: str, snapshot_name: str):
logger.info(
f"Reverting Aliyun ECS instance to snapshot image: {snapshot_name}..."
)
try:
# Step 1: Retrieve the original instance details
response = self._describe_instance(path_to_vm)
if not response.body.instances.instance:
logger.error(f"Instance {path_to_vm} not found")
return
# Step 2: Delete the old instance
req = ecs_models.DeleteInstancesRequest(
region_id=self.region, instance_id=[path_to_vm], force=True
)
self.client.delete_instances(req)
logger.info(f"Old instance {path_to_vm} has been deleted.")
# Step 3: Launch a new instance from the snapshot image
new_instance_id = _allocate_vm()
logger.info(f"Instance {new_instance_id} is ready.")
# Get VNC access information
self.get_ip_address(new_instance_id)
return new_instance_id
except Exception as e:
logger.error(
f"Failed to revert to snapshot {snapshot_name} for the instance {path_to_vm}: {str(e)}"
)
raise
def stop_emulator(self, path_to_vm: str, region: str = None):
logger.info(f"Stopping Aliyun ECS instance {path_to_vm}...")
try:
req = ecs_models.DeleteInstancesRequest(
region_id=self.region, instance_id=[path_to_vm], force=True
)
self.client.delete_instances(req)
logger.info(f"Instance {path_to_vm} has been deleted.")
except Exception as e:
logger.error(
f"Failed to stop the Aliyun ECS instance {path_to_vm}: {str(e)}"
)
raise
def _describe_instance(
self, instance_id: str
) -> ecs_models.DescribeInstancesResponse:
"""Get instance details"""
req = ecs_models.DescribeInstancesRequest(
region_id=self.region, instance_ids=UtilClient.to_jsonstring([instance_id])
)
return self.client.describe_instances(req)
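A minimal sketch of the provider lifecycle, assuming the instance ID came from the manager above and that the base `Provider` accepts these keyword arguments:
```python
# Minimal sketch: drive one instance through the provider lifecycle.
from desktop_env.providers.aliyun.provider import AliyunProvider

instance_id = "i-example"  # placeholder: ID returned by AliyunVMManager.get_vm_path()

provider = AliyunProvider(use_private_ip=False)      # prefer the public IP
provider.start_emulator(instance_id, headless=True)  # no-op if already running
ip = provider.get_ip_address(instance_id)            # waits until port 5000 responds
image_id = provider.save_state(instance_id, "osworld-demo-snapshot")
provider.stop_emulator(instance_id)                  # deletes the instance
```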

View File

@ -4,18 +4,62 @@
Welcome to the AWS VM Management documentation. Before you proceed with using the code to manage AWS services, please ensure the following variables are set correctly according to your AWS environment.
## Overview
The AWS cloud service architecture consists of a host machine that controls multiple virtual machines (each virtual machine serves as an OSWorld environment, for which we provide AMI images) for testing and potential training purposes. To prevent security breaches, we need to properly configure security groups for both the host machine and virtual machines, as well as configure appropriate subnets.
## Security Group Configuration
### Security Group for OSWorld Virtual Machines
OSWorld requires certain ports to be open, such as port 5000 for backend connections to OSWorld services, port 5910 for VNC visualization, port 9222 for Chrome control, etc. The `AWS_SECURITY_GROUP_ID` variable represents the security group configuration for virtual machines serving as OSWorld environments. Please complete the configuration and set this environment variable to the ID of the configured security group.
**⚠️ Important**: Please strictly follow the port settings below to prevent OSWorld tasks from failing due to connection issues:
#### Inbound Rules (8 rules required)
| Type | Protocol | Port Range | Source | Description |
|------|----------|------------|--------|-------------|
| SSH | TCP | 22 | 0.0.0.0/0 | SSH access |
| HTTP | TCP | 80 | 172.31.0.0/16 | HTTP traffic |
| Custom TCP | TCP | 5000 | 172.31.0.0/16 | OSWorld backend service |
| Custom TCP | TCP | 5910 | 0.0.0.0/0 | NoVNC visualization port |
| Custom TCP | TCP | 8006 | 172.31.0.0/16 | VNC service port |
| Custom TCP | TCP | 8080 | 172.31.0.0/16 | VLC service port |
| Custom TCP | TCP | 8081 | 172.31.0.0/16 | Additional service port |
| Custom TCP | TCP | 9222 | 172.31.0.0/16 | Chrome control port |
#### Outbound Rules (1 rule required)
| Type | Protocol | Port Range | Destination | Description |
|------|----------|------------|-------------|-------------|
| All traffic | All | All | 0.0.0.0/0 | Allow all outbound traffic |
### Host Machine Security Group Configuration
Configure according to your specific requirements. This project provides a monitor service that runs on port 8080 by default. You need to open this port to use this functionality.
## VPC Configuration
To isolate the entire evaluation stack, we run both the host machine and all client virtual machines inside a dedicated VPC. The setup is straightforward:
1. Launch the host instance via the AWS console and note the **VPC ID** and **Subnet ID** shown in its network settings.
2. Export the same **Subnet ID** as the environment variable `AWS_SUBNET_ID` before starting the client code.
```bash
export AWS_SUBNET_ID=subnet-xxxxxxxxxxxxxxxxx
```
(Both the client and host must reside in this subnet for the evaluation to work.)
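If you want to double-check that the exported subnet exists, and note which VPC it belongs to, a short boto3 sketch:
```python
# Minimal sketch: look up AWS_SUBNET_ID and print its VPC and availability zone.
import os
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # adjust to your region
subnet = ec2.describe_subnets(SubnetIds=[os.environ["AWS_SUBNET_ID"]])["Subnets"][0]
print(subnet["VpcId"], subnet["AvailabilityZone"])  # VpcId should match the host's VPC
```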
## Configuration Variables
That's essentially all the setup you need to perform. From here on, you only have to supply a few extra details and environment variables; just make sure they're all present in your environment.
You need to assign values to several variables crucial for the operation of these scripts on AWS:
- **`REGISTRY_PATH`**: Sets the file path for VM registration logging.
- Example: `'.aws_vms'`
- **`DEFAULT_REGION`**: Default AWS region where your instances will be launched.
- Example: `"us-east-1"`
- **`IMAGE_ID_MAP`**: Dictionary mapping regions to the specific AMI IDs used for instance creation. The AMI IDs here are already set to the official OSWorld Ubuntu images that we provide.
- Formatted as follows:
```python
IMAGE_ID_MAP = {
"us-east-1": "ami-00674d875de9addc1"
"us-east-1": "ami-0d23263edb96951d8"
# Add other regions and corresponding AMIs
}
```
@ -32,7 +76,6 @@ You need to assign values to several variables crucial for the operation of thes
AWS_SECURITY_GROUP_ID=sg-xxxx
```
### AWS CLI Configuration
Before using these scripts, you must configure your AWS CLI with your credentials. This can be done via the following commands:

View File

@ -0,0 +1,21 @@
import os
# Default TTL minutes for instance auto-termination (cloud-side scheduler)
# Can be overridden via environment variable DEFAULT_TTL_MINUTES
DEFAULT_TTL_MINUTES: int = int(os.getenv("DEFAULT_TTL_MINUTES", "180"))
# Master switch for TTL feature
ENABLE_TTL: bool = os.getenv("ENABLE_TTL", "true").lower() == "true"
# EventBridge Scheduler role ARN for scheduling EC2 termination
AWS_SCHEDULER_ROLE_ARN: str = os.getenv("AWS_SCHEDULER_ROLE_ARN", "").strip()
def compute_ttl_seconds(ttl_minutes: int) -> int:
try:
return max(0, int(ttl_minutes) * 60)
except Exception:
return 0
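A usage sketch of this config module: with the defaults above, TTL is enabled and instances are scheduled for termination after three hours:
```python
# Usage sketch: read the resolved TTL settings (defaults: enabled, 180 minutes).
from desktop_env.providers.aws.config import (
    ENABLE_TTL,
    DEFAULT_TTL_MINUTES,
    AWS_SCHEDULER_ROLE_ARN,
    compute_ttl_seconds,
)

if ENABLE_TTL:
    print(compute_ttl_seconds(DEFAULT_TTL_MINUTES))  # 10800 seconds by default
    print(AWS_SCHEDULER_ROLE_ARN or "<role ARN will be resolved or auto-created>")
```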

View File

@ -1,10 +1,13 @@
import os
from filelock import FileLock
import boto3
import psutil
import logging
import dotenv
import signal
from datetime import datetime, timedelta, timezone
# TTL configuration
from desktop_env.providers.aws.config import ENABLE_TTL, DEFAULT_TTL_MINUTES, AWS_SCHEDULER_ROLE_ARN
from desktop_env.providers.aws.scheduler_utils import schedule_instance_termination
INSTANCE_TYPE = "t3.xlarge"
@ -36,15 +39,25 @@ DEFAULT_REGION = "us-east-1"
# todo: Add doc for the configuration of image, security group and network interface
# todo: public the AMI images
IMAGE_ID_MAP = {
"us-east-1": "ami-09138bff939f82bd8",
"ap-east-1": "ami-0c092a5b8be4116f5",
"us-east-1": {
(1920, 1080): "ami-0d23263edb96951d8",
# For CoACT-1, uncomment to use the following AMI
# (1920, 1080): "ami-0b505e9d0d99ba88c"
},
"ap-east-1": {
(1920, 1080): "ami-06850864d18fad836"
# Please transfer AMI by yourself from AWS us-east-1 for CoACT-1
}
}
def _allocate_vm(region=DEFAULT_REGION):
def _allocate_vm(region=DEFAULT_REGION, screen_size=(1920, 1080)):
if region not in IMAGE_ID_MAP:
raise ValueError(f"Region {region} is not supported. Supported regions are: {list(IMAGE_ID_MAP.keys())}")
if screen_size not in IMAGE_ID_MAP[region]:
raise ValueError(f"Screen size {screen_size} not supported for region {region}. Supported: {list(IMAGE_ID_MAP[region].keys())}")
ami_id = IMAGE_ID_MAP[region][screen_size]
ec2_client = boto3.client('ec2', region_name=region)
instance_id = None
@ -83,12 +96,20 @@ def _allocate_vm(region=DEFAULT_REGION):
if not os.getenv('AWS_SUBNET_ID'):
raise ValueError("AWS_SUBNET_ID is not set in the environment variables.")
# TTL configuration (cloud-init removed; use cloud-side scheduler only)
ttl_enabled = ENABLE_TTL
ttl_minutes = DEFAULT_TTL_MINUTES
ttl_seconds = max(0, int(ttl_minutes) * 60)
eta_utc = datetime.now(timezone.utc) + timedelta(seconds=ttl_seconds)
logger.info(f"TTL config: minutes={ttl_minutes}, seconds={ttl_seconds}, ETA(UTC)={eta_utc.isoformat()}")
run_instances_params = {
"MaxCount": 1,
"MinCount": 1,
"ImageId": IMAGE_ID_MAP[region],
"ImageId": ami_id,
"InstanceType": INSTANCE_TYPE,
"EbsOptimized": True,
"InstanceInitiatedShutdownBehavior": "terminate",
"NetworkInterfaces": [
{
"SubnetId": os.getenv('AWS_SUBNET_ID'),
@ -115,13 +136,20 @@ def _allocate_vm(region=DEFAULT_REGION):
response = ec2_client.run_instances(**run_instances_params)
instance_id = response['Instances'][0]['InstanceId']
# Create TTL schedule immediately after instance is created, to survive early interruptions
try:
# When TTL is enabled, the helper resolves the role ARN via env or role name
if ttl_enabled:
schedule_instance_termination(region, instance_id, ttl_seconds, AWS_SCHEDULER_ROLE_ARN, logger)
except Exception as e:
logger.warning(f"Failed to create EventBridge Scheduler for {instance_id}: {e}")
waiter = ec2_client.get_waiter('instance_running')
logger.info(f"Waiting for instance {instance_id} to be running...")
waiter.wait(InstanceIds=[instance_id])
logger.info(f"Instance {instance_id} is ready.")
# Retrieve and display the VNC access URL
try:
instance_details = ec2_client.describe_instances(InstanceIds=[instance_id])
instance = instance_details['Reservations'][0]['Instances'][0]
@ -133,8 +161,8 @@ def _allocate_vm(region=DEFAULT_REGION):
logger.info(f"📡 Public IP: {public_ip}")
logger.info(f"🆔 Instance ID: {instance_id}")
logger.info("="*80)
print(f"\n🌐 VNC访问地址: {vnc_url}")
print(f"📍 请在浏览器中打开上述地址进行远程桌面访问\n")
print(f"\n🌐 VNC Web Access URL: {vnc_url}")
print(f"📍 Please open the above address in the browser for remote desktop access\n")
except Exception as e:
logger.warning(f"Failed to get VNC address for instance {instance_id}: {e}")
except KeyboardInterrupt:
@ -157,60 +185,6 @@ def _allocate_vm(region=DEFAULT_REGION):
return instance_id
def _allocate_vm_with_proxy(region=DEFAULT_REGION, proxy_config_file=None):
"""Allocate a VM with proxy configuration"""
if not PROXY_SUPPORT_AVAILABLE:
logger.warning("Proxy support not available, falling back to regular VM allocation")
return _allocate_vm(region)
from desktop_env.providers.aws.provider_with_proxy import AWSProviderWithProxy
# Initialize proxy pool if needed
if proxy_config_file:
init_proxy_pool(proxy_config_file)
# Get current proxy
proxy_pool = get_global_proxy_pool()
current_proxy = proxy_pool.get_next_proxy()
if current_proxy:
logger.info(f"Allocating VM with proxy: {current_proxy.host}:{current_proxy.port}")
# Create provider instance
provider = AWSProviderWithProxy(region=region, proxy_config_file=proxy_config_file)
# Create new instance
instance_id = provider.create_instance_with_proxy(
image_id=IMAGE_ID_MAP[region],
instance_type=INSTANCE_TYPE,
security_groups=[os.getenv('AWS_SECURITY_GROUP_ID')],
subnet_id=os.getenv('AWS_SUBNET_ID')
)
try:
ec2_client = boto3.client('ec2', region_name=region)
instance_details = ec2_client.describe_instances(InstanceIds=[instance_id])
instance = instance_details['Reservations'][0]['Instances'][0]
public_ip = instance.get('PublicIpAddress', '')
if public_ip:
vnc_url = f"http://{public_ip}:5910/vnc.html"
logger.info("="*80)
logger.info(f"🖥️ VNC Web Access URL: {vnc_url}")
logger.info(f"📡 Public IP: {public_ip}")
logger.info(f"🆔 Instance ID: {instance_id}")
if current_proxy:
logger.info(f"🌐 Proxy: {current_proxy.host}:{current_proxy.port}")
logger.info("="*80)
print(f"\n🌐 VNC Web Access URL: {vnc_url}")
if current_proxy:
print(f"🔄 Current Proxy: {current_proxy.host}:{current_proxy.port}")
print(f"📍 Please open the above address in the browser for remote desktop access\n")
except Exception as e:
logger.warning(f"Failed to get VNC address for proxy instance {instance_id}: {e}")
return instance_id
class AWSVMManager(VMManager):
"""
AWS VM Manager for managing virtual machines on AWS.
@ -218,15 +192,9 @@ class AWSVMManager(VMManager):
AWS does not need to maintain a registry of VMs, as it can dynamically allocate and deallocate VMs.
This class supports both regular VM allocation and proxy-enabled VM allocation.
"""
def __init__(self, proxy_config_file=None, **kwargs):
self.proxy_config_file = proxy_config_file
def __init__(self, **kwargs):
# self.lock = FileLock(".aws_lck", timeout=60)
self.initialize_registry()
# Initialize proxy pool if proxy configuration is provided
if proxy_config_file and PROXY_SUPPORT_AVAILABLE:
init_proxy_pool(proxy_config_file)
logger.info(f"Proxy pool initialized with config: {proxy_config_file}")
def initialize_registry(self, **kwargs):
pass
@ -261,11 +229,7 @@ class AWSVMManager(VMManager):
def _list_free_vms(self, region=DEFAULT_REGION):
pass
def get_vm_path(self, region=DEFAULT_REGION, **kwargs):
if self.proxy_config_file:
logger.info("Allocating a new VM with proxy configuration in region: {}".format(region))
new_vm_path = _allocate_vm_with_proxy(region, self.proxy_config_file)
else:
logger.info("Allocating a new VM in region: {}".format(region))
new_vm_path = _allocate_vm(region)
def get_vm_path(self, region=DEFAULT_REGION, screen_size=(1920, 1080), **kwargs):
logger.info("Allocating a new VM in region: {}".format(region))
new_vm_path = _allocate_vm(region, screen_size=screen_size)
return new_vm_path
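With the proxy path removed, allocation reduces to a single call; a minimal usage sketch (assumes AWS credentials plus `AWS_SUBNET_ID` and `AWS_SECURITY_GROUP_ID` are configured):
```python
# Minimal sketch: allocate an OSWorld EC2 instance for a supported screen size.
from desktop_env.providers.aws.manager import AWSVMManager

manager = AWSVMManager()
instance_id = manager.get_vm_path(region="us-east-1", screen_size=(1920, 1080))
print(f"Allocated instance: {instance_id}")
```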

View File

@ -2,11 +2,14 @@ import boto3
from botocore.exceptions import ClientError
import logging
from desktop_env.providers.base import Provider
from datetime import datetime
import os
import time
from datetime import datetime, timedelta, timezone
from desktop_env.providers.base import Provider
# TTL configuration
from desktop_env.providers.aws.config import ENABLE_TTL, DEFAULT_TTL_MINUTES, AWS_SCHEDULER_ROLE_ARN
from desktop_env.providers.aws.scheduler_utils import schedule_instance_termination
logger = logging.getLogger("desktopenv.providers.aws.AWSProvider")
logger.setLevel(logging.INFO)
@ -78,6 +81,7 @@ class AWSProvider(Provider):
logger.warning("No public IP address available for VNC access")
return private_ip_address
# return public_ip_address
return '' # Return an empty string if no IP address is found
except ClientError as e:
logger.error(f"Failed to retrieve IP address for the instance {path_to_vm}: {str(e)}")
@ -104,23 +108,68 @@ class AWSProvider(Provider):
# Step 1: Retrieve the original instance details
instance_details = ec2_client.describe_instances(InstanceIds=[path_to_vm])
instance = instance_details['Reservations'][0]['Instances'][0]
security_groups = [sg['GroupId'] for sg in instance['SecurityGroups']]
subnet_id = instance['SubnetId']
instance_type = instance['InstanceType']
# Resolve security groups with fallbacks
security_groups = [sg['GroupId'] for sg in instance.get('SecurityGroups', []) if 'GroupId' in sg]
if not security_groups:
env_sg = os.getenv('AWS_SECURITY_GROUP_ID')
if env_sg:
security_groups = [env_sg]
logger.info("SecurityGroups missing on instance; using AWS_SECURITY_GROUP_ID from env")
else:
raise ValueError("No security groups found on instance and AWS_SECURITY_GROUP_ID not set")
# Resolve subnet with fallbacks
subnet_id = instance.get('SubnetId')
if not subnet_id:
nis = instance.get('NetworkInterfaces', []) or []
if nis and isinstance(nis, list):
for ni in nis:
if isinstance(ni, dict) and ni.get('SubnetId'):
subnet_id = ni.get('SubnetId')
break
if not subnet_id:
env_subnet = os.getenv('AWS_SUBNET_ID')
if env_subnet:
subnet_id = env_subnet
logger.info("SubnetId missing on instance; using AWS_SUBNET_ID from env")
else:
raise ValueError("SubnetId not available on instance, NetworkInterfaces, or environment")
# Resolve instance type with fallbacks
instance_type = instance.get('InstanceType') or os.getenv('AWS_INSTANCE_TYPE') or 't3.large'
if instance.get('InstanceType') is None:
logger.info(f"InstanceType missing on instance; using '{instance_type}' from env/default")
# Step 2: Terminate the old instance
ec2_client.terminate_instances(InstanceIds=[path_to_vm])
logger.info(f"Old instance {path_to_vm} has been terminated.")
# Step 2: Terminate the old instance (skip if already terminated/shutting-down)
state = (instance.get('State') or {}).get('Name')
if state in ['shutting-down', 'terminated']:
logger.info(f"Old instance {path_to_vm} is already in state '{state}', skipping termination.")
else:
try:
ec2_client.terminate_instances(InstanceIds=[path_to_vm])
logger.info(f"Old instance {path_to_vm} has been terminated.")
except ClientError as e:
error_code = e.response.get('Error', {}).get('Code') if hasattr(e, 'response') else None
if error_code in ['InvalidInstanceID.NotFound', 'IncorrectInstanceState']:
logger.info(f"Ignore termination error for {path_to_vm}: {error_code}")
else:
raise
# Step 3: Launch a new instance from the snapshot (AMI) with performance optimization
logger.info(f"Launching a new instance from AMI {snapshot_name}...")
# TTL configuration follows the same env flags as allocation (centralized)
enable_ttl = ENABLE_TTL
default_ttl_minutes = DEFAULT_TTL_MINUTES
ttl_seconds = max(0, default_ttl_minutes * 60)
run_instances_params = {
"MaxCount": 1,
"MinCount": 1,
"ImageId": snapshot_name,
"InstanceType": instance_type,
"EbsOptimized": True,
"InstanceInitiatedShutdownBehavior": "terminate",
"NetworkInterfaces": [
{
"SubnetId": subnet_id,
@ -150,7 +199,40 @@ class AWSProvider(Provider):
ec2_client.get_waiter('instance_running').wait(InstanceIds=[new_instance_id])
logger.info(f"Instance {new_instance_id} is ready.")
# Schedule cloud-side termination via EventBridge Scheduler (auto-resolve role ARN)
try:
if enable_ttl:
schedule_instance_termination(self.region, new_instance_id, ttl_seconds, AWS_SCHEDULER_ROLE_ARN, logger)
except Exception as e:
logger.warning(f"Failed to create EventBridge Scheduler for {new_instance_id}: {e}")
# Schedule cloud-side termination via EventBridge Scheduler (same as allocation path)
try:
if enable_ttl and os.getenv('AWS_SCHEDULER_ROLE_ARN'):
scheduler_client = boto3.client('scheduler', region_name=self.region)
schedule_name = f"osworld-ttl-{new_instance_id}-{int(time.time())}"
eta_scheduler = datetime.now(timezone.utc) + timedelta(seconds=ttl_seconds)
schedule_expression = f"at({eta_scheduler.strftime('%Y-%m-%dT%H:%M:%S')})"
target_arn = "arn:aws:scheduler:::aws-sdk:ec2:terminateInstances"
input_payload = '{"InstanceIds":["' + new_instance_id + '"]}'
scheduler_client.create_schedule(
Name=schedule_name,
ScheduleExpression=schedule_expression,
FlexibleTimeWindow={"Mode": "OFF"},
Target={
"Arn": target_arn,
"RoleArn": os.getenv('AWS_SCHEDULER_ROLE_ARN'),
"Input": input_payload
},
State='ENABLED',
Description=f"OSWorld TTL terminate for {new_instance_id}"
)
logger.info(f"Scheduled EC2 termination via EventBridge Scheduler for snapshot revert: name={schedule_name}, when={eta_scheduler.isoformat()} (UTC)")
else:
logger.info("TTL enabled but AWS_SCHEDULER_ROLE_ARN not set; skipping scheduler for snapshot revert.")
except Exception as e:
logger.warning(f"Failed to create EventBridge Scheduler for {new_instance_id}: {e}")
try:
instance_details = ec2_client.describe_instances(InstanceIds=[new_instance_id])
instance = instance_details['Reservations'][0]['Instances'][0]

View File

@ -1,315 +0,0 @@
import boto3
from botocore.exceptions import ClientError
import base64
import logging
import json
from typing import Optional
from desktop_env.providers.base import Provider
from desktop_env.providers.aws.proxy_pool import get_global_proxy_pool, init_proxy_pool, ProxyInfo
logger = logging.getLogger("desktopenv.providers.aws.AWSProviderWithProxy")
logger.setLevel(logging.INFO)
WAIT_DELAY = 15
MAX_ATTEMPTS = 10
class AWSProviderWithProxy(Provider):
def __init__(self, region: str = None, proxy_config_file: str = None):
super().__init__(region)
self.current_proxy: Optional[ProxyInfo] = None
# Initialize the proxy pool
if proxy_config_file:
init_proxy_pool(proxy_config_file)
logger.info(f"Initialized proxy pool from {proxy_config_file}")
# Get the next available proxy
self._rotate_proxy()
def _rotate_proxy(self):
"""轮换到下一个可用代理"""
proxy_pool = get_global_proxy_pool()
self.current_proxy = proxy_pool.get_next_proxy()
if self.current_proxy:
logger.info(f"Switched to proxy: {self.current_proxy.host}:{self.current_proxy.port}")
else:
logger.warning("No proxy available, using direct connection")
def _generate_proxy_user_data(self) -> str:
"""生成包含代理配置的user data脚本"""
if not self.current_proxy:
return ""
proxy_url = self._format_proxy_url(self.current_proxy)
user_data_script = f"""#!/bin/bash
# Configure system proxy
echo 'export http_proxy={proxy_url}' >> /etc/environment
echo 'export https_proxy={proxy_url}' >> /etc/environment
echo 'export HTTP_PROXY={proxy_url}' >> /etc/environment
echo 'export HTTPS_PROXY={proxy_url}' >> /etc/environment
# Configure apt proxy
cat > /etc/apt/apt.conf.d/95proxy << EOF
Acquire::http::Proxy "{proxy_url}";
Acquire::https::Proxy "{proxy_url}";
EOF
# Configure chrome/chromium proxy
mkdir -p /etc/opt/chrome/policies/managed
cat > /etc/opt/chrome/policies/managed/proxy.json << EOF
{{
"ProxyMode": "fixed_servers",
"ProxyServer": "{self.current_proxy.host}:{self.current_proxy.port}"
}}
EOF
# Configure chromium proxy (Ubuntu default)
mkdir -p /etc/chromium/policies/managed
cat > /etc/chromium/policies/managed/proxy.json << EOF
{{
"ProxyMode": "fixed_servers",
"ProxyServer": "{self.current_proxy.host}:{self.current_proxy.port}"
}}
EOF
# Configure firefox proxy - support multiple possible paths
for firefox_dir in /etc/firefox/policies /usr/lib/firefox/distribution/policies /etc/firefox-esr/policies; do
if [ -d "$(dirname "$firefox_dir")" ]; then
mkdir -p "$firefox_dir"
cat > "$firefox_dir/policies.json" << EOF
{{
"policies": {{
"Proxy": {{
"Mode": "manual",
"HTTPProxy": "{self.current_proxy.host}:{self.current_proxy.port}",
"HTTPSProxy": "{self.current_proxy.host}:{self.current_proxy.port}",
"UseHTTPProxyForAllProtocols": true
}}
}}
}}
EOF
break
fi
done
# Reload environment variables
source /etc/environment
# Log proxy configuration
echo "$(date): Configured proxy {self.current_proxy.host}:{self.current_proxy.port}" >> /var/log/proxy-setup.log
"""
return base64.b64encode(user_data_script.encode()).decode()
def _format_proxy_url(self, proxy: ProxyInfo) -> str:
"""格式化代理URL"""
if proxy.username and proxy.password:
return f"{proxy.protocol}://{proxy.username}:{proxy.password}@{proxy.host}:{proxy.port}"
else:
return f"{proxy.protocol}://{proxy.host}:{proxy.port}"
def start_emulator(self, path_to_vm: str, headless: bool, *args, **kwargs):
logger.info("Starting AWS VM with proxy configuration...")
ec2_client = boto3.client('ec2', region_name=self.region)
try:
# If the instance already exists, start it directly
ec2_client.start_instances(InstanceIds=[path_to_vm])
logger.info(f"Instance {path_to_vm} is starting...")
# Wait for the instance to be in the 'running' state
waiter = ec2_client.get_waiter('instance_running')
waiter.wait(InstanceIds=[path_to_vm], WaiterConfig={'Delay': WAIT_DELAY, 'MaxAttempts': MAX_ATTEMPTS})
logger.info(f"Instance {path_to_vm} is now running.")
except ClientError as e:
logger.error(f"Failed to start the AWS VM {path_to_vm}: {str(e)}")
raise
def create_instance_with_proxy(self, image_id: str, instance_type: str,
security_groups: list, subnet_id: str) -> str:
"""创建带有代理配置的新实例"""
ec2_client = boto3.client('ec2', region_name=self.region)
user_data = self._generate_proxy_user_data()
run_instances_params = {
"MaxCount": 1,
"MinCount": 1,
"ImageId": image_id,
"InstanceType": instance_type,
"EbsOptimized": True,
"NetworkInterfaces": [
{
"SubnetId": subnet_id,
"AssociatePublicIpAddress": True,
"DeviceIndex": 0,
"Groups": security_groups
}
]
}
if user_data:
run_instances_params["UserData"] = user_data
try:
response = ec2_client.run_instances(**run_instances_params)
instance_id = response['Instances'][0]['InstanceId']
logger.info(f"Created new instance {instance_id} with proxy configuration")
logger.info(f"Waiting for instance {instance_id} to be running...")
ec2_client.get_waiter('instance_running').wait(InstanceIds=[instance_id])
logger.info(f"Instance {instance_id} is ready.")
try:
instance_details = ec2_client.describe_instances(InstanceIds=[instance_id])
instance = instance_details['Reservations'][0]['Instances'][0]
public_ip = instance.get('PublicIpAddress', '')
if public_ip:
vnc_url = f"http://{public_ip}:5910/vnc.html"
logger.info("="*80)
logger.info(f"🖥️ VNC Web Access URL: {vnc_url}")
logger.info(f"📡 Public IP: {public_ip}")
logger.info(f"🆔 Instance ID: {instance_id}")
if self.current_proxy:
logger.info(f"🌐 Proxy: {self.current_proxy.host}:{self.current_proxy.port}")
logger.info("="*80)
print(f"\n🌐 VNC Web Access URL: {vnc_url}")
if self.current_proxy:
print(f"🔄 Current Proxy: {self.current_proxy.host}:{self.current_proxy.port}")
print(f"📍 Please open the above address in the browser for remote desktop access\n")
except Exception as e:
logger.warning(f"Failed to get VNC address for instance {instance_id}: {e}")
return instance_id
except ClientError as e:
logger.error(f"Failed to create instance with proxy: {str(e)}")
if self.current_proxy:
proxy_pool = get_global_proxy_pool()
proxy_pool.mark_proxy_failed(self.current_proxy)
self._rotate_proxy()
raise
def get_ip_address(self, path_to_vm: str) -> str:
logger.info("Getting AWS VM IP address...")
ec2_client = boto3.client('ec2', region_name=self.region)
try:
response = ec2_client.describe_instances(InstanceIds=[path_to_vm])
for reservation in response['Reservations']:
for instance in reservation['Instances']:
private_ip_address = instance.get('PrivateIpAddress', '')
public_ip_address = instance.get('PublicIpAddress', '')
if public_ip_address:
vnc_url = f"http://{public_ip_address}:5910/vnc.html"
logger.info("="*80)
logger.info(f"🖥️ VNC Web Access URL: {vnc_url}")
logger.info(f"📡 Public IP: {public_ip_address}")
logger.info(f"🏠 Private IP: {private_ip_address}")
if self.current_proxy:
logger.info(f"🌐 Proxy: {self.current_proxy.host}:{self.current_proxy.port}")
logger.info("="*80)
print(f"\n🌐 VNC Web Access URL: {vnc_url}")
if self.current_proxy:
print(f"🔄 Current Proxy: {self.current_proxy.host}:{self.current_proxy.port}")
print(f"📍 Please open the above address in the browser for remote desktop access\n")
else:
logger.warning("No public IP address available for VNC access")
return private_ip_address
return ''
except ClientError as e:
logger.error(f"Failed to retrieve IP address for the instance {path_to_vm}: {str(e)}")
raise
def save_state(self, path_to_vm: str, snapshot_name: str):
logger.info("Saving AWS VM state...")
ec2_client = boto3.client('ec2', region_name=self.region)
try:
image_response = ec2_client.create_image(InstanceId=path_to_vm, Name=snapshot_name)
image_id = image_response['ImageId']
logger.info(f"AMI {image_id} created successfully from instance {path_to_vm}.")
return image_id
except ClientError as e:
logger.error(f"Failed to create AMI from the instance {path_to_vm}: {str(e)}")
raise
def revert_to_snapshot(self, path_to_vm: str, snapshot_name: str):
logger.info(f"Reverting AWS VM to snapshot: {snapshot_name}...")
ec2_client = boto3.client('ec2', region_name=self.region)
try:
# Get original instance details for config.
instance_details = ec2_client.describe_instances(InstanceIds=[path_to_vm])
instance = instance_details['Reservations'][0]['Instances'][0]
security_groups = [sg['GroupId'] for sg in instance['SecurityGroups']]
subnet_id = instance['SubnetId']
instance_type = instance['InstanceType']
# Terminate the old instance. This is a non-blocking call.
logger.info(f"Initiating termination for old instance {path_to_vm}...")
ec2_client.terminate_instances(InstanceIds=[path_to_vm])
logger.info(f"Old instance {path_to_vm} termination initiated.")
# Rotate to a new proxy
self._rotate_proxy()
# Create a new instance
new_instance_id = self.create_instance_with_proxy(
snapshot_name, instance_type, security_groups, subnet_id
)
# Note: VNC address is displayed within create_instance_with_proxy
logger.info(f"Successfully launched new instance {new_instance_id} for revert.")
return new_instance_id
except ClientError as e:
logger.error(f"Failed to revert to snapshot {snapshot_name} for the instance {path_to_vm}: {str(e)}")
raise
def stop_emulator(self, path_to_vm, region=None):
logger.info(f"Stopping AWS VM {path_to_vm}...")
ec2_client = boto3.client('ec2', region_name=self.region)
try:
ec2_client.stop_instances(InstanceIds=[path_to_vm])
waiter = ec2_client.get_waiter('instance_stopped')
waiter.wait(InstanceIds=[path_to_vm], WaiterConfig={'Delay': WAIT_DELAY, 'MaxAttempts': MAX_ATTEMPTS})
logger.info(f"Instance {path_to_vm} has been stopped.")
except ClientError as e:
logger.error(f"Failed to stop the AWS VM {path_to_vm}: {str(e)}")
raise
def get_current_proxy_info(self) -> Optional[dict]:
"""获取当前代理信息"""
if self.current_proxy:
return {
'host': self.current_proxy.host,
'port': self.current_proxy.port,
'protocol': self.current_proxy.protocol,
'failed_count': self.current_proxy.failed_count
}
return None
def force_rotate_proxy(self):
"""强制轮换代理"""
logger.info("Force rotating proxy...")
if self.current_proxy:
proxy_pool = get_global_proxy_pool()
proxy_pool.mark_proxy_failed(self.current_proxy)
self._rotate_proxy()
def get_proxy_stats(self) -> dict:
"""获取代理池统计信息"""
proxy_pool = get_global_proxy_pool()
return proxy_pool.get_stats()

View File

@ -33,7 +33,7 @@ class ProxyPool:
self.load_proxies_from_file(config_file)
def load_proxies_from_file(self, config_file: str):
"""从配置文件加载代理列表"""
"""Load proxy list from config file"""
try:
with open(config_file, 'r') as f:
proxy_configs = json.load(f)
@ -54,7 +54,7 @@ class ProxyPool:
def add_proxy(self, host: str, port: int, username: str = None,
password: str = None, protocol: str = "http"):
"""添加代理到池中"""
"""Add proxy to pool"""
proxy = ProxyInfo(host=host, port=port, username=username,
password=password, protocol=protocol)
with self.lock:
@ -62,19 +62,19 @@ class ProxyPool:
logger.info(f"Added proxy {host}:{port}")
def get_next_proxy(self) -> Optional[ProxyInfo]:
"""获取下一个可用的代理"""
"""Get next available proxy"""
with self.lock:
if not self.proxies:
return None
# 过滤掉失败次数过多的代理
# Filter out proxies with too many failures
active_proxies = [p for p in self.proxies if self._is_proxy_available(p)]
if not active_proxies:
logger.warning("No active proxies available")
return None
# 轮询选择代理
# Round-robin selection of proxy
proxy = active_proxies[self.current_index % len(active_proxies)]
self.current_index += 1
proxy.last_used = time.time()
@ -82,22 +82,22 @@ class ProxyPool:
return proxy
def _is_proxy_available(self, proxy: ProxyInfo) -> bool:
"""检查代理是否可用"""
"""Check if proxy is available"""
if not proxy.is_active:
return False
if proxy.failed_count >= self.max_failures:
# 检查是否过了冷却时间
# Check if cooldown time has passed
if time.time() - proxy.last_used < self.cooldown_time:
return False
else:
# 重置失败计数
# Reset failure count
proxy.failed_count = 0
return True
def mark_proxy_failed(self, proxy: ProxyInfo):
"""标记代理失败"""
"""Mark proxy as failed"""
with self.lock:
proxy.failed_count += 1
if proxy.failed_count >= self.max_failures:
@ -105,13 +105,13 @@ class ProxyPool:
f"(failures: {proxy.failed_count})")
def mark_proxy_success(self, proxy: ProxyInfo):
"""标记代理成功"""
"""Mark proxy as successful"""
with self.lock:
proxy.failed_count = 0
def test_proxy(self, proxy: ProxyInfo, test_url: str = "http://httpbin.org/ip",
timeout: int = 10) -> bool:
"""测试代理是否正常工作"""
"""Test if proxy is working"""
try:
proxy_url = self._format_proxy_url(proxy)
proxies = {
@ -133,14 +133,14 @@ class ProxyPool:
return False
def _format_proxy_url(self, proxy: ProxyInfo) -> str:
"""格式化代理URL"""
"""Format proxy URL"""
if proxy.username and proxy.password:
return f"{proxy.protocol}://{proxy.username}:{proxy.password}@{proxy.host}:{proxy.port}"
else:
return f"{proxy.protocol}://{proxy.host}:{proxy.port}"
def get_proxy_dict(self, proxy: ProxyInfo) -> Dict[str, str]:
"""获取requests库使用的代理字典"""
"""Get proxy dictionary for requests library"""
proxy_url = self._format_proxy_url(proxy)
return {
'http': proxy_url,
@ -148,7 +148,7 @@ class ProxyPool:
}
def test_all_proxies(self, test_url: str = "http://httpbin.org/ip"):
"""测试所有代理"""
"""Test all proxies"""
logger.info("Testing all proxies...")
working_count = 0
@ -163,7 +163,7 @@ class ProxyPool:
return working_count
def get_stats(self) -> Dict:
"""获取代理池统计信息"""
"""Get proxy pool statistics"""
with self.lock:
total = len(self.proxies)
active = len([p for p in self.proxies if self._is_proxy_available(p)])
@ -176,18 +176,18 @@ class ProxyPool:
'success_rate': active / total if total > 0 else 0
}
# 全局代理池实例
# Global proxy pool instance
_proxy_pool = None
def get_global_proxy_pool() -> ProxyPool:
"""获取全局代理池实例"""
"""Get global proxy pool instance"""
global _proxy_pool
if _proxy_pool is None:
_proxy_pool = ProxyPool()
return _proxy_pool
def init_proxy_pool(config_file: str = None):
"""初始化全局代理池"""
"""Initialize global proxy pool"""
global _proxy_pool
_proxy_pool = ProxyPool(config_file)
return _proxy_pool

View File

@ -0,0 +1,153 @@
import os
import time
import json
from datetime import datetime, timedelta, timezone
import boto3
from botocore.exceptions import ClientError
def _resolve_scheduler_role_arn(logger) -> str:
# 1) Explicit env takes precedence
role_arn = os.getenv('AWS_SCHEDULER_ROLE_ARN', '').strip()
if role_arn:
return role_arn
# 2) Derive from role name + account id
role_name = os.getenv('AWS_SCHEDULER_ROLE_NAME', 'osworld-scheduler-ec2-terminate').strip()
try:
sts = boto3.client('sts')
account_id = sts.get_caller_identity()['Account']
derived_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
iam = boto3.client('iam')
try:
role = iam.get_role(RoleName=role_name)["Role"]
except ClientError:
auto_create = os.getenv('AWS_AUTO_CREATE_SCHEDULER_ROLE', 'true').lower() == 'true'
if not auto_create:
logger.warning(f"Scheduler role '{role_name}' not found and auto-create disabled.")
return ''
try:
trust_policy = {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {"Service": "scheduler.amazonaws.com"},
"Action": "sts:AssumeRole"
}
]
}
iam.create_role(RoleName=role_name, AssumeRolePolicyDocument=json.dumps(trust_policy))
role = iam.get_role(RoleName=role_name)["Role"]
except ClientError as ce:
# If another process created it, fetch again
try:
role = iam.get_role(RoleName=role_name)["Role"]
except ClientError:
logger.warning(f"Failed to auto-create scheduler role '{role_name}': {ce}")
return ''
# Ensure trust policy allows scheduler.amazonaws.com
assume_doc = role.get("AssumeRolePolicyDocument", {})
principal_ok = False
try:
for stmt in assume_doc.get("Statement", []):
principal = stmt.get("Principal", {})
svc = principal.get("Service")
if isinstance(svc, str) and svc == "scheduler.amazonaws.com":
principal_ok = True
break
if isinstance(svc, list) and "scheduler.amazonaws.com" in svc:
principal_ok = True
break
except Exception:
principal_ok = False
if not principal_ok:
trust_policy = {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {"Service": "scheduler.amazonaws.com"},
"Action": "sts:AssumeRole"
}
]
}
iam.update_assume_role_policy(RoleName=role_name, PolicyDocument=json.dumps(trust_policy))
# Ensure minimal inline policy exists
inline_name = f"{role_name}-inline"
need_policy = False
try:
iam.get_role_policy(RoleName=role_name, PolicyName=inline_name)
except ClientError:
need_policy = True
if need_policy:
inline_policy = {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["ec2:TerminateInstances", "ec2:DescribeInstances"],
"Resource": "*"
}
]
}
iam.put_role_policy(RoleName=role_name, PolicyName=inline_name, PolicyDocument=json.dumps(inline_policy))
# Wait for IAM propagation
time.sleep(8)
logger.info(f"Derived AWS_SCHEDULER_ROLE_ARN={derived_arn} from role name '{role_name}'")
return derived_arn
except Exception as e:
logger.warning(f"Failed to resolve Scheduler Role ARN: {e}")
return ''
def schedule_instance_termination(region: str, instance_id: str, ttl_seconds: int, role_arn: str, logger) -> None:
if not role_arn:
role_arn = _resolve_scheduler_role_arn(logger)
if not role_arn:
logger.info("Scheduler role ARN not available; skipping TTL schedule creation.")
return
scheduler_client = boto3.client('scheduler', region_name=region)
schedule_name = f"osworld-ttl-{instance_id}-{int(time.time())}"
eta_scheduler = datetime.now(timezone.utc) + timedelta(seconds=ttl_seconds)
schedule_expression = f"at({eta_scheduler.strftime('%Y-%m-%dT%H:%M:%S')})"
target_arn = "arn:aws:scheduler:::aws-sdk:ec2:terminateInstances"
input_payload = '{"InstanceIds":["' + instance_id + '"]}'
# Retry to tolerate IAM eventual consistency
last_err = None
for attempt in range(1, 7):  # up to ~60s of retries
try:
scheduler_client.create_schedule(
Name=schedule_name,
ScheduleExpression=schedule_expression,
FlexibleTimeWindow={"Mode": "OFF"},
ActionAfterCompletion='DELETE',
Target={
"Arn": target_arn,
"RoleArn": role_arn,
"Input": input_payload
},
State='ENABLED',
Description=f"OSWorld TTL terminate for {instance_id}"
)
logger.info(f"Scheduled EC2 termination via EventBridge Scheduler: name={schedule_name}, when={eta_scheduler.isoformat()} (UTC)")
last_err = None
break
except ClientError as e:
last_err = e
code = e.response.get('Error', {}).get('Code')
msg = e.response.get('Error', {}).get('Message', '')
if code == 'ValidationException' and 'must allow AWS EventBridge Scheduler to assume the role' in msg:
time.sleep(10)
continue
else:
raise
if last_err is not None:
# If we exhausted retries, re-raise to surface warning upstream
raise last_err
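A minimal usage sketch: passing an empty `role_arn` lets the helper resolve (or auto-create) the scheduler role before creating the one-shot schedule:
```python
# Minimal sketch: schedule termination of an instance two hours from now.
import logging
from desktop_env.providers.aws.scheduler_utils import schedule_instance_termination

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ttl-demo")

schedule_instance_termination(
    region="us-east-1",
    instance_id="i-0123456789abcdef0",  # placeholder instance ID
    ttl_seconds=2 * 60 * 60,
    role_arn="",  # empty: resolve via env, or derive/create from the role name
    logger=logger,
)
```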

View File

@ -67,7 +67,9 @@ class AzureVMManager(VMManager):
free_vms.append((vm_path, pid_str))
return free_vms
def get_vm_path(self, region):
def get_vm_path(self, region, screen_size=(1920, 1080), **kwargs):
# Note: screen_size parameter is ignored for Azure provider
# but kept for interface consistency with other providers
self.check_and_clean()
free_vms_paths = self.list_free_vms(region)
if len(free_vms_paths) == 0:

View File

@ -32,7 +32,9 @@ class AzureProvider(Provider):
self.compute_client = ComputeManagementClient(credential, self.subscription_id)
self.network_client = NetworkManagementClient(credential, self.subscription_id)
def start_emulator(self, path_to_vm: str, headless: bool):
def start_emulator(self, path_to_vm: str, headless: bool, os_type: str = None, *args, **kwargs):
# Note: os_type parameter is ignored for Azure provider
# but kept for interface consistency with other providers
logger.info("Starting Azure VM...")
resource_group_name, vm_name = path_to_vm.split('/')

View File

View File

@ -34,6 +34,12 @@ def _download_vm(vms_dir: str):
logger.info("Downloading the virtual machine image...")
downloaded_size = 0
# Check for HF_ENDPOINT environment variable and replace domain if set to hf-mirror.com
hf_endpoint = os.environ.get('HF_ENDPOINT')
if hf_endpoint and 'hf-mirror.com' in hf_endpoint:
URL = URL.replace('huggingface.co', 'hf-mirror.com')
logger.info(f"Using HF mirror: {URL}")
downloaded_file_name = DOWNLOADED_FILE_NAME
os.makedirs(vms_dir, exist_ok=True)
@ -93,7 +99,8 @@ class DockerVMManager(VMManager):
def check_and_clean(self):
pass
def delete_vm(self, vm_path):
def delete_vm(self, vm_path, region=None, **kwargs):
# Fixed: Added region and **kwargs parameters for interface compatibility
pass
def initialize_registry(self):
@ -102,15 +109,25 @@ class DockerVMManager(VMManager):
def list_free_vms(self):
return os.path.join(VMS_DIR, DOWNLOADED_FILE_NAME)
def occupy_vm(self, vm_path):
def occupy_vm(self, vm_path, pid, region=None, **kwargs):
# Fixed: Added pid, region and **kwargs parameters for interface compatibility
pass
def get_vm_path(self, os_type, region):
def get_vm_path(self, os_type, region, screen_size=(1920, 1080), **kwargs):
# Note: screen_size parameter is ignored for Docker provider
# but kept for interface consistency with other providers
global URL, DOWNLOADED_FILE_NAME
if os_type == "Ubuntu":
URL = UBUNTU_X86_URL
elif os_type == "Windows":
URL = WINDOWS_X86_URL
# Check for HF_ENDPOINT environment variable and replace domain if set to hf-mirror.com
hf_endpoint = os.environ.get('HF_ENDPOINT')
if hf_endpoint and 'hf-mirror.com' in hf_endpoint:
URL = URL.replace('huggingface.co', 'hf-mirror.com')
logger.info(f"Using HF mirror: {URL}")
DOWNLOADED_FILE_NAME = URL.split('/')[-1]
if DOWNLOADED_FILE_NAME.endswith(".zip"):

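Both mirror hooks above key off the same variable; a usage sketch (set it before the manager builds its download URL):
```python
# Usage sketch: route the VM image download through hf-mirror.com.
import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # checked when the URL is built

from desktop_env.providers.docker.manager import DockerVMManager  # mirror applied at download time
```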
View File

@ -97,11 +97,20 @@ class DockerProvider(Provider):
self.vlc_port = self._get_available_port(8080)
# Start container while still holding the lock
# Check if KVM is available
devices = []
if os.path.exists("/dev/kvm"):
devices.append("/dev/kvm")
logger.info("KVM device found, using hardware acceleration")
else:
self.environment["KVM"] = "N"
logger.warning("KVM device not found, running without hardware acceleration (will be slower)")
self.container = self.client.containers.run(
"happysixd/osworld-docker",
environment=self.environment,
cap_add=["NET_ADMIN"],
devices=["/dev/kvm"],
devices=devices,
volumes={
os.path.abspath(path_to_vm): {
"bind": "/System.qcow2",
@ -144,7 +153,9 @@ class DockerProvider(Provider):
def revert_to_snapshot(self, path_to_vm: str, snapshot_name: str):
self.stop_emulator(path_to_vm)
def stop_emulator(self, path_to_vm: str):
def stop_emulator(self, path_to_vm: str, region=None, *args, **kwargs):
# Note: region parameter is ignored for Docker provider
# but kept for interface consistency with other providers
if self.container:
logger.info("Stopping VM...")
try:

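The provider's KVM probe can be replicated up front to predict whether the container will get hardware acceleration; a minimal sketch:
```python
# Minimal sketch: the same /dev/kvm existence check the provider performs.
import os

if os.path.exists("/dev/kvm"):
    print("KVM available: the container will use hardware acceleration")
else:
    print("No /dev/kvm: the VM will run fully emulated (noticeably slower)")
```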
View File

@ -57,6 +57,12 @@ def _install_vm(vm_name, vms_dir, downloaded_file_name, original_vm_name="Ubuntu
else:
raise Exception("Unsupported platform or architecture.")
# Check for HF_ENDPOINT environment variable and replace domain if set to hf-mirror.com
hf_endpoint = os.environ.get('HF_ENDPOINT')
if hf_endpoint and 'hf-mirror.com' in hf_endpoint:
url = url.replace('huggingface.co', 'hf-mirror.com')
logger.info(f"Using HF mirror: {url}")
# Download the virtual machine image
logger.info("Downloading the virtual machine image...")
downloaded_size = 0
@ -428,7 +434,9 @@ class VirtualBoxVMManager(VMManager):
free_vms.append((vm_path, pid_str))
return free_vms
def get_vm_path(self, os_type, region=None):
def get_vm_path(self, os_type, region=None, screen_size=(1920, 1080), **kwargs):
# Note: screen_size parameter is ignored for VirtualBox provider
# but kept for interface consistency with other providers
if os_type != "Ubuntu":
raise ValueError("Only support Ubuntu for now.")

View File

@ -55,7 +55,9 @@ class VirtualBoxProvider(Provider):
logger.error(f"Error executing command: {e.output.decode().strip()}")
def start_emulator(self, path_to_vm: str, headless: bool):
def start_emulator(self, path_to_vm: str, headless: bool, os_type: str = None, *args, **kwargs):
# Note: os_type parameter is ignored for VirtualBox provider
# but kept for interface consistency with other providers
logger.info("Starting VirtualBox VM...")
while True:
@ -113,7 +115,9 @@ class VirtualBoxProvider(Provider):
time.sleep(WAIT_TIME) # Wait for the VM to revert
return path_to_vm
def stop_emulator(self, path_to_vm: str):
def stop_emulator(self, path_to_vm: str, region=None, *args, **kwargs):
# Note: region parameter is ignored for VirtualBox provider
# but kept for interface consistency with other providers
logger.info("Stopping VirtualBox VM...")
uuid = VirtualBoxProvider._get_vm_uuid(path_to_vm)
VirtualBoxProvider._execute_command(["VBoxManage", "controlvm", uuid, "savestate"])

View File

@ -29,13 +29,11 @@ UBUNTU_X86_URL = "https://huggingface.co/datasets/xlangai/ubuntu_osworld/resolve
WINDOWS_X86_URL = "https://huggingface.co/datasets/xlangai/windows_osworld/resolve/main/Windows-x86.zip"
# Determine the platform and CPU architecture to decide the correct VM image to download
if platform.system() == 'Darwin': # macOS
# if os.uname().machine == 'arm64': # Apple Silicon
URL = UBUNTU_ARM_URL
# else:
# url = UBUNTU_X86_URL
elif platform.machine().lower() in ['amd64', 'x86_64']:
# sometimes the system is 'Darwin' but the machine is x86-based.
if platform.machine().lower() in ['amd64', 'x86_64']:
URL = UBUNTU_X86_URL
elif platform.system() == 'Darwin': # macOS
URL = UBUNTU_ARM_URL
else:
raise Exception("Unsupported platform or architecture")
@ -125,15 +123,22 @@ def _install_vm(vm_name, vms_dir, downloaded_file_name, os_type, original_vm_nam
# Download the virtual machine image
logger.info("Downloading the virtual machine image...")
downloaded_size = 0
# sometimes the system is 'Darwin' but the machine is x86-based.
if os_type == "Ubuntu":
if platform.system() == 'Darwin':
URL = UBUNTU_ARM_URL
elif platform.machine().lower() in ['amd64', 'x86_64']:
if platform.machine().lower() in ['amd64', 'x86_64']:
URL = UBUNTU_X86_URL
elif platform.system() == 'Darwin':
URL = UBUNTU_ARM_URL
elif os_type == "Windows":
if platform.machine().lower() in ['amd64', 'x86_64']:
URL = WINDOWS_X86_URL
# Check for HF_ENDPOINT environment variable and replace domain if set to hf-mirror.com
hf_endpoint = os.environ.get('HF_ENDPOINT')
if hf_endpoint and 'hf-mirror.com' in hf_endpoint:
URL = URL.replace('huggingface.co', 'hf-mirror.com')
logger.info(f"Using HF mirror: {URL}")
DOWNLOADED_FILE_NAME = URL.split('/')[-1]
downloaded_file_name = DOWNLOADED_FILE_NAME
@ -417,7 +422,9 @@ class VMwareVMManager(VMManager):
free_vms.append((vm_path, pid_str))
return free_vms
def get_vm_path(self, os_type, region=None):
def get_vm_path(self, os_type, region=None, screen_size=(1920, 1080), **kwargs):
# Note: screen_size parameter is ignored for VMware provider
# but kept for interface consistency with other providers
with self.lock:
if not VMwareVMManager.checked_and_cleaned:
VMwareVMManager.checked_and_cleaned = True

View File

@ -97,7 +97,9 @@ class VMwareProvider(Provider):
time.sleep(WAIT_TIME) # Wait for the VM to revert
return path_to_vm
def stop_emulator(self, path_to_vm: str):
def stop_emulator(self, path_to_vm: str, region=None, *args, **kwargs):
# Note: region parameter is ignored for VMware provider
# but kept for interface consistency with other providers
logger.info("Stopping VMware VM...")
VMwareProvider._execute_command(["vmrun"] + get_vmrun_type(return_list=True) + ["stop", path_to_vm])
time.sleep(WAIT_TIME) # Wait for the VM to stop

View File

@ -0,0 +1,72 @@
# Volcengine ECS Provider Configuration Guide
This guide describes how to configure and use Volcengine ECS for the OSWorld desktop environment.
## Setup Process
1. **Volcengine account**: You need a valid Volcengine account. By default this script launches ECS instances on a pay-as-you-go basis, so keep the account balance above 100 (CNY).
2. **Access keys**: Create an AccessKey ID and SecretAccessKey in the Volcengine IAM console and grant them permission to control ECS.
3. **VPC setup**: Create a VPC, a subnet, and a security group in the target region.
4. **Custom image**: Create a custom OSWorld image.
5. We recommend completing the ECS creation flow manually once and recording the values of all the environment variables you will need.
## Environment Variables
Set the following environment variables in your `.env` file:
```bash
# Volcengine access credentials
VOLCENGINE_ACCESS_KEY_ID=your_access_key_id
VOLCENGINE_SECRET_ACCESS_KEY=your_secret_access_key
# ECS configuration
VOLCENGINE_REGION=ap-southeast-1
VOLCENGINE_IMAGE_ID=image-xxxxxxxxx
VOLCENGINE_INSTANCE_TYPE=ecs.e-c1m2.large
VOLCENGINE_SUBNET_ID=subnet-xxxxxxxxx
VOLCENGINE_SECURITY_GROUP_ID=sg-xxxxxxxxx
VOLCENGINE_ZONE_ID=zone-xxxxxxxxx
VOLCENGINE_DEFAULT_PASSWORD=your_default_password
```
## Required Volcengine Resources
### 1. VPC and Subnet
- Create a VPC in the target region
- Create a subnet inside the VPC
- Make sure the subnet has internet access so that VNC connections work
### 2. Security Group
**⚠️ Important**: Configure the ports exactly as listed below to keep OSWorld tasks from failing due to connection problems.
#### Inbound rules (8 rules required)
| Type | Protocol | Port Range | Source | Description |
|------|------|----------|--------|------|
| SSH | TCP | 22 | 0.0.0.0/0 | SSH access |
| HTTP | TCP | 80 | 172.31.0.0/16 | HTTP traffic |
| Custom TCP | TCP | 5000 | 172.31.0.0/16 | OSWorld backend service |
| Custom TCP | TCP | 5910 | 0.0.0.0/0 | NoVNC web access port |
| Custom TCP | TCP | 8006 | 172.31.0.0/16 | VNC service port |
| Custom TCP | TCP | 8080 | 172.31.0.0/16 | VLC service port |
| Custom TCP | TCP | 8081 | 172.31.0.0/16 | Additional service port |
| Custom TCP | TCP | 9222 | 172.31.0.0/16 | Chrome control port |
#### Outbound rules (1 rule required)
| Type | Protocol | Port Range | Destination | Description |
|------|------|----------|----------|------|
| All traffic | All | All | 0.0.0.0/0 | Allow all outbound traffic |
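Since most connection-related task failures trace back to these ports, it can help to verify reachability once an instance is up. Below is a minimal sketch, assuming a plain TCP connect is enough to confirm a rule is open; the address is a placeholder, and the ports opened only to 172.31.0.0/16 are reachable just from inside that private network.

```python
import socket

# Ports OSWorld relies on, taken from the inbound rules above.
PORTS = [22, 5000, 5910, 8006, 8080, 8081, 9222]

def check_ports(host: str, ports=PORTS, timeout: float = 3.0) -> None:
    """Try a TCP connect to each port and report the result."""
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                print(f"{host}:{port} reachable")
        except OSError as exc:
            print(f"{host}:{port} NOT reachable ({exc})")

# Hypothetical address; substitute the EIP printed by the manager.
check_ports("203.0.113.10")
```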
### 3. Custom Image
You need to create a custom OSWorld image for Volcengine ECS. Follow the instructions in the "Creating a Custom ECS Image for OSWorld" section.
## Creating a Custom ECS Image for OSWorld
This section explains how to create the custom ECS image required by the OSWorld desktop environment. The process involves setting up a base instance with a desktop environment and a VNC server, then creating a custom image from it.
### Image Creation Process
- Download the provided qcow2 image from the link in `desktop_env/providers/docker/manager.py`: https://huggingface.co/datasets/xlangai/ubuntu_osworld/resolve/main/Ubuntu.qcow2.zip
- Unzip the downloaded file and upload it to Volcengine Object Storage (TOS). Make sure the TOS bucket is in the same region where you plan to launch ECS instances.
- In the ECS console, open the "Images" page, click the "Import Image" button, and follow the instructions to import the qcow2 image from TOS.
- Once the import completes, the imported image will appear in the "Images" list.

View File

@ -0,0 +1,221 @@
import os
import logging
import signal
import dotenv
import time
import volcenginesdkcore
import volcenginesdkecs.models as ecs_models
from volcenginesdkecs.api import ECSApi
from desktop_env.providers.base import VMManager
# Load environment variables from .env file
dotenv.load_dotenv()
for env_name in [
"VOLCENGINE_ACCESS_KEY_ID",
"VOLCENGINE_SECRET_ACCESS_KEY",
"VOLCENGINE_REGION",
"VOLCENGINE_SUBNET_ID",
"VOLCENGINE_SECURITY_GROUP_ID",
"VOLCENGINE_INSTANCE_TYPE",
"VOLCENGINE_IMAGE_ID",
"VOLCENGINE_ZONE_ID",
"VOLCENGINE_DEFAULT_PASSWORD",
]:
if not os.getenv(env_name):
raise EnvironmentError(f"{env_name} must be set in the environment variables.")
logger = logging.getLogger("desktopenv.providers.volcengine.VolcengineVMManager")
logger.setLevel(logging.INFO)
VOLCENGINE_ACCESS_KEY_ID = os.getenv("VOLCENGINE_ACCESS_KEY_ID")
VOLCENGINE_SECRET_ACCESS_KEY = os.getenv("VOLCENGINE_SECRET_ACCESS_KEY")
VOLCENGINE_REGION = os.getenv("VOLCENGINE_REGION")
VOLCENGINE_SUBNET_ID = os.getenv("VOLCENGINE_SUBNET_ID")
VOLCENGINE_SECURITY_GROUP_ID = os.getenv("VOLCENGINE_SECURITY_GROUP_ID")
VOLCENGINE_INSTANCE_TYPE = os.getenv("VOLCENGINE_INSTANCE_TYPE")
VOLCENGINE_IMAGE_ID = os.getenv("VOLCENGINE_IMAGE_ID")
VOLCENGINE_ZONE_ID = os.getenv("VOLCENGINE_ZONE_ID")
VOLCENGINE_DEFAULT_PASSWORD = os.getenv("VOLCENGINE_DEFAULT_PASSWORD")
def _allocate_vm(screen_size=(1920, 1080)):
"""分配火山引擎虚拟机"""
# 初始化火山引擎客户端
configuration = volcenginesdkcore.Configuration()
configuration.region = VOLCENGINE_REGION
configuration.ak = VOLCENGINE_ACCESS_KEY_ID
configuration.sk = VOLCENGINE_SECRET_ACCESS_KEY
configuration.client_side_validation = True
# set default configuration
volcenginesdkcore.Configuration.set_default(configuration)
# use global default configuration
api_instance = ECSApi()
instance_id = None
original_sigint_handler = signal.getsignal(signal.SIGINT)
original_sigterm_handler = signal.getsignal(signal.SIGTERM)
def signal_handler(sig, frame):
if instance_id:
signal_name = "SIGINT" if sig == signal.SIGINT else "SIGTERM"
logger.warning(f"Received {signal_name} signal, terminating instance {instance_id}...")
try:
api_instance.delete_instance(ecs_models.DeleteInstanceRequest(
instance_id=instance_id,
))
logger.info(f"Successfully terminated instance {instance_id} after {signal_name}.")
except Exception as cleanup_error:
logger.error(f"Failed to terminate instance {instance_id} after {signal_name}: {str(cleanup_error)}")
# Restore original signal handlers
signal.signal(signal.SIGINT, original_sigint_handler)
signal.signal(signal.SIGTERM, original_sigterm_handler)
if sig == signal.SIGINT:
raise KeyboardInterrupt
else:
import sys
sys.exit(0)
try:
# Set up signal handlers
signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)
# Build the instance creation parameters
create_instance_params = ecs_models.RunInstancesRequest(
image_id = VOLCENGINE_IMAGE_ID,
instance_type = VOLCENGINE_INSTANCE_TYPE,
network_interfaces=[ecs_models.NetworkInterfaceForRunInstancesInput(
subnet_id=VOLCENGINE_SUBNET_ID,
security_group_ids=[VOLCENGINE_SECURITY_GROUP_ID],
)],
eip_address=ecs_models.EipAddressForRunInstancesInput(
bandwidth_mbps = 5,
charge_type = "PayByTraffic",
),
instance_name = f"osworld-{os.getpid()}-{int(time.time())}",
volumes=[ecs_models.VolumeForRunInstancesInput(
volume_type="ESSD_PL0",
size=30,
)],
zone_id=VOLCENGINE_ZONE_ID,
password = VOLCENGINE_DEFAULT_PASSWORD, # default password
description = "OSWorld evaluation instance"
)
# Create the instance
response = api_instance.run_instances(create_instance_params)
instance_id = response.instance_ids[0]
logger.info(f"Waiting for instance {instance_id} to be running...")
# Wait for the instance to reach RUNNING
while True:
instance_info = api_instance.describe_instances(ecs_models.DescribeInstancesRequest(
instance_ids=[instance_id]
))
status = instance_info.instances[0].status
if status == 'RUNNING':
break
elif status in ['STOPPED', 'ERROR']:
raise Exception(f"Instance {instance_id} failed to start, status: {status}")
time.sleep(5)
logger.info(f"Instance {instance_id} is ready.")
# Fetch the instance's IP addresses
try:
instance_info = api_instance.describe_instances(ecs_models.DescribeInstancesRequest(
instance_ids=[instance_id]
))
public_ip = instance_info.instances[0].eip_address.ip_address
private_ip = instance_info.instances[0].network_interfaces[0].primary_ip_address
if public_ip:
vnc_url = f"http://{public_ip}:5910/vnc.html"
logger.info("="*80)
logger.info(f"🖥️ VNC Web Access URL: {vnc_url}")
logger.info(f"📡 Public IP: {public_ip}")
logger.info(f"🏠 Private IP: {private_ip}")
logger.info(f"🆔 Instance ID: {instance_id}")
logger.info("="*80)
print(f"\n🌐 VNC Web Access URL: {vnc_url}")
print(f"📍 Please open the above address in the browser for remote desktop access\n")
except Exception as e:
logger.warning(f"Failed to get VNC address for instance {instance_id}: {e}")
except KeyboardInterrupt:
logger.warning("VM allocation interrupted by user (SIGINT).")
if instance_id:
logger.info(f"Terminating instance {instance_id} due to interruption.")
api_instance.delete_instance(ecs_models.DeleteInstanceRequest(
instance_id=instance_id,
))
raise
except Exception as e:
logger.error(f"Failed to allocate VM: {e}", exc_info=True)
if instance_id:
logger.info(f"Terminating instance {instance_id} due to an error.")
api_instance.delete_instance(ecs_models.DeleteInstanceRequest(
instance_id=instance_id,
))
raise
finally:
# Restore original signal handlers
signal.signal(signal.SIGINT, original_sigint_handler)
signal.signal(signal.SIGTERM, original_sigterm_handler)
return instance_id
class VolcengineVMManager(VMManager):
"""
Volcengine VM Manager for managing virtual machines on Volcengine.
Volcengine does not need to maintain a registry of VMs, as it can dynamically allocate and deallocate VMs.
"""
def __init__(self, **kwargs):
self.initialize_registry()
def initialize_registry(self, **kwargs):
pass
def add_vm(self, vm_path, lock_needed=True, **kwargs):
pass
def _add_vm(self, vm_path):
pass
def delete_vm(self, vm_path, lock_needed=True, **kwargs):
pass
def _delete_vm(self, vm_path):
pass
def occupy_vm(self, vm_path, pid, lock_needed=True, **kwargs):
pass
def _occupy_vm(self, vm_path, pid):
pass
def check_and_clean(self, lock_needed=True, **kwargs):
pass
def _check_and_clean(self):
pass
def list_free_vms(self, lock_needed=True, **kwargs):
pass
def _list_free_vms(self):
pass
def get_vm_path(self, screen_size=(1920, 1080), **kwargs):
logger.info("Allocating a new VM in region: {region}".format(region=VOLCENGINE_REGION))
new_vm_path = _allocate_vm(screen_size=screen_size)
return new_vm_path
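Taken together with the provider in the next file, the instance ID returned here plays the role that a VM file path plays for the local providers. A usage sketch, with module paths assumed from the imports shown and the environment variables from the guide above already set:

```python
# Hypothetical end-to-end flow; module paths are assumptions.
from desktop_env.providers.volcengine.manager import VolcengineVMManager
from desktop_env.providers.volcengine.provider import VolcengineProvider

manager = VolcengineVMManager()
instance_id = manager.get_vm_path()        # allocates a fresh ECS instance
provider = VolcengineProvider()
provider.start_emulator(instance_id, headless=True)
ip = provider.get_ip_address(instance_id)  # private IP used by the harness
provider.stop_emulator(instance_id)        # terminates the instance
```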

View File

@ -0,0 +1,188 @@
import os
import time
import logging
import volcenginesdkcore
import volcenginesdkautoscaling
import volcenginesdkecs.models as ecs_models
from volcenginesdkcore.rest import ApiException
from volcenginesdkecs.api import ECSApi
from desktop_env.providers.base import Provider
from desktop_env.providers.volcengine.manager import _allocate_vm
logger = logging.getLogger("desktopenv.providers.volcengine.VolcengineProvider")
logger.setLevel(logging.INFO)
WAIT_DELAY = 15
MAX_ATTEMPTS = 10
class VolcengineProvider(Provider):
def __init__(self, **kwargs):
super().__init__(**kwargs)
self.region = os.getenv("VOLCENGINE_REGION", "eu-central-1")
self.client = self._create_client()
def _create_client(self) -> ECSApi:
configuration = volcenginesdkcore.Configuration()
configuration.ak = os.getenv('VOLCENGINE_ACCESS_KEY_ID')
configuration.sk = os.getenv('VOLCENGINE_SECRET_ACCESS_KEY')
configuration.region = os.getenv('VOLCENGINE_REGION')
configuration.client_side_validation = True
# set default configuration
volcenginesdkcore.Configuration.set_default(configuration)
return ECSApi()
def start_emulator(self, path_to_vm: str, headless: bool, *args, **kwargs):
logger.info("Starting Volcengine VM...")
try:
# Check the instance status
instance_info = self.client.describe_instances(ecs_models.DescribeInstancesRequest(
instance_ids=[path_to_vm]
))
status = instance_info.instances[0].status
logger.info(f"Instance {path_to_vm} current status: {status}")
if status == 'RUNNING':
logger.info(f"Instance {path_to_vm} is already running. Skipping start.")
return
if status == 'STOPPED':
# Start the instance
self.client.start_instance(ecs_models.StartInstancesRequest(instance_ids=[path_to_vm]))
logger.info(f"Instance {path_to_vm} is starting...")
# Wait for the instance to be running
for attempt in range(MAX_ATTEMPTS):
time.sleep(WAIT_DELAY)
instance_info = self.client.describe_instances(ecs_models.DescribeInstancesRequest(
instance_ids=[path_to_vm]
))
status = instance_info.instances[0].status
if status == 'RUNNING':
logger.info(f"Instance {path_to_vm} is now running.")
break
elif status == 'ERROR':
raise Exception(f"Instance {path_to_vm} failed to start")
elif attempt == MAX_ATTEMPTS - 1:
raise Exception(f"Instance {path_to_vm} failed to start within timeout")
else:
logger.warning(f"Instance {path_to_vm} is in status '{status}' and cannot be started.")
except ApiException as e:
logger.error(f"Failed to start the Volcengine VM {path_to_vm}: {str(e)}")
raise
def get_ip_address(self, path_to_vm: str) -> str:
logger.info("Getting Volcengine VM IP address...")
try:
instance_info = self.client.describe_instances(ecs_models.DescribeInstancesRequest(
instance_ids=[path_to_vm]
))
public_ip = instance_info.instances[0].eip_address.ip_address
private_ip = instance_info.instances[0].network_interfaces[0].primary_ip_address
if public_ip:
vnc_url = f"http://{public_ip}:5910/vnc.html"
logger.info("=" * 80)
logger.info(f"🖥️ VNC Web Access URL: {vnc_url}")
logger.info(f"📡 Public IP: {public_ip}")
logger.info(f"🏠 Private IP: {private_ip}")
logger.info("=" * 80)
print(f"\n🌐 VNC Web Access URL: {vnc_url}")
print(f"📍 Please open the above address in the browser for remote desktop access\n")
else:
logger.warning("No public IP address available for VNC access")
return private_ip
except ApiException as e:
logger.error(f"Failed to retrieve IP address for the instance {path_to_vm}: {str(e)}")
raise
def save_state(self, path_to_vm: str, snapshot_name: str):
logger.info("Saving Volcengine VM state...")
try:
# Create an image from the instance
response = self.client.create_image(ecs_models.CreateImageRequest(
snapshot_id=snapshot_name,
instance_id=path_to_vm,
description=f"OSWorld snapshot: {snapshot_name}"
))
image_id = response.image_id
logger.info(f"Image {image_id} created successfully from instance {path_to_vm}.")
return image_id
except ApiException as e:
logger.error(f"Failed to create image from the instance {path_to_vm}: {str(e)}")
raise
def revert_to_snapshot(self, path_to_vm: str, snapshot_name: str):
logger.info(f"Reverting Volcengine VM to snapshot: {snapshot_name}...")
try:
# Delete the original instance
self.client.delete_instance(ecs_models.DeleteInstanceRequest(
instance_id=path_to_vm,
))
logger.info(f"Old instance {path_to_vm} has been deleted.")
# Create a new instance
new_instance_id = _allocate_vm()
logger.info(f"New instance {new_instance_id} launched from image {snapshot_name}.")
logger.info(f"Waiting for instance {new_instance_id} to be running...")
# Wait for the new instance to be running
while True:
instance_info = self.client.describe_instances(ecs_models.DescribeInstancesRequest(
instance_ids=[new_instance_id]
))
status = instance_info.instances[0].status
if status == 'RUNNING':
break
elif status in ['STOPPED', 'ERROR']:
raise Exception(f"New instance {new_instance_id} failed to start, status: {status}")
time.sleep(5)
logger.info(f"Instance {new_instance_id} is ready.")
# Fetch the new instance's IP address
try:
instance_info = self.client.describe_instances(ecs_models.DescribeInstancesRequest(
instance_ids=[new_instance_id]
))
public_ip = instance_info.instances[0].eip_address.ip_address
if public_ip:
vnc_url = f"http://{public_ip}:5910/vnc.html"
logger.info("=" * 80)
logger.info(f"🖥️ New Instance VNC Web Access URL: {vnc_url}")
logger.info(f"📡 Public IP: {public_ip}")
logger.info(f"🆔 New Instance ID: {new_instance_id}")
logger.info("=" * 80)
print(f"\n🌐 New Instance VNC Web Access URL: {vnc_url}")
print(f"📍 Please open the above address in the browser for remote desktop access\n")
except Exception as e:
logger.warning(f"Failed to get VNC address for new instance {new_instance_id}: {e}")
return new_instance_id
except ApiException as e:
logger.error(f"Failed to revert to snapshot {snapshot_name} for the instance {path_to_vm}: {str(e)}")
raise
def stop_emulator(self, path_to_vm, region=None):
logger.info(f"Stopping Volcengine VM {path_to_vm}...")
try:
self.client.delete_instance(ecs_models.DeleteInstanceRequest(
instance_id=path_to_vm,
))
logger.info(f"Instance {path_to_vm} has been terminated.")
except ApiException as e:
logger.error(f"Failed to stop the Volcengine VM {path_to_vm}: {str(e)}")
raise

View File

@ -442,7 +442,7 @@ Since for some examples like change the settings of certain software, we hardcod
##### LibreOffice font installation
Some examples in LibreOffice Impress use non-default system fonts, and you need to download the corresponding **TTF files** and put them in the system fonts directory.
[Here](https://drive.usercontent.google.com/download?id=1UzmdsfUQRTnvCxkvWrKguwZM3G5eQk87&export=download&authuser=0&confirm=t&uuid=70b9fbb7-9585-4aa4-a2c0-a7d6126469a0&at=AEz70l4rdEjdxBpqkLyW9lcil6S5:1740142224052) we provide all the required fonts; just download the archive and unzip it into the system fonts directory (usually `/usr/share/fonts/`).
[Here](https://huggingface.co/datasets/xlangai/ubuntu_osworld_file_cache/resolve/main/fonts_20250608_fixed.zip) we provide all the required fonts; just download the archive and unzip it into the system fonts directory (usually `/usr/share/fonts/`).
```bash
unzip fonts.zip -d /usr/share/fonts/
```
@ -488,6 +488,20 @@ touch ~/.local/share/keyrings/login.keyring
Or you can disable the keyring service in any other way, which will prevent Chrome from requesting a password input.
3. VSCode Trust Settings:
To prevent VSCode from showing the "Do you trust this workspace?" dialog when opening projects, configure the following settings:
```bash
# Open VSCode
# Go to File -> Preferences -> Settings (or press Ctrl+,)
# In the search bar, type "workspace trust"
# Find "Security: Workspace Trust Enabled" and uncheck it
# Find "Security: Workspace Trust Banner" and set to "never"
# Find "Security: Workspace Trust Empty Window" and uncheck it
# Find "Security: Workspace Trust Startup Prompt" and set to "never"
```
This disables the workspace trust feature and prevents trust dialog prompts when opening projects.
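The same result can be scripted by writing the corresponding `security.workspace.trust.*` keys into VSCode's user settings file. A minimal sketch, assuming the default Linux settings path and that the file is plain JSON (no comments):

```python
import json
import os

# Default VSCode user settings path on Linux (assumed).
SETTINGS = os.path.expanduser("~/.config/Code/User/settings.json")

config = {}
if os.path.exists(SETTINGS):
    with open(SETTINGS) as f:
        config = json.load(f)  # will fail on JSONC-style comments

config.update({
    "security.workspace.trust.enabled": False,
    "security.workspace.trust.banner": "never",
    "security.workspace.trust.emptyWindow": False,
    "security.workspace.trust.startupPrompt": "never",
})

os.makedirs(os.path.dirname(SETTINGS), exist_ok=True)
with open(SETTINGS, "w") as f:
    json.dump(config, f, indent=4)
```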
### Network Configuration
@ -539,20 +553,30 @@ The following is the screenshot of the VLC configuration:
When VLC is open, the service will be running on port 8080.
##### Chrome Configuration
When you open Chrome through the GUI, it doesn't automatically enable remote debugging on port 1337 (the port we set for controlling the client machine's Chrome from the host, which gets forwarded to 9222).
This is because GUI startup and command-line startup use different parameters.
The consequence is that once Chrome is closed and reopened from the GUI, the host can no longer connect.
To ensure Chrome uses consistent debugging ports even after being closed and reopened, follow these steps:
1. Create or edit Chrome desktop entry:
```bash
sudo nano /usr/share/applications/google-chrome.desktop
sudo vim ~/.local/share/applications/google-chrome.desktop
```
If that doesn't work, try:
```
sudo vim /usr/share/applications/google-chrome.desktop
```
2. Modify the Exec lines to include debugging port:
```bash
# Find lines starting with "Exec=" and add the following flags:
--remote-debugging-port=1337 --remote-debugging-address=0.0.0.0
2. Modify **all** the `Exec` lines to include debugging port, for example, change:
```
Exec=/usr/bin/google-chrome-stable %U
```
into:
```
Exec=/usr/bin/google-chrome-stable --remote-debugging-port=1337 --remote-debugging-address=0.0.0.0 %U
```
In cases where need Chrome, the 1337 will be forwarded to 9222 in the virtual machine via socat.
In cases where Chrome is needed, port 1337 will be forwarded to 9222 inside the virtual machine via socat.
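To confirm the flag took effect, you can query Chrome's DevTools HTTP endpoint after relaunching it from the modified desktop entry. A minimal check, assuming it runs inside the VM against local port 1337:

```python
import json
import urllib.request

# Chrome serves /json/version on its remote-debugging port when the
# --remote-debugging-port flag is active.
with urllib.request.urlopen("http://localhost:1337/json/version", timeout=5) as resp:
    info = json.load(resp)

print(info.get("Browser"))               # e.g. "Chrome/120.0.0.0"
print(info.get("webSocketDebuggerUrl"))  # present only when debugging is on
```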
### Miscellaneous Settings

View File

@ -1370,7 +1370,7 @@ def open_file():
if window_found:
return "File opened and window activated successfully"
else:
return f"Failed to find window for {file_name} within {timeout} seconds.", 500
return f"Failed to find window for {file_name} within {TIMEOUT} seconds.", 500
except Exception as e:
return f"Failed to open {path}. Error: {e}", 500
@ -1568,5 +1568,230 @@ def end_recording():
return abort(500, description=f"Recording failed. The output file is missing or empty. ffmpeg stderr: {error_output}")
@app.route("/run_python", methods=['POST'])
def run_python():
data = request.json
code = data.get('code', None)
if not code:
return jsonify({'status': 'error', 'message': 'Code not supplied!'}), 400
# Create a temporary file to save the Python code
import tempfile
import uuid
# Generate unique filename
temp_filename = f"/tmp/python_exec_{uuid.uuid4().hex}.py"
try:
# Write code to temporary file
with open(temp_filename, 'w') as f:
f.write(code)
# Execute the file using subprocess to capture all output
result = subprocess.run(
['/usr/bin/python3', temp_filename],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
timeout=30 # 30 second timeout
)
# Clean up the temporary file
try:
os.remove(temp_filename)
except:
pass # Ignore cleanup errors
# Prepare response
output = result.stdout
error_output = result.stderr
# Combine output and errors if both exist
combined_message = output
if error_output:
combined_message += ('\n' + error_output) if output else error_output
# Determine status based on return code and errors
if result.returncode != 0:
status = 'error'
if not error_output:
# If no stderr but non-zero return code, add a generic error message
error_output = f"Process exited with code {result.returncode}"
combined_message = combined_message + '\n' + error_output if combined_message else error_output
else:
status = 'success'
return jsonify({
'status': status,
'message': combined_message,
'need_more': False, # Not applicable for file execution
'output': output, # stdout only
'error': error_output, # stderr only
'return_code': result.returncode
})
except subprocess.TimeoutExpired:
# Clean up the temporary file on timeout
try:
os.remove(temp_filename)
except:
pass
return jsonify({
'status': 'error',
'message': 'Execution timeout: Code took too long to execute',
'error': 'TimeoutExpired',
'need_more': False,
'output': None,
}), 500
except Exception as e:
# Clean up the temporary file on error
try:
os.remove(temp_filename)
except:
pass
# Capture the exception details
return jsonify({
'status': 'error',
'message': f'Execution error: {str(e)}',
'error': traceback.format_exc(),
'need_more': False,
'output': None,
}), 500
@app.route("/run_bash_script", methods=['POST'])
def run_bash_script():
data = request.json
script = data.get('script', None)
timeout = data.get('timeout', 100) # Default timeout of 100 seconds
working_dir = data.get('working_dir', None)
if not script:
return jsonify({
'status': 'error',
'output': 'Script not supplied!',
'error': "", # Always empty as requested
'returncode': -1
}), 400
# Expand user directory if provided
if working_dir:
working_dir = os.path.expanduser(working_dir)
if not os.path.exists(working_dir):
return jsonify({
'status': 'error',
'output': f'Working directory does not exist: {working_dir}',
'error': "", # Always empty as requested
'returncode': -1
}), 400
# Create a temporary script file
import tempfile
with tempfile.NamedTemporaryFile(mode='w', suffix='.sh', delete=False) as tmp_file:
if "#!/bin/bash" not in script:
script = "#!/bin/bash\n\n" + script
tmp_file.write(script)
tmp_file_path = tmp_file.name
try:
# Make the script executable
os.chmod(tmp_file_path, 0o755)
# Execute the script
if platform_name == "Windows":
# On Windows, use Git Bash or WSL if available, otherwise cmd
flags = subprocess.CREATE_NO_WINDOW
# Try to use bash if available (Git Bash, WSL, etc.)
result = subprocess.run(
['bash', tmp_file_path],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT, # Merge stderr into stdout
text=True,
timeout=timeout,
cwd=working_dir,
creationflags=flags,
shell=False
)
else:
# On Unix-like systems, use bash directly
flags = 0
result = subprocess.run(
['/bin/bash', tmp_file_path],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT, # Merge stderr into stdout
text=True,
timeout=timeout,
cwd=working_dir,
creationflags=flags,
shell=False
)
# Log the command execution for trajectory recording
_append_event("BashScript",
{"script": script, "output": result.stdout, "error": "", "returncode": result.returncode},
ts=time.time())
return jsonify({
'status': 'success' if result.returncode == 0 else 'error',
'output': result.stdout, # Contains both stdout and stderr merged
'error': "", # Always empty as requested
'returncode': result.returncode
})
except subprocess.TimeoutExpired:
return jsonify({
'status': 'error',
'output': f'Script execution timed out after {timeout} seconds',
'error': "", # Always empty as requested
'returncode': -1
}), 500
except FileNotFoundError:
# Bash not found, try with sh
try:
result = subprocess.run(
['sh', tmp_file_path],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT, # Merge stderr into stdout
text=True,
timeout=timeout,
cwd=working_dir,
shell=False
)
_append_event("BashScript",
{"script": script, "output": result.stdout, "error": "", "returncode": result.returncode},
ts=time.time())
return jsonify({
'status': 'success' if result.returncode == 0 else 'error',
'output': result.stdout, # Contains both stdout and stderr merged
'error': "", # Always empty as requested
'returncode': result.returncode,
})
except Exception as e:
return jsonify({
'status': 'error',
'output': f'Failed to execute script: {str(e)}',
'error': "", # Always empty as requested
'returncode': -1
}), 500
except Exception as e:
return jsonify({
'status': 'error',
'output': f'Failed to execute script: {str(e)}',
'error': "", # Always empty as requested
'returncode': -1
}), 500
finally:
# Clean up the temporary file
try:
os.unlink(tmp_file_path)
except:
pass
if __name__ == '__main__':
app.run(debug=True, host="0.0.0.0")
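The two new endpoints form a small remote-execution API. A usage sketch from the host side, assuming the server listens on the VM's port 5000 (Flask's default) and that the `requests` package is installed; the address is a placeholder:

```python
import requests

BASE = "http://203.0.113.10:5000"  # hypothetical VM address

# /run_python: POST a code string; stdout, stderr, and the return code come back.
r = requests.post(f"{BASE}/run_python",
                  json={"code": "print('hello from the VM')"})
resp = r.json()
print(resp["status"], resp["return_code"], resp["output"])

# /run_bash_script: stdout and stderr are returned merged in 'output'.
r = requests.post(f"{BASE}/run_bash_script",
                  json={"script": "uname -a && whoami",
                        "timeout": 30,
                        "working_dir": "~"})
resp = r.json()
print(resp["status"], resp["returncode"], resp["output"])
```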

View File

@ -29,6 +29,32 @@
"chrome"
],
"evaluator": {
"postconfig": [
{
"type": "launch",
"parameters": {
"command": [
"pkill",
"chrome"
]
}
},
{
"type": "launch",
"parameters": {
"command": [
"google-chrome",
"--remote-debugging-port=1337"
]
}
},
{
"type": "sleep",
"parameters": {
"seconds": 3
}
}
],
"func": "exact_match",
"result": {
"type": "enable_do_not_track"
@ -40,5 +66,7 @@
}
}
},
"proxy": false
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -63,5 +63,7 @@
}
}
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -75,5 +75,7 @@
}
]
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -42,5 +42,7 @@
}
}
},
"proxy": false
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -58,5 +58,7 @@
}
}
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -80,7 +80,6 @@
"dropLocationName": "Zürich",
"filterCriteria_carCategory": "large",
"filterCriteria_sortBy": "PRICE"
}
}
},
@ -98,12 +97,13 @@
"puYear": "{Year}",
"doDay": "{DayD}",
"doMonth": "{MonthD}",
"doYear":"{Year}"
"doYear": "{Year}"
}
}
}
]
},
"proxy": true,
"possibility_of_env_change": "medium"
"possibility_of_env_change": "medium",
"fixed_ip": false
}

View File

@ -69,5 +69,6 @@
}
},
"proxy": false,
"possibility_of_env_change": "medium"
"possibility_of_env_change": "medium",
"fixed_ip": false
}

View File

@ -29,6 +29,32 @@
"chrome"
],
"evaluator": {
"postconfig": [
{
"type": "launch",
"parameters": {
"command": [
"pkill",
"chrome"
]
}
},
{
"type": "launch",
"parameters": {
"command": [
"google-chrome",
"--remote-debugging-port=1337"
]
}
},
{
"type": "sleep",
"parameters": {
"seconds": 3
}
}
],
"func": "is_expected_bookmarks",
"result": {
"type": "bookmarks"
@ -43,5 +69,7 @@
}
}
},
"proxy": false
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -47,6 +47,12 @@
"--remote-debugging-port=1337"
]
}
},
{
"type": "sleep",
"parameters": {
"seconds": 3
}
}
],
"func": "exact_match",
@ -60,5 +66,7 @@
}
}
},
"proxy": false
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -7,7 +7,7 @@
{
"type": "execute",
"parameters": {
"command": "echo password | sudo -S apt update -y && echo password | sudo -S apt install jq -y",
"command": "echo {CLIENT_PASSWORD} | sudo -S apt update -y && echo {CLIENT_PASSWORD} | sudo -S apt install jq -y",
"shell": true
}
},
@ -54,5 +54,7 @@
}
}
},
"proxy": false
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -1,7 +1,7 @@
{
"id": "35253b65-1c19-4304-8aa4-6884b8218fc0",
"snapshot": "chrome",
"instruction": "Hey, I need a quick way back to this site. Could you whip up a shortcut on my desktop for me?",
"instruction": "Hey, I need a quick way back to this site. Could you whip up a shortcut on my desktop for me using Chrome's built-in feature?",
"source": "https://www.laptopmag.com/articles/how-to-create-desktop-shortcuts-for-web-pages-using-chrome",
"config": [
{
@ -49,5 +49,7 @@
}
}
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -83,5 +83,7 @@
}
]
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -11,5 +11,7 @@
"evaluator": {
"func": "infeasible"
},
"proxy": false
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -254,6 +254,12 @@
"--remote-debugging-port=1337"
]
}
},
{
"type": "sleep",
"parameters": {
"seconds": 3
}
}
],
"func": "check_history_deleted",
@ -271,5 +277,7 @@
}
}
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -1,7 +1,7 @@
{
"id": "47543840-672a-467d-80df-8f7c3b9788c9",
"snapshot": "chrome",
"instruction": "Show me the cars available for pickup at Boston Logan Intl Airport from the 10th to the 11th of next month, sorted by the number of seats to find the largest capacity.",
"instruction": "On the current website, show me the cars available for pickup at Boston Logan Intl Airport from the 10th to the 11th of next month, sorted by the number of seats to find the largest capacity.",
"source": "test_task_1",
"config": [
{
@ -113,5 +113,7 @@
}
]
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -47,6 +47,12 @@
"--remote-debugging-port=1337"
]
}
},
{
"type": "sleep",
"parameters": {
"seconds": 3
}
}
],
"func": "check_enabled_experiments",
@ -64,5 +70,7 @@
}
}
},
"proxy": false
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -56,5 +56,7 @@
}
}
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -18,7 +18,7 @@
{
"type": "execute",
"parameters": {
"command": "echo password | sudo -S apt-get update -y && echo password | sudo -S apt-get install unzip -y && unzip /home/user/Desktop/helloExtension.zip -d /home/user/Desktop/ && rm /home/user/Desktop/helloExtension.zip",
"command": "echo {CLIENT_PASSWORD} | sudo -S apt-get update -y && echo {CLIENT_PASSWORD} | sudo -S apt-get install unzip -y && unzip /home/user/Desktop/helloExtension.zip -d /home/user/Desktop/ && rm /home/user/Desktop/helloExtension.zip",
"shell": true
}
},
@ -58,5 +58,7 @@
}
}
},
"proxy": false
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -74,5 +74,7 @@
}
}
},
"proxy": true
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -38,6 +38,32 @@
"chrome"
],
"evaluator": {
"postconfig": [
{
"type": "launch",
"parameters": {
"command": [
"pkill",
"chrome"
]
}
},
{
"type": "launch",
"parameters": {
"command": [
"google-chrome",
"--remote-debugging-port=1337"
]
}
},
{
"type": "sleep",
"parameters": {
"seconds": 3
}
}
],
"func": "is_expected_bookmarks",
"result": {
"type": "bookmarks"
@ -52,5 +78,7 @@
}
}
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -38,6 +38,32 @@
"chrome"
],
"evaluator": {
"postconfig": [
{
"type": "launch",
"parameters": {
"command": [
"pkill",
"chrome"
]
}
},
{
"type": "launch",
"parameters": {
"command": [
"google-chrome",
"--remote-debugging-port=1337"
]
}
},
{
"type": "sleep",
"parameters": {
"seconds": 3
}
}
],
"func": "is_cookie_deleted",
"result": {
"type": "cookie_data",
@ -53,5 +79,7 @@
}
}
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -60,7 +60,7 @@
"goto_prefix": "https://www.",
"category": "class",
"class_multiObject_search_exist": {
"fT28tf":[
"fT28tf": [
"Black",
"$25 - $60",
"On sale",
@ -92,5 +92,7 @@
}
]
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -66,5 +66,7 @@
}
}
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -70,5 +70,7 @@
}
}
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -11,5 +11,7 @@
"evaluator": {
"func": "infeasible"
},
"proxy": false
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -4,6 +4,20 @@
"instruction": "I want Chrome to warn me whenever I visit a potentially harmful or unsafe website. Can you enable this safety feature?",
"source": "https://www.quora.com/How-do-I-set-the-security-settings-for-the-Google-Chrome-browser-for-the-best-security#:~:text=Enable%20Safe%20Browsing:%20Chrome%20has%20a%20built%2Din,Security%20%3E%20Security%20%3E%20Enable%20Safe%20Browsing.",
"config": [
{
"type": "execute",
"parameters": {
"command": "echo {CLIENT_PASSWORD} | sudo -S apt update -y && echo {CLIENT_PASSWORD} | sudo -S apt install jq -y",
"shell": true
}
},
{
"type": "execute",
"parameters": {
"command": "mkdir -p /home/user/.config/google-chrome/Default && if [ ! -f /home/user/.config/google-chrome/Default/Preferences ]; then echo '{}' > /home/user/.config/google-chrome/Default/Preferences; fi && cd /home/user/.config/google-chrome/Default && jq '. + {\"safebrowsing\":{\"enabled\":false,\"enhanced\":false}}' Preferences > temp && mv temp Preferences",
"shell": true
}
},
{
"type": "launch",
"parameters": {
@ -31,16 +45,33 @@
"evaluator": {
"postconfig": [
{
"type": "execute",
"type": "launch",
"parameters": {
"command": "pkill chrome",
"shell": "true"
"command": [
"pkill",
"chrome"
]
}
},
{
"type": "launch",
"parameters": {
"command": [
"google-chrome",
"--remote-debugging-port=1337"
]
}
},
{
"type": "sleep",
"parameters": {
"seconds": 3
}
}
],
"func": "exact_match",
"result": {
"type": "enable_enhanced_safety_browsing"
"type": "enable_safe_browsing"
},
"expected": {
"type": "rule",
@ -49,5 +80,7 @@
}
}
},
"proxy": false
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -31,10 +31,27 @@
"evaluator": {
"postconfig": [
{
"type": "execute",
"type": "launch",
"parameters": {
"command": "pkill chrome",
"shell": "true"
"command": [
"pkill",
"chrome"
]
}
},
{
"type": "launch",
"parameters": {
"command": [
"google-chrome",
"--remote-debugging-port=1337"
]
}
},
{
"type": "sleep",
"parameters": {
"seconds": 3
}
}
],
@ -49,5 +66,7 @@
}
}
},
"proxy": false
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -75,5 +75,7 @@
}
}
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -57,5 +57,7 @@
}
}
},
"proxy": true
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -43,7 +43,7 @@
"chrome"
],
"evaluator": {
"func": "is_expected_active_tab",
"func": "is_expected_url_pattern_match",
"result": {
"type": "active_url_from_accessTree",
"goto_prefix": "https://www."
@ -51,10 +51,13 @@
"expected": {
"type": "rule",
"rules": {
"type": "url",
"url": "https://www.dmv.virginia.gov/licenses-ids/license/applying/eligibility"
"expected": [
"^https://(www\\.)?dmv\\.virginia\\.gov/licenses-ids/license/applying/eligibility"
]
}
}
},
"proxy": true
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -56,5 +56,7 @@
}
}
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -11,5 +11,7 @@
"evaluator": {
"func": "infeasible"
},
"proxy": false
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -47,6 +47,12 @@
"--remote-debugging-port=1337"
]
}
},
{
"type": "sleep",
"parameters": {
"seconds": 3
}
}
],
"func": "check_font_size",
@ -62,5 +68,7 @@
}
}
},
"proxy": false
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -44,40 +44,40 @@
],
"evaluator": {
"func": [
"exact_match",
"exact_match"
"is_expected_url_pattern_match",
"is_expected_url_pattern_match"
],
"conj": "or",
"result": [
{
"type": "url_dashPart",
"goto_prefix": "https://www.",
"partIndex": -1,
"needDeleteId": false,
"returnType": "string"
"type": "active_url_from_accessTree",
"goto_prefix": "https://www."
},
{
"type": "url_dashPart",
"goto_prefix": "https://www.",
"partIndex": -1,
"needDeleteId": false,
"returnType": "string"
"type": "active_url_from_accessTree",
"goto_prefix": "https://www."
}
],
"expected": [
{
"type": "rule",
"rules": {
"expected": "tamiflu.html#side-effects"
"expected": [
"^https://(www\\.)?drugs\\.com/tamiflu\\.html#side-effects"
]
}
},
{
"type": "rule",
"rules": {
"expected": "tamiflu-side-effects.html"
"expected": [
"^https://(www\\.)?drugs\\.com/sfx/tamiflu-side-effects\\.html"
]
}
}
]
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -1,7 +1,7 @@
{
"id": "b4f95342-463e-4179-8c3f-193cd7241fb2",
"snapshot": "chrome",
"instruction": "Find the next available date for Diamond.",
"instruction": "Find the Next Available dates for Diamond.",
"source": "test_task_1",
"config": [
{
@ -63,5 +63,6 @@
}
},
"proxy": false,
"possibility_of_env_change": "high"
"possibility_of_env_change": "high",
"fixed_ip": false
}

View File

@ -66,10 +66,10 @@
"goto_prefix": "https://www.",
"category": "xpath",
"xpathObject": {
"/html/body/div[1]/main/div[3]/div[5]/div[2]/div/div[1]/div/div/div/div[1]/div/button/div[3]": "from",
"/html/body/div[1]/main/div[3]/div[5]/div[2]/div/div[1]/div/div/div/div[2]/button/div[3]": "to",
"/html/body/div[1]/main/div[3]/div[2]/div/div[1]/div/h2": "city",
"/html/body/div[1]/main/div[3]/div[5]/div[2]/div/div[1]/div/div/div/div[3]/button/div[3]/span/span[2]": "adult",
"/html/body/div[1]/main/div[3]/div[5]/div[2]/div/div[1]/div/div/div/div[1]/button/span/div/div": "from",
"/html/body/div[1]/main/div[3]/div[5]/div[2]/div/div[1]/div/div/div/div[2]/button/span/div/div": "to",
"/html/body/div[1]/main/div[3]/div[2]/div/div/div/h2": "city",
"/html/body/div[1]/main/div[3]/div[5]/div[2]/div/div[1]/div/div/div/div[3]/button/span/div/div": "adult",
"/html/body/div[1]/main/div[3]/div[5]/div[2]/div/div[3]/div/div[2]/div/div/div[2]/div/button/div/div": "rank"
}
}
@ -101,10 +101,10 @@
},
"timezone": "America/New_York",
"expected": {
"from": "{DoW}, {Month} {Day0D}",
"to": "{DoW}, {Month} {Day0D}",
"from": "Check In{DoW}, {Month} {Day0D}",
"to": "Check Out{DoW}, {Month} {Day0D}",
"city": "New York City Hotels",
"adult": "2 guests",
"adult": "Rooms/Guests1 Room, 2 Guests",
"rank": "Price (low to high)"
}
}
@ -112,5 +112,6 @@
]
},
"proxy": true,
"possibility_of_env_change": "medium"
"possibility_of_env_change": "high",
"fixed_ip": false
}

View File

@ -29,6 +29,32 @@
"chrome"
],
"evaluator": {
"postconfig": [
{
"type": "launch",
"parameters": {
"command": [
"pkill",
"chrome"
]
}
},
{
"type": "launch",
"parameters": {
"command": [
"google-chrome",
"--remote-debugging-port=1337"
]
}
},
{
"type": "sleep",
"parameters": {
"seconds": 3
}
}
],
"func": "match_in_list",
"result": {
"type": "default_search_engine"
@ -43,5 +69,7 @@
}
}
},
"proxy": false
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -52,11 +52,12 @@
"type": "rule",
"rules": {
"expected": [
"united.com/en/us/checked-bag-fee-calculator"
"united\\.com/en/us/checked-bag-fee-calculator(/.*)?"
]
}
}
},
"proxy": true,
"possibility_of_env_change": "medium"
"possibility_of_env_change": "medium",
"fixed_ip": false
}

View File

@ -79,5 +79,7 @@
}
}
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -104,5 +104,7 @@
}
]
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -49,5 +49,7 @@
"dest": "LLM Powered Autonomous Agents _ Lil'Log_gold.pdf"
}
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -56,5 +56,7 @@
}
}
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -56,5 +56,7 @@
}
}
},
"proxy": false
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -67,5 +67,6 @@
}
},
"proxy": true,
"possibility_of_env_change": "medium"
"possibility_of_env_change": "medium",
"fixed_ip": false
}

View File

@ -78,5 +78,7 @@
}
}
},
"proxy": true
"proxy": true,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

View File

@ -1,7 +1,7 @@
{
"id": "fc6d8143-9452-4171-9459-7f515143419a",
"snapshot": "chrome",
"instruction": "Find the status of tomorrow flights from New York airports to Columbus in Ohio.",
"instruction": "Find flights from New YorkKennedy Airport to Chicago O'Hare Airport for tomorrow.",
"source": "test_task_0",
"config": [
{
@ -65,12 +65,14 @@
"from": "tomorrow"
},
"expected": {
"start": "NYC",
"end": "CMH",
"start": "JFK",
"end": "ORD",
"time": "{DoW}, {Month} {Day0D}, {Year}"
}
}
}
},
"proxy": true
"proxy": false,
"fixed_ip": false,
"possibility_of_env_change": "low"
}

Some files were not shown because too many files have changed in this diff.