Commit Graph

1250 Commits

Author SHA1 Message Date
adlsdztony 19095947bb feat: add proxy handling logic and clean up imports 2025-07-24 07:35:37 +00:00
Jiarui Yao 4fd8b5be0a
support docker without kvm (#282) 2025-07-24 12:31:43 +08:00
张逸群 bf78b6d05e
Add OPENAI_BASE_URL support for custom OpenAI-compatible endpoints (#283)
Enables GPT models to use custom API endpoints through OPENAI_BASE_URL environment variable. This addresses the limitation where only Azure OpenAI supported custom endpoints while standard GPT models were hardcoded to api.openai.com.

- Add intelligent URL handling to avoid duplicate /v1 paths
- Maintain backward compatibility with default OpenAI API
- Update README with configuration instructions
- Non-breaking change preserving existing functionality

Fixes API integration issues for users with custom OpenAI-compatible services.
2025-07-24 12:31:08 +08:00
yuanmengqi 0b0f8413df Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-23 17:19:54 +00:00
yuanmengqi 0a2929137b Simplify the logic for Docker provider 2025-07-23 17:19:47 +00:00
Yan98 2f3a6c48f6
Fix Typos (#275)
* init

* init

* fix typo
2025-07-24 00:06:04 +08:00
yuanmengqi fd7381210e Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-23 16:05:42 +00:00
yuanmengqi 5d219e7a5b Clean code 2025-07-23 16:05:39 +00:00
Yuan Mengqi d128edbbc1
add nogdrive json (#281)
* add uitars agent code

* improve claude

* improve claude

* improve claude

* improve claude

* improve claude

* add nogdrive json
2025-07-23 19:12:42 +08:00
张逸群 4d6e0fd031
Add --provider_name parameter to run.py and fix Docker provider initialization (#277)
- Add command-line argument --provider_name to support flexible provider selection
- Default provider remains vmware for backward compatibility
- Fix Docker provider controller initialization issue with delayed setup
- Add safety checks for controller existence in error handling

This enables users to specify different virtualization providers directly
from the command line and resolves Docker container lifecycle issues.
2025-07-23 04:09:36 +08:00
yuanmengqi 73de48af75 Update Public Evaluation Guidelines and README to require Python 3.10 and enhance installation instructions. Added troubleshooting tips for environment issues and clarified access key creation process in AWS for better security practices. 2025-07-22 19:57:55 +00:00
yuanmengqi 82c3cdd590 feat: refactor run_multienv_qwen25vl.py and qwen25vl_agent.py for improved logging and task management
- Introduced signal handling for graceful shutdown of environments and processes.
- Enhanced logging configuration to support dynamic log levels and structured output.
- Updated argument parsing to include new parameters for model selection and task execution.
- Refactored task distribution logic to streamline environment task management.
- Improved error handling during task execution and environment cleanup.
- Adjusted Qwen25VLAgent initialization to support new model and thought prefix options.
- Reduced max tries for LLM calls to optimize performance.
2025-07-22 19:46:42 +00:00
张逸群 4a5d48000f
feat: add HuggingFace mirror support for VM providers (#278)
Add support for HuggingFace mirror (hf-mirror.com) to improve download
speeds in regions where huggingface.co access is slow.

- Support HF_ENDPOINT environment variable detection
- Automatically switch to hf-mirror.com when HF_ENDPOINT is set
- Apply to Docker, VMware, and VirtualBox providers
- Maintain backward compatibility with default huggingface.co URLs

Users can now set HF_ENDPOINT=https://hf-mirror.com to use the mirror.
2025-07-23 03:40:35 +08:00
Yuan Mengqi 0a37cccd53
update claude (#280)
* add uitars agent code

* improve claude

* improve claude

* improve claude

* improve claude

* improve claude
2025-07-23 03:35:49 +08:00
Dunjie Lu 53fb96298a
support_qwen25vl (#276)
Co-authored-by: root <ludunjie1219@github.com>
2025-07-22 16:33:03 +08:00
yuanmengqi 921321c5df Update Public Evaluation Guidelines to clarify proxy settings. Added information on automatic proxy wrapping for proxy-sensitive tasks and retained the recommendation for users to disable the proxy if not needed. Ensured existing content structure remains intact. 2025-07-22 05:59:57 +00:00
yuanmengqi 2727696835 Enhance Public Evaluation Guidelines by adding new images for AWS setup and monitoring instructions. Included additional contact information for leaderboard updates and error reporting. Ensured clarity and usability for users while preserving existing content structure. 2025-07-22 05:53:33 +00:00
yuanmengqi 05e25ba1b7 Enhance Public Evaluation Guidelines with detailed AWS setup instructions and security configurations. Added new sections for host and client machine setup, including recommended instance types, storage considerations, and security group rules. Updated existing content for clarity and added a new image for Google Drive authentication. Ensure all changes maintain original logic while improving usability for users with varying AWS experience. 2025-07-22 05:35:58 +00:00
yuanmengqi feaebbc2ec Update AWS guidance 2025-07-20 16:42:14 +00:00
yuanmengqi 46c9407879 Clean elder version of opencua experiment runner 2025-07-20 07:57:27 +00:00
yuanmengqi 91bc6bb6ce Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-20 07:55:57 +00:00
yuanmengqi 88d5639a2a Compatible with agents that cannot use runtime log 2025-07-20 07:55:53 +00:00
Xinyuan Wang e10dd9267c
Wxy/opencua (#274)
* OpenCUA Agent code base

* update url

* debug, modify url input

* debug opencua

* show result

* debug agent history overlap

* modify opencua agent; add comment lines

* update parallel; clean code; use sleep 3s

* ui-tars-0717
2025-07-20 15:52:23 +08:00
Danyang Zhang bec7129fff
Calc eval fix (#273)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scalfolds

* ver Jul18th

added two try-excepts to handle possible formula parsing and calculation
failures

* ver Jul19th

added supports for cellIs and some other new types of conditional
formatting for calc evaluation

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-07-19 17:15:40 +08:00
yuanmengqi c6c62c52d7 feat: add X11 image handling and enhanced OCR processing
- Introduced a new function `read_x11_image` to read and convert X11 (XWD) format images to PIL Image, supporting both 24-bit and 32-bit formats.
- Enhanced the `compare_image_text` function to include checks for X11 image formats, with multiple conversion attempts using PIL, a custom reader, and netpbm tools.
- Improved error handling and logging for OCR processing, providing detailed feedback on conversion attempts and potential issues with X11 images.
- Maintained existing logic while expanding functionality for better image processing reliability.
2025-07-18 19:26:29 +00:00
yuanmengqi d6f2190a9f fix: refine instruction in OS evaluation example to clarify restrictions on logging out or shutting down the machine 2025-07-18 18:51:01 +00:00
yuanmengqi d52f3b1fca feat: add safe image opening function with retry mechanism
- Introduced a new function `safe_open_image_with_retry` to handle image file opening with retries for truncated or corrupted files.
- Enhanced error handling and logging for image processing in `check_palette_and_structure_sim`.
- Updated the logic to safely open both source and target images, ensuring robust evaluation without altering existing functionality.

These changes improve the reliability of image handling in the GIMP evaluator while maintaining the original code logic.
2025-07-18 18:36:09 +00:00
yuanmengqi 4fa59ebba2 feat: enhance URL comparison logic and Chrome debugging configuration
- Added a new function to ensure URLs have a scheme, defaulting to 'http://' if missing.
- Integrated tldextract to normalize URLs by extracting domain parts and handling 'www' subdomains.
- Updated the compare_urls function to include logging for better traceability during URL comparisons.
- Added tldextract to requirements.txt to support the new functionality.
- Updated the AWS manager with a new AMI ID for the specified resolution.
- Modified Chrome desktop launcher to include --remote-debugging-port=1337 for GUI debugging support.

These changes improve the robustness of URL handling and enable consistent Chrome debugging capabilities without altering existing logic.
2025-07-18 17:55:45 +00:00
yuanmengqi 1ade6fe439 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-18 14:17:37 +00:00
yuanmengqi 44bd66fc9a Increase timeout for page load stability in Chrome evaluator
- Updated the timeout for the page load state from 10 seconds to 60 seconds to ensure better stability during page processing.
- Removed redundant retry mechanisms from the active tab checks to streamline the code while maintaining existing functionality.
- Enhanced logging to provide clearer insights into the page loading process.

These changes aim to improve the reliability of the Chrome evaluator without altering the core logic.
2025-07-18 14:16:16 +00:00
Danyang Zhang 53ffc05042
Calc eval fix (#272)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scalfolds

* ver Jul18th

added two try-excepts to handle possible formula parsing and calculation
failures

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-07-18 21:28:48 +08:00
Zilong Zhou 7fb1cee575
fix: img path error (#271)
* feat&style: add task status configuration and clear cache functionality; enhance UI styles

* feat&refactor: enhance current configuration API and improve cache clearing logic

* refactor&style: simplify task status update logic and improve page refresh mechanism

* refactor&feat: streamline default configuration retrieval and enhance cache initialization logic

* feat&refactor: add caching to default configuration retrieval and streamline task status logic

* feat&style: add collapsible section for additional model parameters and enhance styling for config items

* refactor&style: remove floating action button and clean up related styles

* fix: update video and screenshot sources to include action space, observation type, and model name parameters
2025-07-18 19:52:03 +08:00
shenzhennan 1378c745e1 Merge branch 'main' of https://github.com/xlang-ai/OSWorld 2025-07-18 07:15:18 +00:00
shenzhennan c7017a476d fix impress instruction 0a211154 2025-07-18 07:14:35 +00:00
yuanmengqi fcaefe7bb4 Enhance Chrome evaluator with improved error handling and retry mechanisms
- Added robust error handling for page processing, including checks for closed pages and HTTP status codes.
- Implemented retry logic for page loads and active tab checks to improve reliability.
- Enhanced logging throughout the process to capture detailed information about failures and successes.
- Preserved existing logic while ensuring better maintainability and robustness in the Chrome evaluator functions.
2025-07-18 07:13:13 +00:00
yuanmengqi 0fb625e4fd Update instruction in OS evaluation example to include a restriction against shutting down the machine. 2025-07-18 05:28:43 +00:00
yuanmengqi b9a646c11d Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-17 18:19:19 +00:00
yuanmengqi d1ddd3eacd feat: enhance VM wallpaper retrieval and image similarity checks
- Added logging to the VM wallpaper retrieval function to capture errors and warnings related to content retrieval and file creation.
- Implemented checks for None, empty, and invalid content types to ensure robustness in wallpaper handling.
- Enhanced the SSIM structure check function with size validation and improved error handling for image processing.
- Added logging for image size discrepancies and exceptions during SSIM computation to aid in debugging.

These changes improve error handling and logging, ensuring better maintainability and reliability of the evaluators.
2025-07-17 18:19:09 +00:00
Zilong Zhou 66694c663d
Feat/monitor cache (#267)
* feat&style: add task status configuration and clear cache functionality; enhance UI styles

* feat&refactor: enhance current configuration API and improve cache clearing logic

* refactor&style: simplify task status update logic and improve page refresh mechanism

* refactor&feat: streamline default configuration retrieval and enhance cache initialization logic

* feat&refactor: add caching to default configuration retrieval and streamline task status logic

* feat&style: add collapsible section for additional model parameters and enhance styling for config items

* refactor&style: remove floating action button and clean up related styles
2025-07-18 01:58:20 +08:00
yuanmengqi e70cf0bd93 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-17 11:15:53 +00:00
yuanmengqi 9d04624e41 feat: enhance Chrome evaluator with improved retry logic and logging
- Implemented retry mechanism for connecting to Chrome instances, allowing up to two attempts before failure.
- Increased timeout settings for page navigation and loading to enhance reliability.
- Added detailed logging for connection attempts, page loading status, and error handling to improve debugging and user experience.
- Ensured existing logic is preserved while enhancing error handling and operational robustness.

These changes improve the overall reliability and maintainability of the Chrome evaluator functions.
2025-07-17 11:15:47 +00:00
yuanmengqi 2c51950e73 feat: enhance evaluator configuration for Chrome with post-execution commands
- Added postconfig commands to multiple JSON files for Chrome evaluation examples.
- Included commands to terminate existing Chrome processes, launch Chrome with remote debugging, and introduce sleep intervals for timing.
- Updated logging messages in the AWS manager to improve clarity and user experience.

These changes enhance the automation and usability of the evaluation examples while preserving existing logic.
2025-07-17 10:50:10 +00:00
Yuan Mengqi 5ca516ac7a
add uitars agent code (#265) 2025-07-17 18:17:13 +08:00
Xinyuan Wang 24fbad9015
Merge pull request #264 from yuanmengqi/main
Improve the parallel logic
2025-07-17 12:28:48 +08:00
yuanmengqi fe40011b5d Improve the parallel logic 2025-07-17 04:21:42 +00:00
yuanmengqi 6788c58aa3 Improve the parallel logic 2025-07-17 04:20:59 +00:00
yuanmengqi bb8b0b2582 Improve the parallel logic 2025-07-17 04:19:44 +00:00
yuanmengqi 150234307e Merge branch 'fix_chrome' 2025-07-17 04:14:47 +00:00
yuanmengqi 9eeabfc52d Improve the parallel logic 2025-07-17 04:14:20 +00:00
yuanmengqi 2a48500691 refactor: improve code readability and error handling in table.py
- Reformatted import statements for better organization.
- Enhanced error handling in file reading functions to provide clearer logging.
- Introduced a new `_safe_read_file` function to handle multiple encoding attempts when reading files.
- Updated the `compare_csv` function to utilize the new file reading method, improving robustness.
- Ensured consistent return values across functions, replacing `0.` with `0.0` for clarity.

These changes maintain existing logic while improving code maintainability and readability.
2025-07-16 18:11:05 +00:00