OSWorld

Commit Graph

Author	SHA1	Message	Date
adlsdztony	19095947bb	feat: add proxy handling logic and clean up imports	2025-07-24 07:35:37 +00:00
Jiarui Yao	4fd8b5be0a	support docker without kvm (#282)	2025-07-24 12:31:43 +08:00
张逸群	bf78b6d05e	Add OPENAI_BASE_URL support for custom OpenAI-compatible endpoints (#283) Enables GPT models to use custom API endpoints through OPENAI_BASE_URL environment variable. This addresses the limitation where only Azure OpenAI supported custom endpoints while standard GPT models were hardcoded to api.openai.com. - Add intelligent URL handling to avoid duplicate /v1 paths - Maintain backward compatibility with default OpenAI API - Update README with configuration instructions - Non-breaking change preserving existing functionality Fixes API integration issues for users with custom OpenAI-compatible services.	2025-07-24 12:31:08 +08:00
yuanmengqi	0b0f8413df	Merge branch 'main' of github.com:xlang-ai/OSWorld	2025-07-23 17:19:54 +00:00
yuanmengqi	0a2929137b	Simplify the logic for Docker provider	2025-07-23 17:19:47 +00:00
Yan98	2f3a6c48f6	Fix Typos (#275) * init * init * fix typo	2025-07-24 00:06:04 +08:00
yuanmengqi	fd7381210e	Merge branch 'main' of github.com:xlang-ai/OSWorld	2025-07-23 16:05:42 +00:00
yuanmengqi	5d219e7a5b	Clean code	2025-07-23 16:05:39 +00:00
Yuan Mengqi	d128edbbc1	add nogdrive json (#281) * add uitars agent code * improve claude * improve claude * improve claude * improve claude * improve claude * add nogdrive json	2025-07-23 19:12:42 +08:00
张逸群	4d6e0fd031	Add --provider_name parameter to run.py and fix Docker provider initialization (#277) - Add command-line argument --provider_name to support flexible provider selection - Default provider remains vmware for backward compatibility - Fix Docker provider controller initialization issue with delayed setup - Add safety checks for controller existence in error handling This enables users to specify different virtualization providers directly from the command line and resolves Docker container lifecycle issues.	2025-07-23 04:09:36 +08:00
yuanmengqi	73de48af75	Update Public Evaluation Guidelines and README to require Python 3.10 and enhance installation instructions. Added troubleshooting tips for environment issues and clarified access key creation process in AWS for better security practices.	2025-07-22 19:57:55 +00:00
yuanmengqi	82c3cdd590	feat: refactor run_multienv_qwen25vl.py and qwen25vl_agent.py for improved logging and task management - Introduced signal handling for graceful shutdown of environments and processes. - Enhanced logging configuration to support dynamic log levels and structured output. - Updated argument parsing to include new parameters for model selection and task execution. - Refactored task distribution logic to streamline environment task management. - Improved error handling during task execution and environment cleanup. - Adjusted Qwen25VLAgent initialization to support new model and thought prefix options. - Reduced max tries for LLM calls to optimize performance.	2025-07-22 19:46:42 +00:00
张逸群	4a5d48000f	feat: add HuggingFace mirror support for VM providers (#278) Add support for HuggingFace mirror (hf-mirror.com) to improve download speeds in regions where huggingface.co access is slow. - Support HF_ENDPOINT environment variable detection - Automatically switch to hf-mirror.com when HF_ENDPOINT is set - Apply to Docker, VMware, and VirtualBox providers - Maintain backward compatibility with default huggingface.co URLs Users can now set HF_ENDPOINT=https://hf-mirror.com to use the mirror.	2025-07-23 03:40:35 +08:00
Yuan Mengqi	0a37cccd53	update claude (#280) * add uitars agent code * improve claude * improve claude * improve claude * improve claude * improve claude	2025-07-23 03:35:49 +08:00
Dunjie Lu	53fb96298a	support_qwen25vl (#276) Co-authored-by: root <ludunjie1219@github.com>	2025-07-22 16:33:03 +08:00
yuanmengqi	921321c5df	Update Public Evaluation Guidelines to clarify proxy settings. Added information on automatic proxy wrapping for proxy-sensitive tasks and retained the recommendation for users to disable the proxy if not needed. Ensured existing content structure remains intact.	2025-07-22 05:59:57 +00:00
yuanmengqi	2727696835	Enhance Public Evaluation Guidelines by adding new images for AWS setup and monitoring instructions. Included additional contact information for leaderboard updates and error reporting. Ensured clarity and usability for users while preserving existing content structure.	2025-07-22 05:53:33 +00:00
yuanmengqi	05e25ba1b7	Enhance Public Evaluation Guidelines with detailed AWS setup instructions and security configurations. Added new sections for host and client machine setup, including recommended instance types, storage considerations, and security group rules. Updated existing content for clarity and added a new image for Google Drive authentication. Ensure all changes maintain original logic while improving usability for users with varying AWS experience.	2025-07-22 05:35:58 +00:00
yuanmengqi	feaebbc2ec	Update AWS guidance	2025-07-20 16:42:14 +00:00
yuanmengqi	46c9407879	Clean elder version of opencua experiment runner	2025-07-20 07:57:27 +00:00
yuanmengqi	91bc6bb6ce	Merge branch 'main' of github.com:xlang-ai/OSWorld	2025-07-20 07:55:57 +00:00
yuanmengqi	88d5639a2a	Compatible with agents that cannot use runtime log	2025-07-20 07:55:53 +00:00
Xinyuan Wang	e10dd9267c	Wxy/opencua (#274) * OpenCUA Agent code base * update url * debug, modify url input * debug opencua * show result * debug agent history overlap * modify opencua agent; add comment lines * update parallel; clean code; use sleep 3s * ui-tars-0717	2025-07-20 15:52:23 +08:00
Danyang Zhang	bec7129fff	Calc eval fix (#273) * ver Jun17th updating annotations * ver Jun17th corrected annotation of 1d17 added check for cell merge * ver Jun17th updated several annotations * ver Jun20th fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08 * fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations. * ver Jun21st updating calc evals * ver Jun22nd fixed an impress task * ver Jun22ndv2 adjusted several calc tasks * Clean scalfolds * ver Jul18th added two try-excepts to handle possible formula parsing and calculation failures * ver Jul19th added supports for cellIs and some other new types of conditional formatting for calc evaluation --------- Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk> Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>	2025-07-19 17:15:40 +08:00
yuanmengqi	c6c62c52d7	feat: add X11 image handling and enhanced OCR processing - Introduced a new function `read_x11_image` to read and convert X11 (XWD) format images to PIL Image, supporting both 24-bit and 32-bit formats. - Enhanced the `compare_image_text` function to include checks for X11 image formats, with multiple conversion attempts using PIL, a custom reader, and netpbm tools. - Improved error handling and logging for OCR processing, providing detailed feedback on conversion attempts and potential issues with X11 images. - Maintained existing logic while expanding functionality for better image processing reliability.	2025-07-18 19:26:29 +00:00
yuanmengqi	d6f2190a9f	fix: refine instruction in OS evaluation example to clarify restrictions on logging out or shutting down the machine	2025-07-18 18:51:01 +00:00
yuanmengqi	d52f3b1fca	feat: add safe image opening function with retry mechanism - Introduced a new function `safe_open_image_with_retry` to handle image file opening with retries for truncated or corrupted files. - Enhanced error handling and logging for image processing in `check_palette_and_structure_sim`. - Updated the logic to safely open both source and target images, ensuring robust evaluation without altering existing functionality. These changes improve the reliability of image handling in the GIMP evaluator while maintaining the original code logic.	2025-07-18 18:36:09 +00:00
yuanmengqi	4fa59ebba2	feat: enhance URL comparison logic and Chrome debugging configuration - Added a new function to ensure URLs have a scheme, defaulting to 'http://' if missing. - Integrated tldextract to normalize URLs by extracting domain parts and handling 'www' subdomains. - Updated the compare_urls function to include logging for better traceability during URL comparisons. - Added tldextract to requirements.txt to support the new functionality. - Updated the AWS manager with a new AMI ID for the specified resolution. - Modified Chrome desktop launcher to include --remote-debugging-port=1337 for GUI debugging support. These changes improve the robustness of URL handling and enable consistent Chrome debugging capabilities without altering existing logic.	2025-07-18 17:55:45 +00:00
yuanmengqi	1ade6fe439	Merge branch 'main' of github.com:xlang-ai/OSWorld	2025-07-18 14:17:37 +00:00
yuanmengqi	44bd66fc9a	Increase timeout for page load stability in Chrome evaluator - Updated the timeout for the page load state from 10 seconds to 60 seconds to ensure better stability during page processing. - Removed redundant retry mechanisms from the active tab checks to streamline the code while maintaining existing functionality. - Enhanced logging to provide clearer insights into the page loading process. These changes aim to improve the reliability of the Chrome evaluator without altering the core logic.	2025-07-18 14:16:16 +00:00
Danyang Zhang	53ffc05042	Calc eval fix (#272) * ver Jun17th updating annotations * ver Jun17th corrected annotation of 1d17 added check for cell merge * ver Jun17th updated several annotations * ver Jun20th fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08 * fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations. * ver Jun21st updating calc evals * ver Jun22nd fixed an impress task * ver Jun22ndv2 adjusted several calc tasks * Clean scalfolds * ver Jul18th added two try-excepts to handle possible formula parsing and calculation failures --------- Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk> Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>	2025-07-18 21:28:48 +08:00
Zilong Zhou	7fb1cee575	fix: img path error (#271) * feat&style: add task status configuration and clear cache functionality; enhance UI styles * feat&refactor: enhance current configuration API and improve cache clearing logic * refactor&style: simplify task status update logic and improve page refresh mechanism * refactor&feat: streamline default configuration retrieval and enhance cache initialization logic * feat&refactor: add caching to default configuration retrieval and streamline task status logic * feat&style: add collapsible section for additional model parameters and enhance styling for config items * refactor&style: remove floating action button and clean up related styles * fix: update video and screenshot sources to include action space, observation type, and model name parameters	2025-07-18 19:52:03 +08:00
shenzhennan	1378c745e1	Merge branch 'main' of https://github.com/xlang-ai/OSWorld	2025-07-18 07:15:18 +00:00
shenzhennan	c7017a476d	fix impress instruction 0a211154	2025-07-18 07:14:35 +00:00
yuanmengqi	fcaefe7bb4	Enhance Chrome evaluator with improved error handling and retry mechanisms - Added robust error handling for page processing, including checks for closed pages and HTTP status codes. - Implemented retry logic for page loads and active tab checks to improve reliability. - Enhanced logging throughout the process to capture detailed information about failures and successes. - Preserved existing logic while ensuring better maintainability and robustness in the Chrome evaluator functions.	2025-07-18 07:13:13 +00:00
yuanmengqi	0fb625e4fd	Update instruction in OS evaluation example to include a restriction against shutting down the machine.	2025-07-18 05:28:43 +00:00
yuanmengqi	b9a646c11d	Merge branch 'main' of github.com:xlang-ai/OSWorld	2025-07-17 18:19:19 +00:00
yuanmengqi	d1ddd3eacd	feat: enhance VM wallpaper retrieval and image similarity checks - Added logging to the VM wallpaper retrieval function to capture errors and warnings related to content retrieval and file creation. - Implemented checks for None, empty, and invalid content types to ensure robustness in wallpaper handling. - Enhanced the SSIM structure check function with size validation and improved error handling for image processing. - Added logging for image size discrepancies and exceptions during SSIM computation to aid in debugging. These changes improve error handling and logging, ensuring better maintainability and reliability of the evaluators.	2025-07-17 18:19:09 +00:00
Zilong Zhou	66694c663d	Feat/monitor cache (#267) * feat&style: add task status configuration and clear cache functionality; enhance UI styles * feat&refactor: enhance current configuration API and improve cache clearing logic * refactor&style: simplify task status update logic and improve page refresh mechanism * refactor&feat: streamline default configuration retrieval and enhance cache initialization logic * feat&refactor: add caching to default configuration retrieval and streamline task status logic * feat&style: add collapsible section for additional model parameters and enhance styling for config items * refactor&style: remove floating action button and clean up related styles	2025-07-18 01:58:20 +08:00
yuanmengqi	e70cf0bd93	Merge branch 'main' of github.com:xlang-ai/OSWorld	2025-07-17 11:15:53 +00:00
yuanmengqi	9d04624e41	feat: enhance Chrome evaluator with improved retry logic and logging - Implemented retry mechanism for connecting to Chrome instances, allowing up to two attempts before failure. - Increased timeout settings for page navigation and loading to enhance reliability. - Added detailed logging for connection attempts, page loading status, and error handling to improve debugging and user experience. - Ensured existing logic is preserved while enhancing error handling and operational robustness. These changes improve the overall reliability and maintainability of the Chrome evaluator functions.	2025-07-17 11:15:47 +00:00
yuanmengqi	2c51950e73	feat: enhance evaluator configuration for Chrome with post-execution commands - Added postconfig commands to multiple JSON files for Chrome evaluation examples. - Included commands to terminate existing Chrome processes, launch Chrome with remote debugging, and introduce sleep intervals for timing. - Updated logging messages in the AWS manager to improve clarity and user experience. These changes enhance the automation and usability of the evaluation examples while preserving existing logic.	2025-07-17 10:50:10 +00:00
Yuan Mengqi	5ca516ac7a	add uitars agent code (#265)	2025-07-17 18:17:13 +08:00
Xinyuan Wang	24fbad9015	Merge pull request #264 from yuanmengqi/main Improve the parallel logic	2025-07-17 12:28:48 +08:00
yuanmengqi	fe40011b5d	Improve the parallel logic	2025-07-17 04:21:42 +00:00
yuanmengqi	6788c58aa3	Improve the parallel logic	2025-07-17 04:20:59 +00:00
yuanmengqi	bb8b0b2582	Improve the parallel logic	2025-07-17 04:19:44 +00:00
yuanmengqi	150234307e	Merge branch 'fix_chrome'	2025-07-17 04:14:47 +00:00
yuanmengqi	9eeabfc52d	Improve the parallel logic	2025-07-17 04:14:20 +00:00
yuanmengqi	2a48500691	refactor: improve code readability and error handling in table.py - Reformatted import statements for better organization. - Enhanced error handling in file reading functions to provide clearer logging. - Introduced a new `_safe_read_file` function to handle multiple encoding attempts when reading files. - Updated the `compare_csv` function to utilize the new file reading method, improving robustness. - Ensured consistent return values across functions, replacing `0.` with `0.0` for clarity. These changes maintain existing logic while improving code maintainability and readability.	2025-07-16 18:11:05 +00:00


1250 Commits, All Branches
