Commit Graph

1267 Commits

Author SHA1 Message Date
Jiaqi 4babaf30b3 Merge branch 'main' into jq/dev 2025-07-27 05:21:49 +00:00
yuanmengqi 122b16742b fix: improve EPUB processing by checking for file existence before reading
- Added checks for the presence of "toc.ncx" and "content.opf" in the EPUB file before attempting to process them.
- Introduced debug logging to notify when these files are not found, enhancing error handling and traceability.
- Maintained existing logic while improving robustness of the EPUB processing function.
2025-07-26 20:42:18 +00:00
Jiaqi f998aca0b5 merge main into jq/dev 2025-07-26 15:27:43 +00:00
yuanmengqi b25854edba feat: introduce DummyAgent class for enhanced coordinate handling
- Added DummyAgent class to facilitate coordinate generation and action assignment.
- Updated GTA1Agent to utilize DummyAgent for improved planning and execution.
- Increased max_steps and N_SEQ parameters for better performance.
- Enhanced logging for planning and execution processes.
- Maintained existing logic while integrating new functionality.
2025-07-26 08:26:23 +00:00
yuanmengqi d49ca9cc2d fix: enhance handling of '<' characters in pyautogui commands
- Refactor _fix_pyautogui_less_than_bug to improve handling of press('<') and typewrite calls.
- Introduce Unicode escape decoding for typewrite content to ensure proper '<' character processing.
- Maintain existing logic while enhancing functionality for better compatibility.
2025-07-26 07:59:37 +00:00
yuanmengqi 123f51ea4a Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-26 07:28:39 +00:00
yuanmengqi 73caf53880 delete: remove img_utils.py and update imports in jedi_3b_agent.py and jedi_7b_agent.py to use qwen_vl_utils 2025-07-26 07:28:31 +00:00
张逸群 2ed0436c21 fix: update DockerVMManager method signatures for interface compatibility (#287)
- Fix delete_vm() method to accept region and **kwargs parameters
- Fix occupy_vm() method to accept pid, region and **kwargs parameters
- Ensures consistency with base VMManager interface and other providers
- Resolves runtime argument mismatch errors when calling these methods

This maintains backward compatibility while fixing the interface contract.
2025-07-26 01:18:00 +08:00
yuanmengqi 40fdc6266f chore: update default AWS instance type from t3.xlarge to t3.medium 2025-07-25 15:56:42 +00:00
yuanmengqi 39e5baf5ae fix: remove unnecessary sleep and observation retrieval in run_single_example function 2025-07-25 15:51:20 +00:00
yuanmengqi f5595df71c delete: remove gat1_agent.py file 2025-07-25 07:11:55 +00:00
Zilong Zhou b8b9e9b166 feat: add proxy handling logic and clean up imports (#285) 2025-07-24 16:27:56 +08:00
Zilong Zhou cbe650d0bb refactor&delete: simplify AWS VM allocation and remove proxy support (#284) 2025-07-24 16:27:18 +08:00
Jiaqi 826c0ef945 Merge branch 'main' into jq/dev 2025-07-24 08:13:07 +00:00
Jiaqi 23b81993fa os task fix: set the default dim screen time to be 300s 2025-07-24 08:13:02 +00:00
Jiaqi 80b80617c4 os task fix: set the default dim screen time to be 300s 2025-07-24 08:12:45 +00:00
ChenYXxxx 873f8a0359 Update 10a730d5-d414-4b40-b479-684bed1ae522.json
change "the ight" to "the night"
2025-07-24 15:44:52 +08:00
Jiaqi 0f1ef6d9b7 use aws pub ip 2025-07-24 06:04:01 +00:00
Jiarui Yao 4fd8b5be0a support docker without kvm (#282) 2025-07-24 12:31:43 +08:00
张逸群 bf78b6d05e Add OPENAI_BASE_URL support for custom OpenAI-compatible endpoints (#283)
Enables GPT models to use custom API endpoints through OPENAI_BASE_URL environment variable. This addresses the limitation where only Azure OpenAI supported custom endpoints while standard GPT models were hardcoded to api.openai.com.

- Add intelligent URL handling to avoid duplicate /v1 paths
- Maintain backward compatibility with default OpenAI API
- Update README with configuration instructions
- Non-breaking change preserving existing functionality

Fixes API integration issues for users with custom OpenAI-compatible services.
2025-07-24 12:31:08 +08:00
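
The "avoid duplicate /v1 paths" behaviour from commit bf78b6d05e might look something like the sketch below. The function name `chat_completions_url` and the exact joining rule are assumptions for illustration; only the `OPENAI_BASE_URL` variable and the default host `api.openai.com` come from the commit message.

```python
import os

# Default host named in the commit message.
DEFAULT_BASE_URL = "https://api.openai.com"


def chat_completions_url(env=None):
    """Build the chat completions URL without doubling the /v1 segment.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    by a plain dict for testing.
    """
    env = os.environ if env is None else env
    base = (env.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL).rstrip("/")
    if base.endswith("/v1"):
        # The user already included /v1, so do not append it again.
        return f"{base}/chat/completions"
    return f"{base}/v1/chat/completions"
```

With this rule, both `http://localhost:8000` and `http://localhost:8000/v1/` resolve to the same endpoint, and an unset variable falls back to the default OpenAI API.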
yuanmengqi 0b0f8413df Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-23 17:19:54 +00:00
yuanmengqi 0a2929137b Simplify the logic for Docker provider 2025-07-23 17:19:47 +00:00
Yan98 2f3a6c48f6 Fix Typos (#275)
* init

* init

* fix typo
2025-07-24 00:06:04 +08:00
yuanmengqi fd7381210e Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-23 16:05:42 +00:00
yuanmengqi 5d219e7a5b Clean code 2025-07-23 16:05:39 +00:00
Yuan Mengqi d128edbbc1 add nogdrive json (#281)
* add uitars agent code

* improve claude

* improve claude

* improve claude

* improve claude

* improve claude

* add nogdrive json
2025-07-23 19:12:42 +08:00
张逸群 4d6e0fd031 Add --provider_name parameter to run.py and fix Docker provider initialization (#277)
- Add command-line argument --provider_name to support flexible provider selection
- Default provider remains vmware for backward compatibility
- Fix Docker provider controller initialization issue with delayed setup
- Add safety checks for controller existence in error handling

This enables users to specify different virtualization providers directly
from the command line and resolves Docker container lifecycle issues.
2025-07-23 04:09:36 +08:00
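
A command-line flag like the one described in commit 4d6e0fd031 could be declared as in this sketch. The flag name and the `vmware` default are taken from the commit message; the `build_parser` function, the description, and the help text are hypothetical and not copied from run.py.

```python
import argparse


def build_parser():
    """Hypothetical parser showing only the provider flag."""
    parser = argparse.ArgumentParser(description="Run evaluation (sketch)")
    parser.add_argument(
        "--provider_name",
        type=str,
        default="vmware",  # default kept for backward compatibility
        help="Virtualization provider to use, e.g. vmware or docker",
    )
    return parser


args = build_parser().parse_args(["--provider_name", "docker"])
```

Omitting the flag leaves `args.provider_name` at `vmware`, so existing invocations keep working unchanged.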
yuanmengqi 73de48af75 Update Public Evaluation Guidelines and README to require Python 3.10 and enhance installation instructions. Added troubleshooting tips for environment issues and clarified access key creation process in AWS for better security practices. 2025-07-22 19:57:55 +00:00
yuanmengqi 82c3cdd590 feat: refactor run_multienv_qwen25vl.py and qwen25vl_agent.py for improved logging and task management
- Introduced signal handling for graceful shutdown of environments and processes.
- Enhanced logging configuration to support dynamic log levels and structured output.
- Updated argument parsing to include new parameters for model selection and task execution.
- Refactored task distribution logic to streamline environment task management.
- Improved error handling during task execution and environment cleanup.
- Adjusted Qwen25VLAgent initialization to support new model and thought prefix options.
- Reduced max tries for LLM calls to optimize performance.
2025-07-22 19:46:42 +00:00
张逸群 4a5d48000f feat: add HuggingFace mirror support for VM providers (#278)
Add support for HuggingFace mirror (hf-mirror.com) to improve download
speeds in regions where huggingface.co access is slow.

- Support HF_ENDPOINT environment variable detection
- Automatically switch to hf-mirror.com when HF_ENDPOINT is set
- Apply to Docker, VMware, and VirtualBox providers
- Maintain backward compatibility with default huggingface.co URLs

Users can now set HF_ENDPOINT=https://hf-mirror.com to use the mirror.
2025-07-23 03:40:35 +08:00
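
One way to picture the `HF_ENDPOINT` switch from commit 4a5d48000f is the sketch below. It is an assumption-laden illustration: the function `resolve_hf_url` is hypothetical, the example download path is made up, and the sketch takes the mirror host from the variable's value, which may differ from how the providers actually implement it.

```python
import os
from urllib.parse import urlsplit, urlunsplit


def resolve_hf_url(url, env=None):
    """Rewrite a huggingface.co URL to the host named in HF_ENDPOINT.

    Hypothetical helper: URLs on other hosts, an unset variable, or an
    endpoint without a host all leave the URL untouched.
    """
    env = os.environ if env is None else env
    endpoint = (env.get("HF_ENDPOINT") or "").strip()
    if not endpoint:
        return url  # default behaviour: keep huggingface.co
    parts = urlsplit(url)
    mirror = urlsplit(endpoint)
    if parts.netloc != "huggingface.co" or not mirror.netloc:
        return url
    return urlunsplit(
        (mirror.scheme or parts.scheme, mirror.netloc,
         parts.path, parts.query, parts.fragment)
    )


# Made-up path, used only to show the rewrite.
mirrored = resolve_hf_url(
    "https://huggingface.co/datasets/example/vm.zip",
    {"HF_ENDPOINT": "https://hf-mirror.com"},
)
```

Only the host changes; the path and query of the original download URL are preserved.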
Yuan Mengqi 0a37cccd53 update claude (#280)
* add uitars agent code

* improve claude

* improve claude

* improve claude

* improve claude

* improve claude
2025-07-23 03:35:49 +08:00
Dunjie Lu 53fb96298a support_qwen25vl (#276)
Co-authored-by: root <ludunjie1219@github.com>
2025-07-22 16:33:03 +08:00
yuanmengqi 921321c5df Update Public Evaluation Guidelines to clarify proxy settings. Added information on automatic proxy wrapping for proxy-sensitive tasks and retained the recommendation for users to disable the proxy if not needed. Ensured existing content structure remains intact. 2025-07-22 05:59:57 +00:00
yuanmengqi 2727696835 Enhance Public Evaluation Guidelines by adding new images for AWS setup and monitoring instructions. Included additional contact information for leaderboard updates and error reporting. Ensured clarity and usability for users while preserving existing content structure. 2025-07-22 05:53:33 +00:00
yuanmengqi 05e25ba1b7 Enhance Public Evaluation Guidelines with detailed AWS setup instructions and security configurations. Added new sections for host and client machine setup, including recommended instance types, storage considerations, and security group rules. Updated existing content for clarity and added a new image for Google Drive authentication. Ensure all changes maintain original logic while improving usability for users with varying AWS experience. 2025-07-22 05:35:58 +00:00
yuanmengqi feaebbc2ec Update AWS guidance 2025-07-20 16:42:14 +00:00
yuanmengqi 46c9407879 Clean up the older version of the opencua experiment runner 2025-07-20 07:57:27 +00:00
yuanmengqi 91bc6bb6ce Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-20 07:55:57 +00:00
yuanmengqi 88d5639a2a Compatible with agents that cannot use runtime log 2025-07-20 07:55:53 +00:00
Xinyuan Wang e10dd9267c Wxy/opencua (#274)
* OpenCUA Agent code base

* update url

* debug, modify url input

* debug opencua

* show result

* debug agent history overlap

* modify opencua agent; add comment lines

* update parallel; clean code; use sleep 3s

* ui-tars-0717
2025-07-20 15:52:23 +08:00
Danyang Zhang bec7129fff Calc eval fix (#273)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scaffolds

* ver Jul18th

added two try-excepts to handle possible formula parsing and calculation
failures

* ver Jul19th

added support for cellIs and some other new types of conditional
formatting for calc evaluation

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-07-19 17:15:40 +08:00
yuanmengqi c6c62c52d7 feat: add X11 image handling and enhanced OCR processing
- Introduced a new function `read_x11_image` to read and convert X11 (XWD) format images to PIL Image, supporting both 24-bit and 32-bit formats.
- Enhanced the `compare_image_text` function to include checks for X11 image formats, with multiple conversion attempts using PIL, a custom reader, and netpbm tools.
- Improved error handling and logging for OCR processing, providing detailed feedback on conversion attempts and potential issues with X11 images.
- Maintained existing logic while expanding functionality for better image processing reliability.
2025-07-18 19:26:29 +00:00
yuanmengqi d6f2190a9f fix: refine instruction in OS evaluation example to clarify restrictions on logging out or shutting down the machine 2025-07-18 18:51:01 +00:00
yuanmengqi d52f3b1fca feat: add safe image opening function with retry mechanism
- Introduced a new function `safe_open_image_with_retry` to handle image file opening with retries for truncated or corrupted files.
- Enhanced error handling and logging for image processing in `check_palette_and_structure_sim`.
- Updated the logic to safely open both source and target images, ensuring robust evaluation without altering existing functionality.

These changes improve the reliability of image handling in the GIMP evaluator while maintaining the original code logic.
2025-07-18 18:36:09 +00:00
yuanmengqi 4fa59ebba2 feat: enhance URL comparison logic and Chrome debugging configuration
- Added a new function to ensure URLs have a scheme, defaulting to 'http://' if missing.
- Integrated tldextract to normalize URLs by extracting domain parts and handling 'www' subdomains.
- Updated the compare_urls function to include logging for better traceability during URL comparisons.
- Added tldextract to requirements.txt to support the new functionality.
- Updated the AWS manager with a new AMI ID for the specified resolution.
- Modified Chrome desktop launcher to include --remote-debugging-port=1337 for GUI debugging support.

These changes improve the robustness of URL handling and enable consistent Chrome debugging capabilities without altering existing logic.
2025-07-18 17:55:45 +00:00
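
The scheme-defaulting step from commit 4fa59ebba2 can be approximated with the standard library alone, as sketched here. The commit itself relies on tldextract for domain normalization; this sketch swaps in `urllib.parse` and a simple `www.` strip, so it is a rough stand-in rather than the evaluator's logic. The names `ensure_scheme` and `normalized_host` are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Matches a leading scheme such as "http://" or "https://".
_SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def ensure_scheme(url, default="http://"):
    """Prefix the URL with a default scheme when it has none."""
    if _SCHEME_RE.match(url):
        return url
    return default + url


def normalized_host(url):
    """Return the lowercased host with a leading 'www.' removed."""
    host = urlsplit(ensure_scheme(url)).hostname or ""
    return host[4:] if host.startswith("www.") else host


same_site = normalized_host("www.Example.com/path") == normalized_host(
    "https://example.com"
)
```

Unlike tldextract, this stand-in knows nothing about public suffixes, so it only treats the `www` subdomain specially.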
yuanmengqi 1ade6fe439 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-18 14:17:37 +00:00
yuanmengqi 44bd66fc9a Increase timeout for page load stability in Chrome evaluator
- Updated the timeout for the page load state from 10 seconds to 60 seconds to ensure better stability during page processing.
- Removed redundant retry mechanisms from the active tab checks to streamline the code while maintaining existing functionality.
- Enhanced logging to provide clearer insights into the page loading process.

These changes aim to improve the reliability of the Chrome evaluator without altering the core logic.
2025-07-18 14:16:16 +00:00
Danyang Zhang 53ffc05042 Calc eval fix (#272)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scaffolds

* ver Jul18th

added two try-excepts to handle possible formula parsing and calculation
failures

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-07-18 21:28:48 +08:00
Zilong Zhou 7fb1cee575 fix: img path error (#271)
* feat&style: add task status configuration and clear cache functionality; enhance UI styles

* feat&refactor: enhance current configuration API and improve cache clearing logic

* refactor&style: simplify task status update logic and improve page refresh mechanism

* refactor&feat: streamline default configuration retrieval and enhance cache initialization logic

* feat&refactor: add caching to default configuration retrieval and streamline task status logic

* feat&style: add collapsible section for additional model parameters and enhance styling for config items

* refactor&style: remove floating action button and clean up related styles

* fix: update video and screenshot sources to include action space, observation type, and model name parameters
2025-07-18 19:52:03 +08:00
shenzhennan 1378c745e1 Merge branch 'main' of https://github.com/xlang-ai/OSWorld 2025-07-18 07:15:18 +00:00