Commit Graph

54 Commits

Author SHA1 Message Date
Danyang Zhang 53ffc05042
Calc eval fix (#272)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scaffolds

* ver Jul18th

added two try-excepts to handle possible formula parsing and calculation
failures

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-07-18 21:28:48 +08:00
yuanmengqi e433f35c1f feat: standardize configuration fields across all evaluation examples
- Add `fixed_ip` field to all 369 JSON files in examples directory
  - Set to `true` for 8 files listed in google_chrome.json multi_apps
  - Set to `false` for remaining 361 files
- Add `possibility_of_env_change` field to 363 JSON files missing this field
  - Set to "low" for newly added fields
  - Preserve existing values (4 medium, 2 high) for 6 files that already had this field

This ensures consistent configuration schema across all evaluation examples
while maintaining backward compatibility with existing settings.
2025-07-16 13:45:34 +00:00
Danyang Zhang d4273d992e
Calc eval fix (#225)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scaffolds

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-30 18:23:09 +08:00
yuanmengqi 9fa768d24d refactor: update URLs in multiple JSON files to ensure proper encoding of special characters 2025-06-07 17:26:45 +00:00
Timothyxxx fb7bafb885 feat: Add proxy configuration to all 369 evaluation examples - 55 with proxy, 314 without 2025-06-05 18:46:53 +08:00
Timothyxxx 34748567a5 feat: Migrate OSWorld files to HuggingFace cache with comprehensive documentation
- Add detailed README for file cache repository
- Implement migration script with retry logic and browser simulation
- Support automatic file type detection and deduplication
- Ensure reliable hosting for OSWorld evaluation files
2025-05-28 04:29:37 +08:00
Timothyxxx 2f0f3f31aa Fix Duplicate ids; Remove unused JSON files across multiple applications 2025-02-10 15:49:54 +08:00
David Chang 2b9772174e
ver Mar15th
fixed bugs about infeasible task evaluation
2024-03-15 12:25:41 +08:00
Timothyxxx f9ccaa5773 Move sheetcopilot examples into libreoffice calc folder 2024-03-14 12:57:15 +08:00
Timothyxxx 8d69eec68f Update infeasible examples from Chrome and Calc 2024-02-14 16:51:07 +08:00
tsuky_chen 62f50cdc26
Update 7a4e4bc8-922c-4c84-865c-25ba34136be1.json 2024-02-01 16:10:47 +08:00
tsuky_chen ee851aeb54
Update 0cecd4f3-74de-457b-ba94-29ad6b5dafb6.json 2024-02-01 16:09:24 +08:00
David Chang 4897211a46
ver Jan31stv6
finished calc human evaluation
updated calc configs with an extra sleep to guarantee the integrity of
downloaded xlsx file
2024-01-31 22:55:47 +08:00
David Chang 14dbc708a4
ver Jan30thv2
debugged on windows platform with new _create_pywinauto_node function
migrated example task from calc to excel
2024-01-30 21:09:53 +08:00
Timothyxxx 37e09a994e Fix some errors found in impress and thunderbird examples 2024-01-29 13:23:06 +08:00
Timothyxxx cc21c3a6b1 Fix some errors found in calc examples 2024-01-28 21:19:18 +08:00
Timothyxxx 353ab6607d Fix some errors found in thunderbird examples 2024-01-28 16:51:38 +08:00
Timothyxxx be17bd3307 Fix some errors found in thunderbird examples 2024-01-28 15:35:31 +08:00
Timothyxxx c875cad3e5 Fix some errors found in thunderbird examples 2024-01-28 15:32:14 +08:00
David Chang 8025bf19f0
ver Jan27th
corrected usage of pyautogui in calc postconfig
2024-01-27 19:46:06 +08:00
Timothyxxx 63852755d2 Make up postconfig for libreoffice writer examples 2024-01-27 11:40:05 +08:00
David Chang 342440929b
ver Jan26thv2
replaced the file of calc/0cecd4f3 with a more complicated one from
39aa4e37
2024-01-26 17:27:29 +08:00
David Chang 0d05add432
ver Jan26th
fixed path of trajectory in calc/39aa4e37
2024-01-26 12:46:43 +08:00
tsuky_chen 932b73c67d load libreoffice writer eval -batch 2 2024-01-26 02:15:42 +08:00
tsuky_chen 3e7cfa8699 load libreoffice writer eval -batch 2 2024-01-26 02:07:26 +08:00
David Chang fbe26e2311
ver Jan23rdv2
added read_cell_value function to load the real value of a specific Excel
  cell by its coordinate
2024-01-23 23:57:00 +08:00
David Chang 93229ce98c
ver Jan22ndv3
updated style metric to compare_table
2024-01-22 23:45:15 +08:00
David Chang c97f43ce95
ver Jan22ndv2
fixed a bug for checking data validation in excel
2024-01-22 15:21:16 +08:00
David Chang 7a85c76369
ver Jan22nd
updated all the existing calc configs
2024-01-22 12:42:50 +08:00
David Chang 552491f765
ver Jan21stv2
fixed bugs
updated parts of configs
2024-01-21 23:55:04 +08:00
David Chang a97c865c0c
ver Jan18th
completed all the previously incomplete tasks stored under libreoffice_calc
added metric check_data_validations
2024-01-18 17:54:53 +08:00
David Chang 19214f2107
ver Jan17thv2
updated compare_table to compare the displayed values through an exported csv
2024-01-17 22:43:26 +08:00
David Chang ffc4c32bac
ver Jan17th
updated the existing task configs
2024-01-17 17:27:08 +08:00
David Chang 5e2a03720d
ver Jan10thv4
updated /home/david to /home/user
2024-01-10 22:33:33 +08:00
David Chang 6e6ef03bc9
ver Jan2nd
calc metrics are prepared by and large
2024-01-02 21:03:57 +08:00
David Chang d41c674a91
Merge branch 'main' into zdy 2023-12-31 14:37:01 +08:00
David Chang 19b99a13e2
Merge branch 'zdy' 2023-12-30 20:53:54 +08:00
tsuky_chen 24f33dc9bf add eval libreoffice writer font & page break 2023-12-30 16:32:15 +08:00
David Chang aaca06ba40
ver Dec29thv4
updated check_zoom
2023-12-29 22:46:32 +08:00
David Chang f73f6e1d4f
ver Dec29thv3
updated links
2023-12-29 21:56:52 +08:00
David Chang 6f225b2a02
ver Dec29thv2
re-organized functions w.r.t. comparing xlsx with a golden one
2023-12-29 21:43:33 +08:00
David Chang e4fac09945
ver Dec29th
metric compare_with_formats
2023-12-29 21:19:52 +08:00
David Chang 5a14cf40db
Merge branch 'main' into zdy 2023-12-28 21:20:57 +08:00
David Chang 2a9e5cc373
ver Dec27th
merged zdy into main
2023-12-27 20:40:23 +08:00
David Chang 7320f0aec4
ver Dec27thv3
added chart property of bar direction
2023-12-27 18:00:16 +08:00
David Chang 4e5920264a
ver Dec27thv2
updated a task config
updated documents
fixed the options feature of evaluator
updated with new properties of charts
current load_charts should be ok, I think
2023-12-27 17:51:41 +08:00
David Chang 50b82167d0
Merge branch 'zdy' 2023-12-26 21:06:39 +08:00
David Chang fe0a59583a
ver Dec26thv2
implemented _load_charts and compare_with_charts according to the code in
openpyxl
2023-12-26 20:59:19 +08:00
David Chang fa6cccc26a
Merge branch 'zdy' 2023-12-26 16:56:37 +08:00
David Chang a6b6022ecb
ver Dec26th
evaluation metric checking result file according to rules
2023-12-26 16:46:50 +08:00