MINOR Move scripts into committer-tools (#17162)

Moving reviewers.py and kafka-merge-pr.py into committer-tools. Also include a new find-unfinished-test.py script which can be used for finding hanging tests on Jenkins or GitHub Actions.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-09-11 18:22:35 -04:00 · 2024-09-11 18:22:35 -04:00 · a1f28570af
parent c62c3899aa
commit a1f28570af
5 changed files with 134 additions and 21 deletions
--- a/committer-tools/README.md
+++ b/committer-tools/README.md
@@ -1,20 +1,13 @@
-# Refresh Collaborators Script
+# Committer Tools

-The Refresh Collaborators script automates the process of fetching contributor
-data from GitHub repositories, filtering top contributors who are not part of
-the existing committers, and updating a local configuration file (.asf.yaml) to
-include these new contributors.
-
-## Table of Contents
-
-- [Requirements](#requirements)
-- [Installation](#installation)
-- [Usage](#usage)
+This directory contains scripts that help Apache Kafka committers with a few chores.
+Some of the scripts require a GitHub API token with write permissions, so only
+committers can use them.

 ## Requirements

 - Python 3.x and pip
-- A valid GitHub token with repository read access
+- The GitHub CLI

 ## Installation

@@ -23,14 +16,14 @@ include these new contributors.
 Check if Python and pip are installed in your system.

 ```bash
-python3 --version
-pip3 --version
+python --version
+pip --version
 ```

 ### 2. Set up a virtual environment (optional)

 ```bash
-python3 -m venv venv
+python -m venv venv

 # For Linux/macOS
 source venv/bin/activate
@@ -39,15 +32,40 @@ source venv/bin/activate
 # .\venv\Scripts\activate
 ```

-3. Install the required dependencies
+### 3. Install the required dependencies

 ```bash
-pip3 install -r requirements.txt
+pip install -r requirements.txt
 ```

-## Usage
+### 4. Install the GitHub CLI

-### 1. Set up the environment variable for GitHub Token
+See: https://cli.github.com/
+
+```bash
+brew install gh
+```
+
+## Find Reviewers
+
+The reviewers.py script simplifies producing our "Reviewers:" Git trailer. It
+parses the Git log to gather a set of "Authors" and "Reviewers", then uses
+simple string prefix matching to find candidates.
+
+Usage:
+
+```bash
+python reviewers.py
+```
+
+## Refresh Collaborators
+
+The Refresh Collaborators script automates the process of fetching contributor
+data from GitHub repositories, filtering top contributors who are not part of
+the existing committers, and updating a local configuration file (.asf.yaml) to
+include these new contributors.
+
+> This script requires the Python dependencies and a GitHub auth token.

 You need to set up a valid GitHub token to access the repository. After you
 generate it (or authenticate via GitHub CLI), this can be done by setting the
@@ -63,8 +81,37 @@ export GITHUB_TOKEN="$(gh auth token)"
 # .\venv\Scripts\activate
 ```

-### 2. Run the script
+Usage:

 ```bash
-python3 refresh_collaborators.py
+python refresh_collaborators.py
+```
+
+## Approve GitHub Action Workflows
+
+This script allows a committer to approve GitHub Actions workflow runs from
+non-committers. It fetches the latest 20 workflow runs that are in the
+`action_required` state and prompts the user to approve each run.
+
+> This script requires the `gh` tool
+
+Usage:
+
+```bash
+python approve-workflows.py
+```
+
+## Find Hanging Tests
+
+This script infers hanging tests from the Gradle output. It looks for tests that
+were STARTED but have no later line reporting a result such as PASSED or FAILED.
+
+Usage:
+
+```bash
+python find-unfinished-test.py ~/Downloads/logs_28218821016/5_build\ _\ JUnit\ tests\ Java\ 11.txt
+
+Found tests that were started, but not finished:
+
+2024-09-10T20:31:26.6830206Z Gradle Test Run :streams:test > Gradle Test Executor 47 > StreamThreadTest > shouldReturnErrorIfProducerInstanceIdNotInitialized(boolean, boolean) > "shouldReturnErrorIfProducerInstanceIdNotInitialized(boolean, boolean).stateUpdaterEnabled=true, processingThreadsEnabled=true" STARTED
 ```
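The sample log line above carries a timestamp with seven fractional digits and a trailing `Z`. On Python versions before 3.11, `datetime.fromisoformat` rejects both the `Z` suffix and fractions longer than six digits, while the README only asks for "Python 3.x". The following is a minimal sketch of a more tolerant parser, assuming every timestamp is UTC and ends in `Z`; `parse_log_timestamp` is a hypothetical helper and not part of this commit.

```python
from datetime import datetime, timedelta, timezone


def parse_log_timestamp(ts: str) -> datetime:
    """Parse a timestamp such as 2024-09-10T20:31:26.6830206Z (hypothetical helper)."""
    if not ts.endswith("Z"):
        raise ValueError(f"expected a UTC timestamp ending in 'Z': {ts!r}")
    body = ts[:-1]
    if "." in body:
        seconds, fraction = body.split(".", 1)
    else:
        seconds, fraction = body, ""
    dt = datetime.strptime(seconds, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    # datetime stores microseconds only, so digits past the sixth are truncated.
    micros = int(fraction[:6].ljust(6, "0")) if fraction else 0
    return dt + timedelta(microseconds=micros)


# The first timestamp is taken from the sample log line; the second is made up.
start = parse_log_timestamp("2024-09-10T20:31:26.6830206Z")
end = parse_log_timestamp("2024-09-10T20:33:01Z")
print((end - start).total_seconds())
```

Truncating instead of rounding keeps the helper simple; the sub-microsecond difference should not matter for durations that are reported in whole seconds.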
--- a/committer-tools/approve-workflows.py
+++ b/committer-tools/approve-workflows.py
--- a/committer-tools/find-unfinished-test.py
+++ b/committer-tools/find-unfinished-test.py
@@ -0,0 +1,66 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+from datetime import datetime
+
+
+def pretty_time_duration(seconds: float) -> str:
+    time_min, time_sec = divmod(int(seconds), 60)
+    time_hour, time_min = divmod(time_min, 60)
+    time_fmt = ""
+    if time_hour > 0:
+        time_fmt += f"{time_hour}h"
+    if time_min > 0:
+        time_fmt += f"{time_min}m"
+    time_fmt += f"{time_sec}s"
+    return time_fmt
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Parse Gradle log output to find hanging tests")
+    parser.add_argument("file", type=argparse.FileType("r"), help="Text file containing Gradle stdout")
+    args = parser.parse_args()
+
+    started = dict()
+    last_test_line = None
+    for line in args.file.readlines():
+        if "Gradle Test Run" not in line:
+            continue
+        last_test_line = line
+
+        toks = line.strip().split(" > ")
+        name, status = toks[-1].rsplit(" ", 1)
+        name_toks = toks[2:-1] + [name]
+        test = " > ".join(name_toks)
+        if status == "STARTED":
+            started[test] = line
+        else:
+            started.pop(test, None)  # tolerate a result line with no recorded STARTED line
+
+    last_timestamp, _ = last_test_line.split(" ", 1)
+    last_dt = datetime.fromisoformat(last_timestamp)
+
+    if len(started) > 0:
+        print("Found tests that were started, but apparently not finished")
+
+    for started_not_finished, line in started.items():
+        print("-"*80)
+        timestamp, _ = line.split(" ", 1)
+        dt = datetime.fromisoformat(timestamp)
+        dur_s = (last_dt - dt).total_seconds()
+        print(f"Test: {started_not_finished}")
+        print(f"Duration: {pretty_time_duration(dur_s)}")
+        print(f"Raw line: {line}")
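The core of the script above is pairing each STARTED line with a later result line for the same test. The following is a minimal sketch of that pairing as a standalone function, assuming the same ` > `-separated line layout as the sample in the README. The name `find_unfinished` and the `FooTest` log lines are hypothetical and only serve as an illustration; they are not part of this commit.

```python
from typing import Dict, Iterable


def find_unfinished(lines: Iterable[str]) -> Dict[str, str]:
    """Map each test that was STARTED but never reported a result to its STARTED line.

    Hypothetical helper that mirrors the loop in find-unfinished-test.py, but
    ignores result lines for tests whose STARTED line was never seen.
    """
    started: Dict[str, str] = {}
    for line in lines:
        if "Gradle Test Run" not in line:
            continue
        toks = line.strip().split(" > ")
        # The last token is "<test name> <STATUS>"; split the status off the end.
        name, status = toks[-1].rsplit(" ", 1)
        # Drop the "Gradle Test Run ..." and "Gradle Test Executor N" tokens.
        test = " > ".join(toks[2:-1] + [name])
        if status == "STARTED":
            started[test] = line
        else:
            # Any other status counts as a result for this test.
            started.pop(test, None)
    return started


# Illustrative log lines in the same layout as the README sample.
prefix = "Gradle Test Run :streams:test > Gradle Test Executor 47 > FooTest > "
log = [
    "2024-09-10T20:31:20.0000000Z " + prefix + "testA() STARTED",
    "2024-09-10T20:31:21.0000000Z " + prefix + "testA() PASSED",
    "2024-09-10T20:31:22.0000000Z " + prefix + "testB() STARTED",
]
print(list(find_unfinished(log)))
```

Returning a dictionary instead of printing makes the pairing easy to unit-test, and an empty result also covers logs that contain no test lines at all.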
--- a/committer-tools/kafka-merge-pr.py
+++ b/committer-tools/kafka-merge-pr.py
--- a/committer-tools/reviewers.py
+++ b/committer-tools/reviewers.py
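The diff does not show the contents of reviewers.py, so the following is only a rough sketch of the idea described in the README: gather people from `Author:` and `Reviewers:` lines of `git log` output, then look up candidates by string prefix. The function names, the regular expression, and the `example.org` entries are assumptions made for illustration; the actual script may work differently.

```python
import re
from typing import Iterable, List, Set

# Matches "Name <email>" entries, as found in "Author:" lines and "Reviewers:" trailers.
PERSON = re.compile(r"([^<>,]+<[^<>]+>)")


def collect_people(log_lines: Iterable[str]) -> Set[str]:
    """Collect every 'Name <email>' found on Author: or Reviewers: lines."""
    people: Set[str] = set()
    for line in log_lines:
        stripped = line.strip()
        if stripped.startswith(("Author:", "Reviewers:")):
            _, rest = stripped.split(":", 1)
            people.update(p.strip() for p in PERSON.findall(rest))
    return people


def candidates(prefix: str, people: Iterable[str]) -> List[str]:
    """Return people with a name part or email starting with prefix (case-insensitive)."""
    needle = prefix.lower()
    return sorted(
        p for p in people
        if any(word.lower().startswith(needle) for word in p.replace("<", " ").split())
    )


# Illustrative input; only the Chia-Ping Tsai entry comes from this commit message.
sample_log = [
    "Author: Jane Doe <jane@example.org>",
    "    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, John Roe <john@example.org>",
]
people = collect_people(sample_log)
print(candidates("chia", people))
```

Matching on every word of the entry lets a committer type a first name, a surname, or the start of an email address.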