MINOR Move scripts into committer-tools (#17162)

Move reviewers.py and kafka-merge-pr.py into committer-tools. Also add a new find-unfinished-test.py
script, which can be used to find hanging tests on Jenkins or GitHub Actions.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
David Arthur 2024-09-11 18:22:35 -04:00 committed by GitHub
parent c62c3899aa
commit a1f28570af
5 changed files with 134 additions and 21 deletions


@ -1,20 +1,13 @@
# Refresh Collaborators Script
# Committer Tools
The Refresh Collaborators script automates the process of fetching contributor
data from GitHub repositories, filtering top contributors who are not part of
the existing committers, and updating a local configuration file (.asf.yaml) to
include these new contributors.
## Table of Contents
- [Requirements](#requirements)
- [Installation](#installation)
- [Usage](#usage)
This directory contains scripts to help Apache Kafka committers with a few chores.
Some of the scripts require a GitHub API token with write permissions. Only
committers will be able to utilize such scripts.
## Requirements
- Python 3.x and pip
- A valid GitHub token with repository read access
- The GitHub CLI
## Installation
@ -23,14 +16,14 @@ include these new contributors.
Check if Python and pip are installed in your system.
```bash
python3 --version
pip3 --version
python --version
pip --version
```
### 2. Set up a virtual environment (optional)
```bash
python3 -m venv venv
python -m venv venv
# For Linux/macOS
source venv/bin/activate
@ -39,15 +32,40 @@ source venv/bin/activate
# .\venv\Scripts\activate
```
3. Install the required dependencies
### 3. Install the required dependencies
```bash
pip3 install -r requirements.txt
pip install -r requirements.txt
```
## Usage
### 4. Install the GitHub CLI
### 1. Set up the environment variable for GitHub Token
See: https://cli.github.com/
```bash
brew install gh
```
## Find Reviewers
The reviewers.py script is used to simplify the process of producing our "Reviewers:"
Git trailer. It parses the Git log to gather a set of "Authors" and "Reviewers".
Some simple string prefix matching is done to find candidates.
Usage:
```bash
python reviewers.py
```
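The prefix matching described above can be sketched roughly as follows. This is a minimal illustration, not the code in reviewers.py: the helper names `collect_people` and `find_candidates`, the regular expression, and the ranking by mention count are all assumptions made for the example.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Matches "Name <email>" pairs, e.g. inside "Reviewers: A <a@x>, B <b@y>"
PERSON = re.compile(r"([^<>,]+?)\s*<([^<>\s]+)>")


def collect_people(log_lines: Iterable[str]) -> Counter:
    """Count (name, email) pairs found on "Author:" and "Reviewers:" lines."""
    people: Counter = Counter()
    for line in log_lines:
        line = line.strip()
        if line.startswith("Author:") or line.startswith("Reviewers:"):
            _, rest = line.split(":", 1)
            for name, email in PERSON.findall(rest):
                people[(name.strip(), email)] += 1
    return people


def find_candidates(people: Counter, prefix: str) -> List[Tuple[str, str]]:
    """Return people whose name or email starts with the prefix, most frequent first."""
    prefix = prefix.lower()
    matches = [
        person for person in people
        if person[0].lower().startswith(prefix) or person[1].lower().startswith(prefix)
    ]
    return sorted(matches, key=lambda person: (-people[person], person))
```

The input lines could come from something like `git log`; feeding them in as plain strings keeps the matching logic easy to test without a repository.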
## Refresh Collaborators
The Refresh Collaborators script automates the process of fetching contributor
data from GitHub repositories, filtering top contributors who are not part of
the existing committers, and updating a local configuration file (.asf.yaml) to
include these new contributors.
> This script requires the Python dependencies and a GitHub auth token.
You need to set up a valid GitHub token to access the repository. After you
generate it (or authenticate via GitHub CLI), this can be done by setting the
@ -63,8 +81,37 @@ export GITHUB_TOKEN="$(gh auth token)"
# .\venv\Scripts\activate
```
### 2. Run the script
Usage:
```bash
python3 refresh_collaborators.py
python refresh_collaborators.py
```
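The filtering step described above (top contributors who are not already committers) might look roughly like the sketch below. It is not taken from refresh_collaborators.py: the function name `top_non_committers`, the input shapes, and the default `limit` of 10 are assumptions for illustration only.

```python
from typing import Dict, Iterable, List


def top_non_committers(
    contributions: Dict[str, int],
    committers: Iterable[str],
    limit: int = 10,  # assumed cut-off, not the script's actual value
) -> List[str]:
    """Return up to `limit` GitHub logins with the most contributions, excluding committers."""
    known = {login.lower() for login in committers}
    candidates = [
        (login, count)
        for login, count in contributions.items()
        if login.lower() not in known
    ]
    # Highest contribution count first; ties broken alphabetically for stable output
    candidates.sort(key=lambda item: (-item[1], item[0].lower()))
    return [login for login, _ in candidates[:limit]]
```

Comparing logins case-insensitively avoids listing a committer twice just because the login is capitalized differently in two sources.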
## Approve GitHub Action Workflows
This script allows a committer to approve GitHub Actions workflow runs from
non-committers. It fetches the latest 20 workflow runs that are in the
`action_required` state and prompts the user to approve each run.
> This script requires the `gh` tool.
Usage:
```bash
python approve-workflows.py
```
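For orientation, the fetch-and-prompt flow could be sketched as below. This is an assumption-laden illustration rather than the contents of approve-workflows.py: it assumes the script shells out to `gh api` against the GitHub REST endpoints for listing and approving workflow runs, and the helper names (`list_runs_cmd`, `approve_cmd`, `approve_pending`) are hypothetical.

```python
import json
import subprocess
from typing import Callable, List


def list_runs_cmd(repo: str, limit: int = 20) -> List[str]:
    """Build a `gh api` call listing runs that are waiting for approval."""
    return ["gh", "api", f"repos/{repo}/actions/runs?status=action_required&per_page={limit}"]


def approve_cmd(repo: str, run_id: int) -> List[str]:
    """Build a `gh api` call approving a single workflow run."""
    return ["gh", "api", "--method", "POST", f"repos/{repo}/actions/runs/{run_id}/approve"]


def approve_pending(repo: str, ask: Callable[[str], str] = input, run=subprocess.run) -> List[int]:
    """Prompt for each pending run and approve the ones answered with 'y'."""
    listing = run(list_runs_cmd(repo), check=True, capture_output=True, text=True).stdout
    approved = []
    for wf in json.loads(listing).get("workflow_runs", []):
        answer = ask(f"Approve run {wf['id']} ({wf.get('display_title', '')})? [y/N] ")
        if answer.strip().lower() == "y":
            run(approve_cmd(repo, wf["id"]), check=True)
            approved.append(wf["id"])
    return approved
```

Passing `ask` and `run` as parameters is only a convenience for trying the flow without calling GitHub; `approve_pending("apache/kafka")` would use the real prompt and the real `gh` binary.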
## Find Hanging Tests
This script is used to infer hanging tests from the Gradle output. It looks for
tests that were STARTED but have no later line with a completion status (for
example PASSED, FAILED, or SKIPPED).
Usage:
```bash
python find-unfinished-test.py ~/Downloads/logs_28218821016/5_build\ _\ JUnit\ tests\ Java\ 11.txt
Found tests that were started, but not finished:
2024-09-10T20:31:26.6830206Z Gradle Test Run :streams:test > Gradle Test Executor 47 > StreamThreadTest > shouldReturnErrorIfProducerInstanceIdNotInitialized(boolean, boolean) > "shouldReturnErrorIfProducerInstanceIdNotInitialized(boolean, boolean).stateUpdaterEnabled=true, processingThreadsEnabled=true" STARTED
```


@ -0,0 +1,66 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import sys
from datetime import datetime


def pretty_time_duration(seconds: float) -> str:
    """Format a duration in seconds as, for example, "1h2m3s"."""
    time_min, time_sec = divmod(int(seconds), 60)
    time_hour, time_min = divmod(time_min, 60)
    time_fmt = ""
    if time_hour > 0:
        time_fmt += f"{time_hour}h"
    if time_min > 0:
        time_fmt += f"{time_min}m"
    time_fmt += f"{time_sec}s"
    return time_fmt


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Parse Gradle log output to find hanging tests")
    parser.add_argument("file", type=argparse.FileType("r"), help="Text file containing Gradle stdout")
    args = parser.parse_args()

    started = dict()  # test name -> log line on which the test STARTED
    last_test_line = None
    for line in args.file.readlines():
        if "Gradle Test Run" not in line:
            continue
        last_test_line = line

        # Drop the task and executor prefixes; the trailing word is the status
        toks = line.strip().split(" > ")
        name, status = toks[-1].rsplit(" ", 1)
        name_toks = toks[2:-1] + [name]
        test = " > ".join(name_toks)
        if status == "STARTED":
            started[test] = line
        else:
            # Any other status ends the test; tolerate a test with no recorded STARTED line
            started.pop(test, None)

    if last_test_line is None:
        sys.exit("No Gradle test output found in the given file")

    # The last test line approximates the time at which the build stopped
    last_timestamp, _ = last_test_line.split(" ", 1)
    last_dt = datetime.fromisoformat(last_timestamp)

    if len(started) > 0:
        print("Found tests that were started, but apparently not finished")

    for started_not_finished, line in started.items():
        print("-"*80)
        timestamp, _ = line.split(" ", 1)
        dt = datetime.fromisoformat(timestamp)
        dur_s = (last_dt - dt).total_seconds()
        print(f"Test: {started_not_finished}")
        print(f"Duration: {pretty_time_duration(dur_s)}")
        print(f"Raw line: {line}")
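A note on the timestamp parsing: the sample log line in the README starts with `2024-09-10T20:31:26.6830206Z`, which has seven fractional digits and a trailing `Z`. To my understanding, `datetime.fromisoformat` only accepts that form from Python 3.11 onwards. A hedged sketch of a parser that also works on older interpreters is shown below; `parse_log_timestamp` is a hypothetical helper, not part of this commit, and it assumes every timestamp is in UTC with the `Z` suffix.

```python
import re
from datetime import datetime, timezone

_TS = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")


def parse_log_timestamp(ts: str) -> datetime:
    """Parse a UTC timestamp such as 2024-09-10T20:31:26.6830206Z."""
    match = _TS.fullmatch(ts)
    if match is None:
        raise ValueError(f"Unrecognized timestamp: {ts!r}")
    base, frac = match.groups()
    # Keep at most microsecond precision; extra digits are truncated
    micros = int((frac or "0")[:6].ljust(6, "0"))
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)
```

Swapping this in for the two `datetime.fromisoformat` calls would leave the duration arithmetic unchanged, since both values would still be timezone-aware datetimes.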