diff --git a/.github/workflows/summarizer.yaml b/.github/workflows/summarizer.yaml index b0abd28..1f592b7 100644 --- a/.github/workflows/summarizer.yaml +++ b/.github/workflows/summarizer.yaml @@ -3,7 +3,10 @@ name: Summarizer on: workflow_dispatch: schedule: - - cron: "0 20 * * *" # UTCで指定。日本時間で毎朝5時に実行 + - cron: "0 20 * * *" + # To run the script every day at 5:00 a.m. in Japan time, you would need to schedule the job to run at 8:00 p.m. UTC the previous day (because Japan is 9 hours ahead of UTC). This can be done in the workflow file by setting the `cron` parameter in the `schedule` event. + + # This will run the `summarizer` job at 8:00 p.m. UTC daily, which is equivalent to 5:00 a.m. in Japan. jobs: summarize: diff --git a/.gitignore b/.gitignore index fb2f313..fe5a98f 100644 --- a/.gitignore +++ b/.gitignore @@ -1,5 +1,6 @@ # common .DS_Store +/dev.sh # Byte-compiled / optimized / DLL files __pycache__/ @@ -17,7 +18,6 @@ dist/ downloads/ eggs/ .eggs/ -lib/ lib64/ parts/ sdist/ diff --git a/.vscode/settings.json b/.vscode/settings.json new file mode 100644 index 0000000..414b86d --- /dev/null +++ b/.vscode/settings.json @@ -0,0 +1,7 @@ +{ + "python.linting.enabled": true, + "python.linting.pylintPath": "pylint", + "editor.formatOnSave": true, + "python.formatting.provider": "yapf", // or "black" here + "python.linting.pylintEnabled": true, +} \ No newline at end of file diff --git a/README.ja.md b/README.ja.md new file mode 100644 index 0000000..c028a53 --- /dev/null +++ b/README.ja.md @@ -0,0 +1,90 @@ +# ChatGPT を使って Slack の Public channel をまとめて要約するスクリプト + +[In English](./README.md) + +by [masuidrive](https://twitter.com/masuidrive) @ [Bloom&Co., Inc.](https://www.bloom-and-co.com/) 2023- +[APACHE LICENSE, 2.0](https://www.apache.org/licenses/LICENSE-2.0) + +![](./images/slack-summarized.ja.png) + +OpenAI の ChatGPT API を使って、Slack の Public channel の要約を作って投稿するスクリプトです。 + +チャンネルが増えた組織では読むのが追いつかないことが多いため、要約を作って投稿することで、チャンネルの活動を把握しやすくすることができます。 + +このコードの大半も ChatGPT
を使って書きました。もっといいプロンプトや機能拡張があったら、Pull Request を送ってください + +簡単な解説などはこちらの記事に書いています。 + +https://note.com/masuidrive/n/na0ebf8a4c4f0 + +OpenAI の情報取扱に関する規約は下記などを自分で確認してください + +https://platform.openai.com/docs/data-usage-policies + +## GitHub Actions で動かす + +GitHub Actions で毎日午前 5 時に動くようになっています。これ以外の環境で動かす場合は適当に頑張ってください。 + +### 自分の GitHub アカウントに fork する + +- 右上の"Fork"ボタンを押して、自分のリポジトリに fork します +- 有料プランにするなどして GitHub Actions が実行できるようにしておきます + +### 環境変数を設定する + +- "Settings"タブを開き、左の"Secrets and variables"→"Actions"を開きます +- 右上の緑の"New Repository Secret"をクリックすると環境変数が設定できるので、次の 3 つの変数を設定します + +![](https://raw.githubusercontent.com/masuidrive/slack-summarizer/main/images/github-settings.png) + +#### OPEN_AI_TOKEN + +- OpenAI の認証トークン +- [OpenAI の Web サイト](https://platform.openai.com/)にアクセスしてください +- 右上の"Sign In"ボタンをクリックし、アカウントにログインしてください +- ページ上部の"API"メニューから、"API Key"をクリックして、API キーを生成します +- "API Key"ページにアクセスすると、API キーが表示されます。これをコピーして Value に貼り付けます + +#### SLACK_BOT_TOKEN + +- Slack の API 認証トークン +- [Slack API の Web サイト](https://api.slack.com/)にアクセスし、ログインしてください +- "Create a new app"をクリックして、"From an app manifest"を選択し manifest に下記の内容をコピーします + +``` +{"display_information":{"name":"Summary","description":"Public channelのサマリーを作る","background_color":"#d45f00"},"features":{"bot_user":{"display_name":"Summary","always_online":false}},"oauth_config":{"scopes":{"bot":["channels:history","channels:join","channels:read","chat:write","users:read"]}},"settings":{"org_deploy_enabled":true,"socket_mode_enabled":false,"token_rotation_enabled":false}} +``` + +- 画面左の"Install App"をクリックし、右に出る"Install App to Workspace"をクリックして、アプリをワークスペースにインストールします。インストールが完了すると、bot の OAuth アクセストークンが表示されます +- この`xoxb-`で始まるトークンをコピーして Value に貼り付けます + +#### SLACK_POST_CHANNEL_ID + +- 要約結果を投稿する Slack の channel_id +- Slack で要約結果を投稿したいチャンネルを開きます +- 上部のチャンネル名をクリックし、出てきた Popup の最下部にある Channel ID を Value に貼り付けます + +#### LANGUAGE + +- 要約を作る言語を指定します +- "ja"や"Japanese", "en" "English"などなんでも指定できます + +#### TIMEZONE + +- 
主に読まれる地域のタイムゾーンを指定します。 +- "Asia/Tokyo", "America/New_York"など"TZ database name"形式で指定します +- https://en.wikipedia.org/wiki/List_of_tz_database_time_zones + +### Channel に bot をインストール + +- 画面上部の検索窓から"Summary"を検索し、"Summary [APP]"をクリックします。 +- 上の"Summary"をクリックし、"Add this app to a channel"をクリックして、要約結果を投稿したいチャンネルを指定します + +### 実行 + +- GitHub のリポジトリで"Settings"タブを開き、左の"Actions"→"General"を開きます +- "Actions permissions"の"Allow all actions and reusable workflows"を選択して保存してください + +これらの設定をすると、毎日午前 5 時に Slack の Public channel の要約結果が投稿されます。 + +手動で実行してみる場合には"Actions" タブを開き、左の"Summary"をクリックして、右の"Run workflow"をおしてください。 diff --git a/README.md b/README.md index 632562e..8505cc7 100644 --- a/README.md +++ b/README.md @@ -1,79 +1,94 @@ -# ChatGPT を使って Slack の Public channel をまとめて要約するスクリプト +# Script for summarizing Slack public channels using ChatGPT + +[日本語はこちら](./README.ja.md) by [masuidrive](https://twitter.com/masuidrive) @ [Bloom&Co., Inc.](https://www.bloom-and-co.com/) 2023- [APACHE LICENSE, 2.0](https://www.apache.org/licenses/LICENSE-2.0) -![](https://raw.githubusercontent.com/masuidrive/slack-summarizer/main/images/slack-summarized.png) +![](./images/slack-summarized.en.png) + +This script uses OpenAI's ChatGPT API to create and post a summary of a Slack public channel. + +In organizations where the number of channels is increasing, it can be difficult to keep up with reading all the activity. By creating and posting summaries, it is easier to keep track of channel activity. -Slack の Public channel の要約を作って投稿するスクリプトです。 +Most of this code was written using ChatGPT. If you have any better prompts or feature enhancements, please submit a Pull Request. 
-チャンネルが増えた組織では読むのが追いつかないことが多いため、要約を作って投稿することで、チャンネルの活動を把握しやすくすることができます。 +Please check OpenAI's terms and conditions of information handling for yourself, including the following -このコードの大半も ChatGPT を使って書きました。とりあえず動くように書いただけなので、コードは雑然としています。 +https://platform.openai.com/docs/data-usage-policies -誰かキレイにしたら Pull Request ください。機能追加なども大歓迎です。 +If you have any questions, please feel free to contact me on http://twitter.com/masuidrive_en or http://twitter.com/masuidrive. -簡単な解説などはこちらの記事に書いています。 +## How to set it up on GitHub Actions -https://note.com/masuidrive/n/na0ebf8a4c4f0 +It runs on GitHub Actions every day at 5:00 a.m., so if you want to run it in a different environment, you'll have to figure it out. -## How to set it up +### Fork it to your own GitHub account -GitHub Actions で毎日午前 5 時に動くようになっています。これ以外の環境で動かす場合は適当に頑張ってください。 +- Click the "Fork" button in the upper right to fork it to your own repository. +- Make the GitHub Actions executable by upgrading to a paid plan or some other means. -### 自分の GitHub アカウントに fork する +### Edit running time -- 右上の"Fork"ボタンを押して、自分のリポジトリに fork します -- 有料プランにするなどして GitHub Actions が実行できるようにしておきます +- GitHub Actions uses the cron syntax to schedule jobs, which is specified in the `.github/workflows/summarizer.yaml` file with the `minute hour * * *` format. +- Since this is in UTC, you need to adjust for your own time zone. +- For example, to run the script every day at 5:00 AM in Japan, you would specify `0 20 * * *` to run it at 8:00 PM UTC the day before. -### 環境変数を設定する +### Set environment variables -- "Settings"タブを開き、左の"Secrets and variables"→"Actions"を開きます -- 右上の緑の"New Repository Secret"をクリックすると環境変数が設定できるので、次の 3 つの変数を設定します +- Open the "Settings" tab and click "Secrets and variables"->"Actions". +- Click the green "New Repository Secret" button to set environment variables for the following three variables. 
![](https://raw.githubusercontent.com/masuidrive/slack-summarizer/main/images/github-settings.png) #### OPEN_AI_TOKEN -- OpenAI の認証トークン -- [OpenAI の Web サイト](https://openai.com/)にアクセスしてください -- 右上の"Sign In"ボタンをクリックし、アカウントにログインしてください -- ページ上部の"API"メニューから、"API Key"をクリックして、API キーを生成します -- "API Key"ページにアクセスすると、API キーが表示されます。これをコピーして Value に貼り付けます +- OpenAI's authentication token +- Access [OpenAI's website](https://platform.openai.com/). +- Click the "Sign In" button on the upper right and log in to your account. +- Click "API Key" from the "API" menu at the top of the page to generate an API key. +- When you access the "API Key" page, the API key will be displayed. Copy it and paste it into Value. #### SLACK_BOT_TOKEN -- Slack の API 認証トークン -- [Slack API の Web サイト](https://api.slack.com/)にアクセスし、ログインしてください -- "Create a new app"をクリックして、"From an app manifest"を選択し manifest に下記の内容をコピーします +- Slack's API authentication token +- Access the [Slack API website](https://api.slack.com/) and log in. +- Click "Create a new app" and select "From an app manifest", and copy the following contents to the manifest. ``` {"display_information":{"name":"Summary","description":"Public channelのサマリーを作る","background_color":"#d45f00"},"features":{"bot_user":{"display_name":"Summary","always_online":false}},"oauth_config":{"scopes":{"bot":["channels:history","channels:join","channels:read","chat:write","users:read"]}},"settings":{"org_deploy_enabled":true,"socket_mode_enabled":false,"token_rotation_enabled":false}} ``` -- 画面左の"Install App"をクリックし、右に出る"Install App to Workspace"をクリックして、アプリをワークスペースにインストールします。インストールが完了すると、bot の OAuth アクセストークンが表示されます -- この`xoxb-`で始まるトークンをコピーして Value に貼り付けます +- Click "Install App" on the left side of the screen, then click "Install App to Workspace" that appears on the right side to install the app in your workspace. Once the installation is complete, the bot's OAuth access token will be displayed. +- Copy this token that begins with `xoxb-` and paste it into Value. 
#### SLACK_POST_CHANNEL_ID -- 要約結果を投稿する Slack の channel_id -- Slack で要約結果を投稿したいチャンネルを開きます -- 上部のチャンネル名をクリックし、出てきた Popup の最下部にある Channel ID を Value に貼り付けます +- The channel_id in Slack where you want to post the summary result +- Open the Slack channel where you want to post the summary results. +- Click the channel name at the top and paste the Channel ID, which appears at the bottom of the popup window. + +#### LANGUAGE + +- Specifies the language used for summarization. +- Any value can be specified, such as "ja" or "Japanese" for Japanese, or "en" or "English" for English. -### Channel に bot をインストール +#### TIMEZONE -- 画面上部の検索窓から"Summarizer"を検索し、"Summarizer [APP]"をクリックします。 -- 上の"Summarizer"をクリックし、"Add this app to a channel"をクリックして、要約結果を投稿したいチャンネルを指定します +- Specifies the timezone for the primarily read region. +- Specify in the "TZ database name" format, such as "Asia/Tokyo" or "America/New_York". +- See https://en.wikipedia.org/wiki/List_of_tz_database_time_zones -### 実行 +### Install the bot in the channel -- GitHub のリポジトリで"Settings"タブを開き、左の"Actions"→"General"を開きます -- "Actions permissions"の"Allow all actions and reusable workflows"を選択して保存してください +- Search for "Summary" in the search window at the top of the screen and click "Summary [APP]". +- Click "Summary" and click "Add this app to a channel" to specify the channel where you want to post the summary results. -これらの設定をすると、毎日午前 5 時に Slack の Public channel の要約結果が投稿されます。 +### Run -手動で実行してみる場合には"Actions" タブを開き、左の"Summarizer"をクリックして、右の"Run workflow"をおしてください。 +- Open the "Settings" tab in the GitHub repository, then click "Actions"->"General" on the left side. +- Select "Allow all actions and reusable workflows" in "Actions permissions" and save it. -## Problems +With these settings, a summary of Slack's public channels will be posted every day at 5:00 a.m. 
-このスクリプトで既知の課題としては、1 チャンネル当たりの発言が 4000token を超えるとコケます。分割する部分は書いてないので。Pull Request をお待ちしてます → 誰か +To run it manually, open the "Actions" tab, click "Summary" on the left, and click "Run workflow" on the right. diff --git a/images/slack-summarized.en.png b/images/slack-summarized.en.png new file mode 100644 index 0000000..385a706 Binary files /dev/null and b/images/slack-summarized.en.png differ diff --git a/images/slack-summarized.png b/images/slack-summarized.ja.png similarity index 100% rename from images/slack-summarized.png rename to images/slack-summarized.ja.png diff --git a/lib/slack.py b/lib/slack.py new file mode 100644 index 0000000..b5c5618 --- /dev/null +++ b/lib/slack.py @@ -0,0 +1,254 @@ +import re +import sys +import time +from datetime import datetime +from slack_sdk.errors import SlackApiError +from slack_sdk import WebClient +from lib.utils import retry, sort_by_numeric_prefix + + +class SlackClient: + """ A class for managing a Slack bot client. + + Args: + slack_api_token (str): The Slack Bot token used to authenticate with the Slack API. + summary_channel (str): The ID of the channel that summaries are posted to. + Example: + ``` + client = SlackClient(SLACK_BOT_TOKEN, CHANNEL_ID) + client.postSummary(text) + ``` + """ + + def __init__(self, slack_api_token: str, summary_channel: str): + self.client = WebClient(token=slack_api_token) + self.users = self._get_users_info() + self.channels = self._get_channels_info() + self._summary_channel = summary_channel + + def postSummary(self, text: str): + response = self.client.chat_postMessage(channel=self._summary_channel, + text=text) + if not response["ok"]: + print(f'Failed to post message: {response["error"]}') + raise SlackApiError('Failed to post message', response["error"]) + + def load_messages(self, channel_id: str, start_time: datetime, + end_time: datetime) -> list: + """ Load the chat history for the specified channel between the given start and end times. + + Args: + channel_id (str): The ID of the channel to retrieve the chat history for.
+ start_time (datetime): The start time of the time range to retrieve chat history for. + end_time (datetime): The end time of the time range to retrieve chat history for. + + Returns: + list: A list of chat messages from the specified channel, in the format "Speaker: Message", + or None if the channel has no human messages in that range. + + Examples: + >>> start_time = datetime(2022, 5, 1, 0, 0, 0) + >>> end_time = datetime(2022, 5, 2, 0, 0, 0) + >>> messages = client.load_messages('C12345678', start_time, end_time) + >>> print(messages[0]) + Alice: Hi, Bob! How's it going? + """ + + messages_info = [] + try: + self._wait_api_call() + result = retry(lambda: self.client.conversations_history( + channel=channel_id, + oldest=start_time.timestamp(), + latest=end_time.timestamp(), + limit=1000), + exception=SlackApiError) + messages_info.extend(result["messages"]) + except SlackApiError as error: + if error.response['error'] == 'not_in_channel': + self._wait_api_call() + response = retry( + lambda: self.client.conversations_join(channel=channel_id), + exception=SlackApiError) + if not response["ok"]: + print("Failed conversations_join()") + sys.exit(1) + time.sleep(5) + result = retry(lambda: self.client.conversations_history( + channel=channel_id, + oldest=start_time.timestamp(), + latest=end_time.timestamp(), + limit=1000), + exception=SlackApiError) + messages_info.extend(result["messages"]) + else: + print(f"Error : {error}") + return None + + while result["has_more"]: + self._wait_api_call() + result = retry(lambda: self.client.conversations_history( + channel=channel_id, + oldest=start_time.timestamp(), + latest=end_time.timestamp(), + limit=1000, + cursor=result["response_metadata"]["next_cursor"]), + exception=SlackApiError) + messages_info.extend(result["messages"]) + + # Filter for human messages only + messages = list(filter(lambda m: "subtype" not in m, messages_info)) + + if len(messages) < 1: + return None + + messages_text = [] + for message in
messages[::-1]: + # Ignore bot messages and empty messages + if "bot_id" in message or len(message["text"].strip()) == 0: + continue + + # Get speaker name + speaker_name = self.get_user_name(message["user"]) or "somebody" + + # Get message body fro result dict. + body_text = message["text"].replace("\n", "\\n") + + # Replace User IDs in a chat message text with user names. + body_text = self.replace_user_id_with_name(body_text) + + # all channel id replace to "other channel" + body_text = re.sub(r"<#[A-Z0-9]+>", " other channel ", body_text) + + messages_text.append(f"{speaker_name}: {body_text}") + + if len(messages_text) == 0: + return None + else: + return messages_text + + def get_user_name(self, user_id: str) -> str: + """ Get the name of a user with the given ID. + + Args: + user_id (str): The ID of the user to look up. + + Returns: + str: The name of the user with the given ID, or None if no such user exists. + + Examples: + >>> users = [{'id': 'U1234', 'name': 'Alice'}, {'id': 'U5678', 'name': 'Bob'}] + >>> get_user_name('U1234', users) + 'Alice' + >>> get_user_name('U9999', users) + None + """ + matching_users = [user for user in self.users if user['id'] == user_id] + return matching_users[0]['name'] if len(matching_users) > 0 else None + + def replace_user_id_with_name(self, body_text: str) -> str: + """ Replace user IDs in a chat message text with user names. + + Args: + body_text (str): The text of a chat message. + users (list): A list of user information dictionaries. + Each dictionary must have 'id' and 'name' keys. + + Returns: + str: The text of the chat message with user IDs replaced with user names. + + Examples: + >>> users = [{'id': 'U1234', 'name': 'Alice'}, {'id': 'U5678', 'name': 'Bob'}] + >>> body_text = "Hi <@U1234>, how are you?" + >>> replace_user_id_with_name(body_text, users) + "Hi @Alice, how are you?" 
+ """ + pattern = r"<@([A-Z0-9]+)>" + for match in re.finditer(pattern, body_text): + user_id = match.group(1) + user_name = next( + (user['name'] for user in self.users if user['id'] == user_id), + user_id) + body_text = body_text.replace(match.group(0), user_name) + return body_text + + def _get_users_info(self) -> list: + """ Retrieve information about all users in the Slack workspace. + + Returns: + list: A list of dictionaries containing information about each user, + including their ID, name, and other metadata. + + Raises: + SlackApiError: If an error occurs while attempting to retrieve the user information. + + Examples: + >>> users = get_users_info() + >>> print(users[0]) + { + 'id': 'U12345678', + 'name': 'alice', + 'real_name': 'Alice Smith', + 'email': 'alice@example.com', + ... + } + """ + try: + users = [] + next_cursor = None + while True: + self._wait_api_call() + users_info = retry(lambda: self.client.users_list( + cursor=next_cursor, limit=100), + exception=SlackApiError) + time.sleep(3) + users.extend(users_info['members']) + if users_info["response_metadata"]["next_cursor"]: + next_cursor = users_info["response_metadata"][ + "next_cursor"] + else: + break + return users + except SlackApiError as error: + print(f"Error : {error}") + sys.exit(1) + + def _get_channels_info(self) -> list: + """ Retrieve information about all public channels in the Slack workspace. + + Returns: + list: A list of dictionaries containing information about each public channel, including its ID, name, and other metadata. sorted by channel name. + + Raises: + SlackApiError: If an error occurs while attempting to retrieve the channel information. + + Examples: + >>> channels = get_channels_info() + >>> print(channels[0]) + { + 'id': 'C12345678', + 'name': 'general', + 'is_channel': True, + 'is_archived': False, + ... 
+ } + """ + try: + self._wait_api_call() + result = retry(lambda: self.client.conversations_list( + types="public_channel", exclude_archived=True, limit=1000), + exception=SlackApiError) + channels_info = [ + channel for channel in result['channels'] + if not channel["is_archived"] and channel["is_channel"] + ] + channels_info = sort_by_numeric_prefix(channels_info, + get_key=lambda x: x["name"]) + return channels_info + except SlackApiError as error: + print(f"Error : {error}") + sys.exit(1) + + def _wait_api_call(self): + """ most of api call limit is 20 per minute """ + time.sleep(60 / 20) diff --git a/lib/utils.py b/lib/utils.py new file mode 100644 index 0000000..9f0e062 --- /dev/null +++ b/lib/utils.py @@ -0,0 +1,85 @@ +""" Utility functions for the project. """ + +import re +import time +import emoji + + +def retry(func, max_retries=5, sleep_time=10, exception=Exception): + """ A decorator function that retries the function call if it fails. + + Args: + func (callable): The function to be wrapped. + max_retries (int, optional): The maximum number of retries. Defaults to 5. + sleep_time (int, optional): The sleep time in seconds between retries. Defaults to 10. + + Returns: + result of func call. + + Examples: + result = self.retry(lambda: slack_client.conversations_list(types="public_channel", exclude_archived=True, limit=1000)) + """ + + for i in range(max_retries): + try: + result = func() + return result + except exception as error: + if i == max_retries - 1: + raise error + time.sleep(sleep_time) + return None + + +def sort_by_numeric_prefix(lst, get_key=lambda x: x): + """ + Sorts the list based on whether the element has a numeric prefix. + If an element has a numeric prefix, it is sorted in ascending order based on the numeric value. + If an element does not have a numeric prefix, it is sorted in ascending order based on the alphabetical order of the string. 
+ + Args: + lst: A list of strings + get_key: A function that takes a string and returns a key to sort on. + Default is identity function that returns the string itself. + + Returns: + A sorted list of strings + + Example: + >>> lst = [{"name":"a"}, {"name":"1abc"}, {"name":"Z"}, {"name":"う"}, {"name":"あ"}, {"name":"14:A"}] + >>> sort_by_numeric_prefix(lst, get_key=lambda x: x["name"]) + [{'name': '1abc'}, {'name': '14:A'}, {'name': 'Z'}, {'name': 'a'}, {'name': 'あ'}, {'name': 'う'}] + + """ + digits_list = [s for s in lst if re.match(r'^(\d+)', get_key(s))] + string_list = [s for s in lst if re.match(r'^\D', get_key(s))] + + def numkey(n: str): + match = re.match(r'^(\d+)', get_key(n)) + return int(match.group(1)) + + return sorted(digits_list, key=numkey) + sorted(string_list, key=get_key) + + +def remove_emoji(text: str) -> str: + """ + Remove emojis from the given text. + + Args: + text (str): A string containing the text to remove custom emojis from. + + Returns: + str: The input text with custom emojis removed. + + Example: + >>> text = "Hello, world! :smile: :wave:" + >>> remove_emoji(text) + 'Hello, world!
' + """ + # Remove Unicode emojis + text = emoji.replace_emoji(text, replace='') + + # Remove Slack custom emojis + custom_pattern = r":[-_a-zA-Z0-9]+?:" + text = re.sub(custom_pattern, "", text) + return text \ No newline at end of file diff --git a/requirements.txt b/requirements.txt index 2f4a5e7..a33b62a 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,3 +1,4 @@ pytz==2021.3 slack_sdk==3.20.0 -openai==0.27.0 \ No newline at end of file +openai==0.27.0 +emoji==2.2.0 \ No newline at end of file diff --git a/summarizer.py b/summarizer.py index 3c12dc4..43499d3 100644 --- a/summarizer.py +++ b/summarizer.py @@ -1,180 +1,207 @@ #!/usr/bin/env python3 -# https://github.com/masuidrive/slack-summarizer -# by [masuidrive](https://twitter.com/masuidrive) @ [Bloom&Co., Inc.](https://www.bloom-and-co.com/) 2023- [APACHE LICENSE, 2.0](https://www.apache.org/licenses/LICENSE-2.0) +""" +https://github.com/masuidrive/slack-summarizer + by [masuidrive](https://twitter.com/masuidrive) @ [Bloom&Co., Inc.](https://www.bloom-and-co.com/) + 2023- [APACHE LICENSE, 2.0](https://www.apache.org/licenses/LICENSE-2.0) +""" import os import re -import time +import sys +from datetime import datetime, timedelta import pytz +import openai from slack_sdk.errors import SlackApiError -from slack_sdk import WebClient -from datetime import datetime, timedelta +from lib.slack import SlackClient +from lib.utils import remove_emoji, retry -import openai -openai.api_key = str(os.environ.get('OPEN_AI_TOKEN')).strip() -# OpenAIのAPIを使って要約を行う +def summarize(text: str, language: str = "Japanese"): + """ + Summarize a chat log in bullet points, in the specified language. + + Args: + text (str): The chat log to summarize, in the format "Speaker: Message" separated by line breaks. + language (str, optional): The language to use for the summary. Defaults to "Japanese". + Returns: + str: The summarized chat log in bullet point format. 
-def summarize(text): + Examples: + >>> summarize("Alice: Hi\nBob: Hello\nAlice: How are you?\nBob: I'm doing well, thanks.") + '- Alice greeted Bob.\n- Bob responded with a greeting.\n- Alice asked how Bob was doing.\n- Bob replied that he was doing well.' + """ response = openai.ChatCompletion.create( - model="gpt-3.5-turbo", - temperature=0.5, - messages=[ - {"role": "system", "content": "チャットログのフォーマットは発言者: 本文\\nになっている。\\nは改行を表しています。これを踏まえて指示に従います"}, - {"role": "user", "content": f"下記のチャットログを箇条書きで要約してください。。1行ずつの説明ではありません。全体として短く。\n\n{text}"} - ] - ) + model=CHAT_MODEL, + temperature=TEMPERATURE, + messages=[{ + "role": + "system", + "content": + "\n".join([ + 'The chat log format consists of one line per message in the format "Speaker: Message".', + "The `\\n` within the message represents a line break.", + f'The user understands {language} only.', + f'So, The assistant need to speak in {language}.', + ]) + }, { + "role": + "user", + "content": + "\n".join([ + f"Please meaning summarize the following chat log to flat bullet list in {language}.", + "It isn't line by line summary.", + "Do not include greeting/salutation/polite expressions in summary.", + "With make it easier to read.", + f"Write in {language}.", "", text + ]) + }]) + + if DEBUG: + print(response["choices"][0]["message"]['content']) return response["choices"][0]["message"]['content'] -# APIトークンとチャンネルIDを設定する -TOKEN = str(os.environ.get('SLACK_BOT_TOKEN')).strip() -CHANNEL_ID = str(os.environ.get('SLACK_POST_CHANNEL_ID')).strip() +def get_time_range(): + """ + Get a time range starting from 25 hours ago and ending at the current time. + + Returns: + tuple: A tuple containing the start and end times of the time range, as datetime objects.
+ + Examples: + >>> start_time, end_time = get_time_range() + >>> print(start_time, end_time) + 2022-05-17 09:00:00+09:00 2022-05-18 10:00:00+09:00 + """ + hours_back = 25 + timezone = pytz.timezone(TIMEZONE_STR) + now = datetime.now(timezone) + yesterday = now - timedelta(hours=hours_back) + start_time = datetime(yesterday.year, yesterday.month, yesterday.day, + yesterday.hour, yesterday.minute, yesterday.second) + end_time = datetime(now.year, now.month, now.day, now.hour, now.minute, + now.second) + return start_time, end_time + + +def estimate_openai_chat_token_count(text: str) -> int: + """ + Estimate the number of OpenAI API tokens that would be consumed by sending the given text to the chat API. + + Args: + text (str): The text to be sent to the OpenAI chat API. + + Returns: + int: The estimated number of tokens that would be consumed by sending the given text to the OpenAI chat API. + + Examples: + >>> estimate_openai_chat_token_count("Hello, how are you?") + 7 + """ + # Split the text into words and count the number of characters of each type + pattern = re.compile( + r"""( + \d+ | # digits + [a-z]+ | # alphabets + \s+ | # whitespace + . # other characters + )""", re.VERBOSE | re.IGNORECASE) + matches = re.findall(pattern, text) + + # based on https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them + def counter(tok): + if tok == ' ' or tok == '\n': + return 0 + elif tok.isdigit() or tok.isalpha(): + return (len(tok) + 3) // 4 + else: + return 1 + + return sum(map(counter, matches)) + + +def split_messages_by_token_count(messages: list[str]) -> list[list[str]]: + """ + Split a list of strings into sublists with a maximum token count. + + Args: + messages (list[str]): A list of strings to be split. 
-# 取得する期間を計算する -HOURS_BACK = 25 -JST = pytz.timezone('Asia/Tokyo') -now = datetime.now(JST) -yesterday = now - timedelta(hours=HOURS_BACK) -start_time = datetime(yesterday.year, yesterday.month, yesterday.day, - yesterday.hour, yesterday.minute, yesterday.second) -end_time = datetime(now.year, now.month, now.day, - now.hour, now.minute, now.second) - -# Slack APIクライアントを初期化する -client = WebClient(token=TOKEN) - -# ユーザーIDからユーザー名に変換するために、ユーザー情報を取得する -try: - users_info = client.users_list() - users = users_info['members'] -except SlackApiError as e: - print("Error : {}".format(e)) - exit(1) - - -# チャンネルIDからチャンネル名に変換するために、チャンネル情報を取得する -try: - channels_info = client.conversations_list( - types="public_channel", - exclude_archived=True, - limit=1000 - ) - channels = [channel for channel in channels_info['channels'] - if not channel["is_archived"] and channel["is_channel"]] - channels = sorted(channels, key=lambda x: int(re.findall( - r'\d+', x["name"])[0]) if re.findall(r'\d+', x["name"]) else float('inf')) -except SlackApiError as e: - print("Error : {}".format(e)) - exit(1) - -# 指定したチャンネルの履歴を取得する - - -def load_messages(channel_id): - result = None - try: - result = client.conversations_history( - channel=channel_id, - oldest=start_time.timestamp(), - latest=end_time.timestamp() - ) - except SlackApiError as e: - if e.response['error'] == 'not_in_channel': - response = client.conversations_join( - channel=channel_id - ) - if not response["ok"]: - raise SlackApiError("conversations_join() failed") - time.sleep(5) # チャンネルにjoinした後、少し待つ - - result = client.conversations_history( - channel=channel_id, - oldest=start_time.timestamp(), - latest=end_time.timestamp() - ) + Returns: + list[list[str]]: A list of sublists, where each sublist has a token count less than or equal to max_body_tokens. 
+ """ + body_token_counts = [ + estimate_openai_chat_token_count(message) for message in messages + ] + result = [] + current_sublist = [] + current_count = 0 + + for message, count in zip(messages, body_token_counts): + if current_count + count <= MAX_BODY_TOKENS: + current_sublist.append(message) + current_count += count else: - print("Error : {}".format(e)) - return None - # conversations_history api limit is 20 per minute - time.sleep(3) - - messages = list(filter(lambda m: "subtype" not in m, result["messages"])) - - if len(messages) < 1: - return None - - messages_text = [] - - while result["has_more"]: - time.sleep(3) # this api limit is 20 per minute - result = client.conversations_history( - channel=channel_id, - oldest=start_time.timestamp(), - latest=end_time.timestamp(), - cursor=result["response_metadata"]["next_cursor"] - ) - messages.extend(result["messages"]) - for message in messages[::-1]: - if "bot_id" in message: - continue - if message["text"].strip() == '': + result.append(current_sublist) + current_sublist = [message] + current_count = count + + result.append(current_sublist) + return result + + +# Load settings from environment variables +OPEN_AI_TOKEN = str(os.environ.get('OPEN_AI_TOKEN')).strip() +SLACK_BOT_TOKEN = str(os.environ.get('SLACK_BOT_TOKEN')).strip() +CHANNEL_ID = str(os.environ.get('SLACK_POST_CHANNEL_ID')).strip() +LANGUAGE = str(os.environ.get('LANGUAGE') or "Japanese").strip() +TIMEZONE_STR = str(os.environ.get('TIMEZONE') or 'Asia/Tokyo').strip() +TEMPERATURE = float(os.environ.get('TEMPERATURE') or 0.3) +CHAT_MODEL = str(os.environ.get('CHAT_MODEL') or "gpt-3.5-turbo").strip() +DEBUG = str(os.environ.get('DEBUG') or "").strip() != "" +MAX_BODY_TOKENS = 3000 + +if OPEN_AI_TOKEN == "" or SLACK_BOT_TOKEN == "" or CHANNEL_ID == "": + print("OPEN_AI_TOKEN, SLACK_BOT_TOKEN, CHANNEL_ID must be set.") + sys.exit(1) + +# Set OpenAI API key +openai.api_key = OPEN_AI_TOKEN + + +def runner(): + """ + app runner + """ + slack_client = 
SlackClient(slack_api_token=SLACK_BOT_TOKEN, + summary_channel=CHANNEL_ID) + start_time, end_time = get_time_range() + + result_text = [] + for channel in slack_client.channels: + if DEBUG: + print(channel["name"]) + messages = slack_client.load_messages(channel["id"], start_time, + end_time) + if messages is None: continue - # ユーザーIDからユーザー名に変換する - user_id = message['user'] - sender_name = None - for user in users: - if user['id'] == user_id: - sender_name = user['name'] - break - if sender_name is None: - sender_name = user_id - - # テキスト取り出し - text = message["text"].replace("\n", "\\n") - - # メッセージ中に含まれるユーザーIDやチャンネルIDを名前やチャンネル名に展開する - matches = re.findall(r"<@[A-Z0-9]+>", text) - for match in matches: - user_id = match[2:-1] - user_name = None - for user in users: - if user['id'] == user_id: - user_name = user['name'] - break - if user_name is None: - user_name = user_id - text = text.replace(match, f"@{user_name} ") - - matches = re.findall(r"<#[A-Z0-9]+>", text) - for match in matches: - channel_id = match[2:-1] - channel_name = None - for channel in channels: - if channel['id'] == channel_id: - channel_name = channel['name'] - break - if channel_name is None: - channel_name = channel_id - text = text.replace(match, f"#{channel_name} ") - messages_text.append(f"{sender_name}: {text}") - if len(messages_text) == 0: - return None - else: - return messages_text + # remove emojis in messages + messages = list(map(remove_emoji, messages)) + + result_text.append(f"----\n<#{channel['id']}>\n") + for spilitted_messages in split_messages_by_token_count(messages): + text = summarize("\n".join(spilitted_messages), LANGUAGE) + result_text.append(text) -result_text = [] -for channel in channels: - messages = load_messages(channel["id"]) - if messages != None: - text = summarize(messages) - result_text.append(f"----\n<#{channel['id']}>\n{text}") + title = (f"{start_time.strftime('%Y-%m-%d')} public channels summary\n\n") + + if DEBUG: + print("\n".join(result_text)) + else: + 
retry(lambda: slack_client.postSummary(title + "\n".join(result_text)), + exception=SlackApiError) + +if __name__ == '__main__': + runner()
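
Review note on `retry()`: the helper in `lib/utils.py` invokes its first argument on every attempt, so callers need to pass a callable (for example a `lambda`) rather than the result of a call. The following is a minimal, self-contained sketch of that behaviour, not the project's actual posting code. `FlakyPoster` is a hypothetical stand-in for `SlackClient.postSummary`, and `sleep_time=0` is an assumption made only to keep the demo fast.

```python
import time


def retry(func, max_retries=5, sleep_time=10, exception=Exception):
    """Same shape as retry() in lib/utils.py: call func() until it succeeds."""
    for i in range(max_retries):
        try:
            return func()
        except exception:
            # Re-raise on the last attempt, otherwise wait and try again.
            if i == max_retries - 1:
                raise
            time.sleep(sleep_time)
    return None


class FlakyPoster:
    """Hypothetical stand-in for SlackClient.postSummary: fails twice, then succeeds."""

    def __init__(self):
        self.calls = 0

    def post(self, text):
        self.calls += 1
        if self.calls < 3:
            raise RuntimeError("temporary failure")
        return f"posted: {text}"


poster = FlakyPoster()
# Pass a callable so retry() decides when, and how often, post() runs.
result = retry(lambda: poster.post("summary"),
               sleep_time=0,
               exception=RuntimeError)
print(result, poster.calls)
```

Writing `retry(poster.post("summary"))` instead would run `post` once before `retry` is entered, so the first failure would propagate without any retry.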