feat: add support for aggregates and toxicity classification (georgia-tech-db#551)

Merging to resolve the emotion analysis model issue. We still need to take care of the other issues highlighted by @gaurav274.
jarulraj authored Jan 7, 2023
1 parent f88db49 commit 39183a4
Showing 29 changed files with 747 additions and 833 deletions.
7 changes: 3 additions & 4 deletions .circleci/config.yml
@@ -6,6 +6,7 @@ orbs:
workflows:
main:
jobs:
- Windows
- test:
name: "Linux - Python v3.7"
v: "3.7"
@@ -18,9 +19,8 @@ workflows:
- test:
name: "Linux - Python v3.10"
v: "3.10"
- Windows
#- test:
# name: "Python v3.11" # missing Torchvision
# name: "Linux - Python v3.11" # missing Torchvision
# v: "3.11"

jobs:
@@ -46,8 +46,7 @@ jobs:
command: |
pip install --upgrade pip
pip install evadb
# bash script/test/package.sh
bash script/test/package.sh
- run:
name: Install EVA package from GitHub repo with all dependencies
37 changes: 26 additions & 11 deletions README.md
@@ -23,16 +23,16 @@

EVA is a **database system tailored for video analytics** -- think PostgreSQL for videos. It supports a SQL-like language for querying videos like:

* examining the "emotion palette" of different actors
* finding gameplays that lead to a touchdown in a football game
* examining the movement of vehicles in a traffic video
* finding touchdowns in a football game

EVA comes with a wide range of commonly used computer vision models. It is written in Python, and it is licensed under the Apache license.

If you are wondering why you might need a video database system, start with the page on <a href="https://evadb.readthedocs.io/en/latest/source/overview/video.html#">Video Database Systems</a>. It describes how EVA lets users easily make use of deep learning models and how they can reduce the money spent on inference on large image or video datasets.
If you are wondering why you might need a video database system, start with the page on <a href="https://evadb.readthedocs.io/en/stable/source/overview/video.html#">Video Database Systems</a>. It describes how EVA lets users easily make use of deep learning models and how they can reduce the money spent on inference on large image or video datasets.

The <a href="https://evadb.readthedocs.io/en/latest/source/overview/installation.html">Getting Started</a> page shows how you can use EVA for different computer vision tasks: image classification, object detection, action recognition, and how you can easily extend EVA to support your custom deep learning model in the form of user-defined functions.
The <a href="https://evadb.readthedocs.io/en/stable/source/overview/installation.html">Getting Started</a> page shows how you can use EVA for different computer vision tasks: image classification, object detection, action recognition, and how you can easily extend EVA to support your custom deep learning model in the form of user-defined functions.

The <a href="https://evadb.readthedocs.io/en/latest/source/tutorials/index.html">User Guides</a> section contains Jupyter Notebooks that demonstrate how to use various features of EVA. Each notebook includes a link to Google Colab, where you can run the code by yourself.
The <a href="https://evadb.readthedocs.io/en/stable/source/tutorials/index.html">User Guides</a> section contains Jupyter Notebooks that demonstrate how to use various features of EVA. Each notebook includes a link to Google Colab, where you can run the code by yourself.

## Why EVA? ##

@@ -52,7 +52,7 @@ The <a href="https://evadb.readthedocs.io/en/latest/source/tutorials/index.html"
</details>

## Links
* [Documentation](https://evadb.readthedocs.io/en/latest/)
* [Documentation](https://evadb.readthedocs.io/)
* [Tutorials](https://github.com/georgia-tech-db/eva/blob/master/tutorials/03-emotion-analysis.ipynb)
* [Join Slack](https://join.slack.com/t/eva-db/shared_invite/zt-1i10zyddy-PlJ4iawLdurDv~aIAq90Dg)
* [Demo](https://ada-00.cc.gatech.edu/eva/playground)
@@ -124,22 +124,37 @@ IMPL 'eva/udfs/fastrcnn_object_detector.py';

## Illustrative EVA Applications

### :desert_island: Traffic Analysis Application using Object Detection Model
### 🔮 Traffic Analysis (Object Detection Model)
| Source Video | Query Result |
|---------------|--------------|
|<img alt="Source Video" src="https://github.com/georgia-tech-db/eva/releases/download/v0.1.0/traffic-input.webp" width="300"> |<img alt="Query Result" src="https://github.com/georgia-tech-db/eva/releases/download/v0.1.0/traffic-output.webp" width="300"> |

### :desert_island: MNIST Digit Recognition using Image Classification Model
### 🔮 MNIST Digit Recognition (Image Classification Model)
| Source Video | Query Result |
|---------------|--------------|
|<img alt="Source Video" src="https://github.com/georgia-tech-db/eva/releases/download/v0.1.0/mnist-input.webp" width="150"> |<img alt="Query Result" src="https://github.com/georgia-tech-db/eva/releases/download/v0.1.0/mnist-output.webp" width="150"> |

### :desert_island: Movie Analysis Application using Face Detection + Emotion Classification Models
### 🔮 Movie Analysis (Face Detection + Emotion Classification Models)

| Source Video | Query Result |
|---------------|--------------|
|<img alt="Source Video" src="https://github.com/georgia-tech-db/eva/releases/download/v0.1.0/gangubai-input.webp" width="400"> |<img alt="Query Result" src="https://github.com/georgia-tech-db/eva/releases/download/v0.1.0/gangubai-output.webp" width="400"> |

### 🔮 [License Plate Recognition](https://github.com/georgia-tech-db/eva-application-template) (Plate Detection + OCR Extraction Models)

| Source Image | Query Result |
|---------------|--------------|
|<img alt="Source Image" src="https://raw.githubusercontent.com/georgia-tech-db/eva-application-template/main/README_files/README_14_6.png" width="400"> |<img alt="Query Result" src="https://raw.githubusercontent.com/georgia-tech-db/eva-application-template/main/README_files/README_19_1.png" width="400"> |

### 🔮 [Meme Toxicity Classification](https://github.com/georgia-tech-db/toxicity-classification) (OCR Extraction + Toxicity Classification Models)

| Source Image | Query Result |
|---------------|--------------|
|<img alt="Source Image" src="https://raw.githubusercontent.com/georgia-tech-db/toxicity-classification/main/README_files/README_16_1.png" width="300"> |<img alt="Query Result" src="https://raw.githubusercontent.com/georgia-tech-db/toxicity-classification/main/README_files/README_16_2.png" width="300"> |




## Community

Join the EVA community on [Slack](https://join.slack.com/t/eva-db/shared_invite/zt-1i10zyddy-PlJ4iawLdurDv~aIAq90Dg) to ask questions and to share your ideas for improving EVA.
@@ -153,11 +168,11 @@ Join the EVA community on [Slack](https://join.slack.com/t/eva-db/shared_invite/
[![PyPI Version](https://img.shields.io/pypi/v/evadb.svg)](https://pypi.org/project/evadb)
[![CI Status](https://circleci.com/gh/georgia-tech-db/eva.svg?style=svg)](https://circleci.com/gh/georgia-tech-db/eva)
[![Coverage Status](https://coveralls.io/repos/github/georgia-tech-db/eva/badge.svg?branch=master)](https://coveralls.io/github/georgia-tech-db/eva?branch=master)
[![Documentation Status](https://readthedocs.org/projects/evadb/badge/?version=latest)](https://evadb.readthedocs.io/en/latest/index.html)
[![Documentation Status](https://readthedocs.org/projects/evadb/badge/?version=stable)](https://evadb.readthedocs.io/en/stable/index.html)

To file a bug or request a feature, please use GitHub issues. Pull requests are welcome.
For more information on installing from source and contributing to EVA, see our
[contributing guidelines](https://evadb.readthedocs.io/en/latest/source/contribute/index.html).
[contributing guidelines](https://evadb.readthedocs.io/en/stable/source/contribute/index.html).

## License
Copyright (c) 2018-2022 [Georgia Tech Database Group](http://db.cc.gatech.edu/)
Binary file added data/detoxify/meme1.jpg
Binary file added data/detoxify/meme2.jpg
1 change: 1 addition & 0 deletions eva/catalog/catalog_utils.py
@@ -38,6 +38,7 @@ def get_video_table_column_definitions() -> List[ColumnDefinition]:
ColumnDefinition(
"data", ColumnType.NDARRAY, NdArrayType.UINT8, (None, None, None)
),
ColumnDefinition("seconds", ColumnType.FLOAT, None, []),
]
return columns

17 changes: 11 additions & 6 deletions eva/expression/aggregation_expression.py
@@ -37,14 +37,14 @@ def __init__(
) # can also be a float

def evaluate(self, *args, **kwargs):
batch = self.get_child(0).evaluate(*args, **kwargs)
batch: Batch = self.get_child(0).evaluate(*args, **kwargs)
if self.etype == ExpressionType.AGGREGATION_FIRST:
batch = batch[0]
if self.etype == ExpressionType.AGGREGATION_LAST:
elif self.etype == ExpressionType.AGGREGATION_LAST:
batch = batch[-1]
if self.etype == ExpressionType.AGGREGATION_SEGMENT:
elif self.etype == ExpressionType.AGGREGATION_SEGMENT:
batch = Batch.stack(batch)
if self.etype == ExpressionType.AGGREGATION_SUM:
elif self.etype == ExpressionType.AGGREGATION_SUM:
batch.aggregate("sum")
elif self.etype == ExpressionType.AGGREGATION_COUNT:
batch.aggregate("count")
@@ -55,9 +55,14 @@ def evaluate(self, *args, **kwargs):
elif self.etype == ExpressionType.AGGREGATION_MAX:
batch.aggregate("max")
batch.reset_index()
# TODO ACTION:
# Add raise exception if data type doesn't match

column_name = self.etype.name
if column_name.find("AGGREGATION_") != -1:
# AGGREGATION_MAX -> MAX
updated_column_name = column_name.replace("AGGREGATION_", "")
batch.modify_column_alias(updated_column_name)

# TODO: Raise exception if data type doesn't match
return batch

def get_symbol(self) -> str:
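The alias step above turns the enum name into the output column label (AGGREGATION_SUM becomes SUM). A rough pandas-only sketch of that behavior outside EVA's Batch class, with an illustrative column name and values:

```python
import pandas as pd

# Illustrative frame with an aliased column, as a child expression might produce
frames = pd.DataFrame({"myvideo.id": [1, 2, 3, 4]})

# Mirror batch.aggregate("sum") followed by batch.reset_index()
aggregated = frames.agg(["sum"]).reset_index(drop=True)

# Mirror modify_column_alias(): AGGREGATION_SUM -> SUM, keeping the column suffix
alias = "AGGREGATION_SUM".replace("AGGREGATION_", "")
aggregated.columns = [f"{alias}.{str(c).split('.')[-1]}" for c in aggregated.columns]

print(aggregated)  # one row, column "SUM.id", value 10
```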
4 changes: 3 additions & 1 deletion eva/expression/comparison_expression.py
@@ -48,7 +48,9 @@ def evaluate(self, *args, **kwargs):
elif len(rbatch) == 1:
rbatch.repeat(len(lbatch))
else:
raise Exception("Left and Right batch does not have equal elements")
raise Exception(
f"Left and Right batch does not have equal elements: left: {len(lbatch)} right: {len(rbatch)}"
)

if self.etype == ExpressionType.COMPARE_EQUAL:
return Batch.from_eq(lbatch, rbatch)
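For context, the surrounding evaluate() broadcasts a single-row batch to the length of the other side before comparing; the change above only makes the mismatch error report both lengths. A rough pandas analogue of that broadcast (illustrative data, not the Batch API itself):

```python
import pandas as pd

lbatch = pd.DataFrame({"score": [0.9, 0.4, 0.7]})
rbatch = pd.DataFrame({"threshold": [0.5]})  # single row, repeated to match

if len(rbatch) == 1:
    rbatch = pd.concat([rbatch] * len(lbatch), ignore_index=True)
elif len(lbatch) != len(rbatch):
    raise Exception(
        f"left and right batches differ in length: "
        f"left: {len(lbatch)} right: {len(rbatch)}"
    )

print(lbatch["score"].values > rbatch["threshold"].values)  # [ True False  True]
```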
12 changes: 9 additions & 3 deletions eva/models/storage/batch.py
@@ -278,7 +278,9 @@ def merge_column_wise(cls, batches: List[Batch], auto_renaming=False) -> Batch:
if not len(batches):
return Batch()
frames = [batch.frames for batch in batches]
new_frames = pd.concat(frames, axis=1, copy=False, ignore_index=False)
new_frames = pd.concat(frames, axis=1, copy=False, ignore_index=False).fillna(
method="ffill"
)
if new_frames.columns.duplicated().any():
logger.warn("Duplicated column name detected {}".format(new_frames))
return Batch(new_frames)
@@ -427,9 +429,9 @@ def modify_column_alias(self, alias: Union[Alias, str]) -> None:
]
else:
for col_name in self.columns:
if "." in col_name:
if "." in str(col_name):
new_col_names.append(
"{}.{}".format(alias.alias_name, col_name.split(".")[1])
"{}.{}".format(alias.alias_name, str(col_name).split(".")[1])
)
else:
new_col_names.append("{}.{}".format(alias.alias_name, col_name))
@@ -446,3 +448,7 @@ def drop_column_alias(self) -> None:
new_col_names.append(col_name)

self._frames.columns = new_col_names

def rename(self, columns) -> None:
"Rename column names"
self._frames.rename(columns=columns, inplace=True)
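The forward-fill added to merge_column_wise pads shorter frames when result batches of unequal length are stitched together column-wise. A minimal pandas sketch of that behavior with illustrative data (the real method operates on Batch.frames):

```python
import pandas as pd

# e.g. per-frame labels next to a single aggregate value
left = pd.DataFrame({"label": ["car", "truck", "car"]})
right = pd.DataFrame({"count": [3]})

merged = pd.concat([left, right], axis=1, copy=False, ignore_index=False).fillna(
    method="ffill"
)
print(merged)
# The lone "count" value is forward-filled down the shorter column:
#    label  count
# 0    car    3.0
# 1  truck    3.0
# 2    car    3.0
```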
5 changes: 2 additions & 3 deletions eva/parser/eva.lark
@@ -242,9 +242,8 @@ function_call: udf_function ->udf_function_call

udf_function: simple_id "(" function_args ")" dotted_id?


aggregate_windowed_function: aggregate_function_name "(" (ALL | DISTINCT)? function_arg ")"
| COUNT "(" ("*" | ALL? function_arg) ")"
aggregate_windowed_function: aggregate_function_name "(" function_arg ")"
| COUNT "(" (STAR | function_arg) ")"


aggregate_function_name: AVG | MAX | MIN | SUM | FIRST | LAST | SEGMENT
39 changes: 26 additions & 13 deletions eva/parser/lark_visitor/_functions.py
@@ -13,11 +13,12 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from lark import Tree
from lark import Token, Tree

from eva.expression.abstract_expression import ExpressionType
from eva.expression.aggregation_expression import AggregationExpression
from eva.expression.function_expression import FunctionExpression
from eva.expression.tuple_value_expression import TupleValueExpression
from eva.parser.create_udf_statement import CreateUDFStatement
from eva.parser.drop_udf_statement import DropUDFStatement
from eva.utils.logging_manager import logger
@@ -114,7 +115,17 @@ def create_udf(self, tree):

def get_aggregate_function_type(self, agg_func_name):
agg_func_type = None
if agg_func_name == "FIRST":
if agg_func_name == "COUNT":
agg_func_type = ExpressionType.AGGREGATION_COUNT
elif agg_func_name == "MIN":
agg_func_type = ExpressionType.AGGREGATION_MIN
elif agg_func_name == "MAX":
agg_func_type = ExpressionType.AGGREGATION_MAX
elif agg_func_name == "SUM":
agg_func_type = ExpressionType.AGGREGATION_SUM
elif agg_func_name == "AVG":
agg_func_type = ExpressionType.AGGREGATION_AVG
elif agg_func_name == "FIRST":
agg_func_type = ExpressionType.AGGREGATION_FIRST
elif agg_func_name == "LAST":
agg_func_type = ExpressionType.AGGREGATION_LAST
@@ -125,22 +136,24 @@ def get_aggregate_function_type(self, agg_func_name):
return agg_func_type

def aggregate_windowed_function(self, tree):
agg_func_name = self.visit(tree.children[0]).value

agg_func_arg = None
assert agg_func_name in [
"MIN",
"MAX",
"AVG",
"SUM",
"COUNT",
"FIRST",
"LAST",
"SEGMENT",
]
agg_func_name = None

for child in tree.children:
if isinstance(child, Tree):
if child.data == "function_arg":
agg_func_arg = self.visit(child)
elif child.data == "aggregate_function_name":
agg_func_name = self.visit(child).value
elif isinstance(child, Token):
token = child.value
# Support for COUNT(*)
if token != "*":
agg_func_name = token
else:
agg_func_arg = TupleValueExpression(col_name="id")

agg_func_type = self.get_aggregate_function_type(agg_func_name)
agg_expr = AggregationExpression(agg_func_type, None, agg_func_arg)
return agg_expr
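Two things happen in this visitor: the aggregate name is mapped to an ExpressionType member, and the bare * token from COUNT(*) is rewritten into a count over the id column, so a query along the lines of SELECT COUNT(*) FROM MyVideo (MyVideo being an illustrative table) counts frames. A condensed, illustrative restatement of that mapping as a lookup table rather than an if/elif chain; the dict below is not part of the codebase, only the enum members are:

```python
from eva.expression.abstract_expression import ExpressionType

# Illustrative lookup table equivalent to get_aggregate_function_type()
AGGREGATE_TYPES = {
    "COUNT": ExpressionType.AGGREGATION_COUNT,
    "MIN": ExpressionType.AGGREGATION_MIN,
    "MAX": ExpressionType.AGGREGATION_MAX,
    "SUM": ExpressionType.AGGREGATION_SUM,
    "AVG": ExpressionType.AGGREGATION_AVG,
    "FIRST": ExpressionType.AGGREGATION_FIRST,
    "LAST": ExpressionType.AGGREGATION_LAST,
    "SEGMENT": ExpressionType.AGGREGATION_SEGMENT,
}


def get_aggregate_type(agg_func_name: str):
    # Returns None for unknown names, matching the fall-through behavior above
    return AGGREGATE_TYPES.get(agg_func_name)
```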
6 changes: 5 additions & 1 deletion eva/readers/opencv_reader.py
@@ -59,7 +59,11 @@ def _read(self) -> Iterator[Dict]:
_, frame = video.read()
frame_id = begin
while frame is not None and frame_id <= end:
yield {"id": frame_id, "data": frame}
yield {
"id": frame_id,
"data": frame,
"seconds": frame_id // video.get(cv2.CAP_PROP_FPS),
}
_, frame = video.read()
frame_id += 1
else:
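The reader now attaches a coarse per-frame seconds value, computed as the frame index integer-divided by the container's FPS. A self-contained sketch of the same computation with OpenCV (the file name and loop structure are illustrative, not the reader's full sampling logic):

```python
import cv2

video = cv2.VideoCapture("sample.mp4")  # hypothetical input path
fps = video.get(cv2.CAP_PROP_FPS)

frame_id = 0
ok, frame = video.read()
while ok:
    # Whole-second offset of this frame, as in the reader above
    record = {"id": frame_id, "data": frame, "seconds": frame_id // fps}
    # ... hand `record` to the storage layer ...
    ok, frame = video.read()
    frame_id += 1
video.release()
```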
2 changes: 1 addition & 1 deletion eva/udfs/emotion_detector.py
@@ -108,7 +108,7 @@ def setup(self, threshold=0.85):

# pull model from dropbox if not present
if not os.path.exists(model_path):
model_url = "https://www.dropbox.com/s/bqblykok62d28mn/emotion_detector.t7"
model_url = "https://www.dropbox.com/s/x0a8bz53apvmoc9/emotion_detector.t7"
subprocess.run(["wget", model_url, "--directory-prefix", output_directory])

# self.get_device() infers device from the loaded model, so not using it
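For context, this URL feeds the download-if-missing path in setup(). A self-contained sketch of that pattern with the corrected link (the cache directory below is illustrative; the real path comes from EVA's configuration):

```python
import os
import subprocess

output_directory = "models/emotion_detector"  # hypothetical cache directory
model_path = os.path.join(output_directory, "emotion_detector.t7")

# Fetch the pretrained weights only if they are not cached yet
if not os.path.exists(model_path):
    model_url = "https://www.dropbox.com/s/x0a8bz53apvmoc9/emotion_detector.t7"
    subprocess.run(["wget", model_url, "--directory-prefix", output_directory])
```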
49 changes: 49 additions & 0 deletions eva/udfs/ndarray/timestamp.py
@@ -0,0 +1,49 @@
# coding=utf-8
# Copyright 2018-2022 EVA
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time

import pandas as pd

from eva.udfs.abstract.abstract_udf import AbstractUDF


class Timestamp(AbstractUDF):
@property
def name(self) -> str:
return "Timestamp"

def setup(self):
pass

def forward(self, inp: pd.DataFrame) -> pd.DataFrame:
"""
inp: DataFrame -> out: DataFrame
second timestamp
0 int 0 string
1 int 1 string
"""

# Sanity check
if len(inp.columns) != 1:
raise ValueError("input must only contain one column (seconds)")

seconds = pd.DataFrame(inp[inp.columns[0]])
timestamp_result = seconds.apply(lambda x: self.format_timestamp(x[0]), axis=1)
outcome = pd.DataFrame({"timestamp": timestamp_result.values})
return outcome

def format_timestamp(self, num_of_seconds):
timestamp = time.strftime("%H:%M:%S", time.gmtime(num_of_seconds))
return timestamp
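A quick way to exercise the new UDF is to call forward() directly on a one-column DataFrame of second offsets, bypassing the query engine (assuming the class can be instantiated on its own; the input values are illustrative):

```python
import pandas as pd

from eva.udfs.ndarray.timestamp import Timestamp

udf = Timestamp()
inp = pd.DataFrame({"seconds": [0, 61, 3725]})
print(udf.forward(inp))
# timestamp column: 00:00:00, 00:01:01, 01:02:05
```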
