Sentence Classification with TF.DATA API and Bert.

This is a sentence classifier which can be modidfed for any need. It is based of the TF.DATA API for building the pipeline to hand large dataset and BERT for feature extraction.

Usage

Download the jupyter notebook file "eluvio.ipynb" in the directory and update the following section

directory_url = 'https://drive.google.com/file/d/15X00ZWBjla7qGOIW33j8865QdF89IyAk/view'
file_name = '/content/Eluvio_DS_Challenge.csv'
BERT_TFHUB_URL = "https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/4"
max_length = 128
batch_size=4
label_data = ['worldnews']  # fill with the list of categories or labels for each sentence

def preprocess(sample, lable):
    print(sample['title']). # change 'title' to the header name of the sentence in your file
    tokenized_inputs = preprocessor(
        sample['title'])  # preprocessor.bert_pack_inputs([tokenized_inputs,tokenized_inputs],tf.constant(max_length))
    
    print(lable)
    label = one_hot(lable)
    print(tokenized_inputs)
    print('labe:',label)
    return tokenized_inputs, tf.reshape(label,(batch_size,1))

dataset = tf.data.experimental.make_csv_dataset(
    file_name,
    # select_columns=['title'],
    select_columns=['title', 'category'], # change 'title' and 'category' to the header names of the sentence and its category in your file
    batch_size=batch_size, shuffle_seed=123,
    label_name="category"
)
dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
dataset = dataset.map(preprocess)  # preprocess each text of the dataset
train_size = int(0.9 * DATASET_SIZE)  # train batch
val_size = int(0.1 * DATASET_SIZE)  # test/val batch

RUN the script and wait for your results.
Adjust the model nyperparameters as needed

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
README.md		README.md
eluvio.ipynb		eluvio.ipynb
test		test

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sentence Classification with TF.DATA API and Bert.

Usage

About

Releases

Packages

Languages

gsabbih6/eluvio

Folders and files

Latest commit

History

Repository files navigation

Sentence Classification with TF.DATA API and Bert.

Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages