Toxicity-Aware Comment Classifier: Decoding Online Tone with BERT

Built a multi-class comment classifier that uses a fine-tuned BERT model to detect the tone and intent behind online comments. Hand-labeled my own dataset across 7 nuanced categories and reached a weighted F1-score of 0.77.

About the project

Social media creators often face emotionally taxing comments that range from supportive to subtly harmful. I built a comment classifier that interprets online tone using a fine-tuned BERT model. Trained on a custom-labeled dataset of Instagram and YouTube comments, the tool identifies the intent behind a message, from passive-aggressive to objectifying, and achieves a weighted F1-score of 0.77 despite the inherent ambiguity of human communication.

Motivation

As both a creator and a researcher, I’ve experienced how difficult it is to parse tone online. Comments that appear neutral at first glance often carry emotional weight or subtle aggression. Manually filtering these is exhausting and emotionally draining. I wanted to build something that could help automate this task, especially for creators with large audiences.

Tools & Frameworks

HuggingFace Transformers (BERT)
Python, Jupyter Notebook
Scikit-learn
Matplotlib & Seaborn for visualization
Pandas, NumPy for data handling

Dataset

Hand-collected ~2,500 comments from Instagram and YouTube and manually labeled each one into one of 7 categories:

Supportive
Neutral
Emotionally Heavy
Blunt/Criticism
Passive-Aggressive
Objectifying
Parasocial

Workflow

Data Collection & Labeling: Exported comment threads, removed duplicates, and labeled each comment by tone.
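Preprocessing: Emoji were kept and no manual lowercasing was applied, so the raw text keeps its tone cues. Tokenized with the bert-base-uncased tokenizer, which lowercases input by default, so case differences do not reach the model; emoji missing from its vocabulary are mapped to [UNK].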
Training:
- Wrapped data into PyTorch Dataset class
- Used HuggingFace Trainer API with early stopping
- Fine-tuned pre-trained BERT
- Hyperparameters: LR = 1e-5, 15 epochs, batch size 16
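
The training steps above can be sketched roughly as follows. This is a minimal sketch rather than the project's actual code: the names `CommentDataset`, `encode_comments`, and `build_trainer`, the label-to-id ordering, the `max_length` of 128, the early-stopping patience of 2, and the output directory are all assumptions. Only the model checkpoint, learning rate, epoch count, and batch size come from the list above.

```python
# Label order (and therefore the integer ids) is an assumption for illustration.
LABELS = [
    "Supportive", "Neutral", "Emotionally Heavy", "Blunt/Criticism",
    "Passive-Aggressive", "Objectifying", "Parasocial",
]
LABEL2ID = {name: i for i, name in enumerate(LABELS)}
ID2LABEL = {i: name for name, i in LABEL2ID.items()}


class CommentDataset:
    """Map-style dataset (__len__/__getitem__), the protocol PyTorch's
    DataLoader and the Trainer expect. Subclassing torch.utils.data.Dataset
    would work the same way."""

    def __init__(self, encodings, labels):
        self.encodings = encodings  # dict-like: field name -> list of per-comment values
        self.labels = labels        # list of integer class ids

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: values[idx] for key, values in self.encodings.items()}
        item["labels"] = self.labels[idx]
        return item


def encode_comments(texts, tokenizer, max_length=128):
    # Pad to a fixed length so the default collator can stack examples.
    # max_length=128 is an assumed value, not taken from the project.
    return tokenizer(texts, truncation=True, padding="max_length",
                     max_length=max_length)


def build_trainer(train_ds, val_ds, output_dir="bert-tone-classifier"):
    # Imported here so the label mapping and dataset wrapper above can be
    # reused without loading the heavy dependencies.
    from transformers import (AutoModelForSequenceClassification,
                              EarlyStoppingCallback, Trainer,
                              TrainingArguments)

    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(LABELS),
        id2label=ID2LABEL, label2id=LABEL2ID,
    )
    args = TrainingArguments(
        output_dir=output_dir,           # assumed path
        learning_rate=1e-5,              # from the write-up
        num_train_epochs=15,             # from the write-up
        per_device_train_batch_size=16,  # from the write-up
        per_device_eval_batch_size=16,
        eval_strategy="epoch",           # `evaluation_strategy` in older transformers releases
        save_strategy="epoch",           # must match the evaluation schedule
        load_best_model_at_end=True,     # required by EarlyStoppingCallback
        metric_for_best_model="eval_loss",
        greater_is_better=False,
    )
    return Trainer(
        model=model, args=args,
        train_dataset=train_ds, eval_dataset=val_ds,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],  # patience is assumed
    )
```

Calling `build_trainer(train_ds, val_ds).train()` would then run fine-tuning. Evaluating once per epoch is what gives early stopping a validation loss to compare between epochs.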
Evaluation:
- Weighted F1-score = 0.77
- Confusion matrix showed strongest reliability in detecting "Emotionally Heavy" and "Objectifying" tones
- Most confusion occurred between "Supportive" vs "Neutral" and "Blunt" vs "Criticism"
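
For readers unfamiliar with the "weighted" qualifier: a weighted F1 averages the per-class F1 scores in proportion to how many true examples each class has, while a macro F1 treats all classes equally. The sketch below computes both in plain Python. The function names and the toy labels are made up for illustration and are not the project's data; in practice scikit-learn's `f1_score` with `average="weighted"` or `average="macro"` does the same job.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's number of true examples."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Toy example (invented labels, not project data).
y_true = ["Neutral"] * 4 + ["Supportive"] * 2
y_pred = ["Neutral", "Neutral", "Neutral", "Supportive", "Supportive", "Neutral"]
print(per_class_f1(y_true, y_pred))  # Neutral: 0.75, Supportive: 0.5
print(macro_f1(y_true, y_pred))      # 0.625
print(weighted_f1(y_true, y_pred))   # about 0.667
```

Because the larger class dominates the weighted average, a weighted score can sit above the macro score when minority classes are the harder ones, which is worth keeping in mind when reading a single headline number.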

Results

F1-score: 0.77 (weighted)
Confusion matrix + classification report included
Prediction demo: a predict_label() function classifies new comments live
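
The write-up names predict_label() but does not show its signature or body, so the following is only one plausible shape for it. The `(text, model, tokenizer)` signature, the `pick_label` and `softmax` helpers, the returned confidence value, and the `max_length` of 128 are all assumptions.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_label(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def predict_label(text, model, tokenizer, max_length=128):
    """Classify one comment with a fine-tuned sequence-classification model.

    Hypothetical signature: `model` and `tokenizer` are assumed to be the
    fine-tuned BERT model and its matching tokenizer.
    """
    import torch  # imported here so the helpers above need no dependencies

    inputs = tokenizer(text, return_tensors="pt", truncation=True,
                       max_length=max_length).to(model.device)
    model.eval()           # disable dropout for inference
    with torch.no_grad():  # no gradients needed for prediction
        logits = model(**inputs).logits[0].tolist()
    return pick_label(logits, model.config.id2label)
```

Returning the probability alongside the label is a design choice for this sketch: with tones this ambiguous, a low-confidence prediction may be better flagged for manual review than trusted outright.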

Challenges & Lessons

Tone is inherently ambiguous; even humans disagree on labeling
Learned the importance of class balance and of metric trade-offs
HuggingFace's Trainer made fine-tuning accessible and modular
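
On the class-balance lesson above, one common way to quantify imbalance is to compute inverse-frequency class weights. The write-up does not say whether any reweighting was applied, so this is only an illustration: the function name and the toy label counts are invented, and the formula is the "balanced" heuristic, n_samples / (n_classes * class_count).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, common classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Toy example (invented counts, not the project's label distribution).
toy_labels = ["Neutral"] * 6 + ["Parasocial"] * 2
print(balanced_class_weights(toy_labels))
# Neutral gets about 0.667, Parasocial gets 2.0
```

Weights like these could be passed to a weighted loss, or simply inspected to see how skewed the labeled set is before deciding which averaged metric to report.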