Toxicity-Aware Comment Classifier: Decoding Online Tone with BERT
About the project
Motivation
As both a creator and a researcher, I’ve experienced how difficult it is to parse tone online. Comments that appear neutral at first glance often carry emotional weight or subtle aggression. Manually filtering these is exhausting and emotionally draining. I wanted to build something that could help automate this task, especially for creators with large audiences.
Tools & Frameworks
HuggingFace Transformers (BERT)
Python, Jupyter Notebook
Scikit-learn
Matplotlib & Seaborn for visualization
Pandas, NumPy for data handling
Dataset
Hand-collected ~2,500 comments from Instagram and YouTube. Manually labeled into 7 categories (a loading and label-encoding sketch follows the list):
Supportive
Neutral
Emotionally Heavy
Blunt/Criticism
Passive-Aggressive
Objectifying
Parasocial
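
The sketch below shows one way the labeled export could be loaded with Pandas and the seven categories mapped to integer ids for training. The file name and column names (labeled_comments.csv, comment, tone) are hypothetical; only the category names come from the list above.

```python
# Hypothetical loading/label-encoding sketch; file and column names are
# placeholders, only the seven category names come from the list above.
import pandas as pd

LABELS = [
    "Supportive", "Neutral", "Emotionally Heavy", "Blunt/Criticism",
    "Passive-Aggressive", "Objectifying", "Parasocial",
]
label2id = {name: i for i, name in enumerate(LABELS)}
id2label = {i: name for name, i in label2id.items()}

df = pd.read_csv("labeled_comments.csv")       # hypothetical export of labeled comments
df = df.drop_duplicates(subset="comment")      # duplicates removed, as in the workflow
df["label_id"] = df["tone"].map(label2id)      # map tone names to integer ids
```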

Workflow
Data Collection & Labeling: Exported comment threads, removed duplicates, and labeled each comment by tone.
Preprocessing: No emoji or case removal in the raw text (both retained for tone context). Tokenized with the bert-base-uncased tokenizer.
Training (a fine-tuning sketch follows this list):
Wrapped data into PyTorch Dataset class
Used HuggingFace Trainer API with early stopping
Fine-tuned pre-trained BERT
Hyperparameters: LR = 1e-5, 15 epochs, batch size 16
Evaluation (an evaluation sketch also follows this list):
Weighted F1-score = 0.77
Confusion matrix showed strongest reliability in detecting "Emotionally Heavy" and "Objectifying" tones
Most confusion occurred between "Supportive" and "Neutral", and around the "Blunt/Criticism" label
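
A minimal sketch of the Preprocessing and Training steps described above: tokenize the raw comments with bert-base-uncased (emoji and casing left untouched beforehand), wrap them in a PyTorch Dataset, and fine-tune with the HuggingFace Trainer API using the reported hyperparameters (LR 1e-5, 15 epochs, batch size 16). The train/validation split, max_length, early-stopping patience, and output directory are assumptions, not values from the project; df and LABELS carry over from the Dataset sketch.

```python
# Sketch of the Preprocessing + Training steps; split fraction, max_length,
# early-stopping patience, and output_dir are assumptions.
import torch
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from transformers import (
    AutoTokenizer,
    BertForSequenceClassification,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

# df and LABELS come from the Dataset sketch above.
train_texts, val_texts, train_labels, val_labels = train_test_split(
    df["comment"].tolist(), df["label_id"].tolist(),
    test_size=0.2, stratify=df["label_id"], random_state=42,  # assumed split
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")


class CommentDataset(torch.utils.data.Dataset):
    """Tokenized comments + integer tone labels in the format Trainer expects."""

    def __init__(self, texts, labels, max_length=128):
        self.encodings = tokenizer(
            texts, truncation=True, padding="max_length", max_length=max_length
        )
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item


def compute_metrics(eval_pred):
    preds = eval_pred.predictions.argmax(axis=-1)
    return {"f1": f1_score(eval_pred.label_ids, preds, average="weighted")}


model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)

training_args = TrainingArguments(
    output_dir="tone-bert",            # assumed output path
    learning_rate=1e-5,                # as reported
    num_train_epochs=15,               # as reported
    per_device_train_batch_size=16,    # as reported
    evaluation_strategy="epoch",       # "eval_strategy" in newer transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,       # required for early stopping
    metric_for_best_model="f1",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=CommentDataset(train_texts, train_labels),
    eval_dataset=CommentDataset(val_texts, val_labels),
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],  # patience assumed
)
trainer.train()
```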
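
And a matching sketch of the Evaluation step: predictions on the held-out comments, the weighted F1, a classification report, and a Seaborn confusion-matrix heatmap. Variable names (trainer, CommentDataset, val_texts, val_labels, LABELS) carry over from the sketches above.

```python
# Sketch of the Evaluation step; names carry over from the training sketch.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix, f1_score

output = trainer.predict(CommentDataset(val_texts, val_labels))
y_pred = np.argmax(output.predictions, axis=-1)
y_true = output.label_ids

print("Weighted F1:", f1_score(y_true, y_pred, average="weighted"))
print(classification_report(
    y_true, y_pred,
    labels=list(range(len(LABELS))), target_names=LABELS, zero_division=0,
))

cm = confusion_matrix(y_true, y_pred, labels=list(range(len(LABELS))))
sns.heatmap(cm, annot=True, fmt="d", xticklabels=LABELS, yticklabels=LABELS)
plt.xlabel("Predicted tone")
plt.ylabel("True tone")
plt.tight_layout()
plt.show()
```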
Results
F1-score: 0.77 (weighted)
Confusion matrix + classification report included
Prediction demo: Live demo with a predict_label() function to classify new comments (one possible shape is sketched below)
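
The demo helper itself isn't reproduced in this write-up, so here is one plausible shape for predict_label(), assuming the fine-tuned model, tokenizer, and id2label mapping from the sketches above; the notebook's actual implementation may differ.

```python
# One plausible shape for the predict_label() demo helper; model, tokenizer,
# and id2label carry over from the earlier sketches.
import torch


def predict_label(comment: str) -> str:
    """Classify a single raw comment into one of the seven tone categories."""
    inputs = tokenizer(comment, return_tensors="pt", truncation=True, max_length=128)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    with torch.no_grad():
        logits = model(**inputs).logits
    return id2label[int(logits.argmax(dim=-1))]


print(predict_label("wow, didn't expect that from you 🙂"))
```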
Challenges & Lessons
Tone is inherently ambiguous; even humans disagree on labeling
Learned the importance of class balance and of metric trade-offs (e.g., weighted vs. macro F1)
HuggingFace's Trainer made fine-tuning accessible and modular

