Toxicity-Aware Comment Classifier: Decoding Online Tone with BERT

Built a multi-class comment classification tool using BERT to detect tone and intent behind online comments. Labeled my own dataset across 7 nuanced categories and reached a weighted F1-score of 0.77.

About the project

Social media creators often face emotionally taxing comments that range from supportive to subtly harmful. I built a comment classifier that interprets online tone using a fine-tuned BERT model. Trained on a custom-labeled dataset of Instagram and YouTube comments, it identifies the intent behind a message, from passive-aggressive to objectifying, and reaches a weighted F1-score of 0.77 despite the inherent ambiguity of human communication.

Motivation

As both a creator and a researcher, I’ve experienced how difficult it is to parse tone online. Comments that appear neutral at first glance often carry emotional weight or subtle aggression. Manually filtering these is exhausting and emotionally draining. I wanted to build something that could help automate this task, especially for creators with large audiences.

Tools & Frameworks

  • HuggingFace Transformers (BERT)

  • Python, Jupyter Notebook

  • Scikit-learn

  • Matplotlib & Seaborn for visualization

  • Pandas, NumPy for data handling

Dataset

Hand-collected ~2,500 comments from Instagram and YouTube. Manually labeled into 7 categories:

  • Supportive

  • Neutral

  • Emotionally Heavy

  • Blunt/Criticism

  • Passive-Aggressive

  • Objectifying

  • Parasocial
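
Before training, each category name has to become an integer id. A minimal sketch of that mapping follows. The id order simply mirrors the list above, and the encode_labels helper is a hypothetical name, not taken from the project code.

```python
# The seven tone categories. The id order is an assumption:
# it just follows the order in which the categories are listed.
LABELS = [
    "Supportive",
    "Neutral",
    "Emotionally Heavy",
    "Blunt/Criticism",
    "Passive-Aggressive",
    "Objectifying",
    "Parasocial",
]

# Lookup tables in both directions: name -> id for training,
# id -> name for turning predictions back into readable labels.
label2id = {name: i for i, name in enumerate(LABELS)}
id2label = {i: name for name, i in label2id.items()}


def encode_labels(names):
    """Turn label names into integer ids (hypothetical helper).

    An unknown name raises KeyError, which surfaces labeling typos early.
    """
    return [label2id[name] for name in names]


print(encode_labels(["Neutral", "Parasocial"]))  # [1, 6]
```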

Workflow

  1. Data Collection & Labeling: Exported comment threads, removed duplicates, and labeled each comment by tone.

  2. Preprocessing: No emoji stripping or manual lowercasing (kept for tone context). Tokenized with bert-base-uncased, which itself lowercases text, so casing cues do not reach the model.

  3. Training:

    • Wrapped data into PyTorch Dataset class

    • Used HuggingFace Trainer API with early stopping

    • Fine-tuned pre-trained BERT

    • Hyperparameters: LR = 1e-5, 15 epochs, batch size 16

  4. Evaluation:

    • Weighted F1-score = 0.77

    • Confusion matrix showed strongest reliability in detecting "Emotionally Heavy" and "Objectifying" tones

    • Most confusion occurred between "Supportive" vs "Neutral" and "Blunt" vs "Criticism"
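
Steps 3 and 4 can be sketched roughly as below. This is an illustration under stated assumptions, not the project's actual training script: build_trainer, the output directory, and the early-stopping patience of 2 are all hypothetical, and the datasets are assumed to be already tokenized PyTorch datasets. weighted_f1 is a plain-Python stand-in for what scikit-learn's f1_score(average="weighted") computes, written out to show what "weighted" means. Argument names follow recent transformers releases; older versions spell eval_strategy as evaluation_strategy.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with each class weighted by its share of y_true."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n  # frequent classes count for more
    return total / len(y_true)


def compute_metrics(eval_pred):
    """Metric hook for Trainer: argmax over the logits, then weighted F1."""
    logits, labels = eval_pred
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
    return {"f1": weighted_f1([int(label) for label in labels], preds)}


def build_trainer(train_ds, eval_ds, num_labels=7):
    """Assemble a Trainer with the hyperparameters listed above (hypothetical helper)."""
    # Imported lazily so the metric helpers above work without transformers installed.
    from transformers import (
        AutoModelForSequenceClassification,
        EarlyStoppingCallback,
        Trainer,
        TrainingArguments,
    )

    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=num_labels
    )
    args = TrainingArguments(
        output_dir="tone-bert",          # assumed path
        learning_rate=1e-5,
        num_train_epochs=15,
        per_device_train_batch_size=16,
        eval_strategy="epoch",           # evaluate once per epoch
        save_strategy="epoch",           # must match eval_strategy for best-model loading
        load_best_model_at_end=True,     # required by EarlyStoppingCallback
        metric_for_best_model="f1",      # the key returned by compute_metrics
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_ds,
        eval_dataset=eval_ds,
        compute_metrics=compute_metrics,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],  # assumed patience
    )
```

With datasets in hand, training would then be build_trainer(train_ds, eval_ds).train(). With early stopping enabled, the 15 epochs act as an upper bound rather than a fixed schedule.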

Results

  • F1-score: 0.77 (weighted)

  • Confusion matrix + classification report included

  • Prediction demo: a predict_label() function classifies new comments live
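
For illustration, a function like predict_label() could look roughly like the sketch below. The signature, the logits_to_label helper, and the returned (label, confidence) pair are assumptions, since only the function name is given here. It also assumes the model was loaded with an id2label mapping in its config; without one, transformers falls back to generic names such as LABEL_0.

```python
import math


def logits_to_label(logits, id2label):
    """Softmax over raw class scores; return (label, probability) of the top class."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract the max for numerical stability
    best = max(range(len(logits)), key=logits.__getitem__)
    return id2label[best], exps[best] / sum(exps)


def predict_label(text, tokenizer, model):
    """Classify one comment with a fine-tuned model (hypothetical signature)."""
    import torch  # imported lazily so logits_to_label works without torch

    model.eval()  # disable dropout for inference
    inputs = tokenizer(text, return_tensors="pt", truncation=True).to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return logits_to_label(logits, model.config.id2label)
```

Returning the probability alongside the label makes it possible to flag low-confidence predictions for manual review, which matters for pairs the model tends to confuse.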

Challenges & Lessons

  • Tone is inherently ambiguous; even humans disagree on labeling

  • Learned importance of class balance and metric trade-offs

  • HuggingFace's Trainer made fine-tuning accessible and modular

Designed by Christine Zhou · Coded with ☕ + ❤️ in Framer
