Amirabbas Afzali


I’m Amirabbas Afzali, a B.Sc. student in Electrical Engineering with a minor in Mathematics at Sharif University of Technology, specializing in Communication Systems. Last summer, I was a research intern at the MLBio Lab at EPFL, working with Prof. Maria Brbić on weak-to-strong generalization for preference alignment in large language models.

I’m broadly interested in reliable decision-making in machine learning systems, including trustworthy ML, optimization, and reinforcement learning, especially where these topics intersect with human-AI alignment.

My current research focuses on understanding how preferences, robustness, and feedback signals shape model behavior. My recent work spans several research areas, including:

  • Post-training techniques for LLMs, such as preference learning and alignment
  • Trustworthy and robust machine learning, with emphasis on adversarial robustness and safety
  • Offline and robust reinforcement learning

Lately, I’ve been especially interested in the following topics — feel free to reach out if they resonate:

(i) LLM safety and adversarial alignment

(ii) Steering vectors for test-time alignment

(iii) Certified robustness and model verification

selected publications

  1. LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders
    Borna Khodabandeh, Amirabbas Afzali, Amirhossein Afsharrad, and 4 more authors
    Advances in Neural Information Processing Systems, 2025
  2. Aligning Visual Contrastive learning models via Preference Optimization
    Amirabbas Afzali*, Borna Khodabandeh*, Ali Rasekh, and 3 more authors
    International Conference on Learning Representations, 2025
  3. One Goal, Many Challenges: Robust Preference Optimization Amid Content-Aware and Multi-Source Noise
    Amirabbas Afzali, Amirhossein Afsharrad, Seyed Shahabeddin Mousavi, and 1 more author
    Reinforcement Learning Conference, 2025