Hi, I’m Oubo Ma, a third-year PhD student at Zhejiang University in the NESA Lab, advised by Prof. Shouling Ji. My research focuses on the security of reinforcement learning (RL), including adversarial policies and backdoor attacks.
My passion for RL originated from my love of puzzle games, which sparked my fascination with sequential decision-making. Over time, I have observed that while most researchers prioritize improving the performance of RL algorithms across various tasks, comparatively little attention goes to their security risks. This gap motivates me to explore the potential vulnerabilities of RL from multiple perspectives and to develop solutions that improve the safety and reliability of RL in real-world applications. Outside of academics, I’m a passionate basketball fan and enjoy comedy. Feel free to email me if you’d like to chat about either of these topics!
📝 Conference Publications
- SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems. Oubo Ma, Yuwen Pu, Linkang Du, Yang Dai, Ruo Wang, Xiaolei Liu, Yingcai Wu, Shouling Ji. CCS 2024.
- Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning? Yang Dai, Oubo Ma, Longfei Zhang, Xingxing Liang, Shengchao Hu, Mengzhu Wang, Shouling Ji, Jincai Huang, Li Shen. NeurIPS 2024.
- Text Laundering: Mitigating Malicious Features Through Knowledge Distillation of Large Foundation Models. Yi Jiang, Chenghui Shi, Oubo Ma, Youliang Tian, Shouling Ji. Inscrypt 2023. Best Student Paper Award.
📝 Journal Publications
- ABM-V: An Adaptive Backoff Mechanism for Mitigating Broadcast Storm in VANETs. Oubo Ma, Xuejiao Liu, Yingjie Xia. IEEE Transactions on Vehicular Technology, 2023.
- RLID-V: Reinforcement Learning-Based Information Dissemination Policy Generation in VANETs. Yingjie Xia, Xuejiao Liu, Jing Ou, Oubo Ma. IEEE Transactions on Intelligent Transportation Systems, 2023.
- HDRS: A Hybrid Reputation System with Dynamic Update Interval for Detecting Malicious Vehicles in VANETs. Xuejiao Liu, Oubo Ma, Wei Chen, Yingjie Xia, Yuxuan Zhou. IEEE Transactions on Intelligent Transportation Systems, 2022.
📝 arXiv
- UNIDOOR: A Universal Framework for Action-Level Backdoor Attacks in Deep Reinforcement Learning. Oubo Ma, Linkang Du, Yang Dai, Chunyi Zhou, Qingming Li, Yuwen Pu, Shouling Ji. arXiv 2025.
- Reformulation is All You Need: Addressing Malicious Text Features in DNNs. Yi Jiang, Oubo Ma, Yong Yang, Tong Zhang, Shouling Ji. arXiv 2025.
📖 Education
- 2022.09 - Present, Ph.D., Zhejiang University.
- 2019.09 - 2022.06, M.E., Hangzhou Normal University.
- 2015.09 - 2019.06, B.E., Wenzhou University.