Publications
*: Equal contribution, ✉: Corresponding author

Latest
- Task-Robust Pre-Training for Worst-Case Downstream Adaptation (NeurIPS'23). Jianghui Wang*, Yang Chen*, Xingyu Xie, Cong Fang✉, and Zhouchen Lin✉. In Advances in Neural Information Processing Systems, 2023
Pre-training has achieved remarkable success when transferred to downstream tasks. In machine learning, we care not only about a model's good performance but also about its behavior under reasonable shifts of conditions. The same philosophy holds when pre-training a foundation model. However, a foundation model may not behave uniformly well across a series of related downstream tasks. This happens, for example, in mask-recovery regression when the recovery abilities or training instances diverge: pattern features are extracted dominantly during pre-training, while semantic features are also required by a downstream task. This paper considers pre-training a model that guarantees uniformly good performance over the downstream tasks; we call this goal downstream-task robustness. Our method first separates the upstream task into several representative ones and applies a simple minimax loss for pre-training. We then design an efficient algorithm to solve the minimax loss and prove its convergence in the convex setting. In experiments on both large-scale natural language processing and computer vision datasets, we show that our method improves the metrics on worst-case downstream tasks. Additionally, we provide some theoretical explanations for why our loss is beneficial: specifically, we show that in some cases fewer samples are inherently required for the most challenging downstream task.
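The minimax idea in the abstract can be illustrated with a toy sketch (not the paper's implementation): instead of minimizing the average task loss, each step descends on whichever task currently has the worst loss, which is a subgradient step on the max of the losses. The quadratic task losses, starting point, and learning rate below are invented for illustration.

```python
import numpy as np

# Each "downstream task" is a toy quadratic with its own optimum (invented).
targets = [np.array([2.0, 0.0]), np.array([0.0, 2.0]), np.array([-2.0, 0.0])]

def task_loss(w, t):
    return 0.5 * np.sum((w - t) ** 2)

w = np.array([1.0, 1.0])  # arbitrary starting parameters
lr = 0.1
for _ in range(200):
    losses = [task_loss(w, t) for t in targets]
    worst = int(np.argmax(losses))    # pick the currently worst task
    w -= lr * (w - targets[worst])    # subgradient step on max_i loss_i

# Worst-case loss after training: much smaller than at the starting point.
worst_loss = max(task_loss(w, t) for t in targets)
```

With these three symmetric targets the minimax optimum sits near the origin, so the worst-case loss drops from 5.0 at the start to roughly 2, whereas plain averaging could leave one task far behind.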
@inproceedings{wang2023taskrobust,
  author = {Wang, Jianghui and Chen, Yang and Xie, Xingyu and Fang, Cong and Lin, Zhouchen},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
  pages = {9458--9478},
  publisher = {Curran Associates, Inc.},
  title = {Task-Robust Pre-Training for Worst-Case Downstream Adaptation},
  url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/1e4322fddd833f83c855660ac65e428d-Paper-Conference.pdf},
  volume = {36},
  year = {2023}
}

- Nonparametric Teaching of Attention Learners (ICLR'26). Chen Zhang*✉, Jianghui Wang*✉, Bingyang Cheng, Zhongtao Chen, Wendong Xu, Cong Wang, Marco Canini, Francesco Orabona, Yik-Chung Wu, and Ngai Wong. In International Conference on Learning Representations (ICLR), 2026
Attention learners, neural networks built on the attention mechanism, e.g., transformers, excel at learning the implicit relationships that relate sequences to their corresponding properties, e.g., mapping a given sequence of tokens to the probability of the next token. However, the learning process tends to be costly. To address this, we present a novel paradigm named Attention Neural Teaching (AtteNT) that reinterprets the learning process through a nonparametric teaching perspective. Specifically, the latter provides a theoretical framework for teaching mappings that are implicitly defined (i.e., nonparametric) via example selection. Such an implicit mapping is embodied through a dense set of sequence-property pairs, with the AtteNT teacher selecting a subset to accelerate convergence in attention learner training. By analytically investigating the role of attention on parameter-based gradient descent during training, and recasting the evolution of attention learners, shaped by parameter updates, through functional gradient descent in nonparametric teaching, we show for the first time that teaching attention learners is consistent with teaching importance-adaptive nonparametric learners. These new findings readily commit AtteNT to enhancing learning efficiency of attention learners. Specifically, we observe training time reductions of 13.01% for LLMs and 20.58% for ViTs, spanning both fine-tuning and training-from-scratch regimes. Crucially, these gains are achieved without compromising accuracy; in fact, performance is consistently preserved and often enhanced across a diverse set of downstream tasks.
@inproceedings{zhang2026nonparametric,
  title = {Nonparametric Teaching for Attention Learners},
  author = {Zhang, Chen and Wang, Jianghui and Cheng, Bingyang and Chen, Zhongtao and Xu, Wendong and Wang, Cong and Canini, Marco and Orabona, Francesco and Wu, Yik-Chung and Wong, Ngai},
  booktitle = {ICLR},
  year = {2026}
}

- MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning (arXiv). Jianghui Wang*, Yuxuan Wang*, Dongyan Zhao, and Zilong Zheng✉. arXiv, 2023
We introduce MoviePuzzle, a novel challenge that targets visual narrative reasoning and holistic movie understanding. Despite the notable progress witnessed in the realm of video understanding, most prior works fail to present tasks and models that address holistic video understanding and the innate visual narrative structures of long-form videos. To tackle this quandary, we put forth the MoviePuzzle task, which amplifies the temporal feature learning and structure learning of video models by reshuffling the shot, frame, and clip layers of movie segments in the presence of video-dialogue information. We start by establishing a carefully refined dataset based on MovieNet, dissecting movies into hierarchical layers and randomly permuting their orders. Besides benchmarking MoviePuzzle against prior art in movie understanding, we devise a Hierarchical Contrastive Movie Clustering (HCMC) model that considers the underlying structure and visual semantic orders for movie reordering. Specifically, through a pairwise and contrastive learning approach, we train models to predict the correct order of each layer. This equips them with the knack for deciphering the visual narrative structure of movies and handling the disorder lurking in video data. Experiments show that our approach outperforms existing state-of-the-art methods on the MoviePuzzle benchmark, underscoring its efficacy.
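The pairwise order-prediction step described above can be sketched with a toy example (this is not the released HCMC code): a tiny logistic model is trained to score whether shot i precedes shot j, with supervision from the true ordering, and a shuffled segment is then reordered by sorting with the learned scores. The features, sizes, and learning rate are invented for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_shots, dim = 6, 2
# Toy shot features: coordinate 0 carries a temporal cue, coordinate 1 is a
# fixed nuisance value (stand-in for appearance features).
feats = np.stack([np.arange(n_shots, dtype=float), np.ones(n_shots)], axis=1)

w = np.zeros(2 * dim)
lr = 0.05
for _ in range(200):
    for i in range(n_shots):
        for j in range(n_shots):
            if i == j:
                continue
            x = np.concatenate([feats[i], feats[j]])
            y = 1.0 if i < j else 0.0           # does shot i precede shot j?
            w += lr * (y - sigmoid(w @ x)) * x  # logistic-regression step

def score(s):
    # How strongly the model believes other shots precede shot s.
    return sum(sigmoid(w @ np.concatenate([feats[t], feats[s]]))
               for t in range(n_shots) if t != s)

shuffled = [3, 0, 5, 1, 4, 2]
recovered = sorted(shuffled, key=score)  # later shots get higher scores
```

Sorting by "how many shots are predicted to come before me" recovers the original order; the real model additionally applies this idea contrastively at the frame, shot, and clip levels.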
@article{wang2023moviepuzzle,
  title = {MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning},
  author = {Wang, Jianghui and Wang, Yuxuan and Zhao, Dongyan and Zheng, Zilong},
  journal = {arXiv preprint arXiv:2306.02252},
  year = {2023}
}
- Shuō Wén Jiě Zì: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training (ACL'23). Yuxuan Wang, Jianghui Wang, Dongyan Zhao, and Zilong Zheng. In Findings of ACL, 2023
We introduce CDBERT, a new learning paradigm that enhances the semantic understanding ability of Chinese PLMs with dictionary knowledge and the structure of Chinese characters. We name the two core modules of CDBERT Shuowen and Jiezi, where Shuowen refers to the process of retrieving the most appropriate meaning from Chinese dictionaries and Jiezi refers to the process of enhancing characters' glyph representations with structure understanding. To facilitate dictionary understanding, we propose three pre-training tasks, i.e., Masked Entry Modeling, Contrastive Learning for Synonym and Antonym, and Example Learning. We evaluate our method on both the modern Chinese understanding benchmark CLUE and the ancient Chinese benchmark CCLUE. Moreover, we propose a new polysemy discrimination task, PolyMRC, based on the collected dictionary of ancient Chinese. Our paradigm demonstrates consistent improvements over previous Chinese PLMs across all tasks, and yields significant gains in the few-shot setting of ancient Chinese understanding.
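The Shuowen retrieval step can be illustrated with a minimal sketch (the embeddings and function names below are invented for illustration, not CDBERT's actual representations): given a context embedding, pick the dictionary sense whose embedding is most similar under cosine similarity.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def retrieve_sense(context_vec, sense_vecs):
    """Return the index of the sense most similar to the context."""
    sims = [cosine(context_vec, s) for s in sense_vecs]
    return int(np.argmax(sims))

# Two toy sense embeddings for one polysemous character, and a context
# embedding that leans toward sense 0 (all vectors invented).
senses = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
context = np.array([0.9, 0.1])
best = retrieve_sense(context, senses)  # selects sense 0
```

In the actual model the context and entry representations come from the PLM, and the retrieved definition feeds the dictionary pre-training tasks such as Masked Entry Modeling.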
@inproceedings{wang2023shuo,
  title = {Shu\={o} W\'{e}n Ji\v{e} Z\`{i}: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training},
  author = {Wang, Yuxuan and Wang, Jianghui and Zhao, Dongyan and Zheng, Zilong},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2023},
  year = {2023}
}