Fuwen Tan

Research Scientist

Samsung AI Center, Cambridge

50/60 Station Road, Cambridge, UK

fuwen.tan@gmail.com


About me

I am a Research Scientist at the Samsung AI Center, Cambridge (SAIC-Cambridge), specializing in Vision & Language and on-device Large Language Models (LLMs). Here is my CV.

Updates

[01/2023] Our paper Effective Self-supervised Pre-training on Low-compute Networks without Distillation was accepted to ICLR 2023. The code is available in SSLight.

[07/2022] Pleased to be recognized as an Outstanding Reviewer for ICML 2022.

[07/2022] Our EdgeViTs paper was accepted to ECCV 2022. The code is available in EdgeViTs.

[07/2021] Our RRT paper was accepted to ICCV 2021. Code and pretrained models are released in RerankingTransformer.

[06/2021] I started working as a Researcher at the Samsung AI Center, Cambridge (SAIC-Cambridge).

[05/2021] I was recognized as an Outstanding Reviewer for CVPR 2021.

[04/2021] I successfully defended my PhD Dissertation: Learning Local Representations of Images and Text.

Research

Effective Self-supervised Pre-training on Low-compute Networks without Distillation

Fuwen Tan, Fatemeh Saleh, Brais Martinez
International Conference on Learning Representations (ICLR), 2023.
[ paper ]    [ code ]    [ poster ]    [ slides ]    [ bibtex ]

iBoot: Image-bootstrapped Self-Supervised Video Representation Learning

Fatemeh Saleh, Fuwen Tan, Adrian Bulat, Georgios Tzimiropoulos, Brais Martinez
[ paper ]    [ bibtex ]

EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers

Junting Pan, Adrian Bulat, Fuwen Tan, Xiatian Zhu, Lukasz Dudziak, Hongsheng Li, Georgios Tzimiropoulos, Brais Martinez
European Conference on Computer Vision (ECCV), 2022.
[ paper ]    [ code ]    [ bibtex ]

Instance-level Image Retrieval using Reranking Transformers

Fuwen Tan, Jiangbo Yuan, Vicente Ordonez
International Conference on Computer Vision (ICCV), 2021.
[ paper ]    [ code ]    [ bibtex ]

Curriculum Labeling: Self-paced Pseudo-Labeling for Semi-Supervised Learning

Paola Cascante-Bonilla, Fuwen Tan, Yanjun Qi, Vicente Ordonez
AAAI Conference on Artificial Intelligence (AAAI), 2021.
[ paper ]    [ code ]    [ bibtex ]

Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries

Fuwen Tan, Paola Cascante-Bonilla, Xiaoxiao Guo, Hui Wu, Song Feng, Vicente Ordonez
Conf. on Neural Information Processing Systems (NeurIPS), 2019
[ paper ]    [ code ]    [ poster ]    [ bibtex ]

Text2Scene: Generating Compositional Scenes from Textual Descriptions

Fuwen Tan, Song Feng, Vicente Ordonez
Conf. on Computer Vision and Pattern Recognition (CVPR), 2019 (Oral presentation, Best Paper Finalist).
Posts from NVIDIA Developer News, IBM Research Blog
[ paper ]    [ code ]    [ poster ]    [ slides ]    [ bibtex ]

Where and Who? Automatic Semantic-Aware Person Composition

Fuwen Tan, Crispin Bernier, Benjamin Cohen, Vicente Ordonez, Connelly Barnes
Winter Conf. on Applications of Computer Vision (WACV), 2018
[ paper ]     [ supplemental PDF ]     [ code ]     [ video ]     [ bibtex ]

FaceCollage: A Rapidly Deployable System for Real-time Head Reconstruction for On-The-Go 3D Telepresence

Fuwen Tan, Chi-Wing Fu, Teng Deng, Jianfei Cai, Tat Jen Cham
ACM Multimedia (ACM MM, full paper), 2017
[ paper ]    [ video ]    [ poster ]    [ bibtex ]

High-Quality Kinect Depth Filtering For Real-time 3D Telepresence

Mengyao Zhao, Fuwen Tan, Chi-Wing Fu, Chi-Keung Tang, Jianfei Cai, Tat Jen Cham
Conf. on Multimedia and Expo (ICME), 2013
[ IEEE Xplore ]    [ bibtex ]

Field-guided Registration for Feature-conforming Shape Composition

Hui Huang, Minglun Gong, Daniel Cohen-Or, Yaobin Ouyang, Fuwen Tan, Hao Zhang
ACM Transactions on Graphics (Proc. SIGGRAPH Asia), 2012
[ project ]    [ paper ]    [ bibtex ]

Thesis

PhD Dissertation: Learning Local Representations of Images and Text

Images and text inherently exhibit hierarchical structures, e.g. scenes built from objects, sentences built from words. In many computer vision and natural language processing tasks, learning accurate prediction models requires analyzing the correlation of the local primitives of both the input and output data. In this thesis, we develop techniques for learning local representations of images and text and demonstrate their effectiveness on visual recognition, retrieval, and synthesis. ...

[ thesis ]    [ slides ]