Publications

Preprint


Jianjin Xu, Zheyang Xiong, Xiaolin Hu, “Frame difference-based temporal loss for video stylization,” arXiv:2102.05822, 2021.

A simple loss that does not need the time-consuming estimation of optical flow.

Source codes

FDB loss

Xiao Li, Jianmin Li, Ting Dai, Jie Shi, Jun Zhu, Xiaolin Hu, “Rethinking Natural Adversarial Examples for Classification Models,” arXiv:2102.1173, Feb 2021.

How should we define the natural adversarial examples?

We propose the ImageNet-A-Plus dataset, which is modified from ImageNet-A.

ImageNet-A+

 


2024


Kai Li, Fenghua Xie, Hang Chen, Kexin Yuan, Xiaolin Hu, “An audio-visual speech separation model inspired by cortico-thalamo-cortical circuits,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 10, pp. 6637-6651, Oct 2024.

A brain-inspired model for audio-visual speech separation. The state-of-the-art model on this task.

Supplementary video and source codes

arXiv version

CTCNet

Xiaopei Zhu, Peiyang Xu, Guanning Zeng, Yingpeng Dong, Xiaolin Hu, “Natural language induced adversarial images,” ACM Multimedia, Melbourne, Australia, Oct 28-Nov 1, 2024.

AI

Xianghao Kong, Jinyu Chen, Wenguan Wang, Hang Su, Xiaolin Hu, Yi Yang, Si Liu, “Controllable navigation instruction generation with chain of thought prompting,” The 18th European Conference on Computer Vision (ECCV), MiCo Milano, Italy, Sep 29th-Oct 4th, 2024.

AI

Xiao Li, Yining Liu, Na Dong, Sitian Qin, Xiaolin Hu, “PartImageNet++ dataset: scaling up part-based models for robust recognition,” The 18th European Conference on Computer Vision (ECCV), MiCo Milano, Italy, Sep 29th-Oct 4th, 2024.

We propose a new dataset called PartImageNet++, providing high-quality part segmentation annotations for all categories of ImageNet-1K.

Dataset and source codes

PartImageNet++

Kai Li, Runxuan Yang, Fuchun Sun, Xiaolin Hu, “IIANet: an intra- and inter-modality attention network for audio-visual speech separation,” The 41st International Conference on Machine Learning (ICML), Vienna, Austria, July 21-27, 2024.

Inspired by the cross-modal processing mechanism in the brain, we design intra- and inter-attention modules to integrate auditory and visual information for efficient speech separation. The model simulates audio-visual fusion in different levels of sensory cortical areas as well as higher association areas such as parietal cortex.

Demo and source codes

IIANet

Xiao Li, Qiongxiu Li, Zhanhao Hu, and Xiaolin Hu, “On the privacy effect of data enhancement via the lens of memorization,” IEEE Transactions on Information Forensics and Security, vol. 19, pp. 4686-4699, 2024.

We investigated several nonintuitive and seemingly contradictory conclusions about privacy, data augmentation and adversarial robustness.

Source codes

CTCNet

Gang Zhang, Junnan Chen, Guohuan Gao, Jianmin Li, Si Liu, Xiaolin Hu, “SAFDNet: A simple and effective network for fully sparse 3D object detection,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, June 17-21, 2024. (Oral, 90 out of about 11500)

Source codes

small bulbs

Xiaopei Zhu, Yuqiu Liu, Zhanhao Hu, Jianmin Li, Xiaolin Hu, “Infrared adversarial car stickers”, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, June 17-21, 2024.

We hide real cars against infrared car detectors.

small bulbs

Xiao Li, Wei Zhang, Yining Liu, Zhanhao Hu, Bo Zhang, Xiaolin Hu, “Language-driven anchors for zero-shot adversarial robustness,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, June 17-21, 2024.

Source codes

small bulbs

Xiaopei Zhu, Xiao Li, Jianmin Li, Zheyao Wang, Xiaolin Hu, “Hiding from thermal imaging pedestrian detectors in the physical world,” Neurocomputing, vol. 564, article 126963, 2024.

Extension of our AAAI 2021 paper (use small bulbs).

small bulbs

Samuel Pegg, Kai Li, Xiaolin Hu, “RTFS-Net: recurrent time-frequency modelling for efficient audio-visual speech separation,” Proc. of the 12th International Conference on Learning Representations (ICLR), Vienna, Austria, May 7-11, 2024.

The first time-frequency domain audio-visual speech separation method that outperforms all contemporary time-domain counterparts. It uses only 1/100 of the parameters of VisualVoice, one of the previous SOTA methods.

Source codes

RTFS-net

Zhongfu Shen, Jiajun Yang, Qiangqiang Zhang, Kuiyu Wang, Xiaohui Lv, Xiaolin Hu, Jian Ma, Song-Hai Shi, “How variable progenitor clones construct a largely invariant neocortex,” National Science Review, vol. 11, no. 1, January 2024, nwad247.



progenitor clones

 


2023


Xiaopei Zhu, Zhanhao Hu, Siyuan Huang, Jianmin Li, Xiaolin Hu, Zheyao Wang, “Hiding from infrared detectors in real world with adversarial clothes,” Applied Intelligence, vol. 53, pp. 29537-29555, 2023.

An infrared adversarial attack method based on carbon fiber heaters. A physical attack.

 

carbon fiber heater attack

Jiawei Shan, Gang Zhang, Chufeng Tang, Hujie Pan, Qiankun Yu, Guanhao Wu, and Xiaolin Hu, “Focal distillation from high-resolution data to low-resolution data for 3D object detection,” IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 12, pp. 14064-14075, 2023.

A method to utilize 64-channel LiDAR data to train an object detector that works on 16-channel LiDAR data.

Focal distillation

Samuel Pegg, Kai Li, Xiaolin Hu, “TDFNet: an efficient audio-visual speech separation model with top-down fusion,” Proceedings of the 13th International Conference on Information Science and Technology (ICIST), Cairo, Egypt, December 8-14, 2023.

We combine our TDANet and CTCNet for efficient audio-visual speech separation.

Source codes

TDFNet

Runxuan Yang, Yuyang Peng and Xiaolin Hu, “A fast high-fidelity source-filter vocoder with lightweight neural modules,” IEEE Transactions on Audio, Speech and Language Processing, vol. 31, pp. 3362-3373, 2023.

Singing voice synthesis

Demo and source codes

vocoder

Gang Zhang, Junnan Chen, Guohuan Gao, Jianmin Li, Xiaolin Hu, “HEDNet: A Hierarchical Encoder-Decoder Network for 3D Object Detection in Point Clouds,” Advances in Neural Information Processing Systems (NeurIPS), New Orleans, Dec 10-16, 2023.

We use encoder-decoder blocks to capture long-range dependencies among features in the spatial space. HEDNet was 50% faster than DSVT.

Source codes

HEDNet

Xinyi Li, Yanan Zhong, Hang Chen, Jianshi Tang, Xiaojian Zheng, Wen Sun, Yang Li, Dong Wu, Bin Gao, Xiaolin Hu, He Qian, Huaqiang Wu, “Memristors-based dendritic neuron for high-efficiency spatial-temporal information processing,” Advanced Materials, vol. 35, 2203684, 2023.

A brain-inspired device based on memristors for simulating the dynamic behavior of dendrites of biological neurons. Its power consumption is 1000X lower than that of a GPU running the same network.

 

memristor

Jianjin Xu, Zhaoxiang Zhang, Xiaolin Hu, “Extracting semantic knowledge from GANs with unsupervised learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 8, pp. 9654-9668, 2023.

This method has an interesting application: you can change the segmentation mask to generate desired images.

Demo and source codes

adjacency

Xiao Li, Ziqi Wang, Bo Zhang, Fuchun Sun, Xiaolin Hu, “Recognizing Object by components with human prior knowledge enhances adversarial robustness of deep neural networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 7, pp. 8861-8873, July 2023.

arXiv version

This method is inspired by a well-known theory in cognitive psychology – recognition-by-components.

 

ROCK

Hector Martel, Julius Richter, Kai Li, Xiaolin Hu, Timo Gerkmann, “Audio-visual speech separation in noisy environments with a lightweight iterative model,” Proceedings of the INTERSPEECH, Dublin, Ireland, August 20-24, 2023.

Demo

Source codes

 

image segmentation

Chufeng Tang, Lingxi Xie, Xiaopeng Zhang, Xiaolin Hu, Qi Tian, “Visual recognition by request,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, June 18-22, 2023.

Source codes

 

image segmentation

Zhanhao Hu, Wenda Chu, Xiaopei Zhu, Hui Zhang, Bo Zhang, Xiaolin Hu, “Physically realizable natural-looking clothing textures evade person detectors via 3D modeling,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, June 18-22, 2023.

If you wear our designed camouflage clothing, the AI behind cameras may not detect you. (^_^)

Demo

 camouflage texture

Kai Li, Runxuan Yang, Xiaolin Hu, “An efficient encoder-decoder architecture with top-down attention for speech separation,” Proc. of the 11th International Conference on Learning Representations (ICLR), Kigali, Rwanda, May 1-5, 2023.

Top-down neural projections are ubiquitous in the brain. We found that this kind of projection is very useful for solving the Cocktail Party Problem.

Speech separation demo and music separation demo

Source codes

TDANet

 


2022


Tianren Zhang, Shangqi Guo, Tian Tan, Xiaolin Hu, Feng Chen, “Adjacency constraint for efficient hierarchical reinforcement learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 4, pp. 4152-4166, 2022.

arXiv version

Extended version of our previous NeurIPS 2020 paper.

 

adjacency

Hang Chen, Chufeng Tang, Xiaolin Hu. "Dense contrastive loss for instance segmentation." Proc. of the British Machine Vision Conference (BMVC), London, UK, Nov. 21-24, 2022.

Source codes

 

image segmentation

Kai Li, Xiaolin Hu, Yi Luo, “On the use of deep mask estimation module for neural source separation systems,” Proceedings of the InterSpeech, Incheon, Korea, Sept. 18-22, 2022.

 

 

source separation

Haoran Chen, Jianmin Li, Simone Frintrop, and Xiaolin Hu, “The MSR-Video to Text dataset with clean annotations,” Computer Vision and Image Understanding, vol. 225, article no. 103581, 2022.

arXiv version

After cleaning the annotations, the performance of existing models increases. The cleaned dataset will be made available on request.

Source codes

 

MSR-VTT

Ting-Yu Kuo, Yuanda Liao, Kai Li, Bo Hong, Xiaolin Hu, “Inferring mechanisms of auditory attentional modulation with deep neural networks,” Neural Computation, vol. 34, no. 11, pp. 2205-2231, 2022.

With the help of DNNs, we suggest that the projection of top-down attention signals to lower stages within the auditory pathway of the human brain plays a more significant role than the higher stages in solving the "cocktail party problem".

Source codes

cocktail party problem

Shangqi Guo, Qi Yan, Xin Su, Xiaolin Hu, Feng Chen, “State-temporal compression in reinforcement learning with the reward-restricted geodesic metric,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 5572-5589, 2022.

 

state-temporal compression

Xiaolin Hu, Chufeng Tang, Hang Chen, Xiao Li, Jianmin Li, Zhaoxiang Zhang, “Improving image segmentation with boundary patch refinement,” International Journal of Computer Vision, vol. 130, pp. 2571-2589, 2022.

A simple yet effective post-processing method to refine the results of image segmentation (semantic segmentation, instance segmentation and panoptic segmentation) models. Extension of our CVPR 2021 work.

Source codes

Boundary patch refinement network

Chufeng Tang, Lingxi Xie, Gang Zhang, Xiaopeng Zhang, Qi Tian, Xiaolin Hu, “Active pointly-supervised instance segmentation,” Proc. of European Conference on Computer Vision (ECCV), Tel-Aviv, Israel, Oct. 23-27, 2022.

arXiv version

We present an economical active learning setting, APIS, for instance segmentation, which saves annotation cost dramatically.

Source codes

APIS

Zhanhao Hu, Jun Zhu, Bo Zhang, Xiaolin Hu, “Amplification trojan network: attack deep neural networks by amplifying their inherent weakness,” Neurocomputing, vol. 505, pp. 142-153, 2022.

A new trojan network for attacking DNNs.

Source codes

trojan network

Jianfeng Wang, Thomas Lukasiewicz, Daniela Massiceti, Xiaolin Hu, Vladimir Pavlovic, Alexandros Neophytou, “NP-match: when neural processes meet semi-supervised learning,” Proc. of the 39th International Conference on Machine Learning (ICML), Baltimore, Maryland, USA, July 17-23, 2022.

 

AI

Jianfeng Wang, Xiaolin Hu, “Convolutional neural networks with gated recurrent connections,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 7, pp. 3421-3425, 2022.

Extension of a previous work. We demonstrate the good performance of the Gated RCNN on image classification and object detection.

Source codes

Gated RCNN

Zhanhao Hu, Siyuan Huang, Xiaopei Zhu, Fuchun Sun, Bo Zhang, Xiaolin Hu, “Adversarial texture for fooling person detectors in the physical world”, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, Louisiana, June 19-24, 2022. (Oral)

Supplementary Materials including Supplementary video

Supplementary video

arXiv version

Source codes

This paper tells you how to make an “invisibility cloak”!

invisible cloak

Xiaopei Zhu, Zhanhao Hu, Siyuan Huang, Jianmin Li, Xiaolin Hu, “Infrared invisible clothing: hiding from infrared detectors at multiple angles in real world”, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, Louisiana, June 19-24, 2022. (Oral)

Supplementary Materials including Supplementary videos

Supplementary video 1 and supplementary video 2

arXiv version

This paper tells you how to make an “invisibility cloak” for infrared cameras!

invisible cloak

Xiaolin Hu, Zhigang Zeng, “Bridging the functional and wiring properties of V1 neurons through sparse coding,” Neural Computation, vol. 34, no. 1, pp. 104-137, 2022.

A standard excitatory-inhibitory neural network shows numerous functional and wiring properties of neurons in layer 2/3 of V1 after unsupervised learning on natural images. Many properties are predictions yet to be verified in biological experiments. One interesting property is the small-worldness.

Source codes

small world

 


2021


Xiaolin Hu, Kai Li, Weiyi Zhang, Yi Luo, Jean-Marie Lemercier, Timo Gerkmann, “Speech separation using an asynchronous fully recurrent convolutional neural network,” Advances in Neural Information Processing Systems (NeurIPS), Virtual, Dec 6-14, 2021.

A brain-inspired model for speech separation.

Demo and Source codes

A-FRCNN

Hang Chen, Xiao Li, Zefan Wang, Xiaolin Hu, “Robust logo detection in E-commerce images by data augmentation,” Proc. of the 29th ACM International Conference on Multimedia Workshop, pp. 4789-4793, Chengdu, China, Oct 20-24, 2021.

Ranked 5/36489 in ACM MM2021 Robust Logo Detection Grand Challenge.

Source codes

logo detection

Jiaheng Liu, Yudong Wu, Yichao Wu, Chuming Li, Xiaolin Hu, Ding Liang, Mengyu Wang, “DAM: Discrepancy Alignment Metric for Face Recognition,” Proc. of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3814-3823, Virtual, Oct 11-17, 2021.

face recognition

Ge Gao, Mikko Lauri, Xiaolin Hu, Jianwei Zhang, Simone Frintrop, “CloudAAE: learning 6D object pose regression with on-line data synthesis on point clouds,” Proc. of the IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, May 30-June 5, 2021.

arXiv version

Source codes

AAE

Gang Zhang, Xin Lu, Jingru Tan, Jianmin Li, Zhaoxiang Zhang, Quanquan Li, Xiaolin Hu, “RefineMask: Towards high-quality instance segmentation with fine-grained features,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, June 19-25, 2021.

arXiv version

A coarse-to-fine strategy.

Source codes

refineMask

Chufeng Tang, Hang Chen, Xiao Li, Jianmin Li, Zhaoxiang Zhang, Xiaolin Hu, “Look closer to segment better: boundary patch refinement for instance segmentation,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, June 19-25, 2021.

arXiv version

A post-processing model applicable to any instance segmentation method. We ranked 1st on the Cityscapes leaderboard as of the CVPR 2021 submission deadline.

Source codes

BPR

Jianfeng Wang, Thomas Lukasiewicz, Xiaolin Hu, Jianfei Cai, Zhenghua Xu, “RSG: A simple yet effective module for learning imbalanced datasets,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, June 19-25, 2021.

Source codes

AI

Xiang Li, Wenhai Wang, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang, “Generalized Focal Loss V2: learning reliable localization quality estimation for dense object detection,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, June 19-25, 2021.

arXiv version

An extension of the GFL in our NeurIPS 2020 paper.

Source codes

GFLv2

Weiyi Zhang, Shuning Zhao, Le Liu, Jianmin Li, Xingliang Cheng, Thomas Fang Zheng, Xiaolin Hu, “Attack on practical speaker verification system using universal adversarial perturbations,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Virtual, June 6-11, 2021.

A physical attack on speaker verification systems.

Source codes

speaker verification

Xiaopei Zhu, Xiao Li, Jianmin Li, Zheyao Wang, Xiaolin Hu, “Fooling thermal infrared pedestrian detectors in real world using small bulbs,” The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI), Virtual, Feb 2-9, 2021.

Supplementary document and Supplementary video

arXiv version

If you hold a cardboard embedded with small bulbs designed by us, you would not be detected by YOLOv3.

small bulbs

Han Liu, Shifeng Zhang, Ke Lin, Jing Wen, Jianmin Li, Xiaolin Hu, “Vocabulary-wide credit assignment for training image captioning models,” IEEE Transactions on Image Processing, vol. 30, pp. 2450-2460, 2021.

At each generation step, we assign a reward to every word in the vocabulary.

Source codes

credit assignment

Zi Yin, Valentin Yiu, Xiaolin Hu, Liang Tang, “End-to-end face parsing via interlinked convolutional neural networks,” Cognitive Neurodynamics, vol. 15, pp. 169-179, 2021.

Extension of a previous work for face parsing.

Source codes

face parsing

 


2020


Tianren Zhang, Shangqi Guo, Tian Tan, Xiaolin Hu, Feng Chen, “Generating adjacency-constrained subgoals in hierarchical reinforcement learning,” Advances in Neural Information Processing Systems (NeurIPS), Dec 6-12, 2020.

Spotlight paper.

A method for reducing the high-level action space for hierarchical reinforcement learning.

Supplementary Materials

Source codes

pop song structure

Xiang Li, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang, “Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection,” Advances in Neural Information Processing Systems (NeurIPS), Dec 6-12, 2020.

We propose a joint representation of localization quality and classification for object detection methods.

Source codes

GFL

Haoran Chen, Ke Lin, Alexander Maye, Jianmin Li, and Xiaolin Hu, “A semantics-assisted video captioning model trained with scheduled sampling,” Frontiers in Robotics and AI, September 30, 2020.

Source codes

video-captioning

Weilun Chen, Zhaoxiang Zhang, Xiaolin Hu, Baoyuan Wu, “Boosting decision-based black-box adversarial attacks with random sign flip,” European Conference on Computer Vision, pp. 276-293. Springer, Cham, 2020.

 

adversarial

Jian Wu, Xiaoguang Liu, Xiaolin Hu, Jun Zhu, “PopMNet: generating structured pop music melodies using neural networks,” Artificial Intelligence, vol. 286, article 103303, 2020.

Generate the structure of a song first, then generate the melody.

Project page

pop song structure

Yulong Wang, Hang Su, Bo Zhang, Xiaolin Hu, “Learning reliable visual saliency for model explanations,” IEEE Transactions on Multimedia, vol. 22, no. 7, pp. 1796-1807, 2020.

Suppose you input an image of a dog into a deep neural network and use an existing method to highlight the region of the dog by setting the output label to "dog"; the result looks fine. But if you set the output label to "cat", you will find some weird results.
reliable explanation

Yulong Wang, Hang Su, Bo Zhang, Xiaolin Hu, “Interpret neural networks by extracting critical subnetworks,” IEEE Transactions on Image Processing, vol. 29, pp. 6707-6720, 2020.

Extension of (Wang et al. CVPR 2018). We extend the idea of critical routes for individual image samples to image categories.

 

melody

Jian Wu, Changran Hu, Yulong Wang, Xiaolin Hu, Jun Zhu, “A hierarchical recurrent neural network for symbolic melody generation,” IEEE Transactions on Cybernetics, vol. 50, no. 6, pp. 2749-2757, 2020.

arXiv:1712.05274

Automatic melody generation

All melodies used in experiments are available

melody

Jianqiao Guo, Yajun Yin, Xiaolin Hu, Gexue Ren, “Self-similar network model for fractional-order neuronal spiking: implications of dendritic spine functions,” Nonlinear Dynamics, vol. 100, pp. 921-935, 2020.

fractional-order

Haoran Chen, Jianmin Li, Xiaolin Hu, “Delving deeper into the decoder for video captioning,” The 24th European Conference on Artificial Intelligence (ECAI), Santiago de Compostela, Spain, August 29-September 2, 2020.

With a few techniques we boost the state-of-the-art results on video captioning benchmark datasets.

Source codes

video captioning

Ge Gao, Mikko Lauri, Yulong Wang, Xiaolin Hu, Jianwei Zhang, Simone Frintrop, “6D object pose regression via supervised learning on point clouds,” IEEE International Conference on Robotics and Automation (ICRA), Paris, France, May 31 to June 4, 2020.

Source codes

point cloud

Qiushan Guo, Xinjiang Wang, Yichao Wu, Zhipeng Yu, Ding Liang, Xiaolin Hu and Ping Luo, “Online knowledge distillation via collaborative learning,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, June 16-18, 2020.

knowledge distillation

Yudong Wu, Yichao Wu, Ruihao Gong, Yuanhao Lv, Ken Chen, Ding Liang, Xiaolin Hu, Xianglong Liu and Junjie Yan, “Rotation consistent margin loss for efficient low-bit face recognition”, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, June 16-18, 2020.

open set

Yulong Wang, Xiaolu Zhang, Xiaolin Hu, Bo Zhang, Hang Su, “Dynamic network pruning with interpretable layerwise channel selection,” The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), New York, USA, Feb 7-12, 2020.

Source codes

dynamic pruning

Yulong Wang, Xiaolu Zhang, Lingxi Xie, Jun Zhou, Hang Su, Bo Zhang, Xiaolin Hu, “Pruning from scratch,” The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), New York, USA, Feb 7-12, 2020.

Supplementary material

arXiv:1909.12579v1

We find that pre-training an over-parameterized model is not necessary for obtaining the target pruned structure. One can prune the model with its random initial weights.

Source codes

pruning-from-scratch

Xiang Li, Jun Li, Xiaolin Hu, Jian Yang, “Line-CNN: end-to-end traffic line detection with line proposal unit,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 1, pp. 248-258, 2020.

An end-to-end model to detect traffic lines at a speed of 30 f/s on a Titan X GPU. It's potentially useful for autonomous driving systems.

lineCNN

 


2019


Fangzhou Liao, Ming Liang, Zhe Li, Xiaolin Hu, Sen Song, “Evaluate the malignancy of pulmonary nodules using the 3-D deep leaky noisy-or network,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 11, pp. 3484-3495, 2019.

arXiv:1711.08324

The winning solution to the Kaggle Data Science Bowl 2017. A 500,000 US dollar solution!

Source codes

lung

Chufeng Tang, Lu Sheng, Zhaoxiang Zhang, Xiaolin Hu, “Improving pedestrian attribute recognition with weakly-supervised multi-scale attribute-specific localization,” Proc. of IEEE International Conference on Computer Vision (ICCV), Seoul, Korea, Oct 27–Nov 2, 2019. pp. 4997-5006.

Supplementary Materials

Source codes

 

pedestrian detection

Xiao Jin, Baoyun Peng, Yichao Wu, Yu Liu, Jiaheng Liu, Ding Liang, Junjie Yan, Xiaolin Hu, “Knowledge distillation via route constrained optimization,” Proc. of IEEE International Conference on Computer Vision (ICCV), Seoul, Korea, Oct 27–Nov 2, 2019. pp. 1345-1354.

Oral paper.

A new knowledge distillation method for training a small neural network.

 

knowledge distillation

Xiang Li, Shuo Chen, Xiaolin Hu, Jian Yang, “Understanding the Disharmony Between Dropout and Batch Normalization by Variance Shift,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, June 15–21, 2019.

This paper explains why the combination of Dropout and Batch Normalization (BN) often leads to worse performance in many modern neural networks.

 

BN-dropout

Xiang Li, Wenhai Wang, Xiaolin Hu, Jian Yang, “Selective Kernel Networks,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, June 15–21, 2019.

A neural network that performs better than ResNet, ResNeXt, SENet etc. for image classification.

Source codes

SKN

Niange Yu, Xiaolin Hu, Binheng Song, Jian Yang, Jianwei Zhang, “Topic-oriented image captioning based on order-embedding,” IEEE Transactions on Image Processing, vol. 28, no. 6, pp. 2743-2754, 2019.

Generate captions for images from different perspectives.

Source codes

image captioning

Shangqi Guo, Zhaofei Yu, Fei Deng, Xiaolin Hu, Feng Chen, “Hierarchical Bayesian inference and learning in spiking neural networks,” IEEE Transactions on Cybernetics, vol. 49, no. 1, pp. 133-145, 2019.

Spiking neural networks for Bayesian inference.

WTA network

Fangzhou Liao, Xi Chen, Xiaolin Hu, Sen Song, “Estimation of the volume of the left ventricle from MRI images using deep neural networks,” IEEE Transactions on Cybernetics, vol. 49, no. 2, pp. 495-504, 2019.

This algorithm won 4th place in the Kaggle Data Science Bowl 2016.

Source codes

heart network

Qingtian Zhang, Xiaolin Hu, Bo Hong, Bo Zhang, “A hierarchical sparse coding model predicts acoustic feature encoding in both auditory midbrain and cortex,” PLOS Computational Biology, 15(2): e1006766, 2019.

We used a hierarchical sparse coding model to reveal acoustic feature encoding mechanism in the auditory system. For example, interestingly, the artificial neurons in top layers exhibited phonetic feature encoding property. We found an important role of response sparseness for these properties to emerge.

Source codes

phoneme-encoding

Wei Feng, Wentao Liu, Tong Li, Jing Peng, Chen Qian, Xiaolin Hu, “Turbo learning framework for human-object interactions recognition and human pose estimation,” The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), Honolulu, Hawaii, USA, Jan 27-Feb 1, 2019.

Learn two tasks simultaneously, so that they help each other iteratively.

turbo learning

 


2018


Yi Zhang, Weichao Qiu, Qi Chen, Xiaolin Hu, Alan Yuille, “UnrealStereo: controlling hazardous factors to analyze stereo vision”, Proc. of the International Conference on 3D Vision, Verona, Italy, September 5-8, 2018.

A synthetic image generation tool that enables control of hazardous factors, such as making objects more specular or transparent, for developing 3D vision algorithms.

denoiser

Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Xiaolin Hu, Jun Zhu, “Defense against adversarial attacks using high-level representation guided denoiser,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, June 18-22, 2018.

Winning solution of the NIPS 2017 Competition on Adversarial Attacks and Defenses organized by Google Brain.

Source codes1

Source codes2

denoiser

Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, Jianguo Li, “Boosting adversarial attacks with momentum,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, June 18-22, 2018.

Spotlight paper.

Winning solution of the NIPS 2017 Competition on Adversarial Attacks and Defenses organized by Google Brain.

Source codes for non-targeted attack

Source codes for targeted attack

adversarial examples

Yulong Wang, Hang Su, Bo Zhang, Xiaolin Hu, “Interpret neural networks by identifying critical data routing paths,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, June 18-22, 2018.

We found that images with similar semantic meaning have similar critical routes in deep CNNs.

Source codes

routes

Bo Li, Junjie Yan, Wei Wu, Zheng Zhu, Xiaolin Hu, “High Performance Visual Tracking with Siamese Region Proposal Network,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, June 18-22, 2018.

siamese

Wentao Liu, Jie Chen, Cheng Li, Chen Qian, Xiao Chu, Xiaolin Hu, “A cascaded inception of inception network with attention modulated feature fusion for human pose estimation,” The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), New Orleans, USA, Feb 2-7, 2018.

Erratum

Three techniques for human pose estimation: 1. inception of inception block, 2. attention to individual levels, 3. cascaded network.

pose

 


2017


Chengxu Zhuang, Yulong Wang, Daniel Yamins, Xiaolin Hu, “Deep learning predicts correlation between a functional signature of higher visual areas and sparse firing of neurons,” Frontiers in Computational Neuroscience, 2017. Doi: 10.3389/fncom.2017.00100

Study the visual system using deep learning models.

Dataset used in the paper

tornado

Jianfeng Wang, Xiaolin Hu, “Gated recurrent convolution neural network for OCR,” Advances in Neural Information Processing Systems (NIPS), Long Beach, USA, Dec. 4-9, 2017.

A modified version of our RCNN proposed in 2015.

Source codes

GRCNN

Zekun Hao, Yu Liu, Hongwei Qin, Junjie Yan, Xiu Li, Xiaolin Hu, “Scale-aware face detection,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, July 21–26, 2017.

Prior to face detection, use a CNN to predict the scale distribution of the faces.

hierarchy

Tiancheng Sun, Yulong Wang, Jian Yang, Xiaolin Hu, “Convolution neural networks with two pathways for image style recognition,” IEEE Transactions on Image Processing, vol. 26, no. 9, pp. 4102-4113, 2017.

The Gram matrix technique proposed by Gatys et al. is used to classify image styles. Experiments are conducted on three benchmark datasets: WikiPaintings, Flickr Style and AVA Style.

Source codes

art

J. Wu, L. Ma, X. Hu, “Delving deeper into convolutional neural networks for camera relocalization,” Proc. of IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 29- June 3, 2017.

We present three techniques for enhancing the performance of convolutional neural networks for camera relocalization.

branchnet

F. Liao, X. Hu, S. Song, “Emergence of V1 recurrent connectivity pattern in artificial neural network,” Computational and Systems Neuroscience (Cosyne), Salt Lake City, Feb. 23-26, 2017.

 

 

ai

Y. Zhao, X. Jin, X. Hu, “Recurrent convolutional neural network for speech processing,” Proc. of the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, USA, March 5-9, 2017.

Applications of recurrent CNN to speech processing.

Source codes

speech

 


2016


Q. Zhang, X. Hu, H. Luo, J. Li, X. Zhang, B. Zhang, “Deciphering phonemes from syllables in blood oxygenation level-dependent signals in human superior temporal gyrus,” European Journal of Neuroscience, vol. 43, no. 6, pp. 773-781, 2016.

This is a "mind reading" work. We managed to decode the phoneme information from functional magnetic resonance imaging (fMRI) signals of subjects when they listened to nine syllables. The results indicated that phonemes have unique representations in the superior temporal gyrus (STG). We also revealed certain response patterns of the phonemes in STG.

mind reading

H. Qin, J. Yan, X. Li, X. Hu, “Joint Training of Cascaded CNN for Face Detection,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, June 26-July 1, 2016, pp. 3456-3465.

 

face detection

S. Wang, Y. Yang, X. Hu, J. Li, B. Xu, “Solving the K-shortest paths problem in timetable-based public transportation systems,” Journal of Intelligent Transportation Systems: Technology, Planning, and Operations, vol. 20, no. 5, pp. 413-427, 2016.

An extended version of the IMECS 2012 paper.

railway

 


2015


Z. Cheng, Z. Deng, X. Hu, B. Zhang, T. Yang, “Efficient reinforcement learning of a reservoir network model of parametric working memory achieved with a cluster population winner-take-all readout mechanism,” Journal of Neurophysiology, vol. 114, no. 6, pp. 3296-3305, 2015.

A reservoir network is trained with reinforcement learning to model parametric working memory in the monkey brain.

reservoir network

X. Li, S. Qian, F. Peng, J. Yang, X. Hu, and R. Xia, “Deep convolutional neural network and multi-view stacking ensemble in Ali mobile recommendation algorithm competition,” The First International Workshop on Mobile Data Mining & Human Mobility Computing (ICDM 2015).

The team won the Ali competition, ranking 1st among 7186 teams.

tianchi competition

M. Liang, X. Hu, B. Zhang, “Convolutional neural networks with intra-layer recurrent connections for scene labeling,” Advances in Neural Information Processing Systems (NIPS), Montréal, Canada, Dec. 7-12, 2015.

caffe configs

An application of the recurrent CNN. It achieves excellent performance on the Stanford Background and SIFT Flow datasets.

ai

Y. Zhou, X. Hu, B. Zhang, “Interlinked convolutional neural networks for face parsing,” International Symposium on Neural Networks (ISNN), Jeju, Korea, Oct. 15-18, 2015, pp. 222-231.

A two-stage pipeline is proposed for face parsing and both stages use iCNN, which is a set of CNNs with interlinkage in the convolutional layers.

Source codes

iCNN

M. Liang, X. Hu, “Recurrent convolutional neural network for object recognition,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, USA, June 7-12, 2015, pp. 3367-3375.

cuda-convnet2 configs (used in the paper)

caffe configs

torch version

pytorch version (by Xiao Li)

Typical deep learning models for object recognition, including HMAX and CNN, have feedforward architectures. This is a crude approximation of the visual pathway in the brain, since there are abundant recurrent connections in the visual cortex. We show that adding recurrent connections to CNN improves its performance in object recognition.
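To make the idea of adding recurrent connections to a convolutional layer concrete, here is a minimal 1D sketch. It is only an illustration under stated assumptions, not the paper's model: the helper names `conv1d_same` and `recurrent_conv_layer` are hypothetical, the layer is single-channel and one-dimensional, and bias terms and normalization are omitted. The static input drives the layer at every time step, and the layer's own previous state is fed back through a second kernel.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 1D cross-correlation; output has the same length as signal."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def recurrent_conv_layer(u, w_ff, w_rec, steps):
    """Unroll a recurrent convolutional layer for a fixed number of steps.

    u:     static input signal (assumed 1D for this sketch)
    w_ff:  feedforward kernel applied to the input
    w_rec: recurrent kernel applied to the layer's previous state
    """
    ff = conv1d_same(u, w_ff)            # feedforward drive, same at every step
    x = [max(0.0, v) for v in ff]        # state at t = 0: feedforward only
    for _ in range(steps):
        rec = conv1d_same(x, w_rec)      # feedback from the previous state
        x = [max(0.0, f + r) for f, r in zip(ff, rec)]  # ReLU of the sum
    return x


state = recurrent_conv_layer([1.0, 0.0, -1.0],
                             w_ff=[0.0, 1.0, 0.0],
                             w_rec=[0.0, 0.5, 0.0],
                             steps=2)
print(state)
```

With `steps=0` the function reduces to an ordinary feedforward convolution followed by ReLU, so the recurrent kernel is the only addition over a plain convolutional layer.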

RCNN

X. Zhang, Q. Zhang, X. Hu, B. Zhang, “Neural representation of three-dimensional acoustic space in the human temporal lobe,” Frontiers in Human Neuroscience, vol. 9, article 203, 2015. doi: 10.3389/fnhum.2015.00203

Humans are able to localize the sounds in the environment. How the locations are encoded in the cortex remains elusive. Using fMRI and machine learning techniques, we investigated how the temporal cortex of humans encodes the 3D acoustic space.

fMRI

M. Liang, X. Hu, “Predicting eye fixations with higher-level visual features,” IEEE Transactions on Image Processing, vol. 24, no. 3, pp. 1178-1189, 2015.

codes

There is a debate about whether low-level features or high-level features are more important for predicting eye fixations. Through experiments, we show that mid-level features and object-level features are indeed more effective for this task. We obtained state-of-the-art results on several benchmark datasets including Toronto, MIT, Kootstra and ASCMN at the time of submission.

saliency

M. Liang, X. Hu, “Feature selection in supervised saliency prediction,” IEEE Transactions on Cybernetics, vol. 45, no. 5, pp. 900-912, 2015.

(Download the computed saliency maps here)

There is a trend for incorporating more and more features for supervised learning of visual saliency on natural images. We find much redundancy among these features by showing that a small subset of features leads to excellent performance on several benchmark datasets. In addition, these features are robust across different datasets.

saliency

Q. Zhang, X. Hu, B. Zhang, “Comparison of L1-Norm SVR and Sparse Coding Algorithms for Linear Regression,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 8, pp. 1828-1833, 2015.

MATLAB codes

The close connection between the L1-norm support vector regression (SVR) and sparse coding (SC) is revealed and some typical algorithms are compared for linear regression. The results show that the SC algorithms outperform the L1-SVR algorithms in efficiency. The SC algorithms are then used to design RBF networks, which are more efficient than the well-known orthogonal least squares algorithm.

RBF

 


2014


T. Shi, M. Liang, X. Hu, “A reverse hierarchy model for predicting eye fixations,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, USA, June 24-27, 2014, pp. 2822-2829.

We present a novel approach for saliency detection in natural images. The idea is from a theory in cognitive neuroscience, called reverse hierarchy theory, which proposes that attention propagates from the top level of the visual hierarchy to the bottom level.

rhm

X. Hu, J. Zhang, J. Li, B. Zhang, “Sparsity-regularized HMAX for visual recognition,” PLOS ONE, vol. 9, no. 1, e81813, 2014.

MATLAB codes

We show that a deep learning model with alternating sparse coding/ICA and local max pooling can learn higher-level features on images without labels. After training on a dataset with 1500 images, in which there were 150 unaligned faces, 6 units on the top layer became face detectors. This took a few hours on a laptop computer with 2 cores, in contrast to Google's 16,000 cores in a similar project.

sparse hmax

X. Hu, J. Zhang, P. Qi, B. Zhang, “Modeling response properties of V2 neurons using a hierarchical K-means model,” Neurocomputing, vol. 134, pp. 198-205, 2014.

We show that K-means, a simple data clustering algorithm, can be used to model some properties of V2 neurons if several K-means layers are stacked into a hierarchical structure. It is more biologically plausible than the sparse DBN for doing the same thing because it can be realized by competitive Hebbian learning. This is an extended version of our ICONIP'12 paper.

kmeans

P. Qi, X. Hu, “Learning nonlinear statistical regularities in natural images by modeling the outer product of image intensities,” Neural Computation, vol. 26, no. 4, pp. 693–711, 2014.

MATLAB codes

This is a hierarchical model aimed at modeling the properties of complex cells in the primary visual cortex (V1). It can be regarded as a simplified version of Karklin and Lewicki's model published in 2009.

outerproduct

 


2013


P. Qi, S. Su, X. Hu, “Modeling outer products of features for image classification,” Proc. of the 6th International Conference on Advanced Computational Intelligence (ICACI), Hangzhou, China, Oct. 19-21, 2013, pp.334-338.

The method described in our 2014 Neural Computation paper was applied to SIFT features for image classification (in the SPM framework) and achieved higher accuracy on two datasets than traditional sparse coding.

ai

M. Liang, M. Yuan, X. Hu, J. Li and H. Liu, “Traffic sign detection by ROI extraction and histogram features-based recognition,” Proc. of the 2013 International Joint Conference on Neural Networks (IJCNN), Dallas, USA, Aug. 4-9, 2013, pp. 739-746.

The paper describes our method used for the IJCNN 2013 German Traffic Sign Detection Competition. This method achieved 100% accuracy on the Prohibitory signs!

traffic sign

Y. Wu, Y. Liu, J. Li, H. Liu, X. Hu, “Traffic sign detection based on convolutional neural networks,” Proc. of the 2013 International Joint Conference on Neural Networks (IJCNN), Dallas, USA, Aug. 4-9, 2013, pp. 747-753.

The paper describes another method used for the IJCNN 2013 German Traffic Sign Detection Competition. This method ranked 2nd and 4th on the Mandatory and Danger signs, respectively!

traffic sign

 


2012


Y. Yang, Q. He, X. Hu, “A compact neural network for training support vector machines,” Neurocomputing, vol. 86, pp. 193-198, 2012.

A simple analog circuit is proposed for training SVMs. It takes advantage of the nonlinear properties of operational amplifiers.

svm

X. Hu and J. Wang, “Solving the assignment problem using continuous-time and discrete-time improved dual networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 5, pp. 821-827, 2012.

Assign n entities to n slots, where each assignment has a cost, so that the total cost is minimized.
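For readers unfamiliar with the assignment problem itself, the sketch below states it in code using exhaustive search. This is deliberately not the neural network approach of the paper: it is a brute-force baseline, practical only for very small n, and the function name `solve_assignment` and the toy cost matrix are assumptions made for this example.

```python
from itertools import permutations


def solve_assignment(cost):
    """Brute-force solver for a small n x n assignment problem.

    cost[i][j] is the cost of assigning entity i to slot j.
    Returns (assignment, total), where assignment[i] is the slot given
    to entity i and total is the minimum total cost.
    Runs in O(n!) time, so it is only meant to illustrate the problem.
    """
    n = len(cost)

    def total_cost(perm):
        return sum(cost[i][perm[i]] for i in range(n))

    best = min(permutations(range(n)), key=total_cost)
    return list(best), total_cost(best)


# Hypothetical 3 x 3 cost matrix.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
assignment, total = solve_assignment(cost)
print(assignment, total)
```

Each entity gets exactly one slot and each slot is used exactly once, which is why the search space is the set of permutations of the slots.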

assignment

X. Hu, P. Qi, B. Zhang, “Hierarchical K-means algorithm for modeling visual area V2 neurons,” Proc. of 19th International Conference on Neural Information Processing (ICONIP), Doha, Qatar, Nov. 12-15, 2012, pp. 373-381.

An extended version is in our 2014 Neurocomputing paper.

ai

Y. Yang, S. Wang, X. Hu, J. Li, B. Xu, “A modified k-shortest paths algorithm for solving the earliest arrival problem on the time-dependent model of transportation systems,” Proc. of International MultiConference of Engineers and Computer Scientists (IMECS), Hong Kong, March 14-16, 2012, pp. 1560-1567.

If one wants to go from city A to city B by train and arrive at B as early as possible, could you provide some "good" itineraries? Here is a fast solution. It gives the K best solutions for any cities A and B in mainland China within 30 ms on a small server when K<100.

railway

2011

X. Hu, J. Wang, “Solving the assignment problem with the improved dual neural network,” Proc. of 8th International Symposium on Neural Networks, Guilin, China, May 29-June 1, 2011, pp. 547-556.

2010

X. Hu and B. Zhang, “A Gaussian attractor network for memory and recognition with experience-dependent learning,” Neural Computation, vol. 22, no. 5, pp. 1333-1357, 2010.

X. Hu, C. Sun and B. Zhang, “Design of recurrent neural networks for solving constrained least absolute deviation problems,” IEEE Transactions on Neural Networks, vol. 21, no. 7, pp. 1073-1086, July 2010.

X. Hu, “Dynamic system methods for solving mixed linear matrix inequalities and linear vector inequalities and equalities,” Applied Mathematics and Computation, vol. 216, pp. 1181-1193, 2010.

2009

X. Hu and B. Zhang, “An alternative recurrent neural network for solving variational inequalities and related optimization problems,” IEEE Transactions on Systems, Man and Cybernetics - Part B, vol. 39, no. 6, pp. 1640-1645, Dec. 2009.

X. Hu and B. Zhang, “A new recurrent neural network for solving convex quadratic programming problems with an application to the k-winners-take-all problem,” IEEE Transactions on Neural Networks, vol. 20, no. 4, pp. 654–664, April 2009.

X. Hu, “Applications of the general projection neural network in solving extended linear-quadratic programming problems with linear constraints,” Neurocomputing, vol. 72, no. 4-6, pp. 1131-1137, Jan. 2009.

X. Hu, J. Wang and B. Zhang, “Motion planning with obstacle avoidance for kinematically redundant manipulators based on two recurrent neural networks,” Proc. of IEEE International Conference on Systems, Man, and Cybernetics, San Antonio, USA, Oct. 2009, pp. 143-148.

X. Hu, B. Zhang, “Another simple recurrent neural network for quadratic and linear programming,” Proc. of 6th International Symposium on Neural Networks, Wuhan, China, May 26-29, 2009, pp. 116-125.

2008

X. Hu and J. Wang, “An improved dual neural network for solving a class of quadratic programming problems and its k-winners-take-all application,” IEEE Transactions on Neural Networks, vol. 19, no. 12, pp. 2022–2031, Dec. 2008.

X. Hu, Z. Zeng, B. Zhang, “Three global exponential convergence results of the GPNN for solving generalized linear variational inequalities,” Proc. of 5th International Symposium on Neural Networks, Beijing, China, Sep. 24-28, 2008.

2007

X. Hu and J. Wang, “Design of general projection neural networks for solving monotone linear variational inequalities and linear and quadratic optimization problems,” IEEE Transactions on Systems, Man and Cybernetics - Part B, vol. 37, no. 5, pp. 1414-1421, Oct. 2007.

X. Hu and J. Wang, “Solving generally constrained generalized linear variational inequalities using the general projection neural networks,” IEEE Transactions on Neural Networks, vol. 18, no. 6, pp. 1697-1708, Nov. 2007.

X. Hu and J. Wang, “A recurrent neural network for solving a class of general variational inequalities,” IEEE Transactions on Systems, Man and Cybernetics - Part B, vol. 37, no. 3, pp. 528-539, 2007.

X. Hu and J. Wang, “Solving the k-winners-take-all problem and the oligopoly Cournot-Nash equilibrium problem using the general projection neural networks,” Proc. of 14th International Conference on Neural Information Processing (ICONIP), Kitakyushu, Japan, Nov. 13-16, 2007, pp. 703-712.

S. Liu, X. Hu and J. Wang, “Obstacle Avoidance for Kinematically Redundant Manipulators Based on an Improved Problem Formulation and the Simplified Dual Neural Network,” Proc. of IEEE Three-Rivers Workshop on Soft Computing in Industrial Applications, Passau, Bavaria, Germany, August 1-3, 2007, pp. 67-72.

X. Hu and J. Wang, “Convergence of a recurrent neural network for nonconvex optimization based on an augmented Lagrangian function,” Proc. of 4th International Symposium on Neural Networks, Part III, Nanjing, China, June 3-7, 2007.

2006

X. Hu and J. Wang, “Solving pseudomonotone variational inequalities and pseudoconvex optimization problems using the projection neural network,” IEEE Transactions on Neural Networks, vol. 17, no. 6, pp. 1487-1499, 2006.

X. Hu and J. Wang, “Solving extended linear programming problems using a class of recurrent neural networks,” Proc. of 13th International Conference on Neural Information Processing, Part II, Hong Kong, Oct. 3-6, 2006.

 



© 2024 Xiaolin Hu. All rights reserved.