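CVPR is one of the three top conferences in computer vision and has long drawn close attention. The papers it accepts represent the latest directions and the state of the art in the field. This year, CVPR 2019 will be held in Long Beach, California, in the Los Angeles area. When the acceptance results were announced last month, they set off a small wave of excitement in the CV community, and many articles interpreting CVPR papers appeared in a short time.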
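According to statistics compiled from the paper list on the official CVPR website, 1,300 papers were accepted this year, compared with 643 in 2016, 783 in 2017 and 979 in 2018. This is one sign that computer vision is a flourishing field. As the foundation of how machines perceive the world, and as one of the most important AI technologies, it is receiving more and more attention.
Researchers around the world are currently immersed in the flood of CVPR 2019 papers, hoping to be among the first to see the latest results. In this article, however, we set the CVPR 2019 papers aside and look back at how the CVPR 2018 papers have fared.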
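Using Google Scholar data, we identified the five most-cited papers among the 979 papers accepted to CVPR 2018. We hoped citation counts would show which of these papers have attracted the most attention from researchers worldwide.
We searched Google Scholar for each paper in the CVPR 2018 paper list (http://openaccess.thecvf.com/CVPR2018.py) and obtained the following data. The figures were retrieved on March 19, 2019. Because the counts for 2nd and 3rd place are very close, we do not rank those two papers against each other: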
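The most-cited CVPR 2018 papers all earned substantial attention and recognition from the academic community, mainly because of their originality. For example, the top-ranked Squeeze-and-Excitation Networks (SE-Net for short) has a very simple structure and is easy to deploy. It requires no new functions or layer types, and it adds little to model size or computational cost.
With SE-Net, the authors reduced the top-5 error on ImageNet to 2.251% (the previous best was 2.991%) and won the image classification task of the ImageNet 2017 competition. Over the past year, SE-Net has been widely used in industry as a high-performing building block for deep networks. It has also served as a reference point for other researchers' work.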
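The two error rates above are consistent with the "∼25% relative improvement" claimed in the SE-Net abstract quoted later in this article. The relative reduction in top-5 error works out as:

```latex
\frac{2.991\% - 2.251\%}{2.991\%} = \frac{0.740}{2.991} \approx 0.247 \approx 25\%
```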
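Another paper that drew wide attention is Learning Transferable Architectures for Scalable Image Recognition from Google Brain, which uses one neural network to learn the architecture of another.
Below are the abstracts of the five papers, for readers' reference: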
Convolutional neural networks are built upon the convolution operation, which extracts informative features by fusing spatial and channel-wise information together within local receptive fields. In order to boost the representational power of a network, several recent approaches have shown the benefit of enhancing spatial encoding.
In this work, we focus on the channel relationship and propose a novel architectural unit, which we term the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modeling interdependencies between channels. We demonstrate that by stacking these blocks together, we can construct SENet architectures that generalise extremely well across challenging datasets.
Crucially, we find that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at a minimal additional computational cost. SENets formed the foundation of our ILSVRC 2017 classification submission which won first place and significantly reduced the top-5 error to 2.251%, achieving a ∼25% relative improvement over the winning entry of 2016. Code and models are available at https://github.com/hujie-frank/SENet.
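The abstract describes the SE block only in words. Below is a minimal, dependency-free sketch of one block's forward pass. It assumes the squeeze / excitation / scale layout from the SE-Net paper: global average pooling, a bottleneck of two fully connected layers with reduction ratio r, and a sigmoid gate. The function name se_block and the nested-list tensor layout are illustrative choices, not the authors' code, which is at the GitHub link above.

```python
import math


def se_block(x, w1, b1, w2, b2):
    """Recalibrate the channels of x (a list of C channels, each an H x W list of floats).

    w1: (C // r) x C weights of the reducing FC layer, b1 its biases.
    w2: C x (C // r) weights of the expanding FC layer, b2 its biases.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation, step 1: FC layer reducing C -> C/r, followed by ReLU.
    hidden = [max(0.0, sum(wij * zj for wij, zj in zip(wi, z)) + bi)
              for wi, bi in zip(w1, b1)]
    # Excitation, step 2: FC layer C/r -> C, followed by a sigmoid gate in (0, 1).
    s = [1.0 / (1.0 + math.exp(-(sum(wij * hj for wij, hj in zip(wi, hidden)) + bi)))
         for wi, bi in zip(w2, b2)]
    # Scale: multiply every activation in channel c by its gate s[c].
    return [[[v * sc for v in row] for row in ch] for ch, sc in zip(x, s)]
```

With all weights and biases set to zero, every gate equals sigmoid(0) = 0.5, so the block simply halves its input. Learned weights let the gates differ per channel, which is the "recalibration" the abstract refers to.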
We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two new operations, pointwise group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy. Experiments on ImageNet classification and MS COCO object detection demonstrate the superior performance of ShuffleNet over other structures, e.g. lower top-1 error (absolute 7.8%) than recent MobileNet on ImageNet classification task, under the computation budget of 40 MFLOPs. On an ARM-based mobile device, ShuffleNet achieves ∼13× actual speedup over AlexNet while maintaining comparable accuracy.
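Of the two operations named in the abstract, channel shuffle is the less familiar one. After a grouped 1×1 (pointwise) convolution, each output group sees only its own input group. Shuffling interleaves the channels so that the next grouped layer receives inputs from every group. The minimal sketch below works on a plain list of channel labels so the permutation is easy to inspect, using the reshape-transpose-flatten view of the shuffle. pointwise_conv_madds is a hypothetical helper, not from the paper. It shows that g groups divide the cost of a 1×1 convolution by g, assuming the channel counts are divisible by g.

```python
def channel_shuffle(channels, groups):
    """Permute channels so that each output group mixes all input groups.

    Equivalent to reshaping C channels into (groups, C // groups),
    transposing to (C // groups, groups), and flattening back.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    # Output position i * groups + g takes channel g * per_group + i.
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


def pointwise_conv_madds(h, w, c_in, c_out, groups=1):
    """Multiply-adds of a 1x1 convolution on an h x w map.

    Each output channel only connects to c_in // groups input channels,
    so grouping divides the cost by `groups`.
    """
    return h * w * (c_in // groups) * c_out
```

For example, channel_shuffle(list(range(6)), 3) returns [0, 2, 4, 1, 3, 5]: each consecutive triple now holds one channel from each of the three original groups.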
Developing neural network image classification models often requires significant architecture engineering. In this paper, we study a method to learn the model architectures directly on the dataset of interest. As this approach is expensive when the dataset is large, we propose to search for an architectural building block on a small dataset and then transfer the block to a larger dataset.
The key contribution of this work is the design of a new search space (which we call the “NASNet search space”) which enables transferability. In our experiments, we search for the best convolutional layer (or “cell”) on the CIFAR-10 dataset and then apply this cell to the ImageNet dataset by stacking together more copies of this cell, each with their own parameters to design a convolutional architecture, which we name a “NASNet architecture”.
We also introduce a new regularization technique called ScheduledDropPath that significantly improves generalization in the NASNet models. On CIFAR-10 itself, a NASNet found by our method achieves 2.4% error rate, which is state-of-the-art. Although the cell is not searched for directly on ImageNet, a NASNet constructed from the best cell achieves, among the published works, state-of-the-art accuracy of 82.7% top-1 and 96.2% top-5 on ImageNet. Our model is 1.2% better in top-1 accuracy than the best human-invented architectures while having 9 billion fewer FLOPS – a reduction of 28% in computational demand from the previous state-of-the-art model.
When evaluated at different levels of computational cost, accuracies of NASNets exceed those of the state-of-the-art human-designed models. For instance, a small version of NASNet also achieves 74% top-1 accuracy, which is 3.1% better than equivalently-sized, state-of-the-art models for mobile platforms. Finally, the image features learned from image classification are generically useful and can be transferred to other computer vision problems. On the task of object detection, the learned features by NASNet used with the Faster-RCNN framework surpass state-of-the-art by 4.0% achieving 43.1% mAP on the COCO dataset.
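The NASNet abstract names ScheduledDropPath without describing it. The NASNet paper presents it as a variant of DropPath in which each path inside a cell is dropped at random, with a probability that increases linearly over training. The sketch below assumes a linear ramp from 0 to a chosen final probability and the common convention of rescaling kept paths by 1/keep_prob. The function names, the final probability and the list-of-floats representation of a path are illustrative assumptions, not details from the abstract.

```python
import random


def scheduled_drop_prob(step, total_steps, final_drop_prob):
    """Drop probability rising linearly from 0 at step 0 to final_drop_prob at the end."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return final_drop_prob * frac


def drop_paths(path_outputs, drop_prob, rng=None):
    """Randomly zero whole paths of a cell during training.

    path_outputs: list of paths, each a list of floats (that path's activations).
    Kept paths are rescaled by 1 / keep_prob so the expected output is unchanged.
    """
    if rng is None:
        rng = random.Random()
    keep_prob = 1.0 - drop_prob
    result = []
    for path in path_outputs:
        if rng.random() < keep_prob:
            result.append([v / keep_prob for v in path])  # keep and rescale
        else:
            result.append([0.0] * len(path))              # drop the entire path
    return result
```

Early in training nearly every path survives; by the final step each path is dropped with probability final_drop_prob. Dropping whole paths rather than individual activations is what distinguishes DropPath from ordinary dropout.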
In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3.
The MobileNetV2 architecture is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design.
Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet classification, COCO object detection, VOC image segmentation. We evaluate the trade-offs between accuracy, and number of operations measured by multiply-adds (MAdd), as well as actual latency, and the number of parameters.
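The MobileNetV2 abstract describes its building block in a few dense sentences: thin bottleneck input, expansion, depthwise filtering, a linear projection, and a shortcut between the thin layers. It also measures cost in multiply-adds. The pure-Python sketch below illustrates one stride-1 inverted residual block plus a multiply-add counter for it. It assumes ReLU6 as the non-linearity (the choice reported in the MobileNetV2 paper), omits batch normalization and biases, and stores feature maps as nested lists x[H][W][C]. All function names are hypothetical, and this is not the authors' implementation.

```python
def relu6(v):
    """ReLU6 non-linearity: clamp to the range [0, 6]."""
    return min(max(v, 0.0), 6.0)


def pointwise(x, weights, activation=None):
    """1x1 convolution: apply a (C_out x C_in) matrix at every pixel of x[H][W][C_in]."""
    out = []
    for row in x:
        new_row = []
        for pixel in row:
            y = [sum(wt * p for wt, p in zip(w_row, pixel)) for w_row in weights]
            if activation is not None:
                y = [activation(v) for v in y]
            new_row.append(y)
        out.append(new_row)
    return out


def depthwise3x3(x, kernels, activation=None):
    """3x3 depthwise convolution (stride 1, zero padding 1); kernels[c] is a 3x3 list."""
    h, w, c = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for ch in range(c):
                acc = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < h and 0 <= jj < w:
                            acc += kernels[ch][di + 1][dj + 1] * x[ii][jj][ch]
                out[i][j][ch] = activation(acc) if activation is not None else acc
    return out


def inverted_residual(x, expand_w, dw_kernels, project_w):
    """Thin input -> expand (1x1 + ReLU6) -> depthwise 3x3 (+ ReLU6) -> linear 1x1.

    The projection back to the thin bottleneck has no activation, and the
    shortcut connects the two thin tensors when their channel counts match.
    """
    hidden = pointwise(x, expand_w, relu6)
    hidden = depthwise3x3(hidden, dw_kernels, relu6)
    y = pointwise(hidden, project_w)  # linear bottleneck: no non-linearity
    if len(y[0][0]) == len(x[0][0]):
        y = [[[a + b for a, b in zip(py, px)] for py, px in zip(ry, rx)]
             for ry, rx in zip(y, x)]
    return y


def inverted_residual_madds(h, w, d_in, d_out, t, k=3):
    """Multiply-adds of the three convolutions in a stride-1 block."""
    expanded = t * d_in
    expand = h * w * d_in * expanded       # 1x1: d_in -> t * d_in
    depthwise = h * w * expanded * k * k   # one k x k filter per expanded channel
    project = h * w * expanded * d_out     # 1x1: t * d_in -> d_out
    return expand + depthwise + project    # = h*w*t*d_in*(d_in + k*k + d_out)
```

Counting the three convolutions gives h·w·t·d′·(d′ + k² + d″) multiply-adds for d′ input channels, d″ output channels and expansion factor t. With t = 6, k = 3 and d′ = d″ = 24, the two 1×1 convolutions account for 48 of the 57 units in the bracket, which is why the depthwise step can be called lightweight.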
Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions. This is the natural basis for attention to be considered.
Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings. Applying this approach to image captioning, our results on the MSCOCO test server establish a new state-of-the-art for the task, achieving CIDEr / SPICE / BLEU-4 scores of 117.9, 21.5 and 36.9, respectively. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge.
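The last abstract splits attention into two parts. A detector supplies a set of region features (bottom-up), and a task-specific context decides how much weight each region gets (top-down). The sketch below shows the top-down weighting step in a minimal form. It assumes a soft-attention form in the spirit of the paper's captioning model: tanh of a projected region feature plus a projected query, a learned scoring vector, and a softmax over regions. The region features are taken as given rather than produced by Faster R-CNN, and the names attend, w_v, w_q and w_a are hypothetical.

```python
import math


def attend(region_feats, query, w_v, w_q, w_a):
    """Weight bottom-up region features by top-down relevance to a query.

    region_feats: list of k feature vectors, one per proposed image region.
    query: vector summarising the task context (e.g. a decoder or question state).
    w_v, w_q: matrices projecting features and query into a shared space.
    w_a: vector that scores each projected region.
    Returns (attention weights, attended feature vector).
    """
    def matvec(m, v):
        return [sum(a * b for a, b in zip(row, v)) for row in m]

    q_proj = matvec(w_q, query)
    scores = []
    for v in region_feats:
        joint = [math.tanh(a + b) for a, b in zip(matvec(w_v, v), q_proj)]
        scores.append(sum(a * b for a, b in zip(w_a, joint)))
    # Softmax over regions (subtract the max score for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Attended feature: a convex combination of the region vectors.
    dim = len(region_feats[0])
    attended = [sum(a * v[d] for a, v in zip(alphas, region_feats)) for d in range(dim)]
    return alphas, attended
```

If w_a is all zeros, every region gets the same weight and the output is the plain mean of the region features; learned weights let the query shift attention toward the regions most relevant to the caption word or question at hand.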
• By default, PaperWeekly treats every article as a first publication and marks each one with an “Original” label.