research-article
- Authors:
- Zhuoran Ji The University of Hong Kong, Hong Kong, China
The University of Hong Kong, Hong Kong, China
View Profile
- Cho-Li Wang The University of Hong Kong, Hong Kong, China
The University of Hong Kong, Hong Kong, China
View Profile
ICS '22: Proceedings of the 36th ACM International Conference on SupercomputingJune 2022Article No.: 10Pages 1–12https://doi.org/10.1145/3524059.3532368
- 2citation
- 404
- Downloads
Metrics
Total Citations2Total Downloads404Last 12 Months197
Last 6 weeks20
- Get Citation Alerts
New Citation Alert added!
This alert has been successfully added and will be sent to:
You will be notified whenever a record that you have chosen has been cited.
To manage your alert preferences, click on the button below.
Manage my Alerts
New Citation Alert!
Please log in to your account
- Publisher Site
- Get Access
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing
Efficient exact K-nearest neighbor graph construction for billion-scale datasets using GPUs with tensor cores
Pages 1–12
PreviousChapterNextChapter
ABSTRACT
Approximate nearest neighbor search plays a fundamental role in many areas, and the k-nearest neighbor graph (KNNG) becomes a promising solution, especially in high-dimensional space. The advantages of KNNG come at the expense of high construction time, which is in quadratic time complexity in the number of points. Many GPUs have adopted specialized hardware units for matrix multiplication, providing an even higher arithmetic throughput. This paper presents flyKNNG, a GPU KNNG construction algorithm for billion-scale datasets. It deploys the distance matrix calculation to matrix multiplication units and adopts on-the-fly top-k selection to avoid transferring the exa-scale distance matrix to/from device memory. flyKNNG co-designs the two key algorithms to optimize the overall performance: the distance matrix calculation algorithm considers the data communication costs and pruning strategy of top-k selection; the top-k selection algorithm is also specially designed for on-the-fly usage, which impacts the data reuse and instruction-level parallelism of the distance matrix calculation as little as possible. Moreover, our top-k selection algorithm is optimized for the special data layout adopted by most matrix multiplication units. Experiments show that flyKNNG achieves 4.67X speedup compared with CUML/FAISS, one of the state-of-the-art approaches.
References
- David Adedayo Adeniyi, Zhaoqiang Wei, and Y Yongquan. 2016. Automated web usage data mining and recommendation system using K-Nearest Neighbor (KNN) classification method. Applied Computing and Informatics 12, 1 (2016), 90--108.Google Scholar
Cross Ref
- Tolu Alabi, Jeffrey D Blanchard, Bradley Gordon, and Russel Steinbach. 2012. Fast k-selection algorithms for graphics processing units. Journal of Experimental Algorithmics (JEA) 17 (2012), 4--1.Google Scholar
- Giuseppe Amato, Fabrizio Falchi, Claudio Gennaro, and Fausto Rabitti. 2016. YFCC100M-HNfc6: a large-scale deep features benchmark for similarity search. In International Conference on Similarity Search and Applications. Springer, 196--209.Google Scholar
Cross Ref
- AMD. 2020. Introducing AMD CDNA architecture. https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdfGoogle Scholar
- Artem Babenko and Victor Lempitsky. 2016. Efficient indexing of billion-scale datasets of deep descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2055--2063.Google Scholar
- Ricardo J Barrientos, José I Gómez, Christian Tenllado, Manuel Prieto Matias, and Mauricio Marin. 2011. kNN query processing in metric spaces using GPUs. In European Conference on Parallel Processing. Springer, 380--392.Google Scholar
Cross Ref
- Vishwanath Bijalwan, Vinay Kumar, Pinki Kumari, and Jordan Pascual. 2014. KNN based machine learning approach for text and document mining. International Journal of Database Theory and Application 7, 1 (2014), 61--70.Google Scholar
Cross Ref
- Deng Cai. 2019. A revisit of hashing algorithms for approximate nearest neighbor search. IEEE Transactions on Knowledge and Data Engineering (2019).Google Scholar
- Jieyang Chen, Nan Xiong, Xin Liang, Dingwen Tao, Sihuan Li, Kaiming Ouyang, Kai Zhao, Nathan DeBardeleben, Qiang Guan, and Zizhong Chen. 2019. TSM2: optimizing tall-and-skinny matrix-matrix multiplication on GPUs. In Proceedings of the ACM International Conference on Supercomputing. 106--116.Google Scholar
Digital Library
- Jack Choquette and Wish Gandhi. 2020. Nvidia A100 GPU: Performance & innovation for GPU computing. In 2020 IEEE Hot Chips 32 Symposium (HCS). IEEE Computer Society, 1--43.Google Scholar
Cross Ref
- Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM conference on recommender systems. 191--198.Google Scholar
Digital Library
- Wei Dong, Charikar Moses, and Kai Li. 2011. Efficient k-nearest neighbor graph construction for generic similarity measures. In Proceedings of the 20th international conference on World wide web. 577--586.Google Scholar
Digital Library
- Carlos Eiras-Franco, David Martínez-Rego, Leslie Kanthan, César Piñeiro, Antonio Bahamonde, Bertha Guijarro-Berdiñas, and Amparo Alonso-Betanzos. 2020. Fast Distributed k NN Graph Construction Using Auto-tuned Locality-sensitive Hashing. ACM Transactions on Intelligent Systems and Technology (TIST) 11, 6 (2020), 1--18.Google Scholar
Digital Library
- Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. 2017. Fast approximate nearest neighbor search with the navigating spreading-out graph. arXiv preprint arXiv:1707.00143 (2017).Google Scholar
- Fabian Groh, Lukas Ruppert, Patrick Wieschollek, and Hendrik Lensch. 2019. Ggnn: Graph-based gpu nearest neighbor search. arXiv preprint arXiv:1912.01059 (2019).Google Scholar
- Sadegh Bafandeh Imandoust and Mohammad Bolandraftar. 2013. Application of k-nearest neighbor (knn) approach for predicting economic events: Theoretical background. International Journal of Engineering Research and Applications 3, 5 (2013), 605--610.Google Scholar
- Herve Jegou, Matthijs Douze, and Cordelia Schmid. 2010. Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence 33, 1 (2010), 117--128.Google Scholar
- Hervé Jégou, Romain Tavenard, Matthijs Douze, and Laurent Amsaleg. 2011. Searching in one billion vectors: re-rank with source coding. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 861--864.Google Scholar
Cross Ref
- Zhuoran Ji and Cho-Li Wang. 2021. Accelerating DBSCAN Algorithm with AI Chips for Large Datasets. In 50th International Conference on Parallel Processing. 1--11.Google Scholar
- Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2019. Billion-scale similarity search with gpus. IEEE Transactions on Big Data (2019).Google Scholar
Cross Ref
- Dhiraj Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, Kunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, Nataraj Jammalamadaka, Jianyu Huang, Hector Yuen, et al. 2019. A study of BFLOAT16 for deep learning training. arXiv preprint arXiv:1905.12322 (2019).Google Scholar
- Ivan Komarov, Ali Dashti, and Roshan M D'Souza. 2014. Fast k-NNG construction with GPU-based quick multi-select. PloS one 9, 5 (2014), e92409.Google Scholar
Cross Ref
- Quansheng Kuang and Lei Zhao. 2009. A practical GPU based kNN algorithm. In Proceedings. The 2009 International Symposium on Computer Science and Computational Technology (ISCSCI 2009). Citeseer, 151.Google Scholar
- Wen Li, Ying Zhang, Yifang Sun, Wei Wang, Mingjie Li, Wenjie Zhang, and Xuemin Lin. 2019. Approximate nearest neighbor search on high dimensional data---experiments, analyses, and improvement. IEEE Transactions on Knowledge and Data Engineering 32, 8 (2019), 1475--1488.Google Scholar
Cross Ref
- Heng Liao, Jiajin Tu, Jing Xia, and Xiping Zhou. 2019. Davinci: A scalable architecture for neural network computing. In 2019 IEEE Hot Chips 31 Symposium (HCS). IEEE Computer Society, 1--44.Google Scholar
Cross Ref
- Bruno Meyer, Aurora Pozo, and Wagner M Nunan Zola. 2021. Warp-centric K-Nearest Neighbor Graphs construction on GPU. In 50th International Conference on Parallel Processing Workshop. 1--10.Google Scholar
Digital Library
- NVIDIA. 2021. CUDA Basic Linear Algebra Subroutine library. https://docs.nvidia.com/cuda/cublas/index.htmlGoogle Scholar
- NVIDIA. 2021. CUDA Toolkit Documentation. https://docs.nvidia.com/cuda/index.htmlGoogle Scholar
- NVIDIA. 2021. GPU Machine Learning Algorithms. https://rapids.aiGoogle Scholar
- NVIDIA. 2021. NVIDIA Ampere GA102 GPU Architecture. https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdfGoogle Scholar
- NVIDIA. 2021. NVIDIA Nsight Compute Kernel Profiling Guide. https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.htmlGoogle Scholar
- Md Aamir Raihan, Negar Goli, and Tor M Aamodt. 2019. Modeling deep learning accelerator enabled gpus. In 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE, 79--92.Google Scholar
Cross Ref
- Tal Ridnik, Emanuel Ben-Baruch, Asaf Noy, and Lihi Zelnik-Manor. 2021. Imagenet-21k pretraining for the masses. arXiv preprint arXiv:2104.10972 (2021).Google Scholar
- Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. 2015. Imagenet large scale visual recognition challenge. International journal of computer vision 115, 3 (2015), 211--252.Google Scholar
- SCI-Compiler. 2018. Ping Pong Buffer. http://www.scicompiler.doud/userguide/PingPongBuffer.htmlGoogle Scholar
- Anil Shanbhag, Holger Pirk, and Samuel Madden. 2018. Efficient top-k query processing on massively parallel hardware. In Proceedings of the 2018 International Conference on Management of Data. 1557--1570.Google Scholar
Digital Library
- Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition. 1--9.Google Scholar
Cross Ref
- Xiaoxin Tang, Zhiyi Huang, David Eyers, Steven Mills, and Minyi Guo. 2015. Efficient selection algorithm for fast k-nn search on gpus. In 2015 IEEE International Parallel and Distributed Processing Symposium. IEEE, 397--406.Google Scholar
Digital Library
- Vasily Volkov. 2016. Understanding latency hiding on GPUs. University of California, Berkeley.Google Scholar
- Hui Wang, Wan-Lei Zhao, and Xiangxiang Zeng. 2021. Large-Scale Approximate k-NN Graph Construction on GPU. arXiv preprint arXiv:2103.15386 (2021).Google Scholar
- Mengzhao Wang, Xiaoliang Xu, Qiang Yue, and Yuxiang Wang. 2021. A Comprehensive Survey and Experimental Comparison of Graph-Based Approximate Nearest Neighbor Search. arXiv preprint arXiv:2101.12631 (2021).Google Scholar
- Liu Yingfan, Cheng Hong, and Cui Jiangtao. 2021. Revisiting k-Nearest Neighbor Graph Construction on High-Dimensional Data: Experiments and Analyses. arXiv preprint arXiv:2112.02234 (2021).Google Scholar
Cited By
View all
Index Terms
Efficient exact K-nearest neighbor graph construction for billion-scale datasets using GPUs with tensor cores
Computing methodologies
Parallel computing methodologies
Parallel algorithms
Massively parallel algorithms
Information systems
Information retrieval
Retrieval models and ranking
Top-k retrieval in databases
Recommendations
- Confirmation Sampling for Exact Nearest Neighbor Search
Similarity Search and Applications
Abstract
Locality-sensitive hashing (LSH), introduced by Indyk and Motwani in STOC ’98, has been an extremely influential framework for nearest neighbor search in high-dimensional data sets. While theoretical work has focused on the approximate nearest ...
Read More
- K-Nearest Neighbor Finding Using MaxNearestDist
Similarity searching often reduces to finding the k nearest neighbors to a query object. Finding the k nearest neighbors is achieved by applying either a depth- first or a best-first algorithm to the search hierarchy containing the data. These ...
Read More
- Approximate Direct and Reverse Nearest Neighbor Queries, and the k-nearest Neighbor Graph
SISAP '09: Proceedings of the 2009 Second International Workshop on Similarity Search and Applications
Retrieving the \emph{k-nearest neighbors} of a query object is a basic primitive in similarity searching. A related, far less explored primitive is to obtain the dataset elements which would have the query object within their own \emph{k}-nearest ...
Read More
Login options
Check if you have access through your login credentials or your institution to get full access on this article.
Sign in
Full Access
Get this Publication
- Information
- Contributors
Published in
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing
June 2022
514 pages
ISBN:9781450392815
DOI:10.1145/3524059
- General Chairs:
- Lawrence Rauchwerger
University of Illinois at Urbana-Champaign
, - Kirk Cameron
Virginia Tech
, - Program Chairs:
- Dimitrios S. Nikolopoulos
Virginia Tech
, - Dionisios Pnevmatikatos
National Technical University of Athens
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [emailprotected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 28 June 2022
Permissions
Request permissions about this article.
Author Tags
- KNNG construction
- parallel computing
- top-k selection
Qualifiers
- research-article
Conference
Acceptance Rates
Overall Acceptance Rate629of2,180submissions,29%
Funding Sources
Other Metrics
View Article Metrics
- Bibliometrics
- Citations2
Article Metrics
- View Citations
2
Total Citations
404
Total Downloads
- Downloads (Last 12 months)197
- Downloads (Last 6 weeks)20
Other Metrics
View Author Metrics
Cited By
View all
PDF Format
View or Download as a PDF file.
eReader
View online with eReader.
eReader
Digital Edition
View this article in digital edition.
View Digital Edition
- Figures
- Other
Close Figure Viewer
Browse AllReturn
Caption
View Table of Contents
Export Citations
Your Search Results Download Request
We are preparing your search results for download ...
We will inform you here when the file is ready.
Download now!
Your Search Results Download Request
Your file of search results citations is now ready.
Download now!
Your Search Results Download Request
Your search export query has expired. Please try again.