A Methodical Analysis on Image Captioning Techniques

Pilli. Vijay Lakshmanrao; Kotta Vijay Kumar; Mekathoti Tejaswi; K Lakshman Rao

Pilli. Vijay Lakshmanrao Student, Dept. of CSE, GMR Inst. of Tech., Rajam, Andhra Pradesh, India.
Kotta Vijay Kumar Student, Dept. of CSE, GMR Inst. of Tech., Rajam, Andhra Pradesh, India.
Mekathoti Tejaswi Associate Professor, Dept. of CSE, GMR Inst. of Tech., Rajam, Andhra Pradesh, India.
K Lakshman Rao Student, Dept. of CSE, GMR Inst. of Tech., Rajam, Andhra Pradesh, India.

Abstract

Image captioning is the process of describing an image by combining Computer Vision (CV) and Natural Language Processing (NLP) techniques. It is performed with three different methods: the Object Detection technique, the Encoder-Decoder technique, and the Attention Mechanism. Each of these techniques takes a different approach to generating image captions.
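To make the encoder-decoder technique concrete, here is a minimal sketch. It assumes the image has already been reduced to a fixed-length feature vector, as pooled CNN activations would provide. Everything in it is a hypothetical stand-in for a trained model: the linear encoder, the plain (Elman) RNN decoder, the toy vocabulary, the class name `ToyCaptioner`, and the random weights. The generated words are therefore meaningless. What the sketch does show is the data flow: encode the image once, then decode greedily one word at a time until `<end>` or a length limit.

```python
import math
import random

# Toy vocabulary (hypothetical); a real model learns it from a caption corpus.
VOCAB = ["<start>", "<end>", "a", "dog", "on", "grass", "runs"]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyCaptioner:
    """Encoder-decoder sketch: a linear encoder maps an image feature vector
    to the initial RNN state; an RNN decoder then emits one word per step."""

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        n_words = len(VOCAB)
        self.w_enc = rand_matrix(hidden_dim, feat_dim, rng)    # encoder projection
        self.embed = rand_matrix(n_words, hidden_dim, rng)     # one embedding row per word
        self.w_xh = rand_matrix(hidden_dim, hidden_dim, rng)   # input-to-hidden
        self.w_hh = rand_matrix(hidden_dim, hidden_dim, rng)   # hidden-to-hidden
        self.w_out = rand_matrix(n_words, hidden_dim, rng)     # hidden-to-vocabulary logits

    def encode(self, features):
        # Encoder: project image features into the decoder's hidden space.
        return [math.tanh(z) for z in matvec(self.w_enc, features)]

    def step(self, h, word_id):
        # One RNN step: h' = tanh(W_xh x + W_hh h), logits = W_out h'.
        x = self.embed[word_id]
        pre = [a + b for a, b in zip(matvec(self.w_xh, x), matvec(self.w_hh, h))]
        h_new = [math.tanh(z) for z in pre]
        return h_new, matvec(self.w_out, h_new)

    def caption(self, features, max_len=10):
        # Greedy decoding: feed back the highest-scoring word until <end>.
        h = self.encode(features)
        word_id = VOCAB.index("<start>")
        words = []
        for _ in range(max_len):
            h, logits = self.step(h, word_id)
            # Index 0 is <start>; skip it so it is never emitted as a word.
            word_id = max(range(1, len(logits)), key=logits.__getitem__)
            if VOCAB[word_id] == "<end>":
                break
            words.append(VOCAB[word_id])
        return words


if __name__ == "__main__":
    model = ToyCaptioner(feat_dim=8, hidden_dim=16, seed=0)
    print(model.caption([0.1 * i for i in range(8)]))
```

A practical system would train these weights on an image-caption dataset. It would typically also swap the plain RNN for an LSTM or GRU, and greedy search for beam search.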

This paper reviews the approaches and methodologies followed in image captioning. Natural Language Processing (NLP) linked with Computer Vision (CV), through neural network techniques such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), forms the bridge for image captioning, a first-of-its-kind form of human-machine interaction in Artificial Intelligence (AI). Captioning effectiveness can be improved by applying semantic attention to image attributes, which carry high-level knowledge of the image content and of the specific semantics of the correlated caption words. Converting visual data into text is a demanding task. On the other side of the coin, an image captioning algorithm must refine a rough semantic concept, step by step, into a human-like natural language description. Image captioning has attracted considerable research interest, and a large number of models have been proposed; multi-level feature fusion may offer a better solution. The focus here is on the substantial advances in deep neural network attention models over spatial regions. Neural network and computer vision approaches used in different image captioning models are presented.
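The attention over spatial regions described above can be sketched with additive (Bahdanau-style) soft attention. This is one common formulation, chosen here for illustration rather than taken from any specific surveyed model. Each region feature r_i is scored against the decoder state h as e_i = v^T tanh(W_r r_i + W_h h). A softmax turns the scores into weights alpha_i, and the context vector passed to the decoder is sum_i alpha_i r_i. The function name `additive_attention` and all parameter values are hypothetical; in a real model the parameters would be learned jointly with the captioner.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(regions, h, w_r, w_h, v):
    """Soft attention over k region feature vectors (hypothetical helper).

    regions: list of k vectors of size d (e.g. one per CNN grid cell)
    h:       current decoder hidden state
    w_r, w_h, v: attention parameters, assumed to be already trained
    Returns (alpha, context) with context = sum_i alpha_i * regions[i].
    """
    proj_h = matvec(w_h, h)
    scores = []
    for r in regions:
        # e_i = v^T tanh(W_r r_i + W_h h)
        hidden = [math.tanh(a + b) for a, b in zip(matvec(w_r, r), proj_h)]
        scores.append(sum(vi * x for vi, x in zip(v, hidden)))
    alpha = softmax(scores)
    d = len(regions[0])
    # Context vector: attention-weighted average of the region features.
    context = [sum(a * r[j] for a, r in zip(alpha, regions)) for j in range(d)]
    return alpha, context


def rand_vec(rng, n):
    """Random vector used only to fill in the example below."""
    return [rng.uniform(-1, 1) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    k, d, hid, att = 4, 6, 5, 3  # regions, feature size, hidden size, attention size
    regions = [rand_vec(rng, d) for _ in range(k)]
    w_r = [rand_vec(rng, d) for _ in range(att)]
    w_h = [rand_vec(rng, hid) for _ in range(att)]
    alpha, context = additive_attention(regions, rand_vec(rng, hid), w_r, w_h, rand_vec(rng, att))
    print([round(a, 3) for a in alpha], len(context))
```

The weighting works on any set of vectors. Passing embeddings of detected image attributes in place of region features therefore gives a rough analogue of the semantic attention mentioned in the abstract.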

How to cite this article:
Lakshmanrao PV, Kumar KV, Tejaswi M et al. Methodical Analysis on Image Captioning Techniques. J Adv Res Image Proc Appl 2021; 4(1): 7-11.
