  1. Build an image-to-text generative AI application using …

    Oct 6, 2023 · The model architecture consists of an image encoder and a text encoder, as shown in the following diagram. During training, an image and corresponding text snippet are fed through the encoders to get an image feature vector and text feature vector.

  2. The approach uses the VGG16 Convolutional Neural Network (CNN) architecture to learn image features and a Long Short-Term Memory (LSTM) network to learn text features, then combines the image features with the LSTM output to generate a caption for the given input image.

  3. Deep Learning for Image-to-Text Generation: A Technical Overview

    Nov 9, 2017 · In this article, we will first summarize this exciting emerging visual captioning area. We will then analyze the key development and the major progress the community has made, their impact in both research and industry deployment, and what lies ahead in future breakthroughs.

  4. GIT: A Generative Image-to-text Transformer for Vision and …

    May 27, 2022 · Abstract: In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify vision-language tasks such as image/video captioning and question answering. While generative models provide a consistent network architecture between pre-training and fine-tuning, existing work typically contains complex structures (uni/multi ...

  5. AI for text-to-text, text-to-image and image-to-image tools, different types of language interweave, and we unpack them into a map based on our design process. This map can help make machine-learning-powered frameworks more explainable for other architects, designers, or artists and can serve as a resource for creatives interested in ...

  6. Image-to-Text Generation: Bridging Visual and Linguistic Worlds

    Feb 25, 2025 · It provides a historical overview of image-to-text systems, from early optical character recognition (OCR) to sophisticated transformer-based and multimodal models capable of generating descriptive and contextually relevant text.

  7. Towards Mapping Images to Text Using Deep-Learning

    Sep 18, 2020 · In this paper, we investigate an approach for mapping images to text using a Kernel Ridge Regression model. We considered two types of features: simple RGB pixel-value features and image features extracted with deep-learning approaches.

  8. Image Caption Generation Using A Deep Architecture

    We used convolutional neural networks to extract image features and recurrent neural networks to generate text from those features, incorporating an attention mechanism during caption generation. We evaluated the model on the MSCOCO database. The obtained results are promising and competitive.

  9. Generative AI: Building an Image Caption Generator from

    Feb 26, 2023 · Image captioning is the task of generating descriptive and relevant sentences for a given image. This task has two sub-tasks: understanding the context of the given image, and representing that...

  10. Illustrative Guide: Image-to-Text using Transfer Learning

    Oct 9, 2023 · Given an image, a Deep Learning Model (DLM) generates a relevant caption. In this step-by-step illustration of the concept, we’ve chosen a simplified sample dataset for the purpose of clarity. Our dataset comprises 20 training images, along with 10 images each for the validation and test sets.
