The spread and growth of Web 2.0 applications have created a new world of communication and collaboration. More than a billion people around the world are connected by social networks and instant messengers, and they interact through wikis, podcasts, discussion posts, comments, and instant messaging. A new generation of online social networks, enabled by the widespread diffusion of high-speed Internet, has emerged as a mainstream modality of communication and interaction, with ever-increasing significance in today’s information society. Facebook, Instagram, Twitter, LinkedIn, and similar online social networking sites provide online spaces where individuals can create a profile and connect it to others in order to build a personal network. The objective of these sites is social interaction and connection. They give everyone a place to share personal stories in words, pictures, and videos with friends; they connect people with friends and with others who work, study, and live around them; and they help people learn about events, parties, and other social functions. Participation and continuance in online social networks represent a new social phenomenon that depends largely on interactions with other users in a personal network. Social media has become an important open communication medium for understanding user opinions and evaluating trends in several fields of research. Millions of images are now shared on social media, and the most popular sites are Facebook, Instagram, Twitter, and Pinterest. This abundance of shared content has motivated much work on social media data analysis using machine learning and deep learning techniques. Sentiment analysis has been defined as the computational study of opinions and sentiments expressed in text, where a sentiment can be described simply as “a personal positive or negative feeling or opinion”.
Research in this area focuses on classifying text according to its polarity: positive, negative, or neutral (not expressing any feeling). Generally, there are two main approaches to sentiment analysis: lexicon-based models and machine learning-based methods. Machine learning-based methods take several text features as input to train a model, which then predicts the sentiment of a text from these features. Among the supervised, semi-supervised, and unsupervised machine learning techniques used for sentiment classification, the most popular are algorithms based on deep neural networks and generative adversarial networks. Deep learning techniques enable machines to learn to classify data by themselves; for example, a deep learning image analysis tool can learn to recognise images that contain cats without being told explicitly what a cat looks like. Automated deep learning-powered solutions for social media monitoring make it possible to obtain actionable insights in a timely manner, to manage user image effectively, and to discover users’ feedback, even feedback that was not originally meant to be heard. The major challenge is the inherent difficulty of tracking and quantifying an overwhelmingly large and unstructured body of data. A large body of extant research uses quantitative summaries of user-generated content (UGC), such as the overall valence and volume of user review ratings, to represent users’ opinions. To this end, a set of images and posts is collected by conducting automated searches for hashtags and posts. Many of these images contain text: for example, people take pictures and insert text into them with the aid of photo editing software. To estimate the meaning of such a picture, it is essential not only to judge the visual elements but also to understand the meaning of the included text.
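To make the lexicon-based approach mentioned above concrete, the following minimal sketch scores a text by summing per-word polarity values and maps the total to one of the three polarity classes. The lexicon `LEXICON` and the function `lexicon_polarity` are hypothetical names introduced only for illustration; practical lexicons contain thousands of scored entries and usually handle negation and intensifiers, which this sketch does not.

```python
import re

# Hypothetical toy lexicon, for illustration only: word -> polarity score.
LEXICON = {
    "good": 1, "great": 1, "love": 1, "happy": 1,
    "bad": -1, "terrible": -1, "hate": -1, "sad": -1,
}


def lexicon_polarity(text: str) -> str:
    """Sum the lexicon scores of all words and map the total to a label."""
    # Lowercase the text and keep alphabetic tokens (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", text.lower())
    # Words missing from the lexicon contribute a score of 0.
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Under these assumptions, a caption such as "I love this great photo" would be labelled positive, while a text with no lexicon words would be labelled neutral. A machine learning-based method would instead learn such associations from labelled examples.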
Each social media platform (Instagram, Twitter, Facebook) has its own way of presenting messages from its users. The main content of a post is often text accompanied by hashtags. Together with the attached image, a post therefore usually comprises three main pieces of content: text, hashtags, and an image. The overall sentiment of a picture is estimated from both visual and textual information: a machine learning classifier evaluates the sentiment using visual and textual features extracted by two specially trained Deep Convolutional Neural Networks (DCNNs). The visual feature extractor is based on the VGG16 network architecture and is trained by fine-tuning a model pretrained on the ImageNet dataset. While the visual feature extractor is applied to the whole image, the textual feature extractor first detects and recognizes text in the image and then extracts features from it. The textual feature extractor is likewise based on a DCNN architecture and is created by fine-tuning a model previously trained on synthesized social media images. Based on these features, six state-of-the-art classifiers, namely k-Nearest Neighbors (kNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), and Artificial Neural Network (ANN), are compared on the task of recognizing the overall sentiment of the images.
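The classification stage described above can be sketched as follows, using kNN (one of the six classifiers named) as an example. This is an illustrative sketch under stated assumptions, not the implementation used in this work: the text does not specify how visual and textual features are combined, so simple concatenation is assumed here, and the short feature vectors in `TRAIN` are hypothetical stand-ins for the much longer vectors a DCNN would produce. The names `fuse`, `knn_predict`, and `TRAIN` are introduced only for this example.

```python
import math
from collections import Counter


def fuse(visual, textual):
    """Combine two feature vectors by concatenation (an assumed fusion rule)."""
    return list(visual) + list(textual)


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k nearest training vectors."""
    # Euclidean distance from the query to every training vector.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority vote over the labels of the k closest neighbours.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (visual features, textual features, sentiment label).
TRAIN = [
    ([0.90, 0.80], [0.7], "positive"),
    ([0.80, 0.90], [0.9], "positive"),
    ([0.10, 0.20], [0.1], "negative"),
    ([0.20, 0.10], [0.3], "negative"),
    ([0.50, 0.50], [0.5], "neutral"),
    ([0.45, 0.55], [0.5], "neutral"),
]

train_X = [fuse(visual, textual) for visual, textual, _ in TRAIN]
train_y = [label for _, _, label in TRAIN]

# Classify a new image from its (hypothetical) visual and textual features.
prediction = knn_predict(train_X, train_y, fuse([0.85, 0.85], [0.8]))
```

In a comparison such as the one described, the same fused feature vectors would be passed to each of the six classifiers in turn, and their accuracy would be measured on held-out images.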