Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that can have a lifelong impact on an individual's language learning, speech, cognitive, and social skills. Existing models such as ViT-ARDNet-LSTM depend heavily on standardized, high-quality images and scans, which may not always be available across medical facilities. Variation in image and scan acquisition techniques, scanner types, and image resolutions across institutions can further limit the generalizability and robustness of such models in real-world clinical environments. In this paper, a novel DL-based AUSD-XVGG model is proposed for ASD classification using Xception and VGG16. First, the input images are preprocessed using log transformation and normalization to enhance contrast and suppress noise. MobileNetV2 is then used to extract features from the preprocessed ASD images. Classification is performed by a hybrid of Xception and VGG16, which captures both the deep and spatial features of facial expressions, and the AUSD-XVGG approach labels each image as autistic or normal. The performance of the AUSD-XVGG approach was assessed using accuracy, precision, recall, specificity, and F1 score. The AUSD-XVGG approach achieves a high accuracy of 99.07% for ASD classification, outperforming DNN, ViT-ARDNet-LSTM, and IMFRCNN by 26.09%, 5.35%, and 2.96%, respectively.
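The log-transformation and min-max normalization preprocessing described above can be sketched as follows. This is a minimal illustration under stated assumptions: the paper does not specify its scaling constant or normalization range, so the constant `c = 1 / log(1 + max)` and the target range [0, 1] used here are assumptions, and `preprocess` is a hypothetical helper name.

```python
import numpy as np

def preprocess(image: np.ndarray) -> np.ndarray:
    """Log-transform then min-max normalize an image to [0, 1].

    A sketch of the preprocessing step only; the exact constants
    used by AUSD-XVGG are assumptions, not taken from the paper.
    """
    img = image.astype(np.float64)
    # Log transformation s = c * log(1 + r) compresses the dynamic
    # range, enhancing dark regions and attenuating bright noise.
    c = 1.0 / np.log1p(img.max()) if img.max() > 0 else 1.0
    log_img = c * np.log1p(img)
    # Min-max normalization rescales intensities to [0, 1].
    lo, hi = log_img.min(), log_img.max()
    if hi > lo:
        return (log_img - lo) / (hi - lo)
    return np.zeros_like(log_img)

# Example on a small 8-bit grayscale patch.
patch = np.array([[0, 64], [128, 255]], dtype=np.uint8)
out = preprocess(patch)
```

After this step, every image lies in a common intensity range, which keeps the MobileNetV2 feature extractor's inputs on a consistent scale regardless of the original scanner or camera settings.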