Analysis and Classification of Websites Using Artificial Intelligence for Domain Registration Authorities
Keywords:
scraping, OCR, artificial intelligence, domain analysis, distributed processing
Abstract
Large-scale web data collection is a key task for research, cybersecurity, market analysis, and national domain registries such as NIC.ar in Argentina. However, traditional scraping techniques face increasing challenges from dynamic websites that present their content through images, banners, and JavaScript-generated elements. This paper proposes a hybrid scraping model that combines traditional static and dynamic scraping with optical character recognition (OCR) and AI-based object recognition. We implemented two softbots, one for OCR (Tesseract) and one for object recognition (YOLO), that operate on screenshots of websites that traditional methods could not process. The system processed 50,000 domains and recovered information from 80% of the previously unprocessable cases. This lays the groundwork for the next stage: website classification based on supervised learning.
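The fallback order described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `process_domain`, `recovery_rate`, `Result`, the `min_chars` threshold, and the stub data are all hypothetical, and the four stages are stand-ins. In a real setup the `ocr` stage might pass a screenshot to Tesseract and the `objects` stage might turn YOLO detections into labels; here both are replaced by dictionary lookups so the example runs without external dependencies.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Sequence, Tuple

# A stage is a (name, extractor) pair. An extractor takes a domain name and
# returns recovered text, or None when it could not extract anything.
Extractor = Callable[[str], Optional[str]]
Stage = Tuple[str, Extractor]


@dataclass(frozen=True)
class Result:
    domain: str
    method: Optional[str]  # name of the stage that succeeded, None if all failed
    text: str


def process_domain(domain: str, stages: Sequence[Stage], min_chars: int = 20) -> Result:
    """Try each stage in order and keep the first one that yields enough text.

    min_chars is an assumed threshold for "usable" output, not a value from the paper.
    """
    for name, extract in stages:
        try:
            text = extract(domain)
        except Exception:
            text = None  # a failing stage simply hands over to the next one
        if text and len(text.strip()) >= min_chars:
            return Result(domain, name, text.strip())
    return Result(domain, None, "")


def recovery_rate(results: Iterable[Result], traditional: frozenset) -> float:
    """Share of domains missed by the traditional stages that a later stage recovered."""
    hard = [r for r in results if r.method not in traditional]
    if not hard:
        return 0.0
    return sum(r.method is not None for r in hard) / len(hard)


# Stub data standing in for real scraping, OCR, and object recognition (hypothetical).
STATIC_PAGES = {"a.example": "Plain HTML page with enough readable text."}
OCR_TEXT = {"b.example": "Text recovered from a banner screenshot image."}

STAGES = [
    ("static", STATIC_PAGES.get),    # traditional static scraping
    ("dynamic", lambda d: None),     # traditional dynamic (rendered) scraping
    ("ocr", OCR_TEXT.get),           # OCR on a screenshot
    ("objects", lambda d: None),     # object recognition on a screenshot
]

results = [process_domain(d, STAGES) for d in ("a.example", "b.example", "c.example")]
rate = recovery_rate(results, frozenset({"static", "dynamic"}))
```

Keeping the stages as an ordered list of plain callables means the cheaper traditional methods always run first, and the screenshot-based stages are only reached for domains the earlier stages could not handle.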
License
Copyright (c) 2025 Néstor Adrián Balich, Bernice Lourdes Balich

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Under these terms, the material may be shared (copied and redistributed in any medium or format) and adapted (remixed, transformed, and built upon to create another work), provided that a) the authorship and the original source of publication (journal and URL of the work) are cited, b) it is not used for commercial purposes, and c) the same license terms are maintained.











