Analysis and Classification of Websites Using Artificial Intelligence for Domain Registration Authorities

Authors

Keywords:

scraping, OCR, artificial intelligence, domain analysis, distributed processing

Abstract

Large-scale web data collection is a key task for research, cybersecurity, market analysis, and national domain registries such as NIC.ar in Argentina. Traditional scraping techniques, however, face growing challenges from dynamic websites whose content is delivered through images, banners, and JavaScript-generated elements. This paper proposes a hybrid scraping model that combines traditional static and dynamic scraping with optical character recognition (OCR) and AI-based object recognition. We implemented two softbots, one for OCR (Tesseract) and one for object recognition (YOLO), that operate on screenshots of websites that traditional methods could not process. The system processed 50,000 domains and recovered information from 80% of the previously unprocessable cases. This lays the groundwork for the next stage: website classification based on supervised learning.
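
The hybrid model described in the abstract can be read as a fallback chain: each domain is tried with the cheapest technique first and passed to the next one only when no usable text is recovered. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. The names `Result`, `run_pipeline`, `static_stub`, `ocr_stub`, and the `min_chars` threshold are all hypothetical, and the two stub extractors only stand in for real stages such as an HTML scraper or a Tesseract-based OCR softbot working on a screenshot.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical interface: an extractor takes a domain name and returns
# the text it recovered, or None when it could not process the site.
Extractor = Callable[[str], Optional[str]]


@dataclass
class Result:
    domain: str
    stage: Optional[str]  # name of the stage that succeeded, None if all failed
    text: str


def run_pipeline(
    domain: str,
    stages: Sequence[Tuple[str, Extractor]],
    min_chars: int = 20,  # assumed threshold for "usable" text
) -> Result:
    """Try each stage in order and stop at the first one that yields usable text."""
    for name, extract in stages:
        try:
            text = extract(domain)
        except Exception:
            text = None  # a failing stage simply hands over to the next one
        if text and len(text.strip()) >= min_chars:
            return Result(domain, name, text.strip())
    return Result(domain, None, "")


# Stand-in extractors (assumptions for illustration only).
def static_stub(domain: str) -> Optional[str]:
    # Pretend that only "html-" domains expose their text in plain HTML.
    if domain.startswith("html-"):
        return "Static text recovered from " + domain
    return None


def ocr_stub(domain: str) -> Optional[str]:
    # Pretend that "img-" domains carry their text inside images.
    if domain.startswith("img-"):
        return "Screenshot text recovered from " + domain
    return None


if __name__ == "__main__":
    stages = [("static", static_stub), ("ocr", ocr_stub)]
    for d in ["html-example.com.ar", "img-example.com.ar", "other.com.ar"]:
        print(d, "->", run_pipeline(d, stages).stage)
```

In a real deployment the stubs would be replaced by actual scraping, OCR, and object-recognition stages; ordering them from cheapest to most expensive keeps screenshot-based processing for the domains that the traditional methods cannot handle.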

Published

2025-10-21

Section

SIE - Simposio de Informática en el Estado

How to Cite

Balich, N. A., & Balich, B. L. (2025). Analysis and Classification of Websites Using Artificial Intelligence for Domain Registration Authorities. JAIIO, Jornadas Argentinas De Informática, 11(13), 190-198. https://revistas.unlp.edu.ar/JAIIO/article/view/19899