A Grading System for Evaluating Geospatial Entity Connectivity from Texts Using Co-occurrences, Semantic Similarity and Geodesic Distance

Eirini Katsadaki *

School of Rural, Surveying and Geoinformatics Engineering, National Technical University of Athens, Athens, Greece.

Georgios Bougas

School of Rural, Surveying and Geoinformatics Engineering, National Technical University of Athens, Athens, Greece.

Margarita Kokla

School of Rural, Surveying and Geoinformatics Engineering, National Technical University of Athens, Athens, Greece.

*Author to whom correspondence should be addressed.


Abstract

Extracting entity connectivity from texts is important for uncovering how places relate within real-world discourse. While structured data is informative, textual data captures rich contextual and semantic knowledge, enabling us to identify hidden networks of interdependence and thematic connections among geographic entities. Entity connectivity is not merely complementary to information retrieval; it is essential in various activities, including event analysis, spatial decision support systems, urban studies, and knowledge graph development. This research proposes two versions of a grading system for evaluating connectivity between cities and other geopolitical entities, places, and events extracted from texts: one based on co-occurrences and semantic similarity (System A), and a second one (System B) that incorporates geodesic distance as an additional feature. The proposed grading systems may have practical applications in domains such as large-scale geographic information extraction, place-based information retrieval, and knowledge graph construction from unstructured data sources.
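The abstract names three kinds of evidence for a pair of entities: co-occurrence, semantic similarity, and (for System B only) geodesic distance. The sketch below shows one way such a per-pair feature vector could be assembled. It is not the authors' implementation. The feature definitions are assumptions: co-occurrence is counted as the number of sentences mentioning both entities, semantic similarity is the cosine of precomputed entity embeddings, and the haversine great-circle formula is swapped in as a spherical approximation of the true ellipsoidal geodesic distance. All function names are hypothetical.

```python
import math

def cooccurrence_count(sentences, a, b):
    """Assumed definition: number of sentences (sets of entity names) mentioning both a and b."""
    return sum(1 for s in sentences if a in s and b in s)

def cosine_similarity(u, v):
    """Cosine similarity of two embedding vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance on a sphere (an approximation of the ellipsoidal geodesic)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))

def pair_features(a, b, sentences, embeddings, coords=None):
    """Hypothetical feature vector for an entity pair.

    Without coords this mirrors the System A inputs (co-occurrence, similarity);
    with coords (lat, lon per entity) it appends a distance, as in System B.
    """
    feats = [cooccurrence_count(sentences, a, b),
             cosine_similarity(embeddings[a], embeddings[b])]
    if coords is not None:
        feats.append(haversine_km(*coords[a], *coords[b]))
    return feats
```

The resulting vectors would then serve as classifier inputs, with the connectivity grade as the target label; how the grades themselves are defined is not specified in this abstract.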

The two systems are evaluated and compared using six machine learning algorithms: Random Forest, Gradient Boosting, Multi-Layer Perceptron (MLP), K-Nearest Neighbors (KNN), Decision Tree, and Support Vector Machine (SVM). The performance of the algorithms is analyzed by measuring accuracy, precision, recall, F1-score, and R². Decision Tree was the best-performing algorithm for System A, with an accuracy of 85%, while KNN was the best-performing algorithm for System B, with an accuracy of 77%. The results show that the system without geodesic distance performs better on general texts, indicating that adding geographic features can introduce noise in text-driven contexts where spatial proximity is implicit or semantically inferred; such features should therefore be applied selectively.
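The comparison above ranks classifiers by accuracy, precision, recall, and F1-score. As a rough illustration of how those four classification metrics relate for multi-class grade labels, the sketch below computes them from true and predicted labels and picks the model with the highest accuracy. Macro averaging (an unweighted mean over classes) is an assumption; the paper may use a different averaging scheme. R² is omitted because it is a regression metric. The helper names and the model-selection rule are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (averaging scheme assumed)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

def best_model_by_accuracy(predictions_by_model, y_true):
    """Hypothetical selection rule: return the model name with the highest accuracy."""
    return max(predictions_by_model,
               key=lambda name: classification_metrics(y_true, predictions_by_model[name])["accuracy"])
```

For example, with true grades `[1, 1, 2, 2]` and predictions `[1, 2, 2, 2]`, accuracy is 0.75 while macro F1 is about 0.73, which shows why reporting several metrics side by side can separate models that tie on accuracy.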

Keywords: grading system, connectivity, semantic similarity, machine learning


How to Cite

Katsadaki, Eirini, Georgios Bougas, and Margarita Kokla. 2026. “A Grading System for Evaluating Geospatial Entity Connectivity from Texts Using Co-Occurrences, Semantic Similarity and Geodesic Distance”. Journal of Geography, Environment and Earth Science International 30 (1):1-16. https://doi.org/10.9734/jgeesi/2026/v30i1999.
