Hungarian Student Develops Highly Accurate AI Text Detector

A student at the University of Szeged has developed an AI-based text detector that works in Hungarian and can distinguish human-written content from AI-generated text with exceptional accuracy, offering a new tool for education, media, and beyond.

Mihály Kiss, a master’s student in software engineering at the University of Szeged (SZTE), has developed an AI-powered detection tool capable of identifying whether a text was written by a human or generated by a large language model, including texts in Hungarian. According to the university, testing shows the system performs with outstanding accuracy.

The idea for the project emerged in mid-2023, when tools such as ChatGPT and other large language models rapidly entered public use. Initially proposed by Kiss’s supervisor as a thesis topic, the research quickly proved to be more than a theoretical exercise. As AI writing tools become increasingly widespread, their long-term impact on education and other fields has raised growing concerns.

The detector works on the principle that large language models leave identifiable patterns in text, because their output does not fully match natural human writing. Until now, high-quality AI detectors for Hungarian did not exist, mainly due to the lack of suitable training data. While English-language models benefit from vast, carefully curated datasets, comparable Hungarian-language resources were previously unavailable.

To address this, Kiss created a dataset of more than 350,000 texts drawn from a wide range of sources, including literary works, academic theses, articles, online forums, social media, and general internet content. The aim was to expose the system to authentic and diverse language use rather than limited, textbook-style examples.

Although the solution itself is AI-based, its architecture is designed for decision-making rather than content generation. It relies on an encoder-based model optimized for classification tasks, functioning in a way similar to a spam filter. The program estimates the probability that a given text was produced by artificial intelligence.
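
To illustrate how a classifier of this kind can turn its raw output into a probability, here is a minimal sketch in Python. It is not Kiss's implementation: the two-score output, the function names, and the 0.5 threshold are assumptions made purely for illustration.

```python
import math


def ai_probability(score_human: float, score_ai: float) -> float:
    """Convert two raw class scores into P(text is AI-generated) via softmax.

    The two-score layout is an assumption; the article does not describe
    the detector's actual output format.
    """
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(score_human, score_ai)
    e_human = math.exp(score_human - m)
    e_ai = math.exp(score_ai - m)
    return e_ai / (e_human + e_ai)


def label(prob_ai: float, threshold: float = 0.5) -> str:
    """Map a probability to a verdict, much as a spam filter would.

    The 0.5 threshold is a hypothetical default, not a documented setting.
    """
    return "AI-generated" if prob_ai >= threshold else "human-written"


if __name__ == "__main__":
    p = ai_probability(score_human=-2.0, score_ai=2.0)
    print(f"P(AI) = {p:.3f}, verdict: {label(p)}")
```

Raising the threshold would make such a system more cautious about labeling a text as AI-generated, at the cost of missing more AI-written texts.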

The tool is publicly accessible, with up to three free detections per day, and can analyze even short texts, such as a Facebook post, almost instantly.

In comparative tests on 1,000 different Hungarian-language texts, Kiss’s model achieved an accuracy score of 0.98. Competing detectors that claim to support Hungarian reportedly scored around 0.6. Particularly notable is the system’s very low false-positive rate, meaning it rarely labels human-written text as AI-generated. This is especially critical in educational settings, where false accusations could have serious consequences for students.
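
For readers unfamiliar with these metrics, the short Python sketch below shows how accuracy and the false-positive rate are conventionally computed from labeled test results. The function name and the toy labels are invented for illustration and are not data from the Szeged evaluation.

```python
def accuracy_and_fpr(y_true, y_pred):
    """Return (accuracy, false-positive rate) for binary labels.

    Convention assumed here: 1 = AI-generated, 0 = human-written.
    A false positive is a human-written text flagged as AI-generated.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = correct / len(y_true)
    humans = false_pos + true_neg
    # With no human-written texts in the sample, the rate is undefined;
    # returning 0.0 is a simplification for this sketch.
    fpr = false_pos / humans if humans else 0.0
    return accuracy, fpr


if __name__ == "__main__":
    # Toy example: four human-written texts followed by four AI-generated ones.
    truth = [0, 0, 0, 0, 1, 1, 1, 1]
    preds = [0, 0, 0, 1, 1, 1, 1, 0]
    acc, fpr = accuracy_and_fpr(truth, preds)
    print(f"accuracy = {acc:.2f}, false-positive rate = {fpr:.2f}")
```

The two numbers measure different things: a detector can score high on accuracy and still wrongly flag a meaningful share of human authors, which is why the false-positive rate is reported separately.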

While educators are expected to be among the primary users, the detector has broader potential applications. Media organizations, editorial teams, and book publishers could use it for quality control, while it may also function as a form of fake news detection, given the tendency of large language models to generate inaccurate or fabricated information.

The tool could also prove valuable in legal contexts, where AI-generated inaccuracies pose significant risks, or in human resources departments seeking to filter AI-written job applications. Kiss’s work has been recognized with the SZTE Student Innovation Award.

