Home About

AI ToS analysis

#website
~3 min read by pluja, 2024-05-03

kycnot.me features a somewhat hidden tool that some users may not be aware of. Every month, an automated job crawls every listed service's Terms of Service (ToS) and FAQ pages and conducts an AI-driven analysis, generating a comprehensive overview that highlights key points related to KYC and user privacy.

Here's an example: Changenow's Tos Review

Changenow's Tos review

Why?

ToS pages typically contain a lot of complicated text. Since the first versions of kycnot.me, I have tried to provide users a comprehensive overview of what can be found in such documents. This automated method keeps the information up-to-date every month, which was one of the main challenges with manual updates.

A significant part of the time I invest in investigating a service for kycnot.me involves reading the ToS and looking for any clauses that might indicate aggressive KYC practices or privacy concerns. For the past four years, I performed this task manually. However, with advancements in language models, this process can now be somewhat automated. I still manually review the ToS for a quick check and regularly verify the AI’s findings. However, over the past three months, this automated method has proven to be quite reliable.

Having a quick ToS overview section allows users to avoid reading the entire ToS page. Instead, you can quickly read the important points that are grouped, summarized, and referenced, making it easier and faster to understand the key information.

Limitations

This method has a key limitation: JS-generated pages. For this reason, I was using Playwright in my crawler implementation. I plan to make a release addressing this issue in the future. There are also sites that don't have ToS/FAQ pages, but these sites already include a warning in that section.

Another issue is false positives. Although not very common, sometimes the AI might incorrectly interpret something harmless as harmful. Such errors become apparent upon reading; it's clear when something marked as bad should not be categorized as such. I manually review these cases regularly, checking for anything that seems off and then removing any inaccuracies.

Overall, the automation provides great results.

How?

There have been several iterations of this tool. Initially, I started with GPT-3.5, but the results were not good in any way. It made up many things, and important thigs were lost on large ToS pages. I then switched to GPT-4 Turbo, but it was expensive. Eventually, I settled on Claude 3 Sonnet, which provides a quality compromise between GPT-3.5 and GPT-4 Turbo at a more reasonable price, while allowing a generous 200K token context window.

I designed a prompt, which is open source1, that has been tweaked many times and will surely be adjusted further in the future.

For the ToS scraping part, I initially wrote a scraper API using Playwright2, but I replaced it with Jina AI Reader3, which works quite well and is designed for this task.

Non-conflictive ToS

All services have a dropdown in the ToS section called "Non-conflictive ToS Reviews." These are the reviews that the AI flagged as not needing a user warning. I still provide these because I think they may be interesting to read.

Feedback and contributing

You can give me feedback on this tool, or share any inaccuraties by either opening an issue on Codeberg4 or by contacting me 5.

You can contribute with pull requests, which are always welcome, or you can support this project with any of the listed ways.