If a crawler from MyJay just visited your site, this page explains what it is and what it does.

It identifies itself as MyJaySearch/1.0 in its User-Agent string, with a link back to this exact page, and sends a custom X-Crawler-Info header pointing here too, so you don't have to guess what hit your server.

Stop it from visiting your site

Three ways, in order of how fast they take effect:

What gets indexed, and how

Three platforms, three different discovery methods:

For a discovered site, the crawl follows links up to two clicks deep from its homepage, extracting the page title, meta description, and visible body text (scripts, navs, and footers are stripped first). A handful of simple heuristics (presence of <canvas>, image-heavy pages, frequent <article> tags, a few keyword checks) infer rough tags like blog, art, or portfolio, shown as filters in search.
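The "two clicks deep" limit can be pictured as a depth-limited walk over a site's link graph. The sketch below assumes a breadth-first order and an in-memory map of links. Both are illustrative choices rather than a description of the crawler's actual internals, and the function and variable names are hypothetical.

```python
from collections import deque


def pages_within_depth(links: dict[str, list[str]], homepage: str,
                       max_depth: int = 2) -> dict[str, int]:
    """Breadth-first walk returning each reachable page with its click depth."""
    depth = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # pages at the limit are kept, but their links are not followed
        for target in links.get(page, []):
            if target not in depth:  # visit each page once, at its shallowest depth
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical site: the comments page is three clicks from the homepage.
site = {
    "/": ["/about", "/blog"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog/post-1/comments"],
}
print(pages_within_depth(site, "/"))
# {'/': 0, '/about': 1, '/blog': 1, '/blog/post-1': 2}
```

In this example the comments page is never reached, because it sits one click beyond the limit.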

Nothing more than that is stored: no personal data is pulled out of a page, and raw page content isn't kept beyond what's needed to show a title, excerpt, and tags in results.

robots.txt and rate limits

robots.txt is fetched and checked before a single page on a domain is touched, including its Crawl-delay directive if it sets one. The crawler never goes faster than one request per second to any single domain, and slows down further if a site's robots.txt asks for it. A page-level <meta name="robots" content="noindex"> tag or X-Robots-Tag header is honored too, even when robots.txt itself allows the path.
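As a rough illustration of how the robots.txt check and the one-request-per-second floor combine, here is a sketch using Python's standard urllib.robotparser. The USER_AGENT token, the constant, and the function name are assumptions made for the example, not the crawler's actual code.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "MyJaySearch"   # product token assumed from the User-Agent string
MIN_DELAY_SECONDS = 1.0      # never faster than one request per second per domain


def crawl_policy(robots_txt: str, url: str) -> tuple[bool, float]:
    """Return (allowed, seconds to wait between requests) for a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(USER_AGENT, url)
    requested = parser.crawl_delay(USER_AGENT)  # None when no Crawl-delay is set
    # A Crawl-delay can only slow the crawler down, never speed it up.
    delay = max(MIN_DELAY_SECONDS, float(requested or 0))
    return allowed, delay


robots = "User-agent: *\nDisallow: /private/\nCrawl-delay: 5\n"
print(crawl_policy(robots, "https://example.com/private/page"))  # (False, 5.0)
print(crawl_policy(robots, "https://example.com/"))              # (True, 5.0)
```

Page-level noindex tags and X-Robots-Tag headers are a separate check that would run after a page is fetched. This sketch covers only the robots.txt side.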

If robots.txt can't be fetched at all (a timeout, a server error, anything other than a clean 404), the crawler skips that domain for the run entirely rather than guessing it's safe to proceed. A 404 is treated as "no restrictions," the same way most other crawlers treat it.

Re-crawl schedule

An incremental pass runs daily (recently changed MyJay sites, recently updated Neocities sites); a full pass runs weekly across everything currently indexed. Re-crawling refreshes a page's title, excerpt, and tags, and can pick up new pages linked from existing ones.