A meta robots tag is an HTML element placed in a page’s head that tells search engine crawlers how to treat that specific URL, including whether to index it, follow its links, and display a snippet in search results. The tag works at the page level rather than the site level, which makes it the right tool when one URL needs different treatment from the rest of the site. The rest of this guide walks through the mechanics, the directives the tag can carry, and how it differs from the more familiar robots.txt file.
That one element answers several questions at once. Should this page appear in search at all? Should search engines follow the links on it? Should they show a snippet, a cached copy, or an image preview? Each question maps to a directive inside the same small piece of HTML.
How a Meta Robots Tag Actually Works
The tag itself is short. A typical implementation looks like this: <meta name="robots" content="noindex, follow">. That single line, sitting in the document head, carries the entire instruction set for one URL. Search engines read it when they fetch the page and treat the values as directives for that crawl.
The mechanism is straightforward. When a crawler requests the HTML, it parses the head, finds the meta element, and applies the directives listed in the content attribute. The order of the values does not matter; Google’s documentation states that when directives conflict, the more restrictive one applies. If the value says noindex, the page is excluded from the search index. If it says follow, the crawler is free to discover new URLs through the links on that page. The same slot can be aimed at one specific crawler by swapping the name attribute. For example, <meta name="googlebot" content="noindex"> targets Google specifically rather than every bot that visits the page.
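To make the parsing step concrete, here is a minimal sketch in Python of how a tool might read the tag. It uses only the standard library; the class and function names are invented for illustration, and a real crawler does considerably more (for instance, this sketch does not check that the element sits inside the head).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects robots directives from <meta> tags, keyed by the name attribute."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # e.g. {"robots": {"noindex", "follow"}}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        # Only the two names used in this article; other bot names are ignored here.
        if name not in ("robots", "googlebot"):
            return
        values = {
            part.strip().lower()
            for part in attr.get("content", "").split(",")
            if part.strip()
        }
        # A set is used because the order of directives carries no meaning.
        self.directives.setdefault(name, set()).update(values)


def read_robots_meta(html):
    """Return a dict mapping meta name -> set of directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
directives = read_robots_meta(page)
print(sorted(directives["robots"]))  # ['follow', 'noindex']
```

A page with no robots meta element returns an empty dict, which corresponds to the default behaviour of index and follow.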
One detail that trips up even experienced teams: the tag only works if the page is actually fetched. A robots.txt rule that blocks the crawler from requesting the URL prevents the HTML from being read, which means the meta tag inside it is never seen. Google’s documentation on the robots meta tag flags this dependency explicitly, and MDN’s reference notes the same requirement: a crawler has to be able to access the page in order to read the rules. The Clickside team treats this dependency as a baseline check in every technical audit.
The Main Directives You Can Use
The directives fall into three groups based on what they control. Each group uses the same HTML syntax with a different value in the content attribute, and the right choice depends entirely on the intent for that page.
Indexing and following
The two directives most teams actually use are noindex and nofollow, along with their positive counterparts index and follow. A noindex directive tells search engines not to include the page in their searchable database, so a confirmation page, a printer-friendly version, or a staging URL can be kept out of results. A nofollow directive tells crawlers not to use the links on the page to discover new URLs. In practice, index and follow are the defaults, so explicit declarations for those are usually unnecessary and can be omitted from the tag entirely.
Snippet and preview control
A second group of directives controls how the page may appear in search results without removing it from the index. This is the right tool when the page is fine to rank but the preview text, image, or video should be limited. The four directives in this group are:
- nosnippet, which blocks any text snippet from being shown in results
- max-snippet, which sets a maximum number of characters for the snippet text (for example, max-snippet:50)
- max-image-preview, which caps the size of image previews at none, standard, or large
- max-video-preview, which caps the length of video previews in seconds
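The preview directives in the list above differ from noindex and nofollow in that some of them carry a value after a colon. The following Python sketch shows one way to split a content attribute into plain flags and key:value limits. The function name is hypothetical, and the example values (50, large, 0) are illustrative rather than recommendations.

```python
def parse_preview_directives(content):
    """Split a content attribute into plain flags and key:value limits."""
    flags, limits = set(), {}
    for part in content.split(","):
        part = part.strip().lower()
        if not part:
            continue
        # Directives such as max-snippet carry their value after a colon.
        key, sep, value = part.partition(":")
        if sep:
            limits[key.strip()] = value.strip()
        else:
            flags.add(key)
    return flags, limits


flags, limits = parse_preview_directives(
    "max-snippet:50, max-image-preview:large, max-video-preview:0"
)
print(limits)
# {'max-snippet': '50', 'max-image-preview': 'large', 'max-video-preview': '0'}
```

Values are kept as strings on purpose, since max-image-preview takes a word rather than a number.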
Crawler-specific targeting
The same syntax can be aimed at a single bot by changing the name attribute. <meta name="googlebot" content="noindex"> keeps the page out of Google while leaving other crawlers unaffected, which is the standard pattern when only one search engine needs a special rule. Bing’s webmaster guidance documents its own crawler-specific tags in the same way, with separate directives for its bot.
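As a rough illustration of crawler-specific targeting, the Python sketch below works out which directives apply to a given bot when a page carries both a generic robots tag and a bot-specific one. It assumes the two sets are simply combined, which matches how Google describes handling multiple applicable rules; other crawlers may behave differently. The function name and the "bingbot" key are used purely for illustration.

```python
def effective_directives(tags, crawler):
    """Combine the generic 'robots' directives with those aimed at one crawler.

    `tags` maps a meta name (lowercase) to a set of directives,
    e.g. {"robots": {"follow"}, "googlebot": {"noindex"}}.
    """
    generic = tags.get("robots", set())
    specific = tags.get(crawler.lower(), set())
    # Assumption: a crawler honours both the generic and its own rules.
    return generic | specific


# Page containing only <meta name="googlebot" content="noindex">
tags = {"googlebot": {"noindex"}}

print("noindex" in effective_directives(tags, "googlebot"))  # True
print("noindex" in effective_directives(tags, "bingbot"))    # False
```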
Meta Robots vs. Robots.txt
The two are often confused because they share the word “robots,” but they answer different questions at different layers of the stack. robots.txt is a plain-text file at the root of the site that tells crawlers whether they may request a given URL at all. Meta robots sits inside the HTML and decides what happens to a page after the crawler has fetched it. One controls access. The other controls treatment.
The practical consequence is sharp. If robots.txt blocks a folder, crawlers will not fetch the HTML inside it, which means they will not see the meta robots tag on any of those pages. Adding a noindex to a page that is already blocked by robots.txt does not deindex it, because the directive is never read. To get such a page out of the index, the robots.txt block has to be lifted first so the crawler can fetch the page and read the noindex. This misunderstanding is a common source of “why is this still showing in Google” tickets.
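The dependency is easy to check programmatically. The Python sketch below uses the standard library's urllib.robotparser to test whether a URL can be fetched at all, which is the precondition for any meta robots tag on it being read. The function name, the robots.txt rules, and the example.com URLs are made up for illustration, and real crawlers may interpret robots.txt with more nuance than this parser does.

```python
from urllib.robotparser import RobotFileParser


def meta_tag_can_be_read(robots_txt, url, user_agent="*"):
    """Return True if robots.txt lets the crawler fetch the URL.

    If this is False, a noindex in the page's HTML is never seen.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks one folder for every crawler.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

print(meta_tag_can_be_read(robots_txt, "https://example.com/private/page.html"))  # False
print(meta_tag_can_be_read(robots_txt, "https://example.com/public/page.html"))   # True
```

A False result for a page that carries noindex is exactly the conflict described above: the directive exists but cannot take effect.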
Where to Use It and How to Check It
The tag earns its keep on a predictable set of page types. Common use cases include:
- internal search results pages
- faceted or filtered navigation URLs
- duplicate printer-friendly versions of articles
- thank-you and confirmation pages after form submission
- staging or test environments
- thin utility pages with no real search value
- temporary campaign landing pages
The most common pattern is noindex, follow, used on pages that should not rank but still contain useful internal links pointing deeper into the site. The opposite pattern, noindex, nofollow, is reserved for pages that should be invisible to search engines and not contribute to link discovery at all.
Verification is just as direct as the implementation. View the page source in a browser, run a site audit with an SEO crawler, or submit the URL to a search engine inspection tool to see how the tag is being parsed.
For non-HTML files like PDFs or images, the same directives can be sent as an HTTP header known as the X-Robots-Tag, which is the only option when there is no HTML head to edit. The Clickside team typically templates these rules directly into the CMS so regressions do not slip through on new page types.
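For the X-Robots-Tag header mentioned above, the rule is usually set in the web server or CMS configuration, but the logic can be sketched in a few lines of Python. The extension list and function name below are hypothetical, a stand-in for whatever policy a site actually wants, and the sketch ignores query strings.

```python
import posixpath

# Hypothetical policy: file types that should stay out of the index.
NOINDEX_EXTENSIONS = {".pdf", ".png", ".jpg"}


def robots_headers_for(path):
    """Return extra response headers for a URL path, based on its extension."""
    extension = posixpath.splitext(path)[1].lower()
    if extension in NOINDEX_EXTENSIONS:
        # Same directive as the meta tag, delivered as an HTTP header instead.
        return [("X-Robots-Tag", "noindex")]
    return []


print(robots_headers_for("/downloads/report.pdf"))  # [('X-Robots-Tag', 'noindex')]
print(robots_headers_for("/blog/post.html"))        # []
```

The returned pairs would be added to the response by whatever framework or server serves the file.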
The One Rule That Makes Meta Robots Easy
Meta robots is a per-page instruction that only matters when crawlers can actually read the page, and only works when its directives match the intent for that URL. When the tag, robots.txt, canonical signals, and internal linking all agree, indexing behaves predictably. When they disagree, the outcome is harder to predict; between conflicting robots directives, Google’s documentation says the more restrictive one applies.
Open one of your own pages, view source, and look for the meta robots element. Check whether the content attribute matches what you actually want that page to do in search. If it is missing, decide whether that absence is deliberate or a template oversight worth fixing today.