Want to optimize your blog for the biggest search engine “Google” and set yourself up for success?
Then it’s critical to have a clear understanding of how the Google search engine works today…
However, learning the complex technologies and internal algorithms of a search engine can be a bit overwhelming, especially for a non-techy guy like me.
This is why I’ve created this guide to explain how the Google search engine works in plain English.
How Does the Google Search Engine Work?
A search engine like Google is a complex computer program. Its job is to provide internet users with the best results as quickly as possible.
How do they do it?
They do a lot of ‘preparation work’ in advance, so that when you search for something on Google, you’re presented with a set of precise, quality results that answer your query or question.
‘Preparation work’ involves three main stages: crawling, indexing, and ranking (serving search results).
Below I am going to break down each of these processes, so you can understand better how Google search engine operates.
01. Crawling ─ How does Google crawl the web?
The internet is like an ever-growing library with billions of different books without any central filing system.
Search engines like Google use automated software (known as web crawlers or spiders) to find new and updated content.
In the case of Google, they call their web crawler “Googlebot.”
Googlebot begins its crawling from –
This is a nonstop and fully automated process. Googlebot is therefore likely to crawl a specific site multiple times, depending on the nature of the site.
For example, Googlebot will visit a newspaper site like “Wall Street Journal” much more often than a simple portfolio site.
That is because “Wall Street Journal” gets updated with new content and links more frequently than a simple static portfolio site.
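To make the idea of crawling more concrete, here is a minimal Python sketch of link-following discovery. It is only an illustration and not how Googlebot is actually built: the tiny `WEB` dictionary and the `crawl` function are hypothetical stand-ins for the real web and a real crawler.

```python
from collections import deque

# Hypothetical miniature "web": each page maps to the pages it links to.
WEB = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post-1", "post-2"],
    "post-1": ["blog"],
    "post-2": ["post-1"],
    "orphan": ["home"],  # no page links here, so link-following never finds it
}

def crawl(seed_pages, web):
    """Breadth-first discovery: follow links from the seeds, visiting each page once."""
    discovered = []
    seen = set(seed_pages)
    queue = deque(seed_pages)
    while queue:
        page = queue.popleft()
        discovered.append(page)
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return discovered

print(crawl(["home"], WEB))
```

Starting from `home`, the sketch reaches every page that is linked from somewhere, but it never reaches `orphan`, because no other page points to it.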
TAKEAWAY FOR SITE OWNERS
Make sure your website is easily accessible to crawlers. If Googlebot cannot crawl your site, Google cannot index its content, and your pages will struggle to appear in search results.
How to Improve Your Website’s Crawling?
There are a number of things you can do to make sure Googlebot discovers the right pages on your site quickly.
1 Create a robots.txt file for your website
The robots.txt file is located in the root directory of your website (example: yourwebsite.com/robots.txt).
It allows you to specify which pages or sections of your site you want Google to crawl and which ones you don’t.
For example, WordPress admin pages or other pages that you don’t want crawlers to spend time on can easily be blocked from crawling through robots.txt. Keep in mind that robots.txt only controls crawling. It does not make a page private, so truly sensitive pages need password protection instead.
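If you want to check how a set of robots.txt rules will be read, Python’s standard library ships a parser for the format. The sketch below is a rough illustration: the rules in `ROBOTS_TXT`, the `is_crawlable` helper and the `yourwebsite.com` URLs are made-up examples, and Python’s parser may not match Googlebot’s behavior in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for a WordPress-style site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def is_crawlable(path, user_agent="Googlebot"):
    """Return True if the rules above allow this user agent to fetch the path."""
    return parser.can_fetch(user_agent, "https://yourwebsite.com" + path)

for path in ["/blog/my-post/", "/wp-admin/options.php"]:
    print(path, "->", "crawlable" if is_crawlable(path) else "blocked")
```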
2 Internally link your web pages
When adding a new page to your site, linking to it from existing pages is a good way to make sure it gets discovered by Google.
3 Add an XML sitemap to your website
Use an XML sitemap to list all the important pages of your website.
A sitemap acts as an instruction manual for web crawlers, telling them which pages to crawl.
Once you have generated your sitemap, submit its URL in your Google Search Console account.
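Most site owners let a plugin or online generator create the sitemap, but the file itself is simple. As a rough sketch of what such a tool produces, the Python snippet below builds a bare-bones sitemap using the standard sitemaps.org namespace. The `build_sitemap` function and the example URLs are hypothetical, and optional fields such as `lastmod` are left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a minimal sitemap XML string listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical list of important pages.
pages = [
    "https://yourwebsite.com/",
    "https://yourwebsite.com/blog/",
]
print(build_sitemap(pages))
```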
02. Indexing ─ How Does Google Read and Store Website Information?
Indexing is the second stage of the search engine’s working process.
The information Googlebot identifies during crawling needs to be organized, sorted, and stored so that it can be processed later by Google’s ranking algorithms.
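A common way to picture this kind of storage is an “inverted index”, which maps each word to the pages that contain it. The Python sketch below is a toy illustration under that assumption and not a description of Google’s real index: the `PAGES` data and the `build_index` and `lookup` functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> extracted text.
PAGES = {
    "site.com/cold-brew": "how to make cold brew coffee at home",
    "site.com/espresso": "how to pull an espresso shot",
    "site.com/tea": "cold brew tea recipe",
}

def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def lookup(index, query):
    """Return the pages that contain every word in the query."""
    word_sets = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*word_sets) if word_sets else set()

index = build_index(PAGES)
print(sorted(lookup(index, "cold brew")))
```

Because the words are organized in advance, answering a query becomes a quick lookup instead of a fresh scan of every page.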
When crawlers discover a new web page, they render the content of the page, just as a web browser does.
Then they take note of the key things and add all of that information to the search index.
The key things include:
TAKEAWAY FOR SITE OWNERS
Make sure Googlebot “sees” your website the way you want it to, and control which parts of the site you want it to index.
How to Improve Your Website’s Indexing?
There are many techniques you can use to improve Google’s ability to understand the content of your web page:
1 Use the ‘noindex’ tag when needed
You can prevent your low-quality pages from appearing in Google search results by including a noindex meta tag in the page’s HTML code, or by returning a noindex directive in the X-Robots-Tag header of the HTTP response.
When Googlebot crawls that page and sees the noindex tag or header, Google will not include that page in its search index. For this to work, the page must not be blocked in robots.txt, otherwise Googlebot never gets to see the tag.
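To show what a crawler looks for, here is a simplified Python sketch that checks a page for a noindex directive in either place. The `RobotsMetaParser` class and `is_indexable` function are hypothetical helpers written for this illustration, and the header lookup is deliberately basic (it expects the exact key `X-Robots-Tag`).

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() in ("robots", "googlebot"):
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))

def is_indexable(html, headers=None):
    """Return False if the HTML or the response headers carry a noindex directive."""
    headers = headers or {}
    header_value = headers.get("X-Robots-Tag", "")
    header_directives = {d.strip().lower() for d in header_value.split(",")}
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in (parser.directives | header_directives)

thank_you_page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_indexable(thank_you_page))
```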
Which pages should you noindex?
Any page that your audience will never be interested in searching for on Google can be considered a low-quality page and should not be indexed.
For example – a ‘thank you’ page created for users signing up to your email list.
If you are a WordPress user, your SEO plugin usually provides ‘robots meta tag’ settings for every page and post.
Just tick the box next to No Index and Google will leave that page out of its search index.
2 How to find how many pages of your website are indexed in Google?
It is very simple. Open Google and use the site: operator followed by your website URL.
For example, if I search for “site:nerdblogging.com”, I can see roughly how many pages from my site are indexed in Google. The number shown is an estimate, so treat it as a quick check rather than an exact count.
3 How to request (re)indexing of a web page in Google?
If one of your important pages is not yet indexed in Google, or you have made a significant update to an existing page, Google lets you manually request indexing for that page.
In Google Search Console (a free tool for webmasters), the URL Inspection tool lets you check the crawl, index, and serving information of a web page and request reindexing.
Just enter your web page URL in the search box at the top of Search Console.
Within a few seconds, it will show you important info such as the last crawl date and time and the latest indexing status, along with a button to “Request indexing.”
03. Serving search results ─ How Does Google Rank Pages?
The third and final stage in the process is called ‘Ranking.’
In this stage, Google decides which web pages to show in the SERPs, and in what order, when someone types a search query.
This is achieved through search engine ranking algorithms.
What are ranking algorithms?
In simple words, a ranking algorithm is a piece of software with a number of rules for analyzing what a searcher is looking for and which results best answer the query.
These rules and decisions are based on the information available in Google’s search index.
How does Google search algorithm work?
Over the years, Google search algorithms have evolved and become really complex.
In the early years of Google (2000-2005), it was as simple as matching the user’s query with the heading of the page, but this is no longer the case.
Today, Google’s ranking algorithms take more than 200 factors into account to determine which pages to show in the SERPs and in what order.
Although nobody outside Google knows the exact factors and their weights in ranking a web page, we do know about the key ones through Google’s patents and documentation.
Let’s discuss the 5 major areas (officially listed by Google) that influence what results will be returned for a search query:
3.1 Meaning of the query
To return the most relevant results, Google first needs to understand exactly what the user is searching for and the intent behind the search.
For this they must understand and assess various things like:
3.2 The Relevance of Web Page
Next, Google algorithms analyze the content of pages to assess whether the page contains information that answers the user’s search query in the best way possible.
According to Google, when a web page contains the same keywords as the search query, especially in important positions like the title and subheadings, that is a signal of relevance.
However, keyword matching alone is not foolproof in modern SEO, which is why Google also looks for the presence of other topically relevant words on the web page.
To give you an example, if you have written an article about “How to make cold brew coffee”, Google will scan your page not only for the exact keyword but also for topically relevant words (like “filter”, “temperature”, “grind”, “cold water”, and “ice”).
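The idea can be sketched as a toy scoring function in Python. To be clear, this is not Google’s formula: the `relevance_score` function, the weights (2 for a title match, 1 for a body match, 1 per related term) and the sample pages are all assumptions made up for illustration.

```python
def relevance_score(query, title, body, related_terms):
    """Toy score: query words in the title count double, query words in the
    body count once, and each related term found in the body adds one."""
    query_words = query.lower().split()
    title_words = set(title.lower().split())
    body_text = body.lower()
    body_words = set(body_text.split())
    score = 0
    for word in query_words:
        if word in title_words:
            score += 2
        if word in body_words:
            score += 1
    # Simple substring check, so "ice" would also match inside longer words.
    score += sum(1 for term in related_terms if term in body_text)
    return score

RELATED = ["filter", "temperature", "grind", "cold water", "ice"]

on_topic = relevance_score(
    "cold brew coffee",
    "How to Make Cold Brew Coffee",
    "Use a coarse grind, cold water, and a filter. Steep, then serve over ice.",
    RELATED,
)
off_topic = relevance_score(
    "cold brew coffee",
    "Best Coffee Shops in Town",
    "A list of cafes we like.",
    RELATED,
)
print(on_topic, off_topic)
```

The page that uses the query words in its title and also mentions related terms ends up with the higher score, which is the intuition behind this signal.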
3.3 Quality of content
Google has literally millions of web pages in its index for each search query, and it wants to rank high-quality content above low-quality content.
The problem is that content quality is tricky to measure objectively.
This is where Google uses the ‘PageRank’ algorithm.
What is the PageRank algorithm?
PageRank is a system designed to evaluate the “value of a page” by looking at the quality and quantity of other pages linking to it.
Think of backlinks (external links) as a vote of trust from other websites. When another website links to your page, it is vouching for your piece of content.
That means the more quality backlinks your page has, the higher it is likely to rank in Google SERPs.
This is probably why most large-scale SEO studies show a clear correlation between backlinks and rankings.
That said, it is important to note that not all backlinks are created equal.
The relevance and authority of the linking websites and web pages are also super important.
For example, let’s say you have an article about “Vegan Diet.” Google will give more weight to a backlink from a recipe site than to one from a general politics site.
Similarly, the authority of linking sites also plays a major role.
For example, one link from a reputable site like “Forbes” can be more powerful than links from 100 low-quality websites.
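For the curious, the classic PageRank idea can be sketched in a few lines of Python. This follows the textbook formulation with the commonly cited damping factor of 0.85 and is certainly not what runs inside Google today. The `LINKS` graph, page names and `pagerank` function are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated redistribution of rank along links.
    `links` maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: three pages link to "your-page", none link to "new-page".
LINKS = {
    "news-site": ["recipe-blog", "your-page"],
    "recipe-blog": ["your-page"],
    "your-page": ["news-site"],
    "new-page": ["your-page"],
}

for page, score in sorted(pagerank(LINKS).items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this small graph, `your-page` collects the most “votes” and ends up on top, while `new-page`, which nobody links to, ends up at the bottom.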
3.4 Usability of webpages
As discussed earlier, Google wants to rank pages that make its users happy, and that goes beyond showing relevant, high-quality results.
The content also needs to be easy to consume and accessible.
There are several confirmed Google ranking factors that help with that:
1 Browser compatibility
Internet users typically view your website using a web browser. Some popular browsers include Google Chrome, Opera, Microsoft Edge, and Firefox.
Each web browser interprets your website code in a slightly different manner, which means your website may appear differently to visitors using different browsers.
The best way to make sure that your site looks the same in all web browsers is to write your pages using valid HTML and CSS code, and then test them in as many browsers as possible (at least the most popular ones).
2 Page Speed
Nobody likes waiting for a web page to load, and Google understands this very well.
That’s why they made page speed a ranking factor for both desktop and mobile searches.
Use a free Google tool like “PageSpeed Insights” to check whether your page loads in under 3 seconds (a commonly cited target for website loading time).
3 Mobile-friendliness
More people use smartphones than computers or laptops to browse the internet, and that’s one reason there have been changes in how Google ranks a web page.
Fun Fact: As of January 2021, over 55% of Google searches happen on mobile devices.
Google made “mobile-first indexing” the default for new websites in July 2019, which means Google predominantly uses the mobile version of the content for indexing and ranking.
So, if your site is not optimized for mobile devices, you’re at risk of getting needlessly under-ranked.
Things you can do to optimize your site for mobile:
Google has developed a free tool called “Mobile-Friendly Test” to check the mobile-friendliness of a web page.
4 Security of the websites
Make sure to constantly monitor your website for security issues and, if anything is found, resolve it as quickly as possible.
Google Search Console also provides a Security Issues report, which detects 6 common security issues such as deceptive pages, malware, and harmful downloads.
Plus, make sure your site has a valid SSL certificate and is served over HTTPS.
3.5 Context and settings
Last but not least, Google also uses your location, past search history and Search settings to show the most useful and relevant result for you.
1 Location of the User
Some searches like “Coffee shop near me,” are obviously location-dependent.
But Google will rank results based on local factors even for search queries that are not location-specific.
Example: Here is the result for the search term “Football” from India vs the United Kingdom.
2 History of searchers
Your previous searches and the search result you clicked on influence how Google will personalize results for you.
For example, if you search for the keyword “Hemingway”, you’ll get results for both the novelist Ernest Hemingway and the Hemingway App.
Now click on some of the results about the Hemingway App, and spend some time on each result.
Finally, search for the same keyword “Hemingway” again in Google, and this time you’ll likely see a greater number of results about the Hemingway App than about the novelist.
3 Search settings
Your search settings are also an important indicator of what kind of results you want to see.
For example, if you have selected a preferred language or turned on SafeSearch, you’ll be presented with different results than other users.
Google Algorithm updates
Generally, we can classify Google algorithm updates into two categories:
- Minor updates
- Core updates
Google makes changes to its algorithm quite often.
And by quite often I mean on a daily basis. However, most of these changes are very small and do not heavily affect any website.
But besides these small updates, Google rolls out a couple of big core algorithm updates every year.
These core updates create a lot of buzz in the SEO community as well as make a major impact on how we do SEO (search engine optimization).
Most important core algorithm updates
Here is a quick list of the most critical search algorithm changes in the last decade that shaped the way we do SEO today.
1 Panda (2011)
This was the first major update in the ‘modern SEO’ era. With this update, Google tried to deal with low-quality pages, thin content, keyword stuffing and duplicate content.
Panda initially ran as a separate filter that was refreshed from time to time, and it was incorporated into Google’s core algorithm in 2016.
LEARNING: Focus on creating original and high-quality content that actually adds value to your audience.
2 Penguin (2012)
Google’s Penguin update focused on the backlinks websites got from other sites.
It analyzed whether backlinks to a site were genuine, or whether they had been created artificially just to trick Google’s algorithms.
It affected lots of websites that had created paid, spammy, or irrelevant links just to boost their rankings.
LEARNING: Never ever create spam links or buy links from a site. Focus on creating a site that your audience and industry people actually love. Once you have a good high-quality site, you will naturally get links from other sites in your niche.
3 Hummingbird (2013)
The Hummingbird update improved the way Google interprets search queries.
It helps Google show results that match the searcher’s intent as opposed to the individual terms within the query.
This update made it possible for a page to rank for a query even if it doesn’t contain the exact words the searcher entered in Google.
LEARNING: There is no longer any need to stuff your web page with exact-match keywords. Focus on comprehensive keyword research and create content that covers every aspect of the topic.
4 Pigeon (2014)
The Pigeon update aimed to make local results more accurate and of higher quality.
5 Mobile Update (2015)
Also known as ‘Mobilegeddon’ in the SEO community, this update gave mobile-friendly pages a ranking advantage in mobile search results.
LEARNING: Make sure your site looks good on mobile devices.
6 RankBrain (2015)
RankBrain is part of the Hummingbird algorithm that Google introduced in 2013.
It is a machine learning (AI) system that helps Google process and understand search queries and then serve the most accurate response to those queries.
LEARNING: Optimize your web page for relevance and comprehensiveness. Google’s machine learning algorithm is smart enough to understand the real intent and meaning behind a searcher’s queries.
7 Medic (2018)
Google’s Medic update heavily affected YMYL (Your Money or Your Life) pages, especially health-related content.
With this update, Google made sure to give more preference to quality, authoritative, and expert content in the search results.
LEARNING: Focus on building authority and expertise in your niche. Google gives more preference to authoritative, trusted sites than to lesser-known ones.
8 BERT (2019)
BERT is another machine learning algorithm. It uses natural language processing technology to better understand search queries, interpret context, and identify entities.
Google Search Quality Raters
Besides search algorithms and machine learning systems like RankBrain, Google also takes input from real people to improve its search rankings.
Basically, Google works with thousands of external contractors from all over the world (called search quality raters) to evaluate its search results.
Raters are given actual searches that happen on Google to rate the quality of pages that appear in the top results.
It’s important to note that quality raters cannot influence the results and rankings directly.
A rater marking a particular page as low quality will not instantly damage the ranking of that page. Instead, the data generated by quality raters are used to improve the Google algorithms.
Note: Search quality raters follow a set of guidelines summed up in a 200-page PDF, instructing them on how to assess web page quality.
It is a publicly available document that can serve as a useful source of info on how to create quality pages.
A search engine like Google is actually a very complex computer program.
Yes, it might feel like magic that you type “how to make cold brew coffee” and, within a fraction of a second, you are presented with 10 quality web pages showing cold brew coffee recipes.
But the way Google collects information and makes decisions is far beyond what a normal internet user would imagine.
The process starts with crawling and indexing. During this phase, Google’s web crawlers gather as much information as possible from all the sites that are available on the web.
They discover, process, sort, and store this information in a systematic format, so that it can be used by ranking algorithms to make the correct decision and return the best result for the user’s query.
As a website owner, your job is to make their crawling and indexing process easier by creating a website that has a simple and logical structure.
Once they can crawl and index your site without issues, you then need to create high-quality content and give the algorithms the right signals to help them rank your website’s content for relevant queries.
That is what Search Engine Optimization (SEO) is all about.
So, now you know the basics of how the Google search engine works.
Please feel free to ask any questions you might have about how Google works in the comments…