alex / what-happens-when
- Saturday, May 21, 2016 at 03:15:45
315 stars today
This repository is an attempt to answer the age old interview question "What happens when you type google.com into your browser's address box and press enter?"
Except instead of the usual story, we're going to try to answer this question in as much detail as possible. No skipping out on anything.
This is a collaborative process, so dig in and try to help out! There are tons of details missing, just waiting for you to add them! So send us a pull request, please!
This is all licensed under the terms of the Creative Commons Zero license.
Read this in 简体中文 (simplified Chinese). NOTE: this has not been reviewed by the alex/what-happens-when maintainers.
The following sections explain the physical keyboard and the OS interrupts, but a whole lot happens after that which isn't explained. When you just press "g", the browser receives the event and the entire auto-complete machinery kicks into high gear. Depending on your browser's algorithm and whether you are in private/incognito mode, various suggestions will be presented to you in the dropdown below the URL bar. Most of these algorithms prioritize results based on search history and bookmarks. You are going to type "google.com" so none of it matters, but a lot of code will run before you get there and the suggestions will be refined with each key press. It may even suggest "google.com" before you type it.
To pick a zero point, let's choose the Enter key on the keyboard hitting the bottom of its range. At this point, an electrical circuit specific to the enter key is closed (either directly or capacitively). This allows a small amount of current to flow into the logic circuitry of the keyboard, which scans the state of each key switch, debounces the electrical noise of the rapid intermittent closure of the switch, and converts it to a keycode integer, in this case 13. The keyboard controller then encodes the keycode for transport to the computer. This is now almost universally over a Universal Serial Bus (USB) or Bluetooth connection, but historically has been over PS/2 or ADB connections.
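The debouncing step described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: real keyboard firmware differs between vendors, and the `stable_count` threshold, the function names, and the sample values are hypothetical. Only the keycode 13 comes from the text.

```python
ENTER_KEYCODE = 13  # keycode for Enter, as given in the text


def debounce(samples, stable_count=3):
    """Return the debounced switch state after each raw sample.

    A new state is accepted only once the raw reading has held the same
    value `stable_count` times in a row, so short glitches are ignored.
    (The threshold is an illustrative assumption.)
    """
    state = 0      # debounced state: 0 = open, 1 = closed
    candidate = 0  # most recent raw reading
    run = 0        # consecutive samples equal to `candidate`
    out = []
    for raw in samples:
        if raw == candidate:
            run += 1
        else:
            candidate = raw
            run = 1
        if run >= stable_count:
            state = candidate
        out.append(state)
    return out


def key_events(samples, keycode=ENTER_KEYCODE, stable_count=3):
    """Turn debounced state changes into (event, keycode) tuples."""
    events = []
    previous = 0
    for state in debounce(samples, stable_count):
        if state == 1 and previous == 0:
            events.append(("down", keycode))
        elif state == 0 and previous == 1:
            events.append(("up", keycode))
        previous = state
    return events


if __name__ == "__main__":
    # A noisy press: the contact bounces before settling closed, then opens.
    noisy = [0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0]
    print(key_events(noisy))
```

Despite the bouncing contact, the sketch reports a single key-down and a single key-up for the noisy sample.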
In the case of the USB keyboard:
In the case of Virtual Keyboard (as in touch screen devices):
The screen controller then raises an interrupt reporting the coordinate of the key press.
The keyboard sends signals on its interrupt request line (IRQ), which is mapped to an interrupt vector (integer) by the interrupt controller. The CPU uses the Interrupt Descriptor Table (IDT) to map the interrupt vectors to functions (interrupt handlers) which are supplied by the kernel. When an interrupt arrives, the CPU indexes the IDT with the interrupt vector and runs the appropriate handler. Thus, the kernel is entered.
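The IRQ to vector to handler lookup can be pictured as two table lookups. The sketch below is a toy model, not kernel code: the IRQ and vector numbers and the handler names are illustrative assumptions, and a real IDT holds gate descriptors rather than Python functions.

```python
def timer_handler():
    return "timer interrupt handled"


def keyboard_handler():
    return "keyboard interrupt handled"


# Interrupt controller: IRQ line -> interrupt vector (illustrative numbers).
IRQ_TO_VECTOR = {0: 32, 1: 33}

# Interrupt Descriptor Table: vector -> handler supplied by the "kernel".
IDT = {32: timer_handler, 33: keyboard_handler}


def raise_irq(irq):
    """Map the IRQ to a vector, index the IDT, and run the handler."""
    vector = IRQ_TO_VECTOR[irq]
    handler = IDT[vector]
    return handler()


if __name__ == "__main__":
    print(raise_irq(1))
```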
(On Windows) A WM_KEYDOWN message is sent to the app
The HID transport passes the key down event to the KBDHID.sys driver, which converts the HID usage into a scancode. In this case the scan code is VK_RETURN (0x0D). The KBDHID.sys driver interfaces with the KBDCLASS.sys (keyboard class driver). This driver is responsible for handling all keyboard and keypad input in a secure manner. It then calls into Win32K.sys (after potentially passing the message through 3rd party keyboard filters that are installed). This all happens in kernel mode.
Win32K.sys figures out what window is the active window through the GetForegroundWindow() API. This API provides the window handle of the browser's address box. The main Windows "message pump" then calls SendMessage(hWnd, WM_KEYDOWN, VK_RETURN, lParam). lParam is a bitmask that indicates further information about the keypress: repeat count (0 in this case), the actual scan code (can be OEM dependent, but generally wouldn't be for VK_RETURN), whether extended keys (e.g. alt, shift, ctrl) were also pressed (they weren't), and some other state.
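To make the lParam bitmask more concrete, here is a minimal decoding sketch. The bit positions follow the documented WM_KEYDOWN layout (bits 0-15 repeat count, 16-23 scan code, 24 extended-key flag, 29 context code, 30 previous key state, 31 transition state). The function name and the sample value are hypothetical and only serve the illustration.

```python
def decode_keydown_lparam(lparam):
    """Split a WM_KEYDOWN lParam value into its documented bit fields."""
    return {
        "repeat_count": lparam & 0xFFFF,           # bits 0-15
        "scan_code": (lparam >> 16) & 0xFF,        # bits 16-23
        "extended": bool((lparam >> 24) & 1),      # bit 24
        "context_code": (lparam >> 29) & 1,        # bit 29
        "previous_state": (lparam >> 30) & 1,      # bit 30
        "transition_state": (lparam >> 31) & 1,    # bit 31
    }


if __name__ == "__main__":
    # Hypothetical value: repeat count 1, scan code 0x1C, no extended flag.
    print(decode_keydown_lparam(0x001C0001))
```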
The Windows SendMessage API is a straightforward function that adds the message to a queue for the particular window handle (hWnd). Later, the main message processing function (called a WindowProc) assigned to the hWnd is called in order to process each message in the queue.
The window (hWnd) that is active is actually an edit control and the WindowProc in this case has a message handler for WM_KEYDOWN messages. This code looks within the 3rd parameter that was passed to SendMessage (wParam) and, because it is VK_RETURN, knows the user has hit the ENTER key.
(On OS X) A KeyDown NSEvent is sent to the app
The interrupt signal triggers an interrupt event in the I/O Kit kext keyboard driver. The driver translates the signal into a key code which is passed to the OS X WindowServer process. As a result, the WindowServer dispatches an event to any appropriate (e.g. active or listening) applications through their Mach port where it is placed into an event queue. Events can then be read from this queue by threads with sufficient privileges calling the mach_ipc_dispatch function. This most commonly occurs through, and is handled by, an NSApplication main event loop, via an NSEvent of NSEventType KeyDown.
When a graphical X server is used, X will use the generic event driver evdev to acquire the keypress. A re-mapping of keycodes to scancodes is made with X server specific keymaps and rules. When the scancode mapping of the key pressed is complete, the X server sends the character to the window manager (DWM, metacity, i3, etc), so the window manager in turn sends the character to the focused window. The graphical API of the window that receives the character prints the appropriate font symbol in the appropriate focused field.
The browser now has the following information contained in the URL (Uniform Resource Locator):
Protocol "http": Use 'Hyper Text Transfer Protocol'
Resource "/": Retrieve main (index) page
When no protocol or valid domain name is given, the browser feeds the text given in the address box to the browser's default web search engine. In many cases the URL has a special piece of text appended to it to tell the search engine that it came from a particular browser's URL bar.
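The protocol and resource breakdown described above can be reproduced with Python's standard `urllib.parse` module. This is a sketch only: defaulting to "http" when no scheme is typed and falling back to "/" for an empty path are assumptions made for the illustration, and real browsers apply more elaborate heuristics.

```python
from urllib.parse import urlsplit


def describe_url(text, default_scheme="http"):
    """Break address-bar text into protocol, host and resource."""
    if "://" not in text:
        # Assumption: no scheme typed, so prepend the default one.
        text = f"{default_scheme}://{text}"
    parts = urlsplit(text)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "resource": parts.path or "/",  # empty path means the index page
    }


if __name__ == "__main__":
    print(describe_url("google.com"))
```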
The browser checks the hostname for characters that are not in a-z, A-Z, 0-9, -, or ..
Since the hostname is google.com there won't be any, but if there were the browser would apply Punycode encoding to the hostname portion of the URL.
The browser calls the gethostbyname library function (varies by OS) to do the lookup.
gethostbyname checks if the hostname can be resolved by reference in the local hosts file (whose location varies by OS) before trying to resolve the hostname through DNS.
If gethostbyname does not have it cached nor can find it in the hosts file then it makes a request to the DNS server configured in the network stack. This is typically the local router or the ISP's caching DNS server.
If the DNS server is on the same subnet, the network library follows the ARP process below for the DNS server.
If the DNS server is on a different subnet, the network library follows the ARP process below for the default gateway IP.
In order to send an ARP broadcast the network stack library needs the target IP address to look up. It also needs to know the MAC address of the interface it will use to send out the ARP broadcast.
The ARP cache is first checked for an ARP entry for our target IP. If it is in the cache, the library function returns the result: Target IP = MAC.
If the entry is not in the ARP cache:
ARP Request:
Sender MAC: interface:mac:address:here
Sender IP: interface.ip.goes.here
Target MAC: FF:FF:FF:FF:FF:FF (Broadcast)
Target IP: target.ip.goes.here
Depending on what type of hardware is between the computer and the router:
Directly connected: ARP Reply (see below)
Hub: ARP Reply (see below).
Switch: ARP Reply (see below)
ARP Reply:
Sender MAC: target:mac:address:here
Sender IP: target.ip.goes.here
Target MAC: interface:mac:address:here
Target IP: interface.ip.goes.here
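The cache check followed by a request and reply can be sketched as a small simulation. Nothing here touches a real network: `resolve_mac`, `fake_router`, and all addresses are hypothetical names and values chosen for the example, and the dictionaries simply mirror the fields listed above.

```python
BROADCAST_MAC = "FF:FF:FF:FF:FF:FF"


def resolve_mac(target_ip, arp_cache, interface_mac, interface_ip, send_request):
    """Return the MAC for target_ip, consulting the ARP cache first."""
    if target_ip in arp_cache:
        return arp_cache[target_ip]
    request = {
        "sender_mac": interface_mac,
        "sender_ip": interface_ip,
        "target_mac": BROADCAST_MAC,
        "target_ip": target_ip,
    }
    reply = send_request(request)
    # Remember the answer so the next lookup is served from the cache.
    arp_cache[reply["sender_ip"]] = reply["sender_mac"]
    return reply["sender_mac"]


def fake_router(request):
    """Stand-in for the device that owns the target IP (hypothetical)."""
    return {
        "sender_mac": "AA:BB:CC:DD:EE:FF",
        "sender_ip": request["target_ip"],
        "target_mac": request["sender_mac"],
        "target_ip": request["sender_ip"],
    }


if __name__ == "__main__":
    cache = {}
    print(resolve_mac("192.168.1.1", cache, "11:22:33:44:55:66",
                      "192.168.1.10", fake_router))
    print(cache)
```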
Now that the network library has the IP address of either our DNS server or the default gateway it can resume its DNS process:
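As an illustration of what the resumed DNS step puts on the wire, the sketch below builds the bytes of a standard A-record query in the DNS message format (header plus one question). It does not send anything. The query ID is an arbitrary placeholder and the function name is hypothetical; DNS queries are commonly sent over UDP port 53, which is not stated in the text above.

```python
import struct


def build_dns_query(hostname, query_id=0x1234):
    """Build a DNS query message asking for the A record of hostname."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question,
    # 0 answer / authority / additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = IN (Internet).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


if __name__ == "__main__":
    print(build_dns_query("google.com").hex())
```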
Once the browser receives the IP address of the destination server, it takes that and the given port number from the URL (the HTTP protocol defaults to port 80, and HTTPS to port 443), and makes a call to the system library function named socket and requests a TCP socket stream (AF_INET and SOCK_STREAM).
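In Python, the same request for a TCP stream socket looks roughly like this. The helper names are hypothetical; the `socket.socket` call with `AF_INET` and `SOCK_STREAM` is the standard library's wrapper around the system call, and the default ports are those given in the text. No connection is made in this sketch.

```python
import socket

DEFAULT_PORTS = {"http": 80, "https": 443}  # defaults mentioned in the text


def default_port(scheme):
    """Return the default TCP port for a URL scheme."""
    return DEFAULT_PORTS[scheme]


def open_tcp_socket():
    """Ask the OS for an IPv4 (AF_INET) stream (SOCK_STREAM) socket."""
    return socket.socket(socket.AF_INET, socket.SOCK_STREAM)


if __name__ == "__main__":
    sock = open_tcp_socket()
    print(sock.family, sock.type, default_port("http"))
    sock.close()
```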
At this point the packet is ready to be transmitted through either:
For most home or small business Internet connections the packet will pass from your computer, possibly through a local network, and then through a modem (MOdulator/DEModulator) which converts digital 1's and 0's into an analog signal suitable for transmission over telephone, cable, or wireless telephony connections. On the other end of the connection is another modem which converts the analog signal back into digital data to be processed by the next network node where the from and to addresses would be analyzed further.
Most larger businesses and some newer residential connections will have fiber or direct Ethernet connections in which case the data remains digital and is passed directly to the next network node for processing.
Eventually, the packet will reach the router managing the local subnet. From there, it will continue to travel to the AS's border routers, other ASes, and finally to the destination server. Each router along the way extracts the destination address from the IP header and routes it to the appropriate next hop. The TTL field in the IP header is decremented by one for each router the packet passes through. The packet will be dropped if the TTL field reaches zero or if the current router has no space in its queue (perhaps due to network congestion).
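The two drop conditions can be modelled in a few lines. This is a toy simulation under simplifying assumptions: routers are just names in a list, and the `queue_has_space` callback is a hypothetical stand-in for congestion.

```python
def forward(ttl, hops, queue_has_space=lambda router: True):
    """Walk a packet through `hops`, applying the two drop rules."""
    for router in hops:
        if not queue_has_space(router):
            return ("dropped: queue full", router)
        ttl -= 1  # each router decrements the TTL by one
        if ttl == 0:
            return ("dropped: TTL expired", router)
    return ("delivered", ttl)


if __name__ == "__main__":
    print(forward(64, ["r1", "r2", "r3"]))
    print(forward(2, ["r1", "r2", "r3"]))
```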
This send and receive happens multiple times following the TCP connection flow:
Client chooses an initial sequence number (ISN) and sends the packet to the server with the SYN bit set to indicate it is setting the ISN
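The sequence-number arithmetic of the connection setup can be sketched as follows. Only the first step (the client's SYN) appears in the text above; the SYN-ACK and ACK steps are filled in from standard TCP behaviour as an assumption, and the function name and ISN values are hypothetical.

```python
SEQ_MODULUS = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return the three segments exchanged when a TCP connection opens."""
    syn = {"flags": {"SYN"}, "seq": client_isn}
    syn_ack = {
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        "ack": (syn["seq"] + 1) % SEQ_MODULUS,      # acknowledges the SYN
    }
    ack = {
        "flags": {"ACK"},
        "seq": syn_ack["ack"],
        "ack": (syn_ack["seq"] + 1) % SEQ_MODULUS,  # acknowledges the SYN-ACK
    }
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(1000, 5000):
        print(segment)
```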
The client sends a ClientHello message to the server with its TLS version, list of cipher algorithms and compression methods available.
The server replies with a ServerHello message to the client with the TLS version, selected cipher, selected compression methods and the server's public certificate signed by a CA (Certificate Authority). The certificate contains a public key that will be used by the client to encrypt the rest of the handshake until a symmetric key can be agreed upon.
The client sends a Finished message to the server, encrypting a hash of the transmission up to this point with the symmetric key.
The server sends its own Finished message to the client, also encrypted with the symmetric key.
If the web browser used was written by Google, instead of sending an HTTP request to retrieve the page, it will send a request to try and negotiate with the server an "upgrade" from HTTP to the SPDY protocol.
If the client is using the HTTP protocol and does not support SPDY, it sends a request to the server of the form:
GET / HTTP/1.1
Host: google.com
Connection: close
[other headers]
where [other headers] refers to a series of colon-separated key-value pairs formatted as per the HTTP specification and separated by single new lines. (This assumes the web browser being used doesn't have any bugs violating the HTTP spec. This also assumes that the web browser is using HTTP/1.1, otherwise it may not include the Host header in the request and the version specified in the GET request will either be HTTP/1.0 or HTTP/0.9.)
HTTP/1.1 defines the "close" connection option for the sender to signal that the connection will be closed after completion of the response. For example,
Connection: close
HTTP/1.1 applications that do not support persistent connections MUST include the "close" connection option in every message.
After sending the request and headers, the web browser sends a single blank newline to the server indicating that the content of the request is done.
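The request described above can be assembled as bytes in a few lines. This is a minimal sketch: the function name and the `extra_headers` parameter are hypothetical, and the line terminator used is CRLF, which is what HTTP/1.1 specifies for the "new lines" mentioned in the text.

```python
def build_get_request(host, path="/", extra_headers=None):
    """Build the bytes of a simple HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
    ]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # Header lines are separated by CRLF; an empty line ends the request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    print(build_get_request("google.com").decode("ascii"))
```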
The server responds with a status code denoting the status of the request, in a response of the form:
200 OK
[response headers]
This is followed by a single newline and then a payload of the HTML content of www.google.com. The server may then either close the connection or, if headers sent by the client requested it, keep the connection open to be reused for further requests.
If the HTTP headers sent by the web browser included sufficient information for the web server to determine if the version of the file cached by the web browser has been unmodified since the last retrieval (i.e. if the web browser included an ETag value, typically in an If-None-Match header), it may instead respond with a response of the form:
304 Not Modified
[response headers]
and no payload, and the web browser instead retrieves the HTML from its cache.
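The client side of this exchange, reading the status line and falling back to the cache on a 304, can be sketched as below. The function names are hypothetical and the parser is deliberately minimal (no chunked encoding, no repeated or folded headers); it only illustrates the decision described above.

```python
def parse_response(raw_response):
    """Split a raw HTTP response into (code, reason, headers, body)."""
    head, _, body = raw_response.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


def page_content(raw_response, cached_body):
    """Use the cached copy when the server answers 304 Not Modified."""
    code, _reason, _headers, body = parse_response(raw_response)
    return cached_body if code == 304 else body


if __name__ == "__main__":
    fresh = b'HTTP/1.1 200 OK\r\nETag: "abc"\r\n\r\n<html></html>'
    unchanged = b"HTTP/1.1 304 Not Modified\r\n\r\n"
    print(page_content(fresh, b"old copy"))
    print(page_content(unchanged, b"old copy"))
```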
After parsing the HTML, the web browser (and server) repeats this process for every resource (image, CSS, favicon.ico, etc) referenced by the HTML page, except instead of GET / HTTP/1.1 the request will be GET /$(URL relative to www.google.com) HTTP/1.1.
If the HTML referenced a resource on a different domain than www.google.com, the web browser goes back to the steps involved in resolving the other domain, and follows all steps up to this point for that domain. The Host header in the request will be set to the appropriate server name instead of google.com.
The HTTPD (HTTP Daemon) server is the one handling the requests/responses on the server side. The most common HTTPD servers are Apache or nginx for Linux and IIS for Windows.
The HTTPD (HTTP Daemon) receives the request.
The server verifies that there is a Virtual Host configured on the server that corresponds with google.com.
The server verifies that google.com can accept GET requests.
The server verifies that the client is allowed to use this method (by IP, authentication, etc.).
If the server has a rewrite module installed (like mod_rewrite for Apache or URL Rewrite for IIS), it tries to match the request against one of the configured rules. If a matching rule is found, the server uses that rule to rewrite the request.
The server goes to pull the content that corresponds with the request. In our case it will fall back to the index file, as "/" is the main file (some cases can override this, but this is the most common method).
The server parses the file according to the handler. If Google is running on PHP, the server uses PHP to interpret the index file, and streams the output to the client.
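The sequence of server-side checks above can be condensed into a small routing sketch. Everything in it is hypothetical: the `VIRTUAL_HOSTS` table, the rewrite rule, the index file name, and the status codes returned for failed checks are assumptions for illustration and do not describe how Apache, nginx, IIS, or Google's servers are actually configured.

```python
import re

# Hypothetical configuration for one virtual host.
VIRTUAL_HOSTS = {
    "google.com": {
        "allowed_methods": {"GET", "HEAD"},
        "rewrites": [(r"^/old/(.*)$", r"/new/\1")],
        "index": "/index.html",
    }
}


def route(host, method, path):
    """Return (status, resolved_path) after the checks described above."""
    vhost = VIRTUAL_HOSTS.get(host)
    if vhost is None:
        return (404, None)  # assumption: no matching virtual host
    if method not in vhost["allowed_methods"]:
        return (405, None)  # method not accepted for this host
    for pattern, replacement in vhost["rewrites"]:
        if re.match(pattern, path):
            path = re.sub(pattern, replacement, path)
            break           # first matching rule wins
    if path == "/":
        path = vhost["index"]  # fall back to the index file
    return (200, path)


if __name__ == "__main__":
    print(route("google.com", "GET", "/"))
```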
Once the server supplies the resources (HTML, CSS, JS, images, etc.) to the browser it undergoes the below process:
The browser's functionality is to present the web resource you choose, by requesting it from the server and displaying it in the browser window. The resource is usually an HTML document, but may also be a PDF, image, or some other type of content. The location of the resource is specified by the user using a URI (Uniform Resource Identifier).
The way the browser interprets and displays HTML files is specified in the HTML and CSS specifications. These specifications are maintained by the W3C (World Wide Web Consortium) organization, which is the standards organization for the web.
Browser user interfaces have a lot in common with each other. Among the common user interface elements are:
Browser High Level Structure
The components of the browsers are:
The rendering engine starts getting the contents of the requested document from the networking layer. This will usually be done in 8kB chunks.
The primary job of the HTML parser is to parse the HTML markup into a parse tree.
The output tree (the "parse tree") is a tree of DOM element and attribute nodes. DOM is short for Document Object Model. It is the object representation of the HTML document and the interface of HTML elements to the outside world like JavaScript. The root of the tree is the "Document" object. Prior to any manipulation via scripting, the DOM has an almost one-to-one relation to the markup.
The parsing algorithm
HTML cannot be parsed using the regular top-down or bottom-up parsers.
The reasons are:
Unable to use the regular parsing techniques, the browser utilizes a custom parser for parsing HTML. The parsing algorithm is described in detail by the HTML5 specification.
The algorithm consists of two stages: tokenization and tree construction.
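The split between tokenization and tree construction can be illustrated with Python's standard `html.parser` module, which does the tokenizing and hands start tags, end tags and text to callbacks that build a tree. This is explicitly not the HTML5 parsing algorithm, which has many more states and error-recovery rules; the `TreeBuilder` class, the dictionary node shape, and the short list of void elements are assumptions for the sketch.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Build a nested dict tree from the tokens HTMLParser emits."""

    VOID = {"br", "img", "meta", "link", "input", "hr"}  # partial list

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "attrs": {}, "children": []}
        self.stack = [self.root]  # open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in self.VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1]["children"].append(data.strip())


def parse_html(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


if __name__ == "__main__":
    print(parse_html("<html><body><p>Hello<br>world</p></body></html>"))
```

An unmatched end tag is simply ignored here, which loosely echoes the way browsers recover from invalid markup rather than reporting a syntax error.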
Actions when the parsing is finished
The browser begins fetching external resources linked to the page (CSS, images, JavaScript files, etc.).
At this stage the browser marks the document as interactive and starts parsing scripts that are in "deferred" mode: those that should be executed after the document is parsed. The document state is set to "complete" and a "load" event is fired.
Note there is never an "Invalid Syntax" error on an HTML page. Browsers fix any invalid content and go on.
<style>
tag contents and style attribute values are parsed using the "CSS lexical and syntax grammar".
Each stylesheet is parsed into a StyleSheet object, where each object contains CSS rules with selectors and objects corresponding to the CSS grammar.
Layout takes more complicated steps when elements are floated, positioned absolutely or relatively, or other complex features are used. See http://dev.w3.org/csswg/css2/ and http://www.w3.org/Style/CSS/current-work for more details.
During the rendering process the graphical computing layers can use the general-purpose CPU or the graphical processor (GPU) as well.
When using the GPU for graphical rendering computations, the graphical software layers split the task into multiple pieces, so it can take advantage of the GPU's massive parallelism for the floating-point calculations required for the rendering process.
After rendering has completed, the browser executes JavaScript code as a result of some timing mechanism (such as a Google Doodle animation) or user interaction (typing a query into the search box and receiving suggestions). Plugins such as Flash or Java may execute as well, although not at this time on the Google homepage. Scripts can cause additional network requests to be performed, as well as modify the page or its layout, causing another round of page rendering and painting.