How Internet works

Posted by Monik, 04 March 2017.
Internet

Summary of different sources.

Table of contents

How Does the Internet Work?

source

IP Address

Internet Address (IP Address, Internet Protocol Address) is unique to each computer. It can be assigned as:

Protocol stack

The messages are send from computer to computer using TCP/IP protocol stack. There are following layers:

  1. Hardware layer - network card, modem - converts data to signal
  2. IP Layer - packets (data chunks) sent between computers using IP
  3. TCP (Transmission Control Protocol) Layer - packets sent between applications; each packet is assigned a port number
  4. Application Protocols Layer - application protocols, e.g. WWW, SMTP, FTP

The data is split into packets and packets are directed using ISP routers.

Internet infrastructure

Internet backbone is made up of many IXs (Internet Exchange Points), which are NSP (Network Service Provider) and private MAE (Metropolitan Area Exchanges) networks, connected via NAPs (Network Access Points). There are 3 NAPs per NSP.

Routing hierarchy

Routers are packet switches. Each router has a routing table, which contains addresses of it’s sub-networks. When a packet arrives (with an IP assigned) it either finds a network that contains the target computer, or passes it further to the backbone’s router.

DNS

DNS (Domain Name Service) is a distributed database that keeps the computer name and its IP address. DNS servers are computers on the network that hold part of this database. It’s a tree-like structure. When an Internet connection is setup 2 DNS addresses have to be extablished.

Application Protocols

Are specified by RFC (Request For Comments) documents.

HTTP

Is protocols that Web Browsers and Web Servers use (client - server). It is connectionless.

(I skip the part of how the Web page is requested, sent and rendeded in the browser.)

HTTP/2 was published in May 2015.

SMTP

SMTP (Simple Mail Transfer Protocol) is used by mail client. It is connection oriented protocol - the client maintains the connection.

TCP/IP Protocol

The IP (Internet Protocol) sends packets from IP to IP. Is not reliable and connectionless. Packets do not have to arrive in order. TCP assures the order and adds its own headers.

Cookies and sessions

A piece of data that a website stores on the user’s computer using Web Browser’s mechanism.

It can be used to authenticate an already logged in user. The cookie’s data should be of course encrypted. It’s visible here how important role the browser has.

Cookies have expiration date.

Session cookie has no expiration date and is present on the disk only as the user is navigating a website. It is removed when the browser is closed.

Cookies can also be used for sticky sessions (a mechanism to have session even despite a load balancer in the front).

Digital Certificate

It is an electronic document used to prove the ownership of a public key. Public key is bound to a respective entity by a process of registration by a certificate authority. A certificate contains:

A client can verify that the signature is correct and decide if it trusts the issuer.

There also exists a client certificate, which is bound to a person by the email address or name, and is issued by the server’s own certificate authority. It’s an alternative method of client authentication, next to passwords and cookies.

Caching HTTP Requests

ETag is protocol that allows the client to make conditional requests, and save the bandwith:

Cache-Control header can be sent by the server with a value e.g. no-cache to express that the client should always ask for a new value (browsers have HTTP cache), or max-age that says after how much time it should ask again. There are more values possible.

Both mechanisms can be used simultaneously, and it is actually a good idea.

Web Sockets

WebSocket is a computer communications protocol, providing both ways simultaneous communication channels over a single TCP connection. TCP is one level lower than the HTTP request in the TCP/IP protocol.

The TCP connection is established by initiating a HTTP hanshake request by the client, and server responding. The connection remains open on port 80, or port 443 for TSL (=SSL, encrypted connection, part of HTTPS protocol).

HTTP is still better option when client is likely to make only single requests, or when caching is desirable, or when it is important to maintain resilience despite communication failure.

Firewalls

Firewall is a system that filters network traffic according to some predefined rules. They can operate at different levels of the TCP/IP protocol.

Multi-tier architecture

In Web Applications is implemented in layers. This increases flexibility and reusability, and probably some more things. Those layers can (but don’t have to) be on different physical machines = tiers.

Two layer architecture consists of presentation layer and a data storage layer. Usually presentation on client side and data storage on server side.

Three layer architecture consists of presentation layer, a domain logic layer, and a data storage layer.

Three tier architecture is client–server software architecture pattern where all the above 3 layers are part of different and independent modules. The presentation layer is the client, most often a web browser.

String encoding

ASCII defines 128 characters, which map to the numbers 0127. ASCII is a subset of Unicode. Unicode defines up to 2^21 characters (some numbers are unassigned, some are reserved).

The numbers 0-127 have the same meanings in ASCII and in Unicode.

ASCII fits in 7-bits. To make use of the extra bit extended ASCII versions were created that use 8 bits and can hold 256 characters - called code pages, e.g. ISO 8859, ISO 8859-1 (ISO Latin-1).

Code pages caused problems when a system tried to guess the encoding but guessed wrong. Unicode was supposed to solve this problem.

Unicode does not fit even in a byte. That is why it has to be encoded. After encoding it can use various numbers of bytes, e.g.:

An analogical problem to code pages appeared. One solution could be the following, but since it is optional, it is not always used:


Comments


Comments: