The following are my takeaways.
Ping to HTTP
- ping does not need a server program to send back the reply. The echo request is received by the operating system, and the operating system itself sends the reply, so you can ping any machine regardless of what servers it runs
- output request and response
#+begin_src
printf 'HEAD / HTTP/1.1\r\nHost: en.wikipedia.org\r\n\r\n' | nc en.wikipedia.org 80

HTTP/1.1 301 TLS Redirect
Date: Thu, 26 Dec 2019 01:41:48 GMT
Server: Varnish
X-Varnish: 817889376
X-Cache: cp5011 int
X-Cache-Status: int-front
Server-Timing: cache;desc="int-front"
Set-Cookie: WMF-Last-Access=26-Dec-2019;Path=/;HttpOnly;secure;Expires=Mon, 27 Jan 2020 00:00:00 GMT
Set-Cookie: WMF-Last-Access-Global=26-Dec-2019;Path=/;Domain=.wikipedia.org;HttpOnly;secure;Expires=Mon, 27 Jan 2020 00:00:00 GMT
X-Client-IP: 13.229.230.30
Location: https://en.wikipedia.org/
Content-Length: 0
Connection: keep-alive
#+end_src
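The same request can be approximated in Python with a plain TCP socket, which shows how little nc adds on top of TCP. This is a minimal sketch, not how nc itself is implemented; the helper names `build_head_request` and `send_raw` are made up for illustration.

```python
import socket


def build_head_request(host: str) -> bytes:
    """Return the same text the printf command produces, as bytes."""
    return f"HEAD / HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


def send_raw(host: str, port: int, payload: bytes, timeout: float = 5.0) -> bytes:
    """Open a TCP connection, send the payload, return the first chunk of the reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
        return conn.recv(4096)
```

With network access, `send_raw("en.wikipedia.org", 80, build_head_request("en.wikipedia.org"))` should return the raw bytes of a response like the one shown above. Nothing in `send_raw` knows about HTTP; it only moves bytes.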
- nc connects to a port and sends a string. Since the string looks like an HTTP request, the web server responds to it
- nc stands for netcat
- nc does not know anything about HTTP
- nc is used to connect to various machines
- The text of the HTTP request is piped to nc
- nc is a thin wrapper over TCP
- Transport protocols -
- Application protocols -
- Protocols such as HTTP and SSH belong to the application layer - things that make sense to applications such as browsers
- Protocols such as TCP and UDP belong to the transport layer
- Protocols such as IP belong to the internet layer
- Wi-Fi, Ethernet and DSL belong to the hardware layer
- IP is the "narrow waist" of the protocol stack
- nc -l 1234 listens on port 1234
- Different programs running and listening
- All the other ports where computer
- Listening on a port is a simple way of being a server
- nc -l is a plain TCP server. TCP is a two-way channel: both ends can send messages to each other
- CTRL+D - end of input
- Server has a well known port for applications
- Client initiates a connection and can use a different
- There is a largest port that we can listen on - beyond it nc fails with “Servname not supported for ai_socktype”
- Highest port is 65535 - not an arbitrary limit
- Ports 0 - 1023 are reserved for the superuser; you have to use sudo
- If you want to listen on port 80, you need to be the superuser
- Only one program can listen on a given port
- Once a program starts, it can create separate threads to listen on various ports. nc does not have this capability. A web server has it because it spawns several processes
- sudo lsof -i shows which programs are listening - here on ports 6011 and 6012
- A browser makes several requests for one page - HTML + images + CSS - and the more of them it can make in parallel, the faster the page loads
- Whenever a new connection arrives, a new child process is spawned to serve it. There is a limit on the number of child processes that can be spawned
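The listening notes above can be sketched as a small Python echo server in the spirit of `nc -l`. It is an illustration only: it hands each connection to a thread rather than a child process (a deliberate simplification of what the notes describe for web servers), and the function names `listen`, `serve` and `handle` are made up here.

```python
import socket
import threading


def handle(conn: socket.socket) -> None:
    """Echo everything back until the client stops sending (CTRL+D in nc)."""
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:  # an empty read means the peer finished sending
                break
            conn.sendall(data)


def serve(listener: socket.socket) -> None:
    """Accept connections in a loop, handing each one to its own thread."""
    while True:
        try:
            conn, _addr = listener.accept()
        except OSError:  # the listener was closed
            return
        threading.Thread(target=handle, args=(conn,), daemon=True).start()


def listen(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Bind and listen on a port; port 0 asks the OS to pick a free one."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, port))
    listener.listen()
    return listener
```

Calling `serve(listen(1234))` would play the role of `nc -l 1234`. Asking for a port below 1024 without superuser rights, or for a port another program is already listening on, is expected to raise `OSError` at the `bind` call.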
DNS - names and addresses
- Every packet has a Destination IP
- A DNS name does not have to map to a single host
- A single DNS name can have a bunch of IP addresses, for load balancing
- Domain Name System (DNS) - maps names to IP addresses
- DNS A record: maps a name to an IPv4 address
- One has to create DNS records
- Register at Registrar and create DNS records
- If DNS goes down, the site cannot be reached by name
- A DNS resolver is built into every operating system
- host is built in to OS. It gives the
#+begin_src
host -ta www.refinitiv.com
www.refinitiv.com is an alias for d347ymu6kosx4n.cloudfront.net.
d347ymu6kosx4n.cloudfront.net has address 54.192.151.13
d347ymu6kosx4n.cloudfront.net has address 54.192.151.44
d347ymu6kosx4n.cloudfront.net has address 54.192.151.46
#+end_src
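A program can ask the operating system's resolver the same question through `socket.getaddrinfo`. The sketch below collects the IPv4 addresses for a name; the helper name `resolve_ipv4` is made up for illustration, and the addresses returned for a real name depend on when and where the lookup runs.

```python
import socket


def resolve_ipv4(name: str) -> list[str]:
    """Return the distinct IPv4 addresses the OS resolver finds for a name."""
    infos = socket.getaddrinfo(
        name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses: list[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        address = sockaddr[0]  # sockaddr is (ip, port) for IPv4
        if address not in addresses:
            addresses.append(address)
    return addresses
```

With network access, `resolve_ipv4("www.refinitiv.com")` would be expected to return several addresses, much like the `host` output above.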
- dig can be used to get a lot more information about DNS lookups
- Many types of DNS records - A address,
- DNS is a distributed directory. No one DNS server needs to know all the DNS records
- top level domains
- records for a certain domain will be found in authoritative name servers
- NS records are listed for higher level servers
- gTLD - generic top-level domain
- Resolvers talk to a nearby caching server
- Caching server - consults its local cache; on a miss it recursively resolves the query and then gives the IP
- DNS records have a TTL (time to live) - if you look up a cached record again, the remaining TTL has gone down
- Apache - Virtual Host configuration
- The Host header is a required part of an HTTP/1.1 request
- DNS names are structured as a tree
- Domains
- SubDomain
- SubDomain of a SubDomain
- www.refinitiv.com is a subdomain of refinitiv.com
- Institutions - webserver - Single machine representing institution
- So many domains - Skip the www and point the domain
- Whether to use bare domains or sub domains is style and branding preference
- Fully Qualified Domain Names - need the FQDN
- Apache webserver
- By setting up a domain
- Setting up a
- A domain name costs approximately 12 USD
- Each packet contains the IP addresses of the sender and the receiver - an IPv4 address is 4 octets, i.e. 32 bits
- Plain decimal is difficult to read
- IPv4 uses 32 bits to represent an IP address
- IPv6 uses 128 bits - with a given number of bits, you can make only a certain number of distinct values
- Highest port is 65535
- You cannot listen on a port higher than that
- 16 bits - two octets - max number 65535 - port numbers are 16-bit values
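The arithmetic behind these limits is short enough to check directly. A small sketch (the names `distinct_values` and `octets_to_int` are made up for illustration):

```python
def distinct_values(bits: int) -> int:
    """How many different values fit in the given number of bits."""
    return 2 ** bits


# Ports are 16-bit values, so the highest port is one less than 2**16.
MAX_PORT = distinct_values(16) - 1

# An IPv4 address is a 32-bit value.
IPV4_ADDRESSES = distinct_values(32)


def octets_to_int(dotted: str) -> int:
    """Turn a dotted quad such as 127.0.0.1 into the plain number behind it."""
    value = 0
    for octet in dotted.split("."):
        value = (value << 8) | int(octet)
    return value
```

Here `MAX_PORT` is 65535 and `IPV4_ADDRESSES` is 4294967296. `octets_to_int("127.0.0.1")` gives 2130706433, which shows why addresses are written as octets rather than as plain decimal numbers.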
Addresses and Networks
- An IPv4 address is written as four octets - mostly a convention. It is just a 32-bit string
- Some of the addresses are reserved for internal networks
- if all the 32 bit are used to represent public hosts, that would not be enough
- the light-green squares (0, 10, and 127) are blocks that are entirely reserved.
- the dark-green squares are blocks that are partly reserved. for instance, not all of the 192 block is reserved, but some of it is.
- the entire cyan row (starting at 224) is set aside for ip multicast.
- And the entire orange bottom row (starting at 240) was originally set aside for “future use” but was effectively lost due to being blocked as invalid. No, really. We lost 1/16th of all IPv4 addresses due to mistaken planning.
- Not every number can be assigned to public host - More than a billion IPV4 addresses
- Most of the IP addresses in a network block
- Not all networks are of the same size
- Network prefix are shorter
- A /8 network has a fixed number of host addresses: 2^24, about 16.7 million
- CIDR is the short for Classless Inter-Domain Routing, an IP addressing scheme that replaces the older system based on classes A, B, and C. A single IP address can be used to designate many unique IP addresses with CIDR. A CIDR IP address looks like a normal IP address except that it ends with a slash followed by a number, called the IP network prefix. CIDR addresses reduce the size of routing tables and make more IP addresses available within organizations.
- Split the network in to blocks - All the addresses with the
- Length of the prefix is important
- Network with shorter prefix is larger
- A network with a longer prefix has fewer addresses
- A 22-bit prefix (/22) - by just looking at an IP address, it is difficult to identify the size of the network
- /24 = 256 addresses
- Another way to write - Subnet mask 1’s on the left and 0’s on the right
- IPV4 - 32 bit values. decimal dotted quads
- /24 -255.255.255.0
- /16 - 255.255.0.0
- Prefixes need not be whole octets
- AWS subnets - default vpc for a user in a region
- default subnets for a user
- Addresses kind of belong to hosts, but not really: they belong to interfaces. A host can have zero or many interfaces.
- Every machine has a loop back interface
- Might have a tunnel
- VM interface connecting host and guest operating system
- interfaces
#+begin_src
ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc pfifo_fast state UP group default qlen 1000
link/ether 02:bf:ff:f8:f4:74 brd ff:ff:ff:ff:ff:ff
inet 172.31.19.219/20 brd 172.31.31.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::bf:ffff:fef8:f474/64 scope link
valid_lft forever preferred_lft forever
#+end_src
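The prefix and subnet-mask notes can be checked against the eth0 address in this output using Python's standard `ipaddress` module. A minimal sketch; the helper name `describe` is made up for illustration.

```python
import ipaddress


def describe(cidr: str) -> dict:
    """Summarise the network that an address/prefix pair belongs to."""
    # strict=False lets us pass a host address such as 172.31.19.219/20
    net = ipaddress.ip_network(cidr, strict=False)
    return {
        "network": str(net),
        "netmask": str(net.netmask),
        "broadcast": str(net.broadcast_address),
        "addresses": net.num_addresses,
    }
```

`describe("172.31.19.219/20")` reports the network 172.31.16.0/20 with netmask 255.255.240.0, broadcast address 172.31.31.255 (the `brd` value in the output) and 4096 addresses. This is a case where the prefix is not a whole number of octets.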
- Loopback is a special interface that allows a host to connect to itself
- Home router - one interface faces the internet and another faces your local network (your laptop)
- At least two interfaces on a Linux box on AWS - lo: the loopback interface, and eth0: an Ethernet interface
- Router connects two networks
- Most hosts have one interface
- A router will have at least two interfaces
- The default gateway connects to the rest of the internet
- Traffic for other networks is routed via the default gateway
- *ip route show default* - for the router address
- *netstat -nr* - default router address
- Private IP addresses come out of the reserved blocks - most common behind home routers
- NAT - whenever traffic moves between the private and the public side, the router has to track which inside devices and ports are connected to the outside world
- NAT - Makes it difficult to debug
- Private addresses - Good only on local networks
- An address starting with 192.168. is behind a NAT router
- Private use - never used on the public internet
- Web servers do not see the private address
- Command for AWS Ubuntu public ip
#+begin_src
curl http://169.254.169.254/latest/meta-data/public-ipv4
curl ipecho.net/plain ; echo
#+end_src
- NAT is not needed with IPv6 - between 2012 and 2015, IPv6 increased to 10% of all traffic
- More home and mobile users have v6 than business and office users
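The reserved and private blocks mentioned in this section can be recognised with the standard `ipaddress` module. The sketch below is only a rough classifier for IPv4 addresses; the function name `classify` and its labels are made up for illustration.

```python
import ipaddress


def classify(address: str) -> str:
    """Give a rough label for an IPv4 address based on the block it falls in."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:
        return "loopback"      # the 127 block
    if ip.is_link_local:
        return "link-local"    # 169.254.x.x, e.g. the AWS metadata address
    if ip.is_multicast:
        return "multicast"     # the row starting at 224
    if ip.is_reserved:
        return "reserved"      # the row starting at 240
    if ip.is_private:
        return "private"       # e.g. 10.x.x.x, 172.16-31.x.x, 192.168.x.x
    return "public"
```

For example, `classify("172.31.19.219")` (the eth0 address above) and `classify("192.168.1.10")` both give "private", while `classify("13.229.230.30")` (the X-Client-IP value seen earlier) gives "public".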
Protocol Layers
- TCP is built on top of IP
- TCP is the middle layer of the networking protocols
- HTTP and other are built on top of TCP
- Developer - You are working at one abstraction
- Flask - URLs for resources/methods/VERBs - Any problems with the lower layer is visible
- OS TCP implementation - Browsers merely use the TCP implemented in the OS
- TCP relies on IP
- IP relies on physical devices
- ping uses the ICMP protocol
- DNS mostly uses UDP
- These protocols are available in the operating system
- Wi-Fi has nothing to do with TCP
- TCP has nothing to do with the HTTP session
- pcap filter
- tcpdump to monitor applications
- Look at all the IP packets
- How much data has been sent ?
- Overhead
- Out of all the packets, only a few have payload
- TCP - Even before the client
- One exchange at HTTP is a bunch of requests at a lower level
- What happens when TCP connection happens :
- Each end point puts in a sequence number
- For each packet, the client will send a separate ack
- Even though the client is not transmitting specific stuff, there will be packet exchanges
- What is the need for sequence numbers?
- Each endpoint's operating system keeps some buffer space in memory
- Keeps the sequence number
- Packets can be sent again
- TCP handshake : This exchange of three packets is usually called the TCP three-way handshake
- In a long-running connection, there will be many packets exchanged back and forth. Some of them will contain application data; others may be only acknowledgments with no data (length 0). However, all TCP packets in a connection except the initial SYN will contain an acknowledgment of all the data that the sender has received so far. Therefore, they will all have the ACK flag set. (This is why tcpdump depicts the ACK flag with just a dot: it’s really common.)
- Four way teardown
- When either endpoint is done sending data into the connection, it can send a FIN packet to indicate that it is finished. The other endpoint will send an ACK to indicate that it has received the FIN. In the example HTTP data, the client sends its FIN first, as soon as it is done sending the HTTP request. This is the first packet containing Flags [F.]. Eventually the other endpoint will be done sending as well, and will send a FIN of its own. Then the first endpoint will send an ACK.
- Why do Packets drop ?
- Not an option at all
- Two different networks connected by a slow network
- Slow network can
- Routers cannot buffer stuff because of bottlenecks
- TCP and Routers do not do it that way
- TCP does not send data at full blast. It sends only as fast as the acks come back. Also, routers drop packets to signal congestion.
- If the router queues up, then the
- TCP congestion control
- As it is trying to get through, TCP sends packets slower and slower
- One of the reasons for packet loss is congestion and TCP would not want to through
- TCP has a lot of built-in timeouts
- Python requests library - timeout
- TCP Session time out between browser and web server ?
- Why time out ?
- Other host is powered off
- Connection between you and internet : Routers drop off the message
- DNS is used only once, to look up the address when connecting; after that it is not used at all
- Design applications on the assumption that failures will happen
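One way to act on the timeout and failure notes is to always put an explicit timeout on a connection attempt and treat failure as a normal outcome. The sketch below uses the standard `socket` module rather than the requests library mentioned above; the helper name `can_connect` is made up for illustration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Try to open a TCP connection, giving up after `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, unreachable hosts
        # and failed name lookups.
        return False
```

A caller can then decide what to do when `can_connect` returns False (retry, fall back, or report the problem) instead of hanging on a host that is powered off or unreachable.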
Big Networks