Max Ficco

Systems Programming (CSE 20289)
Prof. Collin McMillan, Fall 2025

"How to use a computer, from a computer scientist's perspective"

2025-08-25

Arthur C. Clarke 1961 Daisy Bell story

"Thinking" Machines, 1959 Alex Bernstein

PDP 7, a few years later

enter: UNIX operating system (AT&T)

2025-08-27

recap: kernel --> drivers --> user programs --> shell
What are the main components of a computer?

The problem: need multiple different people to access multiple different forms of long term storage

basic demo in the terminal: pwd, mkdir, ls -alh, vim, uname -a, cd

root folder: /

cmc showing his motherboard on the doc cam:

lsblk -e 7
nvme ssd mounted to /
hard drive mounted to /mnt/tmp

file system "basically" has two things: directories and files

dir inode

file inode

stat hi.txt gives file inode info^

chown cmc:users hi.txt

ln hi.txt othername.txt

ln -s hi.txt other2.txt

(hidden folders start with .)

permissions

grouped in threes:
(d or -) --- user access --- group access --- world access

chmod -w hi.txt

chmod u+w adds write permission for the user (owner)
chmod g+w adds write permission for the group
chmod a+w adds write permission for everyone (user, group, and world)

also have codes (octal codes):
chmod 777 hi.txt
chmod 600 hi.txt
chmod 644 hi.txt

rwx 111 7
rw 110 6
r x 101 5
r 100 4
wx 011 3
w 010 2
x 001 1
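
a quick python sketch of how each octal digit maps onto one rwx triplet from the table above (octal_to_rwx is a made-up helper name, not a standard function):

```python
def octal_to_rwx(mode):
    """Turn a permission number like 0o644 into a string like 'rw-r--r--'."""
    triplets = []
    for shift in (6, 3, 0):  # user, group, world
        bits = (mode >> shift) & 0b111  # pull out one octal digit
        triplets.append(
            ("r" if bits & 0b100 else "-")
            + ("w" if bits & 0b010 else "-")
            + ("x" if bits & 0b001 else "-")
        )
    return "".join(triplets)


for mode in (0o777, 0o600, 0o644):
    print(oct(mode), octal_to_rwx(mode))
```

each digit is just three bits, so 6 = 110 = rw- and 4 = 100 = r--.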

2025-08-29

In a navy warehouse:
600 pallets * 45 boxes = ~27000 boxes filled with punch cards
27000 * 2000 punched cards = ~54 million cards
54 million cards * 80 characters = ~4.3 billion characters = ~4.3GB
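
the same back-of-the-envelope arithmetic in python, assuming one byte per character (variable names are just for illustration):

```python
pallets = 600
boxes_per_pallet = 45
cards_per_box = 2000
chars_per_card = 80

boxes = pallets * boxes_per_pallet   # 27,000 boxes
cards = boxes * cards_per_box        # 54,000,000 cards
chars = cards * chars_per_card       # 4,320,000,000 characters

print(boxes, cards, chars)
print(chars / 1e9, "GB at one byte per character")
```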

misc. commands shown on cmc@reptar

lsblk -e 7 lists info about devices, ignoring loopback (virtual) devices

you can add time to the start of any command to show how long it takes
mv is faster than cp (within one filesystem, mv just renames the directory entry; cp has to copy the data)

regex!

dog+: dog, dogg, doggg, ... (1 or more g's)
dog*: do, dog, dogg, ... (0 or more g's)
dog?: do, dog (0 or 1 g's)
(dog)+: dog, dogdog, dogdogdog, ...
dog{1,4}: dog, dogg, doggg, dogggg (1 to 4 g's)
(dog|cat): dog, cat (dog or cat)

use \ to escape special characters
\s for any whitespace character (space, tab, etc.)
could do \s+ for any number of spaces

^ means start of line
$ means end of line

. means any single character (except for new lines \n!)
so, .* matches anything, repeated any number of times in a single line

\w means any "word" character - any alphanumeric character (letters A-Z, a-z, digits 0-9, and underscore _)

[abc] set for a, b, or c
[a-z0-9]
[a-zA-Z0-9]

\b indicates a word boundary
\d is any digit (0–9)
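
a small sketch for checking these patterns, using python's re module. Assumption: python's regex flavor is close to, but not identical to, what sed -E accepts (sed does not know \d, for example). whole_match is a made-up helper that asks whether the entire string matches.

```python
import re


def whole_match(pattern, text):
    """True if the pattern matches the entire string, not just part of it."""
    return re.fullmatch(pattern, text) is not None


# (pattern, text, expected) triples taken from the notes above
examples = [
    (r"dog+", "doggg", True),
    (r"dog+", "do", False),         # + needs at least one g
    (r"dog*", "do", True),          # * allows zero g's
    (r"dog?", "dogg", False),       # ? allows at most one g
    (r"(dog)+", "dogdog", True),
    (r"dog{1,4}", "dogggg", True),
    (r"dog{1,4}", "doggggg", False),
    (r"(dog|cat)", "cat", True),
    (r"\bcat\b", "cat", True),
    (r"\w+\s+\d+", "seats   130", True),
]

results = [whole_match(p, t) for p, t, _ in examples]
for (pattern, text, expected), got in zip(examples, results):
    print(f"{pattern!r:15} {text!r:15} expected={expected} got={got}")
```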

searching through files: sed

find prints out all files and folders (recursively) underneath cwd
^ can pipe the output of this into another program!

find | sed -nE ''
-n suppresses automatic printing; -E enables extended regular expressions (so you don't need to escape things like +, ?, |, or parentheses)

search and replace: s/searchingfor/replacewith/gp (g means global/whole string, p means print it out)

ex: find | sed -nE 's/\.\/Movies\/(.*)\/(.*)\.mkv/\1/p'
- takes the path like ./Movies/<dir>/<file>.mkv and prints just the directory name (\1)
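
roughly the same capture-group idea in python, as a sketch. The paths here are invented sample data, not the real output of find.

```python
import re

# hypothetical stand-in for the output of `find`
paths = [
    "./Movies/Alien (1979)/alien.mkv",
    "./Movies/Heat/heat.mkv",
    "./Music/song.mp3",
]

pattern = re.compile(r"\./Movies/(.*)/(.*)\.mkv")

# like sed -n 's/.../\1/p': only lines where the substitution happened are kept
dirs = [pattern.sub(r"\1", p) for p in paths if pattern.search(p)]
print(dirs)
```

the first (.*) is greedy, but it has to leave a / and a file name ending in .mkv for the rest of the pattern, so it stops at the last slash.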

2025-09-01

2014: Yahoo had password database breached (plain text usernames and passwords)

cat yahoo.clean.txt | sed -nE '/obama/p'

cat yahoo.clean.txt | sed -nE '/obama/p' | uniq

cat yahoo.clean.txt | sed -nE '/^[a-z]+[0-9]+[a-z]+$/p' | sort | uniq -c | sort -rh

cat yahoo.clean.txt | sort | uniq -c | sort -rh

cat shakespeare.txt | sed -nE 's/friend/pal/p'

cat yahoo.clean.txt | sed -nE 's/^([a-z]+)([0-9]+)([a-z]+)$/\1/p' | sort | uniq -c | sort -rh

cat yahoo.clean.txt | sed -nE 's/^([a-z]+)([0-9]+)([a-z]+)$/\1\2\3/p' | sort | uniq -c | sort -rh (this just replicates what we find)
(the + goes inside the parentheses so each group captures the whole run, not just its last character)
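
the sort | uniq -c | sort -rh idiom is a frequency count. A rough python equivalent, as a sketch over a tiny made-up password list (not the real data set):

```python
import re
from collections import Counter

# invented sample data standing in for yahoo.clean.txt
passwords = [
    "abc123def", "password", "abc123def", "qwerty1a",
    "123456", "password", "abc123def",
]

# letters, then digits, then letters (same shape as the sed pattern)
shape = re.compile(r"^[a-z]+[0-9]+[a-z]+$")

counts = Counter(p for p in passwords if shape.match(p))
ranked = counts.most_common()  # most frequent first, like sort -rh

for password, n in ranked:
    print(n, password)
```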

building a bash script (bash is the shell we are using)

use shebang at top of bash script:
#!/bin/bash
creates a new bash shell and then runs command inside it

ex:

#!/bin/bash

TEXTDOC=$1
NOUN=$2

cat $TEXTDOC | sed -nE "s/$NOUN/pal/p"

2025-09-03

intro analogy about layers of the earth and the shell, lithosphere

/etc - holds system-wide configuration files

variables in bash

NO SPACES!!!

MYVAR=$1
ANOTHER=$2 # if no second argument, will default to ""
echo $MYVAR

echo $# - number of command line arguments

if statements:

if [ $MYVAR -lt 7 ];
then
    echo "myvar is less than 7";
else
    echo "not less than 7";
fi

^use single brackets, not double brackets (older/more supported)

The open bracket [ is not a symbol!! Bash is simple. Everything is a command to run another program

[ calls test, -lt is a command line argument (less than)

another example:
echo "5 + 4" | bc -l

storing things:

VAR=$(echo "5 + 4" | bc -l) - command output capturing
echo $VAR --> 9 ! YAY! SO EZ!

adding 1 to a variable:
NEWVAR=$(echo "$VAR + 1" | bc -l)

for loops:

format: seq [FIRST] [INCREMENT] LAST (increment is optional, defaults to 1)
seq 1 2 10
1
3
5
7
9

VAR=$1

for i in $(seq 0 $VAR); do
    echo "something $i"
done

working with strings:

STR="testing"
echo ${#STR} --> 7 (length of string)

substring format: ${variable:offset:length}
echo ${STR:2:4} --> stin

we can loop through each letter in a string:

VAR=$1

for i in $(seq 0 $((${#VAR} - 1))); do # last valid index is length - 1
    echo "something: ${VAR:i:1}"
done

case statements:

VAR=$1
case $VAR in
    -m) echo "mario" ;;
    -l) echo "luigi" ;;
    *) echo "player not supported" ;;
esac

functions:

MYVAR=$1
foo()
{
    local VAR=$1        # refers to the first argument of the function (~different scope)
    echo "$VAR"
    let RETFOO=57       # how we return variables in bash
}

echo "$VAR"     # prints a newline (nothing)

foo 5           # prints 5

echo "$VAR"     # still prints nothing, because VAR is local to foo (without "local" it would print 5)

echo "$RETFOO"  # prints 57

2025-09-05

in bashwrk folder:

cat shakespeare.txt | sed -nE "s/^([a-zA-Z ,]+\?)$/\1/p"

cat shakespeare.txt | sed -nE "s/^\s+([a-zA-Z ,]+)\..*$/\1/p" | sort

Let's turn these into a script:

# step 1) set globals

# * in bash, we actually do want to set global variables at the top of our scripts
SED="sed -nE" # potentially useful for switching to different machines

# step 2) usage message

usage(){
    cat 1>&2 <<EOF
Usage: $(basename $0) <FILENAME> <FORMAT> <OPERATION>

FILENAME    The name of the file with the Shakespeare data.

FORMATS     u: print in uppercase
            l: print in lowercase

OPERATIONS  q: extract questions
            n: extract names

EOF
    exit $1
}

# step 3) function area

set_regex(){
    local WHICHREGEX=$1 # remember, refers to first parameter of that function, not first parameter of program
    
    if [ "$WHICHREGEX" != 'q' ] && [ "$WHICHREGEX" != 'n' ]; then # note the space before each ]
        echo "invalid operation"
        exit 1
    fi

    case $WHICHREGEX in
        q) REGEX="^([a-zA-Z ,]+\?)$"; GRP="\1" ;; 
        n) REGEX="^\s+([a-zA-Z ,]+)\..*$"; GRP="\1" ;;
        # if we didn't want to use the if statement above, we could have used  *)  here
    esac
}

set_format(){
    FORM=$1

    case $FORM in
        u) OPTIONS="\U" ;;
        l) OPTIONS="\L" ;;
    esac
}

# step 4) handle parameters

FILENAME=$1
P_FORMAT=$2
P_OPERATION=$3

if [ $# -ge 3 ]; then
    set_regex $P_OPERATION
    set_format $P_FORMAT
else
    usage 1
fi

# step 5) run the command

cat $FILENAME | $SED "s/$REGEX/$OPTIONS$GRP/p"

5 sections of bash script:

  1. set globals
  2. usage message
  3. function area
  4. handle parameters
  5. run the command

2025-09-08

starting in zipcodes.dat, semicolon delimited

cat zipcodes.dat | sort -t';' -k2,2

to sort across a range we can do
cat zipcodes.dat | sort -t';' -k2,4

to sort by two different values we can pass multiple k's:
cat zipcodes.dat | sort -t';' -k2,2 -k1,1
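
sorting by two keys, sketched in python. The rows and the zip;city;state column order are assumptions made up for the example, not the actual contents of zipcodes.dat.

```python
# invented rows in an assumed zip;city;state layout
rows = [
    "46617;South Bend;IN",
    "04330;Augusta;ME",
    "46556;Notre Dame;IN",
    "30901;Augusta;GA",
]

records = [row.split(";") for row in rows]

# like sort -t';' -k2,2 -k1,1: city first, then zip to break ties
by_city_then_zip = sorted(records, key=lambda rec: (rec[1], rec[0]))

for rec in by_city_then_zip:
    print(";".join(rec))
```

one difference to keep in mind: sort compares using the locale's collation rules, while python compares strings by code point.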

awk

cat zipcodes.dat | awk -v FS=';' '{print $1,$3;}'

But the default output delimiter is also a space! So we can do:
cat zipcodes.dat | awk -v FS=';' -v OFS=';' '{print $2,$1,$3;}'

technically we don't need cat with awk:
awk -v FS=';' -v OFS=';' '{print $1,$3;}' zipcodes.dat
but...awk is old.
gawk is newer...(but like 1992 still)
we don't want to trust awk/gawk to read files (differences in file systems, etc)!

cat zipcodes.dat | awk -v FS=';' -v OFS=$'\t' '{print $2,$1,$3;}'
tab might get confused between awk and bash - we send it to bash with $ first to be safe

conditions:
cat zipcodes.dat | awk -v FS=';' -v OFS=$'\t' 'NR<=6 {print $2,$1,$3;}'
cat zipcodes.dat | awk -v FS=';' -v OFS=$'\t' 'NR>1 {print $2,$1,$3;}'

we can also filter:
cat zipcodes.dat | awk -v FS=';' -v OFS=$'\t' '$2~/Augusta/ {print $2,$1,$3;}'
cat zipcodes.dat | awk -v FS=';' -v OFS=$'\t' '$1~/4661[0-9]/ {print $2,$1,$3;}'
cat zipcodes.dat | awk -v FS=';' -v OFS=$'\t' 'NR>1 && $1~/4661[0-9]/ {print $2,$1,$3;}'
cat zipcodes.dat | awk -v FS=';' -v OFS=$'\t' 'NR<10 || $1~/4661[0-9]/ {print $2,$1,$3;}'
bottom line: if using regex in awk, be very explicit because it can have slightly different meanings (e.g. + behaves differently)
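
what awk is doing with NR, FS, OFS and a pattern, spelled out as a python sketch. awk_like is a made-up function and the sample lines (including the header row) are invented.

```python
import re

# invented sample with an assumed header row and zip;city;state layout
lines = [
    "zip;city;state",
    "46617;South Bend;IN",
    "04330;Augusta;ME",
    "46615;South Bend;IN",
]


def awk_like(lines, field, pattern, skip_header=True):
    """Mimic: awk -v FS=';' -v OFS='\\t' 'NR>1 && $field~/pattern/ {print $2,$1,$3}'"""
    out = []
    for nr, line in enumerate(lines, start=1):  # NR starts at 1 in awk
        fields = line.split(";")                # FS=';'
        if skip_header and nr == 1:
            continue
        if re.search(pattern, fields[field - 1]):  # $1 is fields[0]
            out.append("\t".join([fields[1], fields[0], fields[2]]))  # OFS=tab
    return out


for row in awk_like(lines, 1, r"4661[0-9]"):
    print(row)
```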

cat dinosaur.dat | awk -v FS=$'\t' '$19<=-89.55222 && $19>=-91.55222 && $20>=39.134957 && $20<=41.134957 {print ;}'

2025-09-10

recap from hw:

'...' takes everything inside as is

"..." will interpret variables inside.

VAR=$(...) will run ... and store it in VAR

find most northerly dinosaur:

cat dinosaur.dat | awk -v FS=$'\t' -v OFS=$'\t' '{print $6, $19, $20}' | sort -t$'\t' -k3,3hr | head -n 10

talked about dinosaurs for a while :)

some more bash stuff

parameter reading:
$# tells you the number of parameters

while [ $# -gt 0 ]; do
    case $1 in
        -h) echo "got h"; echo $2; shift; ;; # captures 2 things and shifts left twice
        -a) echo "got a"; ;;
    esac
    shift # shifts parameters to the left and throws first away ($2 becomes $1, etc.)
done
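
the shift loop, restated as a python sketch to make the "consume from the left" idea explicit. parse is a made-up function; pop(0) plays the role of shift.

```python
def parse(args):
    """Walk the argument list left to right, the way the bash while/shift loop does."""
    args = list(args)  # copy so the caller's list is untouched
    seen = []
    while args:                       # while [ $# -gt 0 ]
        flag = args.pop(0)            # look at $1, then shift
        if flag == "-h":
            value = args.pop(0) if args else None  # -h also consumes $2
            seen.append(("h", value))
        elif flag == "-a":
            seen.append(("a", None))
    return seen


print(parse(["-h", "x", "-a"]))
```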

reading from a file:

cat bash_notes.txt | while IFS= read -r line; do # IFS (internal field separator) is set to nothing, so read keeps the whole line, whitespace included
    echo "$line"
done

2025-09-12

Review for Exam

File permissions (octal codes):
rwx 111 7
rw 110 6
r x 101 5
r 100 4
wx 011 3
w 010 2
x 001 1

can change permissions:
chmod g+r dinosaur.dat - allows group to read
chmod 777 dinosaur.dat - rwx for all

might have to do something like
VAR=$(ls -alh dinosaur.dat)
and do something with it!

stat dinosaur.dat
two types of inodes - directory inodes and file inodes:
- a directory inode is basically like an index
- tells you which files are there
- essentially a table of names, and their inode number
- I can make a hardlink, which has different name but same inode number
- can also make softlink (symlink), which has its own inode number; that inode just stores the path of the target file
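
a sketch that checks the hard link vs. soft link claims by looking at inode numbers. Assumes a POSIX system (Linux or macOS); the file names mirror the earlier ln examples and everything happens in a throwaway temp directory.

```python
import os
import tempfile


def link_demo():
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "hi.txt")
        hard = os.path.join(d, "othername.txt")
        soft = os.path.join(d, "other2.txt")

        with open(original, "w") as f:
            f.write("hello\n")

        os.link(original, hard)     # like: ln hi.txt othername.txt
        os.symlink(original, soft)  # like: ln -s hi.txt other2.txt

        return {
            "original": os.stat(original).st_ino,
            "hard": os.stat(hard).st_ino,
            "soft_itself": os.lstat(soft).st_ino,  # lstat: do not follow the link
            "soft_followed": os.stat(soft).st_ino,  # stat: follow the link
            "nlink": os.stat(original).st_nlink,    # names pointing at the inode
        }


info = link_demo()
print(info)
```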

Example command that we want to make a script build up to:
cat dinosaur.dat | awk -v IGNORECASE=1 -v FS=$'\t' -v OFS=$'\t' '$6~/tyrannosaur/ {print $6, $19, $20}'
will be generalized as:
cat $FILENAME | $AWK $AWKARGS "$AWKPROG"

wrote example program for what to expect for exam

(didn't copy, but main point - have good structure!)

2025-09-15

Exam 1

2025-09-17, 2025-09-19, 2025-09-22

Wednesday class was canceled
Friday class just went over exam 1 solutions
Monday class was canceled

2025-09-24

>>> name=5
>>> type(name)
<class 'int'>
>>> name="Max"
>>> type(name)
<class 'str'>
>>> dir(name)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
>>> name.replace("M", "D")
'Dax'
>>> help(name.replace) # shows usage

python is dynamically typed!

def foo(x, y):
    return x+y

foo(5,6) # 11
foo("He", "lloo") # Helloo

in python, everything is an object, and resources come from modules

using regex in python:

import re

matches = re.findall(r"\d+", "The airplane has 130 seats and 111 passengers.")
# r means "raw string": backslashes are kept literally, so \d reaches the regex engine untouched

# iterables
for match in matches:
    print(match)

matches = re.findall(r"aiRpLAnE", "The airplane has 130 seats and 111 passengers.", re.IGNORECASE)

# re also has search (find first occurrence)
first_letter_a = re.search("a", "The airplane has 130 seats and 111 passengers.", re.IGNORECASE)

print(len(matches)) # 1

some extended notes on re functions:

m = re.search("a", "The airplane", re.IGNORECASE)
if m: print(m.group(), m.start())  # a 4
#  ^ match objects are "truthy" (re.search returns None when nothing matches)
nums = re.findall(r"\d+", "abc123def456")
if nums: print(nums)  # ['123', '456']
#  ^ empty list [] is falsy
for m in re.finditer(r"\d+", "abc123def456"):
    print(m.group(), m.start())
    # 123 3
    # 456 9

Data Frame (just a list of lists)


f = open("movies.dat", "r")

df = list() # using this as dataframe

for line in f: # yay this is so much nicer than bash!!! (f is an iterable)
    print(line)
    # line is also an object!
    dataitems = line.split("::") 
    print(dataitems[1]) # prints movie name

    # could also store as a tuple
    (mid, nameyear, genrelist) = line.split("::") 
    
    # let's just store the whole linesplit
    df.append(dataitems)

print(df[2][2])

Reading input

import sys
# EVERYTHING IS AN OBJECTTTTTTT WOOOOO
filename = sys.argv[1] # (just like in c)

f = open(filename, "r")
# Don't forget to close file after you're done!
f.close()
...

2025-09-26

import sys
import re
import pdb

f = open("movies.dat", "r")

pdb.set_trace()

for line in f:
	(mid, nameyear, genres) = line.split("::")
	
	mid = int(mid)

f.close()

way to debug: in the server side, because that would require serializing a socket

Navigation:
b / break [line | func] - set a breakpoint (e.g. b 23 or b my_func)
cl / clear - clear all breakpoints (cl 2 clears one)
c / continue - continue execution until next breakpoint
n / next - execute next line (don't step into functions)
s / step - step into the next function call
r / return - run until the current function returns
q / quit - exit the debugger

Inspection:
l / list [start,stop] - show source around current line
a / args - display arguments of the current function
p [expr] - print the value of an expression
pp [expr] - pretty-print an expression
whatis [expr] - show the type of an expression
w / where - show the current call stack
u / up - move up one frame in the call stack
d / down - move down one frame
break - list all current breakpoints and their numbers

dataframe.py:

class Dataframe:
	def __init__(self, headers): # self is a reference to the instance of the object you are using right now
		self.dataframe = list()
		self.dataframe.append(headers)
		# or could do something like self.headers = headers
	def append(self, l):
		self.dataframe.append(l)
	

2025-09-29

def noodle(x):
    return 2 * x

def tofu(x):
    return 3 * x

def mittens(fluffy, boots):
    cleo = fluffy 
    # preprocessing
    def muffin():
        print(boots(cleo))
    # postprocessing
    return muffin

felix = mittens(853497, noodle)
felix()

felix = mittens(91, tofu)
felix()

import pdb
import time

cached = dict()

def cache(fcn):
    def inner(a):
        if a not in cached.keys():
            cached[a] = fcn(a)
        return cached[a]
    return inner

@cache # python decorator: same as expensive = cache(expensive)
def expensive(a):
    time.sleep(3)
    return a+1

#expensive = cache(expensive)

pdb.set_trace()
expensive(5)
expensive(5) # much faster
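
a variation on the cache decorator, as a sketch rather than what was shown in class: here each decorated function gets its own dictionary (the version above shares one global dict, so two cached functions called with the same argument would collide), and functools.wraps keeps the original function name. The calls list exists only to show how often the real function ran.

```python
import functools


def cache(fcn):
    cached = {}  # one private cache per decorated function

    @functools.wraps(fcn)  # keep fcn's name and docstring on the wrapper
    def inner(a):
        if a not in cached:
            cached[a] = fcn(a)
        return cached[a]

    return inner


calls = []  # records each time the real function body runs


@cache
def expensive(a):
    calls.append(a)
    return a + 1


first = expensive(5)
second = expensive(5)  # served from the cache, body does not run again
print(first, second, calls)
```

the standard library also ships functools.lru_cache, which does this job with an optional size limit.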

2025-10-01

Intro Video: "The Computer Chronicles - The Internet (1993)"

Key things to know:

  1. MAC Address (aka "Hardware Address")
  2. IP Address

program called traceroute to get messages sent from each computer that is traversed

[maxficco@pinatubo] ~ $ traceroute server.maxfic.co                               12:00
traceroute to server.maxfic.co (15.204.248.135), 64 hops max, 40 byte packets
 1  10.12.0.1 (10.12.0.1)  5.705 ms  4.698 ms  4.840 ms
 2  core9500-hu1-0-12.guest.gw.nd.edu (172.21.2.53)  4.722 ms
    core9500-hu2-0-11.guest.gw.nd.edu (172.21.2.57)  3.425 ms  3.387 ms
 3  172.21.2.66 (172.21.2.66)  5.727 ms  4.456 ms  5.324 ms
 4  jbr0-inside.gw.nd.edu (129.74.248.65)  5.066 ms  5.255 ms  5.117 ms
 5  et-4-2-1.402.rtr.ictc.indiana.gigapop.net (149.165.183.33)  11.654 ms  10.485 ms  10.815 ms
 6  ae-9.0.rtr.ll.indiana.gigapop.net (149.165.255.102)  10.879 ms  11.541 ms  12.625 ms
 7  ae-7.1.rtr2.chic.indiana.gigapop.net (149.165.255.93)  13.609 ms  15.925 ms  13.947 ms
 8  lo-0.1.rtr.star.indiana.gigapop.net (149.165.255.11)  13.615 ms  13.547 ms  14.281 ms
 9  149.165.183.86 (149.165.183.86)  13.866 ms  13.833 ms  13.209 ms
10  r-equinix-isp-ae0-2401.ip4.wiscnet.net (140.189.9.133)  13.305 ms  13.394 ms  13.776 ms
11  eqx.chi.ovh.net (208.115.136.152)  14.755 ms  15.192 ms  16.459 ms
12  * * *
13  * * *
  3. Hostname
  4. Port
  5. Packets
  6. Sockets

2025-10-03

sudo tcpdump port 9999 -i lo -s0 "snoops" on the connection
... IP localhost.57196 > localhost.9999 ...
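
a minimal sketch of the kind of localhost connection tcpdump would be snooping on. This is not the class server: it is an invented one-shot server that upper-cases whatever it receives, and it binds to port 0 so the OS picks a free port instead of assuming 9999 is available.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept a single connection, echo the data back upper-cased, then stop."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))      # port 0: let the OS choose a free port
server.listen(1)
host, port = server.getsockname()  # find out which port we got

t = threading.Thread(target=serve_once, args=(server,))
t.start()

with socket.create_connection((host, port), timeout=2) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

t.join()
server.close()
print(reply)
```

the client side also gets a port (an ephemeral one), which is what the localhost.57196 in the tcpdump line is.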

weird analogy:

2025-10-06

analogy: take-out kitchen with a telephone in it

in code:

import threading

def somefunc(a, b, c):
    ...

th = threading.Thread(target=somefunc, args=(x, y, z))

th.start()

if we implemented this for our dinoserver socket example, can handle multiple connections at once! (don't need to wait for inner while statement to finish and close)

concurrency problems: multiple threads need to use the same resource (they need to take turns!)
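
one common way to make threads take turns is a lock. A small sketch, where the shared "resource" is just a counter in a dictionary (names are made up for the example):

```python
import threading

state = {"count": 0}     # the shared resource
lock = threading.Lock()  # whoever holds the lock gets a turn


def add_many(n):
    for _ in range(n):
        with lock:  # acquire, update, release
            state["count"] += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state["count"])
```

without the lock, the read-modify-write in += can interleave between threads and updates can get lost.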

python was designed in an era when having more than one core/thread available was rare

C=compute, S=store
C
C - cpu bound
S - i/o bound
| (takes a long time)
C
C
S
| (takes a long time)
C
C
S
...

Now let's use threading (with GIL)

t0 t1
C 
C     (compute and store can each only happen one at a time)
S  C
|  C
C  S
C  |
S  
| (takes a long time)
C
C
S

but if my task were instead:

C
C
C
S
(x3)

then with GIL:

C
C
C
S C
  C
  C
C S
C
C
S

without GIL:

C C C
C C C
C C C
S
  S
    S

Moral of the story: GIL is a frustrating monster (yay python 3.14!)
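
a sketch of the "S |" part of the diagrams: blocking waits release the GIL, so waiting threads overlap even though pure-python compute would not. time.sleep stands in for a slow I/O call here, which is an assumption made for the demo.

```python
import threading
import time


def io_task():
    time.sleep(0.3)  # stand-in for a slow store/I-O wait


start = time.perf_counter()
threads = [threading.Thread(target=io_task) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# run one after another, the three waits would take about 0.9s
print(f"3 waits of 0.3s finished in {elapsed:.2f}s")
```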

2025-10-08

import ray
import time

ray.init(log_to_driver=False)

@ray.remote # decorator that takes the function and "drops it" in the new space where it operates
def foo(x):
    time.sleep(x)
    print(x) # useless print statement! no console in this new space
    return(x*x)

f1 = foo.remote(4) # submits a task to Ray and returns a future object (ObjectRef), not the actual result
f2 = foo.remote(2) # ^these are IOU's

f1r = ray.get(f1) # blocks the main process until the remote task finishes, then retrieves the computed result
f2r = ray.get(f2) # ^ redeems IOU's (in any order, when I want)

print(f1r)
print(f2r)

# can also do with lists:
futures = [ foo.remote(i) for i in range(2,5)]

fr = ray.get(futures)
print(fr)
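
the same IOU pattern exists in the standard library. This sketch swaps ray for concurrent.futures (a different tool, used here only because it needs no install): submit hands back a Future, and result() redeems it.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def foo(x):
    time.sleep(x / 100)  # short stand-in for slow work
    return x * x


with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(foo, i) for i in range(2, 5)]  # IOUs
    results = [f.result() for f in futures]               # redeem them, in order

print(results)
```

unlike ray, these workers are threads inside one process, so this is an analogy for the futures idea rather than a replacement for ray.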

2025-10-10

What is a program actually?

when I execute program, the os will create a space in memory for that program to run

ps -ef lists all the processes running on a machine

import pickle

a = "hello"

pickle.dump(a, open('a.pkl', 'wb')) # wb is write bytes

xxd a.pkl shows the bytes, where we can find "hello"!

now:

import pickle

a = pickle.load(open('a.pkl', 'rb')) # rb is read bytes

print(a)
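
the same round trip without touching the disk, as a sketch: pickle.dumps gives the bytes that would have gone into a.pkl, and the text is visible inside them, just like in the xxd output.

```python
import pickle

data = pickle.dumps("hello")    # bytes, same content pickle.dump would write
print(data)

restored = pickle.loads(data)   # back to a python object
print(restored)
```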

examples before exam

import socket
import sys
import ray

ray.init(log_to_driver=False)

@ray.remote
def scktwrk(host, port):
    try:
        s = socket.socket()
        s.settimeout(2)
        s.connect((host, port))

        hostver = s.recv(64)
        hostver = hostver.decode('utf-8')
        hostver = hostver.rstrip()

        s.close()

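    except socket.timeout as ex: # most specific handlers first: both of these are subclasses of OSError
        hostver = "<connection timeout>"

    except ConnectionRefusedError as ex:
        hostver = "<connection refused>"

    except OSError as ex: # catch-all for any other socket error
        hostver = "<connection error>"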

    return (f'{host}:{port}\t{hostver}')


f = open(sys.argv[1])

futures = list()

for line in f:
    line = line.rstrip()
    (host, port) = line.split(':')
    port = int(port)

    futures.append(scktwrk.remote(host, port))

f.close()

for scktfuture in futures:
    resp = ray.get(scktfuture)
    print(resp)

import random

princesses = [ 'cinderella', 'belle', 'ariel', 'moana' ]

dinosaurs = [ 'tyrannosaur', 'allosaur', 'raptor' ]

def fate(func):
    def wrapper(princess):
        outcome = random.choice(['run', 'defeat', 'eaten'])
        dino = random.choice(dinosaurs)
        func(princess)
        if outcome == 'run':
            print(' was able to run away')
        elif outcome == 'eaten':
            print(f' got eaten by {dino}')
        else:
            print(f' defeated the {dino} in epic single combat')
    return wrapper

@fate
def princess_print(princess):
    print(f'{princess}', end='')

for princess in princesses:
    princess_print(princess)

2025-10-13

Exam 2

2025-10-15

deeper dive into processes - "what killed student10.cse.nd.edu?"

lifecycle of a process

  1. start state (more like not-ready state)
    • loads instructions, sets up memory space
  2. ready/sleep state (program is loaded)
    • interruptible sleep (could be woken up to continue running at any moment)
    • uninterruptible sleep (waiting on something necessary for the process to continue, usually i/o)
  3. running state (can go back and forth from sleep)
  4. killed state
  5. or...zombie state (sits there occupying resources with a return value for a parent that never comes to collect it)

some letter states you'll see:
T start
R eady
S leep (I or U)
K illed
Z ombie

echo $? - prints return value (exit status) of the previously run command
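
the same return value is what a parent program sees. A sketch using subprocess, where the child is just another python interpreter told to exit with a chosen status:

```python
import subprocess
import sys

# sys.executable is the python currently running this script
ok = subprocess.run([sys.executable, "-c", "import sys; sys.exit(0)"])
bad = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])

print(ok.returncode, bad.returncode)  # what echo $? would show after each
```

by convention 0 means success and anything else means some kind of failure.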

now why did student10 break?

ps axfo user,pid,ppid,stat,command - a for all processes, x shows background stuff, f shows process tree, o allows us to specify which headers we want
USER - user who ran command
PPID - parent process id

demo using ray to create multithreaded processes in python

2025-10-17

Recap on processes:

Demo:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

int main(void) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        sleep(5);
        _exit(0);
    } else {
        printf("parent pid=%d child pid=%d -- sleeping 60s\n", getpid(), pid);
        sleep(60);
    }

    return 0;
}
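
in the C demo the parent never collects the child's return value, so while the parent sleeps the exited child presumably shows up as a zombie in ps. A python sketch of the other case, where the parent does collect it with waitpid (POSIX only; fork_and_reap is a made-up name):

```python
import os


def fork_and_reap(child_exit_code):
    pid = os.fork()
    if pid == 0:
        # child: exit right away with the requested status
        os._exit(child_exit_code)
    # parent: wait for the child, which also removes it from the process table
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


code = fork_and_reap(7)
print(code)
```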

inter-process communication (IPC)

Signals are a lightweight IPC mechanism that deliver small integer-coded messages to a process.

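Common examples:
SIGINT (2) - sent by Ctrl+C
SIGKILL (9) - kill -9 <PID>, cannot be caught
SIGUSR1 (10 on Linux) - user-defined, kill -10 <PID>
SIGTERM (15) - the default signal sent by kill <PID>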

remember: kill is just a program that sends signals to processes

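What actually happens when you press Ctrl+C?
the terminal sends SIGINT to the foreground process, and the default action for SIGINT is to terminate it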

But we can override this behavior!

import signal
import time

def handle_signal(signum, frame):
    print('haha no')

signal.signal(signal.SIGINT, handle_signal) # overrides Ctrl+C !!
signal.signal(signal.SIGUSR1, handle_signal) # overrides kill -10 <PID>
signal.signal(signal.SIGTERM, handle_signal) # overrides default kill <PID> command !!

# signal.signal(signal.SIGKILL, handle_signal) # ERROR if uncommented (raises OSError); not allowed by OS :( ... kill -9 <PID> will always kill the process (but uncleanly)

while True:
    time.sleep(1)

2025-10-27

Recall egg analogy of OS:
kernel <- libraries/drivers <- programs <- shell

4 steps involved in compiling a program (say, wrk.c):
gcc -o wrk wrk.c does everything all in one (or we can do certain steps individually and combine others in order)
colloquially we call this "compiling" a program, even though compilation is just one step

  1. preprocessing -> wrk_.c

cpp wrk.c > wrk_.c (cpp is the C PreProcessor)

  2. compiling -> wrk.s

gcc -S -o wrk.s wrk_.c

  3. assembling -> wrk.o

as -o wrk.o wrk.s

  4. linking -> wrk (or wrk.exe on windows)

gcc -o wrk wrk.o

Example

say we just had this file, foo.c

int foo() {
    return (0);
}

cpp foo.c > foo_.c
gcc -S -o foo.s foo_.c
as -o foo.o foo.s
gcc -o foo foo.o -> gives an error! (there is no main function)

returning to solar-deity analogy of the operating system:

(zooming in) What are these little boxes?

ffffffffffffffff
_________________
Stack (grows downward)
- contains function local variables
______ vv _______


______ ^^ _______
Heap (grows upward)
- from malloc, calloc, free
_________________               ^^ created at runtime (above)^^   vv (below) created at compile time vv
DATA - FIXED SIZE
- contains global variables, static variables, constant strings
_________________
(Text) Code
- contains machine code to run program
_________________
0000000000000000

Let's update our wrk.c

#include <stdio.h>
#include "foo.h" // quotes mean look in current directory

int main() {
    int a = foo();
    printf("hello world!\n");
    return 0;
}

foo.h:

int foo(); // preprocessor step will put this in wrk.c

cpp wrk.c > wrk_.c
gcc -S -o wrk.s wrk_.c
as -o wrk.o wrk.s
going fine so far...
if we look in wrk.s, we will see call foo (but we have never given it foo!)

gcc -o wrk wrk.o -> fails! "undefined reference to foo"

okay, let's fix this:
gcc -o libfoo.o foo.s -> still gives an error

gcc -fPIC -shared -o libfoo.so foo.c -> creates a shared object file

gcc -o wrk wrk.c still gives an error!
we have to tell it to look for foo...
gcc -o wrk wrk.c -lfoo still gives an error! ("cannot find -lfoo")
we have to tell it where to look for foo...
gcc -o wrk wrk.c -lfoo -L. <- -lfoo looks for libfoo.so or libfoo.a

now we're good...
./wrk -> gives an error!! ("error while loading shared libraries: libfoo.so: cannot open shared object file ...")
we also have to tell bash where to look for foo...

LD_LIBRARY_PATH=. ./wrk
creates environment variable for bash to use, so that when it goes before the solar deity to ask it to run the program, it has all the info!
now it is working :)

2025-10-29

hypothetical memory map (a lot more going on under the hood of course):

-----
stack     <--grows with function calls, variable assignments
----- 3A
 v

 ^
-----
heap      <--dynamically allocated variables (malloc, calloc, etc.)
----- 2A
data      <--global variables, string literals, static variables (fixed size)
----- 1A
code      <--this is where the os puts our code!
----- 0A

#include <stdio.h>

int main()
{
    int i = 48;
    int j = 54;
    int k = 62;

    int *p = &i; // most people write it like this. But the star is not acting on p!
    int* q = &j; // this is a bit more legibly accurate

    printf("%d\n", i);
    printf("%p\n", &i);

    printf("%d\n", *p);
    printf("%p\n", p); // p holds an address
    printf("%p\n", &p); // but p also has an address
    return(0);
}

Stack (hypothetical, assuming ints and pointers are 1 byte):

Label Address Value
i 0x3F 48
j 0x3E 54
k 0x3D 62
p 0x3C 0x3F
q 0x3B 0x3E
0x3A
0x39
0x38

& is "address of"
* is "value pointed to by"
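
python normally hides addresses, but the ctypes module can imitate & and * closely enough to poke at the idea. A sketch (the real addresses will differ on every run, unlike the hypothetical 0x3F-style ones above):

```python
import ctypes

i = ctypes.c_int(48)
p = ctypes.pointer(i)              # like: int *p = &i;

print(p.contents.value)            # like: *p   (the value pointed to)
print(hex(ctypes.addressof(i)))    # like: &i   (where i lives)
print(hex(ctypes.addressof(p)))    # like: &p   (the pointer has its own address)

# the address stored inside p, read out as a plain integer
held = ctypes.cast(p, ctypes.c_void_p).value

p.contents.value = 54              # like: *p = 54;  (changes i through p)
print(i.value)
```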

2025-10-31

Makefiles

A Makefile automates compilation by encoding build rules so you can just run make.

Basic structure:

target: dependencies
 <TAB> command

The all Target

By convention, all means "build everything".

all: libfoo.so wrk

libfoo.so: foo.c
	gcc -fPIC -shared -o libfoo.so foo.c

wrk: wrk.c foo.h libfoo.so
	gcc -o wrk wrk.c -lfoo -L.

clean:
	rm -f wrk libfoo.so

When you run make on its own:

  1. Make defaults to the first target (all)
  2. It checks whether each dependency exists and is up to date
  3. It rebuilds only what changed
  4. all usually has no commands — it’s just an alias
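
the "is it up to date" check in step 2 comes down to comparing modification times. A simplified python sketch of that one decision (needs_rebuild is a made-up name, and real make handles much more, such as chains of dependencies). The timestamps are set by hand with os.utime so the example does not depend on the clock.

```python
import os
import tempfile


def needs_rebuild(target, dependencies):
    """Rebuild if the target is missing or any dependency is newer than it."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_time for dep in dependencies)


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "foo.c")
    lib = os.path.join(d, "libfoo.so")

    open(src, "w").close()
    missing = needs_rebuild(lib, [src])  # target does not exist yet

    open(lib, "w").close()
    os.utime(src, (1000, 1000))          # source older than target
    os.utime(lib, (2000, 2000))
    fresh = needs_rebuild(lib, [src])

    os.utime(src, (3000, 3000))          # source edited after the build
    stale = needs_rebuild(lib, [src])

print(missing, fresh, stale)
```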

Shared Libraries (.so)

Build a shared library:
gcc -fPIC -shared -o libfoo.so foo.c

Static Compilation (Contrast)

Example:
gcc -static main.c

Result:

C Declarations (Historical Note)

2025-11-03

Recap

"compiling" refers to the build process from human code to machine-readable code

what happens when we run an executable?

Arrays in Memory / Function Calls

Stack (hypothetical, assuming ints and pointers are 1 byte):

Label Address Value
(foo creates) a 0x3F 65 (ary[2]??)
0x3E ary[1]
(bar creates) ary 0x3D ary[0]
0x3C
0x3B
0x3A
0x39
0x38

#include <stdio.h>

void bar()
{
    char ary[2];
    printf("bar: %p\n", (&ary[30])); // the address of ary[30] is the same as the address of a !!!
    ary[30] = 66;
}

void foo()
{
    int a = 65;
    bar();
    printf("foo: %p\n", &a);
    printf("bar: %d\n", a);
}

int main()
{
    foo();
    return(0);
}

Key points:

Implication:

Strings in memory

foo() {
	char s[4] = "abc";
	char *t = "def";
	char *q = malloc(4*sizeof(char));
	q[1] = 'a';
	
}

(HYPOTHETICAL, ASSUMING INTS AND POINTERS ARE 1 BYTE)
Stack:

Label Address Value
0x3F '\0'
0x3E 'c'
0x3D 'b'
s 0x3C 'a'
t 0x3B 0x1C
q 0x3A 0x29

Heap (remember, grows up):

Label Address Value
0x2F
0x2E
0x2D
0x2C --
0x2B --
0x2A a
0x29 --

Data Segment (compiler puts variables in here - location is implementation dependent):

Label Address Value
0x1F '\0'
0x1E 'f'
0x1D 'e'
0x1C 'd'
0x1B
0x1A

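Remember: string literals in the data segment are READ-ONLY! (globals and statics are writable)

t[1] = 'q'; // undefined behavior, typically a segfault: t points at the literal "def"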

Example: reading a line into a buffer

#include <stdlib.h>
#include <stdio.h>

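int readline(FILE *fp, char *buf, int maxlen)
{
    int c; // int, not char: fgetc returns an int so EOF can be told apart from real bytes
    int i = 0;

    // stop at newline or end of file, and leave room for the terminating '\0'
    while(i<maxlen-1 && (c = fgetc(fp)) != '\n' && c != EOF)
    {
        buf[i] = c;
        i++;
    }

    buf[i] = '\0'; // terminate so printf("%s") knows where the string ends

    return(i);
}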

int main()
{
    FILE *fp;

    fp = fopen("test.txt", "r");

    int maxlen = 128;
    char *buf = malloc(maxlen * sizeof(char));

    int r = readline(fp, buf, maxlen);

    printf("read %d bytes: %s\n", r, buf);

    return(0);
}

2025-11-05

reading in multiple lines in C

why don't we just do char[20000][4000] ?

char** foo() {
	int maxlen = 3;
	char* linesA[4];
	linesA[0] = malloc(maxlen*sizeof(char));
	linesA[0][0] = 'w';
	
	char ** linesB = malloc(4*sizeof(char*));
	linesB[0] = malloc(maxlen*sizeof(char));
	linesB[0][0] = 'q';
	
	return linesB;
}

void bar() {
	char **strings = foo();
}

(HYPOTHETICAL, ASSUMING INTS AND POINTERS ARE 1 BYTE)
Stack:

Label Address Value
0x3F --
0x3E --
0x3D --
linesA 0x3C 0x26
linesB 0x3B 0x29
0x3A

Heap (remember, grows upward. Labels don't actually exist, but just for clarity):

Label Address Value
0x2F --
0x2E --
linesB[0] 0x2D 'q'
0x2C --
0x2B --
0x2A --
linesB 0x29 0x2D
0x28 --
0x27 --
linesA[0] 0x26 'w'

Let's look at some code:

#include <stdio.h>
#include <stdlib.h>

void does_work(char ** msg)
{
	msg[0] = malloc(3 * sizeof(char));
	msg[0][0] = 'a';
	msg[0][1] = 'b';
	msg[0][2] = '\0';

	msg[1] = malloc(2 * sizeof(char));
	msg[1][0] = 'c';
	msg[1][1] = '\0';
}

void print_work(char ** msg)
{
	printf("%s\n", msg[0]);
	printf("%s\n", msg[1]);
}

int main()
{
	char **msg;
	msg = malloc(2 * sizeof(char *));

	does_work(msg);
	print_work(msg);

	return(0);
}

great exam question: "draw a hypothetical memory map of this C code"

Reading multiple lines

#include <stdlib.h>
#include <stdio.h>

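int readline(FILE *fp, char *buf, int maxlen)
{
    int c; // int, not char: fgetc returns an int so EOF can be told apart from real bytes
    int i = 0;

    // leave room for the terminating '\0'
    while(i<maxlen-1 && (c = fgetc(fp)) != '\n')
    {
        if (c == EOF) // check before storing, so EOF never ends up in buf
        {
            i = -1; // tells the caller we hit end of file
            break;
        }

        buf[i] = c;
        i++;
    }

    buf[i < 0 ? 0 : i] = '\0'; // always leave buf as a valid C string

    return(i);
}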

int main()
{
    int n = 0; // must be initialized: the while condition reads n before readline sets it
    int i = 0;
    FILE *fp;

    fp = fopen("test.txt", "r");

    int maxlen = 128;
    int maxlines = 100;
    char **lines = malloc(maxlines * sizeof(char *));
    char *buf;// = malloc(maxlen * sizeof(char));

    while(n!=-1 && i<maxlines) // stop at EOF, or when lines[] is full
    {
        buf = malloc(maxlen * sizeof(char));
        n = readline(fp, buf, maxlen);
        //if we wanted our lines to be _exactly_ as big as the strings they hold, we could do:
        //buf2 = malloc((n + 1) * sizeof(char)); // +1 for '\0' (then copy buf into buf2 and lines[i] = buf2;)
        // we could have also done the computationally expensive task of finding
        // out how long the line is before backing up the pointer and reading it
        // but memory is cheap! so better to just do the `buf2` approach^
        lines[i] = buf;
        i++;
    }

    //int r = readline(fp, buf, maxlen);
    //printf("read %d bytes: %s\n", r, buf);

    return(0);
}

2025-11-07

#include <stdio.h>
#include <stdlib.h>

typedef struct {
	int wingspan;
	int canfly;
	char *name;
} bird;

int main() {
	bird duck;
	duck.wingspan = 24;
	duck.canfly = 1;
	duck.name = "daffy";
	
	bird *duckB = malloc(sizeof(bird));
	(*duckB).wingspan = 32;
	duckB->wingspan = 32; // equivalent
	
	return 0;
}

Where is duck?

(HYPOTHETICAL, ASSUMING INTS AND POINTERS ARE 1 BYTE)
Stack:

Label Address Value
duck.wingspan 0x3F 24
duck.canfly 0x3E 1
duck.name 0x3D 0x1C (address in data section)
duckB 0x3C 0x26
0x3B
0x3A

Heap:

Label Address Value
0x2A
0x29
name 0x28
canfly 0x27
wingspan 0x26 32

Let's create a new file format!

#include <stdio.h>
#include <stdlib.h>

typedef struct {
	int wingspan;
	int canfly;
} bird;

int main() {
	bird duck;
	duck.wingspan = 24;
	duck.canfly = 1;
	FILE *fp = fopen("duck.brd", "wb"); // wb is "write, binary mode"; rb is "read, binary mode"
	
	// 4 arguments: memory address, how much memory to write, how many times, where to write (file pointer)
	fwrite(&duck, sizeof(duck), 1, fp);
	
	fclose(fp);
	return 0;
}

this gives us:
-rw-rw----+ 1 mficco fa25-cse-20289.01 8 Nov 11 16:29 duck.brd
notice it is 8 bytes! (two ints, each 4 bytes)

$ xxd duck.brd
00000000: 1800 0000 0100 0000                      ........

now with a name (we store the length first, then the characters):

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

typedef struct {
    int wingspan;
    int canfly;
    char *name;
    int namelen;
} bird;

int main() {
    bird duck;
    duck.wingspan = 24;
    duck.canfly = 1;
    duck.name = "donald";
    duck.namelen = strlen(duck.name);

    FILE *fp = fopen("duck.brd", "wb");
    fwrite(&duck.wingspan, sizeof(int), 1, fp);
    fwrite(&duck.canfly, sizeof(int), 1, fp);
    fwrite(&duck.namelen, sizeof(int), 1, fp);
    fwrite(duck.name, duck.namelen*sizeof(char), 1, fp);

    fclose(fp);
    return 0;
}

this gives us:

$ xxd duck.brd
00000000: 1800 0000 0100 0000 0600 0000 646f 6e61  ............dona
00000010: 6c64                                     ld

Function Pointers!

"So, a lot of times you'll hear people say things like, 'C is not an object-oriented language'. Heard that before? But don't worry, 'Python is an object-oriented language'. You know what? That statement has no meaning! It doesn't mean anything for the language to be object-oriented. What does that even mean? It doesn't mean anything. You write programs in an object-oriented fashion. You write programs in a functional form, or whatever. Okay? But the language itself has nothing to do with whether it's object-oriented or not. Yes, some languages automatically support object-oriented features more easily. I guess that's what they mean when they say it's an object-oriented language, but there's nothing, like...there's no reason that C is not an object-oriented language, really. It's just that you don't know how to write C in an object-oriented way. That's what that means. Try to throw shade on C when it's really your own fault!"

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

typedef struct {
    int wingspan;
    int canfly;
    char *name;
    int namelen;
    void (*noise)(); // "not all birds make the same noise"
} bird;

void chirp() {
    printf("chirp!!!!!!!!!\n");
}

void quack() {
    printf("quack!!!!!!!!!\n");
}

int main() {
    bird duck;
    duck.wingspan = 24;
    duck.canfly = 1;
    duck.name = "daffy";
    quack();

    // we don't write void *fcn() because that would declare a function returning void *
    // the trailing () just matches the parameter list
    void (*fcn)() = &quack;

    // we can do all kinds of things now!
    // we invoke (dereference) the function pointer by:
    (*fcn)();

    // we can store it in a struct
    duck.noise = &quack;
    // and then call it just like this!
    duck.noise();

    return 0;
}

2025-11-10

Snow Day!

2025-11-12

the "self" is an illusion

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

typedef struct bird bird;

struct bird {
    int wingspan;
    int canfly;
    char *name;
    int namelen;
    void (*noise)(bird *);
};

void chirp(bird *self) {
    printf("chirp!!!!!!!!!\n");
}

void quack(bird *self) {
    printf("quack!!!!!!!!! %s\n", self->name);
}

int main() {
    bird sparrow;
    sparrow.wingspan = 8;
    sparrow.canfly = 1;
    sparrow.name = "sparrow";
    sparrow.noise = &chirp;
    sparrow.noise(&sparrow); // we pass the struct itself as "self"

    return 0;
}

Why are function pointers important?

  1. we can make proto-object like things in C (see code above)
  2. avoids repetitive code for similar operations (see code below)

void arry_op(int *ary, int arylen, int (*fcn)(int))
{
    int i;
    for(i=0; i<arylen; i++)
    {
        ary[i] = (*fcn)(ary[i]);
    }
}

int mult_five(int a)
{
    return(a*5);
}

int sub_six(int a) {
    return(a-6);
}

2025-11-14

What is a segfault anyway?

An error that means you tried to read or write memory that you're not allowed to read or write

Palindrome Finder

(writing the program with some intentional bugs to find)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int is_pal(char *s) {
    char *head = s;
    char *tail = s + strlen(s);

    while(*head == *tail) {
        head++;
        tail--;
    }

    if(head >= tail)
        return 1;

    return 0;
}

int main() {
    char *buf = malloc(24 * sizeof(char));
    fgets(buf, 24, stdin);

    printf("%d\n", is_pal(buf));

    return(0);
}

Something's wrong...let's debug!

gcc -g -o pal pal.c

we can now use it with gdb!

let's set a break point: b 7
then we run the program: run (and type in our input to stdin)
to execute next line: n
we can print values:
p *head
p *tail
p tail
Now we see our issue! (tail initially points to \0)
We also need to "chomp" the \n character that is captured by fgets

But we're not done...
...if we run valgrind ./pal we get a bunch of issues!

int is_pal(char *s) {
    char *head = s;
    char *tail = s + strlen(s) - 1; // <---

    while(*head == *tail) {
        head++;
        tail--;
    }

    if(head >= tail)
        return 1;

    return 0;
}

void chomp(char *s) { // <---
    size_t len = strlen(s);
    if(len > 0 && s[len-1] == '\n') { // len > 0: don't index s[-1] on an empty string
        s[len-1] = '\0';
    }
}

int main() {
    char *buf = malloc(24 * sizeof(char));
    fgets(buf, 24, stdin);

    chomp(buf); // <---

    printf("%d\n", is_pal(buf));

    free(buf); // <---

    return 0;
}

But we're still getting a bunch of issues!

errors in valgrind we will see:

"invalid read of size 1"

"conditional jump or move depends on uninitialized values"

to fix: we break as soon as head >= tail:

int is_pal(char *s) {
    char *head = s;
    char *tail = s + strlen(s) - 1;

    while(*head == *tail) {
        head++;
        tail--;

        if(head >= tail) // <---
            break;
    }

    if(head >= tail)
        return 1;

    return 0;
}

2025-11-17

An example of a memory anti-pattern in C:

readline(fp, malloc(...), len);
// not only do we not free the allocated space, we never tracked it to begin with!

Another one:

// in a loop to read each line...
	char *buf = malloc(...);
	char *splits[27];
	//...
	dino->name = splits[5];
	//...
	for (int i=0; i<numsplits; i++) {
		free(splits[i]);
	}
	// this is wrong; we don't want to free stuff in each dino before we use it!

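Note that this code might still appear to work! We deallocated the space, but the old values are often still sitting there. It is still a use-after-free bug (undefined behavior), so it can break at any time.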

Recap: What happens when you run a program

#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
// notice: no stdio.h! we are going to ask the kernel directly

int main() {
	char buf[24] = "yay notre dame!\n";
	
	int fd;
	
	fd = syscall(SYS_open, "example.txt", O_RDWR);
	
	syscall(SYS_write, fd, &buf, 24);
	
	syscall(SYS_close, fd);
}

fid name path
0
1
2
...

in kernel, there is an ivt (interrupt vector table):

process signal function
5000 13 foo()

2025-11-19

final exam topics on C (approx. 70%):

signalrecv.c

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

void handle_sig(int sig) {
    printf("got signal %d\n", sig);
    char buf[24];
    FILE *fp = fopen("msg.txt", "r");
    if(fp == NULL) // no message file yet
    {
        return;
    }
    if(fgets(buf, 24, fp) != NULL)
    {
        printf("%s\n", buf);
    }
    fclose(fp);
}

int main() {
    // signal is just a wrapper for a syscall
    signal(SIGUSR1, handle_sig);

    while(1) { }

    return(0);
}

sendsig.c

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    // kill is just a wrapper for a syscall
    kill(1769796, SIGUSR1);

    return(0);
}

this doesn't have to be in C! sendsig.py:

import os
import signal

pid = 1772661

with open("msg.txt", "w") as fp: # here we write a dead drop message!!
    fp.write("secret!!!")

os.kill(pid, signal.SIGUSR1)

The point is, we aren't communicating with the program ourselves. It is through the kernel!


in ivt, there are more things that are kept track of (other than the process, sig, fcn columns that we loosely described earlier)

kernel also keeps track of program/execution context (the things which the program is doing at that particular time)

signalrecv2.c

// context is important for threading, knowing which thread to pick back up
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

void handle_sigusr1(int sig, siginfo_t *siginfo, void *context) {
    printf("got signal %d from pid %d from uid %d\n", sig, siginfo->si_pid, siginfo->si_uid);

    char buf[24];
    FILE *fp = fopen("msg.txt", "r");

    if (fp == NULL) { // no message file yet
        return;
    }

    if (fgets(buf, 24, fp) != NULL) {
        printf("%s\n", buf);
    }

    fclose(fp);
}

int main() {
    struct sigaction act1;
    sigemptyset(&act1.sa_mask); // don't leave the signal mask uninitialized
    act1.sa_sigaction = &handle_sigusr1;
    act1.sa_flags = SA_SIGINFO;

    sigaction(SIGUSR1, &act1, NULL);

    while(1) { }

    return(0);
}

2025-11-21

Today we will be covering threads! Real threads! (Not those phony Python/Ray threads which are actually super-heavy-duty processes that communicate via super-heavy-duty sockets.)

first: what exactly is a void pointer?

int main() {
    int x = 42;
    int *p = &x;
    p++; // adds to the pointer the size of the type of the value to which it points!

    // But...if you have to deal with contiguous blocks of memory whose size you don't know:
    void *p2 = &x;
    // Now, this is NOT ALLOWED in standard C (void has no size; gcc only accepts it as an extension):
    // p2++;
    // To use a void pointer, we must first cast it, and then dereference:
    int y = *(int *)p2;

    // interestingly, we can create an array of void pointers! (same size as any other pointer)
    void *ary[6];
    ary[0] = &x;
    long l = 9182374;
    ary[1] = &l;
}

What is a thread?

 (etc)
------------
| STACK t1 |
-----v------ <- our new thread!
|          |
------------
| STACK t0 |
-----v------
|          |
|          |
-----^------
|   HEAP   |
------------
|   DATA   |
------------
|   CODE   |
------------

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <pthread.h>

#define NT 4

typedef struct {
    int tn;
} threadarg;

int rets[NT];

void * threadfcn(void *varg) {
    threadarg *arg = (threadarg *) varg;

    printf("in threadfcn %d\n", arg->tn);

    int i = 0;
    for(int j=0; j<INT_MAX; j++) i++;

    rets[arg->tn] = i;

    return NULL; // the function is declared to return void *, so it must return something
}

int main() {
    int i;

    pthread_t tid[NT];
    threadarg targs[NT];

    for(i=0; i<NT; i++)
        targs[i].tn = i;

    for(i=0; i<NT; i++) {
        // pthread_create is a wrapper for a bunch of syscalls
        pthread_create(&tid[i], NULL, threadfcn, (void *) &targs[i]);
    }

    for(i=0; i<NT; i++) {
        pthread_join(tid[i], NULL);
        printf("thread %d returned %d\n", i, rets[i]);
    }

    return(0);
}

gdb with threads

  1. set a breakpoint in the thread function
  2. info threads to see our threads (thread 1 is main)
  3. run: gdb will stop us at breakpoint in first thread
  4. finish to let that thread return
  5. continue to move on and stop in the next thread at the same breakpoint

2025-11-24

4 reasons we want to use C

Described milestone assignments (m3c, m3d)

2025-12-01

Recall:

------------
| STACK t1 |
-----v------ <- our new thread!
|          |
------------
| STACK t0 |
-----v------
|          |
|          |
-----^------
|   HEAP   |
------------
|   DATA   |
------------
|   CODE   |
------------

Threads being able to access the same memory space is a double-edged sword:

sharing data is easy, but one thread can also trample another thread's data.
sometimes we want to isolate t1 and t0.
instead of threading, we want...

...Multiprocessing!

Instead of asking the kernel to make a new thread stack, we can ask it to clone our memory space!

when we call fork, it makes an identical copy. That includes the stack! The function that is currently running is in the stack!
So all that stuff, the chain of function invocations, is also duplicated.
That means that the clone wakes up, and it just starts executing, right? It doesn't know the difference.
But the clone doesn't know any better, because it has the same memories!

Only one way to find out which one is the parent, and which one is the child:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    int pid = getpid();
    printf("my pid: %d\n", pid);

    int cpid = fork();

    // what would happen if we did:
    //
    // for (int i=0; i<100; i++) cpid = fork();
    //
    // "fork bomb"!

    if(cpid == 0) {
        // I am the child / clone
        printf("help help I just realized I am the clone!!??\n");

        return 0;
    } else {
        // I am the parent / original
        printf("child pid: %d\n", cpid);

        // collect return value of child
        int r = wait(NULL); // if we didn't have this, we would notice our child become a zombie!

        while(1) { }
    }

    return 0;
}

Let's go and modify our previous thread program to use fork instead of pthread_create:

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/wait.h>

#define NT 4

typedef struct {
    int tn;
} threadarg;

// int rets[NT];

void * threadfcn(void *varg) {
    threadarg *arg = (threadarg *) varg;

    printf("in threadfcn %d\n", arg->tn);

    int i = 0;
    int j;
    for(j=0; j<INT_MAX; j++)
        i++;

    //rets[arg->tn] = i;
    // ^we can't do this!!
    // each child process has a different data segment!!!
    // one simple way to share data is through a shared location like a temporary file on disk
    // (pipes and shared memory are other options)

    char *fn = malloc(128 * sizeof(char));
    sprintf(fn, "tmp_%d.bin", getpid());
    FILE *fp = fopen(fn, "wb");
    fwrite(&i, sizeof(int), 1, fp);
    fclose(fp);
    free(fn);

    return NULL; // the function is declared to return void *, so it must return something
}

int main() {
    int i;
    int r;
    int cpid;
    int wres;

    //pthread_t tid[NT];
    threadarg targs[NT];

    for(i=0; i<NT; i++)
        targs[i].tn = i;

    for(i=0; i<NT; i++) {
        cpid = fork();
        if(cpid == 0) {
            // "Oh! I'm the clone, so I must go do work!"
            threadfcn((void *) &targs[i]);
            return 0; // VERY IMPORTANT TO RETURN IMMEDIATELY AFTER!
        }
        //pthread_create(&tid[i], NULL, threadfcn, (void *) &targs[i]);
    }

    for(i=0; i<NT; i++) {
        // order varies, unlike pthread_join where we wait on a specific thread id
        // (just whichever child process happens to finish)
        wres = wait(NULL); // wait returns the pid of whichever child exited; NULL means we don't need its exit status

        char *fn = malloc(128 * sizeof(char));
        sprintf(fn, "tmp_%d.bin", wres);
        FILE *fp = fopen(fn, "rb");
        fread(&r, sizeof(int), 1, fp);
        fclose(fp);
        remove(fn); // delete file
        free(fn);

        //pthread_join(tid[i], NULL);
        //printf("child %d returned %d\n", i, rets[i]);
        printf("child %d returned %d\n", i, r);
    }

    return(0);
}

2025-12-03

ipv4 tcp internet socket

the "sockaddr_in" data structure holds the address family, the ip address, and the port; a hostname has to be resolved to an ip address first (asks DNS and comes back)

Let's start in python:

recall our client.py program we wrote earlier:

import socket

HOST = "student10.cse.nd.edu"  # Server IP (localhost for testing)
PORT = 9999        # Must match the server’s port

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((HOST, PORT))
    message = "Hello, MD! Oct.1 2025 - from systems programming class"
    s.sendall(message.encode())
    print(f"Sent: {message}")
    gotmesg = s.recv(1024)
    print(gotmesg.decode())

Now in C:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <math.h>
#include <limits.h>
#include <errno.h>
#include <sys/wait.h>
#include <signal.h>
#include <netdb.h>
#include <netinet/in.h>

void print_banner(int sockfd)
{
    int n;
    int buflen = 2048;
    char buf[buflen];

    memset(buf, 0, buflen);
    n = read(sockfd, buf, buflen-1);

    if(n < 0)
    {
        perror("could not read");
        exit(1);
    }

    printf("recv: %s\n", buf);

    //memset(buf, 0, buflen);
    //sprintf(buf, "hi!");
    n = write(sockfd, buf, strlen(buf));

    if(n < 0)
    {
        perror("could not write");
        exit(1);
    }
}

int main(int argc, char *argv[])
{
    int sockfd, portno, n;
    struct sockaddr_in serv_addr;
    struct hostent *server;

    int buflen = 32;
    char buf[buflen];

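    if(argc < 3)
    {
        fprintf(stderr, "usage: %s hostname port\n", argv[0]);
        exit(1);
    }

    portno = atoi(argv[2]);
    server = gethostbyname(argv[1]);

    if(server == NULL)
    {
        fprintf(stderr, "could not resolve host\n");
        exit(1);
    }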

    sockfd = socket(AF_INET, SOCK_STREAM, 0);

    if(sockfd < 0)
    {
        perror("could not create the socket");
        exit(1);
    }

    bzero((char *) &serv_addr, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    bcopy((char *)server->h_addr, (char *)&serv_addr.sin_addr.s_addr, server->h_length);
    serv_addr.sin_port = htons(portno);

    if(connect(sockfd, (struct sockaddr*)&serv_addr, sizeof(serv_addr)) < 0)
    {
        perror("socket could not connect");
        exit(1);
    }

    // do something with socket
    print_banner(sockfd);

    close(sockfd);

    return 0;
}

2025-12-05

A class like Data Structures looks inward. Systems Programming (this class) looks outward

Last time we were talking about client-side sockets in C, i.e. "dialing"

Today we will look at server-side!
Bind, Listen, Accept

#include <errno.h>
#include <sys/wait.h>
#include <signal.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

void handle_incoming_conn(int cl)
{
    int buflen = 256;
    int rc;
    char *buf = malloc(buflen * sizeof(char));
    memset(buf, 0, buflen);

    while((rc = read(cl, buf, buflen-1)) > 0)
    {
        printf("read %d bytes: %s\n", rc, buf);
        memset(buf, 0, buflen);
    }

    free(buf);

    close(cl);
}

void handle_sigchld(int sig)
{
    wait(NULL);
}

int main()
{
    int sock, port, t, cl, cpid;

    sock = socket(AF_INET, SOCK_STREAM, 0);

    port = 8003;

    signal(SIGCHLD, &handle_sigchld);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr)); // zero the padding and any fields we don't set
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(port);

    t = bind(sock, (struct sockaddr *) &addr, sizeof(addr));

    if(t < 0)
    {
        perror("could not bind");
        return -1;
    }

    listen(sock, 5);

    while(1)
    {
        cl = accept(sock, NULL, NULL);

        cpid = fork();

        if(cpid == 0)
        {
            handle_incoming_conn(cl);
            return 0;
        } else
        {
            close(cl);
        }
    }

    close(sock);

    return 0;
}

SIGCHLD - whenever a child stops or is terminated, the kernel will send this signal to the parent

2025-12-08

Lótus Bridge
on one side of the bridge, people drive on the left, and on the other, they drive on the right!

libgeodist.c: (pretend this is a complicated c program to interface with a device or parallelize lots of things)

#include <math.h>
#include <string.h>

#include "libgeodist.h"

#define pi 3.14159265358979323846

double deg2rad(double deg) { return (deg * pi / 180); }
double rad2deg(double rad) { return (rad * 180 / pi); }

double geodist(double lat1, double lon1, double lat2, double lon2) {
    double theta, dist;
    if ((lat1 == lat2) && (lon1 == lon2)) {
        return 0;
    } else {
        theta = lon1 - lon2;
        dist = sin(deg2rad(lat1)) * sin(deg2rad(lat2)) + cos(deg2rad(lat1)) * cos(deg2rad(lat2)) * cos(deg2rad(theta));
        dist = acos(dist);
        dist = rad2deg(dist);
        dist = dist * 60 * 1.1515; // degrees of arc -> nautical miles -> statute miles
        return (dist);
    }
}

int palindrome(char *s) {
    char *head = s;
    char *tail = s + strlen(s) - 1;

    while(head < tail && *head == *tail) {
        head++;
        tail--;
    }

    if(head >= tail)
        return 1;

    return 0;
}

anecdote about living in germany and saying a hyperbole -- translated correctly, but not appreciated in cultural context

cgeodist.py:

import pathlib
import ctypes

class cgeodist:
    def __init__(self):
        self.libname = pathlib.Path().absolute() / "libgeodist.so"
        self.c_lib = ctypes.CDLL(self.libname)

    def geodist(self, lat1, lng1, lat2, lng2):
        lat1 = ctypes.c_double(lat1)
        lng1 = ctypes.c_double(lng1)
        lat2 = ctypes.c_double(lat2)
        lng2 = ctypes.c_double(lng2)

        self.c_lib.geodist.restype = ctypes.c_double
        ret = self.c_lib.geodist(lat1, lng1, lat2, lng2)
        return ret

    def palindrome(self, word):
        word = word.encode('utf-8')
        word = ctypes.c_char_p(word)

        self.c_lib.palindrome.restype = ctypes.c_int
        ret = self.c_lib.palindrome(word)

        if(ret == 1):
            ret = True
        else:
            ret = False

        return ret

now we can use our c library as a python library!

from cgeodist import *

gd = cgeodist()

dist = gd.geodist(41.643274, -86.382732, 52.382344, 11.348448)
print(dist)

p = gd.palindrome("civic")
print(p)

Why would we use language bindings? If you have to access another language for something:

  1. Special file format
  2. Device driver
  3. Speed (parallelization, access to gpu, etc.)

"marshalling is the thing that you do to connect, technically and culturally, one language's parts to another."

2025-12-10

Final Exam Review
CSE20289 Final Exam Practice