Jimmy Christensen
October 31, 2022
Slides available at https://dusted.dk/pages/scripting/slides.html
Linux is an operating system, like Windows or OSX; it allows you to run programs.
Bash: popular “command line” for Linux & OSX
Python: popular scripting language
Bash, grep, awk, sed, tr, cut:
Bash scripts using the above programs to automate extraction.
Python scripts to remodel and consolidate extracted data, calculate and prepare results.
Nano is a (bit too) simple text editor
/home/chrisji/theWork/important.txt
Peek at unknown file: hd strangeFile | head
Check size: ls -lh strangeFile
Only beginning: head strangeFile
Page through it: more strangeFile
Composition, also useful in bash scripting
Same difference, almost
# This line is a comment, because it starts with the hash mark
# comments are to help the humans (us)
both bash and python # ignores a hash mark and any following text
but only for rest of that line
Creating the script with the nano editor
Holds something, has a name, like box with a label on it
Hello World!
They can be re-assigned!
What’s the result ? (remember, line-by-line)
As a rough guide, a bash variable can comfortably store a couple of megabytes of data; Python is limited mainly by available memory (tens of gigabytes on a large machine)
#!/bin/bash
# Create some variables and assign them values
timesToCall=3
animal="dog"
name="viggo"
# Output to the screen
echo "Calling for $name the $animal $timesToCall times!"
Calling for viggo the dog 3 times!
Variables assigned from command-line
Bash has strong string-manipulation capabilities, but it quickly degenerates into alphabet soup
Don’t go there, use python
For the curious: https://tldp.org/LDP/abs/html/string-manipulation.html
It also has arrays (and associative arrays), but bash syntax makes them painful
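To illustrate why string work is usually easier to read in Python, here is a minimal sketch. The path is borrowed from the earlier example, and all variable names are made up for this illustration.

```python
# Hypothetical path, reused from the earlier example
path = "/home/chrisji/theWork/important.txt"

fileName = path.split("/")[-1]              # last part of the path
baseName = fileName.rsplit(".", 1)[0]       # file name without extension
shouted = baseName.upper()                  # upper-case copy
renamed = fileName.replace(".txt", ".md")   # swap the extension

# Lists and sets need no special syntax either
animals = ["dog", "fish", "cat", "dog"]
uniqueAnimals = set(animals)                # duplicates disappear in a set

print(fileName, baseName, shouted, renamed)
print(len(animals), len(uniqueAnimals))
```

Each step is one readable method call, which is the main argument for doing this kind of work in Python rather than bash.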
Conditionals let us decide if we want to do something
The “else” word lets us do something if the condition is not met
The elif is optional and allows chains
if CONDITION_A
then
    # something to do if A..
elif CONDITION_B
then
    # This must not happen if A, but if B.
elif CONDITION_C
then
    # If neither A nor B, but C, then do this
else
    # If not A, B nor C, then do this..
fi
else is also optional
In bash, a condition is met if the [ ] builtin or a program exits with the “success” status code.
true # A program that returns success
false # A program that returns.. not success
grep # Will return success if it finds at least one match
[ "$name" == "viggo" ] # success if the variable is exactly the text viggo
[ $a -gt $b ] # success if a is Greater Than b
[ $a -lt $b ] # success if a is Less Than b
[ $a -eq $b ] # success if a is EQual to b
[ -z "${a+x}" ] # success if a is unset (weird syntax, I know); use [ -z "$a" ] to test for empty
[ "$name" == "viggo" ] && [ "$animal" == "dog" ] # success only when name is viggo AND animal is dog (both must be true)
[ "$name" == "viggo" ] || [ "$animal" == "dog" ] # success if name is viggo or animal is dog (one or both must be true)
dog is an animal
fish is an animal
cat is an animal
bird is an animal
dog fish cat bird is an animal
Use expansion for files; this produces a sequence of the .txt files in the current directory
Repeats as long as CONDITION is met (potentially forever)
This is a good time to remind you that pressing ctrl+c (eventually) terminates a running script.
When you need to repeat something until “things are done”
Useful for interactive scripts
echo "Welcome to super cool script v 24!"
# The -p parameter is a prompt to show to the human
read -p "File to corrupt: " fileName
echo "Don't worry, I was just kidding, nothing happened to $fileName"
Welcome to super cool script v 24!
File to corrupt: homework.md
Don't worry, I was just kidding, nothing happened to homework.md
answer=""
while [ "$answer" != "y" ] && [ "$answer" != "n" ]
do
    read -p "Continue? [y/n] ?" answer
done
if [ "$answer" == "n" ]
then
    echo "Stopping here."
    exit
fi
echo "Continuing"
Continue? [y/n] ?yes
Continue? [y/n] ?...
Continue? [y/n] ?yeees
Continue? [y/n] ?noooo ?
Continue? [y/n] ?n
Stopping here.
For small files, we can read the entire file into a variable by running the cat program and capturing its output in a variable
For small outputs, we can capture the entire standard (non-error) output into a variable
Read exits with success unless standard input is closed
We can compose complex pipelines now
grep 'someRegex' secrets.txt | while read -r line
do
    # Extract some field from the line ("awk magic things" is a placeholder for your own awk program)
    importantThing=$(echo "$line" | cut -f 2 -d ',' | awk magic things)
    # Look for that field in some other file, put the result in a variable
    if resultFromOtherFile=$(grep "$importantThing" otherSecrets.txt)
    then
        # Eureka! some correlation is interesting when the field from subset of secrets.txt is in otherSecrets.txt!
        # Maybe combine on one line and print it out?
        echo "$line,$resultFromOtherFile"
    fi
done > results.txt
We can also write a script that takes a stream from a pipe, for use with other scripts
ourCoolScript.sh
We’d use it like this
We just use the > and >> after any command, or after the script itself.
See EVERYTHING that happens during script execution by adding set -x
#!/bin/bash
# This script needs 3 parameters: name, animal and number of legs
name=$1
animal=$2
nlegs=$3
# nlegs is empty when fewer than 3 parameters were given
if [ -z "$nlegs" ]
then
    echo "Usage: $0 NAME ANIMAL NUM_LEGS"
    echo " NAME - The name of the pet (example: viggo)"
    echo " ANIMAL - What kind of animal is it? (example: dog)"
    echo " NUM_LEGS - How many legs does it have? (example: 4)"
    exit 1
fi
Even if you’re the only user, you might forget how to use it later
$0 is special: it always holds the name of the script in question.
Suggestion for treating multiple files while keeping track of the source of each result
Let there be binary files data1.bin data2.bin … dataN.bin that can be translated by a “translatebinarytotext” program and filtered by some regular expression to find relevant lines:
#!/bin/bash
for fileName in data*.bin
do
    translatebinarytotext "$fileName" \
        | grep 'someRegex' \
        | ./ourAwesomeScript.sh "$fileName" \
        >> result.txt
done
The polished version of the above alternative could be used like a real program.
Then the script would look like:
It can be useful to write such “debug” information to the standard error channel; this way it goes “around” pipes and is shown on screen.
#!/bin/bash
# This function writes a message to stderr, so our script output can be piped while we still see information/errors.
function msg {
    echo "$@" >&2
}
msg "Script for doing cool stuff running in $(pwd)"
echo "Important data output"
bash coolStuffDoer.sh > test.txt
Script for doing cool stuff running in /home/chrisji/theWork/
cat test.txt
Important data output
Script and data do not need to be in the same place; you can write the script as if it is next to the files.
.
├── data
│   └── seta
│       └── wool.txt
└── scripts
    └── sheep.sh
scripts/sheep.sh:
Alternative, write script that uses parameters:
.
├── data
│   └── seta
│       └── wool.txt
└── scripts
    └── sheep.sh
scripts/sheep.sh:
Alternative, write script that uses pipes:
.
├── data
│   └── seta
│       └── wool.txt
└── scripts
    └── sheep.sh
scripts/sheep.sh:
A script that reads which files to work on from a file
filesToWorkOn.txt
file1.txt
file2.txt
file3.txt
process.sh
#!/bin/bash
listFileName="$1"
filesToProcess=$(cat "$listFileName")
for fileToProcess in $filesToProcess
do
    echo "Processing file: $fileToProcess"
done
./process.sh filesToWorkOn.txt
Processing file: file1.txt
Processing file: file2.txt
Processing file: file3.txt
Types & objects
# integer
length=23
# floating point
pi=3.1415
tau=pi*2
# set
instances={ pi, tau, tau, 'dog', 'viggo' }
# How many tau are in the set? 1. It's a set.
# What order are the items in? None, sets are unordered.
# Convenient for accumulating unique occurrences of stuff
instances.add( 42 )
if tau in instances:
    print("Tau is in the set!")
# list
aList = [ 5,3,12,5 ]
# Lists are ordered, duplicates are allowed. (5 occurs twice, at the start and end)
# In python and most other languages, the first element of a list has index 0
print(f"The second element in the list is {aList[1]}")
Dictionaries make it easy to structure data
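As a minimal sketch of what that can look like, the example below stores one pet in a dictionary. The keys mirror the ones used in the later examples; the "Legs" key is made up for illustration.

```python
# A dictionary maps keys (labels) to values
pet = {"Name": "Viggo", "Species": "Dog", "Age": 5}

print(pet["Name"])        # look up a value by its key
pet["Age"] = 6            # change an existing value
pet["Legs"] = 4           # add a new key (hypothetical field)
print("Legs" in pet)      # check whether a key exists

# Walk through all key/value pairs
for key, value in pet.items():
    print(f"{key}: {value}")
```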
More than everything you need to know about Python variables at
Conditionals are easier to read in python. Generally we don’t run external programs, so the boolean result of an expression is all we worry about.
A condition is met when the result is True, any non-zero number, or anything non-empty.
A condition is not met when the result is False or 0, or empty (such as the empty string "" or list [])
a > b # Is True if a is greater than b
a < b # Is True if a is less than b
a == b # Is True if a is equal to b
a != b # Is True if a is not equal to b
a < b and b > c # Is True if a is less than b AND b is greater than c
a > b or b < c # Is True if a is greater than b or if b is less than c
a in c # Is True if a is an element in c
a not in c # Is True if a is not an element in c
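The rules about which values count as "met" can be checked directly with bool(). This is a small sketch; the helper name isMet and the example values are made up for illustration.

```python
def isMet(value):
    # bool() shows how Python judges a value when it is used as a condition
    return bool(value)

examples = [True, 1, -3, "dog", [0], False, 0, "", []]
for value in examples:
    print(f"{value!r:8} -> {isMet(value)}")

# Conditions can be combined, as in the table above
a, b, c = 2, 5, [1, 2, 3]
combined = a < b and a in c
print(combined)
```

Note that a list containing a single zero, [0], still counts as met, because the list itself is not empty.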
Python is indentation sensitive, that is, the amount of indentation (“tabs”) indicates to which code-block a line belongs.
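The code from the original slide is not preserved here, so the following is only an assumed sketch of a loop that would print output like the lines that follow. The variable names are made up.

```python
count = 0
for number in range(10, 20):
    # These two indented lines belong to the loop body and run once per number
    print(f"Number: {number}")
    count += 1
# Not indented: this line is outside the loop and runs once, after it finishes
lastNumber = number
```

Removing the indentation from the print line would move it out of the loop, which changes what the script does.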
Number: 10
Number: 11
Number: 12
Number: 13
Number: 14
Number: 15
Number: 16
Number: 17
Number: 18
Number: 19
More about the CSV module at https://docs.python.org/3/library/csv.html
#!/usr/bin/python3
import sys
import csv
# This csv file used ; instead of ,
reader = csv.reader(sys.stdin, delimiter=';')
rowNumber=0
for row in reader:
    name=row[0]
    age=int(row[1]) # Convert from text to number
    print(f"Row {rowNumber}, name: {name} age: {age}")
    if age > 5:
        print(f"{name} is an old doggie!")
    rowNumber += 1
Much more about loops here: https://wiki.python.org/moin/ForLoop
With an open file, readline will read a single line from the file, useful for large files
#!/usr/bin/python3
fileName="poe.txt"
with open(fileName) as textFile:
    while True:
        line = textFile.readline()
        if not line: # If the line was empty, file end reached
            break # so break the loop
        line = line.strip()
        print(f"{line}")
Once upon a midnight dreary,
while I pondered, weak and weary,
Over many a quaint and curious
volume of forgotten lore—
With an open file, readlines will read from the file into a list
#!/usr/bin/python3
lineNumber=0
fileName="poe.txt"
with open(fileName) as textFile:
    lines = textFile.readlines()
for line in lines:
    lineNumber += 1
    line = line.strip() # remove newline from the line
    print(f"{fileName} {lineNumber}: {line}")
poe.txt 1: Once upon a midnight dreary,
poe.txt 2: while I pondered, weak and weary,
poe.txt 3: Over many a quaint and curious
poe.txt 4: volume of forgotten lore—
reverser.py
#!/usr/bin/python3
inFileName="poe.txt"
outFileName="eop.txt"
with open(inFileName) as inFile:
    with open(outFileName, 'w') as outFile:
        while True:
            line = inFile.readline()
            if not line: # If the line was empty, file end reached
                break # so break the loop
            line = line.strip()
            line = line[::-1] # Reverse the string
            line = line + "\n"
            outFile.write(line)
#!/usr/bin/python3
import csv
ourResults = []
ourResults.append( ['Johnny', 'Cat', 4 ] )
ourResults.append( ['Viggo', 'Dog', 5 ] )
ourResults.append( ['Mat', 'Dog', 7 ] )
ourResults.append( ['Markus', 'Cat', 7 ] )
# newline="" is recommended by the csv module documentation when writing files
with open("out.csv", "w", newline="") as csvOutFile:
    writer = csv.writer(csvOutFile, dialect='excel')
    writer.writerow(['Name', 'Species', 'Age'])
    for row in ourResults:
        writer.writerow(row)
#!/usr/bin/python3
import csv
csvHeader = [ 'Name', 'Species', 'Age' ]
ourResults = []
ourResults.append( { 'Name': 'Johnny', 'Species': 'Cat', 'Age': 4 } )
ourResults.append( { 'Name': 'Viggo', 'Species': 'Dog', 'Age': 5 } )
ourResults.append( { 'Name': 'Mat', 'Species': 'Dog', 'Age': 7 } )
ourResults.append( { 'Name': 'Markus', 'Species': 'Cat', 'Age': 7 } )
# newline="" is recommended by the csv module documentation when writing files
with open("out.csv", "w", newline="") as csvOutFile:
    writer = csv.DictWriter(csvOutFile, fieldnames=csvHeader, dialect='excel')
    writer.writeheader()
    for row in ourResults:
        writer.writerow(row)
import re # regex module
text = "He was carefully disguised but captured quickly by police."
subString = re.split(r"but", text) # ['He was carefully disguised ', ' captured quickly by police.']
subString = re.findall(r"\w+ly\b", text) # ['carefully', 'quickly']
More about regex at https://docs.python.org/3/library/re.html
#!/usr/bin/python3
listOfAnimals = [
{ "Name": "Viggo", "Species": "Dog", "Age": 5 },
{ "Name": "Mat", "Species": "Dog", "Age": 6 },
{ "Name": "Oliver", "Species": "Cat", "Age": 5 },
{ "Name": "Luggie", "Species": "Dog", "Age": 5 }
]
fiveYearOldDogs = []
for animal in listOfAnimals:
    if animal["Species"] == "Dog" and animal["Age"] == 5:
        fiveYearOldDogs.append(animal["Name"])
print(fiveYearOldDogs)
['Viggo', 'Luggie']
#!/usr/bin/python3
# repeater will return a textToRepeat repeated timesToRepeat times.
# If timesToRepeat is not provided, a default value of 5 is used
def repeater( textToRepeat, timesToRepeat=5 ):
    return textToRepeat * timesToRepeat
repeatedString = repeater("test", 3)
print(repeatedString)
testtesttest
#!/usr/bin/python3
listOfAnimals = [
{ "Name": "Viggo", "Species": "Dog", "Age": 5 },
{ "Name": "Mat", "Species": "Dog", "Age": 6 },
{ "Name": "Oliver", "Species": "Cat", "Age": 5 },
{ "Name": "Luggie", "Species": "Dog", "Age": 5 }
]
def isLuggie( animal ):
    return animal["Species"] == "Dog" and animal["Age"] == 5 and animal["Name"] == "Luggie"
num = 0
for animal in listOfAnimals:
    num += 1
    if isLuggie(animal):
        print(f"Found {animal['Name']} he is number {num} in the list")
Found Luggie he is number 4 in the list
#!/usr/bin/python3
myList = [ 2, 6, 1 ,4 ]
myList.sort() # Sort will sort the list "in place"
print(myList)
myList = [
{ 'Age': 5, 'Name': 'Viggo' },
{ 'Age' : 2, 'Name': 'Mat' }
]
def getAge(animal):
    return animal['Age']
myList.sort( key=getAge)
print(myList)
[1, 2, 4, 6]
[{'Age': 2, 'Name': 'Mat'}, {'Age': 5, 'Name': 'Viggo'}]