Project description

pawk is a Python-based replacement for awk.

It uses Python to process files line by line.

Examples:

#pawk automatically parses each line as a csv row and stores the fields as a list in "r"
#-g ("grep") keeps a subset of lines satisfying a given condition

#Selects lines from input.txt with at least 3 csv fields
> pawk -f input.txt -g 'len(r) > 2'


#Keep a subset of lines where the second csv field is non-empty
> pawk -f input.txt -g 'r[1]'


#The above may crash if some lines have only one csv field
#Use this instead:
> pawk -f input.txt -g 'len(r) > 1 and r[1]'
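To make the -g behaviour concrete, here is a minimal Python 3 sketch of a row filter. It is not pawk's source code: the helper name grep_rows and the use of the csv module are assumptions made for illustration only.

```python
import csv

def grep_rows(lines, condition, delimiter=","):
    """Return the parsed rows for which condition(r) is truthy."""
    kept = []
    for l in lines:
        # Parse a single line into a list of fields (pawk calls this list "r")
        r = next(csv.reader([l], delimiter=delimiter), [])
        if condition(r):
            kept.append(r)
    return kept

lines = ["a,b,c", "x", "y,,z"]
# Same condition as: -g 'len(r) > 1 and r[1]'
kept = grep_rows(lines, lambda r: len(r) > 1 and r[1])
print(kept)
```

In plain Python, dropping the len(r) > 1 check would raise an IndexError on the one-field row "x", which is the crash the guarded condition avoids.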


#The raw line is stored in the "l" variable
#Keep a subset of lines where l isn't empty and the first character is "a"
> pawk -f input.txt -g 'l != "" and l[0] == "a"'


#Run certain code for each input line using -p
#Using -p prevents the default printing of the line

#For each line of the input, print that line with whitespace stripped
> pawk -f input.txt -p 'print l.strip()'


#default value of -f is /dev/stdin
> less input.txt | pawk -p 'print len(r)'


#-d sets the input delimiter
#the output delimiter is ",", so this command converts a tsv to a csv
> pawk -f input.txt -d '\t'
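The tsv-to-csv conversion can be pictured with the standard csv module. This is a sketch under assumptions, not pawk's implementation: the function name convert_delimiter is hypothetical, and whether pawk quotes fields the same way is not confirmed here.

```python
import csv
import io

def convert_delimiter(text, in_delim="\t", out_delim=","):
    """Re-write delimited text with a different field delimiter."""
    src = io.StringIO(text, newline="")
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=out_delim, lineterminator="\n")
    for row in csv.reader(src, delimiter=in_delim):
        # Fields containing the output delimiter are quoted by csv.writer
        writer.writerow(row)
    return out.getvalue()

tsv = "name\tnote\nann\thello, world\n"
converted = convert_delimiter(tsv)
print(converted)
```

Going through a csv writer rather than a plain string replace keeps fields that contain a comma intact, because they are quoted on output.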


#pawk stores the line number (zero-indexed) in the "i" variable
#only keep lines starting with the 1133rd
> pawk -f input.txt -g 'i>=1132'


#replace matches of a regular expression in each line (python re module imported by default)
> pawk -f input.txt -p 'print re.sub("U_C_Rate","firearm_rate",l)'

#-b runs code before any lines are processed
#-e runs code after all lines are processed
#To add up a list of floats
> pawk -f input.txt -b "cnt=0" -p "cnt += float(l)" -e "print cnt"
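The -b / -p / -e flow amounts to "run once, run per line, run once". The Python 3 sketch below illustrates that flow with exec and a shared namespace. Whether pawk itself is built on exec is an assumption, and run_pipeline is a hypothetical name.

```python
def run_pipeline(lines, before, per_line, end):
    """Run `before` once, `per_line` for every line, then `end` once."""
    env = {}  # shared namespace, so state set up in `before` survives
    exec(before, env)
    for i, l in enumerate(lines):
        env["i"] = i  # zero-indexed line number
        env["l"] = l  # raw line
        exec(per_line, env)
    exec(end, env)
    return env

# Mirrors: -b "cnt=0" -p "cnt += float(l)" -e "print cnt"
env = run_pipeline(["1.5", "2.5", "3"], "cnt = 0", "cnt += float(l)", "total = cnt")
print(env["total"])
```

The shared dictionary is what lets the counter initialised before the loop be updated for each line and read again at the end.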





Writing multi-line python in pawk:
Heavily inspired by a source I can't find right now, pawk can process strings representing multi-line python.

examples:
#(semi-colon) or (colon+whitespace) causes a line break
'import random; print(random.random())'
-->
import random;
print(random.random())


#after lines with (colon+whitespace) successive lines are automatically indented:
'if i>3: print("hello world!"); a += 1; b = 0'
-->
if i>3:
    print("hello world!");
    a += 1;
    b = 0


#use the 'end;' keyword to force indent level to decrease (compare this example with the above)
'if i>3: print("hello world!"); end; a += 1; b = 0'
-->
if i>3:
    print("hello world!");
a += 1;
b = 0


#"elif:", "else:" and "except:" automatically cause indenting to decrease
'if i>3: print("a"); elif i>1: print("b"); else: print("c")'
-->
if i>3:
    print("a");
elif i>1:
    print("b");
else:
    print("c")


#you can define functions!
'def test123(): print("hello world!"); end; test123(); test123(); test123();'
-->
def test123():
    print("hello world!");
test123();
test123();
test123();
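
The expansion rules above can be sketched in a few lines of Python 3. This is an illustrative reimplementation, not pawk's own code: the function name expand and the four-space indent are assumptions, and the splitting is naive, so a ";" or a colon followed by whitespace inside a string literal would also be treated as a line break.

```python
import re

DEDENT_KEYWORDS = ("elif", "else", "except")

def expand(one_liner, indent="    "):
    """Expand a one-line snippet into indented multi-line Python."""
    # Split on ";" or ":" + whitespace, keeping the delimiters.
    # tokens alternates: text, delimiter, text, delimiter, ..., text
    tokens = re.split(r"(;|:\s+)", one_liner)
    lines = []
    level = 0
    for k in range(0, len(tokens), 2):
        text = tokens[k].strip()
        delim = tokens[k + 1].strip() if k + 1 < len(tokens) else ""
        if not text:
            continue
        if text == "end":
            # "end;" only lowers the indent level and emits no line
            level = max(level - 1, 0)
            continue
        if text.split()[0] in DEDENT_KEYWORDS:
            # elif/else/except sit one level out from the preceding body
            level = max(level - 1, 0)
        lines.append(indent * level + text + delim)
        if delim == ":":
            # lines after a colon belong to the new block
            level += 1
    return "\n".join(lines)

expanded = expand('if i>3: print("a"); elif i>1: print("b"); else: print("c")')
print(expanded)
compile(expanded, "<one-liner>", "exec")  # raises SyntaxError if the result is invalid
```

The trailing semicolons are kept on purpose so the output matches the expansions shown above. They are legal Python, so the result still compiles.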
