Awk: Set Output Filename Variable via Field Contents

I suspect the title of this post would win a prize for the worst on the internet. Oh well. On a new contract at the moment working with people from various backgrounds in IT. One of my new co-workers had a text file containing a dump of a postgres database and wanted to extract all the CREATE FUNCTION statements out of it for analysis; could I take a look as resident Linux guy? There was a lot of other stuff in the file, the functions were of varying length and split over a number of lines and had a couple of differing closing parameters. Awk was obviously the go-to tool here, but the request had the added complication that each function had to be stored in a seperate output file, named after the function that was being extracted into it. While matching between two patterns is easy in awk, I had some struggle finding out how to dump lines into a file whose name was set by a variable which was in turn set by the contents of a field. Google failed me so I’ve posted this in the hope it helps some other poor soul and saves them the syntax errors I endured figuring it out.

A CREATE FUNCTION statement looked like this in the file in question:

CREATE FUNCTION function_name(variable name type) RETURNS
LANGUAGE plpgsql
AS $$
    <Various SQL statements>

The $$; sequence was replaced by $_$ in other functions for some reason (personal preference of whoever wrote it I guess). The desired name for the output file would be ‘function_name’ in this example.

Here’s the (one liner of course) awk statement I ended up using (you may need to scroll to see the whole thing):

awk -F' |[(]' '/CREATE FUNCTION/ {p=1;name=$3}; p {print > name}; /\$\$;|\$_\$;/ {p=0;close(name)}; ' INPUT.sql

A brief explanation. The parameters to the -F set the awk field delimiter to either (thanks to the | symbol) a space ’ ’ or open bracket (.

The /CREATE FUNCTION/ is the first pattern to match against. When awk sees that, it sets variable p to 1, and sets the variable name to field 3 of that line, which in our case is the name of the function up to the (.

The p {print > name}; statement prints out lines if p equals 1 (the default action for awk). It also dumps any printed lines into a file, using the name variable as the filename. The position of the ‘p’ in the middle of this statement means that the whole desired range is printed, including the two patterns being matched against.

The (escaped) $$; sequence is an end pattern to match on, as is the alternative (signified by the | ) $_$; (also escaped). Once either of these is seen, awk sets p to zero, and closes the file. Failing to close the file results in an error relating to too many files being open.

The INPUT.sql is merely the file awk reads from.

The result – a long manual edit avoided and many small files, correctly named and containing the correct statements. Good old awk.

Update (2017). I’ve moved this page from my old blog (the only surviving post!) as it occasionally gets hits for very obscure awk related searches. Hopefully has helped someone out in the last few years..