DSL Tips and Tricks :: Recursive-strip C comments



Big Fat Warning:
Hi all,
None of the methods described here actually work.
I haven't found any workable solution yet.
Everything I've posted is just a preliminary attempt.
Please do not use ANY of these examples in their unmodified form on any source code!!!
Problems:
-- No reliable way to preserve directives ("#define", "#ifdef", "#endif", etc).
-- "Missing semicolon" compilation errors after deleting comment blocks (the comment blocks were playing the role of semicolons by signifying the end of a function).
-- Most probably, there are many, many more problems which I haven't run into yet.

I am very new to sed and regular expressions, and I am not even sure sed is the proper tool for this purpose (stripping source code).
I am also a total Perl newbie (only just started reading the 'Llama Book', AKA 'Learning Perl, Third Edition', from O'Reilly).
(I wholeheartedly recommend the 'Llama Book' to anyone who is new to Perl -- it is very well written!)
Speaking as a total Perl newbie: IMHO, someone who is versed in Perl may be able to produce a better solution in Perl.


Code Sample
find ./ -name "*.h" | \
while IFS= read -r i; do \
cpp -P -fpreprocessed "$i" > "$i.tmp" \
&& mv "$i.tmp" "$i"; done
find ./ -name "*.h" | \
while IFS= read -r i; do \
sed -i '/^ *$/d' "$i"; done
This will recursively strip all comments and empty lines from any C headers in the current directory and its subdirectories. This would help reduce the size of any '-dev' extensions.
Any improvements are most welcome!
Thanks.
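
The second loop above (the blank-line pass) can also be written without the "while read" pipe, which avoids trouble with odd file names. This is only a sketch, assuming GNU sed (for '-i') and a find that supports '-exec ... {} +'; the function name 'strip_blank_lines_in_tree' is made up for illustration and is not part of the original post.

```shell
# Hypothetical helper (name invented for this sketch).
# Deletes empty and space-only lines from every *.EXT file under DIR.
# Assumes GNU sed, where -i edits each listed file in place.
strip_blank_lines_in_tree() {
    dir=${1:-.}   # directory to search, defaults to the current one
    ext=${2:-h}   # file extension, defaults to C headers
    # find passes the file names straight to sed, so names containing
    # spaces never go through the shell's word splitting.
    find "$dir" -type f -name "*.$ext" -exec sed -i '/^ *$/d' {} +
}
```

e.g. 'strip_blank_lines_in_tree . h'. The cpp pass would need a small 'sh -c' wrapper instead, because it writes to a temporary file and then renames it.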

Explanation:
Code Sample
cpp -P -fpreprocessed "$i" > "$i.tmp" && mv "$i.tmp" "$i"
1. Process all '.h' files with 'cpp' and overwrite original files with processed files.
Code Sample
sed -i '/^ *$/d' "$i"
2. '/^ *$/d' -- Remove lines that consist of nothing but zero or more spaces (i.e. empty lines and lines that contain only spaces). Lines that contain only TABs are NOT matched.
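
If TAB-only lines should go as well, the POSIX character class '[[:space:]]' can stand in for the literal space. A minimal sketch, written as a filter (stdin to stdout) so it can be tried without touching any files; the function name is made up for illustration:

```shell
# Hypothetical filter (name invented for this sketch).
# Drops lines that are empty or contain only whitespace (spaces, TABs).
strip_blank_lines() {
    sed '/^[[:space:]]*$/d'
}
```

e.g. 'strip_blank_lines < foo.h > foo.h.tmp'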

Possibly useful also:
1. 'whitespace_stripper.sh'
Code Sample
for i in "$@";
do sed -i '/^ *$/d' "$i"; done
e.g. 'whitespace_stripper.sh file1 file2 file3 file4 [...]'
2. 'stripsh' (strip shell scripts)
Code Sample
for i in "$@";
do sed -i -e 's/\t*#.*//g' \
-e 's/\ *#.*//g' \
-e '/^#/d' \
-e '/^\t*#/d' \
-e '/^ *#/d' \
-e '/^ *$/d' \
"$i"; done
e.g. 'stripsh file1 file2 file3 file4 [...]'
NOTE: This script has a problem -- It deletes the interpreter line ("#!") at the top of any script; for example:
Code Sample
#!/bin/sh
You have to put the line back again after running the script.
Question for everyone: How do we make 'sed' ignore lines that begin with "#!"?
Thank you very much!
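
One possible answer, offered as a sketch rather than a tested fix: a sed address can be negated with "!", so the substitution can be told to skip any line that starts with "#!". The function name 'strip_sh_comments' is made up for illustration, and it works as a filter (stdin to stdout) so nothing is edited in place. It still shares the bigger weakness of the script above: a "#" that is not a comment (as in "$#", "${var#prefix}", or inside a quoted string) gets cut off too.

```shell
# Hypothetical filter (name invented for this sketch).
# Removes "#" comments and blank lines, but leaves "#!" lines alone.
strip_sh_comments() {
    # 1st expression: on every line NOT starting with "#!", delete
    #                 optional whitespace, a "#", and the rest of the line.
    # 2nd expression: delete lines that are now empty or whitespace-only.
    sed -e '/^#!/!s/[[:space:]]*#.*//' \
        -e '/^[[:space:]]*$/d'
}
```

e.g. 'strip_sh_comments < script.sh > script.sh.tmp'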

Explanation of above script:
's/\t*#.*//g' -- (Substitution) Pattern: any number of TABs ("\t"), followed by a "#", followed by any number of any character (".*"); the pattern is not anchored, so it can match anywhere in a line. Replacement: Null (i.e. deletes the matching part of the line, up to the end of the line -- does NOT mean deleting the entire line).

'/^\t*#/d' -- (Deletion) Delete lines that: begin with any number of TABs, followed by a "#" (i.e. bash and perl comments).

'/^ *#/d' -- (Deletion) Delete lines that: begin with any number of SPACEs, followed by a "#" (i.e. bash and perl comments).

'/^ *$/d' -- (Deletion) Delete lines that: begin with any number of SPACEs, followed by an end-of-line ("$") -- i.e. delete empty and space-only lines.

Well, find (GNU find, at least) uses the current dir as a default, so the first command could be just

find -name "*.h" |

Nice anyway

How about adding a help function, and then calling this as remove-comments.sh?

There's also this issue: if the file has bash-style comments, i.e. lines starting with #, cpp will bail out and the .tmp file will be left there.

Here's my go:
Quote
#!/bin/sh
ext=h
case $1 in
-h* | --h*) cat << EOF
Use $0 in the top directory of your sources to strip all headers of comments.

$0 -ext sh        operates on .sh files instead of .h
EOF
;;
-ext) ext=$2;; esac

find -name "*.$ext" | \
while read i; do \
cpp -P -fpreprocessed "$i" > "$i.tmp"
mv "$i.tmp" "$i"; done
find -name "*.$ext" | \
while read i; do \
sed -i -e '/^ *$/d' -e '/^# /d' "$i"; done
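
A side note on the option handling in the script above: after printing the help text it carries on and processes the files anyway. A sketch of one way around that, reusing the same help text; the function names 'usage' and 'parse_args' are made up for illustration and are not from the quoted script.

```shell
# Hypothetical helpers (names invented for this sketch).
usage() {
    cat << EOF
Use $0 in the top directory of your sources to strip all headers of comments.

$0 -ext sh        operates on .sh files instead of .h
EOF
}

# Sets the global variable "ext" (default: h).
# Returns non-zero when the caller should stop: 1 after printing the help,
# 2 for a bad argument.
parse_args() {
    ext=h
    while [ "$#" -gt 0 ]; do
        case $1 in
            -h*|--h*) usage; return 1 ;;
            -ext)
                [ "$#" -ge 2 ] || { echo "-ext needs a value" >&2; return 2; }
                ext=$2
                shift ;;
            *) echo "unknown option: $1" >&2; return 2 ;;
        esac
        shift
    done
}
```

In the script itself this would be called as 'parse_args "$@" || exit' before the find loops, so nothing is touched when help was requested.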

I don't really see a whole lot of use for removing comments from headers, unless you see headers as being nothing more than dependencies for compiling. If you are a software developer, those comments can be very useful.

The final product can easily be reduced with the strip command.

One difference: stupid's version doesn't remove lines beginning with '#!' (it only deletes lines where the '#' is followed by a space).

Edit: I guess line 14 of the script (the cpp line) needs to be joined to line 15 (the mv line) with "\ && "; then it works (as long as everyone puts a space after the '#' in their comments).
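
The point of joining those two lines is that mv should only run when cpp succeeded; otherwise the original file gets overwritten with whatever partial output cpp produced. A sketch of the same idea as a helper that also cleans up the leftover .tmp file on failure. The name 'apply_in_place' is made up for illustration, and the command to run is passed in, so nothing here is specific to cpp.

```shell
# Hypothetical helper (name invented for this sketch).
# Usage: apply_in_place FILE CMD [ARGS...]
# Runs "CMD ARGS FILE" with stdout sent to FILE.tmp, and replaces FILE
# with the result only if CMD exited successfully.
apply_in_place() {
    file=$1
    shift
    tmp=$file.tmp
    if "$@" "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"               # do not leave the .tmp file behind
        echo "skipped: $file" >&2
        return 1
    fi
}

# Possible use with the loop from this thread (assumes cpp is installed):
#   find . -type f -name "*.h" | while IFS= read -r i; do
#       apply_in_place "$i" cpp -P -fpreprocessed
#   done
```

Note that mv replaces the file with the temporary one, so the result has the permissions of the newly created file rather than those of the original.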

Interesting code (although I'd agree with mikshaw in favor of comments and spaces)

Edited original post:
Changed 'cpp' command to
Code Sample
cpp -P -fpreprocessed
cpp manual: "-P -- Inhibit generation of linemarkers in the output from the preprocessor."
