Comparing Primitives

int compare_buffer_text(int buf1, int pos1,
                        int buf2, int pos2, int fold)

int buffers_identical(int a, int b)

The compare_buffer_text( ) primitive compares two buffers, specified by buffer numbers, starting at the given offsets within each. If fold is nonzero, Epsilon performs case-folding as in searching before comparing each character, using the case-folding rules of the current buffer. The primitive returns the number of characters that matched before the first mismatch.
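
The return value described above can be modeled outside Epsilon. The following is a minimal Python sketch, not EEL and not Epsilon's implementation: the name count_matching is hypothetical, plain strings stand in for buffers, and str.lower() is only an assumed stand-in for the buffer's case-folding rules.

```python
def count_matching(text1, pos1, text2, pos2, fold=False):
    """Count the characters that match before the first mismatch."""
    n = 0
    while pos1 + n < len(text1) and pos2 + n < len(text2):
        a = text1[pos1 + n]
        b = text2[pos2 + n]
        if fold:
            # Assumed stand-in for the current buffer's case-folding rules.
            a, b = a.lower(), b.lower()
        if a != b:
            break
        n += 1
    return n


print(count_matching("Hello world", 0, "hello there", 0, fold=True))
print(count_matching("Hello world", 0, "hello there", 0, fold=False))
```

With folding, the count covers "hello " and stops at the first differing letter; without folding, the comparison stops at the very first character.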

The buffers_identical( ) subroutine checks whether two buffers, specified by their buffer numbers, are identical. It returns nonzero if the buffers are identical, and zero if they differ. If neither buffer exists, they're considered identical; if only one exists, they're considered different.
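
The rule for nonexistent buffers can be written out as a small truth table. This is a Python sketch under stated assumptions, not EEL: the name identical is hypothetical, None stands for a buffer that doesn't exist, and 1 is used as an arbitrary nonzero result.

```python
def identical(a, b):
    """Model of the comparison rule; None means the buffer doesn't exist."""
    if a is None and b is None:
        return 1          # neither exists: considered identical
    if a is None or b is None:
        return 0          # only one exists: different
    return 1 if a == b else 0


print(identical(None, None), identical("text", None), identical("text", "text"))
```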

do_uniq(int incl_uniq, int incl_dups, int talk)
buf_sort_and_uniq(int buf)
The do_uniq( ) subroutine defined in uniq.e goes through the current buffer comparing each line to the next, and deleting each line unless it meets certain conditions.

If incl_uniq is nonzero, lines that aren't immediately followed by an identical line will be preserved. If incl_dups is nonzero, the first copy of each line that is immediately followed by one or more identical lines will be preserved. (The duplicate lines that follow will always be deleted.)

If talk is nonzero, the subroutine will display status messages as it proceeds.
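
The effect of the two flags is easiest to see on a small list. The following Python sketch models only the selection rule described above, with a list of strings standing in for the buffer; the name uniq_lines is hypothetical, and this is not the code in uniq.e.

```python
def uniq_lines(lines, incl_uniq, incl_dups):
    """Return the lines that would survive, given the two flags."""
    kept = []
    i = 0
    while i < len(lines):
        # Find the end of the run of identical adjacent lines starting at i.
        j = i + 1
        while j < len(lines) and lines[j] == lines[i]:
            j += 1
        run_length = j - i
        if run_length == 1 and incl_uniq:
            kept.append(lines[i])      # line with no identical successor
        elif run_length > 1 and incl_dups:
            kept.append(lines[i])      # first copy of a duplicated line
        i = j                          # later copies are always dropped
    return kept


sample = ["a", "b", "b", "c", "c", "c", "d"]
print(uniq_lines(sample, 1, 1))
print(uniq_lines(sample, 1, 0))
print(uniq_lines(sample, 0, 1))
```

Setting both flags keeps one copy of every line, setting only incl_uniq keeps the lines that were never repeated, and setting only incl_dups keeps one copy of each repeated line.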

The buf_sort_and_uniq( ) subroutine sorts the specified buffer and discards duplicate lines in it, with no status messages.

do_compare_sorted(int b1, int b2, char *only1, char *only2, char *both)
The do_compare_sorted( ) subroutine works like the compare-sorted-windows command, but lets you specify the two buffers to compare, and the names of the three result buffers. Any of the result buffer names may be NULL, and the subroutine won't generate data for that buffer.
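
As a rough illustration of a three-way split over sorted input, here is a Python sketch. It rests on assumptions not stated above: that the three results hold lines found only in the first buffer, only in the second, and in both, and that each input is sorted with no repeated lines. The name compare_sorted is hypothetical, and lists of strings stand in for buffers.

```python
def compare_sorted(lines1, lines2):
    """Merge-style walk over two sorted lists of distinct lines."""
    only1, only2, both = [], [], []
    i = j = 0
    while i < len(lines1) and j < len(lines2):
        if lines1[i] == lines2[j]:
            both.append(lines1[i])
            i += 1
            j += 1
        elif lines1[i] < lines2[j]:
            only1.append(lines1[i])
            i += 1
        else:
            only2.append(lines2[j])
            j += 1
    # Whatever remains in either list has no counterpart in the other.
    only1.extend(lines1[i:])
    only2.extend(lines2[j:])
    return only1, only2, both


print(compare_sorted(["a", "b", "d"], ["b", "c", "d", "e"]))
```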

int tokenize_lines(int buf1, int **lines1, int *len1, int buf2, int **lines2, int *len2)
int lcs(int *lines1, int len1, int *lines2, int len2, char *outbuf)
These primitives help to compute a minimum set of differences between the lines of two buffers buf1 and buf2. See the implementation of the diff command for an example of their use.

Call the tokenize_lines( ) primitive first. It begins by counting the lines in each buffer (placing the results in len1 and len2). Then it uses the realloc( ) primitive to make room in the arrays passed by reference as lines1 and lines2, which may be null at the start. Each array will have room for one token (unique integer) for each line of its buffer. (The arrays may be freed after calling lcs( ), or reused in later calls.)

The tokenize_lines( ) primitive then fills in the arrays with unique tokens, chosen so that two lines will have the same token if and only if they're identical.
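
The token property (same token if and only if the lines are identical) can be illustrated with a shared lookup table. This is a Python sketch of the idea only: the name tokenize and the choice of small consecutive integers as tokens are assumptions, and the actual token values Epsilon produces are not specified here.

```python
def tokenize(lines1, lines2):
    """Assign one integer per line so that equal lines share a token."""
    table = {}

    def token_for(line):
        # A line seen before gets its old token; a new line gets the next integer.
        return table.setdefault(line, len(table))

    tokens1 = [token_for(line) for line in lines1]
    tokens2 = [token_for(line) for line in lines2]
    return tokens1, tokens2


print(tokenize(["x", "y", "x"], ["y", "z"]))
```

Because both buffers share one table, comparing two lines across buffers reduces to comparing two integers.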

The lcs( ) primitive takes the resulting arrays and line counts, and writes a list of shared line ranges to the specified buffer, one per line, in ascending order. Each line range consists of a line number for the first buffer, a line number for the second (both 0-based) and a line count. For instance, a line "49 42 7" indicates that the seven lines starting at line 49 in the first buffer match the seven lines starting at line 42 in the second (counting lines from 0).
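
To make the output format concrete, the sketch below computes a longest common subsequence of two token lists with a textbook dynamic-programming table and groups adjacent matches into "line1 line2 count" ranges. It is a Python model under assumptions: the name shared_ranges is hypothetical, and Epsilon's primitive may use a different algorithm and may pick a different, equally long set of matches when several exist.

```python
def shared_ranges(tokens1, tokens2):
    """Return shared ranges as 'start1 start2 count' strings (0-based)."""
    n, m = len(tokens1), len(tokens2)
    # length[i][j] is the LCS length of tokens1[i:] and tokens2[j:].
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if tokens1[i] == tokens2[j]:
                length[i][j] = length[i + 1][j + 1] + 1
            else:
                length[i][j] = max(length[i + 1][j], length[i][j + 1])

    ranges = []
    i = j = 0
    while i < n and j < m:
        if tokens1[i] == tokens2[j]:
            last = ranges[-1] if ranges else None
            if last and last[0] + last[2] == i and last[1] + last[2] == j:
                last[2] += 1               # extend the current range
            else:
                ranges.append([i, j, 1])   # start a new range
            i += 1
            j += 1
        elif length[i + 1][j] >= length[i][j + 1]:
            i += 1
        else:
            j += 1
    return ["%d %d %d" % tuple(r) for r in ranges]


print(shared_ranges([0, 1, 2, 3, 4], [9, 1, 2, 8, 4]))
```

Here the two lines starting at line 1 match in both lists, and the single line at line 4 matches as well, so two ranges are reported in ascending order.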

int lcs_char(int buf1, int from1, int to1, int buf2, int from2, int to2, char *outbuf)
The lcs_char( ) primitive is a character-oriented version of the tokenize_lines( ) and lcs( ) primitives described above. It compares ranges of characters in a pair of buffers.

It writes a list of shared character ranges to the specified buffer, one per line, in ascending order. Each character range consists of a character offset for the first buffer relative to from1, a character offset for the second buffer relative to from2, and a character count. For instance, a line "49 42 7" in the output buffer indicates that the seven characters in the range from1 + 49 to from1 + 49 + 7 in the first buffer match the seven characters in the range from2 + 42 to from2 + 42 + 7 in the second.
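
Since the offsets in each output line are relative, a caller has to add from1 and from2 back to get buffer positions. The following Python sketch shows that conversion on a list of output lines; the name absolute_ranges is hypothetical, and the sample values are made up for illustration.

```python
def absolute_ranges(output_lines, from1, from2):
    """Convert 'off1 off2 count' lines into absolute (start, end) pairs."""
    result = []
    for line in output_lines:
        off1, off2, count = (int(field) for field in line.split())
        range1 = (from1 + off1, from1 + off1 + count)
        range2 = (from2 + off2, from2 + off2 + count)
        result.append((range1, range2))
    return result


# A line "10 4 3" with from1 = 100 and from2 = 200.
print(absolute_ranges(["10 4 3"], 100, 200))
```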

int phoneticize_lines(int dest, int len)
The phoneticize_lines( ) primitive quickly finds sound codes for a list of words. It goes through the current buffer line by line. Each line should contain a word; non-word characters will be ignored. For each line, it writes a corresponding line to the dest buffer with a phonetic code for that word, a string of letters designed so that two words with similar sounds will have the same phonetic code. (It currently uses the Metaphone algorithm for this purpose.) The len value indicates the maximum length of each phonetic code to be produced.
