Earlier versions of this program used the following approach to determine which changelog or NEWS entries (hereafter "entries") are new and should be displayed to the user:
This approach was based on two assumptions, neither of which is always true:
For an example of where these assumptions break down, look at the dmsetup package:
This approach was also limited in that it only looked at NEWS.Debian[.gz], changelog.Debian[.gz], changelog.Debian.arch[.gz], and changelog[.gz]. For an example of where this fails, again look at dmsetup, which has changelog.Debian.devmapper.gz.
Another technique used in earlier versions of this program was to attempt heuristically to ignore version number suffixes which should not be considered when evaluating whether a particular entry was new. The employed heuristics were brittle, potentially leading to missed entries or entries displayed multiple times.
The current approach continues to use version numbers to assist in determining which entries to display to users. However, it does so more cautiously and in a more limited way which dramatically reduces the likelihood of failing to show the user entries that they should see.
Specifically:
The final requirement above is only half-enforced if the package's name matches one of a list of package patterns whose changelog version numbers are trusted. In that case, we don't require seeing the new version's package number before ignoring the rest; it's sufficient for us to see a version number that is semantically less than the new version. At the time this paragraph was written, the only pattern on this list was linux-image-*, which is necessary because the maintainers of the signed kernel packages do funky things with the version numbers in their changelog files which prevent the full test from working properly.
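As a rough illustration of the name check described above, a shell-style pattern match is enough. This is only a sketch: the names `TRUSTED_VERSION_PATTERNS` and `versions_trusted` are hypothetical, and the real program may store and match its patterns differently.

```python
from fnmatch import fnmatchcase

# Hypothetical list of package-name patterns whose changelog version
# numbers are trusted; the text above names only linux-image-*.
TRUSTED_VERSION_PATTERNS = ["linux-image-*"]


def versions_trusted(package_name):
    """Return True if the package name matches any trusted pattern."""
    return any(
        fnmatchcase(package_name, pattern)
        for pattern in TRUSTED_VERSION_PATTERNS
    )
```

With this list, a signed kernel package name such as `linux-image-6.1.0-18-amd64` would match, while `dmsetup` would not.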
In addition to version numbers, the current approach also uses checksums of changelog and NEWS entries to determine which entries the user has already seen and therefore does not need to see again.
For each entry, the program stores two checksums: a checksum of the entire entry including its header (the line that contains the package name, version number, suite, and urgency), content, and footer (the line containing the maintainer and timestamp); and a second checksum of the content and footer, with the header omitted.
Whenever the program sees an entry whose full checksum matches a checksum already in the database, it stops parsing the NEWS or changelog file at that point. Whenever the program sees an entry whose content/footer checksum matches a checksum already in the database, it omits that entry from what is displayed to the user but continues parsing the file to see if any earlier entries should be displayed.
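The two-checksum rule can be sketched roughly as follows. This is a minimal illustration, not the program's actual code: the `Entry` type, the `checksums` and `filter_entries` helpers, the choice of SHA-256, and the in-memory sets standing in for the database are all assumptions made for the example. Entries are assumed to arrive newest first, as they appear in a changelog file.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Entry:
    header: str   # package name, version number, suite, urgency
    content: str  # body of the entry
    footer: str   # maintainer and timestamp


def checksums(entry):
    """Return (full, content_footer) checksums for one entry.

    SHA-256 is an assumption; the text does not name the hash used.
    """
    tail = entry.content + "\n" + entry.footer
    whole = entry.header + "\n" + tail
    full = hashlib.sha256(whole.encode("utf-8")).hexdigest()
    content_footer = hashlib.sha256(tail.encode("utf-8")).hexdigest()
    return full, content_footer


def filter_entries(entries, seen_full, seen_content_footer):
    """Return the entries to display from a newest-first sequence."""
    to_show = []
    for entry in entries:
        full, content_footer = checksums(entry)
        if full in seen_full:
            # Exact entry already seen: stop parsing the file here.
            break
        if content_footer in seen_content_footer:
            # Same text under a different header: hide this entry,
            # but keep looking at earlier entries.
            continue
        to_show.append(entry)
    return to_show
```

Keeping the header out of the second checksum is what lets an entry be recognized even when only its version number or suite has changed.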
Caveat: none of the logic above applies when --since, --latest, or --show-all is specified.
The database used by the current approach is significantly larger than the database required for the historical approach -- a few megabytes vs. a few kilobytes -- but it is still relatively small, and we consider this an acceptable amount of space to use for a significantly better-performing algorithm.
Because this approach uses entry checksums, it could theoretically handle entries from files like changelog.Debian.devmapper that the historical approach ignored, though that functionality has not yet been implemented.
When the persistent database is not being used in a particular invocation of the program, or when there is no data for a particular package in the database, then the above approach requires modification.
In this case, we first read the copy of the same package on disk and calculate checksums for its entries, seeding the database before we parse the files in the package.