Ok, a question for #database #dbms nerds. Many years ago I worked on a join order optimizer that used a heuristic formula to estimate the number of rows produced by an equijoin, based on the number of rows in the two tables being joined and the number of distinct values in the join keys. This formula came from an academic paper, but I can't remember what the formula was or find the paper. Does this ring a bell for anyone? Boosts appreciated...
I think the formula involved (some large number) to the power of 1-1/(another large number). Whoever originally implemented it was not familiar with floating point quirks, and computed 1-1÷(large number) directly, producing a value that was invariably 1 so our join plans were always terrible. I rewrote it using C's log1p function from math.h, and fixed another even more terrible bug, and suddenly we produced sensible join plans!
...the other even more terrible bug was that they seemingly didn't know math.h had a pow() function so tried to do an exponentiation through repeated multiplication. But that took too long so they truncated it after 1000 iterations of the loop.
@kitten_tech Only skimmed the PDF, but possibly “A statistical model for estimating the number of records in a relational database” by
To-Yat Cheung? (1981/82)