What makes tests brittle

I just finished reading Kent Beck's "Where to Test", in which he outlines two examples and uses them to define some criteria for determining when system-level tests will be easier to work with than low-level tests.

In the first example, functional tests for a payment application are working well for Kent, while in the second, functional tests verify the output of the code under test (the output is intended as input for an API).

Kent makes the distinction in terms of cost, stability and reliability, but I found myself wondering whether these are only symptoms of a deeper distinction between his two examples. In one sense, there appears to be a "what" versus "how" distinction between them. The payment application's tests sound like they check that the system maintains certain business rules between changes, while the API example's tests appear to be validating the resulting output (which sounds like HTML or XML).
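
To make that contrast concrete for myself, here is a minimal sketch in Python. Everything in it is hypothetical and mine, not taken from Kent's post: the Account class, the pay and render_payment functions, and the XML shape are all assumptions chosen only to illustrate the two styles of test.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical account; balance is held in cents."""
    balance: int


def pay(source: Account, target: Account, amount: int) -> None:
    """Move `amount` cents from source to target (illustrative only)."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > source.balance:
        raise ValueError("insufficient funds")
    source.balance -= amount
    target.balance += amount


def render_payment(sender: str, recipient: str, amount: int) -> str:
    """Serialize a payment as XML; the element names are made up."""
    return (
        f"<payment><from>{sender}</from>"
        f"<to>{recipient}</to><amount>{amount}</amount></payment>"
    )


def test_payment_preserves_total_balance():
    # Business-rule style: asserts a rule that should hold
    # no matter how the payment is implemented or presented.
    a, b = Account(1000), Account(0)
    pay(a, b, 250)
    assert a.balance + b.balance == 1000
    assert b.balance == 250


def test_payment_xml_matches_exactly():
    # Output-validation style: asserts the exact characters produced,
    # so it is coupled to every detail of the serialization.
    expected = (
        "<payment><from>alice</from>"
        "<to>bob</to><amount>250</amount></payment>"
    )
    assert render_payment("alice", "bob", 250) == expected


if __name__ == "__main__":
    test_payment_preserves_total_balance()
    test_payment_xml_matches_exactly()
```

The first test survives any change to how a payment is serialized. The second fails as soon as an element is renamed or reordered, even if the payment itself is still correct.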

What versus how is a trap, in that the "what" at one level of abstraction becomes the "how" at the next level, which in turn is the "what" for a lower level, and so on.

The key difference then is that they appear to be at different levels of abstraction: