I posted my question to NA-NET, and below is a commented version of the replies I received.
Sivan Toledo pointed out the following papers:

 Approximating fill in solving sparse linear systems
 A. Agrawal, P. Klein, and R. Ravi
 Download link: ftp://ftp.cs.brown.edu/pub/techreports/91/cs9119.pdf

 Cutting down on fill using nested dissection: provably good elimination orderings
 A. Agrawal, P. N. Klein, and R. Ravi
 Graph Theory and Sparse Matrix Computation (A. George, J. Gilbert, and J. W. H. Liu, eds.), volume 56 of the IMA Volumes in Mathematics and its Applications, Springer-Verlag (1993), pp. 31-55.
The above-mentioned work provides algorithms for computing symmetric permutations with provable quality guarantees. To the best of my knowledge, however, these algorithms are not used in practice, but they are definitely something I should check out.
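As background for comparing orderings: the fill produced by a given symmetric permutation can be counted purely symbolically via the classical elimination game (eliminating a vertex pairwise connects its remaining neighbours, and each added edge is one fill entry). A minimal Python sketch of this bookkeeping, my own illustration rather than the algorithm from the papers above:

```python
def fill_count(adj, order):
    """Count fill edges produced by eliminating vertices in `order`.

    adj: dict mapping vertex -> set of neighbours (symmetric graph of
    the matrix). When a vertex is eliminated, its remaining neighbours
    are pairwise connected; every edge added this way corresponds to
    one fill entry in the Cholesky factor.
    """
    g = {v: set(nb) for v, nb in adj.items()}  # working copy
    fill = 0
    for v in order:
        nbrs = g.pop(v)
        for u in nbrs:
            g[u].discard(v)
        live = [u for u in nbrs if u in g]  # not yet eliminated
        for i, a in enumerate(live):
            for b in live[i + 1:]:
                if b not in g[a]:           # new edge => fill entry
                    g[a].add(b)
                    g[b].add(a)
                    fill += 1
    return fill

# Star graph: vertex 0 adjacent to everything (one dense row/column).
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(fill_count(star, [0, 1, 2, 3, 4]))  # centre first: 6 fill edges
print(fill_count(star, [1, 2, 3, 4, 0]))  # centre last: 0 fill edges
```

Eliminating the dense vertex first creates fill on every remaining pair, while eliminating it last creates none, which is exactly why the choice of permutation matters so much.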
Esmond Ng replied that it is hard to come up with lower bounds for general matrices, but that bounds can be obtained for special classes of matrices. The relevant papers are
 Complexity Bounds for Regular Finite Difference and Finite Element Grids
 A. J. Hoffman, M. S. Martin, and D. J. Rose
 SIAM J. Numer. Anal., vol. 10, no. 2 (1973), pp. 364-369.
and
 Nested Dissection of a Regular Finite Element Mesh
 A. George
 SIAM J. Numer. Anal., vol. 10 (1973), pp. 345-363.
I am aware of this work, but I was mainly looking for information about general matrices, since the matrices I encounter in MOSEK almost never have grid structure. MOSEK is an optimization package, so the underlying applications come mostly from economics and planning, which give rise to very different matrices than those arising in physics.
Jeff Ovall pointed out a different line of research in his reply:
If you are willing to relax your notion of fill-in a bit, then I may be able to point you in a helpful direction. Instead of thinking of fill-in in terms of putting nonzeros where there used to be zeros, one can also think of it as (slightly) increasing the rank of low-rank blocks. For example, the inverse of the tridiagonal matrix with stencil (-1, 2, -1) has no zero entries, but the rank of any off-diagonal block is precisely one, so the "fill-in" in this sense is small (requiring only O(n log n) storage instead of n^2). Hierarchical matrices (Hackbusch, Grasedyck, Boerm, Bebendorf, ...) exploit this notion of low-rank fill-in not only to compress the "important" information in a matrix, but also to maintain a compressed format while performing factorizations (LU, Cholesky). From the algebraic point of view, it is the Rank-Nullity Theorem which implies that inverses of sparse matrices will have large blocks of low rank. If the LU factorization is thought of blockwise, then this theorem also has something to say about the ranks of the blocks which appear, though it is not obvious to me how to get sharp lower bounds. The following paper may be a useful reference:
 The interplay of ranks of submatrices
 G. Strang and T. Nguyen
 SIAM Rev., vol. 46 (2004), pp. 637-646.
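Ovall's tridiagonal example is easy to check numerically. A small sketch of my own (assuming the stencil meant is the usual 1-D discrete Laplacian (-1, 2, -1); any split point gives the same rank picture):

```python
import numpy as np

n = 8
# 1-D discrete Laplacian: tridiagonal with stencil (-1, 2, -1), SPD.
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
Kinv = np.linalg.inv(K)

# The inverse is completely dense: no zero entries at all ...
assert np.all(np.abs(Kinv) > 1e-12)

# ... yet every off-diagonal block has rank exactly one.
for k in range(1, n):
    assert np.linalg.matrix_rank(Kinv[:k, k:]) == 1

print("dense inverse, every off-diagonal block has rank 1")
```

This is precisely the compressibility that hierarchical-matrix formats exploit: such blocks can be stored as outer products instead of dense arrays.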
To summarize, there does not seem to be any known good lower bound on the minimal amount of fill-in possible when computing a sparse Cholesky factorization of a symmetrically permuted matrix.
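Even without lower bounds, the spread between a good and a bad ordering can be extreme. A classic illustration (my own sketch, not from any of the replies) is the arrowhead matrix, where the natural ordering produces complete fill and the reversed ordering produces none:

```python
import numpy as np

n = 10
# Symmetric "arrowhead" matrix: dense first row/column, strictly
# diagonally dominant and hence symmetric positive definite.
A = n * np.eye(n)
A[0, 1:] = 1.0
A[1:, 0] = 1.0

def chol_nnz(M, tol=1e-12):
    """Count nonzeros in the (dense, for illustration) Cholesky factor."""
    L = np.linalg.cholesky(M)
    return int(np.sum(np.abs(L) > tol))

# Natural order: the dense row/column is eliminated first, so the
# factor fills in completely -- n*(n+1)/2 nonzeros.
nnz_natural = chol_nnz(A)

# Reversed order: the dense row/column is eliminated last, so no
# fill occurs at all -- only 2n - 1 nonzeros.
p = np.arange(n)[::-1]
nnz_reversed = chol_nnz(A[np.ix_(p, p)])

print(nnz_natural, nnz_reversed)  # 55 19
```

For this family the ratio between the two orderings grows like n/2, which shows how much is at stake when choosing a symmetric permutation, even though it says nothing about lower bounds for general matrices.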