| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "Linear-scaling electronic structure: too little, too late" |
| 4 | +categories: [history,editorial] |
| 5 | +--- |
| 6 | + |
| 7 | +This is the first instance of a tradition that I expect to maintain as long as I am actively writing for this blog: |
| 8 | + editorializing my research papers as they are published to provide more context for why I wrote them and what they mean to me. |
| 9 | +My latest paper, ["Assessment of localized and randomized algorithms for electronic structure"](https://doi.org/10.1088/2516-1075/ab2022), |
| 10 | + just became available online with a citable DOI. |
| 11 | +Sometimes I will write about papers when they are published in a journal, such as in this instance, |
| 12 | + and other times I will write about papers shortly after I post them to [the arXiv](https://arxiv.org). |
| 13 | + |
| 14 | +This particular paper is special to me for several reasons. |
| 15 | +First, I have been extremely interested in linear-scaling electronic structure algorithms since just after starting grad school in 2002. |
| 16 | +I've initiated multiple projects on the subject over the years, |
| 17 | + but they were always too risky and too ambitious to mature into published papers.
| 18 | +These projects have at least done a lot to shape my opinions and views on the subject, |
| 19 | + and I've left a few unpublished papers languishing on the arXiv that serve as breadcrumbs along a trail. |
| 20 | +Second, this is the first proper electronic structure paper that I've published since being forced out of the field in 2014 |
| 21 | + because my funding in the area was cut. |
| 22 | +I do have [a paper from 2016](https://doi.org/10.1063/1.4965886) on rational approximations of the Fermi-Dirac function |
| 23 | + that was effectively a short prelude to the present paper, which I had originally intended to write several years ago. |
| 24 | +Before I got the boot, which I knew was coming, I wrote [a paper in 2013](https://doi.org/10.1063/1.4855255) on fast algorithms |
| 25 | + for random-phase approximation (RPA) calculations, so that I could end my active research period in electronic structure |
| 26 | + with a paper that I was very proud of. |
| 27 | +My research career hasn't exactly gone smoothly, but I feel like my electronic structure research agenda is back on track |
| 28 | + with a newfound sense of purpose. |
| 29 | + |
| 30 | +My interest in linear-scaling electronic structure began with my gung-ho, bigger-and-better attitude upon starting grad school. |
| 31 | +I was very fortunate to have a solid undergraduate research experience that prepared me well for computational science |
| 32 | + research, and I was eager to do research related to atomistic simulation. |
| 33 | +[Stefan Goedecker's review](https://doi.org/10.1103/RevModPhys.71.1085) of linear-scaling electronic structure research in the 1990s was still relatively new,
| 34 | + and I tried to learn everything about the subject by carefully going through it and following references through the literature. |
| 35 | +Ultimately, I decided on an ambitious technical approach to treat this as much as possible like a numerical linear algebra problem, |
| 36 | + since the core technical problem was to compute traces of functions of sparse matrices while avoiding eigenvalue decompositions.
| 37 | +Trying to follow a reductionist strategy, I focused my effort on developing a new elementary transformation for manipulating sparse matrices. |
| 38 | +The canonical problem with transforming sparse matrices is that you inevitably incur fill-in of the matrix that quickly spirals out |
| 39 | + of control and produces dense matrices long before the problem is solved. |
| 40 | +I interpreted this as the outcome of a "greedy" solver strategy, since any single elementary transformation (e.g. a Givens rotation or a Gaussian elimination step)
| 41 | + was usually very simple, very inexpensive, and only appeared to make the sparsity pattern slightly worse. |
| 42 | +Instead, I performed some numerical experiments on very general but localized transformations (i.e. mostly identity) |
| 43 | + of the form |
| 44 | + |
| 45 | +$$ A' \approx X^T A X $$ |
| 46 | + |
| 47 | +where $$A$$ is sparse and symmetric (but not positive definite), $$X$$ is the numerically optimized local transformation, |
| 48 | + and $$A'$$ is the new sparse matrix with a specified target sparsity pattern but variable matrix elements. |
| 49 | +While any one optimized sparsity-preserving transformation would be more expensive than other elementary matrix transformations, |
| 50 | + they would prevent costs from spiraling out of control as matrices would otherwise become more and more dense. |
| 51 | +These numerical experiments showed that you could diagonalize rows and columns in $$A'$$ without creating new fill-in, |
| 52 | + with errors that appeared to decay exponentially in the number of matrix elements that I allowed to vary in $$X$$. |
| 53 | +However, the numerical optimization of these transformations turned out to be very ill-conditioned as the transformations |
| 54 | + became more accurate, and there were basic arguments suggesting that this relationship was inevitable. |
| 55 | +Also, I couldn't come up with a theoretical framework to explain the numerical behavior I was seeing. |
| 56 | +I did write this up as my very first attempt to publish a research paper in grad school, |
| 57 | + but it was rejected by referees and I let it [languish on the arXiv](https://arxiv.org/abs/math/0505157) |
| 58 | + because I didn't know what else to do and received no advice on an alternate course of action. |
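
The fill-in problem described above can be seen in a toy example. The sketch below is only an illustration and is not the optimized transformation from the arXiv manuscript: it applies one ordinary Gaussian elimination step to a symmetric "arrowhead" matrix, and the helper names `arrowhead` and `eliminate_first_pivot` are hypothetical. A single cheap step turns a diagonal trailing block into a completely dense one.

```python
import numpy as np


def arrowhead(n, diag=4.0, border=1.0):
    """Symmetric arrowhead matrix: a diagonal plus a dense first row and column."""
    a = np.diag(np.full(n, diag))
    a[0, 1:] = border
    a[1:, 0] = border
    return a


def eliminate_first_pivot(a):
    """One symmetric Gaussian elimination step: the Schur complement of a[0, 0]."""
    return a[1:, 1:] - np.outer(a[1:, 0], a[0, 1:]) / a[0, 0]


n = 8
a = arrowhead(n)

# Nonzeros in the trailing (n-1) x (n-1) block before and after the step.
trailing_before = np.count_nonzero(a[1:, 1:])
trailing_after = np.count_nonzero(eliminate_first_pivot(a))

print(trailing_before, trailing_after)  # prints "7 49"
```

Eliminating the pivots in the opposite order would produce no fill-in for this particular matrix, which is why reordering helps in easy cases. The difficulty described in this post is that general sparsity patterns offer no such ordering.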
| 59 | + |
| 60 | +My interest in linear-scaling electronic structure and numerical linear algebra did not end with that failure. |
| 61 | +I was quite committed in grad school to the linear-algebraic approach to the problem, so I decided to work on easier |
| 62 | + open problems in numerical linear algebra to prepare myself for future efforts in sparsity-preserving matrix transformations. |
| 63 | +The new problem that I focused on was a banded Hermitian eigensolver, since the tridiagonal case had been solved in the mid-1990s.
| 64 | +I also saw this as an opportunity to mathematically formalize the concept of Wannier functions in electronic structure theory, |
| 65 | + again pursuing an agenda of treating physical concepts in a more mathematically careful manner. |
| 66 | +By the time I graduated, I had a basic plan for a banded Hermitian eigensolver and some numerical experiments demonstrating that the plan was reasonable. |
| 67 | +However, I was still in a bit over my head as there were key technical results that I had numerical evidence for but lacked an ability to prove. |
| 68 | +I finally managed to prove the most basic and tricky result in summer 2017, |
| 69 | + but it was part of a paper that stalled out as I became more and more distracted by planning my exit from Sandia. |
| 70 | +Recently, I have revisited this proof, and it will now be the main result of my very next paper (in preparation) |
| 71 | + that I will probably discuss on this blog in the not-too-distant future. |
| 72 | +I hope to finally finish developing my banded Hermitian eigensolver at some point in the next few years. |
| 73 | + |
| 74 | +So, my interest in linear-scaling electronic structure ended up getting diverted into other technical pursuits. |
| 75 | +Meanwhile, there haven't been any major advances in linear-scaling electronic structure research for the entire time that I've been interested in it. |
| 76 | +I've tried to rationalize and quantify this opinion by measuring publication rates in Google Scholar: |
| 77 | + |
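| 78 | +[Figure: yearly Google Scholar counts of papers matching each search, compared with yearly citations to three major linear-scaling papers]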
| 79 | + |
| 80 | +Here, I'm plotting papers mentioning ["electronic structure"] versus ["electronic structure" AND ("linear scaling" OR "scale linearly" OR "scaling linearly")], |
| 81 | + which shows that interest in linear scaling is growing with the activity in the field of electronic structure as a whole |
| 82 | + and has even become a larger fraction of papers with time. |
| 83 | +However, when this is compared against the rate of citations to three major linear-scaling electronic structure papers -- |
| 84 | + [Weitao Yang's divide-and-conquer paper](https://doi.org/10.1103/PhysRevLett.66.1438) that popularized the topic, |
| 85 | + [Walter Kohn's paper](https://doi.org/10.1103/PhysRevLett.76.3168) on the nearsightedness of electrons, |
| 86 | + and [Stefan Goedecker's review paper](https://doi.org/10.1103/RevModPhys.71.1085) -- |
| 87 | + it shows that the literature associated with linear-scaling solver algorithms isn't growing at all. |
| 88 | +This is mostly explained by a combination of papers superficially mentioning the concept of linear-scaling electronic structure, |
| 89 | + often noting its conceptual importance, |
| 90 | + and papers discussing other linear-scaling electronic structure algorithms not associated with solvers, |
| 91 | + such as Fock matrix construction and localized electron correlation methods in quantum chemistry. |
| 92 | +On the software side, there are no popular or widely used linear-scaling electronic structure simulation codes. |
| 93 | +Perhaps the two oldest and most established efforts are ONETEP, a commercial code related to CASTEP, |
| 94 | + and CONQUEST, which has been promising an as-yet-unfulfilled public release for over a decade now.
| 95 | +They each get fewer than 100 mentions in the scientific literature per year, |
| 96 | + which is far behind the most popular electronic structure codes. |
| 97 | + |
| 98 | +My first linear-scaling electronic structure paper does not contain any new, notable results in numerical linear algebra. |
| 99 | +It is a much more mundane work, partly a review, partly some modest brainstorming about future solver concepts, |
| 100 | + and an attempt to compare some competing technical ideas (randomized and localized algorithms) |
| 101 | + that quite surprisingly had never been directly compared before. |
| 102 | +My younger, more idealistic self would probably have seen this as a waste of time, |
| 103 | + but I wanted to demonstrate that I am knowledgeable in this research area |
| 104 | + and try to nudge it in a slightly better direction even if I had no amazing technical breakthroughs to report. |
| 105 | +It also just grew organically from a phase that I went through while I was at Sandia of attempting to write Comments
| 106 | + for Physical Review Letters, which was mostly just me venting frustration over my lack of research funding in electronic structure.
| 107 | +Unsurprisingly, my targets were related to trendy research areas that had diverted money away from electronic structure at Sandia |
| 108 | + ([machine learning](https://arxiv.org/abs/1208.1085) and [quantum computing](https://arxiv.org/abs/1310.6676)) |
| 109 | + and [linear-scaling electronic structure itself](https://arxiv.org/abs/1311.6576). |
| 110 | +I wrote 3 Comments and got 1 published, before I decided (and was convinced by others) that this was not a productive activity. |
| 111 | +The body of scientific publications is just too large to bother pointing out errors except in the most egregious of cases. |
| 112 | +I think it was somewhat worthwhile in my case because it has helped me to shape my self-criticism |
| 113 | + (sharpening it in some areas but relaxing it overall after realizing that other, successful scientists have much laxer standards than I do)
| 114 | + and was temporarily effective at venting frustration. |
| 115 | +It was the final Comment on stochastic linear-scaling DFT that formed the seed for my paper. |
| 116 | +While it was eventually rejected (relevance standards for Comments are very high), the editors strongly encouraged me to develop
| 117 | + my benchmarking and analysis of different electronic structure solver algorithms into a full paper. |
| 118 | +It took 5 years to finish, since my research efforts during that time were mostly shifted to quantum computing, |
| 119 | + but I got it done in the end. |
| 120 | +I think the paper's essential message also nicely distills down to a very basic research lesson |
| 121 | + that if a field of research is unable to measure progress (in this case, through meaningful computational benchmarks),
| 122 | + then it will not be able to make progress. |
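
For readers unfamiliar with the randomized side of that comparison, here is a minimal sketch of stochastic trace estimation, the basic idea that such algorithms build on. It is a generic Hutchinson-style estimator and not the benchmark code from the paper. The function name `hutchinson_trace_of_square`, the test matrix, and the sample count are all assumptions chosen for illustration. It estimates tr(H^2) using only matrix-vector products, so H^2 is never formed and no eigenvalues are computed.

```python
import numpy as np


def hutchinson_trace_of_square(matvec, n, num_samples, rng):
    """Estimate tr(H^2) for a symmetric H using only products H @ z."""
    total = 0.0
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=n)  # random +/-1 (Rademacher) vector
        hz = matvec(z)
        total += hz @ hz  # z^T H^2 z equals ||H z||^2 when H is symmetric
    return total / num_samples


n = 50
# Tridiagonal model Hamiltonian, stored densely only to keep the sketch short.
h = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

exact = np.trace(h @ h)
estimate = hutchinson_trace_of_square(
    lambda z: h @ z, n, num_samples=4000, rng=np.random.default_rng(0)
)
print(exact, estimate)
```

The statistical error decays only as the inverse square root of the number of samples, which is one reason a direct comparison against localized algorithms needs careful benchmarks.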
| 123 | + |
| 124 | +Because I mostly work alone on research (although I would love to collaborate with like-minded people), |
| 125 | + I have found that I am most productive when I successfully "flatten" my research plans |
| 126 | + into an ordered sequence of projects and papers with a clear and self-convincing rationale for the ordering |
| 127 | + (i.e. inter-dependence and relative importance). |
| 128 | +With that in mind, I'd like to end this post with a preview of my next paper,
| 129 | + which I've been actively writing for about a month now, and some related concluding thoughts. |
| 130 | +In [Goedecker's review](https://doi.org/10.1103/RevModPhys.71.1085) of the active period (1991-1999) of linear-scaling electronic structure research, |
| 131 | + he concluded that |
| 132 | + |
| 133 | +> O(N) methods have become an essential part of most large-scale atomistic simulations based on either tight-binding or semiempirical methods. |
| 134 | +
| 135 | +However, that opinion was not reflected in any capability or feature of any simulation code that was available at that time, |
| 136 | + and linear-scaling solvers still have very limited applicability and availability 20 years later. |
| 137 | +Goedecker was certainly correct that the available algorithmic concepts were more readily applicable to tight-binding/semiempirical models, |
| 138 | + but he treated their widespread adoption as a foregone conclusion, and it simply never happened
| 139 | + (which is perhaps a result of *everyone else* thinking that way, too). |
| 140 | +I now very much want to make this happen, |
| 141 | + and most of my planned papers are now organized around key technical components of the semiempirical simulation software that I have envisioned. |
| 142 | +While most of these papers will be narrowly focused on this goal, |
| 143 | + my very next paper is a little different. |
| 144 | +As I mentioned in this post, I have a long-overdue theorem/proof that I am finally preparing for publication. |
| 145 | +While the theorem/proof is about a rather esoteric problem, low-rank approximations of the Cauchy kernel, |
| 146 | + it is directly related to thoroughly optimizing and finalizing |
| 147 | + the function approximations that will be at the heart of future linear-scaling electronic structure solvers. |
| 148 | +It also encompasses and concludes my [earlier effort](https://doi.org/10.1063/1.4965886) to numerically optimize rational approximations of |
| 149 | + the Fermi-Dirac function into a slightly larger and mostly analytical approximation framework. |
| 150 | +As I work through this next project, I am going to experiment with a radically open research style |
| 151 | + and present unpublished work in progress on this blog. |
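
To make the link between the Fermi-Dirac function and the Cauchy kernel concrete, here is a minimal sketch using the textbook Matsubara pole expansion. It is not the optimized approximation from my 2016 paper or from the paper in preparation. The variable `x` is assumed to be the dimensionless energy (E - mu)/(kT), and the function names are hypothetical. Each term 2x/(x^2 + w^2) is the sum of two Cauchy kernels, 1/(x - iw) + 1/(x + iw), so a truncated sum is a rational approximation with poles on the imaginary axis.

```python
import numpy as np


def fermi_dirac(x):
    """Fermi-Dirac function in the dimensionless variable x = (E - mu) / (k T)."""
    return 1.0 / (1.0 + np.exp(x))


def matsubara_sum(x, num_poles):
    """Truncated Matsubara expansion: 1/2 - sum_k 2x / (x^2 + w_k^2), w_k = (2k-1) pi."""
    x = np.asarray(x, dtype=float)
    k = np.arange(1, num_poles + 1)
    omega = (2 * k - 1) * np.pi
    terms = 2.0 * x[..., None] / (x[..., None] ** 2 + omega**2)
    return 0.5 - np.sum(terms, axis=-1)


x = np.linspace(-5.0, 5.0, 11)
err_10 = np.max(np.abs(matsubara_sum(x, 10) - fermi_dirac(x)))
err_1000 = np.max(np.abs(matsubara_sum(x, 1000) - fermi_dirac(x)))
print(err_10, err_1000)
```

The error of this expansion falls off only in inverse proportion to the number of poles, which is the motivation for optimizing the pole locations and weights instead of taking them from the Matsubara frequencies.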